A selection of production-grade projects I’ve worked on across backend systems, data pipelines, AI integrations, and scalable platforms. Each project highlights real-world problem solving, system design, and engineering decisions.

API — Contact & Account Enrichment Platform

Overview

A production REST API backend for contact and account enrichment, reverse lookups, and ICP-based lead generation, combining proprietary data with external enrichment providers.

Problem

Sales and marketing workflows depend on accurate contact and company data, but existing solutions are often fragmented, slow, and inefficient for bulk processing.

Solution

Built a scalable, API-first platform that aggregates internal datasets and third-party enrichment services to deliver real-time and batch enrichment. The system supports credit-based usage, rate limiting, and background processing for large workloads.

Key Features

  • Contact & company enrichment
  • Email & LinkedIn reverse lookup
  • ICP/persona-based lead generation
  • Credit-based billing system with usage tracking
  • Redis-backed rate limiting
  • Background processing for long-running jobs
  • S3-based file handling for large datasets

Key Highlights

  • Designed a metered API system with credit accounting and transaction logging
  • Integrated multiple external enrichment providers with retry and fallback mechanisms
  • Optimized performance using async processing and batch workflows
  • Built production-ready features like rate limiting, auditing, and scalable deployment
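
The rate-limiting approach can be sketched as a sliding window over per-key request timestamps. This is a minimal in-memory illustration, not the actual implementation; the production service backs the window with Redis, and the class and parameter names here are assumptions:

```python
import time
from collections import deque

class SlidingWindowRateLimiter:
    """In-memory sliding-window limiter; the production version stores
    timestamps in Redis so limits hold across API instances."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = {}  # api_key -> deque of request timestamps

    def allow(self, api_key, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits.setdefault(api_key, deque())
        # Evict timestamps that have fallen outside the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject without consuming credits
        hits.append(now)
        return True
```

The same bookkeeping doubles as the hook for credit accounting: each allowed call is the natural place to decrement a credit balance and write a transaction log row.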

Links

https://www.icustomer.ai/platform/docs

Email Finder — Professional Email Generation & Validation Service

Overview

A microservice that generates and validates professional email addresses using name and domain inputs, optimized with caching to reduce external API usage and improve performance.

Problem

Finding accurate professional email addresses programmatically is unreliable and expensive due to repeated validation calls and lack of standardized patterns.

Solution

Built a lightweight service that generates multiple email patterns, validates them using an external provider, and caches results in a database to minimize redundant API calls. The system supports both containerized and serverless deployments.

Key Features

  • Email generation from name + domain inputs
  • Validation using third-party email verification service
  • PostgreSQL-backed caching to reduce repeated lookups
  • Input limiting to control request size and cost
  • Header-based API authentication
  • Deployable via Docker and AWS Lambda (serverless)

Key Highlights

  • Reduced external validation costs by implementing database-backed caching with reuse logic
  • Designed candidate email pattern generation system for higher success rates
  • Built a modular service separating API layer and business logic for maintainability
  • Enabled flexible deployment using containerized and serverless architectures
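
The candidate-generation step can be sketched as follows. The pattern list and function name are illustrative assumptions; the real service orders candidates before handing them to the external verifier, so the most common patterns are validated (and cached) first:

```python
def candidate_emails(first, last, domain, limit=6):
    """Generate common professional email patterns for a name + domain,
    ordered by how frequently each pattern occurs in practice."""
    f, l = first.strip().lower(), last.strip().lower()
    locals_ = [
        f"{f}.{l}",    # jane.doe@
        f"{f}{l}",     # janedoe@
        f"{f[0]}{l}",  # jdoe@
        f"{f}",        # jane@
        f"{f[0]}.{l}", # j.doe@
        f"{f}_{l}",    # jane_doe@
    ]
    seen, out = set(), []
    for local in locals_:
        if local not in seen:  # short names can collapse patterns together
            seen.add(local)
            out.append(f"{local}@{domain}")
    return out[:limit]
```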

CRM & Platform Integrations — Unified Data Sync Service

Overview

Integration service that enables seamless data synchronization between internal databases and multiple external platforms, including CRMs, communication tools, and data warehouses.

Problem

Organizations often rely on multiple platforms (CRMs, email tools, data warehouses), but keeping data consistent across them requires manual effort or fragmented integrations, leading to delays and inconsistencies.

Solution

Built a unified integration layer that reads data from internal databases, transforms it using configurable mappings, and syncs it across multiple external platforms via API-based connectors. The system supports both real-time and batch workflows with flexible deployment options.

Key Features

  • Multi-platform integrations across CRM, communication, and data systems
  • Configurable field-mapping for transforming relational data into platform-specific formats
  • API-driven data sync workflows (push + fetch operations)
  • Support for both real-time and batch synchronization
  • Token-based API authentication
  • Serverless and containerized deployment support

Key Highlights

  • Designed a modular integration architecture to support multiple platforms without changing core logic
  • Built a flexible ETL pipeline using Pandas for mapping and transforming structured data
  • Integrated with multiple external systems, including:
    • Salesforce
    • HubSpot
    • Gmail, Google Drive, Google Calendar
    • Zoho CRM
    • Mailchimp
    • Snowflake
    • Formsort
  • Enabled serverless deployment (AWS Lambda via Mangum) for scalable and cost-efficient execution
  • Implemented structured logging and monitoring using AWS CloudWatch
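
The configurable field-mapping idea can be illustrated with plain dicts (the real pipeline applies equivalent mappings over Pandas DataFrames; the mapping shown is a hypothetical example, not an actual platform config). Each target field pairs a source column with an optional transform, so supporting a new platform means adding a config rather than changing core logic:

```python
# Hypothetical mapping for a HubSpot-style contact payload:
# target_field -> (source_column, optional transform)
HUBSPOT_CONTACT_MAP = {
    "email": ("email_address", str.lower),
    "firstname": ("first_name", None),
    "lifecyclestage": ("stage", lambda s: s.replace(" ", "_").lower()),
}

def transform_record(row, field_map):
    """Apply one field map to one source row, skipping missing/empty columns."""
    out = {}
    for target, (source, fn) in field_map.items():
        value = row.get(source)
        if value is not None:
            out[target] = fn(value) if fn else value
    return out
```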

MCP Server — AI Integration Bridge for Enrichment Tools

Overview

A FastMCP-based server that exposes enrichment and discovery capabilities as tools for AI assistants, enabling seamless integration with local AI workflows and automation pipelines.

Problem

AI assistants and automation tools lack direct access to structured enrichment and discovery systems, making it difficult to integrate real-time data into AI-driven workflows.

Solution

Built a lightweight MCP server that wraps enrichment and discovery APIs into structured, validated tools that can be consumed by AI assistants. The system supports multiple input formats and ensures secure, low-latency communication between local AI environments and external APIs.

Key Features

  • MCP-based tool exposure for AI assistants
  • Contact, email, phone, and LinkedIn enrichment
  • Bulk enrichment and discovery workflows
  • Persona-based discovery and profiling tools
  • Taxonomy utilities for standardized inputs (industry, geography, company size)
  • API-key based authentication (header + environment fallback)

Key Highlights

  • Designed typed input validation using Pydantic to prevent invalid or wasteful API calls
  • Built async tool handlers for efficient request routing and response handling
  • Enabled integration with AI tools including:
    • Claude Desktop
    • Cursor
  • Implemented flexible API-key handling for secure usage across local and integrated environments
  • Standardized inputs using taxonomy layers to improve consistency across enrichment workflows
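
The validate-before-calling pattern behind the tool handlers can be sketched like this. The real server uses Pydantic models and FastMCP tool registration; this stdlib-only stand-in (dataclass validation plus an async handler with a placeholder upstream call) just shows why invalid inputs never reach the paid API:

```python
import asyncio
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class EnrichRequest:
    """Typed tool input; validation runs before any external API call,
    so malformed requests never consume credits."""
    email: str

    def __post_init__(self):
        if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", self.email):
            raise ValueError(f"invalid email: {self.email!r}")

async def enrich_contact(req: EnrichRequest) -> dict:
    # Placeholder for the real upstream enrichment call
    await asyncio.sleep(0)
    return {"email": req.email, "status": "enriched"}
```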

Webhook Middleware — Reliable Event Fan-Out & Delivery Service

Overview

A lightweight Python service that receives webhook events and reliably forwards them to multiple destinations with logging, retry handling, and audit tracking.

Problem

Integrating third-party webhooks directly into internal systems can lead to failures, missed events, and poor visibility into delivery status, especially when multiple downstream services are involved.

Solution

Built a middleware layer that captures incoming webhook payloads, logs them for auditability, and forwards them to multiple configured endpoints. The system ensures reliable delivery through retry mechanisms and detailed logging of each request and response.

Key Features

  • Webhook ingestion and payload parsing
  • Fan-out delivery to multiple destination endpoints
  • SQLite-backed logging for request/response tracking
  • Retry mechanism for failed deliveries
  • Structured logging for debugging and observability
  • Containerized deployment using Docker

Key Highlights

  • Designed a reliable fan-out architecture to decouple webhook providers from internal systems
  • Implemented persistent audit logging for full visibility into delivery status
  • Built retry and failure tracking to improve delivery reliability
  • Kept the system lightweight and portable using minimal dependencies and SQLite
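
The fan-out-with-audit flow can be sketched as below. The function and table names are illustrative; `deliver` abstracts the HTTP POST so the retry and audit logic is visible without network calls:

```python
import sqlite3

def fan_out(payload, endpoints, deliver, db, max_attempts=3):
    """Forward one payload to every endpoint, retrying failures and
    recording every attempt in SQLite for auditability.
    `deliver(url, payload) -> bool` stands in for the HTTP POST."""
    db.execute("CREATE TABLE IF NOT EXISTS deliveries "
               "(endpoint TEXT, attempt INTEGER, ok INTEGER)")
    results = {}
    for url in endpoints:
        ok = False
        for attempt in range(1, max_attempts + 1):
            ok = deliver(url, payload)
            db.execute("INSERT INTO deliveries VALUES (?, ?, ?)",
                       (url, attempt, int(ok)))
            if ok:
                break  # delivered; stop retrying this endpoint
        results[url] = ok
    db.commit()
    return results
```

Because each endpoint is attempted independently, one failing downstream service never blocks delivery to the others.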

CAPI Integration — Serverless Event Tracking & Attribution Pipeline

Overview

A serverless event-tracking pipeline that captures user interactions and booking events, processes and hashes sensitive data, and forwards conversion events to marketing and analytics platforms.

Problem

Tracking user behavior and conversions across multiple platforms (web apps, booking systems) is fragmented and often unreliable, leading to inaccurate attribution and incomplete analytics.

Solution

Built a centralized event pipeline that captures frontend interactions and webhook events, enriches them with tracking data, applies privacy-safe transformations, and forwards them to external analytics and marketing systems while storing raw data for reporting.

Key Features

  • Event tracking from frontend interactions and booking webhooks
  • UTM, click ID, and fingerprint-based enrichment
  • SHA-256 hashing of sensitive user data for privacy compliance
  • Conversion tracking via Meta Conversions API
  • Data storage in BigQuery for analytics and reporting
  • Email notifications for new form submissions
  • Serverless deployment using AWS Lambda

Key Highlights

  • Designed a privacy-first tracking system with secure hashing of PII before external transmission
  • Built a unified event pipeline combining frontend tracking and backend webhook ingestion
  • Integrated with:
    • Meta Conversions API
    • Google BigQuery
    • Zoho Bookings
    • Bubble.io
  • Enabled accurate marketing attribution using UTM parameters and click identifiers (fbclid, gclid)
  • Built as a Lambda-compatible containerized service for scalable deployment
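
The privacy-safe transformation is small but exact: the Conversions API expects customer identifiers like email and phone to be trimmed, lowercased, and SHA-256 hashed (hex-encoded) before transmission. A minimal sketch of that step:

```python
import hashlib

def hash_pii(value):
    """Normalize then SHA-256 a user identifier before sending it to the
    Conversions API: trimmed, lowercased, hex-encoded. Hashing after
    normalization is what lets Meta match the same user across senders."""
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```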

Unified Company Enrichment — LLM-Powered Data Intelligence Service

Overview

A FastAPI-based service that aggregates multi-source company data and uses LLM-driven processing to generate structured, enriched company profiles for analytics and research workflows.

Problem

Company data from multiple sources is often fragmented, inconsistent, and difficult to use directly for analysis or decision-making.

Solution

Built an end-to-end enrichment pipeline that ingests raw data from multiple sources, normalizes and merges records, and applies LLM-based processing to generate structured, high-quality company insights accessible via API endpoints.

Key Features

  • Multi-source data ingestion and consolidation
  • Domain-based normalization and record merging
  • LLM-powered extraction of structured company insights
  • API endpoints for enrichment and research queries
  • Per-domain data persistence and aggregation
  • Configurable enrichment workflows with logging

Key Highlights

  • Designed a hybrid pipeline combining deterministic data merging with LLM-based parsing
  • Built scalable enrichment workflows with async orchestration and batch processing
  • Enabled both automated enrichment and ad-hoc research queries via API endpoints
  • Implemented structured data outputs for downstream analytics and reporting
  • Containerized the service for flexible deployment
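
The deterministic half of the hybrid pipeline (domain-based normalization and merging, which runs before any LLM parsing) can be sketched like this; function names and the first-non-empty-wins merge policy are illustrative assumptions:

```python
from urllib.parse import urlparse

def normalize_domain(raw):
    """Reduce a URL or hostname to a bare lowercase domain key."""
    host = urlparse(raw if "//" in raw else f"//{raw}").netloc or raw
    host = host.lower().split(":")[0]
    return host[4:] if host.startswith("www.") else host

def merge_by_domain(records):
    """Group records from different sources under one domain key; later
    sources only fill fields the earlier ones left empty."""
    merged = {}
    for rec in records:
        key = normalize_domain(rec["domain"])
        target = merged.setdefault(key, {})
        for field, value in rec.items():
            if field != "domain" and value and not target.get(field):
                target[field] = value
    return merged
```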

Data Quality Microservices — Validation & Standardization Platform

Overview

A modular FastAPI-based platform providing a suite of data validation and standardization services for contact, company, and metadata quality across data pipelines.

Problem

Raw business data often contains inconsistencies, invalid entries, and mismatched formats, leading to errors in downstream systems like analytics, enrichment, and CRM workflows.

Solution

Built a collection of independent microservices that validate, clean, and standardize different types of data (e.g., phone, email, company, address, industry, revenue) using rule-based logic and curated taxonomies, exposed via API endpoints for seamless integration into pipelines.

Key Features

  • Validation services for phone, email, website, and address data
  • Standardization of job titles, company names, and metadata
  • Account matching and duplicate detection
  • Taxonomy-based mappings (industry, country, SIC codes)
  • Modular API design with independent service endpoints
  • Support for batch and pipeline-based processing
  • Containerized and serverless deployment support

Key Highlights

  • Built a modular microservices architecture with multiple independent validators under a unified API
  • Designed typed request/response models using Pydantic for strict validation and consistency
  • Integrated curated rule sets and taxonomies to improve data accuracy and standardization
  • Enabled flexible deployment using Docker and AWS Lambda (via Mangum)
  • Structured the system for scalable data pipeline integration and maintainability
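
The rule-based validator pattern can be sketched as below. These regex rules are simplified illustrations; the actual services layer curated taxonomies and external verification providers on top of checks like these, and return the same normalized/valid response shape from every endpoint:

```python
import re

E164 = re.compile(r"^\+[1-9]\d{7,14}$")          # E.164 phone format
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")

def validate_phone(raw):
    """Strip formatting, then check the digits against E.164."""
    digits = re.sub(r"[^\d+]", "", raw.strip())
    return {"input": raw, "normalized": digits, "valid": bool(E164.match(digits))}

def validate_email(raw):
    """Lowercase and trim, then apply a basic syntactic check."""
    cleaned = raw.strip().lower()
    return {"input": raw, "normalized": cleaned, "valid": bool(EMAIL.match(cleaned))}
```

Keeping every validator behind the same request/response shape is what lets pipelines swap or chain services without special-casing each one.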

Domain Validation Tool — Scalable Website Status & Quality Checker

Overview

A Python-based system that validates large volumes of domain names by checking availability, redirects, and content quality using concurrent processing and heuristic analysis.

Problem

Maintaining clean and reliable domain datasets is challenging due to inactive websites, redirects, and parked domains, which can negatively impact downstream processes like enrichment, outreach, and analytics.

Solution

Built a bulk validation pipeline that processes domain lists concurrently, fetches and analyzes website responses, and classifies domains based on availability, redirects, and content signals. The system outputs structured results for further processing and review.

Key Features

  • Bulk domain validation using CSV-based input
  • Concurrent HTTP processing for high throughput
  • Detection of invalid, parked, or suspended domains
  • Redirect tracking with destination capture
  • Heuristic content analysis using keyword matching
  • Structured CSV output with detailed status and metadata

Key Highlights

  • Implemented multi-threaded processing to handle large domain datasets efficiently
  • Designed content-based validation heuristics to detect parked and inactive websites
  • Built robust error handling for timeouts, SSL issues, and connection failures
  • Optimized processing using chunked data handling to manage memory usage
  • Automated domain hygiene workflows for downstream data pipelines
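
The concurrent classification flow can be sketched as below. The keyword signals and names are illustrative; `fetch` stands in for the real HTTP layer (which also tracks redirects and maps timeouts/SSL failures to `None`), so the threading and heuristics are visible without network access:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative content signals for parked/suspended pages
PARKED_SIGNALS = ("domain is for sale", "buy this domain", "parked free")

def classify(domain, fetch):
    """Classify one domain from its fetched HTML body.
    `fetch(domain)` returns the page body, or None if unreachable."""
    try:
        body = fetch(domain)
    except Exception:
        return "error"
    if body is None:
        return "unreachable"
    lowered = body.lower()
    if any(sig in lowered for sig in PARKED_SIGNALS):
        return "parked"
    return "active"

def classify_bulk(domains, fetch, workers=16):
    """Check many domains concurrently; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = pool.map(lambda d: classify(d, fetch), domains)
        return dict(zip(domains, statuses))
```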

Lead Generation Pipeline — Automated Contact Enrichment System

Overview

A FastAPI-based, serverless pipeline that generates and enriches contact data by combining web scraping, data standardization, and database ingestion for scalable lead generation.

Problem

Building high-quality lead lists manually is time-consuming and error-prone, especially when sourcing data from unstructured platforms like LinkedIn and search engines.

Solution

Developed an automated pipeline that takes ICP and persona inputs, discovers relevant contacts through web scraping, standardizes and enriches the data, removes duplicates, and stores validated leads in a structured database for downstream use.

Key Features

  • ICP and persona-driven lead generation
  • Web scraping and contact extraction from search results
  • Data standardization for names, job titles, and companies
  • Deduplication using unique identifiers (e.g., LinkedIn URLs)
  • Automated email generation for contacts
  • Batch processing with concurrent execution
  • Database ingestion and job status tracking

Key Highlights

  • Built a concurrent scraping pipeline using thread-based parallelism for faster data collection
  • Integrated external services for data standardization and enrichment
  • Designed a multi-stage processing flow (scraping → filtering → normalization → ingestion)
  • Implemented deduplication and validation to improve lead quality and reliability
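
The deduplication step keyed on LinkedIn URLs can be sketched like this (names and the keep-first policy are illustrative). Normalizing the key matters because scraped sources return the same profile with trailing slashes or mixed case:

```python
def dedupe_leads(leads):
    """Keep the first lead seen per normalized LinkedIn URL;
    leads without a URL are passed through untouched."""
    seen = set()
    unique = []
    for lead in leads:
        key = lead.get("linkedin_url", "").strip().lower().rstrip("/")
        if key in seen:
            continue  # duplicate profile from another source
        if key:
            seen.add(key)
        unique.append(lead)
    return unique
```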

ID Resolution Service — Entity Matching & Deduplication Platform

Overview

A service that standardizes and resolves contact and company data by generating deterministic unique identifiers and performing large-scale deduplication across datasets.

Problem

Data from multiple sources often contains duplicates and inconsistencies, making it difficult to identify unique entities and maintain reliable datasets for analytics, enrichment, and CRM systems.

Solution

Built an ID resolution pipeline that standardizes raw data, generates deterministic UUID-based identifiers, and applies configurable matching rules to group duplicates and produce unified “golden” records for downstream systems.

Key Features

  • Contact and company data standardization
  • Deterministic UID generation (UUID-based)
  • Configurable match-rule engine for deduplication
  • Generation of match tables and golden records
  • BigQuery integration for large-scale data processing
  • API endpoints for ingestion, processing, and UID generation
  • Credential management and integration setup for data sources

Key Highlights

  • Designed a deterministic identity system ensuring consistent IDs across multiple data sources
  • Built a rule-based matching pipeline to cluster duplicate records and generate unified datasets
  • Integrated with:
    • Google BigQuery
    • PostgreSQL (metadata and run tracking)
  • Enabled scalable processing by combining API orchestration with data warehouse execution
  • Implemented modular pipeline components for standardization, matching, and ingestion workflows
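
The deterministic-UID idea can be illustrated with name-based UUIDs (RFC 4122 `uuid5`): the same standardized inputs always hash to the same ID, so identifiers agree across sources and re-runs. The namespace, field choice, and key format below are illustrative, not the service's actual scheme:

```python
import uuid

# Private namespace for this illustration only
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "id-resolution.example")

def contact_uid(email, linkedin_url=""):
    """Deterministic contact ID from standardized fields: normalize first,
    then derive a name-based UUID so the mapping is stable and repeatable."""
    key = "|".join([
        email.strip().lower(),
        linkedin_url.strip().lower().rstrip("/"),
    ])
    return str(uuid.uuid5(NAMESPACE, key))
```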

Snowflake Native Application — Customer Data Platform & Identity Resolution

Overview

A Snowflake Native Application that unifies customer and company data across multiple sources, enabling identity resolution, audience segmentation, and analytics directly within the data warehouse.

Problem

Customer data is often fragmented across systems (CRM, marketing, sales), making it difficult to create a unified view for analytics, targeting, and decision-making.

Solution

Built a native application inside Snowflake that connects multiple data sources, standardizes and unifies records, and enables identity resolution and audience creation using SQL-driven workflows—all without moving data outside the warehouse.

Key Features

  • Multi-source customer data integration within Snowflake
  • Deterministic identity resolution for contacts and companies
  • Unified customer profiles across schemas
  • SQL-based audience segmentation and filtering
  • Built-in data standardization and cleansing
  • Role-based access control for secure data operations

Key Highlights

  • Designed a warehouse-native architecture eliminating the need for external data movement
  • Built identity resolution workflows to unify customer and company records across datasets
  • Enabled SQL-driven audience segmentation for analytics and activation use cases
  • Leveraged Snowflake’s RBAC and compute model for secure and scalable execution
  • Developed a native app experience integrating directly into the data platform

Web Scraping & Data Acquisition — Multi-Source Intelligence Pipeline

Overview

A scalable data acquisition system that extracts, processes, and structures data from multiple web sources to support enrichment, lead generation, and analytics workflows.

Problem

High-quality business and contact data is distributed across multiple platforms, often inaccessible in structured form and difficult to aggregate at scale.

Solution

Built a flexible scraping pipeline that collects data from various platforms, standardizes extracted information, and integrates it into downstream enrichment and data processing systems.

Key Features

  • Multi-source data extraction across business and contact platforms
  • Structured parsing and normalization of scraped data
  • Integration with enrichment and validation pipelines
  • Support for both batch and targeted scraping workflows
  • Error handling and fallback mechanisms for unstable sources

Key Highlights

  • Designed scraping workflows for platforms including:
    • LinkedIn (People & Company)
    • LinkedIn Sales Navigator
    • ZoomInfo
    • Lusha
    • Apollo.io
    • Google Maps
    • Bloomberg
    • Glassdoor
    • Dun & Bradstreet (Hoovers)
    • Datanyze
  • Built reusable scraping components for HTML parsing, data extraction, and transformation
  • Integrated scraping outputs into data pipelines for enrichment, validation, and storage
  • Handled challenges like dynamic content, anti-bot mechanisms, and inconsistent data formats
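
One of the reusable parsing components can be sketched with the stdlib `html.parser` (the class-name selector and markup below are illustrative, not any real site's structure): collect the text of every element carrying a target CSS class, tracking nesting depth so text inside child tags is captured too:

```python
from html.parser import HTMLParser

class CardTextParser(HTMLParser):
    """Collect the text content of every element whose class attribute
    contains `target_class`, including text nested in child tags."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._depth = 0          # >0 while inside a matched element
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._depth += 1
            self.results.append("")   # start a new captured block
        elif self._depth:
            self._depth += 1          # child tag inside a matched block

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth and data.strip():
            self.results[-1] += data.strip()
```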