A selection of production-grade projects I’ve worked on across backend systems, data pipelines, AI integrations, and scalable platforms. Each project highlights real-world problem solving, system design, and engineering decisions.

API — Contact & Account Enrichment Platform

Overview

A production REST API backend for contact and account enrichment, reverse lookups, and ICP-based lead generation, combining proprietary data with external enrichment providers.

Problem

Sales and marketing workflows depend on accurate contact and company data, but existing solutions are often fragmented, slow, and inefficient for bulk processing.

Solution

Built a scalable, API-first platform that aggregates internal datasets and third-party enrichment services to deliver real-time and batch enrichment. The system supports credit-based usage, rate limiting, and background processing for large workloads.

Key Features

  • Contact & company enrichment
  • Email & LinkedIn reverse lookup
  • ICP/persona-based lead generation
  • Credit-based billing system with usage tracking
  • Redis-backed rate limiting
  • Background processing for long-running jobs
  • S3-based file handling for large datasets

Key Highlights

  • Designed a metered API system with credit accounting and transaction logging
  • Integrated multiple external enrichment providers with retry and fallback mechanisms
  • Optimized performance using async processing and batch workflows
  • Built production-ready features like rate limiting, auditing, and scalable deployment
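
The rate-limiting approach can be sketched as a sliding window over per-key request timestamps. This is a minimal in-memory illustration, not the actual implementation; the production service backs the window with Redis, and the class and parameter names here are assumptions:

```python
import time
from collections import deque

class SlidingWindowRateLimiter:
    """In-memory sliding-window limiter; the production version stores
    timestamps in Redis so limits hold across API instances."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = {}  # api_key -> deque of request timestamps

    def allow(self, api_key, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits.setdefault(api_key, deque())
        # Evict timestamps that have fallen outside the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject without consuming credits
        hits.append(now)
        return True
```

The same bookkeeping doubles as the hook for credit accounting: each allowed call is the natural place to decrement a credit balance and write a transaction log row.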

Links

https://www.icustomer.ai/platform/docs

Email Finder — Professional Email Generation & Validation Service

Overview

A microservice that generates and validates professional email addresses using name and domain inputs, optimized with caching to reduce external API usage and improve performance.

Problem

Finding accurate professional email addresses programmatically is unreliable and expensive due to repeated validation calls and lack of standardized patterns.

Solution

Built a lightweight service that generates multiple email patterns, validates them using an external provider, and caches results in a database to minimize redundant API calls. The system supports both containerized and serverless deployments.

Key Features

  • Email generation from name + domain inputs
  • Validation using third-party email verification service
  • PostgreSQL-backed caching to reduce repeated lookups
  • Input limiting to control request size and cost
  • Header-based API authentication
  • Deployable via Docker and AWS Lambda (serverless)

Key Highlights

  • Reduced external validation costs by implementing database-backed caching with reuse logic
  • Designed candidate email pattern generation system for higher success rates
  • Built a modular service separating API layer and business logic for maintainability
  • Enabled flexible deployment using containerized and serverless architectures
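
The candidate-generation step can be sketched as follows. The pattern list and function name are illustrative assumptions; the real service orders candidates before handing them to the external verifier, so the most common patterns are validated (and cached) first:

```python
def candidate_emails(first, last, domain, limit=6):
    """Generate common professional email patterns for a name + domain,
    ordered by how frequently each pattern occurs in practice."""
    f, l = first.strip().lower(), last.strip().lower()
    locals_ = [
        f"{f}.{l}",    # jane.doe@
        f"{f}{l}",     # janedoe@
        f"{f[0]}{l}",  # jdoe@
        f"{f}",        # jane@
        f"{f[0]}.{l}", # j.doe@
        f"{f}_{l}",    # jane_doe@
    ]
    seen, out = set(), []
    for local in locals_:
        if local not in seen:  # short names can collapse patterns together
            seen.add(local)
            out.append(f"{local}@{domain}")
    return out[:limit]
```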

CRM & Platform Integrations — Unified Data Sync Service

Overview

Integration service that enables seamless data synchronization between internal databases and multiple external platforms, including CRMs, communication tools, and data warehouses.

Problem

Organizations often rely on multiple platforms (CRMs, email tools, data warehouses), but keeping data consistent across them requires manual effort or fragmented integrations, leading to delays and inconsistencies.

Solution

Built a unified integration layer that reads data from internal databases, transforms it using configurable mappings, and syncs it across multiple external platforms via API-based connectors. The system supports both real-time and batch workflows with flexible deployment options.

Key Features

  • Multi-platform integrations across CRM, communication, and data systems
  • Configurable field-mapping for transforming relational data into platform-specific formats
  • API-driven data sync workflows (push + fetch operations)
  • Support for both real-time and batch synchronization
  • Token-based API authentication
  • Serverless and containerized deployment support

Key Highlights

  • Designed a modular integration architecture to support multiple platforms without changing core logic
  • Built a flexible ETL pipeline using Pandas for mapping and transforming structured data
  • Integrated with multiple external systems, including:
    • Salesforce
    • HubSpot
    • Gmail, Google Drive, Google Calendar
    • Zoho CRM
    • Mailchimp
    • Snowflake
    • Formsort
  • Enabled serverless deployment (AWS Lambda via Mangum) for scalable and cost-efficient execution
  • Implemented structured logging and monitoring using AWS CloudWatch
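
The configurable field-mapping idea can be illustrated with plain dicts (the real pipeline applies equivalent mappings over Pandas DataFrames; the mapping shown is a hypothetical example, not an actual platform config). Each target field pairs a source column with an optional transform, so supporting a new platform means adding a config rather than changing core logic:

```python
# Hypothetical mapping for a HubSpot-style contact payload:
# target_field -> (source_column, optional transform)
HUBSPOT_CONTACT_MAP = {
    "email": ("email_address", str.lower),
    "firstname": ("first_name", None),
    "lifecyclestage": ("stage", lambda s: s.replace(" ", "_").lower()),
}

def transform_record(row, field_map):
    """Apply one field map to one source row, skipping missing/empty columns."""
    out = {}
    for target, (source, fn) in field_map.items():
        value = row.get(source)
        if value is not None:
            out[target] = fn(value) if fn else value
    return out
```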

MCP Server — AI Integration Bridge for Enrichment Tools

Overview

A FastMCP-based server that exposes enrichment and discovery capabilities as tools for AI assistants, enabling seamless integration with local AI workflows and automation pipelines.

Problem

AI assistants and automation tools lack direct access to structured enrichment and discovery systems, making it difficult to integrate real-time data into AI-driven workflows.

Solution

Built a lightweight MCP server that wraps enrichment and discovery APIs into structured, validated tools that can be consumed by AI assistants. The system supports multiple input formats and ensures secure, low-latency communication between local AI environments and external APIs.

Key Features

  • MCP-based tool exposure for AI assistants
  • Contact, email, phone, and LinkedIn enrichment
  • Bulk enrichment and discovery workflows
  • Persona-based discovery and profiling tools
  • Taxonomy utilities for standardized inputs (industry, geography, company size)
  • API-key based authentication (header + environment fallback)

Key Highlights

  • Designed typed input validation using Pydantic to prevent invalid or wasteful API calls
  • Built async tool handlers for efficient request routing and response handling
  • Enabled integration with AI tools including:
    • Claude Desktop
    • Cursor
  • Implemented flexible API-key handling for secure usage across local and integrated environments
  • Standardized inputs using taxonomy layers to improve consistency across enrichment workflows
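
The validate-before-calling pattern behind the tool handlers can be sketched like this. The real server uses Pydantic models and FastMCP tool registration; this stdlib-only stand-in (dataclass validation plus an async handler with a placeholder upstream call) just shows why invalid inputs never reach the paid API:

```python
import asyncio
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class EnrichRequest:
    """Typed tool input; validation runs before any external API call,
    so malformed requests never consume credits."""
    email: str

    def __post_init__(self):
        if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", self.email):
            raise ValueError(f"invalid email: {self.email!r}")

async def enrich_contact(req: EnrichRequest) -> dict:
    # Placeholder for the real upstream enrichment call
    await asyncio.sleep(0)
    return {"email": req.email, "status": "enriched"}
```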

Webhook Middleware — Reliable Event Fan-Out & Delivery Service

Overview

A lightweight Python service that receives webhook events and reliably forwards them to multiple destinations with logging, retry handling, and audit tracking.

Problem

Integrating third-party webhooks directly into internal systems can lead to failures, missed events, and poor visibility into delivery status, especially when multiple downstream services are involved.

Solution

Built a middleware layer that captures incoming webhook payloads, logs them for auditability, and forwards them to multiple configured endpoints. The system ensures reliable delivery through retry mechanisms and detailed logging of each request and response.

Key Features

  • Webhook ingestion and payload parsing
  • Fan-out delivery to multiple destination endpoints
  • SQLite-backed logging for request/response tracking
  • Retry mechanism for failed deliveries
  • Structured logging for debugging and observability
  • Containerized deployment using Docker

Key Highlights

  • Designed a reliable fan-out architecture to decouple webhook providers from internal systems
  • Implemented persistent audit logging for full visibility into delivery status
  • Built retry and failure tracking to improve delivery reliability
  • Kept the system lightweight and portable using minimal dependencies and SQLite
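
The fan-out-with-audit flow can be sketched as below. The function and table names are illustrative; `deliver` abstracts the HTTP POST so the retry and audit logic is visible without network calls:

```python
import sqlite3

def fan_out(payload, endpoints, deliver, db, max_attempts=3):
    """Forward one payload to every endpoint, retrying failures and
    recording every attempt in SQLite for auditability.
    `deliver(url, payload) -> bool` stands in for the HTTP POST."""
    db.execute("CREATE TABLE IF NOT EXISTS deliveries "
               "(endpoint TEXT, attempt INTEGER, ok INTEGER)")
    results = {}
    for url in endpoints:
        ok = False
        for attempt in range(1, max_attempts + 1):
            ok = deliver(url, payload)
            db.execute("INSERT INTO deliveries VALUES (?, ?, ?)",
                       (url, attempt, int(ok)))
            if ok:
                break  # delivered; stop retrying this endpoint
        results[url] = ok
    db.commit()
    return results
```

Because each endpoint is attempted independently, one failing downstream service never blocks delivery to the others.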

CAPI Integration — Serverless Event Tracking & Attribution Pipeline

Overview

A serverless event-tracking pipeline that captures user interactions and booking events, processes and hashes sensitive data, and forwards conversion events to marketing and analytics platforms.

Problem

Tracking user behavior and conversions across multiple platforms (web apps, booking systems) is fragmented and often unreliable, leading to inaccurate attribution and incomplete analytics.

Solution

Built a centralized event pipeline that captures frontend interactions and webhook events, enriches them with tracking data, applies privacy-safe transformations, and forwards them to external analytics and marketing systems while storing raw data for reporting.

Key Features

  • Event tracking from frontend interactions and booking webhooks
  • UTM, click ID, and fingerprint-based enrichment
  • SHA-256 hashing of sensitive user data for privacy compliance
  • Conversion tracking via Meta Conversions API
  • Data storage in BigQuery for analytics and reporting
  • Email notifications for new form submissions
  • Serverless deployment using AWS Lambda

Key Highlights

  • Designed a privacy-first tracking system with secure hashing of PII before external transmission
  • Built a unified event pipeline combining frontend tracking and backend webhook ingestion
  • Integrated with:
    • Meta Conversions API
    • Google BigQuery
    • Zoho Bookings
    • Bubble.io
  • Enabled accurate marketing attribution using UTM parameters and click identifiers (fbclid, gclid)
  • Built as a Lambda-compatible containerized service for scalable deployment
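
The privacy-safe transformation is small but exact: the Conversions API expects customer identifiers like email and phone to be trimmed, lowercased, and SHA-256 hashed (hex-encoded) before transmission. A minimal sketch of that step:

```python
import hashlib

def hash_pii(value):
    """Normalize then SHA-256 a user identifier before sending it to the
    Conversions API: trimmed, lowercased, hex-encoded. Hashing after
    normalization is what lets Meta match the same user across senders."""
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```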

Unified Company Enrichment — LLM-Powered Data Intelligence Service

Overview

A FastAPI-based service that aggregates multi-source company data and uses LLM-driven processing to generate structured, enriched company profiles for analytics and research workflows.

Problem

Company data from multiple sources is often fragmented, inconsistent, and difficult to use directly for analysis or decision-making.

Solution

Built an end-to-end enrichment pipeline that ingests raw data from multiple sources, normalizes and merges records, and applies LLM-based processing to generate structured, high-quality company insights accessible via API endpoints.

Key Features

  • Multi-source data ingestion and consolidation
  • Domain-based normalization and record merging
  • LLM-powered extraction of structured company insights
  • API endpoints for enrichment and research queries
  • Per-domain data persistence and aggregation
  • Configurable enrichment workflows with logging

Key Highlights

  • Designed a hybrid pipeline combining deterministic data merging with LLM-based parsing
  • Built scalable enrichment workflows with async orchestration and batch processing
  • Enabled both automated enrichment and ad-hoc research queries via API endpoints
  • Implemented structured data outputs for downstream analytics and reporting
  • Containerized the service for flexible deployment
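
The deterministic half of the hybrid pipeline (domain-based normalization and merging, which runs before any LLM parsing) can be sketched like this; function names and the first-non-empty-wins merge policy are illustrative assumptions:

```python
from urllib.parse import urlparse

def normalize_domain(raw):
    """Reduce a URL or hostname to a bare lowercase domain key."""
    host = urlparse(raw if "//" in raw else f"//{raw}").netloc or raw
    host = host.lower().split(":")[0]
    return host[4:] if host.startswith("www.") else host

def merge_by_domain(records):
    """Group records from different sources under one domain key; later
    sources only fill fields the earlier ones left empty."""
    merged = {}
    for rec in records:
        key = normalize_domain(rec["domain"])
        target = merged.setdefault(key, {})
        for field, value in rec.items():
            if field != "domain" and value and not target.get(field):
                target[field] = value
    return merged
```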

Data Quality Microservices — Validation & Standardization Platform

Overview

A modular FastAPI-based platform providing a suite of data validation and standardization services for contact, company, and metadata quality across data pipelines.

Problem

Raw business data often contains inconsistencies, invalid entries, and mismatched formats, leading to errors in downstream systems like analytics, enrichment, and CRM workflows.

Solution

Built a collection of independent microservices that validate, clean, and standardize different types of data (e.g., phone, email, company, address, industry, revenue) using rule-based logic and curated taxonomies, exposed via API endpoints for seamless integration into pipelines.

Key Features

  • Validation services for phone, email, website, and address data
  • Standardization of job titles, company names, and metadata
  • Account matching and duplicate detection
  • Taxonomy-based mappings (industry, country, SIC codes)
  • Modular API design with independent service endpoints
  • Support for batch and pipeline-based processing
  • Containerized and serverless deployment support

Key Highlights

  • Built a modular microservices architecture with multiple independent validators under a unified API
  • Designed typed request/response models using Pydantic for strict validation and consistency
  • Integrated curated rule sets and taxonomies to improve data accuracy and standardization
  • Enabled flexible deployment using Docker and AWS Lambda (via Mangum)
  • Structured the system for scalable data pipeline integration and maintainability
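
The rule-based validator pattern can be sketched as below. These regex rules are simplified illustrations; the actual services layer curated taxonomies and external verification providers on top of checks like these, and return the same normalized/valid response shape from every endpoint:

```python
import re

E164 = re.compile(r"^\+[1-9]\d{7,14}$")          # E.164 phone format
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")

def validate_phone(raw):
    """Strip formatting, then check the digits against E.164."""
    digits = re.sub(r"[^\d+]", "", raw.strip())
    return {"input": raw, "normalized": digits, "valid": bool(E164.match(digits))}

def validate_email(raw):
    """Lowercase and trim, then apply a basic syntactic check."""
    cleaned = raw.strip().lower()
    return {"input": raw, "normalized": cleaned, "valid": bool(EMAIL.match(cleaned))}
```

Keeping every validator behind the same request/response shape is what lets pipelines swap or chain services without special-casing each one.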

Domain Validation Tool — Scalable Website Status & Quality Checker

Overview

A Python-based system that validates large volumes of domain names by checking availability, redirects, and content quality using concurrent processing and heuristic analysis.

Problem

Maintaining clean and reliable domain datasets is challenging due to inactive websites, redirects, and parked domains, which can negatively impact downstream processes like enrichment, outreach, and analytics.

Solution

Built a bulk validation pipeline that processes domain lists concurrently, fetches and analyzes website responses, and classifies domains based on availability, redirects, and content signals. The system outputs structured results for further processing and review.

Key Features

  • Bulk domain validation using CSV-based input
  • Concurrent HTTP processing for high throughput
  • Detection of invalid, parked, or suspended domains
  • Redirect tracking with destination capture
  • Heuristic content analysis using keyword matching
  • Structured CSV output with detailed status and metadata

Key Highlights

  • Implemented multi-threaded processing to handle large domain datasets efficiently
  • Designed content-based validation heuristics to detect parked and inactive websites
  • Built robust error handling for timeouts, SSL issues, and connection failures
  • Optimized processing using chunked data handling to manage memory usage
  • Automated domain hygiene workflows for downstream data pipelines
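
The concurrent classification flow can be sketched as below. The keyword signals and names are illustrative; `fetch` stands in for the real HTTP layer (which also tracks redirects and maps timeouts/SSL failures to `None`), so the threading and heuristics are visible without network access:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative content signals for parked/suspended pages
PARKED_SIGNALS = ("domain is for sale", "buy this domain", "parked free")

def classify(domain, fetch):
    """Classify one domain from its fetched HTML body.
    `fetch(domain)` returns the page body, or None if unreachable."""
    try:
        body = fetch(domain)
    except Exception:
        return "error"
    if body is None:
        return "unreachable"
    lowered = body.lower()
    if any(sig in lowered for sig in PARKED_SIGNALS):
        return "parked"
    return "active"

def classify_bulk(domains, fetch, workers=16):
    """Check many domains concurrently; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        statuses = pool.map(lambda d: classify(d, fetch), domains)
        return dict(zip(domains, statuses))
```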

Lead Generation Pipeline — Automated Contact Enrichment System

Overview

A FastAPI-based, serverless pipeline that generates and enriches contact data by combining web scraping, data standardization, and database ingestion for scalable lead generation.

Problem

Building high-quality lead lists manually is time-consuming and error-prone, especially when sourcing data from unstructured platforms like LinkedIn and search engines.

Solution

Developed an automated pipeline that takes ICP and persona inputs, discovers relevant contacts through web scraping, standardizes and enriches the data, removes duplicates, and stores validated leads in a structured database for downstream use.

Key Features

  • ICP and persona-driven lead generation
  • Web scraping and contact extraction from search results
  • Data standardization for names, job titles, and companies
  • Deduplication using unique identifiers (e.g., LinkedIn URLs)
  • Automated email generation for contacts
  • Batch processing with concurrent execution
  • Database ingestion and job status tracking

Key Highlights

  • Built a concurrent scraping pipeline using thread-based parallelism for faster data collection
  • Integrated external services for data standardization and enrichment
  • Designed a multi-stage processing flow (scraping → filtering → normalization → ingestion)
  • Implemented deduplication and validation to improve lead quality and reliability
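
The deduplication step keyed on LinkedIn URLs can be sketched like this (names and the keep-first policy are illustrative). Normalizing the key matters because scraped sources return the same profile with trailing slashes or mixed case:

```python
def dedupe_leads(leads):
    """Keep the first lead seen per normalized LinkedIn URL;
    leads without a URL are passed through untouched."""
    seen = set()
    unique = []
    for lead in leads:
        key = lead.get("linkedin_url", "").strip().lower().rstrip("/")
        if key in seen:
            continue  # duplicate profile from another source
        if key:
            seen.add(key)
        unique.append(lead)
    return unique
```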

ID Resolution Service — Entity Matching & Deduplication Platform

Overview

A service that standardizes and resolves contact and company data by generating deterministic unique identifiers and performing large-scale deduplication across datasets.

Problem

Data from multiple sources often contains duplicates and inconsistencies, making it difficult to identify unique entities and maintain reliable datasets for analytics, enrichment, and CRM systems.

Solution

Built an ID resolution pipeline that standardizes raw data, generates deterministic UUID-based identifiers, and applies configurable matching rules to group duplicates and produce unified “golden” records for downstream systems.

Key Features

  • Contact and company data standardization
  • Deterministic UID generation (UUID-based)
  • Configurable match-rule engine for deduplication
  • Generation of match tables and golden records
  • BigQuery integration for large-scale data processing
  • API endpoints for ingestion, processing, and UID generation
  • Credential management and integration setup for data sources

Key Highlights

  • Designed a deterministic identity system ensuring consistent IDs across multiple data sources
  • Built a rule-based matching pipeline to cluster duplicate records and generate unified datasets
  • Integrated with:
    • Google BigQuery
    • PostgreSQL (metadata and run tracking)
  • Enabled scalable processing by combining API orchestration with data warehouse execution
  • Implemented modular pipeline components for standardization, matching, and ingestion workflows
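
The deterministic-UID idea can be illustrated with name-based UUIDs (RFC 4122 `uuid5`): the same standardized inputs always hash to the same ID, so identifiers agree across sources and re-runs. The namespace, field choice, and key format below are illustrative, not the service's actual scheme:

```python
import uuid

# Private namespace for this illustration only
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "id-resolution.example")

def contact_uid(email, linkedin_url=""):
    """Deterministic contact ID from standardized fields: normalize first,
    then derive a name-based UUID so the mapping is stable and repeatable."""
    key = "|".join([
        email.strip().lower(),
        linkedin_url.strip().lower().rstrip("/"),
    ])
    return str(uuid.uuid5(NAMESPACE, key))
```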

Snowflake Native Application — Customer Data Platform & Identity Resolution

Overview

A Snowflake Native Application that unifies customer and company data across multiple sources, enabling identity resolution, audience segmentation, and analytics directly within the data warehouse.

Problem

Customer data is often fragmented across systems (CRM, marketing, sales), making it difficult to create a unified view for analytics, targeting, and decision-making.

Solution

Built a native application inside Snowflake that connects multiple data sources, standardizes and unifies records, and enables identity resolution and audience creation using SQL-driven workflows—all without moving data outside the warehouse.

Key Features

  • Multi-source customer data integration within Snowflake
  • Deterministic identity resolution for contacts and companies
  • Unified customer profiles across schemas
  • SQL-based audience segmentation and filtering
  • Built-in data standardization and cleansing
  • Role-based access control for secure data operations

Key Highlights

  • Designed a warehouse-native architecture eliminating the need for external data movement
  • Built identity resolution workflows to unify customer and company records across datasets
  • Enabled SQL-driven audience segmentation for analytics and activation use cases
  • Leveraged Snowflake’s RBAC and compute model for secure and scalable execution
  • Developed a native app experience integrating directly into the data platform

Web Scraping & Data Acquisition — Multi-Source Intelligence Pipeline

Overview

A scalable data acquisition system that extracts, processes, and structures data from multiple web sources to support enrichment, lead generation, and analytics workflows.

Problem

High-quality business and contact data is distributed across multiple platforms, often inaccessible in structured form and difficult to aggregate at scale.

Solution

Built a flexible scraping pipeline that collects data from various platforms, standardizes extracted information, and integrates it into downstream enrichment and data processing systems.

Key Features

  • Multi-source data extraction across business and contact platforms
  • Structured parsing and normalization of scraped data
  • Integration with enrichment and validation pipelines
  • Support for both batch and targeted scraping workflows
  • Error handling and fallback mechanisms for unstable sources

Key Highlights

  • Designed scraping workflows for platforms including:
    • LinkedIn (People & Company)
    • LinkedIn Sales Navigator
    • ZoomInfo
    • Lusha
    • Apollo.io
    • Google Maps
    • Bloomberg
    • Glassdoor
    • Dun & Bradstreet (Hoovers)
    • Datanyze
  • Built reusable scraping components for HTML parsing, data extraction, and transformation
  • Integrated scraping outputs into data pipelines for enrichment, validation, and storage
  • Handled challenges like dynamic content, anti-bot mechanisms, and inconsistent data formats
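
One of the reusable parsing components can be sketched with the stdlib `html.parser` (the class-name selector and markup below are illustrative, not any real site's structure): collect the text of every element carrying a target CSS class, tracking nesting depth so text inside child tags is captured too:

```python
from html.parser import HTMLParser

class CardTextParser(HTMLParser):
    """Collect the text content of every element whose class attribute
    contains `target_class`, including text nested in child tags."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._depth = 0          # >0 while inside a matched element
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._depth += 1
            self.results.append("")   # start a new captured block
        elif self._depth:
            self._depth += 1          # child tag inside a matched block

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth and data.strip():
            self.results[-1] += data.strip()
```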