A selection of production-grade projects I’ve worked on across backend systems, data pipelines, AI integrations, and scalable platforms. Each project highlights real-world problem solving, system design, and engineering decisions.
API — Contact & Account Enrichment Platform
Overview
A production REST API backend for contact and account enrichment, reverse lookups, and ICP-based lead generation, combining proprietary data with external enrichment providers.
Problem
Sales and marketing workflows depend on accurate contact and company data, but existing solutions are often fragmented, slow, and inefficient for bulk processing.
Solution
Built a scalable, API-first platform that aggregates internal datasets and third-party enrichment services to deliver real-time and batch enrichment. The system supports credit-based usage, rate limiting, and background processing for large workloads.
Key Features
- Contact & company enrichment
- Email & LinkedIn reverse lookup
- ICP/persona-based lead generation
- Credit-based billing system with usage tracking
- Redis-backed rate limiting
- Background processing for long-running jobs
- S3-based file handling for large datasets
Key Highlights
- Designed a metered API system with credit accounting and transaction logging
- Integrated multiple external enrichment providers with retry and fallback mechanisms
- Optimized performance using async processing and batch workflows
- Built production-ready features like rate limiting, auditing, and scalable deployment
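The metered, rate-limited API described above can be sketched with an in-memory token bucket. This is a simplified stand-in: the production system keeps counters in Redis so limits hold across processes, and the class and parameter names here are illustrative.

```python
import time

class TokenBucket:
    """Fixed-capacity token bucket; refills at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self, cost: int = 1) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=3, rate=0.0)  # no refill: 3 requests total
results = [bucket.allow() for _ in range(5)]
print(results)  # first three allowed, the rest rejected
```

The same `cost` parameter doubles as credit accounting: an expensive bulk endpoint can charge more tokens per call than a single lookup.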
Email Finder — Professional Email Generation & Validation Service
Overview
A microservice that generates and validates professional email addresses using name and domain inputs, optimized with caching to reduce external API usage and improve performance.
Problem
Finding accurate professional email addresses programmatically is unreliable and expensive due to repeated validation calls and lack of standardized patterns.
Solution
Built a lightweight service that generates multiple email patterns, validates them using an external provider, and caches results in a database to minimize redundant API calls. The system supports both containerized and serverless deployments.
Key Features
- Email generation from name + domain inputs
- Validation using third-party email verification service
- PostgreSQL-backed caching to reduce repeated lookups
- Input limiting to control request size and cost
- Header-based API authentication
- Deployable via Docker and AWS Lambda (serverless)
Key Highlights
- Reduced external validation costs by implementing database-backed caching with reuse logic
- Designed candidate email pattern generation system for higher success rates
- Built a modular service separating API layer and business logic for maintainability
- Enabled flexible deployment using containerized and serverless architectures
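The candidate-pattern generation step can be sketched as follows. The exact patterns the service tries are illustrative here; each candidate would then be checked cache-first before hitting the external validator.

```python
def candidate_emails(first: str, last: str, domain: str) -> list[str]:
    """Generate common professional email patterns from a name and a domain."""
    f, l = first.strip().lower(), last.strip().lower()
    patterns = [
        f"{f}.{l}",     # jane.doe
        f"{f}{l}",      # janedoe
        f"{f[0]}{l}",   # jdoe
        f"{f}",         # jane
        f"{f}_{l}",     # jane_doe
        f"{f[0]}.{l}",  # j.doe
    ]
    # Deduplicate while preserving order (patterns can collide for short names).
    seen, out = set(), []
    for p in patterns:
        addr = f"{p}@{domain}"
        if addr not in seen:
            seen.add(addr)
            out.append(addr)
    return out

print(candidate_emails("Jane", "Doe", "example.com"))
```

Ordering candidates by how common each pattern is in practice raises the chance that the first validation call succeeds, which directly cuts provider costs.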
CRM & Platform Integrations — Unified Data Sync Service
Overview
Integration service that enables seamless data synchronization between internal databases and multiple external platforms, including CRMs, communication tools, and data warehouses.
Problem
Organizations often rely on multiple platforms (CRMs, email tools, data warehouses), but keeping data consistent across them requires manual effort or fragmented integrations, leading to delays and inconsistencies.
Solution
Built a unified integration layer that reads data from internal databases, transforms it using configurable mappings, and syncs it across multiple external platforms via API-based connectors. The system supports both real-time and batch workflows with flexible deployment options.
Key Features
- Multi-platform integrations across CRM, communication, and data systems
- Configurable field-mapping for transforming relational data into platform-specific formats
- API-driven data sync workflows (push + fetch operations)
- Support for both real-time and batch synchronization
- Token-based API authentication
- Serverless and containerized deployment support
Key Highlights
- Designed a modular integration architecture to support multiple platforms without changing core logic
- Built a flexible ETL pipeline using Pandas for mapping and transforming structured data
- Integrated with multiple external systems, including:
- Salesforce
- HubSpot
- Gmail, Google Drive, Google Calendar
- Zoho CRM
- Mailchimp
- Snowflake
- Formsort
- Enabled serverless deployment (AWS Lambda via Mangum) for scalable and cost-efficient execution
- Implemented structured logging and monitoring using AWS CloudWatch
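The configurable field-mapping step can be illustrated with plain dictionaries. The production pipeline does this over Pandas DataFrames; the field names and the Salesforce-style target schema below are illustrative.

```python
def apply_mapping(rows, mapping, transforms=None):
    """Rename internal fields to platform-specific ones, applying optional transforms."""
    transforms = transforms or {}
    out = []
    for row in rows:
        mapped = {}
        for src, dst in mapping.items():
            if src in row:
                value = row[src]
                if src in transforms:
                    value = transforms[src](value)  # e.g. normalize before pushing
                mapped[dst] = value
        out.append(mapped)
    return out

# Hypothetical mapping: internal schema -> CRM-specific field names.
rows = [{"full_name": "Jane Doe", "work_email": "Jane@Example.com"}]
mapping = {"full_name": "Name", "work_email": "Email"}
print(apply_mapping(rows, mapping, transforms={"work_email": str.lower}))
```

Because the mapping is data rather than code, adding a new platform means adding a new mapping config, not changing the sync engine.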
MCP Server — AI Integration Bridge for Enrichment Tools
Overview
A FastMCP-based server that exposes enrichment and discovery capabilities as tools for AI assistants, enabling seamless integration with local AI workflows and automation pipelines.
Problem
AI assistants and automation tools lack direct access to structured enrichment and discovery systems, making it difficult to integrate real-time data into AI-driven workflows.
Solution
Built a lightweight MCP server that wraps enrichment and discovery APIs into structured, validated tools that can be consumed by AI assistants. The system supports multiple input formats and ensures secure, low-latency communication between local AI environments and external APIs.
Key Features
- MCP-based tool exposure for AI assistants
- Contact, email, phone, and LinkedIn enrichment
- Bulk enrichment and discovery workflows
- Persona-based discovery and profiling tools
- Taxonomy utilities for standardized inputs (industry, geography, company size)
- API-key based authentication (header + environment fallback)
Key Highlights
- Designed typed input validation using Pydantic to prevent invalid or wasteful API calls
- Built async tool handlers for efficient request routing and response handling
- Enabled integration with AI tools such as:
- Claude Desktop
- Cursor
- Implemented flexible API-key handling for secure usage across local and integrated environments
- Standardized inputs using taxonomy layers to improve consistency across enrichment workflows
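The "validate before spending credits" idea can be sketched with a stdlib dataclass. The service does this declaratively with Pydantic models; the fields and the taxonomy values below are illustrative.

```python
from dataclasses import dataclass

# Illustrative company-size taxonomy; the real taxonomy layer is larger.
ALLOWED_SIZES = {"1-10", "11-50", "51-200", "201-500", "501+"}

@dataclass
class EnrichRequest:
    domain: str
    company_size: str

    def __post_init__(self):
        # Reject bad input locally, before any paid API call is made.
        if "." not in self.domain:
            raise ValueError(f"invalid domain: {self.domain!r}")
        if self.company_size not in ALLOWED_SIZES:
            raise ValueError(f"unknown company size: {self.company_size!r}")

ok = EnrichRequest(domain="example.com", company_size="11-50")
print(ok)
try:
    EnrichRequest(domain="not-a-domain", company_size="11-50")
except ValueError as e:
    print("rejected:", e)
```

Typed validation at the tool boundary matters for MCP in particular, since AI assistants can emit malformed arguments at high volume.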
Webhook Middleware — Reliable Event Fan-Out & Delivery Service
Overview
A lightweight Python service that receives webhook events and reliably forwards them to multiple destinations with logging, retry handling, and audit tracking.
Problem
Integrating third-party webhooks directly into internal systems can lead to failures, missed events, and poor visibility into delivery status, especially when multiple downstream services are involved.
Solution
Built a middleware layer that captures incoming webhook payloads, logs them for auditability, and forwards them to multiple configured endpoints. The system ensures reliable delivery through retry mechanisms and detailed logging of each request and response.
Key Features
- Webhook ingestion and payload parsing
- Fan-out delivery to multiple destination endpoints
- SQLite-backed logging for request/response tracking
- Retry mechanism for failed deliveries
- Structured logging for debugging and observability
- Containerized deployment using Docker
Key Highlights
- Designed a reliable fan-out architecture to decouple webhook providers from internal systems
- Implemented persistent audit logging for full visibility into delivery status
- Built retry and failure tracking to improve delivery reliability
- Kept the system lightweight and portable using minimal dependencies and SQLite
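The fan-out-with-audit-log pattern can be sketched with SQLite, which the service itself uses. The `deliver` callables below stand in for the HTTP POSTs the real service performs; table and column names are illustrative.

```python
import sqlite3

def fan_out(payload: str, endpoints: dict, conn, max_retries: int = 2):
    """Forward one payload to every endpoint; log each attempt for auditing.

    `endpoints` maps name -> callable returning True on successful delivery
    (a stand-in for an HTTP POST).
    """
    for name, deliver in endpoints.items():
        for attempt in range(1, max_retries + 1):
            ok = deliver(payload)
            conn.execute(
                "INSERT INTO delivery_log (endpoint, attempt, success) VALUES (?, ?, ?)",
                (name, attempt, ok),
            )
            if ok:
                break  # stop retrying this endpoint once delivered

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE delivery_log (endpoint TEXT, attempt INTEGER, success BOOLEAN)")

calls = {"crm": 0}
def flaky_crm(payload):  # fails once, then succeeds
    calls["crm"] += 1
    return calls["crm"] > 1

fan_out('{"event": "signup"}', {"crm": flaky_crm, "analytics": lambda p: True}, conn)
rows = conn.execute("SELECT endpoint, attempt, success FROM delivery_log").fetchall()
print(rows)
```

Logging every attempt, not just the final outcome, is what gives the audit trail its value: a slow or flaky destination shows up as a pattern of retries.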
CAPI Integration — Serverless Event Tracking & Attribution Pipeline
Overview
A serverless event-tracking pipeline that captures user interactions and booking events, processes and hashes sensitive data, and forwards conversion events to marketing and analytics platforms.
Problem
Tracking user behavior and conversions across multiple platforms (web apps, booking systems) is fragmented and often unreliable, leading to inaccurate attribution and incomplete analytics.
Solution
Built a centralized event pipeline that captures frontend interactions and webhook events, enriches them with tracking data, applies privacy-safe transformations, and forwards them to external analytics and marketing systems while storing raw data for reporting.
Key Features
- Event tracking from frontend interactions and booking webhooks
- UTM, click ID, and fingerprint-based enrichment
- SHA-256 hashing of sensitive user data for privacy compliance
- Conversion tracking via Meta Conversions API
- Data storage in BigQuery for analytics and reporting
- Email notifications for new form submissions
- Serverless deployment using AWS Lambda
Key Highlights
- Designed a privacy-first tracking system with secure hashing of PII before external transmission
- Built a unified event pipeline combining frontend tracking and backend webhook ingestion
- Integrated with:
- Meta Conversions API
- Google BigQuery
- Zoho Bookings
- Bubble.io
- Enabled accurate marketing attribution using UTM parameters and click identifiers (fbclid, gclid)
- Built as a Lambda-compatible containerized service for scalable deployment
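The privacy-safe transformation step can be sketched in a few lines. Meta's Conversions API expects user fields such as email to be normalized (trimmed, lowercased) before SHA-256 hashing, so that both sides produce identical digests for matching.

```python
import hashlib

def hash_pii(value: str) -> str:
    """Normalize then SHA-256 a PII field before sending it to external platforms."""
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

a = hash_pii("  Jane.Doe@Example.com ")
b = hash_pii("jane.doe@example.com")
print(a == b)  # normalization makes the digests match
```

Only the digest ever leaves the pipeline; the raw value stays in BigQuery for internal reporting.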
Unified Company Enrichment — LLM-Powered Data Intelligence Service
Overview
A FastAPI-based service that aggregates multi-source company data and uses LLM-driven processing to generate structured, enriched company profiles for analytics and research workflows.
Problem
Company data from multiple sources is often fragmented, inconsistent, and difficult to use directly for analysis or decision-making.
Solution
Built an end-to-end enrichment pipeline that ingests raw data from multiple sources, normalizes and merges records, and applies LLM-based processing to generate structured, high-quality company insights accessible via API endpoints.
Key Features
- Multi-source data ingestion and consolidation
- Domain-based normalization and record merging
- LLM-powered extraction of structured company insights
- API endpoints for enrichment and research queries
- Per-domain data persistence and aggregation
- Configurable enrichment workflows with logging
Key Highlights
- Designed a hybrid pipeline combining deterministic data merging with LLM-based parsing
- Built scalable enrichment workflows with async orchestration and batch processing
- Enabled both automated enrichment and ad-hoc research queries via API endpoints
- Implemented structured data outputs for downstream analytics and reporting
- Containerized the service for flexible deployment
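The deterministic half of the hybrid pipeline, domain-based normalization and merging, can be sketched as below. The merge rule shown (first non-empty value wins) is illustrative; the LLM step then fills in fields this deterministic pass cannot.

```python
from urllib.parse import urlparse

def normalize_domain(url: str) -> str:
    """Reduce a URL or hostname to a bare lowercase domain key."""
    host = urlparse(url if "//" in url else f"//{url}").hostname or url
    host = host.lower()
    return host[4:] if host.startswith("www.") else host

def merge_by_domain(records):
    """Group records by normalized domain, keeping the first non-empty value per field."""
    merged = {}
    for rec in records:
        key = normalize_domain(rec["domain"])
        target = merged.setdefault(key, {"domain": key})
        for field, value in rec.items():
            if field != "domain" and value and not target.get(field):
                target[field] = value
    return merged

records = [
    {"domain": "https://www.Example.com/about", "name": "Example Inc", "employees": None},
    {"domain": "example.com", "name": "", "employees": 120},
]
print(merge_by_domain(records))
```

Keying on the domain means records from different sources collapse into one profile even when names, casing, or URL forms disagree.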
Data Quality Microservices — Validation & Standardization Platform
Overview
A modular FastAPI-based platform providing a suite of data validation and standardization services for contact, company, and metadata quality across data pipelines.
Problem
Raw business data often contains inconsistencies, invalid entries, and mismatched formats, leading to errors in downstream systems like analytics, enrichment, and CRM workflows.
Solution
Built a collection of independent microservices that validate, clean, and standardize different types of data (e.g., phone, email, company, address, industry, revenue) using rule-based logic and curated taxonomies, exposed via API endpoints for seamless integration into pipelines.
Key Features
- Validation services for phone, email, website, and address data
- Standardization of job titles, company names, and metadata
- Account matching and duplicate detection
- Taxonomy-based mappings (industry, country, SIC codes)
- Modular API design with independent service endpoints
- Support for batch and pipeline-based processing
- Containerized and serverless deployment support
Key Highlights
- Built a modular microservices architecture with multiple independent validators under a unified API
- Designed typed request/response models using Pydantic for strict validation and consistency
- Integrated curated rule sets and taxonomies to improve data accuracy and standardization
- Enabled flexible deployment using Docker and AWS Lambda (via Mangum)
- Structured the system for scalable data pipeline integration and maintainability
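Each validator follows the same shape: take a raw value, apply rules, return a structured result. A minimal sketch of two such services is below; the regex and the phone-length rule are illustrative simplifications, not the curated rule sets the platform uses.

```python
import re

EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")

def validate_email(value: str) -> dict:
    """Rule-based email check returning a structured result."""
    value = value.strip().lower()
    return {"value": value, "valid": bool(EMAIL_RE.match(value))}

def validate_phone(value: str) -> dict:
    """Keep digits only and require a plausible length (illustrative, not full E.164)."""
    digits = re.sub(r"\D", "", value)
    return {"value": digits, "valid": 7 <= len(digits) <= 15}

print(validate_email(" Jane.Doe@Example.com "))
print(validate_phone("+1 (415) 555-0100"))
```

Returning the cleaned value alongside the verdict lets downstream pipelines consume the standardized form directly instead of re-cleaning it.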
Domain Validation Tool — Scalable Website Status & Quality Checker
Overview
A Python-based system that validates large volumes of domain names by checking availability, redirects, and content quality using concurrent processing and heuristic analysis.
Problem
Maintaining clean and reliable domain datasets is challenging due to inactive websites, redirects, and parked domains, which can negatively impact downstream processes like enrichment, outreach, and analytics.
Solution
Built a bulk validation pipeline that processes domain lists concurrently, fetches and analyzes website responses, and classifies domains based on availability, redirects, and content signals. The system outputs structured results for further processing and review.
Key Features
- Bulk domain validation using CSV-based input
- Concurrent HTTP processing for high throughput
- Detection of invalid, parked, or suspended domains
- Redirect tracking with destination capture
- Heuristic content analysis using keyword matching
- Structured CSV output with detailed status and metadata
Key Highlights
- Implemented multi-threaded processing to handle large domain datasets efficiently
- Designed content-based validation heuristics to detect parked and inactive websites
- Built robust error handling for timeouts, SSL issues, and connection failures
- Optimized processing using chunked data handling to manage memory usage
- Automated domain hygiene workflows for downstream data pipelines
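The concurrent classification stage can be sketched as below. The HTTP fetch is omitted here (page text is passed in directly), and the parked-domain keyword list is illustrative rather than the tool's actual heuristics.

```python
from concurrent.futures import ThreadPoolExecutor

PARKED_SIGNALS = ("domain is for sale", "parked free", "buy this domain")  # illustrative

def classify_content(domain: str, html_text: str) -> tuple:
    """Heuristic status from already-fetched page text (fetching omitted)."""
    text = html_text.lower()
    if any(signal in text for signal in PARKED_SIGNALS):
        return (domain, "parked")
    if len(text.strip()) < 50:
        return (domain, "thin_content")
    return (domain, "active")

pages = {
    "a.example": "This domain is for sale! Contact the broker today.",
    "b.example": "ok",
    "c.example": "Welcome to our company site. " * 10,
}

# Threads suit this workload because it is I/O-bound when real fetches are involved.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(lambda item: classify_content(*item), pages.items()))
print(results)
```

In the real pipeline the same executor pattern wraps the HTTP fetch, with per-request timeouts feeding the error-handling paths mentioned above.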
Lead Generation Pipeline — Automated Contact Enrichment System
Overview
A FastAPI-based, serverless pipeline that generates and enriches contact data by combining web scraping, data standardization, and database ingestion for scalable lead generation.
Problem
Building high-quality lead lists manually is time-consuming and error-prone, especially when sourcing data from unstructured platforms like LinkedIn and search engines.
Solution
Developed an automated pipeline that takes ICP and persona inputs, discovers relevant contacts through web scraping, standardizes and enriches the data, removes duplicates, and stores validated leads in a structured database for downstream use.
Key Features
- ICP and persona-driven lead generation
- Web scraping and contact extraction from search results
- Data standardization for names, job titles, and companies
- Deduplication using unique identifiers (e.g., LinkedIn URLs)
- Automated email generation for contacts
- Batch processing with concurrent execution
- Database ingestion and job status tracking
Key Highlights
- Built a concurrent scraping pipeline using thread-based parallelism for faster data collection
- Integrated external services for data standardization and enrichment
- Designed a multi-stage processing flow (scraping → filtering → normalization → ingestion)
- Implemented deduplication and validation to improve lead quality and reliability
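The deduplication step can be sketched as below, using a normalized LinkedIn URL as the unique key; the normalization shown (lowercase, strip trailing slash) is a simplified version of the real rules.

```python
def dedupe_leads(leads):
    """Drop duplicate contacts, keyed on a normalized LinkedIn URL."""
    seen, unique = set(), []
    for lead in leads:
        key = lead["linkedin_url"].strip().lower().rstrip("/")
        if key not in seen:
            seen.add(key)
            unique.append(lead)  # first occurrence wins
    return unique

leads = [
    {"name": "Jane Doe", "linkedin_url": "https://linkedin.com/in/janedoe/"},
    {"name": "J. Doe", "linkedin_url": "https://linkedin.com/in/janedoe"},
    {"name": "John Roe", "linkedin_url": "https://linkedin.com/in/johnroe"},
]
print(dedupe_leads(leads))
```

Normalizing before comparing is the important part: the two Jane Doe rows differ only by a trailing slash, which would otherwise slip past an exact-match check.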
ID Resolution Service — Entity Matching & Deduplication Platform
Overview
A service that standardizes and resolves contact and company data by generating deterministic unique identifiers and performing large-scale deduplication across datasets.
Problem
Data from multiple sources often contains duplicates and inconsistencies, making it difficult to identify unique entities and maintain reliable datasets for analytics, enrichment, and CRM systems.
Solution
Built an ID resolution pipeline that standardizes raw data, generates deterministic UUID-based identifiers, and applies configurable matching rules to group duplicates and produce unified “golden” records for downstream systems.
Key Features
- Contact and company data standardization
- Deterministic UID generation (UUID-based)
- Configurable match-rule engine for deduplication
- Generation of match tables and golden records
- BigQuery integration for large-scale data processing
- API endpoints for ingestion, processing, and UID generation
- Credential management and integration setup for data sources
Key Highlights
- Designed a deterministic identity system ensuring consistent IDs across multiple data sources
- Built a rule-based matching pipeline to cluster duplicate records and generate unified datasets
- Integrated with:
- Google BigQuery
- PostgreSQL (metadata and run tracking)
- Enabled scalable processing by combining API orchestration with data warehouse execution
- Implemented modular pipeline components for standardization, matching, and ingestion workflows
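Deterministic UID generation can be sketched with namespace-based UUIDv5: the same standardized input always yields the same ID, with no lookup table required. The namespace value and key fields below are illustrative.

```python
import uuid

# Hypothetical fixed namespace; any stable UUID works, as long as it never changes.
NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")

def contact_uid(email: str, company_domain: str) -> str:
    """Deterministic UUIDv5 from standardized fields: same input, same ID, every run."""
    key = f"{email.strip().lower()}|{company_domain.strip().lower()}"
    return str(uuid.uuid5(NAMESPACE, key))

a = contact_uid("Jane.Doe@Example.com ", "example.com")
b = contact_uid("jane.doe@example.com", "EXAMPLE.COM")
print(a == b)  # standardization makes the ID source-independent
```

This is why standardization must run before ID generation: two sources that spell the same contact differently still converge on one identifier, which is what makes the downstream match tables and golden records possible.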
Snowflake Native Application — Customer Data Platform & Identity Resolution
Overview
A Snowflake Native Application that unifies customer and company data across multiple sources, enabling identity resolution, audience segmentation, and analytics directly within the data warehouse.
Problem
Customer data is often fragmented across systems (CRM, marketing, sales), making it difficult to create a unified view for analytics, targeting, and decision-making.
Solution
Built a native application inside Snowflake that connects multiple data sources, standardizes and unifies records, and enables identity resolution and audience creation using SQL-driven workflows, all without moving data outside the warehouse.
Key Features
- Multi-source customer data integration within Snowflake
- Deterministic identity resolution for contacts and companies
- Unified customer profiles across schemas
- SQL-based audience segmentation and filtering
- Built-in data standardization and cleansing
- Role-based access control for secure data operations
Key Highlights
- Designed a warehouse-native architecture eliminating the need for external data movement
- Built identity resolution workflows to unify customer and company records across datasets
- Enabled SQL-driven audience segmentation for analytics and activation use cases
- Leveraged Snowflake’s RBAC and compute model for secure and scalable execution
- Developed a native app experience integrating directly into the data platform
Web Scraping & Data Acquisition — Multi-Source Intelligence Pipeline
Overview
A scalable data acquisition system that extracts, processes, and structures data from multiple web sources to support enrichment, lead generation, and analytics workflows.
Problem
High-quality business and contact data is distributed across multiple platforms, often inaccessible in structured form and difficult to aggregate at scale.
Solution
Built a flexible scraping pipeline that collects data from various platforms, standardizes extracted information, and integrates it into downstream enrichment and data processing systems.
Key Features
- Multi-source data extraction across business and contact platforms
- Structured parsing and normalization of scraped data
- Integration with enrichment and validation pipelines
- Support for both batch and targeted scraping workflows
- Error handling and fallback mechanisms for unstable sources
Key Highlights
- Designed scraping workflows for platforms including:
- LinkedIn (People & Company)
- LinkedIn Sales Navigator
- ZoomInfo
- Lusha
- Apollo.io
- Google Maps
- Bloomberg
- Glassdoor
- Dun & Bradstreet (Hoovers)
- Datanyze
- Built reusable scraping components for HTML parsing, data extraction, and transformation
- Integrated scraping outputs into data pipelines for enrichment, validation, and storage
- Handled challenges like dynamic content, anti-bot mechanisms, and inconsistent data formats
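A reusable extraction component can be sketched with the stdlib HTML parser: given a target CSS class, it collects the text inside matching elements. This is a minimal stand-in for the real parsing components, and the class name in the sample markup is illustrative.

```python
from html.parser import HTMLParser

class TextByClassExtractor(HTMLParser):
    """Minimal reusable component: pull text out of elements with a given class."""

    def __init__(self, target_class: str):
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # >0 while inside a matching element (or its children)
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        if self.depth or self.target_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.results.append(data.strip())

html = '<div><span class="job-title">Head of Data</span><span>ignored</span></div>'
parser = TextByClassExtractor("job-title")
parser.feed(html)
print(parser.results)
```

In practice the scraped markup varies per platform, so each source gets its own selector configuration on top of shared components like this one.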