Snehangsu De
Data Engineer & Architect

I architect and build the data backbones that power intelligent business decisions. With over six years of experience, I specialize in transforming chaotic data into clean, governed, actionable assets. Most recently, at The Organic Agency, I automated ETL processes, delivering 34% efficiency gains while cutting costs by 23%. My focus is high-performance pipelines, scalability, and automation, leveraging modern cloud ecosystems for reliability and speed.

6+

Years Experience

97.8%

Pipeline Uptime

Work Experience

  1. Senior Data Engineer, Black Piano

    The Organic Agency

Project: Organic Agency Core

    • Spearheaded backend architecture for high-volume data scraping and LLM-driven applications.
    • Built an auto-scaling Node.js/TypeScript backend using Fastify on Google Cloud Run, achieving 99.9% uptime under high-concurrency workloads.
    • Designed fault-tolerant ETL pipelines leveraging Cloud Tasks to orchestrate integrations with DataForSEO and Oxylabs.
    • Implemented hybrid architecture using PostgreSQL with MikroORM for transactions and BigQuery for large-scale analytics.
    • Developed a strongly-typed GraphQL API (Mercurius, TypeGraphQL) featuring a multi-LLM abstraction layer (OpenAI, Anthropic, Gemini) for semantic clustering.
    • Owned the entire DevOps lifecycle: Docker containerization, Cloud Build CI/CD pipelines, Redis caching, and rigorous validation (class-validator, JSON Schema).
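    The multi-LLM abstraction layer above is TypeScript in production; as a rough illustration of the pattern, here is a minimal Python sketch with stubbed providers (the class and method names are hypothetical, and the vendor calls are replaced by placeholder strings):

    ```python
    from abc import ABC, abstractmethod

    class LLMProvider(ABC):
        """Common interface so callers never depend on a vendor SDK directly."""

        @abstractmethod
        def complete(self, prompt: str) -> str: ...

    class OpenAIProvider(LLMProvider):
        def complete(self, prompt: str) -> str:
            # A real implementation would call the vendor SDK here; stubbed for illustration.
            return f"[openai] {prompt}"

    class AnthropicProvider(LLMProvider):
        def complete(self, prompt: str) -> str:
            return f"[anthropic] {prompt}"

    class LLMRouter:
        """Dispatches a prompt to a named provider, falling back to a default."""

        def __init__(self, providers: dict[str, LLMProvider], default: str):
            self.providers = providers
            self.default = default

        def complete(self, prompt: str, provider: str | None = None) -> str:
            name = provider if provider in self.providers else self.default
            return self.providers[name].complete(prompt)

    router = LLMRouter(
        {"openai": OpenAIProvider(), "anthropic": AnthropicProvider()},
        default="openai",
    )
    print(router.complete("cluster these keywords", provider="anthropic"))
    ```

    Because every provider satisfies the same interface, swapping or A/B-testing models is a one-line config change rather than a code change.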

    Project: Corigan

    • Owned the SEO data platform (Python, PySpark, DBT, Mage, BigQuery), processing 20M+ daily records (GSC, GA4, sitemaps) at a 99.8% SLA.
    • Reduced infrastructure costs by 62% ($250k/year) through strategic partitioning and Terraform-managed infrastructure optimizations.
    • Engineered the Prisma competitor-tracking tool and scrapers (Python, Selenium, Node.js) with LLM-based change detection, cutting manual audit time by 45%.
    • Created PySpark ML models for keyword clustering (50M+ pages, +15% CTR) and implemented real-time monitoring via Pub/Sub and Dataflow (40% faster incident resolution).
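    The 62% cost reduction comes largely from partition pruning: a date-partitioned BigQuery table lets queries scan only the partitions they touch instead of the whole table. The arithmetic below is purely illustrative (the record size and per-TiB rate are assumptions, not the project's real figures):

    ```python
    def scan_cost_usd(bytes_scanned: float, usd_per_tib: float = 6.25) -> float:
        """On-demand query cost at an assumed per-TiB rate (illustrative only)."""
        return bytes_scanned / 2**40 * usd_per_tib

    daily_bytes = 20_000_000 * 1_000           # ~20M records/day at an assumed ~1 KB each
    retained_days, queried_days = 365, 7       # hypothetical retention and query window

    full_scan = scan_cost_usd(daily_bytes * retained_days)  # unpartitioned: scans everything
    pruned    = scan_cost_usd(daily_bytes * queried_days)   # date-partitioned: 7 partitions
    print(f"full: ${full_scan:.2f}  pruned: ${pruned:.2f}")
    ```

    The pruned query scans only queried_days / retained_days of the data, and on-demand cost scales linearly with bytes scanned, so the saving compounds across every dashboard refresh.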
  2. Data Engineer, Cloudcraftz Solutions

    Shikshalokam

    • Maintained robust data pipeline with 97.8% uptime, ingesting streaming/transactional data from 8 diverse sources using PySpark, Druid, Azure Storage, and Python.
    • Orchestrated migration from Apache Druid to Apache Cassandra with strategic data modeling, delivering a 17% performance increase.
    • Utilized Apache Flink and Kafka for real-time event pre-processing, enabling dynamic Cassandra-based reporting for BI tools like Superset.
    • Developed cloud-agnostic storage architectures and automated logging; proposed MongoDB schema improvements that strengthened user-data validation.

    HSBC

    • Developed a comprehensive internal dashboard using ETL and PySpark on a Dataproc cluster, storing data in BigQuery and visualizing via Looker, reducing manual workload by 34%.
    • Enhanced BI by organizing exact geo-location data from user addresses, focusing on credit card-only users.
    • Executed complex SQL scripts to supply pertinent data to stakeholders for ad-hoc inquiries.

    Udaan

    • Scraped critical government reference data using Selenium (S3 Data Lake -> Redshift ETL).
    • Designed database schema and implemented Kimball data modeling (Star Schema) for vendor and commodity procurement.
    • Analyzed market trends using Pandas and NumPy, providing daily/weekly price-prediction reports to support just-in-time (JIT) inventory models.
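    A Kimball star schema like the one above centers a fact table on foreign keys into small dimension tables. As a self-contained sketch (hypothetical column names, using in-memory SQLite rather than the original warehouse):

    ```python
    import sqlite3

    # One fact table keyed to three dimensions: the classic star shape.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dim_vendor    (vendor_id INTEGER PRIMARY KEY, vendor_name TEXT);
    CREATE TABLE dim_commodity (commodity_id INTEGER PRIMARY KEY, commodity_name TEXT);
    CREATE TABLE dim_date      (date_id INTEGER PRIMARY KEY, full_date TEXT);
    CREATE TABLE fact_procurement (
        vendor_id    INTEGER REFERENCES dim_vendor(vendor_id),
        commodity_id INTEGER REFERENCES dim_commodity(commodity_id),
        date_id      INTEGER REFERENCES dim_date(date_id),
        quantity     REAL,
        unit_price   REAL
    );
    """)
    conn.execute("INSERT INTO dim_vendor VALUES (1, 'Acme Grains')")
    conn.execute("INSERT INTO dim_commodity VALUES (1, 'Wheat')")
    conn.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01')")
    conn.execute("INSERT INTO fact_procurement VALUES (1, 1, 20240101, 500.0, 21.5)")

    # The typical star-schema query: join the fact to its dimensions and aggregate.
    row = conn.execute("""
        SELECT v.vendor_name, c.commodity_name, SUM(f.quantity * f.unit_price)
        FROM fact_procurement f
        JOIN dim_vendor v    USING (vendor_id)
        JOIN dim_commodity c USING (commodity_id)
        GROUP BY 1, 2
    """).fetchone()
    print(row)  # ('Acme Grains', 'Wheat', 10750.0)
    ```

    Keeping descriptive attributes in the dimensions means BI queries stay simple joins, and the narrow fact table scans fast even at procurement-ledger volumes.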
  3. Data Services, Concentrix (Intuit)

    • Resolved complex data issues by restoring damaged corporate files for QuickBooks Enterprise/Premier using MySQL (89.3% customer-satisfaction score).
    • Utilized Python, Docker, and Power BI to extract, load, and visualize small-business data.
    • Optimized data organization using Star Schema and ER diagrams to provide actionable insights.
  4. Customer Operations, Amazon

    • Managed customer relations and abuse prevention, demonstrating excellent communication and problem-solving skills in a high-pressure environment.
    • Served as a Subject Matter Expert (SME) for new hires.

Technical Expertise

Core Stack

Python SQL Node.js / TS Apache Spark / PySpark BigQuery (GCP) PostgreSQL GraphQL Kafka / PubSub Apache Flink / Dataflow DBT

Tools & Ecosystem

GCP Docker & K8s Terraform Cloud Run Cloud Build AWS Lambda Mage / Prefect Cassandra Redis MongoDB Fastify MikroORM Git / GitHub Looker Power BI Selenium Pandas / NumPy

Architecture & Concepts

Star Schema (Kimball)
Data Warehousing
Distributed Systems
ER Modeling
LLM Integration
Event-Driven Arch
SOLID Principles

Core Projects

Logical & Visualization Projects

Writings & Notebooks

Get In Touch