DATA ENGINEER. Architect.
Hi, I’m Snehangsu De. I architect enterprise-grade data systems, balancing performance, reliability, and operational efficiency. I design and deliver cloud-native pipelines and automation platforms that cut through complexity and help organisations operate, scale, and make decisions with confidence.
The Architect
5+ years of experience building production data stacks. I engineer ingestion, transformation, storage, and orchestration layers with end-to-end observability across workloads. Currently spearheading data architecture at The Organic Agency.
India
Open to Global Opportunities
99.8%
Pipeline SLA
34%
Process Efficiency
Core Arsenal
Resume
PDF Version • 2025
Execution History
Senior Data Engineer
Black Piano
May 2024 - Present
Architecting auto-scaling Node.js/TS backends (Fastify) and SEO data platforms. Processing 20M+ daily records (GSC, GA4) via BigQuery/dbt at a 99.8% SLA, with a 62% cost reduction through Terraform-driven infrastructure.
Data Engineer
Cloudcraftz Solutions
May 2022 - Apr 2024
Built multi-cloud data ingestion using PySpark/Dataproc. Orchestrated Cassandra migrations and real-time event processing with Flink/Kafka, and reduced manual workload for HSBC by 34% through automated BigQuery/Looker pipelines.
Data Services
Intuit (Concentrix)
Jul 2019 - May 2021
Restored enterprise QuickBooks databases via complex RDBMS recovery. Managed small-business data extraction and loading using Python, Docker, and MySQL.
Customer Operations
Amazon
Aug 2016 - Mar 2019
Early-career foundation in operational efficiency, process optimization, and data-driven customer problem-solving.
Strategic Influence & Technical Deep-Dive.
My work focuses on engineering distributed systems for high-cardinality data. I bridge the gap between transactional reliability and large-scale analytical depth, ensuring systems are observable, fault-tolerant, and economically sustainable at scale.
Backend & Systems Architecture
Engineered auto-scaling Node.js/TS backends (Fastify) on Cloud Run with 99.9% uptime. Designed fault-tolerant async ETL via Cloud Tasks, multi-provider LLM layers (OpenAI, Anthropic, Gemini), and typed GraphQL APIs (Mercurius/TypeGraphQL).
Data Pipelines & Processing
Owned multi-cloud platforms (PySpark, dbt, Mage) processing 20M+ daily records at a 99.8% SLA. Optimized streaming stacks (Kafka, Flink, Druid) for 17% performance gains and tracked 10M+ keywords with LLM-powered change detection.
Machine Learning & Analytics
Developed PySpark ML models for keyword clustering (50M+ pages) with a 15% CTR uplift. Built Prisma-based competitor tools with 78% accuracy and designed Kimball dimensional models for commodity procurement.
Infrastructure & Reliability
Owned the CI/CD stack (Docker, Cloud Build) and monitoring (Pub/Sub, Slack), reducing incident resolution time by 40%. Implemented Redis caching, JSON validation, and advanced RDBMS recovery.
Visualization & Reporting
Created enterprise dashboards (Looker, Superset, Power BI) across BigQuery and Cassandra, reducing manual workloads by 34%. Delivered predictive analytics (Pandas/NumPy) for inventory planning.
50M+
Pages processed for keyword clustering and classification, resulting in a documented 15% CTR boost across core assets.
62%
Reduction in infrastructure spend ($250k/year) through strategic partitioning, TTL management, and serverless auto-scaling.
700k+
Keywords tracked across 2k+ domains with 78% prediction accuracy in competitor intelligence tools.
40%
Faster incident resolution through real-time Pub/Sub, Dataflow, and Slack alerting stacks.
45%
Reduction in manual SEO audit time via LLM-powered competitor content change detection.
20M+
Records ingested daily from GSC, GA4, and sitemaps with 99.8% SLA in production SEO platforms.
Building multi-cloud storage architectures (Redshift, S3, BigQuery) with automated data-pipeline logs and hybrid caching strategies (Redis) for real-time analytical consistency.