
DATA ENGINEER. Architect.

Hi, I’m Snehangsu De. I architect enterprise-grade data systems, balancing performance, reliability, and operational efficiency. I design and deliver cloud-native pipelines and automation platforms that cut through complexity and help organisations operate, scale, and make decisions with confidence.

The Architect

5+ years of experience building production data stacks. I engineer ingestion, transformation, storage, and orchestration layers with end-to-end observability across workloads. Currently spearheading data architecture at The Organic Agency.

Strategy, Engineering, Leadership
Snehangsu De

India

Open to Global Opportunities

99.8%

Pipeline SLA

34%

Process Efficiency

Core Arsenal

Python, Node.js, Bash, PySpark, Kafka, dbt, Docker, Terraform

Resume

PDF Version • 2025

Execution History

Senior Data Engineer

Black Piano

May 2024 - Present

Architecting auto-scaling Node.js/TS backends (Fastify) and SEO data platforms. Processing 20M+ records daily (GSC, GA4) via BigQuery/dbt at a 99.8% SLA, with a 62% cost reduction through Terraform-driven infrastructure.

Data Engineer

Cloudcraftz Solutions

May 2022 - Apr 2024

Built multi-cloud data ingestion using PySpark/Dataproc. Orchestrated Cassandra migrations, ran real-time event processing with Flink/Kafka, and reduced HSBC workload by 34% through automated BigQuery/Looker pipelines.

Data Services

Intuit (Concentrix)

Jul 2019 - May 2021

Restored enterprise QuickBooks databases via complex RDBMS recovery. Managed small-business data extraction and loading using Python, Docker, and MySQL.

Customer Operations

Amazon

Aug 2016 - Mar 2019

Early career foundation in operational efficiency, process optimization, and data-driven customer problem-solving.

Strategic Influence & Technical Deep-Dive.

My work focuses on engineering high-cardinality distributed systems. I bridge the gap between transactional reliability and large-scale analytical depth, ensuring systems are observable, fault-tolerant, and economically sustainable at scale.

Backend & Systems Architecture

Engineered auto-scaling Node.js/TS (Fastify) backends on Cloud Run with 99.9% uptime. Designed fault-tolerant async ETL via Cloud Tasks, multi-provider LLM layers (OpenAI, Anthropic, Gemini), and typed GraphQL APIs (Mercurius/TypeGraphQL).

Data Pipelines & Processing

Owned multi-cloud platforms (PySpark, dbt, Mage) processing 20M+ daily records at a 99.8% SLA. Optimized streaming stacks (Kafka, Flink, Druid) for a 17% performance gain, and tracked 10M+ keywords with LLM-powered detection.

Machine Learning & Analytics

Developed PySpark ML models for keyword clustering (50M+ pages) with a 15% CTR uplift. Built Prisma-based competitor intelligence tools with 78% prediction accuracy, and designed Kimball dimensional models for commodity procurement.

Infrastructure & Reliability

Owned the CI/CD stack (Docker, Cloud Build) and monitoring (Pub/Sub, Slack), reducing incident resolution time by 40%. Implemented Redis caching, JSON validation, and advanced RDBMS recovery.

Visualization & Reporting

Created enterprise dashboards (Looker, Superset, Power BI) across BigQuery and Cassandra, reducing manual workloads by 34%. Delivered predictive analytics (Pandas/NumPy) for inventory planning.

Throughput

50M+

Pages processed for keyword clustering and classification, resulting in a documented 15% CTR boost across core assets.

Optimization

62%

Reduction in infrastructure spend ($250k/year) through strategic partitioning, TTL management, and serverless auto-scaling.

Scale

700k+

Keywords tracked across 2k+ domains with 78% prediction accuracy in competitor intelligence tools.

Reliability

40%

Faster incident resolution through real-time Pub/Sub, Dataflow, and Slack alerting stacks.

Efficiency

45%

Reduction in manual SEO audit time via LLM-powered competitor content change detection.

Daily Volume

20M+

Records ingested daily from GSC, GA4, and sitemaps with 99.8% SLA in production SEO platforms.

Enterprise Data Foundations

Building multi-cloud storage architectures (Redshift, S3, BigQuery) with automated pipeline logging and hybrid caching strategies (Redis) for real-time analytical consistency.