Snehangsu De · Senior Data Engineer & Architect · open to global roles

Reliable data,
at scale.

I design cloud-native pipelines and platforms that turn 20M+ records a day into decisions leaders can trust - at 99.8% reliability, while cutting infrastructure cost by 62%. Five years turning operational chaos into systems teams can rely on.

Snehangsu De

India

Open to global & remote opportunities

5+

Years in production data.

99.8%

Pipeline reliability sustained across production workloads.

$250K

Annual cloud spend removed - a 62% cut through architecture, not corners.

20M+

Records ingested & processed daily from GSC, GA4 & sitemaps.

What I actually do

I build the layer between day-to-day operations and big-picture analytics - and keep it reliable, observable, and affordable even as data volumes explode.

Selected impact

Three problems, what I built, and what it returned to the business.

01 / Platform & Cost

Took $250K a year off the cloud bill.

Re-architected a 20M-record/day SEO data platform on BigQuery + dbt, with auto-scaling Node.js/TS (Fastify) services and fully Terraform-driven infrastructure. Strategic partitioning, TTL management and serverless scaling did the rest.

62% cost reduction · 99.8% SLA · ~$250K/yr saved

02 / Architecture

Built a secure gateway for AI agents.

Designed and built a Fastify MCP server with OAuth 2.1 + PKCE and Firebase multi-tenant auth, plus LLM-driven context scoping - so any external agent gets exactly the SERP and intent data it needs, and nothing else.

OAuth 2.1 + PKCE · Multi-tenant auth · MCP server

03 / Forecasting

Saw the revenue dip weeks before it hit.

Built an early-warning system on Statistical Process Control: leading demand signals move weeks ahead of revenue, tracked through a metrics cascade that runs down to leads, the single strongest revenue predictor.

Weeks of lead time · ~0.73 correlation · SPC early-warning

Trust

Worked with teams shipping at real-world scale.

Intuit

Fintech · Enterprise data

HSBC

Global banking

Udaan

B2B e-commerce

ShikshaLokam

Edtech · Non-profit

The Organic Agency

Current · Data architecture

Your team next? →

Let's talk

The journey

From the front line to the architecture.

May 2024 - Present

Senior Data Engineer & Architect

The Organic Agency · via Black Piano

Building the data and backend behind IntentOS, a share-of-intent market-intelligence platform - BigQuery SERP and intent pipelines, a TypeScript/GraphQL backend, and an MCP server - while running 20M+ records/day at a 99.8% SLA with a 62% infrastructure cost cut.

May 2022 - Apr 2024

Data Engineer

Cloudcraftz Solutions

Multi-cloud ingestion with PySpark/Dataproc, real-time event processing on Flink/Kafka, and Cassandra migrations - reducing an HSBC workload by 34% through automated BigQuery/Looker pipelines.

Jul 2019 - May 2021

Data Services

Intuit (Concentrix)

Restored enterprise QuickBooks databases through complex RDBMS recovery, and ran small-business data extraction & loading with Python, Docker and MySQL.

Aug 2016 - Mar 2019

Customer Operations

Amazon

Where it started: a foundation in operational efficiency, process optimization, and data-driven customer problem-solving.

The full range

One engineer, several hats.

Senior data engineer first - but the work runs across backend, applied marketing science, and the systems underneath. Here's what each looks like in practice.

Data Engineering & Architecture

The pipelines and warehouse behind the product.

  • Consolidated SERP table unifying 29+ feature tables (~400K queries) from an Oxylabs crawl.
  • An intent-scoring engine built on closed-form exponential decay, replacing a recursive approach for scale.
  • 20M+ records/day on BigQuery at a 99.8% SLA, with deep production-pipeline debugging.
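
To illustrate the closed-form decay idea from the list above, here is a minimal sketch. It assumes a score is a sum of event weights that decay with a fixed half-life, which is a simplification and not the actual IntentOS formula. All names and parameters are hypothetical.

```python
import math
from typing import List, Tuple

Event = Tuple[float, float]  # (timestamp_in_days, weight)


def score_recursive(events: List[Event], now: float, half_life: float) -> float:
    """Step-by-step update: decay the running score, then add each new event."""
    lam = math.log(2) / half_life
    score, last_t = 0.0, None
    for t, w in sorted(events):  # order matters: each step depends on the last
        if last_t is not None:
            score *= math.exp(-lam * (t - last_t))
        score += w
        last_t = t
    if last_t is None:
        return 0.0
    return score * math.exp(-lam * (now - last_t))


def score_closed_form(events: List[Event], now: float, half_life: float) -> float:
    """Same result as one order-independent sum: w * exp(-lam * age) per event."""
    lam = math.log(2) / half_life
    return sum(w * math.exp(-lam * (now - t)) for t, w in events)
```

Because the closed form needs no running state, each event contributes independently, so the score can be expressed as a single aggregate over rows instead of a sequential walk through history.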

Backend Engineering

The IntentOS platform, end to end.

  • TypeScript workflow handlers (MikroORM + type-graphql, JSONB scoring) for persona-to-subtopic relations.
  • A Meta Ads targeting pipeline: persona → structured keywords → Meta's interest taxonomy → five funnel-stage specs.
  • A Fastify MCP server with OAuth 2.1 + PKCE and Firebase multi-tenant auth.

Analytics & Modelling

Turning noisy data into decisions.

  • Bayesian mix-modelling (Meridian) that recovers true driver contributions from correlated, seasonal data.
  • A Statistical Process Control early-warning system where leading signals move weeks ahead of revenue (~0.73 correlation).
  • Large-scale comparative analysis surfacing 7,285 gap cells against a named competitor.
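
The Statistical Process Control bullet above rests on control limits. Here is a minimal sketch of one common SPC technique, a Shewhart individuals chart with limits derived from the average moving range. Whether the production system uses this exact chart type is an assumption, and the function names and sample numbers below are made up for illustration.

```python
from statistics import mean
from typing import List, Sequence, Tuple

D2_FOR_N2 = 1.128  # standard constant converting mean moving range to sigma


def control_limits(baseline: Sequence[float]) -> Tuple[float, float, float]:
    """Return (lower, center, upper) 3-sigma limits from a stable baseline."""
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma = mean(moving_ranges) / D2_FOR_N2
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(values: Sequence[float], lower: float, upper: float) -> List[int]:
    """Indices of points falling outside the control limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical weekly values of a leading signal, then four new observations.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
lower, center, upper = control_limits(baseline)
alerts = out_of_control([101, 99, 91, 108.5], lower, upper)
```

Applied to a leading signal rather than to revenue itself, a point outside the limits becomes the early warning: the signal breaks its normal range before the lagging metric moves.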

Systems & Developer Tooling

The environment that makes it fast.

  • Self-hosted Linux: cut Docker overhead from ~10 GB to 92 MB and idle RAM from 8.9 to 4.4 GB.
  • Pruned a 10 GB Postgres dump to 2.3 GB - 77% smaller - via schema-copy + filtered COPY.
  • Open-source tooling for the AI engineering stack - agent sandboxes, AI task management, structured logging. Each one built around a real gap in how you work alongside LLMs. See Projects for the full list.

Core arsenal

TypeScript · Python · GraphQL · MikroORM · Fastify · BigQuery · PostgreSQL · MCP · OpenAI · Docker · Linux · PySpark · Kafka · dbt · Altair · Flink · Druid · Mage · Terraform · Redis · Cloud Run · Looker · Pandas / NumPy · Rust (learning)

See it in code →

Pipelines, open-source tools and agent infrastructure. Most of what I build ends up on GitHub.

Let's talk

Ready to scale?

I'm open to senior data engineering & architecture roles, technical partnerships, and consultations. Tell me what you're trying to build - I'll tell you how to make it reliable and affordable.