About

Hi, I'm Snehangsu, a Data Engineer who transforms raw information into valuable insights. With over 6 years of experience, I excel in crafting efficient data pipelines with data governance, observability, and automation. Let's collaborate and turn your business data into a strategic asset.

The Journey: Working in customer service provided a valuable lens into real-world business complexities and user challenges. This exposure fueled my passion for leveraging data, and I transitioned to the data-driven landscape, eager to learn and to apply my skills to developing better solutions.

Recent Work:
- Data pre-processing for an open-source platform (DIKSHA) to monitor educational activities.
- Designing the data pipeline and data model for Udaan, a commodities trading platform.
- Performing the ETL process for HSBC using Apache Spark, BigQuery, and Dataproc to fulfill ad-hoc dashboard requests.

Explore some of my notable work and projects in the links below!

Download Resumé

Download Cover Letter

Contact

GitHub
Twitter
Kaggle
LinkedIn
Stack Overflow
desnehangsu@gmail.com
+91-888-679-5859

Skills

Frequent

Python
BASH
GCP
BigQuery
Docker
Apache Spark / PySpark

DBT
SQL
Apache Kafka
Looker Studio
Cassandra DB
Terraform

Occasional

AWS Lambda
Git/GitHub
Power BI
Apache Flink
MongoDB

Prefect
Mage
DLT
Selenium
Apache Druid

Concepts

ER Diagram
Star Schema (Kimball)
Data Warehouse Architecture
Map-Reduce Concepts

SOLID Principles
OOP Principles
Distributed Architecture

Work Experience

Data Engineer, Cloudcraftz Solutions
2022-Present
- Shikshalokam
  - Bridged the gap between business needs and data by maintaining a robust 97.8% uptime for our data pipelines.
    Leveraged PySpark, Druid, Azure Storage, and Python to seamlessly ingest streaming and transactional data from 8 diverse sources.
  - Collaborated on the migration of data from Apache Druid to CassandraDB.
    Contributed to a 17% performance increase through strategic data modeling and a switch to event-based data capture.
  - Enabled real-time reporting by building a data pipeline with Apache Flink and Kafka to a dynamic Cassandra database.
    Utilized my expertise in Apache Flink, Kafka, and Cassandra to achieve this.
  - Handled data transformation and governance using DBT to maintain a bronze-silver-gold data hierarchy.
    Leveraged my proficiency in DBT and data warehouse best practices to achieve this.
  - Developed a cloud-agnostic data storage solution with automated monitoring.
    Leveraged Slack, database design, and data modelling skills to achieve this.
- Udaan
  - Established a centralized data pipeline to analyze transactional data.
    Utilized Selenium and data warehousing (BigQuery) to build this efficient data pipeline.
  - Optimized commodity procurement through a data model designed for vendor deals, commodities, prices, and purchases.
    Contributed to a Kimball data modeling architecture and designed the database schema, enabling efficient vendor and commodity management.
  - Analyzed commodity prices for reports, enabling short-term price forecasts and data governance for sustainable inventory and JIT models.
    Leveraged my expertise in market analysis to achieve this.

Data Services, Concentrix (Intuit)
2019-2021
- Ensured data integrity through the implementation and enforcement of robust governance and security frameworks.
  Safeguarded data through critical thinking (database issues, data migration, security) and comprehensive DB health assessments.
- Provided data insights by optimizing small business data organization using star schema and ER diagram expertise.

Customer Operation, Amazon
2016-2019
- Managed customer relations, resolved disputes, and prevented abuse in Concession Abuse Prevention.
  Showcased excellent communication and problem-solving skills.
- Served as a Subject Matter Expert in understanding and redefining customer policies, streamlining the process.
  Demonstrated a deeper understanding of customer requirements along with data analysis.

Blogposts

Technical Articles:

Visit Blog: Expected Space

Notebooks:

View Notebooks: Expected Space

Core Projects

Beyond The Bias

Uses Kafka and Flink to ingest data from RSS news feeds scraped with Selenium, aggregating various news sources to summarize news reports. (In progress)

NYC Taxi Data - DLT, DBT, Dashboard

Uses the dlt library, Mage, and DBT to load NYC taxi data (2019-23) into BigQuery, build a Kimball data model, and present the corresponding dashboard. (In progress)