
Hi, I'm

Akshay Gupta


Lead Data Engineer turning complex data chaos into reliable, cost-efficient infrastructure. IIT Guwahati Alum.

Tech Stack

Technologies I use to build things

Languages

Python
Java
Scala
Google Apps Script
Go

Databases

ClickHouse
TigerGraph
BigQuery
Databricks
ksqlDB
MySQL
PostgreSQL
MongoDB
Redis

Frameworks

Airflow
Spark
Django
Pandas
Apache Beam
Kubernetes
Streamlit

Queuing Systems

Kafka
Google Pub/Sub

Miscellaneous

Google Analytics
Redash
Metabase
GCP
AWS
GitHub Actions
ArgoCD

Experience

Senior Data Engineer → Lead Data Engineer

Merkle Science

Merkle Science builds blockchain intelligence and compliance infrastructure used by financial institutions, exchanges, and regulators worldwide. I joined in 2021 as the youngest engineer on the team. By 2023, I was leading it.

The platform story: When I joined, the platform supported 7 blockchains and was under constant strain: unstable, expensive, and hard to scale. Over four years, I led five end-to-end architecture overhauls, each planned, executed, and delivered without breaking customer trust. Today, the platform supports 22+ blockchains, handles 500TB of data, runs at 99.99% uptime, and costs 80% less ($360K saved annually) despite carrying 3-4x the data volume. That's not a one-time win. It's a compounding pattern of doing more with less, repeatedly.

What I built:

  • Harvester: a unified blockchain data extraction framework that reduced new chain integration time by 70%, turning a slow, bespoke process into a standardised, repeatable one. We went from 7 to 22+ chains. (A minimal adapter sketch follows this list.)
  • Nimbus: a real-time streaming pipeline (evolved from Apache Beam → Kafka + KSQL) that makes blockchain transactions available on the platform within 7 seconds of confirmation, at horizontal scale. (A consumer-side sketch also follows this list.)
  • TigerGraph overhaul: redesigned the entire schema and multi-hop network exposure algorithm from scratch. The result: P99 query latency under 1 second, at 90% lower infrastructure cost, with significantly better explainability for compliance teams.
  • 4 ClickHouse migrations: each one standardising schemas, introducing pre-aggregated data models, and delivering consistent sub-second query performance at enterprise scale.
  • Cost optimisation: led cross-functional initiatives with DevOps to keep GCP compute costs flat for 3 years despite 3-4x data growth, and orchestrated the move to ClickHouse Cloud to cut self-hosting overhead.
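
To make the "standardised, repeatable" integration claim concrete, here is a minimal sketch of the kind of adapter interface such a framework could expose. It is illustrative only: the `ChainAdapter` and `harvest` names, the batch size, and the `normalize_and_store` sink are assumptions, not the actual Harvester code.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator


class ChainAdapter(ABC):
    """One adapter per blockchain; the framework owns scheduling, batching, and storage."""

    @abstractmethod
    def latest_block(self) -> int:
        """Return the current chain tip height."""

    @abstractmethod
    def fetch_blocks(self, start: int, end: int) -> Iterator[Dict]:
        """Yield raw blocks (with transactions) for the inclusive height range."""


def normalize_and_store(block: Dict) -> None:
    # Placeholder sink: a real pipeline would map chain-specific fields
    # into a shared schema before writing to the warehouse.
    print(block.get("height"), len(block.get("transactions", [])))


def harvest(adapter: ChainAdapter, cursor: int, batch_size: int = 100) -> int:
    """Pull everything from `cursor` up to the chain tip in fixed-size batches."""
    tip = adapter.latest_block()
    for start in range(cursor + 1, tip + 1, batch_size):
        end = min(start + batch_size - 1, tip)
        for block in adapter.fetch_blocks(start, end):
            normalize_and_store(block)
    return tip  # new cursor to persist for the next run
```

Under a shape like this, adding a new chain means writing one adapter rather than a bespoke pipeline, which is the property a 70% integration-time reduction depends on.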
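
For the Nimbus bullet above, a hedged sketch of what the consuming side of such a Kafka pipeline might look like, using the kafka-python client. The topic name, message fields, and freshness check are assumptions for illustration, not the production code.

```python
import json
import time

from kafka import KafkaConsumer  # pip install kafka-python

# Hypothetical topic carrying decoded, confirmed transactions.
consumer = KafkaConsumer(
    "confirmed-transactions",
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="latest",
    enable_auto_commit=True,
)

for message in consumer:
    tx = message.value
    # Assumed field: epoch seconds at which the chain confirmed the transaction.
    freshness = time.time() - tx["confirmed_at"]
    print(f"tx {tx['hash']} visible {freshness:.1f}s after confirmation")
```

Downstream, KSQL-style continuous queries over the same topics can maintain the aggregates the platform serves, which is what keeps end-to-end latency in the single-digit-second range.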

How I lead:

  • Team & culture: Directly manage a team of 3 engineers while informally mentoring across Data Science, Product, and DevOps. Built a documentation-first culture where architecture decisions, runbooks, and hard-won learnings are team property β€” not locked in any one person’s head. Mentored junior engineers through structured code reviews and design guidance, helping them grow into independent contributors who own their projects end to end.
  • Stakeholder management: Serve as the primary interface between data engineering and the rest of the business β€” translating complex technical constraints into clear business language and vice versa. Regularly present infrastructure decisions, cost trade-offs, and data quality insights to non-technical stakeholders, enabling informed decisions on resource allocation and vendor contracts. Led a company-wide initiative to redefine how transaction counts are measured β€” a deceptively complex project requiring alignment across Product, Data Science, and leadership before a single line of code was written.
  • Navigating ambiguity: Consistently take ownership of high-stakes, open-ended problems where the path isn’t clear β€” whether that’s evaluating a new blockchain for feasibility, responding to an unplanned production incident, or deciding when to rebuild vs. iterate on aging infrastructure.
  • Shielding the team: Proactively absorb on-call escalations and cross-team coordination so engineers can stay in deep focus. Pair this with deliberate knowledge distribution to ensure no critical system depends on a single person β€” including me.

Analyst → Data Engineer → Senior Data Engineer

Testbook

Testbook is one of India's largest ed-tech platforms for competitive exam preparation. I was one of the first two data hires, joining as an analyst and leaving as a senior data engineer who had built the entire data function, a revenue-generating CRM platform, and ML-powered product features.

Starting from zero: I set up Testbook's first analytics infrastructure, from Excel to Jupyter notebooks to live Redash dashboards pulling from MongoDB, BigQuery, and MySQL. These became the operating layer for product, content, and marketing teams. One early win: identifying a post-signup dropout rate of 50% and reducing it to 30% through data analysis and targeted interventions.

The CRM that became a revenue engine: What started as a two-person checkout recovery experiment evolved into a fully custom internal telesales platform. I designed the lead flow architecture, database model, real-time ingestion pipelines (Pub/Sub + Cloud Functions), lead scoring, and attribution strategy. At its peak, this platform contributed to 30% of Testbook's revenue, built from scratch because nothing off-the-shelf fit the use case.
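
As an illustration of the real-time ingestion leg described above, here is a minimal sketch of a Pub/Sub-triggered Cloud Function (first-generation background-function signature). The event shape is the standard Pub/Sub trigger; the field names and the scoring rule are hypothetical.

```python
import base64
import json


def score_lead(payload: dict) -> int:
    # Hypothetical scoring rule: weight checkout intent and cart value.
    score = 0
    if payload.get("stage") == "checkout_abandoned":
        score += 50
    score += min(int(payload.get("cart_value", 0)) // 100, 50)
    return score


def ingest_lead_event(event, context):
    """Triggered by a Pub/Sub message; decodes, scores, and hands off the lead."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    payload["lead_score"] = score_lead(payload)
    # In the real pipeline this would land in the CRM's lead queue or database;
    # here we just log it.
    print(json.dumps(payload))
```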

The T-Score moment: Indian Railways exams use a statistical normalisation method called T-Score to rank millions of aspirants. When a major exam was announced with weeks, not months, to prepare, we saw an opportunity. I built the automation pipeline to compute and deliver personalised T-Scores to every user who attempted our test series, using population-level aggregates decoupled from user-facing delivery. We were the only platform in India offering this at scale. The result: 60% of aspirants for that exam came to Testbook.
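
For context, a T-Score is a standard linear normalisation: T = 50 + 10 * (x - mean) / std, where the mean and standard deviation are taken over all scores in the population. The sketch below shows the decoupling described above, computing population aggregates once and scoring each attempt against them; the function names and data shapes are illustrative, not the production pipeline.

```python
from statistics import mean, pstdev
from typing import Dict, Iterable


def population_aggregates(scores: Iterable[float]) -> Dict[str, float]:
    """One pass over all attempts for a test: computed offline, stored, and reused."""
    values = list(scores)
    return {"mean": mean(values), "std": pstdev(values)}


def t_score(raw_score: float, aggregates: Dict[str, float]) -> float:
    """T = 50 + 10 * z, so the population centres on 50 with a spread of 10."""
    if aggregates["std"] == 0:
        return 50.0
    return 50.0 + 10.0 * (raw_score - aggregates["mean"]) / aggregates["std"]


# Example: aggregates computed once per test, per-user delivery decoupled from it.
aggs = population_aggregates([112.5, 98.0, 130.25, 87.5, 104.0])
print(round(t_score(112.5, aggs), 2))
```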

Other highlights:

  • Built a question difficulty quantification algorithm using Apache Spark, reducing content curation time from months to days and improving educational quality at scale. (A sketch follows this list.)
  • Architected a CDC-based MongoDB consolidation pipeline using Airflow and GCS, saving 50+ dashboards and 1000+ queries from breaking during a critical database migration.
  • Founded and scaled the data engineering department to a 4-person team.
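
A hedged sketch of the difficulty-quantification idea referenced in the first bullet above: score each question by the share of incorrect attempts, keeping only questions with enough attempts to be meaningful. Column names, the input path, and the exact metric are assumptions; the real algorithm may use a richer model.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("question-difficulty").getOrCreate()

# Assumed input: one row per (user, question) attempt with a boolean `is_correct`.
attempts = spark.read.parquet("gs://analytics/attempts/")  # hypothetical path

difficulty = (
    attempts.groupBy("question_id")
    .agg(
        F.count("*").alias("n_attempts"),
        F.avg(F.col("is_correct").cast("double")).alias("correct_rate"),
    )
    # Simple proxy: harder questions are answered correctly less often.
    .withColumn("difficulty", 1.0 - F.col("correct_rate"))
    .filter(F.col("n_attempts") >= 30)  # ignore questions with too few attempts
)

difficulty.write.mode("overwrite").parquet("gs://analytics/question_difficulty/")
```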

Education

Bachelor of Technology

Indian Institute of Technology (IIT) Guwahati

CGPA: 8.38/10

Achievements

MITACS Globalink Research Intern

MITACS

Selected for the prestigious Canadian research internship; one of 50 candidates chosen from India.

McM Scholarship

IIT Guwahati

Awarded full tuition fee waiver for academic excellence.


Get In Touch

Connect

I'm always interested in hearing about new tech and ideas. Feel free to reach out!

