Projects & Research

This page highlights a few representative projects spanning production ML systems, Document AI, open-source benchmarks, and biomedical ML.

Production ML systems (AWS)

Note: Details are partially redacted due to NDA; I’m happy to discuss architecture, tradeoffs, and evaluation approach.
Built a Temporal-orchestrated, human-in-the-loop fax document processing system on AWS to automate splitting, classification, and patient-to-order matching, reducing manual processing time by 80–90% at a volume of 300–475 faxes/day.
Implemented Temporal worker services as background processes, split into heavier workers for OCR/LLM inference and lighter workers for orchestration and workflow bookkeeping.
Performed OCR and extraction with Amazon Textract and Amazon Bedrock, with workflow state persisted in Amazon Aurora PostgreSQL.

Note: Details are partially redacted due to NDA; I’m happy to discuss schema design, reliability, and evaluation.
Designed an LLM-based system to extract structured fields from insurance denial documents, enabling potential recovery of ~$3M/year in lost claims.
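The extraction schema itself is under NDA and not described here. As a minimal sketch of one common pattern for this kind of system, assuming the LLM is prompted to return a JSON object, the following validates that output against a typed record before anything downstream trusts it. The field names (`claim_id`, `denial_reason_code`, `denied_amount`, `date_of_service`) and the `parse_denial` helper are hypothetical illustrations, not the production schema.

```python
import json
from dataclasses import dataclass
from datetime import date

# Hypothetical schema: the real field list is not described on this page.
REQUIRED_FIELDS = ("claim_id", "denial_reason_code", "denied_amount", "date_of_service")


@dataclass(frozen=True)
class DenialRecord:
    claim_id: str
    denial_reason_code: str
    denied_amount: float
    date_of_service: date


def parse_denial(raw_llm_output: str) -> DenialRecord:
    """Validate the LLM's JSON answer before it reaches downstream systems."""
    data = json.loads(raw_llm_output)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in REQUIRED_FIELDS if data.get(k) in (None, "")]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    amount = float(data["denied_amount"])  # accepts "120.50" or 120.5
    if amount < 0:
        raise ValueError("denied_amount must be non-negative")
    return DenialRecord(
        claim_id=str(data["claim_id"]),
        denial_reason_code=str(data["denial_reason_code"]),
        denied_amount=amount,
        date_of_service=date.fromisoformat(data["date_of_service"]),  # expects YYYY-MM-DD
    )
```

Rejecting a record with a `ValueError` (rather than silently filling defaults) gives a natural hook for routing it to human review.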

Built ontology-aware features and trained an XGBoost model to prioritize genes; deployed to EC2 and reduced manual variant analysis workload by 50%.
Built an LLM-based pedigree image classifier on Amazon Bedrock, achieving 99% accuracy on an internal test set.
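This page does not say which ontology or features fed the XGBoost model. As a hedged illustration of what an "ontology-aware" feature can look like, the sketch below assumes a phenotype ontology stored as a child-to-parent map and scores a patient/gene pair by Jaccard overlap after expanding both term sets to their ancestors. Term names and function names are hypothetical; the model itself is not shown.

```python
def ancestors(term, parents):
    """Return `term` plus every ancestor reachable through the is-a DAG."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))  # roots may be absent from the map
    return seen


def expand(terms, parents):
    """Union of the ancestor closures of all terms."""
    out = set()
    for t in terms:
        out |= ancestors(t, parents)
    return out


def ontology_overlap(patient_terms, gene_terms, parents):
    """Jaccard overlap after ancestor expansion, so a near-miss such as
    'focal_seizure' vs 'seizure' still scores above zero."""
    a, b = expand(patient_terms, parents), expand(gene_terms, parents)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy ontology
parents = {
    "focal_seizure": ["seizure"],
    "seizure": ["neuro_abnormality"],
    "ataxia": ["neuro_abnormality"],
}
print(ontology_overlap({"focal_seizure"}, {"seizure"}, parents))  # 2/3
print(ontology_overlap({"focal_seizure"}, {"ataxia"}, parents))   # 0.25
```

Exact string matching would score the first pair 0. Plain Jaccard does give shared general terms (such as a root) the same weight as specific ones; information-content-weighted similarity is a common refinement. Each score would then be one column in the gene-level feature matrix.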

Engineered normalizer and parser libraries for abugida-script Unicode text, supporting 7 Indic languages.
Improved LLM robustness under adversarial conditions by 5–10 points across multiple metrics.
paper
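The libraries themselves are not reproduced here. As a minimal sketch of the two steps their names suggest, assuming NFC as the normalization form and a deliberately simplified clustering rule (both assumptions), the following uses only Python's `unicodedata` to normalize text and split it into consonant clusters: combining marks attach to the preceding base, and a consonant following a virama joins the previous cluster as a conjunct. Production libraries handle many more script-specific cases.

```python
import unicodedata

# ZWNJ / ZWJ change rendering in Indic scripts, so keep them inside clusters.
ZERO_WIDTH_JOINERS = {"\u200c", "\u200d"}


def normalize(text: str) -> str:
    # Canonical composition. Precomposed DEVANAGARI LETTER QA (U+0958) is a
    # composition exclusion, so NFC yields KA (U+0915) + NUKTA (U+093C).
    return unicodedata.normalize("NFC", text)


def is_virama(ch: str) -> bool:
    # Virama names end in "VIRAMA" across scripts, e.g. "DEVANAGARI SIGN VIRAMA".
    return unicodedata.name(ch, "").endswith("VIRAMA")


def split_clusters(text: str) -> list[str]:
    """Simplified orthographic-syllable splitter (illustrative only)."""
    clusters: list[str] = []
    prev = ""
    for ch in normalize(text):
        cat = unicodedata.category(ch)
        attach = bool(clusters) and (
            cat.startswith("M")                   # vowel signs, nukta, virama, anusvara
            or ch in ZERO_WIDTH_JOINERS
            or (is_virama(prev) and cat == "Lo")  # virama + consonant forms a conjunct
        )
        if attach:
            clusters[-1] += ch
        else:
            clusters.append(ch)
        if ch not in ZERO_WIDTH_JOINERS:
            prev = ch  # look through joiners so virama + ZWJ + consonant still joins
    return clusters


print(split_clusters("नमस्ते"))  # ['न', 'म', 'स्ते']
```

Normalizing before segmenting matters: without it, the same visible letter can arrive as one code point or two, and downstream tokenizers see different strings.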

Built an active-learning-based training pipeline for SimCLR on histopathology images, reducing data requirements by 93% and training time by 62%.
paper, code
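The selection criterion behind the pipeline is not stated on this page. One widely used active-learning strategy is k-center greedy (core-set) selection, sketched below under the assumption that each image has already been embedded, for example by a SimCLR encoder that is not shown. Each round adds the point farthest from its nearest already-selected point, so a small subset covers the embedding space. The function name and toy data are hypothetical.

```python
import math


def k_center_greedy(embeddings, selected, budget):
    """Greedily add up to `budget` indices, each time picking the point
    farthest from its nearest already-selected point."""
    selected = list(selected)
    if not selected:
        raise ValueError("need at least one seed index")
    # Distance from every point to its nearest selected point.
    min_dist = [min(math.dist(e, embeddings[s]) for s in selected) for e in embeddings]
    for _ in range(budget):
        idx = max(range(len(embeddings)), key=lambda i: min_dist[i])
        if min_dist[idx] == 0.0:
            break  # every remaining point coincides with a selected one
        selected.append(idx)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[idx]))
    return selected


# Toy 2-D "embeddings": two tight pairs plus one outlier
points = [(0, 0), (0.1, 0), (10, 0), (10, 0.1), (5, 5)]
print(k_center_greedy(points, [0], 2))  # [0, 3, 4]
```

The near-duplicates at indices 1 and 2 are skipped, which is the intuition for how coverage-based selection can cut the amount of data needed.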

Curated an MRI soft-tissue tumor segmentation dataset (199 patients).
Developed multimodal UNet + Segment Anything Model (SAM) approaches, achieving a Dice score of 80% (state of the art).
paper, code
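For readers less familiar with the metric, the Dice coefficient measures overlap between a predicted mask A and a ground-truth mask B as 2|A ∩ B| / (|A| + |B|), so 80% means substantial but imperfect agreement. A minimal sketch on flattened binary masks (a 3D MRI volume would be flattened first); treating two empty masks as perfect agreement is an assumption, since conventions differ.

```python
def dice(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * intersection / total


print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

Whether Dice is averaged per patient or pooled over all voxels changes the reported number, so the aggregation is worth stating alongside any score.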

Engineered a CNN for inferior myocardial infarction detection from ECG signals, achieving 84.54% accuracy (state of the art at the time).
paper, code
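The architecture is described in the linked paper rather than here. As a hedged illustration of the core operation any 1D CNN applies to an ECG trace, this pure-Python sketch computes a valid 1D convolution (implemented as cross-correlation, as deep-learning frameworks do), a ReLU, and global max pooling. The toy signal and the hand-set kernel are hypothetical; in a trained network the kernels are learned.

```python
def conv1d(signal, kernel, bias=0.0, stride=1):
    """Valid 1D cross-correlation, the operation a CNN conv layer computes."""
    k = len(kernel)
    if k == 0 or k > len(signal):
        raise ValueError("kernel must be non-empty and no longer than the signal")
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(xs):
    return [max(0.0, x) for x in xs]


def global_max_features(signal, kernels):
    """One feature per kernel: conv -> ReLU -> global max pooling."""
    return [max(relu(conv1d(signal, kern))) for kern in kernels]


# Toy trace with a sharp spike, and a hand-set peak-detecting kernel
trace = [0, 0, 1, 5, 1, 0, 0]
peak_kernel = [-1, 2, -1]
print(conv1d(trace, peak_kernel))                  # [-1.0, -3.0, 8.0, -3.0, -1.0]
print(global_max_features(trace, [peak_kernel]))   # [8.0]
```

Global max pooling makes each feature insensitive to where in the window the waveform pattern occurs, which suits beat-level morphology.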