Tahsin Reasat
Summary
Applied ML engineer with 8+ years of experience building and deploying production ML systems and research prototypes across Document AI, NLP, computer vision, and speech. I specialize in end-to-end ML: problem framing, data, model training, evaluation, deployment, and monitoring.
Quick links: Projects & Research, Publications, Resume (PDF).
Featured work
Production engineering
- Fax Document AI (human-in-the-loop): Temporal-orchestrated workflow with OCR/LLM extraction on AWS; cut manual processing time by 80-90% while handling 300-475 faxes/day.
Stack: AWS (Textract, Bedrock, Aurora PostgreSQL), Temporal, Python. Details redacted due to NDA. Short write-up
- Insurance denial extraction (LLM IE): extracted structured fields from denial letters to support claims recovery (~$3M/year).
Stack: LLMs, Python, AWS. Details redacted due to NDA. Short write-up
Research & open-source contribution
- OOD-Speech (ASR benchmark): 1,100+ hours of speech from 22,000+ contributors across 17 domains; fine-tuned Whisper for regional Bengali ASR.
Links: paper, Kaggle, demo. More
- BaDLAD (Document AI dataset): 33,695 annotated Bengali document samples; trained Mask R-CNN / YOLO-based layout detectors.
Links: paper. More
- Data-efficient contrastive learning (histopathology): active sampling for SimCLR, requiring 93% less data and 62% less training time.
Links: paper, code. More
- MRI tumor segmentation (MSTT): curated a 199-patient dataset; explored multimodal UNet and SAM-based approaches; state-of-the-art Dice score of 80%.
Links: paper, code. More
Selected publications
- Data Efficient Contrastive Learning in Histopathology using Active Sampling. Machine Learning with Applications, 2024. link
- Abugida Normalizer and Parser for Unicode Texts. LREC-COLING, 2024. link
- OOD-Speech: A Large Bengali Speech Recognition Dataset for Out-of-Distribution Benchmarking. Interspeech, 2023. link
See the full list on the Publications page (or Google Scholar).
Resume
For a complete work history and skills list, see Resume.