Aayush Verma

Data Scientist and Machine Learning Engineer

Education

University of Maryland, College Park

Master’s in Data Science

(Fall 2024 - Spring 2026)

(GPA: 4.0/4.0)

Relevant Coursework: Machine Learning, Data Science, Probability and Statistics, Algorithms for Data Science, Big Data Systems, Data Representation and Modelling

Work Experience

LawIntel.AI (2024)

Assisted in developing a legal LLM solution to produce accurate judgments tailored to the Indian legal system.

Mu Sigma Inc. (2022 - 2024)

  • Developed data-driven supply chain optimization tools to improve operational and inventory efficiency.
  • Built ETL pipelines in Python, SQL, and PySpark to improve SKU fulfillment, leading to a 30% efficiency gain and EBIT savings worth $8M.
  • Built a CatBoost-based demand forecasting model in AWS SageMaker, leading to a 5% reduction in split shipments and optimized inventory assortment.
  • Built scalable pipelines in AWS S3, Snowflake, and Teradata through Apache Airflow to offer real-time supply chain analytics.
  • Won two consecutive recognition awards in 2022 for technical acumen and project upscaling.

TCS (2020 - 2021)

Refined digital infrastructure for India's largest banks, reducing loading time by 10%.

Skills

A quick map of the tools and ideas I reach for most often.

Programming & Analysis

  • Python
  • R
  • SQL

Machine Learning

  • Supervised, Unsupervised & Semi-Supervised ML
  • SVMs, Tree-based models, Ensembles
  • Model evaluation & experiment design

Generative AI & LLMs

  • LLM apps with LangChain & LCEL
  • RAG pipelines (embeddings, vector DBs, retrievers)
  • Tool-using agents (ReAct, multi-tool routing)
  • Groq & OpenAI models (Llama 3 family, GPT)
  • Prompt design, system prompts, evaluation

Time Series & Forecasting

  • ARIMA, SARIMA
  • Prophet & decomposition
  • Backtesting & uncertainty bands

Data Engineering

  • Pandas, Polars, PySpark
  • ETL pipelines & data cleaning
  • Versioning & CI/CD for data workflows

Cloud & Infrastructure

  • AWS (S3, EC2, SageMaker)
  • Snowflake, Teradata
  • Docker, Airflow

Visualization & Communication

  • Matplotlib, ggplot2, dashboards
  • Tableau / Power BI
  • Technical writing & teaching

Projects

Dynamic Meta-Classifier for Loan Defaulter Detection

Developed an ensemble meta-classifier to predict minority-class defaulters in imbalanced datasets using CatBoost and One-Class SVM as base models, with a Random Forest meta-learner dynamically switching between them based on feature patterns.

MCare Predictive Maintenance

Part of the R&D team that built a predictive maintenance system to forecast potential failures in heavy machinery and estimate remaining useful life using real-time sensor data and ML techniques.

Multi-Tool AI Chatbot (LangChain · Groq · Streamlit)

Built an interactive AI chatbot powered by LangChain ReAct agents and Groq LLMs, capable of routing queries across multiple tools – Wikipedia for general knowledge, ArXiv for research papers, and DuckDuckGo for real-time web search – and falling back to Llama-3 for free-form reasoning. The app features a Streamlit UI with chat bubbles, persistent conversation history, and live streaming of the agent’s thoughts and tool calls.

Live app

PDF Question Answering Assistant (LangChain · LCEL · RAG)

Built a Streamlit-based RAG app that lets users upload any PDF and ask natural language questions grounded in its content. The pipeline uses PyPDFLoader for text extraction, RecursiveCharacterTextSplitter for intelligent chunking, OpenAI text-embedding-3-small for embeddings, and an in-memory Chroma vector store for retrieval, orchestrated via LangChain LCEL into a Retriever → Prompt → Groq LLM → Output Parser chain powered by Groq’s Llama 3.1 8B model.

Live app

About Me

When I'm not wrestling with data, I play badminton and watch Formula 1.


Contact Me