
Aarjav Jain

AI × Bio. Graduate Researcher at Brown.

Providence, RI  ·  contact

I'm Aarjav, a graduate student at Brown working where machine learning meets biology: spatial and single-cell genomics, representation learning, and graph-grounded discovery.

  • Lab. the Singh Lab at Brown, building an agentic drug repurposing framework with Prof. Ritambhara Singh.
  • Researching. representation learning for spatial transcriptomics through self-supervised pretraining with Prof. Ying Ma.
  • Studying. Statistical and AI-Powered Methods for High-Dimensional Genomics Data Analysis, and Computational Linguistics, at Brown.
  • Excited by. large-scale Perturb-seq atlases and what they unlock for single-cell causal modeling (e.g. X-Atlas/Pisces [Wang et al.]).

Updated April 2026.

  1. Brown University, Providence, RI

    MSc Computer Science [AI + Computational Biology track]

    GTA: Data Structures, Algorithms, and Intractability.

    Relevant Courses: Deep Learning in Genomics; Statistical and AI-Powered Methods for High-Dimensional Genomics Data Analysis.

  2. King's College London, London, UK

    BSc Computer Science (Artificial Intelligence) with Management and a Year Abroad — First Class Honors

    Year Abroad: AI + Biotechnology, University of Toronto.

  1. Graduate Researcher, Singh Lab

    Brown University · advised by Prof. Ritambhara Singh
    • Building an agentic drug repurposing framework over a biomedical knowledge graph integrating PharmaDB and Hetionet.
    • The integrated graph spans roughly 67K nodes and 1.7M edges; drug-gene-disease hypotheses are prioritized across 48 cancer types.
    • Used KPaths retrieval to extract relevant-path subgraphs and experimented with inference strategies including LLM-as-judge, LLM-ensemble, and LLM-council.
    • Constructed ontological mappings between clinical trial data and PharmaDB to validate predictions against Phase 1/2 trial outcomes, evaluated with precision@k and F0.5, in collaboration with Dr. Alejandro Schaffer (NIH/NCI).
  2. Self-Supervised Spatial Transcriptomics (JEPA), Brown University

    advised by Prof. Ying Ma
    • Designing a self-supervised spatial transcriptomics framework based on the Joint-Embedding Predictive Architecture (JEPA) that learns tissue representations via spatial masking in latent space.
    • Extended a Perceiver encoder with relational encoding to capture spatial neighborhoods on SToCorpus-88M, with coverage across high-resolution single-cell spatial transcriptomics technologies.
    • Evaluating on cell-type annotation, spatial domain identification, and gene-expression imputation against scGPT-spatial, STAGATE, and GraphST baselines.
  3. Neuropathology Stage Imputation from snRNA-seq, Brown University

    advised by Prof. Ritambhara Singh
    • Trained a hierarchical transformer on the SEA-AD dataset (Allen Institute) to infer Alzheimer's neuropathology stages (Braak, Thal, CERAD, ADNC) from snRNA-seq profiles across MTG and A9 brain regions.
    • Designed a donor-level attention architecture to capture cross-cell-type and cross-region structure that flat per-cell models miss.
    • Applied batch correction to remove donor bias, then fine-tuned scGPT brain-pretrained embeddings on the AD snRNA-seq dataset to capture disease-relevant signal.
  4. Fascicle Length Segmentation, Vision in Human Robotics Lab @ KCL

    advised by Dr. Letizia Gionfrida
    • Developed a zero-shot Noise2Noise CNN to automate B-mode ultrasound preprocessing, improving the Jaccard similarity coefficient by 11%.
    • Tracked fascicle motion using affine optical flow and sparse representations, improving temporal consistency and reducing segmentation drift.
  1. Graduate Software Engineer, Deutsche Bank

    London, UK
    • Built a real-time risk monitoring system ingesting thousands of trades per second with nanosecond-level latency.
    • Designed multi-partition data retrieval stacks in KDB+/Q for 100× faster high-volume queries; added a cross-stack de-duplication layer to eliminate redundant retrieval.
    • Shipped a dashboard surfacing noisy usage of market data, yielding ~$400,000 annual cost savings.
  2. Co-Founder, Upsizzle AI

    London / San Francisco
    • Architected a multi-agent AI pipeline for Generative Engine Optimization, orchestrating 10+ LLM and non-LLM workers (NER, web crawlers, scoring models) across OpenAI, Gemini, Grok, and Qwen, assigning models by task, cost, and capability.
    • Built an autonomous orchestrator with quality-check feedback loops and failure-tolerant redispatch, enabling agents to retry and refine outputs without manual intervention.
    • Designed a per-client pipeline that generated simulated AI search responses across personas, crawled and summarized 200-600 cited web pages per run, and performed NER and sentiment extraction across all sources.
    • Implemented weighted scoring systems across brand mentions, sentiment, and source authority to generate automated competitive strategy recommendations.
  1. Lucid Bio — Screening Layer for Biosynthesis

    • Built a protein screening pipeline that decomposes sequences into functional domains and runs parallel structure prediction (ESMFold) and similarity search (Foldseek, Diamond) to catch threats that evade standard BLAST screening.
    • Designed an agentic LLM layer that reasons over per-domain signals to assess combinatorial risk from chimeric sequences.
  2. ShardCompute — Distributed Inference Network

    • Distributed model-sharding inference that partitions large models across heterogeneous devices beyond single-machine memory limits.
    • Coordinator-worker architecture for synchronized tensor computation across shards.
    • Fault-tolerant state coordination across workers.
    • Relay networking early on; migrated to P2P to reduce relay-hop latency.
  • Languages: Python · Java · C++ · SQL · KDB+/Q · Scala · TypeScript
  • ML / Scientific: PyTorch · TensorFlow · scikit-learn · NumPy · pandas · SciPy · vLLM · MLX · Scanpy · AnnData · OpenCV
  • Infrastructure: AWS (EC2, Lambda, Step Functions, S3, DynamoDB) · GCP · Docker · SLURM · Linux · Git · Neo4j · MLflow · Vertex AI

Last revised April 2026.