I build differential-geometric and information-geometric frameworks to answer a question that keeps me up at night: what actually changes inside a model when we align it? My research treats transformer hidden-state sequences as discrete curves on a Riemannian belief manifold equipped with the Fisher-Rao metric. The torsion tensor, the antisymmetric component of cross-layer covariance, captures rotational mismatch that is invisible to attention patterns and activation norms. I call the resulting suppression pockets "brake layers": geometrically localised, alignment-specific, and falsifiable with existing causal patching tools. Four active research threads:
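The torsion construction above can be sketched numerically. A minimal, illustrative operationalisation (the function name and the exact centring and normalisation are my assumptions, not GRAFT's actual implementation):

```python
import numpy as np

def layer_torsion(h_a: np.ndarray, h_b: np.ndarray) -> float:
    """Frobenius norm of the antisymmetric part of the cross-layer covariance.

    h_a, h_b: (n_tokens, d) hidden states from two layers of the same model.
    A purely symmetric cross-layer relationship yields ~0; rotational
    mismatch between layers shows up as a nonzero antisymmetric component.
    """
    h_a = h_a - h_a.mean(axis=0)            # centre token activations
    h_b = h_b - h_b.mean(axis=0)
    cov = h_a.T @ h_b / (len(h_a) - 1)      # (d, d) cross-layer covariance
    antisym = 0.5 * (cov - cov.T)           # torsion-like rotational part
    return float(np.linalg.norm(antisym, "fro"))

# Synthetic stand-ins for two adjacent layers' hidden states
rng = np.random.default_rng(0)
h0 = rng.standard_normal((128, 16))
h1 = h0 @ rng.standard_normal((16, 16))     # generic linear inter-layer map
score = layer_torsion(h0, h1)
```

Comparing a layer against itself gives a symmetric covariance and hence a score near zero, which is the sanity check that makes the metric interpretable.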
1 · GRAFT: Geometric Representations of Alignment's Fingerprint in Transformer Belief Trajectories
Mechanistic Interpretability · graft-belief-geometry
GRAFT is a post-hoc, gradient-free mechanistic audit toolkit that characterises preference alignment via three torsion probes (𝒯, T1, T2) and an ERA depth profiler, requiring only forward passes through publicly available checkpoints.
| Result | Value |
|---|---|
| T2 concept discriminability | CV = 0.64 vs CKA = 0.08 (8× better) |
| T2 classification AUC | 0.89 [0.85, 0.93] vs CKA 0.61 |
| Normative torsion amplification | 20–46× larger than for factual concepts |
| Alignment depth address | ℓ* ∈ {14, 20, 29–30}: architecture-specific, falsifiable patching targets |
| Safe-prompt paradox | Safe prompts drive larger Δτ than unsafe (p < 10⁻³³, OLMo; cross-dataset replicated) |
| Low-rank alignment signature | DPO operates in a 2–3-dimensional subspace vs RLHF's 4–5 |
| Benchmark | LITMUS · 20,439 prompts · 7 value axioms · 3 null-baseline controls |
| Models | OLMo-2-7B · Llama-3-8B · Mistral-7B · Qwen-2.5-7B (IT→PA pairs) |
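For context on the CKA baseline in the table: assuming it refers to linear CKA (Kornblith et al., 2019), a reference implementation takes a few lines. The variable names and synthetic data here are illustrative:

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA similarity between two (n_samples, d) representation matrices."""
    x = x - x.mean(axis=0)                  # centre features
    y = y - y.mean(axis=0)
    num = np.linalg.norm(y.T @ x, "fro") ** 2
    den = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return float(num / den)

rng = np.random.default_rng(1)
x = rng.standard_normal((200, 32))
y = rng.standard_normal((200, 32))
same = linear_cka(x, x)                     # identical representations -> 1.0
diff = linear_cka(x, y)                     # unrelated representations -> small
```

CKA scores lie in [0, 1] and are invariant to orthogonal transforms and isotropic scaling, which is what makes it the standard representational-similarity baseline to beat.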
Three pre-registered falsifiable hypotheses, all confirmed:
H1 (Concept selectivity): Alignment torsion Δ_f is larger for normative concepts than factual ones; T2 spectral anisotropy is the dominant mechanistic signature (CV > 0.50; AUC > 0.85). ✓ Confirmed: CV = 0.64, AUC = 0.89, normative 20–46× > factual; all 3 null-baseline controls hold
H2 (Depth address): Alignment concentrates at an architecture-specific depth ℓ*, providing a surgical patching target. ✓ Confirmed: ℓ* reproducible within each architecture family, compatible with ROME/ACDC patching
H3 (Safe-prompt paradox): Safe prompts produce larger alignment torsion Δτ than unsafe ones. ✓ Confirmed: p < 10⁻³³ (OLMo), p < 10⁻⁴ (Llama); cross-dataset replication on SafetyBench + WildGuard (n = 150 each)
2 · MENTIS: Measuring Multi-Scale Latent Torsion in Language Models
NeurIPS 2026 (in preparation) | Team: Partha Pratim Saha · Samarth Raina · Mayur Parvatikar · Amit Dhanda · Vinija Jain · Aman Chadha · Amitava Das
MENTIS delivers a NeurIPS-scale empirical study of belief geometry across the full LITMUS benchmark, introducing 8 new torsion metrics and a rigorous thermodynamic analysis across DPO, RLHF, and SFT checkpoints.
MENTIS · Headline Metrics
────────────────────────────────────────────────────────
DPO torsion suppression (Mistral, layer 27)   44.4% · Cohen's d = 0.741 ***
Bonferroni-corrected p-value                  7.7 × 10⁻¹³
H1 normative amplification                    1500× (vs factual concepts)
Thermodynamic gap (normative/factual)         10×
Entropy–torsion bridge (Mistral)              ρ = −0.387, p = 5.43 × 10⁻³⁰
DTW–torsion lower bound                       DC(w) ≥ 0.875 · |Σ‖Sᴵᵀ‖_F − Σ‖Sᴾᴬ‖_F|
17 geometric metrics · 3×2 SFT/DPO model pairs · 500 unsafe prompts
────────────────────────────────────────────────────────
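The DTW–torsion bound pairs a dynamic-time-warping cost over layer-wise torsion profiles with a Frobenius-norm gap. The 0.875 constant and the precise definition of DC(w) come from the paper; the sketch below only shows the two ingredients (the classic DTW recurrence and the norm gap) on synthetic profiles:

```python
import numpy as np

def dtw_cost(s1: np.ndarray, s2: np.ndarray) -> float:
    """Classic O(nm) dynamic-time-warping cost between two 1-D sequences."""
    n, m = len(s1), len(s2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(s1[i - 1] - s2[j - 1])
            cost[i, j] = step + min(cost[i - 1, j],       # insertion
                                    cost[i, j - 1],       # deletion
                                    cost[i - 1, j - 1])   # match
    return float(cost[n, m])

# Synthetic layer-wise torsion profiles for an instruction-tuned (IT)
# and a preference-aligned (PA) checkpoint
rng = np.random.default_rng(2)
s_it = rng.random(32)
s_pa = s_it + 0.1                           # PA profile uniformly shifted
warp = dtw_cost(s_it, s_pa)
gap = abs(np.linalg.norm(s_it) - np.linalg.norm(s_pa))   # norm-gap ingredient
```

In practice the `dtaidistance` package (listed in the stack below) computes the same quantity far faster; the explicit recurrence is shown only to make the bound's left-hand side concrete.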
3 · AgriTalk
PhD Research Programme · GreenFieldData Competition
AgriTalk proposes calibrated natural-language control interfaces for agricultural spray robots, built on three pillars missing from existing approaches: formal safety guarantees, mechanistic explainability via BVF attribution, and streaming grounding under sensor dropout.
| Contribution | Core Guarantee | Target Venue |
|---|---|---|
| C1 Conformal NLU (RAPS) | P(y ∈ C(x)) ≥ 95% under seasonal shifts, HITL ≤ 25% | EMNLP/ACL 2027 |
| C2 BVF Attribution | Kendall τ(IG, BVF) > 0.5 on safety-critical intents | ACL 2029 |
| C3 Temporal Streaming Architecture | Grounding recall maintained at 10–50% sensor dropout | VLDB 2028 |
| C4 Conformal Trust Evaluation | BVF explanations achieve superior trust calibration vs CoT | FAccT 2029 |
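C1's ≥ 95% coverage target is the standard split-conformal guarantee. Below is a minimal sketch using RAPS-style regularised scores (Angelopoulos et al.); the hyperparameters, function names, and toy data are illustrative assumptions, not the project's calibrated system:

```python
import numpy as np

def raps_threshold(probs, labels, alpha=0.05, lam=0.01, k_reg=2):
    """Calibrate a RAPS-style conformal threshold on held-out (probs, labels)."""
    n = len(labels)
    scores = np.empty(n)
    for i, (p, y) in enumerate(zip(probs, labels)):
        order = np.argsort(-p)                    # classes, most probable first
        rank = int(np.where(order == y)[0][0])    # 0-based rank of true label
        cum = p[order][: rank + 1].sum()          # mass needed to include y
        scores[i] = cum + lam * max(0, rank + 1 - k_reg)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(scores, level, method="higher"))

def raps_set(p, qhat, lam=0.01, k_reg=2):
    """Prediction set: add classes, most probable first, until the score passes qhat."""
    order = np.argsort(-p)
    cum, out = 0.0, []
    for r, c in enumerate(order):
        cum += p[c]
        out.append(int(c))
        if cum + lam * max(0, r + 1 - k_reg) >= qhat:
            break
    return out

# Toy calibration split: the true intent gets probability 0.8, the rest 0.04 each
rng = np.random.default_rng(4)
n_cal, n_classes = 100, 6
labels = rng.integers(0, n_classes, size=n_cal)
probs = np.full((n_cal, n_classes), 0.04)
probs[np.arange(n_cal), labels] = 0.8
qhat = raps_threshold(probs, labels)
```

The HITL ≤ 25% target then corresponds to bounding how often the returned set has size > 1 (ambiguous sets escalate to a human).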
5-layer safety stack: Input Sanitiser → Staleness Verifier → Conformal Predictor (RAPS) → Attribution Sufficiency Gate → Non-bypassable HITL for ABORT / EMERGENCY_STOP
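The stack can be sketched as a gate chain; all names and signatures here are illustrative assumptions (the real stack sits over streaming robot state), but they show the intended ordering and the non-bypassable final gate:

```python
from typing import Callable

CRITICAL = {"ABORT", "EMERGENCY_STOP"}   # actions that always require a human

def run_stack(utterance: str, conformal_set: list[str], fresh: bool,
              attribution_ok: bool, human_approve: Callable[[str], bool]) -> str:
    """Illustrative 5-layer gate chain; each layer can refuse or escalate."""
    if not utterance.strip():            # 1. input sanitiser
        return "REJECT"
    if not fresh:                        # 2. staleness verifier on sensor state
        return "REJECT"
    if len(conformal_set) != 1:          # 3. conformal predictor is ambiguous
        return "ESCALATE_TO_HUMAN"
    if not attribution_ok:               # 4. attribution sufficiency gate
        return "ESCALATE_TO_HUMAN"
    command = conformal_set[0]
    if command in CRITICAL:              # 5. non-bypassable HITL
        return command if human_approve(command) else "REJECT"
    return command
```

Note the ordering: safety-critical commands reach execution only through the human gate, regardless of how confident the upstream predictor is.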
4 · Epistemic inheritance in merged LLMs via Fisher-Rao geometry (active)
Unifies fine-tuning, alignment, distillation, and merging as measurable deformations of depth-wise semantic flow. Cultural nDNA is measured via spectral curvature deviation Δκ_ℓ and thermodynamic length divergence ΔL_ℓ across 8 cultural axes: African · Latin American · South Asian · East Asian · Arabic · Indigenous · European · Pacific Islander.
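One way to operationalise the two divergences, assuming the layer trajectory is summarised by per-layer mean hidden states, with discrete turning angles as a Euclidean stand-in for the Fisher-Rao quantities (names and data here are illustrative):

```python
import numpy as np

def depthwise_flow_stats(mu: np.ndarray):
    """Per-layer turning angles and total path length of a layer trajectory.

    mu: (L, d) mean hidden state per layer. The turning angle between
    consecutive layer-to-layer steps is a discrete curvature proxy; the
    summed step norm is a Euclidean stand-in for thermodynamic length.
    """
    v = np.diff(mu, axis=0)                        # (L-1, d) depth-wise steps
    norms = np.linalg.norm(v, axis=1)
    cos = np.sum(v[:-1] * v[1:], axis=1) / (norms[:-1] * norms[1:])
    kappa = np.arccos(np.clip(cos, -1.0, 1.0))     # (L-2,) turning angles
    return kappa, float(norms.sum())

# Compare a base checkpoint against a (synthetic) merged one
rng = np.random.default_rng(3)
mu_base = np.cumsum(rng.standard_normal((10, 8)), axis=0)
mu_merged = mu_base + 0.05 * rng.standard_normal((10, 8))
k_base, len_base = depthwise_flow_stats(mu_base)
k_merged, len_merged = depthwise_flow_stats(mu_merged)
delta_kappa = np.abs(k_base - k_merged)            # Δκ_ℓ analogue, per layer
delta_len = abs(len_base - len_merged)             # ΔL analogue, scalar
```

A perfectly straight trajectory has zero curvature everywhere, so a merged model's Δκ_ℓ profile localises where the merge bends the semantic flow.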
| Year | Venue | Title | Role |
|---|---|---|---|
| 2026 | MechInterp | GRAFT: Geometric Representations of Alignment's Fingerprint in Transformer Belief Trajectories | First author |
| 2026 | NeurIPS 2026 (prep) | MENTIS: What Belief Changes Under Alignment? Multi-Scale Latent Torsion in LLMs | First author · P.P. Saha, S. Raina, M. Parvatikar, A. Dhanda, V. Jain, A. Chadha, A. Das |
| 2025 | Preprint | SPINAL: Scaling-law and Preference Integration in Neural Alignment Layers | Co-author |
| 2025 | NeurIPS 2025 Workshop | Prompting Away Stereotypes? Evaluating Bias in Text-to-Image Models for Occupations | Co-author · arXiv |
| 2025 | Journal | Enhancing Human Empathy in Conversations Using Transformer-Based Models | Top contributor · DOI |
| 2024 | SpringerNature ICOMP'24 | Collaborative Federated Learning Cloud Based System | First author · Paper |
🔬 Geometric & Mathematical ML · 🤖 LLMs & Alignment
⚙️ Stack: PyTorch · Transformers (HF)
OLMo · Llama · Mistral
Qwen · DeepSeek · Zephyr
LangChain · LlamaIndex
NumPy · SciPy · Plotly
Docker · Azure · AWS
LaTeX · MetaFlow
dtaidistance · scikit-learn
| Award | Details |
|---|---|
| 🛡️ BlueDot Impact Scholar | AGI Strategy + Technical AI Safety (2025–2026): catastrophic risk, power-seeking, geometric alignment evaluation |
| 🔬 LASR Labs | Progressed through initial selection · mechanistic interpretability research programme |
| NeurIPS 2025 Reviewer | MTI-LLM Workshop |
| 🖥️ 5× Google Colab Pro A100/H100 | 300 GPU units each · Neuromatch Academy AI Safety grant |
| ☁️ AWS AI & ML Scholar | Udacity, 2025 |
| Armenian LLM Summer School 2025 | 90% scholarship |
| SPAR Demo Day 2025 | AI Safety & Alignment demonstration · Neuromatch / AI Safety cohort |
| Duke ML Summer School 2025 | Competitive selection |
| Cohere Summer School 2025 | Competitive selection |
| 🏛️ University of Chicago DSI 2024 | AI-Science Research Program · Eric & Wendy Schmidt Postdoctoral Fellowship |
| MLx Generative AI Fellowship | Oxford ML Summer School 2024 & 2025 · competitive scholarship award |
| Athens NLP Summer School 2024 | Competitive international selection · NLP & large language models |
| 🧠 diiP Summer School 2024 | Paris · Deep Learning & Interpretability in Practice · competitive selection |
| 🤖 Neuromatch Academy | Deep Learning Summer School · competitive global selection |
| 🗽 NYU AI Summer School 2022 | New York University · competitive selection |
| 🤖 AI4 IMPACT Scholar | AI Singapore 2021 · selected AI practitioner programme |
| 💡 Google Developers Program | Google 2019 · Google Developer Expert community selection |
| Udacity Bertelsmann Tech Scholarship | Google-sponsored · competitive global selection |
| Repository | Description | Status |
|---|---|---|
| graft-belief-geometry | GRAFT: post-hoc geometric audit of alignment · MechInterp | 🟢 Active |
| AgriTalk | Calibrated NLU for agricultural spray robots · conformal prediction + BVF attribution | 🟢 Active |
| torsional-belief-vector-field | TBVF / MENTIS · Riemannian torsion framework for alignment auditing | 🟢 Active |
| AutoResearchClaw | Autonomous AI research pipeline: idea → full conference-ready paper | 🟢 Contributed |
| pps121.github.io | Full academic portfolio · research, publications, CV | 🟢 Live |
────────────────────────────────────────────────────────────────
Lecturer in CS · Nalhati Govt. Polytechnic College · 2021–present
BITS Pilani M.Tech · GPA 9.08/10 · Top 5%
50+ students supervised · 4 active research projects
Teaching Assistant · BITS Pilani (M.Tech Programme) · 2021–2023
NLP Applications [Winter 2023] · Deep Learning [Fall 2021]
Deep Reinforcement Learning [Spring 2021]
Honorarium: USD 2,513.11 across 3 courses
Lead Data Scientist · Wipro Limited (Bangalore) · 2021
Conversational AI · IBM Watson · 0.3M users
🔬 Senior Data Scientist · BirlaSoft / Johnson & Johnson R&D · 2017–2019
Medical search engine · SciBERT/spaCy · 0.1M+ users
🛡️ Project Engineer · IIT Kanpur · 2016–2017
Threat intelligence system · cybersecurity research
🧬 Senior Systems Engineer · Infosys Technologies (Chennai) · 2011–2015
DNA alignment algorithms · Multiple Myeloma genomics
Top 10 cancer-driving genes · 3 research papers
────────────────────────────────────────────────────────────────
I am actively seeking fully-funded PhD positions starting 2026 and open to:
- Research collaborations in mechanistic interpretability, geometric ML, or AI safety
- PhD discussions with faculty at world-class AI safety & interpretability groups
- Partnerships validating geometric torsion findings against circuit-level causal analysis
- Applied work connecting geometric alignment theory to deployment-time safety monitoring
If you work at Anthropic · Redwood Research · ARC Evals · Oxford FHI · Cambridge LTL · MIT · CMU · or any world-class AI safety group, I would love to talk.

