Build reliable customer-facing AI agents with Parlant: an interaction control harness optimized for controlled, consistent, and predictable LLM interactions.
-
Updated
May 7, 2026 - Python
Build reliable customer-facing AI agents with Parlant: an interaction control harness optimized for controlled, consistent, and predictable LLM interactions.
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022
A curated list of trustworthy deep learning papers. Daily updating...
📚 A curated list of papers & technical articles on AI Quality & Safety
The open-source diagnostic for AI misalignment. 32 tests across fabrication, manipulation, deception, unpredictability, and opacity. Provider-agnostic. Runs against OpenAI, Anthropic, Bedrock, Azure, Gemini, and more. Letter grade in under 5 minutes, content-addressed manifest for bit-identical replay. Built by iMe.
Code accompanying the paper Pretraining Language Models with Human Preferences
How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚
[AAAI'25 Oral] "MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector".
Reading list for adversarial perspective and robustness in deep reinforcement learning.
A curated list of awesome academic research, books, code of ethics, courses, databases, data sets, frameworks, institutes, maturity models, newsletters, principles, podcasts, regulations, reports, responsible scale policies, tools and standards related to Responsible, Trustworthy, and Human-Centered AI.
A curated list of awesome resources for Artificial Intelligence Alignment research
A rationalist ruleset for "debugging" LLMs, auditing their internal reasoning and uncovering biases; also a jailbreak.
Sparse probing paper full code.
[ICLR 2026] - Official repo for the paper: "RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models"
Official Implementation of Nabla-GFlowNet (ICLR 2025)
Website to track people, organizations, and products (tools, websites, etc.) in AI safety
Educational analysis of LLM alignment, safety behavior, and framing-sensitive response patterns.
Just like the elite potential of a high-drive Belgian Malinois, an AI system's raw capabilities are wasted when deployed without proper structure. The technological value is no longer found in creating the drive, but in mastering the leash. Synapptic gives your AI Assistant persistent memory that updates in real time, saving you tokens and time.
[TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
Add a description, image, and links to the ai-alignment topic page so that developers can more easily learn about it.
To associate your repository with the ai-alignment topic, visit your repo's landing page and select "manage topics."