ai-alignment

Here are 408 public repositories matching this topic...

emcie-co / parlant

Build reliable customer-facing AI agents with Parlant: an interaction control harness optimized for controlled, consistent, and predictable LLM interactions.

python gemini openai customer-service hacktoberfest customer-success ai-agents ai-alignment llm genai llama3

Updated May 7, 2026
Python

agencyenterprise / PromptInject

Star

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022

machine-learning agi language-models ai-safety adversarial-attacks ai-alignment ml-safety gpt-3 large-language-models prompt-engineering chain-of-thought agi-alignment

Updated Apr 27, 2026
Python

MinghuiChen43 / awesome-trustworthy-deep-learning

Star

A curated list of trustworthy deep learning papers. Daily updating...

Updated May 2, 2026

Giskard-AI / awesome-ai-safety

Sponsor

Star

📚 A curated list of papers & technical articles on AI Quality & Safety

Updated Apr 14, 2025

ifixai-ai / iFixAi

Star

The open-source diagnostic for AI misalignment. 32 tests across fabrication, manipulation, deception, unpredictability, and opacity. Provider-agnostic. Runs against OpenAI, Anthropic, Bedrock, Azure, Gemini, and more. Letter grade in under 5 minutes, content-addressed manifest for bit-identical replay. Built by iMe.

Updated May 7, 2026
Python

tomekkorbak / pretraining-with-human-feedback

Star

Code accompanying the paper Pretraining Language Models with Human Preferences

reinforcement-learning gpt language-models ai-safety ai-alignment pretraining decision-transformers rlhf

Updated Feb 13, 2024
Python

lets-make-safe-ai / make-safe-ai

Star

How to Make Safe AI? Let's Discuss! 💡|💬|🙌|📚

ai agi artificial-intelligence artificial-general-intelligence ai-safety ai-alignment

Updated Mar 29, 2023

tsinghua-fib-lab / AAAI2025_MIA-Tuner

Star

[AAAI'25 Oral] "MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector".

ai-alignment membership-inference-attack large-language-models pretraining-data-detection

Updated Mar 17, 2025
Python

EzgiKorkmaz / adversarial-reinforcement-learning

Star

Reading list for adversarial perspective and robustness in deep reinforcement learning.

Updated Mar 2, 2026

AthenaCore / AwesomeResponsibleAI

Star

A curated list of awesome academic research, books, code of ethics, courses, databases, data sets, frameworks, institutes, maturity models, newsletters, principles, podcasts, regulations, reports, responsible scale policies, tools and standards related to Responsible, Trustworthy, and Human-Centered AI.

ai awesome-list ai-safety interpretable-ai explainable-ai xai ai-alignment fairness-ai responsible-ai human-centered-ai ethical-ai trustworthy-ai ai-regulation ai-governance ai-standards

Updated May 2, 2026

dit7ya / awesome-ai-alignment

Star

A curated list of awesome resources for Artificial Intelligence Alignment research

awesome awesome-list ai-safety ai-alignment

Updated Jul 14, 2023

Xayan / Rules.txt

Star

A rationalist ruleset for "debugging" LLMs, auditing their internal reasoning and uncovering biases; also a jailbreak.

Updated Nov 1, 2025

wesg52 / sparse-probing-paper

Star

Sparse probing paper full code.

ai-safety interpretability ai-alignment mechanistic-interpretability

Updated Dec 17, 2023
Jupyter Notebook

RLHFlow / Directional-Preference-Alignment

Star

Directional Preference Alignment

ai-alignment large-language-models rlhf

Updated Sep 23, 2024

knoveleng / redeval

Star

[ICLR 2026] - Official repo for the paper: "RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models"

evaluation ai-safety red-teaming ai-alignment

Updated Mar 2, 2026
Python

lzzcd001 / nabla-gfn

Star

Official Implementation of Nabla-GFlowNet (ICLR 2025)

generative-model finetuning ai-alignment diffusion-models gflownet

Updated May 3, 2025
Python

riceissa / aiwatch

Star

Website to track people, organizations, and products (tools, websites, etc.) in AI safety

mysql php database dataset ai-safety data-portal aisafety ai-alignment

Updated May 2, 2026
HTML

pablo-chacon / Spoon-Bending

Star

Educational analysis of LLM alignment, safety behavior, and framing-sensitive response patterns.

research model-alignment interpretability educational-project ai-alignment gpt4 large-language-models llm prompt-engineering reasoning-patterns gpt5 safety-research ai-bias context-engineering

Updated Nov 4, 2025

appcuarium / synapptic

Star

Just like the elite potential of a high-drive Belgian Malinois, an AI system's raw capabilities are wasted when deployed without proper structure. The technological value is no longer found in creating the drive, but in mastering the leash. Synapptic gives your AI Assistant persistent memory that updates in real time, saving you tokens and time.

ai agents ai-agents claude long-term-memory ai-alignment ai-tools ai-memory agent-skills claude-code claude-code-plugins claude-skills claude-memory

Updated Mar 25, 2026
Python

UCSC-VLAA / Sight-Beyond-Text

Star

[TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"

alignment vlm ai-alignment vision-language vicuna llm mllm llava llama2

Updated Sep 15, 2023
Python

Improve this page

Add a description, image, and links to the ai-alignment topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the ai-alignment topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ai-alignment

Here are 408 public repositories matching this topic...

emcie-co / parlant

agencyenterprise / PromptInject

MinghuiChen43 / awesome-trustworthy-deep-learning

Giskard-AI / awesome-ai-safety

ifixai-ai / iFixAi

tomekkorbak / pretraining-with-human-feedback

lets-make-safe-ai / make-safe-ai

tsinghua-fib-lab / AAAI2025_MIA-Tuner

EzgiKorkmaz / adversarial-reinforcement-learning

AthenaCore / AwesomeResponsibleAI

dit7ya / awesome-ai-alignment

Xayan / Rules.txt

wesg52 / sparse-probing-paper

RLHFlow / Directional-Preference-Alignment

knoveleng / redeval

lzzcd001 / nabla-gfn

riceissa / aiwatch

pablo-chacon / Spoon-Bending

appcuarium / synapptic

UCSC-VLAA / Sight-Beyond-Text

Improve this page

Add this topic to your repo