Blog, Articles, and Discussions

Community Articles


The Pharmome Map: a comprehensive public dataset for drug-target interaction modeling

5 days ago

•

KV Caching Explained: Optimizing Transformer Inference Efficiency

Jan 30

•

177

To Think or Not to Think: A Router for Hybrid LLMs

6 days ago

•

Uncensor any LLM with abliteration

Jun 13, 2024

•

722

🧠 SQaLe: Enabling new Text-to-SQL models with our massive dataset

4 days ago

•

Why Did MiniMax M2 End Up as a Full Attention Model?

24 days ago

•

The 1 Billion Token Challenge: Finding the Perfect Pre-training Mix

20 days ago

•

Code a simple RAG from scratch

Oct 29, 2024

•

248

Visualizing How VLMs Work

Oct 7

•

Granite 4.0 Nano: Just how small can you go?

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

June 3, 2025

vlm, multimodal, trl

Preference Optimization for Vision Language Models

July 10, 2024

research, rl, rlhf

Putting RL back in RLHF

107

June 12, 2024

research, rl, rlhf

Constitutional AI with Open LLMs

February 1, 2024

rl, rlhf, nlp

Preference Tuning LLMs with Direct Preference Optimization Methods

January 18, 2024

research, rl, rlhf

The N Implementation Details of RLHF with PPO

October 24, 2023

guide, diffusers, rl

Finetune Stable Diffusion Models with DDPO via TRL

September 29, 2023

rl, rlhf, nlp

Fine-tune Llama 2 with DPO

August 8, 2023

rl, rlhf, nlp

StackLLaMA: A hands-on guide to train LLaMA with RLHF

April 5, 2023

rl, rlhf, nlp

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

March 9, 2023

llms, rlhf, red-teaming

Red-Teaming Large Language Models

February 24, 2023

rlhf, ChatGPT, cot

What Makes a Dialog Agent Useful?

January 24, 2023

rlhf, rl, guide

Illustrating Reinforcement Learning from Human Feedback (RLHF)

373

December 9, 2022

Community Articles

We’re open-sourcing our text-to-image model and the process behind it

11 days ago

•

Apriel-H1: The Surprising Key to Distilling Efficient Reasoning Models

4 days ago

•

Text-to-image Architectural Experiments

9 days ago

•

Introducing Cogito v2.1

3 days ago

•

Projected Abliteration

28 days ago

•

AI Model Optimization More Flexible Than Ever

5 days ago

•

The Heterogeneous Feature of RoPE-based Attention in Long-Context LLMs

7 days ago

•

How to make NeuTTS-air generate over 200 seconds of audio in a single second.

1 day ago

•

ViDoRe V3: a comprehensive evaluation of retrieval for enterprise use-cases

18 days ago

•

Norm-Preserving Biprojected Abliteration

16 days ago

•
