Models: 1,283

Active filters: reinforcement-learning, transformers

MattBou00/llama-3-2-1b-detox_v1f_round4-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Aug 21 • 12

MattBou00/llama-3-2-1b-detox_v1f_round4-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated Aug 21 • 10

MattBou00/llama-3-2-1b-detox_v1f_round4-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated Aug 21 • 13

MattBou00/llama-3-2-1b-detox_v1f_round4-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated Aug 21 • 12

MattBou00/llama-3-2-1b-detox_v1f_round4-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated Aug 21 • 11

MattBou00/llama-3-2-1b-detox_v1f_round4

Reinforcement Learning • 1B • Updated Aug 21 • 6

MattBou00/llama-3-2-1b-detox_retry-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated Aug 25 • 6

mradermacher/VeriReason-codeLlama-7b-RTLCoder-Verilog-GRPO-reasoning-tb-GGUF

Reinforcement Learning • 7B • Updated Aug 26 • 226

mradermacher/SLM-SQL-Base-1.3B-GGUF

Reinforcement Learning • 1B • Updated Aug 26 • 59

mradermacher/SLM-SQL-Base-1B-GGUF

Reinforcement Learning • 1B • Updated Aug 26 • 158

mradermacher/SLM-SQL-1.3B-GGUF

Reinforcement Learning • 1B • Updated Aug 27 • 83

yuerubywang/ppo-pythia2.8b-ultra200k

Reinforcement Learning • 3B • Updated 28 days ago • 7

mradermacher/ATLAS-8B-Thinking-GGUF

Reinforcement Learning • 8B • Updated 17 days ago • 2.13k

mradermacher/ATLAS-8B-Thinking-i1-GGUF

Reinforcement Learning • 8B • Updated 17 days ago • 2.47k

MattBou00/llama-3-2-1b-detox_v1f_testing_sameaseval-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated 12 days ago • 14

MattBou00/llama-3-2-1b-detox_v1f_testing_sameaseval-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated 12 days ago • 16

MattBou00/llama-3-2-1b-detox_v1f_testing_sameaseval-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated 12 days ago • 10

MattBou00/llama-3-2-1b-detox_v1f_testing_sameaseval-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated 12 days ago • 31

MattBou00/llama-3-2-1b-detox_RETRY_scale15-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated 12 days ago • 8

MattBou00/llama-3-2-1b-detox_RETRY_scale15-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated 12 days ago • 7

MattBou00/llama-3-2-1b-detox_RETRY_scale15-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated 12 days ago • 6

MattBou00/llama-3-2-1b-detox_RETRY_scale15-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated 12 days ago • 6

MattBou00/llama-3-2-1b-detox_RETRY_scale15-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated 12 days ago • 6

MattBou00/llama-3-2-1b-detox_RETRY_scale15

Reinforcement Learning • 1B • Updated 12 days ago • 3

MattBou00/llama-3-2-1b-detox_RETRY_scale10-checkpoint-epoch-20

Reinforcement Learning • 1B • Updated 12 days ago • 10

MattBou00/llama-3-2-1b-detox_RETRY_scale10-checkpoint-epoch-40

Reinforcement Learning • 1B • Updated 12 days ago • 8

MattBou00/llama-3-2-1b-detox_RETRY_scale10-checkpoint-epoch-60

Reinforcement Learning • 1B • Updated 12 days ago • 6

MattBou00/llama-3-2-1b-detox_RETRY_scale10-checkpoint-epoch-80

Reinforcement Learning • 1B • Updated 12 days ago • 8

MattBou00/llama-3-2-1b-detox_RETRY_scale10-checkpoint-epoch-100

Reinforcement Learning • 1B • Updated 12 days ago • 18

MattBou00/llama-3-2-1b-detox_RETRY_scale10

Reinforcement Learning • 1B • Updated 12 days ago • 4