Ksenia Se

Kseniase

https://www.turingpost.com/

Recent Activity

replied to their post 2 days ago

10 Latest Preference Optimization Techniques

Models need feedback on what makes outputs “good” or “bad.” Policy optimization (PO) turns preferences and rewards into actual training signals. This field is evolving quickly, moving far beyond classics like PPO and GRPO. So here is our overview of the 10 newest PO methods:

1. Pref-GRPO → https://huggingface.co/papers/2508.20751
Stabilizes text-to-image reinforcement learning (RL) with pairwise preference rewards and introduces UNIGENBENCH, a unified benchmark.

2. PVPO (Policy with Value Preference Optimization) → https://huggingface.co/papers/2508.21104
This critic-free RL method uses a pre-trained model as a reference anchor to reduce bias and guide learning, and selects high-value examples through data pre-sampling.

3. DCPO (Dynamic Clipping Policy Optimization) → https://huggingface.co/papers/2509.02333
Uses dynamic clipping, which adjusts probability limits per token for better token exploration, and smooth reward standardization, which balances rewards over training steps and prevents wasted updates.

4. ARPO (Agentic Reinforced Policy Optimization) → https://huggingface.co/papers/2507.19849
Optimizes multi-turn LLM agents that use external tools. It uses an entropy-based adaptive rollout to explore after tool use and an advantage attribution method to better assign credit across steps, leading to more efficient tool use with fewer resources.

5. GRPO-RoC (Group Relative Policy Optimization with Resampling-on-Correct) → https://huggingface.co/papers/2508.20722
Oversamples rollouts, then resamples them to keep diverse mistakes and only the highest-quality correct answers. This reduces noise and leads to stronger reasoning in a code environment.

Read further below ⬇️
If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

posted an update 2 days ago


replied to their post 9 days ago

11 Powerful Image Models

Everyone is buzzing about image generation this week, or more specifically, Google's Nano-Banana. So today we want to share a list of models that can make a great toolkit for image generation + editing + multi-turn refinement.

1. Gemini 2.5 Flash Image, or Nano-Banana → https://deepmind.google/models/gemini/image/
Google’s newest image model with conversational editing, character consistency, and multi-image fusion. Available in AI Studio and the Gemini API.
Price: $2.50 per 1M tokens

2. FLUX (Black Forest Labs) → https://bfl.ai/
A family of models known for rich detail, excellent prompt adherence, and fast iterative generation. Offered in several variants, from Pro to open-source, it's accessible via Hugging Face, Replicate, Azure AI Foundry, etc., and used as a base in many pipelines.
Price: $0.025-0.08 per image

3. Midjourney v7 → https://www.midjourney.com/
Offers enhanced image fidelity, prompt comprehension, and anatomical coherence (hands, bodies, objects), plus a smart lightbox editor. The Omni-reference tool improves character and object consistency in your images. It remains accessible via Discord with a supporting web interface.
Price: $10-60/month

4. Stable Diffusion 3.5 (Stability AI) → https://stability.ai/stable-image
Open-weights line with improved text rendering, photorealism, and prompt adherence compared to earlier versions. It introduces technical innovations through its MMDiT architecture.
Price: $0.025-0.065 per image

5. OpenAI GPT-Image-1 → https://platform.openai.com/docs/guides/image-generation?image-generation-model=gpt-image-1
It's the same multimodal model that powers ChatGPT's image capabilities, offering high-fidelity image generation, precise edits (including inpainting), and accurate text rendering. Available via the Images API.
Price: $40 per 1M tokens

Read further below ⬇️
If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

published an article 2 months ago

Article

What Coding Agent Wins?

By

and 1 other •

Jun 26

• 7

published an article 4 months ago

Article

🦸🏻#17: What is A2A and why is it – still! – underappreciated?

By

•

May 7

• 13

published an article 4 months ago

Article

What is MoE 2.0? Update Your Knowledge about Mixture-of-experts

By

and 1 other •

Apr 27

• 9

published an article 5 months ago

Article

Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What’s Really Changing in Transformers?

By

and 1 other •

Apr 4

• 14

published an article 6 months ago

Article

FOD#93: When AI meant Ambient Intelligence

By

•

Mar 25

• 1

published an article 6 months ago

Article

🎙️🧩 TP/Inference: Sharon Zhou on AI Hallucinations, Agents Hype, and Giving Developers the Keys to GenAI

By

•

Mar 24

• 2

published an article 6 months ago

Article

What is Qwen-Agent framework? Inside the Qwen family

By

and 1 other •

Mar 20

• 12

published an article 6 months ago

Article

🌁#92: Fight for Developers and the Year of Orchestration

By

•

Mar 18

• 5

published an article 6 months ago

Article

🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly! – Talking About It?

By

•

Mar 17

• 335

published an article 6 months ago

Article

How to Reduce Memory Use in Reasoning Models

By

and 1 other •

Mar 13

• 14

published an article 6 months ago

Article

🌁#91: We are failing in AI literacy

By

and 1 other •

Mar 10

• 3

published an article 6 months ago

Article

🌁#90: Why AI’s Reasoning Tests Keep Failing Us

By

•

Mar 3

• 9

published an article 6 months ago

Article

🦸🏻#13: Action! How AI Agents Execute Tasks with UI and API Tools

By

•

Mar 10

• 9

published an article 6 months ago

Article

🦸🏻#12: How Do Agents Learn from Their Own Mistakes? The Role of Reflection in AI

By

•

Mar 9

• 8

published an article 6 months ago

Article

Everything You Need to Know about Knowledge Distillation

By

and 1 other •

Mar 6

• 40

published an article 6 months ago

Article

Inside the family of Smol models

By

and 1 other •

Feb 27

• 13

published an article 7 months ago

Article

🌁#89: AI in Action: How AI Engineers, Self-Optimizing Models, and Humanoid Robots Are Reshaping 2025

By

•

Feb 25

• 4

published an article 7 months ago

Article

🦸🏻#11: How Do Agents Plan and Reason?

By

•

Feb 24

• 15

published an article 7 months ago

Article

Topic 28: What is Mixture-of-Mamba?

By

and 1 other •

Feb 20

• 3

published an article 7 months ago

Article

🌁#88: Can DeepSeek Inspire Global Collaboration?

By

•

Feb 17

• 3