[TOC]
Content
-
- note:
- abstract:
-
2510.10481 UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models
- note: notes
- abstract: Extends LLaDA to long contexts
-
2509.24389 LLaDA-MoE: A Sparse MoE Diffusion Language Model
- note: notes
- abstract: Replaces LLaDA's dense Transformer with an MoE architecture
-
2509.13866 Masked Diffusion Models as Energy Minimization
- Not yet read
-
2507.06203 A Survey on Latent Reasoning
- Not yet read
-
2505.19223 LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
- note: notes (fairly difficult; not finished reading)
- abstract: Proposes the VRPO method for applying reinforcement learning to LLaDA
-
2505.16933 LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
- note: notes
- abstract: Implements a multimodal LLaDA model capable of image understanding
-
2502.09992 Large Language Diffusion Models
- note: notes
- abstract: LLaDA