CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
Personal Notes · A Guidebook on Large Language Model (LLM) Evaluation
OpenCSG Chinese Corpus: A Series of High-Quality Chinese Datasets for LLM Training
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Using LLM-Generated Evaluation Metrics to Assist Data Annotation for Reward Model Training