Pretraining Data

Common Crawl, deduplication, quality filtering, domain mixing ratios, and data scaling laws.