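【Common Crawl Web Languages: a crowd-sourced project that helps Common Crawl better crawl web pages in under-resourced languages, improving the coverage and accessibility of content in the world's many languages】'commoncrawl/web-languages: Crowd-sourced lists of urls to help Common Crawl crawl under-resourced languages.' GitHub: github.com/commoncrawl/web-languages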
Language diversity · Crowd-sourcing · Web crawling · AI创造营 (AI Creation Camp)