External Resources

  • ParaCrawl: Broader/Continued Web-Scale Provision of Parallel Corpora for European Languages
  • Common Crawl: an open repository of web crawl data that can be accessed and analyzed by anyone
  • OSCAR: Open Super-large Crawled ALMAnaCH coRpus, a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus
  • OPUS-MT: neural machine translation (NMT) systems trained on all the languages and corpora available in OPUS