arxiv:2005.03521

The Danish Gigaword Project

Published on May 7, 2020

Abstract

The paper presents the Danish Gigaword Corpus, a large-scale Danish text corpus intended to address the lack of data that has limited Danish language technology.

Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely available one-billion-word corpus of Danish text. The Danish Gigaword Corpus covers a wide array of time periods, domains, speakers' socio-economic status, and Danish dialects.


