SLäNDa, the Swedish literature corpus of narrative and dialogue, is a corpus made up of eight Swedish literary novels from the 19th and early 20th centuries, manually annotated mainly for different aspects of dialogue. The full annotation also contains other cited materials, like thoughts, signs and letters. The main motivation for including these categories as well, is to be able to identify the main narrative, which is all remaining unannotated text.
SLäNDa version 2.0 extends version 1.0 mainly by adding more data, but also by additional quality control, and a slight modification of the annotation scheme. In addition, the data is organized into test sets with different types of speech marking: quotation marks, dashes, and no marking.