Danish Fungi 2020 (DF20) is a fine-grained dataset and benchmark. The dataset, constructed from observations submitted to the Danish Fungal Atlas, is unique in its taxonomy-accurate class labels, small number of errors, highly unbalanced long-tailed class distribution, rich observation metadata, and well-defined class hierarchy. DF20 has zero overlap with ImageNet, allowing unbiased comparison of models fine-tuned from publicly available ImageNet checkpoints.
The dataset has 1,604 different classes, with 248,466 training images and 27,608 test images.
Several popular Machine Learning techniques are originally designed for the solution of two-class problems. However, several classification problems have more than two classes. One approach to deal with multiclass problems using binary classifiers is to decompose the multiclass problem into multiple binary sub-problems disposed in a binary tree. This approach requires a binary partition of the classes for each node of the tree, which defines the tree structure. This paper presents two algorithms to determine the tree structure taking into account information collected from the used dataset. This approach allows the tree structure to be determined automatically for any multiclass dataset.
The dataset with 409,679 images belonging to 772 snake species from 188 countries and all continents (386,006 images with labels targeted for development and 23,673 images without labels for testing). In addition, we provide a simple train/val (90% / 10%) split to validate preliminary results while ensuring the same species distributions. Furthermore, we prepared a compact subset (70,208 images) for fast prototyping. The test set data consists of 23,673 images submitted to the iNaturalist platform within the "first four months of 2021.
All data were gathered from online biodiversity platforms (i.e., iNaturalist, HerpMapper) and further extended by data scraped from Flickr. The provided dataset has a heavy long-tailed class distribution, where the most frequent species (Thamnophis sirtalis) is represented by 22,163 images and the least frequent by just 10 (Achalinus formosanus).