A dataset for learning from crowd-sourced dot-annotations.
The penguin dataset is a collection of images of penguin colonies in Antarctica, drawn from the larger Penguin Watch project, which was set up to monitor changes in penguin populations. The images are taken by fixed cameras in over 40 different locations, which have been capturing an image per hour for several years. In order to track the colony sizes, the number of penguins in each of the images in the dataset is required.
So far, the penguin count has been done with the help of citizen scientists on the Penguin Watch site by Zooniverse, where interested users can place dots on top of the penguins. Here we release part of this data to the vision community so that methods can learn from the crowd-sourced dot-annotations to annotate these images automatically. For more information about the project, please visit Penguin Watch.
Each of the sites in the penguin dataset has different properties, which result in different levels of difficulty. For example, some cameras are placed to capture wide shots, where penguins appear in very low effective resolution. Other cameras are placed in such a way that the perspective creates constant occlusion. Factors external to the cameras, such as the weather on site, add further difficulty. Examples from different sites are shown below.
The annotations provided with the penguin dataset come from the Penguin Watch citizen-science site. These mainly consist of raw X-Y coordinates of dots placed by each of the site's volunteers, who are instructed to click on each penguin. In the following example we show the dots placed by eight different, colour-coded volunteers.
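Because each image carries dots from several volunteers, a first step when working with the annotations is often to summarise them per image. The sketch below is only an illustration and not the method used to build the dataset or the released model: it assumes the dots for one image have already been loaded into a dictionary mapping a volunteer identifier to a list of (x, y) tuples. That layout, the volunteer names, the coordinates and the function names are all hypothetical, and the actual MATLAB and JSON files may be organised differently. It takes the median of the per-volunteer dot counts as a crude consensus count.

```python
from statistics import median

# Hypothetical in-memory layout (an assumption, not the dataset's file format):
# one list of (x, y) dot coordinates per volunteer, for a single image.
dots_by_volunteer = {
    "volunteer_a": [(102.0, 340.5), (250.3, 410.0), (388.1, 295.7)],
    "volunteer_b": [(100.8, 338.9), (251.0, 412.2)],
    "volunteer_c": [(103.5, 341.0), (249.1, 409.4), (390.0, 297.0), (512.6, 120.3)],
}


def counts_per_volunteer(dots):
    """Return the number of dots each volunteer placed on the image."""
    return {name: len(points) for name, points in dots.items()}


def consensus_count(dots):
    """Crude consensus: the median of the per-volunteer dot counts.

    Returns 0 when no volunteer has annotated the image.
    """
    counts = counts_per_volunteer(dots)
    return median(counts.values()) if counts else 0


counts = counts_per_volunteer(dots_by_volunteer)
estimate = consensus_count(dots_by_volunteer)
print(counts)
print(estimate)
```

The median is used here because it is less sensitive than the mean to a single volunteer who clicks on far too few or far too many penguins. It ignores where the dots are, so it cannot tell whether two volunteers marked the same penguin.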
| Filename | Description | Size |
|---|---|---|
| README | Information and license for using this dataset | 2KB |
| Dataset | List of URLs to download the images (28GB), annotations (47MB) and splits (1.2MB) of the dataset | 1KB |
| Annotations | (Included in URLs above) Raw dot-annotations in MATLAB and JSON formats, and regions of interest | 47MB |
| Model | Our model trained for penguin counting in MatConvNet | 483MB |
We thank Dr. Tom Hart and team, as well as the Zooniverse group, for their leading role in the Penguin Watch project. Financial support was provided by the RCUK Centre for Doctoral Training in Healthcare Innovation (EP/G036861/1) and the EPSRC Programme Grant Seebibyte (EP/M013774/1).