Determining the accuracy of crowdsourced tweet verification for auroral research

The article “Determining the accuracy of crowdsourced tweet verification for auroral research” was recently accepted for publication in the journal Citizen Science: Theory and Practice. Nathan was the lead author of the article, which measures how accurately citizen scientists can determine whether tweets mentioning “aurora” are genuine sightings of the natural phenomenon.

Abstract

The Aurorasaurus project harnesses volunteer crowdsourcing to identify sightings of an aurora (the “northern/southern lights”) posted by citizen scientists on Twitter. Previous studies have demonstrated that aurora sightings can be mined from Twitter, with the caveat that there is a large background level of non-sighting tweets, especially during periods of low auroral activity. Aurorasaurus attempts to mitigate this, and thus increase the quality of its Twitter sighting data, by using volunteers to sift through a pre-filtered list of geolocated tweets to verify real-time aurora sightings. In this study, the current implementation of this crowdsourced verification system, including the process of geolocating tweets, is described and its accuracy (which, overall, is found to be 68.4%) is determined. The findings suggest that citizen science volunteers are able to accurately filter out unrelated, spam-like Twitter data but struggle when filtering out somewhat related yet undesired data. The citizen scientists particularly struggle with determining the real-time nature of the sightings, so care must be taken when relying on crowdsourced identification.
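To make the kind of accuracy figure quoted above concrete, here is a minimal, hypothetical sketch of how crowdsourced verification accuracy could be computed: take the majority vote of the volunteer labels for each tweet and compare it against an expert (ground-truth) label. The data structure, field names, and labels are illustrative assumptions, not taken from the Aurorasaurus codebase or the article.

```python
# Hypothetical sketch: accuracy of majority-vote crowd verification
# against expert labels. All names and data here are illustrative.
from collections import Counter

def majority_label(votes):
    """Return the most common volunteer label for a tweet."""
    return Counter(votes).most_common(1)[0][0]

def verification_accuracy(tweets):
    """Fraction of tweets where the crowd's majority vote matches the expert label."""
    correct = sum(
        1 for t in tweets if majority_label(t["votes"]) == t["expert_label"]
    )
    return correct / len(tweets)

# Example: the crowd gets the first tweet right and the second wrong -> accuracy 0.5
example = [
    {"votes": ["sighting", "sighting", "not_sighting"], "expert_label": "sighting"},
    {"votes": ["sighting", "sighting"], "expert_label": "not_sighting"},
]
print(verification_accuracy(example))  # 0.5
```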

The article is open access and free to download from the publisher’s site.