Detection and identification of individuals
idtracker.ai uses two deep convolutional neural networks (CNNs): one detects when animals touch or cross (the deep crossing detector), and the other identifies individual animals (the identification network).
The process starts by capturing video at 25 to 50 frames per second. This rate lets the system collect images belonging to the same individual and organise them into fragments for accurate tracking and identification, without overloading it with redundant image data.
Images representing either a single animal or several touching animals are extracted from the video. Each image is labelled as either a single individual or a crossing. Groups of images in consecutive frames of the video in which the same individual (or crossing) is represented are named individual fragments and crossing fragments, respectively. idtracker.ai tracks the individuals by relying on their visual features.
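As a rough illustration of how fragments arise from per-frame labels, the sketch below splits the label sequence of one tracked blob into runs of consecutive frames. It is a simplification, not idtracker.ai's code: the function name `split_into_fragments` and the input format (one label per frame for an already-linked blob) are assumptions made for this example.

```python
from itertools import groupby


def split_into_fragments(labels):
    """Split a per-frame label sequence into fragments.

    `labels` holds one label per frame ("individual" or "crossing") for a
    single blob followed through the video (hypothetical input format).
    Returns a list of (label, first_frame, last_frame) tuples.
    """
    fragments = []
    frame = 0
    # groupby yields one group per run of identical consecutive labels.
    for label, run in groupby(labels):
        length = len(list(run))
        fragments.append((label, frame, frame + length - 1))
        frame += length
    return fragments


# Three frames alone, two frames touching another animal, four frames alone.
track = ["individual"] * 3 + ["crossing"] * 2 + ["individual"] * 4
fragments = split_into_fragments(track)
print(fragments)
# [('individual', 0, 2), ('crossing', 3, 4), ('individual', 5, 8)]
```

Note that the two individual fragments on either side of the crossing are separate fragments: nothing at this stage says they show the same animal, which is why the identification step is needed.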
“The main challenge for us was to accurately track each of the individuals. We wanted to record a large group of juvenile zebrafish with a density of animals low enough to allow the group to express a different range of behaviours. Hence, we needed to cover a big area in proportion to the actual size of the animals. This is when having a high resolution camera came handy. With the 20 megapixels of the HT-20000-M we were able to cover a wide area and still have enough pixels per animal for idtracker.ai to work”, says Francisco Romero-Ferrero, PhD student at the Collective Behaviour Lab, Champalimaud Research.
A subset of the collection of individual fragments, in which all the individuals are visible in the same part of the video, is then used to generate a dataset of individual images labelled with the corresponding identities. This dataset is then used to train the second CNN, the identification network, to classify images according to their identity.
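The selection of that subset can be sketched as follows, under stated assumptions: each individual fragment is represented here as a dictionary with hypothetical `start`, `end` and `images` keys, and the helper names are invented for this example. The sketch looks for a frame in which exactly as many individual fragments coexist as there are animals, then gives each of those fragments an arbitrary identity label.

```python
def find_coexisting_fragments(fragments, n_animals):
    """Return n_animals individual fragments that share a frame, or None.

    Only fragment start frames are checked, which is enough for this
    sketch because the set of visible fragments grows only at a start.
    """
    for frame in sorted({f["start"] for f in fragments}):
        visible = [f for f in fragments if f["start"] <= frame <= f["end"]]
        if len(visible) == n_animals:
            return visible
    return None


def build_labelled_dataset(coexisting):
    """Pair every image with the (arbitrary) identity of its fragment."""
    dataset = []
    for identity, fragment in enumerate(coexisting):
        dataset.extend((image, identity) for image in fragment["images"])
    return dataset


# Toy data: image names stand in for real image arrays.
individual_fragments = [
    {"start": 0, "end": 10, "images": ["a0", "a1"]},
    {"start": 5, "end": 20, "images": ["b0"]},
    {"start": 8, "end": 9, "images": ["c0", "c1"]},
]
coexisting = find_coexisting_fragments(individual_fragments, n_animals=3)
dataset = build_labelled_dataset(coexisting)
print(dataset)
# [('a0', 0), ('a1', 0), ('b0', 1), ('c0', 2), ('c1', 2)]
```

Because every animal is visible separately in the chosen frames, each fragment must belong to a different animal, so the labels can be assigned without any prior knowledge of identity.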
The information gained from this first dataset of labelled images makes it possible either to assign identities accurately to the entire collection of individual fragments, or to enlarge the dataset by incorporating safely identified individual fragments from throughout the video. Finally, trivial identification errors are corrected by a series of post-processing routines, and the identities of the animals in the crossing fragments are inferred in a final computational step.
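One simple way to picture the "safely identified" decision is to pool the identification network's per-image outputs over a whole fragment and accept the result only when it is confident. The sketch below does this by averaging class probabilities. The averaging rule, the function name `assign_fragment_identity` and the `min_confidence` threshold are assumptions for illustration and may differ from the criterion idtracker.ai actually applies.

```python
def assign_fragment_identity(image_probs, min_confidence=0.9):
    """Return the identity for a fragment, or None if it is not safe.

    `image_probs` holds one probability vector per image in the fragment
    (hypothetical network output). `min_confidence` is an illustrative
    threshold, not a documented idtracker.ai parameter.
    """
    n_identities = len(image_probs[0])
    n_images = len(image_probs)
    # Average the probability of each identity over all images.
    mean_probs = [
        sum(probs[i] for probs in image_probs) / n_images
        for i in range(n_identities)
    ]
    best = max(range(n_identities), key=mean_probs.__getitem__)
    return best if mean_probs[best] >= min_confidence else None


# A fragment whose images consistently point to identity 0.
confident = assign_fragment_identity([[0.95, 0.05], [0.97, 0.03]])
# A fragment whose images disagree: left unassigned for now.
ambiguous = assign_fragment_identity([[0.6, 0.4], [0.4, 0.6]])
print(confident, ambiguous)
```

Fragments that pass such a check could be added to the training dataset, while the rest stay unassigned until the network has been trained on more data.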