The growing presence of devices with built-in digital cameras, such as mobile phones and tablets, combined with ever-improving internet connectivity, has enabled ordinary citizens, including victims of human rights abuse and participants in armed conflicts, protests, and disaster situations, to capture specific events and share the resulting images and videos via social media networks. Furthermore, having set the performance benchmarks for image, video, speech, and audio processing, deep convolutional networks have driven the greatest recent advances in image recognition. This raises the question of whether there is any benefit in applying these remarkable deep architectures to the so far unattempted task of recognising human rights violations in digital images. The goals of my current research are motivated by the critical need for systematic studies that define and develop context-based vision systems, which could in turn help organisations concerned with human rights to manage and filter large collections of images more effectively.
The entire pipeline used for the experiments is depicted in the figure below, together with a brief description of each phase of the system.
Collection of keywords related to different human rights violations
Python interface for downloading images for each query term
Elimination of exact duplicates and irrelevant images
Manual addition of other suitable images in order to construct the Human Rights UNderstanding (HRUN) dataset
Use of different pre-trained ConvNets as fixed feature extractors
Training of a linear Support Vector Machine (SVM) on the extracted features
Assessment of system performance with the mean Average Precision (mAP) metric
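The duplicate-elimination phase can be sketched as follows. The pipeline above does not specify its deduplication method, so this is only an assumed, minimal approach: byte-identical images are detected by hashing their raw contents, which catches exact copies but not near-duplicates such as resized or re-encoded versions.

```python
# Sketch of exact-duplicate elimination via content hashing. This is an
# assumed approach for illustration; the original pipeline does not state
# how duplicates were detected.
import hashlib

def remove_exact_duplicates(images):
    """Keep only the first occurrence of each byte-identical image.

    `images` is a list of (name, raw_bytes) pairs; returns the surviving pairs.
    """
    seen = set()
    unique = []
    for name, data in images:
        # SHA-256 of the raw bytes identifies exact copies regardless of filename.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, data))
    return unique
```

Note that perceptual hashing (rather than cryptographic hashing of raw bytes) would be needed to also catch visually identical images saved under different encodings.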
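The classification stage, a linear SVM trained on fixed ConvNet features, can be sketched as below. The actual features would be activations from a pre-trained network (for example, the penultimate layer of a ConvNet); to keep the example self-contained they are mocked here with two synthetic clusters, and scikit-learn's `LinearSVC` stands in for whichever SVM implementation was actually used.

```python
# Sketch of the classification stage: a linear SVM over fixed ConvNet
# features. The 128-dimensional "features" below are synthetic stand-ins
# for real penultimate-layer activations.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Two well-separated clusters mimic features of two image categories.
X_pos = rng.normal(loc=+2.0, scale=0.5, size=(50, 128))
X_neg = rng.normal(loc=-2.0, scale=0.5, size=(50, 128))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 50 + [0] * 50)

# Train the linear SVM on the extracted (here: mocked) features.
clf = LinearSVC(C=1.0)
clf.fit(X, y)

# New images are classified by extracting their features the same way
# and passing them through the trained linear model.
probe = rng.normal(loc=+2.0, scale=0.5, size=(1, 128))
pred = clf.predict(probe)
```

Treating the ConvNet as a fixed feature extractor and training only a linear classifier on top is a standard transfer-learning recipe when the target dataset is small, as a newly collected dataset like HRUN typically is.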
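The evaluation metric can be made concrete with a short sketch: Average Precision (AP) for one ranked list of retrieved images, averaged over classes (or queries) to give mAP. The function and variable names are illustrative, not taken from the original system.

```python
# Minimal sketch of the evaluation metric: Average Precision for a single
# ranked list, and mean Average Precision (mAP) over several lists.
def average_precision(ranked_relevance):
    """AP for one ranked list; `ranked_relevance` holds 1 (relevant) or 0
    per returned item, in ranked order."""
    hits = 0
    precision_sum = 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k  # precision at each relevant rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(rankings):
    """mAP: the mean of the per-class (or per-query) AP values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, a ranking with relevant items at positions 1 and 3 gives AP = (1/1 + 2/3) / 2 = 5/6, and mAP is simply the mean of such scores across all classes.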