15 January 2024

Emotion Recognition

'The unbearable (technical) unreliability of automated facial emotion recognition' by Federico Cabitza, Andrea Campagner and Martina Mattioli in (2022) 9(2) Big Data and Society comments 

Emotion recognition, and in particular facial emotion recognition (FER), is among the most controversial applications of machine learning, not least because of its ethical implications for human subjects. In this article, we address the controversial conjecture that machines can read emotions from our facial expressions by asking whether this task can be performed reliably. This means, rather than considering the potential harms or scientific soundness of facial emotion recognition systems, focusing on the reliability of the ground truths used to develop emotion recognition systems, assessing how well different human observers agree on the emotions they detect in subjects’ faces. Additionally, we discuss the extent to which sharing context can help observers agree on the emotions they perceive on subjects’ faces. Briefly, we demonstrate that when large and heterogeneous samples of observers are involved, the task of emotion detection from static images crumbles into inconsistency. We thus reveal that any endeavour to understand human behaviour from large sets of labelled patterns is over-ambitious, even if it were technically feasible. We conclude that we cannot speak of actual accuracy for facial emotion recognition systems for any practical purposes. ... 

Emotional artificial intelligence (AI) (McStay, 2020) is an expression that encompasses all computational systems that leverage ‘affective computing and AI techniques to sense, learn about and interact with human emotional life’. Within the emotional AI domain (but even more broadly, within the entire field of AI based on machine learning (ML) techniques), facial emotion recognition (FER), which denotes applications that attempt to infer the emotions experienced by a person from their facial expression (Paiva-Silva et al., 2016; McStay, 2020; Barrett et al., 2019), is one of the most controversial (Ghotbi et al., 2021) and debated (Stark and Hoey, 2021) applications.

In fact, ‘turning the human face into another object for measurement and categorization by automated processes controlled by powerful companies and governments touches the right to human dignity’ and ‘the ability to extract […physiological and psychological characteristics such as ethnic origin, emotion and wellbeing…] from an image and the fact that a photograph can be taken from some distance without the knowledge of the data subject demonstrates the level of data protection issues which can arise from such technologies’. On the other hand, opinions diverge among the specialist literature. Some authors highlight the accurate performance of FER applications and their potential benefits in a variety of fields; for instance, customer satisfaction (Bouzakraoui et al., 2019), car driver safety (Zepf et al., 2020), or the diagnosis of behavioural disorders (Paiva-Silva et al., 2016; Jiang et al., 2019). Others have raised concerns regarding the potentially harmful uses in sectors such as human resource (HR) selection (Mantello et al., 2021; Bucher, 2022), airport safety controls (Jay, 2017), and mass surveillance settings (Mozur, 2020). In addition, the scientific basis of FER applications has been called into question, either by equating their assumptions with pseudo-scientific theories, such as phrenology or physiognomy (Stark and Hutson, Forthcoming), or by questioning the validity of the reference psychological theories (Barrett et al., 2019), which assume the universality of emotion expressions through facial expressions (Elfenbein and Ambady, 2002). Lastly, others have noted that the use of proxy data (such as still and posed images) to infer emotions should be supported by other contextual information (McStay and Urquhart, 2019), especially if the output of the FER systems is used to make sensitive decisions, so as to avoid misinterpretation of the broader context. According to Stark and Hoey (2021) ‘normative judgements can emerge from conceptual assumptions, themselves grounded in a particular interpretation of empirical data or the choice of what data is serving as a proxy for emotive expression’. 

From a technical point of view, FER is a measurement procedure (Mari, 2003) in which the emotions conveyed in facial expressions are probabilistically gauged to detect the dominant one or a collection of prevalent emotions. As a result, FER can be related to the concepts of validity and reliability. A recognition system is valid if it recognizes what it is designed to recognize (i.e. basic emotions); it is reliable if the outcome of its recognition is consistent when applied to the same objects (i.e. a subject’s expression). However, when FER is achieved by means of a classification system based on ML techniques, its reliability cannot (and should not) be separated from the reliability of its ground truth, i.e. training and test datasets (Cabitza et al., 2019). In this scenario, reliability is defined as the extent to which the categorical data from which the system is expected to develop its statistical model are generated from ‘precise measurements’, i.e. human ‘recognitions’ exhibiting an acceptable agreement. This is because, by definition, no classification model can outperform the quality of the human reference (Cabitza et al., 2020b). 
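To make the notion of ground-truth reliability concrete: inter-annotator agreement is typically quantified with chance-corrected statistics such as Cohen’s kappa (for two raters). The minimal Python sketch below is not drawn from the paper; the emotion labels and the commonly cited 0.8 reliability threshold are illustrative assumptions.

```python
from collections import Counter

def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed fraction of
    items the raters label identically, and p_e is the agreement expected by
    chance, computed from each rater's label frequencies.
    """
    assert len(rater_a) == len(rater_b), "raters must label the same items"
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n)
              for c in freq_a.keys() & freq_b.keys())
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels from two observers for the same ten facial images.
rater_1 = ["joy", "anger", "fear", "joy", "sadness",
           "joy", "anger", "fear", "joy", "neutral"]
rater_2 = ["joy", "fear", "fear", "joy", "neutral",
           "joy", "anger", "anger", "joy", "neutral"]

# Raw agreement is 70%, but kappa is only ~0.59: well below the 0.8 often
# treated as the threshold for a reliable ground truth.
print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")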

In this study, we will not contribute to the vast (and heated) debate still ongoing about the validity of automatic FER systems (Franzoni et al., 2019; Feldman Barrett, 2021; Stark and Hoey, 2021); that is, we address the classification task neither from the conceptual point of view (how to define emotions, if that is possible at all) nor merely from the technical point of view (how to recognize emotions, whatever they are). For the sake of argument, we assume that the main psychological emotion models make perfect sense, and we do not address how robust recognition algorithms are, how well they perform in external settings, and, most importantly, how useful they can be, i.e. whether they provide the benefits that their promoters envision and advocate.

Instead, we focus on the reliability of their ground truth, which is not a secondary concern from a pragmatic standpoint (Cabitza et al., 2020a, 2020b). To that end, we conducted a survey of the major FER datasets, concentrating on their reported reliability, as well as a small user study through which we address three related research questions: Do existing FER ground truths have an adequate level of reliability? Are human observers in agreement regarding the emotions they sense in static facial expressions? Do they agree more when context information is shared before interpreting the expressions?

The first question is addressed in the ‘Related work and motivations’ section and the answer is in Table 3. The other questions are addressed by means of a user study described in the ‘User study: Methods’ section and whose results are reported in the ‘Results’ section. Finally, in the ‘Discussion’ section, we discuss these findings and their immediate implications, while in the ‘Conclusion’ section we interpret them within the bigger picture of FER reliability and relate them to implications for the use of automated FER systems in sensitive domains and critical human decision making.

'What an International Declaration on Neurotechnologies and Human Rights Could Look like: Ideas, Suggestions, Desiderata' by Jan Christoph Bublitz in (2024) 15(2) AJOB Neuroscience 96 comments 

Ethical and legal worries arising from novel neurotechnological applications have reached the level of international human rights institutions and prompted ongoing deliberations about a new legal instrument that sets international standards for the development, regulation, and use of neurotechnologies. In a recent report on the human rights implications of neurotechnologies, the International Bioethics Committee of UNESCO (IBC) considers the idea of a “governance framework set forth in a future UNESCO Universal Declaration on the Human Brain and Human Rights” or a “New Universal Declaration on Human Rights and Neurotechnology” (2021, at 184c). Other human rights agencies have been concerned with the matter, hosting hearings and commissioning reports (especially OECD 2019; see also Ienca 2021; OECD 2017; Sosa et al. 2022). The UN Human Rights Council (2022) mandated its Advisory Committee to prepare a comprehensive study on neurotechnologies and human rights. A novel international instrument will likely emerge from these debates (cf. UNESCO Docs. 216 EX/Dec.9 and EX/50). As the first global instrument specifically tailored to neurotechnologies, it will set the tone for further regulations at domestic, supranational, and international levels. Although some stakeholders have been consulted in previous proceedings, the development has so far largely evaded the broader attention of the neuroscience, neurotech, and neuroethics communities. This is unfortunate, as academic input is vital to identify problems, frame debates, and develop solutions, not least because international agencies lack subject-matter expertise and have relied on a limited number of experts so far. The timing is critical. Once debates move to the political arena and intergovernmental negotiations, the room for academic and big-picture debates narrows, as matters tend to become increasingly technical and arguments tend to become interest-based. Accordingly, the time for impactful academic interventions is now. To facilitate such interventions and to widen the perspective of current debates, this target article puts forward for discussion twenty-five considerations and desiderata for a future instrument. In particular, it wishes to transcend the confines of the debate about so-called neurorights that dominates the current discourse (e.g., Borbón and Borbón 2021; Bublitz 2022c; Genser, Herrmann, and Yuste 2022; Ienca 2021; Ligthart et al. 2023; Rommelfanger, Pustilnik, and Salles 2022; Yuste et al. 2017; Zúñiga-Fajuri et al. 2021). Proceeding on the basis of existing rights, the following remains uncommitted as to whether novel rights are needed. That debate overshadows a broader and richer field of relevant questions, and it is time to turn to them.

Setting the stage, the nature and the limits of a future instrument should be clarified. It will likely be a soft law instrument such as a recommendation by UNESCO or a resolution by the UN General Assembly. Such documents are not legally binding and lack enforcement mechanisms. Whether they qualify as law at all depends on legal theory’s perennial question about the nature of law and may be answered differently with respect to different types of documents (Andorno 2012; Shelton 2008). Suffice it to note here that such documents understand themselves as more than mere ethical statements, because they demand compliance by signatory States without creating enforceable legal obligations. Theoretical matters aside, soft law instruments can be practically effective governance tools that draw attention to problems and set standards which are often observed by States and other stakeholders. They may, for instance, affect governmental research funding, decisions by ethics committees, or the regulatory conditions for market approval of devices. Soft law may also turn into hard law in several ways. It may provide guidance for courts in interpreting norms, rendering the content of rights more concrete and resolving normative conflicts. It may inform secondary soft law, such as general comments by treaty bodies, and inspire further binding acts at domestic or international levels. Soft law’s greater flexibility is an advantage in fast-moving fields without firm normative underpinnings, such as neurotechnologies; it has therefore become the prime legal-regulatory tool for technology governance at both the international and the domestic level (Hagemann and Skees 2018; Marchant and Tournas 2019). At any rate, because of the often insurmountable political hurdles that binding treaties of international law face, especially in the current geopolitical climate, soft law instruments are the best form of international governance of neurotechnologies that is realistically attainable in the near future.

The nature of an instrument shapes its content. In contrast to the abstract and elegantly worded Universal Declaration of Human Rights and the international covenants that followed it, soft law instruments allow for more aspirational goals and broader scopes, but also for more concrete norms and standards. In addition, they are directed not only at States as the protagonists of international law but also at other stakeholders: private actors such as businesses that may threaten human rights, individuals whose rights may have been violated, and other relevant parties such as engineers and developers of neurotechnologies. Moreover, given the aspiration of global applicability and the need for consensus in matters about which countries and cultures may reasonably disagree, instruments must allow for local adaptability, value pluralism, and compromises, and they gravitate toward lowest common denominators. These conditions are reflected in the texts of such documents, which are often replete with references to general values of the human rights system, not always entirely coherent, and sometimes even intentionally vague at critical points. But despite and because of these weaknesses, soft law instruments can set norms and standards that are observed and steer the course of the future development of a field. The UNESCO Recommendation on the Ethics of Artificial Intelligence, adopted in 2021, may serve as a model for a future neurotech instrument. It contains recommendations at different levels of abstraction, from broad values through principles to actionable policy options. Although not free from textual weaknesses, the Recommendation provides some novel, concrete, and surprisingly far-reaching standards.

It is further worth noting that international norms for the regulation of neurotechnologies already exist. Current debates sometimes evoke the impression that neurotechnologies develop in a legal vacuum, but this is somewhat misleading. For instance, placing devices on markets is governed by domestic and supranational device regulation, such as the EU Medical Device Regulation, which covers neurotechnologies for medical and some non-medical purposes (European Union 2017). It leaves neurodevices for non-medical neuroimaging outside of its scope, but this is not a gap; it is an intentional regulatory decision. At the international human rights level, the Oviedo Convention on Human Rights and Biomedicine (1997), a legally binding international treaty signed by more than 30 States, seeks to safeguard the dignity and integrity of persons “with regard to the application of biology and medicine” (Council of Europe 1997, preamble). Likewise, the non-binding UNESCO Universal Declaration on Bioethics and Human Rights (2005) was adopted in view of the “rapid advances in science and their technological applications” (2005, preamble). Both instruments contain various norms about human rights and informed consent that apply to neurobiological interventions. The same is true of the Recommendation on Responsible Innovation in Neurotechnology (OECD 2019). This leads to the first desideratum: (i) A future instrument should cohere with existing instruments but not merely repeat them; it should neither contradict them without compelling reasons nor address similar points in different terms, and it should strive to go beyond them by suggesting more concrete norms or addressing substantially different aspects.

The following presents further desiderata and considerations for a future instrument. It proceeds from the general to the particular, from meta-considerations to concrete rights and technical suggestions, and at least partially attempts to deduce the latter from the former. The points are thus interwoven rather than distinct; they are sometimes couched in the idiosyncratic style of international documents and should not be understood as conclusive but as an invitation for criticism and additions.