'Suspect AI: Vibraimage, Emotion Recognition Technology and Algorithmic Opacity' by James Wright in (2021) Science, Technology and Society comments
Vibraimage is a digital system that quantifies a subject’s mental and emotional state by analysing video footage of the movements of their head. Vibraimage is used by police, nuclear power station operators, airport security and psychiatrists in Russia, China, Japan and South Korea, and has been deployed at two Olympic Games, a FIFA World Cup and a G7 Summit. Yet there is no reliable empirical evidence for its efficacy; indeed, many claims made about its effects seem unprovable. What exactly does vibraimage measure and how has it acquired the power to penetrate the highest profile and most sensitive security infrastructure across Russia and Asia?

I first trace the development of the emotion recognition industry, before examining attempts by vibraimage’s developers and affiliates scientifically to legitimate the technology, concluding that the disciplining power and corporate value of vibraimage are generated through its very opacity, in contrast to increasing demands across the social sciences for transparency. I propose the term ‘suspect artificial intelligence (AI)’ to describe the growing number of systems like vibraimage that algorithmically classify suspects/non-suspects, yet are themselves deeply suspect. Popularising this term may help resist such technologies’ reductivist approaches to ‘reading’—and exerting authority over—emotion, intentionality and agency.
Wright states
As I sat in the meeting room of a nondescript office building in Tokyo, the managing director of a company called ELSYS Japan discussed my emotional and psychological state, referring to a series of charts and tables displayed on a large screen at the front of the room:

Aggression … 20-50 is the normal range, but you scored 52.4 … this is a bit too high. Probably you yourself didn’t know this, but you’re a very aggressive person, potentially… Next is stress. Your stress is 29.2, within the range of 20-40, with a statistical deviation of 14—that’s OK… I think you have very good stress… Just tension—your [average] value is within the range, but because your statistical deviation is high—over 20—so you’re a little tense. Mental balance is 64 from a range of 50-100, so it fits correctly in the range… Charm … 74.6 is pretty good. Now, neuroticism is 35.3, this is also in the range, but the statistical deviation is high. But some people have a high score the first time they are measured. There are people who have high scores for neuroticism as well as for tension, yes. People who possess a delicate heart. (Interview, 17 April 2019)

The director’s seemingly authoritative statements were based on an assessment of various measurements produced by ‘vibraimage’, a patented system developed to quantify a subject’s mental and emotional state through an automated analysis of video footage of the physical movements of their face and head. This system, distributed in Japan by ELSYS Japan under the brands ‘Mental Checker’ and ‘Defender-X’, provides numerical values for levels of aggression, tension, balance, energy, inhibition, stress, suspiciousness, charm, self-regulation, neuroticism, extroversion and stability, categorising these automatically into positive and negative ‘emotions’. Mental Checker generates an impressive array of statistical data arranged across tables, pie chart, histogram and line chart, producing an image of mathematical precision and solid scientific legitimacy (see Figure 1). The report also provides a visualisation of what ELSYS Japan terms an ‘aura’—a horizontal colour-coded bar chart, indicating the frequency of micro-vibrations of a subject’s head, superimposed against a still image of their face.
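To make the logic of such a report concrete, the following minimal Python sketch illustrates the kind of range-and-deviation flagging the director describes. It is an illustration only, not ELSYS’s implementation: the ‘normal’ ranges and the deviation threshold echo the remarks quoted above, while the example scores, names and flagging rules are assumptions made for the sake of the sketch.

# Hypothetical illustration of range-based flagging of the kind described in the
# interview above. The "normal" ranges echo those quoted by the ELSYS Japan director;
# the flagging logic and example values are assumptions and do not reproduce the
# proprietary vibraimage algorithm.

NORMAL_RANGES = {
    "aggression": (20, 50),
    "stress": (20, 40),
    "mental_balance": (50, 100),
}
DEVIATION_THRESHOLD = 20  # the director treats a statistical deviation over 20 as notable

def flag_measurement(name, mean, deviation):
    """Return a human-readable flag for a single vibraimage-style measurement."""
    low, high = NORMAL_RANGES[name]
    notes = []
    if not low <= mean <= high:
        notes.append(f"outside normal range {low}-{high}")
    if deviation > DEVIATION_THRESHOLD:
        notes.append(f"high statistical deviation ({deviation})")
    return f"{name}: {mean} " + ("; ".join(notes) if notes else "within range")

# Example values echoing the interview: only aggression falls outside its range.
for name, mean, dev in [("aggression", 52.4, 10), ("stress", 29.2, 14), ("mental_balance", 64, 5)]:
    print(flag_measurement(name, mean, dev))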
Vibraimage technology has already entered the global security marketplace. It was deployed at the 2014 Sochi Olympics (Herszenhorn, 2014), 2018 PyeongChang Winter Olympics, 2018 FIFA World Cup in Russia and at major Russian airports to detect suspect individuals among crowds (JETRO, 2019). It has been used at the Russian State Atomic Energy Corporation in experiments to monitor the professionalism of workers handling and disposing of spent nuclear fuel and radioactive waste (Bobrov et al., 2019; Shchelkanova et al., 2019), and to diagnose their psychosomatic illnesses (Novikova et al., 2019). In Japan, Mental Checker and Defender-X have been used by one of the largest technology and electronics companies, NEC, to vet staff at nuclear power stations and by a leading security services firm, ALSOK, to detect and potentially deny entry to or detain suspicious individuals at major events, including the G7 Summit in 2016, as well as sporting events and theme parks (Interview with ELSYS Japan, 17 April 2019). Managers at ELSYS Japan expected that the technology would be used at the 2020 Tokyo Olympics (Nonaka, 2018, p. 148; Interview with ELSYS Japan, 17 April 2019), an event that spurred significantly increased spending on domestic security services and infrastructure, with estimated market growth of 18% between 2016 and 2019 (Teraoka, 2018). ELSYS Japan’s customers also include Fujitsu and Toshiba, which have considered ‘incorporat[ing] [vibraimage]… into their own recognition technologies to differentiate their original products’ (Nonaka, 2018, p. 147), and managers told me that Mental Checker has been used by an unspecified number of Japanese psychiatrists to confirm diagnoses of depression.
In South Korea, the Korean National Police Agency, Seoul Metropolitan Police Agency and several universities have collaborated on research aiming to establish the use of vibraimage in a video-based ‘contactless’ lie-detection system as an alternative to polygraph testing (Lee & Choi, 2018; Lee et al., 2018), while, in China, it has been deployed in Inner Mongolia, Zhejiang and elsewhere to identify suspects for questioning and detention, and has been officially certified for use by Chinese police (Choi et al., 2018a, 2018b). Other corporate applications of vibraimage are also proposed: an ELSYS Japan brochure suggests using Mental Checker to discover how employees really feel about their company; measure their levels of stress, fatigue and ‘potential ability’; counter employees’ accusations of bullying and abuses of power in the workplace; and even ‘to know the risk of hiring persons who might commit a crime’ (ELSYS Japan Brochure, undated). The brochure provides a screenshot of a suggested employee report, with grades (A+, B−, C, etc.) for qualities that include stability, fulfilment and happiness, social skills, teamwork, communication, ability to take action, aggressiveness, stress tolerance and ability to ‘recognise reality’.
Vibraimage forms one part of the rapid growth in algorithmic security, surveillance, predictive policing and smart city infrastructure across urban East Asia, enabling the ‘active sorting, identification, prioritization and tracking of bodies, behaviours and characteristics of subject populations on a continuous, real-time basis’ (Graham & Wood, 2003, p. 228). Amid an international boom in both surveillance technologies and artificial intelligence (AI) systems designed to extract maximal information from digital photographic and video data relating to the body, companies are developing algorithms that move beyond facial recognition intended to identify individuals and increasingly aim to analyse their behaviour and emotional states (AI Now Institute, 2018, pp. 50–52). The digital emotion recognition industry was worth up to US$12 billion in 2018, and it continues to grow rapidly (AI Now Institute, 2018).
As the concepts of algorithmic regulation and governance (Goldstein et al., 2013; Introna, 2016) are increasingly becoming a reality, transparency has become a key theme in critiques of black-boxed algorithms and AI, including those used in emotion recognition. This is particularly the case with machine learning, in which algorithms recursively adjust themselves and can quickly become inexplicable even to data science experts. As Maclure puts it, ‘we are delegating tasks and decisions that directly affect the rights, opportunities and wellbeing of humans to opaque systems which cannot explain and justify their outcomes’ (Maclure, 2019, p. 3). Transparency is linked to and overlaps with values of comprehensibility, explicability, accountability and social justice, and it is frequently presented as a vital component of ethical or ‘good’ AI (Floridi et al., 2018; Hayes et al., 2020; Leslie, 2019). ...
... This article uses the case of vibraimage to examine issues around opacity and the work it does for companies and governments in the provision of security services, by attempting to shed light on the algorithms of vibraimage and its imagined and actual uses, as far as possible based on publicly available data. What exactly does vibraimage measure and how does the data the system produces, processed through an algorithmic black box, deliver reports that have acquired the power to penetrate corporate and public security systems involved in the highest profile and most sensitive security tasks in Russia, Japan, China and elsewhere? The first section of the article examines emotion detection techniques and their digitalisation. The second section focuses on vibraimage and how its proponents, many of whom have commercial relationships with companies distributing it, have engaged in processes of scientific legitimation of the technology while making claims for its actual and potential uses. The final section considers how the disciplining power and corporate value of vibraimage are generated through its very opacity, in stark contrast to increasingly urgent demands across the social sciences and society, more broadly, for transparency as a prerequisite for ‘good AI’. I propose the term ‘suspect AI’ reflexively to describe the increasing number of algorithmic systems, such as vibraimage, in operation globally across law enforcement and security services, which automatically classify subjects as suspects or non-suspects. Popularising this term may be one way to resist such reductivist approaches to reading and exerting authority over human emotion, intentionality, behaviour and agency.
Emotion Recognition Based on Facial Expressions
From the 1960s, psychologist Paul Ekman pioneered research exploring the relationship between emotions and facial expressions, building on Darwin’s (2012[1872]) work on evolutionary connections between the two among animals, including humans. Ekman conducted experiments around the world, aiming to demonstrate the universality of a handful of basic emotions (such as anger, contempt, disgust, fear, happiness, sadness and surprise) across all cultures and societies, and of their articulation through similar facial expressions (Ekman, 1992). This work was highly influential because it seemed to provide overwhelming empirical evidence that individuals of all cultures were able to ‘correctly’ categorise the expressions of people of their own and other cultures provided in photos, matching them to the ‘basic emotions’ they supposedly expressed (Ekman & Friesen, 1971).
Ekman further argued that facial expressions could be used to identify incongruities between professed and ‘real’ emotions, enabling facial expression analysis to be used for lie detection (Ekman & Friesen, 1969). This attracted substantial interest from corporations concerned with ensuring the honesty of employees or gaining covert insights in business negotiations, and from governments and security forces concerned with identifying dissimulating and suspect individuals. Ekman and collaborators in this field like David Matsumoto formed companies, running workshops and holding consultations with corporations and public bodies about how to read subjects’ facial micro-expressions and behavioural cues to evaluate personality, truthfulness and potential danger. In 2001, the American Psychological Association named Ekman one of the most influential psychologists of the twentieth century (APA, 2002).
The identification of emotions through facial expressions underwent digitalisation via machine learning techniques pioneered from the mid-1990s by Rosalind Picard and Rana el Kaliouby at Massachusetts Institute of Technology (MIT). They commercialised this new field of ‘affective computing’ via their venture capital–backed company Affectiva, founded in 2009, which provides emotional analysis software to businesses based on algorithms trained on large databases of facial expressions (Johnson, 2019). According to Affectiva, this enables a test subject’s emotional responses to stimuli such as TV commercials to be tracked in real time. With the recent boom in facial recognition technology, emotion recognition represents a rapidly expanding area of AI development, used across industries, including recruitment and marketing research (Devlin, 2020). A growing number of companies offer emotion recognition services based on analysis of facial expressions, including Microsoft (Emotion application programming interface [API]), Amazon (Rekognition), Apple (Emotient, which Ekman advised on) and Google (Cloud Vision API).
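The pipeline such services describe (detect a face in each video frame, then score the cropped face with a classifier trained on a labelled expression database) can be sketched roughly as follows. This is a generic, hypothetical outline rather than Affectiva’s or any other vendor’s actual system: the face detector is OpenCV’s bundled Haar cascade, and score_emotions is a placeholder standing in for a proprietary trained model.

# Rough sketch of a generic facial-expression emotion pipeline: detect faces frame
# by frame, then score each face crop with a classifier trained on labelled
# expression images. The detector is OpenCV's bundled Haar cascade; score_emotions
# is a stand-in, since commercial models are proprietary.
import cv2

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def score_emotions(face_crop):
    # Placeholder: a real system would run a model trained on a labelled
    # expression dataset and return per-category scores.
    return {"happiness": 0.0, "sadness": 0.0, "anger": 0.0, "surprise": 0.0}

def analyse_video(path):
    capture = cv2.VideoCapture(path)
    results = []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in face_detector.detectMultiScale(gray, 1.1, 5):
            results.append(score_emotions(gray[y:y + h, x:x + w]))
    capture.release()
    return results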
Such systems are increasingly being used in border protection and law enforcement to identify dissimulating and otherwise suspect individuals, despite the lack of substantial evidence of their efficacy. From 2007, the Transportation Security Administration (TSA) spent US$900 million on a ‘behaviour-detection programme’ entitled Screening of Passengers by Observation Techniques (SPOT), until it was ruled ineffective by the Department of Homeland Security and the Government Accountability Office (GAO, 2013). Ekman consulted on SPOT, and the system incorporated his techniques; his company also provided consulting services to US courts (Fischer, 2013). Another system, Automated Virtual Agent for Truth Assessments in Real-Time (AVATAR), was developed for lie detection targeting migrants on the USA–Mexico border (Daniels, 2018), while the EU trialled the iBorderCtrl system, supplied by the consortium European Dynamics and funded by Horizon 2020, using the interpretation of micro-expressions to detect deceit among migrants in Hungary, Greece and Latvia (Boffey, 2018; see also AI Now Institute, 2018, pp. 50–52).
Recently, this work on facial expression analysis for emotion recognition has come under increasing scrutiny despite its ongoing popularity among many psychologists. The most basic critique is that one does not necessarily smile when one is happy—common sense suggests that facial expressions do not always, or even often, map to inner feelings, that emotions are often fleeting or momentary, and that facial expressions and their meaning are highly dependent on sociocultural context. Barrett et al. (2019) summarise these and other critiques, arguing that approaches positing a limited number of prototypical basic emotions that can be ‘read’ through universal facial expressions fail to grasp what emotions are and what facial expressions convey.
In anthropology, the ‘affective turn’ has drawn attention to the distinction between affect and emotion—the former a precognitive sensory response or potential to affect and be affected, and the latter a more culturally mediated expression of feeling. White describes this as the difference between ‘how bodies feel and how subjects make sense of how they feel’ (White, 2017, p. 177). These nuances are overlooked in the field of emotion recognition, which reduces emotion to a simplistic and digitally scalable model. Barrett argues that emotion is: a contingent act of perception that makes sense of the information coming in from the world around you, how your body is feeling in the moment, and everything you’ve ever been taught to understand as emotion. Culture to culture, person to person even, it’s never quite the same. (Fischer, 2013)
We might, therefore, define the process of interpreting one’s own emotional state as making sense of an inner noise of biological signals and memories, in contextually contingent and socioculturally mediated ways, and placing them into—and in the process co-constructing—socioculturally mediated categories. It may also sometimes involve not definitively categorising or making sense of these affective feelings. As this article will show, it is the very ambiguity or malleability of this process that may help make vibraimage a convincing technology of emotion recognition and provide authority to its analysis.
Given these growing critiques of Ekmanian theories of universal basic emotions expressed through facial expressions, researchers at the organisation AI Now have concluded that, by extension, the digital emotion detection industry is ‘built on markedly shaky foundations…. There remains little to no evidence that these new affect-recognition products have any scientific validity’ (AI Now Institute, 2018, p. 50). Baesler, similarly, argues that the use of emotion detection software by the TSA was ‘unconfirmed by peer-reviewed research and untested in the field’ (Baesler, 2015, pp. 60–61), while holding significant potential for harm through misuse. In common with broader critiques of AI from critical algorithm studies (e.g., Eubanks, 2018; Lum & Isaac, 2016), machine learning methods involved in emotion recognition systems have been criticised for racial bias, based on their training data sets (Rhue, 2018). Indeed, Ekman’s work not only constructs ethnocentric emotional categories but also racial subject categories, for example in his creation, with Matsumoto, of the Japanese and Caucasian Facial Expressions of Emotion stimulus set of photos showing emotional expressions of archetypal ‘Japanese’ and ‘Caucasian’ subjects (Biehl et al., 1997; https://www.humintell.com), which continues to be used in psychology experiments. For all of these reasons, the increasingly widespread application of this technology has raised growing ethical and civil liberties concerns.
'Automated Video Interviewing as the New Phrenology' by Ifeoma Ajunwa in (2022) 36 Berkeley Technology Law Journal 101 comments
This Article deploys the new business practice of automated video interviewing as a case study to illuminate the limitations of traditional employment antidiscrimination laws. Employment antidiscrimination laws are inadequate to address unlawful discrimination attributable to emerging workplace technologies that gatekeep equal opportunity in employment. The Article shows how the practice of automated video interviewing is based on shaky or unproven technological principles that disproportionately impact racial minorities. In this way, the practice of automated video interviewing is analogous to the pseudo-science of phrenology, which enabled societal and economic exclusion through the legitimization of eugenics and racist attitudes. After parsing the limitations of traditional anti-discrimination law to curtail emerging workplace technologies such as video interviewing, this Article argues that ex ante legal regulations, such as those derived from the late Professor Joel Reidenberg’s Lex Informatica framework, may be more effective than ex post remedies derived from the traditional employment antidiscrimination law regime. The Article argues that one major benefit of applying a Lex Informatica framework to video interviewing is developing legislation that considers the capabilities of the technology itself rather than how actors intend to use it. In the case of automated hiring, such an approach would mean actively using the Uniform Guidelines on Employee Selection Procedures to govern the design of automated hiring systems. For example, the guidelines could dictate design features for the collection of personal information and treatment of content. Other frameworks, such as Professor Pamela Samuelson’s “privacy as trade secrecy” approach, could govern design features for how information from automated video interviewing systems may be transported and shared. Rather than reifying techno-solutionism, a focus on the technological capabilities of automated decision-making systems offers the opportunity for regulation to start at inception, which in turn could affect the development and design of the technology. This is a preemptive approach that sets standards for how the technology will be used and is a more proactive legal approach than merely addressing the negative consequences of the technology after they have occurred.