'Predictive privacy: Collective data protection in the context of artificial intelligence and big data' by Rainer Mühlhoff in (2023) Big Data & Society comments
Big data and artificial intelligence pose a new challenge for data protection, as these techniques allow predictions to be made about third parties based on the anonymous data of many people. Examples of predicted information include purchasing power, gender, age, health, sexual orientation, ethnicity, etc. The basis for such applications of “predictive analytics” is the comparison between the behavioral data (e.g. usage, tracking, or activity data) of the individual in question and the potentially anonymously processed data of many others, using machine learning models or simpler statistical methods. The article starts by noting that predictive analytics has significant potential for abuse, which manifests itself in the form of social inequality, discrimination, and exclusion. This potential is not regulated by current data protection law in the EU; indeed, the use of anonymized mass data takes place in a largely unregulated space. Under the term “predictive privacy,” a data protection approach is presented that counters the risks of abuse of predictive analytics. A person's predictive privacy is violated when personal information about them is predicted without their knowledge and against their will, based on the data of many other people. Predictive privacy is then formulated as a protected good, and improvements to data protection with regard to the regulation of predictive analytics are proposed. Finally, the article points out that the goal of data protection in the context of predictive analytics is the regulation of “prediction power,” a new manifestation of the informational power asymmetry between platform companies and society.
One of today's most important applications of artificial intelligence (AI) technology is so-called predictive analytics. I use this term to describe data-based predictive models that make predictions about any individual based on available data. These predictions can relate to future behavior (e.g. what is someone likely to buy?), to unknown personal attributes (e.g. sexual identity, ethnicity, wealth, education level), to momentary vulnerabilities (vulnerable conditions such as frustration, depression, loneliness, financial difficulties, pregnancy, etc.), or to personal risk factors (e.g. mental or physical disease predispositions, addictive behavior, or credit risk). Predictive analytics is controversial because, although it has socially beneficial applications, the technology has enormous potential for abuse and is currently scarcely regulated by law. Predictive analytics makes it possible to automate and, therefore, significantly scale the exploitation of individual vulnerabilities, as well as to foster unequal treatment of individuals in terms of access to economic and social resources such as employment, education, knowledge, healthcare, and law enforcement. Specifically, in the context of data protection and anti-discrimination, the application of predictive AI models needs to be analyzed as a new form of data power wielded by large IT companies, one that relates to the stabilization and production of discriminatory structures, patterns of exploitation, and data-based societal inequalities.
Against the backdrop of the enormous societal impact of predictive analytics, I will argue (as others have argued before me, cf. Hildebrandt, 2009; Hildebrandt and Gutwirth, 2008; Mittelstadt, 2017; Taylor et al., 2016; Taylor, 2016; Vedder, 1999) that we need new approaches to data protection in the context of big data and AI. In my approach, I will use the concept of predictive privacy to normatively capture this novel form of privacy violation through inferred or predicted information. That is, applying predictive models to individuals in order to support decisions is a violation of privacy, yet it is one which does not come about either through “data theft” or a breach of anonymization. Predictive analytics proceeds according to the principle of “pattern matching” by learning algorithms that compare auxiliary data known about a target individual (e.g. usage data on social media, browsing history, geolocation data) against the data of many thousands of other users. This pattern matching is at the core of predictive privacy violations and is possible wherever there is a sufficiently large group of users disclosing their sensitive attributes alongside behavioral and auxiliary data—usually, because they are unaware that this data can be exploited using big data-based methods, or because they think they personally “have nothing to hide.” As I will argue, the problem of predictive privacy denotes a limit to the liberalism inherent in contemporary views of data privacy as the individual's right to control what data is shared about them. The issue of predictive privacy thus strengthens the case for anchoring collectivist protective goods and collectivist defensive rights in data protection.
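The "pattern matching" mechanism described above can be made concrete with a minimal sketch. The data, feature names, and the k-nearest-neighbors method below are all illustrative assumptions on my part (the article speaks only generically of machine learning models or simpler statistical methods): a pool of other users discloses a sensitive attribute alongside behavioral signals, and the target's undisclosed attribute is then inferred purely from that pool.

```python
# Hedged sketch of "pattern matching" predictive analytics: a sensitive
# attribute of a target person is predicted from the behavioral data of
# many OTHER people. All data is synthetic; k-nearest neighbors stands in
# for the "simpler statistical methods" the article mentions.
from collections import Counter
import math

# Anonymous training pool: (behavioral features, disclosed attribute).
# The features might represent e.g. normalized usage or tracking signals.
pool = [
    ([0.9, 0.1, 0.8], "A"),
    ([0.8, 0.2, 0.9], "A"),
    ([0.85, 0.15, 0.75], "A"),
    ([0.1, 0.9, 0.2], "B"),
    ([0.2, 0.8, 0.1], "B"),
    ([0.15, 0.85, 0.2], "B"),
]

def predict(target_features, k=3):
    """Infer the target's attribute from the k most similar pool members."""
    by_distance = sorted(pool, key=lambda u: math.dist(u[0], target_features))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# The target never disclosed the attribute; it is inferred from others'
# data alone, which is the core of the predictive privacy violation.
print(predict([0.88, 0.12, 0.8]))  # predicts "A"
```

Note that nothing in this sketch requires any personal data of the target beyond behavioral signals, and the pool itself could be fully anonymized, which is precisely why such inference escapes consent-based data protection.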
In philosophical theories of privacy, collectivist perspectives have long taken into account that one's own data can potentially have negative effects on other people as well, and have therefore posited that individuals should not be free to decide in every respect what data they disclose about themselves to modern data companies (Hildebrandt, 2009; Hildebrandt and Gutwirth, 2008; Loi and Christen, 2020; Mantelero, 2016; Mittelstadt, 2017; cf. Regan, 2002; Taylor et al., 2016). I will also argue that large collections of anonymized data relating to many individuals should not be freely processable by data processors, because predictive capacities can be extracted from anonymous data sets. This is in contrast to the current legal situation under the EU General Data Protection Regulation (GDPR), which does not restrict the processing and storage of anonymized data or of the predictive models (or “profiles,” to use the terminology of Hildebrandt, 2009) derived from them. Finally, I will call for the rights of data subjects as outlined by the GDPR (right of access, rectification, deletion, and so on) to be reformulated in a collectivist manner, so that affected groups and the community as a whole would be empowered, for the sake of the common good, to exercise such rights against data-processing organizations and thereby prevent the misuse of predictive capacities.