02 August 2020

Big Data Ontologies

'Big Data, urban governance, and the ontological politics of hyperindividualism' by Robert W. Lake in (2017) Big Data & Society comments:

Big Data’s calculative ontology relies on and reproduces a form of hyperindividualism in which the ontological unit of analysis is the discrete data point, the meaning and identity of which inheres in itself, preceding, separate, and independent from its context or relation to any other data point. The practice of Big Data governed by an ontology of hyperindividualism is also constitutive of that ontology, naturalizing and diffusing it through practices of governance and, from there, throughout myriad dimensions of everyday life. In this paper, I explicate Big Data’s ontology of hyperindividualism by contrasting it to a coconstitutive ontology that prioritizes relationality, context, and interdependence. I then situate the ontology of hyperindividualism in its genealogical context, drawing from Patrick Joyce’s history of liberalism and John Dewey’s pragmatist account of individualism, liberalism, and social action. True to its genealogical provenance, Big Data’s ontological politics of hyperindividualism reduces governance to the management of atomistic behavior, undermines the contribution of urban complexity as a resource for governance, erodes the potential for urban democracy, and eviscerates the possibility of collective resistance. 
 
Lake argues:

Data politics dominated newspaper headlines in New York City at the end of 2015. Controversy erupted when a former Police Commissioner charged that the city’s method of collecting crime data underreported actual events. He cited as an example the NYPD’s practice of recording a “shooting” only if a bullet wounds a victim. According to the New York Times account:
 
a shooting … is recorded only if someone is hit …. If a bullet tears a person’s clothing but does not wound the victim, the episode is not included in the Police Department’s official tally of shootings … Gunfire at a car in which the occupants are wounded by shattered glass but not by a bullet is not recorded as a shooting. (Goodman, 2015)
 
As the official in charge of the police department’s CompStat (Computer Statistics) program explained: “‘We need the bullet to cause the injury … and we need blood’” (Goodman, 2015). A follow-up article a few weeks later reported that “the number of murders recorded by the (police) department is almost always lower than those counted as homicides by the city’s medical examiner” (Goodman, 2016). The Police Commissioner defended such practices, saying that “I stand by my crime statistics because they are factual, they are the truth,” while a civil liberties advocate countered that “the controversy highlights just how soft and subjective police statistics can be” (Goodman, 2015).
 
Meanwhile, some 100 miles to the south, in the economically devastated city of Camden, New Jersey, police officials reported a large-scale expansion of that city’s “ShotSpotter” automated gunfire detection system (Adomaitis, 2015). ShotSpotter is described by its corporate provider as “an acoustic surveillance technology that incorporates audio sensors to detect, locate and alert police agencies of gunfire incidents in real time …. The alerts include … the precise time and location (latitude and longitude) represented on a map and other situational intelligence” (ShotSpotter Fact Sheet, 2016). The expanded ShotSpotter system in Camden was part of a larger strategy of augmented video surveillance and data collection designed to reassert the appearance of police control in a city that routinely tops national rankings in the incidence of violent crimes (NeighborhoodScout, 2016).
 
What counts as a “gunshot” in Camden, in many cases, would not register as a “shooting” in New York City. Whereas New York construes a “shooting” in the narrowest possible terms requiring the presence of a shooter, a bullet, and a victim’s blood, Camden’s citywide acoustic surveillance system automatically records every “digital alert” of an “actual gun discharge” as a “gunshot crime in progress” pinpointed in time and space (ShotSpotter Fact Sheet, 2016). These differences between New York City and Camden cannot be separated from their political context. The outcome of mayoral elections in New York City, as well as the city’s attractiveness for residents, tourists, and investors, depends on the public perception of safety and security, exerting downward pressure, in turn, on the practice of collecting and documenting crime statistics. The NYPD’s CompStat program tracks weekly crime data by precinct as a tool for managing organizational personnel and resources but it is equally a tool for managing public opinion (Eterno and Silverman, 2010). In a similar manner but conveying a different message, Camden’s expanded ShotSpotter detection system deploying sensors and monitors in every neighborhood also influences political opinion by establishing a visible police presence throughout the city.
 
A related controversy over categories, exclusions, and measurement erupted over data on New York City’s homeless population at a time when visible homelessness, like crime, had become a political liability for the city’s mayor. The annual homelessness count reported by the U.S. Department of Housing and Urban Development (HUD) in late 2015 found 75,323 homeless individuals in New York City but that number was quickly challenged by advocates for the homeless and HUD acknowledged uncertainty in the “reliability and consistency” of the data (Stewart, 2015a; U.S. Department of Housing and Urban Development, 2015). The ambiguities in the data were manifold. Individuals and families who became homeless through eviction, fire, landlord harassment or other reasons, and were living doubled-up with friends or relatives were not considered homeless by HUD’s definition and were excluded from the count and HUD’s report listed as zero the number of chronically homeless families in New York City not in homeless shelters. Although the city’s Human Resources Administration (HRA) funds 45 emergency and transitional shelters for women and their children forced to flee their homes due to domestic violence, HUD also reported as zero the number of homeless domestic violence (DV) victims in shelters because the DV shelters operated by HRA were considered separate from the homeless shelters operated by the Department of Homeless Services (New York City Department of Homeless Services, 2016). Simultaneously, the Mayor’s Office announced an “unprecedented expansion” in the number of shelter beds for homeless victims of domestic violence to accommodate “a 50 percent increase over the current 8,800 individuals served yearly” (New York City Office of the Mayor, 2015; Stewart, 2015b). 
Further confounding HUD’s data, HUD’s count of 1706 homeless youth almost certainly underestimated a significant subgroup of the homeless who, advocates said, might exceed 10,000 (Gibson, 2011) but “avoid public places where they could be counted for fear of referral to Child Protective Services and … avoid shelters out of safety concerns” (Navarro, 2015; Stewart, 2015a, 2016).
 
The selective practices of categorization and measurement illustrated in these examples might easily be dismissed as the intrusion of political agendas in the otherwise objective and politically neutral construction of data as, in the words of the NYPD Commissioner, “factual” and “the truth.” If this were the case, a solution might lie in the rationalization and depoliticization of methods of data collection, categorization, and analysis, bringing actual practices into closer alignment with normative claims. The ubiquity of Big Data as a technique of governance, biopolitics, and bureaucratic control, however, has expanded the scope of the problem and amplified the challenge of delineating solutions. My argument in this paper is that the challenge of (and to) Big Data is not confined only to the politicization of its practices but rather is situated in its foundational ontological premises, involving the evisceration of context through an ontology of hyperindividualism. An ontology of atomistic individualism underlies the construction of calculative data in general (Hacking, 1990, 1991, 2006) but the arrival of Big Data, involving the algorithmic production, manipulation, and application of very large datasets, has exacerbated and expanded the scope of the problem by obscuring from critical scrutiny its foundational hyperindividualist ontology.
 
This paper aims at a partial corrective by examining Big Data’s underlying calculative ontology. By ontology I mean “a set of contentions about the fundamental character of human being and the world” (Bennett, 2001: 160) or simply “a theory of objects and their ties” (Theory and History of Ontology, 2016). Specifying Big Data’s “ontological imaginary” (Bennett, 2001: 161) answers the question starkly posed by Wagner-Pacifici et al. (2015: 5) who ask, with respect to Big Data: “Just what is our basic ‘ontological unit?’” or, even more plainly, “What is a thing?” (see also Beauregard, 2015, 2016). Big Data’s “onto-story” (Bennett, 2001: 161) can be briefly summarized in the premise that the world is knowable via calculation and measurement and can be represented as the aggregation of discrete, independent, empirically observable units. These units are the “data points” representing, to list only a few examples, gunshots, homeless people, sociodemographic characteristics, credit card swipes, Internet searches, or geo-tagged locational coordinates captured from smartphones (Goldstein, 2016; Kitchin, 2013, 2014; Wagner-Pacifici et al., 2015; Weber, 1946). This calculative ontology both relies on and reproduces a form of atomistic individualism in which the ontological unit of analysis is the discrete data point, the meaning and identity of which inheres in itself, preceding, separate, and independent from its context or its relation to any other data point.
 
By the hyperindividualism of Big Data, I refer to the practice of disaggregation and reaggregation that proceeds through a multistep process of interconnected and interdependent constructions of the world. Big Data’s ontological imaginary involves (1) the division and disaggregation of data fields (“variables”) into ever-smaller units measured at ever finer-grained levels of resolution, (2) the practice of counting each individual observation as an autonomous unit—a thing-in-itself—extracted from and independent of its context, and (3) the reaggregation and recontextualization of the resultant data “bits” through the automated algorithmic search for statistical patterns and correlations hidden within the dataset. While an ontology of atomistic individualism underlies calculative practices in general, the diffusion of Big Data both relies on and produces a form of hyperindividualism of an unprecedented scope and scale. The hyperindividualization of Big Data results, first, from the hyperdisaggregation of data fields in what Kitchin (2014: 2) describes as the production of “massive, dynamic flows of diverse, fine-grained, relational data” recording and counting, for example, Internet transactions, selected words within social media posts, demographic “variables,” real-time spatiotemporal registers, and so on, where the identity or meaning of each data point is self-evidently and inherently given as a thing-in-itself divorced from its context. That hyperindividualization permits, second, the reaggregation and intercorrelation of data observations to construct new observations and “facts,” the meaning of which is based on, imposed by, and imputed from the discursive categorical labels in the data table rather than from the meaning residing in the lived experience of the original units of observation.
 
Consideration of Big Data’s ontology of hyperindividualism moves beyond epistemological debates over definitions, categorizations, data collection methods, and data accuracy. The interrogation of such matters derives from an internal critique of Big Data’s ontological framework while adopting and remaining within its ontological assumptions and focusing on problems of operationalization and implementation, that is, on problems of method (Lake, 2014). Motivating such internal critique is the belief that better (i.e. more accurate, consistent, objective, or comprehensive) methods of data collection, aggregation, and analysis will produce better knowledge. Beyond merely addressing internal operational mechanics, however, internecine conflicts over the “how” of Big Data have constitutive effects. By performing and naturalizing Big Data’s ontological assumptions, debates over what gets counted, through what methods, via what algorithms (Kwan, 2016), and despite what omissions and (mis)categorizations reproduce its foundational premises while deflecting attention away from a critical assessment of those underlying principles (Zaloom, 2003). The practice of Big Data governed by an ontology of hyperindividualism is also constitutive of that ontology, naturalizing and diffusing it through practices of governance and, from there, throughout myriad dimensions of everyday life. The challenge for governance is that problems inherent in the ontology underlying a practice cannot be resolved by altering the practice but must be addressed at the level of foundational ontological assumptions. Changing those ontological assumptions, however, destabilizes the entire edifice of practice built up on the prior underlying foundation that allowed the politicization of data construction to proceed in the first place. As Garfinkel observed, there are often “‘good’ organizational reasons for ‘bad’ clinical records” (Garfinkel, 1967: 186). 
Resistance to change on the part of interests invested in those current practices (e.g. the police or the mayor) all but guarantees the preservation of the status quo.
 
My purpose in this paper, accordingly, is to consider the implications for governance of Big Data’s ontology of hyperindividualism. Rather than taking Big Data’s ontological assumptions as the starting point of the analysis, however, my concern is to sketch a brief genealogical account of their emergence. A genealogical narrative understands practices (and their consequences) as situated in the confluence of the circumstances from which they emerged (Foucault, 1984; Hacking, 1991; Nietzsche, 1913). “History matters,” Trevor Barnes (2013: 298) reminds us, but, unlike history’s search for origins or causes, a genealogical approach problematizes the given-ness of Big Data’s ontological premises by unraveling and exposing their contingent emergence. Focusing on emergence rather than origins helps, as Jane Bennett (2001: 11) observes, to “counter the teleological tendency of one’s thoughts.” For Colin Koopman:
 
Genealogical problematization … provokes a question by rendering the inevitable contingent …. A genealogy also shows us how that which we took to be inevitable was contingently composed. A genealogy does not just show us that our practices in the present are contingent rather than necessary, for it also shows how our practices in the present contingently became what they are. The history of that which was once presumed inevitable not only makes us forget the inevitability, it also provides us with the materials we would need to transformatively work on that which we had taken to be a necessity. (Koopman, 2011: 545)
 
In the remainder of this paper, therefore, I explicate Big Data’s ontology of hyperindividualism as a radical extension of atomistic liberal individualism and I contrast it to a coconstitutive ontology that prioritizes relationality, context, and interdependence. I then situate the ontology of hyperindividualism in the longue durée of its genealogical emergence, drawing primarily from Patrick Joyce’s (2003) history of 19th-century liberalism and John Dewey’s (1929, 1935) pragmatist account of individualism, liberalism, and social action. In the concluding section of the paper, I consider the implications for governance of Big Data’s ontological politics of hyperindividualism. While Big Data’s hyperindividualist ontology extends throughout its applications in information technology, I focus here on the ways in which that foundational ontology affects the definition of urban problems, the dynamics of urban politics, and the practice of urban governance in the age of Big Data.