'Data is Infrastructure' by Elettra Bietti in (2024) Theoretical Inquiries in Law comments:
Data is a contextual phenomenon. It reflects the social and material context from which it is derived and in which it is generated. It embeds the purposes, assumptions and rationales of those who produce, collect, use, share and monetize it. In the AI and digital platform economy, data's role is primarily infrastructural. Its core uses are internal to companies. Data only rarely serves as a medium of exchange or commodity, and more frequently serves to profile users, train models, produce predictions, bundle and extend product capabilities which in turn are sold to advertisers and other customers. Insofar as they focus on the former, many technical, economic and legal attempts at defining data have inspired reductive policy efforts that include data protection, data ownership and limited data sharing remedies. This paper argues that understanding data as part of infrastructural pipelines can have significant conceptual and policy implications, and can redirect the way privacy, property and antitrust experts understand and govern data. This argument becomes more salient as market actors and regulators grapple with the catalyzing effects of neural networks and generative AI models on digital markets. In antitrust and competition law especially, regulators are consciously adopting a view of data as an infrastructural input into AI and other digital markets. Treating data as an input over which certain firms have competitive advantages can have significant implications for nascent AI markets, and yet the views in antitrust remain too narrow. Understanding data infrastructurally means viewing it not only as a critical input but also as inseparable from other material digital resources such as protocols, algorithms, semiconductors, and platform interfaces; as having important collective functions; and as calling for public interest regulation. 
Understanding data as infrastructure can move us past limited legal efforts and remedial solutions such as data separations, data sharing, and individual controls, and help reorient how data is produced, stored and managed toward public uses.