Many consumers have been loath to examine too closely the price we pay, in terms of forfeiting control of our personal data, for all the convenience, communication, and fun of a free-ranging and mostly free cyberspace. We are vaguely aware that cookies attach to us wherever we go, tracking our every click and view. We tell Trip Advisor our travel plans, open our calendars to Google Now, and post our birthdays on Facebook. We broadcast pictures of our newborns on Instagram; ask questions about intimate medical conditions on WebMD; and inform diet sites what we ate that day and how long we spent at the gym. Google Maps, Twitter and Four Square know where we are. Uber, Capital BikeShare, and Metro’s trip planner know where we’re going and how we plan to get there.
We spew data every minute we walk the street, park our cars, or enter a building – the ubiquitous CCTV and security cameras blinking prettily in the background – every time we go online, use a mobile device, or hand a credit card to a merchant who is online or on mobile. We spend most of our days, and a good deal of our nights, surfing the web, tapping at apps, or powering on our smart phones, constantly adding to the already bursting veins from which data miners are pulling pure gold. That’s where the “big” in “big data” comes from.
We send our digital information out into cyberspace and get back access to the magic of our wired lives. We sense this, but it took Snowden to make concrete what exactly the exchange means – that firms or governments or individuals, without our knowledge or consent, and often in surprising ways, may amass private information about us to use in a manner we don’t expect or understand and to which we have not explicitly agreed.
It is disconcerting to face how much of our privacy we have already forfeited. But with that knowledge comes power – the power to review, this time with eyes wide open, what privacy means – or should mean – in the age of the Internet. I believe that’s what President Obama meant last week when he called for a “national conversation…about the general problem of these big data sets because this is not going to be restricted to government entities.”
I’d like to pose two questions that are key to getting this conversation going, and then spend some time today trying to answer them. First, what are the major challenges to privacy posed by big data, particularly in its use in the commercial arena? And second, what steps can we take to meet these challenges? ....
We are awash in data. Estimates are that 1.8 trillion gigabytes of data were created in the year 2011 alone – that’s the equivalent of every U.S. citizen writing 3 tweets per minute for almost 27,000 years. Ninety percent of the world’s data, from the beginning of time until now, has been generated over the past two years, and it is estimated that that total will double every two years from now on. As the costs of storing data plummet and massive computing power becomes widely available, crunching large data sets is no longer the sole purview of gigantic companies or research labs. As Schonberger-Mayer and Cukier write, big data has become democratized.
First Challenge: the Fair Credit Reporting Act
This astounding spread of big data gives birth to its first big challenge: how to educate the growing and highly decentralized community of big data purveyors about the rules already in place governing the ways certain kinds of data can be used. For instance, under the Fair Credit Reporting Act, or “FCRA,” entities collecting information across multiple sources and providing it to those making employment, credit, insurance and housing decisions must do so in a manner that ensures the information is as accurate as possible and used for appropriate purposes.
The Federal Trade Commission has warned marketers of mobile background and criminal screening apps that their products and services may come under the FCRA, requiring them to give consumers notice, access, and correction rights. We’ve also entered into consent decrees that allow us to monitor the activities of other apps and online services that have similarly wandered into FCRA territory. But while we are working hard to educate online service providers and app developers about the rules surrounding collecting and using information for employment, credit, housing, and insurance decisions, it is difficult to reach all of those who may be – perhaps unwittingly – engaged in activities that fall into this category.
Further, there are those who are collecting and using information in ways that fall right on —or just beyond —the boundaries of FCRA and other laws. Take for example the new-fangled lending institutions that forgo traditional credit reports in favor of their own big-data-driven analyses culled from social networks and other online sources. Or eBureau, which prepares rankings of potential customers that look like credit scores on steroids. The New York Times describes this company as analyzing disparate data points, from “occupation, salary and home value to spending on luxury goods or pet food, … with algorithms that their creators say accurately predict spending.” These “e-scores” are marketed to businesses, which use them to decide to whom they will offer their goods and services and on what terms. It can be argued that e-scores don’t yet fall under FCRA because they are used for marketing and not for determinations on ultimate eligibility. But what happens if lenders and other financial service providers do away with their phone banks and storefronts and market their loans and other financial products largely or entirely online? Then, the only offers consumers will see may be those tailored based on their e-scores. ...
Second Challenge: Transparency
The second big challenge to big data is transparency. Consumers don’t know much about either the more traditional credit reporting agencies and data brokers or the newer entrants into the big data space. In fact, most consumers have no idea who is engaged in big data predictive analysis.
To their credit, some data brokers allow consumers to access some of the information in their dossiers, approve their use for marketing purposes, and correct the information for eligibility determinations. In the past, however, even well-educated consumers have had difficulty obtaining meaningful information about what the data brokers know about them. Just yesterday, “the big daddy of all data brokers”, Acxiom, announced that it plans to open its dossiers so that consumers can see the information the company holds about them. This is a welcome step. But since most consumers have no way of knowing who these data brokers are, let alone finding the tools the companies provide, the reality is that current access and correction rights provide only the illusion of transparency.
Third Challenge: Notice and Choice
A third challenge involves those aspects of big data to which the FCRA is irrelevant – circumstances in which data is collected and used for determinations unrelated to credit, employment, housing, and insurance, or other eligibility decisions. We need to consider these cases within the frameworks of the Federal Trade Commission Act, the OECD’s Fair Information Privacy Principles, and the FTC’s 2012 Privacy Report, for it is within those contexts we can see how big data is testing established privacy principles such as notice and choice. ....
Fourth Challenge: Deidentification
The final big challenge of big data that I would like to discuss is one that I’ve been assured by many of its proponents I shouldn’t strain too hard to solve – that of predictive analytics attaching its findings to individuals. Most data brokers and advertisers will tell you they are working with de-identified information, that is, data stripped of a name and address. And that would be great if we didn’t live in a world where more people know us by our user names than our given ones. Our online tracks are tied to a specific smartphone or laptop through UDIDs, IP addresses, “fingerprinting” and other means. Given how closely our smartphones and laptops are associated with each of us, information linked to specific devices is, for all intents and purposes, linked to individuals.
Furthermore, every day we hear how easy it is to reattach identity to data that has been supposedly scrubbed. In an analysis just published in Scientific Reports, researchers found that they could recognize a specific individual with 95 percent accuracy by looking at only four points of so-called “mobility data” tracked by recording the pings cell phones send to towers when we make calls or send texts. NSF-funded research by Alessandro Acquisti has shown that, using publicly available online data and off-the-shelf facial recognition technology, it is possible to predict – with an alarming level of accuracy – identifying information as private as an individual’s social security number from an anonymous snapshot.In response Brill says
So let’s turn to some ways to solve the challenges big data poses to meaningful notice and choice as well as transparency. A part of the solution will be for companies to build more privacy protections into their products and services, what we at the FTC call “privacy by design”. We have recommended that companies engage in cradle-to-grave review of consumer data as it flows through their servers, perform risk assessments, and minimize and deidentify data wherever possible. Mayer-Schonberger and Cukier have helpfully called for the creation of “algorithmists” – licensed professionals with ethical responsibilities for an organization’s appropriate handling of consumer data. But the algorithmist will only thrive in an environment that thoroughly embraces “privacy by design,” from the C-suite to the engineers to the programmers.
And unfortunately, even if the private sector embraces privacy by design and we license a cadre of algorithmists, we will not have met the fundamental challenge of big data in the marketplace: that is, consumers’ loss of control of their most private and sensitive information.
Changing the law would help. I support legislation that would require data brokers to provide notice, access, and correction rights to consumers scaled to the sensitivity and use of the data at issue. For example, Congress should require data brokers to give consumers the ability to access their information and correct it when it is used for eligibility determinations, and the ability to opt-out of information used for marketing.
But we can begin to address consumers’ loss of control over their most private and sensitive information even before legislation is enacted. I would suggest we need a comprehensive initiative – one I am calling “Reclaim Your Name.” Reclaim Your Name would give consumers the knowledge and the technological tools to reassert some control over their personal data – to be the ones to decide how much to share, with whom, and for what purpose – to reclaim their names.
Reclaim Your Name would empower the consumer to find out how brokers are collecting and using data; give her access to information that data brokers have amassed about her; allow her to opt-out if she learns a data broker is selling her information for marketing purposes; and provide her the opportunity to correct errors in information used for substantive decisions – like credit, insurance, employment, and other benefits.
Over a year ago, I called on the data broker industry to develop a user-friendly, one-stop online shop to achieve these goals. Over the past several months, I have discussed the proposal with a few leaders in the data broker business, and they have expressed some interest in pursuing ideas to achieve greater transparency. I sincerely hope the entire industry will come to the table to help consumers reclaim their names.
In addition, data brokers that participate in Reclaim Your Name would agree to tailor their data handling and notice and choice tools to the sensitivity of the information at issue. As the data they handle or create becomes more sensitive – relating to health conditions, sexual orientation, and financial condition – the data brokers would provide greater transparency and more robust notice and choice to consumers. The credit reporting industry has to do its part, too. There are simply too many errors in traditional credit reports. The credit bureaus need to develop better tools to help consumers more easily obtain and understand their credit reports so they can correct them. I have asked major credit reporting agencies to improve and streamline consumers’ ability to correct information across multiple credit reporting agencies.