05 December 2013

Profiling and Big Data

US test prep service Kaplan reports that
The percentages of college admissions officers who say they have Googled an applicant (29%) or visited an applicant’s Facebook or other social networking page to learn more about them (31%) have risen to their highest levels yet, according to Kaplan Test Prep’s 2013 survey of college admissions officers. When Kaplan first began tracking this issue in 2008, barely 10% of admissions officers reported checking an applicant’s Facebook page. Last year, 27% had used Google and 26% had visited Facebook — up from 20% and 24%, respectively, in 2011. ....
Despite the growth in online checking, however, there’s been a dip — to 30% this year from 35% in Kaplan’s 2012 survey — in the number of admissions officers reporting that they’re finding something that negatively impacted an applicant’s admissions chances. And notably, in a separate survey of college-bound students, more than three-quarters said they would not be concerned if an admissions officer Googled them. In response to the question, “If a college admissions officer were to do an online search of you right now, how concerned would you be with what they found negatively impacting your chances of getting in?” 50% said they would be “Not at all concerned” while 27% said “Not too concerned.” Only 14% of students said they would be “Very concerned” while the remainder said they would be “Somewhat concerned.”
Alas, no indication of why they would or wouldn't be concerned.

The Kaplan media release states that “Many students are becoming more cautious about what they post, and also savvier about strengthening privacy settings and circumventing search” and that its survey showed that
  • 22% had changed their searchable names on social media, 
  • 26% had untagged themselves from photos, 
  • 12% had deleted their social media profiles altogether.
Kaplan's advice is for students to
run themselves through online search engines on a regular basis to be aware of what information is available about them online, and know that what’s online is open to discovery and can impact them ... Sometimes that impact is beneficial, if online searches turn up postings of sports scores, awards, public performances or news of something interesting they’ve undertaken. But digital footprints aren’t always clean, so students should maintain a healthy dose of caution, and definitely think before posting.
Meanwhile the Financial Times reports (nothing like a little more promo from a subject) that
Nearly 900m internet users were tracked by hundreds of third-party internet and advertising companies when they visited pornographic sites this summer.
The claim is made by 'tracking blocker' Ghostery, described by the FT as “a company that monitors online tracking” and lauded on Ghostery's own site as “a Web tracking blocker that actually helps the ad industry”, consistent with its ownership by Evidon, formerly known as “The Better Advertising Project.”

The FT states that
Those tracking details can include information about the URL of the site, how long a person stays on the site and how many clicks they make there. ...
Privacy advocates fear that details about visits to adult-oriented sites could be incorporated into the vast dossiers that internet, advertising and data companies create about individuals, and are used to tailor the ads and content people see, among other purposes. Porn sites are estimated to make up at least 15 per cent of the internet.
A credit card company, for instance, could choose not to target ads to a person who frequently visits porn sites, judging them to be a higher risk customer. A gambling operation, meanwhile, could target more ads to people who spend hours visiting adult sites, considering them to have more addictive tendencies.
While some companies said that browsing behaviour from adult-oriented sites is not used to determine what ads people see, several internet and advertising companies’ privacy policies do not explicitly bar the practice. Regulations and the industry’s self-regulatory guidelines also do not prohibit the tracking or use of data related to a person’s interest or participation in adult entertainment.
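For readers wondering how that tracking works mechanically: when a page embeds a third-party script or 'tracking pixel', the visitor's browser makes a request to the tracker, and that request carries the page's URL in the Referer header along with any identifying cookie the tracker set earlier — enough to tie a visit to an adult site to an existing profile. Here is a minimal sketch in Python; the domain, cookie name and endpoint (tracker.example, uid, pixel.gif) are made-up illustrations of the general mechanism, not any particular company's system.

# Sketch of a third-party "tracking pixel" endpoint (illustrative only:
# tracker.example, the uid cookie and pixel.gif are invented names).
# A page embeds <img src="https://tracker.example/pixel.gif">; the browser's
# request to the tracker then leaks the page URL (Referer) and a persistent cookie ID.
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

# 1x1 GIF - the classic tracking-pixel payload
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
         b"!\xf9\x04\x01\x00\x00\x00\x00"
         b",\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;")

class PixelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        referer = self.headers.get("Referer", "-")   # the page the visitor was on
        cookie = self.headers.get("Cookie", "")
        # Re-use the visitor's existing ID if the cookie came back, else mint one.
        visitor_id = (cookie.split("uid=")[-1].split(";")[0]
                      if "uid=" in cookie else str(uuid.uuid4()))

        # In a real tracker this line would be a write into a profile database.
        print(f"visitor={visitor_id} was at {referer}")

        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        # Setting the cookie again means the same ID comes back from every site
        # that embeds the pixel, which is what links scattered visits into a profile.
        self.send_header("Set-Cookie", f"uid={visitor_id}; Max-Age=31536000")
        self.end_headers()
        self.wfile.write(PIXEL)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), PixelHandler).serve_forever()

The same identifier returning from every embedding site is what turns individual page views into the 'vast dossiers' the privacy advocates worry about.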
'Big Data's Other Privacy Problem' by James Grimmelmann in Big Data and the Law (West Academic, 2014) comments that
Big Data has not one privacy problem, but two. We are accustomed to talking about surveillance of data subjects. But Big Data also enables disconcertingly close surveillance of its users. The questions we ask of Big Data can be intensely revealing, but, paradoxically, protecting subjects' privacy can require spying on users. Big Data is an ideology of technology, used to justify the centralization of information and power in data barons, pushing both subjects and users into a kind of feudal subordination. This short and polemical essay uses the Bloomberg Terminal scandal as a window to illuminate Big Data's other privacy problem.
Grimmelmann notes that
We are accustomed to speaking about Big Data’s privacy concerns in terms of the surveillance it enables of data subjects. Anyone high enough to take a ten-thousand-foot view can see over fences. Take a wide-angle shot, zoom and enhance, and you have a telephoto close-up. But consider now the user of the Bloomberg terminal, zipping from function to function, running down a hunch and preparing to make a killing. Perhaps he correlates historical chart data for energy-sector indices with news reports on international naval incidents in the Pacific Rim. He pulls patterns out of after-hours trading data, checking them against SEC filings and earnings calls. He has a theory, about what happens when certain shipbuilders report their quarterlies—two usually-coupled bond funds briefly diverge—and he stands ready to pocket some cash the next time it happens by exploiting this informational advantage with overwhelming financial force. Tell him that someone has been watching every keystroke, and you will see the blood drain from his face. .....
There is another way of understanding the relationship between Big Data subjects and Big Data users. The fact that users also have privacy interests at stake complicates the project of protecting subject privacy. To understand the problem, it helps to understand something of the debate over what to do about safeguarding those whose personal information has been hoovered up at terabyte scale.
For a time, it appeared that no restrictions on use might be necessary because there were no data subject privacy interests at stake. Deidentification was the watchword of the day: it was thought that some simple scrubbing—stripping a dataset of names, ranks, and serial numbers—would render these data driftnets dolphin-safe. And the database wranglers would have gotten away with it, too, if it hadn’t been for those meddling computer scientists. Personal information always contains something unique. It expresses its singularity even in an IP address, and a very modest grade of data has in it something irreducible, which is one man’s alone. That something he may be reidentified from, unless there is a restriction in access to the database. Although there is a lively dispute about where to draw the balance between the needs of the many (as data subjects) and the needs of the many (as research beneficiaries), it is by now painfully clear that some such balance must be struck.
The next line of defense, implicit in the burgeoning discourse of Big Data boosterism, is that only incorruptible researchers who are pure of heart will be plowing through the piles of data in search of ponies. Epidemiologists are the poster children, perhaps because public health officials would never, ever jump to conclusions about poorly understood diseases sweeping through their communities. This ideal of a trusted elite priesthood of data analysts bears an uncanny similarity to National Rifle Association head Wayne LaPierre’s invocation of “good guys with guns.” When Big Data is outlawed, only outlaws will have Big Data. Actuaries and supply chain optimizers, perhaps, come close to this technocratic ideal. But Big Data today is probably better embodied by marketers and hedge-fund traders, two professions not known for their generous concern for human flourishing.
It is hard to feel sanguine about the Big Swinging Dicks who brought us the subprime financial Chernobyl or about ad men in the business of running A/B tests to optimize their manipulation of consumers’ cognitive biases. Any sufficiently advanced marketing technology is indistinguishable from blackmail. The global phishing industry shows what happens when confidence men scale up their scams. And all of this is to say nothing about Carnivore, Total Information Awareness, PRISM, EvilOlive, and the other ominously-named trappings of the National Surveillance State. Give the CIA six megabytes of metadata inadvertently emitted by the most honest of men, and it will find something in them to put him on the drone kill list. One might — as the Obama Administration asks — simply trust in the good faith and minimal competence of the Three Letter Agencies that brought us extraordinary rendition, COINTELPRO, and the Clipper Chip. Or, more realistically, one might question the wisdom of creating comprehensive fusion centers accessible to every vindictive cop with a score to settle.
Thus, since Big Data cannot be entirely defanged and its users cannot be entirely trusted, it becomes necessary to watch them at work. It seems like a natural enough response to the problem of the Panopticon. Subject privacy is at risk because Big Data users can hide in the shadows as they train their telescopes not on the stars but on their neighbors. And so we might say, turn the floodlights around: ensure that there are no dark corners from which to spy. We would demand audit trails—permanent, tamper-proof records of every query and computation.
But if we are serious about user privacy as well as about subject privacy, transparency is deeply problematic. The audit trails that are supposed to protect Big Data subjects from abuse are themselves a perfect vector for abusing Big Data users. Indeed, they are doubly sensitive, because they are likely to contain sensitive information about both subjects and users. The one-way vision metaphor of the Panopticon, then, is double-edged. Think about glasses. A common intuition is that mirrorshades are creepy, because the wearer can see what he chooses without revealing where his interest lies. Everyone is up in arms about the Google Glass-holes who wear them into restrooms. But the all-seeing Eye is a window to the soul. The Segway for your face is also a camera pointed directly at your brain that syncs all its data to the cloud. The assumption Glass users are making, presumably, is that no one else will have access to their data, and so no one else will be pondering what they’re pondering. But that’s what Bloomberg Terminal users thought, too.
This leaves meta-oversight: watching the watchmen. Audit trails don’t need to be public; access to them could be restricted to a small and specialized group of auditors. But this privacy epicycle introduces complications of its own. You have a security problem, so you audit your users. Now you have two security problems: you are committed to safeguarding and watching over not just your data, but your data about how your data is being used. Whoever looks through the logfiles will be able to gain remarkable insight into users’ methods and madnesses. Yes, the auditors will be looking for suspicious access patterns, but they’ll need to have access to the full, sensitive range of information. You wouldn’t want an insider trading scandal in which an auditor piggybacked on an analyst’s research, or an auditor who picks a favorite user and turns into a stalker. Your auditors, in other words, are also Big Data users, which means that they too will have to be audited. It’s watchmen all the way down.
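A technical footnote on those audit trails: 'permanent, tamper-proof records of every query' are usually built as an append-only log in which each entry folds in a hash of the previous entry, so that any later alteration breaks the chain. A rough Python sketch follows; the field names and example queries are invented for illustration, not drawn from Bloomberg's or anyone else's actual system.

# Sketch of a hash-chained (tamper-evident) audit trail, the kind of mechanism
# "permanent, tamper-proof records of every query" would require.
# Field names and the example queries are illustrative assumptions.
import hashlib
import json
import time

class AuditLog:
    def __init__(self):
        self.entries = []
        self._last_hash = "0" * 64  # genesis value for the first entry

    def record(self, user, query):
        entry = {
            "ts": time.time(),
            "user": user,
            "query": query,
            "prev": self._last_hash,
        }
        # Each entry's hash covers the previous entry's hash, so editing or
        # deleting any earlier entry breaks every hash that follows it.
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self._last_hash = entry["hash"]
        self.entries.append(entry)

    def verify(self):
        prev = "0" * 64
        for e in self.entries:
            body = {k: e[k] for k in ("ts", "user", "query", "prev")}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

if __name__ == "__main__":
    log = AuditLog()
    log.record("analyst_17", "historical chart data, energy-sector indices")
    log.record("analyst_17", "after-hours trading vs SEC filings")
    print(log.verify())                       # True
    log.entries[0]["query"] = "something innocuous"
    print(log.verify())                       # False: tampering is detectable

Which also makes Grimmelmann's point concrete: the log that makes tampering detectable is itself a complete, query-by-query record of what each user was thinking about, and so needs guarding, and auditing, of its own.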