26 October 2018

Universities, Grey Data and Privacy

'Open Data, Grey Data, and Stewardship: Universities at the Privacy Frontier' by Christine L Borgman in (2018) 33 Berkeley Technology Law Journal 365 comments 
As universities recognize the inherent value in the data they collect and hold, they encounter unforeseen challenges in stewarding those data in ways that balance accountability, transparency, and protection of privacy, academic freedom, and intellectual property. Two parallel developments in academic data collection are converging: (1) open access requirements, whereby researchers must provide access to their data as a condition of obtaining grant funding or publishing results in journals; and (2) the vast accumulation of “grey data” about individuals in their daily activities of research, teaching, learning, services, and administration. The boundaries between research and grey data are blurring, making it more difficult to assess the risks and responsibilities associated with any data collection. Many sets of data, both research and grey, fall outside privacy regulations such as HIPAA, FERPA, and PII. Universities are exploiting these data for research, learning analytics, faculty evaluation, strategic decisions, and other sensitive matters. Commercial entities are besieging universities with requests for access to data or for partnerships to mine them. The privacy frontier facing research universities spans open access practices, uses and misuses of data, public records requests, cyber risk, and curating data for privacy protection. This Article explores the competing values inherent in data stewardship and makes recommendations for practice by drawing on the pioneering work of the University of California in privacy and information security, data governance, and cyber risk.
Borgman concludes
Universities are as enamored of “big data” as other sectors of the economy and are similarly effective in exploiting those data to competitive advantage. They have privileged access to research data and to data about their communities, all of which can be mined and combined in innovative ways. Universities also have a privileged social status as guardians of the public trust, which carries additional responsibilities in protecting privacy, academic and intellectual freedom, and intellectual property. They must be good stewards of the data entrusted to them, especially when conflicts arise between community practices and values. For some kinds of data, good stewardship requires that access to data be sustained indefinitely, and in ways that those data can be reused for new purposes. For other kinds of data, good stewardship requires that they be protected securely for limited periods of time and then destroyed. Factors that distinguish data worth keeping or discarding vary widely by domain, content, format, funding source, potential for reuse, and other circumstances. Criteria for data protection and access also can change over time, whether due to different uses of a data collection, such as grey data being mined for research or research data being deployed for operations; transfer of stewardship within and between institutions; changes in laws and policies; or new externalities. 
The rate of data collection has grown exponentially over the last decade through both research and grey data within universities, along with data collection in the other economic sectors with which universities partner. These include government and business, social media, sensor networks, the Internet of Things, and much more. As the ability to mine and combine data improves, and technologies become more interoperable, the boundaries between data types and origins continue to blur. Responsibilities for stewardship and exposure to cyber risk increase accordingly. Risks of privacy invasion, both information privacy and autonomy privacy, accelerate as most of these data can be associated with individuals, whether as content or creators of data. Anonymity, which is fundamental to most methods of privacy protection, has become extremely difficult to sustain as methods of re-identifying individuals become more sophisticated. Notice and informed consent remain necessary but are far from sufficient for maintaining privacy in universities or in other sectors. 
Open access to publications and to data are social policies that promote transparency and accountability in the research enterprise. Adoption is uneven because costs, benefits, and incentives for open access, especially to data, are aligned in only a few fields and domains. For most researchers, releasing data involves considerable costs, with benefits going to others. These costs may include curation (e.g., providing metadata, documentation, and records of provenance and licensing), computer storage and maintenance, software acquisition and maintenance, migration to new software and hardware, and fees for data deposit. Disposal of data also involves costs to assess what to keep and what to discard, and to ensure safe destruction of confidential or proprietary materials. Individual researchers, their employers, or their funders may bear the costs of data stewardship and responsibilities for protecting privacy, academic and intellectual freedom, intellectual property, and other values. 
None of these frontier challenges is easily addressed, nor will appropriate responses be consistent across the university sector in the U.S., much less in other countries and cultures. Data are valuable institutional assets, but they come at a price. Individuals and institutions must be prepared to protect the data they collect. These recommendations, which draw heavily on experiences in the University of California, are offered as starting points for discussion. 
A. Begin with First Principles 
Universities should focus on their core missions of teaching, research, and services to address priorities for data collection and stewardship. Tenets of privacy by design, the Code of Fair Information Practice, the Belmont Report, and codifications of academic and intellectual freedom are established and tested. Implementation is often incomplete, however. For faculty, students, staff, research subjects, patients, and other members of the university community to enjoy protection of information privacy and autonomy privacy, more comprehensive enforcement of principles such as limiting data collection, ensuring data quality, and constraining the purposes for each data element is necessary. Digital data do not survive by benign neglect, nor are records destroyed by benign neglect. Active management is necessary. Notice and consent should never be implicit. When institutions ask for permission to acquire personal data, are transparent, and are accountable for uses of data, they are more likely to gain respect in the court of public opinion. 
B. Embed the Ethic 
Data practices, privacy, academic and intellectual freedom, intellectual property, trust, and stewardship are all moving targets. Principles live longer than do the practices necessary to implement those principles. Universities are embedding data science and computational thinking into their curricula at all levels. This is an opportune moment to embed data management, privacy, and information security into teaching and practice as well. Encouraging each individual to focus on uses of data makes the problem personal. Rather than collecting all data that could conceivably be collected, and exploiting those data in all conceivable ways, encourage people to take a reflective step backwards. Consider the consequences of data collection about yourself and others, and how those data could be used independently or when aggregated with other data, now and far into the future. Think about potential opportunities and risks, for whom, and for how long. Study data management processes at all levels and develop best practices. Collect data that matter, not just data that are easy to gather. Interesting conversations should ensue. The Golden Rule still rules. 
C. Promote Joint Governance 
The successes of the University of California in developing effective principles for governing privacy and information security have resulted from extensive deliberations among faculty, administrators, and students. These can be long and arduous conversations to reach consensus, but they have proven constructive in creating communication channels and building trust. Many years of conversations about information technology policy at UCLA, for example, have resulted in much deeper understanding between the parties. Faculty have learned to appreciate the challenges faced by administrators who need to balance competing interests, keep systems running, and pay for infrastructure out of fluctuating annual budgets. Administrators, in turn, have learned to appreciate the challenges faced by faculty who have obligations to collaborators, funding agencies, and other partners scattered around the world, and daily obligations to support students who have disparate skills and access to disparate technologies. Institutional learning is passed down through generations of faculty, students, and administrators through joint governance processes. These mechanisms are far from perfect and can be slow to respond to the pace of technological change. However, echoing Churchill's assessment of democracy, joint governance works better than any other system attempted to date. 
D. Promote Awareness and Transparency 
The massive data breaches of Equifax, Target stores, J.P. Morgan Chase, Yahoo, the National Security Agency, and others have raised community awareness about data tracking, uses of those data by third parties, and the potential for exposure. This is an ideal time to get the community’s attention about opportunities and risks inherent in data of all kinds. Individuals, as well as institutions, need to learn how to protect themselves and where to place trust online. People may react in anger if they suspect that personal data are being collected without notice and consent or think they are being surveilled without their knowledge. Universities are at no less cyber risk than other sectors but are still held to higher standards for the public trust. They have much to lose when that trust is undermined. 
E. Do Not Panic 
Panic makes people risk-averse, which is counterproductive. Locking down all data lest they be released under open access regulations, public records requests, or breaches will block innovation and the ability to make good use of research data or grey data. The opportunities in exploiting data are only now becoming understood. Balanced approaches to innovation, privacy, academic and intellectual freedom, and intellectual property are in short supply. Patience and broad consultation of stakeholders are needed.
'Achieving big data privacy in education' by Joel R. Reidenberg and Florian Schaub in (2018) 16(2) Theory and Research in Education 263-279 comments
Education, Big Data, and student privacy are a combustible mix. The improvement of education and the protection of student privacy are key societal values. Big Data and Learning Analytics offer the promise of unlocking insights to improving education through large-scale empirical analysis of data generated from student information and student interactions with educational technology tools. This article explores how learning technologies also create ethical tensions between privacy and the use of Big Data for educational improvement. We argue for the need to demonstrate the efficacy of learning systems while respecting privacy and how to build accountability and oversight into learning technologies. We conclude with policy recommendations to achieve these goals.
The authors comment
The improvement of education and the protection of student privacy are key societal values. On one side, Big Data offers the promise of unlocking insights to improving education through large-scale empirical analysis of data generated from student information and student interactions with educational technology tools (O'Brian, 2014). As the Data Quality Campaign (2017) has articulated, 'data is one of the most powerful tools to inform, engage, and create opportunities for students along their education journey – and it's much more than test scores. Data helps us make connections that lead to insights and improvements'. But, at the same time, privacy of student information is important for education because of the adverse impact that inappropriate uses or disclosures may have on student learning and social development. In addition, fear of surreptitious monitoring of every mouse click and page load can create chilling effects, or possibly affect students' well-being by amplifying performance-related stress, in ways that are detrimental to the educational mission, as well as the goals of Big Data use in education. Algorithmic assessment and decision making may disadvantage certain learners, due to biased data or algorithms (Harel Ben Shahar, 2017) or by emphasizing indicators of learning success that undermine individuality in education (Clayton and Halliday, 2017) rather than engaging with individual students to jointly define what success may mean for them (Dishon, 2017). Big Data in education may further curtail opportunities for self-discovery by charting a path for learners personalized to their predicted aptitude instead of allowing learners to chart their own paths (Schouten, 2017). The mass collection and centralization of student information pose significant threats to student privacy (e.g. see Reidenberg et al., 2013; Reidenberg and Debelak, 2009) and raise questions about data ownership and consent (Lynch, 2017). 
These conflicts are not insurmountable as long as Big Data tools in education are developed with consideration for moral and ethical detail (Ben-Porath and Harel Ben Shahar, 2017). 
In this article, we focus on adequate safeguards for privacy in the context of Big Data in education. While recognizing that missions of education and educational institutions may change as a result of technological innovation in education, we argue that privacy safeguards will have to be provided for learners regardless of changes to the educational landscape. We focus in particular on K-12 and higher education institutions but expect that our arguments may also be applicable to non-traditional and non-institutional forms of education, such as online educational offerings. Expanding on prior work on privacy and ethical considerations for the treatment of student data, we argue that the privacy safeguards will need to be developed through technological tools, organizational approaches, and law. We further argue that the safeguards will only be successful if these various means are combined. 
We discuss specific safeguards in each category. We argue that technological measures should provide transparency about data uses, provide accountability for algorithmic decisions, and ensure the security of learners' data. Such technological measures need to be complemented by organizational safeguards that appropriately limit access to educational data and the outcomes of educational data mining. Furthermore, we make the case that Big Data technologies in education should be assessed holistically with respect to both their educational impact, that is, their pedagogical benefits, and their impact on student privacy. With respect to law, we argue that the definition of the educational record needs to be broadened to encompass learning analytics data and thus restrict processing to 'legitimate educational uses'. We further argue that the precautionary principle should be applied to educational Big Data tools, thus requiring the assessment of potential harms when such systems are introduced into the educational process. Finally, given that public school systems and universities are major clients of educational technology vendors, public procurement criteria can provide the leverage to assure that appropriate safeguards are integrated into big data technology for education.