16 January 2015

Metadata

In a study sponsored by the US Office of the Director of National Intelligence (ODNI), the US National Research Council has argued that
No software-based technique can fully replace the bulk collection of signals intelligence, but methods can be developed to more effectively conduct targeted collection and to control the usage of collected data. Automated systems for isolating collected data, restricting queries that can be made against those data, and auditing usage of the data can help to enforce privacy protections and allay some civil liberty concerns.
The study reflects Presidential Policy Directive 28 of January 2014 regarding U.S. signals intelligence practices. The Directive instructed ODNI to produce a report within one year "assessing the feasibility of creating software that would allow the intelligence community more easily to conduct targeted information acquisition rather than bulk collection." ODNI commissioned the Research Council (i.e. the operating arm of the National Academy of Sciences and National Academy of Engineering) to conduct a study.

Committee chair Robert F. Sproull, former director of Oracle’s Sun Labs, comments
“From a technological standpoint, curtailing bulk data collection means analysts will be deprived of some information. It does not necessarily mean that current bulk collection must continue. A reduction in bulk collection can be partially mitigated by improving targeted collection, and technologies can improve oversight and transparency and help reduce the conflict between collection and privacy.”
The 86-page report [PDF] defines “collection” as the process of extracting data from a source, filtering it according to some criteria, and storing the results. If a significant portion of the collected data is not associated with current targets or subjects of interest in an investigation, it is considered bulk; otherwise, it is targeted. The report notes that the committee was not asked to consider, and did not consider, whether the loss of effectiveness from reducing bulk collection would be too great, or whether the potential gain in privacy from adopting an alternative collection method is worth the potential loss of intelligence information.

It should accordingly not be misread in relation to the often uninformed and highly polemical debate about Australian metadata retention proposals.
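
The report's definition of collection (extract, filter, store) and its bulk/targeted distinction can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Record` type, the `collect` and `classify` functions, and in particular the `threshold` value, since the report speaks only of "a significant portion" and does not quantify it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Record:
    identifier: str  # hypothetical selector, e.g. an account or phone number
    payload: str


def collect(source: Iterable[Record],
            criteria: Callable[[Record], bool]) -> List[Record]:
    """Extract records from a source, filter them, and store the results."""
    return [record for record in source if criteria(record)]


def classify(collected: List[Record], targets: Set[str],
             threshold: float = 0.5) -> str:
    """Label a collection 'bulk' or 'targeted'.

    The threshold is an assumption: the report does not say what
    fraction counts as 'a significant portion'.
    """
    if not collected:
        return "targeted"  # nothing stored, so nothing unrelated to targets
    unrelated = sum(1 for r in collected if r.identifier not in targets)
    return "bulk" if unrelated / len(collected) >= threshold else "targeted"


if __name__ == "__main__":
    source = [Record("a", "m1"), Record("b", "m2"),
              Record("c", "m3"), Record("d", "m4")]
    targets = {"a"}
    print(classify(collect(source, lambda r: True), targets))
    print(classify(collect(source, lambda r: r.identifier in targets), targets))
```

On this reading the same source can yield either kind of collection; what differs is the filtering criteria applied before storage.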

The Committee considers that a key value of bulk collection is its record of past signals intelligence that may be relevant to subsequent investigations. Other sources of information (for example, metadata held by third parties such as communications providers) might provide a partial substitute for bulk collection in some circumstances. Improving the relevance of collected information to future investigations could also be achieved with new approaches to targeting.
Rapidly updating filtering criteria to include new targets as they are discovered will help collect data that would otherwise be lost, and if done quickly enough and well enough, bulk information about past events may not be needed. However, targeted collection cannot substitute for bulk collection if past events were unique or if the delay in collecting the new information is too long. 
The Committee argues that
As an alternative to controlling the collection of data, automated controls on the use of collected data can help to protect the privacy of people who are not subjects of investigation, the committee found. The report describes three key technical elements required to control and automate usage: isolating bulk data so that it can be accessed only in specific ways; restricting the types of queries that can be made against stored data; and auditing the queries that have been done. The way these controls work can be made public without revealing sensitive data, so that outside inspectors can verify that the intelligence community has and abides by adequate procedures to protect privacy. While some of the necessary technologies to enhance targeted collection or improve automated usage controls require further research and development, some of the techniques are already in use in the intelligence community or in private companies, some have been demonstrated in research laboratories, and many are feasible to deploy within the next five years. Automating usage controls will be easier if the rules governing collection and use are technology-neutral and based on a consistent set of definitions.
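
The three usage controls named above (isolation, query restriction, auditing) might look something like the following minimal Python sketch. It is not drawn from the report and does not describe any system actually in use; the `ControlledStore` class, the idea of an approved-selector list, and all other names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


class QueryNotPermitted(Exception):
    """Raised when a query falls outside the approved set."""


@dataclass(frozen=True)
class AuditEntry:
    timestamp: datetime
    analyst: str
    selector: str
    permitted: bool


class ControlledStore:
    def __init__(self, records: Dict[str, List[str]],
                 approved_selectors: Iterable[str]) -> None:
        # Isolation: the data is reachable only through query().
        self._records = {key: list(value) for key, value in records.items()}
        # Restriction: only approved selectors may be queried (an assumption).
        self._approved = set(approved_selectors)
        self._audit: List[AuditEntry] = []

    def query(self, analyst: str, selector: str) -> List[str]:
        permitted = selector in self._approved
        # Auditing: every attempt is logged, whether or not it is allowed.
        self._audit.append(
            AuditEntry(datetime.now(timezone.utc), analyst, selector, permitted))
        if not permitted:
            raise QueryNotPermitted(f"selector {selector!r} is not approved")
        return list(self._records.get(selector, []))

    def audit_log(self) -> Tuple[AuditEntry, ...]:
        # Read-only view for an outside inspector.
        return tuple(self._audit)
```

The rules in such a design (which selectors are approved, what gets logged) could be published without exposing the stored records, which is the property the report relies on for outside verification.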
Given the Committee's charter, the report unsurprisingly concludes
Ultimately, the decision to deploy any given technology is a policy question that requires determining whether increased effectiveness and apparent transparency are worth the cost in equipment, labor, and potential interference with the intelligence mission. Such discussions were beyond the scope of this report.