Sharing Sensitive Data with Confidence: The Datatags System

Research report | Submission/proposal/advocacy/recommendation

Website/link: https://techscience.org/a/2015101601/

Published date: October 15, 2015

Author: Latanya Sweeney, Merce Crosas, and Michael Bar-Sinai

Subject tag: Data Access | Privacy and data protection

We introduce the notion of datatags as a means of identifying handling and access requirements for a file. Handling includes security features, such as the use of encryption in the storage and transmission of files. Access requirements for those receiving files include providing credentials and agreeing to terms of use. A datatags repository shares data having varying levels of sensitivity by assigning tags that encode varying levels of handling and sharing restrictions. Although there are thousands of data sharing laws and regulations, and numerous ways to specify security for any given file, the datatags approach reduces this complexity to a few well-defined choices. A datatags-compliant repository provably complies with the policies associated with the designated tag to make sure promised and legally necessary handling requirements are met. There are many possible ways to construct a datatags repository, and which one is best depends on use. We introduce a model set of six tags to support options from data having no risk to data requiring maximum protection. We use the set of model datatags to present exemplar architectures for research labs, research repositories, government repositories, multinational corporations, and institutional review boards. We show implementation details for medical data, provide an interview system for tagging medical and educational data, and demonstrate how to construct a global research repository. Finally, decision makers and scholars can use a datatags repository, even without access to data, to study, compare, and analyze data sharing regimes.
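
As a rough illustration of the idea in the abstract, a tag can be pictured as a small record that bundles a sensitivity level with handling requirements (encryption in storage and transmission) and access requirements (credentials, terms of use). The sketch below is hypothetical: the names `Sensitivity`, `DataTag`, and `may_release`, the placeholder level names, and the two example policies are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Six ordered levels. Names are placeholders, not the paper's tag names."""
    LEVEL_0 = 0  # data having no risk
    LEVEL_1 = 1
    LEVEL_2 = 2
    LEVEL_3 = 3
    LEVEL_4 = 4
    LEVEL_5 = 5  # data requiring maximum protection


@dataclass(frozen=True)
class DataTag:
    """Hypothetical record pairing a level with its requirements."""
    level: Sensitivity
    encrypt_at_rest: bool        # handling: encryption in storage
    encrypt_in_transit: bool     # handling: encryption in transmission
    requires_credentials: bool   # access: recipient must present credentials
    requires_terms_of_use: bool  # access: recipient must agree to terms


def may_release(tag: DataTag, has_credentials: bool, accepted_terms: bool) -> bool:
    """Return True only if the recipient meets every access requirement of the tag."""
    if tag.requires_credentials and not has_credentials:
        return False
    if tag.requires_terms_of_use and not accepted_terms:
        return False
    return True


# Two assumed example policies at the extremes of the scale.
open_tag = DataTag(Sensitivity.LEVEL_0, False, False, False, False)
strict_tag = DataTag(Sensitivity.LEVEL_5, True, True, True, True)
```

Under these assumptions, `may_release(open_tag, False, False)` is true, while `strict_tag` is released only when the recipient has both credentials and accepted terms. How the intermediate levels map to requirements is left open here because the abstract does not specify it.
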
[This entry was sourced with minor edits from the Carnegie Endowment’s Partnership for Countering Influence Operations and its baseline datasets initiative. You can find more information here: https://ceip.knack.com/pcio-baseline-datasets]