Designing an Infrastructure for Sharing of Sensitive, Massive Data

Best practice guide | Research report


Published date: January 1, 2019

Author: John L Manferdelli

Subject tag: Data Access | Privacy and data protection

Our goal here is ultimately to guide the design and operation of the technical infrastructure for the storage, use, curation, distribution, and analysis of this critical data. Designing such infrastructure, however, requires some thought about the policy goals for use of the data, and hence some speculation on the desiderata for those policy objectives. These should include:
1. The ability to store and efficiently retrieve large amounts of structured and unstructured data, potentially petabytes of original data.
2. The ability to ensure availability of the data on demand, recovery of the data in the event of malfunctions, “insider” errors, or even malign intent by some insiders, and robustness in the face of physical catastrophes.
3. Reliable curation of data, from source provenance through correction of errors and updates based on subsequent information.
4. Confidentiality of data, preventing unauthorized use, and data integrity, preventing corruption of the source data.
5. Audit logs that can track data access and reveal subtle misuse or unauthorized access.
6. Audited, certified removal of data that must be withdrawn because of defects or legal process.
7. Clear standards and procedures for releasable data analysis.
8. Strong authentication of users to prevent unauthorized access due to stolen credentials.
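To make item 5 concrete: one common way to build audit logs that reveal tampering is a hash chain, where each entry commits to the entry before it, so any edit or deletion of a past record invalidates every later hash. The sketch below is illustrative only, not part of the original guide; the field names and functions are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry in the chain


def append_entry(log, user, action, resource):
    """Append a tamper-evident audit entry; each entry hashes its predecessor."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"user": user, "action": action, "resource": resource, "prev": prev_hash}
    # Canonical JSON (sorted keys) so the hash is deterministic.
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    record = dict(body, hash=digest)
    log.append(record)
    return record


def verify_log(log):
    """Recompute the whole chain; any altered or removed entry breaks it."""
    prev_hash = GENESIS
    for record in log:
        body = {"user": record["user"], "action": record["action"],
                "resource": record["resource"], "prev": prev_hash}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if record["prev"] != prev_hash or record["hash"] != expected:
            return False
        prev_hash = record["hash"]
    return True


log = []
append_entry(log, "analyst1", "read", "dataset/cases.csv")
append_entry(log, "analyst2", "export", "dataset/summary.csv")
assert verify_log(log)

log[0]["user"] = "attacker"  # simulate tampering with an old entry
assert not verify_log(log)
```

In a production deployment the chain head would additionally be anchored somewhere the log administrator cannot rewrite (e.g., periodically signed or published), since an insider who controls the whole log could otherwise recompute the chain.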
[This entry was sourced with minor edits from the Carnegie Endowment’s Partnership for Countering Influence Operations and its baseline datasets initiative. You can find more information here:]