Best practice guide | Submission/proposal/advocacy/recommendation
January 1, 2019
Subject tag: Data Access
Our goal here is ultimately to guide the design and operation of the technical infrastructure for the storage, use, curation, distribution and analysis of this critical data. Designing such infrastructure, however, requires some thought about the policy goals for use of the data, which in turn requires some speculation about the desiderata for such an infrastructure. These should include:
1. The ability to store and efficiently retrieve large amounts of structured and unstructured data, on the order of petabytes of original data.
2. The ability to ensure availability of the data on demand, recovery of the data in the event of malfunctions, “insider” errors, or even malign intent from some insiders, and robustness in the face of physical catastrophes.
3. Reliable curation of data from source provenance through correction of errors and updates based on subsequent information.
4. Confidentiality of data, preventing unauthorized use, and data integrity, preventing corruption of the source data.
5. Audit logs that can track data access and reveal subtle misuse or unauthorized access.
6. Audited, certified removal of data that must be withdrawn because of defects or legal process.
7. Clear standards and procedures for releasable data analysis.
8. Strong authentication of subject users to prevent unauthorized access due to stolen credentials.
9. Processing standards that prevent subject users from introducing vulnerabilities into the processing infrastructure.
10. Processing infrastructure allowing computationally intensive use of the data.
11. Possibly standardized but continually evolving analysis tools to ensure accuracy and cross-calibration of results.
12. The ability to automatically provide differentiated policy enforcement for a wide variety of data sets from a wide variety of suppliers, who may be uneasy about other providers, users or attribution.
13. The ability to respond, if required, to lawful requests for data production as well as procedures to prevent abuse.
14. Technical mechanisms to “verify” results that do not interfere with protected researcher ideas, materials or experiments.
15. Development of tools and techniques to measure data accuracy and fidelity.
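As one illustration of the audit-log desideratum (item 5), consider a tamper-evident, hash-chained access log: each entry commits to its predecessor, so subtle after-the-fact modification of the log is detectable. The sketch below is a minimal, hypothetical example, not a prescription of any particular system; all names in it are illustrative.

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only access log. Each entry includes the digest of the
    previous entry, forming a hash chain that makes tampering evident."""

    GENESIS = "0" * 64  # placeholder digest for the first entry

    def __init__(self):
        self.entries = []          # list of (entry, digest) pairs
        self.prev_hash = self.GENESIS

    def record(self, user, dataset, action):
        """Append one access event (who touched what, and how)."""
        entry = {
            "ts": time.time(),
            "user": user,
            "dataset": dataset,
            "action": action,
            "prev": self.prev_hash,  # link to the preceding entry
        }
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append((entry, digest))
        self.prev_hash = digest

    def verify(self):
        """Recompute the whole chain; returns False if any entry
        was altered or reordered after the fact."""
        prev = self.GENESIS
        for entry, digest in self.entries:
            if entry["prev"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(entry, sort_keys=True).encode()
            ).hexdigest()
            if recomputed != digest:
                return False
            prev = digest
        return True

log = AuditLog()
log.record("alice", "dataset-17", "read")
log.record("bob", "dataset-17", "export")
print(log.verify())  # True for an untampered log
```

A production system would additionally anchor the chain externally (e.g., by periodically publishing the latest digest) so that even the log's operator cannot silently rewrite history.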
[This entry was sourced with minor edits from the Carnegie Endowment’s Partnership for Countering Influence Operations and its baseline datasets initiative. You can find more information here: https://ceip.knack.com/pcio-baseline-datasets]