Lessons Learned from the Creation of Adminstrative Data Centers for Government Data

Research report

Website/link: https://securelysharingdata.com/resources/lane.pdf

Website/link: Visit website

Published date: January 1, 2019

Author: Julia Lane

Subject tag: Data Access

We envision the development of automated tools to create the equivalent of an Amazon.com or TripAdvisor for the access and use of confidential micro data. There are three steps involved here.
The first is to use text analysis and machine learning techniques on a series of different pre-processed publication corpora to develop models for identifying the datasets, people, and additional desired information on research methods and data use referenced in each publication.
The second is to then apply these machine learning models on a broader set of publications to validate the results, and then iterate on the most promising to improve the learning algorithms.
The third is to use gamification approaches to
a) incentivize human curation of the results and enable patterns to be identified and
b) incentivize humans to contribute new tacit knowledge that was hitherto not routinely shared. An important feature will be to make it iterative in nature, providing a framework and a platform for creating significant human feedback to feed ongoing improvements in algorithmic learning about the traits of datasets, text and people.
[This entry was sourced with minor edits from the Carnegie Endowment’s Partnership for Countering Influence Operations and its baseline datasets initiative. You can find more information here: https://ceip.knack.com/pcio-baseline-datasets]