Facts and Where to Find Them: Empirical Research on Internet Platforms and Content Moderation

Research report

Website/link: https://pdfs.semanticscholar.org/4f7c/5761ff034439f86c255768fb1fc0b0d1ae1d.pdf?_ga=2.65152296.74278938.1637610609-1227352201.1637610609


Published date: December 16, 2019

Authors: Daphne Keller, Paddy Leerssen

Subject tag: Algorithmic systems | Data Access | Government transparency

Reliable information about platforms’ content removal systems was, for many years, hard to come by. But data and disclosures are steadily emerging as researchers focus on the topic and platforms ramp up their transparency measures, including both self-regulatory efforts and disclosures required by law. This essay reviews the current and likely future sources of information. First, we discuss disclosures from platforms and other participants in content moderation, such as users and governments. Second, we discuss independent research from third parties such as academics and journalists, including data analysis, interviews, and surveys. Finally, before concluding the essay, we list specific questions and areas for future empirical research:
• Accuracy rates in identifying prohibited material
○ In notices from third parties generally
○ In notices from expert or “trusted” third parties
○ In flags generated by automated tools
○ In platform decision-making
• Areas of higher or lower accuracy
○ For different claims (such as defamation or copyright)
○ For different kinds of content (such as images vs. text; English language vs. Hindi; news articles vs. poems)
○ For different kinds of notifiers (such as “trusted experts”)
• Success rates of mechanisms designed to prevent over-removal
○ Legal obligations or penalties for notifiers
○ Legal obligations or penalties for platforms
○ Counter-notice by users accused of posting unlawful content
○ Audits by platforms
○ Audits by third parties
○ Public transparency
• Costs
○ Economic or other costs to platforms
○ Economic or other costs to third parties when platforms under-remove (prohibited content persists on platforms)
○ Economic or other costs to third parties when platforms over-remove (when platforms take down lawful or permitted content)
• Filters
○ Accuracy in identifying duplicates
○ Accuracy in classifying never-before-seen content
○ Ability to discern or assess when the same item of content appears in a new context (such as news reporting)
○ Relative accuracy for different kinds of prohibited content (such as nudity vs. support of terrorism)
○ Relative accuracy for different kinds of files or media (such as text vs. MP3)
○ Effectiveness of human review by platform employees to correct filtering errors
○ Cost, including implementation and maintenance costs for platforms that license third-party filtering technology
○ Impact on subsequent technical development (such as locking in particular technical designs)
• Community Guidelines
○ Rules enforced
○ Processes, including appeal
○ Accuracy and cost of enforcement
○ Governments’ role in setting Community Guidelines
○ Governments’ role in specific content-removal decisions
• Consequences of removal, over-removal, and under-removal
○ Public information and discourse, including trust in media
○ Electoral outcomes
○ Violence
○ Commercial interests of notifiers
○ Commercial interests of businesses impacted by removals
○ Disparate impact based on race, gender, etc.
[This entry was sourced with minor edits from the Carnegie Endowment’s Partnership for Countering Influence Operations and its baseline datasets initiative. You can find more information here: https://ceip.knack.com/pcio-baseline-datasets]