Comments on 'The effectiveness of moderating harmful online content' by Philipp J. Schneider and Marian-Andrei Rizoiu, (2023) 120(34) PNAS
In 2022, the European Union introduced the Digital Services Act (DSA), new legislation for reporting and moderating harmful content on online social networks. Trusted flaggers are mandated to identify harmful content, which platforms must remove within a set delay (currently 24 h). Here, we analyze the likely effectiveness of EU-mandated mechanisms for regulating highly viral online content with short half-lives. We deploy self-exciting point processes to determine the relationship between the regulated moderation delay and the likely harm reduction achieved. We find that harm reduction is achievable for the most harmful content, even for fast-paced platforms such as Twitter. Our method estimates moderation effectiveness for a given platform and provides a rule of thumb for selecting content for investigation and flagging, managing flaggers' workload.
Social media platforms are the new town squares (1): dematerialized, digital, and unregulated town squares. In 2022, Elon Musk acquired Twitter with the stated goal of preserving free speech for the future. However, alongside free speech, harmful content disseminates and prospers in this unregulated space: mis- and disinformation that spreads faster than its debunking (2), social bots that infiltrate political processes (3), hate speech against women, immigrants, and minorities (4), or viral challenges that put teens' lives at risk. In response, there have been calls for governments to intervene and regulate. As the first move of its kind, the European Council introduced the Digital Services Act (DSA) and the Digital Markets Act (DMA) (5), EU legislation aimed at projecting the regulations of our offline world onto the digital one. The DSA implements notice and action mechanisms (cf. Art. 16) to report harmful online content. Furthermore, the regulation introduces a process for appointing trusted flaggers, subject matter experts in detecting harmful content (cf. Art. 22). Once such content is flagged, platforms must promptly remove it. However, online content is notorious for its "virality": it spreads at high speed and has a short lifespan. We therefore ask about the effectiveness of this new legislation: how can we quantify the likely harm caused by harmful content, and what response time enables effective mitigation?
In this work, we leverage state-of-the-art information spread modeling to assess the effectiveness of the DSA regulation and the EU code of conduct for countering harmful online speech. Fig. 1 conceptualizes an online discussion, where each post draws more people into the discussion and generates more posts, referred to as offspring. This phenomenon of content spreading is known as the self-exciting property. A harmful post will therefore potentially generate other harmful posts with a decreasing intensity, shown by the red dashed line in the Bottom panel of Fig. 1. How would the new EU legislation stop the propagation of the harm? The core concept is to limit harmful posts' reach and offspring generation. We call the number of harmful, direct offspring the potential harm, denoted n∗ and comparable in meaning to R0, the basic reproduction number of infectious diseases (6). Content moderation is achieved by removing the harmful post at time Δ after posting and thus stemming offspring generation after this time. In addition, we assume that any harmful direct offspring generated before Δ are also moderated; their number defines the actual harm, labeled n∗Δ. The harm reduction χ is the percentage of all harmful offspring avoided, both direct and indirect, i.e., offspring of the offspring generated via the recurrent branching process.
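To make the branching argument concrete, the sketch below estimates the harm reduction χ for a single harmful post. It is a minimal illustration, not the paper's exact estimation procedure: it assumes an exponential memory kernel parameterized by the content half-life, a subcritical branching factor n∗ < 1 that is the same at every generation, and that offspring removed together with the parent post generate no further offspring.

```python
import math

def harm_reduction(n_star, half_life, delay):
    """Estimate the harm reduction chi for a single harmful post.

    Simplifying assumptions (for illustration only):
      * direct offspring arrive following an exponential memory kernel whose
        half-life equals the platform's content half-life;
      * the post is removed `delay` time units after posting, and the direct
        offspring generated before removal are removed with it, before they
        spawn offspring of their own;
      * the branching factor n_star (< 1) is identical at every generation.
    """
    theta = math.log(2) / half_life                     # decay rate implied by the half-life
    n_delta = n_star * (1 - math.exp(-theta * delay))   # actual harm: direct offspring before removal
    total_unmoderated = n_star / (1 - n_star)           # expected total progeny (geometric series)
    return 1 - n_delta / total_unmoderated              # fraction of all offspring avoided

# Example: a Twitter-like half-life (24 min) with the DSA's 24 h moderation delay
print(f"chi = {harm_reduction(n_star=0.8, half_life=24, delay=24 * 60):.2f}")
```

Under these assumptions, even when nearly all direct offspring appear before the deadline, a sizable share of the indirect offspring is still avoided, because the removed offspring never seed further generations of the branching process.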
The effect of the policy heavily depends on the speed at which discussions unfold on social networks. We quantify this using the content half-life, defined as the time required to generate half of the direct offspring. A recent (as of 2023) empirical investigation (7) determined the half-life of social media posts on different platforms: Twitter (24 min), Facebook (105 min), Instagram (20 h), LinkedIn (24 h), YouTube (8.8 d), and Pinterest (3.75 mo). A lower half-life means that most harm happens right after the content is posted, and content moderation needs to be performed quickly to be effective.
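As an illustration of how the half-life interacts with the mandated delay, the snippet below computes, under the same exponential-kernel assumption as above, the fraction of a post's direct offspring that would already exist by the time a 24 h moderation deadline elapses; the half-life figures are those reported above.

```python
# Fraction of a post's direct offspring generated before a 24 h moderation
# deadline, assuming an exponential memory kernel whose half-life equals the
# reported content half-life (a simplifying assumption for illustration).
half_lives_h = {
    "Twitter": 24 / 60,           # 24 min
    "Facebook": 105 / 60,         # 105 min
    "Instagram": 20,              # 20 h
    "LinkedIn": 24,               # 24 h
    "YouTube": 8.8 * 24,          # 8.8 d
    "Pinterest": 3.75 * 30 * 24,  # ~3.75 mo
}

delay_h = 24  # DSA-style moderation delay
for platform, t_half in half_lives_h.items():
    frac = 1 - 2 ** (-delay_h / t_half)  # CDF of the exponential kernel at delay_h
    print(f"{platform:10s}: {frac:6.1%} of direct offspring appear before moderation")
```

For fast-paced platforms such as Twitter and Facebook, virtually all direct offspring appear before a 24 h deadline, so whatever harm reduction remains must come from preventing indirect offspring, as in the earlier sketch; for slow-paced platforms such as YouTube and Pinterest, even the direct offspring are largely prevented.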