NBC News is publishing its database of more than 200,000 tweets that Twitter has tied to "malicious activity" from Russia-linked accounts during the 2016 U.S. presidential election.
These accounts, working in concert as part of large networks, pushed hundreds of thousands of inflammatory tweets, from fictitious tales of Democrats practicing witchcraft to hardline posts from users masquerading as Black Lives Matter activists. Investigators have traced the accounts to the Internet Research Agency (IRA), a Kremlin-linked propaganda outfit founded in 2013. The U.S. Intelligence Community has assessed the organization to be part of a Russian state-run effort to influence the outcome of the 2016 U.S. presidential race. And they're not done.
"There should be no doubt that Russia perceives its past efforts as successful and views the 2018 US midterm elections as a potential target for Russian influence operations," Director of National Intelligence Dan Coats told the Senate Intelligence Committee Tuesday.
"The Russians utilize this tool because it's relatively cheap, it's low risk, it offers what they perceive as plausible deniability and it's proven to be effective at sowing division," he told the annual hearing on worldwide threats. "We expect Russia to continue using propaganda, social media, false flag personas, sympathetic spokesmen, and other means of influence to try to build on its wide range of operations and exacerbate social and political fissures in the United States."
“Frankly, the United States is under attack,” he said.
The depth of this problem is still being understood. In January, Twitter emailed nearly 700,000 users to tell them they may have engaged with the Russian accounts. Less than two weeks later, Twitter announced the number of users notified had more than doubled.
Twitter said it has a "commitment to transparency" and has handed over to Congress a list of 3,814 account names it linked to the IRA through third-party information and by analyzing accounts that had purchased promoted tweets during the election. Twitter has since suspended those accounts.
But when Twitter suspends accounts, it deletes their tweets from public view and demands that anyone else using its data delete the tweets, too.
Experts say the social media network shouldn't apply the same policies to gadget spammers as to evidence of foreign election interference.
"Twitter's knee-jerk reaction is to purge," said David Carroll, an associate professor of media design at the New School. "What it does is erase history in an almost Orwellian way."
At the request of NBC News, three sources familiar with Twitter's data systems cross-referenced the list of names released by Congress, excluding any account that Twitter later restored, to create a partial database of tweets that could be recovered from the suspended accounts. The sources asked to remain anonymous to avoid any politicization of their work or being identified as possibly violating Twitter's developer policy.
NBC News has already used the data to expose how Russian accounts impersonated everyday Americans and drew hundreds of thousands of followers, exploiting terrorist attacks, the debates and other breaking news events. Our investigations revealed how the accounts pushed graphic, racist and conspiracy theory-filled disinformation, while flattering, arguing with and cajoling more than 40 U.S. politicians, media figures and celebrities into interacting with and amplifying their propaganda.
Despite Twitter's pledges to crack down, Russian trolls and bots are still exploiting the social media platform to interfere with elections, including our midterms. To help shine a light on this persistent threat to democracy, NBC News is open-sourcing its data.
If you publish using the data, please credit NBC News, link to this page, and let us know. Send questions and projects to email@example.com or @bpopken.
Get the data:
- Regular reader? Download streamlined spreadsheet (29 MB) with just usernames, tweets and timestamps. We recommend you right-click on links and select "save link as" or similar; otherwise the file may take a long time to load in your browser.
- View full data for ten influential accounts in Google Sheets
- Researcher? Download tweets.csv (50 MB) and users.csv with full underlying data
- Explore a graph database in Neo4j
For help getting started with the graph database prepared by our partners at Neo4j's Data Journalism Accelerator program, whose software powered the Panama Papers and Paradise Papers investigations, read this.
To recreate a link to an individual tweet found in the spreadsheet, take the template https://twitter.com/user_key/status/tweet_id and replace "user_key" with the screen name from the "user_key" field and "tweet_id" with the number in the "tweet_id" field.
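For readers working with the CSV files rather than the spreadsheet, the substitution described above might be scripted roughly as follows. This is a minimal sketch, not part of NBC News' own tooling: the function names are hypothetical, the sample row is made up for illustration, and the only column names assumed are the "user_key" and "tweet_id" fields named above.

```python
import csv
import io

# Link template from the article; the two placeholders match the field names.
TWEET_URL_TEMPLATE = "https://twitter.com/{user_key}/status/{tweet_id}"


def tweet_url(user_key, tweet_id):
    """Build a tweet link from a screen name and a tweet ID (hypothetical helper)."""
    return TWEET_URL_TEMPLATE.format(
        user_key=str(user_key).strip(),
        tweet_id=str(tweet_id).strip(),
    )


def tweet_urls_from_csv(handle):
    """Yield one link per CSV row that has both fields filled in."""
    for row in csv.DictReader(handle):
        if row.get("user_key") and row.get("tweet_id"):
            yield tweet_url(row["user_key"], row["tweet_id"])


# Made-up sample row standing in for the real file; to use the download,
# pass open("tweets.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO("user_key,tweet_id\nexample_user,1234567890\n")
urls = list(tweet_urls_from_csv(sample))
print(urls[0])
```

The sketch keeps tweet IDs as text throughout, since spreadsheet programs can round very long numbers when they treat them as numeric values.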
Following the links will lead to a suspended-account page on Twitter. But some copies of the tweets as they originally appeared, including images, can be found by entering the links into web archives such as the Internet Archive’s Wayback Machine and archive.is.
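Looking up many links by hand is slow, so the Wayback Machine step could be partly automated. The sketch below only builds lookup addresses and makes no network requests; the function names and the sample link are hypothetical, and it relies on the Wayback Machine's public "/web/*/" capture listing and its availability endpoint, whose behavior may change. Whether a given tweet was ever archived is not guaranteed.

```python
from urllib.parse import quote

# Wayback Machine address patterns; the "*" asks for all captures of a page.
WAYBACK_CAPTURES_PREFIX = "https://web.archive.org/web/*/"
WAYBACK_AVAILABILITY_PREFIX = "https://archive.org/wayback/available?url="


def wayback_captures_url(tweet_link):
    """Address of the page listing every archived capture of a tweet link."""
    return WAYBACK_CAPTURES_PREFIX + tweet_link


def wayback_availability_url(tweet_link):
    """Address of the JSON lookup that reports the closest capture, if any."""
    # Percent-encode the whole link so it is safe inside a query string.
    return WAYBACK_AVAILABILITY_PREFIX + quote(tweet_link, safe="")


# Made-up link for illustration only.
link = "https://twitter.com/example_user/status/1234567890"
print(wayback_captures_url(link))
print(wayback_availability_url(link))
```

Opening the first address in a browser shows a calendar of captures when any exist; the second is meant for scripts that check links in bulk.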
Additional reporting by Maura Barrett.
Update: This database initially included a user whose account was restored by Twitter after the company provided the list of accounts published by Congress. We have since updated our methodology to exclude any such account.