Social media companies such as Facebook and Twitter handed over information detailing those activities to Congress as part of the broader congressional investigation into Russian interference in the 2016 elections. The United States House of Representatives Permanent Select Committee on Intelligence released some 3,517 advertisements that Facebook had flagged, after the fact, as having been run by state-linked entities.
These documents were released as redacted PDF files, which makes them difficult to analyse. To remove this barrier to entry and allow more research on the topic, the relevant text data has been extracted and cleaned into a downloadable .xlsx file.
Because the source files are redacted, some entries are inconsistent, but an attempt has been made to retain as much information as possible.
The code used to produce this .xlsx can be obtained here.
adID: The ad identification number provided in the original PDF
adText: The text of the advert in the original PDF (in a small number of cases, the text had to be entered manually because it could not be detected automatically)
adSpend: The amount spent (in roubles) to boost the advert's reach
adLP: Landing pages associated with the ad
adImpress: Number of impressions made by the ad
adClicks: Number of clicks received
adCreation: Date and time that the advertisement was originally posted
groups: Targeted groups, based on identity, likes and other biographical data
adLocation_cleaned: Cities and states targeted by the advertisement
adStates: States targeted
adDate: Date the advert was posted, parsed into a standard date format with the R package lubridate
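As an illustration of how the numeric columns can be combined, the sketch below computes a click-through rate (adClicks / adImpress) and the spend per click (adSpend / adClicks) for a single advert. The AdRecord class and its snake_case field names are hypothetical and not part of the released file, and the example values are made up rather than taken from the dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdRecord:
    """Hypothetical container for one row (a subset of the columns listed above)."""
    ad_id: str        # adID
    ad_spend: float   # adSpend, in roubles
    ad_impress: int   # adImpress
    ad_clicks: int    # adClicks


def click_through_rate(ad: AdRecord) -> Optional[float]:
    """Clicks per impression; None when the ad recorded no impressions."""
    if ad.ad_impress == 0:
        return None
    return ad.ad_clicks / ad.ad_impress


def spend_per_click(ad: AdRecord) -> Optional[float]:
    """Roubles spent per click received; None when the ad received no clicks."""
    if ad.ad_clicks == 0:
        return None
    return ad.ad_spend / ad.ad_clicks


# Made-up values for illustration only, not taken from the dataset.
example = AdRecord(ad_id="0000", ad_spend=100.0, ad_impress=2000, ad_clicks=50)
print(click_through_rate(example))  # 0.025
print(spend_per_click(example))     # 2.0
```

Returning None for zero denominators keeps adverts with no impressions or clicks distinguishable from adverts with a genuine rate of zero.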
A small number of entries in certain columns spill over into neighbouring columns; this issue is still being resolved.
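Until the spillover issue is fixed, one way to catch affected rows is to check whether the columns that should hold numbers or dates actually do. The sketch below assumes the sheet has been exported to CSV with the column names listed above, and that adDate is stored as ISO 8601 text (YYYY-MM-DD), which is how R prints Date objects by default; both are assumptions. The function suspect_rows and the sample rows are hypothetical.

```python
import csv
import io
from datetime import date
from typing import Dict, List


def _is_number(value: str) -> bool:
    """True if the text parses as a number (thousands separators tolerated)."""
    try:
        float(value.replace(",", ""))
        return True
    except ValueError:
        return False


def _is_iso_date(value: str) -> bool:
    """True if the text parses as an ISO 8601 date such as 2016-06-16."""
    try:
        date.fromisoformat(value.strip())
        return True
    except ValueError:
        return False


def suspect_rows(rows: List[Dict[str, str]]) -> List[str]:
    """Return the adIDs of rows whose typed columns hold unexpected values."""
    flagged = []
    for row in rows:
        # csv.DictReader fills missing fields with None, so fall back to "".
        numeric_ok = all(_is_number(row.get(col) or "")
                         for col in ("adSpend", "adImpress", "adClicks"))
        date_ok = _is_iso_date(row.get("adDate") or "")
        if not (numeric_ok and date_ok):
            flagged.append(row.get("adID") or "?")
    return flagged


# Hypothetical sample: row 2 has text spilled into the adSpend column.
sample = io.StringIO(
    "adID,adSpend,adImpress,adClicks,adDate\n"
    "1,100.50,2000,50,2016-06-16\n"
    "2,Texas,2000,50,2016-06-16\n"
)
print(suspect_rows(list(csv.DictReader(sample))))  # ['2']
```

This only flags candidates for manual inspection; a row can still be misaligned while every checked column happens to parse.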
This project is part of my thesis on the application of active measures to influence public policy. I will eventually upload some visualisations using this data as well!
The adverts were categorised into five types using latent Dirichlet allocation (LDA), in a separate process run on the .xlsx file (the script will be made available). The five distinct classes were posts targeted at:
Right-wing themes
African-American history
Generic marketing adverts
Institutional discrimination
Latin-American themes
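The LDA script is not yet available, so the following is only a minimal illustration of the technique, not the method used to produce these classes: a pure-Python collapsed Gibbs sampler for LDA that assigns each advert text to its dominant topic. The tokeniser, the hyperparameters alpha and beta, the iteration count and the function name lda_gibbs are all assumptions; a real analysis would also remove stop words and inspect the top words of each topic before naming the classes.

```python
import random
import re
from typing import List


def lda_gibbs(texts: List[str], n_topics: int = 5, alpha: float = 0.1,
              beta: float = 0.01, n_iter: int = 200, seed: int = 0) -> List[int]:
    """Fit LDA by collapsed Gibbs sampling; return each text's dominant topic."""
    rng = random.Random(seed)
    tokens = [re.findall(r"[a-z']+", t.lower()) for t in texts]
    vocab = {w: i for i, w in enumerate(sorted({w for d in tokens for w in d}))}
    docs = [[vocab[w] for w in d] for d in tokens]
    V = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * V for _ in range(n_topics)]
    n_k = [0] * n_topics
    z = []  # current topic assignment of every token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)  # random initial assignment
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = j | other assignments), up to a constant.
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                           for j in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Dominant topic: the topic holding most of the document's tokens.
    return [max(range(n_topics), key=lambda j: n_dk[d][j]) for d in range(len(docs))]


# Hypothetical toy texts, not taken from the dataset.
ads = ["protect our guns and our freedom", "black history and heritage month",
       "freedom to keep guns", "celebrate black heritage and history"]
print(lda_gibbs(ads, n_topics=2, n_iter=100))
```

With n_topics=5, the returned integers would play the role of the five groups, although topic numbering is arbitrary: a topic's index says nothing about which theme it represents.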
The plot lines are coloured as follows: Group 1 (Red), Group 2 (Blue), Group 3 (Green), Group 4 (Purple) and Group 5 (Pink).
Above all, I hope the data helps produce more insights into the topic.