Background

Social media companies such as Facebook and Twitter handed over information detailing Russian-linked activity on their platforms to Congress as part of the broader congressional investigation into the role of Russian intervention in the 2016 US elections. The United States House of Representatives Permanent Select Committee on Intelligence released some 3,517 advertisements that Facebook had flagged, after the fact, as having been run by state-linked entities.

Rationale

These documents were released as redacted PDF files, which made them difficult to analyse. To remove this barrier to entry and allow for more research on this topic, the relevant text has been extracted and cleaned into a downloadable .xlsx file.

Some entries contain inconsistencies because of the redactions in the original files, but an attempt has been made to retain as much information as possible.

The code used to produce this .xlsx can be obtained here.

Information contained in the .xlsx

  1. adID: The ad identification number provided in the original PDF

  2. adText: The text of the advert in the original PDF (in a small number of cases, the text had to be entered manually because it could not be detected automatically)

  3. adSpend: The amount spent (in Roubles) to boost the reach of the advertisement

  4. adLP: Landing pages associated with the ad

  5. adImpress: Number of impressions made by the ad

  6. adClicks: Number of clicks received

  7. adCreation: Date and time that the advertisement was originally posted

  8. groups: Targeted groups, based on identity, interests ("likes") and other biographical data

  9. adLocation_cleaned: Cities and states targeted by the advertisement

  10. adStates: States targeted

  11. adDate: Date that the advert was posted, parsed into a standard date format with lubridate
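As a rough illustration of how these columns can be used together, the Python sketch below parses one row into typed fields and derives a click-through rate (adClicks / adImpress). Python is used here only for illustration; the original cleaning appears to have been done in R, given the lubridate reference. The sketch assumes a row arrives as a dict keyed by the column names above, that numeric cells may contain thousands separators or be blank because of redactions, and that adDate is exported as an ISO "YYYY-MM-DD" string. `AdRow`, `parse_row` and `click_through_rate` are hypothetical names, not part of the released code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AdRow:
    """A subset of the spreadsheet's columns, using the column names listed above."""
    adID: str
    adText: str
    adSpend: Optional[float]   # roubles; None if the cell is blank
    adImpress: Optional[int]
    adClicks: Optional[int]
    adDate: date


def _number(value, as_int: bool = False):
    """Parse a numeric cell, tolerating blanks and thousands separators (assumption)."""
    if value is None or str(value).strip() == "":
        return None
    x = float(str(value).replace(",", "").strip())
    return int(x) if as_int else x


def parse_row(raw: dict) -> AdRow:
    """Build an AdRow from a dict of raw cell values keyed by column name.

    Assumes adDate is an ISO 'YYYY-MM-DD' string (hypothetical export format).
    """
    return AdRow(
        adID=str(raw["adID"]),
        adText=str(raw.get("adText") or ""),
        adSpend=_number(raw.get("adSpend")),
        adImpress=_number(raw.get("adImpress"), as_int=True),
        adClicks=_number(raw.get("adClicks"), as_int=True),
        adDate=date.fromisoformat(str(raw["adDate"])),
    )


def click_through_rate(row: AdRow) -> Optional[float]:
    """Clicks per impression; None if clicks are missing or impressions are missing/zero."""
    if row.adClicks is None or not row.adImpress:
        return None
    return row.adClicks / row.adImpress
```

Returning `None` rather than 0 for redacted or empty cells keeps "unknown" distinct from "no clicks" when the rates are later averaged.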

Limitations

A small number of entries in certain columns spill over into neighbouring fields; this issue is still being resolved.

End goal

This project is part of my thesis on the application of active measures to influence public policy. I will eventually upload some visualisations of the data here as well!

Some very preliminary visuals:

The adverts were categorised into five types using latent Dirichlet allocation (LDA), applied to the .xlsx in a separate process (the script will be made available). The five resulting classes were posts targeted at:

  1. Right-wing themes

  2. African-American history

  3. Generic marketing adverts

  4. Institutional discrimination

  5. Latin-American themes
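Since the LDA script has not been released yet, the following is only a minimal sketch of the general technique, not the author's method: a plain-Python collapsed Gibbs sampler for LDA, followed by assigning each advert to its single most probable topic. The tokenizer, the hyperparameters (`alpha`, `beta`, `iters`) and the hard-assignment step are assumptions for illustration; the original analysis may have used different tooling and preprocessing (stop-word removal, for instance, is omitted here). Note that LDA does not name topics: labels such as those above have to be assigned by reading each topic's most frequent words.

```python
import random
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and keep runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def lda_gibbs(docs: List[List[str]], n_topics: int = 5, alpha: float = 0.1,
              beta: float = 0.01, iters: int = 200, seed: int = 0) -> List[List[float]]:
    """Fit LDA by collapsed Gibbs sampling; return each document's topic proportions."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    V = len(vocab)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]   # tokens of topic k in document d
    nkw = [[0] * V for _ in topics]        # tokens of word v assigned to topic k
    nk = [0] * n_topics                    # total tokens assigned to topic k
    z = []                                 # current topic of every token

    def move(d: int, v: int, k: int, delta: int) -> None:
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            move(d, vocab[w], k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = vocab[w]
                move(d, v, z[d][i], -1)  # take the token out of the counts
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                move(d, v, z[d][i], +1)

    # Smoothed document-topic proportions (each row sums to 1).
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


def dominant_topic(theta_row: List[float]) -> int:
    """Hard-assign a document to its most probable topic (0-based index)."""
    return max(range(len(theta_row)), key=theta_row.__getitem__)
```

Usage might look like `labels = [dominant_topic(row) for row in lda_gibbs([tokenize(t) for t in ad_texts])]`, where `ad_texts` is a hypothetical list of adText values. Topic indices are arbitrary and can change with the seed, so the mapping from index to theme has to be redone after every fit.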

Geographic distribution of adverts by target group
Geographic distribution of states targeted by ads. All ads (Left) and Cluster 1: Right-Wing (Right)

Geographic distribution of states targeted by ads. Cluster 2: African-American heritage and pride (Left) and Cluster 3: Marketing (Right)

Geographic distribution of states targeted by ads. Cluster 4: African-American institutional discrimination (Left) and Cluster 5: Latin-American (Right)
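The state maps are presumably built by counting, for each cluster, how many adverts list each state in adStates. A minimal Python sketch of that aggregation follows. It assumes each ad's cluster label is already known (for example, from the LDA step) and that adStates holds comma-separated state names, a hypothetical format that may need adjusting to the actual spreadsheet. `state_counts_by_cluster` and `all_ads` are illustrative names, not taken from the author's code.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def state_counts_by_cluster(rows: Iterable[Tuple[int, Optional[str]]],
                            sep: str = ",") -> Dict[int, Counter]:
    """Count how many ads target each state, separately for each cluster.

    rows: (cluster_label, adStates) pairs. adStates is assumed to be a
    `sep`-separated list of state names; blank or missing cells are skipped.
    """
    counts: Dict[int, Counter] = {}
    for cluster, states in rows:
        if not states:
            continue
        # A set, so each state counts once per ad even if repeated in the cell.
        unique = {s.strip() for s in states.split(sep) if s.strip()}
        counts.setdefault(cluster, Counter()).update(unique)
    return counts


def all_ads(counts: Dict[int, Counter]) -> Counter:
    """Combine the per-cluster counts into a single map covering all ads."""
    total = Counter()
    for c in counts.values():
        total.update(c)
    return total
```

Counting each state at most once per advert measures how many ads reached a state rather than how often it was mentioned; weighting by adSpend or adImpress instead would be an equally reasonable choice for a map.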

Post frequency for each group
Posts over time (2015-2017) by group

The lines are as follows: Group 1 (Red), Group 2 (Blue), Group 3 (Green), Group 4 (Purple) and Group 5 (Pink).
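A time series like this can presumably be reproduced by counting posts per month within each group. The Python sketch below does that under stated assumptions: adDate is exported as an ISO "YYYY-MM-DD" string, each ad's group label is already known, and monthly binning is used, although the plot's actual time resolution is not stated. `monthly_counts` is a hypothetical helper name.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Tuple


def monthly_counts(rows: Iterable[Tuple[int, str]]) -> Dict[int, List[Tuple[str, int]]]:
    """Number of adverts per calendar month for each group.

    rows: (group_label, adDate) pairs, with adDate as an ISO 'YYYY-MM-DD'
    string (an assumption about how the column is exported).
    """
    counter: Counter = Counter()
    for group, ad_date in rows:
        d = date.fromisoformat(ad_date)
        counter[(group, f"{d.year}-{d.month:02d}")] += 1
    # Sorting the (group, "YYYY-MM") keys puts each group's months in order.
    series: Dict[int, List[Tuple[str, int]]] = {}
    for (group, month), n in sorted(counter.items()):
        series.setdefault(group, []).append((month, n))
    return series
```

Months with no posts are simply absent from the output, so they would need to be filled with zeros before plotting if an unbroken line is wanted.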

More importantly, I hope the data helps others produce further insights into this topic.