Online discussion related to Muslim identity and Islam is prolific. While much of this debate is benign, online speech can become antagonistic around certain events, such as terror attacks and high-profile court cases involving Muslim defendants. The use of artificial intelligence to detect online hate speech has gathered pace (Burnap and Williams 2016; Davidson et al. 2017; Nobata et al. 2016; Silva et al. 2016). However, automatic detection does not in itself provide sociological insight. Although previous research has addressed the online propagation of cyberhate following discrete trigger events (Williams and Burnap 2015), little is currently known about the longitudinal diffusion (e.g. over years rather than days) of online information flows relating to Muslim identity and Islam. Because of this gap, we do not know whether the findings of cross-sectional research on the topic generalise across time, locations and events.
This research will address the following research questions: (i) How, when and where are Muslim identity and Islam discussed on Twitter over a 12-month period? (ii) What topics are prevalent within such discussion, and how is Muslim identity constructed on Twitter around events? (iii) Which social, content and temporal factors influence the production of anti-Muslim cyberhate?
The data will be collected from the Twitter filter stream API over 12 months using keywords related to Muslim identity and Islam. Various computational social science techniques will be used for data collection, pre-processing, visualisation and modelling in R and Python. This computational, big-data-oriented approach allows corpora from social media to be visualised geographically and over time. Using Latent Dirichlet Allocation, positive, neutral and negative topics (e.g. sporting successes, Ramadan, terror attacks) will be extracted from the large 12-month corpora mentioning Islam and Muslims. Social, temporal and content factors influencing information flows around Muslim identity and Islam will be statistically modelled to predict the size and survival of those flows. An anti-Muslim online hate speech classifier will be developed and applied to the corpora (Joulin et al. 2016). The output will be visualised as time-series plots to explore how the longitudinal distribution of cyberhate differs from that of general discussion around Muslim identity on Twitter. Logistic regression models will predict the production of cyberhate. In particular, the influence of different user types, such as police, media, public officials, hate groups and victimised groups, will be compared.
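As a purely illustrative aid, the time-series step could be sketched as follows. This is a minimal sketch under stated assumptions, not the project's pipeline: it assumes each tweet has already been reduced to an ISO-format timestamp and a binary classifier label, and the function name `daily_hate_share` and the sample records are hypothetical placeholders rather than project data.

```python
from collections import Counter
from datetime import datetime


def daily_hate_share(tweets):
    """Aggregate (iso_timestamp, is_hate) pairs into per-day counts.

    Returns a dict mapping each date to a tuple of
    (total tweets, tweets labelled as hate, share labelled as hate).
    """
    totals = Counter()
    hateful = Counter()
    for timestamp, is_hate in tweets:
        day = datetime.fromisoformat(timestamp).date()
        totals[day] += 1
        if is_hate:
            hateful[day] += 1
    # Days are emitted in chronological order so the result can be plotted directly.
    return {
        day: (totals[day], hateful[day], hateful[day] / totals[day])
        for day in sorted(totals)
    }


# Hypothetical placeholder records: (timestamp, classifier label).
sample = [
    ("2017-06-01T09:15:00", False),
    ("2017-06-01T21:40:00", True),
    ("2017-06-02T08:05:00", False),
    ("2017-06-02T12:30:00", False),
]

series = daily_hate_share(sample)
for day, (total, hate, share) in series.items():
    print(day, total, hate, round(share, 2))
```

Plotting the share alongside the total count for each day would be one way to compare the distribution of cyberhate with that of general discussion over the 12 months.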
This PhD is intended to inform policy. Hate speech on social media is high on the policy agenda. In 2017 the Home Affairs Select Committee on Hate Crime and its Violent Consequences received evidence from Facebook, Twitter and Google on their efforts to stem the production and spread of hate on their platforms. Evidence provided by HateLab at Cardiff University demonstrated how hate speech can be automatically detected and tracked in real time. This PhD will add to this significant body of policy-relevant work by testing whether previous cross-sectional findings generalise over time, locations and events.