Incels

The Problem

The Method

Tools Used

The data was scraped from r/IncelExit on October 15, 2020 using the Python Reddit API Wrapper (PRAW). This represents about a year’s worth of data, as the subreddit was formed on October 17, 2019. I gathered 49,279 total posts and comments. After removing missing data and posts by AutoModerator, I had 43,302 viable pieces of text, consisting of both “posts” and “comments”. Posts are the top-level submissions on the subreddit, and comments are nested under posts (comments can also be nested within other comments). The text was processed with the dictionaries produced by Farrell et al. (2019, 2020), which measure Manosphere jargon and misogyny, a sentiment valence dictionary (Wilson, 2005), and Pennebaker et al.’s (2015) Linguistic Inquiry and Word Count (LIWC) software. LIWC accepts .txt files or a .csv column containing multiple pieces of text, and for each text it reports the percentage of the total word count that falls into each of its dictionaries. LIWC produces multiple psychological and linguistic variables (Affective/Perceptual/Cognitive/Social/Biological Processes, Summary language, informal language, relativity, time orientation, drives, and personal concerns). For Farrell et al.’s lexicons, I did a similar calculation to produce a percentage of total word count. The sentiment valence dictionary scores positive words as 1 and negative words as -1, and produces either a positive or a negative valence for a piece of text. See Table 1 for means, standard deviations, and ranges for all variables.
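The two dictionary calculations described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names, the simple regex tokenizer, the toy word lists, and the "neutral" label for a zero score are all assumptions made for the example, and the real Farrell et al. and Wilson dictionaries are not reproduced here.

```python
import re


def tokenize(text):
    """Lowercase a text and split it into word tokens (a simple assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_percentage(text, lexicon):
    """Percentage of the text's total word count that appears in `lexicon`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # empty text: avoid dividing by zero
    hits = sum(1 for tok in tokens if tok in lexicon)
    return 100.0 * hits / len(tokens)


def valence(text, valence_scores):
    """Sum the +1 / -1 word scores and reduce the total to a label.

    The "neutral" label for a total of zero is an assumption of this sketch.
    """
    total = sum(valence_scores.get(tok, 0) for tok in tokenize(text))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


# Hypothetical placeholder dictionaries, for illustration only.
TOY_LEXICON = {"alpha", "beta"}
TOY_VALENCE = {"good": 1, "happy": 1, "bad": -1, "sad": -1}

if __name__ == "__main__":
    print(lexicon_percentage("Alpha and beta and gamma", TOY_LEXICON))
    print(valence("A good but sad and bad day", TOY_VALENCE))
```

Expressing each lexicon as a percentage of word count keeps very short comments and long posts on the same scale, which matters because all variables are later standardized and clustered together.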


Analyses

Cluster Analysis

Once these variables were calculated and standardized, they were all entered into a K-means cluster analysis in R as a first step in exploring the data. Cluster analysis groups data on several independent variables so that within-group variation is minimized and between-group variation is maximized. To determine the number of clusters to examine, I used the scree plot method and the silhouette method. Though the silhouette method recommended 2 clusters, I followed the scree plot method’s recommendation of 4 clusters. First, the inertia, or the sum of squared distances of all points within a cluster from the centroid of that cluster (SSE), was lower for the 4-cluster solution (SSE for 4 groups = 3,290,393.52 vs. SSE for 2 groups = 3,420,950.96). Second, the 4-cluster solution simply seemed to produce more interesting groupings. Following this cluster analysis, I aggregated the independent variables by group and created bar graphs in order to interpret the differences between groups (see Tables 2a-2c).
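The SSE comparison behind the scree plot can be illustrated with a small, self-contained K-means sketch. The analysis itself was run in R; the Python below is only an assumed, simplified stand-in (plain Lloyd's algorithm with random initial centroids), and the function names and toy data are hypothetical.

```python
import random


def standardize(rows):
    """Z-score each column (mean 0, SD 1); constant columns become 0."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [(sum((x - m) ** 2 for x in c) / n) ** 0.5 for c, m in zip(cols, means)]
    return [[(x - m) / s if s else 0.0 for x, m, s in zip(row, means, sds)]
            for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids, SSE)."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # random starting points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(row, centroids[j]))
                      for row in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Inertia: sum of squared distances from each point to its centroid.
    sse = sum(sq_dist(row, centroids[lab]) for row, lab in zip(rows, labels))
    return labels, centroids, sse


def elbow_curve(rows, ks, seed=0):
    """SSE for each candidate number of clusters (the values on a scree plot)."""
    return {k: kmeans(rows, k, seed=seed)[2] for k in ks}


if __name__ == "__main__":
    # Toy data: four loose groups in two dimensions.
    toy = [[0, 0], [0, 1], [1, 0], [10, 0], [10, 1], [11, 0],
           [0, 10], [0, 11], [1, 10], [10, 10], [10, 11], [11, 10]]
    for k, sse in elbow_curve(standardize(toy), [1, 2, 3, 4, 5]).items():
        print(k, round(sse, 3))
```

Because SSE generally keeps falling as more clusters are added, the scree plot is read for the point where the drop levels off rather than for the lowest value, which is why it is paired with a second criterion such as the silhouette method.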

I conducted a cluster analysis using 81 features and found four different groups. The results of this analysis are discussed in the section below; the follow-up steps for the remainder of the study protocol come after.

Sentiment analysis is used to explore how positively the community responds to the original post. Additionally, lexicons based on the Adverse Childhood Experiences Scale (ACES) and the Big Five personality scale will be used to measure traumatic experiences and personality attributes that may also help to predict adherence to incel ideology.

Talk about NLP 

Results

Reflections. Group 1 is the second-largest group of texts and leans toward neutral sentiment. These texts have some mention of misogyny, particularly stoicism (the pain and hardship associated with the lack of intimacy and relationships). They refer heavily to groups in the manosphere and use deterministic language (the essentialism of inceldom). Group 1 has slightly more positive than negative affect. There are mentions of health, body, sexuality, and social processes. Group 1 focuses on authenticity and uses little informal language. Words relating to the drives toward affiliation, achievement, and reward are present, with more relating to power. These texts are highly concerned with leisure and money. I took a cursory look at the texts labeled as Group 1 and noticed a lot of “I” language.

“I can be upset about a lack of dating life without feeling that women owe me anything. It's more that I blame myself for being undesirable. I don't know how to tell what's wrong with me though. I have decent hygiene...I wish I knew what my biggest flaws are so that I know what I needed to work on most.”

Brief Validations. Group 2 has the most positive language, with no mention of misogyny or incel jargon. Group 2 leans heavily toward positive sentiment. These texts mention sexual matters occasionally. Mentions of social processes lean toward family and men. Group 2 is very analytic, certain, and focused on tone, with the heaviest use of informal language. The texts are highly focused on the present. They are concerned with affiliation, reward, and leisure, as well as money and religion. The majority of these texts were 1-2 word responses (e.g. “Adorable”, “*hugs*”, “bro”, “lol”, “Sure.”). After seeing these texts, I understand that the word count variables may miss the use of sarcasm. NLP techniques may not capture subtleties in online language and context, and thus may lump different types of affirmations together. For example, “bro” and “Bruh…” could each indicate surprise, disgust, empathy, or all three, depending on context. “Sure.” may be an affirmative statement of belief, but on the internet it could also indicate doubt.


Helpful advice. Group 3 is the largest group of texts and has more neutral sentiment, as well as some mentions of misogyny, other members of the manosphere, and determinism. Group 3 has little mention of affect, with slightly more positive than negative. These texts focus on health. Their cognitive processes lean toward tentativeness and differences. Group 3 has more mention of clout. There is not much informal language, and the texts are highly focused on the present, with some mention of the past. Words relating to affiliation are most present, but other drives and concerns appear as well, mostly leisure and money. There was a lot of “you” language.

“Your appearance doesn’t define yourself worth. It sucks that shitty pop culture bullshit has made it impossible for any person to decouple this idea though.”


The Deeply Hurt. Group 4 has plenty of affect, most of which is negative and angry, with some sadness. These texts are heavily focused on body and sexual matters, with some mention of social processes. Their cognitive processes lean more heavily toward differences. Group 4 has more negative sentiment, with the highest use of misogyny, particularly stoicism, hostility, mention of physical violence, and belittling of others. Group 4 is also focused on clout. There is a little informal language, mostly in the form of swears. The texts are highly focused on the present, with some mention of the past. Words relating to power and risk are most present. These texts are highly concerned with leisure, religion, and death, as well as money. I noticed a lot more pain in these texts.

Conclusions

These four groups give a first glimpse into what is going on in r/IncelExit. Most of the texts contain reflection and advice, while a few contain brief validations that could also be sarcasm, and a few others contain a deep sense of pain and loneliness. Further analysis can dive deeper into Groups 1 and 3 to tease apart other themes that can contribute to healing, while Group 4 texts can show the psychological pain that incels are experiencing as they work through their experiences and strive toward rehabilitation.

Webscraping & the Manosphere

A presentation I gave to my workplace (ICCCR) on my proposed project on Incels.


Incel and Outcel

Presentation at the 2020 SSSS Annual Meeting, where I used NLP techniques and cluster analysis to study posts on r/IncelExit, a community where incels can look for help.