pari is an open-source tool developed as part of the "Utilizing Digital Technology for Social Cohesion, Positive Messaging and Peace by Boosting Collaboration, Exchange and Solidarity" project, in collaboration with experts from linguistics, computer science, the social sciences, the IT sector, and civil society. It uses artificial intelligence to detect online hate speech in Turkish and Arabic texts.
The meanings of pari
The name pari carries different but complementary meanings in several languages:
Latin: "Equal," emphasizing our values of equality and justice.
Armenian: "Good," encouraging well-intentioned, constructive communication.
These multilayered meanings reflect pari's inclusive and pluralistic approach and represent our intention to contribute to strengthening a culture of coexistence in both language and digital spaces.
Text analysis
pari can perform discourse analysis on different scales, from a small number of texts to large-scale data sets. Users can easily learn whether their texts contain hate speech or discriminatory language by uploading them. Two types of analysis options are offered:
Single Text Analysis: With a standard membership, users can analyze individual texts for hate speech by entering them into the search field.
Batch Text Analysis: Available only with 'Professional Membership', this feature allows users to analyze multiple texts simultaneously by uploading Excel files in .xlsx format.
Data-based approach
The tool helps users understand and identify hate speech through outputs based on language analysis. It also makes it possible to examine such discourse structurally through data analysis.
Who can use it?
Anyone who wants to detect hate speech and discriminatory discourse can use pari:
- Individual users
- Academics and researchers
- Civil society and media professionals
With its professional membership option, pari also aims to partially automate and speed up hate speech monitoring efforts.
Combating hate speech
Language can both carry discrimination and help prevent it. pari allows researchers to analyze language in greater depth and provides a tool for combating discrimination through language.
If you would like to try pari, you can access it here.
Methodology
pari was developed as a machine learning-based artificial intelligence model for hate speech detection. The model was trained on tweets obtained from the X platform, in addition to extensive news data archived by the Hrant Dink Foundation by scanning 10 years of written press. To create a high-quality dataset, an interdisciplinary team developed detailed labeling guidelines aimed at eliminating ambiguities and ensuring inclusivity. Under these guidelines, hate speech was divided into four categories, and its level of violence was rated on a scale from 0 to 10.
Exaggeration/Generalization/Attribution/Distortion: Discourses that draw larger conclusions and inferences from an event, situation, or action, manipulate real data by distorting it, or attribute isolated incidents to the entirety of an identity.
Swearing/Insult/Defamation/Dehumanization: Discourses that include direct insults, slurs, or demeaning remarks towards a community, or describe them with actions or attributes typically associated with non-human entities.
Threat of Enmity/War/Attack/Murder/Harm: Discourses that contain hostile statements, invoke war-like language, or express a desire to harm the specific identity in question.
Symbolization: Discourses in which an element of identity itself is used as an element of insult, hatred, or humiliation and the identity is symbolized in such manners.
In addition to the hate speech categories, tweets were also annotated for discriminatory discourse.
Discriminatory Discourse: Discourses where a community is viewed negatively as different from the dominant group in areas such as benefiting from rights and freedoms or inclusion in society.
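As a rough illustration of the labeling scheme described above, a single annotation could be represented as follows. This is only a sketch: the class and field names are hypothetical, and treating the category as a single optional value per text is an assumption, since the guidelines themselves are not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HateCategory(Enum):
    """The four hate speech categories named in the methodology."""
    EXAGGERATION = "Exaggeration/Generalization/Attribution/Distortion"
    INSULT = "Swearing/Insult/Defamation/Dehumanization"
    THREAT = "Threat of Enmity/War/Attack/Murder/Harm"
    SYMBOLIZATION = "Symbolization"


@dataclass(frozen=True)
class Annotation:
    """One annotator's judgment of one text (hypothetical schema)."""
    text: str
    category: Optional[HateCategory]  # None: no hate speech category applies (assumption)
    violence_level: int               # rated on the 0 to 10 scale
    discriminatory: bool = False      # separate discriminatory discourse label

    def __post_init__(self) -> None:
        # Reject values outside the 0 to 10 scale described in the text.
        if not 0 <= self.violence_level <= 10:
            raise ValueError("violence_level must be between 0 and 10")
```

Keeping the discriminatory discourse flag separate from the category mirrors the text, which describes it as an additional annotation rather than a fifth category.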
During data annotation, annotators from different fields, mostly university students, received extensive training before labeling the tweets. Each tweet was annotated by three different annotators to ensure accuracy.
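The text does not say how the three annotations per tweet were combined into a final label. One common approach is majority voting, sketched below purely as an assumption; the function name `majority_label` is hypothetical and is not part of pari.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_label(labels: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the label chosen by a strict majority of annotators.

    Returns None when there are no labels or no label has a strict majority,
    for example when three annotators give three different answers.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None
```

With three annotators, a label is accepted when at least two agree; items without a majority would need a separate resolution step, which is not described in the source.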
The dynamic and evolving nature of hate speech, cultural references, lack of context, and linguistic ambiguity (e.g. irony) are among the ongoing challenges in automatic hate speech classification. Adapting the model to these changes while maintaining accuracy requires regular data updates and retraining. We therefore aim to improve pari by regularly feeding it more data. pari maintains 85% accuracy in Turkish and 80% accuracy in Arabic. In Turkish, pari produced 17% false positive (FP) and 15% false negative (FN) results on 2,189 test samples. In Arabic, pari produced 6% false positive (FP) and 27% false negative (FN) results on 499 test samples. (For further information on accuracy rates, please see “Utilizing AI in Against Hate Speech: A guide to annotation, classification and detection”).
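For readers less familiar with these metrics, the sketch below shows one standard way to derive accuracy, false positive rate, and false negative rate from confusion-matrix counts. The counts in the usage example are made up for illustration and are not pari's results, and the source does not state whether its FP and FN percentages follow exactly these definitions.

```python
def confusion_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, false positive rate and false negative rate.

    tp/fn: hateful texts that were flagged / missed.
    tn/fp: non-hateful texts that were passed / wrongly flagged.
    """
    if fp + tn == 0 or fn + tp == 0:
        raise ValueError("need at least one negative and one positive example")
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Share of non-hateful texts that were wrongly flagged.
        "false_positive_rate": fp / (fp + tn),
        # Share of hateful texts that the model missed.
        "false_negative_rate": fn / (fn + tp),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
rates = confusion_rates(tp=40, fp=10, tn=40, fn=10)
print(rates)
```

A low false positive rate paired with a higher false negative rate, as reported for Arabic, means the model rarely flags harmless text but misses more hateful text.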
pari was developed in collaboration with the Hrant Dink Foundation, Boğaziçi University, and Sabancı University, with the financial assistance of the European Union and the Friedrich Naumann Foundation.
This project is funded by the European Union.