The Hrant Dink Foundation cordially invites you to the ‘Hate Speech Detection in Turkish and Arabic Tweets’ contest, to be held as part of the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). The contest is organized within the scope of the project ‘Utilizing Digital Technology for Social Cohesion, Positive Messaging and Peace by Boosting Collaboration, Exchange and Solidarity’, sponsored by the European Union and the Friedrich Naumann Foundation, in collaboration with Boğaziçi University and Sabancı University.

Who can participate?

  • Academics, specialists, students, and employees of civil society organizations.
  • To enroll, complete this form by 5 January 2024.
  • For detailed information about the application process, or for any inquiries about the contest, you can visit this page or contact us by email.

Agenda for the contest and workshop:

  • Deadline for enrollment: 5 January 2024
  • Training Data Release Date (Problem A): 23 December 2023
  • Training Data Release Date (Problem B): 26 December 2023
  • Testing Phase: 14-16 January 2024
  • Article Submission Deadline: 23 January 2024
  • Author Notification: 29 January 2024
  • Final Article Submission: 2 February 2024
  • Workshop Dates: [To be announced]

Evaluation will be conducted on two different problems.

A) Hate Speech Detection in Turkish across Various Contexts

The goal is to develop a model capable of detecting hate speech in Turkish social media texts. Participants will perform binary classification to determine whether each tweet contains hate speech. The dataset consists of Turkish tweets, each labeled as either containing hate speech or not. Both the training and test sets include tweets about the Israel-Palestine conflict, refugees, and Greeks. During the evaluation phase, submitted models will be assessed on test data related to these three topics.

B) Hate Speech Detection with Limited Data in Arabic

This dataset consists of approximately 1000 Arabic tweets about refugees. The goal is to develop a model that can detect hate speech in Arabic with limited data. The model's performance will be evaluated on test data containing tweets that include hate speech against refugees.

Dataset

Under Twitter's Rules and Policies, tweet texts cannot be shared publicly, so we will distribute the dataset to participants by email. Sharing or publishing any data other than Tweet IDs is strictly prohibited, and the dataset must not be shared or published without permission.

Evaluation Criteria

The submitted methods will be evaluated on the Kaggle platform using independent test sets. Each team may make multiple submissions per subtask, but the number of submissions will be limited. Rankings will be based on each team's best-performing submission. The evaluation metric for all subtasks will be the F1 score obtained on the test data. During the testing phase, two separate scoreboards will be used. Participants are required to submit their code along with a brief model description (150-250 words).

For each task, awards will be given to the top submissions based on their performance. The total prize pool for the entire competition is 7,000 Euros gross. The evaluation committee will decide how the prize money is distributed, based on the performance of eligible submissions.
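
For participants who want to check their results locally before submitting, the F1 score named above as the evaluation metric can be sketched as follows. This is only an illustrative sketch, not the official scoring code: the announcement does not say how labels are encoded or whether the score is averaged across classes, so the function name f1_score_binary, the 0/1 label encoding, and the choice of the hate speech class as the positive class are all assumptions.

```python
from typing import Sequence


def f1_score_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score for one class (assumed here: 1 = contains hate speech)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    # True positives: labeled positive and predicted positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: predicted positive but labeled otherwise.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: labeled positive but predicted otherwise.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # With no true positives, precision and recall are zero or undefined;
    # returning 0.0 in that case is a common convention.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels for six tweets, for illustration only.
    gold = [1, 0, 1, 1, 0, 0]
    predicted = [1, 0, 0, 1, 1, 0]
    print(f"F1 = {f1_score_binary(gold, predicted):.3f}")
```

In this made-up example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1 score is also 2/3. The scores reported on Kaggle may differ from a local computation if the organizers use a different averaging scheme.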


This project is financed by the European Union.