Classification Of Spanish Election Tweets (COSET)

Mediaflows, in collaboration with the Pattern Recognition and Human Language Technology research center (PRHLT), is organizing one of the tasks to be held at the upcoming national conference of the Sociedad Española de Procesamiento de Lenguaje Natural (SEPLN), the Spanish Society for Natural Language Processing.

Specifically, this is the Classification Of Spanish Election Tweets (COSET) task, in which participants will develop an automated system for classifying tweets with political content. The task is part of the IberEval 2017 workshop (Evaluation of Human Language Technologies for Iberian languages), which is held within the SEPLN 2017 conference. The conference will take place in Murcia, and the workshops and tutorials will be held on September 19th, 2017.

CFP Classification Of Spanish Election Tweets (COSET)

Task Description

Political conversation on Twitter increases as a General Election approaches. Analyzing the topics that users discuss provides insight into this growing public conversation about politics.

In COSET, the aim is to classify a corpus of political tweets into five categories: political issues, concerning the most abstract electoral confrontation; policy issues, about sectoral policies; personal issues, about the lives and activities of the candidates; campaign issues, related to the progress of the campaign; and other issues.

The tweets are written in Spanish and discuss the 2015 Spanish General Election. In the training phase, participants will be provided with Twitter IDs and their manually assigned issue categories.

We cordially invite all researchers and practitioners from all fields to participate in COSET.

Important Dates

March 20th, 2017 Release of training data
April 24th, 2017 Release of test data
May 8th, 2017 Submission of runs
May 15th, 2017 Evaluation results
May 29th, 2017 Working notes due
June 12th, 2017 Review to authors
June 26th, 2017 Camera ready due

Corpus

To develop your classifier, we provide a training data set consisting of Twitter IDs and their issue categories.

Click here to download the training corpus (The file is password-protected. To obtain the password, write to coset2017@gmail.com first).

To test your classifier, we provide a data set consisting of Twitter IDs only.

Click here to download the test corpus (The file is password-protected).

Track Coordinators

Tomás Baviera, Mediaflows Research Group, Valencian International University, Spain
Germán Llorca, José Gámir Ríos, Mediaflows Research Group, Universitat de València, Valencia, Spain
Dafne Calvo, Mediaflows Research Group, Universidad de Valladolid, Spain
Maite Giménez, Paolo Rosso, Roberto Paredes, PRHLT research centre, Universitat Politècnica de València, Spain

Evaluation

The participating systems will be evaluated with the macro-averaged F1 measure. F1 combines the precision and the recall of a system’s predictions using their harmonic mean. Because the classes are not balanced, we propose macro-averaging, which computes F1 for each class separately and then averages the per-class scores, so that systems biased towards the most populated classes are not favored.
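As an illustration of the metric, the following is a minimal Python sketch of macro-averaged F1. It is not the organizers' evaluation script. The function name `macro_f1` and the label strings in the usage note are hypothetical, and the convention of scoring a class as 0 when its precision or recall is undefined is an assumption.

```python
from collections import Counter


def macro_f1(gold, predicted, labels=None):
    """Return the unweighted mean of per-class F1 scores.

    gold and predicted are equal-length sequences of class labels.
    Assumption: a class whose precision or recall is undefined
    (zero denominator) contributes an F1 of 0.0.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if labels is None:
        # Score every class that appears in either sequence.
        labels = sorted(set(gold) | set(predicted))

    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            true_pos[g] += 1
        else:
            false_pos[p] += 1  # p was predicted but is wrong
            false_neg[g] += 1  # g was the right answer but was missed

    scores = []
    for label in labels:
        tp, fp, fn = true_pos[label], false_pos[label], false_neg[label]
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            # Harmonic mean of precision and recall.
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores.append(f1)

    # Every class counts equally, regardless of how many tweets it has.
    return sum(scores) / len(scores)
```

For example, with nine gold tweets labelled "policy" and one labelled "personal" (hypothetical label strings), a system that always predicts "policy" is 90% accurate, yet its macro F1 is below 0.5 because the "personal" class scores 0.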

Results

This is the ranking of the results, after evaluating the 39 submissions received for the COSET Project.

The COSET Organizing Committee appreciates all the participants’ interest in this task.