The Telegram Data Clustering Contest starts now.
Based on the input data we provide, you are expected to develop an algorithm that:
1. Identifies content in English and Russian and discards the rest.
2. Identifies news articles from the result of (1) and discards the rest.
3. Classifies each news piece from the result of (2) into one of these 7 categories: Society, Economy, Technology, Entertainment, Science, Sports and Other.
4. Identifies news pieces about the same event and groups them together into news threads.
5. Sorts news threads based on perceived importance.
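As a rough illustration only, and not an official reference solution, the five stages above chain into a linear pipeline. Each stage filters or transforms the output of the one before it. The sketch below assumes each article is a plain-text record and passes every stage in as a pluggable function. The names `Article`, `run_pipeline`, `toy_lang` and `toy_threads` are hypothetical. The toy stand-ins are deliberately crude placeholders that only show the data flow; they are not suggested methods.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# The seven categories named in the task.
CATEGORIES = ("Society", "Economy", "Technology", "Entertainment",
              "Science", "Sports", "Other")


@dataclass(frozen=True)
class Article:
    # Hypothetical record: the contest's real input format may differ.
    id: str
    text: str


Thread = List[Article]


def run_pipeline(
    articles: Sequence[Article],
    detect_lang: Callable[[Article], str],
    is_news: Callable[[Article], bool],
    categorize: Callable[[Article], str],
    group_threads: Callable[[List[Article]], List[Thread]],
    score_thread: Callable[[Thread], float],
) -> Tuple[Dict[str, List[Article]], List[Thread]]:
    # Stage 1: keep only English and Russian content.
    kept = [a for a in articles if detect_lang(a) in ("en", "ru")]
    # Stage 2: keep only news articles.
    news = [a for a in kept if is_news(a)]
    # Stage 3: put every news article into exactly one of the seven categories.
    by_category: Dict[str, List[Article]] = {c: [] for c in CATEGORIES}
    for a in news:
        category = categorize(a)
        if category not in by_category:
            raise ValueError(f"unknown category {category!r} for article {a.id}")
        by_category[category].append(a)
    # Stage 4: group news articles about the same event into threads.
    threads = group_threads(news)
    # Stage 5: sort threads so the most important come first.
    ranked = sorted(threads, key=score_thread, reverse=True)
    return by_category, ranked


def toy_lang(a: Article) -> str:
    # Crude placeholder: any Cyrillic letter means "ru", pure ASCII means "en".
    if any("\u0400" <= ch <= "\u04ff" for ch in a.text):
        return "ru"
    return "en" if a.text.isascii() else "other"


def toy_threads(news: List[Article]) -> List[Thread]:
    # Crude placeholder: articles sharing their first word count as one event.
    groups: Dict[str, Thread] = {}
    for a in news:
        groups.setdefault(a.text.split()[0].lower(), []).append(a)
    return list(groups.values())


docs = [
    Article("1", "Earthquake hits coast"),
    Article("2", "Earthquake aftershocks reported"),
    Article("3", "Выборы прошли в воскресенье"),
    Article("4", "Ein deutscher Satz über Märkte"),  # neither English nor Russian
]
by_cat, ranked = run_pipeline(
    docs, toy_lang, lambda a: True, lambda a: "Other", toy_threads, len
)
print([[a.id for a in t] for t in ranked])  # [['1', '2'], ['3']]
```

Taking each stage as a function keeps the required order of the steps fixed while letting any real language detector, news classifier, clustering method or importance score be swapped in. Here thread size stands in as a naive importance score.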
Below is the sample input data. We will publish more sample data sets as the contest progresses. Check out the detailed description of the contest task here (and here in Russian).
Participants have two weeks, until December 2 (the deadline is 23:50 Dubai time), to come up with a solution and upload it to @jobs_bot.
The authors of the best solutions will share a prize fund of $100,000 and will be able to take part in the second stage of the contest, with a chance to claim another $100,000 in prizes.