The data input format was changed after the contest finished.
As the problem description at https://contest.com/docs/data_clustering is not as clear as ICPC-style problem statements, we assumed that the data would be given in the same format as in the archives: multi-language raw .html files.
Our solution expected the format described in https://contest.com/docs/data_clustering:
tgnews languages source_dir
tgnews news source_dir
tgnews categories source_dir
tgnews threads source_dir
tgnews top source_dir
where source_dir is the same variable and not divided into ru_source_dir and en_source_dir.
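To illustrate the single-directory form above, here is a minimal sketch of how a Java entry point might parse `tgnews <mode> source_dir`. The class name `TgnewsArgs`, its fields, and the error messages are hypothetical and are not taken from our submission; only the five mode names come from the task description.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not the actual submission code): parses the
// two-argument form `tgnews <mode> <source_dir>` described in the task.
final class TgnewsArgs {
    // The five modes listed in the task description.
    static final List<String> MODES =
            Arrays.asList("languages", "news", "categories", "threads", "top");

    final String mode;
    final Path sourceDir; // one directory for all languages, not ru_/en_ variants

    private TgnewsArgs(String mode, Path sourceDir) {
        this.mode = mode;
        this.sourceDir = sourceDir;
    }

    static TgnewsArgs parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("usage: tgnews <mode> <source_dir>");
        }
        if (!MODES.contains(args[0])) {
            throw new IllegalArgumentException("unknown mode: " + args[0]);
        }
        return new TgnewsArgs(args[0], Paths.get(args[1]));
    }

    public static void main(String[] args) {
        // Fall back to a demo invocation when run without arguments.
        String[] effective = args.length > 0 ? args : new String[] {"languages", "source_dir"};
        TgnewsArgs parsed = parse(effective);
        System.out.println(parsed.mode + " " + parsed.sourceDir);
    }
}
```

With a parser of this shape, an invocation that passes separate per-language directories would be rejected at the argument-count check, which is the kind of mismatch a small change can address.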
We wrote our submission in Java because we followed your suggestion to use a fast language such as C/C++/Java to improve performance. Now we're stuck because:
1) the input data format was changed after the contest finished;
2) only a subset of participants got the chance to fix their solutions.
Unfortunately, we're not among those lucky ones. Our issue is clearly described here: https://contest.com/data-clustering/entry1187
I hope we can move forward and that the rule against changing source code can be applied more flexibly. In our case it's a one-line fix.
Thank you.