Builds and saves the repetition set of a crawl.
The input to the tool consists of TAB-separated triples <store,position,URL>, which are assumed to be stably sorted by URL (the position
is the ordinal position within the store, as generated by the chosen filters).
The triples must cover all URLs appearing in any of the involved stores. For each URL that appears more than once,
the <store,position> pairs of all copies following the first appearance are saved in a LongOpenHashSet,
each pair encoded as the long value
store << 48 | position
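The encoding and the duplicate-detection pass can be sketched as follows. This is a minimal illustration, not the tool's actual code: it uses a standard java.util.HashSet<Long> as a stand-in for fastutil's LongOpenHashSet, and the triples are hard-coded sample data rather than parsed TAB-separated input.

```java
import java.util.HashSet;
import java.util.Set;

public class RepetitionSetSketch {
    // Pack a <store, position> pair into a single long:
    // store in the upper 16 bits, position in the lower 48 bits.
    static long pack(int store, long position) {
        return (long) store << 48 | position;
    }

    static int store(long key) { return (int) (key >>> 48); }

    static long position(long key) { return key & ((1L << 48) - 1); }

    public static void main(String[] args) {
        // Stand-in for fastutil's LongOpenHashSet (illustrative only).
        Set<Long> repetitionSet = new HashSet<>();

        // Sample triples <store, position, URL>, stably sorted by URL.
        String[][] triples = {
            {"0", "10", "http://a/"},
            {"1", "3",  "http://a/"},  // copy after first appearance
            {"2", "5",  "http://a/"},  // copy after first appearance
            {"0", "11", "http://b/"},  // appears only once: not recorded
        };

        String previousUrl = null;
        for (String[] t : triples) {
            int store = Integer.parseInt(t[0]);
            long position = Long.parseLong(t[1]);
            String url = t[2];
            // Since input is sorted by URL, every row whose URL equals the
            // previous row's URL is a copy following the first appearance.
            if (url.equals(previousUrl)) repetitionSet.add(pack(store, position));
            previousUrl = url;
        }

        System.out.println(repetitionSet.size()); // prints 2
    }
}
```

Shifting the store by 48 bits leaves 48 bits for the position, so this packing is valid as long as store indices fit in 16 bits and positions in 48 bits.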