Victor is a web page cleaning tool. It removes menus, ads, footers, headers, and similar boilerplate from HTML web pages so that only the main page content remains. Victor is based on the conditional random fields (CRF) algorithm.
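As a rough illustration of how a linear-chain CRF can be applied to page cleaning, the sketch below labels a sequence of text blocks as "content" or "boilerplate" with Viterbi decoding. It is not Victor's code: the block features (`words`, `link_density`), the hand-set weights, and the function names are all hypothetical, and a real CRF would learn its weights from annotated pages.

```python
from typing import Dict, List

LABELS = ("boilerplate", "content")

# Hypothetical transition scores: neighbouring blocks tend to share a label.
TRANSITION = {
    ("boilerplate", "boilerplate"): 0.5,
    ("boilerplate", "content"): -0.5,
    ("content", "boilerplate"): -0.5,
    ("content", "content"): 0.5,
}


def emission_score(block: Dict[str, float], label: str) -> float:
    """Score one block for one label using hand-set (assumed) weights.

    Long, link-poor blocks score as content; short, link-heavy blocks
    (menus, footers) score as boilerplate.
    """
    s = 0.1 * block["words"] - 5.0 * block["link_density"] - 1.0
    return s if label == "content" else -s


def viterbi(blocks: List[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring label sequence for the blocks."""
    if not blocks:
        return []
    # best[i][label] = best score of any path ending at block i with that label
    best = [{lab: emission_score(blocks[0], lab) for lab in LABELS}]
    back = []  # back[i][label] = best previous label for block i + 1
    for block in blocks[1:]:
        scores, pointers = {}, {}
        for lab in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + TRANSITION[(p, lab)])
            scores[lab] = (
                best[-1][prev] + TRANSITION[(prev, lab)] + emission_score(block, lab)
            )
            pointers[lab] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    page = [
        {"words": 4, "link_density": 1.0},    # navigation menu
        {"words": 80, "link_density": 0.0},   # article paragraph
        {"words": 8, "link_density": 0.0},    # short sentence inside the article
        {"words": 60, "link_density": 0.05},  # article paragraph
        {"words": 6, "link_density": 0.8},    # footer links
    ]
    print(viterbi(page))
```

Taken alone, the short third block scores slightly toward boilerplate, but the transition scores pull it to "content" because both of its neighbours are content. Using that kind of context is the main reason to prefer a sequence model over classifying each block independently.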
The LINDAT/CLARIAH-CZ project (LM2018101; which is a direct legal successor of the LINDAT/CLARIN projects LM2010013 and LM2015071) is fully supported by the Ministry of Education, Sports and Youth of the Czech Republic under the programme LM of "Large Infrastructures".
Copyright (c) 2020 UFAL MFF UK. All rights reserved.