Victor
LINDAT / CLARIAH-CZ
Authors
Marek, Michal
Item identifier
http://hdl.handle.net/11858/00-097C-0000-0001-48FD-B
Project URL
http://ufal.mff.cuni.cz/victor/
Date issued
2009-11-02
Type
toolService
Description
Victor is a web page cleaning tool. It removes menus, ads, footers, headers, and similar boilerplate from HTML web pages so that only the main page content remains. Victor is based on a conditional random fields algorithm.
Publisher
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Subject(s)
html cleaning
Collection(s)
LINDAT / CLARIAH-CZ Data & Tools
Files in this item
This item is publicly available and licensed under:
GNU General Public License, version 2
Name
victor-1.0-beta.tar.bz2
Size
1.79 MB
Format
application/x-bzip2
Description
Installation file (Linux, 32-bit)
MD5
3cbeda259d5eefee2d5bd8fed1a531ee
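The MD5 value above can be used to check that a downloaded archive arrived intact. The following is a minimal sketch, assuming the archive was saved under its listed name in the current directory; the helper names `md5_of_file` and `matches_expected` are illustrative and not part of Victor or the repository. MD5 guards against accidental corruption only and is not a defence against deliberate tampering.

```python
import hashlib
from pathlib import Path

# Checksum copied from the MD5 field of this record.
EXPECTED_MD5 = "3cbeda259d5eefee2d5bd8fed1a531ee"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare the file's digest with the expected value, ignoring letter case."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the file was saved.
    archive = Path("victor-1.0-beta.tar.bz2")
    if not archive.exists():
        print(f"{archive} not found; download it first")
    elif matches_expected(archive):
        print("checksum OK")
    else:
        print("checksum MISMATCH")
```

If the check reports a mismatch, downloading the file again is usually the first thing to try before unpacking it.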