dc.contributor.author | Németh, László |
dc.contributor.author | Halácsy, Péter |
dc.contributor.author | Kornai, András |
dc.contributor.other | László, Németh |
dc.date.accessioned | 2014-07-30T21:34:47Z |
dc.date.available | 2014-07-30T21:34:47Z |
dc.date.issued | 2014-07-30 |
dc.identifier.uri | http://hdl.handle.net/11372/LRT-1338 |
dc.description | HunToken is a rule-based tokenizer and sentence boundary detector for Hungarian (and English) texts. |
dc.publisher | Budapest Technical University Media Research Centre |
dc.rights | GNU Library or "Lesser" General Public License 3.0 (LGPL-3.0) |
dc.rights.uri | http://opensource.org/licenses/LGPL-3.0 |
dc.subject | tokenizer |
dc.title | huntoken - tokenizer and sentence splitter |
dc.type | toolService |
dc.rights.label | PUB |
has.files | yes |
additional.metadata | Documentation language(s) (field_tool_documentation_langua): Hungarian; Language(s) of input data (field_tool_input_language): English||Hungarian; Implementation language(s) (field_tool_implementation_langu): GNU Flex, C; Short name (field_tool_short_name): huntoken; Readily Available (field_tool_available): Readily available; Availability (field_tool_availibility): Freely available under LGPL licence; Nid: 2328; Platform(s) (field_tool_platform): UNIX; Character encoding of output data (field_tool_char_encoding_output): Latin 1 (ISO 8859-1)||Latin 2 (ISO 8859-2); Documentation link (field_tool_document_link): http://mokk.bme.hu/resources/huntoken/huntoken.pdf ; Open source code (field_tool_open_source_code): yes; Language(s) of output data (field_tool_output_language): English||Hungarian; Character encoding of input data (field_tool_char_encoding): Latin 1 (ISO 8859-1)||Latin 2 (ISO 8859-2) |
branding | LRT + Open Submissions |
dc.coverage.placeName | Hungary |
files.size | 419787 |
files.count | 1 |
Files in this item
License category:
Licence: GNU Library or "Lesser" General Public License 3.0 (LGPL-3.0)
Publicly Available
- Name
- huntoken-1.6.tgz
- Size
- 409.95 KB
- Format
- application/x-gzip
- Description
- Huntoken
- MD5
- f2e24178f2ed18bba994c0ec5e2c7fe4
- huntoken-1.6
- hun_sentence.flex (5 kB)
- hun_token.flex (68 kB)
- LEIRAS (1 kB)
- hun_sentclean.flex (490 B)
- hun_clean.flex (10 kB)
- LICENC (17 kB)
- Makefile (4 kB)
- CVS
- Repository (18 B)
- Entries (571 B)
- Root (11 B)
- bin
- hun_abbrev_en (27 kB)
- huntoken (1 kB)
- hun_test (546 B)
- CVS
- Repository (22 B)
- Entries (210 B)
- Root (11 B)
- hun_szeged (1 kB)
- hun_macro (967 B)
- hun_head (1 kB)
- hun_latin1.flex (6 kB)
- EREDMENY (2 kB)
- token.flex++ (69 kB)
- test
- CVS
- Repository (23 B)
- Entries (2 B)
- Root (11 B)
- CVS
- hun_abbrev_en.flex.m4 (4 kB)
- example
- 1984.xml (779 kB)
- HOLTLELKEK.sbd (36 kB)
- HOLTLELKEK.txt (35 kB)
- CVS
- Repository (26 B)
- Entries (143 B)
- Root (11 B)
- HOLTLELKEK.xml (119 kB)
- hun_sentbreak.flex (712 B)
- data
- abbrevations.txt (1 kB)
- abbrev_en.txt (656 B)
- CVS
- Repository (23 B)
- Entries (51 B)
- Root (11 B)
- man
- huntoken.1 (1 kB)
- CVS
- Repository (22 B)
- Entries (45 B)
- Root (11 B)
- hun_abbrev.flex.m4 (4 kB)
- doc
- huntoken.doc (103 kB)
- huntoken.sxw (18 kB)
- CVS
- Repository (22 B)
- Entries (92 B)
- Root (11 B)
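
The record lists an MD5 checksum for huntoken-1.6.tgz, so a downloaded copy can be checked against it. The following is a minimal sketch, not part of the huntoken distribution: the helper names `md5_of_file` and `matches_record` are hypothetical, and the script assumes the archive sits in the current working directory under the file name given in the record.

```python
import hashlib
from pathlib import Path

# Checksum and file name copied from the record.
EXPECTED_MD5 = "f2e24178f2ed18bba994c0ec5e2c7fe4"
# Assumption: the archive was saved in the current directory.
ARCHIVE = Path("huntoken-1.6.tgz")


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 equals the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    if ARCHIVE.exists():
        print("OK" if matches_record(ARCHIVE) else "MISMATCH")
    else:
        print(f"{ARCHIVE} not found")
```

A matching MD5 only indicates that the download is not corrupted; it is not a guarantee of authenticity.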