Subject: machine learning - LINDAT/CLARIAH-CZ Catalog Search Results

11. Smoothing decision boundaries to avoid overfitting in neural network training

Creator:: el Hindi, Khalil and Al-Akhras, Mousa
Format:: unmediated and volume
Type:: model:article and TEXT
Subject:: Artificial neural network, instance-based learning, instance reduction, instance selection, machine learning, noise filtering, overfitting, over learning, and prototype selection
Language:: English
Description:: This work addresses the problem of overfitting the training data. We suggest smoothing the decision boundaries by eliminating border instances from the training set before training Artificial Neural Networks (ANNs). This is achieved by using a variety of instance reduction techniques. A large number of experiments were performed using 21 benchmark data sets from the UCI machine learning repository, both with and without the introduction of noise into the data sets. Our empirical results show that using a noise filtering algorithm to filter out border instances before training an ANN not only improves the classification accuracy but also speeds up the training process by reducing the number of training epochs. The effectiveness of the approach is more pronounced when the training data contains noisy instances.
Rights:: http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
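The record names instance reduction and noise filtering as the means of removing border instances but does not say which algorithms were used. As one illustrative example (an assumption, not necessarily one of the techniques evaluated by the authors), Wilson's Edited Nearest Neighbour (ENN) rule drops every training instance whose label disagrees with the majority label of its k nearest neighbours; such instances tend to be noisy or to sit near a decision boundary. A minimal pure-Python sketch, where the function name and the default k=3 are hypothetical:

```python
from collections import Counter
from math import dist


def edited_nearest_neighbour(X, y, k=3):
    """Wilson's ENN filter (illustrative sketch).

    Keeps instance i only if the majority label among its k nearest
    neighbours (Euclidean distance, excluding i itself) equals y[i].
    Returns the filtered (X, y) as new lists.
    """
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # (distance, label) pairs for every other instance, nearest first.
        neighbours = sorted(
            ((dist(xi, xj), yj) for j, (xj, yj) in enumerate(zip(X, y)) if j != i),
            key=lambda pair: pair[0],
        )[:k]
        majority_label, _ = Counter(label for _, label in neighbours).most_common(1)[0]
        if majority_label == yi:
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y


# Two clusters; (0.5, 0.5) carries label 1 but lies inside the label-0 cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (5, 6), (6, 5), (6, 6)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
X_filtered, y_filtered = edited_nearest_neighbour(X, y)
print(len(X_filtered))  # 8: the mislabelled point (0.5, 0.5) is removed
```

In this sketch the filtered set would simply replace the original training set as input to ANN training; the quadratic neighbour search is fine for small benchmark sets but a spatial index would be needed for large ones.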

12. SnakeCLEF 2021

Creator:: Picek, Lukáš, Bolon, Isabelle, Durso, Andrew M., and Castañeda, Rafael Ruiz de
Publisher:: CEUR Workshop Proceedings (CEUR-WS.org)
Type:: IMAGE and corpus
Subject:: LifeCLEF, SnakeCLEF, global health, epidemiology, snake bite, snake, reptile, benchmark, biodiversity, machine learning, computer vision, and Classification
Language:: No linguistic content
Description:: The dataset contains 409,679 images belonging to 772 snake species from 188 countries and all continents (386,006 labeled images intended for development and 23,673 unlabeled images for testing). In addition, we provide a simple train/val (90% / 10%) split to validate preliminary results while preserving the species distribution. Furthermore, we prepared a compact subset (70,208 images) for fast prototyping. The test set consists of 23,673 images submitted to the iNaturalist platform within the first four months of 2021. All data were gathered from online biodiversity platforms (i.e., iNaturalist, HerpMapper) and further extended by data scraped from Flickr. The provided dataset has a heavily long-tailed class distribution: the most frequent species (Thamnophis sirtalis) is represented by 22,163 images and the least frequent (Achalinus formosanus) by just 10.
Rights:: BSD 3-Clause "New" or "Revised" license, http://opensource.org/licenses/BSD-3-Clause, and PUB
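The record states that the provided 90% / 10% train/val split keeps the same species distribution in both parts. The sketch below shows one way such a stratified split could be produced (for example, to re-split the compact subset); it is not the script used to create the official split, and the function name, the fixed seed and the rounding rule are assumptions.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(labels: Sequence[Hashable], val_fraction: float = 0.1,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so that every class (species) contributes about
    val_fraction of its samples to the validation part."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)                          # random order within a class
        n_val = round(len(indices) * val_fraction)    # per-class validation size
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


labels = ["common"] * 50 + ["rare"] * 10   # toy labels, not real counts
train, val = stratified_split(labels)
print(len(train), len(val))  # 54 6
```

Because the per-species count is rounded, a species with five or fewer images would receive no validation images under this rule; in the full dataset, where the least frequent species has 10 images, every species gets at least one.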

13. The potential applications of artificial intelligence in drug discovery and development

Creator:: Farghali, Hassan, Kutinová Canová, Nikolina, and Arora, Mahak
Format:: computer and online resource
Type:: model:article and TEXT
Subject:: artificial intelligence, computer-assisted drug discovery, drug repositioning, machine learning, and DSP-1181
Language:: English
Description:: Development of a new drug is a very lengthy and highly expensive process, since the preclinical, pharmacokinetic, pharmacodynamic and toxicological studies alone involve multiple in silico, in vitro and in vivo experiments that traditionally last several years. In the present review, we briefly report published examples that demonstrate the power of computer-assisted drug discovery and the successful application of artificial intelligence (AI) technology in this active area. We also address the state of drug repositioning (repurposing) in clinical applications. A few success stories in this regard already provide clear evidence of AI's great potential to accelerate the discovery of effective new drugs. AI accelerates drug repurposing, and AI approaches are becoming necessary and indispensable tools in the development of new medicines. Although AI in drug development is still in its infancy, advances in AI and machine-learning (ML) algorithms hold unprecedented potential. Pharmaceutical scientists, computer scientists, statisticians, physicians and others are increasingly working together on AI/ML solutions in drug development and are adopting AI-based technologies for the rapid discovery of medicines. AI approaches, coupled with big data, are expected to substantially improve the effectiveness of drug repurposing and the discovery of new drugs for various complex human diseases.
Rights:: http://creativecommons.org/licenses/by-nc-sa/4.0/ and policy:public

14. WMT16 APE Shared Task Data

Creator:: Turchi, Marco, Chatterjee, Rajen, and Negri, Matteo
Publisher:: Fondazione Bruno Kessler, Trento, Italy
Type:: text and corpus
Subject:: machine translation, machine learning, automatic postediting, and shared task
Language:: English and German
Description:: Training, development and test data (the same used for the Sentence-level Quality Estimation task) consist of English-German triplets (source, target and post-edit) belonging to the IT domain and already tokenized. Training and development contain 12,000 and 1,000 triplets respectively, while the test set contains 2,000 instances. All data is provided by the EU project QT21 (http://www.qt21.eu/).
Rights:: AGREEMENT ON THE USE OF DATA IN QT21 APE Task, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21, and PUB
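The APE data are described as English-German triplets (source, target, post-edit). A minimal loading sketch, assuming (this record does not confirm it) that each part of a split is shipped as a separate, line-aligned plain-text file with one tokenized segment per line; the file paths, the Triplet type and both helper functions are hypothetical.

```python
from typing import List, NamedTuple


class Triplet(NamedTuple):
    source: str     # tokenized English source segment
    mt: str         # German MT output (the "target")
    post_edit: str  # German human post-edit of the MT output


def read_lines(path: str) -> List[str]:
    # newline="\n" splits only on "\n", so other Unicode line separators
    # inside a segment cannot shift the alignment between files.
    with open(path, encoding="utf-8", newline="\n") as f:
        return [line.rstrip("\n") for line in f]


def load_triplets(src_path: str, mt_path: str, pe_path: str) -> List[Triplet]:
    """Zip three line-aligned files into (source, mt, post_edit) triplets."""
    columns = [read_lines(p) for p in (src_path, mt_path, pe_path)]
    if len({len(c) for c in columns}) != 1:
        raise ValueError(f"files are not line-aligned: {[len(c) for c in columns]}")
    return [Triplet(*row) for row in zip(*columns)]
```

Called on the three (hypothetical) training files, the loader should return 12,000 triplets, and 1,000 for the development files, matching the counts given in the description; the alignment check fails loudly if a file was truncated or re-wrapped.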

15. WMT16 APE Shared Task Data - Reference sentences

Creator:: Turchi, Marco, Negri, Matteo, and Chatterjee, Rajen
Publisher:: Fondazione Bruno Kessler, Trento, Italy
Type:: text and corpus
Subject:: machine translation, machine learning, automatic post-editing, and shared task
Language:: German
Description:: Training, development and test data consist of German sentences belonging to the IT domain and already tokenized. These sentences are the references for the data released for the 2016 edition of the WMT APE shared task. Unlike the previously released data, these sentences were obtained by manually translating the source sentences without leveraging the raw MT outputs. Training and development contain 12,000 and 1,000 segments respectively, while the test set contains 2,000 items. All data is provided by the EU project QT21 (http://www.qt21.eu/).
Rights:: AGREEMENT ON THE USE OF DATA IN QT21 APE Task, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21, and PUB

16. WMT16 Quality Estimation Shared Task Training and Development Data

Creator:: Specia, Lucia, Logacheva, Varvara, and Scarton, Carolina
Publisher:: University of Sheffield
Type:: text and corpus
Subject:: machine translation, quality estimation, and machine learning
Language:: English and German
Description:: Training and development data for the WMT16 QE task. Test data will be published as a separate item. This shared task will build on its previous four editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, sentence-level and document-level estimation. The sentence- and word-level tasks will explore a large dataset produced from post-edits by professional translators (as opposed to the crowdsourced translations used in the previous year). For the first time, the data will be domain-specific (IT domain). The document-level task will use, for the first time, entire documents, which have been human-annotated for quality indirectly in two ways: through reading comprehension tests and through a two-stage post-editing exercise. Our tasks have the following goals:
- To advance work on sentence- and word-level quality estimation by providing domain-specific, larger and professionally annotated datasets.
- To study the utility of detailed information logged during post-editing (time, keystrokes, actual edits) for different levels of prediction.
- To analyse the effectiveness of different types of quality labels provided by humans for longer texts in document-level prediction.
This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for the sentence- and word-level tasks, and multiple MT systems were used to produce translations for the document-level task. Therefore, MT system-dependent information will be made available where possible.
Rights:: AGREEMENT ON THE USE OF DATA IN QT21, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21, and PUB

17. WMT17 Quality Estimation Shared Task Training and Development Data

Creator:: Specia, Lucia and Logacheva, Varvara
Publisher:: University of Sheffield
Type:: text and corpus
Subject:: machine translation, quality estimation, and machine learning
Language:: English and German
Description:: Training and development data for the WMT17 QE task. Test data will be published as a separate item. This shared task will build on its previous five editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks will make use of a large dataset produced from post-edits by professional translators. The data will be domain-specific (IT and Pharmaceutical domains) and substantially larger than in previous years. In addition to advancing the state of the art at all prediction levels, our goals include:
- To test the effectiveness of larger (domain-specific and professionally annotated) datasets. We will do so by increasing the size of one of last year's training sets.
- To study the effect of language direction and domain. We will do so by providing two datasets created in similar ways, but for different domains and language directions.
- To investigate the utility of detailed information logged during post-editing. We will do so by providing post-editing time, keystrokes, and actual edits.
This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for all tasks. MT system-dependent information can be made available upon request. The data is publicly available, but since it has been provided by our industry partners it is subject to specific terms and conditions. However, these have no practical implications for the use of this data for research purposes.
Rights:: AGREEMENT ON THE USE OF DATA IN QT21, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21, and PUB

18. WMT17 Quality Estimation Shared Test Data

Creator:: Specia, Lucia and Logacheva, Varvara
Publisher:: University of Sheffield
Type:: text and corpus
Subject:: machine translation, quality estimation, and machine learning
Language:: English and German
Description:: Test data for the WMT17 QE task. Training data can be downloaded from http://hdl.handle.net/11372/LRT-1974. This shared task will build on its previous five editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks will make use of a large dataset produced from post-edits by professional translators. The data will be domain-specific (IT and Pharmaceutical domains) and substantially larger than in previous years. In addition to advancing the state of the art at all prediction levels, our goals include:
- To test the effectiveness of larger (domain-specific and professionally annotated) datasets. We will do so by increasing the size of one of last year's training sets.
- To study the effect of language direction and domain. We will do so by providing two datasets created in similar ways, but for different domains and language directions.
- To investigate the utility of detailed information logged during post-editing. We will do so by providing post-editing time, keystrokes, and actual edits.
This year's shared task provides new training and test datasets for all tasks, and allows participants to explore any additional data and resources deemed relevant. An in-house MT system was used to produce translations for all tasks. MT system-dependent information can be made available upon request. The data is publicly available, but since it has been provided by our industry partners it is subject to specific terms and conditions. However, these have no practical implications for the use of this data for research purposes.
Rights:: AGREEMENT ON THE USE OF DATA IN QT21, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21, and PUB

19. WMT18 Quality Estimation Shared Task Test Data

Creator:: Specia, Lucia, Logacheva, Varvara, Blain, Frederic, Fernandez, Ramon, and Martins, André
Publisher:: University of Sheffield
Type:: text and corpus
Subject:: machine translation, quality estimation, and machine learning
Language:: English, German, Czech, and Latvian
Description:: Test data for the WMT18 QE task. Training data can be downloaded from http://hdl.handle.net/11372/LRT-2619. This shared task will build on its previous six editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks make use of datasets produced from post-edits by professional translators. The datasets are domain-specific (IT and life sciences/pharma domains) and extend those used in previous years with more instances and more languages. One important addition is that this year we also include datasets with neural MT outputs. In addition to advancing the state of the art at all prediction levels, our specific goals are:
- To study the performance of quality estimation approaches on the output of neural MT systems. We will do so by providing datasets for two language pairs where the same source segments are translated by both a statistical phrase-based and a neural MT system.
- To study the predictability of deleted words, i.e. words that are missing in the MT output. To do so, for the first time we provide data annotated for such errors at training time.
- To study the effectiveness of explicitly assigned labels for phrases. We will do so by providing a dataset where each phrase in the output of a phrase-based statistical MT system was annotated by human translators.
- To study the effect of different language pairs. We will do so by providing datasets created in similar ways for four language pairs.
- To investigate the utility of detailed information logged during post-editing. We will do so by providing post-editing time, keystrokes, and actual edits.
- To measure progress over the years at all prediction levels. We will do so by using last year's test set for comparative experiments.
In-house statistical and neural MT systems were built to produce translations for all tasks. MT system-dependent information can be made available upon request. The data is publicly available, but since it has been provided by our industry partners it is subject to specific terms and conditions. However, these have no practical implications for the use of this data for research purposes. Participants are allowed to explore any additional data and resources deemed relevant.
Rights:: AGREEMENT ON THE USE OF DATA IN QT21, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21, and PUB

20. WMT18 Quality Estimation Shared Task Training and Development Data

Creator:: Specia, Lucia, Logacheva, Varvara, Blain, Frederic, Fernandez, Ramon, and Martins, André
Publisher:: University of Sheffield
Type:: text and corpus
Subject:: machine translation, quality estimation, and machine learning
Language:: English, German, Czech, and Latvian
Description:: Training and development data for the WMT18 QE task. Test data will be published as a separate item. This shared task will build on its previous six editions to further examine automatic methods for estimating the quality of machine translation output at run-time, without relying on reference translations. We include word-level, phrase-level and sentence-level estimation. All tasks make use of datasets produced from post-edits by professional translators. The datasets are domain-specific (IT and life sciences/pharma domains) and extend those used in previous years with more instances and more languages. One important addition is that this year we also include datasets with neural MT outputs. In addition to advancing the state of the art at all prediction levels, our specific goals are:
- To study the performance of quality estimation approaches on the output of neural MT systems. We will do so by providing datasets for two language pairs where the same source segments are translated by both a statistical phrase-based and a neural MT system.
- To study the predictability of deleted words, i.e. words that are missing in the MT output. To do so, for the first time we provide data annotated for such errors at training time.
- To study the effectiveness of explicitly assigned labels for phrases. We will do so by providing a dataset where each phrase in the output of a phrase-based statistical MT system was annotated by human translators.
- To study the effect of different language pairs. We will do so by providing datasets created in similar ways for four language pairs.
- To investigate the utility of detailed information logged during post-editing. We will do so by providing post-editing time, keystrokes, and actual edits.
- To measure progress over the years at all prediction levels. We will do so by using last year's test set for comparative experiments.
In-house statistical and neural MT systems were built to produce translations for all tasks. MT system-dependent information can be made available upon request. The data is publicly available, but since it has been provided by our industry partners it is subject to specific terms and conditions. However, these have no practical implications for the use of this data for research purposes. Participants are allowed to explore any additional data and resources deemed relevant.
Rights:: AGREEMENT ON THE USE OF DATA IN QT21, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21, and PUB
