Descriptive analysis of the magnitude and state of road safety in general, and of road accidents in particular, is important, but understanding data quality, the factors associated with dangerous situations, and the patterns present in the data is of even greater importance. Under the umbrella of information architecture research for road safety in developing countries, the objective of this experimental machine learning research is to explore data quality issues, analyze trends, and predict the role of road users in possible injury risks. The research employed TreeNet, Classification and Regression Trees (CART), Random Forest (RF), and a hybrid ensemble approach. To identify relevant patterns and illustrate the performance of these techniques in the road safety domain, road accident data collected from the Addis Ababa Traffic Office were subjected to several analyses. Empirical results show that data quality is a major problem requiring architectural guidelines, and that the prototype models could classify accidents with promising accuracy. In addition, the ensemble technique proved better in terms of predictive accuracy in the domain under study.
The goal of this article is to inform social scientists, especially those of a quantitative orientation, about the basic characteristics of Big Data and to present the opportunities and limitations of using such data in social research. The paper describes the three basic types of Big Data distinguished in contemporary methodological literature, namely administrative data, transaction data and social network data, and illustrates how they can be used in quantitative social research. According to many, the questionnaire-based sample survey, the dominant method of quantitative social research, has found itself in crisis, especially as response rates have decreased in most developed countries and public confidence in opinion polling has declined. The author presents the characteristics and specifics of Big Data in comparison with survey research, a method whose primary distinguishing characteristic is the capacity to quantify individual behaviour, social action and attitudes at the level of populations. In this context, the article draws attention to the differences between Big Data and survey data typically presented in scholarly literature: Big Data sets are not representative of known populations, the values of observed variables are systematically biased, the number of variables is limited, the meaning of observed values is uncertain, and the social environment has a direct influence on the behaviours captured by Big Data. Attention is also paid to those characteristics of Big Data that pose an obstacle to the smooth integration of this type of data into the social scientific mainstream. First, the collection, processing and analysis of Big Data are extremely demanding in terms of programming skills, which social scientists typically do not have.
Second, the availability of Big Data is limited, as they are normally held by private corporations, some of which (Facebook, Google) have undoubtedly come to form data oligopolies, and whose management is mostly unwilling to share the data with traditional academics. Based on the above-mentioned specifics, differences and limitations, it is argued that Big Data currently do not have the potential to become a full-fledged source of social science data and to replace sample surveys as the dominant research method. Finally, the article draws attention to the specifics of the different types of Big Data: they are primarily generated for purposes other than social research and result from specific situations framed by existing social relations, and it is from this perspective that social researchers should view them.
Johana Chylíková. Includes bibliographical references.