Dependency Tree

Universal Dependencies - English - GUM

Language: English
Project: GUM
Corpus Part: train
Annotation: Peng, Siyao; Zeldes, Amir

s-1 1 Introduction
s-2 Tenured and tenure-track university faculty play a special role in determining the speed and direction of scientific progress, both directly through their research and indirectly through their training of new researchers.
s-3 Past studies establish that each of these efforts is strongly and positively influenced through various forms of faculty diversity, including ethnic, racial, and gender diversity.
s-4 As an example, research shows that greater diversity within a community or group can lead to improved critical thinking [1] and more creative solutions to complex tasks [2, 3] by pairing together individuals with unique skillsets and perspectives that complement and often augment the abilities of their peers.
s-5 Additionally, diversity has been shown to produce more supportive social climates and effective learning environments [4], which can facilitate the mentoring of young scientists.
s-6 Despite these positive effects, however, quantifying the impact of diversity in science remains exceedingly difficult, due in large part to a lack of comprehensive data about the scientific workforce.
s-7 Measuring the composition and dynamics of a scientific workforce, particularly in a rapidly expanding field like computer science, is a crucial first step toward understanding how scholarly research is conducted and how it might be enhanced.
s-8 For many scientific fields, however, there is no central listing of all tenure-track faculty, making it difficult to define a rigorous sample frame for analysis.
s-9 Further, rates of adoption of services like Google Scholar and ResearchGate vary within and across disciplines.
s-10 For instance, gender representation in computing is an important issue with broad implications [5], but without a full census of computing faculty, the degree of inequality and its possible sources are difficult to establish [6].
s-11 Some disciplines, like political science, are organized around a single professional society, whose membership roll approximates a full census [7].
s-12 Most fields, on the other hand, including computer science, lack a single all-encompassing organization, and membership information is instead distributed across many disjoint lists, such as web-based faculty directories for individual departments.
s-13 Because assembling such a full census is difficult, past studies have tended to avoid this task and have instead used samples of researchers [8–11], usually specific to a particular field [12–16], and often focused on the scientific elite [17, 18].
s-14 Although useful, such samples are not representative of the scientific workforce as a whole and thus have limited generalizability.
s-15 One of the largest census efforts to date assembled, by hand, a nearly complete record of three academic fields: computer science, history, and business [19].
s-16 This data set has shed considerable light on dramatic inequalities in faculty training, placement, and scholarly productivity [6, 19, 20].
s-17 However, this data set is only a single snapshot of an evolving and expanding system and hence offers few insights into the changing composition and diversity trends within these academic fields.
s-18 In some fields, yearly data on faculty numbers and composition are available in aggregate.
s-19 In computer science, the Computing Research Association (CRA) documents trends in the employment of PhD recipients through the annual Taulbee survey of computing departments in North America (cra.org/resources/taulbee-survey).
s-20 Such surveys can provide valuable insight into trends and summary statistics on the scientific workforce but suffer from two key weaknesses.
s-21 First, surveys are subject to variable response rates and the misinterpretation of questions or sample frames, which can inject bias into fine-grained analyses [21, 22].
s-22 Second, aggregate information provides only a high-level view of a field, which can make it difficult to investigate causality [23].
s-23 For example, differences in recruitment and retention strategies across departments will be washed out by averaging, thereby masking any insights into the efficacy of individual strategies and policies.
s-24 Here, we present a novel system, based on a topical web crawler, that can quickly and automatically assemble a full census of an academic field using digital data available on the public World Wide Web.
s-25 This system is efficient and accurate, and it can be adapted to any academic discipline and used for continuous collection.
s-26 The system is capable of collecting census data for an entire academic field in just a few hours using off-the-shelf computing hardware, a vast improvement over the roughly 1600 hours required to do this task by hand [19].
s-27 By assembling an accurate census of an entire field from online information alone, this system will facilitate new research on the composition of academic fields by providing access to complete faculty listings, without having to rely on surveys or professional societies.
s-28 This system can also be used longitudinally to study how the workforce’s composition changes over time, which is particularly valuable for evaluating the effectiveness of policies meant to broaden participation or improve retention of faculty.
s-29 Finally, applied to many academic fields in parallel, the system can elucidate scientists’ movement between different disciplines and relate those labor flows to scientific advances.
s-30 In short, many important research questions will benefit from the availability of accurate and frequently recollected census data.
s-31 Our study is organized as follows.
s-32 We begin by detailing the design and implementation of our web crawler framework.
s-33 Next, we present the results of our work in two sections.
s-34 The first demonstrates the validity and utility of the crawler by collecting census data for the field of computer science and comparing it to a hand-curated census, collected in 2011 [19].
s-35 The second provides an example of the type of research enabled by our system and uses the 2011 and 2017 censuses to investigate the leaky pipeline problem in faculty retention.
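
The two comparisons described in s-34 and s-35 can be illustrated with a minimal sketch. It assumes each census is a set of (name, institution) records. The record layout, the function names compare_censuses and retention_rate, and the sample data are hypothetical and are not taken from the study, which may match and validate records quite differently.

```python
from typing import NamedTuple, Set, Tuple


class Faculty(NamedTuple):
    """Hypothetical census record; the real system may store more fields."""
    name: str
    institution: str


def compare_censuses(crawled: Set[Faculty], reference: Set[Faculty]) -> Tuple[float, float]:
    """Return (precision, recall) of a crawled census against a reference census."""
    matched = crawled & reference  # records that appear in both censuses
    precision = len(matched) / len(crawled) if crawled else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    return precision, recall


def retention_rate(earlier: Set[Faculty], later: Set[Faculty]) -> float:
    """Fraction of people in the earlier census who appear anywhere in the later one."""
    if not earlier:
        return 0.0
    later_names = {f.name for f in later}
    retained = [f for f in earlier if f.name in later_names]
    return len(retained) / len(earlier)


# Made-up sample data, for illustration only.
hand_2011 = {
    Faculty("A", "X"), Faculty("B", "X"), Faculty("C", "Y"), Faculty("D", "Z"),
}
crawled_2011 = {
    Faculty("A", "X"), Faculty("B", "X"), Faculty("C", "Y"), Faculty("E", "Z"),
}
census_2017 = {
    Faculty("A", "X"), Faculty("B", "Y"), Faculty("E", "Z"),
}

print(compare_censuses(crawled_2011, hand_2011))  # (0.75, 0.75)
print(retention_rate(hand_2011, census_2017))     # 0.5
```

In this toy data, three of the four crawled records match the hand-curated list, and two of the four people listed in 2011 (A and B) still appear in 2017, one of them at a different institution. Matching on exact names is a simplification; real faculty records would need name disambiguation.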
