Theoretical and empirical approaches to the design of effective messages to increase healthy behavior and reduce risky behavior have shown only incremental progress. This article explores approaches to the development of a “recommendation system” for archives of public health messages. Recommendation systems are algorithms operating on dense data involving both individual preferences and objective messa…
Methods for analyzing neural and computational social science data are usually used by different types of scientists and generally seen as distinct, but they strongly complement one another. Computational social science methodologies can strengthen and contextualize individual-level analysis, specifically our understanding of the brain. Neuroscience can help to unpack the mechanisms that lead f…
The vast majority of social science research uses small (megabyte- or gigabyte-scale) datasets. These fixed-scale datasets are commonly downloaded to the researcher’s computer where the analysis is performed. The data can be shared, archived, and cited with well-established technologies, such as the Dataverse Project, to support the published results. The trend toward big data—including large-s…
With the rise of networked media such as Twitter, celebrities’ ability to speak on policy matters directly to the public has become amplified. We investigate the political implications of celebrity activism on Twitter by estimating the political ideology of thirty-four South Korean news outlets and fourteen political celebrities based on the co-following pattern among 1,868,587 Twitter users. W…
There is considerable controversy surrounding the study of presidential debates, particularly efforts to connect their content and impact. Research has long debated whether the citizenry reacts to what candidates say, how they say it, or simply how they appear. This study uses detailed coding of the first 2012 debate between Barack Obama and Mitt Romney to test the relative influence of the can…
This study examines the dynamics of the framing of U.S. mass shooting incidents in traditional commercial online news media and on Twitter. We demonstrate that there is a dynamic, reciprocal relationship between the attention paid to different aspects of mass shootings in online news and on Twitter: tweets tend to be responsive to traditional media reporting, but traditional …
Most electronic behavior traces available to social scientists offer a site-centric view of behavior. We argue that to understand patterns of interpersonal communication and media consumption, a more person-centric view is needed. The ideal research platform would capture reading as well as writing and friending, behavior across multiple sites, and demographic and psychographic variables. It wo…
Theorists have long predicted that like-minded individuals will tend to use social media to self-segregate into enclaves and that this tendency toward homophily will increase over time. Many studies have found moment-in-time evidence of network homophily, but very few have been able to directly measure longitudinal changes in the diversity of social media users’ habits. This is due in part to a…
This article explores the relative influence of individual and network-level effects on the emergence of online social relationships. Using network modeling and data drawn from logs of social behavior inside the virtual world Second Life, we combine individual- and network-level theories into an integrated model of online social relationship formation. Results reveal that time spent online and …
Twitter provides a direct method for political actors to connect with citizens, and for those citizens to organize into online clusters through their use of hashtags (i.e., a word or phrase marked with # to identify an idea or topic and facilitate a search for it). We examine the political alignments and networking of Twitter users, analyzing 9 million tweets produced by more than 23,000 random…
People create, consume, and share content online in increasingly complex ways, often including multiple news, entertainment, and social media platforms. This article explores methods for tracing political media content across overlapping communication infrastructures. Using the 2011 Occupy Movement protests and 2013 consumer boycotts as cases, we illustrate methods for creating integrated datas…
Content analysis of political communication usually covers large amounts of material and makes the study of dynamics in issue salience a costly enterprise. In this article, we present a supervised machine learning approach for the automatic coding of policy issues, which we apply to news articles and parliamentary questions. Comparing computer-based annotations with human annotations shows that…
This article examines the prevalence and nature of negativity in news content. Using dictionary-based sentiment analysis, we examine roughly fifty-five thousand front-page news stories, comparing four different affect lexicons, one for general negativity, and three capturing different measures of fear and anger. We show that fear and anger are distinct measures that capture different sentiments…
This study offers a systematic comparison of automated content analysis tools. The ability of different lexicons to correctly identify affective tone (e.g., positive vs. negative) is assessed in different social media environments. Our comparisons examine the reliability and validity of publicly available, off-the-shelf classifiers. We use datasets from a range of online sources that vary in th…
Researchers have long measured people’s thoughts, feelings, and personalities using carefully designed survey questions, which are often given to a relatively small number of volunteers. The proliferation of social media, such as Twitter and Facebook, offers alternative measurement approaches: automatic content coding at unprecedented scales and the statistical power to do open-vocabulary explo…
This article discusses methodological challenges of using big data that rely on specific sites and services as their sampling frames, focusing on social network sites in particular. It draws on survey data to show that people do not select into the use of such sites randomly. Instead, use is biased in certain ways, yielding samples that limit the generalizability of findings. Results show that a…
Analytic techniques developed for big data have much broader applications in the social sciences, outperforming standard regression models even—or rather especially—in smaller datasets. This article offers an overview of machine learning methods well-suited to social science problems, including decision trees, dimension reduction methods, nearest neighbor algorithms, support vector models, and …
Over the past few years, we have seen the emergence of “big data”: disruptive technologies that have transformed commerce, science, and many aspects of society. Despite the tremendous enthusiasm for big data, there is no shortage of detractors. This article argues that many criticisms stem from a fundamental confusion over goals: whether the desired outcome of big data use is “better science” o…
One of the challenges associated with high-volume, diverse datasets is whether synthesis of open data streams can translate into actionable knowledge. Recognizing that challenge and other issues related to these types of data, the National Institutes of Health developed the Big Data to Knowledge (BD2K) initiative. The concept of translating “big data to knowledge” is important to the social an…
Despite the apparent partisan divide over issues such as global warming and hydraulic fracturing, little is known about what shapes citizens’ willingness to accept scientific recommendations on political issues. We examine the extent to which Democrats, Republicans, and independents are likely to defer to scientific expertise in matters of policy. Our study draws on an October 2013 U.S. nationa…