Curate language data (1/2): organizing meta-data

In this post I will provide an overview of the process of taking raw text and meta-data and organizing them into a tidy data set; that is, a tabular data format where each row is an observation and each column a corresponding attribute of the data.

Acquiring data for language research (3/3): web scraping

In this last post dedicated to acquiring data for language research with R, I discuss strategies for scraping language data from the public-facing web. The `rvest` and `tidyverse` packages will do the heavy lifting, but to put this software into practice we need to get up to speed with the language of the web: HTML.

Acquiring data for language research (2/3): package interfaces

This is the second of three posts dedicated to acquiring data for language research with R. I will cover connecting to web service APIs with R packages. I will also discuss R vectors and data frames in more detail.

Acquiring data for language research (1/3): direct downloads

In this post, I will provide an overview of the first of three common strategies for acquiring corpus data in R: accessing corpus data from data repositories and individual sites. I will cover acquiring data from different sources and introduce you to the R code that will help speed the process, maintain consistency in our data, and set the stage for a reproducible workflow.

Data for language research: types and sources

In this Recipe you will learn about the types of data available for language research and where to find them. The goal is to introduce you to the landscape of language data, give a general overview of the characteristics of data from a variety of sources, and point you to resources for beginning your own quantitative investigations.

Introduction to statistical thinking

In this post I will cover some key topics in statistical thinking, including the importance of identifying a research question, how different statistical approaches relate to different types of research, and how to understand data from a sampling and organizational standpoint. I will also provide some examples of linking research questions with variables in a toy dataset as we begin to discuss how to approach data analysis, primarily through visualization techniques.

Project management for scalable data analysis

In this third post in the Recipe series, I provide an overview of, and steps for, organizing a scalable data science project. This will include details on how to set up an R project and organize scripts, data, and reports, and will touch on various best practices that lead to an efficient workflow and set the stage for a portable and easily reproducible research project.

Getting started with R and RStudio

In this second entry in the Recipe series, we will get started by getting familiar with R and the RStudio IDE.

Introducing the Recipe series

This is the introduction to the Recipe series for doing language research with R and RStudio.

Testing features in `blogdown`

A post to test some of the basic features and functions of `blogdown`.