Chapter 7 Curate data
This is a block quote. — Me
In this chapter you will learn:
- …
… introduction, and outline of the chapter…
This chapter covers the first steps in formatting data so that it is conducive to quantitative research. It is important to distinguish between two types of data that we typically work with in text analysis: linguistic and non-linguistic data.
Linguistic data is the text that we are aiming to analyze.
Non-linguistic data, or metadata, is the data that we aim to associate with the linguistic data. Which non-linguistic variables are relevant depends closely on the type of text. For example, in text acquired from one-on-one interviews, non-linguistic variables may include information about the speakers’ gender, age, language background, etc.
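To make the distinction concrete, here is a minimal sketch in base R of how the two types of data might be held separately and then associated. The data frames, column names (`speaker_id`, `text`, `age`, `language_background`), and values are all hypothetical, invented for illustration only; they do not come from a real dataset.

```r
# Linguistic data: one row per utterance (hypothetical interview text)
utterances <- data.frame(
  speaker_id = c("S01", "S01", "S02"),
  text = c(
    "I grew up near the coast.",
    "We spoke Spanish at home.",
    "I learned English at school."
  ),
  stringsAsFactors = FALSE
)

# Non-linguistic data (metadata): one row per speaker (hypothetical values)
speakers <- data.frame(
  speaker_id = c("S01", "S02"),
  age = c(34, 27),
  language_background = c("Spanish", "English"),
  stringsAsFactors = FALSE
)

# Associate the metadata with the text through the shared key column
curated <- merge(utterances, speakers, by = "speaker_id")
curated
```

Keeping the metadata in its own table and joining on a key is only one possible design; it avoids repeating speaker attributes by hand for every utterance.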
Before continuing, complete the following learnr / interactive code exercises:
- Custom functions: creation, documentation, and organization for subsequent use
- Iterative function application via purrr
- Understanding markup languages: XML, HTML
- Parsing and extracting information from markup with rvest
7.1 Metadata
7.2 Language data
Activities
Concept questions
- This is the first question, right?
Code exercises