Jerid Francom

Associate Professor of Spanish and Linguistics
Romance Languages
Wake Forest University

In this post I will step through how to integrate geo-tagged tweets into the choropleth plots I created in the previous post, “Census 2010 I”. I will also show how you can use the plotly package to make the plot interactive and enable a hover-over effect to display the tweet content.

Getting some tweet data

Not all tweets have geolocation information available. When accessing the Twitter API via the streamR package, you can set parameters so that only tweets with geolocation enabled are returned. You can also specify a bounding box to further restrict the geographic area from which your sample tweets are drawn. The bounding box, however, is just that: a box, whereas the regions we are dealing with are polygons. To isolate tweets from a specific geopolitical region, such as a census tract, you can use the sp package. For details on how to get geo-tagged tweets in R and how to clip them to a specific spatial polygon, refer to my previous post, “Access Twitter posts by country”.
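
To make the difference between a box and a polygon concrete, here is a minimal sketch with made-up coordinates and a hypothetical triangular region (none of these values come from the Pima data). It assumes the sp package is installed and relies on sp::point.in.polygon(), which returns 0 for points that fall outside the polygon.

```r
library(sp)

# Made-up tweet coordinates, for illustration only
toy_tweets <- data.frame(
  lon  = c(-110.5, -111.5, -114.0, -110.2),
  lat  = c(31.5, 32.5, 32.0, 32.0),
  text = c("tweet A", "tweet B", "tweet C", "tweet D"),
  stringsAsFactors = FALSE
)

# Step 1: a rectangular bounding box (hypothetical limits)
bbox <- c(lon_min = -113, lat_min = 31, lon_max = -110, lat_max = 33)
in_box <- toy_tweets$lon >= bbox[["lon_min"]] &
  toy_tweets$lon <= bbox[["lon_max"]] &
  toy_tweets$lat >= bbox[["lat_min"]] &
  toy_tweets$lat <= bbox[["lat_max"]]

# Step 2: a hypothetical triangular region that sits inside the box
poly_lon <- c(-112, -110, -110)
poly_lat <- c(31, 31, 33)

# point.in.polygon() returns 0 for points outside the polygon
in_poly <- point.in.polygon(toy_tweets$lon, toy_tweets$lat,
                            poly_lon, poly_lat) > 0

# Keep only the tweets that fall inside the polygon
clipped <- toy_tweets[in_poly, ]
```

In this toy set, tweet B passes the bounding box filter but lies outside the triangle, which is exactly the kind of point that the clipping step is meant to drop.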

Picking up from last time

Our starting point, therefore, looks like this:

ls()
## [1] "p"              "p.roadmap"      "pima.data"      "pima.tweets"   
## [5] "span.total"     "speakers.total"

p and p.roadmap are the plots, and span.total and speakers.total are the variables I created in the last post from the American FactFinder data to visualize the level of Spanish speakers by census tract.

Here I will be working with a small set of tweets collected from Twitter and clipped to include only posts that originated within Pima County, Arizona, the county in which Tucson is located. Here’s a quick look at the variables in the data:

pima.tweets %>% names
##  [1] "lon"                       "lat"
##  [3] "text"                      "retweet_count"
##  [5] "favorited"                 "truncated"
##  [7] "id_str"                    "in_reply_to_screen_name"
##  [9] "source"                    "retweeted"
## [11] "created_at"                "in_reply_to_status_id_str"
## [13] "in_reply_to_user_id_str"   "lang"
## [15] "listed_count"              "verified"
## [17] "location"                  "user_id_str"
## [19] "description"               "geo_enabled"
## [21] "user_created_at"           "statuses_count"
## [23] "followers_count"           "favourites_count"
## [25] "protected"                 "user_url"
## [27] "name"                      "time_zone"
## [29] "user_lang"                 "utc_offset"
## [31] "friends_count"             "screen_name"
## [33] "country_code"              "country"
## [35] "place_type"                "full_name"
## [37] "place_name"                "place_id"
## [39] "place_lat"                 "place_lon"
## [41] "expanded_url"              "url"

There is plenty of interesting information to play around with, but note that fields based on user input often contain unreliable information. In this post I’ll need only a few key features (lon, lat, and text), plus one other (lang) that supports my aim of exploring the relationship between language choice on Twitter and US Census demographic information.

pima.tweets <- subset(pima.tweets, 
                      select = c("lon", "lat", "text", "lang"))

To add points to our map corresponding to Twitter posts, we use the geom_point() function and pass it the pima.tweets dataset through its data argument.

p + geom_point(data = pima.tweets,
               aes(x = lon, y = lat, group = 1))
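
For readers who do not have p from the previous post at hand, here is a self-contained sketch of the same layering idea. The two square "tracts", the share column, and the point coordinates are all made up, and I am assuming (not asserting) that p maps a group aesthetic at the plot level the way this toy plot does. If so, that is presumably why the call above sets group = 1: the point layer would otherwise inherit a group mapping that refers to a column the tweet data does not have.

```r
library(ggplot2)

# Two made-up square "tracts", four corners each
toy_tracts <- data.frame(
  long  = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c("tract_a", "tract_b"), each = 4),
  share = rep(c(0.2, 0.6), each = 4)
)

# Made-up point locations standing in for tweets
toy_points <- data.frame(lon = c(0.5, 1.5), lat = c(0.5, 0.25))

# Base choropleth: the group aesthetic is set for the whole plot
base_plot <- ggplot(toy_tracts, aes(x = long, y = lat, group = group)) +
  geom_polygon(aes(fill = share))

# Point layer with its own data; group = 1 overrides the inherited mapping
with_points <- base_plot +
  geom_point(data = toy_points, aes(x = lon, y = lat, group = 1))
```

Printing with_points draws the two filled squares with one point on each.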

ggplot2 offers various aesthetics that we can use to visualize language (lang). In this case I don’t want to see languages other than English and Spanish, so I will subset the data to the en and es codes and map lang to the color aesthetic. Note that I’m naively trusting the language detection algorithm that Twitter uses.

pima.tweets <- subset(pima.tweets, lang == 'en' | lang == 'es')
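
As a quick sanity check on this kind of filter, here is a small sketch on made-up rows (the toy data frame is hypothetical, not the Pima sample). It shows the equivalent %in% form, which scales more comfortably if you later want to keep more than two languages.

```r
# Made-up rows standing in for pima.tweets
toy <- data.frame(
  text = c("hello", "hola", "bonjour", "gracias"),
  lang = c("en", "es", "fr", "es"),
  stringsAsFactors = FALSE
)

# Same filter as above, written with %in%
kept <- subset(toy, lang %in% c("en", "es"))

# Count how many tweets remain per language
lang_counts <- table(kept$lang)
```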

p + geom_point(data = pima.tweets,
               aes(x = lon, y = lat, group = 1,
                   color = lang)) +
  scale_color_manual(values = c("yellow", "red"), name = "Language")
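
With an unnamed values vector, ggplot2 assigns the colors to the levels in order, which here means en gets yellow and es gets red. A named vector makes that pairing explicit. The sketch below uses a made-up three-row data frame and a hypothetical lang_colors vector to illustrate the idea.

```r
library(ggplot2)

# Made-up points, for illustration only
toy <- data.frame(
  lon  = c(1, 2, 3),
  lat  = c(1, 2, 3),
  lang = c("es", "en", "en"),
  stringsAsFactors = FALSE
)

# Named values are matched to the levels of lang by name
lang_colors <- c(en = "yellow", es = "red")

toy_plot <- ggplot(toy, aes(x = lon, y = lat, color = lang)) +
  geom_point() +
  scale_color_manual(values = lang_colors, name = "Language")
```

The named form keeps the pairing stable even if a language is later added to or dropped from the data.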

If you’re like me, you’re thinking it would be cool to see what the content of these tweets is. The plotly package can be hooked up to ggplot2 to get a really cool effect in which the text appears when you hover over a point on the map.

Just load the plotly library, create your standard plot, and then apply the ggplotly() function.

library(plotly)
pp <- p + geom_point(data = pima.tweets,
                       aes(x = lon, y = lat, group = 1, 
                           color = lang, text = text)) +
  scale_color_manual(values = c("yellow","red"), name = "Language")
ggplotly(pp)
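
By default the hover box lists every mapped aesthetic, and long tweets can produce very wide boxes. The sketch below, on made-up data, shows one way to tidy that up. The wrap_for_hover() helper is hypothetical (my own, not part of plotly), and I am assuming a plotly version in which ggplotly() accepts the tooltip argument.

```r
library(ggplot2)
library(plotly)

# Hypothetical helper: break long strings into lines joined by <br>,
# which plotly renders as line breaks in hover text
wrap_for_hover <- function(x, width = 40) {
  vapply(
    x,
    function(s) paste(strwrap(s, width = width), collapse = "<br>"),
    character(1),
    USE.NAMES = FALSE
  )
}

# Made-up tweets, for illustration only
toy <- data.frame(
  lon  = c(-110.9, -111.0),
  lat  = c(32.2, 32.3),
  lang = c("en", "es"),
  text = c("short tweet",
           "a much longer made-up tweet that would otherwise stretch the hover box across the plot"),
  stringsAsFactors = FALSE
)
toy$hover <- wrap_for_hover(toy$text)

toy_plot <- ggplot(toy, aes(x = lon, y = lat, color = lang, text = hover)) +
  geom_point()

# tooltip = "text" shows only the text aesthetic on hover
widget <- ggplotly(toy_plot, tooltip = "text")
```

Printing widget in an interactive session or an HTML document should display the plot with only the wrapped tweet text in each hover box.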

This plot has only scratched the surface. There is a lot more to learn about plot.ly. I encourage you to head on over to their website and check out the growing documentation on the R API.

sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.11.3 (El Capitan)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  base     
## 
## other attached packages:
## [1] plotly_2.0.16    plyr_1.8.3       sp_1.2-2         data.table_1.9.6
## [5] knitr_1.12.3     magrittr_1.5     ggplot2_2.0.0   
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.3         formatR_1.2.1       methods_3.2.3      
##  [4] base64enc_0.1-3     viridis_0.3.2       tools_3.2.3        
##  [7] digest_0.6.9        jsonlite_0.9.19     evaluate_0.8       
## [10] gtable_0.1.2        lattice_0.20-33     png_0.1-7          
## [13] mapproj_1.2-4       yaml_2.1.13         rgdal_1.1-3        
## [16] proto_0.3-10        gridExtra_2.0.0     stringr_1.0.0      
## [19] httr_1.1.0          RgoogleMaps_1.2.0.7 htmlwidgets_0.5    
## [22] maps_3.0.2          grid_3.2.3          R6_2.1.2           
## [25] jpeg_0.1-8          RJSONIO_1.3-0       ggmap_2.6.1        
## [28] reshape2_1.4.1      scales_0.3.0        htmltools_0.3      
## [31] geosphere_1.5-1     colorspace_1.2-6    labeling_0.3       
## [34] stringi_1.0-1       munsell_0.4.2       chron_2.3-47       
## [37] rjson_0.2.15