Jerid Francom

Associate Professor of Spanish and Linguistics
Romance Languages
Wake Forest University

I was just browsing the web today and stumbled on an overview of data visualization in R. As I was scrolling the page something caught my eye: a hexbin plot. I had never heard of such a plot before.

To create a hexbin plot in R with plot(), you first need to install the hexbin package, which supplies the plot() method used below.

install.packages("hexbin")

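Then you can load the package and create your hexbin object by setting x and y. The diamonds data comes from ggplot2, so that package is loaded as well.

library(hexbin) # load the hexbin package
library(ggplot2) # provides the diamonds dataset
a <- hexbin(x = diamonds$carat, y = diamonds$price) # create a hexbin object
plot(a) # draw the plot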
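
If you are curious what the hexbin object actually holds, here is a minimal sketch for peeking inside it. It assumes the hexbin and ggplot2 packages are installed (ggplot2 only supplies the diamonds data here), and the names `centers` and `cells` are just illustrative.

```r
library(hexbin)
diamonds <- ggplot2::diamonds  # the dataset ships with ggplot2

a <- hexbin(x = diamonds$carat, y = diamonds$price)

# Each non-empty hexagon stores how many points fell into it
centers <- hcell2xy(a)  # x/y centers of the non-empty cells
cells <- data.frame(carat = centers$x, price = centers$y, count = a@count)

# The busiest hexagons, i.e. where overplotting would be worst
head(cells[order(-cells$count), ])

# Every observation lands in exactly one cell
sum(a@count) == nrow(diamonds)
```

The plot is essentially this table drawn as shaded hexagons, which is why it holds up well when thousands of points pile on top of each other.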

What a great way to visualize bivariate frequency data and avoid overplotting issues. Then, I thought: does ggplot2 have a geom for that?

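It turns out it’s a stat (with a matching geom, geom_hex()). But, yes, yes it does. In this case you do not need to load the hexbin package yourself, although it still has to be installed because ggplot2 uses it behind the scenes. Just apply stat_binhex() to your plot.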

ggplot(diamonds, aes(x = carat, y = price)) + # create plot
  stat_binhex() # apply stat
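
As a rough sketch of what the stat computes, you can build the same kind of plot with geom_hex() and pull the binned data back out with layer_data(). This assumes a ggplot2 version that provides layer_data() (2.0.0 or later, as far as I know) and that hexbin is installed; `p` and `binned` are illustrative names.

```r
library(ggplot2)

# geom_hex() uses the hexagonal binning stat by default
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_hex()

# One row per drawn hexagon: its center and the number of points inside
binned <- layer_data(p)
head(binned[, c("x", "y", "count")])
```

The `count` column is what gets mapped to the fill colour in the plot.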

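In both the plot() and ggplot2 approaches you can change the number of bins to adjust the granularity of the plot: xbins in hexbin() and bins in stat_binhex(). Both default to 30, and fewer bins give larger hexagons.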

a <- hexbin(x = diamonds$carat, y = diamonds$price, xbins = 10)
plot(a)

ggplot(diamonds, aes(x = carat, y = price)) + 
  stat_binhex(bins = 10) 
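
To get a feel for how much the bin setting changes things without eyeballing the plots, here is a small sketch that compares the number of non-empty hexagons at two settings. It assumes hexbin and ggplot2 are installed; `coarse` and `fine` are illustrative names.

```r
library(hexbin)
diamonds <- ggplot2::diamonds  # the dataset ships with ggplot2

coarse <- hexbin(diamonds$carat, diamonds$price, xbins = 10)
fine <- hexbin(diamonds$carat, diamonds$price, xbins = 30)  # 30 is the default

# Number of non-empty hexagons at each setting
c(coarse = coarse@ncells, fine = fine@ncells)

# Largest count found in a single hexagon at each setting
c(coarse = max(coarse@count), fine = max(fine@count))
```

Fewer bins mean fewer, larger hexagons that each absorb more points, so the trade-off is between a smoother picture and local detail.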

And since I was poking around in the ggplot2 documentation, I also discovered another stat that helps avoid overplotting, this time when working with two categorical variables: stat_sum().

ggplot(diamonds, aes(x = cut, y = clarity)) + stat_sum(aes(group = 1))
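
The sizes that stat_sum() draws are just the counts for each combination of the two variables, so a plain cross-tabulation gives the same numbers. A minimal sketch, assuming ggplot2 2.0.0 or later for geom_count() (the geom paired with stat_sum()); `counts` and `p` are illustrative names.

```r
library(ggplot2)

# Count the observations in every cut/clarity combination
counts <- as.data.frame(table(cut = diamonds$cut, clarity = diamonds$clarity))
head(counts[order(-counts$Freq), ])

# geom_count() applies stat_sum() and maps the count to point size
p <- ggplot(diamonds, aes(x = cut, y = clarity)) +
  geom_count()
p
```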

sessionInfo()
## R version 3.1.3 (2015-03-09)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.4 (Yosemite)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] graphics  grDevices utils     datasets  methods   stats     base     
## 
## other attached packages:
## [1] hexbin_1.27.0 knitr_1.10.5  Rdym_0.2.0    ggplot2_1.0.1
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.2-6 digest_0.6.8     evaluate_0.7     formatR_1.2     
##  [5] grid_3.1.3       gtable_0.1.2     labeling_0.3     lattice_0.20-31 
##  [9] magrittr_1.5     MASS_7.3-40      munsell_0.4.2    plyr_1.8.2      
## [13] proto_0.3-10     Rcpp_0.11.6      reshape2_1.4.1   scales_0.2.4    
## [17] stringi_0.4-1    stringr_1.0.0    tools_3.1.3