PhD+ Data Literacy: Data Preparation

The Corner Building

Much of data analysis is spent preparing, cleaning, and “munging” data before any downstream analysis can begin. This session covers data cleaning and “tidy data,” and introduces participants to R packages that support data manipulation, analysis, and visualization using split-apply-combine strategies. Participants will learn how to use the dplyr package in R to manipulate a “big” dataset containing many observations and to compute summary statistics conditionally over subsets of it. The session assumes familiarity with the material from the preceding sessions and builds on a common research case, using Albemarle Real Estate Property data (though each workshop may also introduce additional examples and data).


David Martin - Clinical Research Data Specialist

For workshop materials, please visit

Graduate students register here

Postdocs register here

Core Module