Skills Series

Data Literacy

PhD Plus offers two workshop series focusing on two commonly used programming languages, R and Python. The series aim to help PhD students and postdocs in all disciplines acquire foundational skills in using these languages for data wrangling, analysis, visualization, and more. The Python series takes place every Fall and the R series every Spring. Both series are intended to prepare trainees for a variety of careers, including research (faculty, research analysts) and professional sectors that rely on data analytics, data science, data management, and data visualization/storytelling. The skills training module is developed and offered in collaboration with the UVA Library’s Research Data Services, Data Services at the Health Sciences Library, and UVA Research Computing.

DATA LITERACY: PYTHON

Python is a popular language widely used for data analysis and machine learning. This PhD+ series introduces students to programming in the language. The sessions are taught in a "flipped classroom" format: students study the learning materials before each session, then attend online workshop sessions via Zoom to ask questions and receive instructors’ assistance with programming projects.

The UVA Library’s Research Data Services Team and Research Computing are partnering with UVA’s PhD Plus program to offer Introduction to Programming in Python, a series that builds data analysis skills. For Fall 2023, participants will have access to pre-recorded lectures and the opportunity to meet with instructors during weekly "office hours" over Zoom for questions and answers. Through lectures, hands-on exercises, and discussions, the series aims to help participants become familiar and comfortable with using Python for research activities and working with data. The live discussions will be offered every Wednesday from September 6 through October 4, 2023. Sessions will begin at 4 pm in Clark Hall, Brown Library Room 133. Attendees are expected to turn in a homework assignment for each session in order to receive credit. Upon successful completion of this series, PhD students are eligible for a non-credit credential (PhDP 9530) on their academic transcript.

Typical Topics 

  • Introduction to Python
  • Data types, data input and output, and organizing code
  • Data analysis and visualization

Session Dates

This series was offered in Fall 2023. Stay tuned for 2024 series dates. 

Instructors

Katherine Holcomb - Senior Research Systems Consultant

Erich Purpur - Science and Engineering Librarian

Check https://data.library.virginia.edu/training/ for more Python training opportunities.

 

DATA LITERACY: Data Foundations for Non-Specialists

Tuesdays, March 12th-April 16th, 2024, 3:30-5 PM, In-Person (Register below for location)

This six-part workshop series is meant to give non-specialists a practical foundation for working with data and for understanding how to answer questions with data using approaches rooted in statistical thinking. Topics covered:

A) Data formatting: This session will get students comfortable with handling quantitative datasets, coding text into the appropriate format, understanding different types of variables (continuous, categorical, ordinal), understanding when transformations might be needed and how to execute them, and tidying data for analysis.

B) Intro to graphs: Students will learn to visualize individual variables and the relationships between variables, and to choose an appropriate way to present them depending on the question. This session will also cover some basic aesthetics of figure formatting that help graphs reveal patterns in the data.

C) Intro to statistical thinking: Students will learn why we measure the central tendency and spread of data, how these summaries are used in hypothesis testing and data analysis, and how to interpret them.

D) Experimental design: We will cover how to design studies that test hypotheses, the main types of statistical tests, and how to determine which one to use. Students will also learn how to report statistical analyses in a way that describes patterns in the data.

E) Storytelling with your data: Over the course of the previous four workshops, students will have time to develop a question they would like to test with a dataset of their choice (for example, data gathered from existing databases or collected in their own projects). They will apply the principles learned in those workshops to prepare, visualize, and analyze their own data and to create a data analysis exercise for other students.

F) Bridge to R: We will discuss the limitations of graphical user interface tools such as DCU and similar platforms, and when code-based statistical tools may be needed. We will then see how data visualizations and statistical tests can be carried out in a language such as R using our Bridge to R interface. Throughout, we will highlight relevant modules that students could take in the other data literacy series in R and Python.

Upon successful completion (5 or more sessions) of this series, PhD students are eligible for a non-credit credential on their academic transcript (at no cost to students).

Click here to register for the Spring 2024 Data Foundations for Non-Specialists series. 

(Note: Enrollment in this workshop series is capped at 25 students. Those who can commit to attending the majority of the workshop series will get priority.)

 

DATA LITERACY: R

PhD Plus Data Literacy: R seminars are offered every Spring semester. This series is developed in collaboration with the UVA Library's Research Data Services Group. Seminars meet weekly and combine lectures, hands-on exercises, and discussions to help participants become familiar and comfortable using R for research activities and working with data. Upon successful completion of this series, PhD students are eligible for a non-credit credential (PhDP 9520) on their academic transcript.

Weekly Topics for Spring 2024:

Session 1 - February 21st from 2pm-4pm

Getting Started with R

The first step to developing data literacy is learning to use a computer program that helps you analyze and visualize data. In the first session we’ll get you up and running with R, a computing environment and programming language designed specifically for data analysis. We’ll assume no prior knowledge of R and start from the very beginning. We’ll work with small data sets to demonstrate how R works and how it can help us quickly explore and investigate data. This session is meant to get you excited about using R and to provide the computational foundations for the sessions that follow!

Session 2 - February 28th from 2pm-4pm 

Preparing Data for Analysis

Whether we generate our own data through an experiment or use data collected by someone else, we almost always need to pre-process the data before we can analyze it. In this session we work with real-life data sets to introduce various strategies for cleaning and preparing data. Topics will include importing data, merging data, using regular expressions to identify text patterns, working with factors, and more. If you don’t know what that stuff means, this session is for you. 

Session 3 - March 13th from 2pm-4pm

Visualizing Data

Visualizing data allows us to look for patterns and associations in data as well as identify unusual observations. R is especially good for this. In this session we’ll take a deep dive into the ggplot2 package to learn how to rapidly create insightful data visualizations. We’ll also briefly introduce the plotly package for making interactive visualizations. In addition, we’ll address some of the judgments we must make when visualizing data and offer some advice on best practices.

Session 4 - March 20th from 2pm-4pm

Essential Statistics

Data literacy includes statistical literacy. This means being able to calculate and interpret statistical summaries, estimate uncertainty, evaluate assumptions, and avoid falling prey to common statistical errors. In this session we learn how to use R to carry out essential statistical analyses such as comparing proportions and means, calculating percentiles, creating cross tabulations, estimating uncertainty, and carrying out hypothesis tests. We emphasize application and interpretation with minimal math. In addition, we highlight sources of common statistical mistakes and how to avoid them.

Session 5 - March 27th from 2pm-4pm

Models and Machine Learning

One of the reasons we collect and analyze data is to quantify relationships and/or make predictions. For example, how much can we expect the value of our home to increase if we add 500 square feet to it? Or can we use laser scanner images of the back of the eye to predict whether someone has glaucoma? In this session we cover some basic ideas of regression models and selected machine learning algorithms, and demonstrate how to implement them in R. No knowledge of advanced mathematics is required. The goal is to make you a more informed consumer of results that use these methods.

Session 6 - April 3rd from 2pm-4pm

Creating Deliverables

Once we have used R to pre-process, summarize, visualize, and model our data, we usually want to communicate the results to an audience in the form of a deliverable. This can be a document, a presentation, a website, a book, a dashboard, or even an interactive application. In the final session we learn how to use R and RStudio to create final products such as these so we can effectively present results and communicate what we’ve learned from our data analysis. The ideas and techniques learned in this session will serve you well no matter your field of study or level of statistical expertise.

Click here for registration. 

Instructor

Clay Ford, Senior Research Data Scientist, Research Data Services, UVA Library

 

 

Previous Offerings:

Statistical Analysis in R (Pilot Offering in Fall 2022)

This series was developed in collaboration with UVA Health’s Health Sciences Library. Sessions met weekly and combined lectures, hands-on exercises, and group discussions to help participants become familiar and comfortable conducting statistical analyses in R. Upon successful completion of this series, PhD students are eligible for a non-credit credential on their academic transcript (attendance at a minimum of 4 of the 5 synchronous sessions is required).

In this series, participants will learn to conduct and interpret output from basic statistical approaches using the statistical programming language, R. Using real research data from the life sciences, we will introduce the concept of each technique, discuss its assumptions, learn what to do when assumptions are not met, evaluate the model fit, interpret the output, and visualize the results. Participants from any discipline are welcome to register and attend. While the examples are drawn from life sciences research, participants will be able to conduct analysis and apply learning to data on any topic.

Participants should commit to attending at least the first three sessions, which lay the foundation for the later sessions. Sessions will be recorded to accommodate occasional absences. However, participants must be available to attend live sessions to qualify for the PhD+ non-credit credential.

Prerequisite

Participants MUST have a working knowledge of R and RStudio, preferably with previous experience using tidyverse packages (dplyr and ggplot2). This prerequisite may be met through any of the following:

  • Attendance at the Health Sciences Library’s four workshops in R (offered monthly)
  • Participation in at least 3 sessions of the PhD Plus Data Literacy in R series (offered every Spring semester)
  • Participation in the Brain Immunology and Glia Center’s Learn R series (offered every Fall)
  • Completion of a curricular course using R (within the past 3 years)
  • Independent learning of R and instructor approval

Session Dates

Virtual format: Oct 19, Oct 26, Nov 2, Nov 9, Nov 16 (Wednesdays), 10 am – 12 pm (ET)

Typical Topics 

  • Linear regression
  • ANOVA
  • Assumptions of linear regression
  • Logistic regression
  • Linear mixed effects models

Instructor

Marieke Jones, Research Data Specialist, UVA Health Sciences Library

 

Data Literacy: Manage Your Research Data

Spring 2023 workshop date: Jan 26, Thursday, 12-1:30 PM (ET) | Virtual

At the beginning of each semester, we offer a new workshop on data management in partnership with the UVA Health Sciences Library to help PhD students and postdocs get better oriented in research projects and programming. We strongly recommend that participants of all disciplinary backgrounds attend this workshop before taking the Python or R series.

This session will help make your life easier by showing you how to adopt good data practices now. Using real-life examples, this 90-minute interactive workshop will cover:

  • Recommended approaches to data organization, versioning, documentation, and storage
  • “Tidy data” practices to enable efficient data entry, preparation, and analysis 
  • UVA research data support services and resources

Instructors

Andrea H. Denton, M.I.L.S. - Research and Data Services Manager, Claude Moore Health Sciences Library
Lucy Carr Jones, M.S.I.S. - Library Assistant, Claude Moore Health Sciences Library