4.1 Importing data

One of the biggest challenges for first-time R users is importing a dataset. The general process is the same for every format, but the function you use depends on the type of file. This section covers how to import the most common types of data files.

4.1.1 R data (.RData)

R objects, including data frames, can be saved to a file with the extension .RData. Use the load('filename') command to restore them: each object reappears in your workspace under the name it had when it was saved. Test this out by loading the DCPS testing.RData data.

  load('DCPS testing.RData')

Note that this works only if the file is saved in your working directory (run getwd() to see which folder that is). If it is not, give the full file path in the command instead of just the file name.
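To see what load() does without needing the DCPS file, the sketch below saves two small made-up objects to a temporary .RData file and then loads them back. The object names (scores, note) and the file name are hypothetical; save(), load(), tempdir() and file.path() are base R. Because file.path() builds a complete path, the example works regardless of your working directory.

```r
# Hypothetical example objects (not from the DCPS data)
scores <- data.frame(school = c("A", "B"), pct_proficient = c(61, 74))
note <- "illustrative data"

# Build a complete file path in this session's temporary folder
path <- file.path(tempdir(), "example.RData")
save(scores, note, file = path)

# Remove the objects, then restore them from the file
rm(scores, note)
loaded <- load(path)   # load() invisibly returns the names of the restored objects
loaded
head(scores)
```

Because load() restores objects under their saved names, it will overwrite any object with the same name that already exists in your workspace.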

4.1.2 Delimited (.csv) files

Often the datasets you work with will not be in .RData format, or it will be more convenient to store them in another format so that you can create or work with them in other software. The simplest option is a Comma-Separated Values (.csv) file. To import data from a .csv file, use the read_csv('filename') function from the readr package, which is loaded when you run library(tidyverse).

  library(tidyverse)  # loads readr, which provides read_csv()

  myCSV <- read_csv('csvData.csv')
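As a self-contained sketch, the code below writes a tiny made-up CSV to a temporary file and reads it with read_csv(). The file contents and column names are hypothetical. read_csv() guesses each column's type from its contents, prints a short message describing the column types it chose, and returns a tibble.

```r
library(readr)

# Write a small hypothetical CSV to a temporary file
path <- tempfile(fileext = ".csv")
writeLines(c("school,ward,score",
             "Alpha,1,61.5",
             "Beta,7,74"), path)

myCSV <- read_csv(path)  # column types are guessed from the data
myCSV
```

For files separated by something other than commas, readr also provides read_tsv() for tab-separated files and read_delim(), which takes the separator as its delim argument. Base R's read.csv() reads the same file but returns a plain data.frame rather than a tibble.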

4.1.3 Excel (.xls, .xlsx) files

To import data from Excel (.xlsx or .xls), use the read_excel('filename') function from the readxl package. readxl is installed automatically with the tidyverse, but library(tidyverse) does not load it, so you have to load it separately. Practice this command by loading the biopics.xls data required for this guide.

  library(readxl)

  film <- read_excel('biopics.xls')

If it does not load, verify that the dataset is saved in your working directory.
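Excel workbooks often contain several sheets, and read_excel() reads the first one unless told otherwise. The sketch below uses readxl_example(), which returns the path to a sample workbook bundled with readxl, so it runs without biopics.xls. excel_sheets() lists the sheet names, and the sheet argument of read_excel() selects one by name or position.

```r
library(readxl)

# Path to an example workbook that ships with readxl
path <- readxl_example("datasets.xlsx")

sheets <- excel_sheets(path)   # names of all sheets in the workbook
sheets

cars <- read_excel(path, sheet = "mtcars")  # read a specific sheet by name
dim(cars)
```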

4.1.4 Stata (.dta) or SPSS (.sav) files

To import data files written in Stata (.dta) or SPSS (.sav), we recommend the read_dta('filename') and read_spss('filename') functions from the haven package. Like readxl, haven is installed with the tidyverse, but you must load it separately.

  library(haven)

  myStata <- read_dta('stataData.dta') # Stata format
  mySPSS <- read_spss('spssData.sav') # SPSS format
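Stata and SPSS files often store value labels (for example, 1 = Male, 2 = Female). haven keeps these as labelled vectors, which you can convert into ordinary R factors with as_factor(). The sketch below writes a small hypothetical data frame to a temporary .sav file with write_sav() and reads it back with read_spss(); the variable names and labels are made up for illustration. read_dta() and write_dta() work the same way for Stata files.

```r
library(haven)

# Hypothetical data with a labelled variable (1 = Male, 2 = Female)
survey <- data.frame(id = 1:3)
survey$sex <- labelled(c(1, 2, 1), labels = c(Male = 1, Female = 2))

# Write to a temporary SPSS file, then read it back
path <- tempfile(fileext = ".sav")
write_sav(survey, path)
mySPSS <- read_spss(path)

mySPSS$sex                           # numeric codes with their labels attached
sex_factor <- as_factor(mySPSS$sex)  # use the labels as factor levels
sex_factor
```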