4.1 Importing data
One of the biggest challenges for first-time R users is importing a dataset. The process is similar across formats, but the specific function for loading data depends on the format of the file. Here we cover how to import the most common types of data files.
4.1.1 R data (.rdata)
R saves objects, such as data frames, in files with the extension .RData. Open such a file with the load('filename') command. Test this out by loading the DCPS testing.RData data.
load('DCPS testing.RData')
Note that this works only if the dataset is saved in the working directory. If not, you need to specify the complete file path in this command.
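As a minimal sketch of loading by full path, the following saves a small made-up data frame to a temporary folder and loads it back. The object name scores, the file name, and the use of tempdir() are illustrative assumptions, not part of this guide's data.

```r
# Hypothetical example data; not one of the guide's datasets
scores <- data.frame(school = c("A", "B"), score = c(71, 64))

# Build a full path in a temporary folder (a stand-in for a folder on your machine)
full_path <- file.path(tempdir(), "example scores.RData")
save(scores, file = full_path)

# Remove the object, then restore it with load() and the full path
rm(scores)
getwd()                      # shows the current working directory
loaded <- load(full_path)    # load() returns the names of the restored objects
```

Building the path with file.path() avoids typing the separators by hand, and checking getwd() first shows where R looks when you give only a file name.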
4.1.2 Delimited (.csv) files
Often the datasets you work with will not be in .RData format, or it will be more convenient to store them in another format so that you can create or edit them with other software. The simplest option is a Comma-Separated Values (.csv) file. To import data from a .csv file, use the read_csv('filename') function from the readr package, which is loaded as part of the tidyverse.
myCSV <- read_csv('csvData.csv')
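If you do not have a .csv file at hand, the round trip below is one way to try the function. It is a sketch under stated assumptions: the data frame, the file name exampleData.csv, and the temporary folder are all hypothetical.

```r
library(readr)  # read_csv() and write_csv() live in readr, which library(tidyverse) also loads

# Hypothetical example data written to a temporary file
csv_path <- file.path(tempdir(), "exampleData.csv")
write_csv(data.frame(id = 1:3, group = c("a", "b", "a")), csv_path)

# Import it; read_csv() returns a tibble and guesses each column's type
exampleCSV <- read_csv(csv_path, show_col_types = FALSE)
```

The show_col_types = FALSE argument only silences the column-type message; leave it out if you want to see how each column was interpreted.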
4.1.3 Excel (.xls, .xlsx) files
To import data from Excel (.xlsx or .xls), use the read_excel('filename') function from the readxl package. Note that readxl is installed automatically with the tidyverse, but you have to load it separately. Practice this command by loading the biopics.xls data required for this guide.
library(readxl)
film <- read_excel('biopics.xls')
If it does not load, verify that the dataset is saved in your working directory.
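The check described above can be sketched as follows. Because the guide's file may not be on your machine, the example also reads a sample workbook that ships with readxl; the variable names xl_path and mtcars_xl are illustrative, and choosing a sheet by name is an optional extra rather than something the biopics file is known to need.

```r
library(readxl)

# FALSE here means the file is not in the current working directory
file.exists("biopics.xls")

# readxl_example() returns the path to a sample workbook bundled with readxl
xl_path <- readxl_example("datasets.xlsx")

excel_sheets(xl_path)                               # list the sheet names in the workbook
mtcars_xl <- read_excel(xl_path, sheet = "mtcars")  # import one sheet by name
```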
4.1.4 Stata (.dta) or SPSS (.sav) files
To import data files written in Stata (.dta) or SPSS (.sav), we recommend the read_dta('filename') and read_spss('filename') functions from the haven package. Note that haven is part of the tidyverse, but you must load it separately.
library(haven)
myStata <- read_dta('stataData.dta') # Stata format
mySPSS <- read_spss('spssData.sav') # SPSS format
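To try both readers without Stata or SPSS installed, one option is to write a small data frame in each format with haven and read it back. This is a sketch only: the survey data frame, the file names, and the temporary folder are hypothetical.

```r
library(haven)

# Hypothetical example data
survey <- data.frame(id = 1:3, age = c(34, 51, 27))

# Write the same data in both formats to a temporary folder
dta_path <- file.path(tempdir(), "example.dta")
sav_path <- file.path(tempdir(), "example.sav")
write_dta(survey, dta_path)   # Stata format
write_sav(survey, sav_path)   # SPSS format

# Read each file back with the matching function
fromStata <- read_dta(dta_path)
fromSPSS  <- read_spss(sav_path)
```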