Lian Arzbecker

Postdoctoral researcher




arzbecker.1 (at) osu (dot) edu | lianarzb (at) buffalo (dot) edu


Motor Speech Disorders Lab

Communicative Disorders and Sciences, University at Buffalo



0 Setting up your RStudio workspace


Preparing the environment and loading data files


Table of contents

  1. 🏠️ Setting the working directory
  2. 📦️ Installing and loading packages
  3. 🗒️ Importing data: Assigning names and reading files
  4. 🏷️ Labeling variables
  5. 🗂️ Factorizing categorical variables
  6. ⭐️ Putting it all together
⬇️
Download the example CSV file and be sure to save it as "dogs.csv".
🏠️

1. Setting the working directory

  • Function "setwd"
  • Parameter: The file path as a character vector
  • Purpose: Sets R's working directory to the specified path, so files can be read and saved there by name alone
# Syntax usage: replace the placeholder path with your own
setwd("path/here/with/forward/slashes")
# Set working directory
setwd("C:/Users/lianjarzbecker/Downloads")


# Verify working directory
getwd()
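A mistyped path makes "setwd" stop with an error. As a minimal sketch, the folder can be checked with "dir.exists" before switching to it. Here "tempdir()" is only a stand-in for your own folder, and the names "target_dir" and "old_dir" are hypothetical.

```r
# Stand-in folder so the sketch runs anywhere; replace with your own path
target_dir <- tempdir()

if (dir.exists(target_dir)) {
  # setwd() invisibly returns the previous working directory
  old_dir <- setwd(target_dir)
} else {
  stop("Folder not found: ", target_dir)
}

# Confirm the change
getwd()
```

Keeping the previous directory in "old_dir" makes it possible to switch back later with setwd(old_dir).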
📦️

2. Installing and loading packages

  • Function "install_and_load"
  • Parameter: A list of package names as a character vector
  • Purpose:
    • Checks if each package is installed
    • Installs any missing packages
    • Loads all specified packages
# Syntax usage
packages <- c("package1", "package2")
# Package list
packages <- c("readxl", "expss")

### No need to edit anything below this triple comment symbol
# Function for installing missing packages
install_and_load <- function(pkg) {
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg, repos = "http://cran.us.r-project.org")
    library(pkg, character.only = TRUE)
  }
}

# Install and load packages
invisible(lapply(packages, install_and_load))
### No need to edit anything above this triple comment symbol
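After the function above has run, it can be useful to confirm that every package is actually available. The sketch below is one possible check using "requireNamespace". The package names "stats" and "utils" are stand-ins (both ship with R) so the example runs without an internet connection, and the name "available" is hypothetical.

```r
# Stand-in package names (both ship with R); swap in your own list
packages <- c("stats", "utils")

# TRUE for each package that can be found and loaded, FALSE otherwise
available <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
available

# Names of any packages that are still missing (empty if all were found)
names(available)[!available]
```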
🗒️

3. Importing data: Assigning names and reading files

  • Variable names can be assigned directly without calling a function by using the assignment operator "<-"
    • Parameter: Variable name
    • Purpose: Stores variable and allows for referencing
  • When reading a file, the function used depends on the file type
    • Parameter: File name as a character vector, including the file extension
    • Purpose: Reads the file (assuming it's in your working directory) and stores it as a data frame under your assigned variable name
# Syntax usage
data_frame_name <- read.csv("filename.csv")
data_frame_name <- read_xlsx("filename.xlsx")
data_frame_name <- read.table("filename.txt")
  • Function "read.csv"
  • Parameter: File name
  • Purpose: Read CSV files
  • Function "read_xlsx" (requires "readxl" package)
  • Parameter: File name
  • Purpose: Read Excel files
  • Function "read.table"
  • Parameter: File name
  • Purpose: Read text files
⬇️
The same data can also be downloaded as an xlsx version and a txt version.
# Backward arrow names & saves as data frame
dogs <- read.csv("dogs.csv")


# Handling missing data: empty cells and the text "NA" are both read as missing (NA)
dogs <- read.csv("dogs.csv", na.strings = c("","NA"))


# Verify structure & first few rows of data
str(dogs)
head(dogs)


# Same data, just in different file formats
# "read_xlsx" reads .xlsx files; "read_excel" (also in readxl) handles both .xlsx and .xls
library(readxl)
dogs_excel <- read_xlsx("dogs.xlsx")


# Function "read.table" assumes a whitespace delimiter & no header, so you
# need to indicate that the separator is a tab and that there is a header
dogs3_text <- read.table("dogs.txt", sep = "\t", header = TRUE)
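To see what the "na.strings" argument changes, the sketch below writes a tiny made-up CSV to a temporary file and reads it both ways. The file contents and the names "csv_path", "toy_default", and "toy_na" are hypothetical and are not part of "dogs.csv".

```r
# Write a small made-up CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("breed,group,bark",
             "Beagle,Hound,4",
             "Pug,,2"),
           csv_path)

# Default: an empty cell in a text column is kept as an empty string
toy_default <- read.csv(csv_path)
toy_default$group[2] == ""

# With na.strings: the same empty cell is read as missing (NA)
toy_na <- read.csv(csv_path, na.strings = c("", "NA"))
is.na(toy_na$group[2])
```

Empty cells in numeric columns are read as NA either way, so the argument matters mainly for text columns.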
🏷️

4. Labeling variables

  • Function "apply_labels" (requires "expss" package)
  • Parameters: The data frame, followed by one variable = "Label" pair for each variable
  • Purpose: Enhances readability and interpretability of your data
# Syntax usage
data = apply_labels(data,
                    variable1 = "Variable 1",
                    variable2 = "Variable 2")
# Give variables labels. Might seem redundant but think of future you!
library(expss)
dogs = apply_labels(dogs,
                    breed = "Breed",
                    group = "Group",
                    height = "Height (in)",
                    weight = "Weight (lb)",
                    life_expect = "Life span (yr)",
                    affection = "Affectionate",
                    kids = "Good with kids",
                    dogs = "Good with other dogs",
                    shedding = "Shedding level",
                    grooming = "Coat grooming frequency",
                    drooling = "Drooling level",
                    coatT = "Coat type",
                    coatL = "Coat length",
                    strangers = "Stranger openness",
                    playful = "Playfulness",
                    protec = "Protectiveness",
                    adapt = "Adaptability",
                    train = "Trainability",
                    energy = "Energy level",
                    bark = "Barking level",
                    stim = "Mental stimulation needs")

# Verify labels have been correctly assigned
str(dogs)
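Besides "str", a single label can be looked up with "var_lab", also from the "expss" package. The sketch below uses a made-up two-row data frame ("toy" and "bark_label" are hypothetical names) and only runs the labeling step if "expss" is installed.

```r
# Made-up data frame standing in for "dogs"
toy <- data.frame(breed = c("Beagle", "Pug"), bark = c(4, 2))

if (requireNamespace("expss", quietly = TRUE)) {
  toy <- expss::apply_labels(toy,
                             breed = "Breed",
                             bark = "Barking level")
  # Look up the label attached to one column
  bark_label <- expss::var_lab(toy$bark)
  print(bark_label)
}
```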
🗂️

5. Factorizing categorical variables

  • Functions "factor" and "as.factor"
  • Parameters:
    • A character or numeric column in your data frame, referenced with "$"
    • Levels: the raw data values
    • Labels: the new names for those values
  • Purpose: Converts the column to a factor, which is then stored as a new column
# Syntax usage
data$new_column_name <- as.factor(data$column_name)
data$new_column_name <- factor(data$column_name,
                          levels = c(1, 2, 3),
                          labels = c("One", "Two", "Three"))
# Create new "groupF" column in "dogs" data frame
# Reference "group" column in "dogs" data frame with "$"
dogs$groupF <- factor(dogs$group,
                      levels = c("Herding", "Hound", "Toy", "Non-sporting",
                                 "Sporting", "Terrier", "Working",
                                 "Miscellaneous", "FSS"),
                      labels = c("Herding", "Hound", "Toy", "Non-sporting",
                                 "Sporting", "Terrier", "Working",
                                 "Miscellaneous", "FSS"))


# If levels and labels are identical, "as.factor" is more concise
# Note: "as.factor" orders the levels alphabetically, not in the order listed above
dogs$groupF <- as.factor(dogs$group)


# More factorizing examples below
# Example 1: Group Recognition by AKC (Yes or No)
dogs$groupR <- factor(dogs$group,
                      levels = c("Herding","Hound","Toy","Non-sporting",
                                 "Sporting","Terrier","Working",
                                 "Miscellaneous","FSS"),
                      labels = c("Yes","Yes","Yes","Yes", "Yes",
                                 "Yes","Yes","No","No"))


# Example 2: Barking Behavior Classification
# (Only to alert, Rarely, Sometimes, Often, Very vocal)
dogs$barkF <- factor(dogs$bark,
                     levels = c(1, 2, 3, 4, 5),
                     labels = c("Only to alert", "Rarely",
                                "Sometimes", "Often", "Very vocal"))

# Example 2a: Alternative Barking Classification
# (Low, Medium, High)
dogs$barkF2 <- factor(dogs$bark,
                      levels = c(1, 2, 3, 4, 5),
                      labels = c("Low","Low", "Medium", "High", "High"))
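Two details of the examples above are easy to miss: "as.factor" sorts the levels alphabetically, while "factor" keeps the order you list, and repeated labels are merged into a single level. The sketch below shows both on small made-up vectors (the names "group", "bark", and "bark3" here are hypothetical and separate from the "dogs" data).

```r
# Made-up values standing in for the "group" column
group <- c("Toy", "Hound", "Herding", "Toy")

# as.factor: levels in alphabetical order
levels(as.factor(group))

# factor with explicit levels: order exactly as listed
levels(factor(group, levels = c("Toy", "Hound", "Herding")))

# Made-up values standing in for the "bark" column
bark <- c(1, 3, 5, 2)

# Repeated labels collapse five raw values into three levels
bark3 <- factor(bark,
                levels = c(1, 2, 3, 4, 5),
                labels = c("Low", "Low", "Medium", "High", "High"))
levels(bark3)
table(bark3)
```

Level order matters later on, for example in the order of bars in a plot or of rows in a summary table.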
⭐️

6. Putting it all together

  • The script below combines the steps above: it sets up your R environment, loads the data, and prepares it for further analysis or visualization
  • Adjust the file path to match the location and name of your actual CSV file
# RStudio workspace setup
setwd("C:/Users/lianjarzbecker/Downloads")

# List of packages to install and load
packages <- c("expss")

# Function to install and load packages if not already installed
install_and_load <- function(pkg) {
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg, repos = "http://cran.us.r-project.org")
    library(pkg, character.only = TRUE)
  }
}

# Install and load packages
invisible(lapply(packages, install_and_load))

# Read data from CSV file and label variables
dogs <- read.csv("dogs.csv", na.strings = c("","NA"))
dogs = apply_labels(dogs,
                    breed = "Breed",
                    group = "Group",
                    height = "Height (in)",
                    weight = "Weight (lb)",
                    life_expect = "Life span (yr)",
                    affection = "Affectionate",
                    kids = "Good with kids",
                    dogs = "Good with other dogs",
                    shedding = "Shedding level",
                    grooming = "Coat grooming frequency",
                    drooling = "Drooling level",
                    coatT = "Coat type",
                    coatL = "Coat length",
                    strangers = "Stranger openness",
                    playful = "Playfulness",
                    protec = "Protectiveness",
                    adapt = "Adaptability",
                    train = "Trainability",
                    energy = "Energy level",
                    bark = "Barking level",
                    stim = "Mental stimulation needs")

# Factorize variables
dogs$groupF <- as.factor(dogs$group)
dogs$barkF <- factor(dogs$bark,
                     levels = c(1, 2, 3, 4, 5),
                     labels = c("Only to alert", "Rarely",
                                "Sometimes", "Often", "Very vocal"))

# Optional: View the structure and stats summary of the data frame
str(dogs)
summary(dogs)
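One check worth adding after factorizing: any raw value that is not listed in "levels" silently becomes NA. The sketch below, on a made-up vector (the names "bark", "barkF", and "new_na" are hypothetical here), finds values that turned into NA during the conversion.

```r
# Made-up ratings: 6 is outside the expected 1-5 range, NA is truly missing
bark <- c(1, 5, 6, NA)

barkF <- factor(bark,
                levels = c(1, 2, 3, 4, 5),
                labels = c("Only to alert", "Rarely",
                           "Sometimes", "Often", "Very vocal"))

# TRUE where the factor is NA but the raw value was not
new_na <- is.na(barkF) & !is.na(bark)

# Raw values that were lost in the conversion
bark[new_na]
```

If this returns any values, either the data contain an entry error or the "levels" list is incomplete.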