Lian Arzbecker

Postdoctoral researcher



arzbecker.1 (at) osu (dot) edu | lianarzb (at) buffalo (dot) edu


Motor Speech Disorders Lab

Communicative Disorders and Sciences, University at Buffalo



2 Scatterplots: Further customization


Adding a regression line and correlation coefficient, plotting text


Table of contents
  1. ➖️Adding a regression line and correlation coefficient
  2. 🔡Plotting text (🎁Bonus: Subsetting data)
  3. ⚪️Encircling data
🏠️
Directions for downloading the dataset and setting up the workspace can be found here.
➖️

1. Adding a regression line and correlation coefficient

  • Functions: "geom_smooth", "cor", and "annotate"
  • Parameters: variables and aesthetics
  • Purpose: Adds statistics to the plot
# Syntax usage
plot + geom_smooth(method = method, se = T OR F, color = "name" OR "#hex code",
                   linetype = "linetype" OR linetype code,
                   linewidth = line width)

cor_coeff <- cor(data$variable1, data$variable2)

plot + annotate("text", x = x coordinate, y = y coordinate,
                label = paste("r =", round(cor_coeff, decimal places)),
                size = 4.5)
# Create plot1
# Uses default "geom_smooth" settings
plot1 <- ggplot(dogs2, aes(x = weight, y = height)) +
  ggtitle("plot1") + geom_point() +
  scale_x_continuous(name = "Weight (lb)", limits = c(0, 180),
                     breaks = seq(0, 175, by = 25)) +
  scale_y_continuous(name = "Height (in)", limits = c(0, 36),
                     breaks = seq(0, 35, by = 5)) +
  geom_smooth()
print(plot1)


# Calculate the correlation coefficient
cor_coeff <- cor(dogs2$weight, dogs2$height)


# Create plot2
plot2 <- ggplot(dogs2, aes(x = weight, y = height)) +
  ggtitle("plot2") + geom_point() +
  scale_x_continuous(name = "Weight (lb)", limits = c(0, 180),
                     breaks = seq(0, 175, by = 25)) +
  scale_y_continuous(name = "Height (in)", limits = c(0, 36),
                     breaks = seq(0, 35, by = 5)) +
  geom_smooth(method = lm,
              # Changes method to Linear Model from
              # default Locally Estimated Scatterplot Smoothing
              se = F, # Remove confidence interval
              color = "red", # Color as a name or #hex code
              linetype = "longdash", # Linetype as a name or numeric code
              linewidth = 1.5) + 
  annotate("text", x = 150, y = 5, # Text placement coordinates
           label = paste("r =", round(cor_coeff, 2)),
           size = 4.5) # Font size
print(plot2)
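
As a quick check on what "cor" and the annotation label produce, here is a minimal sketch in base R. The "toy" data frame and its values are made up for illustration and are not part of dogs2. It also shows that, with a single predictor, the R-squared of the linear model equals the squared correlation coefficient.

```r
# Made-up data (assumption: illustrative values only, not from dogs2)
toy <- data.frame(weight = c(1, 2, 3, 4, 5),
                  height = c(2, 1, 4, 3, 5))

# Pearson correlation coefficient (the default method of "cor")
r <- cor(toy$weight, toy$height)
print(r) # 0.8

# The same label text that "annotate" receives in plot2
r_label <- paste("r =", round(r, 2))
print(r_label) # "r = 0.8"

# Fit the line that geom_smooth(method = lm) would draw
fit <- lm(height ~ weight, data = toy)
r_squared <- summary(fit)$r.squared
print(r_squared) # 0.64, i.e. r^2
```

Note that "paste" separates its arguments with a space by default, so "r =" needs no trailing space.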
🔡

2. Plotting text

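  • Functions: "geom_text" and "geom_text_repel" ("geom_text_repel" requires the "ggrepel" package)
  • Parameters: label variable and text aesthetics
  • Purpose: Labels data points with text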
# Syntax usage
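
A minimal sketch of the two labeling functions, assuming the "ggplot2" and "ggrepel" packages are installed. The "toy" data frame, its values, and the "vjust" setting are made up for illustration. "geom_text" draws each label exactly at its point, while "geom_text_repel" nudges labels away from the points and from each other.

```r
library(ggplot2)
library(ggrepel)

# Made-up data (assumption: illustrative values only)
toy <- data.frame(x = c(1, 2, 3),
                  y = c(2, 1, 3),
                  name = c("a", "b", "c"))

# Labels drawn at the point coordinates, shifted up slightly with vjust
p_text <- ggplot(toy, aes(x = x, y = y, label = name)) +
  geom_point() +
  geom_text(vjust = -1)

# Labels repelled from points and from each other
p_repel <- ggplot(toy, aes(x = x, y = y, label = name)) +
  geom_point() +
  geom_text_repel(size = 3.5, color = "black", max.overlaps = Inf)

print(p_text)
print(p_repel)
```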
🎁
Bonus: Subsetting data
  • Function: "subset"
  • Parameter: Original dataset and subset variable names
  • Purpose: Creates a new, subsetted data frame
# Syntax usage
subset_name <- subset(original_data, variable_to_subset_by
                      %in% c("variable_value1", "variable_value2"))
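
A minimal sketch of "subset" with "%in%" in base R. The "toy" data frame and its breed names are made up for illustration. One detail worth knowing: when the subsetting variable is a factor, the subset keeps the unused factor levels until they are dropped with "droplevels".

```r
# Made-up data (assumption: hypothetical breeds, not from dogs2)
toy <- data.frame(
  breed = c("A", "B", "C", "D"),
  groupF = factor(c("Terrier", "Working", "Toy", "Terrier"))
)

# Keep only rows whose groupF is one of the listed values
toy_subset <- subset(toy, groupF %in% c("Terrier", "Working"))
print(nrow(toy_subset)) # 3

# The unused level "Toy" is still attached to the factor
levels_before <- levels(toy_subset$groupF)
print(levels_before) # "Terrier" "Toy" "Working"

# Drop unused levels if they should not appear downstream
toy_subset$groupF <- droplevels(toy_subset$groupF)
print(levels(toy_subset$groupF)) # "Terrier" "Working"
```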
# Create composite variables "friendly" and "upkeep"
dogs2$friendly <- (dogs2$affection + dogs2$kids + dogs2$dogs + dogs2$playful +
                     (6 - dogs2$strangers)) / 5 # Reverse-code strangers
dogs2$upkeep <- (dogs2$shedding + dogs2$grooming + dogs2$drooling +
                   dogs2$energy + dogs2$stim) / 5


# Define custom labels for the tick marks
friendly_labels <- c("2" = "Grinch", "3" = "Friendly",
                     "4" = "Friendlier", "5" = "Mr. Rogers")
upkeep_labels <- c("1" = "Ron\nSwanson", # "\n" creates a new line in text
                   "2" = "Some\nupkeep", "3" = "High\nupkeep", "4" = "Diva")


# Create plot3
plot3 <- ggplot(dogs2, aes(x = friendly, y = upkeep, color = groupF,
                           label = breed)) + # Specify text variable
  ggtitle("plot3") + geom_point() + geom_text() + # Add breed names as text
  scale_x_continuous(breaks = 2:5,
                     labels = friendly_labels, limits = c(2, 5)) +
  scale_y_continuous(breaks = 1:4,
                     labels = upkeep_labels, limits = c(1, 4)) +
  labs(x = "Friendliness", y = "Maintenance")
print(plot3)


# Create a subset of dogs2 containing only the "Terrier" and "Working" groups
dogs_subset <- subset(dogs2, groupF %in% c("Terrier", "Working"))


# Create plot4
# Same concept as plot3 but uses a subset of data and improves aesthetics
plot4 <- ggplot(dogs_subset, aes(x = friendly, y = upkeep,
                                 color = groupF, label = breed)) +
  ggtitle("plot4") + geom_point() +
  geom_text_repel(size = 3.5, color = "black",
                  max.overlaps = Inf) + # Never drop a label because of overlaps
  scale_x_continuous(breaks = 2:5,
                     labels = friendly_labels, limits = c(2, 5)) +
  scale_y_continuous(breaks = 1:4,
                     labels = upkeep_labels, limits = c(1, 4)) +
  scale_color_manual(values = c("#051DD1","#E2240A")) +
  labs(x = "Friendliness", y = "Maintenance",
       color = "Group") # Specify legend name
print(plot4)
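
The composite "friendly" score above reverse-codes "strangers" as 6 minus the rating. A minimal sketch of that step in base R follows, assuming the ratings run from 1 to 5 (which the 6 in the formula implies but the text does not state). The "reverse_code" helper and the "toy" data are hypothetical and made up for illustration.

```r
# Hypothetical helper: flips a rating on a low-to-high scale
# (assumption: ratings run from 1 to 5, so 6 - x maps 1 to 5 and 5 to 1)
reverse_code <- function(x, low = 1, high = 5) {
  (low + high) - x
}

print(reverse_code(1:5)) # 5 4 3 2 1

# Made-up ratings (assumption: illustrative values only)
toy <- data.frame(affection = c(5, 3),
                  kids = c(4, 2),
                  strangers = c(1, 5))

# Average the items after reverse-coding "strangers"
toy$friendly <- rowMeans(cbind(toy$affection, toy$kids,
                               reverse_code(toy$strangers)))
print(toy$friendly) # 4.666667 2.000000
```

"rowMeans" gives the same result as summing the items and dividing by their count, and avoids having to update the divisor by hand when items are added or removed.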
⚪️

3. Encircling data

  • Function: "geom_encircle" (requires "ggalt" package)
  • Parameters: data, s_shape, expand, and color
  • Purpose: Circles specified data
# Syntax usage
plot + geom_encircle(data = data,
                     s_shape = 1,
                     expand = 1,
                     color = "name" OR "#hex code")
