Post 3 - Using parellel iteration and writing functions to generate plots

Examples of using pmap() and map() to loop through the data set to create plots, plus one example of writing a function that plots linear models.

Joanna Wright https://s-gluck.github.io/funprog_blog
06-01-2020

Table of Contents


Set-up


library(ggplot2)
library(tidyverse)
library(janitor)
library(glue)
library(purrr)

Loading the data

For details on how to load the data, see Post #1 by Brock. For details about this dataset, see Post #2 by Akhila and Stephanie.


files <- list.files(here::here("data"),
                    full.names = TRUE)
files
d <- read_csv(files[1]) %>%
  clean_names()

Nesting and using pmap() for plotting

Nesting

Let’s say we want to create a separate plot for mobility trends in each country. First, we need to group_by country and nest() the rest of the data. This creates a list of tibbles, one for each country. I’ve called reactable() here so that we can return a simple, interactive table to see the results of nesting.


d %>%
  group_by(country) %>% 
  nest() %>%
  reactable::reactable()

Now we can use this as the input for our plots.

pmap() and ggplot() for generating a series of plots

About pmap(): pmap(.l, .f, …)

pmap(.l, .f, ...) is a variation of map() which allows us to apply a function to multiple vectors simultaneously. According to the help documentation, it takes 2 or more arguments:

Getting started with plotting

In the syntax below, you’ll see that the first vector listed corresponds to ..1 (in this case, country) and the second vector listed (data) corresponds to ..2. The vectors listed in aes() are from within the data, which we specifed with ..2. For this example, we’ll create one plot for each country, for just one of the mobility categories (workplaces).


plots_workplaces <- d %>%
  group_by(country) %>% 
  nest() %>% 
  mutate(plots = pmap(list(country, data),  
                      ~ggplot(..2, aes(date, workplaces)) +
                        geom_point() + 
                        geom_line()))

Test to see what the plots look like:


plots_workplaces$plots[1]

[[1]]


plots_workplaces$plots[2]

[[1]]


# or we could view all of them like this: 
# plots_workplaces$plots[1:19]

Plot improvements and customization

We can add a unique title to each plot with {..1} to select the 1st column (country) to insert country names into the title.

Plus a few other visualization adjustments…


plots_workplaces <- d %>%
  group_by(country) %>% 
    nest() %>%
    mutate(plots = pmap(list(country, data),
    ~ggplot(..2, aes(date, workplaces)) +
      geom_line(color = "grey70") +
      geom_point(aes(color = workplaces > 0), 
                 size = 2) +
      scale_color_brewer(palette = "Set2") +      
      theme(axis.title.x = element_blank(), 
            legend.position = 'none', 
            plot.background = element_rect(fill="gray20"),
            panel.background = element_rect(fill="gray20"),
            panel.grid.minor = element_blank(), 
            plot.title = element_text(color="white",
                                      hjust= 0,
                                      vjust=1, 
                                      size=rel(1.5)), 
            axis.text = element_text(color="white", 
                               size=rel(1.5)),
            axis.text.y  = element_text(hjust=1), 
            axis.title.y = element_text(color = "white"),
            plot.caption = element_text(color = "white")) +
      labs(title = glue("Mobility trends to/from workplaces in {..1}"), #unique title
           y = "% change in mobility", 
           caption = "Googlemaps data from https://www.kaggle.com/gustavomodelli/covid-community-measures"))
    )

# testing: 
plots_workplaces$plots[1]

[[1]]


plots_workplaces$plots[2]

[[1]]


# or we could see all of them: 
# plots_workplaces$plots[1:19]

Another application of pmap()

What if we wanted to see plots for all mobility categories, not just workplaces? We can apply the same logic as above, but first nest by both country and mobility category.

In order to do so, we’ll first have to restructure the data so that mobility category is it’s own column. This can be done using pivot_longer():


by_country_mobil_cat <- d %>%
  pivot_longer(
    cols = 4:9,
    names_to = "mobil_category",
    values_to = "perc_change"
  ) %>%
  nest(-country, -mobil_category)

#head(by_country_mobil_cat)

reactable::reactable(by_country_mobil_cat)

Now we can use the same code structure as before for plotting. pmap() is super useful here because we can list() any number of columns. Note that here the first argument for ggplot() is now ..3, referring to the 3rd item (“data”) in the pmap(list()). Also, the aes() now calls for perc_change instead of parks or residential, etc., because that is the name of the column after using pivot_longer.


all_plots <- by_country_mobil_cat %>% 
  mutate(plots = pmap(list(country, mobil_category, data),
                      ~ggplot(..3, aes(date, perc_change)) +
      geom_line(color = "grey70") +
      geom_point(aes(color = perc_change > 0), 
                 size = 2) +
      scale_color_brewer(palette = "Set2") +      
      theme(axis.title.x = element_blank(), 
            legend.position = 'none', 
            plot.background = element_rect(fill="gray20"),
            panel.background = element_rect(fill="gray20"),
            panel.grid.minor = element_blank(), 
            plot.title = element_text(color="white",
                                      hjust= 0,
                                      vjust=1, 
                                      size=rel(1.5)), 
            axis.text = element_text(color="white", 
                               size=rel(1.5)),
            axis.text.y  = element_text(hjust=1), 
            axis.title.y = element_text(color = "white"),
            plot.caption = element_text(color = "white")) +
      labs(title = glue("Mobility trends to/from {..2} in {..1}"), #unique title
           y = "% change in mobility", 
           caption = "Googlemaps data from https://www.kaggle.com/gustavomodelli/covid-community-measures"))
    )

all_plots$plots[1]

[[1]]


all_plots$plots[10]

[[1]]


all_plots$plots[20]

[[1]]

Using walk2() to save plots

We can use walk2() to save our plots. It’s similar to map() but doesn’t print anything to the screen, instead creating files and file pathways to save to.


fs::dir_create(here::here("plots", "mobility-plots")) # creates a folder called "plots" and a subdirectory called "mobility"

# then create a vector that has all of the file paths: 

files_c <- str_replace_all(tolower(all_plots$country), " ", "-")
files_m <- str_replace_all(all_plots$mobil_category, "_", "-")
paths <- here::here("plots", "mobility-plots", glue("{files_c}-{files_m}.png"))
#paths

walk2(paths, all_plots$plots, ggsave,
      width = 9.5, 
      height = 6.5,
      dpi = 500)

Creating a function to plot linear model

So far, we’ve only explored mobility trends, not mobility trends in relation to infections. And, we’ve illustrated one strategy for applying a function to multiple vectors - parellel iteration. Another strategy that is especially useful for processes that you find yourself repeating is to write a function yourself! Here’s an example of writing a function to plot linear models of mobility trends by infection data.

Before writing a function, it’s usually helpful to just write the code that you would normally use to achieve the result you want. Here is the code for plotting a linear model:

(first I unnested the data)


d_unnest <- by_country_mobil_cat %>%
  unnest(data)

lm_total_mob <- lm(perc_change ~ total_cases, data = d_unnest)
call <- as.character(match.call()) # for customized title in labs() 
plot(perc_change ~ total_cases, data = d_unnest)
abline(lm_total_mob)

Now let’s use that code to generalize it to a function. This function will have 3 arguments:

Here’s the function:


plot_lm <- function(df, mobil_cat, infect_var) {
  lm_mob_by_infect <- lm(mobil_cat ~ infect_var, data = df)
  plot(mobil_cat ~ infect_var, data = df)
  abline(lm_mob_by_infect)
}

Test plot_lm() on one mobility category (parks):


plot_lm(d, d$parks, d$total_cases)

Using map() to loop through our plot_lm() function

We can use map() to loop through each mobilility category column to plot a lm for each.

map(.d, .f) takes a dataframe argument and a function argument.

Here, the ~ means “apply this function” and is paired with the .x argument indicating “over all columns”. For the data argument in map(), I’ve selected only the columns I want to loop through: d[5:9].


map(d[5:9], ~plot_lm(d, .x, d$total_cases))

This is just to demonstrate creating a function and using map() with ~ and .x to loop through that function. Of course, these plots would be more useful with customized title and axes labels, as shown above.

Interesting to see how different the 2nd plot is from the rest! This plot corresponds to “residential” mobility – while the other plots show decreasing mobility to parks, workplaces, etc., this plot shows a large increase in people staying at home as total infections rose.