class: center, middle, inverse, title-slide # tidyr and purrr ### Colin Rundel ### 2019-10-07 --- exclude: true --- class: middle .center[ <img src="imgs/hex-tidyr.png" width="50%" /> ] --- ## Example - Grades Is the following data tidy? ```r (grades = tibble( name = c("Alice", "Bob", "Carol", "Dave"), hw_1 = c(19, 18, 18, 19), hw_2 = c(19, 20, 20, 19), hw_3 = c(18, 18, 18, 18), hw_4 = c(20, 16, 17, 19), exam_1 = c(89, 77, 96, 86), exam_2 = c(95, 88, 99, 82) )) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 4 x 7</span><span> ## name hw_1 hw_2 hw_3 hw_4 exam_1 exam_2 ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> Alice 19 19 18 20 89 95 ## </span><span style='color: #BCBCBC;'>2</span><span> Bob 18 20 18 16 77 88 ## </span><span style='color: #BCBCBC;'>3</span><span> Carol 18 20 18 17 96 99 ## </span><span style='color: #BCBCBC;'>4</span><span> Dave 19 19 18 19 86 82 </span></CODE></PRE> -- <br/><br/> .center[ This is an example of *wide* data, which is almost never *tidy*. ] --- ## Updating `tidyr` The current version of tidyr installed in Noteable is slightly out of date (v0.8.3 vs v1.0.0). 
To fix this, run the following:

```r
lib = Sys.getenv("R_LIBS_USER")
dir.create(lib, recursive=TRUE, showWarnings=FALSE)
install.packages("tidyr", lib=lib)
```

<img src="imgs/tidyr_update.png" width="66%" style="display: block; margin: auto;" />

---

## Wider <-> Longer

<img src="imgs/tidyr_longer-wider.gif" width="66%" style="display: block; margin: auto;" />

.footnote[ From Mara Averick's [tidyexplain](https://github.com/batpigandme/tidyexplain/tree/pivot) repo]

---

<img src="imgs/tidyr_longer_wider2.png" width="100%" style="display: block; margin: auto;" />

---

## `pivot_longer`

```r
pivot_longer(table, cols = -country, names_to = "year", values_to = "cases")
```

<img src="imgs/tidyr_gather.png" width="60%" style="display: block; margin: auto;" />

---

## `pivot_wider`

```r
pivot_wider(table, id_cols = country:year, names_from = type, values_from = count)
```

<img src="imgs/tidyr_spread.png" width="70%" style="display: block; margin: auto;" />

---

## Separate

```r
separate(table, col = rate, sep = "/", into = c("cases", "pop"))
```

<img src="imgs/tidyr_separate.png" width="70%" style="display: block; margin: auto;" />

---

## Unite

```r
unite(table, century, year, col = "year", sep = "")
```

<img src="imgs/tidyr_unite.png" width="70%" style="display: block; margin: auto;" />

---

## Example 1 - Summarizing Grades

Is the following data tidy?
```r (grades = tibble( name = c("Alice", "Bob", "Carol", "Dave"), hw_1 = c(19, 18, 18, 19), hw_2 = c(19, 20, 20, 19), hw_3 = c(18, 18, 18, 18), hw_4 = c(20, 16, 17, 19), exam_1 = c(89, 77, 96, 86), exam_2 = c(95, 88, 99, 82) )) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 4 x 7</span><span> ## name hw_1 hw_2 hw_3 hw_4 exam_1 exam_2 ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> Alice 19 19 18 20 89 95 ## </span><span style='color: #BCBCBC;'>2</span><span> Bob 18 20 18 16 77 88 ## </span><span style='color: #BCBCBC;'>3</span><span> Carol 18 20 18 17 96 99 ## </span><span style='color: #BCBCBC;'>4</span><span> Dave 19 19 18 19 86 82 </span></CODE></PRE> -- How would we calculate a final score based on the following formula, `$$\text{score} = 0.6\,\frac{\sum\text{hw}_i}{80} + 0.4\,\frac{\sum\text{exam}_j}{200}$$` --- ## Semi-tidy approach ```r grades %>% mutate( hw_avg = (hw_1+hw_2+hw_3+hw_4)/4, exam_avg = (exam_1+exam_2)/2 ) %>% mutate( overall = 0.4*(exam_avg/100) + 0.6*(hw_avg/20) ) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 4 x 10</span><span> ## name hw_1 hw_2 hw_3 hw_4 exam_1 exam_2 hw_avg exam_avg overall ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span 
style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> Alice 19 19 18 20 89 95 19 92 0.938 ## </span><span style='color: #BCBCBC;'>2</span><span> Bob 18 20 18 16 77 88 18 82.5 0.87 ## </span><span style='color: #BCBCBC;'>3</span><span> Carol 18 20 18 17 96 99 18.2 97.5 0.938 ## </span><span style='color: #BCBCBC;'>4</span><span> Dave 19 19 18 19 86 82 18.8 84 0.899 </span></CODE></PRE> -- <br/><br/> .center[ What is problematic about this approach? ] --- ## Wide -> Long (`pivot_longer`) ```r tidyr::pivot_longer(grades, cols = hw_1:exam_2, names_to = "assignment", values_to = "score") ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 24 x 3</span><span> ## name assignment score ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> Alice hw_1 19 ## </span><span style='color: #BCBCBC;'> 2</span><span> Alice hw_2 19 ## </span><span style='color: #BCBCBC;'> 3</span><span> Alice hw_3 18 ## </span><span style='color: #BCBCBC;'> 4</span><span> Alice hw_4 20 ## </span><span style='color: #BCBCBC;'> 5</span><span> Alice exam_1 89 ## </span><span style='color: #BCBCBC;'> 6</span><span> Alice exam_2 95 ## </span><span style='color: #BCBCBC;'> 7</span><span> Bob hw_1 18 ## </span><span style='color: #BCBCBC;'> 
8</span><span> Bob hw_2 20 ## </span><span style='color: #BCBCBC;'> 9</span><span> Bob hw_3 18 ## </span><span style='color: #BCBCBC;'>10</span><span> Bob hw_4 16 ## </span><span style='color: #949494;'># … with 14 more rows</span><span> </span></CODE></PRE> --- ```r tidyr::pivot_longer(grades, cols = hw_1:exam_2, names_to = c("type", "id"), names_sep = "_", values_to = "score") ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 24 x 4</span><span> ## name type id score ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> Alice hw 1 19 ## </span><span style='color: #BCBCBC;'> 2</span><span> Alice hw 2 19 ## </span><span style='color: #BCBCBC;'> 3</span><span> Alice hw 3 18 ## </span><span style='color: #BCBCBC;'> 4</span><span> Alice hw 4 20 ## </span><span style='color: #BCBCBC;'> 5</span><span> Alice exam 1 89 ## </span><span style='color: #BCBCBC;'> 6</span><span> Alice exam 2 95 ## </span><span style='color: #BCBCBC;'> 7</span><span> Bob hw 1 18 ## </span><span style='color: #BCBCBC;'> 8</span><span> Bob hw 2 20 ## </span><span style='color: #BCBCBC;'> 9</span><span> Bob hw 3 18 ## </span><span style='color: #BCBCBC;'>10</span><span> Bob hw 4 16 ## </span><span style='color: #949494;'># … with 14 more rows</span><span> </span></CODE></PRE> --- ## Tidy approach? 
```r grades %>% tidyr::pivot_longer( cols = hw_1:exam_2, names_to = c("type", "id"), names_sep = "_", values_to = "score" ) %>% group_by(name, type) %>% summarize(total = sum(score)) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 8 x 3</span><span> ## </span><span style='color: #949494;'># Groups: name [4]</span><span> ## name type total ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> Alice exam 184 ## </span><span style='color: #BCBCBC;'>2</span><span> Alice hw 76 ## </span><span style='color: #BCBCBC;'>3</span><span> Bob exam 165 ## </span><span style='color: #BCBCBC;'>4</span><span> Bob hw 72 ## </span><span style='color: #BCBCBC;'>5</span><span> Carol exam 195 ## </span><span style='color: #BCBCBC;'>6</span><span> Carol hw 73 ## </span><span style='color: #BCBCBC;'>7</span><span> Dave exam 168 ## </span><span style='color: #BCBCBC;'>8</span><span> Dave hw 75 </span></CODE></PRE> --- ## Long -> Wide (`pivot_wider`) ```r grades %>% tidyr::pivot_longer( cols = hw_1:exam_2, names_to = c("type", "id"), names_sep = "_", values_to = "score" ) %>% group_by(name, type) %>% summarize(total = sum(score)) %>% tidyr::pivot_wider( names_from = type, values_from = total ) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 4 x 3</span><span> ## </span><span style='color: #949494;'># Groups: name [4]</span><span> ## name exam hw ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> Alice 184 76 ## </span><span style='color: #BCBCBC;'>2</span><span> 
Bob 165 72 ## </span><span style='color: #BCBCBC;'>3</span><span> Carol 195 73 ## </span><span style='color: #BCBCBC;'>4</span><span> Dave 168 75 </span></CODE></PRE> --- ## Finishing up ```r grades %>% tidyr::pivot_longer( cols = hw_1:exam_2, names_to = c("type", "id"), names_sep = "_", values_to = "score" ) %>% group_by(name, type) %>% summarize(total = sum(score)) %>% tidyr::pivot_wider( names_from = type, values_from = total ) %>% mutate( score = 0.6*(hw/80) + 0.4*(exam/200) ) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 4 x 4</span><span> ## </span><span style='color: #949494;'># Groups: name [4]</span><span> ## name exam hw score ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> Alice 184 76 0.938 ## </span><span style='color: #BCBCBC;'>2</span><span> Bob 165 72 0.87 ## </span><span style='color: #BCBCBC;'>3</span><span> Carol 195 73 0.938 ## </span><span style='color: #BCBCBC;'>4</span><span> Dave 168 75 0.899 </span></CODE></PRE> --- class: middle count: false # Functional Programming --- class: middle count: false # Apply functions --- ## Apply functions The apply functions are a collection of tools for functional programming in R; they are variations of the `map` function found in many other languages. ```r ??apply --- ## ## Help files with alias or concept or title matching ‘apply’ using fuzzy ## matching: ## ## base::apply Apply Functions Over Array Margins ## base::.subset Internal Objects in Package 'base' ## base::by Apply a Function to a Data Frame Split by Factors ## base::eapply Apply a Function Over Values in an Environment ## base::lapply Apply a Function over a List or Vector ## base::mapply Apply a 
Function to Multiple List or Vector Arguments ## base::rapply Recursively Apply a Function to a List ## base::tapply Apply a Function Over a Ragged Array ``` --- ## lapply Usage: `lapply(X, FUN, ...)` `lapply` returns a list of the same length as `X`, each element of which is the result of applying `FUN` to the corresponding element of `X`. <br/> .pull-left[ ```r lapply(1:8, sqrt) %>% str() ``` ``` ## List of 8 ## $ : num 1 ## $ : num 1.41 ## $ : num 1.73 ## $ : num 2 ## $ : num 2.24 ## $ : num 2.45 ## $ : num 2.65 ## $ : num 2.83 ``` ] .pull-right[ ```r lapply(1:8, function(x) (x+1)^2) %>% str() ``` ``` ## List of 8 ## $ : num 4 ## $ : num 9 ## $ : num 16 ## $ : num 25 ## $ : num 36 ## $ : num 49 ## $ : num 64 ## $ : num 81 ``` ] --- ```r lapply(1:8, function(x, pow) x^pow, pow=3) %>% str() ``` ``` ## List of 8 ## $ : num 1 ## $ : num 8 ## $ : num 27 ## $ : num 64 ## $ : num 125 ## $ : num 216 ## $ : num 343 ## $ : num 512 ``` ```r lapply(1:8, function(x, pow) x^pow, x=2) %>% str() ``` ``` ## List of 8 ## $ : num 2 ## $ : num 4 ## $ : num 8 ## $ : num 16 ## $ : num 32 ## $ : num 64 ## $ : num 128 ## $ : num 256 ``` --- ## sapply Usage: `sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)` `sapply` is a *user-friendly* wrapper around `lapply` that *simplifies* its result: whenever possible it returns a vector, matrix, or array instead of a list. 
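The following is a small base-R sketch of the simplification rules. `sapply()` returns a vector when every result has length 1. It returns a matrix when every result has the same length greater than 1. In every other case, including zero-length input, it keeps the list that `lapply()` would have produced. The variable names are illustrative only.

```r
# Every result has length 1 -> simplified to a plain vector
len1 <- sapply(1:3, function(x) x * 2)
class(len1)     # "numeric"

# Every result has length 2 -> simplified to a 2 x 3 matrix,
# with one column per element of the input
lenk <- sapply(1:3, function(x) c(x, x^2))
dim(lenk)       # 2 3

# Results have different lengths -> no common shape, so the list is kept
ragged <- sapply(1:3, seq_len)
class(ragged)   # "list"

# Zero-length input also returns an (empty) list, not a numeric vector
empty <- sapply(integer(0), function(x) x * 2)
class(empty)    # "list"

# simplify = FALSE turns sapply() back into lapply()
identical(sapply(1:3, seq_len, simplify = FALSE), lapply(1:3, seq_len))  # TRUE
```

Because the return type depends on the data, code that calls `sapply()` on input of unknown length may need extra checks.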
<br/> ```r sapply(1:8, sqrt) ``` ``` ## [1] 1.000000 1.414214 1.732051 2.000000 2.236068 2.449490 2.645751 2.828427 ``` ```r sapply(1:8, function(x) (x+1)^2) ``` ``` ## [1] 4 9 16 25 36 49 64 81 ``` --- ```r sapply(1:8, function(x) c(x, x^2, x^3, x^4)) ``` ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] ## [1,] 1 2 3 4 5 6 7 8 ## [2,] 1 4 9 16 25 36 49 64 ## [3,] 1 8 27 64 125 216 343 512 ## [4,] 1 16 81 256 625 1296 2401 4096 ``` ```r sapply(1:8, function(x) list(x, x^2, x^3, x^4)) ``` ``` ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] ## [1,] 1 2 3 4 5 6 7 8 ## [2,] 1 4 9 16 25 36 49 64 ## [3,] 1 8 27 64 125 216 343 512 ## [4,] 1 16 81 256 625 1296 2401 4096 ``` --- ```r sapply(2:6, seq) ``` ``` ## [[1]] ## [1] 1 2 ## ## [[2]] ## [1] 1 2 3 ## ## [[3]] ## [1] 1 2 3 4 ## ## [[4]] ## [1] 1 2 3 4 5 ## ## [[5]] ## [1] 1 2 3 4 5 6 ``` --- ## [ls]apply and data frames We can use these functions with data frames; the key is to remember that a data frame is just a fancy list whose elements are its columns. ```r df = data.frame(a = 1:6, b = letters[1:6], c = c(TRUE,FALSE), stringsAsFactors = TRUE) lapply(df, class) %>% str() ``` ``` ## List of 3 ## $ a: chr "integer" ## $ b: chr "factor" ## $ c: chr "logical" ``` ```r sapply(df, class) ``` ``` ## a b c ## "integer" "factor" "logical" ``` --- ## Other less common applies * `apply(X, MARGIN, FUN, ...)` - applies a function over the rows or columns of a data frame, matrix or array * `vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)` - is similar to `sapply`, but has an enforced return type and size * `mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)` - like `sapply`, but iterates over multiple vectors at the same time. * `rapply(object, f, classes = "ANY", deflt = NULL, how = c("unlist", "replace", "list"), ...)` - a recursive version of `lapply`, behavior depends largely on the `how` argument * `eapply(env, FUN, ..., all.names = FALSE, USE.NAMES = TRUE)` - apply a function over an environment. 
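Here is a minimal base-R sketch of three of the functions listed above, applied to toy inputs chosen for illustration. It shows `apply()` over each margin of a matrix. It also shows `vapply()` rejecting a result of the wrong type, where `sapply()` would silently change the output type. Last, `mapply()` walks two vectors in parallel.

```r
m <- matrix(1:6, nrow = 2)   # columns are (1, 2), (3, 4), (5, 6)

# apply(): MARGIN = 1 works row by row, MARGIN = 2 column by column
row_sums <- apply(m, 1, sum)   # 9 12
col_sums <- apply(m, 2, sum)   # 3 7 11

# vapply(): FUN.VALUE is a template for the type and length of every result
roots <- vapply(1:4, sqrt, FUN.VALUE = numeric(1))

# A result of the wrong type is an error rather than a silent type change
bad <- tryCatch(
  vapply(1:4, as.character, FUN.VALUE = numeric(1)),
  error = function(e) "type mismatch"
)

# Zero-length input still gives a vector of the declared type: numeric(0)
none <- vapply(integer(0), sqrt, FUN.VALUE = numeric(1))

# mapply(): iterates over several vectors in parallel, here x^y pairwise
powers <- mapply(function(x, y) x^y, 1:4, 4:1)   # 1 8 9 4
```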
---

## Exercise 1

Using the `sw_people` data set in the `repurrrsive` package, extract the names of all of the characters using:

* a for loop
* one of the apply functions

Start by examining the structure of the data using RStudio's viewer:

```r
library(repurrrsive)
View(sw_people)
```

---
class: middle

.center[
<img src="imgs/hex-purrr.png" width="50%" />
]

---

## Map functions

Basic functions for looping over an object and returning a value of a specific type - replacements for `lapply`/`sapply`/`vapply`.

* `map()` - returns a list.
* `map_lgl()` - returns a logical vector.
* `map_int()` - returns an integer vector.
* `map_dbl()` - returns a double vector.
* `map_chr()` - returns a character vector.
* `map_dfr()` - returns a data frame by row binding.
* `map_dfc()` - returns a data frame by column binding.
* `walk()` - returns its input invisibly; it is called exclusively for its side effects.

---

## Type Consistency

R is a weakly / dynamically typed language, which means there is no simple way to define a function that enforces its argument or return types. This flexibility can be useful at times, but often it makes it hard to reason about your code and requires more verbose code to handle edge cases.

```r
x = list(rnorm(1e3),rnorm(1e3),rnorm(1e3))
```

```r
map_dbl(x, mean)
```

```
## [1] -0.009105024 0.035028661 -0.027726877
```

```r
map_chr(x, mean)
```

```
## [1] "-0.009105" "0.035029" "-0.027727"
```

```r
map_int(x, mean)
```

```
## Error: Can't coerce element 1 from a double to a integer
```

---

## Shortcut - Anonymous Functions

An anonymous function is one that is never given a name (assigned to a variable).

```r
sapply(1:5, function(x) x^(x+1))
```

```
## [1] 1 8 81 1024 15625
```

purrr lets us write anonymous functions using one-sided formulas, where the argument is given by `.` or `.x` for `map` and related functions.
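purrr turns such a formula into an ordinary function. `as_mapper()` exposes that conversion, and to our understanding it is the same step `map()` applies to its `.f` argument. The sketch below uses it to check that `~ .x^(.x+1)` and `function(x) x^(x+1)` are interchangeable. It assumes purrr is installed. `pow_fn` and the `via_*` names are hypothetical and used only for illustration.

```r
library(purrr)

# as_mapper() converts a one-sided formula into a regular R function
pow_fn <- as_mapper(~ .x^(.x + 1))
pow_fn(2)   # 8, i.e. 2^3

# The formula, the converted function, and an ordinary anonymous function
# all give the same result when passed to map_dbl()
via_formula  <- map_dbl(1:5, ~ .x^(.x + 1))
via_mapper   <- map_dbl(1:5, pow_fn)
via_function <- map_dbl(1:5, function(x) x^(x + 1))
identical(via_formula, via_function)   # TRUE
```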
```r map_dbl(1:5, ~ .^(.+1)) ``` ``` ## [1] 1 8 81 1024 15625 ``` ```r map_dbl(1:5, ~ .x^(.x+1)) ``` ``` ## [1] 1 8 81 1024 15625 ``` --- ## Shortcut - Anonymous Functions - `map2` Functions with the `map2` prefix work the same as the `map` functions, but they iterate over two objects instead of one. In an anonymous function, the two arguments are given by `.x` and `.y` (or `..1` and `..2`), respectively. ```r map2_dbl(1:5, 1:5, ~ .x^(.y+1)) ``` ``` ## [1] 1 8 81 1024 15625 ``` ```r map2_dbl(1:5, 1:5, ~ ..1^(..2+1)) ``` ``` ## [1] 1 8 81 1024 15625 ``` ```r map2_chr(letters[1:5], LETTERS[1:5], paste0) ``` ``` ## [1] "aA" "bB" "cC" "dD" "eE" ``` --- ## Purrr shortcut - Lookups Very often we want to extract only certain (named) values from a list. `purrr` provides a shortcut for this operation: supply a character or numeric value (or a vector or list of them, to go deeper) instead of a function to apply. ```r x = list(list(a=1L,b=2L,c=list(d=3L,e=4L)), list(a=5L,b=6L,c=list(d=7L,e=8L,f=9L))) ``` -- .pull-left[ ```r map_int(x, "a") ``` ``` ## [1] 1 5 ``` ```r map_dbl(x, c("c","e")) ``` ``` ## [1] 4 8 ``` ```r map_chr(x, list(3,"d")) ``` ``` ## [1] "3" "7" ``` ] -- .pull-right[ ```r map_df(x, 3) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 2 x 3</span><span> ## d e f ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> 3 4 </span><span style='color: #BB0000;'>NA</span><span> ## </span><span style='color: #BCBCBC;'>2</span><span> 7 8 9 </span></CODE></PRE> ```r map_dfc(x, 3) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 1 x 5</span><span> ## d e d1 e1 f ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: 
italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> 3 4 7 8 9 </span></CODE></PRE> ] --- ```r x = list(list(a=1L,b=2L,c=list(d=3L,e=4L)), list(a=5L,b=6L,c=list(d=7L,e=8L,f=9L))) ``` ```r map(x, list(3,"f")) ``` ``` ## [[1]] ## NULL ## ## [[2]] ## [1] 9 ``` ```r map_int(x, list(3,"f")) ``` ``` ## Result 1 must be a single integer, not NULL of length 0 ``` ```r map_int(x, list(3,"f"), .default=NA) ``` ``` ## [1] NA 9 ``` --- ## Exercise 2 Using the `sw_people` data set again, generate a tidy data frame (tibble) containing as many details as possible. --- ## list columns ```r d = tibble( name = purrr::map_chr(sw_people, "name"), starships = purrr::map(sw_people, "starships") ) d ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 87 x 2</span><span> ## name starships ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><list></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> Luke Skywalker </span><span style='color: #949494;'><chr [2]></span><span> ## </span><span style='color: #BCBCBC;'> 2</span><span> C-3PO </span><span style='color: #949494;'><NULL></span><span> ## </span><span style='color: #BCBCBC;'> 3</span><span> R2-D2 </span><span style='color: #949494;'><NULL></span><span> ## </span><span style='color: #BCBCBC;'> 4</span><span> Darth Vader </span><span style='color: #949494;'><chr [1]></span><span> ## </span><span style='color: #BCBCBC;'> 5</span><span> Leia Organa </span><span style='color: #949494;'><NULL></span><span> ## </span><span style='color: #BCBCBC;'> 6</span><span> Owen Lars </span><span style='color: #949494;'><NULL></span><span> ## </span><span style='color: 
#BCBCBC;'> 7</span><span> Beru Whitesun lars </span><span style='color: #949494;'><NULL></span><span> ## </span><span style='color: #BCBCBC;'> 8</span><span> R5-D4 </span><span style='color: #949494;'><NULL></span><span> ## </span><span style='color: #BCBCBC;'> 9</span><span> Biggs Darklighter </span><span style='color: #949494;'><chr [1]></span><span> ## </span><span style='color: #BCBCBC;'>10</span><span> Obi-Wan Kenobi </span><span style='color: #949494;'><chr [5]></span><span> ## </span><span style='color: #949494;'># … with 77 more rows</span><span> </span></CODE></PRE> --- ```r d %>% mutate( n_starships = purrr::map_int(starships, length) ) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 87 x 3</span><span> ## name starships n_starships ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><list></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> Luke Skywalker </span><span style='color: #949494;'><chr [2]></span><span> 2 ## </span><span style='color: #BCBCBC;'> 2</span><span> C-3PO </span><span style='color: #949494;'><NULL></span><span> 0 ## </span><span style='color: #BCBCBC;'> 3</span><span> R2-D2 </span><span style='color: #949494;'><NULL></span><span> 0 ## </span><span style='color: #BCBCBC;'> 4</span><span> Darth Vader </span><span style='color: #949494;'><chr [1]></span><span> 1 ## </span><span style='color: #BCBCBC;'> 5</span><span> Leia Organa </span><span style='color: #949494;'><NULL></span><span> 0 ## </span><span style='color: #BCBCBC;'> 6</span><span> Owen Lars </span><span style='color: #949494;'><NULL></span><span> 0 ## </span><span style='color: #BCBCBC;'> 7</span><span> Beru Whitesun lars </span><span style='color: #949494;'><NULL></span><span> 0 ## </span><span style='color: #BCBCBC;'> 8</span><span> R5-D4 </span><span 
style='color: #949494;'><NULL></span><span> 0 ## </span><span style='color: #BCBCBC;'> 9</span><span> Biggs Darklighter </span><span style='color: #949494;'><chr [1]></span><span> 1 ## </span><span style='color: #BCBCBC;'>10</span><span> Obi-Wan Kenobi </span><span style='color: #949494;'><chr [5]></span><span> 5 ## </span><span style='color: #949494;'># … with 77 more rows</span><span> </span></CODE></PRE> --- class: middle .center[ <img src="imgs/hex-tidyr.png" width="25%" /> and <img src="imgs/hex-purrr.png" width="25%" /> ] --- ## Tidy data from nested lists Recent versions of `tidyr` (v1.0.0 and later) have added several functions designed to aid in tidying hierarchical data. Since they are part of `tidyr`, all of the following functions work with data frames. From `tidyr`: > `hoist()`, `unnest_longer()`, and `unnest_wider()` provide tools for rectangling, collapsing deeply nested lists into regular columns. --- ```r (d = tibble(people=sw_people)) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 87 x 1</span><span> ## people ## </span><span style='color: #949494;font-style: italic;'><list></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='color: #949494;'><named list [16]></span><span> ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='color: #949494;'><named list [14]></span><span> ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='color: #949494;'><named list [14]></span><span> ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='color: #949494;'><named list [15]></span><span> ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='color: #949494;'><named list [15]></span><span> ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='color: #949494;'><named list [14]></span><span> ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='color: 
#949494;'><named list [14]></span><span> ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='color: #949494;'><named list [14]></span><span> ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='color: #949494;'><named list [15]></span><span> ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='color: #949494;'><named list [16]></span><span> ## </span><span style='color: #949494;'># … with 77 more rows</span><span> </span></CODE></PRE> -- ```r unnest_wider(d, people) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 87 x 16</span><span> ## name height mass hair_color skin_color eye_color birth_year gender ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> Luke… 172 77 blond fair blue 19BBY male ## </span><span style='color: #BCBCBC;'> 2</span><span> C-3PO 167 75 n/a gold yellow 112BBY n/a ## </span><span style='color: #BCBCBC;'> 3</span><span> R2-D2 96 32 n/a white, bl… red 33BBY n/a ## </span><span style='color: #BCBCBC;'> 4</span><span> Dart… 202 136 none white yellow 41.9BBY male ## </span><span style='color: #BCBCBC;'> 5</span><span> Leia… 150 49 brown light brown 19BBY female ## </span><span style='color: #BCBCBC;'> 6</span><span> Owen… 178 120 brown, gr… light blue 52BBY male ## </span><span style='color: #BCBCBC;'> 7</span><span> Beru… 165 75 brown light 
blue 47BBY female ## </span><span style='color: #BCBCBC;'> 8</span><span> R5-D4 97 32 n/a white, red red unknown n/a ## </span><span style='color: #BCBCBC;'> 9</span><span> Bigg… 183 84 black light brown 24BBY male ## </span><span style='color: #BCBCBC;'>10</span><span> Obi-… 182 77 auburn, w… fair blue-gray 57BBY male ## </span><span style='color: #949494;'># … with 77 more rows, and 8 more variables: homeworld </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, films </span><span style='color: #949494;font-style: italic;'><list></span><span style='color: #949494;'>, ## # species </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, vehicles </span><span style='color: #949494;font-style: italic;'><list></span><span style='color: #949494;'>, starships </span><span style='color: #949494;font-style: italic;'><list></span><span style='color: #949494;'>, created </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, ## # edited </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, url </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span></CODE></PRE> --- ```r unnest_longer(d, people) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 1,244 x 2</span><span> ## people people_id ## </span><span style='color: #949494;font-style: italic;'><list></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='color: #949494;'><chr [1]></span><span> name ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='color: #949494;'><chr [1]></span><span> height ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='color: #949494;'><chr [1]></span><span> mass ## </span><span style='color: 
#BCBCBC;'> 4</span><span> </span><span style='color: #949494;'><chr [1]></span><span> hair_color ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='color: #949494;'><chr [1]></span><span> skin_color ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='color: #949494;'><chr [1]></span><span> eye_color ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='color: #949494;'><chr [1]></span><span> birth_year ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='color: #949494;'><chr [1]></span><span> gender ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='color: #949494;'><chr [1]></span><span> homeworld ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='color: #949494;'><chr [5]></span><span> films ## </span><span style='color: #949494;'># … with 1,234 more rows</span><span> </span></CODE></PRE> --- ```r unnest_wider(d, people) %>% select(name, starships) %>% unnest_longer(starships) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 98 x 2</span><span> ## name starships ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> Luke Skywalker http://swapi.co/api/starships/12/ ## </span><span style='color: #BCBCBC;'> 2</span><span> Luke Skywalker http://swapi.co/api/starships/22/ ## </span><span style='color: #BCBCBC;'> 3</span><span> C-3PO </span><span style='color: #BB0000;'>NA</span><span> ## </span><span style='color: #BCBCBC;'> 4</span><span> R2-D2 </span><span style='color: #BB0000;'>NA</span><span> ## </span><span style='color: #BCBCBC;'> 5</span><span> Darth Vader http://swapi.co/api/starships/13/ ## </span><span style='color: #BCBCBC;'> 6</span><span> Leia Organa </span><span style='color: #BB0000;'>NA</span><span> ## </span><span 
style='color: #BCBCBC;'> 7</span><span> Owen Lars </span><span style='color: #BB0000;'>NA</span><span> ## </span><span style='color: #BCBCBC;'> 8</span><span> Beru Whitesun lars </span><span style='color: #BB0000;'>NA</span><span> ## </span><span style='color: #BCBCBC;'> 9</span><span> R5-D4 </span><span style='color: #BB0000;'>NA</span><span> ## </span><span style='color: #BCBCBC;'>10</span><span> Biggs Darklighter http://swapi.co/api/starships/12/ ## </span><span style='color: #949494;'># … with 88 more rows</span><span> </span></CODE></PRE> --- ```r tibble(people = sw_people) %>% hoist( people, name = "name", height = "height", mass = "mass", primary_starship = list("starships", 1) ) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 87 x 5</span><span> ## name height mass primary_starship people ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><list></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> Luke Skywalker 172 77 http://swapi.co/api/starships… </span><span style='color: #949494;'><named list [1</span><span>… ## </span><span style='color: #BCBCBC;'> 2</span><span> C-3PO 167 75 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='color: #949494;'><named list [1</span><span>… ## </span><span style='color: #BCBCBC;'> 3</span><span> R2-D2 96 32 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='color: #949494;'><named list [1</span><span>… ## </span><span style='color: #BCBCBC;'> 4</span><span> Darth Vader 202 136 http://swapi.co/api/starships… </span><span style='color: #949494;'><named list [1</span><span>… ## </span><span style='color: #BCBCBC;'> 
5</span><span> Leia Organa 150 49 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='color: #949494;'><named list [1</span><span>… ## </span><span style='color: #BCBCBC;'> 6</span><span> Owen Lars 178 120 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='color: #949494;'><named list [1</span><span>… ## </span><span style='color: #BCBCBC;'> 7</span><span> Beru Whitesun la… 165 75 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='color: #949494;'><named list [1</span><span>… ## </span><span style='color: #BCBCBC;'> 8</span><span> R5-D4 97 32 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='color: #949494;'><named list [1</span><span>… ## </span><span style='color: #BCBCBC;'> 9</span><span> Biggs Darklighter 183 84 http://swapi.co/api/starships… </span><span style='color: #949494;'><named list [1</span><span>… ## </span><span style='color: #BCBCBC;'>10</span><span> Obi-Wan Kenobi 182 77 http://swapi.co/api/starships… </span><span style='color: #949494;'><named list [1</span><span>… ## </span><span style='color: #949494;'># … with 77 more rows</span><span> </span></CODE></PRE> --- class: middle count: false # Acknowledgments --- ## Acknowledgments Above materials are derived in part from the following sources: * Hadley Wickham - [Adv-R Functionals](http://adv-r.had.co.nz/Functionals.html) * Hadley Wickham - [R for Data Science](http://r4ds.had.co.nz/) * Neil Saunders - [A brief introduction to "apply" in R](http://nsaunders.wordpress.com/2010/08/20/a-brief-introduction-to-apply-in-r/) * Jenny Bryan - [Purrr Tutorial](https://jennybc.github.io/purrr-tutorial/) * [R Language Definition](http://stat.ethz.ch/R-manual/R-devel/doc/manual/R-lang.html)