class: center, middle, inverse, title-slide # Tidy Data, dplyr, and tidyr ### Colin Rundel ### 2019-09-30 --- exclude: true --- class: middle count: false # Functions --- ## Function Parts The two parts of a function are the arguments (`formals`) and the code (`body`). ```r gcd = function(long1, lat1, long2, lat2) { R = 6371 # Earth mean radius in km # distance in km acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1)) * R } ``` -- .pull-left[ ```r formals(gcd) ``` ``` ## $long1 ## ## ## $lat1 ## ## ## $long2 ## ## ## $lat2 ``` ] .pull-right[ ```r body(gcd) ``` ``` ## { ## R = 6371 ## acos(sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(long2 - ## long1)) * R ## } ``` ] --- ## Return values There are two ways of returning values in R: explicitly or implicitly. <br/> *Explicit* - includes one or more `return` statements ```r f = function(x) { return(x*x) } ``` <br/> *Implicit* - value of the last statement is returned. ```r f = function(x) { x*x } ``` --- ## Argument names When defining a function we are also implicitly defining names for the arguments, when calling the function we can use these names to pass arguments in a different order. ```r f = function(x,y,z) { paste0("x=",x," y=",y," z=",z) } ``` .pull-left[ ```r f(1,2,3) ``` ``` ## [1] "x=1 y=2 z=3" ``` ```r f(z=1,x=2,y=3) ``` ``` ## [1] "x=2 y=3 z=1" ``` ] .pull-right[ ```r f(y=2,1,3) ``` ``` ## [1] "x=1 y=2 z=3" ``` ```r f(y=2,1,x=3) ``` ``` ## [1] "x=3 y=2 z=1" ``` ] ```r f(1,2,3,m=1) ``` ``` ## Error in f(1, 2, 3, m = 1): unused argument (m = 1) ``` --- ## Argument defaults It is also possible to give function arguments default values so that they don't need to be provided every time the function is called. ```r f = function(x,y=1,z=1) { paste0("x=",x," y=",y," z=",z) } ``` ```r f() ``` ``` ## Error in paste0("x=", x, " y=", y, " z=", z): argument "x" is missing, with no default ``` ```r f(x=3) ``` ``` ## [1] "x=3 y=1 z=1" ``` ```r f(y=2,2) ``` ``` ## [1] "x=2 y=2 z=1" ``` --- ## Return values Many of the built in functions in R will return a value, even if you haven't noticed that this is the case. This can be particularly problematic if you are using implicit return values, since you might be returning something you didn't expect. Some examples, .pull-left[ ```r x = y = 5 ``` ```r x ``` ``` ## [1] 5 ``` ```r y ``` ``` ## [1] 5 ``` ] .pull-right[ ```r z = if (rnorm(1) > 0) { "pos" } else { "neg" } ``` ```r z ``` ``` ## [1] "neg" ``` ] -- <br/> .pull-left[ ```r y = 5 ``` ] .pull-right[ ```r if (rnorm(1) > 0) { "pos" } else { "neg" } ``` ``` ## [1] "pos" ``` ] --- ## More oddness ```r r = rnorm(1) ``` .pull-left[ ```r if (r > 0) { print("pos") } else { print("neg") } ``` ``` ## [1] "neg" ``` ] .pull-right[ ```r z = if (r > 0) { print("pos") } else { print("neg") } ``` ``` ## [1] "neg" ``` ```r z ``` ``` ## [1] "neg" ``` ] -- <br/> ```r typeof(print("ABC")) ``` ``` ## [1] "ABC" ``` ``` ## [1] "character" ``` ```r z = typeof(print("ABC")) ``` ``` ## [1] "ABC" ``` ```r z ``` ``` ## [1] "character" ``` --- ## Invisible values .pull-left[ ```r f = function(x) { invisible(x) } ``` ] .pull-right[ ```r g = function(x) { x } ``` ] -- <br/> .pull-left[ ```r f(1) ``` ] .pull-right[ ```r g(1) ``` ``` ## [1] 1 ``` ] -- <br/> .pull-left[ ```r x = f(1) x ``` ``` ## [1] 1 ``` ] .pull-right[ ```r y = g(1) y ``` ``` ## [1] 1 ``` ] --- ## Even Operators are functions .pull-left[ ```r `+` ``` ``` ## function (e1, e2) .Primitive("+") ``` ```r typeof(`+`) ``` ``` ## [1] "builtin" ``` ```r `+`(4:1,2) ``` ``` ## [1] 6 5 4 3 ``` ```r 4:1 + 2 ``` ``` ## [1] 6 5 4 3 ``` ] .pull-right[ ```r `|` ``` ``` ## function (e1, e2) .Primitive("|") ``` ```r typeof(`|`) ``` ``` ## [1] "builtin" ``` ```r `|`(TRUE,FALSE) ``` ``` ## [1] TRUE ``` ```r TRUE | FALSE ``` ``` ## [1] TRUE ``` ] -- <br/> ```r `$` ``` ``` ## .Primitive("$") ``` ```r `[[` ``` ``` ## .Primitive("[[") ``` ```r `names<-` ``` ``` ## function (x, value) .Primitive("names<-") ``` --- class: middle count: false .center[ <img src="imgs/hex-tidyverse.png" width="50%" /> ] --- ## Tidy data <img src="imgs/tidy.png" width="100%" /> * One variable per column * One observation per row * Each type of observational unit forms a table .footnote[ From R4DS - [tidy data](r4ds.had.co.nz/tidy-data.html) ] --- class: middle count: false .center[ <img src="imgs/hex-tibble.png" width="50%" /> ] --- ## Modern data frames Hadley Wickham / RStudio have a package that modifies data frames to be more modern, or as he calls them surly and lazy. ```r library(tibble) class(iris) ``` ``` ## [1] "data.frame" ``` ```r tbl_iris = as_tibble(iris) class(tbl_iris) ``` ``` ## [1] "tbl_df" "tbl" "data.frame" ``` --- ## Fancy Printing .font_small[ ```r tbl_iris ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 150 x 5</span><span> ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><fct></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> 5.1 3.5 1.4 0.2 setosa ## </span><span style='color: #BCBCBC;'> 2</span><span> 4.9 3 1.4 0.2 setosa ## </span><span style='color: #BCBCBC;'> 3</span><span> 4.7 3.2 1.3 0.2 setosa ## </span><span style='color: #BCBCBC;'> 4</span><span> 4.6 3.1 1.5 0.2 setosa ## </span><span style='color: #BCBCBC;'> 5</span><span> 5 3.6 1.4 0.2 setosa ## </span><span style='color: #BCBCBC;'> 6</span><span> 5.4 3.9 1.7 0.4 setosa ## </span><span style='color: #BCBCBC;'> 7</span><span> 4.6 3.4 1.4 0.3 setosa ## </span><span style='color: #BCBCBC;'> 8</span><span> 5 3.4 1.5 0.2 setosa ## </span><span style='color: #BCBCBC;'> 9</span><span> 4.4 2.9 1.4 0.2 setosa ## </span><span style='color: #BCBCBC;'>10</span><span> 4.9 3.1 1.5 0.1 setosa ## </span><span style='color: #949494;'># … with 140 more rows</span><span> </span></CODE></PRE> ```r iris ``` ``` ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3.0 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5.0 3.6 1.4 0.2 setosa ## 6 5.4 3.9 1.7 0.4 setosa ## 7 4.6 3.4 1.4 0.3 setosa ## 8 5.0 3.4 1.5 0.2 setosa ## 9 4.4 2.9 1.4 0.2 setosa ## 10 4.9 3.1 1.5 0.1 setosa ## 11 5.4 3.7 1.5 0.2 setosa ## 12 4.8 3.4 1.6 0.2 setosa ## 13 4.8 3.0 1.4 0.1 setosa ## 14 4.3 3.0 1.1 0.1 setosa ## 15 5.8 4.0 1.2 0.2 setosa ## 16 5.7 4.4 1.5 0.4 setosa ## 17 5.4 3.9 1.3 0.4 setosa ## 18 5.1 3.5 1.4 0.3 setosa ## 19 5.7 3.8 1.7 0.3 setosa ## 20 5.1 3.8 1.5 0.3 setosa ## 21 5.4 3.4 1.7 0.2 setosa ## 22 5.1 3.7 1.5 0.4 setosa ## 23 4.6 3.6 1.0 0.2 setosa ## 24 5.1 3.3 1.7 0.5 setosa ## 25 4.8 3.4 1.9 0.2 setosa ## 26 5.0 3.0 1.6 0.2 setosa ## 27 5.0 3.4 1.6 0.4 setosa ## 28 5.2 3.5 1.5 0.2 setosa ## 29 5.2 3.4 1.4 0.2 setosa ## 30 4.7 3.2 1.6 0.2 setosa ## 31 4.8 3.1 1.6 0.2 setosa ## 32 5.4 3.4 1.5 0.4 setosa ## 33 5.2 4.1 1.5 0.1 setosa ## 34 5.5 4.2 1.4 0.2 setosa ## 35 4.9 3.1 1.5 0.2 setosa ## 36 5.0 3.2 1.2 0.2 setosa ## 37 5.5 3.5 1.3 0.2 setosa ## 38 4.9 3.6 1.4 0.1 setosa ## 39 4.4 3.0 1.3 0.2 setosa ## 40 5.1 3.4 1.5 0.2 setosa ## 41 5.0 3.5 1.3 0.3 setosa ## 42 4.5 2.3 1.3 0.3 setosa ## 43 4.4 3.2 1.3 0.2 setosa ## 44 5.0 3.5 1.6 0.6 setosa ## 45 5.1 3.8 1.9 0.4 setosa ## 46 4.8 3.0 1.4 0.3 setosa ## 47 5.1 3.8 1.6 0.2 setosa ## 48 4.6 3.2 1.4 0.2 setosa ## 49 5.3 3.7 1.5 0.2 setosa ## 50 5.0 3.3 1.4 0.2 setosa ## 51 7.0 3.2 4.7 1.4 versicolor ## 52 6.4 3.2 4.5 1.5 versicolor ## 53 6.9 3.1 4.9 1.5 versicolor ## 54 5.5 2.3 4.0 1.3 versicolor ## 55 6.5 2.8 4.6 1.5 versicolor ## 56 5.7 2.8 4.5 1.3 versicolor ## 57 6.3 3.3 4.7 1.6 versicolor ## 58 4.9 2.4 3.3 1.0 versicolor ## 59 6.6 2.9 4.6 1.3 versicolor ## 60 5.2 2.7 3.9 1.4 versicolor ## 61 5.0 2.0 3.5 1.0 versicolor ## 62 5.9 3.0 4.2 1.5 versicolor ## 63 6.0 2.2 4.0 1.0 versicolor ## 64 6.1 2.9 4.7 1.4 versicolor ## 65 5.6 2.9 3.6 1.3 versicolor ## 66 6.7 3.1 4.4 1.4 versicolor ## 67 5.6 3.0 4.5 1.5 versicolor ## 68 5.8 2.7 4.1 1.0 versicolor ## 69 6.2 2.2 4.5 1.5 versicolor ## 70 5.6 2.5 3.9 1.1 versicolor ## 71 5.9 3.2 4.8 1.8 versicolor ## 72 6.1 2.8 4.0 1.3 versicolor ## 73 6.3 2.5 4.9 1.5 versicolor ## 74 6.1 2.8 4.7 1.2 versicolor ## 75 6.4 2.9 4.3 1.3 versicolor ## 76 6.6 3.0 4.4 1.4 versicolor ## 77 6.8 2.8 4.8 1.4 versicolor ## 78 6.7 3.0 5.0 1.7 versicolor ## 79 6.0 2.9 4.5 1.5 versicolor ## 80 5.7 2.6 3.5 1.0 versicolor ## 81 5.5 2.4 3.8 1.1 versicolor ## 82 5.5 2.4 3.7 1.0 versicolor ## 83 5.8 2.7 3.9 1.2 versicolor ## 84 6.0 2.7 5.1 1.6 versicolor ## 85 5.4 3.0 4.5 1.5 versicolor ## 86 6.0 3.4 4.5 1.6 versicolor ## 87 6.7 3.1 4.7 1.5 versicolor ## 88 6.3 2.3 4.4 1.3 versicolor ## 89 5.6 3.0 4.1 1.3 versicolor ## 90 5.5 2.5 4.0 1.3 versicolor ## 91 5.5 2.6 4.4 1.2 versicolor ## 92 6.1 3.0 4.6 1.4 versicolor ## 93 5.8 2.6 4.0 1.2 versicolor ## 94 5.0 2.3 3.3 1.0 versicolor ## 95 5.6 2.7 4.2 1.3 versicolor ## 96 5.7 3.0 4.2 1.2 versicolor ## 97 5.7 2.9 4.2 1.3 versicolor ## 98 6.2 2.9 4.3 1.3 versicolor ## 99 5.1 2.5 3.0 1.1 versicolor ## 100 5.7 2.8 4.1 1.3 versicolor ## 101 6.3 3.3 6.0 2.5 virginica ## 102 5.8 2.7 5.1 1.9 virginica ## 103 7.1 3.0 5.9 2.1 virginica ## 104 6.3 2.9 5.6 1.8 virginica ## 105 6.5 3.0 5.8 2.2 virginica ## 106 7.6 3.0 6.6 2.1 virginica ## 107 4.9 2.5 4.5 1.7 virginica ## 108 7.3 2.9 6.3 1.8 virginica ## 109 6.7 2.5 5.8 1.8 virginica ## 110 7.2 3.6 6.1 2.5 virginica ## 111 6.5 3.2 5.1 2.0 virginica ## 112 6.4 2.7 5.3 1.9 virginica ## 113 6.8 3.0 5.5 2.1 virginica ## 114 5.7 2.5 5.0 2.0 virginica ## 115 5.8 2.8 5.1 2.4 virginica ## 116 6.4 3.2 5.3 2.3 virginica ## 117 6.5 3.0 5.5 1.8 virginica ## 118 7.7 3.8 6.7 2.2 virginica ## 119 7.7 2.6 6.9 2.3 virginica ## 120 6.0 2.2 5.0 1.5 virginica ## 121 6.9 3.2 5.7 2.3 virginica ## 122 5.6 2.8 4.9 2.0 virginica ## 123 7.7 2.8 6.7 2.0 virginica ## 124 6.3 2.7 4.9 1.8 virginica ## 125 6.7 3.3 5.7 2.1 virginica ## 126 7.2 3.2 6.0 1.8 virginica ## 127 6.2 2.8 4.8 1.8 virginica ## 128 6.1 3.0 4.9 1.8 virginica ## 129 6.4 2.8 5.6 2.1 virginica ## 130 7.2 3.0 5.8 1.6 virginica ## 131 7.4 2.8 6.1 1.9 virginica ## 132 7.9 3.8 6.4 2.0 virginica ## 133 6.4 2.8 5.6 2.2 virginica ## 134 6.3 2.8 5.1 1.5 virginica ## 135 6.1 2.6 5.6 1.4 virginica ## 136 7.7 3.0 6.1 2.3 virginica ## 137 6.3 3.4 5.6 2.4 virginica ## 138 6.4 3.1 5.5 1.8 virginica ## 139 6.0 3.0 4.8 1.8 virginica ## 140 6.9 3.1 5.4 2.1 virginica ## 141 6.7 3.1 5.6 2.4 virginica ## 142 6.9 3.1 5.1 2.3 virginica ## 143 5.8 2.7 5.1 1.9 virginica ## 144 6.8 3.2 5.9 2.3 virginica ## 145 6.7 3.3 5.7 2.5 virginica ## 146 6.7 3.0 5.2 2.3 virginica ## 147 6.3 2.5 5.0 1.9 virginica ## 148 6.5 3.0 5.2 2.0 virginica ## 149 6.2 3.4 5.4 2.3 virginica ## 150 5.9 3.0 5.1 1.8 virginica ``` ] --- .font_small[ ```r df = data.frame(x = rnorm(10,sd=5), y = rnorm(10), z = runif(10)) ``` ```r as_tibble(df) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 10 x 3</span><span> ## x y z ## </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> -</span><span style='color: #BB0000;'>1.58</span><span> 0.301 0.575 ## </span><span style='color: #BCBCBC;'> 2</span><span> 2.79 0.380 0.590 ## </span><span style='color: #BCBCBC;'> 3</span><span> -</span><span style='color: #BB0000;'>2.53</span><span> -</span><span style='color: #BB0000;'>1.04</span><span> 0.525 ## </span><span style='color: #BCBCBC;'> 4</span><span> -</span><span style='color: #BB0000;'>2.64</span><span> -</span><span style='color: #BB0000;'>1.12</span><span> 0.835 ## </span><span style='color: #BCBCBC;'> 5</span><span> -</span><span style='color: #BB0000;'>3.20</span><span> -</span><span style='color: #BB0000;'>0.324</span><span> 0.673 ## </span><span style='color: #BCBCBC;'> 6</span><span> -</span><span style='color: #BB0000;'>3.77</span><span> -</span><span style='color: #BB0000;'>0.344</span><span> 0.823 ## </span><span style='color: #BCBCBC;'> 7</span><span> -</span><span style='color: #BB0000;'>6.88</span><span> -</span><span style='color: #BB0000;'>0.154</span><span> 0.710 ## </span><span style='color: #BCBCBC;'> 8</span><span> -</span><span style='color: #BB0000;'>1.24</span><span> 0.657 0.642 ## </span><span style='color: #BCBCBC;'> 9</span><span> 9.01 0.437 0.056</span><span style='text-decoration: underline;'>2</span><span> ## </span><span style='color: #BCBCBC;'>10</span><span> 3.01 1.60 0.417 </span></CODE></PRE> ```r df ``` ``` ## x y z ## 1 -1.581470 0.3008048 0.57475339 ## 2 2.789190 0.3803478 0.58980821 ## 3 -2.526325 -1.0411151 0.52470878 ## 4 -2.644518 -1.1245520 0.83510024 ## 5 -3.196826 -0.3237571 0.67307743 ## 6 -3.771731 -0.3442898 0.82257162 ## 7 -6.875631 -0.1535736 0.70978067 ## 8 -1.244910 0.6570045 0.64154025 ## 9 9.011300 0.4373594 0.05624825 ## 10 3.005200 1.5967680 0.41691734 ``` ] --- ## Tibbles are lazy ```r tbl_iris[1,] ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 1 x 5</span><span> ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><fct></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> 5.1 3.5 1.4 0.2 setosa </span></CODE></PRE> -- .pull-left[ ```r tbl_iris[,"Species"] ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 150 x 1</span><span> ## Species ## </span><span style='color: #949494;font-style: italic;'><fct></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> setosa ## </span><span style='color: #BCBCBC;'> 2</span><span> setosa ## </span><span style='color: #BCBCBC;'> 3</span><span> setosa ## </span><span style='color: #BCBCBC;'> 4</span><span> setosa ## </span><span style='color: #BCBCBC;'> 5</span><span> setosa ## </span><span style='color: #BCBCBC;'> 6</span><span> setosa ## </span><span style='color: #BCBCBC;'> 7</span><span> setosa ## </span><span style='color: #BCBCBC;'> 8</span><span> setosa ## </span><span style='color: #BCBCBC;'> 9</span><span> setosa ## </span><span style='color: #BCBCBC;'>10</span><span> setosa ## </span><span style='color: #949494;'># … with 140 more rows</span><span> </span></CODE></PRE> ] -- .pull-right[ ```r tibble( x = 1:3, y = c("A","B","C") ) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 3 x 2</span><span> ## x y ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> 1 A ## </span><span style='color: #BCBCBC;'>2</span><span> 2 B ## </span><span style='color: #BCBCBC;'>3</span><span> 3 C </span></CODE></PRE> ] --- ## More laziness .pull-left[ ```r head( tbl_iris[1] ) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 6 x 1</span><span> ## Sepal.Length ## </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> 5.1 ## </span><span style='color: #BCBCBC;'>2</span><span> 4.9 ## </span><span style='color: #BCBCBC;'>3</span><span> 4.7 ## </span><span style='color: #BCBCBC;'>4</span><span> 4.6 ## </span><span style='color: #BCBCBC;'>5</span><span> 5 ## </span><span style='color: #BCBCBC;'>6</span><span> 5.4 </span></CODE></PRE> ] -- .pull-right[ ```r head( tbl_iris[[1]] ) ``` ``` ## [1] 5.1 4.9 4.7 4.6 5.0 5.4 ``` ] -- <br/> .font_small[ ```r head( iris$Sp ) ``` ``` ## [1] setosa setosa setosa setosa setosa setosa ## Levels: setosa versicolor virginica ``` ```r tbl_iris$Sp ``` ``` ## Warning: Unknown or uninitialised column: 'Sp'. ``` ``` ## NULL ``` ```r head( tbl_iris$Species ) ``` ``` ## [1] setosa setosa setosa setosa setosa setosa ## Levels: setosa versicolor virginica ``` ] --- ## Tibbles and length coercion ```r tibble(x = 1:4, y = 1) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 4 x 2</span><span> ## x y ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> 1 1 ## </span><span style='color: #BCBCBC;'>2</span><span> 2 1 ## </span><span style='color: #BCBCBC;'>3</span><span> 3 1 ## </span><span style='color: #BCBCBC;'>4</span><span> 4 1 </span></CODE></PRE> -- ```r tibble(x = 1:4, y = 1:2) ``` ``` ## Tibble columns must have consistent lengths, only values of length one are recycled: ## * Length 2: Column `y` ## * Length 4: Column `x` ``` -- ```r tibble(x = 1:4, y = 1:3) ``` ``` ## Tibble columns must have consistent lengths, only values of length one are recycled: ## * Length 3: Column `y` ## * Length 4: Column `x` ``` --- ## Tibbles and S3 ```r d = tibble( x = 1:3, y = c("A","B","C") ) class(d) ``` ``` ## [1] "tbl_df" "tbl" "data.frame" ``` -- <br/> ```r class(d) = rev(class(d)) class(d) ``` ``` ## [1] "data.frame" "tbl" "tbl_df" ``` ```r d ``` ``` ## x y ## 1 1 A ## 2 2 B ## 3 3 C ``` --- class: middle count: false .center[ <img src="imgs/hex-magrittr.jpg" width="50%" /> # magrittr ] --- ## Pipes in R You can think about the following sequence of actions - find key, unlock car, start car, drive to school, park. <br/> Expressed as a set of nested functions in R pseudocode this would look like: ```r park(drive(start_car(find("keys")), to="campus")) ``` <br/> Writing it out using pipes give it a more natural (and easier to read) structure: ```r find("keys") %>% start_car() %>% drive(to="campus") %>% park() ``` --- ## Approaches All of the following are fine, it comes down to personal preference: <br/> Nested: ```r h( g( f(x), y=1), z=1 ) ``` <br/> Piped: ```r f(x) %>% g(y=1) %>% h(z=1) ``` <br/> Intermediate: ```r res = f(x) res = g(res, y=1) res = h(res, z=1) ``` --- ## What about other arguments? Sometimes we want to send our results to an function argument other than first one or we want to use the previous result for multiple arguments. In these cases we can refer to the previous result using `.`. -- ```r data.frame(a = 1:3, b = 3:1) %>% lm(a~b, data=.) ``` ``` ## ## Call: ## lm(formula = a ~ b, data = .) ## ## Coefficients: ## (Intercept) b ## 4 -1 ``` -- ```r data.frame(a = 1:3, b = 3:1) %>% .[[1]] ``` ``` ## [1] 1 2 3 ``` -- ```r data.frame(a = 1:3, b = 3:1) %>% .[[length(.)]] ``` ``` ## [1] 3 2 1 ``` --- class: middle .center[ <img src="imgs/hex-dplyr.png" width="50%" /> ] --- ## A Grammar of Data Manipulation dplyr is based on the concepts of functions as verbs that manipulate data frames. Single data frame functions / verbs: * `filter()` / `slice()`: pick rows based on criteria * `select()` / `rename()`: select columns by name * `pull()`: grab a column as a vector * `arrange()`: reorder rows * `mutate()` / `transmute()`: add new variables * `distinct()`: filter for unique rows * `sample_n()` / `sample_frac()`: randomly sample rows * `summarise()` / `count()`: reduce variables to values * `group_by()` / `ungroup()`: modify other verbs to act on subsets * ... (many more) --- ## dplyr rules 1. First argument is *always* a data frame 2. Subsequent arguments say what to do with that data frame 3. *Always* return a data frame 4. Don't modify in place 5. Lazy evaluation magic --- ## Example Data We will demonstrate dplyr's functionality using the nycflights13 data. ```r library(dplyr) library(nycflights13) flights ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 19</span><span> ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 517 515 2 830 819 ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 533 529 4 850 830 ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 542 540 2 923 850 ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 544 545 -</span><span style='color: #BB0000;'>1</span><span> </span><span style='text-decoration: underline;'>1</span><span>004 </span><span style='text-decoration: underline;'>1</span><span>022 ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 554 600 -</span><span style='color: #BB0000;'>6</span><span> 812 837 ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 554 558 -</span><span style='color: #BB0000;'>4</span><span> 740 728 ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 555 600 -</span><span style='color: #BB0000;'>5</span><span> 913 854 ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 557 600 -</span><span style='color: #BB0000;'>3</span><span> 709 723 ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 557 600 -</span><span style='color: #BB0000;'>3</span><span> 838 846 ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 558 600 -</span><span style='color: #BB0000;'>2</span><span> 753 745 ## </span><span style='color: #949494;'># … with 336,766 more rows, and 11 more variables: arr_delay </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, ## # carrier </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, flight </span><span style='color: #949494;font-style: italic;'><int></span><span style='color: #949494;'>, tailnum </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, origin </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, dest </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, ## # air_time </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, distance </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, hour </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, minute </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, time_hour </span><span style='color: #949494;font-style: italic;'><dttm></span><span> </span></CODE></PRE> --- ## filter() - March flights ```r flights %>% filter(month == 3) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 28,834 x 19</span><span> ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 4 </span><span style='text-decoration: underline;'>2</span><span>159 125 318 56 ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 50 </span><span style='text-decoration: underline;'>2</span><span>358 52 526 438 ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 117 </span><span style='text-decoration: underline;'>2</span><span>245 152 223 </span><span style='text-decoration: underline;'>2</span><span>354 ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 454 500 -</span><span style='color: #BB0000;'>6</span><span> 633 648 ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 505 515 -</span><span style='color: #BB0000;'>10</span><span> 746 810 ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 521 530 -</span><span style='color: #BB0000;'>9</span><span> 813 827 ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 537 540 -</span><span style='color: #BB0000;'>3</span><span> 856 850 ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 541 545 -</span><span style='color: #BB0000;'>4</span><span> </span><span style='text-decoration: underline;'>1</span><span>014 </span><span style='text-decoration: underline;'>1</span><span>023 ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 549 600 -</span><span style='color: #BB0000;'>11</span><span> 639 703 ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 550 600 -</span><span style='color: #BB0000;'>10</span><span> 747 801 ## </span><span style='color: #949494;'># … with 28,824 more rows, and 11 more variables: arr_delay </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, ## # carrier </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, flight </span><span style='color: #949494;font-style: italic;'><int></span><span style='color: #949494;'>, tailnum </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, origin </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, dest </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, ## # air_time </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, distance </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, hour </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, minute </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, time_hour </span><span style='color: #949494;font-style: italic;'><dttm></span><span> </span></CODE></PRE> --- ## filter() - Flights in the first 7 days of March ```r flights %>% filter(month == 3, day <= 7) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 6,530 x 19</span><span> ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 4 </span><span style='text-decoration: underline;'>2</span><span>159 125 318 56 ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 50 </span><span style='text-decoration: underline;'>2</span><span>358 52 526 438 ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 117 </span><span style='text-decoration: underline;'>2</span><span>245 152 223 </span><span style='text-decoration: underline;'>2</span><span>354 ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 454 500 -</span><span style='color: #BB0000;'>6</span><span> 633 648 ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 505 515 -</span><span style='color: #BB0000;'>10</span><span> 746 810 ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 521 530 -</span><span style='color: #BB0000;'>9</span><span> 813 827 ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 537 540 -</span><span style='color: #BB0000;'>3</span><span> 856 850 ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 541 545 -</span><span style='color: #BB0000;'>4</span><span> </span><span style='text-decoration: underline;'>1</span><span>014 </span><span style='text-decoration: underline;'>1</span><span>023 ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 549 600 -</span><span style='color: #BB0000;'>11</span><span> 639 703 ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 550 600 -</span><span style='color: #BB0000;'>10</span><span> 747 801 ## </span><span style='color: #949494;'># … with 6,520 more rows, and 11 more variables: arr_delay </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, ## # carrier </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, flight </span><span style='color: #949494;font-style: italic;'><int></span><span style='color: #949494;'>, tailnum </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, origin </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, dest </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, ## # air_time </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, distance </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, hour </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, minute </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, time_hour </span><span style='color: #949494;font-style: italic;'><dttm></span><span> </span></CODE></PRE> --- ## filter() - Flights to LAX *or* JFK in March ```r flights %>% filter(dest == "LAX" | dest == "JFK", month==3) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 1,178 x 19</span><span> ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 607 610 -</span><span style='color: #BB0000;'>3</span><span> 832 925 ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 629 632 -</span><span style='color: #BB0000;'>3</span><span> 844 952 ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 657 700 -</span><span style='color: #BB0000;'>3</span><span> 953 </span><span style='text-decoration: underline;'>1</span><span>034 ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 714 715 -</span><span style='color: #BB0000;'>1</span><span> 939 </span><span style='text-decoration: underline;'>1</span><span>037 ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 716 710 6 958 </span><span style='text-decoration: underline;'>1</span><span>035 ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 727 730 -</span><span style='color: #BB0000;'>3</span><span> </span><span style='text-decoration: underline;'>1</span><span>007 </span><span style='text-decoration: underline;'>1</span><span>100 ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 836 840 -</span><span style='color: #BB0000;'>4</span><span> </span><span style='text-decoration: underline;'>1</span><span>111 </span><span style='text-decoration: underline;'>1</span><span>157 ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 857 900 -</span><span style='color: #BB0000;'>3</span><span> </span><span style='text-decoration: underline;'>1</span><span>202 </span><span style='text-decoration: underline;'>1</span><span>221 ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 903 900 3 </span><span style='text-decoration: underline;'>1</span><span>157 </span><span style='text-decoration: underline;'>1</span><span>220 ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 1 904 831 33 </span><span style='text-decoration: underline;'>1</span><span>150 </span><span style='text-decoration: underline;'>1</span><span>151 ## </span><span style='color: #949494;'># … with 1,168 more rows, and 11 more variables: arr_delay </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, ## # carrier </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, flight </span><span style='color: #949494;font-style: italic;'><int></span><span style='color: #949494;'>, tailnum </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, origin </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, dest </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, ## # air_time </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, distance </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, hour </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, minute </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, time_hour </span><span style='color: #949494;font-style: italic;'><dttm></span><span> </span></CODE></PRE> --- ## slice() - First 10 flights ```r flights %>% slice(1:10) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 10 x 19</span><span> ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 517 515 2 830 819 ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 533 529 4 850 830 ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 542 540 2 923 850 ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 544 545 -</span><span style='color: #BB0000;'>1</span><span> </span><span style='text-decoration: underline;'>1</span><span>004 </span><span style='text-decoration: underline;'>1</span><span>022 ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 554 600 -</span><span style='color: #BB0000;'>6</span><span> 812 837 ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 554 558 -</span><span style='color: #BB0000;'>4</span><span> 740 728 ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 555 600 -</span><span style='color: #BB0000;'>5</span><span> 913 854 ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 557 600 -</span><span style='color: #BB0000;'>3</span><span> 709 723 ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 557 600 -</span><span style='color: #BB0000;'>3</span><span> 838 846 ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 558 600 -</span><span style='color: #BB0000;'>2</span><span> 753 745 ## </span><span style='color: #949494;'># … with 11 more variables: arr_delay </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, carrier </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, flight </span><span style='color: #949494;font-style: italic;'><int></span><span style='color: #949494;'>, ## # tailnum </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, origin </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, dest </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, air_time </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, distance </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, ## # hour </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, minute </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, time_hour </span><span style='color: #949494;font-style: italic;'><dttm></span><span> </span></CODE></PRE> --- ## slice() - Last 5 flights ```r flights %>% slice((n()-4):n()) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 5 x 19</span><span> ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 9 30 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='text-decoration: underline;'>1</span><span>455 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='color: #BB0000;'>NA</span><span> </span><span style='text-decoration: underline;'>1</span><span>634 ## </span><span style='color: #BCBCBC;'>2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 9 30 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='text-decoration: underline;'>2</span><span>200 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='color: #BB0000;'>NA</span><span> </span><span style='text-decoration: underline;'>2</span><span>312 ## </span><span style='color: #BCBCBC;'>3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 9 30 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='text-decoration: underline;'>1</span><span>210 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='color: #BB0000;'>NA</span><span> </span><span style='text-decoration: underline;'>1</span><span>330 ## </span><span style='color: #BCBCBC;'>4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 9 30 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='text-decoration: underline;'>1</span><span>159 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='color: #BB0000;'>NA</span><span> </span><span style='text-decoration: underline;'>1</span><span>344 ## </span><span style='color: #BCBCBC;'>5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 9 30 </span><span style='color: #BB0000;'>NA</span><span> 840 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='color: #BB0000;'>NA</span><span> </span><span style='text-decoration: underline;'>1</span><span>020 ## </span><span style='color: #949494;'># … with 11 more variables: arr_delay </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, carrier </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, flight </span><span style='color: #949494;font-style: italic;'><int></span><span style='color: #949494;'>, ## # tailnum </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, origin </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, dest </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, air_time </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, distance </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, ## # hour </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, minute </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, time_hour </span><span style='color: #949494;font-style: italic;'><dttm></span><span> </span></CODE></PRE> --- ## select() - Individual Columns ```r flights %>% select(year, month, day) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 3</span><span> ## year month day ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #949494;'># … with 336,766 more rows</span><span> </span></CODE></PRE> --- ## select() - Exclude Columns ```r flights %>% select(-year, -month, -day) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 16</span><span> ## dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> 517 515 2 830 819 11 UA ## </span><span style='color: #BCBCBC;'> 2</span><span> 533 529 4 850 830 20 UA ## </span><span style='color: #BCBCBC;'> 3</span><span> 542 540 2 923 850 33 AA ## </span><span style='color: #BCBCBC;'> 4</span><span> 544 545 -</span><span style='color: #BB0000;'>1</span><span> </span><span style='text-decoration: underline;'>1</span><span>004 </span><span style='text-decoration: underline;'>1</span><span>022 -</span><span style='color: #BB0000;'>18</span><span> B6 ## </span><span style='color: #BCBCBC;'> 5</span><span> 554 600 -</span><span style='color: #BB0000;'>6</span><span> 812 837 -</span><span style='color: #BB0000;'>25</span><span> DL ## </span><span style='color: #BCBCBC;'> 6</span><span> 554 558 -</span><span style='color: #BB0000;'>4</span><span> 740 728 12 UA ## </span><span style='color: #BCBCBC;'> 7</span><span> 555 600 -</span><span style='color: #BB0000;'>5</span><span> 913 854 19 B6 ## </span><span style='color: #BCBCBC;'> 8</span><span> 557 600 -</span><span style='color: #BB0000;'>3</span><span> 709 723 -</span><span style='color: #BB0000;'>14</span><span> EV ## </span><span style='color: #BCBCBC;'> 9</span><span> 557 600 -</span><span style='color: #BB0000;'>3</span><span> 838 846 -</span><span style='color: #BB0000;'>8</span><span> B6 ## </span><span style='color: #BCBCBC;'>10</span><span> 558 600 -</span><span style='color: #BB0000;'>2</span><span> 753 745 8 AA ## </span><span style='color: #949494;'># … with 336,766 more rows, and 9 more variables: flight </span><span style='color: #949494;font-style: italic;'><int></span><span style='color: #949494;'>, tailnum </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, ## # origin </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, dest </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, air_time </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, distance </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, hour </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, ## # minute </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, time_hour </span><span style='color: #949494;font-style: italic;'><dttm></span><span> </span></CODE></PRE> --- ## select() - Ranges ```r flights %>% select(year:day) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 3</span><span> ## year month day ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 ## </span><span style='color: #949494;'># … with 336,766 more rows</span><span> </span></CODE></PRE> --- ## select() - Exclusion Ranges ```r flights %>% select(-(year:day)) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 16</span><span> ## dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> 517 515 2 830 819 11 UA ## </span><span style='color: #BCBCBC;'> 2</span><span> 533 529 4 850 830 20 UA ## </span><span style='color: #BCBCBC;'> 3</span><span> 542 540 2 923 850 33 AA ## </span><span style='color: #BCBCBC;'> 4</span><span> 544 545 -</span><span style='color: #BB0000;'>1</span><span> </span><span style='text-decoration: underline;'>1</span><span>004 </span><span style='text-decoration: underline;'>1</span><span>022 -</span><span style='color: #BB0000;'>18</span><span> B6 ## </span><span style='color: #BCBCBC;'> 5</span><span> 554 600 -</span><span style='color: #BB0000;'>6</span><span> 812 837 -</span><span style='color: #BB0000;'>25</span><span> DL ## </span><span style='color: #BCBCBC;'> 6</span><span> 554 558 -</span><span style='color: #BB0000;'>4</span><span> 740 728 12 UA ## </span><span style='color: #BCBCBC;'> 7</span><span> 555 600 -</span><span style='color: #BB0000;'>5</span><span> 913 854 19 B6 ## </span><span style='color: #BCBCBC;'> 8</span><span> 557 600 -</span><span style='color: #BB0000;'>3</span><span> 709 723 -</span><span style='color: #BB0000;'>14</span><span> EV ## </span><span style='color: #BCBCBC;'> 9</span><span> 557 600 -</span><span style='color: #BB0000;'>3</span><span> 838 846 -</span><span style='color: #BB0000;'>8</span><span> B6 ## </span><span style='color: #BCBCBC;'>10</span><span> 558 600 -</span><span style='color: #BB0000;'>2</span><span> 753 745 8 AA ## </span><span style='color: #949494;'># … with 336,766 more rows, and 9 more variables: flight </span><span style='color: #949494;font-style: italic;'><int></span><span style='color: #949494;'>, tailnum </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, ## # origin </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, dest </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, air_time </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, distance </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, hour </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, ## # minute </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, time_hour </span><span style='color: #949494;font-style: italic;'><dttm></span><span> </span></CODE></PRE> --- class: split-50 ## select() - Matching ```r flights %>% select(contains("dep"), contains("arr")) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 7</span><span> ## dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay carrier ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> 517 515 2 830 819 11 UA ## </span><span style='color: #BCBCBC;'> 2</span><span> 533 529 4 850 830 20 UA ## </span><span style='color: #BCBCBC;'> 3</span><span> 542 540 2 923 850 33 AA ## </span><span style='color: #BCBCBC;'> 4</span><span> 544 545 -</span><span style='color: #BB0000;'>1</span><span> </span><span style='text-decoration: underline;'>1</span><span>004 </span><span style='text-decoration: underline;'>1</span><span>022 -</span><span style='color: #BB0000;'>18</span><span> B6 ## </span><span style='color: #BCBCBC;'> 5</span><span> 554 600 -</span><span style='color: #BB0000;'>6</span><span> 812 837 -</span><span style='color: #BB0000;'>25</span><span> DL ## </span><span style='color: #BCBCBC;'> 6</span><span> 554 558 -</span><span style='color: #BB0000;'>4</span><span> 740 728 12 UA ## </span><span style='color: #BCBCBC;'> 7</span><span> 555 600 -</span><span style='color: #BB0000;'>5</span><span> 913 854 19 B6 ## </span><span style='color: #BCBCBC;'> 8</span><span> 557 600 -</span><span style='color: #BB0000;'>3</span><span> 709 723 -</span><span style='color: #BB0000;'>14</span><span> EV ## </span><span style='color: #BCBCBC;'> 9</span><span> 557 600 -</span><span style='color: #BB0000;'>3</span><span> 838 846 -</span><span style='color: #BB0000;'>8</span><span> B6 ## </span><span style='color: #BCBCBC;'>10</span><span> 558 600 -</span><span style='color: #BB0000;'>2</span><span> 753 745 8 AA ## </span><span style='color: #949494;'># … with 336,766 more rows</span><span> </span></CODE></PRE> --- ```r flights %>% select(starts_with("dep"), starts_with("arr")) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 4</span><span> ## dep_time dep_delay arr_time arr_delay ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> 517 2 830 11 ## </span><span style='color: #BCBCBC;'> 2</span><span> 533 4 850 20 ## </span><span style='color: #BCBCBC;'> 3</span><span> 542 2 923 33 ## </span><span style='color: #BCBCBC;'> 4</span><span> 544 -</span><span style='color: #BB0000;'>1</span><span> </span><span style='text-decoration: underline;'>1</span><span>004 -</span><span style='color: #BB0000;'>18</span><span> ## </span><span style='color: #BCBCBC;'> 5</span><span> 554 -</span><span style='color: #BB0000;'>6</span><span> 812 -</span><span style='color: #BB0000;'>25</span><span> ## </span><span style='color: #BCBCBC;'> 6</span><span> 554 -</span><span style='color: #BB0000;'>4</span><span> 740 12 ## </span><span style='color: #BCBCBC;'> 7</span><span> 555 -</span><span style='color: #BB0000;'>5</span><span> 913 19 ## </span><span style='color: #BCBCBC;'> 8</span><span> 557 -</span><span style='color: #BB0000;'>3</span><span> 709 -</span><span style='color: #BB0000;'>14</span><span> ## </span><span style='color: #BCBCBC;'> 9</span><span> 557 -</span><span style='color: #BB0000;'>3</span><span> 838 -</span><span style='color: #BB0000;'>8</span><span> ## </span><span style='color: #BCBCBC;'>10</span><span> 558 -</span><span style='color: #BB0000;'>2</span><span> 753 8 ## </span><span style='color: #949494;'># … with 336,766 more rows</span><span> </span></CODE></PRE> Some other helpers (provide by tidyselect): .center[ `starts_with`, `ends_with`, `everything`, `matches`, `num_range`, `one_of`, `everything`, `last_col`. ] --- ## select_if() - Get non-numeric columns ```r flights %>% select_if(function(x) !is.numeric(x)) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 5</span><span> ## carrier tailnum origin dest time_hour ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dttm></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> UA N14228 EWR IAH 2013-01-01 </span><span style='color: #949494;'>05:00:00</span><span> ## </span><span style='color: #BCBCBC;'> 2</span><span> UA N24211 LGA IAH 2013-01-01 </span><span style='color: #949494;'>05:00:00</span><span> ## </span><span style='color: #BCBCBC;'> 3</span><span> AA N619AA JFK MIA 2013-01-01 </span><span style='color: #949494;'>05:00:00</span><span> ## </span><span style='color: #BCBCBC;'> 4</span><span> B6 N804JB JFK BQN 2013-01-01 </span><span style='color: #949494;'>05:00:00</span><span> ## </span><span style='color: #BCBCBC;'> 5</span><span> DL N668DN LGA ATL 2013-01-01 </span><span style='color: #949494;'>06:00:00</span><span> ## </span><span style='color: #BCBCBC;'> 6</span><span> UA N39463 EWR ORD 2013-01-01 </span><span style='color: #949494;'>05:00:00</span><span> ## </span><span style='color: #BCBCBC;'> 7</span><span> B6 N516JB EWR FLL 2013-01-01 </span><span style='color: #949494;'>06:00:00</span><span> ## </span><span style='color: #BCBCBC;'> 8</span><span> EV N829AS LGA IAD 2013-01-01 </span><span style='color: #949494;'>06:00:00</span><span> ## </span><span style='color: #BCBCBC;'> 9</span><span> B6 N593JB JFK MCO 2013-01-01 </span><span style='color: #949494;'>06:00:00</span><span> ## </span><span style='color: #BCBCBC;'>10</span><span> AA N3ALAA LGA ORD 2013-01-01 </span><span style='color: #949494;'>06:00:00</span><span> ## </span><span style='color: #949494;'># … with 336,766 more rows</span><span> </span></CODE></PRE> --- ## rename() - Change column names ```r flights %>% rename(tail_number = tailnum) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 19</span><span> ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 517 515 2 830 819 ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 533 529 4 850 830 ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 542 540 2 923 850 ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 544 545 -</span><span style='color: #BB0000;'>1</span><span> </span><span style='text-decoration: underline;'>1</span><span>004 </span><span style='text-decoration: underline;'>1</span><span>022 ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 554 600 -</span><span style='color: #BB0000;'>6</span><span> 812 837 ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 554 558 -</span><span style='color: #BB0000;'>4</span><span> 740 728 ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 555 600 -</span><span style='color: #BB0000;'>5</span><span> 913 854 ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 557 600 -</span><span style='color: #BB0000;'>3</span><span> 709 723 ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 557 600 -</span><span style='color: #BB0000;'>3</span><span> 838 846 ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 558 600 -</span><span style='color: #BB0000;'>2</span><span> 753 745 ## </span><span style='color: #949494;'># … with 336,766 more rows, and 11 more variables: arr_delay </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, ## # carrier </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, flight </span><span style='color: #949494;font-style: italic;'><int></span><span style='color: #949494;'>, tail_number </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, origin </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, dest </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, ## # air_time </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, distance </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, hour </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, minute </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, time_hour </span><span style='color: #949494;font-style: italic;'><dttm></span><span> </span></CODE></PRE> --- ## select() vs. rename() .small[ ```r flights %>% select(tail_number = tailnum) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 1</span><span> ## tail_number ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> N14228 ## </span><span style='color: #BCBCBC;'> 2</span><span> N24211 ## </span><span style='color: #BCBCBC;'> 3</span><span> N619AA ## </span><span style='color: #BCBCBC;'> 4</span><span> N804JB ## </span><span style='color: #BCBCBC;'> 5</span><span> N668DN ## </span><span style='color: #BCBCBC;'> 6</span><span> N39463 ## </span><span style='color: #BCBCBC;'> 7</span><span> N516JB ## </span><span style='color: #BCBCBC;'> 8</span><span> N829AS ## </span><span style='color: #BCBCBC;'> 9</span><span> N593JB ## </span><span style='color: #BCBCBC;'>10</span><span> N3ALAA ## </span><span style='color: #949494;'># … with 336,766 more rows</span><span> </span></CODE></PRE> ```r flights %>% rename(tail_number = tailnum) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 19</span><span> ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 517 515 2 830 819 ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 533 529 4 850 830 ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 542 540 2 923 850 ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 544 545 -</span><span style='color: #BB0000;'>1</span><span> </span><span style='text-decoration: underline;'>1</span><span>004 </span><span style='text-decoration: underline;'>1</span><span>022 ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 554 600 -</span><span style='color: #BB0000;'>6</span><span> 812 837 ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 554 558 -</span><span style='color: #BB0000;'>4</span><span> 740 728 ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 555 600 -</span><span style='color: #BB0000;'>5</span><span> 913 854 ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 557 600 -</span><span style='color: #BB0000;'>3</span><span> 709 723 ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 557 600 -</span><span style='color: #BB0000;'>3</span><span> 838 846 ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 558 600 -</span><span style='color: #BB0000;'>2</span><span> 753 745 ## </span><span style='color: #949494;'># … with 336,766 more rows, and 11 more variables: arr_delay </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, ## # carrier </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, flight </span><span style='color: #949494;font-style: italic;'><int></span><span style='color: #949494;'>, tail_number </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, origin </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, dest </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, ## # air_time </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, distance </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, hour </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, minute </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, time_hour </span><span style='color: #949494;font-style: italic;'><dttm></span><span> </span></CODE></PRE> ] --- ## pull() ```r names(flights) ``` ``` ## [1] "year" "month" "day" "dep_time" ## [5] "sched_dep_time" "dep_delay" "arr_time" "sched_arr_time" ## [9] "arr_delay" "carrier" "flight" "tailnum" ## [13] "origin" "dest" "air_time" "distance" ## [17] "hour" "minute" "time_hour" ``` ```r flights %>% pull("year") %>% head() ``` ``` ## [1] 2013 2013 2013 2013 2013 2013 ``` ```r flights %>% pull(1) %>% head() ``` ``` ## [1] 2013 2013 2013 2013 2013 2013 ``` ```r flights %>% pull(-1) %>% head() ``` ``` ## [1] "2013-01-01 05:00:00 EST" "2013-01-01 05:00:00 EST" ## [3] "2013-01-01 05:00:00 EST" "2013-01-01 05:00:00 EST" ## [5] "2013-01-01 06:00:00 EST" "2013-01-01 05:00:00 EST" ``` --- ## arrange() - Sort data ```r flights %>% filter(month==3,day==2) %>% arrange(origin, dest) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 765 x 19</span><span> ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 2 </span><span style='text-decoration: underline;'>1</span><span>336 </span><span style='text-decoration: underline;'>1</span><span>329 7 </span><span style='text-decoration: underline;'>1</span><span>426 </span><span style='text-decoration: underline;'>1</span><span>432 ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 2 628 629 -</span><span style='color: #BB0000;'>1</span><span> 837 849 ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 2 637 640 -</span><span style='color: #BB0000;'>3</span><span> 903 915 ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 2 743 745 -</span><span style='color: #BB0000;'>2</span><span> 945 </span><span style='text-decoration: underline;'>1</span><span>010 ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 2 857 900 -</span><span style='color: #BB0000;'>3</span><span> </span><span style='text-decoration: underline;'>1</span><span>117 </span><span style='text-decoration: underline;'>1</span><span>126 ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 2 </span><span style='text-decoration: underline;'>1</span><span>027 </span><span style='text-decoration: underline;'>1</span><span>030 -</span><span style='color: #BB0000;'>3</span><span> </span><span style='text-decoration: underline;'>1</span><span>234 </span><span style='text-decoration: underline;'>1</span><span>247 ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 2 </span><span style='text-decoration: underline;'>1</span><span>134 </span><span style='text-decoration: underline;'>1</span><span>145 -</span><span style='color: #BB0000;'>11</span><span> </span><span style='text-decoration: underline;'>1</span><span>332 </span><span style='text-decoration: underline;'>1</span><span>359 ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 2 </span><span style='text-decoration: underline;'>1</span><span>412 </span><span style='text-decoration: underline;'>1</span><span>415 -</span><span style='color: #BB0000;'>3</span><span> </span><span style='text-decoration: underline;'>1</span><span>636 </span><span style='text-decoration: underline;'>1</span><span>630 ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 2 </span><span style='text-decoration: underline;'>1</span><span>633 </span><span style='text-decoration: underline;'>1</span><span>636 -</span><span style='color: #BB0000;'>3</span><span> </span><span style='text-decoration: underline;'>1</span><span>848 </span><span style='text-decoration: underline;'>1</span><span>908 ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 2 </span><span style='text-decoration: underline;'>1</span><span>655 </span><span style='text-decoration: underline;'>1</span><span>700 -</span><span style='color: #BB0000;'>5</span><span> </span><span style='text-decoration: underline;'>1</span><span>857 </span><span style='text-decoration: underline;'>1</span><span>924 ## </span><span style='color: #949494;'># … with 755 more rows, and 11 more variables: arr_delay </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, carrier </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, ## # flight </span><span style='color: #949494;font-style: italic;'><int></span><span style='color: #949494;'>, tailnum </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, origin </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, dest </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, air_time </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, ## # distance </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, hour </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, minute </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, time_hour </span><span style='color: #949494;font-style: italic;'><dttm></span><span> </span></CODE></PRE> --- ## arrange() & desc() - Descending order ```r flights %>% filter(month==3,day==2) %>% arrange(desc(origin), dest) %>% select(origin, dest, tailnum) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 765 x 3</span><span> ## origin dest tailnum ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> LGA ATL N928AT ## </span><span style='color: #BCBCBC;'> 2</span><span> LGA ATL N623DL ## </span><span style='color: #BCBCBC;'> 3</span><span> LGA ATL N680DA ## </span><span style='color: #BCBCBC;'> 4</span><span> LGA ATL N996AT ## </span><span style='color: #BCBCBC;'> 5</span><span> LGA ATL N510MQ ## </span><span style='color: #BCBCBC;'> 6</span><span> LGA ATL N663DN ## </span><span style='color: #BCBCBC;'> 7</span><span> LGA ATL N942DL ## </span><span style='color: #BCBCBC;'> 8</span><span> LGA ATL N511MQ ## </span><span style='color: #BCBCBC;'> 9</span><span> LGA ATL N910DE ## </span><span style='color: #BCBCBC;'>10</span><span> LGA ATL N902DE ## </span><span style='color: #949494;'># … with 755 more rows</span><span> </span></CODE></PRE> --- ## mutate() - Modify columns ```r flights %>% select(year:day) %>% mutate(date = paste(year,month,day,sep="/")) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 4</span><span> ## year month day date ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 2013/1/1 ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 2013/1/1 ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 2013/1/1 ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 2013/1/1 ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 2013/1/1 ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 2013/1/1 ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 2013/1/1 ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 2013/1/1 ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 2013/1/1 ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 2013/1/1 ## </span><span style='color: #949494;'># … with 336,766 more rows</span><span> </span></CODE></PRE> --- ## transmute() - Create new tibble from existing columns ```r flights %>% select(year:day) %>% transmute(date = paste(year,month,day,sep="/")) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 1</span><span> ## date ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> 2013/1/1 ## </span><span style='color: #BCBCBC;'> 2</span><span> 2013/1/1 ## </span><span style='color: #BCBCBC;'> 3</span><span> 2013/1/1 ## </span><span style='color: #BCBCBC;'> 4</span><span> 2013/1/1 ## </span><span style='color: #BCBCBC;'> 5</span><span> 2013/1/1 ## </span><span style='color: #BCBCBC;'> 6</span><span> 2013/1/1 ## </span><span style='color: #BCBCBC;'> 7</span><span> 2013/1/1 ## </span><span style='color: #BCBCBC;'> 8</span><span> 2013/1/1 ## </span><span style='color: #BCBCBC;'> 9</span><span> 2013/1/1 ## </span><span style='color: #BCBCBC;'>10</span><span> 2013/1/1 ## </span><span style='color: #949494;'># … with 336,766 more rows</span><span> </span></CODE></PRE> --- ## distinct() - Find unique rows ```r flights %>% select(origin, dest) %>% distinct() %>% arrange(origin,dest) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 224 x 2</span><span> ## origin dest ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> EWR ALB ## </span><span style='color: #BCBCBC;'> 2</span><span> EWR ANC ## </span><span style='color: #BCBCBC;'> 3</span><span> EWR ATL ## </span><span style='color: #BCBCBC;'> 4</span><span> EWR AUS ## </span><span style='color: #BCBCBC;'> 5</span><span> EWR AVL ## </span><span style='color: #BCBCBC;'> 6</span><span> EWR BDL ## </span><span style='color: #BCBCBC;'> 7</span><span> EWR BNA ## </span><span style='color: #BCBCBC;'> 8</span><span> EWR BOS ## </span><span style='color: #BCBCBC;'> 9</span><span> EWR BQN ## </span><span style='color: #BCBCBC;'>10</span><span> EWR BTV ## </span><span style='color: #949494;'># … with 214 more rows</span><span> </span></CODE></PRE> --- ## Sampling rows .small[ ```r flights %>% sample_n(10) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 10 x 19</span><span> ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 6 4 859 900 -</span><span style='color: #BB0000;'>1</span><span> </span><span style='text-decoration: underline;'>1</span><span>208 </span><span style='text-decoration: underline;'>1</span><span>204 ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 3 </span><span style='text-decoration: underline;'>1</span><span>614 </span><span style='text-decoration: underline;'>1</span><span>555 19 </span><span style='text-decoration: underline;'>1</span><span>733 </span><span style='text-decoration: underline;'>1</span><span>715 ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 8 27 749 740 9 </span><span style='text-decoration: underline;'>1</span><span>029 </span><span style='text-decoration: underline;'>1</span><span>055 ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 11 4 757 757 0 </span><span style='text-decoration: underline;'>1</span><span>004 </span><span style='text-decoration: underline;'>1</span><span>024 ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 6 9 </span><span style='text-decoration: underline;'>1</span><span>636 </span><span style='text-decoration: underline;'>1</span><span>630 6 </span><span style='text-decoration: underline;'>1</span><span>839 </span><span style='text-decoration: underline;'>1</span><span>920 ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 5 8 624 630 -</span><span style='color: #BB0000;'>6</span><span> 829 849 ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 9 11 </span><span style='text-decoration: underline;'>1</span><span>133 </span><span style='text-decoration: underline;'>1</span><span>130 3 </span><span style='text-decoration: underline;'>1</span><span>305 </span><span style='text-decoration: underline;'>1</span><span>311 ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 12 837 830 7 931 944 ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 10 4 </span><span style='text-decoration: underline;'>1</span><span>816 </span><span style='text-decoration: underline;'>1</span><span>800 16 </span><span style='text-decoration: underline;'>1</span><span>949 </span><span style='text-decoration: underline;'>1</span><span>920 ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 5 31 658 700 -</span><span style='color: #BB0000;'>2</span><span> 957 </span><span style='text-decoration: underline;'>1</span><span>034 ## </span><span style='color: #949494;'># … with 11 more variables: arr_delay </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, carrier </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, flight </span><span style='color: #949494;font-style: italic;'><int></span><span style='color: #949494;'>, ## # tailnum </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, origin </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, dest </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, air_time </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, distance </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, ## # hour </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, minute </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, time_hour </span><span style='color: #949494;font-style: italic;'><dttm></span><span> </span></CODE></PRE> ```r flights %>% sample_frac(0.00003) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 10 x 19</span><span> ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 4 1 </span><span style='text-decoration: underline;'>1</span><span>255 </span><span style='text-decoration: underline;'>1</span><span>120 95 </span><span style='text-decoration: underline;'>1</span><span>430 </span><span style='text-decoration: underline;'>1</span><span>302 ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 12 5 937 945 -</span><span style='color: #BB0000;'>8</span><span> </span><span style='text-decoration: underline;'>1</span><span>307 </span><span style='text-decoration: underline;'>1</span><span>235 ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 4 2 </span><span style='text-decoration: underline;'>1</span><span>051 </span><span style='text-decoration: underline;'>1</span><span>056 -</span><span style='color: #BB0000;'>5</span><span> </span><span style='text-decoration: underline;'>1</span><span>206 </span><span style='text-decoration: underline;'>1</span><span>215 ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 10 6 </span><span style='text-decoration: underline;'>1</span><span>040 </span><span style='text-decoration: underline;'>1</span><span>046 -</span><span style='color: #BB0000;'>6</span><span> </span><span style='text-decoration: underline;'>1</span><span>322 </span><span style='text-decoration: underline;'>1</span><span>330 ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 10 17 </span><span style='text-decoration: underline;'>1</span><span>128 </span><span style='text-decoration: underline;'>1</span><span>130 -</span><span style='color: #BB0000;'>2</span><span> </span><span style='text-decoration: underline;'>1</span><span>335 </span><span style='text-decoration: underline;'>1</span><span>334 ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 3 23 </span><span style='text-decoration: underline;'>1</span><span>045 </span><span style='text-decoration: underline;'>1</span><span>049 -</span><span style='color: #BB0000;'>4</span><span> </span><span style='text-decoration: underline;'>1</span><span>354 </span><span style='text-decoration: underline;'>1</span><span>340 ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 12 6 552 600 -</span><span style='color: #BB0000;'>8</span><span> 649 701 ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 2 </span><span style='text-decoration: underline;'>2</span><span>141 </span><span style='text-decoration: underline;'>2</span><span>145 -</span><span style='color: #BB0000;'>4</span><span> </span><span style='text-decoration: underline;'>2</span><span>301 </span><span style='text-decoration: underline;'>2</span><span>311 ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 12 3 653 650 3 915 919 ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 2 25 604 610 -</span><span style='color: #BB0000;'>6</span><span> 800 818 ## </span><span style='color: #949494;'># … with 11 more variables: arr_delay </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, carrier </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, flight </span><span style='color: #949494;font-style: italic;'><int></span><span style='color: #949494;'>, ## # tailnum </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, origin </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, dest </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, air_time </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, distance </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, ## # hour </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, minute </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, time_hour </span><span style='color: #949494;font-style: italic;'><dttm></span><span> </span></CODE></PRE> ] --- ## summarise() ```r flights %>% summarize(n(), min(dep_delay), max(dep_delay)) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 1 x 3</span><span> ## `n()` `min(dep_delay)` `max(dep_delay)` ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> </span><span style='text-decoration: underline;'>336</span><span>776 </span><span style='color: #BB0000;'>NA</span><span> </span><span style='color: #BB0000;'>NA</span><span> </span></CODE></PRE> -- ```r flights %>% summarize( n = n(), min_dep_delay = min(dep_delay, na.rm = TRUE), max_dep_delay = max(dep_delay, na.rm = TRUE) ) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 1 x 3</span><span> ## n min_dep_delay max_dep_delay ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> </span><span style='text-decoration: underline;'>336</span><span>776 -</span><span style='color: #BB0000;'>43</span><span> </span><span style='text-decoration: underline;'>1</span><span>301 </span></CODE></PRE> --- ## group_by() ```r flights %>% group_by(origin) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 19</span><span> ## </span><span style='color: #949494;'># Groups: origin [3]</span><span> ## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time ## </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 517 515 2 830 819 ## </span><span style='color: #BCBCBC;'> 2</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 533 529 4 850 830 ## </span><span style='color: #BCBCBC;'> 3</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 542 540 2 923 850 ## </span><span style='color: #BCBCBC;'> 4</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 544 545 -</span><span style='color: #BB0000;'>1</span><span> </span><span style='text-decoration: underline;'>1</span><span>004 </span><span style='text-decoration: underline;'>1</span><span>022 ## </span><span style='color: #BCBCBC;'> 5</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 554 600 -</span><span style='color: #BB0000;'>6</span><span> 812 837 ## </span><span style='color: #BCBCBC;'> 6</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 554 558 -</span><span style='color: #BB0000;'>4</span><span> 740 728 ## </span><span style='color: #BCBCBC;'> 7</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 555 600 -</span><span style='color: #BB0000;'>5</span><span> 913 854 ## </span><span style='color: #BCBCBC;'> 8</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 557 600 -</span><span style='color: #BB0000;'>3</span><span> 709 723 ## </span><span style='color: #BCBCBC;'> 9</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 557 600 -</span><span style='color: #BB0000;'>3</span><span> 838 846 ## </span><span style='color: #BCBCBC;'>10</span><span> </span><span style='text-decoration: underline;'>2</span><span>013 1 1 558 600 -</span><span style='color: #BB0000;'>2</span><span> 753 745 ## </span><span style='color: #949494;'># … with 336,766 more rows, and 11 more variables: arr_delay </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, ## # carrier </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, flight </span><span style='color: #949494;font-style: italic;'><int></span><span style='color: #949494;'>, tailnum </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, origin </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, dest </span><span style='color: #949494;font-style: italic;'><chr></span><span style='color: #949494;'>, ## # air_time </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, distance </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, hour </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, minute </span><span style='color: #949494;font-style: italic;'><dbl></span><span style='color: #949494;'>, time_hour </span><span style='color: #949494;font-style: italic;'><dttm></span><span> </span></CODE></PRE> --- ## summarise() with group_by() ```r flights %>% group_by(origin) %>% summarize( n = n(), min_dep_delay = min(dep_delay, na.rm = TRUE), max_dep_delay = max(dep_delay, na.rm = TRUE) ) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 3 x 4</span><span> ## origin n min_dep_delay max_dep_delay ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> EWR </span><span style='text-decoration: underline;'>120</span><span>835 -</span><span style='color: #BB0000;'>25</span><span> </span><span style='text-decoration: underline;'>1</span><span>126 ## </span><span style='color: #BCBCBC;'>2</span><span> JFK </span><span style='text-decoration: underline;'>111</span><span>279 -</span><span style='color: #BB0000;'>43</span><span> </span><span style='text-decoration: underline;'>1</span><span>301 ## </span><span style='color: #BCBCBC;'>3</span><span> LGA </span><span style='text-decoration: underline;'>104</span><span>662 -</span><span style='color: #BB0000;'>33</span><span> 911 </span></CODE></PRE> --- ```r flights %>% group_by(origin, carrier) %>% summarize( n = n(), min_dep_delay = min(dep_delay, na.rm = TRUE), max_dep_delay = max(dep_delay, na.rm = TRUE) ) %>% filter(n > 10000) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 10 x 5</span><span> ## </span><span style='color: #949494;'># Groups: origin [3]</span><span> ## origin carrier n min_dep_delay max_dep_delay ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> EWR EV </span><span style='text-decoration: underline;'>43</span><span>939 -</span><span style='color: #BB0000;'>25</span><span> 548 ## </span><span style='color: #BCBCBC;'> 2</span><span> EWR UA </span><span style='text-decoration: underline;'>46</span><span>087 -</span><span style='color: #BB0000;'>18</span><span> 424 ## </span><span style='color: #BCBCBC;'> 3</span><span> JFK 9E </span><span style='text-decoration: underline;'>14</span><span>651 -</span><span style='color: #BB0000;'>24</span><span> 747 ## </span><span style='color: #BCBCBC;'> 4</span><span> JFK AA </span><span style='text-decoration: underline;'>13</span><span>783 -</span><span style='color: #BB0000;'>15</span><span> </span><span style='text-decoration: underline;'>1</span><span>014 ## </span><span style='color: #BCBCBC;'> 5</span><span> JFK B6 </span><span style='text-decoration: underline;'>42</span><span>076 -</span><span style='color: #BB0000;'>43</span><span> 453 ## </span><span style='color: #BCBCBC;'> 6</span><span> JFK DL </span><span style='text-decoration: underline;'>20</span><span>701 -</span><span style='color: #BB0000;'>18</span><span> 960 ## </span><span style='color: #BCBCBC;'> 7</span><span> LGA AA </span><span style='text-decoration: underline;'>15</span><span>459 -</span><span style='color: #BB0000;'>24</span><span> 803 ## </span><span style='color: #BCBCBC;'> 8</span><span> LGA DL </span><span style='text-decoration: underline;'>23</span><span>067 -</span><span style='color: #BB0000;'>33</span><span> 911 ## </span><span style='color: #BCBCBC;'> 9</span><span> LGA MQ </span><span style='text-decoration: underline;'>16</span><span>928 -</span><span style='color: #BB0000;'>26</span><span> 366 ## </span><span style='color: #BCBCBC;'>10</span><span> LGA US </span><span style='text-decoration: underline;'>13</span><span>136 -</span><span style='color: #BB0000;'>18</span><span> 500 </span></CODE></PRE> --- ## count() .pull-left[ ```r flights %>% group_by(origin, carrier) %>% summarize(n = n()) %>% ungroup() ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 35 x 3</span><span> ## origin carrier n ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> EWR 9E </span><span style='text-decoration: underline;'>1</span><span>268 ## </span><span style='color: #BCBCBC;'> 2</span><span> EWR AA </span><span style='text-decoration: underline;'>3</span><span>487 ## </span><span style='color: #BCBCBC;'> 3</span><span> EWR AS 714 ## </span><span style='color: #BCBCBC;'> 4</span><span> EWR B6 </span><span style='text-decoration: underline;'>6</span><span>557 ## </span><span style='color: #BCBCBC;'> 5</span><span> EWR DL </span><span style='text-decoration: underline;'>4</span><span>342 ## </span><span style='color: #BCBCBC;'> 6</span><span> EWR EV </span><span style='text-decoration: underline;'>43</span><span>939 ## </span><span style='color: #BCBCBC;'> 7</span><span> EWR MQ </span><span style='text-decoration: underline;'>2</span><span>276 ## </span><span style='color: #BCBCBC;'> 8</span><span> EWR OO 6 ## </span><span style='color: #BCBCBC;'> 9</span><span> EWR UA </span><span style='text-decoration: underline;'>46</span><span>087 ## </span><span style='color: #BCBCBC;'>10</span><span> EWR US </span><span style='text-decoration: underline;'>4</span><span>405 ## </span><span style='color: #949494;'># … with 25 more rows</span><span> </span></CODE></PRE> ] .pull-right[ ```r flights %>% count(origin, carrier) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 35 x 3</span><span> ## origin carrier n ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> EWR 9E </span><span style='text-decoration: underline;'>1</span><span>268 ## </span><span style='color: #BCBCBC;'> 2</span><span> EWR AA </span><span style='text-decoration: underline;'>3</span><span>487 ## </span><span style='color: #BCBCBC;'> 3</span><span> EWR AS 714 ## </span><span style='color: #BCBCBC;'> 4</span><span> EWR B6 </span><span style='text-decoration: underline;'>6</span><span>557 ## </span><span style='color: #BCBCBC;'> 5</span><span> EWR DL </span><span style='text-decoration: underline;'>4</span><span>342 ## </span><span style='color: #BCBCBC;'> 6</span><span> EWR EV </span><span style='text-decoration: underline;'>43</span><span>939 ## </span><span style='color: #BCBCBC;'> 7</span><span> EWR MQ </span><span style='text-decoration: underline;'>2</span><span>276 ## </span><span style='color: #BCBCBC;'> 8</span><span> EWR OO 6 ## </span><span style='color: #BCBCBC;'> 9</span><span> EWR UA </span><span style='text-decoration: underline;'>46</span><span>087 ## </span><span style='color: #BCBCBC;'>10</span><span> EWR US </span><span style='text-decoration: underline;'>4</span><span>405 ## </span><span style='color: #949494;'># … with 25 more rows</span><span> </span></CODE></PRE> ] --- ## mutate() with group_by() ```r flights %>% group_by(origin) %>% mutate( n = n(), ) %>% select(origin, n) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 336,776 x 2</span><span> ## </span><span style='color: #949494;'># Groups: origin [3]</span><span> ## origin n ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><int></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> EWR </span><span style='text-decoration: underline;'>120</span><span>835 ## </span><span style='color: #BCBCBC;'> 2</span><span> LGA </span><span style='text-decoration: underline;'>104</span><span>662 ## </span><span style='color: #BCBCBC;'> 3</span><span> JFK </span><span style='text-decoration: underline;'>111</span><span>279 ## </span><span style='color: #BCBCBC;'> 4</span><span> JFK </span><span style='text-decoration: underline;'>111</span><span>279 ## </span><span style='color: #BCBCBC;'> 5</span><span> LGA </span><span style='text-decoration: underline;'>104</span><span>662 ## </span><span style='color: #BCBCBC;'> 6</span><span> EWR </span><span style='text-decoration: underline;'>120</span><span>835 ## </span><span style='color: #BCBCBC;'> 7</span><span> EWR </span><span style='text-decoration: underline;'>120</span><span>835 ## </span><span style='color: #BCBCBC;'> 8</span><span> LGA </span><span style='text-decoration: underline;'>104</span><span>662 ## </span><span style='color: #BCBCBC;'> 9</span><span> JFK </span><span style='text-decoration: underline;'>111</span><span>279 ## </span><span style='color: #BCBCBC;'>10</span><span> LGA </span><span style='text-decoration: underline;'>104</span><span>662 ## </span><span style='color: #949494;'># … with 336,766 more rows</span><span> </span></CODE></PRE> --- ## Demos 1. How many flights to Los Angeles (LAX) did each of the legacy carriers (AA, UA, DL or US) have in May from JFK, and what was their average duration? <br/> 2. What was the shortest flight out of each airport in terms of distance? In terms of duration? --- ## Exercise 1 1. Which plane (check the tail number) flew out of each New York airport the most? <br/> 2. Which date should you fly on if you want to have the lowest possible average departure delay? What about arrival delay? --- class: middle .center[ <img src="imgs/hex-tidyr.png" width="50%" /> ] --- ## Gather <img src="imgs/tidyr_gather.png" width="60%" style="display: block; margin: auto;" /> .footnote[ From [data import cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/data-import.pdf) ] --- ## Spread <img src="imgs/tidyr_spread.png" width="70%" style="display: block; margin: auto;" /> .footnote[ From [data import cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/data-import.pdf) ] --- ## Separate <img src="imgs/tidyr_separate.png" width="70%" style="display: block; margin: auto;" /> .footnote[ From [data import cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/data-import.pdf) ] --- ## Unite <img src="imgs/tidyr_unite.png" width="70%" style="display: block; margin: auto;" /> .footnote[ From [data import cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/data-import.pdf) ] --- ## Example 1 - Grades Is the following data tidy? ```r (grades = tibble( name = c("Alice", "Bob", "Carol", "Dave"), hw_1 = c(19, 18, 18, 19), hw_2 = c(19, 20, 20, 19), hw_3 = c(18, 18, 18, 18), hw_4 = c(20, 16, 17, 19), exam_1 = c(89, 77, 96, 86), exam_2 = c(95, 88, 99, 82) )) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 4 x 7</span><span> ## name hw_1 hw_2 hw_3 hw_4 exam_1 exam_2 ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> Alice 19 19 18 20 89 95 ## </span><span style='color: #BCBCBC;'>2</span><span> Bob 18 20 18 16 77 88 ## </span><span style='color: #BCBCBC;'>3</span><span> Carol 18 20 18 17 96 99 ## </span><span style='color: #BCBCBC;'>4</span><span> Dave 19 19 18 19 86 82 </span></CODE></PRE> -- How would we calculate a final score based on the following formula, `$$\text{score} = 0.6\,\frac{\sum\text{hw}_i}{80} + 0.4\,\frac{\sum\text{exam}_j}{200}$$` --- ## Semi-tidy approach ```r grades %>% mutate( hw_avg = (hw_1+hw_2+hw_3+hw_4)/4, exam_avg = (exam_1+exam_2)/2 ) %>% mutate( overall = 0.4*(exam_avg/100) + 0.6*(hw_avg/20) ) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 4 x 10</span><span> ## name hw_1 hw_2 hw_3 hw_4 exam_1 exam_2 hw_avg exam_avg overall ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> Alice 19 19 18 20 89 95 19 92 0.938 ## </span><span style='color: #BCBCBC;'>2</span><span> Bob 18 20 18 16 77 88 18 82.5 0.87 ## </span><span style='color: #BCBCBC;'>3</span><span> Carol 18 20 18 17 96 99 18.2 97.5 0.938 ## </span><span style='color: #BCBCBC;'>4</span><span> Dave 19 19 18 19 86 82 18.8 84 0.899 </span></CODE></PRE> --- ## Wide -> Long (`pivot_longer`) ```r tidyr::pivot_longer(grades, cols = hw_1:exam_2, names_to = "assignment", values_to = "score") ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 24 x 3</span><span> ## name assignment score ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> Alice hw_1 19 ## </span><span style='color: #BCBCBC;'> 2</span><span> Alice hw_2 19 ## </span><span style='color: #BCBCBC;'> 3</span><span> Alice hw_3 18 ## </span><span style='color: #BCBCBC;'> 4</span><span> Alice hw_4 20 ## </span><span style='color: #BCBCBC;'> 5</span><span> Alice exam_1 89 ## </span><span style='color: #BCBCBC;'> 6</span><span> Alice exam_2 95 ## </span><span style='color: #BCBCBC;'> 7</span><span> Bob hw_1 18 ## </span><span style='color: #BCBCBC;'> 8</span><span> Bob hw_2 20 ## </span><span style='color: #BCBCBC;'> 9</span><span> Bob hw_3 18 ## </span><span style='color: #BCBCBC;'>10</span><span> Bob hw_4 16 ## </span><span style='color: #949494;'># … with 14 more rows</span><span> </span></CODE></PRE> --- ```r tidyr::pivot_longer(grades, cols = hw_1:exam_2, names_to = c("type", "id"), names_sep = "_", values_to = "score") ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 24 x 4</span><span> ## name type id score ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'> 1</span><span> Alice hw 1 19 ## </span><span style='color: #BCBCBC;'> 2</span><span> Alice hw 2 19 ## </span><span style='color: #BCBCBC;'> 3</span><span> Alice hw 3 18 ## </span><span style='color: #BCBCBC;'> 4</span><span> Alice hw 4 20 ## </span><span style='color: #BCBCBC;'> 5</span><span> Alice exam 1 89 ## </span><span style='color: #BCBCBC;'> 6</span><span> Alice exam 2 95 ## </span><span style='color: #BCBCBC;'> 7</span><span> Bob hw 1 18 ## </span><span style='color: #BCBCBC;'> 8</span><span> Bob hw 2 20 ## </span><span style='color: #BCBCBC;'> 9</span><span> Bob hw 3 18 ## </span><span style='color: #BCBCBC;'>10</span><span> Bob hw 4 16 ## </span><span style='color: #949494;'># … with 14 more rows</span><span> </span></CODE></PRE> --- ## Tidy approach? ```r grades %>% tidyr::pivot_longer( cols = hw_1:exam_2, names_to = c("type", "id"), names_sep = "_", values_to = "score" ) %>% group_by(name, type) %>% summarize(total = sum(score)) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 8 x 3</span><span> ## </span><span style='color: #949494;'># Groups: name [4]</span><span> ## name type total ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> Alice exam 184 ## </span><span style='color: #BCBCBC;'>2</span><span> Alice hw 76 ## </span><span style='color: #BCBCBC;'>3</span><span> Bob exam 165 ## </span><span style='color: #BCBCBC;'>4</span><span> Bob hw 72 ## </span><span style='color: #BCBCBC;'>5</span><span> Carol exam 195 ## </span><span style='color: #BCBCBC;'>6</span><span> Carol hw 73 ## </span><span style='color: #BCBCBC;'>7</span><span> Dave exam 168 ## </span><span style='color: #BCBCBC;'>8</span><span> Dave hw 75 </span></CODE></PRE> --- ## Long -> Wide (`pivot_wider`) ```r grades %>% tidyr::pivot_longer( cols = hw_1:exam_2, names_to = c("type", "id"), names_sep = "_", values_to = "score" ) %>% group_by(name, type) %>% summarize(total = sum(score)) %>% tidyr::pivot_wider( names_from = type, values_from = total ) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 4 x 3</span><span> ## </span><span style='color: #949494;'># Groups: name [4]</span><span> ## name exam hw ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> Alice 184 76 ## </span><span style='color: #BCBCBC;'>2</span><span> Bob 165 72 ## </span><span style='color: #BCBCBC;'>3</span><span> Carol 195 73 ## </span><span style='color: #BCBCBC;'>4</span><span> Dave 168 75 </span></CODE></PRE> --- ## Finishing up ```r grades %>% tidyr::pivot_longer( cols = hw_1:exam_2, names_to = c("type", "id"), names_sep = "_", values_to = "score" ) %>% group_by(name, type) %>% summarize(total = sum(score)) %>% tidyr::pivot_wider( names_from = type, values_from = total ) %>% mutate( score = 0.6*(hw/80) + 0.4*(exam/200) ) ``` <PRE class="fansi fansi-output"><CODE>## <span style='color: #949494;'># A tibble: 4 x 4</span><span> ## </span><span style='color: #949494;'># Groups: name [4]</span><span> ## name exam hw score ## </span><span style='color: #949494;font-style: italic;'><chr></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> </span><span style='color: #949494;font-style: italic;'><dbl></span><span> ## </span><span style='color: #BCBCBC;'>1</span><span> Alice 184 76 0.938 ## </span><span style='color: #BCBCBC;'>2</span><span> Bob 165 72 0.87 ## </span><span style='color: #BCBCBC;'>3</span><span> Carol 195 73 0.938 ## </span><span style='color: #BCBCBC;'>4</span><span> Dave 168 75 0.899 </span></CODE></PRE>