<br><br><br> .maintitle[<i class="fa fa-angle-double-right" aria-hidden="true"></i>intRoduction] <hr> <br> ### Marie-Hélène Brice ### Kevin Cazelles #### UQAM - March 26th, 2019 -- [<i class="fa fa-github" aria-hidden="true"></i>](https://github.com/inSileco/IntroR) .right[ ![:scale 20%](images/inSilecoLogo.png) ![:scale 20%](images/Rlogo.png) ] --- # Aims <br> ### Better ideas about what R actually is ### Better ideas about what R can do ### Better ideas about the topics covered --- # Outline ### <i class="fa fa-commenting-o" aria-hidden="true"></i> R? ~10min ### <i class="fa fa-wrench" aria-hidden="true"></i> Data manipulations ~1h15min ### <i class="fa fa-bar-chart" aria-hidden="true"></i> Data visualization (part 1) ~30min ### <i class="fa fa-cutlery" aria-hidden="true"></i> Lunch break 12pm ### <i class="fa fa-bar-chart" aria-hidden="true"></i> Data visualization (part 2) ~30min ### <i class="fa fa-map" aria-hidden="true"></i> Mapping ~1h30 ### <i class="fa fa-coffee" aria-hidden="true"></i> *Coffee break* 3pm ### <i class="fa fa-sitemap" aria-hidden="true"></i> Ordination ~1h30 --- class: inverse, center, middle # R? ## <i class="fa fa-commenting-o" aria-hidden="true"></i> --- # R <br> > R is a programming language and free software environment for statistical computing and graphics [...]. <br> > The R language is widely used among statisticians and data miners for developing statistical software and data analysis. [R (programming language) - Wikipedia](https://en.wikipedia.org/wiki/R_(programming_language)) --- # R is popular .center[![:scale 85%](images/redmonk.png)] [Python](https://www.python.org/) / [Julia](https://julialang.org/) --- # R is popular in the biological sciences ### For instance: 1. Ecology 2. Bioinformatics [(Bioconductor)](https://www.bioconductor.org/) 3. Medicine [<i class="fa fa-external-link" aria-hidden="true"></i>](https://r-medicine.com/) -- ### Why? 1. An incredible tool for statistics 2. 
Fairly accessible to non-programmers 3. A great diversity of packages 4. Improved workflows --- # R is popular .center[![:scale 72%](https://revolution-computing.typepad.com/.a/6a010534b1db25970b01b8d2594d25970c-800wi)] [*CRAN now has 10,000 R packages* Revolutions. January, 2017](https://blog.revolutionanalytics.com/2017/01/cran-10000.html) [MetaCRAN](https://www.r-pkg.org/) / [R Package Documentation](https://rdrr.io/) --- # R, RStudio, rOpenSci? ### Main links - [R project](https://www.r-project.org/) - [CRAN](https://cran.r-project.org/) - [RStudio](https://www.rstudio.com/) - [rOpenSci](https://ropensci.org) ### Get reliable documentation - [CRAN Manuals & Contributed](https://cran.r-project.org/) - [bookdown](https://bookdown.org/) - [RStudio](https://www.rstudio.com/) - [Cheat Sheets](https://www.rstudio.com/resources/cheatsheets/) - [Webinars](https://resources.rstudio.com/webinars) - [QCBS](https://qcbsrworkshops.github.io/Workshops/) - [Data Carpentry](https://datacarpentry.org/R-genomics/index.html) - [<i class="fa fa-stack-overflow" aria-hidden="true"></i>](https://stackoverflow.com/questions/tagged/r) --- class: inverse, center, middle # Data manipulation ## <i class="fa fa-wrench" aria-hidden="true"></i> --- # Let's start <br> - Open R console / R GUI / RStudio / ... -- ```R getwd() setwd("path2workingdirectory") ``` -- - Install the following packages ```R install.packages(c("tidyverse", "sf", "raster", "mapview", "vegan", "ade4", "scales", "RColorBrewer")) ``` --- # R's principles <br> ### 1. Everything in R is an object ### 2. Everything that happens in R is a function call ### 3. Interfaces to other software are part of R. [Extending-R by Chambers](https://www.crcpress.com/Extending-R/Chambers/p/book/9781498775717) --- # Everything in R is an object <br> ```r 2 # [1] 2 ``` -- ```r class(2) # [1] "numeric" ``` -- ```r class("A") # [1] "character" class(library) # [1] "function" plot # function (x, y, ...) 
# UseMethod("plot") # <bytecode: 0x56490cfc97f8> # <environment: namespace:graphics> ``` --- class: center # Everything that happens in R is a function call <br> ```R class(2) ``` ## <i class="fa fa-arrow-circle-o-down" aria-hidden="true"></i> ### `enter` ## <i class="fa fa-arrow-circle-o-down" aria-hidden="true"></i> ``` # [1] "numeric" ``` --- # Interfaces to other software are part of R <br> ### Five examples among many others: - [Rcpp](https://cran.r-project.org/web/packages/Rcpp/index.html) - [Reticulate](https://rstudio.github.io/reticulate/index.html) - [Rmarkdown](https://rmarkdown.rstudio.com/) - [plotly](https://plot.ly/r/) - [mapview](https://r-spatial.github.io/mapview/) --- # Basic commands ```r 2+2 # [1] 4 2*3 # [1] 6 3>2 # [1] TRUE ``` -- ```r let <- LETTERS[1:10] let2 = LETTERS[1:10] let # [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" let2 # [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" identical(let, let2) # [1] TRUE ``` -- ```r let3 <- sample(let, 5) let3 # [1] "J" "G" "H" "I" "E" ``` -- ```R # Function documentation ?sample ``` --- # Vectors **Create a vector** ```r vec <- c(2:5, 9:12, 2) vec # [1] 2 3 4 5 9 10 11 12 2 vec > 4 # [1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE ``` -- ```r let # [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" length(let) # [1] 10 ``` -- **Subset a vector** ```r let[1:2] # [1] "A" "B" let[c(1:2, 5:6)] # [1] "A" "B" "E" "F" let[-c(1,7)] # [1] "B" "C" "D" "E" "F" "H" "I" "J" vec[vec > 4] # [1] 5 9 10 11 12 let[let > "C"] # [1] "D" "E" "F" "G" "H" "I" "J" ``` --- # Matrices ```r # use arguments mat <- matrix(1:18, ncol = 3, nrow = 6) mat # [,1] [,2] [,3] # [1,] 1 7 13 # [2,] 2 8 14 # [3,] 3 9 15 # [4,] 4 10 16 # [5,] 5 11 17 # [6,] 6 12 18 dim(mat) # [1] 6 3 mat[1,2] # [1] 7 ``` -- ```r mat[1,] # [1] 1 7 13 mat[,2] # [1] 7 8 9 10 11 12 mat[2:4, 1:2] # [,1] [,2] # [1,] 2 8 # [2,] 3 9 # [3,] 4 10 ``` --- # Data frames ## <i class="fa fa-question-circle-o" aria-hidden="true"></i> -- ```r df <- data.frame( letter = let[1:5], 
val = vec[1:5], logic = vec[1:5]>2 ) df # letter val logic # 1 A 2 FALSE # 2 B 3 TRUE # 3 C 4 TRUE # 4 D 5 TRUE # 5 E 9 TRUE class(df) # [1] "data.frame" dim(df) # [1] 5 3 ``` --- # Data frames ```r df[1, 1] # [1] "A" ``` ```r df$letter # [1] "A" "B" "C" "D" "E" df[,1] # [1] "A" "B" "C" "D" "E" df[1] # letter # 1 A # 2 B # 3 C # 4 D # 5 E df[1:2] # letter val # 1 A 2 # 2 B 3 # 3 C 4 # 4 D 5 # 5 E 9 ``` ```r class(df$letter) # [1] "character" class(df$val) # [1] "numeric" ``` --- # Data frames <br> ```r library(datasets) ``` To see the list of available datasets: ```R data() ``` To access the documentation of a particular dataset: ```R ?CO2 ``` --- # Data frames ```r library(datasets) head(CO2) # Grouped Data: uptake ~ conc | Plant # Plant Type Treatment conc uptake # 1 Qn1 Quebec nonchilled 95 16.0 # 2 Qn1 Quebec nonchilled 175 30.4 # 3 Qn1 Quebec nonchilled 250 34.8 # 4 Qn1 Quebec nonchilled 350 37.2 # 5 Qn1 Quebec nonchilled 500 35.3 # 6 Qn1 Quebec nonchilled 675 39.2 names(CO2) # [1] "Plant" "Type" "Treatment" "conc" "uptake" summary(CO2) # Plant Type Treatment conc uptake # Qn1 : 7 Quebec :42 nonchilled:42 Min. : 95 Min. : 7.70 # Qn2 : 7 Mississippi:42 chilled :42 1st Qu.: 175 1st Qu.:17.90 # Qn3 : 7 Median : 350 Median :28.30 # Qc1 : 7 Mean : 435 Mean :27.21 # Qc3 : 7 3rd Qu.: 675 3rd Qu.:37.12 # Qc2 : 7 Max. :1000 Max. 
:45.50 # (Other):42 ``` -- ```r CO2$Plant # [1] Qn1 Qn1 Qn1 Qn1 Qn1 Qn1 Qn1 Qn2 Qn2 Qn2 Qn2 Qn2 Qn2 Qn2 Qn3 Qn3 Qn3 Qn3 Qn3 Qn3 Qn3 Qc1 # [23] Qc1 Qc1 Qc1 Qc1 Qc1 Qc1 Qc2 Qc2 Qc2 Qc2 Qc2 Qc2 Qc2 Qc3 Qc3 Qc3 Qc3 Qc3 Qc3 Qc3 Mn1 Mn1 # [45] Mn1 Mn1 Mn1 Mn1 Mn1 Mn2 Mn2 Mn2 Mn2 Mn2 Mn2 Mn2 Mn3 Mn3 Mn3 Mn3 Mn3 Mn3 Mn3 Mc1 Mc1 Mc1 # [67] Mc1 Mc1 Mc1 Mc1 Mc2 Mc2 Mc2 Mc2 Mc2 Mc2 Mc2 Mc3 Mc3 Mc3 Mc3 Mc3 Mc3 Mc3 # Levels: Qn1 < Qn2 < Qn3 < Qc1 < Qc3 < Qc2 < Mn3 < Mn2 < Mn1 < Mc2 < Mc3 < Mc1 ``` --- # Lists ```r mylist <- list(CO2 = CO2, mymat = mat, mylet = let[1], awesome = "cool") names(mylist) # [1] "CO2" "mymat" "mylet" "awesome" mylist$awesome # [1] "cool" mylist[3:4] # $mylet # [1] "A" # # $awesome # [1] "cool" ``` -- ```r class(mylist[2]) # [1] "list" class(mylist[[2]]) # [1] "matrix" mylist[[2]] # [,1] [,2] [,3] # [1,] 1 7 13 # [2,] 2 8 14 # [3,] 3 9 15 # [4,] 4 10 16 # [5,] 5 11 17 # [6,] 6 12 18 class(mylist[[2]][2,3]) # [1] "integer" mylist[[2]][2,3] # [1] 14 ``` --- # For loops ```r for (i in 1:4) { # actions print(i) } # [1] 1 # [1] 2 # [1] 3 # [1] 4 ``` ```r for (j in LETTERS[1:5]) { print(j) } # [1] "A" # [1] "B" # [1] "C" # [1] "D" # [1] "E" ``` ```r for (i in c(1,4,6)) { print(i) } # [1] 1 # [1] 4 # [1] 6 ``` --- # Logical conditions > >= < <= == != -- ```r vec <- c(2:5, 9:12, 2) vec # [1] 2 3 4 5 9 10 11 12 2 vec[1] > 4 # [1] FALSE vec > 4 # [1] FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE ``` -- `if` ... `else` ... ```r for (i in vec) { if (i <= 5) { # actions print(i) } else { print("nope") } } # [1] 2 # [1] 3 # [1] 4 # [1] 5 # [1] "nope" # [1] "nope" # [1] "nope" # [1] "nope" # [1] 2 ``` --- class: inverse, center, middle # Questions? 
## <i class="fa fa-question-circle-o" aria-hidden="true"></i> --- # Tidyverse - a metapackage <br> ```r library(tidyverse) ``` ### <i class="fa fa-wrench" aria-hidden="true"></i> A great tool belt for data science [<i class="fa fa-external-link" aria-hidden="true"></i>](https://www.tidyverse.org/) -- - <i class="fa fa-check" aria-hidden="true"></i> Pros: - well-documented, intuitive - efficient - very popular - <i class="fa fa-exclamation-triangle" aria-hidden="true"></i> Cons: - an alternative syntax for the same manipulations - may keep users from learning base R programming --- # Data manipulations <br> ### 1. Import data / read file(s) <i class="fa fa-arrow-circle-right" aria-hidden="true"></i> get R object(s) ### 2. Select / Filter <i class="fa fa-arrow-circle-right" aria-hidden="true"></i> find the data of interest ### 3. Join <i class="fa fa-arrow-circle-right" aria-hidden="true"></i> combine data ### 4. Mutate / Aggregate <i class="fa fa-arrow-circle-right" aria-hidden="true"></i> create new data ### 5. Cast and Melt <i class="fa fa-arrow-circle-right" aria-hidden="true"></i> format your data (see tidyr) ### 6. 
R object(s) <i class="fa fa-arrow-circle-right" aria-hidden="true"></i> Export data / write file(s) --- # Read a file ```r df2 <- read.csv("data/environ.csv") head(df2, 3) # dfs alt slo flo pH har pho nit amm oxy bdo # 1 3 934 6.176 84 79 45 1 20 0 122 27 # 2 22 932 3.434 100 80 40 2 20 10 103 19 # 3 102 914 3.638 180 83 52 5 22 5 105 35 class(df2) # [1] "data.frame" ``` -- ```r df3 <- read_csv("data/environ.csv") head(df3, 3) # # A tibble: 3 x 11 # dfs alt slo flo pH har pho nit amm oxy bdo # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> # 1 3 934 6.18 84 79 45 1 20 0 122 27 # 2 22 932 3.43 100 80 40 2 20 10 103 19 # 3 102 914 3.64 180 83 52 5 22 5 105 35 class(df3) # [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame" ``` -- - `read.table()` - [SQL in R](http://dept.stat.lsa.umich.edu/~jerrick/courses/stat701/notes/sql.html) - [MongoDB in R](https://jeroen.github.io/mongolite/) - [MariaDB](https://mariadb.com/kb/en/library/r-statistical-programming-using-mariadb-as-the-background-database/) --- # Doubs River Fish Dataset <br> .pull-left[ Verneaux (1973) dataset: - characterization of fish communities - 27 different species - 30 different sites - 11 environmental variables Load the Doubs River species data ```r library(ade4) data(doubs) # ?doubs ``` ] .pull-right[ ![:scale 50%](images/DoubsRiver.png) ] --- # Dataset 'doubs' in package 'ade4' <br> ```r class(doubs) # [1] "list" names(doubs) # [1] "env" "fish" "xy" "species" head(doubs$env) # dfs alt slo flo pH har pho nit amm oxy bdo # 1 3 934 6.176 84 79 45 1 20 0 122 27 # 2 22 932 3.434 100 80 40 2 20 10 103 19 # 3 102 914 3.638 180 83 52 5 22 5 105 35 # 4 185 854 3.497 253 80 72 10 21 0 110 13 # 5 215 849 3.178 264 81 84 38 52 20 80 62 # 6 324 846 3.497 286 79 60 20 15 0 102 53 ``` --- # Dataset 'doubs' in package 'ade4' <br> ```r as_tibble(doubs$env) # # A tibble: 30 x 11 # dfs alt slo flo pH har pho nit amm oxy bdo # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> # 1 3 934 6.18 
84 79 45 1 20 0 122 27 # 2 22 932 3.43 100 80 40 2 20 10 103 19 # 3 102 914 3.64 180 83 52 5 22 5 105 35 # 4 185 854 3.50 253 80 72 10 21 0 110 13 # 5 215 849 3.18 264 81 84 38 52 20 80 62 # 6 324 846 3.50 286 79 60 20 15 0 102 53 # 7 268 841 4.20 400 81 88 7 15 0 111 22 # 8 491 792 3.26 130 81 94 20 41 12 70 81 # 9 705 752 2.56 480 80 90 30 82 12 72 52 # 10 990 617 4.61 1000 77 82 6 75 1 100 43 # # … with 20 more rows ``` --- # Piping -- ```r res <- as_tibble(data.frame(res = log(diff(exp(1:6))))) res # # A tibble: 5 x 1 # res # <dbl> # 1 1.54 # 2 2.54 # 3 3.54 # 4 4.54 # 5 5.54 ``` -- ```r # library(magrittr) res <- 1:6 %>% exp %>% diff %>% log %>% as.data.frame(col.names = "res") %>% as_tibble res # # A tibble: 5 x 1 # . # <dbl> # 1 1.54 # 2 2.54 # 3 3.54 # 4 4.54 # 5 5.54 ``` --- # Select -- ```r denv <- as_tibble(doubs$env) dim(denv) # [1] 30 11 names(denv) # [1] "dfs" "alt" "slo" "flo" "pH" "har" "pho" "nit" "amm" "oxy" "bdo" head(denv, 20) # # A tibble: 20 x 11 # dfs alt slo flo pH har pho nit amm oxy bdo # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> # 1 3 934 6.18 84 79 45 1 20 0 122 27 # 2 22 932 3.43 100 80 40 2 20 10 103 19 # 3 102 914 3.64 180 83 52 5 22 5 105 35 # 4 185 854 3.50 253 80 72 10 21 0 110 13 # 5 215 849 3.18 264 81 84 38 52 20 80 62 # 6 324 846 3.50 286 79 60 20 15 0 102 53 # 7 268 841 4.20 400 81 88 7 15 0 111 22 # 8 491 792 3.26 130 81 94 20 41 12 70 81 # 9 705 752 2.56 480 80 90 30 82 12 72 52 # 10 990 617 4.61 1000 77 82 6 75 1 100 43 # 11 1234 483 3.74 1990 81 96 30 160 0 115 27 # 12 1324 477 2.83 2000 79 86 4 50 0 122 30 # 13 1436 450 3.09 2110 81 98 6 52 0 124 24 # 14 1522 434 2.56 2120 83 98 27 123 0 123 38 # 15 1645 415 1.79 2300 86 86 40 100 0 117 21 # 16 1859 375 3.04 1610 80 88 20 200 5 103 27 # 17 1985 348 1.79 2430 80 92 20 250 20 102 46 # 18 2110 332 2.20 2500 80 90 50 220 20 103 28 # 19 2246 310 1.79 2590 81 84 60 220 15 106 33 # 20 2477 286 2.20 2680 80 86 30 300 30 103 28 ``` --- # Select ```r 
head(denv[, c(1:2, 5)]) # # A tibble: 6 x 3 # dfs alt pH # <dbl> <dbl> <dbl> # 1 3 934 79 # 2 22 932 80 # 3 102 914 83 # 4 185 854 80 # 5 215 849 81 # 6 324 846 79 head(denv[, c("dfs", "alt", "pH")]) # # A tibble: 6 x 3 # dfs alt pH # <dbl> <dbl> <dbl> # 1 3 934 79 # 2 22 932 80 # 3 102 914 83 # 4 185 854 80 # 5 215 849 81 # 6 324 846 79 ``` --- # Select ```r denvS <- denv %>% select(dfs, alt, pH) head(denvS) # # A tibble: 6 x 3 # dfs alt pH # <dbl> <dbl> <dbl> # 1 3 934 79 # 2 22 932 80 # 3 102 914 83 # 4 185 854 80 # 5 215 849 81 # 6 324 846 79 ``` ```r denvS <- denv %>% select(alt, pH, dfs) head(denvS) # # A tibble: 6 x 3 # alt pH dfs # <dbl> <dbl> <dbl> # 1 934 79 3 # 2 932 80 22 # 3 914 83 102 # 4 854 80 185 # 5 849 81 215 # 6 846 79 324 ``` --- # Select ```r head(denv[, -c(1:2, 5)]) # # A tibble: 6 x 8 # slo flo har pho nit amm oxy bdo # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> # 1 6.18 84 45 1 20 0 122 27 # 2 3.43 100 40 2 20 10 103 19 # 3 3.64 180 52 5 22 5 105 35 # 4 3.50 253 72 10 21 0 110 13 # 5 3.18 264 84 38 52 20 80 62 # 6 3.50 286 60 20 15 0 102 53 # head(denv[, -c("dfs", "alt", "pH")]) won't work ``` ```r denv %>% select(-dfs, -alt, -pH) # # A tibble: 30 x 8 # slo flo har pho nit amm oxy bdo # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> # 1 6.18 84 45 1 20 0 122 27 # 2 3.43 100 40 2 20 10 103 19 # 3 3.64 180 52 5 22 5 105 35 # 4 3.50 253 72 10 21 0 110 13 # 5 3.18 264 84 38 52 20 80 62 # 6 3.50 286 60 20 15 0 102 53 # 7 4.20 400 88 7 15 0 111 22 # 8 3.26 130 94 20 41 12 70 81 # 9 2.56 480 90 30 82 12 72 52 # 10 4.61 1000 82 6 75 1 100 43 # # … with 20 more rows ``` --- # Filter ```r denv$alt # [1] 934 932 914 854 849 846 841 792 752 617 483 477 450 434 415 375 348 332 310 286 262 254 # [23] 246 241 231 214 206 195 183 172 ``` -- ```r denvF <- denv %>% dplyr::filter(alt > 400) denvF # # A tibble: 15 x 11 # dfs alt slo flo pH har pho nit amm oxy bdo # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> # 1 3 934 6.18 84 79 45 
1 20 0 122 27 # 2 22 932 3.43 100 80 40 2 20 10 103 19 # 3 102 914 3.64 180 83 52 5 22 5 105 35 # 4 185 854 3.50 253 80 72 10 21 0 110 13 # 5 215 849 3.18 264 81 84 38 52 20 80 62 # 6 324 846 3.50 286 79 60 20 15 0 102 53 # 7 268 841 4.20 400 81 88 7 15 0 111 22 # 8 491 792 3.26 130 81 94 20 41 12 70 81 # 9 705 752 2.56 480 80 90 30 82 12 72 52 # 10 990 617 4.61 1000 77 82 6 75 1 100 43 # 11 1234 483 3.74 1990 81 96 30 160 0 115 27 # 12 1324 477 2.83 2000 79 86 4 50 0 122 30 # 13 1436 450 3.09 2110 81 98 6 52 0 124 24 # 14 1522 434 2.56 2120 83 98 27 123 0 123 38 # 15 1645 415 1.79 2300 86 86 40 100 0 117 21 ``` --- # Filter <br> ```r denvF2 <- denv %>% dplyr::filter(alt > 400 & pH>=80) denvF2 # # A tibble: 11 x 11 # dfs alt slo flo pH har pho nit amm oxy bdo # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> # 1 22 932 3.43 100 80 40 2 20 10 103 19 # 2 102 914 3.64 180 83 52 5 22 5 105 35 # 3 185 854 3.50 253 80 72 10 21 0 110 13 # 4 215 849 3.18 264 81 84 38 52 20 80 62 # 5 268 841 4.20 400 81 88 7 15 0 111 22 # 6 491 792 3.26 130 81 94 20 41 12 70 81 # 7 705 752 2.56 480 80 90 30 82 12 72 52 # 8 1234 483 3.74 1990 81 96 30 160 0 115 27 # 9 1436 450 3.09 2110 81 98 6 52 0 124 24 # 10 1522 434 2.56 2120 83 98 27 123 0 123 38 # 11 1645 415 1.79 2300 86 86 40 100 0 117 21 ``` --- # Filter <br> ```r denvF3 <- denv %>% dplyr::filter(alt > 600 | pH>=82) denvF3 # # A tibble: 14 x 11 # dfs alt slo flo pH har pho nit amm oxy bdo # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> # 1 3 934 6.18 84 79 45 1 20 0 122 27 # 2 22 932 3.43 100 80 40 2 20 10 103 19 # 3 102 914 3.64 180 83 52 5 22 5 105 35 # 4 185 854 3.50 253 80 72 10 21 0 110 13 # 5 215 849 3.18 264 81 84 38 52 20 80 62 # 6 324 846 3.50 286 79 60 20 15 0 102 53 # 7 268 841 4.20 400 81 88 7 15 0 111 22 # 8 491 792 3.26 130 81 94 20 41 12 70 81 # 9 705 752 2.56 480 80 90 30 82 12 72 52 # 10 990 617 4.61 1000 77 82 6 75 1 100 43 # 11 1522 434 2.56 2120 83 98 27 123 0 123 38 # 12 
1645 415 1.79 2300 86 86 40 100 0 117 21 # 13 3947 195 1.39 4320 83 100 74 400 30 81 45 # 14 4530 172 1.10 6900 82 109 65 160 10 82 44 ``` --- # Filter <br> Selection + Filter -- ```r denvF4 <- denv %>% select(dfs, alt, pH) %>% dplyr::filter(alt > 400 & pH>=80) denvF4 # # A tibble: 11 x 3 # dfs alt pH # <dbl> <dbl> <dbl> # 1 22 932 80 # 2 102 914 83 # 3 185 854 80 # 4 215 849 81 # 5 268 841 81 # 6 491 792 81 # 7 705 752 80 # 8 1234 483 81 # 9 1436 450 81 # 10 1522 434 83 # 11 1645 415 86 ``` --- # Mutate <br> -- ```r denvM <- denv %>% mutate(pH2 = pH + 1) denvM # # A tibble: 30 x 12 # dfs alt slo flo pH har pho nit amm oxy bdo pH2 # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> # 1 3 934 6.18 84 79 45 1 20 0 122 27 80 # 2 22 932 3.43 100 80 40 2 20 10 103 19 81 # 3 102 914 3.64 180 83 52 5 22 5 105 35 84 # 4 185 854 3.50 253 80 72 10 21 0 110 13 81 # 5 215 849 3.18 264 81 84 38 52 20 80 62 82 # 6 324 846 3.50 286 79 60 20 15 0 102 53 80 # 7 268 841 4.20 400 81 88 7 15 0 111 22 82 # 8 491 792 3.26 130 81 94 20 41 12 70 81 82 # 9 705 752 2.56 480 80 90 30 82 12 72 52 81 # 10 990 617 4.61 1000 77 82 6 75 1 100 43 78 # # … with 20 more rows ``` --- # Mutate <br> ```r denvM2 <- denv %>% mutate(index = 2*nit + pho + amm) denvM2 # # A tibble: 30 x 12 # dfs alt slo flo pH har pho nit amm oxy bdo index # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> # 1 3 934 6.18 84 79 45 1 20 0 122 27 41 # 2 22 932 3.43 100 80 40 2 20 10 103 19 52 # 3 102 914 3.64 180 83 52 5 22 5 105 35 54 # 4 185 854 3.50 253 80 72 10 21 0 110 13 52 # 5 215 849 3.18 264 81 84 38 52 20 80 62 162 # 6 324 846 3.50 286 79 60 20 15 0 102 53 50 # 7 268 841 4.20 400 81 88 7 15 0 111 22 37 # 8 491 792 3.26 130 81 94 20 41 12 70 81 114 # 9 705 752 2.56 480 80 90 30 82 12 72 52 206 # 10 990 617 4.61 1000 77 82 6 75 1 100 43 157 # # … with 20 more rows ``` --- # Mutate ```r denvM3 <- denv %>% select(dfs, alt, pH, nit, pho, amm) %>% dplyr::filter(alt > 400) %>% 
mutate(pH2 = pH + 1) %>% mutate(index = 2*nit + pho + amm) denvM3 # # A tibble: 15 x 8 # dfs alt pH nit pho amm pH2 index # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> # 1 3 934 79 20 1 0 80 41 # 2 22 932 80 20 2 10 81 52 # 3 102 914 83 22 5 5 84 54 # 4 185 854 80 21 10 0 81 52 # 5 215 849 81 52 38 20 82 162 # 6 324 846 79 15 20 0 80 50 # 7 268 841 81 15 7 0 82 37 # 8 491 792 81 41 20 12 82 114 # 9 705 752 80 82 30 12 81 206 # 10 990 617 77 75 6 1 78 157 # 11 1234 483 81 160 30 0 82 350 # 12 1324 477 79 50 4 0 80 104 # 13 1436 450 81 52 6 0 82 110 # 14 1522 434 83 123 27 0 84 273 # 15 1645 415 86 100 40 0 87 240 ``` --- # Mutate ```r model <- denv %>% select(dfs, alt, pH, nit, pho, amm) %>% dplyr::filter(alt > 400) %>% mutate(pH2 = pH + 1) %>% mutate(index = 2*nit + pho + amm) %>% lm(index ~ pH2, data = .) summary(model) # # Call: # lm(formula = index ~ pH2, data = .) # # Residuals: # Min 1Q Median 3Q Max # -117.64 -66.20 -23.96 54.46 212.04 # # Coefficients: # Estimate Std. Error t value Pr(>|t|) # (Intercept) -1243.01 959.57 -1.295 0.218 # pH2 16.84 11.74 1.435 0.175 # # Residual standard error: 93.11 on 13 degrees of freedom # Multiple R-squared: 0.1367, Adjusted R-squared: 0.07033 # F-statistic: 2.059 on 1 and 13 DF, p-value: 0.1749 ``` --- class: inverse, center, middle # Questions? 
## <i class="fa fa-question-circle-o" aria-hidden="true"></i> --- # Join <br> -- ```r dfis <- doubs$fish head(denv, 4) # # A tibble: 4 x 11 # dfs alt slo flo pH har pho nit amm oxy bdo # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> # 1 3 934 6.18 84 79 45 1 20 0 122 27 # 2 22 932 3.43 100 80 40 2 20 10 103 19 # 3 102 914 3.64 180 83 52 5 22 5 105 35 # 4 185 854 3.50 253 80 72 10 21 0 110 13 head(dfis, 4) # Cogo Satr Phph Neba Thth Teso Chna Chto Lele Lece Baba Spbi Gogo Eslu Pefl Rham Legi Scer # 1 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 # 2 0 5 4 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 # 3 0 5 5 5 0 0 0 0 0 0 0 0 0 1 0 0 0 0 # 4 0 4 5 5 0 0 0 0 0 1 0 0 1 2 2 0 0 0 # Cyca Titi Abbr Icme Acce Ruru Blbj Alal Anan # 1 0 0 0 0 0 0 0 0 0 # 2 0 0 0 0 0 0 0 0 0 # 3 0 0 0 0 0 0 0 0 0 # 4 0 1 0 0 0 0 0 0 0 ``` --- # Join <br> ```r denv <- denv %>% mutate(idSite = 1:nrow(denv)) head(denv, 4) # # A tibble: 4 x 12 # dfs alt slo flo pH har pho nit amm oxy bdo idSite # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> # 1 3 934 6.18 84 79 45 1 20 0 122 27 1 # 2 22 932 3.43 100 80 40 2 20 10 103 19 2 # 3 102 914 3.64 180 83 52 5 22 5 105 35 3 # 4 185 854 3.50 253 80 72 10 21 0 110 13 4 dfis$idSite <- 1:nrow(denv) head(dfis, 4) # Cogo Satr Phph Neba Thth Teso Chna Chto Lele Lece Baba Spbi Gogo Eslu Pefl Rham Legi Scer # 1 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 # 2 0 5 4 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 # 3 0 5 5 5 0 0 0 0 0 0 0 0 0 1 0 0 0 0 # 4 0 4 5 5 0 0 0 0 0 1 0 0 1 2 2 0 0 0 # Cyca Titi Abbr Icme Acce Ruru Blbj Alal Anan idSite # 1 0 0 0 0 0 0 0 0 0 1 # 2 0 0 0 0 0 0 0 0 0 2 # 3 0 0 0 0 0 0 0 0 0 3 # 4 0 1 0 0 0 0 0 0 0 4 ``` --- # Join <br> ```r dmerg <- denv %>% inner_join(dfis) head(dmerg) # # A tibble: 6 x 39 # dfs alt slo flo pH har pho nit amm oxy bdo idSite Cogo Satr Phph # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <dbl> <dbl> # 1 3 934 6.18 84 79 45 1 20 0 122 27 1 0 3 0 # 2 22 932 3.43 100 80 40 2 20 10 103 
19 2 0 5 4 # 3 102 914 3.64 180 83 52 5 22 5 105 35 3 0 5 5 # 4 185 854 3.50 253 80 72 10 21 0 110 13 4 0 4 5 # 5 215 849 3.18 264 81 84 38 52 20 80 62 5 0 2 3 # 6 324 846 3.50 286 79 60 20 15 0 102 53 6 0 3 4 # # … with 24 more variables: Neba <dbl>, Thth <dbl>, Teso <dbl>, Chna <dbl>, Chto <dbl>, # # Lele <dbl>, Lece <dbl>, Baba <dbl>, Spbi <dbl>, Gogo <dbl>, Eslu <dbl>, Pefl <dbl>, # # Rham <dbl>, Legi <dbl>, Scer <dbl>, Cyca <dbl>, Titi <dbl>, Abbr <dbl>, Icme <dbl>, # # Acce <dbl>, Ruru <dbl>, Blbj <dbl>, Alal <dbl>, Anan <dbl> dim(dmerg) # [1] 30 39 ``` --- # Join <br> ### Several ways of joining data frames [(see documentation)](https://dplyr.tidyverse.org/reference/join.html) ```r dmerg1 <- denv[-1,] %>% inner_join(dfis[-2,]) dim(dmerg1) # [1] 28 39 ``` -- ```r dmerg2 <- denv[-1,] %>% right_join(dfis[-2,]) dmerg3 <- denv[-1,] %>% left_join(dfis[-2,]) dmerg4 <- denv[-1,] %>% full_join(dfis[-2,]) dim(dmerg2) # [1] 29 39 dim(dmerg3) # [1] 29 39 dim(dmerg4) # [1] 30 39 ``` --- # Aggregate <br> -- ```r denv2 <- denv %>% mutate(nit2 = nit >=125) denv2 # # A tibble: 30 x 13 # dfs alt slo flo pH har pho nit amm oxy bdo idSite nit2 # <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <lgl> # 1 3 934 6.18 84 79 45 1 20 0 122 27 1 FALSE # 2 22 932 3.43 100 80 40 2 20 10 103 19 2 FALSE # 3 102 914 3.64 180 83 52 5 22 5 105 35 3 FALSE # 4 185 854 3.50 253 80 72 10 21 0 110 13 4 FALSE # 5 215 849 3.18 264 81 84 38 52 20 80 62 5 FALSE # 6 324 846 3.50 286 79 60 20 15 0 102 53 6 FALSE # 7 268 841 4.20 400 81 88 7 15 0 111 22 7 FALSE # 8 491 792 3.26 130 81 94 20 41 12 70 81 8 FALSE # 9 705 752 2.56 480 80 90 30 82 12 72 52 9 FALSE # 10 990 617 4.61 1000 77 82 6 75 1 100 43 10 FALSE # # … with 20 more rows ``` --- # Aggregate <br> ```r denv2$alt2 <- "low" denv2$alt2[denv2$alt>450] <- "medium" denv2$alt2[denv2$alt>750] <- "high" denv2$alt2 <- as.factor(denv2$alt2) denv2 # # A tibble: 30 x 14 # dfs alt slo flo pH har pho nit amm oxy bdo idSite nit2 alt2 # 
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <lgl> <fct> # 1 3 934 6.18 84 79 45 1 20 0 122 27 1 FALSE high # 2 22 932 3.43 100 80 40 2 20 10 103 19 2 FALSE high # 3 102 914 3.64 180 83 52 5 22 5 105 35 3 FALSE high # 4 185 854 3.50 253 80 72 10 21 0 110 13 4 FALSE high # 5 215 849 3.18 264 81 84 38 52 20 80 62 5 FALSE high # 6 324 846 3.50 286 79 60 20 15 0 102 53 6 FALSE high # 7 268 841 4.20 400 81 88 7 15 0 111 22 7 FALSE high # 8 491 792 3.26 130 81 94 20 41 12 70 81 8 FALSE high # 9 705 752 2.56 480 80 90 30 82 12 72 52 9 FALSE high # 10 990 617 4.61 1000 77 82 6 75 1 100 43 10 FALSE medium # # … with 20 more rows ``` --- # Aggregate <br> ```r denv2 %>% group_by(nit2) %>% summarize(n()) # # A tibble: 2 x 2 # nit2 `n()` # <lgl> <int> # 1 FALSE 14 # 2 TRUE 16 denv2 %>% group_by(nit2, alt2) %>% summarize(n()) # # A tibble: 5 x 3 # # Groups: nit2 [2] # nit2 alt2 `n()` # <lgl> <fct> <int> # 1 FALSE high 9 # 2 FALSE low 3 # 3 FALSE medium 2 # 4 TRUE low 15 # 5 TRUE medium 1 ``` --- # Aggregate <br> ```r denv2 %>% group_by(nit2) %>% summarize(n()) # # A tibble: 2 x 2 # nit2 `n()` # <lgl> <int> # 1 FALSE 14 # 2 TRUE 16 denv2 %>% group_by(nit2, alt2) %>% summarize(nb_obs = n(), mean_amm = mean(amm), sd_oxy = sd(oxy)) # # A tibble: 5 x 5 # # Groups: nit2 [2] # nit2 alt2 nb_obs mean_amm sd_oxy # <lgl> <fct> <int> <dbl> <dbl> # 1 FALSE high 9 6.56 18.6 # 2 FALSE low 3 0 3.79 # 3 FALSE medium 2 0.5 15.6 # 4 TRUE low 15 37.9 20.6 # 5 TRUE medium 1 0 NA ``` --- # Write files ### Write data frames ```r write.csv(dmerg, "output/dmerg.csv") write_csv(dmerg, "output/dmerg2.csv") ``` -- ### Write any R objects ```r saveRDS(doubs, "output/doubs.rds") # readRDS("output/doubs.rds") write_rds(doubs, "output/doubs.rds") # read_rds ``` --- class: inverse, center, middle # Data visualization ## <i class="fa fa-bar-chart" aria-hidden="true"></i> --- # Data visualization <br> ## [Material +](https://insileco.github.io/Visualisation-SentinelleNord/#1) ## 
[bookdown](https://bookdown.org/) --- # Components <br> .pull-left[ - "a chart is worth a thousand words" ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-43-1.png" style="display: block; margin: auto;" /> ] --- # Components <br> .pull-left[ - "a chart is worth a thousand words" - plot window ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-44-1.png" style="display: block; margin: auto;" /> ] --- # Components <br> .pull-left[ - "a chart is worth a thousand words" - plot window - plot region ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-45-1.png" style="display: block; margin: auto;" /> ] --- # Components <br> .pull-left[ - "a chart is worth a thousand words" - plot window - plot region - data ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-46-1.png" style="display: block; margin: auto;" /> ] --- # Components <br> .pull-left[ - "a chart is worth a thousand words" - plot window - plot region - data - axes ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-47-1.png" style="display: block; margin: auto;" /> ] --- # Components <br> .pull-left[ - "a chart is worth a thousand words" - plot window - plot region - data - axes - title ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-48-1.png" style="display: block; margin: auto;" /> ] --- # Components <br> .pull-left[ - "a chart is worth a thousand words" - plot window - plot region - data - axes - title - legend ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-49-1.png" style="display: block; margin: auto;" /> ] --- # graphics vs grid <br> .center[![:scale 80%](images/Murrell2015.jpg)] Murrell, P. (2015) [The gridGraphics Package](https://journal.r-project.org/archive/2015-1/murrell.pdf). The R Journal. --- # Workflows ### Base plot ```R dev(...) par(...) plot(...) fun1(...) fun2(...) dev.off() ``` <br> ### ggplot2 ```R myplot <- ggplot() + gg_XX1 + gg_XX2 + gg_XX3 myplot ggsave(...) 
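# e.g., a concrete call might look like this (hypothetical file name):
# ggsave("output/myplot.png", plot = myplot, width = 6, height = 5, units = "in")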
``` --- # Hundreds of packages <br> ### [List of packages](https://insileco.github.io/wiki/rgraphpkgs/) ### [r-graph-gallery](https://www.r-graph-gallery.com/) ### [data-to-viz](https://www.data-to-viz.com/) ### [Cookbook for R](http://www.cookbook-r.com/Graphs/) --- # Basic plots .pull-left[ ```r # denv as used above plot(x = denv2$alt, y = denv2$amm) ``` <img src="index_files/figure-html/unnamed-chunk-50-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r p <- denv2 %>% ggplot(aes(x = alt, y = amm)) + geom_point() p ``` <img src="index_files/figure-html/unnamed-chunk-51-1.png" style="display: block; margin: auto;" /> ] --- # Basic plots .pull-left[ ```r plot(x = denv2$alt, y = denv$amm, type = "l") ``` <img src="index_files/figure-html/unnamed-chunk-52-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r p <- denv2 %>% ggplot(aes(x = alt, y = amm)) + geom_line() p ``` <img src="index_files/figure-html/unnamed-chunk-53-1.png" style="display: block; margin: auto;" /> ] --- # Basic plots .pull-left[ ```r par(las = 1) plot(x = denv2$alt, y = denv$amm, type = "l", xlab = "altitude", ylab = "amm") ``` <img src="index_files/figure-html/unnamed-chunk-54-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r p <- denv2 %>% ggplot(aes(x = alt, y = amm)) + geom_line() p ``` <img src="index_files/figure-html/unnamed-chunk-55-1.png" style="display: block; margin: auto;" /> ] --- # Basic plots .pull-left[ ```r par(las = 1) plot(x = denv2$alt, y = denv$amm, type = "l", xlab = "altitude", ylab = "amm") ``` <img src="index_files/figure-html/unnamed-chunk-56-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r p <- denv2 %>% ggplot(aes(x = alt, y = amm)) + geom_line() + xlab("altitude") + theme_bw() p ``` <img src="index_files/figure-html/unnamed-chunk-57-1.png" style="display: block; margin: auto;" /> ] --- # Basic plots .pull-left[ ```r par(las = 1) plot(x = denv2$alt, y = denv$amm, pch = 19, col = denv2$nit2) ``` <img 
src="index_files/figure-html/unnamed-chunk-58-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r p <- denv2 %>% ggplot(aes(x = alt, y = amm, colour = nit2)) + geom_point() + xlab("altitude") + theme_bw() p ``` <img src="index_files/figure-html/unnamed-chunk-59-1.png" style="display: block; margin: auto;" /> ] --- # Basic plots ```r colors()[1:50] # [1] "white" "aliceblue" "antiquewhite" "antiquewhite1" "antiquewhite2" # [6] "antiquewhite3" "antiquewhite4" "aquamarine" "aquamarine1" "aquamarine2" # [11] "aquamarine3" "aquamarine4" "azure" "azure1" "azure2" # [16] "azure3" "azure4" "beige" "bisque" "bisque1" # [21] "bisque2" "bisque3" "bisque4" "black" "blanchedalmond" # [26] "blue" "blue1" "blue2" "blue3" "blue4" # [31] "blueviolet" "brown" "brown1" "brown2" "brown3" # [36] "brown4" "burlywood" "burlywood1" "burlywood2" "burlywood3" # [41] "burlywood4" "cadetblue" "cadetblue1" "cadetblue2" "cadetblue3" # [46] "cadetblue4" "chartreuse" "chartreuse1" "chartreuse2" "chartreuse3" library(RColorBrewer) ``` --- # Basic plots .pull-left[ ```r par(las = 1) plot(x = denv2$alt, y = denv$amm, pch = 19, xlab = "altitude", ylab = "amm", col = c("salmon1", "turquoise2")[denv2$nit2+1], cex = 2) ``` <img src="index_files/figure-html/unnamed-chunk-61-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r p <- denv2 %>% ggplot(aes(x = alt, y = amm, colour = nit2)) + geom_point(size = 3) + xlab("altitude") + theme_bw() p ``` <img src="index_files/figure-html/unnamed-chunk-62-1.png" style="display: block; margin: auto;" /> ] --- # Basic plots .pull-left[ ```r par(las = 1, mar = c(4,4,1,1)) plot(x = denv2$alt, y = denv$amm, pch = 19, xlab = "altitude", ylab = "amm", col = c("salmon1", "turquoise2")[denv2$nit2+1], cex = 1.5) ``` <img src="index_files/figure-html/unnamed-chunk-63-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r p <- denv2 %>% ggplot(aes(x = alt, y = amm, colour = nit2)) + geom_point(size = 3) + xlab("altitude") + theme_bw() 
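# to set the two colours manually, as in the base plot on the left
# (a sketch; assumes the default FALSE/TRUE order of nit2), one could add:
# p <- p + scale_colour_manual(values = c("salmon1", "turquoise2"))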
p ``` <img src="index_files/figure-html/unnamed-chunk-64-1.png" style="display: block; margin: auto;" /> ] --- # Export your figure <br> ```R png("output/figb.png", units = "in", width = 6, height = 5, res = 300) par(las = 1, mar = c(4,4,1,1)) plot(x = denv2$alt, y = denv2$amm, pch = 19, xlab = "altitude", ylab = "amm", col = c("salmon1", "turquoise2")[denv2$nit2+1], cex = 1.5) dev.off() ``` --- # Export your figure <br> ```R p <- denv2 %>% ggplot(aes(x = alt, y = amm, colour = nit2)) + geom_point(size = 3) + xlab("altitude") + theme_bw() ggsave("output/figg.png", units = "in", width = 6, height = 5) ``` --- # Hints <br> ### Base plots - remove all components and add one layer at a time - use `?par` as well as the help of the functions you'll use <br> ### ggplot2 - https://ggplot2.tidyverse.org/ - find the geom (`geom_XXX`) you need: [ggplot2 extensions](https://www.ggplot2-exts.org/) --- # Other graphs? <br> ## [Cookbook for R](http://www.cookbook-r.com/Graphs/) ## [r-graph-gallery](https://www.r-graph-gallery.com/) --- # Boxplots: `oxy` vs `alt2` in `denv2` -- ### Baseplot - `boxplot()` - `par()` -- ### ggplot2 - `geom_boxplot()` --- # Boxplots .pull-left[ ```r par(las = 1) boxplot(oxy~alt2, data = denv2) ``` <img src="index_files/figure-html/unnamed-chunk-65-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r p <- denv2 %>% ggplot(aes(x = alt2, y = oxy)) + geom_boxplot() p ``` <img src="index_files/figure-html/unnamed-chunk-66-1.png" style="display: block; margin: auto;" /> ] --- # Custom plot <br> - Let's create a 3-panel figure ```r mat <- matrix(c(1,1,2,3), 2, 2) mat # [,1] [,2] # [1,] 1 2 # [2,] 1 3 ``` --- # Custom plot - 3 panels <br> .pull-left[ ```R layout(mat, widths = c(1.5, 1)) layout.show(3) ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-68-1.png" style="display: block; margin: auto;" /> ] --- # Custom plot - 3 panels <br> .pull-left[ ```R layout(mat, widths = c(1.1, 1)) par(las = 1, mar = c(4,4,1,1)) #-- Panel 1 plot(1,1) #-- 
Panel 2 plot(x = denv2$alt, y = denv2$amm, pch = 19, xlab = "altitude", ylab = "amm", col = c("salmon1", "turquoise2")[denv2$nit2+1], cex = 1.2) #-- Panel 3 boxplot(amm~alt2, data = denv2, ylim = c(0, 80)) ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-69-1.png" style="display: block; margin: auto;" /> ] --- # Custom plot - Panel 1 .pull-left[ ```R plot(denv2$pho, -denv2$alt) ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-70-1.png" style="display: block; margin: auto;" /> ] --- # Custom plot - Panel 1 .pull-left[ ```R par(las = 1, bty = "l") plot(denv2$pho, -denv2$alt, ann = FALSE, axes = FALSE, type = "n") ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-71-1.png" style="display: block; margin: auto;" /> ] --- # Custom plot - Panel 1 .pull-left[ ```R par(las = 1, bty = "l") plot(denv2$pho, -denv2$alt, ann = FALSE, axes = FALSE, type = "n", xlim = c(0, 500), ylim = c(-1000, 0)) points(denv2$pho, -denv2$alt, pch = 19, cex = 1.1) ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-72-1.png" style="display: block; margin: auto;" /> ] --- # Custom plot - Panel 1 .pull-left[ ```R par(las = 1, bty = "l") plot(denv2$pho, -denv2$alt, ann = FALSE, axes = FALSE, type = "n", xlim = c(0, 500), ylim = c(-1000, 0)) lines(denv2$pho, -denv2$alt, lwd = 1.5) axis(1) axis(2, at = seq(-1000, 0, 200), labels = abs(seq(-1000, 0, 200))) ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-73-1.png" style="display: block; margin: auto;" /> ] --- # Custom plot - Panel 1 .pull-left[ ```R par(las = 1, bty = "l") plot(denv2$pho, -denv2$alt, ann = FALSE, axes = FALSE, type = "n", xlim = c(0, 500), ylim = c(-1000, 0)) lines(denv2$pho, -denv2$alt, lwd = 1.5) axis(1) axis(2, at = seq(-1000, 0, 200), labels = abs(seq(-1000, 0, 200))) title(main = "A", xlab = "Phosphorus", ylab = "Altitude") ``` ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-74-1.png" style="display: block; margin: auto;" /> ] ---
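# Custom plot - color trick

The recurring `c("salmon1", "turquoise2")[denv2$nit2 + 1]` idiom maps a binary variable to two colors by indexing: adding 1 turns the 0/1 values into the valid R indices 1 and 2. A standalone sketch with a made-up binary vector (`grp` is invented here and stands in for `denv2$nit2`):

```r
grp <- c(0, 1, 1, 0, 1)              # binary variable, e.g. denv2$nit2
pal <- c("salmon1", "turquoise2")    # color used for 0, color used for 1
cols <- pal[grp + 1]                 # 0 -> pal[1], 1 -> pal[2]
cols
```

The same trick extends to grouping variables with more levels: `pal[as.integer(factor(x))]` picks one color per level of `x`.

---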
# Custom plot ```r png("output/figure1.png", units = "in", width = 7, height = 6, res = 300) layout(mat, widths = c(1.2, 1)) par(mar = c(4,4,1,1)) #-- Panel 1 par(las = 1, bty = "l", xaxs = "i", yaxs = "i") plot(denv2$pho, -denv2$alt, ann = FALSE, axes = FALSE, type = "n", xlim = c(0, 500), ylim = c(-1000, -100)) lines(denv2$pho, -denv2$alt, lwd = 1.5) axis(1) axis(2, at = seq(-1000, 0, 200), labels = abs(seq(-1000, 0, 200))) title(main = "A", xlab = "Phosphorus", ylab = "Altitude") #-- Panel 2 par(bty = "o", xaxs = "r", yaxs = "r") plot(x = denv2$alt, y = denv2$amm, pch = 19, xlab = "altitude", ylab = "amm", col = c("salmon1", "turquoise2")[denv2$nit2+1], cex = 1.5) #-- Panel 3 boxplot(oxy~alt2, data = denv2) dev.off() ``` --- # Custom plot .center[![:scale 80%](output/figure1.png)] --- # Custom plot <br> - Another example: https://insileco.github.io/Visualisation-SentinelleNord/#33 - See https://insileco.github.io/VisualiseR/ - Partition with ggplot2 - [article on sthda](http://www.sthda.com/english/wiki/ggplot2-facet-split-a-plot-into-a-matrix-of-panels) - [see patchwork](https://github.com/thomasp85/patchwork) --- class: inverse, center, bottom background-image: url(https://upload.wikimedia.org/wikipedia/commons/thumb/a/ad/BlankMap-World_gray.svg/1405px-BlankMap-World_gray.svg.png) background-size: contain # Mapping with R ![:faic](map) <br> --- # Overview of the section <br> <img src="images/todo.png" style="float:right;width:170px;margin: 0px 0px"> 1. Why mapping in R? 2. Spatial data & Coordinate reference systems 3. Vector data with **`sf`** 4. Raster data with **`raster`** 5. Thematic maps 6. Interactive maps 7. Questions, discussion, and use of your data --- # Packages for this section <br> ```r library(sf) # spatial vector data library(raster) # spatial raster data library(RColorBrewer) # colors library(mapview) # interactive maps ``` --- # Why mapping in Ecology? 
.pull-left[ ![](images/thematic-map.png) > [R script for this map](https://mhbrice.github.io/Rspatial/Rspatial_script.html) ] .pull-right[ 1. show where your plots are 2. show the spatial distribution of your variables 3. show results of spatial analyses ] --- # Why use R for mapping? 1. Open-source, free 2. Workflow and reproducibility 3. Quite efficient - well-defined spatial classes - can read/write/convert many formats 4. Can also be used for spatial data manipulation and analysis as a GIS [![:faic](external-link)](https://mhbrice.github.io/Rspatial/Rspatial_script.html) --- class: inverse, center, middle # Spatial data --- # Vector data <br><br> .center[ ![](images/vector-data.png) ] --- # Raster data <br><br> .center[ ![](images/raster-data.png) ] --- # Geospatial data in R .column-left[ #### Abiotic data - [`raster`](https://cran.r-project.org/web/packages/raster/index.html) - [`marmap`](https://github.com/ericpante/marmap) - [`rnoaa`](https://github.com/ropensci/rnoaa) - [`rWBclimate`](https://github.com/ropensci/rWBclimate) - [`sdmpredictors`](https://cran.r-project.org/web/packages/sdmpredictors/index.html) ] .column-center[ #### Biotic data - [`rgbif`](https://github.com/ropensci/rgbif) - [`robis`](https://github.com/iobis/robis) - [`spocc`](https://github.com/ropensci/spocc) ] .column-right[ #### Base maps - [`ggmap`](https://github.com/dkahle/ggmap) - [`mregions`](https://github.com/ropenscilabs/mregions) - [`osmdata`](https://github.com/ropensci/osmdata) - [`raster`](https://cran.r-project.org/web/packages/raster/index.html) - [`rnaturalearth`](https://github.com/ropenscilabs/rnaturalearth) ] --- class: inverse, center, middle # Coordinate reference systems .center[ ![:scale 50%](http://gistbok.ucgis.org/sites/default/files/figure2-projections.png) ] --- # Geographic vs projected CRS <img src="index_files/figure-html/canada-1.png" width="648" style="display: block; margin: auto;" /> > [What are geographic coordinate 
systems?](http://desktop.arcgis.com/en/arcmap/10.3/guide-books/map-projections/about-geographic-coordinate-systems.htm) > <br> > [What are projected coordinate systems?](http://desktop.arcgis.com/en/arcmap/10.3/guide-books/map-projections/about-projected-coordinate-systems.htm) --- # Define CRS with EPSG or proj4string There are many, many ways to represent the 3-D shape of the earth and to project it on a 2-D plane - each CRS can be defined either by an `EPSG` code or a `proj4string` > The EPSG code is a numeric representation of a CRS, while the proj4string represents the full set of parameters spelled out in a string: > > <br> > > EPSG `4326` ![:faic](arrow-right) proj4 `+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs` > > <br> > > EPSG `32188` ![:faic](arrow-right) proj4 `+proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs` - All geographic files are created with a specific CRS - but it's not always defined! - To find CRS in any format: [Spatial Reference](http://spatialreference.org/) --- class: inverse, center, middle # Vector data with `sf` --- # Intro to Simple Features <br> <img src="https://user-images.githubusercontent.com/520851/34887433-ce1d130e-f7c6-11e7-83fc-d60ad4fae6bd.gif" style="float:right;width:150px;height:150px;"> - .alert[Simple features refers to a formal standard] (ISO 19125-1:2004) that describes how objects in the real world can be represented in computers - .alert[`sf` objects are easy to plot, manipulate, import and export] - Spatial objects are stored as data frames, with the feature geometries stored in list-columns - `tidyverse` friendly - .alert[GREAT documentation!] 
See [sf vignettes](https://cran.rstudio.com/web/packages/sf/index.html) --- # Intro to Simple Features .center[ ![:scale 60%](images/sf-classes.png) ] --- # Intro to Simple Features ![](images/sf_object.png) > [sf vignette #1](https://cran.r-project.org/web/packages/sf/vignettes/sf1.html) --- # Geospatial data in R Let's create a thematic map of Quebec (our study area) using data available in R via `getData()` from `raster`: - Canadian provincial boundaries - vector data - Climate data - raster data ```r # Create a new directory dir.create("data") can <- getData("GADM", country = "CAN", level = 1, path = "data") can ``` ``` # class : SpatialPolygonsDataFrame # features : 13 # extent : -141.0069, -52.61889, 41.67693, 83.11042 (xmin, xmax, ymin, ymax) # coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 # variables : 10 # names : GID_0, NAME_0, GID_1, NAME_1, VARNAME_1, NL_NAME_1, TYPE_1, ENGTYPE_1, CC_1, HASC_1 # min values : CAN, Canada, CAN.1_1, Alberta, Acadia|Nouvelle-Écosse, NA, Province, Province, 10, CA.AB # max values : CAN, Canada, CAN.9_1, Yukon, Yukon Territory|Territoire du Yukon|Yukon|Yukón, NA, Territoire, Territory, 62, CA.YT ``` --- # Prep vector data Convert from `sp` (`sf`'s ancestor) to `sf` objects to facilitate manipulation ```r can_sf <- st_as_sf(can) can_sf[1:3,] ``` ``` # Simple feature collection with 3 features and 10 fields # geometry type: MULTIPOLYGON # dimension: XY # bbox: xmin: -139.0522 ymin: 48.30861 xmax: -88.96722 ymax: 60.00208 # epsg (SRID): 4326 # proj4string: +proj=longlat +datum=WGS84 +no_defs # GID_0 NAME_0 GID_1 NAME_1 VARNAME_1 NL_NAME_1 # 1 CAN Canada CAN.1_1 Alberta <NA> <NA> # 6 CAN Canada CAN.2_1 British Columbia Colombie britannique|New Caledonia <NA> # 7 CAN Canada CAN.3_1 Manitoba <NA> <NA> # TYPE_1 ENGTYPE_1 CC_1 HASC_1 geometry # 1 Province Province 48 CA.AB MULTIPOLYGON (((-111.9534 4... # 6 Province Province 59 CA.BC MULTIPOLYGON (((-123.5406 4... 
# 7 Province Province 46 CA.MB MULTIPOLYGON (((-90.385 57.... ``` --- # Prep vector data Retrieve Quebec and surrounding provinces ```r neigh <- c("Québec", "Ontario", "Nova Scotia", "New Brunswick", "Newfoundland and Labrador") qc_neigh <- dplyr::filter(can_sf, NAME_1 %in% neigh) # or using base R # qc_neigh <- can_sf[can_sf$NAME_1 %in% neigh,] qc_neigh # Simple feature collection with 5 features and 10 fields # geometry type: MULTIPOLYGON # dimension: XY # bbox: xmin: -95.15598 ymin: 41.67693 xmax: -52.61889 ymax: 63.69792 # epsg (SRID): 4326 # proj4string: +proj=longlat +datum=WGS84 +no_defs # GID_0 NAME_0 GID_1 NAME_1 # 1 CAN Canada CAN.4_1 New Brunswick # 2 CAN Canada CAN.5_1 Newfoundland and Labrador # 3 CAN Canada CAN.7_1 Nova Scotia # 4 CAN Canada CAN.9_1 Ontario # 5 CAN Canada CAN.11_1 Québec # VARNAME_1 NL_NAME_1 TYPE_1 ENGTYPE_1 CC_1 HASC_1 # 1 Nouveau-Brunswick|Acadia <NA> Province Province 13 CA.NB # 2 Newfoundland|Terre-Neuve|Terre-Neuve-et-Labrador <NA> Province Province 10 CA.NF # 3 Acadia|Nouvelle-Écosse <NA> Province Province 12 CA.NS # 4 Upper Canada <NA> Province Province 35 CA.ON # 5 Lower Canada <NA> Province Province 24 CA.QC # geometry # 1 MULTIPOLYGON (((-66.84995 4... # 2 MULTIPOLYGON (((-53.37361 4... # 3 MULTIPOLYGON (((-66.01603 4... # 4 MULTIPOLYGON (((-74.48135 4... # 5 MULTIPOLYGON (((-74.64564 4... ``` --- # Prep vector data Change projection for a better representation of Quebec. 
> Here, I chose EPSG 32188, which corresponds to NAD83 - MTM zone 9 ```r qc_neigh_prj <- st_transform(qc_neigh, crs = 32188) qc_neigh_prj # Simple feature collection with 5 features and 10 fields # geometry type: MULTIPOLYGON # dimension: XY # bbox: xmin: -1260743 ymin: 4653809 xmax: 1873536 ymax: 7071370 # epsg (SRID): 32188 # proj4string: +proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs # GID_0 NAME_0 GID_1 NAME_1 # 1 CAN Canada CAN.4_1 New Brunswick # 2 CAN Canada CAN.5_1 Newfoundland and Labrador # 3 CAN Canada CAN.7_1 Nova Scotia # 4 CAN Canada CAN.9_1 Ontario # 5 CAN Canada CAN.11_1 Québec # VARNAME_1 NL_NAME_1 TYPE_1 ENGTYPE_1 CC_1 HASC_1 # 1 Nouveau-Brunswick|Acadia <NA> Province Province 13 CA.NB # 2 Newfoundland|Terre-Neuve|Terre-Neuve-et-Labrador <NA> Province Province 10 CA.NF # 3 Acadia|Nouvelle-Écosse <NA> Province Province 12 CA.NS # 4 Upper Canada <NA> Province Province 35 CA.ON # 5 Lower Canada <NA> Province Province 24 CA.QC # geometry # 1 MULTIPOLYGON (((833789.8 49... # 2 MULTIPOLYGON (((1840715 537... # 3 MULTIPOLYGON (((910807.4 48... # 4 MULTIPOLYGON (((227546.9 49... # 5 MULTIPOLYGON (((214478.2 49... ``` --- # Prep vector data It would take a while to plot because there are a lot of unnecessary details, so we can simplify the shape of the polygons using `st_simplify()`. ```r qc_neigh_simple <- st_simplify(qc_neigh_prj, dTolerance = 1000) # simplify geometry at a 1 km scale ``` --- # Simple plot - the default plot of an `sf` object creates multiple maps (one per attribute). - useful for exploring the spatial distribution of different variables. 
```r plot(qc_neigh_simple) ``` <img src="index_files/figure-html/base_plot-1.png" width="432" style="display: block; margin: auto;" /> --- # Simple plot ```r # retrieve Québec only qc_simple <- dplyr::filter(qc_neigh_simple, NAME_1=="Québec") plot(st_geometry(qc_neigh_simple), col = "grey65") plot(st_geometry(qc_simple), col = "blue3", add = TRUE) ``` <img src="index_files/figure-html/map_1-1.png" width="432" style="display: block; margin: auto;" /> --- # Create a layer of `MULTIPOINTS` - We sampled vegetation in 100 plots across Québec; - We want to plot them with points proportional to their species richness. - Let's create a data frame with coordinates + random values. ```r # Sample random points from our study area sample_pts <- st_sample(x = qc_simple, size = 100) # Create an attribute of fake species richness (with 5 to 50 species per plot) richness <- sample(x = 5:50, size = length(sample_pts), replace = TRUE) sample_pts <- st_sf(sample_pts, richness = richness) sample_pts # Simple feature collection with 100 features and 1 field # geometry type: POINT # dimension: XY # bbox: xmin: -121687.8 ymin: 5002564 xmax: 1178245 ymax: 6809289 # epsg (SRID): 32188 # proj4string: +proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs # First 10 features: # richness sample_pts # 1 14 POINT (771.6457 6024937) # 2 35 POINT (131842.4 5418371) # 3 22 POINT (373684.6 5445284) # 4 24 POINT (-90667.16 5625926) # 5 32 POINT (57234.65 6036940) # 6 8 POINT (262141 6543959) # 7 38 POINT (165641.7 6185047) # 8 10 POINT (253332.6 6762438) # 9 16 POINT (521013.7 6304468) # 10 42 POINT (223655.9 6388818) ``` --- # Get bioclimatic rasters - Assuming we are interested in the latitudinal temperature gradient, we add a raster of mean annual temperature (`bio1`) as a background to our map. - Let's use a low resolution so it does not take too long to plot. 
```r bioclim <- getData("worldclim", var = "bio", res = 10, path = "data") # There are 19 layers in this raster. plot(bioclim) ``` <img src="index_files/figure-html/temp-1.png" width="576" style="display: block; margin: auto;" /> --- # Prep raster data Look at the annual mean temperature raster data, `bio1`: ```r bioclim$bio1 # class : RasterLayer # dimensions : 900, 2160, 1944000 (nrow, ncol, ncell) # resolution : 0.1666667, 0.1666667 (x, y) # extent : -180, 180, -60, 90 (xmin, xmax, ymin, ymax) # coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 # data source : /home/kevcaz/Github/Tutorials/IntroR/data/wc10/bio1.bil # names : bio1 # values : -269, 314 (min, max) ``` -- We need to divide by 10 (because Worldclim temperature data are in °C * 10) ```r temp <- bioclim$bio1/10 temp # class : RasterLayer # dimensions : 900, 2160, 1944000 (nrow, ncol, ncell) # resolution : 0.1666667, 0.1666667 (x, y) # extent : -180, 180, -60, 90 (xmin, xmax, ymin, ymax) # coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 # data source : in memory # names : bio1 # values : -26.9, 31.4 (min, max) ``` --- # Prep raster data - projection Change the projection to match the polygons using `projectRaster()`. - `projectRaster()` requires the PROJ.4 format of the CRS: ```r st_crs(qc_simple)$proj4string # [1] "+proj=tmerc +lat_0=0 +lon_0=-73.5 +k=0.9999 +x_0=304800 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs" ``` ```r temp_prj <- projectRaster(temp, crs = st_crs(qc_simple)$proj4string) ``` --- # Prep raster data - crop and mask 1. `crop()` decreases the extent of a raster using the extent of another spatial object. `crop()` expects an `sp` object, so we need to convert the polygon first. 2. `mask()` keeps the raster values only in the area of interest and sets the rest to `NA`. 
```r temp_crop <- crop(temp_prj, as(qc_simple, "Spatial")) temp_mask <- mask(temp_crop, qc_simple) ``` ```r par(mfrow = c(1,2)) raster::plot(temp_crop, main = "crop") raster::plot(temp_mask, main = "mask") ``` <img src="index_files/figure-html/unnamed-chunk-81-1.png" width="504" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Thematic map --- # Creating a simple layout ```r my.layout <- layout(matrix(1:2, 2), heights = c(1,.2)) layout.show(my.layout) ``` <img src="index_files/figure-html/layout-1.png" width="396" style="display: block; margin: auto;" /> --- # Define color palette Build a color palette for the temperature raster using `brewer.pal()` and `colorRampPalette()` ```r display.brewer.pal(11, "RdYlBu") ``` <img src="index_files/figure-html/color-1.png" width="432" style="display: block; margin: auto;" /> ```r mypal1 <- rev(brewer.pal(11, "RdYlBu")) mypal1 # [1] "#313695" "#4575B4" "#74ADD1" "#ABD9E9" "#E0F3F8" "#FFFFBF" "#FEE090" "#FDAE61" # [9] "#F46D43" "#D73027" "#A50026" mypal2 <- colorRampPalette(mypal1) mypal2(16) # [1] "#313695" "#3E60A9" "#5487BD" "#74ADD1" "#98CAE1" "#BCE1ED" "#E0F3F8" "#F4FBD2" # [9] "#FEF4AF" "#FEE090" "#FDBE70" "#FA9857" "#F46D43" "#E04430" "#C62026" "#A50026" mypal <- mypal2(200) ``` --- # Step by step thematic map Plot the neighboring provinces as background ```r layout(matrix(1:2, 2), heights = c(1,.2)) par(las = 1, xaxs='i', yaxs='i', mar = c(2,3,0,0)) plot(st_geometry(qc_neigh_simple), col = '#b5cfbd', border = 'grey50', axes = TRUE, graticule = TRUE) # add graticules ``` <img src="index_files/figure-html/unnamed-chunk-82-1.png" width="396" style="display: block; margin: auto;" /> --- # Step by step thematic map Add the temperature raster ```r raster::image(temp_mask, add = TRUE, col = mypal) ``` <img src="index_files/figure-html/unnamed-chunk-83-1.png" width="396" style="display: block; margin: auto;" /> --- # Step by step thematic map Add the Quebec polygon's boundary on top ```r 
plot(st_geometry(qc_simple), add = TRUE, border = 'grey15', lwd = 1.4) ``` <img src="index_files/figure-html/unnamed-chunk-84-1.png" width="396" style="display: block; margin: auto;" /> --- # Step by step thematic map Add sample points with size proportional to their species richness ```r plot(st_geometry(sample_pts), add = TRUE, pch = 21, bg = "#63636388", col = "grey15", lwd = 1.4, cex = sample_pts$richness/25) # Size proportional to richness ``` <img src="index_files/figure-html/unnamed-chunk-85-1.png" width="396" style="display: block; margin: auto;" /> --- # Step by step thematic map Create a color legend manually ```r val <- range(values(temp_mask), na.rm = TRUE) val # [1] -9.870578 6.487884 mat <- as.matrix(seq(val[1], val[2], length = 200)) image(x = mat[,1], y = 1, z = mat, col = mypal, axes = FALSE, ann = FALSE) axis(1) mtext(side = 1, line = 1.8, text = 'Mean annual temperature (°C)') ``` <img src="index_files/figure-html/unnamed-chunk-86-1.png" width="432" style="display: block; margin: auto;" /> --- # Step by step thematic map Add the color legend under the map ```r par(mar = c(3,4,1,4)) raster::image(x = mat[,1], y = 1, z = mat, col = mypal, axes = FALSE, ann = FALSE) axis(1, cex.axis = .8) ``` <img src="index_files/figure-html/unnamed-chunk-87-1.png" width="396" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Interactive map --- # Interactive map ```r mapView(qc_simple) ```
--- # Interactive map ```r mapView(sample_pts, layer.name = "Species richness") ```
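
In the base plot earlier, point size came from `cex = sample_pts$richness/25`; more generally, any variable can be rescaled linearly onto a chosen size range. A base-R helper sketch (the function name and the toy `richness` values are invented here):

```r
# Map a numeric vector x linearly onto point sizes in [cex_min, cex_max]
cex_scale <- function(x, cex_min = 0.5, cex_max = 3) {
  cex_min + (x - min(x)) / diff(range(x)) * (cex_max - cex_min)
}
richness <- c(5, 14, 27, 50)   # made-up richness values
round(cex_scale(richness), 2)
# [1] 0.50 1.00 1.72 3.00
```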
--- # Interactive map ```r mapView(sample_pts, col.regions = "red", cex = 'richness', legend = FALSE, map.types = "Esri.WorldImagery") ```
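---

# Why projections matter

Euclidean distance on raw longitude/latitude treats degrees as if they were metres, which is why we project before measuring or buffering. As a reminder, the great-circle (haversine) distance can be computed in base R; a sketch with approximate coordinates for Montréal and Québec City (the function below is written for this slide, not taken from a package):

```r
# Great-circle distance (km) between two lon/lat points, spherical earth
haversine <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}
round(haversine(-73.57, 45.50, -71.21, 46.81))  # Montréal -> Québec City
# [1] 233
```

For real analyses, `st_distance()` from `sf` handles this (and proper ellipsoids) for you.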
--- # Great online resources .pull-left[ #### Good tutorials for spatial data in R - [Raster analysis in R](https://mgimond.github.io/megug2017/) - [Geocomputation with R](https://geocompr.robinlovelace.net/intro.html) - [Spatial data in R](https://github.com/Pakillo/R-GIS-tutorial/blob/master/R-GIS_tutorial.md) - [Document by Nicolas Casajus (fr)](https://qcbs.ca/wiki/_media/gisonr.pdf) - [r-spatial](http://r-spatial.org/) - [Tutorial on datacamp](https://www.datacamp.com/courses/spatial-analysis-in-r-with-sf-and-raster) - [R in space - inSileco](https://insileco.github.io/2018/04/14/r-in-space---a-series/) - [Geospatial analyses & maps with R](https://mhbrice.github.io/Rspatial/Rspatial_script.html) #### Get free data - [free data at country level](http://www.diva-gis.org/gdata) - [Quebec free data](http://mffp.gouv.qc.ca/le-ministere/acces-aux-donnees-gratuites/) - [find more spatial data](https://freegisdata.rtwilson.com/) - [create shapefiles online](http://geojson.io/) - EPSG: [link1](http://spatialreference.org/); [link2](http://epsg.io/) ] .pull-right[ #### Maps in R - [Introduction to visualising spatial data in R](https://cran.r-project.org/doc/contrib/intro-spatial-rl.pdf) - [Geocomputation with R](https://geocompr.robinlovelace.net/adv-map.html) - [choropleth](https://cengel.github.io/rspatial/4_Mapping.nb.html) - [leaflet](https://rstudio.github.io/leaflet/) - [Mapview](https://r-spatial.github.io/mapview/index.html) - [tmap](https://cran.r-project.org/web/packages/tmap/vignettes/tmap-nutshell.html) - [plotly](https://plot.ly/python/maps/) - [Animated maps](https://insileco.github.io/2017/07/05/animations-in-r-time-series-of-erythemal-irradiance-in-the-st.-lawrence/) #### `sf` manipulations - [sf vignette #4](https://cran.r-project.org/web/packages/sf/vignettes/sf4.html) - [Geocomputation with R](https://geocompr.robinlovelace.net/attr.html) - [Attribute manipulations](https://insileco.github.io/2018/04/09/r-in-space---attribute-manipulations/) - 
[Tidy spatial data in R](http://strimas.com/r/tidy-sf/) ] --- class: inverse, center, middle # Introduction to ordination: PCA + RDA ## <i class="fa fa-sitemap" aria-hidden="true"></i> --- # Ordination of multivariate data <br> .center[![:scale 70%](images/ordi1.png)] --- # Ordination of multivariate data <br> .center[![:scale 70%](images/ordi2.png)] --- # Ordination of multivariate data <br> .center[![:scale 70%](images/ordi3.png)] --- # Unconstrained vs constrained ordinations ### Unconstrained ordination - Reveal the main patterns in multivariate data (e.g. matrix of species abundance at different sites or matrix of environmental variables at different sites) - Describe relationships among the variables of a single matrix. - Descriptive method: no statistical test ### Constrained/canonical ordination - Explicitly relates two matrices: one dependent (response) matrix and one explanatory matrix. - This approach combines the techniques of ordination and multiple regression. - Explanatory method: statistical test <br> *Both types of ordination are based upon a comparison of all possible pairs of objects (or descriptors) using association measures.* --- # Unconstrained vs constrained ordinations <br> | Response data | Explanatory variables | Analysis | |---------------|------------------------|---------------------| | 1 variable | 1 variable | Simple regression | | 1 variable | m variables | Multiple regression | | p variables | - | Simple ordination | | p variables | m variables | Canonical ordination| --- # Steps to ordination 1. Explore your data: - multivariate data, species abundance or environmental variables, 1 or 2 matrices 2. Prep your data 3. Perform an ordination analysis: - Unconstrained to describe 1 matrix - Constrained to explain the species matrix by the environmental matrix 4. Interpret the output 5. Plot 6. 
For constrained ordination: variable selection + test of significance --- # Packages for this section ```r library(vegan) # for all multivariate methods library(ade4) # for the Doubs dataset library(scales) # for color transparency ``` --- class: inverse, center, middle # Explore data --- # Doubs River Fish Dataset .pull-left[ Verneaux (1973) dataset: - characterization of fish communities - 27 different species - 30 different sites - 11 environmental variables Load the Doubs River species data ```r data(doubs) # available in ade4 spe <- doubs$fish env <- doubs$env ``` ] .pull-right[ ![:scale 50%](images/DoubsRiver.png) ] --- # Explore environmental data ```r head(env) # first 6 rows # dfs alt slo flo pH har pho nit amm oxy bdo # 1 3 934 6.176 84 79 45 1 20 0 122 27 # 2 22 932 3.434 100 80 40 2 20 10 103 19 # 3 102 914 3.638 180 83 52 5 22 5 105 35 # 4 185 854 3.497 253 80 72 10 21 0 110 13 # 5 215 849 3.178 264 81 84 38 52 20 80 62 # 6 324 846 3.497 286 79 60 20 15 0 102 53 ``` --- # Explore environmental data ```r pairs(env) ``` <img src="index_files/figure-html/unnamed-chunk-91-1.png" width="792" style="display: block; margin: auto;" /> --- # Explore species data ```r head(spe) # Cogo Satr Phph Neba Thth Teso Chna Chto Lele Lece Baba Spbi Gogo Eslu Pefl Rham Legi Scer # 1 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 # 2 0 5 4 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 # 3 0 5 5 5 0 0 0 0 0 0 0 0 0 1 0 0 0 0 # 4 0 4 5 5 0 0 0 0 0 1 0 0 1 2 2 0 0 0 # 5 0 2 3 2 0 0 0 0 5 2 0 0 2 4 4 0 0 2 # 6 0 3 4 5 0 0 0 0 1 2 0 0 1 1 1 0 0 0 # Cyca Titi Abbr Icme Acce Ruru Blbj Alal Anan # 1 0 0 0 0 0 0 0 0 0 # 2 0 0 0 0 0 0 0 0 0 # 3 0 0 0 0 0 0 0 0 0 # 4 0 1 0 0 0 0 0 0 0 # 5 0 3 0 0 0 5 0 0 0 # 6 0 2 0 0 0 1 0 0 0 ``` --- # Explore species data Take a look at the distribution of species frequencies ```r ab <- table(unlist(spe)) barplot(ab, las = 1, col = grey(5:0/5), xlab = "Abundance class", ylab = "Frequency") ``` <img src="index_files/figure-html/unnamed-chunk-93-1.png" width="576" 
style="display: block; margin: auto;" /> .alert[Note the proportion of 0s.] --- # Double zero **How do we interpret the double absence of species in a matrix of species abundances (or presence-absence)?** <img style="float: right; width:50%;margin: 1%" src="images/double-zero.png"> - the .red[*presence*] of a species in 2 sites indicates .red[ecological resemblance] between these 2 sites - the sites provide a set of minimal conditions allowing the species to survive. - the .red[*absence*] of a species from 2 sites may be due to a .red[variety of causes]: competition, several dimensions of the niche (pH, light, temperature...), random dispersal process. .alert[A measured 0 (e.g. 0 mg/L, 0°C) is not the same as a 0 representing an absence of observation] --- class: inverse, center, middle # Data preparation --- # Distance measures .comment[Ordination analyses are based on the comparison of all possible pairs of objects. PCA and RDA are based on the Euclidean distance.] <br> Here we want to measure the ecological resemblance (distance or similarity) among sites. <img style="float: right; width:30%;margin: 1%" src="images/distMes.png"> - To compare sites based on their .alert[environmental variables], we can use the .alert[Euclidean distance] - To compare sites based on their .alert[species composition], we don't want to consider a double zero as a similarity. .alert[Euclidean distance is not appropriate]. - Solution 1: **Pre-transform** the species matrix to trick the PCA or RDA into using an appropriate distance such as Hellinger. 
- Solution 2: Run a correspondence analysis (CA), which is based on the Chi-square distance - Solution 3: Compute a distance matrix from the species matrix and run a PCoA --- # Standardization of environmental data Standardizing environmental variables is necessary as you cannot compare the effects of variables with different units ```r env.z <- decostand(env, method = "standardize") # env.z <- scale(env) ``` Standardization centers (mean `\(\mu\)` = 0) and scales (standard deviation `\(\sigma\)` = 1) the variables ```r apply(env.z, 2, mean) # dfs alt slo flo pH har # 2.307905e-17 1.814232e-18 -1.196236e-17 4.219715e-17 -5.547501e-18 3.348595e-16 # pho nit amm oxy bdo # 2.662801e-17 -4.104360e-17 -4.357409e-17 -2.636780e-16 6.859928e-17 apply(env.z, 2, sd) # dfs alt slo flo pH har pho nit amm oxy bdo # 1 1 1 1 1 1 1 1 1 1 1 ``` --- # Transforming species community data <br> .center[ ![](images/transformation.png)] --- # Transforming species community data .alert[Hellinger transformation] ```r spe.hel <- decostand(spe, method = "hellinger") ``` <br> ### Other transformation methods: - `method = "chi.square"` ![:faic](arrow-right) **Chi-2 transformation**. Using this transformation in a PCA yields results similar to those of a correspondence analysis (CA). - `method = "normalize"` ![:faic](arrow-right) **Chord transformation** - `method = "total"` ![:faic](arrow-right) **Species profile transformation** > According to [Legendre & Gallagher](http://adn.biol.umontreal.ca/~numericalecology/Reprints/Legendre_&_Gallagher.pdf): The Hellinger and chord transformations appear to be the best for general use with species data. 
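---

# Transforming species community data

The Hellinger transformation is simple enough to check by hand: divide each abundance by its site (row) total, then take the square root. A base-R sketch on a toy matrix of made-up counts, which should reproduce `decostand(spe_toy, method = "hellinger")`:

```r
# Toy sites x species matrix (2 sites, 3 species; invented counts)
spe_toy <- matrix(c(0, 3, 1,
                    5, 0, 5), nrow = 2, byrow = TRUE)
# Hellinger = square root of the relative abundance within each site (row)
# (dividing a matrix by rowSums() recycles column-wise, i.e. row by row here)
spe_toy_hel <- sqrt(spe_toy / rowSums(spe_toy))
round(spe_toy_hel, 3)
#       [,1]  [,2]  [,3]
# [1,] 0.000 0.866 0.500
# [2,] 0.707 0.000 0.707
```

Zeros stay zeros, so double zeros no longer dominate the Euclidean distances computed afterwards.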
--- class: inverse, center, middle # Unconstrained ordination ## PCA --- # Principal Component Analysis (PCA) - Starting from a multidimensional sites x descriptors matrix - where the descriptors can be species abundance (or presence-absence) or environmental variables - A PCA will preserve the maximum amount of variation in the data in a reduced number of dimensions (usually we use the first 2 dimensions) - The resulting synthetic variables are orthogonal (i.e. perpendicular and therefore uncorrelated) .center[ ![:scale 70%](images/pca_3d_2d.jpg)] --- # PCA - Multidimensional case <br/><br/> - **PC1** ![:faic](arrow-right) axis that maximizes the variance of the points that are projected perpendicularly onto the axis. - **PC2** ![:faic](arrow-right) must be perpendicular to PC1, but the direction is again the one in which variance is maximized when points are perpendicularly projected - **PC3** ![:faic](arrow-right) and so on: perpendicular to the first two axes <br/> .red[For a matrix containing *p* descriptors, a PCA produces *p* axes which are ordered according to the % of the variation in the data they explain] --- # PCA - Let's try it on the fish dataset! - For both PCA and RDA, we can use the `rda()` function from the vegan package - `rda(Y ~ X)` or `rda(Y, X)` ![:faic](arrow-right) RDA - `rda(Y)` or `rda(X)` ![:faic](arrow-right) PCA --- # PCA - Run a PCA on the Hellinger-transformed fish data and extract the results ```r spe.h.pca <- rda(spe.hel) summary(spe.h.pca) ... 
#
# Call:
# rda(X = spe.hel)
#
# Partitioning of variance:
#               Inertia Proportion
# Total          0.5023          1
# Unconstrained  0.5023          1
#
# Eigenvalues, and their contribution to the variance
#
# Importance of components:
#                          PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8
# Eigenvalue            0.2491 0.06592 0.04615 0.03723 0.02125 0.01662 0.01477 0.01297
# Proportion Explained  0.4959 0.13122 0.09188 0.07412 0.04230 0.03309 0.02940 0.02582
# Cumulative Proportion 0.4959 0.62715 0.71903 0.79315 0.83544 0.86853 0.89794 0.92376
#                           PC9     PC10    PC11     PC12     PC13     PC14     PC15     PC16
# Eigenvalue            0.01054 0.006666 0.00504 0.004258 0.002767 0.002612 0.001505 0.001387
# Proportion Explained  0.02098 0.013269 0.01003 0.008477 0.005507 0.005200 0.002996 0.002761
# Cumulative Proportion 0.94474 0.958011 0.96804 0.976521 0.982029 0.987229 0.990225 0.992986
...
```

---
# PCA - Interpretation of Output

```r
summary(spe.h.pca)
...
# Call:
# rda(X = spe.hel)
#
# Partitioning of variance:
#               Inertia Proportion
# Total          0.5023          1
# Unconstrained  0.5023          1
...
```

- `Inertia` ![:faic](arrow-right) general term for "variation" in the data.
- Total variance of the dataset (here the fish species) = 0.5023
- In a PCA, note that the "Total" and "Unconstrained" portions of the inertia are identical

---
# PCA - Interpretation of Output

```r
summary(spe.h.pca)
...
# Call:
# rda(X = spe.hel)
#
# Partitioning of variance:
#               Inertia Proportion
# Total          0.5023          1
# Unconstrained  0.5023          1
#
# Eigenvalues, and their contribution to the variance
#
# Importance of components:
#                          PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8
# Eigenvalue            0.2491 0.06592 0.04615 0.03723 0.02125 0.01662 0.01477 0.01297
# Proportion Explained  0.4959 0.13122 0.09188 0.07412 0.04230 0.03309 0.02940 0.02582
# Cumulative Proportion 0.4959 0.62715 0.71903 0.79315 0.83544 0.86853 0.89794 0.92376
...
```

- `Eigenvalue` ![:faic](arrow-right) measures the amount of variation represented by each ordination axis (Principal Component, or PC)
- One eigenvalue is associated with each PC (in this output there are 27 PCs, as this is the number of species)

<br/>
.center[**0.2491 + 0.06592 + ... = 0.5023 Total inertia**]

---
# PCA - Interpretation of Output

```r
summary(spe.h.pca)
...
# Call:
# rda(X = spe.hel)
#
# Partitioning of variance:
#               Inertia Proportion
# Total          0.5023          1
# Unconstrained  0.5023          1
#
# Eigenvalues, and their contribution to the variance
#
# Importance of components:
#                          PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8
# Eigenvalue            0.2491 0.06592 0.04615 0.03723 0.02125 0.01662 0.01477 0.01297
# Proportion Explained  0.4959 0.13122 0.09188 0.07412 0.04230 0.03309 0.02940 0.02582
# Cumulative Proportion 0.4959 0.62715 0.71903 0.79315 0.83544 0.86853 0.89794 0.92376
...
```

- `Proportion Explained` ![:faic](arrow-right) proportion of variation accounted for by each axis, obtained by dividing its eigenvalue by the total inertia.

<br/>
.center[**0.2491/0.5023 = 0.4959 or 49.59%**]

---
# PCA - Interpretation of Output

```r
summary(spe.h.pca)
...
# Species scores
#
#           PC1     PC2       PC3        PC4       PC5        PC6
# Cogo  0.17113 0.08669 -0.060772  0.2536941 -0.027774  0.0129709
# Satr  0.64097 0.02193 -0.232895 -0.1429053 -0.059150  0.0013299
# Phph  0.51106 0.19774  0.165053  0.0203364  0.105582  0.1226508
# Neba  0.38002 0.22219  0.235145 -0.0344577  0.126570  0.0570437
# Thth  0.16679 0.06494 -0.087248  0.2444247  0.016349  0.0537362
# Teso  0.07644 0.14707 -0.041152  0.2304754 -0.104984 -0.0424943
# Chna -0.18392 0.05238 -0.042963  0.0222676  0.071349  0.0299974
# Chto -0.14601 0.17844 -0.029561  0.0622957 -0.002615 -0.0839282
...
```

- `Species scores` ![:faic](arrow-right) coordinates of all descriptors in the multidimensional space of the PCA.

<br>

- Species always refers to your descriptors (the columns in your matrix), here the fish species.
Even if you run a PCA on environmental variables, the descriptors will still be called Species scores

---
# PCA - Interpretation of Output

```r
summary(spe.h.pca)
...
# Site scores (weighted sums of species scores)
#
#         PC1      PC2       PC3      PC4       PC5       PC6
# 1  0.370801 -0.49113 -1.018411 -0.58387 -0.491672 -0.622062
# 2  0.507019 -0.05688 -0.175969 -0.42418  0.407790  0.160725
# 3  0.464652  0.02937 -0.067360 -0.49566  0.324339  0.313053
# 4  0.299434  0.19037  0.241315 -0.54538  0.009838  0.197974
# 5 -0.003672  0.13483  0.515723 -0.53640 -0.796183 -0.208554
# 6  0.212943  0.16142  0.538108 -0.44366 -0.138553 -0.066196
# 7  0.440596 -0.01853  0.174782 -0.31336  0.171713  0.131172
# 8  0.032182 -0.53492 -0.354289 -0.07870 -0.125287 -0.632592
...
```

- `Site scores` ![:faic](arrow-right) coordinates of all sites in the multidimensional space of the PCA.

<br>

- Site refers to the rows in your dataset, here the different sites along the Doubs river (but they can be points in time, etc.)

---
# Selecting Important PCs

Select PCs that capture more variance than the average (Kaiser-Guttman criterion)

```r
ev <- spe.h.pca$CA$eig  # extract eigenvalues
barplot(ev, main = "Eigenvalues", col = "grey", las = 2)
abline(h = mean(ev), col = "red3", lwd = 2)
```

<img src="index_files/figure-html/unnamed-chunk-103-1.png" width="720" style="display: block; margin: auto;" />

---
# PCA - environmental variables

We can also run a PCA on the standardized environmental variables, to compare the environmental conditions of sites or to explore how environmental variables are correlated...

- Run a PCA on the standardized environmental variables and extract the results

```r
env.pca <- rda(env.z)
summary(env.pca)
...
#
# Call:
# rda(X = env.z)
#
# Partitioning of variance:
#               Inertia Proportion
# Total              11          1
# Unconstrained      11          1
#
# Eigenvalues, and their contribution to the variance
#
# Importance of components:
#                          PC1    PC2     PC3     PC4     PC5     PC6     PC7      PC8
# Eigenvalue            6.3216 2.2316 1.00420 0.50068 0.37518 0.24797 0.16647 0.107161
# Proportion Explained  0.5747 0.2029 0.09129 0.04552 0.03411 0.02254 0.01513 0.009742
# Cumulative Proportion 0.5747 0.7776 0.86886 0.91437 0.94848 0.97102 0.98616 0.995898
#                           PC9     PC10      PC11
# Eigenvalue            0.02354 0.017259 0.0043137
# Proportion Explained  0.00214 0.001569 0.0003922
# Cumulative Proportion 0.99804 0.999608 1.0000000
...
```

---
# PCA - environmental variables

- Plot the eigenvalues above average

```r
ev <- env.pca$CA$eig  # extract eigenvalues
barplot(ev, main = "Eigenvalues", col = "grey", las = 2)
abline(h = mean(ev), col = "red3", lwd = 2)
```

<img src="index_files/figure-html/unnamed-chunk-105-1.png" width="720" style="display: block; margin: auto;" />

---
# PCA - Visualization

The wealth of information produced by a PCA is easier to understand and interpret using biplots to visualize patterns

- We can produce a quick biplot of the PCA using the function `biplot()` in base R

```r
biplot(spe.h.pca)
```

<img src="index_files/figure-html/unnamed-chunk-106-1.png" width="432" style="display: block; margin: auto;" />

---
# PCA - Scaling

```r
biplot(spe.h.pca, scaling = 1, main = "Scaling 1 = distance biplot")
biplot(spe.h.pca, scaling = 2, main = "Scaling 2 = correlation biplot")
```

<img src="index_files/figure-html/unnamed-chunk-107-1.png" width="720" style="display: block; margin: auto;" />

- In a PCA, we interpret the .purple[**correlations among species**] from the angles between their arrows, and the .purple[**distances among sites**] from their proximity in the biplot.
- However, we cannot optimally display sites and species together in a PCA biplot.
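The quantities reported by `summary()` can be sketched in base R: the eigenvalues are the variances of the principal components, and `Proportion Explained` divides each eigenvalue by the total inertia. A minimal sketch using `prcomp()` on a hypothetical random matrix (standing in for a real site-by-descriptor table such as `spe.hel`):

```r
set.seed(42)
Y <- matrix(rnorm(30 * 5), nrow = 30)  # hypothetical 30 sites x 5 descriptors
pca  <- prcomp(Y)                      # centered PCA, like rda(Y)
ev   <- pca$sdev^2                     # eigenvalues, one per axis, in decreasing order
prop <- ev / sum(ev)                   # "Proportion Explained" per axis
cumsum(prop)                           # "Cumulative Proportion" (last value = 1)
# Kaiser-Guttman criterion: keep the axes whose eigenvalue exceeds the mean
which(ev > mean(ev))
```

This mirrors the `prop1`/`prop2` axis-label computation used for the customized biplots below, just written against `prcomp()` output instead of `$CA$eig`.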
---
# PCA - Scaling

<img src="index_files/figure-html/unnamed-chunk-108-1.png" width="648" style="display: block; margin: auto;" />

.pull-left[
**Scaling 1**:

- distances among objects are approximations of Euclidean distances;
- the angles among descriptor vectors are meaningless.

.alert[Best for interpreting relationships among objects (sites)!]
]

.pull-right[
**Scaling 2** (default):

- angles between descriptor vectors reflect their correlations;
- distances among objects are not approximations of Euclidean distances.

.alert[Best for interpreting relationships among descriptors (species)!]
]

---
# Customized biplot

<br>

- Extract species and site scores along the 1st and 2nd PC:

```r
spe.scores  <- scores(spe.h.pca, display = "species", choices = 1:2)
site.scores <- scores(spe.h.pca, display = "sites", choices = 1:2)
```

- Extract the % of variation on the 1st and 2nd PC:

```r
prop1 <- round(spe.h.pca$CA$eig[1] / sum(spe.h.pca$CA$eig) * 100, 2)
prop2 <- round(spe.h.pca$CA$eig[2] / sum(spe.h.pca$CA$eig) * 100, 2)
```

---
# Customized biplot

Empty plot with informative axis labels

```r
plot(spe.h.pca, type = "none",
     xlab = paste0("PC1 (", prop1, "%)"),
     ylab = paste0("PC2 (", prop2, "%)"))
```

<img src="index_files/figure-html/unnamed-chunk-111-1.png" width="432" style="display: block; margin: auto;" />

---
# Customized biplot

Add points for sites

```r
points(site.scores, pch = 21, bg = "steelblue", cex = 1.2)
```

<img src="index_files/figure-html/unnamed-chunk-112-1.png" width="432" style="display: block; margin: auto;" />

---
# Customized biplot

Add arrows for species

```r
arrows(x0 = 0, y0 = 0,
       x1 = spe.scores[, 1], y1 = spe.scores[, 2],
       length = 0.1)  # size of arrow head
```

<img src="index_files/figure-html/unnamed-chunk-113-1.png" width="432" style="display: block; margin: auto;" />

---
# Customized biplot

Add labels for species

```r
text(spe.scores[, 1], spe.scores[, 2],
     labels = rownames(spe.scores),
     col = "red3", cex = 0.8)
```

<img
src="index_files/figure-html/unnamed-chunk-114-1.png" width="432" style="display: block; margin: auto;" />

---
# Customized biplot

Complete code for the biplot

```r
plot(spe.h.pca, type = "none",
     xlab = paste0("PC1 (", prop1, "%)"),
     ylab = paste0("PC2 (", prop2, "%)"))
points(site.scores, pch = 21, bg = "steelblue", cex = 1.2)
arrows(x0 = 0, y0 = 0, x1 = spe.scores[, 1], y1 = spe.scores[, 2], length = 0.1)
text(spe.scores[, 1], spe.scores[, 2], labels = rownames(spe.scores), col = "red3", cex = 0.8)
```

---
class: inverse, center, middle

# Constrained ordinations
## Redundancy analysis - RDA

---
# RDA

- RDA = a combination of multiple regression and PCA.
- It allows us to identify and test relationships between a response matrix and one or more explanatory matrices

.center[ ![:scale 60%](images/RDA_mat.png)]

- Explanatory variables can be quantitative, qualitative, or binary (0/1).

<br>

- .alert[Transform] species data and .alert[standardize] quantitative variables prior to running an RDA.

---
# Run the RDA

- Model the effect of all environmental variables on fish community composition

```r
spe.rda <- rda(spe.hel ~., data = env.z)  # "~." indicates all variables
# or
spe.rda <- rda(X = spe.hel, Y = env.z)
```

- .comment[It is better to use the formula interface, which is more flexible (it allows factors and interactions among variables)], for example:
  - `rda(spe.hel ~ var1 + var2*var3 + fac, data = env.z)`
- You can also control for a group of variables, for example if you are interested in the effect of var1, var2, and var3 while holding var4 and var5 constant:
  - `rda(spe.hel ~ var1 + var2*var3 + Condition(var4 + var5), data = env.z)`

---
# RDA output in R

```r
summary(spe.rda)
...
# Call:
# rda(formula = spe.hel ~ dfs + alt + slo + flo + pH + har + pho + nit + amm + oxy + bdo, data = env.z)
#
# Partitioning of variance:
#               Inertia Proportion
# Total          0.5023     1.0000
# Constrained    0.3557     0.7081
# Unconstrained  0.1467     0.2919
...
```

- **`Constrained Proportion`** ![:faic](arrow-right) variance of the response matrix explained by the environmental matrix
  - .alert[(0.3557/0.5023 = 70.81%)]
  - Equivalent of the unadjusted R2
- **`Unconstrained Proportion`** ![:faic](arrow-right) unexplained variance in the response matrix (residuals)
  - .alert[(0.1467/0.5023 = 29.19%)]

---
# RDA output in R

```r
summary(spe.rda)
...
# Eigenvalues, and their contribution to the variance
#
# Importance of components:
#                         RDA1   RDA2    RDA3    RDA4     RDA5     RDA6     RDA7     RDA8
# Eigenvalue            0.2218 0.0545 0.03245 0.02186 0.009033 0.006009 0.003783 0.002822
# Proportion Explained  0.4415 0.1085 0.06460 0.04351 0.017983 0.011962 0.007532 0.005618
# Cumulative Proportion 0.4415 0.5500 0.61463 0.65814 0.676119 0.688082 0.695614 0.701231
#                           RDA9     RDA10     RDA11     PC1     PC2     PC3     PC4     PC5
# Eigenvalue            0.001944 0.0008515 0.0006343 0.04218 0.02887 0.01910 0.01363 0.01060
# Proportion Explained  0.003869 0.0016951 0.0012628 0.08396 0.05746 0.03801 0.02713 0.02111
# Cumulative Proportion 0.705100 0.7067955 0.7080583 0.79202 0.84948 0.88749 0.91462 0.93573
#                            PC6     PC7      PC8      PC9     PC10     PC11     PC12
# Eigenvalue            0.008346 0.00732 0.004500 0.004236 0.002861 0.001931 0.001015
# Proportion Explained  0.016615 0.01457 0.008958 0.008432 0.005696 0.003845 0.002020
# Cumulative Proportion 0.952346 0.96692 0.975876 0.984308 0.990004 0.993848 0.995868
...
```

- **`Eigenvalues, and their contribution to the variance`** ![:faic](arrow-right) eigenvalues for the canonical axes (RDA1 to RDA11) and the unconstrained axes (PC1 to PC18), as well as the cumulative proportion of .purple[**variance explained by the RDA axes**] and .purple[**represented by the residual PC axes**].
- The first 2 RDA axes cumulatively explain 55% of the fish community variance.

---
# Selecting variables

Using .alert[stepwise selection], we can select the explanatory variables that are significant.
```r
# Forward selection of variables:
mod0 <- rda(spe.hel ~ 1, data = env.z)        # model with intercept only
rda.sel <- ordiR2step(mod0,                   # lower model limit
                      scope = formula(spe.rda),  # upper model limit (full model)
                      direction = "forward",  # could be "backward" or "both"
                      R2scope = TRUE,         # can't surpass the full model's R2
                      step = 1000,
                      trace = FALSE)          # change to TRUE to see the selection process!
```

.comment[Here, we are essentially adding one variable at a time, and retaining it if it significantly increases the model's adjusted R2.]

---
# Selecting variables

- Which variables are retained by the forward selection?

```r
rda.sel$call
# rda(formula = spe.hel ~ dfs + oxy + bdo, data = env.z)
```

- What is the **adjusted R2** of the RDA with the selected variables?

```r
RsquareAdj(rda.sel)
# $r.squared
# [1] 0.5645572
#
# $adj.r.squared
# [1] 0.5143137
```

How would you report these results in a paper?

- .comment[The selected environmental variables (dfs + oxy + bdo) explain .alert[51.43%] of the variation in fish community composition in the 30 sites along the Doubs river.]
- Note: you should report the adjusted R2, which takes into account the number of explanatory variables.

---
# Significance testing

Use `anova.cca()` to test the significance of your RDA.

```r
anova.cca(rda.sel, step = 1000)
# Permutation test for rda under reduced model
# Permutation: free
# Number of permutations: 999
#
# Model: rda(formula = spe.hel ~ dfs + oxy + bdo, data = env.z)
#          Df Variance      F Pr(>F)
# Model     3  0.28360 11.236  0.001 ***
# Residual 26  0.21874
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

You can also test the significance of each axis!

```r
anova.cca(rda.sel, step = 1000, by = "axis")
```

... and of each variable!

```r
anova.cca(rda.sel, step = 1000, by = "term")
```

.center[
.alert[![:faic](exclamation-triangle) If your global test is not significant, RDA results should not be interpreted!]
]

---
# RDA - Visualization

RDA allows a **simultaneous visualization** of your response and explanatory variables

```r
ordiplot(rda.sel, scaling = 1, type = "text")
ordiplot(rda.sel, scaling = 2, type = "text")
```

<img src="index_files/figure-html/unnamed-chunk-125-1.png" width="792" style="display: block; margin: auto;" />

.pull-left[
**Scaling 1**: distances among objects reflect their similarities
]
.pull-right[
**Scaling 2**: angles between variables (species and explanatory variables) reflect their correlations
]

---
# RDA triplot - Scaling

### Scaling 1 – .comment[distance triplot]

.pull-left[
<img src="index_files/figure-html/unnamed-chunk-126-1.png" width="432" style="display: block; margin: auto;" />
]

.pull-right[
- .alert[Distances among sites] approximate their Euclidean distances, i.e. sites that are close together share similar species. The same holds for the centroids of qualitative variables.
- .alert[Projecting a site at a right angle] onto a response variable or a quantitative explanatory variable approximates the value of the site along that variable.
- The .alert[angles between response and explanatory variables] reflect their correlations (*but not the angles among response variables*).
]

---
# RDA triplot - Scaling

### Scaling 2 - .comment[correlation triplot]

.pull-left[
<img src="index_files/figure-html/unnamed-chunk-127-1.png" width="432" style="display: block; margin: auto;" />
]

.pull-right[
- The .alert[angles between *all* vector variables] reflect their correlations.
- Distances among sites do **not** approximate their Euclidean distances. The same holds for the centroids of qualitative variables.
- .alert[Projecting a site at a right angle] onto a response variable or a quantitative explanatory variable approximates the value of the site along that variable.
- Environmental arrows that are .alert[longer and aligned with the first axis] are more important in explaining the variation in the community matrix.
]

---
# Customizing RDA triplot

Both `plot()` and `ordiplot()` make quick and simple ordination plots, but you can customize your plots by manually setting the aesthetics of points, text, and arrows.

- Extract species, site, and environmental variable scores along the 1st and 2nd RDA axes:

```r
spe.scores  <- scores(rda.sel, display = "species", choices = 1:2)
site.scores <- scores(rda.sel, display = "sites", choices = 1:2)
env.scores  <- scores(rda.sel, display = "bp", choices = 1:2)  # default: scaling = 2
```

- Extract the % of variation on the 1st and 2nd RDA axes:

```r
prop1 <- round(rda.sel$CCA$eig[1] / sum(rda.sel$CCA$eig) * 100, 2)
prop2 <- round(rda.sel$CCA$eig[2] / sum(rda.sel$CCA$eig) * 100, 2)
```

---
# Customizing RDA triplot

```r
# Empty plot with axes
plot(rda.sel, type = "none",
     xlab = paste0("RDA1 (", prop1, "%)"),
     ylab = paste0("RDA2 (", prop2, "%)"))
# Points for sites
points(site.scores, pch = 19, col = alpha("grey", 0.5), cex = 1.1)
# Arrows and text for species
arrows(x0 = 0, y0 = 0,
       x1 = spe.scores[, 1], y1 = spe.scores[, 2],
       length = 0.1, col = "grey15")
text(spe.scores[, 1],
     spe.scores[, 2] + sign(spe.scores[, 2]) * .02,  # shift labels
     labels = rownames(spe.scores), col = "grey15", cex = 0.9)
# Arrows and text for environmental variables
arrows(x0 = 0, y0 = 0,
       x1 = env.scores[, 1], y1 = env.scores[, 2],
       length = 0.1, lwd = 1.5, col = "#CC3311")
text(env.scores[, 1] + sign(env.scores[, 1]) * .05,  # shift labels
     env.scores[, 2],
     labels = rownames(env.scores), col = "#CC3311", cex = 1, font = 2)
```

---
# Customizing RDA triplot

<img src="index_files/figure-html/unnamed-chunk-131-1.png" width="504" style="display: block; margin: auto;" />

---
# Customizing RDA triplot

```r
spe.scores <- spe.scores * 1.4  # lengthen the species arrows for readability
```

<img src="index_files/figure-html/unnamed-chunk-132-1.png" width="504" style="display: block; margin: auto;" />

---
# Great online resources

- Multivariate analyses - dissimilarity measures, clustering, PCA, CA...: [en](https://wiki.qcbs.ca/r_workshop9) - [fr](https://wiki.qcbs.ca/r_atelier9)
- Advanced multivariate analyses - RDA, CCA, MRT: [en](https://wiki.qcbs.ca/r_workshop10) - [fr](https://wiki.qcbs.ca/r_atelier10)
- Book with R scripts - [Numerical Ecology with R](http://adn.biol.umontreal.ca/~numericalecology/numecolR/)
  - [First edition available online](http://jinliangliu.weebly.com/uploads/2/5/7/8/25781074/numerical_ecology_with_r.pdf)

---
class: inverse, center, middle

# Thank you for attending this workshop!