class: center, middle, inverse, title-slide .title[ # Instrumentos de Análisis Urbanos II ] .subtitle[ ## Maestría en Economía Urbana ] .author[ ### ] .institute[ ### Universidad Torcuato Di Tella ] .date[ ### 11/07/2023 ] --- layout: true <div class="my-footer"><span>Instrumentos de Análisis Urbanos II - <a href="https://tuqmano.github.io/geo_utdt/">https://tuqmano.github.io/geo_utdt/</a></span></div> --- class: inverse, center, middle --- class: inverse, center, middle # Sesion 4 ## Domar los Datos II ### + [`{dplyr}`](https://dplyr.tidyverse.org/) --- class: inverse, center, middle # Importar datos de inmuebles: ```r datos <- vroom::vroom("https://infra.datos.gob.ar/catalog/otros/dataset/6/distribution/6.1/download/inmuebles-estado-nacional.csv") ``` --- # Inspeccionamos dataset ```r skimr::skim(datos) ``` <img src="../figs/skimr.png" width="100%" /> --- background-image: url(https://github.com/TuQmano/hex-stickers/raw/master/PNG/dplyr.png) background-position: 95% 5% background-size: 10% ## + verbos de `{dplyr}` -- * `n()` -- * `slice()` (y variantes: `slice_*()`) -- * `rename()` -- * `case_when()` (re versión de `ifelse()`) -- * Variantes de `mutate_*()` y `summarise_*()` -- background-image: url(https://github.com/TuQmano/hex-stickers/raw/master/PNG/dplyr.png), url(https://www.tidyverse.org/blog/2020/06/dplyr-1-0-0/dplyr.png) background-position: 95% 5%, 95% 50% background-size: 10%, 10% ## [*nuevo* `{dplyr}` <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg>](https://www.tidyverse.org/blog/2020/06/dplyr-1-0-0/) * `across()` -- * grupos por filas con `rowwise()` --- background-image: url(https://www.tidyverse.org/blog/2020/06/dplyr-1-0-0/dplyr.png) background-position: 95% 5% background-size: 10% ## Datos relacionales <img src="https://raw.githubusercontent.com/gadenbuie/tidyexplain/main/images/left-join.gif" width="300" /> [<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> Tidy Explain, Garrick Aden‑Buie](https://www.garrickadenbuie.com/project/tidyexplain/) --- background-image: url(https://www.tidyverse.org/blog/2020/06/dplyr-1-0-0/dplyr.png) background-position: 95% 5% background-size: 10% # Datos relacionales ## 3 tipos: * Uniones de **transformación** (del inglés _mutating joins_), que agregan nuevas variables a un data frame a partir de las observaciones coincidentes en otra tabla (*vg* `left_join()`) -- * Uniones de **filtro** (del inglés _filtering joins_), que filtran observaciones en un _data frame_ con base en si coinciden o no con una observación de otra tabla (*vg* `anti_join()`). -- * Operaciones de **conjuntos** (del inglés _set operations_), que tratan las observaciones como elementos de un conjunto (*vg* `set_diff()`). <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> [Datos Relacionales - R4DS](https://es.r4ds.hadley.nz/datos-relacionales.html) --- class: inverse, middle, center # Domar los Datos ### (II Parte) --- background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/stringr.png) background-position: 95% 5% background-size: 10% # Extracción de texto -- ### Extraer caracteres según la posición en el texto: ```r library(tidyverse) ``` ```r library(tidyverse) str_sub(string = `texto`, start = `posicion_inicial`, end = `posicion_final`) ``` -- #### Ejemplo 1: ```r texto_ejemplo1 <- "año 1" str_sub(string = texto_ejemplo1, start = 5, end = 5) ## [1] "1" ``` -- #### Ejemplo 2: ```r texto_ejemplo2 <- c("año 1", "año 2", "año 3") str_sub(string = texto_ejemplo2, start = 5, end = 5) ## [1] "1" "2" "3" ``` -- ¿Y al revés? --- background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/stringr.png) background-position: 95% 5% background-size: 10% # Espacios en blanco ### Completar texto: ```r str_pad(string = `texto`, width = `ancho ideal del texto`, side = `donde agregar`, pad = `qué agregar`) ``` #### Ejemplo: ```r texto_ejemplo <- c("1", "011", "01", "0111") str_pad(string = texto_ejemplo, width = 4, side = "left", pad = "0") ## [1] "0001" "0011" "0001" "0111" ``` -- ```r texto_con_pad <- str_pad(string = texto_ejemplo, width = 4, side = "left", pad = "0") str_sub(string = texto_con_pad, start = 1, end = 3) ## [1] "000" "001" "000" "011" ``` --- background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/stringr.png) background-position: 95% 5% background-size: 10% # Espacios en blanco ### Eliminar espacios en blanco ```r str_trim(string = `texto`, side = `lado sobre el que trabaja`) ``` -- #### Ejemplo: ```r texto_ejemplo <- c("acá no debería haber espacio a la derecha ") str_length(texto_ejemplo) ## [1] 45 ``` ```r texto_resultado <- str_trim(string = texto_ejemplo, side = "right") str_length(texto_resultado) ## [1] 41 ``` --- background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/stringr.png) background-position: 95% 5% background-size: 10% # Detección de patrones ### Funciones que detectan coincidencia de patrones: -- #### Ejemplo: ### Me quiero quedar con todos los inmuebles que se encuentran sobre avenidas ```r b_inm_avenidas <- datos %>% filter(str_detect(string = calle, pattern = "Av")) head(b_inm_avenidas$calle) ## [1] "Avenida Rivadavia" "Avenida de Mayo" "Avenida Paseo Colon" ## [4] "Avenida Rivadavia" "Avda. del Libertador" "Av. Rivadavia" ``` -- Ver uso de _expresiones regulares_ -[_regex_ <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg>](https://stringr.tidyverse.org/articles/regular-expressions.html): **`str_detect(string = x, pattern = ":digits:")`** -- [<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg>`{stringr}`](https://stringr.tidyverse.org/articles/stringr.html) --- background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/forcats.png) background-position: 95% 5% background-size: 10% # Domar los datos II ## Variables categóricas > *Los factores son útiles cuando se tiene datos categóricos, variables que tienen un conjunto de valores fijo y conocido, y cuando se desea mostrar los vectores de caracteres en un orden específico* **R4DS - <https://es.r4ds.hadley.nz/factores.html>** -- ```r factor(x = `variable a transformar a factor`, levels = `niveles de la variable`, labels = `etiquetas de los niveles`) ``` -- * `fct_reorder()` > modifica el orden -- * `fct_recode()` > modifica valores (no niveles) -- * `fct_collapse()`> colapsar es útil para re codificar muchos niveles -- * `fct_lump()` > agrupa