Instrumentos de Análisis Urbanos II

.title[
# Instrumentos de Análisis Urbanos II
]
.subtitle[
## Maestría en Economía Urbana
]
.author[
### 
]
.institute[
### Universidad Torcuato Di Tella
]
.date[
### 11/07/2023
]

---

layout: true
  
<div class="my-footer"><span>Instrumentos de Análisis Urbanos II - <a href="https://tuqmano.github.io/geo_utdt/">https://tuqmano.github.io/geo_utdt/</a></span></div>

---
class: inverse, center, middle

---

# Sesion 4

## Domar los Datos II

### + [`{dplyr}`](https://dplyr.tidyverse.org/)

---

# Importar datos de inmuebles:

```r

datos <- vroom::vroom("https://infra.datos.gob.ar/catalog/otros/dataset/6/distribution/6.1/download/inmuebles-estado-nacional.csv")

```

---

# Inspeccionamos dataset

```r

skimr::skim(datos)

```

---

background-image: url(https://github.com/TuQmano/hex-stickers/raw/master/PNG/dplyr.png)
background-position: 95% 5%
background-size: 10%

## + verbos de `{dplyr}`

* `n()`

* `slice()` (y variantes:  `slice_*()`)

* `rename()`

* `case_when()` (re versión de `ifelse()`)

* Variantes de `mutate_*()` y `summarise_*()`

background-image: url(https://github.com/TuQmano/hex-stickers/raw/master/PNG/dplyr.png), url(https://www.tidyverse.org/blog/2020/06/dplyr-1-0-0/dplyr.png)
background-position: 95% 5%, 95% 50%
background-size: 10%, 10%

## [*nuevo* `{dplyr}` <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg">  <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg>](https://www.tidyverse.org/blog/2020/06/dplyr-1-0-0/)

* `across()`

* grupos por filas con `rowwise()`

---

background-image: url(https://www.tidyverse.org/blog/2020/06/dplyr-1-0-0/dplyr.png)
background-position: 95% 5%
background-size: 10%

## Datos relacionales

[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg">  <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg> Tidy Explain, Garrick Aden‑Buie](https://www.garrickadenbuie.com/project/tidyexplain/)

---

background-image: url(https://www.tidyverse.org/blog/2020/06/dplyr-1-0-0/dplyr.png)
background-position: 95% 5%
background-size: 10%

# Datos relacionales

## 3 tipos:

* Uniones de **transformación** (del inglés _mutating joins_), que agregan nuevas variables a un data frame a partir de las observaciones coincidentes en otra tabla (*vg* `left_join()`)

* Uniones de **filtro** (del inglés _filtering joins_), que filtran observaciones en un _data frame_ con base en si coinciden o no con una observación de otra tabla (*vg* `anti_join()`).

* Operaciones de **conjuntos** (del inglés _set operations_), que tratan las observaciones como elementos de un conjunto (*vg* `set_diff()`).

---
class: inverse, middle, center

# Domar los Datos

### (II Parte)

---

background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/stringr.png)
background-position: 95% 5%
background-size: 10%

# Extracción de texto

### Extraer caracteres según la posición en el texto:

```r
library(tidyverse)
```

```r
library(tidyverse)
str_sub(string = `texto`, 
        start = `posicion_inicial`, 
        end = `posicion_final`)
```

#### Ejemplo 1:

```r
texto_ejemplo1 <- "año 1"

str_sub(string = texto_ejemplo1, 
        start = 5, 
        end = 5)
## [1] "1"
```

#### Ejemplo 2:

```r
texto_ejemplo2 <- c("año 1", "año 2", "año 3")

str_sub(string = texto_ejemplo2, 
        start = 5, 
        end = 5)
## [1] "1" "2" "3"
```

¿Y al revés?

---

background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/stringr.png)
background-position: 95% 5%
background-size: 10%

# Espacios en blanco

### Completar texto:

```r
str_pad(string = `texto`, 
        width =  `ancho ideal del texto`, 
        side = `donde agregar`, 
        pad = `qué agregar`)
```

#### Ejemplo:

```r
texto_ejemplo <- c("1", "011", "01", "0111")

str_pad(string = texto_ejemplo, 
        width = 4, 
        side = "left", 
        pad = "0")
## [1] "0001" "0011" "0001" "0111"
```
--

```r
texto_con_pad <- str_pad(string = texto_ejemplo,
                         width = 4, 
                         side = "left", 
                         pad = "0")

str_sub(string = texto_con_pad, 
        start = 1,
        end = 3)
## [1] "000" "001" "000" "011"
```

---

background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/stringr.png)
background-position: 95% 5%
background-size: 10%

# Espacios en blanco

### Eliminar espacios en blanco

```r
str_trim(string = `texto`, 
         side = `lado sobre el que trabaja`)
```

#### Ejemplo:

```r
texto_ejemplo <- c("acá no debería haber espacio a la derecha    ")

str_length(texto_ejemplo)
## [1] 45
```

```r
texto_resultado <- str_trim(string = texto_ejemplo,
                            side = "right")

str_length(texto_resultado)
## [1] 41
```

---

background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/stringr.png)
background-position: 95% 5%
background-size: 10%

# Detección de patrones

### Funciones que detectan coincidencia de patrones:

#### Ejemplo:

### Me quiero quedar con todos los inmuebles que se encuentran sobre avenidas

```r
b_inm_avenidas <- datos %>% 
  filter(str_detect(string = calle, 
                    pattern = "Av"))

head(b_inm_avenidas$calle)
## [1] "Avenida Rivadavia"    "Avenida de Mayo"      "Avenida Paseo Colon" 
## [4] "Avenida Rivadavia"    "Avda. del Libertador" "Av. Rivadavia"
```
--

Ver uso de _expresiones regulares_ -[_regex_ <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg">  <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg>](https://stringr.tidyverse.org/articles/regular-expressions.html): 
**`str_detect(string = x, pattern = ":digits:")`**

[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg">  <path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"></path></svg>`{stringr}`](https://stringr.tidyverse.org/articles/stringr.html)

---

background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/forcats.png)
background-position: 95% 5%
background-size: 10%

# Domar los datos II

## Variables categóricas

> *Los factores son útiles cuando se tiene datos categóricos, variables que tienen un conjunto de valores fijo y conocido, y cuando se desea mostrar los vectores de caracteres en un orden específico* **R4DS - <https://es.r4ds.hadley.nz/factores.html>**
--

```r
factor(x = `variable a transformar a factor`, 
       levels = `niveles de la variable`, 
       labels = `etiquetas de los niveles`)
```

* `fct_reorder()` > modifica el orden

* `fct_recode()` > modifica valores (no niveles)

* `fct_collapse()`> colapsar es útil para re codificar muchos niveles 
--

* `fct_lump()` > agrupa