Quantitative Tools for Political Analysis

class: center, middle, inverse, title-slide

# Quantitative Tools for Political Analysis
## Master's in Political Science [CP44]
### Universidad Torcuato Di Tella
### 12/10/2021

---

layout: true
 
<div class="my-footer">Juan Pablo Ruiz Nicolini | @TuQmano | <a href="https://tuqmano.ar/">www.tuqmano.ar</a></div>

---
class: inverse, center, middle

# REVIEW

---

## Data Science - Data Wrangling

[R for Data Science - R4DS (Spanish edition)](https://es.r4ds.hadley.nz)

---

background-image: url(https://github.com/TuQmano/hex-stickers/raw/master/PNG/tidyr.png)
background-position: 95% 5%
background-size: 10%

# Data Wrangling I

## Tidy Data

1. Each variable must have its own column.

2. Each observation must have its own row.

3. Each value must have its own cell.

---

background-image: url(https://github.com/TuQmano/hex-stickers/raw/master/PNG/tidyr.png)
background-position: 95% 5%
background-size: 10%

# Data Wrangling I

## Tidy Data: _pivoting_

Among the available verbs, two stand out:

* `pivot_longer()`: reduces the number of columns and increases the number of rows

* `pivot_wider()`: reduces the number of rows and increases the number of columns

### More verbs:

`complete / fill / replace_na / drop_na`,

`nest / unnest`,

`unite / separate / extract`

[`{tidyr}`](https://tidyr.tidyverse.org/)
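
A minimal sketch of both pivots. The table `wide` and its columns (`province`, `party_a`, `party_b`) are made up for illustration and are not part of the course data:

```r
library(tidyr)
library(tibble)

# Hypothetical "wide" table: one column per party (not tidy)
wide <- tribble(
  ~province, ~party_a, ~party_b,
  "Tucumán",      120,       80,
  "Salta",         60,       90
)

# Longer: the party columns become rows (fewer columns, more rows)
long <- pivot_longer(
  wide,
  cols      = c(party_a, party_b),
  names_to  = "party",
  values_to = "votes"
)

# Wider: back to one column per party (fewer rows, more columns)
back <- pivot_wider(long, names_from = party, values_from = votes)
```

Here `long` has one row per province-party pair, which is the tidy layout described above.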

---

background-image: url(https://github.com/TuQmano/hex-stickers/raw/master/PNG/dplyr.png)
background-position: 95% 5%
background-size: 10%

# Data Wrangling I

## Transform

### A toolbox

#### Main `{dplyr}` verbs for manipulating _data_

* `filter()`: reduces the number of rows (observations)

* `select()`: reduces the number of columns (variables)

* `mutate()`: creates or modifies variables

* `arrange()`: sorts rows

* `group_by()`: groups observations

* `summarize()`: reduces multiple observations to a single value
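
The six verbs above can be chained in a single pipeline. This is only a sketch: the `results` table and its numbers are invented for the example.

```r
library(dplyr)

# Hypothetical results table (made-up numbers)
results <- tibble(
  province = c("Tucumán", "Tucumán", "Salta", "Salta"),
  party    = c("A", "B", "A", "B"),
  votes    = c(120, 80, 60, 90),
  notes    = NA_character_
)

by_province <- results %>%
  select(province, party, votes) %>%           # keep only the columns we need
  filter(votes > 50) %>%                       # keep rows above a threshold
  group_by(province) %>%                       # later steps run within each province
  mutate(pct = votes / sum(votes) * 100) %>%   # new variable: share within province
  summarize(total = sum(votes), top_pct = max(pct)) %>%  # one row per province
  arrange(desc(total))                         # largest province first
```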

---

background-image: url(https://www.tidyverse.org/blog/2020/06/dplyr-1-0-0/dplyr.png)
background-position: 95% 5%
background-size: 10%

# Data Wrangling I

## More `{dplyr}` verbs

* `n()`

* `slice()` (and its variants: `slice_*()`)

* `rename()`

* `case_when()` (a reworked version of `ifelse()`)

* Variants `mutate_*()` and `summarise_*()`

* `across()` in the [*new* `{dplyr}`](https://www.tidyverse.org/blog/2020/06/dplyr-1-0-0/)
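
A short sketch of some of these helpers, assuming `{dplyr}` 1.0.0 or later. The `results` table, its values, and the size thresholds are all hypothetical:

```r
library(dplyr)

# Hypothetical table (made-up numbers)
results <- tibble(
  party = c("A", "B", "C", "D"),
  votes = c(460, 320, 150, 50),
  seats = c(2, 2, 0, 0)
)

# case_when(): several conditions evaluated in order, instead of nested ifelse()
labelled <- results %>%
  mutate(size = case_when(
    votes >= 300 ~ "large",
    votes >= 100 ~ "medium",
    TRUE         ~ "small"
  ))

# across(): apply one function to many columns; n() counts rows
totals <- results %>%
  summarise(across(where(is.numeric), sum), n_parties = n())

# slice_max(): keep the rows with the largest values
top2 <- results %>% slice_max(votes, n = 2)
```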

---

background-image: url(https://www.tidyverse.org/blog/2020/06/dplyr-1-0-0/dplyr.png)
background-position: 95% 5%
background-size: 10%

## Relational data

.pull-left[
![](https://www.garrickadenbuie.com/project/tidyexplain/images/left-join-extra.gif)

[Tidy Explain, Garrick Aden-Buie](https://www.garrickadenbuie.com/project/tidyexplain/)

]

.pull-right[

* Mutating joins, which add new variables to a data frame from the matching observations in another table (e.g. `left_join()`)

* Filtering joins, which filter the observations in a _data frame_ based on whether or not they match an observation in another table (e.g. `anti_join()`).

* Set operations, which treat observations as elements of a set (e.g. `setdiff()`).

[Relational Data - R4DS](https://es.r4ds.hadley.nz/datos-relacionales.html)

]
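
One example of each family of operations, as a sketch. Both tables, the `codprov` codes, and the province names are hypothetical:

```r
library(dplyr)

# Hypothetical tables sharing the key `codprov`
results <- tibble(
  codprov = c("01", "02", "99"),
  votes   = c(100, 250, 40)
)
provinces <- tibble(
  codprov   = c("01", "02"),
  name_prov = c("PROVINCE ONE", "PROVINCE TWO")
)

# Mutating join: adds name_prov; rows without a match get NA
joined <- left_join(results, provinces, by = "codprov")

# Filtering join: keeps only the rows of `results` with no match
unmatched <- anti_join(results, provinces, by = "codprov")

# Set operation: keys present in `results` but not in `provinces`
only_in_results <- setdiff(select(results, codprov), select(provinces, codprov))
```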

---
class: inverse, middle, center

# Data Wrangling
### (Part II)

---

background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/stringr.png)
background-position: 95% 5%
background-size: 10%

# Data Wrangling II

## Strings

* Functions to manipulate individual characters within the strings of a character vector

**`str_sub(string = x, start = 1, end = 4)`**

* Tools to add, remove, and manipulate whitespace

**`str_pad(string = x, width = 2, side = "left", pad = "0")`**

* Functions that detect pattern matches, such as _regular expressions_ ([_regex_](https://stringr.tidyverse.org/articles/regular-expressions.html)):

**`str_detect(string = x, pattern = "[[:digit:]]")`**

[`{stringr}`](https://stringr.tidyverse.org/articles/stringr.html)
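
The three calls above, applied to small made-up vectors (the input values are illustrative only):

```r
library(stringr)

# Extract characters 1 to 4 of each string
prefix <- str_sub(string = "TUCUMAN", start = 1, end = 4)

# Left-pad codes with zeros up to width 2
codes <- str_pad(string = c("5", "23"), width = 2, side = "left", pad = "0")

# Detect whether each string contains at least one digit
has_digit <- str_detect(string = c("TUCUMAN", "dip2017"), pattern = "[[:digit:]]")
```

`prefix` is `"TUCU"`, `codes` is `"05" "23"`, and `has_digit` is `FALSE TRUE`.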

---
background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/lubridate.png)
background-position: 95% 5%
background-size: 10%

# Data Wrangling II

## Dates and times

`{lubridate}` includes a wide variety of functions to **(a) *parse* dates and times**; **(b) create and extract information**; (c) handle time zones (_tz_); and even compute time intervals and do _time arithmetic_

```r
library(lubridate) # (a)
# Parsing Spanish month names assumes a Spanish locale
dmy("5 de octubre de 2021")
## [1] "2021-10-05"
```

```r
library(lubridate) # (b)
today() + 365
## [1] "2022-10-12"
```

[`{lubridate}`](https://lubridate.tidyverse.org/index.html)
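
A locale-independent sketch of points (a) and (b) plus some time arithmetic. The variable names are hypothetical; the date is the one on the title slide:

```r
library(lubridate)

# (a) Parse a day/month/year string written with numbers only
class_date <- dmy("12/10/2021")

# (b) Extract components
class_year  <- year(class_date)
class_month <- month(class_date)

# Time arithmetic: add a calendar year, then measure the gap in days
one_year_later <- class_date + years(1)
days_between   <- as.numeric(one_year_later - class_date)
```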

---

background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/forcats.png)
background-position: 95% 5%
background-size: 10%

# Data Wrangling II

## Categorical variables

> *Factors are useful when you have categorical data, variables that have a fixed and known set of values, and when you want to display character vectors in non-alphabetical order*

**R4DS - <https://es.r4ds.hadley.nz/factores.html>**

* `fct_reorder()` > changes the order of the levels

* `fct_recode()` > changes the values of the levels (not their order)

* `fct_collapse()` > collapsing is useful for recoding many levels
--

* `fct_lump()` > lumps levels together
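
A sketch of the four functions on a made-up factor. `party` and `votes` are hypothetical vectors, not course data:

```r
library(forcats)

# Hypothetical factor and a numeric variable used for ordering
party <- factor(c("A", "B", "C", "A", "A", "B"))
votes <- c(10, 5, 1, 12, 11, 6)

# Reorder levels by the median of `votes` within each level
reordered <- fct_reorder(party, votes)

# Rename a single level by hand
recoded <- fct_recode(party, "Alpha" = "A")

# Collapse several levels into one
collapsed <- fct_collapse(party, minor = c("B", "C"))

# Keep the most frequent level and lump the rest into "Other"
lumped <- fct_lump(party, n = 1)
```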

---

class:  middle, center, inverse

# Programming (Intro)

---

## References

* [_Pipes_, Functions, Vectors, and Iteration](https://es.r4ds.hadley.nz/programar-intro.html), in **Wickham and Grolemund**

---
background-image: url(https://github.com/tidyverse/magrittr/raw/master/man/figures/logo.png)
background-position: 95% 5%
background-size: 10%

# "*This is not a pipe*"

### A recipe

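```r
library(magrittr) # provides the %>% pipe

# Template: the path, `x`, and the variable names are placeholders
the_data <-
 read.csv('/path/to/data/file.csv') %>%
 subset(variable_a > x) %>%
 transform(variable_c = variable_a/variable_b) %>%
 head(100)
```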
--
* A sequence of commands or instructions

* Read from left to right

* Minimizes (i) nested function calls and (ii) the creation of intermediate objects

* Makes it easier to modify the sequence and add steps in the middle of it

[{magrittr}](https://magrittr.tidyverse.org/)
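
A runnable version of the same recipe, sketched with the built-in `mtcars` data instead of a CSV file. The threshold and the `ratio` column are arbitrary choices for illustration; the nested call is shown for comparison:

```r
library(magrittr)

# Nested calls: read from the inside out
nested <- head(transform(subset(mtcars, mpg > 20), ratio = hp / wt), 3)

# Piped calls: read from left to right, top to bottom
piped <- mtcars %>%
  subset(mpg > 20) %>%
  transform(ratio = hp / wt) %>%
  head(3)

# Both approaches build the same object
same_result <- identical(nested, piped)
```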

---

background-image: url(https://github.com/tidyverse/glue/raw/master/man/figures/logo.png)
background-position: 95% 3%
background-size: 10%

# Making _pasting_ easier

```
## Mi nombre es TuQmano. 
## Trabajo de Cientista de Datos. 
## Nací el jueves, 15 de septiembre de 1983
```

[{glue}](https://glue.tidyverse.org/) 
[and alternatives](https://trinkerrstuff.wordpress.com/2013/09/15/paste-paste0-and-sprintf-2/) such as `paste()`, `paste0()` and `sprintf()`.

```r
library(glue)
nombre <- "TuQmano"
ocupacion <- "Cientista de Datos"
aniversario <- as.Date("1983-09-15")
```

```r
glue("Mi nombre es {nombre}. 
     Trabajo de {ocupacion}.
     Nací el {format(aniversario, '%A, %d de %B de %Y')}")
```
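
For comparison, a sketch of the same kind of sentence built with `{glue}` and with the base R alternatives mentioned above. The variables `name` and `occupation` and the English template are hypothetical:

```r
library(glue)

name       <- "TuQmano"
occupation <- "Data Scientist"

# glue(): variables are interpolated inside {}
with_glue <- glue("My name is {name}. I work as a {occupation}.")

# paste0(): pieces are concatenated one by one
with_paste <- paste0("My name is ", name, ". I work as a ", occupation, ".")

# sprintf(): %s placeholders are filled in order
with_sprintf <- sprintf("My name is %s. I work as a %s.", name, occupation)
```

All three produce the same text; `glue()` keeps the template readable as the number of variables grows.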

---
class: middle

background-image: url(https://github.com/tidyverse/glue/raw/master/man/figures/logo.png)
background-position: 95% 3%
background-size: 10%

# Making _pasting_ easier

```r
nombres_ocupacion_aniversario <- tibble::tribble(
 ~nombre, ~ocupacion, ~aniversario,
 "Juan", "Arquitecto", "25/10/1945",
 "María", "Presidenta", "17/10/1968",
 "Ruperto", "Maestro", "23/10/1975",
 "Germán", "dibujante", "9 de abril de 1936",
 "Josefina", "Contadora", "6 de enero de 1982"
 )

nombres_ocupacion_aniversario
## # A tibble: 5 x 3
##   nombre   ocupacion  aniversario
##   <chr>    <chr>      <chr>
## 1 Juan     Arquitecto 25/10/1945
## 2 María    Presidenta 17/10/1968
## 3 Ruperto  Maestro    23/10/1975
## 4 Germán   dibujante  9 de abril de 1936
## 5 Josefina Contadora  6 de enero de 1982
```

---
class: middle

background-image: url(https://github.com/tidyverse/glue/raw/master/man/figures/logo.png)
background-position: 95% 3%
background-size: 10%

# Making _pasting_ easier

```r
library(glue) # for 'programmatic' pasting
library(dplyr) # to transform and manipulate variables
library(lubridate) # to parse the dates

nombres_ocupacion_aniversario %>% 
  mutate(aniversario = dmy(aniversario)) %>% 
  mutate(texto = glue("Mi nombre es {nombre}.Trabajo de {ocupacion}. Nací el {format(aniversario, '%A, %d de %B de %Y')}")) %>% 
  pull(texto)
## Mi nombre es Juan.Trabajo de Arquitecto. Nací el jueves, 25 de octubre de 1945
## Mi nombre es María.Trabajo de Presidenta. Nací el jueves, 17 de octubre de 1968
## Mi nombre es Ruperto.Trabajo de Maestro. Nací el jueves, 23 de octubre de 1975
## Mi nombre es Germán.Trabajo de dibujante. Nací el jueves, 09 de abril de 1936
## Mi nombre es Josefina.Trabajo de Contadora. Nací el miércoles, 06 de enero de 1982
```

---

# Programming with base R

```r
df <- tibble::tibble(
 a = rnorm(10),
 b = rnorm(10),
 c = rnorm(10),
 d = rnorm(10)
)

df
## # A tibble: 10 x 4
##          a       b      c       d
##      <dbl>   <dbl>  <dbl>   <dbl>
##  1 -0.186   1.45    1.04   0.381
##  2 -1.00   -0.791  -0.245 -1.88
##  3 -0.551   0.422   0.289  0.387
##  4  1.22    0.608  -2.34  -0.274
##  5 -0.0109  0.392  -2.17  -0.979
##  6 -1.90   -0.913   0.880  0.404
##  7 -0.263   1.03    0.407 -0.0694
##  8 -1.25    0.150  -0.705  0.0641
##  9 -1.31    0.949  -0.708 -0.235
## 10 -1.59   -0.0960  0.539 -1.23
```

---

# Programming with base R

```r
df$a <- (df$a - min(df$a)) /
 (max(df$a) - min(df$a))

df$b <- (df$b - min(df$b)) /
 (max(df$b) - min(df$a))

df$c <- (df$c - min(df$c)) /
 (max(df$c) - min(df$c))

df$d <- (df$d - min(df$d)) /
 (max(df$d) - min(df$d))
```

--
* What are we computing?

--
* Where is the error?

> **You should consider writing a function whenever you have copied and pasted a block of code more than twice** - [**R4DS**](https://es.r4ds.hadley.nz/funciones.html#cu%C3%A1ndo-deber%C3%ADas-escribir-una-funci%C3%B3n)

---

# Programming with base R

```r
x <- df$a

(x - min(x)) / (max(x) - min(x))
##  [1] 0.5496595 0.2891946 0.4328676 1.0000000 0.6054950 0.0000000 0.5247648
##  [8] 0.2102301 0.1903936 0.1003444
```

```r
rng <- range(x)

(x - rng[1]) / (rng[2] - rng[1])
##  [1] 0.5496595 0.2891946 0.4328676 1.0000000 0.6054950 0.0000000 0.5247648
##  [8] 0.2102301 0.1903936 0.1003444
```

```r
rescale01 <- function(x) {
 rng <- range(x, na.rm = TRUE)
 (x - rng[1]) / (rng[2] - rng[1])
}

rescale01(c(22, 50, 10, 32))
## [1] 0.30 1.00 0.00 0.55
```
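
Once the function exists, it can replace the four copied-and-pasted assignments. A sketch using base R's `lapply()`, which is one of several possible ways to iterate; the seed and the use of `data.frame()` are choices made only for this example:

```r
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

set.seed(123) # arbitrary seed so the example is reproducible
df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

# Apply the same function to every column; df[] keeps the data frame structure
df[] <- lapply(df, rescale01)

# Every column now runs from 0 to 1
col_ranges <- sapply(df, range)
```

Because the formula is written once, a typo in one of the copies can no longer slip in.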

---
background-image: url(https://politicaargentina.github.io/electorAr/reference/figures/logo.png)
background-position: 95% 3%
background-size: 10%

## `{electorAr}` data

```r
library(electorAr)
tucuman_dip_gral_2017 %>% 
 get_names()
## # A tibble: 6 x 9
## # Groups:   codprov [1]
##   category round  year codprov name_prov electores listas   votos nombre_lista
##   <chr>    <chr> <dbl> <chr>   <chr>         <dbl> <chr>    <dbl> <chr>
## 1 dip      gral   2017 23      TUCUMAN     1217274 0180    154930 FUERZA REPUBL~
## 2 dip      gral   2017 23      TUCUMAN     1217274 0503     46609 FRENTE DE IZQ~
## 3 dip      gral   2017 23      TUCUMAN     1217274 0521    319221 CAMBIEMOS PAR~
## 4 dip      gral   2017 23      TUCUMAN     1217274 0548    459257 FRENTE JUSTIC~
## 5 dip      gral   2017 23      TUCUMAN     1217274 blancos   5920 blancos
## 6 dip      gral   2017 23      TUCUMAN     1217274 nulos    12947 nulos
```

---

background-image: url(https://politicaargentina.github.io/electorAr/reference/figures/logo.png)
background-position: 95% 3%
background-size: 10%

## % of votes

```r
library(electorAr)
library(dplyr)

tucuman_dip_gral_2017 %>% 
 get_names() %>% 
 transmute(nombre_lista, votos, 
* pct = round(votos/sum(votos)*100,1))
## # A tibble: 6 x 4
## # Groups:   codprov [1]
##   codprov nombre_lista                               votos   pct
##   <chr>   <chr>                                      <dbl> <dbl>
## 1 23      FUERZA REPUBLICANA                        154930  15.5
## 2 23      FRENTE DE IZQUIERDA Y DE LOS TRABAJADORES  46609   4.7
## 3 23      CAMBIEMOS PARA EL BICENTENARIO            319221  32
## 4 23      FRENTE JUSTICIALISTA POR TUCUMAN          459257  46
## 5 23      blancos                                     5920   0.6
## 6 23      nulos                                      12947   1.3
```

---

background-image: url(https://github.com/PoliticaArgentina/electorAr/raw/main/man/figures/logo.png)
background-position: 95% 3%
background-size: 10%

## `function()`
### generalizing the % calculation for a vector

```r
calcular_pct <- function(data){
 
* round(data/sum(data)*100,1)
}
```
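
A quick check of the function on a plain numeric vector, outside any data frame. The vector reuses the vote counts shown on the earlier `{electorAr}` slide; the object names `votos` and `pct` are chosen only for this sketch:

```r
calcular_pct <- function(data){
  round(data/sum(data)*100,1)
}

# Vote counts copied from the Tucumán 2017 table shown earlier
votos <- c(154930, 46609, 319221, 459257, 5920, 12947)

pct <- calcular_pct(votos)
pct
## [1] 15.5  4.7 32.0 46.0  0.6  1.3
```

Because each share is rounded separately, the results need not add up to exactly 100.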

---

background-image: url(https://github.com/PoliticaArgentina/electorAr/raw/main/man/figures/logo.png)
background-position: 95% 3%
background-size: 10%

## % of votes
###  `calcular_pct(data)`

```r
datos <- electorAr::tucuman_dip_gral_2017
datos %>% 
 get_names() %>% 
 dplyr::transmute(nombre_lista,
* pct = calcular_pct(data = votos))
## # A tibble: 6 x 3
## # Groups:   codprov [1]
##   codprov nombre_lista                                pct
##   <chr>   <chr>                                     <dbl>
## 1 23      FUERZA REPUBLICANA                         15.5
## 2 23      FRENTE DE IZQUIERDA Y DE LOS TRABAJADORES   4.7
## 3 23      CAMBIEMOS PARA EL BICENTENARIO             32
## 4 23      FRENTE JUSTICIALISTA POR TUCUMAN           46
## 5 23      blancos                                     0.6
## 6 23      nulos                                       1.3
```