# Quantitative Tools for Political Analysis
## [CP44] Master's in Political Science
### Juan Pablo Ruiz Nicolini
### Universidad Torcuato Di Tella
### 13/10/2020

---

## SESSION 5
### Taming the Data (II) & Programming (Intro)

#### [/MetodosCiPol/](https://tuqmano.github.io/MetodosCiPol/)

#### [/MetodosCiPol/](https://github.com/TuQmano/MetodosCiPol)

---
class: inverse, middle, center

# Taming the Data
### (Part II)

---

background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/stringr.png)
background-position: 95% 5%
background-size: 10%

# Taming the Data II

## Strings

* Functions for manipulating individual characters within the strings of a character vector (_e.g._, **`str_sub(string = x, start = 1, end = 4)`**).

* Tools for adding, removing, and manipulating whitespace (_e.g._, **`str_pad(string = x, width = 2, side = "left", pad = "0")`**).

* Functions that detect pattern matches, such as _regular expressions_ ([_regex_](https://stringr.tidyverse.org/articles/regular-expressions.html)):
**`str_detect(string = x, pattern = "[[:digit:]]")`**

[`{stringr}`](https://stringr.tidyverse.org/articles/stringr.html)
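
The three families of functions above can be sketched on a toy example. The vectors `codprov` and `etiquetas` are hypothetical and only serve as illustration; the calls themselves are standard `{stringr}` functions.

```r
library(stringr)

# Hypothetical province codes that lost their leading zero on import
codprov <- c("1", "23", "7")
padded <- str_pad(string = codprov, width = 2, side = "left", pad = "0")
padded
## [1] "01" "23" "07"

# Hypothetical free-text labels
etiquetas <- c("mesa 0001", "sin dato")

# Keep the first four characters of each string
prefijo <- str_sub(string = etiquetas, start = 1, end = 4)

# Does each string contain at least one digit?
tiene_digitos <- str_detect(string = etiquetas, pattern = "[[:digit:]]")
tiene_digitos
## [1]  TRUE FALSE
```

Note that `pad` is passed as the string `"0"`, and that the digit class is written inside brackets (`"[[:digit:]]"`); `"\\d"` would be an equivalent pattern.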

---
background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/lubridate.png)
background-position: 95% 5%
background-size: 10%

# Taming the Data II

## Dates and times

`{lubridate}` includes a wide range of functions to **(a) *parse* dates and times**; **(b) create and extract information**; (c) handle time zones (_tz_); and even compute time intervals and do _time arithmetic_.

```r
library(lubridate) # (a)

dmy("6 de octubre de 2020")
## [1] "2020-10-06"
```

```r
library(lubridate) # (b)

today() + 365
## [1] "2021-10-13"
```

[`{lubridate}`](https://lubridate.tidyverse.org/index.html)
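
As a complement, here is a minimal sketch of extracting components and doing date arithmetic. The two dates are just examples written in numeric formats, so the parsing does not depend on the session locale (unlike month names such as "octubre").

```r
library(lubridate)

# Two example dates, parsed from different numeric formats
paso    <- ymd("2019-08-11")
general <- dmy("27/10/2019")

# Extract components
anio       <- year(general)                    # 2019
mes        <- month(general)                   # 10
dia_semana <- wday(general, week_start = 1)    # 1 = Monday, ..., 7 = Sunday

# Time arithmetic: days elapsed between both dates
dias <- as.numeric(difftime(general, paso, units = "days"))
dias
## [1] 77

# Add a period of 30 days
general_mas_30 <- general + days(30)
general_mas_30
## [1] "2019-11-26"
```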

---

background-image: url(https://github.com/rstudio/hex-stickers/raw/master/PNG/forcats.png)
background-position: 95% 5%
background-size: 10%

# Taming the Data II

## Categorical variables

> *Factors are useful when you have categorical data, variables with a fixed and known set of values, and when you want to display character vectors in non-alphabetical order* **R4DS - <https://es.r4ds.hadley.nz/factores.html>**

* `fct_reorder()` > changes the order of the levels

* `fct_recode()` > changes the values of the levels (not their order)

* `fct_collapse()` > collapsing is useful for recoding many levels at once
--

* `fct_lump()` > lumps infrequent levels together
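
A small sketch of the four functions listed above, using made-up data: the factor `partido` and the vectors `lista` and `votos` are hypothetical and only illustrate the calls.

```r
library(forcats)

# Hypothetical party labels
partido <- factor(c("PJ", "UCR", "PRO", "PJ", "FIT", "PJ", "UCR", "MST"))

# Rename one level by hand (new name = old name)
recodificado <- fct_recode(partido, "Justicialista" = "PJ")

# Merge several levels into a single new one
colapsado <- fct_collapse(partido, Izquierda = c("FIT", "MST"))

# Keep the 2 most frequent levels and lump the rest into "Other"
agrupado <- fct_lump(partido, n = 2)
table(agrupado)

# Reorder the levels of one variable by the values of another
lista <- factor(c("A", "B", "C"))
votos <- c(10, 30, 20)
reordenado <- fct_reorder(lista, votos)
levels(reordenado)
## [1] "A" "C" "B"
```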

---

### Miscellaneous

#### **DB** and **Relational Tables**

##### `extra_data_and_script/misc.R`

##### `extra_data_and_script/manipulate_twitter_data.R`

---

# Programming (Intro)

---

## References

* [_Pipes_, Functions, Vectors, and Iteration](https://es.r4ds.hadley.nz/programar-intro.html), in **Wickham and Grolemund**

---

# Programming with `base R`

```r
df <- tibble::tibble(
 a = rnorm(10),
 b = rnorm(10),
 c = rnorm(10),
 d = rnorm(10)
)

df
## # A tibble: 10 x 4
## a b c d
## <dbl> <dbl> <dbl> <dbl>
## 1 0.812 0.0731 1.52 -0.0295 
## 2 -0.294 0.171 -0.0504 -0.316 
## 3 0.485 -1.12 1.30 1.63 
## 4 1.45 -0.307 0.0126 -0.826 
## 5 -0.547 -0.331 1.03 -0.124 
## 6 1.70 1.09 -1.50 -1.06 
## 7 0.797 0.514 -0.109 -1.71 
## 8 -1.02 1.36 0.289 -0.415 
## 9 -0.264 -0.766 0.435 0.742 
## 10 -0.0613 0.996 -0.599 0.00737
```

---

# Programming with `base R`

```r
df$a <- (df$a - min(df$a)) /
 (max(df$a) - min(df$a))

df$b <- (df$b - min(df$b)) /
 (max(df$b) - min(df$a))

df$c <- (df$c - min(df$c)) /
 (max(df$c) - min(df$c))

df$d <- (df$d - min(df$d)) /
 (max(df$d) - min(df$d))

```

--
* What are we computing?

--
* Where is the error?

> **You should consider writing a function whenever you have copied and pasted a block of code more than twice** - [**R4DS**](https://es.r4ds.hadley.nz/funciones.html#cu%C3%A1ndo-deber%C3%ADas-escribir-una-funci%C3%B3n)

---

# Programming with `base R`

```r

x <- df$a
(x - min(x)) / (max(x) - min(x))
## [1] 0.6734004 0.2667015 0.5531452 0.9098550 0.1732851 1.0000000 0.6678752
## [8] 0.0000000 0.2773905 0.3521686
```

```r
rng <- range(x)
(x - rng[1]) / (rng[2] - rng[1])
## [1] 0.6734004 0.2667015 0.5531452 0.9098550 0.1732851 1.0000000 0.6678752
## [8] 0.0000000 0.2773905 0.3521686
```

```r
rescale01 <- function(x) {
 rng <- range(x, na.rm = TRUE)
 (x - rng[1]) / (rng[2] - rng[1])
}
rescale01(c(22, 50, 10, 32))
## [1] 0.30 1.00 0.00 0.55
```
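
Once the function exists, it can replace the four copy-pasted blocks. The following is a minimal sketch, assuming a data frame of numeric columns like the `df` used earlier (rebuilt here with a fixed seed so the example is self-contained).

```r
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

set.seed(123)  # arbitrary seed, only for reproducibility
df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

# Apply the same function to every column; df[] keeps the data frame structure
df[] <- lapply(df, rescale01)

# Every column now ranges from 0 to 1
rangos <- sapply(df, range)
rangos
```

Because the column name appears only once per call, the kind of copy-paste slip shown in the previous slide cannot occur.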

---
background-image: url(https://github.com/electorArg/polAr/raw/master/hex/hex-polAr.png?raw=true)
background-position: 95% 5%
background-size: 10%

## `{polAr}` data

```r

library(polAr)

tucuman_dip_gral_2017 %>% 
 get_names()
## # A tibble: 6 x 9
## # Groups: codprov [1]
## category round year codprov name_prov electores listas votos nombre_lista 
## <chr> <chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <chr> 
## 1 dip gral 2017 23 TUCUMAN 1217274 0180 154930 FUERZA REPUBL~
## 2 dip gral 2017 23 TUCUMAN 1217274 0503 46609 FRENTE DE IZQ~
## 3 dip gral 2017 23 TUCUMAN 1217274 0521 319221 CAMBIEMOS PAR~
## 4 dip gral 2017 23 TUCUMAN 1217274 0548 459257 FRENTE JUSTIC~
## 5 dip gral 2017 23 TUCUMAN 1217274 blancos 5920 blancos 
## 6 dip gral 2017 23 TUCUMAN 1217274 nulos 12947 nulos
```

---

background-image: url(https://github.com/electorArg/polAr/raw/master/hex/hex-polAr.png?raw=true)
background-position: 95% 5%
background-size: 10%

## % of votes

```r

library(polAr)
library(dplyr)

tucuman_dip_gral_2017 %>% 
 get_names() %>% 
 transmute(nombre_lista, votos, 
* pct = round(votos/sum(votos)*100,1))
## # A tibble: 6 x 4
## # Groups: codprov [1]
## codprov nombre_lista votos pct
## <chr> <chr> <dbl> <dbl>
## 1 23 FUERZA REPUBLICANA 154930 15.5
## 2 23 FRENTE DE IZQUIERDA Y DE LOS TRABAJADORES 46609 4.7
## 3 23 CAMBIEMOS PARA EL BICENTENARIO 319221 32 
## 4 23 FRENTE JUSTICIALISTA POR TUCUMAN 459257 46 
## 5 23 blancos 5920 0.6
## 6 23 nulos 12947 1.3
```

---

background-image: url(https://github.com/electorArg/polAr/raw/master/hex/hex-polAr.png?raw=true)
background-position: 95% 5%
background-size: 10%

## `function()` 
### generalizing the % calculation to a vector

```r
calcular_pct <- function(data){
 
* round(data/sum(data)*100,1)

}
```
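
The function can be tried on a plain numeric vector before using it inside a pipeline. The sketch below reuses the vote counts printed in the previous slide; `calcular_pct2()`, with its `digits` and `na.rm` arguments, is a hypothetical variant added only to illustrate how the function could be extended.

```r
calcular_pct <- function(data) {
  round(data / sum(data) * 100, 1)
}

# Vote counts taken from the Tucumán 2017 table shown earlier
votos <- c(154930, 46609, 319221, 459257, 5920, 12947)

pct <- calcular_pct(votos)
pct
## [1] 15.5  4.7 32.0 46.0  0.6  1.3

# Hypothetical variant: configurable rounding and optional NA handling
calcular_pct2 <- function(data, digits = 1, na.rm = FALSE) {
  round(data / sum(data, na.rm = na.rm) * 100, digits)
}

pct_na <- calcular_pct2(c(votos, NA), na.rm = TRUE)
```

Since each share is rounded separately, the rounded percentages need not add up to exactly 100.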

---

background-image: url(https://github.com/electorArg/polAr/raw/master/hex/hex-polAr.png?raw=true)
background-position: 95% 5%
background-size: 10%

## % of votes
### `calcular_pct(data)`

```r

datos <- polAr::tucuman_dip_gral_2017

datos %>% 
 get_names() %>% 
 dplyr::transmute(nombre_lista,
* pct = calcular_pct(data = votos))
## # A tibble: 6 x 3
## # Groups: codprov [1]
## codprov nombre_lista pct
## <chr> <chr> <dbl>
## 1 23 FUERZA REPUBLICANA 15.5
## 2 23 FRENTE DE IZQUIERDA Y DE LOS TRABAJADORES 4.7
## 3 23 CAMBIEMOS PARA EL BICENTENARIO 32 
## 4 23 FRENTE JUSTICIALISTA POR TUCUMAN 46 
## 5 23 blancos 0.6
## 6 23 nulos 1.3
```

---
background-image: url(https://github.com/tidyverse/magrittr/raw/master/man/figures/logo.png)
background-position: 95% 5%
background-size: 10%

# "*This is not a pipe*"

### A recipe

```r

the_data <-
 read.csv('/path/to/data/file.csv') %>%
 subset(variable_a > x) %>%
 transform(variable_c = variable_a/variable_b) %>%
 head(100)

```
--
* A sequence of commands or instructions

* Reads from left to right

* Minimizes (i) nested function calls and (ii) the creation of intermediate objects

* Makes it easier to modify the sequence and to add steps in the middle of it

[{magrittr}](https://magrittr.tidyverse.org/)
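
To make the contrast concrete, the recipe above can be written both ways on a dataset that ships with R. This is only a sketch: `mtcars` and the derived column `wt_per_hp` are stand-ins chosen for illustration, not part of the course data.

```r
library(magrittr)

# Nested version: must be read from the inside out
anidado <- head(
  transform(
    subset(mtcars, mpg > 20),
    wt_per_hp = wt / hp
  ),
  5
)

# Piped version: reads from left to right, top to bottom
con_pipe <- mtcars %>%
  subset(mpg > 20) %>%
  transform(wt_per_hp = wt / hp) %>%
  head(5)

# Both versions produce the same result
identical(anidado, con_pipe)
## [1] TRUE
```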

---

background-image: url(https://github.com/tidyverse/glue/raw/master/man/figures/logo.png)
background-position: 95% 5%
background-size: 10%

# Making _pasting_ easier

```
## Mi nombre es TuQmano. 
## Trabajo de Cientista de Datos. 
## Nací el jueves 15 de septiembre de 1983
```

[{glue}](https://glue.tidyverse.org/) 
[and alternatives](https://trinkerrstuff.wordpress.com/2013/09/15/paste-paste0-and-sprintf-2/) such as `paste()`, `paste0()`, and `sprintf()`.

```r
library(glue)

nombre <- "TuQmano"
ocupacion <- "Cientista de Datos"
aniversario <- as.Date("1983-09-15")

# Weekday and month names follow the session locale (Spanish in the output above)
glue("Mi nombre es {nombre}. 
     Trabajo de {ocupacion}.
     Nací el {format(aniversario, '%A %d de %B de %Y')}")
```
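
For comparison, the same sentence can be built with the base R alternatives mentioned above. This is a minimal sketch; the date is left out so the result does not depend on the session locale.

```r
library(glue)

nombre <- "TuQmano"
ocupacion <- "Cientista de Datos"

# {glue}: variables are interpolated inside the string
con_glue <- glue("Mi nombre es {nombre}. Trabajo de {ocupacion}.")

# paste0(): pieces are concatenated without a separator
con_paste0 <- paste0("Mi nombre es ", nombre, ". Trabajo de ", ocupacion, ".")

# sprintf(): %s placeholders are filled in order
con_sprintf <- sprintf("Mi nombre es %s. Trabajo de %s.", nombre, ocupacion)

con_sprintf
## [1] "Mi nombre es TuQmano. Trabajo de Cientista de Datos."
```

All three produce the same text; `glue()` keeps the template readable because each variable name sits where its value will appear.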

---

background-image: url(https://github.com/tidyverse/purrr/raw/master/man/figures/logo.png)
background-position: 95% 5%
background-size: 10%

# Iteration

[{purrr}](https://purrr.tidyverse.org/)

**<https://www.gerkelab.com/blog/2018/09/import-directory-csv-purrr-readr/>**
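
Before moving to files, the basic idea can be sketched with a `for` loop and its `{purrr}` equivalent. The list `votos_por_lista` is hypothetical toy data.

```r
library(purrr)

# Hypothetical vote counts per list across three polling stations
votos_por_lista <- list(
  lista_a = c(113, 101, 106),
  lista_b = c(138, 138, 139)
)

# With a for loop: pre-allocate the output, then fill it element by element
totales_loop <- numeric(length(votos_por_lista))
for (i in seq_along(votos_por_lista)) {
  totales_loop[i] <- sum(votos_por_lista[[i]])
}

# With map_dbl(): apply sum() to each element and return a numeric vector
totales_map <- map_dbl(votos_por_lista, sum)
totales_map
## lista_a lista_b 
##     320     415
```

`map_dbl()` also keeps the element names, which the loop version drops.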

---

background-image: url(https://github.com/tidyverse/purrr/raw/master/man/figures/logo.png)
background-position: 95% 5%
background-size: 10%

# Iteration 
## Example: reading multiple files

[ **Claus Wilke**: _Reading and combining many tidy data files in R_](https://clauswilke.com/blog/2016/06/13/reading-and-combining-many-tidy-data-files-in-r/)

```r
library(readr)  # for read_csv()
library(dplyr)  # for mutate()
library(tidyr)  # for unnest()
library(purrr)  # for map()

# `pattern` is a regular expression, not a glob
files <- dir(path = "datos/00.PRESIDENCIAL/", pattern = "\\.csv$", full.names = TRUE)

files
##  [1] "datos/00.PRESIDENCIAL/arg_presi_balota2015.csv"
##  [2] "datos/00.PRESIDENCIAL/arg_presi_gral2019.csv"  
##  [3] "datos/00.PRESIDENCIAL/arg_presi_paso2019.csv"  
##  [4] "datos/00.PRESIDENCIAL/presi_balota2015.csv"    
##  [5] "datos/00.PRESIDENCIAL/presi_gral2003.csv"      
##  [6] "datos/00.PRESIDENCIAL/presi_gral2007.csv"      
##  [7] "datos/00.PRESIDENCIAL/presi_gral2011.csv"      
##  [8] "datos/00.PRESIDENCIAL/presi_gral2015.csv"      
##  [9] "datos/00.PRESIDENCIAL/presi_paso2011.csv"      
## [10] "datos/00.PRESIDENCIAL/presi_paso2015.csv"
```

---

background-image: url(https://github.com/tidyverse/purrr/raw/master/man/figures/logo.png)
background-position: 95% 5%
background-size: 10%

# Iteration 
## Example: reading multiple files

```r

data <- files %>%
 map_dfr(.f = read_csv)

data
## # A tibble: 888,023 x 52
## codprov depto coddepto circuito mesa electores blancos nulos `0131` `0135`
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 01 Comu~ 001 0001 0001 343 7 6 113 138
## 2 01 Comu~ 001 0001 0002 346 6 0 101 138
## 3 01 Comu~ 001 0001 0003 345 3 7 106 139
## 4 01 Comu~ 001 0001 0004 344 8 10 105 134
## 5 01 Comu~ 001 0001 0005 343 2 7 110 125
## 6 01 Comu~ 001 0001 0006 344 9 0 125 133
## 7 01 Comu~ 001 0001 0007 344 5 7 122 120
## 8 01 Comu~ 001 0001 0008 344 4 3 106 145
## 9 01 Comu~ 001 0001 0009 346 5 4 119 126
## 10 01 Comu~ 001 0001 0010 344 3 3 102 146
## # ... with 888,013 more rows, and 42 more variables: `00024` <dbl>,
## # `00036` <dbl>, `00037` <dbl>, `00039` <dbl>, `00050` <dbl>, `00108` <dbl>,
## # `00001` <dbl>, `00008` <dbl>, `00005` <dbl>, `00010` <dbl>, `00011` <dbl>,
## # `00009` <dbl>, `00002` <dbl>, `00051` <dbl>, `00004` <dbl>, `0001` <dbl>,
## # `0003` <dbl>, `0005` <dbl>, `0014` <dbl>, `0022` <dbl>, `0030` <dbl>,
## # `0037` <dbl>, `0050` <dbl>, `0051` <dbl>, `0053` <dbl>, `0132` <dbl>,
## # `0133` <dbl>, `0134` <dbl>, `0136` <dbl>, `0137` <dbl>, `0138` <dbl>,
## # `0023` <dbl>, `0038` <dbl>, `0048` <dbl>, `0056` <dbl>, `0057` <dbl>,
## # `0059` <dbl>, `0060` <dbl>, `0254` <dbl>, `0047` <dbl>, `0013` <dbl>,
## # `0081` <dbl>
```

---

background-image: url(https://github.com/tidyverse/purrr/raw/master/man/figures/logo.png)
background-position: 95% 5%
background-size: 10%

# Iteration 
## Example: reading multiple files

```r
data <- tibble(filename = files) %>%         # create a tibble holding the file names
 mutate(file_contents = map(filename,        # read each file into
                            ~ read_csv(.))   # a new list-column
 ) 
data
## # A tibble: 10 x 2
## filename file_contents 
## <chr> <list> 
## 1 datos/00.PRESIDENCIAL/arg_presi_balota2015.csv <tibble [94,956 x 10]> 
## 2 datos/00.PRESIDENCIAL/arg_presi_gral2019.csv <tibble [100,057 x 14]>
## 3 datos/00.PRESIDENCIAL/arg_presi_paso2019.csv <tibble [98,834 x 18]> 
## 4 datos/00.PRESIDENCIAL/presi_balota2015.csv <tibble [94,956 x 10]> 
## 5 datos/00.PRESIDENCIAL/presi_gral2003.csv <tibble [62,323 x 26]> 
## 6 datos/00.PRESIDENCIAL/presi_gral2007.csv <tibble [72,350 x 25]> 
## 7 datos/00.PRESIDENCIAL/presi_gral2011.csv <tibble [85,935 x 15]> 
## 8 datos/00.PRESIDENCIAL/presi_gral2015.csv <tibble [96,339 x 14]> 
## 9 datos/00.PRESIDENCIAL/presi_paso2011.csv <tibble [85,936 x 15]> 
## 10 datos/00.PRESIDENCIAL/presi_paso2015.csv <tibble [96,337 x 19]>
```

---

background-image: url(https://github.com/tidyverse/purrr/raw/master/man/figures/logo.png)
background-position: 95% 5%
background-size: 10%

# Iteration 
## Example: reading multiple files

```r

data %>% 
 unnest(cols = file_contents)
## # A tibble: 888,023 x 53
## filename codprov depto coddepto circuito mesa electores blancos nulos `0131`
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 datos/0~ 01 Comu~ 001 0001 0001 343 7 6 113
## 2 datos/0~ 01 Comu~ 001 0001 0002 346 6 0 101
## 3 datos/0~ 01 Comu~ 001 0001 0003 345 3 7 106
## 4 datos/0~ 01 Comu~ 001 0001 0004 344 8 10 105
## 5 datos/0~ 01 Comu~ 001 0001 0005 343 2 7 110
## 6 datos/0~ 01 Comu~ 001 0001 0006 344 9 0 125
## 7 datos/0~ 01 Comu~ 001 0001 0007 344 5 7 122
## 8 datos/0~ 01 Comu~ 001 0001 0008 344 4 3 106
## 9 datos/0~ 01 Comu~ 001 0001 0009 346 5 4 119
## 10 datos/0~ 01 Comu~ 001 0001 0010 344 3 3 102
## # ... with 888,013 more rows, and 43 more variables: `0135` <dbl>,
## # `00024` <dbl>, `00036` <dbl>, `00037` <dbl>, `00039` <dbl>, `00050` <dbl>,
## # `00108` <dbl>, `00001` <dbl>, `00008` <dbl>, `00005` <dbl>, `00010` <dbl>,
## # `00011` <dbl>, `00009` <dbl>, `00002` <dbl>, `00051` <dbl>, `00004` <dbl>,
## # `0001` <dbl>, `0003` <dbl>, `0005` <dbl>, `0014` <dbl>, `0022` <dbl>,
## # `0030` <dbl>, `0037` <dbl>, `0050` <dbl>, `0051` <dbl>, `0053` <dbl>,
## # `0132` <dbl>, `0133` <dbl>, `0134` <dbl>, `0136` <dbl>, `0137` <dbl>,
## # `0138` <dbl>, `0023` <dbl>, `0038` <dbl>, `0048` <dbl>, `0056` <dbl>,
## # `0057` <dbl>, `0059` <dbl>, `0060` <dbl>, `0254` <dbl>, `0047` <dbl>,
## # `0013` <dbl>, `0081` <dbl>
```
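
The whole workflow can be reproduced without the course data. The sketch below is an assumption-laden toy version: it writes two small hypothetical CSV files to a temporary folder, then reads and stacks them while recording the source file in a `filename` column through the `.id` argument of `map_dfr()`. `map_dfr()` relies on `{dplyr}` being installed.

```r
library(readr)
library(purrr)

# Temporary folder with two hypothetical files (names and contents are made up)
tmp <- file.path(tempdir(), "presi_demo")
dir.create(tmp, showWarnings = FALSE)

write_csv(data.frame(mesa = c("0001", "0002"), lista_a = c(113, 101)),
          file.path(tmp, "presi_gral2011.csv"))
write_csv(data.frame(mesa = "0001", lista_b = 138),
          file.path(tmp, "presi_gral2015.csv"))

# List the files; `pattern` is a regular expression
files <- dir(path = tmp, pattern = "\\.csv$", full.names = TRUE)

# Name each path so that .id can store the file name next to its rows
combinado <- map_dfr(
  set_names(files, basename(files)),
  read_csv,
  col_types = cols(mesa = col_character()),  # keep leading zeros in `mesa`
  .id = "filename"
)

combinado
```

Columns that exist in only one file are filled with `NA` for the rows coming from the other, which is why the combined election table above ends up with so many columns.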