```{R} library(tidyverse) library(HumanitiesDataAnalysis) ``` (Optional) Change a SQL code block to count something other than title. ```{R} SELECT title, COUNT(*) AS N FROM names GROUP BY title ORDER BY N DESC LIMIT 8 ``` The following code in SQL does a simple select. ```{R} SELECT title, COUNT(*) AS N FROM names GROUP BY pid ORDER BY N DESC LIMIT 8; ``` Rewrite it in R. The code below takes part of preparing the table itself for work, which corresponds to “FROM names” above; the rest, you’ll have to do yourself and put into a sensible order.
```{R} pleiades = pleiades_db() pleiades_db() |> tbl("names") |> collect() |> KEEP_GOING() ``` ```{R} pleiades = pleiades_db() pleiades_db() |> tbl("names") |> collect() |> KEEP_GOING() ```
We can use joins and nests to make datasets much larger. We’ve looked a bit before at the “subjects” field in our books dataset. Now we’ll split it up using an unnest operation. ```{R} all_subjects = books |> mutate(subject = str_split(subjects, " -- |///")) |> select(lccn, subject) all_subjects |> head(3) all_subjects |> head(10) |> unnest(subject) ``` There are standard census divisions in the US for representing states as part of *regions*. You can find these easily online: the code below reads one in. Using an inner join and a summarize function, plot the urban population by *region* or *division*. For example, you might have four lines: North, South, East, and West. ```{R} divisions = read_csv("https://raw.githubusercontent.com/cphalpert/census-regions/master/us%20census%20bureau%20regions%20and%20divisions.csv") long_cities = CESTA |> pivot_longer(`1790`:`2010`, names_to = "year", values_to = "population") ``` ```{R} divisions = read_csv("https://raw.githubusercontent.com/cphalpert/census-regions/master/us%20census%20bureau%20regions%20and%20divisions.csv") long_cities = CESTA |> pivot_longer(`1790`:`2010`, names_to = "year", values_to = "population") ```