```{R}
library(tidyverse)
library(HumanitiesDataAnalysis)
```
(Optional) Change a SQL code block to count something other than title.
```{R}
SELECT title, COUNT(*) AS N
FROM names GROUP BY title
ORDER BY N DESC
LIMIT 8
```
The following code in SQL does a simple select.
```{R}
SELECT title, COUNT(*) AS N
FROM names GROUP BY pid
ORDER BY N DESC
LIMIT 8;
```
Rewrite it in R. The code below takes part of preparing the table itself
for work, which corresponds to “FROM names” above; the rest, you’ll have
to do yourself and put into a sensible order.
```{R}
pleiades = pleiades_db()
pleiades_db() |>
tbl("names") |>
collect() |>
KEEP_GOING()
```
```{R}
pleiades = pleiades_db()
pleiades_db() |>
tbl("names") |>
collect() |>
KEEP_GOING()
```
We can use joins and nests to make datasets much larger. We’ve looked a
bit before at the “subjects” field in our books dataset. Now we’ll split
it up using an unnest operation.
```{R}
all_subjects = books |>
mutate(subject = str_split(subjects, " -- |///")) |>
select(lccn, subject)
all_subjects |> head(3)
all_subjects |> head(10) |> unnest(subject)
```
There are standard census divisions in the US for representing states as
part of *regions*. You can find these easily online: the code below
reads one in.
Using an inner join and a summarize function, plot the urban population
by *region* or *division*. For example, you might have four lines:
North, South, East, and West.
```{R}
divisions = read_csv("https://raw.githubusercontent.com/cphalpert/census-regions/master/us%20census%20bureau%20regions%20and%20divisions.csv")
long_cities = CESTA |>
pivot_longer(`1790`:`2010`, names_to = "year", values_to = "population")
```
```{R}
divisions = read_csv("https://raw.githubusercontent.com/cphalpert/census-regions/master/us%20census%20bureau%20regions%20and%20divisions.csv")
long_cities = CESTA |>
pivot_longer(`1790`:`2010`, names_to = "year", values_to = "population")
```