```{R}
library(tidyverse)
library(HumanitiesDataAnalysis)
```

What’s the difference between using `%in%` and `str_detect` to filter down a dataset by a string? (Hint: try seeing how they both behave on some very *short* strings.)

1. Start by just editing some code. The code below finds the first date that appears in this collection. Edit it to find the minimum **age** in the set.

```{R}
crews |>
  drop_na(date) |>
  summarize(min = min(date))
```

2. Use `filter` to determine: what is the name of that youngest person? When did he or she sail?

```{R}

```

3. How many sailors left on ‘Barks’ between 1850 and 1880? Chain together `filter` and `summarize` with the special `n()` function. Note that this requires several different conditions in the filter statement. You could chain several filters in a row, but you can also put multiple conditions in a single `filter` call by separating them with commas. For instance, `filter(school=="NYU",year==2020)` might be a valid filter on some dataset (though not this one). To filter by date you may need to use a function like `as.Date` on your input.

```{R}

```

Question 3 told you how many sailors left on barks in those years. How many distinct voyages left? The variable `Voyage.number` identifies distinct voyages in this set. (This may require reading some documentation: reach out to me or a classmate if you can’t figure it out. There are at least two ways: one involves using the `dplyr` function `distinct` before summarizing, and the second involves using the functions `length` and `unique` in your call to `summarize`.)

```{R}

```

Change the code above to count the distinct “Residence” locations in the dataset. Then add two more pipes to the end to arrange by count.

1. Try to get a sense of what is in the books set based on some keyword searches. Can you get a sense of what the biases of this subset of the catalog are? Here are a couple of examples having to do with geographic terms in subjects; you’d probably do better to explore some other kind of resource.