Exploring R Consortium ISC Grants

```r
library(tidyverse)
library(plotly)
library(skimr)
library(tidytext)
library(here)
library(scales)
```

Background
I’ve never really contributed to tidytuesday. Recently, I’ve been trying to spark some inspiration, so I thought contributing to this social data project would be a good start. I used this post as an opportunity to get more comfortable using plotly and Tableau to create data visualizations.
```r
data_isc_grants <-
  read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-02-20/isc_grants.csv')
```

Data description
The data represents information about past projects funded by the R Consortium Infrastructure Steering Committee (ISC) Grant Program. These grants support projects that contribute to the R community. Learn more about the most recent round of funding by checking out their blog post announcing this round of grants.
The data has seven columns: year, group (i.e., funding cycle), title, funded (i.e., funding amount), proposed_by, summary, and website. Before creating some data visualizations, let’s do some quick exploratory analysis.
```r
glimpse(data_isc_grants)
```

```
Rows: 85
Columns: 7
$ year        <dbl> 2023, 2023, 2023, 2023, 2023, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, …
$ group       <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, …
$ title       <chr> "The future of DBI (extension 1)", "Secure TLS Communications for R", "volcalc…
$ funded      <dbl> 10000, 10000, 12265, 3000, 15750, 8000, 8000, 22000, 6000, 25000, 15000, 20000…
$ proposed_by <chr> "Kirill Müller", "Charlie Gao", "Kristina Riemer", "Mark Padgham", "Jon Harmon…
$ summary     <chr> "This proposal mostly focuses on the maintenance and support for {DBI}, the {D…
$ website     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
```
```r
skim(data_isc_grants)
```

| | |
|---|---|
| Name | data_isc_grants |
| Number of rows | 85 |
| Number of columns | 7 |
| _______________________ | |
| Column type frequency: | |
| character | 4 |
| numeric | 3 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| title | 0 | 1.00 | 4 | 120 | 0 | 85 | 0 |
| proposed_by | 0 | 1.00 | 8 | 63 | 0 | 66 | 0 |
| summary | 0 | 1.00 | 31 | 2210 | 0 | 85 | 0 |
| website | 33 | 0.61 | 21 | 224 | 0 | 48 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1 | 2019.14 | 2.08 | 2016 | 2017 | 2019 | 2021 | 2023 | ▇▆▇▅▅ |
| group | 0 | 1 | 1.40 | 0.49 | 1 | 1 | 1 | 2 | 2 | ▇▁▁▁▅ |
| funded | 0 | 1 | 13781.14 | 11325.80 | 0 | 6000 | 10000 | 16000 | 62400 | ▇▂▁▁▁ |
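The `complete_rate` column in these tables is the share of non-missing values, 1 - n_missing / n. As a minimal base R sketch, the `complete_rate()` helper and the toy vector below are hypothetical: the vector only mirrors the website row's counts (85 rows, 33 missing), not its actual URLs. It reproduces the 0.61 reported for website:

```r
# Hypothetical helper mirroring how skimr reports complete_rate:
# the share of values in x that are not missing
complete_rate <- function(x) {
  1 - sum(is.na(x)) / length(x)
}

# Toy stand-in for the website column: 52 present values, 33 missing (85 rows)
website_toy <- c(rep("https://example.org", 52), rep(NA_character_, 33))

round(complete_rate(website_toy), 2)
#> [1] 0.61
```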
What’s the trend for grant funding?
Let’s take a look at the funding trend by funding cycle (i.e., fall and spring).
```r
data_by_year_grp <- data_isc_grants |>
  mutate(group = case_when(
    group == 1 ~ "Spring",
    group == 2 ~ "Fall")
  ) |>
  group_by(year, group) |>
  summarise(funded = sum(funded), .groups = "drop") |>
  arrange(group, year) |>
  pivot_wider(names_from = group, values_from = funded)
```
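The recode, summarise, and pivot steps turn one row per grant into one row per year, with a column for each funding round. A minimal sketch on hypothetical toy values (not the real grant data) shows the shape this produces, including what happens when a year has only one round:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same year/group/funded columns
toy_grants <- tibble(
  year   = c(2022, 2022, 2022, 2023),
  group  = c(1, 2, 2, 1),
  funded = c(1000, 2000, 3000, 4000)
)

toy_by_year_grp <- toy_grants |>
  # recode the numeric funding cycle into readable labels
  mutate(group = case_when(
    group == 1 ~ "Spring",
    group == 2 ~ "Fall"
  )) |>
  # total funding per year and cycle
  group_by(year, group) |>
  summarise(funded = sum(funded), .groups = "drop") |>
  # one column per cycle, one row per year
  pivot_wider(names_from = group, values_from = funded)

toy_by_year_grp
```

Because the toy 2023 has no Fall grants, its `Fall` value is `NA`; plotly by default leaves a gap in a line trace at missing values rather than connecting across them.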
```r
plot_ly(
  data_by_year_grp,
  x = ~year,
  y = ~Fall,
  name = "Fall",
  type = 'scatter',
  mode = 'lines',
  line = list(width = 5),
  # paste0() avoids the stray space paste() would add after "$"
  text = ~paste0(
    "Funding awarded: $", comma(Fall),
    "<br>Year: ", year
  ),
  hoverinfo = "text"
) |>
  add_trace(
    y = ~Spring,
    name = "Spring",
    text = ~paste0(
      "Funding awarded: $", comma(Spring),
      "<br>Year: ", year
    ),
    hoverinfo = "text"
  ) |>
  layout(
    title = list(
      text = "<b>Funding trend for R Consortium ISC grants by funding round</b>",
      xanchor = "center",
      yanchor = "top",
      font = list(family = "arial", size = 24)
    ),
    xaxis = list(title = ""),
    yaxis = list(title = "Funding amount ($US)")
  )
```

What words are used most often within descriptions of funded projects?
Now, let’s explore which words appear most often in the summaries of funded projects.
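```r
data_word_fund_trend <- data_isc_grants |>
  mutate(
    # lowercase, then drop punctuation and digits
    summary = str_remove_all(str_to_lower(summary), "[[:punct:]]"),
    summary = str_remove_all(summary, "[0-9]")
  ) |>
  # one row per word
  unnest_tokens(word, summary) |>
  # drop common English stopwords
  anti_join(get_stopwords(), by = "word") |>
  group_by(year) |>
  count(word) |>
  arrange(word, year) |>
  # running total of mentions for each word across years
  group_by(word) |>
  mutate(
    n_cume = cumsum(n)
  )
```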
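```r
# Cut-off: the 99th percentile of cumulative mentions across all word-year rows
top_words <- data_word_fund_trend |>
  ungroup() |>
  summarise(top = quantile(n_cume, .99)) |>
  pull(top)

# Words whose cumulative mentions reach that cut-off
data_top_words <- data_word_fund_trend |>
  filter(n_cume >= top_words) |>
  distinct(word)
```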
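The cut-off is the 99th percentile of `n_cume` across every word-year row, and a word is highlighted if any of its rows reaches it. A minimal sketch with hypothetical cumulative counts (`toy_trend`, `cutoff`, and `toy_top_words` are illustrative names) shows the mechanics; R's default `quantile()` (type 7) interpolates between the two largest values here, giving 39.95:

```r
library(dplyr)

# Hypothetical cumulative counts for a handful of words
toy_trend <- tibble(
  word   = c("package", "package", "data", "r", "r", "tools"),
  year   = c(2021, 2022, 2022, 2021, 2022, 2022),
  n_cume = c(5, 12, 3, 20, 41, 1)
)

# Cut-off: the 99th percentile of all cumulative counts
cutoff <- toy_trend |>
  summarise(top = quantile(n_cume, 0.99)) |>
  pull(top)

# Words whose running total reaches the cut-off in at least one year
toy_top_words <- toy_trend |>
  filter(n_cume >= cutoff) |>
  distinct(word)

toy_top_words
```

Since `n_cume` never decreases within a word, checking any year is equivalent to checking the word's final running total. With little data, the 99th percentile sits just below the maximum, so very few words pass.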
```r
# Gray lines: cumulative mentions for every word
plot_ly(
  data = data_word_fund_trend,
  x = ~year,
  y = ~n_cume,
  type = "scatter",
  mode = "lines",
  line = list(color = "#d3d3d3", width = 3),
  name = "",
  text = ~paste0(
    "Word: ", word,
    "<br>Cumulative mentions: ", n_cume,
    "<br>Year: ", year
  ),
  hoverinfo = "text"
) |>
  # Dark blue lines: words at or above the 99th-percentile cut-off
  add_lines(
    data = data_word_fund_trend |> semi_join(data_top_words, by = "word"),
    x = ~year,
    y = ~n_cume,
    line = list(color = "#0C2D48", width = 3),
    type = "scatter",
    mode = "lines",
    name = ""
  ) |>
  layout(
    title = list(
      text = "<b>Aiming for R Consortium grant funding? Consider using these words</b>",
      xanchor = "center",
      yanchor = "top",
      font = list(family = "arial", size = 24)
    ),
    xaxis = list(title = ""),
    yaxis = list(title = "Cumulative mentions"),
    showlegend = FALSE
  )
```

An attempt using Tableau
I also took this week’s data as an opportunity to learn more about Tableau. Here’s what I came up with.
Citation

```bibtex
@misc{berke2024,
  author = {Berke, Collin K},
  title = {Exploring {R} {Consortium} {ISC} {Grants}},
  date = {2024-02-26},
  langid = {en}
}
```