library(tidyverse)
library(plotly)
library(skimr)
library(tidytext)
library(here)
library(scales)
Exploring R Consortium ISC Grants
Background
I’ve never really contributed to tidytuesday. Recently, I’ve been trying to spark some inspiration, so I thought contributing to this social data project would be a good start. I used this post as an opportunity to get more comfortable using plotly and Tableau for creating data visualizations.
data_isc_grants <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-02-20/isc_grants.csv')
Data description
The data contains information about past projects funded by the R Consortium Infrastructure Steering Committee (ISC) Grant Program. The purpose of these grants is to support projects contributing to the R community. Learn more about the most recent round of funding by checking out their blog post announcing this round of grants.
The data includes columns like: year, group (i.e., funding cycle), title, funded (i.e., funding amount), and summary. Before creating some data visualizations, let’s do some quick exploratory analysis.
glimpse(data_isc_grants)
Rows: 85
Columns: 7
$ year <dbl> 2023, 2023, 2023, 2023, 2023, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, …
$ group <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, …
$ title <chr> "The future of DBI (extension 1)", "Secure TLS Communications for R", "volcalc…
$ funded <dbl> 10000, 10000, 12265, 3000, 15750, 8000, 8000, 22000, 6000, 25000, 15000, 20000…
$ proposed_by <chr> "Kirill Müller", "Charlie Gao", "Kristina Riemer", "Mark Padgham", "Jon Harmon…
$ summary <chr> "This proposal mostly focuses on the maintenance and support for {DBI}, the {D…
$ website <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
skim(data_isc_grants)
Name | data_isc_grants |
Number of rows | 85 |
Number of columns | 7 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 3 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
title | 0 | 1.00 | 4 | 120 | 0 | 85 | 0 |
proposed_by | 0 | 1.00 | 8 | 63 | 0 | 66 | 0 |
summary | 0 | 1.00 | 31 | 2210 | 0 | 85 | 0 |
website | 33 | 0.61 | 21 | 224 | 0 | 48 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
year | 0 | 1 | 2019.14 | 2.08 | 2016 | 2017 | 2019 | 2021 | 2023 | ▇▆▇▅▅ |
group | 0 | 1 | 1.40 | 0.49 | 1 | 1 | 1 | 2 | 2 | ▇▁▁▁▅ |
funded | 0 | 1 | 13781.14 | 11325.80 | 0 | 6000 | 10000 | 16000 | 62400 | ▇▂▁▁▁ |
What’s the trend for grant funding?
Let’s take a look at the funding trend by funding cycle (i.e., fall and spring).
# Label the funding cycles, total the funding per year and cycle,
# then spread the cycles into their own columns for plotting
data_by_year_grp <- data_isc_grants |>
  mutate(group = case_when(
    group == 1 ~ "Spring",
    group == 2 ~ "Fall"
  )) |>
  group_by(year, group) |>
  summarise(funded = sum(funded), .groups = "drop") |>
  arrange(group, year) |>
  pivot_wider(names_from = group, values_from = funded)
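To make the reshaping step easier to follow, here is a minimal sketch of the same pipeline run on a tiny made-up tibble. The name toy_grants and all of its values are invented for illustration; they are not taken from the ISC data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: values are invented for illustration only
toy_grants <- tibble(
  year   = c(2022, 2022, 2022, 2023),
  group  = c(1, 1, 2, 1),
  funded = c(1000, 2000, 500, 3000)
)

toy_by_year_grp <- toy_grants |>
  # Recode the numeric cycle into a readable label
  mutate(group = case_when(
    group == 1 ~ "Spring",
    group == 2 ~ "Fall"
  )) |>
  # Total funding per year and cycle
  group_by(year, group) |>
  summarise(funded = sum(funded), .groups = "drop") |>
  arrange(group, year) |>
  # One row per year, one column per cycle
  pivot_wider(names_from = group, values_from = funded)

toy_by_year_grp
```

In this toy example, 2023 has no Fall round, so its Fall cell is NA after pivoting. If the real data has a year like that, the corresponding line in the plot would simply have no point for that year.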
plot_ly(
  data_by_year_grp,
  x = ~year,
  y = ~Fall,
  name = "Fall",
  type = 'scatter',
  mode = 'lines',
  line = list(width = 5),
  text = ~paste(
    "Funding awarded: $", comma(Fall),
    "<br>Year: ", year
  ),
  hoverinfo = "text"
) |>
  add_trace(
    y = ~Spring,
    name = "Spring",
    text = ~paste(
      "Funding awarded: $", comma(Spring),
      "<br>Year: ", year
    ),
    hoverinfo = "text"
  ) |>
  layout(
    title = list(
      text = "<b>Funding trend for R Consortium ISC grants by funding round</b>",
      xanchor = "center",
      yanchor = "top",
      font = list(family = "arial", size = 24)
    ),
    xaxis = list(title = ""),
    yaxis = list(title = "Funding amount ($US)")
  )
What words are used most often within descriptions of funded projects?
Now, let’s explore which words appear most often in the summaries of funded projects.
# Clean the summaries, split them into words, drop stop words,
# then build a running total of mentions for each word across years
data_word_fund_trend <- data_isc_grants |>
  mutate(
    summary = str_remove_all(str_to_lower(summary), "[[:punct:]]"),
    summary = str_remove_all(summary, "[0-9]")
  ) |>
  unnest_tokens(word, summary) |>
  anti_join(get_stopwords(), by = "word") |>
  group_by(year) |>
  count(word) |>
  arrange(word, year) |>
  group_by(word) |>
  mutate(
    n_cume = cumsum(n)
  )
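The cumulative count is the least obvious part of this chunk, so here is a small sketch of just that step. It assumes the words have already been tokenized; toy_words and its contents are made up for illustration.

```r
library(dplyr)

# Hypothetical, already-tokenized words: invented for illustration only
toy_words <- tibble(
  year = c(2021, 2021, 2021, 2022, 2023, 2023),
  word = c("data", "data", "package", "data", "package", "package")
)

toy_word_trend <- toy_words |>
  # Mentions of each word within each year
  group_by(year) |>
  count(word) |>
  # Order each word's rows chronologically before accumulating
  arrange(word, year) |>
  group_by(word) |>
  # Running total of mentions per word
  mutate(n_cume = cumsum(n)) |>
  ungroup()

toy_word_trend
```

Sorting by word and year before calling cumsum() matters: the running total only makes sense if each word's rows are in chronological order.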
# Cutoff: the 99th percentile of all cumulative counts
top_words <- data_word_fund_trend |>
  ungroup() |>
  summarise(top = quantile(n_cume, .99)) |>
  pull(top)

# Words whose cumulative count reaches the cutoff in any year
data_top_words <- data_word_fund_trend |>
  filter(n_cume >= top_words) |>
  distinct(word)

# Grey lines for every word, with the top words drawn over them in dark blue
plot_ly(
  data = data_word_fund_trend,
  x = ~year,
  y = ~n_cume,
  line = list(color = "#d3d3d3", width = 3),
  type = "scatter",
  mode = "lines",
  name = "",
  text = ~paste(
    "Word: ", word,
    "<br>Cumulative mentions: ", n_cume,
    "<br>Year: ", year
  ),
  hoverinfo = "text"
) |>
  add_lines(
    data = data_word_fund_trend |> semi_join(data_top_words, by = "word"),
    x = ~year,
    y = ~n_cume,
    line = list(color = "#0C2D48", width = 3),
    type = "scatter",
    mode = "lines",
    name = ""
  ) |>
  layout(
    title = list(
      text = "<b>Aiming for RConsortium grant funding? Consider using these words</b>",
      xanchor = "center",
      yanchor = "top",
      font = list(family = "arial", size = 24)
    ),
    xaxis = list(title = ""),
    yaxis = list(title = "Cumulative mentions"),
    showlegend = FALSE
  )
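The highlighted words are chosen with a percentile cutoff, which is easy to miss in the chunk above. Here is a rough sketch of that selection on made-up cumulative counts; toy_trend, toy_cutoff, and toy_top_words are hypothetical names, and the numbers are invented.

```r
library(dplyr)

# Hypothetical cumulative counts: invented for illustration only
toy_trend <- tibble(
  word   = c("a", "a", "b", "b", "c"),
  year   = c(2021, 2022, 2021, 2022, 2022),
  n_cume = c(1, 2, 5, 50, 3)
)

# 99th percentile across every word-year cumulative count
toy_cutoff <- toy_trend |>
  summarise(top = quantile(n_cume, .99)) |>
  pull(top)

# A word qualifies if it reaches the cutoff in at least one year
toy_top_words <- toy_trend |>
  filter(n_cume >= toy_cutoff) |>
  distinct(word)

# Keep the full history of the qualifying words for the highlighted lines
toy_highlight <- toy_trend |>
  semi_join(toy_top_words, by = "word")

toy_highlight
```

Because the cutoff is applied to cumulative counts, a word only needs to cross it once, and semi_join() then keeps all of that word's years so its whole line can be highlighted.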
An attempt using Tableau
I also used this week’s data as an opportunity to learn more about Tableau. Here’s what I came up with.
Citation
@misc{berke2024,
author = {Berke, Collin K},
title = {Exploring {R} {Consortium} {ISC} {Grants}},
date = {2024-02-26},
langid = {en}
}