Exploring R Consortium ISC Grants

data wrangling

data visualization

tidytuesday

plotly

Tableau

A contribution to the 2024-02-20 #tidytuesday social data project

Author

Collin K. Berke, Ph.D.

Published

February 26, 2024

library(tidyverse)
library(plotly)
library(skimr)
library(tidytext)
library(here)
library(scales)

Background

I’ve never really contributed to tidytuesday. Recently, I’ve been trying to spark some inspiration, so I thought contributing to this social data project would be a good start. I also used this post as an opportunity to get more comfortable using plotly and Tableau for creating data visualizations.

data_isc_grants <- 
  read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-02-20/isc_grants.csv')

Data description

The data represents information about past projects funded by the R Consortium Infrastructure Steering Committee (ISC) Grant Program. The purpose of these grants is to support projects contributing to the R community. Learn more about the most recent round of funding by checking out their blog post announcing this round of grants.

The data includes columns like: year, group (i.e., funding cycle), title, funded (i.e., funding amount), and summary. Before creating some data visualizations, let’s do some quick exploratory analysis.

glimpse(data_isc_grants)

Rows: 85
Columns: 7
$ year        <dbl> 2023, 2023, 2023, 2023, 2023, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, …
$ group       <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, …
$ title       <chr> "The future of DBI (extension 1)", "Secure TLS Communications for R", "volcalc…
$ funded      <dbl> 10000, 10000, 12265, 3000, 15750, 8000, 8000, 22000, 6000, 25000, 15000, 20000…
$ proposed_by <chr> "Kirill Müller", "Charlie Gao", "Kristina Riemer", "Mark Padgham", "Jon Harmon…
$ summary     <chr> "This proposal mostly focuses on the maintenance and support for {DBI}, the {D…
$ website     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

skim(data_isc_grants)

Data summary
Name	data_isc_grants
Number of rows	85
Number of columns	7
_______________________
Column type frequency:
character	4
numeric	3
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
title	0	1.00	4	120	85
proposed_by	0	1.00	8	63	66
summary	0	1.00	31	2210	85
website	33	0.61	21	224	48

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
year	1	2019.14	2.08	2016	2017	2019	2021	2023	▇▆▇▅▅
group	1	1.40	0.49	1	1	1	2	2	▇▁▁▁▅
funded	1	13781.14	11325.80	0	6000	10000	16000	62400	▇▂▁▁▁

What’s the trend for grant funding?

Let’s take a look at the funding trend by funding cycle (i.e., fall and spring).

data_by_year_grp <- data_isc_grants |>
  mutate(group = case_when(
    group == 1 ~ "Spring", 
    group == 2 ~ "Fall")
  ) |>
  group_by(year, group) |>
  summarise(funded = sum(funded), .groups = "drop") |>
  arrange(group, year) |> 
  pivot_wider(names_from = group, values_from = funded)
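
To make the reshaping step a bit more concrete, here is a minimal sketch on a small made-up tibble. The `toy_grants` values are invented for illustration and are not taken from the ISC data. It shows how summing by year and funding cycle and then pivoting wider leaves one row per year with one column per cycle, and how a cycle with no grants in a given year ends up as `NA`.

```r
library(dplyr)
library(tidyr)

# Hypothetical data, invented for illustration only
toy_grants <- tibble(
  year   = c(2021, 2021, 2021, 2022),
  group  = c(1, 1, 2, 1),
  funded = c(1000, 2000, 500, 4000)
)

toy_wide <- toy_grants |>
  # Same labelling as above: 1 is the spring cycle, 2 is the fall cycle
  mutate(group = if_else(group == 1, "Spring", "Fall")) |>
  group_by(year, group) |>
  summarise(funded = sum(funded), .groups = "drop") |>
  # One column per funding cycle; missing year/cycle pairs become NA
  pivot_wider(names_from = group, values_from = funded)

toy_wide
```

In this toy case 2022 has no fall grants, so its `Fall` value is `NA`, and plotly would leave a gap in that line rather than draw a zero.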

plot_ly(
  data_by_year_grp, 
  x = ~year, 
  y = ~Fall, 
  name = "Fall", 
  type = 'scatter', 
  mode = 'lines',
  line = list(width = 5),
  # paste0() plus dollar() avoids the stray space in "$ 10,000"
  text = ~paste0(
    "Funding awarded: ", dollar(Fall),
    "<br>Year: ", year
  ),
  hoverinfo = "text"
) |>
add_trace(
  y = ~Spring,
  name = "Spring",
  text = ~paste0(
    "Funding awarded: ", dollar(Spring),
    "<br>Year: ", year
  ),
  hoverinfo = "text"
) |>
layout(
  title = list(
    text = "<b>Funding trend for R Consortium ISC grants by funding round</b>",
    xanchor = "center",
    yanchor = "top",
    font = list(family = "arial", size = 24)
  ),
  xaxis = list(title = ""),
  yaxis = list(title = "Funding amount ($US)")
)

What words are used most often within descriptions of funded projects?

Now, let’s explore which words appear most often in the summaries of awarded grant applications.

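data_word_fund_trend <- data_isc_grants |>
  # Lowercase the summaries, then strip punctuation and digits
  mutate(
    summary = str_remove_all(str_to_lower(summary), "[[:punct:]]"),
    summary = str_remove_all(summary, "[0-9]")
  ) |>
  # One row per word, with common stop words dropped
  unnest_tokens(word, summary) |>
  anti_join(get_stopwords(), by = "word") |>
  # Yearly counts per word, then a running total across years
  count(year, word) |>
  arrange(word, year) |>
  group_by(word) |>
  mutate(n_cume = cumsum(n))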
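
As a rough illustration of what this pipeline produces, here is a minimal sketch on three made-up summaries. Both `toy_summaries` and the one-word `toy_stopwords` list are hypothetical stand-ins for the real data and for `get_stopwords()`. Since `unnest_tokens()` lowercases text and strips punctuation by default, the sketch skips the manual cleaning step.

```r
library(dplyr)
library(tidytext)

# Hypothetical summaries, invented for illustration only
toy_summaries <- tibble(
  year    = c(2021, 2021, 2022),
  summary = c("Improve the package.", "Package testing tools", "The package docs")
)

# Hypothetical stand-in for get_stopwords()
toy_stopwords <- tibble(word = "the")

toy_counts <- toy_summaries |>
  unnest_tokens(word, summary) |>           # one lowercase word per row
  anti_join(toy_stopwords, by = "word") |>  # drop stop words
  count(year, word) |>                      # mentions per word per year
  arrange(word, year) |>
  group_by(word) |>
  mutate(n_cume = cumsum(n)) |>             # running total across years
  ungroup()

filter(toy_counts, word == "package")
```

Here "package" shows up twice in 2021 and once in 2022, so its cumulative count goes from 2 to 3. Note that a word only gets a row in years where it actually appears.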

top_words <- data_word_fund_trend |>
  ungroup() |>
  summarise(top = quantile(n_cume, .99)) |>
  pull(top)

data_top_words <- data_word_fund_trend |>
  filter(n_cume >= top_words) |>
  distinct(word)
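
The cutoff logic can be sketched on a tiny made-up table. The `toy_trend` values are invented, and I use the 75th percentile instead of the 99th only so that such a small example has something above the cutoff. A word counts as a top word if any of its yearly cumulative counts reaches the cutoff, which is why `distinct()` is needed to collapse repeated matches.

```r
library(dplyr)

# Hypothetical cumulative counts, invented for illustration only
toy_trend <- tibble(
  word   = c("a", "a", "b", "b", "c"),
  year   = c(2021, 2022, 2021, 2022, 2022),
  n_cume = c(1, 2, 5, 10, 3)
)

# Cutoff taken across every word-year row, not per word
toy_cutoff <- toy_trend |>
  summarise(top = quantile(n_cume, 0.75)) |>
  pull(top)

# "b" matches in both years, so distinct() keeps it once
toy_top_words <- toy_trend |>
  filter(n_cume >= toy_cutoff) |>
  distinct(word)

toy_top_words
```

Because the quantile is computed over all word-year rows, the number of highlighted words depends on how the cumulative counts are distributed, not on a fixed top-N.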

plot_ly(
  data = data_word_fund_trend, 
  x = ~year,
  y = ~n_cume,
  type = "scatter",
  mode = "lines",
  line = list(color = "#d3d3d3", width = 3),
  name = "",
  text = ~paste(
    "Word: ", word,
    "<br>Cumulative mentions: ", n_cume,
    "<br>Year: ", year
  ),
  hoverinfo = "text"
) |>
add_lines(
  # Overlay the top words in a darker color
  data = data_word_fund_trend |> semi_join(data_top_words, by = "word"),
  x = ~year,
  y = ~n_cume,
  line = list(color = "#0C2D48", width = 3),
  name = ""
) |>
layout(
  title = list(
    text = "<b>Aiming for R Consortium grant funding? Consider using these words</b>",
    xanchor = "center",
    yanchor = "top",
    font = list(family = "arial", size = 24)
  ),
  xaxis = list(title = ""),
  yaxis = list(title = "Cumulative mentions"),
  showlegend = FALSE
)

An attempt using Tableau

I also took this week’s data as an opportunity to learn more about Tableau. Here’s what I came up with.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@misc{berke2024,
  author = {Berke, Collin K},
  title = {Exploring {R} {Consortium} {ISC} {Grants}},
  date = {2024-02-26},
  langid = {en}
}

For attribution, please cite this work as:

Berke, Collin K. 2024. “Exploring R Consortium ISC Grants.” February 26, 2024.