library(tidyverse)
library(here)
library(skimr)
library(tidytext)
library(plotly)
Exploring data from the Fiscal Sponsor Directory
Background
I’m a little behind on this submission. My time to focus on #tidytuesday contributions has been limited recently. Nevertheless, here’s my submission for the 2024-03-12 data set.
This week’s data comes from the Fiscal Sponsor Directory. In short, this directory is a listing of groups supporting non-profits through the fiscal sponsorship of projects. I was unfamilar with this space, so I found the Fiscal Sponsor Directory’s About Us page helpful.
Why the Fiscal Sponsorship Directory this week? Well, the organizer of #tidytuesday is the R4DS Online Learning Community, a group I actively participate in. This group has been on the search for a new fiscal sponsor recently. The aim, thus, was to lean on the community to create data visualizations that may be helpful in identifying another fiscal sponsor for the group. So, below is what I came up with.
Before getting to my contribution, let’s take a moment to explore the data.
<- read_csv(
data_sponsor_dir here(
"blog",
"posts",
"2024-03-22-tidytuesday-2024-03-12-fiscal-sponsor-directory",
"fiscal_sponsor_directory.csv"
) )
Rows: 370 Columns: 12
── Column specification ────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (9): details_url, name, website, fiscal_sponsorship_fee_description, eligibility_criteria, p...
dbl (3): year_501c3, year_fiscal_sponsor, n_sponsored
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Data description
This week was tough. The data was pretty dirty, and I relied heavily on string processing for data wrangling. I think, however, I’ve come up with something that is a little more informative than just a listing of the different sponsorship groups.
glimpse(data_sponsor_dir)
Rows: 370
Columns: 12
$ details_url <chr> "https://fiscalsponsordirectory.org/?page_id=599", "htt…
$ name <chr> "1st Note Music Foundation", "50CAN, Inc.", "The Abunda…
$ website <chr> "www.1stnote.org", "50can.org", "abundancenc.org", "acc…
$ year_501c3 <dbl> 2012, 2011, 2006, 2014, 2007, 1992, 2008, 2002, 1989, 1…
$ year_fiscal_sponsor <dbl> 2012, 2016, 2007, 2017, 2013, 1997, 2009, 2018, 2004, 1…
$ n_sponsored <dbl> 2, 10, 20, 6, 2, 1, NA, 7, 1, 15, 130, 60, 5, 13, 20, 1…
$ fiscal_sponsorship_fee_description <chr> "We charge a 7% administrative fee for most grants and …
$ eligibility_criteria <chr> "Type of service: Music related projects", "Aligned mis…
$ project_types <chr> "Arts and culture: Music Instruments to kids", "Educati…
$ services <chr> "Auditing: Grants", "Auditing|Bill paying|Bookkeeping/a…
$ fiscal_sponsorship_model <chr> "Model C, Preapproved Grant Relationship", "Model A, Di…
$ description <chr> "1st Note Music Foundation Inc. is a nonprofit public s…
skim(data_sponsor_dir)
Name | data_sponsor_dir |
Number of rows | 370 |
Number of columns | 12 |
_______________________ | |
Column type frequency: | |
character | 9 |
numeric | 3 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
details_url | 0 | 1.00 | 47 | 49 | 0 | 370 | 0 |
name | 0 | 1.00 | 3 | 64 | 0 | 368 | 0 |
website | 7 | 0.98 | 7 | 70 | 0 | 362 | 0 |
fiscal_sponsorship_fee_description | 20 | 0.95 | 2 | 901 | 0 | 324 | 0 |
eligibility_criteria | 7 | 0.98 | 21 | 2039 | 0 | 303 | 0 |
project_types | 9 | 0.98 | 9 | 1244 | 0 | 324 | 0 |
services | 14 | 0.96 | 9 | 1852 | 0 | 307 | 0 |
fiscal_sponsorship_model | 86 | 0.77 | 7 | 499 | 0 | 64 | 0 |
description | 31 | 0.92 | 70 | 1665 | 0 | 337 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
year_501c3 | 6 | 0.98 | 1997.62 | 17.18 | 1903 | 1985.75 | 2001 | 2012 | 2022 | ▁▁▂▅▇ |
year_fiscal_sponsor | 15 | 0.96 | 2005.15 | 13.68 | 1957 | 1998.00 | 2009 | 2016 | 2023 | ▁▂▂▅▇ |
n_sponsored | 13 | 0.96 | 42.68 | 90.87 | 0 | 4.00 | 12 | 45 | 850 | ▇▁▁▁▁ |
head(data_sponsor_dir$project_types)
[1] "Arts and culture: Music Instruments to kids"
[2] "Education"
[3] "Children, youth and families|Economic development|Education|Environment/sustainable growth"
[4] "Arts and culture|Children, youth and families|Education|Festivals and events|Health|GBTQ|Mental health|Open space/Parks|People or communities of color/minorities| Social services|Other: Do you have a project idea that increases access to music and recreation? Our mission is to foster connection and understanding through pleasurable experiences like music and recreation to inspire the creation of safe, inclusive, equitable communities."
[5] "Arts and culture|Children, youth and families: Mentor Develop Programs, Youth Empowerment Programs, Grant-funded Programs. Family Empowerment and Educational Programs.|Education: Student Success Strategy Programs; Adult Re-Entry Into Education Projects.|Festivals and events: African Diaspora History Festivals and Events|People or communities of color/minorities: Diversity Impact Programs. African-American and African Diaspora Immigrant Projects|Youth development: Youth Empowerment, Job Readiness, Early Career Development,"
[6] "Arts and culture|Children, youth and families|Disaster relief|Education|Festivals and events|People or communities of color/minorities|Women|Youth development"
Data wrangling
The first step in the data wrangling process was to clean up the string data in the project_types
column. I wanted to use this as a dimension to filter out fiscal sponsor potentially relevant to the the R4DS community. Take note, I used a regular expression to remove string values after the other
in the column. These free text responses would have made it harder to filter data on this dimension.
# Remove any string text after 'other'
<- data_sponsor_dir |>
data_sponsor_dir select(
details_url,
name,
year_fiscal_sponsor,
n_sponsored,
project_types,
website|>
) mutate(
project_types = str_to_lower(project_types),
project_types = str_remove(project_types, ":.*")
)
The next step was to tokenize project_types
’ categories into it’s own rows. I did this by using the unnest_tokens()
function from the tidytext
package.
# What are the unique categories?
<- data_sponsor_dir |>
data_sponsor_cat unnest_tokens(cat, project_types, token = 'regex', pattern = "\\|")
head(unique(data_sponsor_cat$cat), n = 10)
[1] "arts and culture" "education"
[3] "children, youth and families" "economic development"
[5] "environment/sustainable growth" "festivals and events"
[7] "health" "gbtq"
[9] "mental health" "open space/parks"
While wrangling this data, I learned you can pass a regex
pattern to tokenize when using unnest_tokens()
. I found this to be useful in this case.
Outputted above are 10 example categories found in the project_types
column. Looking through all of these categories, the following might be fiscal sponsors whose purpose aligns with that of the R4DS community. These include the following categories:
conference/event management
education environment/sustainable growth
education
economic development
organizational development
However, before I go about creating a visualization using these categories, let’s see if there’s enough data within each category to make the visualization informative.
<- c(
cat_filter "education",
"education environment/sustainable growth",
"economic development",
"conference/event management",
"organizational development"
)
|>
data_sponsor_cat filter(cat %in% cat_filter) |>
count(cat, sort = TRUE)
# A tibble: 5 × 2
cat n
<chr> <int>
1 education 186
2 economic development 144
3 organizational development 6
4 conference/event management 1
5 education environment/sustainable growth 1
Indeed, some categories don’t have enough data. Really, the only two categories worth plotting would be ‘economic development’ and ‘education’. So, let’s filter for just these two categories. Let’s also drop NA
values for simplicity sake.
<- c(
cat_filter "education",
"economic development"
)
<- data_sponsor_cat |>
data_cat_filter filter(cat %in% cat_filter) |>
mutate(cat = str_to_title(cat)) |>
arrange(name, cat) |>
drop_na() |>
select(-details_url)
Creating a Box and Whisker plot
Given I had a numeric variable, n_sponsored
, I thought a Box and Whisker plot split by the two categories would be informative. It would certainly help identify fiscal sponsors who support many or very little projects based on the types of projects they support. Another thing I had to do was log the n_sponsored
column. When I first plotted the untransformed variables, it was challenging to see the distribution of values. Logging n_sponsored
made it easier to see the values. However, the hover tool provides the untransformed value for each fiscal sponsor in the data set.
There will be duplicates in this visualization, as some sponsors will support both education and economic development focused projects.
plot_ly(type = "box") |>
add_boxplot(
data = data_cat_filter |> filter(cat == "Education"),
x = ~log(n_sponsored),
y = ~cat,
boxpoints = "all",
name = "Education",
color = I("#189AB4"),
marker = list(color = "#189AB4"),
line = list(color = "#000000"),
text = ~paste(
"Sponsor: ", name,
"<br>Projects: ", n_sponsored,
"<br>Website: ", website
),hoverinfo = "text"
|>
) add_boxplot(
data = data_cat_filter |>
filter(cat == "Economic Development"),
x = ~log(n_sponsored),
y = ~cat,
boxpoints = "all",
name = "Economic development",
color = I("#191970"),
marker = list(color = "#191970"),
line = list(color = "#000000"),
text = ~paste(
"Sponsor: ", name,
"<br>Projects: ", n_sponsored,
"<br>Website: ", website
),hoverinfo = "text"
|>
) layout(
title = "<b>Distribution of the number of projects (logged) supported by fiscal sponsors",
yaxis = list(title = ""),
xaxis = list(title = "Projects sponsored on log scale")
)
Warning: Can't display both discrete & non-discrete data on same axis
Not bad. The only thing I ran out of time on was related to the hover tool. I really wanted separate hovers, one for the five number summary in the box and whisker plot and one for the individual data points. Unfortunately, I wasn’t able to figure out how to do this with the time I had. Oh well, what resulted was still a useful data visualization, given where we started with the data.
So there you have it. Not the cleanest data to work with. Nonetheless, we came up with a visualization we could still learn something from.
An attempt using Tableau
To continue developing my skills and to practice using other data visualization tools, I created this same visualization using Tableau. You can check out this version of the visualization here.
Reuse
Citation
@misc{berke2024,
author = {Berke, Collin K},
title = {Exploring Data from the {Fiscal} {Sponsor} {Directory}},
date = {2024-03-22},
langid = {en}
}