library(tidyverse)
library(skimr)
library(plotly)
library(here)
library(janitor)
library(tidymodels)
tidymodels_prefer()
Exploring the relationship between trash processed by Mr. Trash Wheel and precipitation
Say hello to Mr. Trash Wheel and friends
For this week's #tidytuesday, we're looking into data related to Mr. Trash Wheel and friends. Mr. Trash Wheel is a semi-autonomous trash interceptor whose main purpose is to collect trash floating into the Baltimore Inner Harbor. Mr. Trash Wheel is a pretty neat invention. If you're interested in how it works, check out the information found here.
My curiosity was piqued when I came across the statement that most of the trash collected by Mr. Trash Wheel is the result of water runoff, and not from people disposing of trash directly into the harbor. So, for my contribution this week, I wanted to explore the relationship between precipitation and the amount of trash collected by Mr. Trash Wheel and friends.
In this post, I created my visualizations using plotly and Tableau.
data_trash <- read_csv(
  here(
    "blog",
    "posts",
    "2024-03-12-tidytuesday-2024-03-05-mr-trash-wheel",
    "trashwheel.csv"
  )
)
Rows: 993 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ID, Name, Month, Date
dbl (12): Dumpster, Year, Weight, Volume, PlasticBottles, Polystyrene, CigaretteButts, GlassBott...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
data_balt_precip <- read_csv(
  here(
    "blog",
    "posts",
    "2024-03-12-tidytuesday-2024-03-05-mr-trash-wheel",
    "balt_precip.csv"
  )
)
Rows: 10 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (13): year, january, february, march, april, may, june, july, august, september, october, no...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Data description
The data contains observations related to trash collected from 2014 to 2023 by multiple trash wheels. The Baltimore precipitation data came from a tool found here. I simply copied and pasted this data into a Google Sheet and saved it as a .csv file. Further wrangling steps for both data sets are included below.
To get a better sense of what's in the data, I did a quick glimpse() and skim() of both the data_trash and data_balt_precip data sets.
glimpse(data_trash)
Rows: 993
Columns: 16
$ ID             <chr> "mister", "mister", "mister", "mister", "mister", "mister", "mister", "mist…
$ Name           <chr> "Mister Trash Wheel", "Mister Trash Wheel", "Mister Trash Wheel", "Mister T…
$ Dumpster       <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, …
$ Month          <chr> "May", "May", "May", "May", "May", "May", "May", "May", "June", "June", "Ju…
$ Year           <dbl> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 201…
$ Date           <chr> "5/16/2014", "5/16/2014", "5/16/2014", "5/17/2014", "5/17/2014", "5/20/2014…
$ Weight         <dbl> 4.31, 2.74, 3.45, 3.10, 4.06, 2.71, 1.91, 3.70, 2.52, 3.76, 3.43, 4.17, 5.1…
$ Volume         <dbl> 18, 13, 15, 15, 18, 13, 8, 16, 14, 18, 15, 19, 15, 15, 15, 15, 13, 15, 15, …
$ PlasticBottles <dbl> 1450, 1120, 2450, 2380, 980, 1430, 910, 3580, 2400, 1340, 740, 950, 530, 84…
$ Polystyrene    <dbl> 1820, 1030, 3100, 2730, 870, 2140, 1090, 4310, 2790, 1730, 869, 1140, 630, …
$ CigaretteButts <dbl> 126000, 91000, 105000, 100000, 120000, 90000, 56000, 112000, 98000, 130000,…
$ GlassBottles   <dbl> 72, 42, 50, 52, 72, 46, 32, 58, 49, 75, 38, 45, 58, 62, 64, 56, 47, 65, 63,…
$ PlasticBags    <dbl> 584, 496, 1080, 896, 368, 672, 416, 1552, 984, 448, 344, 520, 224, 344, 432…
$ Wrappers       <dbl> 1162, 874, 2032, 1971, 753, 1144, 692, 3015, 1988, 1066, 544, 727, 361, 631…
$ SportsBalls    <dbl> 7, 5, 6, 6, 7, 5, 3, 6, 6, 7, 6, 8, 6, 6, 6, 6, 5, 6, 6, 7, 6, 6, 6, 5, 6, …
$ HomesPowered   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
glimpse(data_balt_precip)
Rows: 10
Columns: 13
$ year <dbl> 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023
$ january <dbl> 2.71, 3.89, 3.50, 2.69, 1.00, 3.15, 3.11, 2.15, 4.27, 1.68
$ february <dbl> 4.58, 2.24, 5.70, 1.46, 5.30, 3.64, 2.98, 4.85, 2.31, 2.18
$ march <dbl> 4.38, 4.67, 2.10, 3.82, 2.25, 4.14, 3.05, 3.90, 3.13, 1.49
$ april <dbl> 8.60, 4.30, 1.31, 3.52, 3.20, 1.46, 5.52, 2.07, 3.92, 4.12
$ may <dbl> 3.35, 2.10, 5.24, 5.64, 8.17, 5.51, 1.76, 3.63, 5.39, 0.55
$ june <dbl> 3.95, 13.09, 3.20, 1.40, 4.77, 2.95, 5.95, 2.75, 2.95, 4.31
$ july <dbl> 2.80, 3.49, 6.09, 7.11, 16.73, 3.85, 3.43, 3.65, 6.25, 6.84
$ august <dbl> 7.90, 2.46, 3.96, 4.60, 3.84, 2.39, 11.81, 4.36, 3.71, 3.73
$ september <dbl> 3.21, 3.25, 4.36, 1.95, 9.19, 0.16, 4.48, 6.04, 3.35, 6.27
$ october <dbl> 4.16, 3.40, 0.78, 2.99, 2.69, 6.21, 4.36, 5.24, 4.66, 1.13
$ november <dbl> 3.36, 2.42, 1.51, 2.15, 8.14, 1.10, 6.35, 1.33, 2.44, 2.80
$ december <dbl> 3.58, 5.85, 2.77, 0.95, 6.54, 3.57, 4.58, 0.82, 4.80, 7.16
skim(data_trash)
Name | data_trash |
Number of rows | 993 |
Number of columns | 16 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 12 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
ID | 0 | 1 | 6 | 9 | 0 | 4 | 0 |
Name | 0 | 1 | 18 | 21 | 0 | 4 | 0 |
Month | 0 | 1 | 3 | 9 | 0 | 14 | 0 |
Date | 0 | 1 | 6 | 10 | 0 | 623 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
---|---|---|---|---|---|---|---|---|---|
Dumpster | 0 | 1.00 | 230.88 | 185.82 | 1.00 | 73.00 | 176.00 | 381.00 | 629.00 |
Year | 0 | 1.00 | 2019.57 | 2.75 | 2014.00 | 2018.00 | 2020.00 | 2022.00 | 2023.00 |
Weight | 0 | 1.00 | 2.97 | 0.84 | 0.61 | 2.45 | 3.04 | 3.53 | 5.62 |
Volume | 0 | 1.00 | 14.92 | 1.61 | 5.00 | 15.00 | 15.00 | 15.00 | 20.00 |
PlasticBottles | 1 | 1.00 | 2219.33 | 1650.45 | 0.00 | 987.50 | 1900.00 | 2900.00 | 9830.00 |
Polystyrene | 1 | 1.00 | 1436.87 | 1832.43 | 0.00 | 240.00 | 750.00 | 2130.00 | 11528.00 |
CigaretteButts | 1 | 1.00 | 13728.12 | 24049.61 | 0.00 | 2900.00 | 4900.00 | 12000.00 | 310000.00 |
GlassBottles | 251 | 0.75 | 20.96 | 15.26 | 0.00 | 10.00 | 18.00 | 28.00 | 110.00 |
PlasticBags | 1 | 1.00 | 984.00 | 1412.34 | 0.00 | 240.00 | 540.00 | 1210.00 | 13450.00 |
Wrappers | 144 | 0.85 | 2238.76 | 2712.85 | 0.00 | 880.00 | 1400.00 | 2490.00 | 20100.00 |
SportsBalls | 364 | 0.63 | 13.59 | 9.74 | 0.00 | 6.00 | 12.00 | 20.00 | 56.00 |
HomesPowered | 0 | 1.00 | 45.85 | 18.23 | 0.00 | 38.00 | 49.00 | 58.00 | 94.00 |
skim(data_balt_precip)
Name | data_balt_precip |
Number of rows | 10 |
Number of columns | 13 |
_______________________ | |
Column type frequency: | |
numeric | 13 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
---|---|---|---|---|---|---|---|---|---|
year | 0 | 1 | 2018.50 | 3.03 | 2014.00 | 2016.25 | 2018.50 | 2020.75 | 2023.00 |
january | 0 | 1 | 2.82 | 1.00 | 1.00 | 2.28 | 2.91 | 3.41 | 4.27 |
february | 0 | 1 | 3.52 | 1.50 | 1.46 | 2.26 | 3.31 | 4.78 | 5.70 |
march | 0 | 1 | 3.29 | 1.07 | 1.49 | 2.45 | 3.47 | 4.08 | 4.67 |
april | 0 | 1 | 3.80 | 2.15 | 1.31 | 2.35 | 3.72 | 4.26 | 8.60 |
may | 0 | 1 | 4.13 | 2.28 | 0.55 | 2.41 | 4.44 | 5.48 | 8.17 |
june | 0 | 1 | 4.53 | 3.26 | 1.40 | 2.95 | 3.58 | 4.65 | 13.09 |
july | 0 | 1 | 6.02 | 4.09 | 2.80 | 3.53 | 4.97 | 6.69 | 16.73 |
august | 0 | 1 | 4.88 | 2.87 | 2.39 | 3.71 | 3.90 | 4.54 | 11.81 |
september | 0 | 1 | 4.23 | 2.51 | 0.16 | 3.22 | 3.86 | 5.65 | 9.19 |
october | 0 | 1 | 3.56 | 1.73 | 0.78 | 2.77 | 3.78 | 4.58 | 6.21 |
november | 0 | 1 | 3.16 | 2.30 | 1.10 | 1.67 | 2.43 | 3.22 | 8.14 |
december | 0 | 1 | 4.06 | 2.16 | 0.82 | 2.97 | 4.08 | 5.59 | 7.16 |
Looking further into the data, I noticed a few things worth keeping in mind:
- There are missing data (i.e., NAs) within several variables: PlasticBottles, Polystyrene, CigaretteButts, GlassBottles, PlasticBags, Wrappers, and SportsBalls. The documentation didn't reference why these were missing, and since I wasn't using these for my contribution, I didn't dig any further.
- The Month column has an issue with capitalization. Some string formatting should fix this issue, though I'm not using this column for my contribution.
- The Date column needed to be transformed into a date. This can be addressed by using some functions from the lubridate package.
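As a rough illustration of the fixes mentioned in the list above, here is a minimal sketch on toy values. The `toy` data frame and its contents are made up for the example and are not rows from the real data; `str_to_title()` is just one way the capitalization could be standardized.

```r
library(stringr)
library(lubridate)

# Toy values standing in for the Month and Date columns (not the real data)
toy <- data.frame(
  Month = c("May", "july", "July", "september"),
  Date = c("5/16/2014", "7/3/2015", "7/9/2016", "9/21/2017"),
  stringsAsFactors = FALSE
)

# Standardize capitalization so "july" and "July" collapse into one value
toy$Month <- str_to_title(toy$Month)

# Parse month/day/year strings into Date objects
toy$Date <- mdy(toy$Date)

# Count missing values per column, as a quick check alongside skim()
na_counts <- colSums(is.na(toy))
```

After this, `unique(toy$Month)` holds three month names instead of four, and `toy$Date` can be passed to date helpers such as `floor_date()`.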
Data wrangling
Now that we have a better sense of the data, let's wrangle it. Below is the code to wrangle both the data_balt_precip and data_trash data sets. Since my precipitation data was aggregated by month, I decided to aggregate the trash data by month.
data_balt_precip <- data_balt_precip |>
  # Reshape from one column per month to one row per year-month
  pivot_longer(cols = january:december, names_to = "month", values_to = "precip") |>
  mutate(
    # Convert month names to month numbers, then build a first-of-month date
    month = match(month, str_to_lower(month.name)),
    day = 1,
    month_date = ymd(str_c(year, month, day, sep = "-"))
  ) |>
  select(
    month_date,
    precip
  )
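data_trash <- data_trash |>
  clean_names() |>
  mutate(
    id,
    name,
    # Parse the m/d/y strings, then floor each date to the first of its month
    date = mdy(date),
    month_date = floor_date(date, "month"),
    dumpster,
    name = str_to_lower(name),
    weight,
    volume,
    # Keep only the columns listed here
    .keep = "none"
  )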
data_trash_summ <- data_trash |>
  group_by(month_date) |>
  summarise(
    total_weight = sum(weight),
    total_volume = sum(volume)
  ) |>
  left_join(data_balt_precip)
Joining with `by = join_by(month_date)`
min(data_trash_summ$month_date)
[1] "2014-05-01"
max(data_trash_summ$month_date)
[1] "2023-12-01"
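To show how the monthly key lines the two tables up, here is a small sketch of the same reshape, aggregate, and join pattern on toy data. The `toy_precip`, `toy_trash`, and `toy_joined` names are hypothetical, and `make_date()` and an explicit `by = "month_date"` are swapped in as alternatives to the string-pasting and implicit join used above.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical wide precipitation table: one row per year, one column per month
toy_precip <- tibble(year = 2014, may = 3.35, june = 3.95) |>
  pivot_longer(cols = may:june, names_to = "month", values_to = "precip") |>
  mutate(
    month = match(month, tolower(month.name)),
    month_date = make_date(year, month, 1)
  ) |>
  select(month_date, precip)

# Hypothetical dumpster-level records, rolled up to the first day of the month
toy_trash <- tibble(
  date = mdy(c("5/16/2014", "5/17/2014", "6/1/2014")),
  weight = c(4.31, 3.10, 2.52)
) |>
  mutate(month_date = floor_date(date, "month")) |>
  group_by(month_date) |>
  summarise(total_weight = sum(weight))

# Naming the key explicitly documents the join and silences the message
toy_joined <- left_join(toy_trash, toy_precip, by = "month_date")
```

Because both tables carry a first-of-month `Date` in `month_date`, each monthly trash total picks up exactly one precipitation value.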
What is the relationship between rainfall and the weight and volume of trash processed by the trash wheels?
To explore this relationship, I created two scatter plots. The first plots precipitation against total weight; the second plots precipitation against total volume. I plotted both because weight and volume capture different aspects of the trash collected. Here's the code to create the two scatter plots using plotly:
plot_ly(
  data = data_trash_summ,
  x = ~precip,
  y = ~total_weight,
  type = "scatter",
  mode = "markers",
  marker = list(
    size = 10,
    color = "#6495ED",
    line = list(
      color = "#151B54",
      width = 2
    )
  ),
  text = ~paste(
    month_date,
    "<br>Precipitation (inches): ", precip,
    "<br>Weight (tons): ", total_weight
  ),
  hoverinfo = "text"
) |>
  plotly::layout(
    title = list(
      text = "<b>More precipitation is related to heavier amounts of trash for Mr. Trash Wheel and friends to process</b>",
      font = list(size = 18),
      xanchor = "center"
    ),
    yaxis = list(
      title = "Total weight of trash (tons)/month",
      titlefont = list(size = 14)
    ),
    xaxis = list(
      title = "Total precipitation in Baltimore (inches)/month",
      titlefont = list(size = 14)
    ),
    font = list(family = "arial", size = 18, face = "bold")
  )
plot_ly(
  data = data_trash_summ,
  x = ~precip,
  y = ~total_volume,
  type = "scatter",
  mode = "markers",
  marker = list(
    size = 10,
    color = "#FFAA33",
    line = list(
      color = "#151B54",
      width = 2
    )
  ),
  text = ~paste(
    month_date,
    "<br>Precipitation (inches): ", precip,
    "<br>Volume (cubic yards): ", total_volume
  ),
  hoverinfo = "text"
) |>
  plotly::layout(
    title = list(
      text = "<b>More precipitation is related to a greater volume of trash for Mr. Trash Wheel and friends to process</b>",
      font = list(size = 18),
      xanchor = "center"
    ),
    yaxis = list(
      title = "Total volume of trash (cubic yards)/month",
      titlefont = list(size = 14)
    ),
    xaxis = list(
      title = "Total precipitation in Baltimore (inches)/month",
      titlefont = list(size = 14)
    ),
    font = list(family = "arial", size = 18, face = "bold")
  )
Looking at the individual observations, I had a hard time fathoming how much trash Mr. Trash Wheel and friends were processing. So, here's a video that gives a sense of scale for how much trash is really being collected. It's a lot once you put it into perspective. I mean, in one month, the trash wheels processed nearly 25 of these 20-cubic-yard dumpsters' worth of trash. If you've ever seen these dumpsters in real life, they're huge.
Although visual inspection suggests a positive relationship for both the weight and the volume of trash, I wanted to quantify this relationship using a linear model. To do this, I used tidymodels to fit two simple linear models, one for the volume of trash and the other for its weight.
lm_mdl <- linear_reg() |>
  set_engine("lm")
volume_mdl <-
  lm_mdl |>
  fit(total_volume ~ precip, data = data_trash_summ)
tidy(volume_mdl)
# A tibble: 2 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)     65.4     16.8       3.90 0.000167
2 precip          16.0      3.59      4.47 0.0000186
weight_mdl <-
  lm_mdl |>
  fit(total_weight ~ precip, data = data_trash_summ)
tidy(weight_mdl)
# A tibble: 2 × 5
  term        estimate std.error statistic   p.value
  <chr>          <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)    11.6      3.68       3.16 0.00200
2 precip          3.53     0.785      4.49 0.0000172
Both models indicate a statistically significant positive relationship between monthly precipitation and the trash processed, for both volume and weight. According to the estimates, each additional inch of precipitation in a month in Baltimore is associated with an increase of about 16 cubic yards in the volume of trash processed and about 3.53 tons in its weight.
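To make these slopes concrete, here is a minimal sketch that plugs the printed coefficients into the fitted line. The helper `predict_total()` and the 4-inch month are hypothetical, introduced only for illustration, and the coefficients are the rounded values shown in the `tidy()` output rather than values pulled from the model objects.

```r
# Coefficients copied from the tidy() output (rounded as printed)
volume_coef <- c(intercept = 65.4, slope = 16.0)
weight_coef <- c(intercept = 11.6, slope = 3.53)

# Hypothetical helper: predicted monthly total for a given precipitation (inches)
predict_total <- function(coef, precip) {
  unname(coef["intercept"] + coef["slope"] * precip)
}

precip_in <- 4 # hypothetical month with 4 inches of precipitation

pred_volume <- predict_total(volume_coef, precip_in) # about 129.4 cubic yards
pred_weight <- predict_total(weight_coef, precip_in) # about 25.72 tons
```

With the fitted model objects in hand, `predict(volume_mdl, new_data = tibble(precip = 4))` should give the unrounded equivalent.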
The bottom line: throw your trash away properly. It has downstream effects, literally … no pun intended.
An attempt using Tableau
To further practice my data visualization tool skills, I recreated these plots using Tableau. You can view this version by clicking here.
Citation
@misc{berke2024,
author = {Berke, Collin K},
title = {Exploring the Relationship Between Trash Processed by {Mr.}
{Trash} {Wheel} and Precipitation},
date = {2024-03-12},
langid = {en}
}