library(tidyverse)
library(wbstats)
library(here)
library(skimr)
library(janitor)
library(plotly)
library(scales)
library(psych)
Exploring objects launched into space and gross domestic product
Background
3… 2… 1… blastoff 🚀. This week’s #tidytuesday dataset focuses on annual objects launched into space by various entities.
This data is maintained by the United Nations Office for Outer Space Affairs, and it is made available via the Online Index of Objects Launched into Outer Space. Objects include things like satellites, probes, landers, crewed spacecraft, and space station flight elements launched into Earth orbit or beyond. Although this list aims to be comprehensive, it only includes launches submitted to the UN by participating nations. In addition, joint launches count as one launch for each country (i.e., counts when examined by country may be duplicated). Our World in Data initially processed this data and created an annual trend for each country.
Since this data is focused on countries, my interest was piqued by the following question: what is the relationship between a country’s Gross Domestic Product (GDP), a broad indicator of a country’s economic output, and the number of objects it launches into space? To answer this question, I create a scatter plot and quantify this relationship using a simple linear regression in this post.
But first, some space banjo music
Seeing as we’re exploring objects launched into space, I felt a little music was in order. Here’s some space banjo ambient for your listening pleasure.
Deep Space Banjo🪕 - Ambient Spacefolk Chillwave by Timber Zeal
Setup and data import
First, let’s import the #tidytuesday dataset. While we’re importing, I’ll also go ahead and use janitor’s clean_names() function to clean up the dataset’s variable names in one step. Here’s the code needed to do this:
data_space_objs <- read_csv(
  here(
    "blog/posts/",
    "2024-04-25-tidytuesday-2024-05-03-space-launches",
    "outer_space_objects.csv"
  )
) |>
  clean_names()
Rows: 1175 Columns: 4
── Column specification ────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Entity, Code
dbl (2): Year, num_objects
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
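As a small illustration of what clean_names() does, the sketch below applies it to a one-row data frame that reuses the column names from the import message above. The data frame itself is only a stand-in for the real file.

```r
library(janitor)

# Stand-in row that mirrors the raw column names reported by read_csv()
raw <- data.frame(
  Entity = "Algeria",
  Code = "DZA",
  Year = 2002,
  num_objects = 1
)

# clean_names() converts column names to snake_case by default
cleaned <- clean_names(raw)
names(cleaned)
```

The mixed-case names become lowercase, while num_objects is already in snake case and is left alone.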
Use wbstats package to obtain GDP
The original dataset didn’t contain Gross Domestic Product (GDP). As such, I had to supplement it with additional data from the World Bank. The World Bank makes data containing an estimate of GDP available via an API. In fact, the wbstats R package provides an intuitive interface to access data via this API. Here’s the code I used to return data from the API using the wbstats package:
# Interested in looking at:
# * Gross Domestic Product (GDP)
wb_variables <- c(
  "gdp" = "NY.GDP.MKTP.CD"
)

data_wb <- wb_data(
  wb_variables,
  start_date = 1957,
  end_date = 2023
) |>
  select(
    code = iso3c,
    year = date,
    country,
    gdp,
    starts_with("tax")
  )
Explore the data
Now with the data available, let’s do some data exploration. Here I’ll use dplyr’s glimpse() function to get a sense of the data’s structure and column names.
glimpse(data_space_objs)
Rows: 1,175
Columns: 4
$ entity <chr> "APSCO", "Algeria", "Algeria", "Algeria", "Algeria", "Angola", "Angola", "Arab…
$ code <chr> NA, "DZA", "DZA", "DZA", "DZA", "AGO", "AGO", NA, NA, NA, NA, NA, NA, NA, NA, …
$ year <dbl> 2023, 2002, 2010, 2016, 2017, 2017, 2022, 1985, 1992, 1996, 1999, 2006, 2008, …
$ num_objects <dbl> 1, 1, 1, 3, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 3, 1, 1, 2, 2, 1, 1, …
glimpse(data_wb)
Rows: 13,888
Columns: 4
$ code <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW"…
$ year <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973…
$ country <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "…
$ gdp <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
Since there’s a common column between these datasets, code, we’ll do a left join (on code and year) to only include GDP data for countries that launched a space object. While exploring the data, though, I noticed that the data_space_objs data had entities other than countries. In addition, some of the World Bank data had NA values present in the GDP variable. Indeed, an argument could be made to apply imputation methods to address these missing values. However, I’m just going to drop any missing values to make things easy. I do this by using the drop_na() function from tidyr.
data_space_wb <- data_space_objs |>
  left_join(data_wb, by = c("year", "code")) |>
  drop_na(c(code, gdp))
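To make the effect of the join and drop_na() concrete, here is a minimal sketch on toy data. The entity names come from the glimpse() output above, but the GDP values are made up, and toy_space, toy_wb, and toy_joined are hypothetical names used only for illustration.

```r
library(dplyr)
library(tidyr)

# Toy rows: one non-country entity (NA code) and two country rows
toy_space <- tibble(
  entity = c("APSCO", "Algeria", "Angola"),
  code = c(NA, "DZA", "AGO"),
  year = c(2023, 2002, 2017),
  num_objects = c(1, 1, 1)
)

# Made-up GDP values; the Angola value is deliberately missing
toy_wb <- tibble(
  code = c("DZA", "AGO"),
  year = c(2002, 2017),
  gdp = c(100, NA)
)

toy_joined <- toy_space |>
  left_join(toy_wb, by = c("year", "code")) |>
  drop_na(c(code, gdp))

# Only the Algeria row has both a country code and a GDP value
toy_joined
```

The left join keeps every row from the space data, and drop_na() then removes the rows without a country code or without a GDP value.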
With data wrangling complete, we can quickly get a sense of the shape of our data with skimr’s skim() function. What becomes immediately apparent is that both the num_objects and gdp variables exhibit distributions that are skewed to the right.
skim(data_space_wb)
Name | data_space_wb |
Number of rows | 871 |
Number of columns | 6 |
_______________________ | |
Column type frequency: | |
character | 3 |
numeric | 3 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
entity | 0 | 1 | 4 | 20 | 0 | 91 | 0 |
code | 0 | 1 | 3 | 3 | 0 | 91 | 0 |
country | 0 | 1 | 4 | 20 | 0 | 91 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
year | 0 | 1 | 2.004220e+03 | 1.498000e+01 | 1960 | 1995 | 2008 | 2.017000e+03 | 2.02200e+03 | ▁▁▃▃▇ |
num_objects | 0 | 1 | 1.375000e+01 | 8.800000e+01 | 1 | 1 | 2 | 4.000000e+00 | 1.93900e+03 | ▇▁▁▁▁ |
gdp | 0 | 1 | 1.698805e+12 | 3.179395e+12 | 2583335724 | 216531050429 | 554212916092 | 1.782499e+12 | 2.54397e+13 | ▇▁▁▁▁ |
We can further confirm this visually by creating some histograms. I’ll do this using R’s base hist() function.
hist(data_space_wb$num_objects)
hist(data_space_wb$gdp)
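A quick numeric check for right skew is to compare the mean with the median. The skim() output above already shows this pattern (num_objects has a mean of 13.75 but a median of 2). The sketch below illustrates the same idea on simulated log-normal values, which are an assumption for illustration and not the post's data.

```r
set.seed(123)

# Simulated right-skewed values (log-normal); not the post's data
x <- rlnorm(1000, meanlog = 0, sdlog = 2)

# In a right-skewed distribution the mean sits well above the median
c(mean = mean(x), median = median(x))

# After a log transform the two are much closer together
c(mean = mean(log(x)), median = median(log(x)))
```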
Scatter plot of objects and GDP
Let’s get a sense of the relationship between space objects launched and a country’s GDP. To do this, we’ll create a scatter plot using plotly.
vis_space_scatter <- plot_ly(
  data = data_space_wb
) |>
  add_trace(
    x = ~gdp,
    y = ~num_objects,
    type = "scatter",
    mode = "markers",
    marker = list(
      color = "#006cd8",
      size = 10,
      line = list(
        color = "#00008c",
        width = 2
      )
    ),
    text = ~paste(
      "Year: ", year,
      "<br>Country: ", entity,
      "<br>Objects launched: ", comma(num_objects),
      "<br>GDP: ", comma(gdp)
    ),
    hoverinfo = "text"
  ) |>
  plotly::layout(
    title = "<b>A country's GDP is positively related to the number of space objects launched</b>",
    xaxis = list(title = "Gross Domestic Product (GDP)"),
    yaxis = list(
      title = "Objects launched into space",
      # c(0, NULL) collapses to a single value, so extend the axis to zero instead
      rangemode = "tozero",
      tickformat = ","
    )
  )

vis_space_scatter
Given the distribution of the data, it’s challenging to see the individual values. As such, I decided to recreate the plot by log transforming both GDP and the number of objects launched into space.
plot_ly(
  data = data_space_wb
) |>
  add_trace(
    x = ~log(gdp),
    y = ~log(num_objects),
    type = "scatter",
    mode = "markers",
    marker = list(
      color = "#006cd8",
      size = 10,
      line = list(
        color = "#00008c",
        width = 2
      )
    ),
    text = ~paste(
      "Year: ", year,
      "<br>Country: ", entity,
      "<br>Objects launched: ", comma(num_objects),
      "<br>GDP: ", comma(gdp)
    ),
    hoverinfo = "text"
  ) |>
  plotly::layout(
    title = "<b>A country's GDP is positively related to the number of space objects launched</b>",
    xaxis = list(title = "Gross Domestic Product (GDP) (logged)"),
    yaxis = list(
      title = "Objects launched into space (logged)",
      # c(0, NULL) collapses to a single value, so extend the axis to zero instead
      rangemode = "tozero",
      tickformat = ","
    )
  )
Log transforming these variables now allows us to more easily view the individual values for each country.
Explore the correlation
Visual inspection points to a positive relationship between these two variables. We can use psych’s pairs.panels() function to create a quick visualization along with a value quantifying this relationship.
pairs.panels(data_space_wb[c("num_objects", "gdp")])
The output provides further evidence of the presence of a positive correlation between these two variables. Now, let’s go one step further and use a simple linear regression to further explore this relationship.
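Because both variables are heavily skewed, it can help to see how much a single extreme point influences a correlation. As far as I know, pairs.panels() reports a Pearson correlation by default. The sketch below uses base R's cor() on simulated values (gdp_toy and obj_toy are hypothetical, not the post's data) to compare Pearson with the rank-based Spearman correlation.

```r
set.seed(42)

# Simulated data with a positive relationship and one extreme point;
# these values are not the post's data
gdp_toy <- c(runif(50, 1, 10), 100)
obj_toy <- c(round(gdp_toy[1:50] + rnorm(50, sd = 3)), 900)

# Pearson correlation (the default) is sensitive to the extreme point
cor(gdp_toy, obj_toy, method = "pearson")

# Spearman works on ranks, so a single extreme point has less pull
cor(gdp_toy, obj_toy, method = "spearman")
```

Both values are positive here. Comparing the two is one rough way to judge how much of a correlation is driven by a handful of very large observations.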
Use simple linear regression to explore launched objects and GDP
Given this is a simple linear regression, I’ll use stats’ lm() function to specify the model. Given the scale of the values, I also went ahead and set the scipen option to avoid printing the output in scientific notation.
# Set the `scipen` option to avoid printing in scientific notation
options(scipen = 999)

space_gdp_mdl <- lm(num_objects ~ gdp, data = data_space_wb)
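To see what the scipen option changes, here is a minimal sketch that formats one very small number under R's default setting and under scipen = 999. The value is the gdp slope as it appears in the model summary later in this post.

```r
# Slope estimate copied from the model summary shown later in the post
slope <- 0.0000000000149303

old <- options(scipen = 0)   # R's default penalty
sci <- format(slope)         # formatted in scientific notation

options(scipen = 999)        # strongly prefer fixed notation
fixed <- format(slope)       # formatted with all leading zeros

options(old)                 # restore the previous setting
c(sci, fixed)
```

A large scipen value makes R prefer fixed notation, which is why the coefficient table prints long strings of zeros instead of exponents.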
Using the space_gdp_mdl object, we can use summary() to output information about our model. We’ll also use this information to interpret the results.
summary(space_gdp_mdl)
Call:
lm(formula = num_objects ~ gdp, data = data_space_wb)
Residuals:
Min 1Q Median 3Q Max
-198.06 -8.62 6.39 10.77 1570.79
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -11.6127829980763799 2.8486337637951760 -4.077 0.0000499 ***
gdp 0.0000000000149303 0.0000000000007906 18.885 < 0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 74.14 on 869 degrees of freedom
Multiple R-squared: 0.291, Adjusted R-squared: 0.2902
F-statistic: 356.6 on 1 and 869 DF, p-value: < 0.00000000000000022
It’s interesting to see the R-squared value is .291, which is fairly large. This was kind of unexpected, given the complexities inherent within a country’s economy, how GDP results in the funding of space projects, and the technology and infrastructure needed to launch objects into space. Indeed, I was expecting a much smaller R-squared value. It’s also important to recognize this could be a statistical artifact, as there’s a wide discrepancy between countries. Most countries launched space objects in the single digits, while only a few launched hundreds or even thousands in some years.
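Since the model has a single predictor, the R-squared is the square of the Pearson correlation between num_objects and gdp, which gives another way to read the .291 value. A minimal sketch, using the R-squared reported above plus a made-up data set (toy and toy_mdl are hypothetical) to confirm the identity:

```r
# R-squared reported by summary(space_gdp_mdl)
r_squared <- 0.291

# With one predictor and a positive slope, r is the positive square root
r <- sqrt(r_squared)
round(r, 2)

# The same identity on made-up data (not the post's data)
set.seed(1)
toy <- data.frame(x = 1:30)
toy$y <- 2 * toy$x + rnorm(30, sd = 10)
toy_mdl <- lm(y ~ x, data = toy)
all.equal(summary(toy_mdl)$r.squared, cor(toy$x, toy$y)^2)
```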
In addition to the R-squared value, we can use the coefficients to draw further conclusions about this relationship. For instance, we can get a sense of how much GDP is associated with each object launched into space. Using our model estimates, the slope implies that roughly $67B in additional GDP (1 divided by the gdp coefficient) is associated with one additional object launched. This shouldn’t be read as the GDP a country needs before launching its first object, though: the intercept is negative, so the fitted line doesn’t predict a single object until GDP reaches roughly $845B, even though countries with far smaller economies appear in the data. Indeed, there are many factors that go into a country’s ability to launch an object into space. However, the results from this model still give a very general estimate of how a country’s economic output relates to these types of projects.
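The arithmetic behind this kind of estimate can be checked directly from the coefficients. A minimal sketch, with the estimates copied from the summary() output above; gdp_per_object and gdp_at_one_object are hypothetical names used only for illustration.

```r
# Estimates copied from the summary() output above
intercept <- -11.6127829980763799
slope <- 0.0000000000149303

# GDP associated with one additional predicted object: 1 / slope
gdp_per_object <- 1 / slope

# GDP at which the fitted line predicts exactly one object:
# solve 1 = intercept + slope * gdp for gdp
gdp_at_one_object <- (1 - intercept) / slope

# Express both in billions of dollars
round(c(gdp_per_object, gdp_at_one_object) / 1e9)
```

The first value describes the slope of the line, while the second also accounts for the negative intercept. The gap between them is a reminder that a straight line fit to such skewed data is only a rough summary.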
Now that we have the model, we can go ahead and use predict() to append model predictions to the original data set. We can then plot those values on our original scatter plot to get a better sense of what this relationship looks like. The following code will do this for us:
data_space_wb$obj_pred <- predict(space_gdp_mdl, data_space_wb)

vis_space_scatter |>
  add_trace(
    data = data_space_wb,
    x = ~gdp,
    y = ~obj_pred,
    type = "scatter",
    mode = "lines",
    showlegend = FALSE,
    line = list(width = 5),
    text = ~paste(
      "Prediction: ", obj_pred
    ),
    hoverinfo = "text"
  ) |>
  plotly::layout(
    title = "<b>A country's GDP is positively related to the number of space objects launched</b>",
    xaxis = list(title = "Gross Domestic Product (GDP)"),
    yaxis = list(
      title = "Objects launched into space",
      # c(0, NULL) collapses to a single value, so extend the axis to zero instead
      rangemode = "tozero",
      tickformat = ","
    )
  )
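For readers less familiar with predict(), here is a minimal sketch of what it returns, using a small made-up data set rather than the post's data (toy and toy_mdl are hypothetical names).

```r
# Made-up data for illustration; not the post's data
toy <- data.frame(
  gdp = c(1, 2, 3, 4, 5),
  num_objects = c(1, 1, 3, 4, 6)
)
toy_mdl <- lm(num_objects ~ gdp, data = toy)

# Predictions for the rows used to fit the model match fitted()
toy$obj_pred <- predict(toy_mdl, toy)
all.equal(unname(toy$obj_pred), unname(fitted(toy_mdl)))

# predict() also accepts new rows with the same predictor column
predict(toy_mdl, newdata = data.frame(gdp = 6))
```

Passing the original data back to predict() simply returns the fitted values, which is why the resulting line in the plot is the regression line itself.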
Why is the USA so much further away from what is predicted?
Exploring the plot, I began to question why the US doesn’t fall within what was expected from our model. My hunch is this is due to the rise in commercial space flight here in the US. In fact, here are a couple of references I came across that go into more detail about the booming commercial space industry. One such reference even goes so far as to state that private space flight has led us into the fourth industrial revolution. Learn more:
- U.S. private space launch industry is out of this world
- How space exploration is fueling the Fourth Industrial Revolution
- The commercial space age is here
Indeed, it’s reasonable to assume that if the US can shuttle contracts to private space companies rather than funding whole space programs to launch objects into space, then it will likely launch more objects than would be expected. In other words, the US government gets more bang for its buck working with commercial space companies. It’s also important to recognize that the commercial space industry makes launching objects into space more viable for companies and startups, like Varda Space Industries, which is using space vehicles to manufacture pharmaceuticals (seriously, listen to this interesting report from Marketplace).
Wrap up
In this post, we explored data representing objects launched into space from the United Nations Office for Outer Space Affairs. Specifically, we explored and found a relationship between a country’s gross domestic product and the number of objects it launches into space. This was done by creating a scatter plot and using the results from a simple linear regression. It was interesting to see how the US far and away exceeded the predictions of our model. I posited that this result is due to the rise of the commercial space flight industry here in the US, and I provided a few sources that support this idea. I did all this while also peppering in some poorly delivered space puns, with a backdrop of some space banjo music.
I hope you enjoyed this post as much as I did writing it. This was a fun little data set. Check out the #tidytuesday GitHub repo for other fun data sets to explore.
Citation
@misc{berke2024,
author = {Berke, Collin K},
title = {Exploring Objects Launched into Space and Gross Domestic
Product},
date = {2024-05-03},
langid = {en}
}