SpaceX Falcon 9 First Stage Landing Analysis¶
Data Wrangling and Preparation¶
Estimated time: About 1 hour (may vary depending on your pace)
In this notebook, I explore and clean SpaceX Falcon 9 launch data to prepare it for machine learning. The goal is to understand the different landing outcomes and create a clear label for successful vs. unsuccessful landings. This will help in building predictive models later.
The dataset includes various outcomes, such as landings in the ocean, on drone ships, or on ground pads. Some landings are successful, while others are not. For this project, I simplify these outcomes into two categories: 1 for a successful landing and 0 for an unsuccessful one.
What makes a Falcon 9 landing successful?¶
Below, I also highlight some examples of unsuccessful landings, which are important for understanding the challenges SpaceX faces.
My Objectives¶
- Explore and analyze the SpaceX Falcon 9 launch data
- Define clear training labels for machine learning
- Document my own findings and insights
Import Libraries and Define Helper Functions¶
I will import the following libraries for data analysis and wrangling.
# Importing pandas for data manipulation and numpy for numerical operations
import pandas as pd
import numpy as np
Data Analysis¶
Load the SpaceX dataset for analysis.
df = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/datasets/dataset_part_1.csv")
df.head(10)
Identify and calculate the percentage of missing values in each attribute.
df.isnull().sum() / len(df) * 100
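The same percentage can be computed as the column mean of the missing-value mask, since the mean of a boolean column is the fraction of `True` entries. The sketch below uses a small made-up frame (`toy`, with invented values) rather than the real dataset, so the result is easy to check by hand.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; only the column names echo the dataset.
toy = pd.DataFrame({
    "PayloadMass": [6104.0, np.nan, 525.0, 3170.0],
    "LandingPad": [None, None, "pad-A", None],
    "Orbit": ["LEO", "GTO", "ISS", "PO"],
})

# isna() marks missing cells; the mean of each boolean column is the
# fraction of missing values, so multiplying by 100 gives a percentage.
missing_pct = toy.isna().mean() * 100

# Sort so the columns needing the most attention come first.
print(missing_pct.sort_values(ascending=False))
```

Sorting the result makes it easier to spot which columns need imputation or can be dropped.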
Identify which columns are numerical and which are categorical.
df.dtypes
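Reading `df.dtypes` by eye works for a small table. As a rough sketch, `select_dtypes` can split the columns programmatically. The frame below is a made-up stand-in (`toy`), and treating every non-numeric column as "categorical" is a simplifying assumption: boolean and text columns both land in that group.

```python
import pandas as pd

# Toy frame with made-up values standing in for the launch data.
toy = pd.DataFrame({
    "FlightNumber": [1, 2, 3],
    "PayloadMass": [6104.0, 525.0, 3170.0],
    "Orbit": ["LEO", "ISS", "PO"],
    "GridFins": [False, True, True],
})

# "number" matches integer and float columns, but not booleans or text.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# Everything else is treated here as categorical (an assumption).
non_numeric_cols = toy.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("non-numeric:", non_numeric_cols)
```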
Task 1: Calculate the number of launches on each site¶
The data contains several SpaceX launch facilities. The location of each launch is in the column LaunchSite.
Next, let's see the number of launches for each site using value_counts() on the LaunchSite column.
df["LaunchSite"].value_counts()
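Raw counts are often easier to compare alongside each site's share of all launches. A minimal sketch, using an invented list of site labels rather than the real column: `value_counts(normalize=True)` returns fractions instead of counts.

```python
import pandas as pd

# Made-up sample of launch sites; the real column has many more rows.
sites = pd.Series(
    ["CCAFS SLC 40", "CCAFS SLC 40", "KSC LC 39A", "VAFB SLC 4E", "CCAFS SLC 40"],
    name="LaunchSite",
)

counts = sites.value_counts()                # launches per site
shares = sites.value_counts(normalize=True)  # fraction of all launches

# Combine both views; pandas aligns the two Series on the site label.
summary = pd.DataFrame({"launches": counts, "share": shares.round(2)})
print(summary)
```

The same pattern applies to any categorical column, such as Orbit or Outcome.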
Each launch aims for a dedicated orbit. Here are some common orbit types:
- LEO: Low Earth orbit
- VLEO: Very Low Earth Orbit
- GTO: Geosynchronous Transfer Orbit
- SSO: Sun-synchronous Orbit
- ES-L1: Earth-Sun Lagrange Point 1
- HEO: Highly Elliptical Orbit
- ISS: International Space Station
- MEO: Medium Earth Orbit
- GEO: Geostationary Orbit
- PO: Polar Orbit
(See Wikipedia for more details on each orbit type.)
Task 2: Calculate the number and occurrence of each orbit¶
Use value_counts() on the Orbit column to count how many launches targeted each orbit.
df["Orbit"].value_counts()
Task 3: Calculate the number and occurrence of mission outcomes¶
Use value_counts() on the Outcome column to count how often each landing outcome occurs. Assign the result to a variable landing_outcomes.
landing_outcomes = df["Outcome"].value_counts()
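The Outcome column contains values like True Ocean, False Ocean, True RTLS, False RTLS, True ASDS, False ASDS, and others. The first word says whether the landing succeeded, and the second says where it was attempted: Ocean is a designated region of the ocean, RTLS is a ground pad, and ASDS is a drone ship. For this project, I will group these into successful and unsuccessful landings.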
for i, outcome in enumerate(landing_outcomes.keys()):
    print(i, outcome)
Create a set of outcomes where the first stage did not land successfully, selecting them by the positions printed above.
bad_outcomes = set(landing_outcomes.keys()[[1, 3, 5, 6, 7]])
bad_outcomes
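Selecting by position depends on the order value_counts() returns, which is sorted by frequency and could shift if the data changes. A possible alternative, sketched below, selects by label instead. It assumes every outcome string starts with True, False, or None, as the values listed earlier suggest; the sample series and the name `bad_outcomes_by_prefix` are made up for illustration.

```python
import pandas as pd

# Made-up sample of Outcome values in "<result> <landing type>" form.
outcomes = pd.Series([
    "True ASDS", "None None", "True RTLS", "False ASDS",
    "True Ocean", "False Ocean", "None ASDS", "False RTLS", "True ASDS",
])

# Assumption: any outcome that does not start with "True" counts as
# an unsuccessful landing (this covers both "False ..." and "None ...").
bad_outcomes_by_prefix = {
    outcome for outcome in outcomes.unique() if not outcome.startswith("True")
}
print(sorted(bad_outcomes_by_prefix))
```

Comparing this set against the index-based one is a quick sanity check that the chosen positions are the intended ones.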
Task 4: Create a landing outcome label from the Outcome column¶
Using the Outcome column, create a list whose element is zero if the corresponding row's outcome is in bad_outcomes and one otherwise. Assign it to the variable landing_class.
landing_class = [0 if outcome in bad_outcomes else 1 for outcome in df['Outcome']]
This variable will represent the classification variable for each launch. If the value is zero, the first stage did not land successfully; one means it landed successfully.
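The list comprehension is clear and works well at this scale. As an alternative sketch, the same label can be built with a vectorized `isin` check. The frame `toy` and the set `bad` below are small made-up stand-ins, not the real data or the full bad_outcomes set.

```python
import pandas as pd

# Made-up rows and a hypothetical subset of the bad outcomes.
toy = pd.DataFrame({"Outcome": ["True ASDS", "None None", "False Ocean", "True RTLS"]})
bad = {"None None", "False Ocean"}

# isin() is True for bad rows; negating and casting to int turns
# successful landings into 1 and unsuccessful ones into 0.
toy["Class"] = (~toy["Outcome"].isin(bad)).astype(int)
print(toy)
```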
df['Class'] = landing_class
df[['Class']].head(8)
df.head(5)
Because Class is either 0 or 1, its mean is the overall landing success rate:
df["Class"].mean()
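The same idea extends to subgroups: grouping by a column before taking the mean gives a success rate per group. A minimal sketch on made-up rows (the values in `toy` are invented, so these rates say nothing about the real sites):

```python
import pandas as pd

# Made-up launches with a 0/1 landing label.
toy = pd.DataFrame({
    "LaunchSite": ["CCAFS SLC 40", "CCAFS SLC 40", "KSC LC 39A", "KSC LC 39A", "VAFB SLC 4E"],
    "Class": [0, 1, 1, 1, 0],
})

# Mean of a 0/1 column = fraction of ones = success rate.
overall_rate = toy["Class"].mean()

# Same calculation within each launch site.
rate_by_site = toy.groupby("LaunchSite")["Class"].mean()

print(overall_rate)
print(rate_by_site)
```

Per-group rates on small groups are noisy, so it helps to look at the group sizes next to them.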
Now, export the cleaned data to a CSV for use in the next section.
df.to_csv("falcon9_cleaned_data.csv", index=False)
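Before moving on, it can be worth confirming that the export reads back as expected. The sketch below does a round trip on a tiny made-up frame in a temporary directory; the file name `toy_cleaned.csv` is hypothetical and nothing is written next to the notebook.

```python
import os
import tempfile

import pandas as pd

# Made-up frame standing in for the cleaned data.
toy = pd.DataFrame({"Outcome": ["True ASDS", "False Ocean"], "Class": [1, 0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_cleaned.csv")  # hypothetical file name
    toy.to_csv(path, index=False)  # index=False avoids an extra unnamed column
    reloaded = pd.read_csv(path)

# The reloaded frame should have the same columns and labels.
same_columns = list(reloaded.columns) == list(toy.columns)
same_labels = reloaded["Class"].tolist() == toy["Class"].tolist()
print(same_columns, same_labels)
```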
Notebook and analysis by [Your Name], [Date].
This notebook is part of my personal SpaceX data science project. All content and analysis are original and reflect my own approach.