SpaceX Falcon 9 SQL Data Analysis: A Personal Data Science Journey¶
This notebook is a key chapter in my end-to-end SpaceX Falcon 9 data science project. After collecting and cleaning launch data, I now dive into SQL-based analysis to uncover insights about launch outcomes, payloads, and mission details. Each step builds on the previous, forming a progressive, reproducible workflow.
Project Narrative¶
My project follows a real-world data science pipeline:
- Web Scraping & Data Collection: Gather launch records from Wikipedia and SpaceX APIs.
- Data Wrangling & Cleaning: Prepare and clean the raw data for analysis.
- SQL Analysis & Feature Engineering (this notebook): Use SQL and Python to explore, aggregate, and transform the data for deeper insights.
- Exploratory Data Analysis (EDA): Visualize trends and patterns.
- Machine Learning & Prediction: Build and compare models to predict Falcon 9 first stage landing outcomes.
- Dashboarding & Communication: Present findings with interactive dashboards and clear visualizations.
This notebook demonstrates the power of SQL for data exploration and feature engineering, setting the stage for advanced analytics and modeling.
Dataset¶
The dataset contains records for each payload carried during a SpaceX mission. It includes launch site, payload mass, booster version, customer, and mission outcome, among other fields.
# Install required libraries
!pip install sqlalchemy==1.3.9
!pip install -q pandas==1.1.5
# Load SQL extension and connect to SQLite database
%load_ext sql
import csv, sqlite3
con = sqlite3.connect("my-data1.db")
cur = con.cursor()
%sql sqlite:///my-data1.db
# Load the SpaceX dataset into a pandas DataFrame and store it in the database
import pandas as pd
df = pd.read_csv("spacex-web-scraped.csv")
df.to_sql("SPACEXTBL", con, if_exists='replace', index=False, method="multi")
# Remove blank rows from the table
%sql create table SPACEXTABLE as select * from SPACEXTBL where Date is not null
SQL Analysis Tasks¶
Below are a series of SQL queries to explore and analyze the SpaceX Falcon 9 dataset.
Task 1: Display the names of the unique launch sites¶
cur.execute("SELECT DISTINCT Launch_Site FROM SPACEXTBL")
results = cur.fetchall()
for row in results:
print(row)
Task 2: Display 5 records where launch sites begin with the string 'CCA'¶
cur.execute("SELECT * FROM SPACEXTBL where Launch_Site LIKE 'CCA%' LIMIT 5")
results = cur.fetchall()
for row in results:
print(row)
Task 3: Display the total payload mass carried by boosters launched by NASA (CRS)¶
cur.execute("SELECT SUM(PAYLOAD_MASS__KG_) AS TotalPayloadMass FROM SPACEXTBL WHERE Customer = 'NASA (CRS)';")
results = cur.fetchall()
for row in results:
print(row)
Task 4: Display average payload mass carried by booster version F9 v1.1¶
cur.execute("SELECT AVG(PAYLOAD_MASS__KG_) AS AVERAGEPayloadMass FROM SPACEXTBL WHERE Booster_Version = 'F9 v1.1';")
results = cur.fetchall()
for row in results:
print(row)
Task 5: List the date when the first successful landing outcome in ground pad was achieved¶
cur.execute("SELECT MIN(Date) AS FirstSuccessfulGroundPadLandingDate FROM SPACEXTBL WHERE Landing_Outcome = 'Success (ground pad)';")
results = cur.fetchall()
for row in results:
print(row)
Task 6: List the names of the boosters which have success in drone ship and have payload mass greater than 4000 but less than 6000¶
cur.execute("SELECT Booster_Version FROM SPACEXTBL WHERE Landing_Outcome = 'Success (drone ship)' AND PAYLOAD_MASS__KG_ BETWEEN 4000 AND 6000;")
results = cur.fetchall()
for row in results:
print(row)
Task 7: List the total number of successful and failure mission outcomes¶
cur.execute("SELECT Mission_Outcome, COUNT(Mission_Outcome) FROM SPACEXTBL GROUP BY Mission_Outcome;")
results = cur.fetchall()
for row in results:
print(row)
Task 8: List the names of the booster_versions which have carried the maximum payload mass. Use a subquery¶
cur.execute("SELECT Booster_Version FROM SPACEXTBL WHERE PAYLOAD_MASS__KG_ = (SELECT MAX(PAYLOAD_MASS__KG_) FROM SPACEXTBL);")
results = cur.fetchall()
for row in results:
print(row)
Task 9: List the records which will display the month names, failure landing_outcomes in drone ship, booster versions, launch_site for the months in year 2015¶
cur.execute("SELECT SUBSTR(Date, 6, 2) AS Month,Landing_Outcome,Booster_Version,Launch_Site FROM SPACEXTBL WHERE SUBSTR(Date, 0, 5) = '2015' AND Landing_Outcome LIKE '%Failure%' AND Landing_Outcome LIKE '%drone ship%';")
results = cur.fetchall()
for row in results:
print(row)
Task 10: Rank the count of landing outcomes between the date 2010-06-04 and 2017-03-20, in descending order¶
cur.execute("SELECT Landing_Outcome, COUNT(*) AS Outcome_Count FROM SPACEXTBL WHERE Date BETWEEN '2010-06-04' AND '2017-03-20' GROUP BY Landing_Outcome ORDER BY Outcome_Count DESC")
results = cur.fetchall()
for row in results:
print(row)