SpaceX Falcon 9 First Stage Landing Data Collection

This notebook is part of my personal data science project. All content and analysis are original and tailored for my own exploration of SpaceX launch data.

Project Overview

The goal is to collect, clean, and prepare SpaceX Falcon 9 launch data for further analysis and machine learning. This notebook focuses on retrieving data from the SpaceX API and performing initial wrangling.

Objectives

  • Request and collect SpaceX Falcon 9 launch data from the API
  • Clean and format the data for analysis
  • Prepare the dataset for downstream machine learning tasks

Import Libraries and Define Helper Functions

In [ ]:
import requests
import pandas as pd
import numpy as np
import datetime
In [ ]:
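# Helper functions to extract details from API responses.
# Each one appends to module-level lists (BoosterVersion, LaunchSite, ...),
# so those lists must exist before the function is called.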
def getBoosterVersion(data):
    for x in data['rocket']:
        if x:
            response = requests.get(f"https://api.spacexdata.com/v4/rockets/{x}").json()
            BoosterVersion.append(response['name'])

def getLaunchSite(data):
    for x in data['launchpad']:
        if x:
            response = requests.get(f"https://api.spacexdata.com/v4/launchpads/{x}").json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])

def getPayloadData(data):
    for load in data['payloads']:
        if load:
            response = requests.get(f"https://api.spacexdata.com/v4/payloads/{load}").json()
            PayloadMass.append(response['mass_kg'])
            Orbit.append(response['orbit'])

def getCoreData(data):
    for core in data['cores']:
        if core['core'] is not None:
            response = requests.get(f"https://api.spacexdata.com/v4/cores/{core['core']}").json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        Outcome.append(str(core['landing_success']) + ' ' + str(core['landing_type']))
        Flights.append(core['flight'])
        GridFins.append(core['gridfins'])
        Reused.append(core['reused'])
        Legs.append(core['legs'])
        LandingPad.append(core['landpad'])
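
The helpers above send one HTTP request per row, even though many launches share the same rocket or launchpad id. One possible way to cut down on requests is to cache each response by id. The following is only a sketch: `lookup_with_cache` and `fake_fetch` are hypothetical names that do not appear in the notebook, and `fake_fetch` stands in for `requests.get(...).json()` so the example runs offline.

```python
from typing import Callable, Dict, Hashable, Iterable, List


def lookup_with_cache(ids: Iterable[Hashable],
                      fetch: Callable[[Hashable], dict],
                      field: str) -> List[object]:
    """Return `field` for every id, calling `fetch` once per distinct id."""
    cache: Dict[Hashable, dict] = {}
    values = []
    for item_id in ids:
        if item_id not in cache:
            cache[item_id] = fetch(item_id)  # only fetch ids not seen before
        values.append(cache[item_id][field])
    return values


# Hypothetical stand-in for an API call; it records every id it is asked for.
calls = []


def fake_fetch(rocket_id):
    calls.append(rocket_id)
    return {"name": f"rocket-{rocket_id}"}


names = lookup_with_cache(["a", "b", "a", "a"], fake_fetch, "name")
print(names)       # ['rocket-a', 'rocket-b', 'rocket-a', 'rocket-a']
print(len(calls))  # 2
```

The output keeps one value per input row, so it stays aligned with the DataFrame, while the number of fetches drops to the number of distinct ids.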

Data Collection

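Request past launch data from the SpaceX API and perform initial wrangling. The endpoint returns launches for every rocket type; the restriction to Falcon 9 happens later, in the Data Cleaning section.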

In [ ]:
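spacex_url = "https://api.spacexdata.com/v4/launches/past"
response = requests.get(spacex_url, timeout=30)
response.raise_for_status()  # stop early on an HTTP error instead of parsing an error body
data = pd.json_normalize(response.json())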
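
To see what `pd.json_normalize` does to a launch record, here is a small offline sketch. The two records are made up for illustration and only loosely imitate the API's shape: nested dictionaries are flattened into dotted column names, while lists (such as `cores` and `payloads`) are kept as list objects in a single column.

```python
import pandas as pd

# Made-up records, not real API output.
sample = [
    {"flight_number": 1, "rocket": "r1", "payloads": ["p1"],
     "cores": [{"core": "c1", "flight": 1}], "links": {"webcast": "w1"}},
    {"flight_number": 2, "rocket": "r2", "payloads": ["p2", "p3"],
     "cores": [{"core": "c2", "flight": 1}], "links": {"webcast": "w2"}},
]

flat = pd.json_normalize(sample)
print(sorted(flat.columns))
print(type(flat.loc[0, "cores"]))  # lists are not expanded
```

This is why the later cells still need to unpack `cores` and `payloads` by hand.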
In [ ]:
# Keep only relevant columns and filter for single-core, single-payload launches
data = data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]
data = data[data['cores'].map(len) == 1]
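# .copy() makes the filtered frame independent, so the column assignments
# below do not raise SettingWithCopyWarning
data = data[data['payloads'].map(len) == 1].copy()
# Unwrap the single-element lists
data['cores'] = data['cores'].map(lambda x: x[0])
data['payloads'] = data['payloads'].map(lambda x: x[0])
# Keep the calendar date only and restrict to launches up to 2020-11-13
data['date'] = pd.to_datetime(data['date_utc']).dt.date
data = data[data['date'] <= datetime.date(2020, 11, 13)]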
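
The single-core, single-payload filter can be tried on a toy frame without calling the API. The values below are invented for illustration; only the filtering and unwrapping steps mirror the cell above.

```python
import pandas as pd

# Invented rows: flight 2 has two cores, flight 3 has two payloads.
toy = pd.DataFrame({
    "flight_number": [1, 2, 3],
    "cores": [[{"core": "c1"}], [{"core": "c2"}, {"core": "c3"}], [{"core": "c4"}]],
    "payloads": [["p1"], ["p2"], ["p3", "p4"]],
})

# Keep rows where both lists have exactly one element, then unwrap them.
mask = (toy["cores"].map(len) == 1) & (toy["payloads"].map(len) == 1)
single = toy[mask].copy()
single["cores"] = single["cores"].map(lambda x: x[0])
single["payloads"] = single["payloads"].map(lambda x: x[0])

print(single["flight_number"].tolist())  # [1]
```

Only flight 1 survives, and its `cores` and `payloads` cells now hold a plain dictionary and a plain string instead of one-element lists.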
In [ ]:
# Prepare lists for extracted features
BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

# Extract features using helper functions
getBoosterVersion(data)
getLaunchSite(data)
getPayloadData(data)
getCoreData(data)
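
Because some helpers skip empty ids without appending a placeholder, the lists can in principle end up with different lengths, and `pd.DataFrame` raises a `ValueError` when given columns of unequal length. A quick check before building the frame can point to the offending list. This is a sketch only: `check_equal_lengths` is a hypothetical helper, and the toy lists are made up.

```python
def check_equal_lengths(columns, expected):
    """Return {name: length} for every column whose length differs from `expected`."""
    return {name: len(values)
            for name, values in columns.items()
            if len(values) != expected}


# Made-up lists; 'Orbit' is one entry short.
toy_columns = {
    "BoosterVersion": ["Falcon 9", "Falcon 9", "Falcon 9"],
    "PayloadMass": [500.0, None, 3000.0],
    "Orbit": ["LEO", "GTO"],
}

mismatched = check_equal_lengths(toy_columns, expected=3)
print(mismatched)  # {'Orbit': 2}
```

In the notebook, `expected` would be `len(data)` and `columns` a dictionary of the lists filled by the helper functions.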
In [ ]:
# Construct the final dataset
dataset = {
    'FlightNumber': list(data['flight_number']),
    'Date': list(data['date']),
    'BoosterVersion': BoosterVersion,
    'PayloadMass': PayloadMass,
    'Orbit': Orbit,
    'LaunchSite': LaunchSite,
    'Outcome': Outcome,
    'Flights': Flights,
    'GridFins': GridFins,
    'Reused': Reused,
    'Legs': Legs,
    'LandingPad': LandingPad,
    'Block': Block,
    'ReusedCount': ReusedCount,
    'Serial': Serial,
    'Longitude': Longitude,
    'Latitude': Latitude
}
df = pd.DataFrame(dataset)

Data Cleaning

Filter for Falcon 9 launches and replace missing payload masses with the column mean.

In [ ]:
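# Keep only Falcon 9 launches by dropping Falcon 1. Falcon Heavy flights list
# three cores, so the single-core filter has already removed them.
df = df[df['BoosterVersion'] != 'Falcon 1'].copy()  # copy to avoid SettingWithCopyWarning
# Renumber flights consecutively after filtering
df.loc[:, 'FlightNumber'] = list(range(1, df.shape[0] + 1))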
In [ ]:
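# Replace missing PayloadMass values with the column mean (mean() skips NaN).
# Assigning the result back avoids the chained in-place replace, which newer
# pandas versions warn about and which may not modify df.
payload_mass_mean = df['PayloadMass'].mean()
df['PayloadMass'] = df['PayloadMass'].fillna(payload_mass_mean)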
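
Mean imputation is easy to verify on a toy column. The masses below are made up; the point is that `Series.mean()` ignores `NaN` by default, so the fill value is the mean of the observed entries only.

```python
import numpy as np
import pandas as pd

# Made-up payload masses with one missing entry.
toy = pd.DataFrame({"PayloadMass": [1000.0, np.nan, 3000.0]})

mean_mass = toy["PayloadMass"].mean()  # NaN is skipped: (1000 + 3000) / 2
toy["PayloadMass"] = toy["PayloadMass"].fillna(mean_mass)

print(mean_mass)                    # 2000.0
print(toy["PayloadMass"].tolist())  # [1000.0, 2000.0, 3000.0]
```

Mean imputation keeps the column mean unchanged but shrinks its variance, which is worth keeping in mind when the column feeds a model.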

Save Cleaned Data

Export the cleaned dataset for further analysis.

In [ ]:
df.to_csv('dataset-part-1.csv', index=False)