Interactive Data Visualization with Plotly

By Mohammad Sayem Chowdhury


My Motivation

In this notebook, I explore the basics of Plotly for interactive data visualization. My goal is to experiment with different chart types and features, documenting what I learn and discover along the way.


Dataset Overview

The data is a table of 27,000 US domestic airline flight records with 110 columns, covering flight dates, carriers, departure and arrival times, delays, and diversions. I explore and visualize it with Plotly's charting tools.

Why Plotly?

I'm interested in interactive visualizations that go beyond static charts. Plotly charts support hover tooltips, zooming, and panning out of the box, and I'm using this notebook to get hands-on experience with these features and see what I can create.

In [ ]:
# Author: Mohammad Sayem Chowdhury
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

Data and Setup

For these experiments, I load the airline dataset and then work with a random sample of 500 flights, focusing on how Plotly's features work. The emphasis is on exploration and creativity rather than on following a fixed set of instructions.

Loading the Data

In [ ]:
# Author: Mohammad Sayem Chowdhury
# Load the airline data into a pandas DataFrame.
# The file is read with Latin-1 (ISO-8859-1) encoding, and the diversion airport
# and tail-number columns are read as strings so they get a consistent type.
airline_data = pd.read_csv(
    'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/airline_data.csv',
    encoding='ISO-8859-1',
    dtype={'Div1Airport': str, 'Div1TailNum': str,
           'Div2Airport': str, 'Div2TailNum': str},
)
In [ ]:
# Preview the first 5 rows
airline_data.head()
Out[ ]:
Unnamed: 0 Year Quarter Month DayofMonth DayOfWeek FlightDate Reporting_Airline DOT_ID_Reporting_Airline IATA_CODE_Reporting_Airline ... Div4WheelsOff Div4TailNum Div5Airport Div5AirportID Div5AirportSeqID Div5WheelsOn Div5TotalGTime Div5LongestGTime Div5WheelsOff Div5TailNum
0 1295781 1998 2 4 2 4 1998-04-02 AS 19930 AS ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 1125375 2013 2 5 13 1 2013-05-13 EV 20366 EV ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 118824 1993 3 9 25 6 1993-09-25 UA 19977 UA ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 634825 1994 4 11 12 6 1994-11-12 HP 19991 HP ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 1888125 2017 3 8 17 4 2017-08-17 UA 19977 UA ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 110 columns
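The preview shows that many of the diversion columns (the Div4 and Div5 fields, for example) are empty. A quick way to quantify this is the fraction of missing values per column. The sketch below uses a small made-up DataFrame in place of airline_data so it runs on its own; the column names only mimic the real ones.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for airline_data: a few rows with mostly empty diversion columns
df = pd.DataFrame({
    'Distance': [337, 1846, 509, 2475],
    'ArrDelay': [5.0, np.nan, -3.0, 12.0],
    'Div1Airport': [np.nan, 'ORD', np.nan, np.nan],
    'Div5Airport': [np.nan, np.nan, np.nan, np.nan],
})

# Fraction of missing values per column, most-empty columns first
missing_share = df.isna().mean().sort_values(ascending=False)
print(missing_share)

# Columns that are completely empty carry no information for plotting
empty_cols = missing_share[missing_share == 1.0].index.tolist()
print(empty_cols)
```

Applied to airline_data, the same two steps would list which of the 110 columns are fully or mostly empty, which helps decide what is worth plotting.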

In [ ]:
# Check the shape of the data
airline_data.shape
Out[ ]:
(27000, 110)
In [ ]:
# Randomly sample 500 data points for faster plotting
sampled_data = airline_data.sample(n=500, random_state=42)
In [ ]:
# Check the shape of the sampled data
sampled_data.shape
Out[ ]:
(500, 110)
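Passing random_state makes the 500-row sample reproducible: the same seed returns the same rows every time the notebook is rerun, while a different seed picks a different subset. A small check on a synthetic DataFrame standing in for airline_data:

```python
import pandas as pd

# Synthetic stand-in for airline_data with 1,000 rows
df = pd.DataFrame({'Flights': 1.0, 'Distance': list(range(1000))})

a = df.sample(n=50, random_state=42)
b = df.sample(n=50, random_state=42)
c = df.sample(n=50, random_state=7)

# Same seed: identical rows in the same order
print(a.index.equals(b.index))  # True
# Different seed: a different selection of rows
print(a.index.equals(c.index))
```

The trade-off is that 500 of 27,000 rows is under 2% of the data, so per-group statistics computed on the sample rest on small counts.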

Plotly Graph Objects

My Observations

As I try out different Plotly charts, I note what works well, what surprises me, and any challenges I encounter.

Scatter Plot: Distance vs Departure Time

I want to see how departure time varies with the flight distance between the origin and destination airports.

In [ ]:
# Scatter plot of flight distance against actual departure time
fig = go.Figure(data=go.Scatter(x=sampled_data['Distance'], y=sampled_data['DepTime'],
                                mode='markers', marker=dict(color='red')))
fig.update_layout(title='Distance vs Departure Time',
                  xaxis_title='Distance (miles)', yaxis_title='Departure Time (hhmm, local)')
fig.show()
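One caveat with this plot: assuming DepTime follows the BTS convention of local clock time written as an hhmm number (so 1430 means 2:30 pm), the y-axis is not linear in time, because values jump from 1259 straight to 1300. Converting to decimal hours gives an evenly spaced axis. This is a minimal sketch; hhmm_to_hours and the dep_hours column are names I made up, not part of the dataset.

```python
import pandas as pd

def hhmm_to_hours(hhmm: pd.Series) -> pd.Series:
    """Convert clock times written as hhmm numbers (e.g. 1430) to decimal hours (14.5).

    Assumes values lie in 0..2400; missing values stay NaN.
    """
    hours = hhmm // 100    # 1430 // 100 -> 14
    minutes = hhmm % 100   # 1430 % 100 -> 30
    return hours + minutes / 60

dep = pd.Series([5.0, 1259.0, 1300.0, 1430.0, None], name='DepTime')
print(hhmm_to_hours(dep).tolist())
```

In the notebook this could be used as `sampled_data['dep_hours'] = hhmm_to_hours(sampled_data['DepTime'])`, then plotting `dep_hours` on the y-axis instead of `DepTime`.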

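Reflections

This section is for my thoughts on using Plotly, including what I found useful and what I want to explore further.

Line Chart: Average Arrival Delay by Month

I want to see how the average arrival delay changes from month to month.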

In [ ]:
# Group by Month and compute average arrival delay
delay_by_month = sampled_data.groupby('Month')['ArrDelay'].mean().reset_index()
In [ ]:
delay_by_month
Out[ ]:
Month ArrDelay
0 1 2.232558
1 2 2.687500
2 3 10.868421
3 4 6.229167
4 5 -0.279070
5 6 17.310345
6 7 5.088889
7 8 3.121951
8 9 9.081081
9 10 1.200000
10 11 -3.975000
11 12 3.240741
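These monthly averages come from the 500-flight sample, so each month rests on roughly 40 flights on average, and `groupby().mean()` silently skips missing ArrDelay values. Reporting the count next to the mean makes both effects visible. A sketch on a small invented DataFrame:

```python
import numpy as np
import pandas as pd

# Small invented sample: two months, one flight with no recorded arrival delay
df = pd.DataFrame({
    'Month': [1, 1, 1, 2, 2],
    'ArrDelay': [10.0, -4.0, np.nan, 30.0, 0.0],
})

# mean() ignores NaN; count() reports how many non-missing values went into each mean
summary = df.groupby('Month')['ArrDelay'].agg(['mean', 'count']).reset_index()
print(summary)
```

Month 1 has three rows, but its mean is based on only two delays, which the count column shows directly.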
In [ ]:
# With mode='lines' no markers are drawn, so the colour is set via line=, not marker=
fig = go.Figure(data=go.Scatter(x=delay_by_month['Month'], y=delay_by_month['ArrDelay'],
                                mode='lines', line=dict(color='blue')))
fig.update_layout(title='Month vs Average Arrival Delay',
                  xaxis_title='Month', yaxis_title='Average Arrival Delay (minutes)')
fig.show()

Plotly Express

Next Steps

Based on what I've learned so far, I plan to try more advanced Plotly features and apply them to my own datasets.

Bar Chart: Total Flights by Destination State

I want to see the total number of flights to each destination state.

In [ ]:
bar_data = sampled_data.groupby(['DestState'])['Flights'].sum().reset_index()
In [ ]:
bar_data
Out[ ]:
DestState Flights
0 AK 4.0
1 AL 3.0
2 AZ 8.0
3 CA 68.0
4 CO 20.0
5 CT 5.0
6 FL 32.0
7 GA 27.0
8 HI 5.0
9 IA 1.0
10 ID 1.0
11 IL 33.0
12 IN 6.0
13 KS 1.0
14 KY 14.0
15 LA 4.0
16 MA 10.0
17 MD 7.0
18 MI 16.0
19 MN 11.0
20 MO 18.0
21 MT 3.0
22 NC 13.0
23 NE 2.0
24 NH 1.0
25 NJ 5.0
26 NM 1.0
27 NV 13.0
28 NY 21.0
29 OH 9.0
30 OK 6.0
31 OR 3.0
32 PA 14.0
33 PR 2.0
34 RI 1.0
35 SC 1.0
36 TN 14.0
37 TX 60.0
38 UT 7.0
39 VA 11.0
40 VI 1.0
41 WA 10.0
42 WI 8.0
In [ ]:
fig = px.bar(bar_data, x="DestState", y="Flights", title='Total number of flights to each destination state')
fig.show()

Summary

This notebook captures my personal journey with Plotly, highlighting my experiments, discoveries, and areas for future exploration.

Bubble Chart: Flights by Reporting Airline

I want to compare the number of flights for each reporting airline.

In [ ]:
bub_data = sampled_data.groupby('Reporting_Airline')['Flights'].sum().reset_index()
In [ ]:
bub_data
Out[ ]:
Reporting_Airline Flights
0 9E 5.0
1 AA 57.0
2 AS 14.0
3 B6 10.0
4 CO 12.0
5 DL 66.0
6 EA 4.0
7 EV 11.0
8 F9 4.0
9 FL 3.0
10 HA 3.0
11 HP 7.0
12 KH 1.0
13 MQ 27.0
14 NK 3.0
15 NW 26.0
16 OH 8.0
17 OO 28.0
18 PA (1) 1.0
19 PI 1.0
20 PS 1.0
21 TW 14.0
22 UA 51.0
23 US 43.0
24 VX 1.0
25 WN 86.0
26 XE 6.0
27 YV 6.0
28 YX 1.0
In [ ]:
fig = px.scatter(bub_data, x="Reporting_Airline", y="Flights", size="Flights",
                 hover_name="Reporting_Airline", title='Reporting Airline vs Number of Flights', size_max=60)
fig.show()
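Histogram: Distribution of Arrival Delays

I want to see how arrival delays are spread out. Before plotting, I replace missing arrival delays with 0.

In [ ]:
# Replace missing arrival delays with 0, then display the updated column
sampled_data['ArrDelay'] = sampled_data['ArrDelay'].fillna(0)
sampled_data['ArrDelay']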
Out[ ]:
5312     32.0
18357    -1.0
6428     -5.0
15414    -2.0
10610   -11.0
         ... 
18946     8.0
16291    -5.0
21818   -14.0
24116    88.0
16705     4.0
Name: ArrDelay, Length: 500, dtype: float64
In [ ]:
fig = px.histogram(sampled_data, x="ArrDelay", title='Distribution of Arrival Delays')
fig.show()

Additional Notes

Any extra observations or ideas for future projects will go here.

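Pie Chart: Flights by Distance Group

I want to see what proportion of the sampled flights falls into each distance group.

In [ ]:
# Size each slice by the Flights column (one per flight in this sample).
# Using values='Month' would sum month numbers within each group, which is not a meaningful proportion.
fig = px.pie(sampled_data, values='Flights', names='DistanceGroup', title='Proportion of flights by distance group')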
fig.show()
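A pie chart shows a single set of proportions, so it cannot show how the mix of distance groups changes from month to month. A normalized cross-tabulation gives that breakdown directly: each row is a month, each column a distance group, and each row sums to 1. A sketch on invented rows standing in for sampled_data:

```python
import pandas as pd

# Invented rows standing in for sampled_data
df = pd.DataFrame({
    'Month':         [1, 1, 1, 1, 2, 2],
    'DistanceGroup': [1, 2, 2, 3, 1, 1],
})

# Share of each distance group within each month (rows sum to 1)
shares = pd.crosstab(df['Month'], df['DistanceGroup'], normalize='index')
print(shares)
```

The resulting table could then be drawn as a heatmap, for example with `px.imshow(shares)`, or reshaped for a stacked bar chart.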

Appendix

Supporting code, references, or resources for my Plotly experiments.

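Sunburst Chart: Flights by Month and Destination State

I want to see how the sampled flights break down by month and, within each month, by destination state.

In [ ]:
# Inner ring: Month; outer ring: destination state; slice size: summed Flights
fig = px.sunburst(sampled_data, path=['Month', 'DestStateName'], values='Flights', title='Flights by Month and Destination State')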
fig.show()
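The `path` argument lists the hierarchy from the centre outward, and with `values='Flights'` each parent slice is the sum of its children. The numbers the chart draws therefore correspond to a two-level groupby, which is handy for checking a slice by hand. A sketch on invented rows standing in for sampled_data:

```python
import pandas as pd

# Invented rows standing in for sampled_data
df = pd.DataFrame({
    'Month': [1, 1, 1, 2, 2],
    'DestStateName': ['California', 'Texas', 'California', 'Texas', 'Texas'],
    'Flights': 1.0,
})

# Outer ring: flights per (month, state)
leaves = df.groupby(['Month', 'DestStateName'])['Flights'].sum()
# Inner ring: flights per month, i.e. the sum of that month's leaves
parents = leaves.groupby(level='Month').sum()

print(leaves)
print(parents)
```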

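Summary

Through these visualizations, I got a first look at airline performance and delays in a 500-flight sample. In this sample, the average arrival delay peaks in June (about 17 minutes) and is negative in November, California and Texas are the most common destination states, and Southwest (WN) reports the most flights. Plotly makes it straightforward to turn summaries like these into interactive charts.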

End of Notebook

This marks the end of my Plotly basics exploration.