Visualizing Data: Waffle Charts, Word Clouds, and Regression Analysis¶

By Mohammad Sayem Chowdhury


My Motivation¶

This notebook is a record of my personal journey exploring creative ways to visualize data using Python. I focus on building waffle charts, word clouds, and regression plots from scratch, sharing my process, challenges, and insights along the way. My aim is to experiment, learn, and document what I discover, rather than follow a set curriculum.


Project Roadmap¶

  1. Data Exploration
  2. Data Preparation
  3. Visualization Experiments
    • Waffle Charts
    • Word Clouds
    • Regression Plots

This roadmap reflects my personal journey through different visualization techniques.

Exploring Data with pandas and Matplotlib¶

For this project, I chose pandas and numpy for data wrangling and matplotlib for plotting. The dataset covers immigration to Canada from 1980 to 2013, sourced from the United Nations. My goal is to experiment with different ways to explore and visualize this data, documenting my process and discoveries.

Downloading and Preparing Data¶

I start by ensuring I have the right tools and data for my analysis.

Before diving in, I make sure the required Excel reader is available for pandas so I can load the data smoothly.

In [ ]:
# Author: Mohammad Sayem Chowdhury
# If you need to read Excel files, ensure openpyxl is installed.
# !pip install openpyxl==3.0.9

Here are the main libraries I use throughout this project, chosen for their flexibility and power in data analysis and visualization.

In [ ]:
# Author: Mohammad Sayem Chowdhury
import numpy as np  # for numerical operations
import pandas as pd # for data manipulation
from PIL import Image # for image processing (used later)

Now, I load the Canadian immigration dataset directly into a pandas DataFrame for my own analysis and visualization experiments.

In [ ]:
# Author: Mohammad Sayem Chowdhury
# Load the dataset from the web
immigration_data = pd.read_excel(
    'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/Canada.xlsx',
    sheet_name='Canada by Citizenship',
    skiprows=range(20),
    skipfooter=2)

print('Data loaded into DataFrame.')
Data loaded into DataFrame.

Previewing the first few rows of the dataset gives me a sense of its structure and helps me decide what to explore next.

In [ ]:
immigration_data.head()
Out[ ]:
Type Coverage OdName AREA AreaName REG RegName DEV DevName 1980 ... 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
0 Immigrants Foreigners Afghanistan 935 Asia 5501 Southern Asia 902 Developing regions 16 ... 2978 3436 3009 2652 2111 1746 1758 2203 2635 2004
1 Immigrants Foreigners Albania 908 Europe 925 Southern Europe 901 Developed regions 1 ... 1450 1223 856 702 560 716 561 539 620 603
2 Immigrants Foreigners Algeria 903 Africa 912 Northern Africa 902 Developing regions 80 ... 3616 3626 4807 3623 4005 5393 4752 4325 3774 4331
3 Immigrants Foreigners American Samoa 909 Oceania 957 Polynesia 902 Developing regions 0 ... 0 0 1 0 0 0 0 0 0 0
4 Immigrants Foreigners Andorra 908 Europe 925 Southern Europe 901 Developed regions 0 ... 0 0 1 1 0 0 0 0 1 1

5 rows × 43 columns

I check the number of records in the dataset to understand its scale.

In [ ]:
# Author: Mohammad Sayem Chowdhury
print(f"DataFrame shape: {immigration_data.shape}")
DataFrame shape: (195, 43)

To prepare for analysis, I clean up the data by removing unnecessary columns, renaming for clarity, and setting the country as the index. I also add a 'Total' column for each country and prepare a list of years for analysis. This step is crucial for making the data easier to work with in the visualizations that follow.

In [ ]:
# Cleaning and preparing the data for my own analysis and visualization experiments
# Author: Mohammad Sayem Chowdhury
# Remove columns that aren't needed
immigration_data.drop(['AREA','REG','DEV','Type','Coverage'], axis=1, inplace=True)

# Rename columns for clarity
immigration_data.rename(columns={'OdName':'Country', 'AreaName':'Continent', 'RegName':'Region'}, inplace=True)

# Ensure all column labels are strings
immigration_data.columns = list(map(str, immigration_data.columns))

# Set country as index
immigration_data.set_index('Country', inplace=True)

# Add a total column for each country (numeric_only=True skips the text columns)
immigration_data['Total'] = immigration_data.sum(axis=1, numeric_only=True)

# Prepare list of years for analysis
years = list(map(str, range(1980, 2014)))
print('Cleaned data shape:', immigration_data.shape)
Cleaned data shape: (195, 38)
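To make the cleaning steps concrete, here is a minimal sketch on a tiny made-up table with a similar column layout. The two placeholder countries and their counts are invented for illustration and are not taken from the UN dataset.

```python
import pandas as pd

# Hypothetical two-row table mimicking the raw layout (all values are made up)
toy = pd.DataFrame({
    'Type': ['Immigrants', 'Immigrants'],
    'OdName': ['Country A', 'Country B'],
    'AreaName': ['Europe', 'Europe'],
    1980: [10, 0],
    1981: [20, 5],
})

toy = toy.drop(columns=['Type'])                      # remove a column that isn't needed
toy = toy.rename(columns={'OdName': 'Country', 'AreaName': 'Continent'})
toy.columns = [str(c) for c in toy.columns]           # year labels become strings
toy = toy.set_index('Country')

# numeric_only=True skips text columns such as 'Continent'
toy['Total'] = toy.sum(axis=1, numeric_only=True)
print(toy)
```

Summing with `numeric_only=True` is one way to avoid adding text columns into the total; selecting the year columns explicitly before summing would work as well.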

Visualizing Data with Matplotlib¶

Setting up matplotlib for plotting.

In [ ]:
# Author: Mohammad Sayem Chowdhury
%matplotlib inline

import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches # for waffle charts

mpl.style.use('ggplot') # for a clean look

print('Matplotlib version:', mpl.__version__)
Matplotlib version: 3.5.1

Waffle Charts¶

Waffle charts are a fun way to visualize proportions. Here, I use them to compare immigration from Denmark, Norway, and Sweden to Canada. I build the chart step by step, then wrap it in a reusable function.

First, I create a DataFrame for the three countries I'm interested in.

In [ ]:
# Author: Mohammad Sayem Chowdhury
nordic_data = immigration_data.loc[['Denmark', 'Norway', 'Sweden'], :]
nordic_data
Out[ ]:
Continent Region DevName 1980 1981 1982 1983 1984 1985 1986 ... 2005 2006 2007 2008 2009 2010 2011 2012 2013 Total
Country
Denmark Europe Northern Europe Developed regions 272 293 299 106 93 73 93 ... 62 101 97 108 81 92 93 94 81 3901
Norway Europe Northern Europe Developed regions 116 77 106 51 31 54 56 ... 57 53 73 66 75 46 49 53 59 2327
Sweden Europe Northern Europe Developed regions 281 308 222 176 128 158 187 ... 205 139 193 165 167 159 134 140 140 5866

3 rows × 38 columns

Matplotlib doesn't have a built-in waffle chart function, so I construct one manually using numpy and matplotlib.

Step 1: Calculate the proportion of each country.

In [ ]:
# Author: Mohammad Sayem Chowdhury
total_immigrants = nordic_data['Total'].sum()
country_proportions = nordic_data['Total'] / total_immigrants
country_proportions.to_frame('Proportion')
Out[ ]:
Proportion
Country
Denmark 0.322557
Norway 0.192409
Sweden 0.485034

Step 2: Define the chart size.

In [ ]:
# Author: Mohammad Sayem Chowdhury
waffle_width = 40
waffle_height = 10
total_tiles = waffle_width * waffle_height
print(f'Total tiles: {total_tiles}')
Total tiles: 400

Step 3: Calculate the number of tiles for each country.

In [ ]:
# Author: Mohammad Sayem Chowdhury
tiles_per_country = (country_proportions * total_tiles).round().astype(int)
tiles_per_country.to_frame('Tiles')
Out[ ]:
Tiles
Country
Denmark 129
Norway 77
Sweden 194

Now I know how many tiles each country will occupy in the waffle chart.
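Rounding each share independently happens to add up to 400 here, but that is not guaranteed in general (three equal shares of 400 tiles would round to 133 each, leaving one tile unassigned). The sketch below shows one possible safeguard, a largest-remainder allocation. The helper name `allocate_tiles` is my own and is not used elsewhere in this notebook.

```python
import math

def allocate_tiles(values, total_tiles):
    """Split total_tiles among values so the counts always sum to total_tiles."""
    total = sum(values)
    exact = [v / total * total_tiles for v in values]   # fractional tile counts
    counts = [math.floor(x) for x in exact]             # start from the floors
    leftover = total_tiles - sum(counts)
    # hand the remaining tiles to the categories with the largest fractional parts
    order = sorted(range(len(values)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

nordic_totals = [3901, 2327, 5866]  # Denmark, Norway, Sweden totals from the table above
tiles = allocate_tiles(nordic_totals, 400)
print(tiles, sum(tiles))
```

For the Nordic totals this gives the same counts as plain rounding; the difference only shows up when the rounded counts fail to sum to the grid size.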

Step 4: Build the waffle chart matrix.

In [ ]:
# Author: Mohammad Sayem Chowdhury
waffle_matrix = np.zeros((waffle_height, waffle_width), dtype=int)
category_idx = 0
tile_count = 0
for col in range(waffle_width):
    for row in range(waffle_height):
        tile_count += 1
        if tile_count > sum(tiles_per_country[:category_idx]):
            category_idx += 1
        waffle_matrix[row, col] = category_idx
print('Waffle matrix created.')
Waffle matrix created.

Let's see what the matrix looks like.

In [ ]:
waffle_matrix
Out[ ]:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3,
        3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]],
      dtype=uint32)

The matrix now represents the three countries, with each integer corresponding to a country.
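The nested loop fills the grid column by column. As a cross-check, the same matrix can be built without a loop by repeating each category code and reshaping in column-major order. This is an alternative sketch, not the method used in the rest of the notebook, and it assumes the tile counts sum exactly to height × width.

```python
import numpy as np

height, width = 10, 40
tiles = [129, 77, 194]  # tile counts from Step 3

# Loop version, as in Step 4
loop_matrix = np.zeros((height, width), dtype=int)
category_idx, tile_count = 0, 0
for col in range(width):
    for row in range(height):
        tile_count += 1
        if tile_count > sum(tiles[:category_idx]):
            category_idx += 1
        loop_matrix[row, col] = category_idx

# Vectorized version: repeat each code 1..n by its tile count,
# then fill the grid column by column (order='F')
labels = np.repeat(np.arange(1, len(tiles) + 1), tiles)
fast_matrix = labels.reshape((height, width), order='F')

print(np.array_equal(loop_matrix, fast_matrix))
```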

Step 5: Visualize the waffle chart.

In [ ]:
# Author: Mohammad Sayem Chowdhury
colormap = plt.cm.coolwarm
plt.matshow(waffle_matrix, cmap=colormap)  # matshow opens its own figure
plt.colorbar()
plt.show()

Step 6: Add gridlines for clarity.

In [ ]:
# Author: Mohammad Sayem Chowdhury
plt.matshow(waffle_matrix, cmap=colormap)  # matshow opens its own figure
plt.colorbar()
ax = plt.gca()
# Minor ticks at the tile edges let the grid draw white lines between tiles
ax.set_xticks(np.arange(-.5, waffle_width, 1), minor=True)
ax.set_yticks(np.arange(-.5, waffle_height, 1), minor=True)
ax.grid(which='minor', color='w', linestyle='-', linewidth=2)
plt.xticks([])
plt.yticks([])
plt.show()

Step 7: Add a legend to the chart for clarity.

In [ ]:
# Author: Mohammad Sayem Chowdhury
plt.matshow(waffle_matrix, cmap=colormap)  # matshow opens its own figure
plt.colorbar()
ax = plt.gca()
ax.set_xticks(np.arange(-.5, waffle_width, 1), minor=True)
ax.set_yticks(np.arange(-.5, waffle_height, 1), minor=True)
ax.grid(which='minor', color='w', linestyle='-', linewidth=2)
plt.xticks([])
plt.yticks([])

# Create legend (use positional access, since the Series is indexed by country name)
cumulative = np.cumsum(nordic_data['Total'].to_numpy())
total = cumulative[-1]
legend_handles = []
for i, country in enumerate(nordic_data.index.values):
    label = f"{country} ({nordic_data['Total'].iloc[i]})"
    color = colormap(float(cumulative[i]) / total)
    legend_handles.append(mpatches.Patch(color=color, label=label))
plt.legend(handles=legend_handles, loc='lower center', ncol=len(nordic_data.index.values), bbox_to_anchor=(0., -0.2, 0.95, .1))
plt.show()
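One detail worth checking is whether the legend patches use the same colors as the tiles. By default, matshow scales the category codes linearly from the smallest to the largest value, so with three categories the tiles sit at positions 0.0, 0.5, and 1.0 of the colormap, while the legend samples the colormap at each country's cumulative share. The sketch below compares the two without drawing anything. It assumes that default linear scaling and reuses the totals from the table above.

```python
from matplotlib import cm
from matplotlib.colors import Normalize

categories = ['Denmark', 'Norway', 'Sweden']
totals = [3901, 2327, 5866]
colormap = cm.coolwarm

# Tile colors: codes 1..n scaled linearly onto the colormap (assumed matshow default)
norm = Normalize(vmin=1, vmax=len(categories))
tile_colors = [colormap(norm(code)) for code in range(1, len(categories) + 1)]

# Legend colors as computed in Step 7 (cumulative share of the total)
running, legend_colors = 0, []
for value in totals:
    running += value
    legend_colors.append(colormap(running / sum(totals)))

for name, tile, legend in zip(categories, tile_colors, legend_colors):
    print(name, 'match' if tile == legend else 'differ')
```

If an exact match matters, passing `tile_colors` to the legend patches instead of the cumulative-share colors is one option.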

I find waffle charts visually appealing and effective for showing proportions. Now, I’ll wrap the process in a reusable function for future use.

Here’s my custom function to create a waffle chart from any set of categories and values.

In [ ]:
# Author: Mohammad Sayem Chowdhury
def create_waffle_chart(categories, values, height, width, colormap, value_sign=''):
    """Create a waffle chart for the given categories and values."""
    # Plain lists make positional indexing work the same for lists, arrays, and Series
    categories = list(categories)
    values = list(values)
    total_values = sum(values)
    proportions = [float(value) / total_values for value in values]
    total_tiles = width * height
    print('Total tiles:', total_tiles)
    tiles_per_category = [round(p * total_tiles) for p in proportions]
    for i, tiles in enumerate(tiles_per_category):
        print(f"{categories[i]}: {tiles}")
    # Fill the grid column by column with the category codes 1..n
    waffle = np.zeros((height, width))
    category_idx = 0
    tile_idx = 0
    for col in range(width):
        for row in range(height):
            tile_idx += 1
            if tile_idx > sum(tiles_per_category[:category_idx]):
                category_idx += 1
            waffle[row, col] = category_idx
    plt.matshow(waffle, cmap=colormap)  # matshow opens its own figure
    plt.colorbar()
    ax = plt.gca()
    ax.set_xticks(np.arange(-.5, width, 1), minor=True)
    ax.set_yticks(np.arange(-.5, height, 1), minor=True)
    ax.grid(which='minor', color='w', linestyle='-', linewidth=2)
    plt.xticks([])
    plt.yticks([])
    # Build one legend patch per category
    legend_handles = []
    cumulative = np.cumsum(values)
    total = cumulative[-1]
    for i, category in enumerate(categories):
        label = f"{category} ({values[i]}{value_sign})"
        color = colormap(float(cumulative[i]) / total)
        legend_handles.append(mpatches.Patch(color=color, label=label))
    plt.legend(handles=legend_handles, loc='lower center', ncol=len(categories), bbox_to_anchor=(0., -0.2, 0.95, .1))
    plt.show()

Now, to create a waffle chart, all I have to do is call create_waffle_chart. First, I define the input parameters:

In [ ]:
# Author: Mohammad Sayem Chowdhury
# Setting up the waffle chart parameters for my own visualization
waffle_cols = 40  # number of columns in the waffle chart
waffle_rows = 10  # number of rows in the waffle chart

waffle_countries = nordic_data.index.values  # countries to visualize
waffle_totals = nordic_data['Total']         # total immigrants per country

waffle_palette = plt.cm.coolwarm  # color palette for the chart

Now, I use my custom function to create a waffle chart for Denmark, Norway, and Sweden, visualizing their immigration proportions to Canada.

In [ ]:
# Author: Mohammad Sayem Chowdhury
create_waffle_chart(
    categories=waffle_countries,
    values=waffle_totals,
    height=waffle_rows,
    width=waffle_cols,
    colormap=waffle_palette,
    value_sign=''
)
Total tiles: 400
Denmark: 129
Norway: 77
Sweden: 194

There is also a dedicated Python package for waffle charts called PyWaffle, which builds on matplotlib. I haven't used it in this notebook, but feel free to check it out and play with it.

Word Clouds¶

Word clouds are one of my favorite ways to visualize text or categorical data. The bigger and bolder a word appears, the more frequently it shows up in the data. In this section, I'll use word clouds to explore both literary text and my own immigration dataset.

Luckily, Python has a great package for generating word clouds, called wordcloud by Andreas Mueller. I use it here to quickly turn text or categorical data into a visual summary. Let's see it in action!

If you haven't installed the wordcloud package yet, you can do so easily. I always like to keep my environment ready for creative visualizations.

In [159]:
# install wordcloud
# !pip3 install wordcloud==1.8.1

# import the package, its set of stopwords, and the image-based color generator
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

print('Wordcloud is installed and imported!')
Wordcloud is installed and imported!

Word clouds are perfect for high-level analysis of text data. To show how flexible they are, I'll take a quick detour from immigration data and analyze a classic novel: Alice's Adventures in Wonderland by Lewis Carroll. Let's download the text and see what stands out!

In [160]:
import urllib.request  # importing urllib alone does not guarantee the request submodule is loaded

# open the file and read it into a variable alice_novel
alice_novel = urllib.request.urlopen('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/alice_novel.txt').read().decode("utf-8")

To make the word cloud more meaningful, I use a set of stopwords to filter out common words that don't add much insight. This helps the most interesting words stand out.

In [161]:
stopwords = set(STOPWORDS)
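Conceptually, a word cloud starts from a word-frequency table with the stopwords removed. The sketch below shows that counting step using only the standard library. The sample sentence and the tiny stopword set are made up for illustration, and the wordcloud package's own tokenizer is more elaborate than this.

```python
import re
from collections import Counter

# Made-up sample text and a tiny illustrative stopword set (not wordcloud's STOPWORDS)
sample_text = "Alice said the Rabbit was late. The Rabbit said nothing to Alice."
toy_stopwords = {'the', 'was', 'to'}

# Lowercase, split into words, and count everything that is not a stopword
words = re.findall(r"[a-z']+", sample_text.lower())
counts = Counter(w for w in words if w not in toy_stopwords)
print(counts.most_common(3))
```

Words that survive the filter with high counts are the ones that would be drawn largest.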

Now, I create a word cloud from the full text of the novel, limited to at most 2000 distinct words (max_words=2000). This gives a quick visual summary of the most frequent words in the book.

In [ ]:
# Author: Mohammad Sayem Chowdhury
# Generate a word cloud from Alice in Wonderland (as a fun example)
alice_wc = WordCloud(
    background_color='white',
    max_words=2000,
    stopwords=stopwords
)
alice_wc.generate(alice_novel)
Out[ ]:
<wordcloud.wordcloud.WordCloud at 0x1a792111190>

Let's take a look at the word cloud I just generated. It's always fun to see which words pop out visually!

In [ ]:
plt.imshow(alice_wc, interpolation='bilinear')
plt.axis('off')
plt.show()

Some words, like "said," aren't very informative. I add "said" to the stopwords set and regenerate the word cloud for a cleaner, more insightful visualization. Because alice_wc was created with this same set object, regenerating picks up the change.

In [ ]:
stopwords.add('said')
alice_wc.generate(alice_novel)
fig = plt.figure(figsize=(14, 18))
plt.imshow(alice_wc, interpolation='bilinear')
plt.axis('off')
plt.show()

For extra creativity, I use a custom mask to shape the word cloud. Here, I use an Alice-themed mask to make the visualization even more fun and relevant to the story.

In [ ]:
alice_mask = np.array(Image.open('alice_mask.png'))
In [ ]:
fig = plt.figure(figsize=(14, 18))
plt.imshow(alice_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis('off')
plt.show()
In [ ]:
alice_wc = WordCloud(background_color='white', max_words=2000, mask=alice_mask, stopwords=stopwords)
alice_wc.generate(alice_novel)
fig = plt.figure(figsize=(14, 18))
plt.imshow(alice_wc, interpolation='bilinear')
plt.axis('off')
plt.show()
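The mask is just an image array. According to the wordcloud documentation, pure white pixels are treated as masked out and all other pixels are free to draw on. Since alice_mask.png is a local file, here is a sketch of a synthetic circular mask built with numpy that follows the same convention. The size and radius are arbitrary choices of mine.

```python
import numpy as np

size = 200
yy, xx = np.mgrid[:size, :size]
# True for pixels inside a centered circle with radius 45% of the image size
inside = (xx - size / 2) ** 2 + (yy - size / 2) ** 2 <= (size * 0.45) ** 2

# 255 (white) marks pixels to leave empty; 0 marks the drawable region
circle_mask = np.where(inside, 0, 255).astype(np.uint8)
print(circle_mask.shape, circle_mask[100, 100], circle_mask[0, 0])

# Assumed usage, mirroring the cell above:
# WordCloud(background_color='white', mask=circle_mask, stopwords=stopwords)
```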
In [ ]:
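# Author: Mohammad Sayem Chowdhury
# Build a string in which each country name is repeated in proportion to
# its share of total immigration (at most 90 words overall)
max_words = 90
word_string = ''
total_immigration = immigration_data['Total'].sum()  # computed once, outside the loop
for country in immigration_data.index.values:
    # keep only single-word country names
    if country.count(" ") == 0:
        repeat_num_times = int(immigration_data.loc[country, 'Total'] / total_immigration * max_words)
        word_string += (country + ' ') * repeat_num_times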
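The repeat count is truncated with int(), so a country whose share works out to less than one word drops out of the cloud entirely, and names containing a space are skipped. The sketch below walks through the arithmetic with hypothetical totals chosen only to make the numbers easy to follow. They are not values from the dataset.

```python
# Hypothetical totals, chosen only to show the arithmetic
toy_totals = {'India': 500, 'China': 400, 'Fiji': 10, 'United Kingdom': 90}
max_words = 90
grand_total = sum(toy_totals.values())

word_string = ''
repeats = {}
for country, total in toy_totals.items():
    if country.count(' ') == 0:            # multi-word names are skipped
        repeats[country] = int(total / grand_total * max_words)
        word_string += (country + ' ') * repeats[country]

print(repeats)
```

Here the small country ends up with zero repetitions, which is why only the largest source countries appear in the resulting cloud.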
In [ ]:
wordcloud = WordCloud(background_color='white').generate(word_string)
print('Word cloud created!')
Word cloud created!
In [ ]:
plt.figure(figsize=(14, 18))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
In [ ]:
canada_mask = np.array(Image.open('Flag-Canada.webp'))
wordcloud_canada = WordCloud(stopwords=stopwords, background_color="white", max_words=90, mask=canada_mask).generate(word_string)
image_colors = ImageColorGenerator(canada_mask)
plt.figure(figsize=(14, 18))
plt.imshow(wordcloud_canada.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off")
plt.show()

Regression Plots¶

Regression plots are a powerful way to explore trends and relationships in data. Here, I use seaborn to quickly create regression plots and uncover patterns in Canadian immigration over time.

In [ ]:
# Author: Mohammad Sayem Chowdhury
# If seaborn is not installed, uncomment the next line:
# !pip install seaborn
import seaborn as sns
print('Seaborn ready!')
Seaborn ready!

To see the overall trend, I create a DataFrame with the total number of immigrants to Canada for each year. This sets the stage for a clear regression analysis.

In [ ]:
total_per_year = pd.DataFrame(immigration_data[years].sum(axis=0))
total_per_year.index = map(float, total_per_year.index)
total_per_year.reset_index(inplace=True)
total_per_year.columns = ['year', 'total']
total_per_year.head()
Out[ ]:
year total
0 1980.0 99137
1 1981.0 110563
2 1982.0 104271
3 1983.0 75550
4 1984.0 73417

With seaborn, it's easy to generate a regression plot that shows the trend in total immigration over time. I love how visually clear and informative these plots are.

In [ ]:
sns.regplot(x='year', y='total', data=total_per_year)
plt.show()
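By default, regplot draws an ordinary least-squares line through the points. To see what that fit amounts to numerically, here is a sketch using numpy's polyfit on just the first five rows shown in the preview above. Measuring the year as an offset from 1980 is my own choice to keep the intercept readable.

```python
import numpy as np

# First five rows of total_per_year, as shown in the preview above
years_since_1980 = np.array([0, 1, 2, 3, 4], dtype=float)
totals = np.array([99137, 110563, 104271, 75550, 73417], dtype=float)

# Degree-1 polynomial fit: totals ≈ slope * years_since_1980 + intercept
slope, intercept = np.polyfit(years_since_1980, totals, deg=1)
fitted = slope * years_since_1980 + intercept
print(f"slope: {slope:.1f} per year, intercept: {intercept:.1f}")
```

The slope from these five early years is negative, which says nothing about the full 1980 to 2013 trend. Running the same fit on the whole total_per_year table is what corresponds to the line in the plot.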
In [ ]:
sns.regplot(x='year', y='total', data=total_per_year, color='green')
plt.show()
In [ ]:
ax = sns.regplot(x='year', y='total', data=total_per_year, color='green', marker='+')
plt.show()
In [ ]:
plt.figure(figsize=(15, 10))
sns.regplot(x='year', y='total', data=total_per_year, color='green', marker='+')
plt.show()
In [ ]:
plt.figure(figsize=(15, 10))
ax = sns.regplot(x='year', y='total', data=total_per_year, color='green', marker='+', scatter_kws={'s': 200})
ax.set(xlabel='Year', ylabel='Total Immigration')
ax.set_title('Total Immigration to Canada from 1980 - 2013')
plt.show()
In [ ]:
plt.figure(figsize=(15, 10))
sns.set(font_scale=1.5)
ax = sns.regplot(x='year', y='total', data=total_per_year, color='green', marker='+', scatter_kws={'s': 200})
ax.set(xlabel='Year', ylabel='Total Immigration')
ax.set_title('Total Immigration to Canada from 1980 - 2013')
plt.show()
In [ ]:
plt.figure(figsize=(15, 10))
sns.set(font_scale=1.5)
sns.set_style('ticks')
ax = sns.regplot(x='year', y='total', data=total_per_year, color='green', marker='+', scatter_kws={'s': 200})
ax.set(xlabel='Year', ylabel='Total Immigration')
ax.set_title('Total Immigration to Canada from 1980 - 2013')
plt.show()
In [ ]:
plt.figure(figsize=(15, 10))
sns.set(font_scale=1.5)
sns.set_style('whitegrid')
ax = sns.regplot(x='year', y='total', data=total_per_year, color='green', marker='+', scatter_kws={'s': 200})
ax.set(xlabel='Year', ylabel='Total Immigration')
ax.set_title('Total Immigration to Canada from 1980 - 2013')
plt.show()

My Own Exploration: Immigration from Denmark, Sweden, and Norway¶

I'm especially interested in the trends for Denmark, Sweden, and Norway. By summing their yearly totals and plotting a regression line, I can see how immigration from these countries has changed over time.

In [ ]:
# Author: Mohammad Sayem Chowdhury
nordic_yearly = immigration_data.loc[['Denmark', 'Norway', 'Sweden'], years].transpose()
nordic_total = pd.DataFrame(nordic_yearly.sum(axis=1))
nordic_total.reset_index(inplace=True)
nordic_total.columns = ['year', 'total']
nordic_total['year'] = nordic_total['year'].astype(int)
plt.figure(figsize=(15, 10))
sns.set(font_scale=1.5)
sns.set_style('whitegrid')
ax = sns.regplot(x='year', y='total', data=nordic_total, color='green', marker='+', scatter_kws={'s': 200})
ax.set(xlabel='Year', ylabel='Total Immigration')
ax.set_title('Immigration from Denmark, Sweden, and Norway to Canada (1980-2013)')
plt.show()

Personal Reflections & Summary¶

Exploring data through waffle charts, word clouds, and regression plots has given me new insights into Canadian immigration trends and the power of visualization. Each technique brings out a different aspect of the story, and I hope my personal approach has made these concepts more relatable and inspiring.

If you have feedback or want to connect, feel free to reach out!

Notebook by Mohammad Sayem Chowdhury