Global Alcohol Consumption Patterns: Cross-Cultural Analysis¶

By Mohammad Sayem Chowdhury
Last Updated: June 13, 2025

Project Overview¶

This comprehensive analysis explores global alcohol consumption patterns across different countries and continents. Using statistical analysis and data visualization, I investigate cultural drinking habits, regional preferences, and consumption trends to understand how alcohol consumption varies worldwide.

Research Objectives¶

  • Cultural Analysis: Examine alcohol consumption patterns across different continents and cultures
  • Beverage Preferences: Analyze the relationship between beer, wine, and spirit consumption by region
  • Consumption Patterns: Identify countries with highest/lowest consumption rates
  • Predictive Modeling: Develop models to predict total alcohol consumption based on beverage types

Author: Mohammad Sayem Chowdhury
Project Type: Cross-Cultural Data Analysis
Domain: Global Health & Social Analytics
Dataset: World Health Organization Alcohol Consumption Data

Table of Contents¶

  1. Environment Setup & Data Import
  2. Dataset Exploration & Quality Assessment
  3. Continental Consumption Analysis
  4. Beverage Type Preferences by Region
  5. Statistical Analysis & Correlations
  6. Visualization & Pattern Discovery
  7. Key Findings & Cultural Insights

Executive Summary¶

Research Focus¶

This analysis examines global alcohol consumption patterns using comprehensive data on beer, wine, and spirit servings per capita across 193 countries. The goal is to uncover cultural drinking preferences, identify regional consumption trends, and develop insights into global alcohol consumption behaviors.

Key Questions¶

  • Which continents and countries have the highest alcohol consumption rates?
  • How do beverage preferences (beer vs. wine vs. spirits) vary by region?
  • What cultural and geographic factors influence drinking patterns?
  • Can we predict total alcohol consumption from individual beverage consumption data?

Expected Impact: The findings can inform public health policy, cultural studies, and international business strategy in the beverage industry.

You will need the following libraries:

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

1.0 Importing the Data¶

Load the CSV file into a pandas DataFrame:

In [ ]:
df = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/edx/project/drinks.csv')

We use the method head() to display the first 5 rows of the dataframe:

In [5]:
df.head()
Out[5]:
country beer_servings spirit_servings wine_servings total_litres_of_pure_alcohol continent
0 Afghanistan 0 0 0 0.0 Asia
1 Albania 89 132 54 4.9 Europe
2 Algeria 25 0 14 0.7 Africa
3 Andorra 245 138 312 12.4 Europe
4 Angola 217 57 45 5.9 Africa

Question 1: Display the data types of each column using the attribute dtypes.

In [6]:
df.dtypes
Out[6]:
country                          object
beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object

Question 2: Use the method groupby to get the total number of wine servings per continent:

In [8]:
df_Wine = df[['continent','wine_servings']]
df_winegrp = df_Wine.groupby(['continent'], as_index=False).sum()
df_winegrp
Out[8]:
continent wine_servings
0 Africa 862
1 Asia 399
2 Europe 6400
3 North America 564
4 Oceania 570
5 South America 749
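
For reference, the same aggregation can be written as a single chained expression; this is just an equivalent, more concise sketch of the grouping above (it assumes df is the DataFrame loaded in section 1.0):

# Equivalent one-liner: total wine servings per continent
df.groupby('continent')['wine_servings'].sum().reset_index()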

Question 3: Perform a statistical summary and analysis of beer servings for each continent:

In [9]:
df_beer = df[['continent','beer_servings']]
df_beergrp = df_beer.groupby(['continent'], as_index=True).describe()
df_beergrp
Out[9]:
beer_servings
count mean std min 25% 50% 75% max
continent
Africa 53.0 61.471698 80.557816 0.0 15.00 32.0 76.00 376.0
Asia 44.0 37.045455 49.469725 0.0 4.25 17.5 60.50 247.0
Europe 45.0 193.777778 99.631569 0.0 127.00 219.0 270.00 361.0
North America 23.0 145.434783 79.621163 1.0 80.00 143.0 198.00 285.0
Oceania 16.0 89.687500 96.641412 0.0 21.00 52.5 125.75 306.0
South America 12.0 175.083333 65.242845 93.0 129.50 162.5 198.00 333.0

Question 4: Use the function boxplot in the seaborn library to produce a plot showing the distribution of beer servings on each continent.

In [10]:
import seaborn as sns 
sns.boxplot(x="continent", y="beer_servings", data=df_beer)
plt.show()
[Figure: boxplot of beer servings by continent]

Question 5: Use the function regplot in the seaborn library to determine whether the number of wine servings is negatively or positively correlated with the number of beer servings.

In [11]:
import seaborn as sns 
sns.regplot(x="wine_servings", y="beer_servings", data=df)
plt.show()
# Beer servings and wine servings appear to be positively correlated.
# It also seems there may be some places where only beer is served.
[Figure: regression plot of beer servings vs. wine servings]
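
To put a number on the relationship suggested by the plot, a minimal sketch such as the following computes the Pearson correlation between the two serving counts (again assuming df is the full DataFrame); a coefficient above zero would support the positive association noted in the comments:

# Pearson correlation between wine and beer servings
print(df[['wine_servings', 'beer_servings']].corr())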

Question 6: Fit a linear regression model to predict the 'total_litres_of_pure_alcohol' using the number of 'wine_servings' then calculate $R^{2}$:

In [12]:
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
x = df[['wine_servings']]
y = df['total_litres_of_pure_alcohol']
lm.fit(x,y)
yhat = lm.predict(x)
print(yhat[0:5])
print("Intercept is ",lm.intercept_)
print("Slope is ",lm.coef_)
print("R^2 is ",lm.score(x,y))
[ 3.15407943  4.86088833  3.59658545 13.01564196  4.57642018]
Intercept is  3.1540794346874996
Slope is  [0.03160757]
R^2 is  0.4456875459787605
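
As a sanity check, $R^{2}$ can also be recovered directly from its definition, $R^{2} = 1 - \frac{SSE}{SST}$. The short sketch below reuses y and yhat from the cell above and should reproduce the value reported by lm.score:

# Verify R^2 from its definition: 1 - (residual sum of squares / total sum of squares)
sse = np.sum((y - yhat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
print("R^2 (manual):", 1 - sse / sst)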

Question 7¶

Use the list of features ('beer_servings', 'spirit_servings', 'wine_servings') to predict 'total_litres_of_pure_alcohol'; split the data into training and testing sets, then determine the $R^{2}$ on the test data, using the provided code:

In [13]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
x_data = df[['beer_servings','spirit_servings','wine_servings']]
y_data = df['total_litres_of_pure_alcohol']
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.4, random_state=0)
lr = LinearRegression()
lr.fit(x_train, y_train)
print("Train Data R^2:", lr.score(x_train, y_train))
print("Test Data R^2:", lr.score(x_test, y_test))
Train Data R^2: 0.9471204262013297
Test Data R^2: 0.7370737388267039

Question 8: Create a pipeline object that scales the data, performs a polynomial transform, and fits a linear regression model. Fit the object using the training data from the question above, then calculate the $R^{2}$ using the test data. Take a screenshot of your code and the $R^{2}$. There are some hints in the notebook:

  • Step names: 'scale', 'polynomial', 'model'
  • The second element in each tuple contains the model constructor: StandardScaler(), PolynomialFeatures(include_bias=False), LinearRegression()
In [45]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler,PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.4, random_state=0)

steps = [('scale', StandardScaler()), ('polynomial', PolynomialFeatures(include_bias=False, degree=2)), ('model', LinearRegression())]
pipe = Pipeline(steps)
pipe.fit(x_train, y_train)
yhat = pipe.predict(x_test)  # predictions on the held-out test set
print("R^2 using Test data is", pipe.score(x_test, y_test))
print("R^2 using Training data is", pipe.score(x_train, y_train))
R^2 using Test data is 0.7594556586231647
R^2 using Training data is 0.9555197146227157
(The cell also emits several DataConversionWarning messages from StandardScaler, noting that the int64 feature columns were converted to float64.)
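
These DataConversionWarning messages only report that StandardScaler converts the integer serving counts to float64; the results are unaffected. If the noise is unwanted, one option (not part of the original notebook) is to cast the feature columns to float before splitting and fitting:

# Optional: cast features to float up front so StandardScaler has nothing to convert
x_data = x_data.astype(float)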

Question 9: Create and fit a Ridge regression object using the training data, set the regularization parameter to 0.1, and calculate the $R^{2}$ using the test data. Take a screenshot of your code and the $R^{2}$.

In [54]:
from sklearn.linear_model import Ridge
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.4, random_state=0)
RidgeModel = Ridge(alpha=0.1)
RidgeModel.fit(x_train, y_train)
yhat = RidgeModel.predict(x_test)
print("Test R^2:",RidgeModel.score(x_test, y_test))
print("Train R^2:",RidgeModel.score(x_train, y_train))
Test R^2: 0.7370737565866847
Train R^2: 0.9471204262013262

Question 10: Perform a 2nd order polynomial transform on both the training data and testing data. Create and fit a Ridge regression object using the training data, setting the regularization parameter to 0.1. Calculate the $R^{2}$ using the test data. Take a screenshot of your code and the $R^{2}$.

In [56]:
pr = PolynomialFeatures(degree=2)
x_train_pr = pr.fit_transform(x_train)  # fit the polynomial transform on the training data
x_test_pr = pr.transform(x_test)        # apply the already-fitted transform to the test data
RidgeModel_pr = Ridge(alpha=0.1)
RidgeModel_pr.fit(x_train_pr, y_train)
print("RidgeModel Test data R^2: ",RidgeModel_pr.score(x_test_pr, y_test))
print("RidgeModel Train data R^2: ",RidgeModel_pr.score(x_train_pr, y_train))
RidgeModel Test data R^2:  0.7594556764530759
RidgeModel Train data R^2:  0.9555197146226246
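
Because alpha = 0.1 is small relative to the scale of the degree-2 polynomial features, this ridge model is expected to land very close to the unregularized pipeline from Question 8. A quick check (reusing the pipe, RidgeModel_pr, x_test, and x_test_pr objects fitted above) is to print the two test scores side by side:

# Compare the ridge fit with the unregularized pipeline from Question 8
print("Pipeline (no regularization) test R^2:", pipe.score(x_test, y_test))
print("Ridge (alpha=0.1) test R^2:           ", RidgeModel_pr.score(x_test_pr, y_test))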


Sources¶

Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits? by Mona Chalabi. The drinks.csv dataset used in this notebook accompanies that article.