My Chicago Data Analysis Journey¶
By Mohammad Sayem Chowdhury
In this notebook, I explore three fascinating datasets from Chicago. My goal is to load them into a database, run SQL queries, and share my personal insights along the way. Here’s what I’ll be doing:
- Understanding the structure and story behind each dataset
- Loading the data into database tables for flexible analysis
- Using SQL to answer questions and uncover patterns
Let’s get started!
Getting to Know the Datasets¶
For this project, I’m working with three datasets that each tell a different part of Chicago’s story:
- Socioeconomic Indicators: This dataset gives a snapshot of public health and hardship across Chicago’s community areas. I’m especially interested in the hardship index and how it relates to other factors.
- Chicago Public Schools: Here, I can see school-level performance data. I’m curious to explore how different schools compare and what factors might influence their results.
- Chicago Crime Data: This one is huge! It covers reported crimes in Chicago. For practical reasons, I’m using a smaller sample, but it’s still rich with information.
Before diving in, I reviewed the columns and thought about the kinds of questions I want to answer. This helps me plan my analysis and focus on what matters most to me.
Downloading the Data¶
I always make sure to use clean, well-structured CSV files for my analysis. For this project, I downloaded prepared versions of each dataset to avoid any formatting headaches. This way, I can focus on the analysis instead of data wrangling.
In many cases the dataset to be analyzed is available as a .CSV (comma separated values) file, perhaps on the internet. Click on the links below to download and save the datasets (.CSV files):
CENSUS_DATA: Census Dataset
CHICAGO_PUBLIC_SCHOOLS Chicago Public School
CHICAGO_CRIME_DATA: Chicago Crime Data
Loading Data into the Database¶
I prefer to load each CSV into its own table for easy querying. For this project, I created three tables:
CENSUS_DATACHICAGO_PUBLIC_SCHOOLSCHICAGO_CRIME_DATA
Using the database’s LOAD tool, I imported each dataset. This step is crucial for running efficient SQL queries later on.
Connecting to My Database¶
With the data loaded, I connect to the database using the ipython-sql extension. This lets me run SQL queries directly from my notebook, which I find super convenient.
# Loading the SQL extension so I can run SQL queries in this notebook
%load_ext sql
In the next cell, I’ll enter my database connection string. This connects my notebook to the database where all my tables live. I always double-check my credentials to avoid connection issues!
# Remember the connection string is of the format:
# %sql ibm_db_sa://my-username:my-password@my-hostname:my-port/my-db-name
# Enter the connection string for your Db2 on Cloud database instance below
# Replace with your actual connection string for the database
%sql ibm_db_sa://
'Connected: bsx51096@BLUDB'
# Rows in Crime table
%sql SELECT count(*) from CHICAGO_CRIME_DATA
* ibm_db_sa://bsx51096:***@dashdb-txn-sbox-yp-dal09-14.services.dal.bluemix.net:50000/BLUDB Done.
| 1 |
|---|
| 533 |
%sql SELECT * from CHICAGO_CRIME_DATA \
fetch first 10 rows only
* ibm_db_sa://bsx51096:***@dashdb-txn-sbox-yp-dal09-14.services.dal.bluemix.net:50000/BLUDB Done.
| id | case_number | DATE | block | iucr | primary_type | description | location_description | arrest | domestic | beat | district | ward | community_area_number | fbicode | x_coordinate | y_coordinate | YEAR | updatedon | latitude | longitude | location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3512276 | HK587712 | 08/28/2004 05:50:56 PM | 047XX S KEDZIE AVE | 890 | THEFT | FROM BUILDING | SMALL RETAIL STORE | FALSE | FALSE | 911 | 9 | 14 | 58 | 6 | 1155838 | 1873050 | 2004 | 02/10/2018 03:50:01 PM | 41.80744050 | -87.70395585 | (41.8074405, -87.703955849) |
| 3406613 | HK456306 | 06/26/2004 12:40:00 PM | 009XX N CENTRAL PARK AVE | 820 | THEFT | $500 AND UNDER | OTHER | FALSE | FALSE | 1112 | 11 | 27 | 23 | 6 | 1152206 | 1906127 | 2004 | 02/28/2018 03:56:25 PM | 41.89827996 | -87.71640551 | (41.898279962, -87.716405505) |
| 8002131 | HT233595 | 04/04/2011 05:45:00 AM | 043XX S WABASH AVE | 820 | THEFT | $500 AND UNDER | NURSING HOME/RETIREMENT HOME | FALSE | FALSE | 221 | 2 | 3 | 38 | 6 | 1177436 | 1876313 | 2011 | 02/10/2018 03:50:01 PM | 41.81593313 | -87.62464213 | (41.815933131, -87.624642127) |
| 7903289 | HT133522 | 12/30/2010 04:30:00 PM | 083XX S KINGSTON AVE | 840 | THEFT | FINANCIAL ID THEFT: OVER $300 | RESIDENCE | FALSE | FALSE | 423 | 4 | 7 | 46 | 6 | 1194622 | 1850125 | 2010 | 02/10/2018 03:50:01 PM | 41.74366532 | -87.56246276 | (41.743665322, -87.562462756) |
| 10402076 | HZ138551 | 02/02/2016 07:30:00 PM | 033XX W 66TH ST | 820 | THEFT | $500 AND UNDER | ALLEY | FALSE | FALSE | 831 | 8 | 15 | 66 | 6 | 1155240 | 1860661 | 2016 | 02/10/2018 03:50:01 PM | 41.77345530 | -87.70648047 | (41.773455295, -87.706480471) |
| 7732712 | HS540106 | 09/29/2010 07:59:00 AM | 006XX W CHICAGO AVE | 810 | THEFT | OVER $500 | PARKING LOT/GARAGE(NON.RESID.) | FALSE | FALSE | 1323 | 12 | 27 | 24 | 6 | 1171668 | 1905607 | 2010 | 02/10/2018 03:50:01 PM | 41.89644677 | -87.64493868 | (41.896446772, -87.644938678) |
| 10769475 | HZ534771 | 11/30/2016 01:15:00 AM | 050XX N KEDZIE AVE | 810 | THEFT | OVER $500 | STREET | FALSE | FALSE | 1713 | 17 | 33 | 14 | 6 | 1154133 | 1933314 | 2016 | 02/10/2018 03:50:01 PM | 41.97284491 | -87.70860008 | (41.972844913, -87.708600079) |
| 4494340 | HL793243 | 12/16/2005 04:45:00 PM | 005XX E PERSHING RD | 860 | THEFT | RETAIL THEFT | GROCERY FOOD STORE | TRUE | FALSE | 213 | 2 | 3 | 38 | 6 | 1180448 | 1879234 | 2005 | 02/28/2018 03:56:25 PM | 41.82387989 | -87.61350386 | (41.823879885, -87.613503857) |
| 3778925 | HL149610 | 01/28/2005 05:00:00 PM | 100XX S WASHTENAW AVE | 810 | THEFT | OVER $500 | STREET | FALSE | FALSE | 2211 | 22 | 19 | 72 | 6 | 1160129 | 1838040 | 2005 | 02/28/2018 03:56:25 PM | 41.71128051 | -87.68917910 | (41.711280513, -87.689179097) |
| 3324217 | HK361551 | 05/13/2004 02:15:00 PM | 033XX W BELMONT AVE | 820 | THEFT | $500 AND UNDER | SMALL RETAIL STORE | FALSE | FALSE | 1733 | 17 | 35 | 21 | 6 | 1153590 | 1921084 | 2004 | 02/28/2018 03:56:25 PM | 41.93929582 | -87.71092344 | (41.939295821, -87.710923442) |
%sql SELECT count(ARREST) from CHICAGO_CRIME_DATA where arrest = 'TRUE'
* ibm_db_sa://bsx51096:***@dashdb-txn-sbox-yp-dal09-14.services.dal.bluemix.net:50000/BLUDB Done.
| 1 |
|---|
| 163 |
%sql SELECT DISTINCT(PRIMARY_TYPE) from CHICAGO_CRIME_DATA WHERE location_description = 'GAS STATION';
* ibm_db_sa://bsx51096:***@dashdb-txn-sbox-yp-dal09-14.services.dal.bluemix.net:50000/BLUDB Done.
| primary_type |
|---|
| CRIMINAL TRESPA |
| NARCOTICS |
| ROBBERY |
| THEFT |
%sql SELECT community_area_name from CENSUS_DATA WHERE community_area_name LIKE 'B%'
* ibm_db_sa://bsx51096:***@dashdb-txn-sbox-yp-dal09-14.services.dal.bluemix.net:50000/BLUDB Done.
| community_area_name |
|---|
| Belmont Cragin |
| Burnside |
| Brighton Park |
| Bridgeport |
| Beverly |
%sql SELECT S.NAME_OF_SCHOOL,S.healthy_school_certified,C.COMMUNITY_AREA_NUMBER,C.COMMUNITY_AREA_NAME from CENSUS_DATA as C JOIN CHICAGO_PUBLIC_SCHOOLS as S \
on UPPER(C.COMMUNITY_AREA_NAME) = UPPER(S.community_area_name) \
where C.COMMUNITY_AREA_NUMBER between 10 and 15 AND \
S.healthy_school_certified = 'Yes';
* ibm_db_sa://bsx51096:***@dashdb-txn-sbox-yp-dal09-14.services.dal.bluemix.net:50000/BLUDB Done.
| name_of_school | healthy_school_certified | community_area_number | community_area_name |
|---|---|---|---|
| Rufus M Hitch Elementary School | Yes | 10 | Norwood Park |
%sql select avg(safety_score) as AVERAGE_SAFETY_SCORE from CHICAGO_PUBLIC_SCHOOLS
* ibm_db_sa://bsx51096:***@dashdb-txn-sbox-yp-dal09-14.services.dal.bluemix.net:50000/BLUDB Done.
| average_safety_score |
|---|
| 49.504873 |
%sql select Community_Area_Name, avg(College_Enrollment) AS AVG_ENROLLMENT \
from CHICAGO_PUBLIC_SCHOOLS \
group by Community_Area_Name \
order by AVG_ENROLLMENT DESC\
fetch first 5 rows only
* ibm_db_sa://bsx51096:***@dashdb-txn-sbox-yp-dal09-14.services.dal.bluemix.net:50000/BLUDB Done.
| community_area_name | avg_enrollment |
|---|---|
| ARCHER HEIGHTS | 2411.500000 |
| MONTCLARE | 1317.000000 |
| WEST ELSDON | 1233.333333 |
| BRIGHTON PARK | 1205.875000 |
| BELMONT CRAGIN | 1198.833333 |
%sql select Community_Area_Name,safety_score from CHICAGO_PUBLIC_SCHOOLS where SAFETY_SCORE =(select min(safety_score) from CHICAGO_PUBLIC_SCHOOLS)
* ibm_db_sa://bsx51096:***@dashdb-txn-sbox-yp-dal09-14.services.dal.bluemix.net:50000/BLUDB Done.
| community_area_name | safety_score |
|---|---|
| WASHINGTON PARK | 1 |
%sql select community_area_name,per_capita_income from CENSUS_DATA \
where community_area_number =(select community_area_number from CHICAGO_PUBLIC_SCHOOLS where safety_score = 1)
* ibm_db_sa://bsx51096:***@dashdb-txn-sbox-yp-dal09-14.services.dal.bluemix.net:50000/BLUDB Done.
| community_area_name | per_capita_income |
|---|---|
| Washington Park | 13785 |
My Reflections¶
Working with these three datasets has given me a deeper appreciation for the complexity of urban life in Chicago. From socioeconomic factors to school performance and crime data, each table tells a unique story. I enjoyed using SQL to connect the dots and uncover patterns that might otherwise go unnoticed.
Author¶
Notebook and analysis by Mohammad Sayem Chowdhury
This notebook is a personal project. All content, analysis, and commentary are original and reflect my own perspective.