Introduction¶
In this notebook, I dive into three datasets about Chicago to uncover insights using SQL and Python. My goal is to understand the data, load it into a database, and answer interesting questions through my own analysis.
About the Datasets¶
For this project, I chose three datasets that capture different aspects of life in Chicago: socioeconomic indicators, public school performance, and crime data. These datasets are available from the City of Chicago's open data portal.
1. Socioeconomic Indicators¶
This dataset includes six key indicators related to public health and a hardship index for each Chicago community area (2008–2012). I use this to explore how different neighborhoods compare in terms of economic and social well-being.
2. Chicago Public Schools¶
This dataset provides school-level performance data for the 2011-2012 school year. It helps me analyze educational outcomes across the city.
3. Chicago Crime Data¶
This dataset covers reported crimes in Chicago from 2001 onward (excluding the most recent week). It allows me to investigate crime patterns and their relationship to other factors.
Getting the Data¶
I downloaded the datasets as CSV files from the City of Chicago's open data portal. For consistency, I used the versions provided with this project, which are cleaned and formatted for easier analysis.
Loading Data into the Database¶
To analyze the data with SQL, I first loaded each CSV file into a database. I created three tables:
CENSUS_DATACHICAGO_PUBLIC_SCHOOLSCHICAGO_CRIME_DATA
I used the database's import tool to ensure the data types matched my analysis needs.
Connecting to the Database¶
Before running any queries, I set up the SQL extension and connected to my database. If you're running this notebook elsewhere, make sure to install the required libraries.
# If you're running this outside my environment, you may need to install these libraries:
# !pip install ibm_db ibm_db_sa ipython-sql
%load_ext sql
In the next cell, enter your database connection string. Make sure to use your own credentials for a secure connection.
# Format: %sql ibm_db_sa://username:password@hostname:port/dbname?security=SSL
# Enter your connection string below
%sql ibm_db_sa://
2. Which community areas have a per capita income below $11,000?¶
3. What are the case numbers for crimes involving minors? (Note: In this context, children are not considered minors for crime analysis.)¶
4. Which kidnapping crimes involved a child?¶
5. What types of crimes have been reported at schools?¶
6. What is the average safety score for each type of school?¶
7. Which five community areas have the highest percentage of households below the poverty line?¶
8. Which community area has the highest crime rate?¶
9. Which community area has the highest hardship index? (Using a sub-query)¶
10. Which community area has the most crimes? (Using a sub-query)¶
Notebook and analysis by Mohammad Sayem Chowdhury.
All code and explanations are my own. This notebook is shared under the MIT License.