Exploring Chicago Public Schools Data: My Analysis¶

Author: Mohammad Sayem Chowdhury


About the Dataset¶

This project uses data from Chicago Public Schools for the 2011-2012 school year. The dataset includes a wide range of metrics about school performance, safety, and student demographics. I downloaded a cleaned version of the data for easier analysis.

Loading the Data¶

I loaded the dataset into a table called SCHOOLS for my analysis. I used the database's import tool to ensure the data types matched my needs.

Connecting to the Database¶

I used the ipython-sql extension to connect to my database. If you're running this notebook elsewhere, make sure to install the required libraries.

In [ ]:
%load_ext sql
In [ ]:
# Format: %sql ibm_db_sa://username:password@hostname:port/dbname?security=SSL
# Enter your connection string below
%sql ibm_db_sa://

Verifying the Table¶

To confirm the table was created, I queried the database system catalog for the list of tables in my schema.

In [ ]:
%sql select TABSCHEMA,TABNAME,CREATE_TIME from SYSCAT.TABLES WHERE TABSCHEMA = 'YOUR_SCHEMA'

Exploring Table Columns¶

I checked how many columns are in the SCHOOLS table and explored their names, types, and lengths.

In [ ]:
%sql select count(*) from SYSCAT.COLUMNS where TABNAME = 'SCHOOLS'
 * ibm_db_sa://pgs19748:***@6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud:30376/BLUDB
Done.
Out[ ]:
1
78
In [ ]:
%sql select COLNAME, TYPENAME, LENGTH from SYSCAT.COLUMNS where TABNAME = 'SCHOOLS'
 * ibm_db_sa://pgs19748:***@6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud:30376/BLUDB
Done.
Out[ ]:
colname typename length
ENVIRONMENT_SCORE SMALLINT 2
INSTRUCTION_ICON VARCHAR 11
INSTRUCTION_SCORE SMALLINT 2
LEADERS_ICON VARCHAR 11
LEADERS_SCORE VARCHAR 3
TEACHERS_ICON VARCHAR 11
TEACHERS_SCORE VARCHAR 3
PARENT_ENGAGEMENT_ICON VARCHAR 7
PARENT_ENGAGEMENT_SCORE VARCHAR 3
PARENT_ENVIRONMENT_ICON VARCHAR 7
PARENT_ENVIRONMENT_SCORE VARCHAR 3
AVERAGE_STUDENT_ATTENDANCE VARCHAR 6
RATE_OF_MISCONDUCTS__PER_100_STUDENTS_ DECIMAL 5
AVERAGE_TEACHER_ATTENDANCE VARCHAR 6
INDIVIDUALIZED_EDUCATION_PROGRAM_COMPLIANCE_RATE VARCHAR 7
PK_2_LITERACY__ VARCHAR 4
PK_2_MATH__ VARCHAR 4
GR3_5_GRADE_LEVEL_MATH__ VARCHAR 4
GR3_5_GRADE_LEVEL_READ__ VARCHAR 4
SCHOOL_ID INTEGER 4
NAME_OF_SCHOOL VARCHAR 64
Elementary, Middle, or High School VARCHAR 2
STREET_ADDRESS VARCHAR 30
CITY VARCHAR 7
STATE VARCHAR 2
ZIP_CODE INTEGER 4
PHONE_NUMBER VARCHAR 14
LINK VARCHAR 78
NETWORK_MANAGER VARCHAR 40
COLLABORATIVE_NAME VARCHAR 34
ADEQUATE_YEARLY_PROGRESS_MADE_ VARCHAR 3
TRACK_SCHEDULE VARCHAR 12
CPS_PERFORMANCE_POLICY_STATUS VARCHAR 16
CPS_PERFORMANCE_POLICY_LEVEL VARCHAR 15
HEALTHY_SCHOOL_CERTIFIED VARCHAR 3
SAFETY_ICON VARCHAR 11
SAFETY_SCORE SMALLINT 2
FAMILY_INVOLVEMENT_ICON VARCHAR 11
FAMILY_INVOLVEMENT_SCORE VARCHAR 3
ENVIRONMENT_ICON VARCHAR 11
GR3_5_KEEP_PACE_READ__ VARCHAR 4
GR3_5_KEEP_PACE_MATH__ VARCHAR 4
GR6_8_GRADE_LEVEL_MATH__ VARCHAR 4
GR6_8_GRADE_LEVEL_READ__ VARCHAR 4
GR6_8_KEEP_PACE_MATH_ VARCHAR 4
GR6_8_KEEP_PACE_READ__ VARCHAR 4
GR_8_EXPLORE_MATH__ VARCHAR 4
GR_8_EXPLORE_READ__ VARCHAR 4
ISAT_EXCEEDING_MATH__ DECIMAL 4
ISAT_EXCEEDING_READING__ DECIMAL 4
ISAT_VALUE_ADD_MATH DECIMAL 3
ISAT_VALUE_ADD_READ DECIMAL 3
ISAT_VALUE_ADD_COLOR_MATH VARCHAR 6
ISAT_VALUE_ADD_COLOR_READ VARCHAR 6
STUDENTS_TAKING__ALGEBRA__ VARCHAR 4
STUDENTS_PASSING__ALGEBRA__ VARCHAR 4
9th Grade EXPLORE (2009) VARCHAR 4
9th Grade EXPLORE (2010) VARCHAR 4
10th Grade PLAN (2009) VARCHAR 4
10th Grade PLAN (2010) VARCHAR 4
NET_CHANGE_EXPLORE_AND_PLAN VARCHAR 3
11th Grade Average ACT (2011) VARCHAR 4
NET_CHANGE_PLAN_AND_ACT VARCHAR 3
COLLEGE_ELIGIBILITY__ VARCHAR 4
GRADUATION_RATE__ VARCHAR 4
COLLEGE_ENROLLMENT_RATE__ VARCHAR 4
COLLEGE_ENROLLMENT SMALLINT 2
GENERAL_SERVICES_ROUTE SMALLINT 2
FRESHMAN_ON_TRACK_RATE__ VARCHAR 4
X_COORDINATE DECIMAL 13
Y_COORDINATE DECIMAL 13
LATITUDE DECIMAL 18
LONGITUDE DECIMAL 18
COMMUNITY_AREA_NUMBER SMALLINT 2
COMMUNITY_AREA_NAME VARCHAR 22
WARD SMALLINT 2
POLICE_DISTRICT SMALLINT 2
LOCATION VARCHAR 27

Data Exploration Questions¶

  1. Is the column name for the "SCHOOL ID" attribute in upper or mixed case?
  2. What is the name of the "Community Area Name" column in your table? Does it have spaces?
  3. Are there any columns whose names have spaces or parentheses replaced by underscores?

My Analysis Challenges¶

Here are the questions I set out to answer using SQL and Python:

In [7]:
%sql SELECT COUNT(*) FROM SCHOOLS WHERE "Elementary, Middle, or High School" = 'ES'
 * ibm_db_sa://pgs19748:***@6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud:30376/BLUDB
Done.
Out[7]:
1
462

What is the highest Safety Score?¶

In [8]:
%sql SELECT MAX(safety_score) FROM SCHOOLS;
 * ibm_db_sa://pgs19748:***@6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud:30376/BLUDB
Done.
Out[8]:
1
99

Which schools have the highest Safety Score?¶

In [9]:
%sql SELECT NAME_OF_SCHOOL FROM SCHOOLS WHERE safety_score=99;
 * ibm_db_sa://pgs19748:***@6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud:30376/BLUDB
Done.
Out[9]:
name_of_school
Abraham Lincoln Elementary School
Alexander Graham Bell Elementary School
Annie Keller Elementary Gifted Magnet School
Augustus H Burley Elementary School
Edgar Allan Poe Elementary Classical School
Edgebrook Elementary School
Ellen Mitchell Elementary School
James E McDade Elementary Classical School
James G Blaine Elementary School
LaSalle Elementary Language Academy
Mary E Courtenay Elementary Language Arts Center
Northside College Preparatory High School
Northside Learning Center High School
Norwood Park Elementary School
Oriole Park Elementary School
Sauganash Elementary School
Stephen Decatur Classical Elementary School
Talman Elementary School
Wildwood Elementary School

What are the top 10 schools with the highest Average Student Attendance?¶

In [10]:
%sql select Name_of_School, Average_Student_Attendance from SCHOOLS \
    order by Average_Student_Attendance desc nulls last limit 10 
 * ibm_db_sa://pgs19748:***@6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud:30376/BLUDB
Done.
Out[10]:
name_of_school average_student_attendance
John Charles Haines Elementary School 98.40%
James Ward Elementary School 97.80%
Edgar Allan Poe Elementary Classical School 97.60%
Orozco Fine Arts & Sciences Elementary School 97.60%
Rachel Carson Elementary School 97.60%
Annie Keller Elementary Gifted Magnet School 97.50%
Andrew Jackson Elementary Language Academy 97.40%
Lenart Elementary Regional Gifted Center 97.40%
Disney II Magnet School 97.30%
John H Vanderpoel Elementary Magnet School 97.20%

Retrieve the list of 5 Schools with the lowest Average Student Attendance sorted in ascending order¶

In [11]:
%sql SELECT Name_of_School, Average_Student_Attendance  \
     from SCHOOLS \
     order by Average_Student_Attendance \
     fetch first 5 rows only
 * ibm_db_sa://pgs19748:***@6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud:30376/BLUDB
Done.
Out[11]:
name_of_school average_student_attendance
Richard T Crane Technical Preparatory High School 57.90%
Barbara Vick Early Childhood & Family Center 60.90%
Dyett High School 62.50%
Wendell Phillips Academy High School 63.00%
Orr Academy High School 66.30%

Remove the '%' sign from the Average Student Attendance column¶

In [ ]:
%sql SELECT Name_of_School, REPLACE(Average_Student_Attendance,'%','') as Average_Student_Attendance\
     from SCHOOLS \
     order by Average_Student_Attendance \
     fetch first 5 rows only
 * ibm_db_sa://pgs19748:***@6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud:30376/BLUDB
Done.
Out[ ]:
name_of_school average_student_attendance
Richard T Crane Technical Preparatory High School 57.90
Barbara Vick Early Childhood & Family Center 60.90
Dyett High School 62.50
Wendell Phillips Academy High School 63.00
Orr Academy High School 66.30

Which Schools have Average Student Attendance lower than 70%?¶

In [13]:
%sql SELECT Name_of_School, Average_Student_Attendance  \
     from SCHOOLS \
     where DECIMAL ( REPLACE(Average_Student_Attendance, '%', '') ) < 70 \
     order by Average_Student_Attendance
 * ibm_db_sa://pgs19748:***@6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud:30376/BLUDB
Done.
Out[13]:
name_of_school average_student_attendance
Richard T Crane Technical Preparatory High School 57.90%
Barbara Vick Early Childhood & Family Center 60.90%
Dyett High School 62.50%
Wendell Phillips Academy High School 63.00%
Orr Academy High School 66.30%
Manley Career Academy High School 66.80%
Chicago Vocational Career Academy High School 68.80%
Roberto Clemente Community Academy High School 69.60%

Get the total College Enrollment for each Community Area¶

In [ ]:
%sql select Community_Area_Name, sum(College_Enrollment) AS TOTAL_ENROLLMENT \
   from SCHOOLS \
   group by Community_Area_Name 
 * ibm_db_sa://pgs19748:***@6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud:30376/BLUDB
Done.
Out[ ]:
community_area_name total_enrollment
ALBANY PARK 6864
ARCHER HEIGHTS 4823
ARMOUR SQUARE 1458
ASHBURN 6483
AUBURN GRESHAM 4175
AUSTIN 10933
AVALON PARK 1522
AVONDALE 3640
BELMONT CRAGIN 14386
BEVERLY 1636
BRIDGEPORT 3167
BRIGHTON PARK 9647
BURNSIDE 549
CALUMET HEIGHTS 1568
CHATHAM 5042
CHICAGO LAWN 7086
CLEARING 2085
DOUGLAS 4670
DUNNING 4568
EAST GARFIELD PARK 5337
EAST SIDE 5305
EDGEWATER 4600
EDISON PARK 910
ENGLEWOOD 6832
FOREST GLEN 1431
FULLER PARK 531
GAGE PARK 9915
GARFIELD RIDGE 4552
GRAND BOULEVARD 2809
GREATER GRAND CROSSING 4051
HEGEWISCH 963
HERMOSA 3975
HUMBOLDT PARK 8620
HYDE PARK 1930
IRVING PARK 7764
JEFFERSON PARK 1755
KENWOOD 4287
LAKE VIEW 7055
LINCOLN PARK 5615
LINCOLN SQUARE 4132
LOGAN SQUARE 7351
LOOP 871
LOWER WEST SIDE 7257
MCKINLEY PARK 1552
MONTCLARE 1317
MORGAN PARK 3271
MOUNT GREENWOOD 2091
NEAR NORTH SIDE 3362
NEAR SOUTH SIDE 1378
NEAR WEST SIDE 7975
NEW CITY 7922
NORTH CENTER 7541
NORTH LAWNDALE 5146
NORTH PARK 4210
NORWOOD PARK 6469
OAKLAND 140
OHARE 786
PORTAGE PARK 6954
PULLMAN 1620
RIVERDALE 1547
ROGERS PARK 4068
ROSELAND 7020
SOUTH CHICAGO 4043
SOUTH DEERING 1859
SOUTH LAWNDALE 14793
SOUTH SHORE 4543
UPTOWN 4388
WASHINGTON HEIGHTS 4006
WASHINGTON PARK 2648
WEST ELSDON 3700
WEST ENGLEWOOD 5946
WEST GARFIELD PARK 2622
WEST LAWN 4207
WEST PULLMAN 3240
WEST RIDGE 8197
WEST TOWN 9429
WOODLAWN 4206

Get the 5 Community Areas with the least total College Enrollment sorted in ascending order¶

In [15]:
%sql select Community_Area_Name, sum(College_Enrollment) AS TOTAL_ENROLLMENT \
   from SCHOOLS \
   group by Community_Area_Name \
   order by TOTAL_ENROLLMENT asc \
   fetch first 5 rows only
 * ibm_db_sa://pgs19748:***@6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud:30376/BLUDB
Done.
Out[15]:
community_area_name total_enrollment
OAKLAND 140
FULLER PARK 531
BURNSIDE 549
OHARE 786
LOOP 871

List 5 schools with lowest safety score.¶

In [16]:
%sql select Name_of_School,safety_score\
   from SCHOOLS \
order by safety_score asc\
fetch first 5 rows only 
 * ibm_db_sa://pgs19748:***@6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud:30376/BLUDB
Done.
Out[16]:
name_of_school safety_score
Edmond Burke Elementary School 1
Luke O'Toole Elementary School 5
George W Tilton Elementary School 6
Foster Park Elementary School 11
Emil G Hirsch Metropolitan High School 13

Get the hardship index for the community area which has College Enrollment of 4368¶

In [ ]:
%%sql 
select hardship_index 
   from chicago_socioeconomic_data CD, schools CPS 
   where CD.ca = CPS.community_area_number 
      and college_enrollment = 4368
 * ibm_db_sa://pgs19748:***@6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud:30376/BLUDB
Done.
Out[ ]:
hardship_index
6.0

Get the hardship index for the community area which has the school with the highest enrollment.¶

In [18]:
%sql select ca, community_area_name, hardship_index from chicago_socioeconomic_data \
   where ca in \
   ( select community_area_number from schools order by college_enrollment desc limit 1 )
 * ibm_db_sa://pgs19748:***@6667d8e9-9d4d-4ccb-ba32-21da3bb5aafc.c1ogj3sd0tgtu0lqde00.databases.appdomain.cloud:30376/BLUDB
Done.
Out[18]:
ca community_area_name hardship_index
5.0 North Center 6.0

Summary¶

In this project, I explored Chicago Public Schools data using SQL and Python. I loaded the data, ran queries to answer key questions, and practiced working with columns that have spaces or special characters. This approach helped me better understand the structure and challenges in Chicago's schools.


Notebook and analysis by Mohammad Sayem Chowdhury.

All code and explanations are my own. This notebook is shared under the MIT License.