Exploring the Survey Dataset

In this notebook, I explore and analyze the Stack Overflow survey dataset as part of my data analysis project.

Survey Dataset Exploration

Estimated time needed: 30 minutes

Objectives

After completing this lab, you will be able to:

  • Load the dataset that will be used throughout the capstone project.
  • Explore the dataset.
  • Get familiar with the data types.

Load the dataset

Import the required libraries.

In [41]:
import pandas as pd

The dataset is available on IBM Cloud at the URL below.

In [42]:
dataset_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/LargeData/m1_survey_data.csv"

Load the data available at dataset_url into a dataframe.

In [43]:
# Read the CSV straight from the URL into a DataFrame, then display it
df = pd.read_csv(dataset_url)
df
Out[43]:
Respondent MainBranch Hobbyist OpenSourcer OpenSource Employment Country Student EdLevel UndergradMajor ... WelcomeChange SONewContent Age Gender Trans Sexuality Ethnicity Dependents SurveyLength SurveyEase
0 4 I am a developer by profession No Never The quality of OSS and closed source software ... Employed full-time United States No Bachelor’s degree (BA, BS, B.Eng., etc.) Computer science, computer engineering, or sof... ... Just as welcome now as I felt last year Tech articles written by other developers;Indu... 22.0 Man No Straight / Heterosexual White or of European descent No Appropriate in length Easy
1 9 I am a developer by profession Yes Once a month or more often The quality of OSS and closed source software ... Employed full-time New Zealand No Some college/university study without earning ... Computer science, computer engineering, or sof... ... Just as welcome now as I felt last year NaN 23.0 Man No Bisexual White or of European descent No Appropriate in length Neither easy nor difficult
2 13 I am a developer by profession Yes Less than once a month but more than once per ... OSS is, on average, of HIGHER quality than pro... Employed full-time United States No Master’s degree (MA, MS, M.Eng., MBA, etc.) Computer science, computer engineering, or sof... ... Somewhat more welcome now than last year Tech articles written by other developers;Cour... 28.0 Man No Straight / Heterosexual White or of European descent Yes Appropriate in length Easy
3 16 I am a developer by profession Yes Never The quality of OSS and closed source software ... Employed full-time United Kingdom No Master’s degree (MA, MS, M.Eng., MBA, etc.) NaN ... Just as welcome now as I felt last year Tech articles written by other developers;Indu... 26.0 Man No Straight / Heterosexual White or of European descent No Appropriate in length Neither easy nor difficult
4 17 I am a developer by profession Yes Less than once a month but more than once per ... The quality of OSS and closed source software ... Employed full-time Australia No Bachelor’s degree (BA, BS, B.Eng., etc.) Computer science, computer engineering, or sof... ... Just as welcome now as I felt last year Tech articles written by other developers;Indu... 29.0 Man No Straight / Heterosexual Hispanic or Latino/Latina;Multiracial No Appropriate in length Easy
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
11547 25136 I am a developer by profession Yes Never OSS is, on average, of HIGHER quality than pro... Employed full-time United States No Master’s degree (MA, MS, M.Eng., MBA, etc.) Computer science, computer engineering, or sof... ... Just as welcome now as I felt last year Tech articles written by other developers;Cour... 36.0 Man No Straight / Heterosexual White or of European descent No Appropriate in length Difficult
11548 25137 I am a developer by profession Yes Never The quality of OSS and closed source software ... Employed full-time Poland No Master’s degree (MA, MS, M.Eng., MBA, etc.) Computer science, computer engineering, or sof... ... A lot more welcome now than last year Tech articles written by other developers;Tech... 25.0 Man No Straight / Heterosexual White or of European descent No Appropriate in length Neither easy nor difficult
11549 25138 I am a developer by profession Yes Less than once per year The quality of OSS and closed source software ... Employed full-time United States No Master’s degree (MA, MS, M.Eng., MBA, etc.) Computer science, computer engineering, or sof... ... A lot more welcome now than last year Tech articles written by other developers;Indu... 34.0 Man No Straight / Heterosexual White or of European descent Yes Too long Easy
11550 25141 I am a developer by profession Yes Less than once a month but more than once per ... OSS is, on average, of LOWER quality than prop... Employed full-time Switzerland No Secondary school (e.g. American high school, G... NaN ... Somewhat less welcome now than last year NaN 25.0 Man No Straight / Heterosexual White or of European descent No Appropriate in length Easy
11551 25142 I am a developer by profession Yes Less than once a month but more than once per ... OSS is, on average, of HIGHER quality than pro... Employed full-time United Kingdom No Other doctoral degree (Ph.D, Ed.D., etc.) A natural science (ex. biology, chemistry, phy... ... Just as welcome now as I felt last year Tech articles written by other developers;Tech... 30.0 Man No Bisexual White or of European descent No Appropriate in length Easy

11552 rows × 85 columns
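Because pd.read_csv accepts any file-like object as well as a URL, you can try it offline on a small sample. The sketch below uses a made-up, three-row CSV held in memory; only the column names (Respondent, Country, Age) come from the survey, and the variable names are illustrative.

```python
import io

import pandas as pd

# Hypothetical sample: made-up rows that reuse three of the survey's column names.
csv_text = """Respondent,Country,Age
4,United States,22.0
9,New Zealand,23.0
13,United States,
"""

# read_csv accepts a path, a URL, or any file-like object such as io.StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

# The empty Age field in the last row is parsed as NaN.
print(sample.shape)
print(sample["Age"].isna().sum())
```

The same pattern is handy for quick tests that should not depend on downloading the full survey file.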

Explore the dataset

It is a good idea to print the first 5 rows of the dataset to get a feel for how the data looks.

Display the first 5 rows (with all columns) of your dataset.

In [44]:
# head() returns the first 5 rows by default
df.head()
Out[44]:
Respondent MainBranch Hobbyist OpenSourcer OpenSource Employment Country Student EdLevel UndergradMajor ... WelcomeChange SONewContent Age Gender Trans Sexuality Ethnicity Dependents SurveyLength SurveyEase
0 4 I am a developer by profession No Never The quality of OSS and closed source software ... Employed full-time United States No Bachelor’s degree (BA, BS, B.Eng., etc.) Computer science, computer engineering, or sof... ... Just as welcome now as I felt last year Tech articles written by other developers;Indu... 22.0 Man No Straight / Heterosexual White or of European descent No Appropriate in length Easy
1 9 I am a developer by profession Yes Once a month or more often The quality of OSS and closed source software ... Employed full-time New Zealand No Some college/university study without earning ... Computer science, computer engineering, or sof... ... Just as welcome now as I felt last year NaN 23.0 Man No Bisexual White or of European descent No Appropriate in length Neither easy nor difficult
2 13 I am a developer by profession Yes Less than once a month but more than once per ... OSS is, on average, of HIGHER quality than pro... Employed full-time United States No Master’s degree (MA, MS, M.Eng., MBA, etc.) Computer science, computer engineering, or sof... ... Somewhat more welcome now than last year Tech articles written by other developers;Cour... 28.0 Man No Straight / Heterosexual White or of European descent Yes Appropriate in length Easy
3 16 I am a developer by profession Yes Never The quality of OSS and closed source software ... Employed full-time United Kingdom No Master’s degree (MA, MS, M.Eng., MBA, etc.) NaN ... Just as welcome now as I felt last year Tech articles written by other developers;Indu... 26.0 Man No Straight / Heterosexual White or of European descent No Appropriate in length Neither easy nor difficult
4 17 I am a developer by profession Yes Less than once a month but more than once per ... The quality of OSS and closed source software ... Employed full-time Australia No Bachelor’s degree (BA, BS, B.Eng., etc.) Computer science, computer engineering, or sof... ... Just as welcome now as I felt last year Tech articles written by other developers;Indu... 29.0 Man No Straight / Heterosexual Hispanic or Latino/Latina;Multiracial No Appropriate in length Easy

5 rows × 85 columns
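df.head() returns the first 5 rows with every column, and pandas abbreviates the wide display with "...". If you literally want the top-left corner (the first 5 rows and the first 5 columns), positional slicing with iloc does that. The toy frame below is hypothetical: it borrows six of the survey's column names but the values are made up.

```python
import pandas as pd

# Hypothetical toy frame: six of the survey's column names, made-up values.
toy = pd.DataFrame({
    "Respondent": [4, 9, 13, 16, 17, 25],
    "Hobbyist": ["No", "Yes", "Yes", "Yes", "Yes", "No"],
    "Country": ["United States", "New Zealand", "United States",
                "United Kingdom", "Australia", "Poland"],
    "Age": [22.0, 23.0, 28.0, 26.0, 29.0, 25.0],
    "Gender": ["Man"] * 6,
    "SurveyEase": ["Easy"] * 6,
})

first_rows = toy.head()      # first 5 rows, all 6 columns
corner = toy.iloc[:5, :5]    # first 5 rows and first 5 columns

print(first_rows.shape)
print(corner.shape)

# Optional: show every column instead of the "..." placeholder in wide output.
pd.set_option("display.max_columns", None)
```

The display.max_columns option only affects how frames are printed; it does not change the data.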

Find out the number of rows and columns

Start by finding the number of rows and columns in the dataset.

Print the number of rows in the dataset.

In [45]:
# len() of a DataFrame is its number of rows
print("Number of rows:", len(df))
Number of rows: 11552

Print the number of columns in the dataset.

In [46]:
# df.columns is an Index of column labels; its length is the column count
print("Number of columns:", len(df.columns))
Number of columns: 85
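As an alternative to calling len twice, the shape attribute returns both counts at once as a (rows, columns) tuple; on the full survey, df.shape gives (11552, 85), matching the two counts printed above. The sketch below uses a hypothetical three-row stand-in for df.

```python
import pandas as pd

# Hypothetical toy frame standing in for df.
toy = pd.DataFrame({
    "Respondent": [4, 9, 13],
    "Country": ["United States", "New Zealand", "United States"],
    "Age": [22.0, 23.0, 28.0],
})

# shape is a (rows, columns) tuple, so one attribute answers both questions.
n_rows, n_cols = toy.shape
print("Number of rows:", n_rows)
print("Number of columns:", n_cols)
```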

Identify the data type of each column

Explore the dataset and identify the data types of each column.

Print the data type of every column.

In [47]:
# df.dtypes is a Series mapping each column name to its dtype
# (wrap it in dict(...) if you need a plain dictionary)
df.dtypes
Out[47]:
Respondent       int64
MainBranch      object
Hobbyist        object
OpenSourcer     object
OpenSource      object
                 ...  
Sexuality       object
Ethnicity       object
Dependents      object
SurveyLength    object
SurveyEase      object
Length: 85, dtype: object
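With 85 columns, the truncated listing above hides most of the dtypes. A compact alternative, sketched here on a hypothetical three-column frame, is to count how many columns share each dtype and to split numeric from non-numeric columns with select_dtypes.

```python
import pandas as pd

# Hypothetical toy frame: an integer ID, a float column with a gap, and a text column.
toy = pd.DataFrame({
    "Respondent": [4, 9, 13],
    "Age": [22.0, 23.0, None],
    "Country": ["United States", "New Zealand", None],
})

# How many columns share each dtype.
print(toy.dtypes.value_counts())

# Column names grouped by whether pandas treats them as numeric.
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
text_cols = toy.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols)
print(text_cols)
```

On the survey data this makes it easy to see that most columns are text (object) and only a few, such as Respondent and Age, are numeric.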

Print the mean age of the survey participants.

In [48]:
# mean() skips missing (NaN) ages by default
mean_age = df["Age"].mean()
print("Mean age of the survey participants:", mean_age)
Mean age of the survey participants: 30.77239449133718
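Series.mean() skips missing values by default (skipna=True), so any respondents who left Age blank are excluded rather than counted as zero. The hypothetical four-value Series below makes that explicit.

```python
import numpy as np
import pandas as pd

# Hypothetical ages; one respondent left the field blank (NaN).
ages = pd.Series([22.0, 23.0, np.nan, 29.0], name="Age")

print(ages.mean())              # NaN ignored: (22 + 23 + 29) / 3
print(ages.mean(skipna=False))  # nan, because the missing value propagates
print(ages.count(), len(ages))  # non-null count vs total length
```

On the real data, comparing df["Age"].count() with len(df) shows how many ages were missing, and therefore how many respondents the mean is actually based on.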

The dataset is the result of a worldwide survey. Print how many unique countries there are in the Country column.

In [49]:
# nunique() counts distinct non-null values, same as value_counts().count()
print("Number of unique countries in the Country column:", df["Country"].nunique())
Number of unique countries in the Country column: 135
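Series.nunique() is the most direct way to count distinct values; like value_counts(), it ignores missing entries unless dropna=False is passed. The sketch below uses a hypothetical five-value Country Series. value_counts() also ranks countries by frequency, which is a natural next step after counting them.

```python
import pandas as pd

# Hypothetical Country values, including one missing entry.
countries = pd.Series(
    ["United States", "New Zealand", "United States", None, "Poland"],
    name="Country",
)

print(countries.nunique())               # distinct non-null values
print(countries.value_counts().count())  # same count via value_counts()
print(countries.nunique(dropna=False))   # missing value counted as its own value
print(countries.value_counts().head(3))  # most frequent countries first
```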