Exploring the Survey Dataset¶
In this notebook, I explore and analyze the Stack Overflow survey dataset as part of my data analysis project.
Survey Dataset Exploration¶
Estimated time needed: 30 minutes
Objectives¶
After completing this lab you will be able to:
- Load the dataset that will be used throughout the capstone project.
- Explore the dataset.
- Get familiar with the data types.
Load the dataset¶
Import the required libraries.
import pandas as pd
The dataset is available on the IBM Cloud at the URL below.
dataset_url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/LargeData/m1_survey_data.csv"
Load the data available at dataset_url into a dataframe.
# your code goes here
df = pd.read_csv(dataset_url)
df
| | Respondent | MainBranch | Hobbyist | OpenSourcer | OpenSource | Employment | Country | Student | EdLevel | UndergradMajor | ... | WelcomeChange | SONewContent | Age | Gender | Trans | Sexuality | Ethnicity | Dependents | SurveyLength | SurveyEase |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | I am a developer by profession | No | Never | The quality of OSS and closed source software ... | Employed full-time | United States | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 22.0 | Man | No | Straight / Heterosexual | White or of European descent | No | Appropriate in length | Easy |
| 1 | 9 | I am a developer by profession | Yes | Once a month or more often | The quality of OSS and closed source software ... | Employed full-time | New Zealand | No | Some college/university study without earning ... | Computer science, computer engineering, or sof... | ... | Just as welcome now as I felt last year | NaN | 23.0 | Man | No | Bisexual | White or of European descent | No | Appropriate in length | Neither easy nor difficult |
| 2 | 13 | I am a developer by profession | Yes | Less than once a month but more than once per ... | OSS is, on average, of HIGHER quality than pro... | Employed full-time | United States | No | Master’s degree (MA, MS, M.Eng., MBA, etc.) | Computer science, computer engineering, or sof... | ... | Somewhat more welcome now than last year | Tech articles written by other developers;Cour... | 28.0 | Man | No | Straight / Heterosexual | White or of European descent | Yes | Appropriate in length | Easy |
| 3 | 16 | I am a developer by profession | Yes | Never | The quality of OSS and closed source software ... | Employed full-time | United Kingdom | No | Master’s degree (MA, MS, M.Eng., MBA, etc.) | NaN | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 26.0 | Man | No | Straight / Heterosexual | White or of European descent | No | Appropriate in length | Neither easy nor difficult |
| 4 | 17 | I am a developer by profession | Yes | Less than once a month but more than once per ... | The quality of OSS and closed source software ... | Employed full-time | Australia | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 29.0 | Man | No | Straight / Heterosexual | Hispanic or Latino/Latina;Multiracial | No | Appropriate in length | Easy |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 11547 | 25136 | I am a developer by profession | Yes | Never | OSS is, on average, of HIGHER quality than pro... | Employed full-time | United States | No | Master’s degree (MA, MS, M.Eng., MBA, etc.) | Computer science, computer engineering, or sof... | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Cour... | 36.0 | Man | No | Straight / Heterosexual | White or of European descent | No | Appropriate in length | Difficult |
| 11548 | 25137 | I am a developer by profession | Yes | Never | The quality of OSS and closed source software ... | Employed full-time | Poland | No | Master’s degree (MA, MS, M.Eng., MBA, etc.) | Computer science, computer engineering, or sof... | ... | A lot more welcome now than last year | Tech articles written by other developers;Tech... | 25.0 | Man | No | Straight / Heterosexual | White or of European descent | No | Appropriate in length | Neither easy nor difficult |
| 11549 | 25138 | I am a developer by profession | Yes | Less than once per year | The quality of OSS and closed source software ... | Employed full-time | United States | No | Master’s degree (MA, MS, M.Eng., MBA, etc.) | Computer science, computer engineering, or sof... | ... | A lot more welcome now than last year | Tech articles written by other developers;Indu... | 34.0 | Man | No | Straight / Heterosexual | White or of European descent | Yes | Too long | Easy |
| 11550 | 25141 | I am a developer by profession | Yes | Less than once a month but more than once per ... | OSS is, on average, of LOWER quality than prop... | Employed full-time | Switzerland | No | Secondary school (e.g. American high school, G... | NaN | ... | Somewhat less welcome now than last year | NaN | 25.0 | Man | No | Straight / Heterosexual | White or of European descent | No | Appropriate in length | Easy |
| 11551 | 25142 | I am a developer by profession | Yes | Less than once a month but more than once per ... | OSS is, on average, of HIGHER quality than pro... | Employed full-time | United Kingdom | No | Other doctoral degree (Ph.D, Ed.D., etc.) | A natural science (ex. biology, chemistry, phy... | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Tech... | 30.0 | Man | No | Bisexual | White or of European descent | No | Appropriate in length | Easy |
11552 rows × 85 columns
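Before loading a large file in full, it can help to parse only a few rows and columns first. The sketch below illustrates the `nrows` and `usecols` parameters of `pd.read_csv`; it reads from an in-memory CSV (`sample_csv`, with invented rows) instead of `dataset_url` so that it runs without a network connection.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the remote file; the rows are invented.
sample_csv = io.StringIO(
    "Respondent,Country,Age\n"
    "4,United States,22\n"
    "9,New Zealand,23\n"
    "13,United States,28\n"
)

# nrows limits how many data rows are parsed; usecols restricts the columns kept.
preview = pd.read_csv(sample_csv, nrows=2, usecols=["Respondent", "Country"])
print(preview)
```

The same two parameters should work when the first argument is `dataset_url`.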
Explore the data set¶
It is a good idea to print the top 5 rows of the dataset to get a feel for how the dataset looks.
Display the top 5 rows and columns from your dataset.
# your code goes here
df.head()
| | Respondent | MainBranch | Hobbyist | OpenSourcer | OpenSource | Employment | Country | Student | EdLevel | UndergradMajor | ... | WelcomeChange | SONewContent | Age | Gender | Trans | Sexuality | Ethnicity | Dependents | SurveyLength | SurveyEase |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | I am a developer by profession | No | Never | The quality of OSS and closed source software ... | Employed full-time | United States | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 22.0 | Man | No | Straight / Heterosexual | White or of European descent | No | Appropriate in length | Easy |
| 1 | 9 | I am a developer by profession | Yes | Once a month or more often | The quality of OSS and closed source software ... | Employed full-time | New Zealand | No | Some college/university study without earning ... | Computer science, computer engineering, or sof... | ... | Just as welcome now as I felt last year | NaN | 23.0 | Man | No | Bisexual | White or of European descent | No | Appropriate in length | Neither easy nor difficult |
| 2 | 13 | I am a developer by profession | Yes | Less than once a month but more than once per ... | OSS is, on average, of HIGHER quality than pro... | Employed full-time | United States | No | Master’s degree (MA, MS, M.Eng., MBA, etc.) | Computer science, computer engineering, or sof... | ... | Somewhat more welcome now than last year | Tech articles written by other developers;Cour... | 28.0 | Man | No | Straight / Heterosexual | White or of European descent | Yes | Appropriate in length | Easy |
| 3 | 16 | I am a developer by profession | Yes | Never | The quality of OSS and closed source software ... | Employed full-time | United Kingdom | No | Master’s degree (MA, MS, M.Eng., MBA, etc.) | NaN | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 26.0 | Man | No | Straight / Heterosexual | White or of European descent | No | Appropriate in length | Neither easy nor difficult |
| 4 | 17 | I am a developer by profession | Yes | Less than once a month but more than once per ... | The quality of OSS and closed source software ... | Employed full-time | Australia | No | Bachelor’s degree (BA, BS, B.Eng., etc.) | Computer science, computer engineering, or sof... | ... | Just as welcome now as I felt last year | Tech articles written by other developers;Indu... | 29.0 | Man | No | Straight / Heterosexual | Hispanic or Latino/Latina;Multiracial | No | Appropriate in length | Easy |
5 rows × 85 columns
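`head()` returns 5 rows by default, but it also accepts a row count, and `tail()` does the same from the end of the frame. A minimal sketch on a made-up frame (`toy_df` and its values are illustrative only, not the survey data):

```python
import pandas as pd

# Hypothetical stand-in for the survey frame; values are invented.
toy_df = pd.DataFrame({"Respondent": [4, 9, 13, 16, 17, 19, 20]})

first_three = toy_df.head(3)  # first 3 rows
last_two = toy_df.tail(2)     # last 2 rows
default_head = toy_df.head()  # no argument: first 5 rows

print(first_three)
print(last_two)
print(len(default_head))
```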
Find out the number of rows and columns¶
Start by exploring the number of rows and columns in the dataset.
Print the number of rows in the dataset.
# your code goes here
print("Number of rows: ",len(df))
Number of rows: 11552
Print the number of columns in the dataset.
# your code goes here
print("Number of columns: ",len(df.columns))
Number of columns: 85
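Both counts can also be read in one step from the `shape` attribute, which is a `(rows, columns)` tuple. A minimal sketch, again on a small made-up frame (`toy_df` is a hypothetical stand-in, not the survey data):

```python
import pandas as pd

# Hypothetical stand-in for the survey frame; values are invented.
toy_df = pd.DataFrame({
    "Respondent": [4, 9, 13],
    "Country": ["United States", "New Zealand", "United States"],
    "Age": [22.0, 23.0, 28.0],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = toy_df.shape
print("Number of rows:", n_rows)
print("Number of columns:", n_cols)
```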
Identify the data types of each column¶
Explore the dataset and identify the data types of each column.
Print the datatype of all columns.
# your code goes here
# dict(df.dtypes)
df.dtypes
Respondent int64
MainBranch object
Hobbyist object
OpenSourcer object
OpenSource object
...
Sexuality object
Ethnicity object
Dependents object
SurveyLength object
SurveyEase object
Length: 85, dtype: object
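With 85 columns the full dtype listing is truncated, so a summary per dtype can be easier to read. The sketch below counts columns by dtype and picks out the numeric ones with `select_dtypes`; it uses a made-up frame (`toy_df`, hypothetical values) rather than the survey data.

```python
import pandas as pd

# Hypothetical stand-in for the survey frame; values are invented.
toy_df = pd.DataFrame({
    "Respondent": [4, 9, 13],
    "Country": ["United States", "New Zealand", "United States"],
    "Age": [22.0, 23.0, None],
})

# How many columns there are of each dtype.
dtype_counts = toy_df.dtypes.value_counts()
print(dtype_counts)

# Names of the numeric columns only (integers and floats).
numeric_cols = toy_df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```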
Print the mean age of the survey participants.
# your code goes here
mean_age = df["Age"].mean()
print("mean age of the survey participants",mean_age)
mean age of the survey participants 30.77239449133718
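Note that `mean()` skips missing values by default (`skipna=True`), so the result is the average over respondents who reported an age. A small sketch on made-up values (`ages` is hypothetical) that checks how many entries are missing and formats the mean:

```python
import pandas as pd

# Hypothetical ages with one missing entry; not taken from the survey.
ages = pd.Series([22.0, 23.0, None, 28.0], name="Age")

missing = ages.isna().sum()  # number of missing ages
mean_age = ages.mean()       # NaN is skipped by default

print(f"Missing ages: {missing}")
print(f"Mean age: {mean_age:.2f}")
```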
The dataset is the result of a worldwide survey. Print how many unique countries appear in the Country column.
# your code goes here
print("Unique Countries are there in the Country column",df["Country"].nunique())
Unique Countries are there in the Country column 135
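The count above ignores missing values. To see how missing entries affect the count, and which countries occur most often, here is a sketch on made-up data (`countries` is a hypothetical series, not the survey column):

```python
import pandas as pd

# Hypothetical country values with one missing entry; not taken from the survey.
countries = pd.Series(
    ["United States", "New Zealand", "United States", None], name="Country"
)

n_unique = countries.nunique()                       # missing values excluded
n_unique_with_nan = countries.nunique(dropna=False)  # missing counted as one value
top = countries.value_counts().head(2)               # most frequent countries first

print(n_unique, n_unique_with_nan)
print(top)
```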