Python Sets: My Comprehensive Guide¶
Author: Mohammad Sayem Chowdhury
Welcome! In this notebook, I'll share my experience working with sets in Python. Sets are incredibly powerful for handling unique collections and performing mathematical operations. By the end, you'll understand how I approach set operations, logic operations, and when sets are the perfect tool for the job.
What I'll Cover¶
- Understanding Sets and Uniqueness
- Creating and Modifying Sets
- Set Operations (Union, Intersection, Difference)
- Set Comparison Operations (Subset, Superset, Disjoint)
- Practical Applications
- Performance Benefits
Estimated time: about 20 minutes of hands-on exploration.
Understanding Sets: My Perspective¶
Sets are collections of unique elements - no duplicates allowed! I think of them as mathematical sets, perfect for membership testing, removing duplicates, and performing set operations like unions and intersections.
Key characteristics I always remember:
- Unique elements only: Duplicates are automatically removed
- Unordered: No indexing, and iteration order is not guaranteed
- Mutable: Can add and remove elements
- Fast membership testing: Very efficient for checking if an item exists
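The characteristics above can be checked directly. This is a minimal sketch with illustrative variable names that are not used elsewhere in the notebook. It also shows one related property the list does not mention: set elements must be hashable, so a list cannot be a member.

```python
# Illustrative names; a quick check of each characteristic listed above.
tags = {"python", "sql", "python"}

# Unique elements only: the duplicate "python" is stored once.
unique_count = len(tags)  # 2

# Unordered: sets do not support indexing.
try:
    tags[0]
    indexing_works = True
except TypeError:
    indexing_works = False

# Mutable: elements can be added after creation.
tags.add("git")

# Elements must be hashable, so a list cannot be a member.
try:
    tags.add(["a", "list"])
    list_accepted = True
except TypeError:
    list_accepted = False

print(unique_count, indexing_works, list_accepted, "git" in tags)
```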
Creating Sets: Different Methods¶
Let me show the various ways I create sets:
# Method 1: Using curly braces
my_skills = {"Python", "JavaScript", "SQL", "Machine Learning", "Data Analysis"}
print("My skills set:", my_skills)
# Method 2: Using set() constructor with a list
programming_languages = set(["Python", "Java", "C++", "Python", "JavaScript", "Python"])
print("Programming languages (duplicates removed):", programming_languages)
# Method 3: Creating from a string
unique_letters = set("Mohammad")
print("Unique letters in my name:", unique_letters)
# Method 4: Empty set (note: {} creates a dict, not a set!)
empty_set = set()
print("Empty set:", empty_set, type(empty_set))
# Method 5: Set comprehension
even_numbers = {x for x in range(10) if x % 2 == 0}
print("Even numbers 0-9:", even_numbers)
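Two of the methods above are easy to mix up, so here is a small sketch of the difference. The variable names are illustrative only.

```python
# set() iterates over its argument, so a string is split into characters.
letters = set("Python")

# Braces keep the string whole as a single element.
one_word = {"Python"}

# Empty braces create a dict, not a set.
empty_braces = {}

print(len(letters), len(one_word), type(empty_braces).__name__)
```

If the goal is a set containing one whole string, the brace form (or `set(["Python"])`) is the one to reach for.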
Basic Set Operations¶
Let me explore the fundamental operations I use with sets:
# Working with my hobbies
hobbies = {"reading", "coding", "music", "photography", "traveling"}
print("My hobbies:", hobbies)
print(f"Number of hobbies: {len(hobbies)}")
# Adding elements
hobbies.add("cooking")
print("After adding cooking:", hobbies)
# Adding multiple elements
hobbies.update(["gaming", "writing", "swimming"])
print("After adding multiple hobbies:", hobbies)
# Removing elements (different methods)
hobbies.remove("swimming") # Raises KeyError if not found
print("After removing swimming:", hobbies)
hobbies.discard("dancing") # No error if not found
print("After discarding dancing (not in set):", hobbies)
# pop() removes and returns an arbitrary element
removed_hobby = hobbies.pop()
print(f"Removed hobby: {removed_hobby}")
print("Remaining hobbies:", hobbies)
# Membership testing
print(f"Is 'coding' a hobby? {'coding' in hobbies}")
print(f"Is 'skydiving' a hobby? {'skydiving' in hobbies}")
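The error behaviour of the removal methods is worth seeing in isolation. A short sketch with made-up data, separate from the hobbies set above:

```python
colors = {"red", "green"}

# remove() raises KeyError when the element is missing.
try:
    colors.remove("blue")
    remove_raised = False
except KeyError:
    remove_raised = True

# discard() is a silent no-op in the same situation.
colors.discard("blue")

# update() accepts any iterable; a string is added character by character.
chars = set()
chars.update("ab")

# pop() on an empty set also raises KeyError.
empty = set()
try:
    empty.pop()
    pop_raised = False
except KeyError:
    pop_raised = True

print(remove_raised, pop_raised, colors, chars)
```

A reasonable rule of thumb: use `remove()` when a missing element indicates a bug, and `discard()` when absence is expected.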
Set Mathematical Operations¶
This is where sets really shine! I can perform mathematical set operations:
# My programming languages vs my friend's
my_languages = {"Python", "JavaScript", "SQL", "R"}
friend_languages = {"Java", "C++", "Python", "JavaScript", "Go"}
print("My languages:", my_languages)
print("Friend's languages:", friend_languages)
print()
# Union: All languages either of us knows
all_languages = my_languages | friend_languages # or my_languages.union(friend_languages)
print("All languages (Union):", all_languages)
# Intersection: Languages we both know
common_languages = my_languages & friend_languages # or my_languages.intersection(friend_languages)
print("Common languages (Intersection):", common_languages)
# Difference: Languages I know but my friend doesn't
my_unique = my_languages - friend_languages # or my_languages.difference(friend_languages)
print("Languages only I know (Difference):", my_unique)
# Symmetric difference: Languages known by only one of us
unique_to_each = my_languages ^ friend_languages # or my_languages.symmetric_difference(friend_languages)
print("Languages unique to each (Symmetric Difference):", unique_to_each)
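The operator and method forms shown in the comments above are not fully interchangeable. A minimal sketch of the differences, using illustrative data rather than the sets from this cell:

```python
mine = {"Python", "SQL"}
others_list = ["SQL", "Go"]  # a list, not a set

# Methods accept any iterable.
combined = mine.union(others_list)

# Operators require both operands to be sets (or frozensets).
try:
    mine | others_list
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

# Methods can take several iterables at once.
shared = mine.intersection(others_list, ("SQL", "R"))

# In-place variants modify the left-hand set instead of building a new one.
mine |= {"R"}

print(combined, operator_accepts_list, shared, mine)
```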
Set Comparison Operations¶
I can also compare sets to understand their relationships:
# Different skill levels
beginner_skills = {"Python", "SQL"}
intermediate_skills = {"Python", "SQL", "JavaScript", "Git"}
advanced_skills = {"Python", "SQL", "JavaScript", "Git", "Machine Learning", "Docker"}
expert_skills = {"Python", "SQL", "JavaScript", "Git", "ML", "Docker", "Kubernetes", "AWS"}
print("Beginner:", beginner_skills)
print("Intermediate:", intermediate_skills)
print("Advanced:", advanced_skills)
print()
# Subset relationships
print("Is beginner a subset of intermediate?", beginner_skills.issubset(intermediate_skills))
print("Is intermediate a subset of advanced?", intermediate_skills.issubset(advanced_skills))
print("Is advanced a superset of beginner?", advanced_skills.issuperset(beginner_skills))
# Disjoint sets (no common elements)
frontend_skills = {"HTML", "CSS", "React", "Vue"}
backend_skills = {"Node.js", "Django", "FastAPI", "PostgreSQL"}
print(f"\nFrontend and backend are disjoint: {frontend_skills.isdisjoint(backend_skills)}")
print(f"Beginner and expert are disjoint: {beginner_skills.isdisjoint(expert_skills)}")
# Equal sets
copy_of_beginner = {"SQL", "Python"} # Order doesn't matter
print(f"Beginner equals its copy: {beginner_skills == copy_of_beginner}")
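The comparison methods also have operator forms, and the operators add a distinction the methods lack: proper subset versus subset. A small sketch with illustrative names:

```python
small = {"Python", "SQL"}
large = {"Python", "SQL", "Git"}

is_subset = small <= large         # same as small.issubset(large)
is_proper_subset = small < large   # subset and not equal
is_superset = large >= small       # same as large.issuperset(small)

# A set is a subset of itself, but not a proper subset.
self_subset = small <= small
self_proper_subset = small < small

print(is_subset, is_proper_subset, is_superset, self_subset, self_proper_subset)
```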
Frozen Sets: Immutable Sets¶
Sometimes I need immutable sets, especially as dictionary keys:
# Creating frozen sets
web_technologies = frozenset(["HTML", "CSS", "JavaScript"])
data_technologies = frozenset(["Python", "SQL", "Pandas"])
print("Web technologies (frozen):", web_technologies)
print("Data technologies (frozen):", data_technologies)
# Frozen sets can be dictionary keys
technology_categories = {
    web_technologies: "Frontend Development",
    data_technologies: "Data Science",
    frozenset(["Java", "Spring", "Maven"]): "Enterprise Development"
}
print("\nTechnology categories:")
for tech_set, category in technology_categories.items():
    print(f"{category}: {tech_set}")
# Frozen sets support set operations
all_tech = web_technologies | data_technologies
print(f"\nCombined technologies: {all_tech}")
# But cannot be modified
try:
    web_technologies.add("React") # frozenset has no add() method
except AttributeError as e:
    print(f"Cannot modify frozen set: {e}")
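Frozen sets work as dictionary keys because they are hashable, and the same property lets them be elements of another set. A brief sketch, with illustrative names:

```python
# A regular set is unhashable, so it cannot be an element of another set.
try:
    nested = {{"HTML", "CSS"}}
    plain_set_nests = True
except TypeError:
    plain_set_nests = False

# frozenset is hashable, so it can be nested.
stacks = {frozenset(["HTML", "CSS"]), frozenset(["Python", "SQL"])}

# Element order does not affect equality or hashing, so a lookup succeeds
# regardless of the order used to build the key.
found = frozenset(["CSS", "HTML"]) in stacks

# Mixing frozenset and set with an operator returns the type of the left operand.
mixed = frozenset(["HTML"]) | {"CSS"}

print(plain_set_nests, found, type(mixed).__name__)
```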
Practical Applications¶
Let me demonstrate real-world scenarios where I use sets:
# Application 1: Finding unique visitors across days
monday_visitors = {"alice", "bob", "charlie", "diana", "eve"}
tuesday_visitors = {"bob", "diana", "frank", "grace", "henry"}
wednesday_visitors = {"alice", "charlie", "frank", "ivan", "jane"}
print("Daily visitors:")
print(f"Monday: {monday_visitors}")
print(f"Tuesday: {tuesday_visitors}")
print(f"Wednesday: {wednesday_visitors}")
# Analysis using sets
all_visitors = monday_visitors | tuesday_visitors | wednesday_visitors
print(f"\nTotal unique visitors: {len(all_visitors)}")
print(f"All visitors: {all_visitors}")
# Visitors who came multiple days
multi_day_visitors = (
    (monday_visitors & tuesday_visitors)
    | (monday_visitors & wednesday_visitors)
    | (tuesday_visitors & wednesday_visitors)
)
print(f"Multi-day visitors: {multi_day_visitors}")
# Visitors who came all three days
all_three_days = monday_visitors & tuesday_visitors & wednesday_visitors
print(f"Visitors all three days: {all_three_days}")
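The pairwise intersections above grow quickly as more days are added. One possible alternative, sketched here with the same visitor data, counts how many daily sets each visitor appears in using `collections.Counter`. The names `daily_visitors`, `days_seen`, `multi_day`, and `every_day` are illustrative.

```python
from collections import Counter

# Same visitor data as above, gathered into one list of daily sets.
daily_visitors = [
    {"alice", "bob", "charlie", "diana", "eve"},
    {"bob", "diana", "frank", "grace", "henry"},
    {"alice", "charlie", "frank", "ivan", "jane"},
]

# A visitor appears at most once per daily set, so the count is the
# number of days on which that visitor was seen.
days_seen = Counter(visitor for day in daily_visitors for visitor in day)

multi_day = {visitor for visitor, days in days_seen.items() if days >= 2}
every_day = set.intersection(*daily_visitors)

print(sorted(multi_day))
print(every_day)
```

This version needs no changes when a fourth or fifth day is appended to the list.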
# Application 2: Data cleaning - removing duplicates
messy_data = [
    "python", "PYTHON", "Python", "java", "JAVA",
    "javascript", "JavaScript", "sql", "SQL", "python"
]
print("Original messy data:", messy_data)
# Method 1: Simple deduplication (case-sensitive)
unique_simple = list(set(messy_data))
print("Simple deduplication:", unique_simple)
# Method 2: Case-insensitive deduplication
seen = set()
unique_case_insensitive = []
for item in messy_data:
    if item.lower() not in seen:
        seen.add(item.lower())
        unique_case_insensitive.append(item.lower())
print("Case-insensitive unique:", unique_case_insensitive)
# Method 3: Using set comprehension for normalization
normalized_unique = {item.lower() for item in messy_data}
print("Normalized using set comprehension:", normalized_unique)
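Methods 1 and 3 lose the original order, while Method 2 keeps it at the cost of an explicit loop. A compact alternative, sketched here on a smaller illustrative list, relies on the fact that dict keys are unique and preserve insertion order (guaranteed since Python 3.7).

```python
messy = ["python", "PYTHON", "java", "Python", "sql", "JAVA"]

# dict.fromkeys keeps the first occurrence of each key, in order.
ordered_unique = list(dict.fromkeys(item.lower() for item in messy))
# → ['python', 'java', 'sql']

# When any stable order will do, sorting the set is another option.
sorted_unique = sorted({item.lower() for item in messy})
# → ['java', 'python', 'sql']

print(ordered_unique)
print(sorted_unique)
```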
# Application 3: Permission system
# Different user roles and their permissions
admin_permissions = {
    "read", "write", "delete", "create_user", "modify_user",
    "delete_user", "view_logs", "system_config"
}
editor_permissions = {
    "read", "write", "delete", "view_logs"
}
viewer_permissions = {
    "read", "view_logs"
}
# User assignments
user_roles = {
    "mohammad": admin_permissions,
    "sarah": editor_permissions,
    "john": viewer_permissions,
    "alice": editor_permissions
}
def check_permission(username, required_permission):
    """Check if user has required permission."""
    user_perms = user_roles.get(username, set())
    return required_permission in user_perms

def get_common_permissions(*role_sets):
    """Find permissions common to all roles."""
    if not role_sets:
        return set()
    result = role_sets[0]
    for role in role_sets[1:]:
        result = result & role
    return result
# Testing the permission system
print("Permission Analysis:")
print(f"Mohammad can delete: {check_permission('mohammad', 'delete')}")
print(f"John can write: {check_permission('john', 'write')}")
common_perms = get_common_permissions(admin_permissions, editor_permissions, viewer_permissions)
print(f"Permissions available to all roles: {common_perms}")
# Find users with specific permission
write_users = [user for user, perms in user_roles.items() if "write" in perms]
print(f"Users with write permission: {write_users}")
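A permission check often involves several permissions at once, which maps naturally onto subset and difference operations. The following is a sketch under assumed data: the `roles` dictionary and the helpers `has_all_permissions` and `missing_permissions` are hypothetical and simplified, not part of the system defined above.

```python
# Hypothetical roles for illustration; smaller than the notebook's sets.
roles = {
    "admin": {"read", "write", "delete"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}

def has_all_permissions(role, required):
    """Return True if the role grants every permission in `required`."""
    return set(required) <= roles.get(role, set())

def missing_permissions(role, required):
    """Return the permissions in `required` that the role lacks."""
    return set(required) - roles.get(role, set())

# set.intersection accepts any number of sets, so no explicit loop is needed.
common = set.intersection(*roles.values())

print(has_all_permissions("editor", ["read", "write"]))
print(missing_permissions("viewer", ["read", "write", "delete"]))
print(common)
```

Unknown roles fall back to an empty set, so they are denied everything rather than raising an error.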
Performance Comparison¶
Let me demonstrate why sets are so efficient for membership testing:
import timeit
# Create large collections
large_list = list(range(100000))
large_set = set(range(100000))
# Element to search for (worst case for the list - at the end)
search_element = 99999
# A single lookup is too fast to time reliably (the set lookup can measure as 0.0
# and cause a division by zero), so repeat each test many times with timeit
repeats = 1000
# Time list membership test
list_time = timeit.timeit(lambda: search_element in large_list, number=repeats)
# Time set membership test
set_time = timeit.timeit(lambda: search_element in large_set, number=repeats)
print(f"List membership test: {list_time / repeats:.8f} seconds per lookup")
print(f"Set membership test: {set_time / repeats:.8f} seconds per lookup")
print(f"Set is roughly {list_time / set_time:.1f}x faster!")
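# Memory usage comparison
# Note: sys.getsizeof counts only the container itself, not the integers it references.
# Expect the set to be larger: its hash table trades memory for lookup speed.
import sys
print("\nMemory usage:")
print(f"List size: {sys.getsizeof(large_list):,} bytes")
print(f"Set size: {sys.getsizeof(large_set):,} bytes")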
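To see how the gap changes with collection size, the comparison can be repeated at several sizes. This is a rough sketch using the standard-library `timeit` module. The helper name `time_lookup` and the chosen sizes are illustrative, and absolute numbers depend on the machine.

```python
import timeit

def time_lookup(n, repeats=200):
    """Return (list_seconds, set_seconds) for `repeats` lookups of the last element."""
    as_list = list(range(n))
    as_set = set(as_list)
    target = n - 1  # worst case for the list: the last position
    list_seconds = timeit.timeit(lambda: target in as_list, number=repeats)
    set_seconds = timeit.timeit(lambda: target in as_set, number=repeats)
    return list_seconds, set_seconds

for n in (1_000, 10_000, 100_000):
    list_seconds, set_seconds = time_lookup(n)
    print(f"n={n:>7,}: list {list_seconds:.6f}s, set {set_seconds:.6f}s")
```

The list time should grow roughly in proportion to `n`, while the set time should stay about the same, which is what O(n) versus average-case O(1) lookup predicts.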
Set Comprehensions and Advanced Techniques¶
I love using set comprehensions for creating sets with conditions:
# Set comprehensions with conditions
text = "The quick brown fox jumps over the lazy dog"
# Unique vowels in the text
vowels = {char.lower() for char in text if char.lower() in 'aeiou'}
print(f"Unique vowels: {vowels}")
# Unique word lengths
word_lengths = {len(word) for word in text.split()}
print(f"Unique word lengths: {word_lengths}")
# Numbers that are perfect squares
squares = {x**2 for x in range(1, 11)}
print(f"Perfect squares 1-100: {squares}")
# Complex filtering
people = [
{"name": "Mohammad", "age": 25, "city": "Dhaka"},
{"name": "Sarah", "age": 30, "city": "London"},
{"name": "Ahmed", "age": 28, "city": "Dhaka"},
{"name": "Lisa", "age": 25, "city": "New York"}
]
# Unique ages of people from Dhaka
dhaka_ages = {person["age"] for person in people if person["city"] == "Dhaka"}
print(f"Ages of people from Dhaka: {dhaka_ages}")
# Unique cities
cities = {person["city"] for person in people}
print(f"All cities: {cities}")
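Because sets have no guaranteed order, the printed output of the comprehensions above can differ between runs. When a stable display matters, one option is to pass the set through `sorted()`, which returns a list. A short sketch reusing the same sentence; the other names are illustrative:

```python
text = "The quick brown fox jumps over the lazy dog"

word_lengths = {len(word) for word in text.split()}

# sorted() turns the set into an ordered list for display.
ordered_lengths = sorted(word_lengths)
# → [3, 4, 5]

# Normalizing inside the comprehension merges "The" and "the".
unique_words = {word.lower() for word in text.split()}

print(ordered_lengths)
print(len(unique_words), sorted(unique_words))
```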
Working with Multiple Sets¶
Advanced operations with multiple sets:
# Skills required for different job roles
frontend_dev = {"HTML", "CSS", "JavaScript", "React", "Git"}
backend_dev = {"Python", "SQL", "API Design", "Git", "Docker"}
data_scientist = {"Python", "SQL", "Statistics", "Machine Learning", "Git"}
devops_engineer = {"Docker", "Kubernetes", "AWS", "Git", "Python"}
all_roles = [frontend_dev, backend_dev, data_scientist, devops_engineer]
role_names = ["Frontend Dev", "Backend Dev", "Data Scientist", "DevOps Engineer"]
# Find skills common to all roles
universal_skills = frontend_dev.copy()
for role_skills in all_roles[1:]:
    universal_skills &= role_skills
print(f"Universal skills (needed by all): {universal_skills}")
# Find skills unique to each role
print("\nUnique skills by role:")
for i, (role_name, role_skills) in enumerate(zip(role_names, all_roles)):
    other_roles = set()
    for j, other_role in enumerate(all_roles):
        if i != j:
            other_roles |= other_role
    unique_skills = role_skills - other_roles
    print(f"{role_name}: {unique_skills}")
# Find most versatile skills (appearing in most roles)
all_skills = set()
for role_skills in all_roles:
    all_skills |= role_skills
skill_popularity = {}
for skill in all_skills:
    count = sum(1 for role_skills in all_roles if skill in role_skills)
    skill_popularity[skill] = count
print(f"\nSkill popularity:")
for skill, count in sorted(skill_popularity.items(), key=lambda x: x[1], reverse=True):
    print(f"{skill}: appears in {count} roles")
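The explicit loops above can be condensed with `set.intersection`, `set.union`, and `collections.Counter`. This is one possible restructuring, not a replacement for the step-by-step version. It keeps the same skill data but stores it in a dictionary named `roles`, which is an illustrative choice.

```python
from collections import Counter

roles = {
    "Frontend Dev": {"HTML", "CSS", "JavaScript", "React", "Git"},
    "Backend Dev": {"Python", "SQL", "API Design", "Git", "Docker"},
    "Data Scientist": {"Python", "SQL", "Statistics", "Machine Learning", "Git"},
    "DevOps Engineer": {"Docker", "Kubernetes", "AWS", "Git", "Python"},
}

# Both class-level methods accept any number of sets.
universal = set.intersection(*roles.values())
every_skill = set.union(*roles.values())

# Each skill appears at most once per role, so the count is the number of roles.
popularity = Counter(skill for skills in roles.values() for skill in skills)

# A skill is unique to a role when it is counted exactly once overall.
unique_by_role = {
    name: {skill for skill in skills if popularity[skill] == 1}
    for name, skills in roles.items()
}

print(universal)
print(popularity.most_common(3))
for name, skills in unique_by_role.items():
    print(f"{name}: {sorted(skills)}")
```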
My Key Takeaways¶
Sets have become indispensable in my Python toolkit. Here's what I consider most important:
- Uniqueness: Automatic duplicate removal makes data cleaning effortless
- Fast Membership Testing: O(1) average case for checking if element exists
- Mathematical Operations: Union, intersection, difference operations are intuitive
- Data Analysis: Perfect for finding relationships between datasets
- Performance: Much faster than lists for membership testing and deduplication on large collections, at the cost of extra memory
I use sets when I need to:
- Remove duplicates from data
- Test membership efficiently
- Perform mathematical set operations
- Find common or unique elements between collections
- Implement permission systems
- Analyze relationships between datasets
The combination of sets with comprehensions creates powerful, readable code for data processing and analysis tasks.
Copyright © 2025 Mohammad Sayem Chowdhury. This notebook and its content are shared for personal learning and inspiration.
Next Steps in My Set Journey¶
Understanding sets has been crucial for advanced topics like graph algorithms, database operations, and data analysis. They're fundamental to many algorithms and data processing pipelines I use in my work.
— Mohammad Sayem Chowdhury