Introduction to Structured Data
Imagine you're organizing a library. You could throw all the books into a pile, but that would make finding anything nearly impossible. Instead, libraries use different organizational systems: some books are arranged in sequence (fiction alphabetically by author), while others are grouped by category (non-fiction by subject). Python's structured data types work in a similar way, each offering a different way to organize and access data.
Let's explore how Python helps us organize data through its various structured data types, each with its own characteristics and use cases.
Understanding Sequences
Think of a sequence like a line of people waiting for a movie. Each person has a specific position in line (their index), starting with position 0 for the first person. The order is preserved: elements stay exactly where you put them, so the same index always returns the same element until the sequence itself is changed.
Exploring Different Types of Sequences
# Strings - A sequence of characters
movie_title = "Star Wars"
print(movie_title[0]) # Prints 'S'
print(movie_title[5]) # Prints 'W'
# Lists - A mutable sequence that can hold elements of any type
movie_ratings = [4.5, 3.8, 5.0, 4.2]
print(movie_ratings[0]) # Prints 4.5
# Tuples - An immutable sequence
movie_info = ("Star Wars", 1977, "Science Fiction")
print(movie_info[1]) # Prints 1977
# Ranges - An immutable sequence of integers
showing_times = range(10, 22, 2)  # 10:00 to 20:00, every 2 hours (the stop value 22 is excluded)
for time in showing_times:
    print(f"Showing at {time}:00")
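A range stores only its start, stop, and step, yet it behaves like the other sequences: it supports len(), indexing, and the in operator. Converting it to a list makes the excluded stop value easy to see. A small sketch reusing the showtimes from above:

```python
showing_times = range(10, 22, 2)

# The stop value (22) is never produced, so the last showing is 20:00
print(list(showing_times))  # [10, 12, 14, 16, 18, 20]
print(len(showing_times))   # 6
print(showing_times[-1])    # 20 (negative indices count from the end)
print(22 in showing_times)  # False

# Raising the stop value past 22 includes a 22:00 (10 PM) showing
late_times = range(10, 23, 2)
print(list(late_times))     # [10, 12, 14, 16, 18, 20, 22]
```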
Working with Sequence Operations
Length (len()) and membership tests (in) work on every sequence type. Concatenation (+) and repetition (*) work on strings, lists, and tuples, but not on ranges:
# Length of sequences
print(len(movie_title)) # Number of characters
print(len(movie_ratings)) # Number of ratings
print(len(movie_info)) # Number of tuple items
# Checking membership
print("Star" in movie_title) # True
print(3.8 in movie_ratings) # True
print("Comedy" in movie_info) # False
# Concatenation
numbers = [1, 2, 3] + [4, 5, 6]
print(numbers) # [1, 2, 3, 4, 5, 6]
# Repetition
reminder = ["Don't forget!"] * 3
print(reminder)  # ["Don't forget!", "Don't forget!", "Don't forget!"]
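Because len(), indexing, and in behave the same way on every sequence type, a single function can handle strings, lists, tuples, and ranges alike. The helper name describe_sequence is hypothetical, chosen only for this sketch. It also shows two details worth knowing: in on a string looks for a substring while in on a list compares whole elements, and ranges reject the + operator.

```python
def describe_sequence(seq):
    """Return (length, first item, last item) for any non-empty sequence."""
    return len(seq), seq[0], seq[len(seq) - 1]

print(describe_sequence("Star Wars"))           # (9, 'S', 's')
print(describe_sequence([4.5, 3.8, 5.0, 4.2]))  # (4, 4.5, 4.2)
print(describe_sequence(("Star Wars", 1977, "Science Fiction")))
# (3, 'Star Wars', 'Science Fiction')
print(describe_sequence(range(10, 22, 2)))      # (6, 10, 20)

# On a string, `in` checks for a substring; on a list, it checks whole elements
print("Star" in "Star Wars")    # True
print("Star" in ["Star Wars"])  # False

# Ranges support len(), indexing and `in`, but not concatenation
try:
    range(3) + range(3)
except TypeError as e:
    print("Cannot concatenate ranges:", e)
```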
Understanding Collections
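If sequences are like a line of people at a movie theater, dictionaries and sets are more like the theater itself, with different areas serving different purposes. You don't reach their elements by position: a dictionary looks values up by key (and, since Python 3.7, remembers the order in which keys were inserted), while a set records which unique elements it contains, in no particular order.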
Dictionaries: Key-Value Pairs
# Creating a movie database using a dictionary
movie_database = {
    "title": "The Matrix",
    "year": 1999,
    "directors": ["Lana Wachowski", "Lilly Wachowski"],
    "ratings": {
        "imdb": 8.7,
        "rotten_tomatoes": 88
    }
}
# Accessing dictionary values
print(movie_database["title"])
print(movie_database["ratings"]["imdb"])
# Adding new information
movie_database["genre"] = "Science Fiction"
# Modifying existing information
movie_database["year"] = 1999  # Assigning to an existing key replaces its value
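Square-bracket lookup raises KeyError when a key is missing, so it helps to know the safer alternatives. The sketch below uses a trimmed copy of the movie dictionary: in tests keys (not values), .get() returns a default instead of raising, and .items() yields key-value pairs.

```python
movie_database = {"title": "The Matrix", "year": 1999}

# `in` checks keys, not values
print("title" in movie_database)       # True
print("The Matrix" in movie_database)  # False

# Square brackets raise KeyError for a missing key
try:
    movie_database["runtime"]
except KeyError as e:
    print("Missing key:", e)  # Missing key: 'runtime'

# .get() returns None, or a default you supply, instead of raising
print(movie_database.get("runtime"))             # None
print(movie_database.get("runtime", "unknown"))  # unknown

# .items() yields (key, value) pairs
for key, value in movie_database.items():
    print(f"{key}: {value}")
```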
Sets: Unique Collections
# Creating a set of unique movie genres
movie_genres = {"Action", "Comedy", "Drama", "Action"} # Note the duplicate
print(movie_genres)  # e.g. {'Action', 'Comedy', 'Drama'}: the duplicate is removed, and the display order may vary
# Set operations
scifi_movies = {"The Matrix", "Inception", "Star Wars"}
action_movies = {"The Matrix", "Die Hard", "John Wick"}
# Movies that are both sci-fi and action
print(scifi_movies & action_movies) # Intersection
# All movies from both categories
print(scifi_movies | action_movies) # Union
# Sci-fi movies that aren't action movies
print(scifi_movies - action_movies) # Difference
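Because sets have no defined order, printing one can list its elements in any order. Wrapping each result in sorted() gives a predictable list, which makes the outcome of every operation easy to check. This sketch repeats the two sets from above and adds the common idiom of removing duplicates from a list with set().

```python
scifi_movies = {"The Matrix", "Inception", "Star Wars"}
action_movies = {"The Matrix", "Die Hard", "John Wick"}

print(sorted(scifi_movies & action_movies))  # ['The Matrix']
print(sorted(scifi_movies | action_movies))
# ['Die Hard', 'Inception', 'John Wick', 'Star Wars', 'The Matrix']
print(sorted(scifi_movies - action_movies))  # ['Inception', 'Star Wars']

# The method forms accept any iterable, not just another set
print(sorted(scifi_movies.intersection(["The Matrix", "Die Hard"])))  # ['The Matrix']

# Deduplicating a list (the original order is not preserved)
watched = ["Inception", "The Matrix", "Inception", "Star Wars"]
print(sorted(set(watched)))  # ['Inception', 'Star Wars', 'The Matrix']
```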
Understanding Iterables
An iterable is any object that can hand you its items one at a time, like a tour guide showing you each exhibit in turn. Strings, lists, tuples, ranges, dictionaries, and sets are all iterables, which is why the same for loop works with every one of them.
# Different types of iterables
sequence_example = [1, 2, 3]
collection_example = {"a": 1, "b": 2}
string_example = "Hello"
set_example = {1, 2, 3}
# They all work with for loops
for item in sequence_example:
    print(item)
for key in collection_example:  # Looping over a dictionary yields its keys
    print(key, collection_example[key])
for char in string_example:
    print(char)
for number in set_example:
    print(number)
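Under the hood, a for loop calls iter() on the object to get an iterator (the "tour guide"), then calls next() on that iterator until it raises StopIteration. The sketch below performs those steps by hand for a short list, then shows .items() as a tidier way to loop over a dictionary's keys and values together.

```python
sequence_example = [1, 2, 3]

guide = iter(sequence_example)  # Ask the iterable for an iterator
print(next(guide))  # 1
print(next(guide))  # 2
print(next(guide))  # 3
try:
    next(guide)
except StopIteration:
    print("No more items")  # This is how a for loop knows when to stop

collection_example = {"a": 1, "b": 2}
for key, value in collection_example.items():
    print(key, value)  # prints "a 1", then "b 2"
```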
Understanding Mutability
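Think of mutability like a pencil drawing versus a painting that has dried. You can erase and modify a pencil drawing (mutable), but once a painting has dried, it cannot be changed (immutable). In Python terms, a mutable object can be changed in place after it is created; an immutable object cannot, so any "change" has to produce a new object.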
Mutable vs Immutable Types
# Mutable Examples
mutable_list = [1, 2, 3]
mutable_list[0] = 99 # This works
print(mutable_list) # [99, 2, 3]
mutable_dict = {"name": "Alice"}
mutable_dict["age"] = 30 # This works
print(mutable_dict) # {'name': 'Alice', 'age': 30}
# Immutable Examples
immutable_tuple = (1, 2, 3)
try:
    immutable_tuple[0] = 99  # This will raise an error
except TypeError as e:
    print("Cannot modify tuple:", e)
immutable_string = "Hello"
try:
    immutable_string[0] = "h"  # This will raise an error
except TypeError as e:
    print("Cannot modify string:", e)
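Two practical consequences follow. First, assigning a mutable object to a second name does not copy it, so a change made through either name is visible through both. Second, to "change" an immutable value you build a new one, for example with concatenation and slicing. A short sketch:

```python
# Two names, one list: changes through either name are shared
original = [1, 2, 3]
alias = original
alias.append(4)
print(original)  # [1, 2, 3, 4]

# .copy() creates an independent list
independent = original.copy()
independent.append(5)
print(original)     # [1, 2, 3, 4]
print(independent)  # [1, 2, 3, 4, 5]

# Strings can't be changed in place, so build a new string instead
immutable_string = "Hello"
lowered = "h" + immutable_string[1:]
print(lowered)           # hello
print(immutable_string)  # Hello (the original is untouched)
```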
Practical Application: Movie Recommendation System
Let's put everything together in a practical example that uses various data structures:
class MovieRecommendationSystem:
    def __init__(self):
        # Using a dictionary to store movie information
        self.movies = {
            "The Matrix": {
                "genre": {"Science Fiction", "Action"},
                "year": 1999,
                "rating": 8.7,
                "similar_movies": ["Inception", "Blade Runner"]
            },
            "Inception": {
                "genre": {"Science Fiction", "Action", "Drama"},
                "year": 2010,
                "rating": 8.8,
                "similar_movies": ["The Matrix", "Interstellar"]
            }
        }
        # Using a set to track unique genres
        self.all_genres = set()
        for movie in self.movies.values():
            self.all_genres.update(movie["genre"])
    def get_movies_by_genre(self, genre):
        """Return a list of movies in a specific genre."""
        return [
            title for title, info in self.movies.items()
            if genre in info["genre"]
        ]
    def get_movie_recommendations(self, watched_movie):
        """Get recommendations based on a watched movie."""
        if watched_movie not in self.movies:
            return []
        # Create a list of recommended movies
        recommendations = []
        # Add similar movies
        recommendations.extend(self.movies[watched_movie]["similar_movies"])
        # Add movies from the same genres
        watched_genres = self.movies[watched_movie]["genre"]
        for movie, info in self.movies.items():
            if movie != watched_movie and info["genre"] & watched_genres:
                recommendations.append(movie)
        return list(set(recommendations))  # Remove duplicates using a set
# Using the system
recommender = MovieRecommendationSystem()
print("All genres:", recommender.all_genres)
print("Sci-fi movies:", recommender.get_movies_by_genre("Science Fiction"))
print("Recommendations for Matrix fans:",
      recommender.get_movie_recommendations("The Matrix"))
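One design choice worth noting: list(set(recommendations)) removes duplicates but returns them in an arbitrary order, and because string hashing is randomized per process, the printed recommendations can come out in a different order from run to run. If order matters, for instance to keep the curated similar_movies first, dict.fromkeys() removes duplicates while keeping the first occurrence of each item, since dictionaries preserve insertion order (Python 3.7+). The helper name unique_in_order is hypothetical; the input list is the one the method builds for "The Matrix".

```python
def unique_in_order(items):
    """Remove duplicates, keeping the first occurrence of each item."""
    # dict keys are unique and keep insertion order, so this is an ordered dedup
    return list(dict.fromkeys(items))

recommendations = ["Inception", "Blade Runner", "Inception"]
print(unique_in_order(recommendations))  # ['Inception', 'Blade Runner']
```

Swapping this in for list(set(recommendations)) would make the method's output deterministic without changing which movies it returns.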
Best Practices and Tips
When working with structured data in Python, keep these guidelines in mind:
# 1. Choose the right data structure for your needs
# Use lists when:
ordered_data = [1, 2, 3] # Order matters
flexible_data = [1, "two", 3.0] # Need to store different types
mutable_data = [1, 2] # Need to modify contents
# Use tuples when:
coordinates = (3, 4)  # Data shouldn't change
def get_position():
    return 3, 4  # Returning multiple values packs them into a tuple
return_values = get_position()  # (3, 4)
# Use dictionaries when:
config = {"port": 8000, "host": "localhost"} # Need key-value pairs
cache = {} # Need fast lookups
# Use sets when:
unique_visitors = {"user1", "user2"} # Need unique values
tags = {"python", "coding"} # Need set operations
# 2. Consider performance implications
large_list = list(range(1000000))  # "x in large_list" checks items one by one: slow, O(n)
large_set = set(range(1000000))  # "x in large_set" uses hashing: fast, O(1) on average
# 3. Use appropriate methods
# List methods
my_list = [3, 1, 2]
my_list.sort() # Modifies in place
sorted_list = sorted(my_list) # Creates new list
# Dictionary methods
my_dict = {"a": 1}
value = my_dict.get("b", 0) # Safe access with default
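The performance tip (point 2) can be checked with the standard timeit module. A membership test on a list scans the elements one by one, while a set jumps straight to the right hash bucket. Exact timings depend on your machine, so this sketch prints them rather than promising particular numbers:

```python
import timeit

large_list = list(range(1_000_000))
large_set = set(large_list)
target = 999_999  # Near the end of the list: the worst case for a linear scan

# Both containers give the same answer; only the cost differs
list_time = timeit.timeit(lambda: target in large_list, number=20)
set_time = timeit.timeit(lambda: target in large_set, number=20)

print(f"list: {list_time:.4f} s for 20 lookups")
print(f"set:  {set_time:.6f} s for 20 lookups")
```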
Related Topics to Explore
To deepen your understanding of Python's structured data, consider exploring:
- Advanced sequence operations and slicing
- Collections module (Counter, defaultdict, OrderedDict)
- Array module for homogeneous sequences
- List, dictionary, and set comprehensions
- Generator expressions and lazy evaluation