Transforming Data with map()
Imagine you're running an assembly line in a factory. Each product moves down the line and undergoes the same transformation. This is how map() works: it applies the same function to every item in an iterable.
Understanding map()
```python
# Basic syntax: map(transformation_function, iterable)

# Simple temperature conversion example
celsius_temps = [0, 10, 20, 30, 40]

def to_fahrenheit(c):
    """Convert Celsius to Fahrenheit"""
    return (c * 9/5) + 32

# Converting all temperatures at once
fahrenheit_temps = list(map(to_fahrenheit, celsius_temps))

print("Temperature Conversion:")
for c, f in zip(celsius_temps, fahrenheit_temps):
    print(f"{c}°C = {f}°F")
```
Let's break down what makes map() special:
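- It's lazy: items are transformed one at a time, only when requested, so no second list is built in memory unless you ask for one
- It returns an iterator, not a list (wrap it in list() to see all results), and that iterator can be consumed only once
- With one iterable, it works with any function that takes one argument; with several iterables, the function receives one item from each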
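A short sketch of these points, redefining `to_fahrenheit` so the snippet runs on its own. It shows that map() computes values on demand, that its iterator is used up after one pass, and that with two iterables a two-argument function receives one item from each. The `add` helper and the sample lists are illustrative only.

```python
def to_fahrenheit(c):
    """Convert Celsius to Fahrenheit"""
    return (c * 9/5) + 32

# map() returns an iterator; nothing is computed yet
temps = map(to_fahrenheit, [0, 100])
print(next(temps))   # 32.0 (computed on demand)
print(list(temps))   # [212.0] (only the remaining item)
print(list(temps))   # [] (the iterator is now exhausted)

# With two iterables, the function receives one item from each
def add(a, b):
    return a + b

totals = list(map(add, [1, 2, 3], [10, 20, 30]))
print(totals)  # [11, 22, 33]

# map() stops as soon as the shortest iterable runs out
print(list(map(add, [1, 2, 3], [10])))  # [11]
```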
Real-World Example: Data Cleaning
```python
# Cleaning messy user data
raw_data = [
    ' John Smith ',
    'JANE DOE',
    'bob wilson',
    ' Alice Brown '
]

def clean_name(name):
    """Standardize a name by:
    1. Removing leading and trailing spaces
    2. Capitalizing the first letter of each word
    3. Converting the remaining letters to lowercase"""
    return name.strip().title()

# Clean all names at once
clean_names = list(map(clean_name, raw_data))

print("\nCleaned Names:")
for original, cleaned in zip(raw_data, clean_names):
    print(f"Original: '{original}' → Cleaned: '{cleaned}'")
```
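Because map() accepts any callable, string methods such as `str.strip` and `str.title` can be passed directly, and map() calls can be chained. The sketch below is one alternative to `clean_name` that yields the same cleaned names; `raw_data` is copied from the example so the snippet is self-contained.

```python
raw_data = [' John Smith ', 'JANE DOE', 'bob wilson', ' Alice Brown ']

# str.strip and str.title are called as str.strip(name), str.title(name)
stripped = map(str.strip, raw_data)
clean_names = list(map(str.title, stripped))
print(clean_names)  # ['John Smith', 'Jane Doe', 'Bob Wilson', 'Alice Brown']

# A lambda works too when the transformation is a one-off
clean_names_lambda = list(map(lambda name: name.strip().title(), raw_data))
print(clean_names_lambda == clean_names)  # True
```

A named function like `clean_name` is still the better choice once the cleaning rules grow beyond a single expression.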
Filtering Data with filter()
Think of filter() as a quality control station on our assembly line. Each item is inspected, and only those meeting certain criteria are allowed to pass through.
Understanding filter()
```python
# Basic syntax: filter(condition_function, iterable)

# Example: Finding prime numbers
def is_prime(n):
    """Check if a number is prime"""
    if n < 2:
        return False
    # A composite n always has a divisor no larger than its square root
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True

numbers = range(1, 21)  # 1 through 20; range() excludes its stop value
primes = list(filter(is_prime, numbers))
print(f"Prime numbers up to 20: {primes}")

# Practical Example: Error Log Analysis
log_entries = [
    "INFO: System started",
    "ERROR: Database connection failed",
    "INFO: Processing file",
    "ERROR: Memory overflow",
    "WARNING: High CPU usage"
]

def is_error(entry):
    """Check if a log entry is an error"""
    return entry.startswith("ERROR")

# Finding all error messages
errors = list(filter(is_error, log_entries))
print("\nError Messages Found:")
for error in errors:
    print(f"- {error}")
```
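Two further behaviors of filter() are worth knowing. Passing None instead of a function keeps only truthy items, and, like map(), filter() returns a lazy iterator, so the condition runs only as items are requested. The `responses` list, the `is_even` function, and the `checked` log below are illustrative.

```python
# filter(None, ...) keeps truthy items (drops '', 0, None, and so on)
responses = ['yes', '', 'no', None, 'maybe', 0]
answered = list(filter(None, responses))
print(answered)  # ['yes', 'no', 'maybe']

# filter() is lazy: the condition runs only when an item is requested
checked = []

def is_even(n):
    checked.append(n)  # record each number the condition inspects
    return n % 2 == 0

evens = filter(is_even, range(10))
print(checked)              # [] (nothing inspected yet)
first_even = next(evens)
print(first_even, checked)  # 0 [0]
```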
Combining map() and filter()
```python
# Processing student grades
students = [
    {'name': 'Alice', 'grade': 85, 'attendance': 0.95},
    {'name': 'Bob', 'grade': 92, 'attendance': 0.88},
    {'name': 'Charlie', 'grade': 78, 'attendance': 0.92},
    {'name': 'Diana', 'grade': 95, 'attendance': 0.85},
]

# First filter for high performers, then get their names
# (with these thresholds, Bob and Diana qualify)
honor_roll = list(map(
    lambda s: s['name'],
    filter(
        lambda s: s['grade'] >= 90 and s['attendance'] >= 0.85,
        students
    )
))

print("\nHonor Roll Students:")
for student in honor_roll:
    print(f"- {student}")
```
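For comparison, the same filter-then-map pipeline can be written as a list comprehension, which many Python developers find easier to read once lambdas are involved. This sketch reuses the student records; the threshold constants and the `qualifies` helper are illustrative names, not part of the original example.

```python
students = [
    {'name': 'Alice', 'grade': 85, 'attendance': 0.95},
    {'name': 'Bob', 'grade': 92, 'attendance': 0.88},
    {'name': 'Charlie', 'grade': 78, 'attendance': 0.92},
    {'name': 'Diana', 'grade': 95, 'attendance': 0.85},
]

MIN_GRADE = 90          # illustrative thresholds
MIN_ATTENDANCE = 0.85

def qualifies(s):
    return s['grade'] >= MIN_GRADE and s['attendance'] >= MIN_ATTENDANCE

# map()/filter() version
with_builtins = list(map(lambda s: s['name'], filter(qualifies, students)))

# Equivalent comprehension: the filter becomes the if clause,
# the map becomes the expression at the front
with_comprehension = [s['name'] for s in students if qualifies(s)]

print(with_builtins)       # ['Bob', 'Diana']
print(with_comprehension)  # ['Bob', 'Diana']
```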
Organizing Data with sorted()
The sorted() function is like having a skilled librarian who can arrange books according to any criteria you specify. It accepts any iterable, returns a new sorted list, and its key and reverse parameters let you control exactly how items are ordered.
Basic and Advanced Sorting
```python
# Basic sorting
numbers = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
sorted_numbers = sorted(numbers)
print(f"Sorted numbers: {sorted_numbers}")

# Custom sorting with key function
books = [
    {'title': 'Python Basics', 'year': 2020, 'rating': 4.5},
    {'title': 'Advanced Python', 'year': 2019, 'rating': 4.8},
    {'title': 'Python Cookbook', 'year': 2021, 'rating': 4.6}
]

# Sorting by multiple criteria
def book_sort_key(book):
    """Sort books by rating (descending) then year (ascending)"""
    # Negating the rating makes higher ratings sort first
    return (-book['rating'], book['year'])

sorted_books = sorted(books, key=book_sort_key)
print("\nBooks by Rating (then Year):")
for book in sorted_books:
    print(f"{book['title']}: {book['rating']}★ ({book['year']})")

# Case-insensitive sorting
# (plain sorted(names) would put 'Bob' and 'David' first,
#  because uppercase letters sort before lowercase ones)
names = ['alice', 'Bob', 'charlie', 'David']
sorted_names = sorted(names, key=str.lower)
print(f"\nSorted names: {sorted_names}")
```
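The negation trick in `book_sort_key` works only for numeric fields. A general alternative relies on sorted() being stable (items with equal keys keep their relative order): sort by the secondary key first, then by the primary key. The sketch assumes an illustrative goal of title descending, then year ascending, and adds a hypothetical second "Python Basics" entry so the tie is visible. It also confirms that sorted() leaves the original list unchanged.

```python
from operator import itemgetter

books = [
    {'title': 'Python Basics', 'year': 2020, 'rating': 4.5},
    {'title': 'Advanced Python', 'year': 2019, 'rating': 4.8},
    {'title': 'Python Cookbook', 'year': 2021, 'rating': 4.6},
    {'title': 'Python Basics', 'year': 2018, 'rating': 4.5},  # hypothetical entry
]

# Strings can't be negated, so sort twice: secondary key first...
by_year = sorted(books, key=itemgetter('year'))
# ...then the primary key; reverse=True still preserves stability
by_title_desc = sorted(by_year, key=itemgetter('title'), reverse=True)

for book in by_title_desc:
    print(book['title'], book['year'])
# Python Cookbook 2021
# Python Basics 2018
# Python Basics 2020
# Advanced Python 2019

# sorted() built new lists; the original order is untouched
print(books[0]['year'])  # 2020
```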
Tracking Position with enumerate()
The enumerate() function is like adding numbered tabs to a document: it pairs each item with a running index (starting at 0 unless you pass start), so you always know where you are in the sequence.
Using enumerate() Effectively
```python
# Basic enumeration
tasks = ['Review code', 'Write tests', 'Update docs']
print("Today's Tasks:")
for index, task in enumerate(tasks, start=1):
    print(f"{index}. {task}")

# Practical Example: Finding Word Positions
def find_word_positions(text, target):
    """Return the 0-based word indices where target appears (case-insensitive)"""
    words = text.split()
    return [
        pos
        for pos, word in enumerate(words)
        if word.lower() == target.lower()
    ]

text = "The quick brown fox jumps over the lazy fox"
fox_positions = find_word_positions(text, "fox")  # [3, 8]
print(f"\n'Fox' appears at positions: {fox_positions}")

# Creating indexed dictionaries
colors = ['red', 'green', 'blue']
color_dict = {
    index: color
    for index, color in enumerate(colors)
}
print(f"\nColor lookup: {color_dict}")
```
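Under the hood, enumerate() yields `(index, item)` tuples, which is why `for index, task in ...` unpacks into two names. This sketch shows those tuples directly, next to the manual counter that enumerate() replaces.

```python
tasks = ['Review code', 'Write tests', 'Update docs']

# enumerate() produces (index, item) pairs; start sets the first index
pairs = list(enumerate(tasks, start=1))
print(pairs)  # [(1, 'Review code'), (2, 'Write tests'), (3, 'Update docs')]

# The manual bookkeeping that enumerate() replaces
manual = []
index = 1
for task in tasks:
    manual.append((index, task))
    index += 1

print(manual == pairs)  # True
```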
Practice Exercise: Advanced Data Processing
Let's combine several of these functions to process some sample sales data:
```python
# Sales data analysis
sales_data = [
    {'product': 'Laptop', 'price': 999.99, 'units': 50},
    {'product': 'Mouse', 'price': 29.99, 'units': 200},
    {'product': 'Keyboard', 'price': 79.99, 'units': 100},
    {'product': 'Monitor', 'price': 299.99, 'units': 75}
]

# Your tasks:
# 1. Calculate total revenue for each product
# 2. Filter for products with revenue > $5000
# 3. Sort by revenue (descending)
# 4. Create a numbered report of top performers

# Solution walkthrough:
def calculate_revenue(item):
    """Return a copy of the sales item with a 'revenue' field added"""
    return {
        **item,
        'revenue': item['price'] * item['units']
    }

# Steps 1 & 2: Calculate revenue and filter
# (both are lazy; the work happens when sorted() consumes them)
high_performers = filter(
    lambda x: x['revenue'] > 5000,
    map(calculate_revenue, sales_data)
)

# Step 3: Sort by revenue
sorted_performers = sorted(
    high_performers,
    key=lambda x: x['revenue'],
    reverse=True
)

# Step 4: Create report
print("Top Performing Products:")
for i, product in enumerate(sorted_performers, 1):
    print(f"{i}. {product['product']}: "
          f"${product['revenue']:,.2f}")
```
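The same four steps can also be written with a comprehension, a generator expression, and sorted(). With the exercise's $5,000 threshold every product qualifies (the smallest revenue, the Mouse's, is about $5,998), so this sketch makes the threshold a parameter and uses a higher value to show the filter step doing real work. The `top_performers` function and its `threshold` parameter are hypothetical names introduced for illustration.

```python
sales_data = [
    {'product': 'Laptop', 'price': 999.99, 'units': 50},
    {'product': 'Mouse', 'price': 29.99, 'units': 200},
    {'product': 'Keyboard', 'price': 79.99, 'units': 100},
    {'product': 'Monitor', 'price': 299.99, 'units': 75},
]

def top_performers(data, threshold=5000):
    """Hypothetical helper: steps 1-3 of the exercise."""
    # Step 1: copy each item and add its revenue
    with_revenue = [{**item, 'revenue': item['price'] * item['units']}
                    for item in data]
    # Steps 2-3: keep items above the threshold, highest revenue first
    return sorted(
        (item for item in with_revenue if item['revenue'] > threshold),
        key=lambda item: item['revenue'],
        reverse=True,
    )

# Step 4: numbered report with a stricter, illustrative threshold
for i, product in enumerate(top_performers(sales_data, threshold=20000), 1):
    print(f"{i}. {product['product']}: ${product['revenue']:,.2f}")
# 1. Laptop: $49,999.50
# 2. Monitor: $22,499.25
```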
Looking Ahead
Now that you've mastered these advanced functions, you're ready to explore iterable analysis tools in Part 3. We'll look at functions like len(), max(), min(), and others that help you gain insights from your data.