SQL Aggregate Functions

Understanding How to Analyze and Summarize Data in SQL

Step 1: Understanding the Problem

The Cat Shelter Database

Imagine you're managing a cat shelter, and you need to analyze information about your cats and their toys. This is similar to how a school might need to analyze student enrollment numbers or how a library might track book statistics. We need to understand aggregate functions that help us answer questions like "How many cats do we have?" and "What's the age range of our cats?"

Real-World Analogy

Think of aggregate functions like different ways of analyzing a jar of marbles:

COUNT is like counting how many marbles you have in total.

MIN is like finding the smallest marble in the jar.

MAX is like finding the largest marble.

GROUP BY is like sorting marbles by color and counting each group.

HAVING is like only showing color groups that have more than a certain number of marbles.

Our Database Structure

CREATE TABLE cats (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    birth_year INTEGER
);

CREATE TABLE toys (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT,
    cat_id INTEGER,
    FOREIGN KEY (cat_id) REFERENCES cats(id) ON DELETE CASCADE
);

Step 2: Devising a Plan

1. Start with basic counting to understand the dataset size

2. Analyze age-related data using MIN and MAX

3. Group data to understand toy distribution

4. Filter grouped data to find patterns

5. Combine techniques for complex analysis

Step 3: Carrying Out the Plan

Basic Counting and Analysis

-- Count total number of cats
SELECT COUNT(*) as total_cats
FROM cats;

-- Find the oldest and youngest cats
SELECT 
    MIN(birth_year) as oldest_cat_birth_year,
    MAX(birth_year) as youngest_cat_birth_year
FROM cats;

Let's understand what each query does:

The COUNT query is like taking attendance in a classroom - it tells us exactly how many cats we have in our database. The asterisk (*) means we're counting all rows.

For finding the oldest and youngest cats, we use MIN and MAX on the birth_year column. Remember that the higher the birth year, the younger the cat - just like how a person born in 2020 is younger than someone born in 2000.

Advanced Analysis with GROUP BY and HAVING

-- Count toys per cat
SELECT 
    cats.name,
    COUNT(toys.id) as toy_count
FROM cats
LEFT JOIN toys ON cats.id = toys.cat_id
GROUP BY cats.id, cats.name;

-- Find cats with multiple toys
SELECT 
    cats.name,
    COUNT(toys.id) as toy_count
FROM cats
LEFT JOIN toys ON cats.id = toys.cat_id
GROUP BY cats.id, cats.name
HAVING COUNT(toys.id) >= 2;

These more complex queries help us understand patterns in our data:

The first query is like creating a toy inventory for each cat. We use GROUP BY to organize toys by cat, similar to sorting a child's toys into separate bins.

The second query finds our "spoiled" cats - those with multiple toys. The HAVING clause filters our groups after they're formed, like checking which toy bins are especially full.

Step 4: Looking Back and Reflecting

Verifying Our Results

For each query, we should verify that:

Our COUNT results match the visible rows in the table

The birth years make sense (no cats from 1800 or 2025)

Every cat appears in the toy count, even those with zero toys

The "spoiled cats" list correctly identifies those with multiple toys

Common Challenges and Solutions

When working with aggregates, watch out for:

Forgetting to use LEFT JOIN when you want to include cats with no toys

Confusing GROUP BY requirements with different SQL engines

Missing NULL values in your calculations

Mixing aggregated and non-aggregated columns incorrectly

Additional Practice Scenarios

Try These Queries

Write queries to find:

The average birth year of all cats

How many cats were born in each year

The cats with the most toys (top 3)

Years where we have more than one cat born

Real-World Applications

These aggregate patterns appear in many scenarios:

Retail: Analyzing sales totals by department

Education: Calculating class size averages

Healthcare: Tracking patient visits per doctor

Social Media: Counting posts per user

Further Learning

Advanced Topics to Explore

To deepen your understanding, consider learning about:

Window functions for advanced aggregation

Subqueries with aggregates

Complex grouping with ROLLUP and CUBE

Performance optimization for aggregate queries