A Complete Guide to Relational Database Design

Understanding Database Design Through Real-World Examples

Imagine you're organizing a large library. Before you place a single book on the shelves, you need to decide how to organize everything. Will you group books by genre? By author? How will you track which books are checked out, and by whom? These are the exact kinds of questions we ask when designing a database. Just as a well-organized library makes it easy to find and manage books, a well-designed database makes it easy to store and retrieve information.

Database design is like creating a blueprint for your data's home. Just as architects carefully plan each room's purpose and how they connect, database designers must think about what data they need to store and how different pieces of information relate to each other.

The Four Stages of Relational Database Design

Let's walk through each stage of database design using a practical example: designing a system for a small online bookstore. We'll build our understanding step by step, seeing how each decision affects the final design.

Stage 1: Define the Purpose and Entities

The first stage is like creating a list of requirements for building a house. What rooms do you need? What will happen in each room? Similarly, we need to identify what our database needs to track and store.

For our bookstore example, let's think through the requirements:

/* Purpose of our database:
 * - Track inventory of books
 * - Manage customer information
 * - Process orders
 * - Maintain author information
 */

/* Main entities (tables) we'll need:
 * Books - to store information about each book
 * Authors - to store information about book authors
 * Customers - to store customer details
 * Orders - to track purchases
 */

-- Here's how we might create our books table:
CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    isbn VARCHAR(13),
    publication_year INTEGER,
    price DECIMAL(10,2),
    description TEXT,
    stock_quantity INTEGER
);

-- And our customers table:
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email VARCHAR(100) UNIQUE,
    address TEXT,
    phone VARCHAR(20)
);

Stage 2: Identify Primary Keys

Think of primary keys like unique identifiers in the real world - just as every person has a unique social security number, every record in our database needs a unique identifier. This helps us avoid confusion and maintain data integrity.

/* For our bookstore, we need primary keys for:
 * - Books (book_id)
 * - Authors (author_id)
 * - Customers (customer_id)
 * - Orders (order_id)
 */

CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY,  -- Unique identifier for each author
    name TEXT NOT NULL,
    biography TEXT,
    birth_year INTEGER
);

CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,  -- Unique identifier for each order
    customer_id INTEGER,           -- Will be a foreign key
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    total_amount DECIMAL(10,2),
    status VARCHAR(50)
);

Stage 3: Establish Table Relationships

Now we need to connect our tables in meaningful ways. Think of this like drawing lines between related pieces of information. In our bookstore, we need to know which authors wrote which books, which customers placed which orders, and which books are in each order.

-- Adding foreign keys to connect our tables
CREATE TABLE book_authors (
    book_id INTEGER,
    author_id INTEGER,
    FOREIGN KEY (book_id) REFERENCES books(id),
    FOREIGN KEY (author_id) REFERENCES authors(author_id),
    PRIMARY KEY (book_id, author_id)  -- Books can have multiple authors
);

CREATE TABLE order_details (
    order_id INTEGER,
    book_id INTEGER,
    quantity INTEGER,
    price_at_time DECIMAL(10,2),
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (book_id) REFERENCES books(id)
);

Stage 4: Apply Normalization Rules

Normalization is like organizing a messy closet. We want to ensure everything has its proper place and there's no unnecessary duplication. Let's look at how we apply the three main normal forms to our bookstore database.

First Normal Form (1NF): Ensure atomic values

-- Bad design (violates 1NF):
CREATE TABLE books_bad (
    id INTEGER PRIMARY KEY,
    title TEXT,
    authors TEXT  -- Contains multiple authors as "Author1, Author2"
);

-- Good design (follows 1NF):
CREATE TABLE books_good (
    id INTEGER PRIMARY KEY,
    title TEXT
);

CREATE TABLE book_authors (
    book_id INTEGER,
    author_id INTEGER
);

Second Normal Form (2NF): Remove partial dependencies

-- Bad design (violates 2NF):
CREATE TABLE order_items_bad (
    order_id INTEGER,
    book_id INTEGER,
    quantity INTEGER,
    book_title TEXT,  -- This depends only on book_id, not the full key
    price DECIMAL(10,2)
);

-- Good design (follows 2NF):
CREATE TABLE order_items_good (
    order_id INTEGER,
    book_id INTEGER,
    quantity INTEGER,
    price_at_time DECIMAL(10,2)  -- Historical price when ordered
);

Third Normal Form (3NF): Remove transitive dependencies

-- Bad design (violates 3NF):
CREATE TABLE orders_bad (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    customer_zip_code TEXT,  -- This depends on customer_id, not the order id
    order_date TIMESTAMP
);

-- Good design (follows 3NF):
CREATE TABLE orders_good (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    order_date TIMESTAMP
);

Putting It All Together: A Complete Example

Let's see how all these concepts come together in a complete database design for our bookstore:

-- Create the authors table
CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    biography TEXT,
    birth_year INTEGER
);

-- Create the books table
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    isbn VARCHAR(13) UNIQUE,
    publication_year INTEGER,
    price DECIMAL(10,2),
    stock_quantity INTEGER,
    CONSTRAINT valid_isbn CHECK (LENGTH(isbn) = 13),
    CONSTRAINT valid_year CHECK (publication_year > 1000)
);

-- Create the book_authors join table
CREATE TABLE book_authors (
    book_id INTEGER,
    author_id INTEGER,
    author_order INTEGER,  -- For multiple authors
    PRIMARY KEY (book_id, author_id),
    FOREIGN KEY (book_id) REFERENCES books(book_id),
    FOREIGN KEY (author_id) REFERENCES authors(author_id)
);

-- Create the customers table
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email VARCHAR(100) UNIQUE,
    address TEXT,
    phone VARCHAR(20)
);

-- Create the orders table
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(50) DEFAULT 'pending',
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Create the order_details table
CREATE TABLE order_details (
    order_id INTEGER,
    book_id INTEGER,
    quantity INTEGER CHECK (quantity > 0),
    price_at_time DECIMAL(10,2),
    PRIMARY KEY (order_id, book_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (book_id) REFERENCES books(book_id)
);

Let's add some sample data to see how it all works together:

-- Add an author
INSERT INTO authors (author_id, name, birth_year) VALUES
    (1, 'Jane Austen', 1775);

-- Add a book
INSERT INTO books (book_id, title, isbn, price) VALUES
    (1, 'Pride and Prejudice', '9780141439518', 12.99);

-- Connect author to book
INSERT INTO book_authors (book_id, author_id, author_order) VALUES
    (1, 1, 1);

-- Add a customer
INSERT INTO customers (customer_id, name, email) VALUES
    (1, 'John Smith', 'john@example.com');

-- Create an order
INSERT INTO orders (order_id, customer_id) VALUES
    (1, 1);

-- Add books to the order
INSERT INTO order_details (order_id, book_id, quantity, price_at_time) VALUES
    (1, 1, 2, 12.99);

Best Practices and Common Pitfalls

When designing your database, keep these important principles in mind:

Choose appropriate data types: Use the most specific data type that fits your needs. For example, use DATE for dates instead of TEXT, and DECIMAL for monetary values instead of FLOAT.

Plan for growth: Design your schema to accommodate future requirements. For example, don't assume a user will only have one phone number or address.

Use meaningful constraints: Add CHECK constraints, NOT NULL constraints, and UNIQUE constraints where appropriate to maintain data integrity.

-- Example of good constraint usage
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    price DECIMAL(10,2) CHECK (price >= 0),
    sku VARCHAR(20) UNIQUE,
    status VARCHAR(20) CHECK (status IN ('active', 'discontinued', 'out_of_stock'))
);

Index strategically: Create indexes on columns that are frequently used in WHERE clauses and joins, but remember that too many indexes can slow down write operations.

-- Create indexes for frequently queried columns
CREATE INDEX idx_books_isbn ON books(isbn);
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_order_details_book ON order_details(book_id);

Conclusion

Good database design is like building a strong foundation for a house - it supports everything that comes after. By following the four stages of relational database design and applying normalization rules, you create a database structure that is efficient, maintainable, and reliable.

Remember that database design is often iterative. You might need to revisit and refine your design as you better understand your data requirements. Don't be afraid to make changes early in the process - it's much easier to modify your design before you have lots of data and dependent applications.