Welcome
Database Management System

Course: COMP 163
Time: 08:00 AM - 09:20 AM on Monday, Wednesday, Friday
Location: John T Chambers Technology Center 114 (CTC 114)

Week Topic Assignment (Canvas + GitHub) Database Comment
1 (Jan 13, 15, 17) Introduction to Databases Management System Lab 1 + Homework 1 (2w) SQLite
2 (Jan 22, 24) ERM and SQL Holiday on Monday
3 (Jan 27, 29, 31) Relational Algebra Lab 2 + Homework 2 (2w)
4 (Feb 3, 5, 7) Relational Database Interfaces PostgreSQL
5 (Feb 10, 12, 14) Relational Database Advanced SQL Lab 3 + Homework 3 (2w), Project Part 1 (4w)
6 (Feb 19, 21) Relational Database Normalization Holiday on Monday
7 (Feb 24, 26, 28) Relational Database Table Indexing MariaDB
8 (Mar 3, 5, 7) Midterm Preparation Midterm (Mar 7)
9 ( - ) Holiday Week
10 (Mar 17, 19, 21) Relational Database Transactions Lab 4 + Homework 4 (2w), Project Part 2 (4w) OracleSQL
11 (Mar 24, 26, 28) Relational Database Security
12 (Mar 31, Apr 2, Apr 4) Relational Database and Data Distribution Lab 5, Homework 5 (2w)
13 (Apr 14, 16, 18) Relational Database and Data Aggregation ClickHouse
14 (Apr 21, 23, 25) Project Presentation Project Presentation (15 min)
15 (Apr 28) Final Preparation Classes End on Tuesday (Apr 29)
16 (May 7) Final (May 7, 10-11 AM)
Today
  1. Review Topics
  2. Review Syllabus
Module 1
Introduction to Databases Management System
Before 1900: Manual Systems
  • Before 1440 - Manual Writing: Information was written by hand, making copying and dissemination extremely slow and labor-intensive
  • 1440 - Printing Press: Invented by Johannes Gutenberg, it enabled faster copying and dissemination of written information
  • 1800s - Shipping Letters: Physical mail was sent by ship. A letter from here to Hawaii could take up to two weeks to arrive
20th Century: The Digital Revolution
  • 1970 - Relational Databases: Edgar F. Codd introduced the relational model, which became the foundation of modern database management systems (DBMS)
  • 1980s - SQL Standardization: Structured Query Language (SQL) became the standardized way to interact with relational databases
  • 1990s - Client-Server Databases: Systems like Oracle and Microsoft SQL Server facilitated centralized databases accessible by multiple users over a network
21st Century: Connectivity and Real-Time Revolution
  • 2000s - NoSQL Databases: Non-relational databases like MongoDB and Cassandra emerged to handle unstructured and semi-structured data
  • 2010s - Cloud Databases: Platforms like AWS and Azure enabled scalable, distributed databases accessible over the internet
  • 2010s - Big Data: Systems like Hadoop and Spark allowed storage and processing of massive datasets across distributed environments
  • 2020s - AI Integration: Artificial intelligence began improving query optimization, data analysis, and automated insights in databases
  • 2020s - Edge Databases: Localized databases for IoT and real-time applications reduced latency and improved data processing efficiency
What is Databases Management System 🤔

Databases solve problems of managing, storing, and retrieving vast amounts of data efficiently.

  • Why centralize data? To avoid duplication and inconsistencies
  • Why organize data? To make access faster and more reliable
  • Why automate data management? To save time and reduce errors
Why Databases Management System 🤔
IoT Devices
Full Stack Engineer
More reasons
  • Explosion of Data: The amount of data generated globally is growing exponentially, requiring systems that can store, manage, and retrieve it efficiently
  • Core Value Across Domains: Data is the foundation of industries like healthcare, finance, e-commerce, and social media, driving insights and innovation
  • Reliability and Efficiency: DBMS ensure data integrity, availability, and security in critical applications
  • Provides a foundation for understanding unstructured data and the evolution of NoSQL databases
Why Relational Databases Management System?
RDMS
Module 2
ERM and SQL
Every Database Management System Starts with Requirements

Requirement

The system should allow users to rent bikes easily

Requirement

The system should allow registered users to rent a bike by selecting a location, choosing an available bike, and completing payment through the application within three steps

Requirement

Purpose:Bike Rental Database System to Manage rentals and track customers, bikes, and transactions


  • Track the availability of bikes, including their unique ID, condition, and location
  • Log rental transactions, incl. the bike ID, customer ID, rental start and return time
  • Maintain customer records, including name, contact details, and rental history
Translating Requirements into Entities and Relationships
What Are Entities?
Definition: Entities represent real-world objects or concepts.
What Are Entities?

Entities are nouns

Examples: Customer, Bike, Rental Transaction

What Are Relationships?

Definition: Relationships define how entities are connected

What Are Relationships?

Relationships are verbs

Examples: Rents, Is part of

Question?
Can Two Entities Have the Same Name?
Are Entity Names Case-Sensitive?

Does it matter if you name an entity Customer or customer

Is it an Entity or a Relationship?

Question: A "Customer rents a Bike."

Is it an Entity or a Relationship?

Question: A "Bike."

Requirement

A travel booking system should allow users to search for, compare, and book flights, hotels, and rental cars. Users must be able to filter results by price, date, and location, and securely complete payments

Requirement

A travel booking system should allow users to search for, compare, and book flights, hotels, and rental cars. Users must be able to filter results by price, date, and location, and securely complete payments

Question?
Translating Textual Requirements into Visual Requirements
Entity-Relationship Diagram (ERD)
Entity-Relationship Diagram (ERD)
Each Entity has a list of Attributes
Technical Mandatory Attributes

ID Attributes: IDs are technically mandatory and serve as the primary key to uniquely identify each entity record.

ERD with primary key attribute
Entity-Relationship Diagram (ERD)
Conceptual Attributes

Description: Optional attributes provide additional details about an entity but are not required to uniquely identify it

ERD with additional attributes
Entity-Relationship Diagram (ERD)
Each Attribute Has a Data Type
Entity Attributes with Data Types
Entity-Relationship Diagram (ERD) showcasing attributes and their data types
SQL Standard Data Types
Data Type Description Example
INTEGER A whole number, no decimal points 42 (Max: ±2,147,483,647)
BIGINT A large integer value 9223372036854775807 (Max: ±9,223,372,036,854,775,807)
DECIMAL(p, s) Exact numeric value with specified precision (p) and scale (s) DECIMAL(10, 2): 12345.67 (Max: 10 digits, 2 after decimal)
FLOAT Approximate numeric value with floating-point precision 3.14159 (Precision depends on implementation)
CHAR(n) Fixed-length character string with length n CHAR(5): 'Hello' (Max: Defined by n)
VARCHAR(n) Variable-length character string with maximum length n VARCHAR(50): 'Mountain Bike' (Max: Defined by n)
TEXT Variable-length text data 'This is a long text.' (Max: Implementation dependent, e.g., 4GB in MySQL)
DATE Date value (YYYY-MM-DD) 2025-01-14 (Range: 1000-01-01 to 9999-12-31)
TIME Time value (HH:MM:SS) 14:30:00 (Range: -838:59:59 to 838:59:59 in MySQL)
TIMESTAMP Date and time value (YYYY-MM-DD HH:MM:SS) 2025-01-14 14:30:00 (Range: 1970-01-01 to 2038-01-19 in UNIX-based systems)
BOOLEAN Logical value (TRUE or FALSE) TRUE (Stored as 1 or 0 in most databases)
BLOB Binary Large Object for storing binary data [Binary data] (Max: Implementation dependent, e.g., 4GB in MySQL)
Question?
What Data Type Do You Recommend?
Entity-Relationship Diagram (ERD) showcasing attributes and their data types

INTEGER, BIGINT, DECIMAL(p, s), FLOAT, CHAR(n), VARCHAR(n), TEXT, DATE, TIME, TIMESTAMP, BOOLEAN, BLOB

Question?
Each Attribute Can Have Constraints
Entity Attributes with Constraints
Entity-Relationship Diagram (ERD) showcasing attributes and their constraints
SQL Attribute Constraints
Constraint Description
PRIMARY KEY Uniquely identifies each record in a table. Automatically implies NOT NULL
FOREIGN KEY Ensures referential integrity by linking a column to the primary key in another table
NOT NULL Ensures a column cannot have NULL values
UNIQUE Ensures all values in a column are distinct
DEFAULT Sets a default value for a column when no value is provided
INDEX Improves the speed of data retrieval operations on a table
AUTO_INCREMENT Automatically generates a unique value for each new record in a column
UNIQUE + NOT NULL Combination often used for candidate keys or alternate unique identifiers
Question?
From ERD to Database Schema
Conceptual vs. Technical (SQL) Terms
Conceptual Term Technical Term
Diagram Schema
Entity Table
Attribute Column
Record Row
Identifier Primary Key
Relationship Foreign Key
Optional Attribute Nullable Column
Mandatory Attribute NOT NULL Column
Entity-Relationship Diagram (ERD) showcasing attributes and their constraints

CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,
    Name VARCHAR(100) NOT NULL,
    Email VARCHAR(100) UNIQUE,
    Phone VARCHAR(15)
);

CREATE TABLE Bike (
    BikeID INTEGER PRIMARY KEY AUTOINCREMENT,
    Model VARCHAR(100) NOT NULL,
    Color VARCHAR(50),
    Condition VARCHAR(50),
    Availability BOOLEAN DEFAULT TRUE
);

CREATE TABLE RentalTransaction (
    RentalID INTEGER PRIMARY KEY AUTOINCREMENT,
    CustomerID INTEGER NOT NULL,
    BikeID INTEGER NOT NULL,
    BookingDate DATE NOT NULL,
    ReturnDate DATE,
    FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID),
    FOREIGN KEY (BikeID) REFERENCES Bike(BikeID)
);
                        
Question?
Practice SQL Queries

    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,
        Name VARCHAR(100) NOT NULL,
        Email VARCHAR(100) UNIQUE,
        Phone VARCHAR(15)
    );

    CREATE TABLE Bike (
        BikeID INTEGER PRIMARY KEY AUTOINCREMENT,
        Model VARCHAR(100) NOT NULL,
        Color VARCHAR(50),
        Condition VARCHAR(50),
        Availability BOOLEAN DEFAULT TRUE
    );

    CREATE TABLE RentalTransaction (
        RentalID INTEGER PRIMARY KEY AUTOINCREMENT,
        CustomerID INTEGER NOT NULL,
        BikeID INTEGER NOT NULL,
        BookingDate DATE NOT NULL,
        ReturnDate DATE,
        FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID),
        FOREIGN KEY (BikeID) REFERENCES Bike(BikeID)
    );
                
Today: January 17, 2025
  • Review Questions
  • Case Study:
    • Plant Watering Database Application
    • Entity Relationship Diagram
    • SQLite: Embedded Device Database
    • SQLite: Installation
    • SQLite: CRUD Query
  • Review Lab 1, Homework 1
Review Questions

Why must Front-End, Back-End, and Database data types be aligned?

Examples of Misaligned Data Types
Front-End Back-End Database Result
Text input for phone number String validation for format INTEGER type Formatted numbers (e.g., +1-123-456) are rejected by the database
Dropdown allows decimal selection Expects integer values FLOAT type Back-end fails due to unexpected decimal input
Date picker with local format Assumes ISO 8601 format DATE type Date conversion errors lead to failed database insertion
Challenges of Misaligned Data Types
  • Validation Errors
  • Data Loss
  • Format Mismatch
  • Database Rejections
  • Unexpected Behavior
Entity-Relationship Diagram: Quiz (1/3)

Question 1: What does an entity represent in an ERD?

  • A. A database table
  • B. A real-world object or concept
  • C. A relationship between tables
  • D. A data attribute
Entity-Relationship Diagram: Quiz (2/3)

Question 2: What is the purpose of a relationship in an ERD?

  • A. To store data
  • B. To define data types
  • C. To show how entities connect
  • D. To create unique identifiers
Entity-Relationship Diagram: Quiz (3/3)

Question 3: What is the purpose of the primary key in a database?

  • A. To allow duplicate rows in a table
  • B. To uniquely identify each row in a table
  • C. To establish a relationship between two tables
  • D. To enforce data types for attributes
Database Quiz: Tables (1/3)

Question 1: What does a table in a database represent?

  • A. A collection of databases
  • B. A structured collection of rows and columns
  • C. A single data type
  • D. A relationship between entities
Database Quiz: Table Attributes (2/3)

Question 2: What does a table attribute represent?

  • A. A row in the table
  • B. A unique identifier
  • C. A column that defines a property of the entity
  • D. A collection of tables
Database Quiz: Data Types (3/3)

Question 3: Which of the following is a valid SQL data type?

  • A. INTEGER
  • B. COLUMN
  • C. TABLE
  • D. RELATION
Database Quiz: Tables (1/3)

Question 1: What does a table in a database represent?

  • A. A collection of databases
  • B. A structured collection of rows and columns
  • C. A single data type
  • D. A relationship between entities
Database Quiz: Constraints (1/3)

Question 1: What is the purpose of a primary key?

  • A. To ensure a column accepts NULL values
  • B. To uniquely identify each row in a table
  • C. To define relationships between databases
  • D. To store default values
Database Quiz: Constraints (2/3)

Question 2: What does the NOT NULL constraint ensure?

  • A. A column cannot have duplicate values
  • B. A column cannot have NULL values
  • C. A column must be a primary key
  • D. A column stores only numeric values
Database Quiz: SQL Queries (1/2)

Question 1: How do you select all bicycles that are available?

  • A. SELECT * FROM Bicycle WHERE Available = TRUE;
  • B. SELECT BicycleID FROM Bicycle;
  • C. SELECT * FROM Bicycle WHERE Available = FALSE;
  • D. SELECT All FROM Bicycle;
Database Quiz: SQL Queries (2/2)

Question 2: How do you update bicycle 2 to set it as unavailable?

  • A. UPDATE Bicycle SET Available = TRUE WHERE BicycleID = 2;
  • B. UPDATE Bicycle SET Available = FALSE WHERE BicycleID = 2;
  • C. UPDATE Bicycle SET BicycleID = 2 WHERE Available = FALSE;
  • D. UPDATE Bicycle SET Availability = FALSE;
Question?
Case Study

Plant Watering Database Application

Plant Watering Database Application
Arduino Soil Moisture Sensor
Database Requirements
  • Store sensor readings
  • Save plant-specific moisture thresholds
  • Log watering events and timestamps

What entities do you recommend?

PlantUML Template Editor
@startuml
entity "Entity1" as E1 {
    +Attribute1 : 
    +Attribute2 : 
    +Attribute3 : 
}

entity "Entity2" as E2 {
    +Attribute1 : 
    +Attribute2 : 
    +Attribute3 : 
}

entity "Entity3" as E3 {
    +Attribute1 : 
    +Attribute2 : 
    +Attribute3 : 
}
@enduml
                    
Which SQL Database do you Recommend?
Arduino Soil Moisture Sensor
Why SQLite?
  • Lightweight and fast
  • Zero configuration required
  • Self-contained, no server needed
  • Widely used in embedded systems
  • Reliable with extensive testing
  • Open source and free to use
Chronology of SQLite Versions
  • Version 1.0 (2000): Initial release, basic SQL support
  • Version 2.0 (2001): Added transactions and improved stability
  • Version 3.0 (2004): Major update with support for UTF-8, BLOBs, and more queries

View the complete SQLite timeline

Install SQLite
SQLite Workflow
  1. Create Database:
  2. sqlite3 database_name.db
                            
  3. Create Table:
  4. CREATE TABLE table_name (
        column1 datatype,
        column2 datatype,
        ...
    );
                            
  5. Insert Data:
  6. INSERT INTO table_name (column1, column2, ...)
    VALUES (value1, value2, ...);
                            
Executing SQL Commands
  • From a File:
  • sqlite3 database_name.db < script.sql
                            
  • Directly in Command Line:
  • sqlite3 database_name.db
    > .read script.sql
    > SELECT * FROM table_name;
                            
SQLite Dot Commands
  • .databases: Lists all attached databases
  • .tables: Displays all tables in the current database
  • .schema: Shows the schema of a table or the entire database
  • .read: Execute SQL scripts without typing them manually (e.g., create.sql).
  • .exit: Exits the SQLite shell
  • .help: Lists all available dot commands

Documentation

SQL Query: Plants Needing Water
SELECT 
    PlantID,
    Name,
    WaterLevel,
    WaterThreshold
FROM 
    SensorData
WHERE 
    WaterLevel < (SELECT WaterThreshold FROM Plant WHERE Plant.PlantID = SensorData.PlantID)
ORDER BY 
    Name;
                    
Plant Watering Application: ERD to SQL (1/3)
PlantUML SQL Create Query
                entity "Plant" as Plant {
                    +PlantID : INTEGER
                    +Name : VARCHAR(100)
                    +WaterThreshold : FLOAT
                }
                                    
                CREATE TABLE Plant (
                    PlantID INTEGER PRIMARY KEY,
                    Name VARCHAR(100) NOT NULL,
                    WaterThreshold FLOAT
                );
                                    
Plant Watering Application: ERD to SQL (2/3)
PlantUML SQL Create Query
                entity "Sensor" as Sensor {
                    +SensorID : INTEGER
                    +Location : VARCHAR(100)
                    +Type : VARCHAR(50)
                }
                                    
                CREATE TABLE Sensor (
                    SensorID INTEGER PRIMARY KEY,
                    Location VARCHAR(100),
                    Type VARCHAR(50)
                );
                                    
                entity "SensorData" as SensorData {
                    +SensorDataID : INTEGER
                    +SensorID : INTEGER
                    +PlantID : INTEGER
                    +WaterLevel : FLOAT
                    +Timestamp : DATETIME
                }
                                    
                CREATE TABLE SensorData (
                    SensorDataID INTEGER PRIMARY KEY,
                    SensorID INTEGER NOT NULL,
                    PlantID INTEGER NOT NULL,
                    WaterLevel FLOAT,
                    Timestamp DATETIME,
                    FOREIGN KEY (SensorID) REFERENCES Sensor(SensorID),
                    FOREIGN KEY (PlantID) REFERENCES Plant(PlantID)
                );
                                    
Plant Watering Application: ERD to SQL (3/3)
PlantUML SQL Create Query
                entity "WateringLog" as WateringLog {
                    +LogID : INTEGER
                    +PlantID : INTEGER
                    +WateredAt : DATETIME
                }
                                    
                CREATE TABLE WateringLog (
                    LogID INTEGER PRIMARY KEY,
                    PlantID INTEGER NOT NULL,
                    WateredAt DATETIME,
                    FOREIGN KEY (PlantID) REFERENCES Plant(PlantID)
                );
                                    
Today: January 22, 2025
  • Review Questions
  • Modeling Relationship in ERM
  • Modeling Relationship in SQL
  • Query Types in SQL
  • Review Lab 1, Homework 1
SQLite Queries

What is a difference between SQL and SQLite?

  • A. SQL is a standard language, while SQLite is a database management system
  • B. SQLite is serverless, while SQL databases usually require a server
  • C. SQL does not support transactions, but SQLite does
  • D. SQLite can only be used with web applications
SQLite Queries

Which SQLite command is used to display the table schema?

  • A. .show table_name;
  • B. DESCRIBE table_name;
  • C. .schema table_name;
  • D. SHOW COLUMNS FROM table_name;
SQLite Queries

Which SQLite command executes SQL statements from a file?

  • A. .import file_name.sql;
  • B. .run file_name.sql;
  • C. .execute file_name.sql;
  • D. .read file_name.sql;
SQL Queries

Which is the correct syntax to create a table in SQLite?

  • A. CREATE DATABASE Students;
  • B. CREATE TABLE Students (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
  • C. CREATE Students TABLE (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
  • D. MAKE TABLE Students (ID INTEGER, Name TEXT, Age INTEGER);
SQL Queries

Which syntax updates the age of student 'Alice' in the table?

  • A. UPDATE Students SET Age = 23 WHERE Name = 'Alice';
  • B. MODIFY Students SET Age = 23 WHERE Name = 'Alice';
  • C. CHANGE Students SET Age = 23 WHERE Name = 'Alice';
  • D. ALTER Students SET Age = 23 WHERE Name = 'Alice';
SQL Queries

How do you write an SQL query to create a table named 'Book' with columns for ID, Title, and Author?

Please write answer here

Question?
Today: January 22, 2025
  • Review Questions ✓
  • Modeling Relationship in ERM
  • Modeling Relationship in SQL
  • Query Types in SQL
  • Review Lab 1, Homework 1

How to model relationships between entities in ERD?

1 Entity
0 Relationship

One-to-One ERD

2 Entity
1 Relationship

One-to-Many ERD

3 Entity
2 Relationship

Many-to-Many ERD
Why Use Minimum Relationships?
  • Performance: Fewer relationships mean faster queries
  • Simplicity: Easier to design, maintain, and understand
  • Storage Efficiency: Reduces unnecessary data duplication
  • Query Optimization: Minimizes complex joins and indexing overhead
  • Consistency: Less chance of data inconsistencies
Relationship Types (Cardinality)
1:1

Each record links to one record

Entities: 1

1:N

One record links to many records

Entities: 2

N:N

Many records link to many records

Entities: 3

When to Use Relationship Type?
Cardinality When to Use Example
1:1 When each record in one table corresponds to exactly one record in another table User and Profile
1:N When one record can be associated with multiple records in another table Library and Books
N:N When multiple records in both tables can be related to each other Students and Courses

How to query relationships between entities in ERD?


SELECT Book.title FROM Book WHERE Book.isbnId = '978-1-';

SELECT Book.title FROM Book, ISBN WHERE Book.bookId = ISBN.bookId AND ISBN.isbnId = '978-1-';

SELECT Book.title FROM Book, ISBN, Book_ISBN WHERE Book.bookId = Book_ISBN.bookId AND ISBN.isbnId = Book_ISBN.isbnId AND ISBN.isbnId = '978-1-';
                
Which Relationship?
# Scenario Which Cardinality?
1 Book and ISBN
2 Author and Book
3 Student and borrow Book
4 Student and Student ID
5 Book and Publisher
6 Publisher and Books
7 Library and Book
8 Book and Format (e.g., eBook, hardcover)
9 Book and Catalog Number in the Library System
10 Reader and Book Review
Today: January 22, 2025
  • Review Questions ✓
  • Modeling Relationship in ERM ✓
  • Modeling Relationship in SQL
  • Query Types in SQL
  • Review Lab 1, Homework 1
Implement Relationships in SQL
Book to Library Relationship
Implement Relationships in SQL
  • Primary Key: Uniquely identifies a record in a table
  • Foreign Key: A reference to a primary key in another table
Implement Relationships in SQL
Property Description
Uniqueness Each value must be unique across all records
Non-null Primary key values cannot be NULL
Immutability Once assigned, the primary key value should not change
Single-column or Composite Can consist of one column or a combination of multiple columns
Indexed Primary keys are automatically indexed for faster lookups
Book to Library Relationship

CREATE TABLE Book (
    bookId INTEGER PRIMARY KEY,
    title VARCHAR(255),
    libraryId INTEGER FOREIGN KEY REFERENCES Library(libraryId)
);

CREATE TABLE Library (
    libraryId INTEGER PRIMARY KEY,
    name VARCHAR(255),
    location VARCHAR(255)
);
                            
Implement Relationships in SQL

libraryId INTEGER FOREIGN KEY REFERENCES Library(libraryId) -- establishes a one-to-many relationship:

  • One Library: A single library can have multiple books
  • Many Books: Each book belongs to exactly one library
  • Foreign Key: Ensures referential integrity by linking Book.libraryId to Library.libraryId
  • Cardinality: 1:N (One library to many books)
SQL Queries on Book and Library

-- Insert data into Library
INSERT INTO Library (libraryId, name, location) 
VALUES (1, 'Central Library', 'Downtown');

-- Insert data into Book
INSERT INTO Book (bookId, title, libraryId) 
VALUES (101, 'Database Design', 1);

-- Retrieve books and their library details
SELECT Book.title, Library.name, Library.location 
FROM Book, Library 
WHERE Book.libraryId = Library.libraryId;
                    
SQL Queries on Book and Library
bookId title libraryId
1 Introduction to SQL 101
2 Database Design Principles 101
3 Advanced Database Systems 102
4 Big Data Analytics 103
5 Machine Learning with SQL 101
Question?
Today: January 22, 2025
  • Review Questions ✓
  • Modeling Relationship in ERM ✓
  • Modeling Relationship in SQL ✓
  • Query Types in SQL
  • Review Lab 1, Homework 1
SQL Query Types
Query Type Description Example
Data Definition (DDL) Used to define and modify database schema and attributes CREATE TABLE Library (...);
Data Control (DCL) Used to control access to data within the database GRANT SELECT ON Book TO user;
Data Manipulation (DML) Used to retrieve, insert, update, and delete data in tables SELECT * FROM Book;
Transaction Control (TCL) Used to manage transactions and ensure data integrity COMMIT;
SQL Query Types
# SQL Keyword Type (DDL, DML, DCL, TCL)
1 SELECT
2 INSERT
3 UPDATE
4 DELETE
5 CREATE
6 ALTER
7 DROP
9 GRANT
10 REVOKE
11 COMMIT
12 ROLLBACK
15 BEGIN
Today: January 22, 2025
  • Review Questions ✓
  • Modeling Relationship in ERM ✓
  • Modeling Relationship in SQL ✓
  • Query Types in SQL ✓
  • Review Lab 1, Homework 1
Lab 1 & Homework 1
  • Lab 1: Introduction to SQL and basic queries
  • Homework 1: Read literature and answer questions
  • Reminder: Please install SQLite by Friday
Today: January 24, 2025

Lab Exercise: Relationships and Foreign Keys in SQLite

GitHub Repository

Module 3
Relational Algebra
Today: January 27, 2025
  • Practice Questions
  • Set Theory
  • Relational Algebra
  • Core Operations
  • Advanced Operations
  • Optimization and Efficiency
  • Application Use Case

How to create a table with a foreign key reference in SQLite?

                A. CREATE TABLE Orders (
                       orderId INTEGER, 
                       customerId INTEGER FOREIGN KEY REFERENCES Customers(customerId)
                   );
                B. CREATE TABLE Orders (
                       orderId INTEGER PRIMARY KEY, 
                       customerId INTEGER REFERENCES Customers(customerId)
                   );
                C. CREATE TABLE Orders (
                       orderId INTEGER, 
                       customerId INTEGER, 
                       FOREIGN KEY (customerId) REFERENCES Customers(customerId)
                   );

What happens (default) if a referenced row in the parent table?

  • A. The related rows in the child table are automatically deleted
  • B. The delete operation is blocked to maintain referential integrity
  • C. The related rows are set to NULL
  • D. The related rows remain unchanged

How to add a foreign key constraint to an existing SQLite table?


                A. ALTER TABLE Orders ADD FOREIGN KEY (customerId) REFERENCES Customers(customerId);
                
                B. MODIFY TABLE Orders ADD FOREIGN KEY (customerId) REFERENCES Customers(customerId);
                
                C. SQLite does not support adding foreign keys to existing tables;
                
                D. UPDATE TABLE Orders SET FOREIGN KEY (customerId) REFERENCES Customers(customerId);
                    

Which of the following table names are permitted in SQLite?


                A. CREATE TABLE customer_data (...);
                B. CREATE TABLE order (...);
                C. CREATE TABLE _sales_record (...);
                D. CREATE TABLE Sales-Report (...);
                    
Question?
Set Theory
  • Origin: Developed by Georg Cantor in the late 19th century
  • Purpose: To formally define collections of objects and their relationships
  • Key Concept: Elements, subsets, unions, intersections, and complements
  • No Duplicates: Ensures clarity, consistency, and prevents ambiguity in set operations
  • Impact: Foundation for modern mathematics, logic, and computer science
  • Why It Matters: Provides a universal language for data organization and problem-solving
Set Theory Concepts
Concept Description Example
Union (⋃) Combines elements from two sets, removing duplicates Union
Intersection (∩) Finds common elements between two sets Intersection
Set Difference (−) Elements present in one set but not in the other Set Difference
Set Theory Concepts
Set Theory Diagram
Example of a Set: Fruits
Set of Fruits
Example of a Set: Cars
Set of Cars

In set theory, a set is a collection of any distinct elements

  • Numbers (e.g., {1, 2, 3, 4})
  • Fruits (e.g., {Apple, Banana, Orange})
  • Vehicles (e.g., {Car, Bike, Bus})
  • Concepts (e.g., {Love, Peace, Joy})
  • People (e.g., {Alice, Bob, Charlie})

Relational Algebra ⊆ Set Theory

Sets in Relational Algebra
  • Relations: Tables representing structured data
  • Tuples: Rows within a table representing records
  • Attributes: Columns defining data properties
  • Keys: Uniquely identify records within relations
  • Operations: Selection, Projection, Join, etc
What is a Tuple?

In relational algebra, a tuple is a single unique row in a relation

  • (1, 'Trek', 'Mountain', 1200)
  • (2, 'Giant', 'Road', 1500)
  • (3, 'Cannondale', 'Hybrid', 1000)
Different Relations in Relational Algebra
  • Bike Relation: (bikeId, brand, type, price)
  • Flower Relation: (flowerId, name, color, season)
  • Pet Relation: (petId, name, species, age)
  • Book Relation: (bookId, title, author, genre)
  • Gadget Relation: (gadgetId, name, category, price)
Relational Algebra
  • Collections: Set theory deals with collections of any objects
  • Elements: In relational algebra, elements are tuples
  • Relations: Relational algebra organizes tuples into structured relations
  • Operations: Both use union, intersection, and difference
  • Focus: Relational algebra emphasizes structured relationships, unlike general sets
Origins of Relational Algebra
  • Augustus De Morgan (1806–1871) introduced relational concepts in mathematics
  • Charles Sanders Peirce (1839–1914) expanded and formalized relation theory
  • Alfred Tarski (1901–1983) developed relational algebra as a formal system
  • Edgar F. Codd (1923–2003) invented relational algebra for databases while at IBM

It was created to logically analyze relationships between elements

Relational Algebra Operators from Set Theory
  • Union (⋃): Combines tuples from two relations, removing duplicates
  • Intersection (∩): Finds common tuples between two relations
  • Set Difference (−): Returns tuples present in one relation but not the other
  • Cartesian Product (×): Combines all tuples from two relations
Why Relational Algebra for DBMS 🤔

IBM had vast data but struggled with fast and accurate queries

Edgar F. Codd - Why Relational Algebra for DBMS?
  • Selecting: Retrieving specific data from relations
  • Combining: Joining data from multiple relations
  • Filtering: Narrowing down data based on conditions
Edgar F. Codd - Relational Algebra for Databases
  • Edgar F. Codd (1923–2003) invented relational algebra in 1970
  • He developed it to provide a formal way to manage and query data
  • His work led to the foundation of modern relational databases like SQL
  • Codd aimed to improve data organization, consistency, and accessibility
Edgar F. Codd - Why Relational Algebra for DBMS
  • Set Operations: Finding common elements in two sets using intersection
  • Ordering Relations: Determining if one number is greater than another (e.g., x > y)
  • Equivalence Relations: Checking if two elements are related by symmetry or transitivity
  • Graph Theory: Analyzing connections between nodes in a directed graph
  • Logic: Representing predicates and logical relationships using relational algebra
Added Operators Unique to Relational Algebra
  • Selection (σ): Filters tuples based on a condition
  • Projection (π): Selects specific attributes (columns)
  • Join (⨝): Combines related tuples from different relations
  • Division (÷): Finds tuples that match all values in another relation
Selection (σ)
  • Filters bicycles based on a specified condition
  • Example: σ(Price < 500)(Bicycles)
Selection Operation
Projection (π)
  • Selects specific columns from the bicycle inventory
  • Example: π(brand, price)(Bicycles)
Projection Operation
Union and Projection Operations
  • Two separate bicycle tables: Available and Sold
  • The union operation will combine them, removing duplicates
  • Projection selects specific attributes: bikeId and brand
  • Example: π(bikeId, brand)(Bicycles_Available ⋃ Bicycles_Sold)
Union Before Execution
Union (⋃)
  • Combines two bicycle inventories with the same schema, removing duplicates
  • Example: Bicycles_Available ⋃ Bicycles_Sold
Union Operation
Set Difference (−)
  • Finds bicycles available in stock but not sold
  • Example: Bicycles_Available − Bicycles_Sold
                        @startuml
                        entity Bicycles_Available {
                            bikeId INTEGER
                            brand VARCHAR(50)
                        }
                        entity Bicycles_Sold {
                            bikeId INTEGER
                            brand VARCHAR(50)
                        }
                        Bicycles_Available ||--o{ Bicycles_Sold : Difference
                        @enduml
                    
Cartesian Product (×)
  • Combines all bicycles with stores
  • Example: Bicycles × Stores
                        @startuml
                        entity Bicycles {
                            bikeId INTEGER
                            brand VARCHAR(50)
                        }
                        entity Stores {
                            storeId INTEGER
                            location VARCHAR(100)
                        }
                        Bicycles ||--|| Stores : Product
                        @enduml
                    
Natural Join (⨝)
  • Matches bicycles and sales data using common attributes
  • Example: Bicycles ⨝ Sales
                        @startuml
                        entity Bicycles {
                            bikeId INTEGER
                            brand VARCHAR(50)
                        }
                        entity Sales {
                            saleId INTEGER
                            bikeId INTEGER
                        }
                        Bicycles ||--|| Sales : Join
                        @enduml
                    
Division (÷)
  • Finds bicycle models available in all stores
  • Example: Bicycles ÷ Stores
                        @startuml
                        entity Bicycles {
                            bikeId INTEGER
                            model VARCHAR(50)
                        }
                        entity Stores {
                            storeId INTEGER
                        }
                        Bicycles ||--o{ Stores : Division
                        @enduml
                    
Aggregation Operations
  • Used to compute the total sales of bicycles
  • Example: π(SUM(price))(Sales)
                        @startuml
                        entity Sales {
                            saleId INTEGER
                            price INTEGER
                        }
                        @enduml
                    
Renaming (ρ)
  • Renames the bicycle relation for clarity
  • Example: ρ(BikeInventory)(Bicycles)
                        @startuml
                        entity BikeInventory {
                            bikeId INTEGER
                            brand VARCHAR(50)
                        }
                        @enduml
                    
Relational Algebra Operators
Operator Symbol Description Example
Selection σ Filters rows based on a condition σ(Price < 500)(Bicycles)
Projection π Selects specific columns π(brand, price)(Bicycles)
Union Combines two relations, removing duplicates Bicycles_Available ⋃ Bicycles_Sold
Set Difference Finds tuples in one relation but not the other Bicycles_Available − Bicycles_Sold
Cartesian Product × Combines all tuples from two relations Bicycles × Stores
Intersection Finds common tuples between relations Bicycles_Available ∩ Bicycles_Sold
Join Combines tuples based on common attributes Bicycles ⨝ Stores
Division ÷ Finds tuples matching all values in another relation Projects ÷ RequiredParts
Renaming ρ Renames a relation or attributes ρ(BikeInventory)(Bicycles)
Query Optimization
Pushing Selection Down

Before Optimization:

                        π(brand) (σ(price < 500) (Bicycles × Stores))
                    

After Optimization:

                        σ(price < 500) (π(brand) (Bicycles × Stores))
                    
Reducing Cartesian Product

Before Optimization:

                        σ(Bicycles.storeId = Stores.storeId) (Bicycles × Stores)
                    

After Optimization:

                        Bicycles ⨝ Stores
                    
Combining Projections

Before Optimization:

                        π(brand) (π(bikeId, brand) (Bicycles))
                    

After Optimization:

                        π(brand) (Bicycles)
                    
Eliminating Unnecessary Joins

Before Optimization:

                        π(brand) (Bicycles ⨝ Stores ⨝ Suppliers)
                    

After Optimization:

                        π(brand) (Bicycles ⨝ Stores)
                    
Reordering Joins

Before Optimization:

                        (Bicycles ⨝ Stores) ⨝ Suppliers
                    

After Optimization:

                        (Stores ⨝ Suppliers) ⨝ Bicycles
                    
Breaking Complex Queries

Before Optimization:

                        σ(price < 500) (π(brand, price) (Bicycles ⨝ Stores))
                    

After Optimization:

                        σ(price < 500) (Bicycles) ⨝ π(brand) (Stores)
                    
Avoiding Calculated Fields in Joins

Before Optimization:

                        Bicycles ⨝ σ(DiscountedPrice = price * 0.9) (Stores)
                    

After Optimization:

                        Precompute DiscountedPrice column and then join
                    
Today: January 27, 2025
  • Practice Questions ✓
  • Query Optimization ✓
  • Combine Operations ✓
  • Advanced Operations ✓
  • Application Use Case
  1. What does the selection (σ) operator do in relational algebra?
  2. Which operator is used to retrieve specific columns from a relation?
  3. What is the symbol for the union operator in relational algebra?
  4. How does the Cartesian product (×) operation combine relations?
  5. Which operator is used to eliminate duplicate rows when combining relations?
  6. What is the purpose of the projection (π) operator?
  7. How is a join (⨝) different from a Cartesian product?
  8. Which operator finds records that exist in one relation but not in another?
  9. What does the renaming (ρ) operator do?
Today: January 29, 2025
  • Practice Questions
  • Query Optimization
  • Nested Operators/Relations
  • Application Use Case
  • Assignments 1 and 2
Which of the following is a set?
  • {1, 2, 2, 3, 4}
  • (∅)
  • {1, 2, 3, 4}
  • {"Hello", 2, 3, 4}
Which of the following is a tuple?
  • {1, "Alice", 25, "Engineer"}
  • (1, "Alice", 25, "Engineer")
  • [1, "Alice", 25, "Engineer"]
  • "1, Alice, 25, Engineer"
What is a relation?
  • A collection of duplicate tuples
  • A single row in a table
  • A function mapping keys to values
  • A set of unique tuples with the same attributes
Which relational algebra operator filters rows based on a condition?
  • Projection (π)
  • Selection (σ)
  • Cartesian Product (×)
  • Union (∪)
Which relational algebra operator is used to retrieve specific attributes (columns)?
  • Projection (π)
  • Selection (σ)
  • Join (⨝)
  • Intersection (∩)
Which relational algebra operator combines two relations and removes duplicates?
  • Cartesian Product (×)
  • Difference (-)
  • Union (∪)
  • Selection (σ)
Which relational algebra operator returns tuples that are in both relations?
  • Intersection (∩)
  • Projection (π)
  • Join (⨝)
  • Selection (σ)
Which relational algebra operator pairs every tuple from one relation with every tuple from another?
  • Join (⨝)
  • Cartesian Product (×)
  • Union (∪)
  • Difference (-)
Which relational algebra operator combines tuples from two relations based on a common attribute?
  • Join (⨝)
  • Selection (σ)
  • Projection (π)
  • Cartesian Product (×)
Question?
Query Optimization
Find all roses over $10 from local suppliers
σ Price > 10 ∧ Name = 'Rose' ∧ Location = 'Local' (Flowers ⨝ Suppliers)
Without Optimization
Flowers ⨝ Suppliers → σ Filters

Comparisons: 1,000 × 500 = 500K

With Optimization
(σ Location = 'Local' (Suppliers)) ⨝ (σ Name = 'Rose' ∧ Price > 10 (Flowers))

Comparisons: 100 × 50 = 5K

Pushing Selection Down (first)

Filtering first reduces data size, making both the join and select the result faster

Question?
Query Processing Flow
            ┌───────────────┐      ┌────────────┐      ┌───────────────┐
            │   Relation    │ ───► │  Operator  │ ───► │   Relation    │
            └───────────────┘      └────────────┘      └───────────────┘
                    
Nested Relations - Relational Algebra
                        π Name (
                            σ Price > 10 (
                                Flowers ⨝ (
                                    σ Location = 'Local' (
                                        Suppliers ⨝ (
                                            σ Revenue > 1000 (
                                                Companies
                                            )
                                        )
                                    )
                                )
                            )
                        )
                    
Nested Relations - Relational Algebra
π Name (                               -- (6) Project only the "Name" attribute
    σ Price > 10 (                     -- (5) Select flowers priced over $10
        Flowers ⨝ (                   -- (4) Join Flowers with filtered Suppliers
            σ Location = 'Local' (     -- (3) Select only local suppliers
                Suppliers ⨝ (         -- (2) Join Suppliers with filtered Companies
                    σ Revenue > 1000 ( -- (1) Select companies with revenue > $1000
                        Companies      -- Input relation: Companies
                    )                  -- Output relation: High-revenue companies
                )                      -- Output relation: Local suppliers from revenue companies
            )                          -- Output relation: Flowers with local suppliers
        )                              -- Output relation: Flowers priced over $10
    )                                  -- Output relation: Names of flowers
)
                
Before vs After Query Optimization
Before                                 |   After
-------------------------------------- | --------------------------------------
π Name (                               |   π Name (
    σ Price > 10 (                     |       (σ Price > 10 (Flowers)) ⨝ (
        Flowers ⨝ (                   |           (σ Location = 'Local' (Suppliers)) ⨝ (
            σ Location = 'Local' (     |               σ Revenue > 1000 (Companies)
                Suppliers ⨝ (         |           )
                    σ Revenue > 1000 ( |       )
                        Companies      |   )
                    )                  |
                )                      |
            )                          |
        )                              |
    )                                  |
)                                      |
                  
Optimized Nested Relations - SQL

            SELECT f.Name
            FROM (
                (SELECT * FROM Flowers WHERE Price > 10) f
                JOIN (
                    (SELECT * FROM Suppliers WHERE Location = 'Local') s
                    JOIN (SELECT * FROM Companies WHERE Revenue > 1000) c
                    ON s.CompanyID = c.CompanyID
                )
                ON f.SupplierID = s.SupplierID
            );
                
Question?
Relational Algebra Use Case

Application Use Case

Assignments 1 and 2
  • Lab 1: Introduction to SQL and basic queries
  • Lab 2 and Homework 2: Create and Alter tables
  • Reminder: Please install SQLite by Friday
Question?
Relational Algebra and SQL
Aspect Relational Algebra SQL
Definition Theoretical foundation for database queries Practical language for interacting with relational databases
Operators/Syntax Uses mathematical operators:
σ (Selection), π (Projection), ∪ (Union), ∩ (Intersection), − (Difference), ⋈ (Join).
Uses English-like commands:
SELECT, INSERT, DELETE, JOIN.
Purpose Conceptual understanding of query mechanisms Execution of queries in real databases
Usage Primarily in academic and theoretical contexts Widely used in industry for database management
Projection Operation
Question?
Assignments 1 and 2
  • Lab 1: Introduction to SQL and basic queries
  • Lab 2 and Homework 2: Create and Alter tables
Module 4
Relational Database Interfaces
Today: February 3, 2025
  • Practice Questions
  • Relational Database Interfaces:
    • Admin Interface
    • Command-Line Interface (CLI)
    • Application Programming Interface (API)
    • Application User Interface (UI)
  • Homework 2, Lab 2, Project Part 1
What is a relation?
  • A collection of duplicate tuples
  • A single row in a table
  • A function mapping keys to values
  • A set of unique tuples with the same attributes
Which SQL clause is used to filter rows before processing?
  • WHERE
  • HAVING
  • GROUP BY
  • ORDER BY
Which relational algebra operation eliminates duplicate tuples?
  • Selection (σ)
  • Projection (π)
  • Union (∪)
  • Difference (-)
Which SQL statement correctly defines a foreign key constraint?

    CREATE TABLE Orders (OrderID INT PRIMARY KEY, CustomerID INT FOREIGN KEY REFERENCES Customers(CustomerID)); -- A

    CREATE TABLE Orders (OrderID INT, CustomerID INT, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)); -- B

    CREATE TABLE Orders (OrderID INT, CustomerID INT, FOREIGN KEY REFERENCES Customers(CustomerID)); -- C
    
    CREATE TABLE Orders (OrderID INT PRIMARY KEY, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)) -- D;
          
Which SQL query is optimized by predicate pushdown?
  • σage > 30 (Users ⨝ Orders)
  • age > 30 Users) ⨝ Orders
  • σage > 30 ((Users ⨝ Orders))
  • (Users ⨝ Orders) ⨝ σage > 30 (Users)
Which SQL statement is correct?

CREATE TABLE Table (ID INT PRIMARY KEY, Name VARCHAR(50));

CREATE TABLE Orders (OrderID INT, CustomerID INT, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID))

CREATE TABLE Orders (OrderID INT, CustomerID INT, FOREIGN KEY REFERENCES Customers(CustomerID));

CREATE TABLE Orders (OrderID INT, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID), CustomerID INT);

CREATE TABLE Orders (OrderID INT, CustomerID INT, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID));
          
What is the relationship between Customers and Orders, where one customer can have multiple orders?
  • One-to-One (1:1)
  • One-to-Many (1:N)
  • Many-to-Many (N:N)
  • One-to-Many (N:1)
Which of the following is a key difference between SQLite and PostgreSQL?
  • SQLite supports advanced concurrency control, while PostgreSQL does not
  • PostgreSQL is an embedded database, while SQLite is client-server based
  • PostgreSQL supports ACID transactions, while SQLite does not
  • SQLite is lightweight and serverless, and PostgreSQL is a full-featured relational database
Which relational algebra operator is used to retrieve specific attributes?
  • Projection (π)
  • Selection (σ)
  • Cartesian Product (×)
  • Union (∪)
In query optimization, which heuristic helps reduce the number of tuples early?
  • Predicate Pushdown
  • Join Reordering
  • Expression Flattening
  • Index Lookup
Question?
Relational Database Interfaces
Evolution of Relational Database Interfaces
  • 1970s: Early Databases
  • 1980s: Admin Tools Emerge
  • 1990s: Command-Line Interface (CLI)
  • 2000s: APIs Enable Integration
  • 2010s: Web-Based Interfaces
  • Today: Hybrid & Automated Systems
Database Interfaces by Role & Purpose
Interface Role Purpose Technology
Admin Interface (GUI) Database Administrators User management, schema modifications, monitoring SQLite Browser
Command-Line Interface (CLI) Developers & DBAs Direct SQL execution, automation, scripting SQLite CLI
Application Programming Interface (API) Developers Programmatic database interaction, integrating apps Neon Tech
Web-Based UI Regular Users Accessing, viewing, and interacting with data in a user-friendly way -
What is a Direct Database API Interface?
Direct Database API
  • Connects online directly to a database using drivers
Database Connection
  • Uses a connection string:
  • postgresql://user:password@host:port/database?sslmode=require
Direct DB API vs. REST API
  • Direct DB API: Uses drivers (e.g., psycopg2) for queries
Security
  • Use authentication (username/password, certificates)
  • Restrict access with roles and permissions
  • Enable SSL/TLS for encryption
Python Connection

        import psycopg2
        conn = psycopg2.connect("postgresql://user:pass@host/db?sslmode=require")
        cur = conn.cursor()
        cur.execute("SELECT version();")
        print(cur.fetchone())
      
Best Practices
  • Use connection pooling
  • Restrict database exposure
  • Sanitize inputs to prevent SQL injection
Question?
Checkout DMS Repository
  • Clone the repository:
  • git clone https://github.com/SE4CPS/DMS.git
  • Navigate to the project directory:
  • cd DMS
Database Interfaces by Role & Purpose
Interface Role Purpose Technology
Admin Interface (GUI) Database Administrators User management, schema modifications, monitoring SQLite Browser
Command-Line Interface (CLI) Developers & DBAs Direct SQL execution, automation, scripting SQLite CLI
Application Programming Interface (API) Developers Programmatic database interaction, integrating apps Neon Tech
Web-Based UI Regular Users Accessing, viewing, and interacting with data in a user-friendly way -
What is a Admin Tool Interface?
Today: February 5, 2025
  • Introduction PostgreSQL
  • Relational Database Admin Interfaces
  • Relational Database Developer Interfaces
  • Relational Database User Interfaces
  • Practice Questions
  • Homework 2, Lab 2, Project Part 1
PostgreSQL: An Introduction
  • What is PostgreSQL? Open-source, enterprise-class relational database
  • License: PostgreSQL License (permissive open-source)
  • ACID Compliance: Ensures data integrity
  • Extensibility: Supports custom functions and data types
  • Advanced Features: JSON, full-text search, geospatial support
  • Concurrency Control: Uses MVCC for efficient transactions
  • Cross-Platform: Runs on Linux, Windows, macOS, and cloud
  • Community: Strong open-source support and documentation
PostgreSQL: A Timeline
  • 1986: Developed at UC Berkeley by Michael Stonebraker
  • 1989: POSTGRES 1.0 released
  • 1994: Renamed to PostgreSQL
  • 1996: Open-source development begins
  • 2000: MVCC introduced in PostgreSQL 7.0
  • 2005: Native Windows support added
  • 2010: Streaming replication introduced
  • 2016: Logical replication and partitioning added
  • 2020: Improved indexing and query performance
  • 2023: PostgreSQL 16 improves parallelism and SQL features
PostgreSQL: Open Source License Requirements
Database License Restrictions
SQLite Public Domain No restrictions, free for any use
PostgreSQL PostgreSQL License (BSD-style) Requires keeping copyright notice
PostgreSQL License Notice
        PostgreSQL Database Management System
        Copyright (c) 1996-2024, The PostgreSQL Global Development Group
        
        Permission to use, copy, modify, and distribute this software and its 
        documentation for any purpose, without fee, and without a written 
        agreement is hereby granted, provided that the above copyright notice 
        and this paragraph appear in all copies.
        
        IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY 
        CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, 
        TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN CONNECTION WITH THE 
        SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
          
Why Was PostgreSQL Invented?
  • Ingres Limitations: Lacked extensibility and support for complex data types
  • Advanced Data Needs: Required better handling of objects and relationships
  • Scalability: Needed a more flexible and efficient query engine
  • ACID Compliance: Improved transaction reliability and consistency
  • Open-Source Model: Encouraged community contributions and innovation

PostgreSQL Feature Matrix

PostgreSQL Query Transformer

View Code

Relational Database Admin Interfaces
Neon Database Admin Functions
  • Database Management – Create, delete, and manage databases
  • Branching – Instantly create database branches for development and testing
  • Connection Monitoring – View active connections and resource usage
  • Role & Access Control – Manage user roles and permissions
  • Performance Insights – Analyze query performance and optimize execution
Neon Database Console
pg_stat_statements
  • Tracks execution statistics of SQL queries
  • Records query text, execution time, and I/O stats
  • Helps identify slow or expensive queries
  • Requires the pg_stat_statements extension
  • Useful for database performance tuning
Questions?
Task 1: Creating a Database Backup

Is this an Admin Task, Developer Task, or User Task?

Task 2: Writing a New API Endpoint

Is this an Admin Task, Developer Task, or User Task?

Task 3: Managing User Permissions

Is this an Admin Task, Developer Task, or User Task?

Task 4: Debugging an Application Error

Is this an Admin Task, Developer Task, or User Task?

Task 5: Updating Software Dependencies

Is this an Admin Task, Developer Task, or User Task?

Relational Database Developer Interfaces
Developers: The Bridge
  • Connect frontend applications to databases
  • Handle business logic and data processing
  • Develop APIs for frontend communication
  • Ensure data security and integrity
  • Optimize performance between UI and storage
Backend to Database Connection
  • Uses API key for authentication
  • Connects to PostgreSQL database
  • Ensures secure communication with SSL
PostgreSQL API Connection String Breakdown
  • Protocol: postgresql://
  • Username: neondb_owner
  • Password: *********
  • Host: ep-shrill-tree-a819xf7v-pooler.eastus2.azure.neon.tech
  • Database: neondb
  • SSL Mode: require (ensures secure connection)
Backend to Frontend via REST API
  • Exposes endpoints for frontend access
  • Uses HTTP methods (GET, POST, etc.)
  • Requires authentication for secure access
REST API Request Example

            @app.route('/flowers')
            def manage_flowers():
                conn = get_db_connection()
                cur = conn.cursor()
                cur.execute("SELECT * FROM flowers")
                flowers = cur.fetchall()
                cur.close()
                conn.close()
                return render_template('flowers.html', flowers=flowers)
          
Task 1: Designing a Database Schema

Is this a Backend Database Developer Task?

Task 2: Implementing a REST API

Is this a Backend Database Developer Task?

Task 3: Styling a Web Page

Is this a Backend Database Developer Task?

PostgreSQL Backend Code Example
GitHub: SE4CPS DMS - PostgreSQL
Questions?
Relational Database User Interfaces
Frontend Database-Related Tasks
  • Fetching data from APIs
  • Displaying database records in UI
  • Sending user input to backend
  • Handling API authentication (e.g., tokens)
  • Managing client-side caching
  • Managing client-side validation
PostgreSQL Backend Code Example
GitHub: SE4CPS DMS - PostgreSQL
Question 1

Is fetching data from an API a frontend or backend task?

Question 2

Who is responsible for validating user input before storing it in the database?

Question 3

Does the frontend directly modify the database?

Today: February 7, 2025
  • Practice Question
  • Database Interface and ID
  • Lab - Github and Python
  • Team Project Part 1
  • Homework 2, Lab 2
Database Interface and ID
Frontend
Backend
Database

Which component is responsible for ensuring data validation and access control?

  • Frontend
  • Backend
  • Database
  • All of the above

Which component plays a key role in optimizing system performance and preventing security breaches?

  • Frontend
  • Backend
  • Database
  • All of the above

Which component is essential for managing user requests, processing data, and ensuring secure communication?

  • Frontend
  • Backend
  • Database
  • All of the above

Which component typically handles REST API requests and responses?

  • Frontend
  • Backend
  • Database
  • All of the above

Which component is primarily responsible for handling database connection strings?

  • Frontend
  • Backend
  • Database
  • All of the above
Relational Database Interfaces and ID
Database Interface and ID
Frontend
  • Name
  • Email
  • Username
  • Phone Number
Backend
  • Student ID
  • SSN
  • Transaction ID
  • Session Token
Database
  • Primary Key
  • UUID
  • Foreign Key
  • Auto-increment ID
Book Table Schema

          CREATE TABLE Book (
              book_id SERIAL PRIMARY KEY,
              uuid UUID DEFAULT gen_random_uuid(),
              electronic_article_number VARCHAR(50) UNIQUE NOT NULL,
              isbn VARCHAR(512) UNIQUE NOT NULL,
              title VARCHAR(200) NOT NULL,
              author VARCHAR(100) NOT NULL,
              price DECIMAL(10,2) NOT NULL,
              stock INT NOT NULL
          );
          
Database Interfaces and ID
Frontend
  • Title
  • Author
Backend
  • EAN
  • ISBN
Database
  • Primary Key
  • UUID
  • Foreign Key
  • Auto-increment ID
What is UUID?

UUID (Universally Unique Identifier) is a 128-bit unique identifier

UUID Format

Example: 550e8400-e29b-41d4-a716-446655440000

UUID Format
  • 550e8400 - Time-based or random value
  • e29b - Time-mid segment
  • 41d4 - Version and variant identifier
  • a716 - Clock sequence or unique identifier
  • 446655440000 - Node (often MAC address or random)
Why UUID over Auto Increment?
  • Globally Unique: Prevents ID conflicts in distributed databases
  • Security: Harder to guess compared to sequential auto-increment IDs
  • Scalability: Works well in multi-node environments without synchronization issues
  • Decentralized: No need for a central authority to generate IDs
Database Structures and ID Comparison
Database 1
  • UUID: (550e8400-)
  • Auto-Increment: (1, 2, 3...)
Database 2
  • UUID: (123e4567-)
  • Auto-Increment: (1, 2, 3...)
Database 3
  • UUID: (f47ac10b-)
  • Auto-Increment: (1, 2, 3...)
Generate UUID in SQL

              -- Supported in PostgreSQL, MySQL (8.0+), MariaDB, and others
              CREATE TABLE Users (
                  user_id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
                  name VARCHAR(100) NOT NULL
              );
          
Generate UUID in Python

              import uuid
              
              # Generate a new UUID
              user_id = uuid.uuid4()
              print("Generated UUID:", user_id)
          
Generate UUID in JavaScript (Server)

              import { v4 as uuidv4 } from 'uuid';
              
              // Generate a new UUID
              const userId = uuidv4();
              console.log("Generated UUID:", userId);
          
Generate UUID in JavaScript (Browser)

              // Generate a UUID using the browser's crypto API
              function generateUUID() {
                  return crypto.randomUUID();
              }
              
              console.log("Generated UUID:", generateUUID());
          
Questions?
UUID Specification

UUID RFC 4122

Lab - GitHub and Python
SQL Query: Get all flowers
SQL Query: Find the most expensive flower
SQL Query: Get all customers
SQL Query: Get all orders with customer names
SQL Query: Find customers who have placed orders
SQL Query: Find the total number of orders
SQL Query: Find the total revenue generated from orders
SQL Query: Get all order details for a specific order ID
SQL Query: Find all flowers that are in stock
SQL Query: Find all orders placed after 2024-01-01
SQL Query: Find flowers that cost more than $10
SQL Query: Find customers who have a phone number listed
SQL Query: Find orders with a total amount greater than $50
SQL Query: Find customers with a Gmail address
SQL Query: Find orders placed on 2024-02-01
SQL Query: Find flowers that are red
SQL Query: Find flowers that are red
SQL Query: Find customers who do not have an address
Module 5
Relational Database Advanced SQL
Today: February 10, 2025
  • Practice Question
  • SQL JOIN
  • SQL Aggregation
  • SQL Constraint
  • Team Project Part 1
  • Homework 2, Lab 2
ER Model Question

Which of the following best describes an entity in an ER Model?

  • A real-world object with attributes
  • A table in a database
  • A primary key in a table
  • A stored procedure
SQL Question

Which SQL clause is used to filter records based on a condition?

  • ORDER BY
  • GROUP BY
  • WHERE
  • HAVING
SQLite Question

Which of the following is a major limitation of SQLite compared to PostgreSQL?

  • No support for basic queries
  • No support for full outer joins
  • No support for indexing
  • No support for SELECT statements
Relational Algebra Question

Which relational algebra operation is used to retrieve specific attributes from a relation?

  • Selection (σ)
  • Projection (π)
  • Cartesian Product (×)
  • Union (∪)
Foreign Key Question

What is the primary purpose of a foreign key in a relational database?

  • To enforce referential integrity
  • To improve query performance
  • To store primary key values
  • To increase storage capacity
Why SQL Joins?
  • Merge data (rows) from multiple tables
  • Avoid redundancy & maintain consistency
Bike Store Requirements

The bike store needs a database to manage its inventory, sales, and customer interactions

Inventory Management

The store wants to keep track of all bikes and accessories, along with their suppliers

They need to see which products have suppliers and which do not

Customer Orders

Customers can place orders for bikes and accessories

The store wants a report showing all customers and their orders, but also customers who have never ordered

Sales Performance

The store needs to generate reports comparing all products, including those that have never been sold

Employee Management

Employees handle sales and inventory updates

They need a way to see all employees and the sales they have made, while also seeing employees with no recorded sales

Which Join to Use?
Travel Relationship
Inner Join

Returns only matching rows from both tables


          SELECT * 
          FROM A 
          INNER JOIN B 
          ON A.id = B.id;
          
A.id A.val B.id B.val
1 X 1 Y
Left Join

Returns all rows from the left table and matching rows from the right


          SELECT * 
          FROM A 
          LEFT JOIN B 
          ON A.id = B.id;
          
A.id A.val B.id B.val
1 X 1 Y
2 Z NULL NULL
3 M 3 N
4 P NULL NULL
Right Join

Returns all rows from the right table and matching rows from the left


          SELECT * 
          FROM A 
          RIGHT JOIN B 
          ON A.id = B.id;
          
A.id A.val B.id B.val
1 X 1 Y
NULL NULL 2 Z
3 M 3 N
NULL NULL 4 P
Natural Join with Multiple Columns

Automatically joins columns with the same name and identical values


          SELECT * 
          FROM A 
          NATURAL JOIN B;
          
Table A
id name value
1 Alpha 10
2 Beta 20

Table B
id name score
1 Alpha 90
3 Gamma 85
Result of Natural Join

Matches rows where both id and name are the same

id name value score
1 Alpha 10 90
Full Outer Join

Returns all rows from both tables

Includes unmatched rows with NULL values where no match is found

Full Outer Join Query

          SELECT * 
          FROM A 
          FULL OUTER JOIN B 
          ON A.id = B.id;
          
Table A
id A.val
1 X
2 Y

Table B
id B.val
2 Z
3 W
Result of Full Outer Join
id A.val B.val
1 X NULL
2 Y Z
3 NULL W

Includes unmatched rows from both tables

What is a CROSS JOIN?

Generates the Cartesian product of two tables

CROSS JOIN Query

      SELECT * FROM A CROSS JOIN B;
          
Example Tables
Table A
id name
1 Alice
2 Bob

Table B
id color
10 Red
20 Blue
Result of CROSS JOIN

All possible combinations of rows

A.id A.name B.id B.color
1 Alice 10 Red
1 Alice 20 Blue
2 Bob 10 Red
2 Bob 20 Blue
Question?
Travel Relationship
Travel Relationship
Travel Relationship
Which Join to Apply?

Consider the following use cases on a flower database

Flower Database Use Cases
Use Case Tables Involved Expected Result
Get only flowers that have matching orders Flowers, Orders Only flowers that have been ordered
List all flowers with their orders (if any) Flowers, Orders All flowers, showing NULL for those with no orders
Show all flowers and all orders, even if unmatched Flowers, Orders All flowers and all orders, showing NULL where unmatched
Which Join Would You Use?
  • Inner Join?
  • Left Join?
  • Full Outer Join?
Question?
PostgreSQL Spaceship
SQLite and PostgreSQL Syntax Differences
Feature SQLite PostgreSQL
Auto-increment ID INTEGER PRIMARY KEY SERIAL PRIMARY KEY
Full Outer Join Not supported (use UNION of LEFT and RIGHT JOIN) FULL OUTER JOIN
Foreign Keys PRAGMA foreign_keys = ON Enforced by default
PostgreSQL Meta Commands Table
Command Description
SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'your_table'; List column names and data types for a specific table
SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'; Get all table names from the public schema
SELECT * FROM pg_catalog.pg_tables; Get detailed information about all tables
SELECT * FROM pg_indexes WHERE tablename = 'your_table'; Get all indexes of a specific table
How to Get All Astronauts and Their Missions?
SQL Query

          SELECT A.name, A.role, M.name AS mission_name
          FROM Astronauts A
          LEFT JOIN Missions M
          ON A.missions = M.id;
          
How to Find Missions with No Assigned Astronauts?
SQL Query

          SELECT M.name, M.destination 
          FROM Missions M
          LEFT JOIN Astronauts A
          ON M.id = A.missions
          WHERE A.id IS NULL;
          
How to Get All Alien Encounters with Mission Names?
SQL Query

          SELECT AE.species, AE.communication, M.name AS mission_name
          FROM AlienEncounters AE
          INNER JOIN Missions M
          ON AE.missionId = M.id;
          
Today: February 12, 2025
  • Practice Question
  • SQL Aggregation
  • SQL Constraints
  • SQL JOINS
  • Practice Code
  • Homework 3, Lab 3, Team Project Part 1
Travel Relationship
The Dashboard Effect
  • Real-Time Data Monitoring
  • Improved Decision-Making
  • improved Team Collaboration
  • Increased Efficiency and Productivity
  • Better Performance Tracking
Dashboard
Practice Question
Which type of SQL JOIN returns only the matching rows between two tables?
  • LEFT JOIN
  • RIGHT JOIN
  • INNER JOIN
  • FULL OUTER JOIN
Answer: INNER JOIN

Query:

SELECT bikes.model, sales.price 
        FROM bikes 
        INNER JOIN sales ON bikes.id = sales.bike_id;
        Bikes
        +----+-----------+------+
        | id | model    | type |
        +----+-----------+------+
        | 1  | RoadX    | Road |
        | 2  | SpeedPro | Road |
        | 3  | TrailMax | MTB  |
        +----+-----------+------+
            
        Sales
        +----+----------+--------+
        | id | bike_id | price  |
        +----+----------+--------+
        | 1  | 1       | 1200   |
        | 2  | 2       | 1500   |
        +----+----------+--------+
            
        Result
        +-----------+--------+
        | model     | price  |
        +-----------+--------+
        | RoadX     | 1200   |
        | SpeedPro  | 1500   |
        +-----------+--------+
            
What happens when a LEFT JOIN is used between two tables?
  • It returns only the matching rows from both tables
  • It returns all rows from the left table and only matching rows from the right table
  • It returns all rows from both tables, filling gaps with NULLs
  • It returns only the non-matching rows between both tables
Answer: LEFT JOIN

Query:

SELECT bikes.model, sales.price 
        FROM bikes 
        LEFT JOIN sales ON bikes.id = sales.bike_id;
        Bikes
        +----+-----------+------+
        | id | model    | type |
        +----+-----------+------+
        | 1  | RoadX    | Road |
        | 2  | SpeedPro | Road |
        | 3  | TrailMax | MTB  |
        | 4  | CityRide | City |
        +----+-----------+------+
            
        Sales
        +----+----------+--------+
        | id | bike_id | price  |
        +----+----------+--------+
        | 1  | 1       | 1200   |
        | 2  | 2       | 1500   |
        +----+----------+--------+
            
        Result
        +-----------+--------+
        | model     | price  |
        +-----------+--------+
        | RoadX     | 1200   |
        | SpeedPro  | 1500   |
        | TrailMax  | NULL   |
        | CityRide  | NULL   |
        +-----------+--------+
            
Which JOIN will return all records from the right table and only the matching records from the left table?
  • LEFT JOIN
  • INNER JOIN
  • RIGHT JOIN
  • CROSS JOIN
Answer: RIGHT JOIN

Query:

SELECT bikes.model, sales.price 
        FROM bikes 
        RIGHT JOIN sales ON bikes.id = sales.bike_id;
        Bikes
        +----+-----------+------+
        | id | model    | type |
        +----+-----------+------+
        | 1  | RoadX    | Road |
        | 2  | SpeedPro | Road |
        | 3  | TrailMax | MTB  |
        +----+-----------+------+
            
        Sales
        +----+----------+--------+
        | id | bike_id | price  |
        +----+----------+--------+
        | 1  | 1       | 1200   |
        | 2  | 2       | 1500   |
        | 3  | 5       | 800    |
        +----+----------+--------+
            
        Result
        +-----------+--------+
        | model     | price  |
        +-----------+--------+
        | RoadX     | 1200   |
        | SpeedPro  | 1500   |
        | NULL      | 800    |
        +-----------+--------+
            
Question?
SQL Aggregation
What is Aggregation?
  • Combines multiple values into a single summary
  • Helps in decision-making by reducing complexity
  • Used in databases, financial reports, and AI models
  • Common aggregation types: sum, min, max, avg, count, and grouping
What is a Metric?

A metric is a measurable value used to track performance

  • Example: Total sales, number of users
  • Aggregation: SUM(sales), COUNT(users)
What is a KPI?

A KPI (Key Performance Indicator) is a metric that measures success

  • Example: Revenue growth rate
  • Aggregation: AVG(revenue_growth)
What is a Target?

A target is a predefined goal for a metric or KPI

  • Example: Achieve $1M in sales
  • Aggregation: Compare SUM(sales) to target
What is a Benchmark?

A benchmark is a standard used for comparison

  • Example: Industry average sales
  • Aggregation: Compare AVG(company_sales) vs. AVG(industry_sales)
How It Relates to Aggregation
  • Metrics and KPIs rely on SUM, AVG, COUNT
  • Targets track progress using aggregated values
  • Benchmarks compare aggregated data across companies
SQL Aggregation Functions
  • SUM() – Adds up values (e.g., total sales)
  • AVG() – Calculates the average (e.g., average price)
  • COUNT() – Counts rows (e.g., number of orders)
  • MAX() – Finds the highest value (e.g., max salary)
  • MIN() – Finds the lowest value (e.g., min salary)
SQL Aggregation Questions
Question SQL Query
Total sales revenue (Metric)
SELECT SUM(sales) FROM orders;
Average order value (KPI)
SELECT AVG(sales) FROM orders;
Number of unique customers
SELECT COUNT(DISTINCT customer_id) FROM orders;
Highest sale recorded
SELECT MAX(sales) FROM orders;
Compare sales with target (Benchmark)
SELECT SUM(sales), 100000 AS target FROM orders;
What is GROUP BY?

The GROUP BY clause groups rows with the same values in specified columns

  • Used with aggregation functions like SUM(), COUNT(), AVG()
  • Groups data to analyze trends and summaries
  • Prevents duplicate calculations in reports
GROUP BY Syntax
        
        SELECT column, AGG_FUNCTION(column)
        FROM table
        GROUP BY column;
        
          
SELECT product, SUM(sales) AS total_sales FROM sales GROUP BY product;
        +----+------------+--------+
        | id | product   | sales  |
        +----+------------+--------+
        | 1  | Bike A    | 500    |
        | 2  | Bike B    | 700    |
        | 3  | Bike A    | 300    |
        | 4  | Bike C    | 900    |
        | 5  | Bike B    | 400    |
        +----+------------+--------+
              
        Result:
        +------------+-------------+
        | product   | total_sales |
        +------------+-------------+
        | Bike A    | 800        |
        | Bike B    | 1100       |
        | Bike C    | 900        |
        +------------+-------------+
              
HAVING Syntax
        
        SELECT column, AGG_FUNCTION(column)
        FROM table
        GROUP BY column
        HAVING condition;
        
          
        
        SELECT product, SUM(sales) AS total_sales 
        FROM sales 
        GROUP BY product
        HAVING SUM(sales) > 900;
        
          

Before GROUP BY

        +----+------------+--------+
        | id | product   | sales  |
        +----+------------+--------+
        | 1  | Bike A    | 500    |
        | 2  | Bike B    | 700    |
        | 3  | Bike A    | 300    |
        | 4  | Bike C    | 900    |
        | 5  | Bike B    | 400    |
        +----+------------+--------+
              

After GROUP BY + HAVING

        Result:
        +------------+-------------+
        | product   | total_sales |
        +------------+-------------+
        | Bike B    | 1100       |
        | Bike C    | 900        |
        +------------+-------------+
              
Question?
SQL Constraints
SQL CHECK Constraint

The CHECK constraint enforces rules on column values

  • Ensures data integrity by restricting invalid values
  • Applied at column or table level
  • Rejects any data that does not meet the condition
Example: CHECK Constraint in a Bike Store
        
        CREATE TABLE bikes (
            id INT PRIMARY KEY,
            model VARCHAR(50),
            price DECIMAL(10,2) CHECK (price > 0),
            stock INT CHECK (stock >= 0),
            category VARCHAR(20) CHECK (category IN ('Road', 'MTB', 'City', 'Hybrid')),
            discount DECIMAL(5,2) CHECK (discount BETWEEN 0 AND 50),
            weight DECIMAL(5,2) CHECK (weight > 5)
        );
        
          
Question?
SQL JOINS + Aggregation
Practice Code
PostgreSQL Spaceship
How to Get the Total Number of Missions for Each Role?
SQL Query

          SELECT A.role, COUNT(A.missions) AS total_missions
          FROM Astronauts A
          GROUP BY A.role;
          
Question?
Homework 2 and Lab 2
Team Project Part 1 - GitHub
Today: Homework 3, Lab 3, Project Part 1
Flower Watering System
Frontend
Backend
Database
Question?
Today: February 14, 2025
  • Use Case Scenario
  • Documentation
  • Homework 3, Lab 3, Team Project Part 1
Who has gone on the most space missions?

            SELECT name, missions FROM Astronauts 
            ORDER BY missions DESC LIMIT 1;
          
Which missions are still ongoing?

            SELECT name, destination FROM Missions 
            WHERE status = 'Ongoing';
          
Which scientists have never been on a mission?

            SELECT name FROM Astronauts 
            WHERE role = 'Scientist' AND missions = 0;
          
Which mission had the largest crew?

            SELECT name, crew FROM Missions 
            ORDER BY crew DESC LIMIT 1;
          
Which spaceships are out of fuel?

            SELECT name FROM Spaceship 
            WHERE fuelLevel = 0;
          
Which missions encountered hostile aliens?

            SELECT m.name, a.species FROM AlienEncounters a
            JOIN Missions m ON a.missionId = m.id 
            WHERE a.communication = 'Hostile';
          
List all engineers and the number of missions they have participated in, if any

            SELECT name, missions FROM Astronauts 
            WHERE role = 'Engineer';
          
Which teams have a score greater than 0?

            SELECT team, score FROM Score 
            WHERE score > 0;
          
Find all missions that visited planets starting with 'P'

            SELECT name, destination FROM Missions 
            WHERE destination LIKE 'P%';
          
List all missions along with any alien species encountered

            SELECT m.name, a.species FROM Missions m
            LEFT JOIN AlienEncounters a ON m.id = a.missionId;
          
Which mission had the smallest crew?

            SELECT name, crew FROM Missions 
            ORDER BY crew ASC LIMIT 1;
          
Which spaceships have more than 10 fuel?

            SELECT name, fuelLevel FROM Spaceship 
            WHERE fuelLevel > 10;
          
How many completed missions are there?

            SELECT COUNT(*) FROM Missions 
            WHERE status = 'Completed';
          
Find all engineers who have participated in at least one mission

            SELECT name FROM Astronauts 
            WHERE role = 'Engineer' AND missions > 0;
          
Which team has the highest score?

            SELECT team, score FROM Score 
            ORDER BY score DESC LIMIT 1;
          
Find all scientists and their total number of missions

            SELECT name, missions FROM Astronauts 
            WHERE role = 'Scientist';
          
Which mission had the most alien encounters?

            SELECT m.name, COUNT(a.id) AS encounters FROM Missions m
            JOIN AlienEncounters a ON m.id = a.missionId 
            GROUP BY m.name 
            ORDER BY encounters DESC LIMIT 1;
          
List all spaceships along with their type and fuel level

            SELECT name, shiptype, fuelLevel FROM Spaceship;
          
Find all missions that went to planets with names longer than 6 characters

            SELECT name, destination FROM Missions 
            WHERE LENGTH(destination) > 6;
          
Questions?
PostgreSQL Documentation
Homework 3, Lab 3, Team Project Part 1
Module 6
Relational Database Normalization
Today: February 19, 2025
  • Structured and Unstructured Data
  • Not Normalized: Redundant data and anomalies
  • First Normal Form (1NF): Atomic values, no repeating groups
  • Second Normal Form (2NF): Remove partial dependencies
  • Third Normal Form (3NF): Remove transitive dependencies
  • Practice Exercise: Normalize a sample table
  • Discussion: When to denormalize?
Database Engineering Process in Theory
+--------------+
| Requirements | -->
+--------------+
  
+-----+
| ERD | -->
+-----+
  
+----------+
| Database | -->
+----------+
  
+-------+
| Table | -->
+-------+
  
+------+
| Data |
+------+
              
Structured vs Unstructured Data
Structured (20%) vs. Unstructured Data (80%)
Characteristic Structured Data Unstructured Data
Format Organized (tables) Freeform (text, images)
Storage Relational databases (SQL) NoSQL databases, file systems
Querying Easy to query Difficult to analyze
Examples Spreadsheets, SQL databases Emails, social media posts, videos
Database Engineering Process in Practice
+------+
| Data | -->
+------+
            
+--------------+
| Requirements | -->
+--------------+
  
+-----+
| ERD | -->
+-----+
  
+----------+
| Database | -->
+----------+
  
+-------+
| Table | 
+-------+
  
Stuctured Data
+------------+----------------------+---------------------+--------------------+
| customer   | products             | phone               | address            |
+------------+----------------------+---------------------+--------------------+
| Alice      | Laptop, Mouse        | 123-456-7890        | 12 Main St, NY     |
| Bob        | Phone, Charger       | 987-654-3210        | 34 Elm St, CA      |
| Alice      | Keyboard, Monitor    | 123-456-7890        | 12 Main St, NY     |
+------------+----------------------+---------------------+--------------------+
          
How to query all keyboard products?
Stuctured Data
+------------+----------------------+---------------------+--------------------+
| customer   | products             | phone               | address            |
+------------+----------------------+---------------------+--------------------+
| Alice      | Laptop, Mouse        | 123-456-7890        | 12 Main St, NY     |
| Bob        | Phone, Charger       | 987-654-3210        | 34 Elm St, CA      |
| Alice      | Keyboard, Monitor    | 123-456-7890        | 12 Main St, NY     |
+------------+----------------------+---------------------+--------------------+
          
How to query all products of the customer 1?
Stuctured Data
+------------+----------------------+---------------------+--------------------+
| customer   | products             | phone               | address            |
+------------+----------------------+---------------------+--------------------+
| Alice      | Laptop, Mouse        | 123-456-7890        | 12 Main St, NY     |
| Bob        | Phone, Charger       | 987-654-3210        | 34 Elm St, CA      |
| Alice      | Keyboard, Monitor    | 123-456-7890        | 12 Main St, NY     |
+------------+----------------------+---------------------+--------------------+
          
How to query all products of the customers in CA?

Problems:

  • Repeated customer details (Alice appears twice)
  • Multiple products in a single field
  • Difficult to update customer information
  • We can't query the data with SQL
Data is Structured 🙂
Data is not Normalized 😞
Data Engineering Process in Practice
        +--------------+
        | Unstructured | -->
        +--------------+
                  
+------------+
| Structured | -->
+------------+
                  
+--------------+
| Unnormalized | -->
+--------------+
                  
+-----+
| NF1 |
+-----+
                  
First Normal Form (NF1)

Definition: A table is in First Normal Form when all its attributes contain only atomic (indivisible) values, and each record is unique

Key Characteristics of NF1
  • Each column contains atomic values (no repeating groups or arrays)
  • Each column must hold values of a one type
  • Each row must be unique, typically ensured by a primary key
  • Order of rows and columns does not affect the data's significance
Example of NF1
+----+-----------+-----------------+
| id | name      | hobbies         |
+----+-----------+-----------------+
| 1  | Alice     | Reading         |
| 2  | Bob       | Cycling         |
| 3  | Charlie   | Reading, Hiking |
+----+-----------+-----------------+
        
+----+-----------+-----------------+
| id | name      | hobbies         |
+----+-----------+-----------------+
| 1  | Alice     | Reading         |
| 2  | Bob       | Cycling         |
| 3  | Charlie   | Reading         |   
| 4  | Charlie   | Hiking          |     
+----+-----------+-----------------+
        
Data in NF1
+------------+---------+----------------+-----------+------------+-------+
| customer_id| name    | phone          | products  | city       | state |
+------------+---------+----------------+-----------+------------+-------+
| 1          | Alice   | 123-456-7890   | Laptop    | New York   | NY    |
| 2          | Alice   | 123-456-7890   | Mouse     | New York   | NY    |
| 3          | Bob     | 987-654-3210   | Phone     | California | CA    |
| 4          | Bob     | 987-654-3210   | Charger   | California | CA    |
| 5          | Alice   | 123-456-7890   | Keyboard  | New York   | NY    |
| 6          | Alice   | 123-456-7890   | Monitor   | New York   | NY    |
+------------+---------+----------------+-----------+------------+-------+
          
How to query all keyboard products?
Data in NF1
+------------+---------+----------------+-----------+------------+-------+
| customer_id| name    | phone          | products  | city       | state |
+------------+---------+----------------+-----------+------------+-------+
| 1          | Alice   | 123-456-7890   | Laptop    | New York   | NY    |
| 2          | Alice   | 123-456-7890   | Mouse     | New York   | NY    |
| 3          | Bob     | 987-654-3210   | Phone     | California | CA    |
| 4          | Bob     | 987-654-3210   | Charger   | California | CA    |
| 5          | Alice   | 123-456-7890   | Keyboard  | New York   | NY    |
| 6          | Alice   | 123-456-7890   | Monitor   | New York   | NY    |
+------------+---------+----------------+-----------+------------+-------+
          
How to query all products of the customer 1?
Data in NF1
+------------+---------+----------------+-----------+------------+-------+
| customer_id| name    | phone          | products  | city       | state |
+------------+---------+----------------+-----------+------------+-------+
| 1          | Alice   | 123-456-7890   | Laptop    | New York   | NY    |
| 2          | Alice   | 123-456-7890   | Mouse     | New York   | NY    |
| 3          | Bob     | 987-654-3210   | Phone     | California | CA    |
| 4          | Bob     | 987-654-3210   | Charger   | California | CA    |
| 5          | Alice   | 123-456-7890   | Keyboard  | New York   | NY    |
| 6          | Alice   | 123-456-7890   | Monitor   | New York   | NY    |
+------------+---------+----------------+-----------+------------+-------+
          
How to query all products of the customers in CA?
Unnormalized Data to First Normal Form (1NF)
Step Description Example
1. Identify repeating groups Find attributes with multiple values in a single row ❌ Alice → Laptop, Mouse
2. Remove repeating groups Split multi-valued attributes into separate rows ✔ Alice → Laptop ✔ Alice → Mouse
3. Ensure atomic values Each field should hold a single value ❌ "New York, NY 10001" → ✔ "New York" (city), "NY" (state), "10001" (zip)
4. Add a unique identifier Ensure each row has a primary key ✔ Add "purchase_id"
5. Remove duplicate rows Ensure no duplicate records exist ❌ Alice, Laptop (x2) → ✔ Remove extra entry
1NF Enables Full SQL Usage

Once data is in 1NF:

  • Meets relational algebra requirements
  • Has atomic values and no repeating groups
  • SQL can be fully utilized
History of First Normal Form (1NF)
  • 1970 – Edgar F. Codd introduces 1NF in his relational model
  • 1971 – Codd refines 1NF, emphasizing atomic values and eliminating repeating groups
  • 1980s – 1NF becomes a standard for relational databases
  • 1990s – SQL databases widely adopt 1NF for data consistency
  • 2000s-Present – 1NF remains essential, even as denormalized data emerges
Is NF1 sufficient?
+------------+---------+----------------+-----------+------------+-------+
| customer_id| name    | phone          | products  | city       | state |
+------------+---------+----------------+-----------+------------+-------+
| 1          | Alice   | 123-456-7890   | Laptop    | New York   | NY    |
| 2          | Alice   | 123-456-7890   | Mouse     | New York   | NY    |
| 3          | Bob     | 987-654-3210   | Phone     | California | CA    |
| 4          | Bob     | 987-654-3210   | Charger   | California | CA    |
| 5          | Alice   | 123-456-7890   | Keyboard  | New York   | NY    |
| 6          | Alice   | 123-456-7890   | Monitor   | New York   | NY    |
+------------+---------+----------------+-----------+------------+-------+
          
Historical Cost of Storage (1MB)
  • 1970 – $200,000 (Magnetic Core, Early HDDs)
  • 1980 – $10,000 (HDDs, Floppy Disks)
  • 1990 – $1,000 → $1 (HDDs, CDs)
  • 2000 – $0.10 (HDDs, Flash Storage)
  • 2020 – $0.00005 (SSDs, Cloud Storage)
Early Storage Was Expensive
  • Storage was costly, making efficiency crucial
  • Data redundancy had to be avoided to save space
Redundant Data
+------------+---------+----------------+-----------+------------+-------+
| customer_id| name    | phone          | products  | city       | state |
+------------+---------+----------------+-----------+------------+-------+
| 1          | Alice   | 123-456-7890   | Laptop    | New York   | NY    |
| 2          | Alice   | 123-456-7890   | Mouse     | New York   | NY    |
| 3          | Bob     | 987-654-3210   | Phone     | California | CA    |
| 4          | Bob     | 987-654-3210   | Charger   | California | CA    |
| 5          | Alice   | 123-456-7890   | Keyboard  | New York   | NY    |
| 6          | Alice   | 123-456-7890   | Monitor   | New York   | NY    |
+------------+---------+----------------+-----------+------------+-------+
          
Second Normal Form (2NF)
  • Remove partial dependencies
  • Non-key attributes must depend on the entire primary key
2NF: Eliminating Partial Dependencies
Customers
+------------+----------------+-----------+
| customer_id| phone          | city      |
+------------+----------------+-----------+
| 1          | 123-456-7890   | New York  |
| 3          | 987-654-3210   | California|
+------------+----------------+-----------+
              
Products
+------------+-----------+
| product_id | product   |
+------------+-----------+
| 1          | Laptop    |
| 2          | Mouse     |
| 3          | Phone     |
+------------+-----------+
              
Orders
+----------+-------------+
| order_id | customer_id |
+----------+-------------+
| 1        | 1           |
| 2        | 3           |
+----------+-------------+
              
Composite Keys & Storage
  • A composite key uses multiple columns as a unique identifier
  • A lot intermediate tables in many-to-many relationships
  • Reduces storage by avoiding redundant IDs
Defining a Composite Key in SQL

        CREATE TABLE OrderDetails (
            order_id INT,
            product_id INT,
            quantity INT,
            PRIMARY KEY (order_id, product_id)
        );
      
First Normal Form (1NF) to Second Normal Form (2NF)
Step Description Example
1. Identify partial dependencies Find attributes that depend on part of a composite key ❌ (order_id, product_id) → product_name
2. Remove partial dependencies Move dependent attributes to a new table ✔ Create "Products" table with product_id → product_name
3. Ensure full dependency All attributes must depend on the full primary key ❌ order_id → customer_name → ✔ Store customer_name in "Customers"
4. Split into separate tables Break the table into independent entities ✔ Orders, Customers, and Products as separate tables
5. Use foreign keys Link tables using primary and foreign keys ✔ orders.customer_id → customers.customer_id
2NF: Eliminating Partial Dependencies
Customers
+------------+----------------+-----------+
| customer_id| phone          | city      |
+------------+----------------+-----------+
| 1          | 123-456-7890   | New York  |
| 3          | 987-654-3210   | California|
+------------+----------------+-----------+
              
Products
+------------+-----------+
| product_id | product   |
+------------+-----------+
| 1          | Laptop    |
| 2          | Mouse     |
| 3          | Phone     |
+------------+-----------+
              
Orders
+----------+-------------+
| order_id | customer_id |
+----------+-------------+
| 1        | 1           |
| 2        | 3           |
+----------+-------------+
              
History of Second Normal Form (2NF)
  • 1971 – Edgar F. Codd defines 2NF to remove partial dependencies
  • 1980s – 2NF becomes widely used in relational databases to reduce redundancy
  • 1990s – SQL databases formally implement 2NF principles
  • 2000s-Present – 2NF standard but often relaxed for performance optimization
Why is 2NF Not Enough?
  • Does 2NF eliminate all redundancy?
  • Can transitive dependencies still exist?
  • What problems remain in 2NF?
Why is 2NF Not Enough?
          +------------+----------------+--------------+-------+
          | customer_id| phone          | city         | state |
          +------------+----------------+--------------+-------+
          | 1          | 123-456-7890   | New York     | NY    |
          | 3          | 987-654-3210   | Los Angeles  | CA    |
          +------------+----------------+--------------+-------+
              
Does City Depend on State?

State depends on City, not the other way around

  • Each city belongs to one state
  • A state has many cities
  • Knowing the city → you can find the state
  • Knowing the state → you cannot find a unique city
Third Normal Form (3NF)
  • Remove transitive dependencies
  • Non-key attributes must depend only on the primary key
3NF Fixed: Remove Transitive Dependency
Customers (Fixed)
+------------+----------------+---------+
| customer_id| phone          | city_id |
+------------+----------------+---------+
| 1          | 123-456-7890   | 101     |
| 3          | 987-654-3210   | 102     |
+------------+----------------+---------+
              
Cities (New Table)
        +---------+-------------+-------+
        | city_id | city       | state  |
        +---------+-------------+-------+
        | 101     | New York   | NY     |
        | 102     | Los Angeles| CA     |
        +---------+-------------+-------+
              

A table in NF1 is in 3NF if all non-key attributes depend only on the primary key (NF2) and do not depend on other non-key attributes (NF3)

Second Normal Form (2NF) to Third Normal Form (3NF)
Step Description Example
1. Identify transitive dependencies Find attributes that depend on a non-key attribute instead of the primary key ❌ customer_id → city → state
2. Remove transitive dependencies Move dependent attributes to a separate table ✔ Create "Cities" table with city → state
3. Ensure direct dependency All non-key attributes must depend only on the primary key ❌ customer_id → city → state → ✔ Store city_id in "Customers"
4. Split into separate tables Move transitive dependencies to their own entities ✔ Customers, Orders, and Cities as separate tables
5. Use foreign keys Link tables using primary and foreign keys ✔ customers.city_id → cities.city_id
History of Third Normal Form (3NF)
  • 1971 – Edgar F. Codd introduces 3NF to remove transitive dependencies
  • 1980s – 3NF becomes widely adopted to improve data integrity
  • 1990s – SQL databases implement 3NF as a best practice
  • 2000s-Present – 3NF remains a standard, though often denormalized for performance
Database Normal Forms
Normal Form Summmary
1NF Eliminate repeating groups; ensure atomic values
2NF Remove partial dependencies; all attributes must depend on the whole primary key
3NF Remove transitive dependencies; non-key attributes must depend only on the primary key
BCNF Every determinant must be a candidate key; stricter than 3NF
4NF Eliminate multi-valued dependencies; each row represents a single fact
5NF Decompose tables to remove redundancy while preserving all joins
Questions?
Unnormalized, 1NF, 2NF, or 3NF?
Orders
+----------+------------+-----------------------+
| order_id | customer  | products               |
+----------+------------+-----------------------+
| 1        | Alice     | Laptop, Mouse          |
| 2        | Bob       | Phone, Charger         |
+----------+------------+-----------------------+
              
Unnormalized, 1NF, 2NF, or 3NF?
Orders
+----------+-----------+-------------+
| order_id | customer  | product_id  |
+----------+------------+------------+
| 1        | Alice     | 101         |
| 1        | Alice     | 102         |
| 2        | Bob       | 103         |
| 2        | Bob       | 104         |
+----------+-----------+-------------+
              
Unnormalized, 1NF, 2NF, or 3NF?
Orders
+----------+------------+----------+-------+
| order_id | customer  | product  | city   |
+----------+------------+----------+-------+
| 1        | Alice     | Laptop   | NY     |
| 1        | Alice     | Mouse    | NY     |
| 2        | Bob       | Phone    | CA     |
| 2        | Bob       | Charger  | CA     |
+----------+------------+----------+-------+
              
Unnormalized, 1NF, 2NF, or 3NF?
Customers
+------------+----------------+-------------+-------+---------+
| customer_id| phone          | city        | state | zip_code|
+------------+----------------+-------------+-------+---------+
| 1          | 123-456-7890   | New York    | NY    | 10001   |
| 3          | 987-654-3210   | Los Angeles | CA    | 90001   |
+------------+----------------+-------------+-------+---------+
              
Orders
    +----------+-------------+
    | order_id | customer_id |
    +----------+-------------+
    | 1        | 1           |
    | 2        | 3           |
    +----------+-------------+
              
Unnormalized, 1NF, 2NF, or 3NF?
Customers
+------------+----------------+---------+
| customer_id| phone          | city_id |
+------------+----------------+---------+
| 1          | 123-456-7890   | 101     |
| 3          | 987-654-3210   | 102     |
+------------+----------------+---------+
              
Cities
      +---------+-------------+---------+
      | city_id | city       | zip_code |
      +---------+-------------+---------+
      | 101     | New York   | 10001    |
      | 102     | Los Angeles| 90001    |
      +---------+-------------+---------+
              
Questions?
Database Engineering Process in Practice
+------+
| Data | -->
+------+
            
+--------------+
| Requirements | -->
+--------------+
  
+-----+
| ERD | -->
+-----+
  
+----------+
| Database | -->
+----------+
  
+-------+
| Table | 
+-------+
  
Discussion: When to Normalize or denormalize?
Questions?
Recursive JOINS
Sample Data: Car Parts Hierarchy
        +---------+------------+---------------+
        | part_id | name       | parent_part_id|
        +---------+------------+---------------+
        | 1       | Car        | NULL          |  <-- Root (Top-Level Part)
        | 2       | Engine     | 1             |  <-- Engine belongs to Car
        | 3       | Wheel      | 1             |  <-- Wheel belongs to Car
        | 4       | Piston     | 2             |  <-- Piston belongs to Engine
        | 5       | Bolt       | 4             |  <-- Bolt belongs to Piston
        +---------+------------+---------------+
          
Recursive Self-JOIN with Levels

        WITH RECURSIVE CarParts AS (
            -- Start with the whole car (top-level part)
            SELECT t1.part_id, t1.name AS part_name, 
                   t1.parent_part_id AS parent_id, 
                   NULL AS parent_name, 1 AS level
            FROM parts AS t1
            WHERE t1.parent_part_id IS NULL  
        
            UNION ALL
        
            -- Recursively join parts to their parent parts and track levels
            SELECT t2.part_id, t2.name, t2.parent_part_id, 
                   c.part_name AS parent_name, c.level + 1
            FROM parts AS t2
            JOIN CarParts AS c ON t2.parent_part_id = c.part_id
        )
        SELECT * FROM CarParts;
          
Recursive Self-JOIN Output

        +---------+------------+-----------+-------------+--------+
        | part_id | part_name  | parent_id | parent_name | level  |
        +---------+------------+-----------+-------------+--------+
        | 1       | Car        | NULL      | NULL        | 1      |
        | 2       | Engine     | 1         | Car         | 2      |
        | 3       | Wheel      | 1         | Car         | 2      |
        | 4       | Piston     | 2         | Engine      | 3      |
        | 5       | Bolt       | 4         | Piston      | 4      |
        +---------+------------+-----------+-------------+--------+
          
February 21, 2025
  • Practice Question
  • Lab Coding
  • Project Part 1 (Mar/21), Homework 3 (Feb/24), Lab 3 (Feb/24)
What Normal Form is This?
                +----------+------------+------------------+
                | order_id | customer  | products          |
                +----------+------------+------------------+
                | 1        | Alice     | Laptop, Mouse     |
                | 2        | Bob       | Phone, Charger    |
                +----------+------------+------------------+
  
Show Answer

Unnormalized (UNF) – The "products" contains multiple values

What Normal Form is This?
                +----------+------------+----------+-------+
                | order_id | customer  | product  | city   |
                +----------+------------+----------+-------+
                | 1        | Alice     | Laptop   | NY     |
                | 1        | Alice     | Mouse    | NY     |
                | 2        | Bob       | Phone    | CA     |
                | 2        | Bob       | Charger  | CA     |
                +----------+------------+----------+-------+
  
Show Answer

2NF – "city" depends on "customer"

Which of the following violates 1NF?

  • A table with a primary key
  • A table where a column stores multiple values (e.g., "Engine, Wheel")
  • A table with foreign keys
  • A table without NULL values
Show Answer

Answer: A

How do you fix a table that violates 1NF?

  • Remove duplicate rows
  • Store multi-valued attributes in separate rows
  • Use a composite primary key
  • Merge columns to reduce table size
Show Answer

Answer: B

Which table violates 2NF?

  • A table where all attributes depend on the full primary key
  • A table with a non-key column that depends on only part of a composite key
  • A table with foreign keys
  • A table with NULL values
Show Answer

Answer: B

How do you convert a 1NF table to 2NF?

  • Remove transitive dependencies
  • Ensure all non-key attributes depend on the whole primary key
  • Remove foreign keys
  • Merge duplicate records
Show Answer

Answer: B

Which table violates 3NF?

  • A table where non-key attributes depend only on the primary key
  • A table where a non-key column depends on another non-key column
  • A table with foreign keys
  • A table without NULL values
Show Answer

Answer: B

How do you fix a table that violates 3NF?

  • Remove attributes that do not depend on the primary key
  • Create separate tables for attributes with transitive dependencies
  • Add composite keys
  • Remove NULL values
Show Answer

Answer: B

Which of these real-world cases breaks 1NF?

  • A spaceship database storing multiple crew names in one column
  • A database with a primary key
  • A table using foreign keys
  • A table with atomic values
Show Answer

Answer: A

What is the key difference between 2NF and 3NF?

  • 2NF focuses on foreign keys, 3NF removes NULL values
  • 2NF only works for composite keys
  • 3NF does not allow primary keys
  • 2NF removes partial dependencies, 3NF removes transitive dependencies
Show Answer

Answer: D

Which is NOT a benefit of normalization?

  • Reduces redundancy
  • Improves consistency
  • Increases update anomalies
  • Improves query performance
Show Answer

Answer: C

Which is a possible downside of normalization?

  • Too many joins can slow down queries
  • It increases redundancy
  • It makes updates inconsistent
  • It prevents data integrity
Show Answer

Answer: A

Functional vs. Semantic Dependency
  • Functional Dependency (2NF) → Every non-key attribute depends on full primary key
  • Semantic Dependency (3NF) → Every non-key attribute depends only on the primary key, not on other non-key attributes
Questions?
        +------+
        | Data | -->
        +------+
            
+-----+
| NF1 | -->
+-----+
            
+--------------+
| Requirements | -->
+--------------+
  
+-----------+
| NF2 + NF3 | -->
+-----------+
  
+-------+
| Table | 
+-------+
  
Projection Operation
Projection Operation
Unnormalized Space Mission Data

🔗 View Dataset on GitHub

        +------+
        | Data | -->
        +------+
            
+-----+
| NF1 | -->
+-----+
            
+--------------+
| Requirements | -->
+--------------+
  
+-----------+
| NF2 + NF3 | -->
+-----------+
  
+-------+
| Table | 
+-------+
  
        +------+
        | Data | -->
        +------+
            
+-----+
| NF1 | -->
+-----+
            
+--------------+
| Requirements | -->
+--------------+
  
+-----------+
| NF2 + NF3 | -->
+-----------+
  
+-------+
| Table | 
+-------+
  

Which missions had the most hostile alien encounters?

        +------+
        | Data | -->
        +------+
            
+-----+
| NF1 | -->
+-----+
            
+--------------+
| Requirements | -->
+--------------+
  
+-----------+
| NF2 + NF3 | -->
+-----------+
  
+-------+
| Table | 
+-------+
  
        +------+
        | Data | -->
        +------+
            
+-----+
| NF1 | -->
+-----+
            
+--------------+
| Requirements | -->
+--------------+
  
+-----------+
| NF2 + NF3 | -->
+-----------+
  
+-------+
| Table | 
+-------+
  

🔗 SQL Database

Module 7
Relational Database Table Indexing
February 24, 2025
  • Practice Question
  • Database Table Indexing
  • Project Part 1 (Mar/21), Homework 3 (Feb/24), Lab 3 (Feb/24)

Table: employees

employee_id name department salary
1 Alice HR 60000
2 Bob IT 75000
3 Charlie Finance 82000

Which SQL query retrieves the names of employees in the IT department?

Table: orders

order_id customer_name order_date total_amount
101 John 2024-02-10 250.50
102 Emma 2024-02-11 180.75
103 Michael 2024-02-12 99.99

Write an SQL query to find the total sales amount for all orders placed after 2024-02-10

Table: students

student_id name age gpa
1 Jane 21 3.8
2 Tom 22 3.5
3 Lucy 20 3.9

Write an SQL query to count how many students have a GPA greater than 3.6

Table: products

product_id product_name category price
1 Laptop Electronics 1200.00
2 Headphones Electronics 150.00
3 Shoes Fashion 80.00

Which SQL query retrieves the most expensive product in the Electronics category?

Table: customers

customer_id name city
1 Emma New York
2 Daniel Los Angeles
3 Olivia New York

How would you write an SQL query to count how many customers are from New York?

Which action best helps a table comply with 3NF?

  • Combine all attributes into a single column
  • Remove transitive dependencies by creating new tables
  • Allow multi-valued attributes
  • Delete the primary key
Show Answer

Answer: B

How do you resolve a table that violates 2NF?

  • Remove NULL values from all columns
  • Split the table into separate tables based on partial dependencies
  • Add foreign keys to every table
  • Ensure all attributes depend on a composite key
Show Answer

Answer: B

What is the first step to ensure a table meets 1NF requirements?

  • Remove all non-key attributes
  • Eliminate repeating groups and ensure atomic values
  • Create a composite key with multiple columns
  • Add transitive dependencies
Show Answer

Answer: B

What should you do to normalize a table from 1NF to 2NF?

  • Ensure all non-key attributes are fully dependent on the entire primary key
  • Add duplicate rows for clarity
  • Introduce NULL values in key columns
  • Remove all foreign keys
Show Answer

Answer: A

What is the best way to choose a primary key in an entity-relationship diagram (ERD)?

  • Use a unique and immutable attribute
  • Use a frequently updated attribute
  • Choose any attribute arbitrarily
  • Use a foreign key from another table
Show Answer

Answer: A

Which SQL query retrieves all employees with a salary greater than 50,000?

  • SELECT salary FROM employees WHERE salary > 50000;
  • SELECT * FROM employees WHERE salary > 50000;
  • SELECT ALL FROM employees WHERE salary > 50000;
  • SELECT employees FROM salary WHERE salary > 50000;
Show Answer

Answer: B

Which SQL join returns only matching records from both tables?

  • LEFT JOIN
  • RIGHT JOIN
  • INNER JOIN
  • FULL OUTER JOIN
Show Answer

Answer: C

Which statement best describes a foreign key?

  • A unique identifier for each row in a table
  • A column that establishes a relationship with a primary key in another table
  • A key that allows NULL values only
  • A key that must always be auto-incremented
Show Answer

Answer: B

Which SQL query correctly finds the total number of employees?

  • SELECT COUNT(*) FROM employees;
  • SELECT SUM(*) FROM employees;
  • SELECT COUNT(salary) FROM employees;
  • SELECT TOTAL(*) FROM employees;
Show Answer

Answer: A

How do you fix a table that violates 3NF?

  • Remove attributes that do not depend on the primary key
  • Create separate tables for attributes with transitive dependencies
  • Add composite keys
  • Remove NULL values
Show Answer

Answer: B

Which SQL query retrieves departments with more than 10 employees?

  • SELECT department, COUNT(*) FROM employees GROUP BY department WHERE COUNT(*) > 10;
  • SELECT department, COUNT(*) FROM employees GROUP BY department HAVING COUNT(*) > 10;
  • SELECT department FROM employees WHERE COUNT(*) > 10;
  • SELECT * FROM employees WHERE department COUNT > 10;
Show Answer

Answer: B

Which condition must be met for a table to be in 1st Normal Form (1NF)?

  • No repeating groups or arrays
  • Each non-key column must depend on the whole primary key
  • No transitive dependencies
  • Each column must be a foreign key
Show Answer

Answer: A

Database Table Indexing

How does row count affect query performance?

Table 1: 5 Rows

ID Name Value
1 Alice 100
2 Bob 200
3 Charlie 150
4 Dave 300
5 Eve 250

Table 2: 50 Rows

ID Name Value
1 Alice 100
2 Bob 200
3 Charlie 150
4 Dave 300
5 Eve 250
...45 more rows

Table 3: 500 Rows

ID Name Value
1 Alice 100
2 Bob 200
3 Charlie 150
4 Dave 300
5 Eve 250
...495 more rows
Full Table Scan Time Complexity

Time estimates assuming 0.01ms (10µs) per row

Table Row Count (n) Operation Time Complexity (Big O) Total Time
Table A 10,000 Full Table Scan O(n) = O(10,000) 100ms
Table B 100,000 Full Table Scan O(n) = O(100,000) 1,000ms (1s)
Table C 1,000,000 Full Table Scan O(n) = O(1,000,000) 10,000ms (10s)

Real-world examples of data generation

Example Data Rate Records per Second Time to 1 Million Records
GPS Data (Driving, 1Hz) 1 record per second 1 1,000,000s (~11.6 days)
GPS Data (Driving, 10Hz) 10 records per second 10 100,000s (~1.16 days)
Social Media Posts (Busy Platform) 1,000 posts per second 1,000 1,000s (~16.7 minutes)
IoT Sensor (High-Frequency) 10,000 readings per second 10,000 100s (~1.67 minutes)
SQL Table as a List of Rows
ID Latitude Longitude Timestamp PostText SensorValue Source
1 40.7128 -74.0060 2025-02-23 10:00:00 NULL NULL GPS (10Hz)
2 40.7130 -74.0058 2025-02-23 10:00:00.1 NULL NULL GPS (10Hz)
3 NULL NULL 2025-02-23 10:00:01 Great day! NULL Social Media
4 NULL NULL 2025-02-23 10:00:02.001 NULL 23.5 IoT Sensor
1000000 41.0000 -73.0000 2025-02-24 15:23:45 NULL NULL GPS (10Hz)
How Time Complexity Grows (1 to 10k)
Big O n = 1 n = 100 n = 10,000
O(1) 1 1 1
O(log n) [base 2] 0 ~6.6 ~13.3
O(n) 1 100 10,000
O(n log n) [base 2] 0 ~660 ~133,000
O(n²) 1 10,000 100,000,000
From Table to Search Tree data structure
SQL Table Example
SQL Table Example
FlowerID (Indexed) Name Color Price
101 Rose Red 2.50
305 Tulip Yellow 1.75
204 Lily White 3.00
FlowerID Name Color (Indexed) Price
101 Rose Red 2.50
102 Tulip Red 1.75
204 Lily White 3.00
205 Daisy Red 1.25
Indexing an Attribute with Multiple Value Occurrences
SQL Table Example
Table with Scattered Primary Keys

Color is indexed, FlowerID varies widely

FlowerID (PK) Name Color (Indexed) Price
101 Rose Red 2.50
5000 Tulip Red 1.75
999999 Daisy Red 1.25
204 Lily White 3.00
How a Color Index Looks (B-tree)

A B-tree groups Color values with pointers

B-tree Level Structure
Root [Blue, Red, White, Green]
Leaf (Blue) "Blue" → [300, 7500]
Leaf (Red) "Red" → [101, 5000, 999999]
Leaf (White) "White" → [204, 15000]
Leaf (Green) "Green" → [405, 6000, 888888]
Indexed Attribute Lookup

An indexed attribute can be found in log(n) steps using a B-tree

MariaDB: A Chronological Overview
Year Milestone
2009 MariaDB project founded by Michael "Monty" Widenius after leaving MySQL
2010 MariaDB 5.1 released as a drop-in replacement for MySQL
2012 Debian and Fedora replace MySQL with MariaDB as the default database
2013 MariaDB 10 introduced with improved performance and replication
2015 MariaDB Corporation formed, MariaDB 10.1 released with encryption features
2017 MariaDB 10.2 adds JSON support and Window functions
2019 MariaDB 10.4 introduces system-versioned tables and new authentication
2022 MariaDB 10.9+ improves replication and backup features
SQLite vs PostgreSQL vs MariaDB
Feature SQLite PostgreSQL MariaDB
License Public Domain PostgreSQL License (MIT-like) GPLv2 (Community), Enterprise License
Storage Type Single-file DB Client-Server Client-Server
Scalability Not scalable (Single-file) Highly scalable (Multi-threaded, Multi-node) Scalable (Replication, Clustering)
ACID Compliance Partial (Write-ahead logging) Fully ACID-compliant Fully ACID-compliant
Concurrency Single-threaded Multi-threaded with MVCC Multi-threaded with MVCC
Performance Fast for small DBs Optimized for complex queries Optimized for OLTP workloads
Use Cases Embedded systems, Mobile apps Data Warehousing, Enterprise Apps Web apps, MySQL replacement
Replication Not supported Asynchronous, Streaming Replication Supported
MariaDB Licensing
  • GPLv2 (Community Edition): Open-source, free use under GNU General Public License v2
  • Enterprise License: Commercial license additional features, support, and security patches
MariaDB Main Repository
MariaDB & GPLv2 License

Understanding the implications of the GNU General Public License v2

What Does GPLv2 Mean?

  • Freedom to Use & Modify - Anyone can use and modify MariaDB
  • Source Code Availability - The source code must be accessible
  • Copyleft Requirement - Modified versions must also be GPLv2
  • No Proprietary Forks - Cannot relicense as closed-source
  • Enterprise Licensing - Commercial options exist for proprietary use
Questions?
What is the primary purpose of an index in a database?
  • To store additional copies of data
  • To improve query performance
  • To increase storage usage only
  • To prevent data duplication
Show Answer

Answer: B (To improve query performance)

Which type of index ensures that all values in a column are unique?
  • Full-text index
  • Composite index
  • Unique index
  • Primary key index
Show Answer

Answer: C (Unique index)

Questions?
Indexing in SQLite, PostgreSQL, and MariaDB
Database Index Creation
SQLite CREATE INDEX idx_color ON Flowers(color);
PostgreSQL CREATE INDEX idx_color ON Flowers(color);
MariaDB CREATE INDEX idx_color ON Flowers(color);
Database Table Indexing with MariaDB

MariaDB OneCompiler
GitHub: SE4CPS DMS Database

Why Not Index All Attributes?
February 26, 2025
  • Practice Question
  • Review Homework 3, Lab 3
  • Review Modules 1 - 7
  • Review Project Part 1 (Mar/21)
  • Review Homework 4, Lab 4

Which SQL query is used to define an index?

  • CREATE INDEX idx_name ON table_name(column_name);
  • ALTER TABLE table_name ADD INDEX idx_name (column_name);
  • INSERT INTO table_name (index_name) VALUES ('idx_name');
  • UPDATE table_name SET index_name = 'idx_name' WHERE column_name = value;
Show Answer

Correct answers: A

Which data structure is most commonly used to store indexes in SQL databases?

  • Hash Table
  • Linked List
  • Binary Search Tree
  • B-Tree
Show Answer

Correct answer: D

What is the key difference between a Binary Tree and a B-Tree?

  • Binary Tree has at most two children per node, a B-Tree can have multiple children per node
  • A B-Tree is always balanced, whereas a Binary Tree may not be balanced
  • B-Tree is optimized for disk storage, whereas a Binary Tree is primarily used in memory
  • All of the above
Show Answer

Correct answer: D

What is the time complexity for finding a value in an indexed attribute versus a non-indexed attribute?

  • Indexed: O(n), Non-Indexed: O(log n)
  • Indexed: O(log n), Non-Indexed: O(n)
  • Indexed: O(1), Non-Indexed: O(n log n)
  • Indexed: O(n log n), Non-Indexed: O(1)
Show Answer

Correct answer: B

How does adding an index affect the storage complexity of a database?

  • It increases storage complexity to O(n log n)
  • It does not affect storage complexity
  • It reduces storage complexity to O(log n)
  • It decreases storage complexity to O(1)
Show Answer

Correct answer: A

In which scenario is adding an index not recommended?

  • When the table has frequent inserts, updates, and deletes
  • When the table is mostly used for read queries
  • When the table has a large number of rows
  • When queries often filter data using WHERE clauses
Show Answer

Correct answer: A

SQLite Index Performance Test

GitHub Repository

  • Creating and resetting an SQLite database
  • Inserting 100 million rows efficiently
  • Comparing query performance with and without indexing
Questions?
Homework 3
Question 1

Please read page 128 in Textbook 1 on Natural Joins and explain one risk when using a Natural Join

Without explicitly specifying the join condition, there is a risk of accidentally choosing the same name for a new column to be added to the table even though the columns are unrelated to one another. In which case there would be unintended inclusion or exclusion of columns due to automatically matched column names.

Question 2

Please read page 141 in Textbook 1 on Multi-Table Query Performance and explain which types of queries typically require JOINS

Queries especially relevant to related information or simply that we are retrieving information from multiple sources. Querying aggregated data from multiple tables to perform calculations would require joins. The act of filtering data would also require joins because we can apply conditions that help eliminate duplicates or unnecessary information

Question 3

Please read page 149 in Textbook 1 with Figure 7-12 and explain what unmatched rows mean and how an OUTER JOIN handles them

Unmatched rows are rows in a table that do not have corresponding matches in another table, Outer join handles them by providing a way to keep them. This is done by filling the missing values from the unmatched table with NULLs

Question 4

Please read pages 67–71 in Textbook 2 on GROUP BY and explain the difference between the two queries:

            SELECT type_id, COUNT(*) FROM foods GROUP BY type_id HAVING COUNT(*) < 20;
            SELECT type_id, COUNT(*) FROM foods;
          

The major difference between the two is that one uses GROUP BY and HAVING and the other is a simple select query. The use of GROUP BY and HAVING allows us to aggregate the rows in foods by type_id with the condition of “having” the count of less than 20. The second query doesn’t aggregate or filter the data. The output is very different

Question 5

"For the modification to succeed, the expressions for all constraints must evaluate to true."

In the event where the values of expressions are compared, all constraints for the expressions must evaluate to true for the test of comparison to evaluate to true. Check involves verifying data integrity and so if all the constraints are met then the result is true otherwise false or null. The idea of a modification could be an insert or update, if any of the constraints evaluate to false then it is not successful and rejected

Lab 3
Question 1: Basic Selection and Projection

πname,price​(σcolor=′red′∧stockQuantity>20​(Flowers))

Question 2

πname​(σflowerID ≠ NULL​(Flowers⋈Orders))

Question 3

Using join before selection, the first expression performs a join first, then filters yellow flowers

It is inefficient since it joins all flowers with all orders, though only yellow flowers are needed

            T1← σ color=′Yellow′ ​(F)
            T2← π name ​(T2)
            T3← T1⋈T2
          
Question 4

Upon sending a GET /api/flowers?color=red request, the backend server processes it by extracting the query parameter

            SELECT * FROM Flowers WHERE color = 'red';
          
Question 5

Web interface:

Advantage: Accessible without SQL knowledge

Disadvantage: Limited query complexity

Console-based queries:

Advantage: Allows complex, customized queries

Disadvantage: Requires SQL knowledge

Question 6
SELECT category, AVG(price) AS average_price FROM Flowers GROUP BY category;
SELECT category, AVG(price) AS average_price FROM Flowers GROUP BY category HAVING AVG(price) > 10;
          

“Having” allows us to create the condition for price

Question 7
            CREATE TABLE flower (
                id INT PRIMARY KEY,
                color VARCHAR(50),
                heightInCm INT CHECK (heightInCm > 0),
                petalCount INT CHECK (petalCount >= 1),  
                species VARCHAR(100),
                bloomingSeason VARCHAR(50),
                price DECIMAL(10, 2) CHECK (price >= 0),
                fragranceLevel INT
            );
          
Question 8
            SELECT f.id AS flower_id, f.name AS flower_name, f.color, s.id AS supplier_id, 
                   s.name AS supplier_name, s.location 
            FROM flower f 
            INNER JOIN supplier s ON f.supplier_id = s.id;
          
Question 9
            SELECT c.id AS customer_id, c.name AS customer_name, c.email, 
                   o.id AS order_id, o.order_date, o.amount 
            FROM customer c 
            LEFT JOIN order o ON c.id = o.customer_id;
          
Question 10
            SELECT p.id AS product_id, p.name AS product_name, p.price, 
                   s.id AS sale_id, s.quantity_sold 
            FROM product p 
            RIGHT JOIN sale s ON p.id = s.product_id;
          
Questions?
SQLite Index Performance Test

GitHub Repository

  • Creating and resetting an SQLite database
  • Inserting 100 million rows efficiently
  • Comparing query performance with and without indexing
Review Modules 1 - 7
Questions?
Project Part 1 (Mar/21), Homework 3 (Feb/24), Lab 3 (Feb/24)
Git Checkout & Setup for DMS (Part 1)
        # 1️⃣ Clone the repository
        git clone https://github.com/SE4CPS/DMS.git
        
        # 2️⃣ Navigate to the project folder
        cd DMS/project/part1
        
        # 3️⃣ Check current teams
        dir  # (Windows) or ls (Mac/Linux)
        
        # 4️⃣ Create a new team folder
        mkdir team-n
        cd team-n
Git Commit & Push (Part 2)
        # 5️⃣ Make your code changes
        
        # 6️⃣ Stage your new folder
        git add team-n
        
        # 7️⃣ Commit your changes
        git commit -m "Added team-n folder"
        
        # 8️⃣ Push changes to the repository
        git push origin main
February 28, 2025
  • Review Module 1 - 7
Module 8
Midterm Preparation
March 3, 2025
  • Midterm Question

Why is an ERD primarily useful in database design?

  • It guarantees all business rules are enforced in queries
  • It helps visualize entities and their relationships
  • It ensures automatic indexing of all tables
  • It generates efficient SQL queries
Show Answer

Correct: B

How many errors are in this SQL query?

SELECT name age FROM users WHERE username = "admin";
  • 0
  • 1
  • 2
  • 3
Show Answer

Correct: C

Which entity-relationship setup best fits an online course platform?

  • Entities: Student, Course, Instructor | Relationships: Students enroll in Courses, Instructors teach Courses
  • Entities: Student, Instructor, Grade | Relationships: Students are taught by Instructors, Instructors assign Grades
  • Entities: Course, Instructor, Assignment | Relationships: Courses are assigned to Instructors, Assignments belong to Instructors
  • Entities: Student, Course, Department | Relationships: Students belong to Departments, Courses are part of Departments
Show Answer

Correct: A

Which set of attributes best fits an ERD for a library system?

  • Entities: Book, Author, Library | Attributes: Title, Biography, Location
  • Entities: Member, Staff, Transaction | Attributes: Age, Salary, Timestamp
  • Entities: Book, Member, Loan | Attributes: Title, Name, Due Date
  • Entities: Book, Genre, Publisher | Attributes: Title, Category, Address
Show Answer

Correct: C

Which SQL data types best fit this sample data?

  • OrderID: INT, Price: FLOAT, OrderDate: DATE
  • OrderID: VARCHAR, Price: DECIMAL, OrderDate: TIMESTAMP
  • OrderID: INT, Price: DECIMAL, OrderDate: DATETIME
  • OrderID: BIGINT, Price: STRING, OrderDate: TIME
Show Answer

Correct: C

Which SQL constraints best enforce data integrity?

  • UserID: NOT NULL, Email: UNIQUE, Name: PRIMARY KEY
  • UserID: PRIMARY KEY, Email: UNIQUE, Name: NOT NULL
  • UserID: UNIQUE, Email: NOT NULL, Name: UNIQUE
  • UserID: PRIMARY KEY, Email: NOT NULL, Name: UNIQUE
Show Answer

Correct: B

Which of the following SQL operations ensures data integrity by enforcing rules on the values allowed in a column?

  • Primary Key
  • Foreign Key
  • Check Constraint
  • Index
Show Answer

Correct answer: A, B, C

Which approach is best for generating unique IDs in a database?

  • Auto-increment for sequential numeric IDs
  • Custom ID format based on business rules
  • UUID for globally unique identifiers
  • Using a manually entered primary key
Show Answer

Correct: A, B, C

Why did Edgar F. Codd choose relational algebra for databases?

  • It provides a formal foundation based on set theory
  • It simplifies database storage structures
  • It ensures all queries run in constant time
  • It avoids the need for indexing and optimization
Show Answer

Correct: A

How does relational algebra operate on data?

  • It manipulates sets of tuples
  • It processes tuples separately
  • It splits attributes into relations
  • It performs only calculations
Show Answer

Correct: A

Which relational algebra operation is used to retrieve specific columns from a relation?

  • Selection (σ)
  • Projection (π)
  • Join (⨝)
  • Union (∪)
Show Answer

Correct: B

Which transformation optimizes the following relational algebra query?

       π_name(σ_age > 30 (Employee ⨝ Department))
  • π_name(Employee ⨝ σ_age > 30 (Department))
  • σ_age > 30 (π_name (Employee ⨝ Department))
  • π_name(σ_age > 30 Employee ⨝ Department)
  • Employee ⨝ π_name(σ_age > 30 Department)
Show Answer

Correct: C

Which SQLite dot command is used to display the database schema?

  • .read
  • .schema
  • .tables
  • .mode
Show Answer

Correct: B

How do PostgreSQL and SQLite handle primary key generation differently?

  • PostgreSQL uses SERIAL, while SQLite auto-increments INTEGER PRIMARY KEY
  • Both use SERIAL for automatic primary key generation
  • SQLite requires manual primary key insertion, unlike PostgreSQL
  • PostgreSQL does not support automatic primary key generation
Show Answer

Correct: A

How do a console, API, and web interface differ?

  • A web interface is text-based, an API is graphical, a console is automated
  • A console is command-line, an API is programmatic, a web interface is graphical
  • An API is for users, while a console and web interface automate tasks
  • A console and API are the same, but a web interface has graphics
Show Answer

Correct: B

Which SQL command is used to add a new column to an existing table?

  • ADD COLUMN
  • MODIFY TABLE
  • UPDATE TABLE
  • ALTER TABLE
Show Answer

Correct: D

Which SQL command is used to create a new table?

  • CREATE TABLE
  • ADD TABLE
  • NEW TABLE
  • INSERT TABLE
Show Answer

Correct: A

Which query correctly retrieves departments with more than 10 employees?

  • SELECT department, COUNT(*) FROM employees WHERE COUNT(*) > 10;
  • SELECT department FROM employees GROUP BY department HAVING COUNT(*) > 10;
  • SELECT department FROM employees WHERE COUNT(*) > 10 GROUP BY department;
  • SELECT department, COUNT(*) FROM employees HAVING COUNT(*) > 10 GROUP BY department;
Show Answer

Correct: B

Which SQL query correctly calculates the total salary, highest salary, lowest salary, average salary, and employee count per department?

  • SELECT department, SUM(salary), MAX(salary), MIN(salary), AVG(salary), COUNT(*) FROM employees;
  • SELECT department, SUM(salary), MAX(salary), MIN(salary), AVG(salary), COUNT(*) FROM employees GROUP BY department;
  • SELECT department, SUM(salary), MAX(salary), MIN(salary), AVG(salary), COUNT(*) FROM employees WHERE department GROUP BY department;
  • SELECT department, SUM(salary), MAX(salary), MIN(salary), AVG(salary), COUNT(*) FROM employees HAVING department;
Show Answer

Correct: B

Which query correctly groups employees by department and counts them?

SELECT department, COUNT(*) FROM employees ___ department;
  • GROUP BY
  • ORDER BY
  • HAVING
  • WHERE
Show Answer

Correct: A

Which query retrieves departments with more than 5 employees?

SELECT department, COUNT(*) FROM employees GROUP BY department ___ COUNT(*) > 5;
  • WHERE
  • ORDER BY
  • HAVING
  • LIMIT
Show Answer

Correct: C

Which query filters before grouping rather than after?

SELECT department, COUNT(*) FROM employees ___ salary > 50000 GROUP BY department;
  • HAVING
  • WHERE
  • GROUP BY
  • ORDER BY
Show Answer

Correct: B

Which JOIN retrieves only matching rows from both tables?

SELECT employees.name, departments.name FROM employees ___ JOIN departments ON employees.dept_id = departments.id;
  • LEFT
  • RIGHT
  • INNER
  • FULL
Show Answer

Correct: C

Which JOIN retrieves all rows from the left table and matching rows from the right?

SELECT employees.name, departments.name FROM employees ___ JOIN departments ON employees.dept_id = departments.id;
  • LEFT
  • RIGHT
  • INNER
  • FULL
Show Answer

Correct: A

Which JOIN retrieves all rows from both tables, filling missing matches with NULLs?

SELECT employees.name, departments.name FROM employees ___ JOIN departments ON employees.dept_id = departments.id;
  • LEFT
  • RIGHT
  • INNER
  • FULL
Show Answer

Correct: D

Which statement correctly describes normalization levels in databases?

  • Unnormalized form (UNF) allows duplicate and multi-valued attributes
  • First normal form (1NF) eliminates duplicate rows but allows repeating groups
  • Second normal form (2NF) requires 1NF and ensures all non-key attributes depend on the full primary key
  • Third normal form (3NF) requires 2NF and ensures all attributes depend only on the primary key
Show Answer

Correct: A, C, D

Why is normalization important in database design?

  • It removes redundant data and improves consistency
  • It speeds up all queries by reducing table joins
  • It allows storing data without any constraints
  • It ensures every table has exactly one column
Show Answer

Correct: A

How does indexing improve query speed?

  • Index: O(n), Table Scan: O(1)
  • Index: O(n log n), Table Scan: O(1)
  • Index: O(1), Table Scan: O(n log n)
  • Index: O(log n), Table Scan: O(n)
Show Answer

Correct: C

When is indexing most beneficial?

  • When performing many DELETE operations
  • When performing many INSERT operations
  • When performing many UPDATE operations
  • When performing many SELECT queries
Show Answer

Correct: D

A flower shop tracks orders and customers. Each order is linked to a customer. Propose four attributes for each table and write two CREATE TABLE statements ensuring good data quality with appropriate primary key, foreign key, data types and constraints, including CHECK constraints

March 5, 2025
  • Edgar F. Codd
  • Midterm Question

Edgar F. Codd ("Ted")

The Inventor of the Relational Database Model

Edgar F. Codd
  • Born: August 19, 1923
  • Known for: Inventing the Relational Database Model
  • Key Contribution: Relational Algebra & Normal Forms
  • Impact: Foundation for SQL & Modern Databases
  • Award: ACM Turing Award (1981)

Edgar F. Codd: Technical Background

  • Education – Studied Mathematics & Chemistry at Exeter College, Oxford (1944)
  • Early Career – Worked as a programmer for IBM (1948)
  • Ph.D. in CS – Earned a Ph.D. in Computer Science at the University of Michigan (1965).
  • IBM Research – Developed the Relational Model at IBM (1970)
  • Award – Received the ACM Turing Award (1981)

Challenges Before the Relational Model (1970)

  • Complexity – Hierarchical & network models were hard to use
  • No Standard – No unified query language
  • Redundancy – Duplicate data caused inconsistencies
  • Inflexibility – Structural changes were difficult
  • Slow Queries – Hard to optimize data retrieval

Edgar Codd introduced the relational model in 1970 to solve these issues

Data Before Relational Databases

  • Hierarchical Storage – Data was stored in **tree-like structures** (e.g., IBM IMS).
  • Network Model – Complex **many-to-many links** between records (e.g., CODASYL DBTG).
  • Fixed Formats – Data stored in **rigid file formats**, making changes difficult
  • Manual Querying – No SQL, queries required **procedural navigation**
  • Redundancy – Data duplication was common due to lack of normalization
              Example (Hierarchical Data):
              ├── Customers
              │   ├── Orders
              │   │   ├── Order Items
              │   │   ├── Payments
          

Before relational databases, data was difficult to manage, leading to inefficiency and inconsistency

Entities and Relations from Raw Data

              Entities:
              Customer (CustomerID, Name, Email)
              Order (OrderID, CustomerID, OrderDate)
              Product (ProductID, Name, Price)
      
              Relations:
              Customer - Order (One-to-Many via CustomerID)
              Order - Product (Many-to-Many via OrderID & ProductID)
          

"Ted" Wins Turing Award (1981)

  • Award: ACM Turing Award (1981)
  • Reason: Invented the Relational Database Model
  • Impact: Revolutionized data storage & retrieval
  • Legacy: Foundation for SQL & modern databases
Midterm Question

Which SQL query correctly retrieves all rows from the flower table where the species is 'Rose' and the color is 'Red'?

  • SELECT * FROM flower WHERE type = 'Rose' AND color = 'Red';
  • SELECT * FROM flower WHERE species = 'Rose' AND color = 'Red';
  • SELECT * FROM flower WHERE name = 'Rose' AND color = 'Red';
  • SELECT * FROM flower WHERE category = 'Rose' AND color = 'Red';
Show Answer

Correct: B

Which SQL query correctly retrieves only the species and color columns from the flower table?

  • SELECT species, color FROM flower;
  • SELECT * FROM flower WHERE species, color;
  • SELECT species AND color FROM flower;
  • SELECT species color FROM flower;
Show Answer

Correct: A

Which SQL query correctly retrieves only the species and color columns from the flower table?

  • SELECT species, color FROM flower;
  • SELECT * FROM flower WHERE species, color;
  • SELECT species AND color FROM flower;
  • SELECT species color FROM flower;
Show Answer

Correct: A

What is the purpose of the GROUP BY clause in SQL?

  • To filter rows based on a condition
  • To aggregate results based on unique column values
  • To sort the query results in ascending order
  • To join multiple tables in a single query
Show Answer

Correct: B

A florist wants to find out how many flowers of each type were sold. Which SQL query should they use?

  • SELECT flower_type, SUM(quantity) FROM sales;
  • SELECT flower_type, COUNT(*) FROM sales GROUP BY flower_type;
  • SELECT flower_type, quantity FROM sales ORDER BY flower_type;
  • SELECT DISTINCT flower_type, COUNT(*) FROM sales;
Show Answer

Correct: B

Design an Entity-Relationship Diagram (ERD) for a flower ordering system with the following requirements:

  • Three entities: Customer, Order, and Flower
  • Two relationships between them
  • Each table should have exactly four attributes, including foreign keys where applicable
  • Pick appropriate data types for each attribute
  • Include constraints such as NOT NULL, UNIQUE, CHECK where necessary
  • Ensure foreign keys are properly defined

The following table represents unnormalized flower order data:

| order_id | customer_name | customer_phone | order_date  | flower_details              | total_price |
|----------|--------------|----------------|-------------|------------------------------|-------------|
| 101      | Alice Green  | 555-1234       | 2024-03-01  | Rose-Red-2, Tulip-Yellow-3   | 25.00       |
| 102      | Bob White    | 555-5678       | 2024-03-02  | Lily-White-1                 | 10.00       |
          

Transform this data step by step into 1NF, 2NF, and 3NF

  • Start by removing repeating groups to achieve 1NF
  • Eliminate partial dependencies to reach 2NF
  • Remove transitive dependencies for 3NF

A florist wants to find flower types that were sold more than 10 times. Which SQL query should they use?

  • SELECT flower_type, COUNT(*) FROM sales WHERE COUNT(*) > 10;
  • SELECT flower_type, COUNT(*) FROM sales GROUP BY flower_type HAVING COUNT(*) > 10;
  • SELECT flower_type FROM sales WHERE quantity > 10 GROUP BY flower_type;
  • SELECT flower_type, SUM(quantity) FROM sales HAVING SUM(quantity) > 10;
Show Answer

Correct: B

A florist wants to find the number of different flower types sold. Which SQL query should they use?

  • SELECT COUNT(*) FROM sales GROUP BY flower_type;
  • SELECT DISTINCT flower_type, COUNT(*) FROM sales;
  • SELECT COUNT(DISTINCT flower_type) FROM sales;
  • SELECT flower_type FROM sales WHERE DISTINCT COUNT(*) > 1;
Show Answer

Correct: C

How many errors (1, 2, 3, or 4) are present in the following SQL query?


                      SELECT name, price FROM flowers ORDER BY color;
          
Show Answer

Correct: A (1 error)

Which SQL query correctly finds the least expensive flower?

  • SELECT name FROM flowers WHERE price = MIN(price);
  • SELECT name FROM flowers ORDER BY price DESC LIMIT 1;
  • SELECT name FROM flowers ORDER BY price ASC LIMIT 1;
  • SELECT name FROM flowers GROUP BY name HAVING price = MIN(price);
Show Answer

Correct: C

In relational algebra optimization, which approach is typically used to improve query performance?

  • Applying selection and projection as early as possible
  • Executing Cartesian products before joins
  • Performing aggregation before filtering conditions
  • Avoiding the use of indexes for faster retrieval
Show Answer

Correct: A

What is the difference between an Entity, a Relation, and a Table?

  • An Entity is a concept, a Relation is its structure, and a Table is its storage
  • An Entity and a Table are the same, but a Relation is a connection
  • A Relation is the physical storage, and an Entity is its logical form
  • A Table represents multiple Entities, while a Relation is a single Entity
Show Answer

Correct: A

What do SQLite’s AUTOINCREMENT and PostgreSQL’s SERIAL have common?

  • Both automatically generate unique, incrementing integer values for a column
  • Both prevent manual insertion of values into the column
  • Both strictly prevent ID reuse, even if rows are deleted
  • Both require an explicit SEQUENCE to function
Show Answer

Correct: A

Which SQL data type is best for storing the value 2.99?

  • INT
  • FLOAT
  • DECIMAL
  • CHAR
Show Answer

Correct: C

What is the primary purpose of using JOINs in SQL?

  • To combine rows from multiple tables based on a related column
  • To merge two databases into one
  • To sort and filter data within a single table
  • To remove duplicate rows from a table
Show Answer

Correct: A

Which SQL JOIN type typically generates the most data?

  • INNER JOIN
  • LEFT JOIN
  • RIGHT JOIN
  • CROSS JOIN
Show Answer

Correct: D

Which SQL JOIN type typically generates the least data?

  • INNER JOIN
  • LEFT JOIN
  • RIGHT JOIN
  • FULL JOIN
Show Answer

Correct: A

Which of the following is not an aggregate function in SQL?

  • SUM()
  • COUNT()
  • AVG()
  • ROUND()
Show Answer

Correct: D

Why is relational algebra important for relational databases?

  • It defines operations for querying and manipulating relational data
  • It ensures that databases only store structured data
  • It is used to enforce security and access control
  • It is required to create database schemas
Show Answer

Correct: A

Which statement is true about the UNION operator in SQL?

  • It combines results from multiple queries and keeps duplicates
  • It only works on tables with a primary key
  • It combines results from multiple queries and removes duplicates
  • It requires a JOIN condition to work
Show Answer

Correct: C

Which SQL query retrieves all unique flower names that appear in either the spring_flowers or summer_flowers tables?

  • SELECT name FROM spring_flowers UNION SELECT name FROM summer_flowers;
  • SELECT name FROM spring_flowers INTERSECT SELECT name FROM summer_flowers;
  • SELECT name FROM spring_flowers JOIN summer_flowers ON spring_flowers.id = summer_flowers.id;
  • SELECT DISTINCT name FROM spring_flowers, summer_flowers;
Show Answer

Correct: A

If you need a database that allows full customization without license restrictions, would you choose SQLite, PostgreSQL, or MariaDB? Consider their licensing models and explain your choice

Midterm Exam – Friday, March 7

Time: 8:00 - 9:00 AM

Format

  • 20 questions on paper
  • 18 multiple-choice, 2 written response
  • Allowed: 2 pencils & 1 page of notes

Scope

  • Modules 1–7 (2–3 questions per module)
  • Homework/Lab 1–4
Questions
Query Writing
Thanks
Module 9
Spring Break
Module 10
Relational Database Transactions
March 17, 2025
  • Midterm Review
  • Relational Database Transactions
Midterm Review

Why is an ERD primarily useful in database design?

  • It helps picking the right operating system and database system such as SQLite
  • It is a one-time task to quickly create database tables
  • It helps defining SQL database queries first with stakeholders
  • It helps visualize and communicate database requirements with stakeholders
Show Answer

Correct: D

How many errors are in the SQL query? Assume table, attributes, and data types are correct


                  SELECT name, stock FROM Flower WHERE name = 'rose';
          
  • 0
  • 1
  • 2
  • 3
Show Answer

Correct: A

Which set of entities and relationships best represents how a flower ordering system tracks customer purchase?

  • Entities: Flower, Supplier, Orders | Relationships: Suppliers and Flowers, Flowers and Orders
  • Entities: Customer, Payment, Discount | Relationships: Customers and Payments, Discounts and Payments
  • Entities: Customer, Order, Flower | Relationships: Customers and Orders, Orders and Flowers
  • Entities: Order, Delivery, Employee | Relationships: Orders and Delivery, Employees and Deliveries
Show Answer

Correct: C

Which of the following table names is NOT allowed for an order tracking table in SQL?

  • order_table
  • order
  • orders
  • “order”
Show Answer

Correct: B (ORDER is a reserved SQL keyword for sorting)

Which SQL data types best fit this flower ordering system sample data?

  • FlowerID: INT, Price: INT, Delivery: DATE
  • FlowerID: VARCHAR, Price: DECIMAL, DeliveryDate: TIMESTAMP
  • FlowerID: INT, Price: DECIMAL, DeliveryDate: DATETIME
  • FlowerID: BIGINT, Price: STRING, DeliveryDate: TIME
Show Answer

Correct: C

Which SQL constraints best enforce data integrity in a flower ordering system?

  • FlowerID: NOT NULL, Price: UNIQUE, Name: PRIMARY KEY
  • FlowerID: PRIMARY KEY, Price: UNIQUE, Name: NOT NULL
  • FlowerID: UNIQUE, Price: NOT NULL, Name: UNIQUE
  • FlowerID: PRIMARY KEY, Price: NOT NULL, Name: NOT NULL
Show Answer

Correct: D

Which SQL constraint ensures that a flower's price is always greater than zero, enforcing data integrity?

  • DEFAULT
  • NOT NULL
  • CHECK
  • UNIQUE
Show Answer

Correct: C (price > 0),

When is AUTO_INCREMENT not a good choice for generating unique IDs in a flower ordering system?

  • When the system requires sequential numeric Order IDs
  • When orders need to be unique across multiple databases
  • When the database only has a single server handling all orders
  • When the order numbers should always increase without gaps
Show Answer

Correct: B

What database concept did Edgar F. Codd invent?

  • NoSQL Databases
  • The Relational Database Model
  • Key-Value Stores
  • Object-Oriented Databases
Show Answer

Correct: B

What is the main purpose of relational algebra in relational databases?

  • To define how data is stored physically in a database
  • To provide a formal way to manipulate and query relational data
  • To organize data into key-value pairs for faster access
  • To automatically optimize SQL queries for performance
Show Answer

Correct: B

Which relational algebra operation is used to filter rows in a relation?

  • π (Projection)
  • σ (Selection)
  • ∪ (Union)
  • × (Cartesian Product)
Show Answer

Correct: B

Which transformation optimizes the following relational algebra query for a flower ordering system?


                            π_flowerName (σ_price > 10 (Orders ⨝ Flowers))
          
  • σ_price > 10 (π_flowerName (Orders ⨝ Flowers))
  • π_flowerName (σ_price > 10 (Orders ⨝ Flowers))
  • π_flowerName (Orders ⨝ (σ_price > 10 (Flowers)))
  • Orders ⨝ (π_flowerName (Flowers))
Show Answer

Correct: C

In SQLite, how many files are used to store the entire database?

  • One file
  • Two files
  • Three files
  • Multiple files, depending on table count
Show Answer

Correct: A

What is a key difference between SQLite and PostgreSQL in how they run?

  • Both SQLite and PostgreSQL always need a server
  • PostgreSQL can run without setup, but SQLite needs extra configuration
  • SQLite and PostgreSQL work the same way
  • SQLite works without a server, but PostgreSQL needs one
Show Answer

Correct: D

How do a database console, API, and web interface differ?

  • A web interface is text-based, an API is graphical, and a console is automated
  • A console uses commands, an API is for programs, and a web interface has buttons and visuals
  • An API is for users, while a console and web interface automate tasks
  • A console and API are the same, but a web interface has graphics
Show Answer

Correct: B

Which SQL command is used to remove a column from an existing table?

  • DELETE COLUMN
  • DROP COLUMN
  • REMOVE COLUMN
  • ALTER COLUMN
Show Answer

Correct: B or D

Which SQL command is used to delete an entire table from a database?

  • REMOVE TABLE
  • DELETE TABLE
  • DROP TABLE
  • CLEAR TABLE
Show Answer

Correct: C

Which query correctly retrieves flower types that have been ordered more than 10 times?

  • SELECT flower_type, COUNT() FROM orders WHERE COUNT() > 10;
  • SELECT flower_type FROM orders GROUP BY flower_type HAVING COUNT(*) > 10;
  • SELECT flower_type FROM orders WHERE COUNT(*) > 10 GROUP BY flower_type;
  • SELECT flower_type, COUNT() FROM orders HAVING COUNT() > 10 GROUP BY flower_type;
Show Answer

Correct: B

Which JOIN retrieves only matching rows from both tables?

SELECT * FROM employees ___ JOIN departments ON employees.dept_id = departments.id;
  • LEFT
  • RIGHT
  • INNER
  • FULL
Show Answer

Correct: C

Please draw an Entity-Relationship Diagram (ERD) for a flower supply system with two entities. Ensure that your diagram clearly indicates primary keys (PK), foreign keys (FK), and attribute types and constraints.

  • Flower (4 attributes, including a primary key and a foreign key)
  • Supplier (4 attributes, including a primary key)
  • One relationship (one-to-many):
  • One flower is supplied by one supplier
  • One supplier can supply many flowers
Questions
March 19, 2025
Relational Database Transactions
  • ACID Properties
  • Transaction States
  • Concurrency Control
  • Isolation Levels
  • Practice Exercise
Relational Database Transactions

ACID Properties

Atomicity

  • Ensures that all operations in a transaction are completed
  • If any part of the transaction fails, the entire transaction is rolled back

Consistency

  • Ensures the database remains in a valid state before and after transactions
  • Transactions must respect all integrity constraints

Isolation

  • Transactions execute independently without interfering with each other
  • Prevents issues like invalid reads, non-repeatable reads, and phantom reads

Durability

  • Ensures that once a transaction is committed, it remains permanently stored
  • Data is saved even in case of a system crash

A transaction partially executes, and some operations succeed while others fail, leaving the database in an inconsistent state. Which ACID property is violated?

  • Atomicity
  • Consistency
  • Isolation
  • Durability
Show Answer

Correct Answer: A) Atomicity

Two transactions read and update the same record simultaneously, leading to inconsistent data due to uncommitted changes. Which ACID property is violated?

  • Atomicity
  • Consistency
  • Isolation
  • Durability
Show Answer

Correct Answer: C) Isolation

Transaction States

State Description SQL Keyword (SQLite, PostgreSQL, MariaDB, Oracle)
Active The transaction is in progress and operations are being executed BEGIN TRANSACTION; / START TRANSACTION;
Partially Committed All operations have been executed, but the final commit is pending
Committed All changes are permanently saved to the database, completing the transaction COMMIT;
Failed An error occurred during execution, preventing further progress
Rollback The transaction is explicitly reversed and returning the database to its previous state ROLLBACK;

Concurrency Control

What is Concurrency Control?

Concurrency control ensures that multiple transactions can execute simultaneously without interfering with each other, maintaining data integrity and consistency

Problems in Concurrency

  • Invalid Reads - A transaction reads uncommitted changes from another transaction
  • Non-Repeatable Reads - A transaction gets different results when reading the same data multiple times
  • Phantom Reads - A transaction sees new rows inserted by another transaction during its execution

Concurrency Control Methods

Method Description Used In Example (T1, T2, Value X)
Lock-Based Concurrency Transactions acquire locks on data to prevent conflicts PostgreSQL, MySQL (InnoDB), Oracle T1 locks X=10 and updates it to X=20
T2 tries to update X but must wait until T1 commits or rolls back
Optimistic Concurrency Control Transactions proceed without locking, but check for conflicts before committing SQLite, PostgreSQL (MVCC), Oracle T1 reads X=10, performs calculations
T2 updates X=20 and commits
T1 tries to commit X=15 but detects conflict and must restart
Timestamp-Based Concurrency Each transaction gets a timestamp and is executed based on ordering rules Oracle, PostgreSQL (MVCC), SQL Server T1 (timestamp 100) reads X=10
T2 (timestamp 200) updates X=20 and commits
T1 tries to update X=15 but is aborted due to an older timestamp
Multiversion Concurrency Control (MVCC) Each transaction sees a snapshot of the database, avoiding locks PostgreSQL, MySQL (InnoDB), Oracle T1 starts and sees X=10
T2 updates X=20 and commits
T1 still sees X=10 until it commits and starts a new transaction
Method Best When ✅ Not Ideal When ❌
Lock-Based Concurrency ✅ High-conflict environments where strict consistency is required (e.g., financial transactions) ❌ High-read workloads, as locks can cause contention and slow down performance
Optimistic Concurrency Control (OCC) ✅ Read-heavy workloads with infrequent conflicts (e.g., analytics, dashboards) ❌ High-write workloads where conflicts are common, leading to frequent transaction retries
Timestamp-Based Concurrency ✅ Systems that require strict ordering of transactions (e.g., blockchain, distributed databases) ❌ Systems where real-time changes need immediate visibility to other transactions
Multiversion Concurrency Control (MVCC) ✅ Databases with many concurrent reads and some writes, avoiding read locks (e.g., PostgreSQL, MySQL InnoDB) ❌ When strict consistency is required, as it may allow anomalies like write skew

Isolation Control

Isolation ensures that transactions execute independently, preventing conflicts and inconsistencies when multiple transactions run concurrently

Isolation Level Description Best When ✅ Not Ideal When ❌
Read Uncommitted Allows transactions to read uncommitted changes from other transactions, leading to invalid reads ✅ Performance is the highest priority and minor inconsistencies are acceptable ❌ Data consistency is required, as invalid reads can occur
Read Committed Transactions can only read committed data, preventing invalid reads but allowing non-repeatable reads ✅ General-purpose applications where occasional inconsistencies are tolerable ❌ When repeatable reads are necessary to maintain consistency
Repeatable Read Ensures that transactions read the same data multiple times without seeing changes from other transactions ✅ When avoiding non-repeatable reads is critical, such as in financial applications ❌ Systems with high concurrency, as it may reduce performance due to increased locking
Serializable Enforces full isolation by executing transactions as if they run sequentially, eliminating concurrency issues ✅ When absolute consistency is required, such as in banking and accounting systems ❌ High-concurrency environments, as it can lead to significant performance bottlenecks
Isolation Level Transaction Step Value of X
Read Uncommitted T1 Starts and reads X X=10
T2 Starts, updates X to 20, does not commit X=20 (invalid read visible to T1)
T1 Reads X again X=20
T2 Rolls back changes X=10 (T1 already saw uncommitted changes)
Read Committed T1 Starts and reads X X=10
T2 Starts, updates X to 20, commits X=20
T1 Reads X again X=20 (sees committed changes)
T2 Updates X to 25, commits X=25 (T1 sees new committed changes)
Repeatable Read T1 Starts and reads X X=10
T2 Starts, updates X to 20, commits X=20
T1 Reads X again X=10 (does not see T2's changes)
T1 Reads X once more X=10 (still sees original snapshot)
Serializable T1 Starts and reads X X=10
T2 Starts, tries to update X to 20 Blocked by T1
T1 Commits X=10
T2 Updates X to 20 X=20 (now visible to new transactions)

Which isolation level allows invalid reads, making it the least strict but fastest?

  • Serializable
  • Repeatable Read
  • Read Committed
  • Read Uncommitted
Show Answer

Answer: D) Read Uncommitted

Which isolation level is generally recommended for preventing anomalies while maintaining good concurrency in most applications?

  • Serializable
  • Repeatable Read
  • Read Committed
  • Read Uncommitted
Show Answer

Answer: C) Read Committed

Practice Exercise

Hotel Reservation


    SELECT RoomID FROM Rooms 
    WHERE RoomID NOT IN (
        SELECT RoomID FROM Reservations WHERE CheckInDate < '2025-06-15' AND CheckOutDate > '2025-06-10') 
    LIMIT 1;

    INSERT INTO Reservations (ReservationID, CustomerID, RoomID, CheckInDate, CheckOutDate) 
    VALUES (202, 15, 301, '2025-06-10', '2025-06-15');

    INSERT INTO Payments (PaymentID, ReservationID, Amount, PaymentDate) 
    VALUES (501, 202, 499.99, '2025-03-15');
          
ACID Properties: Atomicity

What happens if a transaction fails during execution?

  • All changes are permanently saved
  • Only successful operations are kept
  • All changes are rolled back
  • The database remains locked
ACID Properties: Consistency

Which statement best describes the Consistency property?

  • Data is always encrypted
  • Database moves from one valid state to another
  • Records are stored in sequential order
  • All users see the same data version
ACID Properties: Isolation

What is the primary purpose of the Isolation property?

  • To prevent unauthorized access
  • To make transactions execute faster
  • To prevent concurrent transactions from interfering
  • To ensure data encryption

Overlapping Hotel Reservations

ReservationID RoomID CheckInDate CheckOutDate
301 101 2025-06-08 2025-06-12
302 101 2025-06-10 2025-06-15
303 101 2025-06-14 2025-06-18
304 102 2025-06-15 2025-06-20
305 103 2025-06-01 2025-06-05

Hotel Reservation with Transaction


            START TRANSACTION;
        
            SELECT RoomID FROM Rooms 
            WHERE RoomID NOT IN (
                SELECT RoomID FROM Reservations WHERE CheckInDate < '2025-06-15' AND CheckOutDate > '2025-06-10') 
            LIMIT 1
            FOR UPDATE;
        
            INSERT INTO Reservations (ReservationID, CustomerID, RoomID, CheckInDate, CheckOutDate) 
            VALUES (202, 15, 301, '2025-06-10', '2025-06-15');
        
            INSERT INTO Payments (PaymentID, ReservationID, Amount, PaymentDate) 
            VALUES (501, 202, 499.99, '2025-03-15');

            COMMIT;
          

Transaction Locking in Hotel Reservation

Time Transaction 1 (User A) – Locks Room 301 Transaction 2 (User B)
T1 START TRANSACTION;
T2 SELECT RoomID FROM Rooms WHERE RoomID NOT IN (...) LIMIT 1 FOR UPDATE;
(Locks Room 301)
T3 INSERT INTO Reservations (...); (Waiting... Cannot access Room 301)
T4 INSERT INTO Payments (...); (Waiting... Still blocked)
T5 COMMIT; (Lock on Room 301 released)
T6 (Transaction 1 is done) SELECT RoomID FROM Rooms WHERE RoomID NOT IN (...) LIMIT 1 FOR UPDATE;
(Locks next available room)

ACID Properties in Transactions

Property Description
Atomicity Ensures that a transaction is fully completed or fully rolled back. No partial execution
Consistency Ensures that the database remains in a valid state before and after a transaction
Isolation Ensures that concurrent transactions do not interfere with each other
Durability Ensures that once a transaction is committed, it is permanently saved even in case of system failure

Oracle 1977 - today

Year Milestone
1977 Oracle Corporation founded in Santa Clara, California, by Larry Ellison
1979 Oracle Version 2 – first commercially available SQL-based RDBMS (1982 IBM DB2)
1983-1985/td> Introduced transaction management, concurrency control, read consistency, and client-server architecture
1988-1992 PL/SQL added, along with stored procedures, triggers, and integrity constraints
1997-1999 Object-relational features, partitioning, built-in Java support, and internet integration
2001-2003 Added XML support, Real Application Clusters (RAC), grid computing, and auto-tuning
2007-2013 Advanced compression, partitioning, multi-tenant architecture for cloud
2018-2021 AI-powered autonomous database, blockchain tables, and machine learning optimizations

Relational Databases

Feature SQLite PostgreSQL MariaDB Oracle DB
Storage Single-file database (lightweight) Client-server architecture Client-server architecture Client-server architecture
ACID Compliance Yes Yes Yes Yes
Best For Embedded systems, mobile apps Complex queries, enterprise apps MySQL alternative, web apps Large-scale enterprise solutions
Performance Fast for small databases Optimized for complex queries Optimized for web applications Highly optimized for enterprise workloads
Scalability Limited (single file) High (supports replication, clustering) High (supports clustering, replication) Very High (supports RAC, sharding)
Concurrency Limited (single writer, multiple readers) Strong concurrency support Good concurrency support Advanced concurrency management
License Public domain (free) Open-source (PostgreSQL License) Open-source (GPL) Proprietary (some free versions)

Licensing: Oracle vs Others

Database License Type Cost
Oracle Proprietary (Oracle License) Paid (Free for limited versions like Oracle XE)
PostgreSQL Open-source (PostgreSQL License, BSD-like) Free
MariaDB Open-source (GPL v2) Free
SQLite Public domain (No restrictions) Free
March 21, 2025

Lab Practice

The database system shall store customer information, furniture items, and orders in relational tables. Each order must link to one or more furniture items and update inventory upon placement. The system should allow staff to retrieve order history and identify low-stock items using SQL queries.
Module 11
Relational Database Security 🔒
March 24, 2025
Relational Database Security
  • Database Security
  • Database Access Control
  • Database Fragmentation
  • Database Encryption
  • Database SQL Injection
  • Database Query Log

Database Security

Zero Trust in Databases

  • Never trust by default
  • Authenticate every access request
  • Enforce least privilege at all levels
  • Continuously monitor and log activity
  • Combine RBAC, IP rules, and encryption

Database Access Control

What Is Access Control?

Controls who can access data

  • Prevents unauthorized access
  • Enforces least privilege
  • Core to database security

API Key Access Control

Restrict DB/API access via unique API keys

  • Map keys to user roles
  • Used in web apps, microservices
  • Regenerate or revoke compromised keys

DB Connection URL


        postgresql://username:password@host:port/database?sslmode=require
          
  • username: neondb_owner
  • password: npg_M5sVheSzQLv41234 (acts like API key)
  • host: ep-shrill-tree-a819xf7v-pooler.eastus2.azure.neon.tech
  • database: neondb
  • sslmode: require (enforces encrypted connection)

IP-Based Access Control

Restrict DB/API access to specific IP addresses

  • Allowlist trusted IPs only
  • Block unauthorized external sources
  • Used in firewalls and DB configs

Access Control Models

  • DAC: Owner controls access
  • MAC: Based on security labels
  • RBAC: Role defines permissions

Access Control Models: Use Case

Model Who Controls Access? Access Logic Example Supported by Relational DBs
RBAC System (via Roles) User has role with permission Doctor role can read patient record ✅ Widely Supported
DAC Record Owner Owner grants access Patient grants doctor access ✅ Partially Supported
MAC System (via Labels) Access requires clearance Only staff with "Confidential" clearance can read ❌ Rarely Supported

Mandatory Access Control (MAC)

  • System enforces security labels
  • Access requires clearance level
  • Used in military and classified systems

Discretionary Access Control (DAC)

  • Object owners define access
  • Permissions can be delegated
  • Simple but hard to audit at scale

DAC: Granting User Access

GRANT SELECT ON patient_records TO doctor_alex;
  • Owner of the table grants access
  • Follows discretionary access model
  • Permissions can be passed to others (if allowed)

Role-Based Access Control (RBAC)

  • Users → Roles → Permissions
  • Centralized permission management
  • Scales well in large systems
Full Stack Engineer

Creating and Assigning Roles (UA)

CREATE ROLE physician;
CREATE USER dr_smith WITH PASSWORD 'secure123';
GRANT physician TO dr_smith;

Common Privileges/Permissions (P)

  • SELECT, INSERT, UPDATE, DELETE
  • CREATE, DROP, ALTER
  • GRANT, REVOKE

Granting Permissions (PA)

GRANT SELECT ON employees TO physician;

Allows read-only access to a table

Revoking Permissions (PA)

REVOKE SELECT ON employees FROM physician;

Removes previously granted privileges

Revoke All Permissions


        REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM PUBLIC;
          
Full Stack Engineer

Granular Access Support

Database RBAC Support IP-Based Access API Key Control
SQLite External Only
PostgreSQL ✅ Column-Level ✅ via pg_hba.conf External Only
MariaDB ✅ Roles + Fine-Grained ✅ via host-based grants External Only
OracleDB ✅ RBAC + VPD ✅ via network ACLs External Only

Database Fragmentation

Database Vertical Fragmentation for Privacy

  • Separates identifiers from sensitive content
  • Example: store patient names in one table/database, diagnoses in another
  • Reduces risk of re-identification if breached
  • Used in privacy-preserving and regulatory-compliant systems
Full Stack Engineer

Summary

  • Start with API key or user identity
  • Use RBAC to simplify privilege logic
  • Use GRANT/REVOKE for control
  • Choose DB based on granularity needs

Database Encryption

Full Stack Engineer

Plain Text vs. Encrypted Text

Plain Text Encrypted Text
Patient Name: Alice U2FsdGVkX1+gTx4kV7K9Zw==
SSN: 123-45-6789 8b1a9953c4611296a827abf8c47804d7
Diagnosis: Flu 5d41402abc4b2a76b9719d911017c592

What Is Database Encryption?

  • Protects data by converting it to unreadable form
  • Requires a key to decrypt and access
  • Mitigates data breach impact

Why Encrypt Databases?

  • Protect sensitive data at rest and in transit
  • Support regulatory compliance (e.g., GDPR, HIPAA)
  • Prevent insider and external threats

Types of Encryption

  • At Rest: Encrypts stored data (e.g., disk, tablespace)
  • In Transit: Encrypts data over the network
  • Column-Level: Encrypt specific fields (e.g., SSN)

Transparent Data Encryption (TDE)

  • Encrypts data files automatically
  • Keys managed by the DB engine
  • Minimal application changes required

Example: Column Encryption

CREATE TABLE patients (
          id SERIAL,
          name TEXT,
          ssn TEXT ENCRYPTED FOR STORAGE
        );

Key Management

  • Keys can be internal or managed externally (KMS, HSM)
  • Rotation and protection are critical
  • Lost keys = lost data

Performance Considerations

  • Encryption adds CPU overhead
  • Column-level is slower but more selective
  • Use indexing carefully on encrypted fields

Risks Without Encryption

  • Plaintext data exposed in disk theft or backups
  • Network sniffing leaks sensitive data
  • Non-compliance with data protection laws

Risks With Encryption?

Plain Text Encrypted Text
Patient Name: Alice U2FsdGVkX1+gTx4kV7K9Zw==
SSN: 123-45-6789 8b1a9953c4611296a827abf8c47804d7
Diagnosis: Flu 5d41402abc4b2a76b9719d911017c592

Database Encryption Support

Database Encryption at Rest Encryption in Transit Column-Level TDE Support
SQLite ❌ (via extensions)
PostgreSQL ✅ (manual) ✅ (SSL) ✅ (PGCrypto)
MariaDB ✅ (via functions)
OracleDB

Summary

  • Encrypt data to reduce breach risks
  • Choose level: full, column, or in transit
  • Use strong key management practices
  • Verify DB feature support before implementation

Database SQL Injection

What Is SQL Injection?

  • Attacker manipulates SQL queries via user input
  • Can expose, modify, or delete data
  • Common in apps that use unsafe query building

Vulnerable Python Example


        user = input("Enter username: ")
        query = f"SELECT * FROM users WHERE name = '{user}'"
        cursor.execute(query)
          

Injection Attack Example


        Input: alice' OR '1'='1
        Query: SELECT * FROM users WHERE name = 'alice' OR '1'='1'
          

Bypasses authentication and returns all users

Safe Python Example (Parameterized)


        user = input("Enter username: ")
        query = "SELECT * FROM users WHERE name = %s"
        cursor.execute(query, (user,))
          

Using parameters prevents SQL injection

SQL Injection Prevention Tips

  • Always use parameterized queries
  • Never concatenate raw input into SQL
  • Use ORM libraries (e.g., SQLAlchemy, Django ORM)
  • Sanitize and validate input when needed

Database Query Log

Why Log SQL Queries?

  • Track suspicious or slow queries
  • Monitor access patterns
  • Detect SQL injection attempts
  • Improve performance and security

Basic Logging in Python


        import logging
        logging.basicConfig(level=logging.INFO)
        
        query = "SELECT * FROM patients"
        logging.info(f"Running query: {query}")
        cursor.execute(query)
          

Logging Query with Parameters


        query = "SELECT * FROM patients WHERE name = %s"
        params = ("Alice",)
        logging.info("Query: %s | Params: %s", query, params)
        cursor.execute(query, params)
          

Enabling SQL Logs in Databases

  • PostgreSQL: log_statement = 'all'
  • MySQL/MariaDB: general_log = ON
  • SQLite: Use logging in wrapper code

Best Practices for Query Logging

  • Mask sensitive values in logs (e.g., passwords)
  • Use rotating log files to manage size
  • Centralize logs for analysis (e.g., ELK stack)
  • Correlate with user actions for traceability
Questions
Case Study 1: Stolen Laptop

A doctor’s laptop with a local database of patient records is stolen. The database file is unprotected and contains names, SSNs, and diagnoses

What security measure would have reduced the risk?

Case Study 2: Unusual Patient Record Access

A staff member accessed over 200 patient records within 5 minutes — far beyond typical behavior for their role

What access control or monitoring strategy should be used?

Case Study 3: SQL Injection Breach

A public-facing web app allowed attackers to run raw SQL through a login form, exposing the entire user database

What protection should have been implemented?

Questions
Homework 4 (Mar/31), Lab 4 (Mar/31), Project Part 2 (Apr/20)
March 26, 2025
Relational Database Security Implementation
  • Database Encryption PostgreSQL
  • Database Hashing PostgreSQL
  • Database IP-Based (Network) Access Control PostgreSQL
  • Database User-Based Access Control
  • Database Role-Based Access Control
  • Database Fragmentation
  • Database SQL Parameterized
  • Database Query Log PostgreSQL and SQLite
  • Database Encryption SQLCipher
  • Database Case Study
Database Encryption PostgreSQL
Neon Tech Project Console

Create Extension


            CREATE EXTENSION IF NOT EXISTS pgcrypto;
          

PGP Encryption

Technical Aspect PGP Encryption (Asymmetric) Symmetric Encryption (AES, DES)
Key Type Public-Private Key Pair Single Shared Secret Key
Encryption Speed Slower; computationally intensive Faster; suitable for large data
Key Management Complex; public key distribution required Simpler; secure key sharing essential
Security Strength High; secure over insecure channels Medium to High; risk if key compromised
Use Case Examples Email encryption, digital signatures File encryption, database encryption

Reference: PostgreSQL pgcrypto Documentation

Drop Table


            DROP TABLE secure_data;
          

Create Table


            CREATE TABLE IF NOT EXISTS secure_data(
              id SERIAL PRIMARY KEY,
              sensitive_info BYTEA NOT NULL
            );
        

Insert Encrypted Data


    INSERT INTO secure_data (sensitive_info)
    VALUES (pgp_sym_encrypt('Secret Information', 'my_strong_password'));
          

Select Encrypted Data


    SELECT id, ENCODE(sensitive_info, 'hex') AS encrypted_info 
    FROM secure_data;
          

Decrypt Data


    SELECT id, pgp_sym_decrypt(sensitive_info, 'my_strong_password') 
    AS decrypted_info 
    FROM secure_data;
          
Database Hashing PostgreSQL

Hashing vs. Encryption

  • Hashing: One-way, irreversible; used for password storage and verification
  • Encryption: Two-way, reversible; protects sensitive data, allows secure retrieval

Create User with SCRAM-SHA-256


    SET password_encryption = 'scram-sha-256';
    CREATE ROLE username WITH LOGIN PASSWORD 'your_secure_password';
        

List All Users and Roles


        SELECT rolname, rolsuper, rolcreaterole, rolcreatedb, rolcanlogin
        FROM pg_roles;
          

Create User with MD5 (Legacy)


    SET password_encryption = 'md5';
    CREATE ROLE username WITH LOGIN PASSWORD 'your_secure_password';
        
Database IP-Based (Network) Access Control PostgreSQL
Neon Network Security
Questions
Database User-Based Access Control
Create Users (Flower Store)

    CREATE USER Bob PASSWORD 'bob_pass_123!';
    CREATE USER Alice PASSWORD 'alice_pass_123!';
          
Grant Permissions to Users

        -- Grant full access to Bob
        GRANT ALL PRIVILEGES ON Flowers TO Bob;
        
        -- Grant read-only access to Alice
        GRANT SELECT ON Flowers TO Alice;
          
List and Switch Users

        -- Show all users and roles
        SELECT rolname FROM pg_roles;
        
        -- Switch user temporarily
        SET ROLE Alice;
        
        -- Switch back
        RESET ROLE;
          
Test Queries with User Permissions

        -- Switch to user 'Alice' (read-only)
        SET ROLE Alice;
        
        -- Successful query (Alice can SELECT)
        SELECT * FROM flowers;
        
        -- Failed query (Alice cannot DELETE)
        DELETE FROM flowers WHERE id = 1;
        
        RESET ROLE;
        
        -- Switch to user 'bob' (full access)
        SET ROLE Bob;
        
        -- Successful query (Bob can DELETE)
        DELETE FROM flowers WHERE id = 1;
        
        RESET ROLE;
          
Revoke User Access

    REVOKE INSERT ON flowers FROM Bob;
          
Questions
Database Role-Based Access Control
Create Roles

        CREATE ROLE viewer;
        CREATE ROLE manager;
          
Create Users and Assign Roles

        CREATE USER alice PASSWORD 'alice_pass';
        CREATE USER bob PASSWORD 'bob_pass';
        
        GRANT viewer TO alice;
        GRANT manager TO bob;
          
Grant Permissions to Roles

        GRANT SELECT ON flowers_core TO viewer;
        GRANT SELECT, INSERT, UPDATE ON flowers_pricing TO manager;
          
Test Role-Based Access

        GRANT alice TO current_user;
        SET ROLE alice;
        SELECT * FROM flowers_core;
        
        SELECT * FROM flowers_pricing;
        
        RESET ROLE;
          
Questions
Database Fragmentation
Split Table: Core Info

        CREATE TABLE flowers_core (
          id INTEGER PRIMARY KEY,
          name TEXT
        );
          
Split Table: Pricing Info

        CREATE TABLE flowers_pricing (
          id INTEGER PRIMARY KEY,
          price REAL
        );
          
Insert Fragmented Data

        INSERT INTO flowers_core (id, name) VALUES (1, 'Rose');
        INSERT INTO flowers_pricing (id, price) VALUES (1, 2.99);
          
Assign Access to Users

        GRANT SELECT ON flowers_core TO viewer;
        GRANT SELECT, INSERT, UPDATE ON flowers_pricing TO manager;
          
Questions
Database SQL Parameterized
Safe SQLite Parameter
github.com/SE4CPS/DMS/sql-param.py
Database Query Log PostgreSQL and SQLite
Technical Area Importance of Logging
Query Performance Identify slow queries and optimize them
Security Track unauthorized access attempts
Error Diagnosis Understand and debug failed operations
Auditing Maintain compliance and trace changes
Replication & Backup Ensure consistency and troubleshoot issues

        -- In postgresql.conf:
        logging_collector = on
        log_directory = 'log'
        log_filename = 'postgresql-%Y-%m-%d.log'
        log_statement = 'all'
        log_min_error_statement = error
        log_min_messages = warning
          
Neon Query History
Grant Access and Test as Alice

        GRANT USAGE ON SCHEMA public TO Alice;
        GRANT SELECT ON team1_flowers TO Alice;
        -- GRANT ALL PRIVILEGES ON team1_flowers TO Alice;
        GRANT Alice TO current_user;
        SET ROLE Alice;
        SELECT * FROM team1_flowers;
        RESET ROLE;
          
SQLite Query Logs - Option 1

        sqlite3 mydatabase.db
        .echo ON
        .tables
        .schema
          
SQLite Query Logs - Option 2

        sqlite3 mydatabase.db
        .echo ON
        .output my_log.txt
        .schema
          
Questions
Database Encryption SQLCipher

Why Encrypted SQLite?

  • Mobile: Android, iOS smartphones and tablets
  • Wearables: Smartwatches, fitness trackers
  • Smart TV: Securing user preferences and viewing history
  • Desktop: Windows, macOS, Linux apps
  • Compliance: Ensures GDPR, HIPAA, and security standards compliance

Clone SQLCipher Repository

git clone https://github.com/sqlcipher/sqlcipher.git

SQLCipher Encryption

Technical Aspect SQLCipher Standard SQLite
Encryption Method Transparent AES-256 Encryption No built-in encryption
Data-at-rest Security Strong; protects database files on disk Weak; plaintext data storage
Performance Impact Moderate overhead due to encryption operations Minimal; faster access (unencrypted)
Use Cases Mobile apps, sensitive data storage General-purpose lightweight storage
Key Management Application-managed passphrase Not applicable

Reference: SQLCipher GitHub Documentation

Encrypting with SQLCipher


        sqlcipher mydatabase.db
        
        sqlite> PRAGMA key = 'your_secure_password';
        sqlite> CREATE TABLE flowers(id INTEGER, name TEXT);
        sqlite> INSERT INTO flowers VALUES (1, 'Rose');
        sqlite> .quit
          
Questions

⚠️ No Access Control in SQLite

  • SQLite does not support user roles or permissions
  • Anyone with access to the database file has full control
  • No built-in authentication or authorization
  • Use OS-level file permissions or encryption (e.g., SQLCipher) for protection
March 26, 2025
Relational Database Security Implementation
  • Database Encryption PostgreSQL
  • Database Hashing PostgreSQL
  • Database IP-Based (Network) Access Control PostgreSQL
  • Database User-Based Access Control
  • Database Role-Based Access Control
  • Database Fragmentation
  • Database SQL Parameterized
  • Database Query Log PostgreSQL and SQLite
  • Database Encryption SQLCipher
  • Database Case Study
Database Case Study
Case Study: Bike Rental Platform

A bike rental company uses a PostgreSQL database to manage rentals, customer accounts, and location check-ins.

  • Some team members manage prices, others handle station maintenance records
  • Rental submissions come from a public web app
  • Data should be monitored for abuse or repeated failures

Which role-based controls, logging settings, and SQL security features would you apply?

Case Study: Furniture Store Inventory

A local furniture business tracks product inventory, employee tasks, and customer orders in PostgreSQL.

  • Warehouse staff update stock; sales staff access product and pricing info
  • Admins need analytics logs without modifying data
  • Mobile apps use local SQLite for offline browsing

How would you design user roles, query logging, and local database encryption?

Case Study: Flower Store Order System

A flower shop uses both an online order system and an in-store terminal, with all data synced to a central PostgreSQL database.

  • Customer input is stored, including flower types and delivery info
  • Employees have different access levels (viewer vs. manager)
  • Product details and prices are fragmented into separate tables

What access control model, SQL safety, and fragmentation strategy makes sense here?

Case Study: Unusual Patient Record Access

A staff member accessed over 200 patient records within 5 minutes — far beyond typical behavior for their role

What access control or monitoring strategy should be used?

Case Study: SQL Injection Breach

A public-facing web app allowed attackers to run raw SQL through a login form, exposing the entire user database

What protection should have been implemented?

March 28, 2025
Lab - Database System for Hiking Store
Module 12
Relational Database and Data Distribution
March 31, 2025
Relational Database and Data Distribution
  • Understanding Distributed Databases
  • Horizontal vs Vertical Partitioning
  • Data Replication
  • CAP Theorem
  • Consistency Models
  • Implementation
  • Limitations
Understanding Distributed Databases
  1. Handle large volumes of data (e.g., petabytes across clusters)
  2. Improve system availability (e.g., 99.999% uptime with redundancy)
  3. Reduce latency by locating data closer to users (e.g., 100ms round-trip)
  4. Enable fault tolerance and recovery (e.g., auto-replication in seconds)
  5. Support global applications (e.g., users across continents)
  6. Scale out easily with new nodes (e.g., linear scale: +1 node → +x capacity)
  7. Enable parallel processing of queries (e.g., 2–5x speedup over single-node)
  8. Avoid single point of failure (e.g., multi-main or quorum-based design)
  9. Comply with data locality regulations (e.g., GDPR, data stored per region)
Handle Large Volumes of Data
  • Data can reach petabytes (10¹⁵ bytes) in size
  • Distributed clusters spread the load
  • Enables scalable storage and processing
  • Supports growing applications and user bases
  • (e.g., petabytes across clusters — 10¹⁵)
  • Traditional SQL DB limit: ~1–100 TB (terabytes — 10¹²) per node
Horizontal vs Vertical Partitioning
Horizontal vs Vertical Partitioning
Aspect Horizontal Partitioning Vertical Partitioning
Definition Rows are split across nodes (sharding) Columns are split across nodes
Structure Each shard has all columns but different rows Each partition holds subset of columns
Performance Good for large-scale read/write traffic Good for optimizing access to specific attributes
Use Case User data by region or ID Separating frequently vs infrequently accessed fields
Challenge Re-assembling data across shards Joining data from multiple partitions
Horizontal vs Vertical Partitioning
  1. Horizontal Partitioning (Sharding): Split by rows
    e.g., Users with ID 1–1000 on Node A, 1001–2000 on Node B
  2. Vertical Partitioning: Split by columns
    e.g., Contact Info on Node A, Login Credentials on Node B
  3. Used to improve performance, scalability, and data locality
  4. Trade-off: More joins or routing logic in queries
Data Replication
Data Replication in Distributed Databases
Replication Type Description Pros Cons
Primary-Secondary One primary node handles writes, secondaries replicate and serve reads Simpler to manage; Good for read-heavy workloads Primary can become a bottleneck; No write scalability
Multi-Primary Multiple nodes can accept writes and sync changes Increased availability and write scalability Conflict resolution can be complex
Peer-to-Peer All nodes are equal and can read/write Highly scalable; Fault-tolerant Complex conflict resolution; Requires more coordination
Synchronous Data is replicated to all nodes in real-time Strong consistency Higher latency; Reduced availability during failures
Asynchronous Replication happens with a delay Low latency; Better performance Risk of data inconsistency; May cause eventual consistency issues
CAP Theorem
CAP Theorem
  1. Consistency: Every read gets the latest write
  2. Availability: Every request gets a response, even if not the latest
  3. Partition Tolerance: System continues despite network failures
  4. Only two out of the three can be fully achieved at once
  5. Trade-offs shape database design (e.g., CP, AP, or CA)
Why Single-Node MySQL Supports AC
  • Consistency: Follows ACID — every transaction is correct and reliable
  • Availability: No network partitions in a single-node setup
  • All writes go to one place — no replication delay
  • Data is durable and isolated in local storage
  • Partition Tolerance is not required in a local environment
Consistency Models
Consistency Models in Distributed Databases
Model Description Use Case
Strong Consistency Every read reflects the most recent write Transactions, critical systems
Eventual Consistency Reads may be stale, but data will converge over time Social media feeds, caches
Causal Consistency Operations that are causally related are seen in order Collaborative apps, messaging
Read-Your-Writes A process always reads its own updates User sessions, dashboards
Monotonic Reads Reads never go backward in time Progressive data views
Implementation
Replication Across PostgreSQL Nodes 🌷

import psycopg2
conns = [psycopg2.connect(host=h, dbname="garden", user="u", password="p") for h in ["db1", "db2", "db3"]]
def replicate(flower_id, flower_data):
    for conn in conns:
        conn.cursor().execute("INSERT INTO flowers (id, data) VALUES (%s, %s)", (flower_id, flower_data))
        conn.commit()
replicate(7, '{"type": "Rose", "color": "Red"}')
          
Horizontal Partitioning Across PostgreSQL Nodes 🌼

import psycopg2
conns = [psycopg2.connect(host=h, dbname="garden", user="u", password="p") for h in ["db1", "db2", "db3"]]
def insert_shard(flower_id, flower_data):
    conn = conns[flower_id % len(conns)]
    conn.cursor().execute("INSERT INTO flowers (id, data) VALUES (%s, %s)", (flower_id, flower_data))
    conn.commit()
insert_shard(7, '{"type": "Rose", "color": "Red"}')
          
Semantic Horizontal Partitioning 🌹

import psycopg2
partitions = {
    "Rose": psycopg2.connect(host="db1", dbname="garden", user="u", password="p"),
    "Tulip": psycopg2.connect(host="db2", dbname="garden", user="u", password="p"),
    "Daisy": psycopg2.connect(host="db3", dbname="garden", user="u", password="p")
}
def insert_by_type(flower_type, flower_data):
    conn = partitions[flower_type]
    conn.cursor().execute("INSERT INTO flowers (type, data) VALUES (%s, %s)", (flower_type, flower_data))
    conn.commit()
insert_by_type("Rose", '{"color": "Red", "season": "Spring"}')
          
Modern Frameworks Supporting Distributed Databases
  • TiDB: Distributed SQL engine with horizontal scalability (MySQL compatible)
  • CockroachDB: SQL-based, distributed transactions, strongly consistent (CP)
  • FoundationDB: ACID-compliant key-value store, used in Apple ecosystem
  • Apache Cassandra: Decentralized, wide-column store, CP-oriented
  • MongoDB: Document database with sharding and replica sets (AC)
  • Amazon DynamoDB: Managed NoSQL with partitioning and replication (AP)
  • FaunaDB: Globally distributed serverless database with strong consistency
Limitations
Limitations: Distributed vs SQL Databases
Aspect Distributed Databases Traditional SQL Databases
Complexity High setup & maintenance complexity Simpler to deploy and manage
Consistency Eventual consistency trade-offs Strong ACID consistency
Latency Higher due to network coordination Lower in local settings
Scalability Horizontally scalable Limited vertical scalability
Query Support Limited joins, often denormalized Rich SQL query and joins
CAP Theorem Cannot guarantee all 3: Consistency, Availability, Partition Tolerance Not designed for partition tolerance
Questions

Which describes semantic horizontal partitioning in distributed databases?

  • Data is split by fixed-size blocks and distributed randomly
  • Rows are partitioned based on row ID modulo node count
  • Data is routed to nodes based on meaningful attributes like category or type
  • Each column is stored on a different node
Answer   C

What does the CAP theorem state about distributed systems?

  • All three properties—Consistency, Availability, and Partition Tolerance—can always be guaranteed
  • Only Consistency and Availability matter in practice
  • Only two of Consistency, Availability, and Partition Tolerance can be fully guaranteed
  • CAP applies only to SQL databases
Answer   C

Which database is commonly categorized as CP in the CAP theorem?

  • MongoDB
  • CouchDB
  • DynamoDB
  • MySQL
Answer   B

What kind of partitioning splits data by rows?

  • Vertical partitioning
  • Horizontal partitioning
  • Semantic indexing
  • Data denormalization
Answer   B

In distributed databases, what is a typical challenge of replication?

  • Ensuring real-time video rendering
  • Achieving strong consistency across nodes
  • Disabling logging and backups
  • Maintaining monolithic code structure
Answer   B

Which replication type allows multiple nodes to accept writes?

  • Primary-Secondary
  • Multi-Primary
  • Read-Only
  • Snapshot-based
Answer   B

Which consistency model ensures every read returns the latest write?

  • Eventual Consistency
  • Monotonic Reads
  • Causal Consistency
  • Strong Consistency
Answer   D

What kind of partitioning splits columns across different nodes?

  • Vertical partitioning
  • Horizontal partitioning
  • Logical segmentation
  • Key-based sharding
Answer   A

Why is single-node MySQL considered AC (not CP)?

  • It supports network partition tolerance
  • It guarantees consistency and availability in a local setup
  • It scales horizontally by default
  • It uses a NoSQL backend
Answer   B
April 2, 2025
Hands-on Distributed SQL with Python
  • Simulating Distribution with PostgreSQL and SQLite
  • Data Replication Logic in Python
  • Horizontal Partitioning: Hash-Based and Semantic
  • Tracking with UUIDs Across Nodes
  • Fallback Reads and Resilient Lookups
  • Manual Syncing Between Nodes
  • Aggregating and Querying Distributed Data
Simulating Distribution with PostgreSQL and SQLite
  • Two-node setup: local (SQLite) + remote (PostgreSQL)
  • Simulates a distributed system with minimal tools
  • Python handles routing and writes to appropriate nodes
  • Highlights design trade-offs between local and remote writes
Data Replication Logic in Python
  • Insertions replicated to both PostgreSQL and SQLite
  • Demonstrates write duplication across systems
  • Shows partial failure tolerance in distributed writes
  • Basis for building consistency or sync logic

What is a benefit of inserting the same flower data into both PostgreSQL and SQLite?

  • Increases local latency
  • Reduces RAM usage
  • Improves availability through replication
  • Forces vertical partitioning
Answer   C
Horizontal Partitioning: Hash-Based and Semantic
  • Hash-based: uses modulo logic for node assignment
  • Semantic: routes by flower type (e.g., Rose → SQLite)
  • Demonstrates custom sharding strategies in Python
  • Builds intuition around partitioning rules

Which node stores "Rose" flower data in our semantic partitioning example?

  • PostgreSQL
  • Redis
  • SQLite
  • MySQL
Answer   C

Which concept does hash-based partitioning demonstrate in our example?

  • Random access pattern
  • Deterministic row distribution based on ID
  • Query optimization by join reordering
  • Vertical partitioning by field type
Answer   B

Which describes semantic horizontal partitioning in distributed databases?

  • Data is split by fixed-size blocks and distributed randomly
  • Rows are partitioned based on row ID modulo node count
  • Data is routed to nodes based on meaningful attributes like category or type
  • Each column is stored on a different node
Answer   C
Tracking with UUIDs Across Nodes
  • Each flower has a globally unique ID
  • Allows unified tracking across both databases
  • UUID enables cross-node consistency checks
  • Prepares students for real-world multi-node sync

What does the UUID field help us achieve in a distributed database?

  • Improve compression
  • Track identical records across different nodes
  • Increase query speed
  • Reduce memory usage
Answer   B

Why do we use UUID instead of primary key when syncing data?

  • Primary keys are always duplicated
  • UUIDs are guaranteed to be unique across systems
  • UUIDs improve performance of joins
  • UUIDs make databases faster to start
Answer   B
Fallback Reads and Resilient Lookups
  • Tries PostgreSQL first, falls back to SQLite if needed
  • Demonstrates read availability in partitioned systems
  • Good for introducing fault tolerance
  • Shows practical resilience logic in distributed systems

Which operation simulates fault tolerance in our application?

  • Manual table drop
  • Fallback read from SQLite if PostgreSQL fails
  • Reading and writing only to PostgreSQL
  • Trigger-based deletion
Answer   B
Manual Syncing Between Nodes
  • SQLite acts as buffer when PostgreSQL is down
  • Manual sync pushes offline data later
  • Simulates eventual consistency architecture
  • Highlights sync conflicts and retry logic

What does the manual sync function do in our demo?

  • Deletes all rows from SQLite
  • Clones the PostgreSQL schema to SQLite
  • Pushes unsynced records from SQLite to PostgreSQL
  • Backs up PostgreSQL to a file
Answer   C
Aggregating and Querying Distributed Data
  • Pulls data from both databases in one result
  • Shows basic distributed query behavior
  • Teaches importance of merging and deduplication
  • Useful for analytics or reporting across nodes

What feature does the query aggregation function demonstrate?

  • Distributed joins
  • Single-node analytics
  • Combining results from multiple nodes
  • Binary tree sorting
Answer   C
Questions

Which of the following best describes our demo's architecture?

  • Vertical scaling with caching
  • Monolithic query planner
  • Python-based logic controlling two independent databases
  • Multi-master SQL cluster with global consensus
Answer   C
Questions
Homework 4 (Apr/18), Lab 5 (Apr/18), Project Part 2 (Apr/20)
Thanks
Module 13
Relational Database and Data Aggregation
Module 14
Project Presentation
Module 15
Final Preparation
Syllabus
Database Management Systems (COMP-163-01)

Class Schedule:

Time: 08:00 AM – 09:20 AM on Monday, Wednesday, Friday

Location: John T Chambers Technology Center 114 (CTC 114)

Course Description

A Database Management System (DMS) is designed for the efficient storage, access, and update of data. This course explores DMS from two main perspectives:

  • User-Centered Perspective: Data models, query languages, and design techniques
  • System Implementation Perspective: Policies, algorithms, and data structures for DMS design and implementation
Course Objectives
  • Understand DMS functionality to select and utilize a DMS effectively
  • Analyze data requirements and develop database schemas
  • Create, manipulate, and query data using SQL
  • Understand transaction processing concepts (ACID)
  • Develop and implement access control policies
  • Evaluate and utilize indexing structures
  • Understand the theoretical foundations of query languages
  • Integrate a DMS with a programming environment
Prerequisites
  • COMP 51: Fundamental programming skills
  • COMP 53: Basic data structures and algorithms
Teaching Assistant
Teaching Assistant: Ms. Konika Reddy Saddikuti
Email: k_saddikuti@u.pacific.edu
Office Location: Student Support Center
Office Hours: By appointment

Instructor
Instructor: Dr. Solomon Berhe
Email: sberhe@pacific.edu
Zoom: Zoom Meeting Link
Office: CTC 117
Office Hours: Mon/Wed/Fri, 2:00–3:15 PM (or by appointment)

Textbooks:

Course Topics
  • Entity-Relationship Models
  • Relational Data Model and Algebra
  • SQL: Data Definition, Manipulation, and Queries
  • Schema Design and Normalization
  • SQL Application Interfaces
  • Static and Dynamic Indexes
  • Query Processing and Optimization
  • Security in Databases
  • Transaction Management (Serializability, Recoverability)
  • Concurrency Control
  • NoSQL Databases
Grading
  • Labs: 15%
  • Homework: 15%
  • Project: 40%
  • Exams: 30%
Assignments and Exams
Homework: Includes written exercises and programming assignments. Submit via Canvas or GitHub.
Team Projects: Two projects (database design, development, and application integration)
Exams: One midterm and one final. Students may bring one handwritten note sheet (8.5x11)
Policies
  • Attendance and Participation: Active participation is required. Additional points may be awarded for consistent engagement
  • Late Submissions:
    • Late assignments accepted during the next class with a 20% penalty
    • No makeup exams unless excused for valid reasons
  • Academic Integrity: Academic dishonesty (e.g., plagiarism, cheating) will result in a grade of zero for the assignment or more severe penalties for repeated offenses.
  • Collaboration: Collaborative problem-solving is encouraged but must stop short of sharing code or written work
  • Accommodations: Students requiring accommodations should contact the Office of Services for Students with Disabilities at McCaffrey Center, Rm. 137.
Canvas and Course Updates

All course materials, announcements, and assignments will be posted on Canvas. Check regularly for updates

Honor Code

Students are expected to act with integrity, honesty, and responsibility. Violations of the Honor Code will be reported to the Office of Student Conduct and Community Standards

For more details, see the University Academic Honesty Policy

Source Document: Course Syllabus