Course: COMP 163
Time: 08:00 AM - 09:15 AM on Monday, Wednesday, Friday
Location: John T Chambers Technology Center 114 (CTC 114)
Flower
------
id: 101
name: "Rose"
color: "Red"
price: 2.50
Flowers
-------
101 | Rose | Red | 2.50
102 | Tulip | Yellow | 1.75
103 | Lily | White | 3.00
104 | Daisy | White | 1.25
105 | Orchid | Purple | 4.50
...
(Thousands of rows)
Rule:
price >= 0
✔ Rose | 2.50
✔ Tulip | 1.75
✘ Lily | -3.00 ← rejected
SELECT name, price
FROM flowers
WHERE color = 'White';
Result:
Lily | 3.00
Daisy | 1.25
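The query above can be run end to end with Python's built-in sqlite3 module. A minimal sketch using the same five rows (an ORDER BY id is added here only so the output order is deterministic):

```python
import sqlite3

# In-memory database; table and rows taken from the flowers example above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flowers (id INTEGER PRIMARY KEY, name TEXT, color TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO flowers VALUES (?, ?, ?, ?)",
    [(101, "Rose", "Red", 2.50),
     (102, "Tulip", "Yellow", 1.75),
     (103, "Lily", "White", 3.00),
     (104, "Daisy", "White", 1.25),
     (105, "Orchid", "Purple", 4.50)],
)
rows = conn.execute(
    "SELECT name, price FROM flowers WHERE color = 'White' ORDER BY id"
).fetchall()
print(rows)  # [('Lily', 3.0), ('Daisy', 1.25)]
```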
App / Website / API / Software
↓
--------- DBMS ---------
↓
Database (disk / SSD)
| System | How it Stores Data | How you Query | Main Problem |
|---|---|---|---|
| Paper | Handwritten records | Search by reading | Slow, hard to scale |
| Files (folders) | Documents in directories | Manual naming + search | Duplicates, inconsistent versions |
| Spreadsheets | Rows and columns | Filters and formulas | Hard for multi user + rules |
| Custom App (ad hoc) | Program specific format | Whatever the app supports | No standard, hard to maintain |
| Type | Data Shape | Good For | Example |
|---|---|---|---|
| Relational | Tables (rows/columns) | Business data, consistency | PostgreSQL, MySQL |
| Document | JSON like documents | Flexible fields, web apps | MongoDB |
| Key Value | Key → value | Caching, sessions | Redis |
| Column Store | Columns (analytics) | Fast aggregates | ClickHouse |
| Graph | Nodes + edges | Relationships, networks | Neo4j |
Each component has a distinct role, but all three must work together.
-- Generated by an LLM
DELETE FROM orders;
-- Missing WHERE clause
-- Result: all rows deleted
Conclusion: LLMs can assist, but humans must review SQL before execution.
flowers(id, name, color, price)
orders(id, flowerId, qty)
| StudentId | Name | Major |
|---|---|---|
| 101 | Ana | CS |
| 102 | Ben | EE |
| 103 | Cam | CS |
SELECT name, price
FROM flowers
WHERE price > 2.00;
Can this query be used in programs written in Java, C++, and Python?
Correct: D
SQL is a database query language. The same SQL query can be sent from Java, C++, Python, or any other language through a database API.
Connection conn = DriverManager.getConnection(url);
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("""
SELECT name, price
FROM flowers
WHERE price > 2.00
""");
conn = sqlite3.connect("flowers.db")
cur = conn.cursor()
cur.execute("""
SELECT name, price
FROM flowers
WHERE price > 2.00
""")
sqlite3* db;
sqlite3_open("flowers.db", &db);
sqlite3_exec(db,
"SELECT name, price FROM flowers WHERE price > 2.00",
nullptr, nullptr, nullptr);
const db = new sqlite3.Database("flowers.db");
db.all(
`SELECT name, price
FROM flowers
WHERE price > 2.00`, callback);
Which component is executing the SQL query?
Correct: C
The DBMS parses, plans, and executes SQL. Programming languages only send the query and receive results.
Java · Python · C++ · JS
send SQL string
parse
plan
execute
flowers.db
tables
rows
indexes
What is the role of a programming language when working with a database?
Correct: C
Applications use programming languages to communicate with the DBMS, but SQL remains the interface for data access.
# step-by-step instructions
# how to do it
for flower in flowers:
if flower.price > 2.00:
print(flower.name, flower.price)
-- describe the result
-- what you want
SELECT name, price
FROM flowers
WHERE price > 2.00;
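The imperative/declarative contrast can be made concrete: the same question answered once with a hand-written loop and once by letting the DBMS decide how to scan and filter. A sketch using the flowers data from earlier:

```python
import sqlite3

# Sample data mirroring the flowers table from the slides.
flowers = [("Rose", 2.50), ("Tulip", 1.75), ("Lily", 3.00),
           ("Daisy", 1.25), ("Orchid", 4.50)]

# Imperative: say HOW -- iterate and test each row yourself.
loop_result = []
for name, price in flowers:
    if price > 2.00:
        loop_result.append((name, price))

# Declarative: say WHAT -- the DBMS chooses the access path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flowers (name TEXT, price REAL)")
conn.executemany("INSERT INTO flowers VALUES (?, ?)", flowers)
sql_result = conn.execute(
    "SELECT name, price FROM flowers WHERE price > 2.00"
).fetchall()

print(loop_result == sql_result)  # True
```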
Query A:
SELECT * FROM flowers;
Query B:
INSERT INTO flowers(name, price)
VALUES ('Rose', 2.50);
.db file
flowers.db
├─ flowers
├─ suppliers
└─ orders
Java · Python · C++ · JS
send SQL string
parse
plan
execute
flowers.db
tables
rows
indexes
sqlite3 flowers.db
enter SQL
| Type | Description | Flower Example |
|---|---|---|
| INTEGER | Whole numbers | flower_id |
| REAL | Decimal numbers | price |
| TEXT | Strings | name, color |
| BLOB | Binary data | image |
| NULL | Missing value | unknown_color |
flowers(
id,
name,
color,
price
)
| Column | Type | Reason |
|---|---|---|
| id | INTEGER | Primary identifier |
| name | TEXT | Flower name |
| color | TEXT | Readable attribute |
| price | REAL | Decimal value |
CREATE TABLE flowers (
id INTEGER PRIMARY KEY,
name TEXT,
color TEXT,
price REAL
);
price REAL CHECK (price >= 0)
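A quick sketch of what the CHECK rule does at runtime: the valid insert succeeds, and the negative price is rejected with an IntegrityError before it ever reaches the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE flowers (
        id INTEGER PRIMARY KEY,
        name TEXT,
        color TEXT,
        price REAL CHECK (price >= 0)
    )
""")
conn.execute("INSERT INTO flowers VALUES (103, 'Lily', 'White', 3.00)")  # accepted

rejected = False
try:
    conn.execute("INSERT INTO flowers VALUES (106, 'Lily', 'White', -3.00)")
except sqlite3.IntegrityError:
    rejected = True  # CHECK constraint failed: price >= 0

count = conn.execute("SELECT COUNT(*) FROM flowers").fetchone()[0]
print(rejected, count)  # True 1
```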
.db file
flowers.db
├─ flowers
├─ suppliers
└─ orders
| Type | Description | Flower Example |
|---|---|---|
| INTEGER | Whole numbers | flower_id |
| REAL | Decimal numbers | price |
| TEXT | Strings | name, color |
| BLOB | Binary data | image |
| NULL | Missing value | unknown_color |
flowers(
id,
name,
color,
price
)
| Column | Type | Reason |
|---|---|---|
| id | INTEGER | Unique identifier |
| name | TEXT | Flower name |
| color | TEXT | Human readable |
| price | REAL | Decimal value |
CREATE TABLE flowers (
id INTEGER PRIMARY KEY,
name TEXT,
color TEXT,
price REAL
);
INTEGER PRIMARY KEY is special
price REAL CHECK (price >= 0)
Create an inventory table:
id INTEGER PRIMARY KEY
name TEXT NOT NULL
category TEXT NOT NULL
price REAL NOT NULL
quantity INTEGER NOT NULL
CREATE TABLE inventory (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
category TEXT NOT NULL,
price REAL NOT NULL,
quantity INTEGER NOT NULL
);
.schema inventory
INSERT INTO inventory (name, category, price, quantity)
VALUES ('Oak Table', 'Table', 299.99, 4);
INSERT INTO inventory (name, category, price, quantity)
VALUES ('Leather Sofa', 'Sofa', 899.00, 2);
INSERT INTO inventory (name, category, price, quantity)
VALUES ('Desk Lamp', 'Lamp', 39.50, 12);
SELECT id, name, category, price, quantity
FROM inventory;
SELECT name, quantity
FROM inventory
WHERE quantity <= 3;
Add a supplier (TEXT) column to inventory.
ALTER TABLE inventory
ADD COLUMN supplier TEXT;
.schema inventory
UPDATE inventory
SET supplier = 'Pacific Furnishings'
WHERE name = 'Oak Table';
UPDATE inventory
SET supplier = 'West Coast Leather'
WHERE name = 'Leather Sofa';
UPDATE inventory
SET supplier = 'BrightHome'
WHERE name = 'Desk Lamp';
SELECT id, name, supplier, quantity
FROM inventory;
SELECT name, category, quantity
FROM inventory
WHERE supplier = 'BrightHome';
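The ALTER/UPDATE/SELECT steps above can be replayed in one script. A sketch using the same inventory rows; note that after ALTER TABLE, existing rows hold NULL in the new column until an UPDATE fills them in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inventory (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        category TEXT NOT NULL,
        price REAL NOT NULL,
        quantity INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO inventory (name, category, price, quantity) VALUES (?, ?, ?, ?)",
    [("Oak Table", "Table", 299.99, 4),
     ("Leather Sofa", "Sofa", 899.00, 2),
     ("Desk Lamp", "Lamp", 39.50, 12)],
)

# New column starts as NULL for existing rows.
conn.execute("ALTER TABLE inventory ADD COLUMN supplier TEXT")
conn.execute("UPDATE inventory SET supplier = 'BrightHome' WHERE name = 'Desk Lamp'")

rows = conn.execute(
    "SELECT name, category, quantity FROM inventory WHERE supplier = 'BrightHome'"
).fetchall()
print(rows)  # [('Desk Lamp', 'Lamp', 12)]
```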
UNIQUE, NOT NULL, DEFAULT. First the concept, then the SQL (using our Furniture Store case study).
inventory
---------
id | name | category | price | quantity
1 | Oak Table | Table | 299.99 | 4
2 | Leather Sofa | Sofa | 899.00 | 2
3 | Desk Lamp | Lamp | 39.50 | 12
Good:
id = 1, 2, 3, 4, ...
Bad:
id = 1, 1, 2, 3 (duplicate)
Without a key:
UPDATE inventory
SET price = 10
WHERE category = 'Lamp';
Could affect many rows.
inventory uses id as its primary key.
id is not "meaningful".
It is just an identifier.
name/category can change
id should not.
CREATE TABLE inventory (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
category TEXT NOT NULL,
price REAL NOT NULL,
quantity INTEGER NOT NULL
);
INTEGER PRIMARY KEY can be auto-generated: omit id during INSERT.
INSERT INTO inventory (name, category, price, quantity)
VALUES ('Oak Table', 'Table', 299.99, 4);
-- id assigned automatically
id INTEGER PRIMARY KEY AUTOINCREMENT
CREATE TABLE inventory (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
category TEXT NOT NULL,
price REAL NOT NULL,
quantity INTEGER NOT NULL
);
SQLite will keep increasing id values even if rows are deleted.
SELECT id, name, category, price, quantity
FROM inventory;
UPDATE inventory
SET price = 279.99
WHERE id = 1;
DELETE FROM inventory
WHERE id = 1;
Primary keys exist for correctness and control.
Rules enforced by the database to protect data quality.
price REAL NOT NULL
sku TEXT UNIQUE
quantity INTEGER DEFAULT 0
Prevents missing critical information.
name TEXT NOT NULL,
category TEXT NOT NULL,
price REAL NOT NULL
Prevents duplicate values across rows.
sku TEXT UNIQUE
Provides a value when none is supplied.
quantity INTEGER DEFAULT 0
CREATE TABLE inventory (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
category TEXT NOT NULL,
sku TEXT UNIQUE,
price REAL NOT NULL,
quantity INTEGER DEFAULT 0
);
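All three constraints can be exercised against the schema above. A sketch (the SKU values are made up for illustration): DEFAULT fills in a missing quantity, while NOT NULL and UNIQUE each reject a bad insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inventory (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        category TEXT NOT NULL,
        sku TEXT UNIQUE,
        price REAL NOT NULL,
        quantity INTEGER DEFAULT 0
    )
""")

# DEFAULT: no quantity supplied, so SQLite stores 0.
conn.execute("INSERT INTO inventory (name, category, sku, price) "
             "VALUES ('Oak Table', 'Table', 'TBL-1', 299.99)")
qty = conn.execute("SELECT quantity FROM inventory WHERE sku = 'TBL-1'").fetchone()[0]

errors = []
# NOT NULL: a row without a name is rejected.
try:
    conn.execute("INSERT INTO inventory (category, sku, price) "
                 "VALUES ('Sofa', 'SOF-1', 899.00)")
except sqlite3.IntegrityError:
    errors.append("NOT NULL")
# UNIQUE: a duplicate sku is rejected.
try:
    conn.execute("INSERT INTO inventory (name, category, sku, price) "
                 "VALUES ('Another Table', 'Table', 'TBL-1', 199.99)")
except sqlite3.IntegrityError:
    errors.append("UNIQUE")

print(qty, errors)  # 0 ['NOT NULL', 'UNIQUE']
```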
Real systems connect data across tables.
| id (PK) | name | country |
|---|---|---|
| 1 | Pacific Furnishings | USA |
| 2 | West Coast Leather | USA |
| 3 | BrightHome | Canada |
| id (PK) | name | category | supplier_id (FK) |
|---|---|---|---|
| 101 | Oak Table | Table | 1 |
| 102 | Leather Sofa | Sofa | 2 |
| 103 | Desk Lamp | Lamp | 3 |
Each inventory item references exactly one supplier using a foreign key.
The database enforces that supplier_id = 1 must exist in the
supplier table before the inventory row can be inserted.
A reference to a primary key in another table.
supplier_id INTEGER
REFERENCES supplier(id)
CREATE TABLE supplier (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
contact_email TEXT UNIQUE
);
CREATE TABLE inventory (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
category TEXT NOT NULL,
price REAL NOT NULL,
quantity INTEGER DEFAULT 0,
supplier_id INTEGER,
FOREIGN KEY (supplier_id)
REFERENCES supplier(id)
);
Constraints and keys are not optional. They are the foundation of reliable, maintainable databases.
The furniture store now wants to manage suppliers separately.
Create a supplier table. Think about what uniquely identifies a supplier and what information should never be missing.
id: unique supplier identifier
name: supplier name
email: contact email
phone: optional contact number
id is INTEGER PRIMARY KEY AUTOINCREMENT
name is NOT NULL
email is UNIQUE
CREATE TABLE supplier (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL,
phone TEXT
);
.schema supplier
Add supplier_id to inventory, referencing supplier(id). In SQLite, foreign keys must be enabled explicitly.
PRAGMA foreign_keys = ON;
Run this before creating or using foreign key constraints.
CREATE TABLE inventory (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
category TEXT NOT NULL,
price REAL NOT NULL,
quantity INTEGER DEFAULT 0,
supplier_id INTEGER NOT NULL,
FOREIGN KEY (supplier_id)
REFERENCES supplier(id)
);
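The foreign key can be tested directly. A sketch (supplier name and email are made up): with PRAGMA foreign_keys ON, an inventory row pointing at an existing supplier is accepted, and one pointing at a nonexistent supplier is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE supplier (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        email TEXT UNIQUE NOT NULL,
        phone TEXT
    );
    CREATE TABLE inventory (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        category TEXT NOT NULL,
        price REAL NOT NULL,
        quantity INTEGER DEFAULT 0,
        supplier_id INTEGER NOT NULL,
        FOREIGN KEY (supplier_id) REFERENCES supplier(id)
    );
""")
conn.execute("INSERT INTO supplier (name, email) "
             "VALUES ('BrightHome', 'sales@brighthome.example')")

# OK: supplier 1 exists.
conn.execute("INSERT INTO inventory (name, category, price, supplier_id) "
             "VALUES ('Desk Lamp', 'Lamp', 39.50, 1)")

# Rejected: supplier 99 does not exist.
fk_blocked = False
try:
    conn.execute("INSERT INTO inventory (name, category, price, supplier_id) "
                 "VALUES ('Ghost Chair', 'Chair', 120.00, 99)")
except sqlite3.IntegrityError:
    fk_blocked = True
print(fk_blocked)  # True
```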
You now have: entities, primary keys, constraints, and a foreign key relationship enforced by SQLite.
Due: January 30, 11:59 PM
Goal: model, design, and implement a small relational database from a real-world
scenario.
A local library wants to track books loaned to members.
Entities are the things the system needs to track.
| Good Entities | Bad Entities |
|---|---|
| Book | City |
| Member | Chair |
| Staff | Building |
Relationships describe how entities are connected.
Output: libraryERD.png
@startuml
entity "Book" {
+book_id: INTEGER
title: TEXT
isbn: TEXT
}
entity "Member" {
+member_id: INTEGER
name: TEXT
email: TEXT
}
Book -- Member: borrows
@enduml
Verify using the .schema command
| Operation | What You Do |
|---|---|
| Create | INSERT at least 3 rows |
| Read | SELECT all rows |
| Update | Change a value in one row |
| Delete | Remove one specific row |
This lab connects concepts → design → SQL implementation.
Due: January 30, 11:59 PM
Focus: conceptual understanding of SQL, SQLite, and course expectations.
This homework ensures you understand the role of SQL, the difference between language and system, and the scope of the course.
Please read Chapters 1 and 2 of the following books:
Goal: general understanding, not mastery of syntax.
Do not worry about remembering every command.
Answer in your own words. Short, clear explanations are preferred.
Explain the purpose of each keyword:
CREATE
SELECT
UPDATE
DELETE
INSERT
Precision of thought matters more than syntax.
Please read the course syllabus carefully:
Submitting this homework confirms that you reviewed the syllabus.
This homework sets the foundation for everything that follows.
-- Work order:
-- 1) requirements
-- 2) ERD
-- 3) SQL schema
-- 4) queries
The system should allow users to rent bikes easily
The system should allow registered users to rent a bike by selecting a location, choosing an available bike, and completing payment through the application within three steps
Purpose: Bike Rental Database System to manage rentals and track customers, bikes, and transactions
-- Nouns: user, bike, location, payment, transaction
-- Verbs: rent, select, choose, pay
-- Candidate entities:
-- - Customer
-- - Bike
-- - Location
-- - Payment
-- - RentalTransaction
-- Entity candidates:
-- Customer
-- Bike
-- RentalTransaction
-- Location
-- Payment
-- Relationship examples:
-- Customer rents Bike
-- Bike located at Location
-- Customer pays Payment
-- ERD view:
-- Customer --(rents)--> Bike
-- Entity will have attributes:
-- BikeID, Model, Color, Condition, Availability
-- Prefer consistent names:
-- Customer, Bike, RentalTransaction
-- Avoid reserved words for table names
-- Safe habit:
-- Pick a naming style and keep it.
-- Example: PascalCase tables, snake_case columns
A travel booking system should allow users to search for, compare, and book flights, hotels, and rental cars. Users must be able to filter results by price, date, and location, and securely complete payments
A travel booking system should allow users to search for, compare, and book flights, hotels, and rental cars. Users must be able to filter results by price, date, and location, and securely complete payments.
-- Practice:
-- turn nouns into entity boxes
-- connect boxes with verb labels
-- Customer attributes:
-- CustomerID, Name, Email, Phone
-- Bike attributes:
-- BikeID, Model, Color, Condition, Availability
-- Primary key examples:
-- Customer(CustomerID)
-- Bike(BikeID)
-- RentalTransaction(RentalID)
-- Optional examples:
-- Customer.Email, Customer.Phone
-- Bike.Color, Bike.Condition
-- RentalTransaction.ReturnDate
-- BookingDate DATE
-- Availability BOOLEAN
-- Name VARCHAR(100)
| Data Type | Description | Example |
|---|---|---|
| INTEGER | A whole number, no decimal points | 42 |
| BIGINT | A large integer value | 9223372036854775807 |
| DECIMAL(p, s) | Exact numeric value | DECIMAL(10, 2) |
| FLOAT | Approximate numeric value | 3.14159 |
| CHAR(n) | Fixed length string | CHAR(5) |
| VARCHAR(n) | Variable length string | VARCHAR(50) |
| TEXT | Variable length text | "long notes" |
| DATE | Date | 2025-01-14 |
| TIME | Time | 14:30:00 |
| TIMESTAMP | Date and time | 2025-01-14 14:30:00 |
| BOOLEAN | TRUE or FALSE | TRUE |
| BLOB | Binary data | [bytes] |
-- Common constraints:
-- PRIMARY KEY
-- FOREIGN KEY
-- NOT NULL
-- UNIQUE
-- DEFAULT
| Constraint | Description |
|---|---|
| PRIMARY KEY | Uniquely identifies each record; implies NOT NULL |
| FOREIGN KEY | Ensures referential integrity by linking to another table |
| NOT NULL | Column cannot be NULL |
| UNIQUE | All values in a column are distinct |
| DEFAULT | Default value when no value is provided |
| AUTO_INCREMENT | Auto generates unique integer id (DB specific) |
| Conceptual Term | Technical Term |
|---|---|
| Diagram | Schema |
| Entity | Table |
| Attribute | Column |
| Record | Row |
| Identifier | Primary Key |
| Relationship | Foreign Key |
| Optional Attribute | Nullable Column |
| Mandatory Attribute | NOT NULL Column |
CREATE TABLE Customer (
CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,
Name VARCHAR(100) NOT NULL,
Email VARCHAR(100) UNIQUE,
Phone VARCHAR(15)
);
CREATE TABLE Bike (
BikeID INTEGER PRIMARY KEY AUTOINCREMENT,
Model VARCHAR(100) NOT NULL,
Color VARCHAR(50),
Condition VARCHAR(50),
Availability BOOLEAN DEFAULT TRUE
);
CREATE TABLE RentalTransaction (
RentalID INTEGER PRIMARY KEY AUTOINCREMENT,
CustomerID INTEGER NOT NULL,
BikeID INTEGER NOT NULL,
BookingDate DATE NOT NULL,
ReturnDate DATE,
FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID),
FOREIGN KEY (BikeID) REFERENCES Bike(BikeID)
);
-- Example risk:
-- UI collects phone as text "+1-123-456"
-- DB column INTEGER
-- Insert fails or loses formatting
| Front-End | Back-End | Database | Result |
|---|---|---|---|
| Text input for phone number | String validation for format | INTEGER type | Formatted numbers rejected |
| Dropdown allows decimal selection | Expects integer values | FLOAT type | Unexpected decimal input |
| Date picker with local format | Assumes ISO format | DATE type | Date conversion errors |
-- Entity → table later
-- Example: Customer, Bike
-- Relationship example:
-- Customer rents Bike
-- Primary key column:
-- CustomerID INTEGER PRIMARY KEY
Cardinality matters: 1:1, 1:N, N:N
1 Entity
0 Relationship
2 Entity
1 Relationship
3 Entity
2 Relationship
-- Design tip:
-- prefer the simplest ERD that meets requirements
-- add complexity only when needed
| Cardinality | Meaning | Example |
|---|---|---|
| 1:1 | Each record links to one record | User and Profile |
| 1:N | One record links to many records | Customer and Rentals |
| N:N | Many records link to many records | Students and Courses |
| Cardinality | When to Use | Example |
|---|---|---|
| 1:1 | Each record in one table corresponds to exactly one record in another table | User and Profile |
| 1:N | One record can be associated with multiple records in another table | Library and Books |
| N:N | Multiple records in both tables can be related to each other | Students and Courses |
SELECT Book.title
FROM Book
WHERE Book.isbnId = '978-1-';
SELECT Book.title
FROM Book, ISBN
WHERE Book.bookId = ISBN.bookId
AND ISBN.isbnId = '978-1-';
SELECT Book.title
FROM Book, ISBN, Book_ISBN
WHERE Book.bookId = Book_ISBN.bookId
AND ISBN.isbnId = Book_ISBN.isbnId
AND ISBN.isbnId = '978-1-';
| # | Scenario | Which Cardinality? |
|---|---|---|
| 1 | Book and ISBN | |
| 2 | Author and Book | |
| 3 | Student and borrow Book | |
| 4 | Student and Student ID | |
| 5 | Book and Publisher | |
| 6 | Publisher and Books | |
| 7 | Library and Book | |
| 8 | Book and Format (e.g., eBook, hardcover) | |
| 9 | Book and Catalog Number in the Library System | |
| 10 | Reader and Book Review | |
-- Next step:
-- use your ERD to justify every CREATE TABLE line
@startuml
entity "EntityA" as A {
+id: INTEGER
attribute_1: VARCHAR
attribute_2: INTEGER
}
entity "EntityB" as B {
+id: INTEGER
attribute_1: VARCHAR
}
A -- B: "relationship"
@enduml
Requirements → ERM → Tables → Data
-- Work order:
-- 1) requirements
-- 2) ERM / ERD
-- 3) schema
-- 4) data
-- 5) queries
The store should sell furniture products to customers and record each order.
Each order can include multiple products with quantities and prices at the time of purchase.
The store should track inventory per product, so staff know what is in stock and what needs reordering.
-- Design choice:
-- Keep the first version small.
-- Add delivery, returns, warehouses later.
-- Minimal set of tables:
-- 1) Product (includes inventory count)
-- 2) Customer
-- 3) CustomerProductSales (captures sales rows with two FKs)
-- Tradeoff:
-- Normalized: Order + OrderItem (best design)
-- Constrained: store each sale in CustomerProductSales
-- Conceptual ERM:
-- Customer N:N Product (via CustomerProductSales)
-- Product carries stock columns (qty_on_hand, reorder_level)
@startuml
hide circle
skinparam linetype ortho
skinparam shadowing false
entity "Customer" as Customer {
+customer_id: INTEGER
--
name: VARCHAR
email: VARCHAR
phone: VARCHAR
}
entity "Product" as Product {
+product_id: INTEGER
--
sku: VARCHAR
name: VARCHAR
category: VARCHAR
price: DECIMAL
qty_on_hand: INTEGER
reorder_level: INTEGER
}
entity "CustomerProductSales" as CustomerProductSales {
+sale_id: INTEGER
--
customer_id: INTEGER (FK)
product_id: INTEGER (FK)
sale_date: DATE
qty: INTEGER
unit_price: DECIMAL
status: VARCHAR
}
Customer ||--o{ CustomerProductSales: buys
Product ||--o{ CustomerProductSales: sold_in
@enduml
-- Example row meaning:
-- customer 1 bought product 5
-- qty = 2 at unit_price = 89.00 on 2026-01-10
-- Primary keys are stable identifiers.
-- They should not carry meaning.
-- Constraints catch errors early.
-- They protect data quality.
-- Customer ||--o{ CustomerProductSales
-- FK: CustomerProductSales.customer_id → Customer.customer_id
-- FK: CustomerProductSales.product_id → Product.product_id
| Table | Purpose | Key Columns |
|---|---|---|
| Customer | Who buys | customer_id (PK), email (UNIQUE) |
| Product | What is sold and stock count | product_id (PK), sku (UNIQUE), qty_on_hand |
| CustomerProductSales | Sales rows (customer ↔ product) | sale_id (PK), customer_id (FK), product_id (FK) |
CREATE TABLE Customer (
customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
name VARCHAR(100) NOT NULL,
email VARCHAR(100) UNIQUE,
phone VARCHAR(30)
);
CREATE TABLE Product (
product_id INTEGER PRIMARY KEY AUTOINCREMENT,
sku VARCHAR(40) NOT NULL UNIQUE,
name VARCHAR(120) NOT NULL,
category VARCHAR(60),
price DECIMAL(10,2) NOT NULL,
qty_on_hand INTEGER NOT NULL DEFAULT 0,
reorder_level INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE CustomerProductSales (
sale_id INTEGER PRIMARY KEY AUTOINCREMENT,
customer_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
sale_date DATE NOT NULL,
status VARCHAR(30) NOT NULL DEFAULT 'NEW',
qty INTEGER NOT NULL,
unit_price DECIMAL(10,2) NOT NULL,
FOREIGN KEY (customer_id) REFERENCES Customer(customer_id),
FOREIGN KEY (product_id) REFERENCES Product(product_id)
);
-- Insert order:
-- 1) Customer
-- 2) Product
-- 3) CustomerProductSales
INSERT INTO Customer (name, email, phone) VALUES
('Ava Kim', 'ava.kim@example.com', '209-555-0101'),
('Tom Patel', 'tom.patel@example.com', '209-555-0112');
INSERT INTO Product (sku, name, category, price, qty_on_hand, reorder_level) VALUES
('SOFA-1001', 'Linen Sofa 3 Seat', 'Sofa', 899.00, 5, 2),
('TBLE-2001', 'Oak Dining Table', 'Table', 749.00, 2, 1),
('CHAI-3001', 'Walnut Accent Chair', 'Chair', 299.00, 7, 3);
INSERT INTO Product (sku, name, category, price, qty_on_hand, reorder_level) VALUES
('DSKR-4001', 'Standing Desk', 'Desk', 499.00, 3, 1),
('LAMP-5001', 'Floor Lamp', 'Lighting', 89.00, 12, 5),
('RUGG-6001', 'Wool Rug 5x8', 'Rug', 179.00, 4, 2);
INSERT INTO CustomerProductSales (customer_id, product_id, sale_date, status, qty, unit_price)
VALUES
(1, 1, '2026-01-10', 'PAID', 1, 899.00),
(1, 5, '2026-01-10', 'PAID', 2, 89.00);
INSERT INTO CustomerProductSales (customer_id, product_id, sale_date, status, qty, unit_price)
VALUES
(2, 2, '2026-01-11', 'NEW', 1, 749.00),
(2, 3, '2026-01-11', 'NEW', 2, 299.00),
(1, 6, '2026-01-12', 'PAID', 1, 179.00),
(1, 4, '2026-01-12', 'PAID', 1, 499.00);
SELECT product_id, sku, name, price, qty_on_hand, reorder_level
FROM Product
ORDER BY category, price DESC;
| Condition | Meaning | Action |
|---|---|---|
| qty_on_hand = 0 | Out of stock | Block purchase |
| qty_on_hand ≤ reorder_level | Low stock | Restock soon |
| qty_on_hand > reorder_level | Healthy stock | No action |
SELECT sku, name, qty_on_hand, reorder_level
FROM Product
WHERE qty_on_hand <= reorder_level
ORDER BY qty_on_hand ASC;
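A sketch of the low-stock rule in action, using two products from the case study: selling one dining table drops qty_on_hand from 2 to 1, which hits the reorder level and flags the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (
        product_id INTEGER PRIMARY KEY AUTOINCREMENT,
        sku VARCHAR(40) NOT NULL UNIQUE,
        name VARCHAR(120) NOT NULL,
        price DECIMAL(10,2) NOT NULL,
        qty_on_hand INTEGER NOT NULL DEFAULT 0,
        reorder_level INTEGER NOT NULL DEFAULT 0
    )
""")
conn.executemany(
    "INSERT INTO Product (sku, name, price, qty_on_hand, reorder_level) "
    "VALUES (?, ?, ?, ?, ?)",
    [("SOFA-1001", "Linen Sofa 3 Seat", 899.00, 5, 2),
     ("TBLE-2001", "Oak Dining Table", 749.00, 2, 1)],
)
# A sale of one table drops stock from 2 to 1, reaching the reorder level.
conn.execute("UPDATE Product SET qty_on_hand = qty_on_hand - 1 "
             "WHERE sku = 'TBLE-2001'")

low = conn.execute(
    "SELECT sku, qty_on_hand, reorder_level FROM Product "
    "WHERE qty_on_hand <= reorder_level"
).fetchall()
print(low)  # [('TBLE-2001', 1, 1)]
```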
SELECT c.name,
COUNT(s.sale_id) AS sale_rows,
SUM(s.qty * s.unit_price) AS total_spent
FROM Customer c
JOIN CustomerProductSales s ON s.customer_id = c.customer_id
GROUP BY c.customer_id
ORDER BY total_spent DESC;
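The JOIN + GROUP BY above can be checked on a tiny dataset. A sketch with two customers and three sale rows (only the columns the query touches are kept): each customer's total is SUM(qty * unit_price) over their sales.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
        name VARCHAR(100) NOT NULL
    );
    CREATE TABLE CustomerProductSales (
        sale_id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER NOT NULL,
        qty INTEGER NOT NULL,
        unit_price DECIMAL(10,2) NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES Customer(customer_id)
    );
    INSERT INTO Customer (name) VALUES ('Ava Kim'), ('Tom Patel');
    INSERT INTO CustomerProductSales (customer_id, qty, unit_price) VALUES
        (1, 1, 899.00),   -- Ava: 899.00
        (1, 2, 89.00),    -- Ava: + 178.00 = 1077.00
        (2, 1, 749.00);   -- Tom: 749.00
""")
totals = conn.execute("""
    SELECT c.name,
           COUNT(s.sale_id) AS sale_rows,
           SUM(s.qty * s.unit_price) AS total_spent
    FROM Customer c
    JOIN CustomerProductSales s ON s.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY total_spent DESC
""").fetchall()
print(totals)  # [('Ava Kim', 2, 1077.0), ('Tom Patel', 1, 749.0)]
```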
UPDATE Product
SET qty_on_hand = qty_on_hand + 10
WHERE sku = 'TBLE-2001';
UPDATE Product
SET price = 799.00
WHERE sku = 'SOFA-1001';
UPDATE CustomerProductSales
SET status = 'FULFILLED'
WHERE sale_id = 1;
DELETE FROM CustomerProductSales
WHERE sale_id = 2
AND status = 'NEW';
| Goal | SQL | Example |
|---|---|---|
| Read data | SELECT | List low stock products |
| Add data | INSERT | Create new sales rows |
| Change data | UPDATE | Restock inventory |
| Remove data | DELETE | Delete a test sale |
-- Guard rails:
-- qty_on_hand >= 0
-- price > 0
-- qty > 0
-- valid status values
-- Next iteration:
-- Add Order and OrderItem tables
-- Group line items under an order_id
-- Stronger constraints + joins
-- Grow the model only when needed.
-- Keep the first version stable.
-- Habit:
-- justify each column with a requirement
Modify requirements and update the ERM and tables
Georg Cantor: Founder of Set Theory
items = []
items.append("chair")
items.append("table")
items.append("chair") # duplicate allowed!
print(items) # ['chair', 'table', 'chair']
s = set()
s.add("chair")
s.add("table")
s.add("chair") # ignored!
print(s) # {'chair', 'table'}
Sets:
A = {1, 2}
B = {2, 3}
A ∪ B = {1, 2, 3}
A ∩ B = {2}
Logic:
P(x): x ∈ A
Q(x): x ∈ B
P(x) ∨ Q(x): true for 1,2,3
P(x) ∧ Q(x): true for 2
¬P(x): true for 3,4
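The correspondence above runs directly in Python: union matches OR, intersection matches AND, and the complement (relative to a universe, here assumed to be {1, 2, 3, 4}) matches NOT.

```python
# Set operations and predicate logic line up exactly.
A = {1, 2}
B = {2, 3}
U = {1, 2, 3, 4}  # universe, needed for the complement

union = A | B                              # P(x) OR  Q(x)
intersection = A & B                       # P(x) AND Q(x)
complement = {x for x in U if x not in A}  # NOT P(x)

print(union, intersection, complement)  # {1, 2, 3} {2} {3, 4}
```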
s = {"chair", "table"}
print(len(s)) # 2
s.add("chair") # ignored
s.add("desk") # added
s.add("chair") # ignored again
print(len(s)) # still 3
print(s) # {'chair', 'table', 'desk'}
CREATE TABLE Furniture (
id INTEGER PRIMARY KEY AUTOINCREMENT,
item VARCHAR(50) UNIQUE
);
INSERT INTO Furniture (item) VALUES ('chair'), ('table');
-- Duplicate item → raises exception
INSERT INTO Furniture (item) VALUES ('chair');
-- Error: UNIQUE constraint failed: Furniture.item
-- Using OR IGNORE to mimic set (duplicates ignored)
INSERT OR IGNORE INTO Furniture (item) VALUES ('chair');
INSERT OR IGNORE INTO Furniture (item) VALUES ('desk');
INSERT OR IGNORE INTO Furniture (item) VALUES ('chair');
SELECT COUNT(*) FROM Furniture; -- 3
Set = {chair, table, lamp} with no duplicates
const furniture = new Set([
"chair",
"table",
"lamp"
]);
Set<String> furniture = new HashSet<>();
furniture.add("chair");
furniture.add("table");
furniture.add("lamp");
furniture = {"chair", "table", "lamp"}
CREATE TABLE Furniture (
item TEXT PRIMARY KEY
);
INSERT INTO Furniture VALUES
('chair'), ('table'), ('lamp');
| Operation | Symbol | Description |
|---|---|---|
| Union | ⋃ | All unique elements |
| Intersection | ∩ | Common elements only |
| Difference | − | In A but not in B |
Ordered sequence of elements
Bicycles relation:
bikeId | brand | type | price
1 | Trek | Mountain | 1200 ← tuple 1
2 | Giant | Road | 1500 ← tuple 2
3 | Cannondale | Hybrid | 1000 ← tuple 3
Each row = one tuple (ordered sequence of elements)
Bikes relation (a set of tuples):
bikeId | brand | type | price
1 | Trek | Mountain | 1200
2 | Giant | Road | 1500
3 | Cannondale | Hybrid | 1000
• Each row = one tuple
• No duplicate rows allowed
• All rows share the same columns (attributes)
Input Bicycles:
bikeId | brand | type | price
1 | Trek | Mountain | 1699
2 | Giant | Road | 1100
3 | Cannondale | Hybrid | 999
After σ(price < 1200):
bikeId | brand | type | price
2 | Giant | Road | 1100
3 | Cannondale | Hybrid | 999
Input:
bikeId | brand | model | price | color
1 | Trek | Roscoe 7 | 1699 | Blue
2 | Giant | Contend | 1299 | Black
3 | Trek | Marlin 5 | 799 | Red
After π(brand, price):
brand | price
Trek | 1699
Giant | 1299
Trek | 799
Available:
bikeId | brand
1 | Trek
2 | Giant
Sold:
bikeId | brand
2 | Giant
4 | Scott
After Union:
bikeId | brand
1 | Trek
2 | Giant
4 | Scott
Stock:
bikeId | brand
1 | Trek
2 | Giant
3 | Cannondale
Sold:
bikeId | brand
2 | Giant
After Difference:
bikeId | brand
1 | Trek
3 | Cannondale
Bikes (2 rows) × Stores (2 rows) = 4 rows
Bikes: Stores:
1 Trek A Berlin
2 Giant B Munich
Result:
bikeId | brand | store | city
1 | Trek | A | Berlin
1 | Trek | B | Munich
2 | Giant | A | Berlin
2 | Giant | B | Munich
Orders: Customers:
orderId | custId | amt custId | name
101 | 5 | 299 5 | Alice
102 | 7 | 450 7 | Bob
After Natural Join:
orderId | custId | amt | name
101 | 5 | 299 | Alice
102 | 7 | 450 | Bob
Enrollments: Required:
student | course course
A | Math Math
A | Physics Physics
B | Math
After Division (students who took ALL required):
student
A
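Division can be sketched with Python sets: a student qualifies only if the set of required courses is a subset of the courses that student took. The data matches the example above.

```python
# Relational division: students enrolled in ALL required courses.
enrollments = {("A", "Math"), ("A", "Physics"), ("B", "Math")}
required = {"Math", "Physics"}

students = {s for s, _ in enrollments}
answer = {s for s in students
          if required <= {c for st, c in enrollments if st == s}}
print(answer)  # {'A'}
```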
Before:
bikeId | brand | price
After ρ(Inv ← Bicycles):
Inv.bikeId | Inv.brand | Inv.price
| Operator | Symbol | Description | Example |
|---|---|---|---|
| Selection | σ | Filter rows | σ(price < 1200)(Bikes) |
| Projection | π | Select columns | π(brand, price)(Bikes) |
| Union | ⋃ | Combine, no duplicates | Avail ⋃ Sold |
| Difference | − | A minus B | Stock − Sold |
| Cartesian Product | × | All pairs | Bikes × Stores |
| Natural Join | ⨝ | Match on common attrs | Orders ⨝ Customers |
| Division | ÷ | Match all values | Students ÷ Required |
| Renaming | ρ | Rename relation/attrs | ρ(Inv)(Bikes) |
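The core operators in the table can be mimicked with Python comprehensions over rows-as-dicts (the Stores rows here are made up so the join has a shared attribute; a true π would also remove duplicate rows, which a list does not):

```python
# Relations as lists of dicts; rows are illustrative.
bikes = [
    {"bikeId": 1, "brand": "Trek",  "price": 1699},
    {"bikeId": 2, "brand": "Giant", "price": 1100},
]
stores = [{"bikeId": 1, "store": "A"}, {"bikeId": 2, "store": "B"}]

selection  = [b for b in bikes if b["price"] < 1200]    # sigma(price < 1200)
projection = [{"brand": b["brand"]} for b in bikes]     # pi(brand)
join = [{**b, **s} for b in bikes for s in stores
        if b["bikeId"] == s["bikeId"]]                  # Bikes join Stores

print(selection)  # [{'bikeId': 2, 'brand': 'Giant', 'price': 1100}]
```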
σ → fewer rows
π → fewer columns
⨝ → fewer comparisons
Relations
↓
Selection
↓
Projection
↓
Join
↓
Result
Before:
π brand (σ price < 500 (Bicycles ⨝ Stores))
After:
π brand ((σ price < 500 (Bicycles)) ⨝ Stores)
Step 1: Bicycles ⨝ Stores
→ large intermediate result
Step 2: σ price < 500
→ many rows discarded late
π brand
Step 1: σ price < 500 (Bicycles)
→ small input
Step 2: filtered Bicycles ⨝ Stores
→ fewer comparisons
π brand
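The two plans can be compared by counting the pairs the join loop touches. A sketch with made-up row counts: both plans produce the same result, but pushing the selection down shrinks the join input.

```python
# Plan comparison: join-then-filter vs filter-then-join.
bicycles = [{"id": i, "price": p}
            for i, p in enumerate([300, 450, 800, 1200, 1500])]
stores = [{"store": s} for s in ["A", "B", "C"]]

# Plan 1: join first (every pair), filter late.
late = [(b, s) for b in bicycles for s in stores]
late = [(b, s) for b, s in late if b["price"] < 500]
pairs_late = len(bicycles) * len(stores)   # 5 x 3 = 15 pairs in the join

# Plan 2: push the selection down, then join.
cheap = [b for b in bicycles if b["price"] < 500]
early = [(b, s) for b in cheap for s in stores]
pairs_early = len(cheap) * len(stores)     # 2 x 3 = 6 pairs in the join

print(pairs_late, pairs_early, late == early)  # 15 6 True
```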
Before:
π brand (π bikeId, brand (Bicycles))
After:
π brand (Bicycles)
Before:
π brand (Bicycles ⨝ Stores ⨝ Suppliers)
After:
π brand (Bicycles ⨝ Stores)
(Bicycles ⨝ Stores) ⨝ Suppliers
Bicycles ⨝ (Stores ⨝ Suppliers)
Query:
σ R.name = 'West' (O ⨝ C ⨝ R)
(O ⨝ C) ⨝ R
O ⨝ C → ~100,000,000,000 rows
O ⨝ (C ⨝ (σ R='West'(R)))
R: 10 → 1
C: 100,000 → 10,000
O ⨝ C → ~1,000,000,000 rows
Before:
σ price < 500 (π brand, price (Bicycles ⨝ Stores))
After:
(π brand, price (σ price < 500 (Bicycles)))
⨝ Stores
σ Price > 10 ∧ Name='Rose' ∧ Location='Local'
(Flowers ⨝ Suppliers)
Flowers ⨝ Suppliers
→ apply filters
1,000 × 500 = 500K
(σ Local (Suppliers))
⨝
(σ Rose ∧ Price > 10 (Flowers))
100 × 50 = 5K
(σ Rose ∧ Red (Flowers))
⨝ Supply
⨝ (σ Local (Suppliers))
π Name (
σ Price > 10 (
Flowers ⨝
(σ Local (Suppliers ⨝
σ Revenue > 1000 (Companies)))
)
)
π Name (
(σ Price > 10 (Flowers))
⨝
(σ Local (Suppliers ⨝
σ Revenue > 1000 (Companies)))
)
Write Operation
↓
[ No Journal ]
↓
Data File
SQLite
↓ write
OS Buffer
↓ (later)
Disk
Query
↓
Temp Tables
↓
RAM (Memory)
Sequence (1…N)
↓
Value assignment
↓
Skew introduced
↓
Large tables
seq = [1, 2, 3, ..., 200000]
INSERT customers
FOR EACH x IN seq:
(x, 'West' or 'Other')
WITH RECURSIVE
seq(x)
↓
x starts at 1
↓
x = x + 1
↓
STOP when x reaches limit
↓
INSERT generated rows
-- Define a recursive sequence
WITH RECURSIVE seq(x) AS (
-- Base case: start at 1
SELECT 1
UNION ALL
-- Recursive step: increment
SELECT x + 1
FROM seq
-- Abort condition
WHERE x < 200000
)
-- Insert generated values
INSERT INTO customers
SELECT
x, -- unique id
CASE -- controlled skew
WHEN x % 100 = 0
THEN 'West'
ELSE 'Other'
END
FROM seq;
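The same pattern runs in SQLite from Python, here scaled down to 1,000 rows so it finishes instantly; every 100th customer lands in 'West', the rest in 'Other'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
conn.execute("""
    WITH RECURSIVE seq(x) AS (
        SELECT 1                          -- base case
        UNION ALL
        SELECT x + 1 FROM seq WHERE x < 1000  -- stop condition
    )
    INSERT INTO customers
    SELECT x,
           CASE WHEN x % 100 = 0 THEN 'West' ELSE 'Other' END
    FROM seq
""")
total = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
west = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE region = 'West'"
).fetchone()[0]
print(total, west)  # 1000 10
```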
What is x in seq(x)?
x is the column name of the recursive table.
seq is a temporary result set.
seq has one column: x.
seq = table, x = column.
WITH RECURSIVE seq(x) AS (...)
seq
----
x
----
1
2
3
...
Time ─────────────────────────────▶
|──────────── real (wall clock) ────────────|
|──────── user (CPU work) ───────|
| sys |
Example:
real = user + sys + waiting
(what the user feels)
Run Time:
real 0.077
user 0.075025
sys 0.000000
SELECT COUNT(*)
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.region = 'West';
σ region='West' (Orders ⨝ Customers)
SELECT *
FROM sales s
JOIN products p ON s.product_id = p.id
WHERE p.category = 'Bike';
σ category='Bike' (Sales ⨝ Products)
SELECT o.id
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN regions r ON c.region_id = r.id
WHERE r.name = 'West';
Orders ⨝ (Customers ⨝ σ name='West' (Regions))
SELECT *
FROM enrollments e
JOIN students s ON e.student_id = s.id
WHERE s.major = 'CS';
Enrollments ⨝ σ major='CS' (Students)
SELECT *
FROM supply sp
JOIN suppliers s ON sp.supplier_id = s.id
JOIN flowers f ON sp.flower_id = f.id
WHERE s.location = 'Local';
Supply ⨝ (σ location='Local' (Suppliers) ⨝ Flowers)
SELECT *
FROM logs l
JOIN users u ON l.user_id = u.id
WHERE u.active = 1;
Logs ⨝ σ active=1 (Users)
SELECT *
FROM orders o
JOIN items i ON o.id = i.order_id
WHERE o.date > '2025-01-01';
σ date>'2025-01-01' (Orders) ⨝ Items
SELECT *
FROM activity a
JOIN conditions c ON a.id = c.activity_id
WHERE c.type = 'once';
Activity ⨝ σ type='once' (Conditions)
SELECT *
FROM payments p
JOIN users u ON p.user_id = u.id
JOIN countries c ON u.country_id = c.id
WHERE c.code = 'US';
Payments ⨝ (Users ⨝ σ code='US' (Countries))
SELECT *
FROM reviews r
JOIN products p ON r.product_id = p.id
WHERE p.rating > 4;
Reviews ⨝ σ rating>4 (Products)
SELECT COUNT(*)
FROM (
SELECT customer_id
FROM customers
WHERE region = 'West'
) c
JOIN orders o ON o.customer_id = c.customer_id;
-- Before
σ region='West' (Orders ⨝ Customers)
-- After
Orders ⨝ (π customer_id (σ region='West' (Customers)))
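The rewrite can be verified on a tiny dataset (rows made up for illustration): the original join-then-filter form and the filter-first form return the same count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'West'), (2, 'East'), (3, 'West');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")
original = conn.execute("""
    SELECT COUNT(*) FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE c.region = 'West'
""").fetchone()[0]
rewritten = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT customer_id FROM customers WHERE region = 'West'
    ) c
    JOIN orders o ON o.customer_id = c.customer_id
""").fetchone()[0]
print(original, rewritten)  # 3 3
```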
SELECT s.sale_id, s.product_id, s.qty
FROM (
SELECT id
FROM products
WHERE category = 'Bike'
) p
JOIN sales s ON s.product_id = p.id;
-- Before
σ category='Bike' (Sales ⨝ Products)
-- After
Sales ⨝ (π id (σ category='Bike' (Products)))
SELECT o.id
FROM (
SELECT id
FROM regions
WHERE name = 'West'
) r
JOIN customers c ON c.region_id = r.id
JOIN orders o ON o.customer_id = c.customer_id;
-- Before
π o.id (σ name='West' (Orders ⨝ Customers ⨝ Regions))
-- After
π o.id (
Orders ⨝
(Customers ⨝ (π id (σ name='West' (Regions))))
)
SELECT e.student_id, e.course_id
FROM enrollments e
JOIN (
SELECT id
FROM students
WHERE major = 'CS'
) s ON e.student_id = s.id;
-- Before
σ major='CS' (Enrollments ⨝ Students)
-- After
Enrollments ⨝ (π id (σ major='CS' (Students)))
SELECT sp.supplier_id, sp.flower_id
FROM (
SELECT id
FROM suppliers
WHERE location = 'Local'
) s
JOIN supply sp ON sp.supplier_id = s.id
JOIN flowers f ON sp.flower_id = f.id;
-- Before
σ location='Local' (Supply ⨝ Suppliers ⨝ Flowers)
-- After
((Supply ⨝ (π id (σ location='Local' (Suppliers)))) ⨝ Flowers)
SELECT l.log_id, l.user_id, l.action
FROM logs l
JOIN (
SELECT id
FROM users
WHERE active = 1
) u ON l.user_id = u.id;
-- Before
σ active=1 (Logs ⨝ Users)
-- After
Logs ⨝ (π id (σ active=1 (Users)))
SELECT o.id, i.item_id, i.qty
FROM (
SELECT id
FROM orders
WHERE date > '2025-01-01'
) o
JOIN items i ON o.id = i.order_id;
-- Before
σ date>'2025-01-01' (Orders ⨝ Items)
-- After
((π id (σ date>'2025-01-01' (Orders))) ⨝ Items)
SELECT a.id, a.title, c.type
FROM activity a
JOIN (
SELECT activity_id, type
FROM conditions
WHERE type = 'once'
) c ON a.id = c.activity_id;
-- Before
σ type='once' (Activity ⨝ Conditions)
-- After
Activity ⨝ (π activity_id, type (σ type='once' (Conditions)))
SELECT p.payment_id, p.amount
FROM (
SELECT id
FROM countries
WHERE code = 'US'
) c
JOIN users u ON u.country_id = c.id
JOIN payments p ON p.user_id = u.id;
-- Before
π payment_id, amount (σ code='US' (Payments ⨝ Users ⨝ Countries))
-- After
π payment_id, amount (
Payments ⨝
(Users ⨝ (π id (σ code='US' (Countries))))
)
SELECT r.review_id, r.product_id, r.stars
FROM reviews r
JOIN (
SELECT id
FROM products
WHERE rating > 4
) p ON r.product_id = p.id;
-- Before
σ rating>4 (Reviews ⨝ Products)
-- After
Reviews ⨝ (π id (σ rating>4 (Products)))
┌──────────────────────┐
│ Admin Interface │
│ (CLI · GUI · Tools) │
└──────────┬───────────┘
│
│
┌──────────────────────┐ │ ┌──────────────────────┐
│ Application Code │────┼────│ Data Tools / ETL │
│ (Web · Mobile · API) │ │ │ (Import · Export) │
└──────────┬───────────┘ │ └──────────┬───────────┘
│ │ │
│ ▼ │
│ ┌────────────────┐ │
└───────▶ DBMS ◀─────┘
│ (SQLite / PG) │
│ parse · exec │
│ enforce rules │
└────────────────┘
▲
│
┌──────────┴───────────┐
│ Reporting / BI │
│ (Queries · Views) │
└──────────────────────┘
┌──────────────────────┐
│ User │
│ (DBA · Developer) │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Terminal │
│ (CLI · Shell) │
│ SQL · scripts │
└──────────┬───────────┘
│
│ SQL / Batch Jobs
▼
┌──────────────────────┐
│ Database │
│ (Tables · Rules) │
│ constraints · ids │
└──────────────────────┘
┌──────────────────────┐
│ User │
│ (Clicks · Forms) │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ GUI │
│ (Desktop · Web UI) │
│ buttons · tables │
└──────────┬───────────┘
│
│ SQL / Queries
▼
┌──────────────────────┐
│ Database │
│ (Tables · Rules) │
│ constraints · ids │
└──────────────────────┘
┌──────────────────────┐
│ User │
│ (Browser · Client) │
└──────────┬───────────┘
│ Internet (HTTP / HTTPS)
▼
┌──────────────────────┐
│ Frontend │
│ (Web UI · Forms) │
│ HTML · CSS · JS │
└──────────┬───────────┘
│ API Calls
▼
┌──────────────────────┐
│ Backend │
│ (Logic · Auth) │
│ REST · Validation │
└──────────┬───────────┘
│ SQL
▼
┌──────────────────────┐
│ Database │
│ (Tables · Rules) │
│ constraints · ids │
└──────────────────────┘
┌──────────────────────┐ ┌──────────────────────┐
│ Users │ │ Cloud Platform │
│ (Browser · Mobile) │ │ (Network · Policy) │
└──────────┬───────────┘ └──────────┬───────────┘
│ │
│ Internet (HTTPS) │ Managed Services
▼ ▼
┌──────────────────────┐ ┌──────────────────────┐
│ Cloud App Services │ │ Cloud Database │
│ (Frontend + Backend) │ │ (Managed RDBMS) │
│ APIs · Auth · Logic │ │ backups · replicas │
└──────────────────────┘ └──────────────────────┘
CREATE TABLE Orders (OrderID INT PRIMARY KEY, CustomerID INT FOREIGN KEY REFERENCES Customers(CustomerID)); -- A
CREATE TABLE Orders (OrderID INT, CustomerID INT, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)); -- B
CREATE TABLE Orders (OrderID INT, CustomerID INT, FOREIGN KEY REFERENCES Customers(CustomerID)); -- C
CREATE TABLE Orders (OrderID INT PRIMARY KEY, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID)) -- D
CREATE TABLE Table (ID INT PRIMARY KEY, Name VARCHAR(50));
CREATE TABLE Orders (OrderID INT, CustomerID INT, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID))
CREATE TABLE Orders (OrderID INT, CustomerID INT, FOREIGN KEY REFERENCES Customers(CustomerID));
CREATE TABLE Orders (OrderID INT, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID), CustomerID INT);
CREATE TABLE Orders (OrderID INT, CustomerID INT, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID));
┌──────────────────────┐
│ Admin Interface │
│ (CLI · GUI · Tools) │
└──────────┬───────────┘
│
│
┌──────────────────────┐ │ ┌──────────────────────┐
│ Application Code │────┼────│ Data Tools / ETL │
│ (Web · Mobile · API) │ │ │ (Import · Export) │
└──────────┬───────────┘ │ └──────────┬───────────┘
│ │ │
│ ▼ │
│ ┌────────────────┐ │
└───────▶ DBMS ◀─────┘
│ (SQLite / PG) │
│ parse · exec │
│ enforce rules │
└────────────────┘
▲
│
┌──────────┴───────────┐
│ Reporting / BI │
│ (Queries · Views) │
└──────────────────────┘
Time
↓
GUI tools
↓
CLI automation
↓
APIs for apps
↓
Web dashboards
| Interface | Role | Purpose | Technology |
|---|---|---|---|
| Admin Interface (GUI) | Database Administrators | User management, schema modifications, monitoring | SQLite Browser |
| Command-Line Interface (CLI) | Developers and DBAs | Direct SQL execution, automation, scripting | SQLite CLI |
| Application Programming Interface (API) | Developers | Programmatic database interaction, integrating apps | Neon Tech |
| Web-Based UI | Regular Users | Accessing, viewing, and interacting with data in a user-friendly way | - |
App
↓ (driver)
Database
↓
SQL results
Driver examples:
psycopg2 (Python)
pg (Node)
JDBC (Java)
postgresql://user:password@host:port/database?sslmode=require
Risk increases when DB is exposed
Prefer:
App → Backend → DB
over:
App → DB
import psycopg2
conn = psycopg2.connect("postgresql://user:pass@host/db?sslmode=require")
cur = conn.cursor()
cur.execute("SELECT version();")
print(cur.fetchone())
Keep credentials out of code
Least privilege roles
Rotate secrets
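"Keep credentials out of code" can be sketched with environment variables. The variable names (DB_USER, DB_PASS, and so on) are illustrative, not a standard:

```python
# Sketch: assemble the connection string from the environment instead of
# hard-coding credentials in source code.
import os

def build_dsn() -> str:
    user = os.environ["DB_USER"]
    password = os.environ["DB_PASS"]
    host = os.environ.get("DB_HOST", "localhost")
    port = os.environ.get("DB_PORT", "5432")
    db = os.environ["DB_NAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{db}?sslmode=require"

# psycopg2.connect(build_dsn()) then never sees literal secrets in the repo.
```

The same idea works with a secrets manager; the point is that rotating a password no longer requires a code change.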
User interface
Forms · Buttons · Lists
Application logic
Validation · APIs
Data storage
Tables · Constraints
git clone https://github.com/SE4CPS/2026-COMP-163
cd 2026-COMP-163
| Interface | Role | Purpose | Technology |
|---|---|---|---|
| Admin Interface (GUI) | Database Administrators | User management, schema modifications, monitoring | SQLite Browser |
| Command-Line Interface (CLI) | Developers and DBAs | Direct SQL execution, automation, scripting | SQLite CLI |
| Application Programming Interface (API) | Developers | Programmatic database interaction, integrating apps | Neon Tech |
| Web-Based UI | Regular Users | Accessing, viewing, and interacting with data in a user-friendly way | - |
Admin → database ownership tasks
Dev → build apps on top
User → consume data safely
CLI · GUI · Tools
backups · users · schema · monitoring
┌──────────────────────┐
│ Admin Interface │
│ (CLI · GUI · Tools) │
└──────────┬───────────┘
│
│
┌──────────────────────┐ │ ┌──────────────────────┐
│ Application Code │────┼────│ Data Tools / ETL │
│ (Web · Mobile · API) │ │ │ (Import · Export) │
└──────────┬───────────┘ │ └──────────┬───────────┘
│ │ │
│ ▼ │
│ ┌────────────────┐ │
└───────▶ DBMS ◀─────┘
│ (SQLite / PG) │
│ parse · exec │
│ enforce rules │
└────────────────┘
▲
│
┌──────────┴───────────┐
│ Reporting / BI │
│ (Queries · Views) │
└──────────────────────┘
Traditional rule: treat admin actions like production change control.
Admin Interface = "How we control the DB"
Not:
- how users browse data
- how apps call SQL
Yes:
- how owners keep it stable
| Area | Typical Tasks | Why it matters | Common Tools |
|---|---|---|---|
| Identity | Users, roles, grants, password rotation | Least privilege | psql, pgAdmin, cloud console |
| Reliability | Backups, restore tests, replication checks | Recovery time | pg_dump, snapshots |
| Schema | Migrations, constraints, indexes, data fixes | Correctness | migration tools, SQL |
| Observability | Slow queries, locks, storage growth, alerts | Predictability | metrics, logs, dashboards |
psql
pg_dump / pg_restore
SQL scripts
cron / CI jobs
pgAdmin
DBeaver
SQLite Browser
Cloud consoles
┌──────────────┐ grants ┌──────────────┐
│ Admin │──────────────────▶│ Role │
│ (owner) │ │ (policy) │
└──────┬───────┘ └──────┬───────┘
│ │
│ assigns │ used by
▼ ▼
┌──────────────┐ ┌──────────────┐
│ User │ │ App │
│ (human) │ │ (service) │
└──────┬───────┘ └──────┬───────┘
│ │
└────────────── SQL ───────────────┘
▼
┌──────────────┐
│ DBMS │
│ checks rules │
└──────────────┘
Traditional rule: practice restore like a fire drill.
┌──────────────┐ pg_dump ┌──────────────┐
│ Primary │─────────────────▶│ Backup │
│ DBMS │ │ Store │
└──────┬───────┘ └──────┬───────┘
│ │
│ restore test │ retention
▼ ▼
┌──────────────┐ pg_restore ┌──────────────┐
│ Staging DB │◀────────────────│ Policies │
│ (verification) │ (30d / 90d) │
└──────┬───────┘ └──────────────┘
│
▼
confidence report
| Step | Action | Admin goal |
|---|---|---|
| 1 | Plan change | Know impact |
| 2 | Run in staging | Catch issues early |
| 3 | Deploy during window | Minimize disruption |
| 4 | Validate with queries | Confirm correctness |
| 5 | Record + audit | Traceability |
Signals → Actions
Slow queries → optimize SQL / add index
Locks → fix transactions
Disk growth → retention / cleanup
Auth failures → rotate secrets / investigate
Backup failures → stop and repair immediately
┌──────────────────────┐
│ Users │
└──────────┬───────────┘
│ HTTPS
▼
┌──────────────────────┐
│ Frontend │
└──────────┬───────────┘
│ REST
▼
┌──────────────────────┐
│ Backend │
└──────────┬───────────┘
│ SQL
▼
┌──────────────────────┐
│ DBMS │
└──────────┬───────────┘
▲
│ privileged (admin)
┌──────────┴───────────┐
│ Admin Interface │
│ (CLI · GUI · Tools) │
└──────────────────────┘
| Task | Admin? | Why |
|---|---|---|
| Creating a backup | Yes | Reliability ownership |
| Managing user permissions | Yes | Policy and access control |
| Writing a new API endpoint | No | Application layer work |
| Building a dashboard view | No | User or BI interface work |
Admin work:
- keeps system stable
- changes ownership settings
- restores service when broken
If it changes policy, backups, or schema:
it is admin.
Download DB Browser for SQLite
┌──────────────┐ open .db ┌──────────────────────┐
│ Student │ ──────────────▶ │ DB Browser (GUI) │
│ (local file) │ │ tables · schema · SQL│
└──────────────┘ └──────────┬───────────┘
│
▼
┌──────────────┐
│ SQLite DB │
│ tables · rows│
└──────────────┘
Is this an Admin Task, Developer Task, or User Task?
Is this an Admin Task, Developer Task, or User Task?
Is this an Admin Task, Developer Task, or User Task?
Is this an Admin Task, Developer Task, or User Task?
Is this an Admin Task, Developer Task, or User Task?
Traditional flow: learn SQLite first, then move to PostgreSQL.
Jump to PostgreSQL section
Backend
↓ (SSL)
PostgreSQL
Frontend
↓ HTTP
Backend (REST)
↓ SQL
Database
# assumes: from flask import Flask, render_template
# assumes a get_db_connection() helper is defined elsewhere in the app
@app.route('/flowers')
def manage_flowers():
    conn = get_db_connection()
    cur = conn.cursor()
    cur.execute("SELECT * FROM flowers")
    flowers = cur.fetchall()
    cur.close()
    conn.close()
    return render_template('flowers.html', flowers=flowers)
Is this a Backend Database Developer Task?
Is this a Backend Database Developer Task?
Is this a Backend Database Developer Task?
Is fetching data from an API a frontend or backend task?
Who is responsible for validating user input before storing it in the database?
Does the frontend directly modify the database?
We want to keep track of the flower inventory.
Start on paper before using any tool.
@startuml
entity FlowerEntity {
* id: int
--
attribute_one: text
attribute_two: text
attribute_three: number
}
@enduml
Constraints protect the data, not the code.
Decide at the database level first.
# first time only
git clone https://github.com/SE4CPS/2026-COMP-163.git
# every class / work session
cd 2026-COMP-163
git pull
Which component is responsible for ensuring data validation and access control?
Which component plays a key role in optimizing system performance and preventing security breaches?
Which component is essential for managing user requests, processing data, and ensuring secure communication?
Which component typically handles REST API requests and responses?
Which component is primarily responsible for handling database connection strings?
CREATE TABLE Book (
book_id SERIAL PRIMARY KEY,
uuid UUID DEFAULT gen_random_uuid(),
electronic_article_number VARCHAR(50) UNIQUE NOT NULL,
isbn VARCHAR(512) UNIQUE NOT NULL,
title VARCHAR(200) NOT NULL,
author VARCHAR(100) NOT NULL,
price DECIMAL(10,2) NOT NULL,
stock INT NOT NULL
);
UUID (Universally Unique Identifier) is a 128-bit unique identifier
Example: 550e8400-e29b-41d4-a716-446655440000
-- gen_random_uuid() is built into PostgreSQL 13+ (earlier versions need the pgcrypto extension); MySQL (8.0+) and MariaDB use UUID() instead
CREATE TABLE Users (
user_id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
name VARCHAR(100) NOT NULL
);
import uuid
# Generate a new UUID
user_id = uuid.uuid4()
print("Generated UUID:", user_id)
import { v4 as uuidv4 } from 'uuid';
// Generate a new UUID
const userId = uuidv4();
console.log("Generated UUID:", userId);
// Generate a UUID using the browser's crypto API
function generateUUID() {
  return crypto.randomUUID();
}
console.log("Generated UUID:", generateUUID());
SELECT * FROM flowers;
SELECT * FROM flowers ORDER BY price DESC LIMIT 1;
SELECT * FROM customers;
SELECT orders.order_id, customers.name, orders.order_date, orders.total
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id;
SELECT DISTINCT customers.name, customers.email
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id;
SELECT COUNT(*) FROM orders;
SELECT SUM(total) FROM orders;
SELECT * FROM order_details WHERE order_id = 1;
SELECT * FROM flowers WHERE stock > 0;
SELECT * FROM orders WHERE order_date > '2024-01-01';
SELECT * FROM flowers WHERE price > 10.00;
SELECT * FROM customers WHERE phone IS NOT NULL;
SELECT * FROM orders WHERE total > 50.00;
SELECT * FROM customers WHERE email LIKE '%@gmail.com';
SELECT * FROM orders WHERE DATE(order_date) = '2024-02-01';
SELECT * FROM flowers WHERE color = 'Red';
SELECT * FROM flowers WHERE LOWER(color) = 'red';
SELECT * FROM customers WHERE address IS NULL;
aggregates · grouping · ordering · limiting · analysis patterns
SELECT
columns, aggregates
FROM
table
WHERE
row filters
GROUP BY
groups
HAVING
group filters
ORDER BY
sort
LIMIT / OFFSET
page
Traditional habit:
Read the query in order:
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
-- For Module 5 practice (single table focus)
-- flowers(flower_id, name, color, price, stock, category, supplier)
-- Example categories: Rose, Tulip, Lily
-- Example suppliers: BloomCo, PetalWorks, GardenDirect
ORDER BY price DESC, name ASC
First:
highest price
Then:
alphabetical name
| flower_id | name | color | price | stock | category | supplier |
|---|---|---|---|---|---|---|
| 1 | Rose | Red | 12.50 | 30 | Rose | BloomCo |
| 2 | Tulip | Yellow | 6.00 | 80 | Tulip | GardenDirect |
| 3 | Lily | White | 9.25 | 15 | Lily | PetalWorks |
| 4 | Orchid | Purple | 19.99 | 6 | Orchid | BloomCo |
| 5 | Daisy | White | 3.50 | 120 | Daisy | GardenDirect |
SELECT flower_id, name, price
FROM flowers
ORDER BY price DESC, name ASC;
In the logical order of a query, ORDER BY is applied:
ORDER BY + LIMIT
= Top N pattern
LIMIT without ORDER BY
= arbitrary subset (no guaranteed order)
| flower_id | name | price | stock | category |
|---|---|---|---|---|
| 1 | Rose | 12.50 | 30 | Rose |
| 2 | Tulip | 6.00 | 80 | Tulip |
| 3 | Lily | 9.25 | 15 | Lily |
| 4 | Orchid | 19.99 | 6 | Orchid |
| 5 | Daisy | 3.50 | 120 | Daisy |
SELECT name, price
FROM flowers
ORDER BY price DESC
LIMIT 3;
Pick the best explanation:
COUNT(*)
= total rows
COUNT(stock)
= rows where stock is NOT NULL
| flower_id | name | stock | supplier |
|---|---|---|---|
| 1 | Rose | 30 | BloomCo |
| 2 | Tulip | 80 | GardenDirect |
| 3 | Lily | 15 | PetalWorks |
| 4 | Orchid | 6 | BloomCo |
| 5 | Daisy | 120 | GardenDirect |
SELECT COUNT(*) AS flower_count
FROM flowers;
Choose the correct statement:
MIN(price) = cheapest flower
MIN(stock) = lowest stock level
| name | price | stock |
|---|---|---|
| Rose | 12.50 | 30 |
| Tulip | 6.00 | 80 |
| Lily | 9.25 | 15 |
| Orchid | 19.99 | 6 |
| Daisy | 3.50 | 120 |
SELECT MIN(price) AS cheapest_price
FROM flowers;
Pick the best answer:
MAX(price) = highest price value
ORDER BY price DESC LIMIT 1
= the row with highest price
| name | price | stock |
|---|---|---|
| Rose | 12.50 | 30 |
| Tulip | 6.00 | 80 |
| Lily | 9.25 | 15 |
| Orchid | 19.99 | 6 |
| Daisy | 3.50 | 120 |
SELECT MAX(price) AS highest_price
FROM flowers;
You want the full flower record with the highest price.
SUM(stock) = total units on hand
SUM(price * stock) = inventory value estimate
| name | price | stock |
|---|---|---|
| Rose | 12.50 | 30 |
| Tulip | 6.00 | 80 |
| Lily | 9.25 | 15 |
| Orchid | 19.99 | 6 |
| Daisy | 3.50 | 120 |
SELECT SUM(stock) AS total_units
FROM flowers;
Pick the most fitting use case:
AVG(price) = typical price
In PostgreSQL:
ROUND(AVG(price), 2)
| name | price | category |
|---|---|---|
| Rose | 12.50 | Rose |
| Tulip | 6.00 | Tulip |
| Lily | 9.25 | Lily |
| Orchid | 19.99 | Orchid |
| Daisy | 3.50 | Daisy |
SELECT ROUND(AVG(price), 2) AS avg_price
FROM flowers;
What does AVG compute?
GROUP BY category
Each category becomes one output row:
- COUNT(*)
- AVG(price)
- SUM(stock)
| name | category | price | stock |
|---|---|---|---|
| Rose | Rose | 12.50 | 30 |
| Rose (Mini) | Rose | 8.75 | 25 |
| Tulip | Tulip | 6.00 | 80 |
| Lily | Lily | 9.25 | 15 |
| Orchid | Orchid | 19.99 | 6 |
SELECT
category,
COUNT(*) AS items,
ROUND(AVG(price), 2) AS avg_price,
SUM(stock) AS total_stock
FROM flowers
GROUP BY category
ORDER BY items DESC;
When you GROUP BY category, each result row represents:
WHERE: individual rows
HAVING: grouped results
If you need COUNT(*) in the filter:
use HAVING
| name | category | price |
|---|---|---|
| Rose | Rose | 12.50 |
| Rose (Mini) | Rose | 8.75 |
| Rose (Garden) | Rose | 10.00 |
| Tulip | Tulip | 6.00 |
| Orchid | Orchid | 19.99 |
SELECT
category,
COUNT(*) AS items
FROM flowers
GROUP BY category
HAVING COUNT(*) >= 2
ORDER BY items DESC;
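The WHERE-vs-HAVING distinction can be run directly. A sketch using Python's sqlite3 with the sample rows from the table above:

```python
# Sketch: HAVING filters grouped results (here, categories with >= 2 items).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flowers (name TEXT, category TEXT, price REAL);
INSERT INTO flowers VALUES
 ('Rose', 'Rose', 12.50), ('Rose (Mini)', 'Rose', 8.75),
 ('Rose (Garden)', 'Rose', 10.00), ('Tulip', 'Tulip', 6.00),
 ('Orchid', 'Orchid', 19.99);
""")

rows = conn.execute("""
    SELECT category, COUNT(*) AS items
    FROM flowers
    GROUP BY category
    HAVING COUNT(*) >= 2
    ORDER BY items DESC;
""").fetchall()

print(rows)  # only the category with multiple rows survives
```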
You want categories where COUNT(*) is at least 2. Which clause?
DISTINCT applies to SELECT output
COUNT(DISTINCT x)
= number of unique values of x
| name | supplier | category |
|---|---|---|
| Rose | BloomCo | Rose |
| Orchid | BloomCo | Orchid |
| Tulip | GardenDirect | Tulip |
| Daisy | GardenDirect | Daisy |
| Lily | PetalWorks | Lily |
SELECT COUNT(DISTINCT supplier) AS supplier_count
FROM flowers;
What is being counted?
ORDER BY category ASC, price DESC
First group by category (visually),
then show expensive items at top
| name | category | price |
|---|---|---|
| Rose | Rose | 12.50 |
| Rose (Garden) | Rose | 12.50 |
| Tulip | Tulip | 6.00 |
| Tulip (Premium) | Tulip | 6.00 |
| Orchid | Orchid | 19.99 |
SELECT name, category, price
FROM flowers
ORDER BY category ASC, price DESC;
Choose the best reason:
Example goal:
Average price of items that are in stock
WHERE stock > 0
then AVG(price)
| name | price | stock | category |
|---|---|---|---|
| Rose | 12.50 | 30 | Rose |
| Tulip | 6.00 | 0 | Tulip |
| Lily | 9.25 | 15 | Lily |
| Orchid | 19.99 | 0 | Orchid |
| Daisy | 3.50 | 120 | Daisy |
SELECT ROUND(AVG(price), 2) AS avg_in_stock_price
FROM flowers
WHERE stock > 0;
WHERE changes the aggregate results because it:
Pattern:
GROUP BY supplier
HAVING SUM(stock) < threshold
Find weak suppliers
| name | supplier | stock |
|---|---|---|
| Rose | BloomCo | 30 |
| Orchid | BloomCo | 6 |
| Tulip | GardenDirect | 80 |
| Daisy | GardenDirect | 120 |
| Lily | PetalWorks | 15 |
SELECT
supplier,
SUM(stock) AS total_stock
FROM flowers
GROUP BY supplier
HAVING SUM(stock) < 25
ORDER BY total_stock ASC;
When does HAVING filter happen?
Example:
Top 2 categories by total stock
GROUP BY category
ORDER BY SUM(stock) DESC
LIMIT 2
| category | name | stock |
|---|---|---|
| Rose | Rose | 30 |
| Rose | Rose (Mini) | 25 |
| Tulip | Tulip | 80 |
| Daisy | Daisy | 120 |
| Lily | Lily | 15 |
SELECT
category,
SUM(stock) AS total_stock
FROM flowers
GROUP BY category
ORDER BY total_stock DESC
LIMIT 2;
Pick the most correct query pattern:
COALESCE(stock, 0)
If stock is NULL:
treat it as 0 for reporting
| name | stock | supplier |
|---|---|---|
| Rose | 30 | BloomCo |
| Tulip | 80 | GardenDirect |
| Lily | 15 | PetalWorks |
| Orchid | NULL | BloomCo |
| Daisy | 120 | GardenDirect |
SELECT SUM(COALESCE(stock, 0)) AS total_units
FROM flowers;
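A runnable sketch of the NULL handling, using sqlite3 and the sample rows above. Note that SUM already skips NULLs on its own; COALESCE makes the "treat missing as 0" intent explicit and matters when substituting per row or when every value is NULL:

```python
# Sketch: COALESCE(stock, 0) turns the NULL stock into 0 before summing.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flowers (name TEXT, stock INT);
INSERT INTO flowers VALUES
 ('Rose', 30), ('Tulip', 80), ('Lily', 15), ('Orchid', NULL), ('Daisy', 120);
""")

total = conn.execute(
    "SELECT SUM(COALESCE(stock, 0)) FROM flowers;").fetchone()[0]
print(total)  # 30 + 80 + 15 + 0 + 120
```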
Choose the correct interpretation:
CASE
WHEN price >= 10 THEN 'Premium'
ELSE 'Standard'
END
| name | price |
|---|---|
| Rose | 12.50 |
| Tulip | 6.00 |
| Lily | 9.25 |
| Orchid | 19.99 |
| Daisy | 3.50 |
SELECT
CASE
WHEN price >= 10 THEN 'Premium'
ELSE 'Standard'
END AS tier,
COUNT(*) AS items
FROM flowers
GROUP BY tier
ORDER BY items DESC;
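The CASE bucketing can be tried with sqlite3 and the same sample prices:

```python
# Sketch: CASE builds a derived column ('Premium'/'Standard'), then GROUP BY
# counts rows per bucket.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flowers (name TEXT, price REAL);
INSERT INTO flowers VALUES
 ('Rose', 12.50), ('Tulip', 6.00), ('Lily', 9.25),
 ('Orchid', 19.99), ('Daisy', 3.50);
""")

rows = conn.execute("""
    SELECT CASE WHEN price >= 10 THEN 'Premium' ELSE 'Standard' END AS tier,
           COUNT(*) AS items
    FROM flowers
    GROUP BY tier
    ORDER BY items DESC;
""").fetchall()

print(rows)  # two buckets: 3 Standard, 2 Premium
```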
Pick the best description:
Page size = 5
Page 1: LIMIT 5 OFFSET 0
Page 2: LIMIT 5 OFFSET 5
| name | price |
|---|---|
| Orchid | 19.99 |
| Rose | 12.50 |
| Lily | 9.25 |
| Tulip | 6.00 |
| Daisy | 3.50 |
SELECT name, price
FROM flowers
ORDER BY price DESC
LIMIT 2 OFFSET 2;
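The paging arithmetic (OFFSET = page size × page number) can be verified with sqlite3 on the rows above:

```python
# Sketch: LIMIT 2 OFFSET 2 skips the first two rows of the sorted result,
# i.e. "page 2" with a page size of 2.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flowers (name TEXT, price REAL);
INSERT INTO flowers VALUES
 ('Orchid', 19.99), ('Rose', 12.50), ('Lily', 9.25),
 ('Tulip', 6.00), ('Daisy', 3.50);
""")

page = conn.execute("""
    SELECT name FROM flowers
    ORDER BY price DESC
    LIMIT 2 OFFSET 2;
""").fetchall()

print(page)  # rows 3-4 of the sorted list
```

Without the ORDER BY, the "page" would be an arbitrary subset, which is why pagination always pairs LIMIT/OFFSET with a sort.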
Choose the correct use:
Example UUID:
550e8400-e29b-41d4-a716-446655440000
Same format everywhere
No central coordination needed
SERIAL:
local, incremental
predictable
UUID:
global, random
safer for APIs
-- UUID column type
id UUID
-- Valid in PRIMARY KEY
PRIMARY KEY (id)
-- Enable the pgcrypto extension once (PostgreSQL 13+ has gen_random_uuid() built in)
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
-- Generate random UUID
SELECT gen_random_uuid();
CREATE TABLE orders (
order_id UUID
DEFAULT gen_random_uuid(),
customer_name TEXT NOT NULL,
created_at TIMESTAMP NOT NULL
DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (order_id)
);
-- Insert without specifying the UUID; the DEFAULT generates it
INSERT INTO orders (customer_name)
VALUES ('Alice');
SELECT category, COUNT(*) AS items
FROM flowers
GROUP BY category
HAVING COUNT(*) >= 3
ORDER BY items DESC
LIMIT 5;
You want to filter rows where stock > 0 before computing SUM(stock).
SELECT COUNT(DISTINCT category)
FROM flowers;
SELECT category, price, COUNT(*)
FROM flowers
GROUP BY category;
What is wrong?
Goal: return the row for the cheapest flower with stock > 0
SELECT
COUNT(*),
MIN(price),
MAX(price),
ROUND(AVG(price),2)
FROM flowers;
| name | price |
|---|---|
| Rose | 12.50 |
| Tulip | 6.00 |
| Lily | 9.25 |
| Orchid | 19.99 |
| Daisy | 3.50 |
SELECT
COUNT(*) AS items,
MIN(price) AS min_price,
MAX(price) AS max_price,
ROUND(AVG(price), 2) AS avg_price
FROM flowers;
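The one-row summary can be executed against the sample prices with sqlite3 (whose ROUND behaves like PostgreSQL's here):

```python
# Sketch: a single row of summary statistics over the whole table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flowers (name TEXT, price REAL);
INSERT INTO flowers VALUES
 ('Rose', 12.50), ('Tulip', 6.00), ('Lily', 9.25),
 ('Orchid', 19.99), ('Daisy', 3.50);
""")

row = conn.execute("""
    SELECT COUNT(*), MIN(price), MAX(price), ROUND(AVG(price), 2)
    FROM flowers;
""").fetchone()

print(row)  # (items, min, max, rounded average)
```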
SELECT COUNT(*), MIN(price), MAX(price)
FROM flowers;
Find categories with avg price >= 10
GROUP BY category
HAVING AVG(price) >= 10
| name | category | price |
|---|---|---|
| Rose | Rose | 12.50 |
| Rose (Mini) | Rose | 8.75 |
| Orchid | Orchid | 19.99 |
| Tulip | Tulip | 6.00 |
| Lily | Lily | 9.25 |
SELECT
category,
ROUND(AVG(price), 2) AS avg_price
FROM flowers
GROUP BY category
HAVING AVG(price) >= 10
ORDER BY avg_price DESC;
Which level is filtered?
GROUP BY category, supplier
One output row per (category, supplier) pair
| name | category | supplier | stock |
|---|---|---|---|
| Rose | Rose | BloomCo | 30 |
| Rose (Mini) | Rose | GardenDirect | 25 |
| Tulip | Tulip | GardenDirect | 80 |
| Lily | Lily | PetalWorks | 15 |
| Orchid | Orchid | BloomCo | 6 |
SELECT
category,
supplier,
SUM(stock) AS total_stock
FROM flowers
GROUP BY category, supplier
ORDER BY category ASC, total_stock DESC;
Each result row represents:
Find categories where total stock < 40
GROUP BY category
HAVING SUM(stock) < 40
| category | name | stock |
|---|---|---|
| Rose | Rose | 30 |
| Rose | Rose (Mini) | 25 |
| Orchid | Orchid | 6 |
| Lily | Lily | 15 |
| Tulip | Tulip | 80 |
SELECT
category,
SUM(stock) AS total_stock
FROM flowers
GROUP BY category
HAVING SUM(stock) < 40
ORDER BY total_stock ASC;
You already grouped and computed SUM(stock). Now you filter by that sum using:
ORDER BY AVG(price) DESC
Sort groups by their computed average
| category | prices |
|---|---|
| Rose | 12.50, 8.75, 10.00 |
| Orchid | 19.99 |
| Tulip | 6.00 |
| Lily | 9.25 |
SELECT
category,
ROUND(AVG(price), 2) AS avg_price
FROM flowers
GROUP BY category
ORDER BY AVG(price) DESC;
This ordering is based on:
Pick the clause that defines the buckets:
Pick the clause that filters aggregated results:
Select the correct query:
COUNT(*) produces:
Choose the best answer:
Practice prompts (write the SQL):
-- PostgreSQL: one table for advanced query practice
CREATE TABLE book_inventory (
book_id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
author TEXT NOT NULL,
genre TEXT NOT NULL,
publisher TEXT NOT NULL,
price NUMERIC(10,2) NOT NULL,
stock INT NOT NULL,
published_year INT NOT NULL
);
-- Seed data idea (students can add more rows)
-- genre: 'Databases', 'Software Engineering', 'AI', 'Security'
-- publisher: 'Prentice Hall', 'OReilly', 'Pearson', 'MIT Press'
-- Typical questions:
-- 1) Show each book with publisher name
-- 2) Show each book with author names
-- 3) Find missing relationships (orphans)
-- That is why JOIN exists.
ERD relationship
↓ (implement)
PK/FK columns
↓ (query)
JOIN ON keys
-- PostgreSQL example question:
-- Show each book with its author name
-- Need:
-- books + authors + relationship (book_authors)
CREATE TABLE authors (
author_id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
country TEXT
);
CREATE TABLE publishers (
publisher_id SERIAL PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE books (
book_id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
publisher_id INT REFERENCES publishers(publisher_id),
published_year INT NOT NULL,
price NUMERIC(10,2) NOT NULL
);
CREATE TABLE book_authors (
book_id INT REFERENCES books(book_id),
author_id INT REFERENCES authors(author_id),
PRIMARY KEY (book_id, author_id)
);
| author_id | name | country |
|---|---|---|
| 1 | Ada Lovelace | UK |
| 2 | Alan Turing | UK |
| 3 | Grace Hopper | USA |
| 4 | Edsger Dijkstra | Netherlands |
| book_id | title | publisher_id | published_year | price |
|---|---|---|---|---|
| 10 | Foundations of Computing | 100 | 2020 | 59.00 |
| 11 | Practical SQL Systems | 101 | 2019 | 49.50 |
| 12 | Algorithms the Classic Way | 100 | 2018 | 72.25 |
| 13 | Systems Thinking | NULL | 2021 | 39.99 |
| publisher_id | name |
|---|---|
| 100 | Classic Press |
| 101 | Data House |
| 102 | Future Books |
| book_id | author_id |
|---|---|
| 10 | 1 |
| 10 | 2 |
| 11 | 3 |
| 12 | 4 |
INNER JOIN = intersection
A ∩ B
Only matching keys from both tables
SELECT
b.book_id,
b.title,
p.name AS publisher
FROM books b
INNER JOIN publishers p
ON b.publisher_id = p.publisher_id
ORDER BY b.book_id;
Pick the best statement:
LEFT JOIN = keep left
All A
Plus matching B
Else NULL for B
SELECT
b.book_id,
b.title,
p.name AS publisher
FROM books b
LEFT JOIN publishers p
ON b.publisher_id = p.publisher_id
ORDER BY b.book_id;
SELECT
b.book_id,
b.title,
p.name AS publisher
FROM books b
LEFT JOIN publishers p
ON b.publisher_id = p.publisher_id
WHERE p.name = 'Classic Press'
ORDER BY b.book_id;
WHERE runs after the join. Rows where p.name is NULL are removed.
| book_id | title | publisher |
|---|---|---|
| 10 | Foundations of Computing | Classic Press |
| 12 | Algorithms the Classic Way | Classic Press |
⚠ Books without a publisher are gone.
LEFT JOIN behaves like INNER JOIN.
SELECT
b.book_id,
b.title,
p.name AS publisher
FROM books b
LEFT JOIN publishers p
ON b.publisher_id = p.publisher_id
AND p.name = 'Classic Press'
ORDER BY b.book_id;
Filter is applied during the join. Left rows are preserved.
| book_id | title | publisher |
|---|---|---|
| 10 | Foundations of Computing | Classic Press |
| 11 | Practical SQL Systems | NULL |
| 12 | Algorithms the Classic Way | Classic Press |
| 13 | Systems Thinking | NULL |
✔ All books remain.
✔ Publisher attached only when it matches.
LEFT JOIN then WHERE (right_col = ...)
can behave like INNER JOIN
Safer:
LEFT JOIN ... ON ... AND (right_col = ...)
SELECT
b.book_id,
b.title,
p.name AS publisher
FROM books b
LEFT JOIN publishers p
ON b.publisher_id = p.publisher_id
AND p.name = 'Classic Press'
ORDER BY b.book_id;
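The difference between filtering in WHERE and filtering in ON can be demonstrated with sqlite3, using the books/publishers rows from the slides:

```python
# Sketch: the same LEFT JOIN filter, once in WHERE (kills NULL rows) and
# once in ON (preserves all left rows).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publishers (publisher_id INT PRIMARY KEY, name TEXT);
CREATE TABLE books (book_id INT PRIMARY KEY, title TEXT, publisher_id INT);
INSERT INTO publishers VALUES (100, 'Classic Press'), (101, 'Data House');
INSERT INTO books VALUES
 (10, 'Foundations of Computing', 100),
 (11, 'Practical SQL Systems', 101),
 (12, 'Algorithms the Classic Way', 100),
 (13, 'Systems Thinking', NULL);
""")

# Filter in WHERE: NULL publishers are removed, so this acts like INNER JOIN
where_rows = conn.execute("""
    SELECT b.book_id FROM books b
    LEFT JOIN publishers p ON b.publisher_id = p.publisher_id
    WHERE p.name = 'Classic Press';
""").fetchall()

# Filter in ON: every book survives; publisher attaches only when it matches
on_rows = conn.execute("""
    SELECT b.book_id FROM books b
    LEFT JOIN publishers p ON b.publisher_id = p.publisher_id
                          AND p.name = 'Classic Press';
""").fetchall()

print(len(where_rows), len(on_rows))  # 2 vs 4
```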
What can happen?
RIGHT JOIN = keep right
Usually rewrite as LEFT JOIN
by swapping sides
-- RIGHT JOIN form
SELECT p.name, b.title
FROM books b
RIGHT JOIN publishers p
ON b.publisher_id = p.publisher_id;
-- Preferred LEFT JOIN form
SELECT p.name, b.title
FROM publishers p
LEFT JOIN books b
ON b.publisher_id = p.publisher_id;
FULL OUTER JOIN = union of keys
All A and all B
Match when possible
Else NULLs
SELECT
b.title,
p.name AS publisher
FROM books b
FULL OUTER JOIN publishers p
ON b.publisher_id = p.publisher_id
ORDER BY p.name, b.title;
| book_id | title | publisher_id |
|---|---|---|
| 10 | Foundations of Computing | 100 |
| 11 | Practical SQL Systems | 101 |
| 12 | Algorithms the Classic Way | 100 |
| 13 | Systems Thinking | NULL |
Book 13 has no publisher_id.
| publisher_id | name |
|---|---|
| 100 | Classic Press |
| 101 | Data House |
| 102 | Future Books |
Publisher 102 has no books.
SELECT
b.title,
p.name AS publisher
FROM books b
FULL OUTER JOIN publishers p
ON b.publisher_id = p.publisher_id
ORDER BY p.name, b.title;
| title | publisher |
|---|---|
| Systems Thinking | NULL |
| Algorithms the Classic Way | Classic Press |
| Foundations of Computing | Classic Press |
| Practical SQL Systems | Data House |
| NULL | Future Books |
Both “missing sides” appear: book without publisher → publisher is NULL, publisher without books → title is NULL.
CROSS JOIN = all combinations
Rows = |A| × |B|
| author_id | name |
|---|---|
| 1 | Ada Lovelace |
| 2 | Alan Turing |
2 rows
| publisher_id | name |
|---|---|
| 100 | Classic Press |
| 101 | Data House |
| 102 | Future Books |
3 rows
SELECT
a.name AS author,
p.name AS publisher
FROM authors a
CROSS JOIN publishers p
ORDER BY a.name, p.name;
| author | publisher |
|---|---|
| Ada Lovelace | Classic Press |
| Ada Lovelace | Data House |
| Ada Lovelace | Future Books |
| Alan Turing | Classic Press |
| Alan Turing | Data House |
| Alan Turing | Future Books |
Cartesian product: rows = |A| × |B| = 2 × 3 = 6.
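The row-count arithmetic can be confirmed with sqlite3:

```python
# Sketch: CROSS JOIN produces |A| x |B| rows with no join condition.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (name TEXT);
CREATE TABLE publishers (name TEXT);
INSERT INTO authors VALUES ('Ada Lovelace'), ('Alan Turing');
INSERT INTO publishers VALUES ('Classic Press'), ('Data House'), ('Future Books');
""")

n = conn.execute(
    "SELECT COUNT(*) FROM authors CROSS JOIN publishers;").fetchone()[0]
print(n)  # 2 x 3 = 6
```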
SELECT
a.name AS author,
p.name AS publisher
FROM authors a
CROSS JOIN publishers p
ORDER BY a.name, p.name;
authors ← book_authors → books
Two joins:
authors to bridge
bridge to books
SELECT
b.title,
a.name AS author
FROM books b
JOIN book_authors ba
ON ba.book_id = b.book_id
JOIN authors a
ON a.author_id = ba.author_id
ORDER BY b.title, a.name;
SELECT
b.title,
COUNT(*) AS author_count
FROM books b
JOIN book_authors ba
ON ba.book_id = b.book_id
GROUP BY b.title
ORDER BY author_count DESC, b.title;
| book_id | title |
|---|---|
| 10 | Foundations of Computing |
| 11 | Practical SQL Systems |
| 12 | Algorithms the Classic Way |
| 13 | Systems Thinking |
One row per book
| book_id | author_id |
|---|---|
| 10 | 1 |
| 10 | 2 |
| 11 | 3 |
| 12 | 4 |
Many rows per book possible
SELECT
b.title,
COUNT(*) AS author_count
FROM books b
JOIN book_authors ba
ON ba.book_id = b.book_id
GROUP BY b.title
ORDER BY author_count DESC, b.title;
Joined rows (one per book-author match)
| title | author_id |
|---|---|
| Foundations of Computing | 1 |
| Foundations of Computing | 2 |
| Practical SQL Systems | 3 |
| Algorithms the Classic Way | 4 |
Grouped result (one row per book)
| title | author_count |
|---|---|
| Foundations of Computing | 2 |
| Algorithms the Classic Way | 1 |
| Practical SQL Systems | 1 |
After joining books to book_authors, book rows may appear multiple times because:
SELECT
b.book_id,
b.title
FROM books b
LEFT JOIN book_authors ba
ON ba.book_id = b.book_id
WHERE ba.book_id IS NULL
ORDER BY b.book_id;
| book_id | title |
|---|---|
| 10 | Foundations of Computing |
| 11 | Practical SQL Systems |
| 12 | Algorithms the Classic Way |
| 13 | Systems Thinking |
All books (starting point)
| book_id | author_id |
|---|---|
| 10 | 1 |
| 10 | 2 |
| 11 | 3 |
| 12 | 4 |
Bridge rows (missing book_id = 13)
SELECT
b.book_id,
b.title
FROM books b
LEFT JOIN book_authors ba
ON ba.book_id = b.book_id
WHERE ba.book_id IS NULL
ORDER BY b.book_id;
After LEFT JOIN (unmatched become NULL)
| book_id | title | ba.book_id |
|---|---|---|
| 10 | Foundations of Computing | 10 |
| 10 | Foundations of Computing | 10 |
| 11 | Practical SQL Systems | 11 |
| 12 | Algorithms the Classic Way | 12 |
| 13 | Systems Thinking | NULL |
Filtered result (books with no authors)
| book_id | title |
|---|---|
| 13 | Systems Thinking |
Traditional habit: LEFT JOIN + IS NULL reveals missing relationships.
Anti join goal:
Keep A rows where no B match exists
| book_id | title |
|---|---|
| 10 | Foundations of Computing |
| 11 | Practical SQL Systems |
| 12 | Algorithms the Classic Way |
| 13 | Systems Thinking |
We want books that have no match in Table B.
| book_id | author_id |
|---|---|
| 10 | 1 |
| 10 | 2 |
| 11 | 3 |
| 12 | 4 |
Missing match: book_id = 13
Goal: keep A rows where no B match exists.
-- Pattern 1 (common): LEFT JOIN then IS NULL
SELECT b.book_id, b.title
FROM books b
LEFT JOIN book_authors ba
ON ba.book_id = b.book_id
WHERE ba.book_id IS NULL
ORDER BY b.book_id;
-- Pattern 2 (also common): NOT EXISTS
SELECT b.book_id, b.title
FROM books b
WHERE NOT EXISTS (
SELECT 1
FROM book_authors ba
WHERE ba.book_id = b.book_id
)
ORDER BY b.book_id;
Result table
| book_id | title |
|---|---|
| 13 | Systems Thinking |
Traditional habit: use anti joins for data quality (missing links, orphan checks).
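Both anti-join patterns can be checked side by side in SQLite (sample rows from the slides): they return the same orphan book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (book_id INT, title TEXT);
CREATE TABLE book_authors (book_id INT, author_id INT);
INSERT INTO books VALUES
  (10,'Foundations of Computing'), (11,'Practical SQL Systems'),
  (12,'Algorithms the Classic Way'), (13,'Systems Thinking');
INSERT INTO book_authors VALUES (10,1),(10,2),(11,3),(12,4);
""")
# Pattern 1: LEFT JOIN then IS NULL
left_join = conn.execute("""
    SELECT b.book_id, b.title
    FROM books b
    LEFT JOIN book_authors ba ON ba.book_id = b.book_id
    WHERE ba.book_id IS NULL
    ORDER BY b.book_id;
""").fetchall()
# Pattern 2: NOT EXISTS
not_exists = conn.execute("""
    SELECT b.book_id, b.title
    FROM books b
    WHERE NOT EXISTS (
        SELECT 1 FROM book_authors ba WHERE ba.book_id = b.book_id
    )
    ORDER BY b.book_id;
""").fetchall()
assert left_join == not_exists == [(13, 'Systems Thinking')]
```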
SELECT
b.book_id,
b.title
FROM books b
WHERE NOT EXISTS (
SELECT 1
FROM book_authors ba
WHERE ba.book_id = b.book_id
)
ORDER BY b.book_id;
| book_id | title |
|---|---|
| 10 | Foundations of Computing |
| 11 | Practical SQL Systems |
| 12 | Algorithms the Classic Way |
| 13 | Systems Thinking |
| book_id | author_id |
|---|---|
| 10 | 1 |
| 10 | 2 |
| 11 | 3 |
| 12 | 4 |
Missing: book_id = 13 has no rows here.
SELECT
b.book_id,
b.title
FROM books b
WHERE NOT EXISTS (
SELECT 1
FROM book_authors ba
WHERE ba.book_id = b.book_id
)
ORDER BY b.book_id;
Result table
| book_id | title |
|---|---|
| 13 | Systems Thinking |
Traditional habit: use NOT EXISTS when you only need a yes/no match, not joined columns.
Semi join:
Keep A rows where a match exists in B
Use EXISTS
| book_id | title |
|---|---|
| 10 | Foundations of Computing |
| 11 | Practical SQL Systems |
| 12 | Algorithms the Classic Way |
| 13 | Systems Thinking |
| book_id | author_id |
|---|---|
| 10 | 1 |
| 10 | 2 |
| 11 | 3 |
| 12 | 4 |
Exists match for: 10, 11, 12
No match for: 13
Goal: keep A rows where a B match exists.
SELECT
b.book_id,
b.title
FROM books b
WHERE EXISTS (
SELECT 1
FROM book_authors ba
WHERE ba.book_id = b.book_id
)
ORDER BY b.book_id;
Result table
| book_id | title |
|---|---|
| 10 | Foundations of Computing |
| 11 | Practical SQL Systems |
| 12 | Algorithms the Classic Way |
Traditional habit: use EXISTS to avoid duplicates when the right side is “many”.
SELECT
b.book_id,
b.title
FROM books b
WHERE EXISTS (
SELECT 1
FROM book_authors ba
WHERE ba.book_id = b.book_id
)
ORDER BY b.book_id;
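A quick SQLite sketch of why EXISTS is a semi join: a plain JOIN duplicates the two-author book, while EXISTS keeps one row per book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (book_id INT, title TEXT);
CREATE TABLE book_authors (book_id INT, author_id INT);
INSERT INTO books VALUES (10,'Foundations of Computing'), (13,'Systems Thinking');
INSERT INTO book_authors VALUES (10,1),(10,2);
""")
joined = conn.execute("""
    SELECT b.book_id FROM books b
    JOIN book_authors ba ON ba.book_id = b.book_id;
""").fetchall()
semi = conn.execute("""
    SELECT b.book_id FROM books b
    WHERE EXISTS (SELECT 1 FROM book_authors ba WHERE ba.book_id = b.book_id);
""").fetchall()
assert joined == [(10,), (10,)]  # JOIN: one row per matching bridge row
assert semi == [(10,)]           # EXISTS: one row per book, no duplicates
```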
authors a1 JOIN authors a2
Same table, different roles
CREATE TABLE editions (
edition_id INT PRIMARY KEY,
book_id INT REFERENCES books(book_id),
edition_number INT NOT NULL,
previous_edition_id INT REFERENCES editions(edition_id)
);
previous_edition_id → edition_id
| edition_id | edition_number | previous_edition_id |
|---|---|---|
| 1 | 1 | NULL |
| 2 | 2 | 1 |
| 3 | 3 | 2 |
SELECT
e.edition_number,
prev.edition_number AS prev
FROM editions e
LEFT JOIN editions prev
ON e.previous_edition_id = prev.edition_id;
| edition | prev |
|---|---|
| 1 | NULL |
| 2 | 1 |
| 3 | 2 |
SELECT
e.edition_id,
e.edition_number,
prev.edition_number AS previous_edition_number
FROM editions e
LEFT JOIN editions prev
ON e.previous_edition_id = prev.edition_id
ORDER BY e.edition_id;
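The self join can be verified in SQLite with the three edition rows from the slide; the first edition has no predecessor, so its prev column is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE editions (
    edition_id INT PRIMARY KEY,
    edition_number INT,
    previous_edition_id INT
);
INSERT INTO editions VALUES (1,1,NULL),(2,2,1),(3,3,2);
""")
# Same table joined to itself under two aliases: e (current) and prev.
rows = conn.execute("""
    SELECT e.edition_number, prev.edition_number AS prev
    FROM editions e
    LEFT JOIN editions prev ON e.previous_edition_id = prev.edition_id
    ORDER BY e.edition_id;
""").fetchall()
assert rows == [(1, None), (2, 1), (3, 2)]
```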
JOIN ... ON b.price BETWEEN band.min AND band.max
Not just equality
< > BETWEEN
| band | min | max |
|---|---|---|
| Budget | 0 | 39.99 |
| Standard | 40 | 69.99 |
| Premium | 70 | 999 |
SELECT
b.title,
pb.band
FROM books b
JOIN price_bands pb
ON b.price BETWEEN pb.min AND pb.max;
| title | band |
|---|---|
| Foundations of Computing | Standard |
| Algorithms the Classic Way | Premium |
| Systems Thinking | Budget |
CREATE TABLE price_bands (
band TEXT PRIMARY KEY,
min_price NUMERIC(10,2) NOT NULL,
max_price NUMERIC(10,2) NOT NULL
);
-- Example rows:
-- ('Budget', 0.00, 39.99)
-- ('Standard', 40.00, 69.99)
-- ('Premium', 70.00, 9999.99)
SELECT
b.title,
b.price,
pb.band
FROM books b
JOIN price_bands pb
ON b.price BETWEEN pb.min_price AND pb.max_price
ORDER BY b.price DESC;
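The range join can be sketched in SQLite; the book prices below are assumed values chosen to land one book in each band from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (title TEXT, price REAL);
CREATE TABLE price_bands (band TEXT, min_price REAL, max_price REAL);
INSERT INTO books VALUES
  ('Foundations of Computing', 55.0),   -- assumed price: Standard band
  ('Algorithms the Classic Way', 80.0), -- assumed price: Premium band
  ('Systems Thinking', 20.0);           -- assumed price: Budget band
INSERT INTO price_bands VALUES
  ('Budget', 0, 39.99), ('Standard', 40, 69.99), ('Premium', 70, 999);
""")
# Join condition is a range test, not equality.
rows = conn.execute("""
    SELECT b.title, pb.band
    FROM books b
    JOIN price_bands pb ON b.price BETWEEN pb.min_price AND pb.max_price
    ORDER BY b.price DESC;
""").fetchall()
assert rows == [('Algorithms the Classic Way', 'Premium'),
                ('Foundations of Computing', 'Standard'),
                ('Systems Thinking', 'Budget')]
```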
books → publishers
books → book_authors → authors
Keep each link explicit
SELECT
b.title,
p.name AS publisher,
a.name AS author
FROM books b
LEFT JOIN publishers p
ON b.publisher_id = p.publisher_id
JOIN book_authors ba
ON ba.book_id = b.book_id
JOIN authors a
ON a.author_id = ba.author_id
ORDER BY b.title, a.name;
If you want unique books only:
SELECT DISTINCT b.book_id, b.title
...
| book_id | title | author |
|---|---|---|
| 10 | Foundations of Computing | Ada |
| 10 | Foundations of Computing | Turing |
Same book appears multiple times.
SELECT DISTINCT
b.book_id,
b.title
FROM books b
JOIN book_authors ba
ON ba.book_id = b.book_id;
| book_id | title |
|---|---|
| 10 | Foundations of Computing |
One row per book.
SELECT DISTINCT
b.book_id,
b.title
FROM books b
JOIN book_authors ba
ON ba.book_id = b.book_id
ORDER BY b.book_id;
Prefer explicit ON
USING is acceptable if column names are stable
Avoid NATURAL JOIN
| books | publishers |
|---|---|
| publisher_id | publisher_id |
-- Explicit (best)
JOIN publishers p
ON b.publisher_id = p.publisher_id
-- Shorthand
JOIN publishers p
USING (publisher_id)
-- Avoid
NATURAL JOIN publishers
Traditional habit: prefer explicit ON.
-- If both tables have publisher_id
SELECT
b.title,
p.name AS publisher
FROM books b
JOIN publishers p
USING (publisher_id);
SELECT
b.title,
COALESCE(p.name, 'Unknown Publisher') AS publisher
FROM books b
LEFT JOIN publishers p
ON b.publisher_id = p.publisher_id;
COALESCE replaces NULL with a label
| title | publisher |
|---|---|
| Foundations of Computing | Classic Press |
| Systems Thinking | NULL |
NULL hides missing information.
SELECT
b.title,
COALESCE(p.name, 'Unknown') AS publisher
FROM books b
LEFT JOIN publishers p
ON b.publisher_id = p.publisher_id;
| title | publisher |
|---|---|
| Foundations of Computing | Classic Press |
| Systems Thinking | Unknown |
Clear and explicit in reports.
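A minimal SQLite sketch of COALESCE over a LEFT JOIN, using the two books from the slide (one with a publisher, one without):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publishers (publisher_id INT, name TEXT);
CREATE TABLE books (book_id INT, title TEXT, publisher_id INT);
INSERT INTO publishers VALUES (100, 'Classic Press');
INSERT INTO books VALUES
  (10, 'Foundations of Computing', 100),
  (13, 'Systems Thinking', NULL);
""")
# COALESCE turns the NULL-extended publisher name into a readable label.
rows = conn.execute("""
    SELECT b.title, COALESCE(p.name, 'Unknown') AS publisher
    FROM books b
    LEFT JOIN publishers p ON b.publisher_id = p.publisher_id
    ORDER BY b.book_id;
""").fetchall()
assert rows == [('Foundations of Computing', 'Classic Press'),
                ('Systems Thinking', 'Unknown')]
```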
SELECT b.title, p.name
FROM (
SELECT * FROM books
WHERE published_year >= 2020
) b
LEFT JOIN publishers p
ON b.publisher_id = p.publisher_id;
SELECT b.title, p.name
FROM books b
JOIN publishers p
ON b.publisher_id = p.publisher_id
WHERE p.name = 'Data House';
SELECT name
FROM books b
JOIN publishers p ON ...
-- Ambiguous: which name?
SELECT p.name
-- Clear
FROM books b
LEFT JOIN publishers p ON ...
JOIN book_authors ba ON ...
JOIN authors a ON ...
One step at a time
SELECT
COUNT(*) AS joined_rows,
COUNT(DISTINCT b.book_id) AS unique_books
FROM books b
JOIN book_authors ba
ON ba.book_id = b.book_id;
SELECT
p.name AS publisher,
COUNT(*) AS book_rows
FROM publishers p
LEFT JOIN books b
ON b.publisher_id = p.publisher_id
GROUP BY p.name
ORDER BY book_rows DESC;
What is being counted?
SELECT
b.title,
STRING_AGG(a.name, ', ' ORDER BY a.name) AS authors
FROM books b
JOIN book_authors ba ON ba.book_id = b.book_id
JOIN authors a ON a.author_id = ba.author_id
GROUP BY b.title
ORDER BY b.title;
STRING_AGG
| title | author |
|---|---|
| Foundations of Computing | Ada |
| Foundations of Computing | Turing |
Same book, many rows.
SELECT
b.title,
STRING_AGG(a.name, ', ' ORDER BY a.name) AS authors
FROM books b
JOIN book_authors ba ON ba.book_id = b.book_id
JOIN authors a ON a.author_id = ba.author_id
GROUP BY b.title
ORDER BY b.title;
| title | authors |
|---|---|
| Foundations of Computing | Ada, Turing |
One row, readable list.
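STRING_AGG is PostgreSQL; SQLite's analogue is GROUP_CONCAT (element order is not guaranteed in older SQLite builds, so the check below sorts the names). A minimal sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (book_id INT, title TEXT);
CREATE TABLE authors (author_id INT, name TEXT);
CREATE TABLE book_authors (book_id INT, author_id INT);
INSERT INTO books VALUES (10, 'Foundations of Computing');
INSERT INTO authors VALUES (1, 'Ada Lovelace'), (2, 'Alan Turing');
INSERT INTO book_authors VALUES (10, 1), (10, 2);
""")
# GROUP_CONCAT collapses the many joined rows into one list per book.
title, names = conn.execute("""
    SELECT b.title, GROUP_CONCAT(a.name, ', ')
    FROM books b
    JOIN book_authors ba ON ba.book_id = b.book_id
    JOIN authors a ON a.author_id = ba.author_id
    GROUP BY b.title;
""").fetchone()
assert sorted(names.split(', ')) == ['Ada Lovelace', 'Alan Turing']
```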
SELECT
a.name,
COUNT(*) AS books_written
FROM authors a
JOIN book_authors ba
ON ba.author_id = a.author_id
GROUP BY a.name
ORDER BY books_written DESC, a.name
LIMIT 5;
| author | book_id |
|---|---|
| Ada | 10 |
| Ada | 12 |
| Turing | 10 |
Bridge creates one row per (author, book).
SELECT
a.name,
COUNT(*) AS books_written
FROM authors a
JOIN book_authors ba
ON ba.author_id = a.author_id
GROUP BY a.name
ORDER BY books_written DESC, a.name
LIMIT 5;
| author | books_written |
|---|---|
| Ada | 2 |
| Turing | 1 |
Ranked summary.
SELECT
b.title,
b.published_year,
a.name AS author
FROM books b
JOIN book_authors ba ON ba.book_id = b.book_id
JOIN authors a ON a.author_id = ba.author_id
WHERE b.published_year >= 2020
ORDER BY b.published_year DESC, b.title, a.name;
| title | year |
|---|---|
| Systems Thinking | 2021 |
| Foundations of Computing | 2020 |
| Algorithms the Classic Way | 2018 |
Only years ≥ 2020 qualify.
SELECT
b.title,
b.published_year,
a.name AS author
FROM books b
JOIN book_authors ba ON ba.book_id = b.book_id
JOIN authors a ON a.author_id = ba.author_id
WHERE b.published_year >= 2020
ORDER BY b.published_year DESC, b.title, a.name;
| title | year | author |
|---|---|---|
| Systems Thinking | 2021 | Grace |
| Foundations of Computing | 2020 | Ada |
| Foundations of Computing | 2020 | Turing |
Filter first, then join.
SELECT
p.name,
COUNT(b.book_id) AS book_count
FROM publishers p
LEFT JOIN books b
ON b.publisher_id = p.publisher_id
GROUP BY p.name
HAVING COUNT(b.book_id) = 0
ORDER BY p.name;
| publisher | book_id |
|---|---|
| Classic Press | 10 |
| Data House | 11 |
| Future Books | NULL |
NULL means no matching book.
SELECT
p.name,
COUNT(b.book_id) AS book_count
FROM publishers p
LEFT JOIN books b
ON b.publisher_id = p.publisher_id
GROUP BY p.name
HAVING COUNT(b.book_id) = 0
ORDER BY p.name;
| publisher | book_count |
|---|---|
| Future Books | 0 |
Data quality check.
In a LEFT JOIN from publishers to books, COUNT(*) would count the NULL-extended row too, so a publisher with no books reports 1 instead of 0; COUNT(b.book_id) skips NULLs and correctly reports 0.
-- Bad: missing ON (syntax error in PostgreSQL;
-- a cross join in engines such as MySQL that allow it)
SELECT b.title, p.name
FROM books b
JOIN publishers p;
-- Good: correct ON
SELECT b.title, p.name
FROM books b
JOIN publishers p
ON b.publisher_id = p.publisher_id;
-- Example: index the FK for faster joins
CREATE INDEX idx_books_publisher_id
ON books(publisher_id);
EXPLAIN
SELECT b.title, p.name
FROM books b
JOIN publishers p
ON b.publisher_id = p.publisher_id;
Goal: list all books, show publisher name when it exists.
SELECT b.title, p.name
FROM books b
LEFT JOIN publishers p
ON b.publisher_id = p.publisher_id
WHERE p.name = 'Classic Press';
What is the risk? Filtering on p.name in WHERE discards the NULL-extended rows, silently turning the LEFT JOIN into an INNER JOIN: books without a publisher disappear from the result.
Goal: show author names and the titles they wrote.
You must use book_authors.
SELECT
a.name,
b.title
FROM authors a
JOIN book_authors ba
ON ba.author_id = a.author_id
JOIN books b
ON b.book_id = ba.book_id
ORDER BY a.name, b.title;
-- Orphaned books.publisher_id
SELECT
b.book_id,
b.title,
b.publisher_id
FROM books b
LEFT JOIN publishers p
ON b.publisher_id = p.publisher_id
WHERE b.publisher_id IS NOT NULL
AND p.publisher_id IS NULL
ORDER BY b.book_id;
SELECT
p.name AS publisher,
b.title,
b.price
FROM publishers p
JOIN (
SELECT publisher_id, MIN(price) AS min_price
FROM books
WHERE publisher_id IS NOT NULL
GROUP BY publisher_id
) m
ON m.publisher_id = p.publisher_id
JOIN books b
ON b.publisher_id = m.publisher_id
AND b.price = m.min_price
ORDER BY p.name, b.title;
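The derived-table pattern above (cheapest book per publisher) can be verified in SQLite; the prices below are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publishers (publisher_id INT, name TEXT);
CREATE TABLE books (book_id INT, title TEXT, publisher_id INT, price REAL);
INSERT INTO publishers VALUES (100, 'Classic Press'), (101, 'Data House');
INSERT INTO books VALUES
  (10, 'Foundations of Computing', 100, 55.0),
  (12, 'Algorithms the Classic Way', 100, 80.0),
  (11, 'Practical SQL Systems', 101, 45.0);
""")
# Subquery m computes each publisher's minimum price; joining back on
# (publisher_id, price) picks the matching book row(s).
rows = conn.execute("""
    SELECT p.name AS publisher, b.title, b.price
    FROM publishers p
    JOIN (
        SELECT publisher_id, MIN(price) AS min_price
        FROM books
        WHERE publisher_id IS NOT NULL
        GROUP BY publisher_id
    ) m ON m.publisher_id = p.publisher_id
    JOIN books b
      ON b.publisher_id = m.publisher_id
     AND b.price = m.min_price
    ORDER BY p.name, b.title;
""").fetchall()
assert rows == [('Classic Press', 'Foundations of Computing', 55.0),
                ('Data House', 'Practical SQL Systems', 45.0)]
```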
NULL = NULL is not true
Prefer PK and FK joins
Avoid joining by names
Traditional habit:
Write joins so a reader can verify keys
without guessing
Traditional habit: start with the join keys, then refine.
-- Template
SELECT ...
FROM ...
JOIN ... ON ...
LEFT JOIN ... ON ...
WHERE ...
GROUP BY ...
HAVING ...
ORDER BY ...
LIMIT ...;
CREATE TABLE trail (
trail_id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
difficulty TEXT NOT NULL,
capacity INT NOT NULL
);
CREATE TABLE reservation (
reservation_id SERIAL PRIMARY KEY,
hiker_name TEXT NOT NULL,
hike_date DATE NOT NULL,
trail_id INT REFERENCES trail(trail_id)
);
trail (1) -------- (many) reservation
r.trail_id = t.trail_id
Write an INNER JOIN
between reservation and trail.
Add a WHERE clause.
Which JOIN keeps
all rows from trail?
Use COUNT()
Use GROUP BY
Use GROUP BY
Use HAVING
Use COUNT(*)
Use ORDER BY
Use LIMIT
Use LIMIT
Use OFFSET
SELECT
r.reservation_id,
r.hiker_name,
r.hike_date,
t.name AS trail_name
FROM reservation r
JOIN trail t
ON r.trail_id = t.trail_id;
SELECT
r.hiker_name,
r.hike_date,
t.name AS trail_name
FROM reservation r
JOIN trail t
ON r.trail_id = t.trail_id
WHERE t.difficulty = 'Hard';
SELECT
t.trail_id,
t.name AS trail_name,
r.reservation_id,
r.hiker_name,
r.hike_date
FROM trail t
LEFT JOIN reservation r
ON r.trail_id = t.trail_id
ORDER BY t.trail_id, r.hike_date;
SELECT
t.trail_id,
t.name AS trail_name,
COUNT(r.reservation_id) AS total_reservations
FROM trail t
LEFT JOIN reservation r
ON r.trail_id = t.trail_id
GROUP BY t.trail_id, t.name
ORDER BY total_reservations DESC, t.trail_id;
SELECT
t.trail_id,
t.name AS trail_name,
t.capacity,
COUNT(r.reservation_id) AS total_reservations
FROM trail t
LEFT JOIN reservation r
ON r.trail_id = t.trail_id
GROUP BY t.trail_id, t.name, t.capacity
HAVING COUNT(r.reservation_id) > t.capacity
ORDER BY total_reservations DESC, t.trail_id;
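The over-capacity check above can be run end to end in SQLite (sample trails and reservations are assumptions chosen so one trail exceeds its capacity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trail (
    trail_id INTEGER PRIMARY KEY, name TEXT, difficulty TEXT, capacity INT
);
CREATE TABLE reservation (
    reservation_id INTEGER PRIMARY KEY, hiker_name TEXT,
    hike_date TEXT, trail_id INT REFERENCES trail(trail_id)
);
INSERT INTO trail (name, difficulty, capacity) VALUES
  ('Half Dome', 'Hard', 1), ('Mist Trail', 'Easy', 5);
INSERT INTO reservation (hiker_name, hike_date, trail_id) VALUES
  ('Alice', '2026-06-01', 1), ('Bob', '2026-06-02', 1),
  ('Clara', '2026-06-03', 2);
""")
# LEFT JOIN + GROUP BY counts reservations per trail; HAVING keeps
# only trails booked beyond capacity.
over = conn.execute("""
    SELECT t.name, t.capacity, COUNT(r.reservation_id) AS total
    FROM trail t
    LEFT JOIN reservation r ON r.trail_id = t.trail_id
    GROUP BY t.trail_id, t.name, t.capacity
    HAVING COUNT(r.reservation_id) > t.capacity;
""").fetchall()
assert over == [('Half Dome', 1, 2)]  # 2 reservations > capacity 1
```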
SELECT COUNT(*) AS total_reservations
FROM reservation;
SELECT
reservation_id,
hiker_name,
hike_date,
trail_id
FROM reservation
ORDER BY hike_date, reservation_id
LIMIT 3;
SELECT
reservation_id,
hiker_name,
hike_date,
trail_id
FROM reservation
ORDER BY hike_date, reservation_id
LIMIT 3 OFFSET 3;
Write 2 queries:
1) MIN/MAX/AVG on trail.capacity
2) GROUP BY trail.difficulty with COUNT(*)
-- 1) capacity stats
SELECT
MIN(capacity) AS min_capacity,
MAX(capacity) AS max_capacity,
AVG(capacity) AS avg_capacity
FROM trail;
-- 2) trails per difficulty
SELECT
difficulty,
COUNT(*) AS trails_count
FROM trail
GROUP BY difficulty
ORDER BY trails_count DESC;
In simple terms:
SQL is important because it gives us
a reliable and consistent way to
store, retrieve, and manage data.
Without SQL, structured data systems
would not scale or communicate well.
SQL:
Structured Query Language
Standard for relational databases.
SQLite:
A software system that
implements SQL.
Language ≠ Database
CREATE TABLE students (
student_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
major TEXT
);
SELECT name, major
FROM students
WHERE major = 'CS';
INSERT INTO students (student_id, name, major)
VALUES (1, 'Ana', 'CS');
UPDATE students
SET major = 'Data Science'
WHERE student_id = 1;
DELETE FROM students
WHERE student_id = 1;
| Student Question | Instructor Response |
|---|---|
| How are attendance and participation points calculated, and is there a maximum? | Attendance and participation are calculated based on consistent presence and meaningful engagement. |
| For team projects, how are individual contributions evaluated within a single team grade? | There is one overall team project grade, but each submission must clearly state individual responsibilities, and contributions may be reviewed through GitHub activity and documentation, with adjustments made if significant imbalance is observed. |
| Clarification on grading breakdown: What are the percentage values for labs, homework, projects, and exams? | The grading breakdown is as follows: Labs 15 percent, Homework 15 percent, Projects 40 percent, and Exams 30 percent. |
| How is assignment submission on GitHub acknowledged and recorded? | Assignment submission on GitHub is acknowledged by confirming that the code has been uploaded to the repository. |
Relational Algebra: Selection and Projection
Relational Algebra: Join (⨝)
Relational Algebra Optimization
API and Database Interaction
Web vs Console Interface
GROUP BY and Aggregation
CHECK Constraints
INNER JOIN
LEFT JOIN
RIGHT JOIN
Natural Join Risk (Textbook 1, p.128)
Which queries typically require JOINS? (Textbook 1, p.141)
Unmatched rows and OUTER JOIN (Textbook 1, p.149, Fig 7-12)
GROUP BY query difference (Textbook 2, p.67–71)
CHECK constraints meaning (Textbook 2, p.99)
Submission guidance
git clone https://github.com/SE4CPS/2026-COMP-163.git
cd Project/Part-1/Sample
UPDATE teamX_flowers
SET water_level =
water_level - (5 * (CURRENT_DATE - last_watered));
| order_id | customer | items | prices |
|---|---|---|---|
| 1001 | Alice | Rose,Tulip | 12.50,6.00 |
| 1002 | Bob | Lily,Orchid | 9.25,15.00 |
-- Not reliable in UNF
SELECT * FROM Orders WHERE items = 'Rose';
-- Hacky and unsafe
SELECT * FROM Orders WHERE items LIKE '%Rose%';
| order_id | flower | unit_price |
|---|---|---|
| 1001 | Rose | 12.50 |
| 1001 | Tulip | 6.00 |
| 1002 | Lily | 9.25 |
| 1002 | Orchid | 15.00 |
SELECT * FROM OrderItems WHERE flower = 'Rose';
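A small SQLite sketch of the difference: equality against the packed UNF column silently matches nothing, while the 1NF table answers the same question directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- UNF: comma-separated list in one cell
CREATE TABLE Orders (order_id INT, customer TEXT, items TEXT);
INSERT INTO Orders VALUES (1001, 'Alice', 'Rose,Tulip'),
                          (1002, 'Bob', 'Lily,Orchid');
-- 1NF: one value per cell
CREATE TABLE OrderItems (order_id INT, flower TEXT, unit_price REAL);
INSERT INTO OrderItems VALUES (1001, 'Rose', 12.50), (1001, 'Tulip', 6.00),
                              (1002, 'Lily', 9.25), (1002, 'Orchid', 15.00);
""")
unf = conn.execute("SELECT * FROM Orders WHERE items = 'Rose';").fetchall()
nf1 = conn.execute(
    "SELECT order_id FROM OrderItems WHERE flower = 'Rose';"
).fetchall()
assert unf == []         # equality on the packed cell misses the order
assert nf1 == [(1001,)]  # atomic values make the match reliable
```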
| customer | phones |
|---|---|
| Alice | 0101,0109 |
| Clara | 0103,0111 |
-- Impossible to enforce uniqueness
SELECT COUNT(*) FROM Customers WHERE phones = '0101';
| customer | phone |
|---|---|
| Alice | 0101 |
| Alice | 0109 |
| Clara | 0103 |
| Clara | 0111 |
SELECT * FROM CustomerPhones WHERE phone = '0101';
| supplier | location |
|---|---|
| BloomCo | Stockton,CA |
| FloraSupreme | Sacramento,CA |
-- Cannot filter by state safely
SELECT * FROM Suppliers WHERE location LIKE '%CA';
| supplier | city | state |
|---|---|---|
| BloomCo | Stockton | CA |
| FloraSupreme | Sacramento | CA |
SELECT * FROM Suppliers WHERE state = 'CA';
| order_id | tags |
|---|---|
| 1001 | ["gift","red"] |
| 1002 | ["wedding"] |
-- Hard to count per tag
SELECT COUNT(*) FROM Orders WHERE tags = 'gift';
| order_id | tag |
|---|---|
| 1001 | gift |
| 1001 | red |
| 1002 | wedding |
SELECT COUNT(*) FROM OrderTags WHERE tag = 'gift';
| order_id | flower |
|---|---|
| 1001 | Rose |
| 1001 | Rose |
-- Which row is correct?
DELETE FROM OrderItems WHERE order_id = 1001 AND flower = 'Rose';
| PK | PK |
|---|---|
| order_id | flower |
| 1001 | Rose |
| 1001 | Tulip |
| order_id | items | prices |
|---|---|---|
| 1001 | Rose,Tulip | 12.50,6.00 |
| 1002 | Lily,Orchid | 9.25,15.00 |
| order_id | items | prices |
|---|---|---|
| 1001 | Rose | 12.50 |
| 1001 | Tulip | 6.00 |
| 1002 | Lily | 9.25 |
| 1002 | Orchid | 15.00 |
| supplier | address |
|---|---|
| BloomCo | 123 Market, Stockton, CA |
| FloraSupreme | 45 Garden, Sacramento, CA |
| supplier | street | city | state |
|---|---|---|---|
| BloomCo | 123 Market | Stockton | CA |
| FloraSupreme | 45 Garden | Sacramento | CA |
| PK | PK | |
|---|---|---|
| order_id | flower | unit_price |
| 1001 | Rose | 12.50 |
| 1001 | Tulip | 6.00 |
| 1002 | Lily | 9.25 |
| 1002 | Orchid | 15.00 |
reservation_id | hiker_name | trails | dates
---------------------------------------------------------------------------
5001 | Alice | Half Dome, Full Day | 2026-06-01, 2026-06-02
5001 | Bob | Yosemite Falls, Morning | 2026-06-03
5002 | Clara | Half Dome, Yosemite Falls | 2026-06-04, 2026-06-05
5002 | Tom | Mist Trail | 2026-06-06
| PK | PK | |||
|---|---|---|---|---|
| order_id | flower | unit_price | customer | phone |
| 1001 | Rose | 12.50 | Alice | 0101 |
| 1001 | Tulip | 6.00 | Alice | 0101 |
| 1002 | Lily | 9.25 | Bob | 0102 |
| 1002 | Orchid | 15.00 | Bob | 0102 |
-- Update touches multiple rows (symptom)
UPDATE OrderItems SET phone = '9999' WHERE order_id = 1001;
| Column | Depends on | Why this is bad |
|---|---|---|
| customer | order_id | Repeats per item row |
| phone | order_id | Repeats per item row |
| unit_price | flower | Repeats across many orders |
| PK | ||
|---|---|---|
| order_id | customer | phone |
| 1001 | Alice | 0101 |
| 1002 | Bob | 0102 |
| PK | |||
|---|---|---|---|
| order_item_id | order_id | flower | unit_price |
| 1 | 1001 | Rose | 12.50 |
| 2 | 1001 | Tulip | 6.00 |
| 3 | 1002 | Lily | 9.25 |
| 4 | 1002 | Orchid | 15.00 |
-- Phone changes once
UPDATE Orders SET phone = '9999' WHERE order_id = 1001;
| PK | ||
|---|---|---|
| flower_id | name | list_price |
| 1 | Rose | 12.50 |
| 2 | Tulip | 6.00 |
| 3 | Lily | 9.25 |
| 4 | Orchid | 15.00 |
| PK | ||
|---|---|---|
| order_item_id | order_id | flower_id |
| 1 | 1001 | 1 |
| 2 | 1001 | 2 |
| 3 | 1002 | 3 |
| 4 | 1002 | 4 |
-- Price changes once
UPDATE Flowers SET list_price = 13.00 WHERE flower_id = 1;
| PK | ||||
|---|---|---|---|---|
| flower_id | name | list_price | supplier_name | supplier_addr |
| 1 | Rose | 12.50 | BloomCo | 123 Market |
| 2 | Tulip | 6.00 | BloomCo | 123 Market |
| 3 | Lily | 9.25 | FloraSupreme | 45 Garden |
| 4 | Orchid | 15.00 | FloraSupreme | 45 Garden |
-- Update anomaly (address repeated)
UPDATE Flowers SET supplier_addr = '125 Market' WHERE supplier_name = 'BloomCo';
| PK | |||
|---|---|---|---|
| supplier_id | name | street | city |
| 10 | BloomCo | 123 Market | Stockton |
| 20 | FloraSupreme | 45 Garden | Sacramento |
| PK | |||
|---|---|---|---|
| flower_id | name | list_price | supplier_id |
| 1 | Rose | 12.50 | 10 |
| 2 | Tulip | 6.00 | 10 |
| 3 | Lily | 9.25 | 20 |
| 4 | Orchid | 15.00 | 20 |
-- Address changes once
UPDATE Suppliers SET street = '125 Market' WHERE supplier_id = 10;
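The "update once" payoff of 3NF can be demonstrated in SQLite: the address change touches exactly one Suppliers row, and every flower sees it through the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (supplier_id INT PRIMARY KEY, name TEXT,
                        street TEXT, city TEXT);
CREATE TABLE Flowers (flower_id INT PRIMARY KEY, name TEXT,
                      list_price REAL, supplier_id INT);
INSERT INTO Suppliers VALUES (10, 'BloomCo', '123 Market', 'Stockton'),
                             (20, 'FloraSupreme', '45 Garden', 'Sacramento');
INSERT INTO Flowers VALUES (1, 'Rose', 12.50, 10), (2, 'Tulip', 6.00, 10),
                           (3, 'Lily', 9.25, 20), (4, 'Orchid', 15.00, 20);
""")
cur = conn.execute(
    "UPDATE Suppliers SET street = '125 Market' WHERE supplier_id = 10;"
)
assert cur.rowcount == 1  # one row updated, not one per flower
streets = conn.execute("""
    SELECT DISTINCT s.street
    FROM Flowers f JOIN Suppliers s ON s.supplier_id = f.supplier_id
    WHERE s.supplier_id = 10;
""").fetchall()
assert streets == [('125 Market',)]  # both BloomCo flowers see the change
```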
| PK | ||
|---|---|---|
| order_item_id | order_id | flower_id |
| 1 | 1001 | 1 |
| 2 | 1001 | 2 |
| 3 | 1002 | 3 |
| 4 | 1002 | 4 |
SELECT o.order_id, o.customer, f.name, s.name AS supplier
FROM Orders o
JOIN OrderItems oi ON oi.order_id = o.order_id
JOIN Flowers f ON f.flower_id = oi.flower_id
JOIN Suppliers s ON s.supplier_id = f.supplier_id;
| PK | PK | |||||
|---|---|---|---|---|---|---|
| order_id | flower | customer | phone | unit_price | supplier_name | supplier_addr |
| 1001 | Rose | Alice | 0101 | 12.50 | BloomCo | 123 Market |
| 1001 | Tulip | Alice | 0101 | 6.00 | BloomCo | 123 Market |
| 1002 | Lily | Bob | 0102 | 9.25 | FloraSupreme | 45 Garden |
| 1002 | Orchid | Bob | 0102 | 15.00 | FloraSupreme | 45 Garden |
| PK | ||
|---|---|---|
| order_id | customer | phone |
| 1001 | Alice | 0101 |
| 1002 | Bob | 0102 |
| PK | PK | |||
|---|---|---|---|---|
| order_id | flower | unit_price | supplier_name | supplier_addr |
| 1001 | Rose | 12.50 | BloomCo | 123 Market |
| 1001 | Tulip | 6.00 | BloomCo | 123 Market |
| 1002 | Lily | 9.25 | FloraSupreme | 45 Garden |
| 1002 | Orchid | 15.00 | FloraSupreme | 45 Garden |
| PK | ||
|---|---|---|
| supplier_id | name | addr |
| 10 | BloomCo | 123 Market |
| 20 | FloraSupreme | 45 Garden |
| PK | ||
|---|---|---|
| flower_id | name | supplier_id |
| 1 | Rose | 10 |
| 2 | Tulip | 10 |
| 3 | Lily | 20 |
| 4 | Orchid | 20 |
| PK | ||
|---|---|---|
| order_id | customer | phone |
| 1001 | Alice | 0101 |
| 1002 | Bob | 0102 |
| PK | PK | |
|---|---|---|
| order_id | flower_id | unit_price |
| 1001 | 1 | 12.50 |
| 1001 | 2 | 6.00 |
| 1002 | 3 | 9.25 |
| 1002 | 4 | 15.00 |
| Year | Normal Form |
|---|---|
| 1970 | 1NF |
| 1971 | 2NF |
| 1971 | 3NF |
| Year | ~Cost per MB |
|---|---|
| 1970 | $1,000,000+ |
| 1980 | $10,000 |
| 1990 | $100 |
| 2000 | $1 |
| 2010 | $0.10 |
| 2020 | <$0.02 |
| Constraint | Works on Atomic Value? |
|---|---|
| PRIMARY KEY | Yes |
| UNIQUE | Yes |
| NOT NULL | Yes |
| CHECK | Yes |
| FOREIGN KEY | Yes |
reservation_id | hiker_name | trails | dates
---------------------------------------------------------------------------
5001 | Alice | Half Dome, Mist Trail | 2026-06-01, Feb
5001 | Bob | Yosemite Falls | 2026-06-03
5002 | Clara | Half Dome, Yosemite Falls | 2026-06-04, Apr
5002 | Tom | Mist Trail | 2026-06-06
reservation_id | hiker_name | trails | dates
---------------------------------------------------------------------------
5001 | Alice | Half Dome, Mist Trail | 2026-06-01, Dec
5001 | Bob | Yosemite Falls | 2026-06-03
5002 | Clara | Half Dome, Yosemite Falls | 2026-06-04, Feb
5002 | Tom | Mist Trail | 2026-06-06
reservation_id | hiker_name | trails | dates
---------------------------------------------------------------------------
5001 | Alice | Half Dome, Mist Trail | 2026-06-01, Jan
5001 | Bob | Yosemite Falls | 2026-06-03
5002 | Clara | Half Dome, Yosemite Falls | 2026-06-04, Apr
5002 | Tom | Mist Trail | 2026-06-06
reservation_id | hiker_name | trails | dates
---------------------------------------------------------------------------
5001 | Alice | Half Dome, Mist Trail | 2026-06-01, Feb
5001 | Bob | Yosemite Falls | 2026-06-03
5002 | Clara | Half Dome, Yosemite Falls | 2026-06-04, Mar
5002 | Tom | Mist Trail | 2026-06-06
reservation_id | hiker_name | trails | dates
---------------------------------------------------------------------------
5001 | Alice | Half Dome, Mist Trail | 2026-06-01, Feb
5001 | Bob | Yosemite Falls | 2026-06-03
5002 | Clara | Half Dome, Yosemite Falls | 2026-06-04, Apr
5002 | Tom | Mist Trail | 2026-06-06
CREATE TABLE trail (
trail_id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
difficulty TEXT NOT NULL,
capacity INT NOT NULL
);
CREATE TABLE reservation (
reservation_id SERIAL PRIMARY KEY,
hiker_name TEXT NOT NULL,
hike_date DATE NOT NULL,
trail_id INT REFERENCES trail(trail_id)
);
trail (1) -------- (many) reservation
r.trail_id = t.trail_id
Write an INNER JOIN
between reservation and trail.
-- Imported legacy data (not in 1NF)
"River Path" | "Easy, Scenic" | 25 | "Sunny, Windy"
"Canyon Loop" | "Medium, Shaded" | 15 | "Cloudy"
"Summit Ridge"| "Hard, Rocky, Steep" | 10 | "Snowy, Cold"
Add a WHERE clause.
[
{
"name": "Pine Valley Trail",
"guide": "Emma Johnson, Liam Brown"
},
{
"name": "Red Rock Ridge",
"guide": "Carlos Martinez, Ava Wilson"
},
{
"name": "Silver Lake Path",
"guide": "Olivia Chen"
}
]
Which JOIN keeps
all rows from trail?
trail_name,guide_name,guide_phone
River Path,Emma Johnson,555-0101
Canyon Loop,Emma Johnson,555-0101
Summit Ridge,Carlos Martinez,555-0202
Forest Walk,Carlos Martinez,555-0202
Use COUNT()
Use GROUP BY
Use GROUP BY
Use HAVING
Use COUNT(*)
Use ORDER BY
Use LIMIT
Use LIMIT
Use OFFSET
SELECT
r.reservation_id,
r.hiker_name,
r.hike_date,
t.name AS trail_name
FROM reservation r
JOIN trail t
ON r.trail_id = t.trail_id;
-- Change request: allow legacy raw values
ALTER TABLE trail
ADD COLUMN difficulty_raw TEXT,
ADD COLUMN weather_raw TEXT;
-- Insert legacy data
INSERT INTO trail (name, difficulty_raw, weather_raw, capacity)
VALUES
('River Path', 'Easy, Scenic', 'Sunny, Windy', 25),
('Canyon Loop', 'Medium, Shaded', 'Cloudy', 15),
('Summit Ridge','Hard, Rocky, Steep', 'Snowy, Cold', 10);
SELECT
r.hiker_name,
r.hike_date,
t.name AS trail_name
FROM reservation r
JOIN trail t
ON r.trail_id = t.trail_id
WHERE t.difficulty = 'Hard';
-- Add JSONB column
ALTER TABLE trail
ADD COLUMN guides JSONB;
-- Update existing rows (match by name)
UPDATE trail
SET guides = '["Emma Johnson", "Liam Brown"]'
WHERE name = 'River Path';
UPDATE trail
SET guides = '["Carlos Martinez", "Ava Wilson"]'
WHERE name = 'Canyon Loop';
UPDATE trail
SET guides = '["Olivia Chen"]'
WHERE name = 'Summit Ridge';
-- Select JSON data
SELECT name, guides
FROM trail;
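The JSON column above is PostgreSQL JSONB; assuming a SQLite build with the JSON1 functions (standard in modern Python builds), the same idea can be sketched with json_extract:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trail (name TEXT, guides TEXT);  -- SQLite stores JSON as text
INSERT INTO trail VALUES ('River Path', '["Emma Johnson", "Liam Brown"]');
""")
# Pull the first guide out of the JSON array.
first_guide = conn.execute(
    "SELECT json_extract(guides, '$[0]') FROM trail WHERE name = 'River Path';"
).fetchone()[0]
assert first_guide == 'Emma Johnson'
```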
SELECT
t.trail_id,
t.name AS trail_name,
r.reservation_id,
r.hiker_name,
r.hike_date
FROM trail t
LEFT JOIN reservation r
ON r.trail_id = t.trail_id
ORDER BY t.trail_id, r.hike_date;
-- 1) Guide table
CREATE TABLE guide (
guide_id SERIAL PRIMARY KEY,
guide_name TEXT NOT NULL,
guide_phone TEXT NOT NULL
);
-- 2) Trail assignment table
CREATE TABLE trail_assignment (
assignment_id SERIAL PRIMARY KEY,
trail_name TEXT NOT NULL,
guide_id INT NOT NULL,
FOREIGN KEY (guide_id) REFERENCES guide(guide_id)
);
-- Insert guide data
INSERT INTO guide (guide_name, guide_phone) VALUES
('Emma Johnson', '555-0101'),
('Carlos Martinez', '555-0202');
-- Insert assignments (using FK references)
INSERT INTO trail_assignment (trail_name, guide_id) VALUES
('River Path', 1),
('Canyon Loop', 1),
('Summit Ridge', 2),
('Forest Walk', 2);
SELECT
t.trail_id,
t.name AS trail_name,
COUNT(r.reservation_id) AS total_reservations
FROM trail t
LEFT JOIN reservation r
ON r.trail_id = t.trail_id
GROUP BY t.trail_id, t.name
ORDER BY total_reservations DESC, t.trail_id;
SELECT
t.trail_id,
t.name AS trail_name,
t.capacity,
COUNT(r.reservation_id) AS total_reservations
FROM trail t
LEFT JOIN reservation r
ON r.trail_id = t.trail_id
GROUP BY t.trail_id, t.name, t.capacity
HAVING COUNT(r.reservation_id) > t.capacity
ORDER BY total_reservations DESC, t.trail_id;
SELECT COUNT(*) AS total_reservations
FROM reservation;
SELECT
reservation_id,
hiker_name,
hike_date,
trail_id
FROM reservation
ORDER BY hike_date, reservation_id
LIMIT 3;
SELECT
reservation_id,
hiker_name,
hike_date,
trail_id
FROM reservation
ORDER BY hike_date, reservation_id
LIMIT 3 OFFSET 3;
Write 2 queries:
1) MIN/MAX/AVG on trail.capacity
2) GROUP BY trail.difficulty with COUNT(*)
-- 1) capacity stats
SELECT
MIN(capacity) AS min_capacity,
MAX(capacity) AS max_capacity,
AVG(capacity) AS avg_capacity
FROM trail;
-- 2) trails per difficulty
SELECT
difficulty,
COUNT(*) AS trails_count
FROM trail
GROUP BY difficulty
ORDER BY trails_count DESC;
CREATE TABLE posts (
post_id BIGINT PRIMARY KEY AUTO_INCREMENT,
user_id BIGINT NOT NULL,
created_at DATETIME NOT NULL,
content TEXT,
likes_count INT DEFAULT 0
) ENGINE=InnoDB;
| Query | Intent | What the engine must do |
|---|---|---|
| WHERE user_id = 5001 | Profile feed | Find all rows for one user |
| WHERE created_at >= '2026-01-01' | Recent posts | Locate start point then read in order |
| WHERE user_id = 5001 ORDER BY created_at DESC LIMIT 20 | Latest 20 posts | Find user range, read newest first, stop early |
SELECT *
FROM posts
WHERE user_id = 5001;
| n (rows) | Worst-case checks | Impact |
|---|---|---|
| 10,000 | 10,000 | Often acceptable |
| 1,000,000 | 1,000,000 | Noticeable latency, higher I/O |
| 50,000,000 | 50,000,000 | Can dominate latency and reduce throughput |
| Plan | Time Complexity | Work Growth |
|---|---|---|
| Full table scan | O(n) | Linear |
| B-Tree lookup | O(log n) | Very slow growth |
| B-Tree range scan | O(log n + k) | Depends on output size |
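The growth rates in the table can be sketched in plain Python: a linear scan checks entries one by one, while binary search over sorted keys (what a B-Tree lookup does, level by level) needs only O(log n) comparisons. The data here is made up for the demo.

```python
import bisect

ids = list(range(0, 1_000_000, 2))  # 500,000 sorted keys (demo data)

def linear_find(keys, target):
    # Full-table-scan analogue: O(n) comparisons in the worst case.
    for i, k in enumerate(keys):
        if k == target:
            return i
    return -1

def btree_find(keys, target):
    # B-Tree-lookup analogue: O(log n) via binary search on sorted keys.
    i = bisect.bisect_left(keys, target)
    return i if i < len(keys) and keys[i] == target else -1

print(linear_find(ids, 999_998))  # 499999, after ~500,000 checks
print(btree_find(ids, 999_998))   # 499999, after ~19 checks
```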
Root (few pages)
↓
Internal nodes (few pages)
↓
Leaf pages (many pages)

50
/ \
30 70
/ \ / \
20 40 60 80
[30 | 60]
/ | \
[10 20] [40 50] [70 80 90]
[ 5000 | 9000 ]
/ | \
[1000|3000] [6000|7000] [10000|15000]
/ \ | / \
... ... ... ... ...
CREATE INDEX idx_price
ON flowers(price);
Level 1 [50]
Level 2 [30] [70]
Level 3 [20] [40] [60] [80]
Level 4 ...
Level 5 ...
Level 1 [30 | 60 | 90]
Level 2 [10 20] [40 50] [70 80] [100 110]
Example:
Fanout ≈ 1000
Level 1 → 1 node
Level 2 → 1000 nodes
Level 3 → 1,000,000 entries
Only 3 page reads to reach any row.
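The fanout arithmetic can be checked with a small helper. This mirrors the slide's counting (level 1 is the root; each level multiplies reach by the fanout), not any specific engine's exact page layout.

```python
def page_reads(n_entries, fanout):
    # Pages read root-to-leaf, counted the way the slide does:
    # level 1 is the root, each extra level multiplies reach by fanout.
    reads, reachable = 1, 1
    while reachable < n_entries:
        reachable *= fanout
        reads += 1
    return reads

print(page_reads(1_000_000, 1000))      # 3 — matches the slide
print(page_reads(1_000_000_000, 1000))  # 4 — a billion rows adds one level
```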
Assume max 3 keys per node
Before insert:
[ 10 | 20 | 30 ]
Insert 40 → overflow:
[ 10 | 20 | 30 | 40 ]
[30]
/ \
[10 | 20] [40]
High fanout example:
Level 1: [ 5000 | 9000 ]
/ | \
Level 2: [1000|3000] [6000|7000] [10000|15000]
/ \ | / \
Level 3: ... ... ... ... ...
More keys per node → Fewer levels
Fewer levels → Fewer page reads
Height 3 tree:
Root → Internal → Leaf
= 3 page reads
Height 6 tree:
= 6 page reads
Create an index on user_id, then check the plan with EXPLAIN:
CREATE INDEX idx_posts_user
ON posts(user_id);
SELECT *
FROM posts
WHERE user_id = 42;
CREATE INDEX idx_posts_created
ON posts(created_at);
SELECT *
FROM posts
WHERE created_at >= '2026-01-01'
AND created_at < '2026-02-01'
ORDER BY created_at;
Filter by user_id and order by created_at, with a LIMIT:
CREATE INDEX idx_user_created
ON posts(user_id, created_at);
SELECT *
FROM posts
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 10;
The composite index serves both the user_id filter and the created_at ordering:
SELECT post_id, created_at
FROM posts
WHERE user_id = 5001
ORDER BY created_at DESC
LIMIT 20;
| Predicate | Index Friendly? | Reason |
|---|---|---|
| = | Yes | Exact match |
| BETWEEN | Yes | Range scan |
| LIKE 'abc%' | Often | Prefix search |
| LIKE '%abc' | No | Leading wildcard breaks ordering |
Good: user_id = 5001
Good: created_at BETWEEN a AND b
Bad: content LIKE '%hello%'
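A quick way to see the LIKE rule in practice is SQLite's EXPLAIN QUERY PLAN. This sketch assumes a toy posts table and enables PRAGMA case_sensitive_like so the prefix optimization can apply: SEARCH means the index ordering is used, SCAN means every row is read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("CREATE INDEX idx_content ON posts(content)")
# Required for SQLite's LIKE-prefix optimization on a BINARY-collated index.
conn.execute("PRAGMA case_sensitive_like = ON")

def plan(query):
    # Collapse SQLite's query-plan rows into one string.
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return " ".join(r[-1] for r in rows)

# Prefix search: index ordering usable, so the plan says SEARCH.
print(plan("SELECT * FROM posts WHERE content LIKE 'abc%'"))
# Leading wildcard: ordering unusable, so the plan says SCAN.
print(plan("SELECT * FROM posts WHERE content LIKE '%abc'"))
```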
content is TEXT (large), so indexing it directly is rarely useful. Inspect existing indexes:
SHOW INDEX FROM posts;
ANALYZE TABLE posts;
Improves cardinality estimates used by the optimizer.
| At Design Time | After Deployment | Consequence |
|---|---|---|
| Assumed common filters | Different filters dominate | Need new indexes |
| Estimated row counts | Data grows faster than expected | Scans become too slow |
| Simple queries expected | Composite patterns appear | Need composite indexes |
Use EXPLAIN to see plan choices:
EXPLAIN
SELECT *
FROM posts
WHERE user_id = 5001;
| Write Pattern | What indexes add | Risk |
|---|---|---|
| Frequent INSERT | Each index must be updated | Throughput drops |
| UPDATE of indexed column | Reposition key in B-Tree | Higher write cost |
| Bulk ingestion | Many index page writes | Long ingest times |
| Operation | Cost with 0 extra indexes | Cost with k indexes |
|---|---|---|
| INSERT | O(log n) (PK B-Tree only) | O(k · log n) extra |
| UPDATE indexed column | O(log n) (PK B-Tree only) | O(k · log n) extra |
| DELETE | O(log n) (PK B-Tree only) | O(k · log n) extra |
CREATE TABLE posts (
id INT AUTO_INCREMENT PRIMARY KEY,
user_id INT,
created_at DATETIME,
content VARCHAR(100)
);
WITH RECURSIVE seq AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1 FROM seq WHERE n < 500000
)
INSERT INTO posts (user_id, created_at, content)
SELECT
FLOOR(RAND()*10000),
NOW() - INTERVAL FLOOR(RAND()*365) DAY,
CONCAT('Post ', n)
FROM seq;
DROP TABLE IF EXISTS posts;
CREATE TABLE posts (
id INT AUTO_INCREMENT PRIMARY KEY,
user_id INT NOT NULL,
created_at DATETIME NOT NULL,
content VARCHAR(100) NOT NULL
);
INSERT INTO posts (user_id, created_at, content)
SELECT
FLOOR(RAND() * 10000) AS user_id,
NOW() - INTERVAL FLOOR(RAND() * 365) DAY AS created_at,
CONCAT('Post ', n) AS content
FROM (
SELECT
(a.d + 10*b.d + 100*c.d + 1000*d.d + 10000*e.d + 100000*f.d) + 1 AS n
FROM (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
CROSS JOIN (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
CROSS JOIN (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
CROSS JOIN (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d
CROSS JOIN (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) e
CROSS JOIN (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) f
) nums
WHERE n <= 500000;
SELECT COUNT(*) FROM posts;
EXPLAIN ANALYZE
SELECT *
FROM posts
WHERE user_id = 500
ORDER BY created_at DESC
LIMIT 20;
CREATE INDEX idx_user_created
ON posts(user_id, created_at DESC);
EXPLAIN ANALYZE
SELECT *
FROM posts
WHERE user_id = 500
ORDER BY created_at DESC
LIMIT 20;
Inspect indexes and plans with SHOW INDEX and EXPLAIN:
SHOW INDEX FROM tasks;
EXPLAIN SELECT * FROM tasks WHERE status='done';
-- Todo example
CREATE INDEX idx_status
ON tasks(status);
Whether clustered or not, the internal structure is still a B-Tree.
| Aspect | B-Tree Index | Clustered Index |
|---|---|---|
| What is sorted? | Only the index | The actual table rows |
| How many allowed? | Many | Only one |
| Example | idx_status | PRIMARY KEY (id) |
| Data storage | Separate from table | Data stored in that order |
| Database | Is PRIMARY KEY auto-indexed? | Are rows stored in PK order (clustered)? | Beginner takeaway |
|---|---|---|---|
| MariaDB (InnoDB) | Yes (auto B-Tree) | Yes (clustered by PK) | Table data “lives” in the PK B-Tree |
| SQLite | Yes (practically, via rowid / PK) | Often yes (rowid order is the storage order) | Rowid acts like the built-in key ordering |
| PostgreSQL | Yes (auto B-Tree) | No (heap table, not clustered by default) | PK index exists, but rows are stored separately |
CREATE TABLE tasks (
id INT PRIMARY KEY,
title VARCHAR(100),
status VARCHAR(20),
created_at DATETIME
);
Because the table is clustered on id, this range scan reads consecutive pages:
SELECT *
FROM tasks
WHERE id BETWEEN 100 AND 200;
EXPLAIN QUERY PLAN
SELECT * FROM tasks WHERE status='done';
PostgreSQL adds buffer statistics with EXPLAIN (ANALYZE, BUFFERS):
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tasks WHERE status='done';
| Feature | MariaDB | SQLite | PostgreSQL |
|---|---|---|---|
| Default index | B-Tree | B-Tree | B-Tree |
| Runs as | Server | Embedded | Server |
| Clustered PK | Yes (InnoDB) | Rowid behavior | No (heap table) |
| Advanced index types | Limited | Minimal | Many |
Goal: filter by status and sort by created_at.
-- Logical schema
id, title, status, priority, created_at
DROP TABLE IF EXISTS tasks;
CREATE TABLE tasks (
id INT AUTO_INCREMENT PRIMARY KEY,
title VARCHAR(255) NOT NULL,
status VARCHAR(20) NOT NULL,
priority INT NOT NULL,
created_at DATETIME NOT NULL
) ENGINE=InnoDB;
-- SQLite
CREATE TABLE tasks (
id INTEGER PRIMARY KEY,
title TEXT NOT NULL,
status TEXT NOT NULL,
priority INTEGER NOT NULL,
created_at TEXT NOT NULL
);
-- PostgreSQL
CREATE TABLE tasks (
id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
status VARCHAR(20) NOT NULL,
priority INT NOT NULL,
created_at TIMESTAMP NOT NULL
);
INSERT INTO tasks (title, status, priority, created_at)
SELECT
CONCAT('Task ', n),
ELT(FLOOR(1 + RAND()*3),'open','in_progress','done'),
FLOOR(RAND()*5),
NOW() - INTERVAL FLOOR(RAND()*30) DAY
FROM (
SELECT (a.d + 10*b.d + 100*c.d + 1000*d.d) + 1 AS n
FROM (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
CROSS JOIN (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
CROSS JOIN (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
CROSS JOIN (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) d
) nums
WHERE n <= 10000;
SELECT COUNT(*) FROM tasks;
WITH RECURSIVE seq(n) AS (
SELECT 1
UNION ALL
SELECT n + 1 FROM seq WHERE n < 10000
)
INSERT INTO tasks (title, status, priority, created_at)
SELECT
'Task ' || n,
CASE (ABS(RANDOM()) % 3)
WHEN 0 THEN 'open'
WHEN 1 THEN 'in_progress'
ELSE 'done'
END,
ABS(RANDOM()) % 5,
DATETIME('now', '-' || (ABS(RANDOM()) % 30) || ' days')
FROM seq;
SELECT COUNT(*) FROM tasks;
INSERT INTO tasks (title, status, priority, created_at)
SELECT
'Task ' || n,
(ARRAY['open','in_progress','done'])[floor(random()*3)+1],
floor(random()*5),
NOW() - (floor(random()*30) || ' days')::interval
FROM generate_series(1,10000) AS n;
SELECT COUNT(*) FROM tasks;
SELECT *
FROM tasks
WHERE status = 'done';
Without an index, every row must be checked against status='done'. Each DB shows its plan with:

| DB | Plan / Analyzer Command |
|---|---|
| MariaDB | EXPLAIN |
| PostgreSQL | EXPLAIN ANALYZE |
| SQLite | EXPLAIN QUERY PLAN |
-- MariaDB
CREATE INDEX idx_tasks_status
ON tasks(status);
-- PostgreSQL
CREATE INDEX idx_tasks_status
ON tasks(status);
-- SQLite
CREATE INDEX idx_tasks_status
ON tasks(status);
Does the plan now use the status index for WHERE status = 'done'?
EXPLAIN
SELECT *
FROM tasks
WHERE status = 'done';
EXPLAIN ANALYZE
SELECT *
FROM tasks
WHERE status = 'done';
EXPLAIN QUERY PLAN
SELECT *
FROM tasks
WHERE status = 'done';
With idx_tasks_status in place:
EXPLAIN ANALYZE
SELECT *
FROM tasks
WHERE status = 'done'
ORDER BY created_at DESC
LIMIT 20;
Even though idx_tasks_status exists, the DB may still need to sort. A composite index on (status, created_at) orders rows by status first, then by created_at.
-- Think: ordered groups
(status='done', newest → oldest)
(status='open', newest → oldest)
(status='in_progress', newest → oldest)
| status | created_at | title |
|---|---|---|
| done | 2026-02-25 10:05 | Task A |
| open | 2026-02-25 09:00 | Task B |
| done | 2026-02-25 08:10 | Task C |
| done | 2026-02-24 20:00 | Task D |
In the composite index, the done rows sit next to each other, already ordered, so LIMIT 20 can stop early (fast).
-- MariaDB
CREATE INDEX idx_status_created
ON tasks(status, created_at DESC);
-- SQLite (DESC allowed in index; planner may still optimize)
CREATE INDEX idx_status_created
ON tasks(status, created_at);
-- PostgreSQL
CREATE INDEX idx_status_created
ON tasks(status, created_at DESC);
Re-check the plan for WHERE status = ? ORDER BY created_at DESC:
EXPLAIN ANALYZE
SELECT *
FROM tasks
WHERE status = 'done'
ORDER BY created_at DESC
LIMIT 20;
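The early-stop behavior can be observed in SQLite with a trimmed tasks schema and random demo data: with the composite index, the plan searches the index and needs no separate sort step.

```python
import sqlite3, random

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tasks (
    id INTEGER PRIMARY KEY, status TEXT NOT NULL, created_at TEXT NOT NULL)""")
# Demo rows: random status and day-of-month.
conn.executemany(
    "INSERT INTO tasks(status, created_at) VALUES (?, ?)",
    [(random.choice(["open", "in_progress", "done"]),
      f"2026-01-{random.randint(1, 28):02d}") for _ in range(1000)])
conn.execute("CREATE INDEX idx_status_created ON tasks(status, created_at)")
conn.commit()

# The composite index serves the filter AND the ordering:
detail = " ".join(r[-1] for r in conn.execute("""EXPLAIN QUERY PLAN
    SELECT * FROM tasks WHERE status = 'done'
    ORDER BY created_at DESC LIMIT 20""").fetchall())
print(detail)  # index search, no temp B-tree sort step
```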
Time: 8:00 - 9:00 AM
March 10–14 | No class
Questions? Email sberhe@pacific.edu
Transaction structure
BEGIN
SQL step 1
SQL step 2
SQL step 3
COMMIT
Flower Shop App
│
▼
BEGIN
│
├── create order
├── reduce flower stock
└── record payment
│
▼
COMMIT
BEGIN;
SQL step 1
SQL step 2
SQL step 3
COMMIT;
| Letter | Meaning |
|---|---|
| A | Atomicity |
| C | Consistency |
| I | Isolation |
| D | Durability |
BEGIN;
update flowers
create order
record payment
COMMIT or ROLLBACK
Examples of rules
flower_stock >= 0
price > 0
order_total = sum(order_items)
BEGIN
update stock
insert order
COMMIT
BEGIN;
UPDATE flowers
SET stock = stock - 2;
COMMIT;
BEGIN;
UPDATE flowers
SET stock = stock - 10;
ROLLBACK;
Without Transaction
create order ✔
reduce stock ✔
record payment ✖
Result:
order exists
inventory changed
payment missing
With Transaction
BEGIN
create order ✔
reduce stock ✔
record payment ✖
ROLLBACK
State restored
BEGIN;
INSERT INTO orders(customer_name,total_amount,status)
VALUES ('Ava',45.00,'paid');
UPDATE flowers
SET stock = stock - 3
WHERE flower_id = 101;
INSERT INTO payments(order_id,amount)
VALUES (LAST_INSERT_ID(),45.00);
COMMIT;
Order saved ✔
Stock reduced ✔
Payment saved ✔
Final state valid
BEGIN;
INSERT INTO orders(customer_name,total_amount,status)
VALUES ('Ava',45.00,'paid');
UPDATE flowers
SET stock = stock - 3
WHERE flower_id = 101;
-- payment insert fails
ROLLBACK;
Order inserted ✔
Stock reduced ✔
Payment failed ✖
ROLLBACK removes
partial changes
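The two outcomes above can be reproduced with SQLite from Python: `with conn:` commits on success and rolls back on any exception, so a simulated payment failure undoes the order and the stock change together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE flowers (id INTEGER PRIMARY KEY, stock INTEGER);
    INSERT INTO flowers VALUES (101, 20);
""")

try:
    with conn:  # one transaction: commit on success, rollback on error
        conn.execute("INSERT INTO orders(customer) VALUES ('Ava')")
        conn.execute("UPDATE flowers SET stock = stock - 3 WHERE id = 101")
        raise RuntimeError("payment failed")  # simulated failure
except RuntimeError:
    pass

# Both earlier steps were undone together.
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])             # 0
print(conn.execute("SELECT stock FROM flowers WHERE id=101").fetchone()[0])  # 20
```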
Rule:
flower_stock >= 0
Before sale: 20
Sell: 5
After sale: 15 ✔
Before sale: 4
Customer buys: 10
Result: -6 ✖
Database rejects
transaction
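The same rule can be enforced with a CHECK constraint. In this SQLite sketch the overselling UPDATE raises IntegrityError and the transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE flowers (
    id INTEGER PRIMARY KEY,
    stock INTEGER NOT NULL CHECK (stock >= 0))""")
conn.execute("INSERT INTO flowers VALUES (1, 4)")
conn.commit()

try:
    with conn:
        # Selling 10 with only 4 in stock violates the CHECK rule.
        conn.execute("UPDATE flowers SET stock = stock - 10 WHERE id = 1")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

print(conn.execute("SELECT stock FROM flowers").fetchone()[0])  # still 4
```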
Customer A
reads rose stock = 12
wants to buy 10
Customer B
also reads stock = 12
wants to buy 10
Without isolation
overselling occurs
Transaction
BEGIN
create order
update stock
record payment
COMMIT
After COMMIT
server crash
restart system
confirmed flower order
still exists
Initial rose stock = 20
Transaction A reads 20
Transaction B reads 20
A sells 5 → writes 15
B sells 4 → writes 16
Final stock = 16
Expected = 11
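The lost update is easy to simulate with plain variables standing in for two transactions that both read before either writes:

```python
# No isolation: both "transactions" read the same stale stock value.
stock = 20

a_read = stock      # Transaction A reads 20
b_read = stock      # Transaction B reads 20
stock = a_read - 5  # A writes 15
stock = b_read - 4  # B writes 16, silently discarding A's update

print(stock)  # 16, but the correct serial result is 11
```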
Transaction A
BEGIN;
UPDATE flowers
SET stock = stock - 10
WHERE flower_id = 300;
Transaction B
SELECT stock
FROM flowers
WHERE flower_id = 300;
B reads value
not yet committed
If A rolls back
B read invalid data
Transaction A
locks flower row
needs order row
Transaction B
locks order row
needs flower row
Circular wait
= DEADLOCK
BEGIN;
INSERT INTO orders(customer_name,total_amount,status)
VALUES ('Lina',60,'pending');
SAVEPOINT order_created;
INSERT INTO order_items(order_id,flower_id,qty)
VALUES (25,101,4);
ROLLBACK TO order_created;
COMMIT;
BEGIN
│
create order
│
SAVEPOINT
│
insert flower items
│
error
│
ROLLBACK TO SAVEPOINT
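A runnable version of this savepoint flow in SQLite (autocommit mode, so BEGIN/COMMIT are issued explicitly): the item insert is undone, the order survives.

```python
import sqlite3

# Autocommit mode: we issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (order_id INTEGER, flower_id INTEGER, qty INTEGER);
""")

conn.execute("BEGIN")
conn.execute("INSERT INTO orders(customer) VALUES ('Lina')")
conn.execute("SAVEPOINT order_created")
conn.execute("INSERT INTO order_items VALUES (1, 101, 4)")
conn.execute("ROLLBACK TO order_created")  # undo only the item insert
conn.execute("COMMIT")                     # the order itself is kept

print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])       # 1
print(conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0])  # 0
```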
BEGIN;
UPDATE flowers
SET stock = stock - 3
WHERE flower_id = 101;
COMMIT;
Order A → BEGIN ... COMMIT
Order B → BEGIN ... COMMIT
Order C → BEGIN ... COMMIT
Each order is separate
BEGIN;
INSERT INTO orders VALUES ('Mia',32);
UPDATE flowers SET stock = stock - 2
WHERE flower_id = 205;
-- payment fails
ROLLBACK;
Stock = 20
A reads 20
B reads 20
A writes 15
B writes 16
A's change is lost
T1 locks flower row
T2 tries same row
T2 waits
T1 commits
T2 continues
T1 runs first
T1 commits
T2 runs next
T2 commits
As if one-at-a-time
T1 reads stock = 20
T2 commits stock = 18
T1 reads again = 18
Dirty read avoided
T1 reads stock = 20
T2 updates stock = 18
T2 commits
T1 reads same row = 20
T1:
SELECT COUNT(*) WHERE status='open'
→ 12
T2 inserts open order
T2 commits
T1 repeats query
→ 13
BEGIN;
SELECT stock
FROM flowers
WHERE flower_id = 101
FOR UPDATE;
-- verify stock again
COMMIT;
BEGIN;
UPDATE orders
SET status = 'cancelled'
WHERE order_id = 77;
UPDATE flowers
SET stock = stock + 2
WHERE flower_id = 101;
COMMIT;
Try transaction
deadlock occurs
rollback happens
start again
commit succeeds
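The retry loop can be sketched generically. `DeadlockError` here is a stand-in for whatever exception your driver raises when the database picks a deadlock victim; the victim is already rolled back, so re-running the whole transaction is safe.

```python
class DeadlockError(Exception):
    """Stand-in for the driver's deadlock / serialization-failure exception."""

def run_transaction(work, retries=3):
    # A deadlock victim is rolled back by the database,
    # so re-running the whole transaction from the top is safe.
    for attempt in range(retries):
        try:
            return work()
        except DeadlockError:
            continue  # already rolled back; start again
    raise RuntimeError("gave up after repeated deadlocks")

attempts = {"n": 0}
def flaky_commit():
    # Simulated transaction: deadlocks twice, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DeadlockError()
    return "committed"

print(run_transaction(flaky_commit))  # committed, on the 3rd attempt
```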
BEGIN;
UPDATE flowers
SET price = price * 1.10
WHERE category = 'rose';
UPDATE flowers
SET price = price * 1.05
WHERE category = 'tulip';
COMMIT;
create order
insert order items
reduce stock
record payment
commit order
BEGIN;
SELECT stock
FROM flowers
WHERE flower_id = 205
FOR UPDATE;
UPDATE flowers
SET stock = stock - 2
WHERE flower_id = 205;
COMMIT;
BEGIN;
INSERT INTO payments(order_id,amount,status)
VALUES (88,45.00,'confirmed');
UPDATE orders
SET status = 'paid'
WHERE order_id = 88;
COMMIT;
BEGIN;
UPDATE orders
SET status = 'fulfilled'
WHERE order_id = 88;
UPDATE shipments
SET packed_at = NOW()
WHERE order_id = 88;
COMMIT;
Committed work → keep
Uncommitted work → undo
Restart system
recover database
return to safe state
Log entry 1 → replay
Log entry 2 → replay
Log entry 3 → replay
Committed state restored
| # | Concept | Flower Domain Meaning |
|---|---|---|
| 1 | Transaction | One complete flower purchase workflow |
| 2 | Atomicity | Order, stock, and payment succeed together |
| 3 | Consistency | Flower stock rules remain valid |
| 4 | Isolation | Multiple customers do not corrupt stock |
| 5 | Durability | Committed flower orders survive crashes |
CREATE TABLE flowers (
id NUMBER PRIMARY KEY,
name VARCHAR2(50) NOT NULL,
color VARCHAR2(30) NOT NULL,
price NUMBER(6,2) NOT NULL
);
BEGIN;
INSERT INTO flowers VALUES (1,'Rose','Red',5);
INSERT INTO flowers VALUES (2,'Lily','White',7);
INSERT INTO flowers VALUES (3,'Tulip','Yellow',4);
INSERT INTO flowers VALUES (4,'Daisy','White',3.5);
COMMIT;
SELECT id, name, color, price
FROM flowers
ORDER BY price DESC;
SAVEPOINT before_update;
BEGIN;
UPDATE flowers
SET price = price + 1
WHERE color = 'White';
SELECT * FROM flowers;
-- Save changes permanently
COMMIT;
-- Undo entire transaction
ROLLBACK;
-- Undo only last step
ROLLBACK TO before_update;
Flow of Today
ACID → Guarantees
↓
Transactions → Control
↓
COMMIT / ROLLBACK
↓
Real Examples
Start → Changes → Decision
┌──────────┐
│ COMMIT │ → save ✔
└──────────┘
OR
┌──────────┐
│ ROLLBACK │ → undo ✖
└──────────┘
UPDATE accounts
SET balance = balance - 100
WHERE id = 1;
SAVEPOINT s1;
UPDATE accounts
SET balance = balance + 100
WHERE id = 2;
ROLLBACK TO s1;
COMMIT;
Atomicity → all or nothing
Consistency → valid state
Isolation → no conflicts
Durability → survives crash
-- Atomicity
INSERT INTO orders VALUES (1);
INSERT INTO payments VALUES (1);
ROLLBACK;
-- Consistency
UPDATE products
SET stock = stock - 5
WHERE stock >= 5;
-- Isolation
SELECT * FROM orders
FOR UPDATE;
-- Durability
COMMIT;
E-commerce
order → payment → commit ✔
Banking
debit → credit → commit ✔
Inventory
reduce stock → commit ✔
-- E-commerce
INSERT INTO orders VALUES (10);
INSERT INTO payments VALUES (10);
COMMIT;
-- Banking
UPDATE accounts SET balance = balance - 200 WHERE id=1;
UPDATE accounts SET balance = balance + 200 WHERE id=2;
COMMIT;
-- Inventory
UPDATE products SET stock = stock - 1 WHERE id=5;
COMMIT;
E-commerce
payment fails → rollback ✖
Banking
credit fails → rollback ✖
Inventory
error → rollback ✖
-- E-commerce failure
INSERT INTO orders VALUES (11);
ROLLBACK;
-- Banking failure
UPDATE accounts SET balance = balance - 200 WHERE id=1;
ROLLBACK;
-- Inventory failure
UPDATE products SET stock = stock - 100 WHERE id=5;
ROLLBACK;
BEGIN
│
step 1 ✔
│
SAVEPOINT A
│
step 2 ✔
│
step 3 ✖
│
ROLLBACK TO A
BEGIN;
INSERT INTO orders VALUES (20);
SAVEPOINT s1;
INSERT INTO order_items VALUES (20,101);
ROLLBACK TO s1;
COMMIT;
Entities
Flowers → inventory
Orders → purchases
Payments → money
All linked in transactions
CREATE TABLE flowers (
id NUMBER PRIMARY KEY,
name VARCHAR2(50),
stock NUMBER,
price NUMBER(6,2)
);
CREATE TABLE orders (
id NUMBER PRIMARY KEY,
customer VARCHAR2(50),
total NUMBER(6,2),
status VARCHAR2(20)
);
CREATE TABLE payments (
id NUMBER PRIMARY KEY,
order_id NUMBER,
amount NUMBER(6,2)
);
Initial Stock
Rose → 20
Lily → 15
Tulip → 25
INSERT INTO flowers VALUES (1,'Rose',20,5);
INSERT INTO flowers VALUES (2,'Lily',15,7);
INSERT INTO flowers VALUES (3,'Tulip',25,4);
COMMIT;
Customer buys 3 Roses
Step 1: create order ✔
Step 2: reduce stock ✔
Step 3: record payment ✔
→ COMMIT
BEGIN;
INSERT INTO orders
VALUES (100,'Ava',15,'paid');
UPDATE flowers
SET stock = stock - 3
WHERE id = 1;
INSERT INTO payments
VALUES (1,100,15);
COMMIT;
Customer buys 10 Lilies
Order created ✔
Stock updated ✔
Payment fails ✖
→ ROLLBACK
Everything undone
BEGIN;
INSERT INTO orders
VALUES (101,'Liam',70,'pending');
UPDATE flowers
SET stock = stock - 10
WHERE id = 2;
-- payment fails here
ROLLBACK;
Order process
Create order ✔
SAVEPOINT
Add items ✔
Error occurs ✖
Rollback to savepoint
(order remains)
BEGIN;
INSERT INTO orders
VALUES (102,'Mia',40,'pending');
SAVEPOINT after_order;
UPDATE flowers
SET stock = stock - 5
WHERE id = 3;
-- error
ROLLBACK TO after_order;
COMMIT;
Initial stock = 20
T1 reads 20
T2 reads 20
T1 writes 15
T2 writes 16
Final = 16 ✖
Expected = 11
-- Transaction 1
UPDATE flowers
SET stock = stock - 5
WHERE id = 1;
-- Transaction 2
UPDATE flowers
SET stock = stock - 4
WHERE id = 1;
Lock row first
T1 locks row
T2 waits
T1 commits
T2 continues safely
BEGIN;
SELECT stock
FROM flowers
WHERE id = 1
FOR UPDATE;
UPDATE flowers
SET stock = stock - 3
WHERE id = 1;
COMMIT;
Steps
1. Create order
2. Update stock
3. Insert payment
Question:
Where should the
transaction START
and END?
INSERT INTO orders
VALUES (201,'Tom',50,'paid');
UPDATE products
SET stock = stock - 2
WHERE id = 10;
INSERT INTO payments
VALUES (1,201,50);
Steps
1. Insert enrollment
2. Reduce seat count
Question:
Where should the
transaction START
and END?
INSERT INTO enrollments
VALUES (5001, 'COMP-163');
UPDATE courses
SET seats = seats - 1
WHERE course_id = 'COMP-163';
Steps
1. Insert patient
2. Assign bed
3. Update bed status
Question:
Where should the
transaction START
and END?
INSERT INTO patients
VALUES (301,'Emma');
UPDATE beds
SET status = 'occupied'
WHERE id = 12;
UPDATE rooms
SET available = 'N'
WHERE id = 5;
START TRANSACTION
│
try
│
all steps OK?
/ \
yes no
│ │
COMMIT ROLLBACK
try:
cursor.execute("""
INSERT INTO enrollments
VALUES (5002, 'CS101')
""")
cursor.execute("""
UPDATE courses
SET seats = seats - 1
WHERE course_id = 'CS101'
""")
conn.commit()
except Exception as e:
conn.rollback()
print("Error:", e)
finally:
cursor.close()
conn.close()
ERD → Tables → SQL → Transactions
Users → Lists → Tasks
Task(102, 'Alice', 'Finish presentation pending')
Task(200, 'Bob', 'Buy groceries is done')
User creates a list
User adds multiple tasks
One insert fails
Columns: task_id, user_id, status, title
User Data
│
▼
Database
│
▼
Leak → Misuse → Damage
Examples
password = 'rose123'
ssn = '123-45-6789'
diagnosis = 'diabetes'
Regulation
│
▼
Security Controls
│
▼
Compliance / Violation
| Data Type | Risk |
|---|---|
| Password | Account compromise |
| SSN | Identity theft |
| Clinical Data | Privacy violation |
Sensitive Data
Password
SSN
Medical Records
→ Must be protected
Example:
User submits password
over HTTPS
Client
│
▼
Encrypted Channel (TLS)
│
▼
Server
Example:
Stored password (hashed)
Stored SSN (encrypted)
Database Storage
│
▼
Encrypted Data
│
▼
Disk / Backup
User Input
│
▼
SQL Query
│
▼
Injected Logic
│
▼
Database
SELECT * FROM flowers
WHERE name = 'Rose';
flowers table
Rose
Tulip
Lily
query =
"SELECT * FROM flowers
WHERE name = '" + input + "'";
Input joins SQL
Query + Input
│
▼
Unsafe Query
' OR '1'='1
Condition
TRUE OR TRUE
│
▼
All rows returned
SELECT * FROM flowers
WHERE name = '' OR '1'='1';
Filter removed
→ full table scan
→ all flowers returned
Attack Results
Read → Modify → Delete
SELECT * FROM users
WHERE username = input
AND password = input;
input:
' OR '1'='1
Login bypassed
"... WHERE name = '" + input + "'"
WHERE name = ?
-- prepare statement with placeholder
stmt = conn.prepare(
"SELECT *
FROM flowers
WHERE name = ?"
);
-- bind input as data (not SQL)
stmt.execute(input);
? is a parameter placeholder
SQL Template
│
▼
Bind Parameter
│
▼
Execution Engine
│
▼
Safe Query (no injection)
Query + Data
→ NOT merged
→ Safe
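Both behaviors can be demonstrated side by side in SQLite: string concatenation lets the classic `' OR '1'='1` input rewrite the query, while the `?` placeholder binds it as plain data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flowers (name TEXT)")
conn.executemany("INSERT INTO flowers VALUES (?)",
                 [("Rose",), ("Tulip",), ("Lily",)])
conn.commit()

malicious = "' OR '1'='1"

# Unsafe: input merged into SQL text; the WHERE clause becomes always-true.
unsafe = conn.execute(
    "SELECT * FROM flowers WHERE name = '" + malicious + "'").fetchall()
print(len(unsafe))  # 3 -- every row leaked

# Safe: ? binds the input as data, never as SQL.
safe = conn.execute(
    "SELECT * FROM flowers WHERE name = ?", (malicious,)).fetchall()
print(len(safe))  # 0 -- no flower literally has that name
```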
input = user_input
if input.isalpha():
allow()
allowed = ['Rose','Lily']
if input in allowed:
run_query()
input.replace("'", "''")
SELECT * FROM flowers
UNION SELECT * FROM users;
flowers data + users data
'; DELETE FROM flowers; --
flowers table wiped
'; UPDATE flowers SET price=0; --
all prices modified
1. Prepared Statements
2. Input Validation
3. Least Privilege
4. Monitoring
User
│
▼
Application (validation)
│
▼
Prepared SQL
│
▼
Database
Search Flower
Input → Validate → Bind → Query → Result
Plaintext
│
Encrypt(key)
│
▼
Ciphertext
password = 'rose123'
ssn = '123-45-6789'
Sensitive Data
→ Must be encrypted
CREATE EXTENSION pgcrypto;
INSERT INTO payments(card)
VALUES ('4111-2222');
INSERT INTO payments(card)
VALUES (
pgp_sym_encrypt('4111-2222','key')
);
pgp_sym_encrypt(data, key)
Data + Key
│
▼
Encrypted Output
pgp_sym_decrypt(column,'key')
Encrypted Data
│
Decrypt(key)
│
▼
Original Data
INSERT INTO payments(card)
VALUES (
pgp_sym_encrypt('4111-2222','key')
);
Order → Encrypt → Store
SELECT pgp_sym_decrypt(card,'key')
FROM payments;
Encrypted → Decrypt → Output
| Feature | Hash | Encrypt |
|---|---|---|
| Reverse | No | Yes |
| Use | Password | Data |
SELECT crypt('rose123', gen_salt('bf'));
Password → Hash → Store
SELECT crypt('rose123', stored_hash) = stored_hash;
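The crypt/gen_salt calls above are PostgreSQL-specific (pgcrypto). The same salted, slow-hash idea in the Python standard library, with PBKDF2 as a stand-in for bcrypt, looks like this:

```python
import hashlib, hmac, os

def hash_password(password, salt=None, rounds=100_000):
    # Salted, deliberately slow hash; store salt alongside the digest.
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, rounds)
    return salt, digest

def verify_password(password, salt, stored_digest, rounds=100_000):
    _, digest = hash_password(password, salt, rounds)
    return hmac.compare_digest(digest, stored_digest)  # constant-time compare

salt, stored = hash_password("rose123")
print(verify_password("rose123", salt, stored))  # True
print(verify_password("wrong", salt, stored))    # False
```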
Same Key
Encrypt → Decrypt
Public Key → Encrypt
Private Key → Decrypt
Client
│
TLS/SSL
│
Server
Database
│
Encrypted Columns
│
Disk
SELECT * FROM payments
WHERE card = '4111';
Encrypt Sensitive Data
+
Hash Passwords
+
Secure Keys
Input → Encrypt → Store
Query → Decrypt → Output
No Control → Full Access
With Control → Restricted Access
CREATE ROLE user1;
CREATE ROLE alice LOGIN PASSWORD 'pass';
Admin → Full access
Manager → Read + Write
User → Read only
SELECT * FROM flowers;
UPDATE flowers SET price=0;
Access restricted
based on role
GRANT SELECT ON flowers TO user1;
GRANT INSERT ON flowers TO user1;
GRANT UPDATE ON flowers TO user1;
GRANT DELETE ON flowers TO user1;
GRANT ALL ON flowers TO admin;
REVOKE SELECT ON flowers FROM user1;
SELECT * FROM flowers;
GRANT SELECT(name,price)
ON flowers TO user1;
ALTER TABLE flowers OWNER TO admin;
GRANT manager TO alice;
Admin
│
▼
Manager
│
▼
User
GRANT SELECT ON flowers TO user1;
GRANT INSERT, UPDATE ON flowers TO user1;
User → Read flowers
Manager → Update prices
Admin → Full control
Give only required access
User → minimal rights
User Request
│
Check Role
│
Allow / Deny
Public DB
flowers(name, price)
Private DB
users(ssn, password)
payments(card)
Single DB
(all data mixed)
Public DB Private DB
(separated systems)
SELECT name, price
FROM flowers;
SELECT ssn, password
FROM users;
Frontend → Public DB
Backend → Private DB
Internet
│
Public DB (open)
│
Firewall
│
Private DB (restricted)
User Request
│
Public Data Query
│
Sensitive Data → Backend Only
Public DB
│
Replicas
│
High availability
Private DB
│
Encrypted Storage
│
Restricted Access
Shard 1 → Public Data
Shard 2 → Private Data
Attack on Public DB
→ No access to private data
Public DB failure
→ Private DB unaffected
Security ↑
Complexity ↑
User
│
▼
Frontend
│
▼
Public DB
Backend
│
▼
Private DB
Prevent → Detect → Respond
SQL Injection
Encryption
Access Control
Monitoring
Layer 1 → Input Validation
Layer 2 → Access Control
Layer 3 → Encryption
Layer 4 → Monitoring
User Action
│
▼
Log Entry
log_statement = 'all'
log_connections = on
log_disconnections = on
SELECT * FROM flowers;
DELETE FROM flowers;
Timestamp
User
Query
Logs → Audit → Alerts
User: alice
Action: DELETE
Table: flowers
Time: 10:00
Suspicious?
→ Investigate
System Activity
│
▼
Real-Time Monitoring
│
▼
Alert
Too many failed logins
Unexpected DELETE query
Trigger Alert
Input contains:
' OR '1'='1
→ Flag as attack
Key
│
Secure Storage
│
Limited Access
Database
│
▼
Backup
│
▼
Recovery
Internet
│
Firewall
│
Database
Admin → Manage DB
User → Query Data
Auditor → Review Logs
Detect → Contain → Fix → Recover
.DS_Store
# Keep secrets out of source control
.env
.env
# Database Connection String
DATABASE_URL=postgresql://neondb_...
# Other API Secrets
API_KEY=your_secret_key_here
DEBUG=True
Before: Hardcoded Credentials
import psycopg2
import os
DATABASE_URL = (
"postgresql://neondb_owner:npg_M5sVheSzQLv4@..."
)
After: Environment Variables
import psycopg2
import os
from dotenv import load_dotenv
load_dotenv()
DATABASE_URL = os.getenv("DATABASE_URL")
Key trade-off:
performance vs consistency
Examples:
banking, feeds, messaging, dashboards
def replicate(flower_id, flower_data):
    # Replication: write the same row to every database
    for conn in conns:
        conn.cursor().execute(
            "INSERT INTO flowers ...",
            (flower_id, flower_data)
        )
        conn.commit()
def insert_shard(flower_id, flower_data):
    # Sharding: route the row to one database by hashing the id
    conn = conns[flower_id % len(conns)]
    conn.cursor().execute(
        "INSERT INTO flowers ...",
        (flower_id, flower_data)
    )
    conn.commit()
def insert_by_type(flower_type, flower_data):
    # Partitioning: route the row by its type (semantic rule)
    conn = partitions[flower_type]
    conn.cursor().execute(
        "INSERT INTO flowers ...",
        (flower_type, flower_data)
    )
    conn.commit()
They differ in:
Compared to traditional SQL:
simpler locally, but less scalable
+------------------+
| Request: id=145 |
+------------------+
|
v
+------------------+
| Range / Hash Key |
+------------------+
|
v
+------------------+
| Route to db2 |
| rows 101 - 200 |
+------------------+
+----------------------+
| Request: flower info |
+----------------------+
|
v
+----------------------+
| id,type -> db1 |
| color,cost -> db2 |
+----------------------+
|
v
+----------------------+
| Merge final result |
+----------------------+
+------------------+
| Request: id=7 |
+------------------+
|
v
+------------------+
| 7 % 3 = 1 |
+------------------+
|
v
+------------------+
| Route to db2 |
+------------------+
+------------------+
| Request: id=230 |
+------------------+
v
+------------------+
| Check ranges |
| 1-100 -> db1 |
| 101-200 -> db2 |
| 201-300 -> db3 |
+------------------+
v
+------------------+
| Route to db3 |
+------------------+
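The two routing rules shown in the diagrams, as small functions (shard names and id ranges are the slides' example values):

```python
# Example values from the slides: three shards, contiguous id ranges.
RANGES = [(1, 100, "db1"), (101, 200, "db2"), (201, 300, "db3")]

def route(flower_id):
    # Range routing: find the shard whose range contains the id.
    for lo, hi, db in RANGES:
        if lo <= flower_id <= hi:
            return db
    raise KeyError(f"id {flower_id} outside all ranges")

def hash_route(flower_id, n_shards=3):
    # Hash routing: id modulo shard count (7 % 3 = 1 -> db2).
    return f"db{flower_id % n_shards + 1}"

print(route(145))     # db2
print(route(230))     # db3
print(hash_route(7))  # db2
```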
+----------------------+
| Request: type=Rose |
+----------------------+
v
+----------------------+
| Semantic rule lookup |
+----------------------+
v
+----------------------+
| Rose -> db1 |
| Tulip -> db2 |
| Daisy -> db3 |
+----------------------+
+------------------+
| Write request |
+------------------+
/ \
v v
+--------+ +--------+
| db1 | | db2 |
| write | | write |
+--------+ +--------+
\ /
v v
+------------------+
| Conflict check |
+------------------+
+------------------+
| Read request |
+------------------+
v
+------------------+
| Try PostgreSQL |
+------------------+
| success | fail
v v
result +------------------+
| Fallback SQLite |
+------------------+
v
result
+------------------+
| COUNT flowers |
+------------------+
/ | \
v v v
+------+ +------+ +------+
| db1 | | db2 | | db3 |
+------+ +------+ +------+
\ | /
v v v
+------------------+
| Merge totals |
+------------------+
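The scatter-gather COUNT can be simulated with three in-memory SQLite databases standing in for db1–db3 (the row counts are made up for the demo):

```python
import sqlite3

# Three in-memory shards stand in for db1..db3.
shards = [sqlite3.connect(":memory:") for _ in range(3)]
for i, conn in enumerate(shards):
    conn.execute("CREATE TABLE flowers (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO flowers VALUES (?)",
                     [(n,) for n in range(i * 100, i * 100 + 10 + i)])
    conn.commit()

# Scatter the COUNT to every shard, then merge the partial totals.
total = sum(conn.execute("SELECT COUNT(*) FROM flowers").fetchone()[0]
            for conn in shards)
print(total)  # 10 + 11 + 12 = 33
```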
Health -> CP / strong consistency
Social -> AP / eventual consistency
IoT -> hash or time-range partition
Design → Scale → Optimize
Flow:
IDs → Data → Distribution
→ Consistency → Queries → Design
+------------------+
| Client |
+------------------+
/ | \
v v v
+---------+ +---------+ +---------+
| DB1 | | DB2 | | DB3 |
+---------+ +---------+ +---------+
| UUID A | | UUID B | | UUID C |
| data... | | data... | | data... |
+---------+ +---------+ +---------+
id = UUID()
-- bad
id SERIAL
-- good
id UUID PRIMARY KEY
INSERT INTO flowers(id,type)
VALUES (gen_random_uuid(),'Rose');
UUID ≈ 16 bytes
INT ≈ 4 bytes
UUIDv7 → time-based
shard_key = hash(user_id)
user_id → same shard
JOIN → high latency
INSERT ... ON CONFLICT DO NOTHING;
550e8400-e29b-41d4-a716-446655440000
INT → centralized
UUID → distributed safe
import uuid
id = uuid.uuid4()
CREATE TABLE flowers(
id UUID PRIMARY KEY,
name TEXT
);
UUID = 16 bytes
INT = 4 bytes
UUIDv7 → time-based ordering
INSERT ... ON CONFLICT DO NOTHING;
DB1 → UUID ?
DB2 → UUID ?
DB3 → UUID ?
How do we guarantee uniqueness?
2^128 ≈ 3.4 × 10^38 possible IDs
xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
↑
version
Collision probability:
~ 1 in 10^18 (extremely low)
Node A → UUID A
Node B → UUID B
Node C → UUID C
All unique without communication
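This can be seen directly in Python: each "node" calls `uuid4()` independently, and with 122 random bits per id, collisions are vanishingly unlikely.

```python
import uuid

# 1000 ids generated with zero coordination between "nodes".
ids = {uuid.uuid4() for _ in range(1000)}
print(len(ids))  # 1000 distinct ids
```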
Central ID Server → bottleneck
UUID → fully distributed
DB1 → UUID X
DB2 → UUID Y
Merge → no collision
User → Feed Service → Cache → DB
Sensor → Gateway → DB shards
(time-based)
Users → API → Cache → DB
Orders → Strong consistency
Query → Nodes → Merge → Result
Sender → Queue → Receiver
Doctor → Secure DB → Patient Record
hash(id)
range(time)
type(category)
Write → Primary
Read → Replica
Strong | Eventual | Hybrid
Client → API → Router
→ db1/db2/db3
→ Merge → Response
Try → Fail → Retry → Fallback
Design → Scale → Optimize
SELECT SUM(price) FROM flowers;
SELECT COUNT(*) FROM flowers;
SELECT AVG(price) FROM flowers;
SELECT * FROM users WHERE id = 10;
SELECT SUM(balance)
FROM payments;
Row: 3B rows × 20 cols = 60B values read
Col: 3B rows × 1 col = 3B values read
A A A A B B
→ (A, 4), (B, 2)
status:
active active active inactive
→ (active, 3), (inactive, 1)
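Run-length encoding, as column stores apply it to sorted columns, is a one-liner in Python:

```python
from itertools import groupby

# Collapse runs of equal values into (value, run_length) pairs.
def rle(values):
    return [(v, len(list(group))) for v, group in groupby(values)]

print(rle(["A", "A", "A", "A", "B", "B"]))   # [('A', 4), ('B', 2)]
print(rle(["active"] * 3 + ["inactive"]))    # [('active', 3), ('inactive', 1)]
```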
Raw data: ~12 GB
Compressed: ~2 GB (or better)
SELECT COUNT(*) FROM logs;
SELECT hour, COUNT(*)
FROM logs
GROUP BY hour;
Row: 3B rows × 20 cols = 60B values read
Col: 3B rows × 1 col = 3B values read
(A, 5)
(B, 2)
Red Red Red Red
→ (Red, 4)
Active Active Inactive
→ (Active, 2), (Inactive, 1)
-- ClickHouse splits automatically:
SELECT SUM(revenue)
FROM sales;
-- Core 1: SUM rows 0..1B → 4.2B
-- Core 2: SUM rows 1B..2B → 3.8B
-- Core 3: SUM rows 2B..3B → 4.0B
-- Merge: 4.2 + 3.8 + 4.0 = 12.0B
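A toy version of that split-sum-merge flow (tiny list standing in for the 3B rows):

```python
# Split the rows into chunks (one per core), sum each chunk,
# then merge the partial sums.
rows = list(range(1, 10))
chunks = [rows[0:3], rows[3:6], rows[6:9]]
partials = [sum(chunk) for chunk in chunks]
print(partials, sum(partials))  # [6, 15, 24] 45
```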
Row: 3B rows × 20 cols = 60B values read
Col: 3B rows × 1 col = 3B values read
Row storage path for SUM(price):
1. Load full row: [name|color|type|price]
2. Skip name, color, type
3. Extract price
4. Repeat for every row
→ reads ~20× more data than needed
Column storage path for SUM(price):
1. Load price column block only
2. Decompress (RLE / dict)
3. SIMD vectorized SUM
4. Done
→ reads only the data it needs ✓
Scenario A:
Bank app: 10K transactions/sec
ACID guarantees required
→ Which storage?
Scenario B:
Monthly revenue report
50B events, SUM/GROUP BY
→ Which storage?
Scenario C:
Both: what's the architecture?
-- Three questions to ask first:
1. What queries run most often?
→ point lookups vs aggregations
2. What is the read/write ratio?
→ heavy writes → row-based
→ heavy reads → columnar
3. What scale do I need?
→ billions of rows → ClickHouse
→ millions + ACID → PostgreSQL
-- Point lookup: perfect for row DB
SELECT * FROM users WHERE id = 42;
-- Update: row DBs handle this well
UPDATE users
SET email = 'new@email.com'
WHERE id = 42;
-- ClickHouse: analytics at scale
SELECT
toMonth(event_time) AS month,
SUM(revenue) AS total
FROM events
GROUP BY month
ORDER BY month;
-- 50B rows → ~200ms
-- OLTP: PostgreSQL
INSERT INTO orders VALUES (...);
UPDATE users SET email = ...;
-- Sync via ETL pipeline
-- OLAP: ClickHouse
SELECT region, SUM(revenue)
FROM sales
GROUP BY region;
ClickHouse applies this automatically on sorted columns
SELECT COUNT(*)
FROM default.query_metrics_v2;
SELECT
MIN(event_date) AS earliest,
MAX(event_date) AS latest
FROM default.query_metrics_v2;
SELECT uniqExact(test) AS distinct_tests
FROM default.query_metrics_v2;
-- approximate (faster on huge tables):
SELECT uniq(test) AS approx_tests
FROM default.query_metrics_v2;
SELECT AVG(diff) AS avg_diff
FROM default.query_metrics_v2;
-- with rounding for readability:
SELECT round(AVG(diff), 4) AS avg_diff
FROM default.query_metrics_v2;
SELECT
metric,
AVG(new_value) AS avg_new
FROM default.query_metrics_v2
GROUP BY metric
ORDER BY avg_new DESC
LIMIT 1;
SELECT
COUNT(*) AS exceeded
FROM default.query_metrics_v2
WHERE diff > stat_threshold;
-- as a percentage of all runs:
SELECT
round(100 * COUNTIf(
diff > stat_threshold
) / COUNT(*), 2) AS pct
FROM default.query_metrics_v2;
SELECT
metric,
COUNT(*) AS total_runs
FROM default.query_metrics_v2
GROUP BY metric
ORDER BY total_runs DESC;
SELECT
test,
COUNT(*) AS runs
FROM default.query_metrics_v2
GROUP BY test
ORDER BY runs DESC
LIMIT 1;
SELECT
test,
toStartOfMonth(event_date) AS month,
MAX(diff) AS max_diff
FROM default.query_metrics_v2
GROUP BY test, month
ORDER BY test, month;
SELECT
metric,
uniqExact(pr_number) AS distinct_prs
FROM default.query_metrics_v2
GROUP BY metric
ORDER BY distinct_prs DESC;
Using the task table with 1000 randomly generated rows, write get_max_completed_hours() to return the MAX estimated_hours value, filtering on status = 'completed' first.
def get_max_completed_hours():
with get_postgres_conn_cursor() as (_, cur):
timed_postgres_execute(
cur,
"""
SELECT MAX(estimated_hours)
FROM task
WHERE status = 'completed';
""",
label="Postgres MAX completed"
)
rows = cur.fetchall()
print_rows(rows)
Call it from main to test it
Extend with GROUP BY to return mult. rows
Set DATABASE_URL in your .env file
Run python aggregation.py: MAX should print
Update main and run again
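The GROUP BY extension can be sketched self-contained, with an in-memory SQLite table standing in for the Postgres task table (sample rows are made up):

```python
import sqlite3

# MAX per status group instead of a single MAX.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE task (status TEXT, estimated_hours REAL)")
cur.executemany(
    "INSERT INTO task VALUES (?, ?)",
    [("completed", 3.0), ("completed", 7.5), ("pending", 9.0)],
)
cur.execute("""
    SELECT status, MAX(estimated_hours)
    FROM task
    GROUP BY status
""")
rows = cur.fetchall()
print(rows)
```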
Motivation, history, types, ACID, CAP theorem and query languages.
This module is covered in class for context, but will not appear on the final exam.
A flower shop processes hundreds of online payments per day. Each payment must debit the customer's account and credit the shop's account as a single, all-or-nothing operation. A partial write would leave money in limbo. Which database family fits best?
Answer: C: Relational SQL. Financial operations require ACID atomicity. A transaction either commits fully or rolls back. NoSQL BASE semantics risk a debit without a matching credit.
The flower shop sells roses (color, stem length), potted plants (pot size, sunlight), and gift baskets (items list, occasion). Each product type has completely different attributes. Adding a new product type must not require an ALTER TABLE. Which database fits?
Answer: B: Document. Each product is stored as a JSON document with its own fields. No schema migration needed when a new product type is added. SQL would require nullable columns or a complex EAV pattern.
The flower shop's web app creates a login session token for every user. On each page request the app checks whether the token is valid. Sessions expire after 30 minutes of inactivity. The lookup must complete in under 1 ms. Which database fits?
Answer: C: Key-Value (Redis). Sessions are a classic key→value pair (token → user data). Redis stores data in-memory for O(1) reads and supports TTL so expired sessions are deleted automatically. No SQL join needed.
The flower shop's greenhouses have 500 sensors recording temperature and humidity every second: 43 million rows per day. Queries always filter by sensor ID and a time range. Reads are rare; writes are constant and must never block. Which database fits?
Answer: C: Wide-Column (Cassandra). Rows are partitioned by sensor_id and clustered by timestamp. Writes append to a commit log: no locking, no B-tree rebalancing. Exactly the write-heavy, time-series workload Cassandra is designed for.
The flower shop launches a loyalty program where customers can refer friends. The app needs to find "customers within 3 referral hops of a VIP" and detect referral fraud rings. Queries follow chains of relationships, not rows. Which database fits?
Answer: D: Graph (Neo4j). Nodes are customers, edges are referrals. A single Cypher traversal finds all paths within 3 hops natively. SQL recursive CTEs are possible but slow at depth; Redis and Cassandra have no traversal primitives.
Modules 1 to 14 · Flower domain
The flower shop changes disk storage from heap to B-tree. Which independence protects the conceptual schema?
Physical data independence. The internal schema changes but flower(id, name, price, color) stays untouched.
OrderLine stores quantity and unit_price. It cannot exist without an Order. What entity type is it and what is its full key?
Weak entity. Its partial key is line_number. The full key is (order_id, line_number): borrowed from the strong entity Order.
Write relational algebra to find names of flowers priced above 10 supplied from the Netherlands.
π flower.name (σ price>10 ∧ country='Netherlands' (flower ⋈ flower.supplier_id=supplier.id supplier))
A flower shop has a React frontend, a FastAPI backend and a psql admin. Name the interface type each uses to talk to PostgreSQL, and one advantage of each.
React: REST API → backend abstracts SQL, protects DB credentials. FastAPI: SQL driver (psycopg2) → full SQL power, parameterized queries. psql: CLI → direct DDL, EXPLAIN, schema inspection without deploying code.
Find the top 2 suppliers by number of distinct flower types they supply, but only suppliers with more than 3 types.
SELECT s.name,
       COUNT(DISTINCT f.id) AS types
FROM supplier s
JOIN flower f ON f.supplier_id = s.id
GROUP BY s.name
HAVING COUNT(DISTINCT f.id) > 3
ORDER BY types DESC
LIMIT 2;
Table: (order_id, flower_id, flower_name, supplier_country). What normal form violation exists and how do you fix it?
flower_name depends only on flower_id: partial dependency. Violates 2NF. Fix: move flower_name to a separate flower(flower_id, flower_name) table.
A query filters flowers by color = 'red' AND price > 10 ORDER BY price. Design the optimal composite index and explain why column order matters.
CREATE INDEX ON flower (color, price). color first because it is an equality filter. price second for range and sort. Reversed order would not let the index prune on color first.
A flower order deducts stock and inserts an order row. Where do you put the COMMIT and ROLLBACK, and what happens if you leave autocommit on?
Wrap both statements in BEGIN ... COMMIT. If stock goes negative, call ROLLBACK. With autocommit on each statement commits immediately: a crash between the two leaves stock deducted but no order inserted.
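The all-or-nothing flow can be demonstrated with an in-memory SQLite table (names and quantities are made up); Python's `with conn:` wraps both writes in one transaction and rolls back on exception:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (flower TEXT, qty INT)")
conn.execute("CREATE TABLE orders (flower TEXT, qty INT)")
conn.execute("INSERT INTO stock VALUES ('Rose', 1)")
conn.commit()

try:
    with conn:  # BEGIN ... COMMIT, or ROLLBACK on exception
        conn.execute("UPDATE stock SET qty = qty - 2 WHERE flower = 'Rose'")
        conn.execute("INSERT INTO orders VALUES ('Rose', 2)")
        if conn.execute("SELECT qty FROM stock").fetchone()[0] < 0:
            raise ValueError("stock would go negative")
except ValueError:
    pass  # both writes were rolled back together

final_qty = conn.execute("SELECT qty FROM stock").fetchone()[0]
print(final_qty)  # 1 — the deduction was undone
```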
A supplier login should only see their own flowers. How do you enforce this without changing the application query?
Row-level security. Enable RLS on flower, create a policy WHERE supplier_id = current_setting('app.supplier_id')::int. The DB filters automatically per session.
The flower table is sharded by hash(flower_id). A query JOINs flower with supplier. What is the performance problem and one solution?
Cross-shard JOIN: flower rows are on different shards, supplier is not sharded. Solution: co-locate by sharding both tables on supplier_id, or broadcast the small supplier table to all shards.
Write SQL to find colors where the average flower price is above the overall average price of all flowers.
SELECT color,
       AVG(price)
FROM flower
GROUP BY color
HAVING AVG(price) > (SELECT AVG(price) FROM flower);
A customer buys roses. Stock must decrease and order must record atomically. A second customer queries the catalogue and may see a slightly stale price. Which model for each?
Stock + order: ACID. Both writes commit together or both roll back. Catalogue query: BASE / eventual consistency. Stale price for a few seconds is acceptable for read-only browsing.
The flower shop Cassandra cluster splits into two groups. Cassandra is AP. What happens to writes on both sides during the split, and how are conflicts resolved after healing?
Both sides accept writes. After the partition heals Cassandra uses last-write-wins (LWW): the write with the higher timestamp overwrites the other. Data may be inconsistent for the duration of the split.
You store flowers in MongoDB. Write the MQL aggregation to find the top-selling color by total quantity ordered.
db.orders.aggregate([
  {$unwind: "$lines"},
  {$lookup: {from: "flower", localField: "lines.flower_id",
             foreignField: "_id", as: "f"}},
  {$unwind: "$f"},   // lookup returns an array; unwind before grouping
  {$group: {_id: "$f.color", total: {$sum: "$lines.qty"}}},
  {$sort: {total: -1}},
  {$limit: 1}
])
A new requirement: find the top 3 flower suppliers connected within 2 hops in the supply chain. Which database and which query language?
Neo4j · Cypher:
MATCH (s:Supplier)-[:SUPPLIES*1..2]->(f:Flower)
RETURN s.name, COUNT(f) AS total
ORDER BY total DESC LIMIT 3
Why can ClickHouse precompute SUM(price) during a merge but PostgreSQL cannot precompute it for the same query?
ClickHouse uses eventual consistency: a stale precomputed total is acceptable for analytics. PostgreSQL's ACID requires every read to reflect every committed row, so a cached total could violate consistency.
Relation: (flower_id, flower_name, supplier_id, supplier_country). Identify all normal form violations and produce a fully decomposed schema.
3NF violation: flower_id → supplier_id → supplier_country (transitive). Decompose into flower(flower_id, flower_name, supplier_id) and supplier(supplier_id, supplier_country).
Explain why a partial index on flower WHERE stock > 0 is more efficient than a full index on stock for the query: SELECT * FROM flower WHERE stock > 0 AND color = 'red'.
The partial index only stores entries for rows where stock > 0: typically a small fraction of the table. The index is smaller, fits in cache better, and scans fewer entries than a full index on stock would.
Two customers both read that only 1 rose is left in stock and both proceed to buy it. Which anomaly is this, which isolation level prevents it, and how?
Lost update: both transactions read the same value, then both write based on it. SERIALIZABLE prevents it by detecting the conflict and rolling back one transaction. Alternatively use SELECT ... FOR UPDATE to lock the row before the check.
A staff member runs: SELECT * FROM flower WHERE name = 'Rose' OR '1'='1'. How does a parameterized query prevent this SQL injection?
With a parameterized query the driver sends the SQL and the parameter separately. The DB treats the parameter as a string literal, never as SQL. The OR '1'='1' becomes part of the string value, not executable SQL.
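This is easy to demonstrate with SQLite: the injection text is compared as one literal string, so it matches no flower name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flower (name TEXT)")
conn.execute("INSERT INTO flower VALUES ('Rose')")

evil = "Rose' OR '1'='1"
# The ? placeholder sends `evil` as a value, never as SQL text.
rows = conn.execute(
    "SELECT * FROM flower WHERE name = ?", (evil,)
).fetchall()
print(rows)  # [] — the whole input is just a non-matching value
```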
The flower order requires: deduct stock on Shard A and insert order on Shard B. Two-phase commit is too slow. What pattern do you use and what does compensation look like?
Saga pattern. Step 1: deduct stock. Step 2: insert order. If step 2 fails the compensating transaction is: restore stock by incrementing it back. Each step commits independently with a compensating undo.
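A minimal saga sketch, with invented in-memory "shards": each step commits locally, and a failure triggers the compensating undo.

```python
stock = {"Rose": 5}       # lives on Shard A
orders = []               # lives on Shard B

def place_order(flower, qty, shard_b_down=False):
    stock[flower] -= qty                  # step 1: commit on Shard A
    try:
        if shard_b_down:
            raise RuntimeError("Shard B unavailable")
        orders.append((flower, qty))      # step 2: commit on Shard B
    except RuntimeError:
        stock[flower] += qty              # compensation: restore stock

place_order("Rose", 2, shard_b_down=True)
print(stock["Rose"], orders)  # 5 [] — the deduction was compensated
```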
A ClickHouse query runs SELECT AVG(price) FROM flower WHERE color = 'red'. Name three optimisations ClickHouse applies that PostgreSQL cannot.
1) Column pruning: only price.bin and color.bin are opened. 2) Granule pruning: granules where the sparse index shows no red rows are skipped. 3) Vectorized SIMD execution: 8 prices summed in one CPU instruction.
Write a single SQL query to find each customer's total spend, ordered by spend descending, showing only customers who spent more than 100.
SELECT c.name,
       SUM(ol.qty * ol.unit_price) AS total
FROM customer c
JOIN "order" o ON o.customer_id = c.id
JOIN order_line ol ON ol.order_id = o.id
GROUP BY c.name
HAVING SUM(ol.qty * ol.unit_price) > 100
ORDER BY total DESC;
The flower shop stores sensor readings (zone_id, timestamp, lux, pir) at 5-second intervals. Which database, which schema design decision, and why?
Cassandra or InfluxDB. Partition key = zone_id so all readings for a zone are on one node. Clustering key = timestamp so readings are sorted by time on disk. Append-only writes are O(1). Range queries by zone and time window are efficient.
Click / press space to reveal the answer
The flower shop moves disk storage from heap files to a B-tree. SQL queries still work unchanged. Which DBMS property ensures this?
B) Physical independence
Physical independence means a change to the internal (storage) schema does not break the conceptual schema or application. Logical independence protects applications when the conceptual schema changes.
A DBA renames the flowers table to products in the conceptual schema. All
existing application queries still run unchanged. Which DBMS property protects this?
D) Logical independence
Logical independence means changing the conceptual schema does not break external views or applications. Physical independence protects against storage-layer changes; internal and external are schema types, not independence properties.
Customer has a 1:N relationship with Order. Where should the foreign key be placed in SQL?
D) On the Order table, referencing Customer
In 1:N the foreign key goes on the "many" side (Order). A bridge table is only needed for N:N relationships. A composite key is a different concern.
The ERD below links Student and Course directly. A student can enroll in many courses; a course can have many students. What is missing?
[Student] <-----> [Course]
D) A bridge entity (e.g. Enrollment)
N:N cannot be stored with a single foreign key. A bridge entity
(Enrollment) holds both student_id and course_id as foreign keys,
converting the N:N into two 1:N relationships.
A flower shop needs to track Customers, Orders, and Flowers. Rules:
Draw the ERD showing entities, relationships, and cardinalities.
A furniture retailer needs to track Customers, Orders, and Products. Rules:
Draw the ERD showing entities, relationships, and cardinalities.
A bike rental company needs to track Members, Rentals, and Bikes. Rules:
Draw the ERD showing entities, relationships, and cardinalities.
Which operation keeps only specific columns from a relation and discards all others?
C) Projection (π)
π(name, price)(flowers) keeps only those two columns. Selection (σ) filters rows by condition. Difference returns rows in A but not B. Cartesian Product pairs every row from A with every row from B.
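The two operators can be mimicked on rows-as-dicts in Python (sample data made up): selection filters rows, projection keeps columns.

```python
flowers = [
    {"name": "Rose", "color": "Red", "price": 2.50},
    {"name": "Lily", "color": "White", "price": 3.00},
]
selected = [r for r in flowers if r["price"] > 2.75]                 # σ price>2.75
projected = [{k: r[k] for k in ("name", "price")} for r in flowers]  # π name,price
print(selected)
print(projected)
```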
How many errors does this expression have?
π(student_id)(σ(grade > 90)(students ∪ grades))
B) 1: Union requires compatible schemas
Union (∪) requires both relations to have identical attributes.
students and grades have different schemas, so the union is the one error.
Selection (σ) and Projection (π) are applied correctly.
A web application uses a Python library to connect to PostgreSQL and run SQL queries from code. Which interface type is this?
D) Application Programming Interface (API)
API means connecting programmatically from application code. CLI is a text terminal. Admin Interface is a tool like pgAdmin. Application UI is what end users interact with visually.
A database admin opens a terminal, types psql -U admin flowers_db, and runs SQL commands
manually. Which interface type is this?
D) Command-Line Interface (CLI)
CLI means issuing commands directly in a terminal session. Admin Interface uses a GUI tool like pgAdmin. API is used from application code. Application UI is the visual interface end users interact with.
A query groups flowers by color and must exclude colors with fewer than 3 flowers. Which clause handles this filter?
D) HAVING
WHERE filters rows before grouping;
HAVING COUNT(*) >= 3 filters groups after aggregation. ORDER BY
sorts; LIMIT caps row count.
How many errors does this query have?
SELECT color, COUNT(*)
FROM flowers
WHERE COUNT(*) > 2
GROUP BY color;
B) 1: COUNT(*) cannot appear in WHERE
Aggregate functions like COUNT(*) are not allowed in
WHERE. Fix: replace WHERE COUNT(*) > 2 with
HAVING COUNT(*) > 2 after the GROUP BY.
SELECT supplier, SUM(price)
FROM flowers
WHERE SUM(price) > 100
GROUP BY supplier;
B) 1 error
Line 3: SUM(price) is an aggregate and
cannot appear in WHERE. WHERE runs before grouping, so no aggregate value
exists yet. Fix:
SELECT supplier, SUM(price)
FROM flowers
GROUP BY supplier
HAVING SUM(price) > 100;
SELECT color, name, COUNT(*)
FROM flowers
GROUP BY color;
B) 1 error
Line 1: name is neither in
GROUP BY nor wrapped in an aggregate. When grouping by color, each group
has many names: the engine cannot pick one. Fix: add name to GROUP BY, or
remove it, or wrap it: MIN(name).
SELECT color, COUNT(*) AS cnt
FROM flowers
GROUP BY color
ORDER BY cnt DESC
HAVING cnt > 2;
B) 1 error
Line 5: SQL clause order is
GROUP BY → HAVING → ORDER BY. HAVING cannot come after
ORDER BY. Fix:
GROUP BY color
HAVING COUNT(*) > 2
ORDER BY cnt DESC;
SELECT name, AVG(price)
FROM flowers, orders
GROUP BY color
ORDER BY AVG(price) DESC;
C) 2 errors
Line 2: listing two tables without a JOIN
condition creates a Cartesian Product: every flower paired with every order.
Line
3: GROUP BY color but name in SELECT is not
aggregated and not in GROUP BY.
A table's composite primary key is (order_id, flower_id). The column supplier_name
depends only on flower_id. Which normal form is violated?
C) 2NF
2NF requires no partial dependencies: every non-key column must depend
on the whole composite key. supplier_name depends only on
flower_id: fix by splitting it into its own table.
In an orders table, customer_email determines customer_name, but
customer_email is not the primary key. Which normal form does this violate?
D) 3NF: transitive dependency
3NF requires no transitive dependencies: non-key columns must depend only
on the primary key, not on other non-key columns. Fix: move customer_name and
customer_email to a separate customers table.
There is no index on user_id. The engine runs WHERE user_id = 5001 on a
10-million-row table. What does it perform?
D) Full table scan
Without an index the engine inspects every row: work grows with n. B-tree traversal and range scan only happen when a B-tree index exists. A binary tree is an in-memory structure, not a database index.
A composite index exists on (user_id, created_at). Which query benefits most from this
index?
D) WHERE user_id = 5001 ORDER BY created_at DESC
A composite index is used efficiently only when the leading column
(user_id) appears in the filter. Options A–C skip the leading column, so the engine
falls back to a full table scan.
A transaction creates an order, reduces stock, and records payment. The server crashes after step 1. All three changes are rolled back. Which ACID property guarantees this?
D) Atomicity
Atomicity: all-or-nothing. Either all steps commit or none do. Durability means committed data survives crashes. Isolation keeps concurrent transactions separate. Consistency ensures rules hold after each commit.
Which SQL command permanently saves all changes made in the current transaction?
D) COMMIT
COMMIT makes all transaction changes permanent (Durability).
BEGIN starts a transaction. ROLLBACK undoes all changes back to the
BEGIN. SAVEPOINT marks a partial rollback point within a transaction.
A user types ' OR '1'='1 into a login form and gains access without a valid password.
Which attack is this?
C) SQL Injection
SQL injection embeds malicious SQL into a query string to bypass logic. Fixed with parameterized queries. Encryption bypass targets stored data; access control and privilege escalation involve permission misuse, not query manipulation.
A DBA creates a reports role with SELECT on the flowers table, then assigns
that role to user Alice. What access control model is this?
D) Role-Based Access Control (RBAC)
RBAC assigns permissions to roles, then roles to users. Many users can share the same permission set cleanly. SQL injection and encryption address different threats; privilege escalation is unauthorized permission gain.
During a network partition, the CAP theorem says a distributed system cannot simultaneously guarantee which two properties?
D) Consistency and Availability
When a partition occurs, the system must choose: stay consistent (block potentially stale reads) or stay available (return possibly stale data). Partition Tolerance is assumed: a distributed system must survive network failure. Atomicity and Durability are ACID, not CAP.
A distributed flowers database sends rows 1–500,000 to Server A and rows 500,001–1,000,000 to Server B, splitting by row ranges. Which technique is this?
D) Horizontal partitioning
Horizontal partitioning splits rows across nodes (same columns, different rows). Vertical partitioning splits columns. Data replication copies the same data to multiple nodes. Synchronous replication is a consistency strategy, not a partitioning technique.
A query runs SUM(price) on a 20-column table with 1 billion rows. A column-store reads
only the price column. A row-store reads all 20 columns. How many times fewer reads does
the column-store perform?
D) 20×
A row-store reads all 20 columns per row even for a single-column aggregate. A column-store reads only the 1 column needed: 20× fewer reads. This is the core I/O advantage of columnar storage for analytical queries.
Which database covered in Module 13 uses columnar storage designed for fast analytical aggregation across billions of rows?
D) ClickHouse
ClickHouse is the columnar OLAP database studied in Module 13. PostgreSQL and MariaDB are row-oriented relational databases; OracleSQL is the row-store used in Module 10's transactions unit. ClickHouse was accessed live at play.clickhouse.com in labs.
MongoDB stores data as flexible JSON-like records with no fixed schema. Which of the four NoSQL families does MongoDB belong to?
D) Document store
MongoDB is a document store: each record is a JSON-like document with flexible fields. Redis = key-value; Cassandra = column-family; Neo4j = graph. All four are the NoSQL families covered in Module 14.
MongoDB is classified as AP in the CAP theorem. During a network partition, what property does MongoDB sacrifice to remain available?
D) Consistency
AP systems (MongoDB) keep serving requests during a partition but may return stale data: sacrificing strong consistency. CP systems block requests to stay consistent. Atomicity is an ACID term, not a CAP property.
SELECT name, price
FROM flowers
WHERE price > (SELECT AVG(price) FROM flowers)
ORDER BY price DESC;
C) Flowers with above-average price, highest first
The subquery computes AVG(price) once across all flowers.
The outer WHERE keeps only rows above that average. ORDER BY price DESC
then sorts those rows highest to lowest: a classic non-correlated subquery pattern.
SELECT c.name, o.order_id
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
D) Customers who have placed no orders
LEFT JOIN keeps all rows from customers;
unmatched rows get NULL in o.order_id. Filtering
WHERE o.order_id IS NULL isolates exactly those customers: a classic anti-join pattern
for finding missing relationships.
SELECT color, COUNT(*) AS total
FROM flowers
GROUP BY color
HAVING COUNT(*) > 1
ORDER BY total DESC;
C) One row per color, count > 1, sorted descending
GROUP BY color collapses rows into one per color.
HAVING COUNT(*) > 1 drops colors with only one flower.
ORDER BY total DESC puts the most common color first. Result: one summary row per
qualifying color.
SELECT o.order_id, f.name, od.quantity
FROM orders o
INNER JOIN order_details od ON o.order_id = od.order_id
INNER JOIN flowers f ON od.flower_id = f.flower_id
WHERE f.color = 'Red';
C) Orders containing Red flowers, with quantity
The two INNER JOINs chain
orders → order_details → flowers. INNER JOIN drops rows with no match, so
only orders that actually include a Red flower survive. The result gives one row per line item where
color = 'Red'.
How LLMs are changing the SQL interface: tools, risks, and opportunities.
Application
|
| SQLite library
|
flowers.db ← file on disk
Application Code
(Java · Python · C++ · JS)
│
│ SQLite library
│ parse · plan · execute
│
Database File
flowers.db
(tables · rows · indexes)
SQLite is included out of the box on many devices and operating systems.
Open Parts ✔️
Closed or Protected Parts 🔒
-- rename a table
ALTER TABLE flowers
RENAME TO plants;
-- add a new column
ALTER TABLE flowers
ADD COLUMN origin TEXT;
-- rename a column
ALTER TABLE flowers
RENAME COLUMN price TO unit_price;
-- remove a column
ALTER TABLE flowers
DROP COLUMN color;
-- change column type
-- (PostgreSQL syntax; SQLite cannot change a column's type)
ALTER TABLE flowers
ALTER COLUMN price TYPE REAL;
-- add multiple columns
-- (PostgreSQL; SQLite needs one ADD COLUMN per statement)
ALTER TABLE flowers
ADD COLUMN size TEXT,
ADD COLUMN stock INTEGER;
DB Browser for SQLite offers a simple graphical interface to manage SQLite databases.
sqlite3 database_name.db
CREATE TABLE table_name (
column1 datatype,
column2 datatype
);
INSERT INTO table_name (column1, column2)
VALUES (value1, value2);
import sqlite3
conn = sqlite3.connect("db.db")
cur = conn.cursor()
cur.execute("SELECT * FROM table_name;")
val db = SQLiteDatabase.openOrCreateDatabase("db.db", null)
val cursor = db.rawQuery("SELECT * FROM table_name;", null)
#include "sqlite3.h"
sqlite3 *db;
sqlite3_open("db.db", &db);
sqlite3_exec(db, "SELECT * FROM table_name;", 0, 0, 0);
import SQLite3
var db: OpaquePointer?
sqlite3_open("db.db", &db)
sqlite3_exec(db, "SELECT * FROM table_name;", nil, nil, nil)
ACID Basics
Isolation
Multi-User Limits ⚠️
PRAGMA foreign_keys = ON;
PRAGMA foreign_keys;
Plant Table
CREATE TABLE Plant (
PlantID INTEGER PRIMARY KEY,
Name TEXT NOT NULL,
WaterThreshold FLOAT
);
SensorData Table
CREATE TABLE SensorData (
SensorDataID INTEGER PRIMARY KEY,
PlantID INTEGER NOT NULL,
WaterLevel FLOAT,
Timestamp DATETIME,
FOREIGN KEY (PlantID) REFERENCES Plant(PlantID)
);
Create
INSERT INTO Plant (Name, WaterThreshold)
VALUES ('Aloe Vera', 30);
INSERT INTO SensorData (PlantID, WaterLevel, Timestamp)
VALUES (1, 22, '2025-11-27 10:00');
Read
SELECT *
FROM Plant;
SELECT PlantID, WaterLevel
FROM SensorData
ORDER BY Timestamp DESC;
Update
UPDATE Plant
SET WaterThreshold = 40
WHERE PlantID = 1;
UPDATE SensorData
SET WaterLevel = 18
WHERE SensorDataID = 1;
Delete
DELETE FROM Plant
WHERE PlantID = 3;
DELETE FROM SensorData
WHERE Timestamp < '2025-01-01';
Dot Commands
PRAGMA Settings
PRAGMA foreign_keys = ON;
PRAGMA foreign_keys;
Foreign Key Violation
INSERT INTO SensorData (PlantID, WaterLevel, Timestamp)
VALUES (999, 50, '2025-11-27 11:00');
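The violation can be reproduced in-memory with Python's sqlite3: with the pragma ON, SensorData cannot reference a PlantID that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Plant (PlantID INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE SensorData (PlantID INTEGER REFERENCES Plant(PlantID))"
)
try:
    conn.execute("INSERT INTO SensorData VALUES (999)")  # no such plant
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True — FOREIGN KEY constraint failed
```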
CREATE TABLE User (
Name VARCHAR(2)
);
INSERT INTO User (Name)
VALUES ('ABCDE'); -- Works!
-- Strict mode needed for real enforcement
CREATE TABLE UserStrict (
Name TEXT
) STRICT;
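Confirming the point above in Python: ordinary SQLite ignores the VARCHAR(2) length limit. (STRICT mode enforces declared types, but still not character lengths.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (Name VARCHAR(2))")
conn.execute("INSERT INTO User (Name) VALUES ('ABCDE')")  # accepted!
value = conn.execute("SELECT Name FROM User").fetchone()[0]
print(value)  # ABCDE — five characters stored despite VARCHAR(2)
```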
PostgreSQL Database Management System
(formerly known as Postgres, then Postgres95)
Copyright (c) 1996–2024,
The PostgreSQL Global Development Group
Permission to use, copy, modify, and distribute this
software and its documentation for any purpose, without
fee, and without a written agreement is hereby granted,
provided that the above copyright notice and this
paragraph appear in all copies.
IN NO EVENT SHALL THE POSTGRESQL GLOBAL DEVELOPMENT
GROUP BE LIABLE FOR ANY DAMAGES ARISING OUT OF THE
USE OF THIS SOFTWARE.
CREATE TABLE users (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE
);
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) UNIQUE
);
product_id INTEGER
-- example: 101
product_id INTEGER
-- example: 101
product_name VARCHAR(100)
-- example: 'Oak Dining Table'
product_name VARCHAR(100)
-- example: 'Oak Dining Table'
description TEXT
-- example: 'Solid wood table with natural finish'
description TEXT
-- example: 'Solid wood table with natural finish'
in_stock BOOLEAN
-- example: 1
in_stock BOOLEAN
-- example: true
release_date DATE
-- example: '2025-03-01'
release_date DATE
-- example: '2025-03-01'
product_id INTEGER
PRIMARY KEY AUTOINCREMENT
-- example values: 1, 2, 3
product_id SERIAL PRIMARY KEY
-- example values: 1, 2, 3
price NUMERIC CHECK (price > 0)
-- example: 1299.99
price NUMERIC(10,2)
CHECK (price > 0 AND price < 20000)
-- example: 1299.99
SELECT date('now');
-- example result: '2026-02-04'
SELECT date('now','+7 days');
-- example result: '2026-02-11'
SELECT CURRENT_DATE;
-- example result: 2026-02-04
SELECT CURRENT_DATE + INTERVAL '7 days';
-- example result: 2026-02-11
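The SQLite side of this comparison runs anywhere Python is installed. A small sketch showing that `date('now')` returns ISO-8601 TEXT (SQLite has no native DATE type) and that the `+7 days` modifier really advances the date by a week:

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
# both calls see the same 'now' because they run in one statement
today, plus7 = conn.execute(
    "SELECT date('now'), date('now', '+7 days')"
).fetchone()
print(today, plus7)  # e.g. 2026-02-04 2026-02-11

# the TEXT results parse back into real dates, exactly 7 days apart
delta = date.fromisoformat(plus7) - date.fromisoformat(today)
print(delta.days)  # 7
```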
created_at TIMESTAMP
-- example: '2025-03-01 10:30:00'
created_at TIMESTAMP
-- example: '2025-03-01 10:30:00'
created_at TIMESTAMP
-- example: '2025-03-01 10:30:00'
created_at TIMESTAMPTZ
-- example: '2025-03-01 18:30:00+00'
weight REAL
-- example: 45.5
weight REAL
-- example: 45.5
price NUMERIC
-- example: 1299.99
price NUMERIC(10,2)
-- example: 1299.99
specs JSON
-- example: '{"material":"oak","seats":6}'
specs JSONB
-- example: '{"material":"oak","seats":6}'
-- ENUM (PostgreSQL requires creating the type first)
-- CREATE TYPE product_status AS ENUM ('draft','active','discontinued');
status product_status
-- example: 'active'
-- ARRAY
tags TEXT[]
-- example: {'wood','dining','premium'}
-- JSONB
specs JSONB
-- example: {"material":"oak","seats":6}
-- UUID
product_uuid UUID
-- example: '550e8400-e29b-41d4-a716-446655440000'
-- INTERVAL
warranty INTERVAL
-- example: '2 years 6 months'
-- ENUM
SELECT * FROM products WHERE status = 'active';
-- ARRAY
SELECT * FROM products WHERE 'wood' = ANY(tags);
-- JSONB
SELECT * FROM products WHERE specs->>'material' = 'oak';
-- UUID
SELECT *
FROM products
WHERE product_uuid = '550e8400-e29b-41d4-a716-446655440000';
-- INTERVAL
SELECT * FROM products WHERE warranty > INTERVAL '2 years';
INSERT INTO users (name, email)
VALUES ('Alice', 'alice@example.com');
INSERT INTO users (name, email)
VALUES ('Alice', 'alice@example.com')
RETURNING id;
SELECT id, name, email
FROM users
WHERE name = 'Alice';
SELECT id, name, email
FROM users
WHERE name = 'Alice';
UPDATE users
SET email = 'new@example.com'
WHERE id = 1;
UPDATE users
SET email = 'new@example.com'
WHERE id = 1
RETURNING *;
DELETE FROM users
WHERE id = 1;
DELETE FROM users
WHERE id = 1
RETURNING id;
BEGIN;
UPDATE users SET name = 'Bob';
COMMIT;
BEGIN;
UPDATE users SET name = 'Bob';
COMMIT;
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
user_id INTEGER,
FOREIGN KEY (user_id)
REFERENCES users(id)
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT REFERENCES users(id)
);
price TEXT
price INTEGER
price REAL
price NUMERIC(10,2)
data JSONB
id UUID
Application (Frontend / Backend)
|
| network connection (TCP)
|
PostgreSQL Server
| parse · plan · execute
|
Database Storage
(tables · rows · indexes)
Client App
(Python · Java · Node · Go)
│
│ driver / protocol
│ authenticate · send SQL
▼
PostgreSQL Server
parse · plan · execute
│
▼
Storage
tables · rows · indexes
SQLite
PostgreSQL
| Database | License | Restrictions |
|---|---|---|
| SQLite | Public Domain | No restrictions, free for any use |
| PostgreSQL | PostgreSQL License (BSD style) | Requires keeping copyright notice |
PostgreSQL Database Management System
Copyright (c) 1996-2024, The PostgreSQL Global Development Group
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written
agreement is hereby granted, provided that the above copyright notice
and this paragraph appear in all copies.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF, OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
pg_stat_statements extension
-- typical enable (often via postgresql.conf)
-- shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
CREATE TABLE users (
id UUID PRIMARY KEY,
name TEXT
);
-- example value
'550e8400-e29b-41d4-a716-446655440000'
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
total NUMERIC(10,2)
);
INSERT INTO orders (total)
VALUES (99.99)
RETURNING id;
id SERIAL PRIMARY KEY
-- 1,2,3,4
id UUID PRIMARY KEY
-- random pattern
CREATE TABLE products (
id SERIAL PRIMARY KEY,
tags TEXT[]
);
INSERT INTO products (tags)
VALUES ('{wood,dining,premium}');
SELECT *
FROM products
WHERE 'wood' = ANY(tags);
-- overlap operator
SELECT *
FROM products
WHERE tags && '{wood,oak}';
&& checks overlap
data JSON
data JSONB
CREATE TABLE events (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
payload JSONB
);
INSERT INTO events (payload)
VALUES (
'{"type":"click","device":"mobile"}'
);
SELECT *
FROM events
WHERE payload->>'device' = 'mobile';
->> extracts text
SELECT *
FROM events
WHERE payload @>
'{"type":"click"}';
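SQLite has no JSONB, but its built-in JSON1 functions offer a rough analogue of `->>`. A sketch, assuming a JSON-enabled SQLite build (the default in modern distributions); the table mirrors the `events` example above with the payload stored as TEXT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (payload TEXT)")  # JSON stored as TEXT
conn.execute(
    """INSERT INTO events VALUES ('{"type":"click","device":"mobile"}')"""
)

# json_extract('$.device') plays the role of PostgreSQL's payload->>'device'
row = conn.execute(
    "SELECT payload FROM events "
    "WHERE json_extract(payload, '$.device') = 'mobile'"
).fetchone()
print(row)
```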
mysql.user table already exists!
Run mariadb-upgrade, not mariadb-install-db
ERROR 2002 (HY000):
Can't connect to local server
through socket '/tmp/mysql.sock' (2)
FATAL ERROR: Upgrade failed
brew services start mariadb
mysqladmin ping
ls /opt/homebrew/var/mysql/*.sock
sudo ln -sf /opt/homebrew/var/mysql/mysql.sock /tmp/mysql.sock
mariadb-upgrade -uroot
$ python --version
Python 3.11.x
$ pip --version
pip 23.x from ... (python 3.9)
# Result:
# Package installs to different Python
# ModuleNotFoundError
which python
python --version
python -m pip install package_name
python3 -m pip install package_name
# or explicitly:
python3.11 -m pip install package_name
python -m pip --version
MariaDB installed via Windows installer
mysql -u root -p
ERROR 2003 (HY000):
Can't connect to MySQL server
on 'localhost'
sc query MariaDB
net start MariaDB
Win + R
services.msc
→ Find "MariaDB"
→ Right-click → Start
mysql -u root -p
mysqladmin ping
This program is free software; you can redistribute it
and/or modify it under the terms of the GNU General Public
License as published by the Free Software Foundation;
either version 2 of the License.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY.
| Feature | SQLite | PostgreSQL | MariaDB |
|---|---|---|---|
| AutoPrimary | ROWID | SERIAL | AUTO_INCREMENT |
| Keyword | AUTOINCREMENT | GENERATED | AUTO_INCREMENT |
| Feature | SQLite | PostgreSQL | MariaDB |
|---|---|---|---|
| UniqueConstraint | YES | YES | YES |
| MultipleNULL | YES | YES | YES |
| Feature | SQLite | PostgreSQL | MariaDB |
|---|---|---|---|
| NotNull | YES | YES | YES |
| StrictTyping | NO | YES | YES |
| Feature | SQLite | PostgreSQL | MariaDB |
|---|---|---|---|
| DefaultValue | YES | YES | YES |
| NowFunction | datetime | NOW | NOW |
| Feature | SQLite | PostgreSQL | MariaDB |
|---|---|---|---|
| CheckConstraint | YES | YES | YES |
| Deferrable | NO | YES | NO |
| Feature | SQLite | PostgreSQL | MariaDB |
|---|---|---|---|
| ForeignKey | PRAGMA | ON | INNODB |
| Cascade | YES | YES | YES |
| TypeRule | SQLite | PostgreSQL | MariaDB |
|---|---|---|---|
| Typing | AFFINITY | STRICT | STRICT |
| TypeRule | SQLite | PostgreSQL | MariaDB |
|---|---|---|---|
| BooleanType | INTEGER | BOOLEAN | TINYINT |
| TypeRule | SQLite | PostgreSQL | MariaDB |
|---|---|---|---|
| DateStorage | TEXT | DATE | DATE |
| TimestampTZ | NO | YES | NO |
| Type | SQLite | PostgreSQL | MariaDB |
|---|---|---|---|
| JSON | TEXT | JSONB | JSON |
| ARRAY | NO | YES | NO |
| Database | Notes |
|---|---|
| SQLite | PRAGMA foreign_keys = ON required |
| PostgreSQL | Enforced by default; supports DEFERRABLE |
| MariaDB | Requires InnoDB engine |
| Database | Engine Model | Default Engine | Multiple Engines |
|---|---|---|---|
| SQLite | Embedded | Single | NO |
| PostgreSQL | Integrated | Native | NO |
| MariaDB | Pluggable | InnoDB | YES |
| Concept | Meaning |
|---|---|
| Engine | Storage + Locking + Transaction implementation |
| InnoDB | Row-level locking + ACID |
| MyISAM | Table-level locking + No transactions |
| Use Case | SQLite | PostgreSQL | MariaDB |
|---|---|---|---|
| Prototyping | GOOD | GOOD | GOOD |
| SingleUser | GOOD | OK | OK |
| EmbeddedApp | GOOD | NO | NO |
| WebApp | OK | GOOD | GOOD |
| MultiUser | OK | GOOD | GOOD |
| LAMPStack | NO | OK | GOOD |
| Production | OK | GOOD | GOOD |
| Property | SQLite | MariaDB | PostgreSQL |
|---|---|---|---|
| Large ANSI / ISO | LOW | MEDIUM | HIGH |
| FullOuterJoin | YES (3.39+) | NO | YES |
| Deferrable | NO | NO | YES |
| CheckEnforced | BASIC | GOOD | FULL |
| FilterClause | YES (3.30+) | NO | YES |
| StrongTyping | NO | YES | YES |
CREATE TABLE flowers (
id INT,
name TEXT,
color TEXT,
price NUMERIC
);
INSERT INTO flowers VALUES
(1, 'Rose', 'Red', 5),
(2, 'Lily', 'White', 7),
(3, 'Tulip', 'Red', 4),
(4, 'Daisy', 'White', 3);
SELECT
COUNT(*) AS total_flowers,
COUNT(*) FILTER (WHERE color = 'Red') AS red_flowers,
AVG(price) FILTER (WHERE color = 'White') AS avg_white_price
FROM flowers;
Note: FILTER on aggregates is supported in PostgreSQL and in SQLite 3.30+, but not in MariaDB (use CASE expressions instead)
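Because modern SQLite also accepts FILTER, the query above can be run directly through Python's sqlite3 module (a sketch; requires SQLite 3.30+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flowers (id INT, name TEXT, color TEXT, price NUMERIC);
INSERT INTO flowers VALUES
  (1,'Rose','Red',5), (2,'Lily','White',7),
  (3,'Tulip','Red',4), (4,'Daisy','White',3);
""")

# each FILTER restricts one aggregate without adding a WHERE to the query
total, red, avg_white = conn.execute("""
    SELECT COUNT(*),
           COUNT(*) FILTER (WHERE color = 'Red'),
           AVG(price) FILTER (WHERE color = 'White')
    FROM flowers
""").fetchone()
print(total, red, avg_white)  # 4 2 5.0
```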
Oracle Database is typically distributed
under commercial proprietary terms.
Oracle Database Express Edition (XE)
is available for free use under Oracle's
separate license terms, but it is not
open source software.
| Feature | SQLite | PostgreSQL | Oracle |
|---|---|---|---|
| AutoPrimary | ROWID | SERIAL | IDENTITY / SEQUENCE |
| Keyword | AUTOINCREMENT | GENERATED | GENERATED AS IDENTITY |
| Feature | SQLite | PostgreSQL | Oracle |
|---|---|---|---|
| UniqueConstraint | YES | YES | YES |
| MultipleNULL | YES | YES | YES |
| Feature | SQLite | PostgreSQL | Oracle |
|---|---|---|---|
| NotNull | YES | YES | YES |
| StrictTyping | NO | YES | YES |
| Feature | SQLite | PostgreSQL | Oracle |
|---|---|---|---|
| DefaultValue | YES | YES | YES |
| NowFunction | datetime | NOW | SYSTIMESTAMP |
| Feature | SQLite | PostgreSQL | Oracle |
|---|---|---|---|
| CheckConstraint | YES | YES | YES |
| Deferrable | NO | YES | YES |
| Feature | SQLite | PostgreSQL | Oracle |
|---|---|---|---|
| ForeignKey | PRAGMA | ON | ON |
| Cascade | YES | YES | YES |
| TypeRule | SQLite | PostgreSQL | Oracle |
|---|---|---|---|
| Typing | AFFINITY | STRICT | STRICT |
| TypeRule | SQLite | PostgreSQL | Oracle |
|---|---|---|---|
| BooleanType | INTEGER | BOOLEAN | NO NATIVE BOOLEAN |
| TypeRule | SQLite | PostgreSQL | Oracle |
|---|---|---|---|
| DateStorage | TEXT | DATE | DATE / TIMESTAMP |
| TimestampTZ | NO | YES | YES |
| Type | SQLite | PostgreSQL | Oracle |
|---|---|---|---|
| JSON | TEXT | JSONB | JSON |
| ARRAY | NO | YES | COLLECTION TYPES |
| Database | Notes |
|---|---|
| SQLite | PRAGMA foreign_keys = ON required |
| PostgreSQL | Enforced by default; supports DEFERRABLE |
| Oracle | Strong enterprise constraint support |
| Database | Engine Model | Default Engine | Multiple Engines |
|---|---|---|---|
| SQLite | Embedded | Single | NO |
| PostgreSQL | Integrated | Native | NO |
| Oracle | Integrated | Native | NO |
| Concept | Meaning |
|---|---|
| Engine | Storage + locking + transactions |
| Integrated Engine | Core system manages storage internally |
| Enterprise Focus | Recovery, security, and large workloads |
| Use Case | SQLite | PostgreSQL | Oracle |
|---|---|---|---|
| Prototyping | GOOD | GOOD | OK |
| SingleUser | GOOD | OK | OK |
| EmbeddedApp | GOOD | NO | NO |
| WebApp | OK | GOOD | GOOD |
| MultiUser | OK | GOOD | GOOD |
| EnterpriseERP | NO | GOOD | GOOD |
| Production | OK | GOOD | GOOD |
| Property | SQLite | Oracle | PostgreSQL |
|---|---|---|---|
| Large ANSI / ISO | LOW | HIGH | HIGH |
| FullOuterJoin | YES (3.39+) | YES | YES |
| Deferrable | NO | YES | YES |
| CheckEnforced | BASIC | GOOD | FULL |
| FilterClause | YES (3.30+) | NO | YES |
| StrongTyping | NO | YES | YES |
CREATE TABLE flowers (
id NUMBER,
name VARCHAR2(50),
color VARCHAR2(20),
price NUMBER
);
INSERT ALL
INTO flowers VALUES (1, 'Rose', 'Red', 5)
INTO flowers VALUES (2, 'Lily', 'White', 7)
INTO flowers VALUES (3, 'Tulip', 'Red', 4)
INTO flowers VALUES (4, 'Daisy', 'White', 3)
SELECT * FROM dual;
SELECT
COUNT(*) AS total_flowers,
SUM(CASE WHEN color = 'Red' THEN 1 ELSE 0 END) AS red_flowers,
AVG(CASE WHEN color = 'White' THEN price END) AS avg_white_price
FROM flowers;
Note: Traditional Oracle style
Architecture · SQL similarity · Performance · Real-world use
-- Identical to standard SQL:
SELECT
region,
COUNT(*) AS orders,
SUM(revenue) AS total,
AVG(revenue) AS avg_order
FROM sales
WHERE year = 2024
GROUP BY region
ORDER BY total DESC
LIMIT 10;
-- Approximate distinct count (fast)
SELECT uniq(user_id) FROM events;
-- Exact distinct count (slower)
SELECT uniqExact(user_id) FROM events;
-- 95th percentile of response time
SELECT quantile(0.95)(response_ms)
FROM requests;
-- Count events per hour
SELECT toStartOfHour(ts) AS hr,
COUNT(*) AS hits
FROM logs
GROUP BY hr;
-- Creating a MergeTree table
CREATE TABLE sales (
event_date Date,
region String,
product String,
revenue Float64
)
ENGINE = MergeTree()
ORDER BY (event_date, region);
-- ORDER BY = primary sort key
-- ClickHouse uses it to skip
-- irrelevant data blocks
-- Per-column codec configuration
CREATE TABLE metrics (
ts DateTime CODEC(Delta, LZ4),
cpu_pct Float32 CODEC(Gorilla, LZ4),
region LowCardinality(String),
value Float64 CODEC(ZSTD(1))
)
ENGINE = MergeTree()
ORDER BY (region, ts);
-- LowCardinality = built-in
-- dictionary encoding for strings
CREATE TABLE events ON CLUSTER my_cluster
(
ts DateTime,
user UInt64,
action String
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/tables/{shard}/events',
'{replica}'
)
ORDER BY (ts, user);
-- {shard} and {replica} come from
-- server config: auto-assigned
-- Batch insert (efficient)
INSERT INTO events
SELECT * FROM generateRandom(
'user UInt64, ts DateTime,
action String', 1, 10, 2
) LIMIT 100000;
-- Kafka engine table
CREATE TABLE kafka_events
ENGINE = Kafka
SETTINGS
kafka_broker_list = 'broker:9092',
kafka_topic_list = 'events',
kafka_format = 'JSONEachRow';
-- Standard JOIN (works, watch size)
SELECT e.user_id, u.country,
COUNT(*) AS hits
FROM events AS e
JOIN users AS u ON e.user_id = u.id
GROUP BY e.user_id, u.country;
-- Dictionary lookup (faster for enrichment)
SELECT user_id,
dictGet('users_dict', 'country',
user_id) AS country,
COUNT(*) AS hits
FROM events
GROUP BY user_id, country;
Class Schedule:
Time: 08:00 AM - 09:15 AM on Monday, Wednesday, Friday
Location: John T Chambers Technology Center 114 (CTC 114)
A Database Management System (DBMS) is designed for the efficient storage, access, and update of data. This course explores database systems from two main perspectives.
| Teaching Assistant: | Mr. Qazi Haad |
| Email: | q_haad@u.pacific.edu |
| Office: | Student Support Center |
| Office Hours: | By appointment |
| Instructor: | Dr. Solomon Berhe |
| Email: | sberhe@pacific.edu |
| Zoom: | Zoom Meeting Link |
| Office: | CTC 117 |
| Office Hours: | Mon/Wed, 2:00–3:00 PM (or by appointment) |
Textbooks:
| Homework: | Includes written exercises and programming assignments. Submit via Canvas or GitHub. |
| Team Projects: | Two projects (database design, development, and application integration) |
| Exams: | One midterm and one final. Students may bring one handwritten note sheet (8.5x11) |
All course materials, announcements, and assignments will be posted on Canvas.
Check regularly for updates.
Students are expected to act with integrity, honesty, and responsibility. Violations of the Honor Code will be reported to the Office of Student Conduct and Community Standards.
For more details, see the University Academic Honesty Policy.
Source Document: Course Syllabus
Due: February 13
Submission
What is data integrity and why is it important in a database?
Data integrity ensures data remains accurate and consistent by using constraints that prevent errors, duplicates, and invalid entries.
CREATE TABLE Bicycles (
bikeID INTEGER PRIMARY KEY,
model TEXT NOT NULL,
price REAL NOT NULL CHECK(price > 0)
);
Provide examples of key data integrity use cases in a Bike system.
CREATE TABLE Stores (
storeID INTEGER PRIMARY KEY
);
CREATE TABLE Bicycles (
bikeID INTEGER PRIMARY KEY,
model TEXT NOT NULL,
stock INTEGER CHECK(stock >= 0),
storeID INTEGER,
FOREIGN KEY(storeID) REFERENCES Stores(storeID)
);
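The Stores/Bicycles foreign key can be exercised end to end in Python's sqlite3 module. A sketch (sample IDs are illustrative); note that SQLite leaves foreign keys off per connection until the PRAGMA is set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default per connection
conn.executescript("""
CREATE TABLE Stores (storeID INTEGER PRIMARY KEY);
CREATE TABLE Bicycles (
    bikeID INTEGER PRIMARY KEY,
    storeID INTEGER,
    FOREIGN KEY(storeID) REFERENCES Stores(storeID)
);
INSERT INTO Stores VALUES (1);
""")

conn.execute("INSERT INTO Bicycles VALUES (1, 1)")  # store 1 exists: accepted
rejected = False
try:
    conn.execute("INSERT INTO Bicycles VALUES (2, 999)")  # no store 999
except sqlite3.IntegrityError:
    rejected = True  # FOREIGN KEY constraint failed
print("rejected:", rejected)
```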
Explain key SQL constraints.
CREATE TABLE Bicycles (
bikeID INTEGER PRIMARY KEY,
serialNumber TEXT UNIQUE,
model TEXT NOT NULL,
bikeType TEXT DEFAULT 'road',
price REAL CHECK(price > 0)
);
Write SQLite tables demonstrating constraints.
CREATE TABLE Stores (
storeID INTEGER PRIMARY KEY,
storeName TEXT NOT NULL UNIQUE,
city TEXT NOT NULL DEFAULT 'Stockton'
);
CREATE TABLE Bicycles (
bikeID INTEGER PRIMARY KEY AUTOINCREMENT,
serialNumber TEXT NOT NULL UNIQUE,
model TEXT NOT NULL,
bikeType TEXT NOT NULL DEFAULT 'road',
storeID INTEGER NOT NULL,
FOREIGN KEY(storeID) REFERENCES Stores(storeID)
);
What does NOT NULL DEFAULT 'Unknown' mean?
The column cannot be NULL. If no value is given, it automatically becomes "Unknown."
CREATE TABLE Bicycles (
bikeID INTEGER PRIMARY KEY,
brand TEXT NOT NULL DEFAULT 'Unknown',
price REAL NOT NULL
);
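A quick check of this behavior in Python's sqlite3 module (a sketch; the price value is illustrative). Omitting `brand` on insert lets the DEFAULT satisfy NOT NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Bicycles (
    bikeID INTEGER PRIMARY KEY,
    brand TEXT NOT NULL DEFAULT 'Unknown',
    price REAL NOT NULL
)""")

# brand omitted: the DEFAULT fills it in, so NOT NULL is still satisfied
conn.execute("INSERT INTO Bicycles (price) VALUES (499.0)")
brand = conn.execute("SELECT brand FROM Bicycles").fetchone()[0]
print(brand)  # Unknown
```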
Difference between PRIMARY KEY, UNIQUE, and NOT NULL?
CREATE TABLE Bicycles (
bikeID INTEGER PRIMARY KEY,
serialNumber TEXT UNIQUE,
model TEXT NOT NULL
);
Risk of too many constraints?
Excessive constraints can make valid inserts fail and reduce flexibility.
-- Overly strict example
CREATE TABLE Bicycles (
bikeID INTEGER PRIMARY KEY,
model TEXT NOT NULL UNIQUE,
price REAL NOT NULL CHECK(price > 5000),
stock INTEGER CHECK(stock = 1)
);
Why must each table have a PRIMARY KEY?
It uniquely identifies each row so records can be reliably updated or referenced.
CREATE TABLE Bicycles (
bikeID INTEGER PRIMARY KEY,
model TEXT,
bikeType TEXT
);
Why is AUTOINCREMENT useful?
It automatically generates the next unique ID for each new record.
CREATE TABLE Bicycles (
bikeID INTEGER PRIMARY KEY AUTOINCREMENT,
model TEXT NOT NULL
);
Compare PRIMARY KEY and FOREIGN KEY.
PRIMARY KEY uniquely identifies rows in its table. FOREIGN KEY links rows to another table.
CREATE TABLE Stores (
storeID INTEGER PRIMARY KEY
);
CREATE TABLE Bicycles (
bikeID INTEGER PRIMARY KEY,
storeID INTEGER,
FOREIGN KEY(storeID) REFERENCES Stores(storeID)
);
Style
Due: February 13
FlowerShop.db using:
.open FlowerShop.db
.databases
.schema
PRAGMA foreign_key_list(<table_name>);
Stock with default value 50; run .schema after the ALTER
PRAGMA foreign_keys = ON;
PRAGMA foreign_keys;
SELECT * FROM OrderDetails;
DROP TABLE Flowers;
.tables
Will dropping the table only delete the table, or will it also delete the data permanently?
Due: March 30
π(name, price) (
σ(color = 'red' ∧ stockQuantity > 20) (Flowers)
);
π(name) (
Flowers ⨝ Orders
);
T1 ← σ(color = 'Yellow') (F)
T2 ← T1 ⨝ O
T3 ← π(name) (T2)
The API receives a request with a parameter such as color=red. The backend translates this into an SQL query, sends it to the database, retrieves the matching rows, and returns the result as a response to the client.
SELECT *
FROM Flowers
WHERE color = 'red';
SELECT category, AVG(price) AS avg_price
FROM Flowers
GROUP BY category;
SELECT category, AVG(price) AS avg_price
FROM Flowers
GROUP BY category
HAVING AVG(price) > 10;
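The GROUP BY / HAVING pattern above can be verified with Python's sqlite3 module. A sketch with made-up sample categories and prices: WHERE filters rows before grouping, HAVING filters whole groups after aggregation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Flowers (name TEXT, category TEXT, price REAL);
INSERT INTO Flowers VALUES
  ('Rose','Cut',12.0), ('Lily','Cut',14.0),
  ('Fern','Foliage',6.0), ('Ivy','Foliage',4.0);
""")

# only groups whose average price exceeds 10 survive the HAVING clause
rows = conn.execute("""
    SELECT category, AVG(price) AS avg_price
    FROM Flowers
    GROUP BY category
    HAVING AVG(price) > 10
""").fetchall()
print(rows)  # [('Cut', 13.0)]
```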
CREATE TABLE flower (
id INT PRIMARY KEY,
color VARCHAR(50),
heightInCm INT CHECK (heightInCm > 0),
petalCount INT CHECK (petalCount >= 0),
species VARCHAR(100),
bloomingSeason VARCHAR(50) CHECK (bloomingSeason IN ('Spring','Summer','Autumn','Winter')),
price DECIMAL(10,2) CHECK (price >= 0),
fragranceLevel INT CHECK (fragranceLevel BETWEEN 0 AND 10)
);
SELECT f.name AS flower_name,
f.color,
s.name AS supplier_name,
s.location
FROM flower f
INNER JOIN supplier s
ON f.supplier_id = s.id;
SELECT c.name AS customer_name,
c.email,
o.id AS order_id,
o.order_date,
o.amount
FROM customer c
LEFT JOIN "order" o  -- order is a reserved word, so quote it (or name the table orders)
ON c.id = o.customer_id;
SELECT p.name AS product_name,
p.price,
s.id AS sale_id,
s.quantity_sold
FROM product p
LEFT JOIN sale s
ON p.id = s.product_id;
Due: March 30
INSERT INTO Flower (Name, Color, Price)
SELECT CONCAT('Flower_', id), 'Red', 9.99
FROM seq_1_to_10000;
SELECT COUNT(*) FROM Flower;
Result
COUNT = 10000
SELECT * FROM Flower
WHERE Name = 'Rose';
| Scan Type | Cost |
|---|---|
| Full Scan | High |
CREATE INDEX idx_name
ON Flower(Name);
| Column | Indexed |
|---|---|
| Name | Yes |
SELECT * FROM Flower
WHERE Name = 'Rose';
| Scan Type | Cost |
|---|---|
| Index Scan | Low |
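The full-scan vs index-scan difference is visible in SQLite's EXPLAIN QUERY PLAN output. A sketch using Python's sqlite3 module, loading 10,000 rows like the exercise above (row values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flower (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Flower (Name) VALUES (?)",
                 [(f"Flower_{i}",) for i in range(10_000)])

def plan(sql):
    # the last column of each EXPLAIN QUERY PLAN row is the plan text
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM Flower WHERE Name = 'Rose'"
before = plan(query)                                  # full table scan
conn.execute("CREATE INDEX idx_name ON Flower(Name)")
after = plan(query)                                   # index search
print(before)  # e.g. SCAN Flower
print(after)   # e.g. SEARCH Flower USING INDEX idx_name (Name=?)
```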
Flower | Supplier | Phone
Rose | Flora | 123...
Rose | Flora | 123...
ID | Name | Phone1 | Phone2
1 | Rose | 123 | 456
Flower               Supplier
ID | Name            ID | Name
1  | Rose            10 | Flora

Supplier             Contact
ID | Name            ID | Phone
10 | Flora           100 | 123
SELECT f.Name, s.Name
FROM Flower f
JOIN Supplier s
ON f.SupplierID = s.ID;
| Type | Speed |
|---|---|
| Full Scan | Slow |
| Index | Fast |
Due: March 30
Table A              Table B
id | name            id | dept
1  | Rose            1  | Sales

Flower               Order
id | name            id | flower_id
1  | Rose            10 | 1

Customer             Order
id | name            id | cust_id
1  | Ana             -  | NULL
type_id | count
1 | 5
2 | 3
price > 0 ✓
price = -5 ✗ rejected
Due: March 30
| Topic | Purpose |
|---|---|
| Normalization | Reduce redundancy |
| Indexing | Faster lookup |

| Problem | Effect |
|---|---|
| Redundancy | Same data repeated |
| Anomalies | Inconsistent updates |
Before
Employee | Skill
Smith | Cook, Type
After
Employee | Skill
Smith | Cook
Smith | Type
Key: (Employee, Skill)
Bad:
Employee | Skill | Dept
Smith | Cook | Sales
Dept depends only on Employee
Bad:
Employee | Father | Father_Address
Father_Address depends on Father
| Search Type | Method |
|---|---|
| Full Scan | Row by row |
| Index Tree Scan | Search index path |
| Before | After |
|---|---|
| PRODUCTS | PRODUCTS + INDEX |
| Scan rows | Search index first |

| Advantage | Disadvantage |
|---|---|
| Fast lookup | More space used; more maintenance |
Scenario
Log table with constant inserts
→ index maintenance adds overhead
Due: April 13
Warehouse table with columns: id, region, stock
CREATE TABLE Warehouse (
id INT PRIMARY KEY,
region VARCHAR(50),
stock INT
);
INSERT INTO Warehouse VALUES
(1, 'West', 1200),
(2, 'East', 980),
(3, 'Central', 1500),
(4, 'North', 450),
(5, 'South', 820);
-- Total stock
SELECT SUM(stock) AS total_stock
FROM Warehouse;
-- Highest stock region
SELECT region, stock
FROM Warehouse
ORDER BY stock DESC
LIMIT 1;
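Both aggregate queries can be checked against the sample Warehouse data using Python's sqlite3 module (a sketch; the regions and stock figures come from the rows above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Warehouse (id INT PRIMARY KEY, region VARCHAR(50), stock INT);
INSERT INTO Warehouse VALUES
  (1,'West',1200), (2,'East',980), (3,'Central',1500),
  (4,'North',450), (5,'South',820);
""")

# total stock across all regions
total = conn.execute("SELECT SUM(stock) FROM Warehouse").fetchone()[0]
# region holding the most stock
top = conn.execute(
    "SELECT region, stock FROM Warehouse ORDER BY stock DESC LIMIT 1"
).fetchone()
print(total)  # 4950
print(top)    # ('Central', 1500)
```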
Shard A (West, North) → Server 1
Shard B (East, South) → Server 2
Shard C (Central) → Server 3
Sharding → split rows, more storage
Replication → same rows, more availability
Due: April 13
Goal Benefit
Scalability Handle more users/data
Availability System stays up if one node fails
Server A: Customer(id, name, email)
Server B: Customer(id, address, phone)
Challenge Effect
Latency Slower cross-node queries
Consistency Replicas may temporarily differ
Use Case Choice
Global e-commerce Distributed
Small business app Single server
ER diagrams visually represent entities and relationships. They help technical and non-technical stakeholders understand database structure.
SELECT name, age users WHERE id == 5;
1. Missing FROM
2. Missing table name after FROM
3. '==' instead of '='
Customers place orders. Orders contain products.
Books are borrowed by members using a loan record.
order_id = 102
order_total = 19.99
order_date = 2026-03-05
order_id INT
order_total DECIMAL(10,2)
order_date DATE
Primary key uniquely identifies users. Email uniqueness prevents duplicates.
CHECK(age > 0)
UUIDs generate globally unique identifiers across systems.
Relational algebra defines theoretical operations on relations.
Relations are sets of tuples.
Projection returns selected attributes.
Book(book_id,title,year_published,author_id)
Author(author_id,name)
Projection and selection occur before join.
.open library.db
Backend developers connect APIs with database queries.
ALTER TABLE Book
ADD COLUMN publisher TEXT;
CREATE TABLE Book (
book_id INT,
title TEXT,
year_published INT
);
SELECT genre
FROM Book
GROUP BY genre
HAVING COUNT(*) > 5;
Example violating 1NF
BorrowRecord
-----------------------------------------------
books
"Database Systems, SQL Fundamentals, Data Modeling"
location
"Univ library 3601 Pacific Ave"
Problems
---------------------------------------------------------
- books column stores multiple titles in one value
- location column stores multiple pieces of information
- values are not atomic
1NF version
---------------------------------------------------------
title | location_name | street_address
Database Systems | Univ. library | 3601 Pacific Ave
SQL Fundamentals | Univ. library | 3601 Pacific Ave
Data Modeling | Univ. library | 3601 Pacific Ave
Customer
--------------------------------
customer_id INT PRIMARY KEY
name TEXT NOT NULL
email TEXT UNIQUE
phone TEXT
Product
--------------------------------
product_id INT PRIMARY KEY
name TEXT NOT NULL
price DECIMAL(10,2)
stock INT
Order
--------------------------------
order_id INT PRIMARY KEY
customer_id INT
product_id INT
order_date DATE
Customer 1 --- n Order
Product 1 --- n Order
teamX_customers
------------------------
id SERIAL PRIMARY KEY
name VARCHAR(100)
email VARCHAR(100)
teamX_orders
------------------------
id SERIAL PRIMARY KEY
customer_id INT (FK)
flower_id INT (FK)
order_date DATE
\timing on
-- run query
\timing off
CREATE INDEX idx_orders_customer_id
ON teamX_orders(customer_id);
CREATE INDEX idx_orders_flower_id
ON teamX_orders(flower_id);
CREATE INDEX idx_flowers_name
ON teamX_flowers(name);
Please ask anytime: sberhe@pacific.edu