Relational Model & Relational Algebra
Formal Foundation of Relational Databases and Query Languages
100% Exam Frequency
Relational Algebra operators appear in every exam. Be prepared to explain Select, Project, Union, Rename, Natural Join, and Cartesian Product with examples.
1. Introduction to Relational Model
What is the Relational Model?
The Relational Model of data is based on the concept of a Relation, which is a mathematical concept based on the ideas of sets. It was introduced by E.F. Codd in 1970 and provides a formal foundation for relational database management systems.
Informal Definition
A relation is simply a table of values:
- A set of rows (tuples)
- A set of columns (attributes)
- Each row represents a real-world entity or relationship
- Each column has a name (attribute name)
Formal Definition
A relation schema R(A1, A2, ..., An) is defined over attributes A1, A2, ..., An, where each attribute has a domain.
CUSTOMER(Cust-id, Cust-name, Address, Phone#)
-- Cust-id domain: 6-digit numbers
-- Cust-name domain: strings of 25 chars
2. Relational Model Concepts
| Informal Term | Formal Term | Description |
|---|---|---|
| Table | Relation | A two-dimensional structure containing data |
| Column | Attribute / Domain | A named column of a relation |
| Row | Tuple | A single row of a relation |
| Values in a column | Domain | Set of allowed values for an attribute |
| Table Definition | Schema (Intension) | The logical structure of the relation |
| Populated Table | Extension (Instance) | The actual data in the relation at a point in time |
Example: A Tuple
A tuple is an ordered set of values derived from appropriate domains:
<632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000">
-- This tuple belongs to the CUSTOMER relation
Characteristics of Relations
Ordering of Tuples
Tuples are NOT considered to be ordered. The order in which rows appear doesn't matter - they form a set, not a list.
Atomic Values
All values are considered atomic (indivisible). This is a requirement for First Normal Form (1NF).
NULL Values
The special value NULL represents unknown or inapplicable values. It causes complications in many operations.
Attribute Names
Each attribute must have a unique name within a relation. Attributes in R(A1, A2, ..., An) are considered ordered.
3. Relational Model Constraints
Constraints are conditions that must hold on all valid relation instances. There are three main types:
Key Constraints
Define uniqueness requirements on attributes. No two tuples can have the same value for the key attribute(s).
Entity Integrity
Primary key attributes cannot have NULL values. Every tuple must be uniquely identifiable.
Referential Integrity
Foreign key values must either match a primary key value in the referenced relation or be NULL.
4. Concept of Keys
Exam Tip
Understanding the hierarchy: Super Key โ Candidate Key โ Primary Key is essential. Be clear on definitions with examples.
Super Key
A set of attributes SK of R such that no two tuples in any valid relation instance r(R) will have the same value for SK. May contain extra attributes.
Example: For INSTRUCTOR relation with attributes (ID, name, dept_name, salary):
{ID}, {ID, name}, {ID, salary}, {ID, name, salary} are all super keys
Candidate Key
A minimal super key - removal of any attribute from it results in a set that is NOT a super key. Also called "minimal super key".
Example: For CAR(State, Reg#, SerialNo, Make, Model, Year):
- Key1 = {State, Reg#} - Candidate key
- Key2 = {SerialNo} - Candidate key
- {SerialNo, Make} - Super key but NOT a candidate key (not minimal)
Primary Key
One candidate key chosen by the database designer to uniquely identify tuples. Primary key attributes are typically underlined in the schema.
Foreign Key
An attribute (or set of attributes) in one relation that references the primary key of another relation. Used to establish relationships between tables.
DEPARTMENT(DNO, DName, Location)
-- DNO in EMPLOYEE references DNO in DEPARTMENT
Secondary Key
A candidate key that is not selected as the primary key. Also called alternate keys.
Prime vs Non-Prime Attributes
Prime Attribute
An attribute that is a member of some candidate key.
Non-Prime Attribute
An attribute that is NOT a member of any candidate key.
5. Integrity Constraints
Entity Integrity Constraint
The primary key attributes PK of each relation schema R cannot have NULL values in any tuple of r(R).
t[PK] โ NULL for any tuple t in r(R)
This is because primary key values are used to identify individual tuples.
Referential Integrity Constraint
A constraint involving two relations. The foreign key value must either:
- Match an existing primary key value in the referenced table, OR
- Be NULL (if allowed)
Ensures relationships between tables remain consistent.
Update Operations and Constraints
When constraints are violated during INSERT, DELETE, or UPDATE operations, several actions can be taken:
- REJECT: Cancel the operation that causes the violation
- CASCADE: Propagate changes to related tuples
- SET NULL: Set foreign key values to NULL
- SET DEFAULT: Set to a default value
6. Mapping ER Model to Relational Model
Exam Tip
This is a frequently asked 10-mark question: "Draw an ER diagram and map it to relational schema". Know all 8 steps of the mapping algorithm.
ER-to-Relational Mapping Algorithm
Step 1: Mapping of Regular (Strong) Entity Types
For each strong entity E, create a relation R with all simple attributes of E. Choose one key attribute as primary key.
DEPARTMENT(DNUMBER, Dname, Location)
PROJECT(PNUMBER, Pname, Plocation)
Step 2: Mapping of Weak Entity Types
For each weak entity W with owner entity E, create a relation R with all attributes of W. Include owner's primary key as foreign key. Primary key = Owner's PK + Partial key of W.
Step 3: Mapping of Binary 1:1 Relationship Types
Three approaches:
- Foreign Key approach: Add FK to one relation (preferably with total participation)
- Merged relation: Combine both entities (when both have total participation)
- Cross-reference: Create a third relation
Step 4: Mapping of Binary 1:N Relationship Types
Include the primary key of the "one" side as a foreign key in the "many" side relation.
-- DNO references DEPARTMENT (1:N - one dept has many employees)
Step 5: Mapping of Binary M:N Relationship Types
Create a new relation S with primary keys of both participating entities as foreign keys. Their combination forms the primary key of S. Include any relationship attributes.
Step 6: Mapping of Multivalued Attributes
Create a new relation R with the attribute A plus the primary key K of the parent entity as foreign key. Primary key = (K, A).
Step 7: Mapping of N-ary Relationship Types
Create a new relation S with primary keys of all participating entities as foreign keys, plus any relationship attributes.
Step 8: Mapping of Specialization/Generalization
Option 8A: Create relations for superclass AND each subclass (subclass has superclass PK as FK)
Option 8B: Create relations ONLY for subclasses with all inherited attributes (works for total specialization)
Option 8C: Single relation with type attribute (for disjoint)
Option 8D: Single relation with multiple Boolean type attributes (for overlapping)
| ER Model Construct | Relational Model |
|---|---|
| Entity type | "Entity" relation |
| 1:1 or 1:N relationship | Foreign key (or "relationship" relation) |
| M:N relationship | "Relationship" relation with two FKs |
| N-ary relationship | "Relationship" relation with n FKs |
| Simple attribute | Attribute |
| Composite attribute | Set of simple component attributes |
| Multivalued attribute | Relation and foreign key |
| Value set | Domain |
7. Introduction to Relational Algebra
What is Relational Algebra?
Relational Algebra is a procedural query language that consists of a set of operations that take one or two relations as input and produce a new relation as output. It provides the theoretical foundation for SQL.
Characteristics
- Procedural: Specifies what data AND how to retrieve it
- Closure property: Output of an operation is also a relation
- Foundation for SQL: SQL is based on relational algebra
- Operations can be combined into expressions
Types of Operations
- Unary: Operate on one relation (SELECT, PROJECT, RENAME)
- Binary: Operate on two relations (UNION, INTERSECTION, DIFFERENCE)
- Join Operations: Combine related tuples
- Aggregate: Compute summary values
Six Fundamental Operations
Select
Horizontal filter (rows)
Project
Vertical filter (columns)
Union
Combine tuples from both
Set Difference
Tuples in first, not second
Cartesian Product
All combinations
Rename
Rename relation/attributes
8. Unary Relational Operations
ฯ SELECT Operation (Selection)
Selects tuples (rows) that satisfy a given predicate. It is a horizontal filter that produces a subset of tuples.
Notation
ฯp(r) where p is the selection predicate
Examples
ฯDNO=4(EMPLOYEE)
-- Select all employees where department number is 4
ฯSALARY > 30000(EMPLOYEE)
-- Select employees with salary greater than 30,000
ฯdept_name='IT' โง marks > 60(student)
-- Select IT students with marks > 60
Properties:
- Result has the same schema (attributes) as the input relation
- SELECT is commutative: ฯc1(ฯc2(R)) = ฯc2(ฯc1(R))
- Cascaded SELECTs can be combined: ฯc1(ฯc2(R)) = ฯc1 โง c2(R)
ฯ PROJECT Operation (Projection)
Returns the relation with only specified attributes (columns). It is a vertical filter.
Notation
ฯA1, A2, ..., Ak(r) where A1, A2, ..., Ak are attribute names
Examples
ฯLNAME, FNAME, SALARY(EMPLOYEE)
-- Get only name and salary columns
ฯID, name, salary(instructor)
-- Project ID, name, salary from instructor relation
Properties:
- Duplicate rows are removed (since relations are sets)
- Number of tuples in result โค number in input
- If projection includes a key, tuple count remains same
ฯ RENAME Operation
Allows renaming of relations and/or their attributes. Critical for self-joins and clarifying results.
Notation
ฯS(B1, B2, ..., Bn)(R) - Rename R to S with new attribute names
ฯS(R) - Just rename relation to S
Example: Self-Join
-- Find employees who earn more than their supervisor
DEP5_EMPS โ ฯDNO=5(EMPLOYEE)
RESULT โ ฯFNAME, LNAME, SALARY(DEP5_EMPS)
-- Using rename for self-join
ฯd(instructor) -- Rename instructor as 'd' for comparison
SELECT and PROJECT Visualization
9. Set Theory Operations
Type Compatibility (Union Compatibility)
For UNION, INTERSECTION, and DIFFERENCE operations, the two relations must be type compatible:
- Same number of attributes
- Corresponding attributes have compatible domains
โช UNION Operation
Returns all tuples that are in either R or S or both. Duplicate tuples are eliminated.
r โช s = {t | t โ r OR t โ s}
-- Example: Find SSNs of employees who work in dept 5 OR supervise someone in dept 5
DEP5_EMPS โ ฯDNO=5(EMPLOYEE)
RESULT1 โ ฯSSN(DEP5_EMPS)
RESULT2(SSN) โ ฯSUPERSSN(DEP5_EMPS)
RESULT โ RESULT1 โช RESULT2
โฉ INTERSECTION Operation
Returns all tuples that are in BOTH R and S.
r โฉ s = {t | t โ r AND t โ s}
-- Example: Find people who are both students AND instructors
ฯName(STUDENT) โฉ ฯName(INSTRUCTOR)
INTERSECTION can be expressed using other operations: R โฉ S = R โ (R โ S)
โ SET DIFFERENCE (MINUS) Operation
Returns all tuples that are in R but NOT in S.
r โ s = {t | t โ r AND t โ s}
-- Example: Find courses taught in Fall 2009 but NOT in Spring 2010
ฯcourse_id(ฯsemester='Fall' โง year=2009(section)) โ
ฯcourse_id(ฯsemester='Spring' โง year=2010(section))
Note: SET DIFFERENCE is NOT commutative: R โ S โ S โ R
ร CARTESIAN PRODUCT Operation
Combines every tuple of R with every tuple of S. If R has nr tuples and S has ns tuples, result has nr ร ns tuples.
r ร s = {t q | t โ r AND q โ s}
-- Example
FEMALE_EMPS โ ฯSEX='F'(EMPLOYEE)
EMPNAMES โ ฯFNAME, LNAME, SSN(FEMALE_EMPS)
EMP_DEPENDENTS โ EMPNAMES ร DEPENDENT
Cartesian product alone is rarely useful. It's typically combined with SELECT to create meaningful joins.
Set Operations Visualization
10. Binary Relational Operations
รท DIVISION Operation
Used when we need to find tuples in one relation that are related to ALL tuples in another relation. Useful for "for all" type queries.
Definition
R(Z) รท S(X) where X โ Z. Let Y = Z โ X. The result contains tuples t such that for every tuple s in S, there is a tuple (t, s) in R.
Example
Find all customers who have accounts at ALL branches located in Brooklyn:
ฯcustomer_name, branch_name(depositor โ account) รท
ฯbranch_name(ฯbranch_city='Brooklyn'(branch))
11. Join Operations
โ NATURAL JOIN
Combines tuples from two relations based on common attributes. Automatically matches attributes with the same name and removes duplicate columns.
-- Natural join of instructor and teaches on common attribute ID
instructor โ teaches
-- Result has attributes: ID, name, dept_name, salary, course_id, sec_id, semester, year
Danger of Natural Join
Beware of unrelated attributes with the same name getting matched incorrectly!
โฮธ THETA JOIN (Conditional Join)
Combines Cartesian product with a selection condition ฮธ. More general than natural join.
R โcondition S = ฯcondition(R ร S)
-- Example: Join DEPARTMENT and EMPLOYEE where manager SSN matches
DEPARTMENT โMGRSSN=SSN EMPLOYEE
=โ EQUIJOIN
A theta join where the condition uses only equality (=) comparisons. Most common type of join.
-- Find manager name for each department
DEPT_MGR โ DEPARTMENT โMGRSSN=SSN EMPLOYEE
Outer Joins
Outer joins preserve tuples that don't have matching tuples in the other relation by padding with NULLs.
โ LEFT OUTER JOIN
Keeps all tuples from the LEFT relation. Unmatched tuples get NULL values for right relation attributes.
course โ prereq
โ RIGHT OUTER JOIN
Keeps all tuples from the RIGHT relation. Unmatched tuples get NULL values for left relation attributes.
course โ prereq
โ FULL OUTER JOIN
Keeps all tuples from BOTH relations. Unmatched tuples from either side get NULLs.
course โ prereq
12. Aggregate Functions
Aggregate Functions
Functions that compute summary values from collections of data. Cannot be expressed in basic relational algebra.
Common Functions
- COUNT: Number of values/tuples
- SUM: Sum of numeric values
- AVG: Average of numeric values
- MAX: Maximum value
- MIN: Minimum value
Notation
grouping_attrsโฑfunction_list(R)
-- Examples:
โฑMAX Salary(Employee)
-- Maximum salary
DNOโฑCOUNT SSN, AVG Salary(Employee)
-- Count and average per department
13. Relational Algebra Query Examples
Q1: Retrieve name and address of all employees who work for the 'Research' department
โผRESEARCH_DEPT โ ฯDNAME='Research'(DEPARTMENT)
RESEARCH_EMPS โ (RESEARCH_DEPT โDNUMBER=DNO EMPLOYEE)
RESULT โ ฯFNAME, LNAME, ADDRESS(RESEARCH_EMPS)
Q2: Retrieve the names of employees who have no dependents
โผALL_EMPS โ ฯSSN(EMPLOYEE)
EMPS_WITH_DEPS(SSN) โ ฯESSN(DEPENDENT)
EMPS_WITHOUT_DEPS โ ALL_EMPS โ EMPS_WITH_DEPS
RESULT โ ฯLNAME, FNAME(EMPS_WITHOUT_DEPS โ EMPLOYEE)
Q3: Find the largest salary in the university
โผ-- Step 1: Find salaries that are NOT maximum (less than some other)
NOT_MAX โ ฯinstructor.salary(ฯinstructor.salary < d.salary(instructor ร ฯd(instructor)))
-- Step 2: Maximum = All salaries - Non-maximum salaries
MAX_SALARY โ ฯsalary(instructor) โ NOT_MAX
Q4: Find courses offered in Fall 2009 AND Spring 2010
โผFALL_2009 โ ฯcourse_id(ฯsemester='Fall' โง year=2009(section))
SPRING_2010 โ ฯcourse_id(ฯsemester='Spring' โง year=2010(section))
RESULT โ FALL_2009 โฉ SPRING_2010
Q5: Find names of instructors in Physics along with course IDs they taught
โผ-- Method 1:
ฯinstructor.ID, course_id(ฯdept_name='Physics'(ฯinstructor.ID=teaches.ID(instructor ร teaches)))
-- Method 2 (more efficient):
ฯinstructor.ID, course_id(ฯinstructor.ID=teaches.ID(ฯdept_name='Physics'(instructor) ร teaches))
14. Practice Questions
Explain the following Relational Algebra operators with suitable examples: Select, Project, Union, Rename, Natural Join [10 marks]
โผ1. SELECT (ฯ) - Selection Operation:
- Unary operator that selects tuples satisfying a predicate
- Horizontal filtering of rows
- Notation: ฯp(r) where p is the predicate
Example: ฯsalary > 50000(EMPLOYEE) - selects employees with salary > 50000
2. PROJECT (ฯ) - Projection Operation:
- Unary operator that returns specified attributes
- Vertical filtering of columns
- Removes duplicate tuples
Example: ฯname, salary(EMPLOYEE) - projects only name and salary
3. UNION (โช):
- Binary operator combining tuples from both relations
- Relations must be union-compatible
- Eliminates duplicates
Example: ฯname(STUDENT) โช ฯname(INSTRUCTOR)
4. RENAME (ฯ):
- Renames relation and/or attributes
- Used for self-joins and clarifying expressions
Example: ฯS(ID, Name)(EMPLOYEE) - renames to S with new attribute names
5. NATURAL JOIN (โ):
- Combines tuples from two relations on common attributes
- Automatically matches same-named attributes
- Removes duplicate attributes
Example: EMPLOYEE โ DEPARTMENT - joins on common DNO
Define: Super Key, Candidate Key, Primary Key, Foreign Key with examples [5 marks]
โผSuper Key: A set of attributes that uniquely identifies each tuple. May contain extra attributes.
Example: {SSN}, {SSN, Name}, {SSN, Salary} are all super keys of EMPLOYEE
Candidate Key: A minimal super key - removing any attribute makes it non-unique.
Example: For CAR relation - {State, Reg#} and {SerialNo} are candidate keys
Primary Key: The candidate key chosen by designer to identify tuples. Underlined in schema.
Example: EMPLOYEE(SSN, Name, Address) - SSN is primary key
Foreign Key: Attribute(s) in one relation that reference the primary key of another relation.
Example: DNO in EMPLOYEE references DNUMBER in DEPARTMENT
Explain ER to Relational Mapping algorithm with all steps [10 marks]
โผStep 1 - Strong Entity: Create relation with all simple attributes. Entity's key becomes PK.
Step 2 - Weak Entity: Create relation with all attributes + owner's PK as FK. PK = Owner's PK + Partial key.
Step 3 - Binary 1:1: Add FK to one relation (preferably total participation side).
Step 4 - Binary 1:N: Add PK of "1" side as FK in "N" side relation.
Step 5 - Binary M:N: Create new relation with both PKs as FKs, forming composite PK.
Step 6 - Multivalued Attribute: Create new relation with attribute + owner's PK.
Step 7 - N-ary Relationship: Create relation with all participating PKs as FKs.
Step 8 - Specialization/Generalization: Options include separate tables for super/subclasses or combined tables with type attributes.
Explain different types of Join operations in Relational Algebra [5 marks]
โผ1. Natural Join (โ): Joins on all common attributes, removes duplicates.
2. Theta Join (โฮธ): Cartesian product followed by selection on condition ฮธ.
3. Equijoin: Theta join where condition uses only equality (=).
4. Left Outer Join (โ): Keeps all left tuples, NULLs for unmatched right.
5. Right Outer Join (โ): Keeps all right tuples, NULLs for unmatched left.
6. Full Outer Join (โ): Keeps all tuples from both, NULLs where no match.