III

Relational Model & Relational Algebra

Formal Foundation of Relational Databases and Query Languages

100% Exam Frequency

Relational Algebra operators appear in every exam. Be prepared to explain Select, Project, Union, Rename, Natural Join, and Cartesian Product with examples.

1. Introduction to Relational Model

What is the Relational Model?

The Relational Model of data is based on the concept of a Relation, which is a mathematical concept based on the ideas of sets. It was introduced by E.F. Codd in 1970 and provides a formal foundation for relational database management systems.

๐Ÿ“Š

Informal Definition

A relation is simply a table of values:

  • A set of rows (tuples)
  • A set of columns (attributes)
  • Each row represents a real-world entity or relationship
  • Each column has a name (attribute name)
๐Ÿ“

Formal Definition

A relation schema R(A1, A2, ..., An) is defined over attributes A1, A2, ..., An, where each attribute has a domain.

CUSTOMER(Cust-id, Cust-name, Address, Phone#) -- Cust-id domain: 6-digit numbers -- Cust-name domain: strings of 25 chars

2. Relational Model Concepts

Informal Term Formal Term Description
Table Relation A two-dimensional structure containing data
Column Attribute / Domain A named column of a relation
Row Tuple A single row of a relation
Values in a column Domain Set of allowed values for an attribute
Table Definition Schema (Intension) The logical structure of the relation
Populated Table Extension (Instance) The actual data in the relation at a point in time

Example: A Tuple

A tuple is an ordered set of values derived from appropriate domains:

<632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000"> -- This tuple belongs to the CUSTOMER relation

Characteristics of Relations

Ordering of Tuples

Tuples are NOT considered to be ordered. The order in which rows appear doesn't matter - they form a set, not a list.

Atomic Values

All values are considered atomic (indivisible). This is a requirement for First Normal Form (1NF).

NULL Values

The special value NULL represents unknown or inapplicable values. It causes complications in many operations.

Attribute Names

Each attribute must have a unique name within a relation. Attributes in R(A1, A2, ..., An) are considered ordered.

3. Relational Model Constraints

Constraints are conditions that must hold on all valid relation instances. There are three main types:

๐Ÿ”‘

Key Constraints

Define uniqueness requirements on attributes. No two tuples can have the same value for the key attribute(s).

โœ…

Entity Integrity

Primary key attributes cannot have NULL values. Every tuple must be uniquely identifiable.

๐Ÿ”—

Referential Integrity

Foreign key values must either match a primary key value in the referenced relation or be NULL.

4. Concept of Keys

Exam Tip

Understanding the hierarchy: Super Key โ†’ Candidate Key โ†’ Primary Key is essential. Be clear on definitions with examples.

1

Super Key

A set of attributes SK of R such that no two tuples in any valid relation instance r(R) will have the same value for SK. May contain extra attributes.

Example: For INSTRUCTOR relation with attributes (ID, name, dept_name, salary):

{ID}, {ID, name}, {ID, salary}, {ID, name, salary} are all super keys

2

Candidate Key

A minimal super key - removal of any attribute from it results in a set that is NOT a super key. Also called "minimal super key".

Example: For CAR(State, Reg#, SerialNo, Make, Model, Year):

  • Key1 = {State, Reg#} - Candidate key
  • Key2 = {SerialNo} - Candidate key
  • {SerialNo, Make} - Super key but NOT a candidate key (not minimal)
3

Primary Key

One candidate key chosen by the database designer to uniquely identify tuples. Primary key attributes are typically underlined in the schema.

EMPLOYEE(SSN, Name, Address, Salary)
4

Foreign Key

An attribute (or set of attributes) in one relation that references the primary key of another relation. Used to establish relationships between tables.

EMPLOYEE(SSN, Name, DNO)
DEPARTMENT(DNO, DName, Location)
-- DNO in EMPLOYEE references DNO in DEPARTMENT
5

Secondary Key

A candidate key that is not selected as the primary key. Also called alternate keys.

Prime vs Non-Prime Attributes

Prime Attribute

An attribute that is a member of some candidate key.

Non-Prime Attribute

An attribute that is NOT a member of any candidate key.

5. Integrity Constraints

๐Ÿ›ก๏ธ

Entity Integrity Constraint

The primary key attributes PK of each relation schema R cannot have NULL values in any tuple of r(R).

t[PK] โ‰  NULL for any tuple t in r(R)

This is because primary key values are used to identify individual tuples.

๐Ÿ”—

Referential Integrity Constraint

A constraint involving two relations. The foreign key value must either:

  1. Match an existing primary key value in the referenced table, OR
  2. Be NULL (if allowed)

Ensures relationships between tables remain consistent.

Update Operations and Constraints

When constraints are violated during INSERT, DELETE, or UPDATE operations, several actions can be taken:

  • REJECT: Cancel the operation that causes the violation
  • CASCADE: Propagate changes to related tuples
  • SET NULL: Set foreign key values to NULL
  • SET DEFAULT: Set to a default value

6. Mapping ER Model to Relational Model

Exam Tip

This is a frequently asked 10-mark question: "Draw an ER diagram and map it to relational schema". Know all 8 steps of the mapping algorithm.

ER-to-Relational Mapping Algorithm

Step 1: Mapping of Regular (Strong) Entity Types

For each strong entity E, create a relation R with all simple attributes of E. Choose one key attribute as primary key.

EMPLOYEE(SSN, Fname, Lname, Address, Salary)
DEPARTMENT(DNUMBER, Dname, Location)
PROJECT(PNUMBER, Pname, Plocation)
Step 2: Mapping of Weak Entity Types

For each weak entity W with owner entity E, create a relation R with all attributes of W. Include owner's primary key as foreign key. Primary key = Owner's PK + Partial key of W.

DEPENDENT(ESSN, Dependent_Name, Sex, BDate, Relationship)
Step 3: Mapping of Binary 1:1 Relationship Types

Three approaches:

  • Foreign Key approach: Add FK to one relation (preferably with total participation)
  • Merged relation: Combine both entities (when both have total participation)
  • Cross-reference: Create a third relation
DEPARTMENT(DNUMBER, Dname, Mgr_SSN, Mgr_Start_Date)
Step 4: Mapping of Binary 1:N Relationship Types

Include the primary key of the "one" side as a foreign key in the "many" side relation.

EMPLOYEE(SSN, Fname, Lname, DNO)
-- DNO references DEPARTMENT (1:N - one dept has many employees)
Step 5: Mapping of Binary M:N Relationship Types

Create a new relation S with primary keys of both participating entities as foreign keys. Their combination forms the primary key of S. Include any relationship attributes.

WORKS_ON(ESSN, PNO, Hours)
Step 6: Mapping of Multivalued Attributes

Create a new relation R with the attribute A plus the primary key K of the parent entity as foreign key. Primary key = (K, A).

DEPT_LOCATIONS(DNUMBER, DLOCATION)
Step 7: Mapping of N-ary Relationship Types

Create a new relation S with primary keys of all participating entities as foreign keys, plus any relationship attributes.

SUPPLY(SNAME, PARTNO, PROJNAME, Quantity)
Step 8: Mapping of Specialization/Generalization

Option 8A: Create relations for superclass AND each subclass (subclass has superclass PK as FK)

Option 8B: Create relations ONLY for subclasses with all inherited attributes (works for total specialization)

Option 8C: Single relation with type attribute (for disjoint)

Option 8D: Single relation with multiple Boolean type attributes (for overlapping)

ER Model Construct Relational Model
Entity type"Entity" relation
1:1 or 1:N relationshipForeign key (or "relationship" relation)
M:N relationship"Relationship" relation with two FKs
N-ary relationship"Relationship" relation with n FKs
Simple attributeAttribute
Composite attributeSet of simple component attributes
Multivalued attributeRelation and foreign key
Value setDomain

7. Introduction to Relational Algebra

What is Relational Algebra?

Relational Algebra is a procedural query language that consists of a set of operations that take one or two relations as input and produce a new relation as output. It provides the theoretical foundation for SQL.

Characteristics

  • Procedural: Specifies what data AND how to retrieve it
  • Closure property: Output of an operation is also a relation
  • Foundation for SQL: SQL is based on relational algebra
  • Operations can be combined into expressions

Types of Operations

  • Unary: Operate on one relation (SELECT, PROJECT, RENAME)
  • Binary: Operate on two relations (UNION, INTERSECTION, DIFFERENCE)
  • Join Operations: Combine related tuples
  • Aggregate: Compute summary values

Six Fundamental Operations

ฯƒ

Select

Horizontal filter (rows)

ฯ€

Project

Vertical filter (columns)

โˆช

Union

Combine tuples from both

โˆ’

Set Difference

Tuples in first, not second

ร—

Cartesian Product

All combinations

ฯ

Rename

Rename relation/attributes

8. Unary Relational Operations

ฯƒ SELECT Operation (Selection)

Selects tuples (rows) that satisfy a given predicate. It is a horizontal filter that produces a subset of tuples.

Notation

ฯƒp(r) where p is the selection predicate

Examples

ฯƒDNO=4(EMPLOYEE) -- Select all employees where department number is 4 ฯƒSALARY > 30000(EMPLOYEE) -- Select employees with salary greater than 30,000 ฯƒdept_name='IT' โˆง marks > 60(student) -- Select IT students with marks > 60
Properties:
  • Result has the same schema (attributes) as the input relation
  • SELECT is commutative: ฯƒc1(ฯƒc2(R)) = ฯƒc2(ฯƒc1(R))
  • Cascaded SELECTs can be combined: ฯƒc1(ฯƒc2(R)) = ฯƒc1 โˆง c2(R)

ฯ€ PROJECT Operation (Projection)

Returns the relation with only specified attributes (columns). It is a vertical filter.

Notation

ฯ€A1, A2, ..., Ak(r) where A1, A2, ..., Ak are attribute names

Examples

ฯ€LNAME, FNAME, SALARY(EMPLOYEE) -- Get only name and salary columns ฯ€ID, name, salary(instructor) -- Project ID, name, salary from instructor relation
Properties:
  • Duplicate rows are removed (since relations are sets)
  • Number of tuples in result โ‰ค number in input
  • If projection includes a key, tuple count remains same

ฯ RENAME Operation

Allows renaming of relations and/or their attributes. Critical for self-joins and clarifying results.

Notation

ฯS(B1, B2, ..., Bn)(R) - Rename R to S with new attribute names

ฯS(R) - Just rename relation to S

Example: Self-Join

-- Find employees who earn more than their supervisor DEP5_EMPS โ† ฯƒDNO=5(EMPLOYEE) RESULT โ† ฯ€FNAME, LNAME, SALARY(DEP5_EMPS) -- Using rename for self-join ฯd(instructor) -- Rename instructor as 'd' for comparison

SELECT and PROJECT Visualization

Relation R A | B | C | D 1 | a | x | 5 2 | b | y | 8 3 | a | z | 3 ฯƒB='a' SELECT Result 1 | a | x | 5 3 | a | z | 3 ฯ€A,C PROJECT Result A | C 1 | x

9. Set Theory Operations

Type Compatibility (Union Compatibility)

For UNION, INTERSECTION, and DIFFERENCE operations, the two relations must be type compatible:

  • Same number of attributes
  • Corresponding attributes have compatible domains

โˆช UNION Operation

Returns all tuples that are in either R or S or both. Duplicate tuples are eliminated.

r โˆช s = {t | t โˆˆ r OR t โˆˆ s} -- Example: Find SSNs of employees who work in dept 5 OR supervise someone in dept 5 DEP5_EMPS โ† ฯƒDNO=5(EMPLOYEE) RESULT1 โ† ฯ€SSN(DEP5_EMPS) RESULT2(SSN) โ† ฯ€SUPERSSN(DEP5_EMPS) RESULT โ† RESULT1 โˆช RESULT2

โˆฉ INTERSECTION Operation

Returns all tuples that are in BOTH R and S.

r โˆฉ s = {t | t โˆˆ r AND t โˆˆ s} -- Example: Find people who are both students AND instructors ฯ€Name(STUDENT) โˆฉ ฯ€Name(INSTRUCTOR)

INTERSECTION can be expressed using other operations: R โˆฉ S = R โˆ’ (R โˆ’ S)

โˆ’ SET DIFFERENCE (MINUS) Operation

Returns all tuples that are in R but NOT in S.

r โˆ’ s = {t | t โˆˆ r AND t โˆ‰ s} -- Example: Find courses taught in Fall 2009 but NOT in Spring 2010 ฯ€course_id(ฯƒsemester='Fall' โˆง year=2009(section)) โˆ’ ฯ€course_id(ฯƒsemester='Spring' โˆง year=2010(section))

Note: SET DIFFERENCE is NOT commutative: R โˆ’ S โ‰  S โˆ’ R

ร— CARTESIAN PRODUCT Operation

Combines every tuple of R with every tuple of S. If R has nr tuples and S has ns tuples, result has nr ร— ns tuples.

r ร— s = {t q | t โˆˆ r AND q โˆˆ s} -- Example FEMALE_EMPS โ† ฯƒSEX='F'(EMPLOYEE) EMPNAMES โ† ฯ€FNAME, LNAME, SSN(FEMALE_EMPS) EMP_DEPENDENTS โ† EMPNAMES ร— DEPENDENT

Cartesian product alone is rarely useful. It's typically combined with SELECT to create meaningful joins.

Set Operations Visualization

R S R โˆช S (Shaded area) R S R โˆฉ S (Overlap) R S R โˆ’ S (R only)

10. Binary Relational Operations

รท DIVISION Operation

Used when we need to find tuples in one relation that are related to ALL tuples in another relation. Useful for "for all" type queries.

Definition

R(Z) รท S(X) where X โІ Z. Let Y = Z โˆ’ X. The result contains tuples t such that for every tuple s in S, there is a tuple (t, s) in R.

Example

Find all customers who have accounts at ALL branches located in Brooklyn:

ฯ€customer_name, branch_name(depositor โ‹ˆ account) รท ฯ€branch_name(ฯƒbranch_city='Brooklyn'(branch))

11. Join Operations

โ‹ˆ NATURAL JOIN

Combines tuples from two relations based on common attributes. Automatically matches attributes with the same name and removes duplicate columns.

-- Natural join of instructor and teaches on common attribute ID instructor โ‹ˆ teaches -- Result has attributes: ID, name, dept_name, salary, course_id, sec_id, semester, year

Danger of Natural Join

Beware of unrelated attributes with the same name getting matched incorrectly!

โ‹ˆฮธ THETA JOIN (Conditional Join)

Combines Cartesian product with a selection condition ฮธ. More general than natural join.

R โ‹ˆcondition S = ฯƒcondition(R ร— S) -- Example: Join DEPARTMENT and EMPLOYEE where manager SSN matches DEPARTMENT โ‹ˆMGRSSN=SSN EMPLOYEE

=โ‹ˆ EQUIJOIN

A theta join where the condition uses only equality (=) comparisons. Most common type of join.

-- Find manager name for each department DEPT_MGR โ† DEPARTMENT โ‹ˆMGRSSN=SSN EMPLOYEE

Outer Joins

Outer joins preserve tuples that don't have matching tuples in the other relation by padding with NULLs.

โŸ• LEFT OUTER JOIN

Keeps all tuples from the LEFT relation. Unmatched tuples get NULL values for right relation attributes.

course โŸ• prereq

โŸ– RIGHT OUTER JOIN

Keeps all tuples from the RIGHT relation. Unmatched tuples get NULL values for left relation attributes.

course โŸ– prereq

โŸ— FULL OUTER JOIN

Keeps all tuples from BOTH relations. Unmatched tuples from either side get NULLs.

course โŸ— prereq

12. Aggregate Functions

Aggregate Functions

Functions that compute summary values from collections of data. Cannot be expressed in basic relational algebra.

Common Functions

  • COUNT: Number of values/tuples
  • SUM: Sum of numeric values
  • AVG: Average of numeric values
  • MAX: Maximum value
  • MIN: Minimum value

Notation

grouping_attrsโ„ฑfunction_list(R) -- Examples: โ„ฑMAX Salary(Employee) -- Maximum salary DNOโ„ฑCOUNT SSN, AVG Salary(Employee) -- Count and average per department

13. Relational Algebra Query Examples

Q1: Retrieve name and address of all employees who work for the 'Research' department

โ–ผ
RESEARCH_DEPT โ† ฯƒDNAME='Research'(DEPARTMENT) RESEARCH_EMPS โ† (RESEARCH_DEPT โ‹ˆDNUMBER=DNO EMPLOYEE) RESULT โ† ฯ€FNAME, LNAME, ADDRESS(RESEARCH_EMPS)

Q2: Retrieve the names of employees who have no dependents

โ–ผ
ALL_EMPS โ† ฯ€SSN(EMPLOYEE) EMPS_WITH_DEPS(SSN) โ† ฯ€ESSN(DEPENDENT) EMPS_WITHOUT_DEPS โ† ALL_EMPS โˆ’ EMPS_WITH_DEPS RESULT โ† ฯ€LNAME, FNAME(EMPS_WITHOUT_DEPS โ‹ˆ EMPLOYEE)

Q3: Find the largest salary in the university

โ–ผ
-- Step 1: Find salaries that are NOT maximum (less than some other) NOT_MAX โ† ฯ€instructor.salary(ฯƒinstructor.salary < d.salary(instructor ร— ฯd(instructor))) -- Step 2: Maximum = All salaries - Non-maximum salaries MAX_SALARY โ† ฯ€salary(instructor) โˆ’ NOT_MAX

Q4: Find courses offered in Fall 2009 AND Spring 2010

โ–ผ
FALL_2009 โ† ฯ€course_id(ฯƒsemester='Fall' โˆง year=2009(section)) SPRING_2010 โ† ฯ€course_id(ฯƒsemester='Spring' โˆง year=2010(section)) RESULT โ† FALL_2009 โˆฉ SPRING_2010

Q5: Find names of instructors in Physics along with course IDs they taught

โ–ผ
-- Method 1: ฯ€instructor.ID, course_id(ฯƒdept_name='Physics'(ฯƒinstructor.ID=teaches.ID(instructor ร— teaches))) -- Method 2 (more efficient): ฯ€instructor.ID, course_id(ฯƒinstructor.ID=teaches.ID(ฯƒdept_name='Physics'(instructor) ร— teaches))

14. Practice Questions

Explain the following Relational Algebra operators with suitable examples: Select, Project, Union, Rename, Natural Join [10 marks]

โ–ผ

1. SELECT (ฯƒ) - Selection Operation:

  • Unary operator that selects tuples satisfying a predicate
  • Horizontal filtering of rows
  • Notation: ฯƒp(r) where p is the predicate

Example: ฯƒsalary > 50000(EMPLOYEE) - selects employees with salary > 50000

2. PROJECT (ฯ€) - Projection Operation:

  • Unary operator that returns specified attributes
  • Vertical filtering of columns
  • Removes duplicate tuples

Example: ฯ€name, salary(EMPLOYEE) - projects only name and salary

3. UNION (โˆช):

  • Binary operator combining tuples from both relations
  • Relations must be union-compatible
  • Eliminates duplicates

Example: ฯ€name(STUDENT) โˆช ฯ€name(INSTRUCTOR)

4. RENAME (ฯ):

  • Renames relation and/or attributes
  • Used for self-joins and clarifying expressions

Example: ฯS(ID, Name)(EMPLOYEE) - renames to S with new attribute names

5. NATURAL JOIN (โ‹ˆ):

  • Combines tuples from two relations on common attributes
  • Automatically matches same-named attributes
  • Removes duplicate attributes

Example: EMPLOYEE โ‹ˆ DEPARTMENT - joins on common DNO

Define: Super Key, Candidate Key, Primary Key, Foreign Key with examples [5 marks]

โ–ผ

Super Key: A set of attributes that uniquely identifies each tuple. May contain extra attributes.

Example: {SSN}, {SSN, Name}, {SSN, Salary} are all super keys of EMPLOYEE

Candidate Key: A minimal super key - removing any attribute makes it non-unique.

Example: For CAR relation - {State, Reg#} and {SerialNo} are candidate keys

Primary Key: The candidate key chosen by designer to identify tuples. Underlined in schema.

Example: EMPLOYEE(SSN, Name, Address) - SSN is primary key

Foreign Key: Attribute(s) in one relation that reference the primary key of another relation.

Example: DNO in EMPLOYEE references DNUMBER in DEPARTMENT

Explain ER to Relational Mapping algorithm with all steps [10 marks]

โ–ผ

Step 1 - Strong Entity: Create relation with all simple attributes. Entity's key becomes PK.

Step 2 - Weak Entity: Create relation with all attributes + owner's PK as FK. PK = Owner's PK + Partial key.

Step 3 - Binary 1:1: Add FK to one relation (preferably total participation side).

Step 4 - Binary 1:N: Add PK of "1" side as FK in "N" side relation.

Step 5 - Binary M:N: Create new relation with both PKs as FKs, forming composite PK.

Step 6 - Multivalued Attribute: Create new relation with attribute + owner's PK.

Step 7 - N-ary Relationship: Create relation with all participating PKs as FKs.

Step 8 - Specialization/Generalization: Options include separate tables for super/subclasses or combined tables with type attributes.

Explain different types of Join operations in Relational Algebra [5 marks]

โ–ผ

1. Natural Join (โ‹ˆ): Joins on all common attributes, removes duplicates.

2. Theta Join (โ‹ˆฮธ): Cartesian product followed by selection on condition ฮธ.

3. Equijoin: Theta join where condition uses only equality (=).

4. Left Outer Join (โŸ•): Keeps all left tuples, NULLs for unmatched right.

5. Right Outer Join (โŸ–): Keeps all right tuples, NULLs for unmatched left.

6. Full Outer Join (โŸ—): Keeps all tuples from both, NULLs where no match.