Module V: Relational Database Design

1. Introduction to Database Design

📖 What is Relational Database Design?

Relational database design is the process of grouping attributes to form "good" relation schemas. It involves organizing data to minimize redundancy and dependency issues while maintaining data integrity.

Two Levels of Relation Schemas

Logical "User View" Level

How users perceive and interact with the data. Focuses on what data is stored and how it's organized logically.

Storage "Base Relation" Level

How data is physically stored in the database. Design is primarily concerned with this level.

What Makes a "Good" Relation?

We use concepts of functional dependencies and normal forms to evaluate and improve database designs:

1NF - First Normal Form
2NF - Second Normal Form
3NF - Third Normal Form
BCNF - Boyce-Codd Normal Form

2. Informal Design Guidelines

1

Semantics of Relation Attributes

Each tuple should represent one entity or relationship instance. Attributes of different entities (EMPLOYEE, DEPARTMENT, PROJECT) should not be mixed in the same relation. Only foreign keys should be used to refer to other entities.

💡 Bottom Line

Design a schema that can be explained easily, relation by relation. The semantics of attributes should be easy to interpret.

2

Minimize Redundant Information

Design schemas that do not suffer from insertion, deletion, and update anomalies. If anomalies exist, document them so applications can handle them appropriately.

3

Reduce NULL Values

Relations should be designed to have as few NULL values as possible. Frequently NULL attributes should be placed in separate relations with the primary key.

4

Avoid Spurious Tuples

Relations should be designed to satisfy the lossless join condition. No spurious (false) tuples should be generated by natural-join operations.

3. Update Anomalies

Mixing attributes of multiple entities in a single relation causes several problems:

📝 Example: Poorly Designed Relation

Consider: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours, Plocation, Dnum)

This combines employee and project information, leading to the following anomalies:

🔄 Update Anomaly

Changing project P1's name from "Billing" to "Customer-Accounting" requires updating all 100 rows where employees work on P1.

➕ Insertion Anomaly

Cannot insert a new project unless an employee is assigned to it. Cannot add an employee without assigning them to a project.

➖ Deletion Anomaly

Deleting a project removes all employees who work only on that project. Deleting the sole employee on a project deletes the project information.

Example: EMP_DEPT Relation with Redundancy

Emp#	Ename	Dept#	Dname	Dmgr_ssn
E1	John	D5	Research	123456789
E2	Smith	D5	Research	123456789
E3	Jane	D5	Research	123456789

⚠️ Department information is repeated for every employee - this is redundancy!

4. Functional Dependencies (FDs)

📖 Definition

A functional dependency X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y. In other words, the value of X uniquely determines the value of Y.

Notation

X → Y reads as "X functionally determines Y" or "Y is functionally dependent on X"

X is called the determinant (left-hand side)
Y is called the dependent (right-hand side)

Examples of Functional Dependencies

SSN → ENAME // SSN determines employee name

PNUMBER → {PNAME, PLOCATION} // Project number determines name and location

{SSN, PNUMBER} → HOURS // Employee + Project determines hours worked

Properties of FDs

An FD is a property of the schema R, not a particular instance
The constraint must hold on every relation instance r(R)
If K is a key of R, then K functionally determines all attributes in R
FDs are derived from real-world constraints on the attributes

Closure of a Set of FDs (F⁺)

📖 Closure F⁺

The closure of F (denoted F⁺) is the set of all functional dependencies that can be inferred from F. It includes F itself plus all dependencies that logically follow from F.

📝 Example

Given: F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN}}

Inferred FDs in F⁺:

SSN → DNAME (by transitivity)
SSN → DMGRSSN (by transitivity)
SSN → SSN (reflexive)
DNUMBER → DNAME (decomposition)

5. Inference Rules for FDs (Armstrong's Axioms)

Given a set of FDs F, we can infer additional FDs using these rules:

IR1: Reflexive Rule

If Y ⊆ X, then X → Y

Any set of attributes determines its subset

IR2: Augmentation Rule

If X → Y, then XZ → YZ

Adding same attributes to both sides preserves FD

IR3: Transitive Rule

If X → Y and Y → Z, then X → Z

Dependencies can be chained

📝 Sound and Complete

Armstrong's axioms (IR1, IR2, IR3) are sound (only derive valid FDs) and complete (can derive all FDs in F⁺).

Additional Derived Rules

Rule	Definition	Derived From
Decomposition	If X → YZ, then X → Y and X → Z	IR1, IR3
Union	If X → Y and X → Z, then X → YZ	IR2, IR3
Pseudotransitivity	If X → Y and WY → Z, then WX → Z	IR2, IR3

6. The Normalization Process

📖 What is Normalization?

Normalization is the process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller, well-structured relations. It eliminates redundancy and minimizes anomalies.

Why Normalize?

Eliminate redundant data (same data in multiple places)
Ensure data dependencies make sense (related data stored together)
Reduce data modification anomalies
Simplify query operations

Normal Forms Progression

UN

Unnormalized

→

1NF

Atomic Values

→

2NF

No Partial Dep.

→

3NF

No Transitive Dep.

→

BCNF

Every Det. is Key

⚠️ Key Definitions

Prime Attribute: An attribute that is a member of some candidate key
Non-Prime Attribute: An attribute that is NOT a member of any candidate key
Candidate Key: A minimal superkey (removing any attribute loses the uniqueness property)
Superkey: A set of attributes that uniquely identifies tuples

7. First Normal Form (1NF)

1NF

First Normal Form

📖 Definition

A relation is in 1NF if and only if every attribute contains only atomic (indivisible) values. No repeating groups, composite attributes, multivalued attributes, or nested relations are allowed.

1NF Violation Example

Before 1NF:

Student_ID	Name	Phone_Numbers
S1	John	9876543210, 9123456789
S2	Jane	8765432109

❌ Phone_Numbers contains multiple values - NOT atomic!

After 1NF:

Student_ID	Name	Phone_Number
S1	John	9876543210
S1	John	9123456789
S2	Jane	8765432109

✓ Each cell contains exactly one value - 1NF satisfied!

📝 Note

1NF is considered part of the basic definition of a relation in modern database theory.

8. Second Normal Form (2NF)

2NF

Second Normal Form

📖 Definition

A relation is in 2NF if:

It is in 1NF, AND
Every non-prime attribute is fully functionally dependent on the primary key (no partial dependencies)

Full vs Partial Functional Dependency

Full FD: X → Y where removing any attribute from X means the FD no longer holds
Partial FD: X → Y where Y depends on only a part of a composite key X

2NF Violation Example

Relation: WORKS_ON(SSN, Pnumber, Hours, Ename, Pname, Plocation)

Primary Key: {SSN, Pnumber}

✓ {SSN, Pnumber} → Hours (Full dependency - OK)

❌ SSN → Ename (Partial dependency - VIOLATION!)

❌ Pnumber → {Pname, Plocation} (Partial dependency - VIOLATION!)

Decomposition to 2NF:

WORKS_ON

(SSN, Pnumber, Hours, Ename, Pname, Plocation)

→

EMPLOYEE

(SSN, Ename)

PROJECT

(Pnumber, Pname, Plocation)

WORKS_ON

(SSN, Pnumber, Hours)

📝 Key Point

2NF is relevant only when there is a composite primary key. If the primary key is a single attribute, 1NF automatically implies 2NF (no partial dependencies possible).

9. Third Normal Form (3NF)

3NF

Third Normal Form

📖 Definition

A relation is in 3NF if:

It is in 2NF, AND
No non-prime attribute is transitively dependent on the primary key

Transitive Dependency: X → Y and Y → Z implies X → Z (where Y is not a candidate key)

Alternative Definition (More General)

A relation R is in 3NF if, for every non-trivial FD X → A:

X is a superkey of R, OR
A is a prime attribute (part of some candidate key)

3NF Violation Example

Relation: EMPLOYEE(Emp_ID, Ename, Zip_Code, City)

Emp_ID → Zip_Code (Direct dependency)

Zip_Code → City (Direct dependency)

❌ Emp_ID → City (Transitive via Zip_Code - VIOLATION!)

Since City depends on Zip_Code (not a candidate key), and Zip_Code depends on Emp_ID, we have a transitive dependency.

Decomposition to 3NF:

EMPLOYEE

(Emp_ID, Ename, Zip_Code, City)

→

EMPLOYEE

(Emp_ID, Ename, Zip_Code)

LOCATION

(Zip_Code, City)

📝 Example: SSN → DMGRSSN Transitive Dependency

Consider: SSN → DNUMBER and DNUMBER → DMGRSSN

This means SSN → DMGRSSN transitively (through DNUMBER)

To fix: Separate the DEPARTMENT information into its own relation.

10. Boyce-Codd Normal Form (BCNF)

BCNF

Boyce-Codd Normal Form

📖 Definition

A relation R is in BCNF if for every non-trivial functional dependency X → A:

X is a superkey of R

In other words, the left-hand side of every FD must be a superkey.

BCNF vs 3NF

Aspect	3NF	BCNF
Definition	X is superkey OR A is prime	X must be superkey (stricter)
Strength	Less strict	More strict (⊆ 3NF)
Dependency Preservation	Always possible	May not be possible

Example: 3NF but NOT BCNF

Relation: TEACH(Student, Course, Instructor)

Constraint: Each instructor teaches only one course

{Student, Course} → Instructor (PK determines Instructor)

Instructor → Course (Instructor is NOT a superkey - BCNF VIOLATION!)

{Student, Course} is a candidate key. Since Instructor → Course exists and Instructor is not a superkey, this violates BCNF.

However, Course is a prime attribute (part of the key), so 3NF is satisfied!

Decomposition to BCNF:

TEACH

(Student, Course, Instructor)

→

INSTRUCTOR_COURSE

(Instructor, Course)

STUDENT_INSTRUCTOR

(Student, Instructor)

⚠️ Note: The FD {Student, Course} → Instructor is lost in this decomposition!

⚠️ BCNF Trade-off

BCNF is the "gold standard" for eliminating all redundancy based on FDs. However, achieving BCNF may require sacrificing dependency preservation. In practice, 3NF is often acceptable.

11. Lossless (Non-Additive) Decomposition

📖 Lossless Join Property

A decomposition D = {R₁, R₂, ..., Rₘ} of R has the lossless (non-additive) join property if for every relation instance r of R:

π_R₁(r) ⋈ π_R₂(r) ⋈ ... ⋈ π_Rₘ(r) = r

This means joining the decomposed relations gives back the original relation exactly - no spurious tuples are added.

Binary Decomposition Test

Property LJ1

A decomposition D = {R₁, R₂} of R is lossless with respect to F if and only if:

(R₁ ∩ R₂) → (R₁ - R₂) is in F⁺, OR
(R₁ ∩ R₂) → (R₂ - R₁) is in F⁺

In simple terms: the common attributes must be a key for at least one of the decomposed relations.

Dependency Preservation

📖 Definition

A decomposition is dependency preserving if the union of FDs that can be checked in each decomposed relation is equivalent to the original set of FDs. This allows constraints to be verified without performing joins.

Lossless Join

CRITICAL PROPERTY - Must always be satisfied. Cannot be sacrificed.

Dependency Preservation

Desirable but may be sacrificed to achieve BCNF.

12. Normal Forms Comparison

Normal Form	Requirement	Eliminates
1NF	All attributes contain atomic values only	Repeating groups, multivalued attributes
2NF	1NF + No partial dependencies	Redundancy from partial key dependencies
3NF	2NF + No transitive dependencies on non-prime attributes	Redundancy from transitive dependencies
BCNF	Every determinant is a superkey	All redundancy from functional dependencies

Normal Form Hierarchy

All Relations

1NF

2NF

3NF

BCNF

Every BCNF relation is in 3NF, every 3NF relation is in 2NF, etc.

📝 Practical Guidelines

Database designers typically normalize to 3NF or BCNF
Higher normal forms (4NF, 5NF) exist but are rarely needed in practice
Denormalization may be used for performance optimization

13. Practice Questions

Explain 1NF, 2NF, 3NF, and BCNF with suitable examples. (10 Marks - Most Frequent)

▼

First Normal Form (1NF)

Definition: A relation is in 1NF if all attributes contain only atomic (indivisible) values.

Violation: Student(ID, Name, Subjects) where Subjects = "Math, Physics" (non-atomic)

Resolution: Separate rows for each subject.

Second Normal Form (2NF)

Definition: A relation is in 2NF if it's in 1NF and every non-prime attribute is fully functionally dependent on the primary key.

Violation: Grades(Student_ID, Course_ID, Professor_Name) where Professor_Name depends only on Course_ID.

Resolution: Decompose into Courses(Course_ID, Professor_Name) and Grades(Student_ID, Course_ID).

Third Normal Form (3NF)

Definition: A relation is in 3NF if it's in 2NF and no non-prime attribute is transitively dependent on the primary key.

Violation: Employee(Emp_ID, Zip_Code, City) where Emp_ID → Zip_Code → City

Resolution: Decompose into Employee(Emp_ID, Zip_Code) and Location(Zip_Code, City).

Boyce-Codd Normal Form (BCNF)

Definition: A relation is in BCNF if for every non-trivial FD X → Y, X is a superkey.

Difference from 3NF: BCNF does not allow the exception for prime attributes. It's stricter than 3NF.

Status: BCNF is the "gold standard" for eliminating all FD-based redundancy.

Why is there a need for normalization?

▼

Normalization is needed to:

Eliminate Redundancy: Avoid storing the same data in multiple places, saving storage space.
Prevent Update Anomalies: Ensure that modifying data in one place doesn't require changes in multiple locations.
Prevent Insertion Anomalies: Allow adding new data without requiring unrelated data to exist.
Prevent Deletion Anomalies: Ensure that deleting some data doesn't unintentionally remove other important data.
Ensure Data Integrity: Maintain consistency and accuracy of data.
Simplify Queries: Well-structured tables are easier to query and maintain.

What is a Functional Dependency? Explain with example.

▼

Definition: A functional dependency X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y.

Examples:

SSN → Name: Given a Social Security Number, the employee name is uniquely determined.
Student_ID → {Name, DOB, Address}: A student ID determines all student attributes.
{Course_ID, Semester} → Instructor: A course in a specific semester has one instructor.

Properties:

FDs are constraints on the schema, not on specific data
If K is a key, then K → all attributes
FDs are derived from real-world semantics

Explain lossless decomposition with suitable example.

▼

Definition: A decomposition is lossless (non-additive) if joining the decomposed relations produces exactly the original relation without any spurious tuples.

Test for Binary Decomposition:

D = {R₁, R₂} is lossless if: (R₁ ∩ R₂) → (R₁ - R₂) OR (R₁ ∩ R₂) → (R₂ - R₁)

Example:

Original: EMPLOYEE(Emp_ID, Name, Dept_ID, Dept_Name)

FD: Dept_ID → Dept_Name

Decomposition:

R₁ = EMPLOYEE(Emp_ID, Name, Dept_ID)
R₂ = DEPARTMENT(Dept_ID, Dept_Name)

R₁ ∩ R₂ = {Dept_ID}

R₂ - R₁ = {Dept_Name}

Since Dept_ID → Dept_Name is in F, this is a lossless decomposition.

Given relation R(A, B, C, D) with FDs: A→B, B→C. Identify the highest normal form.

▼

Step 1: Find Candidate Key

A⁺ = {A, B, C} (A determines B, B determines C)

To get D, we need {A, D}

Candidate Key = {A, D}

Step 2: Identify Prime and Non-Prime Attributes

Prime: A, D
Non-Prime: B, C

Step 3: Check Normal Forms

1NF: ✓ (Assuming atomic values)
2NF: ❌ A→B is a partial dependency (A is part of key {A,D})

Answer: The relation is in 1NF only.

To normalize to 3NF:

R1(A, B)
R2(B, C)
R3(A, D)

Explain Armstrong's Inference Rules.

▼

Armstrong's axioms are a set of inference rules for deriving all functional dependencies:

Primary Rules (Axioms):

Reflexivity: If Y ⊆ X, then X → Y
Example: ABC → AB, ABC → A
Augmentation: If X → Y, then XZ → YZ
Example: If A → B, then AC → BC
Transitivity: If X → Y and Y → Z, then X → Z
Example: If A → B and B → C, then A → C

Derived Rules:

Decomposition: If X → YZ, then X → Y and X → Z
Union: If X → Y and X → Z, then X → YZ
Pseudotransitivity: If X → Y and WY → Z, then WX → Z

These rules are sound (derive only valid FDs) and complete (can derive all FDs in F⁺).

Relational Database Design

🎯 100% Exam Appearance Topic

1. Introduction to Database Design

📖 What is Relational Database Design?

Two Levels of Relation Schemas

Logical "User View" Level

Storage "Base Relation" Level

What Makes a "Good" Relation?

2. Informal Design Guidelines

Semantics of Relation Attributes

💡 Bottom Line

Minimize Redundant Information

Reduce NULL Values

Avoid Spurious Tuples

3. Update Anomalies

📝 Example: Poorly Designed Relation

🔄 Update Anomaly

➕ Insertion Anomaly

➖ Deletion Anomaly

Example: EMP_DEPT Relation with Redundancy

4. Functional Dependencies (FDs)

📖 Definition

Notation

Examples of Functional Dependencies

Properties of FDs

Closure of a Set of FDs (F⁺)

📖 Closure F⁺

📝 Example

5. Inference Rules for FDs (Armstrong's Axioms)

IR1: Reflexive Rule

IR2: Augmentation Rule

IR3: Transitive Rule

📝 Sound and Complete

Additional Derived Rules

6. The Normalization Process

📖 What is Normalization?

Why Normalize?

Normal Forms Progression

⚠️ Key Definitions

7. First Normal Form (1NF)

First Normal Form

📖 Definition

1NF Violation Example

📝 Note

8. Second Normal Form (2NF)

Second Normal Form

📖 Definition

Full vs Partial Functional Dependency

2NF Violation Example

WORKS_ON

EMPLOYEE

PROJECT

WORKS_ON

📝 Key Point

9. Third Normal Form (3NF)

Third Normal Form

📖 Definition

Alternative Definition (More General)

3NF Violation Example

EMPLOYEE

EMPLOYEE

LOCATION

📝 Example: SSN → DMGRSSN Transitive Dependency

10. Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form

📖 Definition

BCNF vs 3NF

Example: 3NF but NOT BCNF

TEACH

INSTRUCTOR_COURSE

STUDENT_INSTRUCTOR

⚠️ BCNF Trade-off

11. Lossless (Non-Additive) Decomposition

📖 Lossless Join Property

Binary Decomposition Test

Property LJ1

Dependency Preservation

📖 Definition

Lossless Join

Dependency Preservation