V

Relational Database Design

Functional Dependencies, Normalization, and Normal Forms (1NF to BCNF)

🎯 100% Exam Appearance Topic

Normalization appears in every examination paper as a 10-mark question. Master all normal forms with examples!

1. Introduction to Database Design

📖 What is Relational Database Design?

Relational database design is the process of grouping attributes to form "good" relation schemas. It involves organizing data to minimize redundancy and dependency issues while maintaining data integrity.

Two Levels of Relation Schemas

Logical "User View" Level

How users perceive and interact with the data. Focuses on what data is stored and how it's organized logically.

Storage "Base Relation" Level

How data is physically stored in the database. Design is primarily concerned with this level.

What Makes a "Good" Relation?

We use concepts of functional dependencies and normal forms to evaluate and improve database designs:

  • 1NF - First Normal Form
  • 2NF - Second Normal Form
  • 3NF - Third Normal Form
  • BCNF - Boyce-Codd Normal Form

2. Informal Design Guidelines

1

Semantics of Relation Attributes

Each tuple should represent one entity or relationship instance. Attributes of different entities (EMPLOYEE, DEPARTMENT, PROJECT) should not be mixed in the same relation. Only foreign keys should be used to refer to other entities.

💡 Bottom Line

Design a schema that can be explained easily, relation by relation. The semantics of attributes should be easy to interpret.

2

Minimize Redundant Information

Design schemas that do not suffer from insertion, deletion, and update anomalies. If anomalies exist, document them so applications can handle them appropriately.

3

Reduce NULL Values

Relations should be designed to have as few NULL values as possible. Frequently NULL attributes should be placed in separate relations with the primary key.

4

Avoid Spurious Tuples

Relations should be designed to satisfy the lossless join condition. No spurious (false) tuples should be generated by natural-join operations.

3. Update Anomalies

Mixing attributes of multiple entities in a single relation causes several problems:

📝 Example: Poorly Designed Relation

Consider: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours, Plocation, Dnum)

This combines employee and project information, leading to the following anomalies:

🔄 Update Anomaly

Changing project P1's name from "Billing" to "Customer-Accounting" requires updating all 100 rows where employees work on P1.

➕ Insertion Anomaly

Cannot insert a new project unless an employee is assigned to it. Cannot add an employee without assigning them to a project.

➖ Deletion Anomaly

Deleting a project removes all employees who work only on that project. Deleting the sole employee on a project deletes the project information.

Example: EMP_DEPT Relation with Redundancy
Emp# Ename Dept# Dname Dmgr_ssn
E1 John D5 Research 123456789
E2 Smith D5 Research 123456789
E3 Jane D5 Research 123456789

⚠️ Department information is repeated for every employee - this is redundancy!

4. Functional Dependencies (FDs)

📖 Definition

A functional dependency X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y. In other words, the value of X uniquely determines the value of Y.

Notation

X → Y reads as "X functionally determines Y" or "Y is functionally dependent on X"

  • X is called the determinant (left-hand side)
  • Y is called the dependent (right-hand side)

Examples of Functional Dependencies

SSN ENAME // SSN determines employee name
PNUMBER {PNAME, PLOCATION} // Project number determines name and location
{SSN, PNUMBER} HOURS // Employee + Project determines hours worked

Properties of FDs

  • An FD is a property of the schema R, not a particular instance
  • The constraint must hold on every relation instance r(R)
  • If K is a key of R, then K functionally determines all attributes in R
  • FDs are derived from real-world constraints on the attributes

Closure of a Set of FDs (F⁺)

📖 Closure F⁺

The closure of F (denoted F⁺) is the set of all functional dependencies that can be inferred from F. It includes F itself plus all dependencies that logically follow from F.

📝 Example

Given: F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN}}

Inferred FDs in F⁺:

  • SSN → DNAME (by transitivity)
  • SSN → DMGRSSN (by transitivity)
  • SSN → SSN (reflexive)
  • DNUMBER → DNAME (decomposition)

5. Inference Rules for FDs (Armstrong's Axioms)

Given a set of FDs F, we can infer additional FDs using these rules:

IR1: Reflexive Rule

If Y ⊆ X, then X → Y

Any set of attributes determines its subset

IR2: Augmentation Rule

If X → Y, then XZ → YZ

Adding same attributes to both sides preserves FD

IR3: Transitive Rule

If X → Y and Y → Z, then X → Z

Dependencies can be chained

📝 Sound and Complete

Armstrong's axioms (IR1, IR2, IR3) are sound (only derive valid FDs) and complete (can derive all FDs in F⁺).

Additional Derived Rules

Rule Definition Derived From
Decomposition If X → YZ, then X → Y and X → Z IR1, IR3
Union If X → Y and X → Z, then X → YZ IR2, IR3
Pseudotransitivity If X → Y and WY → Z, then WX → Z IR2, IR3

6. The Normalization Process

📖 What is Normalization?

Normalization is the process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller, well-structured relations. It eliminates redundancy and minimizes anomalies.

Why Normalize?

  • Eliminate redundant data (same data in multiple places)
  • Ensure data dependencies make sense (related data stored together)
  • Reduce data modification anomalies
  • Simplify query operations

Normal Forms Progression

UN
Unnormalized
1NF
Atomic Values
2NF
No Partial Dep.
3NF
No Transitive Dep.
BCNF
Every Det. is Key

⚠️ Key Definitions

  • Prime Attribute: An attribute that is a member of some candidate key
  • Non-Prime Attribute: An attribute that is NOT a member of any candidate key
  • Candidate Key: A minimal superkey (removing any attribute loses the uniqueness property)
  • Superkey: A set of attributes that uniquely identifies tuples

7. First Normal Form (1NF)

1NF

First Normal Form

📖 Definition

A relation is in 1NF if and only if every attribute contains only atomic (indivisible) values. No repeating groups, composite attributes, multivalued attributes, or nested relations are allowed.

1NF Violation Example

Before 1NF:

Student_ID Name Phone_Numbers
S1 John 9876543210, 9123456789
S2 Jane 8765432109

❌ Phone_Numbers contains multiple values - NOT atomic!

After 1NF:

Student_ID Name Phone_Number
S1 John 9876543210
S1 John 9123456789
S2 Jane 8765432109

✓ Each cell contains exactly one value - 1NF satisfied!

📝 Note

1NF is considered part of the basic definition of a relation in modern database theory.

8. Second Normal Form (2NF)

2NF

Second Normal Form

📖 Definition

A relation is in 2NF if:

  1. It is in 1NF, AND
  2. Every non-prime attribute is fully functionally dependent on the primary key (no partial dependencies)

Full vs Partial Functional Dependency

  • Full FD: X → Y where removing any attribute from X means the FD no longer holds
  • Partial FD: X → Y where Y depends on only a part of a composite key X

2NF Violation Example

Relation: WORKS_ON(SSN, Pnumber, Hours, Ename, Pname, Plocation)

Primary Key: {SSN, Pnumber}

✓ {SSN, Pnumber} → Hours (Full dependency - OK)

❌ SSN → Ename (Partial dependency - VIOLATION!)

❌ Pnumber → {Pname, Plocation} (Partial dependency - VIOLATION!)

Decomposition to 2NF:

WORKS_ON
(SSN, Pnumber, Hours, Ename, Pname, Plocation)
EMPLOYEE
(SSN, Ename)
PROJECT
(Pnumber, Pname, Plocation)
WORKS_ON
(SSN, Pnumber, Hours)

📝 Key Point

2NF is relevant only when there is a composite primary key. If the primary key is a single attribute, 1NF automatically implies 2NF (no partial dependencies possible).

9. Third Normal Form (3NF)

3NF

Third Normal Form

📖 Definition

A relation is in 3NF if:

  1. It is in 2NF, AND
  2. No non-prime attribute is transitively dependent on the primary key

Transitive Dependency: X → Y and Y → Z implies X → Z (where Y is not a candidate key)

Alternative Definition (More General)

A relation R is in 3NF if, for every non-trivial FD X → A:

  • X is a superkey of R, OR
  • A is a prime attribute (part of some candidate key)

3NF Violation Example

Relation: EMPLOYEE(Emp_ID, Ename, Zip_Code, City)

Emp_ID → Zip_Code (Direct dependency)

Zip_Code → City (Direct dependency)

❌ Emp_ID → City (Transitive via Zip_Code - VIOLATION!)

Since City depends on Zip_Code (not a candidate key), and Zip_Code depends on Emp_ID, we have a transitive dependency.

Decomposition to 3NF:

EMPLOYEE
(Emp_ID, Ename, Zip_Code, City)
EMPLOYEE
(Emp_ID, Ename, Zip_Code)
LOCATION
(Zip_Code, City)

📝 Example: SSN → DMGRSSN Transitive Dependency

Consider: SSN → DNUMBER and DNUMBER → DMGRSSN

This means SSN → DMGRSSN transitively (through DNUMBER)

To fix: Separate the DEPARTMENT information into its own relation.

10. Boyce-Codd Normal Form (BCNF)

BCNF

Boyce-Codd Normal Form

📖 Definition

A relation R is in BCNF if for every non-trivial functional dependency X → A:

X is a superkey of R

In other words, the left-hand side of every FD must be a superkey.

BCNF vs 3NF

Aspect 3NF BCNF
Definition X is superkey OR A is prime X must be superkey (stricter)
Strength Less strict More strict (⊆ 3NF)
Dependency Preservation Always possible May not be possible

Example: 3NF but NOT BCNF

Relation: TEACH(Student, Course, Instructor)

Constraint: Each instructor teaches only one course

{Student, Course} → Instructor (PK determines Instructor)

Instructor → Course (Instructor is NOT a superkey - BCNF VIOLATION!)

{Student, Course} is a candidate key. Since Instructor → Course exists and Instructor is not a superkey, this violates BCNF.

However, Course is a prime attribute (part of the key), so 3NF is satisfied!

Decomposition to BCNF:

TEACH
(Student, Course, Instructor)
INSTRUCTOR_COURSE
(Instructor, Course)
STUDENT_INSTRUCTOR
(Student, Instructor)

⚠️ Note: The FD {Student, Course} → Instructor is lost in this decomposition!

⚠️ BCNF Trade-off

BCNF is the "gold standard" for eliminating all redundancy based on FDs. However, achieving BCNF may require sacrificing dependency preservation. In practice, 3NF is often acceptable.

11. Lossless (Non-Additive) Decomposition

📖 Lossless Join Property

A decomposition D = {R₁, R₂, ..., Rₘ} of R has the lossless (non-additive) join property if for every relation instance r of R:

πR₁(r) ⋈ πR₂(r) ⋈ ... ⋈ πRₘ(r) = r

This means joining the decomposed relations gives back the original relation exactly - no spurious tuples are added.

Binary Decomposition Test

Property LJ1

A decomposition D = {R₁, R₂} of R is lossless with respect to F if and only if:

  • (R₁ ∩ R₂) → (R₁ - R₂) is in F⁺, OR
  • (R₁ ∩ R₂) → (R₂ - R₁) is in F⁺

In simple terms: the common attributes must be a key for at least one of the decomposed relations.

Dependency Preservation

📖 Definition

A decomposition is dependency preserving if the union of FDs that can be checked in each decomposed relation is equivalent to the original set of FDs. This allows constraints to be verified without performing joins.

Lossless Join

CRITICAL PROPERTY - Must always be satisfied. Cannot be sacrificed.

Dependency Preservation

Desirable but may be sacrificed to achieve BCNF.

12. Normal Forms Comparison

Normal Form Requirement Eliminates
1NF All attributes contain atomic values only Repeating groups, multivalued attributes
2NF 1NF + No partial dependencies Redundancy from partial key dependencies
3NF 2NF + No transitive dependencies on non-prime attributes Redundancy from transitive dependencies
BCNF Every determinant is a superkey All redundancy from functional dependencies

Normal Form Hierarchy

All Relations

1NF

2NF

3NF

BCNF

Every BCNF relation is in 3NF, every 3NF relation is in 2NF, etc.

📝 Practical Guidelines

  • Database designers typically normalize to 3NF or BCNF
  • Higher normal forms (4NF, 5NF) exist but are rarely needed in practice
  • Denormalization may be used for performance optimization

13. Practice Questions

Explain 1NF, 2NF, 3NF, and BCNF with suitable examples. (10 Marks - Most Frequent)

First Normal Form (1NF)

Definition: A relation is in 1NF if all attributes contain only atomic (indivisible) values.

Violation: Student(ID, Name, Subjects) where Subjects = "Math, Physics" (non-atomic)

Resolution: Separate rows for each subject.

Second Normal Form (2NF)

Definition: A relation is in 2NF if it's in 1NF and every non-prime attribute is fully functionally dependent on the primary key.

Violation: Grades(Student_ID, Course_ID, Professor_Name) where Professor_Name depends only on Course_ID.

Resolution: Decompose into Courses(Course_ID, Professor_Name) and Grades(Student_ID, Course_ID).

Third Normal Form (3NF)

Definition: A relation is in 3NF if it's in 2NF and no non-prime attribute is transitively dependent on the primary key.

Violation: Employee(Emp_ID, Zip_Code, City) where Emp_ID → Zip_Code → City

Resolution: Decompose into Employee(Emp_ID, Zip_Code) and Location(Zip_Code, City).

Boyce-Codd Normal Form (BCNF)

Definition: A relation is in BCNF if for every non-trivial FD X → Y, X is a superkey.

Difference from 3NF: BCNF does not allow the exception for prime attributes. It's stricter than 3NF.

Status: BCNF is the "gold standard" for eliminating all FD-based redundancy.

Why is there a need for normalization?

Normalization is needed to:

  1. Eliminate Redundancy: Avoid storing the same data in multiple places, saving storage space.
  2. Prevent Update Anomalies: Ensure that modifying data in one place doesn't require changes in multiple locations.
  3. Prevent Insertion Anomalies: Allow adding new data without requiring unrelated data to exist.
  4. Prevent Deletion Anomalies: Ensure that deleting some data doesn't unintentionally remove other important data.
  5. Ensure Data Integrity: Maintain consistency and accuracy of data.
  6. Simplify Queries: Well-structured tables are easier to query and maintain.

What is a Functional Dependency? Explain with example.

Definition: A functional dependency X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y.

Examples:

  • SSN → Name: Given a Social Security Number, the employee name is uniquely determined.
  • Student_ID → {Name, DOB, Address}: A student ID determines all student attributes.
  • {Course_ID, Semester} → Instructor: A course in a specific semester has one instructor.

Properties:

  • FDs are constraints on the schema, not on specific data
  • If K is a key, then K → all attributes
  • FDs are derived from real-world semantics

Explain lossless decomposition with suitable example.

Definition: A decomposition is lossless (non-additive) if joining the decomposed relations produces exactly the original relation without any spurious tuples.

Test for Binary Decomposition:

D = {R₁, R₂} is lossless if: (R₁ ∩ R₂) → (R₁ - R₂) OR (R₁ ∩ R₂) → (R₂ - R₁)

Example:

Original: EMPLOYEE(Emp_ID, Name, Dept_ID, Dept_Name)

FD: Dept_ID → Dept_Name

Decomposition:

  • R₁ = EMPLOYEE(Emp_ID, Name, Dept_ID)
  • R₂ = DEPARTMENT(Dept_ID, Dept_Name)

R₁ ∩ R₂ = {Dept_ID}

R₂ - R₁ = {Dept_Name}

Since Dept_ID → Dept_Name is in F, this is a lossless decomposition.

Given relation R(A, B, C, D) with FDs: A→B, B→C. Identify the highest normal form.

Step 1: Find Candidate Key

A⁺ = {A, B, C} (A determines B, B determines C)

To get D, we need {A, D}

Candidate Key = {A, D}

Step 2: Identify Prime and Non-Prime Attributes

  • Prime: A, D
  • Non-Prime: B, C

Step 3: Check Normal Forms

  • 1NF: ✓ (Assuming atomic values)
  • 2NF: ❌ A→B is a partial dependency (A is part of key {A,D})

Answer: The relation is in 1NF only.

To normalize to 3NF:

  • R1(A, B)
  • R2(B, C)
  • R3(A, D)

Explain Armstrong's Inference Rules.

Armstrong's axioms are a set of inference rules for deriving all functional dependencies:

Primary Rules (Axioms):

  1. Reflexivity: If Y ⊆ X, then X → Y
    Example: ABC → AB, ABC → A
  2. Augmentation: If X → Y, then XZ → YZ
    Example: If A → B, then AC → BC
  3. Transitivity: If X → Y and Y → Z, then X → Z
    Example: If A → B and B → C, then A → C

Derived Rules:

  • Decomposition: If X → YZ, then X → Y and X → Z
  • Union: If X → Y and X → Z, then X → YZ
  • Pseudotransitivity: If X → Y and WY → Z, then WX → Z

These rules are sound (derive only valid FDs) and complete (can derive all FDs in F⁺).