Relational Database Design
Functional Dependencies, Normalization, and Normal Forms (1NF to BCNF)
🎯 100% Exam Appearance Topic
Normalization appears in every examination paper as a 10-mark question. Master all normal forms with examples!
1. Introduction to Database Design
📖 What is Relational Database Design?
Relational database design is the process of grouping attributes to form "good" relation schemas. It involves organizing data to minimize redundancy and dependency issues while maintaining data integrity.
Two Levels of Relation Schemas
Logical "User View" Level
How users perceive and interact with the data. Focuses on what data is stored and how it's organized logically.
Storage "Base Relation" Level
How data is physically stored in the database. Design is primarily concerned with this level.
What Makes a "Good" Relation?
We use concepts of functional dependencies and normal forms to evaluate and improve database designs:
- 1NF - First Normal Form
- 2NF - Second Normal Form
- 3NF - Third Normal Form
- BCNF - Boyce-Codd Normal Form
2. Informal Design Guidelines
Semantics of Relation Attributes
Each tuple should represent one entity or relationship instance. Attributes of different entities (EMPLOYEE, DEPARTMENT, PROJECT) should not be mixed in the same relation. Only foreign keys should be used to refer to other entities.
💡 Bottom Line
Design a schema that can be explained easily, relation by relation. The semantics of attributes should be easy to interpret.
Minimize Redundant Information
Design schemas that do not suffer from insertion, deletion, and update anomalies. If anomalies exist, document them so applications can handle them appropriately.
Reduce NULL Values
Relations should be designed to have as few NULL values as possible. Frequently NULL attributes should be placed in separate relations with the primary key.
Avoid Spurious Tuples
Relations should be designed to satisfy the lossless join condition. No spurious (false) tuples should be generated by natural-join operations.
3. Update Anomalies
Mixing attributes of multiple entities in a single relation causes several problems:
📝 Example: Poorly Designed Relation
Consider: EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours, Plocation, Dnum)
This combines employee and project information, leading to the following anomalies:
🔄 Update Anomaly
Changing project P1's name from "Billing" to "Customer-Accounting" requires updating all 100 rows where employees work on P1.
➕ Insertion Anomaly
Cannot insert a new project unless an employee is assigned to it. Cannot add an employee without assigning them to a project.
➖ Deletion Anomaly
Deleting a project removes all employees who work only on that project. Deleting the sole employee on a project deletes the project information.
Example: EMP_DEPT Relation with Redundancy
| Emp# | Ename | Dept# | Dname | Dmgr_ssn |
|---|---|---|---|---|
| E1 | John | D5 | Research | 123456789 |
| E2 | Smith | D5 | Research | 123456789 |
| E3 | Jane | D5 | Research | 123456789 |
⚠️ Department information is repeated for every employee - this is redundancy!
4. Functional Dependencies (FDs)
📖 Definition
A functional dependency X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y. In other words, the value of X uniquely determines the value of Y.
Notation
X → Y reads as "X functionally determines Y" or "Y is functionally dependent on X"
- X is called the determinant (left-hand side)
- Y is called the dependent (right-hand side)
Examples of Functional Dependencies
Properties of FDs
- An FD is a property of the schema R, not a particular instance
- The constraint must hold on every relation instance r(R)
- If K is a key of R, then K functionally determines all attributes in R
- FDs are derived from real-world constraints on the attributes
Closure of a Set of FDs (F⁺)
📖 Closure F⁺
The closure of F (denoted F⁺) is the set of all functional dependencies that can be inferred from F. It includes F itself plus all dependencies that logically follow from F.
📝 Example
Given: F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN}}
Inferred FDs in F⁺:
- SSN → DNAME (by transitivity)
- SSN → DMGRSSN (by transitivity)
- SSN → SSN (reflexive)
- DNUMBER → DNAME (decomposition)
5. Inference Rules for FDs (Armstrong's Axioms)
Given a set of FDs F, we can infer additional FDs using these rules:
IR1: Reflexive Rule
If Y ⊆ X, then X → Y
Any set of attributes determines its subset
IR2: Augmentation Rule
If X → Y, then XZ → YZ
Adding same attributes to both sides preserves FD
IR3: Transitive Rule
If X → Y and Y → Z, then X → Z
Dependencies can be chained
📝 Sound and Complete
Armstrong's axioms (IR1, IR2, IR3) are sound (only derive valid FDs) and complete (can derive all FDs in F⁺).
Additional Derived Rules
| Rule | Definition | Derived From |
|---|---|---|
| Decomposition | If X → YZ, then X → Y and X → Z | IR1, IR3 |
| Union | If X → Y and X → Z, then X → YZ | IR2, IR3 |
| Pseudotransitivity | If X → Y and WY → Z, then WX → Z | IR2, IR3 |
6. The Normalization Process
📖 What is Normalization?
Normalization is the process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller, well-structured relations. It eliminates redundancy and minimizes anomalies.
Why Normalize?
- Eliminate redundant data (same data in multiple places)
- Ensure data dependencies make sense (related data stored together)
- Reduce data modification anomalies
- Simplify query operations
Normal Forms Progression
⚠️ Key Definitions
- Prime Attribute: An attribute that is a member of some candidate key
- Non-Prime Attribute: An attribute that is NOT a member of any candidate key
- Candidate Key: A minimal superkey (removing any attribute loses the uniqueness property)
- Superkey: A set of attributes that uniquely identifies tuples
7. First Normal Form (1NF)
First Normal Form
📖 Definition
A relation is in 1NF if and only if every attribute contains only atomic (indivisible) values. No repeating groups, composite attributes, multivalued attributes, or nested relations are allowed.
1NF Violation Example
Before 1NF:
| Student_ID | Name | Phone_Numbers |
|---|---|---|
| S1 | John | 9876543210, 9123456789 |
| S2 | Jane | 8765432109 |
❌ Phone_Numbers contains multiple values - NOT atomic!
After 1NF:
| Student_ID | Name | Phone_Number |
|---|---|---|
| S1 | John | 9876543210 |
| S1 | John | 9123456789 |
| S2 | Jane | 8765432109 |
✓ Each cell contains exactly one value - 1NF satisfied!
📝 Note
1NF is considered part of the basic definition of a relation in modern database theory.
8. Second Normal Form (2NF)
Second Normal Form
📖 Definition
A relation is in 2NF if:
- It is in 1NF, AND
- Every non-prime attribute is fully functionally dependent on the primary key (no partial dependencies)
Full vs Partial Functional Dependency
- Full FD: X → Y where removing any attribute from X means the FD no longer holds
- Partial FD: X → Y where Y depends on only a part of a composite key X
2NF Violation Example
Relation: WORKS_ON(SSN, Pnumber, Hours, Ename, Pname, Plocation)
Primary Key: {SSN, Pnumber}
✓ {SSN, Pnumber} → Hours (Full dependency - OK)
❌ SSN → Ename (Partial dependency - VIOLATION!)
❌ Pnumber → {Pname, Plocation} (Partial dependency - VIOLATION!)
Decomposition to 2NF:
WORKS_ON
EMPLOYEE
PROJECT
WORKS_ON
📝 Key Point
2NF is relevant only when there is a composite primary key. If the primary key is a single attribute, 1NF automatically implies 2NF (no partial dependencies possible).
9. Third Normal Form (3NF)
Third Normal Form
📖 Definition
A relation is in 3NF if:
- It is in 2NF, AND
- No non-prime attribute is transitively dependent on the primary key
Transitive Dependency: X → Y and Y → Z implies X → Z (where Y is not a candidate key)
Alternative Definition (More General)
A relation R is in 3NF if, for every non-trivial FD X → A:
- X is a superkey of R, OR
- A is a prime attribute (part of some candidate key)
3NF Violation Example
Relation: EMPLOYEE(Emp_ID, Ename, Zip_Code, City)
Emp_ID → Zip_Code (Direct dependency)
Zip_Code → City (Direct dependency)
❌ Emp_ID → City (Transitive via Zip_Code - VIOLATION!)
Since City depends on Zip_Code (not a candidate key), and Zip_Code depends on Emp_ID, we have a transitive dependency.
Decomposition to 3NF:
EMPLOYEE
EMPLOYEE
LOCATION
📝 Example: SSN → DMGRSSN Transitive Dependency
Consider: SSN → DNUMBER and DNUMBER → DMGRSSN
This means SSN → DMGRSSN transitively (through DNUMBER)
To fix: Separate the DEPARTMENT information into its own relation.
10. Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form
📖 Definition
A relation R is in BCNF if for every non-trivial functional dependency X → A:
X is a superkey of R
In other words, the left-hand side of every FD must be a superkey.
BCNF vs 3NF
| Aspect | 3NF | BCNF |
|---|---|---|
| Definition | X is superkey OR A is prime | X must be superkey (stricter) |
| Strength | Less strict | More strict (⊆ 3NF) |
| Dependency Preservation | Always possible | May not be possible |
Example: 3NF but NOT BCNF
Relation: TEACH(Student, Course, Instructor)
Constraint: Each instructor teaches only one course
{Student, Course} → Instructor (PK determines Instructor)
Instructor → Course (Instructor is NOT a superkey - BCNF VIOLATION!)
{Student, Course} is a candidate key. Since Instructor → Course exists and Instructor is not a superkey, this violates BCNF.
However, Course is a prime attribute (part of the key), so 3NF is satisfied!
Decomposition to BCNF:
TEACH
INSTRUCTOR_COURSE
STUDENT_INSTRUCTOR
⚠️ Note: The FD {Student, Course} → Instructor is lost in this decomposition!
⚠️ BCNF Trade-off
BCNF is the "gold standard" for eliminating all redundancy based on FDs. However, achieving BCNF may require sacrificing dependency preservation. In practice, 3NF is often acceptable.
11. Lossless (Non-Additive) Decomposition
📖 Lossless Join Property
A decomposition D = {R₁, R₂, ..., Rₘ} of R has the lossless (non-additive) join property if for every relation instance r of R:
πR₁(r) ⋈ πR₂(r) ⋈ ... ⋈ πRₘ(r) = r
This means joining the decomposed relations gives back the original relation exactly - no spurious tuples are added.
Binary Decomposition Test
Property LJ1
A decomposition D = {R₁, R₂} of R is lossless with respect to F if and only if:
- (R₁ ∩ R₂) → (R₁ - R₂) is in F⁺, OR
- (R₁ ∩ R₂) → (R₂ - R₁) is in F⁺
In simple terms: the common attributes must be a key for at least one of the decomposed relations.
Dependency Preservation
📖 Definition
A decomposition is dependency preserving if the union of FDs that can be checked in each decomposed relation is equivalent to the original set of FDs. This allows constraints to be verified without performing joins.
Lossless Join
CRITICAL PROPERTY - Must always be satisfied. Cannot be sacrificed.
Dependency Preservation
Desirable but may be sacrificed to achieve BCNF.
12. Normal Forms Comparison
| Normal Form | Requirement | Eliminates |
|---|---|---|
| 1NF | All attributes contain atomic values only | Repeating groups, multivalued attributes |
| 2NF | 1NF + No partial dependencies | Redundancy from partial key dependencies |
| 3NF | 2NF + No transitive dependencies on non-prime attributes | Redundancy from transitive dependencies |
| BCNF | Every determinant is a superkey | All redundancy from functional dependencies |
Normal Form Hierarchy
All Relations
1NF
2NF
3NF
BCNF
Every BCNF relation is in 3NF, every 3NF relation is in 2NF, etc.
📝 Practical Guidelines
- Database designers typically normalize to 3NF or BCNF
- Higher normal forms (4NF, 5NF) exist but are rarely needed in practice
- Denormalization may be used for performance optimization
13. Practice Questions
Explain 1NF, 2NF, 3NF, and BCNF with suitable examples. (10 Marks - Most Frequent)
▼First Normal Form (1NF)
Definition: A relation is in 1NF if all attributes contain only atomic (indivisible) values.
Violation: Student(ID, Name, Subjects) where Subjects = "Math, Physics" (non-atomic)
Resolution: Separate rows for each subject.
Second Normal Form (2NF)
Definition: A relation is in 2NF if it's in 1NF and every non-prime attribute is fully functionally dependent on the primary key.
Violation: Grades(Student_ID, Course_ID, Professor_Name) where Professor_Name depends only on Course_ID.
Resolution: Decompose into Courses(Course_ID, Professor_Name) and Grades(Student_ID, Course_ID).
Third Normal Form (3NF)
Definition: A relation is in 3NF if it's in 2NF and no non-prime attribute is transitively dependent on the primary key.
Violation: Employee(Emp_ID, Zip_Code, City) where Emp_ID → Zip_Code → City
Resolution: Decompose into Employee(Emp_ID, Zip_Code) and Location(Zip_Code, City).
Boyce-Codd Normal Form (BCNF)
Definition: A relation is in BCNF if for every non-trivial FD X → Y, X is a superkey.
Difference from 3NF: BCNF does not allow the exception for prime attributes. It's stricter than 3NF.
Status: BCNF is the "gold standard" for eliminating all FD-based redundancy.
Why is there a need for normalization?
▼Normalization is needed to:
- Eliminate Redundancy: Avoid storing the same data in multiple places, saving storage space.
- Prevent Update Anomalies: Ensure that modifying data in one place doesn't require changes in multiple locations.
- Prevent Insertion Anomalies: Allow adding new data without requiring unrelated data to exist.
- Prevent Deletion Anomalies: Ensure that deleting some data doesn't unintentionally remove other important data.
- Ensure Data Integrity: Maintain consistency and accuracy of data.
- Simplify Queries: Well-structured tables are easier to query and maintain.
What is a Functional Dependency? Explain with example.
▼Definition: A functional dependency X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y.
Examples:
- SSN → Name: Given a Social Security Number, the employee name is uniquely determined.
- Student_ID → {Name, DOB, Address}: A student ID determines all student attributes.
- {Course_ID, Semester} → Instructor: A course in a specific semester has one instructor.
Properties:
- FDs are constraints on the schema, not on specific data
- If K is a key, then K → all attributes
- FDs are derived from real-world semantics
Explain lossless decomposition with suitable example.
▼Definition: A decomposition is lossless (non-additive) if joining the decomposed relations produces exactly the original relation without any spurious tuples.
Test for Binary Decomposition:
D = {R₁, R₂} is lossless if: (R₁ ∩ R₂) → (R₁ - R₂) OR (R₁ ∩ R₂) → (R₂ - R₁)
Example:
Original: EMPLOYEE(Emp_ID, Name, Dept_ID, Dept_Name)
FD: Dept_ID → Dept_Name
Decomposition:
- R₁ = EMPLOYEE(Emp_ID, Name, Dept_ID)
- R₂ = DEPARTMENT(Dept_ID, Dept_Name)
R₁ ∩ R₂ = {Dept_ID}
R₂ - R₁ = {Dept_Name}
Since Dept_ID → Dept_Name is in F, this is a lossless decomposition.
Given relation R(A, B, C, D) with FDs: A→B, B→C. Identify the highest normal form.
▼Step 1: Find Candidate Key
A⁺ = {A, B, C} (A determines B, B determines C)
To get D, we need {A, D}
Candidate Key = {A, D}
Step 2: Identify Prime and Non-Prime Attributes
- Prime: A, D
- Non-Prime: B, C
Step 3: Check Normal Forms
- 1NF: ✓ (Assuming atomic values)
- 2NF: ❌ A→B is a partial dependency (A is part of key {A,D})
Answer: The relation is in 1NF only.
To normalize to 3NF:
- R1(A, B)
- R2(B, C)
- R3(A, D)
Explain Armstrong's Inference Rules.
▼Armstrong's axioms are a set of inference rules for deriving all functional dependencies:
Primary Rules (Axioms):
- Reflexivity: If Y ⊆ X, then X → Y
Example: ABC → AB, ABC → A - Augmentation: If X → Y, then XZ → YZ
Example: If A → B, then AC → BC - Transitivity: If X → Y and Y → Z, then X → Z
Example: If A → B and B → C, then A → C
Derived Rules:
- Decomposition: If X → YZ, then X → Y and X → Z
- Union: If X → Y and X → Z, then X → YZ
- Pseudotransitivity: If X → Y and WY → Z, then WX → Z
These rules are sound (derive only valid FDs) and complete (can derive all FDs in F⁺).