Database System Concepts & Architecture
Introduction to DBMS, Architecture, and Applications
1. Introduction to Database Systems
What is a Database?
A database is a collection of related data organized in a way that data can be easily accessed, managed, and updated. It represents some aspect of the real world (called the mini-world or Universe of Discourse).
What is a Database Management System (DBMS)?
A DBMS is a software system that enables users to define, create, maintain, and control access to the database. It provides an interface between the data and the software applications or end users.
Key Functions of DBMS
Define
Specifying the types, structures, and constraints of the data to be stored in the database.
Construct
The process of storing the data on some storage medium that is controlled by the DBMS.
Manipulate
Functions such as querying the database, updating the database, and generating reports.
Share
Allowing multiple users and programs to access the database simultaneously.
2. Key Definitions
| Term | Definition | Example |
|---|---|---|
| Data | Raw, unprocessed facts that have no meaning on their own | "Prathamesh", "Navi Mumbai", "12/12/2000" |
| Information | Processed data that has meaning and context | "The age of Prathamesh is 24 years" |
| Database | Organized collection of related data | Online Railway Reservation System |
| DBMS | Software to manage databases | MySQL, Oracle, PostgreSQL |
| Schema | Logical structure/blueprint of the database | Table definitions, relationships |
| Instance | Actual content/data at a particular point in time | Current rows in tables |
Example: University Database
A university database application may include:
- Add new students, instructors, and courses
- Register students for courses and generate class rosters
- Assign grades to students, compute grade point averages (GPA)
- Generate transcripts and reports
3. Characteristics of Database Systems
Self-Describing Nature
A DBMS catalog stores the description of the database (called metadata). This allows the DBMS software to work with different databases without modification.
Program-Data Independence
The structure of data files is stored separately from access programs. Changes to data structure don't require changes to programs.
Data Abstraction
DBMS provides a conceptual representation of data that hides storage details. Users can view data without worrying about how it's stored.
Multiple Views of Data
Different users can have different views of the same database based on their requirements and permissions.
Sharing of Data and Multi-user Transaction Processing
DBMS allows multiple users to access the database concurrently while maintaining data integrity through concurrency control.
4. File System vs Database System
Why Not Use File Systems?
In the early days, database applications were built directly on top of file systems. This approach had significant drawbacks that led to the development of DBMS.
Drawbacks of File Systems
1. Data Redundancy & Inconsistency
Multiple file formats, duplication of information in different files leads to wastage of storage and inconsistent data.
2. Difficulty in Accessing Data
Need to write a new program to carry out each new task. No flexible query capabilities.
3. Data Isolation
Multiple files and formats make data retrieval complex. Data is scattered across various files.
4. Integrity Problems
Integrity constraints (e.g., account balance > 0) become "buried" in program code. Hard to add or change constraints.
5. Atomicity Problems
Failures may leave data in an inconsistent state with partial updates. Example: Transfer funds between accounts should complete fully or not at all.
6. Concurrent Access Anomalies
Uncontrolled concurrent accesses can lead to inconsistencies. Example: Two people withdrawing from the same account simultaneously.
7. Security Problems
Hard to provide user access to some, but not all, data. No centralized security control.
Comparison Table
| Feature | File System | DBMS |
|---|---|---|
| Data Redundancy | High redundancy | Controlled/Minimal |
| Data Consistency | Difficult to maintain | Ensured through constraints |
| Data Independence | Not supported | Fully supported |
| Concurrent Access | Limited/Problematic | Well-managed |
| Security | Limited, file-level | Comprehensive, user-level |
| Backup & Recovery | Manual, complex | Automated, reliable |
| Query Capability | Requires programming | SQL queries |
| Integrity Constraints | Embedded in programs | Defined in schema |
5. Data Abstraction
What is Data Abstraction?
Data abstraction is the process of hiding the complexity of data storage from users while providing a simplified interface. It allows users to interact with data without knowing the underlying implementation details.
Levels of Abstraction
Three Levels of Data Abstraction
View Level
Highest level - describes only part of the entire database. Different users have different views based on their needs. Hides complexity and provides security.
Example: A student view shows grades and courses; a payroll view shows salary information.
Logical Level
Middle level - describes what data is stored and relationships among data. This is where the DBA works to define the schema.
type instructor = record
ID : string;
name : string;
dept_name : string;
salary : integer;
end;
Physical Level
Lowest level - describes how data is actually stored. Deals with complex data structures, file organizations, access paths, and storage allocation.
Example: B-trees, hash files, block allocations, indexes.
6. Data Independence
What is Data Independence?
Data independence is the capacity to change the schema at one level of the database system without having to change the schema at the next higher level. It is a key advantage of the three-schema architecture.
Physical Data Independence
The capacity to change the internal schema without having to change the conceptual schema.
Examples:
- Creating a new index on the Salary column to speed up queries
- Moving the database file from a magnetic disk to an SSD
- Changing file organization from sequential to indexed
- Changing storage structures or access methods
Note: These changes should not require any alteration to the logical definition of the data or application programs.
Logical Data Independence
The capacity to change the conceptual schema without having to change the external schemas or application programs.
Examples:
- Adding a new field (e.g., Date_of_Birth) to the Employee table
- Adding a new entity type or relationship
- Modifying constraints on existing data
- Splitting or merging tables
Important: Logical data independence is harder to achieve than physical data independence because the conceptual schema is the central contract of the database.
7. DBMS System Architecture
The architecture of a database system is greatly influenced by the underlying computer system on which it runs. There are several types of architectures:
Centralized Architecture
All components (DBMS, application programs, user interfaces) run on a single computer. Suitable for small organizations with limited users.
Client-Server Architecture
Server provides database services; clients access the server. Two-tier (client → server) or three-tier (client → application server → database server).
Parallel Architecture
Multiple processors work together to process database operations. Increases throughput and reduces response time for large databases.
Distributed Architecture
Data is distributed across multiple sites connected by a network. Each site has its own DBMS but they work together as one system.
8. Three-Schema Architecture (ANSI/SPARC)
Exam Tip
This topic appears in almost every exam (85%+ frequency). Be prepared to draw the diagram and explain each level with examples.
Three-Schema Architecture Diagram
External Level (View Level)
- Highest level of abstraction
- Contains external schemas or user views
- Each view describes part of the database
- Different users see different views
- Provides security by hiding data
Conceptual Level (Logical Level)
- Community view of the database
- Describes what data is stored
- Defines entities, data types, relationships
- Hides physical storage details
- DBA operates at this level
Internal Level (Physical Level)
- Lowest level of abstraction
- Describes how data is stored
- Deals with file organizations
- Access paths (indexes, B-trees)
- Optimized for performance
9. Data Models
What is a Data Model?
A data model is a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints. It provides the underlying structure of a database.
Relational Model
Uses tables (relations) to represent data and relationships. Most widely used model today.
| ID | Name | Dept |
|---|---|---|
| 101 | John | CS |
| 102 | Jane | IT |
Entity-Relationship Model
Uses entities and relationships for conceptual database design. Primarily used during design phase before implementation.
Object-Based Data Models
Object-oriented and object-relational models. Extends relational model with object concepts like encapsulation, inheritance.
Semi-structured Data Model
Permits the specification of data where individual items may have different sets of attributes. Example: XML, JSON.
Historical Models
- Network Model: Represents data as a graph with many-to-many relationships
- Hierarchical Model: Represents data as a tree structure with parent-child relationships
These older models are less common today but laid the foundation for modern DBMS.
10. Database Languages
Data Definition Language (DDL)
Specifies the database schema, structure, and constraints. DDL compiler generates table templates stored in the data dictionary.
CREATE TABLE instructor (
ID CHAR(5),
name VARCHAR(20),
dept_name VARCHAR(20),
salary NUMERIC(8,2)
);
DDL defines: Database schema, Integrity constraints, Primary keys, Foreign keys, Authorization
Data Manipulation Language (DML)
Used to access and manipulate data. Also known as the query language.
Two types:
- Procedural DML: User specifies what data and how to get it
- Declarative DML: User specifies what data without specifying how (SQL)
SELECT name
FROM instructor
WHERE dept_name = 'Physics';
SQL Example - Joining Tables
SELECT instructor.ID, department.building
FROM instructor, department
WHERE instructor.dept_name = department.dept_name
AND department.dept_name = 'Physics';
This query finds the ID and building of instructors in the Physics department by joining instructor and department tables.
11. Database Administrator (DBA)
Exam Tip
DBA roles and responsibilities is a frequent 5-mark question. Learn specific technical responsibilities, not just general descriptions.
Who is a DBA?
The Database Administrator (DBA) is a person or group responsible for the overall control and management of the database system. The DBA has central authority over the database.
Responsibilities of DBA
Schema Definition
The DBA creates the original database schema by executing Data Definition Language (DDL) statements. This includes defining tables, relationships, and constraints.
Storage Structure and Access Method Definition
The DBA decides how data is to be represented in the stored database, including specific indexing techniques, file organization, and storage allocation.
Schema and Physical Organization Modification
The DBA carries out changes to the schema and physical organization to reflect the changing needs of the organization or to improve performance.
Granting Authorization for Data Access
The DBA regulates which parts of the database various users can access. This includes granting and revoking privileges to users and roles.
Routine Maintenance
This includes:
- Periodic backup of the database
- Monitoring database performance
- Ensuring enough free disk space
- Applying patches and upgrades
- Recovery procedures after failures
12. Database Users
Database users are categorized based on their mode of interaction with the database system:
Application Programmers
Write application programs using DML calls embedded in a host language (Java, Python, C++). They use tools like JDBC/ODBC to access the database.
Sophisticated Users
Interact with the system without writing programs. They use database query language (SQL) to form ad-hoc queries. Examples: analysts, data scientists.
Naive/Parametric Users
Unsophisticated users who interact through pre-defined applications. They invoke canned transactions that were previously programmed. Examples: bank tellers, reservation clerks.
Database Administrators
Have central control over the database system. Responsible for schema definition, access authorization, and routine maintenance.
13. Applications of Database Systems
Database systems are used in virtually all areas where computers are employed. Here are major application domains:
Banking Systems
Account management, transactions, loan processing, ATM operations, customer information, fraud detection.
E-Commerce
Product catalogs, shopping carts, order tracking, customer profiles, inventory management, payment processing.
Telecommunications
Call records, billing systems, network management, customer service, prepaid/postpaid accounts.
Healthcare Systems
Patient records, appointments, prescriptions, medical history, billing, insurance claims, lab results.
Social Media Platforms
User profiles, connections/friends, posts, messages, notifications, content recommendations.
Education Systems
Student records, course registration, grades, attendance, faculty information, library management.
Airline Reservation
Flight schedules, seat booking, passenger information, ticketing, check-in, baggage tracking.
Government Applications
Census data, tax records, voter registration, public services, land records, licensing systems.
Manufacturing
Production tracking, inventory, supply chain, orders, quality control, equipment maintenance.
Case Study: University Database
A university database system typically includes:
- Students, Instructors, Courses
- Departments, Buildings, Classrooms
- Sections, Semesters, Time Slots
- Register students for courses
- Record and calculate grades/GPA
- Generate transcripts and reports
14. Practice Questions
Describe the overall architecture of DBMS with suitable diagram. [10 marks]
▼The DBMS architecture follows the Three-Schema Architecture (ANSI/SPARC) that provides three levels of abstraction:
1. External Level (View Level):
- Highest level of abstraction, closest to users
- Contains multiple external schemas (user views)
- Each view describes only the portion of database relevant to a user group
- Provides security by hiding sensitive data
2. Conceptual Level (Logical Level):
- Community view of the entire database
- Describes what data is stored and relationships
- Hides physical storage details
- Database Administrator works at this level
3. Internal Level (Physical Level):
- Lowest level, closest to physical storage
- Describes how data is actually stored
- Deals with file organization, indexes, access paths
- Optimized for performance and storage efficiency
Mappings:
- External/Conceptual Mapping: Transforms user views to conceptual schema
- Conceptual/Internal Mapping: Transforms logical to physical representation
Note: Include a diagram showing the three levels with mappings between them.
Explain the concept of data independence. Discuss the differences between logical and physical data independence. [5 marks]
▼Data Independence is the capacity to change the schema at one level without having to change the schema at the next higher level.
Physical Data Independence:
- Ability to change internal schema without changing conceptual schema
- Examples: Creating indexes, changing file organization, moving to SSD
- Application programs remain unaffected
- Easier to achieve
Logical Data Independence:
- Ability to change conceptual schema without changing external schemas
- Examples: Adding new attributes, adding new entities
- Existing applications continue to work
- Harder to achieve (conceptual schema is central)
Differentiate between Database Management System and File System. [5 marks]
▼| Aspect | File System | DBMS |
|---|---|---|
| Data Redundancy | High - data duplicated | Controlled/Minimal |
| Data Consistency | Hard to maintain | Ensured through constraints |
| Data Independence | Not supported | Supported (logical & physical) |
| Concurrent Access | Limited, problematic | Well-managed with locking |
| Security | File-level only | Comprehensive, user-level |
| Backup/Recovery | Manual, complex | Automated, reliable |
| Query Capability | Requires programming | SQL queries supported |
| Integrity | In application code | Defined in schema |
Explain responsibilities of DBA. [5 marks]
▼The Database Administrator (DBA) is responsible for:
1. Schema Definition: Creating the original database schema using DDL statements.
2. Storage Structure and Access Method Definition: Deciding how data is represented, indexing techniques, and file organization.
3. Schema and Physical Organization Modification: Making changes to reflect changing needs and improve performance.
4. Granting Authorization: Regulating which users can access what data, managing privileges.
5. Routine Maintenance: Periodic backups, monitoring performance, ensuring disk space, applying patches, recovery procedures.
Identify different types of users of database management system. [5 marks]
▼1. Application Programmers:
- Write application programs using DML calls
- Use host languages with embedded SQL
- Use tools like JDBC/ODBC
2. Sophisticated/Casual Users:
- Interact without writing programs
- Use query languages (SQL) directly
- Form ad-hoc queries as needed
3. Naive/Parametric End Users:
- Use pre-defined applications
- Invoke canned transactions
- Examples: bank tellers, reservation clerks
4. Database Administrators:
- Central control over database
- Schema definition and modification
- Security and maintenance
Explain three-level schema architecture with suitable diagram. [10 marks]
▼The Three-Level Schema Architecture (ANSI/SPARC architecture) separates user applications from the physical database:
External Level (View Level):
- Highest level of abstraction
- Multiple external schemas for different user groups
- Each describes relevant portion of database
- Provides security through data hiding
- Example: Student view shows grades; HR view shows salary
Conceptual Level (Logical Level):
- Community view of entire database
- Describes entities, data types, relationships, constraints
- Hides physical storage details
- DBA defines schema at this level
Internal Level (Physical Level):
- Lowest level of abstraction
- Describes physical storage structure
- Deals with file organization, indexes, access paths
- Optimized for performance
Benefits:
- Supports data independence (logical and physical)
- Multiple user views of same data
- Security through view-level access control
Include diagram showing three levels connected by mappings, with users at top and stored database at bottom.