I

Database System Concepts & Architecture

Introduction to DBMS, Architecture, and Applications

1. Introduction to Database Systems

What is a Database?

A database is a collection of related data organized in a way that data can be easily accessed, managed, and updated. It represents some aspect of the real world (called the mini-world or Universe of Discourse).

What is a Database Management System (DBMS)?

A DBMS is a software system that enables users to define, create, maintain, and control access to the database. It provides an interface between the data and the software applications or end users.

Key Functions of DBMS

📝

Define

Specifying the types, structures, and constraints of the data to be stored in the database.

🔨

Construct

The process of storing the data on some storage medium that is controlled by the DBMS.

🔄

Manipulate

Functions such as querying the database, updating the database, and generating reports.

🤝

Share

Allowing multiple users and programs to access the database simultaneously.

2. Key Definitions

Term Definition Example
Data Raw, unprocessed facts that have no meaning on their own "Prathamesh", "Navi Mumbai", "12/12/2000"
Information Processed data that has meaning and context "The age of Prathamesh is 24 years"
Database Organized collection of related data Online Railway Reservation System
DBMS Software to manage databases MySQL, Oracle, PostgreSQL
Schema Logical structure/blueprint of the database Table definitions, relationships
Instance Actual content/data at a particular point in time Current rows in tables

Example: University Database

A university database application may include:

  • Add new students, instructors, and courses
  • Register students for courses and generate class rosters
  • Assign grades to students, compute grade point averages (GPA)
  • Generate transcripts and reports

3. Characteristics of Database Systems

1

Self-Describing Nature

A DBMS catalog stores the description of the database (called metadata). This allows the DBMS software to work with different databases without modification.

2

Program-Data Independence

The structure of data files is stored separately from access programs. Changes to data structure don't require changes to programs.

3

Data Abstraction

DBMS provides a conceptual representation of data that hides storage details. Users can view data without worrying about how it's stored.

4

Multiple Views of Data

Different users can have different views of the same database based on their requirements and permissions.

5

Sharing of Data and Multi-user Transaction Processing

DBMS allows multiple users to access the database concurrently while maintaining data integrity through concurrency control.

4. File System vs Database System

Why Not Use File Systems?

In the early days, database applications were built directly on top of file systems. This approach had significant drawbacks that led to the development of DBMS.

Drawbacks of File Systems

1. Data Redundancy & Inconsistency

Multiple file formats, duplication of information in different files leads to wastage of storage and inconsistent data.

2. Difficulty in Accessing Data

Need to write a new program to carry out each new task. No flexible query capabilities.

3. Data Isolation

Multiple files and formats make data retrieval complex. Data is scattered across various files.

4. Integrity Problems

Integrity constraints (e.g., account balance > 0) become "buried" in program code. Hard to add or change constraints.

5. Atomicity Problems

Failures may leave data in an inconsistent state with partial updates. Example: Transfer funds between accounts should complete fully or not at all.

6. Concurrent Access Anomalies

Uncontrolled concurrent accesses can lead to inconsistencies. Example: Two people withdrawing from the same account simultaneously.

7. Security Problems

Hard to provide user access to some, but not all, data. No centralized security control.

Comparison Table

Feature File System DBMS
Data Redundancy High redundancy Controlled/Minimal
Data Consistency Difficult to maintain Ensured through constraints
Data Independence Not supported Fully supported
Concurrent Access Limited/Problematic Well-managed
Security Limited, file-level Comprehensive, user-level
Backup & Recovery Manual, complex Automated, reliable
Query Capability Requires programming SQL queries
Integrity Constraints Embedded in programs Defined in schema

5. Data Abstraction

What is Data Abstraction?

Data abstraction is the process of hiding the complexity of data storage from users while providing a simplified interface. It allows users to interact with data without knowing the underlying implementation details.

Levels of Abstraction

Three Levels of Data Abstraction

View Level (External)
Logical Level (Conceptual)
Physical Level (Internal)
👁️

View Level

Highest level - describes only part of the entire database. Different users have different views based on their needs. Hides complexity and provides security.

Example: A student view shows grades and courses; a payroll view shows salary information.

📋

Logical Level

Middle level - describes what data is stored and relationships among data. This is where the DBA works to define the schema.

type instructor = record ID : string; name : string; dept_name : string; salary : integer; end;
💾

Physical Level

Lowest level - describes how data is actually stored. Deals with complex data structures, file organizations, access paths, and storage allocation.

Example: B-trees, hash files, block allocations, indexes.

6. Data Independence

What is Data Independence?

Data independence is the capacity to change the schema at one level of the database system without having to change the schema at the next higher level. It is a key advantage of the three-schema architecture.

🔧

Physical Data Independence

The capacity to change the internal schema without having to change the conceptual schema.

Examples:

  • Creating a new index on the Salary column to speed up queries
  • Moving the database file from a magnetic disk to an SSD
  • Changing file organization from sequential to indexed
  • Changing storage structures or access methods

Note: These changes should not require any alteration to the logical definition of the data or application programs.

📐

Logical Data Independence

The capacity to change the conceptual schema without having to change the external schemas or application programs.

Examples:

  • Adding a new field (e.g., Date_of_Birth) to the Employee table
  • Adding a new entity type or relationship
  • Modifying constraints on existing data
  • Splitting or merging tables

Important: Logical data independence is harder to achieve than physical data independence because the conceptual schema is the central contract of the database.

7. DBMS System Architecture

The architecture of a database system is greatly influenced by the underlying computer system on which it runs. There are several types of architectures:

🖥️

Centralized Architecture

All components (DBMS, application programs, user interfaces) run on a single computer. Suitable for small organizations with limited users.

🔗

Client-Server Architecture

Server provides database services; clients access the server. Two-tier (client → server) or three-tier (client → application server → database server).

Parallel Architecture

Multiple processors work together to process database operations. Increases throughput and reduces response time for large databases.

🌐

Distributed Architecture

Data is distributed across multiple sites connected by a network. Each site has its own DBMS but they work together as one system.

8. Three-Schema Architecture (ANSI/SPARC)

Exam Tip

This topic appears in almost every exam (85%+ frequency). Be prepared to draw the diagram and explain each level with examples.

Three-Schema Architecture Diagram

External View 1 External View 2 External View n External Level External/Conceptual Mapping Conceptual Schema (Logical Structure) Conceptual Level Conceptual/Internal Mapping Internal Schema (Physical Storage) Internal Level Stored Database Physical Storage
👥

External Level (View Level)

  • Highest level of abstraction
  • Contains external schemas or user views
  • Each view describes part of the database
  • Different users see different views
  • Provides security by hiding data
📊

Conceptual Level (Logical Level)

  • Community view of the database
  • Describes what data is stored
  • Defines entities, data types, relationships
  • Hides physical storage details
  • DBA operates at this level
💿

Internal Level (Physical Level)

  • Lowest level of abstraction
  • Describes how data is stored
  • Deals with file organizations
  • Access paths (indexes, B-trees)
  • Optimized for performance

9. Data Models

What is a Data Model?

A data model is a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints. It provides the underlying structure of a database.

Relational Model

Uses tables (relations) to represent data and relationships. Most widely used model today.

IDNameDept
101JohnCS
102JaneIT

Entity-Relationship Model

Uses entities and relationships for conceptual database design. Primarily used during design phase before implementation.

Object-Based Data Models

Object-oriented and object-relational models. Extends relational model with object concepts like encapsulation, inheritance.

Semi-structured Data Model

Permits the specification of data where individual items may have different sets of attributes. Example: XML, JSON.

Historical Models

  • Network Model: Represents data as a graph with many-to-many relationships
  • Hierarchical Model: Represents data as a tree structure with parent-child relationships

These older models are less common today but laid the foundation for modern DBMS.

10. Database Languages

📝

Data Definition Language (DDL)

Specifies the database schema, structure, and constraints. DDL compiler generates table templates stored in the data dictionary.

CREATE TABLE instructor ( ID CHAR(5), name VARCHAR(20), dept_name VARCHAR(20), salary NUMERIC(8,2) );

DDL defines: Database schema, Integrity constraints, Primary keys, Foreign keys, Authorization

🔄

Data Manipulation Language (DML)

Used to access and manipulate data. Also known as the query language.

Two types:

  • Procedural DML: User specifies what data and how to get it
  • Declarative DML: User specifies what data without specifying how (SQL)
SELECT name FROM instructor WHERE dept_name = 'Physics';

SQL Example - Joining Tables

SELECT instructor.ID, department.building FROM instructor, department WHERE instructor.dept_name = department.dept_name AND department.dept_name = 'Physics';

This query finds the ID and building of instructors in the Physics department by joining instructor and department tables.

11. Database Administrator (DBA)

Exam Tip

DBA roles and responsibilities is a frequent 5-mark question. Learn specific technical responsibilities, not just general descriptions.

Who is a DBA?

The Database Administrator (DBA) is a person or group responsible for the overall control and management of the database system. The DBA has central authority over the database.

Responsibilities of DBA

1

Schema Definition

The DBA creates the original database schema by executing Data Definition Language (DDL) statements. This includes defining tables, relationships, and constraints.

2

Storage Structure and Access Method Definition

The DBA decides how data is to be represented in the stored database, including specific indexing techniques, file organization, and storage allocation.

3

Schema and Physical Organization Modification

The DBA carries out changes to the schema and physical organization to reflect the changing needs of the organization or to improve performance.

4

Granting Authorization for Data Access

The DBA regulates which parts of the database various users can access. This includes granting and revoking privileges to users and roles.

5

Routine Maintenance

This includes:

  • Periodic backup of the database
  • Monitoring database performance
  • Ensuring enough free disk space
  • Applying patches and upgrades
  • Recovery procedures after failures

12. Database Users

Database users are categorized based on their mode of interaction with the database system:

👨‍💻

Application Programmers

Write application programs using DML calls embedded in a host language (Java, Python, C++). They use tools like JDBC/ODBC to access the database.

🔍

Sophisticated Users

Interact with the system without writing programs. They use database query language (SQL) to form ad-hoc queries. Examples: analysts, data scientists.

📱

Naive/Parametric Users

Unsophisticated users who interact through pre-defined applications. They invoke canned transactions that were previously programmed. Examples: bank tellers, reservation clerks.

👑

Database Administrators

Have central control over the database system. Responsible for schema definition, access authorization, and routine maintenance.

13. Applications of Database Systems

Database systems are used in virtually all areas where computers are employed. Here are major application domains:

🏦

Banking Systems

Account management, transactions, loan processing, ATM operations, customer information, fraud detection.

🛒

E-Commerce

Product catalogs, shopping carts, order tracking, customer profiles, inventory management, payment processing.

📱

Telecommunications

Call records, billing systems, network management, customer service, prepaid/postpaid accounts.

🏥

Healthcare Systems

Patient records, appointments, prescriptions, medical history, billing, insurance claims, lab results.

📘

Social Media Platforms

User profiles, connections/friends, posts, messages, notifications, content recommendations.

🎓

Education Systems

Student records, course registration, grades, attendance, faculty information, library management.

✈️

Airline Reservation

Flight schedules, seat booking, passenger information, ticketing, check-in, baggage tracking.

🏛️

Government Applications

Census data, tax records, voter registration, public services, land records, licensing systems.

🏭

Manufacturing

Production tracking, inventory, supply chain, orders, quality control, equipment maintenance.

Case Study: University Database

A university database system typically includes:

Entities:
  • Students, Instructors, Courses
  • Departments, Buildings, Classrooms
  • Sections, Semesters, Time Slots
Operations:
  • Register students for courses
  • Record and calculate grades/GPA
  • Generate transcripts and reports

14. Practice Questions

Describe the overall architecture of DBMS with suitable diagram. [10 marks]

The DBMS architecture follows the Three-Schema Architecture (ANSI/SPARC) that provides three levels of abstraction:

1. External Level (View Level):

  • Highest level of abstraction, closest to users
  • Contains multiple external schemas (user views)
  • Each view describes only the portion of database relevant to a user group
  • Provides security by hiding sensitive data

2. Conceptual Level (Logical Level):

  • Community view of the entire database
  • Describes what data is stored and relationships
  • Hides physical storage details
  • Database Administrator works at this level

3. Internal Level (Physical Level):

  • Lowest level, closest to physical storage
  • Describes how data is actually stored
  • Deals with file organization, indexes, access paths
  • Optimized for performance and storage efficiency

Mappings:

  • External/Conceptual Mapping: Transforms user views to conceptual schema
  • Conceptual/Internal Mapping: Transforms logical to physical representation

Note: Include a diagram showing the three levels with mappings between them.

Explain the concept of data independence. Discuss the differences between logical and physical data independence. [5 marks]

Data Independence is the capacity to change the schema at one level without having to change the schema at the next higher level.

Physical Data Independence:

  • Ability to change internal schema without changing conceptual schema
  • Examples: Creating indexes, changing file organization, moving to SSD
  • Application programs remain unaffected
  • Easier to achieve

Logical Data Independence:

  • Ability to change conceptual schema without changing external schemas
  • Examples: Adding new attributes, adding new entities
  • Existing applications continue to work
  • Harder to achieve (conceptual schema is central)

Differentiate between Database Management System and File System. [5 marks]

AspectFile SystemDBMS
Data RedundancyHigh - data duplicatedControlled/Minimal
Data ConsistencyHard to maintainEnsured through constraints
Data IndependenceNot supportedSupported (logical & physical)
Concurrent AccessLimited, problematicWell-managed with locking
SecurityFile-level onlyComprehensive, user-level
Backup/RecoveryManual, complexAutomated, reliable
Query CapabilityRequires programmingSQL queries supported
IntegrityIn application codeDefined in schema

Explain responsibilities of DBA. [5 marks]

The Database Administrator (DBA) is responsible for:

1. Schema Definition: Creating the original database schema using DDL statements.

2. Storage Structure and Access Method Definition: Deciding how data is represented, indexing techniques, and file organization.

3. Schema and Physical Organization Modification: Making changes to reflect changing needs and improve performance.

4. Granting Authorization: Regulating which users can access what data, managing privileges.

5. Routine Maintenance: Periodic backups, monitoring performance, ensuring disk space, applying patches, recovery procedures.

Identify different types of users of database management system. [5 marks]

1. Application Programmers:

  • Write application programs using DML calls
  • Use host languages with embedded SQL
  • Use tools like JDBC/ODBC

2. Sophisticated/Casual Users:

  • Interact without writing programs
  • Use query languages (SQL) directly
  • Form ad-hoc queries as needed

3. Naive/Parametric End Users:

  • Use pre-defined applications
  • Invoke canned transactions
  • Examples: bank tellers, reservation clerks

4. Database Administrators:

  • Central control over database
  • Schema definition and modification
  • Security and maintenance

Explain three-level schema architecture with suitable diagram. [10 marks]

The Three-Level Schema Architecture (ANSI/SPARC architecture) separates user applications from the physical database:

External Level (View Level):

  • Highest level of abstraction
  • Multiple external schemas for different user groups
  • Each describes relevant portion of database
  • Provides security through data hiding
  • Example: Student view shows grades; HR view shows salary

Conceptual Level (Logical Level):

  • Community view of entire database
  • Describes entities, data types, relationships, constraints
  • Hides physical storage details
  • DBA defines schema at this level

Internal Level (Physical Level):

  • Lowest level of abstraction
  • Describes physical storage structure
  • Deals with file organization, indexes, access paths
  • Optimized for performance

Benefits:

  • Supports data independence (logical and physical)
  • Multiple user views of same data
  • Security through view-level access control

Include diagram showing three levels connected by mappings, with users at top and stored database at bottom.