Dr. Akhilesh Das Gupta Institute of Technology & Management
(Formerly Northern India Engineering College) | (A Unit of BBD Group)
Approved by AICTE and Affiliated with GGSIP University, New Delhi
FC-26, Shastri Park, New Delhi – 110 053

Assignment No. 3 & 4
Unit 3 & 4
Subject Name: Database Modelling and Design
Subject Code: CIE - 316
Issue Date: 16 / 04 / 2026
Due Date: 22 / 04 / 2026

Topics Covered: Physical DB Design · DDL / DML / DCL · SQL Advanced · Cursors · Exceptions · Triggers · Procedures · Functions · DB Tuning · Clustering & Indexing · DB Security · Denormalisation
Q1. Physical Database Design

Physical Database Design is the process of converting a logical data model into a physical schema that can be implemented on a specific DBMS. It deals with how data is actually stored, organized, and accessed on hardware.

Key Activities:
  • Selecting storage structures (heap, clustered, hash)
  • Choosing file organizations for tables
  • Defining indexes to speed up queries
  • Partitioning large tables horizontally or vertically
  • Choosing storage engine (InnoDB, MyISAM, etc.)
  • Deciding on buffer pool / cache sizes
Example:

Consider a Students table with millions of rows. Physical design decisions:

CREATE TABLE Students (
    student_id INT PRIMARY KEY,
    name       VARCHAR(100),
    dept       VARCHAR(50),
    dob        DATE
) ENGINE = InnoDB;

-- Secondary index on dept for fast dept-wise queries
CREATE INDEX idx_dept ON Students(dept);

-- Clustered on student_id (primary key = clustered index in InnoDB)
-- Large tables can additionally be partitioned (e.g., by dept) to split the data

Here, InnoDB stores rows in B+ tree order of student_id. The secondary index on dept speeds up queries like "find all CS students" without full table scan.

★ Physical design ≠ Logical design. Logical = what data; Physical = where & how it's stored.
Q2. DDL, DML and DCL

SQL commands are classified into sub-languages based on their purpose:

DDL — Data Definition Language

Used to define and modify database structure (schema). Changes are auto-committed.

  • CREATE — create table/db/view/index
  • ALTER — modify existing structure
  • DROP — delete schema object
  • TRUNCATE — remove all rows (no rollback)
  • RENAME — rename an object
CREATE TABLE Emp (id INT, name VARCHAR(50));
ALTER TABLE Emp ADD salary DECIMAL(10,2);
DROP TABLE Emp;
DML — Data Manipulation Language

Used to insert, update, delete, and retrieve data. Can be rolled back.

  • SELECT — query/read data
  • INSERT — add new rows
  • UPDATE — modify existing rows
  • DELETE — remove rows
INSERT INTO Emp VALUES (1, 'Alice', 50000);
UPDATE Emp SET salary = 60000 WHERE id = 1;
DELETE FROM Emp WHERE id = 1;
DCL — Data Control Language

Used to control access permissions to database objects.

  • GRANT — give privileges to a user
  • REVOKE — take back privileges
GRANT SELECT, INSERT ON Emp TO 'alice'@'localhost';
REVOKE INSERT ON Emp FROM 'alice'@'localhost';
Feature  | DDL                 | DML                            | DCL
Purpose  | Schema definition   | Data manipulation              | Access control
Rollback | No (auto-commit)    | Yes                            | No
Commands | CREATE, ALTER, DROP | SELECT, INSERT, UPDATE, DELETE | GRANT, REVOKE
Q3. Advanced Manipulation Using SQL
1. Joins

Combine rows from two or more tables based on related columns.

-- INNER JOIN: rows matching in both tables
SELECT e.name, d.dept_name
FROM Emp e
INNER JOIN Dept d ON e.dept_id = d.id;

-- LEFT JOIN: all rows from the left table, matched rows from the right
SELECT e.name, d.dept_name
FROM Emp e
LEFT JOIN Dept d ON e.dept_id = d.id;
2. Subqueries
-- Employees earning above the average salary
SELECT name FROM Emp
WHERE salary > (SELECT AVG(salary) FROM Emp);
3. Aggregate Functions & GROUP BY
SELECT dept, COUNT(*), AVG(salary), MAX(salary)
FROM Emp
GROUP BY dept
HAVING AVG(salary) > 40000;
4. Views

A virtual table based on a SELECT query.

CREATE VIEW HighPaid AS
SELECT name, salary FROM Emp WHERE salary > 70000;
5. Window Functions

Perform calculations across rows related to the current row without collapsing them into a single row.

SELECT name, salary,
       RANK()       OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk,
       ROW_NUMBER() OVER (ORDER BY salary DESC)                   AS row_num
FROM Emp;
6. CTEs (Common Table Expressions)
WITH TopEmp AS (
    SELECT * FROM Emp WHERE salary > 60000
)
SELECT name FROM TopEmp WHERE dept = 'CS';
7. CASE Expression
SELECT name,
       CASE WHEN salary > 70000 THEN 'Senior'
            WHEN salary > 40000 THEN 'Mid'
            ELSE 'Junior'
       END AS level
FROM Emp;
Q4. Cursor, Exception Handling & Triggers
Cursors

A cursor is a database object that allows row-by-row processing of a result set. Cursors are used inside stored procedures and PL/SQL blocks.

DELIMITER $$
CREATE PROCEDURE ShowSalaries()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE emp_name VARCHAR(100);
    DECLARE emp_sal DECIMAL(10,2);
    DECLARE cur CURSOR FOR SELECT name, salary FROM Emp;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    OPEN cur;
    read_loop: LOOP
        FETCH cur INTO emp_name, emp_sal;
        IF done THEN
            LEAVE read_loop;
        END IF;
        SELECT CONCAT(emp_name, ': ', emp_sal);
    END LOOP;
    CLOSE cur;
END$$
DELIMITER ;

Steps: DECLARE → OPEN → FETCH → CLOSE

Exception Handling

Handle runtime errors gracefully using DECLARE ... HANDLER in MySQL.

DECLARE CONTINUE HANDLER FOR SQLEXCEPTION
BEGIN
    -- runs when any SQL error occurs
    ROLLBACK;
    SELECT 'Error occurred!';
END;

In Oracle PL/SQL: EXCEPTION WHEN NO_DATA_FOUND THEN ...
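The same rollback-on-error pattern can be tried at the application level; a minimal sketch using Python's built-in sqlite3 module (table and data are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Emp(id INTEGER PRIMARY KEY, name TEXT)")
con.commit()  # make sure the DDL is committed before the transaction below

try:
    with con:  # transaction: commits on success, rolls back on any error
        con.execute("INSERT INTO Emp VALUES (1, 'Alice')")
        con.execute("INSERT INTO Emp VALUES (1, 'Bob')")  # duplicate PK -> IntegrityError
except sqlite3.IntegrityError:
    print("Error occurred!")

# Both inserts were rolled back, so the table is empty
print(con.execute("SELECT COUNT(*) FROM Emp").fetchone()[0])
```

Like the MySQL handler above, the except branch reacts to the error while the transaction as a whole is undone.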

Triggers

A trigger is a stored procedure that automatically fires in response to an event (INSERT / UPDATE / DELETE) on a table.

-- Log every salary update in the AuditLog table
DELIMITER $$
CREATE TRIGGER salary_audit
AFTER UPDATE ON Emp
FOR EACH ROW
BEGIN
    IF OLD.salary != NEW.salary THEN
        INSERT INTO AuditLog(emp_id, old_sal, new_sal, changed_at)
        VALUES (NEW.id, OLD.salary, NEW.salary, NOW());
    END IF;
END$$
DELIMITER ;
Type          | Timing              | Use Case
BEFORE INSERT | Before row is added | Validate/modify data
AFTER INSERT  | After row is added  | Logging, cascading changes
BEFORE UPDATE | Before update       | Prevent invalid changes
AFTER DELETE  | After deletion      | Archive deleted records
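The audit-trigger pattern can be run end to end in SQLite, whose trigger syntax differs slightly from MySQL (a WHEN clause instead of IF, no DELIMITER needed); a sketch via Python's sqlite3, with the same table names as above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Emp(id INTEGER PRIMARY KEY, name TEXT, salary REAL);
CREATE TABLE AuditLog(emp_id INT, old_sal REAL, new_sal REAL);

-- fire only when the salary actually changes
CREATE TRIGGER salary_audit AFTER UPDATE ON Emp
WHEN OLD.salary <> NEW.salary
BEGIN
    INSERT INTO AuditLog VALUES (NEW.id, OLD.salary, NEW.salary);
END;
""")
con.execute("INSERT INTO Emp VALUES (1, 'Alice', 50000)")
con.execute("UPDATE Emp SET salary = 60000 WHERE id = 1")
print(con.execute("SELECT * FROM AuditLog").fetchall())  # [(1, 50000.0, 60000.0)]
```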
Q5. Procedures and Functions
Stored Procedure

A stored procedure is a precompiled set of SQL statements stored in the DB. Called explicitly; may or may not return a value.

DELIMITER $$
CREATE PROCEDURE GetEmpByDept(IN dept_name VARCHAR(50))
BEGIN
    SELECT name, salary FROM Emp WHERE dept = dept_name;
END$$
DELIMITER ;

-- Call it:
CALL GetEmpByDept('CS');

Types of parameters: IN (input), OUT (output), INOUT (both).

DELIMITER $$
CREATE PROCEDURE GetCount(IN dept_name VARCHAR(50), OUT cnt INT)
BEGIN
    SELECT COUNT(*) INTO cnt FROM Emp WHERE dept = dept_name;
END$$
DELIMITER ;

CALL GetCount('CS', @c);
SELECT @c;  -- outputs the count
User-Defined Function (UDF)

A function always returns a single value. Can be used directly in SQL expressions.

DELIMITER $$
CREATE FUNCTION CalcBonus(sal DECIMAL(10,2))
RETURNS DECIMAL(10,2)
DETERMINISTIC
BEGIN
    DECLARE bonus DECIMAL(10,2);
    IF sal > 70000 THEN
        SET bonus = sal * 0.20;
    ELSEIF sal > 40000 THEN
        SET bonus = sal * 0.10;
    ELSE
        SET bonus = sal * 0.05;
    END IF;
    RETURN bonus;
END$$
DELIMITER ;

-- Use in a query:
SELECT name, salary, CalcBonus(salary) AS bonus FROM Emp;
Feature       | Procedure                     | Function
Returns       | 0 or more values (OUT params) | Always 1 value
Use in SELECT | No                            | Yes
DML inside    | Allowed                       | Not recommended
Called with   | CALL proc_name()              | SELECT fn_name()
Q6. Database Tuning

Database tuning is the process of improving the performance of a database system by optimizing queries, schema, indexes, and server configuration.

1. Query Tuning
  • Use EXPLAIN to analyze query execution plan
  • Avoid SELECT *; select only needed columns
  • Avoid functions on indexed columns in WHERE clause
  • Use joins instead of correlated subqueries
EXPLAIN SELECT name FROM Emp WHERE dept = 'CS';
-- Check whether the index is being used (type = 'ref' is good)
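The "no functions on indexed columns" rule can be observed directly. A sketch using SQLite's EXPLAIN QUERY PLAN via Python (the MySQL EXPLAIN output looks different, but the principle is identical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Emp(id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
con.execute("CREATE INDEX idx_dept ON Emp(dept)")

def plan(sql):
    # last column of each EXPLAIN QUERY PLAN row is the human-readable detail
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Function on the indexed column -> the index cannot be used -> full table scan
print(plan("SELECT name FROM Emp WHERE UPPER(dept) = 'CS'"))
# Bare column comparison -> the optimizer uses idx_dept
print(plan("SELECT name FROM Emp WHERE dept = 'CS'"))
```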
2. Index Tuning
  • Create indexes on frequently searched / filtered columns
  • Use composite indexes for multi-column WHERE clauses
  • Remove unused indexes (they slow down writes)
  • Use covering indexes to avoid table lookups
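A covering index can also be checked in the execution plan; a sketch in Python/SQLite (index name is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Emp(id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
# The index holds both the filter column (dept) and the selected column (name),
# so the query can be answered from the index alone, with no table lookup
con.execute("CREATE INDEX idx_dept_name ON Emp(dept, name)")

rows = con.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM Emp WHERE dept = 'CS'").fetchall()
print(rows[0][-1])  # the plan mentions a covering index
```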
3. Schema Tuning
  • Choose appropriate data types (INT vs BIGINT, CHAR vs VARCHAR)
  • Normalize to reduce redundancy; denormalize for read performance
  • Partition large tables
4. Server / Configuration Tuning
  • Increase innodb_buffer_pool_size (cache hot data)
  • Tune query_cache_size (note: the query cache was removed in MySQL 8.0)
  • Set optimal max_connections
  • Enable slow query log to find bottlenecks
5. Connection Pooling

Reuse existing DB connections instead of creating new ones for each request. Reduces overhead significantly in high-traffic apps.
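A minimal fixed-size pool can be sketched in Python, using sqlite3 as a stand-in for any DB driver (class and method names are illustrative, not a real pooling library):

```python
import queue
import sqlite3

class ConnectionPool:
    """Fixed-size pool: connections are created once up front and reused."""
    def __init__(self, db, size=4):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(db, check_same_thread=False))

    def acquire(self):
        return self._pool.get()   # blocks when every connection is in use

    def release(self, con):
        self._pool.put(con)       # return the connection for reuse

pool = ConnectionPool(":memory:")
con = pool.acquire()              # reuse instead of reconnecting per request
print(con.execute("SELECT 1").fetchone()[0])
pool.release(con)
```

Real pools (e.g., in application servers) add health checks and timeouts, but the acquire/release cycle is the core idea.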

★ Tuning is iterative: Measure → Identify bottleneck → Optimize → Repeat
Q7. Clustering and Indexing in DBMS
Clustering

Clustering means physically storing related rows together on disk based on their key values. This minimizes disk I/O for range queries.

Types of clustering:

  • Intra-file clustering — related rows of one table stored together
  • Inter-file clustering — rows from multiple related tables stored together (joined data co-located)

Example: In InnoDB, the primary key is the clustered index. Rows are stored in B+ tree order of primary key. So queries like:

SELECT * FROM Students WHERE student_id BETWEEN 100 AND 200;

are fast because those rows are contiguous on disk.

Indexing

An index is a separate data structure (usually B+ tree or hash) that speeds up data retrieval.

Type            | Description                     | Example
Primary Index   | On primary key (auto in InnoDB) | student_id
Secondary Index | On non-key columns              | Index on the dept column
Unique Index    | No duplicate values allowed     | Email field
Composite Index | Multiple columns                | (dept, salary)
Full-text Index | For text search                 | Article content
Hash Index      | Exact-match lookups only        | MEMORY tables
-- Create a composite index
CREATE INDEX idx_dept_sal ON Emp(dept, salary);

-- This query benefits from the index:
SELECT name FROM Emp WHERE dept = 'CS' AND salary > 50000;
Dense vs Sparse Index
  • Dense Index: One index entry per data record
  • Sparse Index: One entry per block (only for sorted files)
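The sparse-index idea can be sketched in Python: the index keeps only the first key of each block, so a lookup binary-searches the index and then scans a single block (block size and data are illustrative):

```python
import bisect

# Records sorted by key, stored in fixed-size "blocks" (pages)
records = [(k, f"row{k}") for k in range(0, 100, 5)]
BLOCK = 4
blocks = [records[i:i + BLOCK] for i in range(0, len(records), BLOCK)]

# Sparse index: one entry per block (the block's first key), not one per record
sparse = [blk[0][0] for blk in blocks]

def lookup(key):
    i = bisect.bisect_right(sparse, key) - 1  # binary-search for the candidate block
    if i < 0:
        return None
    for k, row in blocks[i]:                  # sequential scan inside one block
        if k == key:
            return row
    return None

print(lookup(35))  # row35
print(lookup(7))   # None (key not present)
```

Note the trade: the sparse index is much smaller than a dense one (one entry per block rather than per record), at the cost of scanning a block on each lookup, and it only works because the file is sorted.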
★ Clustered index = data physically ordered. Heap = no order. Secondary = separate structure pointing to data.
Q8. Database Security Mechanisms

Database Security refers to protecting the database from unauthorized access, corruption, or theft.

1. Access Control

Controls who can access what data. Common models:

  • DAC (Discretionary Access Control) — object owner grants permissions (SQL GRANT/REVOKE)
  • MAC (Mandatory Access Control) — system-defined labels (Top Secret, Confidential, etc.)
  • RBAC (Role-Based Access Control) — permissions assigned to roles, roles to users
-- DCL for access control
GRANT SELECT, INSERT ON Students TO 'teacher'@'localhost';
REVOKE DELETE ON Students FROM 'teacher'@'localhost';

-- Role-based (MySQL 8+)
CREATE ROLE 'reader';
GRANT SELECT ON mydb.* TO 'reader';
GRANT 'reader' TO 'alice'@'localhost';
2. Authentication

Verify identity of users. Methods: username/password, SSL certificates, Kerberos, LDAP, two-factor auth.

3. Encryption
  • Data-at-rest encryption — encrypt stored files (InnoDB tablespace encryption)
  • Data-in-transit encryption — SSL/TLS for DB connections
  • Column-level encryption — e.g. AES_ENCRYPT() for sensitive fields (note: passwords should be stored as one-way hashes, not reversibly encrypted)
4. Auditing & Logging

Track who did what and when. Enable MySQL general log / audit plugin. Store in AuditLog table.

5. SQL Injection Prevention
  • Use prepared statements / parameterized queries
  • Validate and sanitize all user inputs
  • Principle of least privilege (don't use root account in app)
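The difference between string concatenation and parameterization can be demonstrated directly; a sketch with Python's sqlite3 (table and data are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Users(name TEXT, secret TEXT)")
con.execute("INSERT INTO Users VALUES ('alice', 's3cret')")

user_input = "x' OR '1'='1"  # classic injection attempt

# UNSAFE: string concatenation lets the input rewrite the WHERE clause
unsafe_sql = f"SELECT secret FROM Users WHERE name = '{user_input}'"
print(con.execute(unsafe_sql).fetchall())   # leaks every row: [('s3cret',)]

# SAFE: a parameterized query treats the input as a plain value
safe = con.execute("SELECT secret FROM Users WHERE name = ?",
                   (user_input,)).fetchall()
print(safe)                                 # [] -- no user has that literal name
```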
6. Backup & Recovery

Regular backups (full, incremental) protect against data loss. Use point-in-time recovery with binary logs.

★ Security = Authentication + Authorization + Encryption + Auditing + Backup
Q9. Denormalization

Denormalization is the intentional introduction of redundancy into a database schema to improve read performance. It's the reverse of normalization.

Why Denormalize?

Fully normalized databases require many JOINs for queries, which are expensive for large datasets. Denormalization trades write efficiency for read speed.

Techniques
  • Storing derived data — store computed column (e.g., total_price = qty × unit_price)
  • Pre-joining tables — merge frequently joined tables into one
  • Duplicating columns — copy dept_name from Dept into Emp to avoid join
  • Materialized views — persist result of complex query
  • Partitioning / Summary tables — store aggregated data separately
Example

Normalized: Orders(order_id, cust_id) + Customers(cust_id, cust_name, city)

Denormalized: Orders(order_id, cust_id, cust_name, city) — city & name stored in Orders to avoid JOIN.

-- Before (normalized) — needs a join
SELECT o.order_id, c.cust_name
FROM Orders o
JOIN Customers c ON o.cust_id = c.cust_id;

-- After (denormalized) — no join needed
SELECT order_id, cust_name FROM Orders;
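The duplicated cust_name must be kept consistent when a customer is renamed; one common safeguard is a trigger. A sketch in Python/SQLite (schema as above; the trigger name is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers(cust_id INTEGER PRIMARY KEY, cust_name TEXT);
CREATE TABLE Orders(order_id INTEGER PRIMARY KEY, cust_id INT, cust_name TEXT);

-- propagate a rename to every denormalized copy
CREATE TRIGGER sync_cust_name AFTER UPDATE OF cust_name ON Customers
BEGIN
    UPDATE Orders SET cust_name = NEW.cust_name WHERE cust_id = NEW.cust_id;
END;
""")
con.execute("INSERT INTO Customers VALUES (1, 'Alice')")
con.execute("INSERT INTO Orders VALUES (100, 1, 'Alice')")
con.execute("UPDATE Customers SET cust_name = 'Alicia' WHERE cust_id = 1")
print(con.execute(
    "SELECT cust_name FROM Orders WHERE order_id = 100").fetchone()[0])
```

This shows why denormalization trades write efficiency for read speed: every rename now touches two tables.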
Advantages                                 | Disadvantages
Faster read queries (fewer joins)          | Data redundancy (more storage)
Simpler queries                            | Update anomalies (update in multiple places)
Better performance for reporting/analytics | Insert / delete anomalies
Reduced query complexity                   | Inconsistency risk if not carefully managed
Good for OLAP systems                      | Not suitable for OLTP with heavy writes
★ Use Normalization for OLTP (transactional). Use Denormalization for OLAP (analytical / reporting).
✦ ✦ ✦
— End of Assignment —