Sunday, November 6, 2011

Database Normalization: What to test for Second Normal Form?

In the last post, Database Normalization: What to test for First Normal Form?, I explained database normalization briefly. You also saw the tests that should be executed to check the first normal form (1NF). In this post, let us understand the second normal form (2NF) and the tests that need to be executed for checking it. View my video on Second Normal Form with examples or read on...
A table that is in 2NF is already in 1NF. Additionally, each column that is not part of any candidate key depends on the entire candidate key (and not a part of it). Now, let us see the examples of some tables that are not in 2NF and how to convert them to 2NF.

a. Order (OrderNumber, CustomerName, CustomerAddress, OrderTotal)
In this table, the only candidate key is OrderNumber that is unique in each row of the table. CustomerName and OrderTotal depend on the OrderNumber. However, CustomerAddress does not depend on the OrderNumber. The customer's address depends on the CustomerName value. So, this table does not satisfy the criteria for 2NF. It does not even meet a particular qualification criteria for 1NF (having only relevant columns). Such a table design may lead to data modification problems. For example, the CustomerAddress for a particular CustomerName may be updated in few rows but not in all. So, the same customer may end up with different addresses in this table.
To convert the Order table to 2NF, it needs to be broken into two tables:
Customer (CustomerName, CustomerAddress)  
Note: CustomerName is the only candidate key and CustomerAddress depends on it.
Order (OrderNumber, CustomerName, OrderTotal) 
Note: OrderNumber is the only candidate key. CustomerName depends on OrderNumber. So does OrderTotal.
Once the table is in 2NF, the duplication of CustomerAddress is avoided and data modification problems are eliminated.
b. Project (Employee, ProjectName, ProjectDuration, AllocationDuration)
A project may have one or more employees allocated to it. ProjectName is not the primary key because multiple rows have the same project name. Also, let us assume that an employee can be assigned to more than one project at a time. So, Employee is not the primary key of this table because multiple rows have the same employee in them. Allocation duration can be identical in multiple rows. However {Employee, Project, AllocationDuration} is unique and therefore, it is a candidate key. ProjectDuration column is not a part of this candidate key. ProjectDuration does not depend on the entire candidate key but only one part of it, the ProjectName column. So, this table is not in 2NF.
In order to convert the Projects table to 2NF, it needs to broken into the relevant tables:
Project (ProjectName, ProjectDuration)
Note: ProjectName is the only candidate key. ProjectDuration depends on ProjectName.
ProjectAllocation (Employee, ProjectName, AllocationDuration) 
Note: Here the candidate key is {Employee, ProjectName, AllocationDuration}. There is no non-key column.

With the above understanding in mind, here the tests that should be applied to check 2NF on every table in the database:
1. Is each criteria of 1NF satisfied?
2. What are the candidate keys? For each candidate key, what columns are not a part of the candidate key? Is each such column dependent on the entire candidate key?

Want to learn more? See my video on 2NF for more explanation.