Database Normalisation

In March I have been concentrating on learning more about Databases. I got a couple of books (7 Databases in 7 weeks by Pragmatic Programming, and SQL and Relational Theory by CJ Date), I got the Lynda program I spoke of previously. I’ve also been doing some coding tutorials online like Learn SQL the Hard Way and SQL Zoo.

As many of my programming friends, colleagues and mentors predicted – the textbooks were my least favourite approach. In 7 Databases, I could only get through the intro and the chapters on Postgres before my eyes glazed over. The Relational Theory one I hope to read more of in the future, but I couldn’t get into it right now – I’m feeling like learning more practical skills for now, rather than theory. I wanted to know how to understand more about the database I’m using every day at work.

The Lynda program, however, was excellent -it gave me the introduction I needed, plus some stuff I didn’t even know I needed to know – e.g. Database Normalisation. Database normalisation seems to be the process of refining and improving your database. This is where you remove any duplications, separate out things which should be their own table, and just generally improve the database so that it is more sensible and easier to grow with. There are 3 main steps to go through to try and improve your database (apparently there are 6 normal forms, but the first 3 are the most important).

Note that normalisation is about removing redundancy, but it might not always be good for performance. Sometimes you might want to de-normalise for performance reasons.

First normal form – no repeating values or repeating groups

There should never be columns that have a variable number of values – i.e. 1,2 or 10. Each column in each row should have one and only one value.

e.g. if they have multiple phone numbers, a phone number column cannot have more than one more phone number for one person.

You also can’t have repeating groups.

e.g. phone number 1 and phone number 2 columns.

If you see this happening, you need to take info out into it’s own table. e.g. phone numbers table, which might be one to many with customer.

Second Normal form -refers to composite keys

Screen Shot 2015-03-11 at 7.50.20 PM

Every other column (i.e. the ones that aren’t composite keys), should be dependent on all parts of a composite key. His example was of a class schedule, and here you can see the title column, is only dependent on the course code, the date is irrelevant.

So you might want a separate table, which has just code and course name in it.

If you’re not using composite keys, you don’t even need to worry about this one.

Third Normal form

No non-key fields is dependent on any other non-key field

To follow this rule, you need to check whether any column value can give you an idea of what the another column value is?

e.g. room and capacity – they don’t need to be stored in the same table, because one room will always have a set capacity.

Another example is quantity, price and total. You don’t need total in there. If you want to, then maybe you should save it as a computed amount.


Leave a reply