What are the rules for table design heuristics for relational models?

by Dec 19, 2019

The dictionary definition of a 'heuristic' is ‘Computers, Mathematics: pertaining to a trial-and-error method of problem-solving used when an algorithmic approach is impractical.’. That definition means that this is what you do when you do not know what to do next.

The term ‘normalization’ comes from Dr. Codd, who borrowed it from the then-current political climate in which we were trying to normalize relations with the Soviet block. The goal is to remove redundancy from a database schema and to have the schema maintain some data integrity without procedural code.

These are the goals of all databases. When we had file systems, the same data repeats itself in many different files. Consequently, there was no reasonable way to bring the files together and create a consistent model of the data. Then we got the network model and the idea of a coherent, single picture of our data.

The relational model of Dr. Codd in 1970 went one step further. It is possible to formalize the relational model, and we even got the Axioms of Armstrong and mathematics! The goal of normal forms is to avoid certain data anomalies that can occur in unnormalized tables. Data anomalies are easier to explain with examples. When Dr. Codd defined the relational model, he gave 0 to 12 rules for tables (Yes, there is a rule zero.). Some of them are important for table design heuristics so that it is good to get a feel for them.

The most important one for a working programmer is The Information Rule. That rule requires one to represent all information in the database in one and only one way, namely by scalar values in columns within rows of tables.

The problem is people do not know what ‘scalar’ means. Consequently, they want to use structured data in the form of arrays and CSV lists. No! They did not get a class on scales and measurements. It is how they handled data in an old FORTRAN or COBOL program, so why should it change now?

The third rule of Dr. Codd is the ‘Systematic Treatment of NULL Values’ in the data model. SQL has a NULL that is used for both missing information and inapplicable information. However, you have to decide what each a NULL means as part of the data model. The heuristic is to declare every column as NOT NULL, then go back and deliberately choose to make it NULL-able.

Read the 23-page whitepaper “Normalization Heuristics by Joe Celko to learn about the rules for table design heuristics for relational models.

Click here to read the whitepaper.

The author, Joe Celko, serves as Member of Technical Advisory Board of Cogito, Inc. Mr. Celko joined the ANSI X3H2 Database Standards Committee in 1987 and helped write the ANSI/ISO SQL-89 and SQL-92 standards. He is one of the top SQL experts in the world, writing over 700 articles primarily on SQL and database topics in the computer trade and academic press. The author of six books on databases and SQL, Mr. Celko also contributes his time as a speaker and instructor at universities, trade conferences, and local user groups.