Data Modeling Remains Relevant

There are those who think data modeling has become passé. The belief seems to be that because data modeling theory is over thirty years old, and some data modeling tools have been around for ten to twenty years, data modeling is somehow no longer relevant. Nothing could be further from the truth. In fact, data modeling may be more necessary now than ever before.


While there are other modeling techniques and notations, such as BPM (Business Process Modeling) and UML (Unified Modeling Language), the need to accurately capture business data requirements and transform them into a reliable database structural design is as critical as ever. The key differentiator is that data modeling is the only technique and notation focused on “data at rest”; all the others tend to focus on “data in motion”. Put another way, data modeling concentrates on the issues that lead to a solid database design, while the others concentrate on issues that lead to better application design, or on the things most useful to programmers, such as data structures, objects, classes, methods, and application code generation.


I’ve personally been involved as an expert witness in several trials where plaintiffs sued defendants for substantial damages over custom database applications that had performance and/or data accuracy problems. In every single case there had been a failure to data model the business requirements, and data effectiveness suffered as a result. Moreover, ad hoc database design, or design driven by more programmatically oriented techniques and tools, often produced an inefficient database design, and no amount of coding trickery could overcome it. In every case, the plaintiff won.


The other reason data modeling has seen a measurable resurgence is the data warehousing phenomenon. With storage now cheap, most companies can afford to retain, and benefit from retaining, historical aggregate and/or summary data for making significant strategic decisions. With numerous legacy OLTP (Online Transaction Processing) systems as sources, there are two key ways to approach populating a data warehouse: directly from source to warehouse (Figure 1), or through an intermediary database often referred to as an ODS, or Operational Data Store (Figure 2).

[Figure 1: populating the data warehouse directly from the OLTP sources]

[Figure 2: populating the data warehouse through an intermediary ODS]

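To make the two population paths concrete, here is a minimal Python sketch using in-memory SQLite as a stand-in for the real source, ODS, and warehouse databases. All table and column names (orders, ods_orders, sales_summary) are invented for illustration only.

```python
import sqlite3  # in-memory stand-in for real OLTP, ODS, and DW databases

# Toy setup so the sketch runs end to end; every name here is invented.
src = sqlite3.connect(":memory:")   # legacy OLTP source
ods = sqlite3.connect(":memory:")   # Operational Data Store
dw  = sqlite3.connect(":memory:")   # data warehouse

src.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "EAST", 100.0), (2, "EAST", 50.0), (3, "WEST", 75.0)])
ods.execute("CREATE TABLE ods_orders (order_id INTEGER, region TEXT, amount REAL)")
dw.execute("CREATE TABLE sales_summary (region TEXT, total_amount REAL)")


def load_direct(source, warehouse):
    """Figure 1 style: aggregate straight from the OLTP source into the DW."""
    rows = source.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()
    warehouse.executemany(
        "INSERT INTO sales_summary (region, total_amount) VALUES (?, ?)", rows)
    warehouse.commit()


def load_via_ods(source, staging, warehouse):
    """Figure 2 style: stage detail rows in the ODS first, then aggregate."""
    detail = source.execute(
        "SELECT order_id, region, amount FROM orders").fetchall()
    staging.executemany("INSERT INTO ods_orders VALUES (?, ?, ?)", detail)
    staging.commit()
    rows = staging.execute(
        "SELECT region, SUM(amount) FROM ods_orders GROUP BY region").fetchall()
    warehouse.executemany(
        "INSERT INTO sales_summary (region, total_amount) VALUES (?, ?)", rows)
    warehouse.commit()


load_direct(src, dw)                     # or: load_via_ods(src, ods, dw)
print(dw.execute("SELECT * FROM sales_summary").fetchall())
# e.g. [('EAST', 150.0), ('WEST', 75.0)]
```
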
There is plenty of debate as to which approach is superior, and I won’t address that here. Regardless of which approach one utilizes, however, the database design (i.e. the data at rest) is paramount, because in a data warehouse the data itself, and the business information it contains, is the most relevant and valuable asset. Typical data warehouse queries and reports, issued via BI (Business Intelligence) tools, process that asset to yield key strategic decision-making results.


The other key area where data modeling supports the whole data warehousing and business intelligence effort is the mapping of legacy data fields to their DW and BI counterparts. This metadata mapping, which records how front-line business data maps to the data warehouse, helps with the design of queries and reports as well as with the ETL (Extract, Transform, and Load) programming efforts. Without such a mapping, there is no automatic tie between the evolving OLTP legacy systems and the dependent data warehousing information; one would have to re-engineer almost from scratch rather than simply follow the ramifications of OLTP source changes downstream to the DW and BI endpoints.
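
As a rough illustration of what such mapping metadata might look like, the following Python sketch records a few legacy-field-to-warehouse mappings and shows how they support downstream impact analysis. Every system, table, and column name below is a hypothetical example, not drawn from any real tool.

```python
# Hypothetical source-to-target mapping metadata. Each entry records which
# legacy OLTP field feeds which warehouse column, and via which ETL rule.
FIELD_MAPPINGS = [
    # (source field,            ETL transformation rule,  DW target column)
    ("billing.CUST.CUST_NM",    "trim; uppercase",        "DW.DIM_CUSTOMER.CUSTOMER_NAME"),
    ("billing.CUST.REGION_CD",  "decode via REGION_LKP",  "DW.DIM_CUSTOMER.REGION_NAME"),
    ("orders.ORD.ORD_AMT",      "sum by customer, month", "DW.FACT_SALES.TOTAL_AMOUNT"),
]


def impact_of(source_field):
    """Trace a legacy field downstream: which DW columns (and thus which BI
    reports built on them) must be revisited if this field changes?"""
    return [target for source, _rule, target in FIELD_MAPPINGS
            if source == source_field]


# If billing.CUST.REGION_CD is re-coded in the source system, the mapping
# shows exactly where the change ripples in the warehouse:
print(impact_of("billing.CUST.REGION_CD"))
# -> ['DW.DIM_CUSTOMER.REGION_NAME']
```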


For those not involved with data warehousing projects, perhaps doing more traditional OLTP-style systems development, data modeling is no less important. But people often get so caught up in newer paradigms such as “extreme programming”, “agile software development” or “scrum” that they compromise, or even entirely skip, data modeling. The problem is that these approaches don’t always spell out exactly how data modeling should be incorporated, so people often skip doing it. My belief is that no matter which latest-and-greatest approach one utilizes, data modeling should be integrated into the development process wherever it makes sense. Figure 3 shows how both conceptual and physical data modeling should fit into an overall database design process, whether for a totally new system, or one that’s being updated or even totally re-engineered.

[Figure 3: where conceptual and physical data modeling fit in the overall database design process]

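To illustrate the conceptual-to-physical transition Figure 3 describes, here is a tiny hypothetical example: the conceptual model captures entities and relationships with no database specifics, while its physical counterpart commits to tables, data types, and keys. The Customer/Order model and the SQL types shown are assumptions for illustration only.

```python
# Conceptual level: business entities and relationships, no database details.
# The Customer/Order model is invented purely for illustration.
conceptual_model = {
    "entities": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order", "one-to-many")],
}

# Physical level: one possible realization of that model, committing to
# tables, data types, and keys (ANSI-flavored SQL; types are assumptions).
physical_ddl = """
CREATE TABLE customer (
    customer_id   INTEGER      NOT NULL PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL
);

CREATE TABLE customer_order (
    order_id    INTEGER NOT NULL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    order_date  DATE    NOT NULL
);
"""

print(physical_ddl)
```
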
There’s one final reason why data modeling has been getting more attention these days: in many cases organizations are finally requiring data models as a “sign-off” deliverable in the development process. I attribute this to organizations attempting to adhere to the Software Engineering Institute’s (SEI’s) Capability Maturity Model (CMM) and Capability Maturity Model Integration (CMMI) concepts. The idea is quite simple: to mature one’s development process, regardless of technique, you need to mature both the processes and the tools used to achieve the desired end result. For example, the proper and effective use of project management tools is often cited as the single best way to move from Level 1 (Initial, i.e. chaos) to Level 2 (Repeatable). The central idea that both processes and tools lead to maturity has helped reinvigorate many people’s interest in data modeling.


Now comes the hard part: which data modeling tool should one use? That might seem like a tough or loaded question. There are numerous data modeling tools out there, plus many enterprise modeling suites that contain data modeling capabilities. Rather than advocate any particular tool, I’m going to outline some basic guidelines, including things to avoid. My belief is that any tool meeting some standard, minimal requirements will enable one to produce effective and efficient data models, and hence effective and efficient databases.


  1. Avoid drawing tools that merely aspire to be data modeling tools. A good data modeling tool supports defining extensive metadata that has business relevance. Think of the diagram as just the tip of the iceberg: 90% of the berg’s mass is underwater, where you can’t see it. The same is true of a data model. If you concentrate on just the picture, you’ll probably compromise the effectiveness of the resulting database.
  2. Choose a tool that fits your needs. People often purchase a killer modeling suite that offers everything imaginable. But if all you need, or will use, is the data modeling portion, then why pay for more? The key concern is that the more a tool does besides data modeling, the greater the chance its data modeling capabilities were compromised in order to do everything else. Sometimes more is not better.
  3. Verify that the data modeling tool you choose generates solid DDL, especially for your target database platform. This is another case where more might not be better: a tool that produces 100% accurate CREATE and ALTER scripts for the few databases important to you beats one that covers them all at a lesser level. But be very careful; even tools that focus on a few databases can generate less-than-optimal DDL. You have to know what to look for, so engage your DBA in making the decision, just to be safe.
  4. Verify that your data modeling tool provides robust model consistency and accuracy checking reports and/or utilities. As data models grow (and they will), it can be quite overwhelming to check everything manually, and you cannot expect the poor DBA to sanity-check the thousands or tens of thousands of DDL lines a data modeling tool can quickly generate. Remember, we talked above about both effectiveness and efficiency: effectiveness is mostly on your shoulders, but efficiency can be aided by good model-checking utilities, of the kind sketched after this list.
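
As a minimal sketch of the kind of automated consistency checking point 4 refers to, consider the following Python fragment. The model representation and the two rules checked (every table has a primary key; foreign keys reference real tables and columns) are simplified assumptions; real tools check far more, such as naming standards, orphan entities, and domain usage.

```python
# Toy model repository; the tables and the deliberately missing primary key
# are invented for illustration.
model = {
    "customer": {
        "columns": ["customer_id", "customer_name"],
        "primary_key": ["customer_id"],
        "foreign_keys": [],
    },
    "customer_order": {
        "columns": ["order_id", "customer_id", "order_date"],
        "primary_key": [],  # oops: no primary key defined
        "foreign_keys": [("customer_id", "customer", "customer_id")],
    },
}


def check_model(model):
    """Return human-readable findings instead of making the DBA eyeball DDL."""
    findings = []
    for table, spec in model.items():
        if not spec["primary_key"]:
            findings.append(f"{table}: no primary key defined")
        for col, ref_table, ref_col in spec["foreign_keys"]:
            if col not in spec["columns"]:
                findings.append(f"{table}: FK column {col} does not exist")
            if ref_table not in model:
                findings.append(f"{table}: FK references unknown table {ref_table}")
            elif ref_col not in model[ref_table]["columns"]:
                findings.append(f"{table}: FK references unknown column "
                                f"{ref_table}.{ref_col}")
    return findings


for finding in check_model(model):
    print(finding)
# -> customer_order: no primary key defined
```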


In closing, data modeling has come a long way since its inception. Even though the heyday of CASE and software engineering passed with the ’90s, the need for and usefulness of data models has not subsided, and data modeling can assist with any effort regardless of development methodology or paradigm. So don’t pass up data modeling just because it’s a mature technique. You may well be sorry if you do.