There are those who think that data modeling has become passé these days. The belief is that because data modeling theory is over thirty years old, and because some data modeling tools have been around for 10-20 years, data modeling is somehow no longer relevant. Nothing could be further from the truth. In fact, data modeling may now be more necessary than ever.
While there are other modeling techniques and notations, such as BPM (Business Process Modeling) and UML (Unified Modeling Language), the need to accurately capture business data requirements and transform them into a reliable database structural design is as paramount as ever. The key differentiator is that data modeling is the only technique and notation that focuses on the “data at rest”; all the others tend to focus on “data in motion”. Put another way, data modeling concentrates on issues that lead to a solid database design, while the others focus on issues that result in better application design, or on things useful to programmers such as data structures, objects, classes, methods, and application code generation.
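To make the “data at rest” idea concrete, here is a minimal, purely illustrative sketch (table and column names are hypothetical) using Python’s built-in sqlite3 module. The modeled structure is the asset; the code that moves data through it is incidental:

```python
import sqlite3

# A toy physical design derived from a modeled "Customer" entity.
# The table structure is the "data at rest" a data model captures,
# independent of any application code that moves data through it.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        region      TEXT
    )
""")

# Programs (the "data in motion") come and go; the modeled structure stays.
cols = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(cols)  # → ['customer_id', 'name', 'region']
```

The point of the sketch is only that the schema itself, not any particular program, is what the data model is responsible for getting right.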
I’ve personally been involved in several trials as an expert witness where plaintiffs were suing defendants for substantial financial damages because custom database applications had performance and/or data accuracy problems. In every single case there was a failure to data model the business requirements, and data quality suffered as a result. Moreover, ad-hoc database design, or design using more programmatically oriented techniques and tools, often resulted in an inefficient database structure, and no amount of coding trickery could overcome the resulting bad design. So in every case, the plaintiff won.
The other reason data modeling has seen a measurable resurgence is the data warehousing phenomenon. With storage now cheap, most companies can afford to, and benefit from, retaining historical aggregate and/or summary data for making significant strategic decisions. With numerous legacy OLTP (Online Transaction Processing) systems as sources, there are two key ways to approach populating a data warehouse: directly from source to warehouse (Figure 1), or through an intermediary database often referred to as an ODS, or Operational Data Store (Figure 2).
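The two approaches can be sketched in a few lines of Python. This is a deliberately simplified illustration, with in-memory lists standing in for real databases and all names invented for the example:

```python
# Two hypothetical legacy OLTP sources feeding a warehouse.
source_a = [{"cust_id": 1, "region": "East", "amount": 120.0}]
source_b = [{"cust_id": 2, "region": "West", "amount": 75.5}]

def load_direct(sources):
    """Figure 1 style: each source feeds the warehouse directly."""
    warehouse = []
    for src in sources:
        # Per-source transform/load logic would live here, repeated per source.
        warehouse.extend(src)
    return warehouse

def load_via_ods(sources):
    """Figure 2 style: consolidate into an ODS first, then load the warehouse."""
    ods = []
    for src in sources:
        # Integration and cleansing happen once, in the ODS.
        ods.extend(src)
    # The warehouse then loads from a single, consistent store.
    return [dict(row) for row in ods]

warehouse = load_via_ods([source_a, source_b])
print(len(warehouse))  # → 2
```

The trade-off the sketch hints at: the direct route repeats integration logic per source, while the ODS route centralizes it at the cost of an extra hop.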
There is already ample debate as to which approach is superior, and I won’t address that here. However, regardless of which approach one decides to utilize, the database design (i.e. the data at rest) is paramount, because in a data warehouse the data itself, and the business information it contains, is the most relevant and valuable asset. Typical data warehouse queries and reports issued via BI (Business Intelligence) tools process that asset to yield key strategic decision-making results.
The other key area where data modeling supports the whole data warehousing and business intelligence effort is the mapping of legacy data fields to their DW and BI counterparts. This metadata mapping of how front-line business data flows into the data warehouse helps with the design of queries and reports, as well as with the ETL (Extract, Transform, and Load) programming efforts. Without such a mapping, there would be no automatic tie to the dependent data warehousing information as the OLTP legacy systems evolve. One would have to re-engineer almost from scratch rather than simply follow the ripples of an OLTP source data change downstream to the DW and BI end points.
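Such a mapping can be as simple as a table of source-to-target entries that both the ETL code and impact analysis consume. A minimal sketch, with every field and transform name hypothetical:

```python
# Hypothetical field-mapping metadata: each entry ties a legacy OLTP
# column to its data-warehouse counterpart and the transform applied.
field_map = [
    {"source": "orders.ord_dt", "target": "dw.fact_sales.order_date", "transform": "to_iso_date"},
    {"source": "orders.amt",    "target": "dw.fact_sales.net_amount", "transform": "cents_to_dollars"},
    {"source": "cust.cust_nm",  "target": "dw.dim_customer.name",     "transform": "trim_upper"},
]

def downstream_impact(source_field, mappings):
    """Given a changed OLTP column, list the DW fields it ripples into."""
    return [m["target"] for m in mappings if m["source"] == source_field]

# If the legacy 'orders.amt' column changes, follow the ripple downstream:
print(downstream_impact("orders.amt", field_map))  # → ['dw.fact_sales.net_amount']
```

With the mapping held as data rather than buried in ETL code, the same entries can drive both the load programs and the “what breaks if this column changes?” question.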
For those not involved with data warehousing projects, perhaps doing more traditional OLTP-style systems development, data modeling is no less important. But people often get so caught up in novel paradigms such as “extreme programming”, “agile software development” or “scrum” that they compromise, or even entirely skip, data modeling. The problem is that these new approaches don’t always spell out exactly how data modeling should be incorporated, so people often skip doing it. My belief is that no matter what latest and greatest approach one utilizes, data modeling should be integrated into the development process wherever it makes sense. Figure 3 shows how both conceptual and physical data modeling fit into an overall database design process, whether for a totally new system, one that’s simply being updated, or one being totally re-engineered.
There’s one final reason why data modeling has been getting more attention these days: in many cases organizations are finally requiring data models as a “sign-off” deliverable in the development process. I attribute this to organizations attempting to adhere to the Software Engineering Institute’s (SEI’s) Capability Maturity Model (CMM) and Capability Maturity Model Integration (CMMI) concepts. The idea is quite simple: in order to mature one’s development process, regardless of technique, you need to mature both the processes and the tools used to achieve the desired end result. For example, the proper and effective use of project management tools is often cited as the single best way to move from Level 1 (Initial, i.e. chaotic) to Level 2 (Repeatable). The central idea that both processes and tools can lead to maturity has helped reinvigorate many people’s interest in data modeling.
Now comes the hard part: what data modeling tool should one use? That might seem like a tough or loaded question. There are numerous data modeling tools out there, plus many enterprise modeling suites that contain data modeling capabilities. Rather than recommend any particular tool, I’m going to outline some basic guidelines on what to avoid. My belief is that any tool meeting some standard, minimal requirements will enable one to produce effective and efficient data models, and hence effective and efficient databases.
In closing, data modeling has come a long way since its inception. Even though the heydays of CASE and software engineering passed with the ’90s, the need for and usefulness of data models has not subsided, and data modeling can assist with any effort regardless of development methodology or paradigm. So don’t pass on data modeling these days just because it’s a mature technique. You might well be very sorry if you do.