Thoughts on Database Design

I am most known in the industry for my abilities related to writing automated calibration procedures.  I have created projects in VB, Lab View, MET/CAL®, MUDCATS, C#, C++, and RMB, just to name a few.  But I am also a pretty skilled database developer as well, having worked in MS Access, Dbase, PostgreSQL, My SQL, and SQL Server.  I still have my Access 1.0  on 3.5” floppy disks from the 1990s because it was the first software I ever spent my hard earned money on, setting me back a whopping 99 bucks.

At the time, I was an E-4 in the US Army stationed at White Sands Missile Range, NM.  I had been tasked with writing an inventory and calibration control database for the thousands of items we had on the EMI Testing Range.  They gave me a copy of Dbase, but I hated it.  So, I purchased a copy of Access instead.

What I learned way back then was how to create a relational database.  Not all the items we had on-site were calibrated.  In fact, I would speculate less than 5% required calibration.  The problem was the contractors running the site would buy anything and everything in triplicate.  We had thousands of items: test fixtures, computers, and equipment just lying around collecting dust.  It was my job to inventory and organize all the equipment.

I started out with a pretty basic single table database I named “Assets.” Since I am a lazy programmer, I quickly changed my design because my boss wanted supply information in the database.  I got tired of typing in the National Stock Number (NSN) nomenclature and supply codes multiple times.  This is where I first learned about “is a” and “has a” rules of object design.  The idea is pretty simple: for a column or field name to belong in a table, it had to pass the “is a” test.  For example, the asset is serial number 12345, so serial number should be a field in the Assets table.  The “has a” test is a little more complex because its objective is to determine what fields should be in other tables and then related.  In this case, the asset is a manufacturer does not make sense.  So, manufacturer should be in another table.

Using the “is a” “has a” exercise in creating a database can be quite exhausting and often produces conditions that pass both tests, for example, the asset is a model number.  By all rights, you could put the model number in the asset table, but the underlying value of the “is a” “has a” rules is for better database design.

Recently, I came across a database from a large, well respected company that seemed to break all the rules of good database design.  When my customer told me they migrated their 12.5 GB database into a 60 GB database, I had to ask WTF?  Most databases I have normalized actually shrunk in data, but never have I seen a database increase 4x in size.  Something had to be wrong!

So we popped the hood and looked at the tables and structure of the database, and it was as I expected—very poor table relationship management!  But it got worse… they were trying to do four different things in the core tables.  As the application put data into a table it only used 1/4th of the fields.  That left 3/4th of the table with DB NULLs.  DB NULLs take up space; NULL data in integer, double, and fixed length string fields use the same storage space as real data.  So, the 60 GB of data in the database was mostly empty space.

Now, I know not all the readers of Automation Corner want to be database experts, but I do have some simple advice when you are talking to someone about their database.  Ask them what to you call the table and how many fields does it have?  Then think about how many things you can logically associate with an “is a” relationship to the table.  If their number is bigger than 2x your estimate, just say “Very interesting” and buy a different database.

I have done this often in the past.  I asked one company “How many fields are in your Work Order table?”  They said “Over 255.  It is a pretty complex table.”  So, I thought, Asset ID, status, log-in dates, cal dates, technician, etc. and I could get to 25 or 30 fields  (x2 for benefit of the doubt), but there is no way I can get to 255+ unless you are storing all the standards information used in the calibration in the work order table.  If that is the case, then that means the maximum number of standards I can use in a calibration is limited to their database design.  BIG PROBLEM!

That database is going to have the same problem as the first database I mentioned.  It will get slow and clunky because it has to load records with “DB NULL” in memory.  It will take more space on the hard drive and more space in memory because it was poorly designed.

Metrology databases over the next few decades will be moving into and participating with Big Data Systems.  We have tons and tons of data that could add value to larger system. But to be usable, our databases will have to be fast and reliable!

Software product names mentioned here are for identification purposes only. All trademarks and registered trademarks are the property of their respective owners.