Taming Expanding Database Tables with the Entity-Attribute-Value Pattern

Learn how to avoid having long database tables with too many columns by applying a simple data modeling technique. In this post, we'll use the Entity-Attribute-Value pattern to inject flexibility into our architecture.

Data modeling skills are a fundamental building block in software engineering, and they have a substantial influence on the quality of the code we produce. As a developer in the consulting space, I get to help and learn from clients and their different types of architectures. A few months ago, my team ran into an interesting problem where we were able to use a simple data modeling technique in our implementation. It is formally known as the Entity-Attribute-Value model, and after several iterations of building features on top of it, we are incredibly pleased with how it has allowed our application to evolve. Beyond adding flexibility, this technique also helps avoid a common data modeling anti-pattern.

The Anti-Pattern 

If you have worked on legacy systems, the chances of having encountered this are high. It rears its ugly head in the form of a ridiculously long database table with dozens (even hundreds) of columns. Also known as the "God Object" anti-pattern (a reference to the massive object you end up with in your application, holding lots of properties and responsible for too many things), it usually emerges from poor data modeling practices and years of developers carelessly dumping columns onto the same table. The scenario that produces it is quite common, and it can happen to the best of us.

How It Happens 

In the world of evolving requirements and aggressive deadlines, a database schema can quickly slip into this pattern if we do not pay attention to the warning signs. It starts as a normal object or entity with a core set of well-defined attributes. As time passes, requirements continue to dictate that we add properties to the same entity. These attributes are not what we would consider core attributes, but when they trickle in one at a time every month or so, engineers are tempted to drop each one in as a new column alongside the core attribute columns. Add a few months or years of development, and you have a table guaranteed to make any DB admin cringe. Let me show you how this would play out in a real application.

Game App Example 

Let us assume we are building a simple game application. The main characters in the game are superheroes. Our client gives us the following requirements. For a superhero entity, we must store: 

  • Civilian name 
  • Superhero name 
  • Super strength level (1-5) 
  • Does it have a cape (true/false)? 

Initial schema
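
The original diagram is not reproduced here, but as a rough sketch (the exact table and column names are assumptions), the initial table could look like this, created through Python's built-in sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table, four well-defined core attributes -- a perfectly reasonable start.
# Table and column names are illustrative, not taken from the original diagram.
conn.execute("""
    CREATE TABLE super_hero (
        id                   INTEGER PRIMARY KEY,
        civilian_name        TEXT NOT NULL,
        super_hero_name      TEXT NOT NULL,
        super_strength_level INTEGER CHECK (super_strength_level BETWEEN 1 AND 5),
        has_cape             INTEGER NOT NULL DEFAULT 0  -- boolean stored as 0/1
    )
""")
```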

We are assured by our client that this will not change, and at most there will be one or two additional attributes in the future. Fast-forward 12 iterations, and we have added several more attributes like sidekick and butler_name to the super_hero table. The majority of these apply to only a handful of use cases, so they will be null for 99% of the superhero records in the table. In mathematical terms, we have created a sparse matrix. Going back to our linear algebra fundamentals, we know that storing a sparse matrix "as-is" is extremely memory-inefficient because we have to store lots of cells with 0, null, or "N/A" values.
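
Continuing the sketch above, the drift looks like a series of harmless one-off migrations; number_of_evil_siblings comes from the example later in this post, while secret_lair_location is invented purely for illustration:

```python
# Each new requirement quietly widens the same table. After enough iterations
# of "just one more column", most cells hold NULL for most heroes.
for column in (
    "sidekick TEXT",
    "butler_name TEXT",
    "number_of_evil_siblings INTEGER",
    "secret_lair_location TEXT",  # hypothetical column, not from the post
):
    conn.execute(f"ALTER TABLE super_hero ADD COLUMN {column}")
```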

Entity-Attribute-Value Model

This model gives us an efficient way to store a growing number of attributes. However, before using this model we must analyze our data and check whether current and future attributes fit either of the following conditions:

  • sparse: attributes that will only apply to a handful of rows, so they will be effectively empty for the rest (e.g. number_of_evil_siblings in our example above).
  • volatile: attributes that change regularly, grow quickly, or both (e.g. sensor readings in the IoT space, patient clinical records).

If your entity's attributes do not fall into either of these categories, this model may not be the right fit. For our superhero example, we can already see the sparseness of the attributes, so we can apply the Entity-Attribute-Value model by splitting our data into:

  • attribute definitions or types - e.g. butler_name
  • attribute values - e.g. J.A.R.V.I.S
  • entity being described - e.g. id=7 (Iron Man)

Here we need to analyze existing attributes and determine which ones should remain as core attributes, and which ones will be sparse across existing and future superheroes. The non-core attribute columns will be turned into attribute type rows and stored in a reference table (super_hero_attribute_type), and we will need to create another table to store the sparse attribute values (super_hero_attribute). Our schema now looks like this:
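
Again, in place of the original diagram, here is a hedged sketch of the three-table layout; the table names super_hero_attribute_type and super_hero_attribute come from the text above, while the column names and types are assumptions:

```python
import sqlite3

# A fresh connection, for clarity.
conn = sqlite3.connect(":memory:")

conn.executescript("""
    -- The entity: only the core, always-populated attributes stay here.
    CREATE TABLE super_hero (
        id                   INTEGER PRIMARY KEY,
        civilian_name        TEXT NOT NULL,
        super_hero_name      TEXT NOT NULL,
        super_strength_level INTEGER CHECK (super_strength_level BETWEEN 1 AND 5),
        has_cape             INTEGER NOT NULL DEFAULT 0
    );

    -- Attribute definitions: each former "rare" column becomes a row here.
    CREATE TABLE super_hero_attribute_type (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE            -- e.g. 'butler_name', 'sidekick'
    );

    -- Attribute values: one row per (hero, attribute) pair that actually exists.
    CREATE TABLE super_hero_attribute (
        super_hero_id     INTEGER NOT NULL REFERENCES super_hero(id),
        attribute_type_id INTEGER NOT NULL REFERENCES super_hero_attribute_type(id),
        value             TEXT NOT NULL,
        PRIMARY KEY (super_hero_id, attribute_type_id)
    );
""")
```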

Despite having three tables instead of one, this schema is much more efficient and flexible:

  • All query results will always contain non-empty values.
  • Client code or queries will not have to worry about ignoring "N/A" values and can focus on more targeted processing.
  • New attributes become new rows instead of new columns (adding new columns is usually more expensive than adding new rows); see the sketch after this list.
  • Smaller updates to internal model objects when new attributes are added.
  • Tables are easier to inspect visually.
  • As data patterns emerge over time, if a group of attributes needs to be migrated into a separate table, it can be done without ripping out columns (again, moving rows around is usually simpler than modifying existing columns).
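
To make the last few points concrete, here is a small usage sketch against the tables above; the sample values are illustrative (the post only mentions Iron Man, id=7, and J.A.R.V.I.S as a butler_name value):

```python
# A brand-new attribute is just data: no ALTER TABLE, no migration of existing rows.
conn.execute(
    "INSERT INTO super_hero_attribute_type (name) VALUES (?)", ("butler_name",)
)

conn.execute(
    "INSERT INTO super_hero (id, civilian_name, super_hero_name, super_strength_level, has_cape) "
    "VALUES (?, ?, ?, ?, ?)",
    (7, "Tony Stark", "Iron Man", 4, 0),  # sample values, not from the post
)

conn.execute(
    """
    INSERT INTO super_hero_attribute (super_hero_id, attribute_type_id, value)
    SELECT h.id, t.id, ?
    FROM super_hero AS h, super_hero_attribute_type AS t
    WHERE h.super_hero_name = ? AND t.name = ?
    """,
    ("J.A.R.V.I.S", "Iron Man", "butler_name"),
)

# Reading back only the attributes a hero actually has: every row is non-empty.
rows = conn.execute(
    """
    SELECT t.name, a.value
    FROM super_hero_attribute AS a
    JOIN super_hero_attribute_type AS t ON t.id = a.attribute_type_id
    JOIN super_hero AS h ON h.id = a.super_hero_id
    WHERE h.super_hero_name = ?
    """,
    ("Iron Man",),
).fetchall()
print(rows)  # [('butler_name', 'J.A.R.V.I.S')]
```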

Expecting a client to know what their data will look like in a year is unrealistic. Priorities shift, and consumer behavior drives the direction of features. Knowing simple data modeling patterns like this one helps us introduce some flexibility and keep technical debt at bay.


