What’s Normal about Data Normalization?!


This is the third in our series of requirements modeling. In the Modeling Overview we defined what models are and what they are used for. We explored the close relationship between requirements elicitation and modeling, and introduced the concept of Concurrent Requirements Modeling and gave a very brief explanation of each. In Data Modeling Part 1: Just Don’t Show Me Any Crows Feet! We provided ways to translate the complexity of an entity/relationship diagram (ERD) into business language. Now we delve into data normalization and why those of us who do business analysis work need to understand the fundamentals.

Years ago a consulting recruiter tried to convince one of the authors (Elizabeth) that I would be great at facilitating data modeling sessions. At that time I knew only a little about facilitation and nothing about data or data modeling. He took me over to a large poster of cows and horses or pigs and sheep—I don’t remember any more. Next to these animals were five rules of normalization. What I do remember is that although I could read every word, I had no idea what the words meant. I had no context and it was all gibberish to me. Years later I did learn data modeling and the concepts of normalization, but it sure would have been easier if instead of pigs and cows, the poster had meaningful words that made sense even to a novice data modeler. In this article, we won’t use farming analogies. And we promise to translate the gibberish into understandable concepts that you can put into practice right away.

Data Normalization

What is a data normalization? Normalization ensures that each attribute is assigned correctly to its proper entity. And that the attribute does not repeat within an entity and is fully dependent on its primary key and nothing but the key. Make sense? If not, let’s translate the gibberish into business language.

  1. An entity is business information that business stakeholders care about.
  2. An attribute is a fact about an entity.

Data Normalization: EntitiesUsing a Customer as an example, the business cares about its customers, which is good because in this example we are a customer! The example Data Normalization: Attributescompany needs to know information about its customers, such as the name of each customer. This is just one fact or attribute of a customer. There are lots of facts about customers. Some of these facts are required, which means we cannot leave the screen or web page without filling out this information. Other facts are optional. In other words they are helpful, but not essential. Customer name is probably a required fact, whereas customer email or phone number might be optional, depending on the application.

What does it mean to correctly assign an attribute? Some information about the Customer is less about the Customer and more about another piece of business information, like Order. For example, as a Customer we place an order on September 20. Let’s call this fact an order date. True, there is some relationship between the customer and the Order that is placed, as well as when the Order was placed. But the order date is a fact that is more related to the Order than it is to the Customer. So let’s create another container of information that the business cares about—an Order. We now have two entities: Customer and Order. Customer name belongs with the Customer. [1] Order date belongs with the Order. Placing the order date with Customer means that the order date attribute is not assigned correctly; placing it with Order means that it is.

Why in the world would we ever care about where we place our attributes? Our answer is that it’s easier and quicker to create, change, and delete business information when attributes are correctly assigned to entities. In our example a customer can place orders on a website. But to receive the items in the Order, we need to provide my customer information or profile. In other words, we need to create a Customer profile and it’s easier not only to create the profile but also change it, if all the customer information is together, rather than mixed up with my Order information.

Let’s say that one of us places another Order on October 1. If the order date were assigned to the customer, it would be harder to view order history. There would be Customer information and Order information all mushed together. Multiple customers would have multiple orders, and it would be harder and would take longer to find a specific order for a specific customer.

Benefit of normalization. In summary, normalization means that each fact about a piece of business information is grouped together. In our example Customer information is grouped together, as is Order information. When customers select Items, all the information about the item is grouped together and related to the Order. Here are the main benefits of grouping like information together:

  • It’s faster to create, change, and delete information
  • It’s faster to retrieve information. In a future article, we’ll talk about getting data from multiple sources in a business analytics type of application.
  • We can avoid repeating groups. Repeating groups?? Sounds like more gibberish, so stay tuned for Data Modeling Part 3, a business explanation of the five normal forms.

Authors: Elizabeth Larson and Richard Larson, Watermark Learning

Elizabeth Larson, CBAP, PMP, CSM and Richard Larson, CBAP, PMP are Co-Principals of Watermark Learning, a globally recognized business analysis and project management training company. With over 30 years of industry experience each, they have used their expertise to help thousands of BA and PM practitioners develop new skills. Their speaking history includes repeat appearances at IIBA and PMI Global Congresses, chapter meetings, and professional development days, as well as BA World conferences.

They have co-written the acclaimed CBAP Certification Study Guide and The Practitioners’ Guide to Requirements Management. They have been quoted in PM Network and CIO magazine. They were lead contributors to the BABOK Guide® Version 2.0, as well as the PMBOK Guide® – Fourth edition.

[1] Attributes should be atomic or at its lowest level (e.g., last name, first name, middle initial vs. customer name) and not derived as date of birth (mm/dd/yyyy vs. age).

Like this article:
  3 members liked this article


Only registered users may post comments.



Copyright 2006-2023 by Modern Analyst Media LLC