Welcome to the latest issue of the data patterns newsletter. If this is your first one, you’ll find all previous issues in the archive.
In this edition we’ll talk about my favorite subject, data modeling. Then, in the SQL Pattern of the Week (SQLPOW) section, I’ll share another article I wrote.
As I’ve mentioned before, modeling data is all about giving meaning to otherwise meaningless bits and bytes. I love data modeling; it’s my favorite topic to discuss, and I’ve gotten into quite a few online quarrels about it.
But data modeling has lost its luster. It’s become uncool. Why?
I believe its biggest issue is the perceived size of the effort involved. Data models tend to be very complex and require what’s known as BDUF (big design up front).
Open any book about data warehouse modeling (e.g. Kimball’s dimensional modeling book or Linstedt’s Data Vault 2.0 book) and you’ll notice that they require you to know everything you need to build before you start building it.
This runs counter to the software engineering principles of agility and rapid iteration, and it’s the reason so many data warehouse projects fail or balloon beyond their original scope.
The proponents of the modern data stack take the opposite approach. They eschew BDUF modeling in favor of building stuff rapidly and delivering value quickly. This works great in the beginning, but without a design, things devolve into an unwieldy mess of dependencies and spaghetti dbt code. I’ve witnessed this more than once.
So what can we do? Is there a solution to this problem?
I think there is. We need to borrow a few concepts from software engineering to do it. What we need is a way to rapidly iterate while also being able to pay down technical debt.
Enter SQL refactoring.
I’ve already covered the basics of SQL refactoring in a previous edition; you’ll find it in the archive.
For those of you who are interested in studying advanced SQL patterns, I’ve written two articles on my blog.
How to apply modularity to SQL. In this article I break down the patterns you need to learn to write production-ready, modular SQL code that makes your queries easy to read, understand, and maintain. It covers three core principles: DRY, SRP, and moving logic upstream. Read this before reading the second one.
Refactoring SQL - Level 1. In this article I walk through an example of taking a working SQL query and transforming it so the result stays the same, but the query becomes easier to read and maintain, and perhaps a tad more efficient. It applies the three core principles from the first article, so read that one first.
The key idea here is to refactor as you go.
Keeping in mind the three core principles (SRP, DRY, and moving logic upstream), you can take a more relaxed approach to modeling. You can start delivering value to stakeholders quickly while carving out time internally to refactor and model as you go.
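To make the idea concrete, here’s a minimal before/after sketch of such a refactor. It uses Python’s built-in sqlite3 and a made-up orders table, so the table and column names are illustrative assumptions, not taken from the articles above. The “before” query repeats the completed-order filter in two correlated subqueries; the “after” query moves that filter upstream into a single CTE that owns one job.

```python
import sqlite3

# Hypothetical orders table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL, status TEXT);
INSERT INTO orders VALUES
  (1, 'alice', 50.0, 'complete'),
  (2, 'alice', 30.0, 'complete'),
  (3, 'bob',   20.0, 'canceled'),
  (4, 'bob',   70.0, 'complete');
""")

# Before: the "completed order" filter appears three times (not DRY),
# and each subquery mixes filtering with aggregation (no SRP).
before = """
SELECT DISTINCT customer,
       (SELECT SUM(amount) FROM orders o2
         WHERE o2.customer = o1.customer AND o2.status = 'complete') AS total_spent,
       (SELECT COUNT(*) FROM orders o2
         WHERE o2.customer = o1.customer AND o2.status = 'complete') AS completed_orders
FROM orders o1
WHERE o1.status = 'complete'
ORDER BY customer
"""

# After: one CTE owns the filter (SRP), stated once (DRY),
# upstream of the aggregation (moving logic upstream).
after = """
WITH completed AS (
    SELECT customer, amount
    FROM orders
    WHERE status = 'complete'
)
SELECT customer,
       SUM(amount) AS total_spent,
       COUNT(*)   AS completed_orders
FROM completed
GROUP BY customer
ORDER BY customer
"""

# The refactor preserves the result; only the shape of the query changes.
assert conn.execute(before).fetchall() == conn.execute(after).fetchall()
```

The point isn’t that the second query is clever; it’s that the definition of a “completed order” now lives in exactly one place, so changing it later is a one-line edit instead of a hunt through subqueries.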
Of course this method isn’t perfect for all cases. If you have a complex dbt repo, you might want to do more planning and design upfront.
dbt refactoring isn’t for the faint of heart. It might require changing models that have many dependencies, so it only works if the team is empowered to make those changes. If it isn’t, things will rot and models will become unmaintainable.
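As a rough sketch of what “moving logic upstream” looks like in dbt (the model and column names here are hypothetical, not from the article), the same filter-once idea becomes two models, with downstream models depending on the upstream one via dbt’s ref():

```sql
-- models/stg_orders.sql (hypothetical upstream model)
-- One model owns the definition of a completed order (SRP, DRY).
select
    id,
    customer,
    amount
from {{ ref('raw_orders') }}
where status = 'complete'
```

```sql
-- models/customer_totals.sql (hypothetical downstream model)
-- Aggregation only; it inherits the filter from upstream.
select
    customer,
    sum(amount) as total_spent
from {{ ref('stg_orders') }}
group by customer
```

Every downstream model that selects from stg_orders picks up the same definition for free, which is exactly why refactoring a heavily depended-on model is both high-leverage and high-risk.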
SQL Pattern of the Week (SQLPOW)
For those of you who are interested in studying advanced SQL patterns, I’ve written a new article on my blog.
How to Model Data by Refactoring SQL - Level 2. In this article I go deeper into refactoring SQL with dbt, showing two examples of how the three core principles can be applied in a tool like dbt.
Until next time.
In case you’re like me and had to look it up: SRP = single-responsibility principle (https://en.m.wikipedia.org/wiki/Single-responsibility_principle).