Welcome to this edition of the Data Patterns newsletter. Just like last time, there will be a career corner and then the tech corner.
In this edition we’ll talk about getting really good at technical skills (SQL in this case, but the advice applies to most skills in the data field). Then I’ll share my usual SQL pattern of the week (SQLPOW).
Strap in! This is going to be a good one.
Career Corner
“Do you have any resources for learning advanced SQL?”
I get this question a lot, and honestly I haven’t had a good, useful answer for it.
You see, most people assume you can get good at a technical skill by practicing it. That’s true but not in the way you’d think.
The idea of deliberate practice (aka the 10,000-hour rule) is very popular. But practicing with random exercises doesn’t lead to long-term retention. What sticks best is associating a specific problem with a specific solution.
The only way I’ve gotten really good at SQL has been by solving real world problems. Problems have a way of teaching you things that no course or practice exercise ever can.
But these problems are hard to come by outside the context of a real business, given that many of them only show up when you’re dealing with really large or really strange, messy data.
The closest you can get to that is by designing or finding an interesting project at work or in your spare time. I’ve already talked about that approach in a previous newsletter, which you can read here.
My mission is to help you get so good at SQL that you end up loving it as much as I do, and I believe I’ve found another way to achieve that.
It involves studying patterns and seeing them in action.
I’m obsessed with patterns. They help you build on the combined experience of practitioners and codify best practices that have been battle-tested in the real world. Every pattern deals with a specific, recurring problem in the domain and provides proven solutions.
When experts work on a particular problem, it’s unusual for them to tackle it by creating new solutions. They often recall similar problems they (or others) have already solved and reuse the essence of the solution. That essence is codified as a pattern.
My book, which many of you already own, covers some of them.
To give you an idea of how studying these patterns can be useful to you, here's an example from one of my customers.
This is a quote from an email Alexandre sent me as a reply to my question on the usefulness of the book:
The book helped me approach SQL in a broader perspective and revisit the key concepts that really matter when writing production grade code, as well as patch many fundamental gaps in my knowledge. The day after reading about accidental INNER joins I helped a colleague debug a query by immediately spotting the same issue.
By simply reading about a pattern I had noticed many times in my career, Alexandre was able to immediately spot it in the wild. I bet he never forgets it.
I really believe in this method of teaching SQL by exposing you to patterns and I’m doubling down on it. I’m launching a new project that’s related to this. I don’t have anything specific to share yet, but make sure you subscribe and stay tuned to find out when I do.
SQL Pattern of the Week (SQLPOW)
This week we’ll continue analyzing the same SQL query we saw last time. You can read about that here. And you can find the code here.
This is more of a “mini” pattern I often find myself using when calculating ratios. By default, many SQL engines perform integer division when both operands are integers. The quickest way to force decimal division is to multiply by 1.0:
1.0*sum(c2.inc_amt)/cs.users as cum_amt_per_user
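To see why this matters, here’s a minimal, self-contained sketch. The payments table and its columns are made up for illustration, and CAST(... AS decimal) would work just as well as the 1.0 trick.

-- Integer division truncates; multiplying by 1.0 promotes the whole
-- expression to decimal division (illustrative table and column names).
SELECT
    SUM(inc_amt) / COUNT(DISTINCT user_id)       AS amt_per_user_int,  -- e.g. 7 / 2 -> 3
    1.0 * SUM(inc_amt) / COUNT(DISTINCT user_id) AS amt_per_user_dec   -- e.g. 7 / 2 -> 3.5
FROM payments;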
Also notice how the author is using a really outdated method of joining the two cohort CTEs. I DO NOT recommend this. You need to be very precise and actually spell out how you’re joining them by using the INNER JOIN keyword.
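Assuming the outdated style in question is the implicit comma join, here’s a sketch of the two side by side. The cohort_size and cohort_revenue names are made-up stand-ins, not the actual CTEs from the query.

-- Implicit comma join, with the join condition buried in the WHERE clause
-- (the older style I recommend avoiding):
SELECT cs.cohort_month, 1.0 * SUM(c2.inc_amt) / cs.users AS cum_amt_per_user
FROM cohort_size cs, cohort_revenue c2
WHERE cs.cohort_month = c2.cohort_month
GROUP BY cs.cohort_month, cs.users;

-- Explicit INNER JOIN, with the join condition spelled out in the ON clause:
SELECT cs.cohort_month, 1.0 * SUM(c2.inc_amt) / cs.users AS cum_amt_per_user
FROM cohort_size cs
INNER JOIN cohort_revenue c2
    ON cs.cohort_month = c2.cohort_month
GROUP BY cs.cohort_month, cs.users;

Both versions return the same rows here; the explicit one just makes the join intent impossible to miss.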
Since that wasn’t very satisfying, here’s another pattern, which is actually a general problem-solving algorithm you can apply to other fields.
When I’m solving a problem with SQL from scratch, there are a few patterns I always find myself using. It doesn’t matter if it’s a long query or a short one; here’s what I do:
1. Solve the problem for a few rows then generalize. It’s a lot easier to solve a problem for a few rows that cover some or all of the use cases than to solve it all at once. The tight feedback loops will instantly let you know if your solution works or needs refinement.
Do you need to return two years’ worth of data to calculate a tricky metric? Solve it for one week first, then expand. I’ll often even mock up fake data to see if the solution works (see the sketch after this list).
2. Visualize the final shape of the data and find the right granularity first. Before diving straight into a solution, take a moment to draw (or visualize) what the end result should look like. Then look at what you have and map out the steps to get there.
3. Don’t optimize your query too early. Focus on getting the right solution first then worry about optimization. If you try to do too much upfront you might feel overwhelmed and struggle to find the perfect solution.
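Here’s the sketch I promised for steps 1 and 2: mock up a handful of fake rows, decide the grain of the output, and only then point the logic at real data. Everything below (table, columns, values) is made up for illustration, using a Postgres-style VALUES list.

-- Step 1: a few fake rows are enough to check the logic by eye.
WITH fake_orders (user_id, order_date, amount) AS (
    VALUES
        (1, DATE '2024-01-01', 10.0),
        (1, DATE '2024-01-03', 20.0),
        (2, DATE '2024-01-02',  5.0)
)
-- Step 2: the target grain is one row per user per day.
SELECT
    user_id,
    order_date,
    SUM(amount) AS daily_amount
FROM fake_orders
GROUP BY user_id, order_date
ORDER BY user_id, order_date;

Once the output matches the shape you visualized, swap fake_orders for the real table and widen the date range.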
Until next time.
I’ve found that setting unique and not-NULL tests on the expected primary keys of the output of my queries is really helpful (I use dbt to abstract away actually implementing the tests, but in theory you could do it by writing the SQL code yourself).
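For a rough idea of what that hand-written version could look like, here’s a sketch of a combined uniqueness and not-NULL check. The my_model name and user_id key are made up, and this is not the SQL dbt generates, just the same idea written by hand.

-- The test passes when this query returns zero rows:
-- any duplicate or NULL value of the expected primary key is a failure.
SELECT
    user_id,
    COUNT(*) AS n_rows
FROM my_model
GROUP BY user_id
HAVING COUNT(*) > 1 OR user_id IS NULL;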
TDD should be a method available to analysts and data engineers, not just software engineers 😁