Welcome to the latest issue of the data patterns newsletter. If this is your first one, you’ll find all previous issues in the Substack archive.
In this edition we’ll talk about data modeling and how to actually do it. I’ll also give some concrete examples from the world of SaaS businesses so you get a better idea how it works.
I’ve talked about data modeling in this newsletter before. You can read two of my previous posts about it here and here.
While I’ve covered the mechanics of doing the actual modeling with SQL, I’ve never talked about the most fun aspect: building the business blueprint. It’s very important for an analyst or data scientist to understand the business and this is the best way to do that.
A business blueprint is a simple document that explains how the business works by looking at its core processes. A business is made up of processes (e.g. Customer Acquisition) and these processes are made up of activities.
What are Activities?
Activities are like events but not as granular. While an event might be “clicked on button” or “loaded shopping cart”, the activity we care about is “customer started session” and “customer added product to cart”
Activities form the conceptual and logical layer of modeling, which is the most important part if you want to understand the business.
These activities can be expressed as RDF triples. RDF stands for Resource Description Framework and is used in knowledge graphs, but we’re not going down that rabbit hole today. Suffice it to say that we can express activities in terms of: “Who does What, Where, When, hoW, hoW much/many”
Let’s take a look at an example.
In the typical ELT modeling pattern you start with raw data you transform it to facts and dimensions tables for BI use cases and if you have other use cases — like say data science or feature store — you go back to the source and transform it again to whatever shape you need.
The benefit is that it’s easy to get started, quick to deliver results and you only model what you need without needing a big design upfront. But that doesn’t last.
Eventually the models start to get complex, there are thousands of tables strewn about the data warehouse and maintenance is a nightmare. It’s also quite brittle. Any small change in source data can cause ripples of problems downstream.
When you model with activities, you end up with a logical layer for business concepts that acts as an interface to raw data. This makes it much more robust to changes. It’s easier to maintain and you can use it for multiple purposes.
Essentially you go from Raw data => Activities => Entities + Attributes => metrics tables, feature store, etc. While this is harder to get started due to some design needed upfront, it’s overall easier to maintain in the long term.
Modeling SaaS Growth Accounting
Let’s take a look at an example. Growth accounting is used to measure the health of a SaaS business and see whether it’s growing.
You only need a few activities to model it:
Activities (from the CRM System)
Contact Signs Contract
Customer Renews Contract
Customer Expands On Contract
Customer Contracts On Contract
Customer Renews On Contract
Customer Commits To Contract Churn
Customer Resurrects From
Activities (from the Accounting System)
Customer Begins Subscription
Customer Incurs Overage On Contract
Customer Orders One-Time Service
Customer Is Active On Subscription
Customer Ends Subscription
These activities can be mapped directly to metrics:
The same activities can be turned into a traditional ER or wide table for reporting:
This makes activities very versatile since they can be used for multiple purposes (BI reporting, feature tables for ML/DS, customer 360 views, etc). At this point all you have to do is map these activities to your company’s raw data and you’re off to the races.
I’ve used activities to build my metrics playbook project, which I’ve written about here
If this is a popular topic, in future issues I’ll continue with other aspects of a SaaS business side of metrics and how to model them with activities.
Until next time.