Modeling Data for SOMA Metrics
Consulting Chronicles #2 - How to model data for SOMA for a B2B SaaS company
Hello and welcome to the latest issue of Data Patterns. If this is your first one, you’ll find all previous issues in the archive.
In our last issue we started implementing SOMA for a B2B SaaS company. You can read that here. Like I mentioned in that post, while the company is fake the consulting is real.
To give you some context, SoFake is a B2B SaaS company selling a data tool that costs $150/mo per seat. Since they’re just starting out, their main focus is sales.
SoFake has a sales team of 4 account executives (AEs) who are constantly booking meetings, showing demos to potential clients and closing deals. SoFake has almost no analytics in place so this is a greenfield project for us.
Last time around we identified the most important activities we need to track in order to model data for the sales metrics.
But before we dive into that, let’s take a look at the SOMA architecture.
SOMA defines the Metrics and Terms and then suggests Activities and Entities for the modeling layer. The metrics are defined in JSON to allow for maximum flaxibility. You can see all the B2B SaaS metric definitions in our Github repo at: somastandard.com
Today we’ll look at some SQL code I wrote to model this data.
Data Modeling with Activities
SOMA recommends activities for a number of different reasons:
Activities are very intuitive to business stakeholders. They’re easy to reason about and onboard onto.
Activities are faster to implement. Many operational systems (e.g. web analytics) already provide a stream of events that can easily be turned into activities
Activities are simpler to audit and composable. They allow you to build an immutable ledger that tracks an entity (e.g. customer) across multiple systems.
Activities simplify data pipelines. You can build entities from them if you like or go directly from activities to metrics
For our sales activities we rely on the CRM as defined in the SOMA spec. SoFake uses HubSpot for their CRM so we’ll look to map raw data from the `deals` and `deals_history` tables primarily.
They key to modeling with activities is the AcvitySchema developed by Ahmed Elsamadisi at Narrator. I’ve written about it before on my blog so I won’t go into too much detail.
This is the schema we need to implement. Below you’ll see the actual SQL code I’ve written. At this time the code isn’t public yet, but if you want it just reply back and I’ll send it to you.
Here’s the implementation of Account Executive Opens New Business Opportunity activity. Here the `deal_id` is our entity which we get directly from the `deals` table. We also join in a table called `hs_current_pipeline_value` which traverses tables Deal => Quote => Line Item to pull the correct ARR amount for the expected deal amount.
Here’s the code for hs_current_pipeline_value. The table `associations` is custom to our ETL implementation. We’re writing our own pullers vs relying on open source libraries like Singer or commercial tools so we pull data directly from the HubSpot API.
Here’s the implementation of Account Executive Updates Opportunity Close Date activity. We’re using the `deals_history` table filtered to the property `closedate` to get an immutable ledger of changes.
The history table is usually available in popular CRM tools like HubSpot and SalesForce, it just needs to be enabled. If it’s not available or possible to get from the operational system, it would have to be implemented via incremental models in the warehouse which is a lot trickier. I won’t get into that here.
Here’s the implementation of Account Executive Updates Pipeline Value activity. Again we’re using the `deals_history` table but the logic is a little more complicated because we need to get data from `quotes_history` which is embedded in the model `hs_historical_pipeline_value`
Normally we can get this from the `dealamount` column in `deals` but SoFake bundles the ARR with the set-up fee so we need separate line items.
Here’s the code for hs_historical_pipeline_value:
Here’s the implementation of Account Executive Wins Deal activity. Here we need to ensure we get the earliest timestamp so we can avoid duplication. Why is this needed? After all flipping the deal to `closedwon` should be the final stage and can no longer be changed right? In theory it should be, but in practice it might not.
This is where the specifics of the implementation come into play. If your CRM admin was diligent, they would make that stage final and unchangeable. But oftentimes exceptions start to creep up. Account execs might close a deal by mistake and would very much like to change it back please, so we have to code defensively.
Here’s the implementation of Account Executive Loses Deal activity. We do the same thing as above and get the earliest timestamp.
That’s it for today. We’ll get into implementing the metrics next time around.
Until then.
I'd be interested in previewing the B2B SaaS models and testing out. I see the git repo for the SOMA metrics but don't see anything to relates to the last article