Examples of using Infer with dbt
- 3 months ago
- 3 min read
There are many ways in which you can use SQL-inf within your dbt pipelines to power ML analytics in your data stack.
To illustrate some of the use cases we have build a demo dbt project showcasing a few of the core SQL-inf commands in a complex, real-life setting, namely an e-commerce marketplace.
To learn about potential use cases have a look at our Use Cases page.
For more details about SQL-inf itself, checkout out the SQL-inf Syntax reference.
All of our examples are contained in a single project, which is available in our dbt-infer-examples repo on Github.
We use the convention that if a model uses an SQL-inf command we prefix the model name with the command it uses. For example, we have a model named predict_customer_ltv
, which uses the PREDICT
SQL-inf comamnd to predict the customer ltv. As usual when developing with dbt, you should only build models for the things you want to persist.
When defining intermediary datasets it is often better to use CTEs that aren't materialised in the data layer.
Often when building models using SQL-inf you find yourself defining datasets that are only used as inputs to SQL-inf commands and nowhere else.
In these cases we define the inputs as a CTE define within the model and we like to use the convention that if the model using an SQL-inf command is called my_model
we call the CTE for the input data my_model_input
.
To illustrate what that might look like, here is a simple example of a predictive model, which we call predict_customer_churn
.
WITH predict_customer_churn_input AS ( SELECT (CASE WHEN end_date is NULL THEN True ELSE False THEN) as churn, age, gender, location, balance FROM customers ) SELECT * FROM predict_customer_churn_input PREDICT(churn)
Here we engineer a feature called churn
, based on whether an end_date
has been set for a given customer, which we then use as the target to predict the probability of customer churn.
Throughout these examples we use the same data set.
It is based on two Kaggle datasets of customer and seller data from Olist, a brazilian marketplace integrator - meaning they help smaller sellers connect and sell their products on larger marketplaces.
The combined dataset contains 11 tables containing seller data, customer and purchase data as well as reviews.
You can find the datasets here:
The SQL-inf models in the dbt-infer-examples repo splits into three categories. Here we list each with a short description and links to more details about each category and associated models.
These models show how to use the predictive and explainable AI capabilities of SQL-inf to analyse, predict and understand the drivers of metrics as well as events.
See the details here.
These models show how to use the similarity and clustering capabilities of SQL-inf to find similar records, sort records by similarities and cluster and segment records by similarities.
See the details here.
These models show how to use the text analysis capabilities of SQL-inf, like Sentiment
and Topics
to analyse unstructured text feedback.
See the details here.