The Analytics Setup Guidebook Review

Everything you need to build a scalable analytics platform in 2020

Mark MacArdle
Towards Data Science


I’m shocked to be writing this next sentence: I read a free ebook from a company and actually loved it. I normally have a low opinion of free ebooks, seeing them as either overly long marketing pitches or too vague to be useful. For instance, Snowflake’s For Dummies book on data warehouses is 60 pages long, yet it is so dedicated to being abstract that it never mentions Redshift, BigQuery or even Snowflake.

The Analytics Setup Guidebook from Holistics is a totally different story. It offers an overview of the different parts of the analytics stack: data warehouses, importing data, transforming it and reporting on it. (Note: it doesn’t cover more in-depth applications like machine learning.) Crucially, it doesn’t just describe these parts abstractly; it discusses and compares the tools and services available today (it was published in July 2020).

Book cover, used with permission from Holistics

If you’re already familiar with modern analytics and data tools, there won’t be much new here. Rather than break new ground, the book aims to give a lay-of-the-land overview of the different parts of the analytics stack and how they fit together, along with the tools available for each part and how they compare. Where relevant, it gives historical information so you can understand where past approaches came from, why they were used and why they may no longer be appropriate. It succeeded in that aim for me, greatly helping me organise things in my head.

I’d never heard of Holistics before this and have no affiliation with them. They’re a Singapore-based analytics company that makes a data transformation and reporting product which seems very similar to Looker. They do reference their own product in the book, but only where appropriate. Their blog is really good too.

Content

Chapter 1 - Introduction

  • High level overview of the analytics stack and what data warehouses are for.
  • Some basic setups if you’re only getting started on building analytics infrastructure.

Chapter 2 - Data warehouses

  • Comparison of the most popular cloud warehouses (Redshift, Snowflake and BigQuery) and when to start using one.
  • ETL tools like Stitch and Hevo, which can be used to import data from most services.
  • They strongly recommend loading all raw data into your warehouse and doing any transforming inside it (ELT as opposed to the traditional ETL). Why ETL used to be the standard approach, where data lakes fit in, and why ELT is now a superior option are all discussed.
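
To make the ELT idea concrete, here is a minimal sketch in Python. It is my own illustration, not code from the book: SQLite stands in for a cloud warehouse, and the table, view and column names (raw_orders, daily_revenue and so on) are hypothetical. The raw rows are loaded untouched, and the transformation happens afterwards, inside the warehouse, in SQL.

```python
import sqlite3

# Hypothetical raw rows, as an import tool might deliver them (amounts still text).
raw_orders = [
    ("o1", "2020-07-01", "20.00", "paid"),
    ("o2", "2020-07-01", "5.00", "refunded"),
    ("o3", "2020-07-02", "42.50", "paid"),
]

# An in-memory SQLite database stands in for the warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Extract + Load: copy the raw data in as-is, with no cleaning beforehand.
conn.execute(
    "CREATE TABLE raw_orders (id TEXT, order_date TEXT, amount TEXT, status TEXT)"
)
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", raw_orders)

# Transform: done later, inside the warehouse, as a SQL view over the raw table.
conn.execute(
    """
    CREATE VIEW daily_revenue AS
    SELECT order_date,
           SUM(CAST(amount AS REAL)) AS revenue,
           COUNT(*) AS paid_orders
    FROM raw_orders
    WHERE status = 'paid'
    GROUP BY order_date
    """
)

daily_revenue = conn.execute(
    "SELECT order_date, revenue, paid_orders FROM daily_revenue ORDER BY order_date"
).fetchall()
```

Because the raw table is kept, a different transformation (say, one that includes refunds) can be added later without re-importing anything.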

Chapter 3 - Modelling data

(Modelling as in transforming or aggregating data, rather than making predictive models.)

  • The concept of the data modelling layer. This is something I’d heard mentioned a lot but hadn’t been clear on before. The tools available in the space (dbt, Looker, Dataform, Holistics) are covered.
  • Kimball modelling, why it was popular and which parts of Ralph Kimball’s book, The Data Warehouse Toolkit, are still relevant. For example, Kimball had a few approaches for dealing with slowly changing dimension tables (SCDs), but the modern approach is Maxime Beauchemin’s just-daily-snapshot-everything approach.
  • A real world example of how Holistics modelled event data from their website and when and why they made two more summarised views of it.
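
As I understand the daily snapshot idea, you append a full copy of each dimension table every day, stamped with the date, instead of tracking individual changes. The sketch below is my own reading of it, not code from the book; the customers_snapshot table and the take_snapshot and plan_on helpers are hypothetical names, and SQLite again stands in for a warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a warehouse (assumption)
conn.execute(
    "CREATE TABLE customers_snapshot (snapshot_date TEXT, customer_id TEXT, plan TEXT)"
)


def take_snapshot(snapshot_date, current_customers):
    """Append a full copy of the dimension table, stamped with the snapshot date."""
    conn.executemany(
        "INSERT INTO customers_snapshot VALUES (?, ?, ?)",
        [(snapshot_date, cid, plan) for cid, plan in current_customers.items()],
    )


def plan_on(customer_id, day):
    """Look up a customer's plan as it was on a given day, or None if no snapshot exists."""
    row = conn.execute(
        "SELECT plan FROM customers_snapshot WHERE customer_id = ? AND snapshot_date = ?",
        (customer_id, day),
    ).fetchone()
    return row[0] if row else None


# Two daily snapshots of a hypothetical customers table; c1 upgrades overnight.
take_snapshot("2020-07-01", {"c1": "free", "c2": "pro"})
take_snapshot("2020-07-02", {"c1": "pro", "c2": "pro"})
```

Here plan_on("c1", "2020-07-01") returns "free" and plan_on("c1", "2020-07-02") returns "pro". History is recovered by filtering on the date, at the cost of storing a copy of the table per day.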

Chapter 4 - Analysing/reporting your data

  • Tells the story of the three waves of BI through three jobs a fictional data analyst has had since the mid ’90s. It covers the BI tools, warehouse infrastructure and company processes they would have had to use. This was one of the best parts of the book for me and more useful than Looker’s original post on this. It helped me understand where old tools sit and why some people are still very attached to them. This is especially useful as so many businesses are still in the second wave.
  • Current business intelligence tools are compared, with an explanation of how to categorise and think about them. Tableau, Power BI, Looker, Holistics, Chartio, Mode, Sisense and more are discussed.
  • An arc of adoption is used to describe how analytics evolves in an organisation over time: ad hoc queries, then static reports and dashboards, then self-service analysis for business users.

Chapter 5

Just the conclusion; it doesn’t introduce anything new.

What could have been better

There were two things to do with modelling data that I wish were better addressed. I asked Holistics about these and they were kind enough to provide responses, listed below.

1) In the book, data modelling is only discussed in the context of using a dedicated tool (e.g. dbt, Looker). What are the benefits of using a tool over the really simple approach of just saving SQL views in your warehouse?

Holistics response: In this Twitter thread, co-author Huy Nguyen describes four benefits:

  • Dependency tracking (lineage): modelling tools will give you visual dependency graphs.
  • Self-service: tools like Looker and Holistics allow you to define custom measurements that can be used by others in your company.
  • Data catalog: tables/models you make can be enriched at the time or later with metadata. This metadata could then be pulled into a discovery tool.
  • Version control: you can use git with them so a history of changes can be seen.
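
To get a feel for what dependency tracking involves, here is a small sketch of my own (not from Holistics or the thread). It assumes you have written down which tables each model selects from; the model names are hypothetical. Python’s standard graphlib module (3.9+) then gives a safe build order, and a short helper walks the graph to list a model’s lineage.

```python
from graphlib import TopologicalSorter

# Hypothetical models: each maps to the set of tables or views it selects from.
depends_on = {
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "customer_revenue": {"stg_orders", "stg_customers"},
}

# A safe build order: every model appears after everything it depends on.
build_order = list(TopologicalSorter(depends_on).static_order())


def upstream(model):
    """Return everything a model depends on, directly or indirectly (its lineage)."""
    found = set()
    stack = list(depends_on.get(model, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(depends_on.get(parent, ()))
    return found
```

With plain SQL views you would have to maintain a mapping like depends_on by hand; my understanding is that modelling tools derive it from the model definitions for you.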

2) How can you avoid duplicating business logic for a metric that needs to be calculated by end users, like average order value? Users may want to look at it by any number of splits, so you can’t pre-aggregate it. Is using a tool that does both modelling and reporting (Looker, Holistics) the only way? What do companies using transformation-only tools like dbt do?

Holistics response (taken from a direct message): This is related to self-service BI and sits at the reporting layer, so tools like Looker, Holistics and Metabase will solve this use case well. If you’re using only dbt, the simple solution is to pick one self-service BI tool like those above. Do note that ‘data modeling’ sits at both the transformation layer (dbt, Dataform) and the reporting layer (Looker, Holistics).
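
To show why average order value can’t simply be pre-aggregated, here is a minimal sketch of my own. It is not how Looker, Holistics or Metabase implement measures; the orders data and the average_order_value function are hypothetical. The point is that the metric is defined once (total amount divided by number of orders) and then evaluated at query time for whatever split the user picks.

```python
from collections import defaultdict

# Hypothetical order-level data.
orders = [
    {"country": "SG", "channel": "web", "amount": 30.0},
    {"country": "SG", "channel": "app", "amount": 10.0},
    {"country": "IE", "channel": "web", "amount": 50.0},
    {"country": "IE", "channel": "web", "amount": 70.0},
]


def average_order_value(rows, split_by=()):
    """Compute total amount / order count for each combination of the split columns."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of amounts, order count]
    for row in rows:
        key = tuple(row[col] for col in split_by)
        totals[key][0] += row["amount"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


overall = average_order_value(orders)
by_channel = average_order_value(orders, split_by=("channel",))
by_country_and_channel = average_order_value(orders, split_by=("country", "channel"))
```

The overall value is 40.0, while the per-channel values are 50.0 (web) and 10.0 (app). Averaging those two stored figures would give 30.0, which is wrong, so the calculation has to be rerun from order-level data for each split rather than saved once.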

Rating: 🐙🐙🐙🐙½

4.5 Octopuses. With its many arms, this book will pluck your many disparate pieces of knowledge and neatly reorganise them into a well-laid-out and annotated scrapbook for future use.
