Tableau Prep Explained in under 10 mins!
How do you make data prep visual? Here's how Tableau Prep tackles it in under 10 minutes.
- Tableau Prep treats data prep as a journey from inputs to outputs, with cleansing, joins, unions, pivots and aggregations codified as visual steps
- Compared to VLOOKUPs in Excel or SQL joins, Prep keeps everything visual and shows matched versus unmatched rows with colour coding at each step
- A typical flow connects files, adds clean steps to check data, drags one step onto another to create a join, builds calculated fields, removes unneeded columns and outputs to CSV, database or Tableau Server
- You can name and add descriptions to steps so flows are well documented for others, and fuzzy matching makes grouping a one-click operation
- From 2020.3 you can write Prep output to databases, and with the Data Management add-on and Prep Conductor you can schedule and govern flows on Tableau Server or Online
Introduction and visual data prep
This is the first of many sketch notes explaining the products in Tableau’s platform, and there’s no better place to start than Tableau’s newest product, Tableau Prep Builder. If you’ve seen my 10 minute explainer on the whole Tableau platform, you’ll remember the core innovation that runs across every product: visual analysis. That feels obvious when we talk about data visualisation, but data prep is a tougher challenge. How do you make data prep more visual? That’s exactly what Tableau Prep sets out to do, empowering everyday business users to cleanse and transform their own data.
What data prep actually means
Businesses store data in lots of different places, so retrieving it throws up a range of problems. You might have static files on your desktop that need to be brought together with database data. You might have data stored at different levels of aggregation, making accurate comparisons difficult. You might just need to do some grouping to create a new product or business category.
Many of these issues are documented and handled in the data warehouse, but plenty only surface when you start visualising the data in Tableau. Desktop users are often the bearers of bad news when they visualise data that has never been visualised before. Add the explosion of data from digital and IoT systems, and it’s clear businesses can’t rely solely on data warehousing teams to handle cleansing and transformation, especially when the analytical need is generated by the business users themselves.
Existing tools and their limits
Tableau is not the first to recognise this problem. SQL, used by nearly all traditional databases, was designed to standardise how we store and manage data. Visual tools such as Alteryx already exist, and you can do advanced statistical computing in R or Python. So what problem is Tableau Prep actually solving?
Comparing two lists: Excel and SQL
Take a simple example: comparing two lists to find matching items. In Excel you’d start with two sheets and reach for a VLOOKUP, probably heading to YouTube to remind yourself of the syntax. A few lookup values, table arrays and column indexes later you have your formula. In SQL you’d write something fairly short. Simple for some, but for others it reads like a foreign language, and how many data prep problems are genuinely that simple? Add complexity and in Excel you’re calling your VBA or macro guru, while in SQL you’re raising a support ticket to your DBA and data warehousing team.
How Tableau Prep is different
Tableau Prep takes cleansing and transforming data and makes it simple for everyday business users. It imagines data prep as a journey, with data inputs at the start and outputs at the end. Along the way you carry out a series of steps: cleansing, aggregation, pivoting, joins, unions and more. All of these are codified using visual indicators, and they stay visual the whole way through. Prep always shows you your data in a clean, summarised interface, along with a summary explaining how each operation affects your data. Take two inputs, connect them, apply a join, and Prep shows you how many rows matched and how many didn’t, alongside a table view of the result, ready to save to a file.
Live demo: building a flow
We grab our two lists and drop them onto the canvas. Then we add two clean steps to check the data is correct, with a summary appearing in the view below. To join the files, we drag one step onto the flow above and a join is created automatically. Prep shows which type of join we’re doing and the result on the bottom left, with a summary on the right colour-coding the matched and unmatched rows by which side of the data they came from.
We add a few more clean steps for some basic prep, creating a new total field combining the amount fields from both lists, then removing the three columns we no longer need to tidy the dataset up. We name the steps so others can follow the flow, and add descriptions where the name alone isn’t enough. With a clean, well-documented flow, we’re ready to export. There are several output options: publish to Tableau Server, write to a database, or for static files simply export to CSV.
Advanced features and sharing
All the way through, Prep focuses on always being visual, giving you a clear picture of your data at every step. It builds in smart tools like fuzzy matching to make grouping a one-click operation, while keeping a familiar Tableau interface. New data sources are continuously added, and you can share your prep data sources to Tableau Server as files or, from 2020.3, write directly to databases to stay in the flow of your analysis. With the Data Management add-on and Tableau Prep Conductor, you can schedule flows to run in the background on Tableau Server or Tableau Online, with the same oversight, security and governance you’re used to.
Closing thoughts and resources
As a consultant who’s worked in business intelligence for the last seven years, I genuinely enjoy using Tableau Prep. It’s new and needs some polishing around the edges, but it changes with features and patches nearly every month and takes a truly innovative approach to common data challenges. You can also expect the same vibrant, helpful community you’ve come to know in the Tableau and Salesforce communities. If you’re getting started, check out Tableau’s getting started videos, my own playlist on this channel for monthly feature updates, and the Tableau forums if something is vexing you.