Null explanation / detection in Ask data: New in Tableau 2021.3
Tableau 2021.3 can now flag when nulls are skewing your aggregations, but actually triggering that alert turned out to be a troubleshooting saga.
- Explain Data only works on aggregated marks, so nulls won't be detected on a scatter plot where individual data points render null and disappear from the view.
- The new 'percentage of nulls' explanation type can be enabled and filtered to via the Explain Data configuration wheel under explanation types.
- Visual grouping and Tableau's automatic 'other' grouping can break Explain Data, so manually grouping records works more reliably.
- Double-clicking the generated field (or control-dragging it into the calculation window) reveals the underlying SUM(IF null then 1 else 0) calculation Tableau builds on the fly.
- Tableau's documentation describes a 'higher than expected amount of missing data' but never defines the threshold, so a dataset deliberately full of nulls may normalise and fail to trigger the alert.
- Building a null-heavy test dataset0:14
- Connecting the CSV in Tableau1:03
- Why scatter plots don't trigger Explain Data2:13
- Visual grouping attempts and failures3:04
- Enabling the percentage of nulls type4:55
- Manual grouping finally triggers it7:09
- Inspecting the underlying calculation9:38
- Testing lower null percentages10:18
- Checking the documentation for a threshold11:49
0:00Hey, it's Tim here. In Tableau 2021.3,
0:02Tableau have added a new capability which
0:05allows Explain
0:05Data to realize when they're null data
0:07types causing issues with aggregations and
0:10therefore
0:10sort of alerting you to problems you might
0:12have that are being caused by nulls. Now in
0:15order to
0:15show you this, I'm actually going to go to
0:17a website called Markeroo. I'll put it up
0:18on screen now
0:19and you can see that I've actually built a
0:21null data set. Now I don't know how many of
0:23these I
0:23need, I actually probably only need one of
0:25them to be fair, but what we've done is we
0:27've essentially
0:27created some ID numbers and then we've got
0:30a progressively sort of decreasing level of
0:33nulls
0:33in each of these columns. If I actually
0:35look at the preview you can see what that
0:36ends up looking
0:37like and you can see that essentially the
0:39final column has very few nulls but the
0:40first column has
0:42tons of nulls. This is sometimes how data
0:44comes out of systems so those nulls are
0:47technically
0:47empty but when you process nulls in aggreg
0:49ations they can cause errors, they can cause
0:51problems,
0:51especially if you try and do things like
0:53calculating averages Tableau will actually
0:55skip
0:55past the nulls and therefore you'll get the
0:58wrong average if you maybe needed to
1:00replace those nulls
1:00with zeros. So let's take a look and see
1:02how that works in Tableau. I'll go over to
1:04Tableau here and
1:06I'm going to connect the CSV. I've already
1:08downloaded it onto my desktop so I
1:10downloaded
1:10the file from Markeroo. I'm just going to
1:12go to the text bar here and connect to my
1:14null test data
1:15and open it up. Now that we've opened it up
1:18you can see a couple of things. What's
1:20really
1:20interesting with this release is Tableau
1:22sort of changed this data connection window
1:25not really
1:25said anything so this always hints that
1:27something else is coming in the future and
1:29I think it might
1:30be related to a recent announcement they
1:33had about the enterprise sort of version of
1:35Tableau,
1:36this new version of Tableau that they're
1:37talking about. I never really understand
1:39these styles
1:40speak it's sort of weird it's like a big
1:42press release with no real sort of product
1:45feature to
1:46go with it. The sort of like the screenshot
1:48they're showing is a version of Tableau
1:50that I've never
1:50seen in a beta or a pre-release so while
1:52they're coming out with it now and they've
1:54got nothing to
1:55show have no idea but nonetheless here we
1:57are you can see there's a new element here
2:00you can kind of
2:00do some interesting things with it we can
2:02collapse it let's not get distracted by
2:04that let's get stuck
2:05into this null data set you can see the
2:07nulls are registering here in Tableau so
2:09let's go ahead
2:10to sheet number one and let's open this up.
2:13Now one of the things I'm going to do here
2:15is just build
2:16a very simple scatter plot but I've got to
2:18be careful because if I build a scatter
2:20plot using
2:21my ids or this explain data feature is not
2:23going to work because the explain data
2:25feature is only
2:26working when we're aggregating data points
2:29so let me just tell you what I mean by that
2:31let's just
2:31grab some data points here so I'll grab the
2:3450 percent nulls and this 50 80 percent now
2:37when I
2:38click on this blue dot the reason this won
2:40't trigger any null explanation type is
2:42because
2:43there is no null aggregating to this one
2:45data point this feature only works when
2:47there are
2:48nulls being aggregated inside of a data
2:50point and so by definition this view this
2:53scatter plot
2:54is not showing me any nulls because they're
2:56all not in the view if something renders
2:58null then
2:58it doesn't render in the view so what I
3:00need to do is aggregate these points up in
3:02some way or form
3:04now you could do that visually you could
3:05decide to do this so I could I could sort
3:07of decide to
3:07draw a little box and once I've got those
3:10in a box I could group them like this and
3:12that's sort
3:13of called visual grouping I'm going to go
3:15ahead and maybe create one more group over
3:17here I'll
3:17just create three large groups and we'll
3:20just go ahead and do this and just do this
3:23I'm really
3:24hoping this actually generates the feature
3:26that I want and when we do those groupings
3:28here in the
3:28visualizations not only do you see the
3:30colors change I've got blue and orange but
3:32you should
3:32also see that it's created a group field
3:34over here on the left hand side so what we
3:36can actually do
3:37is go into that and edit the group and for
3:39the remaining fields you'll see that it's
3:42already
3:42created an other grouping that's
3:44automatically done when you sort of
3:45visualize it inside of a
3:47any sort of visualization and you do the
3:49grouping so let's hit okay that's pretty
3:52much where we want
3:53to go and I'm going to remove the id from
3:55the detail field and that gives us just
3:57three points
3:58and now we're ready to test this feature I
4:00'm going to go ahead and click on this top
4:02right dot here
4:03and once I've done that I'm just going to
4:05go over here and you'll see that
4:06unfortunately I don't
4:07get the capability to sort of break this
4:10down and that's because I think explain
4:12data has not been
4:13sort of rigged to work with these groups
4:15properly so what I'm going to do I'm going
4:17to try another
4:18way of grouping these and we'll sort of
4:19give this another attempt okay so what I'm
4:21going to do is
4:22it doesn't want to work with these groups
4:24so what I'm going to do is I'm going to
4:26force it
4:27to not group everything into others so we
4:29're just going to bring everything out here
4:32and leave those
4:32out on their own and click okay I think
4:34what's actually happening is that the other
4:37grouping is
4:37actually what's breaking this so we've
4:39removed it but that's fine because we can
4:41still see these
4:42other data points and now you see explain
4:44data actually wants to work it's just over
4:47here where
4:47my sort of cursor is you can see there that
4:49it's there so let's go ahead and click on
4:51that and
4:52let's see what explanation types we get now
4:54one last thing if I'm trying to force this
4:57what I
4:57probably want to do is I want to go down
4:59here and make sure that I can see that
5:01explanation type
5:02in the configuration so let's go ahead
5:04click on that wheel go to explanation type
5:06so you can see
5:06I've got a full cohort of explanation types
5:09here and the new one is actually there
5:10percentage of
5:11nulls is actually added to the list so let
5:15's go ahead and remove this and click okay
5:19and now what
5:20tableau is going to be doing is it's only
5:21going to be looking for the percentage of
5:23null sort of
5:23error so what I'll do is I'll click back
5:25out I'll click on this blue one now to see
5:27if I can trigger
5:28this so let's go ahead and click on the
5:30light bulb and immediately we see two sort
5:32of issues
5:33it's looking at the field 50 of null and 80
5:36of null it says it's higher than expected
5:38when we
5:38go through you'll see that there's no
5:40explanation so additional fields were not
5:42available for
5:42consideration in this analysis so it's not
5:45really getting anything from that first one
5:47let's go
5:47ahead to the second one and again the same
5:49thing nothing's really being used here so
5:51that's that's
5:52sort of a shame let's go back and let's
5:54click on one of these other ones let's go
5:56to this one here
5:57let's run the explanation again and let's
6:00see if we get it here so we don't get it
6:02here
6:03and if I go over there unfortunately we're
6:05not getting it there so it's sort of
6:07strange this is
6:08not triggering how I expected it to trigger
6:10I would have hoped that the percentage of
6:13nulls
6:13within these records is actually quite high
6:16because of course I'm using fields from
6:18both the
6:1850 and the 80 percent of null setup so
6:21maybe there's something interesting there
6:23with the
6:24way that it's working that it's just sort
6:26of not bringing them in what I will do is I
6:28will I don't
6:29want to filter the data so I'll just show
6:31the data the default position and I'll keep
6:34I'll keep doing
6:34that just so that all the records are there
6:36and then what I'll do is I'll go back into
6:38this
6:38analysis let's go back into this analysis
6:41and look at the explanation again and then
6:44see if this time
6:45it comes through it doesn't unfortunately
6:48come through which is sort of odd now what
6:50will happen
6:51here is if I look at this sort of group of
6:53explanations unfortunately you see here
6:57explanation
6:58cannot be calculated from for a mark that
7:01is not it even knows that it's null so
7:03unfortunately this
7:04is sort of strange I don't know what's
7:06going on here let's try one more thing let
7:09's try this let's
7:10bring in the list I'm actually going to do
7:12this manually I don't know I don't know why
7:13this is
7:14the case but I'm just going to try this
7:16sometimes this has worked in other cases so
7:18you go manually
7:19grouping stuff and then suddenly it all
7:22works so let's let's let's spread this out
7:24so I can
7:25actually see the grouped ones at the top
7:27here you go and let's do this and let's
7:29just manually group
7:31these and what I'm hoping is that I do
7:33catch a decent amount of nulls in each of
7:36these groups
7:37so let's just scroll down a little bit I'll
7:39make this group the largest group just so
7:41that the
7:42totals are sort of really high and what we
7:45'll see here is that the remaining items
7:49what we'll do is
7:50we put them into a final group and because
7:53there's no other grouping what I'm hoping
7:55is that this
7:56because this has all been manually grouped
7:58you hopefully should expect this to work a
8:00little
8:01bit better so I'm going to go into a new
8:03sheet and I'm now going to use the second
8:05grouping that's
8:06just been created here it's called id group
8:08one I'm going to put that on color so we
8:10get yes a few
8:11dots 80 of nylon rows 50 of nylon columns
8:15that gives us a scatter plot now we're on
8:18to take two
8:18so let's hit this blue mark and rerun this
8:21analysis and let's see what happens now so
8:24again
8:24this higher than expected and no
8:27explanation type there go into this one
8:30over here let's let's I'm
8:31just going to click on this to run again
8:35and ah finally we finally found it so I don
8:39't know what's
8:40going on here I don't know if this
8:42percentage of nulls feature has some sort
8:44of weird condition or
8:46weird behavior I am playing around with a
8:48sort of machine learning algorithm and
8:51maybe by creating
8:53such a hypersensitive nulls I'm making the
8:55data set look normal maybe potentially and
8:58that's what's
8:58causing this issue so we don't know but the
9:02analysis we wanted is actually here so you
9:05can
9:05see here that it does actually class my
9:08other data points you can see these data
9:10points here
9:11are essentially these other ones here on
9:14the chart okay so the blue one this one is
9:17actually the one
9:18that I've selected over there so that's how
9:20this is working let's clear these
9:21annotations and you
9:22can see that it does recognize there's a
9:25hypersensitive nulls we've got a little
9:27axis
9:27here across the bottom and then we've got
9:29the number of basically records the value
9:32is quite
9:32high because it's just summing everything
9:34up in that column I want to see this as a
9:36separate chart
9:37you can see that it's actually here and if
9:39you've ever noticed this if you double
9:42click inside of
9:42the field you can actually see the
9:44calculation the table is created so if I
9:46double click that you'll
9:47see this double forward slash percentage of
9:49nulls but if I scroll down you'll actually
9:50see the
9:51whole calculation is hiding on the second
9:53row if you're not sure about that just hold
9:55control and
9:56drag it over into the calculation window
9:58then click edit and you can see the
10:00calculation in
10:01full that says sum of if nulls percentage
10:03than one or zero it's essentially just
10:05creating this
10:06on the fly in the background to give us
10:08this analysis so that's interesting this
10:10feature
10:11works this is definitely there it's
10:13definitely in the settings we might be
10:14triggering it in a
10:15triggering it in a slightly different way
10:17let me try one last thing let's see if we
10:19can get this
10:20more consistently by using something with a
10:23lower percentage of null so let's go to 20
10:25of null
10:26and 10 of not so here we have a much
10:29smaller percentage of records with nulls
10:32and let's go
10:32ahead and try and see if we can get this on
10:34the data point so let's go ahead click this
10:36first one
10:36here click on the little light bulb and let
10:39's go look at this value and again we don't
10:41get it on
10:42this record we go to the next one we don't
10:44get it on that one maybe it's just that
10:46particular one
10:47it's cursed let's run it again on the
10:49second one and we don't get it this time on
10:52this one
10:53and we don't get it again on this one so
10:55let's go to the next one I'm really now
10:58curious how
10:59is this algorithm that they've sort of
11:01built in here working because we really had
11:03to hunt for
11:04that explanation even though I've
11:07essentially cooked a data set that you know
11:10should be
11:12looking at nulls and so there we go we
11:14finally get it on this green one now I don
11:17't know what's
11:18different about this green data point
11:19compared to these other data points but hey
11:22tableau if
11:22you're watching download the data source
11:25let me know what's going on here what is
11:27the explanation
11:28really behind this because in many ways
11:30they're nulls in this whole entire data I
11:33've sort of
11:33cooked it up and I'm expecting that
11:35analysis to show after a specific threshold
11:38maybe it's 5%
11:39maybe it's 20% maybe it's 30% maybe you're
11:41looking at my data set and you're coming up
11:43with a
11:44percentage and that's being normalized and
11:47what I will do is if I grab the
11:48documentation for the
11:50record when I do these videos I always have
11:52the documentation up I really don't know
11:54everything
11:55off the top of my head so on the other
11:56screen here I have the documentation and
11:58you can see here
12:00the explanation of the feature is actually
12:02pretty good if we go down let's go look at
12:06nulls
12:07and null explanation so here we go we have
12:09null values there's nothing here in the
12:11documentation
12:12that mentions a threshold so it just says
12:14null value explanation type calls that
12:17situation where
12:18there's a higher than expected amount of
12:20missing data in the mark I guess the
12:22question is what's a
12:23higher than expected amount of missing data
12:25how is that actually derived we don't know
12:28yet maybe
12:28we'll find out it indicates the fraction of
12:30target measure values that are null and how
12:32the null
12:32values might be contributing to the
12:35aggregate value of that measure okay so the
12:38percentage of
12:38values that are null yeah yeah yeah next
12:40option the exclude null values in the mark
12:42okay yeah
12:43perfect so in essence I think this is
12:46really the key thing how is that value
12:49derived what is a
12:50higher than expected amount higher than
12:52expected amount for what kind of data where
12:54is that value
12:55how is it deriving that value if I cook up
12:58a data set full of nulls well that maybe
13:01normalizes the
13:02nulls and therefore you're not going to
13:04alert me as often maybe with this green one
13:06I have a lower
13:06percentage of nulls I really don't know I'm
13:08not about to go sort of digging through it
13:10either to
13:10find out so yeah tableau if you're watching
13:12I'd really love to know what's going on
13:14there and yeah
13:15maybe you can help shed some light for
13:17other people who might be relying on this
13:19feature okay
13:20thanks for watching I'll catch you in the
13:21next video be sure to check out some of the
13:23other
13:23features in 2021.3 it's sort of strange
13:26that we're sort of playing with tableau in
13:28this way
13:29and again the beta program has actually
13:31been pretty good we've actually got a
13:33decent chance
13:34to try some of these features out but again
13:35some of them are just brand new we haven't
13:37had them in
13:38the beta process to play with so we are
13:40going to find things like this and bugs
13:41when these
13:42videos won't be as smooth as you think you
13:44probably saw this explanation and the
13:46length
13:46of the video and thought what's going on
13:48here this looks like an interesting feature
13:49and most
13:50of it was just troubleshooting so that's
13:52sometimes just how it is thanks for
13:54watching and I'll catch
13:55you in the next video
Tableau added a new explanation type about missing data to help users understand why the values of certain marks are beyond the expected range.