Understanding the grain of your data in Tableau
Understanding what your data is about is critical to make sure you calculate the right aggregate values.
- The Orders count field actually counts rows in the Orders table, not distinct orders, so a single order can span multiple product rows
- Superstore operates at the product level of detail, meaning each row is a product within an order rather than the order itself
- Average sales in Tableau is really the average product value, calculated as total sales divided by the number of rows in the dataset
- To get the true average order value you divide total sales by the distinct count of orders, which differs from the default average
- Each logical table in the data model can have its own grain (Orders at product level, Returns at order level, People at regional level), and Tableau handles the aggregation across relationships
0:00Hey, it's Tim here. In today's video we're
0:04taking a slight tangent of functions to
0:06cover
0:06a topic that's going to be important over
0:09the next few days when I start covering LOD
0:12s.
0:12That concept is called granularity. Now
0:14granularity is really really important in a
0:16dataset because
0:17it decides pretty much how we handle our
0:19data and so it's really vital that this is
0:21a concept
0:22you understand and you really investigate
0:24whatever dataset you're working with. Today
0:27I'm going to show you how to do that. Let's
0:28get stuck in. Okay so we're here inside of
0:30Tableau.
0:31I'm actually going to open the second Super
0:32store sales here. This is the American
0:34version. Let me
0:35just go ahead and click on that and you'll
0:36see that it takes us straight into Tableau.
0:39Now one of the things here is we need to
0:41basically start investigating what we mean
0:43by level of
0:44detail. So I'm going to sort of challenge
0:46your perspective and by the way if you've
0:47been using
0:48Tableau for a long while this might be
0:49super basic for you but I still think it's
0:51a worthwhile
0:52refresher just to make sure that you're in
0:53tune with everything that we're going to be
0:55talking
0:55about over the next few days when we look
0:58at LODs. Now I'm just going to drag in a
1:01few rows of data
1:02in here. To do that I'm just going to drag
1:04order ID put it onto rows and then I'm
1:06going to drag
1:07sales put it onto where it says ABC and I'm
1:09going to leave it at that. Essentially what
1:12we can
1:12definitely agree on here is that this table
1:15is showing us the total sales for each
1:17order. There's
1:18no sort of contention there but if I was to
1:21ask you this question what level of gran
1:23ularity does
1:24the superstore's data set actually work on
1:27there's actually a slightly complicated
1:29answer. You see
1:31ever since 2020.2 that answer has become
1:33more complex because of something called
1:35the data model
1:36and so what we have to do is ask that
1:38question of each model in our data set.
1:41Essentially each
1:42logical layer will have a different level
1:44of granularity in our data set. Let's
1:46invest to get
1:46that a little bit further. So here I've
1:48dragged in order ID and what I'm going to
1:50do is you'll see
1:51this field here that says count now. The
1:54interesting thing is this says count of
1:56orders, orders count
1:57but it's actually slightly misleading
1:59because this name orders is actually
2:01derived from the name of
2:02the table so it doesn't necessarily count
2:04the number of orders it's just counting the
2:06number
2:07of rows inside of the orders table. So that
2:09's a really important thing to be aware of.
2:12So if I
2:12was to drag orders count into this table
2:15you'd see that I'm actually getting
2:17multiple rows of data
2:19in each order and essentially what's
2:20happening here is that when you make an
2:22order you don't just
2:23order one thing all the time you could
2:25actually order multiple things and so that
2:27's what we're
2:28seeing here in the data. Just to keep this
2:30simple and to keep this table easy to work
2:32with I'm going
2:32to go ahead and select a few of these you
2:34can select whichever ones you want you just
2:36need to
2:36make sure you select some with one rows two
2:38rows and four rows. So I'm going to select
2:40this one I'm
2:41going to hold command on a mac control on a
2:43windows machine and just select a few of
2:45these
2:46just to make sure we've got a variety of
2:48orders in our data set and once I've
2:50selected those I'm going
2:50to select keep only and just bring in a
2:53nice table. Now the next thing I'm going to
2:55do is I'm actually
2:56going to enable the summary window
2:58essentially if I go here to worksheet and
3:00then just show the
3:01summary window this is a nice little table
3:03that then appears here on the right hand
3:05side and what
3:06I want to do is drag that over to the left
3:08over here just above filters. Now I've got
3:11to admit I
3:12didn't know you could drag the summary
3:13window where you wanted until last week and
3:15I've been using
3:16Tableau for a long while I don't know why I
3:18didn't even try that myself but you can
3:20drag it to this
3:20particular location this is a good location
3:23for me because I don't really use the pages
3:24shelf so
3:25having it right there actually makes that
3:27of that space useful and I've still got a
3:29lot of vertical
3:30space to work with. Now this summary window
3:32is really cool because it actually reacts
3:34to what
3:34you're doing inside the Tableau let's say I
3:36want to know the total of these two cells I
3:38can just
3:39click on command and it will do something
3:41for me it will tell me the sum here but it
3:43also do the
3:44average the min the max and the median
3:46between those two values so it's a really
3:48nice way of
3:49having a calculator right in the view
3:51without having to go and get your
3:52calculator and do some
3:53math and it works with the values you see
3:55it's not working on the underlying data
3:57which makes it a
3:58really really practical tool when working
4:00with your data set. So now the next thing
4:03to do is to
4:03understand what's actually going on here so
4:06this particular data here has four so if I
4:09just click
4:09on this four and I go to this little icon
4:11here at the very end you'll see that it has
4:14a table icon
4:15if I click on that it actually loads a
4:16summary view for me and in there it shows
4:19the summary
4:19of what I'm seeing here in the view if I
4:21just drag this down you'll see that
4:23essentially this
4:25set of summary information corresponds to
4:28what we have here in our table so
4:31everything is really
4:31really nice and simple to to understand
4:34however if we go down to the detail here in
4:36the second tab
4:37you'll see that we actually get more
4:39information and looking at this data you
4:41can see that
4:41there's actually four products in this
4:43single order so they all share the same
4:45order id
4:46in this column here you can see they all
4:47have the same order id which means they all
4:49came as part of
4:50the same order but the person who was
4:52ordering this ordered things from different
4:54categories
4:55and they ordered it from the same city and
4:58place I assume that Houston, United States
5:01and customer
5:02related to the customer the manufacturer
5:04came from lots of different places the
5:07postal code I guess
5:08this is the customer's postal code but
5:10essentially we're not always sure so these
5:12are the kind of
5:12questions you'd really need to ask when you
5:15look at the granularity of data is the
5:16information
5:17operating at the same level or is it
5:19actually inherited from a higher or maybe
5:22even lower level
5:23okay but you can see here that the detail
5:26of this data set is actually at the product
5:28level
5:29so our entire superstore data set is
5:32actually based at the granularity of the
5:34product in every
5:36order not the order even though the table
5:38is called orders it's actually operating at
5:41the
5:41product level of detail okay what does this
5:44mean well if I close this window and I just
5:47do a few
5:48things here what I'm going to do is I'm
5:50going to drag sales into this view again
5:52and this time
5:53around I'm going to bring in the average so
5:54to do that I'm just going to put it here on
5:56the right
5:57hand side that puts it onto the detail pane
5:59and then I'm just going to go to the
6:01measure select
6:02average then drag it in here so we get a
6:04third item over here on the right hand side
6:06and now you
6:07can see there's an average being calculated
6:09and now that we have that average I'm
6:12actually going
6:12to go here to the analysis totals and show
6:15the column grand totals okay and so we get
6:18a range
6:19of numbers we get the sales the total sales
6:21this I believe is the correct figure and we
6:24also get the
6:25count of orders there's 15 and this is
6:27essentially telling us there's 15 products
6:30in all our orders
6:31and in our orders we've got eight orders
6:34here and so what I can do just to make sure
6:36this isn't so
6:37confusing is I can actually bring in a
6:39count distinct so let's go ahead and bring
6:41this on
6:42to detail and let's click on this little
6:44drop down go to measure select count
6:47distinct now that
6:48that's the calculation I want I'm now going
6:50to drag it here onto the measure values
6:51pane
6:52and now we can see what's going on so we've
6:54got eight rows here in the table because
6:56the
6:56grand total takes up a row but we've
6:58actually got seven orders okay so now you
7:00can start to see what
7:02we mean by granularity there's different
7:04things going on at different levels the
7:07only thing really
7:08going on at the same level of detail as our
7:10orders is actually the distinct count of
7:13orders in this
7:13particular table everything else is
7:15essentially an aggregation of some sort and
7:17we need to basically
7:18start to understand what that's doing now
7:22this 148 is an interesting number because
7:26you would
7:26think that this is actually the average of
7:29these cells and so averages always catch
7:31people out
7:31because people sometimes aren't necessarily
7:34clear about what they're doing so in this
7:36case you might
7:37see 148 but if I just hold command here and
7:40I just go and select each and every one of
7:42these
7:43um you can see that the average is actually
7:44108 and that's the average of these values
7:47now it's
7:48not good to do averages of averages but the
7:51key thing here is because I'm actually
7:53operating at
7:54the order level of id whatever I do in this
7:57data set tableau is always going to be
8:00actually
8:01calculating the average product value in my
8:04entire data set because all it's doing is
8:07taking the
8:08total value and dividing it by the number
8:10of rows in the entire data set in this
8:12particular case
8:14and so if you've ever thought that the
8:16average sales in any particular context is
8:18anything but
8:19the average product sales then that's
8:21technically not correct because the level
8:23of detail that we're
8:24operating here at is actually the product
8:27level of detail I can't drum that home
8:29enough it's really
8:30super important especially when you look at
8:32level of detail calculations because
8:34understanding that
8:35allows you to create calculations that
8:37solve this problem in lots of different
8:39ways okay now in
8:40order to demonstrate this I'm going to
8:42actually ask a slightly different question
8:44if I was to ask
8:45the question what is the average value of
8:48an order in this data set on screen you'd
8:50have to answer
8:51that in a slightly different way number one
8:53this sales column is actually where we know
8:56the value
8:57of each and every cell so the average value
9:00of an order in this data set on screen
9:02would actually be
9:04essentially this row of data divided by the
9:09number of orders which is seven and so the
9:12answer to that
9:13is actually here 317. 317 is the actual
9:17value of each and every order it is not the
9:20same as the
9:21average sales in this data set because the
9:23average sales is actually working at the
9:26product level of
9:27detail and so the complex thing here is
9:29when we start doing things like level of
9:31detail
9:32calculations we have to manipulate what's
9:34going on because essentially tableau is
9:36always going
9:37to be aggregating at the granularity that
9:39we've set in the data set and so with level
9:41of detail
9:42counts what we can start to do is start to
9:44direct it in the specific grouping or
9:46specific level of
9:47detail that we're trying to work at okay
9:50and so I just wanted to create this video
9:52to stress that
9:52point now just as a brief example what I'm
9:54going to do is I'm going to open up this
9:56data model
9:57connection here and you'll see that this is
9:59actually created for us when you connect to
10:00sample superstore cells and we've actually
10:04got three logical tables in the view now if
10:09I go to
10:09each one of these you'll see that this one
10:11is the orders table which is what we can
10:12see here and
10:13we've already figured out that this is the
10:15product level of granularity if I go to the
10:18people table
10:18you'll see that this operates a slightly
10:20different level now initially you'll look
10:22at this and you
10:23go how is this even working how is it
10:25creating a relationship to orders table it
10:27's because we
10:28haven't enabled the ability to show hidden
10:31fields if I just grab my arrow here you can
10:33just see that
10:34this box here isn't ticked so if we tick
10:36that we can see that the order id is
10:38actually hidden in
10:39fact the region is hidden because that's
10:41what it's actually creating the
10:42relationship on if I click
10:43on this line up here you can see that this
10:46is what the relationship is based on okay
10:48and so what this
10:49what's happened here is we've just hidden
10:51the column name so we don't have it dupl
10:53icated
10:53tablo has renamed it here to say that it's
10:55coming from the people table but that's
10:57just basically
10:58telling us that it's hidden okay and if we
11:00click on the returns table you'll see that
11:02the same is
11:03true actually with order id I'd actually
11:05enable this ahead of time so you can see
11:06that here that
11:07show hidden fields is already enabled
11:09because we just enabled it and so now you
11:11see it straight
11:12away but if I untick it this too would also
11:15be hidden okay and so this returns table is
11:18operating
11:18at the order level but the people table is
11:21actually operating at the regional level
11:23okay
11:24and then when we create this relationship
11:26tableau is doing the hard work of figuring
11:28out
11:28how this all comes together and then it
11:30will use that information to decide how
11:32things aggregate
11:33across the entire data set so that's pretty
11:35much it I just wanted to very very briefly
11:38touch on
11:38this concept of granularity because in
11:40subsequent videos I'm going to be touching
11:42on LODs and what
11:43I'd like to be able to do is reference this
11:45video multiple times in different videos so
11:48that you can
11:48get a little refresher on what is meant by
11:50granularity in a data set so there are a
11:53couple
11:53of ways this can be referred to sometimes
11:55it's called the grain of your data
11:56sometimes it's
11:57called the granularity of the data
11:59sometimes it's called the lowest level of
12:01detail in your data set
12:02sometimes people will just call it whatever
12:05they want but ultimately generally speaking
12:07when people
12:07are talking about what is the level of
12:09detail or what's the detail of your data
12:11set what's the
12:11grain they're talking about what is on each
12:14and every individual row and that's a
12:16really important
12:17concept to be aware of okay that's it for
12:19this video thanks for watching and tune in
12:22the rest
12:22of this week we're going to be taking on L
12:24ODs tomorrow I'm going to be touching on the
12:25concept
12:26of order of operations because that's the
12:28next key concept we need to tackle before
12:30we take on
12:30LODs so tune in tomorrow for that video
12:37thanks for watching and I'll catch you in
12:44the next one