Linked tasks in Tableau Prep Conductor: New in Tableau 2021.3
Linked tasks finally let you run Tableau Prep flows one after another — and I'm hoping it comes to Server extracts next.
- Linked tasks must first be enabled in the site or server settings tab — there are two options, one to allow scheduling linked tasks and a second to allow running them immediately.
- Deleting a single linked task removes it from all subsequent flows it was set up on, saving housekeeping effort.
- You build the link from the New Task dialog by choosing the flow order in dropdowns; each step has error handling so a downstream flow only runs when the previous one succeeds, with optional data quality warnings and high-vis alerts.
- Each linked task spawns its jobs independently and sequentially — the second job is only sent once the first succeeds.
- The maximum number of sequential linked tasks is 20, though Tim cautions that anything near that suggests a fragile data pipeline.
- What linked tasks do (0:00)
- Enabling the feature in settings (1:04)
- Reviewing the two example flows (1:46)
- Deleting linked tasks behaviour (2:27)
- Setting up a linked task (3:03)
- Error handling and alerts (4:00)
- Scheduling and creating the task (5:34)
- How jobs spawn on the server (7:08)
- Run history and the owner's view (9:56)
- The 20-task maximum limit (11:47)
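The run order described in the points above can be sketched as a small model. This is only an illustration of the behaviour shown in the video, not Tableau's API or implementation: `Step`, `run_linked_task` and `MAX_LINKED_STEPS` are hypothetical names, and each flow is stood in for by a function that returns True on success.

```python
from dataclasses import dataclass
from typing import Callable, List

# Limit reported in the video; the constant name is hypothetical.
MAX_LINKED_STEPS = 20


@dataclass
class Step:
    """One flow in the chain (hypothetical model, not a Tableau object)."""
    flow_name: str
    run: Callable[[], bool]  # stand-in for the flow job: True means success


def run_linked_task(steps: List[Step]) -> List[str]:
    """Run steps in order; spawn the next job only if the previous one succeeded."""
    if not steps:
        raise ValueError("a linked task needs at least one step")
    if len(steps) > MAX_LINKED_STEPS:
        raise ValueError(f"at most {MAX_LINKED_STEPS} steps are allowed")

    log: List[str] = []
    for index, step in enumerate(steps, start=1):
        if step.run():
            log.append(f"{index}: {step.flow_name} succeeded")
            continue
        # Failure: record it and stop the remaining tasks, as in the
        # "stop remaining tasks" error handling option shown in the video.
        log.append(f"{index}: {step.flow_name} failed")
        for later in steps[index:]:
            log.append(f"skipped: {later.flow_name}")
        break
    return log


if __name__ == "__main__":
    chain = [Step("Part 1", lambda: True), Step("Part 2", lambda: True)]
    for line in run_linked_task(chain):
        print(line)
```

In this model the second job does not exist until the first reports success, which matches what the jobs view shows in the video: the job for Part 2 only appears after Part 1 has completed.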
0:00Hey, it's Tim here. In today's video, I
0:01want to show you a really exciting feature
0:03from 2021.3.
0:04I hadn't actually covered this in the
0:06previous batch of videos that I made. The
0:08new feature is
0:08called linked tasks. Let me just show you
0:10very quickly what it looks like. If I go in
0:12here to
0:13this particular prep flow, you'll see that
0:14it's just one prep flow that basically gets
0:16some data
0:16together from an Excel file. It's actually the
0:19sample workflow from Tableau Prep, and it
0:21exports it out
0:22to a published data source here at the very
0:24end on the right hand side. If we go to the
0:26schedule
0:26tasks tab, you'll see that I have a flow
0:28and a schedule, and you can see that it
0:30says one of two
0:31linked tasks. If I actually click on that,
0:33you'll see that there's actually some sense
0:35of order when
0:36these flows run. So the first flow is
0:38called part one, the second flow is called
0:40part two. In this
0:41video, I'm going to show you how to set all
0:42of this up. But the great thing here is, is
0:45that flow
0:45two will only run when part one succeeds,
0:48and I can actually set them on the same
0:50hourly schedule,
0:51and it will run them one after another,
0:53essentially running them sequentially.
0:55Now this is a feature I wish Tableau Server
0:57had for all data sources. But in today's
0:59video,
1:00I'm just going to show you how to set this
1:01up in prep as it's now available in 2021.3.
1:04Let's get stuck in. Now before we get stuck
1:06in, if you're a server admin or a site
1:07admin for Tableau
1:08Online, you're going to want to go to the
1:10settings tab here. And you need to make
1:12sure that this top
1:13option here is enabled, it's not enabled by
1:15default, you have to come in here and let
1:18people
1:18use this feature. So you want to tick these
1:20two options. Now depending on what you want
1:23to sort
1:23of achieve, you can just tick the top one,
1:25which lets people schedule linked tasks, but
1:27not let people
1:29run them immediately, if that makes sense.
1:30So if you tick the second box, people can
1:32actually go
1:33and run those linked tasks immediately. That's
1:35handy in the use case where maybe there's
1:36a Salesforce
1:37data source or something out there that
1:39needs refreshing immediately, and you just
1:41want to get
1:41it out to your users. Now that that's set
1:43up, now that you've got that option ticked,
1:45you can
1:45actually go back to the project that I had
1:47set up for this. And you can see here that
1:49I have all the
1:50parts that I need. I've essentially got two
1:52flows, I'll open them up in separate tabs
1:54here, so we can
1:54just have a look at what they're doing. The
1:57first one essentially just has a simple set
2:00of Excel
2:00files coming together during some data prep
2:02and coming out to a published data source.
2:04The second one has that published data
2:07source, so it's dependent on that first
2:09flow, and then
2:10some targets and some quotas coming
2:12together and then being output into another
2:14published data
2:15source. And both of those data sources are
2:17actually here in my folder, you can see
2:18here
2:19they're called part two file and part one.
2:21Really bad naming, yes I understand, but
2:24nonetheless you
2:24can see that they're working perfectly fine.
2:26Now if I go to part one, you can see that
2:28I actually
2:28already have a linked task here. So what I'm
2:30going to do is I'm going to show you the
2:32first cool thing
2:33about this. When you delete one linked task,
2:35it deletes all of them. So if I go here to
2:37this tick
2:38box, select actions and delete, and you'll
2:40see that this linked task disappears from
2:42the other
2:43flow. If we go back to the 2021.3 folder,
2:46go to part two, you'll see that this
2:48scheduled task has
2:50also disappeared, it's not there anymore.
2:52So one great thing about linked tasks is if
2:54you create
2:55them, they get deleted from all the
2:56subsequent flows where they've been set up.
2:58It's a really
2:59really nice feature, saves you having to do
3:00lots of sort of housekeeping, and it makes
3:02a lot of
3:02sense actually. Now how do we set this up?
3:05Well you just need to go to one of the two
3:06flows that
3:07you want to link. So in this case, I'm
3:09going to part two, I'm going to select new
3:12task. And when
3:13we get to this option, you've got the
3:14standard task setup that we've had before.
3:16But now you can
3:17see this new feature here called linked
3:19tasks. Again, remember this doesn't show,
3:22this won't show
3:23if it's not enabled on your Tableau Online
3:25site or on Tableau Server. Now what you get
3:28to see is this
3:28interface. And this interface essentially
3:31allows us to set up our linked tasks. You
3:33can see here
3:33that the first flow that it has is part two,
3:35and then it's asking me to set up a second
3:37one. Now
3:38what I can actually do is I can actually go
3:40into this little drop down and I can
3:41actually change
3:42that. I can actually say look run flow part
3:44one, and if I add that you see that it
3:46changes that
3:47first one to part one, even though we're
3:49setting this up from part two. And then I
3:51can go down into
3:52the second drop down and I could select run
3:54flow part two, and I can add that in there.
3:57And now
3:58these two are set up. Over on the right
4:00hand side, you've got this little pencil,
4:02and when I click on
4:02that it opens up this interface that allows
4:05us to set some error handling. Now this
4:07error handling
4:08lets us do a couple of things. It basically
4:10says look if this task succeeds, start the
4:13next task,
4:13and that will just go down to step two and
4:15keep going. However, if it fails, I can add
4:18a data
4:18quality warning, and I can even edit what
4:21that data quality warning does or behaves
4:23like here in
4:24this little section. I can create a message,
4:26don't forget these messages have some
4:28formatting
4:29guidelines so you can add images, you can
4:31add lots of stuff into these as well, and
4:34then you can also
4:34preview that if you've ever set that up.
4:36But for this video I'm just going to keep
4:37it simple. We
4:38can make it a high vis alert. This is
4:40essentially important if someone's
4:42depending on this. Let's
4:44say there's a downstream workbook or a
4:45downstream flow or something like that,
4:47depending on this.
4:48This actually pops up this high vis alert
4:51to all of those properties or assets, and
4:53it lets people
4:54know that look this data set hasn't
4:55refreshed because there was an issue with
4:57the linked
4:57task, with this particular task in hand.
5:00So I've set that up and now we've set
5:02everything up. So
5:03it will also email me as the owner of this
5:05and it will stop the remaining tasks.
5:06Essentially it won't
5:07run tasks two, three, and four. This is
5:10going to be really good for making sure
5:12that tasks are
5:12running efficiently. I'll show you the
5:15behavior for tasks once we've set this up.
5:17Now if I go to
5:17the second one you get the same set of
5:19options and of course you can see it's
5:20exactly the same.
5:21I can again add data quality warnings. I
5:24can set all this up and I've got an option
5:25here to delete
5:26everything, but that's pretty much good to
5:28go. If I click on the pencils you'll see
5:30that it collapses
5:31those interface and they disappear. The
5:33last thing is how do we actually create
5:35this task, because if
5:36you look on the bottom right here you can
5:37see that this is grayed out. This will
5:39remain grayed out for
5:40as long as this interface hasn't been
5:42completed, and the reason it hasn't been
5:44completed just yet
5:45is because we've got a schedule. So we
5:47haven't got a schedule. So if I go here to
5:49the schedule
5:50I'm going to set this to run every hour. We're
5:51going to delete this very soon so it's
5:53not actually
5:53going to run every hour. You can see that
5:56this blue create task button goes blue and
5:58now we're
5:59ready to create the tasks. Now the other
6:01thing that caught me out here is that I had
6:03actually
6:03added a couple of more additional tasks and
6:06I was sitting there in task number two
6:08wondering,
6:08"hey what's going on here? Why can't I
6:10create the task?" and that's because just
6:12out of the
6:13view here we had this sort of third option
6:15here that I couldn't see. So I was sitting
6:18here
6:18wondering what have I not done and it's
6:20just because I needed to scroll down a
6:22little bit
6:22more to that third task which I'd sort of
6:24added by mistake and I don't know how it
6:26happened. So
6:26that's just something to be aware of when
6:28you're setting this up. If you have many
6:29tasks just be
6:30sure to scroll down so you can see all
6:32those things are set up. This button will
6:34only go
6:35blue if everything is set up correctly. And
6:37now that's set up we can go ahead and
6:38create the tasks
6:40and you can see that it will just go ahead
6:42process that and also that linked task has
6:44been created.
6:45We can see that this linked task is here. If
6:47I click on it you see you get this little
6:49pop-up that
6:50shows you both tasks. The other thing you
6:52can do is you can actually go to the first
6:54flow just by
6:54clicking on it, opens it in a new tab and
6:57it will show us this first flow and again
6:59if we go to
7:00schedule tasks you can see that this is
7:02flow one of two of the linked tasks. So
7:04everything is set
7:05up nicely, everything is good to go. Now it's
7:07scheduled to run every hour but what we
7:09can do
7:10is we can actually run this task now. So if
7:12I go here to the actions you can see that I
7:14have an
7:15option to run this task now. The reason I
7:17want to do this is to show you the behavior
7:19of how this
7:20spawns tasks for server admins essentially.
7:22Any server admins who really care about
7:24this because
7:24they're the only ones who can actually see this
7:26outcome. So let's hit run now and
7:28immediately
7:28go over to the tasks bar as a site
7:31admin or as a server admin to see how this
7:33is working.
7:34So if we go over to the jobs you'll see
7:36that we have a pending job right there. You
7:38've got a
7:39pending job and essentially what's going on
7:41here is it's running the first of those
7:43linked tasks.
7:44If I click ok and refresh my page hopefully
7:46I catch it before it's run the second task.
7:48You can see that it's in progress and you
7:50can see it's now running part one so we
7:52didn't have
7:53to wait long for that to run. Let's refresh
7:55it again and hopefully this time you can
7:57see it's
7:58still running so we're gonna have to wait
8:00some time just to make sure this is running
8:02and I'm
8:02doing this in real time. I'm trying to
8:04catch it when it spawns the second task so
8:06sometimes it
8:06might take a while for these things to run.
8:08Let's just keep refreshing it one more time
8:11and hopefully
8:11we'll see it spawn another task. I want to
8:13really show you this. I'll probably end up
8:15cutting this
8:15out so you don't see the whole way. Okay
8:18now you can see here that it's loaded up
8:19and what it's
8:20actually done is it spawned the second task.
8:22So each of those tasks are spawned
8:24independently.
8:25So the linked task will spawn two jobs and
8:27send them independently and the second one
8:30will only
8:30be sent if the first one has succeeded. So
8:33this is how they appear and you can kind of
8:35see this
8:36is working nicely. If I just wait for this
8:38to refresh you can see this has gone into a
8:41queue.
8:41It has a little bit of a different runtime
8:43so this is in minutes so it kind of had to
8:46wait for
8:46you know 20 seconds let's say 0.2 is 20
8:49percent. I sort of take that as a fraction.
8:51Ran for just over half a minute. So if we
8:54give it a bit of time we should sort of get
8:57the same
8:58average runtime and average queue time
9:00essentially just depending on where it is
9:02in the queue.
9:03Tableau Online is a shared resource so you
9:05can't always expect sort of consistent timings
9:07but you
9:08can see here that nothing really takes more
9:10than two minutes here. So it's actually
9:12pretty fast
9:13it's just that I'm making a video so I'm
9:14impatient and things are taking a while to
9:16load. So you can
9:17see that it's actually finished in the
9:19queue and it's now actually probably
9:20running. You can see
9:22that the pending time queue time was 1.1
9:24and now the runtime is 0.1. So it's
9:26actually in progress
9:27it's being processed and the data source
9:30here is just on server it's just an Excel
9:32file which is embedded
9:33inside of the prep flow so there's
9:35no real data sources to connect to. If you
9:37had a big data
9:37source that this had to connect to this
9:40would take a bit longer but hopefully we
9:42should see that both
9:43jobs are completed. So now you can see here
9:46are the tasks this was run part two was run
9:48and it
9:48completed at 741 and part one was run and
9:51was completed at 740. So they've actually
9:55run
9:55sequentially there. Now if I go back and I
9:58go back to this folder and I go back to
10:01part one
10:02you can actually see the last succeeded
10:04time. So if I just open this you can see
10:06here the time again
10:07740 and if I just go back to the scheduled
10:10tasks I'm going to do the shortcut here and
10:13go to part
10:13two just by clicking on that it opens
10:15another tab. You've got to bear in mind
10:17that's opening
10:18new tabs rather than sort of closing the
10:20existing ones so it might be sporting off
10:21tabs you don't
10:22want but you can also see the last runtime
10:24here at 741. So if you're the owner this is
10:27what you
10:27can see where things are going. Now if you're
10:29actually scheduling these or you're
10:31running them
10:31immediately and you're not a server admin
10:33when you hit run now and the job gets sent
10:35over to the task
10:37menu you can see I've just kicked it off
10:39again. If I refresh this page you can
10:41actually see that
10:42it tells you that this job is pending or it's
10:44running so if I go to the overview here
10:46and we
10:46go to run history actually we can see that
10:49it's actually gone off but I think until
10:52the job starts
10:53running we can't see it here yet so if I go
10:55to part one instead we might be able to
10:58actually
10:58see that this one is actually running so
11:00you can see here that it's scheduled. So
11:02the second task
11:03doesn't get scheduled until the first one
11:04is run I'm probably repeating this a
11:06hundred times but
11:07I just want to be clear so that you can
11:08understand how exactly this works. So there
11:11you can see it's
11:11scheduled it's going to run when it's
11:13supposed to run because we ran it
11:15immediately that actually
11:16means that it's been queued. If I refresh
11:18this again you might have it you might
11:19actually see
11:20the fact that it's progressing you can see
11:22that it's in progress and when it's done
11:24you'll get a
11:24little tick and you'll be able to see it
11:26here in the run history and so you can see
11:27that I actually
11:28run this a few times today just to make
11:30sure that it works as expected and some of
11:31this was actually
11:32on the schedule it wasn't me actually going
11:34in and clicking run you can see these are
11:36pretty
11:37standard runs on an hourly basis and you
11:39can kind of see the wait times never really
11:41longer than a
11:42minute here so that was a really really
11:44good to see. So that's pretty much it, that's
11:46linked tasks
11:48I really wish this comes to Tableau Server
11:50very soon for extracts because occasionally
11:52there are
11:52some Tableau Server extracts that you'd
11:54love to run in a specific order especially
11:56inside of a
11:56workbook just to make sure that things have
11:58the right chronology or if things get used
12:00by other
12:01processes further down the line then it
12:02makes more sense but this is particularly
12:04useful for Tableau
12:05Prep where for data prep that is actually
12:07sometimes a necessity. So if you want to do
12:10that
12:10you can absolutely go ahead and start
12:11scheduling these flows to run one after
12:13another. I don't know
12:14the real limit of this you know I've gone
12:17into this interface and if I go into this
12:20and just go
12:21edit the task you can see you can open it
12:22up again and you can just keep adding tasks
12:24and I've I
12:26haven't actually been able to get to a
12:28limit now oh there we go we've hit it I
12:30must not have tried
12:32properly the first time 20 is the maximum
12:34limit so we've actually found the maximum
12:36limit that's
12:37sort of interesting to know so there you go
12:39that's the maximum limit we just did that
12:40by adding tasks
12:41as many as we can I'd love to see a task
12:44that ran with 20 sequential tasks I mean
12:47that that is one
12:48fragile flow I have to say for 20 different
12:51moving parts to go right on any given day
12:53that's a lot to
12:54ask for so if you're having to set up
12:55things with more than 20 there's something
12:57probably wrong with
12:58your data pipeline in your data play that's
13:00pretty much it for me I hope you found this
13:02useful let
13:03me know what you think of this feature I
13:05think it's really useful feature for Tableau
13:06Prep and
13:07it's going to be hopefully something that
13:09is on some dev desk to bring to Tableau
13:11Server generally
13:12so extracts can also benefit from this
13:14capability thanks for watching and I'll
13:16catch you in the next
13:17video
Tableau Release notes: “With linked tasks in Tableau Prep Conductor, you can now schedule flows to run after one another on Tableau Server. Easily automate the orchestration of multiple flow jobs, ensuring they happen in sequence after each task is completed successfully.”
00:00 - Intro
00:12 - How this feature works
01:05 - Enabling this for your Tableau Online site or Tableau Server
01:49 - The flows we're using
02:34 - Deleting linked tasks
03:05 - Setting up a linked task
04:08 - Adding data quality warnings to linked tasks
05:35 - Making sure the task is set up correctly
06:45 - Viewing the linked tasks
07:17 - How the linked tasks work in the job and task queue as an admin
09:56 - How to view the status as a user
11:48 - Let's hope this comes to server tasks in general
12:19 - The limit of linked tasks
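
A note on the queue and run times read out in the jobs view (0.2, 1.1, 0.1): the video treats them as fractions of a minute. Assuming that reading is right, a value of 0.2 is 20 percent of a minute, which works out to 12 seconds rather than 20. A minimal sketch of the conversion, with a hypothetical helper name:

```python
def minutes_to_seconds(minutes: float) -> int:
    """Convert a decimal-minute duration to whole seconds (rounded)."""
    return round(minutes * 60)


if __name__ == "__main__":
    # Values mentioned in the video, assumed to be decimal minutes.
    for value in (0.2, 1.1, 0.1):
        print(f"{value} min is about {minutes_to_seconds(value)} s")
```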