Getting started with the Snowflake PT-2: Loading data
I work through Snowflake's official lab to load 61 million rows of CityBike data — stage, file format, COPY INTO and all.
- Snowflake 'warehouses' aren't storage in the traditional sense — they're compute units you size (XS to 4XL) for performance, and you're charged based on usage rather than flat storage.
- Loading data follows a clear sequence: create a database and table, create an external stage pointing to the S3 bucket, define a file format, then COPY INTO the table.
- You can do everything via the visual UI or via SQL — the lab worksheet provides ready-made scripts, but you must run sections one at a time and never tick 'run all queries'.
- An external stage just tells Snowflake where your source data lives (e.g. an S3 bucket) before it's loaded in, acting as a holding area.
- When a COPY fails on a gzip file, the UI-created file format may not apply compression correctly — running the provided SQL to create the file format fixed the load.
- Finding the getting started guide0:00
- Sample SQL scripts and trial setup1:18
- Touring the Snowflake interface4:20
- Databases, shares and marketplace5:51
- How warehouses and compute work9:44
- Worksheets and the history tab15:20
- Preparing to load the CityBike data21:06
- Creating the database and table22:30
- Setting the worksheet context24:16
- Creating an external S3 stage32:03
- Creating a file format37:33
- Loading data and debugging the error41:07
0:00Hey, it's Tim here. In today's video, we're
0:01pushing on with our learning of Snowflake.
0:04If you've got a Snowflake trial, you can
0:05follow along with this. Basically,
0:06you're going to be using the guide that
0:08they have on their website. If you go to
0:10the Snowflake
0:10homepage, go to this resources dropdown and
0:13you want to select guides. Don't click on
0:16it. Just
0:16go to the dropdown, then select guides. It
0:19actually takes you to this guides page,
0:21which has a load of different series that
0:23they have available to you to help you
0:25learn Snowflake.
0:26And the one I'm going to be doing is the
0:28getting started with Snowflake, essentially
0:30. So this is a
0:30sort of zero to nuts soup to nuts sort of
0:33course they have here. And I actually
0:36started doing this
0:37in the previous video about a month ago. I
0:39sort of pledged to keep pushing with this.
0:41So here I am.
0:42I'm on basically step number two. Actually,
0:44in the previous video, what I'd done is I'd
0:46created my
0:46trial. I went to the overview, we prepared
0:49our lab, but we sort of took a tangent and
0:52started using it
0:52inside a Tableau rather than sort of doing
0:54anything else. So actually, what I'm going
0:56to be doing here
0:57is just going through with the lab. And we
0:59're probably going to do this in several
1:00videos.
1:01It's a 90 minute lab. If I just show you,
1:03you get this little time estimate here on
1:05the top right
1:06hand side that shows you how long roughly
1:08is left. So I'm actually probably going to
1:10do this over two
1:11videos. I don't want to make sort of one
1:13really long video. And so that will sort of
1:15help break
1:16it down. I've already downloaded the sample
1:18code. So if I just go to this tab, you see
1:20that I've
1:20already got the SQL scripts they have. Now,
1:22what they've done here is they've
1:24essentially typed
1:25everything you should need to get this
1:28whole demo and guide done. So you can, of
1:31course, do this,
1:32you know, naturally by just doing the
1:35instructions they tell you. But they've
1:36also got these bits of
1:38codes that you can just copy and run
1:40manually if you just want to make sure
1:42everything's done
1:43right. But you really want to understand
1:45what's going on here. I mean, this is one
1:47way of doing
1:47it. But if you're familiar with databases,
1:49you're probably going to be more
1:50comfortable typing this
1:51yourself. And if you're not, well, it's
1:53probably a good idea to start to understand
1:54what some of
1:55these things are doing. Because essentially
1:57, Snowflake is a database, and it's going to
1:59have
1:59specific ways of working. So it's good to
2:01know how they want you to use it, and how
2:03you can do
2:03all of these commands just through the
2:05command line interface they give you
2:07through the web.
2:08Okay, so you're going to need that script
2:10downloaded somewhere in your machine,
2:12we'll actually end up putting it into a
2:14worksheet in Snowflake. We'll go through
2:16that in a second.
2:17And then we'll be able to get go from there
2:19. Now, if I just carry on back to the guide,
2:22you can see they recommend a 30 day trial.
2:24And in my previous video, I did say don't
2:26do this unless
2:27you're ready to spend 30 days taking
2:29advantage of it, because they really do let
2:32you run everything
2:32and everything on this trial. So it's a
2:34really good opportunity to make sure it's
2:37suitable for
2:37your business use case suitable for your
2:39personal use case, if you're like me using
2:41it for personal
2:42reasons, and also for your own business.
2:44Okay, so that's basically the overview of
2:46this guy,
2:46check out the previous video, if you haven
2:48't checked that out, we're going to get
2:49stuck in
2:50here and just keep going. Now the overview
2:52just gave us sort of overview of the course
2:54. So we
2:54don't need to cover that we'd actually
2:56already prepared our lab, we've got the SQL
2:58code, I
2:58already have my Snowflake environment. And
3:00so the the second sort of the third step
3:02here is to go
3:03into your Snowflake environment, and they
3:05give you a walkthrough. So I'm essentially
3:07going to just try
3:07and do this walkthrough with you. Now, what
3:09I've actually done is I've actually got
3:11this page
3:11loaded up over here in this progressive web
3:14app. So if I just sort of open this up, you
3:17can see
3:17it's exactly the same. And so to try and
3:19avoid sort of jumping between too many
3:21browser window,
3:22I just have three things open, I have the
3:25guide here on the left hand side with blue,
3:28I have the actual Snowflake instance here
3:30on the right hand side just behind my face.
3:32And then I've
3:32got the SQL script, which will close as
3:34soon as we've created a worksheet inside of
3:37Snowflake. Now
3:38the other thing is I've done some of this
3:39already. So I'm going to sometimes come up
3:42against an issue
3:42where I've already done what it's asking.
3:44And so it might sort of complain that, hey,
3:46this already
3:47exists. So we'll try and change things up,
3:49I won't do things exactly by the book. But
3:51the other thing
3:52to notice is that I will have some
3:53instances and stuff already in there. So I
3:56'm not completely
3:57vanilla. And I'm actually doing this on a
3:59paid instance of Snowflake. Now, I'm
4:00running my own
4:01paid instance here. And that's basically
4:03what I'm using for this demo. Okay, so let
4:05's get stuck in.
4:06And let's just try and learn a bit of Snow
4:08flake. So if we go back to the overview, and
4:11we just go
4:11to tab three, essentially, the first bit is
4:13essentially about logging in. And once you
4:16create your Snowflake trial, you sort of
4:17get walked through all of that. So that's
4:19totally fine.
4:20Let's go ahead and look at the interface
4:22and see what's what, okay. So essentially,
4:24there's a
4:24couple of different areas, if I just go
4:26back here to Snowflake, you'll see that
4:28there's sort of this
4:29tab. So if I just pull out my annotation
4:31tools, you can see here that you've got
4:33this toolbar that
4:34goes across the top. And this is basically
4:36your main navigation area. This is what
4:38sort of walks
4:38you through the entire product. Now, it is
4:41a database. So of course, you don't have to
4:43connect
4:43to it entirely using this web interface.
4:46That's just what I'm going to use here. But
4:49you can
4:49connect to in any database tool. So I use a
4:52tool over here, let me just open up DBeaver
4:55. And whilst
4:57that opens up, you can see that you can get
4:58more or less the same functionality, but it
5:01will be
5:01more familiar with the tool that you're
5:03using. So if you're using DBeaver, for
5:05example, you'll know
5:06what all of those tools do and where they
5:08are. And if I just close this little
5:10upgrade, sort of
5:12indicator and just bring this in here, you
5:15'll see that I have a DBeaver set up here.
5:17And you can see
5:18my Snowflake instance is just right there.
5:20So it's exactly the same one. And if I
5:21expand the
5:22hierarchy, I can see all the different
5:24databases in there. So if you're using a
5:26database tool to
5:27do all these things, that's also totally
5:28fine. But I'm just going to try and stick
5:30to the guide
5:31because that's essentially the best way to
5:32learn how Snowflake works. And then from
5:34there, you can
5:35start sort of applying those skills to
5:37other tools if you want to. So check out D
5:39Beaver, it's really
5:39cool, it's great and free. If you're sort
5:42of a one man band or just using it for non
5:44sort of serious
5:45work. If you want to use it for paid work,
5:47then it's probably a good idea to buy a
5:48license. Okay,
5:49so I'll just go ahead and close that. And
5:50we'll just sort of work our way around. So
5:52we've got
5:52the databases, you can see I have quite a
5:55few databases, I have the Snowflake sample
5:57database,
5:57I'll probably delete this soon because it's
5:59actually quite big. So now that I'm out of
6:01my
6:01trial, I'm actually paying to store the
6:03data they sort of added on to the instance.
6:05So I'll probably
6:06close that at some point. There's a demo DB
6:08, there's a utility DB, this is actually
6:10more of a sort of
6:11like a sort of it's like a shadow of
6:14something they leave on there when you
6:17create the account,
6:17the utility DB is used for a bunch of other
6:20things, we'll go into some other time.
6:22Demo DB is just exactly that. You've got
6:24the Snowflake sample data, this is data
6:26that they
6:27sort of want you to work with. So there's
6:29an open weather data set, there's also this
6:31other sort of
6:32supermarket data set there that's really
6:33cool. There's this weather data set that I
6:35've actually
6:36already created. So what I will actually do
6:38in this case is I'm going to go, I was
6:40going to drop
6:41this, but actually we'll leave it on there.
6:43We'll leave it on there because, no
6:45actually, I'm going
6:46to drop this, I'm going to drop this, and
6:47then we're going to create it again,
6:48because I'm pretty
6:49certain that we create this in the demo. If
6:51not, I'm going to have to create it. We'll
6:52have to find
6:53out how to do that. And then I've got this,
6:55which is a database for my business.
6:57Essentially,
6:58it's just a database that stores all
7:00incoming data from another service called 5
7:02tran.
7:03That's another video. It's a really great
7:05data integration tool that you can use to
7:07bring lots of
7:08data into Snowflake. I'm primarily using it
7:10to bring YouTube data and Google analytics
7:13data and
7:13a bunch of other website and social media
7:15data into one place so I can build
7:17analytical views
7:18around that essentially. So that's why I'm
7:20using that. So we won't be touching that,
7:22that's my own
7:23sort of private place for the stuff I do,
7:25especially with these videos and other work
7:27that I do. So
7:28I'll be keeping that to myself. But
7:30nonetheless, everything else we'll create
7:31here in the demo
7:33today. Shares, this is again probably a
7:36topic for the future. This is actually
7:39quite an important
7:41part of Snowflake, but for this sort of
7:43intro, it's probably not appropriate to go
7:45into. Long
7:46story short, you can share your data with
7:48other organization or other entities and
7:50essentially,
7:51by giving them access, instead of having to
7:54send like a huge file, like a huge CSV file
7:56or a huge
7:57data dump over S3, you can essentially just
7:59give them sort of access to your Snowflake
8:01instance,
8:02and they can start querying the data
8:04without you having to figure out how to
8:06transport data
8:07in a big way. So this opens up
8:08opportunities for collaboration between
8:10businesses,
8:11and also opens up opportunity for making
8:13your data a little bit more portable to
8:15others so that they
8:16too can sort of get in and start doing all
8:18the great things that they do. But also you
8:20get
8:21visibility of what they're doing because of
8:22course everything stays in your instance of
8:24Snowflake.
8:24It's not sort of leaving your database per
8:27se, you're not handing it over, you're sort
8:29of
8:30inviting people to collaborate with you
8:32with your data. Okay, there's a data
8:34marketplace. This is
8:35essentially, if we just go ahead and
8:36explore this, this is essentially like a
8:38store. So I'll bring
8:39this in, it's open actually in a new tab.
8:41So let's just bring this in. And if we just
8:44look at this,
8:45it's going to ask me to sign in to continue
8:46. Let's go ahead and sign in. Let's see what
8:49happens here.
8:49I think it's just going to use my Snowflake
8:52credentials, that's correct. And so this is
8:54actually using the new interface. Snowflake
8:57is also simultaneously sort of going
8:59through a design
9:00at the moment. So you can see that on the
9:03left hand side here, this is actually
9:06pretty much
9:06almost exactly the same thing as what you
9:08've just seen in the previous window, but in
9:10a slightly
9:11different format. That's because they're
9:13actually going through like a redesign. So
9:14there are some
9:16bits which are already in the new design
9:17and the some bits which are in the old
9:19design.
9:19I'm using the old design because that's
9:21what the majority of people are using. But
9:23fundamentally,
9:23it's a really, really nice marketplace. You
9:26can go in there and find other sort of ways
9:29of plugging
9:29into Snowflake and tools that work off Snow
9:32flake that can sort of do things in an
9:34enhanced way. So
9:35again, it's a really nice place to come and
9:37check out what other partners in the
9:38community are doing
9:39with Snowflake. So definitely do check that
9:42out if you have a chance. Now, warehouses
9:44in Snowflake
9:45are slightly different. Warehouses are,
9:48instead of thinking of them as the
9:50traditional sort of
9:51database warehouse methodology, the way to
9:54think about it is to think about it as a
9:58performance
9:59unit. So let's say I have a task and I need
10:02to read a million rows off a database. In
10:05the Snowflake
10:05world, essentially I can do that using a
10:08number of warehouses. And I could have a
10:11small warehouse,
10:12I could have a medium warehouse, and I
10:14could have a large warehouse. Now, the
10:16analogy I like to use
10:17is sort of a car analogy. So let's pretend
10:19you're going to the supermarket. And the
10:22supermarket is
10:24just down the road. Let's say it's 100
10:26yards down the road. Now, if I have a fast
10:28car, let's say
10:29Ferrari, and I have a small car, like a, I
10:31don't know, let's choose a Vauxhall Corsa,
10:34a very small
10:35car here in the UK. Fiat Pinto might be
10:37something that more people around the world
10:39are familiar
10:40with. To get to the supermarket, they'll
10:42probably both take the same amount of time,
10:45because the
10:45distance that they have to travel is just
10:48so small that realistically, the biggest
10:50factor there is
10:51probably just getting into your car. Okay,
10:53so small car, fast car, you know, they're
10:55both going to get
10:56to the end result very easy because the
10:57journey, the task is actually very small.
10:59It's only 100
11:00yards, right? Let's instead say the
11:03supermarket is actually now 10 miles down
11:06the road. Okay,
11:07now, this is where the fast car has an
11:10advantage because the distance is much
11:12larger. And it's also
11:13more powerful, it's able to get to the end
11:15result much faster. And so that's sort of
11:17the analogy
11:18here, the different warehouses can all do
11:20the same thing. But what you're really
11:22choosing is
11:23how much power you want to use behind that.
11:25And so that's why they have these sort of
11:27different sizes.
11:28If you go ahead and create one, it sort of
11:30gives you some sort of idea of how these
11:33scale. So
11:34you've got extra small, small, medium and
11:36large, and you've got extra large to hexa
11:39large,
11:39three x large and four x large. Now, this
11:41terminology actually comes from AWS, AWS
11:44have a
11:44very similar sort of pricing structure for
11:46how they call something easy to instances,
11:49okay. So
11:50this easy to instances essentially compute
11:53units. And an easy to instance is just a
11:56computer,
11:57essentially, just think of it as a computer
11:58, you could assume your laptop is
12:00essentially like an
12:02instance of a compute unit, okay. And in
12:04this case, you can choose different levels
12:07of compute
12:08to do the tasks. Now, depending on the task
12:10you're doing, and depending on the person
12:12you are, you
12:12might want to choose different things. Let
12:15's take for example, your CEO, he wants to
12:17go look at this
12:18data, fresh and live every single time, you
12:21might decide to put your CEO on a large
12:23setup, because
12:24they might only use it once a month, they
12:26might only use it maybe once in a blue moon
12:28, but when
12:28they use it, you want them to have the best
12:31experience possible, okay. But on the other
12:33hand,
12:34there is a downside because each of these
12:36instances take time to sort of spin up. So
12:38the
12:38first person who goes to use it might see a
12:40sort of a small lag, let's say a minute or
12:42two. And
12:43then once it's up, they'll get the same
12:45performance that you should expect once the
12:47instance is
12:48running at its peak speed. So essentially,
12:50you've got to be a little bit smart about
12:52your instances,
12:53you don't want to sort of create one that's
12:55so rare that it barely gets used that every
12:57time,
12:58you know, one or two people go to use, it
12:59has to be spun up and spun down. But at the
13:02same time,
13:02you kind of want to optimize your costs,
13:04you want to kind of put a lot of people
13:06behind
13:06one compute unit so that you can keep
13:08utilizing it for the period of time that
13:10you're charged. And so
13:12the instances sort of work in a minute sort
13:14of components, okay. And the credits are
13:17basically
13:17based on an hourly setup. So essentially,
13:21the Snowflake model is to charge you based
13:25on how
13:25heavily and how often you're using your
13:27data, rather than charging you for things
13:30like storage
13:31and a bunch of other sort of factors that
13:33really aren't to do with your data. And so,
13:35you kind of get this nice effect when, you
13:37know, at night, when no one's doing
13:39anything on your
13:40database, your database costing you almost
13:42nothing, unless you're running sort of some
13:44procedures overnight, your database is
13:46costing you nothing. At nine o'clock,
13:48though, when 100 people
13:50come online, and they're all hitting the
13:51database, that might be where you see the
13:53spike in terms of
13:54costs in terms of running it. So you get
13:56kind of get these peaks and troughs. And
13:58the idea is that,
13:59you know, traditional database just has a
14:01flat line, because once you have the
14:02database,
14:03once you have the power, it stays that way
14:05has to be switched on all the time, it
14:07doesn't matter if
14:081000 people are using it, or one person's
14:10using it, it's going to be running at
14:12almost the same
14:13level. Now, again, in real terms, the
14:15internet is moving in a slightly different
14:17direction.
14:18And cloud computing is sort of made
14:20database a lot more agile, especially
14:22Amazon, you have
14:23very sort of flexible data setup. So that's
14:26a very long winded sort of, you know,
14:29probably not great
14:30explanation of how warehouses work. But the
14:32key thing to take away is that they're not
14:35sort of
14:35warehouses in the traditional database
14:38sense, these are more of a compute sort of
14:41element with
14:42how everything works. Now, the other thing
14:44you get to do is to choose how quickly they
14:45come down. So
14:46once they're up, you can choose how quickly
14:48to take them down. And so if you actually
14:50know,
14:50there's going to be a lot of queries
14:51happening on something, let's say your CEO
14:53comes on at 9am,
14:55every single day. And then the next day
14:57decides to change the time they come on.
14:59Well, you can say
15:01that look, when this person's online, they
15:03typically use it extensively for an hour.
15:05So
15:05don't take that instance down for an hour,
15:07okay, and keep everything running. And that
15:09's essentially
15:10how it works. It will just basically
15:12automatically kill that warehouse when when
15:14it's done, okay.
15:15And you can also add a comment if you want
15:17to create one here, and so on and so forth.
15:19So
15:19that's all there for you to use. And the
15:21next one is worksheets. Now I've actually
15:23got some
15:24worksheets already set up. So if I go to
15:26this test script, you see this is actually
15:29the labs
15:30script that we're going to be using. So
15:31this is exactly the same thing. This is why
15:33I was saying
15:33earlier on that I've actually done some of
15:35this already. So I'm sort of ahead of the
15:37game there.
15:38And then there's also this sort of new
15:39worksheet. This is just a blank one that I
15:40created, but
15:41your worksheets stay here in the web
15:43interface, they don't go anywhere. If you
15:45're working in any
15:46other sort of db tool, you'll probably have
15:47your own place there where you keep all of
15:49this stuff.
15:50So it's just a nice way of keeping things
15:52in a nice sort of clear way. And also if
15:54you're trying
15:54to build like a complex query, this is a
15:56nice way of just making sure that it's
15:58going to do exactly
15:59what you want it to do. Because essentially
16:00, you get a little bit of a data preview
16:02here below
16:03of what's happening with your data. But
16:04again, we'll see that in the guide as we go
16:06through this.
16:07So don't worry too much about that. The
16:09last tab is the history tab. This shows you
16:11everything
16:12that's happening on your instance
16:13throughout the whole setup. So you can see
16:15here that
16:16I'm obviously online using my instance, and
16:18I've been using it quite a lot. Now there's
16:20also a
16:22couple of different types of users that I
16:24have on mine. So you can see that I've
16:25actually got
16:26a username for the five tran setup that I
16:30have. So I've got a couple of usernames
16:32that do slightly
16:33different things. But essentially, every
16:35time that particular user, if I just go
16:37filter to that
16:38user very quickly, you'll see that this is
16:42actually a specific user that's been
16:44created,
16:45that isn't me. Okay, so this is essentially
16:48a way that I can monitor how five tran is
16:50accessing my
16:51snowflake instances, and the level of data
16:53that they're loading in. So although we're
16:55not going
16:56to look at that, you can kind of see what's
16:58happening, how it's sort of adding data
17:00into
17:00into the setup. A lot of this is actually,
17:04if I just try and open up one of these
17:07queries,
17:08and just try and see, you can see here that
17:10for example, this is a poll from Google
17:12Analytics,
17:13if I just click on that, actually, we'll
17:14get the query, you can see here, this is a
17:16poll from
17:17Google Analytics, you can see the
17:18information that it's actually pulling in
17:20and dropping into
17:21our database. And it's basically just, it
17:23goes through a series of steps when it's
17:26moving data
17:26from one place to another. And so just kind
17:28of tries to follow best practice,
17:30essentially,
17:31as it does that. So that's a really nice
17:33way of just being able to monitor that if I
17:35see an
17:35error in five tran, I can of course come
17:37here, and I can monitor and see how snowfl
17:39akes saw that error.
17:40So you can see here there was an error, you
17:43know, SQL compilation error table, this
17:45table particular
17:46does not exist or is not authorized. At
17:48that moment in time, it might have just
17:49been a bug,
17:50because in subsequent runs, it seems to be
17:52working fine. So sometimes these things you
17:54can just
17:55ignore, they're just temporary connection
17:57issues, whatever that just, you know, come
17:58and go. Other
17:59times, if they're persistent, then you can
18:02monitor them here and also try to start to
18:04understand that
18:05the other reason I've done this is because
18:07I want to be able to isolate and sort of
18:09show and
18:10understand what impact five tran has on my
18:12database costs, because of course, I'm
18:15charged
18:15for using it. And so the warehouse that the
18:17five tran is using is actually the extra
18:19small one,
18:20that is pretty much sufficient for
18:21everything it's doing, because it's only
18:23loading a small
18:24number of rows each and every time. And it
18:26can do that fairly quick. So by using the
18:28extra small,
18:29I can also start to see well, how much is
18:31it costing me for five tran to be doing
18:32this work
18:33on my behalf? And I could do it myself. But
18:35essentially, if it's the same amount of
18:37rows
18:37coming in on the same schedule, it would
18:39still cost me the same to do that any other
18:42way. Because
18:42essentially, again, with Snowflake, I'm
18:44charged for what I use, not for everything
18:46else around
18:47that. Data storage is very cheap, though,
18:49maybe we'll do that in another video. So
18:51just sort of
18:51creating things to look at in subsequent
18:53videos. Okay. So that's pretty much an
18:55overview of this
18:56top tab. That's everything. The only other
18:58thing is obviously the setup over here on
19:00the right hand
19:01side, there's a couple of other things,
19:03things like where your organization is,
19:06sorry, your
19:06organization code, your database name, you
19:09know, where you are, which sort of country
19:11you've decided
19:12to set up, all of that is all here. And of
19:14course, you can change the sort of roles
19:16and go to sort of
19:17higher levels and lower levels. Now, if you
19:19've got the trial, and by default, you start
19:21as a sysadmin,
19:22but you can obviously switch and go back to
19:24an account admin. And you'll all notice
19:26that an
19:26account admin actually has some additional
19:29features. Okay, so this top tab gets a
19:31little
19:31busier, we get one new thing. And if I go
19:34into the account tab, I can actually start
19:36to see other
19:37information about my usage, how I'm using
19:40it, how much data I've got stored, and a
19:42bunch of other
19:43things related to my account. Okay, so if
19:45you're an admin, this is what you'd expect
19:47to see. And
19:48you get a nice little breakdown of the
19:49credits that are being used. So you can see
19:51here that
19:51since I set up my Fivetran setup, I'm
19:53actually getting a more consistent sort of
19:55base of use.
19:56But I haven't actually used it for a while,
19:58largely because, you know, few personal
20:00things
20:00were going on, so I didn't have time to
20:02come in here. So this is great. I didn't
20:04pay anything for
20:05my database this whole time, because I wasn
20:07't coming in. But now Fivetran is coming in
20:09daily,
20:10you can see there's some quite big data
20:11loads being done. Now, these were initial
20:13syncs. So
20:14these are the kind of things that do, will
20:17take up a lot of credits, because they're
20:19syncing really
20:20large data sets for the first time. And
20:21then in subsequent times, you can see that
20:24now these data
20:24sets are much smaller, they're coming
20:26through in a sort of smaller level. And
20:28today, because I'm
20:29doing this demo, and we're going through, I
20:30'm incurring a bit more of a cost there as
20:32well,
20:32and 0.44 credits. Okay, so that's, that's
20:36just a really sort of quick overview of all
20:38this stuff.
20:38It's by no means a thorough review. I just
20:41wanted to give an overview of basically
20:43what's,
20:44what's in this guide, as I see it, really,
20:47not really any other way. And so there's a
20:49whole
20:49bunch of things in here that you can sort
20:51of walk through yourself, really take some
20:54time to kind of
20:55go through it and enjoy and really make
20:56sure that you understand it. Because this
20:58is really important
20:59stuff. And if you're going to be spending
21:01time in the web interface, then it's going
21:02to be really
21:03fundamental. Okay. So the next thing I'm
21:05going to do is we're going to go on to
21:07preparing the load
21:08data. Okay, so this step here, so we've got
21:11this lab information, and everything's good
21:13. The next
21:14step is actually getting us to load data
21:16into Snowflake. So if we just look at this,
21:18let's start
21:19by preparing to load the structured CityB
21:21ike rider transaction data into Snowflake,
21:24create a database
21:25table, create an external stage, create a
21:28file format for the data. Okay. And so what
21:30we need to
21:31do here is make sure that we basically
21:33follow the steps. Now, the data is going to
21:35be hosted in S3.
21:36Okay. And it's from a company in New York
21:38called CityBike NYC. I think this data has
21:41been sort of,
21:42you know, cleansed to make sure that it's
21:44not, there's no personal data in it. But it
21:46's a great
21:46sort of data set to use because it's 1.9
21:49gig compressed. Okay. So it's 61 million
21:53rows. This
21:53is a really large data set compared to most
21:56people's Excel files, which only has like a
21:58row
21:58limit of one million. And by the way, you
22:01don't want to do that on a really slouchy
22:03computer,
22:03because it will take a while. Okay. So this
22:06is a real serious proper data set in sort
22:09of relative
22:10terms. Now, this is kind of also what most
22:11businesses might have when they're looking
22:13at transactions. If you work in retail,
22:16this is trivial. This is probably what you
22:18generate every
22:19single hour across all your stores, across
22:21all your products. Okay. So data can get
22:24big real
22:25fast, depending on how you store it and
22:27where you store it. So we're going to
22:29follow along. The
22:30first step here is to create a database
22:31table and we're going to call it CityBike.
22:33Now I'd actually
22:34previously already created this. So let's
22:36go ahead and find out what's going on. Do
22:38we have a
22:39database called CityBike? I think what I
22:41did is I dropped it earlier on. Yes, I did
22:43drop it earlier
22:44on. So let's go ahead and click create and
22:46you can just sort of give it a name here
22:48and we can call
22:48this, I'm just going to call this Snowflake
22:52Guide. Getting started so I can sort of
22:55track this later
22:56on. And you can actually show the SQL. You
22:59see what it's doing is it's giving you a
23:00nice interface,
23:01but if you click show SQL, it actually sort
23:03of shows it to you and you can go and
23:05select that
23:06and just run that in the interface. So you
23:08don't have to always do things using the
23:10interface,
23:10you can actually just go ahead and use SQL.
23:12So I'll close that and I'll just do this
23:14here.
23:15Snowflake Guide, getting started. Let's hit
23:18finish and you'll see that it's processing
23:20that and this
23:21should all be done. There it is, there's
23:23the database, it's ready to go. It was done
23:25by me
23:26because I was an account admin. I should
23:28really do this as a sysadmin. And so you
23:30know what I'm
23:30going to do, I'm going to drop this and
23:33click yes and we're going to do it again.
23:36So let's go back,
23:37let's switch back to my sysadmin role and
23:39let's make sure that's all good. For all
23:42intensive
23:42purposes it doesn't actually matter because
23:45I'm the only person on this instance, but
23:47just for,
23:47you know, for the sake of following the
23:50tutorial it's expecting it to be done under
23:53sysadmin. So
23:54let's go ahead and do exactly that. So let
23:56's paste CityBike back in. Let's put the
23:59comment that we
24:00had in there again, Snowflake trial guide
24:04and hit finish. And there we go, we've
24:08created a database
24:10and we're pretty much good to go. Okay,
24:12navigate to the database tab, click create
24:13name, the
24:14database is CityBike, then click finish. So
24:16that's essentially the step that we've done
24:18.
24:18Now navigate to the worksheets tab, you
24:20should see the worksheet with all the SQL
24:22were previously
24:23loaded. So in the previous step what we
24:25needed to do is load the SQL. So if I go
24:27over here to my
24:28worksheets tab you'll see that I already
24:32have the script. So where's it gone? Where
24:35has it gone?
24:37It's, oh there we go, it's sort of bugged
24:38out there. You see I was already on test
24:40script but
24:40it didn't quite work. I also can't spell, I
24:43've called this tetscript, so never mind
24:46about that.
24:47So here we have the script. Just to be
24:49clear, like there are different sections of
24:51this, so don't
24:51make the mistake of running everything. You
24:53've got to be really careful, okay. So there
24:55's different
24:56modules that go through different steps.
24:58There's also this issue where these modules
25:00are sort of
25:01out of sync. So although we're doing module
25:03four here in the getting started guide, it
25:05actually
25:05says module three. So just look and make
25:07sure you're running the right section. Don
25:09't sort of
25:10run ahead and try run this stuff because it
25:12won't line up. So the guide will tell you
25:14what you need
25:14to be running when and you can kind of see
25:17this going on here, okay. We need to set
25:19the context
25:19appropriately within the worksheet. In the
25:21top right, click on the drop down arrow
25:23next to the
25:23context section to show the worksheet
25:25context menu. Here we control what elements
25:28the user can see
25:29and run from each worksheet. We're using
25:31the UI here to set the context later in the
25:34lab. We'll
25:34accomplish this via SQL commands within the
25:37worksheet. So like I said to you, you can
25:39do this
25:40using the visual interface or you can do it
25:43using the SQL commands, okay. Select the
25:45following
25:46context setting roles sysadmin, warehouse
25:49compute, excel, database, city bike, schema
25:52equals public.
25:52I'm not going to choose excel because I'm
25:54now paying for this myself. So let's go
25:56over here
25:57and what it's talking about is this context
25:59menu here on the top right hand side, okay.
26:02So it's
26:02basically making sure that look when I run
26:04this query, am I doing it with the right
26:06set of context?
26:07Number one, am I in the right sort of
26:09security level? Number two, which warehouse
26:11am I using?
26:12And number three, what database am I doing
26:14it to? And then lastly, what schema is this
26:16being applied
26:17to? Okay, so let's just go back sysadmin
26:20warehouse compute, warehouse.excel. I think
26:23I'll just keep
26:24this as is. I'm not going to change that. I
26:27know it says to do something else. The got
26:30cha here is
26:30that this is a really large data set. So I
26:32think the reason they want you to use an
26:34excel is because
26:35it's a large data set. So I'm probably,
26:38what I will do is I will use this one. I'll
26:42use a large.
26:42I won't use excel, I'll just use large and
26:44I will hit resume. And what that will do is
26:47we'll just
26:48switch on that instance and the one that I
26:50was using will stop sort of working in a
26:52minute's
26:52time, okay. So we've got the analytics
26:54warehouse set up and let's go back to the
26:56guide. And the
26:57next thing we want to do is make sure we're
26:59pointing to the database city bike and the
27:00schema public, okay. So select a database
27:03and let's say city bike and the schema is
27:06public
27:06and that's pretty much it. So we've got all
27:08those settings set up correctly and we're
27:10pretty much
27:11good to go. If I go further down here you
27:13'll see it just walks through those settings
27:15,
27:16okay. And that is pretty much it. You're
27:18just setting the context. You don't need to
27:21do anything
27:22from this point on. Everything you do will
27:24be using this schema. And another thing to
27:26note,
27:26I don't know if you noticed this but you
27:28can see I've got this little green dot.
27:30This means my
27:30analytic warehouse is now online.
27:32Previously there was no dot there, it was
27:34just sort of selected and
27:35so now that it's telling me it's on.
27:38Essentially this is going to be running for
27:40the next minute
27:41or so. I think it's auto suspend on five
27:43minutes or one minute. So it's going to
27:45keep running for
27:46the amount of time that I have it and so as
27:48I start using it it will just sort of spool
27:50up,
27:50okay. Next we'll create a table called
27:53trips that will be used for loading comma
27:55delimited data. We
27:57will use the UI within worksheets tab to
27:59run the DDL that creates the table, okay.
28:03The SQL text
28:03below is what we previously loaded into the
28:05worksheet and this is what we've got. And
28:07essentially they want us to run the query
28:09by placing your cursor anywhere in the
28:11command line
28:12and clicking the blue run button at the top
28:14of the page or use the keyboard shortcut
28:16control command enter, okay. So basically
28:19what they want us to do is to run this sort
28:21of SQL,
28:22okay. And the interesting thing here is
28:26they haven't sort of elected run all
28:28queries, okay.
28:29So they've got this warning here, never
28:32check all queries box at the top of the
28:34worksheet of this
28:35lab. We want to run SQL queries one at a
28:38time in a specific order not all at once.
28:41So that is
28:41something to be very aware of. I think some
28:43people might tick that and what it will do
28:45is look at
28:45everything and run it all at once. They don
28:47't want you to do that they just want you to
28:49run it this
28:50once. Okay so let's go ahead and run this
28:51query. Let's go back here. So basically
28:53they're saying to
28:54place the cursor anywhere in the command so
28:56I'm just going to place it here at the
28:57beginning.
28:58That sort of makes sense to me, okay. And
29:00you could also just select it like this and
29:02then
29:03what that will do if you hit run it will
29:04only run what you've selected. So let's hit
29:07run and you'll
29:07see do you want to run the following
29:09queries and you'll see just basically
29:11running what I highlighted.
29:13The other thing I can do is I could just
29:14put my cursor in there hit run and you'll
29:17see that it
29:18just did it, okay. So table trip
29:19successfully created. So it just ran that.
29:22Now if I want to
29:23know you know what happened with that run I
29:26can go to data preview. I can't see
29:28anything there
29:29because there's no data we've just created
29:31the table. I can go to the query id and you
29:33can see
29:33here that it gives me the id of the query
29:36that was run to go and run that and if I
29:38actually click on
29:39that it actually takes me to the query and
29:41you can see this is the query that I ran.
29:43So this is sort
29:44of within the history tab so if you see
29:46here we've changed perspective we've gone
29:49from where were we
29:51we were over here in the worksheets tab I
29:53've got an issue with our mouse at the
29:54moment and what
29:55we've done is we've essentially gone to the
29:57history tab and then having gone to the
29:59history tab we've
30:00gone to analyze a specific query this query
30:03and this query ran this sql. So that's
30:07basically sort
30:07of the anatomy of how this is all working
30:10and it's sort of guiding you through this.
30:12Now because I'm
30:13using this pwa I've actually got the back
30:15button back here so I'm just going to click
30:17that to take
30:18me back to my worksheet and you can see I'm
30:20exactly where I left off. So we've run that
30:22query
30:22everything is good and you know once that
30:25query is run you'll see that a little
30:27success notice you'll
30:28see that it's giving you lots of warnings
30:30here but you'll see this table trip
30:31successfully created
30:32and that's what we saw just down here at
30:35the bottom you can see that very very
30:37clearly. Okay
30:37so that's now created essentially what we
30:40've done is we've created a database and
30:42there was nothing
30:43in the database so what we've done is we've
30:46gone in and created the columns essentially
30:48that is
30:49going to hold our data so you can see here
30:51that we've got create the table we've got a
30:53trip
30:53duration which is an integer start time
30:56which is a time stamp stop time is also a
30:58time stamp station
30:59id is an integer and so on and so forth we
31:01're literally describing each of the columns
31:03and the
31:03data types that go with it and at the end
31:05of that we're done and that's pretty much
31:07it we've just
31:08created the scaffolding the frame for our
31:10data but we need to go in a little bit more
31:13and carry on.
31:14Okay so now that we've done that we're
31:16pretty much good to go what we can do is we
31:18can actually
31:19go and see the table itself so that's what
31:21exactly it's asking us to do and it's
31:23asking us to go into
31:24the table and check that out so we're going
31:26to do that now we're going to go to the
31:27databases
31:28section here and what we're going to do is
31:31going to go to city bike and then once we
31:33're there we'll
31:34see we just have the one table that we
31:35created which is called trips let's click
31:37on that again
31:38and you can see here that these are the
31:40columns and now you can see even the data
31:42types okay so
31:43you can see the types here that were
31:45written next to that sql so it sort of
31:46makes a lot of sense
31:48it's a very logical thing if you were
31:49working in excel you do this exact same
31:51thing like before
31:52you paste data you make sure that it's all
31:54got the structure everything's sort of
31:56organized
31:57the only difference is this is actually
31:58powerful and this is how you should be
31:59doing it not in excel
32:00so um that's pretty much that and now the
32:03next thing is to create an external stage
32:06so we've
32:06basically created the framework and we need
32:08to create the stage let's uh read through
32:10this a
32:11little bit more we're working with
32:12structured comma delimited data that has
32:14already been staged
32:15in public um has already been staged in a
32:18public external s3 bucket before we can use
32:21this data we
32:21first need to create a stage that specifies
32:24the location of our external bucket so what
32:26we're
32:26basically saying is listen we have some
32:28data somewhere and we need to tell snow
32:30flake where
32:31that place is and what we wanted to do is
32:33to use that place like a holding place
32:35essentially so
32:36we're basically saying yo snowflake we're
32:38going to give you some data but before we
32:40give it directly
32:41to you we're going to put it over here
32:43first yeah and that's where you need to
32:44look every time data
32:45is going to come in so that's essentially
32:47what's going on right so we're going to go
32:49to databases
32:50we're going to go to uh stages then we're
32:52going to go to create and then it's going
32:54to ask us
32:55where the data is going to come from so let
32:57's go ahead and do that so let's go to
32:58databases
32:59and once we've gone to databases let's
33:02select trips and once we're here you can go
33:05to stages
33:06okay and you can go ahead and create a
33:08stage okay and so now that we're here we
33:10get this little
33:11uh sort of option to say hey where's your
33:13data going to be coming from now the reason
33:16we get
33:16these options i've got a snowflake manage
33:19setup we've got an amazon one an azure one
33:22and a google
33:22cloud platform these are essentially the
33:24four places that snowflake lets you run
33:26snowflake
33:27essentially so i chose to have my snowflake
33:30instance running on amazon s3 based in ire
33:34land
33:34so that's what this option is and if i was
33:37transporting data to snowflake it would
33:39make
33:39the most sense for me to be doing it on am
33:42azon on an s3 bucket also in ireland because
33:45then
33:45what happens there is that the the data has
33:48sort of the most uh sort of fluid
33:51capability to get
33:52into snowflake it sometimes doesn't even
33:54have to traverse the internet to get there
33:56because
33:56if i do it in the same region it might just
33:59use the amazon backbone network to actually
34:01get itself
34:02into snowflake so in this case i'm just
34:03going to go over to the guide and see what
34:05they suggest
34:06they've also suggested the s3 i wonder if i
34:08wonder if that's because i'm on an amazon
34:10setup and maybe
34:11this guide knows that so maybe that's sort
34:14of what's going on so yeah we've used am
34:16azon s3 and
34:17now if we go back so on the create stage
34:19box enter the following settings then click
34:22finish okay so
34:22name city bike trips schema public and the
34:25url i'm actually going to copy this because
34:27i don't want
34:28to get this wrong so let's go ahead and do
34:30this so let's click next so we've got our
34:33amazon s3 and
34:35let's let's just make sure um this clicks i
34:38think we've got a bug here where i can't i
34:42can't select
34:43it easily so uh actually what i do need to
34:45do is get make sure i get this absolutely
34:47right so
34:48city bike trips uh we can just copy and
34:50paste that that's all we've had to do so
34:52far and public is
34:54correct now we need to get the s3 uh
34:57location just over here and so what we're
35:00basically doing is
35:01telling snowflake where the data is going
35:03to be coming from now if you've got any
35:05sort of specific
35:06security settings with your aws setups you
35:09're going to need the aws key uh the secret
35:12key and
35:13if you've got any encryption on there you
35:15're going to need the master key as well so
35:16this these three
35:17settings are more related to aws it makes
35:20sense to use this if you've got the option
35:21but because
35:22snowflake is doing a demo this is a public
35:25player sort of set of data set so anyone
35:27can go into the
35:28s3 bucket and essentially grab it um so
35:30that's pretty much it i think that's
35:31everything we don't
35:32need to do anything else uh we can go ahead
35:34and finish and so that's what we're going
35:36to do we're
35:37just going to click on finish i've got this
35:39bug again it won't let me hit the cursor oh
35:42no oh this
35:43is funny let's see let's see if i can get
35:46it to work uh come on there we go there we
35:49go i have to
35:50go on a previous one and then that kind of
35:52works so if we hit finish you'll see that
35:54there's been
35:54created so now it knows where the data is
35:56this is essentially called the staging area
35:59so that's where
35:59it's going to be picking data up from okay
36:02and now uh let's take a look at this city
36:04bike trick stage
36:05navigate to the worksheets tab then execute
36:07the following statement so this is
36:10basically where we
36:10are we've seen the city bike trips okay now
36:13it's telling us to go back to the workshe
36:15ets and we're
36:16going to basically execute this uh code
36:19here so let's see if i can just um make
36:21this larger can i
36:22make this larger i can i made it a little
36:24bit larger this is essentially the sql we
36:26would have
36:27run if we were trying to create that um
36:28ourselves but we did it through the
36:30interface rather than
36:31using a sql if we go back and we just look
36:34at this the actual sql we need to run is to
36:36list the city
36:37bike trip so basically this is going to
36:39list all the files inside of our staging so
36:41let's go ahead
36:42highlight that and hit run and you'll see
36:45it asks us do you sure you want to run the
36:47sql and i'll
36:48click run i won't click don't ask me again
36:50because i want that to come up every time
36:52for this demo
36:53and now you can see all the different files
36:55and how big they are inside of that s3
36:58bucket so it
36:59looks like there's a a different file and
37:01if we just sort of expand this out it looks
37:04like we get
37:05uh a file for each day so you've got the
37:07second of april 2018 and so on and so forth
37:10so file for
37:11each day um it has a size has an md5 checks
37:14um to make sure the file is actually the
37:17file
37:17that it should be it's a security kind of
37:19thing to make sure files are what they say
37:21they are
37:22um but there's essentially 376 files so
37:24there's almost a year's worth of data in
37:27there to to look
37:28at okay so now that we've seen that we've
37:30seen the preview and seen what's there we
37:32need to go
37:32ahead and create a file format okay so
37:34essentially before we can load data into
37:36snowflake it's saying
37:37here we have to create a file format that
37:39matches the data structure so you're
37:41basically telling
37:41snowflake how the data looks like right so
37:44from the databases tab click on the city
37:47bike database
37:48hyperlink so let's go ahead and click on
37:50that through databases and then once we are
37:53there
37:54let's hit city bike uh trips and they
37:57select file formats
38:00uh now we're here we have no file format so
38:03we're gonna head go ahead and create one
38:06now
38:06if we just look at the settings here the c
38:09sv is public the format is capital s v uh
38:13compressed
38:13method is automatic just let it figure that
38:16out column separators a comma row separ
38:19ators a new
38:20line header lines to skip a zero and
38:21essentially if your data didn't start on
38:24the first row you
38:25might sort of tell it to skip a few lines a
38:27field optional optionally enclosed by
38:30double quote and
38:31a null string and all these other settings
38:33so these are very similar settings to any
38:35sort of
38:36tool that's going to be passing a csv file
38:38even altrix has something similar like this
38:40but on the left hand side so let's go ahead
38:43and create a new file format and we're just
38:45going to
38:45call this i think it called it csv as
38:48normal so let's just go csv you can see it
38:51's pretty much
38:52filled it out for us if i just go back here
38:54and look at this that's sort of very
38:56similar um i'm
38:57going to delete this option for null string
38:59i don't want it to do anything that's not
39:00in the
39:00guide because i guarantee you that will
39:03sort of cause us to trip up to csv public c
39:05sv auto common
39:06new line csv public csv also common new
39:09line zero none and nothing is ticked there
39:12so we're pretty
39:13much good to go notice the trim space
39:15before and after is not ticked as well if
39:17you had
39:18a you know a bunch of field which might
39:19have had leading on trailing spaces you
39:21might switch that
39:22on because you know that's an attribute of
39:24how your data is processed so let's go
39:26ahead and hit
39:27finish and we're nearly there we're nearly
39:29there and i'm sure the next step will be to
39:31load the
39:32data finally and so let's go ahead to the
39:35next step and let's let's find out how to
39:37load the data so
39:39so far let's just recap what we've done so
39:40far let's make sure everyone's on board
39:42okay so if
39:44you go back to the database what we did is
39:45we created the database as a starter we
39:47called it
39:48city bike once we created the city bike
39:50database we actually needed to create some
39:52tables essentially
39:53we need to put our data somewhere so if i
39:56go back to city bike and just look at this
39:59we actually
40:00created a table called trips and trips is
40:02basically has the setup for our data when
40:05we did that we
40:06created specific field types if we go in
40:08and select trips we'll actually see each
40:10individual
40:11field we created and the different data
40:14types that we created in order to do that
40:16okay now that we've
40:17done that we're ready to load our data but
40:19we need to tell snowflake where the data is
40:21so over here in
40:22stages we can actually tell hey snowflake
40:25we have some data in amazon s3 that is
40:28essentially what
40:29we'd like you to look at okay so once we've
40:31set that up and we've given it all the
40:32security
40:33credentials for aws the next thing we need
40:35to do is tell it what file format that data
40:38is going to
40:38be coming in as so once we go back to file
40:41formats um it's basically going to use this
40:44as a way of
40:44passing that data in the stage okay so uh
40:46when we created this file we gave some
40:48settings for the
40:50csv file itself but now we've given it the
40:52sort of fundamental understanding of what
40:54to do with the
40:55data we've told it where it is we've told
40:57it what it looks like we've told it the
40:59file format and
41:00we've created the structure that it's going
41:01to load it into so essentially that's all
41:03our bases
41:04covered we're now ready to load the data so
41:07let's go ahead and do exactly that so for
41:09this one we're
41:10going to go to the warehouses and we're
41:12going to go to the configure option okay
41:14now for this one
41:15i think what they're trying to do is they
41:17're trying to change the size and for this
41:18they're
41:19setting in the extra large um because i
41:21think this is a this is a really big data
41:24set so i think what
41:25they're trying to do is make sure that it
41:27sort of performs in a reasonable sense okay
41:30so if we just
41:30sort of read a little bit of this size drop
41:32down is where the capacity of the warehouse
41:34is selected
41:35i think i've already touched on this as an
41:37example the four xl option allocates 128
41:40nodes the sizing
41:41can be changed up or down with a simple
41:43click and that's absolutely correct if you
41:45have a snowflake
41:46enterprise additional higher you'll see the
41:48maximum cluster selection this is where you
41:50can set up a single warehouse to be a multi
41:52cluster up to 10 clusters as an example if
41:55a
41:554xl warehouse was assigned a maximum
41:58cluster size of 10 it could scale up to
42:021280 nodes
42:03that is crazy like unless you've got
42:06serious data and i'm i'm talking serious
42:09data like weather data
42:12for the whole world streaming in real time
42:14you're gonna like super computers need that
42:17sort of level
42:18of compute and to be honest with you if you
42:20if you decide to do let's say a really data
42:23intensive task
42:24let's say you decide to load up all the
42:27transactions of an entire organization you
42:29need it to be done
42:30overnight that is probably a good use case
42:32for that as well where you know you've got
42:34this one
42:34task you've basically tried it already on a
42:37much smaller set of data but this time
42:39around you need
42:40it done overnight you need it done quickly
42:42you might then go and sign up for 128 nodes
42:46in a 10
42:46cluster or something like that so it's just
42:50crazy i i i'd love to know the people out
42:53there that
42:54have to set these up because if i that was
42:56me having to set that up i'd be bricking it
42:58so um the final selection shows you uh
43:00basically how to automatically suspend i'd
43:02already covered
43:03all of this so you don't need to worry
43:04about that i sort of explained it earlier
43:06on but by all means
43:07go through this yourself this is the bit we
43:10're interested in loading the data okay so
43:13now
43:13essentially what we've done is we've set
43:15everything up we've set up the structure
43:17and it wants us to make sure that we're
43:19doing things in the right way okay so roles
43:21ys admin
43:21warehouse compute whatever it's basically
43:23asking us to make sure that we've got the
43:25right settings
43:26i'd already changed mine so i'm not going
43:28to change it again and it's asking us to
43:30execute
43:30the following statement in the worksheet to
43:32load the stage data into the table this may
43:34take up to
43:3530 seconds okay so let's go ahead and go
43:37into the worksheet and i'm just going to
43:39sort of bring this
43:40down i'm going to make my window larger so
43:44it's easy to see and um if i just go down
43:47uh you can
43:48just see here that here we are create a
43:50replace file format csv type this is what
43:53we did before
43:54okay so this is another way of doing it in
43:56sql and we're now on module four which is
43:59technically
43:59number five in the guide copy into trips
44:02from city bike uh trips file format csv
44:06okay so this
44:07is essentially the code they want us to run
44:09let's just double check that copy into
44:11trips from city
44:12bikes file format called csv so essentially
44:15we're basically saying um copy data into
44:18the trips table
44:20from this city bike trips which is the
44:23staging area the file format is a csv so
44:26this is all
44:26the different components coming together to
44:28give us this one instruction so let's
44:30highlight that
44:31and it run it will ask us do you sure you
44:34want to run that i'll click yes hit run and
44:36this can take
44:37a while because of course it's going into
44:39the database and it's working and of course
44:41there is
44:41an issue so there is um this particular
44:44file is not recognized file city bike trips
44:47and the csv.gv
44:49line one character one um is not an active
44:51file so the issue here is the fact that it
44:53's compressed
44:54i'm pretty sure that's the issue so let's
44:57go back this might be deliberate it's not
45:00deliberate so
45:00let's just double check this i'm wondering
45:03if i if i missed a step in earlier steps so
45:06let's go
45:06let's go back to databases i think i might
45:09have missed a small step and let's just go
45:13to file
45:14formats and let's see can i edit this i can
45:18and uh let's just go in here and let's
45:22double check
45:23that you there's no option here to tell it
45:25whether it's compressed or not so
45:27compression method was
45:29automatic so um interesting it's not sort
45:32of recognizing the compression format so i
45:36'm going
45:37to set this to gzip just to see if this
45:39gets around the problem so let's just hit
45:42finish
45:42and let's go back to our worksheet and then
45:45let's try and run this because you see the
45:48issue here is
45:49it's looking for csv but it sort of didn't
45:52recognize this gz which is the gzip sort of
45:55file format so let's go ahead and run this
45:58again i'm gonna try and hit that again hit
46:02run and let's
46:03give it a bit of time and we still got the
46:05same error so now i need to think about
46:06this and figure
46:07out how to debug this a few minutes later
46:09okay so what i'm gonna do i just dropped
46:12the table i just
46:13i just dropped the table and i let that go
46:15what i'm gonna do is i'm actually gonna go
46:17to the
46:17worksheet i don't trust that file format
46:19step that we went through previously for
46:21some bizarre reason
46:22i don't think it's working quite right so
46:24what i'm going to do is i'm going to use
46:26the sql script
46:27that they provided us here before i then
46:29run the next thing so i'm just going to hit
46:31run on this
46:32step you'll see that the sql comes in here
46:34hit run the file format was successfully
46:37created now if i
46:38go back to the databases i should see the
46:40file format is there it's literally just
46:43been created
46:43at 10 four minutes past 10 go back into my
46:47worksheet and now we should we should be
46:50able
46:50to load this in so let's go ahead and hit
46:53run and figure out what's going on
46:56okay it's running this looks like it's been
47:02successful so the previous step there was
47:07something weird about the settings that is
47:09sort of incorrect in the guide that we were
47:11following
47:12it was actually correct here in the sql so
47:15my hunch is some of the options that we saw
47:18at the
47:18very very bottom aren't actually updated or
47:20included in the guide over here so there's
47:23there's a bunch of options you can't see
47:25off screen here because when you go to
47:27create the
47:28databases option um i think if i go here to
47:32create when this sort of pop-up comes up
47:36you see here
47:37there's the bottom there's a bunch of other
47:39options here and i think we couldn't see
47:41the
47:41settings at the bottom here and in the
47:43guide i'm not sure they're specific about
47:46that i think maybe
47:47these are the settings maybe i missed this
47:49to be honest who knows but long story short
47:51we've managed
47:51to fix it and what i did was i dropped the
47:54table and what i did was i then just went
47:56to the worksheet
47:57which is just over here and i ran the
47:59script instead instead of doing it through
48:02the interface
48:02i ran this which they provide you which is
48:04very good at them because my god i would
48:06have been
48:06pulling my hair out trying to figure that
48:09out for quite a few longer um and yeah it's
48:11just these are
48:12just one of the things where you need to
48:13spend more and more time with the database
48:15to get familiar
48:16as i said you know i'm learning so that's
48:18that's potentially something to sort of
48:20look into um the
48:22nice thing about writing the script is the
48:24fact that you actually see everything sort
48:26of plain
48:26and simple right there and you also sort of
48:28build a good habit of making sure you've
48:30covered your
48:30bases and by you know triple checking it
48:33visually the things with interfaces is that
48:35sometimes you
48:36can miss option you can accidentally select
48:38the wrong option when you've got to write
48:40it out and
48:40you've got to make sure you check it before
48:42you run it it's just going to be right
48:43every single
48:44time okay so that's worked out um we've
48:47basically brought in um the data we we
48:50basically did this
48:52step and then we went ahead and run this
48:54step so if i go ahead and look at the query
48:57id and we can
48:58actually go and see that we can actually go
49:00see it two ways so you can see here that it
49:02copied the
49:02trips and below here you can see it's
49:04loaded all the data you can just see that
49:06sort of streaming
49:07in i think it was something like 300 and
49:09something files i won't bother with all of
49:11that let me just
49:12go back one and what i will do is we'll go
49:15to the next step so basically um once we've
49:19loaded all
49:20the data let's go back into loading data we
49:22've run all the data and we can see all the
49:24history
49:25we've done all of this as well um in the
49:27history window copy into the trips from
49:30city bikes sql
49:31query yep we've looked at that already
49:33truncate table trip so this one is an
49:35interesting one
49:36go back to the worksheet to clear the table
49:38of all data and metadata by using the trunc
49:40ate table
49:41command okay so um that's already written
49:44out for us here so what does this do let's
49:46go let's go
49:47ahead and find out so let's just hit run
49:49statement executed successfully you can see
49:52that was the
49:53sql that was run okay and um and then what
49:56it's going to get us to do oh okay it's
50:00going to get
50:01us to load the data in again and this time
50:03it's basically testing the different size
50:05databases
50:06so when we did that we dropped all the base
50:08of the data and then here we are running
50:10that sort
50:11of query again but what they want us to do
50:13is to change to let's say a different
50:15warehouse so at
50:16the moment i'm going to enlarge let's
50:18change back to small this will take
50:20considerably longer uh not
50:21considerably should take longer but not
50:23considerably longer so let's go ahead and
50:26run this again hit
50:27run and click run we don't need to set
50:29anything up or anything because that's
50:31already set up
50:32it's just doing the same query but this
50:34time it's going to be slower because i'm
50:36using a slower
50:36date warehouse previously we were using a
50:39large so it was actually reasonably fast i
50:42think it
50:42happened in the space of like you know a
50:44few seconds um if i actually go back to the
50:46history
50:47we can leave that running go back to the
50:50history and check how long uh the previous
50:53one took so
50:54let's just go and check all users here i
50:58need to make sure it's any users and we can
51:01see here that
51:03this is the extra small one uh but where's
51:05the large one that we ran before um oh it
51:09didn't
51:09didn't i didn't maybe i didn't run it oh
51:11this is still the five tran one so let's
51:14let's go back to
51:14my own profile i think that's the one to to
51:17use here and we should see extra small
51:20extra small
51:21medium uh large there we go so the large
51:24one took 12 and a half seconds and we
51:27loaded 534 megabytes
51:30scanned um this one is taking a little
51:33longer 35.9 seconds it has actually
51:36finished um so that's
51:37that's good it took a bit longer not too
51:39much longer and if i go back into my works
51:41heets
51:42uh we'll see that that should should all be
51:45done so there you go that's all the data
51:48loaded 36.44
51:50seconds and yeah that's pretty much our
51:52data loaded so now we've got our data
51:55inside of our
51:56table um which has been sort of a little
51:58bit of a long journey um create a new
52:00warehouse for data
52:01analytics so this is essentially what you
52:03then might do if you want to do some
52:05analytical work
52:06you create another warehouse so that
52:08essentially any sort of transactional
52:10activity is not
52:11interrupting uh so it's not being
52:13interrupted by any other analytical
52:15activity so analytics can
52:17sometimes use up database performance a lot
52:19so by creating a separate warehouse you can
52:21sort of
52:22separate that out a little bit and create
52:24some separation between the two things okay
52:26so that's
52:27pretty much it i'm going to stop the video
52:28here because i think we've been going for
52:30nearly an hour
52:31but i thought it was just important to just
52:32to go through that and just talk through it
52:34about
52:34through it as much as i possibly could i'm
52:37starting to lose my words and slurry now it
52:39's been
52:39there it's been a long video as well so
52:41hopefully hopefully you found that useful
52:43you listen you
52:44could just go through this yourself
52:46probably a lot faster on the snowflake
52:48website but i just
52:49thought it's good to give a commentary and
52:51sort of for me to talk you through it it's
52:53a nice way for
52:53me to reinforce the knowledge as well so
52:55hopefully we're helping each other out um i
52:57'd love to know
52:58what you thought um if there's any
52:59questions you have about snowflake anything
53:01you're not sure of
53:02challenge me in the comments below i'd love
53:03to sort of get those questions and try and
53:05answer
53:06them and try and figure out how how it all
53:08works okay i'll catch you in the next one
53:10we'll be
53:11finishing some of this other stuff loading
53:13the data took quite some time so maybe in
53:14the next one
53:16we're going to be using some of the
53:18analytical queries and sort of trying to
53:20really get get
53:21hands on with the database so i think these
53:23will stay about an hour each so i'm sort of
53:25envisaging that we've maybe got another um
53:28you know two videos before we get to the
53:31end so look
53:31out for that in a week's time and for now
53:33thank you very much for watching and we'll
53:35catch you
53:36in the next one take it easy
Future-proof your career https://n1d.io
| My Courses on Linkedin Learning: https://www.linkedin.com/learning/instructors/tim-ngwena Following on from the last video, we get stuck into the Snowflake guide on getting started with Snowflake.
Find the guide here: https://j.mp/3nYHYPo
Intro 0:00
The overview of the interface 2:52
The snowflake web UI toolbar & Databases 4:29
Snowflake Shares 7:37
Snowflake data marketplace 8:32
Snowflake warehouses 9:42
Snowflake worksheets 15:22
Snowflake History 16:09
Snowflake admin tab 18:55
Preparing to load data into Snowflake 21:06
Setting up the database in Snowflake 22:37
Set Context for the worksheet in Snowflake 25:24
Run a query in the Snowflake web interface 27:49
Viewing a table and creating a stage in Snowflake 31:28
List files in an S3 bucket in Snowflake 36:29
Create a file format in Snowflake 38:02
Loading Data and debugging the file format 41:10
Main Timeline 52:25 Share feedback and Suggestions: https://tableautim.canny.io/suggestions -
Join this channel to get access to perks:
https://www.youtube.com/channel/UC7HYxRWmaNlJux-X7rNLZyw/join ----------
(C) 2023 TN-Media LTD. No re-use, unauthorized use, or redistribution, of this video without prior permission.