Getting started with Snowflake PT-3: Running queries and using JSON
Re-run the exact same query and it comes back in 65 milliseconds for free, because Snowflake's cache already has the answer.
- Snowflake's result cache holds query results for 24 hours across all warehouses, so re-running an identical query returns in milliseconds and uses no compute credit
- Zero-copy cloning takes a snapshot via metadata and pointers rather than copying data, letting you spin up an independent, writable dev environment without doubling storage
- The variant column type lets Snowflake ingest semi-structured data like JSON, Parquet or Avro without predefining a schema
- You can query nested JSON fields directly using SQL dot notation (e.g. v:city.coord.lat) as if each field were a relational column
- The query profile view breaks down where time and cost were spent, such as a table scan consuming most of the work
- Recap of loading data0:39
- Using Snowflake as a database1:36
- Running hourly statistics query5:43
- Profiling a query9:39
- The result cache in action11:24
- Busiest months query15:18
- Zero-copy cloning a table16:35
- Working with semi-structured JSON21:54
- Creating the variant table and stage24:41
- Loading and previewing JSON30:49
- Creating views and dot notation34:34
- Joining weather to trips data40:56
0:00Hey, it's Tim here. We're back doing more
0:01Snowflake. We're going to be pushing on,
0:03this is the third video in a series where I
0:04'm essentially just going through the
0:06getting started
0:07guide that Snowflake have on their website.
0:10Essentially, we finished off at the fifth
0:13step
0:14last time loading data. So we essentially
0:16managed to load data from S3 into our Snow
0:18flake environment.
0:19And so today we're going to be pushing on,
0:21we're probably going to be looking at some
0:22more of the
0:23analytical queries and also trying to look
0:25at some of the account administration
0:27aspects.
0:27So without further ado, let's get stuck in.
0:29This is a long video. If you look at the
0:31timestamps
0:31below, you'll sort of get a gauge of what's
0:33coming up and what you might be interested
0:35in.
0:35So yeah, let's get stuck in. Let's not
0:37waste any time and let's start doing this.
0:39Okay, so last time we finished this step of
0:41loading data, you can see here,
0:43we basically went through this entire step
0:45and we got pretty much the end. We had a
0:47couple of issues.
0:48Most notably, specifically when it came to
0:51making sure that we had the right column
0:54types and
0:55making sure that the CSV files that were
0:57loading from S3 were actually being
0:59ingested properly.
1:00We managed to get around that using the SQL
1:02that was actually in the worksheet. So this
1:05worksheet
1:06is available to you when you start off the
1:08demo. There's a little bug here where you
1:10can see that
1:10I've loaded it up, but it's not actually
1:12sort of loading it. And now when we go out
1:14and come back
1:14in, it works. So that's something to be
1:16aware of. So basically, yes, you can see
1:18here that the last
1:19thing we did was we just essentially copied
1:22everything from, well, we copied into trips
1:25from
1:25CityBikeTrips, if that makes sense. Okay,
1:28so we can see here the file format was CSV.
1:30So now that we've done that, the next thing
1:33to do is to carry on to Module 5, which is
1:35actually
1:35Module 6. And at this point in this demo,
1:38all they're trying to do is get you to use
1:40Snowflake
1:41as a database. So for those of you who've
1:42used Tableau, this is normally the kind of
1:44stuff you
1:45use Tableau for. You wouldn't be using SQL,
1:47you'd be doing it nicely in Tableau Desktop
1:50. Okay. So
1:51essentially, this is a very simple part of
1:53the process. Now, I've already got Snow
1:55flake running
1:56over here. So we don't need to sort of
1:57worry too much about that. What we will be
1:59doing sometimes
2:00is just simply highlighting the query and
2:02then hitting run just to save time for the
2:04video.
2:05But we will try and sort of decipher some
2:06of the queries and see how they're working.
2:08I'm not a
2:08SQL expert. So I'll just be sort of trying
2:10to give my basic understanding of this. But
2:13as you know,
2:13this is a video where I'm learning and you
2:15're kind of learning with me. So hopefully
2:17that helps. Okay.
2:18So yeah, let's go back to the Step 6. And
2:20let's just sort of figure our way around
2:23this. And one
2:24of the key things to know about this step,
2:25and actually many of the steps going
2:26forward is that
2:27Snowflake is trying to sort of introduce
2:29the philosophy of how they work. Okay.
2:32And so there'll be sort of a lot of things
2:34here that seem simple, but actually, they
2:35're kind of
2:36trying to get you to sort of see the
2:38difference between this and the traditional
2:40relational
2:40database as it were. Okay. So some of these
2:43points might be a bit labored, but I think
2:45they're really
2:45useful ones to sort of go through. So let's
2:47work through this. So let's see this.
2:49Everything
2:49they highlight in yellow is always useful
2:51to read. So I will skip right to that. A
2:53real world roles
2:53and queries. Within a real company,
2:56analytics users would likely have a
2:58different role than
2:59SysAdmin. Essentially, not everyone's an
3:02administrator. To keep the lab simple,
3:04we're going to stay with a SysAdmin role
3:05for this section. Additionally, querying
3:08would typically be
3:09done with a business intelligence product
3:11like Tableau. Hey, hey, hey, Looker, Power
3:13BI, and so
3:14on and so forth. In my view, that's in the
3:17right order. Okay. For more advanced
3:19analytics, data
3:21science tools like DataRobot, DataQ, AWS
3:23SageMaker, or many others could query Snow
3:26flake essentially.
3:27So if you need more sort of hands-on tools,
3:29then you've already got those capabilities
3:31for you.
3:33Any technology that leverages JDBC/ODBC,
3:36these are database connectors. So JDBC, I
3:40think it
3:40just stands for Java Database Connector or
3:42something like that, and ODBC's Open
3:45Database
3:45Connector or something like that. Spark or
3:48Python can run analytics on the data in
3:50Snowflake.
3:51To keep this lab simple, all queries are
3:53being done via Snowflake worksheet. So
3:55essentially,
3:56what they're saying is, listen, in the real
3:57world, you wouldn't use the Snowflake
3:59interface,
4:00and you'd probably use actual tools to do
4:02this. So like, you know, this is just us
4:04showing you how it works. But in the real
4:06world, you'll probably be using something a
4:08bit more
4:08organic, like a tool, because you're likely
4:11trying to do something to your data. Okay,
4:13so that's a good context to know. Like, a
4:15lot of people will sort of hear databases,
4:17and they'll immediately see SQL, and they
4:19'll think that everyone in the world writes
4:21SQL. Actually,
4:22no, you use tools that typically do the SQL
4:24for you. Or you use tools that help compile
4:27the SQL,
4:27but then you run the bits that you want and
4:29bits you don't want. Okay. So execute the
4:33select
4:34statements and result cache. So essentially
4:36, this one is going to be trying to get us
4:38to see the
4:38cache in action. So run the query below to
4:41see a sample of the trip status. So let's
4:43go back.
4:44And you see here that it's actually got
4:46everything set up. Now, it's going to make
4:48sure that we're
4:48using this specific role. So let's just
4:50make sure we're using that role. So role is
4:52sysadmin,
4:54warehouse is analytical warehouse. In my
4:56case, I'm going to stick to compute, then
4:58you've got
4:59use database, city bike, and use schema
5:01public. So everything is pretty much where
5:03we left it off
5:04last time. Okay. So let's go ahead and just
5:06select this line and hit run. And then see
5:08what we get
5:09back, it's going to ask this, I'm going to
5:11keep sort of not ticking this box, because
5:13I think I
5:14want people to see this every single time.
5:15So they're clear that I'm actually running
5:17the query.
5:18So when we go ahead and run that query, it
5:20took a bit of time, because of course,
5:22my snowflake instance was asleep, but the
5:23warehouse wasn't running. Now it's running,
5:25everything is good to go. So we can see
5:27here, we get a sample of the data, and we
5:29're pretty much
5:30good to go. Everything's worked exactly as
5:32expected. Okay. So we get pretty much the
5:35same
5:35data, it might be slightly different given
5:37you know, where, where everything is and
5:39how everything
5:40is done. But essentially, we get the sample
5:43data. First, let's look at some basic
5:45hourly statistics
5:46on city bike usage. Okay, so we've
5:48downloaded some data from S3. Now we want
5:51to start analyzing it
5:52and see how our bikes used. Run the query
5:55below in the worksheet, it will show for
5:57each hour,
5:58the number of trips, average trip duration
6:01and average trip distance. So let's go
6:03ahead and
6:04go back. I think actually my dog is snoring
6:06in the background. One sec.
6:09Alright, I had to wake him up slowly just
6:15to stop snoring.
6:17It's as boring as as my dog thinks it is. I
6:21hope not. If that's the case, let me know
6:24in the
6:24comments below. So let's go back here and
6:27just select a few of these lines. So this
6:30is the step
6:31we're trying to do here. So let's just look
6:33at this query in more detail. Let me just
6:35give this
6:35more space. Let me see if I can drag this
6:37down there and get blue marker down. And if
6:39I just
6:40hit command plus, I think this will zoom in
6:42. There we go. So select basically, it's
6:45basically taking
6:46the start time and it wants us to take the
6:48hour, it's truncating to the hour and just
6:51bring it back
6:52as a date. So basically, select the trip
6:54time, round it up to the hour and just
6:56bring it back
6:56as a date. Essentially, the trip time is
6:59actually like a date time interface. So by
7:03truncating to
7:03the hour, we just return everything within
7:05the hour to have the same start time. And
7:07then the
7:08date, we just bring in as a date
7:09essentially. So that will give the column
7:11name date essentially.
7:12And then count the number of trips,
7:14basically count everything and return it as
7:17number of trips.
7:18And then the average trip duration. So
7:20essentially the trip duration is in seconds
7:22, divided by 60
7:23will give us the minutes. And so if we do
7:25that and then take the average of that,
7:28then essentially
7:30we'll get the average duration in minutes.
7:32So they've done it slightly different here.
7:34They've taken sort of the trip time,
7:35created an average in seconds, then divided
7:38it by 60.
7:39You could do the same thing essentially,
7:40then you get the answer in minutes. It will
7:42be decimal
7:43point minutes, but essentially it'll still
7:45be fine. Then the average start latitude,
7:50the start station longitude, end station.
7:54So what it's basically doing is it's using
7:58a special
7:59function. So have a sign, I think is a
8:02specific spatial function or specific
8:04function that will
8:05allow us, if I just scroll across so we can
8:07see the rest of this query and see if I can
8:09,
8:10I can't really, I can bring this to the
8:11left. So if we just look at this, it's
8:13basically going to
8:14be calculating the distance. So this is
8:16like a big clue of what's going on. And I
8:18think this function
8:19is like a mathematical function that
8:21essentially takes a start point, the finish
8:24point, and takes
8:26those two things and it calculates the
8:27distance. I don't know how it does it
8:29because it's not like
8:30the journey time. It's just basically
8:32drawing a line between the two. I don't
8:34believe it's
8:34doing anything sort of complex, like
8:36plotting the routes. To do that, you'd need
8:39an API to go out to,
8:40let's say something like a mapping database
8:42so you can get something back. So it's just
8:45calculating
8:45the distance as the bird flies essentially,
8:48and it's doing that from trips and a group
8:50by one,
8:50order by one essentially. So just give it
8:53some sense of order. So let's go ahead and
8:55highlight
8:56that and hit run and let's hit run again.
8:58And we'll sort of bring this up and there
9:01you go.
9:02You get basically for each hour, the number
9:05of trips, the average duration, the average
9:07distance.
9:08And because it's aggregating this, we don't
9:10get sort of the locations, just taking look
9:12how many trips are happening in a
9:13particular time and what's the average
9:15distance and average
9:16duration. So we can sort of look at all the
9:18different start times of our journey by
9:20hour
9:21and see, okay, how many trips are being
9:23made and what's the average sort of picture
9:25look like in
9:25this sense. It's a very generalized way of
9:27doing this. Once you start doing more
9:29analytical
9:29questions, you might want to do some more
9:31sort of bespoke queries. So that's that
9:33query done.
9:34Essentially, we just select that, we run it
9:36, and then you're able to just look at it.
9:38Now,
9:38if I go ahead and look at the query ID and
9:40I just go into the query by clicking on it,
9:42you actually go to this sort of view where
9:44it shows you the query again. So you can
9:46see
9:46it's a success. You can see the query ID
9:48itself and you can see the query that run.
9:51One thing you can do is you can actually
9:53profile the query. So if I just zoom back
9:55out,
9:55you can actually sort of see the profile of
9:56the query. Essentially, it's showing you
9:58sort of what was happening. Okay, so if I
10:00just click on this, if I just show you this
10:03little
10:03square marker here, it makes sure that
10:05everything is sort of in the screen. And it
10:07sort of shows you
10:08like, you know, what cost you the most, you
10:10know, what was the most expensive part of
10:12this process.
10:13And so you can see here the table scan was,
10:15you know, took 72% of sort of the total
10:18cost.
10:18I never know how to sort of interpret this
10:20because it's sort of like costs in Snow
10:23flake are based on
10:24credit systems. So it's, it doesn't sort of
10:26give you a monetary value, but it's
10:28essentially just
10:28showing you look, this query, and most of
10:30your effort was spent down here. And then
10:32look, some
10:33of these other things took a lot less time.
10:35So this is our sort of whole query, we go
10:36to 61
10:37million rows, we aggregate it, we sort it,
10:39and then we return the result. And that's
10:42what we see.
10:43Inside of the query window. And you can
10:45actually see some of that sort of
10:48performance over here on
10:50the right hand side, broken down by various
10:51factors. Again, this is probably worth
10:53another
10:54video. So I'll probably do another video on
10:56, you know, breaking down the profile of a
10:58query in
10:58Snowflake or something like that. I keep
11:00taking notes, by the way, on the side,
11:01making sure I take
11:02all these things down, because they're not
11:04things I understand, and it's probably
11:05worth coming back
11:06to them. But if we go back, then we just go
11:08back in this sort of window, and it takes
11:10us back to
11:10our initial query that we ran. And we can
11:12just sort of expand this out. So that's
11:14that's basically
11:15that I just wanted to sort of give a deep
11:16dive into what one query looks like. So you
11:18can
11:19understand to the mechanics of how this is
11:21all working and how it's all coming
11:22together. Okay,
11:23so let's go back to our query. And this is
11:25the query we've looked at, we've got a
11:27result.
11:28And basically, the point here is that Snow
11:30flake has a result cache that holds the
11:32results
11:32of every query executed in the past 24
11:34hours. These are available across
11:36warehouses. So
11:37essentially, it doesn't matter how many
11:39warehouses you have, whether a small
11:41warehouse did the job or
11:42a large warehouse did the job, the cache is
11:45universal to all of those warehouses. Okay,
11:47so query results return to one user
11:49available to any other user on the system
11:52who executes the same
11:53query. So it's got like a global awareness
11:55of queries being asked. So if someone, let
11:58's say,
11:58run a query on a super fast state of
12:00warehouse, sorry, and then I come along
12:02with my small
12:03warehouse and ask the same question, I'll
12:05benefit from the fact that the question's
12:07already been
12:07answered by a more powerful warehouse. So
12:09that's a nice way of saving costs as well.
12:11It's really
12:11trying to optimize the amount of use you
12:14get out of this query sort of hits don't
12:16cost you anything
12:17because there's already an answer there. So
12:20why would I need to ask the same question
12:22again?
12:23And that cache here is executed over 24
12:25hours. I'm not sure if you can change that
12:27period,
12:28it'll probably cost more to keep that cache
12:30available for longer. But it's nice that
12:33any
12:33query asked in the last 24 hours are going
12:36to be available, which means that data is
12:38kept fresh
12:38data is, you know, you don't have anything
12:41that's sort of over over a day. But the
12:43other key thing
12:44is, it's also interesting to understand
12:46whether this is sort of automatically
12:48invalidated once
12:49you load new data into it. So does it have
12:51an awareness of, oh, I've had new data, so
12:53let me
12:53not rely on the query? I'm sure that's
12:55probably the case. Maybe that will get
12:57explained in a bit.
12:58But let's push on. Not only do these
13:00repeated queries return extremely fast, but
13:02they also use
13:03no compute credit. So you don't get charged
13:05for them. Essentially. Let's see the result
13:07class in action by running the exact same
13:09query again. So let's go back. And you'll
13:12see here that
13:12they don't actually give us the query again
13:14. So we're just going to literally do the
13:16same thing,
13:16highlight it, and run it again. And it's
13:18going to ask us again, and we're going to
13:21hit run.
13:21And you'll see that this one returned in 65
13:24milliseconds. Okay. So if we go to the
13:26query ID,
13:27and we go to it, you'll see the query
13:29profile view that we ran earlier on. Now,
13:32it looks completely
13:34different. And you can see that because for
13:36a brief moment, you saw what it looked like
13:38. In
13:38fact, what I'll do is I'll probably take a
13:40screenshot of that and overimpose and fade
13:42in
13:42between the two of them in post. But you
13:44can see this one just use the cache. So it
13:46basically is
13:46a query reuse, and return the answer
13:48straight away. The other way I can show you
13:50that if I go
13:51to the history tab, I've only run two
13:53queries, essentially. So they were
13:56immediately after each
13:57other. And you can see here they are. So
13:59these are basically the exact same query.
14:01So here's the
14:01second one that I ran. And that's the
14:03sequel. And then here's the first one that
14:05ran exactly the
14:06same. And just just to sort of keep this
14:09easy for you to see, let me just highlight
14:11those two
14:11queries. So here are the two queries, you
14:14can see they're both done in the Snowflake
14:16UI. And the key
14:17thing you want to sort of pay attention to
14:19is the total duration. So the first time it
14:21ran, it took
14:222.1 seconds, not like a slow response at
14:24all. But the second time it ran, it was 65
14:27milliseconds
14:28to get the response back. And it didn't
14:30need to scan the data at all either the
14:32byte scan stayed
14:33basically zero and the second time around.
14:35So essentially, the second query was free,
14:38I didn't
14:38end up using any warehouse in order to do
14:40that. Okay, so that's a really sort of nice
14:42thing to
14:42see. It's nice to see that it's always
14:44being optimized to sort of save me money,
14:46which is
14:47sort of nice to make sense. Okay, so let's
14:49go back to where I was before. Let's go
14:52back a couple
14:52of steps. And I'm just heading back in my
14:54browser for the record recently, you can't
14:56see your
14:57browser bars, because I think it's easier
14:58to see what I'm doing without it. So I've
15:00basically turned
15:01this into like a web app, essentially, so
15:03that we can just go through this without
15:04any sort of
15:05distraction. And so that's that. And that's
15:07basically that we've seen the cache working
15:09,
15:09let's go on to the next step. In the
15:11history window, note that the second query
15:13runs
15:13significantly faster because the results
15:15have been cached. I've shown you this
15:16already. And next,
15:18let's run this query to see which months
15:20are the busiest months. So let's go ahead
15:22back into our
15:22little SQL script. And let's have a look at
15:25this SQL here. So select start time, and as
15:29month name.
15:30So I think month name is like a function
15:32that returns just the name of the month and
15:34returns
15:34it as month. Notice that it's not month
15:37year, so it will just say January, February
15:39, March across
15:40multiple years of data. So it will
15:41aggregate all of these up, count the number
15:43of trips in each of
15:44those months and do that from the trips
15:47table group by one order by two descending.
15:50Okay, so
15:52I think essentially group by one, I'm
15:54thinking that's like a like this function
15:56here that just
15:57gives us a row number and then order by
16:00descending just basically returns the in
16:03descending order by
16:05a number of trips. So if I just, let's just
16:07run that query, I'll actually be able to
16:08see what it
16:09does rather than just talking out my ass
16:11one second. Yeah, so it does, it does
16:12essentially
16:13do what I thought it does. So if I just
16:15move my face here to the top, you can see
16:17here that June
16:18is the busiest month, then September, then
16:20October. And yeah, we basically pretty much
16:22get everything
16:23that we need and it's sort of really nice
16:25to see. Okay, perfect. So that's that. So
16:29just another
16:30query. There's no real point here. It's
16:32just basically getting us to run a query,
16:33which is nice.
16:34Cloning a table. Now this is nice, okay.
16:37Snowflake allows you to create clones, also
16:40known as zero
16:41copy clones of tables, schemas and
16:44databases in seconds. A snapshot of a data
16:47present to,
16:48sorry, let me start again. A snapshot of
16:50data present in the source object is taken
16:52from the
16:52clone from when the clone is created and is
16:54made available to the cloned object. In
16:57English,
16:58basically, it takes a snapshot of the data,
17:00but it doesn't actually copy the data. We
17:02'll come to that
17:03in a second. The cloned object is writable
17:05and independent of the client source.
17:07Therefore,
17:07changes made to either source object or the
17:10client object and not included in the other
17:12. I.e.
17:12let's say I want to test something. I take
17:16a snapshot of my real data. It creates a
17:18separate
17:19clone. And what it does is then from that
17:21point onwards, these two clones are
17:23independent. So
17:24anything I do to my client doesn't change
17:26the real data source. Okay. And the next
17:29cool thing is
17:29a popular use case for zero copy cloning is
17:32to clone production environment for use by
17:34development and testing to test and
17:36experiment with that adversely impacting
17:38production
17:38environment and eliminating the need to set
17:40up and manage two separate environments.
17:42So a typical thing in analytics is let's
17:44say you're building an analytical product,
17:46let's say some dashboards. And what you
17:48want to do is you want to use real
17:50production data
17:50in order to do this. But what you don't
17:52want to do is to take down your production
17:54data.
17:54So what you typically have to do is stand
17:56up a whole nother environment called dev or
17:59development
18:00where you basically publish all your data.
18:02And it's normally a copy from production.
18:04So you have
18:04to go through the whole process of moving
18:06all your data across. And that in itself
18:08can actually break
18:09the production database. I've actually seen
18:11this happen a couple of times. And so it's
18:13a lot of
18:13effort and work. And in real terms, what
18:16you actually end up having is two physical
18:18databases,
18:19one for dev and one for prod. And there's
18:21also a downside, which is dev is never as
18:23fast as prod,
18:24because companies like to cheap out. So
18:26what you end up happening is you build an
18:28analytical
18:28product never really knowing what the real
18:30performance is like, because you built it
18:32on dev.
18:32And then when you get to prod, prod has a
18:34whole set of different circumstances that
18:36you have to
18:36cater for. So all the work you did on to
18:38have never actually even mattered. And so
18:40that's me
18:41having a bit of a rant. But anyway, zero
18:43copy cloning is cool. And let's just go
18:45into this a
18:46bit. A massive benefit of zero copy cloning
18:49is that the underlying data is not copied.
18:51Only the metadata and pointers to the
18:54underlying data change. Hence, they are
18:56zero copy. And
18:58storage requirements are not double when
18:59the data is cloned. Most data warehouses
19:01cannot do this.
19:02But for Snowflake, this is easy. So what
19:04they're saying is, you create a clone, but
19:07you don't take
19:07up twice as much storage, because all you
19:09're doing is you're copying the metadata
19:11around it.
19:11So all your query, everything runs on this
19:14other metadata layer, that's essentially
19:16not actually
19:16touching your data. So your data always
19:18trade stays true to form, everything you've
19:21done on
19:21top of it, sort of kept separate to the
19:23data. And the zero copy clone is sort of an
19:26enhanced
19:27version of doing that. And you can do that
19:29to enable other people to start working on
19:31it. Let's
19:32say analytical use cases, they're normally
19:34quite resource intensive. So they tend to
19:36be done on dev
19:37environments or separate analytical
19:39warehouses. And so this is really nice. Now
19:41, the weird thing
19:42here is that Snowflake are confusing the
19:44term warehouses. So don't again confuse
19:46warehouses
19:47and Snowflake with warehouses in typical
19:49databases. So when it talks about most data
19:52warehouses cannot do this, Snowflake means
19:55a traditional database warehouse, not a
19:58Snowflake
19:59warehouse. Okay. So they're going to show
20:01us how to do this. Run the following
20:02command in the
20:03worksheet to create the development table.
20:05So we're just gonna go back to our sheet.
20:07And let's
20:08go run that. So create table trips, dev
20:11clone trips, hit run, hit run again, this
20:14will take a
20:15bit of time. Oh, actually didn't take time
20:17because of course, it exists. It's just
20:18creating another
20:19instance. So table trips, dev successfully
20:22created. Okay. So in order to see that, let
20:25's, let's go
20:25ahead table trips, dev, I think that's in
20:28our city bike. So let's go into public. And
20:30you can see
20:31here that trips dev has been created. So if
20:34I just sort of highlight this for you, you
20:36can see here,
20:37I now have two databases. And they're
20:38essentially the same, I could run one query
20:40on one and another
20:41query on the other. And they're exactly the
20:43same. But now these two are separated. I
20:45can't,
20:46I can't break trips just because I'm
20:48working on trips dev. But the underlying
20:50data is exactly the
20:51same. I didn't need to load all my data
20:53from S3 again, to get this one to work,
20:56right? It's
20:56everything is just sort of been kept nicely
20:58separate. Okay. So that's really cool. I
21:00really
21:01like that. It's nice, sort of nice to have.
21:04And if I just go back to my, actually, don
21:07't do that,
21:07let's go to my worksheets. And then why hit
21:09back there, we're actually on this step.
21:11And we've
21:12already completed that. So that's good.
21:14Essentially, we just wanted to see where
21:16trips dev was. And
21:17we've seen that already. So that's a nice
21:19thing. That's a nice thing to have. But
21:22again, just to
21:22sort of summarize this section, analytical
21:24queries, results, cache cloning, these are
21:26just standard
21:27database capabilities. If you work with a
21:29database, this stuff is bread and butter.
21:32But actually,
21:32some of the benefits of Snowflake is it
21:34breaks that down in a way that's sort of
21:35really, really
21:36cool. And I think it's really interesting.
21:39Okay. Next, number seven. So we're moving
21:41on here,
21:42we could see at the top, it says 46
21:43remaining, I like this sort of time
21:45estimate, I doubt we'll
21:46keep to that because I'm talking whilst
21:47doing this. So if I did that might end up
21:49being like
21:50a three hour video or something. But anyway
21:52, let's keep going. Working with semi
21:53structured data,
21:55views and join. Okay, so going back to the
21:58lab example, the city bikes analytics team
22:00wants to
22:01determine how weather impacts ride counts.
22:03To do this in this section, we will load
22:06data in JSON
22:07format here, held in the public s3 bucket.
22:10I'll come to that in a minute and create a
22:12view and
22:12query the semi structured data using SQL
22:15dot notation, run a query that joins the
22:18JSON data
22:18to the previously loaded trips data,
22:21analyze the weather and ride count data to
22:23determine
22:24their relationship. Okay. So this is really
22:26interesting, because JSON is a slightly
22:28different
22:29data format. Okay, it's not a sort of a
22:31traditional data source, like a CSV.
22:33It's typically more used in web development
22:35. So I think it stands for JavaScript object
22:38notation,
22:38which is just another sort of, I don't know
22:41why they come up with all these names. But
22:43JavaScript
22:44object notation is essentially a way that
22:46you write JavaScript to talk about sort of
22:49information in
22:49the hierarchy. It typically uses
22:52information stored in arrays and different
22:54levels of hierarchy
22:56with arrays in those hierarchies. But I'm
22:58not a web programmer. So I could also just
23:00be, you know,
23:01maligning that description. So if I am let
23:03me know in the description, we'll get stuck
23:05in. Okay. So
23:06this is actually quite a cool step, because
23:08we quite often have to work with data
23:10sources from
23:10the web. And I know in Tableau, you can
23:12connect to JSON files directly. So you kind
23:14of don't have
23:15to worry about that. But what's even better
23:17is actually not having to burden Tableau
23:19with the
23:20sort of processing that it's going to have
23:22to do to process that JSON, and just
23:24connecting to it
23:25directly through Snowflake. And so Snow
23:27flake can do the hard work of reading that
23:29data. And then
23:30you can just natively query columns and
23:32rows exactly like you would in any other
23:35database,
23:35and let Snowflake do all the hard work of
23:38sort of understanding that that data. So it
23:40's
23:40a nice little sort of feature. And I think
23:42it's a really sort of nice, nice thing to
23:44do. So let's
23:45let's figure out how it works. So let's
23:47read this yellow section semi structured
23:50data. Snowflake can
23:51easily load and query semi structured data
23:53such as JSON, Parquet, or Avro without
23:56transformation.
23:57So you don't need to do any transformation
23:59work to get this to work. This is important
24:01because an
24:01increasing amount of business relevant data
24:03being generated today is semi structured,
24:06and many
24:06traditional data warehouses cannot easily
24:08load and query such data. Snowflake makes
24:10it easy. So the
24:12point here is, typically, if you were
24:13having to work with this data, you'd have
24:15to run a step
24:16before you loaded it into your database,
24:18then you can start to work with it. The
24:20problem there is it
24:21adds time, cost and also expertise, because
24:23you've got to have people who know what
24:24they're doing to
24:25write the scripts and the transformations
24:27required to do that. Now, for some people,
24:29that might not
24:29be a big step. But if lots of your data
24:31comes from web data sources, then this is
24:34literally going to
24:35be like a whole team doing this and making
24:38sure this works every time. So to do this
24:41demo, we need
24:41to create a new table. So we'll go back to
24:44our worksheet, and we'll just start sort of
24:46using some
24:47of these. So create a database, we're going
24:49to create some roles to look at this data.
24:52And then
24:52next, let's create a table called JSON
24:54weather. We're basically just going to go
24:56work our way
24:57through this and bring in data from various
24:59places. So this is a fun exercise, it'll
25:01take a
25:01bit of time. But let's get into it. I think
25:04we should be able to get to the end of this
25:06very
25:06easily. So let's go ahead and do this. So
25:08if I go back in here, and we're just into
25:11model six, so
25:12let's go ahead and create a database called
25:14weather. And if we do that and hit run, it
25:17's
25:18actually going to say database weather has
25:20been created, I'm just going to refresh
25:21this left hand
25:22side. And you should see here weather is
25:25turned up. So this one here is just the one
25:27we want.
25:28And what we can do is we can just sort of
25:30expand this and just see that there's
25:32nothing there. So
25:33there's no tables or anything in our public
25:36sort of area in this particular view. So
25:40that's good.
25:41Then we have some roles to create. So use
25:43role sysadmin, which is already where I am,
25:45use warehouse compute warehouse already
25:47where I am, use database weather already
25:49where I am,
25:50and use schema public already where I am.
25:52Now notice, in the previous step, we were
25:55running
25:55everything on city bikes. And so when I
25:58actually created this database, I reckon it
26:01automatically,
26:02I'm not 100% sure, but I think it
26:04automatically took me to that database.
26:06Because if you create
26:08a database, likelihood is the thing you
26:10want to do next is to work in it. So
26:12instead of having you
26:14have to remember each time to switch
26:15databases, it just switched you over, which
26:17is sort of a nice
26:18thing. It just keeps stops you sort of
26:19working. But if you're working in your own
26:21tool, don't
26:22create a database and then start running
26:23the wrong queries on the wrong database.
26:26That's a bit of a
26:27however. Okay. So now once we've created
26:29our database, we can go ahead and create
26:31this table.
26:32So JSON weather data and v variant. Let's
26:34just go back and see what it says about
26:36that.
26:37So here we go. Let's create a table called
26:41JSON weather that will be used for loading
26:43JSON. In
26:44the worksheet, run the SQL text below. Snow
26:46flake has a special column type called
26:48variant, which
26:49will allow us to store the entire JSON
26:51object and eventually query it directly. So
26:54that's why this
26:55v variant thing has been inserted. I don't
26:58know what the initial v is for. Again, if
27:02you know,
27:02let me know in the comments, but variant
27:04makes sense. They've actually told us what
27:06that is. So
27:07essentially if we're working with JSON data
27:09, we need to tell Snowflake that's what it
27:11is. Okay.
27:12Semi-structured data magic. The variant
27:15data type allows Snowflake to ingest semi-
27:17structured
27:17data without having to predefine the schema
27:20. This is very cool. So remember before when
27:23we worked on
27:23CityBikes, we had to sort of convey the
27:25schema because we had to tell it that, look
27:28, this column
27:28is a string. This column is a number. This
27:31column is called this. This column is
27:32called that. JSON
27:34has that sort of hierarchy built in. So
27:35essentially what it's saying is that the
27:37variant sort of
27:38notation allows them to just go in and not
27:40have to specify a schema. The reason this
27:43is is because
27:43when you use JavaScript object notation,
27:46essentially you can have an endless number
27:48of
27:48hierarchies inside of that sort of data
27:50array. So if you had to describe the schema
27:52, you'd always
27:53be limiting yourself because if you add,
27:55like I say, a new feature in a web
27:56application,
27:57you might add a new hierarchy in your JSON
27:59object to reflect the data that comes from
28:02that new
28:02feature. And so if you don't have to do
28:04that, the schema can always be adapting
28:06itself to suit your
28:07needs. And so not having to specify schema
28:10is really, really, really handy. So that
28:13makes a
28:13lot of sense. So let's go back. And I just
28:15wasn't sure what the variant was. So let's
28:17go ahead and
28:18create table and run. So now that's just
28:21done, what we should be able to see if I
28:23just go back into
28:25public and just refresh this again, not in
28:28my views, go to public. You can see here
28:31that tables
28:32is there. So JSON where the data has been
28:34created, and that's what we were working
28:37with. So we're
28:37still in weather. And that's all been
28:40nicely created. Okay. So now that we've
28:43done that,
28:44let's have a look and see what's next. So
28:47verify that the table JSON has been created
28:49, we've seen
28:50it already, we've gone to database says now
28:52we can load data into it. So we've checked
28:55all this
28:55already. So we can now start loading data.
28:57So via the worksheet, create a stage where
29:00the unstructured
29:00data is stored in AWS S3. So again, like
29:04last time, we need to go tell Snowflake
29:06where the data is,
29:08this time round, notice we're doing it
29:10entirely inside of the worksheet we're
29:13doing inside of the
29:14query editor. So this is something you can
29:16do, I could go into the databases, click
29:19create a stage
29:20and everything, but you can actually do it
29:21all through the command line. So you know,
29:23what tends
29:24to happen is the more and more you work
29:25with these stores, the more and more you
29:26just do all your
29:27work in one place. Because having done it
29:29in that one place, you can then just make
29:31sure that
29:32everything is sort of done properly. And
29:34like we saw before, it's actually probably
29:35a better way to
29:36do it, because then you can make sure you
29:37've covered all your bases when it comes to
29:39writing
29:39the sequel you need to write. So let's go
29:42ahead and run this. And let's see what
29:45happens. So
29:45create stage New York weather URL is that.
29:48And that's pretty much it. So hit run.
29:52Stage error New York weather successfully
29:54created, we can go validate that a little
29:56bit by going to
29:57databases, we can go to weather, and then
30:00we go to the stages tab. And we should see
30:04the stage there
30:04New York weather. So that's been created
30:06for us, it's really nice to see that. Let's
30:08go back to the
30:09worksheet. And we're pretty much good to go
30:11. Now when we say list at New York weather,
30:14what we're
30:14basically going to see is what files are
30:16inside of that S3 bucket. So let's go ahead
30:19and hit that.
30:20And what we should see is all the different
30:22sort of weather files that dump that as
30:24JSON files.
30:24So essentially, the weather is being sort
30:27of collected at various intervals. And
30:29every day,
30:30I assume it's just sort of dumping the
30:32weather for that particular day. So it's
30:35also sort of
30:36in different parts. I'm not sure exactly
30:38what weather is in which file, but you can
30:40see here
30:40that it's it's just dumping in there. And
30:43the second thing is it's compressed. It's a
30:45gzip
30:45file again, something to be aware of. Okay.
30:48So once it's there, I don't think we have
30:50to do
30:51anything because you can just see here, the
30:53next step is to copy that JSON weather data
30:55from our New York weather staging area to
30:58the new place, which is JSON weather. So
31:01this is sort of
31:02written backwards. So what we're basically
31:04saying is, copy all our data from New York
31:07weather
31:08into the JSON weather data. And the file
31:11format is type JSON. So it's a bit like Y
31:14oda in
31:16Star Wars, he always sort of talks from the
31:17middle of the sentence to the front and
31:19then to the back,
31:20you know, wise one is and you could just
31:23say you're a wise one. Okay. So this is
31:25sort of like
31:26JSON. This is sort of sequel for you. I don
31:28't know why it's like this. This is
31:30programmers years ago,
31:32who just wanted to make things maybe more
31:34cumbersome. Who knows, maybe there's a good
31:37reason. If you know, let me know. So let's
31:40go ahead and run this query. Copy into JSON
31:42weather data from New York weather file
31:44format equals type J equals JSON, hit run.
31:47And now once
31:48we've done that, you can see I'm fairly
31:50confident here, I know what's going to
31:51happen. So I'm not
31:52even bothering to go back to the guide,
31:54because you know, you can start to sort of
31:55guess what's
31:56going to happen each time. And you can see
31:58that it loaded each and every one of these
32:00pretty well,
32:01we got a loaded status for each one. So
32:03that works, we're pretty happy with that.
32:06And now what
32:07we can do now we've loaded the data, we can
32:09just go ahead and look at the first 10 rows
32:12of data
32:13from our new database. So let's go ahead
32:15and hit run. And when you do that, you'll
32:17see that this
32:18is real JSON data. So we get the city, let
32:20me just bring this up so we can see more of
32:23it. So we only
32:23get 10 rows. But you can see here that if I
32:26just, I just click on one of the lines, it
32:28gives me a
32:28little preview. And so you can see here, we
32:30get the city, the coordinates, the Latin
32:32long. And so
32:33you can sort of see the hierarchy I was
32:35talking about. So there's this object
32:37called city. And
32:38then in that hierarchy, it goes one step,
32:40and each city has a coordinate. And each
32:43coordinate has two
32:44attributes, a latitude and longitude. And
32:47that's all part of the city label. Okay,
32:50then you've got
32:51country. Actually, I'm wrong. Coordinate is
32:55one thing, country is another thing. And
32:59the name is
32:59another thing. The ID is another thing. The
33:01languages, the different ways you can say
33:03it is
33:03another thing. So this is a really sort of
33:06complex hierarchy. And essentially, I think
33:08this is just
33:09the weather in that particular city at that
33:11particular time. I don't I think if I keep
33:13scrolling down, it's actually loading more
33:15data, which is what's going on. So when I
33:17get to row 226,
33:19I think that's actually the bottom of the
33:22data set this time. I don't think there's
33:24other I think
33:25there is more 226 is the last row. So you
33:27can see there's quite a lot of data. And
33:29you've got
33:30different timestamps and different sort of
33:33weather. But I think this is just here's
33:36this
33:36specific time, here's the weather. And this
33:38is the description basically. Okay, so that
33:40's how this
33:41data looks like. And then you've got this
33:43multiple times each and every time. So
33:45essentially, this is
33:46New York weather again and again and again.
33:49And it's got different locations in New
33:51York, it's not
33:52just the one location, you can see here, we
33:54've got different latitudes and longitudes,
33:55they're ever
33:56so slightly different. Maybe they're coming
33:58from set weather stations. And that's what
34:00they're
34:00using essentially to take the weather in
34:02New York, because New York is a big place.
34:05But if you're
34:05just talking about the city, New York, well
34:07, there's probably only going to be three or
34:08four
34:08weather stations you can realistically use
34:10to get a real time weather reading from the
34:13city
34:13and as a data source. So that makes a lot
34:15of sense. So that's essentially it, we've
34:17seen we've loaded
34:18our JSON, we're pretty much ready to go.
34:20Now, if I go back to the guide, it's going
34:22to talk a little
34:23bit about working with V. So we've done all
34:25of this step, if you're wondering why I'm
34:27skipping
34:27past this, we've actually done all of this,
34:29I didn't necessarily do it here, I just
34:31talked
34:31through it. Okay. Next thing, let's look at
34:34how Snowflake allows us to create a view
34:36and also
34:36query the JSON data directly using SQL.
34:39Okay, so views and materialized views, a
34:41view allows the
34:42result of a query to be accessed as if it
34:44were a table. Views can help present data
34:47to end users in
34:48a cleaner manner, limit what end users can
34:50view in a source table and write more
34:52modular SQL.
34:53There are also materialized views in which
34:55SQL results are stored as though the
34:57results were
34:58a table. This allows faster access but
35:00requires storage space. Materialized views
35:02can be accessed
35:03with Snowflake's Enterprise Edition or
35:06higher. Okay, so you need to be able to use
35:08a certain sort
35:09of set of databases. Now, this is sort of
35:12interesting because I'm actually on a paid
35:15version of Snowflake, so I'm not sure this
35:17is going to work. Materialized views can be
35:19accessed
35:19with an Enterprise Edition or higher. I'm
35:21not an Enterprise Edition, I'm just on the
35:22standard
35:23edition but I know the demo would probably
35:24have enabled this. So if this doesn't work,
35:27this is
35:27probably why. So let's go ahead and try and
35:29run this anyway. We should get some of the
35:32warning
35:32that sort of makes some sense. So let's see
35:34if we can create the view anyway and just
35:37hit run.
35:37Okay, so here we are with the SQL. We've
35:40got all the information there, let's hit
35:42run. So yeah,
35:43the view was successfully created. Let's
35:46just refresh this little right hand side
35:49area.
35:49And you can see here that I now have a new
35:52view just over here. So we have tables
35:55and then we have views. Don't forget, views
35:57are like tables but they're not the same,
35:59okay.
35:59They're sort of based on tables. It's like
36:02you're basically creating a perspective of
36:05the real
36:05thing. So you want something to be simpler
36:07for someone to work with, so what you go
36:09and do is
36:10you go and do all the work in advance. And
36:12when you materialize that view, essentially
36:14that work
36:14doesn't have to be done every single time.
36:16It's already been done once and that can be
36:18the sort
36:19of starting point. But what you're not
36:20doing is changing your source data in order
36:22to do that.
36:23So that's why sort of views are useful in
36:25this particular instance. So that's really
36:27nice.
36:27SQL.notation v.city.core.lat is used in
36:31this command to pull out values at lower
36:33levels within
36:34the JSON hierarchy. This allows us to treat
36:36each field as if it were a column in a
36:38relational table.
36:40So in the previous step, what we'd actually
36:43seen is that each city has a bunch of
36:45information. Things
36:46like coordinates, things like the location,
36:49the Latin long, the different ways it can
36:51be spelt
36:52for example, different ways it can be used
36:54in language. And what we're doing with the
36:56dot
36:56notation here is we're just going deeper
36:58and deeper into the hierarchy. So again,
37:00you can go
37:01deeper into a JSON hierarchy as you add
37:04more complexity to your hierarchy. That's
37:06essentially
37:07what's going on here. So you can see here v
37:09.city, essentially the name of the city,
37:12and then we go dot id and then we bring it
37:16back as a city id, essentially. It's an
37:19integer which
37:19is why it says int. The next one is v.city,
37:22so city.name. And again, we just bring it
37:25back
37:25as a string as city name. Let's go
37:27something a little bit harder. So v.city,
37:31then dot coordinates,
37:32dot lat, that basically brings us back the
37:35latitude. And then as you go down, you can
37:37see
37:37that there's obviously different levels of
37:39this hierarchy that you can go and get. So
37:41it's a
37:42really nice simple way of getting to data
37:44in a hierarchy without having to know too
37:46much about
37:47whole entire structure of the whole thing,
37:49or having a column for everything pred
37:51efined in
37:51a schema in advance. So that's a really
37:53nice thing to do. And what they've done is
37:57they've
37:57actually filtered it a little bit. So from
37:59Jason Weather where city id equals this. So
38:02this I think
38:03is the city id for New York. So that's why
38:05that's there. It's filtering the data. We
38:09're getting it
38:09from Jason Weather, but we're making sure
38:12the city id is 5128638, which I believe
38:15would be the same
38:16id as New York. So if I click on this, I
38:18can actually go ahead and preview the data.
38:21And so
38:22what I did there in case you missed that
38:24just for reference is I went down here, I
38:26clicked on Jason
38:27Weather, that created this little preview
38:29so I could see all the data types coming
38:31from this
38:31thing. And then I clicked on preview data,
38:34and that preview data then gave me this
38:36table. So that's
38:37sort of how it works. And that's how it all
38:40came together. So yeah, you can see here,
38:42this is all
38:43New York 5128638. That's essentially just
38:45like me running a filter in Tableau
38:47essentially. And now
38:48that's super, super easy to work with. Okay
38:50, so that's that's it. That's basically this
38:53particular
38:53step, we've got a couple more steps. Let's
38:55see if we can sort of figure out the next
38:57step without
38:58reading the instructions. So select star
39:00from Jason Weather data, select everything
39:01from Jason
39:02Weather data, where the date is the same as
39:052018, January the first 2018, essentially,
39:09and just
39:09return the first 20 rows. Okay, so let's
39:12just run that to make sure that is actually
39:15what's going on.
39:16Let's hit run. And we should see that
39:19everything is from yes, the Oh, interesting
39:22. So hold on,
39:24hold on a second. So let's what happened
39:26here. So date truncates the month, of
39:28course. And
39:31observation, observation time is okay. So
39:33it's basically taking everything in 2018,
39:37January,
39:38and whether it's, you know, whether it's on
39:41the 17th of Jan, or the first of Jan, it's
39:44all going
39:44to return the basically the the day of the
39:47first of the month, because we're trunc
39:50ating to the
39:50month. So if I give you a date and the time
39:52, let's say the second of January, 2018, on,
39:57you know,
39:57whatever, what, what time, let's choose a
39:59time like 2am in the morning, when you
40:01truncate that
40:02down to the month, it gets rid of
40:03everything up until month, and it just
40:05returns that but it gives
40:06it a timestamp still of midnight. So that's
40:10why you'll get to the 17th and the ninth,
40:13because
40:13they're all coming back as that. And what
40:16we can see here is we're not actually
40:19giving the column
40:20name as well, we're not doing it as that.
40:22So just just be careful that this is this
40:24is like running
40:25a filter essentially, okay. And we're
40:27limiting it to the first 20 rows. And we're
40:29basically just
40:30returning everything. So if we go back to
40:31the guide, let's kind of see what logic
40:33they were
40:33trying to get us to do here. So if we go
40:36back here, view the worksheet, verify the
40:39view with
40:39the following query, notice the results
40:41look like a regular structured data source,
40:42your result
40:43set may have different observation time
40:45values, okay. So that's basically it, they
40:47just wanted us
40:48to see that look, you've loaded data from a
40:50semi-structured data source. So that's very
40:53easy to run. Okay, now, then we start to do
40:55things that are a little bit more complex.
40:58Now we're going
40:59to start working with joins. So here you
41:01can see user join operation to correlate
41:04against data set.
41:05So we'll now join the JSON weather data to
41:08our citybike.public.trips data to
41:10understand,
41:12so to answer our original question of how
41:14weather impacts the number of rides. So run
41:16the command
41:17below to join weather to trips and count
41:19the number of trips associated with certain
41:22weather
41:22conditions. So essentially what they want
41:24us to do is start doing something you do in
41:26the real world.
41:27Let's match up this weather data to this
41:29trips data and see if there's a correlation
41:31.
41:31So that is going to require a little bit of
41:33sort of bringing data together, but also a
41:36little bit
41:36of math because essentially we need to sort
41:38of create some sort of correlation, see if
41:40the two
41:41are related. So here you can just look at
41:43the query, select weather as conditions,
41:45count the number of trips from city bike
41:48public trips, and then bring that data
41:52together left
41:53out of join to the JSON weather data view
41:56on date trunk, our observation time equal
41:59date trunk,
42:00our start time. So basically it's saying
42:02compare the weather observation time to the
42:04hour to the
42:05date truncation hour start time. So you
42:08compare the observation time to the start
42:11time of the
42:12journey and where the condition is not null
42:14, where basically there is no weather data
42:16and group by
42:17one order by two descending. Okay so let's
42:20go to the query and just look at that. So
42:23let's there
42:23it is again, it's pretty straightforward, I
42:26hope it's straightforward, if not I'll go
42:28over it once
42:28it's run so we can see it more clearly. So
42:31let's hit run and what we should see is you
42:33should see
42:34here the conditions on the left and here
42:35the number of trips on the right. So it's
42:37just
42:37basically doing a general sort of tick for
42:39each condition go and do this. So if we
42:42just actually
42:42look at the SQL again I think now we we're
42:45in a better place to understand it. So
42:48select weather
42:48as conditions, so that's why we have
42:51conditions on the left, and then count the
42:53number of trips
42:54from CityBike public trips, hence we have
42:57this on the right hand side. Okay left out
42:59to join
43:00JSON weather data on date trunk observation
43:02time equals that. So it's basically just
43:05then defining
43:05the relationship between the two things
43:07where condition is not null and then group
43:09by order
43:10by two. So essentially what it's doing is
43:13you're defining the way these two datasets
43:16come together
43:16here and you're basically choosing what you
43:19're going to see here at the top and then
43:21you're
43:21choosing how it's ordered over here
43:23essentially. Okay so that's pretty cool
43:26that's nice to see
43:27it's nice to see that that works and that I
43:29think is the very last step of this
43:30particular exercise.
43:31So that was seven working with semi-struct
43:33ured data views and joining essentially
43:35doing analytical
43:36queries and making sure that all works
43:39really really well. Okay so that's pretty
43:41much it we're
43:42going to stop this here in the next video
43:44what we're going to do is look at time
43:45travel and try
43:46and figure out how that works. It's a
43:48really cool feature it basically lets you
43:50undo things
43:50and you can see here that I've actually
43:52been already looking at it just checking it
43:53out but
43:54it lets you undo things basically go back
43:56in time if you're on the standard edition
43:58it goes back
43:59up to 24 hours if you're on enterprise
44:01edition up to 90 days of history you can
44:04essentially run a
44:04query as if you were doing it back in time
44:06and essentially get the results from that
44:09point in
44:09time and not where you are today. It also
44:11allows you to rescue yourself should you
44:12delete something
44:13or do something that you sort of shouldn't
44:15do. So we'll check this out in the next
44:17video and this
44:18has been sort of 40-45 minutes so I think
44:20this is about the right time if I carry on
44:22for another bit
44:23longer we'll probably end up being an hour
44:25and a half and that's probably too long for
44:26a video. So
44:27thanks for watching I'll catch you in the
44:29next video if you've enjoyed this let me
44:30know what
44:30you think in the comments below and we'll
44:32catch you in the next video take it easy.
Future-proof your career https://n1d.io
| My Courses on Linkedin Learning: https://www.linkedin.com/learning/instructors/tim-ngwena In this video we go through using and running analytical queries in Snowflake. We will continue to use the web interface but the same queries will work in any tool. We also take a look at semi-structured data formats, specifically JSON with it’s support for flexible schemas. We will build a database, load JSON data into it and query it without transforming the structure of the data using Snowflake’s native capabilities.
Intro 0:00
Recap of previous video 0:39
Running some analytical queries 2:25
Run the first analytical query 5:49
Analysing the query 9:35
The Snowflake Cache 11:25
Run one more query 14:53
Cloning a table in Snowflake 16:37
Section summary 21:19
Creating a database for Semi structured JSON data 24:57
Create a stage for semi structured data 29:07
Load semi structured data from AWS S3 into Snowflake 30:55
Previewing our semi structured data in the table 32:17
Creating a view on on semi structured data 34:21
Running an analytical query from our view 39:04
Joining our semi structured data to our structured data 40:54 Share feedback and Suggestions: https://tableautim.canny.io/suggestions -
Join this channel to get access to perks:
https://www.youtube.com/channel/UC7HYxRWmaNlJux-X7rNLZyw/join ----------
(C) 2023 TN-Media LTD. No re-use, unauthorized use, or redistribution, of this video without prior permission.