QPT Training Course. Topic 11: Optimization Techniques

00:00:05
We have one last topic left: we are going to talk about optimization techniques, because so far we have only discussed the mechanisms that exist, but we haven't yet talked about optimization itself. Here I really don't want to slide into a collection of recipes, because with optimization a recipe, as a rule, doesn't work universally: in some cases a given trick helps, in others it doesn't, and what matters is to understand the reasons and draw your own conclusions. Nevertheless, we can't do entirely without specifics, so let's try to talk about what levers we have to influence queries once we have already found a query that badly needs optimizing. Let's start with a philosophical introduction: what options do we have, in principle, for our actions?
00:01:10
One option is this: there is a query, and the query works incorrectly. We can look into it, find the place in it that doesn't work well, and then try to influence the query so that it starts executing the way we need. For example, we see that our table is read in full, we decide that's not good, we create an index, the query starts using the index — great. Sometimes this works better, sometimes worse, because it's not always easy to guess exactly how the query needs to be corrected so that it starts working better. Often all of this culminates in the desire to turn the planner off altogether, or to write the plan yourself — at least in your head you mature to the idea that a different plan is needed — and then we try to convince the planner that it is wrong. At that point there is a natural desire to have some hints to help the optimizer, but there are none; more precisely, as we saw, we have some options, but no hints in explicit form.
00:02:33
The Postgres developers deliberately do not want to introduce hints, because they consider it wrong: if there are problems with the planner, the planner itself should be fixed, not given a means of bypassing it. This position can, on the whole, be understood: if we nail down the complete query execution plan, we can quickly fix the problem with the query, but as soon as something changes in the data — the distribution changes, the amount of data changes — our nailed-down plan remains as it was and ceases to be effective, and we again have to optimize the query for the changed situation. Whereas if we hadn't done that, the planner should, in theory, rebuild the plan on its own, following the statistics.
00:03:44
The second option is the opposite: trust the planner. Assume that the planner works well and correctly, and simply help it in every way we can by giving it good, adequate statistics that it can use. With this approach we look at the estimation errors in the plan: if in some node we see that the error is large, we try to correct the statistics so that the planner understands what it is really dealing with; then it will most likely rebuild the plan itself, and the plan will become adequate without any violent intervention on our part into what it is doing. But if the statistics are all correct, all the steps are well chosen, and the query still doesn't run the way it should, then most likely the problem lies in some global settings that need to be tweaked and brought into line. In practice you have to combine both approaches, because sometimes it's easier to go one way and sometimes the other; one way or another, in the end we must get the plan we are after.
00:05:13
Let me remind you once again that everything here rests on statistics: if the statistics are bad or inadequate, there is nothing further to talk about — the planner simply has nothing to work with. In this sense you need to have automatic vacuum and automatic analysis (autovacuum and autoanalyze) configured; how to do that is examined in detail in the DBA course, so I won't repeat it now, but it must be done so that statistics get collected. As a next step, we can adjust the accuracy of the statistics: this is the default_statistics_target parameter, which we already talked about; it can be changed if necessary.
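As a small sketch (the table and column names here are illustrative, not from the course's demo database), the statistics target can be adjusted both globally and per column:

```sql
-- Increase the detail of collected statistics globally (the default is 100):
SET default_statistics_target = 500;

-- Or only for one problematic column of a hypothetical table:
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;

-- The new target takes effect after the next analyze:
ANALYZE orders;
```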
00:05:57
In which cases is this useful? As we saw, CREATE INDEX on an expression causes additional statistics to be collected specifically for that expression. In which cases does extended statistics help us? Quite often in the case of correlated predicates. And even when we have all the statistics in place, relevant and good, the planner cannot always draw the correct conclusions from them; in that case it also has to be helped somehow. Sometimes the query can be reformulated so that the planner can pick up the statistics that do exist. If it is fundamentally wrong, sometimes you can place part of the query into a temporary table, so that the planner has at least some understanding of how much data it will have to work with further — although this approach has its side effects too. Either way, what this amounts to is creating a kind of materialization point at which the planner receives statistics on the intermediate data.
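A minimal sketch of extended statistics for correlated predicates (the table and columns are hypothetical):

```sql
-- Tell the planner that city and zip_code are functionally dependent,
-- so it stops multiplying their selectivities as if they were independent:
CREATE STATISTICS addr_dep (dependencies) ON city, zip_code FROM addresses;
ANALYZE addresses;
```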
00:07:14
What next? Next we have settings that affect the cost coefficients that participate in cost estimation. We have settings that give the cost of disk operations: seq_page_cost, the cost of reading pages sequentially, and random_page_cost, the cost of reading pages at random. As you remember, seq_page_cost equals one and random_page_cost equals four. If you have an SSD, random_page_cost needs to be lowered substantially, down to one; if you have a good disk array, then most likely it also needs to be reduced, because it will be unreasonably high. A high value here will in effect cause indexes to be used less often than you would like, because the planner will consider random access too expensive.
00:08:21
There is one more parameter, effective_io_concurrency: it can be increased if you have a disk array consisting of more than one independent disk. The parameter is used in few places; in fact, it is used only to prefetch pages into the cache when scanning using a bitmap.
00:08:50
What's next here? The settings for the cost of processor time — the three cpu_* parameters (cpu_tuple_cost, cpu_index_tuple_cost, cpu_operator_cost). I'm not sure it makes sense to twist them, because you simply won't guess in which direction they need to be turned. What is interesting here is that you can set a cost for your own functions: a regular function written in a procedural language receives a cost of 100 by default. This can make a difference if you use calls to different functions in a query's filter conditions: the function that costs less will, where possible, be evaluated earlier, so as not to call the function that costs more at all if that can be avoided. This doesn't come up often, but sometimes it can play a role.
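A sketch of assigning function costs (the functions and table are illustrative, not from the course):

```sql
-- cheap_check() is declared cheap and expensive_check() expensive;
-- with both in a WHERE clause, the planner will try to evaluate
-- the cheaper predicate first.
CREATE FUNCTION cheap_check(x int) RETURNS boolean
    LANGUAGE sql IMMUTABLE AS 'SELECT x > 0' COST 1;

CREATE FUNCTION expensive_check(x int) RETURNS boolean
    LANGUAGE sql IMMUTABLE AS 'SELECT x % 7 = 0' COST 10000;

SELECT * FROM t WHERE expensive_check(a) AND cheap_check(a);
```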
00:09:57
For parallel execution we have parameters that set the cost of creating the infrastructure for parallel processes (parallel_setup_cost) and the parameter parallel_tuple_cost, which gives us the cost of sending one row between processes. It's the same story here: you can experimentally figure out what values these parameters should have on your hardware, though the defaults are usually adequate. Another parameter, cursor_tuple_fraction, influences, as you remember, the optimization of queries used in cursors: if you need the query to be optimized from the point of view of returning the first rows, the value should be small; if you don't want that, you can, accordingly, set it to one.
00:10:55
What else do we have? We have the work_mem parameter, which is very important. This is the amount of memory that is allocated for executing one single operation, such as sorting, hashing, and so on. The default value is 4 megabytes, which is almost certainly less than what you want; it is kept low so that Postgres can be launched anywhere, even on a modest machine. On a normal server this value, of course, is increased. What does it affect? By increasing it you reduce the use of temporary files, which Postgres starts to use when this amount of memory is not enough. Plus, the value of work_mem can influence whether hashing or sorting is preferable: with a small amount of memory the planner prefers to do grouping using sorting, because sorting depends less on the amount of available memory.
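A sketch of observing the effect of work_mem (the table is hypothetical; the exact plan output depends on data volume):

```sql
-- With the default 4MB, a large sort typically spills to disk
-- ("Sort Method: external merge  Disk: ..." in the plan):
SET work_mem = '4MB';
EXPLAIN (ANALYZE) SELECT * FROM big_table ORDER BY payload;

-- With more memory the same sort usually becomes an in-memory quicksort:
SET work_mem = '256MB';
EXPLAIN (ANALYZE) SELECT * FROM big_table ORDER BY payload;
```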
00:12:27
Keep in mind that work_mem is not a per-query limit: every individual operation in a plan can allocate up to this amount at the same time, so a complex query with many sorts and hashes can consume several times work_mem at once. You also need to take into account the total amount of memory your server has, because if you set the value too large, memory may run out and the server may fall over. There is no global cap the server enforces here, so any large value is, in general, at your own risk. And how much memory is actually allocated or used by the connections is, unfortunately, very difficult to find out.
00:13:54
Estimating memory usage in general is a sad situation, because even the server itself doesn't know it precisely. The developers have difficulty with this: there is no normal production tool that would show what is happening with memory, partly because memory allocation inside Postgres happens in its own cunning way — the server has its own memory manager with its own allocator on top of the operating system. There is an extension — pg_memstat, I believe it's called — which is crude enough, but it does show the allocations a little. It was essentially made by the developers on their knees for themselves, that is, it is not meant for production use, but sometimes it is useful to look at; otherwise it's hard to see anything without digging into the code. Besides work_mem there is also maintenance_work_mem, which is used for all sorts of service actions, for example for building indexes; so if there is an opportunity, increasing it also brings its advantages.
00:15:41
There is another parameter that has an effect: effective_cache_size. It is not related to memory allocation at all; it tells the planner how much memory is supposedly available for caching, counting the buffer cache and the cache of the operating system taken together. It is not connected to anything physical — you can twist it as you like, and no memory is allocated in this sense. The meaning is this: the larger you set this value, the cheaper index access becomes, because the planner will assume that pages are sitting in the cache and that it is not reading them from disk. When we have index access, one of the components of the cost is the random reads of index pages; the more you increase this value, the smaller the share of that cost you get. The ideology is that if you have a lot of RAM, then most likely your index pages are already cached, and going to disk for them is almost free. There is no need to tie this to any exact real numbers; it remains a rather abstract parameter.
00:18:39
In general, twisting it makes sense if you see that your plans look normal from the point of view of the estimates, yet in reality index access turns out to be cheaper in terms of time than the planner thinks. With this parameter you can, as it were, calibrate the planner: effectively tell it that index access in my system is in fact cheaper than you think, so use it more often.
00:19:15
Then there are queries where we need joins, and this is an important point: there are two parameters, join_collapse_limit and from_collapse_limit. These parameters mean that the planner will try out different orders of joining the tables, but only up to eight of them; beyond eight it stops enumerating and simply takes the order exactly as you wrote it syntactically. That is, if you have queries in which there are more than 8 tables, you can get bad plans, and you may need to increase the value. The first parameter affects explicit JOIN clauses; the second affects how subqueries are collapsed into the FROM list. All of this can be increased; the increase leads to your planning time growing, but it may make sense, because otherwise you can simply get a thoroughly bad plan. There are different trade-offs here. And there is also a parameter that turns on the genetic optimizer, geqo_threshold, which is equal to 12 by default.
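The defaults and a possible adjustment can be sketched like this (the value 10 is just an example):

```sql
SHOW join_collapse_limit;   -- 8 by default
SHOW from_collapse_limit;   -- 8 by default
SHOW geqo_threshold;        -- 12 by default

-- For a query joining, say, 10 tables, allow full enumeration
-- of join orders at the price of longer planning:
SET join_collapse_limit = 10;
SET from_collapse_limit = 10;
```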
00:20:44
If you have more than 12 relations to join, the query will no longer be optimized by exhaustive dynamic programming, but by the genetic algorithm. The point is this: when you have a query with a lot of joins, enumerating the different join orders becomes very expensive — the complexity grows exponentially — and at some point planning simply starts taking more time than the execution itself. At that moment it makes sense to switch to the genetic optimizer, which, without doing a full enumeration, makes a random selection of different plans and then tries to combine them with each other to get something close to the right one. It does this much faster than exhaustive search, but it can end up with a not very optimal plan, and you simply can't predict that in advance.
00:21:52
Accordingly, these parameters need to be selected if you have large queries, and it makes sense to select them taking into account the hardware you have: if you can afford to increase the values, increase them. In general, these knobs exist; keep in mind that you need to remember about them when you see a very large query with a large number of joins.
00:22:18
Now, where can these parameters be set? They can be set globally for the whole instance, but you need to remember that a configuration parameter can be set at other levels too: it can be done at the database level, it can be tied to roles, it can be tied to functions. The database and role levels are hardly needed for these particular parameters, but the function level can be useful: if you have some query inside a function, you can assign certain parameter values to that particular function — work_mem, for example, can be increased at the level of an individual function. In addition, you can issue SET commands in your session, or even SET LOCAL inside a separate transaction: you start the transaction, set the parameters you need to the necessary values, then run the queries; the transaction ends and the parameters are reset.
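The levels just described can be sketched like this (the function name is illustrative):

```sql
-- Per-function setting: applies whenever this function runs.
ALTER FUNCTION build_report() SET work_mem = '256MB';

-- Per-transaction setting: reverts automatically at COMMIT or ROLLBACK.
BEGIN;
SET LOCAL work_mem = '512MB';
-- ... run the heavy query here ...
COMMIT;
```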
00:23:28
There are a number of parameters that can be used for debugging; they are hardly worth using in real life on a production database, but they are useful to play with. There is a bunch of parameters that begin with the word enable_: they allow you to turn on and off different data access methods, different join methods, aggregation, memoization, sorting, and so on. You can disable or enable something and see whether it leads you to success or not. And the force_parallel_mode parameter, which you also saw, can be used to understand whether your query can, in principle, be parallelized at all, or whether you wrote something into it that rules parallelism out entirely.
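A typical debugging session with these switches might look like this (the query is illustrative):

```sql
-- For experimentation only: heavily penalize sequential scans in this
-- session and see whether the planner finds a better (index) plan.
SET enable_seqscan = off;
EXPLAIN SELECT * FROM t WHERE a = 42;

-- Restore the default behavior:
RESET enable_seqscan;
```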
00:24:33
So much for settings. What else can we do? We can play with the data schema. There are two processes here: normalization and denormalization. Normalization is the elimination of redundancy in data: we break some large tables up into separate ones and thereby eliminate redundancy. Normalization simplifies queries and makes it easier to check consistency; it is the process that is usually applied when designing the logical data schema. But then it can turn out that, although everything is fine in theory, we need certain data often, and to get it we have to join several tables, aggregate, and so on. In this case the reverse process can be used — denormalization — when we deliberately introduce some redundancy so that the needed result can be obtained faster. With denormalization you need to be careful, because it is always associated with having redundant information, and this information needs to be kept synchronized with the main source of truth, which lives somewhere in other tables.
00:26:02
What are the means of denormalization? Firstly, the index — although it is rarely thought of in these terms, an index is exactly a redundant structure: all the data that is in the index is also in the table, and the index merely serves to speed up access. What is good about the index is that it is synchronized automatically — you don't have to lift a finger — so it is a very convenient device.
00:26:34
What else is there? There are fields that you can compute in advance to get the result faster. If you remember our demo database: in a booking we have a total amount that corresponds to the sum of the individual flight segments that the booking includes — that is exactly an example of such denormalization. So that such fields don't go out of sync, people usually write triggers on the base table that update them automatically when the data changes. A similar field of activity is materialized views; they are a little more complicated, because a materialized view is not updated automatically, and you need to choose the right moment to refresh it, or refresh it manually. One way or another, you constantly have to think about the questions of keeping things synchronized.
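A minimal sketch of keeping such a precomputed total in sync with a trigger. The table and column names echo the demo database (bookings, tickets, ticket_flights), but the code itself is illustrative, not the course's; a real version would also have to handle UPDATE and DELETE:

```sql
-- Add the flight segment's amount to the booking total on insert.
CREATE FUNCTION update_total_amount() RETURNS trigger AS $$
BEGIN
    UPDATE bookings b
       SET total_amount = total_amount + NEW.amount
     WHERE b.book_ref = (SELECT t.book_ref
                           FROM tickets t
                          WHERE t.ticket_no = NEW.ticket_no);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_total_amount
AFTER INSERT ON ticket_flights
FOR EACH ROW EXECUTE FUNCTION update_total_amount();
```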
00:27:31
You can also simply cache results somewhere, even outside the database — for example, in the application. This often helps too: if the application frequently sends the database the same query, it can cache the result itself somewhere and then reuse it. But again, you have to think about how to invalidate this cache, of course, so that if the data in the database has changed, the cached value gets re-read. In general, any such device is caching, and it brings with it the usual problems: you need to understand how and when to update that cache.
00:28:21
What else do we have? Different data types — there are many of them — and the choice of suitable types can often also affect performance. For example, there is a range data type: from and to, for instance a date range. You can make two fields in the table, date_from and date_to, or you can make one field of a range type. With range types various interesting things are possible: you can use index support to find intersections of different ranges, and so on. That is, using the appropriate data type can give you a quite specific gain.
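A small sketch of the range-type variant (the table is hypothetical):

```sql
-- One daterange column instead of date_from / date_to:
CREATE TABLE rental (
    room_id int,
    busy    daterange
);

-- A GiST index supports intersection and containment searches on ranges:
CREATE INDEX ON rental USING gist (busy);

-- Find rentals overlapping a given period:
SELECT * FROM rental
 WHERE busy && daterange('2024-01-01', '2024-02-01');
```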
00:29:21
You can use other types, such as arrays and JSON, instead of creating separate classic relational tables next to each other, connected by keys. What can this give us? First, we don't need to make a join: we read the row and immediately get the nested information. Another plus is that if you have auxiliary tables with one or two fields of some small size, you have very large overhead costs: each row version carries a header of about 24 bytes, which is added to the size of every row. So it can turn out that you will save a lot of space, and this, too, will lead to acceleration. But this method does not always work; in some cases it is still better to keep separate tables — that is, you need to look at the specific task.
00:30:40
In addition to data types we also have integrity constraints. They are important in themselves, because if you have no constraints in the database, the application can always change or delete something in a way that leaves incorrect data inside the database, and then you have to work with it somehow — live, writing some kind of data fixes in order to correct the data after the fact. So integrity constraints are a useful thing in their own right, but besides this, they can sometimes be taken into account by the planner and help it build a more accurate plan. They can help eliminate unnecessary joins: we saw an example at the very beginning of the course, when we had a query against a view over three tables, and the planner removed one of the tables from the plan because it was not needed — it was exactly the integrity constraints that let it find situations in which this can be safely done. Integrity constraints can also improve selectivity estimates, and there are various other cases when the presence of constraints contributes to a better plan.
00:32:21
There is also the constraint called CHECK, and it can be used in conjunction with the constraint_exclusion parameter to exclude from consideration a table that is guaranteed to contain no matching data. This is used with partitioning, among other things, in order not to look at those partition tables whose data is guaranteed not to fall under the conditions of your query.
00:32:58
What else is there? You can play with the physical arrangement of the data. We have tablespaces, with which we can place our tables and indexes on different physical devices and thus achieve some kind of acceleration. We can use partitioning to divide a table into separate parts, each of which can also be placed on a separate tablespace in order to speed up access. Each partition will have its own index, if there is one, and the planner, using statistics and the query conditions, will not go into those partitions that it clearly understands there is no need to go into. The main condition for this is that the partitioning key appears in the query conditions — that is how the planner understands which partitions it can skip.
00:34:17
Now there is declarative partitioning, which is what I would use. Before it appeared, partitioning was done simply by hanging a CHECK constraint on each table and setting the constraint_exclusion parameter; in fact, it is roughly the same mechanism now, just hidden behind normal declarative constructions. Partitioning also allows you to do a full scan more economically: an index is nice when you need to quickly look something up in a large table, but if you need to read most of the data, partitioning lets you read only the partitions you need instead of scanning the entire large table.
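Declarative partitioning can be sketched like this (the table is illustrative):

```sql
-- Range-partitioned table; partitions can live in different tablespaces.
CREATE TABLE events (
    event_time timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (event_time);

CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE events_2025 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- With the partitioning key in the WHERE clause, the planner skips
-- the partitions that cannot contain matching rows:
EXPLAIN SELECT * FROM events
 WHERE event_time >= '2025-03-01' AND event_time < '2025-04-01';
```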
00:35:13
And finally, we can change the query itself; here there is practically limitless scope for imagination. You can use different methods: for example, you can materialize some subqueries using a CTE (the WITH construction); you can force the planner to join tables exactly in the order you specified by setting join_collapse_limit to one; you can do other things as well. You can change queries in order to point the planner at some other way of executing the request. In theory, declarative SQL should give us the property that however we write a query, it can be executed in any equivalent way, and what matters is that we receive exactly the data we asked for. But the planner cannot always treat different equivalent formulations of a query equally: for example, it does not always cope with flattening correlated subqueries, and this leads to nested subplans arising inside that could have been avoided; it cannot always eliminate an extra table; it does not know how to replace conditions containing OR with a union of two queries; and so on. That is, there are different options that are simply not considered by the planner, and we can rewrite the query ourselves so that it is executed in some other way — sometimes this, too, can have an effect.
00:37:11
have seen, a common problem is that
00:37:15
we have code written in such a
00:37:20
procedural style with loops inside
00:37:23
which some other loops are called
00:37:27
and there is a request somewhere inside
00:37:31
this leads to the fact that the database
00:37:34
executes a large number of
00:37:36
relatively small queries, each of
00:37:39
which can be written optimally;
00:37:41
it can no longer be optimized in dollars;
00:37:44
simply due to the quantity, you
00:37:45
get a long execution time;
00:37:48
in this case, it’s good to replace such things
00:37:52
with a declarative style, that is, get rid of
00:37:56
from loops, replace them with connections and this
00:37:59
can give a very big gain,
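As an illustration (not the example from the course), the same work done row by row in a PL/pgSQL loop and then as one declarative statement:

```sql
-- Procedural style: one small query per flight, N executor calls.
DO $$
DECLARE
    f record;
BEGIN
    FOR f IN SELECT flight_id FROM flights LOOP
        PERFORM count(*) FROM ticket_flights tf
        WHERE tf.flight_id = f.flight_id;
    END LOOP;
END $$;

-- Declarative style: one join and one aggregation do it all at once.
SELECT f.flight_id, count(tf.ticket_no)
FROM flights f
     LEFT JOIN ticket_flights tf ON tf.flight_id = f.flight_id
GROUP BY f.flight_id;
```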
00:38:02
unfortunately, this is not always
00:38:09
easy and simple to do; that is
00:38:20
what I wanted to say about it; let's
00:38:23
see an example,
00:38:28
so here is our example:
00:38:34
we will now write a query and
00:38:38
try to optimize it;
00:38:41
we will disable parallel query execution:
00:38:45
the virtual machine here has a single core,
00:38:49
so parallelism makes little sense, and
00:38:53
the parallel plans that would
00:38:55
come out are simply harder to read,
00:38:56
so we will abstract away from that
00:39:00
and try to solve this problem;
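Disabling parallelism for the session is one setting:

```sql
-- No parallel workers: plans stay sequential and easier to read.
SET max_parallel_workers_per_gather = 0;
```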
00:39:05
let's say we need to calculate the average
00:39:08
cost of tickets in the context of different
00:39:13
aircraft models: each model has its own
00:39:16
average cost,
00:39:17
but at the same time we must exclude from
00:39:19
consideration the cheapest tickets and the most
00:39:22
expensive tickets, cutting off the extreme points;
00:39:25
well, maybe it is a contrived task, but
00:39:28
using its example we will see the approach; so, the
00:39:32
first approach to
00:39:37
the problem: what are we doing here? we make a
00:39:41
query from the aircrafts table and group by
00:39:48
the aircraft code and model, and inside we have a
00:39:51
subquery in which we actually
00:39:53
calculate that very average
00:39:56
cost;
00:39:57
we take the average over the join of
00:40:02
ticket_flights
00:40:04
with flights, and we have two
00:40:08
conditions that cut off the
00:40:10
lowest ticket cost and cut off the
00:40:14
highest cost;
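A sketch of this first version against the demo database (the exact query in the course may differ in details):

```sql
SELECT a.aircraft_code,
       a.model,
       ( SELECT avg(tf.amount)
         FROM ticket_flights tf
              JOIN flights f ON f.flight_id = tf.flight_id
         WHERE f.aircraft_code = a.aircraft_code
           -- the two cutoffs at the extreme points:
           AND tf.amount > (SELECT min(amount) FROM ticket_flights)
           AND tf.amount < (SELECT max(amount) FROM ticket_flights)
       ) AS avg_amount
FROM aircrafts a
GROUP BY a.aircraft_code, a.model;
```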
00:40:16
this query gives us the data that
00:40:19
we want;
00:40:20
here it is: our Airbus A321-200, which does
00:40:27
not fly, has an empty value, NULL,
00:40:30
there is the Sukhoi Superjet, and so on; and
00:40:33
this query, as you will see, runs for more than a
00:40:35
minute:
00:40:36
69 seconds; I would like to do something about it;
00:40:42
how will we approach this task,
00:40:47
how could we do this in
00:40:52
real life?
00:40:53
well, look at the plan: what do we see in the
00:40:57
plan? in the plan we see the main query;
00:41:01
here it is: we scan the aircrafts table
00:41:07
sequentially and perform the aggregation,
00:41:11
and inside we call the subquery,
00:41:16
this one,
00:41:19
which was not unnested; it
00:41:22
remained a SubPlan; so we see, at the
00:41:25
SubPlan node, some aggregation happening; here we
00:41:30
join two
00:41:34
tables, ticket_flights and flights; the join
00:41:39
is performed by hashing, and we
00:41:43
have two more subqueries that
00:41:44
are each executed once;
00:41:48
the InitPlan node tells us this: it means we
00:41:51
execute it once; here we have these
00:41:55
two InitPlans: one calculates the minimum
00:41:57
cost, the other calculates the maximum
00:41:59
cost, and for this we perform a full
00:42:04
scan of the ticket_flights table
00:42:06
in both cases.
00:42:12
what can we optimize here right
00:42:15
away?
00:42:17
[music]
00:42:21
by the way, I turned the costs off here: I
00:42:27
wrote COSTS OFF, because otherwise we would have
00:42:31
a lot of all sorts of numbers
00:42:33
and the general structure of the plan would be lost.
00:42:35
so, what can you suggest? well, it
00:42:42
would be nice to have an index on amount, because the
00:42:45
minimum and the maximum are easy to find when
00:42:48
there is an index; we understand that this
00:42:51
can be done very quickly; well, since that
00:42:54
comes to mind, let's try it: so we
00:42:57
create an index on the amount column,
00:43:00
because that is what is used here;
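Something like:

```sql
-- A btree index on the amount column lets min() and max() be read
-- from the ends of the index instead of scanning the whole table.
CREATE INDEX ON ticket_flights(amount);
```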
00:43:02
now we will create it; I
00:43:08
expect this to take some time after all:
00:43:16
8 million rows have to be
00:43:20
read and
00:43:22
sorted; and the question is how
00:43:28
we move forward; let's first
00:43:31
just try the index; we will need
00:43:34
to wait a little: building an index is not fast; and
00:43:40
the index is created, reasonably quickly:
00:43:50
in about a minute it was built; now let's quickly run the
00:43:55
query again; I run it with EXPLAIN and
00:43:58
ANALYZE, and let's see what we get;
00:44:04
I think we have about 2
00:44:06
minutes to talk about
00:44:11
other things; not only may it not
00:44:17
improve, it may even get worse; and in
00:44:19
fact, what do you think: why?
00:44:22
while
00:44:23
it runs, let's think about why everything is
00:44:26
even sadder than it would seem;
00:44:34
where is the index actually needed?
00:44:37
in
00:44:40
those two InitPlans, where min and max
00:44:43
are computed; there it will help;
00:44:54
and here is the result:
00:44:58
well, almost the same time remained, in this case
00:45:01
a little longer; when I ran it on my machine, it
00:45:13
came out noticeably worse; in general,
00:45:19
it did not become better, let's say; why? it would
00:45:24
seem it should have; let's look at our
00:45:28
new plan, at what we got: in this
00:45:31
case we have an Index Only Scan here inside,
00:45:34
for the subqueries,
00:45:36
that is, access really
00:45:38
happens by index, and
00:45:40
we compute the minimum by the index, and the
00:45:43
maximum we also compute by the index,
00:45:45
that is, this place did
00:45:47
improve; but while earlier we joined
00:45:51
the main large tables, flights and
00:45:53
ticket_flights, with a full scan,
00:45:56
now there is an index, and
00:45:59
the planner decided that the ticket_flights
00:46:02
table should be accessed
00:46:07
using the index that we now have;
00:46:12
but at the same time it missed the cardinality
00:46:19
estimate, missed very badly:
00:46:22
it decided that in this way it would select
00:46:25
approximately 42 thousand rows, while in reality there are 8
00:46:28
million; it would not have used the index
00:46:33
if it had known there were that many; why,
00:46:37
what do you think?
00:46:40
the statistics, let's say, are all fine here;
00:46:46
well, I'll tell you the reason: it is these two conditions; well,
00:46:51
of course, for the planner these are
00:46:57
just two conditions that
00:47:01
it considers unrelated to each other; we
00:47:05
understand that with these conditions
00:47:07
we cut off a few
00:47:09
rows from one edge and a few rows from
00:47:12
the other, and the main mass remains,
00:47:15
but the planner does not know this; for it these are simply
00:47:17
comparisons with some
00:47:21
values about which it knows nothing, and
00:47:24
in such a case it applies
00:47:26
a built-in selectivity estimate
00:47:28
that has nothing in common with our
00:47:30
case; the result is that
00:47:32
it makes such a big error:
00:47:35
42 thousand in place of 8 million; how can we
00:47:40
improve this estimate here? just what you
00:47:44
suggested: we can calculate
00:47:47
the minimum and maximum first and substitute them into the
00:47:50
query, so that it understands exactly what
00:47:54
values will be here; when the planner
00:47:57
sees specific values,
00:47:59
it will estimate the
00:48:00
selectivity correctly; let's try to
00:48:03
do this:
00:48:04
we start a transaction at the REPEATABLE
00:48:06
READ level to get consistent
00:48:09
data, then we calculate the minimum and maximum
00:48:13
and, with the help of the \gset construct, we
00:48:16
remember them in psql variables,
00:48:18
say
00:48:19
min_amount and max_amount, and then we execute
00:48:24
the query, substituting these
00:48:26
values, calculated in advance, inside it;
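Schematically, in psql (the variable names are illustrative):

```sql
BEGIN ISOLATION LEVEL REPEATABLE READ;

-- \gset stores the result columns in the psql variables
-- :min_amount and :max_amount.
SELECT min(amount) AS min_amount,
       max(amount) AS max_amount
FROM ticket_flights
\gset

-- ...and the main query now compares with literal values,
-- e.g.  AND tf.amount > :min_amount AND tf.amount < :max_amount

COMMIT;
```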
00:48:32
what should this give us? we should
00:48:37
correct the cardinality estimate;
00:48:40
the estimate will become correct, and the planner
00:48:45
should realize that with so
00:48:51
many rows there is
00:48:52
no point in going through the index; it will be fine;
00:48:56
let's wait and check; well, yes,
00:49:02
in this case a full scan is more profitable,
00:49:05
because we take all eight million rows anyway; there is
00:49:08
no escaping that; let's see what
00:49:14
we got: it became a little better,
00:49:17
it became 50 seconds;
00:49:18
good, and indeed we now have
00:49:23
seq scans in both places, and accordingly we
00:49:32
got a little better; so what can we
00:49:39
do next, how do we continue?
00:49:45
in any case, we want much less; what
00:49:50
can we improve?
00:49:55
the next step should be to
00:49:59
get rid of this loop: we have 9
00:50:04
rows, and for each of these rows we
00:50:07
perform this operation, that is,
00:50:10
we do a seq scan nine
00:50:12
times, and this is not good;
00:50:17
how can we get rid of this subquery?
00:50:22
could we compute it once ourselves
00:50:30
into a temporary table? that won't work,
00:50:33
because this
00:50:37
subquery is tied, correlated, to the
00:50:40
outer query: it uses a value
00:50:42
that is taken from the outer
00:50:47
table, and it is different every time, so it
00:50:50
has to be executed each time; that is, we need to
00:50:53
unnest it: we need to replace the subquery
00:50:56
with a join;
00:50:58
how can we do this? we can
00:51:01
rewrite the query in this way: we
00:51:13
need the minimum and maximum, by the
00:51:15
conditions of the problem, over the whole table, so that part stays as it was;
00:51:18
look what we do:
00:51:23
these two tables,
00:51:24
flights and ticket_flights, are attached to the
00:51:30
main query with joins, and here we
00:51:33
need to use LEFT JOIN,
00:51:34
because we may end up
00:51:43
not finding the data we need,
00:51:47
and then we need to return NULL for
00:51:51
our Airbus A321; this is
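A sketch of the rewritten query (again, details may differ from the course version):

```sql
SELECT a.aircraft_code,
       a.model,
       avg(tf.amount) AS avg_amount   -- NULL for aircraft with no rows
FROM aircrafts a
     LEFT JOIN flights f ON f.aircraft_code = a.aircraft_code
     LEFT JOIN ticket_flights tf
            ON tf.flight_id = f.flight_id
           AND tf.amount > (SELECT min(amount) FROM ticket_flights)
           AND tf.amount < (SELECT max(amount) FROM ticket_flights)
GROUP BY a.aircraft_code, a.model;
```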
00:51:54
really important; what happened? we
00:51:58
get a single overall plan without
00:52:01
those SubPlans, without the loop, and we
00:52:07
easily fit in 18 seconds,
00:52:09
which is better than waiting a minute;
00:52:19
can we do something else here? reasonably,
00:52:29
we see that hashing is required,
00:52:32
but the hash table does not fit
00:52:35
entirely in memory, and we spend
00:52:40
some time writing
00:52:44
data to temporary files on disk;
00:52:47
so here is what we will do now:
00:52:49
let's double
00:52:52
work_mem: it was 4 megabytes, 8 should be enough
00:52:55
for the hash to fit; let's run it
00:53:00
again: well, 17 seconds, a little less; so
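The setting in question:

```sql
-- work_mem caps the memory one hash or sort node may use before
-- spilling to disk; the default is 4MB.
SET work_mem = '8MB';
```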
00:53:09
things have sped up a bit for us; and here I
00:53:15
also turned on the BUFFERS output, so that you can
00:53:20
see how much I/O we
00:53:28
did at every step;
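The flag is part of EXPLAIN:

```sql
-- BUFFERS adds per-node I/O counters: "shared hit" is pages found in
-- the buffer cache, "read" is pages actually fetched from disk.
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF)
SELECT count(*) FROM ticket_flights;
```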
00:53:31
sometimes it is interesting
00:53:35
to look at this; so, say,
00:53:42
this node did a seq scan, and this is how many
00:53:47
pages it read
00:53:49
from the
00:53:50
buffer cache, and how many we read from disk,
00:53:56
and so on; here,
00:54:03
too, some data had to be
00:54:05
read from disk; well, this is the kind of
00:54:12
query we got; and what to do next is
00:54:15
unclear, because if we
00:54:18
take the numbers that BUFFERS showed and
00:54:21
calculate how many pages there are in total in
00:54:22
our tables, we get about 168 and
00:54:26
a half thousand, and we see that
00:54:32
by adding one and the other we get this very figure;
00:54:35
it turns out that we no longer read any unnecessary data in the query, we do
00:54:39
not
00:54:42
go over anything twice, so we are somewhere
00:54:44
close to the lower bound,
00:54:49
below which it will be difficult to get;
00:54:52
of course, some other methods remain,
00:54:54
but they already affect the setup of the entire
00:54:59
instance, not a single query;
00:55:02
you could, roughly speaking, try
00:55:08
them,
00:55:09
but, say, it is not clear they would help,
00:55:12
because a seq scan uses a buffer
00:55:15
ring; so these and
00:55:24
some other maneuvers remain to us,
00:55:28
but in fact each needs analysis, and that
00:55:31
is already more about tuning the server;
00:55:43
well, what I wanted to show is an
00:55:46
example of the
00:55:47
process of optimizing a query: we tried
00:55:51
different things and arrived at
00:55:53
an acceptable option; so,
00:55:57
summing up:
00:55:58
we have quite a lot of different
00:56:01
methods
00:56:02
by which we can influence how
00:56:05
queries are executed;
00:56:07
some methods are applicable in some
00:56:09
situations, some in other situations;
00:56:11
some methods
00:56:13
influence everything at once, that is, if we
00:56:16
change some parameter, we can
00:56:18
shift everything in all query plans, well,
00:56:21
ideally for the better, but they can also
00:56:25
affect other queries badly; local methods are
00:56:26
easier to apply; it is
00:56:31
difficult to give recipes, because you really
00:56:33
need to understand each situation:
00:56:35
what is applicable here and what is not;
00:56:39
so
00:56:40
nothing can replace common sense:
00:56:43
you need to analyze and think; we
00:56:49
have a practice exercise here
00:56:51
that suggests optimizing
00:56:54
another query; you can work on it and
00:56:59
look at it; it is a query with different
00:57:03
properties:
00:57:04
there we had a query that
00:57:06
iterates, and here one that
00:57:07
aggregates, and the entire query
00:57:10
should execute fairly quickly,
00:57:13
so the criteria are slightly different;
00:57:17
but that seems to be all

Description:

PostgreSQL 10. Query optimization (QPT). Topic 11: Optimization techniques. More details: https://postgrespro.ru/education/courses/QPT Course authors: Egor Rogov, Pavel Luzanov © Postgres Professional, 2019. https://postgrespro.ru/ A link to the site is required when using the materials.
