
Download "#3. Линейная модель. Понятие переобучения | Машинное обучение"

Similar videos from our catalog

  • #5. Уравнение гиперплоскости в задачах бинарной классификации | Машинное обучение (12:03, Channel: selfedu)
  • #6. Решение простой задачи бинарной классификации | Машинное обучение (12:57, Channel: selfedu)
  • #22. Вероятностная оценка качества моделей | Машинное обучение (9:08, Channel: selfedu)
  • #35. Агломеративная иерархическая кластеризация. Дендограмма | Машинное обучение (10:25, Channel: selfedu)
  • #36. Логические методы классификации | Машинное обучение (12:43, Channel: selfedu)
  • #1. Что такое машинное обучение? Обучающая выборка и признаковое пространство | Машинное обучение (12:01, Channel: selfedu)
  • #9. Пример использования SGD при бинарной классификации образов | Машинное обучение (9:22, Channel: selfedu)
  • #30. Методы парзеновского окна и потенциальных функций | Машинное обучение (10:48, Channel: selfedu)
  • #10. Оптимизаторы градиентных алгоритмов: RMSProp, AdaDelta, Adam, Nadam | Машинное обучение (14:57, Channel: selfedu)
  • #7. Функции потерь в задачах линейной бинарной классификации | Машинное обучение (10:02, Channel: selfedu)
Video tags

machine learning
machine learning python
machine learning from scratch
machine learning python lessons
what is machine learning
machine learning lectures
machine learning course
machine learning python from scratch
machine learning and artificial intelligence
artificial intelligence
artificial intelligence in python
machine learning engineer
machine learning tutorial
Subtitles (Russian)
00:00:01
Balakirev here, and we are continuing the course on machine learning. In the previous lesson we saw that the machine learning problem is in fact an optimization problem, namely, minimizing the mean empirical risk, which depends on the model and on the loss function. That is, we must find a model that minimizes this quality criterion. Mathematically, this can be written as follows: among all possible models we find the one at which the quality functional takes its smallest value, and this arg min is the optimal model. This is the most general formulation of the learning problem. If our model is parametric and depends on a vector θ of unknown parameters, the same formula can be rewritten so that the arg min is taken over the parameter vector θ; that is, we find the vector of parameters at which the quality functional is minimized, and it is this optimal model that is then used in production, that is, in a practical implementation. This is how we can briefly write down mathematically what we talked about in the previous lesson.
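In symbols, the two formulations described here can be written as follows (a standard restatement; the lesson's own slide notation is not reproduced verbatim):

    a^* = \arg\min_{a \in A} Q(a)                  % over the set of all models A
    \theta^* = \arg\min_{\theta} Q(a(x, \theta))   % over the parameters of one parametric family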
00:01:22
Let's go back to our simplest example, in which the target values were generated by a linear function plus Gaussian noise with zero mean and some variance. As we have already said, the optimal model for such a problem has a linear form; that is, the set of linear functions a(x) = k̂·x + b̂ is a parametric model depending on two parameters, k and b. These parameters are written with a hat because they are estimated from the training sample and in general contain some error, that is, they do not coincide exactly with the true parameters k and b. We can write the same model differently: as some parameter θ1 multiplied by the first feature plus some parameter θ2 multiplied by the second feature. Obviously, if we take k̂ as θ1 and b̂ as θ2, take simply x as the first feature and the constant 1 as the second feature, then we get the same model. That is, we can write this model in the general form a(x) = θ1·x1 + θ2·x2, and if we generalize this notation to n different features, we get a(x) = θ1·x1 + θ2·x2 + … + θn·xn. This formula defines all possible linear models, and the parameter vector θ contains n components.
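As a minimal illustration (hypothetical code with made-up parameter values, not taken from the lesson), such a model is just a dot product of the parameter vector with the feature vector:

    import numpy as np

    k_hat, b_hat = 2.0, -1.0          # assumed example values of the fitted parameters
    theta = np.array([k_hat, b_hat])  # theta_1 = k-hat, theta_2 = b-hat

    def linear_model(x, theta):
        """a(x) = theta_1 * x + theta_2 * 1, a linear model over the features [x, 1]."""
        features = np.array([x, 1.0])
        return theta @ features

    print(linear_model(3.0, theta))   # 2.0 * 3 - 1.0 = 5.0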
00:02:46
Why is such a model called linear? If we move into the feature space, the model a(x) defines a plane there, or more precisely a hyperplane, since the space is multidimensional, n-dimensional; so it is not simply a plane but a hyperplane, and the orientation of this hyperplane is determined precisely by the parameter vector θ. Now the question may arise: how can this hyperplane solve machine learning problems such as forecasting, classification, ranking and so on? I will tell you about this, and in particular, in this lesson we will see how such a linear model predicts numerical values, that is, how it solves a regression problem. Linear models are quite simple and well studied and, moreover, often lead to an acceptable result, so to begin with we will study how these linear models work in detail.
00:03:42
As an example, let's consider a linear regression problem in which the set of target values is generated by the sine function plus random additive noise. If we plot this function, it may look something like this; depending on the random values, the plotted points may shift a little, but in general the picture will be like this. Strictly speaking, we have only one feature here, x, because y depends only on x, and for each x we get a definite value of y. It would seem that in such a situation we could build a linear model with just one parameter θ, that is, a(x) = θ·x. But you understand that if we build such a model and try to approximate a set of points like this with a straight line, nothing good will come of it: the line will simply pass through the cloud of points, while here we clearly see a nonlinear dependence of y on x.
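Here is a small sketch of that failure (illustrative code; the noise level and the range of x are assumptions, not the lesson's exact values): a one-parameter straight line fitted by least squares cannot follow the sine.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 10.0, 100)
    y = np.sin(x) + rng.normal(0.0, 0.2, size=x.size)  # sin(x) plus Gaussian noise

    # The best single-parameter line a(x) = theta * x has a closed-form least-squares solution.
    theta = (x @ y) / (x @ x)
    mse = np.mean((y - theta * x) ** 2)
    print(f"theta = {theta:.3f}, mean squared error = {mse:.3f}")  # the error stays large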
00:04:44
Therefore, in order to remain within the framework of linear models when solving such problems, we use the following trick: we expand the original feature space with the features f0(x), which is just a constant (let it be 1 for simplicity); f1(x), which is simply the original feature x; then f2(x) = x², and so on, up to fn(x) = xⁿ. All of these features are linearly independent.
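A sketch of this expansion (one possible NumPy implementation, not necessarily the one used in the lesson):

    import numpy as np

    def poly_features(x, n):
        """Map each sample x to the feature vector [f0, ..., fn] = [1, x, x**2, ..., x**n]."""
        return np.vander(x, N=n + 1, increasing=True)

    x = np.array([1.0, 2.0, 3.0])
    print(poly_features(x, 3))  # one row per sample: [1, x, x^2, x^3]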
00:05:20
This is very important: whenever we expand our feature space by adding new features, we must always make sure that these features are linearly independent, that is, that none of them can be expressed through the others. Look at an example. Suppose we take f1(x) = x as the first feature, the same as before, and take x + 5 as the second feature. Then this second feature and the first will be linearly dependent; they differ only by the value 5. Indeed, the second feature can be expressed through the zeroth feature and the first as f2(x) = 5·f0(x) + f1(x): multiplying f0(x) = 1 by 5 gives the constant five, and adding f1(x) gives exactly this second feature. This is precisely a linear dependence, when one feature can be expressed as a linear combination of the others. Why is this bad? Put simply, such features do not add new information; they merely duplicate what already exists, which means they are simply superfluous.
00:06:34
space with the help of linearly
00:06:36
independent features, our model is a
00:06:39
linear mat or began to take
00:06:41
this form with us n plus one feature and,
00:06:44
accordingly, n + 1 weight coefficient,
00:06:47
that is, we find the vector of these
00:06:49
parameters from the training sample, so
00:06:52
this model from x described the
00:06:55
empirical data well, but in particular, the
00:06:57
vector of the outcome of the parameters can be
00:06:59
found using the least squares method in
00:07:01
this case, the mouth function I must be
00:07:03
quadratic, we form the
00:07:05
corresponding average empirical
00:07:07
risk, and then we differentiate everything according to
00:07:09
theta parameters and equate the result to
00:07:12
0, we will have a system of linear equations,
00:07:14
we solve it and get the corresponding
00:07:17
model, this is exactly how I acted when I
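A sketch of that least-squares step (an assumed implementation; the lesson's own code is not shown on this page):

    import numpy as np

    def fit_least_squares(x, y, degree):
        """Minimize the mean squared error over polynomial features by solving the
        linear system that arises when the gradient with respect to theta is set to zero."""
        F = np.vander(x, N=degree + 1, increasing=True)  # features 1, x, ..., x^degree
        theta, *_ = np.linalg.lstsq(F, y, rcond=None)    # numerically stable SVD-based solver
        return theta

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 10.0, 50)
    y = np.sin(x) + rng.normal(0.0, 0.2, size=x.size)
    print(fit_least_squares(x, y, degree=7))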
00:07:19
This is exactly how I acted when I built the various models approximating these empirical dependencies. When the degree was equal to one, it was a polynomial of the first degree, of the form θ1 + θ2·x, which is really just a straight line. When the degree was equal to two, one more term, x², was added, and we got a parabola approximating this data. As we then increase the degree of the polynomial, the curve begins to describe these empirical data, that is, the training sample, more and more accurately. So we can train this parametric model, and it would seem that we have just found a universal solution for regression problems: we can raise the polynomial to an arbitrarily large degree so that the model a(x) describes the experimental data ever more accurately, which is great, and accordingly it might predict subsequent values well. But alas, nature does not give up its positions so easily, and polynomials of high degree have one unpleasant property. Let me show it in the following example.
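The "ever more accurate fit" half of that story is easy to verify (a sketch with assumed data; x is rescaled to [0, 1] only to keep the high powers numerically representable):

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 10.0, 50)
    y = np.sin(x) + rng.normal(0.0, 0.2, size=x.size)

    for degree in (1, 2, 5, 10, 20):
        F = np.vander(x / 10.0, N=degree + 1, increasing=True)
        theta, *_ = np.linalg.lstsq(F, y, rcond=None)
        q = np.mean((F @ theta - y) ** 2)  # mean empirical risk on the training sample
        print(f"degree {degree:2d}: training risk {q:.4f}")  # typically shrinks as degree grows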
00:08:34
In the following example, suppose we have such a function and we approximate it with a linear model of this kind, a polynomial of degree 54, and we train this model a(x) not on the entire sample but only on every other sample. Look here: the red points are the part of the training sample that participated in finding the parameters θ, and the green points are the points where the model a(x) builds its forecast. At the beginning the forecasts are built quite well, but look at what happens as x increases: our forecast begins to diverge sharply. This is precisely the disease of polynomials of high degree: at the forecast points they begin to behave in a completely unpredictable way. This is a well-known fact, familiar to all mathematicians, and in relation to our problem it means that such a model will build poor forecasts at new points unknown to it. In other words, a polynomial of high degree describes the points of the training sample well, as can be seen on this graph, but is completely unsuitable for making forecasts at new points. With polynomials of lower degree these problems are usually absent.
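A sketch of that experiment (a hypothetical reconstruction; the true function, noise level and random seed are assumptions): the risk on the held-out "green" points is usually far larger than on the "red" training points.

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(0.0, 10.0, 50)
    y = np.sin(x) + rng.normal(0.0, 0.2, size=x.size)

    x_red, y_red = x[::2], y[::2]        # every other point: used for fitting
    x_green, y_green = x[1::2], y[1::2]  # the remaining points: used for forecasting

    degree = 54
    def features(t):
        # Rescale to [0, 1] so that t**54 stays representable; the model class is unchanged.
        return np.vander(t / 10.0, N=degree + 1, increasing=True)

    theta, *_ = np.linalg.lstsq(features(x_red), y_red, rcond=None)
    q_red = np.mean((features(x_red) @ theta - y_red) ** 2)
    q_green = np.mean((features(x_green) @ theta - y_green) ** 2)
    print(f"risk on red (training) points:   {q_red:.4f}")
    print(f"risk on green (forecast) points: {q_green:.4f}")  # typically much larger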
00:09:57
To demonstrate this, let me show you another graph. Look: we will build two models. The first, a1(x), will be computed from all the points of the function y(x), that is, it will use all the data in the training set. The second model, a2(x), will be exactly the same as the first, only its parameter vector θ will be computed from half of the points, taken every other one; that is, we select the red points, from which we compute the parameter vector θ, while at the green points this model makes forecasts. If for these two completely identical models we now compute the quality indicators, that is, the mean empirical risks Q1 and Q2, and plot them against the degree N of the polynomial, then look: once the degree of the polynomial becomes greater than about 40, the blue graph of Q2 begins to diverge from the red graph of Q1. What does this mean? Q2 belongs to the model that was trained on half of the points, and this divergence means poor quality of forecasting. Q1 does not go up only because that model was trained on all the points of the training set; if it were given points of the same nature that were not included in the training set, it would also go up, that is, it would also predict new data poorly. So, because we have two identical models a(x) that are trained slightly differently, we can see how the degree of the polynomial affects their ability to generalize: up to about the fortieth degree everything goes well, but then, as the degree increases, forecasts at new points begin to be built very poorly.
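A sketch of that comparison (a hypothetical reconstruction with assumed data): Q1 is measured for a model fitted on all points, Q2 for an identical model fitted on every other point, both evaluated over all points.

    import numpy as np

    rng = np.random.default_rng(3)
    x = np.linspace(0.0, 10.0, 100)
    y = np.sin(x) + rng.normal(0.0, 0.2, size=x.size)

    def fit(xs, ys, degree):
        F = np.vander(xs / 10.0, N=degree + 1, increasing=True)
        theta, *_ = np.linalg.lstsq(F, ys, rcond=None)
        return theta

    def risk(theta, degree):
        F = np.vander(x / 10.0, N=degree + 1, increasing=True)
        return np.mean((F @ theta - y) ** 2)

    for n in (10, 20, 30, 40, 50):
        q1 = risk(fit(x, y, n), n)            # a1: trained on all points
        q2 = risk(fit(x[::2], y[::2], n), n)  # a2: trained on every other point
        print(f"N={n:2d}  Q1={q1:.4f}  Q2={q2:.4f}")  # Q2 typically blows up at high N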
00:11:52
From the point of view of machine learning, this is a clear example of the effect that in Russian is called «переобучение» and in English "overfitting". Overfitting is a discrepancy between the found model and the law by which the data actually changes: the model goes and behaves one way, while, as you can see, the data changes in another, and this kind of discrepancy is precisely the effect of overfitting. That is, the model has adjusted itself well to all the data on which it was trained, yet behaves completely unpredictably on new data of the same nature. As a result, such an overfitted model does not have sufficient generalization ability and cannot be extended to an arbitrary set of data of the same nature as the training set, which is exactly what we want from our model: to apply it to new observations and obtain correct results.
00:12:51
That is, what matters here is not simply to draw a function through the empirical points, but to extract from them the underlying law, the law of nature. This is the main task of machine learning: to find a model a(x) that adequately describes the empirical data, fitting first the training sample and then working well on an arbitrary set of data. It must be said that absolutely any model a(x) found from empirical data is overfitted to one degree or another; it is impossible to get rid of this effect completely, since we will always, one way or another, adapt to the data in the training set. All we can do is minimize this effect and thereby increase the generalization ability of our model. In future lessons we will see how we can combat this overfitting effect and what approaches exist.
[music]

Description:

The concept of a linear model. An example of using linear models for regression problems. The polynomial linear model. Drawbacks of high-degree polynomials. Overfitting (переобучение). Info site: https://proproprogs.ru/ml Telegram channel: https://t.me/machine_learning_selfedu

