
Download "#3. Линейная модель. Понятие переобучения | Машинное обучение"

Similar videos from our catalog

  • #5. Уравнение гиперплоскости в задачах бинарной классификации | Машинное обучение (12:03, Channel: selfedu)
  • #6. Решение простой задачи бинарной классификации | Машинное обучение (12:57, Channel: selfedu)
  • #22. Вероятностная оценка качества моделей | Машинное обучение (9:08, Channel: selfedu)
  • #35. Агломеративная иерархическая кластеризация. Дендограмма | Машинное обучение (10:25, Channel: selfedu)
  • #36. Логические методы классификации | Машинное обучение (12:43, Channel: selfedu)
  • #1. Что такое машинное обучение? Обучающая выборка и признаковое пространство | Машинное обучение (12:01, Channel: selfedu)
  • #9. Пример использования SGD при бинарной классификации образов | Машинное обучение (9:22, Channel: selfedu)
  • #30. Методы парзеновского окна и потенциальных функций | Машинное обучение (10:48, Channel: selfedu)
  • #10. Оптимизаторы градиентных алгоритмов: RMSProp, AdaDelta, Adam, Nadam | Машинное обучение (14:57, Channel: selfedu)
  • #7. Функции потерь в задачах линейной бинарной классификации | Машинное обучение (10:02, Channel: selfedu)
Video tags

machine learning
machine learning python
machine learning from scratch
machine learning python lessons
what is machine learning
machine learning lectures
machine learning course
machine learning python from scratch
machine learning and artificial intelligence
artificial intelligence
artificial intelligence in python
machine learning engineer
machine learning tutorial
Subtitles (Russian)
00:00:01
Balakirev here, and we are continuing the course on machine learning. In the previous lesson we saw that the machine learning problem is in fact an optimization problem, namely, minimizing the mean empirical risk, which depends on the model and on the loss function. That is, we must find a model that minimizes this quality criterion. Mathematically, this can be written as follows: among all possible models we find the one at which the quality functional takes its smallest value, and this arg min is the optimal model. This is the most general formulation of the learning problem. If our model is parametric and depends on a vector θ of unknown parameters, the same formula can be rewritten so that the arg min is taken over the parameter vector θ; that is, we find the vector of parameters at which the quality functional is minimized, and it is this optimal model that is then used in production, that is, in a practical implementation. This is how we can briefly write down mathematically what we talked about in the previous lesson.
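In symbols, the two formulations described here can be written as follows (a standard restatement; the lesson's own slide notation is not reproduced verbatim):

    a^* = \arg\min_{a \in A} Q(a)                  % over the set of all models A
    \theta^* = \arg\min_{\theta} Q(a(x, \theta))   % over the parameters of one parametric family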
00:01:22
Let's go back to our simplest example, in which the target values were generated by a linear function plus Gaussian noise with zero mean and some variance. As we have already said, the optimal model for such a problem has a linear form; that is, the set of linear functions a(x) = k̂·x + b̂ is a parametric model depending on two parameters, k and b. These parameters are written with a hat because they are estimated from the training sample and in general contain some error, that is, they do not coincide exactly with the true parameters k and b. We can write the same model differently: as some parameter θ1 multiplied by the first feature plus some parameter θ2 multiplied by the second feature. Obviously, if we take k̂ as θ1 and b̂ as θ2, take simply x as the first feature and the constant 1 as the second feature, then we get the same model. That is, we can write this model in the general form a(x) = θ1·x1 + θ2·x2, and if we generalize this notation to n different features, we get a(x) = θ1·x1 + θ2·x2 + … + θn·xn. This formula defines all possible linear models, and the parameter vector θ contains n components.
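As a minimal illustration (hypothetical code with made-up parameter values, not taken from the lesson), such a model is just a dot product of the parameter vector with the feature vector:

    import numpy as np

    k_hat, b_hat = 2.0, -1.0          # assumed example values of the fitted parameters
    theta = np.array([k_hat, b_hat])  # theta_1 = k-hat, theta_2 = b-hat

    def linear_model(x, theta):
        """a(x) = theta_1 * x + theta_2 * 1, a linear model over the features [x, 1]."""
        features = np.array([x, 1.0])
        return theta @ features

    print(linear_model(3.0, theta))   # 2.0 * 3 - 1.0 = 5.0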
00:02:46
Why is such a model called linear? If we move into the feature space, the model a(x) defines a plane there, or more precisely a hyperplane, since the space is multidimensional, n-dimensional; so it is not simply a plane but a hyperplane, and the orientation of this hyperplane is determined precisely by the parameter vector θ. Now the question may arise: how can this hyperplane solve machine learning problems such as forecasting, classification, ranking and so on? I will tell you about this, and in particular, in this lesson we will see how such a linear model predicts numerical values, that is, how it solves a regression problem. Linear models are quite simple and well studied and, moreover, often lead to an acceptable result, so to begin with we will study how these linear models work in detail.
00:03:42
As an example, let's consider a linear regression problem in which the set of target values is generated by the sine function plus random additive noise. If we plot this function, it may look something like this; depending on the random values, the plotted points may shift a little, but in general the picture will be like this. Strictly speaking, we have only one feature here, x, because y depends only on x, and for each x we get a definite value of y. It would seem that in such a situation we could build a linear model with just one parameter θ, that is, a(x) = θ·x. But you understand that if we build such a model and try to approximate a set of points like this with a straight line, nothing good will come of it: the line will simply pass through the cloud of points, while here we clearly see a nonlinear dependence of y on x.
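Here is a small sketch of that failure (illustrative code; the noise level and the range of x are assumptions, not the lesson's exact values): a one-parameter straight line fitted by least squares cannot follow the sine.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 10.0, 100)
    y = np.sin(x) + rng.normal(0.0, 0.2, size=x.size)  # sin(x) plus Gaussian noise

    # The best single-parameter line a(x) = theta * x has a closed-form least-squares solution.
    theta = (x @ y) / (x @ x)
    mse = np.mean((y - theta * x) ** 2)
    print(f"theta = {theta:.3f}, mean squared error = {mse:.3f}")  # the error stays large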
00:04:44
Therefore, in order to remain within the framework of linear models when solving such problems, we use the following trick: we expand the original feature space with the features f0(x), which is just a constant (let it be 1 for simplicity); f1(x), which is simply the original feature x; then f2(x) = x², and so on, up to fn(x) = xⁿ. All of these features are linearly independent.
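A sketch of this expansion (one possible NumPy implementation, not necessarily the one used in the lesson):

    import numpy as np

    def poly_features(x, n):
        """Map each sample x to the feature vector [f0, ..., fn] = [1, x, x**2, ..., x**n]."""
        return np.vander(x, N=n + 1, increasing=True)

    x = np.array([1.0, 2.0, 3.0])
    print(poly_features(x, 3))  # one row per sample: [1, x, x^2, x^3]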
00:05:20
This is very important: whenever we expand our feature space by adding new features, we must always make sure that these features are linearly independent, that is, that none of them can be expressed through the others. Look at an example. Suppose we take f1(x) = x as the first feature, the same as before, and take x + 5 as the second feature. Then this second feature and the first will be linearly dependent; they differ only by the value 5. Indeed, the second feature can be expressed through the zeroth feature and the first as f2(x) = 5·f0(x) + f1(x): multiplying f0(x) = 1 by 5 gives the constant five, and adding f1(x) gives exactly this second feature. This is precisely a linear dependence, when one feature can be expressed as a linear combination of the others. Why is this bad? Put simply, such features do not add new information; they merely duplicate what already exists, which means they are simply superfluous.
00:06:34
space with the help of linearly
00:06:36
independent features, our model is a
00:06:39
linear mat or began to take
00:06:41
this form with us n plus one feature and,
00:06:44
accordingly, n + 1 weight coefficient,
00:06:47
that is, we find the vector of these
00:06:49
parameters from the training sample, so
00:06:52
this model from x described the
00:06:55
empirical data well, but in particular, the
00:06:57
vector of the outcome of the parameters can be
00:06:59
found using the least squares method in
00:07:01
this case, the mouth function I must be
00:07:03
quadratic, we form the
00:07:05
corresponding average empirical
00:07:07
risk, and then we differentiate everything according to
00:07:09
theta parameters and equate the result to
00:07:12
0, we will have a system of linear equations,
00:07:14
we solve it and get the corresponding
00:07:17
model, this is exactly how I acted when I
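A sketch of that least-squares step (an assumed implementation; the lesson's own code is not shown on this page):

    import numpy as np

    def fit_least_squares(x, y, degree):
        """Minimize the mean squared error over polynomial features by solving the
        linear system that arises when the gradient with respect to theta is set to zero."""
        F = np.vander(x, N=degree + 1, increasing=True)  # features 1, x, ..., x^degree
        theta, *_ = np.linalg.lstsq(F, y, rcond=None)    # numerically stable SVD-based solver
        return theta

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 10.0, 50)
    y = np.sin(x) + rng.normal(0.0, 0.2, size=x.size)
    print(fit_least_squares(x, y, degree=7))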
00:07:19
This is exactly how I acted when I built the various models approximating these empirical dependencies. When the degree was equal to one, it was a polynomial of the first degree, of the form θ1 + θ2·x, which is really just a straight line. When the degree was equal to two, one more term, x², was added, and we got a parabola approximating this data. As we then increase the degree of the polynomial, the curve begins to describe these empirical data, that is, the training sample, more and more accurately. So we can train this parametric model, and it would seem that we have just found a universal solution for regression problems: we can raise the polynomial to an arbitrarily large degree so that the model a(x) describes the experimental data ever more accurately, which is great, and accordingly it might predict subsequent values well. But alas, nature does not give up its positions so easily, and polynomials of high degree have one unpleasant property. Let me show it in the following example.
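The "ever more accurate fit" half of that story is easy to verify (a sketch with assumed data; x is rescaled to [0, 1] only to keep the high powers numerically representable):

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 10.0, 50)
    y = np.sin(x) + rng.normal(0.0, 0.2, size=x.size)

    for degree in (1, 2, 5, 10, 20):
        F = np.vander(x / 10.0, N=degree + 1, increasing=True)
        theta, *_ = np.linalg.lstsq(F, y, rcond=None)
        q = np.mean((F @ theta - y) ** 2)  # mean empirical risk on the training sample
        print(f"degree {degree:2d}: training risk {q:.4f}")  # typically shrinks as degree grows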
00:08:34
In the following example, suppose we have such a function and we approximate it with a linear model of this kind, a polynomial of degree 54, and we train this model a(x) not on the entire sample but only on every other sample. Look here: the red points are the part of the training sample that participated in finding the parameters θ, and the green points are the points where the model a(x) builds its forecast. At the beginning the forecasts are built quite well, but look at what happens as x increases: our forecast begins to diverge sharply. This is precisely the disease of polynomials of high degree: at the forecast points they begin to behave in a completely unpredictable way. This is a well-known fact, familiar to all mathematicians, and in relation to our problem it means that such a model will build poor forecasts at new points unknown to it. In other words, a polynomial of high degree describes the points of the training sample well, as can be seen on this graph, but is completely unsuitable for making forecasts at new points. With polynomials of lower degree these problems are usually absent.
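A sketch of that experiment (a hypothetical reconstruction; the true function, noise level and random seed are assumptions): the risk on the held-out "green" points is usually far larger than on the "red" training points.

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(0.0, 10.0, 50)
    y = np.sin(x) + rng.normal(0.0, 0.2, size=x.size)

    x_red, y_red = x[::2], y[::2]        # every other point: used for fitting
    x_green, y_green = x[1::2], y[1::2]  # the remaining points: used for forecasting

    degree = 54
    def features(t):
        # Rescale to [0, 1] so that t**54 stays representable; the model class is unchanged.
        return np.vander(t / 10.0, N=degree + 1, increasing=True)

    theta, *_ = np.linalg.lstsq(features(x_red), y_red, rcond=None)
    q_red = np.mean((features(x_red) @ theta - y_red) ** 2)
    q_green = np.mean((features(x_green) @ theta - y_green) ** 2)
    print(f"risk on red (training) points:   {q_red:.4f}")
    print(f"risk on green (forecast) points: {q_green:.4f}")  # typically much larger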
00:09:57
To demonstrate this, let me show you another graph. Look: we will build two models. The first, a1(x), will be computed from all the points of the function y(x), that is, it will use all the data in the training set. The second model, a2(x), will be exactly the same as the first, only its parameter vector θ will be computed from half of the points, taken every other one; that is, we select the red points, from which we compute the parameter vector θ, while at the green points this model makes forecasts. If for these two completely identical models we now compute the quality indicators, that is, the mean empirical risks Q1 and Q2, and plot them against the degree N of the polynomial, then look: once the degree of the polynomial becomes greater than about 40, the blue graph of Q2 begins to diverge from the red graph of Q1. What does this mean? Q2 belongs to the model that was trained on half of the points, and this divergence means poor quality of forecasting. Q1 does not go up only because that model was trained on all the points of the training set; if it were given points of the same nature that were not included in the training set, it would also go up, that is, it would also predict new data poorly. So, because we have two identical models a(x) that are trained slightly differently, we can see how the degree of the polynomial affects their ability to generalize: up to about the fortieth degree everything goes well, but then, as the degree increases, forecasts at new points begin to be built very poorly.
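A sketch of that comparison (a hypothetical reconstruction with assumed data): Q1 is measured for a model fitted on all points, Q2 for an identical model fitted on every other point, both evaluated over all points.

    import numpy as np

    rng = np.random.default_rng(3)
    x = np.linspace(0.0, 10.0, 100)
    y = np.sin(x) + rng.normal(0.0, 0.2, size=x.size)

    def fit(xs, ys, degree):
        F = np.vander(xs / 10.0, N=degree + 1, increasing=True)
        theta, *_ = np.linalg.lstsq(F, ys, rcond=None)
        return theta

    def risk(theta, degree):
        F = np.vander(x / 10.0, N=degree + 1, increasing=True)
        return np.mean((F @ theta - y) ** 2)

    for n in (10, 20, 30, 40, 50):
        q1 = risk(fit(x, y, n), n)            # a1: trained on all points
        q2 = risk(fit(x[::2], y[::2], n), n)  # a2: trained on every other point
        print(f"N={n:2d}  Q1={q1:.4f}  Q2={q2:.4f}")  # Q2 typically blows up at high N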
00:11:52
From the point of view of machine learning, this is a clear example of the effect that in Russian is called «переобучение» and in English "overfitting". Overfitting is a discrepancy between the found model and the law by which the data actually changes: the model goes and behaves one way, while, as you can see, the data changes in another, and this kind of discrepancy is precisely the effect of overfitting. That is, the model has adjusted itself well to all the data on which it was trained, yet behaves completely unpredictably on new data of the same nature. As a result, such an overfitted model does not have sufficient generalization ability and cannot be extended to an arbitrary set of data of the same nature as the training set, which is exactly what we want from our model: to apply it to new observations and obtain correct results.
00:12:51
That is, what matters here is not simply to draw a function through the empirical points, but to extract from them the underlying law, the law of nature. This is the main task of machine learning: to find a model a(x) that adequately describes the empirical data, fitting first the training sample and then working well on an arbitrary set of data. It must be said that absolutely any model a(x) found from empirical data is overfitted to one degree or another; it is impossible to get rid of this effect completely, since we will always, one way or another, adapt to the data in the training set. All we can do is minimize this effect and thereby increase the generalization ability of our model. In future lessons we will see how we can combat this overfitting effect and what approaches exist.
[music]

Description:

The concept of a linear model. An example of using linear models for regression problems. The polynomial linear model. Drawbacks of high-degree polynomials. Overfitting (переобучение). Info site: https://proproprogs.ru/ml Telegram channel: https://t.me/machine_learning_selfedu

