
Download "#17. Гауссовский байесовский классификатор | Машинное обучение"

Video tags
Bayesian classifier
Bayesian method
Bayesian approach
Bayesian inference
bayesian classification in data mining
bayesian classification
bayesian statistics
bayesian optimization
machine learning
machine learning python
machine learning from scratch
machine learning python lessons
machine learning lectures
machine learning course
artificial intelligence
artificial intelligence in python
machine learning tutorial
00:00:01
…Balakirev, and we are continuing the course on machine learning. In the previous lesson we looked at the naive Bayes classifier and generally made an introduction to the theory of the optimal Bayes classifier. In this lesson we will continue this topic and talk about the Gaussian Bayes classifier, that is, no longer the naive one, but the full-fledged Gaussian Bayes classifier.
00:00:25
Let's first assume that the set of objects in the training sample obeys the Gaussian (normal) distribution law; such a sample can look like this. Here we have, as it were, a two-dimensional feature space, and each point is an object of the training sample: these are the objects of class 1 and these are the objects of class 2.
00:00:46
Then, in accordance with Bayes' theorem, the inference should be built based on this formula. Here P(y) is the a priori probability of the appearance of class y, p(x|y) is the conditional distribution density, that is, the likelihood function linking inputs and outputs, and P(x) can be perceived as a kind of normalizing factor that does not depend on the class y.
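For reference, a plausible LaTeX reconstruction of the on-screen formula (the equation itself did not survive in the transcript):

    P(y \mid x) = \frac{P(y)\, p(x \mid y)}{P(x)}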
00:01:14
Since in our problem we assume that this sample obeys the Gaussian distribution, we can formally write this conditional probability distribution density in the form of a multidimensional Gaussian distribution. Here μ_y is the vector of mathematical expectations for the images x that belong to a strictly defined class, and Σ_y is the covariance matrix of the images that belong to a strictly defined class. The letter E is nothing more than the mathematical expectation operator: we take the mathematical expectation of the product of two vectors and in the end obtain the covariance matrix.
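In LaTeX, the density being described is presumably the standard multivariate normal:

    p(x \mid y) = \frac{1}{(2\pi)^{n/2} |\Sigma_y|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu_y)^T \Sigma_y^{-1} (x - \mu_y) \right), \qquad \Sigma_y = E\left[ (x - \mu_y)(x - \mu_y)^T \right]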
00:01:57
Under these conditions, the classification decision algorithm for the multidimensional Gaussian probability distribution density is written in exactly the same way as for the optimal Bayes classifier; the difference is purely formal, only the form of the conditional distribution density itself changes, and everything else remains the same. That is, at the level of such general mathematics absolutely nothing changes. Let me remind you that λ_y is the penalty that we impose on the model for the incorrect classification of class y.
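The decision rule, reconstructed from this description (a sketch consistent with the lesson, not copied from the video):

    a(x) = \arg\max_y \, \lambda_y P(y)\, p(x \mid y)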
00:02:32
Look: from this formula follows the special case of the naive Bayes classifier, when this covariance matrix is diagonal, that is, it takes a form like this: the variances stand on the diagonal and all the rest are 0. If the covariance matrix is exactly like this, then in the case of the multidimensional Gaussian distribution we automatically obtain the probability distribution density written as the product of the corresponding one-dimensional ones. Here x with the index i at the top is the i-th feature of the vector x, that is, of the image x, and with such a covariance matrix we get that the features are independent of each other.
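In LaTeX, the factorization presumably shown on screen:

    \Sigma_y = \mathrm{diag}(\sigma_1^2, \dots, \sigma_n^2) \;\Rightarrow\; p(x \mid y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\, \sigma_i} \exp\!\left( -\frac{(x^{(i)} - \mu_i)^2}{2\sigma_i^2} \right)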
00:03:13
But, as you understand, this assumption about the covariance matrix is a very strong, almost unrealistic assumption, hence the name naive Bayes classifier. Features in machine learning problems quite often turn out to be linearly dependent to some extent, and as we know from probability theory, the degree of linear dependence of any two random variables is determined by the covariance (this expression here), and by normalizing the covariance we obtain the correlation coefficient. The correlation coefficient varies in the range from minus one to one. If the correlation value is zero, then for Gaussian random variables this automatically means complete independence, and for other distributions, linear independence.
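In LaTeX, the normalization just mentioned:

    \rho = \frac{\mathrm{cov}(X_1, X_2)}{\sigma_1 \sigma_2}, \qquad -1 \le \rho \le 1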
00:04:01
If the correlation coefficient is equal to one or minus one, then the random variables are completely linearly dependent and will line up like this; in this case, from the value of one random variable it is possible to calculate the value of the second random variable, because they have a strict linear relationship. But in most practical problems the correlation coefficient in absolute value lies between zero and one, that is, the random variables in this case are only linearly dependent to some extent.
00:04:32
This means that for a certain value of one feature we can only indicate a certain range of values, a narrower range of values, for the second feature, and the use of the naive Bayes classifier in such a case may be questionable, especially if the correlation coefficient in absolute value is close to one. Good, but then the question is: how can we in practice take into account these covariance relationships between features and build not a naive but a full-fledged Bayes classifier? The good news is that in the case of a multidimensional Gaussian distribution this is relatively easy to do.
00:05:11
As you may have guessed, for this we just need to construct estimates of the mathematical expectation and of the covariance matrix from the training sample, for each class separately, and the formulas for calculating these estimates are generally simple and obvious: the mathematical expectation is estimated by the arithmetic mean, and the elements of the covariance matrix can be calculated in this way. There are theorems proving that these estimates correspond to the maximum likelihood estimates for normal Gaussian distributions, that is, these estimates will be adequate, and in the general case we cannot come up with anything better here.
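In LaTeX, the standard sample estimates for a class y with N_y training objects (my reconstruction of the formulas being referred to):

    \hat{\mu}_y = \frac{1}{N_y} \sum_{i=1}^{N_y} x_i, \qquad \hat{\Sigma}_y = \frac{1}{N_y} \sum_{i=1}^{N_y} (x_i - \hat{\mu}_y)(x_i - \hat{\mu}_y)^T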
00:05:43
Of course, for us to be able to trust these estimates, the number of objects of each class in the training set should be as large as possible, otherwise we simply won't collect sufficient statistics. It is believed that the minimum is 100 objects of one specific class, but it is desirable to have 1000 or more.
00:06:04
To better understand all this, let's apply this Gaussian Bayes classifier approach to a simulated binary classification problem, in which the training sample will be generated based on a bivariate normal distribution with the following parameters. The correlation coefficient and the variances for class 1 will be equal, respectively, to 0.8 and one; we take this vector of mathematical expectations, and the covariance matrix will be determined by these parameters. The same goes for the second class. In total we will generate a thousand images, that is, 1000 points for each class, and using this data we will then calculate these estimates of the mathematical expectation and the covariance matrix, and then apply the algorithm of the Gaussian Bayes classifier, which can be written like this.
00:06:57
This form we get when we take the natural logarithm of this optimal Bayes classifier, that is, we take the natural logarithm separately of this factor and of this factor, which represents the multidimensional probability distribution density. From the distribution density, the first term is singled out here, and the second term is what stands under the exponent (only the power remains), and in the end we come to this formula.
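A plausible LaTeX reconstruction of the resulting log-form rule (dropping the constant term that does not depend on y):

    a(x) = \arg\max_y \left[ \ln\!\big(\lambda_y P(y)\big) - \frac{1}{2} \ln |\Sigma_y| - \frac{1}{2} (x - \mu_y)^T \Sigma_y^{-1} (x - \mu_y) \right]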
00:07:26
Let's see how all this works in a Python program. It is quite simple; it will be posted on GitHub, and each of you can download it via the link under this video. At the beginning we import the necessary libraries, then we set the seed of the random number generator so that it produces the same values for all of you. Then we specify the variances and the correlation coefficient for the 1st class; here we have the vector of mathematical expectations and the covariance matrix, and the same for the second class. Then we model a thousand random variables of each class in accordance with the multivariate normal distribution, and we calculate the experimental estimates of the vector of mathematical expectations and of the covariance matrices, that is, one covariance matrix for class 1 and another covariance matrix for class 2.
00:08:17
Then we define the necessary parameters of the Gaussian Bayes classifier model: the probability of each class appearing and the penalty that we impose for incorrect classification (I took the same penalty everywhere). Then we write out this formula, the expression that stands under the argmax, and then we take the argmax for the input image x. Here we specify an arbitrary input image x with two components and calculate this formula for it, for class 1 and then for class 2; that is, for each class this vector of mathematical expectations and the covariance matrix will be different. Then we select the class whose probability is maximum, that is, this is the argmax. We display the class number in the console, and here we display the set of points that were generated. Let's run the program and see how it works.
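The actual machine_learning_17.py is in the author's GitHub repository linked in the description below; the following is only a minimal sketch of the same pipeline, where the mean vectors and variable names are my own illustrative assumptions, not the author's exact code:

    import numpy as np
    import matplotlib.pyplot as plt

    np.random.seed(0)  # fix the random seed so runs are reproducible

    # class parameters: correlation 0.8 and unit variances, as in the lesson;
    # the mean vectors are hypothetical
    r, D = 0.8, 1.0
    V = np.array([[D, r], [r, D]])  # covariance built from variance D and correlation r
    mean1, mean2 = np.array([1.0, -2.0]), np.array([1.0, 3.0])

    N = 1000  # points per class
    x1 = np.random.multivariate_normal(mean1, V, N)
    x2 = np.random.multivariate_normal(mean2, V, N)

    # experimental estimates of the mean vectors and covariance matrices
    mm1, mm2 = x1.mean(axis=0), x2.mean(axis=0)
    VV1, VV2 = np.cov(x1, rowvar=False), np.cov(x2, rowvar=False)

    Py = [0.5, 0.5]  # a priori class probabilities
    L = [1.0, 1.0]   # equal misclassification penalties

    def b(x, m, V, P, lam):
        # ln(lambda * P) - 0.5 * ln|V| - 0.5 * (x - m)^T V^{-1} (x - m)
        d = x - m
        return np.log(lam * P) - 0.5 * np.log(np.linalg.det(V)) - 0.5 * d @ np.linalg.inv(V) @ d

    x = np.array([0.0, -4.0])  # arbitrary input image with two components
    a = np.argmax([b(x, mm1, VV1, Py[0], L[0]), b(x, mm2, VV2, Py[1], L[1])])
    print(a)  # predicted class index: 0 or 1

    plt.scatter(x1[:, 0], x1[:, 1], s=10)
    plt.scatter(x2[:, 0], x2[:, 1], s=10)
    plt.show()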
00:09:10
Look at how the points are distributed. The class number we got is 0, since the input vector x here takes the values 0 and minus 4, that is, for X we take 0 and for Y we take -4, landing in the area of these blue dots, and the area of the blue dots is just the first class, which here has the number 0. Now let's put plus 4 instead of minus 4 and run the program again: our class has changed, that is, when we put 0 and plus 4 we ended up in the area of the orange dots, and accordingly the class number also changed. That is, the Gaussian Bayes classifier works.
00:09:52
But now let's take a closer look at how this Gaussian Bayes classifier differs from its naive implementation, when we consider the features independent. Of course, I already said that in this case, with the naive Bayes classifier, our covariance matrices take this form; but still, what does this actually give?
00:10:12
If you look again at the distribution of points of a Gaussian distribution with a correlation of 0.7, you can see the so-called scatter ellipse, and within this ellipse we can identify two principal axes, here marked in orange. The points are essentially distributed along these axes; this is how random variables behave in a Gaussian distribution. In the general case, the covariance matrix Σ that describes this behavior of the points can be represented in the form of such a spectral decomposition, where this is a matrix consisting of the eigenvectors of the matrix Σ, that is, of the covariance matrix (the eigenvectors are exactly the vectors that determine the directions of the principal axes of the scatter ellipse), and this diagonal matrix determines the variance of the scatter along each of the coordinates.
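In LaTeX, the spectral decomposition being described:

    \Sigma = Q \Lambda Q^T, \qquad \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)

where the columns of Q are the eigenvectors of Σ (the directions of the principal axes of the scatter ellipse) and the eigenvalues λ_i in Λ are the variances along those axes.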
00:11:07
In fact, this covariance matrix defines a linear transformation that maps an uncorrelated set of points into a correlated one, that is, into a set distributed approximately in this way. So, by writing the inverse of this covariance matrix into the classifier, we essentially move to a new space, to a new coordinate system, and in this new coordinate system the set of these points turns out to be uncorrelated. And here, in this new coordinate system, we process them in the usual naive way, that is, we simply sum them up and get the value. That is exactly how the Gaussian Bayes classifier works.
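A quick numerical illustration of that decorrelation idea (my own sketch, not code from the lesson):

    import numpy as np

    np.random.seed(0)
    V = np.array([[1.0, 0.7], [0.7, 1.0]])  # correlated covariance, rho = 0.7
    X = np.random.multivariate_normal([0, 0], V, 5000)

    w, Q = np.linalg.eigh(V)            # spectral decomposition: V = Q diag(w) Q^T
    Z = (X @ Q) / np.sqrt(w)            # rotate onto the principal axes and rescale
    print(np.cov(Z, rowvar=False).round(2))  # ~identity matrix: features are decorrelated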
00:12:00
We will talk in more detail about spectral decomposition, eigenvectors and eigenvalues in future classes, but I think the general principle of how Bayes classifiers with Gaussian distributions work is now clear to you.
[music]

Description:

The principle of construction and operation of the Gaussian Bayesian classifier in a multidimensional feature space, and how it differs from the naive Bayesian classifier. Info site: https://proproprogs.ru/ml Telegram channel: https://t.me/machine_learning_selfedu machine_learning_17.py: https://github.com/selfedu-rus/machine_learning
