Download "3.2 Расчет нагрузки на систему"

Video tags
Анатолий Карпов
analytics
machine learning
data science
SQL
databases
Python
pandas
visualization
career
analyst salary
courses
karpov courses
programming courses
analytics courses
statistics courses
ML
stepik
data engineer
roadmap
data engineering
career paths
data engineering career
IT career
working in IT
data science school
Subtitles: Russian
00:00:09
Hello everyone. The topic of today's lecture is
00:00:11
calculating the load on the system. In previous
00:00:14
lectures, we discussed the question of what
00:00:16
requirements can be placed on the system,
00:00:19
what the user can expect when
00:00:21
using it. Based on this, we can
00:00:24
imagine what load this can
00:00:26
cause on our system. For example, if we
00:00:29
stated the requirement that the user
00:00:31
can track an order, or, for example, a
00:00:33
taxi driver, in real
00:00:36
time, then it obviously follows that we
00:00:39
need to maintain some kind of persistent
00:00:41
connection or receive frequent
00:00:43
updates from some remote
00:00:45
server. If we have
00:00:48
millions or tens of millions of users, then
00:00:50
there will also be millions or
00:00:53
tens of millions of held connections. From here we can
00:00:55
conclude that once we have formulated some
00:00:58
requirements for our system, we can already
00:01:00
make some preliminary calculations
00:01:03
about what kind of load, in one respect or another,
00:01:06
this can lead to, even
00:01:09
before we engage in even the most
00:01:12
minimal implementation of the
00:01:14
application. We may also conclude
00:01:17
that this would cost us too
00:01:19
much and not start implementing
00:01:21
the system at all until we have relaxed,
00:01:24
for example, some of the requirements that we
00:01:26
formulated. Also, at this stage we
00:01:29
will be able to evaluate which
00:01:32
load characteristics will be important to us. For example, we
00:01:35
can understand that the read load
00:01:37
will be many times greater than the
00:01:39
write load: on a social network you
00:01:42
can publish several posts a
00:01:45
day while viewing dozens or
00:01:47
hundreds of posts from your friends or simply
00:01:49
some well-known people. So what
00:01:52
components will make up
00:01:54
the load on our system? Firstly,
00:01:56
Any system has some users,
00:01:59
so we can estimate based on
00:02:02
some of our requirements or wishes of
00:02:04
the customer's business: how many
00:02:06
users might, for example, come per
00:02:09
day, per month, and so on, or what
00:02:12
the maximum number of
00:02:13
simultaneous users might be. Based on
00:02:16
the requirements we placed on
00:02:18
our system, we can conclude what kind of
00:02:20
computing load this will lead to,
00:02:23
how much it will be necessary to
00:02:24
maintain connections with our server, for example,
00:02:28
and therefore we can understand
00:02:31
what load will be further for our
00:02:34
services that will already carry out
00:02:36
some kind of internal work, then we
00:02:38
can then estimate accordingly what the
00:02:40
computing load will be on the service,
00:02:43
from this we can conclude how many
00:02:45
instances we need or how
00:02:47
much hardware and what cost
00:02:50
will be needed to withstand
00:02:51
such an internal load also,
00:02:54
the user can most likely
00:02:55
transfer some data, for example,
00:02:57
upload some photos or
00:03:00
arbitrary files; from this will
00:03:02
obviously follow a load on
00:03:04
the storage used in
00:03:06
our system, be it just hard
00:03:09
drives in our hardware servers, or
00:03:12
some cloud instances, or simply
00:03:15
cloud storage, for example some kind of
00:03:17
Amazon S3 in which we can already
00:03:19
store our files, which means if we
00:03:22
know that the user uploads on
00:03:24
average such and such a number of files of
00:03:26
such and such a size, this translates into the
00:03:28
amount of
00:03:30
data stored on our
00:03:31
servers or in the cloud; and by estimating
00:03:34
the cost of servers or the cost of
00:03:36
cloud services, we can make
00:03:38
some estimate of how much it will cost us to
00:03:40
store all the data of our
00:03:42
users over some
00:03:44
time horizon. In total, the total load on the service
00:03:47
can be assessed by several
00:03:49
components. The first is
00:03:51
user traffic: that is, how many
00:03:52
total users our service has,
00:03:55
how many, for example, log in every day, or
00:03:57
how many simultaneous users we have;
00:04:00
then the activity of our users
00:04:02
leads to the fact that some kind of
00:04:04
network interaction occurs between, for example, a
00:04:06
client application on the phone and
00:04:09
our servers; this results in
00:04:12
some load on the network, that is,
00:04:14
network traffic, which
00:04:16
we simply measure in megabytes or
00:04:18
gigabytes per second, and also in how many
00:04:21
connections we will have.
00:04:23
Then, thirdly, depending on what
00:04:25
functionality our system
00:04:28
provides, we need to make
00:04:31
some estimates of how many
00:04:34
instances we will need
00:04:36
to process, for example, some
00:04:38
complex queries to a database. Or, for
00:04:40
example, if this is a service that
00:04:42
performs some functions based on
00:04:44
machine learning, then most likely We
00:04:47
need to estimate how many of these or other
00:04:48
servers we will need in order to
00:04:51
respond with such and such frequency to
00:04:53
user requests that require the launch of one
00:04:56
or another model; the fourth component
00:04:59
is storage; we estimate how
00:05:02
much the content generated by our
00:05:04
users will occupy on the disk. And
00:05:07
from this we can conclude how much it will
00:05:08
cost us to support such a
00:05:10
number of disks, as well as their purchase or the
00:05:13
use of some cloud services
00:05:16
for data storage. The first angle from
00:05:19
which we look at the load
00:05:21
on our service is
00:05:23
user traffic: here we
00:05:25
estimate how many total
00:05:27
users we can
00:05:30
register in our service, how many,
00:05:32
for example, come every month or
00:05:34
every day, or how many
00:05:37
users use our service
00:05:39
constantly. One can single out at least
00:05:42
several basic indicators that are
00:05:44
often used when analyzing the load
00:05:47
on a particular service. The first is the number of
00:05:50
users who log in every
00:05:52
month, or Monthly Active Users, which we
00:05:55
simply abbreviate to MAU. Next is the number of daily
00:05:58
users, that is, how many of them on average
00:06:00
log in every day: Daily
00:06:03
Active Users, or DAU. There is also the total
00:06:07
number of users On the horizon for a
00:06:08
certain amount of time because, for
00:06:11
example, if we store a
00:06:13
user profile, some information, a
00:06:15
photograph, and so on, then obviously we
00:06:17
will still have to store data about
00:06:19
all users who
00:06:20
have registered and have not deleted their
00:06:22
account, but In this case, we
00:06:25
will most likely calculate the load based on either the
00:06:28
monthly or daily,
00:06:30
taking into account also that from day to day
00:06:33
the load may change according to some
00:06:35
standard patterns, for example, due to the fact
00:06:38
that we have weekends, and on
00:06:40
weekends the load may be higher or,
00:06:43
on the contrary, lower, depending on the type of
00:06:45
service. Also, for each user,
00:06:48
we can estimate what is the average
00:06:50
amount of content he can create. Well,
00:06:53
rather, not even estimate, but assume, for
00:06:55
example, we are creating some kind of
00:06:57
microblogging platform, and based on the
00:07:00
analysis of some competing services,
00:07:02
we can assume that for example,
00:07:04
each user generates on average
00:07:06
three or four posts, attaching photographs to them,
00:07:09
we know what the
00:07:12
average size will be and then we can,
00:07:14
for example, from this estimate the load
00:07:17
on the storage, also starting from the most
00:07:20
basic indicators for assessing the
00:07:22
user load, we can
00:07:24
draw conclusions that will be useful already
00:07:26
in assessing other types of load on
00:07:29
our services, for example, based on
00:07:32
how many daily users we have,
00:07:34
making an assumption about how exactly
00:07:37
users use our service,
00:07:39
we can say that, for example, on
00:07:41
average, every second our users
00:07:44
will make so many requests to
00:07:46
view photos: that is, we estimate the number of
00:07:49
requests to some of our internal
00:07:51
services, or requests per second. That is,
00:07:54
this is the same RPS that is often
00:07:56
used as an indicator of the load on a
00:07:59
specific internal service.
00:08:02
Accordingly, we will be able to evaluate this indicator,
00:08:04
and then, when assessing, for example,
00:08:07
hardware requirements, we can understand how many
00:08:09
instances we will need to
00:08:11
handle the load that will
00:08:13
fall, for example, on reading pictures,
00:08:17
running some models, or accessing the
00:08:19
database. Also, if we make
00:08:22
an assumption about how many
00:08:25
different requests the user will make and how long
00:08:27
they will take to process, we can
00:08:29
draw a conclusion about how many
00:08:31
connections will be held, because
00:08:34
if we know that users make
00:08:37
1,000 requests per second, and each
00:08:39
user request is processed
00:08:41
in, for example, 100 milliseconds, then we can
00:08:43
conclude that we need to hold about
00:08:46
100 connections simultaneously, because
00:08:49
each user establishes a
00:08:50
connection, receives a response after 100 milliseconds,
00:08:52
and then,
00:08:54
accordingly, the connection is terminated.
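As a quick sketch of this arithmetic, the figures in the example above (1,000 rps, 100 ms per request) can be plugged into a minimal back-of-envelope helper; this is an illustrative script, not anything from the lecture's slides:

```python
# Back-of-envelope helpers for the estimates discussed above.
# Concurrency follows Little's law: connections = arrival rate x latency.

def rps_from_dau(dau: int, requests_per_user: float,
                 seconds_per_day: int = 86_400) -> float:
    """Average requests per second implied by daily active users."""
    return dau * requests_per_user / seconds_per_day

def concurrent_connections(rps: float, latency_s: float) -> float:
    """Average number of simultaneously held connections."""
    return rps * latency_s

# The example above: 1,000 rps, each request served in ~100 ms.
print(concurrent_connections(1_000, 0.100))  # -> 100.0 connections
```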
00:09:00
The number of connections held
00:09:02
is also a rather important parameter,
00:09:05
one that in older times was
00:09:08
actually quite limited; now this is
00:09:10
not such a big problem, but
00:09:12
still, do not forget that the
00:09:15
number of connections we can
00:09:16
hold may
00:09:21
also be limited.
00:09:23
Also, by making assumptions about the behavior of
00:09:26
our users and how
00:09:29
many active users we have, we
00:09:30
can understand what total amount of
00:09:33
content they will upload, for example, per
00:09:35
day or per month. Thereby, translating a
00:09:38
behavior pattern (the
00:09:40
user uploads so many
00:09:42
images and so many videos, and we know
00:09:44
how much space each takes up on average)
00:09:46
into numbers, we can understand how much new
00:09:49
space our users will begin to use
00:09:52
every day. Accordingly, we
00:09:54
can make predictions about how much
00:09:57
new storage space will be occupied
00:10:00
every day and every month, and based on
00:10:02
this we can estimate how much it will
00:10:04
cost us to support such a system, at
00:10:07
least in terms of used
00:10:09
storage or payment for some cloud
00:10:12
storage. Well, after all the calculations
00:10:15
carried out on user traffic,
00:10:17
we can accordingly draw a conclusion how many
00:10:19
network connections we will have,
00:10:21
how much load we will have by the
00:10:23
number of connections held,
00:10:26
how much data we use in
00:10:29
the storage, from this we can
00:10:31
conclude how much all this will cost us. At
00:10:33
some visible horizon, and then
00:10:35
conclude how profitable this story is in general.
00:10:38
That said, it is possible to go back and
00:10:41
re-evaluate what
00:10:43
capabilities we offer and what requirements our
00:10:45
users will place on us, and
00:10:47
possibly start with something smaller,
00:10:49
or move on to implementing
00:10:52
some other idea, another project. The
00:10:55
second component of the total load on our
00:10:57
service is the network load. The
00:11:00
network load, in turn, is
00:11:02
divided into two categories to which we
00:11:05
pay attention. The first is the number of
00:11:08
connections held, that is, how many
00:11:10
simultaneous requests we
00:11:12
process when we receive
00:11:15
requests from our users to our
00:11:17
servers, or how many
00:11:19
connections we hold between
00:11:21
different parts, between different instances of
00:11:24
our different
00:11:27
subsystems.
00:11:29
The second: based on what data users
00:11:32
transmit, we can estimate what
00:11:34
network traffic we generate in this case,
00:11:36
since network traffic naturally also
00:11:39
may not be free and may
00:11:41
even be quite expensive, and this can be a
00:11:44
very important part of the final bill
00:11:47
when we estimate how much
00:11:49
supporting this or that service
00:11:51
will cost us. The resulting network load
00:11:54
can be estimated based on what
00:11:57
load we have on the service, that is, how many
00:11:59
requests we have per second; we know
00:12:02
our average response time, and
00:12:04
based on this, as before, we can
00:12:06
estimate how many
00:12:08
connections are kept open at the same time.
00:12:10
We can also understand,
00:12:13
based on what data
00:12:15
users transmit, how much they transmit on
00:12:17
average per day, per minute, or per
00:12:20
second, and this means we can understand what
00:12:22
network traffic will in this case
00:12:24
arise either between parts of the system
00:12:27
or between the user and some
00:12:29
external
00:12:32
system.
00:12:51
Firstly, in terms of the number of
00:12:54
connections: until some time
00:12:56
ago there was such a problem as C10k,
00:13:00
holding 10,000 connections simultaneously on
00:13:03
one small machine. But of
00:13:05
course, in our time this is no longer a
00:13:07
question; moreover, there exist technologies
00:13:10
that support holding
00:13:12
millions and even 10 million connections. You can
00:13:16
find articles from some large
00:13:18
technology companies that
00:13:20
show how this can be achieved, but
00:13:23
most likely some reasonable
00:13:25
number of connections, like 10,000-100,000, we
00:13:28
can easily hold even on one
00:13:29
instance without switching to some very
00:13:32
extravagant
00:13:33
framework. In terms of network
00:13:36
traffic. If you transfer data between
00:13:39
some cloud instances and
00:13:41
users, then most likely your
00:13:44
provider provides a
00:13:46
connection 1 Gbit wide. If you use
00:13:49
hardware servers, then most
00:13:52
likely, if these are copper cables, they
00:13:55
support up to 10 Gbit per second. If
00:13:58
this is some kind of optical fiber, then usually
00:14:00
it is up to 40 Gbit per second. But if you
00:14:04
run some kind of super-intensive
00:14:06
network computing between different
00:14:09
computers in a cluster, then there are
00:14:12
technologies such as InfiniBand that
00:14:14
support communication between different
00:14:16
servers at up to 100 Gbit per second and even
00:14:20
more, after we have assessed the network
00:14:23
load, we can make an assumption
00:14:25
about how much money we will have to
00:14:28
spend to withstand such a network load,
00:14:31
if we look at several
00:14:33
different cloud providers, we can
00:14:36
conclude that transferring traffic
00:14:38
within, for example, one
00:14:41
location or a certain region
00:14:44
usually costs around 1 cent per gigabyte, while
00:14:47
data transfer between different
00:14:50
continents in the worst case
00:14:52
will cost up to 10 cents or even about a dollar
00:14:56
per gigabyte of traffic. So later, when
00:14:59
we evaluate the load of
00:15:01
certain hypothetical services, we will
00:15:03
make the assumption that the cost of
00:15:05
traffic is 10 cents to 1
00:15:09
dollar per gigabyte. The next class of
00:15:11
load on our services is the
00:15:16
computing load. It would seem that
00:15:17
estimating the computing load on a
00:15:21
service without doing even the slightest
00:15:23
software implementation of certain
00:15:26
features would be the most difficult. But if you
00:15:28
look at various open data
00:15:31
about what load
00:15:34
some average instance can withstand,
00:15:36
for example in the cloud or on some
00:15:38
popular hardware, depending on the
00:15:40
framework used, we
00:15:42
can make at least an approximate
00:15:45
guess about what load
00:15:48
will occur on our service, and therefore
00:15:50
how many of these or other instances we
00:15:54
will most likely need to order to
00:15:56
hold, for example, a certain number of
00:15:59
connections, or respond to some
00:16:01
simple requests, or to some more
00:16:04
complex requests that include
00:16:06
changes to, for example, some tabular
00:16:08
data.
00:16:08
We can get at least some approximate figures by looking at
00:16:11
various open sites that
00:16:14
actually publish measurements of
00:16:16
different types of load on different
00:16:18
frameworks and on different cloud instances.
00:16:21
For example, the slide shows the result of
00:16:24
one such measurement, from the TechEmpower
00:16:27
benchmarks website, in 2021: you can
00:16:31
see that when using a
00:16:33
cloud instance on
00:16:35
Azure and using different
00:16:38
frameworks, then on average, for responses
00:16:48
transmitted in JSON format, it turned out that
00:16:52
such instances gave us 70,000
00:16:55
responses per second; if
00:16:58
a database query was involved, then it
00:17:01
was about 30,000 per second; if there were
00:17:04
several records,
00:17:05
several complex queries, then,
00:17:08
understandably,
00:17:10
the performance dropped significantly; if
00:17:12
data updates were made,
00:17:14
then the RPS was even less, about a
00:17:17
thousand or two thousand. And if all the service
00:17:20
needed to do was respond with some
00:17:22
super simple text response, then
00:17:26
the sustained load was in the hundreds of thousands; as
00:17:29
shown here, we could respond to
00:17:32
400,000 such requests per second. So, for
00:17:35
example, knowing that our service
00:17:39
supports updating the
00:17:41
user profile, or receiving some
00:17:44
complex algorithmic news feed,
00:17:47
or messages from other
00:17:49
users, we can, for example, draw a
00:17:53
conclusion from such estimates that we will
00:17:54
need, for such and such a
00:17:56
number of users, at least
00:17:58
such and such a number of instances to
00:18:00
support our computing load.
00:18:03
And just as in the case of network load,
00:18:06
we can work out the number of
00:18:08
servers we need to order; then we can
00:18:10
estimate how much it will cost us
00:18:13
and understand whether the requirements
00:18:15
that we set for our service are realistic,
00:18:15
or whether, for example, the capabilities
00:18:20
that we wanted to include in our
00:18:23
system are realistic. If you look at the specific
00:18:25
frameworks that are listed on such
00:18:28
sites and take some of the most
00:18:30
popular ones, you can see that,
00:18:32
for example, for Flask, one of the
00:18:35
popular simple frameworks for
00:18:38
building web services in Python (that
00:18:40
is, services that will be
00:18:43
used somewhere internally in our system),
00:18:46
most likely an average instance
00:18:49
will give us 10,000 responses to
00:18:52
simple queries, 5,000 if we
00:18:56
read some data from databases, and 1,000
00:18:58
rps if we
00:19:00
write new
00:19:02
data to our databases. For a more
00:19:02
productive and modern
00:19:09
HTTP framework based on
00:19:11
asynchrony with the help of coroutines, which
00:19:13
Python has recently supported quite
00:19:16
natively and simply, we can
00:19:19
build on the fact that for simple
00:19:21
basic queries returning plain
00:19:23
text, we will have a computing
00:19:27
capacity of up to 30,000 queries per second;
00:19:29
if we read data from the
00:19:32
database, then up to 10,000 requests per second; and
00:19:35
some data will be
00:19:37
updated at a rate of about 1,000
00:19:37
requests per second. If you look at
00:19:43
the data using not a cloud instance on
00:19:45
Azure but some real-life
00:19:47
hardware, you can see that, of course,
00:19:50
these indicators will be higher, and for the
00:19:53
asynchronous aiohttp framework that
00:19:54
we just considered, these indicators
00:19:57
grow to more than hundreds of thousands of
00:19:59
requests per second for simple text
00:20:02
data; tens of thousands, almost 100,000,
00:20:05
requests we will be able to process to
00:20:08
receive some data from the databases; and
00:20:10
we will be able to process about
00:20:12
5,000 requests per second to update
00:20:15
our data, for example user data
00:20:18
or some other data, in our databases.
00:20:18
Based on the analysis of the results of such
00:20:23
framework testing, we can,
00:20:26
for a very basic
00:20:28
assessment of the computing load on our
00:20:32
services, proceed from the following: if we use
00:20:34
some simple cloud instances, we
00:20:36
can process up to 100,000 requests per
00:20:39
second when serving some simple
00:20:42
text data, up to 10,000 requests per second
00:20:45
when
00:20:47
getting data from the database, and up to
00:20:49
1,000 requests per second when changing
00:20:51
data in the database. If we use
00:20:53
real physical
00:20:55
servers instead, then we can estimate that
00:20:58
all these indicators will be roughly
00:21:01
five times higher, that is, up to 500,000
00:21:03
requests per second with simple text data,
00:21:06
up to 50,000 requests per second for reading from the
00:21:09
database, and up to 5,000 requests for
00:21:12
making changes to some
00:21:15
existing data in the tables.
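These baseline figures can be collected into a small calculator for estimating instance counts; a minimal sketch, where the per-instance numbers are the lecture's ballpark values, treated as assumptions rather than benchmark results:

```python
import math

# Ballpark per-instance throughput (requests/second), as assumed above.
CLOUD_RPS = {"plaintext": 100_000, "db_read": 10_000, "db_write": 1_000}
HARDWARE_RPS = {k: 5 * v for k, v in CLOUD_RPS.items()}  # ~5x faster

def instances_needed(load_rps: float, kind: str, on_hardware: bool = False) -> int:
    """Minimum number of instances to sustain `load_rps` of the given kind."""
    capacity = (HARDWARE_RPS if on_hardware else CLOUD_RPS)[kind]
    return math.ceil(load_rps / capacity)

print(instances_needed(25_000, "db_read"))         # -> 3 cloud instances
print(instances_needed(25_000, "db_read", True))   # -> 1 hardware server
```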
00:21:15
Of course, everything will depend on what
00:21:17
specific load we will have, what
00:21:20
load peaks may be included, how
00:21:22
it will be distributed. Based on this,
00:21:25
we may need to provision even more
00:21:27
than we expected, but
00:21:29
accordingly, this can be understood
00:21:31
after a more detailed immersion in
00:21:34
various scenarios using our
00:21:36
system. Well, the last type of load
00:21:38
that we wanted to analyze is
00:21:40
the load on the storage, that is, based on the
00:21:43
fact that the user, for example, downloads
00:21:45
some photos, publishes some
00:21:47
information, we can estimate how much load
00:21:50
they will generate on average,
00:21:52
for example per day, per month, or over a horizon of a
00:21:55
certain number of years, and estimate how
00:21:58
many disks of one kind or another we
00:22:01
will need to store all this
00:22:02
information, as well as how many disks we may
00:22:05
need to cope with
00:22:07
some instantaneous
00:22:09
read or
00:22:10
write load. We will proceed from the fact that
00:22:13
there are several popular types of
00:22:15
hardware storage that we can
00:22:17
use. The first type of storage
00:22:20
that we will consider is
00:22:22
hard drives, or HDDs. These are drives
00:22:25
that have a very slow
00:22:28
random read speed, and even for
00:22:30
sequential reads and writes their
00:22:32
speed usually does not exceed several
00:22:34
hundred megabytes per second. It is easy to find
00:22:38
widely available disks that store up to
00:22:40
20 TB at a cost of up to $500;
00:22:44
thus hard drives are the
00:22:46
cheapest way to store
00:22:48
information. The next type is the SSD; I
00:22:54
indicated right away NVMe SSDs
00:22:57
(there are also SATA SSDs, which are
00:22:59
limited to roughly the same
00:23:03
speed as hard drives). NVMe SSDs
00:23:06
reach speeds of up to 3.5 GB per second when
00:23:09
connected over PCI Express version 3, and
00:23:11
more modern PCI Express
00:23:14
versions 4 or 5 support
00:23:17
sequential read speeds of up to 7-10
00:23:20
or even more gigabytes per second. It is
00:23:22
easy to find enterprise
00:23:25
disks that store up to 8 TB and
00:23:28
cost up to 2,000 dollars. That is,
00:23:28
this is a faster but at the same time more
00:23:30
expensive way of storing information and it is
00:23:33
often used to store
00:23:37
more operational information, or information
00:23:39
that we need to retrieve more quickly,
00:23:41
whereas hard drives are used to
00:23:43
store some more long-term
00:23:46
information that is accessed by our
00:23:49
services less often, for example, and the third way to
00:23:52
store information, which is not even
00:23:54
so obvious, is the RAM
00:23:57
of our computers
00:23:58
or, accordingly, on servers we can
00:24:00
usually install up to one or
00:24:03
maybe 2 TB in total RAM, the
00:24:06
speed of accessing
00:24:07
RAM will reach 50 and
00:24:10
maybe even 100 GB per second on the most
00:24:12
modern systems, and the
00:24:15
cost of one 128 GB stick will be
00:24:19
about $1000. That is, we can
00:24:21
say that this is generally the most expensive
00:24:23
way of storing data in a computer. There is a
00:24:25
disadvantage that after
00:24:28
our server is turned off, all
00:24:30
the information stored in
00:24:32
RAM actually disappears, so
00:24:35
usually such storage is used to
00:24:38
store some super-
00:24:40
relevant information, for example, we can
00:24:42
use RAM to
00:24:44
store a cache of some data
00:24:47
which our system accesses
00:24:48
most often and, accordingly, we can
00:24:50
get it the fastest, for example, this is
00:24:53
information about which users
00:24:55
we have online, or for example,
00:24:57
if we use some service
00:25:00
that displays a feed of the latest posts
00:25:03
for some popular users, then
00:25:05
for those popular users
00:25:07
we can precompute the finished feed
00:25:09
and store it in RAM, and when
00:25:12
other users access it
00:25:13
to get, for example, the latest tweets of
00:25:16
some famous figure, then this
00:25:20
information will be obtained especially
00:25:22
quickly without the need to resort to
00:25:25
some slower means of
00:25:27
storing information, if assessed based
00:25:31
on available inventory What is the average
00:25:33
price for storing this or
00:25:36
that type of information on this or that
00:25:39
type of media? We can conclude that on
00:25:41
average storing 1 TB of data will cost
00:25:45
us $10,000 in
00:25:47
RAM up to $300 on solid-state
00:25:51
drives or SSDs and up to $30 on
00:25:54
hard drives or HDDs, while
00:25:58
one large server most likely
00:25:59
stores or uses up to 1 TB of
00:26:04
RAM up to 50B. That is, if we
00:26:07
use eight terabytes of nme SSDs,
00:26:10
put eight or six of them there, then
00:26:13
we will have on average somewhere... then up to 50 TB
00:26:16
or we can use a dozen,
00:26:19
for example, hard drives or even maybe
00:26:21
a little more, but we will proceed from the fact
00:26:23
that one server supports storage of up to
00:26:26
200 TB when using hard
00:26:30
drives. If we have a lot of drives, we
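A minimal sketch comparing these storage tiers, using the ballpark prices and capacities just stated (assumptions, not vendor quotes):

```python
import math

# Ballpark storage-tier figures from the lecture (rough assumptions).
# cost: USD per TB stored; speed: sequential throughput, GB/s;
# per_server: max capacity of one large server, TB.
TIERS = {
    "RAM": {"cost_per_tb": 10_000, "speed_gb_s": 50, "per_server_tb": 1},
    "SSD": {"cost_per_tb": 300, "speed_gb_s": 5, "per_server_tb": 50},
    "HDD": {"cost_per_tb": 30, "speed_gb_s": 0.3, "per_server_tb": 200},
}

def storage_cost(tb: float, tier: str) -> float:
    """Hardware cost (USD) of keeping `tb` terabytes on the given tier."""
    return tb * TIERS[tier]["cost_per_tb"]

def servers_for(tb: float, tier: str) -> int:
    """How many servers are needed just to house `tb` terabytes."""
    return math.ceil(tb / TIERS[tier]["per_server_tb"])

print(storage_cost(10_000, "HDD"))  # 10 PB on HDD -> $300,000
print(servers_for(10_000, "HDD"))   # -> 50 servers
```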
00:26:33
When we have a lot of disks, we should not forget that they can
00:26:37
fail: if we have
00:26:40
thousands or tens of thousands of disks, then, given
00:26:42
that even modern, reliable
00:26:45
hard disks have an average
00:26:48
failure rate of about 1% per year, if you
00:26:49
use, for example,
00:26:53
10,000 disks, then within a year
00:26:55
at least 100 of them will fail. So don't
00:26:58
forget that we need a reserve of disks and storage,
00:27:00
and in general, don't forget about
00:27:03
duplication and, accordingly, creating
00:27:05
backups for important information; this
00:27:08
applies to any system, even
00:27:10
your home ones. And the last thing that can be
00:27:10
added on types of load and the various
00:27:15
reference numbers that we can
00:27:17
use as a starting point when
00:27:20
assessing one type of load or another on
00:27:23
our system: there is useful information
00:27:26
given on the slide.
00:27:28
One of Google's engineers
00:27:31
published a post in 2011 in which he
00:27:34
listed the main types of latency that
00:27:36
arise in a system when using
00:27:39
one type of storage medium or another:
00:27:39
for example, how much time a
00:27:42
computer spends to get
00:27:43
information stored in the
00:27:45
processor cache, to get information
00:27:47
stored on a hard drive, or
00:27:49
to get information that needs to be
00:27:51
transferred between different servers in one
00:27:54
location, or even between servers
00:27:57
in different parts of the world, on different
00:28:00
continents. Therefore, we should not forget
00:28:02
that such an additional contribution to the
00:28:05
responsiveness of our system
00:28:07
also exists. The same engineer also
00:28:11
provided a graphical representation
00:28:13
to make clear that some types of
00:28:16
data access differ
00:28:19
by tens and hundreds of times. We should not forget
00:28:22
that when you want to transfer some
00:28:24
data, and you have an idea whereby in
00:28:27
real time all your
00:28:29
users would exchange information
00:28:30
with each other, or you want to create some
00:28:33
application with online chat for all
00:28:35
users around the world, do not
00:28:37
forget that the transfer
00:28:39
of data between these users, or
00:28:41
between users and our servers,
00:28:43
will also take some time. Well, in the
00:28:47
next lectures we will discuss how
00:28:49
to deal with the fact that we want to make our
00:28:52
service either super fast, or such that
00:28:55
all users see the same thing at
00:28:58
different ends of the world at the same time, and
00:29:01
how realistic it is to satisfy all these
00:29:04
wishes and add
00:29:07
reliability and other qualities on top. Summarizing
00:29:11
the above about
00:29:13
different types of load and about assessing what
00:29:16
load certain
00:29:19
scenarios for using our system can cause: we
00:29:22
will start from different baseline
00:29:24
indicators for different types of load. On the
00:29:26
traffic user side, we will
00:29:28
evaluate how many active
00:29:30
users we have every month, how many
00:29:32
active users we have every day, what
00:29:34
the scenarios for using our service are among
00:29:36
our users, and how much of this or that content they
00:29:38
generate.
00:29:41
Thus, we will be able to understand what
00:29:43
load we will have on the network, on computing,
00:29:46
and on storage. It will also be useful to understand
00:29:49
the ratio between certain
00:29:52
types of load based on
00:29:53
user scenarios: for example, for
00:29:55
some
00:29:56
microblog,
00:29:58
most likely the read load will be several
00:29:59
times higher than the write load. For network
00:30:03
connections, we know that holding even
00:30:05
10,000-100,000 connections on one instance has
00:30:08
long been a non-issue, so we can
00:30:11
estimate how many instances would be required
00:30:13
for some super-massive load and
00:30:16
some super-large number of
00:30:18
connections, but most likely this will not be the
00:30:20
main bottleneck of our
00:30:23
system on the network load side;
00:30:25
most likely, the number of
00:30:28
connections we need
00:30:31
to hold will not be the main
00:30:33
bottleneck in
00:30:35
our system.
00:30:37
We also know that even cloud
00:30:40
instances provide us with a channel of up to 1 Gbit per
00:30:43
second for communication between different
00:30:46
parts of our system. Also, when
00:30:49
generating some network traffic, we
00:30:51
will most likely pay no more than
00:30:53
10 cents per gigabyte of network traffic
00:30:56
between parts of our system or
00:30:59
between users and our system. In
00:31:02
terms of computing load, we
00:31:04
know that a simple cloud
00:31:07
instance will cope with a load of
00:31:10
100,000 responses per second with some simple
00:31:12
text data; we will be able to process
00:31:14
up to 10,000 requests per second to obtain some
00:31:18
simple data from tables, as well as up to
00:31:21
1,000 requests per second to update
00:31:24
some more complex data in tables.
00:31:26
From a storage point of view, we know that
00:31:28
we have HDD hard drives that
00:31:31
allow us to write or read up to
00:31:35
300 MB per second; at the same time, they cost
00:31:38
us $30 to store 1 TB of
00:31:40
data. There are solid-state drives, or
00:31:43
SSDs, that allow us to transfer
00:31:46
data at a speed of about 5 GB per second, and
00:31:49
they cost us about 300 dollars per
00:31:51
terabyte. And there is also RAM
00:31:54
for storing some current
00:31:56
operational data,
00:31:57
for example a cache of
00:31:59
some frequently used data; the
00:32:02
speed will be around 50 gigabytes per
00:32:05
second, but the cost of such
00:32:08
storage is up to 10,000 dollars
00:32:11
per terabyte. We also assume
00:32:14
that in one physical
00:32:16
machine we will have up to 1 TB of
00:32:19
RAM,
00:32:23
that up to 50 TB can be stored on
00:32:25
several
00:32:27
solid-state
00:32:31
drives
00:32:34
or
00:32:36
SSDs, and that we can store 200 or even
00:32:39
more terabytes using hard drives
00:32:42
or HDDs. Let's now try to estimate the
00:32:45
load on the system using the example of
00:32:48
some well-known services. The first
00:32:50
example is a link shortener, quite a
00:32:53
simple service. Such services were often
00:32:56
used in various social networks
00:32:58
that had a limit on message length in
00:33:00
characters: if you wanted to
00:33:03
share some link which, for
00:33:05
example, could contain several hundred
00:33:07
characters, such links either
00:33:10
did not fit in messages at all or
00:33:12
took up too much space, which is why
00:33:15
services appeared, such as TinyURL or
00:33:18
Bitly, that allowed you to submit
00:33:20
some long link and get
00:33:22
a new short URL for it, which you
00:33:26
could then insert into your message.
00:33:29
To estimate what load our
00:33:31
service will have, we can make an
00:33:33
assumption about how many
00:33:35
users we have and what their
00:33:38
behavior pattern is. For example, we can
00:33:40
assume that we carry out
00:33:43
hundreds of millions of shortenings every month:
00:33:45
say, the number of monthly
00:33:47
users is 10 million, every day
00:33:51
we have 1 million active
00:33:53
users, and on average each person
00:33:56
makes three shortenings; multiplying,
00:33:58
we get our estimate of roughly 100 million. And
00:34:01
since this is a service for
00:34:03
publishing links on a social network,
00:34:06
each link will then be
00:34:09
accessed and clicked on much
00:34:11
more often than it was written (we wrote it
00:34:15
once), so we can assume, for
00:34:17
example, that the read-to-write ratio
00:34:19
in our service is
00:34:21
100 to 1. Since we create
00:34:25
100 million links every month and
00:34:28
have a read-to-write ratio of 100 to 1, we
00:34:31
can assume that we
00:34:33
have 10 billion calls per month to
00:34:36
our service to obtain the actual
00:34:38
full link from the
00:34:41
short link identifier.
00:34:44
load on creating records, we need to
00:34:46
divide 100 million records created per
00:34:50
month by the number of seconds in a month,
00:34:53
we can round up that in one day we have
00:34:55
100,000 seconds, although in fact there
00:34:57
are
00:35:06
86,000 seconds, and if we divide 100 million
00:35:09
records per month, then this will be 30-something
00:35:12
rps Or we can assume that Well, let’s say no
00:35:15
more than 40 rps, we have a load on creating
00:35:18
records, and since the load on reading is
00:35:21
100 times greater, then
00:35:23
accordingly we have a 4,000 rps load on reading.
00:35:27
Moreover, each record, for example,
00:35:30
fits in 1 KB: since a
00:35:33
simple record is just the original
00:35:37
link, the shortened link, and, say,
00:35:39
some metadata such as the user ID
00:35:42
or the creation time,
00:35:44
this is a perfectly
00:35:47
reasonable assumption, as we only
00:35:49
need 500-1,000 characters.
00:35:53
Based on this, if we simply
00:35:55
multiply by the record size, we
00:35:58
get the generated network traffic,
00:36:00
which will be 4 MB per second,
00:36:03
or 32 megabits per second. If we look at a
00:36:06
horizon of 5 years: since we know that
00:36:09
100 million new records are created every month,
00:36:12
there are 12 months in a year, and we
00:36:15
multiply this by 5 years, we get that we need to
00:36:17
create and store 6 billion records; and
00:36:21
since each of them takes up no
00:36:23
more than 1 KB, we conclude that we
00:36:26
need to store only 6 TB of information.
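These link-shortener numbers are easy to reproduce with a small script; a sketch under the assumptions just stated (100M shortenings per month, 100:1 reads to writes, ~1 KB per record, a day rounded to 100,000 seconds):

```python
# Back-of-envelope estimate for the URL-shortener example.
SECONDS_PER_DAY = 100_000     # actually 86,400; rounded up for easy math
DAYS_PER_MONTH = 30

monthly_writes = 100_000_000  # ~1M DAU x 3 shortenings x 30 days, rounded
read_to_write = 100           # each link is read ~100x more than written
record_bytes = 1_000          # original URL + short id + metadata, ~1 KB

write_rps = monthly_writes / (DAYS_PER_MONTH * SECONDS_PER_DAY)
read_rps = write_rps * read_to_write
traffic_mb_per_s = read_rps * record_bytes / 1e6

records_5y = monthly_writes * 12 * 5
storage_tb = records_5y * record_bytes / 1e12

# ~33 write rps (the lecture rounds this up to 40, hence its 4,000 rps
# and 4 MB/s), ~3,300 read rps, and 6 TB over five years.
print(f"writes: {write_rps:.0f} rps, reads: {read_rps:.0f} rps")
print(f"read traffic: {traffic_mb_per_s:.1f} MB/s")
print(f"5-year storage: {storage_tb:.0f} TB for {records_5y:,} records")
```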
00:36:29
Based on all the estimates that we made
00:36:34
for the connection load, the computing
00:36:37
load, the generated network traffic, and the
00:36:39
total occupied storage space,
00:36:42
we can conclude that, with
00:36:45
all these components taken together,
00:36:48
our entire service, our entire system, will be able to
00:36:50
run as a whole on one average,
00:36:53
ordinary workstation that
00:36:55
would cost no more than a few
00:36:57
thousand dollars.
00:36:57
The next example of a service that we will consider is, in general,
00:37:02
quite similar to the previous one, but it is a
00:37:04
service that allows you not just to
00:37:07
save a long link and hand out
00:37:09
a short ID; it allows you to
00:37:11
save some large texts in
00:37:13
order to share them with your friends
00:37:15
or colleagues. For example, you want to
00:37:17
share some piece of code that does
00:37:20
not work for you, or, say,
00:37:22
some long error log that you want to show to other
00:37:24
users so that they can look at it,
00:37:27
for example on some forum, and help you with a
00:37:29
solution to this or that problem.
00:37:29
We are told that
00:37:34
such a service will have up to 1
00:37:37
million uploads per day; that is, we will
00:37:39
assume that we have 10 million
00:37:43
users per month, that 500,000 users are
00:37:45
active daily, and that
00:37:48
each makes two uploads on average. We
00:37:50
assume that the read-to-
00:37:52
write ratio in our service will be 10 to 1,
00:37:53
unlike the previous link-
00:37:56
shortening service, since most likely
00:38:00
the texts, code, or logs that you
00:38:01
share will be viewed on
00:38:03
average by fewer people than
00:38:06
will follow shortened links in
00:38:08
some popular posts on popular
00:38:10
social networks. Based on these
00:38:12
assumptions, we can conclude that
00:38:12
we will have a total of 300 million calls to
00:38:17
our service every month, since
00:38:20
we have 1 million uploads per day, a
00:38:23
read-to-write ratio of 10 to 1, and
00:38:27
30 days in a month. As for the load from
00:38:30
creating records, we can
00:38:33
conclude that, since we upload 1 million
00:38:36
records per day, and we again
00:38:38
approximate very roughly that a day has 100,000
00:38:41
seconds, we get a load
00:38:44
on record creation of 10 rps; and since the
00:38:47
read load is 10 times greater,
00:38:50
it will be 100 rps. If each entry fits in 10 KB (that is,
00:38:53
we assume that on average
00:38:55
the logs or code that get
00:38:58
uploaded will be,
00:39:00
say, 5,000 characters), then we will
00:39:03
generate traffic of up to 1 MB per second,
00:39:06
or 8 Mbit. If, just as in the
00:39:08
previous case, we look at a horizon
00:39:10
of five years, then we store 30 million
00:39:14
records per month over 5 years of 12 months each; that
00:39:17
is, about 2 billion records. Since each of
00:39:20
them is on average 10 KB in size, in total we
00:39:23
store 20 TB of records, for which
00:39:26
on average either one fairly expensive
00:39:29
hard drive or just a few
00:39:32
disks of a more typical popular
00:39:34
size is enough. Here we can draw the same
00:39:37
conclusion as with the previous service:
00:39:39
in general, we will most likely be
00:39:41
satisfied with one simple workstation
00:39:44
within a few thousand dollars.
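The same arithmetic generalizes to a reusable estimator; a sketch, fed here with the pastebin-style assumptions above (1M uploads/day, 10:1 reads, ~10 KB per paste):

```python
# Generic read/write/storage estimator; inputs are illustrative assumptions.
def estimate(daily_writes: int, read_to_write: float, record_bytes: int,
             years: int = 5, seconds_per_day: int = 100_000) -> dict:
    write_rps = daily_writes / seconds_per_day
    read_rps = write_rps * read_to_write
    stored = daily_writes * 30 * 12 * years * record_bytes
    return {
        "write_rps": write_rps,
        "read_rps": read_rps,
        "read_traffic_mb_s": read_rps * record_bytes / 1e6,
        "storage_tb": stored / 1e12,
    }

# Pastebin-like service: ~10 write rps, ~100 read rps, ~1 MB/s,
# ~18 TB over five years (the lecture rounds this to 20 TB).
print(estimate(1_000_000, 10, 10_000))
```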
00:39:47
The next service, one we have all
00:39:48
come across in one way or another, is an
00:39:50
auto-completion service. That is, we have
00:39:53
some kind of search string in
00:39:55
our service, or we have a service that
00:39:57
itself provides the search
00:39:59
string; even if it is not a search service but,
00:40:02
say, a hotel or restaurant
00:40:04
booking service, you can most likely
00:40:06
search by the names of
00:40:08
restaurants or by some
00:40:10
dishes they serve, for example. That is, we
00:40:14
have users who enter
00:40:15
some characters and get
00:40:18
suggestions based on which of the
00:40:21
queries we have start with the
00:40:23
letters already entered. We will
00:40:25
proceed from the assumption that our service is,
00:40:27
of course, very popular: that
00:40:30
up to 1 billion calls are made to it
00:40:32
every day, while we have, let's say,
00:40:35
10 million unique queries that are often
00:40:38
repeated among different users, and
00:40:41
the queries consist of, on average, say,
00:40:43
five words up to 10 characters long. Then
00:40:46
we can conclude that on
00:40:48
average we have 50 characters per query, which is why we
00:40:51
need up to 200 bytes to
00:40:54
fit these characters in an
00:40:57
encoding like Unicode, and
00:41:00
storing all the queries will take 2 GB of
00:41:03
storage: if we have
00:41:06
10 million unique queries and each of
00:41:08
them is no more than 200 bytes, that comes to
00:41:12
2x10^9 bytes, or 2 GB. If we assume that
00:41:16
every day we see some
00:41:17
new unique queries, say
00:41:20
some 5% queries that we previously had not
00:41:23
seen at all, then, taking again a
00:41:26
horizon of 5 years, we can estimate
00:41:29
that to the 2 GB that we have initially,
00:41:32
an increment will be added on each of 2,000 days
00:41:36
(we can take 2,000 days as a
00:41:37
rough estimate of 5 years, since each
00:41:40
year has 365, or no more than 400, days).
00:41:44
It turns out that 2,000 days of a
00:41:46
five percent increase on 2
00:41:48
GB gives in total about 200 GB of
00:41:53
data stored over a
00:41:55
horizon of 5 years. For such
00:41:57
indicators, at least on the storage side,
00:42:02
one simple small
00:42:04
workstation within a few thousand
00:42:06
dollars is enough for us.
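A quick sanity check of that growth estimate in code (the stated assumptions: 2 GB of unique queries initially, 5% new per day, about 2,000 days in 5 years):

```python
# Storage growth for the autocomplete example (lecture's assumptions).
initial_gb = 2.0        # 10M unique queries x ~200 bytes
daily_growth = 0.05     # 5% genuinely new queries per day
days = 2_000            # rough upper bound for 5 years (<= 400 days/year)

# Linear growth, as in the lecture: each day adds 5% of the initial 2 GB.
total_gb = initial_gb + initial_gb * daily_growth * days
print(f"{total_gb:.0f} GB after 5 years")  # -> 202 GB, i.e. ~200 GB
```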
00:42:09
The next type of service we will consider is no longer designed
00:42:13
for storing and serving simple text
00:42:16
information but for storing arbitrary
00:42:18
files that you need to save or
00:42:20
synchronize between different
00:42:24
computers: a cloud disk. Of
00:42:27
course, there are a lot of them. Let's consider a
00:42:30
hypothetical service that, again, of
00:42:32
course, is very popular and has up to
00:42:36
1 billion users, of whom
00:42:38
about 100 million are active every day, and
00:42:40
each user has
00:42:43
five different devices connected, for
00:42:45
example a phone, a tablet, several
00:42:47
laptops or a computer, and
00:42:50
files are synchronized between them.
00:42:52
Let's say each user stores
00:42:55
up to 100 files of about 100 KB in size;
00:42:57
say these are some text
00:42:59
documents or presentations that he
00:43:02
transfers. Active users
00:43:04
work, for example, with one file,
00:43:08
thereby changing it, and we need to update
00:43:10
this file somewhere in our storage.
00:43:10
Based on these assumptions we can
00:43:14
conclude that, for example, on the
00:43:17
network side, we need to update 100
00:43:20
million files every day, because we have 100 million
00:43:22
active users per day, each
00:43:24
updating one file with an average
00:43:27
size of 100 KB; thus we need to
00:43:31
update 10 TB worth of files
00:43:35
per day. Since a day has roughly 100,000
00:43:37
seconds, we can
00:43:39
conclude that we will have traffic on the
00:43:41
order of 1 Gbit per second to transfer these 10
00:43:44
TB per day,
00:43:48
and also that we will need to
00:43:51
store, on a horizon of 5 years (that is, 5 years
00:43:54
of, roughly speaking, 400 days each) with 10
00:43:58
TB of data transferred every day, since
00:44:00
all these files need to be stored and recorded,
00:44:03
20
00:44:06
petabytes of data. Moreover, since we
00:44:08
generate on the order of a gigabit of traffic per second and
00:44:09
each file is 100 KB in size, we can
00:44:12
conclude that we are transferring
00:44:14
thousands of files at any moment, and for this
00:44:16
we need to keep on the order of 10,000 connections
00:44:18
open. On the computing
00:44:21
load side, since we need to write
00:44:23
100 million records per day and a day has roughly 100,000
00:44:26
seconds,
00:44:29
we conclude that we have about 1,000
00:44:32
requests per second to create
00:44:34
or update a record in the database. From a
00:44:37
storage point of view, since we have 1
00:44:39
billion users and they each store 100
00:44:43
files of 100 KB, in total we get that
00:44:43
since we transmit traffic, we need
00:44:45
to calculate how much it will cost us 20
00:44:48
pba of traffic at the average price that we
00:44:51
previously analyzed is 10 cents per gigabyte,
00:44:54
it turns out that we need 2 million dollars
00:44:58
to store 10 petabytes of data, based on the
00:45:02
fact that we store 1 TB on a hard drive
00:45:05
for 30 dollars, then 10 petabytes of storage
00:45:08
will cost us 300,000 dollars Now
00:45:12
if we look at our obtained
00:45:15
values ​​and remember that how much of
00:45:18
this or that storage can be stored
00:45:21
on servers or what load can
00:45:24
these or other simple servers
00:45:26
or cloud instances withstand, we can
00:45:29
conclude that we will
00:45:31
need up to a hundred servers for data storage alone, since
00:45:34
we you need to store 10 pib files and the
00:45:37
connection and traffic as a whole can withstand,
00:45:40
roughly speaking, one server because 1
00:45:43
Gigabit per second and 10,000 open
00:45:46
connections - This is quite simple now
00:45:48
and for one average machine. But it turns out
00:45:52
that here we will still rely
00:45:53
more on a data node or those instances on
00:45:56
which we will store our data on
00:45:59
hard drives and, accordingly,
00:46:01
supporting our entire system will cost
00:46:03
several million dollars over a
00:46:04
9-year horizon, taking into account the fact that we
00:46:07
still need to pay a lot for transmitting
00:46:09
traffic the next popular service on
00:46:11
which we store not only what - this is
00:46:13
simple text data and some
00:46:16
content from users, let’s say
00:46:18
gram. That is, this is an application
00:46:20
for sharing photos with your
00:46:23
friends or your subscribers where
00:46:26
users can upload photos and
00:46:28
videos and accordingly we will try to
00:46:30
evaluate based on its popularity.
00:46:32
And what is the load from different parties
00:46:35
account for such a system, we will make
00:46:38
the assumption that the rgam service is of course
00:46:40
very popular and it has only up to 1 billion
00:46:44
users, while every day
00:46:46
let’s say 100 million users come in,
00:46:49
upload some kind of photo, let’s say
00:46:51
up to 100 KB in size and, accordingly,
00:46:53
also view photographs of other
00:46:55
different We
00:46:56
will proceed from the fact that views to
00:47:00
downloads have a ratio of 100 to od,
00:47:02
that is, we download one photo but at the same time
00:47:04
view 100 others on
00:47:06
average, then from the network
00:47:09
load side, uploading 100 million photos
00:47:12
with 100,000 seconds in one day
00:47:15
gives us 1,000 rps since we
00:47:19
upload a photo with an average size of
00:47:21
100 KB and a
00:47:24
multiplyable size with a width of 1 Gbit and
00:47:28
for 5 years. We, as in the previous case,
00:47:32
pump 20 pib of traffic since reading
00:47:35
and downloading has a ratio of 100 to one, so
00:47:37
we get
00:47:39
100,000 rps for reading, the network channel is loaded on
00:47:43
100 Gbit we need to hold 100,000
00:47:46
connections at the same time and in
00:47:48
total we will transfer 2 eb of traffic for
00:47:51
reading from the computing
00:47:53
load side: since we have 1,000 rps for
00:48:00
writing and 100,000 rps for reading metadata,
00:48:02
we can estimate from this
00:48:04
how many instances we need. For
00:48:07
storage, we only need to keep all 20
00:48:09
petabytes of uploaded photos, plus a
00:48:11
certain amount of metadata about
00:48:12
users: when they last
00:48:14
logged in, and so on; but this is
00:48:17
insignificant compared to the main data
00:48:19
that we store. Well, the final
00:48:19
cost of supporting such a system is,
00:48:23
of course, dominated by the transmission of
00:48:26
traffic: at our earlier price per gigabyte, no
00:48:29
less than 200 million dollars will be
00:48:32
needed to transmit that much traffic,
00:48:35
whereas to store 20 PB of photos we will
00:48:38
need hardware worth about 600,000 dollars.
00:48:41
Just as in the case of the previous
00:48:43
service, we conclude that in order to
00:48:46
store all the photos,
00:48:49
given how many fit on one server,
00:48:51
we will need hundreds of such
00:48:54
servers, and in order to sustain the
00:48:57
connections and traffic, we will need
00:48:59
dozens more, taking into account that we also
00:49:01
have a fairly high load on the database.
00:49:04
Supporting the entire final
00:49:06
system will cost us hundreds of millions of
00:49:08
dollars, mainly, of course, due to the
00:49:09
massive amount of
00:49:12
transmitted traffic.
00:49:14
Telegram. That is, again, this is a service where
00:49:16
our users can create
00:49:18
some kind of text information or some kind of
00:49:20
Media, for example photographs, let’s make
00:49:23
an assumption that we have 100 million
00:49:25
active daily users of our
00:49:27
service, let’s say each user
00:49:30
sends an average of 100 messages a
00:49:32
day, with the sizes being, let’s say, 1 kb,
00:49:36
since these can be very
00:49:37
short text messages, as well as
00:49:39
small photographs or some kind of
00:49:41
stickers. Therefore, we will based on this
00:49:44
assessment Well, we will
00:49:46
assume that each message is read by 10 people on average,
00:49:50
because some of the messages you
00:49:52
send are personal, so they are
00:49:54
read by one person, and some of the messages
00:49:56
you send to groups. Therefore, we will
00:49:58
proceed from the assessment that our
00:50:01
read-to-write ratio will be 10 to 1. From the
00:50:04
point of view of network load: we have 100
00:50:04
million users generating
00:50:09
100 messages every day, and 100,000
00:50:12
seconds in one day, so we get that our
00:50:14
load from creating messages is 100,000
00:50:17
rps. Since each message is on
00:50:20
average 1 KB in size, the generated
00:50:23
traffic is about 1 Gbit per second, and
00:50:25
we can also estimate that over 5 years we
00:50:29
will transfer 20 PB of write traffic. Since the
00:50:31
read load is 10 times greater, our
00:50:35
read load is 1 million rps,
00:50:38
we occupy 10 Gbit of traffic, we need to
00:50:40
hold 1 million connections simultaneously,
00:50:44
and in total over 5 years we will transfer 200 PB of
00:50:44
read traffic. From a computational point of view, we need to
00:50:50
create 100,000 new records per
00:50:53
second and also read 1 million records
00:50:56
per second from databases. To store all
00:50:59
created messages we need 20 PB of
00:51:01
space, plus a small amount of
00:51:02
metadata, just as with the
00:51:05
previous service. And again, the
00:51:05
cost is mostly driven by traffic:
00:51:11
transferring 200 PB will cost us about
00:51:14
$20 million, while for storage we
00:51:16
need about $600,000 to buy
00:51:18
hard drives and, accordingly, the
00:51:20
servers for them. In the end we can
00:51:22
again draw the conclusion that we need
00:51:24
hundreds of servers to store the data, to
00:51:27
hold simultaneous connections, and to
00:51:30
create and read records from the database,
00:51:32
and maintaining such a system will
00:51:34
again cost us tens of millions of
00:51:37
dollars over a 5-year horizon.
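The same pattern in miniature for the messaging example (inputs are the assumptions above):

```python
# Messaging example: 100M DAU x 100 messages/day, ~1 KB each, read 10x.
DAU, MSGS, MSG_KB, FANOUT = 100_000_000, 100, 1, 10
SEC_PER_DAY, DAYS_5Y = 100_000, 2_000

write_rps = DAU * MSGS / SEC_PER_DAY              # 100,000 rps
read_rps = write_rps * FANOUT                     # 1,000,000 rps
write_gbit = write_rps * MSG_KB * 8 / 1e6         # ~0.8 -> "1 Gbit/s"
stored_pb = DAU * MSGS * MSG_KB * DAYS_5Y / 1e12  # 20 PB over 5 years
read_pb = stored_pb * FANOUT                      # 200 PB transferred
print(write_rps, read_rps, stored_pb, read_pb)
```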
00:51:37
The next service we are considering is Twitter,
00:51:41
that is, a service in which users
00:51:43
create short posts, sometimes
00:51:46
attaching some media files to them, and
00:51:48
then other users
00:51:52
view these status updates and
00:51:55
receive some information or thoughts from
00:51:58
people they are interested in listening to. We will
00:52:01
assume that we have 100 million
00:52:03
active users every day in
00:52:06
our service and each of them makes one
00:52:08
tweet and checks the feeds of other
00:52:11
users, let's say ten different
00:52:13
other users, while the read-to-
00:52:15
write ratio will be roughly 100 to
00:52:18
one, since a tweet
00:52:19
is created once but a large number of
00:52:22
users read it: let's say at the
00:52:24
beginning of the feed you see 10 messages from
00:52:27
each of the other users, and since
00:52:29
you are subscribed to 10 users, the
00:52:31
ratio is 100 to one. The
00:52:31
network load from creating
00:52:36
new tweets: we need to
00:52:39
create 100 million of them every day, there are 100,000
00:52:41
seconds in one day, so the load
00:52:44
from creating new posts is 1,000 rps.
00:52:47
Since each record is no more than,
00:52:49
let's say, 10 KB in size (text
00:52:52
information, sometimes attached pictures),
00:52:54
that comes out to 1,000 multiplied by 10 KB,
00:52:56
which means that our generated traffic
00:52:59
is a total of about 100 Mbit per second, and
00:53:02
over a horizon of 5 years we need to transfer 2
00:53:06
PB of data. Since the ratio of reading to
00:53:08
writing is 100 to one, for
00:53:11
reading we get a load of 100,000
00:53:14
rps, we occupy 10 Gbit of traffic, we need to
00:53:16
hold 100,000
00:53:19
connections simultaneously, and in total we will transfer 200
00:53:22
PB of data. From a computational point of view,
00:53:22
our load is 1,000
00:53:27
record-creating requests every second, and
00:53:30
100,000 queries per second are needed to read from the
00:53:32
database; plus, for such a service
00:53:34
we most likely have some complex
00:53:37
JOINs between different users and their
00:53:40
friends, and possibly some
00:53:42
queries to build an algorithmic
00:53:45
feed for recommending tweets, so
00:53:46
this can lead to an even greater
00:53:48
load, but that we will be able to
00:53:51
analyze later. In
00:53:51
order to store all the created tweets,
00:53:55
we will similarly need 2
00:53:58
petabytes of space, plus some
00:53:59
insignificant space for storing
00:54:01
metadata about users: their
00:54:03
activity, account creation
00:54:06
date, and so on. Transferring
00:54:06
200 petabytes of traffic will cost
00:54:08
the same 20 million dollars to
00:54:11
store two petabytes of space, we will
00:54:13
need to purchase hard drives for
00:54:15
60,000 dollars, and in general we can
00:54:18
again conclude that we need
00:54:20
dozens of servers to store all the
00:54:22
data Plus we need additional
00:54:24
servers for
00:54:25
calculations and some kind of creation of an
00:54:28
algorithmic tape, and so on further, well,
00:54:30
supporting the entire system will cost us
00:54:32
tens of millions of dollars over a
00:54:34
5-year horizon, at least based on the heavy
00:54:37
load on data transfer from
00:54:40
the network. Well, the last service that we
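Plugging the Twitter numbers into the same assumed tariffs reproduces this cost split; again, illustrative arithmetic rather than real quotes:

```python
# Twitter-like service: the 5-year cost split, same assumed tariffs as before.
COST_PER_GB_TRAFFIC = 0.10   # assumed $ per GB transferred
COST_PER_TB_STORAGE = 30.0   # assumed $ per TB of raw HDD

traffic_pb, storage_pb = 200, 2
print(f"traffic: ${traffic_pb * 1e6 * COST_PER_GB_TRAFFIC:,.0f}")  # ~$20,000,000
print(f"disks:   ${storage_pb * 1e3 * COST_PER_TB_STORAGE:,.0f}")  # ~$60,000
```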
00:54:43
The last service we will analyze today is Netflix:
00:54:45
a platform for watching
00:54:47
video content, films and TV series, from a
00:54:49
library. We will make our estimates on the
00:54:52
assumption that we have 10 million
00:54:54
daily users, we have
00:54:56
20,000 titles in the library, and each
00:54:59
user, let's say, watches either a
00:55:01
couple of episodes or one film, for an
00:55:03
average of one hour. We will omit the load from the
00:55:06
content creators themselves
00:55:08
for now. Now the network
00:55:10
load: if we proceed from the assumption
00:55:12
that we are watching video at a
00:55:14
resolution of 1080p, the bitrate of such video is about 10
00:55:17
Mbit per second. The total traffic will be
00:55:20
generated by 10 million users
00:55:22
who watch such content for on average an
00:55:25
hour, that is, roughly speaking, 4,000 seconds, and
00:55:28
with a bitrate of 10 Mbit per second we
00:55:31
get that per day we generate about 40 PB of
00:55:34
network traffic, or 40 million GB.
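The daily traffic figure is again simple multiplication. A sketch under the lecture's rounding (10 Mbit/s is treated as roughly 1 MB/s):

```python
# Netflix-like service: daily video traffic.
daily_users = 10_000_000   # 10 M daily viewers
watch_s = 4_000            # ~1 hour of viewing, rounded
bitrate_mb_s = 1           # 10 Mbit/s ~= 1 MB/s, rounded down

daily_gb = daily_users * watch_s * bitrate_mb_s / 1_000  # MB -> GB
print(f"{daily_gb:,.0f} GB per day")  # ~40,000,000 GB = ~40 PB/day
```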
00:55:38
From the point of view of computation, we are accessing,
00:55:40
most likely, some metadata of the
00:55:42
user and metadata of the series in order to
00:55:44
get links to the actual
00:55:46
video stream itself and hand it
00:55:48
to the application. It turns out that we have 10 million
00:55:51
users who, let's say, each receive
00:55:53
metadata for about ten
00:55:55
different series, and with 100,000 seconds
00:55:57
in a day, the load on fetching
00:55:59
metadata is about 1,000 RPS, which is quite
00:56:02
insignificant against the traffic
00:56:04
we estimated for the network.
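The metadata load fits in a one-liner, under the assumption of about ten lookups per user per day:

```python
# Metadata lookups: assumed ~10 per user per day, negligible next to video traffic.
daily_users, lookups_per_user, seconds_per_day = 10_000_000, 10, 100_000
print(daily_users * lookups_per_user / seconds_per_day, "RPS")  # 1000.0 RPS
```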
00:56:07
From a storage point of view: we
00:56:09
have 20,000 different titles, and
00:56:11
since we have both films and TV series there,
00:56:13
let's roughly estimate that for each
00:56:15
title we store 10 hours of content.
00:56:19
It turns out that we have 20,000 titles, 10
00:56:22
hours in each, and each hour is about 4,000
00:56:24
seconds. And if we also
00:56:27
assume that we store
00:56:29
the original resolution, for example 4K, for
00:56:31
which the bitrate will no longer be 10 Mbit per
00:56:33
second but 50, then we have a
00:56:36
version at 50 Mbit per second, one at 10 Mbit per
00:56:38
second, and some versions at even lower
00:56:40
resolutions. If we add up the different
00:56:43
bitrates for the different resolutions and
00:56:46
multiply by everything that came before, we
00:56:48
get that in total we need to store about 5
00:56:51
PB of information.
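The storage figure comes from summing the bitrates of all stored renditions. A sketch with an assumed bitrate ladder; the lecture only names the 50 and 10 Mbit/s rungs, so the lower ones here are illustrative:

```python
# Netflix-like service: library storage across renditions.
titles = 20_000
hours_per_title = 10       # rough average over films and multi-episode series
seconds_per_hour = 4_000   # rounded

# Assumed bitrate ladder in Mbit/s: 4K original, 1080p, plus lower resolutions.
# Only 50 and 10 come from the lecture; the lower rungs are an assumption.
bitrates_mbit = [50, 10, 5, 3, 2]

content_s = titles * hours_per_title * seconds_per_hour   # 8e8 seconds of content
total_pb = content_s * sum(bitrates_mbit) / 8 / 1e9       # Mbit -> MB; 1 PB = 1e9 MB
print(f"{total_pb:.1f} PB")  # ~7 PB; the lecture rounds to "about 5 PB"
```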
00:56:54
If we now try to draw a conclusion
00:56:55
from the calculations, just as in the case of
00:56:57
the other services discussed earlier, then
00:57:00
we will reach a simple conclusion: this
00:57:02
network traffic we simply will not be able to
00:57:04
pay for, no matter how rich a company
00:57:06
we are. So Netflix, of
00:57:09
course, does not try to pay for traffic
00:57:11
at the estimates and tariffs that we
00:57:14
considered earlier. In fact,
00:57:16
Netflix participates in a special initiative
00:57:19
with internet providers: it ships them
00:57:21
hard drives with films, installed
00:57:25
directly at the providers' own locations. And
00:57:27
when you open the Netflix application and
00:57:29
try to watch a film, you do
00:57:31
not download this film from some
00:57:33
distant Netflix servers; rather, you
00:57:36
download this film most likely from a
00:57:37
server at your own provider.
00:57:40
This is also more profitable for the provider,
00:57:42
since it does not pass traffic outside its network but
00:57:45
simply delivers it to your device, and of
00:57:48
course it is also profitable for Netflix, since
00:57:50
Netflix does not pay for the traffic that
00:57:53
would otherwise have to travel, for example, between
00:57:55
different continents. Without this, it
00:57:58
would cost absurd sums to push all the 4K
00:58:02
films to all users on all
00:58:04
continents at a price of 10 cents per
00:58:08
gigabyte.
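To see why paying list price is hopeless, multiply the daily figure out at the assumed $0.10/GB tariff; this is the arithmetic behind the "no matter how rich we are" remark:

```python
# What the video traffic would cost at the assumed $0.10/GB list tariff.
daily_gb = 40_000_000      # ~40 PB/day from the estimate above
cost_per_gb = 0.10         # assumed tariff

per_day = daily_gb * cost_per_gb       # ~$4,000,000 every day
per_5_years = per_day * 365 * 5        # ~$7.3 billion over 5 years
print(f"${per_day:,.0f} per day, ${per_5_years:,.0f} over 5 years")
```

Hence the hard-drives-at-the-provider scheme: the traffic never leaves the provider's network, so nobody pays transit for it.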
00:58:10
This concludes today's lecture. We have considered how to assess
00:58:13
the load that may fall on our
00:58:15
service. Next we will move on to
00:58:17
designing some basic schemes for
00:58:20
our systems, and we will
00:58:23
ask the question of how to actually
00:58:25
develop our system and how to make sure
00:58:26
that our system satisfies the
00:58:28
requirements that we set,
00:58:30
since we have already assessed that,
00:58:33
hypothetically, such a system could
00:58:35
exist and that we will generally be able to pay for it,
00:58:37
based on the estimates
00:58:39
that we made today.

Description:

Learn Data Science with us: https://karpov.courses/
