
Parsing Avito.ru with Python 3 (part 2): collecting phone numbers

Video tags

we parse
parsing
scraping
python
parser
bot
selenium
webdriver
pillow
pytesseract
tesseract
tesseract-ocr
ocr
ajax
javascript
avito
recognition
collect
how to write a parser in python
parser in python
avito parsing
parsing with selenium
how to parse a website
python avito parsing
avito parsing python
parsing avito phone numbers
python ocr
python image to text
python tesseract
python html scraping
python html parsing
Subtitles

00:00:00
Good afternoon, this is Oleg Molchanov,
00:00:03
and today we will continue parsing
00:00:05
Avito.
00:00:06
This time we return to phone numbers: as
00:00:12
promised last time, we will collect the
00:00:16
phone numbers. Let's look at this
00:00:20
page: there is a button on the
00:00:25
right, and we need to click it,
00:00:27
and in response a
00:00:32
pop-up window is generated that shows the
00:00:36
phone number. As we can see, this
00:00:39
phone number is a
00:00:42
picture, and that is where the whole difficulty lies: the
00:00:45
text data has simply been
00:00:47
replaced with a picture, so we will have to tinker.
00:00:50
We will need to take a screenshot and then
00:00:55
recognize the phone number, and that is
00:01:00
what we will focus on today.
00:01:04
Let's prepare
00:01:07
our working environment. Let's create a folder,
00:01:13
as you can see,
00:01:17
for our project. Today I will
00:01:19
use a virtual environment
00:01:27
to isolate my project. I activate it:
00:01:36
source,
00:01:38
then the environment folder, bin,
00:01:44
activate. It is activated. Let's
00:01:48
see which packages are
00:01:51
installed in the environment: pip freeze.
00:01:56
And there are none.
00:02:00
That is the whole point of virtual
00:02:03
environments:
00:02:04
each one is a clean system,
00:02:06
isolated from global packages, or rather
00:02:09
from packages
00:02:10
installed globally. For each
00:02:12
specific project we install
00:02:15
the versions of the packages we need, and no one
00:02:20
interferes with anyone else. That is the essence of a virtual environment.
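Here is a minimal Python sketch of the check done here with pip freeze. It assumes Python 3.8+ for importlib.metadata. The environment itself is still created from the shell, for example with python3 -m venv venv followed by source venv/bin/activate. The helper names below are illustrative and do not come from the video.

```python
import sys
from importlib import metadata


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # venv (and virtualenv 20+) point sys.prefix at the environment while
    # sys.base_prefix keeps pointing at the base installation; older
    # virtualenv releases set sys.real_prefix instead.
    return sys.prefix != sys.base_prefix or hasattr(sys, "real_prefix")


def installed_packages():
    """Roughly what `pip freeze` prints: sorted 'name==version' strings."""
    return sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
    )


if __name__ == "__main__":
    print("virtual environment:", in_virtualenv())
    for line in installed_packages():
        print(line)
```

Unlike pip freeze, which hides pip and setuptools by default, this listing also includes pip itself. A fresh environment will therefore not look completely empty here.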
00:02:23
Since our page now works this way, our
00:02:27
task requires some
00:02:29
interactivity: clicking a button to
00:02:31
get a pop-up window,
00:02:35
and so on. BeautifulSoup,
00:02:38
which we used last time, is
00:02:41
no longer suitable; we need to use
00:02:44
another tool called Selenium.
00:02:47
Let's install it. Selenium will be
00:02:57
responsible for navigation, searching for elements,
00:02:59
creating screenshots and so on; that
00:03:03
is, all this work will
00:03:05
be done with Selenium. We create
00:03:10
our script;
00:03:13
this will be tel.
00:03:16
Then we add an
00:03:24
entry point, def main(). It stays empty
00:03:40
for now,
00:03:41
and now we add
00:03:46
the usual launch block. This time we will
00:03:59
create a class, a whole class for this job,
00:04:03
class Bot. Here we describe the constructor, and
00:04:14
it takes only one argument,
00:04:18
self. We create a driver attribute that
00:04:29
represents Firefox, and this is
00:04:34
where problems can arise;
00:04:37
I will dwell on them in more detail now.
00:04:42
Pay attention: when I type
00:04:45
python, it automatically starts Python 3.5 for me; that is
00:04:48
thanks to the isolated environment. Okay,
00:04:52
that is not the point. webdriver.Firefox():
00:05:09
when you enter this line and give the
00:05:13
command to the interpreter to execute it, a
00:05:15
new instance of Firefox is created.
00:05:18
If you installed Selenium for the first time,
00:05:23
it may throw errors, or rather
00:05:26
a whole bunch of errors, a big traceback,
00:05:29
but the actual error is that
00:05:32
geckodriver was not found, and it
00:05:38
needs to be installed separately. Let
00:05:41
me tell you briefly how to do this.
00:05:44
Google "geckodriver", and it
00:05:56
gives us, as the first
00:06:00
link, the repository with these
00:06:02
drivers,
00:06:03
with a build for each operating system;
00:06:06
we find ours, Linux x64,
00:06:09
and I choose to save it.
00:06:15
Since I already
00:06:20
downloaded it, it is already here;
00:06:26
unpack the archive.
00:06:28
Now our next task is to
00:06:32
copy the driver into a directory that is definitely on PATH;
00:06:35
it needs to be copied with administrator rights:
00:06:38
sudo cp,
00:06:41
the path to the driver, and a destination such as /usr/local/bin,
00:06:50
then the administrator password.
00:06:58
After it has been copied,
00:07:00
it should work for you. Firefox gives me
00:07:04
an error because I already have
00:07:07
the driver in use by a running process, and
00:07:09
it says that the file is busy. You should not get this
00:07:14
error;
00:07:15
if it does appear, it means you already have
00:07:18
the driver installed on the system.
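The failure described here means Selenium cannot find the geckodriver executable on PATH. Below is a small standard-library check. It assumes the binary keeps its default name, geckodriver. On Windows, geckodriver.exe is found as well, because shutil.which honours PATHEXT. The helper names are made up for this sketch.

```python
import os
import shutil


def find_geckodriver():
    """Return the full path of the geckodriver executable on PATH, or None."""
    return shutil.which("geckodriver")


def path_directories():
    """The non-empty directories searched for executables, in lookup order."""
    return [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]


if __name__ == "__main__":
    location = find_geckodriver()
    if location:
        print("geckodriver found at", location)
    else:
        print("geckodriver is not on PATH; copy it into one of:")
        for directory in path_directories():
            print("  ", directory)
```

If nothing is found, the fix is the sudo cp step from the transcript, into a directory such as /usr/local/bin. Recent Selenium releases (4.6 and later) also ship Selenium Manager, which can download a matching driver automatically. With a current install, this manual step may therefore not be needed.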
00:07:23
So we created a Firefox driver, which has
00:07:28
now launched successfully. We
00:07:31
call the method navigate,
00:07:35
a name I came up with. Let's
00:07:40
define it: def navigate.
00:07:49
Here we refer to the driver, the
00:07:54
driver which was created in the constructor:
00:07:59
driver.get, and we pass it the address of this
00:08:04
page here, which I
00:08:11
copied. Let's check how it all
00:08:17
works. Don't forget: in the main function we make an
00:08:20
instance of our bot,
00:08:23
and navigate is called in the bot's constructor. Since
00:08:30
we work in the virtual environment, for now I am
00:08:32
using an external console;
00:08:39
when I figure out how to use the
00:08:42
virtual environment in Sublime, I'll shoot a
00:08:45
separate video about it.
00:08:48
We run the script and wait; it will take
00:08:56
some time until it creates a new
00:09:00
Firefox instance. That's it,
00:09:03
the page loaded normally; as we can see,
00:09:10
everything is in place. Let's close it. The
00:09:12
next step: we need to press
00:09:18
this button with the phone. Let's look at it
00:09:20
carefully again; the
00:09:28
inspector opens here,
00:09:33
and here it is. In
00:09:41
fact, Selenium provides quite
00:09:44
many methods, great possibilities for
00:09:48
navigating the DOM of the
00:09:50
document. Let's look at the documentation; they are all
00:09:57
listed there, in the section on locating elements. Let's
00:10:02
see:
00:10:06
there are find_element and
00:10:09
find_elements methods, just like in BeautifulSoup:
00:10:11
this one returns one element, this one
00:10:14
returns a list. By id,
00:10:19
by name, by XPath, by link text, by tag name,
00:10:24
by class name, by CSS selector. Of all these, I
00:10:28
will use XPath in this video. Now
00:10:31
I will tell you what
00:10:33
XPath is: it is a way to
00:10:36
access a specific
00:10:37
element in the document structure, and that is basically all.
00:10:41
Let's look for this button: here it is,
00:10:48
with such-and-such a class.
00:10:54
We create a temporary variable, button, and here we
00:11:00
call again
00:11:01
driver.find_element_by_xpath.
00:11:11
The XPath expression is a string, so it goes in quotes.
00:11:15
Next come two slashes, or one slash.
00:11:19
One slash means absolute addressing, that is,
00:11:22
absolute addressing goes from the beginning of the
00:11:25
entire document, from the html tag, down to
00:11:29
the specific element.
00:11:30
This is quite a long chain, and if
00:11:36
suddenly some element appears
00:11:40
or disappears, for example they add an
00:11:44
advertising block on top, the absolute XPath
00:11:47
breaks and stops working. So it is
00:11:52
still advisable to use
00:11:53
relative paths, that is, paths relative to
00:11:56
some element which certainly
00:12:00
will not disappear anywhere. Therefore,
00:12:05
for relative paths, instead of one slash we put
00:12:07
two.
00:12:08
Then we specify the tag name of our
00:12:14
button, and everything specific to it
00:12:19
goes in
00:12:21
square brackets.
00:12:23
Here we use its
00:12:28
attributes, CSS classes or whatever we have there. We
00:12:32
have a class, so after the at sign
00:12:36
we write the word class and its value. This part already
00:12:40
resembles BeautifulSoup, and unfortunately we
00:12:44
will have to copy
00:12:46
this entire class. BeautifulSoup can look for
00:12:50
elements by just one of the classes when the class attribute is
00:12:52
composite; for example, in the previous
00:12:55
video I could find elements that way.
00:12:59
Selenium doesn't work like that, so
00:13:02
I have to copy this entire class.
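Two points from this part can be reproduced offline with the limited XPath subset in Python's xml.etree.ElementTree. The first is why a positional, absolute-style path breaks. The second is why the whole class string has to be copied. ElementTree is not the browser's XPath engine, and the markup and class names below are made up. In both, though, [@class='...'] compares the entire attribute string exactly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup (not Avito's real HTML).
PAGE = """<html><body>
  <div id="content"><a class="button item-phone-button js-phone">Show phone</a></div>
</body></html>"""

# The same page after an advertising block was added on top.
PAGE_WITH_AD = """<html><body>
  <div id="ad">Advertisement</div>
  <div id="content"><a class="button item-phone-button js-phone">Show phone</a></div>
</body></html>"""

# A chain of positions from the root, like an absolute XPath.
POSITIONAL = "./body/div[1]/a"
# A relative path that matches the FULL class string of the button.
RELATIVE = ".//a[@class='button item-phone-button js-phone']"
# Only one of the classes: no match, because @class='...' is an exact comparison.
SINGLE_CLASS = ".//a[@class='item-phone-button']"


def find(markup, path):
    """Return the first element matching path, or None."""
    return ET.fromstring(markup).find(path)


if __name__ == "__main__":
    for markup_name, markup in (("original", PAGE), ("with ad", PAGE_WITH_AD)):
        for path in (POSITIONAL, RELATIVE, SINGLE_CLASS):
            print(markup_name, path, find(markup, path) is not None)
```

The positional path finds the button only until the ad block shifts the divs, while the relative path keeps working. In Selenium the same strings would go to find_element_by_xpath (Selenium 3, as in the video) or to find_element(By.XPATH, ...) in Selenium 4, where the find_element_by_* helpers have been removed.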
00:13:07
Look how long it is, but we copied it and
00:13:14
got this button. The next step: we
00:13:20
have to click on this button,
00:13:22
button.click(). Let's see... Well,
00:13:31
let's also take a screenshot and immediately
00:13:35
call a method for it,
00:13:41
take_screenshot, which does not exist yet. Let's
00:13:45
write it: def take_screenshot, which accepts the
00:13:55
argument self.
00:14:01
Inside, we call the driver's screenshot method,
00:14:05
save_screenshot, and
00:14:09
pass the screenshot's file name as an argument.
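Here is a sketch of the bot as built up to this point. It is written against the Selenium 4 API, find_element("xpath", ...), because the find_element_by_xpath call from the video was removed in Selenium 4.3. The URL, the button XPath and the file name are placeholders, not Avito's real values. One deliberate deviation from the video: the driver is passed into the constructor instead of being created there. This lets the class be exercised without a browser.

```python
import time

# "xpath" is the value of selenium.webdriver.common.by.By.XPATH; using the
# plain string keeps this module importable without Selenium installed.
XPATH = "xpath"

# Placeholders: copy the real listing URL and the button's full class string
# from the page you are working with.
LISTING_URL = "https://www.avito.ru/..."
BUTTON_XPATH = "//a[@class='FULL CLASS STRING OF THE PHONE BUTTON']"


def make_firefox_driver():
    """Start a real Firefox (needs selenium and geckodriver on PATH)."""
    from selenium import webdriver  # lazy import: only needed for a real run

    return webdriver.Firefox()


class Bot:
    def __init__(self, driver, url=LISTING_URL,
                 screenshot_path="avito_screenshot.png", wait_seconds=3):
        self.driver = driver
        self.url = url
        self.screenshot_path = screenshot_path
        self.wait_seconds = wait_seconds

    def navigate(self):
        """Open the listing, click the phone button, then screenshot the page."""
        self.driver.get(self.url)
        button = self.driver.find_element(XPATH, BUTTON_XPATH)
        button.click()
        # Give the AJAX request time to render the phone picture; the video
        # adds a fixed three-second sleep for this a little later.
        time.sleep(self.wait_seconds)
        self.take_screenshot()

    def take_screenshot(self):
        self.driver.save_screenshot(self.screenshot_path)


def main():
    bot = Bot(make_firefox_driver())
    try:
        bot.navigate()
    finally:
        bot.driver.quit()  # close the browser even if something fails


if __name__ == "__main__":
    main()
```

Because Bot only calls get, find_element, click and save_screenshot, any object with those methods can stand in for the real driver in a quick test.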
00:14:17
Let's review; we
00:14:21
get the following: we came to this
00:14:25
page,
00:14:26
we found the button by XPath; here we
00:14:32
looked for a button with a class, with
00:14:37
this hefty class; we found this
00:14:39
button,
00:14:40
stored it
00:14:43
in the temporary variable button, and then clicked on it.
00:14:47
We can click it because the
00:14:52
Firefox web element object is in this
00:14:55
variable, button. Next, we
00:15:00
take a screenshot by calling the screenshot method; the
00:15:04
same driver of ours is used,
00:15:06
this Firefox, and the call to the save_screenshot method
00:15:13
saves the screenshot. Let's run it
00:15:18
and see. Launching it
00:15:20
probably takes most of our time,
00:15:22
the demonstration and the checking.
00:15:30
So, the page loads,
00:15:40
it clicked the button, and now
00:15:44
a screenshot appears. Here is our screenshot.
00:15:47
Perfect, only it is not the one
00:15:50
we were waiting for. Why not? Simply because the
00:15:54
page
00:15:59
had not finished loading. That is,
00:16:03
we clicked the AJAX button, the request went to the
00:16:06
server, and the server took a while before it
00:16:08
responded: it probably takes
00:16:11
this phone number from the database, turns
00:16:13
it into a picture and renders it in this
00:16:16
window. That is why it didn't get there, didn't
00:16:20
make it in time: we took the screenshot too early.
00:16:23
So here we need to let a little
00:16:29
time pass. I'll take the
00:16:34
time module:
00:16:38
from time import sleep.
00:16:41
Selenium's web drivers, in fact, do have
00:16:45
methods for waiting for
00:16:47
elements to appear, but I will simply
00:16:49
use sleep
00:16:51
and give it three seconds.
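The fixed sleep works, but it always costs three seconds and still fails if the server is slower. The alternative mentioned here is to poll until the element shows up. Below is a hand-rolled, standard-library sketch of that idea. The name wait_until and its parameters are made up for illustration; this is not Selenium's API.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns something truthy, then return that value.

    Raises TimeoutError if the condition is still falsy after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

Selenium's built-in equivalent is WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, xpath))), imported from selenium.webdriver.support.ui and selenium.webdriver.support.expected_conditions. It ignores NoSuchElementException while polling. A hand-written condition that wraps find_element would have to catch that exception itself.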
00:16:56
Save, and let's run everything one
00:17:03
more time.
00:17:14
So,
00:17:15
it loads; we have to wait.
00:17:26
The click succeeded, and here is the
00:17:31
screenshot. It
00:17:34
looks fine: the saved
00:17:38
screenshot is exactly the one we need.
00:17:41
Great. Next: the screenshot is
00:17:49
taken, but we don't need it by
00:17:55
itself; we only need this
00:17:57
element, the img tag. That is, we
00:18:02
need to get this picture. The
00:18:03
picture itself cannot simply be downloaded by scraping,
00:18:05
because most likely it is
00:18:09
generated on the fly
00:18:12
with every request.
00:18:14
So I decided to do it this way: I
00:18:20
crop the screenshot like this,
00:18:24
to the size of this picture, and then
00:18:27
we can get
00:18:28
what we need. Now we
00:18:30
will deal with this
00:18:35
as follows. All of this, of course, goes
00:18:38
into the navigate function, too.
00:18:49
We must again
00:18:52
find an element, not the button this time but the picture.
00:19:06
We must find this picture; let's
00:19:10
store it here in a variable, image:
00:19:18
driver.find_element_by_xpath. Let's look again at the
00:19:30
picture. We need this div,
00:19:36
because we most likely won't be able to find the img tag itself directly,
00:19:41
so we will find this div and, from this
00:19:45
container, go down. This container has
00:19:48
a class: two slashes, div,
00:19:55
then, in square brackets,
00:19:59
at sign, class, equals this value. That gives us the
00:20:06
div itself, but we don't need the div, we need what is in this container,
00:20:09
so we go further: two slashes
00:20:14
and an asterisk. The asterisk means
00:20:17
any element, any at all, because in this
00:20:21
container there is
00:20:22
only one element, and that is what we get.
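The container-plus-asterisk expression can again be tried offline with xml.etree.ElementTree. The markup and the container class below are hypothetical stand-ins for the pop-up's HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical pop-up markup: the container div has a stable class, while
# the image inside it has no attributes we want to rely on.
POPUP = """<div class="popup">
  <div class="phone-container"><img src="data:image/png;base64,AAAA"/></div>
</div>"""

# Find the container by its class, then take any element below it ("*").
IMAGE_PATH = ".//div[@class='phone-container']//*"


def find_phone_image(markup):
    """Return the first element inside the phone container, or None."""
    return ET.fromstring(markup).find(IMAGE_PATH)


if __name__ == "__main__":
    print(find_phone_image(POPUP).tag)
```

With Selenium the same idea becomes find_element_by_xpath("//div[@class='...']//*") as in the video. In Selenium 4, it is driver.find_element(By.XPATH, "//div[@class='...']//*").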
00:20:27
So we got this picture. The
00:20:31
next thing that interests us is
00:20:35
cropping, which we need right now, so
00:20:40
let's look more closely at what crop is:
00:20:47
cropping, trimming. Here is the
00:20:51
frame,
00:20:56
this is the picture. The first point we
00:21:03
need has coordinates x, y, and the coordinates of the
00:21:08
second point
00:21:10
are x1 and y1; that is, from the
00:21:14
coordinates of these two points we can
00:21:16
build this rectangle.
00:21:18
So we need to get the coordinates of
00:21:20
this point, and for this there is a
00:21:25
location property on the image element.
00:21:30
location is a property rather than a method,
00:21:33
and it returns a dictionary;
00:21:39
the contents of the dictionary are
00:21:44
the key x with some value and the key y, also
00:21:51
with some value. The next thing
00:21:56
we need is size, also a
00:21:58
property; size is also a
00:22:03
dictionary with
00:22:06
two keys: width, with
00:22:09
some value, and
00:22:15
height.
00:22:18
We need them to calculate the second
00:22:21
point: take the coordinates of the first
00:22:23
point, add the width, and you
00:22:25
get the x coordinate of the second point; add
00:22:28
the height, and you
00:22:29
get its y coordinate. So by
00:22:34
adding the width to x
00:22:46
we get the coordinate x1, and
00:22:52
accordingly, when you
00:23:02
add the height to the y coordinate of the first
00:23:07
point, you get y1
00:23:10
for this point. Why do we add?
00:23:14
Because the origin (0, 0) is
00:23:17
the starting point of our coordinates: the
00:23:18
top-left corner of the screen is
00:23:24
(0, 0). So if we move down
00:23:29
we add, and
00:23:30
if we need to move up, we
00:23:33
subtract. That is what we will
00:23:38
do now: we have the coordinates, and we know
00:23:42
the dimensions of this
00:23:44
picture.
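Here is the coordinate arithmetic above in code. location gives the top-left point (x, y), and adding size gives (x1, y1). The function name and the scale parameter are assumptions of this sketch. scale only matters on HiDPI screens, where the screenshot can contain more pixels than the CSS pixels Selenium reports. Leave it at 1.0 otherwise.

```python
def phone_box(location, size, scale=1.0):
    """Build the (left, upper, right, lower) box that Pillow's Image.crop expects.

    location and size have the shape of Selenium's WebElement.location and
    WebElement.size: {'x': ..., 'y': ...} and {'width': ..., 'height': ...}.
    The origin (0, 0) is the top-left corner, so moving right or down adds.
    """
    x = location["x"] * scale
    y = location["y"] * scale
    x1 = x + size["width"] * scale   # right edge: x plus width
    y1 = y + size["height"] * scale  # bottom edge: y plus height
    return tuple(int(round(v)) for v in (x, y, x1, y1))


if __name__ == "__main__":
    print(phone_box({"x": 100, "y": 50}, {"width": 120, "height": 20}))
    # (100, 50, 220, 70)
```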
00:23:47
Now we call the crop method. Unfortunately, everything
00:23:51
ends up in one long, not very pretty line,
00:23:53
so I'll write a separate crop function
00:24:00
now and pass it two parameters:
00:24:02
location and size.
00:24:04
Let's write the function, def crop;
00:24:14
there we go.
00:24:16
Of course, we will open the picture, but
00:24:22
we haven't imported anything for images yet; we
00:24:28
need to install
00:24:29
the Pillow library.
00:24:34
There used to be the PIL library,
00:24:36
but it is no longer supported, and now
00:24:40
what works is its fork, Pillow. So
00:24:45
you need the Pillow
00:24:46
library to work with images.
00:24:52
Everything is installed.
00:24:56
from PIL import Image. Back in crop,
00:25:10
we call Image.open and pass
00:25:16
the file we called the screenshot, and
00:25:21
now we have a picture. Now we
00:25:29
call
00:25:32
crop, and I forgot to pass the arguments here.
00:25:37
I passed them in the call, but here I didn't accept
00:25:40
them; I need to declare them, of course,
00:25:44
like this. I called crop, but I
00:25:52
got ahead of myself: we haven't extracted
00:25:55
the x coordinate yet. Since location is a dictionary,
00:26:03
we take the x key, then the y key, and again
00:26:10
from the size dictionary, width and height. Now the actual
00:26:29
crop function: crop takes one
00:26:32
argument, so we need to use
00:26:35
a tuple. I pass the coordinates of the first point, x, y,
00:26:41
then the coordinates of the second point, which in the
00:26:45
picture we labeled x1 and y1;
00:26:48
for them we add the width and, accordingly,
00:26:53
the height:
00:26:54
so x plus width,
00:27:01
y plus
00:27:03
height. We got the cropped image, and we save
00:27:08
it to a file,
00:27:12
a GIF. Why GIF? Because it
00:27:15
takes up less space, and also because we have a
00:27:19
two-color picture. We just
00:27:21
save it, and check it.
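Here is a sketch of the crop-and-save step with Pillow. It assumes the box has already been computed from location and size as described above. The function name and file names are placeholders. Pillow is imported inside the function, so the module loads even where Pillow is not installed.

```python
def crop_to_gif(screenshot_path, box, out_path="tel.gif"):
    """Cut the phone-number region out of a page screenshot and save it as GIF.

    box is (left, upper, right, lower) in screenshot pixels; Pillow treats
    the right and lower edges as exclusive, so the result is exactly
    (right - left) x (lower - upper) pixels.
    """
    from PIL import Image  # pip install pillow

    with Image.open(screenshot_path) as screenshot:
        phone = screenshot.crop(box)
        # GIF stores a palette of at most 256 colours, which is plenty for a
        # two-colour picture of digits and keeps the file small.
        phone.convert("RGB").save(out_path, format="GIF")
    return out_path
```

Newer Selenium versions also offer WebElement.screenshot(filename), which captures just that element. That can make the manual crop unnecessary.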
00:27:40
Although it will keep taking the screenshot every time.
00:27:43
Let's see whether the coordinates
00:27:51
come out right,
00:27:53
or maybe not.
00:27:59
We got an error at first,
00:28:12
no...
00:28:16
But now it appeared. Great. So what
00:28:19
is our phone? Here,
00:28:22
great, here is our image, and
00:28:26
this is our phone number. Isn't it beautiful? Okay,
00:28:31
now we need to recognize it. In order to
00:28:34
recognize it, we
00:28:37
must have a library installed in our system,
00:28:40
Tesseract OCR. It is in the
00:28:44
repositories, so it is just a matter of
00:28:47
installing it with apt-get.
00:29:02
I already have it installed in the system,
00:29:05
so everything is fine for me;
00:29:09
you probably don't have it installed. This is
00:29:12
the system library,
00:29:14
and now we need a Python library,
00:29:18
a wrapper for it, so to speak:
00:29:26
we use pip to install pytesseract
00:29:33
for the project, of course.
00:29:38
We have everything installed.
00:29:43
import,
00:29:46
from pytesseract
00:29:50
import image_to_string.
00:30:01
So, since we save the image first,
00:30:06
the way I am doing it, I
00:30:08
get a cascade of method calls.
00:30:12
That is why here I put a call to a new method, which
00:30:22
will be tel_recon:
00:30:26
tel as in telephone, and recon as in
00:30:29
recognition.
00:30:35
Again we create an image object,
00:30:53
and to output the result we will
00:31:04
use the print function here. We
00:31:12
call image_to_string and pass it, as an
00:31:19
argument, our image object,
00:31:21
the one we created by
00:31:26
opening the file and storing it in a variable,
00:31:29
and print outputs whatever image_to_string
00:31:31
returns.
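Here is the recognition step as a sketch. It needs both the Tesseract program (for example sudo apt-get install tesseract-ocr) and the pytesseract wrapper. image_to_string accepts a Pillow image. The name tel_recon only loosely follows the transcript's "recon", and file names are placeholders. digits_only is an extra, hypothetical clean-up helper, since OCR output usually contains spaces or a trailing newline.

```python
import re


def tel_recon(gif_path="tel.gif"):
    """Run OCR on the cropped phone-number image and return the raw text."""
    # Needs: pip install pillow pytesseract, plus the tesseract binary itself.
    from PIL import Image
    from pytesseract import image_to_string

    with Image.open(gif_path) as image:
        return image_to_string(image)


def digits_only(ocr_text):
    """Strip everything except digits from OCR output."""
    return re.sub(r"\D", "", ocr_text)


if __name__ == "__main__":
    print(digits_only(tel_recon()))
```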
00:31:33
Let's check everything. We close
00:31:37
what we opened last time,
00:31:45
of course.
00:31:57
Yes, and for this, in order
00:31:59
not to wait every time (imagine you need to
00:32:01
parse 100,000 phones), or for
00:32:06
any interaction with sites
00:32:10
where there is a lot of AJAX and JavaScript,
00:32:14
Selenium is used, of course, together with
00:32:17
so-called headless
00:32:20
browsers, such as PhantomJS, and you
00:32:25
need to use them.
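PhantomJS, mentioned here, is no longer maintained, and recent Selenium releases dropped support for it. Firefox and Chrome can themselves run headless. Below is a sketch for Firefox, assuming Selenium 4 (webdriver.FirefoxOptions and its add_argument method). The helper names are made up for illustration.

```python
def firefox_arguments(headless=True):
    """Command-line arguments for Firefox; '-headless' runs it without a window."""
    return ["-headless"] if headless else []


def make_driver(headless=True):
    """Start Firefox through Selenium, optionally headless (needs geckodriver)."""
    from selenium import webdriver  # lazy import: only needed for a real run

    options = webdriver.FirefoxOptions()
    for argument in firefox_arguments(headless):
        options.add_argument(argument)
    return webdriver.Firefox(options=options)
```

The rest of the bot does not change. Only the driver construction differs, so the same screenshot, crop and OCR steps apply to a headless browser.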
00:32:27
Let's look at what we have. Here is our
00:32:31
phone number; please note, here it is.
00:32:36
Let's check whether this is the
00:32:39
phone number: 8 906 278 275. And that's all,
00:32:47
the task for today is completed. In this
00:32:51
way you can collect all the
00:32:53
phone numbers; the captcha is simple, by the way. And
00:32:59
if you liked the video, give it a like
00:33:01
and subscribe to the channel.
00:33:03
Thank you, and all the best to you.

Description:

My courses:
Boosty: https://boosty.to/omolchanov/posts/995a18dd-487b-4000-9b3f-0aafa5e060cd
Patreon: https://www.patreon.com/posts/karty-vsekh-41011404

I show a way to automate collecting phone numbers from Avito.ru. I collect the phone numbers by taking a screenshot of the page with the phone number, cropping the resulting picture to the size of the phone number and saving it as a GIF. The resulting GIF is then recognized (OCR, optical character recognition). We write the script in Python 3.

Libraries:
1. Selenium:
- pip install selenium
2. Pillow:
- pip install pillow
3. Pytesseract:
- pip install pytesseract
- sudo apt-get install tesseract-ocr

*** SOURCE CODE ***
The source code of the main projects is available on Patreon: https://www.patreon.com/posts/iskhodnyi-kod-26640469
***

🔷 For donations. I am always very grateful for them: https://www.donationalerts.com/r/omolchanov

⭐ "A practical course on website parsing with Python" ⭐
Course landing page: https://zaemiel.github.io/courses/
About the course and the course map: https://www.patreon.com/posts/30462246
Video about the course: https://www.youtube.com/watch?v=aRsbRYZxTGA
