
Download "Обучение парсингу на Python #7 | Парсинг сайтов на фрилансе | Requests, Beautifulsoup"

Video tags

python
python programming
python for beginners
python training
python 3
python tutorial
Scraping
web-scraping
web scraping
parsing
what is parsing
website parsing
parsing explained
learning to parse
how to parse correctly
parsing data from a website
what parsing is
parser
python lessons
Beautifulsoup python
lxml python
how to parse
python website parsing
beautifulsoup methods
Web-Scraping
python programs
freelance
freelance income
Subtitles

00:00:07
Friends, hello everyone, you are on the PythonToday
00:00:10
channel and today we will practice
00:00:12
scraping
00:00:13
and together complete an order. The client needed us to
00:00:15
write a script to collect data from a
00:00:18
watch website. We needed to collect
00:00:20
the model names, the links and of course
00:00:23
the prices. As output the script should generate
00:00:26
JSON and CSV files with the recorded
00:00:28
data and save them under the current date.
00:00:31
We did the work together with one of
00:00:33
my padawans, Artem. Hello! Without you
00:00:36
this video wouldn't exist.
00:00:38
We haven't become millionaires yet, but we earned our 20
00:00:41
bucks an hour. Before
00:00:43
we start I want to say a special thank you to the
00:00:45
following subscribers. Friends, thank
00:00:48
you for your contribution to the development of the channel, thank
00:00:50
you for appreciating my work; the
00:00:52
videos come out largely thanks to
00:00:54
your support. We will need the
00:00:56
requests,
00:00:57
beautifulsoup4 and lxml libraries. If you
00:01:00
do not have them installed yet, install them
00:01:03
in your virtual environment with the following command,
00:01:08
then import them. We
00:01:14
create a function: we will try to send a
00:01:17
request to the site, save the response and
00:01:19
see what we can take away.
00:01:21
We create a dictionary for the request headers and
00:01:24
put the browser's user-agent in it,
00:01:56
then send a request to the page:
00:01:59
we call the get method of the requests library, to the
00:02:02
parameters of which we pass the URL.
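A minimal sketch of this step, assuming a placeholder catalog URL and a generic desktop user-agent string (the real values from the video are not shown here):

```python
# pip install requests beautifulsoup4 lxml
import requests

# placeholder URL; the actual watch-site address is an assumption
url = "https://example.com/watches/"

# request headers with a browser user-agent so the site sees a normal client
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
}

response = requests.get(url, headers=headers)
print(response.status_code)
```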
00:02:11
We write the condition for creating a directory;
00:02:14
we will save the html files into it so as
00:02:17
not to clutter the project tree. We import the
00:02:21
os module and call the exists method, to
00:02:30
the parameters of which we pass the name of
00:02:32
the directory we want to create. Everything
00:02:35
reads easily: if the specified
00:02:38
path does not exist, then we call the
00:02:40
mkdir method and create the directory. We save the
00:02:45
result of the request to a file: we use the
00:02:47
open context manager, set the
00:02:50
file name
00:02:57
and write to it the result obtained
00:02:59
from the text attribute of the response object.
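A condensed sketch of the directory check and page save, assuming a hypothetical "data" directory and file name:

```python
import os

# create a directory for the saved pages so the project tree stays clean
if not os.path.exists("data"):
    os.mkdir("data")

# save the page source; encoding="utf-8" helps avoid Cyrillic issues on Windows
with open("data/page_1.html", "w", encoding="utf-8") as file:
    file.write(response.text)
```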
00:03:02
Friends, if you like to work on Windows and
00:03:05
during the video you run into
00:03:07
problems with encoding when writing to and
00:03:09
reading from a file, especially with
00:03:11
recognizing Cyrillic, then watch
00:03:13
the video about writing data to a CSV file, where we
00:03:16
analyzed this. We
00:03:18
run the script and see what we
00:03:20
got: open the saved page in the
00:03:26
browser
00:03:27
[music]
00:03:32
naturally we got bare HTML without
00:03:34
styles,
00:03:35
and here we compare our watches
00:03:42
with the original: everything is ok, but
00:03:48
not all the watches are on the page we received;
00:03:52
by clicking on the "show more" button
00:03:54
we get the next portion. Yes, we could
00:03:58
click that button with selenium
00:03:59
or look in the network tab for the
00:04:01
sent request and see what
00:04:03
comes in response, but if we
00:04:05
look carefully at the page that we
00:04:07
managed to save, then below we will see a
00:04:10
pagination block; please note that on the
00:04:16
original page this class
00:04:18
does not appear at all.
00:04:20
From the number of pages we need to
00:04:24
take the number 5 and then write a loop in
00:04:26
which we will move to each
00:04:28
page, save the source code and then
00:04:31
parse it. We comment out the request code,
00:04:34
we don't need it yet since we have the source code
00:04:37
saved; we read the resulting page into the
00:04:39
src variable and proceed to
00:04:41
parsing.
00:04:47
We create a BeautifulSoup object, to
00:04:50
the parameters of which we pass the
00:04:52
src variable and the lxml parser, then we declare
00:04:57
the pages_count variable and take
00:04:59
the number of pages. The
00:05:00
links have no classes,
00:05:04
but for us this is not a problem: we find
00:05:06
the parent div with the pagination
00:05:08
container class and then collect
00:05:10
all the links from it; the number we need is in the
00:05:14
penultimate link.
00:05:27
Since we now have a list,
00:05:29
using index -2 and the
00:05:32
text method we get the number 5.
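A sketch of the pagination parsing, assuming hypothetical file and class names (the real pagination class is garbled in the subtitles):

```python
from bs4 import BeautifulSoup

# read the previously saved page
with open("data/page_1.html", encoding="utf-8") as file:
    src = file.read()

soup = BeautifulSoup(src, "lxml")

# "pagination-container" is a placeholder for the real parent div class
pagination = soup.find("div", class_="pagination-container").find_all("a")

# the page count sits in the penultimate link: [1, 2, 3, 4, 5, next]
pages_count = int(pagination[-2].text)
print(pages_count)  # 5
```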
00:05:35
If something is unclear to you right now, watch the
00:05:38
first video with a detailed analysis of the
00:05:40
main methods of the beautifulsoup
00:05:42
library; I think no questions will remain. A
00:05:45
link will be in the description.
00:05:46
We convert the resulting string to a number and
00:05:50
write a for loop in which we need to
00:05:52
go through the 5 pages: we use the
00:05:55
range function and add one to our number,
00:05:58
since the range function does not include
00:06:00
the last number; that is, if
00:06:02
we specify from one to five, we get as the
00:06:07
result the numbers from one to four.
00:06:14
We form a URL for the requests.
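A small sketch of the loop and URL formation; the query parameter name is an assumption, since sites paginate differently:

```python
# range() excludes the stop value, so we add 1 to reach page 5
for page in range(1, pages_count + 1):
    url = f"https://example.com/watches/?page={page}"  # hypothetical pattern
    print(url)
```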
00:06:39
We print the result and receive 5
00:06:45
links; we go to the last one and check
00:06:48
whether we managed to collect all the watches: the
00:06:57
latest model with a price of almost 106 thousand.
00:07:00
We load all the watches from the first page and
00:07:06
everything is correct. Then we send a request in a
00:07:14
loop to each of the 5 pages
00:07:22
and save them under different names:
00:07:27
the names will differ by a number
00:07:29
corresponding to the iteration. We put a
00:07:41
short pause between iterations
00:07:44
so that each request has time to load the data,
00:07:48
and we let our function return the
00:07:50
number of pages;
00:07:51
we will need this value in the next function.
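Putting the first function together, a condensed sketch under the same placeholder names as above:

```python
import os
import time

import requests
from bs4 import BeautifulSoup


def get_pages(url, headers):
    """Save every catalog page to disk and return the page count (sketch)."""
    if not os.path.exists("data"):
        os.mkdir("data")

    # first request: find out how many pages there are
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, "lxml")
    # "pagination-container" is a placeholder class, as above
    pagination = soup.find("div", class_="pagination-container").find_all("a")
    pages_count = int(pagination[-2].text)

    for page in range(1, pages_count + 1):
        response = requests.get(f"{url}?page={page}", headers=headers)

        # the file names differ by the iteration number
        with open(f"data/page_{page}.html", "w", encoding="utf-8") as file:
            file.write(response.text)

        time.sleep(1)  # short pause between iterations

    # the next function will need this value
    return pages_count
```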
00:07:53
We run the code, and in the directory
00:08:11
all 5 pages
00:08:13
appear; we open the last one and check the watches. Everything is
00:08:23
fine, we managed to collect all the necessary
00:08:26
pages; all that remains is to parse them, collect
00:08:29
and save the data we need.
00:08:31
We create a new function, collect_data;
00:08:35
it will accept the pages_count
00:08:37
obtained from the first function. First
00:08:40
of all we write a loop in which we will read
00:08:42
each page, with range going
00:08:45
from 1 to the pages_count received earlier;
00:08:50
we open the file and save the contents into a
00:08:53
variable.
00:09:06
We create a BeautifulSoup object, go to the
00:09:13
site and look at what we can
00:09:15
grab onto: the data we need lies in
00:09:25
div blocks with strange IDs.
00:09:27
We go deeper and see an "a" tag, the href
00:09:30
attribute of which contains a link to the
00:09:32
detailed description of the watch, and inside there are
00:09:35
several "p" tags that interest us:
00:09:37
in one of them there is the watch model and
00:09:40
in the other the price.
00:09:42
The card has a class, so we copy it and
00:09:45
check whether there are any extra tags with this
00:09:47
class. Great: this class matches
00:09:54
exactly the watch cards we need.
00:10:01
We create a variable and call the find_all method:
00:10:06
we pass the tag as the first argument and the
00:10:09
class by which we select as the second.
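A sketch of collecting the cards; the tag and class are placeholders for the real markup:

```python
# assuming "soup" is the BeautifulSoup object for a saved page, as above;
# every watch card is an <a> tag whose href leads to the detail page
items = soup.find_all("a", class_="product-card")  # placeholder class
print(len(items))
```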
00:10:12
Now we have a list of the necessary
00:10:15
cards; we write a for loop and go through each one.
00:10:21
First we select the article:
00:10:27
it is located in a "p" tag with the product
00:10:30
item article class; we call the find method,
00:10:37
specify the "p" tag
00:10:40
and then the class, and we get the contents
00:10:44
using the text method. Similarly we find
00:10:47
and take the price,
00:10:48
and then the URL,
00:11:15
which is located in the href attribute:
00:11:19
we use the get method, to the parameters
00:11:22
of which we pass the desired attribute name.
00:11:24
We print the result; we first work
00:11:39
with one page.
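A sketch of the extraction loop, again with placeholder class names:

```python
# assuming "items" holds the card tags collected above
for item in items:
    # the article and price sit in <p> tags inside the card
    article = item.find("p", class_="product-item-article").text
    price = item.find("p", class_="product-item-price").text
    url = item.get("href")  # the card link lives in the href attribute
    print(article, price, url)
```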
00:11:43
We call the function and run the script: the code
00:11:54
works but needs to be corrected a little.
00:11:56
First, we strip the extra
00:11:58
spaces in the article; in the price inside the
00:12:05
"p" tag there is also the string "руб" that we don't need; we remove it. We also
00:12:45
substitute the pages_count value into the loop, and
00:12:50
we create a list for our data: at
00:12:56
each iteration of the loop we will fill
00:12:59
it with dictionaries with the new values.
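A sketch of the cleanup and the list of dictionaries, with the same placeholder names as above:

```python
data = []

for item in items:
    # strip stray spaces and drop the "руб" suffix from the price
    article = item.find("p", class_="product-item-article").text.strip()
    price = (
        item.find("p", class_="product-item-price")
        .text.strip()
        .replace("руб", "")
        .strip()
    )
    url = item.get("href")

    data.append({"article": article, "price": price, "url": url})
```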
00:13:21
After all the pages have been processed and the
00:13:23
cards have been collected, we start writing,
00:13:26
first to the JSON file: we open the file for
00:13:29
writing with the "w" flag, import the
00:13:38
json module and call the dump method.
00:13:48
We pass our list as the first parameter,
00:13:50
then the file, the indent indentation and the
00:13:56
ensure_ascii parameter with the False flag. I have
00:14:00
already explained the meaning of these parameters more than once in
00:14:02
previous videos on parsing, see the
00:14:05
playlist. I almost forgot about the requirement to
00:14:08
save the files under the current date:
00:14:10
we import the datetime module and get the
00:14:13
current date in
00:14:25
the day.month.year format, then substitute
00:14:31
the value of the variable into the file name.
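A sketch of the JSON dump with a dated file name; the file name pattern is an assumption:

```python
import json
import datetime

# current date for the file names, e.g. "21.08.2021"
cur_date = datetime.datetime.now().strftime("%d.%m.%Y")

with open(f"data_{cur_date}.json", "w", encoding="utf-8") as file:
    # indent prettifies the output; ensure_ascii=False keeps Cyrillic readable
    json.dump(data, file, indent=4, ensure_ascii=False)
```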
00:14:38
We run
00:14:44
the code and see what we get: a JSON file
00:14:46
appears in the directory; we open it, and here is the collected data with
00:14:49
beautiful indentation. Everything is great; now
00:14:52
let's write the code to record everything to a CSV file.
00:14:54
First we write down the
00:14:57
column headers; of course, you don't have to do this and can
00:14:59
write the data at once; it all depends on the
00:15:01
customer's wishes. We open the file for
00:15:04
writing, create a writer, import the
00:15:16
csv module,
00:15:24
pass our file to the writer method,
00:15:27
call the writer's writerow method,
00:15:30
and in the tuple we list the desired
00:15:33
headers: article, link and price. In the
00:15:41
loop, at each iteration, we will add
00:15:44
rows to our file; everything is the same, only
00:15:49
the flag changes to append, and of course we change
00:15:52
the values of the columns to the data we collected.
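A sketch of the CSV writing: headers once with the "w" flag, then one row per card with the "a" (append) flag:

```python
import csv

# write the header row once
with open(f"data_{cur_date}.csv", "w", encoding="utf-8", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(("Article", "Link", "Price"))

# append a row for every collected card
for item in data:
    with open(f"data_{cur_date}.csv", "a", encoding="utf-8", newline="") as file:
        writer = csv.writer(file)
        writer.writerow((item["article"], item["url"], item["price"]))
```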
00:15:55
We delete the previous file and
00:16:06
run the script:
00:16:15
the JSON is ok;
00:16:20
we open the CSV and, super, all the data
00:16:27
is collected.
00:16:28
I hope the video was useful to you, so
00:16:31
don't forget to like it. All the code
00:16:33
can be downloaded on github
00:16:35
or in the telegram channel, where you will find a
00:16:37
lot more useful information, so
00:16:39
subscribe; links will be in the description.
00:16:41
Friends, thank you so much for watching.
00:16:44
If the video was useful and
00:16:46
interesting to you and you want to get more
00:16:47
practice with python and other languages,
00:16:50
be sure to like it and share
00:16:52
your opinions or ideas in the comments,
00:16:54
subscribe to the channel, stay healthy,
00:16:57
bye everyone!

Description:

Python web parsing (Web-Scraping) tutorial. In this video we complete a freelance order for scraping a website using the requests and Beautifulsoup4 libraries. We will learn to make requests, save pages, parse the information we need from them, and then save the data to files in JSON and CSV format, i.e. into tables.

💰 Support the project: https://yoomoney.ru/to/410019570956160
🔥 Become a channel sponsor: https://www.youtube.com/channel/UCrWWcscvUWaqdQJLQQGO6BA/join

*****Links*****
Cheap/reliable server in Europe: https://zomro.com/?from=246874 promo_code: zomro_246874
Good proxy service: https://proxy6.net/
Cool freelance order | Recovering a forgotten password for an Excel file with Python: https://www.youtube.com/watch?v=DXVs0rJ6OPM
Writing a Telegram bot in Python + Deploying a Telegram bot to a server (hosting): https://www.youtube.com/watch?v=x-VB3b4pKcU
Playlist on face recognition with Python: https://www.youtube.com/playlist?list=PLqGS6O1-DZLpVl2ks4S_095efPUgunsJo
Playlist on website parsing with Python: https://www.youtube.com/playlist?list=PLqGS6O1-DZLprgEaEeKn9BWKZBvzVi_la
Playlist on the Instagram bot: https://www.youtube.com/playlist?list=PLqGS6O1-DZLqYx83MknKLaDxaIlES2nZr
Project code on github: https://github.com/pythontoday/scrap_tutorial
And in the telegram channel: https://t.me/python2day

*****Social networks*****
Telegram: https://t.me/python2day
