Telegram chat of the scrapy_python group, page 2384

Scrapy

810 members

2021 January 29

Harsh in Scrapy

I have a question about captcha solving while crawling a site with Scrapy.

The site I'm trying to scrape shows a captcha before the target (a PDF link).

We could use 2captcha or another service to solve the captchas. I just don't know how to incorporate that into a Scrapy crawler.

I feel the asynchronous nature of Scrapy won't allow waiting for the solving service to return the captcha solution. I could be wrong.

If anyone has run into the same problem, please share some insight. Thanks.

21:15

M

If the captcha is simple, you can use a middleware to handle it. As far as I remember, it should be called after Httpmiddleware.
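A minimal sketch of the middleware idea, assuming the captcha page can be recognised by a marker in the response body. The class name, the `g-recaptcha` marker, the `captcha_retries` meta key, and the retry limit are all hypothetical. The sketch only re-schedules the request; actually solving the captcha is left out.

```python
class CaptchaRetryMiddleware:
    """Downloader middleware sketch: re-schedule requests that hit a captcha page."""

    # Assumption: the captcha page contains this marker; adjust for the real site.
    CAPTCHA_MARKER = b"g-recaptcha"
    MAX_CAPTCHA_RETRIES = 3  # arbitrary limit chosen for the sketch

    def process_response(self, request, response, spider):
        if self.CAPTCHA_MARKER not in response.body:
            return response  # normal page: pass it on to the spider

        retries = request.meta.get("captcha_retries", 0)
        if retries >= self.MAX_CAPTCHA_RETRIES:
            return response  # give up and let the spider see the captcha page

        # A real implementation would obtain and attach a captcha solution here
        # before re-scheduling; this sketch only retries the same request.
        meta = dict(request.meta, captcha_retries=retries + 1)
        return request.replace(meta=meta, dont_filter=True)
```

Returning a request from `process_response` tells Scrapy to schedule it again instead of delivering the response. The class would be enabled through the `DOWNLOADER_MIDDLEWARES` setting; its position relative to the built-in middlewares is something to check against the project's own setup.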

21:20

Кирилл in Scrapy

Yes, there is no simple solution for this. You can try to scrape synchronously if your task allows you to scrape slowly.
Otherwise, you have to split the flow of requests into separate flows (sessions), hold back some requests, and retry them once the captcha has been solved.
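One way to read the two options above, as a sketch under assumptions: the slow, effectively sequential crawl can be approximated with Scrapy's `CONCURRENT_REQUESTS` and `DOWNLOAD_DELAY` settings, and separate sessions can be kept apart with the `cookiejar` meta key. The names `SEQUENTIAL_SETTINGS` and `session_meta` and the delay value are made up for illustration.

```python
# Settings sketch for a slow, effectively sequential crawl.
SEQUENTIAL_SETTINGS = {
    "CONCURRENT_REQUESTS": 1,  # at most one request in flight at a time
    "DOWNLOAD_DELAY": 2.0,     # seconds between requests; the value is a guess
}


def session_meta(session_id):
    """Build request meta that pins a request to its own cookie session.

    "cookiejar" is the meta key read by Scrapy's cookies middleware. It is
    not sticky, so it has to be passed along on every follow-up request.
    """
    return {"cookiejar": session_id}
```

In a spider, the dictionary could be assigned to the `custom_settings` class attribute, and `session_meta(...)` passed as `meta=` when creating each request of a given flow.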

21:21

Harsh in Scrapy

Thanks for the middleware hint.
It's Cloudflare bot detection, followed by reCAPTCHA v2. So two different requests, I guess.

21:23

Harsh in Scrapy

Thanks for the input.
Yeah, we could go with the synchronous approach.

21:24

Кирилл in Scrapy

Also, you'll have to write your own functions to interact with captcha-solving services, because most of their libraries are built on the synchronous requests package.
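A sketch of what such hand-written helpers might look like: plain functions that only build the service URLs, so the requests themselves can be issued by Scrapy. The endpoint paths and parameter names below are what I recall of 2captcha's HTTP API and should be checked against its current documentation. The function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: base URL and parameter names as recalled from 2captcha's HTTP API.
API_BASE = "https://2captcha.com"


def submit_url(api_key, site_key, page_url):
    """URL that submits a reCAPTCHA v2 task to the solving service."""
    params = {
        "key": api_key,
        "method": "userrecaptcha",
        "googlekey": site_key,  # the site key found on the target page
        "pageurl": page_url,    # the page that shows the captcha
        "json": 1,
    }
    return f"{API_BASE}/in.php?{urlencode(params)}"


def result_url(api_key, captcha_id):
    """URL that polls for the solution of a previously submitted task."""
    params = {"key": api_key, "action": "get", "id": captcha_id, "json": 1}
    return f"{API_BASE}/res.php?{urlencode(params)}"
```

Because these helpers return URLs instead of performing HTTP calls, they do not block and can be used from any callback.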

21:25

Кирилл in Scrapy

The synchronous way is the easiest.

21:26

Harsh in Scrapy

Yeah. We'll need to use simple requests, form requests.

I looked for examples on GitHub, no luck so far. I'll search the Scrapy issues to see if I can find something.

21:26

Harsh in Scrapy

Yeah :)

21:26

Кирилл in Scrapy

Then just put your requests in a chain (issue each request from the callback of the previous one) and check every response for a captcha. With this approach, you can even use the requests library from within Scrapy without any problem.
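The chain described above might be sketched like this. It is written as a plain class meant to be mixed into a `scrapy.Spider` subclass, so the snippet itself needs no imports. The class name, the CSS selector, and the `g-recaptcha` check are assumptions, not details from the discussion.

```python
class ChainedCallbacks:
    """Callbacks intended to be mixed into a scrapy.Spider subclass."""

    def looks_like_captcha(self, response):
        # Assumption: a reCAPTCHA widget in the HTML marks a captcha page.
        return "g-recaptcha" in response.text

    def parse_listing(self, response):
        if self.looks_like_captcha(response):
            return  # the chain stops here; solving is outside this sketch
        # Hypothetical selector for the link to the next page in the chain.
        detail_href = response.css("a.detail::attr(href)").get()
        if detail_href:
            # The next request is only created once this response has arrived,
            # which is what makes the flow sequential.
            yield response.follow(detail_href, callback=self.parse_detail)

    def parse_detail(self, response):
        if self.looks_like_captcha(response):
            return
        yield {"url": response.url}
```

Each callback yields at most one follow-up request, so a single chain never has more than one request outstanding.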

21:33

Harsh in Scrapy

Currently, I navigate from the home page to the target, and other information is scraped along the way, e.g. car name, car year, make, etc.

The captcha comes up for all of them on the last page, where the PDF download is.

So if only I could chain the last response with all the information in meta, I could add the PDF link from the solved captcha to all that information.

Could you give a hint on how, in general, we make the Scrapy requests synchronous?
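Carrying the already-scraped fields to the last page could look roughly like the sketch below. It uses `cb_kwargs` to hand the partial item to the next callback; passing it in `meta` would work similarly. The class is again a mixin for a `scrapy.Spider` subclass, and every selector and field name is hypothetical.

```python
class CarryItemCallbacks:
    """Callbacks intended to be mixed into a scrapy.Spider subclass."""

    def parse_car(self, response):
        # Fields collected on the way to the target page (hypothetical selectors).
        item = {
            "name": response.css("h1::text").get(),
            "year": response.css(".year::text").get(),
        }
        report_href = response.css("a.report::attr(href)").get()
        if report_href:
            # cb_kwargs passes the partial item to the next callback as an argument.
            yield response.follow(
                report_href,
                callback=self.parse_report,
                cb_kwargs={"item": item},
            )

    def parse_report(self, response, item):
        pdf_href = response.css("a.pdf::attr(href)").get()
        if pdf_href:
            item["pdf_url"] = response.urljoin(pdf_href)
        yield item  # the item is emitted once, with everything gathered so far
```

The item is only yielded in the last callback, so the PDF link ends up in the same record as the car details.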

21:42

Кирилл in Scrapy

I already gave one.

21:44

Кирилл in Scrapy

A chain of scrapy requests

21:45

Harsh in Scrapy

i.e.

from scrapy import Request

def get_captcha(self, response):
    # TWOCAPTCHA_ENDPOINT: placeholder for the solving service's URL
    yield Request(url=TWOCAPTCHA_ENDPOINT, callback=self.parse_something)

21:47

Harsh in Scrapy

I found a reference link. Thanks sir :)

21:48

Семён Трояновский... in Scrapy

Sounds like you probably don't need scrapy at all

21:56

Harsh in Scrapy

Actually, we have crawlers running on Scrapinghub, so it would be good if this could be achieved with Scrapy.

23:09

Harsh in Scrapy

Otherwise, it would be a big change to move to Apify for Puppeteer-based automation. It may come to that too, but not in the near future.

23:10

Andrii in Scrapy

Has anyone measured whether there is a difference in scraping speed between running it from, say, PyCharm and from WSL?

23:30

Andrey Rahmatullin in Scrapy

What does PyCharm have to do with it?

23:31