Official documentation: https://scrapy.org/
1, Introduction to the Scrapy framework
Writing a crawler requires a lot of work: sending network requests, parsing data, storing data, handling anti-crawler mechanisms (rotating IP proxies, setting request headers, etc.), making asynchronous requests, and so on. It's a waste of time if you have to write these jobs fro ...
Posted by justice1 on Wed, 25 May 2022 13:30:41 +0300
A regular expression is a special sequence of characters that can help you easily check whether a string matches a certain pattern.
The re module also provides exactly the same functions as these methods, which take a pattern string as their first argument.
re.match attempts to match a pattern from the beginning of th ...
Posted by cahva on Wed, 25 May 2022 09:02:15 +0300
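To illustrate the regular-expression excerpt above, here is a minimal sketch of how re.match behaves. The date pattern and the sample strings are assumptions chosen only for illustration; they do not come from the original post.

```python
import re

# Hypothetical sample pattern (a YYYY-MM-DD date), used only for illustration.
pattern = r"\d{4}-\d{2}-\d{2}"

# Module-level function: the pattern string is the first argument.
m = re.match(pattern, "2022-05-25 crawler notes")
matched_text = m.group() if m else None

# re.match only tries to match at the beginning of the string,
# so a date that appears later in the string yields None.
not_at_start = re.match(pattern, "posted on 2022-05-25")

# The compiled-pattern method does the same job without the pattern argument.
compiled = re.compile(pattern)
same_result = compiled.match("2022-05-25 crawler notes").group()

# re.search scans the whole string instead of anchoring at the start.
found_anywhere = re.search(pattern, "posted on 2022-05-25").group()
```

When the pattern may appear anywhere in the text, re.search is usually the better fit; re.match suits input that must begin with the pattern.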
A web crawler (also known as a web spider or web robot, and more often called a web chaser in the FOAF community) is a program or script that automatically crawls information on the World Wide Web according to certain rules. Other, less common names are ants, automatic indexers, emulator ...
Posted by Blekk on Sun, 22 May 2022 18:28:14 +0300
Building your own website is one of the signs of success as a coder.
The other signs of success: a "Bai Fumei" (a fair-skinned, rich, beautiful girl) on your left arm, barbecue skewers in your right hand, and a Santana under your feet.
Some readers will say: Uncle Fish, weren't you designing database tables? How do you return the p ...
Posted by kingcobra96 on Fri, 20 May 2022 09:38:56 +0300
P station crawler: analysis process for crawling original PNG images in batches
Website link of P station
1. To crawl original images in batches, you must first be able to find the download URL of a single original image. Take it one step at a time.
Right-click the image and choose Inspect Element to find the image's address
Posted by le007 on Thu, 19 May 2022 17:47:59 +0300
1, Web page analysis and crawling fields
1. Crawl field
Only three fields need to be crawled, and the "content" field must be crawled from the detail page
2. Web page analysis
Starting URL https://www.zhihu.com/explore
The Explore section is a typical Ajax-loaded page.
We open the web page, ...
Posted by ethridgt on Tue, 17 May 2022 17:45:20 +0300
Recently I have been learning Android reverse engineering. I worked with apps from the TB family and learned that large vendors build their apps with data security in mind. This article mainly describes the process of hooking the signature parameter x-sign of a certain "treasure" live-streaming app. Other parameters can of course be hooked as well. This article is only for learning and communicati ...
Posted by nogeekyet on Fri, 13 May 2022 16:39:44 +0300
js cracking process
Static web page dynamic web page
step1: find parameters
step2: analyze the JS function
step3: analyze parameters
step4: verify
step5: convert to Python code
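As a rough sketch of the last step, converting a JS function to Python code, suppose the analysis showed that the page's JS builds a signature by sorting the request parameters, joining them as key=value pairs, appending a fixed salt, and taking the MD5 hash. That scheme, the function name make_sign, the parameter names, and the salt value are all hypothetical; a real site's algorithm has to be read from its own JS.

```python
import hashlib


def make_sign(params: dict, salt: str) -> str:
    """Hypothetical Python port of a JS signing routine.

    Assumed scheme: sort keys, join as key=value with '&',
    append the salt, then take the MD5 hex digest.
    """
    query = "&".join(f"{key}={params[key]}" for key in sorted(params))
    return hashlib.md5((query + salt).encode("utf-8")).hexdigest()


# Hypothetical request parameters and salt, for illustration only.
params = {"page": 1, "keyword": "python"}
sign = make_sign(params, salt="demo-salt")
```

The verify step then comes down to comparing the Python output with the value the browser actually sends for the same parameters; if the two differ, the analysis of the JS function is incomplete.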
Many people who learn Python don't know where to start. After learning Python and mastering the basic g ...
Posted by ds111 on Tue, 10 May 2022 10:10:57 +0300
The previous article covered how to combine Scrapy and Selenium to crawl data. This article covers how to use Selenium to crawl websites that load their data with Ajax and how to get past their anti-crawling measures.
All the environments used in this article have been configured in the previous article. If you don't know ...
Posted by hbradshaw on Sun, 08 May 2022 23:47:21 +0300
Kuan X-App-Token reverse analysis
This article is for research and learning only. Applying the techniques described here to improper purposes is forbidden. If it infringes on anyone's privacy or rights, please contact me and I will delete it immediately.
I have some free time today, so let's analyze the difficulties of capturing data from ku'a ...
Posted by gkwhitworth on Thu, 05 May 2022 02:44:14 +0300