I crawled 1,000 pictures of sweet-looking girls and got a nosebleed

A bit of chit-chat

Hello everyone, I'm your old friend Qingge. I recently shared a hands-on introductory tutorial on Java crawlers and got a lot of praise for it. It seems you experts are still very enthusiastic about crawlers. Since everyone wants to learn crawling so much, let's do something exciting today. If you ask me how exciting it is, all I can say is: you know 🤪

A quick anti-plagiarism notice: this article was written by programmer Qingge, blog: https://blog.csdn.net/xqnode

I know everyone is worn out by work and life, and if you're a programmer it's even harder. The endless overtime and bugs really make my heart ache for you. So in the meantime, let me treat you to some perks. High-energy warning! A whole crowd of beautiful women is on the way. Hold steady, hold steady!


What? Not enough? I knew you'd say that. As a brother, I completely understand. Then you'd better read the tutorial below carefully, because it decides whether you get a good night's sleep tonight 😂

Getting down to business

The site we're crawling today is the Aesthetic Girls network (www.vmgirls.com). Let's go in and take a look.

Wow, right? Every one of them is a pretty young lady 😎 Who could resist? Let me click into one and have a look.


What the f***? What is this? How did the young lady turn into an auntie?!

Director: Cut! Take two!

Ah, why didn't I get to see this when I was young? Brothers, I believe in love again! Once there was a sincere love right in front of me, and I didn't cherish it. Now she's back, still so pure and shy, wearing a floral-print skirt, lying on the bed and smiling at me.

Isn't this youth?
Isn't this love?

What are you waiting for? Here I come!

Analyzing the page source

Back to business. Ladies and gentlemen, I have a bad habit of getting distracted whenever I see beautiful women, and I'm really sorry about that. I almost forgot I came here to write code today 😅

The webmaster of this site is probably fed up with crawlers hitting it every day. The server has been under heavy load and went down frequently for a while. To head off further trouble, he simply disabled F12 (the browser's developer tools) on the site so that nobody can inspect the code, which keeps the countless lechers from hammering his little site. After all, running a site isn't easy. If you all crawl too hard and too greedily, you leave the owner no way to survive.

To write this tutorial, I went to great lengths to get at his source code. Guess how I did it?

I don't know if you've noticed, but when you view a page's source, the address bar shows the prefix view-source: followed by the page's actual address. For Baidu, for example, it reads: view-source:https://www.baidu.com


By the same token, we can type this prefix ourselves to read the site's source even with F12 disabled, for example: view-source:https://www.vmgirls.com/15215.html

Analyzing the page source, we can locate the pictures: the href attribute of each a tag is the address of a picture:

Clicking one of the links confirms my guess:

OK, now that we know where the pictures are, the rest is easy.

Let's write some code

As usual, we'll keep using jsoup to fetch and parse the page. First, add the jsoup dependency:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

Connect to the target web page to get the Document object:

Document document = Jsoup.connect("https://www.vmgirls.com/15298.html").get();

From the Document object, get the parent div that wraps the pictures (its class is nc-light-gallery). Once we have that div, things get much easier.

Next, we take all the a tags inside this div and check whether each href attribute contains the keyword jpeg. If it does, that's a picture we want to grab.

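// The div with class "nc-light-gallery" is the parent of all pictures in the post
Element element = document.getElementsByClass("nc-light-gallery").get(0);
Elements elements = element.getElementsByTag("a");
for (Element a : elements) {
    String href = a.attr("href");
    // Links whose href contains "jpeg" point at the pictures
    if (href.contains("jpeg")) {
        System.out.println(href);
    }
}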

Printing the results shows that we really did get all the picture links.
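The contains("jpeg") test is enough for this page, but it would also accept any URL that merely mentions "jpeg" somewhere in its path, and it would skip .jpg or upper-case extensions. Below is a minimal sketch of a stricter check that looks only at the extension of the URL path, using nothing but the JDK. The class name PictureLinks and the set of accepted extensions are my own assumptions, not something the site documents:

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: decides whether an href points at a picture file
final class PictureLinks {
    // Assumed picture extensions; adjust to what the target site actually serves
    private static final Set<String> EXTENSIONS = Set.of("jpeg", "jpg", "png", "webp");

    private PictureLinks() {}

    static boolean isPictureLink(String href) {
        if (href == null || href.isBlank()) {
            return false;
        }
        String path;
        try {
            // getPath() leaves out the query string and fragment, e.g. "?v=2" or "#top"
            path = URI.create(href.trim()).getPath();
        } catch (IllegalArgumentException e) {
            return false; // not a valid URI (for example it contains spaces)
        }
        if (path == null) {
            return false; // opaque URIs such as "javascript:void(0)" have no path
        }
        int dot = path.lastIndexOf('.');
        int slash = path.lastIndexOf('/');
        if (dot < 0 || dot < slash) {
            return false; // the last path segment has no extension
        }
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext);
    }
}
```

In the loop above, if (href.contains("jpeg")) could then become if (PictureLinks.isPictureLink(href)).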

Are we done? Not yet: nothing has been saved to the local disk. How else would you sneak a look in the dead of night when you're offline 😆

So we need a way to download these links. Downloading files from the network in Java is simple enough, and to save effort I'll just use the third-party utility library hutool. If you're interested, check out this sweet library yourself; I guarantee you'll fall in love with it once you use it. Official website: https://www.hutool.cn/

Add the hutool dependency:

<dependency>
    <groupId>cn.hutool</groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.3.7</version>
</dependency>

Download pictures to local disk:

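// Download the pictures to the imgs folder under the project root path
File imgDir = new File(System.getProperty("user.dir"), "imgs");
imgDir.mkdirs(); // make sure the folder exists, so each picture is saved inside it
for (Element a : elements) {
    String href = a.attr("href");
    if (href.contains("jpeg")) {
        System.out.println(href);
        // The hrefs start with "//" and carry no scheme, hence the "https:" prefix
        HttpUtil.downloadFile("https:" + href, imgDir);
    }
}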

Run again:

The console shows that the download is complete. Let's open the folder on the local disk and have a look:


Perfect. Looking at this row of beautiful women, my heart is still a little restless [shy]
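If you'd rather not add hutool, the JDK's own java.net.http.HttpClient (Java 11+) can do the same job. The sketch below is a swapped-in alternative, not the method used in this article: it resolves each href against the page URL (so protocol-relative //... links work without hard-coding https:), names the local file after the last segment of the URL path, and streams the response body to disk. PictureDownloader and its methods are hypothetical names:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical JDK-only alternative to hutool's HttpUtil.downloadFile (Java 11+)
final class PictureDownloader {
    private PictureDownloader() {}

    // Resolves an href (relative, protocol-relative "//host/..." or absolute) against its page URL
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href.trim()).toString();
    }

    // Uses the last segment of the URL path as the file name, e.g. ".../a.jpeg?x=1" -> "a.jpeg"
    static String fileNameOf(String url) {
        String path = URI.create(url).getPath();
        String name = path == null ? "" : path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            throw new IllegalArgumentException("URL has no file name: " + url);
        }
        return name;
    }

    // Downloads one picture into dir (created if missing) and returns the saved file
    static Path download(HttpClient client, String url, Path dir) throws IOException, InterruptedException {
        Files.createDirectories(dir);
        Path target = dir.resolve(fileNameOf(url));
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<Path> response = client.send(request, HttpResponse.BodyHandlers.ofFile(
                target, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE));
        if (response.statusCode() != 200) {
            Files.deleteIfExists(target); // don't keep an error page saved under a picture name
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

Create one HttpClient before the loop and call PictureDownloader.download(client, PictureDownloader.resolve(pageUrl, href), Path.of(System.getProperty("user.dir"), "imgs")) for each picture; sharing one client lets it reuse connections. HttpClient does not follow redirects by default, so if the image host redirects, build the client with HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build().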

Batch download

What I just showed downloads the pictures from a single page, so plenty of you rascals are surely wondering: how do we download the pictures in batches?

Don't think I can't read your little minds, because I'm thinking the same thing, ha ha 🤣

Enough talk: I really did find a way.

After studying it day and night, I found that the pages differ only by a number in the address, for example: https://www.vmgirls.com/14636.html and https://www.vmgirls.com/15298.html

So if we can find these numbers, the problem is solved.
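Because only the number changes, a post URL and its number can be converted back and forth. A small sketch of that idea; the https://www.vmgirls.com/<number>.html pattern is inferred from the two example addresses above, and PostIds is a hypothetical name:

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the <number>.html URL scheme seen on the site
final class PostIds {
    // Matches the trailing number in URLs such as https://www.vmgirls.com/15298.html
    private static final Pattern POST_ID = Pattern.compile("/(\\d{1,18})\\.html$");

    private PostIds() {}

    // Extracts the post number, or returns empty if the URL does not follow the pattern
    static OptionalLong idOf(String url) {
        Matcher m = POST_ID.matcher(url);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    // Builds the post URL back from its number
    static String urlOf(long id) {
        return "https://www.vmgirls.com/" + id + ".html";
    }
}
```

Enumerating numbers blindly could request pages that don't exist, which is why it's better to read the real links from a page that lists them.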

So where do we find these numbers? On the home page, of course. Let's open the home page source and take a look:

I noticed that the a tags with class="media-content" contain the data we need, so let's loop over them and see what we get:

Document main = Jsoup.connect("https://www.vmgirls.com").get();
Elements medias = main.getElementsByClass("media-content");
for (Element media : medias) {
    System.out.println(media.attr("href"));
}

Running the code does give us the numbered addresses, but the output is still a bit messy.

We need one more filter to keep only the addresses that end with html.

Document main = Jsoup.connect("https://www.vmgirls.com").get();
Elements medias = main.getElementsByClass("media-content");
for (Element media : medias) {
    String href = media.attr("href");
    if (href.endsWith("html")) {
        System.out.println(href);
    }
}

Run again and it's perfect:
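The same filter can be written as a small pure function that also drops duplicate links while keeping their order. Whether the home page actually repeats links (for example, once on a thumbnail and once on a title) is an assumption on my part; deduplicating is cheap either way. PostLinks is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: keeps links ending with "html", in first-seen order, without duplicates
final class PostLinks {
    private PostLinks() {}

    static List<String> filter(List<String> hrefs) {
        Set<String> unique = new LinkedHashSet<>(); // remembers insertion order
        for (String href : hrefs) {
            if (href != null && href.endsWith("html")) {
                unique.add(href);
            }
        }
        return new ArrayList<>(unique);
    }
}
```

With jsoup, the input list can come straight from medias.eachAttr("href").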

The rest is easy. With the page addresses in hand, we just loop over them one by one and grab each page's pictures, and that's our batch download!

Full code:

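import cn.hutool.http.HttpUtil;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.File;
import java.io.IOException;

public class VmGirlsCrawler {
    public static void main(String[] args) throws IOException {
        // Pictures go to <project root>/imgs; create the folder up front
        File imgDir = new File(System.getProperty("user.dir"), "imgs");
        imgDir.mkdirs();
        // The home page links every post through <a class="media-content">
        Document main = Jsoup.connect("https://www.vmgirls.com").get();
        Elements medias = main.getElementsByClass("media-content");
        for (Element media : medias) {
            // absUrl resolves the href against the page URL, whether it is relative or absolute
            String url = media.absUrl("href");
            if (url.endsWith("html")) {
                Document document = Jsoup.connect(url).get();
                // All pictures of a post sit inside the "nc-light-gallery" div
                Element element = document.getElementsByClass("nc-light-gallery").get(0);
                for (Element a : element.getElementsByTag("a")) {
                    // absUrl turns protocol-relative links (//host/x.jpeg) into https:// links
                    String href = a.absUrl("href");
                    if (href.contains("jpeg")) {
                        System.out.println(href);
                        HttpUtil.downloadFile(href, imgDir);
                    }
                }
            }
        }
        System.out.println("Download complete");
    }
}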
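Since the webmaster already struggles with server load, it's kinder to pace the batch instead of firing requests back to back. A minimal sketch of a throttled loop, assuming a fixed pause between pages; the PageTask callback stands in for the per-page jsoup + download code, and the names and pause length are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical throttled batch loop: handles each page URL with a pause in between
final class PoliteBatch {
    // One unit of work per page, e.g. "parse the gallery and download its pictures"
    interface PageTask {
        void run(String url) throws Exception;
    }

    private PoliteBatch() {}

    // Returns the URLs whose task failed; if interrupted, stops early and skips the rest
    static List<String> crawlAll(List<String> urls, PageTask task, long pauseMillis) {
        List<String> failed = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            String url = urls.get(i);
            try {
                task.run(url);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                failed.add(url);
                break;
            } catch (Exception e) {
                failed.add(url); // one broken page does not stop the whole batch
                System.err.println("Skipping " + url + ": " + e);
            }
            // Pause between pages (not after the last one) to go easy on the server
            if (pauseMillis > 0 && i < urls.size() - 1) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return failed;
    }
}
```

For example, PoliteBatch.crawlAll(postUrls, url -> downloadGallery(url), 1000) waits one second between pages, where downloadGallery would be a hypothetical method wrapping the inner loop of the full code. Failed pages are returned rather than aborting the run, so they can be retried later.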


How about that? A dazzling array of girls, the temptation of youth! Excited? It's all right: take the code and run it yourself. Nobody will know if you sneak a look under the covers at night 🤭

At least, I won't tell~

I'm Qingge, a programmer. That's right, the funny young man is me. Hurry up and follow this flamboyant young man 😋

My original-content official account, the java learning guide, is preparing a batch of solid Java tutorials. Follow it now and you'll catch them right as they come out 🤩

Thank you for reading. Don't forget to like, comment and favorite 🍭 See you next time~

Tags: Java crawler

Posted by Afser on Mon, 02 May 2022 06:24:47 +0300