[2020-09-24] Zhihu simulated login with Selenium

Disclaimer: This article is for study and research only. Do not use it for illegal purposes; you bear sole responsibility for any consequences. If anything here infringes your rights, please let me know and I will delete it. Thank you!

Project scenario:

This time I bring you an automated Selenium login for Zhihu through the QQ authorization login entry. I previously tried to simulate login through the login API, but it didn't work, so for now logging in through a real browser is the better option.

Solution:


1. Let's try it: click the QQ authorization icon, then choose account-and-password login. After you enter the QQ account and password, a slider captcha pops up that asks you to drag a puzzle piece into the gap in the image.



2. At first glance, a fill-the-gap captcha looks like a headache, but this one is manageable: the site does not check behavioral characteristics such as the drag trajectory, so all we need is the distance the slider has to travel. The approach is:
  1. Download the two images: the background image containing the gap, and the image of the missing piece.
  2. Preprocess them: convert both to grayscale and invert the piece image.
  3. Use cv2.matchTemplate to find where the piece fits in the background, which gives the gap's horizontal offset x. Then measure, in the browser, the actual distance the slider has to travel, distance.
  4. The x obtained this way is not the offset Selenium should drag the slider by. Measure several (x, distance) pairs and fit them with a curve fitting tool to get a formula that converts x into the required sliding distance.
  5. The value that formula produces is the distance Selenium actually slides.
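To make step 3 concrete, here is a brute-force NumPy version of the score that cv2.matchTemplate computes with cv2.TM_CCOEFF_NORMED (a correlation of mean-subtracted pixels, normalized to the range -1 to 1), run on a synthetic image. It is a sketch for understanding, not a replacement: OpenCV's version is far faster, and the synthetic piece here is an exact crop of the background, whereas a real captcha gap is shaded, so real scores will be lower than 1.0. `match_template_ccoeff_normed` is a hypothetical name.

```python
import numpy as np

def match_template_ccoeff_normed(image, templ):
    """Brute-force TM_CCOEFF_NORMED for single-channel images.

    Returns a score map of shape (H - h + 1, W - w + 1); entry [r, c] scores
    the template placed with its top-left corner at row r, column c.
    """
    image = image.astype(np.float64)
    templ = templ.astype(np.float64)
    h, w = templ.shape
    H, W = image.shape
    t = templ - templ.mean()              # mean-subtracted template
    t_norm = np.sqrt((t * t).sum())
    scores = np.zeros((H - h + 1, W - w + 1))
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            patch = image[r:r + h, c:c + w]
            p = patch - patch.mean()      # mean-subtracted image patch
            denom = t_norm * np.sqrt((p * p).sum())
            scores[r, c] = (t * p).sum() / denom if denom > 0 else 0.0
    return scores

# Synthetic example: random "background" and a 15x15 piece cut from row 20, column 70
rng = np.random.default_rng(0)
background = rng.integers(0, 256, size=(60, 120)).astype(np.uint8)
piece = background[20:35, 70:85].copy()

scores = match_template_ccoeff_normed(background, piece)
row, col = np.unravel_index(scores.argmax(), scores.shape)
print("gap offset x =", col)
```

The column of the best score is the horizontal offset x; the captcha only needs that value, since the slider moves horizontally.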

3. First, how to get the two images: just find their addresses in the page, which are the src attributes of the slideBg (background) and slideBlock (piece) elements, and download them.
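A minimal sketch of that download step, assuming each src attribute holds an ordinary HTTP(S) image URL. `save_image` is a hypothetical helper: it sets a timeout and raises on HTTP errors before anything is written, so a failed request does not leave an empty image file behind. The `session` parameter exists only so the function can be exercised without a network; by default it creates a requests session.

```python
def save_image(url, path, session=None, timeout=10):
    """Download url and write the response body to path; returns the number of bytes written."""
    if session is None:
        import requests  # imported here so a stub session can be used without requests installed
        session = requests.Session()
    resp = session.get(url, timeout=timeout)
    resp.raise_for_status()  # raise before opening the file, so no empty file is left on failure
    with open(path, "wb") as f:
        f.write(resp.content)
    return len(resp.content)

# Usage inside the Selenium loop (Selenium 4 style, with By from selenium.webdriver.common.by):
# save_image(driver.find_element(By.ID, "slideBg").get_attribute("src"), "index.jpg")
# save_image(driver.find_element(By.ID, "slideBlock").get_attribute("src"), "block.jpg")
```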



4. Calculate the position of the gap.
import cv2
import numpy as np

def get_diff_location():
    # Read both images directly as grayscale
    block = cv2.imread("block.jpg", cv2.IMREAD_GRAYSCALE)  # Missing-piece image
    index = cv2.imread("index.jpg", cv2.IMREAD_GRAYSCALE)  # Background image with the gap
    if block is None or index is None:
        raise FileNotFoundError("block.jpg or index.jpg could not be read")
    # Invert the piece image (255 - pixel value)
    block = 255 - block
    # Slide the piece (template) over the background (image); result[row, col] is the match score at each position
    result = cv2.matchTemplate(index, block, cv2.TM_CCOEFF_NORMED)
    row, col = np.unravel_index(result.argmax(), result.shape)
    # print("Offset in x direction", int(col * 0.4 + 18), 'row:', row, 'col:', col)
    # The column of the best match is the horizontal offset of the gap in the background
    return col
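One detail worth spelling out: np.unravel_index returns indices in (row, column) order, so it is the second value, the column of the best match, that gives the horizontal pixel offset the function returns. A toy score map shows the order:

```python
import numpy as np

# Toy 2x3 score map standing in for the output of cv2.matchTemplate
result = np.array([[0.10, 0.20, 0.95],
                   [0.40, 0.90, 0.50]])
row, col = np.unravel_index(result.argmax(), result.shape)  # flat index 2 -> (0, 2)
print(row, col)  # → 0 2
```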

5. To get the actual sliding distance, read the slider's position values in the browser and subtract one from the other.




6. Repeat steps 4 and 5 to collect several more (x, distance) pairs, then enter them into the curve fitting website to obtain the conversion formula. Finally, test it with Selenium; it usually succeeds within 1 to 3 attempts!
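If you would rather not use a website for the fit, NumPy can do the same least-squares polynomial fit locally. This is a sketch assuming a cubic, like the formula in the full code; the (x, distance) pairs below are synthetic, generated from a made-up cubic, and are not measurements from the captcha. `fit_conversion` is a hypothetical helper name.

```python
import numpy as np

def fit_conversion(xs, distances, degree=3):
    """Least-squares polynomial fit of slide distance as a function of the matched offset x."""
    coeffs = np.polyfit(xs, distances, degree)  # coefficients, highest power first
    return np.poly1d(coeffs)                    # callable: to_distance(x)

# Synthetic pairs generated from a made-up cubic, NOT real measurements
true_curve = np.poly1d([1e-5, -0.01, 3.0, 20.0])
xs = np.array([250, 300, 350, 400, 450, 500], dtype=float)
distances = true_curve(xs)

to_distance = fit_conversion(xs, distances)
print(to_distance.coeffs)     # fitted coefficients, highest power first
print(int(to_distance(420)))  # predicted slide distance for x = 420
```

A cubic fitted on a narrow range of x can swing far away outside that range, so only trust it for x values close to the ones you measured.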


7. Finally, here is the complete code!
import time

import cv2
import numpy as np
import requests
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

def get_diff_location():
    # Read both images directly as grayscale
    block = cv2.imread("block.jpg", cv2.IMREAD_GRAYSCALE)  # Missing-piece image
    index = cv2.imread("index.jpg", cv2.IMREAD_GRAYSCALE)  # Background image with the gap
    if block is None or index is None:
        raise FileNotFoundError("block.jpg or index.jpg could not be read")
    # Invert the piece image (255 - pixel value)
    block = 255 - block
    # Slide the piece (template) over the background (image); result[row, col] is the match score at each position
    result = cv2.matchTemplate(index, block, cv2.TM_CCOEFF_NORMED)
    row, col = np.unravel_index(result.argmax(), result.shape)
    # print("Offset in x direction", int(col * 0.4 + 18), 'row:', row, 'col:', col)
    # The column of the best match is the horizontal offset of the gap in the background
    return col

def run():
    url = 'https://www.zhihu.com/signin?next=%2F'
    option = webdriver.ChromeOptions()
    option.add_experimental_option('excludeSwitches', ['enable-automation'])  # webdriver anti-detection
    option.add_argument("--no-sandbox")
    option.add_argument("--disable-dev-shm-usage")
    # Page load strategy "none": driver.get() returns without waiting for the page to finish loading
    option.page_load_strategy = "none"
    driver = webdriver.Chrome(options=option)

    driver.get(url)
    time.sleep(2)
    driver.find_element(By.XPATH, '//*[@id="root"]/div/main/div/div/div/div[3]/span[2]/button[2]').click()  # Click the QQ authorization login button
    time.sleep(2)
    driver.switch_to.window(driver.window_handles[-1])  # Switch to the newly opened QQ login window
    time.sleep(1)
    driver.switch_to.frame("ptlogin_iframe")
    driver.find_element(By.ID, 'switcher_plogin').click()  # Switch to account/password login
    time.sleep(2)
    # Enter the QQ account and password (placeholders)
    driver.find_element(By.ID, 'u').send_keys('123123123')
    driver.find_element(By.ID, 'p').send_keys('123123123')
    time.sleep(1)
    # Click to log in
    driver.find_element(By.ID, 'login_button').click()
    time.sleep(3)
    driver.switch_to.frame('tcaptcha_iframe')
    while True:
        # Save the background image with the gap
        bg_url = driver.find_element(By.ID, 'slideBg').get_attribute('src')
        with open('index.jpg', 'wb') as f:
            f.write(requests.get(bg_url, timeout=10).content)
        # Save the missing-piece image
        block_url = driver.find_element(By.ID, 'slideBlock').get_attribute('src')
        with open('block.jpg', 'wb') as f:
            f.write(requests.get(block_url, timeout=10).content)
        # Get the slider handle and press it
        button = driver.find_element(By.ID, 'tcaptcha_drag_thumb')
        ActionChains(driver).click_and_hold(button).perform()
        x = get_diff_location()
        print('Distance before fitting:', x)
        if x > 500:
            # Implausible match: release the mouse and refresh the captcha
            print('The distance before fitting is wrong')
            ActionChains(driver).release().perform()
            driver.find_element(By.ID, 'e_reload').click()
            time.sleep(2)
            continue
        # Cubic fitted from measured (x, distance) pairs
        distance = int(-0.002886710239855681*x*x*x + 4.044880174577657*x*x - 1888.1544118978823*x + 293800.78433441074)
        if 200 < distance < 300:
            distance -= 100
        elif distance > 300:
            print('Distance error')
            # Release the mouse and refresh the captcha
            ActionChains(driver).release().perform()
            driver.find_element(By.ID, 'e_reload').click()
            time.sleep(2)
            continue
        print('Approximate distance to slide:', distance)
        ActionChains(driver).move_by_offset(xoffset=distance, yoffset=0).perform()
        time.sleep(1)
        ActionChains(driver).release().perform()  # Release the mouse
        time.sleep(2)
        # If the slider is still present, verification failed
        try:
            driver.find_element(By.ID, 'tcaptcha_drag_thumb')
            print('Swipe failed, will try again!')
            driver.find_element(By.ID, 'e_reload').click()  # Refresh the captcha
            time.sleep(2)
        except Exception:
            # Slider (or its frame) is gone: treat as success
            break

    print('Verification succeeded!')
    time.sleep(11111)  # Keep the browser open to inspect the result

if __name__ == '__main__':
    run()
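Because the fitted cubic and its thresholds only hold for the author's setup, it can help to pull the conversion out of the loop into a pure function that can be checked without a browser. This is a sketch: `plan_slide` is a hypothetical name, and the coefficients and range checks are copied unchanged from run().

```python
def plan_slide(x):
    """Map the matched offset x to a slide distance using the fitted cubic and
    range checks from run(). Returns None when the captcha should be refreshed."""
    if x > 500:
        return None  # implausible match position
    distance = int(-0.002886710239855681 * x * x * x
                   + 4.044880174577657 * x * x
                   - 1888.1544118978823 * x
                   + 293800.78433441074)
    if 200 < distance < 300:
        distance -= 100
    elif distance > 300:
        return None  # fitted distance out of range
    return distance

for x in (400, 450, 501):
    print(x, plan_slide(x))
```

run() could then call plan_slide(x) and refresh the captcha whenever it returns None.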


Tags: Python Selenium crawler

Posted by navarre on Sat, 14 May 2022 19:44:23 +0300