Python crawler basics: Selenium

1, Selenium + Python environment setup and configuration

1.1 introduction to selenium

Selenium is a tool for automated testing of web applications. Many people who study functional test automation come to prefer Selenium because it has several advantages over QTP:

  • Free and open source, so there is no need to hunt for a cracked copy of QTP.
  • Compact: it is just a package for each supported language, whereas QTP requires downloading and installing more than 1 GB of software.
  • Most importantly, whether you are most comfortable with Java, Ruby, Python, or C#, you can write automated tests with Selenium, while QTP only supports VBScript.
  • Supports multiple platforms (Windows, Linux, and macOS) and multiple browsers (IE, Firefox, Safari, Opera, and Chrome).
  • Supports distributed execution: test cases can be dispatched to different test machines, with Selenium acting as the distributor.

Official documents:

1.2 selenium+Python environment configuration

Prerequisite: a Python development environment is already installed (Python 3.5 or later is recommended)

Installation steps:

  1. Install selenium
    Win: pip install selenium
    Mac: pip3 install selenium
  2. Install webdriver
    For the webdriver download address of each browser, please refer to:
    Chrome:
    Note: the webdriver version must match both the browser version and the selenium version
    Webdriver version    Supported Chrome versions
    v2.41                v67-69
    v2.40                v66-68
    v2.39                v66-68
    v2.38                v65-67
    v2.37                v64-66
    v2.36                v63-65
    v2.35                v62-64
    v2.34                v61-63
    v2.33                v60-62
  3. Webdriver installation path
    Win: copy the webdriver executable to the Python installation directory (or any other directory on PATH)
    Mac: copy the webdriver executable to the /usr/local/bin directory
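
A quick way to check the installation is to ask Python whether the driver executable can be found on PATH, and to look up the compatibility table above in code. The sketch below uses only the standard library. The helper names find_driver and is_compatible are hypothetical (they are not part of Selenium), and chromedriver and geckodriver are assumed to be the executable names of the Chrome and Firefox drivers.

```python
import shutil

# Compatibility table copied from this article:
# chromedriver 2.x version -> (lowest, highest) supported Chrome major version.
SUPPORTED_CHROME = {
    "2.41": (67, 69), "2.40": (66, 68), "2.39": (66, 68),
    "2.38": (65, 67), "2.37": (64, 66), "2.36": (63, 65),
    "2.35": (62, 64), "2.34": (61, 63), "2.33": (60, 62),
}


def find_driver(executable="chromedriver"):
    """Return the full path of a driver executable on PATH, or None if it is missing."""
    return shutil.which(executable)


def is_compatible(driver_version, chrome_major):
    """Check a chromedriver version against a Chrome major version using the table above."""
    if driver_version not in SUPPORTED_CHROME:
        return False  # version not listed in the table
    lowest, highest = SUPPORTED_CHROME[driver_version]
    return lowest <= chrome_major <= highest


if __name__ == "__main__":
    for name in ("chromedriver", "geckodriver"):
        path = find_driver(name)
        print(f"{name}: {path if path else 'not found on PATH'}")
    print("chromedriver 2.41 with Chrome 68:", is_compatible("2.41", 68))
```

If find_driver returns None, Selenium will most likely fail to start that browser until the driver is copied to a directory on PATH.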

2, Locating elements and basic browser operations

2.1 launch browser

2.1.1 normal mode startup

Launch Chrome browser:

from selenium import webdriver

browser = webdriver.Chrome()

Launch Firefox browser:

from selenium import webdriver

browser = webdriver.Firefox()

Start IE browser:

from selenium import webdriver

browser = webdriver.Ie()

2.1.2 Headless mode startup

Headless Chrome is Chrome running without a visible user interface. Your program can use every feature Chrome supports without opening a browser window. Compared with a regular browser, headless Chrome is more convenient for testing web applications, taking screenshots of websites, and crawling pages for information. Compared with earlier tools such as PhantomJS and SlimerJS, headless Chrome is much closer to a real browser environment.

Chrome versions required for headless mode:
According to the official documentation, Chrome 59+ is required on macOS and Linux and Chrome 60+ on Windows, and chromedriver must be version 2.30 or later.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

chrome_options = webdriver.ChromeOptions()
# Use headless (no interface) browser mode
chrome_options.add_argument('--headless')  # Add the no-interface option
chrome_options.add_argument('--disable-gpu')  # Without this option there are sometimes positioning problems

# Start the browser and get the web page source code
# (options= replaces the older chrome_options= keyword, which newer selenium releases no longer accept)
browser = webdriver.Chrome(options=chrome_options)
mainUrl = ""  # set this to the page you want to load
browser.get(mainUrl)
print(f"browser text = {browser.page_source}")

2.1.3 load configuration and start browser

A browser started by Selenium uses a fresh, empty profile by default and does not load any of your existing configuration. Here is how to load your Chrome configuration:

Enter chrome://version/ in the Chrome address bar, look up your "Profile Path", and pass that profile to the browser at startup. The code is as follows:

from selenium import webdriver
option = webdriver.ChromeOptions()
# Set to your own user data directory; the raw string (r'...') keeps the backslashes literal
option.add_argument(r'--user-data-dir=C:\Users\Administrator\AppData\Local\Google\Chrome\User Data')
driver = webdriver.Chrome(options=option)

Loading a Firefox configuration works a little differently:

Open Firefox, click the menu button in the upper right corner, then choose ? (Help) > Troubleshooting Information > Show Folder. Copy the path of the folder that opens.

# coding=utf-8
from selenium import webdriver
# Profile address
profile_directory = r'C:\Users\xxx\AppData\Roaming\Mozilla\Firefox\Profiles\1x41j9of.default'
# Load configuration
profile = webdriver.FirefoxProfile(profile_directory)
# Launch browser configuration
driver = webdriver.Firefox(profile)

2.2 element positioning

Locating objects is the core of automated testing: to operate on an object, you must first identify it. An object is like a person in that it has various characteristics (attributes). We can find a person by ID number, by name, or by the street, floor, and house number where they live. An object has similar attributes, and we can find the object through them.

webdriver provides a series of methods for locating objects, including the following (Selenium 4.3 and later removed these helpers in favor of find_element(By.ID, ...), find_element(By.NAME, ...), and so on):

  • ID location: find_element_by_id()
  • Name location: find_element_by_name()
  • Class location: find_element_by_class_name()
  • Link text location: find_element_by_link_text()
  • Partial link text location: find_element_by_partial_link_text()
  • Tag location: find_element_by_tag_name()
  • XPath location: find_element_by_xpath()
  • CSS location: find_element_by_css_selector()

from selenium import webdriver
######### Ways to locate the Baidu input box ##########
# Locate by id
# Locate by name
# Locate by tag name
# Locate by class name
# Locate by CSS
# Locate by xpath
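
The locating methods listed above can be sketched with the generic find_element(by, value) call, which works in both older and newer Selenium releases. The strings in STRATEGIES are the values of the constants on selenium.webdriver.common.by.By, written out so the mapping is visible. The helper locate is hypothetical. Of the Baidu values, only the id kw appears in this article; the name wd and the class s_ipt are assumptions used for illustration.

```python
# String values of the By constants, keyed by the suffix of the old find_element_by_* helper.
STRATEGIES = {
    "id": "id",
    "name": "name",
    "class_name": "class name",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "tag_name": "tag name",
    "xpath": "xpath",
    "css_selector": "css selector",
}


def locate(driver, how, value):
    """Equivalent of driver.find_element_by_<how>(value), using the generic API."""
    return driver.find_element(STRATEGIES[how], value)


# Ways to reach the Baidu search box. "kw" comes from this article;
# "wd" and "s_ipt" are assumed attribute values, not confirmed by the source.
BAIDU_SEARCH_BOX = [
    ("id", "kw"),
    ("name", "wd"),
    ("class_name", "s_ipt"),
    ("css_selector", "#kw"),
    ("xpath", "//input[@id='kw']"),
]
```

With a running browser, each entry could be tried as locate(browser, how, value); every call should return the same input element if the assumed attribute values match the page.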

2.2.1 solutions when class contains spaces:

When locating elements in practice, you will often find that the class attribute is a compound of several class names separated by spaces. If locating by the full attribute value raises an error, it can be handled in the following ways:

  • If one of the space-separated class names is unique on the page, locate by that single name
  • If none of the space-separated class names is unique, locate the group and select by index
  • Locate with a CSS selector (replace each space with '.')

# Prefix the class with a dot (.) and replace each space with a dot (.)
self.driver.find_element_by_css_selector('.dtb-style-1.table-dragColumns').click()
# Match the entire class attribute, spaces included
self.driver.find_element_by_css_selector('[class="dtb-style-1 table-dragColumns"]').click()

Reference code:

# coding:utf-8
from selenium import webdriver
driver = webdriver.Firefox()
# Method 1: locate by a single class name
# Method 2: locate the whole group and pick one by index (the least robust approach)
# Method 3: locate with a CSS selector
# Method 4: locating by the other single class name also works
# Method 5: CSS attribute selector that contains the space directly
driver.find_element_by_css_selector("[class='j-inputtext dlemail']").send_keys("yoyo")
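
The two CSS approaches above come down to simple string rewriting, which can be sketched without a browser. The helper names class_to_css and class_to_exact_css are hypothetical and only build selector strings; passing the result to a CSS locator is left to the caller.

```python
def class_to_css(class_attr):
    """Turn a class attribute such as "a b" into the CSS selector ".a.b"."""
    names = class_attr.split()
    if not names:
        raise ValueError("class attribute is empty")
    return "".join("." + name for name in names)


def class_to_exact_css(class_attr):
    """Match the class attribute as one literal string, spaces included."""
    return f"[class='{class_attr}']"


print(class_to_css("j-inputtext dlemail"))        # .j-inputtext.dlemail
print(class_to_exact_css("j-inputtext dlemail"))  # [class='j-inputtext dlemail']
```

The dotted form matches any element carrying both class names in any order, while the attribute form only matches when the class attribute is exactly that string.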

2.3 three waiting modes of selenium

Sometimes, to keep a script stable, waits need to be added to it.

2.3.1 forced waiting

The first and simplest way is a forced wait with sleep(xx), which requires importing the time module. Whether or not the browser has finished loading, the program waits the full 3 seconds and then continues with the following code. This is very useful for debugging, and an occasional sleep in the code is fine, but it is not recommended as the regular way to wait: it is too rigid and seriously slows down the script.

# -*- coding: utf-8 -*-
from selenium import webdriver
import time

driver = webdriver.Firefox()

time.sleep(3)  # Force a wait of 3 seconds before proceeding to the next step


2.3.2 implicit waiting

The second method is called implicit waiting. Adding the implicitly_wait() method is an easy way to get smarter waiting: implicitly_wait(30) is a better choice than time.sleep() because sleep can only wait for a fixed time, while an implicit wait returns as soon as the element turns up within the time limit.

# -*- coding: utf-8 -*-
from selenium import webdriver

driver = webdriver.Firefox()
driver.implicitly_wait(30)  # Wait implicitly for up to 30 seconds


An implicit wait sets a maximum waiting time for locating elements. If the element is found within that time, the script moves on immediately; otherwise the driver keeps polling the DOM until the time expires and then raises NoSuchElementException. Note the drawback: an implicit wait only covers whether an element exists. It cannot wait for an element to become visible or clickable, for text to change, or for an element to disappear, and the same timeout applies to every lookup. Separately, driver.get() by default returns only after the page has finished loading, that is, when the small spinner in the browser tab stops turning, so slow scripts can hold things up even though the element I want is already there. What if I want to wait for exactly the condition I care about? There is a way: the other waiting method Selenium provides, explicit waiting.
Note that an implicit wait stays in effect for the whole life of the driver, so it only needs to be set once. I have seen people use implicit waits like sleep, setting one everywhere they go.

2.3.3 explicit waiting

The third method is explicit waiting with WebDriverWait. Combined with the class's until() and until_not() methods, it can wait flexibly on a condition. The idea is: the program checks every xx seconds; if the condition holds, it moves on, otherwise it keeps waiting until the maximum time is exceeded and then throws TimeoutException.

The WebDriverWait class in the wait module is the explicit wait class. First look at its parameters and methods:

__init__(driver, timeout, poll_frequency=0.5, ignored_exceptions=None)

driver: the WebDriver instance, that is, the driver from the earlier examples
timeout: the maximum waiting time in seconds (the implicit waiting time also has to be taken into account)
poll_frequency: the interval between calls to the method passed to until or until_not. The default is 0.5 seconds
ignored_exceptions: exceptions to ignore. If an exception in this tuple is raised while until or until_not is calling the method, the code is not interrupted and keeps waiting. If an exception outside this tuple is raised, the wait is interrupted and the exception propagates. By default only NoSuchElementException is ignored.

until(method, message='')

method: during the wait, the passed-in method is called at intervals (the poll_frequency from __init__) until its return value is not False
message: if the wait times out, TimeoutException is thrown and message is passed into the exception

until_not(method, message='')

The opposite of until: until continues once an element appears or a condition becomes true, while until_not continues once an element disappears or a condition stops being true. The parameters are the same and are not repeated here.

After reading the above, the calling pattern is basically clear:

WebDriverWait(driver, timeout, poll_frequency, ignored_exceptions).until(method, message)

What needs special attention here is the method parameter of until or until_not. Many people pass in a WebElement object, as follows:

WebDriverWait(driver, 10).until(driver.find_element_by_id('kw'))  # wrong

This is the wrong usage. The argument must be callable, that is, the object must have a __call__() method; otherwise an exception is thrown:

TypeError: 'xxx' object is not callable
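
To see why the argument has to be callable, it helps to picture the polling loop behind until(). The sketch below is a simplified stand-in written for this article, not Selenium's actual implementation: wait_until and WaitTimeout are hypothetical names, and the driver is simply whatever object gets passed to the method on each poll.

```python
import time


class WaitTimeout(Exception):
    """Stand-in for selenium's TimeoutException in this sketch."""


def wait_until(driver, method, timeout, poll_frequency=0.5,
               ignored_exceptions=(), message=""):
    """Call method(driver) every poll_frequency seconds until it returns a truthy value."""
    end_time = time.monotonic() + timeout
    while True:
        try:
            value = method(driver)  # raises TypeError if method is not callable
            if value:
                return value
        except ignored_exceptions:
            pass  # treated as "not ready yet", keep polling
        if time.monotonic() > end_time:
            raise WaitTimeout(message)
        time.sleep(poll_frequency)
```

Because the loop calls method(driver) again on every poll, passing a lambda such as lambda d: d.find_element("id", "kw") lets the lookup be retried, whereas passing an already located element gives the loop nothing to call.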

Here you can use the various conditions in the expected_conditions module provided by selenium, the is_displayed(), is_enabled(), and is_selected() methods of WebElement, or a method you wrap yourself.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

base_url = ""  # set this to the page under test
driver = webdriver.Firefox()
# When implicit and explicit waits are both set, the effective timeout is the larger of the two;
# the Selenium documentation advises against mixing them because wait times become hard to predict
driver.get(base_url)
locator = (By.ID, 'kw')

# title_is: check the title; returns a Boolean ("百度一下，你就知道" is the Baidu home page title)
WebDriverWait(driver, 10).until(EC.title_is(u"百度一下，你就知道"))

# title_contains: check whether the title contains the string; returns a Boolean
WebDriverWait(driver, 10).until(EC.title_contains(u"百度一下"))

# presence_of_element_located: the element has been added to the DOM tree, which does not mean it is visible; returns the WebElement once located

# visibility_of_element_located: the element is in the DOM and visible, meaning it is displayed and its width and height are both greater than 0

# visibility_of: check whether an already located element is visible; returns the element if it is

# presence_of_all_elements_located: at least one matching element exists in the DOM tree; returns the list of located elements

# visibility_of_any_elements_located: at least one matching element is visible on the page; returns the list of located elements

# text_to_be_present_in_element: the element's text contains the expected string ("设置" means "Settings"); returns a Boolean
WebDriverWait(driver, 10).until(EC.text_to_be_present_in_element((By.XPATH, "//*[@id='u1']/a[8]"), u'设置'))

# text_to_be_present_in_element_value: the element's value attribute contains the expected string; returns a Boolean
WebDriverWait(driver, 10).until(EC.text_to_be_present_in_element_value((By.CSS_SELECTOR, '#su'), u'百度一下'))

# frame_to_be_available_and_switch_to_it: if the frame can be switched into, switch in and return True, otherwise return False
# Note: this page has no frame that can be switched into

# invisibility_of_element_located: the element is absent from the DOM or invisible; returns False if it is visible, otherwise returns the element
# Note: Cookwrap is an element hidden in this page

# element_to_be_clickable: the element is visible and enabled, which is what counts as clickable

# staleness_of: wait for an element to be removed from the DOM tree
# There is no suitable example here

# element_to_be_selected: the element is selected; generally used with drop-down lists

# element_selection_state_to_be: the selected state of an already located element matches the expected state

# element_located_selection_state_to_be: the selected state of the element found by a locator matches the expected state

# alert_is_present: if there is an alert on the page, switch to it and return the alert
instance = WebDriverWait(driver, 10).until(EC.alert_is_present())
print(instance.text)
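
Besides the built-in conditions, a condition you wrap yourself only needs to be a callable that accepts the driver and returns a truthy value when the wait should end. The sketch below shows two hypothetical examples written for this article: title_starts_with is a class-based condition in the style of the built-in ones, and element_is_ready wraps the is_displayed() and is_enabled() checks of an already located element.

```python
class title_starts_with:
    """Custom condition: truthy once the page title starts with the given prefix."""

    def __init__(self, prefix):
        self.prefix = prefix

    def __call__(self, driver):
        # The wait passes the driver in on every poll.
        return driver.title.startswith(self.prefix)


def element_is_ready(element):
    """Build a condition from WebElement state checks; the driver argument is unused."""
    return lambda _driver: element.is_displayed() and element.is_enabled()
```

Either one could then be passed to a wait, for example WebDriverWait(driver, 10).until(title_starts_with("Baidu")), where "Baidu" is just a placeholder prefix.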


2.4 browser operation

2.4.1 browser maximization and minimization

Maximize the browser window:

browser.maximize_window()

Minimize the browser window:

browser.minimize_window()

2.4.2 browser setting window size

Set the browser window to a width of 480 and a height of 800:

browser.set_window_size(480, 800)

2.4.3 browser forward and backward

Forward:

browser.forward()

Back:

browser.back()

2.5 operation test object

Generally speaking, the following members are most commonly used to operate on objects in webdriver:

  • click() -- click the object
  • send_keys() -- simulate keyboard input on the object
  • clear() -- clear the contents of the object, if possible
  • submit() -- submit the contents of the object, if possible
  • text -- attribute used to obtain the text of the element
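
As a minimal sketch of how these members combine, the helpers below clear an input box, type into it, and submit its form, and read the text of an element. The function names search and read_text are hypothetical, and the element is assumed to have been located already with one of the methods from section 2.2.

```python
def search(box, query):
    """Clear an input element, type a query, and submit the enclosing form."""
    box.clear()            # remove any existing text
    box.send_keys(query)   # simulate typing
    box.submit()           # submit the form the element belongs to


def read_text(element):
    """Return the element's text (text is a property, not a method)."""
    return element.text
```

For a button or link, element.click() would take the place of submit().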

2.6 keyboard events

To send keyboard keys, you need to import the Keys class:
from selenium.webdriver.common.keys import Keys

Pass a key to send_keys():
send_keys(Keys.TAB)  # TAB
send_keys(Keys.ENTER)  # Enter

Reference code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys  # the Keys class must be imported
import os, time

driver = webdriver.Firefox()

driver.maximize_window()  # Maximize the browser window


# Pressing TAB moves the focus on, which clears the default prompt text in the password box, the same effect as clear() above

# Locate the password box and press ENTER there instead of clicking the login button

# Alternatively, locate the login button and send ENTER to it instead of calling click()


Usage of keyboard combination keys:

# Ctrl+A selects the entire contents of the input box: send_keys(Keys.CONTROL, 'a')
# Ctrl+X cuts the contents of the input box: send_keys(Keys.CONTROL, 'x')

2.7 mouse events

Mouse events generally include right clicking, double clicking, dragging, hovering over an element, and so on.
They require the ActionChains class, which is imported like this:
from selenium.webdriver.common.action_chains import ActionChains

Common ActionChains methods:
perform()          execute all the actions stored in the ActionChains;
context_click()    right click;
double_click()     double click;
drag_and_drop()    drag;
move_to_element()  hover the mouse over an element.

Mouse double click example:

# Locate the element you want to double-click
qqq = driver.find_element_by_xpath("xxx")
# Double-click the located element
ActionChains(driver).double_click(qqq).perform()

Mouse drag and drop example:

# Locate the element at its original position
element = driver.find_element_by_name("source")
# Locate the target position the element is to be moved to
target = driver.find_element_by_name("target")
# Perform the drag
ActionChains(driver).drag_and_drop(element, target).perform()

2.8 multi-layer frame / window positioning

In the process of locating elements, we often run into elements that cannot be found, which is generally caused by one of the following:

  • Incorrect element positioning method
  • The page has an iframe or embedded window
  • Page timeout

webdriver provides the switch_to.frame() method (written switch_to_frame() in older releases), which easily solves the frame case.

# First switch into iframe1 (id = f1)
browser.switch_to.frame("f1")
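
When frames are nested, it is easy to forget to switch back out. One possible pattern, sketched here as an assumption rather than something this article prescribes, is a small context manager built on switch_to.frame() and switch_to.parent_frame(). The name inside_frame is hypothetical, and the frame id f2 in the usage note is a made-up example of a second-level frame.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for the duration of a with-block, then back to its parent."""
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # parent_frame() steps out one level, so nested with-blocks unwind correctly.
        driver.switch_to.parent_frame()
```

Nested use would look like: with inside_frame(browser, "f1"): with inside_frame(browser, "f2"): followed by the element lookups; when both blocks exit, the driver is back in the top-level document.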

Similarly, for an embedded window, switch with switch_to.window(), passing the window's name or handle.

2.9 Expected Conditions analysis

There are two usage scenarios for Expected Conditions:

  • Use directly in assertions
  • Use with WebDriverWait to dynamically wait for elements on the page to appear or disappear

Relevant methods:

  • title_is: checks whether the title of the current page is exactly equal to the expected value
  • title_contains: checks whether the title of the current page contains the expected string
  • presence_of_element_located: checks whether an element has been added to the DOM tree, which does not mean the element is visible
  • visibility_of_element_located: checks whether an element is visible. Visible means the element is not hidden and its width and height are both non-zero
  • visibility_of: does the same as the previous method, except that the previous one takes a locator and this one takes an already located element
  • presence_of_all_elements_located: checks whether at least one matching element exists in the DOM tree. For example, if n elements on the page have the class 'column-md-3', this condition succeeds as long as one of them exists
  • text_to_be_present_in_element: checks whether the text of an element contains the expected string
  • text_to_be_present_in_element_value: checks whether the value attribute of an element contains the expected string
  • frame_to_be_available_and_switch_to_it: checks whether the frame can be switched into. If so, it switches in and returns True; otherwise it returns False
  • invisibility_of_element_located: checks that an element is either absent from the DOM tree or invisible
  • element_to_be_clickable: checks whether an element is visible and enabled, which is what is meant by clickable
  • staleness_of: waits for an element to be removed from the DOM tree. Note that this method also returns True or False
  • element_to_be_selected: checks whether an element is selected. Generally used with drop-down lists
  • element_selection_state_to_be: checks whether the selected state of an element matches the expected state
  • element_located_selection_state_to_be: does the same as the previous method, except that the previous one takes an already located element and this one takes a locator
  • alert_is_present: checks whether there is an alert on the page, a question many students ask about

Judge the title: title_is(), title_contains()

  1. First import the expected_conditions module
  2. Because the module name is rather long, it is aliased as EC for convenience in later calls (a bit like aliasing tables when querying multiple tables in a database)
  3. After opening the blog home page, check the title; the returned result is True or False

# coding:utf-8
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
# Check that the title is exactly equal to the expected value
title = EC.title_is(u'Baidu')
print(title(driver))

# Check whether the title contains the expected string
title1 = EC.title_contains(u'Baidu')
print(title1(driver))

# Another way of writing it
r1 = EC.title_is(u'Baidu')(driver)
r2 = EC.title_contains(u'Baidu')(driver)
print(r1)
print(r2)

3, Selenium checklist

3.1 Python Webdriver Exception quick lookup table

Various exceptions may occur during the use of webdriver. We need to understand the exception and know how to handle it.

Exception                           Description
WebDriverException                  Base class of all webdriver exceptions. Thrown for errors that do not fall under any of the exceptions below
InvalidSwitchToTargetException      Parent class of the next two exceptions. Thrown when the target to switch to does not exist
NoSuchFrameException                Thrown when switch_to.frame() is used to enter a frame that does not exist
NoSuchWindowException               Thrown when switch_to.window() is used to enter a window that does not exist
NoSuchElementException              The element does not exist. Usually thrown by find_element (find_elements returns an empty list instead)
NoSuchAttributeException            Usually thrown when reading an attribute the element does not have. Note that some attributes have different names in different browsers
StaleElementReferenceException      The referenced element is stale and no longer in the current DOM tree. It may have been deleted, or the page or iframe may have been refreshed
UnexpectedAlertPresentException     An unexpected alert appeared and blocked the command from executing
NoAlertPresentException             Thrown when you try to get an alert but none is actually present
InvalidElementStateException        Parent class of the next two exceptions. Thrown when the element's state does not allow the desired operation
ElementNotVisibleException          The element exists but is not visible, so it cannot be interacted with
ElementNotSelectableException       Thrown when you try to select an element that cannot be selected
InvalidSelectorException            Usually thrown when your XPath syntax is wrong
InvalidCookieDomainException        Thrown when you try to add a cookie for a domain other than that of the current URL
UnableToSetCookieException          Thrown when the driver cannot add a cookie
TimeoutException                    Thrown when a command does not complete in the allotted time
MoveTargetOutOfBoundsException      Thrown when an ActionChains move operation would move the target outside the window
UnexpectedTagNameException          Thrown when the element's tag is not the expected one, for example when Select is instantiated with an element that is not a select tag
ImeNotAvailableException            Thrown when the input method is not supported. This exception and the next one are uncommon; the IME engine is reportedly only used for Chinese / Japanese input on Linux
ImeActivationFailedException        Thrown when activating the input method fails
ErrorInResponseException            Uncommon. May be thrown when an error occurs on the server side
RemoteDriverServerException         Uncommon. Reportedly raised in some cases when the driver fails to start the browser

3.2 quick reference table of XPath & CSS positioning method

Description                     XPath                               CSS
Direct child element            //div/a                             div > a
Child or descendant element     //div//a                            div a
Locate by id                    //div[@id='idValue']//a             div#idValue a
Locate by class                 //div[@class='classValue']//a       div.classValue a
Sibling element                 //ul/li[@class='first']/following-  ul>li.first + li
Attribute                       //form/input[@name='username']      form input[name='username']
Multiple attributes             //input[@name='continue' and        input[name='continue'][type='button
4th child element               //ul[@id='list']/li[4]              ul#list li:nth-child(4)
1st child element               //ul[@id='list']/li[1]              ul#list li:first-child
Last child element              //ul[@id='list']/li[last()]         ul#list li:last-child
Attribute contains a string     //div[contains(@title,'Title')]     div[title*="Title"]
Attribute starts with a string  //input[starts-with(@name,'user')]  input[name^="user"]
Attribute ends with a string    //input[ends-with(@name,'name')]    input[name$="name"]
Text contains a string          //div[contains(text(), 'text')]     Not possible
Element has an attribute        //div[@title]                       div[title]
Parent node                     //div/..                            Not possible
Preceding sibling node          //li/preceding-sibling::div[1]      Not possible

Note: ends-with() is an XPath 2.0 function. Browsers implement XPath 1.0, so that expression may not work when used through Selenium.
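
Several rows of the XPath column can be tried out without a browser using Python's standard xml.etree.ElementTree, which implements a subset of XPath. This is only a sketch for building intuition: the sample markup in PAGE is made up for illustration, ElementTree requires paths to start with ./ or .// relative to the root, and functions such as contains() or starts-with() are not available there even though browsers support them.

```python
import xml.etree.ElementTree as ET

# A made-up page fragment used only for this demonstration.
PAGE = """
<body>
  <div id="idValue">
    <a>direct child</a>
    <p><a>nested descendant</a></p>
  </div>
  <ul id="list">
    <li>one</li><li>two</li><li>three</li><li>four</li><li>five</li>
  </ul>
</body>
""".strip()

root = ET.fromstring(PAGE)


def texts(path):
    """Return the text of every element matched by an ElementTree path."""
    return [element.text for element in root.findall(path)]


print(texts(".//div/a"))                       # direct children only
print(texts(".//div//a"))                      # children and deeper descendants
print(texts(".//div[@id='idValue']//a"))       # located through the id attribute
print(texts(".//ul[@id='list']/li[4]"))        # 4th child
print(texts(".//ul[@id='list']/li[last()]"))   # last child
```

The first two calls show the difference between the "Direct child element" and "Child or descendant element" rows: the single slash finds one link, the double slash finds both.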


Here is an online code beautification tool, online access address:

In addition, if you use VS Code, you can install the corresponding plug-in to beautify code quickly.

  • Open VS Code and enter carbon-now-sh in the extensions search box
  • Click Install
  • Click Reload to finish the installation
  • Press the shortcut key Alt + Cmd + A (on Windows: Alt + Win + A)


Tags: Python

Posted by lip9000 on Thu, 05 May 2022 23:31:18 +0300