select, poll and epoll are all I/O multiplexing mechanisms. I/O multiplexing lets one process monitor many file descriptors at once. As soon as a descriptor becomes ready (usually readable or writable), the program is notified so that it can perform the corresponding read or write.
In essence, select, poll and epoll are all synchronous I/O. After a read or write event is reported as ready, the process still performs the read or write itself and blocks while the data is being copied.
Asynchronous I/O does not require the process to do the read or write itself. The asynchronous I/O implementation copies the data from kernel space into user space on the process's behalf.
Differences among select, poll and epoll:
select:
Supported on almost every platform
There is a maximum number of file descriptors a single process can monitor; on Linux the default is 1024 (FD_SETSIZE)
This limit can be raised by modifying the FD_SETSIZE macro definition and recompiling
When data is ready, the kernel tells the process that some descriptor is ready but not which connection it belongs to, so the process has to poll every descriptor to find it
Suppose select has the kernel monitor 100 socket connections. When one connection receives data, the kernel notifies the process that one of the 100 connections has data
It does not say which one, so the process must check the connections one by one and only then read the data
Here we assumed 100 connections. What if there are tens or hundreds of thousands?
Then every notification requires tens or hundreds of thousands of checks to find a single ready connection, which wastes a great deal of work
Only level-triggered mode is supported
Every call to select copies the fd set from user space to kernel space; this overhead becomes large when there are many fds
Every call to select also makes the kernel traverse all the fds passed in; this overhead likewise becomes large when there are many fds
poll:
Essentially the same as select, except that there is no limit on the maximum number of file descriptors
Only level-triggered mode is supported
Largely a transitional design; rarely used
epoll:
Introduced in Linux 2.6, epoll has all the advantages of select and poll and is widely regarded as the best I/O readiness-notification mechanism on Linux
No fixed limit on the number of file descriptors (only the system's open-file limit applies)
Supports both level-triggered and edge-triggered modes
Not supported on Windows
When data is ready, the kernel tells the process exactly which connections are ready
I/O efficiency does not degrade linearly as the number of fds grows
Often said to use mmap to speed up message passing between kernel and user space; in fact the speed-up comes from registering each fd once with epoll_ctl() and having epoll_wait() copy back only the ready events
Level trigger and edge trigger:
Level-triggered (LT): after ready file descriptors are reported to the process, if the process does not perform I/O on them, they are reported again the next time epoll_wait() is called. This mode is called level triggering
Edge-triggered (ET): the process is told only about file descriptors that have just become ready, and only once. If it takes no action, it is not told again until new data arrives on that descriptor. This mode is called edge triggering
In theory, edge triggering performs better because the same readiness is not reported over and over. However, the code is considerably more complex. Every ET descriptor must be non-blocking, and the program must keep reading (or writing) until the call fails with EAGAIN; otherwise leftover data may never be reported again.
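The difference between the two modes can be observed directly. Below is a minimal sketch that assumes Linux and Python 3's `select.epoll`, with `socket.socketpair()` standing in for a real client connection. It registers one socket in LT or ET mode, reads only one byte between two polls, and counts how often the socket is reported. The `drain()` helper is a hypothetical name; it shows the read-until-EAGAIN loop that ET mode requires.

```python
import select
import socket

def reports_after_partial_read(flags):
    """Poll twice; between polls read only 1 byte. Return (#events 1st poll, #events 2nd poll)."""
    writer, reader = socket.socketpair()
    reader.setblocking(False)            # ET requires non-blocking descriptors
    ep = select.epoll()
    try:
        ep.register(reader.fileno(), flags)
        writer.send(b"hello world")
        first = ep.poll(0)               # timeout 0: check once, never block
        reader.recv(1)                   # consume only part of the pending data
        second = ep.poll(0)              # LT: still readable, reported again; ET: no new edge
        return len(first), len(second)
    finally:
        ep.close()
        writer.close()
        reader.close()

def drain(sock, bufsize=4):
    """Hypothetical helper: in ET mode, keep reading until the socket would block."""
    chunks = []
    while True:
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:          # EAGAIN/EWOULDBLOCK: nothing left for now
            break
        if not data:                     # peer closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)

print("LT:", reports_after_partial_read(select.EPOLLIN))                  # → LT: (1, 1)
print("ET:", reports_after_partial_read(select.EPOLLIN | select.EPOLLET))  # → ET: (1, 0)
```

In ET mode the 10 unread bytes stay in the buffer but are never reported again. This is why ET handlers call something like `drain()` every time they are notified.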
Features of select and epoll:
select:
select() monitors an array of file descriptors through a single system call. When select() returns, the kernel has modified the array in place so that only the ready descriptors remain marked, and the process can then read from or write to them.
Because of network latency, most TCP connections are idle at any given moment. Even so, every select() call scans all sockets linearly, which wastes work.
epoll:
epoll also reports only the ready file descriptors. epoll_wait() does not return the descriptors themselves but the number of ready descriptors. The program then reads that many entries, in order, from the events array it passed to epoll_wait(). This step is often said to use memory mapping (mmap). In fact, epoll_wait() copies only the ready events into that array, so the per-call cost depends on the number of ready descriptors rather than the number being monitored.
Another essential improvement is that epoll uses event-based readiness notification. With select/poll, the kernel scans all monitored file descriptors only after the process makes the call. epoll, by contrast, registers each file descriptor in advance through epoll_ctl(). Once a descriptor becomes ready, the kernel uses a callback mechanism to quickly put it on a ready list, and the process is notified when it next calls epoll_wait().
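The register-once, report-only-what-is-ready model can be illustrated with a minimal sketch. It assumes Linux and Python 3, and 100 `socket.socketpair()` pairs stand in for 100 mostly idle client connections. Each descriptor is registered exactly once, and a single poll returns only the one connection that actually has data.

```python
import select
import socket

# 100 simulated connections; only the reader end of each pair is monitored
pairs = [socket.socketpair() for _ in range(100)]
ep = select.epoll()
for _, reader in pairs:
    ep.register(reader.fileno(), select.EPOLLIN)   # epoll_ctl(EPOLL_CTL_ADD), done once per fd

target_fd = pairs[42][1].fileno()
pairs[42][0].send(b"ping")        # make exactly one connection readable

ready = ep.poll(0)                # epoll_wait(): returns only the ready (fd, events) pairs
print(len(ready), ready[0][0] == target_fd)   # → 1 True

ep.close()
for w, r in pairs:
    w.close()
    r.close()
```

The program never walks the other 99 idle descriptors. With select it would hand all 100 to the kernel on every call.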
select
select(rlist, wlist, xlist, timeout=None)
The file descriptors monitored by select fall into three categories: readfds, writefds and exceptfds (rlist, wlist and xlist in the Python signature above).
After it is called, select blocks until a descriptor is ready (readable, writable, or in an exceptional condition) or until the timeout expires, and then returns. timeout sets how long to wait. In Python, omitting it (None) blocks indefinitely and 0 returns immediately; in C, a NULL timeout blocks indefinitely and a zero timeval returns immediately. When select returns, the program traverses the fd sets to find the ready descriptors.
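The timeout behaviour and the walk over the returned lists are easy to try in Python. Below is a minimal sketch assuming Python 3 on a Unix-like system, with a `socket.socketpair()` standing in for a client connection.

```python
import select
import socket

writer, reader = socket.socketpair()

# timeout=0: check once and return immediately; nothing has been sent yet
first = select.select([reader], [], [reader], 0)
print(first)        # → ([], [], [])

writer.send(b"data")
# timeout=1.0: wait at most one second, but return as soon as reader is readable
readable, _, _ = select.select([reader], [], [reader], 1.0)
received = []
for sock in readable:                 # walk the returned list to handle each ready socket
    received.append(sock.recv(1024))
print(received)     # → [b'data']

writer.close()
reader.close()
```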
poll
int poll (struct pollfd *fds, unsigned int nfds, int timeout);
Unlike select, which uses three bitmaps to represent the three fd sets, poll takes a pointer to an array of pollfd structures.
struct pollfd {
    int fd;         /* file descriptor */
    short events;   /* requested events to watch */
    short revents;  /* returned events witnessed */
};
Each pollfd structure holds both the events to monitor (events) and the events that occurred (revents). As a result, poll no longer uses select's value-result style, in which the same argument serves as both input and output.
There is also no limit on the number of pollfd entries (although performance degrades when there are too many).
As with select, after poll returns the program must iterate over the pollfd array to find the ready descriptors.
So both select and poll have to traverse the file descriptors after returning to find the ready sockets.
In practice, of a large number of simultaneously connected clients, only a few may be ready at any one time, so efficiency drops linearly as the number of monitored descriptors grows.
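Python exposes the same interface through `select.poll()`. The sketch below assumes Python 3 on a Unix-like system and uses a `socket.socketpair()` as the connection. Each `register()` call corresponds to one `struct pollfd` entry. CPython's wrapper performs the scan of the pollfd array internally and returns only the entries whose revents is non-zero, as `(fd, revents)` pairs.

```python
import select
import socket

writer, reader = socket.socketpair()
rfd, wfd = reader.fileno(), writer.fileno()

p = select.poll()
# Each register() corresponds to one struct pollfd: the fd plus its requested events
p.register(reader, select.POLLIN)
p.register(writer, select.POLLOUT)

writer.send(b"x")
# timeout is in milliseconds, as in the C API; the result is a list of (fd, revents)
result = dict(p.poll(100))
for fd, revents in result.items():    # walk the results to find what is ready
    print(fd == rfd, bool(revents & select.POLLIN), bool(revents & select.POLLOUT))

writer.close()
reader.close()
```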
epoll
epoll was introduced in the Linux 2.6 kernel as an enhanced version of select and poll. Compared with them, epoll is more flexible and has no fixed descriptor limit.
epoll uses one file descriptor to manage many descriptors. It stores the events the user is interested in for each descriptor in an event table inside the kernel, so each descriptor only has to be copied between user space and kernel space once, when it is registered.
How epoll is used
epoll is used through three interfaces:
int epoll_create(int size); // Create an epoll handle; size tells the kernel roughly how many descriptors will be monitored
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
1. int epoll_create(int size);
Creates an epoll handle. size tells the kernel roughly how many descriptors will be monitored. It differs from the first parameter of select(), which is the highest descriptor value plus one. size does not limit how many descriptors epoll can monitor; it is only a hint for the kernel's initial allocation of internal data structures (since Linux 2.6.8 it is ignored, but it must still be greater than zero).
Once created, the epoll handle itself occupies an fd, which you can see under /proc/<pid>/fd/ on Linux. Therefore, when you are done with epoll you must close it with close(); otherwise fds may eventually be exhausted.
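The claim that the handle is an ordinary descriptor can be checked from Python. This is a minimal sketch assuming Linux (it reads `/proc/self/fd`) and Python 3.4+, where `select.epoll` objects support the `with` statement.

```python
import os
import select

ep = select.epoll()                          # creates the kernel epoll handle
fd = ep.fileno()
# Linux-specific: the handle appears under /proc/self/fd like any other descriptor
link = os.readlink(f"/proc/self/fd/{fd}")
print(link)                                   # → anon_inode:[eventpoll]
ep.close()                                    # release the descriptor so it does not leak
print(ep.closed)                              # → True

# A with-block closes the handle automatically
with select.epoll() as ep2:
    pass
print(ep2.closed)                             # → True
```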
2. int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
Performs operation op on the target descriptor fd.
epfd: the return value of epoll_create().
op: the operation, expressed with one of three macros:
EPOLL_CTL_ADD (add), EPOLL_CTL_DEL (delete) and EPOLL_CTL_MOD (modify),
which add, delete and modify the monitored events for fd, respectively.
fd: the file descriptor to monitor.
event: a struct epoll_event that tells the kernel which events to listen for.
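In Python's `select.epoll`, the three epoll_ctl() operations correspond to `register()`, `modify()` and `unregister()`. The sketch below assumes Linux and Python 3, and uses a fresh `socket.socketpair()` that has nothing to read but has room in its send buffer.

```python
import select
import socket

writer, reader = socket.socketpair()
ep = select.epoll()
fd = reader.fileno()

ep.register(fd, select.EPOLLIN)            # EPOLL_CTL_ADD: watch for readability
before = ep.poll(0)
print(before)                              # → [] (nothing to read yet)

ep.modify(fd, select.EPOLLOUT)             # EPOLL_CTL_MOD: watch for writability instead
after_mod = dict(ep.poll(0))
print(bool(after_mod[fd] & select.EPOLLOUT))   # → True (send buffer has space)

ep.unregister(fd)                          # EPOLL_CTL_DEL: stop watching fd
after_del = ep.poll(0)
print(after_del)                           # → []

ep.close()
writer.close()
reader.close()
```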
3. int epoll_wait(int epfd, struct epoll_event * events, int maxevents, int timeout);
Waits for I/O events on epfd and returns at most maxevents of them.
events receives the set of ready events from the kernel, and maxevents tells the kernel how large the events array is. maxevents must be greater than zero and does not have to match the size passed to epoll_create(). timeout is in milliseconds: 0 returns immediately and -1 blocks indefinitely. The function returns the number of ready descriptors. A return value of 0 means the call timed out, and -1 indicates an error.
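The timeout and maxevents semantics can be seen with a short experiment. This is a minimal sketch assuming Linux and Python 3. Note that Python's `epoll.poll(timeout, maxevents)` takes the timeout in seconds (a float) rather than the milliseconds used by the C call. Three `socket.socketpair()` pairs stand in for client connections.

```python
import select
import socket
import time

pairs = [socket.socketpair() for _ in range(3)]
ep = select.epoll()
for _, r in pairs:
    ep.register(r.fileno(), select.EPOLLIN)

# Nothing is readable yet, so the 0.1 s timeout expires and an empty list is returned
start = time.monotonic()
timed_out = ep.poll(0.1)
elapsed = time.monotonic() - start

for w, _ in pairs:
    w.send(b"x")                 # make all three readers readable

limited = ep.poll(0, 2)          # maxevents=2: at most 2 of the 3 ready events come back
rest = ep.poll(0)                # level-triggered: all 3 are still ready
print(timed_out, len(limited), len(rest))   # → [] 2 3

ep.close()
for w, r in pairs:
    w.close()
    r.close()
```

Events that do not fit into maxevents are not lost. In level-triggered mode they are simply returned by the next call.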
The code for a simple concurrent socket server based on select is as follows:
#!/usr/bin/python
#Author:sean
import select
import socket
import queue

server = socket.socket()
HOST = 'localhost'
PORT = 8080
print("start up %s on port: %s" % (HOST, PORT))
server.bind((HOST, PORT))
server.listen()
server.setblocking(False)  # Non-blocking

msg_dic_queue = {}  # Queue dictionary: the data waiting to be returned to each client
inputs = [server]   # Connections the kernel should monitor; initially just the server's own listening socket
# inputs = [server, conn]
outputs = []        # Connections that have data waiting to be returned to the client

while True:
    print("waiting for next connect...")
    # If no fd is ready, the program blocks here
    readable, writeable, exceptional = select.select(inputs, outputs, inputs)
    # print(readable, writeable, exceptional)
    for r in readable:  # Handle active connections; each r is a socket object
        if r is server:  # A new connection has arrived
            conn, client_addr = server.accept()
            print("arrived a new connect: ", client_addr)
            conn.setblocking(False)
            # The new connection has not sent any data yet, so calling recv() now would raise an exception.
            # To learn when the client sends data, let select monitor conn as well.
            inputs.append(conn)
            msg_dic_queue[conn] = queue.Queue()  # Queue for the data to be returned to this client
        else:  # r is a connection established with a client
            # Receive the data sent by the client
            data = r.recv(1024)
            if data:
                print("received data from [%s]: " % r.getpeername()[0], data)
                msg_dic_queue[r].put(data)  # Queue the data first; it is sent back later
                if r not in outputs:
                    # Add r to the connections to reply to. The data is not returned immediately,
                    # so that handling this client does not hold up the other connections.
                    outputs.append(r)
            else:  # No data means the client has disconnected
                print("Client is disconnect", r)
                if r in outputs:
                    outputs.remove(r)  # Clean up the disconnected connection
                inputs.remove(r)
                del msg_dic_queue[r]
                r.close()
    for w in writeable:  # Connections that are ready to receive a reply
        if w not in msg_dic_queue:
            continue  # Already closed while handling readable above
        try:
            next_msg = msg_dic_queue[w].get_nowait()
        except queue.Empty:
            print("client [%s]" % w.getpeername()[0], "queue is empty...")
            outputs.remove(w)  # Stop watching w for writability until there is new data
        else:
            print("sending message to [%s]" % w.getpeername()[0], next_msg)
            w.send(next_msg)  # Echo the client's data back
    for e in exceptional:  # Handle connections in an exceptional state
        if e in outputs:
            outputs.remove(e)
        if e in inputs:
            inputs.remove(e)
        msg_dic_queue.pop(e, None)
        e.close()
The code for a client that opens many concurrent connections to the select server is as follows:
#!/usr/bin/python
#Author:sean
import socket
import sys

msgs = [
    b'This is the message. ',
    b'It will be sent ',
    b'in parts.',
]
SERVER_ADDRESS = 'localhost'
SERVER_PORT = 8080

# Create 500 TCP/IP sockets
socks = [socket.socket(socket.AF_INET, socket.SOCK_STREAM) for i in range(500)]

# Connect every socket to the port where the server is listening
print('connecting to %s port %s' % (SERVER_ADDRESS, SERVER_PORT))
for s in socks:
    s.connect((SERVER_ADDRESS, SERVER_PORT))

for message in msgs:
    # Send the message on every socket
    for s in socks:
        print('%s: sending "%s"' % (s.getsockname(), message))
        s.send(message)
    # Read the responses on every socket
    for s in socks:
        data = s.recv(1024)
        print('%s: received "%s"' % (s.getsockname(), data))
        if not data:
            print('closing socket', s.getsockname(), file=sys.stderr)

for s in socks:
    s.close()
The code for a concurrent socket server based on epoll is as follows:
#!/usr/bin/python
#Author:sean
import socket, logging
import select, errno

logger = logging.getLogger("network-server")

def InitLog():
    logger.setLevel(logging.DEBUG)
    fh = logging.FileHandler("network-server.log")
    fh.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.ERROR)
    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    ch.setFormatter(formatter)
    fh.setFormatter(formatter)
    logger.addHandler(fh)
    logger.addHandler(ch)

if __name__ == "__main__":
    InitLog()
    try:
        # Create a TCP socket to use as the listening socket
        listen_fd = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    except socket.error as msg:
        logger.error("create socket failed")
        raise SystemExit(1)  # Nothing below can work without a socket
    try:
        # Set the SO_REUSEADDR option
        listen_fd.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    except socket.error as msg:
        logger.error("setsocketopt SO_REUSEADDR failed")
    try:
        # Bind -- no IP address is specified, so the socket binds to all network interfaces
        listen_fd.bind(('', 8008))
    except socket.error as msg:
        logger.error("bind failed")
        raise SystemExit(1)
    try:
        # Set the listen backlog
        listen_fd.listen(10)
    except socket.error as msg:
        logger.error(msg)
        raise SystemExit(1)
    try:
        # Create the epoll handle
        epoll_fd = select.epoll()
        # Register the listening socket's readable event with the epoll handle
        epoll_fd.register(listen_fd.fileno(), select.EPOLLIN)
    except select.error as msg:
        logger.error(msg)
        raise SystemExit(1)

    connections = {}
    addresses = {}
    datalist = {}
    while True:
        # epoll_wait(): with no timeout, poll() blocks until at least one fd is ready
        epoll_list = epoll_fd.poll()
        for fd, events in epoll_list:
            # The listening socket is ready: a new connection has arrived
            if fd == listen_fd.fileno():
                # accept -- get the socket for the new connection and the client's IP and port
                conn, addr = listen_fd.accept()
                logger.debug("accept connection from %s, %d, fd = %d" % (addr[0], addr[1], conn.fileno()))
                # Set the connection socket to non-blocking
                conn.setblocking(0)
                # Register the connection socket's readable event with the epoll handle (edge-triggered)
                epoll_fd.register(conn.fileno(), select.EPOLLIN | select.EPOLLET)
                # Save conn and addr information respectively
                connections[conn.fileno()] = conn
                addresses[conn.fileno()] = addr
            elif select.EPOLLIN & events:
                # A readable event has fired
                datas = b''  # recv() returns bytes in Python 3