web request execution process -- this paragraph is well written

This is the original:  https://www.cnblogs.com/hiit/p/11192384.html

Original text: https://www.sohu.com/a/327057430_120129354

 

Big-tech interview questions: second-round interview for a Baidu PHP engineer position today

------------------

The interviewer asked a lot of questions. Here is what I can recall, sorted out:

1. How do you implement a flash sale (seckill) with Redis?

How a Redis queue handles the high concurrency of a flash sale: place a Redis queue between the application and the database as a buffer, so that all user requests are queued and served first in, first out (lpush and rpop in Redis). The lpush program pushes each user's request onto the Redis queue. A daemon then uses rpop to take requests off the queue and processes them until the specified purchase quota is used up, writing every successful user to Redis and generating the orders. The lpush program then looks up the winning users and notifies users of the result promptly.
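The queueing flow above can be sketched in Python. This is only an in-memory simulation: a collections.deque stands in for the Redis list (appendleft plays the role of lpush, pop the role of rpop). The function names, the quota value, and the duplicate-request check are my own assumptions, not part of the original answer.

```python
from collections import deque

# In-memory stand-in for a Redis list (assumption: no real Redis server is used here).
queue = deque()

def enqueue_request(user_id):
    """Producer side: push a user's request onto the head of the queue (like lpush)."""
    queue.appendleft(user_id)

def process_queue(quota):
    """Consumer side: pop requests from the tail (like rpop) until the quota is used up.

    Returns the winning users in first-in, first-out order.
    """
    winners = []
    while queue and len(winners) < quota:
        user_id = queue.pop()          # the oldest request comes out first
        if user_id not in winners:     # hypothetical rule: one purchase per user
            winners.append(user_id)
    return winners

for uid in ["u1", "u2", "u1", "u3", "u4"]:
    enqueue_request(uid)

winners = process_queue(quota=3)
print(winners)
```

Requests that are still in the queue when the quota runs out (u4 here) simply lose; in a real system the daemon would record that result for them as well.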

2. How do you implement scheduled tasks on a server, what is the difference between crontab and crontab -e, and how do you stop an infinite loop?

Method 1: configure the job in /etc/crontab, where a user name must be specified.
1. Open /etc/crontab with vim.
2. Add this as the last line: 59 23 * * * root /root/catina/rm_8080lina.sh
3. Restart crontab so the configuration takes effect.

Method 2: use crontab -e directly; no user name is specified.
1. Run crontab -e to open the editor.
2. Type :wq to save and exit.
3. Check that the script has execute permission.
4. Check that the script also has permission on the files it operates on.
5. Restart crontab so the configuration takes effect.

To end an infinite loop:
1. Press Ctrl-C to break out of the loop.
2. Run ps -ef | grep name to find the process ID.
3. Run kill followed by the process ID.
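To make the difference between the two formats concrete, here is a small Python sketch that splits a crontab line into its fields. It assumes the usual layout: five time fields, then a user name (in /etc/crontab only), then the command. The function name parse_cron_line is hypothetical.

```python
def parse_cron_line(line, system_crontab):
    """Split a crontab line into schedule, user, and command.

    system_crontab=True  -> /etc/crontab format (has a user field)
    system_crontab=False -> per-user format edited with crontab -e (no user field)
    """
    parts = line.split(None, 6 if system_crontab else 5)
    schedule = parts[:5]               # minute, hour, day of month, month, day of week
    if system_crontab:
        user, command = parts[5], parts[6]
    else:
        user, command = None, parts[5]
    return {"schedule": schedule, "user": user, "command": command}

system_entry = parse_cron_line("59 23 * * * root /root/catina/rm_8080lina.sh", True)
user_entry = parse_cron_line("59 23 * * * /root/catina/rm_8080lina.sh", False)
print(system_entry)
print(user_entry)
```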

3. How does PHP run a shell script, and where is this enabled in the configuration file?

PHP provides three functions for calling external commands: system(), exec(), and passthru(). All three can run Linux shell commands, but they differ:
system() outputs the result and returns the last line of the shell output.
exec() outputs nothing and returns the last line of the shell output; all output lines can be collected in an array passed as the second argument.
passthru() only runs the command and passes its raw output directly to standard output.
What they have in common: each can report the exit status code of the command.
Example: system("/usr/a.sh");

To enable them in php.ini, first turn off safe mode (safe_mode = off; this setting exists only in PHP versions before 5.4), then check the list of disabled functions:
disable_functions = proc_open, popen, exec, system, shell_exec, passthru
Remove exec from this list and restart the server.
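For readers who want to see the three pieces of information exec() gives you (all output lines, the last line, and the status code), here is a rough Python analogue using subprocess. It is not PHP and not how PHP implements these functions, only an illustration of the same idea; the helper name run_command is hypothetical.

```python
import subprocess
import sys

def run_command(args):
    """Run a command without echoing its output; return (lines, last_line, status)."""
    result = subprocess.run(args, capture_output=True, text=True)
    lines = result.stdout.splitlines()
    last_line = lines[-1] if lines else ""
    return lines, last_line, result.returncode

# Use the current Python interpreter as the external command so the demo is portable.
lines, last_line, status = run_command([sys.executable, "-c", "print('a'); print('b')"])
print(lines, last_line, status)
```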

4. How do you get HTTP header information?

1. Get all HTTP request headers sent by the client:
Option 1: array apache_request_headers(void)
Option 2: through $_SERVER. Each HTTP request header is stored under a key that begins with "HTTP_". For example, the If-Modified-Since request header is read from $_SERVER['HTTP_IF_MODIFIED_SINCE'].

2. Get all headers the server sends in response to an HTTP request:
array get_headers(string $url [, int $format = 0])
url: the URL of the target server.
format: 0 returns the headers as an indexed array; 1 returns them as an associative array.
$head_arr = get_headers("https://www.baidu.com");
$head_arr_index = get_headers("https://www.baidu.com", 1);
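The naming rule for $_SERVER keys can be sketched in a few lines of Python. The function name server_key is hypothetical. The special case for Content-Type and Content-Length follows the CGI convention, under which those two are stored without the HTTP_ prefix.

```python
def server_key(header_name):
    """Map an HTTP request header name to the key it gets in $_SERVER."""
    key = header_name.upper().replace("-", "_")
    # CGI convention: these two headers are stored without the HTTP_ prefix.
    if key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        return key
    return "HTTP_" + key

key = server_key("If-Modified-Since")
print(key)
```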

5. What load-balancing methods does nginx support?

1. Round robin (default): requests are assigned to the back-end servers one by one in order of arrival. If a back-end server goes down, it is removed automatically.
upstream backserver {
    server 192.168.0.14;
    server 192.168.0.15;
}

2. weight: sets the round-robin weight, which is proportional to the share of requests a server receives. Use it when the back-end servers differ in performance.
upstream backserver {
    server 192.168.0.14 weight=3;
    server 192.168.0.15 weight=7;
}
The higher the weight, the more requests a server receives; in this example the shares are 30% and 70%.

3. ip_hash: the methods above have a problem. In a load-balanced system, a user who logs in on one server may be routed to a different server on the next request, because every request is dispatched afresh to some server in the cluster. The login information stored on the first server is then lost, which is clearly unacceptable. The ip_hash directive solves this: each request is assigned according to a hash of the client IP, so once a client has reached a server, its later requests are routed to the same server. Each visitor consistently reaches one back-end server, which solves the session problem.
upstream backserver {
    ip_hash;
    server 192.168.0.14:88;
    server 192.168.0.15:80;
}

4. fair (third party): requests are assigned according to the response time of the back-end servers, and servers with shorter response times are served first.

5. url_hash (third party): requests are assigned according to a hash of the requested URL, so each URL is always directed to the same back-end server. This is most effective when the back-end servers are caches.
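The idea behind weight and ip_hash can be sketched in Python. This is a simplified illustration, not nginx's actual implementation: the weighted version just repeats each server according to its weight, and crc32 is an arbitrary hash choice of mine. The function names are hypothetical.

```python
import itertools
import zlib

def weighted_round_robin(servers):
    """Cycle through servers in proportion to their weights (naive expansion)."""
    expanded = [addr for addr, weight in servers for _ in range(weight)]
    return itertools.cycle(expanded)

def ip_hash(client_ip, servers):
    """Pick a server from a hash of the client IP, so one client keeps the same server."""
    digest = zlib.crc32(client_ip.encode("ascii"))   # crc32 is an illustrative choice
    return servers[digest % len(servers)]

backends = [("192.168.0.14", 3), ("192.168.0.15", 7)]
rr = weighted_round_robin(backends)
first_ten = [next(rr) for _ in range(10)]
print(first_ten.count("192.168.0.14"), first_ten.count("192.168.0.15"))

addrs = ["192.168.0.14:88", "192.168.0.15:80"]
chosen = ip_hash("203.0.113.9", addrs)
print(chosen == ip_hash("203.0.113.9", addrs))
```

Out of every ten requests, three go to the first server and seven to the second, and the same client IP always maps to the same back end as long as the server list does not change.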

6. In nginx.conf rewrite rules, what is the difference between /path/to/photos and /path/to/photos/?

When the web server receives a request for a URL without a trailing slash, for example xx.com/product, it first looks in the site's root directory for a file named "product". If there is none, it treats product as a directory and returns the default index page of that directory. When the web server receives a request with a trailing slash, it treats the path as a directory directly. The trailing slash makes the semantics clear. That said, many applications now rewrite routes themselves.

7. How do you protect against cookie and session attacks?

Cookie information that attackers can exploit:
1. The cookie contains information that nobody other than the developer should see, such as USERID=1000, USERSTATUS=ONLINE, or ACCOUNT_ID=xxx.
2. The cookie is encrypted, but an attacker can decrypt it easily.
3. The cookie is accepted without input validation.

How to prevent attacks that use cookies:
1. Do not store sensitive information in cookies.
2. Do not store sensitive information that is unencrypted or easy to decrypt in cookies.
3. Strictly validate cookie information received from the client.
4. Log invalid cookie information, analyze it, and improve the system accordingly.
5. Transmit cookies over SSL/TLS.
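One common way to "strictly validate" a cookie is to sign it on the server and check the signature on every request. The Python sketch below shows that idea with HMAC-SHA256. It is one possible technique, not something the interviewer specified; the secret key, cookie value, and function names are placeholders.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-random-server-side-secret"  # hypothetical placeholder

def _signature(value):
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def sign_cookie(value):
    """Return 'value.signature' so the server can detect tampering later."""
    return f"{value}.{_signature(value)}"

def verify_cookie(cookie):
    """Return the original value if the signature matches, otherwise None."""
    value, sep, sig = cookie.rpartition(".")
    if not sep:
        return None
    # compare_digest avoids leaking information through comparison timing.
    if hmac.compare_digest(sig.encode("utf-8"), _signature(value).encode("utf-8")):
        return value
    return None

cookie = sign_cookie("session-abc123")
forged = "session-evil." + cookie.rpartition(".")[2]
print(verify_cookie(cookie))
print(verify_cookie(forged))
```

The cookie value here is an opaque session ID rather than user data, in line with the advice not to store sensitive information in cookies.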

8. Common PHP functions and what they do?

9. How do you install a PHP extension?

Install with phpize:
// Download the libevent extension source package (any directory on the system will do)
~# wget http://pecl.php.net/get/libevent-0.1.0.tgz
// Unpack the archive
~# tar -zxvf libevent-0.1.0.tgz
// Enter the source directory
~# cd libevent-0.1.0/
// Run phpize, giving its full path, for example:
~# /usr/local/php7/bin/phpize
// Run configure, passing the path to php-config
~# ./configure --with-php-config=/usr/local/php/bin/php-config
~# make
~# make test
~# sudo make install
// Edit php.ini and add: extension = libevent.so
// Restart the corresponding PHP-FPM

10. What is the whole process of a client HTTP request, from the server to nginx to the PHP response?

The HTTP transaction runs as follows:
1. The client (browser) triggers a request (entering a URL, clicking a link, or submitting a form).
2. The client resolves the domain name by asking the configured DNS server for the IP address.
3. Using the IP address returned by the DNS server, the client establishes a TCP/IP connection to the server with a three-way handshake.
4. Once connected, the client sends the HTTP request.
5. The web server on the server side determines the type of resource requested and dispatches it accordingly. If the requested resource is a PHP file, the server software starts the corresponding CGI program to process it and return the result.
6. The server sends the web server's result back to the client as the response.
7. The client receives the response and renders it. If the response content references other static resources, a CDN can speed up access to them.
8. The client displays the rendered view and closes the TCP/IP connection.
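The client side of these steps (resolve, connect, send, receive, close) can be sketched in Python with raw sockets. To keep it runnable without Internet access, the sketch starts a tiny local HTTP server in a background thread and talks to it via "localhost"; HelloHandler and fetch are hypothetical names, and a real browser does far more than this.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Tiny stand-in for the web server plus application."""
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

def fetch(host, port, path="/"):
    # 1. Resolve the host name to an IP address.
    info = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)[0]
    family, socktype, proto, _, addr = info
    # 2. Open a TCP connection (the three-way handshake happens inside connect()).
    with socket.socket(family, socktype, proto) as sock:
        sock.connect(addr)
        # 3. Send the HTTP request.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # 4. Read the response until the server closes the connection.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # 5. Leaving the with-block closes the client socket.
    return b"".join(chunks)

server = HTTPServer(("127.0.0.1", 0), HelloHandler)   # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
response = fetch("localhost", server.server_address[1])
server.shutdown()
server.server_close()
status_line = response.split(b"\r\n", 1)[0].decode("ascii")
print(status_line)
```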

11. How do CGI, FastCGI, PHP-CGI, and PHP-FPM differ in principle?

CGI: the Common Gateway Interface, a protocol for exchanging data between a web server and a web application.
FastCGI: like a long-lived CGI program that keeps running. Like CGI, it is a communication protocol, but it is optimized to be more efficient than CGI. The SCGI protocol is similar to FastCGI.
PHP-CGI: the CGI-protocol interface program that PHP (the web application) provides to the web server.
PHP-FPM: the FastCGI-protocol interface program that PHP (the web application) provides to the web server. It also provides some relatively intelligent task management.
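To show what "a protocol for exchanging data" means in practice: under plain CGI the web server starts a new process for each request, passes request details in environment variables such as REQUEST_METHOD and QUERY_STRING, and reads the response (headers, a blank line, then the body) from the program's standard output. The Python sketch below mimics such a program; it takes a plain dict instead of the real environment so it can run without a web server, and the function name is hypothetical.

```python
from urllib.parse import parse_qs

def run_cgi_script(environ):
    """Build a CGI response from the variables the web server would set."""
    method = environ.get("REQUEST_METHOD", "GET")
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    body = f"{method} request says hello, {name}"
    # A CGI response is headers, a blank line, then the body.
    return "Content-Type: text/plain\r\n\r\n" + body

output = run_cgi_script({"REQUEST_METHOD": "GET", "QUERY_STRING": "name=php"})
print(output)
```

FastCGI carries the same kind of information, but over a connection to a process that stays alive between requests, which is where the efficiency gain comes from.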

  

12. What do the server status codes 200, 300, 400, and 500 represent?

1. 200: success.
2. 300-307: further action is required to complete the request. These codes usually indicate redirection.
3. 400-417: the request itself is probably in error, which prevents the server from processing it.
400 - the server does not understand the request syntax
401 - authentication error
403 - the server refuses the request
404 - page not found (the most commonly seen server status)
405 - method not allowed
406 - not acceptable (the server cannot respond with the content characteristics requested)
407 - proxy authentication required
408 - request timeout (the server timed out waiting for the request)
4. 500-505: the server hit an internal error while trying to process the request. The fault lies with the server, not with the request.
500 - internal server error (for example, the server in the test environment is down)
501 - the server lacks the functionality to fulfill the request
502 - bad gateway
503 - service unavailable (overloaded or down for maintenance; a temporary state)
504 - gateway timeout
505 - HTTP version not supported (the server does not support the HTTP protocol version used in the request)
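The grouping by first digit is easy to express in code. This short Python sketch uses the standard library's http.HTTPStatus for the official reason phrases; the status_class helper and its labels are my own.

```python
from http import HTTPStatus

def status_class(code):
    """Return the broad meaning of a status code based on its first digit."""
    classes = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}
    return classes.get(code // 100, "other")

for code in (200, 302, 404, 502):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```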

13. On Linux, how do you compile C source and header files into a .so file? Are you familiar with gcc?

The following uses mylib.c as an example of how to build a .so file.
First, compile mylib.c:
$ gcc -c -fPIC -o mylib.o mylib.c
-c means compile only, without linking. The -o option names the output file. gcc generates the object file mylib.o. Note the -fPIC option: PIC stands for Position Independent Code, which shared libraries need in order to support dynamic linking.
Then generate the shared library:
$ gcc -shared -o mylib.so mylib.o
By convention, library file names start with lib, and shared library files use the .so suffix. -shared tells gcc to generate a shared library.

14. What is Redis's memory eviction mechanism?

volatile-lru: evict the least recently used data from the set of keys with an expiration time (server.db[i].expires).
volatile-ttl: evict the data closest to expiring from the set of keys with an expiration time (server.db[i].expires).
volatile-random: evict randomly chosen data from the set of keys with an expiration time (server.db[i].expires).
allkeys-lru: evict the least recently used data from the whole dataset (server.db[i].dict).
allkeys-random: evict randomly chosen data from the whole dataset (server.db[i].dict).
noeviction: never evict data.
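The LRU policies can be illustrated with a small Python sketch. Redis does not scan every key to find the true least recently used one; it looks at a small sample (controlled by the maxmemory-samples setting) and evicts the oldest key in that sample. The code below follows that idea in simplified form; the data layout and function name are hypothetical and do not reflect Redis internals.

```python
import random

def evict_approx_lru(last_access, sample_size, rng):
    """Evict one key: sample a few keys and remove the least recently used among them."""
    candidates = rng.sample(sorted(last_access), min(sample_size, len(last_access)))
    victim = min(candidates, key=lambda k: last_access[k])
    del last_access[victim]
    return victim

# key -> logical timestamp of last access (smaller means older)
last_access = {"a": 5, "b": 1, "c": 9, "d": 3}
victim = evict_approx_lru(last_access, sample_size=4, rng=random.Random(0))
print(victim)
```

With a sample as large as the key set, the result is the true LRU key; with a smaller sample, the evicted key is only the oldest of those sampled. For volatile-lru you would pass only the keys that have an expiration time, for allkeys-lru all keys.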

Redis uses the noeviction policy by default. In other words, once memory is full, Redis no longer accepts writes and serves only reads. This often does not meet our needs: an Internet system may have millions of users or more, so an eviction policy usually has to be configured. Note that the LRU and TTL algorithms in Redis are approximate, not exact. Redis does not compare all key-value pairs to find the precise candidate for deletion, because that would take too long, making eviction slow and pausing the service.

15. How do you get past the verification code when a crawler simulates login?

1. When crawling a website, we often run into login, which calls for simulated login. Python provides powerful URL libraries, so this is not hard to do.
2. First, understand what cookies do. Cookies are data that some websites store on the user's local terminal in order to identify the user and track the session. So we need the cookielib module to keep the website's cookies.
3. Two addresses are involved: the login address (address 1) and the verification code address (address 2).
4. The verification code turns out to be updated dynamically; it is different every time the page is opened. Usually the verification code is tied to the cookie. Recognizing the verification code automatically is a thankless task, so the idea is to visit the verification code page first, save the verification code image, obtain the cookie to use for login, and then post the data directly to the login address.
5. First, use a packet capture tool or the Firefox or Google browser to analyze the post request and header information the login page requires. Then simulate the login:
Set the verification code address and the post address.
Bind the cookies so that they are managed automatically.
Set the user name and password.
Access the verification code address with the cookie-enabled opener to obtain the cookie.
Save the verification code image locally.
Open the saved verification code image and type in the code.
Construct the form according to the captured packet information.
Construct the headers according to the captured packet information.
Generate the post data in the form key1=value1&key2=value2.
Construct the request and send it.
Print the page returned after login.
After a successful login, the cookie can be used to access other pages that require login.
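The steps above can be sketched with Python 3's standard library, where http.cookiejar is the successor to cookielib. The URLs and form field names below are placeholders, since the original does not give the real ones, and the sketch only builds the request without contacting any site.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical addresses; the original does not give the real ones.
CAPTCHA_URL = "https://example.com/captcha"
LOGIN_URL = "https://example.com/login"

def build_opener_with_cookies():
    """Create an opener that stores and resends cookies automatically."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def build_login_request(username, password, captcha_text):
    """Build the POST request for the login form (field names are assumptions)."""
    form = {"username": username, "password": password, "captcha": captcha_text}
    data = urllib.parse.urlencode(form).encode("ascii")   # key1=value1&key2=value2
    headers = {"User-Agent": "Mozilla/5.0", "Referer": LOGIN_URL}
    return urllib.request.Request(LOGIN_URL, data=data, headers=headers)

opener, jar = build_opener_with_cookies()
request = build_login_request("alice", "secret", "x7k2")
print(request.get_method(), request.data)
```

Against a real site you would first call opener.open(CAPTCHA_URL) to download the image and receive the cookie, read the code by eye, and then call opener.open(request); the same opener carries the cookie to later pages that require login.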

 

https://blog.csdn.net/ngvjai/article/details/8520840


Posted by mosi on Wed, 11 May 2022 15:13:02 +0300