Exploring the Principles Behind XSS Encoding Bypasses

This article was last updated: November 01, 2019 09:56:55

When it comes to XSS bypass techniques, many people can rattle off "just encode it". But there are many ways to encode, and most people are not familiar with how each encoding is applied or why it works. In this article I explain the techniques and principles behind XSS encoding bypasses. If you find any errors, please point them out.

0x01 Understanding how the browser decodes a requested page

Encoding is basic computer-systems knowledge. The subject could fill a book, but most of us know something about it. In general, encoding converts characters into binary numbers, and decoding converts those numbers back into characters. Between the browser requesting a URL and the page appearing on screen, several rounds of encoding and decoding take place. The process is briefly introduced below; for the detailed process, refer to here.

  • URL encoding: URL encoding allows characters that are not permitted in a URL, such as Chinese characters, to appear in it. Each byte of the character's UTF-8 encoding is written as % followed by its two-digit hexadecimal value, which is why it is also called percent-encoding. When the server receives a request, it automatically URL-decodes it once.
  • HTML encoding/decoding: When the browser receives the binary data sent by the server, it first decodes the bytes into characters, producing the source code we see. Which character encoding is used depends on the page, so the page should declare its encoding; otherwise the browser may decode it the wrong way and produce garbled characters. For example, the Baidu search homepage declares its encoding as UTF-8.
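The percent-encoding described in the URL-encoding bullet can be reproduced with Python's standard urllib.parse module. This is a minimal sketch, not what a browser or server runs internally: quote() and unquote() are real standard-library functions, while the wrapper name percent_encode is hypothetical.

```python
from urllib.parse import quote, unquote

def percent_encode(text: str) -> str:
    # Hypothetical helper: quote() encodes the text as UTF-8 and writes every
    # byte that is not an unreserved URL character as % plus two hex digits.
    return quote(text, safe="")

# The character 一 (U+4E00) is three bytes in UTF-8: E4 B8 80.
encoded = percent_encode("一")
print(encoded)           # %E4%B8%80
print(unquote(encoded))  # 一
```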

However, some characters, such as <, > and &, conflict with HTML syntax: if they appeared literally after decoding, the browser would treat them as markup. How is this solved? To display reserved characters correctly, HTML source uses character entities, such as the common non-breaking space &nbsp;. A character entity is written as & followed by a predefined entity name and a semicolon. Not every character has an entity name, but every character has an entity number, written as &# followed by the number and a semicolon (or &#x followed by the number in hexadecimal). For example:

Displayed | Description       | Entity name | Entity number
<         | Less-than sign    | &lt;        | &#60;
>         | Greater-than sign | &gt;        | &#62;
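The table can be checked with Python's html.unescape(), which follows the HTML5 character-reference rules. A minimal sketch, assuming Python's decoder as a stand-in for the browser's: the entity name, the decimal entity number and the hexadecimal form all decode to the same character.

```python
import html

# For each character: (entity name, decimal entity number, hex entity number).
forms = {
    "<": ("&lt;", "&#60;", "&#x3c;"),
    ">": ("&gt;", "&#62;", "&#x3e;"),
}

for char, variants in forms.items():
    for entity in variants:
        # Every form decodes to the same character.
        assert html.unescape(entity) == char
    print(char, "is written as", ", ".join(variants))
```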

After decoding the page, the browser starts parsing the HTML and converts the tags into DOM nodes in the content tree. While recognizing tags, the HTML parser does not treat entity-encoded content as markup: &lt;script&gt; is never recognized as a tag. Entity-encoded content is decoded only when the content of each node is identified as the DOM tree is built. In particular, any attribute value of a DOM node can be HTML-encoded and will still be decoded and parsed correctly.

This is why PHP's htmlspecialchars() function, which converts the predefined characters into HTML entities, protects against XSS: the entities are decoded only after the tag structure has been recognized, so the encoded characters are displayed as text rather than parsed as markup.
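Python has no htmlspecialchars(), but html.escape() plays roughly the same role. A sketch under that assumption: with quote=True it converts &, <, >, " and ', much like htmlspecialchars() with ENT_QUOTES (PHP typically writes ' as &#039;, while Python writes &#x27;).

```python
import html

payload = "<script>alert('xss')</script>"

# quote=True (the default) also converts " and ', so the result is safe both
# in element content and inside quoted attribute values.
escaped = html.escape(payload, quote=True)
print(escaped)  # &lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;

# Decoding the entities gives back the original text, which the browser
# displays as characters instead of parsing it as a tag.
assert html.unescape(escaped) == payload
```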

  • JS decoding (Unicode escapes): After HTML parsing has produced the DOM nodes, further parsing is driven by those nodes. For example, when processing a <script> tag, the parser automatically switches to JS parsing mode (a <style> tag switches it to CSS parsing instead), and a javascript: pseudo-URL in a src or href attribute also enters JS parsing mode. Take <a href="javascript:alert('<\u4e00>')">test</a>: the JavaScript interpreter is started and parses the content. It contains the escape sequence \u4e00; the leading \u marks a Unicode escape, and the hexadecimal number after it is decoded into the corresponding character, U+4E00 (一), so the code becomes: href="javascript:alert('<一>')">test

So if you want encoded characters to still be recognized by JS, you can Unicode-escape them (\uXXXX).
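JS Unicode escaping is mechanical: each character becomes \u followed by its four-digit hexadecimal code point. The sketch below, with hypothetical helper names, produces such escapes and decodes them with a regular expression. The decoder is a simplified model of the JS parser (it ignores where escapes are allowed), and the encoder only handles characters in the Basic Multilingual Plane.

```python
import re

def js_unicode_escape(text: str) -> str:
    # Write every character as a JS \uXXXX escape (BMP characters only).
    return "".join(f"\\u{ord(c):04x}" for c in text)

def js_unicode_unescape(text: str) -> str:
    # Simplified model of the JS decoding step: replace each \uXXXX
    # escape with the character it names.
    return re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)

print(js_unicode_escape("alert"))        # \u0061\u006c\u0065\u0072\u0074
print(js_unicode_unescape("<\\u4e00>"))  # <一>
```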

0x02 XSS encoding in practice

Let's use an ordinary XSS payload, <a href="javascript:alert('xss')">test</a>, to illustrate. The browser parses it roughly as follows:

First, the HTML parser runs and HTML-decodes the characters in href; then the URL parser decodes the href value. Normally the value is an ordinary link such as https://www.baidu.com, and no further decoding is needed once the URL parser is done. Here, however, the URL scheme is javascript:, so as a last step the JavaScript parser performs its own decoding, and the resulting script is executed.

The complete parsing sequence therefore has three stages: HTML decoding --> URL decoding --> JS decoding
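The three stages can be modelled as plain string transformations. This is a rough sketch under simplifying assumptions: Python's html.unescape() and urllib.parse.unquote() stand in for the browser's HTML and URL decoders, the hypothetical js_unescape() stands in for the JS parser's handling of \uXXXX escapes, and the real parsers do more (for instance, the JS parser honours escapes only in identifiers and string literals).

```python
import html
import re
from urllib.parse import unquote

def js_unescape(text: str) -> str:
    # Simplified stand-in for the JS parser's handling of \uXXXX escapes.
    return re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)

def decode_href(value: str) -> str:
    # Apply the three stages in the order described above.
    after_html = html.unescape(value)  # 1. HTML entity decoding
    after_url = unquote(after_html)    # 2. URL (percent) decoding
    return js_unescape(after_url)      # 3. JS Unicode-escape decoding

# One layer of each encoding: &#x6a; is "j", %5c is "\", \u0061 is "a".
print(decode_href("&#x6a;avascript:%5cu0061lert('xss')"))
```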

We can transform the payload in the following ways, and each variant still pops up the alert box:

  • Convert javascript:alert('xss') into HTML entities. This works because the HTML parser builds the <a> DOM node first and only then decodes the HTML entities in its attribute values.
<a href="&#x6a;&#x61;&#x76;&#x61;&#x73;&#x63;&#x72;&#x69;&#x70;&#x74;&#x3a;&#x61;&#x6c;&#x65;&#x72;&#x74;&#x28;&#x27;&#x78;&#x73;&#x73;&#x27;&#x29;">test</a>
  • Apply JS (Unicode) encoding to alert. This works because once the <a> node's attribute has been decoded, the URL scheme turns out to be javascript:, so the JS parser is called to parse the code, and it decodes the Unicode escapes.
<a href="javascript:\u0061\u006c\u0065\u0072\u0074('xss')">test</a>
  • Because the decoding stages run in a fixed order, the encodings can also be mixed:
1. Original payload
<a href="javascript:alert('xss')">test</a>
2. JS-encode (Unicode-escape) alert
<a href="javascript:\u0061\u006c\u0065\u0072\u0074('xss')">test</a>
3. URL-encode the \u0061\u006c\u0065\u0072\u0074 part of the href
<a href="javascript:%5c%75%30%30%36%31%5c%75%30%30%36%63%5c%75%30%30%36%35%5c%75%30%30%37%32%5c%75%30%30%37%34('xss')">test</a>
4. HTML-encode the whole href value, javascript:%5c%75%30%30%36%31%5c%75%30%30%36%63%5c%75%30%30%36%35%5c%75%30%30%37%32%5c%75%30%30%37%34('xss'):
<a href="&#x6a;&#x61;&#x76;&#x61;&#x73;&#x63;&#x72;&#x69;&#x70;&#x74;&#x3a;&#x25;&#x35;&#x63;&#x25;&#x37;&#x35;&#x25;&#x33;&#x30;&#x25;&#x33;&#x30;&#x25;&#x33;&#x36;&#x25;&#x33;&#x31;&#x25;&#x35;&#x63;&#x25;&#x37;&#x35;&#x25;&#x33;&#x30;&#x25;&#x33;&#x30;&#x25;&#x33;&#x36;&#x25;&#x36;&#x33;&#x25;&#x35;&#x63;&#x25;&#x37;&#x35;&#x25;&#x33;&#x30;&#x25;&#x33;&#x30;&#x25;&#x33;&#x36;&#x25;&#x33;&#x35;&#x25;&#x35;&#x63;&#x25;&#x37;&#x35;&#x25;&#x33;&#x30;&#x25;&#x33;&#x30;&#x25;&#x33;&#x37;&#x25;&#x33;&#x32;&#x25;&#x35;&#x63;&#x25;&#x37;&#x35;&#x25;&#x33;&#x30;&#x25;&#x33;&#x30;&#x25;&#x33;&#x37;&#x25;&#x33;&#x34;&#x28;&#x27;&#x78;&#x73;&#x73;&#x27;&#x29;">test</a>
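The four steps above can be reproduced programmatically, which makes it easier to see that each layer is undone by exactly one decoding stage. A minimal sketch; the helper names are hypothetical, and the final JS step is again modelled by a regular-expression substitution rather than a real JS parser.

```python
import html
import re
from urllib.parse import unquote

def js_encode(text: str) -> str:
    # Step 2: \uXXXX escape for each character (BMP only).
    return "".join(f"\\u{ord(c):04x}" for c in text)

def url_encode_all(text: str) -> str:
    # Step 3: %XX for every character (ASCII only), lowercase hex as in the article.
    return "".join(f"%{ord(c):02x}" for c in text)

def html_encode_all(text: str) -> str:
    # Step 4: hexadecimal numeric entity &#xNN; for every character.
    return "".join(f"&#x{ord(c):x};" for c in text)

step2 = "javascript:" + js_encode("alert") + "('xss')"
step3 = "javascript:" + url_encode_all(js_encode("alert")) + "('xss')"
step4 = html_encode_all(step3)

# Undo the layers in browser order: HTML, then URL, then JS.
decoded = unquote(html.unescape(step4))
decoded = re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), decoded)
print(decoded)  # javascript:alert('xss')
```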

How about that? After three layers of encoding the alert box still pops up. Feeling a little confused? It's best to try it yourself and see whether the box pops up.

The next question: since the browser URL-decodes the link in href, can we URL-encode the whole href value?

<a href="javascript:alert('xss')">test</a>
URL-encode javascript:alert('xss') inside href once:
<a href="%6a%61%76%61%73%63%72%69%70%74%3a%61%6c%65%72%74%28%27%78%73%73%27%29">test</a>

Testing shows that this does not work. The reason is a detail of URL parsing: the protocol (scheme) must not be encoded in any way. The URL parser identifies the scheme before any percent-decoding takes place, so an encoded "javascript:" is never decoded and is not recognized as a scheme; the value is treated as a URL without a scheme (a relative link), and the script does not run. For example, http://www.baidu.com can be written as http://%77%77%77%2e%62%61%69%64%75%2e%63%6f%6d, but the protocol cannot be encoded as well: %68%74%74%70%3a%2f%2f%77%77%77%2e%62%61%69%64%75%2e%63%6f%6d does not work.
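The scheme rule can be illustrated with Python's urllib.parse.urlsplit(). This is an analogy, not the browser itself: browsers use the WHATWG URL parser rather than Python's, but both recognize a scheme only when it is spelled with literal letters followed by a literal ':'.

```python
from urllib.parse import urlsplit, unquote

# Scheme left as-is, host percent-encoded: the scheme is still recognized.
ok = urlsplit("http://%77%77%77%2e%62%61%69%64%75%2e%63%6f%6d")
print(ok.scheme, unquote(ok.netloc))  # http www.baidu.com

# Scheme percent-encoded too: no literal ":" is left, so no scheme is found
# and the whole value is treated as a relative path.
bad = urlsplit("%68%74%74%70%3a%2f%2f%77%77%77%2e%62%61%69%64%75%2e%63%6f%6d")
print(repr(bad.scheme), bad.path)

# The same happens to the fully encoded "javascript:" href from above.
js = urlsplit("%6a%61%76%61%73%63%72%69%70%74%3a%61%6c%65%72%74%28%27%78%73%73%27%29")
print(repr(js.scheme))  # ''
```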

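An XSS payload that is parsed in a similar way is shown below. (Strictly speaking, the value of an event-handler attribute such as onerror is already JavaScript, so there is no URL-decoding stage, and javascript: is parsed as a JS statement label rather than as a scheme.)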

<img src=xxx onerror="javascript:alert(1)">

A slight transformation gives the following variants:

1. HTML-encode javascript:alert(1)
<img src=xxx onerror="&#x6a;&#x61;&#x76;&#x61;&#x73;&#x63;&#x72;&#x69;&#x70;&#x74;&#x3a;&#x61;&#x6c;&#x65;&#x72;&#x74;&#x28;&#x31;&#x29;">
2. JS-encode alert
<img src=xxx onerror="javascript:\u0061\u006c\u0065\u0072\u0074(1)">
3. Combine the two methods above
<img src=xxx onerror="&#x6a;&#x61;&#x76;&#x61;&#x73;&#x63;&#x72;&#x69;&#x70;&#x74;&#x3a;&#x5c;&#x75;&#x30;&#x30;&#x36;&#x31;&#x5c;&#x75;&#x30;&#x30;&#x36;&#x63;&#x5c;&#x75;&#x30;&#x30;&#x36;&#x35;&#x5c;&#x75;&#x30;&#x30;&#x37;&#x32;&#x5c;&#x75;&#x30;&#x30;&#x37;&#x34;&#x28;&#x31;&#x29;">
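For the event-handler case, the decoding chain is shorter. A minimal sketch, assuming the same simplified JS model as before (the js_unescape() helper is hypothetical): HTML entity decoding recovers the JS-escaped text, and the JS stage then recovers the identifier alert, with no URL stage in between.

```python
import html
import re

def js_unescape(text: str) -> str:
    # Simplified stand-in for the JS parser's handling of \uXXXX escapes.
    return re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)

# Value of the onerror attribute in variant 3 above.
onerror = ("&#x6a;&#x61;&#x76;&#x61;&#x73;&#x63;&#x72;&#x69;&#x70;&#x74;&#x3a;"
           "&#x5c;&#x75;&#x30;&#x30;&#x36;&#x31;&#x5c;&#x75;&#x30;&#x30;&#x36;&#x63;"
           "&#x5c;&#x75;&#x30;&#x30;&#x36;&#x35;&#x5c;&#x75;&#x30;&#x30;&#x37;&#x32;"
           "&#x5c;&#x75;&#x30;&#x30;&#x37;&#x34;&#x28;&#x31;&#x29;")

after_html = html.unescape(onerror)  # HTML stage
print(after_html)                    # javascript:\u0061\u006c\u0065\u0072\u0074(1)
print(js_unescape(after_html))       # javascript:alert(1)
```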

Now let's try our ultimate weapon:

<script>alert(1)</script>

Encode it:

1. JS-encode alert
<script>\u0061\u006c\u0065\u0072\u0074(1)</script>
Note: alert(1) cannot be HTML-encoded here. When the HTML parser finds that the DOM node is a script element, it hands the contents to the JS parser without decoding entities. But there is a little trick: inside <svg>, a script element is parsed as SVG (foreign) content, where character entities are decoded before the script runs:
<svg><script>&#x61;&#x6c;&#x65;&#x72;&#x74;&#x28;&#x27;&#x78;&#x73;&#x73;&#x27;&#x29;</script>

One more detail:

In the JS encoding above, we encoded only the alert in alert('xss'). Could we encode the whole thing? The answer is no. During JavaScript parsing, Unicode escapes are decoded only as part of string literals or identifier names. In the example above, the JavaScript parser decodes \u0061\u006c\u0065\u0072\u0074 into alert, which is a valid identifier name, so it is resolved normally. Escapes for parentheses, double quotes, single quotes and similar punctuation, by contrast, are never treated as syntax: inside a string literal they are just text, and elsewhere they are a syntax error. For example, <script>alert('aaa\u0027)</script> fails: \u0027 is decoded as a literal ' inside the string instead of closing it, so the string is left unterminated and the script does not run.

That covers the main points of XSS encoding. In short, to learn anything you need to understand the principle behind it; looking back, you will always find something you didn't know.

0x03 References

Posted by Sassy34 on Fri, 09 Sep 2022 21:02:25 +0300