AST abstract syntax tree

AST abstract syntax tree

why

The purpose of mainstream project plug-ins: javascript translation, code compression, css preprocessing, eslint, prettier, etc. are all based on AST.

what

according to the grammar of a programming language, each AST node corresponds to an item of a source code. (according to the syntax of the programming language, each ast node corresponds to a source code item.)

demo

Link address: astexplorer.net
AST parsing tool

js syntax

function square(n) {
  return n * n;
}

ast syntax tree

// Parser acorn-8.0.1
{
  "type": "Program",
  "start": 0,
  "end": 38,
  "body": [
    {
      "type": "FunctionDeclaration",
      "start": 0,
      "end": 38,
      "id": {
        "type": "Identifier",
        "start": 9,
        "end": 15,
        "name": "square"
      },
      "expression": false,
      "generator": false,
      "async": false,
      "params": [
        {
          "type": "Identifier",
          "start": 16,
          "end": 17,
          "name": "n"
        }
      ],
      "body": {
        "type": "BlockStatement",
        "start": 19,
        "end": 38,
        "body": [
          {
            "type": "ReturnStatement",
            "start": 23,
            "end": 36,
            "argument": {
              "type": "BinaryExpression",
              "start": 30,
              "end": 35,
              "left": {
                "type": "Identifier",
                "start": 30,
                "end": 31,
                "name": "n"
              },
              "operator": "*",
              "right": {
                "type": "Identifier",
                "start": 34,
                "end": 35,
                "name": "n"
              }
            }
          }
        ]
      }
    }
  ],
  "sourceType": "module"
}

Get ast from plain text (through compiler)

  • lexical analysis

    scanner. It reads our code and combines them into tokens according to predetermined rules At the same time, it removes whitespace, comments, etc. Finally, the whole code will be divided into a list of tokens (or one-dimensional array). When lexical analysis of the source code, it will read the code letter by letter. When it encounters spaces, operators, or special symbols, it will think that a session has been completed.

  • Syntax parsing, also known as parser

    It transforms the array from lexical analysis into tree expression. At the same time, verify the syntax, syntax errors, and throw syntax errors.
    When generating the tree, the parser will delete some unnecessary identification tokens (such as incomplete parentheses). Therefore, AST does not match the source code 100%, but we can already know how to deal with it. As an aside, the parser covers 100% of all code structures, and the spanning tree is called CST (specific syntax tree)

More compiler knowledge

The super tiny compiler warehouse address

Convert Lisp into C language

LangSandbox warehouse address

Create your own language, compile it into C language or machine language, and finally run it.

AST generated by third-party library

Focus on Babylon

Babylon
Babylon is a JavaScript parser used in Babel.Support for JSX, Flow, Typescript.

babel

babel is a javascript compiler. In general, it is divided into three stages: generating and translating. We can give babel some javascript code, which modifies the code and generates a new return code. The process is to create AST, traverse the tree, modify tokens, and finally generate the latest code from AST.

babel parsing generation

1. Generate syntax tree using babylon parsing code

import * as babylon from "babylon";
const code = `
  const abc = 5;
`;
const ast = babylon.parse(code);

Spanning tree results:

{
  "type": "File",
  "start": 0,
  "end": 18,
  "loc": {
    "start": {
      "line": 1,
      "column": 0
    },
    "end": {
      "line": 3,
      "column": 0
    }
  },
  "program": {
    "type": "Program",
    "start": 0,
    "end": 18,
    "loc": {
      "start": {
        "line": 1,
        "column": 0
      },
      "end": {
        "line": 3,
        "column": 0
      }
    },
    "sourceType": "script",
    "body": [
      {
        "type": "VariableDeclaration",
        "start": 3,
        "end": 17,
        "loc": {
          "start": {
            "line": 2,
            "column": 2
          },
          "end": {
            "line": 2,
            "column": 16
          }
        },
        "declarations": [
          {
            "type": "VariableDeclarator",
            "start": 9,
            "end": 16,
            "loc": {
              "start": {
                "line": 2,
                "column": 8
              },
              "end": {
                "line": 2,
                "column": 15
              }
            },
            "id": {
              "type": "Identifier",
              "start": 9,
              "end": 12,
              "loc": {
                "start": {
                  "line": 2,
                  "column": 8
                },
                "end": {
                  "line": 2,
                  "column": 11
                },
                "identifierName": "abc"
              },
              "name": "abc"
            },
            "init": {
              "type": "NumericLiteral",
              "start": 15,
              "end": 16,
              "loc": {
                "start": {
                  "line": 2,
                  "column": 14
                },
                "end": {
                  "line": 2,
                  "column": 15
                }
              },
              "extra": {
                "rawValue": 5,
                "raw": "5"
              },
              "value": 5
            }
          }
        ],
        "kind": "const"
      }
    ],
    "directives": []
  },
  "comments": [],
  "tokens": [
    {
      "type": {
        "label": "const",
        "keyword": "const",
        "beforeExpr": false,
        "startsExpr": false,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "value": "const",
      "start": 3,
      "end": 8,
      "loc": {
        "start": {
          "line": 2,
          "column": 2
        },
        "end": {
          "line": 2,
          "column": 7
        }
      }
    },
    {
      "type": {
        "label": "name",
        "beforeExpr": false,
        "startsExpr": true,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null
      },
      "value": "abc",
      "start": 9,
      "end": 12,
      "loc": {
        "start": {
          "line": 2,
          "column": 8
        },
        "end": {
          "line": 2,
          "column": 11
        }
      }
    },
    {
      "type": {
        "label": "=",
        "beforeExpr": true,
        "startsExpr": false,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": true,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "value": "=",
      "start": 13,
      "end": 14,
      "loc": {
        "start": {
          "line": 2,
          "column": 12
        },
        "end": {
          "line": 2,
          "column": 13
        }
      }
    },
    {
      "type": {
        "label": "num",
        "beforeExpr": false,
        "startsExpr": true,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "value": 5,
      "start": 15,
      "end": 16,
      "loc": {
        "start": {
          "line": 2,
          "column": 14
        },
        "end": {
          "line": 2,
          "column": 15
        }
      }
    },
    {
      "type": {
        "label": ";",
        "beforeExpr": true,
        "startsExpr": false,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "start": 16,
      "end": 17,
      "loc": {
        "start": {
          "line": 2,
          "column": 15
        },
        "end": {
          "line": 2,
          "column": 16
        }
      }
    },
    {
      "type": {
        "label": "eof",
        "beforeExpr": false,
        "startsExpr": false,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "start": 18,
      "end": 18,
      "loc": {
        "start": {
          "line": 3,
          "column": 0
        },
        "end": {
          "line": 3,
          "column": 0
        }
      }
    }
  ]
}

2. Using babel's converter transforming syntax tree syntax

import traverse from "babel-traverse";
traverse(ast, {
  enter(path) {
    if (path.node.type === "Identifier") {
      path.node.name = path.node.name
        .split("")
        .reverse()
        .join("");
    }
  }
});
{
  "type": "File",
  "start": 0,
  "end": 18,
  "loc": {
    "start": {
      "line": 1,
      "column": 0
    },
    "end": {
      "line": 3,
      "column": 0
    }
  },
  "program": {
    "type": "Program",
    "start": 0,
    "end": 18,
    "loc": {
      "start": {
        "line": 1,
        "column": 0
      },
      "end": {
        "line": 3,
        "column": 0
      }
    },
    "sourceType": "script",
    "body": [
      {
        "type": "VariableDeclaration",
        "start": 3,
        "end": 17,
        "loc": {
          "start": {
            "line": 2,
            "column": 2
          },
          "end": {
            "line": 2,
            "column": 16
          }
        },
        "declarations": [
          {
            "type": "VariableDeclarator",
            "start": 9,
            "end": 16,
            "loc": {
              "start": {
                "line": 2,
                "column": 8
              },
              "end": {
                "line": 2,
                "column": 15
              }
            },
            "id": {
              "type": "Identifier",
              "start": 9,
              "end": 12,
              "loc": {
                "start": {
                  "line": 2,
                  "column": 8
                },
                "end": {
                  "line": 2,
                  "column": 11
                },
                "identifierName": "abc"
              },
              "name": "cba"
            },
            "init": {
              "type": "NumericLiteral",
              "start": 15,
              "end": 16,
              "loc": {
                "start": {
                  "line": 2,
                  "column": 14
                },
                "end": {
                  "line": 2,
                  "column": 15
                }
              },
              "extra": {
                "rawValue": 5,
                "raw": "5"
              },
              "value": 5
            }
          }
        ],
        "kind": "const"
      }
    ],
    "directives": []
  },
  "comments": [],
  "tokens": [
    {
      "type": {
        "label": "const",
        "keyword": "const",
        "beforeExpr": false,
        "startsExpr": false,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "value": "const",
      "start": 3,
      "end": 8,
      "loc": {
        "start": {
          "line": 2,
          "column": 2
        },
        "end": {
          "line": 2,
          "column": 7
        }
      }
    },
    {
      "type": {
        "label": "name",
        "beforeExpr": false,
        "startsExpr": true,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null
      },
      "value": "abc",
      "start": 9,
      "end": 12,
      "loc": {
        "start": {
          "line": 2,
          "column": 8
        },
        "end": {
          "line": 2,
          "column": 11
        }
      }
    },
    {
      "type": {
        "label": "=",
        "beforeExpr": true,
        "startsExpr": false,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": true,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "value": "=",
      "start": 13,
      "end": 14,
      "loc": {
        "start": {
          "line": 2,
          "column": 12
        },
        "end": {
          "line": 2,
          "column": 13
        }
      }
    },
    {
      "type": {
        "label": "num",
        "beforeExpr": false,
        "startsExpr": true,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "value": 5,
      "start": 15,
      "end": 16,
      "loc": {
        "start": {
          "line": 2,
          "column": 14
        },
        "end": {
          "line": 2,
          "column": 15
        }
      }
    },
    {
      "type": {
        "label": ";",
        "beforeExpr": true,
        "startsExpr": false,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "start": 16,
      "end": 17,
      "loc": {
        "start": {
          "line": 2,
          "column": 15
        },
        "end": {
          "line": 2,
          "column": 16
        }
      }
    },
    {
      "type": {
        "label": "eof",
        "beforeExpr": false,
        "startsExpr": false,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "start": 18,
      "end": 18,
      "loc": {
        "start": {
          "line": 3,
          "column": 0
        },
        "end": {
          "line": 3,
          "column": 0
        }
      }
    }
  ]
}

3. Generator code using babel generator

import generate from "@babel/generator";
const newCode = generate(ast).code;

// newCode => const cba = 5;
babel plugins

In the above steps, the first step (analysis) and the third step (generation) have babel processing.
When developing the Babel plugin, we only need to describe the "visitors" that transform your AST node.

// my-babel-plugin.js
module.exports = function() {
  return {
    visitor: {
      Identifier(path) {
        const name = path.node.name;
        console.log(name);
        path.node.name = name
          .split("")
          .reverse()
          .join("");
      }
    }
  };
};
// In Babel config. JS, restart the project to take effect
// plugins: ["./src/plugins/mybabelplugin.js"]

Learning Babel plug-in making Babel Handbook
Chinese plug-in manual

Automatic code refactoring tool, artifact JSCodeshift

For example, you want to replace all old anonymous functions and turn them into Lambda expressions (arrow functions).

// transform
load().then(function(response)) {
  return response.data;
}
// to
load().then(response => response.data)

The above operation code editor may not be able to do so, because this is not a simple find and replace operation. At this time, jscodeshift can be used.
If you want to create a framework that automatically migrates your code from the old framework to the new framework, this is a nice way.

jscodeshift

jscodeshift is a toolkit for running codemods on multiple JavaScript or TypeScript files.

react-codemod
This repository contains a collection of codemod scripts for use with JSCodeshift that help update React APIs.
This repository contains a set of codemod scripts for jscodeshift and for updating the React api.

Prettier
// transform
foo(reallyLongArg(), omgSoManyParameters(), IShouldRefactorThis()),isThereSeriouselyAnotherOne());
// to
foo {
  reallyLongArg(),
  omgSoManyParameters(), 
  IShouldRefactorThis(), 
  isThereSeriouselyAnotherOne()
};
// Prettier formats our code. It adjusts long sentences, arranges spaces, parentheses, etc.

<A prettier printer>

Finally

js2flowchart online conversion preview address
js2flowchart warehouse address

It transforms js code into svg flowchart
This is a good example because it shows you that you can do whatever you want when you have AST. It's not necessary to convert AST back to string code. You can draw a flow chart or other things you want through it.

What are the usage scenarios for js2flowchart? Through the flow chart, you can explain your code or write documents for your code; Learn other people's code through visual interpretation; Create a flowchart for a simple description of each processing process through a simple JS syntax.
You can also use it in code, or through the CLI, you just need to point to the file you want to generate SVG. In addition, there is the VS Code plug-in (linked in the project readme)

First, we parse the code into ast. Then, we traverse the AST and generate another tree, which I call workflow tree. It deletes many unimportant tokens, but puts key blocks together, such as functions, loops, conditions, etc. After that, we traverse the workflow tree and create a shape tree. The nodes of each shape tree contain information such as visualization type, location and connection in the tree. In the last step, we traverse all shapes, generate the corresponding SVG, and merge all SVGs into one file
The follow-up will continue to update, learning...

Tags: Javascript Front-end AST

Posted by Bojan86 on Thu, 05 May 2022 08:24:22 +0300