Parser combiner for C + + meta programming

Absrtact: with the help of the constexpr capability of C + +, it is easy to construct parser combiner, which releases great potential for user-defined literal.

##Introduction

Not long ago, I saw a Talk on CppCon: [constexpr all the things]( https://www.youtube.com/watch?v=PJwd4JLYJJY ), I was shocked by this speech technology. I parsed json strings at compile time, and then proposed to construct regular expressions at compile time (build FSM at compile time). There was a lot of applause on the spot. Behind it was the powerful constexpr feature of C + +, which greatly improved the power of compile time calculation.

As early as C++11, there were constexpr features. At that time, there were many constraints, and there could only be one return statement. The only thing you could do was to implement some mathematical and hash functions recursively; When it comes to C++14, this constraint is released, allowing the community to produce a series of constexpr libraries like ordinary functions; In C++17, constexpr is more generalized, allowing 'if constexpr' to replace the SFINAE method of meta programming. Some algorithms in STL library support constexpr, and even lambda is constexpr by default; In C++20, it is even more difficult to imagine that constexpr new is supported. STL vector s are all constexpr. If constexpr allocator and constexpr destructor are used, all constexpr containers can be unified.

With the help of the constexpr capability of C + +, it is easy to construct a Parser combiner, and the implementation of a Parser is not so complicated. It releases great potential for user-defined strings, which is also the focus of this paper.

##What is Parser

Parser is a parser function, which inputs a string and outputs the parsed type value set. The function signature is as follows:

```haskell
Parser a:: String -> [(a, String)]
```

For simplicity, here we consider outputting only zero or one type value result instead of a set. Then the signature is as follows:

```haskell
Parser a:: String -> Maybe (a, String)
```

For example, a numeric Parser parses the input string '123456', and the output result is' just (1, '23456') ', that is, the numeric 1 and the remaining string' 23456 'are obtained, which can be used by the next Parser; If parsing fails, output 'None'.

The function signature corresponding to C + + is as follows:

```cpp
// Parser a :: String -> Maybe (a, String)
using ParserInput = std::string_view;
template <typename T>
using ParserResult = std::optional<std::pair<T, ParserInput>>;
template <typename T>
using Parser = auto(*)(ParserInput) -> ParserResult<T>;
```

This is the definition of Parser.

According to the definition, several basic parsers can be implemented, such as matching a given character:

```cpp
constexpr auto makeCharParser(char c) {
    // CharParser :: Parser Char
    return [=](ParserInput s) -> ParserResult<char> {
        if (s.empty() || c != s[0]) return std::nullopt;
        return std::make_pair(s[0], ParserInput(s.begin() + 1, s.size() - 1));
    };
};
```

`makeCharParser ` is equivalent to a factory. Given the character 'c', create a Parser matching 'c'.

Match characters in the given set:

```cpp
constexpr auto oneOf(std::string_view chars) {
    // OneOf :: Parser Char
    return [=](ParserInput s) -> ParserResult<char> {
        if (s.empty() || chars.find(s[0]) == std::string::npos) return std::nullopt;
        return std::make_pair(s[0], ParserInput(s.begin() + 1, s.size() - 1));
    };
}
```

##What is a parser combiner

Parser is the smallest unit that can be combined. Any complex parser can be combined between parsers, and parser combiner is a high-order function, which inputs a series of parsers and outputs a new composite parser.

According to the definition, one combiner can combine two parsers, and calculate new results according to the results of the two parsers, so as to obtain a new Parser:

```cpp
// combine :: Parser a -> Parser b -> (a -> b -> c) -> Parser c
template<typename P1, typename P2, typename F,
    typename R = std::invoke_result_t<F, Parser_t<P1>, Parser_t<P2>>>
constexpr auto combine(P1&& p1, P2&& p2, F&& f) {
    return [=](ParserInput s) -> ParserResult<R> {
        auto r1 = p1(s);
        if (!r1) return std::nullopt;
        auto r2 = p2(r1->second);
        if (!r2) return std::nullopt;
        return std::make_pair(f(r1->first, r2->first), r2->second);
    };
};
```

Since C + + supports operator overloading, you can overload a binary operator to combine two parsers. For example, the result of taking one of the two parsers produces a new Parser:

Take the result of the Parser on the left:

```cpp
// operator> :: Parser a -> Parser b -> Parser a
template<typename P1, typename P2>
constexpr auto operator>(P1&& p1, P2&& p2) {
    return combine(std::forward<P1>(p1),
                   std::forward<P2>(p2),
                   [](auto&& l, auto) { return l; });
};
```

Take the result of the Parser on the right:

```cpp
// operator< :: Parser a -> Parser b -> Parser b
template<typename P1, typename P2>
constexpr auto operator<(P1&& p1, P2&& p2) {
    return combine(std::forward<P1>(p1),
                   std::forward<P2>(p2),
                   [](auto, auto&& r) { return r; });
};
```

Sometimes it is necessary to match the same Parser multiple times, similar to the ` * 'operation of regular expression. This operation can be regarded as' fold'. Execute the Parser multiple times until the matching fails, and each result is passed to a function operation:

```cpp
// foldL :: Parser a -> b -> (b -> a -> b) -> ParserInput -> ParserResult b
template<typename P, typename R, typename F>
constexpr auto foldL(P&& p, R acc, F&& f, ParserInput in) -> ParserResult<R> {
    while (true) {
        auto r = p(in);
        if (!r) return std::make_pair(acc, in);
        acc = f(acc, r->first);
        in = r->second;
    }
};
```

With the 'fold' function, you can easily implement the function to match 'many' any number of times and 'atLeast' at least once:

```cpp
// many :: Parser a -> Parser monostate
template<typename P>
constexpr auto many(P&& p) {
    return [p=std::forward<P>(p)](ParserInput s) -> ParserResult<std::monostate> {
        return detail::FoldL(p, std::monostate{}, [](auto acc, auto) { return acc; }, s);
    };
};
```
```cpp
// atLeast :: Parser a -> b -> (b -> a -> b) -> Parser b
template<typename P, typename R, typename F>
constexpr auto atLeast(P&& p, R&& init, F&& f) {
    static_assert(std::is_same_v<std::invoke_result_t<F, R, Parser_t<P>>, R>,
            "type mismatch!");
    return [p=std::forward<P>(p),
           f=std::forward<F>(f),
           init=std::forward<R>(init)](ParserInput s) -> ParserResult<R> {
        auto r = p(s);
        if (!r) return std::nullopt;
        return detail::foldL(p, f(init, r->first), f, r->second);
    };
};
```

Another operation is to match zero to once, which is similar to that of regular expression` Operation, here I define it as' option 'operation:

```cpp
// option :: Parser a -> a -> Parser a
template<typename P, typename R = Parser_t<P>>
constexpr auto option(P&& p, R&& defaultV) {
    return [=](ParserInput s) -> ParserResult<R> {
        auto r = p(s);
        if (! r) return make_pair(defaultV, s);
        return r;
    };
};
```

With the above basic operations, let's see how to use them.

##Actual combat

###Analytical value

There are many template meta programming in the project. Before C++17, the template Dependent type (non type parameter) does not support double, but C++20 supports double. The temporary scheme is to use 'template < char C> struct NumWrapper {};` Simulate the type of double, and when you need to obtain its value, you need to parse the string, which should be determined at compile time.

The first is the matching symbol ` + / `. If there is no symbol, it is considered to be ` + `:

```cpp
constexpr auto sign = Option(OneOf("+-"), '+');
```

The second is the integer part, which may or may not be. If not, it is considered as 0:

```cpp
constexpr auto number = AtLeast(OneOf("1234567890"), 0l, [](long acc, char c) -> long {
    return acc * 10 + (c - '0');
});
constexpr auto integer = Option(number, 0l);
```

Then the decimal point, If there is no decimal point, in order not to lose precision, a 'long' value is returned.

```cpp
constexpr auto point = MakeCharParser('.');
// integer
if (! (sign < integer < point)(in)) {
    return Combine(sign, integer, [](char sign, long number) -> R {
        return sign == '+' ? number : -number;
    })(in);
}
```

If there is a decimal point, it is regarded as a floating point number and its' double 'value is returned.

```cpp
// floating
constexpr auto decimal = point < Option(number, 0l);
constexpr auto value = Combine(integer, decimal, [](long integer, long decimal) -> double {
    double d = 0.0;
    while (decimal) {
        d = (d + (decimal % 10)) * 0.1;
        decimal /= 10;
    }
    return integer + d;
});
return Combine(sign, value, [](char sign, double d) -> R { return sign == '+' ? d : -d; })(in);
```
Because of this Parser Possible return`long`perhaps`double`Type, so it can be unified into and types`std::variant`: 
```cpp
constexpr auto ParseNum() {
    using R = std::variant<double, long>;
    return [](ParserInput in) -> ParserResult<R> {
        // ...
    };
}
```

Finally, our 'NumWrapper' is implemented as follows, which can be mixed into the template type system:

```cpp
template<char... Cs>
constexpr std::array<char, sizeof...(Cs)> ToArr = {Cs...};
template<char ...Cs>
class NumberWrapper {
public:
    constexpr static auto numStr = ToArr<Cs...>;
    constexpr static auto res = ParseNum()(std::string_view(numStr.begin(), numStr.size()));
    static_assert(res.has_value() && res->second.empty(), "parse failed!");
public:
    constexpr static auto value = std::get<res->first.index()>(res->first); // long or double
}
```

If it is only used to parse numbers, it will kill the chicken with an ox knife, because in the version before 'parser combiner', I finished parsing in an ordinary 'constexpr' function. The code is very boring, but now I may want to go back to the code.

###Jason Analysis Guide

The theme of CppCon this time is to parse the 'json' string during compilation. Of course, use 'string directly_ View ` just carry the string. Then construct some constexpr containers, such as constexpr vector with fixed length. Since it is 17 years old, this is the only way to do it if constexpr new is not supported. With constexpr vector, you can construct a map container, which is also a very simple pair vector set.

Then, a parser combiner is proposed to parse the string and 'fmap' into the json data structure.

When it was first implemented, the json data structure was also a large 'template < size'_ t Depth> struct json_ Value;` Template bearing, so that only the maximum number of recursive layers can be specified, which is not practical enough. Then talker came up with a very clever way to remove the layer number constraint, that is, first recursively scan 'sizes()' and calculate the number of all values, so as to determine how many 'value' containers are needed for storage, and then calculate the string length. Due to the influence of 'UTF8' and escape string, the length to be parsed may actually be less than the input length. After having a certain space, perform the second recursion ` value_ recur<NumObjects, StringSize>::value_ Parser() ` scan, and fill in the 'value' data structure every time the complete value is parsed. Because arrays are similar to objects, they may be nested. At this time, the third pass of recursion ` extend is performed_ recur<>::value_ Parser() ` scan, do a width first search, determine the number of outermost elements, and then parse the filled values in turn.

 

Click follow to learn about Huawei's new cloud technology for the first time~

Tags: C++ string

Posted by Canadian on Mon, 16 May 2022 17:16:38 +0300