LSM tree - leveldb bloom filter

A Bloom filter is loosely similar to a hash table, but more space-efficient, because it uses individual bits to record whether a key may exist. This efficiency comes with a side effect: the filter cannot guarantee that a key exists, so it is only suitable for systems that can tolerate a certain false positive rate.

In a word: a Bloom filter is a probabilistic data structure that can only tell us that an element is definitely not in a set, or that it may be in the set.

The tradeoff of the Bloom filter is that it cannot guarantee with 100% certainty that an element is in the set, so it fits workloads with some fault tolerance. Much of the theory and practice below is summarized from online materials plus my own understanding; if there is any error, please correct me.


The theoretical basis is much the same across articles; it is summarized from this excellent post: "Concept and principle of Bloom Filter" (jiaomeng's blog on CSDN), which is concise and easy to follow. It is also worth consulting the original papers, since much of the content here is ultimately distilled from them.

You can experiment with a live JS implementation on this site:
Bloom Filters by Example

Note that this example uses two simple hash functions, FNV and Murmur.

A Bloom filter is usually defined as follows:

  1. n keys.
  2. A bit array v of m bits, initialized to 0.
  3. Bloom filter theory suggests using k mutually independent hash functions to represent a set S = {x1, x2, ..., xn} of n elements. For any element x, the position hi(x) mapped by the i-th hash function is set to 1 (1 ≤ i ≤ k).

     If several hashes map to a position that is already 1, the bit simply stays 1; setting an already-set bit has no further effect. (The original post illustrates this with a figure.)

  4. To judge whether an element y is in the set, apply the k hash functions to y. If every hi(y) is 1 (for all 1 ≤ i ≤ k), then y may be in the set; otherwise it definitely is not.

     In the original figure, y1 is judged not to be in the set because one of its hashes lands on a 0 bit, while all of y2's hashes land on 1 bits, so y2 may be in the set.

The theory behind Bloom filters is relatively simple; the key parts are the choice of hash functions and balancing the error rate.

Error rate calculation

First, pay attention to the bit length of the filter, i.e., the size of the array. A larger Bloom filter generally has a lower error rate than a smaller one.

The false positive rate is approximately: (1 - e^(-kn/m))^k.

The derivation appears at the end of this post; the process itself is not critical, as long as you understand the final formula:

n is the number of keys and m is the number of bits (that is, the size of the bit array).

According to this formula, you first determine n, the expected capacity of the data set to be inserted, then adjust k and m to configure the filter for your application. Roughly, the larger m and the smaller n, the lower the false positive rate (with k tuned to match).

Considering the probability that a bit is 0, the false positive rate is lowest when half of the m bits are set to 1 and half are set to 0. This statement is derived in detail at the end of this post.

How many hash functions?

Following from the error rate formula, another question arises: how many hash functions should be chosen? Too many hash functions reduce computation efficiency and hurt performance; too few increase the false positive rate.

Happily, this formula has also been derived:

The optimal number of hash functions is k = ln(2) * (m/n).

You can size a Bloom filter through the following steps:

  1. Determine the range of n.
  2. Choose a value for m.
  3. Compute the optimal value of k.
  4. Compute the error rate for the chosen n, m, and k. If the error rate is unacceptable, go back to step 2; otherwise, done.

Time complexity and space complexity of Bloom filter?

The time complexity of both insert and test is O(k): inserting or querying an element only requires computing k hash functions over it.

Space complexity is harder to pin down because of the error-rate tradeoff. If the size of the filter is difficult to estimate in advance, a hash table or a scalable Bloom filter may be a better choice.

Note ⚠️: LevelDB enforces a minimum filter size of a 64-bit array.

How many of the m bits are 1?

Just remember the conclusion: the false positive rate is lowest when half of the m bits are set to 1 and half are set to 0.

LevelDB implementation

The essence of LevelDB's Bloom filter is its hash function: it achieves the effect of multiple hashes with a single hash, while keeping the false positive rate bounded.

The concrete implementation is in bloom.cc: leveldb at main · google/leveldb · GitHub.

Before going deep into the source, let's look at how the author explains it in index.md:
leveldb at main · google/leveldb · GitHub

Because of how leveldb organizes data on disk, a single Get() call may involve multiple reads from disk. The optional FilterPolicy mechanism can be used to greatly reduce the number of disk reads; in practice, this means using a Bloom filter to improve filtering efficiency.

leveldb::Options options;
options.filter_policy = NewBloomFilterPolicy(10);
leveldb::DB* db;
leveldb::DB::Open(options, "/tmp/testdb", &db);
... use the database ...
delete db;
// When closing leveldb, the memory occupied by the filter policy
// must be freed manually
delete options.filter_policy;

The code above attaches a Bloom-filter-based filtering policy to the database.

Bloom-filter-based filtering relies on keeping some data in memory for each key (in this case 10 bits per key, since that is the argument we pass to NewBloomFilterPolicy).

This filter reduces the number of unnecessary disk reads needed by Get() calls by a factor of roughly 100. Increasing the bits per key yields a larger reduction at the cost of more memory. We recommend that applications whose working set does not fit in memory, and which do a lot of random reads, set a filter policy.


The whole filter is exposed through the external FilterPolicy interface to reduce the cost of DB::Get() calls; internally, the Bloom filter is the default implementation.

The following is the code of interface definition:

namespace leveldb {

class Slice;

class LEVELDB_EXPORT FilterPolicy {
 public:
  virtual ~FilterPolicy();

  // Return the name of this policy.  Note that if the filter encoding
  // changes in an incompatible way, the name returned by this method
  // must be changed.  Otherwise, old incompatible filters may be
  // passed to methods of this type.
  virtual const char* Name() const = 0;

  // keys[0,n-1] contains a list of keys (potentially with duplicates)
  // that are ordered according to the user supplied comparator.
  // Append a filter that summarizes keys[0,n-1] to *dst.
  //
  // Warning: do not change the initial contents of *dst.  Instead,
  // append the newly constructed filter to *dst.
  virtual void CreateFilter(const Slice* keys, int n,
                            std::string* dst) const = 0;

  // "filter" contains the data appended by a preceding call to
  // CreateFilter() on this class.  This method must return true if
  // the key was in the list of keys passed to CreateFilter().
  // This method may return true or false if the key was not on the
  // list, but it should aim to return false with a high probability.
  virtual bool KeyMayMatch(const Slice& key, const Slice& filter) const = 0;
};

// This declaration carries many more comments in the header;
// they are discussed later in this post.
LEVELDB_EXPORT const FilterPolicy* NewBloomFilterPolicy(int bits_per_key);

}  // namespace leveldb



Detailed explanations are placed in the code comments. The parts most worth attention are filter creation and the hash function. This section walks through the filter source itself; the crucial hash function follows in the next section.

// Copyright (c) 2012 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.

#include "leveldb/filter_policy.h"

#include "leveldb/slice.h"
#include "util/hash.h"

namespace leveldb {

namespace {
// Hash function, with the seed fixed at 0xbc9f1d34
static uint32_t BloomHash(const Slice& key) {
  return Hash(key.data(), key.size(), 0xbc9f1d34);
}

class BloomFilterPolicy : public FilterPolicy {
 public:
  explicit BloomFilterPolicy(int bits_per_key) : bits_per_key_(bits_per_key) {
    // We intentionally round down to reduce probing cost a little bit
    k_ = static_cast<size_t>(bits_per_key * 0.69);  // 0.69 =~ ln(2)
    if (k_ < 1) k_ = 1;
    if (k_ > 30) k_ = 30;
  }

  const char* Name() const override { return "leveldb.BuiltinBloomFilter2"; }

  void CreateFilter(const Slice* keys, int n, std::string* dst) const override {
    // Compute bloom filter size (in both bits and bytes)
    size_t bits = n * bits_per_key_;

    // For small n, we can see a very high false positive rate.  Fix it
    // by enforcing a minimum bloom filter length.
    // tip: as discussed earlier, too few bits raises the false positive rate
    if (bits < 64) bits = 64;

    size_t bytes = (bits + 7) / 8;
    bits = bytes * 8;  // at least 64 bits

    const size_t init_size = dst->size();
    dst->resize(init_size + bytes, 0);
    dst->push_back(static_cast<char>(k_));  // Remember # of probes in filter
    char* array = &(*dst)[init_size];
    for (int i = 0; i < n; i++) {
      // Use double-hashing to generate a sequence of hash values.
      // See analysis in [Kirsch,Mitzenmacher 2006].
      // tip: see the resources for the original paper -> "Less Hashing,
      // Same Performance: Building a Better Bloom Filter" (LNCS 4168)
      uint32_t h = BloomHash(keys[i]);
      const uint32_t delta = (h >> 17) | (h << 15);  // Rotate right 17 bits
      for (size_t j = 0; j < k_; j++) {
        const uint32_t bitpos = h % bits;
        array[bitpos / 8] |= (1 << (bitpos % 8));
        h += delta;
      }
    }
  }

  // Key function: the membership test
  bool KeyMayMatch(const Slice& key, const Slice& bloom_filter) const override {
    const size_t len = bloom_filter.size();
    // A filter shorter than 2 bytes (array + encoded k) is meaningless
    if (len < 2) return false;

    const char* array = bloom_filter.data();
    const size_t bits = (len - 1) * 8;

    // Use the encoded k so that we can read filters generated by
    // bloom filters created using different parameters.
    const size_t k = array[len - 1];
    if (k > 30) {
      // Reserved for potentially new encodings for short bloom filters.
      // Consider it a match.
      return true;
    }

    // Key part: the hash function
    uint32_t h = BloomHash(key);
    const uint32_t delta = (h >> 17) | (h << 15);  // Rotate right 17 bits
    for (size_t j = 0; j < k; j++) {
      const uint32_t bitpos = h % bits;
      if ((array[bitpos / 8] & (1 << (bitpos % 8))) == 0) return false;
      h += delta;
    }
    return true;
  }

 private:
  size_t bits_per_key_;
  size_t k_;
};
}  // namespace

// New bloom filter policy (see the notes below)
const FilterPolicy* NewBloomFilterPolicy(int bits_per_key) {
  return new BloomFilterPolicy(bits_per_key);
}

}  // namespace leveldb

NewBloomFilterPolicy function

Why is it called the "new Bloom filter policy"? See the comment the author gives:

Returns a new filter policy that uses a Bloom filter with approximately the specified number of bits per key. A good value for bits_per_key is 10, which yields a filter with about a 1% false positive rate.

Note that the memory of the returned object must be released manually after use:

The caller must delete the result after any database that is using the result has been closed.

If a custom comparator is used that ignores some parts of the keys being compared, NewBloomFilterPolicy() must not be used. Instead, a custom FilterPolicy implementation must be provided that also ignores the corresponding parts of the keys.

For example, if the comparator ignores trailing spaces, it would be incorrect to use a FilterPolicy (like NewBloomFilterPolicy) that does not ignore trailing spaces in keys.

As mentioned before, one key piece of the implementation is a high-quality hash function. Here is the relevant code from hash.cc:

Note that the pseudo-random seed used by the hash function is fixed at 0xbc9f1d34.

Here we can also see that leveldb uses its own high-quality hash function so that one function stands in for k functions, an engineering adjustment to the theory. Internally, it also enforces a filter length of at least 64 bits.

// Parameters:
//   data: pointer to the bytes to hash
//   n:    number of bytes
//   seed: hash seed; LevelDB's BloomHash fixes it at 0xbc9f1d34
uint32_t Hash(const char* data, size_t n, uint32_t seed) {
  // Similar to murmur hash
  const uint32_t m = 0xc6a4a793;
  const uint32_t r = 24;
  // limit points one past the last byte, like an end() iterator
  const char* limit = data + n;
  uint32_t h = seed ^ (n * m);

  // Pick up four bytes at a time, until fewer than 4 bytes remain
  while (data + 4 <= limit) {
    // DecodeFixed32 is a low-level Get() variant that reads directly from
    // the character buffer without any bounds check; recent clang and gcc
    // optimize it into a single mov/ldr instruction.
    uint32_t w = DecodeFixed32(data);
    data += 4;
    h += w;
    h *= m;
    h ^= (h >> 16);
  }

  // Pick up remaining bytes and fold them into the uint32_t
  switch (limit - data) {
    case 3:
      h += static_cast<uint8_t>(data[2]) << 16;
      // FALLTHROUGH_INTENDED annotates intentional fall-through between
      // switch labels. A real definition may be provided externally; the
      // backup no-op version for unsupported compilers is:
      //   #ifndef FALLTHROUGH_INTENDED
      //   #define FALLTHROUGH_INTENDED do { } while (0)
      //   #endif
      FALLTHROUGH_INTENDED;
    case 2:
      h += static_cast<uint8_t>(data[1]) << 8;
      FALLTHROUGH_INTENDED;
    case 1:
      h += static_cast<uint8_t>(data[0]);
      h *= m;
      h ^= (h >> r);
      break;
  }
  return h;
}


LevelDB's Slice can be regarded as a simple string design similar to Redis's SDS, but written in C++.

Relevant explanations can be found in the following documents:

leveldb/ at main · google/leveldb · GitHub

Slice is a simple data structure containing a pointer into some external storage and a size. The user of a Slice must ensure that the slice is not used after the corresponding external storage has been deallocated (the backing memory is managed by the caller).

Multiple threads can invoke const methods on a Slice without external synchronization (it is thread-safe in that sense), but if any thread may call a non-const method, all threads accessing the same Slice must use external synchronization.

C + + or C-like strings can be simply converted to Slice:

leveldb::Slice s1 = "hello";

std::string str("world");
leveldb::Slice s2 = str;

The reverse is the same:

std::string str = s1.ToString();
assert(str == std::string("hello"));

Be careful when using Slice, because it is up to the caller to ensure that the external byte array pointed to by Slice remains valid when Slice is used. For example, the following example is wrong:

In the following example, the Slice ends up pointing at storage whose lifetime is not guaranteed.

leveldb::Slice slice;
if (...) {
  std::string str = ...;
  slice = str;
}

When execution leaves the if block, str is destroyed, and the storage that slice points to disappears with it.

Other contents

unit testing

The unit tests written by the author show the concrete behavior more intuitively. The path is: leveldb-main/util/bloom_test.cc.

// Copyright (c) 2012 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.

#include "gtest/gtest.h"
#include "leveldb/filter_policy.h"
#include "util/coding.h"
#include "util/logging.h"
#include "util/testutil.h"

namespace leveldb {

static const int kVerbose = 1;

static Slice Key(int i, char* buffer) {
  EncodeFixed32(buffer, i);
  return Slice(buffer, sizeof(uint32_t));
}

class BloomTest : public testing::Test {
 public:
  BloomTest() : policy_(NewBloomFilterPolicy(10)) {}

  ~BloomTest() { delete policy_; }

  void Reset() {
    keys_.clear();
    filter_.clear();
  }

  void Add(const Slice& s) { keys_.push_back(s.ToString()); }

  void Build() {
    std::vector<Slice> key_slices;
    for (size_t i = 0; i < keys_.size(); i++) {
      key_slices.push_back(Slice(keys_[i]));
    }
    filter_.clear();
    policy_->CreateFilter(&key_slices[0], static_cast<int>(key_slices.size()),
                          &filter_);
    keys_.clear();
    if (kVerbose >= 2) DumpFilter();
  }

  size_t FilterSize() const { return filter_.size(); }

  // Print the filter's bit pattern
  void DumpFilter() {
    std::fprintf(stderr, "F(");
    for (size_t i = 0; i + 1 < filter_.size(); i++) {
      const unsigned int c = static_cast<unsigned int>(filter_[i]);
      for (int j = 0; j < 8; j++) {
        std::fprintf(stderr, "%c", (c & (1 << j)) ? '1' : '.');
      }
    }
    std::fprintf(stderr, ")\n");
  }

  // Membership test (rebuilds the filter if keys are pending)
  bool Matches(const Slice& s) {
    if (!keys_.empty()) {
      Build();
    }
    return policy_->KeyMayMatch(s, filter_);
  }

  double FalsePositiveRate() {
    char buffer[sizeof(int)];
    int result = 0;
    for (int i = 0; i < 10000; i++) {
      if (Matches(Key(i + 1000000000, buffer))) {
        result++;
      }
    }
    return result / 10000.0;
  }

 private:
  const FilterPolicy* policy_;
  std::string filter_;
  std::vector<std::string> keys_;
};

TEST_F(BloomTest, EmptyFilter) {
  ASSERT_TRUE(!Matches("hello"));
  ASSERT_TRUE(!Matches("world"));
}

TEST_F(BloomTest, Small) {
  Add("hello");
  Add("world");
  ASSERT_TRUE(Matches("hello"));
  ASSERT_TRUE(Matches("world"));
  ASSERT_TRUE(!Matches("x"));
  ASSERT_TRUE(!Matches("foo"));
}

static int NextLength(int length) {
  if (length < 10) {
    length += 1;
  } else if (length < 100) {
    length += 10;
  } else if (length < 1000) {
    length += 100;
  } else {
    length += 1000;
  }
  return length;
}

TEST_F(BloomTest, VaryingLengths) {
  char buffer[sizeof(int)];

  // Count number of filters that significantly exceed the false positive rate
  int mediocre_filters = 0;
  int good_filters = 0;

  for (int length = 1; length <= 10000; length = NextLength(length)) {
    Reset();
    for (int i = 0; i < length; i++) {
      Add(Key(i, buffer));
    }
    Build();

    ASSERT_LE(FilterSize(), static_cast<size_t>((length * 10 / 8) + 40))
        << length;

    // All added keys must match
    for (int i = 0; i < length; i++) {
      ASSERT_TRUE(Matches(Key(i, buffer)))
          << "Length " << length << "; key " << i;
    }

    // Check false positive rate
    double rate = FalsePositiveRate();
    if (kVerbose >= 1) {
      std::fprintf(stderr,
                   "False positives: %5.2f%% @ length = %6d ; bytes = %6d\n",
                   rate * 100.0, length, static_cast<int>(FilterSize()));
    }
    ASSERT_LE(rate, 0.02);  // Must not be over 2%
    if (rate > 0.0125)
      mediocre_filters++;  // Allowed, but not too often
    else
      good_filters++;
  }
  if (kVerbose >= 1) {
    std::fprintf(stderr, "Filters: %d good, %d mediocre\n", good_filters,
                 mediocre_filters);
  }
  ASSERT_LE(mediocre_filters, good_filters / 5);
}

// Different bits-per-byte

}  // namespace leveldb

C++ syntax

I have not studied C++ systematically, so this part collects keywords and syntax that I did not understand.

explicit keyword

The C++ reference manual explains it as follows:

  • Constructors decorated with explicit cannot be called implicitly.
  • Implicit conversions between class objects are prohibited.

Here we focus on the first point: once a constructor is marked explicit, it can no longer be invoked implicitly.

Here is a related example from the web:

#include <iostream>
#include <string>

class Explicit {
 public:
  Explicit(int size) {
    std::cout << " the size is " << size << std::endl;
  }
  Explicit(const char* str) {
    std::string _str = str;
    std::cout << " the str is " << _str << std::endl;
  }
  Explicit(const Explicit& ins) {
    std::cout << " The Explicit is ins" << std::endl;
  }
  Explicit(int a, int b) {
    std::cout << " the a is " << a << " the b is " << b << std::endl;
  }
};

int main() {
  Explicit test0(15);
  Explicit test1 = 10;  // Implicitly calls Explicit(int size)

  Explicit test2("RIGHTRIGHT");
  Explicit test3 = "BUGBUGBUG";  // Implicitly calls Explicit(const char* str)

  Explicit test4(1, 10);
  Explicit test5 = test1;  // Calls the copy constructor
  return 0;
}

Although this program compiles, implicitly converting an int or a const char* into an Explicit object is poor style and makes bugs hard to spot once such conversions creep in. That is why a constructor modified by explicit can no longer be called implicitly. The effect of adding the keyword is not demonstrated here: with explicit added, the implicit-conversion lines above simply fail to compile.

Aside: it is a bit absurd. Early C++ embraced implicit conversion to improve coding convenience; later the pitfall proved too big, so the language added explicit to fill the hole it had dug itself.

resize function

It is easiest to understand through the following cases:

  • myvector.resize(5);
    Shrinks a vector of 10 elements to a length of 5, discarding the extra elements (5 < 10: reduce the length; note the capacity is not necessarily released).
  • myvector.resize(8, 100);
    Grows the 5-element vector to a length of 8, filling the new slots with 100, i.e., appending three 100s (8 > 5: grow the length with a specified fill element).
  • myvector.resize(12);
    Grows the 8-element vector to a length of 12, filling with 0 by default, i.e., appending four zeros (12 > 8: grow the length without specifying the fill element).

Mathematical derivation

The derivation below is for readers who want a deeper understanding; it is perfectly fine to just remember the conclusions above.

Most of the following content is based on (and translated from) the paper LNCS 4168 - Less Hashing, Same Performance: Building a Better Bloom Filter.

false positive: the misjudgment rate; it rises as more hashes are applied and more bits are set to 1.

According to the construction of the Bloom filter, for one hash function and one specified bit, the probabilities of it being set to 1 or left at 0 are:

P(1) = 1/m
P(0) = 1 - 1/m

With k hash functions, the probability that the bit is still 0 is:

P'(0) = P(0) ** k = (1 - 1/m) ** k

After inserting n keys, the probability that the bit is still 0 is:

P''(0) = P'(0) ** n = (1 - 1/m) ** kn

Using the limit definition of the natural constant e, (1 - 1/m) ** m approaches 1/e as m grows, so we can approximate:

P''(0) = (1 - 1/m) ** kn ≈ e ** (-kn/m)

When a key that was never inserted is tested, a false positive occurs exactly when all k of its probed bits happen to be 1. The probability of that is:

false_positive = (1 - P''(0)) ** k ≈ (1 - e ** (-kn/m)) ** k

The question is: how do we minimize false_positive?

To simplify the description, define p = e ** (-kn/m) (that is, P''(0), the probability that a bit is 0). The rate becomes:

false_positive = (1 - p) ** k = e ** (k * ln(1 - p))

Since the base e is fixed, minimizing the false positive rate means minimizing the exponent:

g = k * ln(1 - p)

From p = e ** (-kn/m) we get k = -(m/n) * ln(p), so g can be transformed into:

g = -(m/n) * ln(p) * ln(1 - p)

This expression is symmetric in p and (1 - p), so g takes its minimum when p = 1/2.

At that point, the optimal k and the minimum false positive rate f are:

k = ln(2) * (m/n) ≈ 0.693 * (m/n)
f = (1/2) ** k ≈ 0.6185 ** (m/n)

Final derivation result:

Considering the probability that a bit is set to 0, the false positive rate is lowest when half of the m bits are set to 1 and half are set to 0.

An example table of false positive rates for combinations of m/n and k can be found in the original materials (screenshot omitted here).


Bloom filters are usually used to quickly determine whether an element is in a set. In essence, they tolerate a certain error rate in exchange for space and time efficiency.

What this means for LevelDB: compared with a hash table, the conflict-handling machinery is saved. LevelDB's implementation adds an optimization of its own: a single hash function achieves the effect of approximately k hash functions, keeping lookups cheap without sacrificing much accuracy.

Apart from the hash-function optimization, the rest of LevelDB's filter stays very close to Bloom filter theory. It is an excellent learning case, and as a production-grade C++ filter it is also a good reference model.

Finally, when in doubt, going back to Bloom's original paper is never wrong.

reference material

The following information will definitely give you a thorough understanding of the bloom filter.


Tags: Leveldb

Posted by gloveny on Fri, 20 May 2022 11:34:33 +0300