# Optimal Binary Retrieval Tree

## Definition

Dataset S is an ascending set, and for the optimal binary search tree itself and all its subtrees, the left child < root < right child is satisfied.

## Number of comparisons

Suppose you look for data x in the binary tree, compare it with the root node, go into the left subtree search smaller than the root node, go into the right subtree search larger than the root node, and end the search if it is equal to the node (treat each node as the root node, then do the same).
Assuming that the data x you are looking for is a data node, the number of comparisons should be the depth of the data node + 1 (depth = number of layers - 1, that is, the root node depth is 0).
Assuming that the data X is a gap node, the number of comparisons should be the depth of the gap node, for example, finding x=1.7. When finding 1, 1.7 is greater than 1, then entering the right subtree, but the right subtree only has the gap node L1, X does not belong to S, and the number of comparisons is the depth of L1.
That is, x belongs to S, the number of comparisons is x's depth in the tree plus 1, X does not belong to S, and the number of comparisons is x's depth in the tree.

## Access Probability

For a binary retrieval tree T, the average number of comparisons m(T) is equal to the number of comparisons corresponding to all nodes multiplied by their access probability.

The optimal binary search tree is the binary search tree with the least average number of comparisons under a given access probability distribution.

## Subproblem Analysis

If T is an optimal binary search tree, then its corresponding left and right subtrees (root node determination) are both optimal binary search trees. Assuming that the left subtree is better, there will also be a better binary search tree in the range i to j, which is inconsistent with T being the optimal binary search tree, so it is not established.

In the case of k=i and k=j, the left subtree is empty, only the gap node, the right subtree is empty, only the gap node. If a data is within the gap node range, it has already determined whether it belongs to S when it is initially compared with the root node (all nodes are first compared with the root node). So there is no need to calculate the number of comparisons in this subtree (we have counted the number of comparisons for the root node, then we will say that the average number of comparisons for the two-way retrieved tree T is equal to the left subtree plus the right subtree plus the root node), so the number of comparisons is 0.

(The PPT part subscript is wrong here, just let me know)

## Recursive equation (or dynamic programming)

As to why w[i][j] should be introduced, it is equal to the sum of all access probabilities from I to j, because the average number of comparisons of left and right subtrees is calculated, respectively, but when left and right subtrees are merged into a binary search tree T, and the depth of all nodes in the left and right subtrees is added by 1, then the new average number of comparisons should be added by 1 times the access probabilities of all nodes, that is, w[i][j], so w[i][j] is introduced.

According to this formula, the minimum average number of comparisons is = min (left subtree minimum average number of comparisons + right subtree minimum average number of comparisons + w[i][j]).

Here you can see why the number of comparisons of corresponding interval data is zero when k=i and k=j are mentioned above, because these comparisons are counted on the root xk.

## Example

```#include<bits/stdc++.h>
#define maxsize 1000+5
#define INF 0x3f3f3f3f
int SS[maxsize][maxsize];
double PA[maxsize], PB[maxsize];
double w[maxsize][maxsize];
double m[maxsize][maxsize];
void BST(double PA[maxsize], double PB[maxsize], int n)
{
//Initialized, only the subtree of the gap node has a comparison number m of 0, only the access probabilities of all nodes of the gap node w and the access probabilities of the gap node.
for(int i=1; i<=n+1; i++)
{
m[i][i-1]=0;
w[i][i-1]=PA[i-1];
}
//Length of interval i to j
for(int len=1; len<=n; len++)
{
//Left boundary
for(int l=1; l+len-1<=n; l++)
{
//Boundary
int r=l+len-1;
m[l][r]=INF;
w[l][r]=w[l][r-1]+PA[r]+PB[r];
//Traversing Root Nodes
for(int k=l; k<=r; k++)
{
double tmp=m[l][k-1]+m[k+1][r]+w[l][r];
if(tmp<m[l][r])
{
m[l][r]=tmp;
//Record the root of this interval
SS[l][r]=k;
}
}
}
}
return ;
}
//There is an error in the title. It should be traversed in order, first output the root node, then recurse the left subtree and the right subtree.
//SS saves the root nodes from l to r. For each node, it is considered the root node in the zone and saved.
void printTree(int l, int r)
{
if(l>r) return ;
int mid=SS[l][r];
std::cout<<mid<<" ";
printTree(l,mid-1);
printTree(mid+1,r);
return ;
}
int main()
{
int n;
std::cin>>n;
std::cin>>PA[0];
for(int i=1; i<=n; i++)
{
std::cin>>PB[i];
std::cin>>PA[i];
}
BST(PA, PB, n);
printTree(1,n);
return 0;
}
```

Tags: Algorithm

Posted by jestercobblepot on Wed, 04 May 2022 19:00:08 +0300