Heap and heap sorting

Heap and heap sorting

Definition of heap

  • The heap is a complete binary tree: except for the last layer, the other layers must be full at several points, and the nodes of the last layer must be filled gradually from left to right
  • The value of each parent node in the heap must be greater than or equal to (or less than or equal to) the value of its two child tree nodes: greater than or equal to the large top heap and less than or equal to the small top heap

Implementation of heap

The complete binary tree uses arrays for storage, so the heap is also implemented based on arrays. The most important operations of the heap are the following two basic operations:

  • Add (insert) new element
  • Delete heap top element

The operation method is described in detail below:

New element

The algorithm of new elements is as follows:

  • 1. Put the new element at the end of the heap, that is, at the end of the array, but this may lead to non-compliance with the characteristics of the heap, so the second step of bottom-up heap is required
  • 2. If the parent-child relationship of the node is not satisfied, the node will continue to be compared with the parent-child relationship of the heap. 2. If the parent-child relationship of the node is not satisfied, the node will continue to be compared with the heap; If it cannot be exchanged, the heap is completed

The following figure shows the stacking process of a large top stack. First insert the new element 22 into the tail, and then stack it

Delete heap top element

The algorithm for deleting heap top elements is as follows:

  • 1. Exchange the values of the heap top element and the heap tail element, so that the heap will be reduced and the heap top element will be deleted. However, at this time, it may not conform to the characteristics of the heap. All the heap from top to bottom in the second step is required
  • 2. Heap. Here, top-down heap is used to compare with its child nodes along the path of the node. If the parent-child node relationship is not satisfied, exchange and repeat this process until the parent-child node relationship is satisfied

The following figure shows the deletion process of a large top heap

Realization of large top stack

Here, a large top heap is realized by simple simulation with Python 3. The small top heap is similar, that is, it is different. There is no need to repeat the implementation here

class MaxHeap:
    def __init__(self, size: int):
        """
        Initialization of the heap, setting the size of the heap
        :param size: Heap size
        """
        # Here's a trick. The array subscript does not start from 0, but from 1. This makes it easy to obtain and calculate the subscript of parent-child nodes. All the array space here is size+1
        self.data = [None] * (size + 1)
        self.size = size
        self.used = 0
        print("init heap:", self.data)

    def push(self, value: int) -> None:
        """
        Heap insertion
        :return:
        """
        if self.used == self.size:
            self.pop()

        self.used += 1
        print("insert:", self.used, self.data, end="==>")
        self.data[self.used] = value
        self._shitUp()
        print(self.data)

    def pop(self) -> None:
        """
        Delete operation of heap
        :return:
        """
        if self.used == 0:
            return

        self.data[1] = self.data[self.used]
        print("pop:", self.data, end="==>")
        self.used -= 1
        self._shitDown()
        print(self.data)

    def _shitUp(self):
        """
        Bottom up stacking
        Compare with the parent node, and exchange if it is greater than
        :return:
        """
        child = self.used
        while child // 2 > 0 and self.data[child] > self.data[child//2]:
            self.data[child], self.data[child//2] = self.data[child//2], self.data[child]
            child = child // 2

    def _shitDown(self):
        """
        Top down stacking
        Here is the exchange with the child node of the maximum value, so one is used maxPos Where to save the maximum value
        If it is the location of the current parent node, the heap ends
        If not, exchange parent and child nodes and continue the cycle
        :return:
        """
        parent = 1
        while True:
            maxPos = parent
            if parent * 2 <= self.used and self.data[parent*2] > self.data[parent]:
                maxPos = parent * 2
            if parent * 2 + 1 <= self.used and self.data[parent*2+1] > self.data[maxPos]:
                maxPos = parent*2+1
            if maxPos == parent:
                break
            self.data[maxPos], self.data[parent] = self.data[parent], self.data[maxPos]
            parent = maxPos

    def toList(self):
        return self.data



if __name__ == "__main__":
    maxHeap = MaxHeap(2)
    maxHeap.push(1)
    maxHeap.push(2)
    maxHeap.push(3)
    print(maxHeap.toList())


init heap: [None, None, None]
insert: 1 [None, 1, None]
insert: 2 [None, 2, 1]
pop: [None, 1, 1]
insert: 2 [None, 3, 1]
[None, 3, 1]

LeetCode 347: use self writing heap for the first K high-frequency elements

If you need to run the test, you need to remove the print in the code, otherwise it will exceed the output limit.

One of LeetCode is solved by using heap, and the answer is implemented by language. Here we will try to implement a small top heap by ourselves. The only change made by the small top heap is to add a dictionary, convert and compare when comparing. The general code and ideas are in the following code.

"""
347. front K High frequency element
 Given a non empty array of integers, return the number before the occurrence frequency k High element.



Example 1:

input: nums = [1,1,1,2,2,3], k = 2
 output: [1,2]
Example 2:

input: nums = [1], k = 1
 output: [1]


Tips:

You can assume a given k Always reasonable, and 1 ≤ k ≤ The number of different elements in the array.
The time complexity of your algorithm must be better than O(n log n) , n Is the size of the array.
The question data ensures that the answer is unique, in other words, the first one in the array k The set of high frequency elements is unique.
You can return answers in any order.



Problem solving ideas:
Use the small top heap implementation because you want to return to the previous maximum K Number, small top pile can be saved K Number, and the top of the heap is the lowest decimal, and the rest is greater than it

1.Use first hashmap Count the number of occurrences of the saved number
2.use k Initialize small top heap with data
3.If it is larger than the top of the stack, it will be inserted, and if it is smaller than the top of the stack, it means the front K The big number doesn't have its share

Statistics N,ergodic N,Large top reactor operation logK,Maximum time complexity O(N)

Try to implement a heap yourself
 The built-in ran 60 ms,Self written 56, feel similar
"""
import collections
from typing import List
import heapq


class SolutionP:
    def topKFrequent(self, nums, k):
        """
        :type nums: List[int]
        :type k: int
        :rtype: List[int]
        """
        count = collections.Counter(nums)
        return heapq.nlargest(k, count.keys(), key=count.get)


class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        count = {}
        for num in nums:
            count[num] = count.get(num, 0) + 1
        print(count)

        # Initialize the heap with K numbers
        minHeap = MinHeap(k, count)
        keys = list(count.keys())
        print(keys)
        for i in range(0, k):
            minHeap.push(keys[i])

        # It is inserted only when it is greater than the top of the stack. If it is less than, it means that there is no copy of the first K
        for i in range(k, len(keys)):
            if count[keys[i]] > count[minHeap.getMinest()]:
                minHeap.push(keys[i])
                
        return minHeap.toList()


class MinHeap:
    def __init__(self, size: int, myDict: dict):
        """
        Initialization of the heap, setting the size of the heap
        :param size: Heap size
        """
        # Here's a trick. The array subscript does not start from 0, but from 1. This makes it easy to obtain and calculate the subscript of parent-child nodes. All the array space here is size+1
        self.data = [None] * (size + 1)
        self.size = size
        self.used = 0
        self.myDict = myDict
        print("init heap:", self.data)

    def push(self, value: int) -> None:
        """
        Heap insertion
        :return:
        """
        print("insert value:", value, end=" ")
        if self.used == self.size:
            self.pop()

        self.used += 1
        print("insert:", self.used, value, self.data, end="==>")
        self.data[self.used] = value
        self._shitUp()
        print(self.data)

    def pop(self) -> None:
        """
        Delete operation of heap
        :return:
        """
        if self.used == 0:
            return

        self.data[1] = self.data[self.used]
        print("pop:", self.data, end="==>")
        self.used -= 1
        self._shitDown()
        print(self.data)

    def _shitUp(self):
        """
        Bottom up stacking
        Compare with the parent node, and exchange if it is greater than
        :return:
        """
        child = self.used
        while child // 2 > 0 and self.myDict[self.data[child]] < self.myDict[self.data[child // 2]]:
            self.data[child], self.data[child // 2] = self.data[child // 2], self.data[child]
            child = child // 2

    def _shitDown(self):
        """
        Top down stacking
        Here is the exchange with the child node of the maximum value, so one is used maxPos Where to save the maximum value
        If it is the location of the current parent node, the heap ends
        If not, exchange parent and child nodes and continue the cycle
        :return:
        """
        parent = 1
        while True:
            minPos = parent
            if parent * 2 <= self.used and self.myDict[self.data[parent * 2]] < self.myDict[self.data[parent]]:
                minPos = parent * 2
            if parent * 2 + 1 <= self.used and self.myDict[self.data[parent * 2 + 1]] < self.myDict[self.data[minPos]]:
                minPos = parent * 2 + 1
            if minPos == parent:
                break
            self.data[minPos], self.data[parent] = self.data[parent], self.data[minPos]
            parent = minPos

    def getMinest(self) -> int:
        """
        Returns the value of the heap top element
        :return:
        """
        return self.data[1]

    def toList(self):
        return self.data[1:]


if __name__ == "__main__":
    s = Solution()
    print(s.topKFrequent(nums=[1, 1, 1, 2, 2, 3], k=2))
    # [-3,-4,0,1,4,9]
    print(s.topKFrequent(nums=[6, 0, 1, 4, 9, 7, -3, 1, -4, -8, 4, -7, -3, 3, 2, -3, 9, 5, -4, 0], k=6))

reference material

Tags: Python Algorithm python3

Posted by activeradio on Wed, 25 May 2022 13:10:30 +0300