转载:http://blog.csdn.net/pngynghay/article/details/22052737
堆的概念
堆实际上是一棵完全二叉树,其任何一非叶节点满足性质: Key[i]<=key[2i+1]&&Key[i]<=key[2i+2]或者Key[i]>=Key[2i+1]&&key>=key[2i+2] 即任何一非叶节点的关键字不大于或者不小于其左右孩子节点的关键字。 堆分为大顶堆和小顶堆,满足Key[i]>=Key[2i+1]&&key>=key[2i+2]称为大顶堆,满足 Key[i]<=key[2i+1]&&Key[i]<=key[2i+2]称为小顶堆。由上述性质可知大顶堆的堆顶的关键字肯定是所有关键字中最大的,小顶堆的堆顶的关键字是所有关键字中最小的。
其中,大根堆和小根堆在海量数据的top N问题中,有着很好的时间复杂度。
首先,先给出一个交换两个变量数值的函数。
[cpp]
view plain
copy
void Swap(uint32_t* array, uint32_t i, uint32_t j) { assert(array); uint32_t tmp = 0; tmp = array[j]; array[j] = array[i]; array[i] = tmp; }
头文件包含
[cpp]
view plain
copy
#include <stdlib.h> #include <stdint.h> #include <assert.h> #include <string.h> #include <stdio.h>
大根堆实现
[cpp]
view plain
copy
void MaxHeapify(uint32_t* array, uint32_t heapSize, uint32_t currentNode) { uint32_t leftChild = 0, rightChild = 0, largest = 0; leftChild = 2*currentNode + 1; rightChild = 2*currentNode + 2; if(leftChild < heapSize && array[leftChild] > array[currentNode]) largest = leftChild; else largest = currentNode; if(rightChild < heapSize && array[rightChild] > array[largest]) largest = rightChild; if(largest != currentNode) { Swap(array, largest, currentNode); MaxHeapify(array, heapSize, largest); } } void MaxHeapCreat(uint32_t* array, uint32_t heapSize) { int i = 0; for(i = heapSize/2; i >= 0; i--) { MaxHeapify(array, heapSize, i); } }
小根堆实现
[cpp]
view plain
copy
void MinHeapify(uint32_t* array, uint32_t heapSize, uint32_t currentNode) { uint32_t leftChild = 0, rightChild = 0, minimum = 0; leftChild = 2*currentNode + 1; rightChild = 2*currentNode + 2; if(leftChild < heapSize && array[leftChild] < array[currentNode]) minimum = leftChild; else minimum = currentNode; if(rightChild < heapSize && array[rightChild] < array[minimum]) minimum = rightChild; if(minimum != currentNode) { Swap(array, minimum, currentNode); MinHeapify(array, heapSize, minimum); } } void MinHeapCreat(uint32_t* array, uint32_t heapSize) { int i = 0; for(i = heapSize/2; i >= 0; i--) { MinHeapify(array, heapSize, i); } }
top N问题
利用小根堆解决获取大量数据中最大的N个值,先构建一个拥有N个元素的小根堆。然后,将其余的元素插入到小根堆即可。插入方法如下:
[cpp]
view plain
copy
void MinInsert(uint32_t* array, uint32_t heapSize, uint32_t elem) { if(elem > array[0]) { array[0] = elem; MinHeapify(array, heapSize, 0); } }
利用大根堆解决获取大量数据中最小的N个值,先构建一个拥有N个元素的大根堆。然后,将其余的元素插入到大根堆即可。插入方法如下:
[cpp]
view plain
copy
void MaxInsert(uint32_t* array, uint32_t heapSize, uint32_t elem) { if(elem < array[0]) { array[0] = elem; MaxHeapify(array, heapSize, 0); } }
时间复杂度分析
堆调整一次的时间复杂度是O(logN)。所以,通过堆来解决top N 问题的时间复杂度是O(nlogN).
其中,n为数据的个数,N为堆维护的数据的个数。
测试程序
[cpp]
view plain
copy
int main() { int i = 0, heapSize = 10; uint32_t array[] = {2,20,13,18,15,8,3,5,4,25}; uint32_t minelem = 10, maxelem = 1; MinHeapCreat(array, heapSize); printf("Output the MinHeap:\n"); for(i = 0; i < heapSize; i++) { printf("%d\t", array[i]); } MinInsert(array, heapSize, minelem); printf("\nOutput insert elem %d:\n",minelem); for(i = 0; i < heapSize; i++) { printf("%d\t", array[i]); } printf("\n"); MaxHeapCreat(array, heapSize); printf("Output the MaxHeap:\n"); for(i = 0; i < heapSize; i++) { printf("%d\t", array[i]); } MaxInsert(array, heapSize,maxelem); printf("\nOutput insert elem %d:\n",maxelem); for(i = 0; i < heapSize; i++) { printf("%d\t", array[i]); } printf("\n"); }
测试结果