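While reading "The 32 Most Important Algorithms in Computer Science", I came across beam search, an optimization of the best-first search algorithm. It uses a heuristic function to evaluate how promising each node it examines is. However, beam search only keeps the m best-matching nodes at each depth, where m is a fixed number: the beam width.
That introduction is too general to really understand, so I searched Baidu and Google and wrote this note for future reference. First, the Wikipedia entry: Beam Search
Translated, it says: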
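Beam Search is a heuristic graph search algorithm, typically used when the solution space of the graph is large. To reduce the space and time the search consumes, at each step of depth expansion it prunes the lower-quality nodes and keeps the higher-quality ones. This reduces space consumption and improves time efficiency.
The algorithm works as follows:
It builds the search tree using a breadth-first strategy. At each level of the tree, the nodes are sorted by heuristic cost, and only a predetermined number of them (the beam width) are kept. Only these nodes are expanded at the next level; the others are pruned.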
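The per-level pruning step described above can be sketched in Python as follows. This is only a minimal illustration, not code from the Wikipedia entry: the function name `prune_level`, the node names, and the heuristic costs are all made up, and a lower heuristic cost is assumed to mean a better node.

```python
import heapq

def prune_level(candidates, h, beam_width):
    """Keep only the beam_width candidates with the lowest heuristic cost."""
    # heapq.nsmallest returns its result ordered from best to worst.
    return heapq.nsmallest(beam_width, candidates, key=h)

# Hypothetical heuristic costs for the nodes on one level of a search tree.
costs = {"a": 4, "b": 1, "c": 3, "d": 2}
kept = prune_level(costs.keys(), costs.get, beam_width=2)
print(kept)  # the two cheapest nodes: ['b', 'd']
```

Only the nodes in `kept` would be expanded at the next level; `a` and `c` are discarded for good, which is where both the savings and the risks of beam search come from.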
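1. Insert the initial node into the list (used as a heap).
2. Pop a node from the heap. If it is the goal node, the algorithm ends; otherwise, expand the node and push up to beam-width of its successors onto the heap.
3. Go back to step 2 and repeat.
The algorithm ends when a solution (the goal node) is found or the heap is empty.
In order to understand this algorithm, you must be familiar with the concept of a graph as a group of nodes/vertices and edges connecting these nodes. It is also helpful to understand how a search tree can be used to show the progress of a graph search. Additionally, knowledge of the Breadth-First Search Algorithm is required because Beam Search is a modification of this algorithm.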
Even though the Breadth-First Search Algorithm is guaranteed to find the shortest path from a start node to a goal node in an unweighted graph, it is infeasible to use this algorithm on large search spaces because its memory consumption is exponential. This causes the algorithm to run out of main memory before a solution can be found to most large, nontrivial problems. For this reason, Beam Search was developed in an attempt to achieve the optimal solution found by the Breadth-First Search Algorithm without consuming too much memory.
In order to accomplish this goal, Beam Search utilizes a heuristic function, h, to estimate the cost to reach the goal from a given node. It also uses a beam width, B, which specifies the number of nodes that are stored at each level of the Breadth-First Search. Thus, while the Breadth-First Search stores all the frontier nodes (the nodes connected to the closing vertices) in memory, the Beam Search Algorithm only stores the B nodes with the best heuristic values at each level of the search. The idea is that the heuristic function will allow the algorithm to select nodes that will lead it to the goal node, and the beam width will cause the algorithm to store only these important nodes in memory and avoid running out of memory before finding the goal state.
Instead of the open list used by the Breadth-First Search Algorithm, the Beam Search Algorithm uses the BEAM to store the nodes that are to be expanded in the next loop of the algorithm. A hash table is used to store nodes that have been visited, similar to the closed list used in the Breadth-First Search. Beam Search initially adds the starting node to the BEAM and the hash table. Then, each time through the main loop of the algorithm, Beam Search adds all of the nodes connected to the nodes in the BEAM to its SET of successor nodes and then adds the B nodes with the best heuristic values from the SET to the BEAM and the hash table. Note that a node that is already in the hash table is not added to the BEAM because a shorter path to that node has already been found. This process continues until the goal node is found, the hash table becomes full (indicating that the memory available has been exhausted), or the BEAM is empty after the main loop has completed (indicating a dead end in the search).
The Beam Search Algorithm is shown by the pseudocode below. This pseudocode assumes that the Beam Search is used on an unweighted graph so the variable g is used to keep track of the depth of the search, which is the cost of reaching a node at that level.
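The loop described in the preceding paragraphs can be sketched in Python roughly as follows. This is a reconstruction from the prose, not the tutorial's own pseudocode: the function signature, the dictionary-based graph, the `memory_size` parameter standing in for the hash table capacity, and the example graph with its h values are all assumptions made for illustration.

```python
import math

def beam_search(graph, h, start, goal, beam_width, memory_size):
    """Return the depth g at which goal is reached, or math.inf on failure."""
    if start == goal:
        return 0
    g = 0
    hash_table = {start}  # visited nodes; holds at most memory_size entries
    beam = [start]

    while beam:
        # Generate the SET: every successor of every node in the BEAM.
        successors = set()
        for state in beam:
            for nxt in graph[state]:
                if nxt == goal:
                    return g + 1
                successors.add(nxt)

        g += 1
        beam = []
        # Refill the BEAM, best h first, ties broken alphabetically.
        for state in sorted(successors, key=lambda n: (h[n], n)):
            if len(beam) >= beam_width:
                break
            if state not in hash_table:
                if len(hash_table) >= memory_size:
                    return math.inf  # memory exhausted
                hash_table.add(state)
                beam.append(state)

    return math.inf  # BEAM is empty: dead end

# Hypothetical directed graph: S-A-G is the shortest path, but h prefers B.
graph = {"S": ["A", "B"], "A": ["G"], "B": ["C"], "C": ["G"], "G": []}
h = {"S": 3, "A": 2, "B": 1, "C": 1, "G": 0}

print(beam_search(graph, h, "S", "G", beam_width=1, memory_size=7))  # 3
print(beam_search(graph, h, "S", "G", beam_width=2, memory_size=7))  # 2
print(beam_search(graph, h, "S", "G", beam_width=2, memory_size=2))  # inf
```

On this made-up graph, a beam width of 1 follows the misleading heuristic and reaches the goal at depth 3, a beam width of 2 finds the depth-2 path, and a memory size of 2 makes the search give up as soon as the hash table is full.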
The following traces of the Beam Search Algorithm use two rows to represent each main loop of the algorithm's execution. The first row of each numbered loop displays the nodes added to the SET. These nodes are ordered by their heuristic values, with alphabetical ordering used to sort nodes with identical h values. Since the SET is a mathematical set, if a node is inserted into the SET more than once from multiple parents, it only appears in the SET once. The second row of each numbered loop lists the nodes from the SET that are added to the BEAM in the second part of the main loop. Both rows also display the hash table to show its current state. Notice that the hash table has only seven slots, indicating that the memory size for this example trace is seven. A simple linear hashing scheme with key values determined by the node names' ASCII values mod 7 is used for simplicity. In all three of these lists, nodes are listed in the format node_name(predecessor_name). The algorithm is traced four times with different values of B to demonstrate the strengths and weaknesses of the algorithm. Each trace includes a search tree that shows the BEAM at each level of the search. In the graph, the numbers under the node names are the h values for the nodes. These traces show how Beam Search attempts to find the shortest path from node I to node B in the graph shown in Figure 1. (Figure 1 is included above each trace for convenience.)
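The seven-slot hash table used in the traces can be sketched as follows. This is a guess at the scheme rather than the tutorial's code: it assumes that "linear hashing" means linear probing on collisions, that node names are single characters, and the helper name `insert` is made up.

```python
def insert(table, name):
    """Place name in a fixed-size table with linear probing.

    Returns the slot used, or None if the table is full.
    """
    size = len(table)
    home = ord(name) % size  # key: ASCII value of the node name mod table size
    for offset in range(size):
        slot = (home + offset) % size
        if table[slot] is None or table[slot] == name:
            table[slot] = name
            return slot
    return None  # every slot is taken: memory exhausted

table = [None] * 7
slots = {name: insert(table, name) for name in "IBJ"}
print(slots)  # {'I': 3, 'B': 4, 'J': 5}
```

Here `I` (ASCII 73) and `B` (ASCII 66) both hash to slot 3, so `B` is pushed to slot 4, and `J` (ASCII 74) then finds its home slot 4 taken and lands in slot 5.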
[Figure 1]
At this point, the BEAM is empty, and the Beam Search Algorithm has reached a dead-end in its search. Since the node G in the SET was already in the hash table, it could not be added to the BEAM, which left the BEAM empty. This trace illustrates the greatest weakness of the Beam Search Algorithm: An inaccurate heuristic function can lead the algorithm into a situation in which it cannot find a goal, even if a path to the goal exists. While increasing the value of B may allow Beam Search to find the goal, increasing B by too much may cause the algorithm to run out of memory before it finds the goal. For this reason, the choice of B has a large impact on Beam Search's performance. Figure 2 shows the BEAM nodes at each level in this dead-end search.
[Figure 1]
In this trace, the Beam Search Algorithm successfully found the goal via the path IJACB. Even though a solution was found, this solution is not optimal because IECB is a shorter path to the goal node. Once again, an inaccurate heuristic function reduced the effectiveness of the Beam Search Algorithm. Figure 3 shows the BEAM nodes at each level of the search. Notice that only one node appears in the BEAM at level three in the tree. This demonstrates that Beam Search may not always be able to fill the BEAM at each level in the search. In the last level of the tree, node A was first added to the SET, and then node B (the goal node) was found and caused the search to complete.
[Figure 1]
With B = 3, the Beam Search Algorithm found the optimal path to the goal. However, the larger beam width caused the algorithm to fill the entire memory available for the hash table. Figure 4 shows the BEAM nodes at each level in the search. In the last level of the tree, nodes A, C, and J were added to the SET, and then the goal node B was found, which caused the search to complete.
[Figure 1]
Using B = 4, the Beam Search Algorithm quickly ran out of memory. This shows the second major weakness of the Beam Search Algorithm: When B becomes large, the algorithm consumes memory very quickly like the Breadth-First Search Algorithm. Figure 5 shows the BEAM at each level in the search. The last level in the tree shows the progress of the search when the algorithm ran out of memory.
It is generally effective to analyze graph-search algorithms by considering four traits:
Completeness: A search algorithm is complete if it will find a solution (goal node) when a solution exists.
Optimality: A search algorithm is optimal if it finds the optimal solution. In the case of the Beam Search Algorithm, this means that the algorithm must find the shortest path from the start node to the goal node.
Time complexity: This is an order-of-magnitude estimate of the speed of the algorithm. The time complexity is determined by analyzing the number of nodes that are generated during the algorithm's execution.
Space complexity: This is an order-of-magnitude estimate of the memory consumption of the algorithm. The space complexity is determined by the maximum number of nodes that must be stored at any one time during the algorithm's execution.
In general, the Beam Search Algorithm is not complete. This is illustrated in Trace 1 above. Even though the memory was not depleted, the algorithm failed to find the goal because it could not add any nodes to the BEAM. Thus, even given unlimited time and memory, it is possible for the Beam Search Algorithm to miss the goal node when there is a path from the start node to the goal node. A more accurate heuristic function and a larger beam width can improve Beam Search's chances of finding the goal. However, this lack of completeness is one of the foremost weaknesses of the Beam Search Algorithm.
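The lack of completeness can be reproduced on a tiny example. The sketch below is a simplified variant with unlimited memory, written only for this illustration; the function name, the four-node directed graph, and its h values are made up and are not the graph from Figure 1.

```python
import math

def beam_search_depth(graph, h, start, goal, beam_width):
    """Simplified beam search without a memory limit.

    Returns the depth at which goal is found, or math.inf on a dead end.
    """
    visited = {start}
    beam = [start]
    depth = 0
    while beam:
        candidates = set()
        for node in beam:
            for nxt in graph[node]:
                if nxt == goal:
                    return depth + 1
                candidates.add(nxt)
        depth += 1
        # Keep the best unvisited candidates, lowest h first.
        beam = []
        for node in sorted(candidates, key=lambda n: (h[n], n)):
            if len(beam) < beam_width and node not in visited:
                visited.add(node)
                beam.append(node)
    return math.inf

# Hypothetical graph: the only route to G is S-X-G, but h prefers A,
# and A leads straight back to the already visited S.
graph = {"S": ["A", "X"], "A": ["S"], "X": ["G"], "G": []}
h = {"S": 2, "A": 1, "X": 2, "G": 0}

print(beam_search_depth(graph, h, "S", "G", beam_width=1))  # inf
print(beam_search_depth(graph, h, "S", "G", beam_width=2))  # 2
```

With a beam width of 1 the search keeps only `A`, whose sole successor `S` is already visited, so the BEAM empties even though a path to `G` exists. Widening the beam to 2 keeps `X` as well and the goal is found.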
Just as the Beam Search Algorithm is not complete, it is also not guaranteed to be optimal. This is shown by Trace 2 above. In this example, Beam Search found the goal node but failed to find the optimal path to the goal, even though the heuristic in Figure 1 is admissible (never overestimates the cost to the goal from any node) and consistent (never overestimates the cost between neighboring nodes). This happened because the beam width and an inaccurate heuristic function caused the algorithm to miss expanding the shortest path. A more precise heuristic function and a larger beam width can make Beam Search more likely to find the optimal path to the goal.
The time for the Beam Search Algorithm to complete tends to depend on the accuracy of the heuristic function. An inaccurate heuristic function usually forces the algorithm to expand more nodes to find the goal and may even cause it to fail to find the goal. In the worst case, the heuristic function leads Beam Search all the way to the deepest level in the search tree. Thus, the worst case time is O(Bm), where B is the beam width, and m is the maximum depth of any path in the search tree. This time complexity is linear because the Beam Search Algorithm only expands B nodes at each level; it does not branch out more widely at each level like many search algorithms that have exponential time complexities. The speed with which this algorithm executes is one of its greatest strengths.
Beam Search's memory consumption is its most desirable trait. Because the algorithm only stores B nodes at each level in the search tree, the worst-case space complexity is O(Bm), where B is the beam width, and m is the maximum depth of any path in the search tree. This linear memory consumption allows Beam Search to probe very deeply into large search spaces and potentially find solutions that other algorithms cannot reach.
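To get a feel for the difference between linear and exponential growth, the following back-of-the-envelope calculation compares the two. The uniform branching factor of 10, the beam width of 3, and both helper names are assumptions chosen for the example, not figures from the tutorial.

```python
def bfs_frontier_size(branching, depth):
    """Nodes on the deepest level of a complete tree: branching ** depth."""
    return branching ** depth

def beam_stored_nodes(beam_width, depth):
    """Upper bound on visited nodes kept by beam search:
    the start node plus at most beam_width nodes per level."""
    return 1 + beam_width * depth

for depth in (2, 4, 6):
    print(depth, bfs_frontier_size(10, depth), beam_stored_nodes(3, depth))
```

At depth 6 the breadth-first frontier alone holds 1,000,000 nodes, while the beam search bound is 19 stored nodes.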
Algorithms can look different but still operate in almost the same ways. Compare the pseudocode above with the description in your textbook (if available). Then consider these questions:
Does your textbook use a hash table to store the nodes that have been expanded? If not, how does it store these nodes?
Does your textbook explain what type of structure should be used to implement the SET? If so, what structure does it use?
You can practice the Beam Search Algorithm using the algorithm visualization system JHAVÉ. If you have not used JHAVÉ before, please take the time to view the instructions on using JHAVÉ first. If your browser supports Java Webstart, you can launch a visualization of the Beam Search Algorithm directly from this link.
The Beam Search visualization has fifteen example graphs with which you can experiment. The first four examples, perfect1, perfect2, perfect3, and perfect4, have perfect heuristic functions that allow the algorithm to find the optimal path if it has enough memory. The next seven graphs, errant1, errant2, errant3, errant4, errant5, errant6, and errant7, have inaccurate heuristic functions that can lead the algorithm to find paths that are longer than optimal if the beam width is too small. The last four graphs, end1, end2, end3, and end4, result in dead-end searches when using smaller beam widths. For each of these examples, you can set the values for the beam width, B, and the memory size, M, to see how different values of these parameters affect the outcome of the algorithm on the graph. Finally, the level of detail option allows you to control how the animation steps through the pseudocode. The high detail option shows how each node is added to the SET one-by-one and is useful when you are less familiar with the algorithm. The low detail option generates all the nodes in the SET in one step so that you can more easily focus on the other aspects of the algorithm.
Step through the examples in the visualization and test how the different parameters modify the results found by the Beam Search Algorithm. Answer the questions that appear during the visualization to assess your understanding of the algorithm. When you can consistently answer the questions correctly, try the exercises below.
Creating input for an algorithm is an effective way to demonstrate your understanding of the algorithm's behavior.
Design a graph such that Beam Search fails to find the optimal path from the start node to the goal node with B = 1 but finds this shortest path with B = 2.
Construct a graph so that a Beam Search with B = 2 reaches a dead-end and fails to find the goal node. (Note: Here a dead-end search refers to the situation in which the BEAM is empty at the beginning of the main loop and does not mean the search runs out of memory.)
Consider modifying the Beam Search Algorithm as described above so that the lines previously written as
    if(hash_table is full) return ∞;
    hash_table = hash_table ∪ { state };
    BEAM = BEAM ∪ { state };
are reordered as
    hash_table = hash_table ∪ { state };
    BEAM = BEAM ∪ { state };
    if(hash_table is full) return ∞;
Does this have any effect on the results found by the algorithm? Can you devise an example in which the first version of the algorithm finds the goal, but the modified version returns before finding the goal?
Using your own source code, presentation software, or manually produced drawings, create your own visualization of the Beam Search Algorithm.
Develop a ten-minute presentation on the Beam Search Algorithm that uses the visualization you developed above to explain the strengths and weaknesses of the Beam Search Algorithm.
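In summary, Beam Search is an algorithm for handling large-scale search problems on resource-constrained systems. Overall, it resembles breadth-first search (BFS) plus pruning; the Knight's Tour problem on sicily is one example of Beam Search.
Reference: http://jhave.org/algorithms/graphs/beamsearch/beamsearch.shtml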