很重要 - [ICSE 2018 论文阅读] Towards Practical Program Repair with On-Demand Candidate Generation

xiaoxiao2021-04-15  57

前言

本文旨在阅读 ICSE 2018 论文——Towards Practical Program Repair with On-Demand Candidate Generation。

看标题

标题中有两个字眼很吸引人: 1)Towards practical program repair 向实际、实用的自动修复靠拢 2)On-demand candidate generation. 随需应变的补丁生成,听起来很酷

具体怎么实现的?用了什么技术?

感觉现在的自动修复论文一直在贴紧实用,跟紧工业化需求。 未来写文章也应该贴紧实际,拥有现实意义和价值。

作者

Jinru Hua, Mengshi Zhang, Kaiyuan Wang and Sarfraz Khurshid The University of Texas at Austin, USA {lisahua,mengshi.zhang,kaiyuanw,khurshid}@utexas.edu

Sarfraz Khurshid的主页:https://users.ece.utexas.edu/~khurshid/

这位老师也很厉害,他2003年完成博士答辩,博士论文我已经放在我的资源页,值得一看。 其博士论文主要讲了: 1)Test Generation 2)Software Testing 3)Modeling structures 4)Mutation 5)Translations 6)Nonisomorphic Generation 7)Structure Generation Using SAT 8)Ease of Specificatio 9)Static Analysis 10)Software Model Checking

感觉很厉害,论文长达126页,值得一读。

他的学生:

Doctoral  Hayes Converse (M.S. (Thesis) 2016, UT ECE)   Nima Dini (M.S. (Thesis) 2016, UT ECE)  Diego Funes  Jinru Hua  Rui Qiu (M.S. (Thesis) 2014, UT ECE)  Allison Sullivan (M.S. (Thesis) 2014, UT ECE)  Kaiyuan Wang (M.S. (Thesis) 2015, UT ECE)  Hua Zhong (M.S. (Report, Option III) 2015, UT ECE) Masters  Cagdas Yelen  Mengshi Zhang

他的研究方向:

Software Testing, Symbolic Execution, and Model Checking 2001-2012年Error Recovery and Automated Debugging 2005-2012Object Modeling and Static Analysis 2000-2012Parallel Development and Requirements 2007-2009Theorem Proving 2003-2007Semantics 1999

这个学校很强,见:[2]

德克萨斯大学奥斯汀分校(University of Texas at Austin),简称UT-Austin),成立于1883年,是一所著名的世界顶尖公立研究型大学,是北美顶尖大学联盟美国大学协会(AAU)的成员,美国最负盛名的“公立常春藤(Public Ivy)”最初八所院校之一,是德克萨斯大学系统中的旗舰校区,也是德克萨斯州境内最顶尖的高等学府之一,坐落在德克萨斯州的首府奥斯汀市区,距离市内的州政府总部约一英里。学校现有学生约51,000人,教职员工数约3,000人(2014秋季) [1] ,为全美单一校园中学生人数中第五的大学。 德克萨斯大学奥斯汀分校(University of Texas at Austin) ,又名德克萨斯大学奥斯汀分校、德州大学奥斯汀分校,简称UT Austin,创建于1883年,坐落在美丽的德克萨斯州首府奥斯汀市,是一所世界著名大学。在美国有着“公立常春藤”的美誉。

对作者的思考

https://dblp.org/pers/hd/h/Hua:Jinru :

这算是厚积薄发吗?从2016年,论文开始涉及到sketch,到2017-2018,一直是在做sketch,最后发了ICSE 2018,而且我有理由相信,这不是结束,而是开始。第一篇顶会发出去了,第二篇便不会远。

一个ICSE 2018 technique paper,对应一个FSE 2018 demonstration paper,震惊

https://conf.researchr.org/profile/conf/jinruhua :

两篇文章,一长一短,都是在写sketchfix,不过FSE这篇短文没有算在dblp里面。 看来不是regular paper,就不算。

摘要

背景知识:可以看到作者抓住的点是——现在的自动修复效率不高,因为需要一个一个candidate patch去验证。

Effective program repair techniques, which modify faulty programs to fx them with respect to given test suites, can substantially reduce the cost of manual debugging. A common repair approach is to iteratively frst generate candidate programs with possible bug fxes and then validate them against the given tests until a candidate that passes all the tests is found. While this approach is conceptually simple, due to the potentially high number of candidates that need to frst be generated and then be compiled and tested, existing repair techniques that embody this approach have relatively low effectiveness, especially for faults at a fne granularity.

所以作者的idea就是:把缺陷程序分成一个个的sketch,然后按需修复??? each sketch once which may represent thousands of concrete candidates 这么神奇的吗,有待继续阅读。

To tackle this limitation, we introduce a novel repair technique, SketchFix, which generates candidate fxes on demand (as needed) during the test execution. Instead of iteratively re-compiling and re-executing each actual candidate program, SketchFix translates faulty programs to sketches, i.e., partial programs with “holes”, and compiles each sketch once which may represent thousands of concrete candidates. With the insight that the space of candidates can be reduced substantially by utilizing the runtime behaviors of the tests, SketchFix lazily initializes the candidates of the sketches while validating them against the test execution.

作者进行了实验,证明自己的sketchfix很强。 但是我觉得这个是有条件的实验,比如AST node-level granularity,还有23minutes,这个不好说的,我觉得。 所以是不是最好的工具? 但先别说最好,光是这个idea,确实是别人没想到的。 所以搞学术还是应该有自己的独立见解,思维,思路,想法。如果一直跟在后面不敢创新,是一件很可怕的事情。

We experimentally evaluate SketchFix on the Defects4J benchmark and the experimental results show that SketchFix works particularly well in repairing bugs with expression manipulation at the AST node-level granularity compared to other program repair techniques. Specifcally, SketchFix correctly fxes 19 out of 357 defects in 23 minutes on average using the default setting. In addition, SketchFix fnds the frst repair with 1.6% of re-compilations (#compiled sketches/#candidates) and 3.0% of re-executions out of all repair candidates.

substantially 非常;大大地 very much; a lot 同义词: considerably The costs have increased substantially. 成本大大提高了。 (formal) 基本上;大体上;总的来说 mainly; in most details, even if not completely What she says is substantially true. 她的话大体符合事实。

sketch N-COUNT 概略描述;简述 A sketch of a situation, person, or incident is a brief description of it without many details. N-COUNT 草图;略图;素描 A sketch is a drawing that is done quickly without a lot of details. Artists often use sketches as a preparation for a more detailed painting or drawing.

introduction

作者在第二段就开始思辨现在的自动修复技术,并指出作者发现的问题:

To allow the exploration of large numbers of candidates, researchers have developed various techniques in previous work. For example, some techniques [7, 24, 36, 37, 40] infer constraints and synthesize repairs by translating the constraints to propositional satisfability (SAT) formulas. Such translation-based synthesis may involve incomplete translations or create impractical problems that require creating complex models for all involved libraries. Moreover, they generally exclusively reason about boolean or integer type [24, 37] and can hardly handle manipulation of non-primitivetype expressions in presence of libraries or complex constructs like AST node-level type casting. Some techniques mine historical data [25, 28, 30] or analyze documents [27, 58] to rank the repair candidates. These techniques have shown their effectiveness on some classes of defects like exception handling, yet they may not be effective at repairs that require fne-grained expression manipulations at the AST node-level

[24] Xuan-Bach D. Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser. 2017. S3: syntax- and semantic-guided repair synthesis via programming by examples. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, Paderborn, Germany, September 4-8, 2017. 593–604 [37] Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2016. Angelix: Scalable Multiline Program Patch Synthesis via Symbolic Analysis. In 38th International Conference on Software Engineering, ICSE 2016.

所以我有一个发现:当前的文章必然建立在过去的文章(通常是顶会文章,因为它们代表着最前沿的学术成果)的不足之上,新文章往往是:发现问题 -> 思考解决问题的方法 -> 实现方法 -> 写成文章 -> 发表。 然而我发现问题都做不到。实在有些伤感,难以抑制。

作者的主要思想: 看了之后,感觉主要是:1)space of candidate programs can be pruned substantially by utilizting runtime information; 2) and by generating candidates on demand during test validation. 这是在补丁生成上玩的花样,我暂时还没看懂。 需要更进一步的描述。

We present SketchFix, which is a novel technique for more effective generate-and-validate program repair using a perspective different from previous work. Our key insight is that the space of candidate programs can be pruned substantially by utilizing runtime information and by generating candidates on demand during test validation. To illustrate, consider trying to fx a faulty condition in a while-loop as well as the body of the loop; if a test execution raises an exception upon evaluating a specifc candidate while-loop condition, all candidates of the while-loop body are pruned from search for that choice of the candidate condition expression. In fact, our approach for lazy candidate generation will not create any candidates for the while-loop body (which may contain thousands of patches) if the while-loop body is not executed. When a test fails due to either a runtime exception or a test assertion failure, the parts of the candidate program that were directly executed determine the generation of the future candidates. Instead of the traditional approach of iteratively generating and validating each repair candidate, we tightly integrate the generation and validation of candidates by effectively utilizing runtime behaviors of the test executions to prune a large part of the search space, which must be explored otherwise.

introduction-究极震惊-补丁和源程序相似度的idea,竟然引出了这么多顶会的论文

就下面这一句话,我感觉2017-2018年,围绕这个,出了好多相似度的论文,全部关于补丁正确性的,那么下一个引导补丁正确性的应该走向何方呢?

Recent techniques present the insight that patches that are semantically closer to the original programs are more likely to be correct from the perspective of the developers [5, 24].

讲真,这个线索是很重要的。未来APR的发展何去何从?就得从这里开始分析,寻根摸底。 或者,根据这个的发展轨迹,来寻根摸底。 此外,读完这篇文章之后,我觉得自己不能再泛读了,得开始精度这一条线索上的所有论文,摸清楚相似度理论,并尽量复现。原来过去的论文会引导以后的论文不断发展,那就要看自己有没有好眼力了。

简单小结

为什么要小结了? 1)因为下面的我又看不懂了 2)我觉得很多基础的东西还没有学会,比如说程序的结构,AST分析,这些都有所欠缺,我得恶补这些知识,充分了解代码的本质,程序的本质,此外,还要多编程,多参与。

以后,暂停大范围阅读,开始更精确一点的阅读,更缓慢一点的阅读,有所复现,而不能再停留在阅读、浏览论文的层面。

参考文献

[1] Sarfraz Khurshid. https://users.ece.utexas.edu/~khurshid/

[2] 得克萨斯大学奥斯汀分校. https://baike.baidu.com/item/得克萨斯大学奥斯汀分校/5587833?fromtitle=The University of Texas at Austin&fromid=11377825&fr=aladdin

[3] Jinru Hua. https://dblp.org/pers/hd/h/Hua:Jinru

[4] Jinru Hua. https://conf.researchr.org/profile/conf/jinruhua

转载请注明原文地址: https://www.6miu.com/read-4817819.html

最新回复(0)