INFERRING LISP PROCRAMS FROM EXAMPLES D a v i d E Shaw W i l l i a m R. Swartout C. C o r d e l l Green A r t i f i c i a l I n t e l l i g e n c e Laboratory D e p a r t m e n t o f C o m p u t e r Science Stanford University Stanford, California ABSTRACT A program programs is from described which infers certain single example input-output recursive LISP pairs Synthesized E X A M P L E is able to infer a class of functions which perform certain list-to list transformations. In particular, each recursive p r o g r a m s may recur in more than one argument, and may involve f u n c t i o n in this class steps t h r o u g h the input list from left to right, t h e synthesis of a u x i l i a r y functions p r o d u c i n g p a r t of the o u t p u t at each step An actual user session with t h e p r o g r a m , called E X A M P L E , is presented, and the operation Consider, for example, the pair of t h e p r o g r a m and its i m p o r t a n t heuristics are outlined. T h e o u t p u t is produced in three steps ACKNOWLEDGEMENTS A recursive subfunction p r o d u c e s the sublists I, 2, and 3 in successive steps and the main T h e a u t h o r s w i s h to thank Richard Waldinger and Doug Lenat f u n c t i o n appends them together f o r t h e f r u i t s of several valuable discussions held early in the L e t us b r i e f l y outline the way this function is synthesized course E X A M P L E determines w h i c h part of the output is produced in of this work We are also grateful for the editorial assistance of A v r a C o h n , w h i c h made possible the preparation of t h e f i r s t step of recursion this draft p r o d u c e d in the the first step Research T h i s research was supported m part by the Advanced Projects Agency under contract DAHC 15-73-C-CM35 produced by the a In the above example, sublist I is It is assumed that this subhst is subfunction subfunction, EXAMPLE a n d in p a r t by the State of C a l i f o r n i a through a California State synthesize G r a d u a t e Fellowship s p e c i f i c a t i o n w h i c h describes this subgoal generating (B C D) are chosen as input SECTION I - INTRODUCTION may now A c o m m o n aspect of many definitions of automatic programming recursively is t h e g o a l of f a c i l i t a t i n g p r o g r a m specification find In this paper, we be synthesized First. a thus attempts to new input-output T h e arguments A and T h e subfunction specified by by calling the EXAMPLE program R e t u r n i n g to the synthesis of the main function, we t h r e e r e m a i n i n g steps 1) T h e recursive call of the mam To describe a f u n c t i o n is f o i m r d , 2) the resulting code is embedded in either a p a r t i c u l a r p r o g r a m by example, the user supplies only a sample C O N S or A P P E N D expression so as to properly conjoin the input o u t p u t f r o m each recursive step, and 3) terminating conditions are consider t h e specification of programs by examples. and output The computer then infers a plausible selected candidate program T h e i n d u c t i v e inference of programs from input-output examples has also been explored by Licklider [1973] and Hardy [ 1974] M o r e generally, this inference task is related to the problems of p r o g r a m inference f r o m traces [ B i e r m a n n . 1973] and grammatical i n f e r e n c e [ F e l d m a n , Crps. H o r n i n g and Reder, 1969, 1969; Horning, We w i l l say that an output has been realized when a function has been specification which satisfies Unfoitunately, not all the given syntheses input-output which realize t h e o u t p u t w i l l be f o u n d acceptable to the user simply To see w h y . we consider a t r i v i a l synthesis scheme which can realize any output B i e r m a n n and Feldman. 1972, Blum and Blum, 1973] synthesized by breaking the input arguments down into their c o n s t i t u e n t atoms and recombining these atoms mechanically to T h i s paper describes a p r o g r a m , called E X A M P L E , that infers r e c u r s i v e L I S P functions f r o m single input-output pairs. f o r m t h e desired o u t p u t Given U s i n g this scheme, the f u n c t i o n specified by m a y be synthesized t r i v i a l l y as 260 T h e user p r o b a b l y intended, t h o u g h , to specify a function which f i n d s all c o m b i n a t i o n s of two elements from an input list of any length. function T h e above synthesis is implausible since it performs this only EXAMPLE for lists formulates of four subgoals elements in a As manner we will which see, guards against i m p l d i t M b l e synthesis of this sort. A discussion of the types of synthesis tasks for which example specification is appropriate, of the problems associated with s p e c i f i c a t i o n by example, and of the relationship between this and other m e t h o d s of p r o g r a m specification appears in Green, et al.[197-11 S E C T I O N 2 - A N A C T U A L SESSION L e t us n o w e x a m i n e an actual session in which the E X A M P L E program synthesizes several user-specified LISP functions. M a t e r i a l typed by the user appears in lower case and is preceded by an asterisk (*). Responses by E X A M P L E are in upper case, w h i l e o u r comments appear in italics. T h e session begins when t h e user types "exampleO" to initiate the specification process. 261 B B B C C C ) is p r o d u c e d by subsequent recursive calls We will refer to the i n i t i a l l y p r o d u c e d o u t p u t segment as the head of the output. The r e m a i n i n g segment w i l l be called the recurrate A f t e r the d i v i d i n g point between head and recurrate is f o u n d , EXAMPLE attempts to synthesize the code that produces the head in t h e same way it attempted the original (user-specified) goal. T h i s subgoal is again specified w i t h an input-output pair, w i t h t h e head a p p e a r i n g as the output: ( W e i g n o r e for now the question of specifying the input part of t h e head realization s u b g o a l ) I n o r d e r t o d i s t i n g u i s h the head f r o m the recurrate, E X A M P L E d i v i d e s the o u t p u t i n t o equal-length groups of adjacent elements A number of other input-output pairs are included in the By way of i l l u s t r a t i o n , we consider a simple variant of C O M B 2 : a p p e n d i x , along w i t h the corresponding programs synthesized by EXAMPLE It should be noted that EXAMPLE can not synthesize f u n c t i o n s i n v o l v i n g counting operations or numerical c o m p a r i s o n s (a f u n c t i o n that sorts a list of integers by value, for example) return F u r t h e r , all termination checks are null tests which can only the value NIL. T h u s , for example, the function E X A M P L E d i v i d e s the o u t p u t into groups of two elements, as i n d i c a t e d above Successive groups are then compared using a template-matching procedure T h i s procedure searches for the first major group w h i c h is substantially d i f f e r e n t in some way from its predecessors, w h i c h r e t u r n s the last element of a list, c o n j e c t u r i n g a head-recurrate separation just before this change ( A B C D> -» D c a n n o t be synthesized, since an equality test and the ability to Comparing r e t u r n a n o n - N I L atom would be required. change A function which successive aftei the groups, third EXAMPLE group, and discovers postulates the a major following separation f i n d s t h e f i r s t ha!j of a list, w h i c h might be specified by also f a l l s outside the class of functions synthesized by example. We h a v e t r i e d only to convey a feeling for some of the programs still beyond the reach of EXAMPLE A more precise c h a r a c t e r i z a t i o n of the class of functions attacked by the current p r o g r a m is f o u n d in sections 4.2 and 4.3. In t h e case of some i n p u t - o u t p u t pairs, the serial comparison p r o c e d u r e must in fact proceed backward through the output list structure. S i m p l e heuristics are used to select a scanning direction for the output T h i s direction determination is used in several later stages of synthesis T h e procedures for grouping, matching, a n d detet m i n i n g scanning direction are discussed in section 4.3. S E C T I O N 3 - H O W IT WORKS: AN OVERVIEW The program first determines whether a r e a l i z a t i o n of the target output is possible simple E X A M P L E is now able to reduce the synthesis task to several nonrecursive T h e programming constructs a v a i l a b l e for nonrecursive synthesis w i l l be described in section 4 . 1 . s i m p l e r subgoals T h e head and recurrate must each be realized, a n d the resulting blocks of code combined in an appropriate way. a l o n g w i t h code for t e r m i n a t i n g conditions In order to specify t h e head-trealization subgoal in input-output form, a new set of input arguments must be formulated Arguments used in If t h e o u t p u t can not be realized using available nonrecursive s p e c i f y i n g the subtask must again be carefully chosen to avoid the constructs, a synthesis i n v o l v i n g recursive constructs is attempted. p o s s i b i l i t y of i m p l a u s i b l e synthesis. T h e r e c u r s i v e L I S P functions synthesized b y E X A M P L E produce w h i c h f o r m the i n p u t of the parent goal may be broken down in some p a r t of t h e o u t p u t d u r i n g the original top-level evaluation s p e c i f y i n g i n p u t for the subgoal. a n d t h e r e m a i n d e r d u r i n g subsequent recursive calls. realizing Considering t h e specification Still, some of the arguments For example, the initial goal of ( A B A C A D B C B D C D ) from (A B C D) s p a w n s t h e subgoal of realizing f o r e x a m p l e , we see that the i n i t i a l value segment Is p r o d u c e d d u r i n g the first recursive step, while the remainder of the output. T h e heuristics used to break d o w n parent input arguments are discussed in section 4 4 262 O n c e new arguments have been generated, E X A M P L E attacks t h e h e a d - r e a l i z a t i o n subtask exactly as it d i d the original problem. If h e a d r e a l i z a t i o n itself requires a recursive synthesis, of course, a s e p a r a t e a u x i l i a r y Junction must be synthesized call to the auxiliary function (with In this case, a appropriate arguments) a p p e a r s as t h e head realization code T h e p r o b l e m s o f recurrate realization are different synthesizes (CDR, only recursive calls whose CDDR, number of etc.) CDRs embedded is of the within postulated arguments are the tails original which the using EXAMPLE lambda-varubles. original certain clues The arguments are involving the p r o p a g a t i o n of argument elements to the recurrate. If the head conjoined output and using was recurrate either are successfully CONS interpreted in the or A PPEND forward realized, If they are the original direction, the head r e a l i z a t i o n appears as the first argument of the joining function, w h i l e in the case of backward scanning, the recurrate realization appears first CONDitional The resulting body of code is embedded in a statement, f o l l o w i n g a set of termination checks. E a c h t e r m i n a i i u n f o r m involves a null-check on some tail of a c u r r e n t a r g u m e n t , w i t h the value N I L returned if the result is positive H e r e , each " a r g r a i l " represents some composition of the function CDR o v e r some i n p u t argument C O N S or APPEND function " f u n c t i o n - n a m e A UX 1" with appropriate arguments. T h e f o r m of the rectnsive argument list will be discussed later F i n a l l y , we note that the order of the "head realizing-code" and t h e r e c u r s i v e call may be interchanged 4.3 S E G M E N T I N G I N T O HEAD AND RECURRATE and recurrate 4 1 T h e "head-realmng-code" i s either some n o n r e c u r s i v e l y synthesized expression or a call to an a u x i l i a r y R e c u r s i v e realization SECTION 4 "Join-function" may be either requires correct identification of the head of the o u t p u t We recall that a template-matching p r o c e d u r e is used to locate the first major change in successive H O W IT WORKS: T H E WHOLE STORY g r o u p s of elements. In the present version of E X A M P L E , each g r o u p i n i t i a l l y consists of a single element N O N R E C U R S 1 V E SYNTHESIS If such a g r o u p i n g In t h e c u r r e n t version of E X A M P L E , nonrecursive synthesis is does not allow head-recurrate separation in the template-matching a l l o w e d o n l y if the o u t p u t can be realized simply from the current stage, t h e size of the groups ts increased input arguments constructs w h i c h without decomposing those arguments No break d o w n the arguments (such as C A R or C D R , f o r e x a m p l e ) are considered at this stage We now examine the template-matching procedure in detail. E X A M P L E l o i m s a template by comparing the first two groups T h e effect of this a p p e a r i n g i n the o u t p u t . Consider C O M B 2 l i m i t a t i o n is to prevent the synthesis of an implausible function, w h i c h m i g h t be generated by breaking down each argument into its p r i m i t i v e components and c o m b i n i n g them mechanically to realize t h e o u t p u t In this case, compared the two-element to f o r m groups (A a template (A B) and (A C) are x). where x stands for the d i f f e r i n g elements w h i c h appear in the two instances. ( T h e first A t present, E X A M P L E allows nonrecursive synthesis only i f the h e a d - r e c u r r a t e segmentation postulated by E X A M P L E is in fact o u t p u t can be realized w i t h a composition of the functions C O N S an i n a c c u r a t e guess based on the use of single-element groups, a n d L I S T over the i n p u t arguments. correct functions which may be synthesized More precisely, the class of without recursion segmentation discussed the here is found upon subsequent s c a n n i n g w i t h two-element groups.) is the union of In g e n e r a l , all atoms appearing in corresponding positions are c o m p a r e d for e q u a l i t y appears in the If the two atoms are the same, that atom template Otherwise, r e p r e s e n t i n g the u n e q u a l atoms, appears a unique variable x, A description of the r e l a t i o n s h i p between the two d i f f e r i n g atoms is associated with x T h u s , t h e C O M B 2 template indicates that C, the second instance of x in the o u t p u t , is the immediate successor in the input list of B. t h e first instance Because of these restrictions on nonrecursive synthesis, most user- analyzed s p e c i f i e d f u n c t i o n s of interest are not synthesized at this stage. As we w j l l see, the importnt with A template for the function a g r o u p size of one. is comprised of a single v a r i a b l e a n d the associated i n f o r m a t i o n that the second instance f u n c t i o n of nonrecursive synthesis is the of t h i s v a r i a b l e is the double-successor of the first 263 T h e t e m p l a t e is then used to predict the t h i r d group of elements, In c e r t a i n cases, the o r i g i n a l input list is a reasonable choice of a s s u m i n g the same relationship between the second and t h i r d i n p u t for the subgoal g r o u p s as was observed between the first and second. output pair To predict For example, consider the following input- t h e t h i r d g r o u p a p p e a r i n g in C O M B 2 , for example, the successor of C is instantiated for the template variable x. T h e resulting H e r e the head is exactly the same as the original input argument i n s t a n t i a t e d template, (A D), in fact agrees with the t h i r d group ( A B C D) appearing results w h e n in the output. This template, though, c o r r e c t l y predict the f o u r t h group from the t h i r d does not A t r i v i a l nonrecursive realization o f the head thus EXAMPLE t h u s correctly d i v i d e s the head from the recurrate after the t h i r d is group r e a l i z a t i o n , E X A M P L E always tries this first method, in which For somp f u n c t i o n specifications, no major change is detected u s i n g these heunstics. In this case, the first group of specified as the subgoal For a first attempt at head t h e i n p u t for the subgoal is the same as the original input For elements is taken to be the head, and the remaining groups the some p r o b l e m s , t h o u g h , the original arguments must be broken recurrate. d o w n in some manner in order to realize the head T h e two i n i t i a l (one element) groups of F. (A B C D) ->((F A X F BXF CXF D)). In this case, E X A M P L E fails to accomplish the first subgoal, and creates a f o r e x a m p l e , yield the template (F x), which allows prediction of n e w subgoal whose i n p u t arguments are subparts of the original all groups arguments To illustrate, the first subgoal generated in t r y i n g to h e a d - r e c u r r a t e separation methods employed by E X A M P L E work realize head reasonably well, s a t i s f y i n g the i n p u t - o u t p u t relation universally effective Separation after (F A) is thus assumed. it must A be emphasized larger class that of W h i l e the tney functions are not might be synthesized using better heuristics for this critical decision It was noted in section This 3 that the output must sometimes be the output, (A B C D ) of COMB2 however, can it not to synthesize be realized a subfunction from the input E X A M P L E thus generates the new subgoal scanned b a c k w a r d ( f r o m right to left) in order to effect the proper head-recurrate separation EXAMPLE chooses a scanning w h i c h w i l l eventually result in the successful synthesis of a two- d i r e c t i o n by n o t i n g whether elements from the front of the input argument a u x i l i a r y function list p r o p a g a t e toward the front or the back of the output EX A M P L E If they generates this new subgoal by decomposing the t e n d to appear in the end of the output, E X A M P L E assumes that o r i g i n a l i n p u t , ( A B C D), according t o a simple heuristic, t h e h e a d w i l l be f o u n d at the end of the output list. we note that certain atoms f r o m the input list may appear in the In this case, a reverse scanning direction is used to distinguish the head and head recurrate. diitinguishers, In section 4 6, we w i l l see other effects of the decision to scan b a c k w a r d 44 SUBGOAL REALIZING THE HEAD has reference. chosen been chosen heuristically and A scanning noted for later A d j a c e n t groups of elements have been scanned in the direction procedure. and compared using a not in the appear as recurrate template-matching By locating the site of the first major change, that These arguments for atoms, the called head head lealization subgoal. H e r e A is a head distinguishes since it appears in the h e a d , (A B A C A D), but not m the recurrate, (B C B D C D), A is t h u s chosen as an argument L e t us r e v i e w the work E X A M P L E has done so far. direction but First, original argument after Second, the remainder of the removing the head distinguisher i n c l u d e d unless none of its atoms are found in the head remark in argument is We passing that if E X A M P L E broke down the original completely i n t o its constituent atoms, head realization w o u l d always succeed (nonrecursively) A complete decomposition p a r t of the o u t p u t generated d u r i n g the first step of recursion (the of dangerous, h e a d ) has been distinguished from the part produced d u r i n g all p o s s i b i l i t y of i m p l a u s i b l e synthesis successive recursive calls (the recurrate) f o r selective i n p u t decomposition If no major change was a p p a r e n t , a d e f a u l t separation point has been assumed this now attempt to synthesize code which will generate the head when evaluated with the arguments of the toplevel f u n c t i o n call general admitting the T h i s danger is the m o t i v a t i o n subf C O M B2, unction The let us original summarize goal of the synthesis synthesizing a of its function s p e c i f i e d by s p a w n s t h e head realization subgoal As mentioned before, the head realization subgoal is specified w i t h an input-output p a n , just as the user specified the o r i g i n a l task Selection in We implicitly assume that evaluations of this same code d u r i n g all subsequent recursive calls will produce the recurrate. is B e f o r e c o n t i n u i n g o u r discussion of the synthesis of the m a i n function must though, Finally, those atoms a p p e a r i n g only in the head have been distinguished. EXAMPLE sort, of T h e target output, of couise, is the head itself an appropriate input argument T e m p l a t e m a u h n i g w i t h single-element groups separates the head of this subgcul output, (A list for the head r e a l i z a t i o n subgoal is less t r i v i a l B), f r o m the recurrate, ( A C A D ) E X A M P L E t h r n decomposes the argument (B C D) into B and (C D ) , d i s c a r d i n g (C D), whose atoms f a i l to appear in the head. 264 p r a c t i c e , h o w e v e r , C O N S is used in the case of a single-element head found outermost by list forward scanning EXAMPLE structure of the head adjusts the to allow the use of the a p p r o p r i a t e j o i n i n g function 4 7 SYNTHESIZING TERMINATING CONDITIONS We saw in section 4 1 that all terminating conditions synthesized by E X A M P L E rest a tail of some argument, returning N I L if a N U L L t a i l i s encountered T h e number o f CDRs involved i n each t a i l depends on the number of C D R s used in the recursive c a l l on t h a t argument T h u s C O M B 2 , which is synthesized using C D R r e c u r s i o n , embodies the single null check EXAMPLE which must each number is thus determine the number of CDRs w i t h i n reursive assumed argument should be embedded This equal to the number of atoms from the b e g i n n i n g of that argument which fail to appear in the recurrate U n f o r t u n a t e l y , this method fails for many input-output pairs in w h i c h t h e atoms of the i n p u t do not all propagate to the output. C e r t a i n weak heuristics are used to allow synthesis of some such f u n c t i o n s , but the p r o b l e m is not entirely solved Once the must be acknowledged that this heuristic check its decision about head-recurrate segmentation by querying yields incorrect t e r m i n a t i n g conditions for some functions which are o:herwise w i t h i n t h e target class o f E X A M P L E T h e r e s u l t i n g block of code is embedded in a function d e f i n i t i o n c a l l w i t h the user specified name and list of lambda variables. The resulting function is then defined for e v a l u a t e d w i t h the user-specified input list recursive call has been synthesized, E X A M P L E can t h e user It system use and If this evaluation in fact yields t h e user-specified output, the function is presented to t h e user f o r v e r i f i c a t i o n and further user testing In the case of the above function F O O , for example, t h e user is asked if the proposed recursive call in fact realizes the SECTION b - CONCLUSION recurrate " D O E S F O O [ F , (B C D)) - «F B)<F C)(F D))>" (The user specifipd identifiers are substituted v a r i a b l e s used in the actual recursive call) for T h e E X A M P L E p r o g r a m was written i n I N T E R L I S P b y D a v i d the formal Shaw and was revised by W i l l i a m Swat tout. A number of A negative response is E X A M P L E sessions have been observed d u r i n g the past year, t a k e n as evidence of faulty segmentation of head and recurrate, b u t no f o r m a l study has yet been conducted of the programs users o f t e n l e a d i n g to a revised conjecture regarding scanning group actually size specified. specify or of the way in which such programs are It seems to us that such further study of actual p r o g r a m s p e c i f i c a t i o n w o u l d be valuable at this point 4.6 C O N J O I N I N G T H E H E A D AND RECURRATE N o w t h a t the head and recurrate have been realized, E X A M P L E T h e exact c o n j o i n s the t w o resulting pieces of code using either C O N S or p r o g r a m specification is not yet clear APPEND. the If the o u t p u t was analyzed using a forward scanning d i r e c t i o n , the head realizing code appears as the first argument of t h e j o i n i n g f u n r t i o n , since the head must have been found at the b e g i n n i n g of the o u t p u t If reverse scanning was used, the head must the output, and the head realization be at the end of a p p e a r s as the second argument Several factors are considered APPEND should be backward scanning, used tor APPEND capacity input-output for specification examples by will play in facilitating We believe, however, that examples may be a useful c o m p o n e n t of f u t u r e automatic p r o g r a m m i n g systems We c o n c l u d e w i t h several other LISP functions synthesized by the E X A M P L E program T h e shorthand notation < f u n c t i o n name> <input list> -» <output> w i l l represent the user specification of a function <function name> in deciding whether C O N S or conjunction In is always chosen the case of T h e joining f u n c t i o n w i l l also be A P P E N D whenever the head contains more t h a n one element role In accoidance with usual human programming which returns the value a r g u m e n t s on * i n p u t list>. <output> when evaluated with the 266 REFERENCES Biermann, A W, and Feldman, J A. "On the Synthesis of Finite-State Machines from Samples of Their Behavior," IEEE Transactions on Computers, Vol C-21, No 6, June 1972, pp b92597. (also " O n the Synthesis of Finite-State Acceptors," Memo A I M - 1 1 4 . Artificial Intelligence Laboratory, Computer Science Department, Stanford University, Stanford. California, A p r i l 1970). B l u m , L., and Blum, M, "Inductive Inference A Recursion Theoretic Approach", Information and Control, to appear (Also M e m o r a n d u m ERL M386, Electronics Research Laboratory, College of Engineering, University of California, Berkeley, C a l i f o r n i a , August 1973). Feldman, Jerome A., Cips, J, Horning, J. J., Reder, S., "Grammatical Complexity and Inference," Memo AIM-89, Technical Report No. CS 125, Artificial Intelligence Laboratory, Computer Science Department, Stanford University, Stanford, C a l i f o r n i a , June 1969 Green, C C, Waldmger, R J., Barstow, D. R, Elschlager, R., Lenat, D B, McCune. B P. Shaw. D E., and Steinberg, L. I, "Progress Report on Program-Understanding Systems," Memo AIM-240, Report STAN-CS-74-444, Artificial Intelligence Laboratory, Computer Science Department, Stanford University, Stanford, California, August 1974. H a r d y , Steven. "Automatic Induction of LISP Functions," AISB Summer Conference, University of Sussex, Brighton, England, July 1974, pp 50-62 H o r n i n g . James Jay. "A Study of Grammatical Inference." Ph.D. thesis. Memo AIM-98, Report STAN-CS-69-139. Artificial Intelligence Labotatory, Computer Science Department, Stanford University, Stanford, California, August 1969 L i c k l i d e i , J C R., in "Automatic Composition of Functions from Modules. Project M A C Progress Report X July 1972 - July 1973, Section I I I E l . Project M A C , Massachusetts Institute of Technology. Cambridge, Massachusetts, pp 151-156 267
© Copyright 2026 Paperzz