How chess players think: evidence for the role of search at Expert level and below

Patrick Turner
First degree: BSc. (Hons) Mathematics
Open University personal identifier: U6094525
Dissertation submitted for: MSc. in Psychological Research Methods
March 2005

Abstract

There are two competing views of the dominant mechanism underpinning chess thinking: pattern recognition or search-and-evaluation. Whilst the recent development of template theory has gone some way to unifying the two existing theories, there still remain a great many unanswered questions concerning the nature of the chess thinking process – in particular the relative contribution of recognition and search-and-evaluation to chess skill.

Although recognition-based theories of chess thinking do not deny that search is part of the thought process, they emphasise that recognition of the position provides for highly selective search. Thus an Expert need not search any faster, or deeper, to arrive at a good move – he narrows down his search by pattern recognition to focus his analysis on the good moves. Conversely, search-and-evaluation theories emphasise the ability to search deeper, wider, faster and more thoroughly, coupled with the ability to evaluate leaf nodes more accurately, as the basis for the selection of good moves. They do not claim that recognition is not involved in directing search – merely that it is not the dominant mechanism.

The aim of the research discussed here was to investigate support for both recognition and search theories of chess skill through experimentation involving chess players at two levels (Expert and Class A/B) completing a ‘choice of next move’ task for three chess positions. Two major conclusions are drawn from the results. Firstly, there is strong evidence for differences in search capabilities across skill levels in chess players, supporting the results of Gobet (1998a) and others.
Such evidence argues against the basis of de Groot’s main conclusion (1965) that recognition is the dominant mechanism underpinning chess skill. Proponents of template theory (e.g. Gobet & Simon, 1998a) argue that such continued results for search differences across skill levels do not undermine the recognition-based theory of chess skill itself. The second major conclusion to be drawn, however, suggests that there is less support for the role of recognition than in previous studies, such as Gobet’s (1998a). It may be that the results hold only between Class A/B players and Experts. This would provide evidence that the better players at club level are superior primarily because of their search capabilities and not recognition. A different model of chess skill may be required for players below the level of Master.

Table of contents
Introduction
Literature review
Methodology
Analysis
Project Review
Conclusions
Appendix I: de Groot positions
Appendix II: Protocol analysis
Bibliography

Introduction

The game of chess provides an ideal environment for the study of human decision-making in complex domains. As such, it has provided the basis for a number of studies into human cognition, including perception, memory and decision-making. Over the decades following the publication, in 1965, of Adriaan de Groot’s original research into chess thinking, there have emerged two schools of thought concerning how chess players think – the family of recognition-based theories typified by chunking theory, due to de Groot (1965), Chase & Simon (Gobet and Simon, 1998a, 1998b; Gobet, 2004) among others; and the search-and-evaluation theory of Holding (Holding, 1985; Gobet 2004).
Whilst the recent development of template theory has gone some way to unifying the two theories, there still remain a great many unanswered questions concerning the nature of the chess thinking process – in particular the relative contribution of recognition and search-and-evaluation (often simply referred to as ‘search’) to chess skill.

The structure of chess thinking

The two theories agree on the basic structure of the chess thought process. De Groot (1965) showed that this process can be represented as a sequence of mental operations on not only the perceived position that the player is confronted with but also imagined positions that might occur if certain sequences of moves are played – a development of Selz’s Framework of Productive Thinking (de Groot, 1965). Briefly, the chess thinking process comprises three main phases – a phase of orientation, noting possible threats, plans and candidate moves; a phase of elaboration, within which specific sequences of moves are considered (“I move here, then he moves here” etc.), each of which terminates in an evaluation of the desirability of an imagined position (a ‘leaf node’); and a final phase within which the best move so far considered may be checked before the player commits to it (de Groot 1965, pp100-116). It is within the middle phase that search activity is carried out.

Although recognition-based theories of chess thinking do not deny that search is part of the thought process, they emphasise that recognition of the position (and good moves or general plans to undertake in such a position) serves to make search activity highly selective. Thus an expert player need not search any faster, or deeper, to arrive at a good move – he narrows down his search by recognition to focus his analysis on the good moves.
Conversely, search-and-evaluation theories emphasise the ability to search deeper, wider, faster and more thoroughly, coupled with the ability to evaluate leaf nodes more accurately, as the basis for the selection of good moves. They do not claim that recognition is not involved in directing search – merely that it is not the dominant mechanism.

Newell & Simon (1972) formalised de Groot’s framework in the Problem Behaviour Graph (PBG) model. A PBG characterises the phase of elaboration in chess thinking, where search is undertaken. PBGs are characterised by sequences of moves, beginning with a candidate move (or base move) and alternating between moves for each side, with possible branching in each sequence. Each branch ends in a leaf node and each leaf node is evaluated, usually only as ‘good’ or ‘bad’ for the player on move. As such, PBGs allow for the extraction of search variables such as ‘number of nodes searched’ and ‘maximum depth of search’. It is more difficult to extract variables characterising recognition, although ‘number of base moves considered’ serves to characterise option generation before any search is conducted.

Aims

The aim of the research discussed here was to investigate support for both recognition and search theories of chess skill through experimentation involving chess players of different calibres completing a ‘choice of next move’ task for a small number of chess positions with varying character. The experimental aims were to establish significant differences in choice of next move and search behaviour across two groups of chess players of differing calibres, for three different chess positions. This was to be achieved through the application of de Groot’s experimental procedure and using the analysis methods of de Groot (1965) and Newell & Simon (1972). Data from the most recent study of this kind, that of Gobet (1998a), were also to be used for comparisons of results.
The specific research questions included:

Do club-level chess players of differing calibres differ in terms of quality of move selection?
Do club-level chess players of differing calibres differ in terms of capacity of search, mean and maximal search depth, and thoroughness of search?
To what degree do the levels of search activity in club-level players fit with existing models of chess thinking?

Novelty

The experimentation and analysis outlined above are not completely novel. They draw much of the experimental procedure, analysis methods and study variables from existing research in the field, such as de Groot (1965) and subsequent replications of that original set of experiments (Newell & Simon, 1972; Gobet, 1998a). The work is novel in two respects, however:

It comprises a repeated measures choice of next move task across three positions; each of the three studies named above focused only on one position;
It samples from club-level players only (Experts down to Class B) and therefore serves to test some of de Groot’s original conclusions, which were based on an extremely high calibre sample including Grandmasters.

Motivation for this dissertation

The choice of subject matter for this dissertation is motivated by twin interests in human decision-making in naturalistic settings and empirical research into human decision-making. An enduring methodological problem that human decision-making research faces is the design of experiments that both preserve ecological validity (i.e. a naturalistic decision-making setting and task) and enable the valid measurement of important variables. Chess is a rare case of a structured and bounded decision-making environment that still affords ecologically valid, yet well-defined, experimentation.
Structure of this dissertation

The remainder of this dissertation is structured as follows:

The Literature Review introduces the main arguments for both recognition- and search-based theories of chess skill;
The Methodology chapter outlines the experimental design, experimental procedure and analysis techniques undertaken;
The Analysis chapter sets out the results and analysis from the experiment;
The Project Review reflects upon the changes in focus for the research throughout its course, including modifications to the design, the success of the experiment, the focusing of the analysis and the validity of the methods;
The Conclusions chapter revisits the main findings of the analysis in the context of the original research questions and the wider debate concerning the nature of chess skill.

Literature review

The game of chess is ideally suited to a range of studies in cognitive psychology, particularly memory, expertise and decision-making. Success at chess is completely dependent upon skill and, whilst the configuration of the board and pieces, and the rules of the game can be understood relatively quickly, a typical chess position offers a non-trivial decision-making task, even for highly skilled players. This is because of the inherent complexity that the game offers and, although information about each position is known perfectly and the ultimate goal of the game is certain, this complexity renders chess a credible domain of interest for the study of human decision-making. There is also a substantial amount of psychological literature on chess, perhaps because of the relatively simple manner in which experiments can be conducted.
Cognitive psychology and chess enjoy a history of over a century of research; the key question that has engaged psychologists throughout has been, “What constitutes skill at chess?” Although it is generally agreed that chess skill is based upon both recognition (the ability to match patterns based on the possession of ‘good’ patterns) and look-ahead search (essentially the ability to compute sequences of moves), opinions are polarised and there are distinct camps that espouse the dominance of one mechanism over the other.

Most modern research on chess skill has its foundations in the studies of the Dutch international chess master Adriaan de Groot, whose original experiments, conducted between 1938 and 1944, served to develop both theories of expertise and decision-making, and corresponding experimental methods. The remainder of this chapter is divided into sections, each of which discusses a key development in one or both of the competing recognition-based and search-based theories of chess skill.

The role of recognition: de Groot

De Groot (1965) was concerned with the thought processes underlying expert chess players’ choice of next move decisions. His main experiment was a ‘choice of next move’ task, conducted with a relatively small sample of good chess players, ranging from grandmasters (including Alexander Alekhine and Max Euwe) to Class C players (approximately average club level). De Groot used a set of chess positions, typically middlegame positions taken from games which he had played. De Groot set these positions up on a chessboard and asked his subjects, assuming the role of the player on move, to think of a move and play it on the board as if they were involved in an actual tournament game. The only extra stipulation was that the subject ‘thought aloud’ as he or she did so, so that de Groot could record the way in which the subject arrived at his or her next move. (This method is discussed in further detail in the next section.)
De Groot recorded each subject’s thought as a verbal protocol which he then coded, using Otto Selz’s framework of productive thinking (de Groot, 1965). De Groot was motivated by Selz’s framework, which described thinking as a ‘hierarchically organised linear series of operations’ (de Groot, 1965, p vi) and, in fact, sought to test it through the coding of the protocols. De Groot demonstrated that he could successfully represent the protocols within this framework, which, at the macro-structural level, comprises three phases: a first phase of orientation that may include a listing of candidate moves for consideration; a phase of elaboration whereby candidate moves are examined in detail through the consideration of possible sequences of moves that they precipitate; and a final phase in which a move is selected, possibly following some form of summarisation.

De Groot’s coding, which was later formalised by Newell and Simon (1972) as a Problem Behaviour Graph (PBG), captured the history of all sequences of moves, each beginning with a base move (candidate next move) considered by the subject. Such sequences included branching, whereby the subject considered two or more possible sequences from some branching move coming after the base move. Each sequence terminated in an evaluation (positive, negative or unexpressed). Since this coding captured all the moves considered it allowed for the reinvestigation of base moves.

De Groot did not expose every player to every position; positions A, B and C were most commonly used and de Groot chose only to extract quantitative variables from the encoded protocols for these positions (seen by 19, 6 and 6 players, respectively). These variables included the chosen move, the time taken for each phase, the ordered sequence of base moves considered (candidate next moves), the total number of moves, and variables concerning the frequency of both immediate and non-immediate reinvestigations.
De Groot had also analysed positions A to C extensively to generate an order of ‘move quality’ for each of the legal moves in each position. De Groot’s first result was that stronger players chose better quality moves than weaker players. Secondly, there was little difference between masters and Experts (‘Experts’ is capitalised when referring to the class of players directly below masters and not capitalised when referring, in general, to people possessing expertise) on the various ‘search variables’, including the total number of moves considered (typically less than 100), depth of search or rate of search (number of moves per minute). De Groot then asserted “the master does not necessarily calculate deeper, but the variations that he does calculate are much more to the point; he sizes up positions more easily and, especially, more accurately” (1965, p320). Although de Groot stated that he still expected greater search abilities in high calibre players, he conceded that such differences did not explain the observed performance differences.

Having failed to establish skill differences on these search variables, de Groot therefore conducted a second experiment based on a ‘recall’ task, originally conducted – in flawed form – by Djakow, Rudik and Petrowski in 1927 (Gobet 2004). Players were exposed to 16 positions, taken from relatively obscure master games, each for a short length of time (between 2 and 15 seconds). After each presentation the player was requested to reproduce the position verbally and de Groot developed a scoring scheme for assessing the corresponding verbal protocols. The results showed, significantly, that grandmasters outperformed weaker players. De Groot inferred that experience (in its effect upon perceptual processes) was the contributory factor, asserting that “the position is perceived in large complexes, each of which hangs together as a genetic, functional and/or dynamic unit.
For the master such complexes are of a typical nature.” (1965, p329, italics from original text). De Groot also suggested that “eye movements undoubtedly come into play” – a hypothesis confirmed, in 1996, by de Groot and Gobet (Gobet, 2004). De Groot conducted a detailed analysis of the verbal protocols for the recall task and identified content-specific themes that demanded differing degrees of attention. It is interesting to compare this approach with the quantitative (information-theoretic) approach of Chase and Simon in the development of chunking theory (see below).

Returning to the results of the ‘choice of next move’ experiment, one of de Groot’s innovations was an extension of the Selzian framework of productive thinking. De Groot noticed that players employed a method that he denoted ‘progressive deepening’ – the reinvestigation of sequences emanating from the same base move several times, either immediately or non-immediately, with the tendency to search both progressively wider (examining more branches) and deeper each time before evaluating at leaf nodes. This is referred to as ‘rough cut, fine cut’ by Newell and Simon (1972, p752). Selz’s concept of ‘subsidiary methods’ stated that human problem solving is based on, essentially, exhaustive depth-first search in support of one plan followed by depth-first search for a second plan if the first fails etc. (where ‘plan’ defines the context of evaluation of leaf nodes). De Groot effectively redefined ‘exhaustiveness’ in relative terms (1965, p270). This allowed for the reinvestigation of any base move, with the examination of ever deeper and wider extensions to the search tree emanating from each move. De Groot proposed that the varying criteria by which a sequence is considered to be ‘exhausted’ upon investigation/reinvestigation – and thus the criteria by which the corresponding base move is evaluated as good or not – are based on recognition.
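The pattern of progressive deepening described above can be illustrated with a minimal sketch. It is not de Groot’s model: the toy tree, the function names and the rule that both the depth and the width limit grow by one on each pass are all assumptions made purely for illustration. Each base move is reinvestigated non-immediately, and each reinvestigation looks both deeper and wider than the last.

```python
from typing import Dict, List, Tuple

# Hypothetical toy tree: each position maps to the positions reachable in one
# move. The names are invented and do not come from any real chess position.
TOY_TREE: Dict[str, List[str]] = {
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": ["a1x"],
    "a2": [],
    "b1": ["b1x", "b1y"],
    "a1x": [],
    "b1x": [],
    "b1y": [],
}


def explore(tree: Dict[str, List[str]], node: str, depth: int, width: int) -> List[List[str]]:
    """Return the move sequences from `node`, looking at most `depth` further
    moves ahead and at most `width` replies at each point."""
    children = tree.get(node, [])[:width]
    if depth == 0 or not children:
        return [[node]]  # leaf node: the sequence ends here
    sequences = []
    for child in children:
        for tail in explore(tree, child, depth - 1, width):
            sequences.append([node] + tail)
    return sequences


def progressive_deepening(
    tree: Dict[str, List[str]], base_moves: List[str], passes: int
) -> List[Tuple[str, int, List[List[str]]]]:
    """Reinvestigate every base move once per pass, deeper and wider each time.

    Assumption: pass p uses depth limit p and width limit p.
    """
    episodes = []
    for p in range(1, passes + 1):
        for base in base_moves:
            episodes.append((base, p, explore(tree, base, depth=p, width=p)))
    return episodes


episodes = progressive_deepening(TOY_TREE, ["a", "b"], passes=2)
for base, p, sequences in episodes:
    print(base, p, sequences)
```

In this sketch the first pass follows a single reply to each base move, while the second pass returns to the same base moves and follows more replies for more moves, which is the ‘rough cut, fine cut’ shape in miniature.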
De Groot’s main conclusions, across both of his experiments, were that recognition (based on the possession of perceptual chess-specific knowledge), together with the application of an effective set of heuristic goal-driven rules, were the major components of chess skill. The identification of recognition, in particular, as a key mechanism refuted the then commonly held view that chess skill was innate and had a large impact on theories of expertise that still persists.

Information processing and Problem Behaviour Graphs

The representation of human problem solving in the Selzian framework was attractive to Herbert Simon, who viewed such an activity, essentially, as information processing. Simon was also the originator of the concepts of bounded rationality, which states that there are limits on human information processing that, in turn, impose limits on human rationality, and satisficing, which describes the sufficient, yet sub-optimal, human approach to decision-making where bounded rationality is enforced, e.g. due to the complexity of the decision-making environment. Chess is certainly one such environment and there are clear parallels between satisficing and de Groot’s progressive deepening, the latter of which seeks a positive evaluation of a move even though a thorough analysis may be lacking.

In 1965, Newell and Simon (1972) reinvestigated and replicated de Groot’s ‘choice of next move’ experiment with the aim of investigating whether the human decision-maker, in selecting his next move in chess, could be considered an Information Processing System (IPS) and whether a thorough task analysis would enable them to enrich their IPS model. Newell and Simon advocated the elicitation of verbal protocols but emphasised their quantitative analysis rather than de Groot’s extensive qualitative analysis. As such they built on de Groot’s enhanced Selzian framework and formalised the coding of the verbal protocol as a Problem Behaviour Graph (PBG).
A PBG is a descriptive chronological model of an individual’s thinking throughout the course of a problem-solving task. It concerns the navigation of a human decision maker along sequences of linked nodes, each representing some projected state of the environment with links representing the application of an operator to a previous node. This forms a chronologically ordered set of sequences of linked nodes, possibly with branching (representing the conception of two different operators on a given node), ending at given leaf nodes.

A PBG for choosing the next move in a chess position represents, as nodes, future chess positions that may be arrived at through the application of a sequence of moves for white and black. Each initial move, or base move, represents one of the candidate moves that a player conceives, and chooses from, in completing the task. Each leaf node terminates in an evaluation (including a ‘non-evaluation’) of the position at that point. Note that a PBG is not equivalent to a search tree because the latter models all sequences of moves considered by the chess player in selecting his next move once only whereas a PBG provides a chronological view on that player’s considerations. As such, PBGs may contain a number of sequences beginning with the same base move, which may or may not be different (indeed, identical sequences may or may not include different evaluations).

Whilst most of the work underpinning PBGs is due to Selz and de Groot, Newell and Simon added the graphical formalism. To differentiate between different sequences, they redefined de Groot’s ‘sub-phases’ as episodes – distinct chains of reasoning beginning with a base move, whether it be different or the same as that considered beforehand. The advantage of the PBG formulation is that it provides for the quantitative analysis of the search-and-evaluation process.
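As a rough illustration of how such quantitative variables might be extracted, the sketch below stores a PBG as a chronological list of episodes and derives a few search variables from it. The `Episode` class, the `search_variables` function, the variable names and the example moves are all hypothetical; they are not the coding scheme of de Groot or of Newell and Simon. Two assumptions are made: every line is written out in full starting from its base move, and depth is counted in single moves (plies) including the base move.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Episode:
    """One chain of reasoning that starts from a base move."""
    base_move: str
    # Each line is the full move sequence from the base move to a leaf node.
    lines: List[List[str]]
    # One evaluation per line: "+", "-", or None for a non-evaluation.
    evaluations: List[Optional[str]]


def search_variables(pbg: List[Episode]) -> Dict[str, float]:
    """Derive simple search variables from a chronological list of episodes."""
    all_lines = [line for episode in pbg for line in episode.lines]
    # Collapsing the chronological record into a search tree: a position is
    # identified by the move sequence leading to it, so repeats count once.
    distinct_nodes: Set[Tuple[str, ...]] = {
        tuple(line[:i]) for line in all_lines for i in range(1, len(line) + 1)
    }
    base_moves = [episode.base_move for episode in pbg]
    return {
        "episodes": len(pbg),
        "base_moves_considered": len(set(base_moves)),
        "reinvestigations": len(base_moves) - len(set(base_moves)),
        "distinct_nodes": len(distinct_nodes),
        "max_depth": max(len(line) for line in all_lines),
        "mean_depth": sum(len(line) for line in all_lines) / len(all_lines),
    }


# Hypothetical protocol: the first base move is reinvestigated in episode 3.
example_pbg = [
    Episode("Bxd5", [["Bxd5", "exd5"]], ["-"]),
    Episode("Nxd5", [["Nxd5", "Nxd5", "Bxd5"]], ["+"]),
    Episode("Bxd5", [["Bxd5", "exd5", "Qf3"], ["Bxd5", "Bxd5", "Nxd5"]], ["+", None]),
]

variables = search_variables(example_pbg)
print(variables)
```

The example has three episodes but only two distinct base moves, which shows the difference between the chronological PBG and the search tree obtained by merging repeated sequences.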
Newell and Simon (1972) examined the quantitative variables derived from the protocol of a single subject (S2) and compared them with those of de Groot’s sample, noting the consensus in results in terms of both quality of move and decision-making method. In particular, S2 exhibited progressive deepening.

Perhaps the most important contribution of Newell and Simon’s 1965 research was their detailed analysis of the search strategies of S2 and de Groot’s subjects. They proposed a small number of principles for the generation of moves and episodes – essentially an attempt at naming the ‘heuristic rules’ that de Groot had suggested contributed to chess skill. Newell and Simon did not find much evidence, in the protocols, of means-ends analysis (goal-setting and the identification and analysis of means – i.e. moves – to achieve those goals) although they noted both that all protocols studied concerned position A – a highly tactical position in which strategic plans are of less consequence – and that de Groot had observed numerous examples of goal-setting in more strategic positions (1965, pp157-9).

Despite their characterisation of search strategies, Newell and Simon share de Groot’s view on the importance of recognition in chess skill, particularly upon immediate consideration of a position and prior to any search: “players notice a small number of considerable moves, and do not notice (or at least do not mention noticing) the large number of remaining legal moves” (Newell & Simon, 1972, p775), that is, there is a perceptual process guiding search from the outset. This embodies the ‘first phase’ in de Groot’s macrostructural model of next-move selection.

Chase and Simon’s Chunking theory

Chunking theory emerged from the 1973 experiments of Chase and Simon (Gobet, 2004) as a general theory of expertise, originally applied to chess. In line with de Groot’s conclusions, it asserts that recognition is the key mechanism underpinning expertise.
In the experiment, three classes of player (Masters, Experts and novices) were exposed to middle and end-game positions of two types: positions from actual games and random positions matched for the number of pieces present. There were two tasks: the ‘recall’ task was essentially a modification of de Groot’s procedure although all positions were shown for 5 seconds and the players were subsequently asked to reconstruct them on a chess board; the ‘copy’ task differed in that the positions were not hidden from the subjects during the reconstruction phase. For the positions drawn from actual games, success at reconstruction (according to the number of pieces correctly placed) was found to be proportional to skill level. For the random positions, however, there were no significant differences across the three groups of players. Chase and Simon concluded that the improved performance for more skilled players was not due to any superiority in short-term memory, but to the recognition of familiar patterns.

Chase and Simon (Gobet 2004) noted that, in both tasks, subjects reconstructed pieces in groups, as defined by the intervals between piece placement in the recall task, and by glances at the stimulus position in the copy task; further, pieces in the same group tended to share more meaningful relations (e.g. attacking, defending, same colour, same type etc. – judged by skilled players) than those in different groups. Chase and Simon denoted these patterns of pieces ‘chunks’. The experiment also provided evidence that better players possess bigger chunks (in terms of number of component pieces) and more chunks.

Chase and Simon (Gobet 2004) asserted that chunks are stored in short-term memory (STM) as pointers to patterns encoded in semantic long-term memory (LTM). Essentially, chunks are akin to the conditions of productions in LTM that associate patterns with moves. Chase and Simon also expressed time
Chase and Simon also expressed time 18 parameters for the rate of learning (approximately 8 seconds per chunk) and STM limits (7 chunks, in line with Miller’s predictions). In a second 1973 paper, Chase and Simon also proposed that a secondary transient memory store, a visuo-spatial store known as the mind’s eye, provides an internal representation of the position upon which mental operations may be carried out (e.g. the moves suggested by LTM). The position in the mind’s eye is also available to perceptual processes and thus chunks in a projected position following a potential move may also be perceived and matched against patterns in LTM. Thus chunking theory offers an explanation of how recognition may be combined with mental simulation to arrive at good moves. It should be noted, however, that the mind’s eye extension to the theory is not supported by empirical evidence since the experiment did not include a decision-making task. Chase and Simon conducted a second experiment to demonstrate the stability of chunks. The criterion for stability was: a chunk is considered to be repeated if at least two thirds of its component pieces are recalled together. Stability of chunks for class A players was 96%, versus 65% for the master player in the sample. Support for chunking theory comes from Charness (Gobet, 2004) who, in 1974, conducted a recall experiment with positions presented verbally, at a rapid rate (average latency 2.3 seconds per piece) in three ways: by Chase and Simon’s relations; by columns (on the board) or randomly. The best recall was found for Chase and Simon’s relations and the worst for the random condition. Criticisms of chunking theory Chunking theory was not without its critics, however. These criticisms are on a number of bases and include both methodological criticisms and theoretical 19 criticisms. 
Gobet and Simon (1998b) summarised the methodological criticisms raised by many authors, including Holding (1985), and highlighted some methodological concerns of their own, including the small sample size in the 1973 experiments and the one-to-one mapping of pieces placed in a single ‘burst of activity’ onto chunks. A single burst of activity was defined, in the 1973 recall experiments, as a sequence of piece placements with latencies of less than 2 seconds between pieces. Gobet and Simon (1998a) argued that this latency may actually increase over the recall period. Further, a burst of activity is also dependent upon the physical limitations of picking up all component pieces of a chunk in one hand.

The most outspoken critic of the theory was, perhaps, Holding (1985), who advocated the roles of both search and conceptual knowledge (rather than perceptual chunks) in chess skill. Holding’s specific arguments included the following:

Chunks may be encoded into LTM in less than 8 seconds;
The size of chunks is too small to reflect conceptual knowledge;
Although chess skill can explain memory performance, there is no evidence for a causal relationship in the opposite direction, that is, that memory (and recognition) explains chess skill.

The first criticism was based on recall experiments with interpolated tasks designed to cause STM interference (e.g. Charness’s experiment of 1976, reported in Holding, 1985), which had shown no effect on memory performance, suggesting that LTM encoding for chunks was rapid. The second criticism is based on Holding’s assertion that chunking theory “does not provide a sufficient basis for maintaining that chess memory is organised in small chunks whose labels are held in STM.
Instead it appears that chess players who actively process the given positions are able to integrate the general characteristics of these positions in a hierarchical, prototypical or schematic format, not necessarily based on pairs of pieces, that constitutes an ‘understanding’ of the positions” (Holding, 1985 p130). Key to this argument is Holding’s inspection of both positions and corresponding chunks from Chase and Simon’s experiments. He claimed that the actual chunks identified bear little relation to the important playing themes in that same position and concluded “if we assume that all the chunks for memorising purposes are to be identified on one basis and the patterns for move selection on another, the theory loses a good deal of its economy” (Holding, 1985 p103). Indeed, if we accept the criterion for the stability of chunks across experiments, it appears that better players perceive positions in a number of ways (65% stability is a fairly low figure).

The final criticism is backed up with evidence from Holding and Reynolds’s (1982) experiment with random positions. Players of different skill levels from novice to Expert completed two tasks: the first was a recall task and the second was a choice of next move task on the corrected positions. As expected, there was no effect of skill on memory, but there was a significant effect of skill on (assessed) quality of next move. Holding and Reynolds concluded that “the evidence shows that skill differences continue to appear in situations where recognition by chunking is impossible” (Holding, 1985 p133).

In light of such criticisms, Gobet and Simon replicated the 1973 experiments and made corresponding modifications to the theory (discussed in Gobet and Simon’s template theory, below).
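Two of the operational definitions discussed in this section, the 2-second latency criterion for a ‘burst of activity’ and the two-thirds criterion for the stability of chunks, can be made concrete with a minimal sketch. The function names, the data layout and the example placements are hypothetical, and the reading of ‘recalled together’ as ‘placed within the same recalled group’ is an assumption; this is not Chase and Simon’s own analysis procedure.

```python
from typing import List, Set, Tuple


def segment_into_bursts(
    placements: List[Tuple[str, float]], max_latency: float = 2.0
) -> List[List[str]]:
    """Group piece placements into bursts of activity.

    `placements` is a chronological list of (piece, time in seconds). A new
    burst starts whenever the gap since the previous placement is not less
    than `max_latency`.
    """
    bursts: List[List[str]] = []
    previous_time = None
    for piece, time in placements:
        if previous_time is None or time - previous_time >= max_latency:
            bursts.append([])
        bursts[-1].append(piece)
        previous_time = time
    return bursts


def chunk_is_repeated(chunk: Set[str], recalled_groups: List[Set[str]]) -> bool:
    """Two-thirds criterion: the chunk counts as repeated if some recalled
    group contains at least two thirds of its component pieces.

    Integer arithmetic avoids floating-point rounding at the 2/3 boundary.
    """
    return any(3 * len(chunk & group) >= 2 * len(chunk) for group in recalled_groups)


# Hypothetical recall trial: pieces are labelled by type and square.
placements = [
    ("Kg1", 0.0), ("Rf1", 0.8), ("Pg2", 1.5),
    ("Qd8", 4.0), ("Nc6", 5.2),
    ("Pe5", 8.0),
]
bursts = segment_into_bursts(placements)
print(bursts)

stable = chunk_is_repeated({"Kg1", "Rf1", "Pg2"}, [{"Kg1", "Pg2", "Ph2"}, {"Rf1"}])
unstable = chunk_is_repeated({"Kg1", "Rf1", "Pg2"}, [{"Kg1"}, {"Rf1"}, {"Pg2"}])
print(stable, unstable)
```

Because the burst boundaries depend entirely on the chosen threshold, passing a different `max_latency` shows how sensitive the identified chunks are to that parameter, which is the substance of the methodological concern raised about the fixed 2-second latency.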
SEEK Theory: the contribution of Holding

Above all of Holding’s specific criticisms of Chunking Theory, his central belief was that it was basically flawed – although he accepted the result that skill has an effect on memory for meaningful chess positions, he believed that the role of recognition (based on memory) was insufficient in explaining chess skill. Holding promoted the importance of search, evaluation and knowledge to chess skill and expressed this idea in his SEEK theory.

It is important to understand Holding’s distinction between the mechanisms of ‘recognition’ and ‘search’ since his use of terminology differs slightly from that of other researchers. To Holding ‘recognition’ defines the key mechanism of Chunking Theory as the association between perceived patterns (chunks) and good moves – without search. ‘Search’ involves a combination of planning a selective search through candidate moves and sequences, and evaluating the utility of these moves to support next move selection. Perhaps the most confusing aspect of Holding’s definitions is that he asserts that pattern recognition from semantic knowledge also plays a key role in directing search by suggesting good moves. To Holding, “patterns may be general rather than specific chunks” (1985, p174) and the corresponding recognition mechanism is almost certainly less ‘automatic’ in its cueing of moves than that of Chunking Theory. In fact, it appears that ‘search’, in itself, is an extremely low-level skill, involving only focusing one’s evaluative skills on different moves. It should be noted that Holding (and others) refers to ‘search’ when he really means the wider set of skills described above, i.e. search, evaluation and knowledge – all three of which are embodied in SEEK theory.
Holding claimed that, within de Groot’s verbal protocols, there was, in fact, a relationship between skill level and both number of moves considered and speed of search (number of moves considered per minute), although this was not statistically significant. He argued that the real effect was obscured by the highly tactical nature of the only position for which a meaningful number of protocols were published, i.e. position A. Other studies have supported this claim. In particular, Charness’s 1981 experiment (Holding, 1985; Gobet, 2004), conducted with 34 skilled players and a balance of tactical and strategic positions different to those used by de Groot, suggests a linear relationship between skill level (in terms of Elo points) and depth of search (in terms of number of moves). Holding reports that average maximal depth of search increases by 1.4 plies per standard deviation of skill (200 points) and Gobet reports that the average depth of search increases by 0.5 plies for the same interval.

In 1979, Holding (1985) developed a single scale to evaluate positions on the basis of advantage to one side over the other using the expert judgement of skilled players. He then asked 50 Class A-E players to evaluate a set of quiescent positions, with level material, from actual grandmaster games on this scale. The players were also asked to select a next move. Evaluations were scored in comparison with the actual outcomes of the games. The results showed that there is an effect of skill on evaluations.

In Holding and Reynolds’ 1982 experiment (Holding, 1985) for recall on random positions, players were also asked to evaluate the position immediately (after it had been corrected following the recall task) and after 5 minutes of consideration. There were no skill differences for ‘correctness of evaluation’ at either measurement point.
Holding concluded that evaluative skill is influenced by memory, including “generic [semantic] memory for the type of specific… formations that are known to give rise to advantages and disadvantages” (Holding, 1985 p208).

Holding’s main conclusion is that differences in chess skill are due to search, evaluation and knowledge: “the better players show greater competence in every phase of the SEEK processes, conducting more knowledgeable evaluations, in order to anticipate events on the chessboard” (1985, pp255-256).

Gobet and Simon’s template theory

Gobet and Simon (1996) set out to test Holding’s conclusion by means of a ‘natural experiment’, observing the performance of the then-world champion Grand Master Garry Kasparov, in both a series of matches of simultaneous games and tournaments against expert opponents (predominantly Masters and Grand Masters). The average time afforded to Kasparov for each move was 3 minutes in tournament play and 3 minutes per round in all matches of simultaneous games (played against between four and eight opponents). Gobet and Simon reasoned that the increased time-pressure in the simultaneous games would provide Kasparov with less time to evaluate moves and, therefore, if Holding’s conclusion were true, he should perform less well in the simultaneous games than in the tournament. The results showed that Kasparov’s performance did not greatly differ across the two conditions. Indeed, in the simultaneous matches, Kasparov played at the level of a very strong Grand Master. Gobet and Simon concluded that it was Kasparov’s pattern-matching that accounted for his similar performances in both simultaneous matches and normal tournament play, and that this result could be generalised to all expert chess players. This is supported by a similar result from Calderwood, Klein and Randall (1988).

Gobet and Simon (1998b) asserted that some of Holding’s criticisms were valid (e.g.
those concerning LTM encoding and chunk size) whilst others were incorrect (or had been shown to be incorrect). For example, Holding’s result for skill differences for choice of next move decisions in random positions was countered by Gobet and Simon’s experimental results (1998b) that indicated that chunking theory does predict a small skill difference in the recall of such positions – contrary to de Groot’s and Chase and Simon’s earlier results and preserving the possibility of a relationship between memory and skill. Gobet and Simon state that Holding’s main issue with chunking theory – that it consists of pattern recognition without search – is a misunderstanding, since the ‘mind’s eye’ extension to the theory clearly describes the use of pattern recognition to support a ‘think-ahead’ process, thus generating subsequent moves for consideration (this account also largely equates pattern recognition of non-base moves with Holding’s evaluation mechanism).

In 1996, Gobet and Simon (1998a) replicated Chase and Simon’s original experiments, with some key modifications, including an increased sample size of 26 (ranging from Masters to Class A players) and computer-aiding for the reconstruction of positions, to eliminate the physical limitations on piece replacement in the original experiment that may have confounded results on chunk size. The main results concurred with Chase and Simon’s original study – that is, skill effects on recall in both tasks disappeared for random positions. The most startling difference in results, however, related to the size of chunks. Whilst the effect of skill level on chunk size was again present, mean largest chunk size at all skill levels was greater. In particular, for Masters this figure was 16.8 in the recall task (compared with 7 in the original experiment), and 14 in the copying task. Moreover, some positions were reconstructed by Masters using only one chunk.
This new data confirmed Gobet and Simon’s development of chunking theory, namely template theory (1998a, 1998b). Template theory uses the same basic mechanism as chunking theory, so that chunks are stored in STM as pointers to patterns in LTM; they are also used to reconstruct visuo-spatial images in the mind’s eye (the secondary transient memory store). Gobet and Simon stated that the more typical the position, the stronger the associations that chunk will have with semantic memory, including moves, plans and other patterns. Further, they proposed that such positions are actually represented by templates, which are essentially chunks with slots for variables. They therefore comprise a ‘core chunk’ and their parameters allow them to describe a range of chunks within a class defined by the range of variable values. Templates can provide for large constellations of pieces to be considered together where large chunks alone cannot, since the number of chunks with, e.g. more than 10 pieces, required to hold all meaningful patterns on those pieces would be unmanageably large. Templates, instead, provide for the redundancy that occurs because classes of chunks tend to share good moves, plans, tactical and strategic features etc. Gobet and Simon emphasise, within template theory, the associations of chunks and templates with semantic knowledge. As with chunking theory, the authors suggested a learning time of 8 seconds for chunks and templates. Two learning parameters are proposed. Gobet and Simon also assert that “like the chunking theory, template theory is not limited to chess” (Gobet 1998b p.127).

Template theory served to address the outstanding criticisms of chunking theory in the following ways. The null effect of interference for recall of chess positions could be accounted for by chunk size, since if fewer STM pointers are required to encode a single position (possibly only one for Masters) then noise will not necessarily eradicate that memory.
Likewise, Holding’s criticisms on chunk size and conceptual knowledge were countered by direct modifications to the theory, which were supported by empirical evidence. Finally, Gobet (1998a) has used template theory to explain skill differences for search variables; this is discussed in the next section.

The integration of pattern recognition and search

Gobet (1998a) conducted a replication of de Groot’s choice of next move experiment with 48 Swiss players (ranging from Master to Class B) using de Groot’s position A, and conducted an extensive analysis of the resultant verbal protocols, including the generation of problem behaviour graphs (Newell & Simon, 1972) and the extraction of the same quantitative variables as de Groot, with the aim of comparing results and reinvestigating the effects of search variables on quality of next move. Gobet was motivated both by empirical evidence that opposed de Groot’s result that search variables did not differ across skill levels, e.g. due to Charness (Gobet 2004), and by the lack of replication of de Groot’s original experiment. He was undoubtedly also motivated by the search for empirical evidence to support his own work at that time with Herbert Simon in developing template theory, since although the research was published in 1998, the original data was collected as part of a different study in 1986.

As well as a small skill difference for the mean depth of search, Gobet discovered a skill effect for the way in which progressive deepening was conducted.
The variables in the study characterising progressive deepening behaviour related to the number of reinvestigations of sequences starting with the same base move; these were sub-divided into immediate reinvestigations (same base move considered twice in succession) and non-immediate reinvestigations (same base move considered twice with at least one different base move considered in between), and also maximal and total values, with the former providing the largest number of reinvestigations (immediate or non-immediate) among all base moves considered. The maximal number of immediate reinvestigations had a positive association with skill level and the maximal number of non-immediate reinvestigations had a negative association with skill level.

Gobet’s main conclusions were that players in his sample differed along more dimensions than those in de Groot’s sample, and that the average values on all variables (pooled across skill levels) did not differ significantly between studies. Gobet notes that the differences he found within his sample were mainly between Masters and Class players. Since de Groot’s sample only included 2 players at Class level, it is perhaps not surprising that such differences did not show up in the original experiments. Importantly, Gobet claims that his skill effects for search can still be accounted for by pattern recognition models of chess thinking because sequences of moves are likely to be associated with patterns: “pattern recognition should facilitate the generation of moves in the mind’s eye, permitting a smooth search” (1998a p24).

Saariluoma presented further evidence of the pattern-recognition-based search hypothesis (Gobet 1998a, 2004) with his ‘smothered mate’ experiment, in which high calibre players were asked to choose a move that would lead to mate in a specially devised endgame. The position was one that had an efficient, yet unusual sequence of moves that led to mate as well as a longer, more familiar sequence.
Players tended to choose the move at the beginning of the stereotyped sequence.

Summary

In summary, the relative influences of recognition and search-and-evaluation on chess skill are not fully understood. Further, the degree to which these are, in fact, separate processes rather than alternative descriptions of the same process, is unclear. Certainly most advocates of either theory believe that both recognition and search mechanisms are fundamental to chess skill. For example, de Groot’s (Gobet, 2004, p120) assertion that recognition serves to direct the look-ahead search-and-evaluation suggests that these processes are, in some sense, interdependent. Further, Holding’s (1985) conclusion that search-and-evaluation is the dominant process is based on the assertion that better players plan these evaluations in a more effective way. Yet Holding’s “knowledgeable evaluations” (1985, p256) might well be directed by effective pattern-matching, which is essentially de Groot’s conclusion.

Gobet and Simon’s template theory, developed in part due to criticisms of chunking theory from advocates of search-and-evaluation, provides a credible explanation of skill differences for search (if it is accepted that templates can store sequences of moves). This extended theory apparently leaves no room for alternative explanations of chess skill wherever it could be argued that patterns exist (e.g. any experimentation involving real chess positions). It therefore offers the possibility of unifying both recognition-based and search-based theories. To refine the template theory explanation of skill differences on search variables, further data concerning such differences would be of great benefit.

Further, the balance of chess research has been in favour of recall tasks, rather than choice of next move tasks. The attractions of recall tasks (over choice of next move tasks) in explaining chess skill are the objectivity of the measures and the ease with which data can be analysed.
Since chess skill is primarily concerned with decision-making, however, it seems strange that there are not more studies based on the choice of next move task. Finally, research based on the choice of next move task, perhaps because of the analytical overheads the task usually imposes, tends to focus on a small number of positions, often only one – notably Gobet (1998a). An obvious danger in generalising results from a single position is that any position effects are discounted.

Methodology

This chapter outlines the experimental design, procedure and analytical methods employed in the research. It also includes an ethical section. The ecological nature of the experimentation in this study meant that a great deal of relatively unstructured data (verbal protocols) were generated through the experimental procedure. These data were subjected to a detailed and structured (qualitative) protocol analysis that provided a set of quantitative variables to be entered into statistical analyses. The intermediate results of the protocol analysis offer the best means of conveying this part of the methodology and serve to precipitate the relevant section of the Project Review. Appendix II therefore contains details of the protocol analysis, including an example verbal protocol and Problem Behaviour Graph (PBG).

Participants

Eight male chess players from four different clubs in Worcestershire and the West Midlands took part in the experiment. Although their ages were not recorded, all had been playing chess as graded players for between 30 and 45 years (mean 34.75 years, standard deviation 5.39 years). Their British Chess Federation (BCF) grades were converted into the Fédération Internationale des Échecs (FIDE) standard Elo ratings using the BCF conversion formula (BCF, 2003) and subsequently mapped onto United States Chess Federation (USCF) class divisions to facilitate comparisons between the results of this experiment and those of existing studies (e.g. Gobet, 1998a).
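The mapping from a converted Elo rating to a USCF class and then to one of the two skill levels can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, the Expert and Class A/B bands are those given in Table 1, and the treatment of a rating that falls exactly on a boundary (for example, 2000 counted as Expert) is an assumption rather than something stated in the study. The BCF conversion formula itself is not reproduced here.

```python
# Lower bounds of USCF rating bands, highest first. The Expert and
# Class A/B bands correspond to the Elo bands listed in Table 1.
USCF_BANDS = [
    (2200, "Master or above"),
    (2000, "Expert"),
    (1800, "Class A"),
    (1600, "Class B"),
]


def uscf_class(elo):
    """Return the USCF class label for a rating (boundary handling assumed)."""
    for lower_bound, label in USCF_BANDS:
        if elo >= lower_bound:
            return label
    return "Class C or below"


def skill_level(elo):
    """Collapse USCF classes into the two skill levels used in this study."""
    label = uscf_class(elo)
    if label == "Expert":
        return "Level 1 (Expert)"
    if label in ("Class A", "Class B"):
        return "Level 2 (Class)"
    return "outside the sampled range"


# Elo ratings of the eight players as listed in Table 5.
for rating in (1720, 1780, 1925, 1970, 2010, 2045, 2105, 2190):
    print(rating, uscf_class(rating), skill_level(rating))
```

Applied to the eight ratings listed in Table 5, this sketch places four players in each level, which agrees with the group sizes reported in Table 1.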
The players were assigned to two skill levels according to their equivalent USCF class as described in Table 1, below.

                                Level 1 (Expert; n=4)   Level 2 (Class; n=4)
Sample mean (BCF grading)       168                     120
Sample mean (FIDE Elo rating)   2087                    1849
Equivalent USCF class           Expert                  Class A / Class B
Equivalent Elo rating band      2000 – 2200             1600 – 2000

Table 1; Description of Skill levels of experiment players

Materials

Three chess positions were used in the experiment. They were positions A, B1 and C of de Groot’s original choice of next move experiments (de Groot, 1965 pp88-93) and were labelled A, B and C, respectively. They were depicted as standard chess position images on A4 card, complete with full move histories for the games from which they were taken. The positions themselves can be found in Appendix I.

Portable digital recording equipment, and pen and paper, were also used in the experiment. The recording time display on the equipment was made available to the players in place of a chess clock.

Experimental Design and procedure

A 2 x 3 repeated measures experiment was conducted using the following independent variables: Skill (Expert; Class) and Position (A; B; C). The experiment, which was conducted with each participant individually and in a quiet and undisturbed environment, consisted of a single ‘choice of next move’ task repeated across three conditions, defined by the three positions described above (A, B and C). The procedure was essentially the same as in the original de Groot experiments of 1938-43 (de Groot, 1965). Before the first task began the experimenter instructed the player that he would be presented with the positions one by one and, for each, would be required to choose his next move, as if he were playing over the board in normal tournament play; the only difference being that he was requested to think aloud as he did so. The experimenter clarified that ‘thinking aloud’ was not the same as providing a commentary on one’s thought process, i.e.
it was simply a natural verbal expression of thought. Further, the player was informed that the positions were from real games and were not chess ‘problems’ (typified by a single provable winning move); and that there were no time limits imposed, although a guideline was provided: that the player should aim to spend as much time on the task as they might reasonably expect to in a tournament game. Once the experimenter had checked that instructions had been understood and had gained the player’s informed consent for their participation, the task began.

The conditions were conducted sequentially with the offer of a short break between each if required. The position was presented to the player at the same time the recording began. Thereafter the experimenter only intervened if asked a direct question concerning procedure or if the participant had remained quiet for a period of approximately 30 seconds; in the latter case the experimenter prompted the player by asking, “What are you thinking now?” Throughout the recording and wherever necessary, the experimenter noted questions for clarification. At the end of each condition the recording was stopped and the experimenter requested clarification accordingly. Most such instances concerned a misreported or unspecified move, piece or square. Upon completion of the three iterations of the ‘choice of next move task’ the experiment concluded.

Protocol Analysis

The data collected from the experiment consisted of a single verbal protocol for each player at each level of the 2 x 3 design, giving 24 protocols in total. Each protocol was transcribed into tabular format and used to generate a Problem Behaviour Graph (PBG) according to the coding scheme set out in de Groot (1965), Newell & Simon (1972) and Gobet (1998a). Appendix II describes the coding scheme in greater detail and includes an example verbal protocol and the PBG that was generated from it.
It also provides definitions of the important elements of PBGs from which the quantitative variables may be extracted.

Derivation of quantitative variables

Table 2 describes the set of quantitative variables derived from each graph, and its means of derivation. Although most of these variables were originally devised by de Groot (1965) and also used by Gobet (1998a), two were novel and are indicated in the table.

Quality of Move: Subjective assessment of the quality of the chosen move (see Appendix A for the derivation of scores).
Total Time: Total time taken for choice of next move: time elapsed from initial presentation of position to confirmation of next move selection.
Time of First Phase: Total time elapsed before first Episode begins.
Number of Base Moves: Number of distinct base moves (null moves permitted).
Number of Episodes: Number of distinct Episodes of problem-solving behaviour.
Number of Nodes: Number of nodes (moves) considered, including repeated and null moves.
Total Depth: Aggregate of search depths for each episode, with null moves included in the totals. Episodic depth is defined by the longest sequence of moves, beginning with the base move, among all branches. This variable is only measured to enable the calculation of Mean and Maximal Search Depths.
Maximal Depth of Search: The maximal number among all episodic depths, with null moves omitted from the totals.
Mean Depth of Search: Mean episodic depth with null moves included; Total Depth divided by Episodes.
Standard Deviation of Depth of Search: Standard deviation of episodic depth with null moves included. This is a new variable.
Rate of Base Moves: Rate of generation of distinct base moves; Base Moves divided by Total Time.
Rate of Nodes: Rate at which nodes are considered; Nodes divided by Total Time.
Total IR: Total number of immediate reinvestigations of all base moves.
Total NIR: Total number of non-immediate reinvestigations of all base moves.
Maximal IR: The maximal IR amongst all base moves.
Maximal NIR: The maximal NIR across all base moves.
Number of Null Moves: Total number of null moves among all nodes. This is a new variable and is only measured to enable the calculation of Proportion of Null Moves.
Proportion of Null Moves: Proportion of total number of nodes that are null moves; Null Moves divided by Nodes. This is a new variable.

Table 2; quantitative variables derived from Problem Behaviour Graphs

Ethics

The only serious ethical consideration for this research is the non-disclosure of any personally identifiable data both during and after the life of the study. Although all data has been rendered anonymous before reporting, players’ choices of next move have been assessed and thus they may have reason to feel that their individual performance is under scrutiny. To guard against any such misconceptions, the experimenter explained that each player’s data was to remain anonymous and protected from unauthorised use under the Data Protection Act 1998. The experimenter also explained that the anonymous results would be published as part of the MSc. dissertation. The players were also advised of their right to withdraw from the study, even retrospectively, and the experimenter provided contact details to each player if they wished to exercise this right. The experimental procedure itself was totally innocuous – there were no risks to the players’ physical or mental well-being as a result of taking part.

Analysis

Each dependent variable in Table 2 except Total Depth of Search and Number of Null Moves was subjected to a repeated measures factorial analysis of variance (ANOVA) with the between-subjects variable Skill and the within-subjects variable Position.
The criterion of sphericity was satisfied for all variables entering each analysis except for Number of Non-immediate Reinvestigations, which was subsequently excluded from the analysis. These results for each variable are provided in the next section in meaningful groups; details of other tests are provided under the appropriate headings. The second section compares the results with those of similar studies, notably Gobet (2004) and the final section provides a higher level discussion of all findings.

Results from this study

Quality of Move

The main effect of Skill on Quality of Move is significant (F(1,6)=9.757, MSE=15.042, p<0.05) whilst the main effect of Position on Quality of Move (F(2,12)=3.683, MSE=6.292, p<0.06) is weakly significant; there is no interaction effect. Table 3 shows the actual moves selected by each player across the three positions, together with the Quality of Move scores assigned to each of those moves and Figure 1, below, provides a plot of the marginal means of Quality of Move for each Skill level across the three positions.

               Position A         Position B         Position C
Skill level    Move    Quality    Move    Quality    Move    Quality
Expert         Rc2     1          Rb8     5          Ne4     3
Expert         Bxd5    5          Rb8     5          Kh8     2
Expert         Bxd5    5          Rb8     5          Bd7     3
Expert         Bxd5    5          Rb8     5          e5      5
Class          Rc2     1          Kf8     4          d5      1
Class          b4      1          Rb8     5          Bd7     3
Class          b4      1          Kg7     3          e5      5
Class          Kh1     1          h5      2          Ne4     3

Table 3; Moves chosen and Quality of Move for all players across all positions

[Figure 1: plot of estimated marginal means of Quality of Move against Skill level (Class, Expert), one series per Position (A, B, C)]

Figure 1; estimated marginal means for Quality of Move

The most interesting feature of the data illustrated above is that although Position A appears to split Experts from Class players in terms of Quality of Move, Move Quality in the other two Positions is better balanced across Skill levels. In particular, the marginal means for Quality of Move across Skill levels in position C are almost identical (Class = 3; Expert = 3.25).
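As an illustration of how the between-subjects F ratio for Skill is formed, the following sketch recomputes it from the Quality of Move scores in Table 3. It is a hand calculation under stated assumptions, not the software procedure used in the study: the function name is hypothetical, and the first four rows of Table 3 are taken to be the Experts and the last four the Class players (an assignment consistent with the Position C marginal means quoted above).

```python
# Quality of Move scores from Table 3, one tuple per player: (A, B, C).
# Assumption: rows 1-4 are Experts, rows 5-8 are Class players.
expert = [(1, 5, 3), (5, 5, 2), (5, 5, 3), (5, 5, 5)]
class_players = [(1, 4, 1), (1, 5, 3), (1, 3, 5), (1, 2, 3)]


def between_subjects_f(group_a, group_b):
    """Between-subjects F for two equal-sized groups in a mixed design."""
    groups = [group_a, group_b]
    k = len(group_a[0])  # repeated measures (positions) per player
    n = len(group_a)     # players per group
    subject_totals = [[sum(row) for row in g] for g in groups]
    group_totals = [sum(totals) for totals in subject_totals]
    grand_total = sum(group_totals)
    correction = grand_total ** 2 / (len(groups) * n * k)
    ss_groups = sum(t ** 2 for t in group_totals) / (n * k) - correction
    ss_subjects = sum(t ** 2 for g in subject_totals for t in g) / k - correction
    ss_error = ss_subjects - ss_groups  # players within groups
    ms_effect = ss_groups / (len(groups) - 1)
    ms_error = ss_error / (len(groups) * (n - 1))
    return ms_effect, ms_error, ms_effect / ms_error


ms_skill, ms_error, f_skill = between_subjects_f(expert, class_players)
print(round(ms_skill, 3), round(f_skill, 3))  # prints: 15.042 9.757
```

Under these assumptions the sketch returns F(1,6)=9.757, matching the reported value. The figure reported above as MSE (15.042) coincides with the mean square for the Skill effect in this calculation; the error mean square is approximately 1.54.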
Further, no player selected a ‘bad move’ in Position B, with no Quality of Move score below 2.

Time variables

There is no main effect of Skill on Total Time (F(1,6)=0.605, MSE=29.592, ns) and, in fact, Experts apparently took longer than Class players in choosing their next move in all three positions; the biggest difference was observed for Position A (a mean total time of 14.5 minutes for Experts versus 9.2 minutes for Class players). The same pattern is observed for Time of First Phase; the main effect of Skill is non-significant here also (F(1,6)=3.604, MSE=3.604, ns). There is, however, a significant main effect of Position on Total Time (F(2,12)=8.117, MSE=64.528, p<0.01) (although not Time of First Phase) and there are no interaction effects on either time variable. The most noticeable difference was between Position B, which was considered, on average, for 8.6 minutes and Position C, which taxed the players for a mean time of 14.3 minutes.

Base Moves and Episodes

As with Total Time, there are also main effects of Position upon the Number of Base Moves (F(2,12)=7.104, MSE=22.792, p<0.01) and Number of Episodes (F(2,12)=3.285, MSE=69.542, p<0.08), although the effect is weak in the latter case. There are no main effects of Skill, nor any interaction effects, upon either of the two variables, whose marginal means are summarised below with the corresponding number of legal moves, for each position.

            Marginal Means
Position    Number of Base Moves    Number of Episodes    Number of Legal Moves
A           4.625                   10.25                 56
B           3                       7.75                  35
C           6.375                   13.625                37

Table 4; Marginal Means for Base Moves/ Episodes and Number of Legal Moves

As can be seen in Table 4, the relationship between Position and Number of Base Moves does not apparently stem from the number of legal moves available in each position: an average of 4.625 base moves are generated for position A (56 legal moves) and 3 for position B (35 legal moves), yet 6.375 of the possible 37 legal moves are generated for position C.
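The comparison drawn from Table 4 can be made concrete by expressing the mean Number of Base Moves as a share of the legal moves in each position, alongside the mean number of episodes per base move. The sketch below is illustrative only; the dictionary layout and function name are hypothetical, and the figures are the marginal means and legal-move counts from Table 4.

```python
# Marginal means and legal-move counts taken from Table 4.
table4 = {
    "A": {"base_moves": 4.625, "episodes": 10.25, "legal_moves": 56},
    "B": {"base_moves": 3.0, "episodes": 7.75, "legal_moves": 35},
    "C": {"base_moves": 6.375, "episodes": 13.625, "legal_moves": 37},
}


def position_ratios(row):
    """Return (share of legal moves considered, episodes per base move)."""
    share_of_legal = row["base_moves"] / row["legal_moves"]
    episodes_per_base_move = row["episodes"] / row["base_moves"]
    return share_of_legal, episodes_per_base_move


for name, row in table4.items():
    share, per_base = position_ratios(row)
    print(f"Position {name}: {share:.1%} of legal moves, "
          f"{per_base:.2f} episodes per base move")
# Position A: 8.3% of legal moves, 2.22 episodes per base move
# Position B: 8.6% of legal moves, 2.58 episodes per base move
# Position C: 17.2% of legal moves, 2.14 episodes per base move
```

On these figures roughly twice the share of legal moves is considered in Position C as in Positions A and B, while the number of episodes per base move stays between about 2.1 and 2.6 across all three positions.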
Further, it can be seen that there appears to be a linear relationship between the mean Number of Base Moves and the mean Number of Episodes.

Number of Nodes

The main effect of Skill upon Number of Nodes is significant (F(1,6)=6.593, MSE=4056, p<0.05), as is the main effect of Position (F(2,12)=4.618, MSE=1439.292, p<0.05) although there is no interaction effect. Inspection of the marginal means, as illustrated in Figure 2, indicates that Experts consider more nodes than Class players in all positions. Position A shows the greatest skill differences (66.5 nodes for Experts versus only 17.5 for Class players) whereas Positions B and C show smaller differences between skill levels. These latter two positions differ greatly from each other, however, on mean Number of Nodes across Skill levels (26.9 in Position B versus 53.63 in Position C). Further, Experts search approximately the same number of nodes in Positions A and C, whereas Class players search approximately the same number of nodes in Positions A and B.

[Figure 2: plot of estimated marginal means of Number of Nodes against Skill (Class, Expert), one series per Position (A, B, C)]

Figure 2; Marginal Means for Number of Nodes

Finally, the distribution of Number of Nodes is shown in Figure 3. Apart from the outlier (117 nodes searched by one of the Expert players in Position A), Number of Nodes is fairly normally distributed with all values < 100.

[Figure 3: histogram of Number of Nodes in bins of width 10; Mean = 41, Std. Dev = 26.51, N = 24]

Figure 3; Frequency distribution of Number of Nodes

Rate of generation

There are no effects (main or interaction) of Skill or Position on Rate of Base Moves. The main effect of Skill level on Rate of Nodes is weakly significant (F(1,6)=5.646, MSE=13.777, p<0.06) whilst there is no effect for Position (F(2,12)=0.590, MSE=0.001978, ns) and no interaction effect. Better players generate nodes more rapidly (Expert: mean 4.09, s.d.
1.03; Class: mean 2.58, s.d. 1.48), as illustrated in Figure 4.

[Figure 4: plot of estimated marginal means of Number of Nodes per minute against Skill level (Class, Expert), one series per Position (A, B, C)]

Figure 4; Estimated marginal means of Number of Nodes per minute

Depth of Search

The main effect of Skill for Mean Depth of Search is weakly significant (F(1,6)=3.977, MSE=3.899, p<0.1), and there exists a main effect of Skill on Maximal Depth of Search (F(1,6)=18.609, MSE=70.042, p<0.01). There are no other main effects on this group of either Skill or Position, or any interaction effects. Note that this group includes the new variable Standard Deviation of Depth, whose inclusion is discussed later. The relationship between Skill level and some of the search variables is investigated below.

Predicting search variables

To investigate the predictive power of skill (taken here as the continuous variable Elo Rating) on the two search variables Mean Depth of Search and Maximal Depth of Search, these latter variables were first pooled across positions A, B and C for each player by:

a. selecting the maximal search depth of all episodes undertaken to derive Maximal Depth of Search (pooled);
b. pooling both Total Depth of Search and Number of Episodes to derive the new quotient Mean Depth of Search (pooled).

Table 5 summarises the corresponding search data entering the analysis.

Elo rating    Maximal Depth of    Total Depth of     Number of episodes    Mean Depth of
              Search (pooled)     Search (pooled)    (pooled)              Search (pooled)
1720          4                   8                  7                     1.14
1780          8                   79                 25                    3.16
1925          5                   111                36                    3.08
1970          7                   128                29                    4.41
2010          14                  170                43                    3.95
2045          11                  100                26                    3.85
2105          9                   170                44                    3.86
2190          9                   144                43                    3.35

Table 5; Pooled Mean and Maximal Depth of Search by Player

The regression of Maximal Depth of Search on Elo Rating is significant (F(1,22)=10.597, MSE=59.802, p<0.05).
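The pooled quotient described in step (b) can be checked directly against Table 5. The sketch below is illustrative (the function name is hypothetical); it divides each player's pooled Total Depth of Search by the pooled Number of Episodes and compares the result with the Mean Depth of Search (pooled) column.

```python
# Rows from Table 5: (Elo rating, Total Depth of Search (pooled),
# Number of episodes (pooled), reported Mean Depth of Search (pooled)).
table5 = [
    (1720, 8, 7, 1.14),
    (1780, 79, 25, 3.16),
    (1925, 111, 36, 3.08),
    (1970, 128, 29, 4.41),
    (2010, 170, 43, 3.95),
    (2045, 100, 26, 3.85),
    (2105, 170, 44, 3.86),
    (2190, 144, 43, 3.35),
]


def pooled_mean_depth(total_depth, episodes):
    """Mean episodic depth pooled across positions: total depth / episodes."""
    return total_depth / episodes


for elo, total_depth, episodes, reported in table5:
    computed = round(pooled_mean_depth(total_depth, episodes), 2)
    print(elo, computed, computed == reported)
```

Every recomputed value agrees with the table to two decimal places. Pooling the totals in this way weights each episode equally, so a position in which a player ran many episodes contributes more to that player's pooled mean than it would under a simple average of the three per-position means.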
The regression line is given by:

Maximal Depth of Search = -14.830 + 0.011 x Elo Rating

This predicts that Maximal Depth of Search increases by approximately 2.1 ply per 200 Elo points (a single standard deviation in the Elo scale and the width of most USCF classes). The regression of Mean Depth of Search on Elo Rating is weakly significant (F(1,6)=4.672, MSE=3.058, p<0.08). The regression line in this case is given by:

Mean Depth of Search = -4.888 + 0.004 x Elo Rating

This predicts an increase of 0.8 in Mean Depth of Search for every 200 Elo points.

Reinvestigations

There are no main effects of Skill or Position on any of the reinvestigation variables although the interaction effect upon Maximal Number of IR is significant (F(1,6)=7.895, MSE=6.25, p<0.05). This is illustrated in Figure 5. The interaction is actually disordinal: Class players generated fewer immediate reinvestigations in Position A than Positions B and C, whereas the opposite is true for Experts.

[Figure 5: plot of estimated marginal means of Maximal Number of IR against Skill level (Class, Expert), one series per Position (A, B, C)]

Figure 5; Estimated Marginal Means for Maximal Number of IR

Null moves

There is a main effect of Skill upon the Proportion of Null Moves (F(1,6)=7.414, MSE=0.04596, p<0.05) (F(1,6)=6.005, MSE=3037.5, p<0.05), and no main effect for Position, nor any interaction effect. This effect is illustrated in Figure 6, below: Proportion of Null Moves is inversely related to Skill, suggesting that better players are more likely to think in terms of complete sequences of moves; it is noted from the verbal protocols that some players in the Class group tended to consider sequences of own moves with null moves in place of some opponent moves.
Figure 6; Estimated Marginal Means of Proportion of Null Moves (plotted against Skill level, Class and Expert, with one line per Position A, B and C)

Summary

The following table summarises the main effects of Skill level and Position on each of the dependent variables entered into the analysis.

Dependent variable            Main effect    Main effect    Interaction
                              of Skill [2]   of Position    effect
Quality of Move               p<0.06         p<0.05         ns
Total Time                    ns             p<0.01         ns
Time of First Phase           ns             ns             ns
Number of Base Moves          ns             p<0.01         ns
Rate of Base Moves            ns             ns             ns
Number of Episodes            ns             p<0.08         ns
Number of Nodes               p<0.05         p<0.05         ns
Rate of Nodes                 p<0.06         ns             ns
Mean Depth of Search          p<0.1          ns             ns
Maximal Depth of Search       p<0.01         ns             ns
Number of IR                  ns             ns             ns
Number of NIR                 ns             ns             ns
Maximal IR                    ns             ns             p<0.05
Maximal NIR                   ns             ns             ns
Number of Reinvestigations    ns             ns             ns
Proportion of Null Moves      p<0.05         ns             ns

Table 6; Summary of main effects of Skill and Position on dependent variables

[2] p-values quoted at standard levels (0.01, 0.05) except where p>0.05. In this case the actual p-value is quoted, rounded to 1dp.

Hence the data presented suggest that Skill has a main effect upon Quality of Move, Number of Nodes, Rate of Nodes, Mean Depth of Search, Maximal Depth of Search and Proportion of Null Moves; and that Position has a main effect upon Quality of Move, Total Time, Number of Base Moves, Number of Episodes and Number of Nodes. Note that there is only one interaction effect (upon Maximal IR) yet there are no main effects for this variable.

Comparison with other studies

The results reported above are interpreted in the context of the design and sample size. This is particularly important for comparisons with results from other related studies, i.e. Gobet (1998a) and de Groot (1965). The sample was fairly small, with a relatively narrow range of skill levels; in particular there were no Masters among the sample.
De Groot’s sample [3] included players of all skill levels down to Class (n=14; Grandmasters=5; Masters=2; Experts=5; Class=2). Gobet’s sample was larger (n=48) with average skill level somewhere in between de Groot’s and the sample used in this study (Masters=12; Experts=12; Class A=12; Class B=12). Conversely, the data in both of the other studies are based on Position A only, whereas this study employed three very different types of position (see Appendix I).

[3] For the purposes of comparison, this sample includes only the players for whom detailed statistics have been extracted from their Position A protocols, courtesy of Gobet (1998a).

Quality of Move

The results of both this study and Gobet’s confirm de Groot’s assertion that better players choose stronger moves. The significance of the effect of Position on Quality of Move in this study, however, suggests that it is harder to select a good move in some positions than in others – in particular, Position A. Interestingly, the position that the players were least comfortable with (Position C) generated the best quality moves on average. Figure 1 suggests an interaction effect, with the tactical and complex Position A splitting the two groups effectively and the strategic and quieter Position B showing little difference, but the corresponding F ratio is non-significant.

Time variables

Gobet (1998a) found a weakly significant result for Total Time, suggesting that Masters choose their next move more rapidly than lower calibre players. The results above show no differences between Experts and Class players, although the marginal means indicate that Experts are slower than Class players (12.68 minutes versus 10.46 minutes). The implication is that there are, in fact, no differences between players of different levels in the time taken to choose their next move.
An observation from the experiment is that some players consciously truncated their thought processes on the basis that, in a tournament game, too much time spent on the single choice would lead them into time trouble.

Gobet found a significant reduction in the Time of First Phase for higher calibre players, whereas the results here are also non-significant. Time of First Phase was perhaps one of the more difficult variables to extract from the protocols due to the poorly defined boundary it shares with the Phase of Elaboration (de Groot, 1965). Although certain players deliberately sized up the situation and discussed general plans before entering a longer phase of search and evaluation, others apparently focused immediately on base moves and corresponding sequences, whilst one player spent the majority of his time apparently in the First Phase before committing to a move. This issue is revisited in the Methodological Discussion.

Base Moves and Episodes

Gobet’s results suggest a curvilinear relationship for both variables with Skill, since Class A players generate more base moves and episodes than either Experts or Class B players, although only the effect on Number of Base Moves is significant (Gobet 1998a). Perhaps unsurprisingly, with Class A and B players pooled in this experiment, there are no significant effects of Skill. The significant effects of Position on both Number of Base Moves and Number of Episodes, however, again suggest that different types of position give rise to different search and evaluation strategies irrespective of skill level, but that this relationship is not explained by the complexity of the position (as measured by number of legal moves). Position C demanded the widest search for base moves and generated the most episodes; it may be argued that the character of this position is perhaps more ambiguous than the other two, containing strategic and tactical themes.
It is possible that this required players to pursue potential tactical lines as well as more strategic moves.

Search variables [4]

De Groot (1965) based his main conclusion, that recognition is the dominant mechanism in chess thinking, on two results suggesting that search behaviour does not differ across skill levels (at least at the higher levels of chess skill):

1. Chess players rarely search more than 100 nodes in any position;
2. There are no significant effects of skill on any search variable (e.g. Number of Nodes, Mean Depth of Search, Maximal Depth of Search).

Whilst both this study and Gobet’s (1998a) provide evidence in support of the first result, this study shows that Experts do search more nodes than Class players. This is partially backed up by Gobet (1998a): although he did not find a skill effect for Number of Nodes in position A, the average number of Nodes was considerably lower for the Class B group (33.9) than for the other groups (58 for Masters, 58.3 for Experts and 56.8 for Class A players; Gobet 1998a p13). The significant difference found here, therefore, might be due, in part, to the reduced skill range among the players in the experiment; it could be that the biggest skill differences for this search variable are actually to be found between Experts and Class players. This suggests that there is an improvement in search capacity up to Expert level, beyond which this measure remains fairly constant – and that de Groot’s second result, above, does not hold below the level of Expert.

[4] The variables in the previous groups Number of Nodes, Rate of generation and Depth of Search are considered here together.

This study also confirms the significant result from Gobet (1998a) concerning the effect of Skill on Mean Depth of Search, and adds evidence to the argument (counter to that of de Groot) that higher calibre players employ greater search than lower calibre players – due to the significant result on Maximal Depth of Search.
To investigate such effects in more detail, Charness (Holding 1985; Gobet, 2004) and Gobet (1998) made predictions of search capabilities for different skill levels by analysing the relationship between Elo rating and selected depth of search variables (Maximal Depth of Search and Mean Depth of Search). Charness, in his 1981 experiment investigating the effects of age and skill on search capabilities, used four positions, two of which were strategic whilst the other two were tactical in nature. Gobet used only one position, de Groot’s position A, which is highly tactical in nature. The regression equations calculated from the pooled data in this study suggest slightly larger increases in Maximal Depth of Search and Mean Depth of Search per 200 Elo points than evidenced by the previous studies (see Table 7).

Prediction                                                This study   Charness   Gobet
Increase in Maximal Depth of Search per 200 Elo points    2.1          1.4        N/A
Increase in Mean Depth of Search per 200 Elo points       0.8          0.5        0.6

Table 7; predicted gain in search capabilities as a function of Elo rating

In interpreting this result it is noted that:

1. de Groot’s results are based on a sample dominated by Grandmasters, Masters and Experts;
2. Charness and Gobet found skill differences for search capabilities when lower calibre players were more prevalent in the sample;
3. Both Charness and Gobet have suggested that the relationship between skill level and search capabilities across all playing levels is not linear. Whilst Charness proposes a plateau effect for high calibre players, Gobet suggests a curvilinear relationship, whereby high calibre players actually search less due to better recognition-led evaluation capabilities.

Given the relatively low calibre of the players in this sample, the data presented here therefore extend the model of Gobet in suggesting that the rate of change of search capability (as measured by Mean and Maximal Depth of Search) is greater at lower skill levels (e.g. between Class A/B and Expert).
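The "This study" column of Table 7 is a regression slope scaled to a 200-point Elo interval. As a rough check, the sketch below refits ordinary least-squares lines to the eight pooled values in Table 5 using plain Python. The function names `least_squares` and `gain_per_class` are illustrative assumptions and not part of the original analysis, which was presumably run in a statistics package. The Mean Depth fit should land close to the reported line; the Maximal Depth fit need not, since the reported F(1,22) suggests that regression may have used different (possibly unpooled) inputs.

```python
# Pooled search data transcribed from Table 5 (one entry per player).
elo = [1720, 1780, 1925, 1970, 2010, 2045, 2105, 2190]
max_depth = [4, 8, 5, 7, 14, 11, 9, 9]
total_depth = [8, 79, 111, 128, 170, 100, 170, 144]
episodes = [7, 25, 36, 29, 43, 26, 44, 43]

# Mean Depth of Search (pooled) = Total Depth of Search / Number of Episodes.
mean_depth = [t / n for t, n in zip(total_depth, episodes)]


def least_squares(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def gain_per_class(slope, width=200):
    """Predicted change in the dependent variable over `width` Elo points."""
    return slope * width


# Hypothetical report loop: prints the fitted line and the per-200-Elo gain.
for label, values in (("Maximal", max_depth), ("Mean", mean_depth)):
    intercept, slope = least_squares(elo, values)
    print(f"{label} Depth of Search = {intercept:.3f} + {slope:.4f} x Elo Rating; "
          f"gain per 200 Elo points = {gain_per_class(slope):.2f}")
```

Scaling the slope rather than differencing two predicted values keeps the intercept out of the comparison, which is convenient because the three studies in Table 7 are compared on rate of change only.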
Note that the predictions for Mean Depth of Search are similar across three studies that used different combinations of types of position. This backs up the result of the previous section that states that there is no significant effect of Position on either Mean Depth of Search or Maximal Depth of Search.

Rate of generation

The weakly significant effect of Skill on Rate of Nodes diverges from Gobet’s (1998a) result. Although neither study provides evidence for an effect of Skill on Rate of Base Moves, Charness’s 1981 result (Gobet, 1998a) suggests that Grandmasters generate more base moves per minute than Experts. The reduced sample size in this study might explain why such a result was not identified here.

Reinvestigations

There was a degree of convergence with Gobet (1998a) concerning reinvestigation variables. Gobet’s only significant results in this area were for the main effects of Skill on Maximal Number of IR (p<0.005) and Maximal Number of NIR (p<0.02) (Gobet 1998a p16). The results presented in the previous section indicate that there are no main effects of Skill on these variables, although there is an interaction effect. Gobet suggested that Maximal Number of IR increases with Skill level, which is backed up by the plot of marginal means of Maximal Number of IR in this study (Figure 5). It is interesting to note that if only the data for Position A are entered into an ANOVA the effect of Skill is actually significant (F(1,6)=10.714, MSE=28.125, p<0.05). A high Maximal Number of IR represents a situation where a player becomes deeply involved in the analysis of a particular sequence (or branch) of his or her search tree, returning to the same base move a number of times in succession. It is also seen as evidence for progressive deepening (Gobet 2004, p110). Position A is the most tactical of the three and it appears that Expert players become deeply involved in the tactical analysis required to select a good move.
The disordinal nature of the interaction identified in this study also suggests that Class players are less equipped to do the same and even tend to have longer sequences of reinvestigations for quieter, less tactical positions.

Gobet (1998a) also asserted that Maximal Number of NIR is inversely proportional to Skill, yet an ANOVA with the current data (Position A only) generates a non-significant result, as Figure 7 indicates.

Figure 7; Estimated Marginal Means for Maximal Number of NIR (plotted against Skill level, Class and Expert, with one line per Position A, B and C)

Null Moves

The significant skill effect for Proportion of Null Moves suggests that better players think in terms of completely specified sequences of moves more often than lesser players. By means of a comparison, Saariluoma and Hohlfeld (Gobet 2004) [5] examined the proportion of null moves as a function of position type (strategic or tactical) and found that it is greater, at approximately 12%, in strategic positions; Charness (Gobet 2004) previously found this percentage to be approximately 10%. Interestingly, although the result in the current study holds for Expert players (Position B = 11%; Position A = 5.5%; Position C = 5%), Class players search approximately 15-16% null moves irrespective of position type. (See also Figure 6.)

[5] Calibre of players involved in the study unspecified.

The differences in proportions across the three positions at each skill level lead to two alternative interpretations:

1. Strategic positions (Position B) demand more generalised ‘plan formulation’ than tactical positions (Position A and, to a certain extent, Position C). The result is an increased proportion of templates of move sequences;
2. Better players are simply more thorough in their analysis of tactical sequences.
Summary

The results generated by this study broadly agree with those of Gobet (1998a), Charness (Holding, 1985; Gobet, 2004) and Saariluoma and Hohlfeld (Gobet 2004) and argue against some of de Groot’s earlier conclusions. Better players make better choices of move, as shown by de Groot (1965) and Gobet (1998a), but they also search more, to a greater depth and more thoroughly than lesser players. The exact relationship between skill and both capacity and depth of search is probably not linear. It appears that the rate of increase in search capacity plateaus at the level of Master and above; and that depth of search may actually vary in a curvilinear fashion with skill level, with a rate of increase that itself decreases, and actually changes sign, as skill level increases from Class B to Grandmaster. Given the difference in calibre of players in the samples considered across the various studies, it is entirely possible that de Groot’s results on search variables were actually correct – it is merely the applicability of the conclusions to lower skill levels that is in question.

Project Review

This chapter reflects upon two key issues: the necessary refocusing of the research throughout its course (including modifications both to the design and the analysis) and the validity of the data collection and analysis methods used in support of the choice of next move task.

Focus of research

The final dissertation is far more focused than the original research proposal suggested it might be. The main reason for this is that one half of the study was suspended to keep the study to a manageable size, both in a positive sense (due to the healthy amount of material available from the choice of next move task) and a negative sense (due to both access difficulties and increased overheads of qualitative analysis).
The original experimental design included a choice of next move task and a personal construct elicitation task, the latter conceived with the aim of investigating the nature of conceptual knowledge that chess players possess. Holding (1985) postulated that conceptual knowledge, along with search and evaluation, explains skill in chess, and one of his main criticisms of chunking theory was that chunks were too small in size to reflect conceptual knowledge (Gobet & Simon, 1998b). Template theory (Gobet & Simon, 1998a) addresses this criticism by introducing larger perceptual structures known as templates, which are large enough, in theory, to encode entire positions.

Personal Construct Psychology (PCP) is concerned with how individuals construe the world, based on the assertion that each man possesses an ever changing set of hypotheses about the world that are represented on personal constructs – essentially axes of reference characterised by contrasting poles (e.g. we may hypothesise about people on the construct ‘good-bad’ or we may hypothesise about chess positions on the construct ‘tactical-strategic’). Much of PCP is due to George Kelly, who also devised the Repertory Grid technique, which includes methods for the elicitation of personal constructs (Fransella, Bell & Bannister, 2004).

Under the assumption that personal constructs, which may exist at any level of abstraction, are equivalent ways of classifying/describing both templates and the higher level schemata that they relate to, the research questions that the second half of the study concerned, therefore, were:

- How many constructs do chess players of a given skill level possess?
- How are the construct systems of chess players organised?
- What degree of overlap is there between different chess players’ construct systems, particularly those players with similar skill levels?
- What are the most concrete constructs and do they correspond to Chase & Simon’s piece relations in chunking theory?
Thus the questions for this part of the study were fairly open-ended and the analysis was intended to be investigative.

The basic procedure chosen was the method of triads, whereby three ‘elements’ (in this case, chess positions) are presented to the participant, who is asked a question of the form, “How are two of these elements similar and thereby different from the third?” The context for answering the question is defined by the research – hence here it was a ‘choice of next move’ task on each of the three positions. The similarity-difference pair provided by the participant would form the poles of a new construct, which the experimenter would help the participant to shape into something meaningful to him or her. This would define a single episode of elicitation – the presentation of a new triad would mark the next. Typically only one (or possibly two) constructs are elicited from each triad before the next is presented. This implies that a fairly large number of elements is employed (typically more than the number of constructs expected).

Personal construct elicitation is most commonly used in clinical psychology as a means of establishing a patient’s views on self and others with a view to guiding choice of therapy. As such, elements are provided simply as names or roles of individuals in the patient’s life – a triad of role names activates similarities and differences almost immediately. In the more general case of knowledge elicitation, elements take the form of exemplars from the participant’s domain of expertise. Unfortunately triads of such exemplars do not always instantly activate similarities and differences since, by their very nature, the exemplar elements require some consideration. Chess positions are typical of this type of exemplar.

It was decided to attempt to elicit 10-15 personal constructs of ‘chess knowledge’ using 18 separate positions, presented as 18 different triads (i.e.
positions are sampled with replacement with the condition that each appears in exactly three triads and never with the same position twice).

Upon the first run of the experiment, it became clear that 18 triads was far too ambitious a target for the allotted 2 hour elicitation session because the players required time to orient themselves to each of the three positions in each triad before they could provide any similarity-difference pairs. The experimenter attempted to counter this by imposing a limit of 5 minutes consideration time, but the effect was that the actual elicitation procedure was used by the participants to ‘think aloud’ in analysing each of the three positions to their satisfaction, and time quickly ran out. No player completed more than six triads in their 2 hour session and there was no time for construct refinement (whereby construct definitions are challenged, developed, discarded etc. and construct hierarchies are developed by adding new, related constructs at higher and lower levels of abstraction and possibly linking constructs already elicited). Since the choice of next move task had already been completed at the start of the session, this meant that the participant had been engaged in experimentation for three hours. This is close to the limit of concentration for a single session and, since the players were not being paid for their involvement, it was unreasonable to expect them to continue.

The experimenter then fell back on a contingency plan, whereby construct refinement was completed by each participant via e-mail, the experimenter having analysed that individual’s embryonic construct set and posed specific questions.
Although this was actually completed with six of the eight participants, the data were not fully analysed due to time constraints – shared understanding of an individual’s personal constructs is severely hindered if dialogue concerning those constructs is limited to e-mail, causing an unmanageable increase in workload.

In summary, therefore, the main reasons why personal construct elicitation failed as part of this research study were:

1. The relatively long time required for each triadic elicitation episode, due to the complexity of the elements (chess positions);
2. Lack of continued access to participants (3 hour session limit);
3. Overheads on analysis imposed by e-mail completion of task.

The experimenter has retained the data and recommends that, if the study were to be extended, these data be analysed thoroughly to establish answers to the research questions set out above.

Choice of next move task

Thinking aloud

As de Groot himself noted (1965, p80), the validity of ‘thinking aloud’ as a means of expressing one’s thought process may be called into question. De Groot pioneered the use of the technique and others, in particular Herbert Simon, have advocated its continued use for gaining insight into human problem solving techniques. Chess is particularly well-suited to verbal protocols since it includes a great deal of common and well-defined terminology to describe moves, tactics, plans and positional features. There are obvious dangers in interpreting verbal protocols as perfect records of thought, however; de Groot mentions a few of these, e.g. incompleteness due to unconscious and rapid thought, the disruptive influence of slowing one’s thinking down to verbalise thought etc. (1965, pp80-84). Further, individual differences in style of verbalisation cannot be ruled out as a confounding factor in (ultimately) deriving protocol statistics. Of the sample of eight in this experiment, some players were certainly more at ease with thinking aloud than others.
The following behaviours were observed from different individuals:

- Long periods of silence where, it is almost certain, complete sequences were being calculated that were never expressed;
- Long periods spent in the First Phase or Transitional Phases, followed fairly rapidly by sequence assessments or even next-move selection, suggesting that the First/Transitional Phase verbalisations were in fact masking deeper calculations;
- Fairly rapid repetition of the opening moves from a sequence to precipitate the investigation of a new branch. In these cases it was difficult to tell whether such repetition constituted fresh consideration of the base move (indicating a new episode) or reorientation of one’s place in a search tree (indicating a new branch in the same episode). To avoid the introduction of experimenter bias, the former was assumed, according to the coding scheme as described by de Groot (1965);
- The provision of a running commentary on one’s thought process rather than the direct verbalisation of thoughts. This occurred due both to unease with thinking aloud and over-helpfulness!

Fortunately, none of these behaviours were permanent features of any individual’s thinking aloud. It is hard to believe, however, that such behaviours were limited to the sample involved in this experiment. Mitigation for these behaviours could involve a ‘practice run’ followed by experimenter feedback, although it could be argued that thinking aloud is a skill that can only be learnt effectively over longer periods (particularly to combat the first behaviour).

Coding of verbal protocols and problem behaviour graphs

The coding scheme for verbal protocols and PBGs is explained in greater detail in Appendix II. The experimenter had very few issues in coding the verbal protocols due to the apparent universality of de Groot’s coding scheme (based on
The only issues arose in the identification of boundaries between the First Phase and First Episode, particularly for players exhibiting the second behaviour described above. Having established the macro-structure of the verbal protocol, the task of coding each episode as a sequence of moves in the PBG remained. In terms of identifying player errors (e.g. in naming moves, pieces or squares) the experimenter, a non-chess player, had very few problems, since such errors stood out as obvious anomalies in logical sequences of moves and could easily be corrected. (A good analogy is that of an error-correcting code: the correction can be done without an understanding of the content.) The clarity of the coding scheme due to de Groot (1965), Newell & Simon (1972) and Gobet (1998a) also greatly assisted in converting verbal protocols into PBGs. There were occasions, however, on which the apparent structure of thinking exhibited by some of the players did not fit into the PBG formulation. Examples include: The expression of fragments of sequences (i.e. no base move and location of first move in fragment unspecified – or introduced with a remark such as, “so if at some point we could play…”. The PBG coding scheme forces such fragments to be coded as transitions, even though calculations are being carried out, because it requires moves to have clearly defined positions; The expression of what are, effectively, opponent base moves. Some players apparently used a technique whereby they pretended that the opponent was on move and started to calculate sequences from candidate opponent moves. In the PBG formulation these must all be coded as 61 branches following a null base move. Whether these sorts of calculations constitute different episodes of thought remains to be decided; The representation of ‘not moves’ in the PBG. These are not quite the same as null moves, e.g. “not Qe4”. 
It is recommended that, in an extended study, extensions to the PBG coding scheme are trialled, with feedback sought on validity from chess players, and implications for variability in the results of quantitative analysis investigated.

Conclusions

The specific research questions for this study were as follows:

- Do club-level chess players of differing calibres differ in terms of quality of move selection?
- Do club-level chess players of differing calibres differ in terms of capacity of search, mean and maximal search depth, and thoroughness of search?
- To what degree do the levels of search activity in club-level players fit with existing models of chess thinking?

The first two of these questions have been answered directly by the analysis: Experts choose better moves than Class A/B players across both tactical and strategic positions; Experts also search more, to a greater depth and more thoroughly than Class A/B players.

To address the final question, it is useful to turn again to Gobet’s (1998a) replication of de Groot’s experiments. Gobet provides a useful summary of which effects would be expected under both recognition-based and search-based models of chess skill:

“Both pattern recognition and search models predict that stronger players choose better moves, that they select moves faster, and they generate more nodes in one minute… Search models predict that stronger players search more nodes and search deeper…. Finally, pattern recognition models predict that strong players mention fewer base moves, reinvestigate more often the same move, jump less often between different moves and have a shorter first phase.” (Gobet, 1998a, p23).

These postulated relationships for different models of chess skill are shown in Figure 8 below; Proportion of Null Moves is assessed to vary inversely with skill level under search-based models, since searches should be more completely defined, and has been added to Figure 8, accordingly.
Figure 8; postulated differences in variables for increase in skill level under different models of chess skill

Gobet suggested that he had identified all differences expected under a recognition-based model but that some changes expected under a search-based model of chess skill had not been found, since there were no skill differences on Number of Nodes. The results from this study, however, suggest that all skill differences expected under a search-based model have been identified, although Rate of Nodes and Mean Depth of Search only weakly. They also suggest that there are no skill differences on any of the variables associated only with the recognition-based model.

Two major conclusions may be drawn from this set of results. Firstly, there is strong and continued evidence for differences in search capabilities across skill levels in chess players, building on the results of Gobet (1998a), Charness (Holding, 1985; Gobet, 2004) and Saariluoma and Hohlfeld (Gobet 2004). Such evidence argues against the basis of de Groot’s main conclusion (1965) that recognition is the dominant mechanism underpinning chess skill. Proponents of template theory (e.g. Gobet), however, argue that such continued results for search differences across skill levels do not undermine the recognition-based theory of chess skill itself. In response to Holding’s assertion that differences in depth of search cannot be explained by recognition-based models, Gobet states, “this is obviously wrong, as pattern recognition should facilitate the generation of moves in the mind’s eye, permitting a smooth search.” (Gobet, 1998a, p24). Thus skill differences on variables previously associated with search-based models of chess skill can be explained by recognition-based models.

The second major conclusion to be drawn, however, suggests that there is less support for recognition-based theory from the results presented above.
None of the variable differences that Gobet (1998a) asserts are predicted only by recognition-based models (and not search-based models) are significant in this study.

These two conclusions must be placed in the context of the calibre of players involved in the study, however. It may be that the results hold only between Class A/B players and Experts. This, however, would provide evidence to the fact that the better players at club level are superior primarily because of their search capabilities and not recognition. A different model of chess skill may be required for players below the level of Master.

Appendix I: de Groot positions

POSITION A

1. d4 d5 2. c4 e6 3. Nc3 Nf6 4. Bg5 Be7 5. e3 O-O 6. Nf3 dxc4 7. Bxc4 c5 8. Bd3 Nc6 9. O-O cxd4 10. exd4 Nb4 11. Bb1 Bd7 12. a3 Nbd5 13. Qd3 g6 14. Ne5 Rc8 15. Ba2 Bc6 16. Rac1 Qb6

WHITE TO MOVE

POSITION B

1. e4 e5 2. Nf3 d5 3. exd5 e4 4. Bb5+ c6 5. dxc6 bxc6 6. Ba4 exf3 7. Qxf3 Nf6 8. O-O Be7 9. Bxc6+ Nxc6 10. Qxc6+ Bd7 11. Qf3 O-O 12. d3 Qc7 13. Nc3 Bd6 14. h3 Bc6 15. Qe2 Rfe8 16. Qd2 Nh5 17. Qg5 Bh2+ 18. Kh1 Re5 19. Qh4 Rf5 20. Ne4 Bg3 21. Qg4 Qe5 22. Be3 Bf4 23. Bd4 Qxd4 24. Qxf5 g6 25. Qc5 Qd7 26. Qxh5 Bxe4 27. Qg4 Qxg4 28. hxg4 Bc6 29. Rfe1

BLACK TO MOVE

POSITION C

1. c4 e6 2. d4 Bb4+ 3. Bd2 Bxd2+ 4. Qxd2 f5 5. Nc3 Nf6 6. g3 d6 7. Bg2 Qe7 8. O-O-O Nbd7 9. e4 fxe4 10. Nxe4 O-O 11. Nc3 Nb6 12. Qe2 Qd7 13. Nf3 Qc6 14. b3 a5 15. a4 Nbd5 16. Nb5 Nb4 17. Bh3

BLACK TO MOVE

Assessment of quality of move

De Groot (1965, p128) conducted a thorough analysis of position A to provide an assessment on the order of quality of the best 22 among all 56 legal moves. He used this analysis to assess the performance of each of his skill levels in next move selection. Gobet (1998a) reanalysed position A to assign a quantitative score to each move, on a scale of 1 (weak move) to 5 (winning move) (Gobet 1998a; Gobet, 2005) [6]. This enabled him to enter ‘quality of move’ into a quantitative analysis.
Although Gobet (1998a) does not report this scoring scheme for each of the 56 legal moves, Gobet (2005) contains enough information to reconstruct it. Less data is available for positions B and C; the best source is de Groot’s ordering of the best 10 moves for the former and 9 moves for the latter (1965, p129).

To generate move quality scores, an analysis of all positions (A, B and C) was conducted with the computer chess engine Fritz 5 (Chessbase, 1997), truncated at 13 plies. The Fritz 5 user manual asserts that Fritz analyses positions more accurately at odd-number ply-depths. 13 plies was the greatest odd-number depth to which Fritz analysis could be reasonably conducted given the computing resources available. It should be noted, however, that this represents an extremely strong analysis.

The Fritz analysis generated evaluations for each legal move for each position; Fritz evaluations are based on a number of factors, of which material advantage is a key determinant. Unfortunately, these evaluations could not be used directly because they were not normalised (i.e. the evaluations for each move were dependent upon the static evaluation of the starting positions, which were not matched for material advantage) and could not easily be normalised (because the range of evaluations across all moves was partially determined by the potential for material loss, and this was not matched across the three positions because position B, for example, contained no queens).

Footnote 6: There is a discrepancy between these sources in terms of the lowest score awarded (1 or 0). This study adopts the view of the most recent source. The lowest score does not, in fact, matter, either to Gobet’s analysis or the analysis conducted in this study, since no player selected a move that has an ambiguous score.
A comparison of the Fritz analysis for position A with Gobet’s subjective analysis for the same position, however, revealed a general mapping of the former onto the latter. This mapping, outlined in Table 8, below, was applied to the moves for positions B and C to obtain a full set of ‘Gobet numbers’ for each position.

Gobet number    Moves according to Fritz evaluation
                White to move                              Black to move
5               Moves with maximal score(s) e              Moves with minimal score(s) e
                amongst all legal moves                    amongst all legal moves
4               Moves with scores e’ where                 Moves with scores e’ where
                e - 0.1 ≤ e’ < e                           e ≤ e’ < e + 0.1
3               Moves with scores e’’ where                Moves with scores e’’ where
                e - 0.3 ≤ e’’ < e - 0.1                    e + 0.1 ≤ e’’ < e + 0.3
2               Moves with scores e’’’ where               Moves with scores e’’’ where
                e - 1.0 ≤ e’’’ < e - 0.3                   e + 0.3 ≤ e’’’ < e + 1.0
1               Moves with scores e’’’’ where              Moves with scores e’’’’ where
                e’’’’ < e - 1.0                            e + 1.0 < e’’’’

Table 8; Mapping of Fritz evaluations onto Gobet Numbers

This mapping essentially partitions the legal move set into 5 groups, the first of which contains only the move(s) with the best score (remembering that white aims to maximise scores and black aims to minimise them) and the last of which contains ‘blunders’ (moves that result in the equivalent loss of at least a pawn in material, worth 1.0).

Appendix II: Protocol Analysis

Protocol Analysis begins with the transcription of the verbal protocol from digital audio media to text. The next stage is the identification of the protocol structure: First Phase, Episodes, Transitional Phases and Final Phase (de Groot, 1965; Newell & Simon, 1972).

First Phase: This is characterised by general orientation, the consideration of enemy threats and own plans, and the generation of base moves without any search or evaluation. There is a lack of thinking focused at the level of move sequences.
The boundary between the First Phase and the first Episode is not always clear since a player may shift gradually from generating base moves to analysing them by search and evaluation.

Episode: An Episode is a distinct sequence of move considerations, with branching allowed, stemming from the single consideration of a base move. If a subsequent sequence begins from a base move then it is a new Episode, whether that base move is different from the last or not. Sequences may be of length one and each leaf node need not be evaluated explicitly. Thus every time a base move is mentioned at the beginning of a sequence after the First Phase has completed, it signifies the beginning of a new Episode, although discretion allows for two exceptions:
1. it is obvious that base moves are being listed;
2. it is obvious that it is a rapid repetition of the same sequence rather than a new search-and-evaluation of that sequence.

Transitional Phase: This is identical in character to the First Phase except that it occurs between Episodes. Transitions typically occur when a player ‘stands back’ from his thought process to look again at general plans.

Final Phase: This is typified by summary statements, comparisons of base moves with apparently no further search-and-evaluation, and move selection. It is also the last phase.

Once each of the phases and episodes has been identified and checked, the textual protocol is tabulated, as in Table 9, below. The time at completion of each phase/episode is noted. The right-hand column is used to make notes on move selections and errors on the part of the player. Thankfully it is usually relatively simple to identify errors (e.g. in square identification) because they generate anomalies within sequences.

Player 8, Position B

First Phase. Right. There’s 3… e4, e5, knight f3, d5. Right, oh gosh, it’s black to move. Initially, he’s got, er, the pawns: 7 plays 4. He’s got 2 bishops for a rook. 3 pawns… and it’s black to move. Hmm...
The key to this, I would imagine, is to keep… keep the threats up. Fortunately he’s got the 2 bishops so he can place quite a few threats. Trouble is if you’ve got 1 bishop you, um, he puts them all on a square that you can’t, er, the diagonals you can’t attack but he can’t do this because you’ve got two bishops. Umm… is there any initial threats? Let’s have a look. Er, I can’t see… I can’t see… There’s no sillies – that you can just capture something. OK, so what is white threatening to do? What don’t we want him to do? We don’t really want him to mobilise his rooks. I think we’ve got to keep him… we’ve got to keep him pinned down here. Obviously if we win any more pawns it’s going to be advantageous for us so he can’t let us win any more pawns. Umm… I think we don’t really want him to let him… excuse me… we don’t want to let him get his rooks on the 7th or doubled up on the 7th, which would be not very well for us. Having said that he’s got to be careful. Um, right, so what would we do? What would be a good plan? Er, right, we’ve… both bishops are pointing at his king. Er… there’s no open files, we can’t... Hmm… I suppose we could play… Thing is: what’s he going to do if…? [2:46]

Episode 1. If we attack a pawn? The obvious attack is his queen’s knight pawn, here. Attack it with the rook… attack it with the rook by playing rook there: what’s he going to do? [Note: ‘rook there’ = Rb8] If he pushes the pawn on he’s going to end up with a position that’s like Swiss cheese and the black bishop would run rampant. [Note: ‘pushes the pawn on’ = b3] He must be careful not to stagger his pawns, that it looks like colander. If he does that and the bishops get in the middle here he’ll have a hell of a problem getting rid of them, plus the bishops will stop him from doubling up and all the other things. [3:42]

Episode 2. Now the reason I say shall we play a sort of a waiting move… let’s threaten a pawn then find a better square for this rook to go.
As the king’s stranded over the… on h1 – or king rook 1 – can we do anything about attacking it by using that as a springboard – attack a pawn then possibly come up the board? Come up the board to, perhaps, knight 4 – or b5? [Note: ‘come up the board’ = Rb5] [4:25]

Transition. Are you recording in algebraic or descriptive? Because I think in descriptive. So, er, what else? I mean there is the alternative’s, er… just a small little waiting move, say. Let’s just… importantly, what can he do? What can he do? If we start attacking he’s got to defend it. [5:01]

Episode 3. If… I would, personally, if I was black… if I was white and black played rook to queen’s knight 1, I’d play queen’s rook to knight 1 to defend it because I don’t like the look of playing pawn to knight 3 – it’s too messy – black can start to get his black squared bishop in the holes. I personally wouldn’t move that. Er, I would defend, plus also that would give you the option of playing pawn to queen knight 4 at a later date if you were white. [5:46]

Episode 4. Er, so… Right, black, rook to knight 1, OK, rook to knight 1. [5:58]

Episode 5. There is the alternative, of course, of the immediate counter-attack. Rook to queen knight 1, rook to king 7, attacking the pawn so if black takes the queen knight pawn, white would then take the a7 pawn, which would then leave him… that queen rook pawn is then passed, so the rook behind it. Having said that, he... [6:32]

Episode 6. So, alright, rook to queen knight 1, rook to king 7, rook takes pawn, rook takes pawn, rook takes knight pawn, he’s going to lose too many of these pawns. I don’t think it’s… I don’t think it’s going to work out for him if he just counter-attacks. I think he might be able to do that a move or so later. [6:55]

Episode 7. Rook to queen knight 1, rook to queen knight 1 for white, um… now I, personally, would stop that rook from getting to king 7. Play king to bishop 1 perhaps? King to bishop 1, which would stop the rook in its tracks.
Slower build up. Mmm… yes, plus you could play… black could play… [7:39]

Episode 8. Look, because, these sort of positions, you need an overall strategy as opposed to playing move to move, er, and you think, “right, my overall strategy is…” Or would it be…? I don’t like white’s position insomuch as wherever he advances a pawn, black’s going to get in. If he tries to protect this with b3, black can infiltrate – er, f3 – black can infiltrate on the black squares. [Note: ‘protect this’ = the king] [8:21]

[Note on Episode 6: ‘rook takes knight pawn’ = Rxc2, I believe, since black has already taken the knight pawn in the third move in this sequence.]

Episode 9. He can’t… the only square he can come to is this king 7, this e7, so king to there. Hmm… tactically would it be better just to play a nice quiet move to start with by playing king to bishop 1? He can’t advance anything. Pawn, hmm… hmm… Right, I shall… if we just play it: king to bishop 1, it’s his move. What could he do? I suppose he could play rook to king 2 but… yeah, so if you play king to bishop 1, you play rook to king 2, he then could have pawn to queen bishop 3 which… the rook would then protect the pawn and he could then try and advance down the centre. [9:47]

Episode 10. Umm… yes, so I think we’re back to the original idea of attacking that pawn immediately. Rook to queen knight 1, rook to queen knight 1… Rook to queen knight 1, er… Rook to queen knight 1, rook to queen knight 1, bishop to there, to attack that pawn. [Note: ‘bishop to there’ = Bd7] [10:30]

Episode 11. Hmm… I suppose we could attack that pawn immediately but… Attack, defend, so… if we played it immediately: bishop to queen 2 [Note: ‘bishop to queen 2’ = Bd7] to attack the pawn at knight… his knight pawn… er, his king knight pawn, he could defend it with pawn to bishop 3. You could then, whether you want to immediately, or at a later date, you can play pawn to king bishop 4.
Hmm… I don’t like it – it’s getting a bit messy because he could then play rook to the 7th to attack the pawn. [Note: ‘rook to the 7th’ = Re7] [11:32]

Episode 12. No, I think rook to queen knight 1 – I would play rook to queen knight 1. It doesn’t give him many options, whereas all the other things does give him options. I prefer him not to have many. [11:54]

Table 9; Example verbal protocol (Player 8, Position B)

The corresponding Problem Behaviour Graph (PBG) is constructed row-by-row from each episode in the protocol. Although moves are operators and positions are states, it makes more sense to label the moves and not the board positions, because (a) board positions are uniquely defined by the starting position and the sequence of moves; and (b) board positions are difficult to represent economically in a PBG. Each column of the PBG represents one ply (half-move). Hence all odd-numbered columns represent the player’s moves and the even-numbered columns his opponent’s. Figure 9, below, illustrates a full PBG, including the following common features:

Null moves (depicted by Ø, representing an unspecified move). Note that base moves can be null moves (e.g. Episode 8);
Evaluations (depicted by a combination of symbols at the end of each leaf node: + = good for player on move, ? = unclear/unspecified, – = bad for player on move);
Branching (in Episode 3);
Immediate reinvestigations of the same base move (e.g. Episodes 2-7);
Non-immediate reinvestigation of the same base move (e.g. Episode 10).

The selected move is shown in a separate grey box at the bottom of the PBG.

[Figure 9: grid with one row per episode (E1 to E12) and one column per ply (1 to 6); cell layout not recoverable from the extracted text. Base moves (ply 1): E1 to E7 Rb8; E8 Ø; E9 Kf8; E10 Rb8; E11 Bd7; E12 Rb8. Selected move: Rb8.]

Figure 9; Problem Behaviour Graph for Player 8, Position B

Bibliography

British Chess Federation.
(2003) Conversion Between BCF Grade and FIDE Rating http://www.bcf.org.uk/grading/how_it_works/conversion.htm (last accessed 20th March 2005)

Chessbase GmbH (1997), Fritz5 user’s manual, Hamburg, Chessbase GmbH.

Calderwood, R., Klein, G. A. and Crandall, B. (1988), Time pressure, skill and move quality in chess, American Journal of Psychology, Vol. 101, No. 4.

de Groot, A. (1965), Thought and Choice in Chess, USA, Basic Books Inc.

Fransella, F., Bell, R. and Bannister, D. (2004), A Manual for Repertory Grid Technique, Chichester, John Wiley & Sons Ltd.

Gobet, F. (1998a), Chess players’ thinking revisited, Swiss Journal of Psychology, Vol. 57, pp18-32.

Gobet, F. (1998b), Expert memory: a comparison of four theories, Cognition, Vol. 66, pp115-52.

Gobet, F. (2005), personal e-mail communication.

Gobet, F., de Voogt, A. and Retschitzki, J. (2004), Moves in Mind: the psychology of board games, Hove, Psychology Press.

Gobet, F. and Simon, H. A. (1996), The Roles of Recognition Processes and Look-Ahead Search in Time-Constrained Expert Problem Solving: Evidence from Grand-Master-Level Chess, Psychological Science, Vol. 7, No. 1.

Gobet, F. and Simon, H. A. (1998a), Expert Chess Memory: Revisiting the Chunking Hypothesis, Memory, Vol. 6, pp225-55.

Gobet, F. and Simon, H. A. (1998b), Pattern recognition makes search possible: Comments on Holding (1992), Psychological Research, Vol. 61, pp204-8.

Holding, D. H. (1985), The Psychology of Chess Skill, USA, Lawrence Erlbaum Associates.

Holding, D. H. and Pfau, H. D., Thinking ahead in chess, American Journal of Psychology, Vol. 98, No. 2.

Howell, D. C. (2002), Statistical Methods for Psychology, USA, Duxbury Thomson Learning.

Newell, A. and Simon, H. A. (1972), Human Problem Solving, USA, Prentice-Hall.