Realtime Online Solving of Quantified CSPs⋆
David Stynes and Kenneth N. Brown
Cork Constraint Computation Centre,
Dept of Computer Science, UCC, Cork, Ireland
[email protected], [email protected]
⋆ This work was supported in part by Microsoft Research through the European PhD
Scholarship Programme, and by the Embark Initiative of the Irish Research Council
for Science, Engineering and Technology (IRCSET).
Abstract. We define Realtime Online solving of QCSPs, in which Quantified
Constraint Satisfaction Problems (QCSPs) are solved interactively. We consider
cases where the other participant in the problem is adversarial, benevolent, or
picking randomly. For an adversarial opponent, we develop an approach based on
alpha-beta pruning, and combine it with value ordering heuristics. We show that
this can give noticeable improvements in performance on tough problems. For
non-adversarial opponents we show that we can frequently achieve solutions even
on problems which lack a winning strategy, and that we can improve our success
rate by using a lower level of consistency than AC to maximise pruning without
removing solutions.
1 Introduction
Quantified Constraint Satisfaction Problems (QCSPs) [1] are a generalization of
CSPs which can be used to model many PSPACE-complete problems, including
adversarial game playing, planning under uncertainty, and interactive design. In
QCSPs, the variables are in a fixed sequence, and each can be either existentially
or universally quantified. As defined by [2], a solution to a QCSP is a tuple of
values, one for each variable, such that the constraints are satisfied. A Winning
Strategy is a family of functions which specify the value of each existential
variable based on the values of the universal variables which precede it in the
quantifier sequence; every tuple defined by the winning strategy is a solution. A
QCSP is satisfiable iff a winning strategy exists.
In an Online CSP [3], some variables of a problem may not be under the
control of the solver. These uncontrolled variables have their value assignments
set externally. We assume all of the variables are arranged in a fixed sequence
of sets of controlled variables alternating with sets of uncontrolled variables.
We must assign values to the first set of controlled variables, then observe the
assignment to the following set of uncontrolled variables, and then continue with
the next set of controlled variables, and so on. The aim is to achieve a satisfying
complete assignment to the variables. We can regard QCSP as a model of the
Online CSP problem, where the existential variables represent the controlled
variables, and the universal variables represent the uncontrolled variables. If the
QCSP has a winning strategy, then we can guarantee to find a solution to the
Online CSP.
In practice, for many possible applications, such as Mixed Initiative Planning
[4] or adversarial games, we may have limited time to make each decision, and
searching for a winning strategy for a large problem may be infeasible. In a
Realtime Online CSP, the solver has limited time to decide the values for each
controlled variable, and thus searching for a winning strategy for the QCSP may
not be feasible. Instead, we must search for a partial strategy which we expect
will lead to a solution. The model then becomes one of game-tree search, in which
a player must make choices for each move by anticipating the likely choices of
the other player.
In this paper, we develop the use of AI game-tree search strategies for realtime
online solving of QCSPs. We consider three types of possible opponents:
(i) adversarial opponents, who seek to prevent us finding a solution, (ii) random
opponents, who select values randomly, and (iii) benevolent opponents, who are
also trying to find a solution. We develop a version of alpha-beta pruning with
value ordering and show that it is most effective for larger, more difficult
problems against adversarial opponents. We demonstrate that against
non-adversarial opponents we can frequently achieve solutions even when no
winning strategy exists, and we show that a reduced level of consistency can
achieve noticeably better results than AC when faced with these non-adversarial
opponents.
2 Background

2.1 QCSPs
A Quantified Constraint Satisfaction Problem (QCSP) is a formula ΘC, where
Θ is a sequence of quantified variables Q1 x1 ...Qn xn , each Qi is either ∃ or ∀,
and each variable xi occurs exactly once in the sequence. C is a conjunction of
constraints (c1 ∧ ... ∧ cm ) where each ci constrains a subset of {x1 , ..., xn }.
If Θ is of the form ∃x1 Θ1 then ΘC is satisfiable if and only if there exists
some value a ∈ D(x1 ) such that (Θ1 C|x1 =a) is satisfiable. If Θ is of the form
∀x1 Θ1 then ΘC is satisfiable if and only if for every value a ∈ D(x1 ), (Θ1 C|x1 =a)
is satisfiable. If Θ is empty, then there are no unassigned variables left in C, so
C is simply a conjunction of ground logical statements, and ΘC is satisfiable if
and only if C is true.
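This recursive definition translates directly into code. Below is a minimal
Python sketch, assuming quantifiers is the quantifier sequence as ('E' or 'A',
variable) pairs, domains maps each variable to its values, and consistent tests
a complete assignment against C; all names are illustrative rather than taken
from any solver.

def satisfiable(quantifiers, domains, consistent, assignment=None):
    assignment = assignment or {}
    if not quantifiers:                  # Theta is empty: C is ground
        return consistent(assignment)
    q, x = quantifiers[0]
    results = (satisfiable(quantifiers[1:], domains, consistent,
                           {**assignment, x: a}) for a in domains[x])
    # Existential: some value must work; universal: every value must work.
    return any(results) if q == 'E' else all(results)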
A Solution to a QCSP P is a tuple t such that |t| = n, ∀i, ti ∈ Di, and if we
assign xi = ti for all i, all constraints in C are satisfied. A Strategy S is a
family S = {sxi | Qi = ∃} of functions of the following type: for each xi which is
existential, function sxi associates a value in D(xi) to each tuple of values of the preceding
universal variables. The set of Scenarios of a strategy S for a QCSP P , denoted
sce(S) is the set of all tuples t which are such that, for each i ∈ 1...n, we have:
if Qi = ∃ then txi = sxi (t ↓Ai−1 ), where Ai−1 is the set of universal variables
preceding xi . A strategy S is a Winning Strategy for a QCSP P iff every tuple
t in sce(S) satisfies the constraints of P , i.e. every scenario is a solution to P .
A QCSP is satisfiable iff there exists a winning strategy. Some work refers to a
winning strategy as a solution to a QCSP [5]. The distinction between the two is
important, however, as a QCSP may have solutions even if no winning strategy
exists.
Interest in QCSPs has been growing, with a number of efficient solvers for
finding winning strategies now in existence. [1] and [6] extended CSP propagation
algorithms to QCSPs. QCSP-Solve [7] extends standard CSP search with features
from Quantified Boolean Formulae (QBF) solvers, incorporated to improve
performance. Block-Solve [5] is a bottom-up solver for QCSPs which finds partial
solutions to the last block of the QCSP and attempts to extend them to full
solutions. QeCode [8] is a QCSP solver which reuses a classical CSP solver
(Gecode) for the quantified case. QeCode also accepts QCSP+ problems [9], which
are QCSPs with restricted quantification, making real problems easier to model.
Strategic CSPs [10] also extend QCSPs to simplify the modeling of problems, with
universal variables adapting their domains to be compatible with previous
choices, while [11] extends techniques for problem relaxation and explanation to
QCSPs. [12] developed propagation algorithms for non-binary QCSPs.
In binary QCSPs, two types of constraints can be removed during preprocessing
[7]. For ∃∀-constraints, any value of xi that conflicts with any value of xj can
be removed in preprocessing, and the constraint can then be ignored. Similarly,
∀∀-constraints are checked only during preprocessing (and any conflicting pair of
values means the QCSP has no solution). Only ∃∃-constraints and ∀∃-constraints
need be considered throughout search.
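As an illustration of this preprocessing, here is a hedged Python sketch. It
assumes binary constraints are stored as (xi, xj, allowed) triples with xi
earlier in the quantifier sequence, allowed being the set of compatible value
pairs, and quant mapping each variable to 'E' or 'A'; the representation is
ours, not from [7].

def preprocess(constraints, domains, quant):
    remaining = []
    for xi, xj, allowed in constraints:
        if quant[xi] == 'E' and quant[xj] == 'A':
            # Exists-forall: keep only values of xi compatible with every
            # value of xj, then drop the constraint.
            domains[xi] = [a for a in domains[xi]
                           if all((a, b) in allowed for b in domains[xj])]
            if not domains[xi]:
                return None              # domain wipeout
        elif quant[xi] == 'A' and quant[xj] == 'A':
            # Forall-forall: any conflicting pair means no winning strategy.
            if any((a, b) not in allowed
                   for a in domains[xi] for b in domains[xj]):
                return None
        else:
            remaining.append((xi, xj, allowed))  # EE and AE kept for search
    return remaining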
Gent et al [7] define the Pure Value Rule, derived from the Pure Literal Rule
in QBF [13]: a value a ∈ D(xi) of a QCSP is pure iff ∀xj ∈ Q, where xi ≠ xj,
and ∀b ∈ D(xj), the assignments (xi, a) and (xj, b) are compatible. The Pure
Value Rule, in their solver QCSP-Solve, detects pure values before and during
search. When testing for a pure value, only future variables need to be checked
against, as FC/MAC ensure compatibility with the previous variables. If a pure
value a of an existentially quantified variable xi is discovered during
preprocessing (search), then the assignment (xi, a) is made and all other values
in D(xi) are permanently (temporarily) removed. If a pure value a of a
universally quantified variable xi is discovered during preprocessing (search),
then a is permanently (temporarily) removed from D(xi). If all values of a
universal are pure, QCSP-Solve effectively ignores that variable. Pure values
are also useful for reducing the search space in Realtime Online solving of
QCSPs.
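A pure value test can be sketched as follows; neighbours(xi), yielding each
constrained variable with the set of allowed pairs, is an assumed helper, and
(as in the paper) only future variables need checking, since FC/MAC ensure
compatibility with past ones.

def is_pure(xi, a, future_vars, domains, neighbours):
    # a is pure iff (xi, a) is compatible with every value of every
    # future variable that xi is constrained with.
    for xj, allowed in neighbours(xi):
        if xj in future_vars and any((a, b) not in allowed
                                     for b in domains[xj]):
            return False
    return True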
2.2 Online CSP
Dynamic Constraint Satisfaction [14–16, 3] considers problems that change over
time, by the addition, retraction or modification of variables and domains. Formalisms that specifically focus on problems that progress by the assignment of
values to variables include Mixed CSP [17] and Stochastic CSP [18], although
in the latter case we associate probability distributions with the assignment of
each uncontrolled variable. Bent and van Hentenryck [19] have considered sampling methods for solving Online Stochastic Optimization problems under time
constraints, in which problems are gradually revealed, although again with an
assumption that there is some model of the likely growth. Finally, Grötschel et
al. [20] consider the general problem of Realtime Online Combinatorial
Optimization.
2.3 Game-Tree Search
In AI game playing [21], the game is represented as a tree of game states.
Players take turns making moves, each of which is a transition from one state in
the tree to another. Players perform game-tree search to determine their best
option, but it is typically infeasible for a player to search the entire tree.
Thus, players are forced to form estimates of which move will lead them to a
win. In practice, a subtree of limited depth is generated, the leaf states are
evaluated according to a problem-specific heuristic function, and these values
are then propagated back up the tree to the current root state in order for the
player to make a decision. For adversarial zero-sum games, most algorithms for
performing this propagation are based upon the minimax heuristic [21], which
states that a player should make the choice which minimises the (estimated)
maximum that the opponent can achieve. Since the game is zero-sum, a move which
is good for the player is bad for its opponent and vice versa. When propagating
up the tree, a state in which it is the player's move will therefore take the
value of its child with the highest estimate, since that is the minimum for the
opponent. Conversely, a state in which it is the opponent's move will take the
value of its child with the lowest estimate. In this way values can be assigned
to all the states in the subtree based upon the estimates made at the leaf
states. The player then makes the move to the child state of the root which has
the highest value. A variation of minimax designed for general-sum games,
Maximax [22], assumes each player will attempt to maximise its own objective.
[22] applied game-tree search in what they call Adversarial CSP, in which
solving agents take turns to choose instantiations for variables in a shared
CSP.
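For concreteness, this propagation is the classic minimax recursion; a minimal
Python sketch, where children and evaluate (the problem-specific leaf heuristic)
are assumed helpers:

def minimax(state, depth, maximising, children, evaluate):
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)           # leaf: heuristic estimate
    values = [minimax(k, depth - 1, not maximising, children, evaluate)
              for k in kids]
    # A player-to-move state takes its highest child estimate,
    # an opponent-to-move state takes its lowest.
    return max(values) if maximising else min(values)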
3 Realtime Online Solving of QCSPs
As noted previously, when performing realtime online solving of a QCSP, we
assume that finding a winning strategy within the time limit is initially
infeasible. As we assign values and approach the final variable, it eventually
becomes possible to find a winning strategy, or to detect that none exists, from
the current variable. While it is still infeasible to search for a winning
strategy, we can instead generate partial strategies and pick what we believe to
be the best one once the time limit is reached. The method of generating these
partial strategies and choosing between them is analogous to AI game-tree
search. We can view the QCSP as a two-person game, in which the player is
responsible for the existentially quantified variables and the opponent controls
the universal variables. The existential player is seeking to find a solution,
while the universal opponent may be attempting to cause a domain wipeout,
picking randomly, or attempting to help form
a solution. When a variable is reached, a player spends its limited time
performing game-tree search, looking ahead at its future choices and the choices
of its opponent, effectively exploring a subtree of the tree of all possible
game states that can follow the current position. The player estimates how
likely it is that the leaf nodes of this subtree can be extended to a full
solution, and then propagates the estimates back up to the root node in order to
select the best move for achieving its individual objective. The opponent also
performs a lookahead search while the player is making its move, to plan for its
own next move.
We use constraint propagation to improve the lookahead search, allowing the
player to backtrack out of dead ends sooner. We use QCSP Arc Consistency (AC)
[1], in which pruning any value from a universal domain is also considered a
domain wipeout, and QCSP Forward Checking (FC)¹. Our aim is to achieve a
solution, but our choice of consistency can be detrimental to this. When AC
reports a domain wipeout caused by pruning a universal, it means that no
winning strategy exists to extend the current partial solution to a full
solution. However, the other values in the universal domain which were not the
cause of the wipeout may still be extensible to full solutions. Thus possible
solutions are sometimes lost by using AC, which is undesirable when facing
non-adversarial opponents. When facing capable adversarial opponents, retaining
solutions which are known not to be part of a winning strategy is inadvisable:
if the opponent is aware of them, it can force a domain wipeout before we can
reach those solutions. When using FC, solutions are not pruned, since only
∃∃-constraints and ∀∃-constraints are considered (other types being removed in
preprocessing), but the weaker propagation compared to AC often results in
picking values which cannot be extended to a solution.

¹ Since only constraints of types ∃∃ and ∀∃ need be considered, QCSP FC acts
only upon existential domains and no modification of CSP FC [23] is necessary.
To overcome these weaknesses of AC and FC, we define a new level of consistency
for QCSPs, Existential Arc Consistency (EAC). A QCSP is existential arc
consistent if all of the existentially quantified domains are arc consistent.
That is, for all values v ∈ D(i) where Qi is ∃, and for all constraints Cij,
i ≠ j, there exists some value w ∈ D(j) such that (i = v, j = w) satisfies Cij.
To implement EAC, ∃∃-constraints are treated as in normal AC, but
∀∃-constraints are only ever forward checked (when the universal is assigned a
value); they are never added to the constraint queue when the existential domain
is altered. EAC will not remove solutions from the problem, and achieves more
pruning than FC. If AC on a state returns a domain wipeout and EAC on the same
state does not, it means a constraint Cae on variables ∀a∃e caused a value
v ∈ D(a) to be pruned as part of AC because no value x ∈ D(e) existed which
supports v in
Cae. For the case of EAC, when it is variable a's turn to be assigned a value,
it is then possible to detect using FC that assigning value v causes a domain
wipeout. Against an adversarial opponent, it is therefore never a good idea to use
EAC instead of AC. Any value which EAC makes available but which AC would have
pruned should never be selected, as it is trivial for the opponent to force a
domain wipeout once it is chosen. Against non-adversarial opponents, retaining
all solutions by using EAC instead of AC can be beneficial.
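The EAC propagation might be implemented along these lines. This is a hedged
sketch in the AC-3 style, assuming ee_arcs holds both directed arcs of every
∃∃-constraint and revise(domains, xi, xj) removes values of xi with no support
in D(xj), reporting whether it pruned anything:

def eac(domains, ee_arcs, revise):
    # Only EE-arcs ever enter the queue; AE-constraints are handled solely
    # by forward checking when their universal variable is assigned.
    queue = list(ee_arcs)
    while queue:
        xi, xj = queue.pop()
        if revise(domains, xi, xj):
            if not domains[xi]:
                return False             # existential domain wipeout
            # Re-examine arcs whose support may have been lost. Since no
            # AE-arc is queued, no universal value is pruned and no
            # solution is removed from the problem.
            queue.extend((xk, xl) for (xk, xl) in ee_arcs if xl == xi)
    return True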
We initially focus on adversarial universal opponents, who are seeking to cause
a domain wipeout. For all lookahead methods except alpha-beta pruning, when a
node is explored the quality of all of its children is estimated and the node's
estimation is updated to be the best of its children's estimations. We use
minimax reasoning to decide which child's estimate is best. On an existential
variable, the estimate most likely to lead to a solution is taken to be the
best; on a universal variable, the estimate most likely to lead to a domain
wipeout is taken to be the best. The node's updated estimation is then
propagated back up the tree, again using minimax, until the current root node
has been updated.
The default method we use for estimating a child state is Dynamic Geelen's
Promise (DGP), the product of the future existential domain sizes, which was
shown to be the most effective solution-focused value ordering heuristic for
top-down QCSP solving [24]. However, DGP is only effective at comparing states
for the same variable. Very often we will not actually be comparing states for
the same variable, as the search will have propagated up estimations from a
later variable. For example, if we are comparing a state at the second-last
existential variable (where the product of the future domains is a single
value) to one at the tenth-last existential variable (where the product of the
future domains is nine values multiplied together), it is clear that these two
estimations cannot be used to choose between the corresponding states. To
compare such states we derived two new heuristics based upon DGP, both of which
give the same relative ordering as DGP for states at the same variable. The
Geometric Mean (GM) is calculated by multiplying the future existential domain
sizes and taking the nth root of the resulting product (where n is the number
of future existential domains). The Proportional Promise (PP) takes the DGP as
a proportion of the maximum possible DGP if no pruning of the future domains
had ever occurred: it is the product of the future existential domain sizes
divided by the product of the original sizes of those domains before any values
were pruned. Lookahead methods not explicitly shown as using PP or GM are using
DGP.
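The three estimates can be stated compactly. A sketch of our reading of the
definitions, where future holds the current future existential domain sizes and
original their unpruned sizes:

from math import prod

def dgp(future):
    return prod(future)                  # Dynamic Geelen's Promise

def gm(future):
    n = len(future)                      # Geometric Mean: nth root of DGP
    return prod(future) ** (1.0 / n) if n else 1.0

def pp(future, original):
    # Proportional Promise: DGP as a fraction of the maximum possible DGP.
    return prod(future) / prod(original) if future else 1.0

For a fixed set of future variables, gm and pp are monotone transformations of
dgp, so they preserve DGP's relative ordering at the same variable while
remaining comparable across variables.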
Depth-First (DF) is an uninformed search which at each node picks a child to
explore, until it reaches a leaf node. It then backtracks, returning to the most
recent node whose children it had not finished exploring. It is implemented as a
LIFO (Last In First Out) stack: when a node is explored all its children are
added to the stack, and then the last node added to the stack is removed and
explored next. This continues until the stack is emptied (the entire tree has
been explored) or the time limit is reached. Intelligent Depth-First (IDF) with
PP is an improved form of DF which orders the children using the PP heuristic
before adding them to the stack, so that when they are removed from the stack
it is in order from best to worst (using minimax reasoning to decide which is
best).
Breadth-First (BF) starts at the root and explores all of its children, then
the children of those children, and so on, until the time limit has been
reached or the entire tree is explored. It is implemented as a FIFO (First In
First Out) queue: when a node is explored all its children are added to the end
of the queue, and then the node at the start of the queue is removed and
explored next. This continues until the queue is emptied (the entire tree has
been explored) or the time limit is reached.
Partial Best-First (PBF) is a modification of Best-First lookahead. In
Best-First lookahead, an ordered list of nodes is maintained; the best node is
removed from the list and its children are estimated and then added into the
ordered list. However, this does not work well for exploring a QCSP with an
adversarial opponent, as the best move for a universal variable is the worst
move for an existential variable and vice versa. A lot of time would be wasted
exploring down the subtree of a highly estimated universal value which would
never be picked by the universal player, who favours lower estimates. PBF works
by adding all the children of existential nodes to the ordered list, but for a
universal variable the best child is immediately explored and the remaining
children have their estimations negated before also being added to the ordered
list. Thus these other universal values are only explored once all existential
nodes have been explored. We experimented with both DGP and PP versions of PBF
and neither was consistently superior to the other, so both are included in our
results.
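A hedged sketch of PBF, with estimate, children and is_existential as assumed
helpers; a heap stands in for the ordered list, and since heapq pops the
smallest entry, estimates are stored negated to get best-first order:

import heapq
import time

def partial_best_first(root, time_limit, estimate, children, is_existential):
    deadline = time.monotonic() + time_limit
    frontier = [(-estimate(root), id(root), root)]   # id() breaks ties
    while frontier and time.monotonic() < deadline:
        _, _, state = heapq.heappop(frontier)
        kids = children(state)
        if is_existential(state):
            for k in kids:               # every child joins the ordered list
                heapq.heappush(frontier, (-estimate(k), id(k), k))
        elif kids:
            # Universal node: the opponent's best (lowest-estimate) child is
            # explored immediately; the remaining children have their
            # estimations negated (pushed with a positive key in this
            # min-heap of negated keys), so they surface only after all
            # existential nodes.
            best = min(kids, key=estimate)
            heapq.heappush(frontier, (float('-inf'), id(best), best))
            for k in kids:
                if k is not best:
                    heapq.heappush(frontier, (estimate(k), id(k), k))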
Alpha Beta Pruning [25] (AB) performs a depth-first lookahead, but additionally
maintains two values, alpha and beta, which represent the minimum score that
the maximizing player is assured of and the maximum score that the minimizing
player is assured of, respectively. It stops completely evaluating a node and
prunes its subtree from the search space when at least one child has been found
that proves the node to be worse than a previously examined neighboring node.
This pruning is based upon the assumption that both players use minimax
reasoning when selecting their moves. We perform an iterative-deepening form of
AB which searches to a fixed depth limit and then increases that depth limit
before performing AB lookahead again. We continue until we have searched to the
final variable (maximum depth) or we have run out of time. In our tests we set
the initial depth limit to 2 and at each iteration we increase the depth limit
by 1. With specific knowledge of the time limits and the problems being solved,
it would be possible to improve the performance of AB beyond what is shown in
our tests by setting a deeper initial depth. Dynamic Geelen's Promise Alpha
Beta Pruning (DGP AB) is the same as AB except that instead of picking the
lexicographically first child node, we use DGP value ordering to pick what
appears to be the best choice for that variable. The intention is to pick
values which allow more of the search space to be pruned, giving us more time
to search to deeper depth limits.
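A compact sketch of the iterative-deepening alpha-beta lookahead described
here, with optional DGP ordering of children; children, estimate (the leaf
heuristic) and order (e.g. the dgp function above, or None for plain AB) are
assumed helpers:

import time

def alphabeta(state, depth, alpha, beta, maximising, children, estimate, order):
    kids = children(state)
    if depth == 0 or not kids:
        return estimate(state)
    if order:                            # DGP AB: try promising values first
        kids = sorted(kids, key=order, reverse=maximising)
    if maximising:
        value = float('-inf')
        for k in kids:
            value = max(value, alphabeta(k, depth - 1, alpha, beta, False,
                                         children, estimate, order))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                    # beta cutoff: prune the rest
        return value
    value = float('inf')
    for k in kids:
        value = min(value, alphabeta(k, depth - 1, alpha, beta, True,
                                     children, estimate, order))
        beta = min(beta, value)
        if alpha >= beta:
            break                        # alpha cutoff
    return value

def iterative_deepening(root, time_limit, children, estimate, order=None):
    deadline = time.monotonic() + time_limit
    depth, best = 2, None                # initial depth limit of 2
    while time.monotonic() < deadline:   # a full version also stops at max depth
        best = alphabeta(root, depth, float('-inf'), float('inf'), True,
                         children, estimate, order)
        depth += 1                       # deepen by 1 each iteration
    return best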
We identify pure values during lookahead search, and if one is found for an
existential variable we assign the variable that value, since a value which
does not prune any values from the future domains is most likely to lead to a
solution. Universally quantified pure values are pruned from the domain of the
universal variable, unless one is the final value in the domain, in which case
we leave it present in order not to cause a domain wipeout. These values can
safely be pruned from the universal's domain because they are definitely the
worst possible move for an adversarial universal opponent: selecting one would
maximise the size of the future domains by not pruning anything from them. For
all types of opponents this behaviour is correct for the existential values,
but the universal behaviour must change depending on the type of opponent. If
the opponent is picking randomly, no pure values can be pruned from universal
domains, since the randomly picking opponent may still select them. If the
opponent is benevolently helping us reach a solution, then a universal pure
value can be treated like an existential pure value and can be instantly
assigned to the variable without investigating the other values.
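The opponent-dependent handling might be dispatched as in this illustrative
sketch (quant maps variables to 'E'/'A'; the opponent labels are ours):

def handle_pure_value(var, value, quant, domains, opponent):
    if quant[var] == 'E':
        domains[var] = [value]           # assign: it prunes nothing ahead
    elif opponent == 'adversarial':
        if len(domains[var]) > 1:        # worst possible adversarial move;
            domains[var].remove(value)   # keep the last value to avoid wipeout
    elif opponent == 'benevolent':
        domains[var] = [value]           # treat like an existential pure value
    # 'random': leave universal domains untouched; it may still be picked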
4 Experiments
We tested on randomly generated binary QCSPs with fully interleaved
quantifiers, using the flawless generator described in [24], which prevents the
flaws described in [26]. For each value of Q∃∃ we generated 50 random problems.
We aim to achieve a solution, considering any length of partial assignment that
leads to a domain wipeout as a failure. We omit some of the lookahead methods
from our graphs for clarity. Both participants receive the same time limit per
move unless explicitly stated otherwise. We ran all tests on a single PC with
both the existential participant and the universal participant performing
lookahead in parallel. As a result, the actual CPU time each receives per move
is approximately half of the time limit, since they are sharing a single CPU.
In our implementation, we store states as we explore them during lookahead,
under the assumption that the time limit is short enough that we cannot run out
of memory space during it. If the time limits are longer, less saving of states
and more recomputing may be necessary.
Against an adversarial opponent, both participants use AC for propagation.
Figure 1 shows the performance of the different lookahead methods on
20-variable problems against a universal opponent using a complete alpha-beta
pruning lookahead that searches to the end of the search tree with no time
limit. Existential players still have the same time limit for looking ahead
during the universal player's moves as they do on their own moves, so they
cannot also perform a complete lookahead. All remaining figures are against
incomplete opponents who share the same time limit as the existential player.
Figure 2 shows our performance against a universal opponent using depth-first
lookahead, also on 20-variable problems. Against such a poor universal opponent
we can sometimes achieve a solution even when the QCSP has no winning strategy.

[Fig. 1. Against a universal using alpha-beta with no time limit. Solutions vs
Q∃∃ for DF, IDF PP, PBF PP, AB and DGP AB, with the number of problems having a
winning strategy. n = 20, b = 1, d = 8, p = 0.20, q∀∃ = 1/2, TimeLimit = 1000ms]

[Fig. 2. Against a universal using depth-first lookahead with a time limit.
Solutions vs Q∃∃, same series as Fig. 1. n = 20, b = 1, d = 8, p = 0.20,
q∀∃ = 1/2, TimeLimit = 1000ms]

We find that on these smaller, easier problems, AB often outperforms DGP AB:
there is not much scope to improve the amount of pruning done by AB, so the
added value ordering's overhead causes it to perform more poorly. PBF PP
performs very well on these smaller problems; however, as Figure 3 (on
51-variable problems against a depth-first opponent) shows, it does not scale
well at all. On the other hand, as the problems' difficulty increases, DGP AB
becomes an increasingly better choice. The amount of pruning possible is
greatly increased by the value ordering, and significantly deeper levels can be
explored than AB reaches without value ordering. Figure 4 is against a
universal using DGP AB, again on 51-variable problems. Even as the universal
opponent is improved, DGP AB maintains its lead in performance.

[Fig. 3. Against a universal using depth-first lookahead with a time limit.
Solutions vs Q∃∃ for DF, IDF PP, PBF PP, AB and DGP AB. n = 51, b = 1, d = 8,
p = 0.20, q∀∃ = 1/2, TimeLimit = 1000ms]

[Fig. 4. Against a universal using DGP alpha-beta with a time limit. Solutions
vs Q∃∃ for DF, IDF PP, PBF PP, AB and DGP AB. n = 51, b = 1, d = 8, p = 0.20,
q∀∃ = 1/2, TimeLimit = 1000ms]
5 Non-Adversarial Universal Participants
We also test the lookahead methods against universal participants who pick
either randomly or benevolently. For these tests we reduced the time limit to
500ms, as it is significantly easier to do well against a non-adversarial
opponent and a limit of 1000ms makes differences between the methods less
visible. We also increase the number of test cases to 100, as a randomly
choosing opponent introduces more variance into the results and a larger sample
reduces this variance. For a randomly picking universal, we introduce a new
means of propagating the results of the lookahead back up to the root node,
which we term the Weighted Estimate (WE). On existential variables, it performs
as minimax. On
universal variables, the GM or PP evaluation of each child is divided by the
number of children and all of these values are summed to provide the new
estimate for the universal variable. This value is then normalised to let us
distinguish between children which potentially have domain wipeouts and those
which have none at all, so that one can choose values that minimise the chance
of a random opponent causing a domain wipeout. Figure 5 shows our results with
both participants using AC, run across 100 test cases; we see that those using
WE achieve solutions significantly more often than the number of problems which
had a winning strategy. Figure 6 shows the same tests but using EAC for
propagation for both participants. We observe moderate improvements to the
performance of the lookahead methods using WE, as we would expect when using a
propagation algorithm which does not remove solutions against a non-adversarial
opponent. Modifying DGP AB to pick randomly when it thinks all choices lead to
a domain wipeout also provides very favourable results, as seen in Figures 5
and 6.
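One plausible reading of WE in code is sketched below. The exact normalisation
is not spelled out here, so the zero-contribution treatment of wiped-out
children is an assumption, chosen only to rank any potential-wipeout node below
an otherwise-equal wipeout-free one:

def weighted_estimate(child_estimates, existential, wipeout_children=0):
    # child_estimates holds the GM or PP evaluations of surviving children;
    # wipeout_children counts children that led to a domain wipeout.
    if existential:
        return max(child_estimates)      # our own moves: plain minimax
    n = len(child_estimates) + wipeout_children
    # Equal weight per child: the expected estimate under uniformly random
    # play. Wiped-out children contribute 0, lowering the estimate.
    return sum(child_estimates) / n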
[Fig. 5. Against a universal picking randomly, using AC. Solutions vs Q∃∃ for
DF, BF, AB, DGP AB, BF PP WE and PBF GM WE, with the number of problems having
a winning strategy. n = 20, b = 1, d = 8, p = 0.20, q∀∃ = 1/2, TimeLimit = 500ms]

[Fig. 6. Against a universal picking randomly, using EAC. Solutions vs Q∃∃,
same series as Fig. 5. n = 20, b = 1, d = 8, p = 0.20, q∀∃ = 1/2,
TimeLimit = 500ms]

For a benevolent opponent, we use maximax (MM) [22], where one picks values
which are most likely to lead to a solution on both existentially and
universally quantified variables. Figure 7 shows results using AC propagation
against a universal using breadth-first lookahead with maximax. We see that
almost all lookahead methods using maximax perform very well, with no clearly
discernible best method. Once again we achieve significantly more solutions
than the number of problems which had a winning strategy. By swapping
propagation to EAC, as shown in Figure 8, we achieve even more solutions. All
lookahead methods receive a large increase in the number of solutions achieved,
with BF using MM and DGP AB doing best at low and mid-range values of Q∃∃, and
IDF PP MM performing best at many high Q∃∃ values.
[Fig. 7. Against a universal using breadth-first lookahead with maximax, using
AC. Solutions vs Q∃∃ for DF, AB, DGP AB, BF MM, PBF PP MM and IDF PP MM, with
the number of problems having a winning strategy. n = 20, b = 1, d = 8,
p = 0.20, q∀∃ = 1/2, TimeLimit = 500ms]

[Fig. 8. Against a universal using breadth-first lookahead with maximax, using
EAC. Solutions vs Q∃∃, same series as Fig. 7. n = 20, b = 1, d = 8, p = 0.20,
q∀∃ = 1/2, TimeLimit = 500ms]
6 Conclusions and Future Work
We have developed a system for Realtime Online solving of QCSPs, proposing and
developing various lookahead methods to improve the choices we make. We have
shown that against an adversarial opponent, DGP AB is the most effective
lookahead method on larger problems. We also showed that against
non-adversarial opponents we can frequently expect to form a solution even when
there is no winning strategy, and that we can improve our success rate by
dropping from AC to existential arc consistency, maximising pruning without
removing solutions. For future work, we intend to formalise the notions of
Realtime Online CSPs and QCSPs and to extend the current framework to cover
constraints of arity greater than two. We also intend to add support for soft
constraints, allowing comparisons of solution quality within this framework
instead of absolute failure or success.
References
1. Bordeaux, L., Monfroy, E.: Beyond NP: Arc-consistency for quantified constraints.
In: Proceedings of CP. (2002) 371–386
2. Bordeaux, L., Cadoli, M., Mancini, T.: CSP Properties for Quantified Constraints:
Definitions and Complexity. In: Proceedings of AAAI. (2005) 360–365
3. Brown, K.N., Miguel, I.: Uncertainty and Change. Chapter 21 of Handbook of
Constraint Programming (2006) 731–760
4. Finzi, A., Orlandini, A.: A mixed-initiative approach to human-robot interaction
in rescue scenarios. In: Workshop on Mixed-Initiative Planning And Scheduling,
ICAPS. (2005) 36–43
5. Bessiere, C., Verger, G.: Blocksolve: A bottom-up approach for solving quantified
CSPs. In: Proceedings of CP. (2006) 635–649
6. Mamoulis, N., Stergiou, K.: Algorithms for quantified constraint satisfaction problems. In: Proceedings of CP. (2004) 752–756
7. Gent, I.P., Nightingale, P., Stergiou, K.: QCSP-Solve: A solver for quantified
constraint satisfaction problems. In: Proceedings of IJCAI. (2005) 138–143
8. Benedetti, M., Lallouet, A., Vautard, J.: Reusing CSP propagators for QCSPs.
In: Proceedings of Workshop on Constraint Solving and Constraint Logic Programming, CSCLP. (2006) 63–77
9. Benedetti, M., Lallouet, A., Vautard, J.: QCSP made Practical by Virtue of Restricted Quantification. In: Proceedings of IJCAI. (2007) 38–43
10. Bessiere, C., Verger, G.: Strategic constraint satisfaction problems. In: Proceedings
of CP Workshop on Modelling and Reformulation. (2006) 17–29
11. Ferguson, A., O’Sullivan, B.: Quantified Constraint Satisfaction: From Relaxations
to Explanations. In: Proceedings of IJCAI. (2007) 74–79
12. Nightingale, P.: Consistency and the Quantified Constraint Satisfaction Problem.
PhD thesis (2007)
13. Cadoli, M., Schaerf, M., Giovanardi, A., Giovanardi, M.: An algorithm to evaluate
quantified boolean formulae and its experimental evaluation. Journal of Automated
Reasoning 28(2) (2002) 101–142
14. Dechter, R., Dechter, A.: Belief maintenance in dynamic constraint networks. In:
Proceedings of AAAI. (1988) 37–42
15. Bessiere, C.: Arc-consistency in dynamic constraint satisfaction problems. In:
Proceedings of AAAI. (1991) 221–226
16. Schiex, T., Verfaillie, G.: Nogood recording for static and dynamic constraint
satisfaction problems. International Journal of Artificial Intelligence Tools 3(2)
(1994) 187–207
17. Fargier, H., Lang, J., Schiex, T.: Mixed constraint satisfaction: a framework for decision
problems under incomplete knowledge. In: Proceedings of AAAI. (1996) 175–180
18. Walsh, T.: Stochastic constraint programming. In: Proceedings of ECAI. (2002)
111–115
19. Bent, R., van Hentenryck, P.: Regrets only! Online stochastic optimization under time constraints. In: Proceedings of AAAI. (2004) 501–506
20. Grötschel, M., Krumke, S.O., Rambau, J., Winter, T., Zimmermann, U.T.: Combinatorial Online
Optimization in Real Time. Online Optimization of Large Scale Systems (2001)
679–704
21. Shannon, C.E.: Programming a computer for playing chess. Philosophical Magazine (Series 7) 41 (1950) 256–275
22. Brown, K.N., Little, J., Creed, P.J., Freuder, E.C.: Adversarial constraint satisfaction by game-tree search. In: Proceedings of ECAI. (2004) 151–155
23. Haralick, R.M., Elliott, G.L.: Increasing tree search efficiency for constraint satisfaction problems. Artificial Intelligence 14(3) (1980) 263–313
24. Stynes, D., Brown, K.N.: Value Ordering for Quantified CSPs. Accepted for
publication in Constraints (2008)
25. Knuth, D.E., Moore, R.W.: An analysis of alpha-beta pruning. Artificial Intelligence 6(4)
(1975) 293–326
26. Gent, I.P., Nightingale, P., Rowley, A.: Encoding quantified CSPs as quantified
boolean formulae. In: Proceedings of ECAI. (2004) 176–180