Blocking, Monotonicity, and Turing Completeness in a Database

Blocking, Monotonicity, and Turing
Completeness in a Database Language
for Sequences and Streams
Yan-Nei Law, Haixun Wang, Carlo Zaniolo
12/06/2002
Outline
Database Application in ATLaS
Turing Completeness
Data Streams
Blocking and Monotonicity
User Defined Aggregates (UDAs)
Important for decision support, stream
queries and other advanced database
applications.
UDA consists of 3 parts:
INITIALIZE
ITERATE
TERMINATE
Standard aggregate average
AGGREGATE myavg(Next Int) : Real
{ TABLE state(tsum Int, cnt Int);
INITIALIZE : {
INSERT INTO state VALUES (Next, 1);
}
ITERATE : {
UPDATE state
SET tsum=tsum+Next, cnt=cnt+1;
}
TERMINATE : {
INSERT INTO RETURN
SELECT tsum/cnt FROM state;
}
}
Online aggregation
Standard average aggregate returns
results at TERMINATE state.
Online aggregate: Get results before input
all the tuples

e.g. return the average for every 200 input
tuples.
RETURN statements appear in ITERATE
instead of TERMINATE.
Online averages
AGGREGATE online_avg(Next Int) : Real
{ TABLE state(tsum Int, cnt Int);
INITIALIZE : {
INSERT INTO state VALUES (Next, 1);
}
ITERATE: {
UPDATE state
SET tsum=tsum+Next, cnt=cnt+1;
INSERT INTO RETURN
SELECT sum/cnt FROM state
WHERE cnt % 200 = 0;
}
TERMINATE : { }
}
Table from external source
Access tables stored in external source
'C:\mydb\employees'.
TABLE employee(Eno Int, Name Char(18), Sal Real,
Dept Char(6)) SOURCE 'C:\mydb\employees';
SELECT Sex, online_avg(Sal)
FROM employee WHERE Dept=1024 GROUP BY Sex;
Calling other UDAs
UDAs can call other UDAs, including
recursively calling themselves.
Example: Computation of Transitive Closure
(the graph contains no directed cycle).
 Find all the nodes that is reachable from ‘000’.
Transitive Closure
TABLE dgraph(start Char(10), end Char(10)) SOURCE mydb;
AGGREGATE reachable(Rnode Char(10)) : Char(10)
{ INITIALIZE: ITERATE: {
INSERT INTO RETURN VALUES (Rnode)
INSERT INTO RETURN
SELECT reachable(end) FROM dgraph
WHERE start=Rnode;
}
}
SELECT reachable(dgraph.end) FROM dgraph
WHERE dgraph.start='000';
Turing Completeness
Turing Completeness is not a trivial property for
a language.
SQL is not Turing Complete.
A Turing Machine is a tuple M=(Q,,,,q0,!,F).
TM is determined by 4 elements




Transition map ;
Accepting states F;
Input tape ;
Initial state q0.
Turing Machine for L3eq = {anbncn|n≥1}
Transition map:
State\Symbol
a
p[marka]
q,x,1
q[findb]
q,a,1
r[findc]
b
c
r,x,1
r,b,1
s[find!]
x
q,x,1
s,x,1
r,x,1
s,c,1
t[finda]
u,a,-1
u[findx]
u,a,-1
v[check]
Input tape: aaabbbccc
Accepting state: {z}
Initial state: p
t,b,-1
!
t,c,-1
t,!,-1
t,x,-1
v,x,1
p,x,1
v,x,1
z,x,0
Turing Machine for L3eq
INSERT INTO transition VALUES
('p','a',1,'q','x'), ('q','a',1,'q','a'), ('t','a',-1,'u','a'),
('u','a',-1,'u','a'), ('q','b',1,'r','x'), ('r','b',1,'r','b'),
('t','b',-1,'t','b'), ('r','c',1,'s','x'), ('s','c',1,'s','c'),
('t','c',-1,'t','c'), ('q','x',1,'q','x'), ('r','x',1,'r','x'),
('t','x',-1,'t','x'), ('u','x',1,'p','x'), ('v','x',1,'v','x'),
('s','!',-1,'t','!'), ('t','!',1,'v','x'), ('v','!',0,'z','x');
INSERT INTO accept VALUES ('z');
INSERT INTO tape VALUES
('a',0),('a',1), ('a',2), ('b',3), ('b',4),
('b',5), ('c',6), ('c',7),('c',8);
Turing Machine in ATLaS(1)
For each iteration:


pass the current state, current symbol, position on
the tape to UDA called turing.
If  is defined, obtain the next state, next symbol
and movement of the head.
1. current state  next state;
2. current symbol  next symbol;
3. next position = current position+movement.
If  is not defined, TM halts.
check whether current state is an accepting state.
Turing Machine in ATLaS(2)
TABLE current(stat Char(1), symbol Char(1), pos Int);
TABLE tape(symbol Char(1), pos Int);
TABLE transition(curstate Char(1), cursymbol Char(1),
move int, nextstate Char(1), nextsymbol Char(1));
TABLE accept(accept Char(1));
Turing Machine in ATLaS(3)
AGGREGATE turing(stat Char(1), symbol Char(1),
curpos Int) : Int
{ INITIALIZE: ITERATE: {
/*If TM halts, return 1 or 0 (accept or reject)*/
INSERT INTO RETURN
SELECT R.C
FROM (SELECT count(accept) C
FROM accept A
WHERE A.accept = stat) R
WHERE NOT EXISTS (
SELECT * FROM transition T
WHERE stat = T.curstate
AND symbol = T.cursymbol);
Turing Machine in ATLaS(4)
/* write tape */
DELETE FROM tape
WHERE pos = curpos;
INSERT INTO tape
SELECT T.nextsymbol, curpos
FROM transition T
WHERE T.curstate = stat
AND T.cursymbol = symbol;
Turing Machine in ATLaS(5)
/* add blank symbol if necessary */
INSERT INTO tape
SELECT '!', curpos + T.move
FROM transition T
WHERE T.curstate = stat
AND T.cursymbol = symbol
AND NOT EXISTS (
SELECT * FROM tape
WHERE pos = curpos +T.move);
Turing Machine in ATLaS(6)
/* move head to the next position */
INSERT INTO current
SELECT T.nextstate, A.symbol, A.pos
FROM tape A, transition T
WHERE T.curstate = stat
AND T.cursymbol = symbol)
AND A.pos=curpos+T.move;
}
}
Turing Machine in ATLaS(7)
INSERT INTO current
SELECT 'p', A.symbol, 0
FROM tape A WHERE A.pos = 0;
SELECT turing(stat, symbol, pos) FROM current;
Data Streams(1)
For stream processing and continuous
queries, results must be returned promptly,
without waiting for future input.
Non-blocking operators are suitable.
No terminate state is needed.
Data Streams(2)
ATLaS UDAs is a powerful tool for
approximate aggregates, synopses, and
adaptive data mining.
Stream is a source of data instead of
table.
Data from a stream are time-stamped.
Example: Compute the average of long
distance calls placed by each customer.
Data from a stream
STREAM calls(customer_id Int, type Char(6), minutes Int,
Tstamp: Timestamp) SOURCE mystream;
SELECT periodic_avg(S.minutes)
FROM Calls S
WHERE S.type = 'Long Distance‘
GROUP BY S.customer_id
Average of the last 200 inputs
AGGREGATE periodic_avg(Next Int, Time: Timestamp) :
(Real, Timestamp)
{ TABLE state(tsum Int, cnt Int);
INITIALIZE : {
INSERT INTO state VALUES (Next, 1);
}
ITERATE : {
UPDATE state
SET tsum=tsum+Next, cnt=cnt+1;
INSERT INTO RETURN
SELECT tsum/cnt, Time FROM state WHERE cnt % 200 = 0;
UPDATE state
SET tsum=0, cnt=0
WHERE cnt % 200 = 0;
}
}
Average on a five-minute window(1)
AGGREGATE window_avg(Next Int, Time Timestamp):
(Real,Timestamp)
{ TABLE state(tsum Int);
TABLE memo(mNext Int, mTime Timestamp);
INITIALIZE : {
INSERT INTO state VALUES (Next);
INSERT INTO memo VALUES (Next, Time);
}
Average on a five-minute window(2)
ITERATE: {
INSERT INTO memo VALUES (Next, Time);
UPDATE state SET tsum=tsum+Next - (
SELECT sum(mNext) FROM memo
WHERE mTime+ 5 MINUTES < Time);
DELETE * FROM memo
WHERE mTime+ 5 MINUTES < Time;
INSERT INTO RETURN
SELECT tsum, Time FROM state;
}
}
Blocking and Monotonicity
A blocking query operator is a query operator
that is unable to produce the first tuple of the
output until it has seen the entire input

e.g. myavg in Example 1
A non-blocking query operator is one that
produces all the tuples of the output before it has
detected the end of the input.

e.g. online_avg in Example 2
Definition
Sequence of Length n: S=[t1,…,tn].
[ ] has length 0.
Presequence: Sk=[t1,…tk], 0<k≤n.
S  L: Lk = S for some k. ( is partial order.)
For an operator G, Gj(S) is the cumulative output
produced up to step j by G.
blocking: Gj(S) = [ ] for j<n.
non-blocking: Gj(S)= Gj(Sj)=G(Sj).
partial blocking: [ ] < Gj(S) < G(Sj).
ATLaS UDA is



blocking: returns answers only in TERMINATE
state;
non-blocking: returns answers only in
INITIALIZE or ITERATE states;
partially blocking: returns answers in
TERMINATE and other states.
A function F can be computed using a
non-blocking operator if F is monotonic wrt
.
ATLaS is complete wrt non-blocking
computations.
Every monotonic function can be
computed by an ATLaS program with no
TERMINATE state.