
DATA BASES 2 – MARCH 4TH, 2015 – TIME: 2H 30M
PROF. DANIELE M. BRAGA, PROF. STEFANO PARABOSCHI
A. Active Databases (10 p.)
A financial institution manages the portfolios of its customers. Each customer owns financial instruments (items such
as stocks, bonds, ETFs, etc.), organized into a hierarchical structure of sub-portfolios, of arbitrary depth. Each
customer may own several portfolios, containing financial items and (sub-)portfolios:
CUSTOMER(CustomerId,Name,Birthdate,FinancialWealth)
MAINPORTFOLIO(PortfolioId,PortfolioName,OwnerCustomer,Value)
SUBPORTFOLIO(PortfolioId,PortfolioName,PortfolioUp,Value)
HOLDING(PortfolioId,ItemId,Qty)
TRADEVALUES(ItemId,Timestamp,Value)
Attribute PortfolioUp represents the portfolio that contains a sub-portfolio. Design a set of triggers that react to
changes of trade values (insertion of the latest quotes) and to changes of the portfolio contents (insert/delete/update of
quantity in the holding of items, and changes to the hierarchical containment), keeping updated the Value of
portfolios and the financial wealth of each customer (sum of the values of the owned portfolios). Triggers should take
into account the efficient propagation of updates in the hierarchy. The initial situation is consistent, and new
portfolios are always inserted with Value set to 0.
create trigger ReactToNewQuotes
after insert on TradeValues
for each row
begin
  -- Incremental update of the portfolios' Value due to new incoming quotes
  declare deltavalue currency;
  select ( new.Value - Value ) into deltavalue
  from TradeValues
  where ItemId = new.ItemId
    and Timestamp = ( select max(Timestamp) from TradeValues
                      where ItemId = new.ItemId and Timestamp < new.Timestamp );
  update MainPortfolio M
  set Value = Value + ( select Qty * deltavalue from Holding
                        where ( M.PortfolioId, new.ItemId ) = ( PortfolioId, ItemId ) )
  where ( M.PortfolioId, new.ItemId ) in ( select PortfolioId, ItemId from Holding );
  update SubPortfolio S
  set Value = Value + ( select Qty * deltavalue from Holding
                        where ( S.PortfolioId, new.ItemId ) = ( PortfolioId, ItemId ) )
  where ( S.PortfolioId, new.ItemId ) in ( select PortfolioId, ItemId from Holding );
end
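The incremental logic of ReactToNewQuotes can be sanity-checked with a small in-memory sketch (Python, with hypothetical item and portfolio identifiers): on a new quote, every portfolio holding the item changes by Qty * (new quote - previous quote), with no full recomputation.

```python
# In-memory sketch of the ReactToNewQuotes logic (hypothetical identifiers).

def apply_new_quote(quotes, holdings, portfolio_values, item_id, new_value):
    """quotes: item_id -> latest quote; holdings: (portfolio_id, item_id) -> qty."""
    delta = new_value - quotes[item_id]          # new quote minus previous quote
    quotes[item_id] = new_value
    for (pid, iid), qty in holdings.items():
        if iid == item_id:
            portfolio_values[pid] += qty * delta  # incremental adjustment only
    return portfolio_values

quotes = {"AAPL": 100.0}
holdings = {("P1", "AAPL"): 10, ("P2", "AAPL"): 5}
values = {"P1": 1000.0, "P2": 500.0}
apply_new_quote(quotes, holdings, values, "AAPL", 102.0)
print(values)  # P1 gains 10*2 = 20, P2 gains 5*2 = 10
```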
create trigger PropagateValueChangeInSub
after update of Value on SubPortfolio
for each row
begin
  -- Incremental propagation of Value changes in the hierarchy (1/2)
  update MainPortfolio M
  set Value = M.Value + new.Value - old.Value
  where M.PortfolioId = new.PortfolioUp;
  update SubPortfolio S
  set Value = S.Value + new.Value - old.Value
  where S.PortfolioId = new.PortfolioUp;
end
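The cascading effect of the propagation triggers can be sketched in Python (hypothetical hierarchy): a delta applied to a sub-portfolio climbs through the PortfolioUp links up to the main portfolio, and from there reaches the owner's FinancialWealth.

```python
# Sketch of the cascade produced by the two propagation triggers.

def propagate_delta(parent_of, values, wealth, owner_of, portfolio_id, delta):
    pid = portfolio_id
    values[pid] += delta
    while pid in parent_of:          # climb the sub-portfolio hierarchy
        pid = parent_of[pid]
        values[pid] += delta         # each ancestor changes by the same delta
    wealth[owner_of[pid]] += delta   # pid is now a main portfolio
    return values, wealth

parent_of = {"S2": "S1", "S1": "M1"}   # S2 inside S1, S1 inside main M1
values = {"S2": 100.0, "S1": 300.0, "M1": 1000.0}
wealth = {"C1": 1000.0}
owner_of = {"M1": "C1"}
propagate_delta(parent_of, values, wealth, owner_of, "S2", 50.0)
print(values["M1"], wealth["C1"])  # both increased by 50
```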
create trigger PropagateValueChangeInMain
after update of Value on MainPortfolio
for each row
begin
  -- Incremental propagation of Value changes in the hierarchy (2/2)
  update Customer C
  set FinancialWealth = FinancialWealth + new.Value - old.Value
  where CustomerId = new.OwnerCustomer;
end
create trigger PurchaseOfNewItems
after insert on Holding
for each row
begin
  -- Purchase (insertion) of new items
  declare lastquote currency;
  select Value into lastquote from TradeValues
  where ItemId = new.ItemId
    and Timestamp = ( select max(Timestamp) from TradeValues
                      where ItemId = new.ItemId );
  update MainPortfolio M
  set Value = M.Value + new.Qty * lastquote
  where M.PortfolioId = new.PortfolioId;
  update SubPortfolio S
  set Value = S.Value + new.Qty * lastquote
  where S.PortfolioId = new.PortfolioId;
end
create trigger SaleOfItems
after delete on Holding
for each row
begin
  -- Deletion of items (symmetric to the previous trigger)
  declare lastquote currency;
  select Value into lastquote from TradeValues
  where ItemId = old.ItemId
    and Timestamp = ( select max(Timestamp) from TradeValues
                      where ItemId = old.ItemId );
  update MainPortfolio M
  set Value = M.Value - old.Qty * lastquote
  where M.PortfolioId = old.PortfolioId;
  update SubPortfolio S
  set Value = S.Value - old.Qty * lastquote
  where S.PortfolioId = old.PortfolioId;
end
create trigger ChangeQtyOfItems
after update of Qty on Holding
for each row
begin
  -- Change of Qty (logical union of the previous two triggers)
  declare lastquote currency;
  select Value into lastquote from TradeValues
  where ItemId = new.ItemId
    and Timestamp = ( select max(Timestamp) from TradeValues
                      where ItemId = new.ItemId );
  update MainPortfolio M
  set Value = M.Value + ( new.Qty - old.Qty ) * lastquote
  where M.PortfolioId = new.PortfolioId;
  update SubPortfolio S
  set Value = S.Value + ( new.Qty - old.Qty ) * lastquote
  where S.PortfolioId = new.PortfolioId;
end
New Portfolios are always inserted with Value set to 0, which means (for consistency) that they are inserted before
any Item is added to them (via insertions into Holding), and, symmetrically, it is reasonable to assume that no
portfolio is deleted before having removed all the Items it holds (thus the previous triggers set the Value to 0
automatically). This is consistent with a (pseudo-)referential integrity from PortfolioId in Holding to the IDs of
Main- and Sub-portfolios. The only reasonable changes in the hierarchy that need to be specifically addressed are
the changes of ownership and containment (updates of OwnerCustomer or PortfolioUp), in case there is a direct
transfer of a portfolio.
The change of ownership of a main portfolio simply moves its Value from the wealth of the old owner to that of the
new one:
create trigger ChangeOfOwnership
after update of OwnerCustomer on MainPortfolio
for each row
begin
  -- Transfer the portfolio's Value between the two customers (Value itself is unchanged)
  update Customer
  set FinancialWealth = FinancialWealth - new.Value
  where CustomerId = old.OwnerCustomer;
  update Customer
  set FinancialWealth = FinancialWealth + new.Value
  where CustomerId = new.OwnerCustomer;
end
create trigger ChangeOfContainment
after update of PortfolioUp on SubPortfolio
for each row
begin
  -- Move the sub-portfolio's Value from the old parent to the new one;
  -- the updates of Value fire the propagation triggers, which climb the hierarchy
  update MainPortfolio set Value = Value - new.Value where PortfolioId = old.PortfolioUp;
  update SubPortfolio set Value = Value - new.Value where PortfolioId = old.PortfolioUp;
  update MainPortfolio set Value = Value + new.Value where PortfolioId = new.PortfolioUp;
  update SubPortfolio set Value = Value + new.Value where PortfolioId = new.PortfolioUp;
end
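The bookkeeping required by a containment change can be sketched in Python (assumed semantics, hypothetical identifiers): when a sub-portfolio is re-parented, its value must leave every ancestor of the old parent and enter every ancestor of the new one; the owner's wealth update is analogous.

```python
# Sketch of the containment-change bookkeeping (assumed semantics).

def move_subportfolio(parent_of, values, sub, new_parent):
    value = values[sub]
    pid = parent_of[sub]
    while pid is not None:        # subtract from every old ancestor
        values[pid] -= value
        pid = parent_of.get(pid)
    parent_of[sub] = new_parent   # re-parent the sub-portfolio
    pid = new_parent
    while pid is not None:        # add to every new ancestor
        values[pid] += value
        pid = parent_of.get(pid)
    return values

parent_of = {"S1": "M1", "S2": "M2", "M1": None, "M2": None}
values = {"S1": 100.0, "M1": 400.0, "S2": 50.0, "M2": 50.0}
move_subportfolio(parent_of, values, "S1", "M2")
print(values["M1"], values["M2"])  # 300.0 150.0
```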
B. XML (9 p.)
<!ELEMENT Root ( Member* )>
<!ELEMENT Member ( FirstName, LastName, Email, Telephone, Performance* )>
<!ELEMENT Performance ( Marathon, BibNumber, Time?, PositionInRanking? )>
<!ELEMENT Marathon ( Date, City ) >
<!ELEMENT Time ( Hours, Minutes, Seconds ) >
The DTD above describes a club of amateur marathon runners. The profiles of members include the list of all their
participations and performances. Non-completed races do not have a time specification, and any performance may
lack an indication of the position in the final ranking. Also note that the performance records the hours, minutes
and seconds separately (as integers). Unspecified elements only contain PCData. Extract in XQuery:
(4 p.) The best performance of each member (i.e., the shortest recorded completion time), listing the members in
performance order (from the one with the “best personal best” to the one with the “worst personal best”).
for $m in //Member
let $best := min( for $t in $m/Performance/Time
                  return $t/Seconds + $t/Minutes*60 + $t/Hours*60*60 )
order by $best
return <Member>
         { $m/FirstName, $m/LastName }
         <PersonalBest> { $best } </PersonalBest>
       </Member>
A more readable PersonalBest element (reporting Hours, Minutes and Seconds instead of the total number of
seconds):
<PersonalBest>
  { $m/Performance/Time[ Seconds + Minutes*60 + Hours*60*60 = $best ]/* }
</PersonalBest>
Or, more directly (and precisely):
for $m in //Member
let $RankedPerformances := for $t in $m/Performance/Time
order by xs:integer($t/Hours), xs:integer($t/Minutes), xs:integer($t/Seconds)
return $t
order by $RankedPerformances[1]/Hours, $RankedPerformances[1]/Minutes, $RankedPerformances[1]/Seconds
return <Member>
{ $m/FirstName, $m/LastName }
<PersonalBest> { $RankedPerformances[1] } </PersonalBest>
</Member>
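The comparison logic shared by both formulations, converting (hours, minutes, seconds) to a total in seconds, taking the per-member minimum, and sorting members by it, can be checked with a small Python sketch (hypothetical member names and times):

```python
# Personal-best logic: minimum completion time per member, members sorted by it.

def personal_best(times):
    """times: list of (hours, minutes, seconds) tuples -> best time in seconds."""
    return min(h * 3600 + m * 60 + s for (h, m, s) in times)

members = {
    "Ada": [(3, 10, 5), (2, 59, 30)],
    "Bob": [(2, 45, 0)],
}
ranking = sorted(members, key=lambda name: personal_best(members[name]))
print(ranking)  # Bob (2:45:00) before Ada (2:59:30)
```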
(5 p.) The members who have run the highest number of marathons together with other members. Each member has
a number of marathons in common with other members of the club (possibly 0, if he was always “alone”).
Some members have therefore a “maximum” number of participations in common with other members.
Consider decomposing the task into simpler sub-tasks (let clauses and/or functions) to gain readability.
declare function local:runs_in_common( $email as element() ) as xs:integer {
  count( for $m in $email/../Performance/Marathon
         let $c := $m/City
         let $d := $m/Date
         (: occurrences of the same marathon in the profile of other runners :)
         where 0 < count( //Marathon[ ./../../Email != $email and ./City = $c and ./Date = $d ] )
         return $m )
};
let $maxRunsInCommon := max( for $m in //Member
                             return local:runs_in_common( $m/Email ) )
return <SocialRunners MarathonsTogether = "{ $maxRunsInCommon }">
  { //Member[ local:runs_in_common( Email ) = $maxRunsInCommon ] }
</SocialRunners>
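The counting logic of runs_in_common can be mirrored in Python (hypothetical e-mails and races; a marathon is identified by date and city, as in the DTD): a marathon counts if at least one other member has it in their profile.

```python
# Sketch of runs_in_common and of the selection of the "most social" members.

def runs_in_common(member, profiles):
    """Count member's marathons that appear in at least one other profile."""
    others = {race for name, races in profiles.items()
              if name != member for race in races}
    return sum(1 for race in profiles[member] if race in others)

profiles = {
    "alice@club.org": [("2014-04-06", "Milan"), ("2014-11-02", "NYC")],
    "bob@club.org":   [("2014-04-06", "Milan")],
    "carol@club.org": [("2015-01-18", "Rome")],
}
counts = {m: runs_in_common(m, profiles) for m in profiles}
best = max(counts.values())
social = [m for m in profiles if counts[m] == best]
print(social)  # alice and bob share the Milan marathon; carol always ran alone
```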
C. Concurrency Control (6 p.)
A TS-multi concurrency control system may require "pre-writes", i.e., all transactions that intend to execute a write
operation must ask, at the beginning of the transaction, for exclusive locks on all the resources that are to be written.
- Show an example of use of such pre-writes.
- Discuss the impact of pre-writes on the number of killed transactions.
- Discuss the potential impact on deadlocks (Higher/lower frequency? No impact?).
D. Physical Databases (6 p.)
A table CUSTOMER(SSN, LastName, FirstName, BirthDate), with 40K tuples is stored in a primary hash structure
built on SSN, with 8K blocks and filling factor below 50%. A table PURCHASE(CSSN, ProductId, Date, Qty, Cost),
with 200K tuples in 30K blocks, is primarily stored in a sequentially-ordered structure, in Date order. CUSTOMER also
has three secondary B+ indexes (on SSN, LastName, and BirthDate, all with F=40 on average), while PURCHASE has
a B+ index (3 levels, 4K blocks) and a secondary hash index, both built on CSSN (the hash function is the same as
for the SSN in CUSTOMER). Assuming that val(LastName) = 10K, val(BirthDate) = 4K, and only 1% of the customers
are older than 60, estimate the “minimum” execution cost (in terms of i/o operations) for the following join queries,
briefly describing the chosen “optimal” query plan:
1. select *
   from Customer join Purchase on SSN = CSSN
   where FirstName <> 'Robert' and LastName <> 'Smith'
2. select LastName, FirstName
   from Customer join Purchase on SSN = CSSN
   where Date = ( select min(Date) from Purchase )
3. select sum(Cost) as ElderlyRevenues
   from Customer join Purchase on SSN = CSSN
   where BirthDate < 3/3/1955
1.
The where condition does not help, as the number of tuples to be discarded based on the inequality predicate is
negligible. We therefore address it as a full join. Three options have a similar cost. Options A1 and A2 dominate a
standard nested loop approach, due to the availability of many secondary indexes.
Option A: scan & lookup
A1: scan Customer, lookup in Purchase: full scan of Customer followed by a lookup on Purchase for each
customer, using the hash-based index to identify the 200K/40K = 5 average purchases per customer:
c1_A1 = 8K blocks in Customer + 40K tuples in Customer × ( 1 block in the hash index + 5 pointers/value )
      = 8K + 40K × 6 = 248K
A2: scan Purchase, lookup in Customer: full scan of Purchase followed by a lookup on Customer for each tuple,
using the primary hash to identify the customer:
c1_A2 = 30K blocks in Purchase + 200K tuples in Purchase × ( 1 block in the hash ) = 30K + 200K = 230K
Option B: HASH-JOIN
For each pair of matching clusters:
1 block (the primary Customer block) +
1 block (the block of the secondary index) + 200K/8K = 25 pointers/value
c1_B = 8K clusters × ( 1 + 1 + 25 ) = 216K
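As a sanity check on the arithmetic of the three plans, a short Python sketch (figures taken from the exercise):

```python
# Cost arithmetic for the three plans of query 1 (figures from the exercise).

customer_blocks, customer_tuples = 8_000, 40_000
purchase_blocks, purchase_tuples = 30_000, 200_000
purchases_per_customer = purchase_tuples // customer_tuples  # = 5

# A1: scan Customer, hash-index lookup on Purchase per customer
c1_a1 = customer_blocks + customer_tuples * (1 + purchases_per_customer)
# A2: scan Purchase, primary-hash lookup on Customer per tuple
c1_a2 = purchase_blocks + purchase_tuples * 1
# B: hash join, per cluster: 1 Customer block + 1 index block + 25 pointers
c1_b = customer_blocks * (1 + 1 + purchase_tuples // customer_blocks)
print(c1_a1, c1_a2, c1_b)  # 248000 230000 216000
```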
2.
The minimum date (that of the oldest purchases) is the date of the first tuples in the first blocks of the table
(tuples are stored in Date order!). The cost of extracting these Nt tuples contained in Nb blocks is Nb. In order
to complete the join, Nt customers must be extracted, with a unitary cost each, using the hash on SSN. The
overall cost is
c2 = Nb + Nt
If, as initially planned, val(Date) = 4K, we have 50 purchases/day, probably also on the first day, so c2 is
presumably small ( 2 <= c2 < 100 ).
3.
The selectivity of the predicate on Customer tells us that 1%, i.e., about 400 customers, need to be extracted,
scanning the initial part of the B+ index on BirthDate, amounting to approximately 400 i/o operations.
Access to the purchases, via the hash index, costs
400 customers × ( 1 access to the index block + 5 purchases/customer ) = 2400
c3 = 400 + 2400 = 2800 ≈ 2.8K
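The same arithmetic, as a Python check:

```python
# Cost arithmetic for query 3 (figures from the exercise).

customers = 40_000
selected = customers // 100          # 1% of the customers are older than 60 -> 400
index_scan = selected                 # ~1 i/o per selected customer on the B+ index
lookups = selected * (1 + 5)          # 1 hash-index block + 5 purchases per customer
c3 = index_scan + lookups
print(c3)  # 2800
```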