CS-791/891--Preservation of Digital Objects and Collections
Estimating Frequency of Change
Written By
Junghoo Cho, Hector Garcia-Molina
Presented By
Suman Kumar Narsing.
Topics covered in this presentation:
1. INTRODUCTION
2. TAXONOMY OF ISSUES
3. PRELIMINARIES
4. ESTIMATION OF FREQUENCY: EXISTENCE OF CHANGE
5. ESTIMATION OF FREQUENCY: LAST DATE OF CHANGE
6. EXPERIMENTS
7. CONCLUSION
1. INTRODUCTION:
• Many data sources are now available online.
• These sources are autonomous and are updated independently.
• Examples: CNN, the NY Times, online stores, etc.
• Because the sources are updated autonomously, clients do not know
exactly when or how often they change.
APPLICATIONS WHOSE EFFECTIVENESS IMPROVES WITH A KNOWN CHANGE FREQUENCY:
Improving a Web crawler.
Improving the update policy of a data warehouse.
Improving Web caching.
Data mining.
HOW TO ESTIMATE THE FREQUENCY OF
CHANGE:
Incomplete change history.
Irregular access interval.
Difference in available information.
EXAMPLE 1:
A Web crawler accessed a page daily for 10 days and
detected 6 changes. From this data, the estimated
change frequency is 6/10 = 0.6 changes per day.
EXAMPLE 2:
In a Web cache, a user accessed a Web page 4 times: on
day 1, day 2, day 7 and day 10. The accesses on day 2 and
day 7 detected a change. What does this imply? Does the
page change every 10/2 = 5 days on average?
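The arithmetic in both examples is the same naive ratio of detected changes to elapsed time. A minimal Python sketch follows; the function name `naive_change_frequency` is hypothetical and not taken from the paper. Each detected change only shows that the page changed at least once since the previous access, so with irregular accesses the ratio can undercount.

```python
def naive_change_frequency(detected_changes: int, elapsed_days: float) -> float:
    """Naive estimate: detected changes divided by the monitoring period."""
    if elapsed_days <= 0:
        raise ValueError("elapsed_days must be positive")
    return detected_changes / elapsed_days


# Example 1: daily accesses for 10 days, 6 changes detected.
example1 = naive_change_frequency(6, 10)

# Example 2: accesses on days 1, 2, 7 and 10; changes detected on days 2 and 7.
# The whole 10 days is treated as the monitoring period, as in the example.
example2 = naive_change_frequency(2, 10)
mean_days_between_changes = 1 / example2

print(example1, example2, mean_days_between_changes)
```

In Example 2 the page may have changed several times between day 2 and day 7, which is why the 5-day figure is open to question.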
2. TAXONOMY OF ISSUES:
What do we mean by “change of an element”?
What does “element” mean?
What does “change” mean?
Here an element is a Web page, and a change is any modification to
the page.
Developing Taxonomy:
• How do we trace the history of an element?
Passive monitoring
Active monitoring
Regular interval
Random interval
• What information do we have?
Complete history of changes.
Last date of change
Existence of change
Developing Taxonomy (contd.):
• How do we use the estimated frequency?
Estimation of frequency
Categorization of frequency
3. PRELIMINARIES:
Poisson process: the model for the changes of an element.
The number of events expected to occur in a unit interval:
$E[X(t+1) - X(t)] = \sum_{k=0}^{\infty} k \Pr\{X(t+1) - X(t) = k\} = \sum_{k=0}^{\infty} k \frac{\lambda^k e^{-\lambda}}{k!} = \lambda$
X(t): the number of changes that occur in the interval (0, t].
λ: the rate (frequency) of the Poisson process.
For $s \ge 0$ and $t > 0$, the random variable X(s+t) - X(s) has the Poisson probability
distribution
$\Pr\{X(s+t) - X(s) = k\} = \frac{(\lambda t)^k e^{-\lambda t}}{k!}$ for k = 0, 1, ...
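The Poisson distribution above can be checked numerically. The following Python sketch evaluates the probability of k changes in an interval of length t and confirms that the expected number of changes per unit interval equals λ. The helper name `poisson_pmf` and the rate of 0.6 changes per day are assumptions chosen only for illustration.

```python
import math


def poisson_pmf(k: int, rate: float, t: float = 1.0) -> float:
    """Pr{X(s+t) - X(s) = k} for a Poisson process with the given rate."""
    mean = rate * t
    return mean ** k * math.exp(-mean) / math.factorial(k)


rate = 0.6  # assumed example value: 0.6 changes per day

# Expected number of changes in a unit interval; the infinite sum is
# truncated at k = 99, where the remaining terms are negligible.
expected_changes = sum(k * poisson_pmf(k, rate) for k in range(100))

# Probability that the element does not change at all within one day.
prob_no_change = poisson_pmf(0, rate)

print(expected_changes, prob_no_change)
```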
Graphs explaining the importance of λ:
Estimator: $\hat{\lambda} = X/T$ (X detected changes over a monitoring period T)
The distribution of $\hat{\lambda}$ determines how
effective the estimator is:
a) Bias.
b) Efficiency.
c) Consistency.
4. ESTIMATION OF FREQUENCY: EXISTENCE
OF CHANGE:
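Total time elapsed: T = nI = n/f, for n accesses at a regular interval I = 1/f.
From now on we estimate the frequency ratio r = λ/f, using the estimator
$\hat{r} = \hat{\lambda}/f = \frac{1}{f} \cdot \frac{X}{T} = X/n$.
Measuring X by repeated accesses to the element: X is the number of accesses, out of n, at which a change is detected.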
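Is the estimator $\hat{r}$ biased?
Theorem 4.1 The expected value of the estimator $\hat{r}$ is
$E[\hat{r}] = 1 - e^{-r}$
Since $1 - e^{-r} < r$ for every r > 0, the estimator is biased: on average it underestimates r.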
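Theorem 4.1 can be illustrated with a small simulation. The sketch below assumes a Poisson process observed at regular intervals, with time measured in units of the access interval so that the change rate per interval is r. By the memoryless property, an access detects a change exactly when an exponential waiting time with rate r is shorter than one interval. The function name, the seed and the parameter values are assumptions made for illustration.

```python
import math
import random


def simulate_ratio_estimator(r: float, n: int, trials: int, seed: int = 0) -> float:
    """Average of r_hat = X/n over many simulated runs of n regular accesses."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # A change is detected if the first change after the previous
        # access arrives before the next access (interval length 1).
        detected = sum(1 for _ in range(n) if rng.expovariate(r) < 1.0)
        total += detected / n
    return total / trials


r = 1.0  # assumed: the element changes as often as it is accessed
mean_r_hat = simulate_ratio_estimator(r, n=10, trials=20000)
expected = 1.0 - math.exp(-r)  # value predicted by Theorem 4.1

print(mean_r_hat, expected)
```

The simulated average stays close to 1 - e^{-r} rather than r, because several changes within one interval are counted as a single detected change.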
Is the estimator $\hat{r}$ consistent?
How efficient is the estimator?
Corollary 4.2 The standard deviation of the estimator $\hat{r} = X/n$
is $\sigma[\hat{r}] = \sqrt{e^{-r}(1 - e^{-r})/n}$.
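Under the Poisson model each access detects a change independently with probability 1 - e^{-r}, so X follows a binomial distribution and the mean and standard deviation of X/n have closed forms. The Python sketch below computes them. It also inverts the relation X/n = 1 - e^{-r} to remove the bias; this inversion is shown only as an illustration and is not claimed to be the exact estimator the authors propose. Both function names are hypothetical.

```python
import math
from typing import Tuple


def ratio_estimator_moments(r: float, n: int) -> Tuple[float, float]:
    """Mean and standard deviation of r_hat = X/n for X ~ Binomial(n, 1 - e^-r)."""
    p = 1.0 - math.exp(-r)
    return p, math.sqrt(p * (1.0 - p) / n)


def inverted_ratio_estimate(detected: int, n: int) -> float:
    """Solve X/n = 1 - e^-r for r; diverges when every access saw a change."""
    if detected >= n:
        raise ValueError("estimate diverges when detected == n")
    return -math.log(1.0 - detected / n)


mean10, std10 = ratio_estimator_moments(1.0, 10)
mean40, std40 = ratio_estimator_moments(1.0, 40)
print(mean10, std10, std40)
print(inverted_ratio_estimate(6, 10))
```

Quadrupling the number of accesses halves the standard deviation, while the mean stays at 1 - e^{-r}: more accesses reduce the spread but not the bias of X/n.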
5. ESTIMATION OF FREQUENCY: LAST DATE
OF CHANGE
Let T be the time to the previous event in a Poisson process
with rate λ. Then the expected value of T is E[T] = 1/ λ.
The new estimator consists of three functions.
a) Init()
b) Update()
c) Estimate()
The estimator using last-modified dates:
Init()                /* initialize variables */
    N = 0;            /* total number of accesses */
    X = 0;            /* number of detected changes */
    T = 0;            /* sum of the times from changes */
Update(Ti, Ii)        /* Ti: time since the last change at the i-th access */
                      /* Ii: interval since the previous access */
    N = N + 1;
    /* Has the element changed? */
    if (Ti < Ii) then
        /* The element has changed. */
        X = X + 1;
        T = T + Ti;
    else
        /* The element has not changed. */
        T = T + Ii;
Estimate()            /* return the estimated lambda */
    return X/T;
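The Init/Update/Estimate pseudocode translates directly into a small class. The following Python sketch mirrors it step by step; the class name `LastModifiedEstimator`, the argument names and the sample observations are assumptions made for illustration, and the guard against division by zero is an addition not present in the pseudocode.

```python
class LastModifiedEstimator:
    """Estimate the change rate from last-modified times seen at each access."""

    def __init__(self) -> None:
        self.n = 0    # total number of accesses
        self.x = 0    # number of detected changes
        self.t = 0.0  # accumulated time, as in the pseudocode's T

    def update(self, time_since_change: float, access_interval: float) -> None:
        """Record one access: Ti = time_since_change, Ii = access_interval."""
        self.n += 1
        if time_since_change < access_interval:
            # The element changed since the previous access.
            self.x += 1
            self.t += time_since_change
        else:
            # No change since the previous access.
            self.t += access_interval

    def estimate(self) -> float:
        """Return the estimated lambda, X/T."""
        if self.t == 0:
            raise ValueError("no observation time accumulated yet")
        return self.x / self.t


est = LastModifiedEstimator()
# Made-up (Ti, Ii) pairs in days, for illustration only.
for ti, ii in [(0.5, 1.0), (3.0, 1.0), (0.25, 1.0), (2.0, 2.0)]:
    est.update(ti, ii)
rate = est.estimate()
print(est.n, est.x, est.t, rate)
```

With these sample observations two accesses detect a change and T accumulates to 3.75 days, giving an estimate of 2/3.75 changes per day.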
6. EXPERIMENTS:
Non-Poisson model.
Improvement from last modification date.
Effectiveness of estimators for real Web data.
COMPARISON OF THE NAÏVE ESTIMATOR AND OURS
Application to a Web crawler:
- Uniform Policy:
- Naïve Policy.
- Our Policy.
7. CONCLUSION:
Future work:
• Adaptive Scheme:
• Changing λ
REFERENCES:
Junghoo Cho and Hector Garcia-Molina. "Estimating frequency of change."
ACM Transactions on Internet Technology, 3(3), August 2003.
http://oak.cs.ucla.edu/~cho/papers/cho-freq.pdf
THANK YOU