Trust and Profit Sensitive Ranking for Web Databases and On-line Advertisements
Raju Balakrishnan (Arizona State University)

Agenda
Trust and Relevance based Ranking of Web Databases for the Deep Web.
Ad-Ranking Considering Mutual Influences.
Optimal Ad Ranking for Profit Maximization

Deep Web Integration Problem
[Figure: a mediator in front of many Web DB sources that together make up the Deep Web.]
• Millions of databases containing structured tuples.
• Uncontrolled collection of redundant information.

Source Selection in Deep Web
Given a user query, select a subset of sources to provide the most relevant and trustworthy answers.
• Trustworthiness: degree of belief in the correctness of the data.
• Relevance: degree to which the data satisfies the information needs of the user.
Search results must be trustworthy and relevant. Surface web search combines hyperlink-based PageRank and relevance to assure trust and relevance of results.

Source Agreement
Observations
• Many sources return answers to the same query.
• Comparing the semantics of the answers is facilitated by the structure of the tuples.
Idea: compare the agreement of answers returned by different sources to assess the reputation of sources!

Agreement Based Relevance and Trust Assessment
This may be intuitively understood as a meta-reviewer assessing the quality of a paper based on the agreement between primary reviews. Reviewers agreed upon by other reviewers are likely to be relevant and trustworthy.

Agreement Implies Trust & Relevance
Probability of agreement of two independently selected irrelevant/false tuples:
\[ P_a(r_1, r_2) = \frac{1}{|U|} \]
Probability of agreement of two independently picked relevant and true tuples:
\[ P_a(g_1, g_2) = \frac{1}{|R_T|} \]
\[ |U| \gg |R_T| \;\Rightarrow\; P_a(g_1, g_2) \gg P_a(r_1, r_2) \]

Computing Agreement between Sources
• Closely related to the record linkage problem for integration of databases without common domains (Cohen 98).
• We used greedy matching between tuples using Jaro-Winkler similarity with SoftTF-IDF, since this measure performs best for named entity matching (Cohen et al. 03).
• Agreement is computed using the top-5 answer tuples to sample queries (200 queries for each domain).
• The computational complexity is \( O(V^2 k^2) \), where \( V \) is the number of data sources and the top-\( k \) answers of each source are used.

Representation: Agreement Graph
\[ W(S_1 \rightarrow S_2) = \beta + (1 - \beta) \times \frac{A(R_1, R_2)}{|R_2|} \]
where \( \beta \) induces the smoothing links to account for the unseen samples, and \( R_1, R_2 \) are the result sets of \( S_1, S_2 \).
Link semantics from \( S_i \) to \( S_j \) with weight \( w \): \( S_i \) acknowledges a fraction \( w \) of the tuples in \( S_j \).
[Figure: sample agreement graph for the book sources, with nodes S1, S2, S3 and edge weights 0.78, 0.86, 0.22, 0.4, 0.14, 0.6.]

Calculating SourceRank
How does a searcher search using the agreement graph?
1. Start on a random node.
2. If the searcher likes the result, randomly traverse a link, with a probability proportional to its weight, to search an agreed database.
3. If the searcher does not like the node, restart the search by traversing a smoothing link.
This is a weighted Markov random walk. SourceRank of a database is equal to the stationary visit probability of the random walk on the database vertex.

Combining Coverage and SourceRank
Coverage of a set of tuples \( T \) w.r.t. a query \( q \):
\[ C(T \mid q) = \sum_{t \in T} R(t \mid q) \]
Coverage is calculated using sample queries, and we used Jaro-Winkler with SoftTF-IDF similarity between the query and the tuple as the relevance measure \( R(t \mid q) \).
We combine Coverage and SourceRank as
\[ \text{Score} = \alpha \times \text{Coverage} + (1 - \alpha) \times \text{SourceRank} \]
Databases are ranked based on this Score, with \( \alpha = 0.5 \).

Evaluations and Results
Evaluated on web databases in the movies and books domains listed in the UIUC TEL-8 repository, twenty-two from each domain.
Evaluation Metrics
1. Ability to remove closely related out-of-domain sources.
2. Top-5 precision (relevance evaluation).
3. Ability to remove corrupted sources (trustworthiness).
4. Time to compute the agreement graph.

1. Ranks of Out of Domain Sources
[Figure: bar chart of the ranks given to non-topical sources (Source 1 to Source 5) by Coverage, SourceRank, and Combined; horizontal axis: rank of the source, 0 to 20.]

2. Top-5 Precision-Movies
[Figure: two panels, "Movies Top-4 Source Selection" and "Movies Top-8 Source Selection"; vertical axis: precision; the panels carry the annotations 36% and 40%.]

2. Top-5 Precision-Books
[Figure: two panels, "Top-4 Source Selection" and "Top-8 Source Selection"; vertical axis: precision; bars for Coverage, SourceRank, and Combination.]

3. Trustworthiness of Source Selection
[Figure: two panels, "Trustworthiness-Movies" and "Trustworthiness-Books"; horizontal axis: corruption level, 0 to 0.5; vertical axis: percentage of decrease in position; series: SourceRank and Coverage.]

4. Time to Compute Agreement Graph
[Figure: two panels, "Time vs number of sources" and "Time vs top-k tuples"; vertical axis: time (seconds); series: Books and Movies.]

System Implementation
Searches online books and movies web databases.
http://rakaposhi.eas.asu.edu/scuba
System Architecture
• Implemented as a web application.
• Searches real web databases.

Ad Ranking: State of the Art
• Sort by Bid Amount x Relevance
• Sort by Bid Amount
Ads are considered in isolation, ignoring mutual influences.
We consider ads as a set, and ranking is based on the user's browsing model.

Mutual Influences
Three manifestations of mutual influences on an ad \( a \) are:
1. Similar ads placed above \( a \): reduces the user's residual relevance of the ad \( a \).
2. Relevance of other ads placed above \( a \): the user may click on the ads above and may not view the ad \( a \).
3. Abandonment probability of other ads placed above \( a \): the user may abandon the search and not view the ad \( a \).

User's Browsing Model
• The user browses down, starting at the first ad.
• At every ad the user may:
  click the ad with relevance probability \( R(a) = P(\text{click}(a) \mid \text{view}(a)) \);
  abandon browsing with probability \( \gamma(a) \);
  go down to the next ad with probability \( 1 - R(a) - \gamma(a) \).
• The process repeats for the ads below with a reduced probability.
If \( a_2 \) is similar to \( a_1 \), the residual relevance of \( a_2 \) goes down and the abandonment probabilities go up.

Expected Profit Considering Ad Similarities
Considering bid amounts \( \$(a_i) \), residual relevance \( R_r(a_i) \), abandonment probability \( \gamma(a_i) \), and similarities, the expected profit from a set of \( n \) ads is
\[ \text{Expected Profit} = \sum_{i=1}^{n} \$(a_i)\, R_r(a_i) \prod_{j=1}^{i-1} \bigl[ 1 - R_r(a_j) - \gamma(a_j) \bigr] \]
THEOREM: Optimal ad placement considering similarities between the ads is NP-Hard.
The proof is a reduction of the independent set problem to choosing the top-k ads considering similarities.

Expected Profit Considering the Other Two Mutual Influences (2 and 3)
Dropping similarity, hence replacing residual relevance \( R_r(a_i) \) by absolute relevance \( R(a_i) \):
\[ \text{Expected Profit} = \sum_{i=1}^{n} \$(a_i)\, R(a_i) \prod_{j=1}^{i-1} \bigl[ 1 - R(a_j) - \gamma(a_j) \bigr] \]
Ranking to maximize this expected profit is a sorting problem.

Optimal Ranking
Rank ads in descending order of:
\[ RF(a) = \frac{\$(a)\, R(a)}{R(a) + \gamma(a)} \]
The physical meaning of RF is the profit generated per unit of view probability consumed by the ad.
Ads placed higher have more view probability, so placing ads that produce more profit per consumed view probability higher is intuitively justifiable.
(Refer to Balakrishnan & Kambhampati (WebDB 08) for the proof of optimality.)

Comparison to Yahoo and Google
Yahoo!
• Assume the abandonment probability is zero: \( \gamma(a) = 0 \).
• \( RF(a) = \frac{\$(a)\, R(a)}{R(a)} = \$(a) \)
• Assumes that the user has infinite patience to go down the results until he finds the ad he wants.
Google
• Assume \( \gamma(a) = k - R(a) \), where \( k \) is a constant for all ads.
• \( RF(a) = \frac{\$(a)\, R(a)}{k} \propto \$(a)\, R(a) \)
• Assumes that the abandonment probability is negatively proportional to relevance.

Quantifying Expected Profit
[Figure: expected profit of three strategies (RF, Bid Amount x Relevance, Bid Amount); horizontal axis from 0 to 1; the annotated differences are 35.9% and 45.7%.]
Simulation settings:
• Abandonment probability \( \gamma(a) \): uniform random.
• Relevance \( R(a) \): uniform random.
• Number of clicks: Zipf random with exponent 1.5.
• Bid amounts: uniform random, \( 0 \le \$(a) \le 10 \).
Observations:
• The proposed strategy gives the maximum profit for the entire range.
• The difference in profit between RF and the competing strategies is significant.
• The Bid Amount Only strategy becomes optimal at \( \gamma(a) = 0 \).

Contributions
SourceRank
• Agreement based computation of relevance and trust of deep web sources.
• System implementation to search the deep web, and formal evaluation.
Ad-Ranking
• Extending the expected profit model of ads based on the browsing model, considering mutual influences.
• Optimal ad ranking considering mutual influences other than ad similarities.

Thank You!
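The SourceRank computation from the "Representation: Agreement Graph" and "Calculating SourceRank" slides can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the agreement values, the smoothing factor of 0.1, and the function names agreement_weight and source_rank are all assumptions made for the example, and the stationary distribution is approximated by plain power iteration.

```python
def agreement_weight(agreement, result_size, beta):
    """Edge weight W(S1 -> S2) = beta + (1 - beta) * A(R1, R2) / |R2|."""
    return beta + (1.0 - beta) * agreement / result_size


def source_rank(weights, iterations=200, tol=1e-12):
    """Approximate the stationary visit probabilities of the weighted walk.

    weights[i][j] is the weight of the link from source i to source j.
    """
    n = len(weights)
    # Normalise each row so that a link is followed with probability
    # proportional to its weight.
    transition = [[w / sum(row) for w in row] for row in weights]
    rank = [1.0 / n] * n  # the searcher starts on a uniformly random node
    for _ in range(iterations):
        new_rank = [
            sum(rank[i] * transition[i][j] for i in range(n)) for j in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(new_rank, rank)) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical agreement values A(R_i, R_j) over top-5 result sets:
# sources 0 and 1 agree strongly with each other, source 2 agrees with neither.
BETA = 0.1  # assumed smoothing factor, not given in the slides
TOP_K = 5
agreement = [
    [0.0, 4.0, 1.0],
    [4.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
weights = [
    [0.0 if i == j else agreement_weight(agreement[i][j], TOP_K, BETA)
     for j in range(3)]
    for i in range(3)
]
ranks = source_rank(weights)
print(ranks)
```

In this toy graph the source that nobody agrees with ends up with the lowest stationary probability, which is the behaviour the agreement-based ranking relies on.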
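The expected profit under the browsing model and the RF ordering from the "Optimal Ranking" slide can also be checked numerically. The sketch below assumes similarities are ignored (absolute relevance only) and uses made-up bids, relevances, and abandonment probabilities; the names Ad, expected_profit, ranking_function, and rank_by_rf are hypothetical. It compares the RF ordering against a brute-force search over all orderings of a small ad set.

```python
from itertools import permutations
from typing import NamedTuple, Sequence


class Ad(NamedTuple):
    bid: float          # $(a)
    relevance: float    # R(a) = P(click(a) | view(a))
    abandonment: float  # gamma(a), probability of abandoning at this ad


def expected_profit(ranking: Sequence[Ad]) -> float:
    """Sum over positions of bid * relevance * probability the ad is viewed."""
    view_prob = 1.0
    profit = 0.0
    for ad in ranking:
        profit += view_prob * ad.bid * ad.relevance
        # The user continues to the next ad only if they neither click nor abandon.
        view_prob *= 1.0 - ad.relevance - ad.abandonment
    return profit


def ranking_function(ad: Ad) -> float:
    """RF(a) = $(a) R(a) / (R(a) + gamma(a))."""
    return ad.bid * ad.relevance / (ad.relevance + ad.abandonment)


def rank_by_rf(ads: Sequence[Ad]) -> list:
    return sorted(ads, key=ranking_function, reverse=True)


# Made-up ads for illustration only.
ads = [
    Ad(bid=4.0, relevance=0.1, abandonment=0.3),
    Ad(bid=2.0, relevance=0.5, abandonment=0.1),
    Ad(bid=6.0, relevance=0.2, abandonment=0.6),
    Ad(bid=1.0, relevance=0.3, abandonment=0.1),
]

rf_profit = expected_profit(rank_by_rf(ads))
best_profit = max(expected_profit(order) for order in permutations(ads))
print(rf_profit, best_profit)
```

Sorting costs O(n log n), while the brute-force check enumerates n! orderings, so the comparison is only practical for a handful of ads; it is here to illustrate the optimality claim, not to replace the proof cited on the slide.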