An Adaptive Algorithm for Selecting Profitable Keywords for Search-Based Advertising Services

Paat Rusmevichientong, Cornell University, [email protected]
David P. Williamson, Cornell University, [email protected]

ABSTRACT

Increases in online search activities have spurred the growth of search-based advertising services offered by search engines. These services enable companies to promote their products to consumers based on their search queries. In most search-based advertising services, a company selects a set of keywords, determines a bid price for each keyword, and designates an ad associated with each selected keyword. When a consumer searches for one of the selected keywords, search engines then display the ads associated with the highest bids for that keyword on the search result page. A company whose ad is displayed pays the search engine only when the consumer clicks on the ad. With millions of available keywords and a highly uncertain clickthru rate associated with the ad for each keyword, identifying the most effective set of keywords and determining the corresponding bid prices becomes challenging for companies wishing to promote their goods and services via search-based advertising. Motivated by these challenges, we formulate a model of keyword selection in search-based advertising services. We develop an algorithm that adaptively identifies the set of keywords to bid on based on historical performance. The algorithm prioritizes keywords based on a prefix ordering – a sorting of keywords in descending order of profit-to-cost ratio. We show that the average expected profit generated by the algorithm converges to near-optimal profits. Furthermore, the convergence rate is independent of the number of keywords and scales gracefully with the problem's parameters. Extensive numerical simulations show that our algorithm outperforms existing methods, increasing profits by about 7%.

1. INTRODUCTION

Search-based advertising services offered by search engines enable companies to promote their products to targeted groups of consumers based on their search queries. Examples of search-based advertising services include Google Adwords (http://www.google.com/adwords), Yahoo! Sponsored Search (http://searchmarketing.yahoo.com/srch/index.php), and Microsoft's forthcoming MSN AdCenter (http://advertising.msn.com/searchadv/). In most search-based advertising services, a company wishing to advertise sets a daily budget, selects a set of keywords, and determines the bid price for each selected keyword. When consumers search for one of the selected keywords, search engines then display the ads associated with the highest bids for that keyword on the search result page. A company whose ad is displayed pays the search engine only when the consumer clicks on the ad. Note that the ad is not displayed if the company's spending that day has exceeded its budget. Figure 3 in Appendix A shows examples of the ads shown to a user under the search-based advertising programs offered by Google and Yahoo!. Details of the actual cost of each keyword, of how a search query matches a keyword, and of how different bid amounts translate to the position on the search result page vary from one search engine to another. Under the Google Adwords program, for instance, each click costs the ad's owner an amount equal to the next highest bid plus one cent.
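The next-highest-bid-plus-one-cent pricing rule just described can be sketched as follows. The function name and the treatment of the case with no lower competing bid are our own illustrative assumptions, not part of any search engine's actual interface.

```python
def next_bid_plus_penny_cpc(competing_bids, my_bid):
    """Sketch of a next-bid-plus-one-cent pricing rule: a click costs
    the highest competing bid strictly below ours, plus $0.01.
    Assumption (ours): with no lower competing bid, the advertiser
    pays only the minimum one-cent increment."""
    lower = [b for b in competing_bids if b < my_bid]
    if not lower:
        return 0.01
    return round(max(lower) + 0.01, 2)
```

For example, with competing bids of $0.40, $0.25, and $0.10 and our own bid of $0.50, a click would cost $0.41 under this rule.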
Moreover, when determining the ad's placement on the search result page, the Google Adwords program takes into account not only the bid amount but also the ad's popularity and clickthrus. Under the Yahoo! Sponsored Search program, on the other hand, the highest bidder receives the top spot. With millions of available keywords, a user of search-based advertising services faces challenging decisions. The user not only has to identify an effective set of keywords; she also needs to develop ads that will draw consumers and to determine a bid price for each keyword that balances the tradeoffs between the costs and profits that may result from each click on the ad associated with the keyword. Furthermore, much of the information that is required to compute these tradeoffs is not known in advance and is difficult to estimate a priori. For instance, the clickthru probability associated with the ad for each keyword – the probability that the consumer will click on the ad once it appears on the search result page – varies dramatically depending on the keyword, the ad itself, and the position of the ad on the search result page. In this paper, we consider one of the challenges faced by users of search-based advertising services: given a fixed daily budget and unknown clickthru probabilities, develop a policy that adaptively selects a set of keywords in order to maximize total expected profits. We assume that we have N keywords indexed by 1, 2, . . . , N. For any A ⊆ {1, . . . , N}, let the random variable Z_A denote the profit from selecting the keywords in the set A. A policy φ = (A_1, A_2, . . .) is a sequence of random variables, where for all t, A_t denotes the subset of keywords that will be selected in period t; A_t is allowed to depend on the observed outcomes in the previous t − 1 periods. For any T ≥ 1, we aim to find a policy φ = (A_1, A_2, . . .) that maximizes the total expected profit over T periods, i.e., Σ_{t=1}^{T} E[Z_{A_t}].
Identifying profitable keywords requires knowledge of the clickthru probability of the ad associated with each keyword. Unfortunately, the clickthru rate is generally not known in advance, and we can obtain an estimate of its value only by selecting the keyword and observing the resulting displays of the ad – commonly referred to as its impressions – and the number of clicks on it. Since we pay for each click on the ad, this process can potentially result in significant costs, yet it may offer an opportunity to discover potentially profitable keywords. We thus must balance the tradeoffs between selecting keywords that seem to yield high average profits based on past performance and selecting previously unused keywords in order to learn about their clickthru probabilities. This is usually cast in the machine learning literature as balancing "exploitation" (of known good options) and "exploration" (of unknown options that might be better than known options). We model the problem as follows: we assume that in every time period some number of queries arrives, where the numbers of queries in different periods are independent and identically distributed. At the beginning of each time period, we must specify a subset of keywords on which we will bid. Then queries arrive sequentially; keyword i is queried with probability λ_i. If we have enough money remaining in our budget, our ad is displayed. With some probability p_i, the user clicks on our ad; we receive profit π_i and must pay cost c_i. We assume that the probabilities λ_i, costs c_i, and profits π_i are known; justification of our ability to know the first two quantities from publicly available sources is given in Section 2.1. The probabilities p_i are unknown, and we learn them over time. Our goal is to give an algorithm that converges to near-optimal profits as the number of time periods tends towards infinity. We begin by considering the static case – when the probabilities p_i are known.
This problem is related to the well-known stochastic knapsack problem and is NP-complete. Following the approximation algorithms of Sahni [23] for the standard knapsack problem and the nonadaptive algorithm of Dean, Goemans, and Vondrak [7] for the stochastic knapsack problem, we show that we can obtain a near-optimal approximation algorithm by considering prefix orderings. We order the keywords in decreasing order of profit-to-cost ratio; that is, π_1/c_1 ≥ π_2/c_2 ≥ · · · ≥ π_N/c_N. Then our algorithm chooses a set of the form {1, . . . , ℓ} for some integer ℓ; we call such a set a prefix of the keywords. We show that if the cost of each item is sufficiently small compared to our budget, if the expected number of arrivals of any given keyword is not too large, and if the number of search queries concentrates around its mean, then in expectation our algorithm returns a near-optimal solution, where the closeness to optimality depends on how well these assumptions are satisfied. We give evidence that in practice these assumptions are quite likely to be true. Note that the static case of our problem is somewhat different from the stochastic knapsack problem. In that problem, profits are known, but sizes are drawn from arbitrary known distributions and items can be placed in the knapsack in a specified order. Here costs (corresponding to item sizes) are deterministic and known, but query arrival (corresponding to item placement) is random and not under our control. Our result in the static case guides the development of an adaptive algorithm for the case when the clickthru probabilities p_i are unknown and we must learn them over time. In each time period, we either choose a random prefix of keywords, or a prefix of keywords using our algorithm from the static case applied to our estimates of the p_i from the last time period.
We show that, averaged over the number of time periods T, in expectation our algorithm converges to near-optimal profits, where the performance guarantee is almost the same as that of the static case, modulo some terms that vanish as T goes to infinity. This result should be compared to traditional multi-armed bandit algorithms [3, 4, 8, 15, 16, 17]. These algorithms must choose one of N slot machines (or "bandits") to play in each time step. Each play of a bandit yields a reward whose distribution is not known in advance. Several algorithms have been proposed for the multi-armed bandit problem whose average expected reward also converges to the optimal. If we associate each bandit with a prefix of the keywords, we can show that in expectation these algorithms converge to near-optimal profits with the same performance guarantee as our static case. The convergence rates of these algorithms, however, depend on the number of possible keywords N, which might be extremely large in our case. In contrast, the convergence rate of our prefix-based adaptive algorithm is independent of N; it instead depends primarily on the value of the largest ℓ such that the expected cost of the prefix {1, . . . , ℓ} does not exceed our budget. We believe that in practice the value of ℓ will be significantly smaller than the total number of keywords N. Furthermore, our dependence on T is much better than that of the multi-armed bandit algorithms, so that our algorithm converges faster. Finally, when compared to two standard multi-armed bandit algorithms, extensive numerical simulations show that in as little as 30 time periods our algorithm outperforms these two algorithms, yielding significantly better profits. In one set of simulations, our profits were about 7% better, and in another about 20% better.
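The prefix idea described above can be sketched in code. This is our own simplified illustration (a greedy expected-spend cutoff), not the paper's exact threshold; each keyword is a tuple (name, profit, cost, clickthru probability, query probability), and mu is the expected number of queries per period.

```python
def best_prefix(keywords, mu, budget=1.0):
    """Sketch of prefix selection: sort keywords by profit-to-cost
    ratio and take the longest prefix whose expected spend,
    mu * sum_i c_i * p_i * lambda_i, stays within the budget.
    Each keyword is a tuple (name, profit, cost, p, lam)."""
    ranked = sorted(keywords, key=lambda kw: kw[1] / kw[2], reverse=True)
    prefix, expected_spend = [], 0.0
    for name, profit, cost, p, lam in ranked:
        expected_spend += mu * cost * p * lam   # expected cost added by this keyword
        if expected_spend > budget:
            break
        prefix.append(name)
    return prefix
```

Doubling the budget in this sketch simply extends the prefix further down the ratio-sorted list, which is why the decision space collapses from all subsets to at most N candidate prefixes.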
To the best of our knowledge, this work is the first publicly available study that addresses the problem of identifying profitable sets of keywords, realistically taking into account the uncertainty of the clickthru probability and the need to estimate its value based on historical performance. While there has been a great deal of interest from researchers in the area of search-based advertising services, much of it has focused on the design of auctions and payment mechanisms [1, 5, 6, 9, 11, 18, 21]. Our paper is one of only a few that focuses on the users of search-based advertising. Kitts et al. [12] consider the problem of finding optimal bidding strategies under a variety of objectives, but only in our static setting in which the clickthru probabilities are known in advance. We believe that removing this assumption is a crucial step towards a usable algorithm. Furthermore, Kitts et al. do not offer an algorithm for finding an optimal set of keywords when the objective is to maximize expected profit subject to a budget constraint, which again is needed given the requirements of using a service such as Google Adwords. We have attempted to construct a model of the problem that is simultaneously tractable and realistic, justifying our choices with real data whenever possible. Our resulting algorithm is simple to state and code, and provides good results in simulations, beating out several potential competitors. Our paper is structured as follows. We present our model in Section 2. We analyze the static case in Section 3. We discuss the connection of our problem with multi-armed bandits in Section 4.1, then give our algorithm in Section 4.2. We give the results of an experimental study comparing our algorithm with multi-armed bandit algorithms in Section 5. We discuss some possible extensions of our model in Section 6. Because of space constraints, many proofs and supporting figures have been placed in the appendix. 
We conclude our introduction by mentioning some other related literature. During the past few years, there has been resurgent interest in the study of online decision making with extremely large decision spaces [2, 13, 14, 24]. However, the algorithms developed in these papers are problem-specific and are not applicable to the problem considered here. We should note that Mehta et al. [19] have considered the problem faced by a search engine of how to optimally match incoming queries with the bids placed by advertisers. This problem is very different from the one studied in this paper; our focus is on the company trying to advertise rather than on the search engine providing the advertising service.

2. PROBLEM FORMULATION AND MODEL DESCRIPTION

In this section, we develop a model of search-based advertising services and formulate the problem of identifying profitable sets of keywords. Sections 2.1 and 6 provide additional discussion of the problem formulation, justification of the model's assumptions, and discussion of the publicly available data sources that can be used to estimate some of the model's parameters.

At the beginning of each time period, we start with a balance of B dollars and must select a set of keywords, which will determine the set of ads that will be shown to consumers. As a search query arrives, if the query matches one of the selected keywords and we have enough money remaining in the account, the ad associated with the keyword appears on the search result page. Each display of an ad is called an impression. If the consumer then clicks on the ad, we receive a profit, pay the cost associated with the keyword to the search engine, update our remaining balance, and wait for the arrival of another search query. The process terminates once all search queries have arrived for the time period. By appropriate scaling, we may assume without loss of generality that the beginning balance in each time period is $1.

Assume that we have a probability space (Ω, F, P) and N keywords indexed by 1, 2, . . . , N. For any t ≥ 1, let S^t denote the total number of search queries that arrive in period t. We assume that S^1, S^2, . . . are independent and identically distributed random variables with mean 1 < µ < ∞ and that the distribution of S^t is known in advance. We assume that, in each time period, the search queries arrive sequentially, and each query can correspond to any one of the N keywords. For any t ≥ 1 and r ≥ 1, let the random variable Q_r^t denote the keyword associated with the rth search query in period t. We assume that (Q_r^t : t ≥ 1, r ≥ 1) are independent and identically distributed random variables with P{Q_r^t = i} = λ_i for i = 1, . . . , N, where Σ_{i=1}^{N} λ_i = 1. We assume that the λ_i's are known in advance. In Section 2.1, we will show an approach for estimating the distribution of S^t and the parameters λ_i using publicly available information provided by search engines.

For each keyword 1 ≤ i ≤ N, let 0 ≤ p_i ≤ 1, c_i > 0, and π_i ≥ 0 denote the clickthru probability, the ad cost or cost-per-click (CPC), and the expected profit (per click) of keyword i, respectively. We will assume that the CPC and the expected profits (c_i and π_i) are known in advance and remain constant. Although the CPC of each keyword depends on the bidding behaviors of other users, we believe this requirement is a reasonable model in the short term. In Section 2.1, we will provide additional discussion of this assumption and show how we can approximate the current CPC associated with each keyword using publicly available information provided by search engines.

Let {1, . . . , N} denote the set of possible keywords on which we can bid. Let 1(·) denote the indicator function, and let the random variable B_r^{A_t} denote the remaining balance when the rth search query appears, given that we have decided to select the keywords in the set A_t ⊆ {1, . . . , N} in period t. Let X_{ri}^t be a Bernoulli random variable with parameter p_i indicating whether or not the consumer clicks on the ad associated with keyword i upon the arrival of the rth search query in period t. If we select the set of keywords A_t ⊆ {1, . . . , N} in period t, the expected profit E[Z_{A_t}] is given by

E[Z_{A_t}] = E[ Σ_{r=1}^{S^t} Σ_{i=1}^{N} π_i 1(i ∈ A_t, Q_r^t = i, B_r^{A_t} ≥ c_i, X_{ri}^t = 1) ].    (1)

We assume that, for any i, the random variables (X_{ri}^t : t ≥ 1, r ≥ 1) are independent and identically distributed and are independent of the arrival process. The above expression for the expected profit reflects our model of the search-based advertising dynamics. Under this model, we receive a profit from keyword i upon the arrival of the rth search query in period t if and only if (a) we bid on keyword i at the beginning of period t, i.e., i ∈ A_t; (b) the rth query corresponds to keyword i, i.e., Q_r^t = i; (c) we have enough balance in the account to pay for the cost of the keyword if the consumer clicks on the ad, i.e., B_r^{A_t} ≥ c_i; and (d) the consumer actually clicks on the ad, i.e., X_{ri}^t = 1.

As mentioned in the previous section, the clickthru rate p_i for any keyword i is generally not known in advance, and we can obtain an estimate of its value only by selecting the keywords and observing the resulting impressions and clicks. Therefore, we must balance the tradeoffs between choosing keywords that seem to yield high average profits based on past performance and trying previously unused keywords in order to learn about their clickthru probabilities. To analyze the tradeoff between exploitation and exploration, let us introduce the following notation. Let (F_t : t ≥ 1) denote a filtration on the probability space (Ω, F, P), where F_t corresponds to all events that have happened up to time t. A policy φ = (A_1, A_2, . . .) is a sequence of random variables, where A_t ⊆ {1, 2, . . . , N} corresponds to the set of keywords selected in period t.
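The per-period dynamics (a)–(d) above can be sketched in a small Monte Carlo simulation. This is our own illustrative mock-up of the model, not the authors' code; the function and parameter names are hypothetical.

```python
import random

def simulate_period(selected, lam, p, profit, cost, budget, num_queries):
    """One period of the model: queries arrive sequentially; query r
    asks for keyword i with probability lam[i]; if i is selected and
    the remaining balance covers cost[i], the ad is displayed (an
    impression) and is clicked with probability p[i], in which case we
    earn profit[i] and pay cost[i]."""
    keywords = list(lam.keys())
    weights = [lam[i] for i in keywords]
    balance, total_profit = budget, 0.0
    for _ in range(num_queries):
        i = random.choices(keywords, weights)[0]   # (b) which keyword is queried
        if i in selected and balance >= cost[i]:   # (a) selected and (c) affordable
            if random.random() < p[i]:             # (d) consumer clicks
                total_profit += profit[i]
                balance -= cost[i]
    return total_profit
```

With a single always-queried, always-clicked keyword of cost $0.50 and profit $2.00 on a $1 budget, the budget supports exactly two clicks, so the period profit is $4.00 regardless of how many further queries arrive.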
We will focus exclusively on a class of non-anticipatory policies, defined as follows.

Definition 1. A policy φ = (A_1, A_2, . . .) is non-anticipatory if A_t is F_t-measurable for all t ≥ 1.

Let Π denote the collection of all non-anticipatory policies. For any T ≥ 1, we aim to find a non-anticipatory policy that maximizes the expected profit over T periods, i.e., sup_{φ∈Π} Σ_{t=1}^{T} E[Z_{A_t}]. In general, the above optimization is intractable, and we aim to find an approximation algorithm that yields near-optimal expected profit.

2.1 Model Calibration

Although we aim to estimate adaptively the clickthru probability associated with the ad of each keyword based on past performance, our formulation assumes that we can estimate a priori the arrival process, the query distributions, and the current CPC of each keyword. In this section, we identify publicly available data sources that might be used to estimate these parameters. Figure 4(a) in Appendix A shows the estimated number of search queries on Yahoo! in September 2005 related to the keyword "cheap Europe trip". This data is obtained through the publicly available Keyword Selector tool provided by the Yahoo! Sponsored Search program (http://inventory.overture.com/d/searchinventory/suggestion/). Using this information, it might be possible to estimate the probability that each query will correspond to a particular keyword. Our model also assumes that the cost-per-click (CPC) associated with each keyword is known in advance. Although the actual CPC depends on the outcome of the bidding behaviors of other users, as indicated in Section 2, we believe that this requirement is a reasonable assumption in the short term. Whereas the clickthru probability associated with the ad of each keyword can be estimated only by bidding on the keywords and observing the resulting impressions and clicks, it is possible to estimate the CPC associated with each keyword using data provided by search engines.
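One simple possibility, sketched below under our own naming, is to summarize a keyword's publicly visible outstanding bids by a few standard statistics (median, k-th highest, mean) and use one of them as a CPC estimate; this is a mock-up, not an interface any search engine provides.

```python
import statistics

def estimate_cpc(bids, k=3):
    """Summarize a list of outstanding bid prices for one keyword into
    three candidate CPC estimates: the median bid, the k-th highest
    bid (clamped to the list length), and the mean bid."""
    ordered = sorted(bids, reverse=True)
    kth = ordered[min(k, len(ordered)) - 1]   # k-th highest bid
    return {
        "median": statistics.median(bids),
        "kth_max": kth,
        "mean": statistics.mean(bids),
    }
```

For instance, bids of $0.10, $0.50, $0.30, and $0.20 give a median estimate of $0.25, a 3rd-highest estimate of $0.20, and a mean estimate of $0.275.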
For instance, Figure 4(b) shows an example of the current outstanding bids (as of 10/18/05) for the keyword “cheap Europe trip” under the Yahoo! Sponsored Search Marketing program. This information is publicly available through the Bid Tools offered by Yahoo! at http://uv.bidtool. overture.com/d/search/tools/bidtool/index.jhtml. Given the outstanding bids for a keyword, we might approximate the CPC of each keyword as the median bid price, the kth maximum bid price, or the average bid price. follows directly from the general expression given in Equation 1 in Section 2; we simply drop the superscript t denoting the time period. The following lemma allows us to simplify the objective function of the STATIC BIDDING problem; its proof appears in Appendix B. Lemma 1. For any A ⊆ {1, 2, . . . , N }, r ≥ 1, and keyword i ∈ A, n o n P Qr = i, BrA ≥ ci , Xri = 1S ≥ r = λi pi P BrA ≥ ci S ≥ r and E [ZA ] = 3.1 P i∈A πi λi pi P∞ r=1 E 1 BrA ≥ ci , S ≥ r o . Computational Complexity and Cost Assumptions Note that if pi = 1 for all i, then the consumer will always click on the ad. Thus, the problem becomes a variation of the stochastic knapsack problem, which is known to be NP-complete [7, 10]. Unless P = N P , it is thus unlikely that there exists a polynomial-time algorithm for solving the STATIC BIDDING problem. In this section, we identify special structure of our problem that will enable the development of an efficient approximation algorithm in Section 3.2. Throughout this paper, we will make the following assumptions on the ordering of keywords and their costs. Assumption 1. (a) The keywords are indexed such that π1 /c1 ≥ π2 /c2 ≥ · · · ≥ πN /cN . (b) PN i=1 ci pi λi > 1. (c) There exists k ≥ 1 and 0 ≤ α < 1 such that ci ≤ 1/k and λi µ ≤ kα for all i. Before we proceed to the statement of our theorem, let us first discuss each of these assumptions. The first two assumptions are without loss of generality. 
Assumption 1(a) assumes that the keywords are sorted by a prefix ordering – in a descending order of profit-to-cost ratio. Moreover, by adding fictitious keywords with Pzero profit, we can assume without loss of generality that N i=1 ci pi λi > 1, which is the statement of Assumption 1(b). Thus our key assumption is Assumption 1(c). This assumption places constraints on the magnitude of the cost 3. A STATIC SETTING: WHEN THE PI ARE and the expected number of arrivals associated with each keyword. The requirement that ci ≤ 1/k implies that each KNOWN click of the ad associated with keyword i will cost at most In this section, we will focus on the static setting, where 1/k fraction of the total budget. As demonstrated in Figthe clickthru probability associated with the ad of each keyure 5 in Appendix A, we believe that this assumption is word is known in advance. The structure of the policy found a reasonable requirement. Figure 5 shows the distribution in this section will guide the development of the adaptive of the median bid price for each of the 842 travel-related policy in the next section. keywords. These keywords were chosen based on the auSince we know the clickthru probability associated with thors’ experience in working with an online seller of travel the ad for each keyword, the problem reduces to the following static optimization, which we will call STATIC BIDpackages. Examples of the keywords include “hawaii vaDING problem. cation package”, “africa safari”, and “london discount airfare”. A complete list of 842 keywords used in this study Z∗≡ max E [ZA ] A⊆{1,...,N } ∼ # is available at http://www.orie.cornell.edu/ paatrus/ " S N keywordlist.txt. The data is collected over a 2-day period XX = max E πi 1 i ∈ A, Qr = i, BrA ≥ ci , Xri = 1 .(10/17/05-10/18/05) from the publicly available Bid Tool A⊆{1,...N } r=1 i=1 offered by the Yahoo! 
Sponsored Search program at http:// where E [ZA ] denotes the expected profit from selecting keyuv.bidtool.overture.com/d/search/tools/bidtool/index. words in A. The above expression of the objective function jhtml. As seen from the figure, the average median bid 3.2 In addition, for our approximation algorithm to perform well, the value α should be small. Recall from Assumption 1 that for any keyword i, λi µ ≤ kα . Thus, α measures the average number of search queries for each keyword i. As shown in Figure 6 in Appendix A, most keywords have an average of less than 20 queries per day, leading us to believe that this requirement should hold for a large proportion of keywords. Figure 1 shows the value of the performance bound given in Theorem 1 (with ρ = 1.0) for various values of α and k, with k ranging from 103 to 106 . We observe, for instance, that if k is 1000, α is 0.2, and ρ is 1, Theorem 1 shows that the algorithm is within 0.68 of the optimal profit. Performance Bound for Different Values of k and α (ρ = 1.0) 1 An Approximation Algorithm The main result of this section is stated in the following theorem, whose proof is given in Section 3.3. Theorem 1. Let k and α be defined as in Assumption 1. Suppose the probabilities pi are known and 1/k1−α ≤ 1 − 1/k − 1/k(1−α)/3 . Let IU be defined by ( IU = max ` : µ X̀ ci pi λi ≤ 1 − 1 1 − (1−α)/3 k k If ρ = E[min {S, µ}]/µ, P = {1, 2, . . . , IU }, and Z denotes the optimal profit, then 1 k(1+2α)/3 − 0.8 k = 104 0.7 0.6 k = 103 0.5 0.4 0.3 0.2 0.1 0 0 0.1 0.2 0.3 0.4 2 k(1−α)/3 0.5 0.6 0.7 0.8 0.9 1 α . ∗ k = 105 ) i=1 ρ 1− k = 106 0.9 P er fo r m an ce B o u n d price for a keyword is $0.30-$0.35. A daily budget of $300$350/day will translate to a value of k around 1000. Of course, as the daily budget increases, so will the value of k. The second part of Assumption 1(c) requires that the expected number of search queries for keyword i, λi µ, is at most kα for some 0 ≤ α < 1. 
Although the expected number of search queries for a keyword can vary dramatically, we believe that this assumption should hold for a large proportion of keywords, as shown in Figure 6 in Appendix A. Figure 6 shows the distribution of the average daily searches on Yahoo! for the same 824 travel-related keywords during September 2005. This information is obtained through the publicly available Keyword Selection tool offered by the Yahoo! Sponsored Search program at http://inventory. overture.com/d/searchinventory/suggestion/. As seen from the figure, for over 80% of the keywords, the average number of daily searches is at most 20 searches per day. If the value of k is about 1000, this translates to the value of α of around 0.43. Figure 1: Plot of the performance bound given in Theorem 1 for various values of k and α (with ρ = 1.0). Z ∗ ≤ E [ZP ] ≤ Z ∗ . Before we proceed to the proof of the theorem, let us discuss the implications of the above result. The above theorem shows that the quality of the prefix-based approximation algorithm depends on two factors. The first factor, E [min {S, µ}] /µ, measures how tightly the distribution of S concentrates around its mean. In the most important example where S follows a Poisson distribution, this ratio can be computed explicitly and it is given in the following lemma, whose proof appears in Appendix C. Lemma 2. If S is a Poisson random variable with mean √ µ > 1, then 1 − 3/ µ ≤ E [min {S, µ}] /µ ≤ 1. 3.3 Proof of Theorem 1 The proof of Theorem 1 relies on the following lemmas. The first lemma establishes an upper bound on the optimal expected profit in terms of the optimal value of a linear program. Lemma 3. Let Z LP be defined by ( Z LP ≡ max N X i=1 ∗ Then, Z ≤ Z LP N X πi λi pi zi ) ci λi pi zi ≤ 1, 0 ≤ zi ≤ µ, ∀i i=1 . Proof. Consider any A ⊆ {1, 2, . . . N }. Let the random variable CA denote the total cost associated with bidding on keywords in A. 
By definition, we know that $C_A \leq 1$ with probability 1, which implies that

$$1 \;\geq\; E[C_A] \;=\; E\left[\sum_{r=1}^{S}\sum_{i=1}^{N} c_i\, \mathbf{1}\left(i \in A,\; Q_r = i,\; B_r^A \geq c_i,\; X_{ri} = 1\right)\right] \;=\; \sum_{r=1}^{\infty}\sum_{i\in A} c_i \lambda_i p_i\, E\left[\mathbf{1}\left(B_r^A \geq c_i,\; S \geq r\right)\right],$$

where the last equality follows from Lemma 1. Note that for any $i$, $\sum_{r=1}^{\infty} E[\mathbf{1}(B_r^A \geq c_i, S \geq r)] \leq \sum_{r=1}^{\infty} E[\mathbf{1}(S \geq r)] = E[S] = \mu$. For any $1 \leq i \leq N$, let $0 \leq z_i \leq \mu$ be defined by $z_i = \mathbf{1}(i \in A)\cdot\sum_{r=1}^{\infty} E[\mathbf{1}(B_r^A \geq c_i, S \geq r)]$. Then we have that $\sum_{i=1}^{N} c_i p_i \lambda_i z_i \leq 1$ and also that

$$E[Z_A] \;=\; E\left[\sum_{r=1}^{S}\sum_{i=1}^{N} \pi_i\, \mathbf{1}\left(i \in A,\; Q_r = i,\; B_r^A \geq c_i,\; X_{ri} = 1\right)\right] \;=\; \sum_{r=1}^{\infty}\sum_{i\in A} \pi_i \lambda_i p_i\, E\left[\mathbf{1}\left(B_r^A \geq c_i,\; S \geq r\right)\right] \;=\; \sum_{i=1}^{N} \pi_i \lambda_i p_i z_i.$$

Since $A$ is arbitrary, the desired result follows.

In most applications, the expected number of search queries $\mu$ will be very large, ranging from hundreds of thousands to millions of queries per day. If the arrivals follow a Poisson distribution, the ratio $E[\min\{S,\mu\}]/\mu$ will be very close to one. The second factor that influences the quality of our approximation algorithm is the value of $k$ and $\alpha$. Since $0 \leq \alpha < 1$, as $k$ – the relative value between the budget and the CPC for each keyword – increases, the expected profit will be close to the optimal. As indicated in the previous section, we believe that this is a reasonable assumption. If the budget for a campaign ranges from \$200 to \$400 per day and the maximum CPC for each keyword ranges from 30 to 50 cents, this translates to a value of $k$ from 400 to 1200.

The next lemma relates the optimal value $Z^{LP}$ of the linear program defined in Lemma 3 to the expected profit obtained when we select keywords according to the prefix ordering; its proof is in Appendix D.

Lemma 4. Let $H = \{1, \ldots, u\}$ where $u$ is any positive integer such that $0 < \mu \sum_{i=1}^{u} c_i \lambda_i p_i \leq 1 - 1/k - 1/k^{(1-\alpha)/3}$, so that $u \leq I_U$. Let $Z^{LP}$ denote the optimal value of the linear program defined in Lemma 3. Then,
$$\sum_{i\in H} \pi_i\lambda_i p_i \;\geq\; \left(\sum_{i\in H} c_i\lambda_i p_i\right) Z^{LP} \;\geq\; \left(\sum_{i\in H} c_i\lambda_i p_i\right) Z^{*}.$$

The final lemma provides a lower bound on the probability $P\{B_r^H \geq c_i \mid S \geq r\}$ that we have enough money remaining if the $r$th search query corresponds to keyword $i$, given that we bid on a set of keywords $H$ and $r \leq \mu$. Since the proof is fairly technical, the details are given in Appendix E.

Lemma 5. Let $H = \{1, \ldots, u\}$ where $u$ is any positive integer such that $0 < \mu \sum_{j=1}^{u} c_j \lambda_j p_j \leq 1 - 1/k - 1/k^{(1-\alpha)/3}$, so that $u \leq I_U$. Then, for any keyword $i \in H$ and $1 \leq r \leq \mu$,
$$P\left\{B_r^H \geq c_i \,\middle|\, S \geq r\right\} \;\geq\; 1 - \frac{\mu \sum_{i\in H} c_i\lambda_i p_i / k}{\left(1 - \frac{1}{k} - \mu\sum_{i\in H} c_i\lambda_i p_i\right)^2} \;\geq\; 1 - \frac{1}{k^{(1+2\alpha)/3}}.$$

Here is the proof of Theorem 1.

Proof. Note that since by hypothesis $1/k^{1-\alpha} \leq 1 - 1/k - 1/k^{(1-\alpha)/3}$, it follows that $I_U$ is well-defined. It follows from Lemma 5 that for any keyword $i \in P$ and $1 \leq r \leq \mu$, $P\{B_r^P \geq c_i \mid S \geq r\} \geq 1 - 1/k^{(1+2\alpha)/3}$. Therefore,
$$E[Z_P] \;=\; \sum_{r=1}^{\infty}\sum_{i\in P} \pi_i\lambda_i p_i\, P\left\{B_r^P \geq c_i,\, S \geq r\right\} \;\geq\; \sum_{r=1}^{\lfloor\mu\rfloor}\sum_{i\in P} \pi_i\lambda_i p_i\, P\left\{B_r^P \geq c_i,\, S \geq r\right\} \;\geq\; \left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\left(\sum_{r=1}^{\lfloor\mu\rfloor} P\{S \geq r\}\right)\sum_{i\in P} \pi_i\lambda_i p_i$$
$$\geq\; \frac{E[\min\{S,\mu\}]}{\mu}\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\mu\sum_{i\in P} \pi_i\lambda_i p_i \;\geq\; \frac{E[\min\{S,\mu\}]}{\mu}\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right) Z^{*}\,\mu\sum_{i\in P} c_i\lambda_i p_i \;\geq\; \frac{E[\min\{S,\mu\}]}{\mu}\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\left(1 - \frac{1}{k} - \frac{1}{k^{(1-\alpha)/3}} - \frac{1}{k^{1-\alpha}}\right) Z^{*},$$
where the second inequality follows from Lemma 5, the third inequality follows from Lemma 2, the fourth inequality follows from Lemma 4, and the final inequality follows from Assumption 1(c), which implies that, for any keyword $i$, $\mu c_i\lambda_i p_i \leq 1/k^{1-\alpha}$; it then follows from the definition of $I_U$ that $\mu\sum_{i=1}^{I_U} c_i\lambda_i p_i \geq 1 - 1/k - 1/k^{(1-\alpha)/3} - 1/k^{1-\alpha}$. Since $k \geq 1$ and $0 < \alpha < 1$, $1/k^{(1-\alpha)/3} \geq 1/k^{1-\alpha}$, and we have that
$$\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\left(1 - \frac{1}{k} - \frac{2}{k^{(1-\alpha)/3}}\right) \;\geq\; 1 - \frac{1}{k^{(1+2\alpha)/3}} - \frac{2}{k^{(1-\alpha)/3}} + \frac{2}{k^{(2+\alpha)/3}} - \frac{1}{k} \;\geq\; 1 - \frac{1}{k^{(1+2\alpha)/3}} - \frac{2}{k^{(1-\alpha)/3}},$$
which is the desired result.

4. AN ADAPTIVE SETTING: WHEN THE $p_i$ ARE UNKNOWN

In this section, we consider the case when the clickthru probabilities are not known in advance. To estimate these probabilities, we need to select keywords and observe the resulting impressions and clicks, a process that can potentially result in significant costs. We thus need to carefully balance the tradeoff between exploration and exploitation. In Section 4.1, we show how this problem can be formulated as a multi-armed bandit problem, where each bandit corresponds to a subset of keywords. We then aim to develop a strategy that adaptively selects bandits that yield near-optimal profits.

By formulating the problem as an instance of the multi-armed bandit problem, we can apply existing adaptive algorithms developed for this class of problems. Unfortunately, existing multi-armed bandit algorithms have performance guarantees that deteriorate as the number of keywords increases, motivating us to develop an alternative adaptive approximation algorithm that leverages the special structure of our problem. In Section 4.2, we describe our improved algorithm and show that it exhibits a superior performance bound compared with traditional multi-armed bandit algorithms. We will show that the quality of the solution generated by our algorithm is independent of the number of keywords and scales gracefully with the problem's other parameters. Then, in Section 5, we show through extensive simulations that our algorithm outperforms traditional multi-armed bandit algorithms, increasing profits by about 7%.

4.1 A Traditional Multi-Armed Bandit Approach

Theorem 1 shows that by considering only subsets of keywords that follow the prefix ordering – a descending order of profit-to-cost ratio – we can achieve near-optimal profits. This result enables us to reduce the size of the decision space from $2^N$ (all possible subsets of $N$ keywords) to just $N$. Assume that the keywords are indexed as in Assumption 1. For any $1 \leq i \leq N$, let $D_i = \{1, 2, \ldots, i\}$. We can view each $D_i$ as a decision whose payoff corresponds to the expected profit that results from selecting keywords in $D_i$, i.e. $E[Z_{D_i}]$.
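The reduction from subsets to prefixes is straightforward to implement. The sketch below is our own illustrative code, not the paper's; the `Keyword` fields mirror the model's parameters $c_i$, $\pi_i$, $\lambda_i$, $p_i$. It builds the nested decisions $D_1, \ldots, D_N$ by sorting keywords in descending order of profit-to-cost ratio:

```python
from dataclasses import dataclass

@dataclass
class Keyword:
    cost: float     # c_i: cost per click
    profit: float   # pi_i: profit per click
    arrival: float  # lambda_i: probability that a query is for this keyword
    ctr: float      # p_i: clickthru probability

def prefix_decisions(keywords):
    """Sort keywords in descending order of profit-to-cost ratio and
    return the N nested prefix decisions D_1 ⊆ D_2 ⊆ ... ⊆ D_N."""
    ranked = sorted(keywords, key=lambda w: w.profit / w.cost, reverse=True)
    return [ranked[: i + 1] for i in range(len(ranked))]
```

With $N$ keywords this yields only $N$ candidate decisions instead of $2^N$ subsets, so a bandit algorithm only has to choose among prefix indexes.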
Let $Z^{\mathrm{prefix}}$ denote the maximum profit among decisions $D_1, \ldots, D_N$, i.e. $Z^{\mathrm{prefix}} = \max_{1 \leq i \leq N} E[Z_{D_i}]$. Viewing each $D_i$ as a bandit whose reward corresponds to the profit from selecting keywords in $D_i$, we can then formulate our problem as a multi-armed bandit problem.

The multi-armed bandit problem has received much attention and many algorithms exist for this class of problems. Two such algorithms are UCB1 (see [3]) and EXP3 (see [4]). Both algorithms adaptively select a bandit in each time period based on past observations. Let $(Z_t(\mathrm{UCB1}) : t \geq 1)$ and $(Z_t(\mathrm{EXP3}) : t \geq 1)$ denote the sequences of profits obtained by the algorithms UCB1 and EXP3, respectively. The average expected profits generated by these algorithms satisfy (see [3, 4]):
$$\frac{\sum_{t=1}^{T} Z_t(\mathrm{UCB1})}{T\, Z^{\mathrm{prefix}}} \;\geq\; 1 - \frac{L\left(\alpha_N \log T + \beta_N\right)}{T\, Z^{\mathrm{prefix}}}, \qquad \frac{\sum_{t=1}^{T} Z_t(\mathrm{EXP3})}{T\, Z^{\mathrm{prefix}}} \;\geq\; 1 - \frac{3\sqrt[3]{L N \log N}}{\sqrt[3]{2T}\; Z^{\mathrm{prefix}}},$$
where $L \equiv \max\left\{\sum_{i=1}^{N} \pi_i x_i : \sum_{i=1}^{N} c_i x_i \leq 1,\; x_i \in \mathbb{N}\ \forall i\right\}$ is an upper bound on the maximum possible profit in any one period, $\alpha_N = \sum_{1 \leq i \leq N : \Delta_i > 0} 8/\Delta_i$, $\beta_N = \left(1 + \pi^2/3\right)\sum_{i=1}^{N} \Delta_i$, and for any $1 \leq i \leq N$, $\Delta_i = Z^{\mathrm{prefix}} - E[Z_{D_i}]$.

The above results show that, as $T$ increases to infinity, the average profit earned by the UCB1 and EXP3 algorithms converges to the maximum expected profit $Z^{\mathrm{prefix}}$ among $D_1, \ldots, D_N$. We know from Theorem 1 that $Z^{\mathrm{prefix}}$ provides a good approximation to $Z^*$, suggesting that the UCB1 and EXP3 algorithms may provide viable algorithms for solving our problem.

Unfortunately, these two generic algorithms for the multi-armed bandit problem ignore any special structure of the problem, and thus their rates of convergence depend on the number of keywords $N$. As the number of keywords $N$ increases, the rate of convergence decreases as well, clearly an undesirable feature in our setting, where we can have millions of keywords. In fact, the UCB1 algorithm requires us to try each of the $N$ possible decisions once during the first $N$ time periods. Clearly, for large values of $N$ as in our setting, this might not be feasible.

4.2 An Improved Adaptive Approximation Algorithm

Motivated by the result from the previous section, we will develop an alternative algorithm that exploits the special structure of our problem. We will show that our algorithm exhibits a performance bound that scales gracefully with the problem's parameters and is independent of the number of keywords. Then, through extensive simulations in Section 5, we will show that our algorithm outperforms both the UCB1 and EXP3 algorithms.

Our algorithm, which we will refer to as the ADAPTIVE BIDDING algorithm, is defined as follows.

ADAPTIVE BIDDING

• INITIALIZATION:
– For any $i$, let $y_i$ and $x_i$ denote the cumulative total number of impressions and the total number of clicks, respectively, that the ad associated with keyword $i$ has received. Set $y_i = x_i = 0$ for all $1 \leq i \leq N$. Set $\hat{p}_i^0 = 1$ for all $i$.
– Choose a sequence $(\gamma_t \in [0,1] : t \geq 1)$, where $\gamma_t$ will denote the probability that we choose a random decision at time $t$. We will later suggest good choices for the $\gamma_t$ (see the end of this section and Appendix F).

• DEFINITION: For $t = 1, 2, \ldots$
– Let $\ell_t$ be the index such that
$$\mu\sum_{u=1}^{\ell_t} c_u \lambda_u \hat{p}_u^{t-1} \;\leq\; 1 - \frac{1}{k} - \frac{2}{k^{(1-\alpha)/3}} \;<\; \mu\sum_{u=1}^{\ell_t + 1} c_u \lambda_u \hat{p}_u^{t-1}.$$
– Let $\theta_t$ denote an integer chosen uniformly at random from the set $\{1, 2, \ldots, N\}$.
– Let $F_t$ denote an independent binary random variable such that $P\{F_t = 1\} = 1 - \gamma_t$ and $P\{F_t = 0\} = \gamma_t$.
– Let the decision $g_t$ be defined by $g_t = \ell_t$ if $F_t = 1$, and $g_t = \theta_t$ otherwise.
– Let $G_t = \{1, 2, \ldots, g_t\}$. Bid on keywords in $G_t$.
– Observe the resulting impressions and clicks.
– Updates: For any keyword $i$, let $V_{it}$ and $W_{it}$ denote the number of impressions and clicks that keyword $i$ receives in this period, respectively. Then, for all $i$,
$$y_i \leftarrow y_i + V_{it}, \qquad x_i \leftarrow x_i + W_{it}, \qquad \hat{p}_i^t = \begin{cases} 1 & \text{if } y_i = 0, \\ x_i / y_i & \text{if } y_i > 0. \end{cases}$$

• OUTPUT: A sequence of decisions $(G_t : t \geq 1)$ and the corresponding sequence of indexes $(g_t : t \geq 1)$.

In the above algorithm, $\hat{p}_i^t$ represents our estimate, at the end of $t$ periods, of the clickthru probability of the ad associated with keyword $i$. Note that as long as the ad associated with keyword $i$ has not received any impressions (i.e. we have no data on the clickthru probability), we set $\hat{p}_i^t$ to 1. However, once the ad receives at least one impression, we set $\hat{p}_i^t$ to the average clickthru rate $x_i/y_i$.

For ease of exposition, let us introduce the following notation that will be used throughout the paper. Let $I_U$ be defined by
$$I_U = \max\left\{\ell : \mu\sum_{u=1}^{\ell} c_u \lambda_u p_u \leq 1 - \frac{1}{k} - \frac{1}{k^{(1-\alpha)/3}}\right\}.$$
Observe that $I_U$ is the same as the index of the prefix we would use in the algorithm of Theorem 1 if we knew the probabilities $p_i$. We will show below that we can state the convergence of our algorithm in terms of $I_U$ rather than $N$, the total number of keywords. Intuitively speaking, it does not make sense for us to choose too frequently prefixes whose index is larger than $I_U$, since this will cause us to exceed our budget.

The main result of this section is stated in the following theorem.

Theorem 2. Let $(G_t : t \geq 1)$ denote the sequence of decisions generated by the ADAPTIVE BIDDING algorithm. Let $\rho = E[\min\{S, \mu\}]/\mu$, and assume that $1/k^{1-\alpha} \leq 1 - 1/k - 1/k^{(1-\alpha)/3}$. Then, under Assumption 1, for any $T \geq 1$,
$$\frac{\sum_{t=1}^{T} E[Z_{G_t}]}{T\, Z^*} \;\geq\; \rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}} - \frac{4}{k^{(1-\alpha)/3}}\right) - \frac{\sum_{t=1}^{T}\gamma_t}{T} - \frac{M}{T},$$
where
$$M = \frac{32\mu I_U}{k\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)} + \frac{12 I_U^2 \sum_{i=1}^{I_U} \lambda_i\mu/\left(1 - e^{-\lambda_i\mu}\right)}{k^{(7-4\alpha)/3}} \;\leq\; \frac{32\mu I_U}{k\rho\left(1 - 1/\sqrt[3]{k}\right)} + \frac{12 I_U^2 \sum_{i=1}^{I_U} \lambda_i\mu/\left(1 - e^{-\lambda_i\mu}\right)}{k}.$$
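The per-period logic of ADAPTIVE BIDDING can be sketched as follows. This is our own illustrative code, not the paper's: the observe step (how impressions and clicks arrive) is left to the environment, and the function names are ours.

```python
import random

def choose_prefix_index(c, lam, p_hat, mu, k, alpha, gamma_t, rng=random):
    """One period of the ADAPTIVE BIDDING decision rule.  The lists c,
    lam, p_hat hold cost, arrival probability, and estimated clickthru
    probability, already indexed in descending profit-to-cost order."""
    threshold = 1.0 - 1.0 / k - 2.0 / k ** ((1.0 - alpha) / 3.0)
    # l_t: the longest prefix whose estimated expected spend stays below the threshold.
    l_t, spend = 0, 0.0
    for u in range(len(c)):
        spend += mu * c[u] * lam[u] * p_hat[u]
        if spend > threshold:
            break
        l_t = u + 1
    # With probability gamma_t, explore a uniformly random prefix instead.
    if rng.random() < gamma_t:
        return rng.randrange(1, len(c) + 1)
    return l_t

def update_estimate(y_i, x_i, v_it, w_it):
    """Fold this period's impressions v_it and clicks w_it for keyword i
    into the running totals; the estimate is 1 until the first impression."""
    y_i += v_it
    x_i += w_it
    p_hat = 1.0 if y_i == 0 else x_i / y_i
    return y_i, x_i, p_hat
```

Starting every estimate at $\hat{p}_i^0 = 1$ makes the rule conservative: inflated cost estimates keep early prefixes short, and the prefix grows as impressions accumulate.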
Before we proceed to the proof of the theorem, let us compare the performance bound of our adaptive bidding algorithm with those obtained from the UCB1 and EXP3 algorithms for the multi-armed bandit problem. It follows from the discussion in Section 4.1 and from Theorem 1 that, under the algorithms UCB1 and EXP3, we have the following performance bounds: let $\kappa \equiv \rho\left(1 - 1/k^{(1+2\alpha)/3} - 2/k^{(1-\alpha)/3}\right)$; then
$$\frac{\sum_{t=1}^{T} Z_t(\mathrm{UCB1})}{T\, Z^*} \;\geq\; \kappa - \frac{L\left(\alpha_N \log T + \beta_N\right)}{T\, Z^*}, \qquad \frac{\sum_{t=1}^{T} Z_t(\mathrm{EXP3})}{T\, Z^*} \;\geq\; \kappa - \frac{3\sqrt[3]{L N \log N}}{\sqrt[3]{2T}\; Z^*},$$
where $L$ denotes the maximum possible profit in any one period. When we compare the performance of UCB1 and EXP3 with the result of Theorem 2, we observe that there is a slight decrease in the guaranteed fraction of the optimal profit under our algorithm, from $2/k^{(1-\alpha)/3}$ to $4/k^{(1-\alpha)/3}$. However, the convergence rate of our algorithm is superior to that of both UCB1 and EXP3, as indicated below.

(a) Unlike its counterparts, the convergence rate of our algorithm in Theorem 2 is independent of the number of keywords $N$. Instead, it depends only on $I_U$, which reflects the maximum number of keywords (chosen based on a prefix ordering) whose expected cost does not exceed a $1 - 1/k - 1/k^{(1-\alpha)/3}$ fraction of the budget. In practice, $I_U$ should be significantly smaller than $N$.

(b) Although the convergence rate of our algorithm depends on the expected number of search queries $\mu$ in any given period, this quantity should be comparable to the constant $L$ – the maximum profit that can be obtained in any one period – appearing in the convergence rates of UCB1 and EXP3 above. To see this, note that by the definition of $L$, we have that
$$L \;\geq\; E\left[Z_{\{1,\ldots,I_U\}}\right] \;\geq\; \mu\rho\left(1 - \frac{1}{k^{(1+2\alpha)/3}}\right)\sum_{i=1}^{I_U} \pi_i\lambda_i p_i,$$
where the last inequality follows from the proof of Theorem 1. Moreover, the convergence rate of our algorithm depends on $\mu$ through the product $\mu I_U$. By definition, the value of $I_U$ decreases as $\mu$ increases, suggesting that the convergence rate of our algorithm should be insensitive to increases in the parameter $\mu$.

(c) With millions of available keywords, the arrival rate $\lambda_i\mu$ of some keywords can be extremely small. The convergence rate of our algorithm, however, is insensitive to small arrival rates because the expression $\lambda_i\mu/\left(1 - e^{-\lambda_i\mu}\right)$ converges to 1 as $\lambda_i\mu$ tends to 0.

(d) We should also point out that if we set $\gamma_t = 1/t^2$ for all $t \geq 1$, then $\sum_{t=1}^{\infty}\gamma_t < \infty$. This implies that the average reward generated by our algorithm over $T$ time periods converges to near-optimal profits at the rate of $1/T$, which is faster than the convergence rates of the other two algorithms.

(e) In their seminal paper, Lai and Robbins [16] established a lower bound on the minimum regret of any adaptive strategy, showing that for any non-anticipatory strategy $\phi$,
$$\sum_{t=1}^{T}\left\{Z^* - E[Z_t(\phi)]\right\} = \Omega(\log T),$$
where $E[Z_t(\phi)]$ denotes the expected profit in period $t$ under the strategy $\phi$. This result shows that the total expected profit over $T$ periods of any adaptive strategy must differ from the optimal expected profit by at least $\Omega(\log T)$; thus, the average expected reward per time period cannot converge to the optimal expected profit faster than $O((\log T)/T)$. We should note that our result in Theorem 2 is consistent with this lower bound. According to Theorem 2, we have that
$$\sum_{t=1}^{T}\left\{Z^* - E[Z_{G_t}]\right\} \;\leq\; \nu Z^* T + Z^*\left(\sum_{t=1}^{T}\gamma_t + M\right),$$
where $\nu = 1 - \rho\left(1 - 1/k^{(1+2\alpha)/3} - 4/k^{(1-\alpha)/3}\right)$. The upper bound on the right-hand side of the above inequality increases linearly with $T$: intuitively, the average expected profit under our algorithm converges to the optimal expected profit under the prefix ordering, which serves only as an approximation to the optimal expected profit (from among all subsets of keywords).

Finally, in addition to nicer theoretical properties, in Section 5, through simulations on many large problem instances, we will further show that our adaptive bidding algorithm outperforms both UCB1 and EXP3, increasing profits by an average of 7%.
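Point (d) is easy to check numerically: with $\gamma_t = 1/t^2$ the partial sums of the exploration probabilities stay below $\pi^2/6 \approx 1.645$, so the exploration term $\sum_{t\leq T}\gamma_t/T$ in Theorem 2 decays like $1/T$. A quick check (our own illustrative code):

```python
import math

def average_exploration(T):
    """(1/T) * sum_{t=1}^{T} gamma_t for the choice gamma_t = 1/t^2."""
    return sum(1.0 / t ** 2 for t in range(1, T + 1)) / T

# The partial sums are bounded by pi^2/6, so the average decays like 1/T.
for T in (10, 100, 1000):
    assert average_exploration(T) <= (math.pi ** 2 / 6) / T
```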
4.3 Sketch of the Proof of Theorem 2

We provide a sketch of the proof of Theorem 2. For more details, the reader is referred to Appendix G.

In Lemma 8 in Appendix G, we establish an upper bound on the probability that $\ell_t > I_U$, showing that for all $t \geq 1$, $P\{\ell_t > I_U\} \leq 1/k^{(1-\alpha)/3}$. For any $T \geq 1$, let the random variable $J_T \subseteq \{1, 2, \ldots, T\}$ denote the set of time periods when the index of the last keyword chosen based on the prefix ordering is less than or equal to $I_U$, i.e. $J_T = \{t \leq T : F_t = 1, \ell_t \leq I_U\}$. Note that when $F_t = 1$, the algorithm chooses the decision based on the prefix ordering, i.e. $g_t = \ell_t$. The result of Lemma 8 shows that $J_T$ will contain a large fraction of the time periods, enabling us to restrict our attention to time periods in $J_T$.

Let $\xi = 1 - 1/k^{(1+2\alpha)/3}$. For any keyword $i$, let $\zeta_{i,T} = E\left[\sum_{t\in J_T : g_t \geq i}\left(p_i - \hat{p}_i^{t-1}\right)\right]$ denote the expected cumulative error during time periods in $J_T$ between our estimate of the clickthru probability for keyword $i$ and its true value. Lemma 9 (in Appendix G) then relates the performance of our algorithm to the accuracy of our estimates of the clickthru probabilities associated with keywords $1, \ldots, I_U$ during time periods in $J_T$, showing that for any $T \geq 1$,
$$\frac{\sum_{t=1}^{T} E[Z_{G_t}]}{T\, Z^*} \;\geq\; \rho\left(\xi - \frac{3}{k^{(1-\alpha)/3}}\right) - \frac{\sum_{t=1}^{T}\gamma_t}{T} - \rho\xi\,\frac{\mu}{T}\sum_{i=1}^{I_U} c_i\lambda_i\,\zeta_{i,T}.$$

In Lemma 10 in Appendix G, we then show that, on average, the cumulative errors over time periods in $J_T$ do not increase with $T$. In fact, the sum is bounded by a small constant plus a term that vanishes to zero as $T$ increases, i.e. for any $T \geq 1$ and $0 < \delta < 1$,
$$\frac{\mu}{T}\sum_{i=1}^{I_U} c_i\lambda_i\,\zeta_{i,T} \;\leq\; \frac{1}{k^{(1-\alpha)/3}} + \frac{1}{T}\left(\frac{8\mu I_U}{\delta^2 k\rho^2\xi^2} + \frac{6 I_U^2\sum_{i=1}^{I_U}\lambda_i\mu/\left(1 - e^{-\lambda_i\mu}\right)}{k^{(7-4\alpha)/3}(1-\delta)\rho\xi}\right).$$
The result of Theorem 2 follows from the last two inequalities and from setting $\delta = 1/2$.

The intuition underlying the above inequality is the following. Consider any keyword $i \leq I_U$. Suppose $J_T$ consists of $t_1 < \ldots < t_w$, and consider any $t_{h+1} \in J_T$. During the previous $h$ periods $t_1, \ldots, t_h$, the total expected number of queries associated with keyword $i$ is $\lambda_i\mu h$. By the definition of $J_T$, the prefix chosen during these $h$ periods has final index at most $I_U$. We can thus apply Lemma 5 to show that with probability at least $\xi$ we have enough budget to display ads for each of these $\lambda_i\mu h$ queries of keyword $i$. Hence, the expected cumulative number of impressions that the ad for keyword $i$ will receive just before period $t_{h+1}$ is at least $\Omega(\xi\lambda_i\mu h)$. We can then apply a Chernoff bound to show that, at time $t_{h+1}$, the error between our empirical estimate of the probability and its true value is at most $O\left(e^{-\xi\lambda_i\mu h}\right)$, declining to zero exponentially in $h$. Thus, the total error across all time periods in $J_T$ (summing over $h$) will be finite and independent of $T$.

5. EXPERIMENTS

In the previous section, we developed an alternative algorithm that exploits the prefix ordering of the keywords. We showed that the average expected profit generated by the algorithm converges to near-optimal profits and that the convergence rate is independent of the number of keywords. In this section, we compare the performance of the algorithm against the UCB1 and EXP3 algorithms for multi-armed bandit problems.

We ran two experiments: SMALL and LARGE. Each experiment has 100 randomly generated problem instances. In the SMALL experiment, each problem instance has 8,000 keywords, 200 time periods, a \$400 budget per period, and a number of search queries in each time period with a Poisson distribution with a mean of 40,000. In the LARGE experiment, each problem instance has 50,000 keywords, 200 time periods, a \$1,000 budget per period, and a number of search queries in each time period with a Poisson distribution with a mean of 150,000. (Since the mean number of search queries $\mu$ is very large, to speed up the simulation we approximate the Poisson distribution using a Gaussian distribution with mean $\mu$ and standard deviation $\sqrt{\mu}$.)

In both sets of experiments, for each problem instance, the cost-per-click of each keyword is chosen uniformly at random from the interval [0.1, 0.3) and the profit-per-click of each keyword is chosen uniformly at random from the interval [0, 1). The clickthru rate associated with each keyword is a random number from [0.0, 0.2). To generate the probability that a search query will correspond to each keyword, we assign random numbers (chosen independently and uniformly from [0, 1)) to all keywords and normalize them.

For each problem instance, we compare our algorithm with EXP3 and UCB1, two standard algorithms for the multi-armed bandit problem (see Section 4.1 for more details). For EXP3, we tested three different settings of the exploration probability – 1%, 5%, and 10% – and since all three settings exhibited similar performance, we set the exploration probability to 5%. The UCB1 algorithm requires us to try each of the $N$ prefix sets once during the first $N$ periods; we choose these sets by random sampling from among those that have not been tried. For our prefix-based algorithm, we set the randomization probability $\gamma_t$ to $\gamma_t = 1/t^2$ for all $t \geq 1$.

Figure 2 (a) and (b) shows the simulation results for the SMALL experiment. Figure 2(a) shows the average profit over time for all three algorithms, averaged over all 100 problem instances in the SMALL experiment. The dashed lines above and below the solid lines correspond to the 95% confidence intervals. As seen from the figure, our prefix-based algorithm outperforms both the UCB1 and EXP3 algorithms, yielding an increase of about 7% in average profits. Figure 2(b) shows the distribution over the 100 problem instances (in the SMALL experiment) of the relative difference between the linear programming upper bound (Lemma 3) and the profit under our algorithm. As seen from the figure, the profit generated by our algorithm differs from the linear programming bound by about 30%. Since the optimal solution of the linear program provides an upper bound on the optimal expected profit, this result implies that the average profit generated by our algorithm differs from the optimal by at most 30%.

Note that for the SMALL experiment, the value of $k$ is about 1200 and the value of $\alpha$ is around 0.33 (see Assumption 1 for more details). The bound of Theorem 2 implies that for these values our algorithm should converge to at least .16 of the linear programming bound, but our simulation results show that our algorithm performs much better than this. As indicated by Theorems 1 and 2, as the value of $k$ increases and the value of $\alpha$ decreases, the average profit should approach the optimal. This is confirmed by the simulation results for the LARGE experiment, shown in Figure 2 (c) and (d). For the LARGE experiment, the values of $k$ and $\alpha$ are around 3,333 and 0.22, respectively. For these values, the bound of Theorem 2 implies that our algorithm should converge to at least .61 of the linear programming bound, and again in our simulations we are doing better than this. In Figure 2(c), we see that the average profit generated by our algorithm is about 20% higher than the profits under the UCB1 and EXP3 algorithms. In addition, as shown in Figure 2(d), the average profit differs from the optimal by at most 20%.

Appendix F provides additional discussion of the impact of different choices of the randomization parameters $\gamma_t$ and of the initialization of the estimated clickthru probabilities.

6. MODEL DISCUSSIONS AND EXTENSIONS

In this section, we provide additional discussion of the strengths and weaknesses of our formulation, along with possible extensions of our model. First, in most search-based advertising services, changing the bid price of each keyword will alter the position of the ad on the search result page.
This work focuses primarily on identifying profitable subsets of keywords; our formulation assumes that the bid price of each keyword remains constant and that the ad will always appear in the same spot on the search result page.

Figure 2: The figures in the top row ((a) and (b)) and the bottom row ((c) and (d)) show the results for the SMALL and LARGE experiments, respectively. The figures in the left-hand column ((a) and (c)) show a comparison of the average profit over time under UCB1, EXP3, and our prefix-based algorithm; the dashed lines represent 95% confidence intervals. The figures in the right-hand column ((b) and (d)) show the relative difference between the linear programming bound and the average profit generated by our algorithm over the 100 problem instances.
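For concreteness, the random instance generation used in the experiments of Section 5 can be sketched as follows. This is our own illustrative code under the stated distributions; the Gaussian query count mirrors the Poisson approximation described there.

```python
import random

def generate_instance(n_keywords, rng):
    """One random problem instance following Section 5: CPCs uniform on
    [0.1, 0.3), profits per click uniform on [0, 1), clickthru rates
    uniform on [0.0, 0.2), and normalized query probabilities."""
    cost = [rng.uniform(0.1, 0.3) for _ in range(n_keywords)]
    profit = [rng.random() for _ in range(n_keywords)]
    ctr = [0.2 * rng.random() for _ in range(n_keywords)]
    raw = [rng.random() for _ in range(n_keywords)]
    total = sum(raw)
    arrival = [x / total for x in raw]   # query probabilities sum to one
    return cost, profit, ctr, arrival

def sample_query_count(mu, rng):
    """Queries per period: Gaussian approximation to Poisson(mu) with
    mean mu and standard deviation sqrt(mu), valid for large mu."""
    return max(0, round(rng.gauss(mu, mu ** 0.5)))
```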
Incorporating the impact of the ad's position on the search result page would allow for a richer class of models. Finding an appropriate model that enables the development of efficient computational methods remains an open question. Another open question is that of finding an appropriate model for competitive bidding for keywords. Our model assumes that the cost of a keyword is fixed. Of course, these costs depend on what others bid for the same keywords. Incorporating competitive aspects into our model is an interesting and challenging research direction.

7. REFERENCES
[1] G. Aggarwal and J. D. Hartline, "Knapsack Auctions," Workshop on Sponsored Search Auctions, EC 2005.
[2] R. Agrawal, "The continuum-armed bandit problem," SIAM J. Control and Optimization, vol. 33, pp. 1926–1951, 1995.
[3] P. Auer, N. Cesa-Bianchi, P. Fischer, "Finite-time Analysis of the Multiarmed Bandit Problem," Machine Learning, pp. 235–256, 2002.
[4] P. Auer, N. Cesa-Bianchi, Y. Freund, R. E. Schapire, "Gambling in a rigged casino: The adversarial multi-armed bandit problem," SODA 1995.
[5] M.-F. Balcan, A. Blum, J. D. Hartline, Y. Mansour, "Sponsored Search Auction Design via Machine Learning," Workshop on Sponsored Search Auctions, EC 2005.
[6] J. Chen, D. Liu, A. B. Whinston, "Designing Share Structure in Auctions of Divisible Goods," Workshop on Sponsored Search Auctions, EC 2005.
[7] B. Dean, M. X. Goemans, and J. Vondrák, "Approximating the Stochastic Knapsack Problem: The Benefit of Adaptivity," FOCS 2004, pp. 208–217.
[8] E. Even-Dar, S. Mannor, and Y. Mansour, "PAC bounds for multi-armed bandit and Markov decision processes," Fifteenth Annual Conference on Computational Learning Theory, pp. 255–270, 2002.
[9] J. Goodman, "Pay-Per-Percentage of Impressions: An Advertising Method that is Highly Robust to Fraud," Workshop on Sponsored Search Auctions, EC 2005.
[10] R. L. Graham, E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan, "Optimization and approximation in deterministic sequencing and scheduling: A survey," Annals of Discrete Mathematics, 5:287–326, 1979.
[11] B. J. Jansen and M. Resnick, "Examining Searcher Perceptions of and Interactions with Sponsored Results," Workshop on Sponsored Search Auctions, EC 2005.
[12] B. Kitts, P. Laxminarayan, B. LeBlanc, R. Meech, "A Formal Analysis of Search Auctions Including Predictions on Click Fraud and Bidding Tactics," Workshop on Sponsored Search Auctions, EC 2005.
[13] R. Kleinberg, "Online Decision Problems with Large Strategy Sets," Ph.D. dissertation, MIT, 2005.
[14] R. Kleinberg, "Nearly tight bounds for the continuum-armed bandit problem," NIPS 2005.
[15] S. R. Kulkarni and G. Lugosi, "Finite-time lower bounds for the two-armed bandit problem," IEEE Trans. Aut. Control, pp. 711–714, 2000.
[16] T. L. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," Advances in Applied Mathematics, vol. 6, pp. 4–22, 1985.
[17] S. Mannor and J. N. Tsitsiklis, "Lower bounds on the sample complexity of exploration in the multi-armed bandit problem," Sixteenth Annual Conference on Computational Learning Theory, pp. 418–432, Springer, 2003.
[18] C. Meek, D. M. Chickering, D. B. Wilson, "Stochastic and Contingent-Payment Auctions," Workshop on Sponsored Search Auctions, EC 2005.
[19] A. Mehta, A. Saberi, U. Vazirani, and V. Vazirani, "Adwords and Generalized Online Matching," FOCS 2005.
[20] M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, 2005.
[21] D. C. Parkes and T. Sandholm, "Optimize-and-Dispatch Architecture for Expressive Ad Auctions," Workshop on Sponsored Search Auctions, EC 2005.
[22] D. Pollard, Convergence of Stochastic Processes, Springer-Verlag, 1984.
[23] S. Sahni, "Approximate Algorithms for the 0/1 Knapsack Problem," Journal of the ACM, 22:115–124, 1975.
[24] M. Zinkevich, "Online convex programming and generalized infinitesimal gradient ascent," Proceedings of the 20th International Conference on Machine Learning, pp. 928–936, 2003.

APPENDIX

A. FIGURES

Figure 3: Examples of ads shown under search-based advertising programs. The left and right figures show the result of searching for "cheap Europe trip" (on 10/18/05) on Google and Yahoo!, respectively. The links in the top (shaded) area and in the right-hand column of the search result page correspond to ads by companies participating in the search-based advertising programs.

Figure 4: Figure (a) is a screen shot showing the estimated number of search queries on Yahoo! in September 2005 related to the keyword "cheap Europe trip". The data is available through the Keyword Selector Tool offered by Yahoo! at http://inventory.overture.com/d/searchinventory/suggestion/. Figure (b) shows the current bids (as of 10/18/05) for the same keyword under the Yahoo! Sponsored Marketing program. This information is publicly available through the Bid Tools offered by Yahoo! at http://uv.bidtool.overture.com/d/search/tools/bidtool/index.jhtml.

Figure 5: A distribution of the median bid price under the Yahoo! Sponsored Search program for 842 travel-related keywords collected during a 2-day period from 10/17/05 to 10/18/05.

B. PROOF OF LEMMA 1

Proof. Recall that $B_r^A$ denotes the remaining account balance when the $r$th search query arrives and we bid on keywords in $A$. This implies that the event $\{B_r^A \geq c_i\}$ depends only on the previous $r-1$ search queries; it does not depend on whether the consumer clicks on the ad associated with keyword $i$ when the $r$th search query arrives, nor does it depend on
what the $r$th search query corresponds to. Thus,
$$P\left\{Q_r = i,\, B_r^A \geq c_i,\, X_{ri} = 1 \,\middle|\, S \geq r\right\} = E\left[\mathbf{1}\left(Q_r = i,\, B_r^A \geq c_i,\, X_{ri} = 1\right)\,\middle|\, S \geq r\right] = E\left[\mathbf{1}\left(B_r^A \geq c_i\right)\,\middle|\, S \geq r\right]\, E\left[\mathbf{1}\left(Q_r = i,\, X_{ri} = 1\right)\,\middle|\, S \geq r\right]$$
$$= \lambda_i p_i\, E\left[\mathbf{1}\left(B_r^A \geq c_i\right)\,\middle|\, S \geq r\right] = \lambda_i p_i\, P\left\{B_r^A \geq c_i \,\middle|\, S \geq r\right\},$$
where the third equality follows from our assumption that the clickthrus of the ads are independent of the keyword corresponding to the $r$th search query, and both of these random variables are independent of the total number of arrivals $S$. The second equality in the lemma follows directly from the above result.

Figure 6: A distribution of average daily searches on Yahoo! for 842 travel-related keywords during September 2005. A complete list of the 842 keywords used in this study can be found at http://www.orie.cornell.edu/~paatrus/keywordlist.txt.

C. PROOF OF LEMMA 2

The proof of Lemma 2 makes use of the following bound on the tail of a Poisson random variable. Since the proof follows from a standard application of the Chernoff bound (cf. Theorem 4.5 in [20]), we omit the details.

Lemma 6. If $S$ is a Poisson random variable with mean $\mu$, then for any $0 < \delta < 1$, $P\{S \leq (1-\delta)\mu\} \leq e^{-\mu\delta^2/2}$.

Here is the proof of Lemma 2.

Proof. We have that
$$\sum_{r=1}^{\lfloor\mu\rfloor} P\{S \geq r\} = \sum_{r=1}^{\lfloor\mu\rfloor}\left(1 - P\{S < r\}\right) = \lfloor\mu\rfloor - \sum_{r=1}^{\lfloor\mu\rfloor} P\{S \leq r-1\}.$$
It follows from Lemma 6 that for any $1 \leq r \leq \lfloor\mu\rfloor$,
$$P\{S \leq r-1\} = P\left\{S \leq \frac{r-1}{\mu}\,\mu\right\} \leq \exp\left\{-\frac{\mu}{2}\left(1 - \frac{r-1}{\mu}\right)^2\right\} = e^{-(1+\mu-r)^2/2\mu} \leq e^{-(1+\lfloor\mu\rfloor-r)^2/2\mu},$$
which implies that
$$\sum_{r=1}^{\lfloor\mu\rfloor} P\{S \leq r-1\} \leq \sum_{r=1}^{\lfloor\mu\rfloor} e^{-(1+\lfloor\mu\rfloor-r)^2/2\mu} = \sum_{r=1}^{\lfloor\mu\rfloor} e^{-r^2/2\mu} \leq \int_0^{\infty} e^{-x^2/2\mu}\,dx = \frac{\sqrt{2\pi\mu}}{2}.$$
Therefore, we have that
$$\sum_{r=1}^{\lfloor\mu\rfloor} P\{S \geq r\} \geq \lfloor\mu\rfloor - \frac{\sqrt{2\pi\mu}}{2} \geq \mu\left(1 - \frac{1}{\mu} - \frac{\sqrt{2\pi}}{2\sqrt{\mu}}\right) \geq \mu\left(1 - \frac{3}{\sqrt{\mu}}\right),$$
where the last inequality follows from our assumption that $\mu > 1$, which implies that $1/\mu < 1/\sqrt{\mu}$, and the fact that $\sqrt{2\pi}/2 \leq 2$.

D. PROOF OF LEMMA 4

Proof. Since the second inequality follows directly from Lemma 3, it suffices to prove only the first inequality. Let $m$ denote the index such that $\mu\sum_{i=1}^{m} c_i\lambda_i p_i \leq 1 < \mu\sum_{i=1}^{m+1} c_i\lambda_i p_i$. Since $\pi_1/c_1 \geq \pi_2/c_2 \geq \cdots \geq \pi_N/c_N$, it follows from the definition of $Z^{LP}$ that
$$Z^{LP} = \mu\sum_{i=1}^{m}\pi_i\lambda_i p_i + \pi_{m+1}\lambda_{m+1}p_{m+1}\,\frac{1 - \mu\sum_{i=1}^{m} c_i\lambda_i p_i}{c_{m+1}\lambda_{m+1}p_{m+1}}.$$
It is easy to verify that $u < m$, since $\mu c_i\lambda_i p_i \leq 1/k^{1-\alpha} \leq 1/k^{(1-\alpha)/3}$ for all $i$ by Assumption 1(c). Moreover, since $\pi_1/c_1 \geq \cdots \geq \pi_N/c_N$, we have that
$$\frac{\sum_{i=1}^{u}\pi_i\lambda_i p_i}{\sum_{i=1}^{u}c_i\lambda_i p_i} \geq \frac{\sum_{i=u+1}^{m}\pi_i\lambda_i p_i}{\sum_{i=u+1}^{m}c_i\lambda_i p_i} \qquad\text{and}\qquad \frac{\sum_{i=1}^{u}\pi_i\lambda_i p_i}{\sum_{i=1}^{u}c_i\lambda_i p_i} \geq \frac{\pi_{m+1}\lambda_{m+1}p_{m+1}}{c_{m+1}\lambda_{m+1}p_{m+1}},$$
which implies that
$$\sum_{i=u+1}^{m}\pi_i\lambda_i p_i \leq \left(\sum_{i=1}^{u}\pi_i\lambda_i p_i\right)\frac{\sum_{i=u+1}^{m}c_i\lambda_i p_i}{\sum_{i=1}^{u}c_i\lambda_i p_i} \qquad\text{and}\qquad \pi_{m+1}\lambda_{m+1}p_{m+1} \leq \left(\sum_{i=1}^{u}\pi_i\lambda_i p_i\right)\frac{c_{m+1}\lambda_{m+1}p_{m+1}}{\sum_{i=1}^{u}c_i\lambda_i p_i}.$$
Putting everything together, we have
$$Z^{LP} = \mu\sum_{i=1}^{u}\pi_i\lambda_i p_i + \mu\sum_{i=u+1}^{m}\pi_i\lambda_i p_i + \pi_{m+1}\lambda_{m+1}p_{m+1}\,\frac{1 - \mu\sum_{i=1}^{m}c_i\lambda_i p_i}{c_{m+1}\lambda_{m+1}p_{m+1}} \leq \mu\left(\sum_{i=1}^{u}\pi_i\lambda_i p_i\right)\left(1 + \frac{\sum_{i=u+1}^{m}c_i\lambda_i p_i}{\sum_{i=1}^{u}c_i\lambda_i p_i} + \frac{1 - \mu\sum_{i=1}^{m}c_i\lambda_i p_i}{\mu\sum_{i=1}^{u}c_i\lambda_i p_i}\right) = \frac{\sum_{i=1}^{u}\pi_i\lambda_i p_i}{\sum_{i=1}^{u}c_i\lambda_i p_i},$$
which is the desired result.

E. PROOF OF LEMMA 5

The proof of Lemma 5 makes use of the following result.
Lemma 7. For any $A \subseteq \{1, 2, \ldots, N\}$,
$$\mathrm{Var}\left(\sum_{s=1}^{r-1}\sum_{i\in A} c_i\,\mathbf{1}(Q_s = i,\, X_{si} = 1) \,\middle|\, S \geq r\right) \leq \frac{r-1}{k}\sum_{i\in A} c_i\lambda_i p_i.$$

Proof. Since the $Q_s$'s are independent and, for any $i$, the $X_{si}$'s are also independent, it suffices to show that for any $s < r$,
$$\mathrm{Var}\left(\sum_{i\in A} c_i\,\mathbf{1}(Q_s = i,\, X_{si} = 1) \,\middle|\, S \geq r\right) \leq \frac{1}{k}\sum_{i\in A} c_i\lambda_i p_i.$$
Note that since $Q_s$ and $X_{si}$ are independent, we have that $E[\mathbf{1}(Q_s = i, X_{si} = 1) \mid S \geq r] = \lambda_i p_i$, which implies that
$$\mathrm{Var}\left(\sum_{i\in A} c_i\,\mathbf{1}(Q_s = i,\, X_{si} = 1) \,\middle|\, S \geq r\right) = E\left[\left(\sum_{i\in A} c_i\left(\mathbf{1}(Q_s = i,\, X_{si} = 1) - p_i\lambda_i\right)\right)^2 \,\middle|\, S \geq r\right]$$
$$= \sum_{i\in A} c_i^2\, E\left[\left(\mathbf{1}(Q_s = i,\, X_{si} = 1) - p_i\lambda_i\right)^2 \,\middle|\, S \geq r\right] + \sum_{i,j\in A:\, i\neq j} c_i c_j\, E\left[\left(\mathbf{1}(Q_s = i,\, X_{si} = 1) - \lambda_i p_i\right)\left(\mathbf{1}(Q_s = j,\, X_{sj} = 1) - \lambda_j p_j\right) \,\middle|\, S \geq r\right]$$
$$= \sum_{i\in A} c_i^2\,\lambda_i p_i\left(1 - \lambda_i p_i\right) - \sum_{i,j\in A:\, i\neq j} c_i c_j\,\lambda_i p_i\lambda_j p_j \leq \frac{1}{k}\sum_{i\in A} c_i\lambda_i p_i,$$
where the third equality follows from the fact that $\mathbf{1}(Q_s = i)\mathbf{1}(Q_s = j) = 0$ for all $i \neq j$. The final inequality follows from Assumption 1(c), which gives $c_i \leq 1/k$ for all $i$.

Here is the proof of Lemma 5.

Proof. For any keyword $i \in H$, let $T_r^H$ denote the total amount of money that we have already spent when the $r$th search query arrives. Then,
$$P\left\{B_r^H \geq c_i \,\middle|\, S \geq r\right\} = P\left\{T_r^H \leq 1 - c_i \,\middle|\, S \geq r\right\} = 1 - P\left\{T_r^H > 1 - c_i \,\middle|\, S \geq r\right\}.$$
We will focus on developing an upper bound for the expression $P\{T_r^H > 1 - c_i \mid S \geq r\}$. By the definition of $T_r^H$, we know that, with probability 1,
$$T_r^H = \sum_{s=1}^{r-1}\sum_{i\in H} c_i\,\mathbf{1}\left(Q_s = i,\, B_s^H \geq c_i,\, X_{si} = 1\right) \leq \sum_{s=1}^{r-1}\sum_{i\in H} c_i\,\mathbf{1}(Q_s = i,\, X_{si} = 1).$$
Moreover, by the hypothesis of the lemma,
$$E\left[\sum_{s=1}^{r-1}\sum_{i\in H} c_i\,\mathbf{1}(Q_s = i,\, X_{si} = 1) \,\middle|\, S \geq r\right] = (r-1)\sum_{i\in H} c_i\lambda_i p_i \leq \mu\sum_{i\in H} c_i\lambda_i p_i \leq 1 - \frac{1}{k} - \frac{1}{k^{(1-\alpha)/3}},$$
and it follows from Chebyshev's inequality and the previous lemma that
$$P\left\{T_r^H > 1 - c_i \,\middle|\, S \geq r\right\} \leq P\left\{\sum_{s=1}^{r-1}\sum_{i\in H} c_i\,\mathbf{1}(Q_s = i,\, X_{si} = 1) > 1 - c_i \,\middle|\, S \geq r\right\} \leq \frac{\mathrm{Var}\left(\sum_{s=1}^{r-1}\sum_{i\in H} c_i\,\mathbf{1}(Q_s = i,\, X_{si} = 1) \,\middle|\, S \geq r\right)}{\left(1 - c_i - (r-1)\sum_{i\in H} c_i\lambda_i p_i\right)^2}$$
$$\leq \frac{\frac{r-1}{k}\sum_{i\in H} c_i\lambda_i p_i}{\left(1 - c_i - (r-1)\sum_{i\in H} c_i\lambda_i p_i\right)^2} \leq \frac{\mu\sum_{i\in H} c_i\lambda_i p_i / k}{\left(1 - \frac{1}{k} - \mu\sum_{i\in H} c_i\lambda_i p_i\right)^2} \leq \frac{k^{(2-2\alpha)/3}}{k}\,\mu\sum_{i\in H} c_i\lambda_i p_i \leq \frac{1}{k^{(1+2\alpha)/3}},$$
which is the desired result.

F. IMPACT OF RANDOMIZATION AND INITIALIZATION

In this section, we explore the impact of different choices of randomization and initialization. We ran our algorithm on the 100 problem instances in the LARGE experiment using four different randomization and initialization settings:
(a) No randomization ($\gamma_t = 0\ \forall t$), with the initial estimate of the clickthru probability associated with the ad of each keyword set to zero ($\hat{p}_i^0 = 0\ \forall i$).
(b) $1/t$ randomization ($\gamma_t = 1/t\ \forall t$), with the initial estimate of the clickthru probability set to 1 ($\hat{p}_i^0 = 1\ \forall i$).
(c) $1/t^2$ randomization ($\gamma_t = 1/t^2\ \forall t$), with the initial estimate of the clickthru probability set to 1 ($\hat{p}_i^0 = 1\ \forall i$).
(d) No randomization ($\gamma_t = 0\ \forall t$), with the initial estimate of the clickthru probability set to 1 ($\hat{p}_i^0 = 1\ \forall i$).

Figure 7 shows the average profit over time under each of the four parameter settings. Recall that our ADAPTIVE BIDDING algorithm sets the initial estimate of the clickthru probability to 1, corresponding to cases (b), (c), and (d). In this case, we clearly see the benefits of randomization; allowing for random decisions with small probability (either $\gamma_t = 1/t$ or $1/t^2$) yields significantly higher average profits.
It is interesting to note that although setting $\gamma_t = 1/t$ yields slightly higher profits than $\gamma_t = 1/t^2$, the difference is quite small. By setting the initial clickthru probability of each keyword to 1, we limit the number of keywords chosen in each time period: initially we select a small subset of keywords and gradually enlarge the set as we observe more data on impressions and clicks. The opposite extreme of this approach is to set the initial clickthru probability of each keyword to 0, implying that we select all keywords in the first time period. Although this setting yields the highest average profit, as seen from the top line in Figure 7, we do not have a formal proof of convergence under this setting.

Figure 7: Impact of different randomization and initialization choices on the average profit up to time $t$ (LARGE experiment).

G. PROOF OF THEOREM 2

The proof of Theorem 2 depends on a series of lemmas. The first lemma establishes an upper bound on the probability that $\ell_t > I_U$. The proof appears in Appendix G.1.

Lemma 8. For any $t \ge 1$, $P\{\ell_t > I_U\} \le 1/k^{(1-\alpha)/3}$.

The next lemma establishes a lower bound on the average total profit. The proof of this result appears in Appendix G.2.

Lemma 9.
For any $T \ge 1$,

$$\frac{\sum_{t=1}^T E[Z_{G_t}]}{T Z^*} \;\ge\; \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}-\frac{3}{k^{(1-\alpha)/3}}\Big) - \frac{\sum_{t=1}^T\gamma_t}{T} - \frac{\rho}{T}\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\mu\sum_{i=1}^{I_U} c_i\lambda_i\, E\Big[\sum_{t:\, g_t\cdot 1(F_t=1;\, \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|\Big].$$

We should note that if $J_T=\{t\le T: F_t=1,\ \ell_t\le I_U\}$, then since the keywords are indexed starting from 1, we have that

$$\sum_{t\in J_T:\, g_t\ge i}\big|p_i-\hat p_i^{t-1}\big| = \sum_{t:\, g_t\cdot 1(F_t=1;\, \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|,$$

which is consistent with the notation in Section 4.3.

The next result establishes an upper bound on the accumulated difference between the empirical estimate $\hat p_i^{t-1}$ of the clickthru probability and its true value $p_i$, corresponding to the second term in the above lemma. The proof appears in Appendix G.3.

Lemma 10. For any $T \ge 1$ and $0 < \delta < 1$,

$$\mu\sum_{i=1}^{I_U} c_i\lambda_i\, E\Big[\sum_{t\le T:\, g_t\cdot 1(F_t=1;\, \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|\Big] \;\le\; \frac{T}{k^{(1-\alpha)/3}} + \frac{8\mu I_U}{\delta^2 k\rho^2\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)^2} + \frac{6 I_U^2\sum_{i=1}^{I_U}\lambda_i\mu/\big(1-e^{-\lambda_i\mu}\big)}{k^{(7-4\alpha)/3}(1-\delta)\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)}.$$

Finally, here is the proof of Theorem 2.

Proof. It follows from Lemmas 9 and 10 that, for any $0<\delta<1$,

\begin{align*}
\frac{\sum_{t=1}^T E[Z_{G_t}]}{T Z^*} &\ge \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}-\frac{3}{k^{(1-\alpha)/3}}\Big) - \frac{\sum_{t=1}^T\gamma_t}{T} - \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\frac{1}{k^{(1-\alpha)/3}}\\
&\quad - \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\frac{1}{T}\left(\frac{8\mu I_U}{\delta^2 k\rho^2\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)^2} + \frac{6 I_U^2\sum_{i=1}^{I_U}\lambda_i\mu/(1-e^{-\lambda_i\mu})}{k^{(7-4\alpha)/3}(1-\delta)\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)}\right).
\end{align*}

Setting $\delta=1/2$, we get that

$$\frac{\sum_{t=1}^T E[Z_{G_t}]}{T Z^*} \ge \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}-\frac{4}{k^{(1-\alpha)/3}}\Big) - \frac{\sum_{t=1}^T\gamma_t}{T} - \frac{M}{T},$$

where

$$M = \frac{32\mu I_U}{k\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)} + \frac{12 I_U^2\sum_{i=1}^{I_U}\lambda_i\mu/(1-e^{-\lambda_i\mu})}{k^{(7-4\alpha)/3}} \;\le\; \frac{32\mu I_U}{k\rho\big(1-1/\sqrt[3]{k}\big)} + \frac{12 I_U^2\sum_{i=1}^{I_U}\lambda_i\mu/(1-e^{-\lambda_i\mu})}{k},$$

where the inequality follows from the fact that $0\le\alpha<1$, which implies that $1-1/k^{(1+2\alpha)/3}\ge 1-1/\sqrt[3]{k}$ and $k^{(7-4\alpha)/3}\ge k$. This is the desired result.

G.1 Proof of Lemma 8

Proof. For any keyword $u$, let $Y_u(t)$ denote the cumulative number of impressions that the ad associated with keyword $u$ has received up to the end of period $t$.
It follows from the definition of $\ell_t$ that

\begin{align*}
P\{\ell_t > I_U\}
&= P\Big\{\mu\sum_{u=1}^{I_U+1} c_u\lambda_u\hat p_u^{t-1} \le 1-\frac{1}{k}-\frac{2}{k^{(1-\alpha)/3}}\Big\}\\
&= P\Big\{\mu\sum_{u=1}^{I_U+1} c_u\lambda_u\big(\hat p_u^{t-1}-p_u\big) \le 1-\frac{1}{k}-\frac{2}{k^{(1-\alpha)/3}} - \mu\sum_{u=1}^{I_U+1} c_u\lambda_u p_u\Big\}\\
&= P\Big\{\mu\sum_{u=1}^{I_U+1} c_u\lambda_u\big(p_u-\hat p_u^{t-1}\big) \ge \mu\sum_{u=1}^{I_U+1} c_u\lambda_u p_u - \Big(1-\frac{1}{k}-\frac{2}{k^{(1-\alpha)/3}}\Big)\Big\}\\
&\le P\Big\{\mu\sum_{u=1}^{I_U+1} c_u\lambda_u\big(p_u-\hat p_u^{t-1}\big) > \frac{1}{k^{(1-\alpha)/3}}\Big\}\\
&\le P\Big\{\mu\sum_{u=1}^{I_U+1} c_u\lambda_u\big(p_u-\hat p_u^{t-1}\big)\,1\big(Y_u(t-1)\ge 1\big) > \frac{1}{k^{(1-\alpha)/3}}\Big\},
\end{align*}

where the first inequality follows from the definition of $I_U$, which implies that $1-1/k-1/k^{(1-\alpha)/3} < \mu\sum_{u=1}^{I_U+1} c_u\lambda_u p_u$, and therefore

$$\mu\sum_{u=1}^{I_U+1} c_u\lambda_u p_u - \Big(1-\frac{1}{k}-\frac{2}{k^{(1-\alpha)/3}}\Big) > \frac{1}{k^{(1-\alpha)/3}}.$$

The final inequality follows from the fact that for any keyword $u$ such that $Y_u(t-1)=0$, we have $\hat p_u^{t-1}=1$, which implies that $p_u-\hat p_u^{t-1}\le 0$. By Markov's inequality, we have

\begin{align*}
&P\Big\{\mu\sum_{u=1}^{I_U+1} c_u\lambda_u\big(p_u-\hat p_u^{t-1}\big)\,1(Y_u(t-1)\ge 1) > \frac{1}{k^{(1-\alpha)/3}}\Big\}
\le \frac{E\Big[\Big(\mu\sum_{u=1}^{I_U+1} c_u\lambda_u\big(p_u-\hat p_u^{t-1}\big)\,1(Y_u(t-1)\ge 1)\Big)^2\Big]}{\big(1/k^{(1-\alpha)/3}\big)^2}\\
&\qquad= k^{(2-2\alpha)/3}\sum_{u=1}^{I_U+1}(\mu c_u\lambda_u)^2\, E\Big[\big(p_u-\hat p_u^{t-1}\big)^2\,1\big(Y_u(t-1)\ge 1\big)\Big]\\
&\qquad\quad+ 2k^{(2-2\alpha)/3}\sum_{u<v}\mu^2 c_u c_v\lambda_u\lambda_v\, E\Big[\big(p_u-\hat p_u^{t-1}\big)\big(p_v-\hat p_v^{t-1}\big)\,1\big(Y_u(t-1)\ge 1,\, Y_v(t-1)\ge 1\big)\Big].
\end{align*}

Now, consider any $1\le u<v\le I_U+1$:

\begin{align*}
&E\Big[\big(p_u-\hat p_u^{t-1}\big)\big(p_v-\hat p_v^{t-1}\big)\,1\big(Y_u(t-1)\ge 1,\, Y_v(t-1)\ge 1\big)\Big]\\
&= \sum_{y_u\ge 1}\sum_{y_v\ge 1} P\{Y_u(t-1)=y_u,\, Y_v(t-1)=y_v\}\; E\Big[\Big(\frac{1}{y_u}\sum_{s=1}^{y_u}\big(X_u^s-p_u\big)\Big)\Big(\frac{1}{y_v}\sum_{s=1}^{y_v}\big(X_v^s-p_v\big)\Big)\,\Big|\, Y_u(t-1)=y_u,\, Y_v(t-1)=y_v\Big] = 0,
\end{align*}

where $X_u^s$ is an indicator random variable indicating whether or not the consumer clicks on the ad associated with keyword $u$ during the $s$th time that the ad receives an impression. The final equality follows from the fact that $X_u^s$ and $X_v^s$ are independent of $Y_u(t)$ and $Y_v(t)$; moreover, $X_u^s$ and $X_v^t$ are independent for any $s$ and $t$, and $E[X_u^s-p_u]=E[X_v^s-p_v]=0$ for all $s$.
In addition,

\begin{align*}
E\Big[\big(p_u-\hat p_u^{t-1}\big)^2\,1\big(Y_u(t-1)\ge 1\big)\Big]
&= \sum_{y_u\ge 1} P\{Y_u(t-1)=y_u\}\; E\Big[\Big(\frac{1}{y_u}\sum_{s=1}^{y_u}\big(X_u^s-p_u\big)\Big)^2\,\Big|\, Y_u(t-1)=y_u\Big]\\
&= \sum_{y_u\ge 1} P\{Y_u(t-1)=y_u\}\; E\Big[\frac{1}{y_u^2}\sum_{s=1}^{y_u}\big(X_u^s-p_u\big)^2\,\Big|\, Y_u(t-1)=y_u\Big]\\
&= \sum_{y_u\ge 1} P\{Y_u(t-1)=y_u\}\,\frac{p_u(1-p_u)}{y_u} \;\le\; p_u,
\end{align*}

where the second equality follows from the fact that the $X_u^s$'s are independent and identically distributed Bernoulli random variables with parameter $p_u$ that are independent of $Y_u(t)$, with $E[X_u^s-p_u]=0$ for all $s$, so the cross terms vanish; the third equality follows from the fact that $X_u^s$ is a Bernoulli random variable with parameter $p_u$. Putting everything together, we have that

\begin{align*}
P\{\ell_t > I_U\}
&\le P\Big\{\mu\sum_{u=1}^{I_U+1} c_u\lambda_u\big(p_u-\hat p_u^{t-1}\big)\,1(Y_u(t-1)\ge 1) > \frac{1}{k^{(1-\alpha)/3}}\Big\}\\
&\le k^{(2-2\alpha)/3}\sum_{u=1}^{I_U+1}(\mu c_u\lambda_u)^2 p_u\\
&\le k^{(2-2\alpha)/3}\,\frac{1}{k^{1-\alpha}}\sum_{u=1}^{I_U+1}\mu c_u\lambda_u p_u
= \frac{1}{k^{(1-\alpha)/3}}\,\mu\sum_{u=1}^{I_U+1} c_u\lambda_u p_u
\le \frac{1}{k^{(1-\alpha)/3}},
\end{align*}

where the third inequality follows from Assumption 1(c), which implies that, for any keyword $i$, $\mu c_i\lambda_i\le 1/k^{1-\alpha}$, and the final inequality follows from the definition of $I_U$, which implies that

$$\mu\sum_{u=1}^{I_U+1} c_u\lambda_u p_u \le 1-\frac{1}{k}-\frac{1}{k^{(1-\alpha)/3}}+\frac{1}{k^{1-\alpha}} \le 1.$$

G.2 Proof of Lemma 9

Proof. Recall that, for any $t\ge 1$, we have $G_t=\{1,\dots,g_t\}$. Let $S^t$ denote the total number of search queries in period $t$, and let $B_r^{g_t}$ denote the remaining account balance in period $t$ just before the arrival of the $r$th search query in that period, assuming that we have bid on the set of keywords $\{1,2,\dots,g_t\}$ in period $t$. Also, let the random variable $Q_r^t$ denote the keyword associated with the $r$th search query in period $t$, and let $X_{ri}^t$ denote a Bernoulli random variable indicating whether or not the consumer clicks on the ad associated with keyword $i$ upon the arrival of the $r$th search query in period $t$.
Then, we have that

\begin{align*}
\sum_{t=1}^T E[Z_{G_t}]
&= \sum_{t=1}^T E\Big[\sum_{r=1}^{S^t}\sum_{i=1}^{g_t}\pi_i\, 1\big(Q_r^t=i,\ B_r^{g_t}\ge c_i,\ X_{ri}^t=1\big)\Big]\\
&= \sum_{t=1}^T E\Big[\sum_{i=1}^{g_t}\sum_{r=1}^{\infty}\pi_i\, 1\big(Q_r^t=i,\ B_r^{g_t}\ge c_i,\ X_{ri}^t=1,\ S^t\ge r\big)\Big]\\
&= \sum_{t=1}^T E\Big[\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\sum_{r=1}^{\infty}1\big(B_r^{g_t}\ge c_i,\ S^t\ge r\big)\Big],
\end{align*}

where the final equality follows from Lemma 1. Recall that $F_t\in\{0,1\}$ denotes a binary random variable indicating whether or not we choose a decision based on the prefix ordering in period $t$: $F_t=1$ implies that $g_t$ equals $\ell_t$ (the decision based on the prefix ordering), and when $F_t=0$, $g_t$ is a random decision. Note that, by definition, $F_t=1$ with probability $1-\gamma_t$. Then, we have that

\begin{align*}
\sum_{t=1}^T E[Z_{G_t}]
&= \sum_{t=1}^T E\Big[\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\sum_{r=1}^{\infty}1\big(B_r^{g_t}\ge c_i,\ S^t\ge r\big)\Big]\\
&\ge \sum_{t=1}^T E\Big[1(F_t=1;\ \ell_t\le I_U)\cdot\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\sum_{r=1}^{\infty}1\big(B_r^{g_t}\ge c_i,\ S^t\ge r\big)\Big]\\
&= \sum_{t=1}^T E\Big[1(F_t=1;\ \ell_t\le I_U)\cdot E\Big[\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\sum_{r=1}^{\infty}1\big(B_r^{g_t}\ge c_i,\ S^t\ge r\big)\,\Big|\,\ell_t, F_t\Big]\Big]\\
&\ge \sum_{t=1}^T E\Big[1(F_t=1;\ \ell_t\le I_U)\cdot E\Big[\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\sum_{r=1}^{\lfloor\mu\rfloor}1\big(B_r^{g_t}\ge c_i,\ S^t\ge r\big)\,\Big|\,\ell_t, F_t\Big]\Big]\\
&\ge \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\sum_{t=1}^T E\Big[1(F_t=1;\ \ell_t\le I_U)\cdot\mu\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\Big],
\end{align*}

where the final inequality follows from the fact that when $F_t=1$ and $\ell_t\le I_U$, we have $g_t=\ell_t$, which implies that

\begin{align*}
E\Big[\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\sum_{r=1}^{\lfloor\mu\rfloor}1\big(B_r^{g_t}\ge c_i,\ S^t\ge r\big)\,\Big|\,\ell_t, F_t\Big]
&= \sum_{i=1}^{\ell_t}\pi_i\lambda_i p_i\sum_{r=1}^{\lfloor\mu\rfloor} P\big\{B_r^{\ell_t}\ge c_i\,\big|\,\ell_t, F_t, S^t\ge r\big\}\,P\big\{S^t\ge r\,\big|\,\ell_t, F_t\big\}\\
&\ge \Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\sum_{i=1}^{\ell_t}\pi_i\lambda_i p_i\sum_{r=1}^{\lfloor\mu\rfloor} P\big\{S^t\ge r\big\}\\
&= \rho\,\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\,\mu\sum_{i=1}^{\ell_t}\pi_i\lambda_i p_i,
\end{align*}

where the inequality follows from Lemma 5 and the fact that, conditioned on $\ell_t$ and $S^t\ge r$, $B_r^{\ell_t}$ is
independent of $F_t$. The penultimate equality follows from the fact that $S^t$ is independent of $\ell_t$ and $F_t$, and the final equality follows from the observation that

$$\sum_{r=1}^{\lfloor\mu\rfloor} P\{S^t\ge r\} = E\big[\min\{S^t,\mu\}\big] = \rho\mu.$$

The above chain of inequalities thus gives

$$\sum_{t=1}^T E[Z_{G_t}] \ge \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\sum_{t=1}^T E\Big[1(F_t=1;\ \ell_t\le I_U)\cdot\mu\sum_{i=1}^{g_t}\pi_i\lambda_i p_i\Big] \ge \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big) Z^*\sum_{t=1}^T E\Big[1(F_t=1;\ \ell_t\le I_U)\cdot\mu\sum_{i=1}^{g_t} c_i\lambda_i p_i\Big],$$

where the final inequality follows from the fact that if $F_t=1$ and $\ell_t\le I_U$, then with probability 1, $g_t=\ell_t$, and by the definition of $I_U$,

$$\mu\sum_{i=1}^{g_t} c_i\lambda_i p_i = \mu\sum_{i=1}^{\ell_t} c_i\lambda_i p_i \le 1-\frac{1}{k}-\frac{1}{k^{(1-\alpha)/3}},$$

and it follows from Lemma 4 that $\sum_{i=1}^{\ell_t}\pi_i\lambda_i p_i \ge Z^*\sum_{i=1}^{\ell_t} c_i\lambda_i p_i$. Therefore,

$$1(F_t=1;\ \ell_t\le I_U)\cdot\mu\sum_{i=1}^{g_t}\pi_i\lambda_i p_i \ge Z^*\,1(F_t=1;\ \ell_t\le I_U)\cdot\mu\sum_{i=1}^{g_t} c_i\lambda_i p_i.$$

Let $C^* = 1-1/k-2/k^{(1-\alpha)/3}-1/k^{1-\alpha}$. Note that by the definition of $\ell_t$, we have

$$\mu\sum_{i=1}^{\ell_t} c_i\lambda_i\hat p_i^{t-1} \le 1-\frac{1}{k}-\frac{2}{k^{(1-\alpha)/3}} < \mu\sum_{i=1}^{\ell_t+1} c_i\lambda_i\hat p_i^{t-1},$$

which implies that $\mu\sum_{i=1}^{\ell_t} c_i\lambda_i\hat p_i^{t-1} \ge C^*$, since it follows from Assumption 1(c) that $c_i\lambda_i\mu\le 1/k^{1-\alpha}$ for all $i$. Then, we have the following inequalities:

\begin{align*}
\sum_{t=1}^T E[Z_{G_t}]
&\ge \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big) Z^*\sum_{t=1}^T E\Big[1(F_t=1;\ \ell_t\le I_U)\cdot\mu\sum_{i=1}^{g_t} c_i\lambda_i p_i\Big]\\
&= \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big) Z^*\sum_{t=1}^T E\Big[1(F_t=1;\ \ell_t\le I_U)\Big\{\mu\sum_{i=1}^{g_t} c_i\lambda_i\hat p_i^{t-1} + \mu\sum_{i=1}^{g_t} c_i\lambda_i\big(p_i-\hat p_i^{t-1}\big)\Big\}\Big]\\
&\ge \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big) Z^*\sum_{t=1}^T E\Big[1(F_t=1;\ \ell_t\le I_U)\Big\{C^* + \mu\sum_{i=1}^{g_t} c_i\lambda_i\big(p_i-\hat p_i^{t-1}\big)\Big\}\Big],
\end{align*}

where the last inequality follows from the fact that when $F_t=1$, $g_t=\ell_t$.
Therefore,

\begin{align*}
\sum_{t=1}^T E[Z_{G_t}]
&\ge \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big) Z^* C^*\sum_{t=1}^T E\big[1(F_t=1;\ \ell_t\le I_U)\big]\\
&\quad - \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big) Z^*\sum_{t=1}^T E\Big[1(F_t=1;\ \ell_t\le I_U)\sum_{i=1}^{g_t} c_i\lambda_i\mu\,\big|p_i-\hat p_i^{t-1}\big|\Big]\\
&= \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big) Z^* C^*\sum_{t=1}^T P\{F_t=1\}\,P\{\ell_t\le I_U\}\\
&\quad - \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big) Z^*\, E\Big[\sum_{t=1}^T\sum_{i=1}^{g_t\cdot 1(F_t=1;\ \ell_t\le I_U)} c_i\lambda_i\mu\,\big|p_i-\hat p_i^{t-1}\big|\Big]\\
&\ge \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big) Z^* C^*\Big(1-\frac{1}{k^{(1-\alpha)/3}}\Big)\sum_{t=1}^T (1-\gamma_t)\\
&\quad - \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big) Z^*\, E\Big[\sum_{i=1}^{\max_t\{g_t\cdot 1(F_t=1;\ \ell_t\le I_U)\}} c_i\lambda_i\mu\sum_{t\le T:\, g_t\cdot 1(F_t=1;\ \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|\Big],
\end{align*}

where the equality follows from the fact that $\ell_t$ and $F_t$ are independent, and the final inequality follows from Lemma 8. Note that we adopt the convention $\sum_{i=1}^{0} c_i\lambda_i\mu\big(p_i-\hat p_i^{t-1}\big)=0$.

Note that when $F_t=1$, $g_t=\ell_t$, so on the event above $g_t\le I_U$ and hence

$$E\Big[\sum_{i=1}^{\max_t\{g_t\cdot 1(F_t=1;\ \ell_t\le I_U)\}} c_i\lambda_i\mu\sum_{t\le T:\, g_t\cdot 1(F_t=1;\ \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|\Big] \le \mu\sum_{i=1}^{I_U} c_i\lambda_i\, E\Big[\sum_{t\le T:\, g_t\cdot 1(F_t=1;\ \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|\Big].$$

Putting everything together, we have that

\begin{align*}
\frac{\sum_{t=1}^T E[Z_{G_t}]}{T Z^*}
&\ge \rho C^*\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\Big(1-\frac{1}{k^{(1-\alpha)/3}}\Big)\frac{1}{T}\sum_{t=1}^T(1-\gamma_t)\\
&\quad - \frac{\rho}{T}\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\mu\sum_{i=1}^{I_U} c_i\lambda_i\, E\Big[\sum_{t\le T:\, g_t\cdot 1(F_t=1;\ \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|\Big]\\
&\ge \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}-\frac{3}{k^{(1-\alpha)/3}}\Big)\frac{1}{T}\sum_{t=1}^T(1-\gamma_t)\\
&\quad - \frac{\rho}{T}\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\mu\sum_{i=1}^{I_U} c_i\lambda_i\, E\Big[\sum_{t\le T:\, g_t\cdot 1(F_t=1;\ \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|\Big]\\
&\ge \rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}-\frac{3}{k^{(1-\alpha)/3}}\Big) - \frac{\sum_{t=1}^T\gamma_t}{T} - \frac{\rho}{T}\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\mu\sum_{i=1}^{I_U} c_i\lambda_i\, E\Big[\sum_{t\le T:\, g_t\cdot 1(F_t=1;\ \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|\Big],
\end{align*}

which is the desired result.
Note that the second inequality follows from the fact that

$$C^*\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\Big(1-\frac{1}{k^{(1-\alpha)/3}}\Big) \ge 1-\frac{1}{k^{(1+2\alpha)/3}}-\frac{3}{k^{(1-\alpha)/3}}.$$

To see this, write $A=1/k^{(1+2\alpha)/3}$, $B=1/k^{(1-\alpha)/3}$, and $C=1/k^{1-\alpha}$, so that $C^*=1-1/k-2B-C$ and the left-hand side equals $(1-A)(1-B)\big(1-\tfrac1k-2B-C\big)$. Expanding,

$$(1-A)(1-B)\Big(1-\frac1k-2B-C\Big) = 1-A-3B-\frac1k-C+\frac{A}{k}+3AB+AC+\frac{B}{k}+2B^2+BC-\frac{AB}{k}-2AB^2-ABC.$$

Since $0\le\alpha<1$, we have $AB^2=1/k$ and $AB\ge 1/k$, so $3AB-\frac1k-2AB^2\ge 0$; also $C\le B^2$ (because $1-\alpha\ge(2-2\alpha)/3$), so $2B^2-C\ge B^2\ge 0$; and $\frac{B}{k}\ge\frac{AB}{k}$ and $BC\ge ABC$ because $A\le 1$. Hence all terms beyond $1-A-3B$ sum to a nonnegative quantity, which proves the inequality.

G.3 Proof of Lemma 10

The proof of Lemma 10 makes use of the following results. The first result provides an upper bound on the expected deviation of the sample average of Bernoulli random variables. Since the result follows from a standard application of the Chernoff bound (cf. [20]), we omit the proof.

Lemma 11. Let $X_1, X_2,\dots,X_n$ be independent and identically distributed Bernoulli random variables with parameter $p$, i.e., for all $i$, $P\{X_i=1\}=1-P\{X_i=0\}=p$. Then, for any $0<\epsilon<1$,

$$E\Big|\,p-\frac{1}{n}\sum_{i=1}^n X_i\,\Big| \le \epsilon + 2e^{-n\epsilon^2/3}.$$

The next result is a restatement of Bernstein's inequality for real-valued random variables. For a proof, the reader is referred to Appendix B in [22].

Theorem 3. Let $Y_1,\dots,Y_n$ be independent nonnegative random variables with $0\le Y_i\le M$ for all $i$. For any $i$, let $\sigma_i^2=\mathrm{Var}(Y_i)$. Let $Y=\sum_{i=1}^n Y_i$ and $\mu=E[Y]$. If $V\ge\sum_{i=1}^n\sigma_i^2$, then for any $0<\delta<1$,

$$P\{Y\le(1-\delta)\mu\} \le \exp\left\{-\frac{\frac12\delta^2\mu^2}{V+\frac13 M\delta\mu}\right\}.$$

Using the fact that, if $a$, $b$, and $c$ are nonnegative, the function $x\mapsto (ax^2)/(b+cx)$ is increasing in $x$ for $x\ge 0$, the corollary below follows immediately from the theorem.

Corollary 1. Let $Y_1,\dots,Y_n$ be independent nonnegative random variables with $0\le Y_i\le M$ for all $i$. For any $i$, let $\sigma_i^2=\mathrm{Var}(Y_i)$.
Let $Y=\sum_{i=1}^n Y_i$ and $\mu=E[Y]$. If $V\ge\sum_{i=1}^n\sigma_i^2$, then for any $0<\delta<1$ and $0\le\eta\le\mu$,

$$P\{Y\le(1-\delta)\eta\} \le \exp\left\{-\frac{\frac12\delta^2\eta^2}{V+\frac13 M\delta\eta}\right\}.$$

The next lemma establishes an upper bound on the expected difference between $p_i$ and $\hat p_i^{t-1}$.

Lemma 12. For any keyword $i$ and $0<\epsilon,\delta<1$,

$$E\Big[\sum_{t\le T:\, g_t\cdot 1(F_t=1;\ \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|\Big] \le \epsilon T + \frac{1}{1-\exp\big\{-\frac14\delta^2\lambda_i\rho^2\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)^2\big\}} + \frac{2}{1-\exp\big\{-\frac13\epsilon^2(1-\delta)\lambda_i\mu\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)\big\}}.$$

Proof. Let $0<\epsilon,\delta<1$ be given. Note that when $F_t=1$, we have $g_t=\ell_t$; therefore, it suffices to consider only keywords $i$ such that $i\le I_U$. So, let any $i\le I_U$ be given. To prove the desired result, it suffices to show that the upper bound holds for any arbitrary set of periods $1\le t_1<t_2<\cdots<t_w\le T$ such that

$$g_{t_h}\cdot 1\big(F_{t_h}=1,\ \ell_{t_h}\le I_U\big)\ge i,\qquad\forall\, h=1,\dots,w.$$

Note that during the periods $t_1,\dots,t_w$, we selected keywords based on the prefix ordering (i.e., $F_{t_h}=1$ and $g_{t_h}=\ell_{t_h}$ for all $h$), and the largest index $g_{t_h}$ includes $i$ and is at most $I_U$. We will show that, for any $t_1,\dots,t_w$ satisfying the above inequality, the conditional expectation

$$E\Big[\sum_{t\le T:\, g_t\cdot 1(F_t=1;\ \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|\ \Big|\ (t_1,\ell_{t_1},F_{t_1}),\dots,(t_w,\ell_{t_w},F_{t_w})\Big]$$

is bounded above by the expression given in Lemma 12. For ease of exposition, we will drop the reference to the conditioning information $(t_1,\ell_{t_1},F_{t_1}),\dots,(t_w,\ell_{t_w},F_{t_w})$ when it is clear from the context. Recall that $Y_i(t)$ denotes the cumulative number of impressions that the ad associated with keyword $i$ has received by the end of period $t$. Conditioned on $(t_1,\ell_{t_1},F_{t_1}),\dots,(t_w,\ell_{t_w},F_{t_w})$, we have that

$$E\Big[\sum_{t\le T:\, g_t\cdot 1(F_t=1;\ \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|\Big] = \sum_{h=1}^w E\big|\hat p_i^{t_h-1}-p_i\big|.$$
Decomposing each term according to whether the ad has received enough impressions, we obtain

\begin{align*}
\sum_{h=1}^w E\big|\hat p_i^{t_h-1}-p_i\big|
&= \sum_{h=1}^w E\Big[\big|\hat p_i^{t_h-1}-p_i\big|\cdot 1\Big(Y_i(t_h-1)\le(1-\delta)(h-1)\lambda_i\rho\mu\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\Big)\Big]\\
&\quad+\sum_{h=1}^w E\Big[\big|\hat p_i^{t_h-1}-p_i\big|\cdot 1\Big(Y_i(t_h-1)>(1-\delta)(h-1)\lambda_i\rho\mu\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\Big)\Big]\\
&\le \sum_{h=1}^w P\Big\{Y_i(t_h-1)\le(1-\delta)(h-1)\lambda_i\rho\mu\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\Big\} + \sum_{h=1}^w\Big(\epsilon + 2\exp\Big\{-\frac13\epsilon^2(1-\delta)(h-1)\lambda_i\rho\mu\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\Big\}\Big)\\
&\le \sum_{h=1}^w P\Big\{Y_i(t_h-1)\le(1-\delta)(h-1)\lambda_i\rho\mu\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\Big\} + \epsilon T + \frac{2}{1-\exp\big\{-\frac13\epsilon^2(1-\delta)\lambda_i\mu\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)\big\}},\qquad(2)
\end{align*}

where the first inequality follows from $|\hat p_i^{t_h-1}-p_i|\le 1$ together with Lemma 11, and the final inequality follows from the formula for the geometric series. Thus, to establish the desired result, it suffices to show that

$$\sum_{h=1}^w P\Big\{Y_i(t_h-1)\le(1-\delta)(h-1)\lambda_i\rho\mu\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\Big\} \le \frac{1}{1-\exp\big\{-\frac14\delta^2\lambda_i\rho^2\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)^2\big\}}.$$

Recall that $B_r^{G_{t_h}}$ denotes the remaining account balance in period $t_h$ just before the arrival of the $r$th search query in that period, assuming that we bid on the keywords in $G_{t_h}=\{1,\dots,g_{t_h}\}$, and $Q_r^{t_h}$ denotes the keyword associated with the $r$th search query in period $t_h$. For any $1\le h\le w$, let $U_h$ be defined by

$$U_h \equiv \sum_{r=1}^{\min\{S^{t_h},\lfloor\mu\rfloor\}} 1\Big(Q_r^{t_h}=i,\ B_r^{G_{t_h}}\ge c_i\Big) = \sum_{r=1}^{\infty} 1\Big(Q_r^{t_h}=i,\ B_r^{G_{t_h}}\ge c_i,\ \min\{S^{t_h},\lfloor\mu\rfloor\}\ge r\Big) = \sum_{r=1}^{\lfloor\mu\rfloor} 1\Big(Q_r^{t_h}=i,\ B_r^{G_{t_h}}\ge c_i,\ S^{t_h}\ge r\Big).$$

The random variable $U_h$ provides a lower bound on the number of impressions that keyword $i$ receives in period $t_h$. Thus, for any $1\le h\le w$,

$$\sum_{v=1}^{h-1} U_v \le Y_i(t_h-1).$$

Note that, conditioned on $(t_1,\ell_{t_1},F_{t_1}),\dots,(t_w,\ell_{t_w},F_{t_w})$, the random variables $U_1,U_2,\dots,U_w$ are independent, because the $S^t$'s are i.i.d. and the distribution of $U_h$ depends only on $G_{t_h}$, which is fixed given our conditioning information. We can thus apply the result of Corollary 1 to the random variable $\sum_{v=1}^{h-1}U_v$. To do so, we need to determine its expectation and variance. Conditioned on $t_1,\dots,t_w$, we will show that for any $1\le h\le w$,

$$E\Big[\sum_{v=1}^{h-1}U_v\Big] \ge (h-1)\lambda_i\mu\rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\qquad\text{and}\qquad \sum_{v=1}^{h-1}\mathrm{Var}[U_v]\le(h-1)\lambda_i\mu^2.$$
To prove this result, note that for any $1\le h\le w$,

$$E[U_h] = \sum_{r=1}^{\lfloor\mu\rfloor}\lambda_i\, P\Big\{B_r^{G_{t_h}}\ge c_i,\ S^{t_h}\ge r\Big\} \ge \lambda_i\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\sum_{r=1}^{\lfloor\mu\rfloor}P\big\{S^{t_h}\ge r\big\} = \lambda_i\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big) E\big[\min\{S^{t_h},\mu\}\big] = \lambda_i\mu\rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big),$$

where the inequality follows from Lemma 5 and the fact that $G_{t_h}=\{1,2,\dots,g_{t_h}\}$ with $g_{t_h}\le I_U$ for all $1\le h\le w$. Note that since $g_{t_h}\le I_U$ for all $h$, it follows from the definition of $I_U$ that

$$\mu\sum_{u=1}^{g_{t_h}} c_u\lambda_u p_u \le 1-\frac{1}{k}-\frac{1}{k^{(1-\alpha)/3}},$$

which is the required hypothesis for the application of Lemma 5. The above result implies that for any $1\le h\le w$,

$$E\Big[\sum_{v=1}^{h-1}U_v\Big]\ge(h-1)\lambda_i\mu\rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big).$$

Moreover, conditioned on $(t_1,\ell_{t_1},F_{t_1}),\dots,(t_w,\ell_{t_w},F_{t_w})$, for any $1\le h\le w$,

\begin{align*}
\mathrm{Var}[U_h] \le E[U_h^2]
&= E\bigg[\bigg(\sum_{r=1}^{\min\{S^{t_h},\lfloor\mu\rfloor\}} 1\Big(Q_r^{t_h}=i,\ B_r^{G_{t_h}}\ge c_i\Big)\bigg)^2\bigg]
\le E\bigg[\bigg(\sum_{r=1}^{\lfloor\mu\rfloor}1\big(Q_r^{t_h}=i\big)\bigg)^2\bigg]\\
&= \sum_{r=1}^{\lfloor\mu\rfloor}E\big[1\big(Q_r^{t_h}=i\big)\big] + \sum_{1\le q,r\le\lfloor\mu\rfloor:\, q\ne r} E\big[1\big(Q_r^{t_h}=i,\ Q_q^{t_h}=i\big)\big]
= \lambda_i\lfloor\mu\rfloor + \lambda_i^2\lfloor\mu\rfloor(\lfloor\mu\rfloor-1) \le \lambda_i\mu^2,
\end{align*}

where the last equality follows from the fact that, for $r\ne q$, the arrivals of the $r$th and $q$th queries are independent of each other, and the final inequality follows from our assumption that $\mu>1$. The above result implies that $\sum_{v=1}^{h-1}\mathrm{Var}[U_v]\le(h-1)\lambda_i\mu^2$.

Note that $0\le U_h\le\mu$ for all $h$ and $i$. Therefore, by applying Corollary 1 with $\eta=(h-1)\lambda_i\mu\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)$, $V=(h-1)\lambda_i\mu^2$, and $M=\mu$, we can conclude that for any $1\le h\le w$,

\begin{align*}
P\Big\{Y_i(t_h-1)\le(1-\delta)(h-1)\lambda_i\mu\rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\Big\}
&\le P\Big\{\sum_{v=1}^{h-1}U_v\le(1-\delta)(h-1)\lambda_i\mu\rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\Big\}\\
&\le \exp\left\{-\frac{\frac12\delta^2(h-1)^2\lambda_i^2\mu^2\rho^2\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)^2}{(h-1)\lambda_i\mu^2+\frac13\mu\delta(h-1)\lambda_i\mu\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)}\right\}\\
&= \exp\left\{-\frac{\frac12\delta^2(h-1)\lambda_i\rho^2\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)^2}{1+\frac13\delta\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)}\right\}\\
&\le \exp\Big\{-\frac14\delta^2(h-1)\lambda_i\rho^2\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)^2\Big\},
\end{align*}

where the last inequality follows from the fact that $1+\frac13\delta\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)\le\frac43\le 2$. Hence,

$$\sum_{h=1}^w P\Big\{Y_i(t_h-1)\le(1-\delta)(h-1)\lambda_i\mu\rho\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)\Big\} \le \sum_{h=1}^w \exp\Big\{-\frac14\delta^2(h-1)\lambda_i\rho^2\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)^2\Big\}.$$
Moreover,

$$\sum_{h=1}^w \exp\Big\{-\frac14\delta^2(h-1)\lambda_i\rho^2\Big(1-\frac{1}{k^{(1+2\alpha)/3}}\Big)^2\Big\} \le \frac{1}{1-\exp\big\{-\frac14\delta^2\lambda_i\rho^2\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)^2\big\}},$$

where the inequality follows from the standard bound for the geometric series. Since $(t_1,\ell_{t_1},F_{t_1}),\dots,(t_w,\ell_{t_w},F_{t_w})$ are arbitrary, the desired result follows.

Finally, here is the proof of Lemma 10.

Proof. It follows from Lemma 12 that

\begin{align*}
\sum_{i=1}^{I_U} c_i\lambda_i\mu\, E\Big[\sum_{t\le T:\, g_t\cdot 1(F_t=1;\ \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|\Big]
&\le \epsilon T\sum_{i=1}^{I_U} c_i\lambda_i\mu + \sum_{i=1}^{I_U}\frac{c_i\lambda_i\mu}{1-\exp\big\{-\frac14\delta^2\lambda_i\rho^2\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)^2\big\}} + \sum_{i=1}^{I_U}\frac{2c_i\lambda_i\mu}{1-\exp\big\{-\frac13\epsilon^2(1-\delta)\lambda_i\mu\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)\big\}}\\
&\le \epsilon T\sum_{i=1}^{I_U} c_i\lambda_i\mu + \sum_{i=1}^{I_U}\frac{8 c_i\lambda_i\mu}{\delta^2\lambda_i\rho^2\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)^2} + \sum_{i=1}^{I_U}\frac{6 c_i\lambda_i\mu}{\epsilon^2(1-\delta)\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)\big(1-e^{-\lambda_i\mu}\big)},
\end{align*}

where the last inequality follows from the fact that for any $0\le x\le 1$ and any $d\ge 0$,

$$\frac{1}{1-e^{-x}}\le\frac{2}{x}\qquad\text{and}\qquad\frac{1}{1-e^{-dx}}\le\frac{1}{x\,(1-e^{-d})},$$

which implies that

$$\frac{1}{1-\exp\big\{-\frac14\delta^2\lambda_i\rho^2\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)^2\big\}} \le \frac{8}{\delta^2\lambda_i\rho^2\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)^2}
\quad\text{and}\quad
\frac{1}{1-\exp\big\{-\frac13\epsilon^2(1-\delta)\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)\,\lambda_i\mu\big\}} \le \frac{3}{\epsilon^2(1-\delta)\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)\big(1-e^{-\lambda_i\mu}\big)}.$$

Using our assumptions that $c_i\le 1/k$ and $\lambda_i\mu\le k^{\alpha}$ for all $i$, we can conclude that for any $0<\epsilon,\delta<1$,

$$\sum_{i=1}^{I_U} c_i\lambda_i\mu\, E\Big[\sum_{t\le T:\, g_t\cdot 1(F_t=1;\ \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|\Big] \le \frac{\epsilon T I_U}{k^{1-\alpha}} + \frac{8\mu I_U}{\delta^2 k\rho^2\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)^2} + \frac{6}{\epsilon^2 k(1-\delta)\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)}\sum_{i=1}^{I_U}\frac{\lambda_i\mu}{1-e^{-\lambda_i\mu}}.$$

Set $\epsilon=k^{(2-2\alpha)/3}/I_U$. By the definition of $I_U$, it is easy to verify that $\epsilon<1$. Then, we have that for any $0<\delta<1$,

$$\sum_{i=1}^{I_U} c_i\lambda_i\mu\, E\Big[\sum_{t\le T:\, g_t\cdot 1(F_t=1;\ \ell_t\le I_U)\ge i}\big|p_i-\hat p_i^{t-1}\big|\Big] \le \frac{T}{k^{(1-\alpha)/3}} + \frac{8\mu I_U}{\delta^2 k\rho^2\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)^2} + \frac{6 I_U^2\sum_{i=1}^{I_U}\lambda_i\mu/\big(1-e^{-\lambda_i\mu}\big)}{k^{(7-4\alpha)/3}(1-\delta)\rho\big(1-\frac{1}{k^{(1+2\alpha)/3}}\big)},$$

which gives the desired result.
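Lemma 11's expectation bound, which drives the estimation-error term above, can be sanity-checked numerically. The following sketch (the values $p=0.3$, $n=200$, and $\epsilon=0.1$ are arbitrary choices, not values from the paper) estimates $E\big|p-\frac1n\sum_i X_i\big|$ by Monte Carlo simulation and compares it with the bound $\epsilon+2e^{-n\epsilon^2/3}$:

```python
import math
import random

def mean_abs_deviation(p, n, trials=5000, seed=0):
    """Monte Carlo estimate of E|p - (1/n) sum X_i| for iid Bernoulli(p) X_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        total += abs(p - successes / n)
    return total / trials

p, n, eps = 0.3, 200, 0.1
lhs = mean_abs_deviation(p, n)                  # roughly sqrt(p(1-p)/n) in magnitude
rhs = eps + 2 * math.exp(-n * eps ** 2 / 3)     # Lemma 11's upper bound
```

For these parameters the empirical deviation is on the order of $\sqrt{p(1-p)/n}\approx 0.03$, comfortably below the bound; the bound becomes tight only when $\epsilon$ is taken small relative to $1/\sqrt{n}$.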