Economics and Probabilistic Machine Learning
David M. Blei
Departments of Computer Science and Statistics, Columbia University

Modern probabilistic modeling: an efficient framework for discovering meaningful patterns in massive data.

- Off-the-shelf methods ("duct tape"): many software packages available; typically fast and scalable.
- Tailored probabilistic models: explicit models and assumptions; more challenging to implement; may not be fast or scalable.

How to use traditional machine learning and statistics to solve modern problems.

Probabilistic machine learning: tailored models for the problem at hand.
- Compose and connect reusable parts
- Driven by disciplinary knowledge and its questions
- Large-scale data, both in terms of data points and data dimension
- Focus on discovering and using structure in unstructured data
- Exploratory, observational, and causal analyses

[Pipeline figure: knowledge & question + data → make assumptions → discover patterns → predict & explore.]

Figure S2: Population structure inferred from the TGP data set using the TeraStructure algorithm at three values for the number of populations K. The visualization of the θ's in the figure shows patterns consistent with the major geographical regions. Some of the clusters identify a specific region (e.g., red for Africa) while others represent admixture between regions (e.g., green for Europeans and Central/South Americans). The presence of clusters that are shared between different regions demonstrates the more continuous nature of the structure. The new cluster from K = 7 to K = 8 matches structure differentiating between American groups. For K = 9, the new cluster is unpopulated.

The probabilistic pipeline
- Design models that reflect our domain expertise and knowledge.
- Given data, compute the approximate posterior of the hidden variables.
- Use the computation to predict the future or explore the patterns in your data.

[Figure: admixture proportions for Human Genome Diversity Project populations (Balochi, BantuKenya, Basque, Bedouin, BiakaPygmy, Brahui, Burusho, Cambodian, ..., Yakut, Yi, Yoruba).]

Population analysis of 2 billion genetic measurements.
Communities discovered in a 3.7M-node network of U.S. patents.
Annual Review of Statistics and Its Application, 2014, 1:203-232.
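The pipeline's three steps (make assumptions, discover patterns, predict and explore) can be sketched with a deliberately tiny conjugate model; everything below (the Beta-Bernoulli choice, the hyperparameters, the simulated data) is a hypothetical stand-in for illustration, not a model from the talk.

```python
import numpy as np

# Toy illustration of the probabilistic pipeline with a conjugate
# Beta-Bernoulli model; a stand-in for the richer models in the talk.
rng = np.random.default_rng(0)

# 1. Make assumptions: observations are Bernoulli(theta), theta ~ Beta(1, 1).
a0, b0 = 1.0, 1.0

# 2. Discover patterns: condition on data to get the posterior over theta.
data = rng.binomial(1, 0.3, size=500)      # simulated yes/no observations
a_post = a0 + data.sum()                   # conjugate Beta posterior
b_post = b0 + len(data) - data.sum()

# 3. Predict and explore: the posterior mean predicts the next observation.
posterior_mean = a_post / (a_post + b_post)
```

The same three-step structure carries over when the model is a topic model or an admixture model and the posterior must be approximated rather than computed in closed form.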
[Figure: fifteen topics, each a list of top words, e.g., {game, season, team, coach, play, points}; {film, movie, show, television, director}; {Bush, campaign, Clinton, Republican, House}; {stock, percent, companies, fund, market}; {government, war, military, officials, Iraq}.]

Figure 5: Topics found in a corpus of 1.8 million articles from the New York Times. Modified from Hoffman et al. (2013).

Topics found in 1.8M articles from the New York Times.

For a particular movie, our prediction of the rating depends on a linear combination of the user's embedding and the movie's embedding. We can also use these inferred representations to find groups of users that have similar tastes and groups of movies that are enjoyed by the same kinds of users. Figure 4c illustrates the graphical model. This model is closely related to a linear factor model.

Neuroscience analysis of 220 million fMRI measurements.

EMNLP 2016 submission (confidential review copy).

Figure 1: Measure of "eventness," or time-interval impact on cable content (Eq. 2). The grey background indicates the number of cables sent over time. This comes from the model fit we discuss in Section 3.
Capsule successfully detects real-world events from National Archive diplomatic cables. Events uncovered from 2M diplomatic cables and the primary sources around them.

We develop Capsule, a probabilistic model for detecting and characterizing important events in large collections of historical communication. Figure 1 illustrates Capsule's analysis. We define an "event" to be a moment in history when embassies deviate from what each usually discusses, and when each embassy deviates in the same way. Capsule embeds this intuition into a Bayesian model. It uses hidden variables to encode what "typically" happens.

[Figure: item embeddings for Mexican-food products; nearby items include guacamoles, salsas (Rojo's, Pace, Herdez, Safeway Select), tortilla chips (Tostitos, Santitas, Mission, La Tapatia, Padrinos), taco shells and seasoning mixes (Taco Bell, Ortega, Lawry's, McCormick, Old El Paso), refried beans (Rosarita, Taco Bell, Safeway), shredded cheeses (Kraft, Lucerne, Sargento), and tortillas (Mission, Guerrero).]

Item characteristics learned from 5.6M purchases.
Our perspective:
- Customized data analysis is important to many fields.
- This pipeline separates assumptions, computation, and application.
- It facilitates solving data science problems.
What we need:
- Flexible and expressive components for building models
- Scalable and generic inference algorithms
- New applications to stretch probabilistic modeling into new areas
Criticize the model → revise.

- Here I discuss two threads of research with Susan Athey's group.
- Build probabilistic models to analyze large-scale consumer behavior: many consumers choosing among many items.
- (Caveat: I'm not an economist.)
- Also joint with Francisco Ruiz.
- Vision: a utility model for baskets of items,

  U(basket) = [subs/comps] + [shopper] + [prices] + [other] + ε.

- Goals:
  – design, fit, check, and revise this model
  – answer counterfactual questions about purchase behavior

Economic embeddings: identifying substitutes and co-purchases in large-scale consumer data.

- Word embeddings are a powerful approach for analyzing language.
- They discover a distributed representation of words.
  – Distances appear to capture semantic similarity.
- Many variants, but each reflects the same main ideas:
  – Words are placed in a low-dimensional latent space.
  – A word's probability depends on its distance to the other words in its context.

Bengio et al., A neural probabilistic language model. Journal of Machine Learning Research, 2003.
Mikolov et al., Efficient estimation of word representations in vector space. Neural Information Processing Systems, 2013.
Image: Paul Ginsparg

[Figure: item embeddings for fresh pasta; nearby items include Buitoni tortellini, ravioli, fettucine, and linguine, Buitoni alfredo, pesto, and marinara sauces, and Genova ricotta and meat ravioli.]

- Exponential family embeddings generalize this idea to other types of data.
- Use generalized linear models and exponential families.
- Examples: neuroscience, recommender systems, networks, shopping baskets.
- Bigger picture: a statistical perspective on ideas from neural networks.

Zebrafish brain activity

Table 2: Analysis of neural data: mean squared error and standard errors of neural activity (on the test set) for different models. Both embedding models significantly outperform factor analysis.

Interactions between countries

Figure 1: Top view of the zebrafish brain, with blue circles at the locations of the individual neurons. We zoom in on 3 neurons and their 50 nearest neighbors (small blue dots), visualizing the "synaptic weights" learned by the model (K = 100). The edge color encodes the inner product of the neural embedding vector and the context vectors, ρ_n⊤ α_m, for each neighbor m. Positive values are green, negative values are red, and the transparency is proportional to the magnitude. With these weights we can form hypotheses about how nearby neurons are connected.

We model the lagged activity conditional on the simultaneous lags of surrounding neurons. We studied context sizes c ∈ {10, 50} and latent dimension K ∈ {10, 100}.

Models: We compare to probabilistic factor analysis (FA), fitting K-dimensional factors for each neuron and K-dimensional factor loadings for each time frame. In FA, each entry of the data matrix is Gaussian, with mean equal to the inner product of the corresponding factors and loadings.

A vacation-town deli (jam, peanut butter #1, peanut butter #2, soda, bread, pizza):
- Consider a vacation-town deli; it has six items.
- Customers either buy [pizza, soda] or [peanut butter, jam, bread]. Customers only buy one type of peanut butter at a time.
- Items bought together (or not) are co-purchased (or not). The peanut butters are substitutes. (For now we ignore many issues, e.g., formal definitions, price, causality.)
- We would like to capture this purchase behavior.
- We endow each item with two (unknown) locations in a real space R^K: an embedding ρ and a context vector α.
- The conditional probability of each item depends on its embedding and the context vectors of the other items in the basket,

  x_{b,i} | x_{b,−i} ~ Poisson( exp{ ρ_i⊤ Σ_{j≠i} α_j x_{b,j} } ).

- α_i are latent product attributes.
- ρ_i indicates how product i interacts with other products' attributes.

[Figure: learned embeddings ρ (interactions) and context vectors α (attributes) for the six deli items. Pizza and soda are never bought with bread, jam, and PB, and vice versa; bread, jam, and PB are bought together; PB #1 is never bought with PB #2, but PB #1 is bought with similar items as PB #2.]

Exponential family embedding (EF-EMB)
(Observed data x_i; embeddings ρ; context vectors α.)
- The goal of an EF-EMB is to discover a useful representation of data.
- Observations x = x_{1:n}, where x_i is a D-vector.
- Examples:

  DOMAIN        INDEX                   VALUE
  Language      position in text i      word indicator
  Neuroscience  neuron and time (n,t)   activity level
  Network       pair of nodes (s,d)     edge indicator
  Shopping      item and basket (d,b)   number purchased

- Three ingredients: context, conditional exponential family, embedding structure.
- Two latent variables per data index: an embedding and a context vector.
- Model each data point conditioned on its context and latent variables.
- The latent variables interact in the conditional. How depends on which indices are in context and which one is modeled.

Context
- Each data point i has a context c_i, a set of indices of other data points.
- We model the conditional of the data point given its context, p(x_i | x_{c_i}).
- Examples:

  DOMAIN        DATA POINT        CONTEXT
  Language      word              surrounding words
  Neuroscience  neuron activity   activity of surrounding neurons
  Network       edge              other edges on the two nodes
  Shopping      purchased item    other item counts on the same trip

Conditional exponential family
- The EF-EMB has latent variables for each data point's index: an embedding ρ[i] and a context vector α[i].
- These are used in the conditional of each data point,

  x_i | x_{c_i} ~ ExpFam( η_i(x_{c_i}; ρ[i], α[c_i]), t(x_i) ).

  (Poisson for counts, Gaussian for reals, Bernoulli for binary, etc.)
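The Poisson version of this conditional (the shopping case) can be sketched directly. The item count, dimension K, and parameter values below are hypothetical stand-ins; in practice ρ and α are learned from purchase data.

```python
import numpy as np

# Sketch of the Poisson embedding conditional for one basket.
rng = np.random.default_rng(1)
n_items, K = 6, 2                       # six deli items in a K-dim space

rho = rng.normal(size=(n_items, K))     # embeddings (interactions)
alpha = rng.normal(size=(n_items, K))   # context vectors (attributes)
basket = np.array([0, 2, 0, 1, 1, 0])   # counts x_b for the six items

def poisson_rate(i, basket, rho, alpha):
    """Conditional rate of item i given the rest of the basket:
    lambda_i = exp(rho_i . sum_{j != i} alpha_j * x_{b,j})."""
    context = sum(basket[j] * alpha[j]
                  for j in range(len(basket)) if j != i)
    return np.exp(rho[i] @ context)

rates = [poisson_rate(i, basket, rho, alpha) for i in range(n_items)]
```

Each item's rate depends only on its own embedding and on the context vectors of what else is in the basket, which is exactly the interaction structure the deli example is meant to capture.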
Conditional exponential family
- The natural parameter combines the embedding and the context vectors,

  η_i(x_{c_i}) = f( ρ[i]⊤ Σ_{j∈c_i} α[j] x_j ).

- E.g., an item's embedding (interaction) helps determine its count; its context vector (attributes) helps determine the other items' counts.

Embedding structure
- The embedding structure determines how parameters are shared.
- E.g., ρ[i] = ρ[j] for i = (Oreos, t) and j = (Oreos, u).
- Sharing enables learning about an object, such as a neuron, node, or item.

Pseudolikelihood
- We model each data point, conditional on the others.
- Combine these ingredients in a "pseudo-likelihood" (i.e., a utility),

  L(ρ, α) = Σ_{i=1}^n [ η_i⊤ t(x_i) − a(η_i) ] + log f(ρ) + log g(α).

- Fit with stochastic optimization; exponential families simplify the gradients.
- The objective resembles a collection of GLM likelihoods.
- The gradient is

  ∇_{ρ[j]} L = Σ_{i=1}^n ( t(x_i) − E[t(x_i)] ) ∇_{ρ[j]} η_i + ∇_{ρ[j]} log f(ρ[j]).

- (Stochastic gradients give justification to NN ideas like "negative sampling.")

Market basket analysis
- Data | Purchase counts of items in shopping trips at a large grocery store.
  – Category level | 478 categories; 635,000 trips; 6.8M purchases
  – Item level | 5,675 items; 620,000 trips; 5.6M purchases
- Context | Other items purchased on the same trip.
- Structure | Embeddings for each item are shared across trips.
- Family | Poisson (and we downweight the zeros).

- Recall the conditional probability

  x_i | x_{−i} ~ Poisson( exp{ ρ_i⊤ Σ_{j≠i} α_j x_j } ).

- α_i reflects the attributes of item i.
- ρ_i reflects the interaction of item i with the attributes of other items.

[Figure: a 2D tSNE representation of category attributes α_i; e.g., baking categories (shortening, powdered sugar, condensed milk, extracts, flour, brown sugar, pie crust, pie filling, evaporated milk) cluster together, as do infant categories (infant formula, disposable diapers, baby wipes, infant toiletries).]

[Figure: a 2D tSNE representation of item attributes α_i; clusters include Buitoni fresh pastas and sauces; deli cheeses, spreads, and water crackers (Carr's, Alouette, Wellington, brie, havarti); and Mexican foods (guacamoles, salsas, tortilla chips, taco shells, seasoning mixes, refried beans, shredded cheeses, and tortillas).]

Category level:

  MODEL                                     K=20    K=50    K=100
  Poisson embedding                         7.497   7.284   7.199
  Poisson embedding (downweighting zeros)   7.110   6.994   6.950
  Additive Poisson embedding                7.868   8.191   8.414
  Hierarchical Poisson factorization        7.740   7.626   7.626
  Poisson PCA                               8.314   9.51    11.01

Item level:

  MODEL                                     K=50    K=100
  Poisson embedding                         7.72    7.64
  Hierarchical Poisson factorization        7.86    7.87

Gopalan et al., Scalable recommendation with hierarchical Poisson factorization. Uncertainty in Artificial Intelligence, 2015.
Collins et al., A generalization of principal component analysis to the exponential family. Neural Information Processing Systems, 2002.

We want to use this fit to understand purchase patterns.
- Exchangeables have a similar effect on the purchase of other items.
- Same-category items tend to be exchangeable and rarely purchased together (e.g., two types of peanut butter).
- Complements are purchased (or not purchased) together (e.g., hot dogs and buns).
- PB #1 and PB #2 induce similar distributions over the other items, but they are rarely purchased together.
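One way to make "similar effect on other items, yet rarely bought together" operational is to compare, for every other item k, the purchase probabilities that items i and j induce through a sigmoid of ρ_k⊤α. The sketch below implements one plausible reading of that idea on random stand-in parameters; it is not a verified reimplementation of the talk's exact score.

```python
import numpy as np

# Sketch of a substitute score built from fitted vectors rho and alpha.
# Parameters are random stand-ins; the precise score and sign
# conventions are assumptions, not the talk's verified definition.
rng = np.random.default_rng(2)
n_items, K = 6, 2
rho = rng.normal(size=(n_items, K))     # interaction vectors
alpha = rng.normal(size=(n_items, K))   # attribute vectors

def mu(k, i):
    """Sigmoid interaction between items k and i."""
    return 1.0 / (1.0 + np.exp(-rho[k] @ alpha[i]))

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def substitute_score(i, j):
    """Compare how every other item k interacts with i versus j
    (small KL = similar effects), and account for how often i and j
    themselves co-occur."""
    divergence = sum(bernoulli_kl(mu(k, i), mu(k, j))
                     for k in range(n_items) if k not in (i, j))
    p = mu(j, i)
    return divergence - p * np.log(p / (1 - p))

def symmetric_substitute(i, j):
    # The talk notes that the symmetrized version is used.
    return 0.5 * (substitute_score(i, j) + substitute_score(j, i))
```

Symmetrizing makes the score independent of which item is called i and which j, matching how the substitute and complement tables are reported.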
- Define the sigmoid function between two items,

  μ_{ki} ≜ 1 / (1 + exp{ −ρ_k⊤ α_i }),    μ̄_{ki} ≜ 1 − μ_{ki}.

- The "substitute predictor" is

  ( Σ_{k∉{i,j}} μ_{ki} log(μ_{ki}/μ_{kj}) + μ̄_{ki} log(μ̄_{ki}/μ̄_{kj}) ) − μ_{ji} log( μ_{ji} / (1 − μ_{ji}) ).

- The "complement predictor" is the negative of the last term,

  μ_{ji} log( μ_{ji} / (1 − μ_{ji}) ).

- Notes:
  – We use the symmetrized version of both quantities.
  – These quantities generalize to other exponential families.

Example co-purchases at the category level:

  ITEM 1               ITEM 2                    SCORE (RANK)
  organic vegetables   organic fruits            6.18 (01)
  vegetables (<10 oz)  beets (>=10 oz)           5.64 (02)
  baby food            disposable diapers        3.43 (32)
  stuffing             cranberries               3.30 (36)
  gravy                stuffing                  3.23 (37)
  pie filling          evaporated milk           3.09 (42)
  deli cheese          deli crackers             2.87 (55)
  dry pasta/noodles    tomato paste/sauce/puree  2.73 (63)
  mayonnaise           mustard                   2.61 (69)
  cake mixes           frosting                  2.49 (78)

Top ten potential substitutes at the category level:

  ITEM 1                    ITEM 2                    SCORE (RANK)
  bouquets                  roses                      0.20 (01)
  frozen pizza 1            frozen pizza 2             0.18 (02)
  bottled water 1           bottled water 2           -0.07 (03)
  carbonated soft drinks 1  carbonated soft drinks 2  -0.12 (04)
  orange juice 1            orange juice 2            -0.37 (05)
  bathroom tissue 1         bathroom tissue 2         -0.58 (06)
  bananas 1                 bananas 2                 -0.61 (07)
  salads-convenience 1      salads-convenience 2      -0.63 (08)
  potatoes 1                potatoes 2                -0.66 (09)
  bouquets                  blooming                  -1.18 (10)

Example co-purchases at the UPC level:

  ITEM 1                          ITEM 2                          SCORE (RANK)
  ygrt peach ff                   ygrt mxd berry ff               19.83 (0001)
  s&w beans garbanzo              s&w beans red kidney            14.42 (0002)
  whiskas cat fd beef             whiskas cat food tuna/chicken    8.45 (0149)
  parsnips loose                  rutabagas                        8.32 (0157)
  celery hearts organic           apples fuji organic              4.36 (0995)
  85p ln gr beef patties 15p fat  sesame buns                      4.35 (1005)
  kiwi imported                   mangos small                     3.22 (1959)
  colby jack shredded             taco bell taco seasoning mix     2.89 (2472)
  star magazine                   in touch magazine                2.87 (2497)
  seasoning mix fajita            mission tortilla corn super sz   2.87 (2500)

Example potential substitutes at the UPC level:

  ITEM 1                       ITEM 2                           SCORE (RANK)
  coffee drip grande           coffee drip venti                -0.33 (001)
  sandwich signature reg       sandwich signature lrg           -1.17 (020)
  market bouquet               alstromeria/rose bouquet         -2.89 (186)
  sushi shoreline combo        sushi full moon combo            -3.76 (282)
  semifreddis bread baguette   crusty sweet baguette            -7.65 (566)
  orbit gum peppermint         orbit gum spearmint              -7.96 (595)
  snickers candy bar           3 musketeers candy bar           -7.97 (598)
  cheer lndry det color guard  all lndry det liquid fresh rain  -7.99 (602)
  coors light beer btl         coors light beer can             -8.12 (621)
  greek salad signature        neptune salad signature          -8.15 (630)

Summary and questions
- Word embeddings have become a staple in natural language processing.
- We distilled their essential elements and generalized them to consumer data.
- Compared to classical factorization, good performance on many kinds of data: movie ratings, neural activity, scientific reading, shopping baskets.

Rudolph et al. Exponential Family Embeddings. Neural Information Processing Systems, 2016.
Liang et al. Modeling User Exposure in Recommendation. Recommendation Systems, 2016.
Ranganath et al. Deep Exponential Families. Artificial Intelligence and Statistics, 2015.

Summary and questions
- How can we capture higher-order structure in the embeddings?
- Why downweight the zeros?
- How can we include price and other complexities?

Rudolph et al. Exponential Family Embeddings. Neural Information Processing Systems, 2016.
Liang et al. Modeling User Exposure in Recommendation. Recommendation Systems, 2016.
Poisson factorization
A computationally efficient method for discovering correlated preferences

I Economics
  – Look at items within one category (e.g., yoghurt)
  – Try to estimate the effects of interventions (e.g., coupons, price, layout)
I Machine learning
  – Look at all items
  – Estimate user preferences and make predictions (recommendations)
  – Ignore causal effects of interventions

[Figure: a users-by-items matrix of ratings for movies such as Star Wars, The Empire Strikes Back, Return of the Jedi, The Wrath of Khan, Star Trek III, When Harry Met Sally, and Barefoot in the Park]

I Implicit data is about users interacting with items
  – clicks
  – "likes"
  – purchases
I Less information than explicit data (e.g., ratings), but more prevalent

ITEM ATTRIBUTES β_i and USER PREFERENCES θ_u:

    θ_uk ∼ Gamma(·, ·)
    β_ik ∼ Gamma(·, ·)
    y_ui ∼ Poisson(θ_u⊤ β_i)

Poisson factorization
I Assumptions
  – Users (consumers) have latent preferences θ_u.
  – Items have latent attributes β_i.
  – How many items a shopper purchases comes from a Poisson.
I The posterior p(θ, β | y) reveals purchase patterns.

Advantages
I captures heterogeneity of users
I implies a distribution of total consumption
I efficient approximation, only requires non-zero data

Gopalan et al., Scalable recommendation with hierarchical Poisson factorization. Uncertainty in Artificial Intelligence, 2015.
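The gamma–Poisson model above is easy to simulate. The NumPy sketch below draws from the generative process (with illustrative hyperparameters `a`, `b`, not the richer hierarchical priors of Gopalan et al.) and shows why inference only needs the non-zero counts: the parameter-dependent part of the log-likelihood splits into a sum over non-zero entries plus a term that collapses to a product of sums.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, K = 50, 40, 5

# Illustrative Gamma hyperparameters (assumed values, for the sketch only).
a, b = 0.3, 1.0

theta = rng.gamma(shape=a, scale=1.0 / b, size=(n_users, K))  # user preferences
beta = rng.gamma(shape=a, scale=1.0 / b, size=(n_items, K))   # item attributes

rate = theta @ beta.T   # Poisson rates theta_u^T beta_i
y = rng.poisson(rate)   # simulated purchase counts (mostly zeros)

# Log-likelihood up to the log(y!) constant: the first term touches only
# the non-zero counts, and the second never needs the full rate matrix,
# since sum_{u,i} theta_u^T beta_i = sum_k (sum_u theta_uk)(sum_i beta_ik).
nz = y > 0
total_rate = theta.sum(axis=0) @ beta.sum(axis=0)
ll = (y[nz] * np.log(rate[nz])).sum() - total_rate
```

The identity behind `total_rate` is what makes the approximation scale: the cost is linear in the number of non-zero purchases, not in users × items.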
News articles from the New York Times and scientific articles from Mendeley

"Business Self-Help" (New York Times)
  – Stay Focused And Your Career Will Manage Itself
  – To Tear Down Walls You Have to Move Out of Your Office
  – Self-Reliance Learned Early
  – Maybe Management Isn't Your Style
  – My Copyright Career

"Astronomy" (Mendeley)
  – Theory of Star Formation
  – Error estimation in astronomy: A guide
  – Astronomy & Astrophysics
  – Measurements of Omega from 42 High-Redshift Supernovae
  – Stellar population synthesis at the resolution of 2003

"Personal Finance" (New York Times)
  – In Hard Economy for All Ages, Older Isn't Better; It's Brutal
  – Younger Generations Lag Parents in Wealth-Building
  – Fast-Growing Brokerage Firm Often Tangles With Regulators
  – The Five Stages of Retirement Planning Angst
  – Signs That It's Time for a New Broker

"Biodiesel" (Mendeley)
  – Biodiesel from microalgae
  – Biodiesel from microalgae beats bioethanol
  – Commercial applications of microalgae
  – Second Generation Biofuels
  – Hydrolysis of lignocellulosic materials for ethanol production

"All Things Airplane" (New York Times)
  – Flying Solo
  – Crew-Only 787 Flight Is Approved By FAA
  – All Aboard Rescued After Plane Skids Into Water at Bali Airport
  – Investigators Begin to Test Other Parts On the 787
  – American and US Airways May Announce a Merger This Week

"Political Science" (Mendeley)
  – Social Capital: Origins and Applications in Modern Sociology
  – Increasing Returns, Path Dependence, and Politics
  – Institutions, Institutional Change and Economic Performance
  – Diplomacy and Domestic Politics
  – Comparative Politics and the Comparative Method

Figure 6: The top 10 items by the expected weight β_i from three of the 100 components discovered by our algorithm for the New York Times and Mendeley data sets.
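Lists like those in Figure 6 are read off a fitted model by sorting each component of the expected item weights E[β]. A minimal sketch, where `items` and `beta` are made-up stand-ins for a fitted model, not real fitted values:

```python
import numpy as np

# Hypothetical expected item weights E[beta]: 6 items x 2 components.
items = ["stone fruit", "pears", "cat food wet", "cat litter", "apples", "paper towels"]
beta = np.array([
    [0.9, 0.0],   # stone fruit
    [0.7, 0.1],   # pears
    [0.0, 0.8],   # cat food wet
    [0.1, 0.6],   # cat litter
    [0.8, 0.0],   # apples
    [0.0, 0.4],   # paper towels
])

def top_items(beta, items, k=3):
    """Return the top-k items per component, by expected weight."""
    order = np.argsort(-beta, axis=0)  # descending within each column
    return [[items[i] for i in order[:k, c]] for c in range(beta.shape[1])]

print(top_items(beta, items))
# → [['stone fruit', 'apples', 'pears'], ['cat food wet', 'cat litter', 'paper towels']]
```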
Figure 4: Predictive performance on the Mendeley, New York Times, Echo Nest, Netflix (implicit), and Netflix (explicit) data sets, comparing HPF, BPF, LDA, MF, and NMF. The top and bottom plots show normalized mean precision and mean recall at 20 recommendations, respectively. While competing method performance varies across data sets, HPF and BPF consistently outperform competing methods.

[Figure: four components with their top grocery categories, and the latent preferences over 20 components of Consumer #1 ("Cats and Babies") and Consumer #2 ("Healthy and Cats")]

"FRUIT": stone fruit, pears, tropical fruit, apples, grapes
"CAT CARE": cat food wet, cat food dry, cat litter & deodorant, canned fish, paper towels
"BABY ESSENTIALS": baby food, starbucks coffee, disposable diapers, infant formula, baby/youth wipes
"HEALTHY": health and milk substitutes, organic vegetables, organic fruits, cold cereal, vegetarian / organic frozen

Poisson factorization and economics
I Consider a utility model of a single purchase with Gumbel error,

    U(y_ui) = log(θ_u⊤ β_i) + ε.

I Suppose a shopper u buys N items.
Then

    y_u | N ∼ Mult(N, π_u),    π_ui ∝ exp{log(θ_u⊤ β_i)} = θ_u⊤ β_i.

I Thus, with a Poisson-distributed total N, the unconditional distribution of counts is Poisson factorization,

    y_uj ∼ Poisson(θ_u⊤ β_j).

Poisson factorization and economics
I With this connection, we can devise new utility models, e.g.,

    U(y_ui) = log(θ_u⊤ β_i + α_u exp{−c · price_i}) + ε.

I ...and other factors
  – time of day
  – in stock
  – date
  – observed item characteristics & category
  – demographic information about the shopper
I Inference is still efficient. With assumptions, we can answer counterfactual questions.

I Poisson factorization efficiently analyzes large-scale purchase behavior.
I Next steps
  – include notions of co-purchases and substitutes
  – include time of day at a level that is unconfounded
  – include price and stock out; answer counterfactual questions
I Research in recommendation systems can help economic analyses.

[Figure: the probabilistic pipeline — knowledge & question plus data feed a loop: make assumptions → discover patterns → predict & explore]

Probabilistic machine learning: design expressive models to analyze data.
I Tailor your method to your question and knowledge
I Use generic and scalable inference to analyze large data sets
I Form predictions, hypotheses, inferences, and revise the model

Opportunities for economics and machine learning
I Push economics to high-dimensional data and scalable computation
I Push ML to explainable models, applied causal inference, new problems
I Develop new modeling methods together
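The multinomial–Poisson connection used in the derivation above (a Poisson basket size N plus a multinomial allocation with probabilities proportional to θ_u⊤ β_i yields independent Poisson counts per item) can be checked by Monte Carlo. The rates below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)

rates = np.array([0.5, 1.5, 3.0])   # arbitrary per-item rates theta_u^T beta_i
pi = rates / rates.sum()            # multinomial choice probabilities

S = 100_000
N = rng.poisson(rates.sum(), size=S)                # Poisson basket sizes
y = np.stack([rng.multinomial(n, pi) for n in N])   # allocate items within baskets

# Marginally, each item's count should be Poisson(rates[j]),
# so its sample mean and sample variance should both match the rate.
print(y.mean(axis=0))  # close to rates
print(y.var(axis=0))   # close to rates
```

This equivalence is what lets the discrete-choice utility story and the scalable Poisson factorization likelihood describe the same data.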