Recitation 2 for BigData: Hashing
Jay Gu
Jan 24 2013

Homework 1 Patch
• Unknown user's gender should be 0
  – fix DataInstance.java, HashedDataInstance.java
• New lambda range for the "Regularization" part

Outline
• Hash function
• Hash kernel
• Multitask learning

Hash Function
• Collision is bad
  – Want: distinct keys rarely land in the same bin
  – But we do not know the input distribution...

Universal Hashing Family
• Uniformly pick h from a family H of hash functions into n bins
• For any given pair x1 ≠ x2, we want H such that: Pr[h(x1) = h(x2)] ≤ 1/n

Simple construction: ax+b
• Pick a prime number p larger than every key x
• h(x) = ((a x + b) mod p) mod n, with a in {1, ..., p-1} and b in {0, ..., p-1}
• Proof sketch:
  – How many hash functions in H? p(p-1)
  – Let u = (a x1 + b) mod p and v = (a x2 + b) mod p. Then x1 ≠ x2 implies u ≠ v
  – How many (u, v) pairs cause a collision, i.e. u ≠ v but u mod n = v mod n? At most p(⌈p/n⌉ - 1), roughly p(p/n - 1)
  – (u, v) is mapped 1-1 to (a, b), which is mapped 1-1 to h, so the collision probability is at most p(⌈p/n⌉ - 1) / (p(p-1)) ≤ 1/n

How to hash a string?
• Java's built-in hashCode(): s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], in 32-bit integer arithmetic, where n here is the string length
• MD5 checksum: 128 bits = 16 bytes

Hash Kernel
[Figure: a sparse 0/1 bag-of-words vector over the feature names Mary (1), Little (-1), Lamb (-1), Obama (1), Care (1), Husky (-1), UW (-1), Big (1), Data (-1), Rock (-1) is hashed into a short vector with entries in {-1, 0, 1}. The number after each name is its sign.]

Hash Kernel
• Maps a high-dimensional (infinite) space to a low-dimensional (finite) space
• X: the space of feature names, not values!
• Directly learn W in the low-dimensional hashed space
• Implement using only h: instead of hashing into m bins, hash into 2m bins and take the first bit as the sign

Multitask Learning
• Global part: hash the feature name
• Personalized part: hash the feature name along with the user id
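
The ax+b construction from the universal hashing slides can be tried out with a short sketch. This is only an illustration under stated assumptions: the class name `UniversalHash`, the helper `collision_rate`, and the choice of the prime p = 2^31 - 1 are not from the slides, and keys are assumed to be non-negative integers smaller than p.

```python
import random


class UniversalHash:
    """One random member of the family h(x) = ((a*x + b) mod p) mod n."""

    def __init__(self, n_bins, p=2_147_483_647, rng=None):
        # Assumption: p = 2^31 - 1 (a prime) and integer keys in [0, p).
        rng = rng or random.Random()
        self.n_bins = n_bins
        self.p = p
        self.a = rng.randrange(1, p)  # a in {1, ..., p-1}
        self.b = rng.randrange(0, p)  # b in {0, ..., p-1}

    def __call__(self, x):
        return ((self.a * x + self.b) % self.p) % self.n_bins


def collision_rate(x1, x2, n_bins, trials=20000, seed=0):
    """Estimate Pr[h(x1) == h(x2)] by drawing h from the family many times."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = UniversalHash(n_bins, rng=rng)
        hits += h(x1) == h(x2)
    return hits / trials
```

For a fixed pair of distinct keys, `collision_rate(x1, x2, n_bins)` should come out close to, and on average no larger than, `1 / n_bins`, which is the guarantee the proof sketch argues for.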
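
The hash kernel and multitask slides can be sketched together as follows. This is a minimal illustration, not the homework's implementation: the function names `bin_and_sign` and `hashed_features`, the use of MD5 as the underlying string hash, and the `"<user_id>_<token>"` format for personalized feature names are all assumptions made for the example.

```python
import hashlib


def bin_and_sign(name, m):
    """Hash a feature name into 2m bins; split the result into (bin, sign)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()  # 16 bytes
    k = int.from_bytes(digest, "big") % (2 * m)          # k in [0, 2m)
    # k // m is 0 or 1 and plays the role of the sign bit.
    sign = 1 if k < m else -1
    return k % m, sign


def hashed_features(tokens, m, user_id=None):
    """Map a bag of feature names to a signed m-dimensional vector.

    If user_id is given, each token is hashed twice: once on its own
    (global part) and once joined with the user id (personalized part).
    """
    phi = [0.0] * m
    for tok in tokens:
        names = [tok]
        if user_id is not None:
            names.append(f"{user_id}_{tok}")  # assumed naming scheme
        for name in names:
            j, s = bin_and_sign(name, m)
            phi[j] += s
    return phi
```

A weight vector W of length m can then be learned directly on `hashed_features(...)`, with a prediction computed as the dot product of W and the hashed vector. Global and personalized features share the same m bins, so no per-user weight table is needed.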