Learning Tags from Unsegmented Videos of Multiple Human Actions