SOCIAL HOUSEKEEPING THROUGH INTERCOMMUNICATING APPLIANCES AND SHARED RECIPES MERGING IN A PERVASIVE WEB-SERVICES INFRASTRUCTURE

D5.3 Reinforcement Learning
3rd review meeting, Cartif-Brussels, 17/12/2014

Contents
The SandS system
◦ The intended role of Reinforcement Learning
The actor-critic model
Some results and conclusions

The SandS system
The role of the NI is to provide recipe recommendations for new tasks proposed by the eahouker.
The role of Reinforcement Learning is to tune the recipes to the eahouker's preferences:
◦ recipe individualization;
◦ learning implies (many) repetitions of
  • recipe application,
  • eahouker feedback.

The SandS system [figure]

The SandS system (explanation)
◦ The eahouker forwards a new task to the NI (through the DI and the ESN).
◦ The NI generates a new recipe for the task, sending it to the appliance and to the RL module for future processing.
◦ The eahouker receives the result and forwards satisfaction feedback to the RL module for future processing.
◦ Some time in the future the eahouker needs to perform the same task; the NI detects this and notifies the RL module.
◦ The RL module generates a refined recipe according to past satisfaction and sends it to the NI, which forwards it to the appliance.
◦ The eahouker generates satisfaction feedback for the RL module.
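The interaction loop described above can be summarised in a small simulation. This is a minimal sketch, not the SandS software: the class names (Recipe, NI, RLTuner, Eahouker), the three-parameter recipe vector and the hill-climbing refinement rule are illustrative assumptions; the actual tuning strategy is the actor-critic model presented in the following slides.

```python
# Minimal sketch of the SandS recipe-tuning loop. Class names, the recipe
# representation and the refinement rule are illustrative assumptions only.
import random

class Recipe:
    """A recipe modelled as a small vector of appliance parameters."""
    def __init__(self, params):
        self.params = list(params)

class NI:
    """NI: proposes an initial recipe for a new task."""
    def propose(self, task):
        # The real NI recommends recipes from the social network; here we
        # simply return a random starting point for the named task.
        return Recipe([random.uniform(0.0, 1.0) for _ in range(3)])

class RLTuner:
    """RL module: refines a recipe using accumulated satisfaction feedback."""
    def __init__(self, step=0.2):
        self.step = step
        self.best_recipe = None
        self.best_feedback = -1.0

    def observe(self, recipe, feedback):
        # Store the eahouker feedback (and the recipe it refers to) for future processing.
        if feedback > self.best_feedback:
            self.best_recipe, self.best_feedback = recipe, feedback

    def refine(self):
        # Hill-climbing stand-in for the actor-critic: perturb the best recipe
        # so far, with a perturbation that shrinks as satisfaction grows.
        scale = self.step * (1.0 - self.best_feedback)
        return Recipe([p + random.uniform(-scale, scale) for p in self.best_recipe.params])

class Eahouker:
    """The user: runs the appliance and reports satisfaction in [0, 1]."""
    def __init__(self, preferred):
        self.preferred = preferred

    def feedback(self, recipe):
        error = sum(abs(a - b) for a, b in zip(recipe.params, self.preferred))
        return max(0.0, 1.0 - error)

# The same task repeated over several episodes, as in the explanation above.
ni, rl, user = NI(), RLTuner(), Eahouker([0.6, 0.3, 0.8])
recipe = ni.propose("wash_cotton_40")        # new task -> initial NI recipe
for episode in range(10):
    satisfaction = user.feedback(recipe)     # appliance runs, eahouker reacts
    rl.observe(recipe, satisfaction)         # feedback sent to the RL module
    recipe = rl.refine()                     # same task later -> refined recipe
    print(f"episode {episode}: satisfaction {satisfaction:.2f}")
```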
Reinforcement Learning
Elements:
◦ Action selection: recipe generation
◦ State of the system: appliance and task
◦ Reward: eahouker feedback
◦ Learning goal: optimal action-selection policies
  • evaluation of policy quality: critic
  • generation of optimal actions: actor

Actor-critic model [figure]

Actor-critic model
◦ Policies and actions are modelled by Radial Basis Functions.
◦ The actor updates the action-generation parameters, and the critic updates the valuation function, by their respective update rules (a standard formulation is sketched after the Conclusions).

Some results
◦ We need to resort to simulation.
◦ Case study: washing machine (abstract).
◦ We generate hidden target task-recipe pairs:
  • an initial random recipe is provided;
  • RL is applied to tune the initial recipe;
  • the evolution is measured by the difference between the hidden recipe and the one generated by the actor.

Some results [figure]

Some results [figure]

Some results
◦ Increase in satisfaction in 6 experiments (5 is the maximum satisfaction score).

Conclusions
◦ Reinforcement learning may be applied to recipe tuning.
◦ It requires repetitive experimentation by the eahouker.
◦ It shows good convergence, meaning that a few trials already yield a significant improvement of the eahouker experience.
◦ Real-life application may be achieved when the system is fully deployed;
  • the software is available and modular.
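The actor and critic update rules referenced on the actor-critic slide are not reproduced in this text version, and neither is the washing-machine simulation itself. The sketch below is a standard continuous actor-critic with Gaussian RBF features and a temporal-difference critic, applied to a toy one-dimensional version of the hidden-target experiment; the feature widths, learning rates, the hidden_target function and the one-step episode structure are all illustrative assumptions, not the deliverable's actual formulation.

```python
# Standard continuous actor-critic with Gaussian RBF features (assumed formulation,
# shown on a toy version of the hidden-target experiment; not the SandS software).
import numpy as np

rng = np.random.default_rng(0)

# RBF feature map over a 1-D abstract task descriptor in [0, 1].
centers = np.linspace(0.0, 1.0, 10)
width = 0.1

def phi(s):
    """Gaussian radial basis features of the state (appliance/task descriptor)."""
    return np.exp(-((s - centers) ** 2) / (2.0 * width ** 2))

def hidden_target(s):
    """Hidden task -> recipe mapping, used only to simulate eahouker feedback."""
    return 0.3 + 0.5 * s

theta = np.zeros(len(centers))   # actor parameters: mean of the generated recipe value
w = np.zeros(len(centers))       # critic parameters: linear state-value estimate
alpha_actor, alpha_critic = 0.1, 0.2
sigma = 0.2                      # exploration noise of the Gaussian policy

for episode in range(5000):
    s = rng.uniform(0.0, 1.0)                 # state: the task to be performed
    f = phi(s)
    mu = theta @ f                            # actor output: mean recipe parameter
    a = mu + sigma * rng.standard_normal()    # exploratory action (generated recipe)

    # Reward: the closer the recipe is to the hidden target, the higher the satisfaction.
    r = -abs(a - hidden_target(s))

    # One-step episode, so the TD error reduces to reward minus the value estimate.
    delta = r - w @ f

    # Critic update: move the valuation function toward the observed return.
    w += alpha_critic * delta * f

    # Actor update: reinforce the action taken in proportion to the TD error.
    theta += alpha_actor * delta * (a - mu) * f

# After learning, the actor's mean recipe should approach the hidden recipe,
# mirroring the "difference between hidden and generated recipe" measure above.
for s in (0.1, 0.5, 0.9):
    print(f"task {s:.1f}: actor {theta @ phi(s):.3f}  hidden target {hidden_target(s):.3f}")
```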