Learning What to Value
Daniel Dewey
The Singularity Institute for Artificial Intelligence
Summary
What should real-world AGI be designed to do? We assert
that a real-world AGI ought to have a motivational framework
capable of expressing our goals.
Using this criterion, we find that reinforcement learning is not
appropriate for real-world AGI. As a partial solution, we
describe observation-utility maximization, a framework that can
express many different hard-coded goals.
Finally, we propose value learning, a motivational framework in
which agents learn to maximize an initially unknown utility
function, obviating the need for hard-coded human values. Value
learning is more promising than either reinforcement learning or
observation-utility maximization, but it also raises many new
questions. What pool of utility functions should be considered?
What properties should we seek in a prior over utility functions?
What constitutes evidence that a utility function is "good"?
Value learning agents do not solve all problems of ultraintelligent
agent design, but do give a direction for future work.
1. Motivation
As a field, AGI has its hands full with an engineering problem: how
can we design and build a system that performs with the intellectual
depth and breadth of a human? However, it is not enough that an
artificial agent perform efficiently and effectively; it must also be
motivated to pursue tasks that will benefit us. If an AGI does not
share our goals, we will have succeeded only in creating a powerful
human-indifferent agent. This outcome would fall well short of the
potential benefits AGI could bring to humanity if it shared our goals.
Worse is the threat of a human-indifferent agent undergoing an
intelligence explosion. I. J. Good's intelligence explosion theory
predicts that upon passing human-level intelligence, an AGI will likely
undergo a process of repeated self-improvement; in the wake of such
an event, how well our goals are fulfilled would depend entirely on
how well the AGI's goals match ours.
The upshot of these arguments is that, in order to realize the potential
benefits of AGI, real-world AGI must be designed to at least be capable of
expressing human goals. As we will see, this criterion is useful in choosing
a motivational framework.
2. Reinforcement Learning
Reinforcement learning agents ("RLs") act so as to maximize future rewards.
An idealized RL's decision process is depicted in Fig. 1. Formally, the
agent's action yk (following interaction history yx<k) is given by
y_k = \arg\max_{y_k} \sum_{x_k yx_{k+1:m}} (r_k + \dots + r_m) \, P(yx_{\le m} \mid yx_{<k}\, y_k)
where the sum ranges over possible futures x_k yx_{k+1:m} and r_k, ..., r_m
are the rewards the agent receives in cycles k through m.
Fig. 1: The decision process of an idealized reinforcement learner.
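As a concrete illustration (not part of the paper), the following sketch
implements this optimality notion in Python for an invented toy setting: two
actions, two perceptions, a reward of 1 for perceiving "o1", and a made-up
environment model under which "a1" makes rewarded perceptions likelier. Every
name below is an assumption for illustration only.
    # A minimal sketch of the RL optimality notion above; all symbols are toy
    # assumptions, not anything defined in the paper.
    from itertools import product

    ACTIONS = ["a0", "a1"]        # possible actions y
    OBSERVATIONS = ["o0", "o1"]   # possible perceptions x

    def future_sequences(extra_cycles):
        # Enumerate possible futures x_k, y_{k+1}, x_{k+1}, ..., y_m, x_m.
        slots = [OBSERVATIONS] + [ACTIONS, OBSERVATIONS] * extra_cycles
        return product(*slots)

    def reward(perception):
        # Toy reward channel: perceiving "o1" carries reward 1.
        return 1.0 if perception == "o1" else 0.0

    def env_probability(action, future):
        # Toy stand-in for P(yx_<=m | yx_<k y_k): it ignores the past history
        # and simply makes "o1" likelier after action "a1" than after "a0".
        p_o1 = 0.8 if action == "a1" else 0.2
        prob = 1.0
        for symbol in future:
            if symbol in OBSERVATIONS:
                prob *= p_o1 if symbol == "o1" else 1.0 - p_o1
            else:
                prob *= 1.0 / len(ACTIONS)  # future actions: uniform, for simplicity
        return prob

    def rl_action(history, extra_cycles=1):
        # arg max over y_k of the expected sum of future rewards.
        def expected_rewards(action):
            return sum(
                sum(reward(s) for s in future if s in OBSERVATIONS)
                * env_probability(action, future)
                for future in future_sequences(extra_cycles)
            )
        return max(ACTIONS, key=expected_rewards)

    print(rl_action(history=[]))  # prints "a1", the action that makes rewards likelier
The exhaustive sum over futures grows exponentially with the horizon, so this
sketch only illustrates the optimality notion, not a practical agent.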
The trouble with reinforcement learning is that it can only be used
to define agents whose goal is to maximize future rewards. It is
appealing to think that an RL "has no goal" and will learn its goal
from its environment, but this is not strictly true. An RL may learn
instrumental goals, but its final goal is to maximize rewards. Since
human goals are not naturally instrumental to maximized rewards,
an RL will work at cross-purposes to us in many cases. For
example, an RL could benefit by altering its environment to give
rewards regardless of whether human goals are achieved.
Since the reinforcement learning framework is only capable of expressing one
goal, the maximization of rewards, we conclude that reinforcement learning is
not suitable for real-world AGI.
3. Observation-Utility Maximization
As a partial solution, we define observation-utility maximization, a
framework that is at least flexible enough to express many different goals.
Observation-utility maximizers ("OUMs") are inspired by Nick Hay's work on
optimal agents.
The most important feature of an OUM is its observation-utility function, U.
Let U be a function from an interaction history yx≤m to a scalar utility. U
calculates expected utility: how much "value" the agent expects the universe
to contain, given an interaction history. Putting U(yx≤m) in place of an RL's
sum of rewards (r_k + ... + r_m) produces an idealized OUM, whose decision
process is shown in Fig. 2:
y_k = \arg\max_{y_k} \sum_{x_k yx_{k+1:m}} U(yx_{\le m}) \, P(yx_{\le m} \mid yx_{<k}\, y_k)
Fig. 2: The decision process of an idealized observation-utility maximizer.
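To make the substitution concrete, here is a parallel sketch under similarly
invented toy assumptions; the observation-utility function U, the symbols, and
the uniform environment model are illustrative stand-ins rather than anything
specified in the paper.
    # A minimal sketch of an idealized OUM: identical in shape to the RL rule,
    # with a toy utility function U(yx_<=m) replacing the sum of rewards.
    from itertools import product

    ACTIONS = ["a0", "a1"]
    OBSERVATIONS = ["o0", "o1"]

    def U(history):
        # Toy observation-utility function over the whole interaction history:
        # utility 1 if the agent ever takes "a1" and immediately perceives "o1".
        return 1.0 if any(
            a == "a1" and b == "o1" for a, b in zip(history, history[1:])
        ) else 0.0

    def env_probability(future):
        # Toy stand-in for P(yx_<=m | yx_<k y_k): uniform over futures
        # (each slot holds one of two equally likely symbols in this toy).
        return 1.0 / (2 ** len(future))

    def oum_action(history, extra_cycles=1):
        # arg max over y_k of the expected utility U(yx_<=m) of the full history.
        slots = [OBSERVATIONS] + [ACTIONS, OBSERVATIONS] * extra_cycles
        def expected_utility(action):
            return sum(
                U(history + [action] + list(future)) * env_probability(future)
                for future in product(*slots)
            )
        return max(ACTIONS, key=expected_utility)

    print(oum_action(history=["a0", "o0"]))  # prints "a1": it can satisfy U directly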
Unlike reinforcement learning, expected observation-utility maximization can
be used to define agents with many different final goals. Unfortunately, an
OUM requires a detailed observation-utility function up front. This is not
ideal; a major benefit of reinforcement learning was that it seemed to allow
us to apply an intelligent agent to a problem without clearly defining its
goal beforehand. Can this idea of learning to maximize an initially unknown
utility function be recovered?
4. Value Learning
Finally, we propose a new framework, called value learning. The core idea of value learning is
uncertainty over utility functions; instead of hard-coding the goals we want an agent to achieve,
we give a value learning agent ("VL") the epistemic tools it needs to learn what we want it to
value from experience.
Fig. 3: The decision process of an idealized value learning agent.
A VL is provided with a pool of possible utility functions and a probability distribution P2
such that each utility function can be assigned probability P2(U | yx≤m) given a particular
interaction history. In designing P2, we specify what kinds of interactions constitute evidence
about goals.
Replacing the reinforcement learner's sum of rewards with an expected utility over a pool of
possible utility functions, we have an optimality notion for a value-learning agent:
y_k = \arg\max_{y_k} \sum_{x_k yx_{k+1:m}} \sum_{U} U(yx_{\le m}) \, P_1(yx_{\le m} \mid yx_{<k}\, y_k) \, P_2(U \mid yx_{\le m})
where P1 is the agent's model of its environment (the distribution written P in Sections 2 and 3)
and P2 is its distribution over the pool of utility functions.
As shown in Fig. 3, a VL chooses the action that maximizes the expected value of the
resulting future, where "value" is an expected value across possible utility functions
weighted by evidence gleaned from the VL's interactions with the world.
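As a toy illustration of this decision process (an invented example, not a design from the
paper), the sketch below assumes a hypothetical two-element pool of candidate utility functions,
a made-up environment model P1 in which "praise" tends to follow action "a1", and a made-up P2
that weights candidates by how much value they assign to the history so far.
    # A minimal sketch of the value-learning optimality notion; every symbol,
    # candidate utility function, and probability model here is an assumption.
    from itertools import product

    ACTIONS = ["a0", "a1"]
    OBSERVATIONS = ["praise", "silence"]

    # Hypothetical pool of candidate utility functions over interaction histories.
    UTILITY_POOL = {
        "count_praise": lambda h: float(h.count("praise")),
        "count_a1": lambda h: float(h.count("a1")),
    }

    def p1(action, future):
        # Toy P1(yx_<=m | yx_<k y_k): "praise" is likelier right after "a1".
        prob, prev = 1.0, action
        for symbol in future:
            if symbol in OBSERVATIONS:
                p_praise = 0.8 if prev == "a1" else 0.3
                prob *= p_praise if symbol == "praise" else 1.0 - p_praise
            else:
                prob *= 1.0 / len(ACTIONS)  # future actions: uniform, for simplicity
            prev = symbol
        return prob

    def p2(name, history):
        # Toy P2(U | yx_<=m): candidates that assign the history more value are
        # treated as likelier to be the "right" utility function.
        scores = {n: 1.0 + u(history) for n, u in UTILITY_POOL.items()}
        return scores[name] / sum(scores.values())

    def vl_action(history, extra_cycles=1):
        # arg max over y_k of the sum over futures and U of U(yx_<=m) P1(...) P2(U | yx_<=m).
        slots = [OBSERVATIONS] + [ACTIONS, OBSERVATIONS] * extra_cycles
        def expected_value(action):
            total = 0.0
            for future in product(*slots):
                full = history + [action] + list(future)
                total += p1(action, future) * sum(
                    u(full) * p2(name, full) for name, u in UTILITY_POOL.items()
                )
            return total
        return max(ACTIONS, key=expected_value)

    print(vl_action(history=["a1", "praise"]))  # prints "a1" in this toy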
The value learning framework can be used to express many different goals, depending on what
we consider evidence about the "right" utility function. Compared with observation-utility
maximization, it has the added benefit of allowing the agent, in some sense, to help us define
its goal.
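For example, one possible choice of P2 (an invented illustration, not a proposal from the paper)
treats explicit human feedback as the only evidence about the right utility function; a toy
Bayesian update of that kind is sketched below.
    # Two hypothetical candidate utility functions with equal prior weight.
    CANDIDATES = ["tidy_the_lab", "maximize_paperclips"]
    prior = {u: 0.5 for u in CANDIDATES}

    def likelihood(observation, candidate):
        # Invented evidence model: suppose the agent has just tidied the lab.
        # A human saying "good" is then likely if the right utility function
        # rewards tidying, and unlikely otherwise; all other observations are
        # treated as carrying no information about goals.
        if observation == "good":
            return 0.9 if candidate == "tidy_the_lab" else 0.1
        return 0.5

    def update(posterior, observation):
        # One Bayesian update of P2(U | yx_<=m) on a new observation.
        unnormalized = {u: posterior[u] * likelihood(observation, u) for u in posterior}
        total = sum(unnormalized.values())
        return {u: p / total for u, p in unnormalized.items()}

    posterior = prior
    for observation in ["good", "good", "hmm"]:
        posterior = update(posterior, observation)
    print(posterior)  # weight shifts strongly toward "tidy_the_lab"
Under a different P2, entirely different interactions would count as evidence about goals, which
is exactly the design question the framework raises.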
5. Conclusion
Marcus Hutter's introduction to AIXI offers a compelling statement of the goals of AGI as a
field:
"Most, if not all known facets of intelligence can be formulated as goal-driven or,
more precisely, as maximizing some utility function. It is, therefore, sufficient to
study goal-driven AI... The goal of AI systems should be to be useful to humans.
The problem is that, except for special cases, we know neither the utility function
nor the environment in which the agent will operate in advance."
Reinforcement learning, we have argued, is not appropriate for real-world AGI. Reinforcement
learners act to maximize future rewards, and are therefore fundamentally indifferent to
humans. Reinforcement learning does not solve the problem of maximizing an initially unknown
utility function in an initially unknown environment.
Value learning, on the other hand, is an example framework expressive enough to be used
in agents with goals other than reward maximization. This framework is not a full design for
a safe, ultraintelligent agent; at the very least, the design of probability distributions and model
pools for utility functions is crucial and non-trivial, and still better frameworks for ultraintelligent
agents likely exist. Value learners do not solve all problems of ultraintelligent agent design, but
do give a direction for future work on this topic.