Functions
Margaret M. Fleck
24 September 2010
These notes cover functions, especially the concepts of onto and oneto-one. This material is in section 2.3 of Rosen. We also see the fun, if
slightly dangerous, method of simplifying proofs by claiming something is
true “without loss of generality.”
1
Functions
We all know roughly what functions are, from high school and (if you’ve
taken it) calculus. You’ve mostly seen functions whose inputs and outputs
are numbers, defined by an algebraic formula such as f (x) = 2x + 3. We’re
going to generalize and formalize this idea, so we can talk about functions
with other sorts of input and output values.
Suppose that A and B are sets, then a function f from A to B (shorthand:
f : A → B) is an assignment of exactly one element of B (i.e. the output
value) to each element of A (i.e. the input value). A is called the domain of
f and B is called the co-domain.
For example, let’s define g : Z → Z by the formula g(x) = 2x. The
domain and co-domain of this function are both Z.
Notice that the domain and co-domain are part of the definition of the
function, just like the input/output type declarations for a function in a
programming language. Suppose we define h : N → N such that h(x) = 2x.
This is a different function from g because the declared domain and codomain are different.
Two functions are equal if they have the same domain, the same codomain, and assign the same output value to each input value.
The inputs and outputs to functions don’t have to be numbers and a
function doesn’t have to be defined by an algebraic formula. It’s sufficient
1
to describe a clear, explicit procedure for finding the output value, given the
input value. For example, suppose H is the set of herb’s in my garden. That
is, H = {sage, oregano, tarragon, thyme, chives}. And suppose A contains
the letters of the alphabet (ignoring case). We can define a function
s:H→A
where s(x) is the first letter in x’s name. For example s(Sage) = S.
For small finite sets like these, we can also just list all the input/output
pairs:
sage 7→ S
chives 7→ C
oregano 7→ O
tarragon 7→ T
thyme 7→ T
We also show this with a bubble diagram.
tarragon
T
S
thyme
O
chives
A
sage
B
C
oregano
Notice that we use 7→ when we show the output value for a a single input
value, but → to show the input and output sets for the whole function.
A function is also known as a map or mapping and we can say that some
input value x maps to the corresponding output value y.
2
2
What isn’t a function?
Much of the definition of a function is an association of output values with
input values. Suppose I give you an association p. When is p not a function?
One possibility is that p doesn’t provide an output value for every input
value. For example, suppose we defined p : R → R such that p(x) is the multiplicative inverse of x. That is, p(x) is the integer y such that xy = 1. This
isn’t a function because one input value (zero) doesn’t have a corresponding
output value.1
Notice that this depends critically on what we’ve declared the domain to
be. If we revised the type declaration for p to read p : R − {0} → R, then p
would be a function.
An association is also not a function if it assigns two different output values to the same input value. Suppose if I want to define the locations of top-5
CS departments. I.e. the domain is D = {MIT, CMU, UIUC, Stanford, Berkeley}
and the do-domain is the set of all cities C. I might define c as:
MIT 7→ Cambridge
CMU 7→ Pittsburgh
UIUC 7→ Urbana
UIUC 7→ Champaign
Berkeley 7→ Berkeley
Stanford 7→ Palo Alto
This isn’t a function because UIUC is mapped to two distinct output
values. When we need to have a function return multiple values, we need to
return them as sets. For example, we might define c : D → P(C) by:
MIT 7→ {Cambridge}
CMU 7→ {Pittsburgh}
UIUC 7→ {Urbana, Champaign}
1
Almost functions that don’t provide an output for every input are sometimes useful
and are known as partial functions. But we won’t use them in this class.
3
Berkeley 7→ {Berkeley}
Stanford 7→ {Palo Alto}
Since we’ve declared that our output values are sets, we have to make
them all sets. So we have to map (say) MIT to {Cambridge} rather than to
the bare string Cambridge.
Another √
example of this problem would be the function h : R+ → R such
that h(x) = x which maps each positive real onto its square root. We could
fix this function by stipulating that we mean the positive square root.
3
Images and Onto
Suppose we have a function f : A → B. If x is an element of A, then the
value f (x) is also known as the image of x. If S is a subset of A, then we can
(by a slight abuse of notation) write f (S) for the set of all values produced
when f is applied to elements of S. That is
f (S) = {f (x) | x ∈ S}
The image of f is the image of the whole domain A, written f (A) or
Im(f ). That is:
f (A) = Im(f ) = {f (x) : x ∈ A}
f is onto if its image is its whole co-domain. Or, equivalently,
∀y ∈ B, ∃x ∈ A, f (x) = y
For example, our function from herbs to first letters of names isn’t onto,
because its image is the set {C, O, S, T } which is nowhere near all the letters
of the alphabet.
Suppose we define f : Z → Z by f (x) = x + 2. This function is onto. If
we pick any integer y, let x be y − 2. Then f (x) = f (y − 2) = (y − 2) + 2 = y.
Now suppose we define g : N → N using the same formula g(x) = x + 2.
g isn’t onto, because none of the input values map onto 0 or 1.
Warning: whether a function is onto depends on how we’ve
declared its domain and co-domain. When we’re discussing these properties, it is absolutely critical to declare the input/output types for all the
functions you are using.
4
4
Why are some functions not onto?
In this class, most examples of non-onto functions look like cases where you
could have defined them to be onto, but the author just didn’t feel like
setting up the co-domain precisely. Or, when it comes to computer programs,
perhaps the compiler declarations don’t allow you to be sufficiently precise,
e.g. there’s no distinct type for non-negative floating point numbers.
In some applications, however, it’s critical that certain functions not be
onto. For example, in graphics or certain engineering applications, we may
wish to map out or draw a curve in 2D space. The whole point of the curve
is that it occupies only part of 2D and it is surrounded by whitespace. These
curves are often specified “parametrically,” using functions that map into,
but not onto, 2D.
For example, we can specify a (unit) circle as the image of a function
f : [0, 1] → R2 defined by f (x) = (cos 2πx, sin 2πx). If you think of the input
values as time, then f shows the track of a pen or robot as it goes around
the circle. The cosine and sine are the x and y coordinates of its position.
The 2π multiplier simply puts the input into the right range so that we’ll
sweep exactly once around the circle (assuming that sine and cosine take
their inputs in radians).
5
Negating onto
To understand the concept of onto, it may help to think about what it
means for a function not to be onto. This is our first example of negating a
statement involving two nested (different) quantifiers. Our definition of onto
is
∀y ∈ B, ∃x ∈ A, f (x) = y
So a function f is not onto if
¬∀y ∈ B, ∃x ∈ A, f (x) = y
To negate this, we proceed step-by-step, moving the negation inwards.
You’ve seen all the identities involved, so this is largely a matter of being
careful.
¬∀y ∈ B, ∃x ∈ A, f (x) = y
5
≡ ∃y ∈ B, ¬∃x ∈ A, f (x) = y
≡ ∃y ∈ B, ∀x ∈ A, ¬(f (x) = y)
≡ ∃y ∈ B, ∀x ∈ A, f (x) 6= y
So, if we want to show that f is not onto, we need to find some value y
in B, such that no matter which element x you pick from A, f (x) isn’t equal
to y.
6
One-to-one
A function is one-to-one if it never assigns two input values to the same
output value. That is
or, equivalently,
∀x, y ∈ A, x 6= y → f (x) 6= f (y)
∀x, y ∈ A, f (x) = f (y) → x = y
(These two versions are equivalent because they are the contrapositives of
one another.)
When reading this definition, notice that when you set up two variables
x and y, they don’t have to have different (math jargon: “distinct”) values.
In normal English, if you give different names to two objects, the listener is
expected to understand that they are different. By contrast, mathematicians
always mean you to understand that they might be different but there’s also
the possibility that they might be the same object.
For example, let g : Z → Z be defined by g(x) = 2x. g is one-to-one. But
remember our function s that mapped herbs to the first letter of their names.
It’s not one-to-one because tarragon and thyme share the same output value
T.
A classic example of a function that’s not one-to-one is the absolute value
function |x| from the reals to the reals. It’s not one-to-one because, for
example, 2 and -2 map to the same output value. Similarly, let g : R → Z
be defined by g(x) = ⌊x⌋. g is not one-to-one, because 2.1, 2.3, and many
other numbers all map to 2.
Like onto, one-to-one depends on what you’ve picked for the domain and
co-domain. For example, consider a function g ′ that is very like g but with
a smaller domain. Let g ′ : Z → Z be defined by g ′ (x) = ⌊x⌋. Then g ′ is
one-to-one.
6
7
Pre-images and inverses
Suppose that f : A → B is a function from A to B. If we pick a value x in
A, then the corresponding output value f (x) is called the image of x. If we
pick a value y ∈ B, then x ∈ A is a pre-image of y if f (x) = y. For example,
in our herb-name function s, chives is a pre-image of C.
Notice that I said a pre-image of y, not the pre-image of y. For a one-toone function, each element of the co-domain has exactly one pre-image. But
this isn’t true for functions that aren’t one-to-one. For example, if f is the
absolute value function from the reals to the reals, the output value 2 has
two pre-images: 2 and -2. In our herb-name example, X has no pre-images
and T has two (tarragon and thyme).
If a function f is both one-to-one and onto, then each output value has
exactly one pre-image. So we can invert f , to get an inverse function f −1 . A
function that is both one-to-one and onto is called a one-to-one correspondence. If f maps from A to B, then f −1 maps from B to A.
Notice that one-to-one and onto are like the two properties required to
be a function: each input gets at most one value, each input gets at least
one value). The conditions for being a one-to-one correspondence force these
conditions to hold in the backwards, as well as the forwards, direction.
8
Warning about variations in terminology
The terms “injective,” “surjective,” and “bijection” are fancy synonyms for
one-to-one, onto, and one-to-one correspondence. No more and no less. It’s
important to get used to both versions of each term, because individual
mathematicians often alternate, even within a single lecture.
It used to be that people used the term “range” to refer to the co-domain.
However, formal mathematics has standardized on the term “co-domain” for
the declared set of possible output values for the function. When the term
“range” is used, it is as a synonym for “image,” i.e. the actual output values
produced when you feed in all possible input values. (Rosen uses “range”
with this meaning.)
We’ll try to stick carefully to the newer convention in this class. Please
avoid the term range. But be aware that older authors and authors outside
math/CS may use the terms differently.
7
9
Proving that a function is one-to-one
Claim 1 Let f : Z → Z be defined by f (x) = 3x + 7. f is one-to-one.
Let’s prove this using our definition of one-to-one.
Proof: We need to show that for every integers x and y, f (x) =
f (y) → x = y.
So, let x and y be integers and suppose that f (x) = f (y). We
need to show that x = y.
We know that f (x) = f (y). So, substituting in our formula for f ,
3x + 7 = 3y + 7. So 3x = 3y and therefore x = y, by high school
algebra. This is what we needed to show.
When we pick x and y at the start of the proof, notice that we haven’t
specified whether they are the same number or not. Mathematical convention
leaves this vague, unlike normal English where the same statement would
strongly suggest that they were different.
You may have encountered the abbreviation QED at the end of a proof.
This stands for “quod erat demonstrandum” which is simply the Latin for
“this is what we needed to show.” Some people put a little box or a little
triangle of 3 dots at the end of the proof. It’s good style to have something
at the end of your proof which tells the reader that the proof is complete.
10
Proving that a function is onto
Now, consider this claim:
Claim 2 Define the function g from the integers to the integers by the formula g(x) = x − 8. g is onto.
Proof: We need to show that for every integer y, there is an
integer x such that g(x) = y.
So, let y be some arbitrary integer. Choose x to be (y + 8).
x is an integer, since it’s the sum of two integers. But then
g(x) = (y + 8) − 8 = y, so we’ve found the required pre-image for
y and our proof is done.
8
Notice that our function f from the last section wasn’t onto. Suppose we
tried to build a proof that it was.
Proof: We need to show that for every integer y, there is an
integer x such that f (x) = y.
So, let y be some arbitrary integer. Choose x to be
(y−7)
.
3
...
If f was a function from the reals to the reals, we’d be ok at this point,
because x would be a good pre-image for y. However, f ’s inputs are declared
isn’t an integer. So it won’t work
to be integers. For many values of y, (y−7)
3
as an input value for f .
11
A 2D example
Here’s a sample function whose domain is 2D. Let f : Z2 → Z be defined by
f (x, y) = x + y. I claim that f is onto.
First, let’s make sure we know how to read this definition. f : Z2 is
shorthand for Z × Z, which is the set of pairs of integers. So f maps a pair
of integers to a single integer, which is the sum of the two coordinates.
To prove that f is onto, we need to pick some arbitrary element y in the
co-domain. That is to say, y is an integer. Then we need to find a sample
value in the domain that maps onto y, i.e. a “preimage” of y. At this point,
it helps to fiddle around on our scratch paper, to come up with a suitable
preimage. In this case, (0, y) will work nicely. So our proof looks like:
Proof: Let y be an element of Z. Then (0, y) is an element of
f : Z2 and f (0, y) = 0 + y = y. Since this construction will work
for any choice of y, we’ve shown that f is onto.
12
Composing two functions
Suppose that f : A → B and g : B → C are functions. Then g ◦ f is the
function from A to C defined by (g ◦ f )(x) = g(f (x)). Depending on the
author, this is either called the composition of f and g or the composition of
g and f . The idea is that you take input values from A, run them through
f , and then run the result of that through g to get the final output value.
9
Take-home message: when using function composition, look at the author’s shorthand notation rather than their mathematical English, to be
clear on which function gets applied first.
In this definition, notice that g came first in (g ◦ f )(x) and g also comes
first in g(f (x)). I.e. unlike f (g(x)) where f comes first. The trick for
remembering this definition is to remember that f and g are in the same
order on the two sides of the defining equation.
For example, if we use our functions f and g defined above, the domains
and co-domains of both functions are the integers. So we can compose the
two functions in both orders.
(f ◦ g)(x) = f (g(x)) = 3g(x) + 7 = 3(x − 8) + 7 = 3x − 24 + 7 = 3x − 17
(g ◦ f )(x) = g(f (x)) = f (x) − 8 = (3x + 7) − 8 = 3x − 1
Notice that the order matters to the output value!
Frequently, the declared domains and co-domains of the two functions
aren’t all the same, so often you can only compose in one order. For example,
consider the function h : {strings} → Z which maps a string x onto its length
in characters. (E.g. h(Margaret) = 8.) Then f ◦ h exists but (h ◦ f ) doesn’t
exist because f produces numbers and the inputs to h are supposed to be
strings.
13
A proof involving composition
Consider this claim:
Claim 3 For any sets A, B, and C and for any functions f : A → B and
g : B → C, if f and g are injective, then g ◦ f is also injective.
We can prove this with a direct proof, by being systematic about using our
definitions and standard proof outlines. First, let’s pick some representative
objects of the right types and assume everything in our hypothesis.
Proof: Let A, B, and C be sets. Let f : A → B and g : B → C
be functions. Suppose that f and g are injective.
We need to show that g ◦ f is injective.
10
To show that g ◦ f is injective, we need to pick two elements x and y in
its domain, assume that their output values are equal, and then show that
x and y must themselves be equal. Let’s splice this into our draft proof.
Remember that the domain of g ◦ f is A and its co-domain is C.
Proof: Let A, B, and C be sets. Let f : A → B and g : B → C
be functions. Suppose that f and g are injective.
We need to show that g ◦ f is injective. So, choose x and y in A
and suppose that (g ◦ f )(x) = (g ◦ f )(y)
We need to show that x = y.
Now, we need to apply the definition of function composition and the fact
that f and g are each injective:
Proof: Let A, B, and C be sets. Let f : A → B and g : B → C
be functions. Suppose that f and g are injective.
We need to show that g ◦ f is injective. So, choose x and y in A
and suppose that (g ◦ f )(x) = (g ◦ f )(y)
Using the definition of function composition, we can rewrite this
as g(f (x)) = g(f (y)). Combining this with the fact that g is
injective, we find that f (x) = f (y). But, since f is injective, this
implies that x = y, which is what we needed to show.
14
Another proof involving composition
We can prove a similar fact for surjective (aka onto) functions:
Claim 4 For any sets A, B, and C and for any functions f : A → B and
g : B → C, if f and g are surjective, then g ◦ f is also surjective.
Proof: Let A, B, and C be sets. Let f : A → B and g : B → C
be functions. Suppose that f and g are surjective.
We need to show that g ◦ f is surjective. That is, we need to show
that for any element x in C, there is an element y in A such that
(g ◦ f )(y) = x.
11
So, pick some element x in C. Since g is surjective, there is an
element z in B such that g(z) = x. Since f is surjective, there is
an element y in A such that f (y) = z.
Substituting the value f (y) = z into the equation g(z) = x, we
get g(f (y)) = x. That is, (g ◦ f )(y) = x. So y is the element of
A we needed to find.
15
Proof by cases
Now, let’s do a slightly trickier proof. First, a definition. Suppose that A
and B are sets of real numbers (e.g. the reals, the rationals, the integers).
A function f : A → B is increasing if, for every x and y in A, x ≤ y implies
that f (x) ≤ f (y). f is called strictly increasing if, for every x and y in A,
x < y implies that f (x) < f (y).2
Claim 5 For any sets of real numbers A and B, if f is any strictly increasing
function from A to B, then f is one-to-one.
A similar fact applies to strictly decreasing functions.
To prove this, we will restate one-to-one using the alternative, contrapositive version of its definition.
∀x, y ∈ A, x 6= y → f (x) 6= f (y)
Normally, this wouldn’t be a helpful move. The hypothesis involves a
negative fact, whereas the other version of the definition has a positive hypothesis, normally a better place to start. But this is an atypical exam and,
in this case, the negative information turns out to be a good starting point.
Proof: Let A and B be sets of numbers and let f : A → B be a
strictly increasing function. Let x and y be distinct elements of
A. We need to show that f (x) 6= f (y).
Since x 6= y, there’s two possibilities.
Case 1: x < y. Since f is strictly increasing, this implies that
f (x) < f (y). So f (x) 6= f (y)
2
In math, “strictly” is often used to exclude the possibility of equality.
12
Case 2: y < x. Since f is strictly increasing, this implies that
f (y) < f (x). So f (x) 6= f (y).
In either case, we have that f (x) 6= f (y), which is what we needed
to show.
The phrase “distinct elements of A” is math jargon for x 6= y.
When we got partway into the proof, we had the fact x 6= y which isn’t
easy to work with. But the trichotomy axiom for real number states that for
any x and y, we have exactly three possibilities: x = y, x < y, or y < x. The
constraint that x 6= y eliminates one of these possibilities.
We then used a technique called “proof by cases”. In proof by cases, you
establish a list of several possibilities, one of which must be true. You then
try each possibility in turn, assuming it’s true and showing that your desired conclusion follows. If the conclusion follows no matter which possibility
you’ve chosen, it must be true in general. This technique isn’t hard. The
main point to be careful about is making sure your cases do actually cover
all the possibilities.
16
Without loss of generality
In this example, the proofs for the two cases are very, very similar. So we can
fold the two cases together. Here’s one approach, which I don’t recommend
doing in the early part of this course but which will serve you well later on:
Proof: Let A and B be sets of numbers and let f : A → B be a
strictly increasing function. Let x and y be distinct elements of
A. We need to show that f (x) 6= f (y).
Since x 6= y, there’s two possibilities.
Case 1: x < y. Since f is strictly increasing, this implies that
f (x) < f (y). So f (x) 6= f (y)
Case 2: y < x. Similar to case 1.
In either case, we have that f (x) 6= f (y), which is what we needed
to show.
This method only works if you, and your reader, both agree that it’s
obvious that the two cases are very similar and the proof will really be similar.
13
Dangerous assumption right now. And we’ve only saved a small amount of
writing, which isn’t worth the risk of losing points if the TA doesn’t think it
was obvious.
But this simplificiation is very useful in more complicated situations (e.g.
in CS 373) where you have may have lots of cases and the proofs really are
very similar and the proof for each case is long.
Here’s another way to simplify our proof:
Proof: Let A and B be sets of numbers and let f : A → B be a
strictly increasing function. Let x and y be distinct elements of
A. We need to show that f (x) 6= f (y).
We know that x 6= y, so either x < y or y < x. Without loss of
generality, assume that x < y.
Since f is strictly increasing, x < y implies that f (x) < f (y). So
f (x) 6= f (y), which is what we needed to show.
The phrase “without loss of generality” means that we are adding an
additional assumption to our proof but we claim it isn’t really adding any
more information. In this case, x and y are both arbitrary elements of A
and the condition f (x) 6= f (y) is symmetrical in its two inputs (i.e. it means
the same thing as f (y) 6= f (x). So we haven’t yet made any real distinction
in properties between x and y. It doesn’t actually matter which of the two
numbers is called x and which is called y. So if we happen to have chosen
our names in the wrong order so that y < x, we can just rename our two
numbers in the opposite way. And then we’ll have x < y.
Again, this is a potentially dangerous proof technique that you’ll probably
want to see a few times before you use it yourself. It’s actually used fairly
often to simplify proofs, so it has an abbreviation: WOLOG or WLOG.
14
© Copyright 2026 Paperzz