Investigate!
Consider the graph drawn below.
Find a subgraph with the smallest number of edges that is still connected and contains all the vertices.
Find a subgraph with the largest number of edges that doesn’t contain any cycles.
What do you notice about the number of edges in your examples above? Is this a coincidence?
One very useful and common approach to studying graph theory is to restrict your focus to graphs of a particular kind. For example, you could try to really understand just complete graphs or just bipartite graphs, instead of trying to understand all graphs in general. That is what we are going to do now, looking at
trees. Hopefully by the end of this section we will have a better understanding of this class of graph, and also understand why it is important enough to warrant its own section.
Definition of a Tree.
A
tree is a connected graph containing no cycles.
A
forest is a graph containing no cycles. Note that this means that a connected forest is a tree.
Does the definition above agree with your intuition for what graphs we should call trees? Try thinking of examples of trees and make sure they satisfy the definition. One thing to keep in mind is that while the trees we study in graph theory are related to trees you might see in other subjects, the correspondence is not exact. For instance, in anthropology, you might study family trees, like the one below,
So far so good, but while your grandparents are (probably) not blood-relatives, if we go back far enough, it is likely that they did have
some common ancestor. If you trace the tree back from you to that common ancestor, then down through your other grandparent, you would have a cycle, and thus the graph would not be a tree.
You might also have seen something called a
decision tree (such as the algorithm for deciding whether a series converges or diverges). Sometimes these too contain cycles, as the decision for one node might lead you back to a previous step.
Both the examples of trees above also have another feature worth mentioning: there is a clear order to the vertices in the tree. In general, there is no reason for a tree to have this added structure, although we can impose such a structure by considering
rooted trees, where we simply designate one vertex as the
root. We will consider such trees in more detail later in this section.
Subsection Properties of Trees
We wish to really understand trees. This means we should discover properties of trees; what makes them special and what is special about them.
A tree is a connected graph with no cycles. Is there anything else we can say? It would be nice to have other equivalent conditions for a graph to be a tree. That is, we would like to know whether there are any graph theoretic properties that all trees have, and perhaps even that
only trees have.
To get a feel for the sorts of things we can say, we will consider three
propositions about trees. These will also illustrate important proof techniques that apply to graphs in general, and happen to be a little easier for trees.
Our first proposition gives an alternate definition for a tree. That is, it gives necessary and sufficient conditions for a graph to be a tree.
Proposition 4.2.1.
A graph
is a tree if and only if between every pair of distinct vertices of
there is a unique path.
Proof.
This is an “if and only if” statement, so we must prove two implications. We start by proving that if
is a tree, then between every pair of distinct vertices there is a unique path.
Assume
is a tree, and let
and
be distinct vertices (if
only has one vertex, then the conclusion is satisfied automatically). We must show two things to show that there is a unique path between
and
that there is a path, and that there is not more than one path. The first of these is automatic, since
is a tree, it is connected, so there is a path between any pair of vertices.
To show the path is unique, we suppose there are two paths between
and
and get a contradiction. The two paths might start out the same, but since they are different, there is some first vertex
after which the two paths diverge. However, since the two paths both end at
there is some first vertex after
that they have in common, call it
Now consider the two paths from
to
Taken together, these form a cycle, which contradicts our assumption that
is a tree.
Now we consider the converse: if between every pair of distinct vertices of
there is a unique path, then
is a tree. So assume the hypothesis: between every pair of distinct vertices of
there is a unique path. To prove that
is a tree, we must show it is connected and contains no cycles.
The first half of this is easy:
is connected, because there is a path between every pair of vertices. To show that
has no cycles, we assume it does, for the sake of contradiction. Let
and
be two distinct vertices in a cycle of
Since we can get from
to
by going clockwise or counterclockwise around the cycle, there are two paths from
and
contradicting our assumption.
We have established both directions so we have completed the proof.
Read the proof above very carefully. Notice that both directions had two parts: the existence of paths, and the uniqueness of paths (which related to the fact that there were no cycles). In this case, these two parts were really separate. In fact, if we just considered graphs with no cycles (a forest), then we could still do the parts of the proof that explore the uniqueness of paths between vertices, even if there might not
exist paths between vertices.
This observation allows us to state the following
corollary:
Corollary 4.2.2.
A graph
is a forest if and only if between any pair of vertices in
there is at most one path.
We do not give a proof of the corollary (it is, after all, supposed to follow directly from the proposition) but for practice, you are asked to give a careful proof in the exercises. When you do so, try to use proof by contrapositive instead of proof by contradiction.
Our second proposition tells us that all trees have
leaves: vertices of degree one.
Proposition 4.2.3.
Any tree with at least two vertices has at least two vertices of degree one.
Proof.
We give a proof by contradiction. Let
be a tree with at least two vertices, and suppose, contrary to stipulation, that there are not two vertices of degree one.
Let
be a path in
of longest possible length. Let
and
be the endpoints of the path. Since
does not have two vertices of degree one, at least one of these must have degree two or higher. Say that it is
We know that
is adjacent to a vertex in the path
but now it must also be adjacent to another vertex, call it
Where is
It cannot be a vertex of
because if it was, there would be two distinct paths from
to
the edge between them, and the first part of
(up to
). But
also cannot be outside of
for if it was, there would be a path from
to
that was longer than
which has longest possible length.
This contradiction proves that there must be at least two vertices of degree one. In fact, we can say a little more:
and
must
both have degree one.
The proposition is quite useful when proving statements about trees, because we often prove statements about trees by
induction. To do so, we need to reduce a given tree to a smaller tree (so we can apply the inductive hypothesis). Getting rid of a vertex of degree one is an obvious choice, and now we know there is always one to get rid of.
To illustrate how induction is used on trees, we will consider the relationship between the number of vertices and number of edges in trees. Is there a tree with exactly 7 vertices and 7 edges? Try to draw one. Could a tree with 7 vertices have only 5 edges? There is a good reason that these seem impossible to draw.
Proposition 4.2.4.
Let
be a tree with
vertices and
edges. Then
Proof.
We will give a proof by induction on the number of vertices in the tree. That is, we will prove that every tree with
vertices has exactly
edges, and then use induction to show this is true for all
For the base case, consider all trees with
vertices. There is only one such tree: the graph with a single isolated vertex. This graph has
edges, so we see that
as needed.
Now for the inductive case, fix
and assume that all trees with
vertices have exactly
edges. Now consider an arbitrary tree
with
vertices. By
Proposition 4.2.3,
has a vertex
of degree one. Let
be the tree resulting from removing
from
(together with its incident edge). Since we removed a leaf,
is still a tree (the unique paths between pairs of vertices in
are the same as the unique paths between them in
).
Now
has
vertices, so by the inductive hypothesis, has
edges. What can we say about
Well, it has one more edge than
so it has
edges. But this is exactly what we wanted:
so indeed
This completes the inductive case, and the proof.
There is a very important feature of this induction proof that is worth noting. Induction makes sense for proofs about graphs because we can think of graphs as growing into larger graphs. However, this does NOT work. It would not be correct to start with a tree with
vertices, and then add a new vertex and edge to get a tree with
vertices, and note that the number of edges also grew by one. Why is this bad? Because how do you know that
every tree with
vertices is the result of adding a vertex to your arbitrary starting tree? You don’t!
The point is that whenever you give an induction proof that a statement about graphs that holds for all graphs with
vertices, you must start with an arbitrary graph with
vertices, then
reduce that graph to a graph with
vertices, to which you can apply your inductive hypothesis.
Subsection Rooted Trees
So far, we have thought of trees only as a particular kind of graph. However, it is often useful to add additional structure to trees to help solve problems. Data is often structured like a tree. This book, for example, has a tree structure: draw a vertex for the book itself. Then draw vertices for each chapter, connected to the book vertex. Under each chapter, draw a vertex for each section, connecting it to the chapter it belongs to. The graph will not have any cycles; it will be a tree. But a tree with clear hierarchy which is not present if we don’t identify the book vertex as the “top”.
As soon as one vertex of a tree is designated as the
root, then every other vertex on the tree can be characterized by its position relative to the root. This works because there is a unique path between any two vertices in a tree. So from any vertex, we can travel back to the root in exactly one way. This also allows us to describe how distinct vertices in a rooted tree are related.
If two vertices are adjacent, then we say one of them is the
parent of the other, which is called the
child of the parent. Of the two, the parent is the vertex that is closer to the root. Thus the root of a tree is a parent, but is not the child of any vertex (and is unique in this respect: all non-root vertices have
exactly one parent).
Not surprisingly, the child of a child of a vertex is called the
grandchild of the vertex (and it is the
grandparent). More in general, we say that a vertex
is a
descendent of a vertex
provided
is a vertex on the path from
to the root. Then we would call
an
ancestor of
For most trees (in fact, all except paths with one end the root), there will be pairs of vertices neither of which is a descendant of the other. We might call these cousins or siblings. In fact, vertices
and
are called
siblings provided they have the same parent. Note that siblings are never adjacent (do you see why?).
Example 4.2.5.
If we designate vertex
as the root, then
and
are the children of
and are siblings of each other. Among the other things we cay say are that
is a child of
and a descendant of
The vertex
is a descendant of
in fact, is a grandchild of
Vertices
and
are siblings, since they have the common parent
Notice how this changes if we pick a different vertex for the root. If
is the root, then its lone child is
which also has only one child, namely
We would then have
the child of
(instead of the other way around), and
is the descendant of
instead of the ancestor.
and
are now siblings.
All of this flowery language helps us describe how to
navigate through a tree. Traversing a tree, visiting each vertex in some order, is a key step in many algorithms. Even if the tree is not rooted, we can always form a rooted tree by picking any vertex as the root. Here is an example of why doing so can be helpful.
Example 4.2.6.
Explain why every tree is a bipartite graph.
Solution.
To show that a graph is bipartite, we must divide the vertices into two sets and so that no two vertices in the same set are adjacent. Here is an algorithm that does just this.
Designate any vertex as the root. Put this vertex in set Now put all of the children of the root in set None of these children are adjacent (they are siblings), so we are good so far. Now put into every child of every vertex in (i.e., every grandchild of the root). Keep going until all vertices have been assigned one of the sets, alternating between and every “generation.” That is, a vertex is in set if and only if it is the child of a vertex in set
The key to how we partitioned the tree in the example was to know which vertex to assign to a set next. We chose to visit all vertices in the same generation before any vertices of the next generation. This is usually called a
breadth first search (we say “search” because you often traverse a tree looking for vertices with certain properties).
In contrast, we could also have partitioned the tree in a different order. Start with the root, put it in
Then look for one child of the root to put in
Then find a child of that vertex, into
and then find its child, into
and so on. When you get to a vertex with no children, retreat to its parent and see if the parent has any other children. So we travel as far from the root as fast as possible, then backtrack until we can move forward again. This is called
depth first search.
These algorithmic explanations can serve as a proof that every tree is bipartite, although care needs to be spent to prove that the algorithms are
correct. Another approach to prove that all trees are bipartite, using induction, is requested in the exercises.
Subsection Spanning Trees
One of the advantages of trees is that they give us a few simple ways to travel through the vertices. If a connected graph is not a tree, then we can still use these traversal algorithms if we identify a subgraph that
is a tree.
First we should consider if this even makes sense. Given any connected graph
will there always be a subgraph that is a tree? Well, that is actually too easy: you could just take a single vertex of
If we want to use this subgraph to tell us how to visit all vertices, then we want our subgraph to include all of the vertices. We call such a tree a
spanning tree. It turns out that every connected graph has one (and usually many).
Spanning tree.
Given a connected graph
a
spanning tree of
is a subgraph of
which is a tree and includes all the vertices of
Every connected graph has a spanning tree.
How do we know? We can give an algorithm for
finding a spanning tree! Start with a connected graph
If there is no cycle, then
is already a tree and we are done. If there is a cycle, let
be any edge in that cycle and consider the new graph
(i.e., the graph you get by deleting
). This tree is still connected since
belonged to a cycle, there were at least two paths between its incident vertices. Now repeat: if
has no cycles, we are done, otherwise define
to be
where
is an edge in a cycle in
Keep going. This process must eventually stop, since there are only a finite number of edges to remove. The result will be a tree, and since we never removed any vertex, a
spanning tree.
This is by no means the only algorithm for finding a spanning tree. You could have started with the empty graph and added edges that belong to
as long as adding them would not create a cycle. You have some choices as to which edges you add first: you could always add an edge adjacent to edges you have already added (after the first one, of course), or add them using some other order. Which spanning tree you end up with depends on these choices.
Example 4.2.7.
Find two different spanning trees of the graph,
Solution.
Here are two spanning trees.
Although we will not consider this in detail, these algorithms are usually applied to
weighted graphs. Here every edge has some weight or cost assigned to it. The goal is to find a spanning tree that has the smallest possible combined weight. Such a tree is called a
minimum spanning tree. Finding the minimum spanning tree uses basically the same algorithms as we described above, but when picking an edge to add, you always pick the smallest (or when removing an edge, you always remove the largest).