Search, Sorting and Big

4 Lists
A list is a sequence of values of a certain type. It may have properties like being
sorted/unsorted, having duplicates and so on. Lists can typically have the following
operations:
• initialization
• add an item to the list
• delete an item from the list
• search
• sort
• print
• etc.
An implemented list may have only some subset of the above functionalities. A list is
very general and can be implemented many ways. In your 344 notes there is an
implementation of a linked list.
Lists can be implemented either sequentially (with arrays or vectors) or with a linked list
structure. If you use a fixed array you will need to state an initial size. Items are stored
in memory consecutively and you have direct access to any particular item through
its index. When sorted, the list can be searched using binary search. Making
the list grow can be expensive, and space is often wasted because a large amount of space
may be allocated but not used.
[Figure: amount of memory needed versus number of items in the list, for an array/vector
and for a linked list. Labels on the figure: initial capacity of list; capacity after
growing array.]
A linked structure is very easy to grow and shrink. Data is not stored in consecutive
memory locations, so a large block of contiguous memory is not required even for storing
large amounts of data. Some space is used up by the pointers that link the pieces of
data together. However, this overhead is proportional to the number of items actually in
the list. A linked list cannot be searched using binary search because direct access to
nodes is not available.
4.1 Linked Lists
The linked list data structure is made up of multiple nodes. Each node is made up of two
portions, a data portion and a pointer portion. The data portion contains one instance of
the data to be stored, while the pointer portion indicates the location of another node.
4.1.1 Example:
An array of 5 doubles:
6.3   3.2   5.4   7.8   1.5
A linked list storing the same data as the above array:
start -> 6.3 -> 3.2 -> 5.4 -> 7.8 -> 1.5
Each linked list must have a pointer to some node in the list. Any other node in the list is
reached by following the pointers from one node to the next.
Like an array, a linked list stores data of the SAME data type. The data type of a linked
list will determine how each node is declared.
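Reaching a node by following the pointers can be sketched as a simple loop. The helper names sumList and countNodes below are made up for this illustration, and the sketch assumes the last node's next_ pointer is NULL.

```cpp
#include <cstddef>

// Node for a list of doubles: one data portion, one pointer portion.
struct Node {
    double data_;   // the data portion of the node
    Node*  next_;   // a pointer to the next node (NULL at the end of the list)
};

// Hypothetical helper: add up every value by walking the next_ pointers.
double sumList(const Node* start) {
    double total = 0.0;
    for (const Node* p = start; p != NULL; p = p->next_) {
        total += p->data_;
    }
    return total;
}

// Hypothetical helper: count the nodes that can be reached from start.
int countNodes(const Node* start) {
    int count = 0;
    for (const Node* p = start; p != NULL; p = p->next_) {
        count++;
    }
    return count;
}
```

Both functions visit the nodes in order from the first one; there is no way to jump straight to the third node as there would be with an array index.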
4.1.2 Example:
If we want to store some doubles our node declaration would be:
struct Node{
    double data_;   // the data portion of the node
    Node* next_;    // a pointer to the next node
};
To create linked lists that hold other data types, the data portion of a node would need to
be a different data type. You could have a linked list that holds a user defined data
type.
4.1.3 Example
class Hamster{
    char name_[50];
    int age_;
public:
    ....
};
//Each node holds one instance of Hamster.
struct Node{
    Hamster data_;
    Node* next_;
};
Your linked lists can also be template classes.
template <class TYPE>
struct Node{
TYPE data_;
Node<TYPE>* next_;
};
If you create a template node, you will also need to create a template list.
Even though we have used a struct to define a Node, it is also possible to use a class. If
you use a class to define a Node, you will need to write functions to access the data
members. Another method of handling a Node class is by using friends. You could
make the list a friend of your Node class so that access to data is restricted to the list.
Every list MUST have a pointer to a node in the list (usually the beginning). This pointer
should point to a node that will allow access to all the other nodes. It may also be useful
to have an internal pointer that could be used to manipulate the list.
A linked list could be defined as the following. (Note that there are other ways to
encapsulate the idea of linked list. This is just one way.):
class Llist{
Node* start_;
Node* curr_;
public:
....
};
The public member functions for a linked list should then provide methods for accessing
and manipulating the data stored in the linked list.
In the following discussion we will assume we are writing a linked list of doubles.
4.1.4 Typical functions for accessing specific nodes/data:
Node* start(); //returns a pointer to the first node
Node* curr(); //returns a pointer to the current node
Node* next(); //returns a pointer to the node after the
//current node. If curr is NULL return NULL
Node* prev(); //returns a pointer to the node before the
//current node. If curr is NULL return NULL
int data(double& dat); //if the curr is not NULL, pass the
//data at the node back through dat
4.1.5 Typical functions for manipulating the linked list.
int goprev(); /*make curr_ point at the previous node if
possible. If successful return true, otherwise
return false*/
int gonext(); /*make curr_ point at the next node if possible.
If successful return true, otherwise return
false*/
void gostart(); /*makes the first node the current node*/
void goend(); /*makes the last node the current node*/
int InsertAfter(double newdata);
/*Insert a node after the current node containing newdata as
the data for the node. curr_ should point at the newly added
node. Return true for success, false for failure*/
int InsertBefore(double newdata);
/*Insert a node before the current node containing newdata as
the data for the node. curr_ should point at the newly added
node. Return true for success, false for failure*/
int Remove();
/*Removes the node pointed to by curr_. If the node removed
was the last node in the list make curr_ point at the new last
node. Otherwise, curr_ should point at the node after the
one that was just removed. Returns true if successful, false
if not*/
4.1.6 Other Functions:
The list also needs to be properly initialized. When the list goes out of scope, resources
must be freed up. Therefore a constructor and a destructor are also needed.
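A minimal sketch of such a constructor and destructor follows. It assumes the Node struct and the start_/curr_ members used in these notes; addFront and isEmpty are hypothetical helpers added only so the sketch can be exercised, and copying is simply disabled rather than implemented.

```cpp
#include <cstddef>

struct Node {
    double data_;
    Node*  next_;
};

class Llist {
    Node* start_;
    Node* curr_;
    // Copying is not supported in this sketch (declared but never defined).
    Llist(const Llist&);
    Llist& operator=(const Llist&);
public:
    // Constructor: the list starts off empty.
    Llist() : start_(NULL), curr_(NULL) {}

    // Destructor: free every node that is still in the list.
    ~Llist() {
        while (start_ != NULL) {
            Node* rem = start_;
            start_ = start_->next_;   // move on BEFORE deleting the node
            delete rem;
        }
        curr_ = NULL;
    }

    // Hypothetical helper: put a new node at the front of the list.
    void addFront(double newdata) {
        Node* temp = new Node;
        temp->data_ = newdata;
        temp->next_ = start_;
        start_ = temp;
        curr_ = temp;
    }

    // Hypothetical helper: start_ being NULL means the list is empty.
    bool isEmpty() const { return start_ == NULL; }
};
```

The destructor saves the pointer to the next node before calling delete, because the contents of a node must not be read once it has been deallocated.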
4.1.7 Illustration of how a linked list works:
When writing the code for a linked list, you must be very careful about the order in which
you do each step of a function. It is important that you don't lose bits of the list as you try
to add and remove nodes. The following shows pictorially how insertion and removal
could be done. When you write your own linked list, pictures will help.
Note: In each of the following illustrations a code segment will be used to show the code
needed to obtain the desired result. IT IS NOT NECESSARILY THE FULL CODE TO
A FUNCTION.
Initialize the linked list
Typically a linked list starts off empty. start_ and curr_ are both NULL. The fact that
start_ is NULL means that the list is empty.
[Figure: start_ and curr_ are both NULL]
Code segment:
start_=NULL;
curr_=NULL;
Make the function call: InsertAfter(3.5);
Since the list is empty, all we need to do is create a new node and make start_ and
curr_ point to it. In the new node, the next_ pointer should be NULL as there is no next
node:
Step 1:
[Figure: temp, start_ and curr_ with a new node holding 3.5]
Code segment:
Node* temp=new Node;
temp->data_=newdata;
temp->next_=NULL;
start_=temp;
curr_=temp;
Step 2:
[Figure: temp, start_ and curr_ with the node holding 3.5]
Step 3:
[Figure: temp, start_ and curr_ with the node holding 3.5]
Note: temp is a local variable and thus it will go out of scope. However, temp is just a
pointer. The pointer will go out of scope but the node with 3.5 will NOT go out of scope.
Make the function call: InsertAfter(6.3);
In order to add to the list, we must create the node, make the next_ pointer of the node
with 3.5 point to the newly added node, and make curr_ point to the new node. Note that
start_ doesn't change. Also, we must do the steps in order or we will lose the nodes involved:
Initial:
[Figure: start_ and curr_ with the node holding 3.5]
Code segment:
Node* temp=new Node;     //step 1
temp->data_=newdata;     //step 1
temp->next_=NULL;        //step 1
curr_->next_=temp;       //step 2
curr_=temp;              //step 3
Step 1: create a new node
[Figure: temp, start_ and curr_ with the nodes holding 3.5 and 6.3]
Step 2: make next pointer of 3.5 node point to new node
[Figure: temp, start_ and curr_ with the nodes holding 3.5 and 6.3]
Step 3: make curr_ point to new node
[Figure: temp, start_ and curr_ with the nodes holding 3.5 and 6.3]
Make function call InsertBefore(4.2);
Step 1: create temporary pointer to point at node before curr_
[Figure: pr, start_ and curr_ with the nodes holding 3.5 and 6.3]
Code segment:
Node* pr=prev();         //step 1
Node* temp=new Node;     //step 2
temp->data_=newdata;     //step 2
temp->next_=curr_;       //step 2
pr->next_=temp;          //step 3
curr_=temp;              //step 4
Step 2: create a new node
[Figure: pr, temp, start_ and curr_ with the nodes holding 3.5, 4.2 and 6.3]
Step 3: Link up properly so 3.5's next_ points to new node
[Figure: pr, temp, start_ and curr_ with the nodes holding 3.5, 4.2 and 6.3]
Step 4: point curr_ at new node.
[Figure: pr, temp, start_ and curr_ with the nodes holding 3.5, 4.2 and 6.3]
Make function call InsertAfter(8.1);
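Step 1: Create and initialize a new node
[Figure: temp points at a new node holding 8.1; the list is 3.5, 4.2, 6.3 with start_ and curr_]
Step 2: link new node into list
[Figure: temp, start_ and curr_ with the nodes holding 3.5, 4.2, 8.1 and 6.3]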
Step 3: point curr_ to new node.
[Figure: temp, start_ and curr_ with the nodes holding 3.5, 4.2, 8.1 and 6.3]
Code segment:
Node* temp=new Node;          //step 1
temp->data_=newdata;          //step 1
temp->next_=curr_->next_;     //step 1
curr_->next_=temp;            //step 2
curr_=temp;                   //step 3
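The InsertAfter code segments shown so far each cover one situation (empty list, or inserting after an existing node). One possible way to combine them is sketched below. To keep the sketch short it is written as a free function called insertAfter whose parameters start and curr stand in for the members start_ and curr_; that signature, the use of std::nothrow, and the choice to fail when the list is not empty but there is no current node are all assumptions of this sketch.

```cpp
#include <cstddef>
#include <new>

struct Node {
    double data_;
    Node*  next_;
};

// Inserts newdata after curr and makes curr point at the new node.
// Returns true for success, false for failure.
bool insertAfter(Node*& start, Node*& curr, double newdata) {
    if (start != NULL && curr == NULL) {
        return false;                    // nothing to insert after
    }
    Node* temp = new (std::nothrow) Node;
    if (temp == NULL) {
        return false;                    // allocation failed
    }
    temp->data_ = newdata;
    if (start == NULL) {                 // empty list: new node is the only node
        temp->next_ = NULL;
        start = temp;
    } else {                             // link the new node in after curr
        temp->next_ = curr->next_;       // must happen before curr->next_ changes
        curr->next_ = temp;
    }
    curr = temp;
    return true;
}
```

Setting temp->next_ before overwriting curr->next_ is what keeps the rest of the list from being lost.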
Make function call gonext();
Code segment:
curr_=curr_->next_;
[Figure: start_ and curr_ with the list 3.5, 4.2, 8.1, 6.3]
Make function call to goprev();
Step 1: After ensuring curr_!=start_ use a temporary variable to point at first node
[Figure: temp, start_ and curr_ with the list 3.5, 4.2, 8.1, 6.3]
Step 2: If temp->next_==curr_ it means that temp points at the previous node. If it
doesn't, advance temp to point at temp->next_. Repeat until the right node is reached.
[Figure: temp, start_ and curr_ with the list 3.5, 4.2, 8.1, 6.3]
Step 3: make curr_ point at same node as temp
[Figure: temp, start_ and curr_ with the list 3.5, 4.2, 8.1, 6.3]
Code segment:
if(curr_!=start_){
    Node* temp=start_;                            //step 1
    while(temp->next_!=curr_) temp=temp->next_;   //step 2
    curr_=temp;                                   //step 3
}
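The search for the previous node is also what prev() has to do, so the two can share code. The sketch below shows one way this might look, again as free functions (prevOf and goPrev are hypothetical names) taking start and curr in place of the members start_ and curr_. Unlike the code segment, it also guards against curr being NULL or not being in the list.

```cpp
#include <cstddef>

struct Node {
    double data_;
    Node*  next_;
};

// Returns the node before curr, or NULL if curr is NULL, curr is the
// first node, or curr cannot be found by walking from start.
Node* prevOf(Node* start, Node* curr) {
    if (curr == NULL || curr == start) {
        return NULL;
    }
    Node* temp = start;
    while (temp != NULL && temp->next_ != curr) {
        temp = temp->next_;
    }
    return temp;
}

// Moves curr to the previous node if possible. Returns true on success.
bool goPrev(Node* start, Node*& curr) {
    Node* pr = prevOf(start, curr);
    if (pr == NULL) {
        return false;
    }
    curr = pr;
    return true;
}
```

In the worst case the loop walks the whole list, so moving backwards in a singly linked list takes time proportional to the number of nodes.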
Make function call: gostart();
Code segment:
curr_=start_;
[Figure: start_ and curr_ with the list 3.5, 4.2, 8.1, 6.3]
Make function call: InsertBefore(5.2);
Step 1: Create and initialize a new node
Code segment:
Node* temp=new Node;     //step 1
temp->data_=newdata;     //step 1
temp->next_=curr_;       //step 1
start_=temp;             //step 2
curr_=temp;              //step 3
[Figure: temp points at a new node holding 5.2; start_ and curr_ with the list 3.5, 4.2, 8.1, 6.3]
Step 2: Make start_ point at temp
[Figure: temp, start_ and curr_ with the list 5.2, 3.5, 4.2, 8.1, 6.3]
Step 3: Make curr_ point at temp.
[Figure: temp, start_ and curr_ with the list 5.2, 3.5, 4.2, 8.1, 6.3]
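The two InsertBefore walkthroughs (inserting in the middle, and inserting in front of the first node) can be folded into one function. The following is only a sketch under the same assumptions as before: a free function named insertBefore, with start and curr standing in for start_ and curr_, and std::nothrow used so that a failed allocation can be reported by the return value.

```cpp
#include <cstddef>
#include <new>

struct Node {
    double data_;
    Node*  next_;
};

// Inserts newdata before curr and makes curr point at the new node.
// Returns true for success, false for failure.
bool insertBefore(Node*& start, Node*& curr, double newdata) {
    if (start != NULL && curr == NULL) {
        return false;                      // nothing to insert before
    }
    Node* pr = NULL;                       // node before curr, if any
    if (curr != start) {
        pr = start;
        while (pr != NULL && pr->next_ != curr) {
            pr = pr->next_;
        }
        if (pr == NULL) {
            return false;                  // curr is not in this list
        }
    }
    Node* temp = new (std::nothrow) Node;
    if (temp == NULL) {
        return false;                      // allocation failed
    }
    temp->data_ = newdata;
    temp->next_ = curr;                    // NULL when the list was empty
    if (pr == NULL) {
        start = temp;                      // new node becomes the first node
    } else {
        pr->next_ = temp;                  // link previous node to new node
    }
    curr = temp;
    return true;
}
```

Only the insert at the front changes start; every other case changes the next_ pointer of the node before curr instead.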
Make function call: gonext();
Code segment:
curr_=curr_->next_;
[Figure: start_ and curr_ with the list 5.2, 3.5, 4.2, 8.1, 6.3]
Make function call: Remove();
When removing a node it is very important that you don't lose the list. The order is very
important here.
Step 1: Use two temporary pointers to point at the nodes before and after the one we want
to remove. A pointer to the node we want to remove is also useful.
[Figure: pr, rem, nx, start_ and curr_ with the list 5.2, 3.5, 4.2, 8.1, 6.3]
Step 2: point the next_ pointer of the previous node and curr_ to the node after the one we
want to remove
[Figure: pr, rem, nx, start_ and curr_ with the list 5.2, 3.5, 4.2, 8.1, 6.3]
Step 3: Deallocate the node to be removed. Note that rem still holds the same address but
the node it points at is no longer valid.
[Figure: pr, rem, nx, start_ and curr_ with the list 5.2, 3.5, 4.2, 8.1, 6.3]
Code segment:
Node* pr=prev();       //step 1
Node* nx=next();       //step 1
Node* rem=curr();      //step 1
pr->next_=curr_=nx;    //step 2
delete rem;            //step 3
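The code segment for removal covers a node in the middle of the list. A fuller version also has to cope with removing the first node, the last node, and the only node. One way to handle these is sketched below as a free function (removeCurr is a hypothetical name; start and curr stand in for start_ and curr_). It follows the rule stated for Remove(): after removing the last node, curr moves to the new last node, otherwise to the node after the removed one.

```cpp
#include <cstddef>

struct Node {
    double data_;
    Node*  next_;
};

// Removes the node curr points at. Returns true if successful.
bool removeCurr(Node*& start, Node*& curr) {
    if (curr == NULL) {
        return false;                     // no current node to remove
    }
    Node* rem = curr;                     // node to remove
    Node* nx = rem->next_;                // node after it (may be NULL)
    Node* pr = NULL;                      // node before it (NULL if rem is first)
    if (rem != start) {
        pr = start;
        while (pr != NULL && pr->next_ != rem) {
            pr = pr->next_;
        }
        if (pr == NULL) {
            return false;                 // curr is not in this list
        }
    }
    if (pr == NULL) {
        start = nx;                       // removing the first node
    } else {
        pr->next_ = nx;                   // unlink rem from the list
    }
    curr = (nx != NULL) ? nx : pr;        // pr is the new last node if nx is NULL
    delete rem;                           // only now is it safe to deallocate
    return true;
}
```

When the only node is removed, both nx and pr are NULL, so start and curr both end up NULL and the list is empty again.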
4.2 Doubly Linked List
The previous linked list has a forward link only. Thus, at any point we can find out what
the next node is with relative ease, but to find the previous node you would need to start
at the beginning and search for a node whose next pointer has the same value as curr_.
One improvement that you could make to your list is to create a doubly linked list. A
doubly linked list is a linked list where every node has both a forward and a backward
pointer (one points to the next node, one points to the previous node).
4.2.1 The advantages of a doubly linked list
• No need to search entire list to find previous pointer
• Can move/search in both directions on list
• Can access entire list from any point
4.2.2 The disadvantages of a doubly linked list
• More memory is needed to store back pointer
• Requires more work to set up back links properly
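Both the extra memory and the extra linking work can be seen in a small sketch. The struct name DNode, the member name prev_ and the function insertAfter below are assumptions made for this illustration; the notes do not give a declaration for a doubly linked node.

```cpp
#include <cstddef>

// A node with a forward pointer and a backward pointer.
struct DNode {
    double data_;
    DNode* next_;   // points to the next node (NULL at the end)
    DNode* prev_;   // points to the previous node (NULL at the start)
};

// Inserts newdata after the node curr (which must not be NULL) and
// returns a pointer to the new node.
DNode* insertAfter(DNode* curr, double newdata) {
    DNode* temp = new DNode;
    temp->data_ = newdata;
    temp->prev_ = curr;                  // back link of the new node
    temp->next_ = curr->next_;           // forward link of the new node
    if (curr->next_ != NULL) {
        curr->next_->prev_ = temp;       // back link of the node that follows
    }
    curr->next_ = temp;                  // forward link of the node before
    return temp;
}
```

Four pointers are set here instead of two, but finding the previous node afterwards is just a matter of reading prev_, with no search from the start of the list.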
4.2.3 Picture of a doubly linked list:
[Figure: doubly linked list 5.2, 3.5, 4.2, 8.1, 6.3 with start_ and curr_]
4.3 Circular Linked list
Another method of implementing a linked list involves using a circular form so that the
next_ pointer of the last node points back to the first node.
4.3.1 Advantages of a circular linked list
• Some problems are circular and a circular data structure would be more natural
  when used to represent them
• The entire list can be traversed starting from any node (traverse means visit every
  node just once)
• Fewer special cases when coding (every node has a node before and after it)
4.3.2 Disadvantages of a circular linked list
• Depending on implementation, inserting at start of list would require doing a
  search for the last node which could be expensive.
• Finding end of list and loop control is harder (no NULLs to mark beginning and
  end)
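4.3.3 Picture of a singly linked circular linked list
[Figure: singly linked circular list 5.2, 3.5, 4.2, 8.1, 6.3 with start_ and curr_]
4.3.4 Picture of a doubly linked circular linked list
[Figure: doubly linked circular list 5.2, 3.5, 4.2, 8.1, 6.3 with start_ and curr_]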
4.3.5 Implementational improvement
With a non-circular linked list, we typically have a pointer to the first item. However,
with a circular linked list (especially a singly linked one) this implementation may not be
a good idea. The reason for this is that if we point to the start of the list and we want to
add/remove an item to the front, we would need to go through the entire list in order to
find the last node so that we could keep the linked list hooked up properly.
One thing we could do is add another pointer to the list called last_ which points to the
last node in the list. However, this means that our object will have another pointer to
worry about setting properly.
Another method of implementation is to forget about the start_ pointer entirely and just
have a last_ pointer. The reason is that if we point to just the last node, it is
very easy to find the first one (remember that the first node is last_->next_).
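A sketch of the last_-only idea for a singly linked circular list is given below. The free functions addFront, addBack and countNodes are hypothetical names, and the parameter last stands in for the member last_; an empty list is represented by last being NULL.

```cpp
#include <cstddef>

struct Node {
    double data_;
    Node*  next_;
};

// Adds a node at the front. No search for the last node is needed
// because last->next_ IS the first node.
void addFront(Node*& last, double newdata) {
    Node* temp = new Node;
    temp->data_ = newdata;
    if (last == NULL) {          // empty list: the node points at itself
        temp->next_ = temp;
        last = temp;
    } else {
        temp->next_ = last->next_;
        last->next_ = temp;
    }
}

// Adds a node at the back: insert at the front, then move last forward
// one node so that the new node becomes the last node.
void addBack(Node*& last, double newdata) {
    addFront(last, newdata);
    last = last->next_;
}

// Visits every node once. There is no NULL to stop at, so the loop
// stops when it comes back around to the first node.
int countNodes(const Node* last) {
    if (last == NULL) {
        return 0;
    }
    int count = 0;
    const Node* first = last->next_;
    const Node* p = first;
    do {
        count++;
        p = p->next_;
    } while (p != first);
    return count;
}
```

Both inserts take the same small number of steps regardless of how long the list is, which is the point of keeping last rather than start.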
5 The stack
A stack is a kind of list where items are always added to the front and removed from the
front. Thus, a stack is a FILO (first in, last out) structure. A stack can be thought of as
a structure that resembles a stack of trays. Each time a new piece of data is added to the
stack, it is placed on the top. Each time a piece of data is removed it also must be removed
from the top. Typically only the top item is visible. You cannot remove something from the
middle.
5.1 Operations on a stack
• push - add a new item to the stack (remember always add to front or top)
• pop - removes first item from the stack
• initialize - create an empty stack
• empty - tests for whether or not stack is empty
• full - tests to see if stack is full (not needed if data structure grows automatically)
• top - looks at value of the top item but does not remove it
5.2 Stack based algorithms:
• Bracket checking
  o int bracketcheck(char expr[]); returns true if expr is a string where (),
    {} and [] brackets are properly matched, false if not.
• Postfix expression calculator
  o The way we write expressions uses infix notation. In other words, all
    operations look like A operator B (operation is "in" the middle of the
    expression). In order to change the order of operations, we must use ().
    Order of operations also matters.
  o Another way to write expressions is to use postfix notation. All
    operations look like A B operator
  o The advantage of postfix expressions is that brackets are not needed and
    order of operations rules are not needed.
  o Example: infix (1+2)-3*(4+5), equivalent postfix: 1 2 + 3 4 5 + * -
  o Some calculators actually use postfix notation for entry.
• Infix to postfix converter
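The first two algorithms in this list can be sketched with a stack as follows. This sketch uses std::stack from the standard library rather than a stack class from these notes. bracketcheck takes a const char array and returns bool, which differs slightly from the signature above, and evalPostfix is a hypothetical function that assumes tokens are separated by spaces and that every operand is a valid integer.

```cpp
#include <cstdlib>
#include <sstream>
#include <stack>
#include <string>

// Returns true if the (), {} and [] brackets in expr are properly matched.
bool bracketcheck(const char expr[]) {
    std::stack<char> open;                   // opening brackets seen so far
    for (int i = 0; expr[i] != '\0'; i++) {
        char c = expr[i];
        if (c == '(' || c == '{' || c == '[') {
            open.push(c);
        } else if (c == ')' || c == '}' || c == ']') {
            if (open.empty()) return false;  // closer with nothing to match
            char o = open.top();
            open.pop();
            if ((c == ')' && o != '(') || (c == '}' && o != '{') ||
                (c == ']' && o != '[')) {
                return false;                // wrong kind of bracket
            }
        }
    }
    return open.empty();                     // leftovers mean unmatched openers
}

// Evaluates a postfix expression such as "1 2 + 3 4 5 + * -".
// Returns false if the expression is malformed or divides by zero.
bool evalPostfix(const std::string& expr, int& result) {
    std::stack<int> vals;
    std::istringstream in(expr);
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            if (vals.size() < 2) return false;
            int b = vals.top(); vals.pop();  // right operand is on top
            int a = vals.top(); vals.pop();
            if (tok == "+") vals.push(a + b);
            else if (tok == "-") vals.push(a - b);
            else if (tok == "*") vals.push(a * b);
            else {
                if (b == 0) return false;
                vals.push(a / b);
            }
        } else {
            vals.push(std::atoi(tok.c_str()));  // assumed to be an integer
        }
    }
    if (vals.size() != 1) return false;
    result = vals.top();
    return true;
}
```

For the example expression, the operands 1 and 2 are pushed and replaced by 3, then 3, 4 and 5 are pushed, + leaves 9, * leaves 27, and the final - computes 3 - 27.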
A stack structure could also be used to mimic the runtime stack so that we could use it
instead of writing recursive functions.
5.2.1 Example:
Recall we earlier wrote a recursive function to return the number of 1's in the binary
representation of a number N. The version below does the same work without recursion
by using a stack:
int FindOnesNR(int N){
    int result=0;
    Stack<int> mystack;
    while(N>=1){
        if(N%2)
            mystack.push(1);
        N=N/2;
    }
    while(!mystack.isempty()){
        result+=mystack.pop();
    }
    return result;
}
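The Stack<int> class used by FindOnesNR is not defined in this section. A minimal sketch that would support the calls made above (push, pop returning the removed value, and isempty) might look like the following; building it on std::vector is an assumption of this sketch, not something the notes prescribe.

```cpp
#include <cassert>
#include <vector>

template <class TYPE>
class Stack {
    std::vector<TYPE> items_;   // the back of the vector is the top of the stack
public:
    // Adds a new item to the top of the stack.
    void push(const TYPE& value) { items_.push_back(value); }

    // Removes the top item and returns it. The stack must not be empty.
    TYPE pop() {
        assert(!items_.empty());
        TYPE value = items_.back();
        items_.pop_back();
        return value;
    }

    // Looks at the top item without removing it. The stack must not be empty.
    const TYPE& top() const {
        assert(!items_.empty());
        return items_.back();
    }

    // Tests whether the stack is empty.
    bool isempty() const { return items_.empty(); }
};
```

No full() function is provided because the vector grows automatically. Adding and removing at the back of a vector does not require shuffling any other items.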
6 Queues
Queues, like stacks, are a special kind of list. In the case of a queue, items are added to
the back and removed from the front (FIFO structure). A queue is a line!
6.1.1 Operations on a queue
• insert (aka enqueue, enter) - adds an item to the end of the queue
• remove (aka dequeue, leave) - remove an item from front of the queue
• initialize - create an empty queue
• empty - tests for whether or not queue is empty
• full - tests to see if queue is full (not needed if data structure grows automatically)
• front - looks at value of the first item but does not remove it
6.1.2 Applications of Queues
Queues are a useful representation of problems for different applications. For example,
jobs sent to a network printer are enqueued so that the earlier a job is submitted the
earlier it will be printed. Breadth first searches use queues. Queues also have
applications in graph theory.
6.1.3 Queue implementation
Queues are slightly more difficult to implement using arrays/vectors than a stack. The
reason is that with a stack, we can very efficiently add/remove to the back of the list.
However, with a queue, no matter which end you choose to add data to, the other end
must be used for removing data. If we simply use a vector or array as it is, we will end
up having to shuffle data for at least one of the two operations. This could make the
queue very inefficient.
6.1.3.1 Implementation #1
Another way to implement a queue using arrays is to store a front and end index. To add
an element put the item in array[end] and increment end. To remove an item, remove it
from array[front] and increment front. Treat the array as circular so that the element after
the last element is array[0].
Observe the following example (- marks an empty slot):
At start: List is empty. We cannot remove. Note that end==front when queue is empty.
index: 0  1  2  3  4
value: -  -  -  -  -
front: 0
end: 0
enqueue 5:
index: 0  1  2  3  4
value: 5  -  -  -  -
front: 0 //index we will remove from
end: 1 //index we will add to
dequeue: again end==front, queue is empty
index: 0  1  2  3  4
value: -  -  -  -  -
front: 1 //index we will remove from
end: 1 //index we will add to
enqueue 4:
index: 0  1  2  3  4
value: -  4  -  -  -
front: 1 //index we will remove from
end: 2 //index we will add to
enqueue 5:
index: 0  1  2  3  4
value: -  4  5  -  -
front: 1 //index we will remove from
end: 3 //index we will add to
enqueue 2:
index: 0  1  2  3  4
value: -  4  5  2  -
front: 1 //index we will remove from
end: 4 //index we will add to
enqueue 6: Note how end goes back to index 0 as we are at the end.
index: 0  1  2  3  4
value: -  4  5  2  6
front: 1 //index we will remove from
end: 0 //index we will add to
enqueue 7:
index: 0  1  2  3  4
value: 7  4  5  2  6
front: 1 //index we will remove from
end: 1 //index we will add to
Problem: in the above list end==front, which is also what we see when the queue is empty.
How do we know whether the queue is empty or full? We can't simply test queue[end] for a
certain value because any value can be in it.
Solutions:
 solution 1: never allowing front to equal to end unless it is empty
 solution 2: store the queuesize
 solution 3: store a queuefull flag
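As an illustration of solution 2, here is a sketch of a fixed-capacity circular array queue of ints that stores the queue size alongside front and end. The class name Queue and the member names data_, front_, end_ and size_ are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <vector>

class Queue {
    std::vector<int> data_;
    std::size_t front_;   // index we will remove from
    std::size_t end_;     // index we will add to
    std::size_t size_;    // number of items currently stored
public:
    explicit Queue(std::size_t capacity)
        : data_(capacity), front_(0), end_(0), size_(0) {}

    // With the size stored, empty and full no longer look the same.
    bool empty() const { return size_ == 0; }
    bool full() const { return size_ == data_.size(); }

    // Adds value at the end of the queue. Returns false if the queue is full.
    bool enqueue(int value) {
        if (full()) return false;
        data_[end_] = value;
        end_ = (end_ + 1) % data_.size();      // wrap around to index 0
        size_++;
        return true;
    }

    // Removes the front item and passes it back through value.
    // Returns false if the queue is empty.
    bool dequeue(int& value) {
        if (empty()) return false;
        value = data_[front_];
        front_ = (front_ + 1) % data_.size();  // wrap around to index 0
        size_--;
        return true;
    }
};
```

Neither operation shuffles any data: each one touches a single slot and updates one index, so both run in constant time.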
Other problems with vector implementation
• Growing the queue is not as easy as it was. We can't simply say resize() because
  that adds room only at the end of the list. For example, in the situation above,
  enqueue 8 can't be done as the list is full, so we make the list bigger. If we then
  try to insert at queue[end] we will still overwrite an old value even though the list
  is now bigger:
index: 0  1  2  3  4  5  6
value: 7  4  5  2  6  -  -
front: 1 //index we will remove from
end: 1 //index we will add to
To grow properly we will need to check. If end <= front when the queue is full, the stored
values wrap around the end of the array, so we need to copy all values from queue[0] to
queue[end-1] into the newly added space at the back of the list. Be careful when copying
that you don't go past the end of the new list.
Algorithm for resizing:
oldsize=queue.size();         //will be the start of the new end of the list
queue.resize(2*oldsize);      //make it bigger (any larger size will do)
if(end<=front){               //stored values wrap around
    oldend=end;               //keep track of where the old end value is
    end=oldsize;              //this is where numbers will begin insertion
    for(int i=0;i<oldend;i++){
        queue[end]=queue[i];
        end=(end+1)%queue.size();   //end finishes just past the last copied value
    }
}
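The resizing steps can also be written as a self-contained function. The sketch below makes several assumptions that are not in the notes: the function is named growFullQueue, it is only called when the queue is completely full (so end==front, as with solutions 2 and 3), and the new size is double the old size. The front index does not change, so it is not a parameter.

```cpp
#include <cstddef>
#include <vector>

// Doubles the capacity of a FULL circular array queue and fixes up end.
void growFullQueue(std::vector<int>& queue, std::size_t& end) {
    std::size_t oldsize = queue.size();
    if (oldsize == 0) {            // nothing stored yet: just make room
        queue.resize(1);
        end = 0;
        return;
    }
    std::size_t oldend = end;      // values in queue[0..oldend-1] wrapped around
    queue.resize(2 * oldsize);     // new room appears only at the back
    for (std::size_t i = 0; i < oldend; i++) {
        queue[oldsize + i] = queue[i];   // move wrapped values into the new room
    }
    end = oldsize + oldend;        // next free slot, just past the moved values
}
```

Because the queue was full, end is less than oldsize, so oldsize + oldend is always a valid index in the doubled array and no wrap-around arithmetic is needed.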
Perhaps an even better way of implementing a queue is with the use of linked lists.
Singly linked implementation:
Here is a picture of a singly linked list.
[Figure: singly linked list 5.2, 3.5, 4.2, 8.1, 6.3 with start_ and curr_]
Could we implement a queue using the above list as is? What would the runtimes be like
for enqueue? dequeue?
If we use our singly linked list there will be problems unless we modify how some of the
functions work. Suppose we want to insert at the front of the list and remove from the
end.
Enqueue is easy, just call:
gostart();
InsertBefore(val);
Dequeue is also easy to call:
goend();
Remove();
but you should note that to use goend(), curr_ must advance through the list until the
end (it isn't constant run time like gostart() was). If we had just done a dequeue, then
curr_ is already at the end and can be set efficiently. However, if we keep alternating
between enqueue() and dequeue(), curr_ will have to keep travelling up and down the list,
which slows down the queue. We could of course add an end_ pointer to the object but that
would mean keeping proper track of another pointer. Also, removing from the end always
involves doing a search for the previous node, which is also a slow process.
If we use a doubly linked list, we still may have to move curr_ up the list for each
dequeue(). The only savings we would get would be that if we were at the end of the list,
dequeue() would be constant time operation but once we do an enqueue operation, curr_
will be at the front again and we will still have to move it to the end in order to
dequeue().
Another way to implement the queue is to use a circular linked list. Growing and
shrinking are not a problem with circular linked lists. Adding to the end is easy (if it is
properly implemented; see the linked list section on how to properly implement a circular
linked list). Removing from the front is also very easy. Both enqueue and dequeue
operations take constant time (i.e. no matter how big the list is, adding/removing one item
takes the same amount of time).
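A queue built this way might look like the sketch below. It assumes a singly linked circular list tracked only by a last_ pointer, as suggested in the circular linked list section. The class layout and the function names enqueue, dequeue and empty are choices made for this sketch, and copying is disabled rather than implemented.

```cpp
#include <cstddef>

class Queue {
    struct Node {
        double data_;
        Node*  next_;
    };
    Node* last_;   // back of the queue; last_->next_ is the front
    // Copying is not supported in this sketch (declared but never defined).
    Queue(const Queue&);
    Queue& operator=(const Queue&);
public:
    Queue() : last_(NULL) {}

    // Free any nodes still in the queue.
    ~Queue() {
        double ignored;
        while (dequeue(ignored)) {}
    }

    bool empty() const { return last_ == NULL; }

    // Adds value at the back of the queue in constant time.
    void enqueue(double value) {
        Node* temp = new Node;
        temp->data_ = value;
        if (last_ == NULL) {
            temp->next_ = temp;           // only node: points at itself
        } else {
            temp->next_ = last_->next_;   // new back points at the front
            last_->next_ = temp;          // old back points at new back
        }
        last_ = temp;
    }

    // Removes the front item in constant time and passes it back through
    // value. Returns false if the queue is empty.
    bool dequeue(double& value) {
        if (last_ == NULL) return false;
        Node* first = last_->next_;
        value = first->data_;
        if (first == last_) {
            last_ = NULL;                 // removed the only node
        } else {
            last_->next_ = first->next_;  // skip over the old front
        }
        delete first;
        return true;
    }
};
```

Neither function contains a loop over the nodes, so there is no curr_ pointer travelling up and down the list between operations.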