SORTING
(1) Sorting---Overview:-
I/p:- A sequence <a1 a2 … an>
O/p:- A permutation <a1' a2' … an'> of the i/p such that ai' <= ai+1' for 1<=i<n,
where each ai belongs to a domain on which sorting is defined in one or the other way. For
example, sorting is defined on the set of real numbers where sorting is generally considered to
be sorting by magnitude of the numbers. i.e ¾ is considered to be higher in order than ½ and -¾
is considered to be lower than -½.
This does not mean that sorting cannot be done on a domain on which such a natural concept like
magnitude of an element is not defined. We can force or define our own customized notion of
magnitude on the domain in such cases, of course depending on the requirement and on whether such a
notion can really be created.
For example, consider a 2-dimensional figure: a square of side 3, with its bottom-left corner located at
the origin of the Cartesian plane. Consider the 2-dimensional points (x,y) inside or
on the border of this square where x and y are non-negative integers. Call this set INT(square),
meaning the collection of all points inside or on the boundary of the square where co-ordinates
can take only integral values. So, INT(square)={(0,0), (1,0), (2,0), (3,0), (0,1), (1,1), (2,1), (3,1),
(0,2), (1,2), (2,2), (3,2), (0,3), (1,3), (2,3), (3,3)} .
Now, there is no immediately obvious concept of magnitude for an n-dimensional
point on which we can sort the above set. Given (1,3) and (2,2), we do not know which one is
greater, so we cannot yet sort this pair. So, we can define our own notion of magnitude. For
example, consider the magnitude of a point (x,y) to be |(x,y)| = |x| + |y|, i.e. the sum of the
absolute values of the co-ordinates. Then, given points (a,b) and (c,d), we can say (a,b) >= (c,d) if and
only if |(a,b)| >= |(c,d)|. What is happening here is that the points (a,b) and (c,d) are put into
relation with each other by a certain relation R which totally orders the point set. Our usual
2-dimensional Euclidean distance from (0,0), defined as
|(x,y)| = √(x^2 + y^2)
and called the radial distance, is another such valid total order relation on the 2-dimensional plane that
we have been dealing with in Cartesian geometry.
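To make this concrete, here is a small sketch (Python is used purely for illustration; the names int_square, manhattan and radial are our own): the same point set INT(square) is sorted under the two notions of magnitude just described, simply by supplying the corresponding key.

# A minimal sketch: sorting INT(square) under two different notions of magnitude.
from math import sqrt

int_square = [(x, y) for y in range(4) for x in range(4)]   # the 16 lattice points

def manhattan(p):            # |(x,y)| = |x| + |y|
    return abs(p[0]) + abs(p[1])

def radial(p):               # |(x,y)| = sqrt(x^2 + y^2), Euclidean distance from (0,0)
    return sqrt(p[0]**2 + p[1]**2)

print(sorted(int_square, key=manhattan))   # order induced by the |x|+|y| relation
print(sorted(int_square, key=radial))      # order induced by the radial distance

Points with equal key values sit next to each other in the output, which is exactly the "equal in order" situation discussed next.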
Ok.. so.. given a domain, we may come up with many such total order relations, using which we
can sort the given input subset of that domain. All points having the same value will be called equal
w.r.t the defined relation. For example, all 2-dimensional points having the same Euclidean distance 'd',
i.e. all points on the circumference of a circle of radius 'd', are considered equal in order if
we use the Euclidean distance metric as the basis for sorting. The value of a real number is another
such metric, over the real number domain, which we generally use for sorting real numbers. The
point is that at least one such definition should exist, which our algorithm can use to decide the
relative order among elements of the given domain. Sorting algorithms may sometimes explicitly
use the definition of this total order relation as part of the sorting strategy. But, usually, the sorting
algorithms you will learn are independent of these considerations.
By this point, you should have acquired a general understanding of what sorting is,
and you should have realized that the first thing a sorting algorithm requires is the
existence of some comparison concept (i.e. a total order relation) based on which comparisons
of elements of the given domain can be done. Even a partially ordered set would do
for sorting; in such a case, the sorted output would be a collection of chains, each individually sorted, and
each chain is obviously totally ordered.
(2) Space-Time Complexity Analysis---Issues:-
..suddenly analysis comes on the scene when you have not seen even a single sorting algorithm!!! Why???
..so to speak, we don't need to see any concrete algorithm to discuss the points mentioned
below.. For example, to start learning Discrete Structures, you do not need to see an example of a discrete
structure before you start learning.. all you need at first is to think about what should not be a discrete
structure and what in the world could be a sufficient picture of what a discrete structure really is.. then you
start studying examples of discrete structures which conform to those concerns..
There are different methods of solving a problem.. many methods.. but some of them are adored a lot for their
sheer beauty and elegance. Be patient enough to see the issues involved in implementing one of them,
namely the divide and conquer strategy.
(a) Space Complexity:-
Space is usually required for the entire input to be stored. This is generally the case that
you are dealing with, and maybe this is your only view of input and output. This view is typically
called the Off-line or Static case (you may learn about the on-line case in higher semesters). Meaning, the entire
input is available to us at once and we work on it by reading it as and when required. Similar is
the case for output: usually, we produce the entire output at once somewhere, where it might
be used in future computations. In such off-line cases, it does not make sense to count the space
requirement of input and output as part of the space analysis of our algorithm, because
somehow we have to store the input somewhere so that our algorithm can read it as and when
needed. That is the least our algorithm should and will demand in off-line cases. So, that
becomes a kind of necessity on the part of our algorithm.
Algorithm-space:- So, the algorithm's space cost should be calculated in terms of the space
requirement of the other entities created because of the algorithm's strategy: place-holding variables, loop
variables, certain copy-preserving variables, and sometimes something as big as a temporary array
or a list, and many such things, called temporary or local entities. These may be required by an
algorithm for its strategy to be effective in solving the problem. Such extra space is generally
engaged by an algorithm as a working information set. The algorithm creates such a space and then
possibly keeps updating it as a result of its functioning, and based on the state of
this working information, further actions of the algorithm are decided (just like rough work on
paper). We calculate such costs and attribute them to the algorithm's space requirement. Typically we
want the extra space requirement, i.e. the strategy space of the algorithm, to be bounded above by
some constant number, represented asymptotically as a member of the O(1) class of
constantly dominated functions. We, in fact, have a nice name for such algorithms: "In-Place"
algorithms.
Implementation-space:- But the algorithm is going to be implemented on some device through
models of computation or technologies like programming languages, assembly language or
direct ROM burning of the circuitry corresponding to the algorithm's steps. So, along with the
algorithm-space, there may come the cost of space provided by certain features of the
implementation technology that your code is using directly as a service or support feature.
Don't panic if you don't get it right now.
For example, when implemented in a higher level programming language like C or C++,
recursive algorithms are usually implemented as recursive function calls (if the language
supports such a feature). Recursive calls are usually handled by what is called the active
function call stack. This call stack occupies a certain portion of system memory to track function
calls, recursive or otherwise. This system stack grows and shrinks according to the number of
non-recursive calls or nested active recursive calls present at any moment in the runtime.
Each active call record provides enough memory for a function call to keep its state. Local
variables, the return address and local references to dynamically allocated memory are all
recorded in this call activation record (how else do you think such dynamic information could be stored,
to record the order and relevance of function calls??). The amount of system stack dedicated to an
algorithm is called the Stack Depth of that algorithm. We do not, generally, attribute the Stack Depth
cost to the algorithm, because it is a dedicated implementation-level facility in use. It is a
feature of the implementation technology, like C or C++, serving our algorithm.
If we attribute even this facility or support cost to the algorithm, then we are not
appreciating the beauty or elegance of the algorithm's strategy. Solution methods like divide and
conquer are elegant methods for solving a certain class of problems. As a famous quote says,
"To iterate is human. To recurse, divine".
But on the other hand, if we ignore implementation costs altogether, then we run the risk of
overusing space: a poor implementation of recursion can keep far too many active functions
depending on each other simultaneously, to levels which are not
even needed. What if the only language currently in the world doesn't support recursive calls?
What if the implementing system shouts back at you saying "Hello, no further recursion please.. I will
have to give you my place to support your activities.. So, stop now or achieve nothing from me!!!". I bet
you have already encountered such situations (though not such screaming from the system.. the system
cries out in code words like "recursion depth exceeded" or "segmentation fault" or similar). So we try
converting such elegant divide and conquer recursive algorithms to a non-recursive form, i.e. an
iterative equivalent, without losing the efficiency of the D&C approach.
This is done for at least 2 reasons.. to help the system memory not overload itself with recursion
records, and.. it is in our nature to look for further efficient solutions even when we already have one
efficient solution at hand.
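As one sketch of such a conversion (assuming Python; merge and merge_sort_iterative are our own illustrative names, and this is only one of several possible ways), the usual recursive merge sort can be rewritten bottom-up so that the call stack no longer grows with the input:

# Bottom-up (iterative) merge sort: the same divide-and-conquer merging,
# but the "divide" is replaced by doubling run widths, so stack depth stays O(1).
def merge(left, right):
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:]); out.extend(right[j:])
    return out

def merge_sort_iterative(a):
    runs = [[x] for x in a]              # start from runs of length 1
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs) - 1, 2):
            merged.append(merge(runs[k], runs[k + 1]))
        if len(runs) % 2 == 1:           # odd run left over, carry it forward
            merged.append(runs[-1])
        runs = merged
    return runs[0] if runs else []

print(merge_sort_iterative([5, 2, 9, 1, 5, 6]))   # [1, 2, 5, 5, 6, 9]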
We are still discussing probable issues of the recursive method only. Why only the recursive method in discussion
here? Are there not other methods where such issues will not arise, maybe?
Yes, of course there are other, so-called non-recursive methods.. like decrease and conquer, memoization (not
memoRization), greedy, branch and bound, backtracking, heuristic search and the like.. currently you may not fully
understand what these words mean.. but, for the time being, anyway, Divide and Conquer (D&C) is one
of the jewels among methods for solving a problem.. and especially a jewel for sorting, in the sense that it
avoids unnecessary comparisons to establish the sorted order.. in this sense, it really deserves space for discussion
of the method.
So.. again.. another quick remedy to reduce the recursive load on the implementing
technology is to stop recursion early enough, i.e. to identify a problem-size threshold at which the
problem is solved by other, direct methods rather than by recursing further. This is preceded
by a little algebraic and/or arithmetic analysis of the time complexity equations, which tells us
that below the threshold value recursion would be more costly than direct problem solving. This
reduces the load a lot. Such an analysis is not mathematically difficult. But it involves diligent,
exact step counting of the algorithm, or it requires fixing the constants hidden in your
asymptotic analysis in the O, o, Ɵ classes of functions. Only then can you compare 2 functions
absolutely, by their magnitudes, to decide the threshold or barrier, because to decide a
barrier or a threshold, what we want here is an exact estimated comparison of 2 functions.. and
not just asymptotic behavior.
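A sketch of this threshold idea (Python; THRESHOLD = 16 is purely an illustrative value, the real cut-off would come out of the exact step counting just described, and the partitioning scheme is one common choice, not the only one):

# Hybrid divide-and-conquer: below a small threshold, recursion costs more
# than solving directly, so fall back to insertion sort there.
THRESHOLD = 16          # illustrative value; fix it by exact step counting

def insertion_sort(a, lo, hi):          # sorts a[lo..hi] in place
    for i in range(lo + 1, hi + 1):
        key, j = a[i], i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]; j -= 1
        a[j + 1] = key

def hybrid_quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= THRESHOLD:        # stop recursion early
        insertion_sort(a, lo, hi)
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:                       # simple Hoare-style partition
        while a[i] < pivot: i += 1
        while a[j] > pivot: j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]; i += 1; j -= 1
    hybrid_quicksort(a, lo, j)
    hybrid_quicksort(a, i, hi)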
(b) Time Complexity:-
It is assumed here that you have some idea about calculating time complexity in terms of what
is called the input size. It is also assumed that you know the asymptotic analysis notations O, o, Ω
and Ө. We analyze the time complexity of any algorithm, and especially a sorting algorithm,
from 3 perspectives.
(i) Worst Case Analysis:-
As is obvious from the words, by looking at the strategy of the algorithm, we
identify input instances which make the algorithm consume the highest possible number
of operations. The purpose is to bound the time requirement of the algorithm from above.
There may be multiple such instances. To get the upper bound, we can actually find the
run-time equations of each of these worst cases and then choose the fastest growing
function among them.
There is obviously at least 1 reason to support worst case analysis, and that is that
knowing an upper bound on the running time guarantees how bad the algorithm can get.
Knowing an upper bound is surely required.
But there may be several opinions against putting worst case analysis forward as the major
time complexity analysis of an algorithm.
---How many worst cases can exist?
Let D = {d1, d2, ..., dn} be an input set to an algorithm, where each di belongs to some
domain. Most of the time, any arrangement of this set can become an input to our
algorithm. Such an arrangement is known as an input instance. For example, given a set of numbers {10,6,3,4,8} as
an input set to a sorting algorithm, we know that this same set can come as an input in any of
5! = 120 ways. So there are 120 input instances. We generally have some idea about how
the arrangements of D are distributed. Let P be such an assumed probability
distribution over the arrangements of D. (In case we have no knowledge about the distribution of
arrangements of the input set, P is taken to be the Uniform Distribution.) Let us assume that X is the
set of worst case input instances identified for our algorithm. Now, according to P,
what is the probability of occurrence of at least one of these worst cases? That is, we
are asking about the probability of the set X when P is the distribution over the parent
set of instances. Usually it is low, since usually the number of worst cases is itself low.
As a short example, the worst case for Quick-Sort is identified by the situation when, out
of the 2 sub-arrays of the given array of size n, left or right, exactly 1 is empty, so that every
time the problem size reduces by just 1. This leads to a time complexity equation ∈ Ɵ(n^2).
How many cases force this to happen? Just 2, when the input is either increasingly sorted
or decreasingly sorted. So, D = {x1, x2, ..., xn} and X = {increasing permutation, decreasing
permutation}. Assuming the probability distribution over the arrangements of D to be P = Uniform Distribution,
the probability of the set X is p(X) = 2/(n!), which is tremendously low even for n = 10.
Obviously there are other permutations too for which the time complexity of Quick-Sort
shoots to the Ɵ(n^2) class. We will see that in the section dedicated to Quick-Sort.
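A tiny experiment can make the gap visible (a sketch, not part of the formal analysis: we count comparisons in a first-element-pivot quicksort, the variant the above worst case refers to, on a sorted versus a shuffled instance of the same set):

# Counting comparisons of a first-element-pivot quicksort on a sorted
# versus a shuffled instance of the same set.
import random, sys
sys.setrecursionlimit(10000)            # sorted input drives recursion depth to ~n

def quicksort(a, counter):
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    counter[0] += len(rest)             # one comparison per remaining element
    left  = [x for x in rest if x <  pivot]
    right = [x for x in rest if x >= pivot]
    return quicksort(left, counter) + [pivot] + quicksort(right, counter)

n = 500
sorted_input   = list(range(n))
shuffled_input = random.sample(range(n), n)
for name, data in [("sorted", sorted_input), ("shuffled", shuffled_input)]:
    c = [0]
    quicksort(data, c)
    print(name, "comparisons:", c[0])   # ~n^2/2 for sorted, roughly n*log(n) for shuffled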
The above points should make you start thinking that we should be careful when we say
that the time complexity of an algorithm is its worst case equation.. i.e. we should be
careful when we summarize time complexity in just one class, i.e. O (Big-Oh). Worst
case analysis will surely have the upper hand when either the worst cases are
too many (in which case the algorithm is generally inefficient) or the probability of occurrence of the
worst case set is so high that it starts concerning us. But this all depends on the
distribution P, chosen by us or existing in reality, over the input set.
---How bad can any worst case really be? Can we not reduce its badness?
The previous question dealt with the quantitative aspect of worst cases. Here we
discuss the qualitative aspect of a worst case. Instead of talking about the number of worst
cases, we talk about how bad any single worst case can really be. Bad in the structural
sense: the way the input is stored and the inherent structure of the input elements
themselves. Any worst case will surely lead to higher time complexity. But can we not
reshape the received input instance so as to avoid feeding worst case input
instances directly to our algorithm? This is called pre-processing the input, without
losing its meaning or value. Can we not pre-process the input to avoid the worst input,
somehow, without increasing the time complexity of the entire algorithm?
If we can, then of course we should take a look at the input instance and pre-process it,
if required, before feeding it to our algorithm.
For example, consider again Quick-Sort and its 2 worst case input instances,
increasing and decreasing input. Before letting Quick-Sort run on the input, we can scan
the array once to see if we have received a worst case. It takes Ɵ(n) time to decide such
a presence. Then we can pre-process it by partially permuting it, so that we have
transformed the worst case into some other input instance, and then feed it to Quick-Sort. Total
time complexity = Pre-processing-Time + T(Q-Sort) = Ɵ(n) + Ɵ(n*log(n)) = Ɵ(n*log(n)).
So we have not disturbed the time complexity. (Actually, in this case, we do not even need to run
Quick-Sort, because the worst case instance of increasing input is already sorted, and we can directly sort the
other worst case instance, i.e. decreasing input, in Ɵ(n) time... Think how..)
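One possible shape of this pre-processing, combining the Θ(n) scan with the shortcut from the parenthesis above (a sketch; quicksort here stands for whichever Quick-Sort routine is in use):

# Theta(n) pre-processing for the two monotone worst cases of a
# first-element-pivot quicksort.
def is_nondecreasing(a):
    return all(a[i] <= a[i + 1] for i in range(len(a) - 1))

def is_nonincreasing(a):
    return all(a[i] >= a[i + 1] for i in range(len(a) - 1))

def sort_with_preprocessing(a, quicksort):
    if is_nondecreasing(a):
        return list(a)                  # already sorted: no work needed
    if is_nonincreasing(a):
        return list(reversed(a))        # reverse-sorted: Theta(n) reversal sorts it
    # Any other instance goes to quicksort unchanged; alternatively, one could
    # lightly permute a detected worst case and still call quicksort, as in the text.
    return quicksort(list(a))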
Now is the time to discuss the case when pre-processing is costly. You can try this case yourself. As an
exercise, think about some problem and its algorithm and find out the worst cases for the given
problem and algorithm. Think of some way of pre-processing the input. Find such a problem and algorithm for
which pre-processing the input is itself costlier than the algorithm's time complexity. A hint to start with is that
the input elements themselves should be so complex or so restrictive in structure that any processing
on them will probably take considerable time. For example, how about the problem of sorting a
singly linked list???
From the above, it is clear that worst case analysis must be done to show the upper
bound, but maybe it is not that good to label the algorithm's time complexity by it.
But anyway, the worst case equation is an upper bound on the algorithm.
(ii) Best Case Analysis:-
Best cases are those which consume the minimum of resources and computational
steps. The discussion for the best case goes symmetrically to that of the worst case analysis.
(iii) Average Case Analysis:-
Average case analysis is simply averaging the time complexity over all input instances.
Now, there are many kinds of averages.. arithmetic average, mean squared average
and so on. We have to choose the averaging notion which best fits the case being analyzed.
Input instances are generally large in number, so obviously it is not feasible to
list all instances and calculate the time complexity of each. Here, the notions of the
O, o, Ω, Ɵ function classes come to help. Though there exist as many time
complexity equations as there are instances, these equations usually lead to only a few
function orders. For example, all functions of the form aX+b, where a and b are real numbers, belong to
the O(X) class. Though there exist uncountably many such functions, they all form just 1 class.
We have to remember that average case complexity considers all input instances and
their equations. When we talk about input instances, we are also talking about a
probability distribution P over the input instance set. With this in mind, the average
complexity is the mathematical expectation over the run time equations, if it exists. (There
are sets for which the mathematical expectation does not exist for a given probability distribution, because the
summation does not converge.)
If T is the set of run time equations of the input instances, then
Average complexity = E[T] = ∑ over all input instances I of p(I) * run-time(I)
where p(I) = probability of occurrence of input instance I.
Example:- Consider the problem of searching for an element x in a given array A, using the
linear search algorithm. The complexity of linear search is the number of
comparisons required for finding the search key x. Assume that the elements of A are all
distinct. Given an input set of n elements, there are n! input instance arrays. Assume a
uniform probability distribution over the input instances. Input instances can be grouped
together according to the number of comparisons these instances require. So there can only
be n+1 groups, n for successful search and 1 for unsuccessful search. So here we can
see that the n! instances are categorized into just n+1 groups. All we require is to find the
size of each group. Let C(i) be the size of the group requiring i comparisons. Here
T = {1, 2, 3, ..., n+1}, i.e. our time equations are, without any loss of generality, simply the
numbers of comparisons. The average complexity of linear
search is then the mathematical expectation over T, given the uniform distribution:
E[T] = ∑ 1<=i<=n+1 i * p(instance subset requiring i comparisons)
Now, since we have assumed a uniform distribution, the probability of each instance is
1/(n!). So the above equation becomes
E[T] = ∑ 1<=i<=n+1 i * C(i)/(n!)
Calculating C(i) is straightforward. i comparisons means the search key is found at location
i, and there exist (n-1)! such instances. This is true for all 1<=i<=n, while the unsuccessful
group is empty since every instance contains the key. So,
E[T] = ∑ 1<=i<=n i * (n-1)!/(n!) = ∑ 1<=i<=n i/n = (n+1)/2 = Ɵ(n)
So the average complexity of linear search is Ɵ(n). The actual count says that on
average we need to scan half the array to find presence or absence. This is also intuitive,
but only in the case of the uniform probability distribution.
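A quick empirical check of this expectation (a sketch; under the uniform distribution, placing the key at a uniformly random position is equivalent to averaging over all n! permutations):

# Empirical average number of linear-search comparisons, key always present,
# uniform distribution: should approach (n+1)/2.
import random

def linear_search_comparisons(a, key):
    for i, x in enumerate(a):
        if x == key:
            return i + 1                 # i+1 comparisons to find the key
    return len(a)                        # unsuccessful search

n, trials = 100, 20000
total = 0
for _ in range(trials):
    a = random.sample(range(10 * n), n)  # distinct elements
    key = random.choice(a)               # key present, position uniform
    total += linear_search_comparisons(a, key)
print(total / trials, "vs predicted", (n + 1) / 2)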
Here we have used elementary combinatorics to count instances. We can simplify the
analysis by using other tools like Indicator Random Variables, calculating the
expectation of each such random variable and adding all these expectations.
Here the analysis was simple because of the uniform probability distribution.. the calculations
were simplified because each instance had the same probability 1/(n!). For a general distribution,
the analysis can be quite involved and complex. The equations may become almost
impossible to represent in closed form. The tools required for such an analysis may
have to be taken from mathematical fields like algebra, calculus, discrete
mathematics and the like.
---What is the difference between average time complexity and the time complexity of
an average input instance? Are they the same?
The average complexity of an algorithm is what we discussed above.. the expectation, if it
exists, over the run-time equations under the considered probability distribution. An average
input instance is certainly a different thing, because here we are talking about an
instance and not about a complexity. Like the best case and the worst case, an average input is an
instance which consumes an average number of operations and resources of an algorithm.
There are 2 ways to interpret this: (i) either every operation is
executed an average number of times, or (ii) the algorithm executes only an average
number of its operations. In the first case, the algorithm executes every operation, but an average
number of times. In the second case, the algorithm does not execute every operation, but
executes only an average number of its operations.
Example.. Say an algorithm A consists of 5 steps.
Algorithm A:-
Step 1: group of statements, cost = c1, repeated at most n1 times
Step 2: group of statements, cost = c2, repeated at most n2 times
Step 3: group of statements, cost = c3, repeated at most n3 times
Step 4: group of statements, cost = c4, repeated at most n4 times
Step 5: group of statements, cost = c5, repeated at most n5 times
According to
case (i): average algo cost = ∑i (average cost of Step i) = ∑i ci * ni/2
case (ii): average algo cost = ∑x (cost of Step x) * nx, where x ranges over a 3-subset of {1,2,3,4,5}..
because, out of the 5 steps, on average 3 are executed fully. This second case suggests that
some 3 steps are executed and the other 2 are not. This can happen when steps are
executed conditionally, i.e. under if-else or switch kinds of condition clauses, so that on
average a certain number of the conditions evaluate to false.
As you can see, these 2 may evaluate to different cost functions. But for algorithm A,
the order of both these functions is still the same, i.e. they belong to the same function class
asymptotically. But this need not be true in general. What if the cost functions of different
steps are of different orders? Then the second case gives different time complexity functions
depending on which subset of steps actually got executed.
That is why the case (i) method of evaluation seems more appropriate, because in that
case the algorithm is not rejecting certain steps just because they might be executed
conditionally. Rather than rejecting them, we should in fact consider the average number of
times the conditions turn out to be TRUE, so that the steps will be executed.
So we should come up with an input instance which satisfies the case (i) evaluation.
Again, the notion of average here is always with respect to the probability distribution
over the input instances.
Considering the linear search problem above, and considering the uniform distribution once
again, an average input instance will be any instance where, on average, the
algorithm fails n/2 times to locate the search key. This corresponds to the case where the search
key is at an average location, i.e. at location n/2. There are (n-1)! instances where the
search key is at location n/2. So, there are (n-1)! average cases for the linear search
problem. All of these evaluate to time complexity functions of the same order, and that is
Ɵ(n). (Linear search is a problem which has n! worst cases, i.e. when the search key is absent.. (n-1)! best
cases, i.e. when the search key is present at location 1.. and (n-1)! average cases, i.e. when the search key is at
location n/2.)
Let us put some notation in place to summarize the above discussion about average
cases and average complexity.
I = set of all input instances
AI = set of average instances
T(x) = time complexity of x, where x is a subset of I
As we saw before, the average time complexity is the expectation, if it exists, over all
instances under the probability distribution P. So, Average Time Complexity = E[T(I)].
And we also have the time complexity of the average case inputs, that is,
T(AI).
Now read the question which started this subsection. That question can now be formally
expressed as:
Are E[T(I)] and T(AI) always of the same order? i.e. do they always belong to the
same asymptotic function class?
For the linear search algorithm, the answer is YES. So, we can summarize the time
complexity analysis of the linear search problem on n elements in an expression like
Ω(1) <= T(n) <= O(n), with average complexity Ө(n).
It is left to you to think about the answer in the general case.
(3) Desirable Algorithm Properties (if you can achieve them):-
Now, if you have not understood the need and timing of the topics we have discussed
till now, do not worry much. But also do not relax completely: be sure to come back and
read these previous points as and when you need them. So, to talk about the desirable
properties of any sorting algorithm, we want a sorting algorithm to be
(a) In-Place :-
Meaning, all the computations done by the algorithm should be done within an extra-space
constraint belonging to the O(1) class of functions. You will see (and
probably know) that merge sort and radix sort are at least 2 algorithms you
know which are not in-place.
(b) Stable :-
Meaning, elements which are of the same order (relative to sorting) appear in the
sorted output sequence in the same order as they appeared in the input sequence.
For example, if the integer 5 appears in the sequence, say, 3 times, then these 5's
should appear in the output in the same order as in the input. To clarify further, let
us say, for the sake of discussion, that the 3 instances of this 5 are labeled 5', 5'' and 5'''
in their order of appearance in the i/p (it is the same 5, but 3 times). We know that these 3 5's
are the same in order. But we want these 3 5's to appear in the o/p in the same order, i.e. 5',
5'', 5'''. If we can achieve this for all elements which have multiple occurrences, or for
elements which are order-equivalent w.r.t sorting (remember the customized notion of magnitude
we talked about in the first section?? total order relation?? order equivalence??), then we
have achieved stability of the order of appearance.
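A small illustration of stability (a sketch; Python's built-in sorted happens to be stable, and the record layout below is our own):

# Stability: records with equal keys keep their input order in the output.
records = [(5, "5'"), (3, "a"), (5, "5''"), (1, "b"), (5, "5'''")]

stable = sorted(records, key=lambda r: r[0])     # Python's sort is stable
print(stable)
# [(1, 'b'), (3, 'a'), (5, "5'"), (5, "5''"), (5, "5'''")]
# The three 5's appear as 5', 5'', 5''' -- the same relative order as in the input.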
Apart from these desirable properties, we of course want an algorithm to be efficient
on various frontiers like time, the memory hierarchy (cache, primary memory,
secondary memory) and other resources like network bandwidth, if we are dealing
with a huge amount of data over a computer interconnection network or any
communication network like radio communication and the like.
Now, after all these considerations, we are in a position to list different strategies for
sorting, i.e. to list different algorithms for sorting. We will discuss, in fair detail, every
sorting strategy that turns up in the usual B.E curriculum.
(3) An Upper Bound on Run Time of Sorting By Comparison :-
Before discussing a variety of algorithms for sorting, we can derive a general upper bound on the
running time of any algorithm which sorts by comparison on an arbitrary set of elements
(from a partially or totally ordered set). Consider the input set S = {a1, a2, ..., an}, where each ai belongs to
some data domain. We assume the absence of any further information about the input
set S, such as the range of the data elements, frequencies of occurrence, or the probability distribution on
the universe from which the data is chosen.
We can decide the sorted ordering of S by comparing each element x with every other element,
checking how many elements are less than x, and putting x directly into its proper position
using this information. Given n elements, there are nC2 = n(n-1)/2 such pairs. Each pair (a,b) checks whether a <
b and results in the value 0 or 1 accordingly. This information can be stored and reused
repeatedly to place elements in their proper positions. At a time, only 1 element needs its position fixed,
so at a time only n-1 comparisons are required. Given this, the run time complexity
becomes O(n^2), with space complexity Ɵ(n). We can summarize this as a result.
Sufficient Upper Bound on Run Time and Space Requirement for Sorting By Comparison:-
Given a set of input elements S of size n, and the absence of further information about the
elements of S, an O(n^2) run time procedure with Ɵ(n) space complexity is sufficient for sorting
by comparison.
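The argument behind this result can be written down directly (a sketch assuming distinct elements, as in the derivation above; rank_sort is our own name for it):

# Sorting by comparison counting: O(n^2) comparisons, Theta(n) extra space.
def rank_sort(s):
    n = len(s)
    out = [None] * n                     # Theta(n) extra space for the output
    for i in range(n):
        rank = sum(1 for j in range(n) if s[j] < s[i])   # how many are smaller
        out[rank] = s[i]                 # place s[i] directly at its proper position
    return out

print(rank_sort([10, 6, 3, 4, 8]))       # [3, 4, 6, 8, 10]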
Equipped with the above result, in later sections we will see algorithms having time
complexity belonging to the above stated O(n^2) family of functions, and space complexity
belonging to the O(n) family of functions rather than Ɵ(n), because the Ɵ(n) class would force a
linear amount of extra space, which is not required by many algorithms like Heap sort, Bubble sort, Insertion
sort and many others.
(4) Algorithms for Sorting :-
Below is a list of well-studied algorithms which find their way into syllabi almost everywhere. It is
useful to think of an algorithm as a person having its own mind and its own words to describe its
thought. So every algorithm has a strategy, which is expressed formally as the step-by-step
computational procedure that we see in text. Consider S to be the set of elements to be
sorted. A few points below are to be considered.
(i) An element x ϵ S is called the ith order statistic if x is the ith smallest element in the sorted order
over S.
(ii) Given 2 indices i, j ϵ N with i < j, we say a derangement (commonly called an inversion) has occurred if S[i] > S[j]. This
means a larger value occurs before a smaller one in the sequence. The performance of sorting
algorithms, most of the time, depends on the amount of derangement present in the input
sequence. In short, the performance of any strategy is tightly linked to the amount of
derangement in the i/p sequence and to how fast the strategy resolves derangements from a
certain pass to the next. So it makes sense to obtain complexity bounds relative to the
amount of derangement and to how that amount is distributed in the input
sequence.
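The amount of derangement in an instance can be measured by simply counting such out-of-order pairs (a brute-force sketch):

# Counting derangements (out-of-order pairs, i.e. inversions) by brute force in O(n^2) time.
def count_derangements(s):
    n = len(s)
    return sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])

print(count_derangements([1, 2, 3, 4]))      # 0 (sorted: no derangement)
print(count_derangements([4, 3, 2, 1]))      # 6 (reverse sorted: the maximum n(n-1)/2)
print(count_derangements([10, 6, 3, 4, 8]))  # 6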
(4.1) SELECTION Sort:-
4.1.1 Strategy:-
Successively relocate the ith order statistic to location i. This is done by the Decrease-and-Conquer
approach. The Dec-Con approach proceeds here by assuming that the first i-1 order statistics have been
relocated to their respective proper positions. Given this, the ith order statistic is the 1st order statistic of the
remaining n-i+1 elements. Follow the above procedure for increasing values of i. The separation between
sorted and unsorted elements is maintained within the array.
4.1.2 Math Model:-
Consider the i/p to be a sequence of elements S = <a_1, a_2, ..., a_n>.
Let Perm = { x | x is a permutation sequence of S }.

Swap : Perm X Z_n X Z_n → Perm
such that
Swap(p, i, j) = p' where
p'(k) = p(k) for all 0 <= k <= n-1 with k ≠ i and k ≠ j, and
p'(i) = p(j) and p'(j) = p(i).

min : Perm X Z_n → Z_n
such that
min(p, i) = the index k, i <= k <= n-1, at which p(k) is minimum.

Selection sort is then the (n-1)-length sequence
SS = < Swap(p_0, 0, min(p_0, 0)), ..., Swap(p_(n-2), n-2, min(p_(n-2), n-2)) >
such that
p_0 = i/p, p_i = Swap(p_(i-1), i-1, min(p_(i-1), i-1)), and
o/p = Swap(p_(n-2), n-2, min(p_(n-2), n-2)).
4.1.3 Algorithm Extraction from Model :-
As seen, selection sort is modeled as an (n-1)-length homogeneous sequence of the Swap function.
Algorithmically, this is easily translated into an iterative loop of (n-1) repetitions.
The function min embedded in Swap is a search function finding the index of the minimum value; it is easily
translated into a linear search over contiguous space. Such direct translations lead to the following
procedure text.
For i from 1 to n-1                          ....(1)   f1 = n
Begin
    minIdx = i                               ....(2)   f2 = n-1
    For j from i+1 to n                      ....(3)   f3 = n-i+1
    Begin
        If ( A[minIdx] > A[j] )              ....(4)   f4 = n-i
            minIdx = j                       ....(5)   f5 <= n-i
    End
    swap A[minIdx] and A[i]                  ....(6)   f6 = n-1
End
(A denotes the array holding the input sequence S; minIdx tracks the index of the current minimum.)
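The same procedure, as a runnable sketch (Python for illustration; the array is sorted in place):

# Selection sort: in each pass, find the index of the minimum of a[i..n-1]
# and swap it into position i. Exactly n-1 swaps overall.
def selection_sort(a):
    n = len(a)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):
            if a[j] < a[min_idx]:        # steps (4)/(5): linear search for the minimum
                min_idx = j
        a[i], a[min_idx] = a[min_idx], a[i]   # step (6): one swap per pass
    return a

print(selection_sort([10, 6, 3, 4, 8]))  # [3, 4, 6, 8, 10]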
4.1.4 Space/Time Complexity Analysis :-
4.1.4.1 Space :-
Space(n) = Variable space ( ϴ(1) ) + Stack Depth ( ϴ(1) ) = ϴ(1)
4.1.4.2 Time :-
T(n) = ∑ 1<=i<=6 fi
4.1.4.2.1 Worst Case Analysis :- is when step (5) gets executed with its frequency reaching the
upper bound. This happens only when the input is reverse sorted. So the worst case instance is
S = <a1, a2, ..., an> such that ai > ai+1 for all i, and
T(n) = ϴ(n^2)
4.1.4.2.2 Best Case Analysis :- is when step (5) does not get executed even once. This happens
only when the input is already sorted. So the best case instance is
S = <a1, a2, ..., an> such that ai <= ai+1 for all i, and
T(n) = ϴ(n^2)
4.1.4.2.3 Average Case Analysis :- is when step (5) gets executed an average number of times.
This depends on the probability distribution over Perm. But anyway, the average
complexity is the mathematical expectation over the run time equations w.r.t a probability
distribution p.
Consider I ϵ Perm.
E[ T(Perm) ] = ∑ I ϵ Perm p(I) * T(I)
As seen from the worst and best case analysis, ϴ(n^2) <= T(I) <= ϴ(n^2). So,
E[ T(Perm) ] = ∑ I ϵ Perm p(I) * ϴ(n^2) = ϴ(n^2) * ∑ I ϵ Perm p(I) = ϴ(n^2)
As a result of the above analysis, if T(n) is the time complexity of the sort, then
Ω(n^2) <= T(n) <= O(n^2), which implies
T(n) = ϴ(n^2)
4.1.5 Discussion :-
#Swaps = n-1
Selection sort has complexity of the order of n^2 because the strategy is insensitive to the
received permutation. The strategy is just to seek the minimum element of a set and relocate it to its
proper position. The search operation cannot be optimized due to the lack of any
specific information about the input sequence. As a result, brute force linear search has
to be used, resulting in a stationary complexity range. Generally, we can think of a strategy as being nearly
input-insensitive if it results in a stationary complexity range.
(4.2) BUBBLE Sort:-
4.2.1 Strategy:-
Successively relocate the ith order statistic to location i, for decreasing values of i. This is
done by the Decrease-and-Conquer approach. The Dec-Con approach proceeds here by assuming
that the n-i highest order statistics have already been relocated to their respective proper positions. Given this, the
ith order statistic is the maximum of the remaining i elements. Follow the above procedure for decreasing values
of i.
The ith element is relocated to its proper position by pair-wise derangement checks and swaps (if
required). This is very different from the Selection sort strategy. In Selection sort, the ith element does
not find its own way through the sequence, because it is swapped exactly once to reach its
proper location. Here, the ith element finds its own way by moving progressively through a
sequence of swaps, whenever required. The separation between sorted and unsorted elements is
maintained within the array.
4.2.2 Math Model:-
I/p: A sequence <a_1 a_2 ... a_n>
O/p: A permutation sequence of the I/p such that a_k ≤ a_(k+1) for all 1 ≤ k < n
Let SEQ = { x | x is a permutation sequence of the I/p sequence }
N = set of all non-negative integers
(i) To find out-of-order pairs (equation (1)) :-
F : SEQ X N → N
F(<a_1 a_2 ... a_n>, p) = min { i | a_i > a_(i+1), 1 ≤ i < p }, for 1 ≤ p < n, if such an i exists
                        = 0, otherwise
Description:- Given the prefix subsequence <a_1 a_2 ... a_p>, F selects the minimum index at
which adjacent elements are out of order.
(ii) To swap out-of-order pairs (equation (2)) :-
f_swap : SEQ X N → SEQ
f_swap(<a_1 a_2 ... a_n>, k) = <a_1' a_2' ... a_n'>, for 1 ≤ k < n,
such that
a_i' = a_i for 1 ≤ i < k,
a_k' = a_(k+1),
a_(k+1)' = a_k,
a_i' = a_i for k+1 < i ≤ n,
and
f_swap(<a_1 a_2 ... a_n>, 0) = <a_1 a_2 ... a_n>
Description:- Given a sequence <a_1 a_2 ... a_n> in which the elements at indices k and k+1 are out of
order, i.e. a_k > a_(k+1), the elements a_k and a_(k+1) are swapped and all other elements remain the same. This
transforms the original sequence <a_1 a_2 ... a_n> into another sequence <a_1' a_2' ... a_n'>.
If no elements are out of order, this is indicated by the second argument being 0 (since F outputs
0). In such a case, the sequence remains unchanged.
(iii) A pass through the array (equation (3)) :-
f_pass : SEQ X N → SEQ (restricted to 1 ≤ k ≤ n)
f_pass(<a_1 a_2 ... a_n>, k) = f^k_swap(<a_1 a_2 ... a_n>, F(<a_1 a_2 ... a_n>, k)), for 1 ≤ k < n
                             = f^k_swap(x_k, F(x_k, k))
such that
x_i = f^(i-1)_swap(x_(i-1), F(x_(i-1), i-1)), for all 1 < i ≤ k, and
x_1 is either the I/p sequence or any other permutation of the I/p resulting from previous
operations of any function operating on sequences (this can be observed in equation (4) ahead).
Description:-
(1) f^d means the function f applied to itself d times (an iterated function).
(2) k is the length of the subsequence. f_pass with second argument k means the current
subsequence under consideration is <a_1 a_2 ... a_k> (i.e. the first n-k maximum elements have been
settled to their proper positions) and bubble sort is trying to fix the (n-k+1)st maximum element in its
proper position (the current proper position under consideration is k).
(iv) The sequence of passes (equation (4)) :- compute f_pass for 1 ≤ k ≤ n
BS = < f_1 f_2 ... f_(n-1) >
such that
f_i(x_i, n-i+1) = f_pass(<a_1 a_2 ... a_n>, n-i+1), for all 1 ≤ i < n,
and
x_j = f_(j-1)(x_(j-1), n-(j-1)+1), for 2 ≤ j ≤ n-1,
and
x_1 = I/p,
and
O/p = f_(n-1)(x_(n-1), 2)
Description:-
(1) Bubble Sort is a series of passes. For the next pass, the size of the sequence under consideration
reduces by 1, since in each pass 1 element is put into its proper position in the sequence.
(2) In pass i=1, the function f_1 operates on x_1 = the initial sequence as its first argument, with length
n-1+1 = n. It fixes the 1st maximum element at position n. This produces a modified sequence with the
1st maximum at its proper position. This output sequence of f_1 is given to f_2 as x_2. After the
sequence of n-1 functions, the output is the desired O/p sequence.
So, Bubble Sort is a sequence BS (as above) of pass functions (modeled as equation (3)), where
each pass function fixes exactly 1 element at its proper place in the sequence. Each of these
pass functions fixes the proper element at its proper position by swapping (if required); thus,
swapping is modeled as equation (2). Finally, to swap only when required, we need a
function which can tell us if any adjacent pair is out of order; this function is modeled as
equation (1).
4.2.3 Algorithm Extraction from Model :-
In the above model, each function f_i in the sequence BS depends on f_pass, which in turn depends on f_swap, which
finally depends on F. So, from a very rough overview, the BS sequence forms the outer core of the procedure. BS
is a sequence of the same function operating over different lengths of the i/p sequence. Each function is a
group of operations. So, in BS, that same group of operations is executed one after the other, n-1
times. Algorithmically, we can therefore express the sequence BS as a Repeat statement.
1.  i = 1
2.  Repeat (n-1 times)            // because BS is of length n-1
3.      Evaluate function f_i of BS (i.e. evaluate f_pass on length n-i+1)
4.      Increment i
Algo 1. Algorithmic analog of the mathematical sequence BS
Going ahead, each f_i is an f_pass function operating on a sequence of length n-i+1. f_pass is itself a repeated
evaluation of the function f_swap, n-i+1 times. So, step 3 itself can be expressed algorithmically
as another Repeat statement. This Repeat is done for each execution of step 2. So, extending Algo 1,
1.  i = 1                         // i tracks the pass number
2.  Repeat (n-1 times)            // because BS is of length n-1
3.      k = 1                     // k tracks the position inside a pass
4.      Repeat (n-i+1 times)
5.          Evaluate function f_swap
6.          Increment k
7.      Increment i
Algo 2.
Further, step 5 in Algo 2 is a swap function which should swap only after evaluating the function F. If
F returns 0, then no action should be taken, because the sequence is left unchanged at that point of
execution. This conditional behavior can be expressed algorithmically as an if-else structure. We don't
require the else part since, as said above, no action is to be taken if F returns 0. So, naturally, the evaluation of
the function F becomes the if test condition. So, extending Algo 2,
1.  i = 1                         // i tracks the pass number
2.  Repeat (n-1 times)            // because BS is of length n-1
3.      k = 1                     // k tracks the position inside a pass
4.      Repeat (n-i+1 times)
5.          If ( F evaluates to the non-zero value k ) then
6.              swap the entries at indices k and k+1
7.          Increment k
8.      Increment i
Algo 3.
Now, since every operation is on sequences, we have to introduce variables to denote sequences.
Initially the variable will represent the i/p sequence. Also, step 6 talks about swapping. At this stage, the swap
operation can be expressed in many different ways, so there is no real need to expand step 6 further to specify
its form. But F in step 5 is actually a minimum-index-finding function, which is a complex evaluation.
Since we are looking for an algorithm on serial processing machines, minimum finding is itself a search
over all indices one by one, starting from the minimum index 1. We can use the variable k introduced in step 3
of Algo 3. Extending Algo 3,
1.  s = i/p sequence, s' = i/p sequence
2.  i = 1                         // i tracks the pass number
3.  Repeat (n-1 times)            // because BS is of length n-1
4.      s = s'
5.      k = 1                     // k tracks the position inside a pass
6.      Repeat (n-i+1 times)
7.          If ( k = F(s', k) ) then
8.              swap s'(k) and s'(k+1)
9.              s' = old s' with the swap changes effected
10.         Increment k
11.     Increment i
12. s' is the desired o/p sequence in increasingly sorted order
Algo 4.
Below is the refined expression of the above Algo 4. (The labels (1)-(4) indicate which model equation each line realizes.)
Bubble Sort Algorithm :- (heavy elements push down approach)
For ( i = 1 to n-1 )                       ....(4)    f1 = n
    For ( k = 1 to n-i )                   ....(3)    f2 = n-i+1
        If ( a_k > a_(k+1) )               ....(1)    f3 = n-i
            Swap a_k with a_(k+1)          ....(2)    f4 <= n-i
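The refined algorithm above, as a runnable sketch (Python for illustration):

# Bubble sort, heavy-elements-push-down approach: pass i fixes the i-th
# maximum at the right end of the unsorted prefix.
def bubble_sort(a):
    n = len(a)
    for i in range(1, n):                # passes 1 .. n-1
        for k in range(0, n - i):        # positions inside the pass
            if a[k] > a[k + 1]:          # adjacent pair out of order
                a[k], a[k + 1] = a[k + 1], a[k]
    return a

print(bubble_sort([10, 6, 3, 4, 8]))     # [3, 4, 6, 8, 10]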
4.2.4 Space/Time Complexity Analysis :-
4.2.4.1 Space :-
Space(n) = Variable space ( ϴ(1) ) + Stack Depth ( ϴ(1) ) = ϴ(1)
4.2.4.2 Time :-
4.2.4.2.1 Worst Case Analysis :- is when the step labeled (2) executes up to its upper bound.
T(n) = ∑ (i=1 to n) ∑ (j=1 to i) ( ϴ(1) + ϴ(1) ) = ϴ( n(n+1)/2 ) = ϴ(n^2)
This is the case when the input sequence comes in reverse sorted order.
4.2.4.2.2 Best Case Analysis :- is when the input sequence already comes in the requisite sorted order, so
that the step labeled (2) does not execute at all.
T(n) = ∑ (i=1 to n) ∑ (j=1 to i) ϴ(1) = ϴ( n(n+1)/2 ) = ϴ(n^2)
4.2.4.2.3 Average Case Analysis :- same as that of Selection sort, owing to the fact that the best
and worst case complexities remain stationary.
As a result of the above analysis, if T(n) is the time complexity of the sort, then
Ω(n^2) <= T(n) <= O(n^2), which implies
T(n) = ϴ(n^2)
4.2.5 Discussion :-
The discussion here goes along the same lines as that for Selection sort. There are a few
changes, however. Owing to the fact that the ith element finds its way through the sequence by
progressive swaps, the number of swaps is a lot more than in Selection sort. As
observed, the number of swaps here obeys
0 <= #swaps <= n(n-1)/2 = O(n^2)
The difference is not only in the number of swaps. The number of swaps here depends on how bad a
permutation has been received. In the best case, #swaps = 0. Tracking #swaps from pass to pass can
therefore be used to decide whether the next pass is even necessary. If in the ith pass #swaps = 0,
then this directly implies that the array is already sorted, since the highest i-1 order statistics have
been repositioned properly and #swaps = 0 for the remaining elements. This provides a way to
optimize the sort to respond to the best case and to nearly best case instances.
Tracking #swaps for a pass requires ϴ(1) extra space for a tracking variable, and adds only ϴ(1) to the
time complexity per pass to set the tracking variable when required. The extra condition of checking the
tracking variable for zero, before initiating the next pass, requires ϴ(1) time per pass.
All of the above implies that the addition to the space and run time complexity is only a
constant, while the lower bound, i.e. the best case time complexity, reduces to Ω(n). Thus, the modified
bubble sort has the following run time analysis:
if T(n) is the time complexity of the sort, then
Ω(n) <= T(n) <= O(n^2)
The improvement was possible because the strategy of checking pair-wise adjacent elements for
derangement is exactly also a strategy for checking whether the sequence is in sorted order. This sort can
be said to be sensitive to the sequence of elements. This is what allowed us to optimize the sort, which is not the
case with the Selection sort discussed earlier. Note also that, to optimize the algorithm, not much
change was needed to the basic strategy of the sort.
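A sketch of the modified sort with swap tracking (the flag name swapped is our own):

# Modified bubble sort: track swaps per pass; if a pass makes no swap,
# the sequence is already sorted and the remaining passes are skipped.
def bubble_sort_optimized(a):
    n = len(a)
    for i in range(1, n):
        swapped = False                   # Theta(1) extra space for tracking
        for k in range(0, n - i):
            if a[k] > a[k + 1]:
                a[k], a[k + 1] = a[k + 1], a[k]
                swapped = True
        if not swapped:                   # best case: Theta(n) total work
            break
    return a

print(bubble_sort_optimized([1, 2, 3, 4, 5]))   # one pass, no swaps, stops early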