US03CBCA03 (Advanced Data & File Structure) Unit - III
CHARUTAR VIDYA MANDAL’S SEMCOM, Vallabh Vidyanagar
Faculty Name: Ami D. Trivedi
Class: SYBCA
Subject: US03CBCA03 (Advanced Data & File Structure)

UNIT – 3 (SORTING AND SEARCHING)

INTRODUCTION TO SORTING
Sorting is the task of rearranging data into an order such as ascending, descending or lexicographic. When records are stored in a file, sorting also means rearranging the set of records based on their key values.

Sorting is a fundamental operation in Computer Science. It is frequently performed in business data processing applications, and it has also become important in many scientific applications.

Humans can perform sorting naturally. A computer program, however, has to follow a sequence of exact instructions to sort. This sequence of instructions is called an algorithm. A sorting algorithm is a method that puts a list of unordered items into an ordered sequence. Various sorting algorithms exist, and they differ in their efficiency and performance. Some important and well-known sorting algorithms are bubble sort, selection sort, insertion sort and quick sort.

Sorting techniques can be classified into two broad categories: internal sorting and external sorting.
Internal sort: When a set of data is small enough that the entire sort can be performed in the computer’s internal storage (primary memory), the sorting is called internal sort.
External sort: Sorting a large set of data that is stored in the computer’s low-speed external memory (such as a hard disk or magnetic tape) is called external sort. It involves a large amount of data transfer between external memory (low speed) and main memory (high speed).

Many sorting techniques have been developed, for example:
Bubble sort, Selection sort, Merge sort, Insertion sort, Quick sort, Heap sort, Shuttle sort, Radix sort, Bucket sort, Flash sort, Address calculation sort, Partition exchange sort, Two way merge sort, Shell sort / Comb sort, Simple pancake sort, Spaghetti (Poll) sort, Distribution sort, Tournament sort, LSD Radix sort, MSD Radix sort, Postman sort, Mineral sort, Shaker sort, Timsort, Introsort, Cycle sort, Library sort, Strand sort, Smoothsort, Bogosort, Counting sort, Cocktail sort, Gnome sort, Pigeonhole sort, Spread sort, Burst sort, Stooge sort, Sample sort, Odd-even sort, Bead sort.

A programmer has to choose from a variety of sorting methods. Three points that should affect the programmer’s decision are:
1. Programming time
2. Execution time of the program
3. Memory or secondary memory space required

BASIC SORTING TECHNIQUES

1. BUBBLE SORT
Algorithm: BUBBLE_SORT (K, N)
This algorithm sorts the elements of vector K into ascending order.
K       Vector (array)
N       Number of elements in the vector
PASS    Pass counter
LAST    Position of the last unsorted element
I       Index (subscript) used for vector elements
EXCHS   Number of exchanges made in a pass

1. [ Initialize ]
   LAST ← N (entire list assumed unsorted at this point)
2. [ Loop on pass index ]
   Repeat thru step 5 for PASS = 1, 2, ..., N-1
3. [ Initialize exchanges counter for this pass ]
   EXCHS ← 0
4. [ Perform pairwise comparisons on unsorted elements ]
   Repeat for I = 1, 2, ..., LAST-1
       If K[I] > K[I+1] then
           K[I] ↔ K[I+1]
           EXCHS ← EXCHS + 1
5. [ Were any exchanges made on this pass? ]
   If EXCHS = 0 then
       Return (mission accomplished; return early)
   else
       LAST ← LAST - 1 (reduce size of unsorted list)
6. [ Finished ]
   Return (maximum number of passes required)

Explanation
Bubble sort is a well-known sorting method. In this algorithm, at most n-1 passes (rounds) are required, where n is the number of elements. During the 1st pass, elements K1 and K2 are compared.
If K1 is greater than K2, they are interchanged (swapped). This process is repeated for K2 and K3, K3 and K4, and so on. In this way large values sink towards the end of the vector, while small values move up towards the beginning one position per pass, like a rising bubble. After the 1st pass, the largest value is at the nth position. In each later pass, the next largest element is placed at position n-1, n-2, ..., 2 respectively, and the smallest element ends up at position 1.

After each pass, a check is made to find out whether any interchanges (exchanges) were made in that pass. If no interchange was required, the data is already sorted and no further pass is needed.

Step-by-step example
Let us take the array of numbers "5 1 4 2 8" and sort it from the lowest number to the greatest number using bubble sort. Each line below shows one comparison of an adjacent pair. Three passes are required.

First Pass:
( 5 1 4 2 8 ) → ( 1 5 4 2 8 )   Compares the first two elements and swaps them, since 5 > 1
( 1 5 4 2 8 ) → ( 1 4 5 2 8 )   Swap since 5 > 4
( 1 4 5 2 8 ) → ( 1 4 2 5 8 )   Swap since 5 > 2
( 1 4 2 5 8 ) → ( 1 4 2 5 8 )   These elements are already in order (8 > 5), so they are not swapped
Second Pass:
( 1 4 2 5 8 ) → ( 1 4 2 5 8 )
( 1 4 2 5 8 ) → ( 1 2 4 5 8 )   Swap since 4 > 2
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Third Pass:
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
The array is already sorted after the second pass, but the third pass is needed to confirm that no exchange is made. (BUBBLE_SORT reduces LAST after each pass, so it actually skips the comparisons with elements already placed at the end. All four comparisons are shown here for clarity.)

Advantages of Bubble sort
1. Easy to understand.
2. Easy to implement.
3. Works well for almost sorted data, because it stops as soon as a pass makes no exchange.

Disadvantages of Bubble sort
A large amount of data movement is required if the data is in random order or in reverse sorted order.

2. SELECTION SORT
Algorithm: SELECTION_SORT (K, N)
This algorithm sorts the elements of vector K into ascending order.
K          Vector (array)
N          Number of elements in the vector
PASS       Pass counter, and position of the first element in the vector that is checked during a particular pass
MIN_INDEX  Position of the smallest element found so far in a particular pass
I          Index (subscript) used for vector elements

1. [ Loop on pass index ]
   Repeat thru step 4 for PASS = 1, 2, ..., N-1
2. [ Initialize minimum index ]
   MIN_INDEX ← PASS
3. [ Make a pass and obtain element with smallest value ]
   Repeat for I = PASS + 1, PASS + 2, ..., N
       If K[I] < K[MIN_INDEX] then
           MIN_INDEX ← I
4. [ Exchange elements ]
   If MIN_INDEX ≠ PASS then
       K[PASS] ↔ K[MIN_INDEX]
5. [ Finished ]
   Return

Step-by-step example
[Figure: step-by-step example of selection sort]

Explanation
Selection sort is one of the simplest ways to sort. In this algorithm, n-1 passes (rounds) are required, where n is the number of elements.
In the 1st pass we begin with the first element K1 of the list and consider it the minimum. Its position is remembered as the minimum position. Then K2 is compared with the element at the minimum position. If K2 is smaller, the position of K2 is remembered as the minimum position. This process is repeated for K3, K4 and so on. At the end of the 1st pass we have the position of the smallest element of the list, and the element at this position is interchanged with the 1st element. No interchange is required if the minimum position never changed, that is, if K1 is already the minimum.
In the 2nd pass we begin with the second element K2, consider it the minimum, and repeat the above procedure. After n-1 passes, the array is sorted.

Advantages of Selection sort
1. Easy to understand.
2. Easy to implement.
3. Usually faster than bubble sort on random data, because each pass requires at most one interchange of data.
4. It performs well on a small list.

Disadvantages of Selection sort
Takes more time than bubble sort for almost sorted data, because all n-1 passes must always be performed.

3. MERGE SORT
Algorithm: SIMPLE_MERGE (K, FIRST, SECOND, THIRD)
This algorithm merges two ordered subvectors of K into a single subvector in ascending order.
K        Vector (array) containing the two ordered subvectors
TEMP     Temporary vector
FIRST    Position of the first element of the first subvector in K
SECOND   Position of the first element of the second subvector in K
THIRD    Position of the last element of the second subvector in K
I        Index (subscript) used for elements of the first subvector
J        Index (subscript) used for elements of the second subvector
L        Index (subscript) used for elements of TEMP

1. [ Initialize ]
   I ← FIRST
   J ← SECOND
   L ← 0
2. [ Compare corresponding elements and output the smallest ]
   Repeat while I < SECOND and J ≤ THIRD
       If K[I] ≤ K[J] then
           L ← L + 1
           TEMP[L] ← K[I]
           I ← I + 1
       else
           L ← L + 1
           TEMP[L] ← K[J]
           J ← J + 1
3. [ Copy the remaining unprocessed elements to the output area ]
   If I ≥ SECOND then
       Repeat while J ≤ THIRD
           L ← L + 1
           TEMP[L] ← K[J]
           J ← J + 1
   else
       Repeat while I < SECOND
           L ← L + 1
           TEMP[L] ← K[I]
           I ← I + 1
4. [ Copy elements of temporary vector into original area ]
   Repeat for I = 1, 2, ..., L
       K[FIRST - 1 + I] ← TEMP[I]
5. [ Finished ]
   Return

Explanation
The operation of sorting is closely related to merging. This algorithm merges two sorted subvectors into a single sorted subvector. It does this by repeatedly selecting the smaller of the two front elements of the subvectors and placing it in a new vector.

Here, the two sorted subvectors are stored in a common vector K as follows:
K:  11  23  42  9  25
FIRST points to 11 (position 1), SECOND points to 9 (position 4) and THIRD points to 25 (position 5).
The elements of the first subvector are stored from position FIRST to SECOND-1, and the elements of the second subvector are stored from position SECOND to THIRD.

A loop compares the elements of both subvectors. The 1st element of the first subvector is compared with the 1st element of the second subvector, and the smaller of the two is copied to the temporary vector. The subscript of the subvector whose element was copied is incremented by one, so that it points to the next element of the same subvector. The loop terminates when the end of one of the subvectors is reached. Then the remaining elements of the other subvector are copied to the temporary vector.
Finally, the sorted elements in the temporary vector are copied back into the original vector.

Note: This algorithm can be generalized to merge k sorted tables into a single sorted table. Such a merging operation is called multiple merging or k-way merging.

Step-by-step example
Initially I points to 11 (FIRST), J points to 9 (SECOND), and TEMP is empty (L = 0).
Step 1: Compare 11 and 9. 9 is smaller, so TEMP: 9, and J moves to 25.
Step 2: Compare 11 and 25. 11 is smaller, so TEMP: 9 11, and I moves to 23.
Step 3: Compare 23 and 25. 23 is smaller, so TEMP: 9 11 23, and I moves to 42.
Step 4: Compare 42 and 25. 25 is smaller, so TEMP: 9 11 23 25, and J moves past THIRD (the second subvector is exhausted).
Step 5: Copy the remaining element 42, so TEMP: 9 11 23 25 42.
TEMP is then copied back into K, which becomes 9 11 23 25 42.

Advantages of Merge sort
Already sorted lists are easily merged into a new sorted list.

Disadvantages of Merge sort
Merge sort requires extra storage space for the temporary vector.

APPLICATION OF SORTING
Sorting algorithms are essential in a broad variety of applications.

1. Commercial computing
Government organizations, financial institutions and commercial enterprises organize much of their information by sorting it. Accounts may be sorted by name or number, transactions by time or place, mail by postal code or address, and files by name or date. Processing such data requires a sorting algorithm.

2. Search for information
Keeping data in sorted order makes it possible to search through it efficiently using the classic binary search algorithm. Speeding up searching is perhaps the most important application of sorting.

3. Operations research
Sorting jobs by processing time can maximize customer satisfaction by minimizing the average completion time of the jobs. Suppose that we have N jobs to complete, where job j requires t_j seconds of processing time. We need to complete all of the jobs, but want to maximize customer satisfaction by minimizing the average completion time of the jobs.
We schedule the jobs in increasing order of processing time, following the "shortest processing time first" rule, to accomplish this goal.

4. Event-driven simulation
Many scientific applications involve simulation, which models some aspect of the real world in order to understand it better. In event-driven simulation, pending events are kept sorted by event time, which saves the time required to find the next event.
Take the example of a bank simulator. We are given the number of cashiers, the arrival time of each customer and the time required to serve each customer. Our goal is to design a simulator that tells us how long each customer waits in line. Start a simulation clock at 0 ticks. At each iteration, advance the clock to the time of the next event. Pending events are organized as a priority queue, sorted by event time.

5. Numerical computations
Scientific computing is often concerned with accuracy (how close are we to the true answer?). Some numerical algorithms use priority queues and sorting to control accuracy in calculations. Accuracy is extremely important when we perform millions of computations with estimated values, such as the floating-point representation of real numbers that computers commonly use.
For example, one way to do numerical integration (where the goal is to estimate the area under a curve) is to maintain a priority queue with accuracy estimates for a set of subintervals that comprise the whole interval. The process is to remove the least accurate subinterval, split it in half (thus achieving better accuracy for the two halves) and put the two halves back onto the priority queue, continuing until the desired tolerance is reached.

6. String processing
String-processing algorithms are often based on sorting. For example, one algorithm for finding the longest repeated substring in a given string is based on first sorting the suffixes of the string.

7. Records with multiple keys
Sorting is done on a different key as per the requirement. For example,
transaction data can be sorted on customer number or on date. In typical applications, records have multiple data members that might need to serve as sort keys. For example, one client may need to sort the transaction list by account number, another client might need to sort the list by place, and other clients might need to use other fields as sort keys.

8. Display Google PageRank results
PageRank results are useful for judging the importance of a page: important pages have a high page rank, so arranging pages according to their page rank helps to find reputable pages. PageRank is an algorithm that calculates a web metric showing how reputable a particular page is according to Google. This rank depends not only on the quality and quantity of the incoming links but also on several other parameters, such as the number of outgoing links per page (on the linking webpage) and the position and visibility of the links.

9. Find the median
The median of a finite list of numbers can be found by arranging all the observations from lowest value to highest value and picking the middle one. If there is an even number of observations, there is no single middle value; the median is then usually defined to be the mean of the two middle values.

10. Frequency distribution and finding the mode
The mode of a set of n items is the element that occurs the largest number of times. A frequency distribution displays the number of observations within a given interval. For both of these tasks, sort the items and then do a linear scan.

11. Find the closest pair
Given n numbers, the closest pair is the pair of numbers whose difference is smallest. Once the numbers are sorted, the closest pair will be next to each other in sorted order, so a single linear scan over the sorted data quickly finds it.

12. Identify statistical outliers
In statistics, an outlier is an observation that is numerically distant from the rest of the data.
The first step in finding outliers is to sort the data.

13. Find duplicates in a mailing list

14. Organize an MP3 library

15. Element uniqueness
Given a set of n items, are they all unique or are there any duplicates? To check this, sort the items and do a linear scan that checks all adjacent pairs.

16. Stability
A sorting method is stable if it preserves the relative order of equal keys in the array. For example, suppose that in our internet commerce application we enter transactions into an array as they arrive, so they are in order of the time field in the array. Now suppose that the application requires the transactions to be separated out by location for further processing. One easy way to do so is to sort the array by location. If the sort is unstable, the transactions for each city may not be in order by time after the sort. Some sorting methods are stable (insertion sort and merge sort); many are not (selection sort, shell sort, quick sort and heap sort).

INTRODUCTION TO SEARCHING
Searching is the process of finding an element within a list of elements. The list may be stored in sorted order or in random order. A search algorithm is an algorithm for finding an item with specified properties among a collection of items.

BASIC SEARCHING TECHNIQUES
Searching techniques are divided into two categories:
1. Linear search (sequential search)
2. Binary search

1. LINEAR SEARCH
Suppose we want to search for an element in a given unordered list of elements. The simplest technique is to scan every element sequentially and check whether it is the desired element. The search is unsuccessful if all the elements are accessed and the desired element is not found. If the desired element is in the first position, only one comparison is required. If the desired element is in the last position, n comparisons are required.
Here n is the number of elements. Linear searching is the most basic and simple method of searching.

Algorithm: LINEAR_SEARCH (K, N, X)
This algorithm searches for an element in an unordered or ordered vector.
K   Vector (array) of N+1 elements; position N+1 is used to hold a copy of X (a sentinel)
N   Number of elements in the vector
X   Element to be searched
I   Index (subscript) used for vector elements

1. [ Initialize search ]
   I ← 1
   K[N+1] ← X
2. [ Search the vector ]
   Repeat while K[I] ≠ X
       I ← I + 1
3. [ Successful search? ]
   If I = N + 1 then
       Write ('UNSUCCESSFUL SEARCH')
       Return (0)
   else
       Write ('SUCCESSFUL SEARCH')
       Return (I)

Explanation
In the first step, we store the element to be searched at position n+1 of the array, so a sequential search is performed on n+1 elements. A loop compares the array elements with X (the element to be searched). The loop terminates when the desired element is found, and at that time I contains the position of the desired element in the array. If the element is not found up to the nth position, the element at position n+1 (which is X itself) matches X, and the loop terminates with I equal to n+1. Because this copy of X guarantees that the loop stops, no separate end-of-array check is needed inside the loop. After the loop, if I is equal to n+1 the search is unsuccessful; otherwise it is successful.

Step-by-step example
[Figure: linear search trace, searching for key = 05]

Advantages of Linear search
1. Linear searching is the most basic and simple method of searching.
2. Easy to implement.
3. Useful for searching for an element in an unordered or ordered list.

Disadvantages of Linear search
Linear search is time-consuming for large lists, because in the worst case every element must be compared.

2. BINARY SEARCH
Binary search is a very efficient algorithm. Each comparison halves the search interval, so it finds the given item with far fewer comparisons than linear search: at most ⌊log2 n⌋ + 1 probes of the middle element. The array elements must be sorted in increasing order to perform binary search. Because binary search takes less time to search for an element in a sorted list, it is more efficient than linear search. However, updating an ordered array after insertions or deletions is a time-consuming task.
So binary search is not useful when the array changes often.

Algorithm: BINARY_SEARCH (K, N, X)
This algorithm searches for an element in an ordered vector.
K       Vector (array) sorted in ascending order
N       Number of elements in the vector
X       Element to be searched
LOW     Lower limit of search interval
MIDDLE  Middle of search interval
HIGH    Upper limit of search interval

1. [ Initialize ]
   LOW ← 1
   HIGH ← N
2. [ Perform search ]
   Repeat thru step 4 while LOW ≤ HIGH
3. [ Obtain index of midpoint of interval ]
   MIDDLE ← ⌊(LOW + HIGH) / 2⌋
4. [ Compare ]
   If X < K[MIDDLE] then
       HIGH ← MIDDLE - 1
   else if X > K[MIDDLE] then
       LOW ← MIDDLE + 1
   else
       Write ('SUCCESSFUL SEARCH')
       Return (MIDDLE)
5. [ Unsuccessful search ]
   Write ('UNSUCCESSFUL SEARCH')
   Return (0)

Step-by-step example
[Figures: binary search in case of successful search; binary search in case of unsuccessful search]

Explanation
The logic of this technique is:
1. First find the middle element of the array.
2. Compare the middle element with X.
3. There are 3 possibilities:
   a. It is the desired element, so the search is successful.
   b. If X is less than the middle element, search only the first half of the array.
   c. If X is greater than the middle element, search only the second half of the array.
For case b, the new search area is from the lower limit to middle-1. For case c, the new search area is from middle+1 to the upper limit. Repeat the same steps until the element is found or the search area becomes empty, which means the search is unsuccessful.

Advantages of Binary search
1. Binary search is a very efficient algorithm.
2. It requires fewer comparisons than linear search.

Disadvantages of Binary search
1. Binary search is not useful when the array elements change frequently.
2. The array must be sorted to perform binary search.

APPLICATION OF SEARCHING
1. Search algorithms can be used to find solutions or objects with specified properties and constraints in a large solution search space or among a collection of objects.
2. Some search algorithms are designed for the prospective quantum computer. A quantum computer is a device that uses quantum-mechanical phenomena (quantum physics) to perform operations on data.
3. In text editors, we might want to search through a very large document for the occurrence of a given string.
4. In text retrieval tools, we may want to search through thousands of such documents.
5. String matching algorithms are used as part of more complex algorithms (e.g., the Unix program "diff", which works out the differences between two similar text files). String matching (string searching) algorithms try to find places where one or more strings (patterns) occur within a larger string or text.
6. To search in binary strings (i.e., sequences of 0s and 1s). For example, the "pbm" graphics format is based on sequences of 1s and 0s.
7. Implementing a "switch() ... case:" construct in a virtual machine where the case labels are individual integers. With 100 cases, binary search finds the correct entry in 6 to 7 steps (log2 100 ≈ 6.6), whereas a sequence of conditional branches takes on average about 50 comparisons.
8. Binary search is widely used in 3D games and applications. Space is divided into a tree structure, and a binary search is used to find which subdivisions to display according to a 3D position and camera.
9. Binary search can also find non-exact matches (closest matches).

SORTING V/S SEARCHING
1. Sorting: The process of arranging data elements or data records in a data structure in a particular order is called sorting.
   Searching: The process of finding data elements or data records in a data structure is called searching.
2. Sorting: There are various sorting techniques, such as selection sort, bubble sort, insertion sort, quick sort, merge sort and shell sort.
   Searching: The two basic searching techniques are sequential search and binary search.
3. Sorting: After sorting, the positions of data elements or data records are changed.
   Searching: After searching, the positions of data elements or data records are not changed.
4. Sorting: After sorting, searching becomes easy.
   Searching: Without sorting, searching is slower, because binary search cannot be used.
5. Sorting: The output of a sorting algorithm is the sorted elements.
   Searching: The output of a searching algorithm is a successful or unsuccessful search.
6. Sorting: If insertions and deletions occur very frequently, keeping a large array sorted is time-consuming.
   Searching: Insertion and deletion in an unordered array only slightly increase or decrease the number of comparisons in linear search.

Disclaimer
The study material is compiled by Ami D. Trivedi. The basic objective of this material is to supplement teaching and discussion in the classroom in the subject. Students are required to go for extra reading in the subject through library work.
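The BUBBLE_SORT algorithm of this unit can be sketched in Python as below. This is a minimal illustration, not the author's code. The function name bubble_sort, the 0-based list indexes (position I of the pseudocode is index I-1) and returning the number of passes are assumptions made only for this example.

```python
def bubble_sort(k):
    """Sort list k in place into ascending order, following BUBBLE_SORT.

    Returns the number of passes made (at most len(k) - 1).
    """
    n = len(k)
    last = n - 1                  # 0-based index of the last unsorted element (LAST)
    passes = 0
    for _ in range(n - 1):        # at most n-1 passes
        passes += 1
        exchs = 0                 # exchanges made in this pass (EXCHS)
        for i in range(last):     # compare adjacent pairs in the unsorted part
            if k[i] > k[i + 1]:
                k[i], k[i + 1] = k[i + 1], k[i]
                exchs += 1
        if exchs == 0:            # no exchange: the list is sorted, return early
            break
        last -= 1                 # the largest unsorted element is now in place
    return passes


data = [5, 1, 4, 2, 8]
print(bubble_sort(data), data)    # 3 [1, 2, 4, 5, 8]
```

On the array 5 1 4 2 8 from the step-by-step example, it makes three passes, the same as the trace. The third pass only confirms that no exchange is needed.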
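A possible Python sketch of the SELECTION_SORT algorithm follows. It is illustrative only. The name selection_sort, the 0-based indexes and the returned count of interchanges are assumptions for this example. The count shows that each pass performs at most one interchange.

```python
def selection_sort(k):
    """Sort list k in place into ascending order, following SELECTION_SORT.

    Returns the number of interchanges performed (at most one per pass).
    """
    n = len(k)
    swaps = 0
    for start in range(n - 1):              # n-1 passes; start plays the role of PASS
        min_index = start                   # assume the first unsorted element is smallest
        for i in range(start + 1, n):       # scan the rest of the list
            if k[i] < k[min_index]:
                min_index = i
        if min_index != start:              # interchange only when needed
            k[start], k[min_index] = k[min_index], k[start]
            swaps += 1
    return swaps


data = [5, 1, 4, 2, 8]
print(selection_sort(data), data)           # 2 [1, 2, 4, 5, 8]
```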
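The SIMPLE_MERGE algorithm can be sketched in Python as follows, using 0-based indexes for FIRST, SECOND and THIRD. The names simple_merge and merge_sort are hypothetical. The unit presents only the merge step. The recursive merge_sort shown here is an assumed extension that sorts each half and then merges the two halves with simple_merge.

```python
def simple_merge(k, first, second, third):
    """Merge the ordered runs k[first..second-1] and k[second..third] in place."""
    temp = []
    i, j = first, second
    # Compare the front elements of both runs and output the smaller one.
    while i < second and j <= third:
        if k[i] <= k[j]:            # <= keeps equal keys in their original order
            temp.append(k[i])
            i += 1
        else:
            temp.append(k[j])
            j += 1
    # Copy whatever is left of the run that is not yet exhausted.
    temp.extend(k[i:second])
    temp.extend(k[j:third + 1])
    # Copy the merged result back into the original area of k.
    k[first:third + 1] = temp


def merge_sort(k, first=0, last=None):
    """Sort k[first..last] by sorting each half and merging them (assumed extension)."""
    if last is None:
        last = len(k) - 1
    if first < last:
        mid = (first + last) // 2
        merge_sort(k, first, mid)
        merge_sort(k, mid + 1, last)
        simple_merge(k, first, mid + 1, last)


k = [11, 23, 42, 9, 25]
simple_merge(k, 0, 3, 4)            # runs 11 23 42 and 9 25
print(k)                            # [9, 11, 23, 25, 42]
```

The call on 11 23 42 | 9 25 reproduces the step-by-step example. TEMP is a Python list here, so the extra storage that merging needs is visible in the code.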
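The "shortest processing time first" rule from the operations research application can be checked with a small calculation. In the sketch below, the processing times 3, 1 and 2 are made-up values and the function name average_completion_time is hypothetical. The jobs run back to back, and Fraction keeps the averages exact.

```python
from fractions import Fraction


def average_completion_time(times):
    """Average completion time when jobs run back to back in the given order."""
    clock = 0
    total = 0
    for t in times:
        clock += t          # this job completes when the clock reaches this value
        total += clock
    return Fraction(total, len(times))


jobs = [3, 1, 2]                                    # hypothetical processing times t_j
print(average_completion_time(jobs))                # 13/3 (original order)
print(average_completion_time(sorted(jobs)))        # 10/3 (shortest processing time first)
```

In the original order the completion times are 3, 4 and 6. After sorting they are 1, 3 and 6, which gives the smaller average.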
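Several of the applications of sorting (median, mode, closest pair, element uniqueness) follow the same pattern: sort the items once, then do a single linear scan. The sketch below illustrates that pattern. The function names are hypothetical, and the inputs are assumed non-empty (closest_pair needs at least two values). On a tie, mode returns the smallest of the most frequent values.

```python
def median(values):
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]                       # odd count: the middle value
    return (s[mid - 1] + s[mid]) / 2        # even count: mean of the two middle values


def mode(values):
    s = sorted(values)                      # equal values become adjacent runs
    best, best_count = s[0], 1
    run_value, run_count = s[0], 1
    for v in s[1:]:
        if v == run_value:
            run_count += 1
        else:
            run_value, run_count = v, 1
        if run_count > best_count:
            best, best_count = run_value, run_count
    return best


def closest_pair(values):
    s = sorted(values)                      # the closest pair is now adjacent
    return min(zip(s, s[1:]), key=lambda p: p[1] - p[0])


def all_unique(values):
    s = sorted(values)                      # duplicates are now adjacent
    return all(a != b for a, b in zip(s, s[1:]))


print(median([5, 1, 4, 2, 8]))              # 4
print(closest_pair([10, 3, 7, 15, 8]))      # (7, 8)
```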
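Stability can be seen by sorting records that have equal keys. In the sketch below, the records and the helper selection_sort_by are hypothetical. Python's built-in sorted is documented to be stable, so records with equal keys keep their input order. A selection sort that swaps the minimum into place can move an earlier equal record behind a later one, which shows why selection sort is listed as unstable.

```python
def selection_sort_by(records, key):
    """Selection sort on a copy of records, comparing key(record) values."""
    r = list(records)
    for start in range(len(r) - 1):
        m = start
        for i in range(start + 1, len(r)):
            if key(r[i]) < key(r[m]):
                m = i
        r[start], r[m] = r[m], r[start]     # this long-distance swap breaks stability
    return r


# (key, arrival label): "a" arrived before "b", and both have key 2
records = [(2, "a"), (2, "b"), (1, "c")]
print(sorted(records, key=lambda r: r[0]))             # [(1, 'c'), (2, 'a'), (2, 'b')]
print(selection_sort_by(records, key=lambda r: r[0]))  # [(1, 'c'), (2, 'b'), (2, 'a')]
```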
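The sentinel technique of LINEAR_SEARCH can be sketched in Python as shown below. It is an illustration under stated assumptions. The name linear_search is hypothetical, and the function returns 1-based positions (0 for an unsuccessful search), like the pseudocode. Instead of reserving position N+1, it temporarily appends X to the list and removes it before returning, so the caller's list is left unchanged. It returns a value rather than writing messages.

```python
def linear_search(k, x):
    """Sentinel linear search: 1-based position of x in k, or 0 if x is absent."""
    n = len(k)
    k.append(x)          # K[N+1] <- X: the sentinel guarantees that the loop stops
    i = 0
    while k[i] != x:     # no separate end-of-list test is needed
        i += 1
    k.pop()              # remove the sentinel again
    return 0 if i == n else i + 1


data = [10, 5, 7]
print(linear_search(data, 5))    # 2
print(linear_search(data, 99))   # 0
```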
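The BINARY_SEARCH algorithm can be sketched in Python as follows. It is a minimal illustration. The name binary_search and the sample list are assumptions. LOW, MIDDLE and HIGH are kept 1-based as in the pseudocode, so K[MIDDLE] becomes k[middle - 1]. X is compared with the element K[MIDDLE], not with the index MIDDLE.

```python
def binary_search(k, x):
    """Search list k (sorted ascending) for x: 1-based position, or 0 if absent."""
    low, high = 1, len(k)                 # search interval, 1-based as in the pseudocode
    while low <= high:
        middle = (low + high) // 2        # floor of the midpoint
        if x < k[middle - 1]:             # X < K[MIDDLE]: keep the first half
            high = middle - 1
        elif x > k[middle - 1]:           # X > K[MIDDLE]: keep the second half
            low = middle + 1
        else:
            return middle                 # successful search
    return 0                              # empty interval: unsuccessful search


a = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
print(binary_search(a, 23))    # 6
print(binary_search(a, 7))     # 0
```

For 23 the probes are positions 5, 8 and 6. For 7 the interval becomes empty after positions 5, 2 and 3. Both cases stay within ⌊log2 10⌋ + 1 = 4 probes.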