Govt. Polytechnic for Women, Morni Hills, Panchkula
Department of Computer Engineering
E-Session Notes
Subject: Data Structure; Semester: 3rd
For any enquiry, please contact the Office, call 01733-250096, visit www.mornigpw.org, or write to [email protected]

Subject: Data Structure (Session: 3 hrs)

Session Plan
Session 1: Introduction, Steps in Program Development, Problem Solving Concept, Flow Chart, Algorithm, Variable & Constant.
Session 2: Data Type, Pointer Variable, Introduction of Data Structure, Classification and its Operations, Arrays and their Representation in Computer Memory.
Session 3: Array Representation in Computer Memory, Operations, Traversing Algorithm of an Array.
Session 4: Insertion Algorithm and Deletion Algorithm, Introduction of Linked List.
Session 5: Introduction of Doubly Linked List, Representation of Linked List in Memory, Traversing Algorithm of Linked List.
Session 6: Inserting into Linked List, Applications of Linked List.
Session 7: Deletion in Linked List, Circular Linked List, Traversing, Insertion and Deletion in Doubly Linked List, Header Linked List.
Session 8: Introduction to Stacks, Representation of Stacks, Operations on Stacks.
Session 9: Uses of Stack, Polish Notation, Postfix Evaluation, Conversion of Infix to Postfix.
Session 10: Introduction of Queue, its Implementation, Dequeue, Enqueue, Circular Queue, Recursion.
Session 11: Introduction of Searching, Linear Search and Binary Search.
Session 12: Introduction of Sorting, Bubble Sort, Insertion Sort.
Session 13: Quick Sort, Selection Sort, Merge Sort.
Session 14: Heap Sort, Radix Sort, Exchange Sort.

Unit 1: Fundamental Notations

Introduction: Computer software, or just software, is a collection of computer programs and related data that provides the instructions telling a computer what to do and how to do it. Software refers to one or more computer programs and data held in the storage of the computer.
In other words, software is a set of programs, procedures, algorithms, and its documentation concerned with the operation of a data processing system. Program software performs the function of the program it implements, either by directly providing instructions to the digital electronics or by serving as input to another piece of software. The term was coined to contrast with the older term hardware (meaning physical devices). In contrast to hardware, software "cannot be touched". Software is also sometimes used in a narrower sense, meaning application software only. Sometimes the term includes data that has not traditionally been associated with computers, such as film, tapes, and records. Computer software is so called to distinguish it from computer hardware, which encompasses the physical interconnections and devices required to store and execute (or run) the software.

At the lowest level, executable code consists of machine language instructions specific to an individual processor. A machine language consists of groups of binary values signifying processor instructions that change the state of the computer from its preceding state. Programs are an ordered sequence of instructions for changing the state of the computer in a particular sequence. Software is usually written in high-level programming languages that are easier and more efficient for humans to use (closer to natural language) than machine language. High-level languages are compiled or interpreted into machine language object code. Software may also be written in an assembly language, essentially a mnemonic representation of a machine language using a natural-language alphabet. Assembly language must be assembled into object code via an assembler.
Types of software:-

[Figure: a layer structure showing where the operating system software and application software sit while running on a typical desktop computer.]

Software includes all the various forms and roles that digitally stored data may have and play in a computer (or similar system), regardless of whether the data is used as code for a CPU or other interpreter, or whether it represents other kinds of information. Software thus encompasses a wide array of products that may be developed using different techniques such as ordinary programming languages, scripting languages, microcode, or an FPGA configuration. The types of software include web pages developed in languages and frameworks like HTML, PHP, Perl, JSP, ASP.NET, and XML, and desktop applications like OpenOffice.org and Microsoft Word developed in languages like C, C++, Objective-C, Java, C#, or Smalltalk. Application software usually runs on an underlying operating system such as Linux or Microsoft Windows. Software (or firmware) is also used in video games and for the configurable parts of the logic systems of automobiles, televisions, and other consumer electronics. Practical computer systems divide software systems into three major classes: system software, programming software and application software, although the distinction is arbitrary and often blurred.

System software
System software is computer software designed to operate the computer hardware, to provide basic functionality, and to provide a platform for running application software. System software includes device drivers, operating systems, servers, utilities, and window systems. System software is responsible for managing a variety of independent hardware components so that they can work together harmoniously.
Its purpose is to unburden the application software programmer from the often complex details of the particular computer being used, including such accessories as communications devices, printers, device readers, displays and keyboards, and also to partition the computer's resources, such as memory and processor time, in a safe and stable manner.

Programming software
Programming software includes tools in the form of programs or applications that software developers use to create, debug, maintain, or otherwise support other programs and applications. The term usually refers to relatively simple programs such as compilers, debuggers, interpreters, linkers, and text editors, which can be combined to accomplish a task, much as one might use multiple hand tools to fix a physical object. Programming tools are intended to assist a programmer in writing computer programs, and they may be combined in an integrated development environment (IDE) to more easily manage all of these functions.

Application software
Application software is developed to perform any task that benefits from computation. It is a set of programs that allows the computer to perform a specific data processing job for the user. It is a broad category, and encompasses software of many kinds, including the internet browser being used to display this page.

Steps in Program Development

Planning
Planning is an objective of each and every activity, where we want to discover things that belong to the project. An important task in creating a software program is extracting the requirements, or requirements analysis. Customers typically have an abstract idea of what they want as an end result, but not of what the software should do. Skilled and experienced software engineers recognize incomplete, ambiguous, or even contradictory requirements at this point. Frequently demonstrating live code may help reduce the risk that the requirements are incorrect.
Once the general requirements are gathered from the client, an analysis of the scope of the development should be determined and clearly stated. This is often called a scope document. Certain functionality may be out of scope of the project as a function of cost or as a result of unclear requirements at the start of development. If the development is done externally, this document can be considered a legal document so that if there are ever disputes, any ambiguity about what was promised to the client can be clarified.

Implementation, testing and documenting
Implementation is the part of the process where software engineers actually program the code for the project. Software testing is an integral and important phase of the software development process. This part of the process ensures that defects are recognized as soon as possible. Documenting the internal design of software for the purpose of future maintenance and enhancement is done throughout development. This may also include the writing of an API, be it external or internal. The software engineering process chosen by the developing team will determine how much internal documentation (if any) is necessary. Plan-driven models (e.g., Waterfall) generally produce more documentation than Agile models.

Deployment and maintenance
Deployment starts after the code is appropriately tested, approved for release, and sold or otherwise distributed into a production environment. This may involve installation, customization (such as by setting parameters to the customer's values), testing, and possibly an extended period of evaluation.
Software training and support are important, as software is only effective if it is used correctly.

Problem Solving Concept
Structured programming is a programming paradigm aimed at improving the clarity, quality, and development time of a computer program by making extensive use of subroutines, block structures, and for and while loops – in contrast to using simple tests and jumps such as the goto statement, which can lead to "spaghetti code" that is difficult both to follow and to maintain. It emerged in the 1960s, particularly from work by Böhm and Jacopini, and a famous 1968 letter, Go To Statement Considered Harmful, from Edsger Dijkstra, and was bolstered theoretically by the structured program theorem and practically by the emergence of languages such as ALGOL with suitably rich control structures.

Structured programs are often composed of simple, hierarchical program flow structures. These are sequence, selection, and repetition:

"Sequence" refers to an ordered execution of statements.

In "selection" one of a number of statements is executed depending on the state of the program. This is usually expressed with keywords such as if..then..else..endif, switch, or case. In some languages keywords cannot be written verbatim, but must be stropped.

In "repetition" a statement is executed until the program reaches a certain state, or operations have been applied to every element of a collection. This is usually expressed with keywords such as while, repeat, for or do..until. It is often recommended that each loop have only one entry point (and in the original structured programming, also only one exit point; a few languages enforce this).

[Figure: graphical representations of the three basic patterns, shown both as box diagrams and as their equivalents in commonly used control flow charts.]
ALGORITHM
An algorithm is a sequence of steps written one by one in simple natural language, so that if any step is missing it can be spotted and added easily. No programming knowledge is needed to write an algorithm.

ALGORITHM NOTATION
An algorithm is the base not only of an effective data structure but also of good programming. It is therefore necessary that each step in an algorithm is written clearly. A complete algorithm notation is given below.

Name of Algorithm: Each algorithm has a name related to its subject. The name of the algorithm is always written in capital letters in the very first line of the algorithm.

Introductory Comments: The algorithm name is followed by a brief description of the task. This section lists all the variable names that are assumed and used in the algorithm.

Steps: The actual algorithm is made up of different steps. Each step begins with a description enclosed in square brackets.

Comments: Each step may end with some comments about the step. These comments give the reader a better idea of each step.

Example of an algorithm:

ALGORITHM FOR CALCULATE AVERAGE
(This algorithm reads the marks of four subjects and then calculates the average of these marks. Here m1, m2, m3 and m4 are used to store the marks of the four subjects respectively, and another variable "average" is used to store the average of the total marks.)
Step-1. [Input individual marks] Read (m1, m2, m3, m4)
Step-2. [Calculate the average] average ← (m1+m2+m3+m4)/4
Step-3. [Output the result] Print average
Step-4. [Finished] Exit

Flowchart
[Figure: a simple flowchart representing a process for dealing with a non-functioning lamp.]

A flowchart is a type of diagram that represents an algorithm or process, showing the steps as boxes of various kinds, and their order by connecting them with arrows. This diagrammatic representation can give a step-by-step solution to a given problem. Process operations are represented in these boxes, and the arrows connecting them represent the flow of control.
Data flows are not typically represented in a flowchart, in contrast with data flow diagrams; rather, they are implied by the sequencing of operations. Flowcharts are used in analyzing, designing, documenting or managing a process or program in various fields.

Overview
Flowcharts are used in designing and documenting complex processes or programs. Like other types of diagram, they help visualize what is going on and thereby help the viewer to understand a process, and perhaps also find flaws, bottlenecks, and other less-obvious features within it. There are many different types of flowcharts, and each type has its own repertoire of boxes and notational conventions. The two most common types of boxes in a flowchart are:
- a processing step, usually called an activity, denoted as a rectangular box;
- a decision, usually denoted as a diamond.

A flowchart is described as "cross-functional" when the page is divided into different swim lanes describing the control of different organizational units. A symbol appearing in a particular "lane" is within the control of that organizational unit. This technique allows the author to correctly locate the responsibility for performing an action or making a decision, showing the responsibility of each organizational unit for different parts of a single process.

Flowcharts depict certain aspects of processes and are usually complemented by other types of diagram. For instance, Kaoru Ishikawa defined the flowchart as one of the seven basic tools of quality control, next to the histogram, Pareto chart, check sheet, control chart, cause-and-effect diagram, and the scatter diagram. Similarly, in UML, a standard concept-modeling notation used in software development, the activity diagram, which is a type of flowchart, is just one of many different diagram types.
Common alternate names include: process flowchart, functional flowchart, process map, process chart, functional process chart, business process model, process model, process flow diagram, work flow diagram, and business flow diagram. The terms "flowchart" and "flow chart" are used interchangeably.

Flowchart building blocks

Examples
[Figure: a simple flowchart for computing the factorial of N (written N! and equal to 1 × 2 × 3 × ... × N), together with a stencil of flowchart symbols.]

Symbols
A typical flowchart from older basic computer science textbooks may have the following kinds of symbols:

Start and end symbols
Represented as circles, ovals or rounded (fillet) rectangles, usually containing the word "Start" or "End", or another phrase signaling the start or end of a process, such as "submit inquiry" or "receive product".

Arrows
Showing "flow of control". An arrow coming from one symbol and ending at another symbol represents that control passes to the symbol the arrow points to. The line for the arrow can be solid or dashed. The meaning of an arrow with a dashed line may differ from one flowchart to another and can be defined in the legend.

Generic processing steps
Represented as rectangles. Examples: "Add 1 to X"; "replace identified part"; "save changes".

Subroutines
Represented as rectangles with double-struck vertical edges; these are used to show complex processing steps which may be detailed in a separate flowchart. Example: PROCESS-FILES. One subroutine may have multiple distinct entry points or exit flows (see coroutine); if so, these are shown as labeled 'wells' in the rectangle, and control arrows connect to these 'wells'.

Input/Output
Represented as a parallelogram. Examples: Get X from the user; display X.

Prepare conditional
Represented as a hexagon. Shows operations which have no effect other than preparing a value for a subsequent conditional or decision step (see below).
Conditional or decision
Represented as a diamond (rhombus) showing where a decision is necessary, commonly a Yes/No question or True/False test. The conditional symbol is peculiar in that it has two arrows coming out of it, usually from the bottom point and right point, one corresponding to Yes or True, and one corresponding to No or False. (The arrows should always be labeled.) More than two arrows can be used, but this is normally a clear indicator that a complex decision is being taken, in which case it may need to be broken down further or replaced with the "predefined process" symbol.

Junction symbol
Generally represented with a black blob, showing where multiple control flows converge in a single exit flow. A junction symbol will have more than one arrow coming into it, but only one going out. In simple cases, one may simply have an arrow point to another arrow instead. Junctions are useful for representing an iterative process (what in computer science is called a loop). A loop may, for example, consist of a connector where control first enters, processing steps, a conditional with one arrow exiting the loop, and one going back to the connector. For additional clarity, wherever two lines accidentally cross in the drawing, one of them may be drawn with a small semicircle over the other, showing that no junction is intended.

Labeled connectors
Represented by an identifying label inside a circle. Labeled connectors are used in complex or multi-sheet diagrams to substitute for arrows. For each label, the "outflow" connector must always be unique, but there may be any number of "inflow" connectors. In this case, a junction in control flow is implied.

Concurrency symbol
Represented by a double transverse line with any number of entry and exit arrows. These symbols are used whenever two or more control flows must operate simultaneously. The exit flows are activated concurrently when all of the entry flows have reached the concurrency symbol.
A concurrency symbol with a single entry flow is a fork; one with a single exit flow is a join. It is important to keep these connections logically ordered: all processes should flow from top to bottom and left to right.

Data-flow extensions
A number of symbols have been standardized for data flow diagrams to represent data flow, rather than control flow. These symbols may also be used in control flowcharts (e.g. to substitute for the parallelogram symbol).
- A Document, represented as a rectangle with a wavy base.
- A Manual input, represented by a quadrilateral with the top irregularly sloping up from left to right. An example would be to signify data entry from a form.
- A Manual operation, represented by a trapezoid with the longest parallel side at the top, to represent an operation or adjustment to the process that can only be made manually.
- A Data File, represented by a cylinder.

Types of flowchart
Flowcharts can be modeled from the perspective of different user groups (such as managers, system analysts and clerks), and there are four general types:
- Document flowcharts, showing controls over a document-flow through a system
- Data flowcharts, showing controls over a data-flow in a system
- System flowcharts, showing controls at a physical or resource level
- Program flowcharts, showing the controls in a program within a system

Data type
In computer science and computer programming, a data type or simply type is a classification identifying one of various types of data, such as real-valued, integer or Boolean, that determines the possible values for that type; the operations that can be done on values of that type; the meaning of the data; and the way values of that type can be stored.

Overview
Data types are used within type systems, which offer various ways of defining, implementing and using them. Different type systems ensure varying degrees of type safety.
Formally, a type can be defined as "any property of a program that we can determine without executing the program". Almost all programming languages explicitly include the notion of data type, though different languages may use different terminology. Common data types include integers, Booleans, characters, floating-point numbers, and alphanumeric strings.

For example, in the Java programming language, the "int" type represents the set of 32-bit integers ranging in value from -2,147,483,648 to 2,147,483,647, as well as the operations that can be performed on integers, such as addition, subtraction, and multiplication. A color, on the other hand, might be represented by three bytes denoting the amounts each of red, green, and blue, and one string representing that color's name; allowable operations include addition and subtraction, but not multiplication.

Most programming languages also allow the programmer to define additional data types, usually by combining multiple elements of other types and defining the valid operations of the new data type. For example, a programmer might create a new data type named "complex number" that would include real and imaginary parts.

A data type also represents a constraint placed upon the interpretation of data in a type system, describing the representation, interpretation and structure of values or objects stored in computer memory. The type system uses data type information to check the correctness of computer programs that access or manipulate the data.
Most data types in statistics have comparable types in computer programming, and vice versa, as shown in the following table:

Statistics                          | Computer programming
real-valued (interval scale)        | floating-point
real-valued (ratio scale)           | floating-point
count data (usually non-negative)   | integer
binary data                         | Boolean
categorical data                    | enumerated type
random vector                       | list or array
random matrix                       | two-dimensional array

Classes of data types

Primitive data types

Machine data types
All data in computers based on digital electronics is represented as bits (alternatives 0 and 1) at the lowest level. The smallest addressable unit of data is usually a group of bits called a byte (usually an octet, which is 8 bits). The unit processed by machine code instructions is called a word (as of 2011, typically 32 or 64 bits). Most instructions interpret the word as a binary number, such that a 32-bit word can represent unsigned integer values from 0 to 2^32 - 1, or signed integer values from -2^31 to 2^31 - 1. Because of two's complement, the machine language and machine don't need to distinguish between these unsigned and signed data types for the most part. There is a specific set of arithmetic instructions that use a different interpretation of the bits in a word, as a floating-point number.

Machine data types need to be exposed or made available in systems programming or low-level programming languages, allowing fine-grained control over hardware. The C programming language, for instance, supplies integer types of various widths, such as short and long. If a corresponding native type does not exist on the target platform, the compiler will break them down into code using types that do exist. For instance, if a 32-bit integer is requested on a 16-bit platform, the compiler will tacitly treat it as an array of two 16-bit integers. Several languages allow binary and hexadecimal literals, for convenient manipulation of machine data.
In higher-level programming, machine data types are often hidden or abstracted as an implementation detail that would render code less portable if exposed. For instance, a generic numeric type might be supplied instead of integers of some specific bit-width.

Boolean type
The Boolean type represents the values true and false. Although only two values are possible, they are rarely implemented as a single binary digit, for efficiency reasons. Many programming languages do not have an explicit Boolean type, instead interpreting (for instance) 0 as false and other values as true.

Numeric types
The integer data types, or "whole numbers", may be subtyped according to their ability to contain negative values (e.g. unsigned in C and C++). They may also have a small number of predefined subtypes (such as short and long in C/C++), or allow users to freely define subranges such as 1..12 (e.g. Pascal/Ada).

Floating-point data types, sometimes misleadingly called real, contain fractional values. They usually have predefined limits on both their maximum values and their precision.

Fixed-point data types are convenient for representing monetary values. They are often implemented internally as integers, leading to predefined limits.

Bignum or arbitrary-precision numeric types lack predefined limits. They are not primitive types, and are used sparingly for efficiency reasons.

Composite types
Composite types are derived from more than one primitive type. This can be done in a number of ways; the ways they are combined are called data structures. Composing a primitive type into a compound type generally results in a new type; e.g. array-of-integer is a different type to integer.

An array stores a number of elements of the same type in a specific order. The elements are accessed using an integer index to specify which element is required (although the elements may be of almost any type). Arrays may be fixed-length or expandable.
Record (also called tuple or struct)
Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.

Union
A union type definition specifies which of a number of permitted primitive types may be stored in its instances, e.g. "float or long integer". Contrast this with a record, which could be defined to contain a float and an integer; in a union, there is only one value at a time. A tagged union (also called a variant, variant record, discriminated union, or disjoint union) contains an additional field indicating its current type, for enhanced type safety.

Set
A set is an abstract data structure that can store certain values, without any particular order, and with no repeated values. Values themselves are not retrieved from sets; rather, one tests a value for membership to obtain a Boolean "in" or "not in".

Object
An object contains a number of data fields, like a record, and also a number of program code fragments for accessing or modifying them. Data structures not containing code, like those above, are called plain old data structures.

Many other composite types are possible, but they tend to be further variations and compounds of the above.

Enumerations
An enumerated type has values which are different from each other, and which can be compared and assigned, but which do not necessarily have any particular concrete representation in the computer's memory; compilers and interpreters can represent them arbitrarily. For example, the four suits in a deck of playing cards may be four enumerators named CLUB, DIAMOND, HEART, SPADE, belonging to an enumerated type named suit. If a variable V is declared having suit as its data type, one can assign any of those four values to it.
Some implementations allow programmers to assign integer values to the enumeration values, or even treat them as type-equivalent to integers.

String and text types
An alphanumeric character is a letter of the alphabet, digit, blank space, punctuation mark, etc. Alphanumeric strings are sequences of characters; they are typically used to represent words and text.

Character and string types can store sequences of characters from a character set such as ASCII. Since most character sets include the digits, it is possible to have a numeric string, such as "1234". However, many languages would still treat this as belonging to a different type from the numeric value 1234. Character and string types can have different subtypes according to the required character "width". The original 7-bit-wide ASCII was found to be limited, and was superseded by 8- and 16-bit sets, which can encode a wide variety of non-Latin alphabets (Hebrew, Chinese) and other symbols. Strings may be either stretch-to-fit or of fixed size, even in the same programming language. They may also be subtyped by their maximum size. Note: strings are not primitive in all languages; in C, for instance, they are composed from arrays of characters.

Other types
Types can be based on, or derived from, the basic types explained above. In some languages, such as C, functions have a type derived from the type of their return value.

Pointers and references
The main non-composite, derived type is the pointer, a data type whose value refers directly to (or "points to") another value stored elsewhere in the computer memory using its address. It is a primitive kind of reference. (In everyday terms, a page number in a book could be considered a piece of data that refers to another one.) Pointers are often stored in a format similar to an integer; however, attempting to dereference or "look up" a pointer whose value was never a valid memory address would cause a program to crash.
To ameliorate this potential problem, pointers are considered a separate type from the type of data they point to, even if the underlying representation is the same.

Abstract data types
Any type that does not specify an implementation is an abstract data type. For instance, a stack (which is an abstract type) can be implemented as an array (a contiguous block of memory containing multiple values), or as a linked list (a set of non-contiguous memory blocks linked by pointers). Abstract types can be handled by code that does not know or "care" what underlying types are contained in them. Programming that is agnostic about concrete data types is called generic programming. Arrays and records can also contain underlying types, but are considered concrete because they specify how their contents or elements are laid out in memory.

Examples include:
- A queue is a first-in, first-out list. Variations are the deque and the priority queue.
- A set can store certain values, without any particular order, and with no repeated values.
- A stack is a last-in, first-out list.
- A tree is a hierarchical structure.
- A graph.
- A hash, dictionary, map, or associative array is a more flexible variation on a record, in which name-value pairs can be added and deleted freely.
- A smart pointer is the abstract counterpart to a pointer. Both are kinds of reference.

Utility types
For convenience, high-level languages may supply ready-made "real world" data types, for instance times, dates and monetary values, even where the language allows them to be built from primitive types.

Constant
In computer programming, a constant is an identifier whose associated value cannot typically be altered by the program during its execution (though in some cases this can be circumvented, e.g. using self-modifying code). Many programming languages make an explicit syntactic distinction between constant and variable symbols.
Although a constant's value is specified only once, a constant may be referenced many times in a program. Using a constant instead of specifying a value multiple times in the program can not only simplify code maintenance, but can also supply a meaningful name for the value and consolidate such constant bindings in a standard code location (for example, at the beginning).

Dynamically-valued constants
Besides the static constants described above, many procedural languages such as Ada and C++ extend the concept of constants to global variables that are created at initialization time, local variables that are automatically created at runtime on the stack or in registers, dynamically allocated memory that is accessed by pointer, and parameter lists in function headers. Dynamically-valued constants do not designate a variable as residing in a specific region of memory, nor are their values set at compile time. In C++ code such as

    float func(const float ANYTHING) {
        const float XYZ = someGlobalVariable * someOtherFunction(ANYTHING);
        ...
    }

the expressions that the constants are initialized to are not themselves constant. Use of constants is not necessary here for program legality or semantic correctness, but it has three advantages:
1. It is clear to the reader that the object will not be modified further, once set.
2. Attempts to change the value of the object (by later programmers who do not fully understand the program logic) will be rejected by the compiler.
3.
The compiler may be able to perform code optimizations knowing that the value of the object will not change once created.[3] Dynamically-valued constants originated as a language feature with ALGOL 68.[3] Studies of Ada and C++ code have shown that dynamically-valued constants are used infrequently, typically for 1% or less of objects, when they could be used much more, as some 40–50% of local, non-class objects are actually invariant once created.[3][4] On the other hand, such "immutable variables" tend to be the default in functional languages since they favour programming styles with no side-effect (e.g., recursion) or make most declarations immutable by default. Some functional languages even forbid sideeffects entirely. Constantness is often used in function declarations, as a promise that when an object is passed by reference, the called function will not change it. Depending on the syntax, either a pointer or the object being pointed to may be constant, however normally the latter is desired. Especially in C and C++, the discipline of ensuring that the proper data structures are constant throughout the program is called const-correctness. Variable In computer programming, a variable is a storage location and an associated symbolic name (an identifier) which contains some known or unknown quantity or information, a value. The variable name is the usual way to reference the stored value; this separation of name and content allows the name to be used independently of the exact information it represents. The identifier in computer source code can be bound to a value during run time, and the value of the variable may thus change during the course of program execution. Variables in 17 programming may not directly correspond to the concept of variables in mathematics. The value of a computing variable is not necessarily part of an equation or formula as in mathematics. 
In computing, a variable may be employed in a repetitive process: assigned a value in one place, then used elsewhere, then reassigned a new value and used again in the same way (see iteration). Variables in computer programming are frequently given long names to make them relatively descriptive of their use, whereas variables in mathematics often have terse, one- or two-character names for brevity in transcription and manipulation.
A variable storage location may be referred to by several different identifiers, a situation known as aliasing. Assigning a value to the variable using one of the identifiers will change the value that can be accessed through the other identifiers. Compilers have to replace variables' symbolic names with the actual locations of the data. While a variable's name, type, and location often remain fixed, the data stored in the location may be changed during program execution.

Identifiers referencing a variable
An identifier referencing a variable can be used to access the variable in order to read out the value, alter the value, or edit the attributes of the variable, such as access permissions, locks, semaphores, etc. For instance, a variable might be referenced by the identifier "total_count" and contain the number 1956. If the same variable is also referenced by the identifier "x", and the value of the variable is altered to 2009 through "x", then reading the value through the identifier "total_count" will yield 2009 and not 1956. If a variable is only referenced by a single identifier, that identifier can simply be called the name of the variable. Otherwise, we speak of one of the names of the variable. For instance, in the previous example, "total_count" is a name of the variable in question, and "x" is another name of the same variable.
Scope and extent
The scope of a variable describes where in a program's text the variable may be used, while the extent (or lifetime) describes when in a program's execution a variable has a (meaningful) value. The scope of a variable is actually a property of the name of the variable, and the extent is a property of the variable itself. A variable name's scope affects its extent.
Scope is a lexical aspect of a variable. Most languages define a specific scope for each variable (as well as any other named entity), which may differ within a given program. The scope of a variable is the portion of the program code for which the variable's name has meaning and for which the variable is said to be "visible". Entrance into that scope typically begins a variable's lifetime and exit from that scope typically ends it. For instance, a variable with "lexical scope" is meaningful only within a certain block of statements or subroutine. Variables accessible only within a certain function are termed "local variables". A "global variable", or one with indefinite scope, may be referred to anywhere in the program.
Extent, on the other hand, is a runtime (dynamic) aspect of a variable. Each binding of a variable to a value can have its own extent at runtime. The extent of the binding is the portion of the program's execution time during which the variable continues to refer to the same value or memory location. A running program may enter and leave a given extent many times, as in the case of a closure.
Unless the programming language features garbage collection, a variable whose extent permanently outlasts its scope can result in a memory leak, whereby the memory allocated for the variable can never be freed, since the variable which would be used to reference it for deallocation purposes is no longer accessible.
However, it can be permissible for a variable binding to extend beyond its scope, as occurs in Lisp closures and C static local variables; when execution passes back into the variable's scope, the variable may once again be used. A variable whose scope begins before its extent does is said to be uninitialized and often has an undefined, arbitrary value if accessed (see wild pointer), since it has yet to be explicitly given a particular value. A variable whose extent ends before its scope does may become a dangling pointer and is deemed uninitialized once more, since its value has been destroyed. Variables described by the previous two cases may be said to be out of extent or unbound. In many languages, it is an error to try to use the value of a variable when it is out of extent. In other languages, doing so may yield unpredictable results. Such a variable may, however, be assigned a new value, which gives it a new extent.
For space efficiency, the memory space needed for a variable may be allocated only when the variable is first used and freed when it is no longer needed. A variable is only needed when it is in scope, but beginning each variable's lifetime when it enters scope may waste space on variables that are never used. To avoid wasting such space, compilers often warn programmers when a variable is declared but not used.
It is considered good programming practice to make the scope of variables as narrow as feasible, so that different parts of a program do not accidentally interact with each other by modifying each other's variables. Doing so also prevents action at a distance. Common techniques for doing so are to have different sections of a program use different namespaces, or to make individual variables "private" through either dynamic variable scoping or lexical variable scoping. Many programming languages employ a reserved value (often named null or nil) to indicate an invalid or uninitialized variable.
Parameters
The formal parameters of functions are also referred to as variables. For instance, in this Python code segment,

    def add_two(x):
        return x + 2

    add_two(5)  # yields 7

the variable named x is a parameter, because it is given a value when the function is called. The integer 5 is the argument which gives x its value. In most languages, function parameters have local scope. This specific variable named x can only be referred to within the add_two function (though of course other functions can also have variables called x).

Memory allocation
The specifics of variable allocation and the representation of their values vary widely, both among programming languages and among implementations of a given language. Many language implementations allocate space for local variables, whose extent lasts for a single function call, on the call stack, and whose memory is automatically reclaimed when the function returns. More generally, in name binding, the name of a variable is bound to the address of some particular block (contiguous sequence) of bytes in memory, and operations on the variable manipulate that block. Referencing is more common for variables whose values have large or unknown sizes when the code is compiled. Such variables reference the location of the value instead of storing the value itself, which is allocated from a pool of memory called the heap.
Bound variables have values. A value, however, is an abstraction, an idea; in implementation, a value is represented by some data object, which is stored somewhere in computer memory. The program, or the runtime environment, must set aside memory for each data object and, since memory is finite, ensure that this memory is yielded for reuse when the object is no longer needed to represent some variable's value. Objects allocated from the heap must be reclaimed, especially when the objects are no longer needed.
In a garbage-collected language (such as C#, Java, and Lisp), the runtime environment automatically reclaims objects when extant variables can no longer refer to them. In non-garbage-collected languages, such as C, the program (and the programmer) must explicitly allocate memory, and then later free it, to reclaim its memory. Failure to do so leads to memory leaks, in which the heap is depleted as the program runs and the program risks eventual failure from exhausting available memory. When a variable refers to a data structure created dynamically, some of its components may be accessible only indirectly through the variable. In such circumstances, garbage collectors (or analogous program features in languages that lack garbage collectors) must deal with the case where only a portion of the memory reachable from the variable needs to be reclaimed.

Naming conventions
Unlike their mathematical counterparts, programming variables and constants commonly take multiple-character names, e.g. COST or total. Single-character names are most commonly used only for auxiliary variables; for instance, i, j, k for array index variables. Some naming conventions are enforced at the language level as part of the language syntax and involve the format of valid identifiers. In almost all languages, variable names cannot start with a digit (0-9) and cannot contain whitespace characters. Whether, which, and where punctuation marks are permitted in variable names varies from language to language; many languages only permit the underscore ("_") in variable names and forbid all other punctuation. In some programming languages, specific (often punctuation) characters known as sigils are prefixed or appended to variable identifiers to indicate the variable's type. Case-sensitivity of variable names also varies between languages, and some languages require the use of a certain case in naming certain entities; most modern languages are case-sensitive, while some older languages are not.
Some languages reserve certain forms of variable names for their own internal use; in many languages, names beginning with two underscores ("__") often fall under this category. However, beyond the basic restrictions imposed by a language, the naming of variables is largely a matter of style. At the machine code level, variable names are not used, so the exact names chosen do not matter to the computer; they are a tool for programmers to make programs easier to write and understand. Poorly chosen variable names can make code more difficult to review, so names which are clear and descriptive are encouraged.[1] Programmers often create and adhere to code style guidelines which offer guidance on naming variables or impose a precise naming scheme. Shorter names are faster to type but are less descriptive; longer names often make programs easier to read and the purpose of variables easier to understand. However, extreme verbosity in variable names can also lead to less comprehensible code.

Pointer Variable:-
(Figure: pointer a pointing to the memory address associated with variable b. Note that in this particular diagram, the computing architecture uses the same address space and data primitive for both pointers and non-pointers; this need not be the case.)
In computer science, a pointer is a programming language data type whose value refers directly to (or "points to") another value stored elsewhere in the computer memory using its address. For high-level programming languages, pointers effectively take the place of general-purpose registers in low-level languages such as assembly language or machine code, but may reside in available memory. A pointer references a location in memory, and obtaining the value at the location a pointer refers to is known as dereferencing the pointer. A pointer is a simple, more concrete implementation of the more abstract reference data type.
Several languages support some type of pointer, although some have more restrictions on their use than others. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number.
Pointers to data significantly improve performance for repetitive operations such as traversing strings, lookup tables, control tables and tree structures. In particular, it is often much cheaper in time and space to copy and dereference pointers than it is to copy and access the data to which the pointers point. Pointers are also used to hold the addresses of entry points for called subroutines in procedural programming and for run-time linking to dynamic link libraries (DLLs). In object-oriented programming, pointers to functions are used for binding methods, often using what are called virtual method tables.
Pointers are directly supported without restrictions in languages such as PL/I, C, C++, Pascal, and most assembly languages. They are primarily used for constructing references, which in turn are fundamental to constructing nearly all data structures, as well as for passing data between different parts of a program.
In functional programming languages that rely heavily on lists, pointers and references are managed abstractly by the language using internal constructs like cons.
When dealing with arrays, the critical lookup operation typically involves a stage called address calculation, which involves constructing a pointer to the desired data element in the array. If the data elements in the array have lengths that are divisible by powers of two, this arithmetic is usually much more efficient. Padding is frequently used as a mechanism for ensuring this is the case, despite the increased memory requirement. In other data structures, such as linked lists, pointers are used as references to explicitly tie one piece of the structure to another.
The basic syntax to define a pointer is:

    int *ptr;

This declares ptr as the identifier of an object of the following type: pointer that points to an object of type int. This is usually stated more succinctly as "ptr is a pointer to int". Because the C language does not specify an implicit initialization for objects of automatic storage duration,[3] care should often be taken to ensure that the address to which ptr points is valid; this is why it is sometimes suggested that a pointer be explicitly initialized to the null pointer value, which is traditionally specified in C with the standardized macro NULL:[4]

    int *ptr = NULL;

Dereferencing a null pointer in C produces undefined behavior, which could be catastrophic. However, most implementations simply halt execution of the program in question, usually with a segmentation fault. On the other hand, initializing pointers unnecessarily could hinder program analyses, thereby hiding bugs. In any case, once a pointer has been declared, the next logical step is for it to point at something:

    int a = 5;
    int *ptr = NULL;
    ptr = &a;

This assigns the value of ptr to be the address of a. For example, if a is stored at memory location 0x8130, then the value of ptr will be 0x8130 after the assignment. To dereference the pointer, an asterisk is used again:

    *ptr = 8;

This means: take the contents of ptr (which is 0x8130), "locate" that address in memory, and set its value to 8. If a is later accessed again, its new value will be 8.

DATA STRUCTURES
The logical or mathematical model of a particular organization of data is called a data structure. A data structure is a way of storing and accessing data in a form acceptable to the computer, so that a large amount of data can be processed in a small interval of time. In a simple way, we say that the manner in which data is stored in computer memory and accessed is the data structure. A data model depends on two considerations.
First, it must be rich enough in structure to mirror the actual relationships of the data in the real world. Second, the structure should be simple enough that one can effectively process the data when necessary. The study of data structures includes the following steps:
o Logical or mathematical description of the structure.
o Implementation of the structure in computer memory.
o Quantitative analysis of the structure, which includes determining the amount of memory needed to store the data and the time required for processing.

Classification of Data Structures
Data structures are classified into two types:
o Linear Data Structure
o Non-Linear Data Structure
Linear Data Structure: The organization of data in a single row, or we say in the form of a sequence, is called a linear data structure. For example: arrays, lists, etc.
Non-Linear Data Structure: The organization of data in hierarchical form is called a non-linear data structure. For example: files, trees, graphs, etc.

Operations Performed on Data Structures
When data is processed, different types of operations are performed. The following are the most important operations, which play a major role in data structures.
Traversing: Accessing each record so that certain items may be processed is called traversing. We also say that visiting each record is called traversing.
Searching: Finding the location of a record with a given value is called searching.
Inserting: Adding a new record to the structure is called inserting.
Deleting: Removing a record or a set of records from the data structure is called deleting.
Sorting: Arranging the records in some logical order is called sorting.
Merging: Combining two or more sets of records is called merging.

Types of Data Structure
Array
Stack
Queue
Link List
Tree

Unit 2nd
Arrays
An array is a collection of consecutive memory locations that can be referred to by a single name with subscripts. In simple words, we say that an array is a collection of contiguous memory locations where data of one type is stored.
There are two types of arrays.
Linear Array: A linear array is also called a one-dimensional array. All the elements are stored in the shape of a vector, i.e. in one row or one column. The elements of the array are stored respectively in successive memory locations. In a linear array, each element is referred to by one subscript. The number n of elements is called the length or size of the array. The length/size of the array can be obtained from the index set by the formula
Length = UB - LB + 1
where UB is the largest index, called the upper bound, and LB is the smallest index, called the lower bound, of the array. Note that length = UB when LB = 1.
The elements of the array A may be denoted by the subscript notation
A1, A2, A3, ..., An
or by the parentheses notation (used in FORTRAN and BASIC)
A(1), A(2), A(3), ..., A(n)
or by the bracket notation (used in PASCAL and C)
A[1], A[2], A[3], ..., A[n].
The number n in A[n] is called a subscript or an index, and A[n] is called a subscripted variable.

Multi-dimensional arrays
The number of indices needed to specify an element is called the dimension, dimensionality, or rank of the array type. (This nomenclature conflicts with the concept of dimension in linear algebra,[5] where it is the number of elements. Thus, an array of numbers with 5 rows and 4 columns, hence 20 elements, is said to have dimension 2 in computing contexts, but represents a matrix with dimension 5-by-4, or 20, in mathematics. Also, the computer science meaning of "rank" is similar to its meaning in tensor algebra but not to the linear algebra concept of the rank of a matrix.)
Many languages support only one-dimensional arrays. In those languages, a multi-dimensional array is typically represented by an Iliffe vector, a one-dimensional array of references to arrays of one dimension less. A two-dimensional array, in particular, would be implemented as a vector of pointers to its rows.
Thus an element in row i and column j of an array A would be accessed by double indexing (A[i][j] in typical notation). This way of emulating multi-dimensional arrays allows the creation of ragged or jagged arrays, where each row may have a different size, or, in general, where the valid range of each index depends on the values of all preceding indices. This representation for multi-dimensional arrays is quite prevalent in C and C++ software. However, C and C++ will use a linear indexing formula for multi-dimensional arrays that are declared as such, e.g. by int A[10][20] or int A[m][n], instead of the traditional int **A.[6]

Indexing notation
Most programming languages that support arrays support the store and select operations, and have special syntax for indexing. Early languages used parentheses, e.g. A(i,j), as in FORTRAN; others chose square brackets, e.g. A[i,j] or A[i][j], as in Algol 60 and Pascal.

Representation of a linear array in computer memory
The memory of a computer is a sequence of addresses, and each element stored in this memory has its own address. For example, an array A has the data elements 200, 104, 23, 84, 90:

Index:  0    1    2    3    4
Value:  200  104  23   84   90

Each element of array A has its own address, with index values running from 0 to 4. So LB is 0 and UB is 4.
Total length = UB - LB + 1 = 4 - 0 + 1 = 5
So array A contains a total of 5 elements, and its indices run from 0 to 4.

Accessing the elements of an array
When an array is properly mapped and stored in memory, the dope vector method is a very efficient way to access each element. This method uses the starting address of the array, called the base address. If we know the base address of the array, then we can find and access any element of the array by using the following formula:
Loc(LA[K]) = Base(LA) + W * (K - Lower Bound)
where K is the index of the element whose address is to be located, and W is the number of memory words occupied by one element of the array.
For example, let LA be a linear array holding the yearly records of a company from 1932 to 1984.
Each year's record occupies 4 memory locations. Suppose we want to find the address of the record for 1965:

Year   Address
1932   200
1933   204
1934   208
...    ...
1984   ---

Base(LA) = 200, W = 4, K = 1965
Loc(LA[K]) = Base(LA) + W * (K - Lower Bound)
Loc(LA[1965]) = 200 + 4 * (1965 - 1932) = 200 + 4 * 33 = 332
So we find that the address of the 1965 record is 332, and we can easily access the record stored at that address.

Traversing an array
Accessing each record of an array so that certain items may be processed is called traversing the array. We also say that visiting each record is called traversing. The following algorithm traverses all the elements of an array.

TRAVERSING A LINEAR ARRAY
(Let LA be a linear array. This algorithm prints each element of the array on the screen. In this algorithm we use two variables, LB and UB, where LB denotes the lower bound and UB denotes the upper bound.)
Step 1. [Initialize the loop counter] Repeat Step 2 for K = LB to UB
Step 2. [Apply the process] Print LA[K]
Step 3. [Finished] Exit

Insertion and Deletion
Let A be a collection of data elements in the memory of the computer. Insertion refers to the operation of adding another element to the collection A, and deletion refers to the operation of removing one of the elements from A. Inserting an element at the end of a linear array can be easily done, provided the memory space allocated for the array is large enough to accommodate the additional element. On the other hand, suppose we want to insert an element in the middle of the array; then on average half of the elements must be moved downward to new locations to accommodate the new element and keep the order of the other elements. Similarly, deleting an element at the end of an array creates no difficulties, but deleting an element somewhere in the middle of the array would require that each subsequent element be moved one location upward in order to fill up the array.
Algorithm for insertion
The following algorithm inserts a data element ITEM into the Kth position in a linear array LA with N elements. The first four steps create space in memory by moving downward one location each element from the Kth position onward. We first set J = N and then, using J as a counter, decrease J each time the loop is executed until J reaches K. The next step (Step 5) inserts ITEM into the array in the space just created. Before the exit from the algorithm, the number N of elements in LA is increased by 1 to account for the new element.

ALGORITHM: INSERTION (LA, N, K, ITEM)
(Here LA is a linear array with N elements and K is a positive integer such that K <= N. This algorithm inserts an element ITEM into the Kth position in LA.)
1. [Initialize counter] Set J = N
2. Repeat Steps 3 and 4 while J >= K
3. [Move element downward] Set LA[J+1] = LA[J]
4. [Decrease counter] Set J = J - 1
   [End of Step 2 loop]
5. [Insert element] Set LA[K] = ITEM
6. [Reset N] Set N = N + 1
7. [Finished] Exit

Algorithm for deletion
Deletion means removing an element from the linear array. If we want to delete the element at the end, it is simple; but if we want to delete an element from the middle of the array, then we first move all the elements upward from the item which is to be deleted. In the end we reset the array size, as one item has been deleted. So the following steps are taken:
1. Find the location of the element to be deleted.
2. Delete the element at that location. This action creates empty space.
3. Move the elements upward to fill that empty space.
4. Reset the value N of the array, as one item has been deleted.

ALGORITHM: DELETION (LA, N, K, ITEM)
(Here LA is a linear array with N elements and K is a positive integer such that K <= N. This algorithm deletes the Kth element from LA.)
1. [Select the item] Set ITEM = LA[K]
2. [Set the loop] Repeat for J = K to N - 1:
   [Move the (J+1)st element upward] Set LA[J] = LA[J+1]
   [End of loop]
3. [Reset N] Set N = N - 1
4.
[Finished] Exit

Unit 3rd
Linked List
Data processing frequently involves storing and processing data organized into lists. One way to store such data is by means of arrays. An array establishes a linear relationship between the data elements. The elements of an array occupy contiguous memory locations, so there is a physical relationship between the data elements in memory, and once an array is declared, we cannot increase its size. Another way is the linked list. In a linked list, the elements have a logical relationship, not a physical relationship as in an array. In simple words, a linked list is a linear collection of elements, called nodes, where the linear order is given by means of pointers, such that each node is divided into two parts: the first part contains the information of the element, and the second part, called the link field or next pointer field, contains the address of the next node in the list.
In the figure below, a linked list is shown with five nodes. Each node is pictured with two parts. The left part represents the information, which may contain an entire record of data items, e.g. NAME, ADDRESS, etc. The right part represents the next pointer field of the node, and there is an arrow drawn from it to the next node in the list. The pointer of the last node contains a special value called the null pointer (drawn as X), which means the end of the linked list.
(Figure: a one-way linked list with a list pointer variable named START, showing the informational part and the next pointer field of each node; the last node's pointer field holds the null pointer X.)
The linked list also contains a list pointer variable, called START or NAME, that contains the address of the first node in the list. A special case is the list that has no nodes. Such a list is called the null list or empty list and is denoted by the null pointer in the variable START.

Doubly linked list
In computer science, a doubly linked list is a linked data structure that consists of a set of sequentially linked records called nodes.
Each node contains two fields, called links, that are references to the previous and to the next node in the sequence of nodes. The beginning and ending nodes' previous and next links, respectively, point to some kind of terminator, typically a sentinel node or null, to facilitate traversal of the list. If there is only one sentinel node, then the list is circularly linked via the sentinel node. It can be conceptualized as two singly linked lists formed from the same data items, but in opposite sequential orders.
(Figure: a doubly linked list whose nodes contain three fields: an integer value, the link to the next node, and the link to the previous node.)
The two node links allow traversal of the list in either direction. While adding or removing a node in a doubly linked list requires changing more links than the same operations on a singly linked list, the operations are simpler and potentially more efficient (for nodes other than first nodes) because there is no need to keep track of the previous node during traversal and no need to traverse the list to find the previous node, so that its link can be modified.

Noticeable differences between singly and doubly linked lists:
Although doubly linked lists require more space per node as compared to singly linked lists, and their elementary operations are more expensive, they are often easier to manipulate because they allow sequential access to the list in both directions.
In a doubly linked list, insertion or deletion of a node whose address is given can be carried out in a constant number of operations. In a singly linked list, on the other hand, the same operation would require the address of the predecessor, which is of course not a problem with a doubly linked list, as we can move in both directions.
(Figure showing the layout of a doubly linked list.)

Representation of a linked list in computer memory
Let LIST be a linked list. Then LIST will be maintained in memory as follows. First of all, LIST requires two linear arrays; we will call them INFO and LINK, such that INFO[K] and LINK[K] contain the information part and the next pointer field, respectively, of a node of LIST. LIST also requires a variable name, such as START, which contains the location of the beginning of the list, and a next pointer sentinel, denoted by NULL, which indicates the end of the list. Since the subscripts of the arrays INFO and LINK will usually be positive, we will choose NULL = 0.
The following example of a linked list indicates that the nodes of a list need not occupy adjacent elements in the arrays INFO and LINK, and that more than one list may be maintained in the same linear arrays INFO and LINK. However, each list must have its own pointer variable giving the location of its first node.

Example
The following table shows a linked list in memory, where each node of the list contains a single character. Here START = 6.

Location  INFO  LINK
1         B     3
2         E     5
3         C     4
4         D     2
5         F     Null
6         A     1

We can obtain the actual list of characters as follows:
START = 6, so INFO[6] = A
LINK[6] = 1, so INFO[1] = B
LINK[1] = 3, so INFO[3] = C
LINK[3] = 4, so INFO[4] = D
LINK[4] = 2, so INFO[2] = E
LINK[2] = 5, so INFO[5] = F
LINK[5] = Null (0), so the list has ended.

Traversing a linked list
Let LIST be a linked list in memory, stored in the linear arrays INFO and LINK, with START pointing to the first element and NULL indicating the end of LIST.
Suppose we want to traverse LIST in order to process each node exactly once. The traversing algorithm uses a pointer variable PTR that points to the node currently being processed; accordingly, LINK[PTR] points to the next node to be processed, and the assignment PTR = LINK[PTR] moves the pointer to the next node.

The details of the algorithm are as follows. Initialize PTR = START, then process INFO[PTR], the information at the first node. Update PTR by the assignment PTR = LINK[PTR], so that PTR points to the second node, and process INFO[PTR]. Update PTR again and process the information at the third node, and so on. Continue until PTR = NULL, which signals the end of the list. A formal presentation of the algorithm follows.

ALGORITHM: TRAVERSING A LINKED LIST
(Let LIST be a linked list in memory. This algorithm traverses LIST, applying an operation PROCESS to each node of LIST. The variable PTR points to the node currently being processed.)
1. [Initialize pointer] Set PTR = START
2. Repeat steps 3 and 4 while PTR != NULL
3.     Apply PROCESS to INFO[PTR]
4.     [Move PTR to the next node] Set PTR = LINK[PTR]
   [End of step 2 loop]
5. [Finish] Exit

INSERTION AT THE BEGINNING OF A LINKED LIST

If we want to insert a new item into the linked list and there is no restriction on its position, meaning the new element may go anywhere, we insert the element at the beginning of the list. For this purpose we take a new node from the free (AVAIL) list and make it the new START node.

ALGORITHM: INSERTION AT THE BEGINNING OF A LINKED LIST
(This algorithm inserts ITEM as the first node of the list. START points to the first node and AVAIL points to the next available node.)
Step 1. [Check for overflow] If AVAIL = NULL, then print "Overflow" and Exit
Step 2. [Take a new node from the AVAIL list] Set NEW = AVAIL and AVAIL = LINK[AVAIL]
Step 3. [Copy the element into the new node] Set INFO[NEW] = ITEM
Step 4. [Point the new node at the old first node] Set LINK[NEW] = START
Step 5. [Set NEW as START] Set START = NEW
Step 6. [Finish] Exit

INSERTION INTO A SORTED LINKED LIST

If we want to insert a new ITEM into a sorted linked list, the item must be inserted between two nodes A and B so that the list remains sorted. For this purpose we first find the location LOC of the node after which the new item is to be inserted, by comparing ITEM with each value in the list and stopping the search when ITEM <= INFO[PTR]. A second procedure then inserts the new item at position LOC.

ALGORITHM: INSERTION INTO A SORTED LINKED LIST
(This algorithm inserts ITEM into a sorted list. It is divided into two sub-algorithms: the first finds the location LOC for the new item, and the second inserts the item at LOC. START points to the first node and AVAIL points to the next available node.)

First procedure: [Find the location LOC for the new ITEM]
Step 1. [Is the list empty?] If START = NULL, then set LOC = NULL and Return
Step 2. [Special case: insertion at the front] If ITEM < INFO[START], then set LOC = NULL and Return
Step 3. [Initialize the pointers] Set SAVE = START and PTR = LINK[START]
Step 4. Repeat steps 5 and 6 while PTR != NULL
Step 5.     If ITEM < INFO[PTR], then set LOC = SAVE and Return
Step 6.     [Update the pointers] Set SAVE = PTR and PTR = LINK[PTR]
        [End of step 4 loop]
Step 7. Set LOC = SAVE
Step 8. Return

Second procedure: [Insert the element at location LOC]
Step 1. [Check for overflow] If AVAIL = NULL, then print "Overflow" and Exit
Step 2. [Take a new node from the AVAIL list] Set NEW = AVAIL and AVAIL = LINK[AVAIL]
Step 3. [Copy the element into the new node] Set INFO[NEW] = ITEM
Step 4. [Check the location of the new node]
        If LOC = NULL, then [insert the new node as the first node]
            Set LINK[NEW] = START and START = NEW
        Else [insert the new node after LOC]
            Set LINK[NEW] = LINK[LOC] and LINK[LOC] = NEW
        [End of if structure]
Step 5. [Finish] Exit

Deleting a node from a linked list

One of the basic algorithms needed to maintain a list is DELETE. This is explained by carrying on with the example from the previous page.
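The two insertion algorithms above can be sketched in C, using malloc in place of taking a node from the AVAIL list (a NULL return plays the role of the "overflow" case). The function names insert_begin and insert_sorted are our own.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int info;
    struct node *link;
};

/* Insert ITEM at the beginning of the list; returns the new START. */
struct node *insert_begin(struct node *start, int item) {
    struct node *n = malloc(sizeof *n);   /* new = avail */
    if (n == NULL) return start;          /* overflow */
    n->info = item;                       /* info[new] = item */
    n->link = start;                      /* link[new] = start */
    return n;                             /* start = new */
}

/* Insert ITEM into an ascending sorted list: find LOC, then splice. */
struct node *insert_sorted(struct node *start, int item) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) return start;                    /* overflow */
    n->info = item;
    if (start == NULL || item < start->info) {      /* LOC = NULL case */
        n->link = start;
        return n;                                   /* new node becomes START */
    }
    struct node *save = start;
    while (save->link != NULL && save->link->info <= item)
        save = save->link;                          /* search for LOC */
    n->link = save->link;                           /* link[new] = link[loc] */
    save->link = n;                                 /* link[loc] = new */
    return start;
}
```

Calling insert_sorted repeatedly on an empty list builds the list in ascending order regardless of the order in which items arrive.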
Essentially, the DELETE algorithm adjusts the pointers of the relevant nodes to make sure the list remains in the correct order after the node is removed.

Example: Storing an alphabetic list of names in a linked list. The diagram above shows the final form of the linked list. This time the task is to remove a node from it.

DELETE ALGORITHM
1. Start with the first node, pointed to by the start pointer.
2. Is this the node to be removed?
3. Case 1: The node to be removed is the first one.
   1. Adjust the start pointer to point to the next node.
   2. Remove the original node.
   3. Mark the memory it used as free once more.
   4. Task complete.
4. Case 2: The node to be removed is an intermediate one.
   1. Examine the next node, using the node pointers to move from node to node until the correct node is identified.
   2. Node found.
   3. Copy the pointer of the node to be removed into temporary memory.
   4. Remove the node from the list and mark the memory it was using as free once more.
   5. Update the previous node's pointer with the address held in temporary memory.
   6. Task complete.
5. Case 3: The node to be removed is the last one.
   1. Remove the last node from the list.
   2. Mark the prior node with a null pointer.
   3. Mark the memory the deleted node used as free once more.
   4. Task complete.
6. Case 4: The node cannot be found.
   1. Return an error message to the calling code.
   2. Task complete.

Circular list

In the last node of a list, the link field often contains a null reference, a special value used to indicate the lack of further nodes. A less common convention is to make it point to the first node of the list; in that case the list is said to be circular or circularly linked; otherwise it is said to be open or linear.

A circular linked list

In the case of a circular doubly linked list, the only change is that the end, or "tail", of the list is linked back to the front, or "head", of the list, and vice versa.
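The four DELETE cases above can be sketched in C for a singly linked list. The name delete_item and the found flag are our own; cases 2 and 3 fall out of the same loop, since bypassing the last node simply stores a null pointer in its predecessor.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int info;
    struct node *link;
};

/* Delete the first node holding ITEM; returns the new start pointer.
   *found is set to 0 for case 4 (node not present). */
struct node *delete_item(struct node *start, int item, int *found) {
    *found = 1;
    if (start == NULL) { *found = 0; return NULL; }       /* empty list */
    if (start->info == item) {                            /* case 1: first node */
        struct node *next = start->link;
        free(start);                                      /* mark memory free */
        return next;
    }
    for (struct node *prev = start; prev->link != NULL; prev = prev->link) {
        if (prev->link->info == item) {                   /* case 2 or 3 */
            struct node *target = prev->link;
            prev->link = target->link;                    /* bypass the node */
            free(target);
            return start;
        }
    }
    *found = 0;                                           /* case 4: not found */
    return start;
}
```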
Let us now look at the various operations that can be performed on a doubly linked list:

- Add a node to the list, at the beginning, at the end, or in between.
- Delete a node from the list at a specific location.
- Reverse the list.
- Count the nodes present in the list.
- Print the list to see all the nodes present in it.

First, let's have a look at the node definition:

    struct node {
        int data;
        struct node *prev;
        struct node *next;
    };

It is evident from the node definition that a node in a doubly linked list has two links (next and prev) and one data value (data).

Add a node to the list. Insertion of a node into a linked list can be done at three places: at the start, in between at a specified location, or at the end.

Inserting a node at the start of the list. Algorithm:
1. Update the next pointer of the new node to point to the head node, and make the prev pointer of the new node NULL.
2. Update the head node's prev pointer to point to the new node, and make the new node the head node.

Inserting a node in the middle of the list. Algorithm:
1. Traverse the list to the position where the new node is to be inserted. Call this node the position node (the new node is inserted just after it).
2. Make the next pointer of the new node point to the next node of the position node, and make the prev pointer of the new node point to the position node.
3. Point the position node's next pointer to the new node, and make the prev pointer of the node following the position node point to the new node.

Inserting a node at the end of the list. Algorithm:
1. Traverse the list to the end. Call the current last node of the list the last node.
2. Make the next pointer of the new node point to NULL, and the prev pointer of the new node point to the last node.
3. Update the next pointer of the last node to point to the new node.

Thus we see how easily we can add a node to a list. Note that when writing code for these cases we pass a double pointer into the function, as we may need to change the head pointer itself.

Delete a node from the list.
As with insertion, deletion of a node can be done at three places: from the start, in between at a specified location, or from the end.

Deletion of a node from the start of the list. Algorithm:
1. Create a temporary pointer that points to the same node as the head pointer.
2. Move the head pointer to point to the next node, and change the new head's prev pointer to NULL. Then dispose of the node pointed to by the temporary pointer.

Deletion of a node from the end of the list. Algorithm:
1. Traverse the list to the end, maintaining the previous node's address as you go. When you reach the end, one pointer points to NULL and the other to the penultimate node.
2. Update the next pointer of the penultimate node to point to NULL.
3. Dispose of the last node.

Deletion of a node from an intermediate position. Algorithm:
1. As in the previous case, maintain two pointers: one pointing to the node to be deleted (the target node) and the other to the node before it.
2. Once the target node is reached, change the previous node's next pointer to point to the target's next node, and make the prev pointer of the node after the target point to the node before the target.
3. Dispose of the target node.
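The insertion and deletion steps at the start of a doubly linked list can be sketched in C, using the struct node definition from the text. The head is passed by address because the head pointer itself changes (the "double pointer" the text mentions); the function names are our own.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int data;
    struct node *prev;
    struct node *next;
};

/* Insert a new node at the start of the list. */
void insert_start(struct node **head, int data) {
    struct node *n = malloc(sizeof *n);
    n->data = data;
    n->prev = NULL;           /* step 1: new node's prev is NULL */
    n->next = *head;          /* step 1: new node points at the old head */
    if (*head != NULL)
        (*head)->prev = n;    /* step 2: old head points back at new node */
    *head = n;                /* step 2: new node becomes the head */
}

/* Delete the node at the start of the list. */
void delete_start(struct node **head) {
    struct node *tmp = *head;         /* the temporary node of the algorithm */
    if (tmp == NULL) return;
    *head = tmp->next;                /* head moves to the next node */
    if (*head != NULL)
        (*head)->prev = NULL;         /* new head's prev becomes NULL */
    free(tmp);                        /* dispose of the old head */
}
```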
Circular doubly linked lists

Traversing the list. Assuming that someNode is some node in a non-empty list, this code traverses the list starting with someNode (any node will do):

Forwards:
    node := someNode
    do
        do something with node.value
        node := node.next
    while node ≠ someNode

Backwards:
    node := someNode
    do
        do something with node.value
        node := node.prev
    while node ≠ someNode

Notice that the test is postponed to the end of the loop. This is important for the case where the list contains only the single node someNode.

Inserting a node. This simple function inserts a node into a doubly linked circular list after a given element:

    function insertAfter(Node node, Node newNode)
        newNode.next := node.next
        newNode.prev := node
        node.next.prev := newNode
        node.next := newNode

To do an "insertBefore", we can simply use insertAfter(node.prev, newNode). Inserting an element into a possibly empty list requires a special function:

    function insertEnd(List list, Node node)
        if list.lastNode == null
            node.prev := node
            node.next := node
        else
            insertAfter(list.lastNode, node)
        list.lastNode := node

To insert at the beginning we simply use insertAfter(list.lastNode, node).

Deletion operation. Finally, removing a node must deal with the case where the list becomes empty:

    function remove(List list, Node node)
        if node.next == node
            list.lastNode := null
        else
            node.next.prev := node.prev
            node.prev.next := node.next
            if node == list.lastNode
                list.lastNode := node.prev
        destroy node

Header Linked List

A header linked list is a linked list which always contains a special node, called the header node, at the beginning of the list. The following are two widely used kinds of header list:
1. A grounded header list is a header list whose last node contains the null pointer.
2. A circular header list is a header list whose last node points back to the header node. Unless otherwise stated or implied, our header lists will always be circular.
Accordingly, in such a case, the header node also acts as a sentinel indicating the end of the list.

Unit 4th: Stacks, Queues and Recursion

STACK

A stack is a linear structure in which items may be added or removed only at one end. Everyday examples of such a structure are a stack of dishes, a stack of pennies and a stack of folded clothes. We can observe that an item may be added to or removed from only the top of the stack, which means that the last item added to a stack is the first item to be removed. For this reason stacks are also called last-in, first-out (LIFO) lists.

In a stack, an element may be inserted or deleted only at one end, called the top of the stack. This means that elements are removed from a stack in the reverse of the order in which they were inserted. Special terminology is used for the two basic operations associated with stacks:
1. "Push" is the term used for insertion.
2. "Pop" is the term used for deletion.

For example, suppose A, B, C, D are four elements of a stack ST, stored at locations 1, 2, 3, 4:

    4   D   <- TOP
    3   C
    2   B
    1   A
       ST

The TOP of this stack is 4.

Representation of a stack

A typical stack stores local data and call information for nested procedure calls (not necessarily nested procedures!). Such a stack grows downward from its origin. The stack pointer points to the current topmost datum on the stack. A push operation decrements the pointer and copies the data to the stack; a pop operation copies data from the stack and then increments the pointer. Each procedure called in the program stores procedure return information and local data by pushing them onto the stack. This type of stack implementation is extremely common, but it is vulnerable to buffer overflow attacks (see the text).

A typical stack is an area of computer memory with a fixed origin and a variable size. Initially the size of the stack is zero.
A stack pointer, usually in the form of a hardware register, points to the most recently referenced location on the stack; when the stack has a size of zero, the stack pointer points to the origin of the stack. The two operations applicable to all stacks are:
- a push operation, in which a data item is placed at the location pointed to by the stack pointer, and the address in the stack pointer is adjusted by the size of the data item;
- a pop or pull operation, in which the data item at the current location pointed to by the stack pointer is removed, and the stack pointer is adjusted by the size of the data item.

There are many variations on the basic principle of stack operations. Every stack has a fixed location in memory at which it begins. As data items are added to the stack, the stack pointer is displaced to indicate the current extent of the stack, which expands away from the origin. Stack pointers may point to the origin of a stack or to a limited range of addresses above or below the origin (depending on the direction in which the stack grows); however, the stack pointer cannot cross the origin of the stack. In other words, if the origin of the stack is at address 1000 and the stack grows downwards (towards addresses 999, 998, and so on), the stack pointer must never be incremented beyond 1000 (to 1001, 1002, etc.). If a pop operation causes the stack pointer to move past the origin of the stack, a stack underflow occurs. If a push operation causes the stack pointer to increment or decrement beyond the maximum extent of the stack, a stack overflow occurs.

PUSH(STACK, TOP, MAXSTK, ITEM)
(This algorithm inserts an element ITEM into the STACK. TOP is a pointer that indicates the top position of the stack and MAXSTK is the number of elements that can be placed in the stack.)
1. [Check for overflow] If TOP >= MAXSTK, then print "Overflow" and Return
2. [Increase TOP] Set TOP = TOP + 1
3. [Push the element] Set STACK[TOP] = ITEM
4. [Finish] Exit

POP(STACK, TOP, ITEM)
(This algorithm deletes the top element of the STACK and assigns it to the variable ITEM. TOP is the pointer that indicates the top position of the stack.)
1. [Check for underflow] If TOP = 0, then print "Underflow" and Return
2. [Pop the element] Set ITEM = STACK[TOP]
3. [Decrease TOP] Set TOP = TOP - 1
4. [Finish] Exit

Uses of stacks: ARITHMETIC EXPRESSIONS

Priority of operators: An arithmetic expression is a collection of operators and operands, where an operator may be unary or binary. The order in which the operators of an expression are evaluated is determined by the priority (precedence) of the operators. These operators work at three levels:

Highest priority:  exponentiation
Second priority:   multiplication and division
Third priority:    addition and subtraction

Suppose we want to evaluate the following expression:

    2 ^ 3 + 5 * 2 ^ 2 - 12 / 6

First of all we evaluate the exponentiations:

    8 + 5 * 4 - 12 / 6

Next come multiplication and division. When two operators have the same priority, as ( * ) and ( / ) do here, we evaluate from left to right, so the multiplication is done first:

    8 + 20 - 12 / 6

In the next step we perform the division:

    8 + 20 - 2

Finally, + and - again have the same priority, so we again evaluate from left to right. The addition comes first:

    28 - 2

The final step is the subtraction:

    26

POLISH NOTATION: When we write an arithmetic expression with each operator placed between its two operands, this is called infix notation. For example:

    X + Y    (where X and Y are two operands and + is the operator)

If we write an expression with the operator placed before its operands, this is called prefix or Polish notation. For example:

    + X Y    (where X and Y are two operands and + is the operator)

If the operator in an expression is placed after its operands, this is called postfix notation.
For example:

    X Y +    (where X and Y are two operands and + is the operator)

Conversion of an infix expression to Polish (prefix) notation: We translate the following infix expression step by step into Polish notation, using brackets [ ] to indicate a partial translation:

    A + (B * C)
    A + [* B C]
    + A * B C

Conversion of an infix expression to postfix notation: We translate the following infix expression step by step into postfix notation, using brackets [ ] to indicate a partial translation:

    A + (B * C)
    A + [B C *]
    A B C * +

Postfix evaluation: In normal algebra we use infix notation such as a+b*c; the corresponding postfix notation is abc*+. The algorithm for evaluating a postfix string is as follows:

- Scan the postfix string from left to right.
- Initialize an empty stack.
- If the scanned character is an operand, push it onto the stack.
- If the scanned character is an operator, there will be at least two operands on the stack. Store the topmost element of the stack (topStack) in a variable temp and pop the stack. Now evaluate topStack (operator) temp, with temp as the second operand; let the result of this operation be retVal. Pop the stack and push retVal onto it.
- Repeat these steps until all the characters are scanned. After all characters are scanned there will be only one element on the stack; return topStack.

Example: Let us see how the above algorithm works on the postfix string 123*+4-.

Initially the stack is empty. The first three characters scanned are 1, 2 and 3, which are operands, so they are pushed onto the stack in that order.

The next character scanned is "*", which is an operator. We pop the top two elements from the stack and perform the "*" operation on the two operands; the second operand is the first element popped. The value of the expression 2*3 that has been evaluated, 6, is pushed onto the stack, which now holds 1 and 6. The next character scanned is "+", which is an operator.
We pop the top two elements from the stack and perform the "+" operation on them; the second operand is the first element popped. The value of the expression 1+6 that has been evaluated, 7, is pushed onto the stack.

The next character scanned is "4", which is an operand, so it is pushed onto the stack.

The next character scanned is "-", which is an operator. We pop the top two elements from the stack and perform the "-" operation on them; the second operand is the first element popped. The value of the expression 7-4 that has been evaluated, 3, is pushed onto the stack.

Now, since all the characters have been scanned, the remaining element in the stack (there will be only one) is returned.

End result: postfix string 123*+4-, result 3.

Infix to postfix conversion: In normal algebra we use infix notation such as a+b*c; the corresponding postfix notation is abc*+. The algorithm for the conversion is as follows:

- Scan the infix string from left to right.
- Initialize an empty stack.
- If the scanned character is an operand, add it to the postfix string.
- If the scanned character is an operator and the stack is empty, push the character onto the stack.
- If the scanned character is an operator and the stack is not empty, compare the precedence of the character with that of the element on top of the stack (topStack). If topStack has higher precedence than the scanned character, pop the stack and add topStack to the postfix string; otherwise push the scanned character onto the stack. Repeat this comparison as long as the stack is not empty and topStack has precedence over the character.
- Repeat these steps until all the characters are scanned. (After all characters are scanned, any characters remaining on the stack must be added to the postfix string.)
- While the stack is not empty, add topStack to the postfix string and pop the stack.
- Return the postfix string.
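The postfix-evaluation algorithm above can be sketched in C for single-digit operands, with a plain array standing in for the stack. The name eval_postfix is our own.

```c
#include <assert.h>
#include <ctype.h>

/* Evaluate a postfix string of single-digit operands, e.g. "123*+4-". */
int eval_postfix(const char *s) {
    int stack[64], top = 0;           /* top == 0 means the stack is empty */
    for (; *s; s++) {
        if (isdigit((unsigned char)*s)) {
            stack[++top] = *s - '0';  /* operand: push its value */
        } else {
            int b = stack[top--];     /* second operand is popped first */
            int a = stack[top--];
            int r;
            switch (*s) {
            case '+': r = a + b; break;
            case '-': r = a - b; break;
            case '*': r = a * b; break;
            default:  r = a / b; break;   /* '/' */
            }
            stack[++top] = r;         /* push the partial result */
        }
    }
    return stack[top];                /* the single remaining element */
}
```

On the worked example above, eval_postfix("123*+4-") follows exactly the same stack states (1 2 3, then 1 6, then 7, then 7 4, then 3) and returns 3.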
QUEUE

A queue is a linear list of elements in which deletions can take place only at one end, called the FRONT, and insertions can take place only at the other end, called the REAR. These two terms are used only with queues. Queues are also called first-in, first-out (FIFO) lists. An example of a queue in daily life is a line of people waiting at a bank: the first person to join the line is the first to leave it. Another example is a line of people waiting for a bus at a bus stop: each new person takes his place at the end of the line, and when the bus comes, the person at the front of the line boards first. An important example of a queue in computer science occurs in a time-sharing system, in which programs with the same priority form a queue while waiting to be executed.

Representation of a queue in memory: The following states show a queue stored in memory using an array QUEUE with N elements, and indicate the way elements are deleted from and added to the queue.

    1. FRONT = 1, REAR = 4:   A  B  C  D  .  .  ...  .
    2. FRONT = 2, REAR = 4:   .  B  C  D  .  .  ...  .
    3. FRONT = 2, REAR = 6:   .  B  C  D  E  F  ...  .
    4. FRONT = 3, REAR = 6:   .  .  C  D  E  F  ...  .

Note that whenever an element is deleted from the queue, the value of FRONT is increased by 1; this is accomplished by FRONT = FRONT + 1. Similarly, whenever an element is added to the queue, the value of REAR is increased by 1; this is accomplished by REAR = REAR + 1.

QUEUE INSERTION (QUEUE, N, FRONT, REAR, ITEM)
(This algorithm inserts an element ITEM into the array QUEUE, where N is the number of elements the queue can hold and FRONT and REAR are the pointers.)
1. [Check for overflow] If FRONT = 1 and REAR = N, or FRONT = REAR + 1, then print "Overflow" and Return
2. [Increment REAR] If FRONT = NULL and REAR = NULL, then set FRONT = 1 and REAR = 1
   Else if REAR = N, then set REAR = 1
   Else set REAR = REAR + 1
   [End of if structure]
3. [Insert the element] Set QUEUE[REAR] = ITEM
4. [Finish] Exit

QUEUE DELETION (QUEUE, N, FRONT, REAR, ITEM)
(This algorithm deletes an element from the QUEUE and assigns it to the variable ITEM. N is the total number of elements, and FRONT and REAR are used as pointers.)
1. [Check for underflow] If FRONT = 0, then print "Underflow" and Return
2. [Delete the element] Set ITEM = QUEUE[FRONT]
3. [Increment FRONT] If FRONT = REAR, then set FRONT = REAR = 0
   Else if FRONT = N, then set FRONT = 1
   Else set FRONT = FRONT + 1
   [End of if structure]
4. [Finish] Exit

Circular Queue

In a standard queue data structure, a re-buffering problem occurs on each dequeue operation. This problem is solved by joining the front and rear ends of the queue, making it a circular queue. A circular queue is a linear data structure that follows the FIFO principle: the last slot is connected back to the first slot to make a circle. Elements are added at the rear end and deleted at the front end of the queue, and both the front and rear pointers initially point to the beginning of the array. A circular queue is also called a "ring buffer". Items can be inserted into and deleted from the queue in O(1) time.

A circular queue can be created in three ways:
· using a single linked list
· using a double linked list
· using arrays

Using a single linked list: This is an extension of the basic single linked list. In a circular linked list, instead of storing a NULL value in the last node of a single linked list, we store the address of the first node (root), which forms a circular linked list. Using a circular linked list it is possible to traverse directly to the first node after reaching the last node.
The following figure shows a circular single linked list.

Using a double linked list: In a circular double linked list, the right-side pointer of each node points to the address of the next node (or, for the last node, the address of the first node), and the left-side pointer points to the address of the previous node (or, for the first node, the address of the last node). Hence such a list is known as a circular double linked list. The following figure shows a circular double linked list.

Algorithm for creating a circular linked list:
Step 1) Start.
Step 2) Create a node with the following fields to store the information and the address of the next node:
    structure node
    begin
        int info
        pointer to structure node called next
    end
Step 3) Create a class called clist with member variables of pointer-to-node type called root, prev and next, and member functions create( ) to create the circular linked list and display( ) to display it.
Step 4) Create an object called C of clist type.
Step 5) Call the C.create( ) member function.
Step 6) Call the C.display( ) member function.
Step 7) Stop.

Algorithm for the create( ) function:
Step 1) Allocate the memory for newnode: newnode = new(node).
Step 2) Set newnode->next = newnode.   // circular
Step 3) Repeat steps 4 and 5 until choice = 'n'.
Step 4) If root = NULL, then
            root = prev = newnode   // prev is a running pointer that points to the last node of the list
        else
            newnode->next = root
            prev->next = newnode
            prev = newnode
Step 5) Read the choice.
Step 6) Return.

Using an array: In an array the range of a subscript is 0 to n-1, where n is the maximum size. We make the array circular by making subscript 0 the successor of subscript n-1, using the formula subscript = (subscript + 1) % maximum size. In a circular queue the front and rear pointers are updated using this formula.
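The wrap-around indexing described above can be sketched in C. This is a minimal sketch under our own names (MAX, CQueue, cq_enqueue, cq_dequeue); front and rear start at -1 to mark an empty queue, matching the en-queue and de-queue algorithms of the text.

```c
#include <assert.h>

#define MAX 5

typedef struct {
    int q[MAX];
    int front, rear;   /* both -1 when the queue is empty */
} CQueue;

/* En-queue using subscript = (subscript + 1) % MAX; returns 0 on overflow. */
int cq_enqueue(CQueue *c, int x) {
    if (c->front == (c->rear + 1) % MAX) return 0;   /* circular queue overflow */
    c->rear = (c->rear + 1) % MAX;
    c->q[c->rear] = x;
    if (c->front == -1) c->front = 0;                /* first element inserted */
    return 1;
}

/* De-queue; returns 0 on underflow, else stores the element in *x. */
int cq_dequeue(CQueue *c, int *x) {
    if (c->front == -1 && c->rear == -1) return 0;   /* circular queue underflow */
    *x = c->q[c->front];
    if (c->front == c->rear)
        c->front = c->rear = -1;                     /* queue is now empty */
    else
        c->front = (c->front + 1) % MAX;
    return 1;
}
```

Because the subscripts wrap around, slots freed at the front are reused at the rear, which is exactly what removes the re-buffering problem of the linear queue.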
The following figure shows a circular array.

Algorithm for the en-queue operation using an array:
Step 1. Start
Step 2. If (front == (rear + 1) % max), print the error "circular queue overflow"
Step 3. Else {
            rear = (rear + 1) % max
            Q[rear] = element
            if (front == -1) front = 0
        }
Step 4. Stop

Algorithm for the de-queue operation using an array:
Step 1. Start
Step 2. If ((front == rear) && (rear == -1)), print the error "circular queue underflow"
Step 3. Else {
            element = Q[front]
            if (front == rear) front = rear = -1
            else front = (front + 1) % max
        }
Step 4. Stop

Recursion

Recursion in computer science is a method where the solution to a problem depends on solutions to smaller instances of the same problem. The approach can be applied to many types of problems, and is one of the central ideas of computer science.

"The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions."
otherwise, return [ n × factorial(n-1) ]
end factorial

The function can also be written as a recurrence relation:

    0! = 1
    n! = n × (n-1)!    for n > 0

Evaluating this recurrence for n = 4, for example, performs the same computation as the pseudo code above: 4! = 4 × 3! = 4 × 3 × 2! = 4 × 3 × 2 × 1! = 4 × 3 × 2 × 1 × 0! = 24.

Unit 5th: TREES

A tree is a non-linear data structure. This structure is mainly used to represent data containing a hierarchical relationship between elements, for example records, family relationships and table contents. The earth's structure is a good example of a tree; represented graphically, it gives the following diagram:

    EARTH
        Africa
        Europe
        Asia
            China
            Pakistan
        N-America
            USA
        S-America

All the data elements in a tree are called nodes. The node which has no preceding node is called the ROOT node, and it is considered to be at level number one. If a tree has a root node N with left and right successors S1 and S2, then N is called the parent (or father), S1 is called the left child and S2 the right child. S1 and S2 are called brothers or siblings. Nodes on the same level are called "nodes of the same generation".

BINARY TREE: A tree is said to be a binary tree if every node has at most two children; in other words, each node of a binary tree has zero, one or two children. The binary tree is a very important tree structure.

Complete binary tree: A binary tree is said to be complete if every node except the nodes on the last level has exactly two children.

Similar binary trees: Two binary trees T1 and T2 are said to be similar if they have the same structure; in other words, two trees of the same shape are called similar trees.

Copy of a tree: Two binary trees T1 and T2 are said to be copies of each other if they have the same structure and the same contents.

Representing binary trees in memory

Array representation: For a complete or almost complete binary tree, storing the binary tree as an array may be a good choice. One way to do this is to store the root of the tree in the first element of the array.
Then, for each node in the tree that is stored at subscript k, the node's left child can be stored at subscript 2k+1 and the right child at subscript 2k+2. An almost complete binary tree can be stored in an array in exactly this way. However, if this scheme is used to store a binary tree that is not complete or almost complete, we can end up with a great deal of wasted space in the array, because a subscript must be reserved for every possible node position down to the deepest level.

Linked Representation

If a binary tree is not complete or almost complete, a better choice for storing it is to use a linked representation similar to the linked list structures covered earlier in the semester. Each tree node has two pointers (usually named left and right). The tree class has a pointer to the root node of the tree (usually labeled root). Any pointer in the tree structure that does not point to a node will normally contain the value NULL. A linked tree with N nodes will always contain N + 1 null links.

Binary tree

A simple binary tree of size 9 and height 3, with a root node whose value is 2; this tree is unbalanced and not sorted.

In computer science, a binary tree is a tree data structure in which each node has at most two child nodes, usually distinguished as "left" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to the "root" node (the ancestor of all nodes), if it exists. Any node in the data structure can be reached by starting at the root node and repeatedly following references to either the left or the right child. A tree which does not have any node other than the root node is called a null tree. In a binary tree the degree of every node is at most two, and a tree with n nodes has exactly n − 1 branches (edges). Binary trees are used to implement binary search trees and binary heaps.
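As a minimal sketch (hypothetical helper names, 0-based indexing as in the text, with example values chosen purely for illustration), the index arithmetic of the array representation looks like this:

```python
# Array representation of a complete/almost complete binary tree (0-based):
# the root lives at index 0; the children of the node at index k live at
# indices 2k + 1 and 2k + 2, and its parent lives at (k - 1) // 2.

def left(k):
    return 2 * k + 1

def right(k):
    return 2 * k + 2

def parent(k):
    return (k - 1) // 2

# The almost complete tree
#         2
#       /   \
#      7     5
#     / \     \
#    2   6     9
# stored level by level; None marks the absent left child of 5.
tree = [2, 7, 5, 2, 6, None, 9]

assert tree[left(0)] == 7 and tree[right(0)] == 5
assert tree[right(2)] == 9        # right child of the node holding 5
assert parent(right(2)) == 2      # and back up to that node again
```

The `None` entry shows the wasted-space problem directly: every missing node still costs one array slot.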
Definitions for rooted trees

- A directed edge refers to the link from the parent to the child (the arrows in a picture of the tree).
- The root node of a tree is the node with no parent. There is at most one root node in a rooted tree.
- A leaf node has no children.
- The depth of a node n is the length of the path from the root to the node. The set of all nodes at a given depth is sometimes called a level of the tree. The root node is at depth zero.
- The depth (or height) of a tree is the length of the path from the root to the deepest node in the tree. A (rooted) tree with only one node (the root) has a depth of zero.
- Siblings are nodes that share the same parent node.
- A node p is an ancestor of a node q if it lies on the path from the root to q. The node q is then termed a descendant of p.
- The size of a node is the number of descendants it has, including itself.
- The in-degree of a node is the number of edges arriving at that node; the out-degree is the number of edges leaving it. The root is the only node in the tree with in-degree = 0, and all the leaf nodes have out-degree = 0.

Types of binary trees

(Tree rotations are very common internal operations on self-balancing binary trees.)

- A rooted binary tree is a tree with a root node in which every node has at most two children.
- A full binary tree (sometimes called a proper binary tree, 2-tree, or strictly binary tree) is a tree in which every node other than the leaves has two children; or, perhaps more clearly, every node has exactly 0 or 2 children. Sometimes a full tree is ambiguously defined as a perfect tree.
- A perfect binary tree is a full binary tree in which all leaves are at the same depth (the same level), and in which every parent has two children. (This is ambiguously also called a complete binary tree.)
- A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible.
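The distinction between "full" and "complete" can be made concrete with a small sketch (the `Node` class and helper names here are illustrative, not from the text):

```python
from collections import deque

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def is_full(n):
    """Full: every node has exactly 0 or 2 children."""
    if n is None:
        return True
    if (n.left is None) != (n.right is None):   # exactly one child -> not full
        return False
    return is_full(n.left) and is_full(n.right)

def is_complete(root):
    """Complete: every level full except possibly the last, filled left to
    right -- so a level-order walk never sees a real node after a gap."""
    q, seen_gap = deque([root]), False
    while q:
        n = q.popleft()
        if n is None:
            seen_gap = True
        else:
            if seen_gap:
                return False
            q.append(n.left)
            q.append(n.right)
    return True

#       1              1
#      / \            / \
#     2   3          2   3
#    / \            /
#   4   5          4
full_tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
complete_tree = Node(1, Node(2, Node(4)), Node(3))

assert is_full(full_tree) and is_complete(full_tree)
assert is_complete(complete_tree) and not is_full(complete_tree)
```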
An infinite complete binary tree is a tree with a countably infinite number of levels, in which every node has two children, so that there are 2^d nodes at level d. The set of all nodes is countably infinite, but the set of all infinite paths from the root is uncountable: it has the cardinality of the continuum. These paths correspond, by an order-preserving bijection, to the points of the Cantor set, or (through the example of the Stern–Brocot tree) to the set of positive irrational numbers.

A balanced binary tree is commonly defined as a binary tree in which the depths of the two subtrees of every node differ by 1 or less,[3] although in general it is a binary tree where no leaf is much farther away from the root than any other leaf. (Different balancing schemes allow different definitions of "much farther".[4]) Binary trees that are balanced according to this definition have a predictable depth (how many nodes are traversed from the root to a leaf, the root counting as node 0 and subsequent ones as 1, 2, ..., depth). This depth is equal to the integer part of log2(n), where n is the number of nodes in the balanced tree.

Example 1: a balanced tree with 1 node has depth = floor(log2(1)) = 0.
Example 2: a balanced tree with 3 nodes has depth = floor(log2(3)) = 1.
Example 3: a balanced tree with 5 nodes has depth = floor(log2(5)) = 2.

A degenerate tree is a tree where for each parent node there is only one associated child node. This means that in a performance measurement the tree will behave like a linked list data structure.

Note that this terminology often varies in the literature, especially with respect to the meaning of "complete" and "full".

Traversing a Binary Tree: There are the following three standard ways of traversing the elements of a binary tree.

1) Pre Order:
   Process the ROOT.
   Process the LEFT subtree (in pre order).
   Process the RIGHT subtree (in pre order).

2) In Order:
   Process the LEFT subtree (in in-order).
   Process the ROOT.
   Process the RIGHT subtree (in in-order).
3) Post Order:
   Process the LEFT subtree (in post order).
   Process the RIGHT subtree (in post order).
   Process the ROOT.

Example: take the expression [a + (b − c)] * [(d − e) / (f + g − h)]. The following tree represents this expression:

    *
    ├── +
    │   ├── a
    │   └── −
    │       ├── b
    │       └── c
    └── /
        ├── −
        │   ├── d
        │   └── e
        └── −
            ├── +
            │   ├── f
            │   └── g
            └── h

Pre order traversal:  * + a − b c / − d e − + f g h
Post order traversal: a b c − + d e − f g + h − / *

ALGORITHM FOR PREORDER TRAVERSAL

Suppose T is a binary tree. With the help of this algorithm we access the elements of T in pre order, using a stack to temporarily hold the nodes.

Step-1 [Initialize the pointers] Set TOP := 1, STACK[TOP] := NULL and PTR := ROOT
Step-2 [Set the loop] Repeat steps 3 to 5 while PTR ≠ NULL
Step-3     Apply the process to PTR
Step-4     If RIGHT[PTR] ≠ NULL, then [push the right child onto the stack] set TOP := TOP + 1 and STACK[TOP] := RIGHT[PTR]
Step-5     If LEFT[PTR] ≠ NULL, then set PTR := LEFT[PTR]; else [pop the stack] set PTR := STACK[TOP] and TOP := TOP − 1
       [End of the loop]
Step-6 [End] Exit

ALGORITHM FOR INORDER TRAVERSAL

Suppose T is a binary tree. With the help of this algorithm we access the elements of T in in-order, using a stack to temporarily hold the nodes.

Step-1 [Initialize the pointers] Set TOP := 1, STACK[TOP] := NULL and PTR := ROOT
Step-2 Repeat steps (i) and (ii) while PTR ≠ NULL
       (i)  Set TOP := TOP + 1 and STACK[TOP] := PTR
       (ii) Set PTR := LEFT[PTR]
Step-3 Set PTR := STACK[TOP] and TOP := TOP − 1
Step-4 Repeat steps 5 to 7 while PTR ≠ NULL
Step-5     Apply the process to PTR
Step-6     If RIGHT[PTR] ≠ NULL, then set PTR := RIGHT[PTR] and go to Step 2
Step-7     Set PTR := STACK[TOP] and TOP := TOP − 1
       [End of the loop]
Step-8 [END] Exit

ALGORITHM FOR POSTORDER TRAVERSAL

Suppose T is a binary tree. With the help of this algorithm we access the elements of T in post order, using a stack to temporarily hold the nodes.

Step-1 [Initialize the pointers] Set TOP := 1, STACK[TOP] := NULL and PTR := ROOT
Step-2 [Push the left path onto the stack] Repeat steps 3 to 5 while PTR ≠ NULL
Step-3     Set TOP := TOP + 1 and STACK[TOP] := PTR
Step-4     If RIGHT[PTR] !
= NULL, then set TOP := TOP + 1 and STACK[TOP] := −RIGHT[PTR] [push the right child marked with a minus sign, so that it is processed only after its subtree]
Step-5     Set PTR := LEFT[PTR]
       [End of the Step-2 loop]
Step-6 [Pop the stack] Set PTR := STACK[TOP] and TOP := TOP − 1
Step-7 Repeat while PTR > 0:
           Apply the process to PTR
           Set PTR := STACK[TOP] and TOP := TOP − 1
Step-8 If PTR < 0, then set PTR := −PTR and go to Step 2
Step-9 Exit

Binary search tree

Type: Tree

Time and space complexity in big O notation:

                 Space   Search     Insert     Delete
    Average      O(n)    O(log n)   O(log n)   O(log n)
    Worst case   O(n)    O(n)       O(n)       O(n)

A binary search tree of size 9 and depth 3, with root 8 and leaves 1, 4, 7 and 13.

In computer science, a binary search tree (BST), which may sometimes also be called an ordered or sorted binary tree, is a node-based binary tree data structure which has the following properties:[1]

- The left subtree of a node contains only nodes with keys less than the node's key.
- The right subtree of a node contains only nodes with keys greater than the node's key.
- Both the left and right subtrees must also be binary search trees.
- There must be no duplicate nodes.

Generally, the information represented by each node is a record rather than a single data element. However, for sequencing purposes, nodes are compared according to their keys rather than any part of their associated records. The major advantage of binary search trees over other data structures is that the related sorting algorithms and search algorithms, such as in-order traversal, can be very efficient. Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associative arrays.

Operations

Operations on a binary search tree require comparisons between nodes. These comparisons are made with calls to a comparator, which is a subroutine that computes the total order (linear order) on any two keys. This comparator can be explicitly or implicitly defined, depending on the language in which the BST is implemented.

Searching

Searching a binary search tree for a specific key can be a recursive or iterative process.
We begin by examining the root node. If the tree is null, the key we are searching for does not exist in the tree. Otherwise, if the key equals that of the root, the search is successful. If the key is less than the root's, search the left subtree. Similarly, if it is greater than the root's, search the right subtree. This process is repeated until the key is found or the remaining subtree is null. If the searched key is not found before a null subtree is reached, then the item must not be present in the tree.

Here is the search algorithm in pseudocode (iterative version; finds a BST node):

    algorithm Find(key, root):
        current-node := root
        while current-node is not Nil do
            if current-node.key = key then
                return current-node
            else if key < current-node.key then
                current-node := current-node.left
            else
                current-node := current-node.right

The following recursive version is equivalent:

    algorithm Find-recursive(key, node):   // call initially with node = root
        if node = Nil or node.key = key then
            return node
        else if key < node.key then
            return Find-recursive(key, node.left)
        else
            return Find-recursive(key, node.right)

This operation requires O(log n) time in the average case, but needs O(n) time in the worst case, when the unbalanced tree resembles a linked list (a degenerate tree).

Insertion

Insertion begins as a search would begin; if the key is not equal to that of the root, we search the left or right subtree as before. Eventually, we will reach an external node and add the new key-value pair (here encoded as a record 'newNode') as its left or right child, depending on the node's key. In other words, we examine the root and recursively insert the new node into the left subtree if its key is less than that of the root, or into the right subtree if its key is greater than or equal to that of the root.
Here is the general pseudocode for BSTINSERT(V, T) (assume V is not in T):

    BSTINSERT(V, T) {
        if T is empty then
            T = create_singleton(V)
        else if V > rootvalue(T) then
            if T's right subtree exists then
                BSTINSERT(V, T's right subtree)
            else
                T's right subtree = create_singleton(V)
        else
            if T's left subtree exists then
                BSTINSERT(V, T's left subtree)
            else
                T's left subtree = create_singleton(V)
    }

I hope you can see that this can very easily be written using our tree operations. I also hope you see that this is exactly the same as the SEARCH(V, T) operation described above, except for

1. the processing we do in the base case (empty tree or subtree), and
2. the assumption, for INSERT, that V is not in the tree.

Deletion

There are three possible cases to consider:

- Deleting a leaf (a node with no children): deleting a leaf is easy, as we can simply remove it from the tree.
- Deleting a node with one child: remove the node and replace it with its child.
- Deleting a node with two children: call the node to be deleted N. Do not delete N. Instead, choose either its in-order successor node or its in-order predecessor node, R. Replace the value of N with the value of R, then delete R.

As with all binary trees, a node's in-order successor is the left-most child of its right subtree, and a node's in-order predecessor is the right-most child of its left subtree. In either case, this node will have zero or one children. Delete it according to one of the two simpler cases above.

Deleting a node with two children from a binary search tree: the triangles represent subtrees of arbitrary size, each with its leftmost and rightmost child nodes at the bottom two vertices.

Consistently using the in-order successor or the in-order predecessor for every instance of the two-child case can lead to an unbalanced tree, so good implementations add inconsistency to this selection.
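The three deletion cases can be sketched as follows (a minimal sketch with a hypothetical `Node` class, using the in-order successor for the two-child case):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def delete(root, key):
    """Delete `key` from the BST rooted at `root`; return the new root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Cases 1 and 2: zero or one child -- splice the node out.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Case 3: two children -- copy the in-order successor's key
        # (the left-most node of the right subtree), then delete that
        # node, which has at most one child, from the right subtree.
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root

def inorder(n):
    return inorder(n.left) + [n.key] + inorder(n.right) if n else []

# Delete the two-child node 3 from the tree rooted at 8.
root = Node(8, Node(3, Node(1), Node(6, Node(4), Node(7))), Node(10, None, Node(14)))
root = delete(root, 3)
assert inorder(root) == [1, 4, 6, 7, 8, 10, 14]
```

Note that the in-order listing stays sorted after the deletion, which is exactly the BST property being preserved.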
Running time analysis: although this operation does not always traverse the tree down to a leaf, this is always a possibility; thus in the worst case it requires time proportional to the height of the tree. It does not require more even when the node has two children, since it still follows a single path and does not visit any node twice.

Deleting a Node From a Binary Search Tree

Of course, if we are trying to delete a leaf, there is no problem. We just delete it and the rest of the tree is exactly as it was, so it is still a BST.

There is another simple situation: suppose the node we are deleting has only one subtree. In the example tree, '3' has only one subtree. To delete a node with one subtree, we just "link past" the node, i.e. connect the parent of the node directly to the node's only subtree. This always works, whether the one subtree is on the left or on the right, and deleting '3' in this way leaves a valid BST.

Finally, let us consider the only remaining case: how to delete a node having two subtrees. For example, how do we delete '6'? We would like to do this with the minimum amount of work and disruption to the structure of the tree. The standard solution is based on this idea: we leave the node containing '6' exactly where it is, but we get rid of the value 6 and find another value to store in the '6' node. This value is taken from a node below the '6' node, and it is that node that is actually removed from the tree.

So, here is the plan: erase 6, but keep its node. Now, what value can we move into the vacated node and still have a binary search tree? Here is how to figure it out. If we choose value X, then:

1. everything in the left subtree must be smaller than X, and
2. everything in the right subtree must be bigger than X.

Let us suppose we are going to get X from the left subtree. (2) is guaranteed, because everything in the left subtree is smaller than everything in the right subtree. What about (1)?
If X is coming from the left subtree, (1) says that there is a unique choice for X: we must choose X to be the largest value in the left subtree. In our example, 3 is the largest value in the left subtree. So if we put 3 in the vacated node and delete it from its current position, we will have a BST with 6 deleted.

So our general algorithm is: to delete N, if it has two subtrees, replace the value in N with the largest value in its left subtree, and then delete the node with the largest value from its left subtree.

Note: the largest value in the left subtree will never have two subtrees. Why? Because if it is the largest value, it cannot have a right subtree.

Finally, there is nothing special about the left subtree. We could do the same thing with the right subtree: just use the smallest value in the right subtree.

Unit 6th: Searching and Sorting

Searching

Searching means the operation of finding the location of an element in an array. If the element is found in the array, the search is successful; if the element is not found, the search is said to be unsuccessful. Let LA be a collection of data elements in memory, and suppose a specific ITEM of information is given. Searching refers to the operation of finding the location of ITEM in LA. The search is said to be successful if ITEM is found and unsuccessful otherwise.

Two important and useful methods of searching are:

- Linear Search
- Binary Search

Linear search

In this search a particular element is searched for sequentially in the whole list. Suppose LA is a linear array with N elements. Given no other information about LA, the most intuitive way to search for a given ITEM in LA is to compare ITEM with each element of LA one by one. To simplify the matter, we first assign ITEM to LA[N+1], the position following the last element of LA.
Now ITEM (stored in LA[N+1]) is compared with LA[1]; if both are the same then LOC = 1, and if not then it is compared with LA[2], and so on. If LOC is found before reaching LA[N+1], the search is successful; otherwise the search is unsuccessful. The purpose of this initial assignment is to avoid repeatedly testing whether or not we have reached the end of the array. This method of searching is called linear search or sequential search.

Algorithm for linear search

(Here LA is a linear array with N elements, and ITEM is a given item of information. This algorithm finds the location LOC of ITEM in LA.)

1. [Insert item at the end of LA] Set LA[N+1] := ITEM
2. [Initialize counter] Set LOC := 1
3. [Search for item] Repeat while LA[LOC] ≠ ITEM: Set LOC := LOC + 1 [End of loop]
4. If LOC = N+1, then the search is unsuccessful; else the search is successful
5. Exit

Binary Search

Binary search is a very effective method when we have a sorted list. In this search process we find the middle element of the list and compare it with the element being searched for. If the middle element is equal to the search element, the search is successful; otherwise we check whether the search element is greater or less than the middle element. If it is greater, we continue searching in the right half; if it is less, we continue searching in the left half.

Procedure of Binary Search: if we have a list DATA of N elements and we want to find the location of the element ITEM, we find the middle element as MID = INT((BEG + END)/2), where BEG is the first position and END is the final position. Consider the following example:

    Position: 1   2   3   4   5   6   7   8   9   10  11
    Value:    11  22  30  40  44  55  60  66  77  80  88

Suppose we want to search for 40 in the above data using the binary search method. According to this method, we first find the middle element and then compare it with the element being searched for, which is 40 in our case.
MID = INT((BEG + END)/2) = INT((1 + 11)/2) = 6

The 6th element is 55. The MID element is not equal to 40, and the search element is less than the MID element, so we search in the left half of the list. We set END = MID − 1 = 5 and recalculate:

MID = INT((BEG + END)/2) = INT((1 + 5)/2) = 3

The 3rd element is 30, which is still not equal to the search element; this time the search element is greater than the MID element, so we search in the right half. We set BEG = MID + 1 = 4 and recalculate:

MID = INT((BEG + END)/2) = INT((4 + 5)/2) = 4

The 4th element is 40, which is equal to the search element, so the search is successful.

ALGORITHM FOR BINARY SEARCH

(LA is a sorted list of elements with lower bound LB and upper bound UB, and ITEM is the element being searched for.)

Step-1 [Initialize the counters] Set BEG := LB, END := UB and MID := INT((BEG + END)/2)
Step-2 [Set the loop] Repeat steps 3 and 4 while BEG <= END and LA[MID] ≠ ITEM
Step-3     If ITEM < LA[MID], then set END := MID − 1; else set BEG := MID + 1
Step-4     Set MID := INT((BEG + END)/2)
       [End of the loop]
Step-5 If LA[MID] = ITEM, then the search is successful; else the search is unsuccessful
Step-6 [Finish] Exit

Sorting

Sorting is a process in which we arrange the elements of a list in ascending or descending order. The operation of sorting is performed most often in business data processing applications, but sorting is an important process in almost every application. In the process of sorting, almost all the elements move from one place to another, and this arranging of elements is called sorting. Let A be a list of n numbers. Sorting A refers to the operation of rearranging the elements of A so that they are in increasing or decreasing order, i.e.

    A[1] < A[2] < A[3] < ... < A[N]   or   A[1] > A[2] > A[3] > ... > A[N].

There are many techniques of sorting, adopted according to the requirements of the situation.
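Before turning to the individual sorting techniques, the two search methods above can be sketched in Python (0-based indexing rather than the 1-based LA[LB..UB] of the text; function names are illustrative):

```python
def linear_search(la, item):
    """Sequential search with the text's sentinel trick: appending `item`
    at position N+1 means the loop needs no end-of-array test.
    Returns a 0-based index, or -1 if the item is absent."""
    la = la + [item]                 # sentinel at position N+1
    loc = 0
    while la[loc] != item:
        loc += 1
    return loc if loc < len(la) - 1 else -1

def binary_search(la, item):
    """Binary search of the sorted list `la`; index or -1."""
    beg, end = 0, len(la) - 1
    while beg <= end:
        mid = (beg + end) // 2       # MID = INT((BEG + END)/2)
        if la[mid] == item:
            return mid
        elif item < la[mid]:
            end = mid - 1            # continue in the left half
        else:
            beg = mid + 1            # continue in the right half
    return -1                        # unsuccessful

data = [11, 22, 30, 40, 44, 55, 60, 66, 77, 80, 88]
assert binary_search(data, 40) == 3   # 40 is the 4th element (index 3)
assert linear_search(data, 40) == 3
assert binary_search(data, 45) == -1
```

The probe sequence for 40 here is 55, 30, 40, matching the worked example above.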
QUICK SORT: In this method of sorting, the first element of the list (from the very left side) is selected and moved to its proper position by comparisons in a special sequence. If we have a list of N elements, the first element from the left is selected and compared, from right to left, with the other elements; if an element is found that is less than the selected element, the two are interchanged. In the next step the selected element, now at its new location, is compared from left to right; if an element is found that is greater than the selected element, the two are swapped. This process is repeated until the selected element reaches its proper position. When the selected element reaches its proper position, the list is split into two sublists, and the same procedure is applied to each sublist. In the end the complete list will be sorted. This procedure is also called partition-exchange sort, as the list is divided into partitions.

For example, suppose A is the list of the following 12 numbers:

    44, 33, 11, 55, 77, 90, 40, 60, 99, 22, 88, 66

The reduction step of the quick sort algorithm finds the final position of one of the given numbers; here we select the first number, 44. This is accomplished as follows. Beginning with the last number, 66, scan the list from right to left, comparing each number with 44 and stopping at the first number less than 44. That number is 22. Interchange 44 and 22 to obtain the list

    22, 33, 11, 55, 77, 90, 40, 60, 99, 44, 88, 66

Now scan from left to right, comparing each element with 44 and stopping at the first number greater than 44. It is 55. Interchange 44 and 55 to obtain the list

    22, 33, 11, 44, 77, 90, 40, 60, 99, 55, 88, 66

Now again scan the list from right to left until meeting the first number less than 44. It is 40.
Interchange them to obtain the list

    22, 33, 11, 40, 77, 90, 44, 60, 99, 55, 88, 66

Now scan the list from left to right until meeting the first number greater than 44. It is 77. Interchange them to obtain the list

    22, 33, 11, 40, 44, 90, 77, 60, 99, 55, 88, 66

Now if we scan the list from the right we find that 44 has reached its proper position: there is no element less than 44 on its right side and no element greater than 44 on its left side. The original list is thus divided into two sublists:

    22, 33, 11, 40   and   90, 77, 60, 99, 55, 88, 66

The above reduction step is repeated with each sublist containing two or more elements. Since we can process only one sublist at a time, we must be able to keep track of the remaining sublists for future processing.

ALGORITHM OF QUICK (LA, N, BEG, END, LOC, LEFT, RIGHT)

(Here LA is an array with N elements. Parameters BEG and END contain the boundary values of the sublist of LA to which this procedure applies. LOC keeps track of the position of the first element LA[BEG] of the sublist during the procedure. The local variables LEFT and RIGHT contain the boundary values of the list of elements that have not yet been scanned.)
Step-1 [Initialize] Set LEFT := BEG, RIGHT := END and LOC := BEG
Step-2 [Scan from right to left]
       (a) Repeat while LA[LOC] <= LA[RIGHT] and LOC ≠ RIGHT: RIGHT := RIGHT − 1 [End of loop]
       (b) If LOC = RIGHT, then Return
       (c) If LA[LOC] > LA[RIGHT], then:
Step-3 [Interchange LA[LOC] and LA[RIGHT]]
       TEMP := LA[LOC], LA[LOC] := LA[RIGHT], LA[RIGHT] := TEMP
       Set LOC := RIGHT and go to Step 4
       [End of if structure]
Step-4 [Scan from left to right]
       (a) Repeat while LA[LEFT] <= LA[LOC] and LEFT ≠ LOC: LEFT := LEFT + 1 [End of loop]
       (b) If LOC = LEFT, then Return
       (c) If LA[LEFT] > LA[LOC], then [interchange LA[LEFT] and LA[LOC]]:
       TEMP := LA[LOC], LA[LOC] := LA[LEFT], LA[LEFT] := TEMP
       Set LOC := LEFT and go to Step 2
       [End of if structure]

BUBBLE SORT: In this method of sorting, each element is compared with the succeeding element, and if the preceding element is found to be greater than the succeeding element, the two elements are interchanged. In this way the largest element sinks to the end of the array and the smallest elements bubble up toward the start of the array, which is why this process is called bubble sort.

For example, suppose the list of numbers LA[1], LA[2], ..., LA[N] is in memory. The bubble sort algorithm works as follows.

Step 1. Compare LA[1] and LA[2] and arrange them in the desired order so that LA[1] < LA[2]. Then compare LA[2] and LA[3] and arrange them so that LA[2] < LA[3]. Then compare LA[3] and LA[4] and arrange them so that LA[3] < LA[4]. Continue this procedure until you compare LA[N−1] and LA[N] and arrange them so that LA[N−1] < LA[N]. After Step 1 the largest element takes its place in LA[N].

Step 2. Repeat Step 1 with one less comparison; i.e., now we stop after we compare and possibly rearrange LA[N−2] and LA[N−1]. After Step 2 the second largest element takes its place in LA[N−1].

Step 3. Repeat Step 1 with two less comparisons; i.e., we stop after we compare and possibly rearrange LA[N−3] and LA[N−2].

......................................

Step N−1.
Compare LA[1] and LA[2] and arrange them so that LA[1] < LA[2]. After Step N−1 the list will be sorted in ascending order.

The process of sequentially traversing through all or part of a list is frequently called a PASS, so each of the above steps is called a pass. Accordingly, the bubble sort algorithm requires N−1 passes, where N is the number of input items.

ALGORITHM FOR BUBBLE SORT

(LA is a linear array with N elements. PTR points to the element of the array currently being compared with its successor.)

1. Repeat steps 2 and 3 for K := 1 to N−1
2. [Initialize the pointer] Set PTR := 1
3. [Execute pass] Repeat while PTR <= N−K:
   (i)  If LA[PTR] > LA[PTR+1], then interchange LA[PTR] and LA[PTR+1] [End of if structure]
   (ii) Set PTR := PTR + 1
   [End of inner loop]
   [End of Step-1 outer loop]
4. Exit

INSERTION SORT: Suppose an array LA with n elements LA[1], LA[2], ..., LA[n] is in memory. The insertion sort algorithm scans LA from LA[1] to LA[n], inserting each element LA[k] into its proper position among the previously sorted elements. The following steps are taken:

Pass 1. LA[1] by itself is trivially sorted.
Pass 2. LA[2] is inserted either before or after LA[1], so that LA[1], LA[2] is sorted.
Pass 3. LA[3] is inserted into its proper place: either before LA[1], between LA[1] and LA[2], or after LA[2], so that LA[1], LA[2], LA[3] is sorted.
Pass 4. LA[4] is inserted into its proper place: before LA[1], between LA[1] and LA[2], between LA[2] and LA[3], or after LA[3], so that LA[1], LA[2], LA[3], LA[4] is sorted.
... and so on, until finally:
Pass N. LA[n] is inserted into its proper place among LA[1], ..., LA[n−1], so that LA[1], LA[2], ..., LA[n] is sorted.

Consider the following data, sorted using insertion sort.
(Each row shows the array at the beginning of pass K; pass K inserts A[K] into place. A[0] holds the sentinel −∞ throughout.)

    Pass     A[0]  A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8]
    K=1      −∞    77   33   44   11   88   22   66   55
    K=2      −∞    77   33   44   11   88   22   66   55
    K=3      −∞    33   77   44   11   88   22   66   55
    K=4      −∞    33   44   77   11   88   22   66   55
    K=5      −∞    11   33   44   77   88   22   66   55
    K=6      −∞    11   33   44   77   88   22   66   55
    K=7      −∞    11   22   33   44   77   88   66   55
    K=8      −∞    11   22   33   44   66   77   88   55
    Sorted   −∞    11   22   33   44   55   66   77   88

ALGORITHM: INSERTION SORT (LA, N, PTR, TEMP, K)

(This algorithm sorts the array LA with N elements. TEMP is a temporary location holding the element being inserted, and PTR scans the sorted sublist, with 1 <= K <= N.)

1. [Initialize the sentinel element] Set LA[0] := −∞ (a value smaller than every element)
2. Repeat steps 3 to 5 for K = 2, 3, ..., N
3. Set TEMP := LA[K] and PTR := K − 1
4. Repeat while TEMP < LA[PTR]:
   (a) [Move element forward] Set LA[PTR+1] := LA[PTR]
   (b) Set PTR := PTR − 1
   [End of loop]
5. [Insert the element] Set LA[PTR+1] := TEMP
   [End of Step-2 loop]
6. [Finish] Return

Merge sort

Merge sort is based on the divide-and-conquer paradigm. Its worst-case running time has a lower order of growth than insertion sort. Since we are dealing with subproblems, we state each subproblem as sorting a subarray A[p .. r]. Initially, p = 1 and r = n, but these values change as we recurse through subproblems. To sort A[p .. r]:

1. Divide Step. If a given array A has zero or one element, simply return; it is already sorted. Otherwise, split A[p .. r] into two subarrays A[p .. q] and A[q+1 .. r], each containing about half of the elements of A[p .. r]. That is, q is the halfway point of A[p .. r].
2. Conquer Step. Conquer by recursively sorting the two subarrays A[p .. q] and A[q+1 .. r].
3. Combine Step. Combine the elements back in A[p .. r] by merging the two sorted subarrays A[p .. q] and A[q+1 .. r] into a sorted sequence. To accomplish this step, we will define a procedure MERGE(A, p, q, r).

Note that the recursion bottoms out when the subarray has just one element, so that it is trivially sorted.
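A sketch of these divide/conquer/combine steps in Python (recursive, returning a new list rather than sorting A[p..r] in place as the pseudocode below does):

```python
def merge_sort(a):
    """Divide-and-conquer sort: split, recursively sort, then merge."""
    if len(a) <= 1:                  # base case: already sorted
        return a
    q = len(a) // 2                  # divide step: halfway point
    left = merge_sort(a[:q])         # conquer step
    right = merge_sort(a[q:])        # conquer step
    return merge(left, right)        # combine step

def merge(left, right):
    """Merge two sorted lists in Theta(n) time."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # take the smaller "top card"
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])             # one pile is empty: take the rest
    out.extend(right[j:])
    return out

assert merge_sort([44, 33, 11, 55, 77, 90, 40]) == [11, 33, 40, 44, 55, 77, 90]
```

Slicing sidesteps the sentinel trick discussed below; an in-place version indexed by p, q, r would follow the pseudocode more literally.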
Algorithm: Merge Sort

To sort the entire sequence A[1 .. n], make the initial call MERGE-SORT(A, 1, n).

    MERGE-SORT (A, p, r)
    1. IF p < r THEN                  // check for base case
    2.     q = FLOOR[(p + r)/2]       // divide step
    3.     MERGE-SORT (A, p, q)       // conquer step
    4.     MERGE-SORT (A, q + 1, r)   // conquer step
    5.     MERGE (A, p, q, r)         // combine step

Example: bottom-up view of the above procedure for n = 8.

Merging

What remains is the MERGE procedure. The following are the input and output of the MERGE procedure.

INPUT: Array A and indices p, q, r such that p ≤ q ≤ r, subarray A[p .. q] is sorted and subarray A[q+1 .. r] is sorted. By the restrictions on p, q, r, neither subarray is empty.

OUTPUT: The two subarrays are merged into a single sorted subarray in A[p .. r].

We implement it so that it takes Θ(n) time, where n = r − p + 1, which is the number of elements being merged.

Idea behind Linear-Time Merging

Think of two piles of cards. Each pile is sorted and placed face-up on a table with the smallest cards on top. We will merge these into a single sorted pile, face-down on the table. A basic step:

- Choose the smaller of the two top cards.
- Remove it from its pile, thereby exposing a new top card.
- Place the chosen card face-down onto the output pile.

Repeatedly perform basic steps until one input pile is empty. Once one input pile empties, just take the remaining input pile and place it face-down onto the output pile. Each basic step should take constant time, since we check just the two top cards. There are at most n basic steps, since each basic step removes one card from the input piles, and we started with n cards in the input piles. Therefore, this procedure should take Θ(n) time.

Now the question is: do we actually need to check whether a pile is empty before each basic step? The answer is no, we do not. Put on the bottom of each input pile a special sentinel card. It contains a special value that we use to simplify the code.
We use ∞, since that is guaranteed to lose to any other value. The only way that ∞ cannot lose is when both piles have ∞ exposed as their top cards. But when that happens, all the non-sentinel cards have already been placed into the output pile. We know in advance that there are exactly r − p + 1 non-sentinel cards, so we stop once we have performed r − p + 1 basic steps. There is never a need to check for sentinels, since they will always lose. Rather than even counting basic steps, we just fill up the output array from index p up through and including index r.

Example

Consider a call of MERGE(A, 9, 12, 16), reading the accompanying figure row by row, as we did in class. The first part shows the arrays at the start of the "for k ← p to r" loop, where A[p .. q] is copied into L[1 .. n1] and A[q+1 .. r] is copied into R[1 .. n2]. Succeeding parts show the situation at the start of successive iterations. Entries in A with slashes have had their values copied to either L or R and have not had a value copied back in yet. Entries in L and R with slashes have been copied back into A. The last part shows that the subarrays are merged back into A[p .. r], which is now sorted, and that only the sentinels (∞) are exposed in the arrays L and R.

The first two for loops (those copying into L and into R) take Θ(n1 + n2) = Θ(n) time. The last for loop (the merging loop) makes n iterations, each taking constant time, for Θ(n) time. Therefore, the total running time is Θ(n).

Analyzing Merge Sort

For simplicity, assume that n is a power of 2, so that each divide step yields two subproblems, both of size exactly n/2. The base case occurs when n = 1. When n ≥ 2, the time for the merge sort steps is:

- Divide: just compute q as the average of p and r, which takes constant time, i.e. Θ(1).
- Conquer: recursively solve 2 subproblems, each of size n/2, which takes 2T(n/2).
- Combine: MERGE on an n-element subarray takes Θ(n) time.
Summed together, the divide and combine steps give a function that is linear in n, i.e. Θ(n). Therefore, the recurrence for the merge sort running time is

T(n) = Θ(1) if n = 1, and T(n) = 2T(n/2) + Θ(n) if n > 1,

whose solution is T(n) = Θ(n lg n).

Selection sort

Selection sort is a sorting algorithm, specifically an in-place comparison sort. It has O(n^2) time complexity, making it inefficient on large lists, and it generally performs worse than the similar insertion sort. Selection sort is noted for its simplicity, and it also has performance advantages over more complicated algorithms in certain situations, particularly where auxiliary memory is limited. The algorithm divides the input list into two parts: the sublist of items already sorted, which is built up from left to right, and the sublist of items remaining to be sorted, occupying the rest of the list. Initially, the sorted sublist is empty and the unsorted sublist is the entire input list. The algorithm proceeds by repeatedly finding the smallest (or largest, depending on sorting order) element in the unsorted sublist and exchanging it with the leftmost unsorted element. Here is an example of this sort algorithm sorting five elements:

64 25 12 22 11
11 25 12 22 64
11 12 25 22 64
11 12 22 25 64
11 12 22 25 64 (nothing appears changed on this last line because the last 2 numbers were already in order)

Selection sort can also be used on list structures that make add and remove efficient, such as a linked list. In this case it is more common to remove the minimum element from the remainder of the list and then insert it at the end of the values sorted so far. For example:

64 25 12 22 11
11 64 25 12 22
11 12 64 25 22
11 12 22 64 25
11 12 22 25 64

Class: Sorting algorithm
Data structure: Array
Worst-case performance: O(n^2)
Best-case performance: O(n^2)
Average-case performance: O(n^2)
Worst-case space complexity: O(n) total, O(1) auxiliary

This type of sorting is called "Selection Sort" because it works by repeatedly selecting the smallest remaining element.
It works as follows: first find the smallest element in the array and exchange it with the element in the first position, then find the second smallest element and exchange it with the element in the second position, and continue in this way until the entire array is sorted.

SELECTION_SORT (A)
for i ← 1 to n-1 do
    min ← i; x ← A[i]
    for j ← i + 1 to n do
        If A[j] < x then
            min ← j
            x ← A[j]
    A[min] ← A[i]
    A[i] ← x

The algorithm works as follows:
1. Set the first position as the current position.
2. Find the minimum value in the list.
3. Swap it with the value in the current position.
4. Set the next position as the current position.
5. Repeat steps 2-4 until you reach the end of the list.

Here is a more detailed explanation. Let A be an array with n elements: A(1), A(2), ..., A(n). You will need to set up 2 loops.

The first loop is used to set the index (i) of the current position (i = 1 to n-1):
0. Set i = 1
1. a) min_idx = i (save current index)
   b) tmp_num = A(i) (save value of array at current position)
   c) j = i + 1 (set index to next position in unsorted list)

The second loop is used to find the smallest item in the unsorted list. It checks each item to the right of the current position (j = i+1 to n). If an item is smaller than the item in the current position (A(j) < A(i)), then move it into the current position and save the index of this item. (Do not move the item from the current position to position j yet, since we might find a smaller item later in the list.)
2. a) If A(j) < A(i) then
      A(i) = A(j)
      min_idx = j
   end if
   b) j = j + 1
   c) Repeat steps 2a-2b until you reach the end of the list (j > n)

When the second loop is done, the current position contains the smallest value in the unsorted list and min_idx contains the position that value came from. We can now move the item that was originally at position i (saved in tmp_num at the beginning of the first loop) to position min_idx.
3. a) A(min_idx) = tmp_num
   b) i = i + 1
   c) Repeat steps 1a-3b until you reach the end of the list (i > n-1)
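The looping scheme above can be sketched compactly in Python. This is an illustrative sketch (0-based indices, swap done with index tracking rather than tmp_num); the function name is mine, not from the notes.

```python
def selection_sort(a):
    """Repeatedly select the smallest remaining element and swap it
    into the current position, as in SELECTION_SORT(A) above."""
    n = len(a)
    for i in range(n - 1):                  # outer loop: current position
        min_idx = i                         # index of smallest seen so far
        for j in range(i + 1, n):           # inner loop: scan unsorted part
            if a[j] < a[min_idx]:
                min_idx = j
        a[i], a[min_idx] = a[min_idx], a[i] # swap smallest into position i
    return a

print(selection_sort([64, 25, 12, 22, 11]))  # [11, 12, 22, 25, 64]
```

The outer loop runs n−1 times and the inner loop scans the shrinking unsorted sublist, giving the O(n^2) comparison count stated above.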
Heap Sort

The binary heap data structure is an array that can be viewed as a complete binary tree. Each node of the binary tree corresponds to an element of the array. The array is completely filled on all levels except possibly the lowest. We represent heaps in level order, going from left to right. The array corresponding to the heap above is [25, 13, 17, 5, 8, 3]. The root of the tree is A[1], and given the index i of a node, the indices of its parent, left child and right child can be computed as:

PARENT (i)
    return floor(i/2)
LEFT (i)
    return 2i
RIGHT (i)
    return 2i + 1

Let's try these out on a heap to make sure we believe they are correct. Take the heap represented by the array [20, 14, 17, 8, 6, 9, 4, 1]. We'll go from the 20 to the 6 first. The index of the 20 is 1. To find the index of the left child, we calculate 1 * 2 = 2. This takes us (correctly) to the 14. Now, we go right, so we calculate 2 * 2 + 1 = 5. This takes us (again, correctly) to the 6. Now let's try going from the 4 to the 20. 4's index is 7. We want to go to the parent, so we calculate floor(7 / 2) = 3, which takes us to the 17. Now, to get 17's parent, we calculate floor(3 / 2) = 1, which takes us to the 20.

Heap Property

In a heap, for every node i other than the root, the value of the node is at most the value of its parent:

A[PARENT (i)] ≥ A[i]

Thus, the largest element in a heap is stored at the root. Following is an example of a heap.

By the definition of a heap, all the tree levels are completely filled except possibly the lowest level, which is filled from the left up to a point. Clearly a heap of height h has the minimum number of elements when it has just one node at the lowest level. The levels above the lowest level form a complete binary tree of height h − 1 with 2^h − 1 nodes. Hence the minimum number of nodes possible in a heap of height h is 2^h.
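The three index formulas can be checked in a few lines of Python. This is a small sketch of my own, padding index 0 so that positions match the notes' 1-based numbering.

```python
# Index arithmetic for a 1-based binary heap, as in the notes.
def parent(i): return i // 2       # floor(i/2)
def left(i):   return 2 * i
def right(i):  return 2 * i + 1

# The heap [20, 14, 17, 8, 6, 9, 4, 1]; index 0 is padding.
heap = [None, 20, 14, 17, 8, 6, 9, 4, 1]

print(heap[left(1)])            # 14 : left child of the root
print(heap[right(left(1))])     # 6  : go left, then right, from the 20
print(heap[parent(parent(7))])  # 20 : two parent steps up from the 4
```

These three lookups reproduce exactly the walks worked through in the paragraph above.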
Clearly a heap of height h has the maximum number of elements when its lowest level is completely filled. In this case the heap is a complete binary tree of height h and hence has 2^(h+1) − 1 nodes. The following is not a heap: although it has the heap property, it is not a complete binary tree. Recall that to be complete, a binary tree has to fill up all of its levels with the possible exception of the last one, which must be filled in from the left side.

Height of a node: the number of edges on the longest simple downward path from the node to a leaf.

Height of a tree: the number of edges on the longest simple downward path from the root to a leaf.

Note that an n-element heap has height floor(lg n), which is Θ(lg n). To show this, let the height of the n-element heap be h. From the bounds obtained above on the maximum and minimum number of elements in a heap, we get

2^h ≤ n ≤ 2^(h+1) − 1

where n is the number of elements in the heap, and hence

2^h ≤ n < 2^(h+1).

Taking logarithms to the base 2,

h ≤ lg n < h + 1.

It follows that h = floor(lg n). We know from above that the largest element resides in the root, A[1]. The natural question to ask is: where in a heap might the smallest element reside? Consider any path from the root of the tree to a leaf. Because of the heap property, as we follow that path the elements are either decreasing or staying the same. If it happens that all elements in the heap are distinct, then the smallest element is in a leaf of the tree. It could also be that an entire subtree of the heap consists of the smallest element, or indeed that there is only one element in the heap, which is then the smallest element, so the smallest element is everywhere. Note that anything below the smallest element must equal the smallest element, so in general, only entire subtrees of the heap can contain the smallest element.
Inserting an Element into the Heap

Suppose we have a heap, and we want to add a node with key 15 to it. First, we add the node to the tree at the next spot available at the lowest level of the tree. This is to ensure that the tree remains complete. We then compare the new node to its parent and swap them if the new node is larger. Now we do the same thing again, comparing the new node to its new parent. Since 14 < 15, we have to do another swap. Now we are done, because 15 ≤ 20.

The four basic procedures on a heap are:
1. Heapify, which runs in O(lg n) time.
2. Build-Heap, which runs in linear time.
3. Heap Sort, which runs in O(n lg n) time.
4. Extract-Max, which runs in O(lg n) time.

Maintaining the Heap Property

Heapify is a procedure for manipulating heap data structures. It is given an array A and an index i into the array. The subtrees rooted at the children of A[i] are heaps, but node A[i] itself may violate the heap property, i.e., A[i] < A[2i] or A[i] < A[2i + 1]. The procedure Heapify manipulates the tree rooted at A[i] so that it becomes a heap. In other words, Heapify lets the value at A[i] "float down" in the heap so that the subtree rooted at index i becomes a heap.

Example of Heapify

Suppose we have a complete binary tree whose subtrees are heaps. In the following complete binary tree, the subtrees of 6 are heaps. The Heapify procedure alters the heap so that the tree rooted at 6's position is a heap. Here's how it works. First, we look at the root of our tree and its two children. We then determine which of the three nodes is the greatest. If it is the root, we are done, because we have a heap. If not, we exchange the appropriate child with the root, and continue recursively down the tree. In this case, we exchange 6 and 8, and continue. Now, 7 is greater than 6, so we exchange them.
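The insertion procedure described above ("place at the next free slot, then bubble up past smaller parents") can be sketched in Python. This is my own illustration using the notes' 1-based indexing (index 0 is padding); the function name is not from the notes.

```python
def heap_insert(heap, key):
    """Insert key into a 1-based max-heap: append at the lowest level,
    then swap upward while the parent is smaller."""
    heap.append(key)
    i = len(heap) - 1                 # position of the new node
    while i > 1 and heap[i // 2] < heap[i]:
        heap[i], heap[i // 2] = heap[i // 2], heap[i]  # swap with parent
        i //= 2                       # move up to the parent's position

h = [None, 20, 14, 17, 8, 6, 9, 4, 1]
heap_insert(h, 15)
print(h[1:])  # [20, 15, 17, 14, 6, 9, 4, 1, 8]
```

Tracing the example: 15 lands below 8, swaps with 8, then swaps with 14, and stops under 20, exactly as in the walkthrough above.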
We are at the bottom of the tree and can't continue, so we terminate.

Building a Heap

We can use the procedure Heapify in a bottom-up fashion to convert an array A[1 .. n] into a heap. Since the elements in the subarray A[floor(n/2) + 1 .. n] are all leaves, the procedure BUILD_HEAP goes through the remaining nodes of the tree and runs Heapify on each one. The bottom-up order of processing guarantees that the subtrees rooted at the children of a node are heaps before Heapify is run at that node.

BUILD_HEAP (A)
1. heap-size [A] ← length [A]
2. for i ← floor(length[A]/2) down to 1 do
3.     Heapify (A, i)

We can build a heap from an unordered array in linear time.

Heap Sort Algorithm

Heap sort combines the best of both merge sort and insertion sort: like merge sort, the worst-case time of heap sort is O(n log n), and like insertion sort, heap sort sorts in place. The heap sort algorithm starts by using the procedure BUILD_HEAP to build a heap on the input array A[1 .. n]. Since the maximum element of the array is stored at the root A[1], it can be put into its correct final position by exchanging it with A[n] (the last element in A). If we now discard node n from the heap, then the remaining elements can be made into a heap. Note that the new element at the root may violate the heap property; all that is needed to restore it is one call to Heapify.

HEAPSORT (A)
1. BUILD_HEAP (A)
2. for i ← length (A) down to 2 do
3.     exchange A[1] ↔ A[i]
4.     heap-size [A] ← heap-size [A] − 1
5.     Heapify (A, 1)

Radix Sort

Radix sort is one of the linear-time sorting algorithms for integers. It functions by sorting the input numbers on each digit, for each of the digits in the numbers. However, the process adopted by this sort method is somewhat counterintuitive, in the sense that the numbers are sorted on the least significant digit first, followed by the second least significant digit, and so on up to the most significant digit.
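The Heapify, BUILD_HEAP and HEAPSORT procedures above can be sketched together in Python. This is an illustrative translation to 0-based indices (so the children of i are 2i+1 and 2i+2); the function names are mine.

```python
def heapify(a, i, heap_size):
    """Let a[i] float down until the subtree rooted at i is a max-heap."""
    largest = i
    l, r = 2 * i + 1, 2 * i + 2
    if l < heap_size and a[l] > a[largest]:
        largest = l
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest, heap_size)     # continue down the tree

def heapsort(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):    # BUILD_HEAP: bottom-up heapify
        heapify(a, i, n)
    for end in range(n - 1, 0, -1):        # repeatedly extract the maximum
        a[0], a[end] = a[end], a[0]        # exchange A[1] <-> A[i]
        heapify(a, 0, end)                 # restore the heap property
    return a

print(heapsort([25, 13, 17, 5, 8, 3]))  # [3, 5, 8, 13, 17, 25]
```

The build phase is linear and each of the n−1 extractions costs O(lg n), matching the O(n log n) bound stated above.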
Radix sorting involves looking at a radix (or digit) of a number and placing the number in an array of linked lists to sort it.

Algorithm for radix sorting:
1. Look at the rightmost digit.
2. Append the full number to the list at that digit's index.
3. Look at the next digit to the left in each number of the current partially sorted sequence. IF there is no digit, pad with a 0.
4. REPEAT STEP 3 UNTIL all digits have been processed.

Let's see a step-by-step example of a radix sort of the following set of unsorted numbers. At each step, the digit being examined determines which linked list in the array a number is appended to (always at the end of that list).

212 21 72 5 431 898 616 24 9

Step 1 (ones digit):
0:
1: 21 -> 431
2: 212 -> 72
3:
4: 24
5: 05
6: 616
7:
8: 898
9: 09

Step 2 (tens digit, working from step 1):
0: 005 -> 009
1: 212 -> 616
2: 021 -> 024
3: 431
4:
5:
6:
7: 072
8:
9: 898

Step 3 (hundreds digit, working from step 2):
0: 5 -> 9 -> 21 -> 24 -> 72
1:
2: 212
3:
4: 431
5:
6: 616
7:
8: 898
9:

Step 3 is the final step, and the list is sorted: 5, 9, 21, 24, 72, 212, 431, 616, 898. One benefit of radix sort is that it can be done by pencil and paper. It also uses only a fixed auxiliary data structure (an array of size 10). The downside of radix sort is that it takes time to carry out, since you may have to go through numerous steps to sort the list depending on how many digits the numbers have. Here is another example of radix sort, this time using numbers up to 4 digits in length. You will notice something interesting here…

58 99 999 47 200 101 1002 12 1111

Step 1 (ones digit):
0: 200
1: 101 -> 1111
2: 1002 -> 12
3:
4:
5:
6:
7: 47
8: 58
9: 99 -> 999

Step 2 (tens digit, working from step 1):
0: 200 -> 101 -> 1002
1: 1111 -> 012
2:
3:
4: 047
5: 058
6:
7:
8:
9: 099 -> 999

Step 3 (hundreds digit, working from step 2):
0: 1002 -> 0012 -> 0047 -> 0058 -> 0099
1: 0101 -> 1111
2: 0200
3:
4:
5:
6:
7:
8:
9: 0999

Step 4 (thousands digit, working from step 3):
0: 12 -> 47 -> 58 -> 99 -> 101 -> 200 -> 999
1: 1002 -> 1111
2:
3:
4:
5:
6:
7:
8:
9:

Step 4 is the final step here. Notice, however, that index 0 now covers everything from 0 to 999 while index 1 covers 1000 to 1999, and so on.
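The bucketing passes above can be sketched as a short least-significant-digit radix sort in Python. This is my own illustration; lists stand in for the linked lists in the array of size 10.

```python
def radix_sort(nums):
    """LSD radix sort: bucket by ones digit, then tens, and so on,
    reading the buckets back in order after each pass."""
    digits = len(str(max(nums)))            # number of passes needed
    for d in range(digits):
        buckets = [[] for _ in range(10)]   # one list per digit 0-9
        for n in nums:
            buckets[(n // 10 ** d) % 10].append(n)  # append at list end
        nums = [n for b in buckets for n in b]      # read buckets in order
    return nums

print(radix_sort([212, 21, 72, 5, 431, 898, 616, 24, 9]))
# [5, 9, 21, 24, 72, 212, 431, 616, 898]
```

Appending to the end of each bucket keeps each pass stable, which is what makes the later (more significant) passes preserve the earlier ordering.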
Exchange Sorting

The second class of sorting algorithms we consider comprises algorithms that sort by exchanging pairs of items until the sequence is sorted. In general, an algorithm may exchange adjacent elements as well as widely separated ones. In fact, since the insertion sorts considered in the preceding section accomplish the insertion by swapping adjacent elements, insertion sorting can be considered a kind of exchange sort. The reason for creating a separate category for insertion sorts is that the essence of those algorithms is insertion into a sorted list. On the other hand, an exchange sort does not necessarily make use of such a sorted list.

Preliminaries
- Assume a list of n integers is stored in a "row" which we call an array.
- Assume the individual integers are called elements.
- Assume each position is numbered left to right from 1 to n.
- Let index refer to the position of a particular array element.
- Assume two pointers, alpha and beta, which are indices into the array.
- Assume alpha-element is the integer "pointed to" by alpha, and beta-element the integer pointed to by beta.

Algorithm
1. Let alpha point to element #1
2. Let beta point to element #2
3. If alpha-element > beta-element, exchange their positions
4. Increment beta
5. If beta <= n goto Step #3; otherwise continue to Step #6
6. Increment alpha & set beta to alpha + 1
7. If alpha < n goto Step #3; otherwise STOP – the list is sorted.

EXCHANGE SORT ALGORITHM

The following steps define an algorithm for sorting an array a:
1. Set i to 0
2. Set j to i + 1
3. If a[i] > a[j], exchange their values
4. Set j to j + 1. If j < n goto step 3
5. Set i to i + 1. If i < n - 1 goto step 2
6. a is now sorted in ascending order.
Note: n is the number of elements in the array.
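The six-step exchange sort above transcribes directly into Python (0-based indices, as in the steps). The function name is mine; the logic follows the numbered steps.

```python
def exchange_sort(a):
    """Compare a[i] with every a[j] to its right and exchange when
    out of order, per steps 1-6 of the exchange sort algorithm."""
    n = len(a)
    for i in range(n - 1):              # steps 1 and 5
        for j in range(i + 1, n):       # steps 2 and 4
            if a[i] > a[j]:             # step 3: exchange their values
                a[i], a[j] = a[j], a[i]
    return a                            # step 6: sorted in ascending order

print(exchange_sort([64, 25, 12, 22, 11]))  # [11, 12, 22, 25, 64]
```

Note that, unlike selection sort, this variant may exchange several times per outer pass, but after pass i the smallest i+1 elements are in place, and the total comparison count is the same O(n^2).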