Chapter 1
Introduction

In today's information age, communication plays a very important role and has contributed heavily to the growth of technology. Electronic security has become increasingly involved in making communication more prevalent and robust. A mechanism is therefore needed to assure the security and privacy of information sent over electronic communication media. Whether the medium is wired or wireless, the information it carries must be protected from unauthorized access.

The method of transforming the original information into an unreadable format is called encryption, and the reverse process is called decryption. The study of encryption and decryption is known as cryptography. Cryptography involves the study and application of the principles and techniques by which information is rendered unintelligible to all but the intended receiver. Cryptanalysis, on the other hand, is the science and art of breaking cryptosystems to recover the unintelligible information.

Computer security mainly consists of three parts, namely data confidentiality, data integrity and data authenticity. Data confidentiality is the protection of data from unauthorized disclosure. Data integrity is the assurance that the data received are exactly as sent by an authorized entity. Authentication is the assurance that the communicating entity is the one that it claims to be. Present-day cryptography involves three distinct mechanisms, namely symmetric key encipherment, asymmetric key encipherment and hashing.

1.1 History of Cryptography

Egyptian Hieroglyphs (1900 BC): This was one of the first known instances of cryptography (Fig. 1.1). A scribe used nonstandard hieroglyphs in an inscription. Hieroglyphic script, whose name comes from the Greek for 'sacred writing', was the picture language often used to decorate temples and monuments. It could be written with pen and ink on papyrus, or painted or carved onto stone.
It was carefully drawn to make the signs as accurate as possible. Hieroglyphs were used to write the ancient Egyptian language. In the beginning, hieroglyphic signs were used to keep records of the king's possessions. Scribes could easily make these records by drawing a picture of a cow or a boat followed by a number. But as the language became more complex, more pictures were needed. Eventually the language consisted of more than 750 individual signs.

Figure 1.1: Egyptian Hieroglyphs

Mesopotamian Tablet (1500 BC): A 3" × 2" Mesopotamian tablet contained an enciphered formula for making pottery glaze (Fig. 1.2). Cuneiform signs were used with their least common syllabic values in an attempt to hide the secrets of the formula. Pictograms, or drawings representing actual things, were the basis for cuneiform writing. Early pictograms resembled the objects they represented, but through repeated use over time they began to look simpler and even abstract. These marks eventually became wedge-shaped and could convey sounds or abstract concepts.

Figure 1.2: Mesopotamian Tablet

Atbash Cipher (600 BC-500 BC): Hebrew scribes writing down the book of Jeremiah used a reverse-alphabet simple substitution cipher known as the Atbash cipher (Fig. 1.3). Many names of people and places are believed to have been deliberately obscured in the Hebrew Bible using this cipher. It is one of the few ciphers used in the Hebrew language. Atbash is a simple substitution cipher, that is, one in which each letter of the alphabet stands for another letter. In the case of the Atbash cipher, the first letter of the alphabet is substituted for the last, the second for the second last, and so on.
That is, in English the letter 'A' becomes 'Z', the letter 'B' becomes 'Y', the letter 'C' becomes 'X', and so on. Atbash gets its name from the fact that in the cipher, A (aleph) becomes tav (the last letter), B (beth) becomes shin (the one before last), and so on.

Plain text:  ABCDEFGHIJKLMNOPQRSTUVWXYZ
Cipher text: ZYXWVUTSRQPONMLKJIHGFEDCBA

Figure 1.3: Atbash Cipher

Greek Skytale (486 BC): The ancient Greeks invented the 'Skytale' (rhymes with Italy), which was a stick wrapped with a narrow strip of papyrus, leather or parchment. The message was written on the wrapping; then the strip was removed and passed to the messenger (Fig. 1.4). Only a receiver who had a stick of the same size could read the message. From indirect evidence, the skytale was first mentioned by the Greek poet Archilochus, who lived in the 7th century BC. Other Greek and Roman writers during the following centuries also mentioned it, but it was not until Apollonius of Rhodes (middle of the 3rd century BC) that a clear indication of its use as a cryptographic device appeared. No description of how it operated is known from before Plutarch, the Greek historian, biographer and essayist.

Figure 1.4: Greek Skytale

Frequency Analysis (1000 AD): Frequency analysis led to techniques for breaking monoalphabetic substitution ciphers. It was most likely motivated by textual analysis of the Qur'an: it has been suggested that close textual study of the Qur'an first brought to light that Arabic has a characteristic letter frequency. The technique spread, and by the Renaissance it was so widely used by European states that cryptographers invented several schemes to defeat it. These included homophones, polyalphabetic substitution and polygraphic substitution schemes. Frequency analysis is based on the fact that in any given stretch of a language, letters and combinations of letters occur with varying frequencies (Fig. 1.5).
In the English language, for example, 'E' is the most common letter, while 'X' is rare.

Figure 1.5: Frequency Analysis

Leon Alberti (1466): Leon Alberti invented the cipher disk and the cryptographic key. Alberti's cipher disk was polyalphabetic, meaning that a new alphabet could be created each time by turning the disk. The disk was the only means of using this type of cipher until the 16th century. Alberti thought his cipher was unbreakable. This assumption was based on his inquiries into frequency analysis, which was the most effective method of deciphering monoalphabetic cryptograms. Given enough cipher text, one could compare the frequency of the letters with the normal distribution for the language to find the shift and solve the cryptogram. This approach failed to solve polyalphabetic cryptograms, however, since the letter distribution is garbled.

Vigenere Cipher (1587): The Vigenere cipher is polyalphabetic, meaning that instead of a one-to-one relationship between each letter and its substitute, there is a one-to-many relationship between each letter and its substitutes. The user chooses a keyword and repeats it until it matches the length of the plain text (Table 1.1).

Louis XIV The Great Cipher (1626): A nomenclator cipher developed by Antoine and Bonaventure Rossignol (Fig. 1.6). Each number stood for a French syllable rather than a single letter. The Great Cipher was used to encrypt the King's most secret messages. In fact, the identity of the Man in the Iron Mask was protected by the Great Cipher. The Great Cipher was not broken for two centuries, until Commandant Etienne Bazeries was able to use the most frequently occurring numbers to decipher one common phrase, 'les ennemis'.

The Morse Code (1763): The telegraph showed that electrostatically generated signals which stood for letters of the alphabet could be sent a long way through a wire, with the circuit being completed through the Earth.
The original telegraph used 26 wires, one for each letter of the alphabet. Samuel Morse then created Morse code, which represents letters, numbers and punctuation marks by means of a code signal sent intermittently. This is an early form of digital communication. It uses two states, 'on' and 'off', composed into five symbols: dit (.), dah (-), short gap (between letters), medium gap (between words) and long gap (between sentences). Morse's system differed from the original telegraph in that it sent a code for each letter on a single wire rather than using a wire for each letter. In 1863, the European form of Morse code was created.

Table 1.1: Vigenere Cipher
      A B C D E F G H ... Y Z
  A   A B C D E F G H ... Y Z
  B   B C D E F G H I ... Z A
  C   C D E F G H I J ... A B
  D   D E F G H I J K ... B C
  ...
  Y   Y Z A B C D E F ... W X
  Z   Z A B C D E F G ... X Y

Figure 1.6: Louis XIV The Great Cipher

Kasiski breaks Vigenere Cipher (1863): A Prussian major named Kasiski proposed a method for breaking a Vigenere cipher that consisted of finding the length of the keyword and then dividing the message into that many simple substitution cryptograms. Frequency analysis could then be used to solve the resulting simple substitutions.

Zimmermann Telegram (1917): The Zimmermann telegram was a secret telegram which included proposals for a German alliance with Mexico. The telegram was intercepted and decrypted by the British Government (Fig. 1.7).

Figure 1.7: Zimmermann Telegram

The German 'ADFGVX' (1918): The German ADFGVX cipher was the first cipher used by the German Army during World War I. It was a fractionating transposition cipher which combined a modified Polybius square with a single columnar transposition, and it was used to encode a 36-character alphabet (26 letters plus 10 digits) (Table 1.2).
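The two stages of ADFGVX described above (fractionation through a 6 × 6 Polybius square, then a single columnar transposition) can be sketched in a few lines of Python. This is a minimal illustration, not a historical reconstruction: the helper names, the square keyword SUBJECT and the transposition key CARGO are assumptions chosen for the example, and characters outside A-Z and 0-9 are simply dropped.

```python
import string

LABELS = "ADFGVX"  # row and column labels of the 6 x 6 square


def build_square(keyword):
    """Return 36 symbols in row-major order: keyword letters first, then the rest."""
    alphabet = string.ascii_uppercase + string.digits
    seen = []
    for ch in keyword.upper() + alphabet:
        if ch in alphabet and ch not in seen:
            seen.append(ch)
    return "".join(seen)


def column_order(key):
    """Column indices in the order they are read out (alphabetical by key letter)."""
    return sorted(range(len(key)), key=lambda i: (key[i], i))


def encrypt(plaintext, square, key):
    # Stage 1, fractionation: each symbol becomes a (row label, column label) pair.
    pairs = "".join(
        LABELS[square.index(ch) // 6] + LABELS[square.index(ch) % 6]
        for ch in plaintext.upper()
        if ch in square
    )
    # Stage 2, columnar transposition: write row by row under the key,
    # then read whole columns in sorted key order.
    columns = [pairs[i::len(key)] for i in range(len(key))]
    return "".join(columns[i] for i in column_order(key))


def decrypt(ciphertext, square, key):
    n = len(key)
    full_rows, extra = divmod(len(ciphertext), n)
    # The first `extra` columns hold one more symbol than the others.
    columns = [""] * n
    pos = 0
    for i in column_order(key):
        length = full_rows + (1 if i < extra else 0)
        columns[i] = ciphertext[pos:pos + length]
        pos += length
    # Undo the transposition by reading the grid row by row.
    pairs = "".join(
        columns[c][r]
        for r in range(full_rows + 1)
        for c in range(n)
        if r < len(columns[c])
    )
    # Undo the fractionation: each label pair indexes one cell of the square.
    return "".join(
        square[LABELS.index(pairs[i]) * 6 + LABELS.index(pairs[i + 1])]
        for i in range(0, len(pairs), 2)
    )
```

With `square = build_square("SUBJECT")`, calling `decrypt(encrypt("ATTACK AT 1200", square, "CARGO"), square, "CARGO")` recovers the message without its spaces. Because every plain text symbol is split into two labels that end up in different columns, the transposition spreads each original character across the cipher text, which is what made the scheme harder to attack than either stage alone.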
Table 1.2: The German 'ADFGVX'
      A D F G V X
  A   S U B J E C
  D   T A D F G H
  F   I K L M N O
  G   P Q R V W X
  V   Y Z 0 1 2 3
  X   4 5 6 7 8 9

The Enigma (1918): Arthur Scherbius designed the Enigma, a device which allowed businesses to communicate confidential documents without having to resort to clumsy and slow codebooks. The device consisted of many rotors turning on a common axis. The rotors had the numbers 1 through 26, or the alphabet A-Z, marked on the edge, and were equipped with 26 electrical contacts (one for each letter of the alphabet) so that when a letter was pressed, the output would depend on the position of the rotor and its cross wiring. Within the same year, the Enigma was put to use; it was most famously used by Nazi Germany before and during World War II.

The World War II (1937 - 1945): The Navajo code talkers have been credited with saving countless lives and hastening the end of the war. The code talkers' primary job was to transmit information on tactics, troop movements, orders and other vital battlefield information via telegraphs and radios in their native dialect. A major advantage of the code talker system was its speed. The method of using Morse code often took hours, whereas the Navajos handled a message in minutes. It has been said that if it were not for the Navajo code talkers, the Marines would never have taken Iwo Jima. The Navajos' unwritten language was understood by fewer than 30 non-Navajos at the time of World War II. The size and complexity of the language made the code extremely difficult to comprehend, much less decipher. It was not until 1968 that the code was declassified by the US Government.

Lucifer (1971): Horst Feistel created Lucifer at IBM's Thomas J. Watson Laboratory. Lucifer was the name given to several of the earliest civilian block ciphers, and it was a direct precursor to the Data Encryption Standard.
Cryptographic HASH of passwords (1975): Hash algorithms are typically used to provide a digital fingerprint of a file's contents to ensure that the file has not been altered by an intruder or virus. They generally help to preserve the integrity of a file.

1.2 Computer Security

With the introduction of the computer, the need for automated tools for protecting files and other information stored on the computer became evident. This is especially the case for a shared system, such as a time-sharing system, and the need is even more acute for systems that can be accessed over a public telephone network, a data network or the Internet. The collection of tools designed to protect data and to thwart hackers is called computer security.

In symmetric encipherment, or secret key cryptography, an entity A can send a message to another entity B over an insecure channel with the assumption that an adversary X cannot understand the contents of the message by simply eavesdropping on the channel. A encrypts the message using an encryption algorithm; B decrypts the message using a decryption algorithm. Both of them use a single secret key. This system is used for one-to-one communication.

A modern symmetric key block cipher encrypts an n-bit block of plain text or decrypts an n-bit block of cipher text. The encryption or decryption algorithm uses a k-bit secret key. The decryption algorithm must be the inverse of the encryption algorithm, and both operations must use the same secret key. A symmetric encryption scheme has five ingredients:

• Plain text: This is the original intelligible message or data that is fed into the algorithm as input.

• Encryption algorithm: The encryption algorithm performs various substitutions and transformations on the plain text.

• Secret key: The secret key is also an input to the encryption algorithm. The algorithm will produce a different output depending on the specific key being used at the time.
The exact substitutions and transformations performed by the algorithm depend on the key.

• Cipher text: This is the scrambled message produced as output. It depends on the plain text and the secret key. For a given message, two different keys will produce two different cipher texts. The cipher text is an apparently random stream of data and is unintelligible.

• Decryption algorithm: This is essentially the encryption algorithm run in reverse. It produces the original message from the cipher text and the secret key.

1.2.1 Classical Ciphers

Classical ciphers are classified into two types, namely substitution ciphers and transposition ciphers.

Substitution Ciphers

• Additive cipher, shift cipher or Caesar cipher: It replaces each letter of the alphabet with the letter standing some fixed number of places further down the alphabet. The encryption algorithm is $C \equiv P + B \pmod{N}$ and the decryption algorithm is $P \equiv C - B \pmod{N}$, where $P$ is the plain text, $C$ is the cipher text, $B$ is the secret key and $N$ is the number of letters in the alphabet.

• Multiplicative cipher or linear transformation: It uses the substitution $C \equiv A \times P \pmod{N}$ for encryption and $P \equiv A^{-1} \times C \pmod{N}$ for decryption, where $A$ is the encryption key and $A^{-1}$, its multiplicative inverse modulo $N$, is the decryption key.

• Affine transformation: It uses the transformation $C \equiv A \times P + B \pmod{N}$ for encryption and $P \equiv A' \times C + B' \pmod{N}$ for decryption, where $A$ and $B$ are the encryption keys and $A' = A^{-1}$ and $B' = -A^{-1} B$ are the decryption keys.

• Monoalphabetic substitution cipher: In this method a mapping is created between each plain text character and the corresponding cipher text character.

• Polyalphabetic cipher (for example, a) the Vigenere cipher and b) the Beaufort cipher): In polyalphabetic substitution, each occurrence of a character may have a different substitute. The relationship between a character in the plain text and a character in the cipher text is one-to-many. Polyalphabetic ciphers have the advantage of hiding the letter frequency of the underlying language.
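The additive, multiplicative and affine ciphers listed above are all instances of one transformation, C ≡ A × P + B mod N, and the Vigenere cipher is an additive cipher whose shift changes with the position of the letter. The following Python sketch illustrates this under stated assumptions: the alphabet is A-Z (N = 26), other characters are dropped, and the function names are hypothetical ones introduced only for this example.

```python
from math import gcd

N = 26  # assumed alphabet size: the letters A-Z only


def _letters(text):
    """Map the letters A-Z of the text (after upper-casing) to the numbers 0..25."""
    return [ord(ch) - ord("A") for ch in text.upper() if "A" <= ch <= "Z"]


def _text(numbers):
    """Map numbers back to letters, reducing each one modulo N first."""
    return "".join(chr(n % N + ord("A")) for n in numbers)


def affine_encrypt(plaintext, a, b):
    """C = A*P + B mod N; a = 1 gives the shift cipher, b = 0 the multiplicative cipher."""
    if gcd(a, N) != 1:
        raise ValueError("a must be coprime to N, otherwise a has no inverse mod N")
    return _text(a * p + b for p in _letters(plaintext))


def affine_decrypt(ciphertext, a, b):
    """P = A'*C + B' mod N with A' = A^-1 and B' = -A^-1 * B."""
    a_inv = pow(a, -1, N)  # modular inverse, available in Python 3.8+
    b_dec = (-a_inv * b) % N
    return _text(a_inv * c + b_dec for c in _letters(ciphertext))


def vigenere_encrypt(plaintext, keyword):
    """Shift the i-th letter by the i-th letter of the repeated keyword."""
    key = _letters(keyword)
    return _text(p + key[i % len(key)] for i, p in enumerate(_letters(plaintext)))


def vigenere_decrypt(ciphertext, keyword):
    """Undo the keyword-dependent shift letter by letter."""
    key = _letters(keyword)
    return _text(c - key[i % len(key)] for i, c in enumerate(_letters(ciphertext)))
```

For example, `affine_encrypt("HELLO", 1, 3)` returns `"KHOOR"`, the shift cipher with B = 3, and `affine_decrypt` with the same two keys recovers `"HELLO"`. The coprimality check reflects the requirement that A must have an inverse modulo N, so for N = 26 only twelve values of A are usable.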
• Playfair cipher: A multiple-letter encryption cipher, which treats digrams in the plain text as single units and translates these units into cipher text digrams.

• Hill cipher or enciphering matrices: In the Hill cipher, the key is a square matrix of size $m \times m$, in which $m$ is the size of the block. Let one block of the plain text be $P = (P_1, P_2, \dots, P_m)$; then the corresponding cipher text is $C = (C_1, C_2, \dots, C_m)$. The encryption algorithm is $C \equiv K \times P \pmod{N}$, where $K$ is the $m \times m$ enciphering key matrix. The necessary condition on the key matrix in the Hill cipher is that it must have a multiplicative inverse modulo $N$.

• Autokey cipher: In this cipher the key is a stream of subkeys, in which each subkey is used to encrypt the corresponding character in the plain text. Encryption algorithm: $C_0 \equiv P_0 + K \pmod{N}$, $C_i \equiv P_i + P_{i-1} \pmod{N}$. Decryption algorithm: $P_0 \equiv C_0 - K \pmod{N}$, $P_i \equiv C_i - P_{i-1} \pmod{N}$.

• One-time pad: Shannon has shown that perfect secrecy can be achieved if each plain text symbol is encrypted with a key randomly chosen from a key domain. This idea is used in a cipher called the one-time pad, invented by Vernam. In this cipher, the key has the same length as the plain text and is chosen completely at random. The main difficulty in implementing this cipher is the exchange of the key.

Transposition Ciphers

A transposition cipher reorders the letters or symbols in a predetermined order.

• Keyless transposition cipher: There are two methods. In the first method, the text is written into a table column by column and then transmitted row by row. In the second method, the text is written into the table row by row and then transmitted column by column.

• Keyed transposition cipher: The plain text is divided into groups of predetermined size, called blocks, and a key is then used to permute the characters in each block separately.

• Combining two approaches: First the text is written into a table row by row.
Second, the permutation is done by reordering the columns. Third, the new table is read column by column. Ciphers of this kind are also called columnar transposition ciphers.

• Double transposition cipher: This can make the job of the cryptanalyst more difficult. The algorithm is applied twice, with a different key or the same key.

1.2.2 The Modern Symmetric Cipher

Feistel Cipher: The inputs to the encryption algorithm of the Feistel cipher are a plain text block of length $2w$ bits and a key $K$. The plain text is divided into two halves, $L_0$ and $R_0$. The two halves of the data pass through $n$ rounds of processing and are then combined to produce the cipher text block. Each round $i$ has as inputs $L_{i-1}$ and $R_{i-1}$, derived from the previous round, as well as a subkey $K_i$, derived from the key $K$. In general, the subkeys are different from $K$ and from each other. The parameters of the Feistel cipher are: block size = 32/64/128 bits, key size = 32/64/128 bits, number of rounds = 16.

Data Encryption Standard (DES): The most widely used encryption scheme is based on the DES, adopted in 1977 by the National Bureau of Standards (now the National Institute of Standards and Technology (NIST)) as Federal Information Processing Standard 46 (FIPS PUB 46). The parameters of the DES cipher are: block size = 64 bits, key size = 56 bits, number of rounds = 16. Since the key size is only 56 bits, it is possible to break DES by exhaustive key search. To enlarge the key, Triple DES is used, in which the key size is 168 bits.

Advanced Encryption Standard (AES): In 1997 NIST called for proposals for a replacement of DES. The NIST specifications required a block size of 128 bits and three different key sizes of 128, 192 and 256 bits, and required that AES be an open algorithm, available to the public worldwide. The announcement was made internationally to solicit responses from all over the world. The criteria defined by NIST for selecting AES fall into three areas:

• Security: The main emphasis was security.
This criterion focused on resistance to cryptanalytic attacks other than brute-force attacks.

• Cost: This covers the computational efficiency and storage requirements of different implementations, such as hardware, software or smart cards.

• Implementation: The algorithm must have flexibility and simplicity; that is, implementation must be possible on any platform.

After the first AES candidate conference, NIST announced in August 1998 that 15 of the 21 algorithms received had met the requirements and had been selected as the first candidates. After the second AES candidate conference, which was held in Rome, NIST announced in August 1999 that 5 of the 15 candidates (MARS, RC6, Rijndael, Serpent and Twofish) had been selected as finalists. After the third AES candidate conference, NIST announced in October 2000 that Rijndael, designed by the Belgian researchers Dr Joan Daemen and Dr Vincent Rijmen, had been selected as the Advanced Encryption Standard. In February 2001, NIST announced that the draft of the Federal Information Processing Standard (FIPS) was available for public review and comment. Finally, AES was published as FIPS 197 in the Federal Register in December 2001.

Although the AES algorithm is resistant to attacks such as differential and linear cryptanalysis, it may be threatened by the XSL attack. This is because the S-box used in AES is a static one. If the S-box is made a key-dependent dynamic S-box, the XSL attack becomes very difficult. If the key size increases, a brute-force attack also needs more time. The AES algorithm is discussed in detail in Chapter 4.

Asymmetric or public key cryptography: In this system there are two keys, namely a public key and a private key. To send a secure message to B, A first enciphers the message using B's public key. To decrypt the message, B uses his own private key. This system is used for one-to-many or many-to-one communication.

Hashing: In hashing, a fixed-length message digest is created out of the variable-length message.
The digest is normally much smaller than the message (typically 128, 256 or 512 bits). Both the message and the digest are sent to B. Hashing is used for providing data integrity.

Shannon introduced two fundamental properties that any block cipher needs in order to be secure, namely diffusion and confusion. The idea of diffusion is to hide the relationship between the cipher text and the plain text. Diffusion implies that each symbol (character, byte or bit) in the cipher text is dependent on some or all symbols in the plain text; that is, if a single symbol in the plain text is changed, several or all symbols in the cipher text will be changed. The idea of confusion is to hide the relationship between the cipher text and the key. This will frustrate an adversary who tries to use the cipher text to find the key; that is, if a single bit in the key is changed, most or all bits in the cipher text will also be changed. The diffusion effect can be introduced into the cipher text by permutation. The confusion effect can be introduced into the cipher text by a substitution box, or S-box. Most modern block ciphers use the S-box in some form. In this thesis, the construction of the S-box and inverse S-box used in the AES algorithm is discussed in detail, together with the necessary mathematical background.

The input to an S-box is an n-bit word, while the output can be an m-bit or n-bit word, and the mapping from the inputs to the outputs is predefined. S-boxes are an important component of symmetric cryptosystems. Because AES has only one standard S-box, it has become a target of algebraic attacks such as the XSL (eXtended Sparse Linearization) attack. Though none of these attacks has succeeded, they provide an incentive for dynamic S-boxes.

1.3 Types of Attacks

Passive attacks: Passive attacks are in the nature of eavesdropping on, or monitoring of, transmissions. Attacks threatening the confidentiality of information are snooping and traffic analysis.
Snooping refers to unauthorized access to, or interception of, data. For example, a file transferred through the Internet may contain confidential information. An unauthorized entity may intercept the transmission and use the contents for his or her own benefit. To prevent snooping, the data can be made unintelligible to the interceptor by using an enciphering technique.

Traffic analysis: Although the encipherment of data may make it unintelligible to the interceptor, he or she can obtain some other type of information by monitoring online traffic. For example, he or she can find the electronic address of the sender and/or the receiver.

Active attacks: Active attacks involve some modification of the data stream or the creation of a false stream. The important active attacks are:

• Attacks threatening integrity: The integrity of data can be threatened by modification. After intercepting or accessing the information, the attacker modifies it to make it beneficial to himself or herself, or to others. Sometimes the attacker simply deletes or destroys the message to harm the system or to benefit from it.

• Masquerading: This happens when the attacker impersonates somebody else. For example, an attacker might steal the bank card and PIN of a bank customer and pretend to be that customer.

• Replaying: The attacker obtains a copy of a message sent by a user and later tries to replay it. For example, a person sends a request to his bank to ask for payment to the attacker, who has done a job for him. The attacker intercepts the message and sends it again to cause another payment from the bank.

• Repudiation: The sender of the message might later deny having sent it, or the receiver of the message might deny having received it.

• Attacks threatening availability: This is a very common attack. It may slow down or totally interrupt the service of a system. The attacker can use several strategies to achieve this.
He might send so many bogus requests to a server that the server crashes because of the heavy load. The attacker might intercept and delete a server's response to a client, making the client believe that the server is not responding. The attacker may also intercept the requests from the clients, causing the clients to send the requests many times and overload the system.

1.4 Cryptanalysis

Cryptanalytic attacks rely on the nature of the algorithm and on the general characteristics of the plain text, or even on some sample plain text-cipher text pairs. This type of attack exploits the characteristics of the algorithm to attempt to deduce a specific plain text or to deduce the key being used.

1.4.1 Brute-force attack

A brute-force attack involves trying every possible key until an intelligible translation of the cipher text into plain text is obtained. On average, half of all possible keys must be tried to achieve success. Table 1.3 shows the average time required for exhaustive search at a rate of one decryption per microsecond. With the use of massively parallel organizations of microprocessors, it may be possible to achieve processing rates many orders of magnitude greater, so the table also shows the time required at 10^6 decryptions per microsecond.

Table 1.3: The average time required for exhaustive search
  Key size   Number of          Time required at      Time required at
  (bits)     alternative keys   1 decryption/µs       10^6 decryptions/µs
  32         4.3 × 10^9         35.8 minutes          2.15 milliseconds
  56         7.2 × 10^16        1142 years            10.01 hours
  128        3.4 × 10^38        5.4 × 10^24 years     5.4 × 10^18 years
  168        3.7 × 10^50        5.9 × 10^36 years     5.9 × 10^30 years

1.4.2 Differential cryptanalysis

The rationale behind differential cryptanalysis is to observe the behavior of pairs of text blocks evolving along each round of the cipher, instead of observing the evolution of a single text block. Consider the original plain text block $m$ to consist of two halves, $m_0$ and $m_1$. In each round of the block cipher the two halves of the output are swapped, so at each round only one new half-block is created.
Then the intermediate message halves are related as follows:

$$m_{i+1} = m_{i-1} \oplus f(m_i, K_i), \quad i = 1, 2, \dots, n$$

where $K_i$ is the round key and $n$ is the number of rounds. Start with two messages $m$ and $m'$ with a known XOR difference $\Delta m = m \oplus m'$, and consider the difference between the intermediate message halves, $\Delta m_i = m_i \oplus m'_i$. Then

$$\Delta m_{i+1} = m_{i+1} \oplus m'_{i+1} = [m_{i-1} \oplus f(m_i, K_i)] \oplus [m'_{i-1} \oplus f(m'_i, K_i)] = \Delta m_{i-1} \oplus [f(m_i, K_i) \oplus f(m'_i, K_i)]$$

Now, suppose that many pairs of inputs to $f$ with the same difference yield the same output difference if the same subkey is used. If we know $\Delta m_{i-1}$ and $\Delta m_i$ with high probability, then we know $\Delta m_{i+1}$ with high probability. If a number of such differences are determined, it is feasible to determine the subkey used in the function $f$. It is found that breaking a block cipher with a 56-bit key requires $2^{47}$ chosen plain texts and $2^{47}$ encryptions. Although $2^{47}$ is significantly less than $2^{55}$, the need for the adversary to obtain $2^{47}$ chosen plain texts makes this attack of only theoretical interest.

1.4.3 Linear cryptanalysis

This attack is based on finding linear approximations that describe the transformations performed in block ciphers. The method can find a block cipher key given $2^{43}$ known plain texts, as compared to $2^{47}$ chosen plain texts for differential cryptanalysis. Although this is a minor improvement, because it may be easier to acquire known plain text than chosen plain text, linear cryptanalysis still remains an infeasible attack on block ciphers.

1.5 Organization of the Thesis

This thesis addresses the enhancement of confidentiality and integrity using cryptographic techniques. The rest of the thesis is organised as follows. Chapter 2 deals with the literature survey and the problem statement of the work. Chapter 3 deals with dynamic S-box generation and the avalanche criteria of the S-box. The static S-box used in the present Advanced Encryption Standard satisfies only 64 percent of the avalanche criteria.
It has been shown that there are S-boxes which can satisfy the maximum avalanche criteria. Chapter 4 presents the present AES algorithm and the modified AES algorithm with the dynamic S-box. Chapter 5 discusses the stream cipher generated from the dynamic S-box. Chapter 6 discusses the hash function, which addresses another important security aspect, integrity; in that chapter, the generation of the modified Whirlpool hash function with a dynamic S-box is discussed and the results are tabulated. Chapter 7 concludes this thesis with directions for future research.