Computer Science
Basics of computer arithmetic, cont.

Cezary Bolek <[email protected]>
University of Lodz, Faculty of Management, Department of Computer Science
Computer Literacy

Integer numbers

$2^n$ different numbers can be written using n bits, in either of two codes:
• NBC (natural binary code): integer numbers spanning the range $0 \ldots 2^n - 1$
• C2 (two's complement code): integer numbers spanning the range $-2^{n-1} \ldots +2^{n-1} - 1$

Figure: two number lines, each holding $2^n$ numbers.
NBC: 0, 1, 2, 3, ..., $I_{max} = 2^n - 1$
C2: $I_{min} = -2^{n-1}$, ..., -2, -1, 0, 1, 2, ..., $I_{max} = 2^{n-1} - 1$

Integer number arithmetic

For n-bit integer number notation:
1. Every integer within the given range can be represented.
2. Arithmetic operations (+, -, *, div, mod) always give exact integer results, with no approximation or rounding, as long as the result stays within the range.
3. An overflow error (result exceeding the range) is signaled by:
   a) a carry out of the most significant bit (NBC),
   b) a change of the result's sign (C2, when both operands have the same sign).

BCD Code (Binary Coded Decimal)

Every decimal digit is coded separately using 4 bits. Only the combinations from 0000 to 1001 are valid.

17 (BCD) = 0001 0111b
95 (BCD) = 1001 0101b

BCD code avoids conversion between the decimal and binary systems, but the notation needs more space (memory) and calculations rely on more complicated (and slower) arithmetic algorithms.

Rational numbers

Fractional notation x/y is not convenient for computer storage and calculation.

Fixed point notation m.n, which uses a point to separate the integer part from the fractional part, is useful only for a limited range of numbers, because the lengths of the integer and fractional parts must be fixed.

m.n Form of Binary Numbers (Fixed Point Notation)

The m.n notation can be used in any positional system. The radix (base of the number system) is not a limitation, so the notation works for binary numbers as well.

Digits: $a_m \ldots a_1 a_0 \,.\, a_{-1} a_{-2} \ldots a_{-n}$

Value: $a_m \cdot 2^m + \ldots + a_1 \cdot 2^1 + a_0 \cdot 2^0 + a_{-1} \cdot 2^{-1} + \ldots + a_{-n} \cdot 2^{-n}$
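The positional sum for the m.n form can be sketched in a few lines of Python. This is only an illustration: the function name `fixed_point_value` and its `base` parameter are assumptions made for the example, not part of the slides.

```python
def fixed_point_value(digits: str, base: int = 2) -> float:
    """Evaluate a positional m.n numeral such as "101.11" in the given base."""
    integer_part, _, fractional_part = digits.partition(".")
    value = 0.0
    # Integer digits a_m ... a_0 carry the weights base^m ... base^0.
    for position, digit in enumerate(reversed(integer_part)):
        value += int(digit, base) * base ** position
    # Fractional digits a_-1 ... a_-n carry the weights base^-1 ... base^-n.
    for position, digit in enumerate(fractional_part, start=1):
        value += int(digit, base) * base ** -position
    return value


print(fixed_point_value("101.11"))   # 5.75
print(fixed_point_value("100.001"))  # 4.125
```

Both results are exact because every weight used here is a power of two, which a binary floating point number stores without rounding.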
m.n Form of Binary Numbers cont.

101.11b = $2^2 + 2^0 + 2^{-1} + 2^{-2}$ = 4 + 1 + 0.5 + 0.25 = 5.75
100.001b = 4 + 1/8 = 4.125

Floating Point notation

Floating point notation is often called scientific notation. It is convenient for writing and storing both very large and very small numbers.

Every number can be presented in exponential form $\pm X \cdot b^Y$, where:
• X: mantissa (fractional part),
• Y: exponent,
• b: base (radix) of the number system.

149600000.0 = 0.1496E9 = $0.1496 \cdot 10^9$
-0.000076502 = -76.502e-6 = $-76.502 \cdot 10^{-6}$

Floating Point Binary Notation

General form: $\pm X \cdot 2^Y$

11010b = 0.1101b · $2^5$
-0.000000101b = -1.01b · $2^{-7}$

The same numbers written entirely in the binary system (base 10b = 2, exponents 101b = 5 and -111b = -7):

11010 = 0.1101 · $10^{101}$
-0.000000101 = -1.01 · $10^{-111}$

Normalized form

Normalized number form: exactly one significant (nonzero) digit on the left side of the point.

149600000.0 = 1.496E8
-0.000076502 = -7.6502e-5
110100000.0b = 1.101b · $2^8$
-0.000000101b = -1.01b · $2^{-7}$

In the binary system, the mantissa of a normalized number always has the digit 1 on the left side of the point.

Code IEEE754

IEEE754 (1985): a standard for the notation (and storage form) of binary floating point numbers.

A binary number of the form

$(-1)^s \cdot 1.f \cdot 2^{e-127}$

can be stored in 32 bits as follows:

field   meaning    size
s       sign       1 bit
e       exponent   8 bits
f       mantissa   23 bits

• The mantissa is stored without its leading digit (1), which makes the notation more compact.
• The exponent is shifted (stored increased by 127), which solves the problem of negative exponents.

1 10000001 01000...000b = $-1 \cdot 1.01_b \cdot 2^{129-127}$ = -1.01b · $2^2$ = -5
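A minimal sketch of decoding the 32-bit pattern by hand, assuming a normalized number (the special exponent values 0 and 255 are not handled). The helper name `decode_ieee754_single` is hypothetical; the standard `struct` module is used only to cross-check the result.

```python
import struct


def decode_ieee754_single(bits: str) -> float:
    """Decode 32 characters of 0/1 as (-1)^s * 1.f * 2^(e-127)."""
    assert len(bits) == 32 and set(bits) <= {"0", "1"}
    sign = int(bits[0], 2)        # s: 1 bit
    exponent = int(bits[1:9], 2)  # e: 8 bits, stored increased by 127
    fraction = int(bits[9:], 2)   # f: 23 bits, the leading 1 is not stored
    mantissa = 1 + fraction / 2 ** 23
    return (-1) ** sign * mantissa * 2.0 ** (exponent - 127)


# Sign 1, exponent 10000001b = 129, mantissa bits 01 followed by zeros.
pattern = "1" + "10000001" + "01" + "0" * 21
value = decode_ieee754_single(pattern)

# Cross-check: let the struct module interpret the same four bytes as a float.
reference = struct.unpack(">f", int(pattern, 2).to_bytes(4, "big"))[0]
print(value, reference)  # -5.0 -5.0
```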
IEEE754 cont.

IEEE: Institute of Electrical and Electronics Engineers

In the form $(-1)^s \cdot 1.f \cdot 2^{e-127}$, the field f is the 23-bit mantissa.

0 01100001 00000...000b = $+1 \cdot 1.0_b \cdot 2^{97-127}$ = $1 \cdot 2^{-30}$ ≈ 9.31e-10

Conversion Between Number Systems

A number with a finite expansion (finite length) in one number system may have an infinite expansion in another. Converting such a number causes an irreversible loss of accuracy.

$0.1_3 = 0.333333333333\ldots_{10}$
$0.1_7 = 0.142857142857\ldots_{10}$
$0.1_{10} = 0.0001100110011\ldots_2$

Text coding

Basic set of characters:
• Letters (lowercase and capitals)
• Digits
• Punctuation marks and others
• Space
Total: fewer than 100 characters

7 bits are enough to store a single character. In practice, to fit the 8-bit standard, 1 byte is used to store a single character.

Question: what values should be assigned to specific characters? The answer is a coding standard.

ASCII Code

ASCII: the American Standard Code for Information Interchange.

ASCII is a 7-bit code, fixed and accepted as a standard in 1968. It was a significant step towards basic compatibility in text data exchange between different computer systems.

ASCII cont.

0..31 (0..1Fh): terminal control codes, for example:
bs    back space        move caret back
lf    line feed         move caret down
cr    carriage return   move caret to the beginning of the line
esc   escape            special signaling code

32 (20h): space
48..57 (30..39h): codes of digits (0..9)
65..90 (41..5Ah): codes of capital letters (A..Z)
97..122 (61..7Ah): codes of small letters (a..z)

ASCII in practice

Sample text: Dzień i noc.

Codepages

ASCII does not define characters for codes above 127.
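The ASCII code ranges listed above, and the fact that a letter such as "ń" falls outside them, can be checked with a short sketch. The function name `ascii_class` and its category labels are assumptions chosen for this illustration.

```python
def ascii_class(ch: str) -> str:
    """Classify one character by the ASCII code ranges from the slides."""
    code = ord(ch)  # for the first 128 characters this equals the ASCII code
    if code > 127:
        return "not ASCII"
    if code <= 31:
        return "control"
    if code == 32:
        return "space"
    if 48 <= code <= 57:
        return "digit"
    if 65 <= code <= 90:
        return "capital letter"
    if 97 <= code <= 122:
        return "small letter"
    return "other ASCII"  # punctuation and the remaining codes up to 127


# ord() reports the Unicode code point, so "ń" shows 144h here,
# not the value it has in any particular code page.
for ch in "Dzień\r\ni noc.":
    print(f"{ord(ch):02X}h", ascii_class(ch))
```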
Codepages cont.

The codes that ASCII leaves undefined (128 to 255) are assigned to national letters (German, French, etc.) and to some special characters (©, ®, « and others) that are not in the basic ASCII set. Code pages define this assignment.

Some Polish code pages:
• DOS: cp852, Mazovia, ...
• Windows: cp1250
• Unix, Internet: iso-8859-2

ASCII in practice cont.

The text "Dzień", a line break, and "i noc." stored in cp1250 with a DOS/Windows line break:

44  7A  69  65  F1  0D  0A  69  20  6E  6F  63  2E
D   z   i   e   ń   cr  lf  i   sp  n   o   c   .

F1 is a character code above 127: the letter "ń" in cp1250. The code 20 (sp) is the space.

New line coding standards:
• DOS, Windows: cr+lf
• Unix: lf
• Mac OS (classic): cr

Unicode

ASCII is a limited standard, and coding national characters causes problems (many code pages).

Solution: gather in a single coding set all the letters (characters) found in every language in the world (European languages, Cyrillic, Hebrew, Arabic, Japanese, Chinese, hieroglyphs, etc.) together with various symbols (technical, medical, musical, pronunciation, etc.).

Unicode: a standard for text coding based on 32-bit (4 B) codes. Version 5.2 of Unicode defines 107,296 different characters.

Every Unicode code unambiguously identifies a symbol, on any computer in the world. The standard makes it possible to use the characters of any contemporary (and historical) language in a single text document.

For compatibility with ASCII, the first 128 Unicode codes are the same in both standards.

Unicode - UTF (Unicode Transformation Format)

Unicode characters can be stored in a file using different methods of coding the 4-byte character codes. These methods are called UTF (Unicode Transformation Format):
• UTF-32 (UCS-4): fixed 4-byte coding with no transformation
• UTF-16: coding in 2-byte units; basic characters take 2 bytes, additional symbols take 4 bytes
• UTF-8: variable length coding in 8-bit units; a single character is stored in 1 to 4 bytes (the original design allowed up to 6)
• UTF-7: variable length, 7-bit coding; designed for mail systems, now almost unused
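The size differences between the UTF formats can be observed with Python's built-in codecs. This is a small sketch; the helper name `encoded_sizes` and the choice of sample characters are assumptions made for the illustration.

```python
def encoded_sizes(ch: str) -> dict:
    """Return how many bytes one character occupies in each UTF format."""
    return {
        "UTF-8": len(ch.encode("utf-8")),
        # The -le variants fix the byte order, so no byte order mark is added.
        "UTF-16": len(ch.encode("utf-16-le")),
        "UTF-32": len(ch.encode("utf-32-le")),
    }


# An ASCII letter, a Polish letter, the euro sign, and a musical symbol
# (U+1D11E) that lies outside the basic 2-byte range of UTF-16.
for ch in ["A", "ś", "€", "\U0001D11E"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch), ch.encode("utf-8").hex(" "))
```

ASCII characters keep their single byte in UTF-8, while UTF-32 spends four bytes on every character regardless of its code.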
Unicode cont.

UTF-7 (7-bit format) and UTF-8 (8-bit format): ASCII characters are coded without change; only codes greater than 127 are transformed (a text in Polish is only a dozen or so percent larger than the same text in non-Unicode, ASCII-based standards).

UTF-16 and UTF-32: each format has two flavors, which differ in the order of bytes in a word:
• Little Endian: the less significant byte goes first (Windows)
• Big Endian: the more significant byte goes first

Unicode coded text can be more than twice as big as the same text coded in ASCII.

! Not all systems or programs support Unicode !

More information: http://www.unicode.org

Unicode UTF-8 - sample

The text "w zależności" coded in UTF-8:

77  20  7A  61  6C  65  C5 BC  6E  6F  C5 9B  63  69
w   sp  z   a   l   e   ż      n   o   ś      c   i

The Polish characters ż and ś take two bytes each; sp is the space.

Sound coding

Sound: vibrations (oscillations) of air particles, received by the microphone membrane as changes of pressure.
• Vibration amplitude: sound volume
• Vibration frequency: sound pitch

Sampling: measuring a continuous quantity at fixed time intervals.

Sound recording system: microphone → ADC (analog/digital converter) → RAM (memory)
Sound playing system: RAM (memory) → DAC (digital/analog converter) → speaker

Compact Disc standard:
• 16-bit ADC/DAC converters (each sample: 2 B)
• sampling frequency: 44 kHz (44.1 kHz exactly; one sample about every 23 µs)
• 1 s of sound (mono) ≈ 86 kB, 1 h ≈ 302 MB (counting 1 kB = 1024 B)

The sampling frequency must be at least twice the highest frequency of the signal being sampled.
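The storage figures for uncompressed sound follow from multiplying the sampling frequency by the sample size. The sketch below assumes the rounded 44 kHz value used on the slide and binary units (1 kB = 1024 B); the function name `pcm_bytes` is hypothetical.

```python
def pcm_bytes(sample_rate_hz: int, bits_per_sample: int, channels: int, seconds: int) -> int:
    """Bytes needed to store uncompressed sampled sound."""
    return sample_rate_hz * bits_per_sample // 8 * channels * seconds


SAMPLE_RATE = 44_000  # rounded slide value; the CD specification uses 44,100 Hz

one_second = pcm_bytes(SAMPLE_RATE, 16, 1, 1)
one_hour = pcm_bytes(SAMPLE_RATE, 16, 1, 3600)

print(f"sample period: {1e6 / SAMPLE_RATE:.1f} microseconds")
print(f"1 s mono: {one_second / 1024:.1f} kB")
print(f"1 h mono: {one_hour / 1024 ** 2:.1f} MB")
```

Passing `channels=2` doubles both results, which is the stereo case.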
Figure (sound coding): membrane deflection plotted against time.

Image coding

Monochrome image: every image point (pixel) is coded with one bit.

A 16 x 16 pixel image stored as 16 image lines, each image line 2 bytes long:

00000000 00000000
00000000 00000000
00000000 00000000
00000000 00000000
00011100 01001000
00100010 01010000
00100010 01100000
00100010 01100000
00100010 01010000
00100010 01001000
00100010 01000100
00011100 01000100
00000000 00000000
00000000 00000000
00000000 00000000
00000000 00000000

Image memory: 16 lines x 2 B = 32 B

Color image: a single pixel takes 1 B, 2 B, 3 B or 4 B, depending on the color depth:
• 8-bit color depth: 256 colors
• 16-bit color depth: 65536 colors
• 24- and 32-bit color depth: 16.7 million colors

Image memory = horizontal resolution x vertical resolution x color depth, for example 16 px x 16 px x 1 B = 256 B.

640 x 480 x 8 bpp = 300 kB       1024 x 768 x 8 bpp = 768 kB
640 x 480 x 24 bpp = 900 kB      1024 x 768 x 24 bpp = 2.25 MB
(bpp: bits per pixel)

Data compression

Lossless compression: the original data can be reconstructed exactly from its compressed form (compression of executable files, texts, graphics). Examples: ZIP, GIF.

Lossless compression

The most often used lossless compression methods:
• vocabulary (dictionary) methods, which search for exact occurrences of certain bit or character sequences and replace them with data that needs less storage space, e.g. replacing the word "horn" with a sequence of bits shorter than the one needed to store 4 separate characters. On the other hand, knowing the code for "horn" does not make it possible to code the word "hornet", which has nothing in common with "horn".
• statistical methods, which use shorter sequences of bits for symbols that occur more frequently in the data being coded. For instance, the sequence "ch" (pretty common in English) needs less data to be stored than the two separate characters "c" and "h".
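A lossless round trip can be demonstrated with Python's standard `zlib` module. It implements DEFLATE, the method used in ZIP files, which combines a dictionary stage (LZ77) with a statistical stage (Huffman coding). The sample text is an assumption made up for this sketch.

```python
import zlib

# Highly repetitive input, so the dictionary stage finds many repeated sequences.
text = ("the horn of the hornet " * 40).encode("ascii")

compressed = zlib.compress(text, 9)    # 9 selects the strongest compression level
restored = zlib.decompress(compressed)

print(len(text), "->", len(compressed), "bytes")
print("identical after decompression:", restored == text)
```

The comparison at the end is what makes the method lossless: every byte of the original comes back unchanged.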
Legend for the image sizes: bpp = bits per pixel.

Data compression cont.

Data compression: reducing data size by removing redundancy.

Lossy compression: the original data cannot be reconstructed exactly, but the decompressed data is acceptably similar to the original (compression of multimedia data: images, video, sound). Examples: JPG, MP3.

Lossy compression

Lossy compression is possible because of the way human senses are built and the way they function. The sensitivity of human senses depends on the characteristics of the information being received.

Compression algorithms in most cases use:
• psychoacoustic models,
• psychovisual models.

Less important information about the sound or image (from the point of view of human senses) is neglected. Only the information to which human senses are highly sensitive is stored or transferred.

There are no lossy compression algorithms that can be applied to every type of data. For example, executable EXE files cannot be compressed with lossy algorithms, because they are interpreted by a machine, not by human senses.

• Image: JPEG
• Video: DivX, MPEG, Real Video
• Audio: MP3, OGG Vorbis, Real Audio

JPEG

Figure: original image, compression 75%, compression 95%.

Motto

There are 10 kinds of people: those who understand binary and those who do not.