Computer Science
Basics of computer arithmetic, cont.

Cezary Bolek
[email protected]
University of Lodz
Faculty of Management
Department of Computer Science

Integer numbers
2^n numbers can be written using n bits (NBC, C2)
NBC – integer numbers spanning the range 0 ... 2^n - 1
C2 – integer numbers spanning the range -2^{n-1} ... +2^{n-1} - 1
[Figure: two number lines, each holding 2^n numbers. NBC: 0, 1, 2, 3, ..., Imax = 2^n - 1. C2: Imin = -2^{n-1}, ..., -2, -1, 0, 1, 2, ..., Imax = 2^{n-1} - 1]
Computer Literacy
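The NBC and C2 ranges can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and Python integers are unbounded, so the n-bit limit is simulated with a bit mask.

```python
def nbc_range(n):
    """Range of an n-bit NBC (unsigned) number."""
    return 0, 2**n - 1

def c2_range(n):
    """Range of an n-bit C2 (two's complement) number."""
    return -(2**(n - 1)), 2**(n - 1) - 1

def to_c2_bits(value, n):
    """n-bit C2 bit pattern of value, as a string of 0s and 1s."""
    lo, hi = c2_range(n)
    if not lo <= value <= hi:
        raise OverflowError(f"{value} does not fit in {n}-bit C2")
    return format(value & (2**n - 1), f"0{n}b")

def add_c2(a, b, n):
    """Add two n-bit C2 values keeping only n bits.

    Returns (result, overflow). Overflow shows up as a result whose
    sign differs from the sign shared by both operands.
    """
    raw = (a + b) & (2**n - 1)                 # drop bits above bit n-1
    result = raw - 2**n if raw >= 2**(n - 1) else raw
    return result, result != a + b

print(nbc_range(8))          # (0, 255)
print(c2_range(8))           # (-128, 127)
print(to_c2_bits(-5, 8))     # 11111011
print(add_c2(100, 100, 8))   # (-56, True)
```

In the last call two positive numbers give a negative result, which is how range overflow becomes visible in C2.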
Integer number arithmetic
For n-bit integer number notation:
1. All numbers within the given range are available.
2. Arithmetic operations (+, -, *, div, mod) always give exact integer results (no approximation or rounding).
3. Overflow error (range exceeded) is signaled by:
a) carry (overflow) (NBC)
b) change of the result's sign (C2, for operands of the same sign)

BCD Code
(Binary Coded Decimal)
Every decimal digit is coded separately using 4 bits.
Only combinations from 0000 to 1001 are valid.
17BCD = 0001 0111b
95BCD = 1001 0101b
BCD code avoids conversion between the decimal and binary systems, but the notation needs more space (memory), and arithmetic on BCD numbers relies on more complicated (and slower) algorithms and operations.
Cezary Bolek <[email protected]>
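BCD coding is easy to try out in Python. The sketch below uses illustrative helper names (`to_bcd`, `from_bcd`) and handles non-negative integers only; it shows one 4-bit group per decimal digit and rejects the invalid groups 1010 to 1111.

```python
def to_bcd(number):
    """Encode a non-negative integer as BCD: one 4-bit group per decimal digit."""
    if number < 0:
        raise ValueError("this sketch handles non-negative integers only")
    return " ".join(format(int(digit), "04b") for digit in str(number))

def from_bcd(bits):
    """Decode space-separated 4-bit groups back to an integer."""
    digits = []
    for group in bits.split():
        value = int(group, 2)
        if value > 9:                      # 1010..1111 are not valid BCD digits
            raise ValueError(f"{group} is not a valid BCD digit")
        digits.append(str(value))
    return int("".join(digits))

print(to_bcd(17))              # 0001 0111
print(to_bcd(95))              # 1001 0101
print(from_bcd("1001 0101"))   # 95
```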
Rational numbers
Fractional notation x/y is not convenient from the point of view of computer storage and calculation.
Fixed-point notation, which uses a point to separate the integer part from the fractional part (n.m), is useful only for a limited range of numbers. The lengths of the integer and fractional parts must be fixed to n and m digits. (Fixed Point Notation)

m.n Form of Binary Numbers
The n.m notation can be used with any positional notation.
The radix (base of the number system) is not a limitation, so the notation can be used for binary numbers.
a_m ... a_1 a_0 . a_{-1} a_{-2} ... a_{-n}
= a_m·2^m + ... + a_1·2^1 + a_0·2^0 + a_{-1}·2^{-1} + ... + a_{-n}·2^{-n}
101.11b = 2^2 + 2^0 + 2^{-1} + 2^{-2} = 4 + 1 + 0.5 + 0.25 = 5.75
100.001b = 4 + 1/8 = 4.125
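The positional formula for binary fixed-point numbers can be evaluated digit by digit. The following is a minimal sketch; the function name `parse_binary_fixed` is hypothetical and the input is assumed to be an unsigned binary string with at most one point.

```python
def parse_binary_fixed(text):
    """Value of a binary fixed-point string such as '101.11'."""
    if set(text) - set("01."):
        raise ValueError("only the characters 0, 1 and '.' are allowed")
    integer_part, _, fraction_part = text.partition(".")
    value = 0.0
    # Integer digits have weights 2^0, 2^1, ... counted from the right.
    for position, digit in enumerate(reversed(integer_part)):
        value += int(digit) * 2**position
    # Fractional digits have weights 2^-1, 2^-2, ... counted from the left.
    for position, digit in enumerate(fraction_part, start=1):
        value += int(digit) * 2**(-position)
    return value

print(parse_binary_fixed("101.11"))    # 5.75
print(parse_binary_fixed("100.001"))   # 4.125
```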
Floating Point notation
Every number can be presented in exponential form:
±X * b^Y,
where:
X – mantissa (fractional part),
Y – exponent,
b – number system base (radix)

Floating-point notation is often called scientific notation.
Scientific notation is convenient for storing (writing) both very large and very small numbers.
149600000.0 = .1496E9 = 0.1496*10^9
-0.000076502 = -76.502e-6 = -76.502*10^{-6}
Floating Point Binary Notation
General form: ±X * 2^Y
11010b = 0.1101b * 2^5
-0.000000101b = -1.01b * 2^{-7}
In the binary system (base and exponent also written in binary):
11010 = 0.1101 * 10^101
-0.000000101 = -1.01 * 10^{-111}

Normalized form
Normalized number form: only one significant digit on the left side of the point.
149600000.0 = 1.496E8
-0.000076502 = -7.6502e-5
110100000.0b = 1.101b * 2^8
-0.000000101b = -1.01b * 2^{-7}
In the binary system, the mantissa of a normalized number always has the digit 1 on the left side of the separating point.
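Python's standard library can split a number into mantissa and exponent, which makes the two binary forms easy to compare. A sketch, assuming nonzero input; `binary_forms` is an illustrative name. `math.frexp` returns a mantissa in the range 0.5 to 1 (the 0.1101b * 2^5 style), and doubling it gives the normalized 1.xxx form.

```python
import math

def binary_forms(x):
    """Return ((m, e), (m_norm, e_norm)) with x == m * 2**e == m_norm * 2**e_norm.

    m is in [0.5, 1), m_norm is in [1, 2). Zero has no normalized form.
    """
    if x == 0:
        raise ValueError("zero has no normalized form")
    m, e = math.frexp(x)
    return (m, e), (m * 2, e - 1)

# 11010b = 26: 0.1101b * 2^5 and 1.101b * 2^4 (0.1101b = 0.8125, 1.101b = 1.625)
print(binary_forms(26.0))            # ((0.8125, 5), (1.625, 4))
# 110100000b = 416 = 1.101b * 2^8
print(binary_forms(416.0)[1])        # (1.625, 8)
# Decimal normalized form via string formatting
print(format(149600000.0, ".3e"))    # 1.496e+08
```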
Code IEEE754
IEEE – Institute of Electrical and Electronics Engineers
IEEE754 (1985) – standard for floating-point binary number notation (and storage format)
A binary number of the form
(-1)^s * 1.f * 2^{e-127}
can be stored in 32 bits as follows:
s – sign, 1 bit
e – exponent, 8 bits
f – mantissa, 23 bits

IEEE754 cont.
(-1)^s * 1.f * 2^{e-127}
The mantissa is stored without its leading digit (1), which makes the notation more compact.
The exponent is shifted (increased by 127), which solves the problem of negative exponents.
1 10000001 010000...000b =
-1 * 1.01b * 2^{129-127} = -1.01b * 2^2 = -5
0 01100001 000000...000b =
+1 * 1.0b * 2^{97-127} = 1 * 2^{-30} ≈ 9.31e-10
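The three IEEE754 fields can be inspected with Python's `struct` module. The sketch below packs a value as a 32-bit float and splits the bits; the helper names are illustrative, and `ieee754_value` covers only normalized numbers (it ignores zero, denormals, infinities and NaN).

```python
import struct

def ieee754_fields(x):
    """Return (sign, exponent, fraction) of x stored as a 32-bit IEEE754 float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, shifted by 127
    fraction = bits & 0x7FFFFF         # 23 bits, leading 1 not stored
    return sign, exponent, fraction

def ieee754_value(sign, exponent, fraction):
    """Value of a normalized number: (-1)^s * 1.f * 2^(e-127)."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)

s, e, f = ieee754_fields(-5.0)
print(s, format(e, "08b"), format(f, "023b"))
# 1 10000001 01000000000000000000000
print(ieee754_value(s, e, f))          # -5.0
```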
Text coding
Basic set of characters:
• Letters (lowercase and capitals)
• Punctuation marks and others
• Digits
• Space
Total: fewer than 100 characters
7 bits are needed to store a single character.
In practice, to fit the 8-bit standard, 1 byte is used to store a single character.
Question: What values should be assigned to specific characters? The answer is a coding standard.

Conversion Between Number Systems
Numbers with a finite expansion (finite length) in one number system may have an infinite expansion in another.
This results in an irreversible loss of accuracy when the conversion is performed.
0.1 (base 3) = 0.333333333333333333333333... (base 10)
0.1 (base 7) = 0.142857142857142857142857... (base 10)
0.1 (base 10) = 0.0001100110011001100110011... (base 2)
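The infinite expansions that appear when converting between number systems can be generated with exact fraction arithmetic, and the resulting loss of accuracy is visible in ordinary Python floats. A sketch; `fractional_digits` is an illustrative helper and works for bases up to 10.

```python
from decimal import Decimal
from fractions import Fraction

def fractional_digits(numerator, denominator, base, count):
    """First `count` digits after the point of numerator/denominator in `base`."""
    frac = Fraction(numerator, denominator)
    digits = []
    for _ in range(count):
        frac *= base
        digit = int(frac)          # integer part is the next digit
        digits.append(str(digit))
        frac -= digit
    return "".join(digits)

print(fractional_digits(1, 3, 10, 6))    # 333333        (0.1 in base 3)
print(fractional_digits(1, 7, 10, 6))    # 142857        (0.1 in base 7)
print(fractional_digits(1, 10, 2, 12))   # 000110011001  (0.1 in base 10)

# The binary float closest to 0.1 is not exactly 0.1:
print(Decimal(0.1) == Decimal("0.1"))    # False
print(0.1 + 0.2 == 0.3)                  # False
```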
ASCII Code
ASCII – The American Standard Code for Information Interchange
ASCII is a 7-bit code that was fixed and accepted as a standard in 1968. The ASCII code was a significant step toward basic compatibility in text data exchange between different computer systems.

ASCII cont.
0..31 (0..1Fh) – terminal control codes
  bs – backspace – move caret back
  lf – line feed – move caret down
  cr – carriage return – move caret to the beginning of the line
  esc – escape – special signaling code
32 (20h) – space
48..57 (30..39h) – codes of digits (0..9)
65..90 (41..5Ah) – codes of capital letters (A..Z)
97..122 (61..7Ah) – codes of small letters (a..z)
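The ASCII code ranges can be verified with Python's built-in `ord` and `chr`. A small sketch; `ascii_table` is an illustrative helper name.

```python
def ascii_table(chars):
    """Return (character, decimal code, hex code) for each character."""
    return [(c, ord(c), format(ord(c), "02X") + "h") for c in chars]

for row in ascii_table("09AZaz "):
    print(row)
# ('0', 48, '30h') ('9', 57, '39h') ('A', 65, '41h') ('Z', 90, '5Ah')
# ('a', 97, '61h') ('z', 122, '7Ah') (' ', 32, '20h'), one per line

# Control codes from the 0..31 range
print(ord("\b"), ord("\n"), ord("\r"), ord("\x1b"))   # 8 10 13 27

# Capital and small letters differ by 32 (20h)
print(chr(ord("a") - 32))                             # A
```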
Codepages
The ASCII code does not define characters and corresponding codes above 127.
Codes undefined in ASCII are assigned to national (German, French, etc.) letters and to some special characters (©, ®, «, and others) that are not defined in the basic ASCII set.
Code pages define this assignment (codes 128 to 255).
Some Polish code pages:
DOS – cp852, Mazovia, ...
Windows – cp1250
Unix, Internet – iso-8859-2

ASCII in practice
Text:
Dzień
i noc.
Bytes (sp – space):
44 7A 69 65 F1 0D 0A 69 20 6E 6F 63 2E
D  z  i  e  ń  cr lf i  sp n  o  c  .
F1 – character code > 127: the letter "ń" (cp1250)
New line coding standards:
DOS, Windows: cr+lf
Unix: lf
Mac OS (classic): cr
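The byte sequence for the sample text can be reproduced with Python's built-in codecs. A sketch: `hex_bytes` is an illustrative helper, and the codec names `cp1250`, `cp852` and `iso8859_2` are the ones Python ships with (Mazovia is not among them).

```python
def hex_bytes(text, encoding):
    """Encode text and show the bytes as space-separated hex pairs."""
    return text.encode(encoding).hex(" ").upper()

sample = "Dzień\r\ni noc."          # cr+lf line ending, as on DOS/Windows

print(hex_bytes(sample, "cp1250"))
# 44 7A 69 65 F1 0D 0A 69 20 6E 6F 63 2E

# The same letter may get a different code in another code page,
# so bytes read with the wrong code page show the wrong character.
for codepage in ("cp1250", "iso8859_2", "cp852"):
    print(codepage, hex_bytes("ń", codepage))
print(b"\xf1".decode("cp1250"), b"\xf1".decode("cp852"))
```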
Unicode
ASCII – a limited standard + national character coding problems (many codepages)
Solution: gather in a single coding set all the letters (characters) found in every language in the world (European languages, Cyrillic, Hebrew, Arabic, Japanese, Chinese, hieroglyphs, etc.) and various symbols (technical, medical, musical, phonetic, etc.).
Unicode – a standard for text coding based on 32-bit (4B) codes.
Version 5.2 of Unicode defines 107,296 characters.
Every Unicode code unambiguously identifies a symbol stored in any computer in the world.
The Unicode standard allows characters specific to any contemporary (and historical) language to be used in a single text document.
For compatibility with ASCII, the first 128 Unicode codes are the same in both standards.
http://www.unicode.org

Unicode - UTF (Unicode Transformation Format)
Unicode characters can be stored in a file using different methods of coding the 4-byte character codes.
These methods are called UTF – Unicode Transformation Format.
UTF-32 (UCS-4) – full 4-byte coding with no transformation
UTF-16 – coding based on 2-byte sequences; basic characters are coded in 2 bytes, additional symbols use 4 bytes
UTF-8 – variable-length, 8-bit coding; a single character is stored in 1 to 4 bytes (the original design allowed up to 6)
UTF-7 – variable-length, 7-bit coding; designed for mail systems, now almost unused
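How many bytes each UTF format needs for a given character can be measured directly. A sketch; `encoded_sizes` is an illustrative helper, and the `-le` codec variants are used so that no byte order mark is added to the count.

```python
def encoded_sizes(ch):
    """Number of bytes one character takes in UTF-8, UTF-16 and UTF-32."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

for ch in ("A", "ń", "€", "\U0001D11E"):   # last one: musical G clef, outside the basic set
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
# U+0041  {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
# U+0144  {'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}
# U+20AC  {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# U+1D11E {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```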
Unicode cont.
UTF-7 (7-bit format) and UTF-8 (8-bit format):
ASCII characters are coded without any change; only codes greater than 127 are modified (text in Polish is only about a dozen percent larger than when coded in non-Unicode, ASCII-based standards).
UTF-16 and UTF-32 – each format has two flavors, which differ in the order of bytes in a word:
• Little Endian – less significant byte goes first (Windows)
• Big Endian – more significant byte goes first
! Not all systems or programs support Unicode
! Unicode-coded text can be even more than twice as big as ASCII-coded text

Unicode UTF-8 - sample
Text: w zależności
UTF-8 (sp – space):
77 20 7A 61 6C 65 C5 BC 6E 6F C5 9B 63 69
w  sp z  a  l  e  ż     n  o  ś     c  i
C5 BC – Polish character ż
C5 9B – Polish character ś
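The UTF-8 sample and the two byte orders of UTF-16 can be reproduced in Python. A sketch with an illustrative helper name; the Polish sample text is written with escapes so the snippet does not depend on the file's own encoding.

```python
def hex_bytes(text, encoding):
    """Encode text and show the bytes as space-separated hex pairs."""
    return text.encode(encoding).hex(" ").upper()

sample = "w zale\u017cno\u015bci"       # "w zależności"

print(hex_bytes(sample, "utf-8"))
# 77 20 7A 61 6C 65 C5 BC 6E 6F C5 9B 63 69

# The letter ż has code U+017C; the two UTF-16 flavors store its bytes in opposite order.
print(hex_bytes("\u017c", "utf-16-le"))   # 7C 01  (less significant byte first)
print(hex_bytes("\u017c", "utf-16-be"))   # 01 7C  (more significant byte first)
```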
Sound coding
Sound: vibrations (oscillations) of air particles, received by the microphone membrane as changes of pressure.
Vibration amplitude: sound volume
Vibration frequency: sound pitch
Sampling: measuring a continuous quantity at fixed time intervals.
The sampling frequency must be at least twice the highest frequency of the signal being sampled.
[Figure: membrane deflection plotted against time]

Sound coding
Sound recording system:
microphone → ADC (analog/digital converter) → RAM (memory)
Sound playing system:
RAM (memory) → DAC (digital/analog converter) → speaker
Compact Disc standard:
16-bit ADC/DAC converters (each sample: 2B)
sampling frequency: 44.1 kHz (a sample every ≈ 23 µs)
1 s of sound (mono) ≈ 86 kB, 1 h ≈ 302 MB
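The Compact Disc figures follow from a simple multiplication. The sketch below assumes the exact CD sampling rate of 44,100 Hz and binary units (1 kB = 1024 B, 1 MB = 1024 kB), which is what reproduces the rounded values; `pcm_size_bytes` is an illustrative name.

```python
def pcm_size_bytes(seconds, sample_rate=44_100, bytes_per_sample=2, channels=1):
    """Size of uncompressed sampled sound in bytes."""
    return seconds * sample_rate * bytes_per_sample * channels

one_second = pcm_size_bytes(1)
one_hour = pcm_size_bytes(3600)

print(one_second)                          # 88200 bytes
print(round(one_second / 1024, 1))         # 86.1  (kB per second, mono)
print(round(one_hour / 1024**2, 1))        # 302.8 (MB per hour, mono)
print(round(1_000_000 / 44_100, 1))        # 22.7  (microseconds between samples)
```

Passing `channels=2` doubles every figure for stereo sound.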
Image coding
Monochrome image – every image point (pixel) is coded with one bit.
Image lines (image line length: 1 byte + 1 byte):
00000000 00000000
00000000 00000000
00000000 00000000
00000000 00000000
00011100 01001000
00100010 01010000
00100010 01100000
00100010 01100000
00100010 01010000
00100010 01001000
00100010 01000100
00011100 01000100
00000000 00000000
00000000 00000000
00000000 00000000
00000000 00000000
image memory:
16 lines x 2B = 32B

Image coding
Color image – single pixel: 1B, 2B, 3B or 4B
8-bit color depth – 256 colors
16-bit color depth – 65536 colors
24,32-bit color depth – 16.7 million colors
[Figure: image drawn as a block with horizontal resolution, vertical resolution and color depth as its three dimensions]
image memory:
16p x 16p x 1B = 256B
640x480x8bpp = 300kB
1024x768x8bpp = 768kB
640x480x24bpp = 900kB
1024x768x24bpp = 2.25MB
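The image memory figures are the product of horizontal resolution, vertical resolution and color depth in bits per pixel (bpp). A sketch with an illustrative function name, assuming no row padding and binary units (1 kB = 1024 B).

```python
def image_size_bytes(width, height, bits_per_pixel):
    """Memory needed for an uncompressed image, in bytes (no row padding assumed)."""
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8           # round up to whole bytes

print(image_size_bytes(16, 16, 1))                    # 32      (monochrome)
print(image_size_bytes(16, 16, 8))                    # 256
print(image_size_bytes(640, 480, 8) / 1024)           # 300.0   kB
print(image_size_bytes(1024, 768, 8) / 1024)          # 768.0   kB
print(image_size_bytes(640, 480, 24) / 1024)          # 900.0   kB
print(image_size_bytes(1024, 768, 24) / 1024**2)      # 2.25    MB
```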
bpp – bits per pixel

Data compression
Data compression – data size reduction by removing redundancy.
Lossless compression:
the original data can be reconstructed from its compressed form
(compression of executable files, texts, graphics)
ZIP, GIF
Lossy compression:
the original data cannot be reconstructed, but the decompressed data is acceptably similar to the original
(multimedia data compression: images, video, sound)
JPG, MP3

Lossless compression
The most often used lossless compression methods:
dictionary methods, which search for exact occurrences of certain bit or character sequences and replace them with data that needs less storage space, e.g. replacing the word “horn” with a bit sequence shorter than the one needed to code (store) 4 independent characters. On the other hand, knowing the coding sequence for “horn” does not make it possible to code the word “hornet”, which has nothing in common with “horn”.
statistical methods, which use shorter sequences of bits to replace symbols that occur more frequently in the data being coded.
For instance, the sequence “ch” (pretty common in English) needs less data to be stored than the two separate characters “c” and “h”.
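Lossless compression can be tried with Python's standard `zlib` module, which implements the DEFLATE method used in ZIP files (a combination of a dictionary method and a statistical method). A minimal sketch; the sample text is made up and deliberately redundant.

```python
import zlib

def compression_report(data):
    """Compress, decompress and report the sizes and whether the data survived intact."""
    compressed = zlib.compress(data)
    restored = zlib.decompress(compressed)
    return len(data), len(compressed), restored == data

# Highly redundant data: the same words repeated many times.
original = b"horn hornet horn hornet " * 100

size_before, size_after, identical = compression_report(original)
print(size_before)    # 2400
print(size_after)     # much smaller than 2400
print(identical)      # True: the original is reconstructed exactly
```

Data with little redundancy (for example an already compressed file) shrinks far less and may even grow slightly.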
Lossy compression
Lossy compression is possible because of the way human senses are built and function.
The sensitivity of human senses depends on the characteristics of the information being received.
Compression algorithms in most cases use:
psychoacoustic models,
psychovisual models
Less important data (information about sound or image that matters little from the point of view of human senses) is discarded. Only the information to which human senses are highly sensitive is stored or transferred.
No lossy compression algorithm can be applied to every type of data (executable EXE files cannot be compressed with lossy algorithms because they are interpreted by a machine, not by human senses).
Image:
• JPEG
Video:
• DivX, MPEG, Real Video
Audio:
• MP3, OGG Vorbis, Real Audio

JPEG
[Figure: original image compared with the same image at 75% and at 95% compression]
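The idea that lossy compression discards detail that cannot be recovered can be shown with simple quantization. This is only a toy sketch, not how JPEG or MP3 actually work: real codecs decide what to discard using psychovisual or psychoacoustic models, while the hypothetical `quantize` below just rounds every sample to a multiple of `step`.

```python
def quantize(samples, step):
    """Lossy step: store only the index of the nearest multiple of step."""
    return [round(s / step) for s in samples]

def dequantize(codes, step):
    """Reconstruct approximate samples from the stored indices."""
    return [c * step for c in codes]

samples = [0, 3, 17, 100, 257, 1000, -45]   # made-up sample values
step = 16

codes = quantize(samples, step)             # smaller numbers, fewer bits needed
restored = dequantize(codes, step)

print(codes)
print(restored)
# The error of each sample never exceeds half a step,
# but the exact original values are gone for good.
print(max(abs(a - b) for a, b in zip(samples, restored)))
```

A larger `step` gives smaller codes and a larger error, which mirrors the trade-off between compression level and quality.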
Motto
There are 10 kinds of people:
those who understand binary
and those who do not.