Memory and Characters

FY04: Introduction to the use of computers
jennifer george
Acknowledgement

Jeremy Gow
Last Week's Lecture
Mass storage: hard disks, optical, flash
 Huge increases in capacity over years
 Filesystems

 Files and directories
 Unix, OS X and Windows all different
 Windows also uses drives
 Can be shared over network
Last week’s Lab
Linux Server
 PuTTY
 SSH
 VNC
 emacs

Today

Measuring digital data
 Bits
 Bytes
 Kilobytes
 Megabytes
 ...

SI and Binary units
More for today
Binary files
 Hexadecimal
 Text files
 Character sets
 Text encodings
 ASCII, Unicode

The Analogue World

Information is continuous (it varies smoothly, without breaks)
The Digital World
Information is discontinuous (broken into chunks)
 Modern computing is digital (not analogue)

Bits: The foundation of digital computing

A bit is the smallest possible chunk of information
 the difference between two possibilities
 on/off, up/down, yes/no, heads/tails...
Traditionally 0 or 1 (Binary digIT)
 Unit of storage (written b)

 Space used to store something as 0s and 1s
Everything digital is made of bits
Bytes

A byte is 8 bits
Written B, so 8b = 1B
 Unit of storage

 This image is 7395b, about 924B

Related units
 nybble: 4 bits (0.5 bytes)
 crumb: 2 bits (0.25 bytes)
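The image size above can be checked by dividing by 8. A minimal Python sketch; the constant and function names are illustrative choices, not part of the lecture.

```python
# Conversion factors from the slide: 1 B = 8 b, a nybble is 4 b, a crumb is 2 b.
BITS_PER_BYTE = 8
BITS_PER_NYBBLE = 4
BITS_PER_CRUMB = 2


def bits_to_bytes(bits: int) -> float:
    """Convert a number of bits to bytes."""
    return bits / BITS_PER_BYTE


def bytes_to_bits(n_bytes: int) -> int:
    """Convert a number of bytes to bits."""
    return n_bytes * BITS_PER_BYTE


print(bits_to_bytes(7395))             # 924.375, i.e. "about 924 B"
print(bits_to_bytes(BITS_PER_NYBBLE))  # 0.5
print(bits_to_bytes(BITS_PER_CRUMB))   # 0.25
print(bytes_to_bits(1000))             # 8000 bits in a kilobyte
```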
Binary: Numbers as bits

Representing numbers using bits

117 = 64 + 32 + 16 + 4 + 1

A full byte (11111111) is 255 = 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1
Binary: Powers of 2

Binary based on powers of 2

$117 = 2^6 + 2^5 + 2^4 + 2^2 + 2^0$

A full byte is $2^8 - 1 = 2^7 + 2^6 + 2^5 + 2^4 + 2^3 + 2^2 + 2^1 + 2^0$
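Finding the powers of 2 in a number is the same as reading off its 1 bits. A sketch in Python (function names are hypothetical); it tests the lowest bit, then shifts right, one power at a time. Python's built-in `format(n, "b")` gives the same binary digits directly.

```python
def powers_of_two(n: int) -> list[int]:
    """Return the exponents k (largest first) such that n is the sum of 2**k."""
    exponents = []
    k = 0
    while n > 0:
        if n & 1:          # lowest bit is 1, so 2**k is part of the sum
            exponents.append(k)
        n >>= 1            # drop the lowest bit and move to the next power
        k += 1
    return exponents[::-1]


def to_binary(n: int) -> str:
    """Binary digits of n, using Python's built-in formatting."""
    return format(n, "b")


print(powers_of_two(117))  # [6, 5, 4, 2, 0]
print(to_binary(117))      # 1110101
print(powers_of_two(255))  # [7, 6, 5, 4, 3, 2, 1, 0]
```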
Group exercise:
Your Age in Binary
In groups of 4 or 5
 Work out your individual ages in binary
 Work out your combined age in binary


I’m 100001 (tomorrow I’ll be 100010)
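The combined age can be found by adding in binary, column by column with a carry, just as in decimal long addition. A sketch in Python; `add_binary` is a hypothetical helper and the group's ages (apart from 33) are made up.

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary strings column by column, carrying as in long addition."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += int(a[i])
            i -= 1
        if j >= 0:
            total += int(b[j])
            j -= 1
        result.append(str(total % 2))  # digit for this column
        carry = total // 2             # carry into the next column
    return "".join(reversed(result)) or "0"


print(add_binary("100001", "1"))  # 100010: 33 + 1 = 34

ages = [33, 19, 20, 21]           # hypothetical group
combined = "0"
for age in ages:
    combined = add_binary(combined, format(age, "b"))
print(combined, int(combined, 2))  # 1011101 93
```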
The Kilobyte (kB)
1000 bytes
 8000 bits
 Half a page of text
 A small icon
 About 7 magnetic swipe cards

The Megabyte (MB)
One million bytes (1,000,000 = $10^{6}$)
 1000 kilobytes
 A thick book
 A minute of MP3 (128 kb/s)
 6 sec of CD audio
 A digital photo (a few MB)

The Gigabyte (GB)
One billion bytes (1,000,000,000 = $10^{9}$)
 1000 megabytes
 TV quality film (a few GB)
 17 hours of MP3 (128kb/s)
 English Wikipedia (2.7 GB)
 The Human Genome (3 GB)
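The MP3 and CD figures on the last two slides follow from bit rate × time ÷ 8. A rough check in Python, using SI units as the slides do; the CD parameters (44,100 samples/s, 16 bits, 2 channels) are the standard CD audio format, added here as an assumption for the calculation.

```python
KB, MB, GB = 10**3, 10**6, 10**9  # SI units, as on these slides


def stream_bytes(seconds: float, kbps: float = 128) -> float:
    """Size of a constant-bit-rate stream: bits per second * seconds / 8."""
    return kbps * 1000 * seconds / 8


print(stream_bytes(60) / MB)       # 0.96: a minute of 128 kb/s MP3 is about 1 MB
print(GB / stream_bytes(3600))     # about 17.4 hours of MP3 per GB

# CD audio: 44,100 samples/s * 16 bits * 2 channels, divided by 8 bits per byte
cd_bytes_per_second = 44_100 * 16 * 2 / 8
print(MB / cd_bytes_per_second)    # about 5.7 seconds of CD audio per MB
```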

The Terabyte (TB)
One trillion bytes (1,000,000,000,000 = $10^{12}$)
 1000 gigabytes
 Library of Congress (20TB of text)
 YouTube (600 TB in 2006)

The Petabyte (PB)
One quadrillion bytes (1,000,000,000,000,000 = $10^{15}$)
 1000 terabytes
 Large Hadron Collider (15 PB/year)
 Google storage (??? PB)
 All printed material (200 PB)

Beyond the Petabyte

Exabyte ($10^{18}$)
 A year of US telephone calls (9.25 EB)

Zettabyte ($10^{21}$)
 All electronic data (1.8 ZB by 2011)
 1 gram of DNA (2.25 ZB)
 “All words ever spoken” as 32 kb/s audio (42 ZB)

Yottabyte ($10^{24}$)
 The internet?
Group exercise
How much data do you own?
In groups of 3 or 4
 Estimate how much digital data you each own
 Photos, music etc.

 What takes up the most space?
Laptops, iPods, phones...
 1 GB = 1000 MB
 1 MB = 1000 kB
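For the estimate, a byte total can be expressed with the largest SI prefix that keeps the number at 1 or more, dividing by 1000 each step. A sketch in Python; `si_size` and the photo/song counts are hypothetical.

```python
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def si_size(n_bytes: float) -> str:
    """Express a byte count using SI prefixes (steps of 1000)."""
    value = float(n_bytes)
    unit = 0
    while value >= 1000 and unit < len(SI_UNITS) - 1:
        value /= 1000
        unit += 1
    return f"{value:.1f} {SI_UNITS[unit]}"


# Hypothetical collection: 2000 photos of 3 MB and 1500 songs of 4 MB
total = 2000 * 3 * 10**6 + 1500 * 4 * 10**6
print(si_size(total))  # 12.0 GB
```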

SI Prefixes
Le Système International d'Unités (International System of Units)
 Many uses: kilobits, kilobytes, kilometres, ...
 1 kilobyte = 1000 bytes

Binary Prefixes

Based on powers of 2 (like binary)
 Used for data only
 More convenient when using binary addresses

1 kilobyte = 1024 bytes
SI versus Binary

Each unit now has two different meanings
 Is a kilobyte 1000 or 1024 bytes?

Binary kB 2.4% larger than SI kB
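The 2.4% gap is for the kilo level only. Each larger prefix multiplies by another 1.024, so the gap grows. A short Python check (the function name is illustrative):

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]


def binary_excess_percent(power: int) -> float:
    """How much larger 1024**power is than 1000**power, as a percentage."""
    return (1024 ** power / 1000 ** power - 1) * 100


for power, name in enumerate(PREFIXES, start=1):
    print(f"{name}byte: binary unit is {binary_excess_percent(power):.1f}% larger")
# prints 2.4, 4.9, 7.4, 10.0 and 12.6 (%)
```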
IEC Binary Prefixes
Attempt in 1999 to resolve ambiguity
 Rename the binary prefixes (kibi, mebi, gibi, ...)

 kilobyte (1024 bytes) becomes kibibyte (KiB); kilobyte (kB) keeps its SI meaning of 1000 bytes
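The ambiguity shows up when a size given in SI gigabytes is reported in binary units. A sketch in Python, assuming a hypothetical drive labelled "500 GB" in SI units:

```python
GB = 1000 ** 3   # gigabyte (SI prefix)
GIB = 1024 ** 3  # gibibyte (IEC binary prefix)


def gb_to_gib(gb: float) -> float:
    """Convert SI gigabytes to IEC gibibytes."""
    return gb * GB / GIB


print(f"{gb_to_gib(500):.2f} GiB")  # 465.66 GiB
```

The drive has not lost any bytes; the same number is simply divided by a larger unit.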
Binary files
Files are zeros and ones (grouped into bytes)
 Designed to be interpreted in some way

 Text (bytes → characters)
 Image (bytes → pixels)
 MP3 files (bytes → sounds)
 ...

Each uses a different encoding (stuff → bytes)
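The same bytes mean different things depending on how they are interpreted. A sketch in Python with four arbitrary bytes; reading them as grey-level pixels is a hypothetical interpretation for illustration only.

```python
data = bytes([0x48, 0x69, 0x21, 0x00])  # four arbitrary bytes

as_text = data[:3].decode("ascii")    # bytes → characters: 'Hi!'
as_int = int.from_bytes(data, "big")  # bytes → one number: 1214849280
as_pixels = list(data)                # bytes → grey levels: [72, 105, 33, 0]

print(as_text, as_int, as_pixels)
```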
Hexadecimal
Binary for humans
Binary is hard for people to read & write
 Can translate to hexadecimal (base-16)


01111010 →7A
Hexadecimal
Converting to and from binary

Each hex digit represents four bits (one nybble)

Two hex digits make one byte, e.g. 7A → 0111 1010
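Because each hex digit is exactly four bits, conversion works one group at a time with no arithmetic on the whole number. A sketch in Python; the function names are hypothetical.

```python
HEX_DIGITS = "0123456789ABCDEF"


def bin_to_hex(bits: str) -> str:
    """Convert a binary string to hex, one 4-bit group (nybble) at a time."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)  # pad on the left to a multiple of 4
    return "".join(HEX_DIGITS[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4))


def hex_to_bin(hex_str: str) -> str:
    """Expand each hex digit back into its 4 bits."""
    return " ".join(format(HEX_DIGITS.index(d), "04b") for d in hex_str.upper())


print(bin_to_hex("01111010"))  # 7A
print(hex_to_bin("7A"))        # 0111 1010
```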
Hexadecimal
Example
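A common way to look at a binary file is a hex dump: byte offset, each byte in hex, and the printable ASCII characters alongside. A sketch in Python; the layout is loosely modelled on hex-dump tools but `hex_dump` is a hypothetical helper, not any tool's exact format.

```python
def hex_dump(data: bytes, width: int = 8) -> str:
    """Format bytes as offset, hex values and printable characters (others shown as '.')."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:04X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


print(hex_dump(b"Hello, world!\n"))
```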
Text files

Text files contain a sequence of characters
 e.g. emails, web pages, ...
They are binary files + a text encoding
An encoding defines the byte (or bytes) for each character
Different encodings may support different character sets
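The same text gives different bytes under different encodings, and decoding with the wrong one garbles it. A sketch in Python; Latin-1 (ISO 8859-1) is not covered in this lecture and is used here only as a second encoding to compare against.

```python
text = "café"

utf8_bytes = text.encode("utf-8")      # é takes two bytes: c3 a9
latin1_bytes = text.encode("latin-1")  # é takes one byte: e9

print(utf8_bytes.hex(" "))    # 63 61 66 c3 a9
print(latin1_bytes.hex(" "))  # 63 61 66 e9

# Reading UTF-8 bytes as if they were Latin-1 gives the wrong characters.
print(utf8_bytes.decode("latin-1"))  # cafÃ©
```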
ASCII
Character set
American Standard Code for Information Interchange
 128 characters
 95 printing characters (including space)

 (space) !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
33 control characters (codes 0 to 31, plus 127)
 Tab, line feed, bell, ... (mostly obsolete)
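The 128 ASCII codes split into 95 printing characters (codes 32 to 126) and 33 control codes (0 to 31, plus 127). A quick check in Python:

```python
printing = [chr(c) for c in range(32, 127)]               # space (32) to tilde (126)
control = [c for c in range(128) if c < 32 or c == 127]  # 0 to 31, plus DEL (127)

print(len(printing), len(control), len(printing) + len(control))  # 95 33 128
print("".join(printing))
```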

ASCII
Encoding
Each character is a single byte (codes 0 to 127, so only 7 of the 8 bits are needed)
 Printing characters...
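In ASCII, one character is one byte and the leftmost bit is always 0; characters outside the set cannot be encoded at all. A sketch in Python, with a sample string chosen for illustration:

```python
text = "Hi!"
encoded = text.encode("ascii")

print(len(text), len(encoded))  # 3 3: one byte per character
print(list(encoded))            # [72, 105, 33]
# ASCII codes run from 0 to 127, so the leftmost bit of every byte is 0.
print(all(b < 0b10000000 for b in encoded))  # True

try:
    "£4".encode("ascii")
except UnicodeEncodeError as err:
    print("not in ASCII:", err.object[err.start])  # not in ASCII: £
```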

ASCII
Example
Unicode
Universal Character Set
Over 100,000 characters
 From world and historical scripts

Alphabetic characters
 Technical & mathematical symbols
 Combining characters (e.g. accents) and ligatures
 Control characters (new line etc.)
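A combining character modifies the one before it, so "é" can be stored either as one precomposed character or as "e" plus a combining accent. A sketch using Python's standard `unicodedata` module; normalisation (NFC) is not covered on the slides and is shown only to link the two forms.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point
combining = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

for s in (precomposed, combining):
    print([f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s])

print(precomposed == combining)                                # False: different code points
print(unicodedata.normalize("NFC", combining) == precomposed)  # True after normalisation
```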

Unicode
http://unicode.org/charts/
Unicode
Latin characters
Unicode
Arabic characters
Unicode
CJK characters
Unicode
Georgian characters
Unicode
Choice of encodings

UCS-4 (simple)
 4 bytes per character

UTF-16 (e.g. Windows)
 Usually 2 bytes, some use 4

UTF-8 (e.g. Unix)
 ASCII characters need 1 byte (compatible!)
 Others need 2, 3 or 4 bytes
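The byte counts above can be checked per character. A sketch in Python; Python has no codec named UCS-4, so UTF-32 (also 4 bytes for every character) stands in for it, and the big-endian ("-be") codecs are used so no byte-order mark is added.

```python
def encoded_sizes(ch: str) -> dict[str, int]:
    """Bytes needed for one character in each encoding."""
    return {
        "UTF-32": len(ch.encode("utf-32-be")),
        "UTF-16": len(ch.encode("utf-16-be")),
        "UTF-8": len(ch.encode("utf-8")),
    }


# A Latin letter, the pound sign, the euro sign and an emoji
for ch in ["A", "£", "€", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

The emoji lies outside the first 65,536 code points, so UTF-16 needs 4 bytes (a surrogate pair) for it.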
Text encoding
Example

Encode the string “£4 = €5”
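One way to check the exercise in Python: neither £ nor € is in ASCII, so only the Unicode encodings work. In UTF-8 the ASCII characters keep their single byte, £ takes 2 and € takes 3, giving 10 bytes; UTF-16 gives 14 and UTF-32 gives 28.

```python
s = "£4 = €5"

for codec in ("utf-8", "utf-16-be", "utf-32-be"):
    data = s.encode(codec)
    print(f"{codec:<10} {len(data):>2} bytes: {data.hex(' ')}")
```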
Word processing files

Word processing applications
 Microsoft Word, OpenOffice Writer, Pages, StarOffice, AbiWord, KWord, ...

Used to represent text, but
 large amounts of formatting information
 include graphics, charts and more
 don’t usually use standard text encoding
Group activity
Your name in binary (ASCII encoding)
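A way to check your answer: encode the name as ASCII, write each byte as 8 bits, then decode the bits back. A sketch in Python; "Ada" is just a sample name and both function names are hypothetical.

```python
def name_to_binary(name: str) -> str:
    """Encode a name as ASCII and write each byte as 8 bits."""
    return " ".join(format(b, "08b") for b in name.encode("ascii"))


def binary_to_name(bits: str) -> str:
    """Reverse the process: read 8-bit groups back as ASCII characters."""
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


encoded = name_to_binary("Ada")
print(encoded)                  # 01000001 01100100 01100001
print(binary_to_name(encoded))  # Ada
```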
Summary
Binary files
 Hexadecimal makes binary easier to read
 Text files = binary file + text encoding
 Encodings have different character sets
 ASCII and Unicode

Reading: Brookshear §1.4
Reading
http://en.wikipedia.org/wiki/Orders_of_magnitude_(data)
 http://en.wikipedia.org/wiki/Binary_prefix
