Lecture 3: Chinese Character Output

Chinese Character Output
• Character (字符): an abstract object recognized by humans in communication; it is the representation at the conceptual level. Control characters in the computer's internal code are not considered characters
• Glyph (字形): a character in its concrete form, without regard to thickness, style, size, or the computer internal representation (bitmap, outline, etc.)
• Font (font set) (字體/字型庫): a specific form of a character with all the computer internal representation attributes
Lecture 9
• The three levels of representation
[Figure: the three levels of representation. Labels: Character (字符), human perception; Code, internal representation (內部表示); Glyph (字形) with GID (Glyph ID); Font (字型) and Image (圖像), external representation (外部表示); also labeled: Association, Rendering, Document Description]
Glyph Representation: Bitmaps
• A matrix of 1s and 0s to represent a character
• A typical monitor displays a character using a 16 x 16 bitmap

Total characters: 87 x 94 = 8,178

Type       Size        Storage (est., bytes)
Simple     16 x 16     262k
Common     24 x 24     589k
Common     32 x 32     1M
Detailed   64 x 64     4M
Detailed   96 x 96     9M
Detailed   128 x 128   16M
Detailed   256 x 256   64M

• The table shows typical sizes and their storage demand (1 bit per pixel)
• (Note: doubling the size quadruples the storage)
• Data compression helps (bitmaps contain a lot of empty space)
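The storage column can be reproduced with simple arithmetic. This is a sketch assuming 1 bit per pixel, rows padded to whole bytes, and no compression; the function name `bitmap_storage_bytes` is illustrative and not from the lecture.

```python
def bitmap_storage_bytes(n_chars: int, size: int) -> int:
    """Bytes for n_chars square size x size bitmaps at 1 bit per pixel."""
    bytes_per_row = (size + 7) // 8  # assumption: each row padded to whole bytes
    return n_chars * bytes_per_row * size

N_CHARS = 87 * 94  # 8,178 code positions

for size in (16, 24, 32, 64, 96, 128, 256):
    total = bitmap_storage_bytes(N_CHARS, size)
    print(f"{size:>3} x {size:<3} {total:>12,} bytes  (~{total / 2**20:.2f} MiB)")
```

For example, 16 x 16 needs 8,178 x 32 = 261,696 bytes, while 32 x 32 needs 1,046,784 bytes, exactly four times as much.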
• Usually small bitmaps are stored and scaled up, but the quality of slanted edges suffers
• Linear scaling: from $\mathrm{Old}(x_{old}, y_{old})$ to $\mathrm{New}(x_{new}, y_{new})$,
where $0 \le x_{old} \le \mathrm{Width}_{old}-1$, $0 \le y_{old} \le \mathrm{Height}_{old}-1$
and $0 \le x_{new} \le \mathrm{Width}_{new}-1$, $0 \le y_{new} \le \mathrm{Height}_{new}-1$,
assuming the height and width values are integers
• $r_x = \mathrm{Width}_{new}/\mathrm{Width}_{old}$, $r_y = \mathrm{Height}_{new}/\mathrm{Height}_{old}$
• If $r_x > 1$ and $r_y > 1$, it is called scaling up
• $\mathrm{New}(x_{new}, y_{new}) = \mathrm{New}(x \cdot r_x, y \cdot r_y) = \mathrm{Old}(x, y)$
Smoothing techniques for scaling
• Ad hoc techniques (no underlying model, but cheap):
– Enlargement (matrix manipulation)
• Thresholding: convert the result back into a bitmap (assign 1 if >= 0.4 for unidirectional)
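The slide does not say which matrix manipulation is used, so the sketch below is only one plausible instance of an ad hoc enlarge-then-threshold scheme: enlarge by pixel replication, average each pixel with its 3 x 3 neighbourhood (an assumed smoothing matrix), then threshold at 0.4 as on the slide. The function name is hypothetical.

```python
def enlarge_and_threshold(bitmap, factor=2, threshold=0.4):
    """Enlarge a 0/1 bitmap, smooth with a 3x3 average (assumed), then threshold."""
    h, w = len(bitmap), len(bitmap[0])
    big_h, big_w = h * factor, w * factor
    # Enlargement by pixel replication
    big = [[bitmap[y // factor][x // factor] for x in range(big_w)] for y in range(big_h)]
    out = []
    for y in range(big_h):
        row = []
        for x in range(big_w):
            # 3x3 average; pixels outside the bitmap count as 0
            total = sum(big[y + dy][x + dx]
                        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                        if 0 <= y + dy < big_h and 0 <= x + dx < big_w)
            row.append(1 if total / 9 >= threshold else 0)
        out.append(row)
    return out

for row in enlarge_and_threshold([[1, 0], [0, 1]]):
    print("".join("#" if v else "." for v in row))
```

Compared with plain replication, the notches in the enlarged diagonal are partly filled in, because a pixel is set once at least 4 of its 9 neighbours are black.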
• Smoothing splines (齒形) and interpolation (嵌入法) (costly)
– Basis: character bitmaps are a coarse sample of the original character
– Approach: recover the curves of the character as continuous functions (cubic splines), then interpolate or generate bitmaps of another size
– Optimization: minimize the non-smoothness of the recovered curves
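To make the "recover a continuous function, then resample" idea concrete, here is a sketch of cubic spline interpolation through coarse sample points. It assumes the natural boundary condition (zero second derivative at both ends), which is a choice made here rather than something the lecture specifies, and solves the tridiagonal system with the Thomas algorithm. The sample values are made up.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return S(x) interpolating (xs[i], ys[i]); xs strictly increasing, len >= 2."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; natural ends: m[0] = m[n] = 0
    if n >= 2:
        a = [h[i - 1] for i in range(1, n)]              # sub-diagonal
        b = [2 * (h[i - 1] + h[i]) for i in range(1, n)]  # diagonal
        c = [h[i] for i in range(1, n)]                  # super-diagonal
        d = [6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n)]
        for k in range(1, n - 1):  # Thomas algorithm: forward elimination
            w = a[k] / b[k - 1]
            b[k] -= w * c[k - 1]
            d[k] -= w * d[k - 1]
        sol = [0.0] * (n - 1)      # back substitution
        sol[-1] = d[-1] / b[-1]
        for k in range(n - 3, -1, -1):
            sol[k] = (d[k] - c[k] * sol[k + 1]) / b[k]
        m[1:n] = sol

    def s(x):
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)  # interval containing x
        hi, t0, t1 = h[i], xs[i + 1] - x, x - xs[i]
        return (m[i] * t0 ** 3 / (6 * hi) + m[i + 1] * t1 ** 3 / (6 * hi)
                + (ys[i] / hi - m[i] * hi / 6) * t0
                + (ys[i + 1] / hi - m[i + 1] * hi / 6) * t1)
    return s

# Made-up edge of a stroke, sampled at coarse pixel columns
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 0.8, 1.0, 0.8, 0.0]
edge = natural_cubic_spline(xs, ys)
print([round(edge(i / 4), 3) for i in range(17)])  # resampled at 4x resolution
```

The curve passes through every original sample, and the points in between come from the smooth cubic pieces rather than from replicated pixels.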
Bezier Curves
• $P(t) = (x(t), y(t))$: any point on the curve ($0 \le t \le 1$)
• Cubic Bezier: 4 control points $P_0, \dots, P_3$, with $P_i = (x_i, y_i)$
– the end points $P_0$ and $P_3$ lie on the curve
– the other two points control the shape (they set the tangent direction at the end points)
• $x(t) = x_0(1-t)^3 + 3x_1(1-t)^2 t + 3x_2(1-t)t^2 + x_3 t^3$
• $y(t) = y_0(1-t)^3 + 3y_1(1-t)^2 t + 3y_2(1-t)t^2 + y_3 t^3$
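The two formulas translate directly into code. This sketch evaluates $P(t)$ in the Bernstein form above and samples the curve into a short polyline, which is one simple way a renderer might draw it. The helper names are hypothetical.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Point P(t) = (x(t), y(t)) on a cubic Bezier curve, 0 <= t <= 1."""
    u = 1.0 - t
    b0, b1, b2, b3 = u ** 3, 3 * u ** 2 * t, 3 * u * t ** 2, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)

def flatten(p0, p1, p2, p3, steps=8):
    """Approximate the curve by steps + 1 points joined by short lines."""
    return [cubic_bezier(p0, p1, p2, p3, i / steps) for i in range(steps + 1)]

arch = ((0, 0), (0, 1), (1, 1), (1, 0))
print(cubic_bezier(*arch, 0.5))  # (0.5, 0.75)
```

At $t = 0$ and $t = 1$ the result is exactly $P_0$ and $P_3$; the inner points only pull the curve (the tangent at the start is $3(P_1 - P_0)$).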
Glyph Representation: Outline
• Characters are treated as shapes enclosed by lines or curves, which are specified by parameters (e.g. data in an ASCII file plus an interpreter that generates the graphic image)
• Line: specified by 2 points
• Curve (usually cubic Bezier): specified by 4 points
– end points lie on the curve
– the other points control the shape
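The "interpreter" that turns outline data into an image can be sketched as a scan converter. Assumptions, all hypothetical: the outline is a list of closed polygon contours in a 1000-unit em square with the y axis pointing up, curves have already been flattened into straight segments, and a pixel is black when its centre is inside the outline by the even-odd rule. The same outline is rendered at two sizes with no stored bitmaps.

```python
def inside(px, py, contours):
    """Even-odd test: count outline crossings of a ray going right from (px, py)."""
    result = False
    for contour in contours:
        n = len(contour)
        for i in range(n):
            (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
            if (y0 > py) != (y1 > py):  # edge straddles the ray (never horizontal)
                xc = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
                if px < xc:
                    result = not result
    return result

def rasterize(contours, size, em=1000):
    """Render the outline as size x size rows of '#'/'.', sampling pixel centres."""
    rows = []
    for r in range(size):
        py = (size - 1 - r + 0.5) * em / size  # row 0 is the top; outline y points up
        rows.append("".join(
            "#" if inside((c + 0.5) * em / size, py, contours) else "."
            for c in range(size)))
    return rows

# Hypothetical outline of 一: one horizontal bar
BAR = [[(100, 400), (900, 400), (900, 600), (100, 600)]]
for size in (10, 20):
    print("\n".join(rasterize(BAR, size)), end="\n\n")
```

Because the pixels are computed from the outline at each size, scaling does not degrade quality the way bitmap enlargement does.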
• Advantages compared with bitmaps:
– Scaling does not affect quality (major)
– No need to store fonts in different sizes (in effect a compression of extremely detailed/large fonts)
– Compression (as for standard text)
– Can be sent by email without encoding and decoding (the data is plain ASCII text)
• Example of a PostScript description of the Chinese character 一:
• Unit of measurement: 1 point = 1/72 inch. The coordinates start at the bottom-left corner, so coordinates measured from a top-left origin need to be translated.
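A sketch of the unit conversion and coordinate translation, assuming the source shape is given in pixel coordinates with a top-left origin at a known resolution. The function names, the 300-dpi resolution, and the stroke coordinates are hypothetical; the emitted operators (`newpath`, `moveto`, `lineto`, `closepath`, `fill`) are standard PostScript.

```python
def to_ps_points(x_px, y_px, dpi, page_height_pt):
    """Top-left-origin pixel coordinates -> PostScript points (bottom-left origin)."""
    x_pt = x_px * 72 / dpi                    # 1 point = 1/72 inch
    y_pt = page_height_pt - y_px * 72 / dpi   # flip the y axis
    return x_pt, y_pt

def ps_fill_path(points):
    """PostScript code that fills the closed polygon through the given points."""
    (x0, y0), rest = points[0], points[1:]
    lines = ["newpath", f"{x0:g} {y0:g} moveto"]
    lines += [f"{x:g} {y:g} lineto" for x, y in rest]
    lines += ["closepath", "fill"]
    return "\n".join(lines)

# Hypothetical horizontal stroke for 一 in 300-dpi pixels, on a letter page (792 pt high)
stroke_px = [(30, 140), (270, 140), (270, 160), (30, 160)]
stroke_pt = [to_ps_points(x, y, dpi=300, page_height_pt=792) for x, y in stroke_px]
print(ps_fill_path(stroke_pt))
```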
• A PostScript Type 1 font (base font) can handle only up to 256 characters in each set.
• It maps the 256 codes to the names of the characters (glyphs) in the set.
• PostScript Type 0 fonts: composite fonts
– Double-byte encoding:
– 1st byte: index of the base font
– 2nd byte: code within that base font
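The double-byte scheme above (which appears to correspond to what the PostScript manual calls 8/8 mapping, FMapType 2) can be sketched as splitting a byte string into (base-font index, code) pairs. The base font names below are hypothetical placeholders.

```python
# Hypothetical base fonts making up one composite (Type 0) font
BASE_FONTS = ["Hypothetical-Base-00", "Hypothetical-Base-01", "Hypothetical-Base-02"]

def decode_composite(data: bytes):
    """Split a double-byte string into (base-font index, code) pairs."""
    if len(data) % 2:
        raise ValueError("a double-byte string must have an even number of bytes")
    return [(data[i], data[i + 1]) for i in range(0, len(data), 2)]

for font_index, code in decode_composite(bytes([0x01, 0x41, 0x02, 0x20])):
    print(BASE_FONTS[font_index], hex(code))
```

Each base font still holds at most 256 characters; the first byte simply selects which of them to use, so up to 256 x 256 characters become addressable.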
• CID-keyed fonts (pp 288)
A technique that makes character glyph definitions independent of the code set.
– Each character glyph is given a CID (character identifier), which uniquely identifies a glyph shape.
– A CMap is a file that maps character encodings (codes) to glyphs (CIDs).
– A CIDFont file contains pointers to the actual descriptions of the glyphs. A CIDFont file usually keeps character glyphs of the same style.
• Other outline font formats include TrueType and OpenType. They differ in their data structures/header forms.
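The two-step lookup (code to CID through a CMap, CID to glyph through a CIDFont) can be sketched with dictionaries. Everything here is a hypothetical miniature: the CID value 1000 and the glyph data are illustrative, not taken from any real character collection. It shows that two different encodings of 一 reach the same glyph definition.

```python
# Hypothetical miniature CMaps: code bytes -> CID (CID 1000 is illustrative only)
CMAP_GB_EUC = {"一".encode("gb2312"): 1000}       # GB2312 (EUC) code of 一
CMAP_UNICODE = {"一".encode("utf-16-be"): 1000}   # U+4E00 in UTF-16BE

# Hypothetical CIDFont: CID -> glyph description (one rectangular outline contour)
CIDFONT_SONG = {1000: [[(100, 400), (900, 400), (900, 600), (100, 600)]]}

def lookup_glyph(code: bytes, cmap: dict, cidfont: dict):
    """Character code -> CID (via the CMap) -> glyph description (via the CIDFont)."""
    cid = cmap[code]
    return cid, cidfont[cid]

print(lookup_glyph(b"\xd2\xbb", CMAP_GB_EUC, CIDFONT_SONG))
print(lookup_glyph(b"\x4e\x00", CMAP_UNICODE, CIDFONT_SONG))
```

Supporting another code set only needs another CMap; the CIDFont with the glyph descriptions stays unchanged.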
Bitmap-to-Outline Conversion
• Determine the outline for all the straight lines
• Generate the curve list: a curve must begin and end at two different corners (so the corners must be found first, by computing the angle between two vectors at points along the outline)
• Preprocessing for curve fitting: knee removal and smoothing filters to yield finer coordinates of the sample points
• Perform curve fitting: iterations try to improve the goodness of fit (measured as the least-squares error)
• End-point alignment: nearby end points of two consecutive splines are merged by averaging their positions
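The corner-finding step can be sketched as follows: at each outline point, form vectors to the points k steps behind and ahead, and call the point a corner when the angle between them is well below 180°. The step k = 2 and the 120° threshold are assumptions for illustration; the lecture does not give values.

```python
import math

def find_corners(outline, k=2, max_angle_deg=120.0):
    """Indices of corner points on a closed outline (list of (x, y) in order)."""
    n = len(outline)
    corners = []
    for i in range(n):
        x, y = outline[i]
        px, py = outline[(i - k) % n]   # point k steps back along the outline
        nx, ny = outline[(i + k) % n]   # point k steps ahead
        v1, v2 = (px - x, py - y), (nx - x, ny - y)
        norm = math.hypot(*v1) * math.hypot(*v2)
        if norm == 0:
            continue
        cos_a = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
        if angle < max_angle_deg:  # straight runs give ~180 degrees
            corners.append(i)
    return corners

# Outline pixels of a 4 x 4 square, traced in order
SQUARE = ([(x, 0) for x in range(4)] + [(4, y) for y in range(4)]
          + [(x, 4) for x in range(4, 0, -1)] + [(0, y) for y in range(4, 0, -1)])
print(find_corners(SQUARE))  # [0, 4, 8, 12]
```

The corners found this way split the outline into the pieces between which curves are fitted.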
Getting outline pixels through erosion
• Finding the outline of a bitmap means finding the pixels that are located inside an object but have at least one neighbour outside the object
• Basic idea
– Find the bitmap with its edge pixels removed: erosion (with a small cross)
– Remove the eroded bitmap from the original bitmap
• This needs some more mathematical terms and binary image operations
• Translation: a displacement in the x direction, the y direction, or both at once; it is a repositioning of the coordinate system.
• Suppose B is a binary image.
• $B_{xy}$ means B moved by the coordinates (x, y): $B_{xy} = \{(b_x + x,\ b_y + y) \mid (b_x, b_y) \in B\}$
[Figure: B at the origin (0,0) and its translated copy at (x,y)]
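Translation is easy to express if a binary image is stored as the set of coordinates of its black pixels, a representation assumed here (and convenient for erosion). The function name `translate` is hypothetical.

```python
def translate(b, x, y):
    """B_xy: move every black pixel of binary image B by (x, y)."""
    return {(bx + x, by + y) for (bx, by) in b}

stroke = {(0, 0), (1, 0), (2, 0)}   # three black pixels in a row
print(sorted(translate(stroke, 2, 3)))
```

Translating by (x, y) and then by (-x, -y) gives back the original image.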
• Erosion of B (a bitmap): the set of coordinates (x,y) such that S, translated by (x,y), is contained in B.
• $E = B \ominus S = \{(x,y) \mid S_{xy} \subseteq B\}$
• S (4 black pixels) [Figure: structuring element S and its rotations]
• Returns all the points in B whose neighbours (as covered by S) all lie in B, i.e. B with its border (edge) pixels removed.
• Outline pixels: $B - (B \ominus S)$
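A sketch of erosion and outline extraction on a binary image stored as a set of black-pixel coordinates. The slide's figure of S is lost, so the structuring element here is an assumed 5-pixel cross (the origin plus its 4 neighbours), in line with "a small cross" above. With this S, the outline is exactly the set of pixels of B that have at least one 4-neighbour outside B.

```python
# Assumed structuring element: a 5-pixel cross (origin plus its 4 neighbours)
CROSS = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}

def erode(b, s=CROSS):
    """E = B ⊖ S = {(x, y) | S_xy ⊆ B}.

    S contains the origin, so any (x, y) with S_xy ⊆ B is itself in B;
    it is therefore enough to test the pixels of B as candidates.
    """
    return {(x, y) for (x, y) in b
            if all((x + sx, y + sy) in b for (sx, sy) in s)}

def outline(b, s=CROSS):
    """Outline pixels: B - (B ⊖ S)."""
    return b - erode(b, s)

def show(b, w, h):
    for y in range(h):
        print("".join("#" if (x, y) in b else "." for x in range(w)))

square = {(x, y) for x in range(5) for y in range(5)}
show(outline(square), 5, 5)
```

For the 5 x 5 block, erosion leaves the 3 x 3 interior, and subtracting it leaves the 16-pixel ring.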