Japanese Records and Whether or Not to Switch from MARC 8 to Unicode Storage
日本語 (Japanese)
(with an Innovative Interfaces Millennium local system)
The University of Washington Law Library’s Decision-making Process
Differences in Storage and/or Export Settings with Different Local Systems




Your Mileage May Vary
It’s important to note that local systems vary widely in whether and how they store, import, and export data in each encoding. These differences strongly shape a librarian’s decision about whether to export records from OCLC to the local system in Unicode.
Innovative Interfaces Millennium Local Systems
Do not allow import of records encoded differently from the storage encoding. In other words, if III storage is set to Unicode, records must be imported from OCLC in Unicode; if storage is set to MARC 8, records must be imported in MARC 8.
Voyager Local Systems (CJK version)
Can be set to convert imported MARC 8 records to Unicode on the fly for storage. This makes the choice between exporting from OCLC Connexion in Unicode vs. MARC 8 much less important (almost irrelevant).
Other Local Systems?
Local systems that store data in MARC 8 cannot import and display Unicode
records unless they convert the records to MARC 8. Conversely, local
systems storing data in Unicode cannot import MARC 8 records unless the
data is converted to Unicode.
Ask these questions about your local system:
• What encoding is used for storage?
• Is there a required encoding for imported records?
• If not, are imported records automatically converted to the appropriate encoding for storage?
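The three questions above amount to a small decision rule for choosing the export encoding in OCLC Connexion. The Python sketch below models that rule. The `LocalSystem` class, its fields, and the `connexion_export_encoding` function are hypothetical names made up for illustration, not part of any vendor’s software. The two example configurations restate the Millennium and Voyager behavior described above, and it is an assumption that the Voyager setup also accepts Unicode imports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalSystem:
    """Hypothetical answers to the three questions about a local system."""
    storage: str                    # encoding used for storage: "MARC 8" or "Unicode"
    required_import: Optional[str]  # encoding imports must use, or None if any is accepted
    converts_on_import: bool        # are imports converted to the storage encoding?


def connexion_export_encoding(system: LocalSystem) -> str:
    """Suggest an OCLC Connexion export encoding (a sketch, not vendor guidance)."""
    if system.required_import is not None:
        # The local system rejects any other encoding, so the export must match it.
        return system.required_import
    if system.converts_on_import:
        # The local system converts on the fly, so the choice matters little.
        return "either"
    # No required encoding and no conversion: match the storage encoding.
    return system.storage


# Millennium: imports must match the storage encoding.
millennium = LocalSystem(storage="Unicode", required_import="Unicode", converts_on_import=False)
# Voyager (CJK version) set to convert imported MARC 8 records to Unicode (assumed setup).
voyager = LocalSystem(storage="Unicode", required_import=None, converts_on_import=True)

print(connexion_export_encoding(millennium))  # Unicode
print(connexion_export_encoding(voyager))     # either
```

Checking the "required encoding" question first mirrors the Millennium case, where a mismatch simply cannot be imported.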
To switch, or not to switch…
Marian Gould Gallagher Law Library

Our library is trying to decide: should Japanese records (日本語) from OCLC Connexion be stored in our Innovative Interfaces Millennium system in MARC 8 or in Unicode?
Unicode vs. MARC 8 Basics

Computers store text as numeric codes. Unicode has become the standard for text storage worldwide. Its use facilitates the storage, transfer, and display of text in a wide range of computer software environments (the internet, databases, browsers, word processors, etc.).
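To make “numeric codes” concrete, this short Python sketch (3.8 or later, for the `bytes.hex` separator argument) prints the Unicode code point of each character in 日本語 and the bytes those characters occupy when encoded as UTF-8, one common way of storing Unicode text. It illustrates Unicode in general, not how any particular local system stores records.

```python
# Each character is stored as a number (its Unicode code point);
# an encoding such as UTF-8 turns those numbers into bytes.
text = "日本語"

for ch in text:
    code_point = f"U+{ord(ch):04X}"
    utf8_bytes = ch.encode("utf-8").hex(" ").upper()
    print(ch, code_point, utf8_bytes)
# 日 U+65E5 E6 97 A5
# 本 U+672C E6 9C AC
# 語 U+8A9E E8 AA 9E
```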
What is MARC 8?
MARC 8 has been the North American library community’s text storage standard.
(“The group of 7/8-bit and 24-bit character sets used to encode MARC 21 records. These sets are specified in MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media, Character Sets, Part 1.”¹)

What is Unicode?
Unicode has become the international standard for text
storage.
“The Universal Character Set (UCS) which is ISO 10646 and its industry counterpart Unicode.”¹
¹ Source: LC’s “MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media: CHARACTER SETS”
http://www.loc.gov/marc/specifications/speccharintro.html
What Problems are Specific to Japanese?
Q: Do some problems associated with Unicode vs. MARC 8 storage affect one language (such as Japanese) more than others?
A: Not really. Problems with character display for specific languages are more often an issue of font availability: each application must have access to a font that can display the needed characters. Arial Unicode MS can display most Unicode characters. In library records, an additional issue is conversion between MARC 8 and Unicode.
But these issues can affect many languages and scripts, not just Japanese.
Q: So are there any Japanese-specific problems?
A: Not when it comes to Unicode storage itself. But
there are common problems with display of Kanji
and Japanese romanization in library catalogs.
These are mainly font-availability issues, not
Unicode storage issues.
Examples of Font-based Problems Specific to Japanese

Romanization (Diacritic Problem)
• “Alif” as in kon’in 婚姻
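In Unicode, the alif used in romanizations such as kon’in is a different character from the ordinary keyboard apostrophe. To the best of our knowledge, LC’s MARC 8 to Unicode mapping assigns alif to U+02BC MODIFIER LETTER APOSTROPHE; treat that as an assumption and confirm it against the LC code tables. The sketch below uses only Python’s standard `unicodedata` module to show that the two look alike but are distinct characters, which is why a font lacking U+02BC may show a box or question mark, and why a search typed with a plain apostrophe may not match.

```python
import unicodedata

# Assumption: the MARC 8 alif maps to U+02BC (per LC's code tables).
alif = "\u02bc"
apostrophe = "'"  # U+0027, the ordinary keyboard apostrophe

romanized_with_alif = f"kon{alif}in"
romanized_with_apostrophe = f"kon{apostrophe}in"

print(unicodedata.name(alif))        # MODIFIER LETTER APOSTROPHE
print(unicodedata.name(apostrophe))  # APOSTROPHE
# Visually similar, but not the same string:
print(romanized_with_alif == romanized_with_apostrophe)  # False
```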

Kanji
Examples of Japanese Kanji Not in EACC (Different Unicode Code Point Required for Verified Catalog Record in OCLC)
• MARC 8/EACC: 說 (U+8AAA) instead of 説 (U+8AAC)
• MARC 8/EACC: 虛 (U+865B) instead of 虚 (U+865A)
• MARC 8/EACC: 卷 (U+5377) instead of 巻 (U+5DFB)
• MARC 8/EACC: 錄 (U+9304) instead of 録 (U+9332)
• MARC 8/EACC: 查 (U+67E5) instead of 査 (U+67FB)
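The pairs above are distinct Unicode characters, not two renderings of one character. This Python sketch, using only the standard `unicodedata` module, confirms the listed code points and shows that even NFKC normalization leaves each pair distinct. A system that compares code points would therefore not treat the MARC 8/EACC form and the Japanese form as a match unless it applied its own variant mapping, a hypothetical extra step not described here.

```python
import unicodedata

# (MARC 8/EACC form, Japanese form) pairs from the list above.
pairs = [("說", "説"), ("虛", "虚"), ("卷", "巻"), ("錄", "録"), ("查", "査")]

for eacc_form, japanese_form in pairs:
    # Unicode normalization does not fold these variant ideographs together.
    same_after_nfkc = (
        unicodedata.normalize("NFKC", eacc_form)
        == unicodedata.normalize("NFKC", japanese_form)
    )
    print(
        f"{eacc_form} U+{ord(eacc_form):04X}  "
        f"{japanese_form} U+{ord(japanese_form):04X}  "
        f"equal after NFKC: {same_after_nfkc}"
    )
```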
Why Switch to Unicode Storage?
Q: If there are no problems with MARC 8 storage
specific to Japanese, then why should our
library switch to Unicode storage?
A: Consider this quote from Microsoft:
“Deciding whether to store non-DBCS [double-byte character set] data as Unicode is generally
determined by an awareness of the effects on
storage, and about how much sorting,
conversion, and possible data corruption might
happen during client interactions with the data.
… However, for most applications the effect is
negligible. Databases with well-designed
indexes are especially unlikely to be affected…
A: (continued) Most of the time, the decision to store
character data, even non-DBCS data, in Unicode
should be based more on business needs
instead of performance. In a global economy that
is encouraged by rapid growth in Internet traffic, it
is becoming more important than ever to support
client computers that are running different
locales. Additionally, it is becoming increasingly
difficult to pick a single code page that supports
all the characters required by a worldwide
audience.”²
² See the Microsoft article “Storage and Performance Effects of Unicode”: http://msdn2.microsoft.com/en-us/library/ms189617.aspx
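One concrete part of the “effects on storage” weighed in the quote above is that the same text takes different numbers of bytes in different Unicode encodings. The Python sketch below compares UTF-8 and UTF-16 (little-endian, no byte-order mark) for a romanized word and for kanji. The sample strings are arbitrary, and the figures say nothing about how Millennium itself stores records.

```python
# Byte counts for the same text in two Unicode encodings.
samples = ["horitsu", "日本語"]

for text in samples:
    utf8_size = len(text.encode("utf-8"))
    utf16_size = len(text.encode("utf-16-le"))  # little-endian, no BOM
    print(f"{text}: {len(text)} characters, {utf8_size} bytes UTF-8, {utf16_size} bytes UTF-16")
# horitsu: 7 characters, 7 bytes UTF-8, 14 bytes UTF-16
# 日本語: 3 characters, 9 bytes UTF-8, 6 bytes UTF-16
```

Latin-script text is smaller in UTF-8, while kanji are smaller in UTF-16, so which encoding costs more storage depends on the mix of scripts in a catalog.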
What Are the Pros and Cons of Converting Our Local System to “Unicode Storage”?

Advantages of Staying with MARC 8
• Avoids a switch to Unicode that may be impossible to “back out” of if problems crop up
• No risk of your records being damaged in a conversion
• Could be faster than Unicode (but probably is not)
• In a phrase: “If it ain’t broke, don’t fix it!”
Advantages of Switching to Unicode
• Could enhance data exchange capabilities:
    • Export/import
    • Copy/paste between applications
    • Network printing
• Allows display of your records in a wide variety of computing environments worldwide
• May improve some long-standing problems with local system software (such as printing and display)
• Supporting the international Unicode standard is one way of presenting your library catalog as a global resource
• “Nothing ventured, nothing gained!”
Who decides whether to flip the switch from MARC 8 storage to Unicode storage?

In our library:

The Head of Technical Services
• Main contact with Innovative
• Requests information about successes/problems at other libraries

East Asian Law Department
• Responsible for Chinese, Japanese, and Korean records
• Works together with Technical Services

Gallagher Law Library local system: an Innovative Interfaces, Inc. Millennium local system
What will our library do?

Undetermined!
Our library is still in the decision process.
• We’re considering all of the information noted in this presentation.
• We will probably decide soon!
University of Washington Marian Gould Gallagher Law Library
What sources of information are there?


• Your Local System Guides
• Library of Congress Guides
    Such as: LC’s “MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media: CHARACTER SETS”
    http://www.loc.gov/marc/specifications/speccharintro.html
• OCLC CJK Help
• Microsoft Guides
    Such as: “Storage and Performance Effects of Unicode”: http://msdn2.microsoft.com/en-us/library/ms189617.aspx
• Unicode Consortium
    http://www.unicode.org/
• OCLC CJK listserv
• Eastlib listserv
Flipping the switch from MARC 8 storage to Unicode storage is up to you and your library…