Japanese Records and Whether or Not to Switch from MARC 8 to Unicode Storage 日 本 語
(with an Innovative Interfaces Millennium local system)
The University of Washington Law Library’s Decision-making Process

Differences in Storage and/or Export Settings with Different Local Systems

Your Mileage May Vary
It’s important to note that local systems vary widely in whether and how data is stored, imported, and exported. These differences strongly affect how librarians decide whether to export records from OCLC to the local system in Unicode.

Innovative Interfaces Millennium Local Systems
Do not allow import of records encoded differently from the storage encoding. In other words:
• If III storage is set to Unicode, records must be imported from OCLC in Unicode.
• If storage is set to MARC 8, records must be imported in MARC 8.

Voyager Local Systems (CJK version)
Can be set to convert imported MARC 8 records to Unicode on the fly for storage. This makes the choice between exporting from OCLC Connexion in Unicode vs. MARC 8 much less important (almost irrelevant).

Other Local Systems?
Local systems that store data in MARC 8 cannot import and display Unicode records unless they convert the records to MARC 8. Conversely, local systems storing data in Unicode cannot import MARC 8 records unless the data is converted to Unicode.

Ask these questions about your local system:
• What encoding is used for storage?
• Is there a required encoding for imported records?
• If not, are imported records automatically converted to the appropriate encoding for storage?

To switch, or not to switch…
The Marian Gould Gallagher Law Library is trying to decide: should Japanese records from OCLC Connexion be stored in our Innovative Interfaces Millennium system in MARC 8 or in Unicode?

Unicode vs. MARC 8 Basics
Computers store text as numeric codes. Unicode has become the standard for text storage worldwide.
Its use facilitates the storage, transfer, and display of text in a wide range of computer software environments (the internet, databases, browsers, word processors, etc.).

What is MARC 8?
MARC 8 has been the North American library community’s text storage standard: “The group of 7/8-bit and 24-bit character sets used to encode MARC 21 records. These sets are specified in MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media, Character Sets, Part 1.”1

What is Unicode?
Unicode has become the international standard for text storage: “The Universal Character Set (UCS) which is ISO 10646 and its industry counterpart Unicode.”1

1 Source: LC’s “MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media: CHARACTER SETS” http://www.loc.gov/marc/specifications/speccharintro.html

What Problems Are Specific to Japanese?
Q: Do some problems associated with Unicode vs. MARC 8 storage affect one language (such as Japanese) more than others?
A: Not really. Problems with character display for specific languages are more often an issue of font availability: each application must have access to a font that can display the required characters. Arial Unicode MS, for example, can display most Unicode characters. In library records, an additional issue is conversion between MARC 8 and Unicode. But these issues can affect many languages and scripts, not just Japanese.

Q: So are there any Japanese-specific problems?
A: Not when it comes to Unicode storage itself. But there are common problems with the display of Kanji and Japanese romanization in library catalogs. These are mainly font-availability issues, not Unicode storage issues.
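To make the basic point that “computers store text as numeric codes” concrete, the short Python sketch below prints the Unicode code point of each character in 日本語 and the bytes each character occupies in UTF-8, the encoding form used for Unicode MARC 21 records. It is only an illustration of Unicode in general: the helper names are hypothetical, and it says nothing about how MARC 8 or any particular local system stores the same characters.

```python
# Illustration only: each character is stored as a numeric code point,
# and UTF-8 turns each code point into one or more bytes.

# 日本語 ("Japanese"), written with escapes so the code points are explicit.
text = "\u65e5\u672c\u8a9e"


def code_points(s: str) -> list[str]:
    """Return the code point of each character of s in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]


def utf8_bytes(s: str) -> list[str]:
    """Return each character's UTF-8 bytes as space-separated hex."""
    return [ch.encode("utf-8").hex(" ").upper() for ch in s]


if __name__ == "__main__":
    for ch, cp, b in zip(text, code_points(text), utf8_bytes(text)):
        print(ch, cp, b)
    # Each of these Kanji takes three bytes in UTF-8 (nine bytes in total).
    print(len(text.encode("utf-8")), "bytes in UTF-8")
```

Running it shows, for example, that 日 is stored as the number U+65E5, which UTF-8 writes as the three bytes E6 97 A5.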
Examples of Font-based Problems Specific to Japanese

Romanization (Diacritic Problem)
• The “alif” as in kon’in 婚姻

Kanji
Examples of Japanese Kanji not in EACC (a different Unicode code point is required for a verified catalog record in OCLC):
• MARC 8/EACC: 說 (U+8AAA) instead of 説 (U+8AAC)
• MARC 8/EACC: 虛 (U+865B) instead of 虚 (U+865A)
• MARC 8/EACC: 卷 (U+5377) instead of 巻 (U+5DFB)
• MARC 8/EACC: 錄 (U+9304) instead of 録 (U+9332)
• MARC 8/EACC: 查 (U+67E5) instead of 査 (U+67FB)

Why Switch to Unicode Storage?
Q: If there are no problems with MARC 8 storage specific to Japanese, why should our library switch to Unicode storage?
A: Consider this quote from Microsoft: “Deciding whether to store non-DBCS [double-byte character set] data as Unicode is generally determined by an awareness of the effects on storage, and about how much sorting, conversion, and possible data corruption might happen during client interactions with the data. … However, for most applications the effect is negligible. Databases with well-designed indexes are especially unlikely to be affected… Most of the time, the decision to store character data, even non-DBCS data, in Unicode should be based more on business needs instead of performance. In a global economy that is encouraged by rapid growth in Internet traffic, it is becoming more important than ever to support client computers that are running different locales. Additionally, it is becoming increasingly difficult to pick a single code page that supports all the characters required by a worldwide audience.”2

2 See the Microsoft article “Storage and Performance Effects of Unicode”: http://msdn2.microsoft.com/en-us/library/ms189617.aspx

What Are the Pros and Cons of Converting Our Local System to “Unicode Storage”?
Advantages of Staying with MARC 8
• Avoids the risk that it may not be possible to “back out” of a switch to Unicode if problems crop up
• No risk of your records being damaged
• Could be faster than Unicode (but probably is not)
In a phrase: “If it ain’t broke, don’t fix it!”

Advantages of Switching to Unicode
• Could enhance data exchange capabilities:
  – Export/import
  – Copy/paste between applications
  – Network printing
• Allows display of your records in a wide variety of computing environments worldwide
• May resolve some long-standing problems with local system software (such as printing and display)
• Supporting the international Unicode standard is one way of presenting your library catalog as a global resource
“Nothing ventured, nothing gained!”

Who decides whether to flip the switch from MARC 8 storage to Unicode storage?
In our library:
• The Head of Technical Services
  – Main contact with Innovative
  – Requests information about successes/problems at other libraries
• The East Asian Law Department
  – Responsible for Chinese, Japanese, and Korean records
  – Works together with Technical Services

Gallagher Law Library local system: an Innovative Interfaces, Inc. Millennium local system

What will our library do?
Undetermined! Our library is still in the decision process. We’re considering all of the information noted in this presentation, and we will probably decide soon!

University of Washington Marian Gould Gallagher Law Library

What Sources of Information Are There?
• Your local system guides
• Library of Congress guides, such as LC’s “MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media: CHARACTER SETS” http://www.loc.gov/marc/specifications/speccharintro.html
• OCLC CJK Help
• Microsoft guides, such as “Storage and Performance Effects of Unicode”: http://msdn2.microsoft.com/en-us/library/ms189617.aspx
• Unicode Consortium: http://www.unicode.org/
• OCLC CJK listserv
• Eastlib listserv

Flipping the switch from MARC 8 storage to Unicode storage is up to you and your library…
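The local-system questions raised at the start (what encoding is used for storage, whether imported records must match it, and whether they are converted automatically) can be sketched as a small check on incoming records. The only MARC fact this relies on is that MARC 21 Leader position 09 (the tenth byte of the 24-byte leader) is blank for MARC 8 and “a” for Unicode (UCS). The function names and the simple yes/no policy are hypothetical: they model the Millennium behavior described above (import encoding must match storage) and a Voyager-style setup that converts on import, and are not any vendor’s actual logic.

```python
# Hypothetical sketch: decide whether a raw MARC 21 record can be loaded,
# based on Leader/09 and the local system's storage settings.

LEADER_LENGTH = 24
# MARC 21 Leader/09 (character coding scheme): blank = MARC 8, "a" = Unicode (UCS).
CODING_SCHEMES = {" ": "MARC 8", "a": "Unicode"}


def record_encoding(record: bytes) -> str:
    """Return "MARC 8" or "Unicode" according to Leader/09 of a raw record."""
    if len(record) < LEADER_LENGTH:
        raise ValueError("record is shorter than a MARC 21 leader")
    flag = chr(record[9])
    try:
        return CODING_SCHEMES[flag]
    except KeyError:
        raise ValueError(f"unexpected Leader/09 value {flag!r}") from None


def can_import(record: bytes, storage_encoding: str, converts_on_import: bool) -> bool:
    """Apply the local-system questions to one record (hypothetical policy).

    storage_encoding: "MARC 8" or "Unicode", the local system's storage setting.
    converts_on_import: True if the system converts incoming records to the
    storage encoding (Voyager-style); False if the import encoding must
    match storage (Millennium-style).
    """
    if record_encoding(record) == storage_encoding:
        return True
    return converts_on_import


if __name__ == "__main__":
    unicode_leader = b"00000nam a2200000 i 4500"  # Leader/09 = "a"
    marc8_leader = b"00000nam  2200000 i 4500"    # Leader/09 = blank
    # Millennium-style system with Unicode storage:
    print(can_import(unicode_leader, "Unicode", converts_on_import=False))
    print(can_import(marc8_leader, "Unicode", converts_on_import=False))
```

In this sketch a MARC 8 record is refused by a Millennium-style system set to Unicode storage, but accepted when `converts_on_import` is true, mirroring the Voyager case in which the export setting in OCLC Connexion matters much less.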