What is Unicode?

Unicode@sap
Unicode SAP Systems
NW AS Internationalization
SupportedlanguagesinUnicode.doc
09.05.2007
© Copyright 2006 SAP AG. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or for any purpose without the
express permission of SAP AG. The information contained herein may be changed without prior notice.
Some software products marketed by SAP AG and its distributors contain proprietary software
components of other software vendors.
Microsoft, Windows, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation.
IBM, DB2, DB2 Universal Database, OS/2, Parallel Sysplex, MVS/ESA, AIX, S/390, AS/400, OS/390,
OS/400, iSeries, pSeries, xSeries, zSeries, z/OS, AFP, Intelligent Miner, WebSphere, Netfinity, Tivoli,
and Informix are trademarks or registered trademarks of IBM Corporation in the United States and/or
other countries.
Oracle is a registered trademark of Oracle Corporation.
UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.
Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are
trademarks or registered trademarks of Citrix Systems, Inc.
HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C®, World Wide Web
Consortium, Massachusetts Institute of Technology.
Java is a registered trademark of Sun Microsystems, Inc.
JavaScript is a registered trademark of Sun Microsystems, Inc., used under license for technology
invented and implemented by Netscape.
MaxDB is a trademark of MySQL AB, Sweden.
SAP, R/3, mySAP, mySAP.com, xApps, xApp, SAP NetWeaver and other SAP products and services
mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP
AG in Germany and in several other countries all over the world. All other product and service names
mentioned are the trademarks of their respective companies. Data contained in this document serves
informational purposes only. National product specifications may vary.
The information in this document is proprietary to SAP. No part of this document may be reproduced,
copied, or transmitted in any form or for any purpose without the express prior written permission of
SAP AG.
This document is a preliminary version and not subject to your license agreement or any other
agreement with SAP. This document contains only intended strategies, developments, and
functionalities of the SAP® product and is not intended to be binding upon SAP to any particular
course of business, product strategy, and/or development. Please note that this document is subject to
change and may be changed by SAP at any time without notice.
SAP assumes no responsibility for errors or omissions in this document. SAP does not warrant the
accuracy or completeness of the information, text, graphics, links, or other items contained within this
material. This document is provided without a warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability, fitness for a particular purpose, or noninfringement.
SAP shall have no liability for damages of any kind including without limitation direct, special, indirect,
or consequential damages that may result from the use of these materials. This limitation shall not apply
in cases of intent or gross negligence.
The statutory liability for personal injury and defective products is not affected. SAP has no control
over the information that you may access through the use of hot links contained in these materials and
does not endorse your use of third-party Web pages nor provide any warranty whatsoever relating to
third-party Web pages.
SAP AG
Icons used in this document (icon graphics not reproduced), by meaning: Caution, Background,
Function, Example, Tip, Recommendation, Syntax
Typographic Conventions
Format | Description
Example text | Words or characters that appear on the screen. These include field names, screen titles, pushbuttons as well as menu names, paths and options. Cross-references to other documentation
Example text | Emphasized words or phrases in body text, titles of graphics and tables
EXAMPLE TEXT | Names of elements in the system. These include report names, program names, transaction codes, table names, and individual key words of a programming language, when surrounded by body text, for example, SELECT and INCLUDE.
Example text | Screen output. This includes file and directory names and their paths, messages, names of variables and parameters, source code as well as names of installation, upgrade and database tools.
Example text | Exact user entry. These are words or characters that you enter in the system exactly as they appear in the documentation.
<Example text> | Variable user entry. Angle brackets indicate that you replace these words and characters with appropriate entries.
EXAMPLE TEXT | Keys on the keyboard, for example, function keys (such as F2) or the ENTER key
A | the keyboard key A
<A> | the keystroke A
/A/ | the character A
[A] | the glyph A
0x41 | the byte sequence in hexadecimal notation
i18n | Internationally used abbreviation of the term 'Internationalization'
Introduction
What is Unicode?
Language Configurations in SAP Systems
Availability of Unicode mySAP Solutions
Basic Information
Programming Languages
ABAP Programs
C and C++ Programs
XML
Java
Unicode Character Encodings and Byte Length
Hardware Requirements
Database and Platform Support
Frontend Settings in the Unicode System
SAP GUI Support
Frontend Requirements
Code Pages
Font Selection
Locales
International Components for Unicode (ICU)
Unicode SAP Systems: After the System Installation
Language Installation
Logon Language
Translation Import
Printing in Unicode SAP Systems
Supported Device Types
Communication within Multilingual System Landscapes
Data Transfer in a Unicode/non-Unicode System Landscape
Transport between Unicode and non-Unicode SAP Systems
RFC Library
From non-Unicode SAP System to Unicode
Appendix
Documentation
New Installation of Unicode SAP Systems
Conversion of SAP Single Code Page Systems and SAP MDMP Systems to Unicode
Further Information
Contacts
Introduction
With SAP NetWeaver™, Unicode is the solution for multilingual SAP systems. This document is
intended for managers and consultants who require detailed information about technical requirements,
language availability and import, the integration of Unicode systems into existing SAP system
landscapes, and the ways to move to a Unicode SAP system.
This document is constantly being revised. Please make sure that you always use the most current
version which can be downloaded from SAP Note 73606.
What is Unicode?
Unicode (and the parallel ISO 10646 standard) defines the character set necessary for efficiently
processing text in any language and for maintaining text data integrity. In addition to global character
coverage, the Unicode standard is unique among character set standards because it also defines data
and algorithms for efficient and consistent text processing. This enables high-level processing and
ensures that all conformant software produces the same results. The widespread adoption of Unicode
over the last decade made text data truly portable and formed a cornerstone of the Internet.
What is the Business Value?
Globalized software, based on Unicode, maximizes market reach and minimizes cost. Globalized
software is built and installed once and yet handles text for and from users worldwide and
accommodates their cultural conventions. It minimizes cost by eliminating per-language builds,
installations, and maintenance updates.
Who needs Unicode?
Computer users who deal with multilingual text (business people, linguists, researchers, scientists,
and others) will find that the Unicode Standard greatly simplifies their work. Mathematicians and
technicians, who regularly use mathematical symbols and other technical characters, will also find the
Unicode Standard valuable.
Global business processes, for example a global HR system, global Master Data Management, or Web
services that let customers enter their contact data (global master data containing characters from
multiple local languages), all depend on it. In short: global business requires the support of a global
character set!
What are the benefits of a Unicode-based SAP System?
Internet and Web Services
The Internet (including the World Wide Web), and therefore collaborative scenarios, are
based on Unicode.
Unicode SAP systems take full advantage of XML and Java (both of which require Unicode).
Unicode is required for cross-application data exchange without loss of data caused by
incompatible character sets. XML, for example, is one way to present documents on the World
Wide Web.
Unicode-compliant ABAP paves the way for more efficient and effective integration of ABAP
and Java applications.
Business Value
Unicode SAP systems can be more tightly integrated with non-SAP products and offer a
superior platform for collaborative, cross-system business applications.
Unicode enables SAP customers to install global systems that cover their business processes
worldwide.
Companies using different distributed systems frequently want to aggregate their worldwide
corporate data. Without Unicode, their ability to do this is limited.
Language Support
Unicode SAP systems provide unrestricted use of any language or language combination in
the world.
With Unicode, you can use multiple languages simultaneously on a single frontend.
In Unicode SAP systems you can display and maintain data that has been entered in ANY
language with ANY logon language, provided the language installation has been performed
correctly as described in the chapter Language Installation. The logon language only determines
the display language for menus, dynpros, system messages, etc., i.e. the language the user is
working in.
Fig.1: Example collection of languages which are supported in Unicode SAP systems
The languages are sorted by their 2-letter ISO 639 codes.
Language support
• As of R/3 1I non-Unicode: 41 languages which have a 2-letter language key according to ISO
639-1 (= Default Set of Languages: see Fig. 2)
• As of Web Application Server 6.20 Unicode:
a) Default Set of Languages
• 41 language codes (see table below)
b) New Unicode Languages:
• 433 additional languages which have a 3-letter language key according to ISO 639-2 (see
example 1 below)
• 86 languages which have no separate ISO 639 language key but are assigned to countries or
scripts (see examples 2a and 2b below)
All languages from the ISO 639-1 and ISO 639-2 standards and 86 languages with no ISO language key
(560 languages in total) can be entered and displayed in Unicode SAP systems as of:
Web AS 6.20 Unicode Support Package 54 onwards
Web AS 6.40 Unicode Support Package 14 onwards
Web AS 7.00 Unicode onwards
SAP Note 895560 provides an overview of printing and display restrictions.
Note:
Language support in non-Unicode systems as of Web AS 6.20 onwards is limited to the Default Set of
41 languages!
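The release and Support Package thresholds above can be read as a simple decision rule. The following Python sketch encodes that rule; the function name `supports_all_languages` and the table `MIN_SP_FOR_ALL_LANGUAGES` are hypothetical, and the thresholds are copied from the list above (Web AS 7.00 is treated as "from SP 0"). It is a reading aid, not an SAP tool. It also checks that the language counts add up to the 560 languages stated above.

```python
# Hypothetical helper: minimum Support Package levels copied from the list above.
MIN_SP_FOR_ALL_LANGUAGES = {"6.20": 54, "6.40": 14, "7.00": 0}

# Language counts from the "Language support" list: Default Set + ISO 639-2 + no ISO key.
DEFAULT_SET, ISO_639_2_EXTRA, NO_ISO_KEY = 41, 433, 86
assert DEFAULT_SET + ISO_639_2_EXTRA + NO_ISO_KEY == 560


def supports_all_languages(release: str, support_package: int, unicode: bool) -> bool:
    """True if a Web AS release/SP level can enter and display all 560 languages."""
    if not unicode:
        # Non-Unicode systems (Web AS 6.20 onwards) are limited to the 41-language Default Set.
        return False
    if release not in MIN_SP_FOR_ALL_LANGUAGES:
        raise ValueError(f"release {release!r} is not covered by the list above")
    return support_package >= MIN_SP_FOR_ALL_LANGUAGES[release]


print(supports_all_languages("6.20", 38, unicode=True))   # False: SP 54 is required
print(supports_all_languages("6.40", 14, unicode=True))   # True
print(supports_all_languages("7.00", 0, unicode=False))   # False: non-Unicode system
```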
Fig. 2: Default Set
SAP/ISO Lang. Key | Language | SAP/ISO Lang. Key | Language
AF | Afrikaans | JA | Japanese
AR* | Arabic* | KO | Korean
BG | Bulgarian | LV | Latvian
CA | Catalan | LT | Lithuanian
ZH | Chinese | MS | Malayan
ZF | Chinese trad. | NO | Norwegian
HR | Croatian | PL | Polish
CS | Czech | PT | Portuguese
DA | Danish | Z1 | Reserved- cust.
NL | Dutch | RO | Romanian
EN | English | RU | Russian
ET | Estonian | SR | Serbian
FI | Finnish | SH | Serbian (Latin)
FR | French | SK | Slovakian
DE | German | SL | Slovene
EL | Greek | ES | Spanish
HE | Hebrew | SV | Swedish
HU | Hungarian | TH | Thai
IS | Icelandic | TR | Turkish
ID | Indonesian | UK | Ukrainian
IT | Italian | |
*in Unicode systems only
Examples: New Unicode Languages
1. Languages with 3-letter language key according to ISO 639-2:
• Hindi = ISO ‘HIN’ = SAP ‘HI’
• Iranian = ISO ‘IRA’ = SAP ‘IR’
• Sanskrit = ISO ‘SAN’ = SAP ‘SA’
2. Languages with no separate language key according to ISO 639
a) assigned to countries:
• English Australia = ISO ‘ENG’ = SAP ‘1E’
• English Canada = ISO ‘ENG’ = SAP ‘3E’
• English Ireland = ISO ‘ENG’ = SAP ‘8E’
• English New Zealand = ISO ‘ENG’ = SAP ‘1N’
b) assigned to scripts:
• Azerbaijani (Cyrillic) = ISO ‘AZE’ = SAP ‘5R’
• Azerbaijani (Latin) = ISO ‘AZE’ = SAP ‘AZ’
For an overview of all supported languages (including technical details, used scripts, and countries)
see the Excel sheet "Supported Languages and Code Pages.xls". You can download this document
from SAP Note 73606 and from www.service.sap.com/i18n → i18n Media Library.
Translation Status
SAP delivers the mySAP ERP editions in 30 of the languages listed in Fig. 2. For an overview, see
www.service.sap.com/languages, where you will find availability figures per release, translation level
and delivery status of each language. If you need additional languages with 2- or 3-letter language
codes, you can install them in your Unicode SAP system as described in the chapter Language
Installation; note, however, that SAP does not deliver translations for them.
Language Configurations in SAP Systems
One of the important considerations when preparing to install an SAP System is the choice of the
system language(s). A multinational company operating in different countries all over the world usually
needs several languages that use many different characters.
Characters are encoded on a per script basis. So, for example, there is only one set of Latin
characters defined, despite the fact that the Latin script is used for the alphabets of thousands of
different languages. The same principle applies for any other script (Cyrillic, Arabic, Ethiopic,
Devanagari, etc.) which is used for writing many different languages.
Range of the Unicode Standard
The Unicode standard and ISO/IEC 10646 support three encoding forms (UTF-8, UTF-16 and UTF-32)
that use a common repertoire of characters and allow for encoding as many as 1.1 million characters.
This is sufficient for all known character encoding requirements, including full coverage of all historic
scripts of the world, as well as common notational systems.
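The figure of about 1.1 million follows from the layout of the Unicode code space, which runs from U+0000 to U+10FFFF. The short Python sketch below works out that arithmetic; note that it counts code points, not assigned characters (2,048 code points in the first plane are reserved as surrogates for UTF-16).

```python
import sys

# The Unicode code space runs from U+0000 to U+10FFFF: 17 planes of 65,536 code points.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

total = PLANES * CODE_POINTS_PER_PLANE
assert total == sys.maxunicode + 1  # sys.maxunicode is 0x10FFFF in Python 3.3+

# Plane 0 (the Basic Multilingual Plane) fits into one 16-bit code unit;
# the remaining 16 planes are reached through UTF-16 surrogate pairs.
supplementary = total - CODE_POINTS_PER_PLANE

print(f"{total:,}")          # 1,114,112
print(f"{supplementary:,}")  # 1,048,576
```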
The Unicode Standard defines codes for characters used in all the major languages written today.
Scripts include the European alphabetic scripts, Middle Eastern right-to-left scripts, and many scripts
of Asia. It covers the GB 18030-2000 Standard as well.
The Unicode Standard further includes punctuation marks, diacritics, mathematical symbols, technical
symbols, arrows, dingbats, etc. It provides codes for diacritics, which are modifying character marks
such as the tilde (~), that are used in conjunction with base characters to represent accented letters (ñ,
for example). In all, the Unicode Standard, Version 4.0 provides codes for 95,221 characters from the
world's alphabets, ideograph sets, and symbol collections.
Fig. 3: Structure of the Unicode code space (figure not reproduced): about 65,000 characters (ASCII,
general scripts, symbols, CJK ideographs, Hangul, compatibility characters, surrogate area) plus about
1,000,000 additional characters
Availability of Unicode mySAP Solutions
Unicode-based mySAP solutions run on SAP Web Application Server 6.20, 6.40 and 7.00, which
support the Unicode Standard and provide essential new functionality, syntax improvements and
extended semantics. The first complete Unicode versions of the Web Application Server are Web AS
Release 6.20 Support Package 38, Web AS Release 6.40 Support Package 04 and Web AS Release
7.00.
The general availability of the Unicode version of mySAP ERP, mySAP Business Suite and SAP
NetWeaver components is scheduled for the end of the ramp-up phase of the particular component.
See SAP Note 79991 for current status.
Basic Information
Programming Languages
Prior to Web AS 6.20, SAP used different encodings to process characters from different scripts, such
as ASCII, EBCDIC, or double-byte code pages. These character sets covered every language used in
SAP. However, problems occurred when texts from different, incompatible character sets were mixed
in a central system. Exchanging data between systems with incompatible character sets also led to
problems.
The solution to this problem is to use a code consisting of all the characters used throughout the world,
i.e. Unicode (ISO/IEC 10646). In the encoding SAP uses internally (UTF-16), each character occupies
at least 16 bits (2 bytes); characters that need a surrogate pair occupy 32 bits (4 bytes).
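The 2-byte and 4-byte cases can be checked directly with Python's built-in codecs. This sketch uses `utf-16-le` (which writes no byte order mark) and, for comparison, `utf-32-le`; the sample characters are assumptions chosen to cover a Latin letter, a CJK ideograph in the first 65,536 code points, and a supplementary CJK ideograph.

```python
# Minimal sketch: storage per character in UTF-16 (no byte order mark) and UTF-32.
def utf16_bytes(ch: str) -> int:
    return len(ch.encode("utf-16-le"))


def utf32_bytes(ch: str) -> int:
    return len(ch.encode("utf-32-le"))


for ch in ["A", "\u6653", "\U00020000"]:  # Latin A, CJK 晓, CJK Extension B (supplementary)
    print(f"U+{ord(ch):04X}: UTF-16 {utf16_bytes(ch)} bytes, UTF-32 {utf32_bytes(ch)} bytes")
```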
ABAP Programs
Most programs should work without any modification, but you need to ensure that all programs comply
with the stricter Unicode 6.10 syntax and semantics, which improve program efficiency and enable
Unicode support. Note that all programs must be 6.10 compliant to run in a Unicode system; 6.10-
compliant programs also run in a non-Unicode system. In a non-Unicode system, programs do not have
to be 6.10 compliant.
Use transaction UCCHECK to determine whether your programs are ABAP 6.10 compliant. In addition,
test the programs to catch errors that cannot be detected statically and only appear at runtime.
Use transaction SCOV to monitor the testing.
The language adjustments made as part of the conversion to Unicode provide an excellent opportunity
for all ABAP developers to clean up their source code. The new programming statements also work in
non-Unicode programs. SAP strongly recommends using them there in order to improve the
readability and minimize ambiguity in source code.
UCCHECK
Run UCCHECK and enter the programs you want to check. UCCHECK reports errors for statically
detectable syntax errors, and warnings where runtime errors are possible that the static syntax check
cannot detect.
After you have completed the check and modified any code that was not ABAP 6.10 compliant, check
the runtime behavior of your programs.
Coverage Analyzer
With the Coverage Analyzer you can monitor the code coverage of program executions in your SAP
systems. The Coverage Analyzer enables you to check the completeness of runtime tests and to
display the results. It shows the collected data in several different customizable hierarchies. You can
drill down to the programs and to their respective modularization units. The information that can be
displayed includes, for example:
• Degree of utilization
• Percentage of program units that have been tested
• Percentage of programs whose Unicode check flag is active
• Number of processing blocks
Unicode Enhanced Syntax Check
The system profile parameter abap/unicode_check=on can be used to enforce the enhanced
syntax check for all objects in non-Unicode systems. When this parameter is set, only Unicode-enabled
objects (objects with the Unicode flag) are executable. Note that after setting the Unicode
flag, automatically generated programs might need to be regenerated. Set the parameter to "on" only
if all customer programs have been Unicode-enabled as checked with transaction UCCHECK.
The same ABAP source code is used for both Unicode and non-Unicode installations (see Fig. 3).
Therefore you can freely combine Unicode and non-Unicode mySAP components and take
advantage of new releases with or without Unicode. Enhancements have also been made to RFC to
guarantee smooth data communication between Unicode and non-Unicode systems, as well as with
non-SAP products. Because most mySAP solutions consist of cross-component integration
scenarios, thorough integration testing is also planned, in particular within a combined
Unicode/non-Unicode system environment.
There are separate Unicode and non-Unicode versions of R/3:
Fig. 3 Character expansion model
• No explicit Unicode data type in ABAP
• Single ABAP source for Unicode and non-Unicode systems
• Automatic conversion of character data for communication between Unicode and non-Unicode systems
C and C++ Programs
Both the non-Unicode (single-byte or multibyte) SAP kernel and the Unicode SAP kernel are compiled
from a single set of C program sources. When the SAP system was ported to Unicode, a large amount
of C source code was enhanced, but this does not affect the non-Unicode SAP kernel. Your own
C/C++ programs must also be enhanced to run in a Unicode system (see
www.service.sap.com/unicode@sap -> Unicode Library -> ABAP and Unicode -> Ext. Unicode
Interfaces: Unicode Interface for C-Programming.ppt).
XML
Unicode SAP systems take full advantage of XML.
SAP Exchange Infrastructure (SAP XI), SAP's platform for process integration based on the exchange
of XML messages, is only available as of Web AS 6.20 Unicode. It provides a technical infrastructure
for XML-based message exchange in order to connect SAP components with each other, as well as
with non-SAP components.
For detailed information about SAP XI, see www.service.sap.com/xi.
For information about SAP Web Services see http://service.sap.com/uddi
For information about SAP Internet Business Framework see http://www.service.sap.com/netweaver.
For more information about XML-development at SAP, see www.service.sap.com/xml.
Java
Unicode SAP systems take full advantage of Java.
For information about Java development at SAP see www.service.sap.com/j2ee.
Unicode Character Encodings and Byte Length
Each Unicode character has a unique Unicode scalar value, and there are three "Unicode
Transformation Formats" (UTF), which map the Unicode scalar value algorithmically to a sequence of
code units: the 8-, 16- and 32-bit encodings UTF-8, UTF-16 and UTF-32. There are also other
encoding schemes for Unicode characters, such as CESU-8.
Although there are multiple encoding schemes, this is not the same as the problem of multiple code
pages. All UTF formats, as well as CESU-8, contain exactly the same character set. The
transformations between Unicode encodings are done algorithmically, so no conversion
tables are needed; this improves performance considerably.
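Because the transformations are algorithmic, a few lines of code are enough to move between encodings. The Python sketch below round-trips a string through UTF-16 and UTF-32 and contrasts UTF-8 with CESU-8 (specified in Unicode Technical Report #26), which encodes each UTF-16 code unit separately. Python has no built-in CESU-8 codec, so `cesu8_encode` is a hypothetical helper written for this illustration; it relies on the standard `surrogatepass` error handler to emit the 3-byte form of a lone surrogate. The sample string is an assumption.

```python
# Sketch: Unicode encodings are converted algorithmically, without lookup tables.
# cesu8_encode is a hypothetical helper (CESU-8 is not a built-in Python codec).


def cesu8_encode(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split a supplementary code point into a UTF-16 surrogate pair.
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)
            low = 0xDC00 + (v & 0x3FF)
            for unit in (high, low):
                # "surrogatepass" lets the UTF-8 codec emit 3 bytes for a lone surrogate.
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


text = "A\u00c4\u6653\U00020000"  # A, Ä, 晓, and a supplementary CJK ideograph

# Round trip through UTF-16 and UTF-32: the characters survive unchanged.
assert text.encode("utf-16-le").decode("utf-16-le").encode("utf-32-le").decode("utf-32-le") == text

# UTF-8 and CESU-8 agree on the first three characters and differ only for the last one.
print(text.encode("utf-8").hex(" "))  # 41 c3 84 e6 99 93 f0 a0 80 80
print(cesu8_encode(text).hex(" "))    # 41 c3 84 e6 99 93 ed a1 80 ed b0 80
```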
To provide the most efficient balance between memory requirements, performance, and compatibility
with existing non-Unicode systems, SAP uses different encoding schemes.
Fig. 4: Encoding schemes used in SAP systems
UTF-16 (16-bit encoding) | UTF-8 (8-bit encoding)
Internal communication (application server) | External communication (frontend, GUI)
fixed length; 1 character = 2 bytes (surrogate pairs = 2+2 bytes) | variable length; 1 character = 1-4 bytes
platform-dependent byte order (Little/Big Endian) | platform-independent, byte-order-independent
2-byte alignment restriction | no alignment restriction
best compromise between memory usage and algorithmic complexity | UTF-8 is the character encoding used for XML
fits the Java and Microsoft environments | all 7-bit ASCII characters have the same code points and byte length; this ensures compatibility with non-Unicode systems
best way to migrate existing ABAP and C programs |
Fig. 5: Example: Representation of Unicode Characters in SAP systems
Character | Unicode Scalar Value | UTF-8 / CESU-8 | UTF-16 [Little Endian] | UTF-16 [Big Endian]
A | U+0041 | 41 | 41 00 | 00 41
Ä | U+00C4 | C3 84 | C4 00 | 00 C4
α | U+03B1 | CE B1 | B1 03 | 03 B1
א | U+05D0 | D7 90 | D0 05 | 05 D0
晓 | U+6653 | E6 99 93 | 53 66 | 66 53
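The byte sequences in Fig. 5 can be reproduced with Python's built-in codecs, as in the sketch below. The helper names are assumptions for this illustration. For these characters (all below U+FFFF) UTF-8 and CESU-8 produce identical bytes, so only the `utf-8` codec is used; the `utf-16-le` and `utf-16-be` codecs write no byte order mark.

```python
# Sketch: reproduce the rows of Fig. 5 with Python's built-in codecs.
CHARS = ["A", "\u00c4", "\u03b1", "\u05d0", "\u6653"]  # A, Ä, α, א, 晓


def hex_bytes(b: bytes) -> str:
    return " ".join(f"{x:02X}" for x in b)


def fig5_row(ch: str) -> tuple:
    return (
        f"U+{ord(ch):04X}",
        hex_bytes(ch.encode("utf-8")),
        hex_bytes(ch.encode("utf-16-le")),  # "-le"/"-be" codecs write no byte order mark
        hex_bytes(ch.encode("utf-16-be")),
    )


for ch in CHARS:
    print(ch, *fig5_row(ch), sep=" | ")
```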
Fig. 6: Database Format: varies depending on the manufacturer
UTF-8: DB/2 (AIX)
CESU-8: Oracle
UTF-16: SQL Server, Max DB (8.0), DB/2 (AS400), SAP DB (7.0)
What is Little Endian/Big Endian?
The byte order of UTF-16 depends on the processor architecture (i.e. is byte order/"endian" dependent).
Little Endian (LE)
The least significant byte of the number is stored in memory at the lowest address, and the most
significant byte at the highest address. (The little end comes first.) As an analogy, we say "fourteen" in
English; the less significant number, four, comes first.
This is also called least significant byte (LSB) ordering.
Big Endian (BE)
The most significant byte is stored in memory at the lowest address, and the least significant byte at
the highest address. (The big end comes first.) As an analogy, we say "twenty-four" in English; the
more significant number, twenty, comes first.
This is also called most significant byte (MSB) ordering.
When converting to Unicode, the export code page should correspond to the endianness of the target
system.
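The difference between the two byte orders can be shown with a single 16-bit code unit. This Python sketch stores U+6653 (晓, from Fig. 5) in both orders using `int.to_bytes` and `struct.pack` and checks the result against the UTF-16 codecs; `sys.byteorder` reports the byte order of the machine running the sketch.

```python
import struct
import sys

# Sketch: the same 16-bit code unit (U+6653, 晓) stored in both byte orders.
code_unit = 0x6653

little = code_unit.to_bytes(2, "little")  # least significant byte first
big = code_unit.to_bytes(2, "big")        # most significant byte first

print(little.hex(" "), big.hex(" "))  # 53 66 66 53

# struct uses "<" for little endian and ">" for big endian.
assert struct.pack("<H", code_unit) == little
assert struct.pack(">H", code_unit) == big

# The UTF-16 codecs agree with the manual packing.
assert "\u6653".encode("utf-16-le") == little
assert "\u6653".encode("utf-16-be") == big

# Byte order of the machine running this sketch ("little" on x86, for example).
print(sys.byteorder)
```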
Hardware Requirements
Non-Unicode system - Unicode system compared
The Unicode encoding determines the length of a character. A character in one of the Unicode
encodings can be more than 1 byte, and therefore Unicode characters can be longer than characters
defined in other standard code pages. This leads to larger hardware demands.
The CPU/RAM figures below are measured average numbers on SAP Application Servers. They will
be different for different transactions. Additional CPU/RAM hardware resource requirements on
standalone servers must be provided by DB vendors.
Fig. 7: Measured average CPU/RAM requirements, non-Unicode vs. Unicode system (figure not reproduced)
* The ABAP statement SET LOCALE, which is required for the processing of language-dependent data in
MDMP systems, is CPU expensive. In a single code page/Unicode system it is used exclusively for
sorting and is therefore not CPU expensive.
Wide character handling is expensive for double-byte code pages.
The size of additional hardware which is required for a Unicode database depends on:
• Database Unicode encoding format (e.g. CESU-8 vs. UTF-16)
• Database settings (page size, extent size)
• Hardware compression
• Languages in use: for double-byte characters such as Japanese, UTF-16 requires less storage than
UTF-8; in UTF-8/CESU-8, Japanese text data therefore requires more space than English text data
(see Fig. 8)
• Application modules in use (ratios: tables/indices, text/binary data)
• Reorganization frequency:
o Unicode conversion includes a DB reorganization
o DB growth is often compensated by shrinking due to reorganization (especially the
indices)
Fig. 8: Storage per character (A, Ä, 晒) in CESU-8 vs. UTF-16 (chart not reproduced)
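The effect shown in Fig. 8 can be measured per character. This Python sketch compares byte lengths in UTF-8 and UTF-16; for characters below U+FFFF, as here, CESU-8 produces the same bytes as UTF-8. The two sample strings at the end are hypothetical and only illustrate the trend: English text doubles in UTF-16, while Japanese text is smaller in UTF-16 than in UTF-8.

```python
# Sketch: bytes per character in CESU-8/UTF-8 versus UTF-16 for the characters of Fig. 8.
# For characters below U+FFFF, CESU-8 and UTF-8 produce the same bytes.
def sizes(text: str) -> tuple:
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for ch in ["A", "Ä", "晒"]:
    utf8_len, utf16_len = sizes(ch)
    print(f"{ch}: UTF-8/CESU-8 {utf8_len} byte(s), UTF-16 {utf16_len} bytes")

# Hypothetical sample strings (English and Japanese for "purchase order").
print(sizes("Purchase order"))  # (14, 28)
print(sizes("購買発注"))          # (12, 8)
```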
Fig. 9: Examples
DB Size before Conversion (in GB) | DB Size after Conversion (in GB) | Change | DB Manufacturer | Unicode Encoding
54 | 48 | -11.1% | Oracle | CESU-8
528 | 461 | -12.7% | Oracle | CESU-8
772 | 666 | -13.7% | Oracle | CESU-8
112 | 93 | -17% | Oracle | CESU-8
880 | 674 | -23.4% | Oracle | CESU-8
240 | 270 | +12.5% | Oracle | CESU-8
460 | 650 | +41.3% | SAP DB 7.3 | UTF-16
22 | 36 | +63.6% | SQL | UTF-16
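The "Change" column of Fig. 9 is the relative size difference (after - before) / before. The Python sketch below recomputes it from the before/after sizes in the table, which is a quick way to check such figures for your own systems; the row data is copied from Fig. 9.

```python
# Sketch: recompute the "Change" column of Fig. 9 as (after - before) / before.
rows = [  # (before_gb, after_gb, encoding), copied from Fig. 9
    (54, 48, "CESU-8"), (528, 461, "CESU-8"), (772, 666, "CESU-8"),
    (112, 93, "CESU-8"), (880, 674, "CESU-8"), (240, 270, "CESU-8"),
    (460, 650, "UTF-16"), (22, 36, "UTF-16"),
]


def change_percent(before: float, after: float) -> float:
    return (after - before) / before * 100


for before, after, enc in rows:
    print(f"{before:>4} GB -> {after:>4} GB  {change_percent(before, after):+6.1f}%  {enc}")
```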
Database and Platform Support
The following operating system/database combinations are supported in Unicode SAP systems.
Informix and Reliant Unix support is not planned.
Fig. 10
Database | W2K | Linux³ | Solaris¹ | HP¹ | Tru64¹ | AIX¹ | OS/400 | OS/390
Oracle | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | - | -
MS SQL Server | ✓ | - | - | - | - | - | - | -
SAP DB/Max DB | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | - | -
DB2 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | -² |
¹ 64bit versions only.
Fig. 11
Database (Platform) | Encoding | Additional Storage Requirements
DB2 for AS/400 | UTF-16 | 10…20% *
DB2 for z/OS | UTF-16 | -20…10% **
DB2 Universal Database for Unix/NT | UTF-8 | -10%
MaxDB | UTF-16 | 40…60%
MS SQL | UTF-16 | 40…60%
Oracle | CESU-8 | -10%
* Small growth, as the largest part of the ASCII-based database is already Unicode
** With hardware compression, which is always used for SAP Unicode installations
Average database growth measured in customer systems (sum of all sizes):
• UTF-8 and CESU-8: -13% (more than 90% of the databases have shrunk)
• UTF-16: +30...40%
If you run a database that is not supported for Unicode, you can perform a database change with
simultaneous Unicode conversion. Choose the heterogeneous system copy procedure for the
database export/import instead of the homogeneous system copy method. For more information about
system copy methods see SAP Service Marketplace Quick Link /systemcopy.
Frontend Settings in the Unicode System
SAP GUI Support
All SAP GUIs (HTML, Java, Windows) support Unicode alongside all the non-Unicode code pages
already supported. Because SAP GUI is backward compatible, a single SAP GUI can be used to
access both Unicode and non-Unicode systems, and therefore only one GUI is needed per frontend.
Frontend Requirements
Requirements:
minimum SAP GUI Patch Level 56 (SAP recommends installing the newest
SAP GUI Patch Level)
Documentation:
“SAPGUI for Windows: I18N User Guide”: You can download this
documentation from SAP Service Marketplace at www.service.sap.com/i18n
→ I18N Media Library or from SAP Note 508854
SAP Note 710720 (SAPGUI for Windows 6.40)
For full support of languages with multi-byte system locales (Japanese, Traditional Chinese, Simplified
Chinese and Korean) SAPGUI 6.40 is required!
Application-specific restrictions
As of SAP NetWeaver 2004s, all BEx tools are Unicode-enabled. For details, see
www.service.sap.com/bi → SAP NetWeaver 2004s BI → BI Capabilities.
Prerequisites
1. Unicode version of SAP Web Application Server (at least kernel patch level 1078)
2. SAPGUI 620 (at least patch level 33)
3. Windows 2000, XP and the succeeding versions of Windows
4. I18N mode of SAP Frontend must be ON
Code Pages
Unicode SAP systems use one system code page and one frontend code page (GUI code page). The
system code page depends on the platform byte order:
• 4103 (UTF-16 LE)
• 4102 (UTF-16 BE)
The frontend code page is 4110 (UTF-8; Unicode Character Set). In Unicode systems the application
server sends the information about the frontend code page, and the frontend code page is
automatically set to 4110 (UTF-8).
Do not change these settings!
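The two system code pages differ only in the byte order in which each 16-bit UTF-16 code unit is stored: little-endian (LE) or big-endian (BE). The Python sketch below shows the bytes of "Ä" (U+00C4) in each encoding and picks a code page number from the byte order of the host it runs on. system_code_page() is a hypothetical helper for illustration; in a real system SAP sets the code pages, and they must not be changed.

```python
import sys

# SAP code page numbers as given in the text above.
SYSTEM_CODE_PAGES = {"little": "4103", "big": "4102"}  # UTF-16 LE / UTF-16 BE
FRONTEND_CODE_PAGE = "4110"                              # UTF-8


def system_code_page(byteorder: str = sys.byteorder) -> str:
    """Hypothetical helper: map the platform byte order to the system code page."""
    return SYSTEM_CODE_PAGES[byteorder]


text = "\u00c4"  # "Ä"
print("system code page on this host:", system_code_page())
print("4103 (UTF-16 LE):", text.encode("utf-16-le").hex())  # c400
print("4102 (UTF-16 BE):", text.encode("utf-16-be").hex())  # 00c4
print(FRONTEND_CODE_PAGE, "(UTF-8):", text.encode("utf-8").hex())  # c384
```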
Font Selection
It is recommended to choose a Microsoft TrueType font, such as "Courier New" or "Andale Mono"
(included in the Internet Explorer) as fixed font. In this case, "Tahoma" is automatically selected as
proportional font. This combination covers a large area of the Unicode Character Set.
Note:
"Arial Monospaced for SAP" is not suitable as it covers only Latin-1 characters.
To display and change the frontend settings in the Unicode system, select the corresponding icon from the
System Function Bar and choose:
1. Font (I18N)…: The first screen displays the fixed font (for example "Courier New"). If you
choose the OK button, a second, identical screen opens in which the proportional font is
displayed (for example "Tahoma").
2. Options (I18N)…
Read the “SAPGUI for Windows: I18N User Guide” before maintaining the frontend settings!
Locales
ABAP programs are written to be language-neutral and all language-specific data is derived from the
system locales. Unlike in non-Unicode SAP systems, the Unicode system locales are platform-independent.
To provide Unicode locales, SAP uses the International Components for Unicode (ICU).
The ICU is a C/C++ and Java library that provides many internationalization functions including locales,
transliteration and language-sensitive collation.
Input Methods in Unicode Systems
SAP does not support input methods which make use of characters in the Unicode Private Use Area
(PUA), for example Hong Kong Chinese (HKSCS) characters.
For more information, see SAP Note 845233.
International Components for Unicode (ICU)
Background and History of ICU
ICU is a set of C/C++ and Java libraries for Unicode support, software internationalization and
globalization (i18n/g11n). It grew out of the JDK 1.1 internationalization APIs, which the ICU team
contributed, and the project continues to be developed for the most advanced Unicode/i18n support.
ICU is widely portable and gives applications the same results on all platforms and between C/C++
and Java software.
ICU in the Unicode Development Project
SAP Unicode development uses ICU for those functions that were defined by the platform-dependent
locales in non-Unicode systems:
1. CTYPE functions:
toupperU(), tolowerU(), isupperU(), islowerU(), isspaceU(), isprintU() etc. In the Unicode system, these
functions are no longer platform- or locale-dependent. Every Unicode character has well-defined
properties, and these properties are accessible via ICU.
2. Collation:
The ABAP statements SORT ... AS TEXT and CONVERT ... INTO SORTABLE CODE provide the
programmer with locale-dependent, cultural sorting. ICU has collations for all languages that are
supported by SAP.
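Python's str methods and its unicodedata module also take character properties from the Unicode Character Database, so they can illustrate the idea behind these Unicode CTYPE functions. They are neither ICU nor the SAP functions named above. The second part of the sketch contrasts binary (code point) sorting with a deliberately crude accent- and case-insensitive sort key; real locale-dependent collation as provided by ICU follows far more detailed, language-specific rules.

```python
import unicodedata

# 1. CTYPE-style functions based on Unicode character properties,
#    not on platform locales.
print("ä".upper())                      # Ä
print("Я".lower())                      # я
print("晒".isupper(), "晒".islower())   # False False: CJK ideographs have no case
print(unicodedata.category("晒"))       # Lo (letter, other)
print("\u3000".isspace())               # True: U+3000 IDEOGRAPHIC SPACE

# 2. Collation: binary order vs. a rough "cultural" order.
words = ["Zebra", "apfel", "Äpfel"]
print(sorted(words))                    # ['Zebra', 'apfel', 'Äpfel'] (code point order)


def rough_text_key(word: str):
    """Crude stand-in for locale collation: ignore accents and case,
    break ties by code point. ICU collation is much more elaborate."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), word)


print(sorted(words, key=rough_text_key))  # ['apfel', 'Äpfel', 'Zebra']
```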
The bidirectional layout of Hebrew and Arabic texts will be implemented with ICU both in the Unicode
and non-Unicode system.
Unicode SAP Systems: After the System Installation
Requirements:
Web AS 6.20 onwards
Programs:
report RSCPINST; transaction SMLT
Further Information:
SAP Notes 73606, 544623, 551344; Excel list "Supported
Languages and Code Pages.xls".
In a Unicode SAP system you can use almost every script and therefore almost every language in the
world. An overview of the major languages, their scripts and the countries in which they are spoken is
available in the Excel file "Supported Languages and Code Pages" which can be downloaded from
SAP Note 73606 and from SAP Service Marketplace → Quick Link /i18n → I18N Media Library.
Before installing a Unicode system you must consider the following topics:
1. Will you need additional hardware? See chapter Hardware requirements.
2. Do your ABAP programs comply with ABAP 6.10 syntax? Use transaction UCCHECK to find
the lines that must be modified. Modify any program that is not compliant. For all programs set
the attribute "Unicode enabled". See chapter Programming Languages.
3. If you have any C/C++ programs, are they Unicode-compliant?
4. Which languages will be required?
Consider all of the users who will be working in the system and determine which users need to work in
their respective languages. Also consider languages that are necessary for you to conduct business.
Language Installation
After determining which languages to install run report RSCPINST.
RSCPINST is a setup and diagnostic tool for configuring languages in SAP systems. The report
automatically determines the settings required for a consistent i18n configuration, based on the set of
languages selected. RSCPINST checks (i) all important i18n configuration tables and (ii) all important i18n
application server profile parameters. The report also updates the necessary database tables, but
modifications to the application profile parameters have to be carried out manually.
In transaction SE38 start RSCPINST. You will get a message that this is a Unicode system. Confirm
the message, then select
and follow the instructions described there.
Web AS 6.20 and Web AS 6.40: You can activate up to 50 languages per system.
Web AS 7.00 and higher: You can activate up to 240 languages per system.
You can check your current language configuration at any time by using the Current NLS config.
pushbutton.
After you have installed or changed your configuration, always use Simulate to check the
configuration and possible inconsistencies. You can simulate RSCPINST as often as required. Do not
activate before all inconsistencies have been checked and adjusted!
Fig. 12: Language Installation Tool
1. Enter languages
The tool suggests the set of languages which is entered in database table TCP0I. If TCP0I is empty,
RSCPINST suggests language EN (English) only. If you want to add languages, use Add. You can
add all 41 languages from the Default Set delivered by SAP (see Fig. 2).
If you want to install more languages or the F4 help in the Key field does not include the language key
you require, you can add the language key(s) to the F4 help list. Select Extend Language List. This
function extends the Default Language Set (as described in chapter 'Technical Language Support', Fig.
2) with any of the new Unicode Languages (chapter 'Technical Language Support', examples 1 and 2).
Proceed with the configuration using the extended list as usual.
Note:
Read SAP Note 42305. It is also recommended to contact SAP to evaluate your system landscape
and language configuration before adding new languages.
2. Enter country
If there is an existing entry, the country is displayed in the fields next to Choose, and all possible
countries are displayed in the list. If it is a new installation or no previous entry is found, only 'Unicode' is
displayed in both the field and the list. If you need to select a country, you can make the country list
available by choosing Goto → Select Country (Unicode) from the menu bar.
3. Select Simulate from the toolbar.
4. RSCPINST now checks the consistency of the new configuration and generates a list of new settings
and problem areas, such as obsolete parameters, which must be adjusted manually before the
installation can be activated. No database updates are carried out in this mode.
5. Review the output and make any changes to the profile parameters that it indicates.
6. Select Activate to complete the installation.
Manually update the profile parameter zcsa/installed_languages so that it includes all of the installed
languages. The RSCPINST simulation (Fig. 13) shows an example of new parameter values after the
installation of two additional languages. Copy the new values and go to transaction RZ10 or RZ11.
Replace the values of zcsa/installed_languages by pasting the copied values over the current values of
this parameter.
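The value of zcsa/installed_languages is usually a string of one-character SAP language keys (for example D for German, E for English, F for French, J for Japanese). Treat that format as an assumption here, and always take the actual value from the RSCPINST simulation as described above. The hypothetical helper below only illustrates how new language keys extend an existing value without creating duplicates.

```python
# Hypothetical helper: extend the value of zcsa/installed_languages.
# Assumes the value is a string of one-character SAP language keys.


def merge_installed_languages(current: str, new_keys: str) -> str:
    """Append language keys that are not yet present, keeping the order."""
    merged = current
    for key in new_keys:
        if key not in merged:
            merged += key
    return merged


print("zcsa/installed_languages =", merge_installed_languages("DE", "FJ"))
# zcsa/installed_languages = DEFJ
```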
Fig. 13: Language Installation Simulation
Logon Language
In Unicode SAP systems you can display and maintain data in ANY language with ANY logon
language, provided the languages and locales have been installed and the profile parameters have been
maintained correctly as described in chapter Language Installation (see also SAP Note 42305).
The logon language determines the display language for menus, dynpros, system messages, etc., i.e.
the language the user is working in.
Translation Import
After having finished all code page related configurations, run transaction SMLT to configure, import or
supplement the translations for all required languages. Call transaction SMLT and read the
documentation which is available via pushbutton Documentation in the toolbar.
Printing in Unicode SAP Systems
Local printing with Single Code Page printers is still possible. To print multilingual data, Unicode
printers are required.
Fig. 14: Supported Device Types
LEXUTF8
HPUTF8
Communication within Multilingual System Landscapes
In this chapter you will see how Unicode SAP systems integrate with other Unicode systems and with
non-Unicode systems. You will also learn about the advantages of a homogeneous Unicode system
landscape. The example shows that not every non-Unicode system can correctly handle the text
data it receives, whereas every Unicode system can. So, to communicate without any code page
incompatibilities, the only solution is a system landscape that is completely Unicode-based.
Important SAP Notes
SAP Note 745030: MDMP - Unicode Interfaces: Solution Overview
Provides a detailed overview of data exchange solutions between MDMP systems and Unicode systems.
SAP Note 656350: Master Data Transfer UNICODE <==> MDMP Systems with ALE
Describes a special development project that allows language-dependent master data to be
transferred between a Unicode system and an MDMP system using ALE technology.
SAP Note 820419: IDoc adapter: Incorrect field values with MDMP systems
Information about using the Exchange Infrastructure (XI 3.0) to send IDocs with the IDoc adapter to an
MDMP system.
A multinational company has a Unicode SAP system at its headquarters in the US, a single
code page SAP system in Japan (Shift-JIS or SJIS) and an MDMP SAP system in Australia
(Latin-1/SJIS/Thai).
Japanese and English (7-bit ASCII) data can be sent and received by all offices, but the
Japanese office cannot correctly receive data containing Thai characters from the Australian office,
because SJIS does not contain those characters.
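The example can be checked with Python's codecs, which here stand in for the code pages of the systems: "utf-8" for the Unicode system in the US and "shift_jis" for the SJIS system in Japan. This is an illustrative sketch, not an SAP interface; can_receive() is a hypothetical helper that tests whether every character of a text exists in the receiver's code page.

```python
# Which office can receive which data? (illustrative sketch)
# Python codec names stand in for the systems' code pages.


def can_receive(text: str, codec: str) -> bool:
    """True if every character of text exists in the receiver's code page."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


SAMPLES = {"English": "Invoice", "Japanese": "日本語", "Thai": "ภาษาไทย"}
RECEIVERS = {"US (Unicode)": "utf-8", "Japan (SJIS)": "shift_jis"}

for receiver, codec in RECEIVERS.items():
    for language, text in SAMPLES.items():
        print(f"{receiver:<13} {language:<9} {can_receive(text, codec)}")
```

Only the combination Japan (SJIS) / Thai prints False, matching the example: the Unicode system can receive all three texts.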
Data Transfer in a Unicode/non-Unicode System Landscape
Data transfer between two Unicode systems is always unproblematic, regardless of whether the text data
contain language information.
Communication between two systems is problematic in the following cases:
1. Sender and receiver systems use different code pages. If you log on to a Unicode system with
language EN and maintain Japanese data, which are then transferred (for example via RFC) or
transported into a non-Unicode system that has no Japanese code page, the Japanese data will
be corrupted in the non-Unicode system.
2. Java applications communicate with non-Unicode SAP components: Java uses Unicode for text
processing, and because Unicode is a superset of all the old non-Unicode code pages, there is
always a danger of data loss in the communication between Java and non-Unicode software. This
applies to the communication between the ABAP stack and the Java stack within a NetWeaver
application as well as to the communication of the ABAP stack with external Java applications.
3. A Unicode system communicates with an Asian non-Unicode system (double-byte code page).
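Case 1 can be simulated in a few lines of Python. The sketch assumes a receiver whose code page is Latin-1 (Python codec "latin-1"); transfer() is a hypothetical helper that encodes the text for the receiver's code page and reads it back there. Python's "replace" error handler substitutes "?" for characters that do not exist in the target code page; the replacement character an SAP system actually uses may differ.

```python
# Simulate transferring text from a Unicode system into a non-Unicode
# system whose code page lacks some characters (illustrative sketch).


def transfer(text: str, receiver_codec: str) -> str:
    """Encode text for the receiver's code page, then read it back there."""
    stored = text.encode(receiver_codec, errors="replace")
    return stored.decode(receiver_codec)


print(transfer("Müller", "latin-1"))  # fits into Latin-1: arrives intact
print(transfer("日本語", "latin-1"))   # no Japanese in Latin-1: arrives as ???
```

The same Japanese text survives unchanged when the receiver uses a Japanese code page such as Shift JIS, which is why the result depends on the code pages of both partners.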
Fig. 15 Legend:
■ 100% data transfer
■ 7-bit ASCII data transfer; solution for 100% data transfer being implemented in some applications
■ 100% transfer of 7-bit ASCII and some additional data
■ 7-bit ASCII data transfer only
■ No solution for data transfer yet (under investigation)
Smooth data transfer can only be guaranteed between Unicode systems and between a Unicode
system and a Java application (see Fig. 15). As MDMP is completely unknown in the Java world,
no communication is possible between an MDMP system and Java applications.
Fig. 16 System Communication (sender/receiver matrix: Single Code Page, MDMP, Java Application, Unicode)
Fig. 17 Example: Communication between Single Code Page SAP Systems (matrix of sender* and receiver
code pages: ISO-1, ISO-2, ISO-3, ISO-5, ISO-6, ISO-7, ISO-9, ISO-11, SJIS, Big 5, KSC 5601, GB 1324)
*ISO-X = ISO 8859-X
Fig. 18 Common errors:
■ Communication of a Unicode system with a non-Unicode system (reason: wrong language key)
■ File upload/download (reason: wrong code page)
Transport between Unicode and non-Unicode SAP Systems
Transporting objects between UC and non-UC systems is technically supported. There are some
restrictions, however, which are described in SAP Notes 638357 and 80727.
RFC Library
As of Web AS 6.10 RFC enhancements guarantee smooth data communication between Unicode and
non-Unicode systems and with non-SAP products as well. All necessary data conversions occur in the
Unicode system; therefore you do not need to make any changes to your existing non-Unicode
systems. When configuring destinations in the Unicode systems, you simply have to declare the RFC
destination as a non-Unicode system, and then the data will be converted with the appropriate code
pages and language keys of the non-Unicode destination system.
The RFC Library exists in a Unicode and a non-Unicode version. The Unicode RFC Library is
forward and backward compatible, i.e. a current Unicode RFC application can communicate with any
non-Unicode RFC application independently of its release, and vice versa.
The Unicode library can communicate with any RFC partner, regardless of whether the partner is Unicode
or non-Unicode. There are two cases:
1. Both RFC partners use a Unicode system (or both use a non-Unicode system). The data is sent
to the RFC server as it is, i.e. character-like data is not converted on the sender side. The
receiver converts the data into its own internal format. Note: This RFC connection works reliably
only if sender and receiver systems do not use different code pages, i.e. if they are both
Unicode systems (see Fig. 15) or they use the same non-Unicode code page (see Fig. 16)!
2. Only one RFC partner uses a Unicode system. In this case the Unicode system must convert the
data into a suitable ASCII-based data format before sending it. When the RFC converts text data
between Unicode and MDMP systems, it converts from/to the code page in which the text data are
encoded in the MDMP system. This code page depends on the text language, which is taken from
the language field of the table. If the table has no language field, the matching code page is
determined from the logon language. For example, if the logon language is Japanese, the Unicode
partner converts the character-like data into code page 8000 before sending it. This code page is
called the communication code page. For more information about RFC connections between
Unicode and non-Unicode systems, read the following SAP Notes:
547444 (RFC Enhancement for Unicode ./. MDMP Connections)
480671 (The Text Language Flag of LANG Fields)
722193 (RFC legacy MDMP callers and Unicode callees)
647495 (RFC for Unicode ./. MDMP Connections)
790485 (RFC Problem Single Code Page, non-Unicode to Unicode System)
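The rule for choosing the communication code page (language field of the table first, logon language as fallback) can be sketched as follows. This is not the RFC Library's implementation. Both mapping tables are hypothetical: code page 8000 for Japanese comes from the text above, code page 1100 for German and English is an assumption, and the Python codecs only approximate the SAP code pages.

```python
# Choosing a communication code page for an MDMP partner (illustrative sketch).
# Hypothetical mappings: language key -> SAP code page -> Python codec.
LANGUAGE_TO_CODE_PAGE = {"D": "1100", "E": "1100", "J": "8000"}
CODE_PAGE_TO_CODEC = {"1100": "latin-1", "8000": "shift_jis"}


def communication_code_page(text_language, logon_language):
    """Use the language field of the record if there is one,
    otherwise fall back to the logon language."""
    return LANGUAGE_TO_CODE_PAGE[text_language or logon_language]


def encode_for_mdmp(text, text_language, logon_language):
    """Return the chosen code page and the bytes sent to the MDMP partner."""
    code_page = communication_code_page(text_language, logon_language)
    return code_page, text.encode(CODE_PAGE_TO_CODEC[code_page])


print(encode_for_mdmp("日本語", None, "J"))  # no language field: logon language J
print(encode_for_mdmp("Straße", "D", "J"))  # language field D takes precedence
```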
For information about how to use the RFC Library see the RFC-Documentation on SAP Service
Marketplace. Go to service.sap.com/rfc-library. Select Media Library → RFC Library Guide.
From non-Unicode SAP System to Unicode
There are four ways to your Unicode SAP system:
1. New installation of a Unicode SAP system (as of mySAP ERP 2005/SAP NetWeaver 2004s all
new installations are Unicode systems)
2. Conversion of non-Unicode SAP system to Unicode (supported as of R/3 Enterprise 4.70 Ext.
2.00/Web AS 6.20)
3. Combined Upgrade & Unicode Conversion to target release SAP ERP 2005
• Source release SAP_BASIS 4.6C (status: unrestricted shipment)
• Source releases SAP_BASIS 6.20 and 6.40 (pilot project, see SAP Note 928729)
4. Twin Upgrade & Unicode Conversion
• Source releases lower than R/3 4.6C/SAP_BASIS 4.6C (pilot project, see SAP Note 959698)
• Source release Web AS 6.10 (pilot project, see SAP Note 959698)
A system conversion from MDMP to Unicode requires additional consideration and more preparation
and conversion steps than the conversion of a Single Code Page system. Make sure you deploy
Unicode-based mySAP components (listed in SAP Note 79991) and a database that supports Unicode (for
current information on supported databases, see section Database and Platform Support in this
document and SAP Note 379940). Also make sure you read the applicable documentation listed in
the Appendix.
Appendix
Documentation
New Installation of Unicode SAP Systems
If you have decided to install a new Unicode SAP system, you need the Installation Guide for your
database/platform combination, and SAP Note 544623.
Conversion of non-Unicode SAP to Unicode
If you have decided to convert existing SAP systems to Unicode, the following documentation is
required:
1. "Unicode Conversion Guide". Go to SAP Service Marketplace → Quick Link /unicode@sap.
Select Unicode Library → Unicode Conversion Library.
2. "Homogeneous or Heterogeneous System Copy for SAP Systems Based on SAP Web AS 6.xx".
Go to SAP Service Marketplace → Quick Link /instguides.
Combined Upgrade & Unicode Conversion
Download the Combined Upgrade & Unicode Conversion Guide 4.6C from SAP Note 928729.
Download the Component Upgrade Guide and the Installation Guide for your database/platform
combination from SAP Service Marketplace Quick Link /instguides.
Download the System Copy Guide from SAP Service Marketplace Quick Link /systemcopy.
Further Information
Visit the websites of the SAP Internationalization team for detailed technical information:
http://service.sap.com/i18n
http://service.sap.com/unicode@sap
Visit the website of the SAP Globalization team for detailed customer and project information:
www.service.sap.com/unicode
For general information about Unicode visit the public website of the Unicode Consortium:
http://www.unicode.org
Overview of pre-Unicode code pages:
http://czyborra.com/charsets
Contacts
For technical information: mailto:[email protected]
For information about Unicode Conversion project status: mailto:[email protected]