Oracle 10g New Features: Globalization Support


Oracle Database 10g has expanded the ability to globalize Oracle databases dramatically. As you can see by looking over the following list of globalization improvements, Oracle has attempted to make Database 10g the database of choice for all globally supported and utilized databases.

The new features for globalization support are:

Globalization Development Kit - The Oracle Globalization Development Kit (GDK) is a toolkit that simplifies the development process and reduces the cost of developing Internet applications that support a global environment. The Oracle Database 10g Release 1 version of the GDK includes comprehensive programming APIs (Java and PL/SQL), multilingual test data, code samples, and documentation that address many of the design, development, and deployment issues encountered while creating global applications.

The key component of the GDK is the Oracle Globalization Services (OGS). OGS is a set of Java and PL/SQL APIs that provide Oracle application developers with the framework to develop globalized Internet applications, using the best globalization practices and features designed by Oracle.

Enhanced Character Set Scanner and Converter - The Character Set Scanner has been enhanced to support the scanning of nested tables and character semantics objects. The database scan summary report now provides additional information on the source database, along with statistics on possible size expansion.

CLOB and NCLOB Implicit Conversions - This feature provides implicit conversion between CLOB and NCLOB datatypes. Global internet applications that support multiple national language character sets no longer require development and deployment of explicit function calls to achieve this conversion.

Unicode 3.2 Support - This feature provides support for the latest Unicode standard, Unicode 3.2, by adding new Unicode code points, character classifications, and mapping information to existing Unicode character sets.

Expanded Locale Coverage - This feature adds new territories and languages, and augments existing definition files with additional information.

Let's start with an overview of the globalization development kit (GDK) enhancements.

Globalization Development Kit Enhancements

For those of you who haven't had to tackle globalization before, the Oracle Globalization Development Kit (GDK) is a toolkit designed to simplify the development process and reduce the cost of developing Internet applications that support a global environment. The Oracle Database 10g Release 1 version of the GDK includes:

  • A set of comprehensive programming APIs (Java and PL/SQL)

  • Multilingual test data

  • Code samples

  • Documentation that addresses many of the design, development, and deployment issues encountered while creating global applications.

The key component of the GDK is the Oracle Globalization Services (OGS). OGS is the set of Java and PL/SQL APIs that provides Oracle application developers with the framework to develop globalized Internet applications, using the best globalization practices and features designed by Oracle.

Overview of the Globalization Development Kit and Its Components

A general lack of knowledge about the complexity of globalization concepts and APIs makes designing and developing a globalized application a difficult task for even the most experienced DBAs and developers. DBAs must understand the globalization support infrastructure at the conceptual level, and developers who work within an Oracle database to provide globalization services must understand it at the foundation level. The required developer knowledge extends to the properties of the different character sets, territories, languages, and linguistic sort definitions.

DBAs need to understand how the choice of the base character sets for their database will limit the choices for developers. Another critical developer skill is to be able to design and write code that is able to simultaneously support multiple clients running on differing operating systems with differing character sets and locale preferences.

The purpose of Oracle Database 10g's Globalization Development Kit (GDK) is to simplify the development process and thereby reduce the cost of developing internet applications that support a global environment. The GDK gives developers tools for every stage of globalizing an internet application: configuring the components in each tier, coding the application, designing and testing the globalization logic, and deploying the application.

The Oracle Database 10g Release 1 version of the GDK includes documentation, a comprehensive set of programming APIs (Java and PL/SQL), multilingual test data, and code samples. Together they address many of the design, development, and deployment issues encountered while creating global applications. Future releases of the GDK are also promised to include tools that assist in configuration testing and troubleshooting of globalization problems.

Overview of Designing a Global Internet Application

DBAs and developers alike must understand the two architectural models used to deploy a global web site or a global internet application. The right model depends on your globalization and business requirements, and the choice affects how the internet application is developed and how the application server is configured in the middle tier. Let's examine the two basic models.

Model 1: Multiple instances of monolingual internet applications

Monolingual internet applications support only one locale in a single binary. The term locale refers to a national language and the region in which the language is spoken. For example, the primary language of both the United States and Great Britain is English. However, the two territories have different currencies and different conventions for date formats, not to mention various usage differences. Therefore, the United States and Great Britain are considered to be two different locales. As the old saying goes, they are two countries divided by a common language.
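To make the idea of a locale concrete, here is a minimal sketch that uses only standard JDK classes (java.util.Locale, java.util.Currency, java.text.DateFormat), not the OGS APIs. The class name LocaleDifferences and its helper methods are illustrative assumptions. It shows that two locales sharing the English language still carry different currency and date conventions.

```java
import java.text.DateFormat;
import java.util.Currency;
import java.util.Date;
import java.util.Locale;

// Illustrative sketch with standard JDK classes only (not an OGS API).
final class LocaleDifferences {

    // Returns the ISO 4217 currency code used in the locale's country.
    static String currencyCode(Locale locale) {
        return Currency.getInstance(locale).getCurrencyCode();
    }

    // Formats a date with the locale's short date convention.
    static String shortDate(Locale locale, Date date) {
        return DateFormat.getDateInstance(DateFormat.SHORT, locale).format(date);
    }

    public static void main(String[] args) {
        Date now = new Date();
        for (Locale locale : new Locale[] {Locale.US, Locale.UK}) {
            System.out.println(locale
                    + ": language=" + locale.getLanguage()
                    + ", currency=" + currencyCode(locale)
                    + ", short date=" + shortDate(locale, now));
        }
    }
}
```

Both locales report the same language code, yet the currency code and the short date layout differ, which is why an application has to treat them as separate locales.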

Monolingual globalization support is suitable for customers who want to support one locale per instance of the application. This means that users need to have different entry points to access the applications for different locales. This model is only manageable if the number of supported locales is small.

Model 2: Single instance of a multilingual application

Multilingual internet applications support multiple locales simultaneously in a single binary. This level of globalization support is utilized by customers who need to support several locales in a single internet application simultaneously. In the multilingual model, users of different locale preferences use the same entry point to access the application.

As you can imagine, developing and supporting an application using the monolingual model is very different from developing an application using the multilingual model. The Oracle Globalization Services (OGS) consists of libraries that are used to assist in the development of global applications using either architectural model.

Getting Started with Oracle Globalization Services

We have already mentioned that Oracle Globalization Services (OGS) consists of a set of Java APIs that are used to develop globalized internet applications. In general, there is not a lot about the specific APIs that the DBA needs to know. However, a general knowledge of what they are and how they are used is important to an understanding of the support and development of global databases.

The functionalities offered by OGS can be divided into two distinct areas:

  • Oracle Services in OGS provide middle-tier development support, so that globalization operations on the middle tier behave consistently with those in the database server. They extend Oracle Globalization Support to the middle tier.

  • Internet Services in OGS provide the middle-tier globalization framework for internet applications. This set of APIs hides the complexity of synchronizing globalization operations across tiers.

Let's take a look at both sets of services, starting with the Oracle Services.

Oracle Services in OGS

Java's globalization functionalities and behaviors are not the same as those offered in Oracle. For example, data retrieved from a database is formatted using Oracle conventions (such as number and date formatting and linguistic sort ordering), but static application data is typically formatted using Java conventions. Java's globalization functionalities can also vary, depending on the version of the JDK that is used.

Before Oracle Database 10g, when an application was required to incorporate Oracle globalization features, it would make connections to the database server and issue SQL statements to retrieve the required settings. The select operations needed to support locale definition make applications complicated and generate more network connections to the database server.

The DBA has to be aware that the OGS Java APIs are certified with JDK versions 1.2 and above, with one exception: the character set conversion classes depend on the java.nio.charset package, which is available only in JDK 1.4 and later. Therefore, if character set conversion is needed, make sure during installation that JDK 1.4 or later is available.

OGS extends Oracle's database globalization features to the application server. This allows applications to perform globalization logic, such as Oracle date and number formatting on the middle tier. OGS developers can eliminate expensive programming logic from the database, improving the overall application performance by reducing unnecessary network traffic between the application tier and the database server. This also helps the DBA by making the database operate more effectively and eliminating traffic that could mask other application issues.

The main functionalities of the OGS Oracle Services are:

  • Oracle Locale Mapping and Locale Information

  • Character Set Conversion (JDK 1.4 and Later)

  • Oracle Date, Number, and Monetary Formats

  • Oracle Binary and Linguistic Sorts

Let's look into each of these areas in more detail.

Oracle Locale Mapping and Locale Information in OGS

Languages, territories, linguistic sorts, and character sets are a part of the proprietary Oracle locale definitions. The naming convention that Oracle uses can also be different from other vendors. Although Oracle tries to use industry standards, some are Oracle-specific; this is because they were tailored to meet special customer requirements.

OGS APIs are used to map equivalent locales between Java, IANA, and Oracle. Using these APIs, a Java application can accept a locale specification from the client that is expressed as either an Oracle locale or an IANA locale. If the Java application can't map the Oracle or IANA locale to an equivalent Java locale, it won't know how to interpret the incoming characters from the client and will present the information incorrectly.

The OraLocaleInfo class is the Oracle locale class that includes language, territory, and collator objects. OraLocaleInfo provides a way for Java applications to retrieve a collection of locale-sensitive objects for a given locale. For example, these objects might be a full list of the Oracle linguistic sorts available in OGS, the local time zones defined for a given territory, or the common languages used in a particular territory.
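The mapping problem can be pictured with a small sketch in plain Java. This is not the OGS or OraLocaleInfo API: the class LocaleMappingSketch, its methods, and its four-entry lookup table are assumptions made for illustration, with Oracle locale names written as LANGUAGE_TERRITORY. Only java.util.Locale.forLanguageTag is a real JDK call.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical mapping table; a real application would rely on OGS instead.
final class LocaleMappingSketch {

    // Oracle LANGUAGE_TERRITORY name -> IANA language tag (illustrative subset).
    private static final Map<String, String> ORACLE_TO_IANA = new HashMap<>();
    static {
        ORACLE_TO_IANA.put("AMERICAN_AMERICA", "en-US");
        ORACLE_TO_IANA.put("ENGLISH_UNITED KINGDOM", "en-GB");
        ORACLE_TO_IANA.put("GERMAN_GERMANY", "de-DE");
        ORACLE_TO_IANA.put("JAPANESE_JAPAN", "ja-JP");
    }

    // Maps an IANA language tag such as "de-DE" to a Java locale.
    static Locale fromIana(String tag) {
        return Locale.forLanguageTag(tag);
    }

    // Maps an Oracle locale name to a Java locale, or empty if it is unknown.
    static Optional<Locale> fromOracle(String oracleName) {
        String tag = ORACLE_TO_IANA.get(oracleName.toUpperCase(Locale.ROOT));
        return Optional.ofNullable(tag).map(Locale::forLanguageTag);
    }

    public static void main(String[] args) {
        System.out.println(fromOracle("GERMAN_GERMANY"));   // a known name
        System.out.println(fromOracle("KLINGON_QONOS"));    // an unknown name
        System.out.println(fromIana("ja-JP"));
    }
}
```

The empty result for an unknown name corresponds to the failure case described above: without an equivalent Java locale, the application cannot interpret the client's data reliably.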

Character Set Conversion (JDK 1.4 and Later) in OGS

In addition to locale conversion capabilities, OGS contains a set of APIs that allow users to perform various Oracle character set conversions. This set of APIs provides for the Oracle-specific sets of characters that Java cannot handle with its own classes.

JDK 1.4 (J2SE 1.4) introduced an interface that lets developers extend Java's character sets. OGS was designed to utilize this feature to provide implicit support for Oracle's character sets. You can use the J2SE APIs to get Oracle-specific behaviors, or you can directly access the needed character set conversion APIs in OGS.

DBAs should note that the java.nio.charset package is not available in JDK versions before 1.4. Therefore, you must install at least JDK 1.4 to use Oracle's character set plug-in feature.

Oracle's character set names are proprietary. To protect against potential conflicts with Java's own character sets, all Oracle character set names have been given an X-ORACLE- prefix for all implicit usage through Java's API.

If JDBC is used, the JDBC driver provides the necessary character set conversion between the application server and the database; so APIs that explicitly call OGS character set conversions are not required.
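For readers who have not used java.nio.charset, the following sketch shows the kind of conversion the package performs. It uses only character sets that every JDK ships (ISO-8859-1 and UTF-8); the class and method names are illustrative assumptions, and no Oracle-specific X-ORACLE- character set is used because the exact names depend on the OGS installation.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch of explicit character set conversion with the JDK.
final class CharsetConversionSketch {

    // Decodes the bytes from one character set and re-encodes them in another.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        CharBuffer chars = from.decode(ByteBuffer.wrap(input));
        ByteBuffer encoded = to.encode(chars);
        byte[] out = new byte[encoded.remaining()];
        encoded.get(out);
        return out;
    }

    public static void main(String[] args) {
        // "café": the accented letter is one byte in ISO-8859-1, two in UTF-8.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(latin1.length + " bytes -> " + utf8.length + " bytes");
    }
}
```

A character set added through the JDK 1.4 provider interface would be looked up by name with Charset.forName and then used in exactly the same way.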

Oracle Date, Number, and Monetary Formats in OGS

Another function of OGS is to provide formatting classes that support date, number, and monetary formats that use Oracle conventions for use in Java applications. These OGS APIs also provide for the additional locale formats and information introduced in Oracle Database 10g, such as the long date, number, and monetary formats for a given locale.
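To see why locale-aware formatting classes matter, consider this small sketch of the Java side of the problem. It uses the standard java.text.NumberFormat class rather than the OGS formatting classes, and the class name NumberFormatSketch is an assumption for illustration. The same number is rendered differently depending on the locale.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Illustrative sketch using the JDK's own formatting conventions (not OGS).
final class NumberFormatSketch {

    // Formats a number with the grouping and decimal separators of the locale.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(formatNumber(1234.5, Locale.US));       // 1,234.5
        System.out.println(formatNumber(1234.5, Locale.GERMANY));  // 1.234,5
    }
}
```

The OGS formatting classes play the same role, but follow Oracle's conventions so that values formatted on the middle tier match values formatted in the database.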

Oracle Binary and Linguistic Sorts in OGS

Oracle internally provides support for binary, monolingual, and multilingual linguistic sorts in the database. In Oracle Database 10g, these linguistic sorts have been expanded to provide case-insensitive and accent-insensitive sorting and searching capabilities inside the database. By using the OGS OraCollator class, Java applications can sort and search for information, based on the latest Oracle binary and linguistic sorting capabilities.

Normalization (decomposition of characters) can be an important part of sorting. The Unicode standard is the basis for the composition and decomposition of characters, so sorting, which uses normalization, depends on the Unicode standard. OGS provides an OraNormalizer class based on the Unicode 3.2 standard. The OraNormalizer class contains methods to do composition and decomposition.
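The two ideas above, insensitive comparison and normalization, can be tried out with the JDK's own classes. The sketch below uses java.text.Collator and java.text.Normalizer as stand-ins; it does not use OraCollator or OraNormalizer, whose method signatures are not covered here, and the JDK classes follow the Unicode version of the installed JDK rather than Unicode 3.2. The class name SortAndNormalizeSketch is an illustrative assumption.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Locale;

// Illustrative JDK-only sketch of insensitive comparison and normalization.
final class SortAndNormalizeSketch {

    // PRIMARY strength ignores both case and accent differences.
    static Collator insensitiveCollator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator;
    }

    // Canonical decomposition: a precomposed character becomes base + mark.
    static String decompose(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFD);
    }

    // Canonical composition: base + combining mark becomes one character.
    static String compose(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        Collator collator = insensitiveCollator(Locale.ENGLISH);
        // 0 means the strings are treated as equal.
        System.out.println(collator.compare("resume", "R\u00c9SUM\u00c9"));
        System.out.println(decompose("\u00e9").length()); // 2 code units
        System.out.println(compose("e\u0301").length());  // 1 code unit
    }
}
```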

The following four monolingual linguistic sorts are not supported in OGS:

  • Thai Telephone

  • Thai Dictionary

  • Japanese

  • Canadian French

To obtain similar sorting results, you can use their multilingual counterparts, which are:

  • THAI_M

  • JAPANESE_M

  • CANADIAN_M

Next, let's look at Internet Services.

Internet Services in OGS

Internet Services in OGS provide the globalization framework for middle-tier Java applications. This OGS framework is designed to minimize the effort required to build new applications or to migrate monolingual internet applications into globalized applications.

OGS Internet Services requires Servlet container version 2.3 or later.

This section discusses the following topics:

  • Locale Sources in OGS

  • Locale Availability in OGS

  • Determining Locale in OGS

  • Determining the Locale Source in OGS

  • Locale Verification in OGS

  • Locale Caching in OGS

  • Overwriting Locale in OGS

  • Character Set Handling in OGS

  • Rewriting URLs to Access Local Contents in OGS

Locale Sources in OGS

Oracle Database 10g OGS provides new APIs to detect user predefined locale sources that are accessible by an internet application. The locale detection in Java is very primitive; it does not support user-selected locale preferences or locale preferences from user profiles. Also, the only locale fallback mechanism in Java is based on the locale preferences stored in the Accept-Language header. For your more advanced applications, these primitive locale detection features in Java will not be adequate.

Oracle Database 10g OGS provides support for predefined locale sources such as:

  • User input locale

  • HTTP language preference

  • The application default locale

The OGS framework allows for the addition of custom locale sources, such as user profiles from the database or the LDAP server. However, because the user profile schema definitions vary across applications, distinct database access objects must be created to support them.

Locale Availability in OGS

The number of locales or languages that an application needs to support depends on the target user base. The Oracle Database 10g OGS Internet Services return the following locale availability data to the application:

  • A list of the locales that are supported for a given application. This information is defined in the OGS application configuration file.

  • A list of the Oracle languages, Oracle territories, Oracle linguistic sorts, and Oracle character sets based on the locales supported in the application.

  • The user locale of the current request

  • A list of the commonly used languages in a given territory based on the current user locale

  • A list of the common territories that support a given language based on the current user locale

  • A list of the commonly used character sets based on the language defined in the current user locale

  • A list of the local time zones for a given territory defined in the current user locale

Using Java and OGS, an application can use this locale information to generate a menu from which users select their preferred locale. The locale entries in the selection list contain only the locales that are supported by the application, based on the values returned by the OGS API. This provides the added benefit that supported application locales can be added or removed without changing the application code, because they are defined in the configuration file and not in the application code itself.
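As a rough illustration of such a menu, the sketch below builds the entries from a list of supported locales. In a real application that list would come from the OGS API and the configuration file; here it is passed in as a plain list of language tags, and the class LocaleMenuSketch and its method are hypothetical. Only standard JDK calls are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical menu builder; the supported locales are supplied by the caller.
final class LocaleMenuSketch {

    // Builds one menu entry per supported locale, labeled in the display language.
    static List<String> menuEntries(List<String> supportedTags, Locale displayIn) {
        List<String> entries = new ArrayList<>();
        for (String tag : supportedTags) {
            Locale locale = Locale.forLanguageTag(tag);
            entries.add(tag + ": " + locale.getDisplayName(displayIn));
        }
        return entries;
    }

    public static void main(String[] args) {
        // Assumed contents of the supported-locale list.
        List<String> supported = Arrays.asList("en-US", "de-DE", "ja-JP");
        for (String entry : menuEntries(supported, Locale.ENGLISH)) {
            System.out.println(entry);
        }
    }
}
```

Because the menu is driven entirely by the list, adding a locale to the configuration changes the menu without touching this code.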

Determining Locale in OGS

Oracle Database 10g OGS offers automatic locale detection to determine the current locale of the user. OGS provides the Localizer class, an all-in-one globalization object that provides commonly used globalization information for your page. If you require more functionality than the basic request.getLocale() method offers, the Localizer class provided by OGS is the better choice.

Determining the Locale Source in OGS

OGS determines the user locale based on the locale determination rules, defined by either the DBA or the developer in the OGS application configuration file.

For example:

  • When the LDAP user has been authenticated, and the LDAP locale preference is defined, then it is used as the preferred locale of the current user.

  • When the LDAP server is not available or the user has signed off from the server, then the preferred locale of the current user is defined according to the locale information stored in the HTTP Accept-Language header.

  • When neither the LDAP nor the HTTP locale preference is available, the application default locale defined in the application configuration file is used as the current user locale.

In the first scenario listed above, the locale data is stored in a local cookie, so that the subsequent web pages can use the cookie without having to access the LDAP server. In the second scenario, the locale is not saved in the cookie, but the HTTP language preference in the Accept-Language header is referenced on each request.

When the HTTP Accept-Language header is the locale source, OGS honors the locale order defined by the q parameter.

OGS does not depend on the default locale of the Java VM in which the application executes, because the application default locale defined in the OGS application configuration file catches all fallback operations.
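The example rules above amount to a chain of locale sources, which can be sketched in plain Java as follows. This is not how OGS implements it: the class LocaleSourceSketch, the resolve method, and the idea of passing the LDAP preference in as a string are all assumptions for illustration. The q-value ordering is delegated to the JDK's Locale.LanguageRange.parse (Java 8 and later), which returns the ranges in descending order of weight.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical locale source chain: profile, then Accept-Language, then default.
final class LocaleSourceSketch {

    static Locale resolve(String profileTag, String acceptLanguage, Locale appDefault) {
        // 1. A stored user preference (for example, from LDAP) wins if present.
        if (profileTag != null && !profileTag.isEmpty()) {
            return Locale.forLanguageTag(profileTag);
        }
        // 2. Otherwise take the highest-weighted entry of the Accept-Language header.
        if (acceptLanguage != null && !acceptLanguage.trim().isEmpty()) {
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
            for (Locale.LanguageRange range : ranges) {
                if (!"*".equals(range.getRange())) {
                    return Locale.forLanguageTag(range.getRange());
                }
            }
        }
        // 3. Fall back to the application default, never to the JVM default.
        return appDefault;
    }

    public static void main(String[] args) {
        System.out.println(resolve("ja-JP", "de;q=0.8, en-US", Locale.FRANCE));
        System.out.println(resolve(null, "de;q=0.8, en-US, fr;q=0.5", Locale.FRANCE));
        System.out.println(resolve(null, null, Locale.FRANCE));
    }
}
```

Note that the final fallback is the application default passed in by the caller; Locale.getDefault() is never consulted, which mirrors the point made above about the Java VM default.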

Locale Verification in OGS

In OGS, each locale source can be further verified to determine whether the returned locale is supported by the application. This verification is done by determining that the returned locale is defined as one of the supported locales in the OGS application configuration file. The verification methodology is:

  1. Get the list of the supported application locales from the application configuration file.

  2. Verify that the returned user preferred locale is one of the supported application locales. If it is, use this locale as the current user locale.

  3. If the returned user preferred locale includes a variant, then remove the variant and return to step 2. For example, de_DE_EURO becomes de_DE.

  4. If the returned user preferred locale includes a country code, then remove it and return to step 2. For example, de_DE becomes de.

  5. If the returned user preferred locale does not match any of the locales defined in the application configuration file, then return the default application locale defined in the application configuration file as the current user locale.
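The verification methodology above can be sketched in a few lines of Java. This is an illustration of the described fallback, not the OGS implementation: the class LocaleVerificationSketch and its verify method are hypothetical, and locales are handled as underscore-separated strings such as de_DE_EURO to match the examples in the text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the verification fallback described in the text.
final class LocaleVerificationSketch {

    static String verify(String requested, Set<String> supported, String appDefault) {
        String candidate = requested;
        while (candidate != null && !candidate.isEmpty()) {
            // Use the candidate if the application supports it.
            if (supported.contains(candidate)) {
                return candidate;
            }
            // Otherwise strip the last component: variant first, then country.
            int cut = candidate.lastIndexOf('_');
            if (cut < 0) {
                break; // only the language was left, and it is unsupported
            }
            candidate = candidate.substring(0, cut);
        }
        // Nothing matched: fall back to the default application locale.
        return appDefault;
    }

    public static void main(String[] args) {
        // Assumed supported locales, as they might appear in the configuration file.
        Set<String> supported = new HashSet<>(Arrays.asList("de", "en_US", "fr_FR"));
        System.out.println(verify("de_DE_EURO", supported, "en_US")); // de
        System.out.println(verify("fr_FR", supported, "en_US"));      // fr_FR
        System.out.println(verify("ja_JP", supported, "en_US"));      // en_US
    }
}
```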

If you exclude the list of supported application locales from the application configuration file, then OGS assumes that all locales are supported by the application, resulting in no default application locale being defined for the application.

Locale Caching in OGS

Once the current user locale has been defined, OGS will cache the locale information, so that the application doesn't need to redetermine and reverify the user locale with each request.

When the preferred user locale is obtained from the LDAP server or a user profile table in the database, a local cookie is used to cache the locale information.

In most cases, the locale preference is static throughout the user session. Storing it in the local cookie therefore avoids retrieving the same piece of information from the server over and over again.

When the locale source is obtained from user input or the HTTP locale preference, then this locale information is passed to the application as a parameter and not cached as a cookie. This is because the locale preference may change between requests.

Overwriting Locale in OGS

OGS also provides a method for the application to overwrite the locale preference information stored in either the LDAP server or in the user profile table in the database. This will also reset the current locale information stored inside the local cookie for the current session.

This operation is ignored when the user input locale or the HTTP locale preference is used, because these locale sources are read-only.

Character Set Handling in OGS

Oracle Database 10g OGS supports these scenarios for setting the character sets of the application HTML pages:

  • A single local character set is dedicated to the application. Single local character sets are appropriate only for a monolingual internet application.

  • Use the native character set for each language. For example, English contents are represented in ISO-8859-1 and Japanese contents are represented in Shift_JIS. This would be appropriate for multilingual internet applications that use the default character set mapping for each locale.

  • Use Unicode UTF-8 for all contents regardless of the language. This is appropriate for multilingual applications that use Unicode for deployment.

OGS does not support the scenario where the character set of the incoming content differs from that of the outgoing content.

The character set information is specified in the OGS application configuration file. This configuration information is used by ServletRequestWrapper and ServletResponseWrapper classes, which set the proper character set for the request object. It is also used by the ContentType class for output when instantiated.

When the page-charset is set to auto-charset, the character set of the incoming content is taken to be the default character set of the current user locale. The default character set is derived from the locale-to-character-set mapping table specified in the application configuration file. When that mapping table is not available in the application configuration file, the character set is taken from the table in OGS that maps the default locale name to the IANA character set. These default character set mappings are derived from the OraLocaleInfo class.
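The lookup just described can be pictured as a simple table with a fallback. The sketch below is an assumption-laden illustration, not the OGS mechanism: the class PageCharsetSketch, the charsetFor method, and the two table entries (taken from the ISO-8859-1 and Shift_JIS examples earlier in this section) are all hypothetical, and the fallback character set is supplied by the caller.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical locale-to-character-set lookup with a caller-supplied fallback.
final class PageCharsetSketch {

    // Language code -> IANA character set name (illustrative entries only).
    private static final Map<String, String> LANGUAGE_TO_CHARSET = new HashMap<>();
    static {
        LANGUAGE_TO_CHARSET.put("en", "ISO-8859-1");
        LANGUAGE_TO_CHARSET.put("ja", "Shift_JIS");
    }

    // Returns the page character set for a locale such as "ja_JP".
    static String charsetFor(String locale, String fallback) {
        String language = locale.split("_")[0];
        String charset = LANGUAGE_TO_CHARSET.get(language);
        return charset != null ? charset : fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetFor("en_US", "UTF-8"));
        System.out.println(charsetFor("ja_JP", "UTF-8"));
        System.out.println(charsetFor("de_DE", "UTF-8")); // not in the table
    }
}
```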

Rewriting URLs to Access Local Contents in OGS

A monolingual application, by definition, supports only a single locale, so a URL ending in /index.html takes the user to the starting page of the application by default. However, in a multilingual application, contents in different languages are stored separately, and it is not uncommon for them to be staged in different directories, based on the language or the country name. This language or location information is then used during retrieval by specifying it in the URL.

OGS removes the need for the developer to hardcode paths and locate translated files by using the ServletHelper class. The ServletHelper class rewrites the URLs based on the current user locale as determined by OGS APIs.
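A minimal sketch of the idea, assuming translated pages are staged in one directory per language code (for example /de/index.html), might look like this. The class LocalizedUrlSketch and its localize method are hypothetical and are not the ServletHelper API; the directory layout is likewise an assumption.

```java
import java.util.Locale;

// Hypothetical URL rewriting that assumes one content directory per language.
final class LocalizedUrlSketch {

    // Prefixes the path with the language code of the current user locale.
    static String localize(String path, Locale locale) {
        String normalized = path.startsWith("/") ? path : "/" + path;
        String language = locale.getLanguage();
        return language.isEmpty() ? normalized : "/" + language + normalized;
    }

    public static void main(String[] args) {
        System.out.println(localize("/index.html", Locale.GERMANY)); // /de/index.html
        System.out.println(localize("index.html", Locale.JAPAN));    // /ja/index.html
    }
}
```

Centralizing the rewrite in one helper is what lets page code refer to /index.html everywhere while the locale decides which translated file is served.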

OGS Application Configuration File

The OGS application configuration file is the heart of the OGS APIs. It dictates the behavior and the properties of an application because it contains the locale mapping tables and the parameters for configuring the application. In OGS, one configuration file is required for each individual application. The file, called ogsapp.xml, is an XML document that resides in the ./WEB-INF directory of the application's J2EE environment. This file should be maintained by the DBA, based on requirements specified by the developers during the application development process. Additions of locales and other language-specific entries should also be done by the DBA during the application lifetime.



The above is an excerpt from the bestselling Oracle10g book Oracle Database 10g New Features by Mike Ault, Madhu Tumma and Daniel Liu, published by Rampant TechPress.