Report
The Construction of Online Databases Related to Asian and African Area Studies
SONG Xianfeng (COE Researchers)

Objectives
          The aim of area studies is to attain a holistic understanding of an area by integrating ecological, socio-cultural and social scientific viewpoints. Advancing area studies requires integrating area-specific information in various formats, including topographical maps, satellite images, aerial photographs, GPS photos, and so on. Long-term research in Asian and African Area Studies has accumulated large collections of such geographic materials. To make efficient and effective use of these valuable research materials for integrated area studies, we have been constructing an online metadata database system that promotes data discovery and data sharing.
          The online metadata database is a highly scalable infrastructure built on open source solutions and metadata standards. On the one hand, the system can easily be extended with new databases; on the other hand, it can be accessed by any Z39.50-compatible system without changing the underlying data structure.

Open Source Solution
          In the geo-informatics community, software for processing geographic information has improved considerably in recent years. Free GIS projects in particular, which aim to promote freedom in the field of GIS, have made great progress. Besides a number of desktop GIS packages, open source alternatives to commercial software are also available for web mapping and metadata services, so many proprietary applications now have free counterparts. Table 1 lists open source software packages that are competitive with vendor products. The list is by no means exhaustive (more can be found at the FreeGIS web site), and the inclusion or exclusion of a product does not in any way reflect on its merit.
          Whether or not to use these open source packages depends on careful analysis of each package’s functions and of the actual project requirements. Nevertheless, they are worth exploring and experimenting with, principally because of the cost savings. One argument in the cost-saving debate is that open source software simply has a better “total cost of operation,” which is often realized over the medium term.

Table 1. Open Source Alternatives
Category                | Open Source Software | Vendor Products
Lightweight GIS         | OpenEV/OpenMap       | ArcView
Heavyweight GIS         | GRASS                | ArcInfo
Spatial Data Engine     | PostGIS              | ArcSDE
Spatial Database        | PostgreSQL           | Oracle/IBM DB2/Informix
Web Mapping             | UMN Mapserver        | ArcIMS
Metadata Tools (Z39.50) | Isite, ZETA Perl     | OCLC SiteSearch, Finsiel ZETA

          In our system, we made use of several of the open source packages in Table 1. PostgreSQL/PostGIS is used to store and manage the Dublin Core metadata of our geographic datasets. On top of this common metadata database platform, datasets can be discovered in two ways: by visualizing the geographic collections on a global map (UMN Mapserver), or by searching for and retrieving geographic datasets through a Z39.50-compliant service (Isite and ZETA Perl).

Methodology

Data Catalogue Using the Dublin Core Metadata
          The geographic information used for area studies consists of heterogeneous datasets. Metadata is a suitable paradigm for managing such multimedia information. Among metadata standards, Dublin Core is popular thanks to its simplicity and extensibility. It is a possible alternative to the Content Standard for Digital Geospatial Metadata (CSDGM), a complex standard developed by the US Federal Geographic Data Committee (FGDC) primarily for geographic collections. We chose Dublin Core to standardize the descriptions of our geographic dataset collections and build a unified data catalogue.
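          Standardizing descriptions amounts to mapping each collection's own fields onto Dublin Core elements. The Python below is a minimal sketch of such a mapping and is not the system's actual code. The only thing taken from a standard is the list of fifteen element names from the Dublin Core Metadata Element Set. The crosswalk for a topographic map sheet, its source field names and the sample values are all hypothetical.

```python
# Minimal crosswalk sketch: map a collection-specific record onto Dublin Core.
# The source field names below are hypothetical, not the authors' schema.

DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical crosswalk for a topographic map sheet inventory.
TOPO_SHEET_CROSSWALK = {
    "sheet_name": "title",
    "surveyor": "creator",
    "issuing_agency": "publisher",
    "survey_year": "date",
    "scale_note": "description",
    "sheet_id": "identifier",
    "extent": "coverage",
}


def to_dublin_core(record, crosswalk):
    """Return {DC element: [values]}; fields without a DC target are dropped."""
    dc = {}
    for field, value in record.items():
        element = crosswalk.get(field)
        if element is None or value in (None, ""):
            continue
        if element not in DC_ELEMENTS:
            raise ValueError(f"{element!r} is not a Dublin Core element")
        dc.setdefault(element, []).append(str(value))
    return dc


if __name__ == "__main__":
    sheet = {
        "sheet_name": "Example Sheet NE-47",      # hypothetical values
        "surveyor": "Example Survey Department",
        "survey_year": 1975,
        "sheet_id": "NE-47-01",
        "extent": "98.0 18.0 99.5 19.0",
        "paper_colour": "white",                  # no DC target: dropped
    }
    print(to_dublin_core(sheet, TOPO_SHEET_CROSSWALK))
```

Each new type of collection would get its own crosswalk dictionary. Adding a collection then means adding a mapping rather than changing the catalogue structure.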
          In our case, the metadata is stored in a relational database (PostgreSQL/PostGIS) rather than in XML files. Tag-structured metadata files are redundant and incur time costs during tag parsing. By contrast, a database is more efficient and effective for maintaining a huge volume of metadata records. When needed, the metadata records can easily be exported from the database into a tagged representation such as XML, RDF or HTML <meta> tags.
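          Storing records relationally and exporting them on demand can be sketched as follows. This is a minimal sketch, not the system's code. It assumes Python's built-in sqlite3 as a stand-in for PostgreSQL/PostGIS so that it runs without a database server. The one-row-per-element table `dc_value` is a hypothetical layout. The export uses the Dublin Core element namespace http://purl.org/dc/elements/1.1/ and the DCMI convention of naming HTML meta tags `DC.<element>`.

```python
# Sketch: store Dublin Core records relationally, export as XML or HTML <meta>.
# sqlite3 stands in for PostgreSQL/PostGIS; the dc_value table is hypothetical.
import html
import sqlite3
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def create_schema(conn):
    # One row per (record, element, value): DC elements are optional and repeatable.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dc_value ("
        " record_id TEXT NOT NULL,"
        " element TEXT NOT NULL,"
        " value TEXT NOT NULL)"
    )


def insert_record(conn, record_id, dc):
    rows = [(record_id, el, v) for el, values in dc.items() for v in values]
    conn.executemany("INSERT INTO dc_value VALUES (?, ?, ?)", rows)


def fetch_record(conn, record_id):
    dc = {}
    cur = conn.execute(
        "SELECT element, value FROM dc_value WHERE record_id = ? ORDER BY rowid",
        (record_id,),
    )
    for element, value in cur:
        dc.setdefault(element, []).append(value)
    return dc


def to_xml(record_id, dc):
    # Serialize one record as <record> with dc:-prefixed child elements.
    root = ET.Element("record", {"id": record_id})
    for element, values in dc.items():
        for v in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = v
    return ET.tostring(root, encoding="unicode")


def to_html_meta(dc):
    # DCMI convention for embedding Dublin Core in HTML: name="DC.<element>".
    return "\n".join(
        f'<meta name="DC.{el}" content="{html.escape(v, quote=True)}">'
        for el, values in dc.items()
        for v in values
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    insert_record(conn, "topo-001", {"title": ["Example sheet"], "date": ["1975"]})
    rec = fetch_record(conn, "topo-001")
    print(to_xml("topo-001", rec))
    print(to_html_meta(rec))
```

The tagged forms are generated only at export time, so no tag parsing is needed when querying.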

Geographic Dataset Visualization and Discovery Using a Map
          All geographic information is associated with a position in a global coordinate reference system, such as latitude and longitude, and perhaps altitude in a 3D context. Linking these geographic collections to a detailed global map by their geo-location should help users identify the exact geographic coverage of each collection and discover other datasets related to the same area of interest. Locating geographic collections on a map in this way overcomes the uncertainties and inadequacies of text descriptions.
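          At its simplest, map-based discovery reduces to an overlap test between each collection's coverage and the user's area of interest. The following minimal sketch assumes that coverage is stored as a longitude/latitude bounding box. It does not handle boxes that cross the antimeridian, and the collection identifiers are hypothetical. In PostGIS, the same test is expressed with the `&&` bounding-box overlap operator.

```python
# Sketch: find collections whose geographic coverage overlaps an area of interest.
# Coverage is simplified to a lon/lat bounding box; antimeridian-crossing boxes
# are not handled. Collection identifiers are hypothetical.
from typing import NamedTuple


class BBox(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (
            self.max_lon < other.min_lon
            or other.max_lon < self.min_lon
            or self.max_lat < other.min_lat
            or other.max_lat < self.min_lat
        )


def find_collections(collections, area):
    """Return the ids of collections whose coverage box overlaps `area`."""
    return [cid for cid, box in collections.items() if box.intersects(area)]


if __name__ == "__main__":
    collections = {
        "landsat-tm-scene-a": BBox(98.0, 17.5, 100.0, 19.5),
        "topo-sheet-b": BBox(36.0, -2.0, 37.5, -1.0),
    }
    area_of_interest = BBox(99.0, 18.0, 101.0, 20.0)
    print(find_collections(collections, area_of_interest))  # ['landsat-tm-scene-a']
```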
          The UMN Mapserver is used for online dynamic mapping. It provides users with an index map drawn from two main data sources: the Vector Map Level 0 (formerly known as the Digital Chart of the World) and the PostgreSQL database in which the Dublin Core metadata of the geographic collections is stored. A series of indices can be produced, e.g. topographic map indices, Landsat MSS path/row maps, Landsat TM path/row maps, and GPS photo distributions.
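          A client can ask MapServer's CGI program to render a particular index over a given extent. The sketch below builds such a request URL. The parameters `map`, `mode`, `mapext`, `mapsize` and `layers` are standard MapServer CGI parameters. The server address, mapfile path and layer names are hypothetical placeholders, not the configuration of this system.

```python
# Sketch: build a MapServer CGI request for an index map over a given extent.
# Server URL, mapfile path and layer names are hypothetical; map, mode, mapext,
# mapsize and layers are MapServer CGI parameters.
from urllib.parse import quote, urlencode

MAPSERV_URL = "http://example.org/cgi-bin/mapserv"   # hypothetical endpoint
MAPFILE = "/srv/maps/area_index.map"                  # hypothetical mapfile


def index_map_url(extent, layers, size=(800, 600)):
    """extent = (min_lon, min_lat, max_lon, max_lat); layers = names in the mapfile."""
    params = {
        "map": MAPFILE,
        "mode": "map",                               # return the rendered image
        "mapext": " ".join(str(v) for v in extent),  # space-separated extent
        "mapsize": f"{size[0]} {size[1]}",           # image width and height
        "layers": " ".join(layers),                  # layers to switch on
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return MAPSERV_URL + "?" + urlencode(params, safe="/", quote_via=quote)


if __name__ == "__main__":
    print(index_map_url((95.0, 5.0, 110.0, 22.0),
                        ["vmap0_boundaries", "topo_sheet_index", "landsat_tm_pathrow"]))
```

Each index (topographic sheets, Landsat path/row, GPS photos) would be a separate layer in the mapfile. Switching layers on or off then only changes the `layers` parameter.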

Distributed Search and Retrieval Using the Z39.50 Protocol
          The metadata stored in the PostgreSQL database can be queried directly over the Internet using CGI/PHP scripts. However, it is more useful if the database is also accessible through a standard search protocol such as Z39.50, which was originally designed for searching distributed library catalogues. We therefore built a Z39.50-compliant server for our metadata databases. This makes it possible to search our metadata database alongside other similar catalogues. Support for the Bib-1, GEO, GILS and DC profiles increases the system's interoperability. To share our datasets widely, we linked our Z39.50-compliant server to an existing gateway, namely the geographic clearinghouse of the Geographical Survey Institute (GSI), Japan.
          We used the Isite package to expose our database as a Z39.50 service. The key to forwarding Z39.50 queries to the SQL database is a middle tier we wrote. It is a translation script that converts a Z39.50 query into the corresponding SQL statement, whose exact form depends on the database structure. The script is plugged into the Isite Zserver, which parses each query expression and returns the search results in the presentation format requested by the client.
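          The translation step can be illustrated with a minimal Python sketch. It is not the middle tier described above and rests on several assumptions. Queries are assumed to arrive in the Prefix Query Format (PQF) used by the YAZ toolkit. Only Bib-1 use attributes (type 1) are honoured. Records are assumed to live in a hypothetical flat table `dc_record(id, title, creator, subject, date, description)`. The Bib-1 values used (4 Title, 21 Subject heading, 31 Date of publication, 62 Abstract, 1003 Author, 1016 Any) are standard.

```python
# Sketch of a Z39.50-to-SQL translation step. Assumptions (not the original
# middle tier): queries arrive as YAZ-style PQF strings, only Bib-1 use
# attributes (type 1) are honoured, and records live in a hypothetical flat
# table dc_record(id, title, creator, subject, date, description).
import shlex
import sqlite3

# Bib-1 use attribute -> column of the hypothetical dc_record table.
BIB1_USE_TO_COLUMN = {
    4: "title",          # Title
    21: "subject",       # Subject heading
    31: "date",          # Date of publication
    62: "description",   # Abstract
    1003: "creator",     # Author
}
ANY_USE = 1016  # Bib-1 "Any": search several text columns
ANY_COLUMNS = ("title", "creator", "subject", "description")


def translate(pqf):
    """Translate a PQF query into (where_clause, params) for a DB-API cursor."""
    tokens = shlex.split(pqf)
    if tokens[:1] == ["@attrset"]:
        tokens = tokens[2:]  # only Bib-1 is supported; ignore the set name
    where, params, pos = _parse(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected trailing tokens: {tokens[pos:]}")
    return where, params


def _parse(tokens, pos):
    if pos >= len(tokens):
        raise ValueError("incomplete query")
    op = tokens[pos]
    if op in ("@and", "@or", "@not"):
        left, lp, pos = _parse(tokens, pos + 1)
        right, rp, pos = _parse(tokens, pos)
        if op == "@not":  # PQF @not is binary: left AND NOT right
            return f"({left} AND NOT ({right}))", lp + rp, pos
        return f"({left} {op[1:].upper()} {right})", lp + rp, pos
    use = ANY_USE
    while pos < len(tokens) and tokens[pos] == "@attr":
        if pos + 1 >= len(tokens):
            raise ValueError("@attr without type=value")
        attr_type, _, value = tokens[pos + 1].partition("=")
        if attr_type == "1":
            use = int(value)  # other attribute types (relation, truncation...) are ignored
        pos += 2
    if pos >= len(tokens):
        raise ValueError("missing search term")
    # Substring match; LIKE wildcards inside the term are not escaped.
    pattern = f"%{tokens[pos]}%"
    if use == ANY_USE:
        where = "(" + " OR ".join(f"{c} LIKE ?" for c in ANY_COLUMNS) + ")"
        return where, [pattern] * len(ANY_COLUMNS), pos + 1
    column = BIB1_USE_TO_COLUMN.get(use)
    if column is None:
        raise ValueError(f"unsupported Bib-1 use attribute: {use}")
    return f"{column} LIKE ?", [pattern], pos + 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dc_record (id TEXT, title TEXT, creator TEXT,"
                 " subject TEXT, date TEXT, description TEXT)")
    conn.execute("INSERT INTO dc_record VALUES ('r1', 'Paddy fields', 'Example',"
                 " 'Thailand rice', '1975', 'Aerial photographs')")
    where, params = translate('@and @attr 1=4 paddy @attr 1=21 "Thailand"')
    # SQLite's LIKE is case-insensitive for ASCII; on PostgreSQL, ILIKE behaves alike.
    print(conn.execute(f"SELECT id FROM dc_record WHERE {where}", params).fetchall())
    # [('r1',)]
```

Search terms are passed as bound parameters rather than pasted into the SQL string. This keeps the translation safe against injection whatever the client sends.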

Implementation of a WWW-to-Z39.50 Gateway Using the ZETA Perl Package
          In addition to cooperating with the GSI gateway, we also constructed a local WWW-to-Z39.50 gateway that offers map-based navigation to assist information search and retrieval. The gateway consists of two parts: a UMN Mapserver and ZETA Perl. The mapserver uses Vector Map Level 0 (1:1,000,000) to dynamically generate a guide map showing the landscape, country borders, administrative units, transportation networks, water/river systems, major built-up areas, and place name annotations. The user can browse the guide map and draw a box to find collections within an area of interest. Search and retrieval are implemented using ZETA Perl, a consistent interface to Z39.50 services and the Z39.50 protocol for Perl applications. The gateway itself is implemented as a system service. It can work with multiple Z39.50 servers at once and provides three main functions: search, present and retrieval. A search session expires 15 minutes after its last activity with its remote Z39.50 server, and the socket used for the session is returned to the socket pool once it is no longer in use.
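          The session lifetime rule can be sketched as follows. This is a minimal Python sketch, not the gateway's ZETA Perl code. The `SessionManager` class, its methods and the connection factory are hypothetical. Connections are treated as opaque objects standing in for the sockets to remote Z39.50 servers. The clock is injectable so that the 15-minute rule can be exercised without waiting.

```python
# Sketch: search sessions that expire 15 minutes after their last activity,
# returning their connection to a shared pool. Connections are opaque objects
# standing in for sockets to remote Z39.50 servers; all names are hypothetical.
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Deque, Dict

SESSION_LIFESPAN = 15 * 60  # seconds, counted from the last activity


@dataclass
class Session:
    connection: Any
    last_activity: float


@dataclass
class SessionManager:
    factory: Callable[[], Any]                  # opens a connection when the pool is empty
    clock: Callable[[], float] = time.monotonic
    pool: Deque[Any] = field(default_factory=deque)
    sessions: Dict[str, Session] = field(default_factory=dict)

    def open(self, session_id):
        # Reuse a pooled connection if one is available.
        conn = self.pool.popleft() if self.pool else self.factory()
        self.sessions[session_id] = Session(conn, self.clock())
        return conn

    def touch(self, session_id):
        """Record activity (search, present, retrieval) and return the connection."""
        self.reap()
        session = self.sessions[session_id]   # KeyError if expired or unknown
        session.last_activity = self.clock()
        return session.connection

    def close(self, session_id):
        # Return the session's connection to the pool.
        session = self.sessions.pop(session_id, None)
        if session is not None:
            self.pool.append(session.connection)

    def reap(self):
        """Expire idle sessions and return their connections to the pool."""
        now = self.clock()
        expired = [sid for sid, s in self.sessions.items()
                   if now - s.last_activity >= SESSION_LIFESPAN]
        for sid in expired:
            self.close(sid)
        return expired
```

Expiring idle sessions lazily on each request, as `touch` does, avoids a background thread. A long-running service could instead call `reap()` on a timer.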

Conclusions
          This paper describes our work in building an operational system for the search and retrieval of geographic collections using open source software. In constructing this system, we made the following findings.
1) Adopting a metadata standard (Dublin Core in our case) and mapping each collection's description onto it allows heterogeneous collections to be searched in a uniform way.
2) Despite the difficulties encountered in encoding, the SCRIPT search engine has been tremendously helpful in linking SQL databases with the Z39.50-compliant Isite server.
3) The UMN Mapserver and Vector Map Level 0 can dynamically generate an attractive 1:1,000,000 guide map including major road/rail networks, hydrologic systems, airports, elevation contours, coastlines, international boundaries and populated areas.
4) The mapserver can provide users with an effective geo-location index map that shows the geographic collections on the guide map. Through URL links, each collection and its metadata can then be browsed online.
5) With the mapserver and ZETA Perl, it is possible to write a robust and flexible Z39.50-compliant gateway that meets our requirements.
          We hope some of these lessons can be applied to similar work in the geo-informatics community.

The online system will be available soon.

 
21st Century COE Program -Aiming for COE of Integrated Area Studies-