The Construction of Online Databases Related to Asian and African Area Studies
SONG Xianfeng (COE Researchers)
Objectives
The aim of area studies is to attain a holistic understanding of an area
by integrating ecological, socio-cultural and social scientific viewpoints.
To advance area studies, area-specific information in various formats,
including topographical maps, satellite images, aerial photos, GPS photos,
and so on, needs to be brought together. Long-term research in Asian and
African Area Studies has accumulated such geographic collections. To create
an efficient and effective means of using these valuable research materials
for integrated area studies, we have been constructing an online metadata
database system to promote data discovery and data sharing.
The online metadata database is a highly scalable infrastructure, relying on
open source solutions and metadata standards. On the one hand, the system can
be extended with new databases; on the other hand, it can be accessed from any
Z39.50-compatible system without changing the underlying data structure.
Open Source Solution
In the geo-informatics community, the software used to process geographic information
has improved over the past years. In particular, the free GIS projects, which
aim to promote software freedom in the field of GIS, have made great progress. In
addition to a number of desktop GIS packages, open-source alternatives to commercial
software can also be found in the areas of web mapping and metadata techniques.
Many proprietary applications now appear to have free alternatives. Table
1 lists open source software packages that are competitive with vendor
products. It is by no means an exhaustive list (more can be found at the FreeGIS
web site), and the inclusion or exclusion of a product in this comparison does
not in any way reflect on its merit.
Whether or not to use these open source software packages depends on a careful
analysis of each package’s functions and the actual project situation.
Nevertheless, they are worth exploring and experimenting with, principally
because of the cost savings. One point in the discussion of cost savings
is that open source software simply has a better “total cost of operation,” which
is often realized in the medium term.
Table 1. Open Source Alternatives
| Category | Open Source Software | Vendor Products |
|---|---|---|
| Light Ware | OpenEV/OpenMap | ArcView |
| Heavy Ware | GRASS | ArcInfo |
| Spatial Data Engine | PostGIS | ArcSDE |
| Spatial Database | PostgreSQL | Oracle/IBM DB2/Informix |
| Web Mapping | UMN Mapserver | ArcIMS |
| Metadata Tools (Z39.50) | Isite, ZETA Perl | OCLC SiteSearch, Finsiel ZETA |
In our system, as shown later, we use several of the open source software packages in the above table. PostgreSQL/PostGIS is used for the storage and management of the Dublin Core metadata of our geographic datasets. Based on this common metadata database platform, the datasets can be discovered in two ways: by visualizing the geographic collections over a global map (UMN Mapserver), and by searching and retrieving geographic datasets through a Z39.50-compliant service (Isite and ZETA Perl).
Methodology
Data Catalogue Using the Dublin Core Metadata
The geographic information used for area studies consists of heterogeneous
datasets. Metadata is a suitable paradigm for managing such multimedia
information. Among metadata standards, Dublin Core is popular thanks to its
simplicity and extensibility. It is a possible alternative to the essential
CSDGM metadata; CSDGM is a complex standard from the FGDC, USA, used primarily
for geographic collections. We chose Dublin Core to standardize the descriptions
of our geographic dataset collections in order to build a unified data catalogue.
The metadata in our case is stored in a relational database (PostgreSQL/PostGIS)
instead of XML-type files. A tag-structured metadata file is redundant and
imposes a time cost for tag parsing, whereas a database is more efficient and
effective for maintaining a huge volume of metadata records. When needed, the
metadata records can still be exported without difficulty from the database
into a tag representation format such as XML, RDF or HTML <meta> tags.
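The storage-and-export idea can be illustrated with a minimal sketch. It is not our production schema: Python's built-in sqlite3 module stands in for PostgreSQL/PostGIS, and the table name dc_record, the dc_ column prefix, the wrapper element name and the helper functions are all assumptions made for the example. Only the Dublin Core element names and the namespace URI are taken from the standard.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html import escape

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
# A subset of the Dublin Core elements; the table layout itself is hypothetical.
DC_FIELDS = ("title", "creator", "subject", "description",
             "date", "type", "format", "identifier", "coverage")


def create_catalogue(conn):
    """Create one table with a text column per Dublin Core element."""
    cols = ", ".join(f"dc_{name} TEXT" for name in DC_FIELDS)
    conn.execute(f"CREATE TABLE dc_record (id INTEGER PRIMARY KEY, {cols})")


def add_record(conn, **fields):
    """Insert one metadata record; keyword names must be Dublin Core elements."""
    unknown = set(fields) - set(DC_FIELDS)
    if not fields or unknown:
        raise ValueError(f"expected Dublin Core elements, got {sorted(fields)}")
    names = [n for n in DC_FIELDS if n in fields]
    columns = ", ".join(f"dc_{n}" for n in names)
    marks = ", ".join("?" for _ in names)
    conn.execute(f"INSERT INTO dc_record ({columns}) VALUES ({marks})",
                 [fields[n] for n in names])


def fetch_record(conn, record_id):
    """Return the non-empty elements of a record as a dict, or None."""
    columns = ", ".join(f"dc_{n}" for n in DC_FIELDS)
    row = conn.execute(f"SELECT {columns} FROM dc_record WHERE id = ?",
                       (record_id,)).fetchone()
    if row is None:
        return None
    return {n: v for n, v in zip(DC_FIELDS, row) if v is not None}


def to_xml(record):
    """Export a record as namespaced Dublin Core XML elements."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in record.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def to_html_meta(record):
    """Export a record as HTML <meta> tags using the DC.element convention."""
    return "\n".join(
        f'<meta name="DC.{name}" content="{escape(value, quote=True)}">'
        for name, value in record.items())
```

Keeping the records in rows and generating the tagged forms on demand means the same record can be served as XML to one client and as HTML header tags to another without storing either representation.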
Geographic Dataset Visualization and Discovery Using a Map
All geographic information is associated with a certain position in a global
coordinate reference system, such as latitude and longitude, and perhaps altitude
in a 3D context. Linking these geographic collections to a detailed global map
by their geo-location helps to identify the exact geographic coverage of the
collections and to discover other datasets related to the same area of interest.
Locating geographic collections via a map in this way overcomes the uncertainties
and inadequacies of text descriptions.
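The principle of discovery by geo-location can be sketched as a bounding-box overlap test. This is an illustration rather than the code of our system: the BBox class, the find_collections helper and the collection names used with it are hypothetical, coordinates are assumed to be decimal degrees, and footprints that cross the 180-degree meridian are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Geographic footprint in decimal degrees (assumed not to cross 180 degrees)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other):
        """True when the two boxes overlap or touch in both longitude and latitude."""
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)


def find_collections(catalogue, area):
    """Return the sorted names of collections whose footprint overlaps the area.

    catalogue maps a collection name to its BBox footprint.
    """
    return sorted(name for name, footprint in catalogue.items()
                  if footprint.intersects(area))
```

With a catalogue of footprints, a call such as find_collections(catalogue, BBox(100, 0, 105, 5)) returns every collection that touches the area of interest, whatever its type, which is the kind of cross-collection discovery that a text description alone makes difficult.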
The UMN Mapserver is used for online dynamic mapping. It provides users with an
index map built from two main data sources: the Vector Map Level 0 (formerly known
as the Digital Chart of the World) and the PostgreSQL database in which the Dublin
Core metadata of geographic collections are stored. A series of index maps can be
produced, e.g. topographic map indices, Landsat MSS path/row maps, Landsat TM
path/row maps and GPS photo distributions.
Distributed Search and Retrieval Using the Z39.50 Protocol
The metadata stored in the PostgreSQL database can be directly queried over the
Internet using a CGI/PHP technique. However, it is more useful to make the
database compatible with a standard search protocol such as Z39.50, which was
originally designed for searching distributed libraries. We therefore built a
Z39.50-compliant server to integrate our metadata databases. This makes it
possible to search our metadata database together with other similar collections.
Compatibility with the BIB1, GEO, GILS and DC profiles increases the
interoperability of the system. To share our datasets more widely, we linked the
server to a Z39.50-compliant gateway, namely the geographic clearinghouse at GSI,
Japan.
We used the Isite package to deliver our database as a Z39.50 service. The key
to forwarding a Z39.50 query to the SQL database is a middle tier that we wrote:
a script that translates between the Z39.50 protocol and the corresponding SQL
statements (the proper query format depends on the database structure). The
script is configured into the Isite Zserver; it parses each query expression
and returns the search results in the specific presentation format required
by the client.
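The role of such a translation layer can be sketched as follows. This is not the script we configured into Isite, and it does not reproduce Isite's script interface. It assumes that the incoming query has already been rendered in prefix query notation (the textual form of a Z39.50 type-1 query used by many toolkits), supports only @and, @or and Bib-1 use attributes, and maps three use attributes (4 title, 21 subject heading, 1003 author) to hypothetical columns of a hypothetical dc_record table.

```python
import shlex

# Bib-1 use attributes mapped to hypothetical Dublin Core columns.
USE_ATTR_TO_COLUMN = {4: "dc_title", 21: "dc_subject", 1003: "dc_creator"}


def _parse(tokens):
    """Recursive descent over prefix notation; returns (sql_fragment, params)."""
    if not tokens:
        raise ValueError("incomplete query")
    token = tokens.pop(0)
    if token in ("@and", "@or"):
        left_sql, left_params = _parse(tokens)
        right_sql, right_params = _parse(tokens)
        operator = "AND" if token == "@and" else "OR"
        return f"({left_sql} {operator} {right_sql})", left_params + right_params
    if token == "@attr":
        if len(tokens) < 2:
            raise ValueError("incomplete attribute clause")
        attr_type, _, attr_value = tokens.pop(0).partition("=")
        if attr_type != "1" or not attr_value.isdigit():
            raise ValueError("only use attributes (type 1) are supported")
        column = USE_ATTR_TO_COLUMN.get(int(attr_value))
        if column is None:
            raise ValueError(f"unmapped use attribute: {attr_value}")
        term = tokens.pop(0)
        # The search term is passed as a bound parameter, never spliced into SQL.
        return f"{column} LIKE ?", [f"%{term}%"]
    raise ValueError(f"unsupported token: {token}")


def pqf_to_sql(pqf, table="dc_record"):
    """Translate a small subset of prefix query notation into parameterized SQL."""
    tokens = shlex.split(pqf)  # keeps quoted multi-word terms together
    where, params = _parse(tokens)
    if tokens:
        raise ValueError(f"unexpected trailing tokens: {tokens}")
    return f"SELECT * FROM {table} WHERE {where}", params
```

For example, pqf_to_sql('@and @attr 1=4 "landsat tm" @attr 1=1003 song') yields a statement with two LIKE conditions joined by AND and the two terms as bound parameters. In a real deployment the attribute-to-column mapping is where the dependence on the database structure lives.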
Implementation of a WWW-to-Z39.50 Gateway Using the ZETA Perl Package
In addition to cooperating with the GSI gateway, we also constructed a local
WWW-to-Z39.50 gateway to provide a high quality map navigation system to assist
in information search and retrieval.
The gateway consists of two parts: a UMN map server and ZETA Perl. The mapserver
uses Vector Map Level 0 (1:1,000,000) to dynamically generate the guide map,
which shows in detail the landscape, country borders, administrative units, traffic,
water/river systems, major built-up areas, and place name annotations. The user can
browse the guide map and draw a box to find collections within an area of
interest. Search and retrieval are implemented using ZETA Perl, a consistent
interface to the Z39.50 services and protocol for Perl applications.
The gateway itself is also coded as a system service. It can work with multiple
Z39.50 servers at one time and provides the user with three main functions: search,
present and retrieve. The lifespan of a search session is set at 15 minutes,
counted from its last activity with its remote Z39.50 server. The socket used for
the search session is returned to the socket pool once it is no longer in use.
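The session handling described above can be sketched as a small pool with an idle timeout. The gateway itself is written with ZETA Perl; the Python class below only illustrates the bookkeeping, and the SessionPool name, its methods and the use of placeholder values in place of real sockets are assumptions made for the example. Only the 15-minute lifespan comes from our configuration.

```python
import time

SESSION_LIFESPAN = 15 * 60  # seconds of inactivity before a session expires


class SessionPool:
    """Hands out pooled connections to search sessions and reclaims idle ones."""

    def __init__(self, connections, clock=time.monotonic):
        self._idle = list(connections)  # connections not bound to any session
        self._sessions = {}             # session id -> [connection, last activity]
        self._clock = clock             # injectable so expiry can be tested

    def open(self, session_id):
        """Bind a free connection to a new session and return it."""
        self.reap()
        if not self._idle:
            raise RuntimeError("no free connection in the pool")
        connection = self._idle.pop()
        self._sessions[session_id] = [connection, self._clock()]
        return connection

    def touch(self, session_id):
        """Record activity on a session; raises KeyError if it has expired."""
        self.reap()
        entry = self._sessions[session_id]
        entry[1] = self._clock()
        return entry[0]

    def reap(self):
        """Return the connections of expired sessions to the pool."""
        now = self._clock()
        expired = [sid for sid, (_, last) in self._sessions.items()
                   if now - last > SESSION_LIFESPAN]
        for sid in expired:
            connection, _ = self._sessions.pop(sid)
            self._idle.append(connection)
        return expired
```

Counting the lifespan from the last activity rather than from the start of the session means that a user who keeps paging through results is never cut off, while an abandoned session frees its connection after at most 15 minutes.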
Conclusions
This paper describes our work in building an operational system for the search
and retrieval of geographic collections using Open Source Software. By constructing
such a system, we have made the following experimental findings.
1) The adoption of a metadata standard, Dublin Core in our case, promotes effective
searching of instance-specific collections in a uniform way once their descriptions
have been standardized by metadata mapping.
2) Despite the difficulties encountered in encoding, the SCRIPT search engine has
been tremendously helpful in linking SQL databases with the Z39.50-compliant Isite
server.
3) The Minnesota mapserver and Vector Map Level 0 can dynamically generate an
attractive 1:1,000,000 guide map including major road/rail networks, hydrologic
systems, airports, elevation contours, coastlines, international boundaries and
populated areas.
4) The mapserver can provide users with a clear geo-location index map that shows
the geographic collections visually on the guide map. Through URL links, the raw
collection and its meta-information can be further browsed online.
5) With the mapserver and ZETA Perl, it is possible to write a robust and flexible
Z39.50-compliant gateway that meets our requirements.
We hope there may be some lessons here that can be applied to similar processes
in the geo-informatics community.
The online system will be available soon.