


CHANT Database: Towards an Integrated Platform for the Study of Ancient Chinese Texts
Ho Kwok Kit, Computer Project Officer, CHANT Center
The CHANT project started in 1988, when the Institute of Chinese Studies (ICS) received a grant of HK$1.35 million from the University Grants Committee (UGC). The initial objective was to build an electronic database and use it to publish a series of concordances to the traditional texts of the Pre-Han and Han periods. To serve scholars in this area better, the project team decided to go further and also release the database in electronic form, on CD-ROM and floppy disk. The system comes with a search program that returns search results in just a few seconds.
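As a rough illustration of what a concordance search over an unpunctuated text involves, the following minimal Python sketch lists every occurrence of a keyword together with a few characters of context on either side. It is not the CHANT search program; the function name `kwic`, the `width` parameter and the sample line are assumptions chosen for the example.

```python
def kwic(text, keyword, width=5):
    """Return (left, keyword, right) context triples for each occurrence.

    Hypothetical helper for illustration only; `width` is the number of
    characters of context kept on each side of the keyword.
    """
    if not keyword:
        raise ValueError("keyword must be non-empty")
    hits = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        hits.append((left, keyword, right))
        # Continue from the next character so overlapping matches are found.
        start = text.find(keyword, start + 1)
    return hits


# Sample line (punctuation removed) used purely as test data.
sample = "學而時習之不亦說乎有朋自遠方來不亦樂乎"
for left, key, right in kwic(sample, "不亦", width=3):
    print(f"{left}[{key}]{right}")
```

A production concordance would normally index the text in advance rather than scan it on every query; the linear scan here only keeps the sketch short.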
There are quite a number of hurdles in developing a computer system for ancient Chinese texts. The most prominent is the choice of internal coding scheme. A persistent difficulty in Chinese computing is the sheer number of Chinese characters. While the English alphabet has only 26 letters, Chinese has thousands of characters, and many more are still waiting to be found. The following statistics show that the number of Chinese characters has been increasing over the years, and roughly how many we may have to handle:
In addition, some surveys have collected more than 70,000 Chinese characters. It is important to note that none of these figures include characters in other forms, such as jiaguwen 甲骨文 and jinwen 金文. The exact number of Chinese characters therefore remains an open question. As we cannot wait for a perfect internal code to become available, we have to settle on one that is available now and convenient for our prospective users. There are not many to choose from, although many efforts have been made that deserve our appreciation and respect. One good example is the 「聚珍中文整合系統」 by Zhu Bang-fu 朱邦復. Mr. Zhu invented the Changjei Input Method 「倉頡輸入法」 in the 1970s. In the late 1980s he was invited to Shenzhen, in Mainland China, to help develop an integrated Chinese system, namely the 「聚珍」 system. 「聚珍」 is remarkable in that it is a very compact system that can run on an ordinary PC with little memory and handle word processing, desktop publishing, database management and spreadsheets. Its internal code supports up to 60,000 Chinese characters. Though appealing, 「聚珍」 was not well received by the market. Part of the reason is that it adopted a proprietary internal Chinese coding scheme, which made it difficult to keep up with the rapid pace of computer development.
The 「聚珍」 experience illustrates very well the dilemma facing many researchers in the computerized study of ancient Chinese texts. Most existing Chinese internal coding schemes are far from satisfactory, but reinventing the wheel is too costly and not a favorable way to start, especially when time and other resources are constrained. That is why the CHANT Database started with the Big-5 code. Though often criticized, Big-5 is by far the most popular Chinese internal code.
It has a limited character space of 13,000 standard and 5,000-plus user-defined characters, but the good thing is that stable, mature and popular operating systems supporting the scheme are available. For the CHANT Database, the initial platform was the Eten Chinese system, the most popular DOS Chinese system for the PC in the 1980s. At that time Microsoft Windows was still at an early stage of development and was not regarded as a very stable system for Chinese applications. By today's standards the DOS Eten system is outdated, as it does not readily support multitasking or a graphical interface. But as far as the study of ancient Chinese texts is concerned, Eten is very efficient and effective. This is the system the CHANT Center has used to publish more than 50 concordances and the CHANT Database on CD-ROM and floppy disk for the Pre-Han and Han periods. However, as the CHANT project moves forward to include the Wei-Jin (魏晉) period and backward to cover jiaguwen, jianbo and jinwen, the limitations of the Eten system are becoming conspicuous, mainly because the system is becoming obsolete.
After nearly a decade's work, the CHANT Center is moving to a new platform for building its database. The technology in use is Windows-based and, most importantly, web-based. Like Eten, Windows supports Big-5, and it is the most popular operating system, which will certainly help in publishing and distributing the database to a wide range of users. In addition, Windows' graphics capability makes it ideal for our jiaguwen, jianbo and jinwen databases, which will incorporate a large number of graphics and images. As for the user-defined characters needed to represent rare characters not found in the standard Big-5 set, we were already able to break the 5,000-character limit with Eten. With Windows it is even easier, as all our self-created characters will be in outline fonts, which means they can be easily enlarged for better display results.
Different forms of characters, including jiaguwen and jinwen, can be mixed freely with ordinary characters. The rise of the World Wide Web on the Internet is changing the way information is disseminated and the way we study and do research. The CHANT Center would also like to use web technologies to upgrade the quality and services of the database. To this end, the CHANT Database will be published in hyperlinked HTML format. Apart from a PC, all that users need is a standard, popular browser to view and search ancient Chinese materials of different periods, whether they are standard Chinese characters, oracle-bone characters, or even pictures. In fact, we have already set up a trial CHANT website (http://www.chant.org) to demonstrate our efforts. We believe that this will open a new page in the study of ancient Chinese texts.
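To make the Big-5 constraint discussed above more concrete, the following minimal Python sketch separates the characters of a text into those that the standard Big-5 table can represent and those that would need a user-defined character. It is an illustration only, not part of the CHANT software: it relies on Python's built-in `big5` codec as an approximation of the standard set, it does not model any vendor-specific user-defined area, and the function name `split_by_big5` is hypothetical.

```python
def split_by_big5(text):
    """Split text into (encodable, missing) character lists.

    Hypothetical helper: a character counts as 'missing' when Python's
    built-in 'big5' codec cannot encode it, which we take here as a
    stand-in for 'would need a user-defined character'.
    """
    encodable, missing = [], []
    for ch in text:
        try:
            ch.encode("big5")
            encodable.append(ch)
        except UnicodeEncodeError:
            missing.append(ch)
    return encodable, missing


# U+20000 lies outside the Basic Multilingual Plane, so it has no
# code point in the standard Big-5 table.
encodable, missing = split_by_big5("學而" + "\U00020000")
print("encodable:", encodable)
print("needs a user-defined character:", missing)
```

Running such a check over a corpus gives a rough count of how many self-created characters a text would require before it can be stored in Big-5.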