WP12: Cross concordances of classifications and thesauri

Overall goal

Initial situation:

Within the systems of libraries and specialized information centers, different classifications and thesauri are in use. This impedes searching across subjects or files. Someone who searches, for example, first a library catalogue in Regensburg, then one in Lower Saxony or in the U.S., and then articles in the reference database of an information service has to work with different search terms and with the search logic of each system, so that efficient searching is hardly possible. As a rule, users are familiar only with the classification or thesaurus they normally use. The problem becomes more acute when the different library catalogues and reference databases are technically connected through a common user interface. The same applies to the use of different indexing systems for metadata.

The goal is to enable integrated subject searching across distributed data holdings with different subject emphases, using cross concordances to take account of the conceptual differences between the thesauri and classifications applied.

In order to achieve the overall goal, it is necessary:

- to examine the methodology of cross concordances between classifications or thesauri;
- to program a procedure for representing these cross concordances between the different classifications or thesauri available on the Internet;
- to establish a prototype cross concordance for specific subjects and selected classifications or thesauri.

In line with CARMEN's focus, mathematics and physics have been selected as the specific subject basis. For methodological reasons, the social sciences will also be included in the subject scope.

Working on classifications and thesauri in parallel makes it possible to understand the differing problems and solutions that arise in different indexing procedures and subjects. This will ensure the prototypical character of the study. The solutions should also be applicable to classifications and thesauri that are not included in the study.

Objectives and products of the work package

The clarification of methodological questions is common to both subfields, classification and thesaurus.

The cross concordances refer to classifications/thesauri, each of which represents a closed system. Navigation between different classifications/thesauri must be made possible by cross concordances.
The kind of relation between related notations/descriptors must be mapped, e.g.:
- relation 1:1 (synonymous terms, parallel notations);
- broader term : narrower term;
- narrower term : broader term;
- related terms;
- a measure of the degree of match, and so on.
These relations will be evaluated within work packages 7 (retrieval), 9 (interdisciplinary information system) and 11 (treatment of heterogeneity).
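As a minimal sketch of how one concordance entry might record these relation types, the following Java fragment models each mapping with a relation type and a match measure. The names `RelationType` and `ConcordanceEntry`, and the 0 to 1 scale for `matchDegree`, are hypothetical assumptions; the MSC/DDC codes in the usage note are illustrative only, not taken from an actual concordance.

```java
import java.util.Objects;

/** Relation kinds from the list above (hypothetical names). */
enum RelationType {
    EQUIVALENT, // 1:1: synonymous terms or parallel notations
    BROADER,    // source term is broader than target term
    NARROWER,   // source term is narrower than target term
    RELATED     // related terms
}

/** One mapping from a notation/descriptor in one scheme to one in another scheme. */
record ConcordanceEntry(String sourceScheme, String sourceCode,
                        String targetScheme, String targetCode,
                        RelationType relation, double matchDegree) {

    ConcordanceEntry {
        Objects.requireNonNull(relation);
        // Assumed convention: 0.0 = no match, 1.0 = exact match.
        if (matchDegree < 0.0 || matchDegree > 1.0) {
            throw new IllegalArgumentException("matchDegree must be in [0, 1]");
        }
    }

    /** The same entry seen from the target side; broader/narrower are swapped. */
    ConcordanceEntry inverse() {
        RelationType inverted = switch (relation) {
            case BROADER -> RelationType.NARROWER;
            case NARROWER -> RelationType.BROADER;
            default -> relation; // EQUIVALENT and RELATED are symmetric
        };
        return new ConcordanceEntry(targetScheme, targetCode,
                sourceScheme, sourceCode, inverted, matchDegree);
    }
}
```

For example, `new ConcordanceEntry("MSC", "11A41", "DDC", "512.7", RelationType.NARROWER, 0.8).inverse()` yields the DDC-to-MSC view with relation `BROADER`, so a single stored entry can support navigation in both directions.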
The cross concordances will access data pools (classifications/thesauri) that are partly local and partly distributed and linked. Each classification or thesaurus will be updated by the institution responsible for it, e.g. the American Mathematical Society, OCLC ... The resulting updates to the cross references will be carried out by UB Regensburg and Die Deutsche Bibliothek. The compatibility of the software products to be developed must be taken into account in this process.
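When a responsible institution releases an updated classification, cross references that point at withdrawn or changed notations have to be re-checked. A minimal sketch, assuming the updated scheme is available as a plain set of valid notations; `Mapping`, `ConcordanceMaintenance` and `staleMappings` are hypothetical names:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** A single cross reference from a source notation to a target notation (hypothetical). */
record Mapping(String sourceCode, String targetCode) {}

final class ConcordanceMaintenance {

    private ConcordanceMaintenance() {}

    /**
     * Returns the mappings whose target notation no longer occurs in the
     * updated edition of the target scheme; these need manual review.
     */
    static List<Mapping> staleMappings(List<Mapping> mappings, Set<String> updatedTargetCodes) {
        return mappings.stream()
                .filter(m -> !updatedTargetCodes.contains(m.targetCode()))
                .collect(Collectors.toList());
    }
}
```

The same check can be run against the source scheme when that side is updated.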

Programming

The cross concordances will be programmed in Java on top of a relational database system, with an abstract intermediate layer so that the system can be moved to database software from different vendors. Rapid prototyping will be applied. A common software tool will be developed for both subfields, classification and thesaurus.
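The abstract intermediate layer could take the form of an interface that the rest of the software depends on, with one implementation per database product. The sketch below is a hypothetical illustration: `ConcordanceStore` and `InMemoryConcordanceStore` are invented names, and the in-memory backend merely stands in for a relational one during rapid prototyping.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Storage-neutral access layer: calling code depends only on this interface. */
interface ConcordanceStore {
    void addMapping(String sourceScheme, String sourceCode, String targetScheme, String targetCode);

    List<String> targetsFor(String sourceScheme, String sourceCode, String targetScheme);
}

/** Prototype backend kept in memory; a database-backed class could implement the same interface. */
final class InMemoryConcordanceStore implements ConcordanceStore {
    private final Map<String, List<String>> index = new HashMap<>();

    private static String key(String sourceScheme, String sourceCode, String targetScheme) {
        return sourceScheme + "|" + sourceCode + "|" + targetScheme;
    }

    @Override
    public void addMapping(String sourceScheme, String sourceCode, String targetScheme, String targetCode) {
        index.computeIfAbsent(key(sourceScheme, sourceCode, targetScheme), k -> new ArrayList<>())
             .add(targetCode);
    }

    @Override
    public List<String> targetsFor(String sourceScheme, String sourceCode, String targetScheme) {
        // Return an unmodifiable copy so callers cannot alter the index.
        return List.copyOf(index.getOrDefault(key(sourceScheme, sourceCode, targetScheme), List.of()));
    }
}
```

A JDBC-based class implementing the same interface could then be swapped in without changing the calling code; this is one common way to realize such a layer, not necessarily the design the project will adopt.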

In the field of classification, programming will be done in two parts. Building on the RVK-Online project, a system for data maintenance, structured search and user-friendly presentation of classifications will be developed. In principle, this tool should be able to represent any classification; it will be implemented concretely for RVK and DDC.
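A generic tool that can represent any classification needs, at minimum, notations with captions and parent links, from which hierarchical navigation can be derived. A minimal sketch, assuming a strictly hierarchical (tree-shaped) classification; `ClassificationTree` is a hypothetical name, and the DDC-style notations used in testing are illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Scheme-independent store of notations and their hierarchy (hypothetical). */
final class ClassificationTree {
    record Node(String notation, String caption, String parent) {}

    private final Map<String, Node> nodes = new HashMap<>();
    private final Map<String, List<String>> children = new HashMap<>();

    /** Adds a class; pass null as parent for a top-level class. */
    void add(String notation, String caption, String parent) {
        nodes.put(notation, new Node(notation, caption, parent));
        if (parent != null) {
            children.computeIfAbsent(parent, k -> new ArrayList<>()).add(notation);
        }
    }

    String captionOf(String notation) {
        Node node = nodes.get(notation);
        return node == null ? null : node.caption();
    }

    List<String> childrenOf(String notation) {
        return List.copyOf(children.getOrDefault(notation, List.of()));
    }

    /** Notations from the top of the hierarchy down to the given notation (assumes no cycles). */
    List<String> pathFromRoot(String notation) {
        List<String> path = new ArrayList<>();
        String current = notation;
        while (current != null && nodes.containsKey(current)) {
            path.add(current);
            current = nodes.get(current).parent();
        }
        Collections.reverse(path);
        return path;
    }
}
```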

In addition, the cross concordances linking these classifications with other classifications not otherwise covered (MSC, PACS and the classification for the social sciences) will be developed.

The aim is to refer precisely to the specific position (notation) actually applied within a classification, not merely to the classification as a whole. This functionality should also be usable for metadata.
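Referring to the position actually applied means storing the scheme, its edition and the notation together, rather than the scheme name alone. A minimal sketch under that assumption; `ClassRef` and the colon-separated string form are hypothetical choices, not an established metadata format:

```java
import java.util.Objects;

/** Points at one notation in one edition of one classification (hypothetical). */
record ClassRef(String scheme, String edition, String notation) {

    ClassRef {
        Objects.requireNonNull(scheme);
        Objects.requireNonNull(edition);
        Objects.requireNonNull(notation);
    }

    /** Hypothetical compact form, e.g. for a subject field in a metadata record. */
    String toMetadataValue() {
        return scheme + ":" + edition + ":" + notation;
    }

    /** Parses the compact form; the limit of 3 keeps any colons inside the notation. */
    static ClassRef parse(String value) {
        String[] parts = value.split(":", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected scheme:edition:notation");
        }
        return new ClassRef(parts[0], parts[1], parts[2]);
    }
}
```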

It has not yet been decided whether this distributed structure, which is organizationally advantageous, will prove itself in practice, or whether a different structural model will turn out to be more promising. The experience gained in the ELVIRA project (cf. AP11) and Regensburg's preparatory work will be taken into account in this decision.

The results have to be visualized both for targeted search and for navigating and mapping the structure of the classification/thesaurus. The development effort, especially for the classification software, will be considerable. Parts of this functionality have already been implemented for the SWD.

Subarea classification

The methods for a concordance between general and special classifications will be worked out by example. DDC and RVK in the areas of mathematics and physics, together with MSC and PACS, appear particularly suitable.

In addition, a concordance between the classification for the social sciences on the one hand and RVK and DDC on the other will be created.

The problem lies in the overlap between the technical terms of the special classifications (MSC, PACS), and between the highly specialized classifications and the general classifications (e.g. MSC-DDC). The project has to examine whether a single classification should form the basis to which all other classifications are mapped, or whether each classification should be mapped to every other one. It must also be determined how a structured search can be implemented across the concept trees of several parallel classifications.
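The two options can be compared by the number of concordances to be built and maintained. With n classifications mapped in both directions, direct mapping requires n(n-1) directed concordances, while mapping via one hub classification requires 2(n-1); for the five classifications named here (RVK, DDC, MSC, PACS and the classification for the social sciences) that is 20 versus 8. The sketch below (hypothetical names) computes both counts and derives source-to-target candidates by composing two mappings through the hub.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HubMapping {

    private HubMapping() {}

    /** Directed concordances needed if every scheme is mapped directly to every other. */
    static int pairwiseCount(int schemes) {
        return schemes * (schemes - 1);
    }

    /** Directed concordances needed if every other scheme is mapped to and from one hub. */
    static int hubCount(int schemes) {
        return 2 * (schemes - 1);
    }

    /** Candidate target codes obtained by composing source -> hub and hub -> target. */
    static Set<String> viaHub(String sourceCode,
                              Map<String, List<String>> sourceToHub,
                              Map<String, List<String>> hubToTarget) {
        Set<String> result = new LinkedHashSet<>(); // keeps order, drops duplicates
        for (String hubCode : sourceToHub.getOrDefault(sourceCode, List.of())) {
            result.addAll(hubToTarget.getOrDefault(hubCode, List.of()));
        }
        return result;
    }
}
```

The saving comes at a cost: each step through the hub can broaden or shift meaning, so composed mappings are likely to be less precise than direct ones.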

In addition to searching for notations in a classification and searching within its hierarchical structure, a verbal search, i.e. a search by words rather than notations, is needed. A bilingual search across all classifications is highly desirable.
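A verbal, bilingual search can be sketched as a lookup over class captions held in both languages. A minimal sketch, assuming each notation carries one German and one English caption and that simple case-insensitive substring matching suffices; `CaptionIndex` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Bilingual caption lookup for one classification (hypothetical). */
final class CaptionIndex {
    private record Captions(String german, String english) {}

    private final Map<String, Captions> byNotation = new LinkedHashMap<>();

    void add(String notation, String germanCaption, String englishCaption) {
        byNotation.put(notation, new Captions(germanCaption, englishCaption));
    }

    /** Notations whose German or English caption contains the query, ignoring case. */
    List<String> search(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Captions> entry : byNotation.entrySet()) {
            Captions c = entry.getValue();
            if (c.german().toLowerCase(Locale.ROOT).contains(q)
                    || c.english().toLowerCase(Locale.ROOT).contains(q)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

A production system would more likely use a full-text index with stemming; substring matching only keeps the sketch short.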

Way of organization

The prototype of the cross concordance of classifications will be programmed at the University of Regensburg and integrated into CARMEN by AP 11 and AP 7. It can later be integrated into library systems and can also be used as a stand-alone system by specialized information centers and publishers. The concordance of classifications will be compiled in Regensburg in cooperation with OCLC as an external partner.

The concordance of thesauri will be compiled cooperatively by Die Deutsche Bibliothek Frankfurt and the IZ in cooperation with DIFF and Max-Planck-Institut für Bildungsforschung.

Leske + Budrich publishers will act in an advisory capacity for the social sciences to ensure applicability in a publishing house.

In addition to the CARMEN project, there is cooperation with the applications of the Fachhochschule Regensburg and Leske + Budrich publishers within Slot 1 (author systems for multimedia products).

Since products such as DDC and SWD are subject to license fees, it will later be necessary to use the GlobalInfo program's modules for billing (settlement of accounts).

Planning for exploitation

The software products will be applicable to different classifications and thesauri and will be provided free of charge. There is great demand for such products. The compiled concordances will be prototypes, made available free of charge on the Internet by the institutions involved and maintained by them.

Subarea classification: the University of Regensburg will provide the RVK classification, possibly a German edition of DDC, and the concordance on a permanent basis. Once the system goes into actual production, a DDC license agreement will have to be concluded with OCLC (Forest Press). It is also conceivable that a German edition of DDC could be maintained at Die Deutsche Bibliothek.

Subarea thesaurus: the thesauri will continue to be maintained at the respective institutions involved, and the mediating concordance possibly at Die Deutsche Bibliothek. The SWD copyright will not be affected.

Contact

Dr. Albert Schröder

93042 Regensburg

Germany

E-Mail: albert.schroeder@ur.de

Phone: +49 (0)941 943-3903