The Electronic Text Corpus of Sumerian Literature


ETCSL: SGML-XML markup


SGML

SGML, the Standard Generalised Markup Language, is an international standard (ISO 8879:1986) for writing tagging languages which describe the structure, rather than the visual appearance, of texts. SGML works by means of Document Type Definitions (DTDs), which prescribe the order, hierarchy and frequency of the elements of a text, and the writing system used. It is particularly useful for ensuring structural consistency throughout a large body of material, and for systematically tagging noteworthy or interesting features of those texts. Because it is an international standard and not a proprietary format, SGML is independent of platform, application and character set, and is therefore extremely portable and durable. In short, it is ideal for encoding large language corpora which need to be searched, analysed and shared between projects over a long period of time.
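As a rough illustration of what it means for a DTD to prescribe order, hierarchy and frequency, the following Python sketch checks a small XML fragment against a hand-written rule. The element names (composition, title, line) and the rule itself are assumptions made for this example and are not taken from the ETCSL DTDs. Python's standard library does not validate against DTDs, so the check is written out by hand and is not real DTD validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model, loosely equivalent to the DTD declaration
#   <!ELEMENT composition (title, line+)>
# i.e. exactly one <title>, then one or more <line> elements, in that order.
# These element names are illustrative and are not taken from the ETCSL DTDs.


def conforms(xml_text: str) -> bool:
    """Return True if the document matches the hypothetical content model."""
    root = ET.fromstring(xml_text)
    if root.tag != "composition":
        return False
    child_tags = [child.tag for child in root]
    # Order and frequency: the first child must be the title,
    # and every remaining child (at least one) must be a line.
    return (
        len(child_tags) >= 2
        and child_tags[0] == "title"
        and all(tag == "line" for tag in child_tags[1:])
    )


VALID = """<composition>
  <title>Sample composition</title>
  <line n="1">first line of text</line>
  <line n="2">second line of text</line>
</composition>"""

# The title comes after a line, so the prescribed order is violated.
INVALID = """<composition>
  <line n="1">first line of text</line>
  <title>Sample composition</title>
</composition>"""

print(conforms(VALID))    # True
print(conforms(INVALID))  # False
```

A validating SGML or XML parser performs this kind of check automatically from the DTD, which is how structural consistency can be enforced across a large corpus.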

There are many internationally or professionally standard DTDs but, not surprisingly, nothing quite suitable for marking up a corpus of Sumerian literature. The corpus project therefore constructed its own set of DTDs for composite texts and translations, designed to be compatible with those used by its sister projects.

At the end of the first phase of the project, in the spring of 2001, all SGML versions of the compositions then edited were deposited with the Oxford Text Archive (OTA), a department of the Arts and Humanities Data Service. They will be publicly available from the OTA by the end of 2001. Regrettably, the ETCSL project itself does not have the resources to supply SGML files directly to users.

XML

Since the DTDs were written and implemented in 1997, a simplified version of SGML has been developed: the Extensible Markup Language, or XML. Because of its simplicity, XML is much easier to work with than SGML. The project is therefore currently making an XML version of the SGML corpus so that, among other benefits, a search interface can be devised and a parallel XML-based website created.
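To suggest why an XML version makes a search interface easier to build, here is a minimal Python sketch that looks up a word in a structurally tagged text using only the standard library. The markup, the sample text and the function name find_lines are all invented for this illustration and do not reflect the project's actual encoding or search design.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of a text; the element names, attribute names
# and sample content are invented and are not the project's actual markup.
SAMPLE = """<composition id="sample">
  <line n="1">the king built a house</line>
  <line n="2">the child of the king</line>
  <line n="3">he entered the palace</line>
</composition>"""


def find_lines(xml_text: str, word: str) -> list:
    """Return (line number, text) pairs for lines containing the whole word."""
    root = ET.fromstring(xml_text)
    hits = []
    for line in root.iter("line"):
        text = line.text or ""
        # Match whole whitespace-separated words, not substrings.
        if word in text.split():
            hits.append((line.get("n"), text))
    return hits


print(find_lines(SAMPLE, "king"))
```

Because each line is an explicitly tagged element with its own number, a search can report exactly where a match occurs without guessing at line boundaries from the layout.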

Meanwhile, the compositions in the corpus are published on this website in HTML 4.0, probably the best-known and most successful application of SGML.



Page created on 7.ix.2001. Last revised on 7.ix.2001 by ER.