Bible Technologies Group

OSIS™ 2.0.1 User's Manual (draft)


Draft Version of OSIS User's Manual

Note the updated schema and user's manual number. One of our users already spotted a bug, and it has been corrected. Make sure that the schema and the user's manual you are consulting carry the same version number.

As you go through this guide to the OSIS 2.0.1 schema, you are going to notice mistakes, omissions and examples you don't find useful. Those were not left as an exercise for the reader.

The editors discussed having a registry of Bible verses for people who contribute corrections, supply omissions or examples but feared that there might be more corrections, supplied omissions or examples than there are verses in the Bible. Not to mention that some verses are more popular than others.

So, as an alternative, future versions of the OSIS User's Manual will have a Contributor's section, which will list your name and the number of corrections or supplied omissions/examples that you have contributed to the manual. Counting by the editors will be final but generous, and credit will be given for duplicates and for suggestions not ultimately used in the form submitted. Please specify if you want your email contact information included as well. Address your comments, corrections, and supplied omissions/examples to osis-editors@bibletechnologieswg.org.

This manual is meant to be a guide for all users of the OSIS schema, and your assistance will be appreciated by both the editors and the community of OSIS users.

1. Introduction to OSIS™

Welcome to the OSIS (Open Scriptural Information Standard™) User's Manual. OSIS is a set of XML structures that can be used to produce Bibles, commentaries, and related texts that can be easily interchanged with other users, formatted as HTML, PDF, PostScript or any other desired format, and searched on any personal computer. It provides a standard way to express such documents, which is important because it saves time, money, and effort for:

  • authors, who will have less need to adjust their manuscripts for each different potential publisher;
  • publishers, who will gradually come to experience lower costs by not having to manage converting texts presented by authors in so wide a variety of formats, and by not having to provide texts in a different form to each electronic-book system vendor out there (or pay indirectly for those vendors to do the conversions);
  • and software vendors, who can avoid writing a lot of code to manage different formats, and thus make their programs smaller, faster, and more reliable.

The OSIS development team closely studied previous Bible encoding forms, as well as tools for literary encoding in general. By doing this we hope we have avoided some weaknesses, and gained from some strengths, of each one, and we thank the many people who worked on those prior specifications, as well as those who have provided help and feedback in developing OSIS itself, and testing it by encoding large numbers of Biblical and related texts. A list of participants may be found in an Appendix.

Users familiar with the Text Encoding Initiative will find OSIS markup quite familiar, because the bulk of the elements we define correspond directly to TEI elements, and almost always have the same name (though often simplified content). The schema also provides a TEIform attribute for such elements, so they can be recognized by form-aware processors as equivalent to their TEI counterparts. We have attempted to point out any elements below that do not have TEI equivalents, for the sake of anyone using both systems.

OSIS is provided as a free resource by the Bible Technologies Group™ (or BTG™), which is a collaborative effort of the American Bible Society, the Society of Biblical Literature, the Summer Institute of Linguistics, the United Bible Societies, other Bible Societies and related groups, and individual volunteers around the world. OSIS is designed to meet the needs of diverse user communities who read, study, research, translate or distribute biblical texts. This introduction gives a brief overview of OSIS before leading you step by step through producing your first OSIS text.

For more information on OSIS, you may wish to join the OSIS Users' Group. To do so, send mail to osis-user@whi.wts.edu, setting the Subject line to "subscribe". Online information about OSIS is also available at http://www.bibletechnologies.org and http://www.bibletechnologieswg.org.

2. Getting started

The first question that is often asked when learning that OSIS uses XML (a markup language) is: "I'm not a computer person. Can I learn to use OSIS?" If you can type and use even the most basic word processor or computer text-editing program, the answer is clearly "Yes!" OSIS was designed to offer the beginning user a simple way to do the basic "markup" required for a standard biblical text. "Markup" refers to markers placed within the text that indicate where useful units (or "elements") such as verses, quotations, cross-references, and other things begin and end.

If you know HTML, you already know most of what you need to know to use OSIS; OSIS uses the same pointy-bracket syntax as HTML (or XHTML to be completely precise). It merely provides a different set of element and attribute names. A few names such as "p" and "div" are the same; others are new, such as "verse". The core set of elements for OSIS is actually smaller than the set for HTML 3.2. To be sure, there are some complex cases that we deal with later, but you can do useful work with no more information than is provided in this basic manual.

The second question that is most often asked is: ‘Do I need an XML editor to do OSIS?’ This question often comes up after a friend of a friend has recommended some editor, and you then checked its price. XML editors vary from free to over $10,000.00 (US), and many are difficult to use (though XMetal™ is a notable exception, and not very expensive).

The basic answer is no, you do not need any special software. You can use any text editor you like to create OSIS documents (or any other XML documents, for that matter). Many will even color the tags for you, because they know how to color HTML tags and the languages are similar enough. However, you should have a way to check your documents for errors -- if your editor doesn't know enough about XML to warn you if you misspell a tag, or forget to end some element that you started, you will want to check for errors periodically using an "XML validator". Many such programs are available for various computers; some are available as Web services. (See Appendix, Validating Your OSIS Document for pointers and instructions on web-based validation services.) Both Internet Explorer and Netscape can also validate an OSIS file once you have installed the OSIS rules file (called a "schema") and an appropriate stylesheet.

An OSIS-aware text editor will do this checking for you, either on demand or continuously. A friendlier OSIS-aware text editor will provide help by showing you just which elements are permitted at any given place. The friendliest editors also give you the option to see and edit a fully-formatted view on demand, rather than staring directly at pointy-brackets. The choice between the many tools is a personal one, dictated by your working style, level of technical sophistication, goals, budget, and other factors.

3. Some authoring tools

The OSIS team is working even as this manual is being written to adapt free authoring tools that will hide most if not all of the markup from the casual user of OSIS. In the meantime, the best way to learn OSIS is to use a simple text editor, such as WordPad or Kedit on Windows, BBEdit or Alpha on MacOS, or vi or emacs on Linux. You can even use a word processor, though any formatting that you do in it won't matter (you would simply save the file as "text only").

The examples in this manual have been kept deliberately short and can be downloaded as a package from the OSIS website. After you have gained some basic skill using OSIS, you may want to try out more sophisticated editors.

Editing is much easier with an editing program that is aware of XML rules in general, and OSIS in particular. For example, rather than seeing literal tags with pointy-brackets, you can have a choice of seeing that, or structural views of your document (say, as a tree or expandable outline), or fully-formatted views to facilitate print layout.

Many products are available that can help you edit XML documents. One style shows the literal XML source file, but colors tags, attributes, and other things to make them stand out. Most such programs also read an XML schema and ensure that you only insert elements and attributes that are permitted by the OSIS schema (schemas, such as the OSIS one, declare what elements and attributes are permitted where in documents of a particular kind). One free and helpful tool of this kind is jEdit, which runs on most platforms. It can be set up to know about many kinds of files, including XML files, and OSIS in particular.

With such an editor, you can see or print a basic formatted view by using almost any Web browser. Later in this manual are instructions for setting up an OSIS file with a style sheet (generally in CSS) so that typical browsers can deal with it.

There are also more word-processor-like XML editors, which primarily show a formatted view defined by some style sheet. These are mainly commercial. XML Spy is one such tool (see http://www.xmlspy.com/); XMetal (see http://www.corel.com/servlet/Satellite?pagename=Corel/Products/productInfo&id=1042152754863) is another.

For high-end layout and typesetting from XML source files, usually a stylesheet language called XSL-FO is used. Two of the more popular commercial XSL-FO solutions are 3b2 (see http://www.3b2.com/), and Antenna House (see http://www.antennahouse.com/). Non-XML-based composition systems such as Quark™ and TeX generally have ways to import XML, but using them for XML composition requires substantial expertise and effort.

4. Your First OSIS Document

Like HTML documents, an OSIS document starts with a header, and then goes on to the actual text content. The header identifies the file as being XML, and as using the OSIS schema. It also provides places to declare a bibliographical description of the work and of any other works cited; and a place to record a history of editing changes. Here is a short, but valid, OSIS document:

<?xml version="1.0" encoding="UTF-8"?>
<osis xmlns="http://www.bibletechnologies.net/2003/OSIS/namespace"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.bibletechnologies.net/2003/OSIS/namespace osisCore.2.0.1.xsd">
<osisText osisIDWork="thisWork" osisRefWork="bible" xml:lang="en-US">
<header>
<work osisWork="thisWork">
<title>Contemporary English Version</title>
<type type="OSIS">Bible</type>
<identifier type="OSIS">Bible.en.CEV.1995</identifier>
<rights type="x-copyright">Copyright 1995 American Bible Society</rights>
<scope>Esth.1.1-Esth.1.4</scope>
<refSystem>Bible</refSystem>
</work>
<work osisWork="bible">
<type type="OSIS">Bible</type>
<refSystem>Bible</refSystem>
</work>
</header>
<div type="section" scope="Esth.1.1-Esth.1.4">
<title>Queen Vashti Disobeys King Xerxes</title>
<p>
<verse sID="Esth.1.1-Esth.1.2" osisID="Esth.1.1 Esth.1.2" n="1-2"/>
King Xerxes of Persia lived in his capital city of Susa and ruled one
hundred twenty-seven provinces from India to Ethiopia.
<verse eID="Esth.1.1-Esth.1.2"/>
<verse sID="Esth.1.3" osisID="Esth.1.3"/>
During the third year of his rule, Xerxes gave a big dinner for all
his officials and officers. The governors and leaders of the provinces
were also invited, and even the commanders of the Persian and Median
armies came.
<verse eID="Esth.1.3"/>
<verse sID="Esth.1.4" osisID="Esth.1.4"/>
For one hundred eighty days he showed off his wealth and spent a lot
of money to impress his guests with the greatness of his kingdom.
<verse eID="Esth.1.4"/>
</p>
</div>
</osisText>
</osis>

5. XML and OSIS declarations

The first several lines of any OSIS document will generally be identical to the first four lines of the sample above.

The first line above identifies the document as being XML; this is required in exactly the form shown, and enables computers to identify how to process the rest of the document.

The second through fourth lines are a very long start-tag for the outermost OSIS element, which is called "osis". All elements in an OSIS document must be declared within the OSIS namespace. There are two ways to achieve this; apart from remembering to pick one of the two following methods, that is all you need to remember about namespaces to start encoding texts using OSIS 2.0.

OSIS Namespace, Method 1: Copy the following lines just after <?xml version="1.0" encoding="UTF-8"?>:

<osis xmlns="http://www.bibletechnologies.net/2003/OSIS/namespace"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.bibletechnologies.net/2003/OSIS/namespace osisCore.2.0.1.xsd">

OSIS Namespace, Method 2: Copy the following lines just after <?xml version="1.0" encoding="UTF-8"?>:

<osis:osis xmlns:osis="http://www.bibletechnologies.net/2003/OSIS/namespace"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.bibletechnologies.net/2003/OSIS/namespace osisCore.2.0.1.xsd">

Note that with the second method every element name must carry the osis: prefix (for example, <osis:osisText>), and the last closing tag must be </osis:osis>. The first method is simpler, but both are legitimate.
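
Under XML namespace rules, binding a prefix instead of a default namespace means that each OSIS element, not only the outermost one, is written with that prefix. The following is a minimal sketch of the second method, abbreviated from the sample in section 4; it is meant to show the prefixing only and is not a complete, validated document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<osis:osis xmlns:osis="http://www.bibletechnologies.net/2003/OSIS/namespace"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.bibletechnologies.net/2003/OSIS/namespace osisCore.2.0.1.xsd">
<!-- every element name carries the osis: prefix; attributes are written as in the first sample -->
<osis:osisText osisIDWork="thisWork" osisRefWork="bible" xml:lang="en-US">
<osis:header>
<osis:work osisWork="thisWork">
<osis:title>Contemporary English Version</osis:title>
</osis:work>
</osis:header>
<!-- div, p, verse and other content elements would follow here, each with the prefix -->
</osis:osisText>
</osis:osis>
```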

At this point, the OSIS document has begun. This sample is a single document rather than a collection of documents, so the next element opened is osisText:

<osisText osisIDWork="CEV" osisRefWork="Bible" xml:lang="en">

Every osisText needs to supply an osisIDWork attribute and value. The value will generally be the short name of what is being encoded, in this case the Contemporary English Version, or CEV. The short name is defined in the work declaration for the work, described later. If the text has more than one work element, the one that identifies the work being encoded should come first. The osisIDWork value sets things up for some of the elements nested within osisText. One such element is work, which requires an osisWork attribute whose value must be the same as the value of the osisIDWork attribute on osisText (see line 7 of the sample). Other elements use or require an osisID attribute, which refers back to the osisIDWork attribute of osisText (see lines 19 and 21 of the sample).

Every osisText also needs to specify what reference or versification scheme any osisRefs within it refer to. This may or may not be the same work. Depending on how finely you distinguish things, there are several major versification traditions, and countless fine-grained variations. For the present, we identify and reserve names for these major traditional reference systems:

  • NRSVA New Revised Standard Version with Apocrypha
  • NA27 Nestle-Aland, 27th Edition of the Greek New Testament
  • KJV King James Version or Authorized Version (AV)
  • LXX Septuagint
  • MT Masoretic Text. Hebrew tradition varies in several respects, the best known being that it numbers what is given as a title for Psalms in most English translations as verse 1, and the beginning of the psalm in such a translation as verse 2.
  • SamPent The Samaritan Pentateuch, which uses a quite different numbering system.
  • Synodal Russian
  • Vugl Vulgate
  • Loeb This system is used for most classical literature, though many major works have other systems as well.

OSIS is developing a schema for declaring versification systems formally, and for declaring some systems in terms of others. This will enable programs to map between systems. However, at this time we merely reserve the names above for some systems we know to be substantially different and important.

6. Canonical vs. non-canonical parts of a work

The element osisText has one other important attribute that is not shown above. It is called "canonical", and always has a value of "true" or "false". When true, it asserts that the content is a part of the text being encoded. For example, the "text" of the Bible includes the content of books, chapters, and verses but does not include notes, section-headings added by editors or translators, etc.

The canonical attribute is available on all elements. Its value inherits in the same manner as xml:lang. Because of this inheritance, encoders will seldom need to make this attribute explicit. On osisText this attribute has a default value of "true", while on header, note, and reference that default is overridden by a value of "false".

In books other than the Bible, a similar distinction holds: the text proper of Herodotus' Histories must be contained in elements with canonical="true", while notes, header data, and the like must not.

The meaning of this attribute is limited. It must not be used to encode interpretive or theological judgements about canonicity. For example, encoders who include the apocryphal books of the Bible, or the alternate longer ending to the Gospel of Mark, must mark them as canonical (whether by default or explicitly). This is simply because they are part of the text being encoded. Users of a text are never justified in drawing conclusions about a translator's, editor's, or encoder's position on questions of inspiration or other theological questions based on how they set the canonical attribute, because the attribute does not mean that.

In most cases use of the canonical attribute is straightforward, and we expect that the default values will almost always produce the intended result. However, there will arise truly difficult cases: for example, one may be encoding an ancient text with annotations of its own. In that case those notes would be canonical, while any added by the current editor would not be. In such cases, the practice chosen and its rationale should be described in the work's documentation.
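
As a minimal sketch of the difficult case just described, the fragment below marks an annotation that belongs to the ancient text as canonical, and shows a modern editor's note at its default value. The note wording and the placement of both notes inside a paragraph are illustrative assumptions, not taken from any real edition.

```xml
<p>
<!-- annotation transmitted as part of the ancient text itself:
     explicitly overrides the note default of canonical="false" -->
<note canonical="true">Text of an annotation found in the ancient source.</note>
<!-- note added by the current editor: canonical="false" is already the
     default for note, so the attribute is spelled out only for clarity -->
<note canonical="false">Text of a comment by the current editor.</note>
</p>
```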

7. The OSIS text header

The first element within every osisText must be a header. The header declares various works (including the work being encoded and any that are being referenced), and provides a place to keep a revision history of the text.

7.1. The Revision Description

To record changes or edits to the text, authors and editors are encouraged to insert a change element every time significant editing is done. Each change element should contain a date element which says when those edits were completed, in the form

yyyy-mm-ddThh:mm:ss

Note that all fields must have exactly the number of digits shown (4-digit year, 2-digit month, etc.). It is permissible to omit the time and the preceding "T", thus giving just a date. For example, December 25th of 1999 CE would be:

1999-12-25

A date element in the revision description is followed by any number of p (paragraph) elements, in which the changes made are summarized. The person responsible for making the changes should also be identified, using the resp attribute on the change element.

Recommended practice is that more recent change elements appear earlier in the document. That is, entries should occur in reverse chronological order. For example:

<change><date>2003-09-11</date>
<p>sjd: Filling in the gaps. Adding some info for 2.0 as defined
at the Calvin College meetings.</p>
</change>
<change><date>2003-07-01</date>
<p>sjd: Annotated alpha list of elements. Reworked reference and
work sections and added type, scope, and explanations of type and
subtype for work. Explained more elements and attributes.</p>
</change>
<change><date>2003-06-17</date>
<p>sjd: Wrote conformance section. Added lists of elements and
attributes, USMARC list. Inserted placeholders for doc on all element
types. Got document back to XML WF. Wrote CSS stylesheet.</p>
</change>
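
The entries above name the person responsible inside the paragraph text. A sketch of the first entry using the resp attribute described earlier, with a date that includes the optional time, might look as follows. The form of the resp value (here the initials sjd) and the time of day shown are assumptions made for illustration.

```xml
<change resp="sjd"><date>2003-09-11T14:30:00</date>
<p>Filling in the gaps. Adding some info for 2.0 as defined
at the Calvin College meetings.</p>
</change>
```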

7.2. Work Declarations

A work element is a declaration. It provides information comparable to that found on the title page of a printed work, using the fields defined by the Dublin Core Initiative (see http://dublincore.org/).

The work element serves two purposes. The work element in the header with an osisWork attribute that matches the osisIDWork in the osisText element identifies the work in which it occurs -- much like the title page in a printed work. For example:

<osisText osisIDWork="CEV" osisRefWork="Bible" xml:lang="en">
<header>
<work osisWork="CEV">

Note that the match between osisIDWork="CEV" in osisText and osisWork="CEV" in the work element links this osisText to this particular work element.

Subsequent work elements identify other works -- much like a citation in a footnote or bibliography in a printed work. Each assigns a local short name to the work it describes, using the osisWork attribute. Works so declared can then be referred to from osisIDs or osisRefs throughout the text. For Bibles, the short name should generally be the accepted acronym or abbreviated form of the translation's name (some standard version abbreviations are listed in an appendix). No periods, hyphens, spaces, or colons are allowed in short names.

Note: This mechanism of declaring a short name and using it later as a prefix, is very similar to the XML Namespace mechanism defined at http://www.w3.org/TR/xml-names11/.
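
A sketch of how a declared short name might then be used as a prefix: the fragment declares a second work with the short name CPV and cites it from the text. The work:reference form with a colon, and the use of the reference element for the citation, are assumptions based on general OSIS reference conventions rather than on anything stated in this section.

```xml
<header>
<work osisWork="CEV">
<title>Contemporary English Version</title>
</work>
<work osisWork="CPV">
<title>Cotton Patch Version of Luke and Acts</title>
</work>
</header>
<!-- later, in the text: the prefix CPV selects the second work declared above -->
<p>Compare <reference osisRef="CPV:Luke.1.1">Luke 1:1 in the Cotton Patch Version</reference>.</p>
```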

7.3. The Dublin Core

Each work element describes a single publication using several pieces of information, primarily title, creator, date, publisher, identifier and language. All of the standard "Dublin Core" fields may be used, plus a few OSIS-specific additions (further information on the Dublin Core system may be found at http://www.dublincore.org). All of the Dublin Core fields may be repeated as necessary, but must be encoded in the order shown here. For example:

<work osisWork="EG">
<title>Egyptian Grammar</title>
<creator role="aut">Alan Gardiner</creator>
<contributor role="dte">Francis Llewellyn Griffith</contributor>
<date event="original" type="gregorian">1927</date>
<date event="eversion" type="gregorian">2003</date>
<type type="x-grammar">Grammar</type>
<publisher>Griffith Institute, Ashmolean Museum, Oxford</publisher>
<language type="ISO-639">EN</language>
<language type="Ethnologue">EG-ancient</language>
<identifier type="ISBN">0900416351</identifier>
<identifier type="LCCN">95230980</identifier>
</work>
<work osisWork="CPV">
<title>Cotton Patch Version of Luke and Acts: Jesus' Doings and
the Happenings</title>
<creator role="aut">Clarence Jordan</creator>
<date event="original" type="gregorian">1969</date>
<date event="eversion" type="gregorian">2003</date>
<type type="x-bible">Bible</type>
<publisher>Association Press
<name type="place">New York, NY</name></publisher>
<language type="ISO-639">EN</language>
<identifier type="ISBN">0809617250</identifier>
<identifier type="LCCN">69-18840</identifier>
<scope osisRef="Luke" />
<scope osisRef="Acts" />
</work>

7.3.1. title

A title element must be provided in the work element and contain the main title of the work. Additional titles may also be specified, using the type attribute to identify them as main, sub, part, monographicSeries, or another kind of title. No OSIS-specific types are established for this type attribute.
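
As a sketch of additional titles, the long title of the second example above could be divided using the type attribute. Treating the part after the colon as a subtitle is an assumption made for illustration.

```xml
<work osisWork="CPV">
<title type="main">Cotton Patch Version of Luke and Acts</title>
<title type="sub">Jesus' Doings and the Happenings</title>
</work>
```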

7.3.2. creator

The creator element is used to specify the person(s) or organization(s) who are primarily responsible for the intellectual content of a work. The role attribute must specify the particular role the primary responsible party played. The most common values would be aut (author), edt (editor), cmm (commentator), trl (translator). A short list of such codes appears in Appendix D: Contributor Roles, with the complete set being found in Appendix G: USMARC Relator Codes. This list covers an enormous range, and it should seldom if ever be necessary to use a code not from this list.

7.3.3. contributor

Many people may contribute to a work in roles other than the primary role listed under creator. They should be listed using the contributor element. Their specific role should be recorded in the role attribute of their contributor element. See Appendix G: USMARC Relator Codes for the complete list of role codes provided by the USMARC organization.

7.3.4. date

Date elements in the work element record significant dates in the production or publication process. Use the event attribute (as in the examples above) to identify the particular date contained in each of the date elements. The defined values are:

  • original The original publication date of the first edition
  • edition The date of publication of the referenced or source edition
  • imprint The printing date of the referenced or source edition
  • eversion The revision date of the present electronic edition

The type attribute, by contrast, identifies the calendrical system in which the date is expressed, from the list: Chinese, Gregorian, Islamic, ISO, Jewish, and Julian. At this time, OSIS only defines a syntax for Gregorian dates: yyyy-mm-dd, as described under the revision description above. See the later section on "Date Formats".

7.3.5. publisher

The publisher element in the work element is used to identify the publisher of a particular work. If a work was published by more than one publisher and that publication record needs to be recorded, use multiple publisher elements and distinguish them using the type attribute. The description given in this attribute is not constrained, but it is suggested that values that tie a publisher to a particular edition, such as <publisher type="1848Edition">, should be used. For cases where full identification of a publication history is essential, use of multiple work elements is suggested.
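
A sketch of a work with two publishers distinguished by edition follows. The short name, the title, both publisher names, and the second edition label are placeholders invented for illustration and do not describe a real publication.

```xml
<work osisWork="ExampleWork">
<title>Example Title</title>
<!-- placeholders only: neither publisher nor edition label refers to a real record -->
<publisher type="1848Edition">First Example Publisher</publisher>
<publisher type="1902Edition">Second Example Publisher</publisher>
</work>
```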

7.3.6. language

A language element must be provided for each language used substantially in a work. The language may be specified using ISO 639, ISO 639-2, or SIL Ethnologue codes. The type attribute must be set to IANA, IETF, ISO-639-1, ISO-639-2, ISO-639-2-B, ISO-639-2-T, LINGUIST, or SIL. In the rare case that none of these is sufficient, a prose description should be inserted in the element and the type attribute set to other.
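
A sketch using the type values listed above, for a hypothetical work written in English that quotes Ancient Greek at length and also uses a language with no suitable code. The choice of languages and the prose description are assumptions for illustration.

```xml
<language type="ISO-639-1">en</language>
<language type="ISO-639-2">grc</language>
<!-- fallback when no listed code system covers the language -->
<language type="other">A prose description of a language that has no code in the systems above</language>
```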

7.3.7. type

The nature or genre of the content of the resource. This element includes terms describing general categories, functions, genres, or aggregation levels for content. Dublin Core's recommended best practice is to select a value from a controlled vocabulary (for example, the DCMI Type Vocabulary -- see http://dublincore.org/documents/dcmi-type-vocabulary/). OSIS does not provide such a controlled vocabulary at this time. If you encode this element, the controlled vocabulary in use should be identified via the type attribute (for example, <type type="DCMI">). To describe the physical or digital manifestation of the resource, use the format element instead.

Note that the Dublin Core type element is distinct from the OSIS type attribute (the latter can occur on any OSIS element, to distinguish relevant subdivisions of the type).
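
A sketch of a type element whose value is taken from the DCMI Type Vocabulary, with the vocabulary named in the type attribute as suggested above. "Text" is one of the terms in that vocabulary; whether it is the best fit for a given work is a judgment for the encoder.

```xml
<type type="DCMI">Text</type>
```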

7.3.8. identifier

The identifier elements provide one or more formal identifiers for the work. The values to be entered for the type attribute on the identifier element are listed below, each followed by its meaning. Note that these values must be entered exactly as shown. XML is case sensitive, that is to say, DEWEY is not equal to Dewey. Enter the latter one and you will get an error message.

  • DEWEY Dewey Decimal System
  • DOI Digital Object Identifier
  • ISBN International Standard Book Number
  • ISSN International Standard Serial Number
  • LCCN Library of Congress Control Number
  • OSIS Open Scriptural Information Standard
  • SICI Serial Item and Contribution Identifier
  • URI Uniform Resource Identifier
  • URL Uniform Resource Locator
  • URN Uniform Resource Name

ISBN and LCCN numbers must be recorded without spaces or hyphens. ISBNs must contain ten digits (that is, they must include the final check digit).

We strongly recommend the assignment of an ISBN to each published work using OSIS. This number must, if available, be specified in the identifier field for the work.

The following examples show identifier elements used along with their type attribute to provide an identifier for a work, in this case, the "Cotton Patch Version of Luke and Acts" noted above:

<identifier type="ISBN">0809617250</identifier>
<identifier type="LCCN">69-18840</identifier>

Note that without the proper type attribute, a reader or computer only has a string of numbers, which could be from almost any system of identifiers. The type attribute plays an important role in making sure the information you so carefully record is understandable to others, or even to yourself after a few months have passed since you last looked at the text.

7.3.9. coverage

This element may be used to specify the spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity) to which the work applies. For example, an edition of Herodotus could be specified as Greek/Hellenic, Classical Period. Or a study of medieval Bibles could declare coverage as "medieval".
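
The two cases mentioned above might be written as follows. This is only a sketch, and each coverage element would sit inside the work element for the work it describes.

```xml
<!-- in the work declaration for an edition of Herodotus -->
<coverage>Greek/Hellenic, Classical Period</coverage>
<!-- in the work declaration for a study of medieval Bibles -->
<coverage>medieval</coverage>
```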

7.3.10. description

An account of the content of the resource.

Examples of description include, but are not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.

7.3.11. format

The physical or digital manifestation of the resource.

Typically, format may include the media-type or dimensions of the resource. Format may be used to identify the software, hardware, or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [MIME] defining computer media formats).
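
A sketch of a format element carrying an Internet Media Type. The value text/xml is a registered media type for XML documents; choosing it for an OSIS file is an assumption for illustration, not a recommendation made by this manual.

```xml
<format>text/xml</format>
```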

7.3.12. relation

A reference to a related resource.

Recommended best practice is to identify the referenced resource by means of a string or number conforming to a formal identification system.

7.3.13. rights

Information about rights held in and over the resource.

Typically, rights will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and other property rights. The rights element is informative only. Legal rights and penalties for violation of those rights vary from jurisdiction to jurisdiction. Reuse of any resource should be done only after obtaining the necessary rights and permissions or ascertaining that none is required.

7.3.14. subject

A topic of the content of the resource.

Typically, subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.

7.3.14.1. subject classification systems

The type attribute on subject allows the user to specify the classification system in which the subject entered can be found.

<subject type="ATLA">Fathers of the Church</subject>

The example above means that "Fathers of the Church" is a subject found in the listing of subjects maintained by the American Theological Libraries Association (ATLA). To assist users, an admittedly partial list of the better-known subject classification systems has been prepared by the OSIS project. Those systems, with their abbreviations for use in an OSIS-encoded text, are as follows:

  • ATLA American Theological Libraries Association
  • BILDI Biblewissenschaftliche Literaturdokumentation Innsbruck
  • DBC Dutch Basic Classification
  • DDC Dewey Decimal Classification
  • EUT Estonian Universal Thesaurus
  • FGT Finnish General Thesaurus
  • LCSH Library of Congress Subject Heading
  • MeSH Medical Subject Headings
  • NLSH National Library Subject Headings (National Library of Poland)
  • RSWK Regeln für den Schlagwortkatalog
  • SEARS Sears List of Subject Headings
  • SOG Soggettario
  • SWD_RSWK Swiss National Library
  • UDC Universal Decimal Classification
  • VAT Vatican Library

For classification systems not listed, insert the classification system with a leading "x-" in the type attribute and notify the OSIS team if that system should be added in a future revision of the schema.

7.3.14.2. source

A reference to a resource from which the present resource is derived.

The present resource may be derived from the source resource in whole or in part. Recommended best practice is to identify the referenced resource by means of a string or number conforming to a formal identification system.

7.3.14.3. type

The nature or genre of the content of the resource.

Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the DCMI Type Vocabulary [DCT1]). To describe the physical or digital manifestation of the resource, use the format element.

7.3.15. Non-Dublin Core Elements and Attributes in the Work Declaration

7.3.15.1. scope

The scope element(s) must have an osisRef attribute, which defines what part of the titled work occurs in this electronic edition. For example, an edition may consist of only the New Testament and Psalms, or of only a single book. Contiguous ranges may be specified using the hyphen notation described later for osisRefs in general; discontiguous ranges must be specified by including multiple scope element(s), as shown in the second example above. These should be, but are not required to be, in canonical order.

7.4. Identifying a Work given a work declaration element

The six elements already described are the primary means of identifying a referenced work.

If a publication matches all of the above elements within work, it is presumed to be an acceptable resolution for any reference to that work as declared.

If no perfect match can be found, applications may, indeed should, attempt to fall back to the closest available publication. OSIS does not define a required method of fallback, or define what "closest" must mean in all contexts. However, one possible approach is to successively ignore particular elements in this order:

  • Identifier: because identifiers are often ambiguous. For example, hardcover and softcover editions of a book typically have different ISBNs, and occasionally publishers re-use an old ISBN for a completely different book.
  • Date: because a different imprint or edition of the same conceptual work is typically adequate. Precisely targeted links, however, may not refer to the exact location desired. Applications may wish to ignore all dates except for the original publication date.
  • Publisher: because several publishers may publish a given work (particularly older works), publishers may change name, etc.
  • Language: Accepting a publication that does not match in language is a substantial concession. However, some variations of language are greater than others. For example, some modern Bible translations are available in separate American and British English versions, and substituting one for the other is not unreasonable. This is particularly true because translations generally use translated titles as well, and so if the language is not closely related, the title will probably not match either. Applications may wish to encode some knowledge of language and dialect similarities to implement more sophisticated fallback.
  • Creator: because some authors have multiple forms of name: St. Augustine vs. Augustine of Hippo vs. Augustine. The Bible Technologies Group intends to develop an authority list of normative name-forms for relevant authors, and once such a list is available, using it will help to avoid such problems. As with other elements, more sophisticated applications may wish to attempt some kind of approximate matching in order to achieve better fallback.
  • Title: the final item to discard is probably title. If a work's title differs, it is probably a different work, or at least a translation into a non-close language. On the other hand, some titles have been used by multiple authors, and so a match on title alone should be considered suspect.

Arguments can easily be made for a variety of other fallback methods. For example, if the identifier element matches, the work is probably right, even though an identifier mismatch is not good evidence that the work is wrong.
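The successive-ignore approach listed above can be pictured with a minimal Python sketch. It is only an illustration: the dictionary representation of a work declaration, the function name find_closest, and the sample records are all assumptions made here, and OSIS itself prescribes none of them.

```python
# Elements in the order they are ignored, following the list above.
FALLBACK_ORDER = ["identifier", "date", "publisher", "language", "creator", "title"]


def find_closest(declared, available):
    """Return (publication, ignored) for the closest acceptable publication.

    `declared` and every entry of `available` are plain dicts keyed by
    element name; this representation is an assumption of the sketch.
    """
    # Round 0 demands a perfect match; each later round ignores one more
    # element, so the last round compares the title alone.
    for round_number in range(len(FALLBACK_ORDER)):
        compared = FALLBACK_ORDER[round_number:]
        for publication in available:
            if all(publication.get(name) == declared.get(name) for name in compared):
                return publication, FALLBACK_ORDER[:round_number]
    return None, list(FALLBACK_ORDER)


# Hypothetical records, invented purely for illustration.
declared = {"title": "Confessions", "creator": "Augustine", "language": "en",
            "publisher": "Example Press", "date": "1998", "identifier": "x-id-1"}
available = [{"title": "Confessions", "creator": "Augustine", "language": "en",
              "publisher": "Another Press", "date": "2001", "identifier": "x-id-2"}]

match, ignored = find_closest(declared, available)
print(ignored)  # ['identifier', 'date', 'publisher']
```

Returning the list of ignored elements alongside the match lets an application decide for itself whether the fallback went too far, for example by treating a title-only match as suspect.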

7.5. Date formats

All dates in the header and in attributes should be in this standard format, which is based on IETF RFC 3339. However, it uses period rather than colon as the field separator (for consistency with other OSIS types), and adds features to allow for dates BCE, for approximate dates, for date ranges, for yearless dates (as used in many daily devotionals), for weekly dates, and for named times of day (such as used in many prayer books). There are 3 standard date formats; the prefixes that identify them are reserved, and may not be redefined via the refSysId attribute of any work element:

  • yearly:yyyy-mm-ddThh.mm.ss

    Any number of fields may be left off from the right end; for example, if the seconds are dropped (along with the preceding period), the time refers to the entire minute specified; if the entire time section is left off (along with the preceding "T"), the string refers to the entire day.

    The year must always have 4 digits. However, the year may be entirely omitted to indicate dates that apply to any year, such as in a book of 365 daily readings.

    To indicate years before the common era, add an underscore ("_") before the first digit of the year (immediately following the colon). A hyphen would be preferable, but it is already in use to indicate ranges in osisRefs.

    The entire date/time string (possibly including a leading underscore) may be preceded by "~", indicating that the time is approximate. No means is provided to express just how approximate a time may be.

  • weekly:n

    When readings or other materials are specified as being for particular days of the week, this form must be used. The 'n' value may range from 1 to 7; 1 indicates Monday, in accordance with ISO 8601:2000.

As an alternative to quantitative times, a small set of named times is provided, which can be specified in place of the entire (post-"T") time section (the "T" itself remains). For example:

yearly:06-04T~(Vespers)

would be the identifier for a prayer, reading, or other work to be used at Vespers on June 4 of any year. The named times (which are case-sensitive) include: Vigils, Matins, Lauds, Terce, Sext, None, Vespers, Compline; Sunrise, Sunset; Morning, Afternoon, Evening, Night; AM, PM; Fajr, Zuhr, _Asr, Maghhrib, _Isha, Lail, Dzuha, _Id.

Some works will be primarily organized by dates and times: for example, lectionaries, daily devotionals, prayer books, historical time lines, etc. In such works, use the osisID attribute to identify the retrievable portions; the value should be the applicable time in one of the formats just shown.

Typically, such works are organized in chronological order of the times specified; however, OSIS does not impose that requirement.
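As a rough illustration of the yearly and weekly formats described in this section, the following Python sketch splits a date string into its fields. The function names and the dictionary it returns are assumptions made for this example; the sketch handles only the numeric forms and deliberately rejects named times and ranges.

```python
import re


def parse_yearly(value):
    """Split a numeric 'yearly:' date into its fields (missing fields are None)."""
    prefix = "yearly:"
    if not value.startswith(prefix):
        raise ValueError("not a yearly date: " + value)
    rest = value[len(prefix):]
    approximate = rest.startswith("~")   # "~" marks an approximate time
    if approximate:
        rest = rest[1:]
    bce = rest.startswith("_")           # "_" marks a year before the common era
    if bce:
        rest = rest[1:]
    date_part, _, time_part = rest.partition("T")
    date_fields = date_part.split("-") if date_part else []
    time_fields = time_part.split(".") if time_part else []
    result = dict.fromkeys(["year", "month", "day", "hour", "minute", "second"])
    result.update(approximate=approximate, bce=bce)
    # The year always has 4 digits, so a leading 2-digit field means "no year".
    if date_fields and re.fullmatch(r"\d{4}", date_fields[0]):
        result["year"] = int(date_fields.pop(0))
    if len(date_fields) > 2 or len(time_fields) > 3:
        raise ValueError("too many fields: " + value)
    pairs = list(zip(["month", "day"], date_fields))
    pairs += list(zip(["hour", "minute", "second"], time_fields))
    for name, field in pairs:
        if not re.fullmatch(r"\d{2}", field):
            raise ValueError("bad %s field in %s" % (name, value))
        result[name] = int(field)
    return result


def parse_weekly(value):
    """Return the day number of a 'weekly:n' date (1 = Monday ... 7 = Sunday)."""
    match = re.fullmatch(r"weekly:([1-7])", value)
    if match is None:
        raise ValueError("not a weekly date: " + value)
    return int(match.group(1))


print(parse_yearly("yearly:06-04")["month"])   # 6
print(parse_yearly("yearly:~_0586")["year"])   # 586
print(parse_weekly("weekly:1"))                # 1
```

Because the year must always have four digits and every other field has two, the sketch can tell a yearless date such as 06-04 from a bare year without any extra marker.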

8. Title Pages

In order to make the encoding of title pages as found in standard works easier, OSIS 2.0 introduced the titlePage element. This element may contain the following elements from the header: title, contributor, creator, subject, date, description, publisher, type, format, identifier, source, language, relation, coverage, which are explained in the material on the header section. Three additional elements are allowed: figure, milestone, and p. Due to the complexity of title pages, all of these elements may occur in any order inside the titlePage element.

The titlePage element can occur within the osis, osisText, and osisCorpus elements.

Users just starting with OSIS should use a minimal header and a simple titlePage element until they have gained some experience with text encoding and with determining what is, or perhaps more importantly, what is not useful to have encoded in a work.

9. Basic Elements

While book, chapter, and verse numbers are a familiar and useful way of referring to locations in the Bible, they often conflict with the boundaries of parables, stories, genealogies, paragraphs, quotations, and other important units of understanding. Even to print a well-formatted Bible edition, and much more to support high-end search, annotation, and other capabilities, these meaningful units must also commonly be marked.

It is possible to encode a Bible using only book, chapter, and verse markup. However, most encoders also want to represent sections, paragraphs, quotations, and so on. Higher-level structures are tagged as div, for "division", with a type attribute to specify the particular significance. div elements can occur within other div elements to any number of levels. The first and outermost div should occur immediately after the end of the header. For example,

<div type="book" osisID="Gen">
<head>Genesis</head>
<chapter osisID="Gen.1">
<head>1</head>
<verse osisID="Gen.1.1">In the beginning,...</verse>
<verse osisID="Gen.1.2">The earth was formless and void...</verse>
...
</chapter>
</div>

The div element is used for many top-level components, and so makes heavy use of the type attribute. The pre-defined types include the most common major divisions found in present-day Bibles and related works:

acknowledgement, afterword, annotant, appendix, article, back, body, book, bookGroup, chapter, colophon, commentary, concordance, coverPage, dedication, devotional, entry, front, gazetteer, glossary, imprimatur, index, introduction, majorSection, map, outline, paragraph, part, preface, section, subSection, titlePage.

The main body of a Bible will typically consist of div elements of type="bookGroup" (such as each Testament, the Apocrypha, and perhaps smaller groups such as the Pentateuch, the Minor Prophets, etc.), plus any front and back matter divisions (the selection of which varies greatly between editions).

Within each bookGroup div, there will typically be book divs corresponding to each included canonical or deutero-canonical book. Some books are divided into majorSections (such as the sub-books in Psalms), sections (typically topical divisions with headings), and subSections (occasional minor divisions within sections). A specific chapter element is provided and encouraged, though div type="chapter" is also permissible.

Below this point typical texts switch from successive levels of div elements, to more specific markup such as paragraphs, lists, quotations, inscriptions, and the like. Also at this level, the markup begins commonly to interact with verse markup.

Use of the types defined for div is mandatory when a provided type is applicable. For example, a colophon must be marked up as <div type='colophon'>. If types not provided are needed, they may be added but must begin with "x-", to distinguish them from OSIS-standard values.
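A simple check of this rule might look like the following Python sketch. The set of names is copied from the list of pre-defined types above; the function names are assumptions of this example, and the sketch assumes element names without an XML namespace, so tag names may need adjusting for real files.

```python
import xml.etree.ElementTree as ET

# Pre-defined div types, copied from the list given earlier in this section.
PREDEFINED_DIV_TYPES = {
    "acknowledgement", "afterword", "annotant", "appendix", "article", "back",
    "body", "book", "bookGroup", "chapter", "colophon", "commentary",
    "concordance", "coverPage", "dedication", "devotional", "entry", "front",
    "gazetteer", "glossary", "imprimatur", "index", "introduction",
    "majorSection", "map", "outline", "paragraph", "part", "preface",
    "section", "subSection", "titlePage",
}


def is_valid_div_type(value):
    """Accept a pre-defined type, or a user-defined one that begins with 'x-'."""
    return value in PREDEFINED_DIV_TYPES or (value.startswith("x-") and len(value) > 2)


def unknown_div_types(xml_text):
    """List the type values of div elements that break the rule, in document order."""
    root = ET.fromstring(xml_text)
    return [div.get("type") for div in root.iter("div")
            if div.get("type") is not None and not is_valid_div_type(div.get("type"))]


sample = ('<osisText><div type="book"><div type="x-sidebar"/>'
          '<div type="sidebar"/></div></osisText>')
print(unknown_div_types(sample))  # ['sidebar']
```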

Such markup forms the primary backbone of an OSIS document. Chapter and verse elements are important (particularly for retrieval), but considered to be an overlay onto the more linguistic or thematic structure. Therefore, so long as verses or chapters do not cross the boundaries of other elements, they may be expressed in the normal fashion (NIV):

<chapter osisID="Mark.10">
<head>Mark Chapter 10</head>
<div type="section"><head>Divorce</head>
<verse osisID="Mark.10.1">Jesus then left that place and went into
the region of Judea and across the Jordan. Again crowds of people
came to him, and as was his custom, he taught them.
</verse>
<verse osisID="Mark.10.2">Some Pharisees came and tested him by
asking, "Is it lawful for a man to divorce his wife?"
</verse>
<verse osisID="Mark.10.3">"What did Moses command you?" he replied.
</verse>
<verse osisID="Mark.10.4">They said, "Moses permitted a man to write
a certificate of divorce and send her away."
</verse>
<verse osisID="Mark.10.5">"It was because your hearts were hard that
Moses wrote you this law," Jesus replied. </verse>
<verse osisID="Mark.10.6">"But at the beginning of creation God 'made
them male and female.' </verse>
<verse osisID="Mark.10.7">'For this reason a man will leave his
father and mother and be united to his wife,</verse>
<verse osisID="Mark.10.8">and the two will become one flesh.' So they
are no longer two, but one. </verse>
<verse osisID="Mark.10.9">Therefore what God has joined together, let
man not separate."</verse>
<verse osisID="Mark.10.10">When they were in the house again, the
disciples asked Jesus about this. </verse>
<verse osisID="Mark.10.11">He answered, "Anyone who divorces his wife
and marries another woman commits adultery against her. </verse>
<verse osisID="Mark.10.12">And if she divorces her husband and
marries another man, she commits adultery." </verse>
</div>
...
</chapter>

10. Simple paragraphing, quotes, and notes

Paragraphs (element p), quotations (element q), and other grouping elements can be inserted around groups of verses, as shown below. Likewise, note elements can be inserted where needed. The paragraph need not give an osisID for the set of verses it contains, since they are typically provided on the verse elements themselves:

...
<p>
<verse osisID="Esth.4.10">Then Esther spoke to Hathach, and gave him
a command for Mordecai: </verse>
<verse osisID="Esth.4.11"><q>All the king's servants and the people
of the king's provinces know that any man or woman who goes into the
inner court to the king, who has not been called, he has but one law:
put all to death, except the one to whom the king holds out the
golden scepter, that he may live. Yet I myself have not been called
to go in to the king these thirty days.</q> </verse>
<verse osisID="Esth.4.12">So they told Mordecai Esther's words.
</verse> </p>
<p>
<verse osisID="Esth.4.13">And Mordecai told them to answer Esther:
"Do not think in your heart that you will escape in the king's palace
any more than all the other Jews. </verse>
</p>
<p>
<verse osisID="Esth.4.14">For if you remain completely silent at this
time, relief and deliverance will arise for the Jews from another
place, but you and your father's house will perish. Yet who knows
whether you have come to the kingdom for such a time as this?"
</verse>
</p>
<p>
<verse osisID="Esth.4.15">Then Esther told them to reply to Mordecai: </verse>
<q>
<verse osisID="Esth.4.16">Go, gather all the Jews who are present in
Shushan, and fast for me; neither eat nor drink for three days, night
or day. My maids and I will fast likewise. And so I will go to the
king, which is against the law; and if I perish, I perish!
</verse></q>
</p>
<p><verse osisID="Esth.4.17">So Mordecai went his way and did
according to all that Esther commanded him.<note
type="textual">Septuagint adds a prayer of Mordecai
here.</note></verse> </p>

Notice in this example that all the paragraphs and quotations still enclose an exact number of verses; there are exceptions to this elsewhere in the Bible, that need special handling as explained later.

When tagging quotations, do not also include quotation marks. They will be generated in the typesetting or display process. This is important for several reasons. First, if some people use q, some use punctuation marks, and some use both, anyone processing OSIS texts will have to check every text and account for all the variations. This is expensive and time-consuming: it will make the Bibles cost more (to someone) and be delivered later. Second, punctuation for quotes differs around the world, so any given quotation mark may be meaningless to other communities. In Spanish, for example, there are special rules about how to mark quotes that continue after an interruption; such cases can be distinguished by adding a type attribute to the q element, with values such as initial, medial, and final.
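To illustrate how a display process might generate the marks, here is a minimal Python sketch that renders q elements with a chosen set of quotation marks, alternating between outer and inner marks for nested quotes. The style table, the function name render, and the sample verse markup are assumptions of this example, not part of OSIS.

```python
import xml.etree.ElementTree as ET

# (opening, closing) marks for outer and nested quotations; an illustrative
# table only, to be extended for whatever conventions a community uses.
QUOTE_STYLES = {
    "en": [("\u201c", "\u201d"), ("\u2018", "\u2019")],
    "de": [("\u201e", "\u201c"), ("\u201a", "\u2018")],
}


def render(element, style, depth=0):
    """Return the text of `element`, adding marks around each q element."""
    is_quote = element.tag == "q"
    marks = QUOTE_STYLES[style]
    open_mark, close_mark = marks[depth % len(marks)] if is_quote else ("", "")
    inner_depth = depth + 1 if is_quote else depth
    parts = [open_mark, element.text or ""]
    for child in element:
        parts.append(render(child, style, inner_depth))
        parts.append(child.tail or "")   # text that follows the child element
    parts.append(close_mark)
    return "".join(parts)


verse = ET.fromstring(
    '<verse osisID="Mark.10.3"><q>What did Moses command you?</q> he replied.</verse>')
print(render(verse, "en"))
print(render(verse, "de"))
```

The same encoded verse yields English-style or German-style marks depending only on the style requested, which is the practical benefit of leaving literal quotation marks out of the text.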

Many editions of the Bible have accompanying notes, often of several distinct types. A number of predefined types, and some additional internal structure, are discussed later. It is customary to include the notes directly within the text, at the point to which they apply. This can be done via the note element, which can be placed almost anywhere. In the future, it is likely that notes will more commonly reside outside of the text, instead residing in special notes-files that can be attached (via osisRef) to any Bible edition on request.

Every note should have a type attribute to indicate its purpose; many Bible editions show different kinds of notes in different places. The pre-defined note types are listed below; they are not sharply-defined, wholly distinct categories. In addition, if none of these categories suffice, encoders may create their own so long as their names begin with "x-".

  • allusion The note explains an implicit reference the text makes to another text or concept.
  • alternative The note records an alternate possible reading of the text, whether due to ambiguity in translation or to manuscript variation.
  • background The note provides background information, such as cultural norms, explanations of geographic or other information original readers would have known, and so on.
  • citation The note cites a supporting text or further explanation of some kind.
  • crossReference The note provides a cross-reference to a related passage or other text.
  • devotional The note includes information of interest for devotional reading.
  • exegesis The note discusses a relevant point of exegesis or interpretation.
  • explanation The note explains implicit, ambiguous, or otherwise non-obvious aspects of the passage.
  • speaker [2.0] This type is intended mainly for use in sermons and other performance texts, where the performer may wish to make notes to him or herself. For example, "tell joke here".
  • study The note provides helps for a deeper study of the passage.
  • translation The note discusses an issue of translation, such as a word whose meaning is unclear in the original, or the reasons for the translator's choice of phrasing. Bible translation projects will likely use this heavily, using the subtype attribute to mark the status of each note as resolved or unresolved, the person responsible for the note, and so on.
  • variant The note records a textual variation in manuscript tradition, relevant at its location.

Sometimes a verse or chapter starts or ends in the middle of some other unit, such as a poetic line group, paragraph, quotation, or speech. In such cases an alternate form of the verse or chapter tags must be used. This usage is explained in the next section.

11. Elements that cross other elements

The normal form of an element is a start tag and an end tag: <verse>...</verse>. For handling markup that crosses boundaries, however, a special form must be used. It consists of two totally empty instances of the same element type: one to mark the starting point, and one to mark the ending point. The two empty elements identify themselves as to which is the start and which is the end, and co-identify themselves by the sID attribute (the start of the traditional element) and the eID attribute (the end of the traditional element), the values of which must match.

Empty elements are indicated in XML by a tag with "/" preceding the final ">": thus "<verse/>" rather than <verse> or </verse>. Elements used in this way are commonly called "milestones", and those particular elements in OSIS that permit this alternate encoding are thus called "milestoneable". Elements that are "milestoneable" in the OSIS schema are:

  • abbr
  • chapter
  • closer
  • div
  • foreign
  • l
  • lg
  • q
  • salute
  • seg
  • signed
  • speech
  • verse

This is particularly useful where modern translations break up verses or other traditional divisions in a Bible text. For example, a paragraph-based encoding of part of the Book of Esther would appear as follows:

<p>
<verse sID="Esth.2.7" osisID="Esth.2.7"/>Mordecai had a very beautiful cousin named Esther, whose Hebrew name was Hadassah. He had raised her as his own daughter, after her father and mother died.<verse eID="Esth.2.7"/>
<verse sID="Esth.2.8" osisID="Esth.2.8"/>When the king ordered the search for beautiful women, many were taken to the king's palace in Susa, and Esther was one of them.</p>
<p>Hegai was put in charge of all the women,<verse eID="Esth.2.8"/>
<verse sID="Esth.2.9" osisID="Esth.2.9"/>and from the first day, Esther was his favorite. He began her beauty treatments at once. He also gave her plenty of food and seven special maids from the king's palace, and they had the best rooms.<verse eID="Esth.2.9"/>
</p>

There are two things to note about the Esther example:

  • Esther 2:8 is divided by a paragraph boundary (the p element) and so must be marked using the verse element as milestones, with the sID and eID attributes linking the two milestones together.
  • Where overlapping elements are necessary, the milestoneable element technique must be used for the entire text. That is, it is an error to mark some verses in Esther with traditional verse elements (i.e., as containers) and others with milestoneable verses. The reason is quite simple: inconsistent markup is more difficult to process and makes the encoded text less useful for everyone.
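The consistency rule in the second point can be checked mechanically. The following Python sketch reports which verse styles a text uses; the function name verse_styles and the sample strings are assumptions of this example, and the sketch assumes element names without an XML namespace.

```python
import xml.etree.ElementTree as ET


def verse_styles(xml_text):
    """Return the set of verse styles used: 'container', 'milestone', or both."""
    styles = set()
    for verse in ET.fromstring(xml_text).iter("verse"):
        if verse.get("sID") is not None or verse.get("eID") is not None:
            styles.add("milestone")
        else:
            styles.add("container")
    return styles


consistent = ('<p><verse sID="Esth.2.7" osisID="Esth.2.7"/>Mordecai had a very '
              'beautiful cousin named Esther...<verse eID="Esth.2.7"/></p>')
mixed = ('<div><verse osisID="Esth.2.7">...</verse>'
         '<verse sID="Esth.2.8" osisID="Esth.2.8"/>...<verse eID="Esth.2.8"/></div>')

print(verse_styles(consistent))        # {'milestone'}
print(len(verse_styles(mixed)) > 1)    # True
```

A result containing both styles signals the kind of mixed encoding that the rule above treats as an error.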

This is equivalent to the TEI "milestone" method for marking such phenomena. It has the advantage that milestones representing a given type of element have the same name as the element, and automatically have the same attributes. Although XML itself will not detect a validation error if attributes other than eID are specified on the ending milestone, eID is specified on the starting milestone, or the start and end milestones are in the wrong order, each of these conditions is an OSIS error.
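Since an XML parser will not catch these conditions, an application has to check them itself. A minimal Python sketch of such a check follows; the function name check_milestones and the wording of the messages are assumptions of this example, and the sketch assumes element names without an XML namespace.

```python
import xml.etree.ElementTree as ET


def check_milestones(xml_text):
    """Return a list of human-readable problems with sID/eID milestones."""
    problems = []
    open_starts = set()  # (element name, sID value) pairs awaiting their end
    for element in ET.fromstring(xml_text).iter():  # visits in document order
        s_id, e_id = element.get("sID"), element.get("eID")
        if s_id is not None and e_id is not None:
            problems.append("%s has both sID and eID" % element.tag)
        elif s_id is not None:
            open_starts.add((element.tag, s_id))
        elif e_id is not None:
            extra = sorted(name for name in element.attrib if name != "eID")
            if extra:
                problems.append("end milestone %s carries extra attributes: %s"
                                % (e_id, ", ".join(extra)))
            if (element.tag, e_id) in open_starts:
                open_starts.remove((element.tag, e_id))
            else:
                problems.append("end milestone %s has no preceding start" % e_id)
    for tag, s_id in sorted(open_starts):
        problems.append("start milestone %s (%s) is never closed" % (s_id, tag))
    return problems


good = ('<p><verse sID="Esth.2.7" osisID="Esth.2.7"/>Mordecai had a very '
        'beautiful cousin...<verse eID="Esth.2.7"/></p>')
wrong_order = ('<p><verse eID="Esth.2.7"/>Mordecai had a very beautiful '
               'cousin...<verse sID="Esth.2.7" osisID="Esth.2.7"/></p>')

print(check_milestones(good))              # []
print(len(check_milestones(wrong_order)))  # 2
```

Matching on the element name as well as the sID value keeps a verse start from being closed by, say, a chapter end that happens to carry the same identifier.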

For OSIS purposes, there is no semantic difference between marking up a chapter or verse as a container using a start and end tag, versus marking it up as a "milestone pair" consisting of two empty tags.

Note: Typesetting and layout systems vary in their ability to accommodate non-hierarchical markup such as this. Fortunately, in most Bible editions the only formatting consequence of a verse element is insertion of the verse number, and perhaps insertion of a line-break; these are within the capabilities of most layout and style systems even though the verse is not a container in XML terms.

12. Special Text Types

The bulk of the remaining OSIS elements fall into a few simple classes: First, markup for special text types, such as epistles and drama. Second, generic structures such as lists, tables and glossaries (typically found in appendixes of printed Bibles). And finally, small-scale elements that mark quotations, notes, names, index entries, and the like.

12.1. Markup for epistles and similar materials

Letters, epistles, and similar texts are marked up in basically the same way as any other text. However, three special elements are available for marking portions unique to this genre:

12.1.1. salute

The salute element encloses the salutation or greeting, typically at the very beginning of a letter. It should include the whole salutation, including (if present) the "to", "from", and any following greeting or blessing. If the boundaries of a salutation are the same as the boundaries of a paragraph, section, or other unit, that unit should be placed outside, with the salute element directly within. For example (LBP):

<div type="book" osisID="1Tim">
<head>The First Epistle to Timothy</head>
<chapter osisID="1Tim.1">
<salute>
<verse osisID="1Tim.1.1">FROM: PAUL, a missionary of Jesus Christ,
sent out by the direct command of God our Savior and by Jesus Christ
our Lord -- our only hope.</verse>
<verse osisID="1Tim.1.2">To: Timothy. Timothy, you are like a son
to me in the things of the Lord. May God our Father and Jesus Christ
our Lord show you his kindness and mercy and give you great peace
of heart and mind.</verse>
</salute>
<verse osisID="1Tim.1.3">...</verse>
</chapter>
...
</div>

12.1.2. signed

The signed element surrounds the name of the author and/or amanuensis of a letter and its immediately surrounding phrase of opening or closing (if any). In Biblical epistles, it is common for the author to be named only at the beginning; this should still be marked up with the signed element.

signed may appear with or without an accompanying closer or salute element, and the name may or may not also be tagged as a name (if it is, the name should be the inner element even if it includes all the text content of the signed element). In New Testament epistles, there is not generally an obvious, final signature. However, this element may be used somewhat more broadly of a phrase or portion judged as intended to identify the writer. As shown below, the signature of an amanuensis may also be marked up in this way. For example:

  • <verse osisID="Rom.16.22"><signed>I Tertius salute you which wrote this epistle in the Lorde.</signed></verse>

    [English, Tyndale, 1525/1530]

  • <verse osisID="1Cor.16.21"><signed>I, Paul, write this greeting with my own hand.</signed></verse>

    [English, RSV]

  • <verse osisID="2Cor.1.1"><signed>Paul, an apostle of Jesus Christ by the will of God, and Timothy [our] brother, to the church of God which is at Corinth, with all the saints who are in all Achaia:</signed></verse>

    [English, Webster]

  • <verse osisID="Gal.6.11"><signed>See with what large letters I am writing to you with my own hand.</signed></verse>

    [English, RSV]

  • <verse osisID="Eph.1.1"><signed>Paul, an apostle of Christ Jesus through the will of God, to the saints that are at Ephesus, and the faithful in Christ Jesus:</signed></verse>

    [English, American Standard Version, 1901]

  • <verse osisID=""><signed>Paul, and Silvanus, and Timothy, to the church of the Thessalonians which is in God the Father and in the Lord Jesus Christ: Grace to you, and peace. </signed></verse>

    [English, RKJNT]

  • <verse osisID="1Tim.1.1"><signed>Paul, an apostle of Jesus Christ, according to the commandment of God our Savior, and of Christ Jesus our hope:</signed></verse>

    [English, Douay-Rheims Bible, Challoner Revision]

  • <verse osisID="Phm.1.1"><signed>Mimi Paulo, mfungwa kwa ajili ya Kristo Yesu, na ndugu Timotheo,</signed> ninakuandikia wewe Filemoni mpendwa, mfanyakazi mwenzetu</verse> <verse osisID="Phm.1.2">na kanisa linalokutana nyumbani kwako, na wewe dada Afia, na askari mwenzetu Arkupo.</verse>

    [Swahili NT]

12.1.3. closer

The closer element surrounds the closing portion of a letter, typically consisting of final greetings or blessing, and a signature (see signed). It is a matter of judgement just where a closer begins and ends. For example:

  • <closer><verse osisID="1John.5.21">Dear children, keep away from anything that might take God's place in your hearts. Amen. Sincerely, <signed>John</signed></verse></closer>

    [LBP]

12.1.3.1. benediction

OSIS presently provides no special markup for benedictions and blessings. Recommended practice at this time, if an encoder wishes to identify them in a text, is to use seg type="benediction". For example:

  • <verse osisID="2Cor.13.14"><seg type="benediction">The grace of the Lord Jesus Christ, and the love of God, and the communion of the Holy Spirit, [be] with you all. Amen.</seg></verse>

    [Webster]

12.2. Dramatic texts

OSIS provides two main features for marking up dramatic texts: A way to declare the list of characters, or castList; and a way to identify speeches and speakers in the body of a dramatic text.

A castList element contains a structured list of the roles, or cast, of a dramatic work. It is drawn directly from the TEI structure for the same thing. For example, in the Song of Songs, some translations may present the list of characters at the start of the book: lover, beloved, and friends. The same might be done for Job. However, these elements will be most commonly used for extra-Biblical materials, such as a play based on the Bible, or dramas in classical or other literature.

A simple example of a castList is shown below, perhaps for a dramatic re-enactment of Job:

<castList>
<castGroup>
<head>Cast of characters</head>
<castItem>
<actor>Patrick Durusau</actor>
<role>Job</role>
<roleDesc>A man of God who suffers greatly</roleDesc>
</castItem>
<castItem>
<actor>(a whirlwind)</actor>
<role>God</role>
<roleDesc>The Almighty, who permits Job's suffering, and
responds to his questions about it.</roleDesc>
</castItem>
<castItem>
<actor>(a disembodied voice)</actor>
<role>Satan</role>
<roleDesc>The instigator of Job's suffering</roleDesc>
</castItem>
<castItem>
<actor>Todd Tillinghast</actor>
<role>Eliphaz</role>
<roleDesc>The first of Job's friends to speak</roleDesc>
</castItem>
<castItem>
<actor>Chris Little</actor>
<role>Bildad</role>
<roleDesc>The second of Job's friends to speak</roleDesc>
</castItem>
<castItem>
<actor>Steve DeRose</actor>
<role>Zophar</role>
<roleDesc>The third of Job's friends to speak</roleDesc>
</castItem>
<castItem>
<actor>Troy Griffiths</actor>
<role>Elihu</role>
<roleDesc>The youngest and last of Job's friends to speak,
who was slightly less clueless than the rest.</roleDesc>
</castItem>
</castGroup>
</castList>

The castList element contains the entire cast list, and consists of one or more castGroup elements. Multiple castGroups, each with its own head, would be used if there were multiple sub-groups of the cast to be listed separately; more typically there will be only one castGroup within a castList.

At this time, castList can only occur in a work declaration, after the Dublin Core elements. Thus, if a Bible encoder wishes to include the casts of Song of Songs and of Job, they would each need to be marked as a separate castGroup within that one castList.

The castItem element contains the full information for a single character. This must include a name for the role being played, and should include a roleDesc, that is, a description of that role. It may also include the name of an actor, if the text being encoded represents a particular enactment rather than, say, a libretto or script.

In general there is no need to also encode an actor name or role name with an explicit name element, unless the encoder wishes to provide a normalized form for later reference; in that case, the name element would be placed just within the actor or role element, not surrounding it.

It is strongly recommended that each castGroup and castItem have an ID attribute. Since IDs must be unique across all element types in a document, encoders may wish to prefix certain kinds of IDs to separate them and avoid conflicts. For example, an appropriate ID for a castItem representing the Friends in Song of Songs would be "cast.friends", or perhaps "cast.song.friends".

12.3. speaker

The speaker element is used to identify the person or role that is uttering the content of an associated speech.

<div osisID="NRSV.Song.2">
<speech>
<speaker>woman</speaker>
<verse osisID="NRSV.Song.2.1">I am a rose of Sharon, a lily of the valleys.</verse>
</speech>
</div>

This is equivalent to:

<div osisID="NRSV.Song.2">
<speech who="woman">
<verse osisID="NRSV.Song.2.1">I am a rose of Sharon, a lily of the valleys.</verse>
</speech>
</div>

Either method is correct, but careful encoders will choose one and use it consistently. Other than document invalidity, nothing makes an encoded document more difficult to use than correct but inconsistent encoding.
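A processing tool that receives texts in both forms could normalize them before further work. The Python sketch below rewrites the speaker-element form into the who-attribute form; the function name normalize_speakers is an assumption of this example, the sketch only looks at a speaker that is a direct child of speech, and it assumes element names without an XML namespace.

```python
import xml.etree.ElementTree as ET


def normalize_speakers(root):
    """Move each speech's speaker child into a who attribute, in place."""
    for speech in list(root.iter("speech")):
        speaker = speech.find("speaker")      # direct children only
        if speaker is None:
            continue
        if speech.get("who") is None:
            speech.set("who", (speaker.text or "").strip())
        # Keep any text that followed the speaker element.
        children = list(speech)
        position = children.index(speaker)
        tail = speaker.tail or ""
        if position == 0:
            speech.text = (speech.text or "") + tail
        else:
            previous = children[position - 1]
            previous.tail = (previous.tail or "") + tail
        speech.remove(speaker)
    return root


doc = ET.fromstring(
    '<div osisID="NRSV.Song.2"><speech><speaker>woman</speaker>'
    '<verse osisID="NRSV.Song.2.1">I am a rose of Sharon, a lily of the '
    'valleys.</verse></speech></div>')
normalize_speakers(doc)
print(doc.find("speech").get("who"))  # woman
```

An existing who attribute is left untouched, on the assumption that an explicit attribute reflects the encoder's intent more directly than the displayed speaker label.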

12.4. speech

The speech element is used to indicate quoted direct speech. In that sense it represents a kind of quotation. However, the q element is to be used for quotations in general, where the speech element is limited to accounts of an individual making an actual speech in some kind of performance context. In general, both elements should not be applied to the same text portion. Just as with the q element, using the speech element makes quotation marks unnecessary, and they must not be used. For example:

<chapter osisID="Acts.7">
<head>Stephen's Speech to the Sanhedrin</head>
<verse osisID="Acts.7.1" sID="a71"/>Then the high priest asked him, <speech>Are
these charges true?</speech>
<verse eID="a71"/>
<verse osisID="Acts.7.2" sID="a72"/>To this he replied:
<speech>Brothers and fathers, listen to me! The God of glory appeared
to our father Abraham while he was still in Mesopotamia, before he
lived in Haran. <verse eID='a72'/>
<verse osisID="Acts.7.3" sID="a73"/>'Leave your country and your people,' God
said, 'and go to the land I will show you.'<verse eID="a73"/>
<verse osisID="Acts.7.4" sID="a74"/>So he left the land of the Chaldeans and
settled in Haran. After the death of his father, God sent him to this
land where you are now living. <verse eID="a74"/>
<verse osisID="Acts.7.5" sID="a75"/>He gave him no inheritance here, not even a
foot of ground. But God promised him that he and his descendants
after him would possess the land, even though at that time Abraham
had no child. <verse eID="a75"/>
<verse osisID="Acts.7.6" sID="a76"/>God spoke to him in this way: 'Your
descendants will be strangers in a country not their own, and they
will be enslaved and mistreated four hundred years. <verse eID="a76"/>
<verse osisID="Acts.7.7" sID="a77"/>But I will punish the nation they serve as
slaves,' God said, 'and afterward they will come out of that country
and worship me in this place.'<verse eID="a77"/>
<verse osisID="Acts.7.8" sID="a78"/>Then he gave Abraham the covenant of
circumcision. And Abraham became the father of Isaac and circumcised
him eight days after his birth. Later Isaac became the father of
Jacob, and Jacob became the father of the twelve
patriarchs.<verse eID="a78"/>
...
<verse osisID="Acts.7.53" sID="a79"/>you who have received the law that was put
into effect through angels but have not obeyed it.
<verse eID="a79"/>
</speech>
...</chapter>

Note that in this example the high priest's short speech in verse 1 is marked up as a normal container element with normal start- and end-tags, as is Stephen's reply. Note also that all the verse boundaries have been represented with milestoneable verse elements. The reason is simple: if the encoding uses containers for most verses and switches to milestones only on occasion (here, because Stephen's speech starts inside a verse), the file becomes very difficult to process reliably. When a conflict arises between the scope of chapter/verse units and other units, the chapter/verse units give way by being represented as milestones. If a conflict arises between two other units (say, a quote that encompasses part but not all of each of two paragraphs), it is left to the encoder's discretion which of them is represented via milestones.
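
As a minimal sketch of the second case, suppose a quotation begins partway through one paragraph and ends partway through the next. Here the paragraphs are kept as containers and the quotation gives way. This assumes that q accepts the same sID and eID milestone attributes shown for verse above; the value "q1" and the sample wording are invented for illustration.

<p>The messenger arrived at dawn and said, <q sID="q1"/>The king has sent
me with a word for the city.</p>
<p>Let every household hear it.<q eID="q1"/> Then he departed.</p>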

12.5. Marking up poetic material

Although poetic material is commonly called "verse" material, OSIS avoids that term because of potential confusion with the book/chapter/verse reference system. Thus, as in the TEI, markup of poetry refers to lines and line groups.

In addition, OSIS provides a typographic line-break element. This is because in at least some editions of the Bible, the exact placement of typographic line-breaks within poetic lines is considered very important; while on the other hand it is determined in part by presentational concerns (for example, column width), rather than by linguistic characteristics of either the source or target language.

OSIS provides three main elements for marking up poetic material:

12.5.1. lg

The lg or "line group" element is used to contain any group of poetic lines. Thus it covers units such as a couplet, a stanza, or an entire poem. Line groups can contain smaller line groups as well.
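
A small sketch of nesting, using placeholder text rather than any particular translation: an outer lg for the whole poem contains one lg per stanza, and each stanza contains its poetic lines (the l element, described in the next section).

<lg>
<lg>
<l>First line of the first stanza</l>
<l>Second line of the first stanza</l>
</lg>
<lg>
<l>First line of the second stanza</l>
<l>Second line of the second stanza</l>
</lg>
</lg>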

12.5.2. l

The l element is used to mark poetic lines, as determined by the linguistic nature of poetry in the language of the work. For example, much English poetry consists of lines that can be located by the position of rhyming words, and/or by counting syllables; Hebrew poetry can often be divided into lines based on parallelism of thought or meaning.

The following example shows an encoding of the first two verses of Psalm 7 from the CEV which uses the lg and l elements to mark poetic material.

<div type='section' scope='Ps.7.1-Ps.7.17'>
<title>The <divineName type='x-yhwh'>LORD</divineName> Always Does Right</title>
<lg>
<l>
<verse sID='Ps.7.1' osisID='Ps.7.1'/>You, <divineName type='x-yhwh'>LORD</divineName> God,<lb type='x-secondLine'/>are my protector.</l>
<l>Rescue me and keep me safe<lb type='x-secondLine'/>from all who chase me.<verse eID='Ps.7.1'/>
</l>
<l>
<verse sID='Ps.7.2' osisID='Ps.7.2'/>Or else they will rip me apart</l>
<l>like lions<lb type='x-secondLine'/>attacking a victim,<lb type='x-secondLine'/>and no one will save me.<verse eID='Ps.7.2'/>
</l>
</lg>
</div>

12.5.3. lb

The lb element, or "line break", is used to mark line breaks that are not the result of linguistically or poetically significant structure, but are primarily part of the typography and layout. For example, a long line might be broken to fit into a narrow column. The lb element is an empty element used to mark where such breaks occurred in an important copy text, or where they should be placed in a text to be rendered.

Bible typesetting has a long tradition involving placement of such breaks. In some cases, translators have carefully decided preferred or required break-points for various set widths. These can be accommodated by using the type attribute of lb. For example, type="wide-pref" and type="narrow-pref" might be used to identify the locations of preferred line-breaks for wide and narrow column layouts. Similarly, type might be used to distinguish various levels of indentation following the break, or other typographic factors deemed important.
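
A brief sketch of how such preferences might be recorded, reusing a line from the Psalm 7 example above. The type values are the ones suggested in the preceding paragraph; they are illustrative, not predefined, and which break applies at which column width is an assumption made for this example.

<l>like lions<lb type="narrow-pref"/>attacking a victim,<lb type="wide-pref"/>and no one will save me.</l>

A rendering application could then honor only the breaks whose type matches the layout in use and ignore the rest.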

The lb element should not be used merely to record where line breaks in general happened to occur in a source edition. For most source editions this information is unimportant; for manuscripts it may be important, but must be marked up using the milestone element instead.

12.6. Lists, tables, genealogies, figures and other material

Simple glossaries, such as those that appear at the back of many Bibles, may be encoded at this time using the simple list, label, and item elements described below. A dictionary extension is well along in development, and should be available as an extension module within the next few months. That module should be used for any but the simplest lexical tools; once it is available, OSIS may decide to recommend against further use of list to represent even simple glossaries.

12.6.1. list

All types of lists are marked using the list element; they can be distinguished by type attribute values such as "ordered", "unordered", "compact", and "definition". A list consists of any number of items, some or all preceded by labels, which correspond to the definition-terms of definition lists in various schemas.

12.6.2. label

A leading label for a given list item. Labels are optional.

12.6.3. item

The main content or description for each list item.

(list example forthcoming)
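
Until the official example is available, here is a minimal sketch of a simple glossary encoded as a list. The type value "definition" is one of those mentioned above, and the placement of each label immediately before its item follows the description in this section; the exact content model should be checked against the schema. The entries themselves are invented for illustration.

<list type="definition">
<label>Sabbath</label>
<item>The seventh day of the week, set apart for rest and worship.</item>
<label>Selah</label>
<item>A term of uncertain meaning found in the Psalms.</item>
</list>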

12.6.4. table

OSIS provides only very rudimentary tables: a table consists of rows, which in turn consist of cells. Formatting and layout are not part of the table markup; they can either be handled automatically, as in HTML browsers, or by inserting some signal to the layout engine, such as type attributes or processing instructions. Note that a table can be nested inside another table: simply start a new table element inside a cell element.

12.6.5. row

12.6.6. cell

(table example forthcoming)
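
Until the official example is available, here is a minimal sketch of a two-column table built only from the table, row, and cell elements described above. Treating the first row as a heading row is left to the rendering application, since OSIS table markup carries no layout information.

<table>
<row><cell>Book</cell><cell>Chapters</cell></row>
<row><cell>Genesis</cell><cell>50</cell></row>
<row><cell>Exodus</cell><cell>40</cell></row>
</table>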

12.6.7. figure

The figure element is used to insert graphic, non-textual materials such as maps, pictures, and drawings into an encoded text. The figure element in OSIS may contain a caption (see next section) along with optional index and note elements.

An example of a figure in an OSIS text might be:

<figure src="Beckmann_1917.jpg" alt="Painting by Max Beckmann, titled
Christ and the Woman taken in Adultery"><caption>Christ and
the Woman Taken in Adultery by Max Beckmann,
1917</caption><index name="illustrations"
level1="Beckmann, Max"/>
</figure>

At first it may look odd that the material in the alt attribute is repeated in the caption element. The alt attribute is important for situations where the application cannot display the image, or the user (for example, a visually impaired reader) cannot see it. The alt attribute is a friendly way of ensuring that the encoded text will be understandable by the widest range of both applications and users.

The index element allows the encoder to record the information necessary to automatically create an index, for either an online version of this material or a more traditional back-of-the-book index. Its name attribute gives the index in which this item will appear, and level1 provides the entry that will appear in that index. See index (below) for more information on this element.

12.6.8. caption

(see example above, fuller examples forthcoming)

12.7. milestone

The milestone element is an empty element, and so is represented as <milestone/> rather than as a start-tag and end-tag pair. It is used to mark point events in a text, often involving the layout of the original text, or special points of access into the electronic text.

For example, when digitizing a manuscript, it may be considered important to record where the page, column, and line boundaries of the original manuscript fell. This would be done as shown here:

<milestone type="pb" n="37-verso"/>
<p>The Lord said to Eliphaz:<milestone type="line"/>
What my servant Job has said about me is true, <milestone type="line"/>
but I am angry with you and your two friends for <milestone type="line"/>
not telling the truth. <verse osisID="Job.42.8">So I want you to go
over to <milestone type="line"/>
Job and offer seven bulls and seven goats on an <milestone type="line"/>
altar as a sacrifice to please me. After this, Job <milestone type="line"/>
will pray, and I will agree not to punish you for <milestone
type="line"/>your foolishness.</verse><milestone type="line"/>
<verse osisID="Job.42.9">Eliphaz, Bildad, and Zophar obeyed the Lord,
and he answered Job's prayer.</verse></p>

Note that because milestone is an empty or point element, not a container, it may be placed freely without concern about violating the boundaries of other elements in the same region.

Where a break to be represented by a milestone occurs between other units, such as verses or paragraphs, the milestone should be placed between those units, rather than just within either one.

When setting attribute n on a milestone, it should indicate the number of the unit starting, not the unit ending. For example, <milestone type="pb" n="3"/> indicates the break between pages 2 and 3, not between pages 3 and 4. Numbering does not need to be unique across various types of milestones; for example, the 5th line on page 24 of a manuscript may be marked simply n="5", rather than n="24.5" or similar.

Several predefined types are provided for the milestone element (each entry below is the value given to the type attribute):

  • pb

    Marks the location of a page break in the source text.

  • column

    Marks the location of a column break in the source text. Assuming page boundaries are also marked, the start of the first column need not be marked unless something else (such as a footer) precedes it in the encoding of the page. Columns should be numbered in the order of reading (for example, right to left in Hebrew texts). In the case of, say, an English/Hebrew diglot edition, where there is no principled order of reading among the columns, the direction used for the pages (Hebrew or English) should be considered the dominant direction, and the same direction should be used for numbering columns.

  • header

    A milestone of type "header" should precede the encoding of the page header if it is being included in the encoded text. This would normally be true only for digitized editions of manuscripts or other important copy editions, because in modern print Bibles headers are typically automatically generated.

  • footer

    Type "footer" should be used just like type "header", except that it marks the page footer area instead.

  • line

    Line milestones should be used to mark line breaks in the copy text when they are considered significant. This will normally only be true for important manuscripts, where line numbering may be needed for paleographic or reference use. Line milestones must not be used to represent linguistically significant line breaks, such as in poetry, for which the lg and l elements are provided.

  • halfLine

    In certain languages it is important to mark half-line units, and this type is provided for such cases.

  • screen

    The milestone of type "screen" is to be used to mark preferred break points in an on-screen rendering of the text. For example, if the user requests to be taken to the book of Psalms in a given electronic edition, it may be best not to take them to Ps.1.1, but to an earlier point, preceding any introductory material. In many cases this can be accomplished by taking them to the appropriate div (since the <div type="book" osisID="Ps"> should precede any Psalms-specific introductory material); but this milestone type is available for other cases. The OSIS specification does not impose requirements on how applications make use of such milestones.
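
To illustrate several of the predefined types together, here is a brief sketch of the start of a two-column manuscript page. The page number, header text, and body wording are invented; only the type values come from the list above. The first column is marked explicitly because the header precedes it in the encoding, and each n value gives the number of the unit that is starting.

<milestone type="pb" n="12"/>
<milestone type="header"/>The Book of Job
<milestone type="column" n="1"/>
<p>First line of the first column<milestone type="line" n="2"/>
second line of the first column<milestone type="column" n="2"/>
first line of the second column</p>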

13. Common elements in all texts

The elements described in this section can occur in almost any encoded text.

13.1. a

The a element is exactly analogous to the HTML a element, and likewise may be used to encode links within a document. This eases integration of OSIS documents into the Web environment. For example:

<p>See Edwards' famous treatise on <a
href="http://www.ccel.org/e/edwards/affections/religious_affections.html">religious
affections</a> for additional information.</p>

13.2. index

The index element may be placed at any point in the document to indicate a topic under which that location should be indexed. It is always an empty element. Multiple indexes (such as of places, names, theological or ethical issues, etc) must be distinguished via the name attribute.

Indexes with up to 4 levels of headings are supported. The primary index entry name is specified on the level1 attribute, followed by sub-headings level2, level3, and level4. For example:

<head>On Justice<index name="topic" level1="Virtues"
level2="Justice"/></head>

There is also a see attribute, which may be used to represent the need for a cross-reference to another index entry; such elements should be placed together at the end of the document body (since they do not refer to a particular location). For example:

<index name="topic" level1="Virtues" level2="Justice" see="Fairness"/>

No separate "see also" type is provided at this time.

13.3. reference

The reference element is used to encode an explicit cross-reference to another passage or work. The work referred to need not be Biblical, but must be declared via a work element in the header, and be accessible via the same canonical referencing scheme defined in osisID syntax. Reference elements will often occur within notes, but may also occur freely in text (the latter is more common when encoding non-Biblical works). For example:

(example forthcoming)
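
Until the official example is available, here is a minimal sketch of a note containing a cross-reference. It assumes that the target is given in an osisRef attribute, as on the catchWord example in section 13.5, and that the referenced work is declared in the header; the note wording is invented.

<note>Compare <reference osisRef="Isa.53.7">Isaiah 53:7</reference>.</note>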

13.4. abbr

Marks a portion of the content as an abbreviation. The expanded value should be supplied as the value of the expansion attribute. For example:

<abbr expansion="Journal of Biblical Literature">JBL</abbr>

This element is most often seen in notes, where citations are often abbreviated and readers may not be familiar with the abbreviation. Putting the expanded form in the expansion attribute allows software to choose to display the expansion instead of the abbreviation, or to display it upon request by the reader.

13.5. catchWord

Catchwords and catchphrases are those parts of notes that are copied from the main text, to orient the reader as to the note's precise applicability. Catchwords in notes must be marked when present. For example:

<verse osisID="NRSV:Ezek.19.5">When she saw that she was thwarted,
that her hope was lost, she took another of her cubs and made him a
young lion.</verse> <note>It is uncertain to which king <catchWord
osisRef="Ezek.19.5">another of her cubs</catchWord> refers....</note>