The OSIS project began as a breakfast meeting organized by Dennis Drescher (SIL) at the XML 2000 meeting in Washington, D.C. Unlike many such meetings about an encoding standard for Bibles and related literature, this one resulted in a coalition that produced both the OSIS schema and the following documentation.
There have been meetings, teleconferences, and email discussions numbering in the thousands of posts, as each part of the OSIS schema was developed and tested against the needs of those who are interested in using XML in tasks such as publishing, translating and studying the Bible. Work on additional capabilities for OSIS and more documentation is ongoing, and newcomers are welcome to the effort. Offers to volunteer, as well as comments, corrections, and supplied omissions or examples, should be addressed to osis-editors@bibletechnologieswg.org.
Over the long process to this point, the OSIS project has been supported by the American Bible Society (special thanks to Dr. Eugene Harbecker and Rev. Trevon Gross), CrossWire.org, SIL, the Society of Biblical Literature (special thanks to Dr. Kent Harold Richards), and the United Bible Societies. These organizations have contributed staff and resources, organized meetings, paid teleconference bills and supported the development of OSIS in other ways. Suffice it to say that OSIS would not have been possible but for their support.
Organizations played an important role in this effort, but then so did a very large number of individuals. Any listing of individuals will be incomplete but it would be remiss to not name some of the key figures who are responsible for the schema and documentation you see today:
Eric Albright: Eric spent several days at the first OSIS meeting in Rome enlightening the OSIS core group on the choices and decisions made in the creation of XSEM. Eric's insight into the problems faced by translators is profound and was instrumental in shaping core features of the OSIS schema.
Jim Albright: The common last name with Eric Albright is no accident, as Jim is Eric's father. Jim has commented extensively on the various drafts of OSIS and saved the reader from both substantive and typographical errors on any number of occasions.
Alan Connor: What I would define as a great manager. Rather than look for paperwork or forms, Alan has always looked for solutions. Alan's friendship and assistance in OSIS meetings at SIL have been, are and always will be deeply appreciated.
Peter Constable: Peter has been deeply involved in enabling people to speak with their own voices as part of the Non-Roman Scripts Initiative at SIL for many years. We have valued Peter's advice on character set issues so that OSIS will continue that tradition.
Kees DeBlois: Kees is the long time head of technical development at UBS and OSIS was quite fortunate when Kees agreed to serve as co-chair with Steve DeRose. Kees contributed a much needed "translators" view to the encoding practices found in the OSIS schema.
Dennis Drescher: The source of the idea that became the OSIS schema. He is not to blame for any shortcomings you see in the realization of that idea. Dennis has been a long time supporter of OSIS and a participant in our discussions.
Steve DeRose: Steve agreed to become the co-chair of the Bible Technologies Group (the 'official' activity that led to OSIS) along with Kees DeBlois in the Spring of 2001. The author of more international standards than can be easily listed, Steve has been a source of technical work and has also kept the more excitable members of the core group under control.
John Edwards: John's proofreading of one of the drafts of this manual made me wonder if I had read any of it while I was typing it. If the following manual is readable at all, it is due to the efforts of John and others to filter the output of my keyboard.
Darrell Eppler: Darrell has helped OSIS to be mindful of the need to go from some file on a computer system to something that the average reader would recognize as a Bible. Abstract markup is all well and good, but at some point some people, anyway, want a physical object called a Bible in their hands. To the extent that is possible, Darrell and Jim Vevries are responsible for it.
Troy Griffitts: Founder of Crosswire.org, one of the largest community projects for developing free Bible software. Troy was one of the early adopters of OSIS and has tirelessly worked on numerous drafts and test conversions of data into proposed OSIS markup.
Adina Hamik: Adina was an early booster for OSIS within the ABS and helped organize OSIS meetings. Her wit and charm helped keep the various organizations and individuals going in a common direction.
Bob Hodgson: Bob has been involved in OSIS since the early days of 2001. He wears so many hats at the ABS that it is difficult to describe which one he had on while supporting OSIS. I think it was the general, support Bible engagement one but I can't say for sure.
Chris Little: Chris is also involved in Crosswire.org (along with Troy Griffitts) and has been a long term participant in OSIS. Chris is one of the calmer members of the core team, a much needed presence more often than you would think. He has contributed to both testing and substantive development of the schema.
Kirk Lowery: Kirk is a Hebrew linguistics expert who has been helping to shape OSIS so that linguistic markup can be added to OSIS texts. Almost there Kirk! I promise!
Nathan Miles: A very good friend and colleague who is charged with not only supporting new formats, such as OSIS, but also supporting old formats as well. A gifted programmer whose suggestions have shaped and refined OSIS.
Harry Plantinga: Harry developed ThML, a language for encoding theological texts, but found the time to convert materials into OSIS, to chair OSIS meetings, and to develop an OSIS editing tool, and is now converting ThML texts into OSIS.
Todd Tillinghast: Todd writes the stylesheets that enable OSIS documents to suddenly appear like something that an average user might want to read. Among other things, Todd has created numerous test encodings and other material that has assisted in the development of OSIS.
Jim Vevries: Jim has participated in several OSIS meetings at SIL and, along with Darrell Eppler, helped the OSIS team become aware of publication issues. It is simply not enough for markup to be elegant; it has to result in printed output as well.
John Walter: If others served in the trenches of writing encoding schemas and documentation, John served an equally important role of advocate for OSIS both within the ABS and to the larger Bible community. It is due in no small part to his efforts that OSIS is now being adopted by the ABS in many of its activities.
There is one name missing from the list of participants and it was not due to oversight. Mike Perez, who is now serving his country on active duty, was the critical person in OSIS moving from a breakfast conversation to the schema and documentation we have today. Mike organized and patiently attended every OSIS meeting and teleconference, while the markup specialists argued about issues too obscure to be repeated here. Mike's focus was always on getting a usable result and hopefully we have not disappointed him too badly with the current schema and user's manual.
Claiming author's privilege in a foreword, I would like to publicly thank all of the participants in the OSIS project for the time, efforts and faith that have brought OSIS to this point. In a time when faith is often treated with disdain, it has been my privilege to work with men and women of great faith in this project.
Patrick Durusau
OSIS Technical Lead
Covington, Georgia, October 2004
Welcome to the OSIS (Open Scriptural Information Standard™) User's Manual. OSIS is an XML schema that can be used to produce Bibles, commentaries, and related texts that can be easily interchanged with other users, formatted as HTML, PDF, Postscript or any other desired format, and searched on any personal computer. It provides a standard way to express such documents, which is important because it saves time, money, and effort for:
authors, who will have less need to adjust their manuscripts for each potential publisher;
publishers, who will incur lower costs by not converting texts from authors in a wide variety of formats, and by having texts in a format useable by electronic-book system vendors;
and software vendors, who can avoid writing code to manage different formats, and thus make their programs smaller, faster, and more reliable.
The OSIS development team studied previous Bible encoding proposals, as well as literary encoding in general. We tried to avoid the weaknesses and gain some of the strengths of each one. We thank the many people who worked on those prior specifications, as well as those who have provided help and feedback in developing OSIS itself. A list of participants may be found in Appendix *****.
Users familiar with the Text Encoding Initiative (TEI) will find OSIS markup quite familiar. The bulk of the elements we define correspond directly to TEI elements, and often have the same name (often with simplified content models). The schema also provides a TEIform attribute for such elements, so they can be recognized by form-aware processors as equivalent to their TEI counterparts.
OSIS is provided as a free resource by the Bible Technologies Group™ (or BTG™), which is a collaborative effort of the American Bible Society, the Society of Biblical Literature, the Summer Institute of Linguistics, the United Bible Societies, other Bible Societies and related groups, and individual volunteers around the world. OSIS was designed to meet the needs of diverse user communities who read, study, research, translate or distribute biblical texts. This introduction gives a brief overview of OSIS before leading you step by step through producing your first OSIS text.
The first question that is often asked when learning that OSIS uses XML (a markup language) is: ‘I'm not a computer person. Can I learn to use OSIS?’ If you can type and use even the most basic word processor or computer text-editing program, the answer is clearly ‘Yes!’ OSIS was designed to offer the beginning user a simple way to do the basic ‘markup’ required for a standard biblical text. ‘Markup’ refers to markers placed within the text, that indicate where useful units (or ‘elements’) such as verses, quotations, cross-references, and other things begin and end.
If you know HTML, you already know most of what you need to know to use OSIS; OSIS uses the same pointy-bracket syntax as HTML (or XML to be completely precise). It merely provides a different set of element and attribute names. A few names such as ‘p’ and ‘div’ are the same; others are new, such as ‘verse.’ The core set of elements for OSIS is only slightly larger than the set for HTML 3.2. To be sure, there are some complex cases that we deal with later, but you can do useful work with no more information than is provided in this basic manual.
The second question that is most often asked is: ‘Do I need an XML editor to do OSIS?’ This question often comes up after a friend of a friend has recommended some editor, and you then checked its price. XML editors vary from free to over $10,000.00 (US), and their ease of use varies greatly.
The basic answer is no, you do not need any special software. You can use any text editor you like to create OSIS documents. Many will even color the tags for you, because they know how to color HTML tags and the languages are similar enough. However, you should have a way to check your documents for errors -- if your editor doesn't know enough about XML to warn you if you misspell a tag, or forget to end some element that you started, you will want to check for errors periodically using an "XML validator". Many such programs are available for various computers; some are available as Web services. (See Validating Your OSIS Document for pointers and instructions on web based validation services.) Both Internet Explorer and Netscape can also validate an OSIS file once you have installed the OSIS schema and an appropriate stylesheet.
An OSIS-aware text editor will do this checking for you, either on demand or continuously. A better OSIS-aware text editor will provide help by showing you just which elements are permitted at any given place. The best editors also give you the option to see and edit a fully-formatted view on demand, rather than staring directly at pointy-brackets. The choice between the many tools is a personal one, dictated by your working style, level of technical sophistication, goals, budget, and other factors.
Note on examples: This manual offers a number of examples of OSIS markup. The formatting of those examples (indenting, line breaks, etc.) is for ease of reading only. XML processors don't care if all the elements are run together one after the other, although XML encoders would find that difficult to read. Do not spend time trying to make your markup match the indents, line breaks, etc., in this manual.
For future reference: From time to time you will see paragraphs that start: For future reference:. On your first (or even second) reading of this manual you can safely skip those paragraphs. They contain information that you may need later as you become more experienced in using XML or for solving a particular problem.
3. Some authoring tools
The OSIS team is working to adapt free authoring tools that will hide most, if not all, of the markup from the casual user of OSIS. In the meantime, the best way to learn OSIS is to use a simple text editor, such as WordPad or Kedit on Windows, BBEdit or Alpha on MacOS, or vi or emacs on Linux. You can even use a word processor, though any formatting that you do in it won't matter (you would simply save the file as "text only").
The examples in this manual have been kept deliberately short and can be downloaded as a package from the OSIS website. After you have gained some basic skill using OSIS, you may want to try out more sophisticated editors.
Editing is much easier with an editing program that is aware of XML rules in general, and OSIS in particular. For example, rather than seeing literal tags with pointy-brackets, you can have a choice of seeing that, or structural views of your document (say, as a tree or expandable outline), or fully-formatted views to facilitate print layout.
Many products are available that can help you edit XML documents. One style shows the literal XML source file, but colors tags, attributes, and other things to make them stand out. Most such programs also read an XML schema and ensure that you only insert elements and attributes that are permitted by the OSIS schema (schemas, such as the OSIS one, declare what elements and attributes are permitted where in documents of a particular kind). One free and helpful tool of this kind is jEdit, which runs on most platforms. It can be set up to know about many kinds of files, including XML files, and OSIS in particular.
With such an editor, you can see or print a basic formatted view by using almost any Web browser. Later in this manual are instructions for setting up an OSIS file with a style sheet (generally in CSS) so that typical browsers can render the OSIS text. An OSIS file can also be rendered to HTML or PDF, for example, by using XSLT stylesheets.
There are also more word-processor-like XML editors, which primarily show a formatted view defined by some style sheet. These are mainly commercial. Commercial editors include: XML Spy (http://www.xmlspy.com/); XMetal (http://www.corel.com), and Serna Editor (http://www.syntext.com). Free editors and other XML tools can be found at Free XML Tools (http://www.garshol.priv.no/download/xmltools/), as well as other XML sites on the WWW.
For high-end layout and typesetting from XML source files, a stylesheet language called XSL-FO is usually used. Two of the more popular commercial XSL-FO solutions are 3b2 (see http://www.3b2.com/) and Antenna House (see http://www.antennahouse.com/). A free XSL-FO processor, FOP (Formatting Objects Processor), is available from the Apache Foundation (http://xml.apache.org/fop/). Non-XML-based composition systems such as Quark™ and TeX have ways to import XML, but using them for XML composition requires substantial expertise and effort.
4. Your First OSIS Document
Like HTML documents, an OSIS document starts with a header, and is followed by the actual text content. The header identifies the file as being XML, and that it uses the OSIS schema. It also provides places to declare a bibliographical description of the text and of any other works cited; and a place to record a history of editing changes. Here is a short, but valid, OSIS document:
Bible; Bible.en.CEV.1995; Copyright 1995 American Bible Society; Esth.1.1-Esth.1.4; Bible; Bible; Bible
King Xerxes of Persia lived in his capital city of Susa and ruled one hundred twenty-seven provinces from India to Ethiopia.
During the third year of his rule, Xerxes gave a big dinner for all
his officials and officers. The governors and leaders of the provinces
were also invited, and even the commanders of the Persian and Median
armies came.
For one hundred eighty days he showed off his wealth and spent a lot
of money to impress his guests with the greatness of his kingdom.
5. XML and OSIS declarations
The first line above identifies the document as being XML and gives its encoding declaration, in this case UTF-8. The encoding declaration tells the parser (the part of the software that reads the file) what character set has been used with this document. Other encoding declarations are possible but this will be the most common one.
The second through third lines are a very long start-tag for the outermost OSIS element, which is called osis. All elements in an OSIS document must be declared within the OSIS namespace. There are two ways to achieve this, and other than remembering to pick one of the two following methods, that is all you need to remember about it to start encoding texts using OSIS 2.0.
OSIS Namespace, Method 1: Copy the following lines just after the XML declaration:
OSIS Namespace, Method 2: Copy the following lines just after the XML declaration:
Note that with the second method, the last closing element must be: </osis:osis>. The first method is simpler but both are legitimate.
For future reference: The first method declares that the OSIS namespace is the default namespace. The second allows use of the OSIS namespace only if you have the prefix "osis" before the element where the namespace will be used. Namespaces are inherited from their parent elements so there is no practical impact on how you would author an OSIS document.
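That the two methods are equivalent can be checked with any namespace-aware parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The namespace URI is an assumption (it is the one commonly seen with the OSIS 2.x schemas, and should be checked against the schema you actually use), and both documents are trimmed to two empty elements for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed OSIS namespace URI; check it against the schema you actually use.
OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"

# Method 1: the OSIS namespace is the default namespace.
method_1 = f'<osis xmlns="{OSIS_NS}"><osisText/></osis>'

# Method 2: the OSIS namespace is bound to the "osis" prefix, so every
# element, including the last closing tag, carries that prefix.
method_2 = f'<osis:osis xmlns:osis="{OSIS_NS}"><osis:osisText/></osis:osis>'


def qualified_names(xml_text):
    """Return the namespace-qualified name of every element in the document."""
    root = ET.fromstring(xml_text)
    return [element.tag for element in root.iter()]


names_1 = qualified_names(method_1)
names_2 = qualified_names(method_2)
print(names_1)
print(names_1 == names_2)  # True: a parser sees the same elements either way
```

The parser reports each element as a namespace plus a local name, which is why the choice of method has no practical impact on processing.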
At this point, the OSIS document has begun. This sample is a single document rather than a collection of documents, so the next element opened is osisText:
The osisText element does have more attributes than are shown here but these are the most important ones for a basic OSIS document.
Every osisText element is required to supply an osisIDWork attribute and value. The value will be the short name of what is being encoded, in this case the Contemporary English Version, or CEV. The short name is defined in the work declaration for the text, described later.
Every osisText also needs to specify what reference or versification system any osisRefs within it use. In the case of the Contemporary English Version, that would be the NRSV (New Revised Standard Version). The following is a list of names for reference systems to be used in OSIS documents:
AV Authorized Version (same as KJV)
KJV King James Version (same as AV)
Loeb Classical literature
LXX Septuagint
MT Masoretic Text
NA27 Nestle-Aland, 27th Edition of the Greek New Testament
NRSVA New Revised Standard Version with Apocrypha
SamPent Samaritan Pentateuch
Synodal Russian
Vugl Vulgate
The xml:lang attribute is required on all osisText elements. While the language element in osisWork allows a wide range of language classification systems to be documented, be aware that xml:lang is much more restrictive. The xml:lang attribute will only recognize ISO 639-1 two-letter codes and those from IANA. To use anything else with this attribute, you must first prepend "x-" to the value.
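The "x-" rule can be sketched as a small helper. This is an illustration only: the function name and the short sample of registered codes are assumptions, and a real tool would consult the full ISO 639-1 and IANA lists.

```python
# A tiny illustrative sample of ISO 639-1 two-letter codes (assumption:
# a real tool would load the complete ISO 639-1 and IANA lists instead).
SAMPLE_REGISTERED_CODES = frozenset({"de", "el", "en", "fr", "he", "la", "ru"})


def xml_lang_value(code, registered=SAMPLE_REGISTERED_CODES):
    """Return a value for xml:lang following the rule described above.

    Codes found in the registered set are used unchanged; anything else
    is given the "x-" prefix (unless it already has one).
    """
    if code in registered or code.startswith("x-"):
        return code
    return "x-" + code


print(xml_lang_value("en"))          # en
print(xml_lang_value("EG-ancient"))  # x-EG-ancient
```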
The canonical attribute is available on all elements. It has a ‘default’ value so it does not have to be entered by the encoder if the default value is acceptable. Here it is shown with its default value of true on osisText so the reader can see it in operation.
When canonical="true", it means that the content of that element is a part of the text being encoded. For example, the "text" of the Bible includes the content of books, chapters, and verses but does not include notes, section-headings added by editors or translators, etc. Therefore, the default value for elements such as note is false, as the content of that element has been added by an editor or author to the text being encoded. It should be explicitly noted that the value of the canonical attribute should not be used to reflect theological judgment about the content of a text, but merely to distinguish between what has been added to the text and what has not.
In most cases use of the canonical attribute is straightforward, and the default values will almost always produce the intended result. However, there will arise truly difficult cases: for example, one may be encoding an ancient text with annotations of its own. In that case those notes would be canonical, while any added by the current editor would not be. In such cases, the practice chosen and its rationale should be described in the work's documentation.
6. The OSIS text header
The first element following osisText is required to be header. The header contains the revisionDesc, work, and workPrefix elements for a particular work. These elements must be entered in that order, although each may occur an unlimited number of times.
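In other words, an OSIS text header may have:
<revisionDesc>
<date>a date</date>
<p>description of revision</p>
</revisionDesc>
<revisionDesc>
<date>a date</date>
<p>description of revision</p>
</revisionDesc>
...more revisionDesc elements...
<work>elements for work</work>
<work>elements for work</work>
...more work elements...
<workPrefix/> (attributes only, an empty element)
<workPrefix/> (attributes only, an empty element)
...more workPrefix elements...
An OSIS text header may NOT have:
<revisionDesc>
<date>a date</date>
<p>description of revision</p>
</revisionDesc>
<work>elements for work</work>
<revisionDesc>
<date>a date</date>
<p>description of revision</p>
</revisionDesc>
<workPrefix/> (attributes only, an empty element)
<work>elements for work</work>
...more work elements...
<workPrefix/> (attributes only, an empty element)
In other words, the order always is:
revisionDesc elements (any number of them), followed by:
work elements (any number of them), followed by:
workPrefix elements (any number of them).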
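The ordering rule is normally enforced by a schema validator, but it is simple enough to sketch by hand. The function below is a hypothetical helper, not part of any OSIS tool; it takes the local names of a header's child elements (without namespace) and reports whether they appear in the permitted order.

```python
import re

# Each child is mapped to a single letter so the whole sequence can be
# matched against one regular expression: any number of revisionDesc ("r"),
# then any number of work ("w"), then any number of workPrefix ("p").
_LETTER = {"revisionDesc": "r", "work": "w", "workPrefix": "p"}
_ORDER_PATTERN = re.compile(r"r*w*p*")


def header_order_is_valid(child_names):
    """Return True if the child element names appear in the permitted order."""
    try:
        encoded = "".join(_LETTER[name] for name in child_names)
    except KeyError:
        return False  # an element name this sketch does not know about
    return _ORDER_PATTERN.fullmatch(encoded) is not None


permitted = ["revisionDesc", "revisionDesc", "work", "work", "workPrefix"]
not_permitted = ["revisionDesc", "work", "revisionDesc", "workPrefix", "work"]

print(header_order_is_valid(permitted))      # True
print(header_order_is_valid(not_permitted))  # False
```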
6.1. revisionDesc
To record changes or edits to the text, authors and editors are encouraged to insert a revisionDesc element every time significant editing is done. Each revisionDesc element should contain a date element which says when those edits were completed, in the form
yyyy.mm.ddThh.mm.ss
Note that all fields must have exactly the number of digits shown (4-digit year, 2-digit month, etc.). It is permissible to omit the time and the preceding "T", thus giving just a date. For example, December 25th of 1999 CE would be:
1999.12.25
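The fixed digit counts make this format easy to check mechanically. A minimal sketch follows, in which the function name is an assumption; it checks only the shape of the value, not whether the date actually exists on the calendar.

```python
import re

# yyyy.mm.dd, optionally followed by "T" and hh.mm.ss, with exactly the
# number of digits shown in the text above.
REVISION_DATE = re.compile(
    r"[0-9]{4}\.[0-9]{2}\.[0-9]{2}(T[0-9]{2}\.[0-9]{2}\.[0-9]{2})?"
)


def is_revision_date(text):
    """Check only the shape of the value, not whether the date exists."""
    return REVISION_DATE.fullmatch(text) is not None


print(is_revision_date("1999.12.25"))           # True
print(is_revision_date("1999.12.25T08.30.00"))  # True
print(is_revision_date("1999.12.5"))            # False: day needs two digits
```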
A date element in the revision description is followed by any number of p (paragraph) elements, in which the changes made are summarized. The person responsible for making the changes should also be identified, using the resp attribute on the revisionDesc element.
Recommended practice is that more recent revisionDesc elements appear earlier in the document. That is, entries should occur in reverse chronological order. For example:
2003.09.11
sjd: Filling in the gaps. Adding some info for 2.0 as defined at the Calvin College meetings.
2003.07.01
sjd: Annotated alpha list of elements. Reworked reference and work sections and added type, scope, and explanations of type and subtype for work. Explained more elements and attributes.
2003.06.17
sjd: Wrote conformance section. Added lists of elements and attributes, USMARC list. Inserted placeholders for doc on all element types. Got document back to XML WF. Wrote CSS stylesheet.
6.2. work
A work element provides information comparable to that found on the title page of a printed work, using the fields defined by the Dublin Core Initiative (see http://dublincore.org/).
The work element in the header with an osisWork attribute that matches the osisIDWork in the osisText element identifies the work in which it occurs -- much like the title page in a printed work. For example:
Note that the match between osisIDWork="CEV" in osisText and osisWork="CEV" in the work element links this osisText to this particular work element.
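The link between osisText and its own work element can be followed programmatically. The sketch below uses Python's standard xml.etree.ElementTree module on a hypothetical, heavily trimmed fragment: the namespace declaration and all required child elements are omitted, and the helper name is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed fragment: no namespace, no required children.
SAMPLE = """
<osisText osisIDWork="CEV">
  <header>
    <work osisWork="CEV"/>
    <work osisWork="Bible"/>
  </header>
</osisText>
""".strip()


def find_own_work(osis_text):
    """Return the work element that describes the osisText itself, or None."""
    wanted = osis_text.get("osisIDWork")
    for work in osis_text.iter("work"):
        if work.get("osisWork") == wanted:
            return work
    return None


root = ET.fromstring(SAMPLE)
own_work = find_own_work(root)
print(own_work.get("osisWork"))  # CEV
```

Any other work element in the header (here, the one named "Bible") is then understood as a citation of another work.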
Other work elements (ones that follow the first one) identify other works -- much like a citation in a footnote or bibliography in a printed work. Each assigns a local name to that work, using the osisWork attribute. Works so declared can then be referred to in osisIDs or osisRefs throughout the text. For Bibles, this should generally be the accepted acronym or abbreviated form of the translation's name (some standard version abbreviations are listed above). No periods, hyphens, spaces, or colons are allowed in short names.
6.3. osisWork
The osisWork attribute of each work element provides a short name used to refer to that work and its declaration as a whole. As of version 2.1, OSIS specifies the recommended format described below for those short names. Also, when using this format within the Dublin Core identifier element, the type attribute must be set to "OSIS".
6.4. Work Prefix Defaults
Each work defined in the header is required to provide a short name for itself in the osisWork attribute. These can then be used as prefixes (similar to XML namespace prefixes) on osisID, osisRef, and other attributes throughout the rest of the document.
In OSIS versions through 2.0, specific attributes were provided to set a default work prefix for osisIDs (osisIDWork on the osisText element) and for osisRefs (osisRefWork on the osisText element). These attributes remain available in OSIS 2.1, but a more general defaulting mechanism has been added.
From OSIS 2.1 on, a defaults element is allowed at the end of the header, after all the work elements. It contains any number of workPrefix elements, each of which sets the default work prefix for a particular attribute on a particular element type. For example:
This declaration indicates that the default work prefix on all annotateRef attributes of note elements is to be "Bible.KJV". No colon is to be included (the colon is used to separate a work prefix from the rest of a reference when the work prefix is explicit rather than defaulted).
The syntax of the path attribute is taken directly from the W3C XPath Recommendation, and can be correctly interpreted by any conforming XPath processor. However, the form shown above is the only form permitted at this time; more complex XPaths are not permitted. In other words, the path attribute must consist of "//", an element type name, "/@", and an attribute name. If a more detailed defaulting mechanism is required in the future, it will likely be provided by permitting a wider range of XPath's features.
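Because only one path form is permitted, a processor does not need a full XPath engine to read it. The following sketch parses the restricted form and applies a default prefix. Both function names are assumptions, the name pattern is a simplification of XML names, and the colon test assumes that a colon in a value always marks an explicit work prefix.

```python
import re

# "//" + element type name + "/@" + attribute name, and nothing else.
# The name pattern is a simplification of XML names (assumption).
WORK_PREFIX_PATH = re.compile(r"//([A-Za-z_][\w.-]*)/@([A-Za-z_][\w.:-]*)")


def parse_work_prefix_path(path):
    """Split a workPrefix path into (element name, attribute name)."""
    match = WORK_PREFIX_PATH.fullmatch(path)
    if match is None:
        raise ValueError("unsupported workPrefix path: " + path)
    return match.group(1), match.group(2)


def apply_default_prefix(value, default_prefix):
    """Add the default work prefix unless the value already carries one."""
    if ":" in value:
        return value  # explicit prefix, e.g. "Bible.NRSV:Matt.1.1"
    return default_prefix + ":" + value


print(parse_work_prefix_path("//note/@annotateRef"))  # ('note', 'annotateRef')
print(apply_default_prefix("Matt.1.1", "Bible.KJV"))  # Bible.KJV:Matt.1.1
```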
A particularly useful application of this defaulting mechanism is for the morph and lemma attributes of the w element (which provides for word-level linguistic annotation). Because the w element is so frequent when used, defaulting the prefix (which points to a work defining the morphological or lexical system used) can save a lot of space.
7. work
7.1. Introduction
The work element is introduced with two short examples, and then each of the elements that may occur in the work element is treated separately. Note that the elements in the work element are split into two sections: those that also occur in Dublin Core metadata and those specific to OSIS.
Each work element describes a single publication using several pieces of information, primarily title, creator, date, publisher, identifier and language. All of the standard "Dublin Core" fields may be used, plus a few OSIS-specific additions (further information on the Dublin Core system may be found at http://www.dublincore.org). All of the elements in work may be repeated as necessary, but must be encoded in the order shown here. For example:
Alan Gardiner; Francis Llewellyn Griffith; 1927; 2003; Grammar; Griffith Institute, Ashmolean Museum, Oxford; EN; EG-ancient; 0900416351; 95230980
Clarence Jordan; 1969; 2003; Bible; Association Press New York, NY; EN; 0809617250; 69-18840
7.2. Dublin Core Elements
7.2.1. title
A title element must be provided in the work element and contain the main title of the work. Additional titles may also be specified, using the type attribute to identify them as main, sub, part, monographicSeries, or another kind of title. No OSIS-specific types are established for this type attribute.
7.2.2. creator
The creator element is used to specify the person(s) or organization(s) who are primarily responsible for the intellectual content of a work. The role attribute must specify the particular role the primary responsible party played. The most common values would be aut (author), edt (editor), cmm (commentator), and trl (translator). A short list of such codes, along with the complete set, appears in the appendices under USMARC Relator Codes.
7.2.3. contributor
Many people may contribute to a work in roles other than the primary role listed under creator. They should be listed using the contributor element. Their specific role should be recorded in the role attribute of their contributor element.
7.2.4. date
Date elements in the work element record significant dates in the production or publication process. Use the event attribute to identify the particular date contained in each date element. The defined events are:
original The original publication date of the first edition
edition The date of publication of the referenced or source edition
imprint The printing date of the referenced or source edition
eversion The revision date of the present electronic edition
The type attribute is used to identify the calendrical system in which the date is expressed, from the list: Chinese, Gregorian, Islamic, ISO, Jewish, and Julian. At this time, OSIS only defines a syntax for Gregorian dates: yyyy.mm.dd.
For example:
1749
2003-12-30
For processing purposes, OSIS compliant software will assume Gregorian dates unless the type attribute indicates otherwise. See "Date Formats" for further details on the encoding of date information.
7.2.5. publisher
The publisher element in the work element is used to identify the publisher of a particular work. If a work was published by more than one publisher, and that publication record needs to be recorded, use multiple publisher elements and distinguish them using the type attribute. The description given in this attribute is not constrained, but it is suggested that values that tie a publisher to a particular edition be used. For cases where full identification of a publication history is essential, use of multiple work elements is suggested.
7.2.6. language
A language element must be provided for each language used substantially in a work. The language may be specified using an ISO 639, ISO 639-2, or SIL Ethnologue code. The type attribute must be set to IANA, IETF, ISO-639-1, ISO-639-2, ISO-639-2-B, ISO-639-2-T, LINGUIST, or SIL. In the rare case that none of these is sufficient, a prose description should be inserted in the element and the type attribute set to other.
7.2.7. type
The type element records the nature or genre of the content of the resource. This element includes terms describing general categories, functions, genres, or aggregation levels for content. Dublin Core's recommended best practice is to select a value from a controlled vocabulary (for example, the DCMI Type Vocabulary -- see http://dublincore.org/documents/dcmi-type-vocabulary/). OSIS does not provide such a controlled vocabulary at this time. If you encode this element, the controlled vocabulary in use should be identified via the type attribute.
To describe the physical or digital manifestation of the resource, use the format element instead.
Note that the Dublin Core type element is distinct from the OSIS type attribute (the latter can occur on any OSIS element, to distinguish relevant subdivisions of the type).
7.2.8. identifier
The identifier element provides one formal identifier for the work. The values to be entered for the type attribute on the identifier element are listed below. Note that these values must be entered exactly as shown. XML is case sensitive; that is to say, DEWEY is NOT equal to Dewey. Enter Dewey and you will get an error message.
A work, represented by a work element, can have more than one identifier element; that is, it may have different identifiers in different identification systems. For example, the work element for a book could contain two identifier elements, one for its ISBN, 0-310-92955-5, and one for its Library of Congress Control Number, 2002107776.
DEWEY Dewey Decimal System
DOI Digital Object Identifier
ISBN International Standard Book Number
ISSN International Standard Serial Number
LCCN Library of Congress Control Number (also known as "Library of Congress Card Number")
OSIS Open Scriptural Information Standard
SICI Serial Item and Contribution Identifier
URI Uniform Resource Identifier
URL Uniform Resource Locator
URN Uniform Resource Name
ISBN and LCCN numbers must be recorded without spaces or hyphens. ISBNs must contain ten digits (that is, they must include the final check digit).
We strongly recommend the assignment of an ISBN to each published work using OSIS. This number must, if available, be specified in the identifier element for the work.
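The rule above (no spaces or hyphens, ten characters including the check digit) can be applied mechanically before an ISBN is written into an identifier element. The following is a minimal sketch in Python, not part of OSIS itself; the function name normalize_isbn10 is hypothetical. It also verifies the standard ISBN-10 check digit and accepts "X" as a final check character, both of which go beyond what the text above strictly requires.

```python
DIGITS = "0123456789"


def normalize_isbn10(raw: str) -> str:
    """Strip spaces and hyphens, then confirm a valid ten-character ISBN remains.

    Hypothetical helper: OSIS only asks for ten digits without separators;
    the checksum test is an extra safeguard based on the ISBN-10 standard.
    """
    cleaned = raw.replace("-", "").replace(" ", "").upper()
    if len(cleaned) != 10:
        raise ValueError(f"expected 10 characters, got {len(cleaned)}: {cleaned!r}")
    body, check = cleaned[:9], cleaned[9]
    if any(ch not in DIGITS for ch in body) or check not in DIGITS + "X":
        raise ValueError(f"unexpected character in ISBN: {cleaned!r}")
    # ISBN-10 checksum: weights 10 down to 1, with "X" counting as 10.
    total = sum(
        (10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(cleaned)
    )
    if total % 11 != 0:
        raise ValueError(f"check digit does not match: {cleaned!r}")
    return cleaned


print(normalize_isbn10("0-310-92955-5"))
```

The value returned is the form that would be placed in an identifier element whose type attribute is ISBN.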
The following examples show identifier elements used along with their type attribute to provide an identifier for a work, in this case, the "Cotton Patch Version of Luke and Acts" noted above:
<identifier type="ISBN">0809617250</identifier>
<identifier type="LCCN">69-18840</identifier>
Note that without the proper type attribute, a reader or computer only has a string of numbers, which could be from almost any system of identifiers. The type attribute plays an important role in making sure the information you so carefully record is understandable to others or even yourself, after a few months have lapsed since you looked at the text.
7.2.9. coverage
This element may be used to specify the spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity) to which the work applies. For example, an edition of Herodotus could be specified as Greek/Hellenic, Classical Period. Or a study of Medieval Bibles could declare coverage as Medieval.
7.2.10. description
An account of the content of the resource.
Examples of description include, but are not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.
7.2.11. format
The physical or digital manifestation of the resource.
Typically, format may include the media-type or dimensions of the resource. Format may be used to identify the software, hardware, or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types [MIME] defining computer media formats).
7.2.12. relation
A reference to a related resource.
Recommended best practice is to identify the referenced resource by means of a string or number conforming to a formal identification system.
7.2.13. rights
Information about rights held in and over the resource.
Typically, the rights element will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and other property rights. The content of the rights element is informative only. Legal rights and penalties for violation of those rights vary from jurisdiction to jurisdiction. Reuse of any resource should be done only after obtaining the necessary rights and permissions or ascertaining that none is required.
7.2.14. subject
A topic of the content of the resource.
Typically, the subject will contain keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.
7.2.14.1. subject classification systems
The type attribute on subject allows the user to specify the classification system in which the subject entered can be found.
<subject type="ATLA">Fathers of the Church</subject>
The above means that "Fathers of the Church" is a subject found in the listing of subjects maintained by the American Theological Library Association (ATLA). To assist users, the OSIS project has prepared an admittedly partial list of the better-known subject classification systems. Those systems, with their abbreviations for use in an OSIS-encoded text, are as follows:
NLSH National Library Subject Headings (National Library of Poland)
RSWK Regeln für den Schlagwortkatalog
SEARS Sears List of Subject Headings
SOG Soggettario
SWD_RSWK Swiss National Library
UDC Universal Decimal Classification
VAT Vatican Library
For classification systems not listed, insert the classification system with a leading "x-" in the type attribute, and notify the OSIS team if that system should be added in a future revision of the schema.
7.2.14.2. source
A reference to a resource from which the present resource is derived.
The present resource may be derived from the resource identified by the source element in whole or in part. Recommended best practice is to identify the referenced resource by means of a string or number conforming to a formal identification system.
7.2.14.3. type
The nature or genre of the content of the resource.
The type element contains terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the DCMI Type Vocabulary [DCT1]).
To describe the physical or digital manifestation of the resource, use the format element.
7.3. Non-Dublin Core Elements and Attributes in the Work Declaration
7.3.1. scope
Each scope element must have an osisRef attribute, which defines what part of the titled work occurs in this electronic edition. For example, an edition may consist of only the New Testament and Psalms, or of only a single book. Contiguous ranges may be specified using the hyphen notation described later for osisRefs in general; discontiguous ranges must be specified by including multiple scope elements, as shown in the second example above. These should be, but are not required to be, in canonical order.
7.3.2. castList
The castList element is just one of many places where OSIS reveals its reliance on prior Bible encoding proposals. This particular element was chosen based upon analysis of XSEM.
The castList element is composed of any number of castGroup elements.
7.3.2.1. castGroup
The castGroup element provides an easy way to consolidate information about the characters in a dramatic text for later reference in the encoding of that text. The castGroup element is composed of actor, role, and roleDesc elements. See the material under Dramatic Texts for details and examples on using the castList and related elements.
7.3.2.2. teiHeader
The content of a teiHeader element is not processed as part of the OSIS schema. It is provided for cases where a TEI header file or portion of a text needs to accompany a text that is now being encoded in OSIS markup.
7.3.2.3. refSystem
Since versification systems differ between Bible translations (the original texts had no versification at all), it is important to note here which versification system is being followed for a particular edition of the Bible that is being encoded. Works of classical authors as well may have differing reference systems.
The value here should be the osisWork attribute value from a work declared in a work element in the header of the document.
7.4. Elements allowed in work elements
The work element contains the following elements, which must appear in this order, although they can each be repeated as many times as necessary:
title
contributor
creator
subject
date
description
publisher
type
format
identifier
source
language
relation
coverage
rights
scope
castList
teiHeader
refSystem
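The ordering rule above can be checked mechanically. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree; the function name check_work_order and the sample work elements are hypothetical, and a schema validator remains the authoritative check. It only tests that children appear in the listed order, with repeats allowed.

```python
import xml.etree.ElementTree as ET

# Child elements of work, in the order listed above.
WORK_CHILD_ORDER = [
    "title", "contributor", "creator", "subject", "date", "description",
    "publisher", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights", "scope", "castList", "teiHeader",
    "refSystem",
]
RANK = {name: i for i, name in enumerate(WORK_CHILD_ORDER)}


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix if ElementTree added one."""
    return tag.rsplit("}", 1)[-1]


def check_work_order(work: ET.Element) -> list:
    """Return a list of ordering problems; an empty list means none were found."""
    problems = []
    last_rank = -1
    for child in work:
        name = local_name(child.tag)
        if name not in RANK:
            problems.append(f"unexpected element: {name}")
            continue
        if RANK[name] < last_rank:
            problems.append(f"{name} appears after {WORK_CHILD_ORDER[last_rank]}")
        else:
            last_rank = RANK[name]
    return problems


# Hypothetical work element with identifier placed before title.
sample = ET.fromstring(
    "<work><identifier type='ISBN'>0809617250</identifier><title>T</title></work>"
)
print(check_work_order(sample))
```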
8. Date formats
All dates in the header and in attributes should be in this standard format, which is based on IETF RFC 3339. However, it uses period rather than colon as the field separator (for consistency with other OSIS types), and adds features to allow for dates BCE, for approximate dates, for date ranges, for yearless dates (as used in many daily devotionals), for weekly dates, and for named times of day (such as used in many prayer books). There are three standard date formats; the prefixes that identify them are reserved, and may not be redefined via the osisWork attribute of any work element:
yearly:yyyy.mm.ddThh.mm.ss
Any number of fields may be left off from the right end; for example, if the seconds are dropped (along with the preceding period), the time refers to the entire minute specified; if the entire time section is left off (along with the preceding "T"), the string refers to the entire day.
The year must always have four digits. However, the year may be entirely omitted to indicate dates that apply to any year, such as in a book of 365 daily readings.
To indicate years before the common era, add an underscore ("_") before the first digit of the year (immediately following the colon). (A hyphen would be preferable, but it is already in use to indicate ranges in osisRefs.)
The entire date/time string (possibly including a leading underscore) may be preceded by "~", indicating that the time is approximate. No means is provided to express just how approximate a time may be.
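The rules above for the yyyy.mm.ddThh.mm.ss form can be sketched as a small parser. This is an illustrative Python sketch under stated assumptions, not a definitive implementation: the function name parse_osis_date is hypothetical, the reserved prefix (such as "yearly:") is assumed to have been removed already, every field after the year is assumed to be two digits, and yearless dates and named times are not handled.

```python
import re

# Optional "~" (approximate), optional "_" (BCE), a four-digit year, then
# fields that may be dropped from the right end, separated by periods.
_DATE_RE = re.compile(
    r"""
    ^(?P<approx>~)?
    (?P<bce>_)?
    (?P<year>\d{4})
    (?:\.(?P<month>\d{2})
      (?:\.(?P<day>\d{2})
        (?:T(?P<hour>\d{2})
          (?:\.(?P<minute>\d{2})
            (?:\.(?P<second>\d{2}))?
          )?
        )?
      )?
    )?$
    """,
    re.VERBOSE,
)

_FIELDS = ("year", "month", "day", "hour", "minute", "second")


def parse_osis_date(text: str) -> dict:
    """Split a prefix-free OSIS date string into its fields.

    Omitted fields come back as None, meaning the value covers the whole
    enclosing unit (for example, no time section means the entire day).
    """
    match = _DATE_RE.match(text)
    if match is None:
        raise ValueError(f"not in yyyy.mm.ddThh.mm.ss form: {text!r}")
    result = {
        "approximate": match["approx"] is not None,
        "bce": match["bce"] is not None,
    }
    for name in _FIELDS:
        value = match[name]
        result[name] = int(value) if value is not None else None
    return result


print(parse_osis_date("2003.12.30"))
print(parse_osis_date("~_0586"))
```

Range checks (for example, months above 12) are deliberately left out to keep the sketch short.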
weekly:n
When readings or other materials are specified as being for particular days of the week, this form must be used. The 'n' value may range from 1 to 7; 1 indicates Monday, in accordance with ISO 8601:2000.
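As a small illustration of the weekly form, the following Python sketch maps a "weekly:n" value to a day name, with 1 as Monday as stated above. The function name weekday_from_weekly is hypothetical.

```python
WEEKDAYS = (
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
)


def weekday_from_weekly(value: str) -> str:
    """Return the day name for a 'weekly:n' string, where n runs from 1 to 7."""
    prefix, sep, number = value.partition(":")
    if prefix != "weekly" or sep != ":" or number not in ("1", "2", "3", "4", "5", "6", "7"):
        raise ValueError(f"not a weekly date: {value!r}")
    return WEEKDAYS[int(number) - 1]


print(weekday_from_weekly("weekly:1"))
```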
As an alternative to quantitative times, a small set of named times is provided, which can be specified in place of the entire (post-"T") time section (the "T" itself remains). For example:
yearly:06-04T~(Vespers)
would be the identifier for a prayer, reading, or other work to be used at Vespers on June 4 of any year. The named times (which are case-sensitive) include: Vigils, Matins, Lauds, Terce, Sext, None, Vespers, Compline; Sunrise, Sunset; Morning, Afternoon, Evening, Night; AM, PM; Fajr, Zuhr, _Asr, Maghhrib, _Isha, Lail, Dzuha, _Id.
Some works will be primarily organized by dates and times: for example, lectionaries, daily devotionals, prayer books, historical time lines, etc. In such works, use the osisID attribute to identify a retrievable portion of that work. The value should be the applicable time in one of the formats just shown.
Typically, such works are organized in chronological order of the times specified; however, that is a user or publisher requirement and not one imposed by OSIS.
9. Title Pages
In order to make the encoding of title pages as found in standard works easier, OSIS 2.0 introduced the titlePage element. This element contains the following elements from the header: title, contributor, creator, subject, date, description, publisher, type, format, identifier, source, language, relation, coverage, which are explained in the material on the header section. Three additional elements are allowed: figure, milestone, and p. Due to the complexity of title pages, all of these elements may occur in any order inside the titlePage element.
The titlePage element can occur within the osis, osisText, and osisCorpus elements.
Users just starting with OSIS should use a minimum header and a simple titlePage element until they have gained some experience with text encoding and determining what is, or perhaps more importantly, what is not useful to have encoded in a work.
9.1. Elements allowed in the titlePage element
The titlePage element contains the following elements, which may appear in any order, and may be repeated as many times as necessary:
title
contributor
creator
subject
date
description
publisher
type
format
identifier
source
language
relation
coverage
figure
milestone
p
10. The div Element
While book, chapter, and verse numbers are a familiar and useful way of referring to locations in the Bible, they often conflict with the boundaries of parables, stories, genealogies, paragraphs, quotations, and other important units of understanding. Even to print a well-formatted Bible edition, and much more to support high-end search, annotation, and other capabilities, these meaningful units also must commonly be marked.
It is possible to encode a Bible using only book, chapter, and paragraph markup. However, most encoders also want to represent sections, verses, quotations, and so on. Higher-level structures are tagged as div, for "division", with a type attribute to specify the particular significance. div elements can occur within other div elements to any number of levels. The first and outermost div should occur immediately after the end of the header. For example,
1585160555Copyright 1995, 2003 American Bible SocietyBibleBible
Genesis
1In the beginning,...The earth was formless and void...
...
The div element is used for many top-level components, and so makes heavy use of the type attribute. The pre-defined types include the most common major divisions found in present-day Bibles and related works:
acknowledgement
afterword
annotant
appendix
article
back
body
book
bookGroup
chapter
colophon
commentary
concordance
coverPage
dedication
devotional
entry
front
gazetteer
glossary
imprimatur
index
introduction
majorSection
map
outline
paragraph
part
preface
section
subSection
titlePage
The main body of a Bible will typically consist of div elements of type="bookGroup" (such as each Testament, the Apocrypha, and perhaps smaller groups such as the Pentateuch, the Minor Prophets, etc.), plus any front and back matter divisions (the selection of which varies greatly between editions).
The books of the New Testament would be grouped as follows:
New Testament...
...div elements for individual books here...
Within each div of type="bookGroup", there will typically be book division types corresponding to each included canonical or deutero-canonical book. The book division type, <div type="book">, may contain <div type="majorSection"> (such as the sub-books in Psalms), <div type="section"> (typically topical divisions with headings), and <div type="subSection"> (occasional minor divisions within sections). A specific chapter element is provided and encouraged, though a <div type="chapter"> is also permissible.
Expanding the New Testament example to include the first two Gospels:
New Testament
Matthew
Chapter 1...text of chapter 1...
Mark
Chapter 1...text of chapter 1...
Below this point typical texts switch from successive levels of div elements, to more specific markup such as paragraphs, lists, quotations, inscriptions, and the like. Also at this level, the markup commonly begins to interact with verse markup.
Use of one of the types defined (provided) for div is mandatory when a provided type is applicable. For example, a colophon must be marked up as <div type="colophon">. If a type is needed but not provided, it may be added but must begin with "x-", to distinguish it from OSIS-standard values.
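The rule for div types (use a predefined value where one applies, otherwise a value beginning with "x-") can be expressed as a simple check. The following Python sketch uses the list of predefined types given earlier in this section; the function name is_acceptable_div_type is hypothetical, and the comparison is case sensitive because XML attribute values are.

```python
# Predefined div types, copied from the list earlier in this section.
PREDEFINED_DIV_TYPES = frozenset({
    "acknowledgement", "afterword", "annotant", "appendix", "article", "back",
    "body", "book", "bookGroup", "chapter", "colophon", "commentary",
    "concordance", "coverPage", "dedication", "devotional", "entry", "front",
    "gazetteer", "glossary", "imprimatur", "index", "introduction",
    "majorSection", "map", "outline", "paragraph", "part", "preface",
    "section", "subSection", "titlePage",
})


def is_acceptable_div_type(value: str) -> bool:
    """True for a predefined type or a non-empty custom type starting with 'x-'."""
    if value in PREDEFINED_DIV_TYPES:
        return True
    return value.startswith("x-") and len(value) > 2


for candidate in ("colophon", "x-sermon", "sermon", "Colophon"):
    print(candidate, is_acceptable_div_type(candidate))
```

A check like this cannot tell whether a custom "x-" value duplicates a predefined type under another name; that judgment stays with the encoder.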
Such markup forms the primary backbone of an OSIS document. Chapter and verse elements are important (particularly for retrieval), but are considered to be an overlay onto the more linguistic or thematic structure. Given the prevalence of book/chapter/paragraph divisions in modern translations, that is the hierarchy that prevails in all cases of conflict. Most cases of conflict will be with the marking of verses. Some translations do not follow the modern practice of using paragraphs in translation and therefore there is no conflict between verses and paragraphs. So long as verses and chapters do not cross the boundaries of other elements, they may be expressed as shown in the following example (NASB):
Mark Chapter 10
DivorceJesus then left that place and went into the region of Judea and across the Jordan. Again crowds of people came to him, and as was his custom, he taught them. Some Pharisees came and tested him by asking, "Is it lawful for a man to divorce his wife?" "What did Moses command you?" he replied. They said, "Moses permitted a man to write a certificate of divorce and send her away." "It was because your hearts were hard that Moses wrote you this law," Jesus replied. "But at the beginning of creation God 'made them male and female.' 'For this reason a man will leave his father and mother and be united to his wife,and the two will become one flesh.' So they are no longer two, but one. Therefore what God has joined together, let man not separate."When they were in the house again, the disciples asked Jesus about this. He answered, "Anyone who divorces his wife and marries another woman commits adultery against her. And if she divorces her husband and marries another man, she commits adultery."
...
The quotes that appear in this example present a special problem as they do cross the boundaries of verses in an edition. Rather than introduce the mechanism for such cases here, the traditional quotation marks are used in this example. See the section Elements that cross other elements for details on dealing with such quotations.
In cases where the translation follows the modern practice of using paragraphs that cross verse boundaries, it is necessary that all verses be marked using the techniques described in Elements that cross other elements. That may seem like a burden, but in order to have easy processing for a text, it is necessary that all similar parts of it be marked in the same way. As work on OSIS and other XML editors continues, it will become easier to concentrate on the substance of the text and allow automatic mechanisms to deal with the technical niceties of the underlying markup. Until then, however, you will need to compensate for the weaknesses of XML processors so that your Bible text can be easily produced for the WWW, print, cellphones and other devices.
10.1. Elements that may occur in div elements
The div element allows the following elements to occur within it:
• a • abbr • chapter • closer • date • div • divineName • figure • foreign • hi • index • inscription • lb • lg • list • mentioned • milestone • milestoneEnd (deprecated, don't use) • milestoneStart (deprecated, don't use) • name • note • p • q • reference • salute • seg • signed • speaker • speech • table • title • transChange • verse • w
It is a milestoneable element. It also allows for mixed (text and element) content.