From teherba.org

SchemaList - List the Element Tree of a W3C XML Schema

Starting with the first xs:element definition, the type hierarchy of a W3C schema file is recursively expanded. The output is a linear, indented list of the unfolded possible substructures, the leaf XML elements and their attributes.
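
The recursive expansion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual implementation: the names `list_elements`, `child_elements` and `SAMPLE_XSD` are hypothetical, the sample schema is made up, and the sketch assumes named global complex types without recursion, `xs:extension` or `ref=` indirection.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
COMPOSITORS = {XS + "sequence", XS + "choice", XS + "all"}

# Made-up miniature schema, used only to exercise the sketch.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Document" type="Document"/>
  <xs:complexType name="Document">
    <xs:sequence>
      <xs:element name="GrpHdr" type="GroupHeader"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="GroupHeader">
    <xs:sequence>
      <xs:element name="MsgId" type="xs:string"/>
      <xs:element name="CreDtTm" type="xs:dateTime"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def child_elements(node):
    """Yield xs:element children, descending through compositors only."""
    for child in node:
        if child.tag == XS + "element":
            yield child
        elif child.tag in COMPOSITORS:
            yield from child_elements(child)


def list_elements(xsd_text, indent="    "):
    """Return indented element names, starting at the first global xs:element."""
    root = ET.fromstring(xsd_text)
    # Index the named complex types so that type references can be unfolded.
    types = {t.get("name"): t for t in root.findall(XS + "complexType")}
    lines = []

    def expand(element, depth):
        name = element.get("name") or element.get("ref") or "?"
        lines.append(indent * depth + name)
        # Drop a namespace prefix such as "xs:" from the type reference.
        type_name = (element.get("type") or "").split(":")[-1]
        ctype = types.get(type_name)
        if ctype is None:
            # Fall back to an anonymous, inline complex type.
            ctype = element.find(XS + "complexType")
        if ctype is None:
            return  # leaf element with a simple type
        for child in child_elements(ctype):
            expand(child, depth + 1)

    first = root.find(XS + "element")
    if first is not None:
        expand(first, 0)
    return lines


print("\n".join(list_elements(SAMPLE_XSD)))
```

A schema with recursive type definitions would make this sketch loop forever; a real implementation needs a depth limit or a set of types already being expanded.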

Optionally, the tool generates element values, and it can add comments which show the schema type, data type, restrictions and annotations attached to each element. With value generation and selection of the first alternative of each choice, the output is a well-formed XML instance which usually validates against the input schema. This representation has the advantage that it shows the schema design and a real instance of that schema combined in a single XML document.

The schema list can be shown as HTML (the default), plain text, pure XML, or in a tab-separated format which is suitable for MS-Excel. In the HTML output, each start tag carries a link showing the XPath to the element. In Excel, columns can easily be hidden or appended. Such a worksheet is then a good base for the development of additional restrictions, mapping rules and the like.

The tool may be called on the command line or from a web page. The following optional settings may be specified:

  • -c show comments with types, restrictions, patterns etc.
  • -e enc source file encoding, default: UTF-8
  • -e enc target file encoding, default: UTF-8
  • -f show first alternative of choices only
  • -m mode output mode: "html" (default), "plain", "tsv" (for MS-Excel) or "xml"
  • -s show start tags only (no end tags)
  • -v generate element values

These options may be combined. Typical settings are:

  • -cvf generate a well-formed, commented XML instance from the schema
  • -s only show the minimal indented element structure without end tags
  • -cv -m tsv generate an Excel worksheet with comments and values
  • -v -m xml show elements with values in the browser's XML representation (elements can be collapsed)
  • -vf -m plain generate a concise XML instance file which may be stored and validated

In the web interface, the user specifies the desired options and uploads the input schema file to the application on the web server.

Example for HTML output (-cv -m html)

<Document xmlns="urn:sepade:xsd:pain.001.001.02"><!--[1..1] Document -->
     <pain.001.001.02><!--[1..1] pain.001.001.02 -->
         <GrpHdr><!--[1..1] GroupHeader20 -->
             <MsgId>Max35Text</MsgId><!--[1..1] Max35Text string 1..35 -->
             <CreDtTm>2007-06-29T05:30:00Z</CreDtTm><!--[1..1] ISODateTime dateTime -->
             <NbOfTxs>09</NbOfTxs><!--[1..1] Max15NumericText string /[0-9]{1,15}/ -->
             <CtrlSum>1</CtrlSum><!--[0..1] DecimalNumber decimal L18.17 ! SEPA AOS Can optionally be used as specification for the total amount of the file. -->
             <Grpg>GRPD</Grpg><!--[1..1] Grouping2Code string "GRPD" ! Only the GRPD option may be used.-->
             <InitgPty><!--[1..1] PartyIdentification20 ! Initiating party. -->
                 <Nm>Max70Text</Nm><!--[1..1] Max70Text string 1..70 ! AT-02 Name of the originator.-->
                 <PstlAdr><!--[0..1] PostalAddress5 ! AT-03 Address of the originator.-->
                     <AdrLine>Max70Text</AdrLine><!--[1..2] Max70Text string 1..70 -->
                     <Ctry>AZ</Ctry><!--[1..1] CountryCode string /[A-Z]{2,2}/ -->
                 <Id><!--[0..1] Party5Choice ! AT-10 - ID of the originator. Recommendation: This field should not be used.-->
                     <__unresolvedChoice__><!--[1..1] -->
                       <OrgId><!--[1..1] OrganisationIdentification2 -->
                           <BIC>COBADEFF</BIC><!--[0..1] BICIdentifier string /[A-Z]{6,6}[A-Z2-9][A-NP-Z0-9]([A-Z0-9]{3,3}){0,1}/ -->
                           <IBEI>AZBDFHJNP0</IBEI><!--[0..1] IBEIIdentifier string /[A-Z]{2,2}[B-DF-HJ-NP-TV-XZ0-9]{7,7}[0-9]{1,1}/ -->
                           <BEI>BEIADEFF</BEI><!--[0..1] BEIIdentifier string /[A-Z]{6,6}[A-Z2-9][A-NP-Z0-9]([A-Z0-9]{3,3}){0,1}/ -->
                           <EANGLN>0909090909090</EANGLN><!--[0..1] EANGLNIdentifier string /[0-9]{13,13}/ -->

Example for output in MS-Excel

[Figure: Schema list in Excel worksheet]

The columns of the Excel worksheet are filled as follows:

A indented elements with generated values
B cardinality, multiplicity: minOccurs and maxOccurs
C schema type
D elementary XML datatype
E restrictions: string lengths, number ranges, patterns, value enumerations
F annotations attached to the element
G absolute XPath to this element node
H a single ";" in all relevant (non-descriptive) rows, useful for hiding rows
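
As a hedged illustration of how such a worksheet might be post-processed outside Excel, the following Python sketch reads tab-separated text and keeps only the rows marked with ";" in column H. The column keys in `COLUMNS`, the function `relevant_rows` and the two sample rows are assumptions made for this example; the exact cell contents produced by the tool may differ.

```python
import csv
import io

# Hypothetical keys for the columns A to H described above.
COLUMNS = ["element", "cardinality", "schema_type", "datatype",
           "restrictions", "annotation", "xpath", "marker"]

# Illustrative rows only; the real cell formats are an assumption.
SAMPLE_TSV = (
    "<MsgId>Max35Text</MsgId>\t1..1\tMax35Text\tstring\t1..35\t\t"
    "/Document/pain.001.001.02/GrpHdr/MsgId\t;\n"
    "<__unresolvedChoice__>\t1..1\t\t\t\t\t\t\n"
)


def relevant_rows(tsv_text):
    """Return the rows whose column H contains a single ';'."""
    # QUOTE_NONE keeps quotation marks in restrictions such as "GRPD" literal.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        # Pad short rows so that every row has all eight columns.
        fields = fields + [""] * (len(COLUMNS) - len(fields))
        row = dict(zip(COLUMNS, fields))
        if row["marker"] == ";":
            rows.append(row)
    return rows


for row in relevant_rows(SAMPLE_TSV):
    print(row["xpath"], row["cardinality"], row["restrictions"])
```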

Special Rows

There are two sorts of special rows which are useful for the description of the schema, but which will not lead to a valid XML instance.

  1. With option "-c", any attribute is shown on a separate line starting with "@", since attributes also have types, restrictions etc.
  2. Without option "-f", any <xs:choice> leads to an artificial element <__unresolvedChoice__> in order to make visible that this choice must still be resolved to yield a valid XML instance. This resolution could be realized by an XSLT stylesheet or by manual editing.
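
The text above suggests an XSLT stylesheet or manual editing for this resolution. As an alternative sketch in Python (not the method used by the tool), the following code replaces every `<__unresolvedChoice__>` element by its first alternative. The function names are hypothetical, the sample fragment with `PrvtId` as second alternative is illustrative, and the sketch assumes the list was generated as well-formed XML. Comments in the input are dropped by the default parser.

```python
import xml.etree.ElementTree as ET

PLACEHOLDER = "__unresolvedChoice__"


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def resolve_choices(element):
    """Replace placeholder children by their first alternative, in place."""
    new_children = []
    for child in list(element):
        # A choice may be nested directly inside another choice.
        while child is not None and local_name(child.tag) == PLACEHOLDER:
            child = child[0] if len(child) else None
        if child is None:
            continue  # empty placeholder: nothing to choose from
        resolve_choices(child)
        new_children.append(child)
    element[:] = new_children


# Illustrative fragment; PrvtId stands in for a second alternative.
fragment = (
    "<Id><__unresolvedChoice__>"
    "<OrgId><BIC>COBADEFF</BIC></OrgId>"
    "<PrvtId/>"
    "</__unresolvedChoice__></Id>"
)
root = ET.fromstring(fragment)
resolve_choices(root)
print(ET.tostring(root, encoding="unicode"))
```

Always taking the first alternative mirrors what option "-f" does at generation time; choosing a different alternative would require a rule per choice, for example keyed by XPath.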

Value Generation

The tool tries to generate validating values for the most common cases. This works rather well for the ISO 20022 message schemata relevant to SEPA (camt, pacs, pain families), but it may fail for complicated patterns or for other application areas.

The values are generated with a fixed set of rules which depend on the elementary data type, sometimes the schema type, and the restrictions. The following table shows these rules:

Datatype   Restriction, Schema Type    Generated Value
boolean                                true
decimal                                1
dateTime                               2007-06-29T04:30:00Z (?)
date                                   2007-06-29
NCName                                 NCName
string     (pattern)                   (the characters from the pattern repeated up to a minimal length)
string     (length)                    (schema type name truncated or padded with letters)
string     (enumeration)               (the first alternative)
string     CurrencyCode                EUR
string     IBANIdentifier              DE28500400000123456589
string     BICIdentifier               COBADEFF
string     BEIIdentifier               PUTMDEEM
string     CHIPSUniversalIdentifier    CH012345
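
A rule table of this kind can be expressed as a small lookup function. The following Python sketch is only an approximation under stated assumptions: `generate_value` and its parameters are hypothetical names, the padding letter "x" is an arbitrary choice (the table only says "letters"), and the pattern rule is left out because its exact behaviour is not specified here.

```python
# Fixed values keyed by schema type, taken from the table above.
FIXED_BY_SCHEMA_TYPE = {
    "CurrencyCode": "EUR",
    "IBANIdentifier": "DE28500400000123456589",
    "BICIdentifier": "COBADEFF",
    "BEIIdentifier": "PUTMDEEM",
    "CHIPSUniversalIdentifier": "CH012345",
}

# Fixed values keyed by elementary data type, taken from the table above.
FIXED_BY_DATATYPE = {
    "boolean": "true",
    "decimal": "1",
    "dateTime": "2007-06-29T04:30:00Z",
    "date": "2007-06-29",
    "NCName": "NCName",
}


def generate_value(datatype, schema_type="", enumeration=(),
                   min_length=0, max_length=None):
    """Pick a value by schema type, then data type, then string restriction."""
    if schema_type in FIXED_BY_SCHEMA_TYPE:
        return FIXED_BY_SCHEMA_TYPE[schema_type]
    if datatype in FIXED_BY_DATATYPE:
        return FIXED_BY_DATATYPE[datatype]
    if datatype == "string":
        if enumeration:
            return enumeration[0]  # the first alternative
        # Length rule: schema type name, truncated or padded.
        value = schema_type
        if max_length is not None:
            value = value[:max_length]
        return value.ljust(min_length, "x")  # "x" is an assumed padding letter
    return ""  # no rule known for this data type


print(generate_value("string", "Max35Text", min_length=1, max_length=35))
print(generate_value("string", "Grouping2Code", enumeration=["GRPD"]))
```

Checking the schema type before the data type lets special cases such as `IBANIdentifier` override the generic string rules; whether the tool uses the same precedence is an assumption.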