From teherba.org

xtrans is a collection of classes which transform binary files (consisting of 8-bit bytes) or text files (strings of encoded characters) to XML and vice versa.

The package is a bridge from the Java and XML world to legacy file formats with

  • binary, packed decimal or EBCDIC fields,
  • column oriented record structures,
  • arrays of fields,
  • record type indicators and overlaying variants of subrecords,
  • length fields and so on.
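
To make one of these field types concrete, here is a minimal sketch of packed decimal handling. It assumes the usual layout (two BCD digits per byte, with the sign in the low nibble of the last byte). The class name PackedDecimalSketch and its methods are hypothetical and are not taken from the xtrans sources.

```java
/** Illustrative helper, NOT part of xtrans: packed decimal (COMP-3 style) fields. */
final class PackedDecimalSketch {

    /**
     * Decodes a packed decimal field: two BCD digits per byte, except the
     * last byte, which holds one digit plus the sign nibble.
     * Sign nibbles 0xD and 0xB are treated as negative, all others as positive.
     */
    static long decode(byte[] field) {
        long value = 0;
        for (int i = 0; i < field.length; i++) {
            int high = (field[i] >> 4) & 0x0F;
            int low = field[i] & 0x0F;
            value = value * 10 + high;
            if (i < field.length - 1) {
                value = value * 10 + low;        // ordinary digit
            } else if (low == 0x0D || low == 0x0B) {
                value = -value;                  // sign nibble of the last byte
            }
        }
        return value;
    }

    /**
     * Encodes a value into a field of the given length in bytes.
     * Digits that do not fit into the field are silently dropped (assumption).
     */
    static byte[] encode(long value, int length) {
        byte[] field = new byte[length];
        long rest = Math.abs(value);
        // last byte: lowest digit in the high nibble, sign in the low nibble
        field[length - 1] = (byte) (((rest % 10) << 4) | (value < 0 ? 0x0D : 0x0C));
        rest /= 10;
        for (int i = length - 2; i >= 0; i--) {
            int low = (int) (rest % 10);
            rest /= 10;
            int high = (int) (rest % 10);
            rest /= 10;
            field[i] = (byte) ((high << 4) | low);
        }
        return field;
    }
}
```

With this sketch, the two bytes 0x12 0x3C decode to 123, and 0x12 0x3D to -123.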


The package is a building block for:

  • analysis, checking and extraction programs,
  • manipulation of files in one format,
  • general (file) converters between different formats.

The package abstracts from the lexical peculiarities of a format. It facilitates syntactical analyses and transformations, since these tasks can all be done in a uniform way in the XML world, normally by one or more XSLT stylesheets.
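
As an illustration of such a uniform XML-side step, the following sketch applies a small XSLT stylesheet through the standard javax.xml.transform API. The element names (file, rec, name) and the class XsltExtractSketch are assumptions made for this example; they are not the XML vocabulary produced by xtrans.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Illustrative only: extracts all "name" elements with an XSLT 1.0 stylesheet. */
final class XsltExtractSketch {

    // Hypothetical stylesheet: writes every <name> value followed by ';'
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:for-each select=\"//name\">"
        + "<xsl:value-of select=\".\"/><xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String extractNames(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT step failed", e);
        }
    }
}
```

The same stylesheet would work for any source format, as long as its transformer emits the assumed element names.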

The m * n complexity of converting from m different input formats to n output formats is reduced to m + n by introducing an intermediate format, in this case XML (eXtensible Markup Language). For example, with 5 input and 4 output formats, 9 transformers replace 20 direct converters.

The transformers in this package convert between a specific format and XML. Ideally the conversion of a file to XML and back to the specific format reproduces the original file byte for byte. Sometimes, there are minor, irrelevant differences like spacing, suppression of comments etc.

The design of the whole package is such that the generated XML is as close to the specific format as possible. This may seem unnecessary when converting to XML, but it is very helpful when converting from XML to the specific format. In general, the sequential order of the XML elements is essential for the correct serialization to the specific format.

In order to solve the general conversion problem, applications must still understand the whole syntactical (but not the lexical) structure and the semantics of both the source and the target format.


Binary files are processed with the Byte<name> series of classes, while text files are processed with the Char<name> series of classes.

The two classes ByteRecord and CharRecord have an underlying buffer and define

  • a record with fields, each of which has a fixed length, a name, and a starting position in the record,
  • access methods (setters and getters) for these fields,
  • EBCDIC and packed decimal conversion in the case of ByteRecord, and all Java defined character encodings in the case of CharRecord,
  • methods for reading and writing such a record from/to a file,
  • methods for XML encoding of a record and corresponding parsing.
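
The idea of a buffer with named, fixed-length fields can be sketched as follows. This is a simplified, hypothetical class (FixedRecordSketch) written for illustration. It does not reproduce the actual CharRecord API, and the XML layout it emits is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only; names are hypothetical and NOT the xtrans API. */
final class FixedRecordSketch {

    /** Start position and length of one field in the buffer. */
    private static final class Field {
        final int start;
        final int length;
        Field(int start, int length) { this.start = start; this.length = length; }
    }

    private final char[] buffer;
    private final Map<String, Field> fields = new LinkedHashMap<>();
    private int nextStart = 0;

    FixedRecordSketch(int recordLength) {
        buffer = new char[recordLength];
        Arrays.fill(buffer, ' ');
    }

    /** Defines a field directly behind the previously defined one. */
    void define(String name, int length) {
        if (nextStart + length > buffer.length) {
            throw new IllegalArgumentException("field exceeds record: " + name);
        }
        fields.put(name, new Field(nextStart, length));
        nextStart += length;
    }

    /** Fills the buffer from one line; short lines are padded with blanks. */
    void load(String line) {
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = i < line.length() ? line.charAt(i) : ' ';
        }
    }

    /** Returns the whole record as it would be written to the file. */
    String line() { return new String(buffer); }

    /** Stores a value left-justified, padded with blanks or truncated. */
    void set(String name, String value) {
        Field f = field(name);
        for (int i = 0; i < f.length; i++) {
            buffer[f.start + i] = i < value.length() ? value.charAt(i) : ' ';
        }
    }

    /** Returns the field content without trailing whitespace. */
    String get(String name) {
        Field f = field(name);
        return new String(buffer, f.start, f.length).replaceAll("\\s+$", "");
    }

    /** Renders the record as XML, one element per field in definition order. */
    String toXml(String recordName) {
        StringBuilder xml = new StringBuilder("<" + recordName + ">");
        for (String name : fields.keySet()) {
            xml.append('<').append(name).append('>')
               .append(escape(get(name)))
               .append("</").append(name).append('>');
        }
        return xml.append("</").append(recordName).append('>').toString();
    }

    private Field field(String name) {
        Field f = fields.get(name);
        if (f == null) throw new IllegalArgumentException("unknown field: " + name);
        return f;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Keeping the fields in definition order matters here: the XML elements then appear in the same sequence as the columns of the record.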

These two classes were developed mainly for the processing of records in conventional (legacy) file formats. RecordBase is the abstract superclass for both.

As a convenience, there is a third class BeanRecord, which has no underlying record buffer and which allows fields (properties) to be defined with only a few standard Java types: long, String, java.sql.Date and java.sql.Timestamp. The access and XML conversion methods are similar to those of the two record classes based on buffers.

The four record classes mentioned above share much common code, and are therefore generated from a file Records.txt with the standalone program LineSplitter.


The transformation to and from XML is handled by specific classes derived from ByteTransformer and CharTransformer respectively, which are both derived from BaseTransformer. The main methods in these classes are

  • generate() which creates an XML file from the foreign file format, and
  • serialize() which creates the foreign file format back from an XML file.

The latter method uses a standard SAX parser.

Implementors are encouraged to define both transformation directions wherever possible, and to test the identical reproduction of the foreign file format from XML instances.
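
The two directions and the recommended round-trip test can be sketched as below. This is a toy transformer for plain text lines, not a class derived from CharTransformer: the class RoundTripSketch, its static method signatures, and the file/line element names are assumptions. It also assumes that lines end with "\n" only and contain no characters that are illegal in XML.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: a toy "foreign format" consisting of text lines. */
final class RoundTripSketch {

    /** Foreign format to XML: one (hypothetical) <line> element per input line. */
    static String generate(String foreign) {
        StringBuilder xml = new StringBuilder("<file>");
        // limit -1 keeps a trailing empty string, so a final "\n" survives
        for (String line : foreign.split("\n", -1)) {
            xml.append("<line>")
               .append(line.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"))
               .append("</line>");
        }
        return xml.append("</file>").toString();
    }

    /** XML back to the foreign format, using a standard SAX parser. */
    static String serialize(String xml) {
        final StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inLine = false;
            private boolean first = true;

            @Override
            public void startElement(String uri, String local, String qName, Attributes attrs) {
                if (qName.equals("line")) {
                    if (!first) out.append('\n');   // separator between lines
                    first = false;
                    inLine = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // may be called several times for one element, so append
                if (inLine) out.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("line")) inLine = false;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse XML", e);
        }
        return out.toString();
    }
}
```

A reproduction test then simply checks that serialize(generate(original)) equals the original content.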

XML Record Specification

The record and transformer classes can be used independently of each other, but for legacy file formats from z/OS and COBOL, they are normally combined. The level of abstraction is raised further by the possibility of defining a record via an XML description. In this case, the specific Java record class derived from ByteRecord or CharRecord is generated completely with the aid of the XSLT stylesheet genBean.xsl found in the etc/xslt subdirectory of the source distribution. The etc/spec subdirectory contains examples of XML record definitions, among them one for the German DTA format.

The XML specification allows the definition of:

  • general attributes of the record like name, type (Byte, Char, or Bean),
  • variants of overlaying subrecords,
  • arrays of fields, and
  • fields with name, length, type (num, decimal, ebcdic, string etc.), remark, and optional offset in the record buffer.

XML generation from the record's fields and SAX parsing are already predefined in the generated record class. The transformer must still be written manually, but apart from a very simple standard pattern it will only contain format-specific operations such as file handling or checksum computation.


For simple tabular record specifications there is a possibility to generate an SQL CREATE statement with the genBean.xsl stylesheet also found in etc/spec.
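
xtrans derives the statement by XSLT. The Java sketch below only illustrates the underlying idea of mapping field definitions to columns. The class CreateTableSketch and the mapping from the BeanRecord field types to SQL types are assumptions for this example, not the rules implemented by the stylesheet.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative only: builds a CREATE TABLE statement from field definitions. */
final class CreateTableSketch {

    /** Assumed mapping of the Java field types to SQL column types. */
    static String sqlType(String javaType, int length) {
        switch (javaType) {
            case "long":               return "BIGINT";
            case "String":             return "VARCHAR(" + length + ")";
            case "java.sql.Date":      return "DATE";
            case "java.sql.Timestamp": return "TIMESTAMP";
            default:
                throw new IllegalArgumentException("unsupported type: " + javaType);
        }
    }

    /** Joins the columns (name to SQL type, in iteration order) into one statement. */
    static String createTable(String table, Map<String, String> columns) {
        StringJoiner cols = new StringJoiner(", ", "CREATE TABLE " + table + " (", ");");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            cols.add(column.getKey() + " " + column.getValue());
        }
        return cols.toString();
    }
}
```

Passing a LinkedHashMap keeps the columns in the order of the record's fields.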