Xtrans

From teherba.org

Applications

JavaImportChecker

This class is a pseudo serializer to be applied after JavaTransformer. It checks the import statements in a Java source file and reports:

  • superfluous imports which are never used
  • missing imports, which may be reported because:
    • the class is in the same package
    • classes prefixed with their package name (for example java.util.Date) are not properly recognized by the tool; these should also be imported explicitly
    • inherited enums are not properly recognized by the tool

The tool treats a name as a class name if it starts with an uppercase letter [A-Z] and contains a lowercase letter [a-z] somewhere after it.
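The naming rule above can be sketched with a regular expression. This is only an illustration of the rule as described here, not the code of JavaImportChecker; the class name ClassNameHeuristic, the method looksLikeClassName and the use of \w for the remaining characters are assumptions.

```java
import java.util.regex.Pattern;

final class ClassNameHeuristic {
    // Assumed reading of the rule: the first character is in [A-Z], and at
    // least one character in [a-z] occurs anywhere after it.
    private static final Pattern CLASS_NAME = Pattern.compile("[A-Z]\\w*[a-z]\\w*");

    /** Returns true if the identifier would be treated as a class name. */
    static boolean looksLikeClassName(String identifier) {
        return CLASS_NAME.matcher(identifier).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Date", "XPathLink", "MAX_VALUE", "count", "T"};
        for (String name : samples) {
            System.out.println(name + " -> " + looksLikeClassName(name));
        }
    }
}
```

Under this reading, constants written in all capitals and single-letter type parameters are not treated as class names.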

All sources in a project can be checked with a shell command line like:

find ../xtool/src -iname "*.java" | xargs -l -ißß java -jar dist/xtrans.jar -java ßß -jimp

The corresponding output was:

SchemaBean
PathStack
XPathLink
XPathSelect
XmlnsPrefix
XtoolServlet
  import only:	Enumeration
  import only:	InputStream
  import only:	ServletConfig
  import only:	ServletContext
  import only:	ZipFile
SchemaBeanBase
  import only:	Date
  import only:	Timestamp
PathElement
IndexPage
  import only:	HttpSession
  import only:	Iterator
Messages
SchemaList
XmlnsXref
NonClosingInputStream
SchemaArray
  use only:	Date
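The two labels in this output can be read as two set differences: names that are imported but never used, and names that are used but never imported. The following sketch assumes that reading; the class ImportDiffSketch, the method minus and the sample names are hypothetical and do not come from the Xtrans sources.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

final class ImportDiffSketch {
    /** Returns the names in 'left' that do not occur in 'right', sorted. */
    static Set<String> minus(Set<String> left, Set<String> right) {
        Set<String> result = new TreeSet<>(left);
        result.removeAll(right);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input: simple class names collected from one source file.
        Set<String> imported = new TreeSet<>(Arrays.asList("Date", "Iterator", "HttpSession"));
        Set<String> used = new TreeSet<>(Arrays.asList("Date", "Timestamp"));

        for (String name : minus(imported, used)) {
            System.out.println("  import only:\t" + name); // superfluous import
        }
        for (String name : minus(used, imported)) {
            System.out.println("  use only:\t" + name);    // possibly missing import
        }
    }
}
```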

Bugs

General Problems

  • Though most transformers convert from the raw (specific) format to an XMLized representation, there are a few exceptions where general binary or text files are converted to the specific format which is then wrapped into XML. Examples are Base64, Quoted Printable and Morse Code.
  • Most transformers store values in XML elements, but sometimes it seemed easier to store them in attributes of elements. DTA and Datev are examples of the latter case.
  • For formats with many different tags (SWIFT for example) the question arises whether such tags are syntax or data. These tags can be converted to id attributes of a generalized XML "field" element, or a separate element for each such tag can be generated. The SwiftTransformer made the latter decision.
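The two tag representations from the last item can be compared side by side. The sketch below only illustrates the trade-off; it is not the output format of SwiftTransformer. The class TagRepresentationSketch, its methods, the illustrative tag 20 and the "f" name prefix are assumptions (the prefix is used here because an XML element name cannot start with a digit).

```java
final class TagRepresentationSketch {
    /** Escapes the characters that are significant in XML text and attribute values. */
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    /** Tag treated as data: one generic "field" element, tag stored in an id attribute. */
    static String asGenericField(String tag, String value) {
        return "<field id=\"" + escape(tag) + "\">" + escape(value) + "</field>";
    }

    /** Tag treated as syntax: one element per tag (hypothetical "f" prefix). */
    static String asOwnElement(String tag, String value) {
        String name = "f" + tag;
        return "<" + name + ">" + escape(value) + "</" + name + ">";
    }

    public static void main(String[] args) {
        System.out.println(asGenericField("20", "REF123"));
        System.out.println(asOwnElement("20", "REF123"));
    }
}
```

The generic form accepts any tag without changes to the element vocabulary, while the per-tag form lets a schema or stylesheet address each tag by element name.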

Test

  • Not all format conversions are precisely reversible.
  • There are only a few test cases.

Incomplete Transformers

  • general.XMLTransformer - insufficient serialization of entities; serializer should be replaced by Apache's
  • general.CountingTransformer - cannot generate, but serializes any XML to a sorted list with counts for all elements, and the accumulated length of their direct character content
  • net.URITransformer - the set of supported URI schemes is incomplete, and serializing is not implemented.
  • organizer.LDIFTransformer - not well tested, and serializing is not implemented.

Hints for Developers

Xtrans currently processes only a limited set of formats. You are encouraged to:

  • play with the format transformer classes,
  • email any suggestions for improvement,
  • contribute patches for corrections,
  • contribute new transformer classes.

Coding conventions

Please try to remain close to the current programming style:

  • Write Javadoc comments before all methods and public members.
  • Note that the Java sources are compiled with UTF-8 source encoding:
   <javac  srcdir="${src.home}" destdir="${build.classes}" listfiles="yes"
           encoding="utf8"
           source="1.4" target="1.4"
           debug="${javac.debug}" debuglevel="${javac.debuglevel}">
Determine the proper accents and non-ASCII characters, and write them in Unicode in the Java source files. Use a Unicode-enabled editor that handles UTF-8 properly; write some Unicode characters in the header comment so that the editor can detect the UTF-8 encoding.
  • Use reliable sources for the format definition like RFCs or ISO standards, and document them in the Javadoc header of the class.

Reversibility

The transformers should try to serialize XML to exactly the same specific format from which they are able to generate XML. The test Ant targets perform a "generate - serialize - binary compare" sequence to check the reversibility of the transformation.
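The "generate - serialize - binary compare" sequence can be sketched in Java as follows. This is not the Ant test target itself; the class RoundTripCheck, the method isReversible and the toy Base64 wrapper that stands in for a real transformer are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.function.Function;

final class RoundTripCheck {
    /** Generates XML from the raw bytes, serializes it back, and compares byte by byte. */
    static boolean isReversible(byte[] original,
                                Function<byte[], String> generate,
                                Function<String, byte[]> serialize) {
        byte[] roundTripped = serialize.apply(generate.apply(original));
        return Arrays.equals(original, roundTripped);
    }

    // Toy stand-in for a transformer: wraps the Base64 text of the input in one element.
    static String toyGenerate(byte[] raw) {
        return "<data>" + Base64.getEncoder().encodeToString(raw) + "</data>";
    }

    // Toy stand-in for the matching serializer: strips the wrapper and decodes.
    static byte[] toySerialize(String xml) {
        String payload = xml.substring("<data>".length(), xml.length() - "</data>".length());
        return Base64.getDecoder().decode(payload);
    }

    public static void main(String[] args) {
        byte[] sample = "Grüße from Xtrans".getBytes(StandardCharsets.UTF_8);
        System.out.println(isReversible(sample, RoundTripCheck::toyGenerate, RoundTripCheck::toySerialize));
    }
}
```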

Some formats don't have a well-defined canonical representation. In JCL, for example, the line breaks and the spaces for field separation are lost in the XML representation, and cannot be reproduced exactly by the serializer. In these cases, repeated "generate - serialize" sequences should eventually produce an identical result.
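For such formats the weaker requirement is that the round trip reaches a stable result. A minimal sketch, assuming that one "generate - serialize" pass is modelled as a single function and that the second pass must already reproduce the first; the class StableRoundTripCheck and the whitespace-collapsing toy round trip are hypothetical and much simpler than real JCL handling.

```java
import java.util.function.UnaryOperator;

final class StableRoundTripCheck {
    /** Returns true if a second round trip reproduces the result of the first one. */
    static <T> boolean stabilizesAfterFirstPass(T original, UnaryOperator<T> roundTrip) {
        T first = roundTrip.apply(original);
        T second = roundTrip.apply(first);
        return first.equals(second);
    }

    // Toy round trip: the spacing between fields is lost and written back as one space.
    static String toyRoundTrip(String line) {
        return line.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String line = "//STEP1    EXEC   PGM=IEFBR14";
        System.out.println(toyRoundTrip(line));
        System.out.println(stabilizesAfterFirstPass(line, StableRoundTripCheck::toyRoundTrip));
    }
}
```

The first pass may differ from the original input; only the comparison between the first and the second pass is expected to succeed.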

Future Extensions

  • more text processing formats:
    • (La)TeX - similar to RTF
    • dot instruction oriented formats: IBM DCF, nroff, troff, perldoc
    • binary formats like IBM DCA/RFT, Siemens Hit, WordPerfect
    • common tagset for text processing features
  • raster image processing formats:
    • TIFF
    • EXIF - at least the header
    • GIF, BMP etc.
  • vector image processing formats with target SVG:
    • WMF
    • Flash?
    • RTF DO, AmiPro, WordPerfect Graphics ...
  • ZIP file tree pseudo transformer