Archive for the ‘Open Source’ Category


Apache FOP

In Open Source, Product Development, Technology, Uncategorized on May 12, 2010 by petrem66

I’ve been using this transformer on many projects so far. It is quite powerful in that it can build documents from a structured definition written according to the XSL-FO specification. The utility of such a package is obvious, since building dynamic documents is necessary in many B2B or B2C applications. Apache FOP is a good fit when processing speed and memory footprint are not issues, but it falls short where performance matters.
The package operates as a four-step transformation of the input definition (the XSL-FO document). First, the package loads and configures its main objects, including the renderer, and loads the available fonts into internal objects.
Second, the package loads the input, constructing a hierarchy of FONode objects (Root is at the top) using a plain old SAX parser and a custom-made ContentHandler.
The third step kicks in when one of the page-sequence elements has finished building, at the end-element method call of the ContentHandler. The step consists of transforming the FONode-based hierarchy into an internal format based on Block and its specialized subclasses. This structure is an abstraction of the layout of the document, a sort of medium-independent view.
Finally, during the fourth step, based on the chosen renderer (one has to pass in the MIME type of the expected output), the Block-based hierarchy is rendered to a final output that can be PDF, PNG, TIFF, JPG, HTML or even RTF.
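To make steps two and three more concrete, here is a minimal sketch using only the JDK's SAX API. It is not FOP's code: SketchNode is a hypothetical stand-in for FONode, TreeBuildingHandler is a made-up name, and layOut() is just a placeholder for the real layout work. It only shows the pattern described above, where the tree is built from SAX events and each page-sequence is handed off as soon as its end tag arrives.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for FOP's FONode: a name plus child nodes.
class SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name) {
        this.name = name;
    }
}

// Builds the node tree from SAX events (step two) and hands each finished
// page-sequence over to layout when its end tag is seen (step three).
class TreeBuildingHandler extends DefaultHandler {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    SketchNode root;                                     // top of the tree
    final List<SketchNode> laidOut = new ArrayList<>();  // sequences passed to layout
    private final Deque<SketchNode> open = new ArrayDeque<>(); // currently open elements

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        SketchNode node = new SketchNode(localName);
        if (open.isEmpty()) {
            root = node;                      // first element becomes the root
        } else {
            open.peek().children.add(node);   // attach to the enclosing element
        }
        open.push(node);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        SketchNode finished = open.pop();
        if (FO_NS.equals(uri) && "page-sequence".equals(localName)) {
            layOut(finished);                 // step three starts here
        }
    }

    // Placeholder: a real implementation would build the layout structure here.
    private void layOut(SketchNode pageSequence) {
        laidOut.add(pageSequence);
    }

    // Convenience entry point: parses an XSL-FO string with a namespace-aware parser.
    static TreeBuildingHandler parse(String xslFo) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            TreeBuildingHandler handler = new TreeBuildingHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xslFo)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the XSL-FO input", e);
        }
    }
}
```

Calling TreeBuildingHandler.parse(...) on a document with two fo:page-sequence elements leaves two entries in laidOut, each handed over before the parser reaches the end of the input.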
Based on my experiments with Apache FOP, I am confident that it uses 40% of its processing time and CPU on the first step, 30% on the second, 20% on the third and 10% on the last. What does that mean? If the transformer gets a 3-page document done in 2 sec, one can be sure that it has burned 1.4 sec preparing and loading the XSL-FO, and only the remaining 0.6 sec doing the actual job. That’s quite a limitation, and I think it can be done better.
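The arithmetic behind that 2 sec example can be checked with a few lines of Java. The class and method names are made up for illustration, and the percentages are simply the estimates quoted above, not new measurements.

```java
// Splits a total run time across the four steps using the estimated shares.
class StageBreakdown {
    // Estimated share of the run time per step, in percent (steps 1 to 4).
    static final int[] SHARE_PERCENT = {40, 30, 20, 10};

    // Returns the time spent in the given step; step is 1-based.
    static long millisForStep(long totalMillis, int step) {
        return totalMillis * SHARE_PERCENT[step - 1] / 100;
    }

    public static void main(String[] args) {
        long total = 2000; // the 2 sec run for a 3-page document
        long prepareAndLoad = millisForStep(total, 1) + millisForStep(total, 2);
        long layoutAndRender = millisForStep(total, 3) + millisForStep(total, 4);
        System.out.println("steps 1-2: " + prepareAndLoad + " ms");  // 1400 ms
        System.out.println("steps 3-4: " + layoutAndRender + " ms"); // 600 ms
    }
}
```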
The first thing to do to improve its performance is to rewrite the code pertaining to steps 1 and 2. The challenge is that it’s not easy to replace the FONode-based hierarchy with something lighter such as an XmlBeans based