
Simple solution for the Gulf of Mexico Oil Spill

In the Gulf of Mexico Oil Spill on May 22, 2010 by petrem66

I was thinking the other day about the problems caused by the massive leak of petroleum into the ocean. They are massive, as far as I can judge from what I’ve seen on TV. The underwater camera captured some frightening images of this massive spill of deadly petroleum. I guess that if the pipe is not properly closed, we will find ourselves in a situation in which we will have produced a dead zone the size of the entire Gulf (and who knows, maybe the entire habitat of the ocean will be destroyed with it). With such a threat, I think I can contribute at least with an idea that may help.
Although I have been in the software engineering field for more than 20 years, I still remember my high school and first-year university classes in mechanics and design, and I have a good understanding of how metal work is done. Also, as a teenager, my father, who is a mechanical engineer, used to take me to his company (a heavy machinery maintenance factory). I was fortunate enough to actually find my first job at that factory, so I could earn some money during my first years at university. Back then I participated in projects to automate cranes, and even as an entry-level worker I could see and borrow a little from other people’s experience in various fields.
Later, after I graduated, I ran my own company (in association with my father) whose main activity was maintaining heavy lifting machinery (cranes). Yes, I did mostly sales and marketing, but when the time came and we were short of workers, I would use the wrench to tighten a loose pipe or install a hydraulic distributor. From marketing campaigns, signing contracts, coding in C, Pascal, Lisp, microcontrollers and 486s, to injection molds for aluminum and hydraulic devices, I covered pretty much everything back then. It was only after I came to Canada about 11 years ago that I stripped everything away but coding/designing/architecting software. But I still remember what I learned. So here is my design (find it attached): Underwater portable guillotine for broken pipes.


Apache FOP

In Open Source, Product Development, Technology, Uncategorized on May 12, 2010 by petrem66

I’ve been using this transformer for many projects so far. It is quite powerful in that it can build documents from a structured definition written according to the XSL-FO schema. The utility of such a package is obvious, since building dynamic documents is necessary in many B2B and B2C applications. Apache FOP does the job well when processing speed and memory footprint are not issues, but it falls short on performance.
The package operates as a four-step transformation of the input definition (the XSL-FO document). First, the package loads and configures its main objects, including the renderer, and loads the available fonts into internal objects.
Second, the package loads the input, constructing a hierarchy of FONode objects (Root is at the top) using a plain old SAX parser and a custom-made ContentHandler.
The third step kicks in when one of the page-sequence elements finishes building, at the endElement method call of the ContentHandler. This step consists of transforming the FONode-based hierarchy into an internal format based on Block and its specialized subclasses. This structure is an abstraction of the layout of the document, a sort of medium-independent view.
Finally, during the fourth step, based on the chosen renderer (one has to pass in the MIME type of the expected output), the Block-based hierarchy is rendered to the final output, which can be PDF, PNG, TIFF, JPG, HTML or even RTF.
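From the caller’s point of view, all four steps are driven by a single transform call through FOP’s embedding API. Below is a minimal sketch along the lines of the standard embedding examples (assuming the FOP 0.9x/1.0 API of that era; the file names are placeholders):

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;

public class FoToPdf {
  public static void main(String[] args) throws Exception {
    // Step 1: factory set-up (configuration, fonts, renderer chosen by MIME type)
    FopFactory fopFactory = FopFactory.newInstance();
    OutputStream out = new BufferedOutputStream(new FileOutputStream("out.pdf"));
    try {
      Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);
      // Steps 2-4: the identity transform feeds the XSL-FO into FOP's SAX
      // ContentHandler, which builds the FONode tree, lays it out into Blocks
      // and renders the chosen output format
      Transformer transformer = TransformerFactory.newInstance().newTransformer();
      Source src = new StreamSource(new File("input.fo"));
      Result res = new SAXResult(fop.getDefaultHandler());
      transformer.transform(src, res);
    } finally {
      out.close();
    }
  }
}
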
Based on my experiments with Apache FOP, I am confident that it spends roughly 40% of its processing time and CPU on the first step, 30% on the second, 20% on the third and 10% on the last. What does that mean? If for a 3-page document the transformer gets the job done in 2 seconds, one can be sure it has burned about 1.4 seconds preparing and loading the XSL-FO, and the rest (0.6 seconds) doing the actual work. That’s quite a limitation, and I think it can be done better.
The first thing to do to improve its performance is to rewrite the code pertaining to steps 1 and 2. The challenge is that it is not easy to replace the FONode-based hierarchy with something lighter, such as an XmlBeans-based one.


On unit testing

In Product Development, Uncategorized on April 28, 2010 by petrem66

I think it takes a whole lot of time to do proper unit testing. It is like writing twice as much code as you otherwise would. The benefit, though, is that one can verify much faster that modules behave as expected. There are some very good points about unit testing at writing-great-unit-tests-best-and-worst-practises
The big question still remains how granular the testing should be. There are two main ‘camps’ of unit testing supporters that I know of.
People in the first camp believe that the best results are obtained when unit testing at the component level, such as the class level. One should not leave out any visible functionality, and the unit tests should cover both negative and positive use cases. I have an issue with this approach: as the code grows and the product evolves, it is very hard to maintain the integrity of the unit tests. People in the second camp align the unit testing with the structure of the application; that is, unit tests follow the lines of delimitation between modules, each seen as a unit. Remember design patterns like the session facade: people in the second camp won’t bother testing such a class, since it merely passes requests on to other classes. Further, when you use dependency injection with the Spring Framework, module-level unit testing becomes the only logical choice.
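To illustrate the module-level style, here is a minimal sketch of what such a test might look like with JUnit and Spring’s test support. The OrderService facade, its placeOrder method, the Order class and the context file name are all hypothetical, purely for illustration:

import static org.junit.Assert.fail;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

// Exercises the public entry point of an "order" module wired by Spring,
// instead of testing every collaborating class in isolation
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = "classpath:order-module-context.xml")
public class OrderModuleTest {

  @Autowired
  private OrderService orderService; // hypothetical facade of the module

  @Test
  public void placeOrderRejectsEmptyCart() {
    try {
      orderService.placeOrder(new Order()); // hypothetical API
      fail("expected the module to reject an empty order");
    } catch (IllegalArgumentException expected) {
      // negative case verified at the module boundary
    }
  }
}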


How to set Atomikos to not write to console logs?

In JEE on April 20, 2010 by petrem66

For a while, I’ve been using the distributed transaction manager from Atomikos TransactionsEssentials for my XA transactions. It works nicely, except that it tends to be verbose when left to its own devices (logging-wise). Today I took a bit of time to look into fixing that problem, in order to trim down the irrelevant messages on the console in my web applications. And I’ve found a simple solution. Here it is.

I figured out a way to do that. It is actually very simple, since Atomikos uses a centralized class to do the logging, called com.atomikos.icatch.system.Configuration. Logging is actually performed by implementations of com.atomikos.diagnostics.Console, so all I had to do was unregister all default consoles and register my own implementation based on commons logging. For that, I implemented a class called Log4JConsole that implements com.atomikos.diagnostics.Console, and placed the following code:

try {
  Configuration.removeConsoles();
  Configuration.addConsole(new Log4JConsole());
  logger.debug("Set log4j based console for Atomikos");
}
catch (Exception e) {
  logger.error("Failed to initialize Atomikos", e);
}

in the servlet’s init() method. One could argue that I should have implemented a javax.servlet.ServletContextListener so that this initialization happens at application start-up. That would actually mean more work, since Atomikos configures itself at first use, which usually happens only after the components that use it come up. In my case, the Spring and Hibernate frameworks have to come up first, and that happens when the servlet initializes itself.
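For context, here is roughly how that registration sits in a servlet; a minimal sketch, where the BootstrapServlet name is illustrative, Log4JConsole is the custom console class described above, and the logger comes from commons logging:

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import com.atomikos.icatch.system.Configuration;

public class BootstrapServlet extends HttpServlet {

  private static final Log logger = LogFactory.getLog(BootstrapServlet.class);

  @Override
  public void init() throws ServletException {
    super.init();
    // In my setup, Spring and Hibernate have come up by this point, so Atomikos
    // has already configured itself; swap its default consoles for the custom one
    try {
      Configuration.removeConsoles();
      Configuration.addConsole(new Log4JConsole());
      logger.debug("Set log4j based console for Atomikos");
    }
    catch (Exception e) {
      logger.error("Failed to initialize Atomikos", e);
    }
  }
}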


My goals for the first release

In Business on April 19, 2010 by petrem66

On my first full work day on the project, I want to review my goals in this post. Yes, the overall goal is to get the ball rolling so that I can be in the game. I think I will have a shot with this online business only if I am able to learn fast and adapt easily to feedback from the market. I guess all the struggle for a good product, an interesting offer, and a good SEO fit will not be enough to make an impression unless I manage to position it in a profitable market niche. At this stage of the project, I don’t want to be concerned with marketing yet. No matter what I think about my potential customers, there is no way yet to prove or disprove my points with palpable evidence. Since reading ‘The Black Swan’ by Nassim Nicholas Taleb, I have learned to be an empiricist. All my assumptions may be flawed, but how would I know without testing my theories? At this stage, there is not much to present to the public, and it takes quite an effort to enhance the product.

Once launched on the market (I am aiming at August 2010), I will do whatever it takes to expose it to the largest audience I can and to collect as much analytics data as possible to analyze. At that point it should take less effort to outsource parts of the new development, and also to put any enhancement into production.


On securing persistent data in the ‘cloud’

In Architecture, Product Development on April 15, 2010 by petrem66

The application’s foundation has to be built so that it will stand up to heavy future requirements such as security and privacy compliance. One of the most important aspects that needs to be thought through carefully is access to the information outside the running code.
In any cloud-based deployment, one can use persistent storage capabilities such as Cloud Files on Rackspace or S3 on Amazon Web Services to store blocks of data. This is especially true when such data must be shared among the many server instances that form your application. Three aspects are worth mentioning with regard to such blocks of data:
– encryption
– compaction
– integrity check

Encryption

A very detailed discussion of this topic can be found in Core Security Patterns. Suffice it to say that Java developers have various encryption algorithms at hand for this. Since the block of data is not shared outside the application, one should go for a common encryption key. If that key is hard-coded somewhere in a reusable piece of code, it is safe to assume that it is highly unlikely that a malicious third party would be able to get it from there. Cracking the key of a strong encryption algorithm like AES with 192 bits is quite a challenge, but even so, one can imagine a strategy of changing such a key on a regular basis (weekly or monthly, say), accompanied by a data migration task (re-encryption, that is).
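As a minimal sketch of what that could look like with the standard javax.crypto API: the class name and key handling below are purely illustrative, the key bytes stand in for the common key mentioned above (16, 24 or 32 bytes for AES-128/192/256; older JREs may need the unlimited-strength JCE policy files for the longer keys), CBC with PKCS5 padding is just one reasonable choice of mode, and a random IV is prepended to each encrypted block:

import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class BlockCrypto {

  private final SecretKeySpec key;
  private final SecureRandom random = new SecureRandom();

  public BlockCrypto(byte[] keyBytes) {
    // keyBytes is the shared key embedded in the reusable piece of code
    this.key = new SecretKeySpec(keyBytes, "AES");
  }

  // Encrypts a block of data; the random 16-byte IV is prepended to the ciphertext
  public byte[] encrypt(byte[] plain) throws Exception {
    byte[] iv = new byte[16];
    random.nextBytes(iv);
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
    byte[] enc = cipher.doFinal(plain);
    byte[] out = new byte[iv.length + enc.length];
    System.arraycopy(iv, 0, out, 0, iv.length);
    System.arraycopy(enc, 0, out, iv.length, enc.length);
    return out;
  }

  // Reverses encrypt(): reads the 16-byte IV prefix, then decrypts the remainder
  public byte[] decrypt(byte[] data) throws Exception {
    Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(data, 0, 16));
    return cipher.doFinal(data, 16, data.length - 16);
  }
}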

Compaction

In order to save on storage space and network bandwidth, one should consider archiving the blocks of data. The Java runtime comes with a neat wrapper package around GZIP compression, java.util.zip. The snippet of code below will do the job of archiving/expanding your blocks of data:

// Expands (decompresses) a GZIP-compressed block of data
public byte[] expand(byte[] buffer) throws Exception {
  ByteArrayOutputStream os = null;
  try {
    os = new ByteArrayOutputStream();
    ByteArrayInputStream is = new ByteArrayInputStream(buffer);
    GZIPInputStream zin = new GZIPInputStream(is);
    // copy the decompressed stream into the output buffer
    byte[] ret = new byte[4096];
    for (int i = 0; (i = zin.read(ret)) > 0;)
      os.write(ret, 0, i);
    return os.toByteArray();
  }
  finally {
    if (os != null)
      try {
        os.close();
      }
      catch (Exception e) {}
  }
}

// Archives (compresses) a block of data with GZIP
public byte[] archive(byte[] buffer) throws Exception {
  ByteArrayOutputStream os = null;
  try {
    os = new ByteArrayOutputStream();
    GZIPOutputStream zout = new GZIPOutputStream(os);
    zout.write(buffer);
    zout.finish();
    zout.flush();
    return os.toByteArray();
  }
  finally {
    if (os != null)
      try {
        os.close();
      }
      catch (Exception e) {}
  }
}

Integrity check

From a consuming application’s perspective, it is important to be assured that the block of data has not been tampered with. Usually, the producing application accompanies it with an MD5-based checksum, which it can pass along with the block of data descriptor to the consumer. The md5sum can be obtained with the Java API using java.security.MessageDigest (see the code snippet below).

// Compares a received md5sum against the digest of the received block
public boolean match(String md5sum, byte[] buffer) throws Exception {
  String s = getMD5sum(buffer);
  return s.equalsIgnoreCase(md5sum);
}

// Computes the MD5 digest of a block and returns it as a hex string
public String getMD5sum(byte[] buffer) throws Exception {
  byte[] sum = MessageDigest.getInstance("MD5").digest(buffer);
  StringBuffer sbuf = new StringBuffer();
  for (int i = 0; i < sum.length; i++) {
    // map the signed byte to its unsigned value before printing it as hex
    int c = sum[i] & 0xff;
    sbuf.append(Integer.toHexString(c >>> 4));
    sbuf.append(Integer.toHexString(c & 15));
  }
  return sbuf.toString();
}

The string that results from calling getMD5sum can be stored along with the block name and passed to the consuming application as such.
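As a quick usage sketch tying the sections together (rawContent and downloadedBlock are placeholder variables), the producer computes the sum when uploading and the consumer verifies it after downloading:

// Producer side: compress the block and compute the checksum to store with it
byte[] block = archive(rawContent);
String md5sum = getMD5sum(block);
// ... upload 'block' and record 'md5sum' with the block's descriptor ...

// Consumer side: verify the downloaded block before trusting and expanding it
if (!match(md5sum, downloadedBlock)) {
  throw new IllegalStateException("Block failed its integrity check");
}
byte[] content = expand(downloadedBlock);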


Using Rackspace Cloud Files – 1

In Architecture on April 11, 2010 by petrem66

Rackspace Cloud Files is an inexpensive way to persist private content that needs to be available for a longer period of time, but it is not practical for use as a shared file system for Rackspace cloud servers. The storage does not expose an internal IP, therefore whenever the application needs to load objects from it or save objects to it, you will be charged network fees. In order to assess the benefits versus the shortcomings of using Rackspace CF in my design, I conducted a set of tests.

This post is the start of a series on experimenting with CF.

Test premise

My web application is a virtual J2EE-based server consisting of a number of cloud-based façade servers and a grid of workers that do the actual processing on files. In between, I have a simple message queue to decouple the two groups of servers.

Assuming that an action triggered by the user causes an object O1 to be transformed into O2, and that the user expects to be presented with O2, the question is: what is the best way to employ Rackspace Cloud Files? The criteria for ‘best’ are cost, complexity, and performance (response time).

Possible architecture

I am considering two scenarios for using Rackspace CF from the components of the virtual server:

  • virtual file system, when both the application on the Web server and the Worker processing the request can access it to get and store objects (the content of the file)
  • lazy repository, when only the Worker can access it. The application on the Web server gets the content of the file through the Worker at the end of processing

Using Rackspace CF As Virtual File System

  1. User request
  2. Web application publishes request for processing
  3. Worker gets the request
  4. Worker gets the input file O1
  5. After processing, the Worker stores the result as O2 to CF
  6. Worker publishes work done with follow up Id
  7. Web application gets the follow up for processing completion
  8. Web app gets O2 from CF
  9. Reply to user

Using Rackspace CF As a Lazy Repository

  1. User sends request
  2. The Web application publishes request for processing
  3. The Worker gets the request
  4. The Worker gets the input file O1
  5. The Worker processes O1 and keeps it as O2 locally
  6. The Web application gets the reply from the queue
  7. The Web application gets O2 from the worker and puts it in the response body
  8. The user receives O2
  9. The worker stores O2 to CF

Both scenarios have strengths and weaknesses. Using Rackspace CF as a virtual file system, the application can build a transaction-like process to cover the case in which the worker or the queue goes down before the response is propagated back. It is also a true grid architecture, in that processors are completely isolated from the facade, but it is expensive in terms of decreased speed and extra cost. The communication between Rackspace servers and CF is regarded as external traffic. They charge for every GB of inbound and outbound traffic, so every 1 MB of O2 costs 3 x 1 MB of traffic before it reaches the user’s browser (see steps 5, 8, and 9). Also, if O2 needs to be transient, as in some business cases, storing it in CF is a waste. I need to test the speed penalty of external traffic vs internal traffic for this scenario.

Using Rackspace CF as a lazy repository, the cost due to O2 transportation over the network is reduced to 2 x 1MB (see step 9, and 10). Also, in case O2 is transient, the application can discard it after usage saving the cost of storage on the CF, but such configuration is not true grid and it would be a bit more work to ensure reprocessing in case of a Worker lost