Tuesday, December 23, 2008

Runtime dispatch between different versions of the same library in Java

Java cannot (at least not yet?) dispatch between different versions of the same library at runtime. In fact, Java doesn't even have a concept of library versions.

I was recently faced with a problem where we needed to dispatch between different versions of the same library at runtime. We have an architecture where network management software (let's call it the "master") has to communicate with various other software components running on (remote) hardware devices in the network. This communication goes over various protocols, some Java specific (for example RMI using Spring's HTTP invoker), some not (for example telnet). We also have the requirement to support different versions of these remote devices from within this single "master".

This UML diagram shows a possible situation where the "master" needs to communicate with two different versions of two devices. There are two instances of device A in the network, one with version X and one with version Y. There are also two instances of device B, one with version Z and one with version Q. A driver implementation exists for every version of a device, but ultimately each device type (A or B) has a single interface to be used by the "master". It is the driver's job to abstract away the differences between different versions of the same device type. If the protocol is telnet or something like that, we wouldn't really need different driver implementations: a naive implementation of the "master" could simply switch between the possible versions with "if" statements (if this version, send telnet command X; if that version, send telnet command Y; etc.). If the protocol is Java, this becomes a problem, because you then typically depend on a library which specifies the interfaces and types used for the communication. This is technically not required of course (you could manually write RMI to a raw byte output stream), but it is what you generally want. Even for the non-Java protocols, you might want to abstract away the different commands required for different versions behind a single interface, in different implementation libraries.
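
To make this concrete, here is a minimal sketch of what such a driver abstraction could look like (all names are hypothetical, purely for illustration):
// Hypothetical driver interface for device type A; this is what the "master"
// compiles against, independent of the device version.
public interface DeviceADriver {

    // reads some status information from the remote device
    String readStatus();
}

// One implementation per supported device version (in reality each would
// live in its own driver implementation library).
class DeviceAVersionXDriver implements DeviceADriver {
    public String readStatus() {
        // version X specific communication, e.g. over Spring's HTTP invoker
        return "status read the version X way";
    }
}

class DeviceAVersionYDriver implements DeviceADriver {
    public String readStatus() {
        // version Y specific communication
        return "status read the version Y way";
    }
}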

We discussed (and experimented with) different alternative solutions to this problem. I'll go through each of them, and explain the pros and cons.

idea 1: custom class loading magic

The first idea was to package the driver implementation libraries in jar files (together with their dependencies) and class load the appropriate version of such a driver at runtime. This also requires reflection to actually instantiate such a driver. Using reflection avoids compile time dependencies on the driver implementation libraries (such dependencies are impossible anyway, because we would have to depend on different implementations of the same driver interface at the same time). The compile time dependencies are from the "master" to the driver interfaces, and from the driver implementation libraries to the remote device interfaces.

To make this idea work, the class loaders of the "master" and the driver implementation libraries have to be separated. On the other hand, the driver interfaces (used to communicate between the two) have to be known by both. In Java, this is only possible if both sides use the driver interface classes from the same class loader. This means that the driver class loader has to have a dependency on the "master" class loader, so that the driver interfaces are loaded from there. Making the "master" class loader the parent of the driver library class loader solves this problem.

As a consequence, the driver will find all its libraries in the parent class loader first, and only fall back to its own jars when a class is not available in the parent. For libraries on which the driver and the "master" both depend (in our situation, Spring was such a library), this is a problem, because from within those classes, the classes loaded by the driver's own class loader are not visible. We found a solution by writing a custom class loader which inverts the regular delegation pattern (first look in its own class loader, then in the parent, instead of first in the parent). This solves the problem, but we have to make sure that the driver interfaces are still loaded from the parent class loader (by not packaging them together with the driver implementation libraries).
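
A minimal sketch of such a "child-first" class loader, assuming the driver jars are available as URLs (illustrative only, not our exact code):
import java.net.URL;
import java.net.URLClassLoader;

// Loads classes from the driver jars first, and only delegates to the parent
// (the "master" class loader) when a class is not found locally.
public class ChildFirstClassLoader extends URLClassLoader {

    public ChildFirstClassLoader(final URL[] driverJars, final ClassLoader parent) {
        super(driverJars, parent);
    }

    @Override
    protected synchronized Class<?> loadClass(final String name, final boolean resolve)
            throws ClassNotFoundException {
        Class<?> c = findLoadedClass(name);
        if (c == null) {
            try {
                // look in the driver jars first (inverted delegation)
                c = findClass(name);
            } catch (ClassNotFoundException e) {
                // fall back to the parent, e.g. for the driver interfaces
                c = super.loadClass(name, false);
            }
        }
        if (resolve) {
            resolveClass(c);
        }
        return c;
    }
}
The driver itself is then instantiated reflectively through this loader and cast to the driver interface, which both sides load from the parent class loader.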

In the end, this approach does work, but the implementation is scattered across the different components. The "class loading magic" happens in the "master", but it only works if the driver implementation libraries are packaged correctly. This correct packaging depends on whether certain libraries are used by the master and the driver together, or only by the driver, etc.

idea 2: deploy the drivers as WAR files

The second idea was to package the driver implementation libraries as WAR files, and to give them a remote interface (RMI or something like that). This would imply that the container would take care of all the separation of class loaders, which is nice.

To understand the downside of this approach, you should know that the "master" doesn't simply use these drivers once when it has to communicate with a remote device, and then forget about them. The "master" does lots of interesting things with the drivers: it caches them (for example to reuse the same TCP connection to the device the next time), it manages concurrent access to the drivers (allowed on some, disallowed on others), etc... This means the master needs to have control over the lifetime of the drivers.

If we deploy the drivers as WAR files, we lose this functionality. We would effectively have to make the drivers stateless, or we would have to move a lot of the functionality from the "master" into the individual drivers.

idea 3: OSGi to the rescue

Our next idea was to use OSGi. If we made every driver implementation an OSGi bundle, we could decide at runtime which service of which bundle to invoke, and OSGi would do all the class loading separation for us. I did some experiments which showed that this would be a feasible solution (although I didn't look into the possibilities for lifetime management of the drivers in detail), but OSGi is a bit of an all-or-nothing approach. There are of course many more components in our architecture than what is shown in the UML diagram above, and at the time it was not feasible to introduce OSGi in all of them. It seems we would have to deal with the same class loading problems as in the first idea at the boundaries between non-OSGi and OSGi components.

A colleague of mine found a library (transloader) which was created to solve this kind of problem. The transloader library is not tied to OSGi though; it is a general solution to bridge the gap between different class loaders within Java. This led us to our fourth idea.

idea 4: class loading magic revisited

If this transloader library can bridge the gap between different class loaders, why not use it to implement our first idea? We now have the same architecture as in the first idea, but without custom class loaders and without weird packaging constraints on the driver implementation libraries. All we have to do is package the drivers together with all their dependencies (no exceptions) and use the transloader library for the communication between the "master" and the drivers.

Check out the transloader tutorial to learn more, it was really easy to set up and use.

Conclusion

Built-in support for modules and library versions in Java would probably still be the most desirable solution. Until then, something like OSGi can be a nice solution too, if you can introduce it in your architecture. If that is not an option, the approach described above offers a valid alternative (at least in our use case) with a fairly simple architecture. We will be using this idea in production in the coming months; let's see if it lives up to our expectations.

Monday, April 21, 2008

analyzing maven dependencies with UML tools

Maven is very good at managing dependencies in large projects with many different artifacts, but unfortunately maven doesn't provide an easy way to visualize and analyze these dependencies. The dependency report that comes by default with a maven generated web site provides some information, but it is textual only and it only covers outgoing dependencies: it tells you which artifacts project X depends on, but not which other projects depend on project X.

Some maven dependency visualization tools exist. The best I could find is the dependency analyzer from jfrog. It's a standalone java application that allows you to open a pom.xml file and graphically explore its dependencies in a swing GUI. It does not, however, provide advanced analysis tools; it's merely a visualization tool.

Existing UML modeling tools (e.g. magicdraw) typically allow you to do much more advanced kinds of analyses on dependency and other UML diagrams. They also allow you to represent dependencies in different ways (as a diagram, as a matrix, ...). Why not use the power of these tools to analyze maven dependencies?

maven xmi plugin

In this blog entry, I present a maven plugin that generates an XMI model from a pom.xml file and its dependencies. XMI is an XML standard most commonly used to represent UML models. Most professional modeling tools can read XMI files. The XMI itself is not directly a visualization of the dependencies, but with a decent modeling tool, all kinds of fancy visualizations are possible.

To give you an idea, this screen shot shows a diagram created with magicdraw for the dependencies of the xmi maven plugin itself.

using the maven xmi plugin

First, you'll have to check out the code of the xmi-plugin from subversion:
svn checkout http://xmi-plugin.googlecode.com/svn/trunk/ xmi-plugin
Then you can install the xmi-plugin in your local maven repository, by invoking "mvn install" in the checkout directory. Now you are ready to use the plugin on your favorite maven project. Simply run
mvn xmi:xmi
in the root directory of your project (where the pom.xml file is located). The resulting XMI model will be generated in the target directory of your project, with the name ${artifactid}-${version}.xmi. You can now start analyzing the XMI model in a UML tool.

more information

The xmi-plugin is hosted on google code, for more information go to http://code.google.com/p/xmi-plugin/.

Thursday, March 27, 2008

SOAP faults with spring ws and JAXB 2.0

When migrating existing web services from Axis2 to spring web services (1.0.3) and JAXB (2.0), I ran into three problems related to SOAP faults. I found some hints towards a solution on the net, but none of them were really exhaustive. In this blog post, I give a step by step overview of how we solved these three issues.

Let's start by sketching the context in which we encountered the problems. We already had a contract-first approach (our WSDL was actually generated using an AndroMDA cartridge), so we could start from an existing WSDL. We only had to move the types into a separate XSD file (for JAXB, see below). Our WSDL defines a lot of operations with input, output and fault messages. With Axis2, these faults were represented as java exceptions (inheriting from java.lang.Exception). When serializing these java classes back to XML, they were represented as SOAP faults, with a serialized version of the java exception (as defined in the XML schema) as the detail element of the SOAP fault. This is an example of a SOAP fault message sent to the client:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
     <soapenv:Fault>
        <faultcode>soapenv:Client</faultcode>
        <faultstring>FooException: null</faultstring>
        <detail>
           <ns:FooException xsi:type="ns:FooException" xmlns:ns="http://our.namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
              <ns:customField>0</ns:customField>
           </ns:FooException>
        </detail>
     </soapenv:Fault>
  </soapenv:Body>
</soapenv:Envelope>
The first problem when migrating to spring web services and JAXB is that JAXB generates java types from the XSD rather than from the WSDL. The types of the fault messages are of course known as XSD elements, but the XSD contains no information to distinguish them from other types (input and output messages). Typically all input messages end with "Request", all output messages with "Response" and all exceptions with "Exception", but JAXB doesn't do anything special with these naming conventions. Apart from these names, there is thus nothing special about the java types that JAXB generates for them. For exceptions, this leads to the confusing situation of having something named FooException that doesn't inherit from java.lang.Exception (I'll call them "fake" exceptions from now on).

This means that in our AbstractMarshallingPayloadEndpoint implementation (which is what you use if you want to implement web services with spring and a marshalling technology like JAXB), we can't throw these exceptions (as they are not throwable).
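
For context, such an endpoint looks roughly like this (a sketch only; FooRequest, FooResponse and FooException stand for the JAXB-generated types from our own schema):
import org.springframework.ws.server.endpoint.AbstractMarshallingPayloadEndpoint;

public class FooEndpoint extends AbstractMarshallingPayloadEndpoint {

    @Override
    protected Object invokeInternal(final Object requestObject) throws Exception {
        final FooRequest request = (FooRequest) requestObject;
        // ... call the business logic ...
        // On failure we would like to "throw new FooException()", but the
        // generated FooException does not extend java.lang.Exception, so we can't.
        return new FooResponse();
    }
}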

The second problem is that these fake exceptions don't have the JAXB @XmlRootElement annotation on them. This means that they cannot be serialized directly: you first have to create a valid JAXBElement for them using the ObjectFactory that is generated by JAXB, something like
final JAXBElement element = new ObjectFactory().createFooException((FooException) exception);
The third problem is that spring's built-in support for dealing with exceptions doesn't serialize the exception in the detail part of a SOAP fault. By default, spring only renders a fault code and error message in the SOAP fault.

Serializing exceptions as SOAP fault detail messages

Let's forget about the first and second problems for now and assume all exceptions properly inherit from java.lang.Exception and have the JAXB @XmlRootElement annotation on them. By writing a custom SoapFaultMappingExceptionResolver (or directly implementing EndpointExceptionResolver if you want to), you can customize the way the SOAP fault is generated. This is how the customizeFault method could be implemented:
@Override
protected void customizeFault(
 final Object endpoint,
 final Exception exception,
 final SoapFault fault)
{
 super.customizeFault(endpoint, exception, fault);

 // get the marshaller from the endpoint
 AbstractMarshallingPayloadEndpoint marshallingEndpoint = (AbstractMarshallingPayloadEndpoint) endpoint;

 // get the result inside the fault detail to marshal to
 Result result = fault.addFaultDetail().getResult();

 // marshal the exception into the fault detail
 try
 {
   marshallingEndpoint.getMarshaller().marshal(exception, result);
 } catch (IOException e)
 {
   throw new RuntimeException(e);
 }
}
Note how we use the marshaller from the given endpoint to marshal the exception into the detail of the SOAP fault (represented using the Result interface).

Using JAXB's ObjectFactory to solve the second problem

As explained already, the second of the three problems forces us to use JAXB's ObjectFactory for the exceptions. This means that, in the code example above, we can't pass the exception directly to the marshaller. We first have to create a JAXBElement from the exception. This forces us to know the exact type of the exception, and thus requires ugly casting, for example:
@Override
protected void customizeFault(
 final Object endpoint,
 final Exception exception,
 final SoapFault fault)
{
 super.customizeFault(endpoint, exception, fault);

 // get the marshaller from the endpoint
 AbstractMarshallingPayloadEndpoint marshallingEndpoint = (AbstractMarshallingPayloadEndpoint) endpoint;

 // get the result inside the fault detail to marshal to
 Result result = fault.addFaultDetail().getResult();

 // create the corresponding jaxb element
 final JAXBElement element;
 if (exception instanceof FooException)
 {
  element = new ObjectFactory().createFooException((FooException) exception);
 } else if (...)
 {
  // else if required for all possible exceptions
 }

 // marshal the jaxb element (not the raw exception) into the fault detail
 try
 {
   marshallingEndpoint.getMarshaller().marshal(element, result);
 } catch (IOException e)
 {
  throw new RuntimeException(e);
 }
}
As you can see, this code is no longer scalable. For every new exception, you have to add code.

Wrapping fake exceptions in a real exception

The last problem to solve is that the fake exceptions are not really java exceptions. We've looked into possibilities to customize the JAXB generation of java types in such a way that everything that ends with "Exception" is actually a real exception, but that didn't seem to be possible.

We ended up writing a solution that works, although I must say it is not very elegant. We created a custom exception to wrap the "fake" exceptions generated by JAXB (we called it UnmarshalledExceptionWrapperException). The wrapper wraps instances of type Object and verifies that their class names end with "Exception" (that really seems to be the best we can do). The custom SoapFaultMappingExceptionResolver now has to unwrap these fake exceptions and build a valid JAXB element from them:
@Override
protected void customizeFault(
 final Object endpoint,
 final Exception exception,
 final SoapFault fault)
{
 super.customizeFault(endpoint, exception, fault);

 // cast to the wrapper we throw from the endpoint
 UnmarshalledExceptionWrapperException wrapper = (UnmarshalledExceptionWrapperException) exception;

 // unwrap the "fake" exception generated by JAXB
 Object unwrappedException = wrapper.getWrappedException();

 // get the marshaller from the endpoint
 AbstractMarshallingPayloadEndpoint
  marshallingEndpoint = (AbstractMarshallingPayloadEndpoint) endpoint;

 // get the result inside the fault detail to marshal to
 Result result = fault.addFaultDetail().getResult();

 // create the corresponding jaxb element
 final JAXBElement element;
 if (unwrappedException instanceof FooException)
 {
  element = new ObjectFactory().createFooException((FooException) unwrappedException);
 } else if (...)
 {
  // else if required for all possible exceptions
 }

 // marshal the jaxb element into the fault detail
 try
 {
   marshallingEndpoint.getMarshaller().marshal(element, result);
 } catch (IOException e)
 {
  throw new RuntimeException(e);
 }
}
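
For completeness, this is a minimal sketch of what the wrapper exception could look like (illustrative only; whether it extends RuntimeException or a checked Exception is a matter of taste, and checking the class name really is the best we can do):
// Wraps a JAXB generated "fake" exception (a plain object whose class name
// ends with "Exception") so that it can travel as a real Throwable up to
// the exception resolver.
public class UnmarshalledExceptionWrapperException extends RuntimeException {

    private final Object wrappedException;

    public UnmarshalledExceptionWrapperException(final Object wrappedException) {
        super(wrappedException.getClass().getSimpleName());
        if (!wrappedException.getClass().getSimpleName().endsWith("Exception")) {
            throw new IllegalArgumentException(
                "not a generated fault type: " + wrappedException.getClass());
        }
        this.wrappedException = wrappedException;
    }

    public Object getWrappedException() {
        return wrappedException;
    }
}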

Conclusion

We ended up with a working spring web services implementation that mimics the behavior of Axis2 with respect to SOAP faults, although it is not the kind of java code one can be very proud of. The major problem is that code modification is required when adding, removing or changing the available fault messages in the WSDL. We would have a far more elegant solution if we could get JAXB 2.0 to generate real exceptions with an @XmlRootElement annotation on them.

Wednesday, March 5, 2008

building the latest scala eclipse plugin from source

Rumors have been around for some time now that someone in the scala team is working on a new version of the eclipse plugin for scala. From the scala website you can download a version of the eclipse plugin that is up to date with the latest development on scala itself, but that plugin doesn't seem to get any new features.

The new, rewritten plugin cannot be downloaded yet (or at least I didn't find it anywhere), so this is what I did to build it from source. You'll need java and ant (including the optional tasks) installed to get this working. The examples are for building on linux, so there might be some differences on other platforms.

First you'll need to check out the scala and plugin sources from subversion:
svn co http://lampsvn.epfl.ch/svn-repos/scala/scala/trunk scala
svn co http://lampsvn.epfl.ch/svn-repos/scala/plugin plugin
You can build scala yourself, or if you already have an up-to-date scala distribution on your system somewhere, you can point the plugin build to that existing scala distribution (see build.properties.SAMPLE in the plugin directory).

Building scala is as simple as running ant with enough memory (in the scala directory):
export ANT_OPTS='-Xms512M -Xmx1024M'; ant dist
The same is true for building the plugin (in the plugin directory):
export ANT_OPTS='-Xms512M -Xmx1024M'; ant dist
Now start eclipse and uninstall the existing scala plugin if you happen to have that one installed. To install the new plugin, you can create a new local update site and point it to the dist/scala.update directory in the plugin directory you've just built.

To get existing scala projects to work with the new plugin, you have to modify the .project files a bit. This is an example:
<?xml version="1.0" encoding="UTF-8"?>
<projectDescription>
 <name>myproject</name>
 <comment></comment>
 <projects>
 </projects>
 <buildSpec>
  <buildCommand>
   <name>scala.plugin.scalabuilder</name>
   <arguments>
   </arguments>
  </buildCommand>
 </buildSpec>
 <natures>
  <nature>scala.plugin.scalanature</nature>
  <nature>org.eclipse.jdt.core.javanature</nature>
 </natures>
</projectDescription>
That's all. Now you've got code completion in scala ;-)

Tuesday, February 26, 2008

Why paging database queries doesn't work

At work, we have several (Java) applications where there is a need to 'page' the data from the database (using hibernate and MySQL). With paging I mean that of all the interesting records, only a few have to be processed or displayed at a time. This requirement often comes from GUI design, but equally often it is required for reasons of performance and memory usage. For example, we have an application that generates all kinds of reports at regular intervals, based on information it obtains over a remote interface from different remote applications. The amount of data we have prevents us from running these queries unpaged.

The question is how this paging can be implemented. We also require every page to be of the same size (except obviously the last one).

We use hibernate, but the problem is the same when using plain SQL, so I'll stick to SQL in the examples. As a running example, let's assume a one-to-many mapping from a table 'a' to 'b'.
create table a (
   id integer auto_increment, 
   primary key (id));
create table b (
   id integer auto_increment, 
   a_id integer, 
   primary key(id));

A first naive attempt

select * from a 
    inner join b on b.a_id = a.id 
    order by a.id limit x,y;
This query uses the MySQL limit x,y construct, meaning "return y results, starting with result number x".

Unfortunately it doesn't work as expected. Due to the join, the above query returns a result for every b, not for every a. This means the paging will be plain wrong.

A sub query approach

select * from a 
    inner join b on b.a_id = a.id 
    where a.id in
(
    select id from a order by id limit x,y
);
In this query, we try to circumvent the problem by paging on a sub query without joins, and then joining in another query for all the results obtained from the paged query. Unfortunately this doesn't work either, because MySQL doesn't allow the limit construct inside a sub query inside an in clause.

Variation on the same idea

The query and sub query can be split into two distinct queries. First we query the primary keys of all a's in a certain page:
select id from a 
    order by id limit x,y;
Then we use that list of primary keys in a second query:
select * from a 
    inner join b on b.a_id = a.id 
    where a.id in 
    (list of ID's from the previous query);
This actually works, but now the size of the second query is proportional to the page size. This has crashed our MySQL clustered database if the pages grow over 1000 records.

A no-join solution

Another obviously working solution is to not join at all:
select * from a 
    order by id limit x,y;
If necessary, the b records can be fetched from the database with subsequent queries. If, in the context of a certain application, all these b records are going to be needed anyway, this has a tremendous performance impact, because instead of only 1 (fairly complex) query, we now have to do n + 1 (simple) queries.
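
With hibernate, this no-join variant roughly corresponds to using setFirstResult and setMaxResults on a query without fetch joins; a sketch (the entity name A is assumed, not taken from a real mapping):
import java.util.List;

import org.hibernate.Session;

public class PagedQueries {

    // Returns one page of A entities without joining in the b records;
    // the b's are loaded lazily (with extra queries) when actually needed.
    public static List<?> pageOfA(final Session session, final int pageNumber, final int pageSize) {
        return session.createQuery("from A order by id")
            .setFirstResult(pageNumber * pageSize)
            .setMaxResults(pageSize)
            .list();
    }
}
Note that combining setMaxResults with a collection fetch join makes hibernate apply the limit in memory, which is exactly the paging problem described above.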

Dropping the fixed page size requirement

If we drop the requirement that every page (except the last one) should be of the same size, we can page the query without relying on the MySQL limit construct. This allows us to page on actual well chosen constraints, rather than on the number of results that happens to be the consequence of a certain set of joins in a query. For example, we could page on the primary key:
select * from a 
    inner join b on b.a_id = a.id 
    where a.id > x and a.id <= x+y;
With an auto increment primary key on a table where deletes are not very frequent, this might yield good results. However, if from time to time many records are being deleted from the database, this might result in smaller or even empty pages.

Conclusion

This leads me to a rather pessimistic conclusion: paged database queries are not generally possible with a single well performing query, at least not with MySQL 5.

Sunday, February 17, 2008

DataInput.skipBytes(int n)

In Java, DataInput defines an interface to reconstruct primitives from an InputStream (usually written earlier with a DataOutput). Besides methods like readInt, readLong, readDouble etc, it also provides a method skipBytes(int n). What do you expect this method to do? Do you spot the problem in this piece of code?
DataInputStream input = new DataInputStream(
   new BufferedInputStream(
       new FileInputStream(file)));
input.skipBytes(2);
I wasn't aware of any possible problem until I enabled the maven findBugs plugin on our continuous build server. It warned me about not checking the return value of the skipBytes(int n) method. What? What can it possibly return? If you read the javadoc of the method (which I probably should have done in the first place), you'll see that it returns the number of bytes actually skipped. That is because skipBytes doesn't actually skip bytes... it only attempts to skip bytes.

So here's my question: what am I supposed to do when the return value is less than the number of bytes I wanted to skip? Do I need to loop until it is OK? Do I need a fall-back implementation? Do I need to raise a runtime exception? Do I need to Thread.yield() and hope another thread gets the chance to fill the buffer of my underlying BufferedInputStream? The only thing the javadoc has to say about this is that there may be many different reasons why the actual number of skipped bytes differs from the number of bytes you wanted to skip. It seems to me that, depending on the reason, a different strategy might be appropriate... but of course there is no way to know what the reason was if it happens.

Although I could probably have looked into a completely different solution using Java NIO, I ended up writing this:
// skip 2 bytes
input.readByte();
input.readByte();
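
What I actually wanted was a "skip exactly n bytes or fail" helper. A possible sketch (my own idea, not something from the JDK), looping on skipBytes and falling back to readByte so that a real end of stream surfaces as an EOFException:
import java.io.DataInput;
import java.io.IOException;

public final class DataInputUtil {

    // Skips exactly n bytes, or throws an exception (EOFException from
    // readByte) if the end of the stream is reached before n bytes are skipped.
    public static void skipFully(final DataInput input, final int n) throws IOException {
        int remaining = n;
        while (remaining > 0) {
            int skipped = input.skipBytes(remaining);
            if (skipped <= 0) {
                // no progress: read a single byte instead; readByte throws
                // EOFException when the stream is really exhausted
                input.readByte();
                skipped = 1;
            }
            remaining -= skipped;
        }
    }

    private DataInputUtil() {
    }
}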