Friday, April 10, 2009

High frequency task scheduling with inaccurate system clock in Java

Java has two general purpose scheduling implementations: java.util.Timer (further referenced as "Timer") and java.util.concurrent.ScheduledThreadPoolExecutor (further referenced as "STPE"). STPE is generally concieved as the successor of Timer and their should normally not be a reason to use Timer any longer. However, both Timer and STPE rely on the underlying system clock, and can thus not be more accurate then the accuracy of that clock.

I happen to have access to a machine where the accuracy of the clock is 20 milliseconds. This means that when you try to make a thread sleep for a single millisecond
public class SleepTest
{
    public static void main(String[] args) throws InterruptedException
    {
        while (true)
        {
            long start = System.nanoTime();
            Thread.sleep(1);
            System.out.println((System.nanoTime() - start) / 1000000);
        }
    }
}
the thread will actually sleep for about 20 milliseconds every time
20
19
19
19
20
20
19
19
...
This also has an impact on the scheduling behaviour with Timer and STPE. Note that the Timer javadoc explicitely mentions this, while the STPE javadoc doesn't.

For example timing a task to execute every 10ms "with fixed delays" using Timer or STPE results in the task being executed every 20ms in stead:
running task [elapsed nanos since previous = 20157000]
running task [elapsed nanos since previous = 19999000]
running task [elapsed nanos since previous = 19895000]
running task [elapsed nanos since previous = 19940000]
running task [elapsed nanos since previous = 19987000]
running task [elapsed nanos since previous = 19991000]
running task [elapsed nanos since previous = 20013000]
running task [elapsed nanos since previous = 20011000]
running task [elapsed nanos since previous = 19980000]
...
When using Timer or STPE to schedule "at fixed rate", two tasks are being executed in immidiate succession every 20ms. This is also consistent with the documentation: it tries to "catch up".
running task [elapsed nanos since previous = 19808000]
running task [elapsed nanos since previous = 187000]
running task [elapsed nanos since previous = 19813000]
running task [elapsed nanos since previous = 185000]
running task [elapsed nanos since previous = 19842000]
running task [elapsed nanos since previous = 185000]
running task [elapsed nanos since previous = 19803000]
running task [elapsed nanos since previous = 214000]
...
Note that the "fixed delay" of Timer doesn't take the execution time of the task itself into account, while the STPE "fixed delay" does, but that is not something I was trying to measure here. The task I am executing is just an empty Runnable.run() method.

Now imagine you are in a situation where you actually need to schedule something at a higher frequency then what the underlying system clock can handle (sending out a UDP stream at a fixed rate for example). Scheduling with "fixed delay" clearly is no option; both Timer and STPE will only give you a frequency equal to the accuracy of the underlying clock, nothing higher. The only option is to schedule "at fixed rate". Both Timer and STPE will "catch up" in this mode, so the average throughput corresponds to the requested frequency to execute the tasks with. The tasks will be executed in bursts though: one burst with a few tasks every time the underlying clock reports a new time value (every 20ms in my example). The problem with "fixed rate" scheduling is that you don't really have a way to control these bursts. If for example your application is blocked for a longer time for some reasons (garbage collection for example), there will be a gap in the task execution, and then a "big" burst to catch up all the tasks it missed.

This is where Timer offers an advantage over STPE. With STPE you schedule runnables, so inside the task there is not really an API to find out information about the scheduling of the task. With Timer, you schedule TimerTask instances. TimerTask has a method scheduledExecutionTime which returns the time the task "should have run". By comparing this to the actual time, you can discover "big bursts" and for example decide not to execute the task. I haven't found a practical way to do that with STPE.

I discovered this when trying to send out a UDP data stream at a fixed rate. The requirement was that a few missed packets were not so bad (a little drop in the speed), but big bursts with many packets (more then about 10 in a row) caused a problematically high peak in the data rate (because then the stream was "clipped" further down the path resulting in dropped packets) and should thus never happen.

Note that in my tests I used System.nanoTime() to measure the time between task executions. This gives me a higher accuracy then the System.currentTimeMillis() which makes me wonder why there is no scheduler with the same accuracy in Java. Something I should look into...