The question is how this paging can be implemented. We also require every page to be of the same size (except obviously the last one).
We use Hibernate, but the problem is the same when using plain SQL, so I'll stick to SQL in the examples. As a running example, let's assume a one-to-many mapping from a table 'a' to a table 'b'.
create table a (id integer auto_increment, primary key (id));
create table b (id integer auto_increment, a_id integer, primary key (id));
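To keep the later examples concrete, assume a hypothetical bit of sample data (purely illustrative, not part of the original text) in which a #1 owns three b records and a #2 owns one:

-- illustrative sample data: a #1 has three b's, a #2 has one
insert into a (id) values (1), (2);
insert into b (id, a_id) values (1, 1), (2, 1), (3, 1), (4, 2);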
A first naive attempt
select * from a inner join b on b.a_id = a.id order by a.id limit x,y;

This query uses the MySQL limit x,y construct, meaning "return y results, starting with result number x".
Unfortunately it doesn't work as expected. Due to the join, the above query returns a result for every b, not for every a. This means the paging will be plain wrong.
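With the hypothetical sample data sketched above, the failure is easy to see: a page of two joined rows is filled entirely by a #1's b records, so the page contains one a instead of two.

-- with the sample data above, this "page of 2" only contains a #1,
-- because its three b rows fill the page before a #2 is reached
select a.id as a_id, b.id as b_id
from a inner join b on b.a_id = a.id
order by a.id
limit 0, 2;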
A sub query approach
In this query, we try to circumvent the problem by paging on a sub query without joins, and then joining in another query for all the results obtained from the paged query:

select * from a inner join b on b.a_id = a.id where a.id in (select id from a order by id limit x,y);

Unfortunately this doesn't work either, because MySQL doesn't allow the limit construct inside a sub query inside an in clause.
Variation on the same idea
The query and the sub query can be split into two distinct queries. First we query the primary keys of all a's in a certain page:

select id from a order by id limit x,y;

Then we use that list of primary keys in a second query:

select * from a inner join b on b.a_id = a.id where a.id in (list of ID's from the previous query);

This actually works, but now the size of the second query is proportional to the page size. This has crashed our MySQL clustered database when pages grow beyond 1000 records.
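As a sketch, assume a hypothetical page of three records starting at offset 6 (the concrete ids are made up for illustration):

-- step 1: fetch the primary keys for the page
select id from a order by id limit 6, 3;
-- suppose this returns 7, 8 and 9; step 2 then becomes
select * from a inner join b on b.a_id = a.id where a.id in (7, 8, 9);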
A no-join solution
Another obviously working solution is to not join at all:

select * from a order by id limit x,y;

If necessary, the b records can be fetched from the database with subsequent queries. If, in the context of a certain application, all these b records are going to be needed anyway, this has a tremendous performance impact, because instead of only 1 (fairly complex) query, we now have to do n + 1 (simple) queries.
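Sketched with a made-up example id, the n + 1 pattern looks like this:

-- 1 query for the page of a records
select * from a order by id limit x, y;
-- plus one query per returned a, e.g. for a.id = 7
select * from b where b.a_id = 7;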
Dropping the fixed page size requirement
If we drop the requirement that every page (except the last one) should be of the same size, we can page the query without relying on the MySQL limit construct. This allows us to page on actual, well-chosen constraints, rather than on the number of results that happens to be the consequence of a certain set of joins in a query. For example, we could page on the primary key:

select * from a inner join b on b.a_id = a.id where a.id > x and a.id <= x+y;

With an auto increment primary key on a table where deletes are not very frequent, this might yield good results. However, if from time to time many records are being deleted from the database, this might result in smaller or even empty pages.
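As an illustration (the range size of 1000 is an arbitrary choice here, not a figure from the text), successive pages are then requested as key ranges:

-- page 1: all a's with 0 < id <= 1000, plus their b's
select * from a inner join b on b.a_id = a.id where a.id > 0 and a.id <= 1000;
-- page 2: all a's with 1000 < id <= 2000, plus their b's
select * from a inner join b on b.a_id = a.id where a.id > 1000 and a.id <= 2000;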