Thread: Efficient pagination using multi-column cursors
Hi folks,
I am working on optimizing a query that attempts to efficiently paginate through a large table using multi-column "cursors" aka. the "seek method" (as described in detail here: https://use-the-index-luke.com/sql/partial-results/fetch-next-page).
The table (drastically simplified) looks like this:
CREATE TABLE data
(
    col_1 int NOT NULL,
    col_2 int NOT NULL,
    col_3 int NOT NULL,
    content varchar(10) NOT NULL
);
And has an appropriate index:
CREATE INDEX data_index ON data (col_1, col_2, col_3);
The recommended query to paginate through this table is using the "row values" syntax:
SELECT content
FROM data
WHERE (col_1, col_2, col_3) > (10, 20, 29)
ORDER BY col_1, col_2, col_3
LIMIT 100;
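(For context, the cursor values (10, 20, 29) are not magic numbers: following the seek method from the article above, they are carried over from the last row of the previous page, which also selects the cursor columns. A sketch:)

```sql
-- The previous page fetches the cursor columns along with the content;
-- the last returned row supplies the values for the next page's WHERE clause:
SELECT content, col_1, col_2, col_3
FROM data
ORDER BY col_1, col_2, col_3
LIMIT 100;
```

If the last row of that page is (..., 10, 20, 29), those three values become the cursor in the query above.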
Which results in a perfectly optimized query plan (slightly edited for readability, using test data of 1 million rows, on PostgreSQL 17.2):
Limit  (cost=0.42..5.33 rows=100 width=20)
       (actual time=0.084..0.197 rows=100 loops=1)
  ->  Index Scan using data_index on data
        (cost=0.42..43617.30 rows=889600 width=20)
        (actual time=0.081..0.176 rows=100 loops=1)
        Index Cond: (ROW(col_1, col_2, col_3) > ROW(10, 20, 29))
Planning Time: 0.344 ms
Execution Time: 0.264 ms
However, in reality, my query uses a mix of ascending and descending ordering (with an index matching the order columns), in which case the WHERE (col_1, col_2, col_3) > (10, 20, 29) syntax is not an option (unless I somehow derive "reversed" data from the column, which I would like to avoid).
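(To make the "derive reversed data" workaround concrete: for integer columns one could index and compare a negated column, so that every column ascends again and the row-value syntax applies. Something like the sketch below — untested, and exactly the kind of thing I'd like to avoid.)

```sql
-- Hypothetical: emulate ORDER BY col_1, col_2 DESC, col_3 by negating col_2.
-- With all columns effectively ascending, the row-value comparison works again:
CREATE INDEX data_index_neg ON data (col_1, (-col_2), col_3);

SELECT content
FROM data
WHERE (col_1, -col_2, col_3) > (10, -20, 29)
ORDER BY col_1, (-col_2), col_3
LIMIT 100;
```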
Therefore I am using an equivalent query with multiple WHERE conditions, something like this (for simplicity, no mixed ordering is involved in the examples):
SELECT content
FROM data
WHERE
    col_1 >= 10
    AND (
        col_1 > 10
        OR (
            col_2 >= 20
            AND (
                col_2 > 20
                OR col_3 > 29
            )
        )
    )
ORDER BY col_1, col_2, col_3
LIMIT 100;
Which returns the same rows, but the query plan is slightly less efficient:
Limit  (cost=0.42..6.48 rows=100 width=20)
       (actual time=0.848..0.893 rows=100 loops=1)
  ->  Index Scan using data_index on data
        (cost=0.42..52984.52 rows=874874 width=20)
        (actual time=0.847..0.884 rows=100 loops=1)
        Index Cond: (col_1 >= 10)
        Filter: ((col_1 > 10) OR ((col_2 >= 20) AND ((col_2 > 20) OR (col_3 > 29))))
        Rows Removed by Filter: 2030
Planning Time: 0.133 ms
Execution Time: 0.916 ms
Instead of exclusively relying on an index access predicate, this plan involves an index filter predicate.
Without being familiar with the internals of the query planner, I *think* there *should* be a way to come up with WHERE conditions that result in the "perfect" plan. There are several ways to phrase the conditions, of which I've tried a few, only to get the same or worse performance. Does anyone have a suggestion for a better query?
I am also aware that I might be chasing an optimization with low returns (yet to see how it performs on Real Life data), but I'm already too deep down a rabbit hole to turn back without knowing The Truth.
I've dumped my experiments into a DB Fiddle: https://www.db-fiddle.com/f/kd8zaibZGKGH1HSStyxNkx/0
Cheers,
Márton
On Wed, Feb 26, 2025 at 9:29 AM <large.goose2829@salomvary.com> wrote:
> Without being familiar with the internals of the query planner, I *think* there *should* be a way to come up with WHERE conditions that result in the "perfect" plan.

There is a fundamental trade-off involved here. The simple, fast "WHERE (col_1, col_2, col_3) > (10, 20, 29)" query returns whatever tuples are stored immediately after "(10, 20, 29)" in the index. Naturally, they're returned in index order, which is usually the most useful order (simple ASC order or simple DESC order for all columns).

The B-Tree code can physically traverse your mixed-ASC-and-DESC order index in almost the same way. But it is much less useful, since the matching index tuples won't be physically located together as exactly one contiguous group of tuples. And so (with your "handwritten" row comparison) you get a filter qual that filters out non-matching tuples using lower-order index columns. The index scan actually just returns "Index Cond: (col_1 >= 10)" (which still returns a contiguous group of index tuples), while a filter condition excludes those tuples returned by the index scan node that don't satisfy the later/lower-order column condition.

The book "Relational Database Index Design and the Optimizers" proposes a vocabulary for the trade-offs in this area -- the 3 star model. When creating the best possible index for certain queries it is sometimes inherently necessary to choose between what it calls the first star (which means avoiding a sort) and the second star (which means having the thinnest possible "row slice"). Sometimes those things are in tension, which makes sense when you imagine how the index must be physically traversed.

--
Peter Geoghegan
Thanks for the insights!
On Wed, Feb 26, 2025, at 16:05, Peter Geoghegan wrote:
> On Wed, Feb 26, 2025 at 9:29 AM <large.goose2829@salomvary.com> wrote:
>> Without being familiar with the internals of the query planner, I *think* there *should* be a way to come up with WHERE conditions that result in the "perfect" plan.
>
> There is a fundamental trade-off involved here. The simple, fast "WHERE (col_1, col_2, col_3) > (10, 20, 29)" query returns whatever tuples are stored immediately after "(10, 20, 29)" in the index. Naturally, they're returned in index order, which is usually the most useful order (simple ASC order or simple DESC order for all columns). The B-Tree code can physically traverse your mixed-ASC-and-DESC order index in almost the same way. But it is much less useful, since the matching index tuples won't be physically located together as exactly one contiguous group of tuples.
I am not sure I understand this.
My understanding is that given this "mixed order" index:
CREATE INDEX data_index_desc ON data (col_1, col_2 DESC, col_3);
The index tuples are physically organized exactly in this way:
ORDER BY col_1, col_2 DESC, col_3
So that I should be able to write a query that reads a continuous range from this index without filtering.
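Concretely, the handwritten cursor condition I would hope to be satisfiable by one contiguous range of data_index_desc looks like this (a sketch; note the flipped comparison on the DESC column):

```sql
-- "After (10, 20, 29)" under ORDER BY col_1, col_2 DESC, col_3:
SELECT content
FROM data
WHERE col_1 > 10
   OR (col_1 = 10 AND (col_2 < 20  -- '<' because col_2 is DESC
                       OR (col_2 = 20 AND col_3 > 29)))
ORDER BY col_1, col_2 DESC, col_3
LIMIT 100;
```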
> And so (with your "handwritten" row comparison) you get a filter qual that filters out non-matching tuples using lower-order index columns. The index scan actually just returns "Index Cond: (col_1 >= 10)" (which still returns a contiguous group of index tuples), while a filter condition excludes those tuples returned by the index scan node that don't satisfy the later/lower-order column condition.
Does this mean that it is not possible to come up with a plan that has the same performance as "WHERE (col_1, col_2, col_3) > (10, 20, 29)" using "handwritten" filters, or only for "mixed order"? And is it a theoretical limitation, or a limitation of the current implementation of the query planner?
I don't know whether it's polite to bring up competitors on this mailing list, but MySQL 8 (which, quite ironically, has poor performance for the "row values" syntax, doing a full index scan) seems to avoid index filtering with the "OR AND" variant (at least when no mixed ordering is involved):
SELECT content
FROM data
WHERE
    col_1 > 10
    OR (
        col_1 = 10
        AND (
            col_2 > 20
            OR (
                col_2 = 20
                AND col_3 > 29
            )
        )
    )
ORDER BY col_1, col_2, col_3
LIMIT 100;
-> Limit: 100 row(s) (cost=100710 rows=100) (actual time=0.0322..1.15 rows=100 loops=1)
-> Index range scan on data using data_index
over (col_1 = 10 AND col_2 = 20 AND 29 < col_3) OR (col_1 = 10 AND 20 < col_2) OR (10 < col_1),
with index condition: ((`data`.col_1 > 10) or ((`data`.col_1 = 10) and ((`data`.col_2 > 20) or ((`data`.col_2 = 20) and (`data`.col_3 > 29)))))
(cost=100710 rows=511937) (actual time=0.0316..1.14 rows=100 loops=1)
Still slightly slower actual total time on the same machine as PostgreSQL though (based on a single EXPLAIN ANALYZE sample only).
> The book "Relational Database Index Design and the Optimizers" proposes a vocabulary for the trade-offs in this area -- the 3 star model. When creating the best possible index for certain queries it is sometimes inherently necessary to choose between what it calls the first star (which means avoiding a sort) and the second star (which means having the thinnest possible "row slice"). Sometimes those things are in tension, which makes sense when you imagine how the index must be physically traversed.
Aka. "Good, Fast, Cheap — Pick Any Two" ;)
Cheers,
Márton
On Wed, Feb 26, 2025 at 10:40 AM <large.goose2829@salomvary.com> wrote:
> My understanding is that given this "mixed order" index:
>
> CREATE INDEX data_index_desc ON data (col_1, col_2 DESC, col_3);
>
> The index tuples are physically organized exactly in this way:
>
> ORDER BY col_1, col_2 DESC, col_3
>
> So that I should be able to write a query that reads a continuous range from this index without filtering.

Yes, you can. For example, if you use the predicate "col_1 >= 10", it can work in this way, even with a mixed-asc-and-desc order multicolumn index -- without any filtering. Same thing if you don't have any predicate at all -- there's no lower-order columns to filter on because there's no columns to filter on at all.

The actual predicate that you're interested in isn't like that. It cannot use an index that both returns rows in the mixed-ASC-and-DESC order that you want (and so terminate early with a LIMIT and whatnot), while at the same time accessing all the tuples as exactly one contiguous group. You have to pick which is more important. It sounds very much like having the most useful sort order is important.

> Does this mean that it is not possible to come up with a plan that has the same performance as "WHERE (col_1, col_2, col_3) > (10, 20, 29)" using "handwritten" filters, or only for "mixed order"? Or not a theoretical limitation but a limitation of the current implementation of the query planner?

Perhaps the query planner should be taught to rewrite the query in such a way as to make it unnecessary for you to do so -- I think that that's what MySQL is doing for you. That is beside the point. Again, think about how things are physically laid out in an index which mixes ASC and DESC order. It is inevitable that the scan has to traverse over non-matching tuples in order to read all of the matching tuples (or to read a given number of matching tuples). This has nothing to do with the query planner.

> Aka. "Good, Fast, Cheap — Pick Any Two" ;)

It's not like that. Often it just isn't necessary to pick any 2 -- you can have all 3, because the requirements of the query allow it. (Plus it would never make sense to pick the first and second stars over the third.)

--
Peter Geoghegan
On Wed, 2025-02-26 at 15:27 +0100, large.goose2829@salomvary.com wrote:
> I am working on optimizing a query that attempts to efficiently paginate through a large table using multi-column "cursors" aka. the "seek method" (as described in detail here: https://use-the-index-luke.com/sql/partial-results/fetch-next-page).
>
> The table (drastically simplified) looks like this:
>
> CREATE TABLE data
> (
>     col_1 int NOT NULL,
>     col_2 int NOT NULL,
>     col_3 int NOT NULL,
>     content varchar(10) NOT NULL
> );
>
> And has an appropriate index:
>
> CREATE INDEX data_index ON data (col_1, col_2, col_3);
>
> The recommended query to paginate through this table is using the "row values" syntax:
>
> SELECT content
> FROM data
> WHERE (col_1, col_2, col_3) > (10, 20, 29)
> ORDER BY col_1, col_2, col_3
> LIMIT 100;
>
> Which results in a perfectly optimized query plan
>
> However, in reality, my query uses a mix of ascending and descending ordering (with an index matching the order columns), in which case the WHERE (col_1, col_2, col_3) > (10, 20, 29) syntax is not an option (unless I somehow derive "reversed" data from the column, which I would like to avoid).

Here are my ideas for this situation:
https://www.cybertec-postgresql.com/en/keyset-pagination-with-descending-order/

Yours,
Laurenz Albe