Re: Multicolumn index scan efficiency - Mailing list pgsql-performance

From Vitalii Tymchyshyn
Subject Re: Multicolumn index scan efficiency
Date
Msg-id CABWW-d154jXcE143KKb8UYMnjKviG=fsJPmbYxcERyr2ocOKpg@mail.gmail.com
Whole thread Raw
In response to Re: Multicolumn index scan efficiency  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Multicolumn index scan efficiency
List pgsql-performance
Thank you so much for both clarifying and fixing it!
In our case (FYI, this is from http://github.com/cdapio/cdap) a lot of users have just a single namespace, so it effectively means scanning till the end of the index.
We'll fix https://github.com/cdapio/cdap/blob/develop/cdap-data-fabric/src/main/java/io/cdap/cdap/spi/data/sql/PostgreSqlStructuredTable.java to detect equality scan prefixes and make corresponding SQL. That would fix it for all postgres versions. 

Best regards, Vitalii Tymchyshyn

нд, 9 лист. 2025 р. о 20:21 Peter Geoghegan <pg@bowt.ie> пише:
On Sun, Nov 9, 2025 at 9:44 PM Vitalii Tymchyshyn <vit@tym.im> wrote:
> I am wondering about 2 things:
> 1) Does anyone know which specific change / version made it fast?
> 2) What was the proper way to do a range index scan like WHERE (a,b,c) between (x1,y1,z1) and (x2,y2,z2) before the improvement.
> Note that my tests can mostly be rewritten as equality at least for some columns (and this is what we'll do), but sometimes we do need a range scan like above, so understanding it would be important. Also I am curious :).

This improvement you're seeing here is down to work in commit
bd3f59fd. The short version is that the way we used to decide when a
condition like "WHERE (a,b,c) <= (x2,y2,z2)" was needlessly
conservative. If there were many "a" values equal to x2, we'd have to
scan the index until we got to the next distinct/non-equal "a" value
-- without realizing that we're already past the point where there
cannot possibly be any more matches.

See the discussion on this thread which complained about the problem,
particularly my response to the complaint:

https://www.postgresql.org/message-id/flat/CAH2-WzmLREy6r68A6SEHXnstg01kNs1HiQtOvSO5cTvWuaducw%40mail.gmail.com#62e393ac8bbf06f0f73598ba2ceeab69

--
Peter Geoghegan

pgsql-performance by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Multicolumn index scan efficiency
Next
From: Peter Geoghegan
Date:
Subject: Re: Multicolumn index scan efficiency