Re: Relaxing constraints on BitmapAnd eligibility? - Mailing list pgsql-hackers

From Dmytro Astapov
Subject Re: Relaxing constraints on BitmapAnd eligibility?
Date
Msg-id CAFQUnFiGt0qcBCAv_V6RuRWWBPY9=BH2_oVxtXUFZ_kH9Neshw@mail.gmail.com
Whole thread Raw
In response to Re: Relaxing constraints on BitmapAnd eligibility?  (Dmytro Astapov <dastapov@gmail.com>)
List pgsql-hackers
Hi

And here is a proposed code change, alternative to the doc change.

I hope that it is ok to relax the restriction like so, as `cost_bitmap_node_and` is already resigned to inputs not being independent:
* We estimate AND selectivity on the assumption that the inputs are
* independent.  This is probably often wrong, but we don't have the info
* to do better.

I've ran "make check", and one test have changed (the last one in the "test extraction of restriction OR clauses from join OR clause" group - in an expected way, IMO).
I am attaching the regression.diff for convenience.

Is this a generally desirable change to pursue?

Best regards, Dmytro


On Wed, Feb 26, 2025 at 7:31 PM Dmytro Astapov <dastapov@gmail.com> wrote:
Hi!

I am (still) very unsure if the code change I mentioned will make sense, but documentation chage could perhaps look like something along these lines?



Best regards, Dmytro


On Tue, Feb 25, 2025 at 9:14 PM Dmytro Astapov <dastapov@gmail.com> wrote:
Hi!

I've been investigating why postgres does not do BitmapAnd of two well-suited indexes, and reading indxpath.c

In my case, there is a table (d date, col1 int, col2 int) -- types not really important -- and there are two indices on (d,col1) and (d, col2).

For queries that do WHERE d>=X AND col1=Y AND col2=Z postgres will never BitmapAnd those two indices because both indexes include (d) and we have a condition on (d). Here is a full example, which could also be seen here: https://www.db-fiddle.com/f/uPLx5bRtDEoZw3Dx4kkwKh/0:

begin;

CREATE TABLE test_table (
    d date,
    col1 int,
    col2 int
);

INSERT INTO test_table (d, col1, col2)
SELECT
    d.date,
    c1.val as col1,
    c2.val as col2
FROM
    generate_series(
        '2023-01-01'::date,
        '2023-12-31'::date,
        '1 day'::interval
    ) as d(date),
    generate_series(1, 1000) as c1(val),
    generate_series(1, 1000) as c2(val)
WHERE
    random() < 0.001;

create index on test_table(col1,d);
create index on test_table(col2,d);

-- This uses BitmapAnd
explain select * from test_table where col1=123 and col2=321;

-- This does not use BitmapAnd, even though it could!
explain select * from test_table where col1=123 and col2=321 and d >= '2023-05-05';

I checked that BitmapAnd is rejected by this line in choose_bitmap_and:

   if (bms_overlap(pathinfo->clauseids, clauseidsofar))
      continue; /* consider it redundant */

There is a comment on choose_bitmap_and that explains the rationale of this check, but reading it I can't help but feel that what the comment describes is this condition:

   if (bms_is_subset(pathinfo->clauseids, clauseidsofar))
      continue; /* consider it redundant */

And indeed, in my (admittedly not super-extensive) testing changing bms_overlap to bms_is_subset leads to better faster execution plans. 

Is it possible that this condition could thus be relaxed?

Even if I am wrong, and the condition absolutely should be bms_overlap, I feel that this restriction is very very hard to discover and perhaps https://www.postgresql.org/docs/current/indexes-bitmap-scans.html should mention that compound indexes that have columns in common will never be combined?

Best regards, Dmytro
Attachment

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Statistics Import and Export
Next
From: "Devulapalli, Raghuveer"
Date:
Subject: RE: Improve CRC32C performance on SSE4.2