Re: Bitmap scan is undercosted? - boolean correlation - Mailing list pgsql-performance

From Justin Pryzby
Subject Re: Bitmap scan is undercosted? - boolean correlation
Date
Msg-id 20171203040430.GC14008@telsasoft.com
Whole thread Raw
In response to Re: Bitmap scan is undercosted?  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Bitmap scan is undercosted? - boolean correlation
List pgsql-performance
On Sat, Dec 02, 2017 at 05:27:51PM -0800, Jeff Janes wrote:
> I think the non-extended stats code also has trouble with booleans.
> pg_stats gives me a correlation  of 0.8 or higher for the flag column.

It's not due to the boolean though; you see the same thing if you do:
CREATE INDEX aaa_f ON aaa((flag::text));
ANALYZE aaa;
correlation | 0.81193

or:
ALTER TABLE aaa ADD flag2 int; UPDATE aaa SET flag2= flag::int
correlation | 0.81193

I think it's caused by having so few (2) values to correlate.

most_common_vals       | {f,t}
most_common_freqs      | {0.9014,0.0986}
correlation            | 0.822792

It thinks there's somewhat-high correlation since it gets a list of x and y
values (integer positions by logical and physical sort order) and 90% of the x
list (logical value) are the same value ('t'), and the CTIDs are in order on
the new index, so 90% of the values are 100% correlated.

It improves (by which I mean here that it spits out a lower number) if it's not
a 90/10 split:

CREATE TABLE aaa5 AS SELECT (id%100)::int num, (id%10>5)::bool flag FROM generate_series(1, 10000000) id;
CREATE INDEX ON aaa5 (flag);

tablename   | aaa5
attname     | flag
correlation | 0.522184

Justin


pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Bitmap scan is undercosted?
Next
From: Vitaliy Garnashevich
Date:
Subject: Re: Bitmap scan is undercosted?