Re: Bitmap scan is undercosted? - boolean correlation - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Bitmap scan is undercosted? - boolean correlation
Date
Msg-id CAMkU=1xQs6=fB+W+=3suB0ZeGsNi8ekBqEqXQ8FriTPw5kcuVQ@mail.gmail.com
Whole thread Raw
In response to Re: Bitmap scan is undercosted? - boolean correlation  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Bitmap scan is undercosted? - boolean correlation  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby wrote: > On Sat, Dec 02, 2017 at 05:27:51PM -0800, Jeff Janes wrote: > > I think the non-extended stats code also has trouble with booleans. > > pg_stats gives me a correlation of 0.8 or higher for the flag column. > > It's not due to the boolean though; you see the same thing if you do: > CREATE INDEX aaa_f ON aaa((flag::text)); > ANALYZE aaa; > correlation | 0.81193 > > or: > ALTER TABLE aaa ADD flag2 int; UPDATE aaa SET flag2= flag::int > correlation | 0.81193 > > I think it's caused by having so few (2) values to correlate. > > most_common_vals | {f,t} > most_common_freqs | {0.9014,0.0986} > correlation | 0.822792 > > It thinks there's somewhat-high correlation since it gets a list of x and y > values (integer positions by logical and physical sort order) and 90% of > the x > list (logical value) are the same value ('t'), and the CTIDs are in order > on > the new index, so 90% of the values are 100% correlated. > But there is no index involved (except in the case of the functional index). The correlation of table columns to physical order of the table doesn't depend on the existence of an index, or the physical order within an index. But I do see that ties within the logical order of the column values are broken to agree with the physical order. That is wrong, right? Is there any argument that this is desirable? It looks like it could be fixed with a few extra double calcs per distinct value. Considering we already sorted the sample values using SQL-callable collation dependent comparators, I doubt a few C-level double calcs is going to be meaningful. Cheers, Jeff

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Bitmap scan is undercosted?
Next
From: Tom Lane
Date:
Subject: Re: Bitmap scan is undercosted? - boolean correlation