Re: Bitmap scan is undercosted? - boolean correlation - Mailing list pgsql-performance

From: Jeff Janes
Subject: Re: Bitmap scan is undercosted? - boolean correlation
Msg-id: CAMkU=1yCX_WK0KYUOhRSKG2a771kiK+QWU5YXxt3EPhEXCLDQQ@mail.gmail.com
In response to: Re: Bitmap scan is undercosted? - boolean correlation (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Bitmap scan is undercosted? - boolean correlation
List: pgsql-performance
On Dec 3, 2017 15:31, "Tom Lane" wrote:

Jeff Janes writes:
> On Sat, Dec 2, 2017 at 8:04 PM, Justin Pryzby wrote:
>> It thinks there's somewhat-high correlation since it gets a list of x
>> and y values (integer positions by logical and physical sort order) and
>> 90% of the x list (logical value) are the same value ('t'), and the
>> CTIDs are in order on the new index, so 90% of the values are 100%
>> correlated.
>
> But there is no index involved (except in the case of the functional
> index). The correlation of table columns to physical order of the table
> doesn't depend on the existence of an index, or the physical order within
> an index.
>
> But I do see that ties within the logical order of the column values are
> broken to agree with the physical order. That is wrong, right? Is there
> any argument that this is desirable?

Uh ... what do you propose doing instead? We'd have to do something with
ties, and it's not so obvious this way is wrong.


Let them be tied. If there are 10 distinct values, number the values 0 to 9, and all rows with a given distinct value get the same number for the logical order axis.

Calling the correlation 0.8 when it is really 0.0 seems obviously wrong to me. Although if we switched btree to store duplicate values with tid as a tie breaker, then maybe it wouldn't be as obviously wrong.

Cheers,

Jeff
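As a rough illustration of the two tie-handling schemes discussed in this thread, the following Python sketch computes a Pearson correlation between each row's logical rank and its physical position on a toy 100-row boolean column. It is not PostgreSQL's ANALYZE code and ignores sampling entirely; the column layout (every tenth row false, the rest true) and the helper names `pearson`, `ranks_tie_broken`, and `ranks_tied` are hypothetical, chosen only to make the arithmetic easy to reproduce.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def ranks_tie_broken(values):
    """Logical rank of each row when equal values are ordered by physical position."""
    order = sorted(range(len(values)), key=lambda pos: (values[pos], pos))
    ranks = [0] * len(values)
    for rank, pos in enumerate(order):
        ranks[pos] = rank
    return ranks


def ranks_tied(values):
    """The proposal above: number the distinct values 0..k-1 and give every
    row holding a value that value's number, so ties stay tied."""
    number = {v: i for i, v in enumerate(sorted(set(values)))}
    return [number[v] for v in values]


# Hypothetical 100-row boolean column: every tenth row (0, 10, ..., 90) is
# False and the other 90% are True, spread evenly through the table.
values = [pos % 10 != 0 for pos in range(100)]
physical = list(range(100))

tie_broken = pearson(ranks_tie_broken(values), physical)
tied = pearson(ranks_tied(values), physical)

print(round(tie_broken, 3))  # -> 0.846
print(round(tied, 3))        # -> 0.052
```

When ties are broken by physical position, the false rows and the true rows each form a perfectly ordered run, so this toy column reports a correlation of roughly 0.85 even though the values are scattered evenly through the table. Leaving ties tied (false = 0, true = 1) gives roughly 0.05, much closer to the "really 0.0" the message argues for.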
