Re: index fragmentation on insert-only table with non-unique column - Mailing list pgsql-performance

From Claudio Freire
Subject Re: index fragmentation on insert-only table with non-unique column
Date
Msg-id CAGTBQpb=HAXRWKRSddbLdAxN7zUyn5wL-pJkr9QZmgpTFneZWw@mail.gmail.com
Whole thread Raw
In response to Re: index fragmentation on insert-only table with non-unique column  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-performance
On Sat, Aug 13, 2016 at 3:54 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Sun, Jun 05, 2016 at 12:28:47PM -0700, Jeff Janes wrote:
>> On Sun, Jun 5, 2016 at 9:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Claudio Freire <klaussfreire@gmail.com> writes:
>> >> So correlated index scans look extra favourable vs bitmap index scans
>> >> because bitmap heap scans consider random page costs sans correlation
>> >> effects (even though correlation applies to bitmap heap scans as
>> >> well).
>> >
>> > Really?  How?  The index ordering has nothing to do with the order in
>> > which heap tuples will be visited.
>>
>> It is not the order itself, but the density.
>>
>> If the index is read in a range scan (as opposed to =ANY scan), and
>> the index lead column is correlated with the table ordering, then the
>> parts of the table that need to be visited will be much denser than if
>> there were no correlation.  But Claudio is saying that this is not
>> being accounted for.
>
> I didn't completely understand Claudio/Jeff here, and not sure if we're on the
> same page.  For queries on these tables, the index scan was very slow, due to
> fragmented index on non-unique column, and seq scan would have been (was)
> faster (even if it means reading 70GB and filtering out 6 of 7 days' data).
> That was resolved by added a nightly reindex job (..  which sometimes competes
> with other maintenance and has trouble running every table every night).

Yes, but a bitmap index scan should be faster than both, but the
planner is discarding it beause it estimates it will be slower,
because it doesn't account for correlation between index keys and
physical location.

And, while what you clarify there would indeed affect the estimation
for index scans, it would only make the issue worse: the planner
thinks the index scan will be better than it really is, because it's
expecting correlation, but the "fragmentation" of same-key runs
destroys that correlation. A bitmap index scan would restore it,
though, so the bitmap index scan would be that much better.


pgsql-performance by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: index fragmentation on insert-only table with non-unique column
Next
From: Merlin Moncure
Date:
Subject: Re: what's the slowest part in the SQL