Re: index fragmentation on insert-only table with non-unique column - Mailing list pgsql-performance

From Stephen Frost
Subject Re: index fragmentation on insert-only table with non-unique column
Date
Msg-id 20160525042625.GX21416@tamriel.snowman.net
Whole thread Raw
In response to Re: index fragmentation on insert-only table with non-unique column  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-performance
* Peter Geoghegan (pg@bowt.ie) wrote:
> On Tue, May 24, 2016 at 10:39 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I was able to see great improvement without planner parameters by REINDEX the
> > timestamp index.  My theory is that the index/planner doesn't handle well the
> > case of many tuples with same column value, and returns pages out of logical
> > order.  Reindex fixes that, rewriting the index data with pages in order
> > (confirmed with pageinspect), which causes index scans to fetch heap data more
> > or less monotonically (if not consecutively).  strace shows that consecutive
> > read()s are common (without intervening seeks).  I gather this allows the OS
> > readahead to kick in.
>
> The basic problem is that the B-Tree code doesn't maintain this
> property. However, B-Tree index builds will create an index that
> initially has this property, because the tuplesort.c code happens to
> sort index tuples with a CTID tie-breaker.
>
> > Postgres seems to assume that the high degree of correlation of the table
> > column seen in pg_stats is how it will get data from the index scan, which
> > assumption seems to be very poor on what turns out to be a higly fragmented
> > index.  Is there a way to help it to understand otherwise??
>
> Your complaint is vague. Are you complaining about the planner making
> a poor choice? I don't think that's the issue here, because you never
> made any firm statement about the planner making a choice that was
> worth than an alternative that it had available.
>
> If you're arguing for the idea that B-Trees should reliably keep
> tuples in order by a tie-break condition, that seems difficult to
> implement, and likely not worth it in practice.

The ongoing discussion around how to effectively handle duplicate values
in B-Tree seems relevant to this.  In particular, if we're able to store
duplicate values efficiently and all the locations under a single key
are easily available then we could certainly sort those prior to going
and visiting them.

That's not quite the same as keeping the tuples in order in the heap,
but would more-or-less achieve the effect desired, I believe?

Thanks!

Stephen

Attachment

pgsql-performance by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: index fragmentation on insert-only table with non-unique column
Next
From: Tom Lane
Date:
Subject: Re: index fragmentation on insert-only table with non-unique column