Home > mailing lists

Re: index fragmentation on insert-only table with non-unique column - Mailing list pgsql-performance

From	Peter Geoghegan
Subject	Re: index fragmentation on insert-only table with non-unique column
Date	May 25, 2016 08:18:33
Msg-id	CAH2-Wzn1i=MN6smuUxaq=oMhMt_5xWVEfL+u5V+nDKcYMyUydQ@mail.gmail.com Whole thread Raw
In response to	Re: index fragmentation on insert-only table with non-unique column (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-performance

Tree view

On Tue, May 24, 2016 at 9:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah.  I wonder what would happen if we used the same rule for index
> insertions.  It would likely make insertions more expensive, but maybe
> not by much.  The existing "randomization" rule for where to insert new
> items in a run of identical index entries would go away, because the
> insertion point would become deterministic.  I am not sure if that's
> good or bad for insertion performance, but it would likely help for
> scan performance.

I think that if somebody tacked on a tie-breaker in the same way as in
tuplesort.c's B-Tree IndexTuple, there'd be significant negative
consequences.

The equal-to-high-key case gets a choice of which page to put the new
IndexTuple on, and I imagine that that's quite useful when it comes
up. I'd also have concerns about the key space in the index. I think
that it would seriously mess with the long term utility of values in
internal pages, which currently can reasonably have little to do with
the values currently stored in leaf pages. They're functionally only
separators of the key space that guide index scans, so it doesn't
matter if the actual values are completely absent from the leaf
pages/the table itself (perhaps some of the values in the internal
pages were inserted years ago, and have long since been deleted and
vacuumed away). Provided the distribution of values at the leaf level
is still well characterized at higher levels (e.g. many string values
that start with vowels, very few that start with the letters 'x' or
'z'), there should be no significant bloat. That's very valuable.
Unique indexes are another problem for this naive approach.

Maybe somebody could do better with a more sophisticated approach, but
it's probably better to focus on duplicate storage or even leaf page
compression, as Stephen mentioned.

--
Peter Geoghegan

pgsql-performance by date:

From: Tom Lane
Date: 25 May 2016, 07:43:14
Subject: Re: index fragmentation on insert-only table with non-unique column

From: Jeff Janes
Date: 25 May 2016, 09:23:54
Subject: Re: index fragmentation on insert-only table with non-unique column

Re: index fragmentation on insert-only table with non-unique column - Mailing list pgsql-performance

Previous

Next