Re: [HACKERS] pgsql 10: hash indexes testing - Mailing list pgsql-hackers

From AP
Subject Re: [HACKERS] pgsql 10: hash indexes testing
Date 2017-07-07 02:52:18
Msg-id 20170707025218.7jwlg4ajli46g62u@zip.com.au
In response to Re: [HACKERS] pgsql 10: hash indexes testing  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] pgsql 10: hash indexes testing
List pgsql-hackers
On Thu, Jul 06, 2017 at 05:19:59PM +0530, Amit Kapila wrote:
> I think if you are under development, it is always advisable to create
> indexes after the initial bulk load.  That way it will be faster and will
> take less space, at least in the case of a hash index.

This is a bit of a pickle, actually:
* if I do have a hash index, I'll wind up with a bloated one at some stage that refuses to allow more inserts until the index is re-created
* if I don't have an index, then I'll wind up with a table where I cannot create a hash index because it has too many rows for it to handle

I'm at a bit of a loss as to how to deal with this. The DB design does come
with a kind of partitioning where a bundle of tables gets put off to the side
and searched separately as needed, but with too many of those the impact on
performance becomes noticeable, so I need to minimise them.
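
For concreteness, the cycle I keep hitting looks roughly like this (table,
column and file names are made up for illustration; only the shape of the
workflow matters):

    -- initial bulk load, then build the hash index afterwards as suggested
    COPY kvstore (key_hash, value) FROM '/tmp/bulk.csv' WITH (FORMAT csv);
    CREATE INDEX kvstore_key_hash_idx ON kvstore USING hash (key_hash);

    -- later, after a lot of concurrent INSERT-heavy transactions, the index
    -- bloats to the point where new inserts fail, and the only way out
    -- today is a full rebuild:
    REINDEX INDEX kvstore_key_hash_idx;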

> >> As mentioned above, REINDEX might be a better option.  I think for such
> >> a situation we should have some provision to allow the squeeze functionality
> >> of hash indexes to be exposed to the user; this will be less costly than
> >> REINDEX and might serve the purpose for the user.  Hey, can you try some hack in
> >
> > Assuming it does help, would this be something one would need to guess
> > at? "I did a whole bunch of concurrent INSERT heavy transactions so I
> > guess I should do a squeeze now"?
> >
> > Or could it be figured out programmatically?
> 
> I think one can refer to free_percent and the number of overflow pages to
> decide when to perform such a command.  It won't be 100% correct, but we
> can make a guess.  We can even check free space in overflow pages with
> pageinspect to make it more accurate.

Does this take much time? The main reason I'm asking is that this looks like
something the DB ought to handle underneath (say, as part of an autovacuum
run), so if there are stats that the index code can maintain which could then
be used by the autovacuum (or similar) code to trigger a cleanup, I think
that would be of benefit.
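
For what it's worth, the manual version of that check would presumably look
something like this (I'm assuming the free_percent and overflow page numbers
you mention are the ones reported by pgstattuple's pgstathashindex(); the
index name is just a placeholder):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    -- How full is the index, and how many overflow pages does it have?
    -- Many overflow pages combined with a high free_percent would suggest
    -- that a squeeze (or, today, a REINDEX) is worth doing.
    SELECT free_percent, bucket_pages, overflow_pages
    FROM pgstathashindex('kvstore_key_hash_idx');

(pageinspect could then, as you say, be used to look at free space in the
individual overflow pages, but the summary numbers above would be the cheap
first pass.)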

Unless I am being /so/ unusual that it's not worth it. :)

I'll reply to the rest in a separate stream, as I'm still poking at other
work-related things at the moment and so can't do the debug testing yet.

AP


