Re: [HACKERS] pgsql 10: hash indexes testing - Mailing list pgsql-hackers
| From | Amit Kapila |
|---|---|
| Subject | Re: [HACKERS] pgsql 10: hash indexes testing |
| Date | |
| Msg-id | CAA4eK1KKq80BYOc+mcmHcQzV0Mcs3AHGjEEf--TnLaJbkeTgmg@mail.gmail.com |
| In response to | Re: [HACKERS] pgsql 10: hash indexes testing (AP <ap@zip.com.au>) |
| Responses | Re: [HACKERS] pgsql 10: hash indexes testing |
| List | pgsql-hackers |
On Fri, Jul 7, 2017 at 8:22 AM, AP <ap@zip.com.au> wrote:
> On Thu, Jul 06, 2017 at 05:19:59PM +0530, Amit Kapila wrote:
>> I think if you are under development, it is always advisable to create
>> indexes after initial bulk load. That way it will be faster and will
>> take less space, at least in case of hash index.
>
> This is a bit of a pickle, actually:
> * if I do have a hash index I'll wind up with a bloated one at some stage
>   that refused to allow more inserts until the index is re-created
> * if I don't have an index then I'll wind up with a table where I cannot
>   create a hash index because it has too many rows for it to handle
>
> I'm at a bit of a loss as to how to deal with this.
>

I can understand your concerns. To address the first concern, we need to
work on one or more of the following work items:
(a) make vacuum trigger on insert-only workloads (and have it perform
index vacuuming as well);
(b) provide a separate utility statement/function to squeeze a hash index;
(c) have the database squeeze internally, for example after each split,
so that the chances of such a problem are reduced; but that comes at the
cost of a performance hit on other workloads, so it is not clear it is
advisable.

Among these, (b) is the simplest to do but may not be convenient for the
user. To address your second concern, we need to speed up hash index
creation, which is a relatively big project. Having said that, in your
case this is a one-time operation, so spending the extra time once might
be okay.

>
>> >> As mentioned above REINDEX might be a better option. I think for such
>> >> situation we should have some provision to allow squeeze functionality
>> >> of hash exposed to the user, this will be less costly than REINDEX and
>> >> might serve the purpose for the user. Hey, can you try some hack in
>> >
>> > Assuming it does help, would this be something one would need to guess
>> > at? "I did a whole bunch of concurrent INSERT heavy transactions so I
>> > guess I should do a squeeze now"?
>> >
>> > Or could it be figured out programmatically?
>>
>> I think one can refer free_percent and number of overflow pages to
>> perform such a command. It won't be 100% correct, but we can make a
>> guess. We can even check free space in overflow pages with page
>> inspect to make it more accurate.
>
> Does this take much time? Main reason I am asking is that this looks like
> something that the db ought to handle underneath (say as part of an autovac
> run) and so if there are stats that the index code can maintain that can
> then be used by the autovac (or something) code to trigger a cleanup this
> I think would be of benefit.
>

Sure, I agree that the database should handle bloat automatically, but as
said above, that will be a separate project and may not be
straightforward.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
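
For reference, the build-after-load pattern discussed in the message is
a minimal sketch along these lines; the table name, index name, and file
path below are all hypothetical:

```sql
-- Hypothetical schema: a key/value table that will get a hash index.
CREATE TABLE datavault (
    key   bytea,
    value bytea
);

-- Bulk load first, with no index present.
COPY datavault FROM '/path/to/dump.csv' (FORMAT csv);

-- Build the hash index only after the load completes; this is faster
-- and yields a more compact index than loading into an indexed table.
CREATE INDEX datavault_key_hash ON datavault USING hash (key);

-- If the index later bloats to the point of refusing inserts,
-- rebuilding it is currently the way to reclaim the space.
REINDEX INDEX datavault_key_hash;
```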
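
The free_percent and overflow-page check mentioned in the message can be
done with the pgstattuple extension, which gained hash index support in
PostgreSQL 10, and the pageinspect extension gives per-page free space.
A sketch, reusing the hypothetical index name from the previous example:

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Overall shape of the index: many overflow pages together with a
-- high free_percent suggests it would benefit from squeezing
-- (or, today, a REINDEX).
SELECT bucket_pages, overflow_pages, bitmap_pages,
       unused_pages, free_percent
FROM   pgstathashindex('datavault_key_hash');

-- Free space within one particular page (block 5 is just an example).
SELECT free_size
FROM   hash_page_stats(get_raw_page('datavault_key_hash', 5));
```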