Re: init_sequence spill to hash table - Mailing list pgsql-hackers

From David Rowley
Subject Re: init_sequence spill to hash table
Date
Msg-id CAApHDvpWsc72i7bce-Xvf9YdXPFDCJs-yUkBikTzBE5XA31rCA@mail.gmail.com
Whole thread Raw
In response to Re: init_sequence spill to hash table  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: init_sequence spill to hash table
List pgsql-hackers
On Thu, Nov 14, 2013 at 12:15 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
On 13.11.2013 11:55, David Rowley wrote:
I thought I would post the patch early to see if this is actually wanted
before I do too much more work on it.

Seems reasonable.


My implementation maintains using the linear list for sequences up to a
defined threshold (currently 32) then it moves everything over to a
hashtable and free's off the list.

Did you check how it would perform if you just always used the hash table? Or if you just have a single entry before you move to hash table, ie. set the threshold to 2? That would be slightly simpler.


I've just completed some more benchmarking of this. I didn't try dropping the threshold down to 2 or 0 but I did tests at the cut over point and really don't see much difference in performance between the list at 32 and the hashtable at 33 sequences. The hash table version excels in the 16000 sequence test in comparison to the unpatched version.

Times are in milliseconds of the time it took to call currval() 100000 times for 1 sequence.
 PatchedUnpatchedincreased by
1 in cache1856.4521844.11-1%
32 in cache1841.841802.433-2%
33 in cache1861.558 not testedN/A
16000 in cache1963.71110329.22426%

There was not much point in testing 33 sequences with the unpatched version as there is no switching to hash table. Most likely the 1 and 2% drop in speed is noise as the only instructions that have been added here are adding test to check if the hashtable has been created in init_sequence and also a counter to count the number of sequences in the list while still in list mode.

I also tested filling the cache with 30000 sequences then inserting 100000 records into a table which used currval() for the default of a column. The sequence I used would have been half way up the linked list in the unpatched version, so init_sequence would have had to loop 15000 times before finding the sequence.

Here is the average and median over 8 runs of the INSERT statement
PatchedUnpatchedincreased by
AVERAGE482.2626597.6637524%
MEDIAN471.2205567.94921%


Just for clarity, during testing I added the following line to the switch_to_hashtable() function

elog(NOTICE, "moved sequences into hash table");

I've attached seqeuence.sql which includes all of my tests, the functions I used for benchmarking and comments with the results that I got during running the tests.

I've also attached the results formatted a little better in a spreadsheet.

Please let me know if there is something else you think I should test.

Regards

David Rowley
 
- Heikki

Attachment

pgsql-hackers by date:

Previous
From: KONDO Mitsumasa
Date:
Subject: Re: Add min and max execute statement time in pg_stat_statement
Next
From: Andrew Dunstan
Date:
Subject: Re: nested hstore patch