Re: init_sequence spill to hash table - Mailing list pgsql-hackers

From David Rowley
Subject Re: init_sequence spill to hash table
Date
Msg-id CAApHDvoAmJWtRQy03O33ijmw+tPwQLaQnDxrcRd8=OpSJEVVUg@mail.gmail.com
In response to Re: init_sequence spill to hash table  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
<div dir="ltr"><br /><div class="gmail_extra"><br /><br /><div class="gmail_quote">On Fri, Nov 15, 2013 at 3:03 AM,
Heikki Linnakangas <span dir="ltr"><<a href="mailto:hlinnakangas@vmware.com"
target="_blank">hlinnakangas@vmware.com</a>></span> wrote:<br /><blockquote class="gmail_quote" style="margin:0px 0px
0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div
class="im">On 14.11.2013 14:38, David Rowley wrote:<br /><blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">I've just
completed some more benchmarking of this. I didn't try dropping<br /> the threshold down to 2 or 0 but I did tests at
the cut over point and<br /> really don't see much difference in performance between the list at 32 and<br /> the
hash table at 33 sequences. The hash table version excels in the 16000<br /> sequence test in comparison to the unpatched
version.<br /><br /> Times are in milliseconds of the time it took to call currval() 100000<br /> times for 1
sequence.<br />
<table>
<tr><th></th><th>Patched</th><th>Unpatched</th><th>increased by</th></tr>
<tr><td>1 in cache</td><td>1856.452</td><td>1844.11</td><td>-1%</td></tr>
<tr><td>32 in cache</td><td>1841.84</td><td>1802.433</td><td>-2%</td></tr>
<tr><td>33 in cache</td><td>1861.558</td><td>not tested</td><td>N/A</td></tr>
<tr><td>16000 in cache</td><td>1963.711</td><td>10329.22</td><td>426%</td></tr>
</table></blockquote><br
/></div>If I understand those results correctly, the best case scenario with the current code takes about 1800 ms.
There's practically no difference with N <= 32, where N is the number of sequences touched. The hash table method
also takes about 1800 ms when N=33. The performance of the hash table is O(1), so presumably we can extrapolate from
that that it's the same for any N.<br /><br /></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">I think that
means that we should just completely replace the list with the hash table. The difference with a small N is lost in
noise, so there's no point in keeping the list as a fast path for small N. That'll make the patch somewhat simpler.<span
class=""><font color="#888888"><br /> - Heikki<br /></font></span></blockquote></div><br /></div><div
class="gmail_extra">I had thought that the most common type of workload might only touch 1 or 2 sequences, and that
creating a hash table for such light sequence usage in a backend would carry some overhead, however small, in both
cycles and memory. Removing the list would certainly make the patch simpler, and it would
also mean that I could remove the sometimes-unused ->next member from the SeqTableData struct, which is currently just
set to NULL when in hash table mode. If you think it's the way to go then I can make the change, though maybe I'll hold
off on the refactor for now, as it looks like other ideas have come up around the rel cache.</div><div class="gmail_extra"><br
/></div><div class="gmail_extra">Regards</div><div class="gmail_extra"><br /></div><div class="gmail_extra">David
Rowley</div></div>
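
Editor's sketch (not code from the patch): the trade-off discussed in this message is between walking a linked list of per-sequence entries and looking the entry up in a hash table keyed by the sequence's OID. The C below is a minimal illustration under stated assumptions. `SeqEntry`, `SeqHash`, `list_find`, `hash_find`, `hash_find_or_create`, and the fixed `NBUCKETS` are all hypothetical names chosen here; the real SeqTableData struct has more fields, and a PostgreSQL patch would more likely use the server's own hash table facility than a hand-rolled one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;               /* stand-in for PostgreSQL's Oid type */

/* Hypothetical, trimmed-down per-sequence cache entry (assumption). */
typedef struct SeqEntry
{
    Oid              relid;         /* lookup key: OID of the sequence */
    int64_t          last;          /* last value handed out */
    struct SeqEntry *next;          /* link within a list or bucket chain */
} SeqEntry;

/* List lookup: cost grows linearly with the number of sequences touched. */
SeqEntry *
list_find(SeqEntry *head, Oid relid)
{
    for (SeqEntry *e = head; e != NULL; e = e->next)
        if (e->relid == relid)
            return e;
    return NULL;
}

#define NBUCKETS 1024               /* 2^10 buckets, fixed for simplicity */

typedef struct SeqHash
{
    SeqEntry *buckets[NBUCKETS];    /* each bucket is a short chain */
} SeqHash;

/* Multiplicative hash; the top 10 bits select one of the 2^10 buckets. */
size_t
bucket_for(Oid relid)
{
    return (size_t) ((relid * 2654435761u) >> 22);
}

/* Hash lookup: only the entries sharing one bucket are scanned. */
SeqEntry *
hash_find(SeqHash *h, Oid relid)
{
    return list_find(h->buckets[bucket_for(relid)], relid);
}

/* Return the entry for relid, creating it on first use. NULL on OOM. */
SeqEntry *
hash_find_or_create(SeqHash *h, Oid relid)
{
    size_t    b = bucket_for(relid);
    SeqEntry *e = list_find(h->buckets[b], relid);

    if (e == NULL)
    {
        e = calloc(1, sizeof(SeqEntry));
        if (e == NULL)
            return NULL;
        e->relid = relid;
        e->next = h->buckets[b];    /* push onto the front of the chain */
        h->buckets[b] = e;
    }
    return e;
}
```

With 16000 entries spread over 1024 buckets, a lookup scans a chain of roughly 16 entries on average instead of up to 16000, which is the kind of gap the 16000-sequence benchmark row reflects. In this sketch the chain link cannot be dropped, because buckets are chained; the `->next` member mentioned in the message only becomes removable when the hash table implementation manages collisions itself.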
