Thread: LIMIT/SORT optimization
Earlier I mentioned I had implemented the limited sort optimization. This kicks in quite frequently on web applications that implement paging. Currently Postgres has to sort the entire data set, often in a disk sort. If the page is early in the result set it's a lot more efficient to just keep the top n records in memory and make a single pass through the result set.

The patch is quite simple and is mostly localized to tuplesort.c, which gains a new function for advising it of the maximum number of tuples needed. It uses the existing heapsort functionality, which makes it easy to keep the top n records: peek at the top element of the heap and remove it if the new record would displace it. In experimenting I found heap sort to be about half the speed of quicksort, so the code only switches over to heap sort once the input size reaches 2x the specified maximum, or earlier if switching lets it avoid spilling to disk.

The two open issues (which are arguably the same issue) are how to get the information down to the sort node and how to cost the plan. Currently it's a bit of a hack: the Limit node peeks at its child and, if it's a sort, calls a special function to provide the limit.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Attachment
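For readers following the thread, here is a minimal standalone sketch of the trick Greg describes, using plain ints rather than SortTuples; none of the names below come from tuplesort.c. A heap of at most `bound` elements is kept with the largest retained value at its root, so most incoming values are rejected after a single comparison against the root, and a value that does displace the root costs only O(log bound) work to sift into place.

/*
 * Illustrative bounded "top n" sort: keep the n smallest values seen so
 * far in a max-heap, so the worst retained value is always at heap[0].
 * This mirrors the idea described above but is not code from the patch.
 */
#include <stdio.h>
#include <stdlib.h>

static void
sift_down(int *heap, int n, int i)
{
    for (;;)
    {
        int largest = i;
        int l = 2 * i + 1;
        int r = 2 * i + 2;
        int tmp;

        if (l < n && heap[l] > heap[largest])
            largest = l;
        if (r < n && heap[r] > heap[largest])
            largest = r;
        if (largest == i)
            break;
        tmp = heap[i];
        heap[i] = heap[largest];
        heap[largest] = tmp;
        i = largest;
    }
}

/* Offer one value; *n is the current number of values kept (<= bound). */
static void
bounded_heap_offer(int *heap, int *n, int bound, int value)
{
    if (*n < bound)
    {
        /* Still room: append and sift the new value up. */
        int i = (*n)++;

        heap[i] = value;
        while (i > 0 && heap[(i - 1) / 2] < heap[i])
        {
            int p = (i - 1) / 2;
            int tmp = heap[i];

            heap[i] = heap[p];
            heap[p] = tmp;
            i = p;
        }
    }
    else if (value < heap[0])
    {
        /* Displaces the current worst of the kept values. */
        heap[0] = value;
        sift_down(heap, *n, 0);
    }
    /* else: cannot be among the top <bound>, discard after one comparison */
}

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

int
main(void)
{
    int heap[3];
    int n = 0;
    int i;

    srand(12345);
    for (i = 0; i < 1000000; i++)       /* one pass over the "input" */
        bounded_heap_offer(heap, &n, 3, rand());

    qsort(heap, n, sizeof(int), cmp_int);   /* emit in ascending order */
    for (i = 0; i < n; i++)
        printf("%d\n", heap[i]);
    return 0;
}

Compiled and run, this prints the three smallest of a million random numbers in ascending order, which is the same shape of work as ORDER BY x LIMIT 3 when no index is available.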
On Wed, 2007-02-07 at 10:49 +0000, Gregory Stark wrote:
> The two open issues (which are arguably the same issue) are how to get
> the information down to the sort node and how to cost the plan.
> Currently it's a bit of a hack: the Limit node peeks at its child and,
> if it's a sort, calls a special function to provide the limit.

We can't lose the LIMIT node altogether, in case we have a parameterised limit or a limit expression, so it does need to be in the executor.

Exploiting knowledge about adjacent plan types is already done in the executor. If this seemed like it might be a generic trick, then I'd say implement a generic function, but I don't see that it is.

We still want to push LIMIT/Sort down through an APPEND, but this won't help us here - we'd need to do that in the planner.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

"Simon Riggs" <simon@2ndquadrant.com> writes:
> We can't lose the LIMIT node altogether, in case we have a parameterised
> limit or a limit expression, so it does need to be in the executor.

Right. The LIMIT node also implements offset and handles tricky border cases such as cursors that move past the edges; it would be pointless to duplicate that logic in tuplesort.c. The idea is to advise tuplesort.c when it can save work by not sorting more tuples than necessary, not to duplicate the work of Limit.

> Exploiting knowledge about adjacent plan types is already done in the
> executor. If this seemed like it might be a generic trick, then I'd say
> implement a generic function, but I don't see that it is.
>
> We still want to push LIMIT/Sort down through an APPEND, but this won't
> help us here - we'd need to do that in the planner.

Hm, that's exactly the kind of situation I was afraid of, where the information would need to propagate farther than an immediate child node and under more sophisticated rules. However, as you point out, that case can be handled by optimizations that modify the plan tree. That keeps the scope of this optimization to a minimum: sort nodes directly under limit nodes. That's probably a better approach.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Is there a newer version of this patch?

---------------------------------------------------------------------------

Gregory Stark wrote:
> That keeps the scope of this optimization to a minimum: sort nodes
> directly under limit nodes. That's probably a better approach.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
> Is there a newer version of this patch?

As requested, I've cut an updated version of this patch against CVS HEAD:

http://community.enterprisedb.com/sort-limit-v5.patch.gz

The open issues I see are:

 - Adjusting the costing of the sort to take into account the optimization

 - Whether the communication between the Limit node and the Sort node is
   kosher, or whether something more abstract is needed.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

"Gregory Stark" <stark@enterprisedb.com> writes:
> As requested, I've cut an updated version of this patch against CVS HEAD:
>
> http://community.enterprisedb.com/sort-limit-v5.patch.gz

Someone asked why I've been posting links rather than attachments. The only reason is that when I posted patches in the past they were dropped by the mailing list. I would say "refused", except that I never received a bounce; the messages just never appeared on the list or in the archive.

I'll try attaching this patch again, since it's relatively small compared to the recursive query and packed varlena patches that disappeared into the ether. Also, this one is gzipped, whereas in the past I usually attached patches uncompressed so people could read them without saving and uncompressing them. Perhaps one of those differences is the source of the problem?

Do people prefer receiving attachments or downloadable links? Does the answer change if the patches are quite large?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Attachment
Gregory Stark <stark@enterprisedb.com> writes:
> Do people prefer receiving attachments or downloadable links?
> Does the answer change if the patches are quite large?

Links suck from an archival standpoint; but at the same time you need to pay some attention to the size of your email. I think the current threshold for moderator approval is somewhere between 50K and 100K (on patches; less on our other lists). gzipping large patches is encouraged --- if people's mail readers need help in viewing such an attachment, that's not your problem.

regards, tom lane
Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
>> Do people prefer receiving attachments or downloadable links?
>> Does the answer change if the patches are quite large?
>
> Links suck from an archival standpoint; but at the same time you need
> to pay some attention to the size of your email. I think the current
> threshold for moderator approval is somewhere between 50K and 100K
> (on patches; less on our other lists). gzipping large patches is
> encouraged --- if people's mail readers need help in viewing such
> an attachment, that's not your problem.

IIRC, when a patch gets held back, you get a notification. The problem has been with mails that are silently dropped. AFAIK, that has happened outside of mailman, somewhere else in the mail system.

For example, we used to drop anything that was a .tar (including .tar.gz), and last I checked we still do that. Admittedly that was some time ago, but I've seen no statement that it has been fixed. (Plain gzip usually worked fine, but a .tar.gz used to include a couple of new files got silently dropped. I tried sending my vcbuild patches at least 4 times before I realized they were silently dropped.)

So I'd avoid anything other than plaintext or plaintext.gz.

//Magnus
On Wed, 2007-03-14 at 15:16 +0000, Gregory Stark wrote:
> Do people prefer receiving attachments or downloadable links?

Attachments are very clearly submissions to the project.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews and approves it.

---------------------------------------------------------------------------

Gregory Stark wrote:
> As requested, I've cut an updated version of this patch against CVS HEAD:
>
> http://community.enterprisedb.com/sort-limit-v5.patch.gz
>
> [ Attachment, skipping... ]

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> Your patch has been added to the PostgreSQL unapplied patches list at:
>
> http://momjian.postgresql.org/cgi-bin/pgpatches
>
> It will be applied as soon as one of the PostgreSQL committers reviews
> and approves it.

Did Greg push a version which didn't carry the "WIP" label? As far as I remember he was still asking how to make the Sort and Limit nodes communicate.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
> Did Greg push a version which didn't carry the "WIP" label? As far as I
> remember he was still asking how to make the Sort and Limit nodes
> communicate.

Good question. I asked for a new version of this patch and the WIP was only in the email subject line. Greg, is this ready for review?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

>> Did Greg push a version which didn't carry the "WIP" label? As far as I
>> remember he was still asking how to make the Sort and Limit nodes
>> communicate.
>
> Good question. I asked for a new version of this patch and the WIP was
> only in the email subject line. Greg, is this ready for review?

Well, my question was whether it was acceptable to do things this way or whether there was a better way. If it's the right way then sure; if not, then no. I guess that's what review is for.

In short, I'm not sure what constitutes "ready for review". Are you asking if it's ready to apply? I don't know; that's why we have reviews. Or are you asking if it's ready for someone to look at? What's the point of posting WIP patches if you don't want someone to look at them?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote:
> In short, I'm not sure what constitutes "ready for review". Are you
> asking if it's ready to apply? I don't know; that's why we have reviews.
> Or are you asking if it's ready for someone to look at? What's the point
> of posting WIP patches if you don't want someone to look at them?

I am asking if _you_ are done working on it. Seems you are, so I will add it to the patch queue.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Some comments on the patch below.

Gregory Stark wrote:
> + /* tuplesort_set_bound - External API to set a bound on a tuplesort
> + *
> + * Must be called before inserting any tuples.
> +
> + * Sets a maximum number of tuples the caller is interested in. The first
> + * <bound> tuples are maintained using a simple insertion sort and returned
> + * normally. Any tuples that lie after those in the sorted result set are
> + * simply thrown out
> + */

The "Must be called before inserting any tuples" is in contradiction with the comment in the header file:

> + /* This can be called at any time before performsort to advise tuplesort that
> + * only this many tuples are interesting. If that many tuples fit in memory and
> + * we haven't already overflowed to disk then tuplesort will switch to a simple
> + * insertion sort or heap sort and throw away the uninteresting tuples.
> + */

The latter seems to be correct.

> ! /*
> ! * Convert the existing unordered list of sorttuples to a heap in either order.
> ! * This used to be inline but now there are three separate places we heap sort
> ! * (initializing the tapes, if we have a bounded output, and any time the user
> ! * says he doesn't want to use glibc's qsort).
> ! *
> ! * NOTE heapify passes false for checkIndex (and stores a constant tupindex
> ! * passed as a parameter) even though we use heaps for multi-run sources
> ! * because we only heapify when we're doing in-memory sorts or in inittapes
> ! * before there's any point in comparing tupindexes.
> ! */
> !
> ! static void
> ! tuplesort_heapify(Tuplesortstate *state, int tupindex, HeapOrder heaporder)
> ! {

The comment claims that we use heap sort when the user says he doesn't want to use glibc's qsort. I recall that we always use our own qsort implementation nowadays, and we never used the heap sort as a qsort replacement, did we?

In performsort, you convert the in-memory heap to a sorted list in one go. I wonder if it would be better to switch to a new TSS_ALLINHEAP state that means "all tuples are now in the in-memory heap", and call tuplesort_heap_siftup in gettuple. It probably doesn't make much difference in most cases, but if there's another limit node in the plan with a smaller limit, or the client only fetches a few top rows with a cursor, you'd avoid unheapifying tuples that are just thrown away later.

There are a few blocks of code surrounded with "#if 0 - #endif". Are those just leftovers that should be removed, or things that still need to be finished and enabled?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Some comments on the patch below.

Thanks!

> The comment claims that we use heap sort when the user says he doesn't
> want to use glibc's qsort. I recall that we always use our own qsort
> implementation nowadays, and we never used the heap sort as a qsort
> replacement, did we?

Thanks, I had a version that used heap sort instead of qsort, but that was before I discovered what you said. So I stripped that useless bit out.

> In performsort, you convert the in-memory heap to a sorted list in one
> go. I wonder if it would be better to switch to a new TSS_ALLINHEAP
> state that means "all tuples are now in the in-memory heap", and call
> tuplesort_heap_siftup in gettuple.

The problem is that the heap is backwards. The head of the heap is the greatest element, ie, the last element we want to return. Hm, is there such a thing as a two-way heap?

> There are a few blocks of code surrounded with "#if 0 - #endif". Are
> those just leftovers that should be removed, or things that still need
> to be finished and enabled?

Uhm, I don't remember, will go look, thanks.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Updated patch attached:

1) Removes the #if 0 optimizations

2) Changes #if 0 to #if NOT_USED for code that's there for completeness and to keep the code self-documenting, but which isn't needed by anything currently

3) Fixes the cost model to represent bounded sorts

"Gregory Stark" <stark@enterprisedb.com> writes:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>
>> There are a few blocks of code surrounded with "#if 0 - #endif". Are
>> those just leftovers that should be removed, or things that still need
>> to be finished and enabled?
>
> Uhm, I don't remember, will go look, thanks.

Ok, they were left-over code from an optimization that I decided wasn't very important to pursue. The code that was ifdef'd out detected when disk sorts could abort a merge because enough tuples had already been generated to satisfy the limit. But I never wrote the code to actually abort the run, and it looks a bit tricky. In any case the disk sort use case is extremely narrow: you would need something like "LIMIT 50000" or more to hit it, and the input table would have to be huge enough to cause multiple rounds of merges.

I think I've figured out how to adjust the cost model. It turns out that it usually doesn't matter much whether the costing is exactly right, since any case where the optimization kicks in is a case where you're reading a small portion of the input, which is a case where an index would be *much* better if one were available. So the only times the optimization is used are when there's no index available. Nonetheless it's nice to get the estimates right so that higher levels in the plan get reasonable values. At least the results are reasonable now; I'm not sure if I've done it the "right" way though.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Attachment
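On the costing question, the quantity the planner needs to start estimating once the optimization exists is, roughly, that a bounded heap sort of N input tuples costs on the order of N * log2(2 * bound) comparisons instead of the N * log2(N) of a full sort (one plausible approximation; the exact formula here is an assumption, not taken from the attached patch, and real planner costing also folds in per-tuple CPU and I/O terms). The little program below only illustrates that ratio:

/*
 * Back-of-the-envelope comparison counts: full sort vs. keeping only the
 * first k of N tuples with a bounded heap.  Illustrative only; the
 * constants and the exact formula are assumptions, not the patch's code.
 */
#include <math.h>
#include <stdio.h>

int
main(void)
{
    double N = 1e6;                             /* input tuples */
    double bounds[] = {3, 10, 100, 1000, 10000, 100000};
    int    i;

    for (i = 0; i < 6; i++)
    {
        double k = bounds[i];
        double full = N * log2(N);              /* ordinary comparison sort */
        double bounded = N * log2(2.0 * k);     /* keep-top-k heap, roughly */

        printf("k = %-6.0f  full = %.2e  bounded = %.2e  ratio = %.1f\n",
               k, full, bounded, full / bounded);
    }
    return 0;
}

The advantage in comparison counts shrinks as the bound grows toward the input size, which is the shape one would expect the fixed cost model to reflect.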
I did some performance testing of the patch, and the results were good. I did this:

test=> CREATE TABLE test (x INTEGER);
test=> INSERT INTO test SELECT * FROM generate_series(1, 1000000);
test=> SET log_min_duration_statement = 0;
test=> SELECT * FROM test ORDER BY x LIMIT 3;

and the results were, before the patch, for three runs:

LOG: duration: 1753.518 ms statement: select * from test order by x limit 3;
LOG: duration: 1766.019 ms statement: select * from test order by x limit 3;
LOG: duration: 1777.520 ms statement: select * from test order by x limit 3;

and after the patch:

LOG: duration: 449.649 ms statement: select * from test order by x limit 3;
LOG: duration: 443.450 ms statement: select * from test order by x limit 3;
LOG: duration: 443.086 ms statement: select * from test order by x limit 3;

---------------------------------------------------------------------------

Gregory Stark wrote:
> Updated patch attached:
>
> 1) Removes the #if 0 optimizations
>
> 2) Changes #if 0 to #if NOT_USED for code that's there for completeness
>    and to keep the code self-documenting, but which isn't needed by
>    anything currently
>
> 3) Fixes the cost model to represent bounded sorts

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
> I did some performance testing of the patch, and the results were good.
> I did this:
>
> test=> CREATE TABLE test (x INTEGER);
> test=> INSERT INTO test SELECT * FROM generate_series(1, 1000000);
> test=> SET log_min_duration_statement = 0;
> test=> SELECT * FROM test ORDER BY x LIMIT 3;

LIMIT 3 seems an awfully favorable case; if the patch can only manage a factor of 4 speedup there, what happens at limit 10, 20, 100? Also, you've tested only one sort size and only one (unspecified) value of work_mem, and the usefulness of the patch would surely vary depending on that. In particular, what happens with a LIMIT large enough to overflow work_mem?

Lastly, I suspect that sorting presorted input might be particularly favorable for this patch. Please try it with random data for comparison.

regards, tom lane
On Sat, 2007-04-07 at 14:11 -0400, Tom Lane wrote:
> LIMIT 3 seems an awfully favorable case; if the patch can only manage a
> factor of 4 speedup there, what happens at limit 10, 20, 100? Also,
> you've tested only one sort size and only one (unspecified) value of
> work_mem, and the usefulness of the patch would surely vary depending on
> that. In particular, what happens with a LIMIT large enough to overflow
> work_mem?

Yeh, this is really designed to improve the case where we retrieve a "screenful" of data, i.e. 25, 50 or 100 records, or in the worst case 10 screenfuls. The code deliberately doesn't use an insertion sort for that reason, since that is beyond the cut-off where insertion sort works best. So it should be optimised for medium numbers of rows when no index is present.

The use case is important because we want to be able to populate data for screens in a reasonably bounded time, not one that gets suddenly worse should the number of possible matches exceed work_mem. [Think how well Google reacts to varying numbers of candidate matches.]

Whatever happens with LIMIT > work_mem doesn't fit the use case, so as long as it is no slower than what we have now, that should be fine.

> Lastly, I suspect that sorting presorted input might be particularly
> favorable for this patch. Please try it with random data for comparison.

Agreed.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
I reran the test using:

test=> CREATE TABLE test (x INTEGER);
test=> INSERT INTO test SELECT * FROM generate_series(1, 1000000);
test=> SET log_min_duration_statement = 0;

and got on an unpatched system:

1751.320 ms select * from (select * from test order by x limit 3) as x limit 1;
1725.092 ms select * from (select * from test order by x limit 3) as x limit 1;
1709.463 ms select * from (select * from test order by x limit 3) as x limit 1;
1702.917 ms select * from (select * from test order by x limit 10) as x limit 1;
1705.793 ms select * from (select * from test order by x limit 10) as x limit 1;
1704.046 ms select * from (select * from test order by x limit 10) as x limit 1;
1699.730 ms select * from (select * from test order by x limit 100) as x limit 1;
1712.628 ms select * from (select * from test order by x limit 100) as x limit 1;
1699.454 ms select * from (select * from test order by x limit 100) as x limit 1;
1720.207 ms select * from (select * from test order by x limit 1000) as x limit 1;
1725.519 ms select * from (select * from test order by x limit 1000) as x limit 1;
1728.933 ms select * from (select * from test order by x limit 1000) as x limit 1;
1699.609 ms select * from (select * from test order by x limit 10000) as x limit 1;
1698.386 ms select * from (select * from test order by x limit 10000) as x limit 1;
1698.985 ms select * from (select * from test order by x limit 10000) as x limit 1;
1700.740 ms select * from (select * from test order by x limit 100000) as x limit 1;
1700.989 ms select * from (select * from test order by x limit 100000) as x limit 1;
1695.771 ms select * from (select * from test order by x limit 100000) as x limit 1;

which is expected because the sort work is constant. With the patch I see:

433.892 ms select * from (select * from test order by x limit 3) as x limit 1;
496.016 ms select * from (select * from test order by x limit 3) as x limit 1;
434.604 ms select * from (select * from test order by x limit 3) as x limit 1;
433.265 ms select * from (select * from test order by x limit 10) as x limit 1;
432.058 ms select * from (select * from test order by x limit 10) as x limit 1;
431.329 ms select * from (select * from test order by x limit 10) as x limit 1;
429.722 ms select * from (select * from test order by x limit 100) as x limit 1;
434.754 ms select * from (select * from test order by x limit 100) as x limit 1;
429.758 ms select * from (select * from test order by x limit 100) as x limit 1;
432.060 ms select * from (select * from test order by x limit 1000) as x limit 1;
432.523 ms select * from (select * from test order by x limit 1000) as x limit 1;
433.917 ms select * from (select * from test order by x limit 1000) as x limit 1;
449.885 ms select * from (select * from test order by x limit 10000) as x limit 1;
450.182 ms select * from (select * from test order by x limit 10000) as x limit 1;
450.536 ms select * from (select * from test order by x limit 10000) as x limit 1;
1771.807 ms select * from (select * from test order by x limit 100000) as x limit 1;
1746.628 ms select * from (select * from test order by x limit 100000) as x limit 1;
1795.600 ms select * from (select * from test order by x limit 100000) as x limit 1;

The patch is faster until we hit 100k or 10% of the table, at which point it is the same speed. What is interesting is 1M is also the same speed:

1756.401 ms select * from (select * from test order by x limit 1000000) as x limit 1;
1744.104 ms select * from (select * from test order by x limit 1000000) as x limit 1;
1734.198 ms select * from (select * from test order by x limit 1000000) as x limit 1;

This is with the default work_mem of '1MB'. I used LIMIT 1 so the times were not affected by the size of the data transfer to the client.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Oh, sorry, forgot to do a random table test. The test used:

DROP TABLE test;
CREATE TABLE test (x INTEGER);
INSERT INTO test SELECT random()*1000000 FROM generate_series(1, 1000000);

As expected the unpatched version is consistent for all LIMIT values (first query was slow due to load after INSERT):

14567.074 ms select * from (select * from test order by x limit 3) as x limit 1;
4031.029 ms select * from (select * from test order by x limit 3) as x limit 1;
3612.417 ms select * from (select * from test order by x limit 3) as x limit 1;
3505.966 ms select * from (select * from test order by x limit 10) as x limit 1;
3707.830 ms select * from (select * from test order by x limit 10) as x limit 1;
3619.410 ms select * from (select * from test order by x limit 10) as x limit 1;
5548.770 ms select * from (select * from test order by x limit 100) as x limit 1;
3839.660 ms select * from (select * from test order by x limit 100) as x limit 1;
4098.445 ms select * from (select * from test order by x limit 100) as x limit 1;
3677.659 ms select * from (select * from test order by x limit 1000) as x limit 1;
3956.980 ms select * from (select * from test order by x limit 1000) as x limit 1;
3824.934 ms select * from (select * from test order by x limit 1000) as x limit 1;
4641.589 ms select * from (select * from test order by x limit 10000) as x limit 1;
4057.902 ms select * from (select * from test order by x limit 10000) as x limit 1;
4682.779 ms select * from (select * from test order by x limit 10000) as x limit 1;
4032.351 ms select * from (select * from test order by x limit 100000) as x limit 1;
4572.528 ms select * from (select * from test order by x limit 100000) as x limit 1;
4985.500 ms select * from (select * from test order by x limit 100000) as x limit 1;
4942.422 ms select * from (select * from test order by x limit 1000000) as x limit 1;
4669.230 ms select * from (select * from test order by x limit 1000000) as x limit 1;
4639.258 ms select * from (select * from test order by x limit 1000000) as x limit 1;

and with the patch:

1731.234 ms select * from (select * from test order by x limit 3) as x limit 1;
570.315 ms select * from (select * from test order by x limit 3) as x limit 1;
430.119 ms select * from (select * from test order by x limit 3) as x limit 1;
431.580 ms select * from (select * from test order by x limit 10) as x limit 1;
431.253 ms select * from (select * from test order by x limit 10) as x limit 1;
432.112 ms select * from (select * from test order by x limit 10) as x limit 1;
433.536 ms select * from (select * from test order by x limit 100) as x limit 1;
433.115 ms select * from (select * from test order by x limit 100) as x limit 1;
432.478 ms select * from (select * from test order by x limit 100) as x limit 1;
442.886 ms select * from (select * from test order by x limit 1000) as x limit 1;
442.133 ms select * from (select * from test order by x limit 1000) as x limit 1;
444.905 ms select * from (select * from test order by x limit 1000) as x limit 1;
522.782 ms select * from (select * from test order by x limit 10000) as x limit 1;
521.481 ms select * from (select * from test order by x limit 10000) as x limit 1;
521.526 ms select * from (select * from test order by x limit 10000) as x limit 1;
3317.216 ms select * from (select * from test order by x limit 100000) as x limit 1;
3365.467 ms select * from (select * from test order by x limit 100000) as x limit 1;
3355.447 ms select * from (select * from test order by x limit 100000) as x limit 1;
3307.745 ms select * from (select * from test order by x limit 1000000) as x limit 1;
3315.602 ms select * from (select * from test order by x limit 1000000) as x limit 1;
3585.736 ms select * from (select * from test order by x limit 1000000) as x limit 1;

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
On Sat, 2007-04-07 at 22:16 -0400, Bruce Momjian wrote:
> The patch is faster until we hit 100k or 10% of the table, at which
> point it is the same speed. What is interesting is 1M is also the same
> speed.

All tests good, AFAICS. Thanks Bruce.

Patch is operating as expected: the common case is considerably faster and the uncommon case isn't any slower.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> LIMIT 3 seems an awfully favorable case; if the patch can only manage a
> factor of 4 speedup there, what happens at limit 10, 20, 100?

I did a whole bunch of benchmarking and spent a pile of time gathering numbers in gnumeric to make some pretty graphs. Then gnumeric crashed :( It's days like this that make me appreciate our own code review process, even if I'm usually wishing it were a bit looser. Until I gather all those numbers together again I'll just describe the executive summary.

I found that the cap of a factor-of-4 improvement was mostly an artifact of the very narrow, cheap-to-sort keys. When more of the time is spent pushing around large tuples, or on even moderately expensive comparisons, the speedup is much greater.

When I tested with narrow tuples consisting of only a single integer I found that the maximum speedup was, as Bruce found, about 4x. I think what's going on is that the overhead in the linear factor of just reading in all those tuples is large enough to hide the improvements in sorting, especially since with such narrow tuples the resulting data is just too small to overflow the filesystem cache, which is where the big benefits come from.

When I tested with text strings ranging from 0-97 characters long in locale en_GB.UTF-8 I found that the speedup kept going up as the input data set grew. Sorting the top 1,000 strings from an input set of 1,000,000 strings went from 8m11s in stock Postgres to 12s using the bounded heapsort. (That was an even better result than I had prior to fully randomizing the data; it probably just got packed better on disk in the source table.)

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com