Thread: [PATCH] binary heap implementation
Hi. There are two or three places in the Postgres source that implement heap sort (e.g. tuplesort.c, nodeMergeAppend.c), and it's also needed by the BDR code. It seemed reasonable to factor out the functionality. I've attached a patch (binaryheap.diff) that contains a straightforward implementation of a binary heap (originally written by Andres, with a few bugfixes and cleanups by me). There doesn't seem to be any place to put unit tests for code like this, so at Alvaro's suggestion, I have attached a small standalone program I used for testing (testbinheap.c) and a Makefile. If you copy them both to src/backend/lib and run "make -f Makefile.binheap", it'll build the test program. It's nothing exciting, just exercises the functions in various ways. I've also attached a patch (nodeMergeAppend.diff) that converts the code in nodeMergeAppend.c to use binaryheap. It passes "make check", and also the following test (which is planned as a merge append): CREATE OR REPLACE VIEW gen AS SELECT * FROM generate_series(1, 100000) g(i) ORDER BY g.i OFFSET 0; SELECT * FROM ( SELECT * FROM gen UNION ALL SELECT * FROM gen UNION ALL SELECT * FROM gen UNION ALL SELECT * FROM gen UNION ALL SELECT * FROM gen UNION ALL SELECT * FROM gen) s ORDER BY i OFFSET 1000000; Initially there was a slight performance degradation with my patch, but I speeded things up and now my code is at least at par with, and maybe slightly faster than, the old code. On my laptop, both consistently take ~2.4s on average to execute the above SELECT. Comments? Suggestions? -- Abhijit
Attachment
On 2012-11-14 18:41:12 +0530, Abhijit Menon-Sen wrote: > There are two or three places in the Postgres source that implement heap > sort (e.g. tuplesort.c, nodeMergeAppend.c), and it's also needed by the > BDR code. It seemed reasonable to factor out the functionality. This replaces the "simpleheap.[ch]" I had in the last submitted series. > I've attached a patch (binaryheap.diff) that contains a straightforward > implementation of a binary heap (originally written by Andres, with a > few bugfixes and cleanups by me). I want to emphasise that the only good thing about my submitted version was that it actually compiled. So quite a bit of the work here is from Abhijit... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 14, 2012 at 8:11 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > Comments? Suggestions? It could use a run through pgindent. And the header comments are separated by a blank line from the functions to which they are attached, which is not project style. + if (heap->size + 1 == heap->space) + Assert("binary heap is full"); This doesn't really make any sense. Assert("binary heap is full") will never fail, because that expression is a character string which is, therefore, not NULL. I think you want Assert(heap->size < heap->space). Or you could use (heap->size + 1 >= heap->space) elog(LEVEL, message) if there's some reason this needs to be checked in non-debug builds. + { + sift_down(heap, i); + } Project style is to omit braces for a single-line body. This comes up a few other places as well. Other than the coding style issues, I think this looks fine. If you can fix that up, I believe I could go ahead and commit this, unless anyone else objects. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas escribió: > On Wed, Nov 14, 2012 at 8:11 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > > Comments? Suggestions? > > It could use a run through pgindent. And the header comments are > separated by a blank line from the functions to which they are > attached, which is not project style. Also there are some comments in the .h file which we typically don't carry. > Other than the coding style issues, I think this looks fine. If you > can fix that up, I believe I could go ahead and commit this, unless > anyone else objects. Does this include the changes in nodeMergeappend.c? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Assert(!"description of error") is an idiom I've seen more than once, although I'm not sure I understand why it's not writtenas Robert says with the condition in the brackets (or as a print to STDERR followed by abort() instead).<div class="gmail_extra"><br/><br /><div class="gmail_quote">On 15 November 2012 15:11, Robert Haas <span dir="ltr"><<a href="mailto:robertmhaas@gmail.com"target="_blank">robertmhaas@gmail.com</a>></span> wrote:<br /><blockquote class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> On Wed, Nov 14, 2012 at 8:11 AM,Abhijit Menon-Sen <<a href="mailto:ams@2ndquadrant.com">ams@2ndquadrant.com</a>> wrote:<br /> > Comments? Suggestions?<br/><br /> It could use a run through pgindent. And the header comments are<br /> separated by a blank linefrom the functions to which they are<br /> attached, which is not project style.<br /><br /> + if (heap->size+ 1 == heap->space)<br /> + Assert("binary heap is full");<br /><br /> This doesn't reallymake any sense. Assert("binary heap is full")<br /> will never fail, because that expression is a character stringwhich<br /> is, therefore, not NULL. I think you want Assert(heap->size <<br /> heap->space). Or you coulduse (heap->size + 1 >= heap->space)<br /> elog(LEVEL, message) if there's some reason this needs to be checked<br/> in non-debug builds.<br /><br /> + {<br /> + sift_down(heap, i);<br /> + }<br /><br/> Project style is to omit braces for a single-line body. This comes up<br /> a few other places as well.<br /><br/> Other than the coding style issues, I think this looks fine. If you<br /> can fix that up, I believe I could goahead and commit this, unless<br /> anyone else objects.<br /><span class="HOEnZb"><font color="#888888"><br /> --<br />Robert Haas<br /> EnterpriseDB: <a href="http://www.enterprisedb.com" target="_blank">http://www.enterprisedb.com</a><br/> The Enterprise PostgreSQL Company<br /></font></span><div class="HOEnZb"><divclass="h5"><br /><br /> --<br /> Sent via pgsql-hackers mailing list (<a href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br/> To make changes to your subscription:<br/><a href="http://www.postgresql.org/mailpref/pgsql-hackers" target="_blank">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/></div></div></blockquote></div><br /></div>
On 11/15/2012 10:11 AM, Robert Haas wrote: > + { > + sift_down(heap, i); > + } > > Project style is to omit braces for a single-line body. This comes up > a few other places as well. > I thought we modified that some years ago, although my memory of it is a bit hazy. Personally I think it's a bad rule, at least as stated, in the case where there's an else clause with a compound statement. I'd prefer to see a rule that says that either both branches of an if/else should be compound statements or neither should be. I think the cases where only one branch is a compound statement are rather ugly. cheers andrew
On Thu, Nov 15, 2012 at 10:21 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> Other than the coding style issues, I think this looks fine. If you >> can fix that up, I believe I could go ahead and commit this, unless >> anyone else objects. > > Does this include the changes in nodeMergeappend.c? I think so. I was going to double-check the performance before committing, but it doesn't look like there should be a problem. Coding style changes appear needed in that file as well, however. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2012-11-14 18:41:12 +0530, Abhijit Menon-Sen wrote: > There are two or three places in the Postgres source that implement heap > sort (e.g. tuplesort.c, nodeMergeAppend.c), and it's also needed by the > BDR code. It seemed reasonable to factor out the functionality. pg_dump also contains a binary heap implementation if memory serves right which makes me wonder whether we should try binaryheap.[ch] backend clean... It currently uses palloc/pfree. Too bad we don't have a memory allocation routine which easily works in the backend and frontend... palloc references MemoryContextAlloc and CurrentMemoryContext so thats not that easy to emulate. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andrew Dunstan escribió: > > On 11/15/2012 10:11 AM, Robert Haas wrote: > > >+ { > >+ sift_down(heap, i); > >+ } > > > >Project style is to omit braces for a single-line body. This comes up > >a few other places as well. > > I thought we modified that some years ago, although my memory of it > is a bit hazy. No, we only modified pg_indent to not take the braces off, because of PG_TRY blocks. But we keep using single statements instead of compound in many places; but there is no hard rule about it. To me, using braces were they are not needed is pointless and ugly. We're very careful about ensuring that our macro definitions work nicely with single-statement if/else, for example. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 15, 2012 at 11:22 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2012-11-14 18:41:12 +0530, Abhijit Menon-Sen wrote: >> There are two or three places in the Postgres source that implement heap >> sort (e.g. tuplesort.c, nodeMergeAppend.c), and it's also needed by the >> BDR code. It seemed reasonable to factor out the functionality. > > pg_dump also contains a binary heap implementation if memory serves > right which makes me wonder whether we should try binaryheap.[ch] > backend clean... It currently uses palloc/pfree. > > Too bad we don't have a memory allocation routine which easily works in > the backend and frontend... palloc references MemoryContextAlloc and > CurrentMemoryContext so thats not that easy to emulate. I think there's a clear need for a library of code that can be shared by frontend and backend code. I don't necessarily want to punt everything into libpq, though. What I wonder if we might do is create something like src/lib/pgguts that contains routines for memory allocation, error handling, stringinfos (so that you don't have to use PQexpBuffer in the frontend and StringInfo in the backend), etc. and make that usable by both front-end and back-end code. I think that would be a tremendously nice thing to have. I don't think we should make it this patch's problem to do that, but I'd be strongly in favor of such a thing if someone wants to put it together. It's been a huge nuisance for me in the past and all indicates are that the work you're doing on LR is just going to exacerbate that problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2012-11-15 11:37:16 -0500, Robert Haas wrote: > On Thu, Nov 15, 2012 at 11:22 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2012-11-14 18:41:12 +0530, Abhijit Menon-Sen wrote: > >> There are two or three places in the Postgres source that implement heap > >> sort (e.g. tuplesort.c, nodeMergeAppend.c), and it's also needed by the > >> BDR code. It seemed reasonable to factor out the functionality. > > > > pg_dump also contains a binary heap implementation if memory serves > > right which makes me wonder whether we should try binaryheap.[ch] > > backend clean... It currently uses palloc/pfree. > > > > Too bad we don't have a memory allocation routine which easily works in > > the backend and frontend... palloc references MemoryContextAlloc and > > CurrentMemoryContext so thats not that easy to emulate. > I think there's a clear need for a library of code that can be shared > by frontend and backend code. I don't necessarily want to punt > everything into libpq, though. Aggreed. > What I wonder if we might do is create > something like src/lib/pgguts that contains routines for memory > allocation, error handling, stringinfos (so that you don't have to use > PQexpBuffer in the frontend and StringInfo in the backend), etc. and > make that usable by both front-end and back-end code. I think that > would be a tremendously nice thing to have. Yes. But it also sounds like quite a bit of work :( > I don't think we should make it this patch's problem to do that Me neither. I was thinking about letting the "user" allocate enough memory like: binaryheap *heap = palloc(binaryheap_size(/*capacity*/ 10)); binaryheap_init(heap, 10, comparator); But thats pretty ugly. > but > I'd be strongly in favor of such a thing if someone wants to put it > together. It's been a huge nuisance for me in the past and all > indicates are that the work you're doing on LR is just going to > exacerbate that problem. I actually hope I won't have to fight with that all that much, but we will see :/ Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 15, 2012 at 11:50 AM, Andres Freund <andres@2ndquadrant.com> wrote: > Me neither. I was thinking about letting the "user" allocate enough > memory like: > > binaryheap *heap = palloc(binaryheap_size(/*capacity*/ 10)); > binaryheap_init(heap, 10, comparator); > > But thats pretty ugly. Yeah. I would vote against doing that for now. We can always uglify the code later if we decide it's absolutely necessary. One trick that we could potentially use to make code run in the frontend and backend is to put it in src/port. That code gets compiled twice, once within -DFRONTEND and once without. So you can use #ifdef to handle things that need to work different ways. The only real problem with that approach is that you end up putting the code in libpgport where it seems rather out of place. Still, maybe we could have a src/framework directory that uses the same trick to produce a libpgframework that frontend code can use, and a lib pgframework_srv that can be linked into the backend. That's might not actually be that much work; we'd just need to clone all the existing src/port logic. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
At 2012-11-15 10:11:58 -0500, robertmhaas@gmail.com wrote: > > It could use a run through pgindent. Done. > I think you want Assert(heap->size < heap->space). Of course. Thanks for catching that. > Project style is to omit braces for a single-line body. I've removed most of them. In the few cases where one branch of an if/else would have a one-line body and the other would not, I chose to rewrite the code to avoid the problem (because, like Andrew, I hate having to do that). I wasn't sure how to present the following: if (right_off < heap->size && heap->compare(&heap->nodes[node_off], &heap->nodes[right_off]) < 0) { /* swap with the larger child */ if (!swap_off || heap->compare(&heap->nodes[left_off], &heap->nodes[right_off]) < 0) { swap_off = right_off; } } …so I left it alone. If you want me to remove the inner braces and resubmit, despite the multi-line condition, let me know. I didn't know which header comments Álvaro meant I should remove, so I haven't done that either. I did remove the TESTBINHEAP stuff, though. I have attached the updated patch here. Thanks for your review. -- Abhijit
Attachment
On Thu, 2012-11-15 at 17:50 +0100, Andres Freund wrote: > Me neither. I was thinking about letting the "user" allocate enough > memory like: > > binaryheap *heap = palloc(binaryheap_size(/*capacity*/ 10)); > binaryheap_init(heap, 10, comparator); > > But thats pretty ugly. Why not pass the allocator in as a function pointer? It could either be an argument to the initialization, or we could make a new global variable that's present inside the server and outside. (Slight complication: palloc is a macro) Regards,Jeff Davis
On Thu, Nov 15, 2012 at 8:56 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > [ new patch ] I went over this again today with a view to getting it committed, but discovered some compiler warnings that look like this: warning: cast to pointer from integer of different size The problem seems to be that the binary heap implementation wants the key and value to be a void *, but the way it's being used in nodeMergeAppend.c the value is actually an int. I don't think we want to commit a binary heap implementation which has an impedance mismatch of this type with its only client. The obvious solution seems to be to make the key and value each a Datum, and then clients can use WhateverGetDatum and DatumGetWhatever to pass whatever built-in data type they happen to have. I'm trying that approach now and will post an updated patch based on that approach if it seems to work OK and nobody suggests something better between now and then. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2012-11-20 14:03:12 -0500, Robert Haas wrote: > On Thu, Nov 15, 2012 at 8:56 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > > [ new patch ] > > I went over this again today with a view to getting it committed, but > discovered some compiler warnings that look like this: > > warning: cast to pointer from integer of different size > > The problem seems to be that the binary heap implementation wants the > key and value to be a void *, but the way it's being used in > nodeMergeAppend.c the value is actually an int. I don't think we want > to commit a binary heap implementation which has an impedance mismatch > of this type with its only client. The obvious solution seems to be > to make the key and value each a Datum, and then clients can use > WhateverGetDatum and DatumGetWhatever to pass whatever built-in data > type they happen to have. I'm trying that approach now and will post > an updated patch based on that approach if it seems to work OK and > nobody suggests something better between now and then. Sounds like a good plan, one other alternative would have been declaring the parameters a size_t (and thus explicitly always big enough to store a pointer, but also valid to store inline data) instead of using a void*. But given there is DatumGetPointer/PointerGetDatum et al, using the postgres-y datatype seems fine to me. In the LCR case those really are pointers, so preserving that case is important to me ;), but your proposal does that, so ... Andres
On Tue, Nov 20, 2012 at 2:29 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2012-11-20 14:03:12 -0500, Robert Haas wrote: >> On Thu, Nov 15, 2012 at 8:56 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: >> > [ new patch ] >> >> I went over this again today with a view to getting it committed, but >> discovered some compiler warnings that look like this: >> >> warning: cast to pointer from integer of different size >> >> The problem seems to be that the binary heap implementation wants the >> key and value to be a void *, but the way it's being used in >> nodeMergeAppend.c the value is actually an int. I don't think we want >> to commit a binary heap implementation which has an impedance mismatch >> of this type with its only client. The obvious solution seems to be >> to make the key and value each a Datum, and then clients can use >> WhateverGetDatum and DatumGetWhatever to pass whatever built-in data >> type they happen to have. I'm trying that approach now and will post >> an updated patch based on that approach if it seems to work OK and >> nobody suggests something better between now and then. > > Sounds like a good plan, one other alternative would have been declaring > the parameters a size_t (and thus explicitly always big enough to store > a pointer, but also valid to store inline data) instead of using a > void*. But given there is DatumGetPointer/PointerGetDatum et al, using > the postgres-y datatype seems fine to me. > > In the LCR case those really are pointers, so preserving that case is > important to me ;), but your proposal does that, so ... On further reflection, I'm wondering if it wouldn't be a smarter idea to just get rid of the binaryheap_node concept altogether. Instead, when you allocate a binary heap, you specify an "element size" as well as a capacity. Then, the binary heap can truly treat the elements as an opaque array of bytes rather than a struct of any particular type. Of course, the disadvantage of this approach is that it's likely to be slightly slower, because we'll need to do a multiplication by a variable rather than simple bit shift. But the extra flexibility might be worth it, because for example for merge append this is kind of inefficient from a storage perspective. We only really need N+1 pointers, but we've got storage for 2N. With the above change plus a user-specified third argument for the comparator function (passed as yet another argument to binaryheap_allocate) we could get around that while preserving the option for LCR or any other client code to do as it sees fit. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2012-11-20 15:06:58 -0500, Robert Haas wrote: > On Tue, Nov 20, 2012 at 2:29 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2012-11-20 14:03:12 -0500, Robert Haas wrote: > >> On Thu, Nov 15, 2012 at 8:56 PM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > >> > [ new patch ] > >> > >> I went over this again today with a view to getting it committed, but > >> discovered some compiler warnings that look like this: > >> > >> warning: cast to pointer from integer of different size > >> > >> The problem seems to be that the binary heap implementation wants the > >> key and value to be a void *, but the way it's being used in > >> nodeMergeAppend.c the value is actually an int. I don't think we want > >> to commit a binary heap implementation which has an impedance mismatch > >> of this type with its only client. The obvious solution seems to be > >> to make the key and value each a Datum, and then clients can use > >> WhateverGetDatum and DatumGetWhatever to pass whatever built-in data > >> type they happen to have. I'm trying that approach now and will post > >> an updated patch based on that approach if it seems to work OK and > >> nobody suggests something better between now and then. > > > > Sounds like a good plan, one other alternative would have been declaring > > the parameters a size_t (and thus explicitly always big enough to store > > a pointer, but also valid to store inline data) instead of using a > > void*. But given there is DatumGetPointer/PointerGetDatum et al, using > > the postgres-y datatype seems fine to me. > > > > In the LCR case those really are pointers, so preserving that case is > > important to me ;), but your proposal does that, so ... > On further reflection, I'm wondering if it wouldn't be a smarter idea > to just get rid of the binaryheap_node concept altogether. Instead, > when you allocate a binary heap, you specify an "element size" as well > as a capacity. Then, the binary heap can truly treat the elements as > an opaque array of bytes rather than a struct of any particular type. > Of course, the disadvantage of this approach is that it's likely to be > slightly slower, because we'll need to do a multiplication by a > variable rather than simple bit shift. But the extra flexibility > might be worth it, because for example for merge append this is kind > of inefficient from a storage perspective. We only really need N+1 > pointers, but we've got storage for 2N. With the above change plus a > user-specified third argument for the comparator function (passed as > yet another argument to binaryheap_allocate) we could get around that > while preserving the option for LCR or any other client code to do as > it sees fit. > Thoughts? I don't have a fundamental problem with it, but I don't think it's worth the cost. If you have variable sized (from the point of the compiler) values you either have more complex offset computations to the individual elements or one more indirection due to a pointer array. Do you feel strongly about it? If so, go ahead, otherwise I would prefer the current way (with parameters changed to be Datum's of course). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Nov 20, 2012 at 3:19 PM, Andres Freund <andres@2ndquadrant.com> wrote: > I don't have a fundamental problem with it, but I don't think it's worth > the cost. If you have variable sized (from the point of the compiler) > values you either have more complex offset computations to the > individual elements or one more indirection due to a pointer array. > > Do you feel strongly about it? If so, go ahead, otherwise I would prefer > the current way (with parameters changed to be Datum's of course). Here's a revised patch that takes the approach of just changing void * to Datum; it includes a bunch of other cleanups that I felt were worthwhile. I tested this using the following setup: CREATE TABLE sample (a int, b numeric); DO $$ BEGIN FOR i IN 1 .. 1000 LOOP EXECUTE 'CREATE TABLE sample' || i || ' (CHECK (b = ' || i || ')) INHERITS (sample)'; EXECUTE 'INSERT INTO sample' || i || ' SELECT g, ' || i || ' FROM generate_series(1,100000) g;'; EXECUTE 'ALTER TABLE sample' || i || ' ADD PRIMARY KEY (a)'; END LOOP; END $$; VACUUM ANALYZE; And this test query: select sum(1) from (select * from sample order by a limit 250) x; On my system, on repeated execution, this takes about 113 ms with unpatched master and about 114 ms with the patch. I'm inclined to think that's not worth getting excited about, as it could be simply code shifting around across cache line boundaries and not indicative of an actual regression due to the difference in logic. Even if there is a regression, it's pretty darn small: most people will not have 1000 partitions, and those that do will often find query execution time dominated by other factors. Against that, there is positive value in having this code somewhat more abstracted. With respect to the question of the API, no, I don't feel overwhelmingly strongly that we have to whack the API with a bat that's larger than the one I've used here. On the flip side, it wouldn't surprise me if, as the code acquires more users, we find that we actually do want to make some changes, either the one I already proposed or something else. Fortunately, it's not a lot of code, so if we end up refactoring it some more later, it shouldn't be a big deal. So I'm happy enough to commit this the way I have it and sort out the rest as we go. If there are no objections I'll go ahead and do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 2012-11-20 15:47:25 -0500, Robert Haas wrote: > On Tue, Nov 20, 2012 at 3:19 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > I don't have a fundamental problem with it, but I don't think it's worth > > the cost. If you have variable sized (from the point of the compiler) > > values you either have more complex offset computations to the > > individual elements or one more indirection due to a pointer array. > > > > Do you feel strongly about it? If so, go ahead, otherwise I would prefer > > the current way (with parameters changed to be Datum's of course). > > Here's a revised patch that takes the approach of just changing void * > to Datum; it includes a bunch of other cleanups that I felt were > worthwhile. I tested this using the following setup: > > CREATE TABLE sample (a int, b numeric); > DO $$ > BEGIN > FOR i IN 1 .. 1000 LOOP > EXECUTE 'CREATE TABLE sample' || i || ' (CHECK (b = ' || i > || ')) INHERITS (sample)'; > EXECUTE 'INSERT INTO sample' || i > || ' SELECT g, ' || i || ' FROM > generate_series(1,100000) g;'; > EXECUTE 'ALTER TABLE sample' || i > || ' ADD PRIMARY KEY (a)'; > END LOOP; > END > $$; > VACUUM ANALYZE; > > And this test query: > > select sum(1) from (select * from sample order by a limit 250) x; > > On my system, on repeated execution, this takes about 113 ms with > unpatched master and about 114 ms with the patch. I'm inclined to > think that's not worth getting excited about, as it could be simply > code shifting around across cache line boundaries and not indicative > of an actual regression due to the difference in logic. Even if there > is a regression, it's pretty darn small: most people will not have > 1000 partitions, and those that do will often find query execution > time dominated by other factors. Against that, there is positive > value in having this code somewhat more abstracted. To make sure were not regressing I ran some more testcases that I could think of/find and didn't find anything problematic. Either both are equally fast or the new code is faster. FWIW for me the new code is very slightly faster in your example, but only if I apply it ontop of fklocks (where I applied it accidentally at first), not if ontop of 273986bf0d39e5166eb15ba42ebff4803e23a688 where its reversed, so your code-layout theory sounds likely. > With respect to the question of the API, no, I don't feel > overwhelmingly strongly that we have to whack the API with a bat > that's larger than the one I've used here. On the flip side, it > wouldn't surprise me if, as the code acquires more users, we find that > we actually do want to make some changes, either the one I already > proposed or something else. Fortunately, it's not a lot of code, so > if we end up refactoring it some more later, it shouldn't be a big > deal. So I'm happy enough to commit this the way I have it and sort > out the rest as we go. If there are no objections I'll go ahead and > do that. Looks good to me. Adapting when we have the actual requirements sounds fine & sensible as well... Thanks, Andres --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes: > Here's a revised patch that takes the approach of just changing void * > to Datum; it includes a bunch of other cleanups that I felt were > worthwhile. The comment in binaryheap.h says * For a max-heap, the comparator must return -1 iff a < b, 0 iff a == b,* and +1 iff a > b. For a min-heap, the conditionsare reversed. Is it actually required that the output be exactly these three values, or is the specification really <0, 0, >0 ? If the latter, it should say that, rather than overdefine the implementation of comparison. Also, wouldn't it be reasonable to const-ify the comparator function, ie typedef int (*binaryheap_comparator) (const binaryheap_node *a, const binaryheap_node *b); * nodes the first of a list of "space" nodes s/list/array/, no? Also, it's standard to mark this sort of hack with /* VARIABLE LENGTH ARRAY */, or even better use FLEXIBLE_ARRAY_MEMBER (which would require fixing the space calculation in binaryheap_allocate, not that that would be a bad thing anyway). left_offset and friends are defined as taking size_t and returning int, which is broken on machines where size_t is wider than int, or even on machines where they're the same width since int is signed. Personally I'd replace pretty much every single usage of size_t in this module with int, as I am unconvinced either that we need 8-byte indexes or that the logic is correct if it tries to form index -1 (as would happen for example with binaryheap_build applied to an empty heap). The Assert() in binaryheap_add_unordered fills me with no confidence. If we're not going to support expansion of the array (which might be a better idea anyway) then this needs to be an actual run-time check, not something that won't be there in production. What I think *would* be worth asserting for is whether the heap is correctly ordered or not --- that is, I think you should add a flag that is cleared by binaryheap_add_unordered and set by binaryheap_build, and then the functions that presume correct ordering should assert it's set. An Assert in binaryheap_replace_first that the heap is not empty seems appropriate as well. This in binaryheap_replace_first is simply bizarre: if (key) heap->nodes[0].key = key;if (val) heap->nodes[0].value = val; Why are these assignments conditional? If they're sane at all, they should not be testing the Datums directly in any case, but the result of DatumGetSomething. static int32 ! heap_compare_slots(binaryheap_node *a, binaryheap_node *b) { + MergeAppendState *node = (MergeAppendState *) a->value; This should use DatumGetPointer(a->value), and the function result should be int not int32 to comport with the typedef for it. While I'm thinking about it, why are the fields of a binaryheap_node called "key" and "value"? That implies a semantics not actually used here. Maybe "value1" and "value2" instead? regards, tom lane
Hi. A brief response to earlier messages in this thread: 1. I agree that it's a good idea to use Datum rather than void *. 2. I don't think it's worth getting rid of binaryheap_node. 3. I agree that it makes sense to go with what we have now (after Robert's reworking of my patch) and reconsider if neededwhen there are more users of the code. 4. I agree with all of Tom's suggestions as well. At 2012-11-20 18:01:10 -0500, tgl@sss.pgh.pa.us wrote: > > Is it actually required that the output be exactly these three values, > or is the specification really <0, 0, >0 ? The code did require -1/0/+1, but I changed it to expect only <0, 0, >0. So you're right, the comment should be changed. > This in binaryheap_replace_first is simply bizarre: > > if (key) > heap->nodes[0].key = key; > if (val) > heap->nodes[0].value = val; It's a leftover from the earlier implementation that used pointers, where you could pass in NULL to leave either key or value unchanged. > While I'm thinking about it, why are the fields of a binaryheap_node > called "key" and "value"? That implies a semantics not actually used > here. Maybe "value1" and "value2" instead? Yes, I discussed this with Andres earlier (and considered ptr and value or p1 and p2), but decided to wait for others to comment on the naming. I have no problem with value1 and value2, though I'd very slightly prefer d1 and d2 (d for Datum) now. Robert: thanks again for your work on the patch. Do you want me to make the changes Tom suggests above, or are you already doing it? -- Abhijit
Abhijit Menon-Sen <ams@2ndQuadrant.com> writes: >> While I'm thinking about it, why are the fields of a binaryheap_node >> called "key" and "value"? That implies a semantics not actually used >> here. Maybe "value1" and "value2" instead? > Yes, I discussed this with Andres earlier (and considered ptr and value > or p1 and p2), but decided to wait for others to comment on the naming. > I have no problem with value1 and value2, though I'd very slightly > prefer d1 and d2 (d for Datum) now. d1/d2 would be fine with me. BTW, I probably missed some context upthread, but why do we have two fields at all? At least for the purposes of nodeMergeAppend, it would make a lot more sense for the binaryheap elements to be just single Datums, without all the redundant pointers to the MergeAppend node. The need to get at the comparison support info could be handled much more elegantly than that by providing a "void *" passthrough arg. That is, the heap creation function becomes binaryheap *binaryheap_allocate(size_t capacity, binaryheap_comparator compare, void *passthrough); and the comparator signature becomes typedef int (*binaryheap_comparator) (Datum a, Datum b, void *passthrough); For nodeMergeAppend, the Datums would be the slot indexes and the passthrough pointer could point to the MergeAppend node. This would make equal sense in a lot of other contexts, such as if you wanted a heap composed of tuples -- the info to compare tuples could be handled via the passthrough argument, rather than doubling the size of the heap in order to put that pointer into each heap element. If you have no use for the passthrough arg in a particular usage, just pass NULL. regards, tom lane
At 2012-11-20 22:55:52 -0500, tgl@sss.pgh.pa.us wrote: > > BTW, I probably missed some context upthread, but why do we have two > fields at all? I would also have preferred to handle the nodeMergeAppend case using a context pointer as you suggest, but Andres needs to store two pointers in his heap nodes. Andres: suppose we replace binaryheap_node with just a Datum, could you live with storing a pointer to a struct with two pointers? If so, that would address the concerns raised. If not, maybe we should explore Robert's earlier suggestion to make binaryheap_node user-definable (in effect). -- Abhijit
On 2012-11-20 22:55:52 -0500, Tom Lane wrote: > Abhijit Menon-Sen <ams@2ndQuadrant.com> writes: > >> While I'm thinking about it, why are the fields of a binaryheap_node > >> called "key" and "value"? That implies a semantics not actually used > >> here. Maybe "value1" and "value2" instead? > > > Yes, I discussed this with Andres earlier (and considered ptr and value > > or p1 and p2), but decided to wait for others to comment on the naming. > > I have no problem with value1 and value2, though I'd very slightly > > prefer d1 and d2 (d for Datum) now. > > d1/d2 would be fine with me. > > BTW, I probably missed some context upthread, but why do we have two > fields at all? At least for the purposes of nodeMergeAppend, it > would make a lot more sense for the binaryheap elements to be just > single Datums, without all the redundant pointers to the MergeAppend > node. The need to get at the comparison support info could be handled > much more elegantly than that by providing a "void *" passthrough arg. > That is, the heap creation function becomes > > binaryheap *binaryheap_allocate(size_t capacity, > binaryheap_comparator compare, > void *passthrough); > > and the comparator signature becomes > > typedef int (*binaryheap_comparator) (Datum a, Datum b, void *passthrough); > > For nodeMergeAppend, the Datums would be the slot indexes and the > passthrough pointer could point to the MergeAppend node. > > This would make equal sense in a lot of other contexts, such as if you > wanted a heap composed of tuples -- the info to compare tuples could be > handled via the passthrough argument, rather than doubling the size of > the heap in order to put that pointer into each heap element. If you > have no use for the passthrough arg in a particular usage, just pass > NULL. The passthrough argument definitely makes sense, I am all for adding it. The reasons I originally wanted to have two values was that it made it easy/possible for the comparison function to work without any pointer dereferences which was somewhat beneficial to performance. Completely without dereferences only worked on 64bit systems anyway though... With just one value I need an extra struct, but thats not too bad. So if you all prefer just having one value, I am ok with that. Who is updating the patch? Greetings, Andres Freund --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 21, 2012 at 6:08 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2012-11-20 22:55:52 -0500, Tom Lane wrote: >> Abhijit Menon-Sen <ams@2ndQuadrant.com> writes: >> >> While I'm thinking about it, why are the fields of a binaryheap_node >> >> called "key" and "value"? That implies a semantics not actually used >> >> here. Maybe "value1" and "value2" instead? >> >> > Yes, I discussed this with Andres earlier (and considered ptr and value >> > or p1 and p2), but decided to wait for others to comment on the naming. >> > I have no problem with value1 and value2, though I'd very slightly >> > prefer d1 and d2 (d for Datum) now. >> >> d1/d2 would be fine with me. >> >> BTW, I probably missed some context upthread, but why do we have two >> fields at all? At least for the purposes of nodeMergeAppend, it >> would make a lot more sense for the binaryheap elements to be just >> single Datums, without all the redundant pointers to the MergeAppend >> node. The need to get at the comparison support info could be handled >> much more elegantly than that by providing a "void *" passthrough arg. >> That is, the heap creation function becomes >> >> binaryheap *binaryheap_allocate(size_t capacity, >> binaryheap_comparator compare, >> void *passthrough); >> >> and the comparator signature becomes >> >> typedef int (*binaryheap_comparator) (Datum a, Datum b, void *passthrough); >> >> For nodeMergeAppend, the Datums would be the slot indexes and the >> passthrough pointer could point to the MergeAppend node. >> >> This would make equal sense in a lot of other contexts, such as if you >> wanted a heap composed of tuples -- the info to compare tuples could be >> handled via the passthrough argument, rather than doubling the size of >> the heap in order to put that pointer into each heap element. If you >> have no use for the passthrough arg in a particular usage, just pass >> NULL. > > The passthrough argument definitely makes sense, I am all for adding it. > > The reasons I originally wanted to have two values was that it made it > easy/possible for the comparison function to work without any pointer > dereferences which was somewhat beneficial to performance. Completely > without dereferences only worked on 64bit systems anyway though... > With just one value I need an extra struct, but thats not too bad. > > So if you all prefer just having one value, I am ok with that. Who is > updating the patch? I guess I'll take another whack at it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Nov 21, 2012 at 9:46 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I guess I'll take another whack at it. New version attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 2012-11-21 12:54:30 -0500, Robert Haas wrote: > On Wed, Nov 21, 2012 at 9:46 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > I guess I'll take another whack at it. > > New version attached. I think the assert in replace_first should be Assert(!binaryheap_empty(heap) && heap->has_heap_property); instead of Assert(heap->has_heap_property); looks good otherwise. Thanks, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas escribió: > On Wed, Nov 21, 2012 at 9:46 AM, Robert Haas <robertmhaas@gmail.com> wrote: > > I guess I'll take another whack at it. > > New version attached. The following comments still talk about "key and value", thus need an update: + * binaryheap_add_unordered + * + * Adds the given key and value to the end of the heap's list of nodes +/* + * binaryheap_add + * + * Adds the given key and value to the heap in O(log n), while +/* + * binaryheap_replace_first + * + * Change the key and/or value of the first (root, topmost) node and This comment needs updated (s/comparator/compare/, and also add has_heap_property and arg): +/* + * binaryheap + * + * size how many nodes are currently in "nodes" + * space how many nodes can be stored in "nodes" + * comparator comparator to define the heap property + * nodes the first of a list of "space" nodes + */ +typedef struct binaryheap +{ + int size; + int space; + bool has_heap_property; /* debugging cross-check */ + binaryheap_comparator compare; + void *arg; + Datum nodes[FLEXIBLE_ARRAY_MEMBER]; +} binaryheap; I would suggest to add prefixes to struct members, so bh_size, bh_space and so on. This makes it easier to grep for stuff later. Do we really need the struct definition be public? Couldn't it just live in binaryheap.c? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 21, 2012 at 1:30 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > The following comments still talk about "key and value", thus need > an update: Oops. > This comment needs updated (s/comparator/compare/, and also add > has_heap_property and arg): Fixed. > I would suggest to add prefixes to struct members, so bh_size, bh_space > and so on. This makes it easier to grep for stuff later. OK. > Do we really need the struct definition be public? Couldn't it just > live in binaryheap.c? Well, that would preclude defining any macros, like binaryheap_empty(). Since efficiency is among the reasons for inventing this in the first place, that doesn't seem prudent to me. Another new version attached. P.S. to Abhijit: Please, please tell me the secret to getting five people to review your patches. I'm lucky when I can get one! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 2012-11-21 15:09:12 -0500, Robert Haas wrote: > On Wed, Nov 21, 2012 at 1:30 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: > > The following comments still talk about "key and value", thus need > > an update: > > Oops. > > > This comment needs updated (s/comparator/compare/, and also add > > has_heap_property and arg): > > Fixed. > > > I would suggest to add prefixes to struct members, so bh_size, bh_space > > and so on. This makes it easier to grep for stuff later. > > OK. > > > Do we really need the struct definition be public? Couldn't it just > > live in binaryheap.c? > > Well, that would preclude defining any macros, like > binaryheap_empty(). Since efficiency is among the reasons for > inventing this in the first place, that doesn't seem prudent to me. > > Another new version attached. One very minor nitpick I unfortunately just found now, not sure when that changed: binaryheap_replace_first() hardcodes the indices for the left/right node of the root node. I would rather have it use (left|right)_offset(0). > P.S. to Abhijit: Please, please tell me the secret to getting five > people to review your patches. I'm lucky when I can get one! Heh. This was rather surprising, yes. Its a rather easy patch to review in a way, you don't need much pg internals knowledge for it... Greetings, Andres Freund --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
At 2012-11-21 15:09:12 -0500, robertmhaas@gmail.com wrote: > > > The following comments still talk about "key and value", thus need > > an update: > > Oops. In the same vein, "Returns NULL if the heap is empty" no longer applies to binaryheap_first and binaryheap_remove_first. Plus there is an extra asterisk in the middle of the comment about binaryheap_replace_first. But it looks good otherwise. I agree with Tom that we should support resizing the heap, but let's not bother with that now. I'll write the few extra lines of code the first time something actually needs to add more elements to a heap. -- Abhijit
[ Sorry for the slow response on this, Thanksgiving interfered. ] On Wed, Nov 21, 2012 at 3:41 PM, Andres Freund <andres@2ndquadrant.com> wrote: > One very minor nitpick I unfortunately just found now, not sure when > that changed: > binaryheap_replace_first() hardcodes the indices for the left/right node > of the root node. I would rather have it use (left|right)_offset(0). Hmm, yeah... but come to think of it, why do we need that special case at all? Why not just call sift_down on the root node and call it good? See the attached version, which simplifies the code considerably and also makes some comment adjustments per Abhijit's comments. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On 2012-11-27 11:56:41 -0500, Robert Haas wrote: > [ Sorry for the slow response on this, Thanksgiving interfered. ] > > On Wed, Nov 21, 2012 at 3:41 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > One very minor nitpick I unfortunately just found now, not sure when > > that changed: > > binaryheap_replace_first() hardcodes the indices for the left/right node > > of the root node. I would rather have it use (left|right)_offset(0). > > Hmm, yeah... but come to think of it, why do we need that special case > at all? Why not just call sift_down on the root node and call it > good? See the attached version, which simplifies the code > considerably and also makes some comment adjustments per Abhijit's > comments. The simplification made me worry for a second but after checking it out I realized that my fear was only based on my original version where I did akv = simpleheap_remove_first(heap);simpleheap_add(heap, kv->key, kv->value); if there was work to be done. But Abhijit optimized that code to do less work, so the amount of comparisons is exactly the same before/after your simplifications. With considerably less code. Looks ready to me. Thanks, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 28, 2012 at 3:21 PM, Andres Freund <andres@2ndquadrant.com> wrote: > Looks ready to me. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
At 2012-11-15 12:08:12 -0500, robertmhaas@gmail.com wrote: > > Still, maybe we could have a src/framework directory that uses the > same trick to produce a libpgframework that frontend code can use, > and a lib pgframework_srv that can be linked into the backend. > That's might not actually be that much work; we'd just need to > clone all the existing src/port logic. Does anyone have further comments about Robert's idea above? Or alternative suggestions about how to structure a library of routines that can be used in either the backend or in client code (like the binary heap implementation)? -- Abhijit
On Mon, Jan 14, 2013 at 4:55 AM, Abhijit Menon-Sen <ams@2ndquadrant.com> wrote: > At 2012-11-15 12:08:12 -0500, robertmhaas@gmail.com wrote: >> >> Still, maybe we could have a src/framework directory that uses the >> same trick to produce a libpgframework that frontend code can use, >> and a lib pgframework_srv that can be linked into the backend. >> That's might not actually be that much work; we'd just need to >> clone all the existing src/port logic. > > Does anyone have further comments about Robert's idea above? Or > alternative suggestions about how to structure a library of routines > that can be used in either the backend or in client code (like the > binary heap implementation)? A couple observations: *) Not sure about the name: something like "libcommon"? *) This is useful and potentially good for many things. For example, maybe type manipulation code could be pushed into the common library at some point. merlin