Thread: Reusing abbreviated keys during second pass of ordered [set] aggregates
Currently, there are certain aggregates that sort some tuples before making a second pass over their memtuples array (through the tuplesort "gettuple" interface), calling one or more attribute equality operator functions as they go. For example, this occurs during the execution of the following queries: -- Salary is a numeric column: SELECT count(DISTINCT salary) AS distinct_compensation_levels FROM employees; -- Get most common distinct salary in organization: SELECT mode() WITHIN GROUP (ORDER BY salary) AS most_common_compensation FROM employees; Since these examples involve numeric comparisons, the equality operator calls involved in that second pass are expensive. There is no equality fast-path for numeric equality as there is for text; I imagine that in practice most calls to texteq() are resolved based on raw datum size mismatch, which is dirt cheap (while the alternative, a memcmp(), is still very cheap). Furthermore, numeric abbreviated keys have a good chance of capturing the entropy of the original values more effectively than we might expect with a text attribute. That means that in the case of numeric, even sorted, adjacent pairs of abbreviated keys have a pretty good chance of being distinct in the real world (if their corresponding authoritative values are themselves distinct). I think we can avoid this expense much of the time. Patch, Performance =============== Attached patch makes several cases like this try and get away with only using the pre-existing abbreviated keys from the sort (to determine inequality). On my laptop, it removes 15% - 25% of the run time of cases similar to the above, with a high cardinality numeric attribute of uniformly distributed values. As usual with my work on sorting, multiple concurrent sorts by multiple backends are helped most; that puts the improvements for realistic cases involving a text attribute at about 5% (the texteq() memcmp() is cheap, but not free) with 4 clients using pgbench. The numeric cases above are now at about the same speed as similar queries that use a double precision floating point number attribute. 9.5-era testing shows that sort-heavy numeric cases are often about as fast as "equivalent" double cases, so it seems worthwhile to mostly eliminate this additional numeric overhead that cases like the above still incur. I imagine citext would benefit much more here once it has abbreviation support (which should be a TODO item). Omissions -------------- I haven't attempted to accelerate AGG_SORTED strategy aggregates, which looked like it would require more invasive changes. It seemed like more trouble than it was worth, given that crossing group boundaries is something that won't occur very often with typical usage, and given the fact that text is naturally pretty fast there anyway (who groups by a numeric attribute?). Similarly, I did not teach setop_retrieve_direct() to consider the possibility that a set operation (like a UNION) could reuse abbreviated keys if it happens to be fed by a sort node. Finally, I did not accelerate the merging of spools during B-Tree index builds, even though that actually seems quite possible -- that case just seemed marginal. We currently have no test coverage for that case [1], so I guess its performance can't be too important. SortSupport contract ================ As explained in the first patch's commit message, I revised the contract that SortSupport has with opclasses. Should this proposal be accepted, we should backpatch the first patch to 9.5, to prevent anyone from building abbreviation support that won't work on 9.6 (even if that is very unlikely). In general, it seems pretty silly to me to not make use of the abbreviated keys that the sort already went to the trouble of building, and it seems like the revision to the SortSupport contract is not likely to notably limit any interesting cases in the future, so I think that this will be uncontroversial. [1] http://www.postgresql.org/message-id/flat/CAM3SWZTvEtVDysXeE7Py-8Ar+-+XRAPkJCZe-FH_V+zWxmuS9A@mail.gmail.com#CAM3SWZTvEtVDysXeE7Py-8Ar+-+XRAPkJCZe-FH_V+zWxmuS9A@mail.gmail.com -- Peter Geoghegan
Attachment
On Sat, Jul 11, 2015 at 10:05 AM, Peter Geoghegan <pg@heroku.com> wrote:
Forgive me for going off on a tangent here: I understand that the point of your patch is to avoid calls to numeric_eq altogether in these scenarios when cardinality is high, but the above sentence made me wonder how likely it is that numeric values in typical workloads have the same size, weight and sign and could therefore be handled in numeric_eq with memcmp() == 0 rather than numeric_cmp() == 0, and whether there would be any benefit.
Running various contrived aggregate queries on a large low cardinality dataset in a small range (therefore frequently the same weight & size), I managed to measure a small improvement of up to a few percent with the attached patch. I also wonder whether numeric_cmp could be profitably implemented with memcmp on big endian systems and some unrolled loops on little endian systems when size & weight match.
Of course there are a ton of other overheads involved with numeric. I wonder how crazy or difficult it would be to make it so that we could optionally put a pass-by-value NUMERIC in a Datum, setting a low order tag bit to say 'I'm an immediate value, not a pointer', and then packing 3 digits (= 12 significant figures) + sign + weight into the other 63 bits.
Currently, there are certain aggregates that sort some tuples before
making a second pass over their memtuples array (through the tuplesort
"gettuple" interface), calling one or more attribute equality operator
functions as they go. For example, this occurs during the execution of
the following queries:
-- Salary is a numeric column:
SELECT count(DISTINCT salary) AS distinct_compensation_levels FROM employees;
-- Get most common distinct salary in organization:
SELECT mode() WITHIN GROUP (ORDER BY salary) AS
most_common_compensation FROM employees;
Since these examples involve numeric comparisons, the equality
operator calls involved in that second pass are expensive. There is no
equality fast-path for numeric equality as there is for text; I
imagine that in practice most calls to texteq() are resolved based on
raw datum size mismatch, which is dirt cheap (while the alternative, a
memcmp(), is still very cheap).
Forgive me for going off on a tangent here: I understand that the point of your patch is to avoid calls to numeric_eq altogether in these scenarios when cardinality is high, but the above sentence made me wonder how likely it is that numeric values in typical workloads have the same size, weight and sign and could therefore be handled in numeric_eq with memcmp() == 0 rather than numeric_cmp() == 0, and whether there would be any benefit.
Running various contrived aggregate queries on a large low cardinality dataset in a small range (therefore frequently the same weight & size), I managed to measure a small improvement of up to a few percent with the attached patch. I also wonder whether numeric_cmp could be profitably implemented with memcmp on big endian systems and some unrolled loops on little endian systems when size & weight match.
Of course there are a ton of other overheads involved with numeric. I wonder how crazy or difficult it would be to make it so that we could optionally put a pass-by-value NUMERIC in a Datum, setting a low order tag bit to say 'I'm an immediate value, not a pointer', and then packing 3 digits (= 12 significant figures) + sign + weight into the other 63 bits.
Thomas Munro
http://www.enterprisedb.com
http://www.enterprisedb.com
Attachment
This needs a rebase, there are several conflicts in src/backend/executor/nodeAgg.c Thanks, Jeff
Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Sun, Sep 6, 2015 at 9:35 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Running various contrived aggregate queries on a large low cardinality > dataset in a small range (therefore frequently the same weight & size), I > managed to measure a small improvement of up to a few percent with the > attached patch. I also wonder whether numeric_cmp could be profitably > implemented with memcmp on big endian systems and some unrolled loops on > little endian systems when size & weight match. I think that setting up numeric SortSupport to have scratch memory used across authoritative numeric comparator calls would also help. We should prefer to do this kind of thing in a datatype independent way, of course. I'm not opposed to doing what you outline too, but I don't think it will be especially helpful for the cases here. I think that what you're talking about would live in the SortSupport authoritative comparator, and would benefit non-leading-attribute comparisons most. > Of course there are a ton of other overheads involved with numeric. I > wonder how crazy or difficult it would be to make it so that we could > optionally put a pass-by-value NUMERIC in a Datum, setting a low order tag > bit to say 'I'm an immediate value, not a pointer', and then packing 3 > digits (= 12 significant figures) + sign + weight into the other 63 bits. That seems possible, but very invasive. I'd want to get a good sense of the pay-off before undertaking such a project. -- Peter Geoghegan
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Fri, Sep 25, 2015 at 2:39 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > This needs a rebase, there are several conflicts in src/backend/executor/nodeAgg.c I attached a revised version of the second patch in the series, fixing this bitrot. I also noticed a bug in tuplesort's TSS_SORTEDONTAPE case with the previous patch, where no final on-the-fly merge step is required (no merge step is required whatsoever, because replacement selection managed to produce only one run). The function mergeruns() previously only "abandoned" abbreviated ahead of any merge step iff there was one. When there was only one run (not requiring a merge) it happened to continue to keep its state consistent with abbreviated keys still being in use. It didn't matter before, because abbreviated keys were only for tuplesort to compare, but that's different now. That bug is fixed in this revision by reordering things within mergeruns(). The previous order of the two things that were switched is not at all significant (I should know, I wrote that code). Thanks -- Peter Geoghegan
Attachment
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Andreas Karlsson
Date:
Hi, I have reviewed this patch and it still applies to master, compiles and passes the test suite. I like the goal of the patch, making use of the already existing abbreviation machinery in more cases is something we should do and the patch itself looks clean. I can also confirm the roughly 25% speedup in the best case (numerics which are all distinct) with no measurable slowdown in the worst case. Given this speedup and the small size of the patch I think we should apply it. I will set this patch to "Ready for Commiter". Andreas
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Sun, Dec 6, 2015 at 7:14 PM, Andreas Karlsson <andreas@proxel.se> wrote: > I can also confirm the roughly 25% speedup in the best case (numerics which > are all distinct) with no measurable slowdown in the worst case. > > Given this speedup and the small size of the patch I think we should apply > it. I will set this patch to "Ready for Commiter". Thanks for the review, Andreas. -- Peter Geoghegan
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Robert Haas
Date:
On Sat, Oct 10, 2015 at 9:03 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Fri, Sep 25, 2015 at 2:39 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> This needs a rebase, there are several conflicts in src/backend/executor/nodeAgg.c > > I attached a revised version of the second patch in the series, fixing > this bitrot. > > I also noticed a bug in tuplesort's TSS_SORTEDONTAPE case with the > previous patch, where no final on-the-fly merge step is required (no > merge step is required whatsoever, because replacement selection > managed to produce only one run). The function mergeruns() previously > only "abandoned" abbreviated ahead of any merge step iff there was > one. When there was only one run (not requiring a merge) it happened > to continue to keep its state consistent with abbreviated keys still > being in use. It didn't matter before, because abbreviated keys were > only for tuplesort to compare, but that's different now. > > That bug is fixed in this revision by reordering things within > mergeruns(). The previous order of the two things that were switched > is not at all significant (I should know, I wrote that code). I find the references to a "void" representation in this patch to be completely opaque. I see that there are some such references in tuplesort.c already, and most likely they were put there by commits that I did, so I guess I have nobody but myself to blame, but I don't know what this means, and I don't think we should let this terminology proliferate. My understanding is that the "void" representation is intended to whatever Datum we originally got, which might be a pointer. Why not just say that instead of referring to it this way? My understanding is also that it's OK if the abbreviated key stays the same even though the value has changed, but that the reverse could cause queries to return wrong answers. The first part of that justifies why this is safe when no abbreviation is available: we'll return an abbreviated value of 0 for everything, but that's fine. However, using the original Datum (which might be a pointer) seems unsafe, because two binary-identical values could be stored at different addresses and thus have different pointer representations. I'm probably missing something here, so clue me in... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Wed, Dec 9, 2015 at 11:31 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I find the references to a "void" representation in this patch to be > completely opaque. I see that there are some such references in > tuplesort.c already, and most likely they were put there by commits > that I did, so I guess I have nobody but myself to blame, but I don't > know what this means, and I don't think we should let this terminology > proliferate. > > My understanding is that the "void" representation is intended to > whatever Datum we originally got, which might be a pointer. Why not > just say that instead of referring to it this way? That isn't what is intended. "void" is the state that macros like index_getattr() leave NULL leading attributes (that go in the SortTuple.datum1 field) in. However, the function tuplesort_putdatum() requires callers to initialize their Datum to 0 now, which is new. A "void" representation is a would-be NULL pointer in the case of pass-by-value types, and a NULL pointer for pass-by-reference types. > My understanding is also that it's OK if the abbreviated key stays the > same even though the value has changed, but that the reverse could > cause queries to return wrong answers. The first part of that > justifies why this is safe when no abbreviation is available: we'll > return an abbreviated value of 0 for everything, but that's fine. > However, using the original Datum (which might be a pointer) seems > unsafe, because two binary-identical values could be stored at > different addresses and thus have different pointer representations. > > I'm probably missing something here, so clue me in... I think that you're missing that patch 0001 formally forbids abbreviated keys that are pass-by-value, by revising the contract (this is proposed for backpatch to 9.5 -- only comments are changed). This is already something that is all but forbidden, although the datum case does tacitly acknowledge the possibility by not allowing abbreviation to work with the pass-by-value-and-yet-abbreviated case. I think that this revision is also useful for putting abbreviated keys in indexes, something that may happen yet. -- Peter Geoghegan
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Wed, Dec 9, 2015 at 2:15 PM, Peter Geoghegan <pg@heroku.com> wrote: > I think that you're missing that patch 0001 formally forbids > abbreviated keys that are pass-by-value Sorry. I do of course mean it forbids abbreviated keys that are *not* pass-by-value (that are pass-by-reference). -- Peter Geoghegan
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Wed, Dec 9, 2015 at 2:15 PM, Peter Geoghegan <pg@heroku.com> wrote: > I think that you're missing that patch 0001 formally forbids > abbreviated keys that are pass-by-value, by revising the contract > (this is proposed for backpatch to 9.5 -- only comments are changed). > This is already something that is all but forbidden, although the > datum case does tacitly acknowledge the possibility by not allowing > abbreviation to work with the pass-by-value-and-yet-abbreviated case. > > I think that this revision is also useful for putting abbreviated keys > in indexes, something that may happen yet. I'm also depending on this for the "quicksort for every sort run" patch, BTW: + /* + * Kludge: Trigger abbreviated tie-breaker if in-memory tuples + * use abbreviation (writing tuples to tape never preserves + * abbreviated keys). Do this by assigning in-memory + * abbreviated tuple to tape tuple directly. + * + * It doesn't seem worth generating a new abbreviated key for + * the tape tuple, and this approach is simpler than + * "unabbreviating" the memtuple tuple from a "common" routine + * like this. + */ + if (state->sortKeys != NULL && state->sortKeys->abbrev_converter != NULL) + stup->datum1 = state->memtuples[state->current].datum1; I could, as an alternative approach, revise tuplesort so that self-comparison works (something we currently assert against [1]), something that would probably *also* require and update to the sortsupport.h contract, but this seemed simpler and more general. In general, I think that there are plenty of reasons to forbid pass-by-reference abbreviated keys (where the abbreviated comparator itself is a pointer, something much more complicated than an integer 3-way comparison or similar). [1] Commit c5a03256c -- Peter Geoghegan
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Robert Haas
Date:
On Wed, Dec 9, 2015 at 5:15 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Wed, Dec 9, 2015 at 11:31 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> I find the references to a "void" representation in this patch to be >> completely opaque. I see that there are some such references in >> tuplesort.c already, and most likely they were put there by commits >> that I did, so I guess I have nobody but myself to blame, but I don't >> know what this means, and I don't think we should let this terminology >> proliferate. >> >> My understanding is that the "void" representation is intended to >> whatever Datum we originally got, which might be a pointer. Why not >> just say that instead of referring to it this way? > > That isn't what is intended. "void" is the state that macros like > index_getattr() leave NULL leading attributes (that go in the > SortTuple.datum1 field) in. What kind of state is that? Can't we define this in terms of what it is rather than how it gets that way? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Wed, Dec 16, 2015 at 9:36 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> That isn't what is intended. "void" is the state that macros like >> index_getattr() leave NULL leading attributes (that go in the >> SortTuple.datum1 field) in. > > What kind of state is that? Can't we define this in terms of what it > is rather than how it gets that way? It's zeroed. I guess we can update everything, including existing comments, to reflect that. -- Peter Geoghegan
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Wed, Dec 16, 2015 at 12:04 PM, Peter Geoghegan <pg@heroku.com> wrote: >> What kind of state is that? Can't we define this in terms of what it >> is rather than how it gets that way? > > It's zeroed. > > I guess we can update everything, including existing comments, to reflect that. Attached revision updates both the main commit (the optimization), and the backpatch commit (updated the contract). Changes: * Changes commit intended for backpatch to 9.5 to use new "zeroed" terminology (not "void" representation). * Puts fix within mergerun() (for this patch) in the same backpatch commit. We might as well keep this consistent, even though the correctness of 9.5 does not actually hinge upon having this change -- there is zero risk of breaking something in 9.5, and avoiding backpatching this part can only lead to confusion in the future. * Some minor tweaks to tuplesort.c. Make sure that tuplesort_putdatum() zeroes datum1 directly for NULL values. Comment tweaks. * Add comment to the datum case, discussing restriction there. I was wrong when I said that the datum sort case currently tacitly acknowledges as possible/useful something that the revised SortSupport contract will forbid (I wrote that e-mail in haste). What I propose to make the SortSupport contract forbid here is abbreviated keys that are *themselves* pass-by-reference. It is true that today, we do not support pass-by-value types with abbreviated keys for the datum case only (such opclass support could be slightly useful for float8, for example -- int8 comparisons of an alternative representation might be decisively cheaper). But that existing restriction is entirely distinct to the new restriction I propose to create. It seems like this distinction should be pointed out in passing, which I've done. -- Peter Geoghegan
Attachment
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Robert Haas
Date:
On Thu, Dec 17, 2015 at 11:43 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Wed, Dec 16, 2015 at 12:04 PM, Peter Geoghegan <pg@heroku.com> wrote: >>> What kind of state is that? Can't we define this in terms of what it >>> is rather than how it gets that way? >> >> It's zeroed. >> >> I guess we can update everything, including existing comments, to reflect that. Thanks, this looks much easier to understand from here. > Attached revision updates both the main commit (the optimization), and > the backpatch commit (updated the contract). - /* abbreviation is possible here only for by-reference types */ + /* + * Abbreviation is possible here only for by-reference types. Note that a + * pass-by-value representation for abbreviated values is forbidden, but + * that's a distinct, generic restriction imposed by the SortSupport + * contract. I think that you have not written what you meant to write here. I think what you mean is "Note that a pass-by-REFERENCE representation for abbreviated values is forbidden...". + /* + * If we produced only one initial run (quite likely if the total data + * volume is between 1X and 2X workMem), we can just use that tape as the + * finished output, rather than doing a useless merge. (This obvious + * optimization is not in Knuth's algorithm.) + */ + if (state->currentRun == 1) + { + state->result_tape = state->tp_tapenum[state->destTape]; + /* must freeze and rewind the finished output tape */ + LogicalTapeFreeze(state->tapeset, state->result_tape); + state->status = TSS_SORTEDONTAPE; + return; + } I don't understand the point of moving this code. If there's some reason to do this after rewinding the tapes rather than beforehand, I think we should articulate that reason in the comment block. The last hunk in your 0001 patch properly belongs in 0002. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Fri, Dec 18, 2015 at 9:35 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> Attached revision updates both the main commit (the optimization), and >> the backpatch commit (updated the contract). > > - /* abbreviation is possible here only for by-reference types */ > + /* > + * Abbreviation is possible here only for by-reference types. > Note that a > + * pass-by-value representation for abbreviated values is forbidden, but > + * that's a distinct, generic restriction imposed by the SortSupport > + * contract. > > I think that you have not written what you meant to write here. I > think what you mean is "Note that a pass-by-REFERENCE representation > for abbreviated values is forbidden...". You're right. Sorry about that. > + /* > + * If we produced only one initial run (quite likely if the total data > + * volume is between 1X and 2X workMem), we can just use that > tape as the > + * finished output, rather than doing a useless merge. (This obvious > + * optimization is not in Knuth's algorithm.) > + */ > + if (state->currentRun == 1) > + { > + state->result_tape = state->tp_tapenum[state->destTape]; > + /* must freeze and rewind the finished output tape */ > + LogicalTapeFreeze(state->tapeset, state->result_tape); > + state->status = TSS_SORTEDONTAPE; > + return; > + } > > I don't understand the point of moving this code. If there's some > reason to do this after rewinding the tapes rather than beforehand, I > think we should articulate that reason in the comment block. I thought that was made clear by the 0001 commit message. Think of what happens when we don't disable abbreviated in the final TSS_SORTEDONTAPE phase should the "if (state->currentRun == 1)" path have been taken (but *not* the path that also ends in TSS_SORTEDONTAPE, when caller requires randomAccess but we spill to tape, or any other case). What happens is: The code in 0002 gets confused, and attempts to pass back a pointer value as an "abbreviated key". That's a bug. > The last hunk in your 0001 patch properly belongs in 0002. You could certainly argue that the last hunk of 0001 belongs in 0002. I only moved it to 0001 when I realized that we might as well keep the branches in sync, since the ordering is insignificant from a 9.5 perspective (although it might still be tidier), and there is a need to backpatch anyway. I'm not insisting on doing it that way. -- Peter Geoghegan
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Robert Haas
Date:
On Fri, Dec 18, 2015 at 2:22 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Fri, Dec 18, 2015 at 9:35 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> Attached revision updates both the main commit (the optimization), and >>> the backpatch commit (updated the contract). >> >> - /* abbreviation is possible here only for by-reference types */ >> + /* >> + * Abbreviation is possible here only for by-reference types. >> Note that a >> + * pass-by-value representation for abbreviated values is forbidden, but >> + * that's a distinct, generic restriction imposed by the SortSupport >> + * contract. >> >> I think that you have not written what you meant to write here. I >> think what you mean is "Note that a pass-by-REFERENCE representation >> for abbreviated values is forbidden...". > > You're right. Sorry about that. PFA my proposal for comment changes for 9.5 and master. This is based on your 0001, but I edited somewhat. Please let me know your thoughts. I am not willing to go further and rearrange actual code in 9.5 at this point; it just isn't necessary. Looking at this whole system again, I wonder if we're missing a trick here. How about if we decree from on high that the abbreviated-key comparator is always just the application of an integer comparison operator? The only abbreviation function that we have right now that wouldn't be happy with that is the one for numeric, and that looks like it could be fixed. Then we could hard-code knowledge of this representation in tuplesort.c in such a way that it wouldn't need to call a comparator function at all - instead of doing ApplySortComparator() and then maybe ApplySortAbbrevFullComparator(), it would do something like: if (using_abbreviation && (compare = ApplyAbbrevComparator(...)) != 0) return compare; I'm not sure if that would save enough cycles vs. the status quo to be worth bothering with, but it seems like it might. You may have a better feeling for that than I do. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Mon, Dec 21, 2015 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote: > PFA my proposal for comment changes for 9.5 and master. This is based > on your 0001, but I edited somewhat. Please let me know your > thoughts. I am not willing to go further and rearrange actual code in > 9.5 at this point; it just isn't necessary. Fine by me. But this revision hasn't made the important point at all -- which is that 0002 is safe. That's a stronger guarantee than the abbreviated key representation being pass-by-value. > Looking at this whole system again, I wonder if we're missing a trick > here. How about if we decree from on high that the abbreviated-key > comparator is always just the application of an integer comparison > operator? The only abbreviation function that we have right now that > wouldn't be happy with that is the one for numeric, and that looks > like it could be fixed. I gather you mean that we'd decree that they were always compared using a text or uuid style 3-way unsigned integer comparison, allowing us to hardcode that. > Then we could hard-code knowledge of this > representation in tuplesort.c in such a way that it wouldn't need to > call a comparator function at all - instead of doing > ApplySortComparator() and then maybe ApplySortAbbrevFullComparator(), > it would do something like: > > if (using_abbreviation && (compare = ApplyAbbrevComparator(...)) != 0) > return compare; > > I'm not sure if that would save enough cycles vs. the status quo to be > worth bothering with, but it seems like it might. You may have a > better feeling for that than I do. I like this idea. I thought of it myself already, actually. It has some nice properties, because unsigned integers are so simple and flexible. For example, we can do it with int4 sometimes, and then compare two attributes at once on 64-bit platforms. Maybe the second attribute (the second set of 4 bytes) isn't an int4, though -- it's any other type's abbreviated key (which can be assumed to have a compatible representation). That's one more advanced possibility. It seems worthwhile because numeric is the odd one out. Once I or someone else gets around to adding network types, which could happen in 9.6, then we are done (because there is already a pending patch that adds support for remaining character-like types and alternative opclasses). I really don't foresee much additional use of abbreviated keys beyond these cases. There'd be a small but nice boost by not doing pointer chasing for the abbreviated key comparison, I imagine, but that benefit would be felt everywhere. Numeric is the odd one out currently, but as you say it can be fixed, and I foresee no other trouble from any other opclass/type that cares about abbreviation. Another issue is that abbreviated keys in indexes are probably not going to take 8 bytes, because they'll go in the ItemId array. It's likely to be very useful to be able to take the first two bytes, and know that a uint16 comparison is all that is needed, and have a basis to perform an interpolation search rather than a binary search (although that needs to be adaptive to different distributions, I think it could be an effective technique -- there are many cache misses for binary searches on internal B-Tree pages). -- Peter Geoghegan
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Robert Haas
Date:
On Mon, Dec 21, 2015 at 2:55 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Mon, Dec 21, 2015 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> PFA my proposal for comment changes for 9.5 and master. This is based >> on your 0001, but I edited somewhat. Please let me know your >> thoughts. I am not willing to go further and rearrange actual code in >> 9.5 at this point; it just isn't necessary. > > Fine by me. But this revision hasn't made the important point at all > -- which is that 0002 is safe. That's a stronger guarantee than the > abbreviated key representation being pass-by-value. Right. I don't think that we should back-patch that stuff into 9.5. >> Looking at this whole system again, I wonder if we're missing a trick >> here. How about if we decree from on high that the abbreviated-key >> comparator is always just the application of an integer comparison >> operator? The only abbreviation function that we have right now that >> wouldn't be happy with that is the one for numeric, and that looks >> like it could be fixed. > > I gather you mean that we'd decree that they were always compared > using a text or uuid style 3-way unsigned integer comparison, allowing > us to hardcode that. Right. >> Then we could hard-code knowledge of this >> representation in tuplesort.c in such a way that it wouldn't need to >> call a comparator function at all - instead of doing >> ApplySortComparator() and then maybe ApplySortAbbrevFullComparator(), >> it would do something like: >> >> if (using_abbreviation && (compare = ApplyAbbrevComparator(...)) != 0) >> return compare; >> >> I'm not sure if that would save enough cycles vs. the status quo to be >> worth bothering with, but it seems like it might. You may have a >> better feeling for that than I do. > > I like this idea. I thought of it myself already, actually. > > It has some nice properties, because unsigned integers are so simple > and flexible. For example, we can do it with int4 sometimes, and then > compare two attributes at once on 64-bit platforms. Maybe the second > attribute (the second set of 4 bytes) isn't an int4, though -- it's > any other type's abbreviated key (which can be assumed to have a > compatible representation). That's one more advanced possibility. Yikes. > It seems worthwhile because numeric is the odd one out. Once I or > someone else gets around to adding network types, which could happen > in 9.6, then we are done (because there is already a pending patch > that adds support for remaining character-like types and alternative > opclasses). I really don't foresee much additional use of abbreviated > keys beyond these cases. There'd be a small but nice boost by not > doing pointer chasing for the abbreviated key comparison, I imagine, > but that benefit would be felt everywhere. Numeric is the odd one out > currently, but as you say it can be fixed, and I foresee no other > trouble from any other opclass/type that cares about abbreviation. > > Another issue is that abbreviated keys in indexes are probably not > going to take 8 bytes, because they'll go in the ItemId array. It's > likely to be very useful to be able to take the first two bytes, and > know that a uint16 comparison is all that is needed, and have a basis > to perform an interpolation search rather than a binary search > (although that needs to be adaptive to different distributions, I > think it could be an effective technique -- there are many cache > misses for binary searches on internal B-Tree pages). Hmm, that seems iffy to me. There are plenty of data sets where 8 bytes is enough to get some meaningful information about what part of the keyspace you're in, but 2 bytes isn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Tue, Dec 22, 2015 at 9:31 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> It has some nice properties, because unsigned integers are so simple >> and flexible. For example, we can do it with int4 sometimes, and then >> compare two attributes at once on 64-bit platforms. Maybe the second >> attribute (the second set of 4 bytes) isn't an int4, though -- it's >> any other type's abbreviated key (which can be assumed to have a >> compatible representation). That's one more advanced possibility. > > Yikes. I'm not suggesting we'd want to do that immediately, or even at all. My point is -- stuff like this becomes possible. My intuition is that it might come up in a bunch of places. >> Another issue is that abbreviated keys in indexes are probably not >> going to take 8 bytes, because they'll go in the ItemId array. It's >> likely to be very useful to be able to take the first two bytes, and >> know that a uint16 comparison is all that is needed, and have a basis >> to perform an interpolation search rather than a binary search >> (although that needs to be adaptive to different distributions, I >> think it could be an effective technique -- there are many cache >> misses for binary searches on internal B-Tree pages). > > Hmm, that seems iffy to me. There are plenty of data sets where 8 > bytes is enough to get some meaningful information about what part of > the keyspace you're in, but 2 bytes isn't. Your experience with abbreviated keys for sorting is unlikely to perfectly carry over here. That's because the 2 bytes are only put in the internal pages (typically less than 1% of the total). These internal pages typically have relatively low density anyway (we don't use the The B* tree technique), so I think that the level of bloat would be tiny. At the same time, these are the range of values that naturally have very wide fan-out. For example, the entire range of values in the index is represented at the root page. All of that having been said, yes, it could still be useless. But then, the overhead will still tend to be nill, due to the fact that abbreviated key generation can be so effectively amortized (it happens only once, per page split), and because branch prediction will tend to remove any penalty. -- Peter Geoghegan
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Robert Haas
Date:
On Tue, Dec 22, 2015 at 12:31 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Dec 21, 2015 at 2:55 PM, Peter Geoghegan <pg@heroku.com> wrote: >> On Mon, Dec 21, 2015 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> PFA my proposal for comment changes for 9.5 and master. This is based >>> on your 0001, but I edited somewhat. Please let me know your >>> thoughts. I am not willing to go further and rearrange actual code in >>> 9.5 at this point; it just isn't necessary. >> >> Fine by me. But this revision hasn't made the important point at all >> -- which is that 0002 is safe. That's a stronger guarantee than the >> abbreviated key representation being pass-by-value. > > Right. I don't think that we should back-patch that stuff into 9.5. OK, so I've gone ahead and committed and back-patched that. Can you please rebase and repost the remainder as a 9.6 proposal? Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Tue, Dec 22, 2015 at 2:59 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> Right. I don't think that we should back-patch that stuff into 9.5. > > OK, so I've gone ahead and committed and back-patched that. Can you > please rebase and repost the remainder as a 9.6 proposal? OK. I don't know why you didn't acknowledge in your revision to sortsupport.h that bitwise inequality must be a reliable proxy for authoritative value inequality, which is a stronger restriction than mandating that abbreviated keys always be pass-by-value, but I'm not going to argue. -- Peter Geoghegan
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Tue, Dec 22, 2015 at 2:59 PM, Robert Haas <robertmhaas@gmail.com> wrote: > OK, so I've gone ahead and committed and back-patched that. Can you > please rebase and repost the remainder as a 9.6 proposal? I attach a rebased patch for 9.6 only. -- Peter Geoghegan
Attachment
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Tue, Dec 22, 2015 at 10:49 AM, Peter Geoghegan <pg@heroku.com> wrote: > On Tue, Dec 22, 2015 at 9:31 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> It has some nice properties, because unsigned integers are so simple >>> and flexible. For example, we can do it with int4 sometimes, and then >>> compare two attributes at once on 64-bit platforms. Maybe the second >>> attribute (the second set of 4 bytes) isn't an int4, though -- it's >>> any other type's abbreviated key (which can be assumed to have a >>> compatible representation). That's one more advanced possibility. >> >> Yikes. > > I'm not suggesting we'd want to do that immediately, or even at all. > My point is -- stuff like this becomes possible. My intuition is that > it might come up in a bunch of places. I had a better idea, that is based on the same intuition that we can leverage the flexibility of that standardized representation of abbreviated keys to the hilt. We use one bit to represent whether this is the first run, or a subsequent run (in a world where replacement selection is only used for "quicksort with spillover", and run number isn't that interesting), and another bit to represent NULLness. These are the most significant and second most significant bits, respectively. Fill all remaining bits with your unsigned abbreviated key representation, without regard to the underlying details of the type. Now, you just found a way to cut the size of SortTuples significantly, to just two pointers (whereas with alignment considerations, SortTuples currently consume enough memory to store 3 pointers). You also just found a way of significantly cutting the number of instructions that are needed for the hot code path, because you don't really need an abbreviation-based ApplySortComparator() inline function to interpret things through -- whether or not NULLs are first or last, and how NULLs are considered generally is all baked into the abbreviated representation from the start. This is certainly speculative stuff, and having multiple versions of SortTuple would be inconvenient, but the point again is that there are considerable upsides to a standardized representation (abbreviated keys always being compared as unsigned integers), and no downsides. -- Peter Geoghegan
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Tue, Dec 22, 2015 at 10:49 PM, Peter Geoghegan <pg@heroku.com> wrote: > I attach a rebased patch for 9.6 only. I marked the patch -- my own patch -- "Ready for Committer". I'm the third person to have marked the patch "Ready for Committer", even though no committer bounced the patch back for review by the reviewer, Andreas: https://commitfest.postgresql.org/9/300/ First Andreas did so, then Michael, and finally myself. The problem is that you see things like this: "Closed in commitfest 2015-11 with status: Moved to next CF" Michael (the CF manager at the time) remembered to change the status to "Ready for Committer" again; you see this entry immediately afterwards: "New status: Ready for Committer" but I gather from the CF app history that Alvaro (the current CF manager) did not remember to do that second step when he later moved the patch to the "next" (current) CF. Or maybe he just wasn't aware of this quirk of the CF app. I don't have a problem with having to resubmit a patch to the next CF manually, but it's easy to miss that the status has been changed from "Ready for Committer" to "Needs Review". I don't think anyone ever argued for that, and this patch was accidentally in "Needs Review" state for about 5 days. Can we fix it? -- Peter Geoghegan
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Alvaro Herrera
Date:
Peter Geoghegan wrote: > Michael (the CF manager at the time) remembered to change the status > to "Ready for Committer" again; you see this entry immediately > afterwards: > > "New status: Ready for Committer" > > but I gather from the CF app history that Alvaro (the current CF > manager) did not remember to do that second step when he later moved > the patch to the "next" (current) CF. Or maybe he just wasn't aware of > this quirk of the CF app. That seems a bug in the CF app to me. (FWIW I'm not the "current" CF manager, because the CF I managed is already over. I'm not sure that we know who the manager for the upcoming one is.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Mon, Feb 15, 2016 at 12:58 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > (FWIW I'm not the "current" CF manager, because the CF I managed is > already over. I'm not sure that we know who the manager for the > upcoming one is.) It's pretty easy to forget that this was the first time in a long time that the CF wasn't closed because the next CF began. :-) -- Peter Geoghegan
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > (FWIW I'm not the "current" CF manager, because the CF I managed is > already over. I'm not sure that we know who the manager for the > upcoming one is.) We had a vict^H^H^H^Hvolunteer a few days ago: http://www.postgresql.org/message-id/56B91A6D.6030908@pgmasters.net regards, tom lane
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Michael Paquier
Date:
On Tue, Feb 16, 2016 at 5:58 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Peter Geoghegan wrote: > >> Michael (the CF manager at the time) remembered to change the status >> to "Ready for Committer" again; you see this entry immediately >> afterwards: >> >> "New status: Ready for Committer" >> >> but I gather from the CF app history that Alvaro (the current CF >> manager) did not remember to do that second step when he later moved >> the patch to the "next" (current) CF. Or maybe he just wasn't aware of >> this quirk of the CF app. I recall switching this patch as "Ready for committer" based on the status of the thread. > That seems a bug in the CF app to me. > > (FWIW I'm not the "current" CF manager, because the CF I managed is > already over. I'm not sure that we know who the manager for the > upcoming one is.) When a patch with status "Ready for committer" on CF N is moved to CF (N+1), its status is automatically changed to "Needs Review". That's something to be aware of when cleaning up the CF app entries. -- Michael
Commitfest Bug (was: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates)
From
Jim Nasby
Date:
On 2/16/16 12:38 AM, Michael Paquier wrote: > When a patch with status "Ready for committer" on CF N is moved to CF > (N+1), its status is automatically changed to "Needs Review". That's > something to be aware of when cleaning up the CF app entries. I agree with Alvarro; this seems like a bug to me. If a patch is ready for committer in CF N, why would it suddenly not be ready in N+1? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: Commitfest Bug (was: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates)
From
Pavel Stehule
Date:
2016-02-17 3:19 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
On 2/16/16 12:38 AM, Michael Paquier wrote:When a patch with status "Ready for committer" on CF N is moved to CF
(N+1), its status is automatically changed to "Needs Review". That's
something to be aware of when cleaning up the CF app entries.
I agree with Alvarro; this seems like a bug to me. If a patch is ready for committer in CF N, why would it suddenly not be ready in N+1?
+1,
This behave is pretty unpleasant and frustrating.
Regards
Pavel
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Robert Haas
Date:
On Wed, Dec 23, 2015 at 12:19 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Tue, Dec 22, 2015 at 2:59 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> OK, so I've gone ahead and committed and back-patched that. Can you >> please rebase and repost the remainder as a 9.6 proposal? > > I attach a rebased patch for 9.6 only. Committed, except I left out one comment hunk that I wasn't convinced about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
From
Peter Geoghegan
Date:
On Wed, Feb 17, 2016 at 2:12 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Committed, except I left out one comment hunk that I wasn't convinced about. Thank you. -- Peter Geoghegan
Re: Commitfest Bug (was: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates)
From
Magnus Hagander
Date:
On Tue, Feb 16, 2016 at 11:12 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
-- 2016-02-17 3:19 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:On 2/16/16 12:38 AM, Michael Paquier wrote:When a patch with status "Ready for committer" on CF N is moved to CF
(N+1), its status is automatically changed to "Needs Review". That's
something to be aware of when cleaning up the CF app entries.
I agree with Alvarro; this seems like a bug to me. If a patch is ready for committer in CF N, why would it suddenly not be ready in N+1?+1,This behave is pretty unpleasant and frustrating.
Well, it's in no way a bug, because it's the behavior we agreed upon at the time :)
That said, we can certainly reconsider that. Would we always copy the value over? Even if it was, say, rejected? (so it would be copied to the new CF but still marked rejected) Or is there a subset of behaviors you're looking for?
Re: Commitfest Bug (was: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates)
From
Alvaro Herrera
Date:
Magnus Hagander wrote: > On Tue, Feb 16, 2016 at 11:12 PM, Pavel Stehule <pavel.stehule@gmail.com> > wrote: > > This behave is pretty unpleasant and frustrating. > > Well, it's in no way a bug, because it's the behavior we agreed upon at the > time :) I guess the "move to next CF" operation is new enough that we didn't yet know what was the most useful behavior. > That said, we can certainly reconsider that. Would we always copy the value > over? Even if it was, say, rejected? (so it would be copied to the new CF > but still marked rejected) Or is there a subset of behaviors you're looking > for? I think the states "Ready for Committer" and "Needs Review" ought to be kept in the new CF. I'm unclear on what to do about "Returned with Feedback" and "Waiting on Author"; my first instinct is that if a patch is in those states, then it shouldn't be possible to move to the next CF. On the other hand, if we force the state to change to "Needs Review" before moving it, we would lose the information of what state it was closed with. So perhaps for any patch in those two states, the state in the next CF should be "needs review" too. I am even more unclear on "Rejected". My instinct says we should refuse a move-to-next-cf for such patches. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Commitfest Bug (was: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates)
From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Magnus Hagander wrote: >> That said, we can certainly reconsider that. Would we always copy the value >> over? Even if it was, say, rejected? (so it would be copied to the new CF >> but still marked rejected) Or is there a subset of behaviors you're looking >> for? > I think the states "Ready for Committer" and "Needs Review" ought to be > kept in the new CF. Definitely. > I'm unclear on what to do about "Returned with Feedback" and "Waiting on > Author"; my first instinct is that if a patch is in those states, then > it shouldn't be possible to move to the next CF. On the other hand, if > we force the state to change to "Needs Review" before moving it, we > would lose the information of what state it was closed with. So perhaps > for any patch in those two states, the state in the next CF should be > "needs review" too. +1 for not moving such patches to the new CF until the author does something --- at which point they'd change to "Needs Review" state. But we should not change them into that state without author input. And I don't see the value of having them in a new CF until the author does something. > I am even more unclear on "Rejected". My instinct says we should refuse > a move-to-next-cf for such patches. Right. Rejected is dead, it shouldn't propagate forward. regards, tom lane
Re: Commitfest Bug (was: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates)
From
Peter Geoghegan
Date:
On Tue, Mar 1, 2016 at 7:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > +1 for not moving such patches to the new CF until the author does > something --- at which point they'd change to "Needs Review" state. > But we should not change them into that state without author input. > And I don't see the value of having them in a new CF until the > author does something. To be clear: My position was always that it's good that the author has to do *something* to get their patch into the next CF. It's bad that this change in state can easily be missed, though. I've now been on both sides of this, as a patch author and patch reviewer. If the patch was left as "Waiting on Author", as my review of Alexander's patch was, then it ought to not change to "Needs Review" silently. That makes absolutely no sense. -- Peter Geoghegan
Re: Commitfest Bug (was: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates)
From
Magnus Hagander
Date:
On Tue, Mar 1, 2016 at 10:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Magnus Hagander wrote:
>> That said, we can certainly reconsider that. Would we always copy the value
>> over? Even if it was, say, rejected? (so it would be copied to the new CF
>> but still marked rejected) Or is there a subset of behaviors you're looking
>> for?
> I think the states "Ready for Committer" and "Needs Review" ought to be
> kept in the new CF.
Definitely.
> I'm unclear on what to do about "Returned with Feedback" and "Waiting on
> Author"; my first instinct is that if a patch is in those states, then
> it shouldn't be possible to move to the next CF. On the other hand, if
> we force the state to change to "Needs Review" before moving it, we
> would lose the information of what state it was closed with. So perhaps
> for any patch in those two states, the state in the next CF should be
> "needs review" too.
+1 for not moving such patches to the new CF until the author does
something --- at which point they'd change to "Needs Review" state.
But we should not change them into that state without author input.
And I don't see the value of having them in a new CF until the
author does something.
Well, "waiting on author" is not considered to be a "closing". So as long as we have patches in that state we want to do *something* with them in a CF. Which currently can be "move to next cf", "reject" or "return with feedback". So basically it means we have to decide if we want to reject it or return-with-feedback it, if we say it should not be possible to "move to next cf".
The problem is if the author already has something ready of course, in which case it will become a new entry on the new CF instead of a moved one -- which means we loose the historical connection between those two.
> I am even more unclear on "Rejected". My instinct says we should refuse
> a move-to-next-cf for such patches.
Right. Rejected is dead, it shouldn't propagate forward.
Ok, so let's try to summarize. We want the following actinos when someone clicks "move to next cf":
Needs Review -> Needs Review
Waiting on Author -> Refuse moving
Ready for committer -> Ready for Committer
Committed -> refuse moving
Moved to next cf -> refuse moving (if it's already set like this, it would probably be a bug)
Returned with feedback -> Refuse moving
Which basically means we only move things that are in needs review or ready for committer state, and that we never actually change the status of a patch in the moving.
Is that a correct summary?
Re: Commitfest Bug (was: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates)
From
Michael Paquier
Date:
On Wed, Mar 2, 2016 at 9:19 PM, Magnus Hagander <magnus@hagander.net> wrote: > Needs Review -> Needs Review > Waiting on Author -> Refuse moving > Ready for committer -> Ready for Committer > Committed -> refuse moving > Moved to next cf -> refuse moving (if it's already set like this, it would > probably be a bug) > Returned with feedback -> Refuse moving > Which basically means we only move things that are in needs review or ready > for committer state, and that we never actually change the status of a patch > in the moving. > > Is that a correct summary? Yes, thanks for the summary. It looks like this is what people would expect based on what I read here. -- Michael
Re: Commitfest Bug (was: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates)
From
Magnus Hagander
Date:
On Wed, Mar 2, 2016 at 7:34 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Wed, Mar 2, 2016 at 9:19 PM, Magnus Hagander <magnus@hagander.net> wrote:
> Needs Review -> Needs Review
> Waiting on Author -> Refuse moving
> Ready for committer -> Ready for Committer
> Committed -> refuse moving
> Moved to next cf -> refuse moving (if it's already set like this, it would
> probably be a bug)
> Returned with feedback -> Refuse moving
> Which basically means we only move things that are in needs review or ready
> for committer state, and that we never actually change the status of a patch
> in the moving.
>
> Is that a correct summary?
Yes, thanks for the summary. It looks like this is what people would
expect based on what I read here.
Ok, I've pushed a code that does that.
Re: Commitfest Bug (was: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates)
From
Peter Geoghegan
Date:
On Wed, Mar 2, 2016 at 5:41 AM, Magnus Hagander <magnus@hagander.net> wrote: > Ok, I've pushed a code that does that. Thank you. -- Peter Geoghegan