Thread: HashAgg's batching counter starts at 0, but Hash's starts at 1.
Hi, While working on 40efbf870 I noticed that when performing a Hash Join that we always start out by setting nbatch to 1. That seems reasonable as it's hard to imagine being able to complete any non-zero amount of work in fewer than 1 batch. In the HashAgg case, since 40efbf870, we'll display: "HashAgg Batches": 0, if you do something like: explain(analyze, format json) select distinct oid from pg_class; I'd rather this said that the number of batches was 1. Does anyone have any objections to that being changed? David
On Tue, Jun 30, 2020, 7:04 PM David Rowley <dgrowleyml@gmail.com> wrote:
Does anyone have any objections to that being changed?
That's OK with me. By the way, I'm on vacation and will catch up on these HashAgg threads next week.
Regards,
Jeff Davis
On Wed, 1 Jul 2020 at 18:46, Jeff Davis <pgsql@j-davis.com> wrote: > > On Tue, Jun 30, 2020, 7:04 PM David Rowley <dgrowleyml@gmail.com> wrote: >> >> Does anyone have any objections to that being changed? > > That's OK with me. By the way, I'm on vacation and will catch up on these HashAgg threads next week. (Adding Justin as I know he's expressed interest in the EXPLAIN output of HashAgg before) I've written a patch to bring the HashAgg EXPLAIN ANALYZE output to be more aligned to the Hash Join output. Couple of things I observed about Hash Join EXPLAIN ANALYZE: 1. The number of batches starts at 1. 2. We always display the number of batches. 3. We write "Batches" for text format and "Hash Batches" for non-text formats. 4. We write "Memory Usage" for text format and "Peak Memory Usage" for non-text formats. 5. "Batches" comes before memory usage. Before this patch, HashAgg EXPLAIN ANALYZE output would: 1. Start the number of batches at 0. 2. Only display "Hash Batches" when batches > 0. 3. Used the words "HashAgg Batches" for text and non-text formats. 4. Used the words "Peak Memory Usage" for text and non-text formats. 5. "Hash Batches" was written after memory usage. In the attached patch I've changed HashAgg to be aligned to Hash Join on each of the points above. e.g. Before: postgres=# explain analyze select c.relname,count(*) from pg_class c inner join pg_Attribute a on c.oid = a.attrelid group by c.relname; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=138.37..142.23 rows=386 width=72) (actual time=3.121..3.201 rows=427 loops=1) Group Key: c.relname Peak Memory Usage: 109kB -> Hash Join (cost=21.68..124.10 rows=2855 width=64) (actual time=0.298..1.768 rows=3153 loops=1) Hash Cond: (a.attrelid = c.oid) -> Seq Scan on pg_attribute a (cost=0.00..93.95 rows=3195 width=4) (actual time=0.011..0.353 rows=3153 loops=1) -> Hash (cost=16.86..16.86 rows=386 width=68) (actual time=0.279..0.279 rows=427 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 50kB -> Seq Scan on pg_class c (cost=0.00..16.86 rows=386 width=68) (actual time=0.007..0.112 rows=427 loops=1) Planning Time: 0.421 ms Execution Time: 3.294 ms (11 rows) After: postgres=# explain analyze select c.relname,count(*) from pg_class c inner join pg_Attribute a on c.oid = a.attrelid group by c.relname; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ HashAggregate (cost=566.03..580.00 rows=1397 width=72) (actual time=13.097..13.430 rows=1397 loops=1) Group Key: c.relname Batches: 1 Memory Usage: 321kB -> Hash Join (cost=64.43..496.10 rows=13985 width=64) (actual time=0.838..7.546 rows=13985 loops=1) Hash Cond: (a.attrelid = c.oid) -> Seq Scan on pg_attribute a (cost=0.00..394.85 rows=13985 width=4) (actual time=0.010..1.462 rows=13985 loops=1) -> Hash (cost=46.97..46.97 rows=1397 width=68) (actual time=0.820..0.821 rows=1397 loops=1) Buckets: 2048 Batches: 1 Memory Usage: 153kB -> Seq Scan on pg_class c (cost=0.00..46.97 rows=1397 width=68) (actual time=0.009..0.362 rows=1397 loops=1) Planning Time: 0.440 ms Execution Time: 13.634 ms (11 rows) (ignore the change in memory consumption. That was due to adding records for testing) Any objections to this change? David
Attachment
On Mon, Jul 27, 2020 at 10:48:45AM +1200, David Rowley wrote: > On Wed, 1 Jul 2020 at 18:46, Jeff Davis <pgsql@j-davis.com> wrote: > > > > On Tue, Jun 30, 2020, 7:04 PM David Rowley <dgrowleyml@gmail.com> wrote: > >> > >> Does anyone have any objections to that being changed? > > > > That's OK with me. By the way, I'm on vacation and will catch up on these HashAgg threads next week. > > (Adding Justin as I know he's expressed interest in the EXPLAIN output > of HashAgg before) Thanks. It's unrelated to hashAgg vs hashJoin, but I also noticed that this is shown only conditionally: if (es->format != EXPLAIN_FORMAT_TEXT) { if (es->costs && aggstate->hash_planned_partitions > 0) { ExplainPropertyInteger("Planned Partitions", NULL, aggstate->hash_planned_partitions, es); That was conditional since it was introduced at 1f39bce02: if (es->costs && aggstate->hash_planned_partitions > 0) { ExplainPropertyInteger("Planned Partitions", NULL, aggstate->hash_planned_partitions, es); } I think 40efbf870 should've handled this, too. -- Justin
On Mon, 27 Jul 2020 at 14:54, Justin Pryzby <pryzby@telsasoft.com> wrote: > It's unrelated to hashAgg vs hashJoin, but I also noticed that this is shown > only conditionally: > > if (es->format != EXPLAIN_FORMAT_TEXT) > { > if (es->costs && aggstate->hash_planned_partitions > 0) > { > ExplainPropertyInteger("Planned Partitions", NULL, > aggstate->hash_planned_partitions, es); > > That was conditional since it was introduced at 1f39bce02: > > if (es->costs && aggstate->hash_planned_partitions > 0) > { > ExplainPropertyInteger("Planned Partitions", NULL, > aggstate->hash_planned_partitions, es); > } > > I think 40efbf870 should've handled this, too. hmm. I'm not sure. I think this should follow the same logic as what "Disk Usage" follows, and right now we don't show Disk Usage unless we spill. Since we only use partitions when spilling, I don't think it makes sense to show the estimated partitions when we don't plan on spilling. I think if we change this then we should change Disk Usage too. However, I don't think we should as Sort will only show "Disk" if the sort spills. I think Hash Agg should follow that. For the patch I posted yesterday, I'll go ahead in push it in about 24 hours unless there are any objections. David
On Tue, Jul 28, 2020 at 12:54:35PM +1200, David Rowley wrote: > On Mon, 27 Jul 2020 at 14:54, Justin Pryzby <pryzby@telsasoft.com> wrote: > > It's unrelated to hashAgg vs hashJoin, but I also noticed that this is shown > > only conditionally: > > > > if (es->format != EXPLAIN_FORMAT_TEXT) > > { > > if (es->costs && aggstate->hash_planned_partitions > 0) > > { > > ExplainPropertyInteger("Planned Partitions", NULL, > > aggstate->hash_planned_partitions, es); > > > > That was conditional since it was introduced at 1f39bce02: > > > > if (es->costs && aggstate->hash_planned_partitions > 0) > > { > > ExplainPropertyInteger("Planned Partitions", NULL, > > aggstate->hash_planned_partitions, es); > > } > > > > I think 40efbf870 should've handled this, too. > > hmm. I'm not sure. I think this should follow the same logic as what > "Disk Usage" follows, and right now we don't show Disk Usage unless we > spill. Huh ? I'm referring to non-text format, which is what you changed, on the reasoning that the same plan *could* spill: 40efbf8706cdd96e06bc4d1754272e46d9857875 if (es->format != EXPLAIN_FORMAT_TEXT) { if (es->costs && aggstate->hash_planned_partitions > 0) { ExplainPropertyInteger("Planned Partitions", NULL, aggstate->hash_planned_partitions, es); } ... /* EXPLAIN ANALYZE */ ExplainPropertyInteger("Peak Memory Usage", "kB", memPeakKb, es); - if (aggstate->hash_batches_used > 0) - { ExplainPropertyInteger("Disk Usage", "kB", aggstate->hash_disk_used, es); ExplainPropertyInteger("HashAgg Batches", NULL, aggstate->hash_batches_used, es); > Since we only use partitions when spilling, I don't think it > makes sense to show the estimated partitions when we don't plan on > spilling. In any case, my thinking is that we *should* show PlannedPartitions=0, specifically to indicate *that* we didn't plan to spill. > I think if we change this then we should change Disk Usage too. > However, I don't think we should as Sort will only show "Disk" if the > sort spills. I think Hash Agg should follow that. > > For the patch I posted yesterday, I'll go ahead in push it in about 24 > hours unless there are any objections.
On Mon, Jul 27, 2020 at 5:54 PM David Rowley <dgrowleyml@gmail.com> wrote: > hmm. I'm not sure. I think this should follow the same logic as what > "Disk Usage" follows, and right now we don't show Disk Usage unless we > spill. Since we only use partitions when spilling, I don't think it > makes sense to show the estimated partitions when we don't plan on > spilling. I'm confused about what the guiding principles for EXPLAIN ANALYZE output (text or otherwise) are. > I think if we change this then we should change Disk Usage too. > However, I don't think we should as Sort will only show "Disk" if the > sort spills. I think Hash Agg should follow that. I don't follow your remarks here. Separately, I wonder what your opinion is about what should happen for the partial sort related EXPLAIN ANALYZE format open item, described here: https://www.postgresql.org/message-id/flat/20200619040358.GZ17995%40telsasoft.com#b20bd205851a0390220964f7c31b23d1 ISTM that EXPLAIN ANALYZE for incremental sort manages to show the same information as the sort case, aggregated across each tuplesort in a fairly sensible way. (No activity over on the incremental sort thread, so I thought I'd ask again here, while I was reminded of that issue.) -- Peter Geoghegan
On Mon, Jul 27, 2020 at 08:20:45PM -0700, Peter Geoghegan wrote: > On Mon, Jul 27, 2020 at 5:54 PM David Rowley <dgrowleyml@gmail.com> wrote: > > hmm. I'm not sure. I think this should follow the same logic as what > > "Disk Usage" follows, and right now we don't show Disk Usage unless we > > spill. Since we only use partitions when spilling, I don't think it > > makes sense to show the estimated partitions when we don't plan on > > spilling. > > I'm confused about what the guiding principles for EXPLAIN ANALYZE > output (text or otherwise) are. I don't know of a guideline for text format, but the issues I've raised for HashAgg and IncrSort are based on previous commits which change to "show field even if its value is zero" for non-text format: 7d91b604d9b5d6ec8c19c57a9ffd2f27129cdd94 8ebb69f85445177575684a0ba5cfedda8d840a91 3ec20c7091e97a554e7447ac2b7f4ed795631395 > > I think if we change this then we should change Disk Usage too. > > However, I don't think we should as Sort will only show "Disk" if the > > sort spills. I think Hash Agg should follow that. > > I don't follow your remarks here. > > Separately, I wonder what your opinion is about what should happen for > the partial sort related EXPLAIN ANALYZE format open item, described > here: > > https://www.postgresql.org/message-id/flat/20200619040358.GZ17995%40telsasoft.com#b20bd205851a0390220964f7c31b23d1 > > ISTM that EXPLAIN ANALYZE for incremental sort manages to show the > same information as the sort case, aggregated across each tuplesort in > a fairly sensible way. > > (No activity over on the incremental sort thread, so I thought I'd ask > again here, while I was reminded of that issue.) Note that I did mail recently, on this 2nd thread: https://www.postgresql.org/message-id/20200723141454.GF4286@telsasoft.com -- Justin
On Tue, 28 Jul 2020 at 15:01, Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Tue, Jul 28, 2020 at 12:54:35PM +1200, David Rowley wrote: > > On Mon, 27 Jul 2020 at 14:54, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > It's unrelated to hashAgg vs hashJoin, but I also noticed that this is shown > > > only conditionally: > > > > > > if (es->format != EXPLAIN_FORMAT_TEXT) > > > { > > > if (es->costs && aggstate->hash_planned_partitions > 0) > > > { > > > ExplainPropertyInteger("Planned Partitions", NULL, > > > aggstate->hash_planned_partitions, es); > > > > > > That was conditional since it was introduced at 1f39bce02: > > > > > > if (es->costs && aggstate->hash_planned_partitions > 0) > > > { > > > ExplainPropertyInteger("Planned Partitions", NULL, > > > aggstate->hash_planned_partitions, es); > > > } > > > > > > I think 40efbf870 should've handled this, too. > > > > hmm. I'm not sure. I think this should follow the same logic as what > > "Disk Usage" follows, and right now we don't show Disk Usage unless we > > spill. > > Huh ? I'm referring to non-text format, which is what you changed, on the > reasoning that the same plan *could* spill: Oh, right. ... (Sudden bout of confusion due to lack of sleep) Looks like it'll just need this line: if (es->costs && aggstate->hash_planned_partitions > 0) changed to: if (es->costs) I think we'll likely need to maintain not showing that property with explain (costs off) as it'll be a bit more difficult to write regression tests if we display it regardless of that option. David
On Tue, 28 Jul 2020 at 15:21, Peter Geoghegan <pg@bowt.ie> wrote: > Separately, I wonder what your opinion is about what should happen for > the partial sort related EXPLAIN ANALYZE format open item, described > here: > > https://www.postgresql.org/message-id/flat/20200619040358.GZ17995%40telsasoft.com#b20bd205851a0390220964f7c31b23d1 > > ISTM that EXPLAIN ANALYZE for incremental sort manages to show the > same information as the sort case, aggregated across each tuplesort in > a fairly sensible way. > > (No activity over on the incremental sort thread, so I thought I'd ask > again here, while I was reminded of that issue.) TBH, I've not really looked at that. Tom did mention his view on this in [1]. I think that's a pretty good policy. However, I've not looked at the incremental sort EXPLAIN output enough to know how it'll best apply there. David [1] https://www.postgresql.org/message-id/2276865.1593102811%40sss.pgh.pa.us
On Mon, 27 Jul 2020 at 14:54, Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Mon, Jul 27, 2020 at 10:48:45AM +1200, David Rowley wrote: > > On Wed, 1 Jul 2020 at 18:46, Jeff Davis <pgsql@j-davis.com> wrote: > > > > > > On Tue, Jun 30, 2020, 7:04 PM David Rowley <dgrowleyml@gmail.com> wrote: > > >> > > >> Does anyone have any objections to that being changed? > > > > > > That's OK with me. By the way, I'm on vacation and will catch up on these HashAgg threads next week. > > > > (Adding Justin as I know he's expressed interest in the EXPLAIN output > > of HashAgg before) > > Thanks. > > It's unrelated to hashAgg vs hashJoin, but I also noticed that this is shown > only conditionally: > > if (es->format != EXPLAIN_FORMAT_TEXT) > { > if (es->costs && aggstate->hash_planned_partitions > 0) > { > ExplainPropertyInteger("Planned Partitions", NULL, > aggstate->hash_planned_partitions, es); > > That was conditional since it was introduced at 1f39bce02: > > if (es->costs && aggstate->hash_planned_partitions > 0) > { > ExplainPropertyInteger("Planned Partitions", NULL, > aggstate->hash_planned_partitions, es); > } > > I think 40efbf870 should've handled this, too. I pushed that change along with all the other changes mentioned to the EXPLAIN ANALYZE format. David
On Mon, Jul 27, 2020 at 8:36 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > I don't know of a guideline for text format, but the issues I've raised for > HashAgg and IncrSort are based on previous commits which change to "show field > even if its value is zero" for non-text format: But the non-text format for IncrSort shows about the same information as sort, broken out by group. What's missing if you assume that sort is the gold standard? The objection to your argument from James (which could just as easily apply to regular sort from earlier releases) is that accurate information just isn't available as a practical matter. This is due to tuplesort implementation limitations that cannot be fixed now. See the comment block in tuplesort_get_stats() for an explanation. The hard part is showing memory used by external sorts. It's true that "Disk" is specifically shown by sort nodes output in text explain format, but you're talking about non-text formats so that's not really relevant AFAICT sort (and IncrSort) don't fail to show a field value if it is zero. Rather, they consistently show "space used" (in non-text format), which can be either memory or disk space. Technically they don't violate the letter of the law that you cite. (Though admittedly this is a legalistic loophole -- the "space" value presumably has to be interpreted according to different rules for programs that consume non-text EXPLAIN output.) Have I missed something? -- Peter Geoghegan
On Wed, Jul 29, 2020 at 08:35:08PM -0700, Peter Geoghegan wrote: > AFAICT sort (and IncrSort) don't fail to show a field value if it is > zero. Rather, they consistently show "space used" (in non-text > format), which can be either memory or disk space. Technically they > don't violate the letter of the law that you cite. (Though admittedly > this is a legalistic loophole -- the "space" value presumably has to > be interpreted according to different rules for programs that consume > non-text EXPLAIN output.) Sort shows this: Sort Method: "external merge" + Sort Space Used: 19520 + Sort Space Type: "Disk" + Incremental sort shows this: Sort Methods Used: + - "external merge" + Sort Space Disk: + Average Sort Space Used: 128+ Peak Sort Space Used: 128 + So my 2ndary suggestion is to conditionalize based on the method rather than value of space used. --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -2830 +2830 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, - if (groupInfo->maxMemorySpaceUsed > 0) + if (groupInfo->sortMethods & SORT_TYPE_QUICKSORT) @@ -2847 +2847 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, - if (groupInfo->maxDiskSpaceUsed > 0) + if (groupInfo->sortMethods & SORT_TYPE_EXTERNAL_SORT) -- Justin
On Wed, Jul 29, 2020 at 9:05 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > So my 2ndary suggestion is to conditionalize based on the method rather than > value of space used. What's the advantage of doing it that way? -- Peter Geoghegan
Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)
From
Justin Pryzby
Date:
On Wed, Jul 29, 2020 at 09:18:44PM -0700, Peter Geoghegan wrote: > On Wed, Jul 29, 2020 at 9:05 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > So my 2ndary suggestion is to conditionalize based on the method rather than > > value of space used. > > What's the advantage of doing it that way? Because filtering out zero values is exactly what's intended to be avoided for nontext output. I think checking whether the method was used should result in the same output, without the literal check for zero value (which itself sets a bad example). --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -2824,13 +2824,13 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, appendStringInfo(&groupName, "%s Groups", groupLabel); ExplainOpenGroup("Incremental Sort Groups", groupName.data, true, es); ExplainPropertyInteger("Group Count", NULL, groupInfo->groupCount, es); ExplainPropertyList("Sort Methods Used", methodNames, es); - if (groupInfo->maxMemorySpaceUsed > 0) + if (groupInfo->sortMethods & SORT_TYPE_QUICKSORT) { long avgSpace = groupInfo->totalMemorySpaceUsed / groupInfo->groupCount; const char *spaceTypeName; StringInfoData memoryName; spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_MEMORY); @@ -2841,13 +2841,13 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo, ExplainPropertyInteger("Average Sort Space Used", "kB", avgSpace, es); ExplainPropertyInteger("Peak Sort Space Used", "kB", groupInfo->maxMemorySpaceUsed, es); ExplainCloseGroup("Sort Spaces", memoryName.data, true, es); } - if (groupInfo->maxDiskSpaceUsed > 0) + if (groupInfo->sortMethods & SORT_TYPE_EXTERNAL_SORT) { long avgSpace = groupInfo->totalDiskSpaceUsed / groupInfo->groupCount; const char *spaceTypeName; StringInfoData diskName; spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_DISK);
Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)
From
Peter Geoghegan
Date:
On Thu, Jul 30, 2020 at 5:22 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > Because filtering out zero values is exactly what's intended to be avoided for > nontext output. > > I think checking whether the method was used should result in the same output, > without the literal check for zero value (which itself sets a bad example). It seems fine to me as-is. What about SORT_TYPE_TOP_N_HEAPSORT? Or any other sort methods we add in the future? The way that we flatten maxDiskSpaceUsed and maxMemorySpaceUsed into "space used" on output might be kind of questionable, but it's something that we have to live with for the foreseeable future. I don't think that this is a bad example -- we don't output maxDiskSpaceUsed or maxMemorySpaceUsed at the conceptual level. -- Peter Geoghegan
Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)
From
James Coleman
Date:
On Thu, Jul 30, 2020 at 8:22 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Wed, Jul 29, 2020 at 09:18:44PM -0700, Peter Geoghegan wrote:
> On Wed, Jul 29, 2020 at 9:05 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > So my 2ndary suggestion is to conditionalize based on the method rather than
> > value of space used.
>
> What's the advantage of doing it that way?
Because filtering out zero values is exactly what's intended to be avoided for
nontext output.
I think checking whether the method was used should result in the same output,
without the literal check for zero value (which itself sets a bad example).
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2824,13 +2824,13 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
appendStringInfo(&groupName, "%s Groups", groupLabel);
ExplainOpenGroup("Incremental Sort Groups", groupName.data, true, es);
ExplainPropertyInteger("Group Count", NULL, groupInfo->groupCount, es);
ExplainPropertyList("Sort Methods Used", methodNames, es);
- if (groupInfo->maxMemorySpaceUsed > 0)
+ if (groupInfo->sortMethods & SORT_TYPE_QUICKSORT)
{
long avgSpace = groupInfo->totalMemorySpaceUsed / groupInfo->groupCount;
const char *spaceTypeName;
StringInfoData memoryName;
spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_MEMORY);
@@ -2841,13 +2841,13 @@ show_incremental_sort_group_info(IncrementalSortGroupInfo *groupInfo,
ExplainPropertyInteger("Average Sort Space Used", "kB", avgSpace, es);
ExplainPropertyInteger("Peak Sort Space Used", "kB",
groupInfo->maxMemorySpaceUsed, es);
ExplainCloseGroup("Sort Spaces", memoryName.data, true, es);
}
- if (groupInfo->maxDiskSpaceUsed > 0)
+ if (groupInfo->sortMethods & SORT_TYPE_EXTERNAL_SORT)
{
long avgSpace = groupInfo->totalDiskSpaceUsed / groupInfo->groupCount;
const char *spaceTypeName;
StringInfoData diskName;
spaceTypeName = tuplesort_space_type_name(SORT_SPACE_TYPE_DISK);
I very much do not like this approach, and I think it's actually fundamentally wrong, at least for the memory check. Quicksort is not the only option that uses memory. For now, there's only one option that spills to disk (external merge sort), but there's no reason it has to remain that way. And in the future we might accurately report memory consumed even when we've eventually spilled to disk also, so memory used would be relevant potentially even if no in-memory sort was ever performed.
So I'm pretty confident checking the space used is the correct way to do this.
James
Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)
From
Justin Pryzby
Date:
On Thu, Jul 30, 2020 at 06:33:32PM -0700, Peter Geoghegan wrote: > On Thu, Jul 30, 2020 at 5:22 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > Because filtering out zero values is exactly what's intended to be avoided for > > nontext output. > > > > I think checking whether the method was used should result in the same output, > > without the literal check for zero value (which itself sets a bad example). > > It seems fine to me as-is. What about SORT_TYPE_TOP_N_HEAPSORT? Or any > other sort methods we add in the future? Feel free to close it out. I'm satisfied that we've had a discussion about it. -- Justin
Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)
From
Peter Geoghegan
Date:
On Thu, Jul 30, 2020 at 6:39 PM James Coleman <jtc331@gmail.com> wrote: > I very much do not like this approach, and I think it's actually fundamentally wrong, at least for the memory check. Quicksortis not the only option that uses memory. For now, there's only one option that spills to disk (external merge sort),but there's no reason it has to remain that way. I wouldn't be surprised if it was possible to get SORT_TYPE_EXTERNAL_SORT even today (though I'm not sure if that's truly possible). That will happen for a regular sort node if we require randomAccess to the sort, and it happens to spill -- we can randomly access the final tape, but cannot do a final on-the-fly merge. Say for a merge join. -- Peter Geoghegan
Re: HashAgg's batching counter starts at 0, but Hash's starts at 1. (now: incremental sort)
From
Peter Geoghegan
Date:
On Thu, Jul 30, 2020 at 6:40 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > Feel free to close it out. I'm satisfied that we've had a discussion about it. Closed it out. -- Peter Geoghegan