Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [PATCH] Incremental sort (was: PoC: Partial sort) |
Date | |
Msg-id | 20200407010137.2bk2ljqk7gmlcfsi@development Whole thread Raw |
In response to | Re: [PATCH] Incremental sort (was: PoC: Partial sort) (James Coleman <jtc331@gmail.com>) |
List | pgsql-hackers |
On Mon, Apr 06, 2020 at 07:27:19PM -0400, James Coleman wrote: >On Mon, Apr 6, 2020 at 7:09 PM James Coleman <jtc331@gmail.com> wrote: >> >> On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra >> <tomas.vondra@2ndquadrant.com> wrote: >> > >> > On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote: >> > >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra >> > ><tomas.vondra@2ndquadrant.com> wrote: >> > >> >> > >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote: >> > >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: >> > >> >>On 2020-Apr-06, Tom Lane wrote: >> > >> >> >> > >> >>>Locally, things pass without force_parallel_mode, but turning it on >> > >> >>>produces failures that look similar to rhinoceros's (didn't examine >> > >> >>>other BF members). >> > >> >> >> > >> >>FWIW I looked at the eight failures there were about fifteen minutes ago >> > >> >>and they were all identical. I can confirm that, in my laptop, the >> > >> >>tests work without that GUC, and fail in exactly that way with it. >> > >> >> >> > >> > >> > >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too >> > >> >soon. I'll push a fix in a minute. >> > >> > >> > >> >> > >> OK, I've pushed a fix - this should make the buildfarm happy again. >> > >> >> > >> It however seems to me a bit more needs to be done. The fix makes >> > >> show_incremental_sort_info closer to show_sort_info, but not entirely >> > >> because IncrementalSortState does not have sort_Done flag so it still >> > >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that >> > >> before, but not having that flag seems a bit weird to me. >> > >> >> > >> It also seems possibly incorrect - we may end up with >> > >> >> > >> fullsortGroupInfo->groupCount == 0 >> > >> prefixsortGroupInfo->groupCount > 0 >> > >> >> > >> but we won't print anything. >> > > >> > >This shouldn't ever be possible, because the only way we get any >> > >prefix groups at all is if we've already sorted a full sort group >> > >during the mode transition. >> > > >> > >> James, any opinion on this? I'd say we should restore the sort_Done flag >> > >> and make it work as in plain Sort. Or some comment explaining why >> > >> depending on the counts is OK (assuming it is). >> > > >> > >There's previous email traffic on this thread about that (I can look >> > >it up later this evening), but the short of it is that I believe that >> > >relying on the group count is actually more correct than a sort_Done >> > >flag in the case of incremental sort (in contrast to regular sort). >> > > >> > >> > OK. Maybe we should add a comment to explain.c saying it's OK. >> > >> > I've pushed a fix for failures due to different planned workers (in the >> > test I added to show changes due to add_partial_path tweaks). >> > >> > It seems we're not out of the woods yet, though. rhinoceros and >> > sidewinder failed with something like this: >> > >> > Sort Method: quicksort Memory: NNkB >> > + Sort Method: unknown Disk: NNkB >> > >> > Would you mind investigating at it? >> >> I assume that means those build farm members run with very low >> work_mem? Is it an acceptable fix to adjust work_mem up a bit just for >> these tests? Or is that bad practice and these are to expose issues >> with changing into disk sort mode? > >On rhinoceros I see: > >================== pgsql.build/src/test/regress/regression.diffs >=================== >diff -U3 /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out >/opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out >--- /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/expected/subselect.out >2020-03-14 10:37:49.156761104 -0700 >+++ /opt/src/pgsql-git/build-farm-root/HEAD/pgsql.build/src/test/regress/results/subselect.out >2020-04-06 16:01:13.766798059 -0700 >@@ -1328,8 +1328,9 @@ > -> Sort (actual rows=3 loops=1) > Sort Key: sq_limit.c1, sq_limit.pk > Sort Method: top-N heapsort Memory: xxx >+ Sort Method: unknown Disk: 0kB > -> Seq Scan on sq_limit (actual rows=8 loops=1) >-(6 rows) >+(7 rows) > >Same on sidewinder. > >Given the 0kB I'm not sure this is *just* a work_mem thing, though >that's still something I'm curious to know about, and it's still part >of the "problem" here. > >I'm investigating further. > I don't see how could this be caused by low work_mem value, really. What I think it happening is that when executed with force_parallel_mode = regress we run this as if in a parallel worker, but then it gets somehow confused and either (leader + worker) or two worker stats, or something like that. I don't know why it's happening, though, or why would it be triggered by the incremental sort patch ... Actually, I just managed to trigger exactly this - the trick is that we plan for certain number of workers, but then fail to start some. So for example like this: create table t (a int, b int, c int); insert into t select mod(i,10), mod(i,10), i from generate_series(1,1000000) s(i); set force_parallel_mode = regress; set max_parallel_workers = 0; explain (costs off, analyze) select * from t order by a,b; QUERY PLAN --------------------------------------------------------------- Sort (actual time=0.010..0.010 rows=0 loops=1) Sort Key: a, b Sort Method: quicksort Memory: 25kB Sort Method: unknown Disk: 0kB -> Seq Scan on t (actual time=0.003..0.004 rows=0 loops=1) Planning Time: 0.040 ms Execution Time: 0.229 ms (7 rows) So we actually do have to print two lines, because without any workers the leader ends up doing the work. But we don't know this is happening because the the number of workers started is included in Gather node, but force_parallel_mode=regress hides that. The question is why are these failures correlated with incremental sort. I don't think we've tweaked Sort at all, no? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: