Re: [PATCH] Incremental sort (was: PoC: Partial sort) - Mailing list pgsql-hackers
From | James Coleman |
---|---|
Subject | Re: [PATCH] Incremental sort (was: PoC: Partial sort) |
Date | |
Msg-id | CAAaqYe9qzKbxCvSp3dfLkuS1v8KKnB7kW3z-hZ2jnAQaveSm8w@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Incremental sort (was: PoC: Partial sort) (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
List | pgsql-hackers |
On Mon, Apr 6, 2020 at 7:31 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Mon, Apr 06, 2020 at 07:09:11PM -0400, James Coleman wrote: > >On Mon, Apr 6, 2020 at 6:13 PM Tomas Vondra > ><tomas.vondra@2ndquadrant.com> wrote: > >> > >> On Mon, Apr 06, 2020 at 05:47:48PM -0400, James Coleman wrote: > >> >On Mon, Apr 6, 2020 at 5:40 PM Tomas Vondra > >> ><tomas.vondra@2ndquadrant.com> wrote: > >> >> > >> >> On Mon, Apr 06, 2020 at 11:12:32PM +0200, Tomas Vondra wrote: > >> >> >On Mon, Apr 06, 2020 at 04:54:38PM -0400, Alvaro Herrera wrote: > >> >> >>On 2020-Apr-06, Tom Lane wrote: > >> >> >> > >> >> >>>Locally, things pass without force_parallel_mode, but turning it on > >> >> >>>produces failures that look similar to rhinoceros's (didn't examine > >> >> >>>other BF members). > >> >> >> > >> >> >>FWIW I looked at the eight failures there were about fifteen minutes ago > >> >> >>and they were all identical. I can confirm that, in my laptop, the > >> >> >>tests work without that GUC, and fail in exactly that way with it. > >> >> >> > >> >> > > >> >> >Yes, there's a thinko in show_incremental_sort_info() and it returns too > >> >> >soon. I'll push a fix in a minute. > >> >> > > >> >> > >> >> OK, I've pushed a fix - this should make the buildfarm happy again. > >> >> > >> >> It however seems to me a bit more needs to be done. The fix makes > >> >> show_incremental_sort_info closer to show_sort_info, but not entirely > >> >> because IncrementalSortState does not have sort_Done flag so it still > >> >> depends on (fullsortGroupInfo->groupCount > 0). I haven't noticed that > >> >> before, but not having that flag seems a bit weird to me. > >> >> > >> >> It also seems possibly incorrect - we may end up with > >> >> > >> >> fullsortGroupInfo->groupCount == 0 > >> >> prefixsortGroupInfo->groupCount > 0 > >> >> > >> >> but we won't print anything. > >> > > >> >This shouldn't ever be possible, because the only way we get any > >> >prefix groups at all is if we've already sorted a full sort group > >> >during the mode transition. > >> > > >> >> James, any opinion on this? I'd say we should restore the sort_Done flag > >> >> and make it work as in plain Sort. Or some comment explaining why > >> >> depending on the counts is OK (assuming it is). > >> > > >> >There's previous email traffic on this thread about that (I can look > >> >it up later this evening), but the short of it is that I believe that > >> >relying on the group count is actually more correct than a sort_Done > >> >flag in the case of incremental sort (in contrast to regular sort). > >> > > >> > >> OK. Maybe we should add a comment to explain.c saying it's OK. > >> > >> I've pushed a fix for failures due to different planned workers (in the > >> test I added to show changes due to add_partial_path tweaks). > >> > >> It seems we're not out of the woods yet, though. rhinoceros and > >> sidewinder failed with something like this: > >> > >> Sort Method: quicksort Memory: NNkB > >> + Sort Method: unknown Disk: NNkB > >> > >> Would you mind investigating at it? > > > >I assume that means those build farm members run with very low > >work_mem? Is it an acceptable fix to adjust work_mem up a bit just for > >these tests? Or is that bad practice and these are to expose issues > >with changing into disk sort mode? > > > > I don't think so - I don't see any work_mem changes in the config - see > the extra_config at the beginning of the page with details: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2020-04-06%2023%3A00%3A16 > > Moreover, this seems to be in regular Sort, not Incremental Sort and it > very much seems like it gets confused to print a worker info because the > only way for Sort to print two "Sort Method" lines seems to be to enter > either both > > if (sortstate->sort_Done && sortstate->tuplesortstate != NULL) > { > ... print leader info ... > } > > and > > if (sortstate->shared_info != NULL) > { > for (n = 0; n < sortstate->shared_info->num_workers; n++) > { > ... print worker info ... > } > } > > or maybe there are two workers? It's strange ... > > > It doesn't seem to be particularly platform-specific, but I've been > unable to reproduce it so far. It seems on older gcc versions, though. I haven't been able to reproduce it, but I'm 99% confident this will fix it: - if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS) + if (sinstrument->sortMethod == SORT_TYPE_STILL_IN_PROGRESS + || sinstrument->sortMethod == NULL) continue; /* ignore any unfilled slots */ Earlier we'd had this discussion about why SORT_TYPE_STILL_IN_PROGRESS was explicitly set to 0 in the enum declaration. Since there was no comment, we changed that, but here I believe that show_sort_info was relying on that as an indicator that a worker didn't actually do any work (since the DSM for the sort node gets set to all zeros, this would work). I'm not sure if the SORT_TYPE_STILL_IN_PROGRESS case is actually still needed, though. I've attached both a fix for this issue and a comment for the full/prefix sort group if blocks. James
Attachment
pgsql-hackers by date: