Re: Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Parallel tuplesort (for parallel B-Tree index creation)
Date
Msg-id CAM3SWZTQHvmPTL-mZ-2gOfPQu3f=MLT0c4HXQPC5=+ESS42hjw@mail.gmail.com
Whole thread Raw
In response to Re: Parallel tuplesort (for parallel B-Tree index creation)  (Claudio Freire <klaussfreire@gmail.com>)
Responses Re: Parallel tuplesort (for parallel B-Tree index creation)  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-hackers
On Thu, Sep 8, 2016 at 8:53 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> setup:
>
> create table lotsofitext(i text, j text, w text, z integer, z2 bigint);
> insert into lotsofitext select cast(random() * 1000000000.0 as text)
> || 'blablablawiiiiblabla', cast(random() * 1000000000.0 as text) ||
> 'blablablawjjjblabla', cast(random() * 1000000000.0 as text) ||
> 'blablabl
> awwwabla', random() * 1000000000.0, random() * 1000000000000.0 from
> generate_series(1, 10000000);
>
> timed:
>
> select count(*) FROM (select * from lotsofitext order by i, j, w, z, z2) t;
>
> Unpatched Time: 100351.251 ms
> Patched Time: 75180.787 ms
>
> That's like a 25% speedup on random input. As we say over here, rather
> badly translated, not a turkey's boogers (meaning "nice!")

Cool! What work_mem setting were you using here?

>> More than using "n" or "memtupcount" what I'm saying is to assert that
>> memtuples[imin] is inside the heap, which would catch the same errors
>> the original assert would, and more.
>>
>> Assert(imin < state->memtupcount)
>>
>> If you prefer.
>>
>> The original asserts allows any value of imin for memtupcount>1, and
>> that's my main concern. It shouldn't.
>
> So, for the assertions to properly avoid clobbering/reading out of
> bounds memory, you need both the above assert:
>
>  +                */
>  +               memtuples[i] = memtuples[imin];
>  +               i = imin;
>  +       }
>  +
>>+       Assert(imin < state->memtupcount);
>  +       memtuples[imin] = *newtup;
>  +}
>
> And another one at the beginning, asserting:
>
>  +       SortTuple  *memtuples = state->memtuples;
>  +       int             n,
>  +                               imin,
>  +                               i;
>  +
>>+       Assert(state->memtupcount > 0 && memtuples[0].tupindex == newtup->tupindex);
>  +
>  +       CHECK_FOR_INTERRUPTS();
>
> It's worth making that change, IMHO, unless I'm missing something.

You're supposed to just not call it with an empty heap, so the
assertions trust that much. I'll look into that.

Currently, producing a new revision of this entire patchset. Improving
the cost model (used when the parallel_workers storage parameter is
not specified within CREATE INDEX) is taking a bit of time, but hope
to have it out in the next couple of days.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Aggregate Push Down - Performing aggregation on foreign server
Next
From: Tom Lane
Date:
Subject: Re: pg_dump with tables created in schemas created by extensions