Re: [HACKERS] Cost model for parallel CREATE INDEX - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Cost model for parallel CREATE INDEX
Msg-id CA+TgmoaTwOUtyOM9GLG4LpANY2qxzy9NZXRKbP0bB26jRgyW4w@mail.gmail.com
In response to Re: [HACKERS] Cost model for parallel CREATE INDEX  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] Cost model for parallel CREATE INDEX  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Sun, Mar 5, 2017 at 7:14 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Sat, Mar 4, 2017 at 2:15 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>> So, I agree with Robert that we should actually use heap size for the
>> main, initial determination of # of workers to use, but we still need
>> to estimate the size of the final index [1], to let the cost model cap
>> the initial determination when maintenance_work_mem is just too low.
>> (This cap will rarely be applied in practice, as I said.)
>>
>> [1] https://wiki.postgresql.org/wiki/Parallel_External_Sort#bt_estimated_nblocks.28.29_function_in_pageinspect
>
> Having looked at it some more, this no longer seems worthwhile. In the
> next revision, I will add a backstop that limits the use of
> parallelism based on a lack of maintenance_work_mem in a simpler
> manner. Namely, the worker will have to be left with a
> maintenance_work_mem/nworkers share of no less than 32MB in order for
> parallel CREATE INDEX to proceed. There doesn't seem to be any great
> reason to bring the volume of data to be sorted into it.

+1.

> I expect the cost model to be significantly simplified in the next
> revision in other ways, too. There will be no new index storage
> parameter, nor a disable_parallelddl GUC. compute_parallel_worker()
> will be called in a fairly straightforward way within
> plan_create_index_workers(), using heap blocks, as agreed to already.

+1.

> pg_restore will avoid parallelism (that will happen by setting
> "max_parallel_workers_maintenance  = 0" when it runs), not because it
> cannot trust the cost model, but because it prefers to parallelize
> things its own way (with multiple restore jobs), and because execution
> speed may not be the top priority for pg_restore, unlike a live
> production system.

This part I'm not sure about.  I think people care quite a lot about
pg_restore speed, because their systems are often down while they're
running it.  And they may have oodles more CPUs than parallel restore
can use without help from parallel query.  I would be inclined to leave
pg_restore alone and let the chips fall where they may.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


