Re: [HACKERS] Cost model for parallel CREATE INDEX - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Cost model for parallel CREATE INDEX
Date
Msg-id CA+Tgmob16p5EN5TMQHzbNuA-BJtEPFj860XnegM2-3OW=1CEbA@mail.gmail.com
Whole thread Raw
In response to [HACKERS] Cost model for parallel CREATE INDEX  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [HACKERS] Cost model for parallel CREATE INDEX  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Wed, Mar 1, 2017 at 12:58 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> * This scales based on output size (projected index size), not input
> size (heap scan input). Apparently, that's what we always do right
> now.

Actually, I'm not aware of any precedent for that. I'd just pass the
heap size to compute_parallel_workers(), leaving the index size as 0,
and call it good.  What you're doing now seems exactly backwards from
parallel query generally.

> So, the main factor that
> discourages parallel sequential scans doesn't really exist for
> parallel CREATE INDEX.

Agreed.

> We could always defer the cost model to another release, and only
> support the storage parameter for now, though that has disadvantages,
> some less obvious [4].

I think it's totally counter-intuitive that any hypothetical index
storage parameter would affect the degree of parallelism involved in
creating the index and also the degree of parallelism involved in
scanning it.  Whether or not other systems do such crazy things seems
to me to beside the point.  I think if CREATE INDEX allows an explicit
specification of the degree of parallelism (a decision I would favor)
it should have a syntactically separate place for unsaved build
options vs. persistent storage parameters.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [HACKERS] Proposal : For Auto-Prewarm.
Next
From: David Steele
Date:
Subject: Re: [HACKERS] Indirect indexes