Re: Parallel Sort - Mailing list pgsql-hackers

From Hitoshi Harada
Subject Re: Parallel Sort
Date
Msg-id CAP7QgmnZHZvvbb4hDTJ5-95Zn73TAy6RLXf6VdsiM3TAiZMhcw@mail.gmail.com
In response to Re: Parallel Sort  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers



On Wed, May 15, 2013 at 11:11 AM, Noah Misch <noah@leadboat.com> wrote:
On Wed, May 15, 2013 at 08:12:34AM +0900, Michael Paquier wrote:
> The concept of clause parallelism for backend worker is close to the
> concept of clause shippability introduced in Postgres-XC. In the case of
> XC, the equivalent of the master backend is a backend located on a node
> called Coordinator that merges and organizes results fetched in parallel
> from remote nodes where data scans occur (on nodes called Datanodes). The
> backends used for tuple scans across Datanodes share the same data
> visibility as they use the same snapshot and transaction ID as the backend
> on Coordinator. This is different from the parallelism as there is no idea
> of snapshot import to worker backends.

Worker backends would indeed share snapshot and XID.

> However, the code in XC planner used for clause shippability evaluation is
> definitely worth looking at just considering the many similarities it
> shares with parallelism when evaluating if a given clause can be executed
> on a worker backend or not. It would be a waste to implement the same
> thing twice if there is code already available.

Agreed.  Local parallel query is very similar to distributed query; the
specific IPC cost multipliers differ, but that's about it.  I hope we can
benefit from XC's experience in this area.


I believe parallel execution is much easier if the data is partitioned.  Of course it is possible to parallelize only the sort operation, but then the question becomes how to split the input and pass each tuple to the workers.  XC and Greenplum use the notion of a hash-distributed table, which enables parallel sort (XC doesn't perform parallel sort on replicated tables, I guess).  For postgres, I don't think a hash-distributed table is a foreseeable option, but MergeAppend over inheritance is a good candidate to run in parallel.  You wouldn't even need to modify many lines of the sort execution code if you dispatch the work correctly, since it's just a matter of splitting the query plan and assigning its subnodes to workers.

Transactions and locks will be tricky, though; we might end up introducing a small snapshot-sharing infrastructure for the former, and a notion of session ID rather than process ID for the latter.  I don't think SnapshotNow is the problem, as the executor already reads catalogs with it today.
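To make the MergeAppend idea concrete, here is a minimal sketch (in Python, not PostgreSQL internals; the function names are hypothetical): each worker sorts one partition independently, and the parent performs a k-way merge of the pre-sorted runs, analogous to MergeAppend merging sorted subnodes from inheritance children.

```python
# Sketch only: parallel sort in the MergeAppend-over-partitions style.
# Each "partition" (think: inheritance child table) is sorted by a
# separate worker process; the parent merges the sorted runs, the way
# MergeAppend merges sorted subnode streams.
import heapq
from multiprocessing import Pool

def sort_partition(rows):
    # Worker side: sort one partition independently (per-subnode work).
    return sorted(rows)

def parallel_merge_append(partitions, workers=4):
    # Dispatch one sort per partition to the worker pool.
    with Pool(workers) as pool:
        sorted_runs = pool.map(sort_partition, partitions)
    # Parent side: k-way merge of the already-sorted runs.
    return list(heapq.merge(*sorted_runs))

if __name__ == "__main__":
    parts = [[5, 1, 9], [4, 4, 2], [8, 0, 7]]
    print(parallel_merge_append(parts))  # [0, 1, 2, 4, 4, 5, 7, 8, 9]
```

The point of the sketch is that the sort code itself is untouched; only the dispatch (who sorts which partition) and the final merge are new, which matches the claim above that few lines of sort execution code would need to change.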

Thanks,
Hitoshi


--
Hitoshi Harada
