Re: 9.6 -> 10.0 - Mailing list pgsql-advocacy
From: Robert Haas
Subject: Re: 9.6 -> 10.0
Msg-id: CA+TgmoZBiGvnQrjN7+KKseo1cRtpgJ0EkXSyNiNMYw1SbygAFQ@mail.gmail.com
In response to: Re: 9.6 -> 10.0 (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: 9.6 -> 10.0 (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-advocacy
On Tue, Apr 5, 2016 at 10:25 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 22 March 2016 at 20:45, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> While having parallelism is awesome, it's only going to affect a
>>> (arguably small or big depending on your viewpoint) subset of users. It's
>>> going to be massive for those users, but it's not going to be useful for
>>> anywhere near as many users as streaming replication+hot standby+pg_upgrade
>>> in 9.0, or pitr+windows in 8.0. And yes, the vacuum freeze thing is also
>>> going to be great - for a small subset of users (yes, those users are in a
>>> lot of pain now).
>>
>> We don't yet have full parallel query, we only have parallel scan and
>> parallel aggregation.
>
> My comment here missed the point that parallel hash join is also now
> possible for small hash tables, so we at least have a useful subset of
> functionality across parallel scan/join/agg.

Not sure if this matters to you, but nested loops with an inner index scan also work. The thing we don't support in parallel yet is merge joins. The reason for that is that, while it's pretty obvious how to parallelize a hash join or nested loop - just have each process handle some of the rows - it's really unclear what it means to do a merge join in parallel. In fact, you basically can't; it's an inherently serial algorithm.

My understanding of the literature in this area is that the trick used by other systems is basically to do a bunch of small merge joins instead of one big one. For example, if you have two compatibly partitioned tables, you can merge join each pair of partitions instead of merge-joining the appendrels. Then you get a bunch of small merge joins that can be scheduled across as many workers as you have. Once we have declarative partitioning, this should be doable.
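[Editorial sketch, not part of the original thread.] The "bunch of small merge joins" idea can be illustrated outside the database. In this hypothetical Python sketch, none of the names correspond to PostgreSQL internals, and threads stand in for what would really be separate worker processes: two compatibly partitioned inputs are merge-joined pairwise, and each pairwise join is an independent task for the pool.

```python
# Hypothetical sketch of the "bunch of small merge joins" strategy.
# All names are illustrative; PostgreSQL workers are processes, but this
# demo uses threads for simplicity.
from concurrent.futures import ThreadPoolExecutor

def merge_join(left, right):
    """Serial merge join of two lists of (key, value) rows, each sorted by key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # pair this left row with every right row sharing the key
            j2 = j
            while j2 < len(right) and right[j2][0] == lk:
                out.append((lk, left[i][1], right[j2][1]))
                j2 += 1
            i += 1
    return out

def parallel_partitionwise_join(left_parts, right_parts, workers=4):
    # Each pair of compatible partitions is an independent merge join,
    # so the pairs can be scheduled freely across the worker pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(merge_join, left_parts, right_parts)
    return [row for part in results for row in part]
```

The serial inner join is unchanged; only the scheduling of the per-partition joins is parallel, which is exactly why compatible partitioning is the enabling step.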
There's more that can be done: given two tables partitioned incompatibly, or one partitioned table and one unpartitioned table, the literature talks about doing an on-the-fly repartitioning of one of the tables to match the existing partitioning scheme of the other, after which you again have N separate merge joins that you can schedule across your pool of workers.

What's not clear to me is whether trying to get this sort of thing working is the best use of developer time. At least in the short term, I think there are other parallel query limitations more in need of being lifted.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
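[Editorial sketch, not part of the original thread.] The on-the-fly repartitioning step can be sketched the same way. Assuming a range-partitioning scheme with known bounds, routing the rows of the unpartitioned table into buckets that mirror the other table's partitions again yields N independent merge joins. The names and the bisect-based routing are illustrative only, not how PostgreSQL implements anything.

```python
# Hypothetical sketch: repartition an unpartitioned table on the fly to
# match an assumed range-partitioning scheme, so each bucket can be
# merge-joined with the corresponding partition independently.
import bisect

def repartition(rows, bounds):
    """Route (key, value) rows into len(bounds)+1 range buckets.
    bounds = [10, 20] means ranges (-inf, 10), [10, 20), [20, +inf)."""
    buckets = [[] for _ in range(len(bounds) + 1)]
    for row in rows:
        buckets[bisect.bisect_right(bounds, row[0])].append(row)
    return buckets
```

Each bucket then lines up with one partition of the other table, and a separate worker can run each of the resulting small merge joins.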