Re: using custom scan nodes to prototype parallel sequential scan - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: using custom scan nodes to prototype parallel sequential scan
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F801076549@BPXM15GP.gisp.nec.co.jp
In response to Re: using custom scan nodes to prototype parallel sequential scan  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: using custom scan nodes to prototype parallel sequential scan  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
> > Given
> > where we are with the infrastructure, there would be a number of
> > unhandled problems, such as deadlock detection (needs group locking or
> > similar), assessment of quals as to parallel-safety (needs
> > proisparallel or similar), general waterproofing to make sure that
> > pushing down a qual we shouldn't does do anything really dastardly
> > like crash the server (another written but yet-to-be-published patch
> > adds a bunch of relevant guards), and snapshot sharing (likewise).
> > But if you don't do anything weird, it should basically work.
> 
> If we build something fairly restricted, but production ready we can still
> do something useful to users.
> 
> * only functions marked as "CONTAINS NO SQL"
> * parallel_workers = 2 or fixed
> * only one Plan type, designed to maximise benefit and minimize optimizer
> change
> 
> Why?
> 
> * The above is all that is needed to test any infrastructure changes we
> accept. We need both infrastructure and tests. We could do this as a custom
> scan plugin, but why not make it work in core - that doesn't prevent a plugin
> version also if you desire it.
> 
> * only functions marked as "CONTAINS NO SQL"
> We don't really know what proisparallel is, but we do know what CONTAINS
> NO SQL means and can easily check for it.
> Plus I already have a patch for this, slightly bitrotted.
> 
Isn't provolatile = PROVOLATILE_IMMUTABLE sufficient?

> * parallel_workers = 2 (or at least not make it user settable) By fixing
> the number of workers at 2 we avoid any problems caused by having N variable,
> such as how to vary N fairly amongst users and other such considerations.
> We get the main benefit of parallelism, without causing other issues across
> the server.
> 
> * Fixed Plan: aggregate-scan
> To make everything simpler, allow only plans of a single type.
>  SELECT something, list of aggregates
>  FROM foo
>  WHERE filters
>  GROUP BY something
> because we know that passing large amounts of data from worker to master
> process will be slow, so focusing only on seq scan is not sensible; we should
> focus on plans that significantly reduce the number of rows passed upwards.
> We could just do this for very selective WHERE clauses, but that is not
> an important class of query.
> As soon as we include aggregates, we reduce data passing significantly AND
> we hit a very important subset of queries:
> 
> This plan type is widely used in reporting queries, so will hit the mainline
> of BI applications and many Mat View creations.
> This will allow SELECT count(*) FROM foo to go faster also.
> 
I agree with you. The key to performance is how many rows can be reduced
by the parallel workers; however, one thing I worry about is that the
conditions for kicking off parallel execution may be too restrictive.
I'm not sure how many workloads run aggregate functions against flat tables
rather than against multiple joined relations.
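
To make the partial/final split of such a fixed two-worker plan concrete,
here is a toy sketch (plain Python; the names partial_agg and final_agg are
illustrative, not PostgreSQL internals). Each worker reduces its chunk of the
scan to one (count, sum) state per group, and the master only merges states:

```python
def partial_agg(chunk):
    """Worker-side pass: accumulate one partial (count, sum) state per group key."""
    state = {}
    for key, value in chunk:
        cnt, total = state.get(key, (0, 0))
        state[key] = (cnt + 1, total + value)
    return state

def final_agg(partials):
    """Master-side pass: merge the per-worker partial states."""
    merged = {}
    for state in partials:
        for key, (cnt, total) in state.items():
            mcnt, mtotal = merged.get(key, (0, 0))
            merged[key] = (mcnt + cnt, mtotal + total)
    return merged

# Two fixed "workers", each scanning half of the table's (group, value) rows.
rows = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]
chunks = [rows[: len(rows) // 2], rows[len(rows) // 2 :]]
partials = [partial_agg(c) for c in chunks]   # runs concurrently in real workers
result = final_agg(partials)                  # -> {"a": (3, 9), "b": (2, 6)}
```

Only the small per-group states cross the worker/master boundary, which is
exactly why the aggregate-scan shape avoids the row-passing cost.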

Sorry, this might be a topic for a separate thread.
Can't we have a facility to push down aggregate functions across table
joins? Even if we could do this only in some cases, not all, it would make
it sensible to run typical BI workloads in parallel.

Let me assume the following query:
 SELECT SUM(t1.X), SUM(t2.Y)
   FROM t1 JOIN t2 ON t1.pk_id = t2.fk_id
  GROUP BY t1.A, t2.B;

It is a typical query that joins two relations on a PK/FK relationship.

It can be broken down like:
 SELECT SUM(t1.X * sq2.cnt), SUM(sq2.Y)
   FROM t1 JOIN (SELECT t2.fk_id, t2.B, COUNT(*) cnt, SUM(t2.Y) Y
                   FROM t2 GROUP BY t2.fk_id, t2.B) sq2
     ON t1.pk_id = sq2.fk_id
  GROUP BY t1.A, sq2.B;

(The cnt factor compensates for each t1 row matching pre-aggregated sq2 rows
that stand for multiple t2 rows.)

If the FK side has 100 tuples per PK value on average, an aggregate that runs
prior to the join will reduce the number of tuples to be joined by two orders
of magnitude, and we can leverage parallel workers to compute the preliminary
aggregate on a flat table.
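
As a quick sanity check of such a rewrite, here is a minimal sketch using
Python's sqlite3 (the tables and data are made up for illustration; the cnt
column compensates for each t1 row fanning out over several pre-aggregated
t2 groups):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE t1 (pk_id INTEGER PRIMARY KEY, A TEXT, X INTEGER);
    CREATE TABLE t2 (fk_id INTEGER REFERENCES t1, B TEXT, Y INTEGER);
    INSERT INTO t1 VALUES (1,'a',10),(2,'a',20),(3,'b',30);
    INSERT INTO t2 VALUES (1,'p',1),(1,'p',2),(1,'q',3),(2,'p',4),(3,'q',5);
""")

# Join first, then aggregate (the usual plan shape).
plain = cur.execute("""
    SELECT t1.A, t2.B, SUM(t1.X), SUM(t2.Y)
      FROM t1 JOIN t2 ON t1.pk_id = t2.fk_id
     GROUP BY t1.A, t2.B ORDER BY t1.A, t2.B
""").fetchall()

# Aggregate t2 before the join; cnt restores the fan-out for SUM(t1.X).
pushed = cur.execute("""
    SELECT t1.A, sq2.B, SUM(t1.X * sq2.cnt), SUM(sq2.Y)
      FROM t1 JOIN (SELECT fk_id, B, COUNT(*) cnt, SUM(Y) Y
                      FROM t2 GROUP BY fk_id, B) sq2
        ON t1.pk_id = sq2.fk_id
     GROUP BY t1.A, sq2.B ORDER BY t1.A, sq2.B
""").fetchall()

assert plain == pushed  # -> [('a','p',40,7), ('a','q',10,3), ('b','q',30,5)]
```

Both queries return the same groups, so the pre-aggregated form is a valid
rewrite for SUM under this PK/FK join.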

I'm now reviewing prior academic work to identify the conditions under
which a join allows aggregate functions to be pushed down:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.7751

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

