Re: Parallel queries in single transaction - Mailing list pgsql-hackers

From Paul Muntyanu
Subject Re: Parallel queries in single transaction
Date
Msg-id CACnYr+h7yzXtScWPcwHCCkXF0zLmtbqQjcOb_kyLP9JJ5awe+A@mail.gmail.com
Whole thread Raw
In response to Re: Parallel queries in single transaction  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Parallel queries in single transaction  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Hi Tomas, thanks for looking into. I am more talking about queries which can not be optimized, e.g.
* fullscan of the table and heavy calculations for another one.
* query through FDW for both queries(e.g. one query fetches data from Kafka and another one is fetching from remote Postgres. There are no bounds for both queries for anything except local CPU, network and remote machine)

IO bound is not a problem in case if you have multiple tablesapces. And CPU bound can be not the case when you have 32 cores and 6 max workers per query. Then, during nigtly ETL, I do not have anything except single query running) == 6 cores are occupied. If I can run queries in parallel, I would occupy two IO stacks(two tablespaces) + 12 cores instead of sequentially 6 and then again 6.

Hope that makes sense

-P

On 16 Jul 2018, at 11:44, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Hi,

On 07/16/2018 09:45 AM, Paul Muntyanu wrote:
Hello,
   I am working with data warehouse based on postgresql and would like to propose a feature. The idea is to give control and ability for developer to execute queries in parallel within single transaction. Usual flow is next: START_TRANSACTION -> QUERY1 -> QUERY2 -> QUERY3 -> END_TRANSACTION. However sometimes QUERY1 and QUERY2 are independent and can be executed in parallel mode. E.g.: START_TRANSACTION -> DEFINE_QUERY1(no execution) -> DEFINE_QUERY2(no_execution) -> EXECUTE_QUERY1_AND_QUERY2(in parallel) -> QUERY3 -> END
Of course QUERY1 and QUERY2 can be dependent and then this would not work, but sometimes it is useful, especially when you have bound to e.g. CPU and query stuck.

I'm struggling to understand what would be the possible benefits. Either the queries are CPU-bound or stuck (e.g. waiting for I/O), they can't be both at the same time. If a query is stuck, running it concurrently is pointless. If they are CPU-bound, we can run them in parallel (which should produce the results faster).

I'd even dare to say that running the queries concurrently can easily hinder performance, because the queries will compete for parallel workers, preventing some of them from running in parallel mode.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Parallel queries in single transaction
Next
From: Michael Paquier
Date:
Subject: Re: Fix some error handling for read() and errno