Home > mailing lists

Re: Parallel queries in single transaction - Mailing list pgsql-hackers

From	Paul Muntyanu
Subject	Re: Parallel queries in single transaction
Date	July 16, 2018 16:03:08
Msg-id	CACnYr+h7yzXtScWPcwHCCkXF0zLmtbqQjcOb_kyLP9JJ5awe+A@mail.gmail.com Whole thread Raw
In response to	Re: Parallel queries in single transaction (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses	Re: Parallel queries in single transaction (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List	pgsql-hackers

Tree view

Hi Tomas, thanks for looking into. I am more talking about queries which can not be optimized, e.g.

* fullscan of the table and heavy calculations for another one.

* query through FDW for both queries(e.g. one query fetches data from Kafka and another one is fetching from remote Postgres. There are no bounds for both queries for anything except local CPU, network and remote machine)

IO bound is not a problem in case if you have multiple tablesapces. And CPU bound can be not the case when you have 32 cores and 6 max workers per query. Then, during nigtly ETL, I do not have anything except single query running) == 6 cores are occupied. If I can run queries in parallel, I would occupy two IO stacks(two tablespaces) + 12 cores instead of sequentially 6 and then again 6.

Hope that makes sense

-P

On 16 Jul 2018, at 11:44, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Hi,

On 07/16/2018 09:45 AM, Paul Muntyanu wrote:
Hello,
I am working with data warehouse based on postgresql and would like to propose a feature. The idea is to give control and ability for developer to execute queries in parallel within single transaction. Usual flow is next: START_TRANSACTION -> QUERY1 -> QUERY2 -> QUERY3 -> END_TRANSACTION. However sometimes QUERY1 and QUERY2 are independent and can be executed in parallel mode. E.g.: START_TRANSACTION -> DEFINE_QUERY1(no execution) -> DEFINE_QUERY2(no_execution) -> EXECUTE_QUERY1_AND_QUERY2(in parallel) -> QUERY3 -> END
Of course QUERY1 and QUERY2 can be dependent and then this would not work, but sometimes it is useful, especially when you have bound to e.g. CPU and query stuck.

I'm struggling to understand what would be the possible benefits. Either the queries are CPU-bound or stuck (e.g. waiting for I/O), they can't be both at the same time. If a query is stuck, running it concurrently is pointless. If they are CPU-bound, we can run them in parallel (which should produce the results faster).

I'd even dare to say that running the queries concurrently can easily hinder performance, because the queries will compete for parallel workers, preventing some of them from running in parallel mode.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

From: Tomas Vondra
Date: 16 July 2018, 15:44:50
Subject: Re: Parallel queries in single transaction

From: Michael Paquier
Date: 16 July 2018, 17:09:19
Subject: Re: Fix some error handling for read() and errno

Re: Parallel queries in single transaction - Mailing list pgsql-hackers

Previous

Next