Re: SPI Concurrency Precautions? Problems with Parallel Execution ofMultiple CREATE TABLE statements - Mailing list pgsql-hackers

From Tom Mercha
Subject Re: SPI Concurrency Precautions? Problems with Parallel Execution ofMultiple CREATE TABLE statements
Date
Msg-id AM6PR05MB5080A6871E34CA17B196F922F4EE0@AM6PR05MB5080.eurprd05.prod.outlook.com
Whole thread Raw
In response to SPI Concurrency Precautions?  (Tom Mercha <mercha_t@hotmail.com>)
Responses Re: SPI Concurrency Precautions? Problems with Parallel Execution ofMultiple CREATE TABLE statements
List pgsql-hackers
On 17/02/2020 21:24, Tom Mercha wrote:
> Dear Hackers
> 
> I've been working on an extension and using SPI to execute some queries. 
> I am in a situation where I have the option to issue multiple queries 
> concurrently, ideally under same snapshot and transaction. In short, I 
> am achieving this by creating multiple dynamic background workers, each 
> one of them executing a query at the same time using 
> SPI_execute(sql_string, ...). To be more precise, sometimes I am also 
> opting to issue a 'CREATE TABLE AS <sql_query>' command, an SPI utility 
> command.
> 
> I was however wondering whether I can indeed achieve concurrency in this 
> way. My initial results are not showing much difference compared to a 
> not concurrent implementation. If there would be a large lock somewhere 
> in SPI implementation obviously this can be counterintuitive. What would 
> be the precautions I would need to consider when working with SPI in 
> this manner?
> 
> Thanks,
> Tom

Dear Hackers

I have run some tests to try and better highlight my issue as I am still 
struggling a lot with it.

I have 4 'CREATE TABLE AS' statements of this nature: "CREATE TABLE 
<different_tbl_name> AS <same_query>". This means that I have different 
table names for the same query.

I am spawning a number of dynamic background workers to execute each 
statement. When I spawn 4 workers on a quad-core machine, the resutling 
execution time per statement is {0.158s, 0.216s, 0.399s, 0.396s}. 
However, when I have just one worker, the results are {0.151s, 0.141s, 
0.146s, 0.136s}.

The way I am executing my statements is through SPI in each worker 
(using a PG extension) as follows:
  SPI_connect();
  SPI_exec(queryString, 0);
  SPI_finish();
In both test cases, SPI_connect/finish are executed 4 times.

What I expect is that with 4 workers, each statements will take approx 
0.15s to execute since they are independent from each other. This would 
result in approx a 4x speedup. Despite seeing concurrency, I am seeing 
that each invdividual statement will take longer to execute. I am 
struggling to understand this behavior, what this suggests to me is that 
there is a lock somewhere which completely defeats my purpose.

I was wondering how I could execute my CREATE TABLE statements in a 
parallel fashion given that they are independent from each other. If the 
lock is the problem, what steps could I take to relax it? I would 
greatly appreciate any guidance or insights on this topic.

Thanks,
Tom



pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Do we ever edit release notes after release to add warnings about incompatibilities?
Next
From: Tomas Vondra
Date:
Subject: Re: Parallel copy