Thread: SPI Concurrency Precautions?
Dear Hackers I've been working on an extension and using SPI to execute some queries. I am in a situation where I have the option to issue multiple queries concurrently, ideally under same snapshot and transaction. In short, I am achieving this by creating multiple dynamic background workers, each one of them executing a query at the same time using SPI_execute(sql_string, ...). To be more precise, sometimes I am also opting to issue a 'CREATE TABLE AS <sql_query>' command, an SPI utility command. I was however wondering whether I can indeed achieve concurrency in this way. My initial results are not showing much difference compared to a not concurrent implementation. If there would be a large lock somewhere in SPI implementation obviously this can be counterintuitive. What would be the precautions I would need to consider when working with SPI in this manner? Thanks, Tom
Re: SPI Concurrency Precautions? Problems with Parallel Execution ofMultiple CREATE TABLE statements
From
Tom Mercha
Date:
On 17/02/2020 21:24, Tom Mercha wrote: > Dear Hackers > > I've been working on an extension and using SPI to execute some queries. > I am in a situation where I have the option to issue multiple queries > concurrently, ideally under same snapshot and transaction. In short, I > am achieving this by creating multiple dynamic background workers, each > one of them executing a query at the same time using > SPI_execute(sql_string, ...). To be more precise, sometimes I am also > opting to issue a 'CREATE TABLE AS <sql_query>' command, an SPI utility > command. > > I was however wondering whether I can indeed achieve concurrency in this > way. My initial results are not showing much difference compared to a > not concurrent implementation. If there would be a large lock somewhere > in SPI implementation obviously this can be counterintuitive. What would > be the precautions I would need to consider when working with SPI in > this manner? > > Thanks, > Tom Dear Hackers I have run some tests to try and better highlight my issue as I am still struggling a lot with it. I have 4 'CREATE TABLE AS' statements of this nature: "CREATE TABLE <different_tbl_name> AS <same_query>". This means that I have different table names for the same query. I am spawning a number of dynamic background workers to execute each statement. When I spawn 4 workers on a quad-core machine, the resutling execution time per statement is {0.158s, 0.216s, 0.399s, 0.396s}. However, when I have just one worker, the results are {0.151s, 0.141s, 0.146s, 0.136s}. The way I am executing my statements is through SPI in each worker (using a PG extension) as follows: SPI_connect(); SPI_exec(queryString, 0); SPI_finish(); In both test cases, SPI_connect/finish are executed 4 times. What I expect is that with 4 workers, each statements will take approx 0.15s to execute since they are independent from each other. This would result in approx a 4x speedup. Despite seeing concurrency, I am seeing that each invdividual statement will take longer to execute. I am struggling to understand this behavior, what this suggests to me is that there is a lock somewhere which completely defeats my purpose. I was wondering how I could execute my CREATE TABLE statements in a parallel fashion given that they are independent from each other. If the lock is the problem, what steps could I take to relax it? I would greatly appreciate any guidance or insights on this topic. Thanks, Tom
Re: SPI Concurrency Precautions? Problems with Parallel Execution ofMultiple CREATE TABLE statements
From
Robert Haas
Date:
On Sat, Feb 22, 2020 at 5:50 AM Tom Mercha <mercha_t@hotmail.com> wrote: > I am spawning a number of dynamic background workers to execute each > statement. When I spawn 4 workers on a quad-core machine, the resutling > execution time per statement is {0.158s, 0.216s, 0.399s, 0.396s}. > However, when I have just one worker, the results are {0.151s, 0.141s, > 0.146s, 0.136s}. > > The way I am executing my statements is through SPI in each worker > (using a PG extension) as follows: > SPI_connect(); > SPI_exec(queryString, 0); > SPI_finish(); > In both test cases, SPI_connect/finish are executed 4 times. > > What I expect is that with 4 workers, each statements will take approx > 0.15s to execute since they are independent from each other. This would > result in approx a 4x speedup. Despite seeing concurrency, I am seeing > that each invdividual statement will take longer to execute. I am > struggling to understand this behavior, what this suggests to me is that > there is a lock somewhere which completely defeats my purpose. > > I was wondering how I could execute my CREATE TABLE statements in a > parallel fashion given that they are independent from each other. If the > lock is the problem, what steps could I take to relax it? I would > greatly appreciate any guidance or insights on this topic. Well, I'm not altogether sure that your expectations are realistic. Rarely do things parallelize perfectly. In a case like this, some time is probably being spent doing disk I/O. When multiple processes do CPU work at the same time, you should be able to see near-linear speedup, but when multiple processes do disk I/O at the same time, you may see no speedup at all, or even a slowdown, because of the way that disks work. This seems especially likely given how short the queries are and the fact that they create a new table, which involves an fsync() operation. It's possible that if you run a query to select the wait events from pg_stat_activity, maybe using psql's \watch with a fractional value, you might be able to see something about what those queries are actually spending time on. It's also possible that you might get more interesting results if you have things that run for longer than a few hundred milliseconds. But in general I would question the assumption that this ought to scale well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company