Thread: Fastest way to clone schema ~1000x

Fastest way to clone schema ~1000x

From
Emiel Mols
Date:
Hello,

To improve our unit and end-to-end testing performance, we are looking to optimize initialization of around 500-1000 database *schemas* from a schema.sql file.

Background: in postgres, you cannot change databases on existing connections, and using schemas would allow us to minimize the amount of active connections needed from our test runners, as we can reuse connections by updating search_path. In our current database-per-test setup we see that there is around a ~30% (!!) total CPU overhead in native_queued_spin_lock_slowpath (as profiled with linux perf), presumably because of the high connection count. We run ~200 tests in parallel to saturate a 128 core machine.

In the schema-per-test setup, however, it becomes harder to cheaply create the schema. Before we could `create database test001 with template testbase` to set up the database for a test, and this was reasonably fast. Re-inserting a schema with ~500 table/index definitions across 500 test schema's is prohibitively expensive (around 1000ms per schema insertion means we're wasting 500 cpu-seconds, and there seems to be quite some lock contention too). Linux perf shows that in this setup we're reducing the native_queued_spin_lock_slowpath overhead to around 10%, but total test time is still slower due to all schema initialization being done. Also it feels quite elegant functions and types can be reused between tests.

Does anyone have some insights or great ideas :)? Also pretty curious to the fundamental reason why having high/changing connection counts to postgres results in this much (spin)lock contention (or perhaps we're doing something wrong in either our configuration or measurements?).

An alternative we haven't explored yet is to see if we can use pgbouncer or other connection pooler to mitigate the 30% issue (set limits so there are only ever X connections to postgres in total, and perhaps max Y per test/database). This does add another piece of infrastructure/complexity, so not really prefered.

Best,

Emiel

Re: Fastest way to clone schema ~1000x

From
Pavel Stehule
Date:
Hi

po 26. 2. 2024 v 7:28 odesílatel Emiel Mols <emiel@crisp.nl> napsal:
Hello,

To improve our unit and end-to-end testing performance, we are looking to optimize initialization of around 500-1000 database *schemas* from a schema.sql file.

Background: in postgres, you cannot change databases on existing connections, and using schemas would allow us to minimize the amount of active connections needed from our test runners, as we can reuse connections by updating search_path. In our current database-per-test setup we see that there is around a ~30% (!!) total CPU overhead in native_queued_spin_lock_slowpath (as profiled with linux perf), presumably because of the high connection count. We run ~200 tests in parallel to saturate a 128 core machine.

In the schema-per-test setup, however, it becomes harder to cheaply create the schema. Before we could `create database test001 with template testbase` to set up the database for a test, and this was reasonably fast. Re-inserting a schema with ~500 table/index definitions across 500 test schema's is prohibitively expensive (around 1000ms per schema insertion means we're wasting 500 cpu-seconds, and there seems to be quite some lock contention too). Linux perf shows that in this setup we're reducing the native_queued_spin_lock_slowpath overhead to around 10%, but total test time is still slower due to all schema initialization being done. Also it feels quite elegant functions and types can be reused between tests.

Does anyone have some insights or great ideas :)? Also pretty curious to the fundamental reason why having high/changing connection counts to postgres results in this much (spin)lock contention (or perhaps we're doing something wrong in either our configuration or measurements?).

An alternative we haven't explored yet is to see if we can use pgbouncer or other connection pooler to mitigate the 30% issue (set limits so there are only ever X connections to postgres in total, and perhaps max Y per test/database). This does add another piece of infrastructure/complexity, so not really prefered.

For testing

a) use templates - CREATE DATABASE test TEMPLATE some;

b) disable fsync (only for testing!!!)

Regards

Pavel


Best,

Emiel

Re: Fastest way to clone schema ~1000x

From
Emiel Mols
Date:
Thanks, as indicated we're using that right now. The 30% spinlock overhead unfortunately persists.

- Fsync was already disabled, too. Complete postgresql.conf used in testing:
listen_addresses = ''
max_connections = 2048
unix_socket_directories = '..'
shared_buffers = 128MB
log_line_prefix = ''
synchronous_commit = 'off'
wal_level = 'minimal'

- linux perf report comparing schema-per-test vs database-per-test: https://ibb.co/CW5w2MW

- Emiel


On Mon, Feb 26, 2024 at 1:36 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

po 26. 2. 2024 v 7:28 odesílatel Emiel Mols <emiel@crisp.nl> napsal:
Hello,

To improve our unit and end-to-end testing performance, we are looking to optimize initialization of around 500-1000 database *schemas* from a schema.sql file.

Background: in postgres, you cannot change databases on existing connections, and using schemas would allow us to minimize the amount of active connections needed from our test runners, as we can reuse connections by updating search_path. In our current database-per-test setup we see that there is around a ~30% (!!) total CPU overhead in native_queued_spin_lock_slowpath (as profiled with linux perf), presumably because of the high connection count. We run ~200 tests in parallel to saturate a 128 core machine.

In the schema-per-test setup, however, it becomes harder to cheaply create the schema. Before we could `create database test001 with template testbase` to set up the database for a test, and this was reasonably fast. Re-inserting a schema with ~500 table/index definitions across 500 test schema's is prohibitively expensive (around 1000ms per schema insertion means we're wasting 500 cpu-seconds, and there seems to be quite some lock contention too). Linux perf shows that in this setup we're reducing the native_queued_spin_lock_slowpath overhead to around 10%, but total test time is still slower due to all schema initialization being done. Also it feels quite elegant functions and types can be reused between tests.

Does anyone have some insights or great ideas :)? Also pretty curious to the fundamental reason why having high/changing connection counts to postgres results in this much (spin)lock contention (or perhaps we're doing something wrong in either our configuration or measurements?).

An alternative we haven't explored yet is to see if we can use pgbouncer or other connection pooler to mitigate the 30% issue (set limits so there are only ever X connections to postgres in total, and perhaps max Y per test/database). This does add another piece of infrastructure/complexity, so not really prefered.

For testing

a) use templates - CREATE DATABASE test TEMPLATE some;

b) disable fsync (only for testing!!!)

Regards

Pavel


Best,

Emiel

Re: Fastest way to clone schema ~1000x

From
Pavel Stehule
Date:


po 26. 2. 2024 v 8:08 odesílatel Emiel Mols <emiel@crisp.nl> napsal:
Thanks, as indicated we're using that right now. The 30% spinlock overhead unfortunately persists.

try to increase shared_buffer

128MB can be too low

max_connection = 2048 - it unrealistic high


- Fsync was already disabled, too. Complete postgresql.conf used in testing:
listen_addresses = ''
max_connections = 2048
unix_socket_directories = '..'
shared_buffers = 128MB
log_line_prefix = ''
synchronous_commit = 'off'
wal_level = 'minimal'

- linux perf report comparing schema-per-test vs database-per-test: https://ibb.co/CW5w2MW

- Emiel


On Mon, Feb 26, 2024 at 1:36 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

po 26. 2. 2024 v 7:28 odesílatel Emiel Mols <emiel@crisp.nl> napsal:
Hello,

To improve our unit and end-to-end testing performance, we are looking to optimize initialization of around 500-1000 database *schemas* from a schema.sql file.

Background: in postgres, you cannot change databases on existing connections, and using schemas would allow us to minimize the amount of active connections needed from our test runners, as we can reuse connections by updating search_path. In our current database-per-test setup we see that there is around a ~30% (!!) total CPU overhead in native_queued_spin_lock_slowpath (as profiled with linux perf), presumably because of the high connection count. We run ~200 tests in parallel to saturate a 128 core machine.

In the schema-per-test setup, however, it becomes harder to cheaply create the schema. Before we could `create database test001 with template testbase` to set up the database for a test, and this was reasonably fast. Re-inserting a schema with ~500 table/index definitions across 500 test schema's is prohibitively expensive (around 1000ms per schema insertion means we're wasting 500 cpu-seconds, and there seems to be quite some lock contention too). Linux perf shows that in this setup we're reducing the native_queued_spin_lock_slowpath overhead to around 10%, but total test time is still slower due to all schema initialization being done. Also it feels quite elegant functions and types can be reused between tests.

Does anyone have some insights or great ideas :)? Also pretty curious to the fundamental reason why having high/changing connection counts to postgres results in this much (spin)lock contention (or perhaps we're doing something wrong in either our configuration or measurements?).

An alternative we haven't explored yet is to see if we can use pgbouncer or other connection pooler to mitigate the 30% issue (set limits so there are only ever X connections to postgres in total, and perhaps max Y per test/database). This does add another piece of infrastructure/complexity, so not really prefered.

For testing

a) use templates - CREATE DATABASE test TEMPLATE some;

b) disable fsync (only for testing!!!)

Regards

Pavel


Best,

Emiel

Re: Fastest way to clone schema ~1000x

From
Emiel Mols
Date:
We've experimented with shared buffers to no effect.

The 2048 we actually need for our test setup. The way this works is that we have a single preforked backend for all tests where each backend worker maintains persistent connections *per test* (in database-per-test), so with say 50 backend processes, they might each meantain ~10-50 connections to postgres. That's the reason for looking into schema-per-test that would require only 1 persistent connection per worker and possibly only changing the search_path to handle a request (and hencefort cheap initialization of schema.sql being a new impediment).

But based on your answer, we will do some more research into pooling these requests over pgbouncer with pool_mode=transaction. That should multiplex all these backend->bouncer connections over a lot less connections to postgres itself.

Thanks!

On Mon, Feb 26, 2024 at 2:14 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:


po 26. 2. 2024 v 8:08 odesílatel Emiel Mols <emiel@crisp.nl> napsal:
Thanks, as indicated we're using that right now. The 30% spinlock overhead unfortunately persists.

try to increase shared_buffer

128MB can be too low

max_connection = 2048 - it unrealistic high


- Fsync was already disabled, too. Complete postgresql.conf used in testing:
listen_addresses = ''
max_connections = 2048
unix_socket_directories = '..'
shared_buffers = 128MB
log_line_prefix = ''
synchronous_commit = 'off'
wal_level = 'minimal'

- linux perf report comparing schema-per-test vs database-per-test: https://ibb.co/CW5w2MW

- Emiel


On Mon, Feb 26, 2024 at 1:36 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi

po 26. 2. 2024 v 7:28 odesílatel Emiel Mols <emiel@crisp.nl> napsal:
Hello,

To improve our unit and end-to-end testing performance, we are looking to optimize initialization of around 500-1000 database *schemas* from a schema.sql file.

Background: in postgres, you cannot change databases on existing connections, and using schemas would allow us to minimize the amount of active connections needed from our test runners, as we can reuse connections by updating search_path. In our current database-per-test setup we see that there is around a ~30% (!!) total CPU overhead in native_queued_spin_lock_slowpath (as profiled with linux perf), presumably because of the high connection count. We run ~200 tests in parallel to saturate a 128 core machine.

In the schema-per-test setup, however, it becomes harder to cheaply create the schema. Before we could `create database test001 with template testbase` to set up the database for a test, and this was reasonably fast. Re-inserting a schema with ~500 table/index definitions across 500 test schema's is prohibitively expensive (around 1000ms per schema insertion means we're wasting 500 cpu-seconds, and there seems to be quite some lock contention too). Linux perf shows that in this setup we're reducing the native_queued_spin_lock_slowpath overhead to around 10%, but total test time is still slower due to all schema initialization being done. Also it feels quite elegant functions and types can be reused between tests.

Does anyone have some insights or great ideas :)? Also pretty curious to the fundamental reason why having high/changing connection counts to postgres results in this much (spin)lock contention (or perhaps we're doing something wrong in either our configuration or measurements?).

An alternative we haven't explored yet is to see if we can use pgbouncer or other connection pooler to mitigate the 30% issue (set limits so there are only ever X connections to postgres in total, and perhaps max Y per test/database). This does add another piece of infrastructure/complexity, so not really prefered.

For testing

a) use templates - CREATE DATABASE test TEMPLATE some;

b) disable fsync (only for testing!!!)

Regards

Pavel


Best,

Emiel

Re: Fastest way to clone schema ~1000x

From
Daniel Gustafsson
Date:
> On 26 Feb 2024, at 09:46, Emiel Mols <emiel@crisp.nl> wrote:

> The 2048 we actually need for our test setup.

There is a measurable overhead in connections, regardless of if they are used
or not.  If you are looking to squeeze out performance then doing more over
already established connections, and reducing max_connections, is a good place
to start.

--
Daniel Gustafsson




Re: Fastest way to clone schema ~1000x

From
Emiel Mols
Date:
On Mon, Feb 26, 2024 at 3:50 PM Daniel Gustafsson <daniel@yesql.se> wrote:
There is a measurable overhead in connections, regardless of if they are used
or not.  If you are looking to squeeze out performance then doing more over
already established connections, and reducing max_connections, is a good place
to start.

Clear, but with database-per-test (and our backend setup), it would have been *great* if we could have switched database on the same connection (similar to "USE xxx" in mysql). That would limit the connections to the amount of workers, not multiplied by tests.

Even with a pooler, we're still going to be maintaining 1000s of connections from the backend workers to the pooler. I would expect this to be rather efficient, but still unnecessary. Also, both pgbouncer/pgpool don't seem to support switching database in-connection (they could have implemented the aforementioned "USE" statement I think!). [Additionally we're using PHP that doesn't seem to have a good shared memory pool implementation -- pg_pconnect is pretty buggy].

I'll continue with some more testing. Thanks for now!

Re: Fastest way to clone schema ~1000x

From
Adrian Klaver
Date:
On 2/26/24 01:06, Emiel Mols wrote:
> On Mon, Feb 26, 2024 at 3:50 PM Daniel Gustafsson <daniel@yesql.se 
> <mailto:daniel@yesql.se>> wrote:
> 
>     There is a measurable overhead in connections, regardless of if they
>     are used
>     or not.  If you are looking to squeeze out performance then doing
>     more over
>     already established connections, and reducing max_connections, is a
>     good place
>     to start.
> 
> 
> Clear, but with database-per-test (and our backend setup), it would have 
> been *great* if we could have switched database on the same connection 
> (similar to "USE xxx" in mysql). That would limit the connections to the 
> amount of workers, not multiplied by tests.

That is because:

https://dev.mysql.com/doc/refman/8.3/en/glossary.html#glos_schema

"In MySQL, physically, a schema is synonymous with a database. You can 
substitute the keyword SCHEMA instead of DATABASE in MySQL SQL syntax, 
for example using CREATE SCHEMA instead of CREATE DATABASE. "


> 
> Even with a pooler, we're still going to be maintaining 1000s of 
> connections from the backend workers to the pooler. I would expect this 
> to be rather efficient, but still unnecessary. Also, both 
> pgbouncer/pgpool don't seem to support switching database in-connection 
> (they could have implemented the aforementioned "USE" statement I 
> think!). [Additionally we're using PHP that doesn't seem to have a good 
> shared memory pool implementation -- pg_pconnect is pretty buggy].
> 
> I'll continue with some more testing. Thanks for now!

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Fastest way to clone schema ~1000x

From
Emiel Mols
Date:
An update on this for anyone else who has similar issues/considerations:
- As suggested, we are sticking to 1 database-per-test, and not a schema-per-test.
- We moved our testing setup to maintain at most 1 connection per backend worker. If a request for a different test comes in, we close and reconnect to the different DB (in the future, we might test some affinity based routing here). This does empirically seem to improve performance around ~5-10% over maintaining 1000+ connections. It probably helps that the connection is local over a unix domain socket. We now set max_connections in testing to 512, where we usually max out at ~200.
- We tried PgBouncer, which seems to work as well, but doesn't improve performance much compared to the previous point. (Also it adds quite some complexity, we still need to maintain the 1000+ connections to the pooler as it doesn't support changing databases and the user names need to be known/configured ahead of time which is annoying for us)
- We looked more into the `native_queued_spin_lock_slowpath` issue, distilling a flamegraph to identify the actual caller(s): our latest thinking now is that this is an artifact of measurement with perf/bpf (!!), so the concerns with this can be ignored otherwise. Flame graph: https://ibb.co/sW34mgq. Reason to think so is that `bpf_trampoline_6442485415_0` is in its call path. (Please do reach out if you think this is wrong)

Thanks again for the quick help.

- Emiel

On Mon, Feb 26, 2024 at 5:10 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 2/26/24 01:06, Emiel Mols wrote:
> On Mon, Feb 26, 2024 at 3:50 PM Daniel Gustafsson <daniel@yesql.se
> <mailto:daniel@yesql.se>> wrote:
>
>     There is a measurable overhead in connections, regardless of if they
>     are used
>     or not.  If you are looking to squeeze out performance then doing
>     more over
>     already established connections, and reducing max_connections, is a
>     good place
>     to start.
>
>
> Clear, but with database-per-test (and our backend setup), it would have
> been *great* if we could have switched database on the same connection
> (similar to "USE xxx" in mysql). That would limit the connections to the
> amount of workers, not multiplied by tests.

That is because:

https://dev.mysql.com/doc/refman/8.3/en/glossary.html#glos_schema

"In MySQL, physically, a schema is synonymous with a database. You can
substitute the keyword SCHEMA instead of DATABASE in MySQL SQL syntax,
for example using CREATE SCHEMA instead of CREATE DATABASE. "


>
> Even with a pooler, we're still going to be maintaining 1000s of
> connections from the backend workers to the pooler. I would expect this
> to be rather efficient, but still unnecessary. Also, both
> pgbouncer/pgpool don't seem to support switching database in-connection
> (they could have implemented the aforementioned "USE" statement I
> think!). [Additionally we're using PHP that doesn't seem to have a good
> shared memory pool implementation -- pg_pconnect is pretty buggy].
>
> I'll continue with some more testing. Thanks for now!

--
Adrian Klaver
adrian.klaver@aklaver.com