Thread: Template for schema? (as opposed to databases)

Template for schema? (as opposed to databases)

From: Dominique Devienne
Hi,

As many have surmised from my previous messages, I need to create a set of schemas:
two schemas that are smallish and distinct, plus an arbitrary number of additional schemas (of a 3rd kind),
which is "largish" (244 tables, 4'000+ columns, 1'300+ relations).

That's why I say / write that I have a system of 2+N schemas.

All 3 schemas types are created from code, which issues a bunch of DDL statements via libpq.

Creating the first two "fixed" schemas is fast enough. 300ms, including populating the 2nd with a bunch of rows.
But creating the 3rd kind, which will be done many times, is not so fast, at 2.5 - 3s. That may seem fast,
but when you do it dozens of times in unit-tests, it adds up quickly to slow-running tests...

My question is what could be done to speed up the creation of that 3rd type of schema?

Right now, that implies quite a few round-trips to the server, about 1'100, but on a LAN with sub-ms latency.
IO-wise, it totals less than 1MB (since it's mostly DDL):

EXEC:      1,125x (        0 rows,        445,373 bytes) in   1.810s (  0.2 MB/s)
RSET:      1,187x (       14 rows,        263,480 bytes) in   0.000s (  0.0 MB/s)

(1.8s is spent in libpq itself; the rest, up to the 2.5s - 3s total, is overhead in our own code to generate the schema).

One idea I had, similar to the template-DB feature, would be to create 1 largish read-only schema at installation time
(i.e. create not only the 2 fixed schemas but all 3 kinds), and have the ability to clone this 3rd (largish) kind entirely server-side.

So is there a way to close a schema easily in PostgreSQL?

Do note our trigger functions use SET search_path, to avoid relying on the session's search_path and to make sure we call "our" functions (in the schema).
(We have different client code-bases; some set the search_path, others don't, and that's the only way we found to have the triggers function correctly.)
This means the template-schema name is part of the DDL for the schema, and a clone would need to use its own search-path, not the original.
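
A sketch of that pattern (schema and column names are made up for illustration; PostgreSQL 11+ syntax):

```sql
-- Pin the search_path on the trigger function itself, so "our" helper
-- functions resolve via the owning schema regardless of what the client
-- session's search_path is set to. Note the schema name ("unit42") is
-- baked into the function's DDL -- which is exactly the cloning problem.
CREATE FUNCTION unit42.touch_modified() RETURNS trigger
LANGUAGE plpgsql
SET search_path = unit42, public
AS $$
BEGIN
    NEW.modified := now();  -- unqualified calls here resolve in unit42 first
    RETURN NEW;
END;
$$;

CREATE TRIGGER touch_modified BEFORE UPDATE ON unit42.some_table
FOR EACH ROW EXECUTE FUNCTION unit42.touch_modified();
```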

At this point, the schema is "standalone", it does not have foreign-keys outside itself, nor does it use functions/procedures from the other 2 schemas.
But in the future it could. And right now, the schema does use functions from extensions (e.g. uuid-ossp, lo, ...). In all cases, the clone should continue
referencing external objects as-is.

If the "SCHEMA template" idea is difficult to pull off, does anyone have experience with the gains from issuing the CREATE TABLE DDLs as part of the CREATE SCHEMA statement itself?
That would result in fewer round-trips, but I somehow doubt it would gain much (again, because we're on a LAN with sub-ms latencies).
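
The single-statement form I mean, with made-up tables:

```sql
-- CREATE SCHEMA accepts embedded schema elements; everything below is
-- ONE statement, hence one round-trip. Unqualified names inside the
-- embedded elements refer to the new schema.
CREATE SCHEMA unit42
    CREATE TABLE parent (id integer PRIMARY KEY)
    CREATE TABLE child  (id integer PRIMARY KEY,
                         parent_id integer REFERENCES parent (id));
```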

Any other tips on how to clone a schema? Or speed up creation of a schema in general?

Please do note this needs to be done from a libpq client, which has no direct access to the server (i.e. cannot use dump/restore solutions).

Thanks for any insights on the above. --DD

Re: Template for schema? (as opposed to databases)

From: "David G. Johnston"
On Tuesday, January 11, 2022, Dominique Devienne <ddevienne@gmail.com> wrote:
This means the template-schema name is part of the DDL for the schema, and a clone would need to use its own search-path, not the original.

This is your major limitation.  You are required to create new objects from code and cannot leverage any kind of copy of existing objects.  Unless the test database need only ever contain the three schemas - then the schemas can be identical, because you can place them into different databases.
 
Please do note this needs to be done from a libpq client, which has no direct access to the server (i.e. cannot use dump/restore solutions).

PQexec and put the entire script into a single string.  That seems like the fastest possible way to create new objects (see the first point).
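
I.e., concatenate all the DDL into one string and send it in a single call (tables illustrative):

```sql
-- Sent as ONE string via PQexec(conn, script): a single round-trip, and
-- the semicolon-separated statements run in one implicit transaction.
CREATE SCHEMA unit42;
CREATE TABLE unit42.parent (id integer PRIMARY KEY);
CREATE TABLE unit42.child  (id integer PRIMARY KEY,
                            parent_id integer REFERENCES unit42.parent (id));
-- ... remaining tables, indexes, triggers ...
```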

Beyond that there is too little detail to provide suggestions.  I would observe that for testing the meaningful values of N are 1 and 2.  Having a setup where N = 5 is not materially different than the N = 2 case (usually).

David J.

Re: Template for schema? (as opposed to databases)

From: Francisco Olarte
Hi Dominique:

On Tue, 11 Jan 2022 at 17:10, Dominique Devienne <ddevienne@gmail.com> wrote:
...
> Creating the first two "fixed" schemas is fast enough. 300ms, including populating the 2nd with a bunch of rows.
> But creating the 3rd kind, which will be done many times, is not so fast, at 2.5 - 3s, which may seem fast,
> but when you do it dozens of time in unit-tests, adds up quickly to slow running tests...

First question would be: are you running a test server? Are you
testing crashes, or just application logic?
For the latter I normally use wal_level=minimal, fsync=off, same as for
full cluster restores (it is faster to redo it if the restore fails, or
re-initdb for test cases), and it normally gives quite a nice performance
boost.

...
> Right now, that implies quite a few round-trips to the server, about 1'100, but on the LAN with sub-ms latency;

How much sub-ms? I mean, I have 0.74 ms pings to my router, but at that
latency your ~1,100 round-trips would be .82s, a good chunk of your quoted
2-3s (and I'm not sure libpq can do every query in one RTT).

...
> One idea I had, similar to the template-DB feature, would be to create 1 largish read-only schema at installation time
> (i.e. not create only 2 but the 3 kinds of schemas), and have the ability to clone this 3rd (largish) kind entirely server-side.

You could try to use do-blocks, or even create procedures, even
parametrized. If all schemas are similar you could theoretically send
a parametrized procedure to create one, send another to execute it N
times, execute it, drop both. Basically, send some of your logic to
the database.
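
For instance, something along these lines (a sketch, names made up; PostgreSQL 11+ for procedures):

```sql
-- A parametrized server-side builder: the DDL text crosses the wire once,
-- then each schema creation is a single CALL instead of ~1,100 round-trips.
CREATE PROCEDURE build_unit_schema(sname text)
LANGUAGE plpgsql AS $$
BEGIN
    EXECUTE format('CREATE SCHEMA %I', sname);
    EXECUTE format('CREATE TABLE %I.parent (id integer PRIMARY KEY)', sname);
    -- ... one EXECUTE format(...) per table / index / trigger ...
END;
$$;

CALL build_unit_schema('unit42');
CALL build_unit_schema('unit43');
DROP PROCEDURE build_unit_schema(text);
```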

> So if there a way to close a schema easily in PostgreSQL?
s/close/clone/, I assume.

...
> Please do note this needs to be done from a libpq client, which has no direct access to the server (i.e. cannot use
> dump/restore solutions).
 

pg_dump/pg_restore work over the same kind of connection as libpq; I even
suspect they are written using libpq (but the need to use your own
client may preclude using dump/restore).

Of course, these are generalities for speeding up tests.

FOS.



Re: Template for schema? (as opposed to databases)

From: Dominique Devienne
On Tue, Jan 11, 2022 at 5:30 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tuesday, January 11, 2022, Dominique Devienne <ddevienne@gmail.com> wrote:
This means the template-schema name is part of the DDL for the schema, and a clone would need to use its own search-path, not the original.

This is your major limitation.  You are required to create new objects from code and cannot leverage any kind of copy of existing objects.

But how to avoid that limitation?

Triggers in a schema should function correctly whether or not client sessions set the search_path or use fully qualified object names.
I was actually surprised that functions from the schema itself (where the trigger is defined) do not "bind more tightly" to the dot (.) schema,
the "owner" schema of the trigger, compared to functions elsewhere.

Perhaps there's something I'm missing around trigger and name resolution?
 
Unless the test database need only be three schemas always - then the schema can be identical because you can place them into different databases.

No, as I wrote, it's 2+N.
 
 
Please do note this needs to be done from a libpq client, which has no direct access to the server (i.e. cannot use dump/restore solutions).

PQexec and put the entire script into a single string.  That seems like the fastest possible way to create new objects (see the first point).

Indeed, that didn't occur to me, as it's not server-side. I might try that, although I'd prefer a server-side solution.
 
Beyond that there is too little detail to provide suggestions.

What kind of further details would you need?
 
I would observe that for testing the meaningful values of N are 1 and 2.  Having a setup where N = 5 is not materially different than the N = 2 case (usually).

W/o more context, sure, that seems logical. But in this case, that's unlikely to be enough to simulate the "real world".

Thanks, --DD

Re: Template for schema? (as opposed to databases)

From: Dominique Devienne
On Tue, Jan 11, 2022 at 5:38 PM Francisco Olarte <folarte@peoplecall.com> wrote:
> Right now, that implies quite a few round-trips to the server, about 1'100, but on the LAN with sub-ms latency;

How much sub-ms? I mean, I have 0.74 pings to my router, but this
would be .82s, a good chunk of your quoted 2-3s, (and I'm not sure if libpq can do all queries in one RTT ).

Windows ping only says:
bytes=32 time<1ms TTL=64 

But Linux ping does say:
icmp_seq=8 ttl=64 time=0.236 ms (I picked the worst one).

So even with 1K round-trips, that should account only for +/- 250ms (1/10th the total). --DD

Re: Template for schema? (as opposed to databases)

From: Dominique Devienne
On Tue, Jan 11, 2022 at 5:38 PM Francisco Olarte <folarte@peoplecall.com> wrote:
On Tue, 11 Jan 2022 at 17:10, Dominique Devienne <ddevienne@gmail.com> wrote:
...
> Creating the first two "fixed" schemas is fast enough. 300ms, including populating the 2nd with a bunch of rows.
> But creating the 3rd kind, which will be done many times, is not so fast, at 2.5 - 3s, which may seem fast,
> but when you do it dozens of time in unit-tests, adds up quickly to slow running tests...

First question would be, are you running a test server?

No. This is a "site" server, used by everyone. Devs and testers / users.
 
Are you testing crashes, or just application logic?

Well, I try to write code that does not crash :)
But yes, it's testing that the results of running the code are as expected.
 
For the latter I normally use wal_level=minimal, fsync=off, same as for
full cluster restores (it is faster to redo it if the restore fails, or
re-initdb for test cases), and it normally gives quite a nice performance boost.

I'm well versed in libpq, not so much in server admin.
The server is "remote" and "just runs", and I connect to it... --DD

Re: Template for schema? (as opposed to databases)

From: "David G. Johnston"
On Tue, Jan 11, 2022 at 9:42 AM Dominique Devienne <ddevienne@gmail.com> wrote:
On Tue, Jan 11, 2022 at 5:30 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Tuesday, January 11, 2022, Dominique Devienne <ddevienne@gmail.com> wrote:
This means the template-schema name is part of the DDL for the schema, and a clone would need to use its own search-path, not the original.

This is your major limitation.  You are required to create new objects from code and cannot leverage any kind of copy of existing objects.

But how to avoid that limitation?

Sometimes limitations cannot be avoided...in this case however:

Get rid of the libpq requirement and you have options...namely I would create a template database (with 4 schemas) during a "build" stage and backup the containing cluster using pg_basebackup; and then during the "testing" I would restore that cluster in the testing environment as part of its setup and then copy the template database as needed to support the testing.  That avoids all of the parse/execute of SQL that happens in the server when using libpq.
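
The per-test copy step is then just (database names illustrative; the template must have no other connections during the copy):

```sql
-- Server-side, file-level copy of the template database; none of the
-- schema's DDL is re-parsed or re-executed.
CREATE DATABASE test_run_1 TEMPLATE unit_template;
-- ... run the tests against test_run_1 ...
DROP DATABASE test_run_1;
```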

Note: pg_dump/pg_restore is a libpq implemented procedure.

 
Beyond that there is too little detail to provide suggestions.

What kind of further details would you need?

The precise nature of these tests.  How many there are of what general types/purpose.  How long they take to run.  What kind of changes to the build/test system are reasonable.  How inflexible are the constraints that have been provided (or, at minimum, why are they there at all).
 
 
I would observe that for testing the meaningful values of N are 1 and 2.  Having a setup where N = 5 is not materially different than the N = 2 case (usually).

W/o more context, sure, that seems logical. But in this case, that's unlikely to be enough to simulate the "real world".


It doesn't really need context - it's a base premise.  There are only three kinds of meaningful cardinality.  Zero, One, More than One (which is sufficiently handled by using a quantity of 2).  Abide by that unless you can argue why breaking down "more than one" is important.

Unless you are doing a performance test, simulating the "real world" to the fullest is not something that your testing environment needs to do.

David J.