Thread: Massively Parallel transactioning?

Massively Parallel transactioning?

From
Benjamin Smith
Date:
Is there a way to update a number of databases hosted on a single server
without opening a separate psql connection to each database?

We have a cluster of servers hosting an application on Postgres. Right now, we
have dozens of databases per server, enough that we're starting to have
problems with our update process.

When we release updates, we have a migrate script within our update process
that runs all the database schema updates for all our clients. The way that it
works is to open a transaction on all the databases concurrently, run the
commands in sequence on each database within its transaction, and then commit
them all (or roll back if there was a problem).

This way we can be sure that either all the databases are in synch, or that we
need to rollback the program patch/update.
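
In rough outline, what runs over each database's connection is something like
this (the ALTER below is only a placeholder, not our real schema changes):

BEGIN;
-- this release's schema changes for this client database, for example:
ALTER TABLE students ADD COLUMN address varchar;
-- the transaction is held open until every database has applied its changes;
-- the migrate script then sends one of the following on every connection:
COMMIT;
-- or, if anything failed anywhere:
-- ROLLBACK;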

So far, it's been a dream, but now, as we continue to grow, we're starting to
reach connection limits per server. Short of raising the number of
simultaneous connections, is there a way to run all the transactions for all
the databases on a single server over a single connection (or a small number
of connections)?

I've tried the following:

# ATTEMPT 1
$ psql -U postgres template1 -h server1
template1=# begin transaction; create table testtable (name varchar);
BEGIN
CREATE TABLE
template1=# \c somedatabase
psql (8.4.4, server 8.4.0)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
You are now connected to database "somedatabase".
somedatabase=# rollback;
NOTICE:  there is no transaction in progress
ROLLBACK
somedatabase=# \c template1;
template1=# rollback;
NOTICE:  there is no transaction in progress
ROLLBACK
template1=#

# ATTEMPT 2
$ psql -U postgres template1 -h server1
template1=# alter table somedatabase.testtable add address varchar;
ERROR:  cross-database references are not implemented:
"somedatabase.public.students"
template1=#

Is there a better way?



Re: Massively Parallel transactioning?

From
Steve Atkins
Date:
On Aug 18, 2010, at 4:32 PM, Benjamin Smith wrote:

> Is there a way to update a number of databases hosted on a single server
> without opening a separate psql connection to each database?
>
> We have a cluster of servers hosting an application on Postgres. Right now, we
> have dozens of databases per server, enough that we're starting to have
> problems with our update process.
>

Any reason you're using multiple databases, rather than multiple schemas?

> Is there a better way?

Schemas, probably. Though lazy schema updates done by the application might help too.
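
With one database and a schema per client, the whole update could also go
through a single connection and a single transaction, roughly like this
(client_a and client_b are made-up schema names):

BEGIN;
ALTER TABLE client_a.students ADD COLUMN address varchar;
ALTER TABLE client_b.students ADD COLUMN address varchar;
-- ... one statement per client schema ...
COMMIT;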

Cheers,
  Steve


Re: Massively Parallel transactioning?

From
ChronicDB Community Team
Date:
Hello Benjamin,

On Wed, 2010-08-18 at 16:32 -0700, Benjamin Smith wrote:
> Is there a way to update a number of databases hosted on a single server
> without opening a separate psql connection to each database?

I believe you are more interested in applying an atomic update for all
databases rather than specifically the problem of separate psql
connections to each database. Is that correct?

> This way we can be sure that either all the databases are in synch, or that we
> need to rollback the program patch/update.

> simultaneous connections, is there a way to run all the transactions for all
> the databases on a single server over a single connection (or a small number
> of connections)?

It would be easy to extend the ChronicDB live database schema update
system to support an atomic schema change across a multitude of
databases.

> Is there a better way?

ChronicDB replicates a database to use a new schema in the background.
When the schema change nears completion and only a few incremental changes
remain, active transactions are temporarily paused and then rerouted to the new
active transactions are temporarily paused and then rerouted to the new
schema. This currently works for a schema change of a single database,
but the parallel update you are requesting could be supported.


Re: Massively Parallel transactioning?

From
"Joshua D. Drake"
Date:
On Wed, 2010-08-18 at 16:32 -0700, Benjamin Smith wrote:

> So far, it's been a dream, but now, as we continue to grow, we're starting to
> reach connection limits per server. Short of raising the number of
> simultaneous connections, is there a way to run all the transactions for all
> the databases on a single server over a single connection (or a small number
> of connections)?

Well, if you are just using it for updates to the schema etc., you
should only need to launch a single connection to each database to make
those changes.

Joshua D. Drake



Re: Massively Parallel transactioning?

From
Adrian von Bidder
Date:
Heyho!

On Thursday 19 August 2010 01.32:06 Benjamin Smith wrote:
> This way we can be sure that either all the databases are in synch, or
> that we  need to rollback the program patch/update.

I guess this might be more a hack than a solution: do the updates in batches
and use 2pc: first connect to batches of databases, but instead of commit,
you "prepare to commit".  Prepared commits like this are persistent accross
connections, so you can come back later and commit or rollback.

Note that such prepared commits will block (some) stuff and use resources
(not sure how many) before they are finally committed or rolled back, so
you'll want to make sure they don't stick around too long.
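
A rough sketch, per database (the table and transaction name here are made
up, and prepared transactions have to be enabled: max_prepared_transactions
must be at least as large as the number you leave pending at once):

BEGIN;
-- the release's schema changes, e.g.:
ALTER TABLE students ADD COLUMN address varchar;
PREPARE TRANSACTION 'release_2010_08_19';
-- the connection can be closed now; later, from any new connection to the
-- same database:
COMMIT PREPARED 'release_2010_08_19';
-- or: ROLLBACK PREPARED 'release_2010_08_19';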

cheers
-- vbi


Re: Massively Parallel transactioning?

From
Sam Mason
Date:
On Thu, Aug 19, 2010 at 05:40:21AM +0200, Adrian von Bidder wrote:
> On Thursday 19 August 2010 01.32:06 Benjamin Smith wrote:
> > This way we can be sure that either all the databases are in synch, or
> > that we  need to rollback the program patch/update.
>
> I guess this might be more a hack than a solution: do the updates in batches
> and use 2pc: first connect to batches of databases, but instead of commit,
> you "prepare to commit".  Prepared commits like this are persistent accross
> connections, so you can come back later and commit or rollback.

Rather than being a hack, 2PC sounds like it's needed for correctness;
how do you handle the case of only some databases receiving the COMMIT
command otherwise?

--
  Sam  http://samason.me.uk/

Re: Massively Parallel transactioning?

From
Benjamin Smith
Date:
On Wednesday, August 18, 2010 08:40:21 pm Adrian von Bidder wrote:
> Heyho!
>
> On Thursday 19 August 2010 01.32:06 Benjamin Smith wrote:
> > This way we can be sure that either all the databases are in synch, or
> > that we  need to rollback the program patch/update.
>
> I guess this might be more a hack than a solution: do the updates in
> batches and use 2pc: first connect to batches of databases, but instead of
> commit, you "prepare to commit".  Prepared commits like this are
> persistent across connections, so you can come back later and commit or
> rollback.
>
> Note that such prepared commits will block (some) stuff and use resources
> (not sure how many) before they are finally committed or rolled back, so
> you'll want to make sure they don't stick around too long.

I can't see how this would be a hack; it's EXACTLY what I'm looking for!

So often I find that when limits in Postgres get in my way, it's because I
don't understand Postgres well enough.

Much kudos to all of the Postgres team!



Re: Massively Parallel transactioning?

From
Benjamin Smith
Date:
On Wednesday, August 18, 2010 04:58:08 pm Joshua D. Drake wrote:
> Well if you are just using it for updates to the schema etc... you
> should only need to launch a single connection to each database to make
> those changes.

And that's exactly the problem. On each server, we have at least dozens of
active databases - one for each client. Opening a connection for each database
starts to become problematic.

This problem is now solved with prepared transactions.



Re: Massively Parallel transactioning?

From
Ivan Voras
Date:
On 08/19/10 20:19, Benjamin Smith wrote:
> On Wednesday, August 18, 2010 04:58:08 pm Joshua D. Drake wrote:
>> Well if you are just using it for updates to the schema etc... you
>> should only need to launch a single connection to each database to make
>> those changes.
>
> And that's exactly the problem. On each server, we have at least dozens of
> active databases - one for each client. Opening a connection for each database
> starts to become problematic.

Hmm, let's talk about this problem.

If I understand you correctly, you have an arrangement where one server
pushes the updates to a number of other database servers, each hosting
"dozens" of databases?

Let's try to convert this to numbers and assume that "dozens of
databases per server" means "50" and that you have 50 such servers.
This means that the server which pushes the updates needs to connect to
2500 databases. That is way too small a number of connections (sockets)
from a single client to create problems on the client side.

On the other hand, if you have, say, 50 databases per server, this means
that the update client connects 50 times to the same server, which is
only a problem if the server has a small total number of connections
configured (max_connections) - smaller than 2x the number of databases.
Unless you are very careful not to actually exceed this number of
connections during normal database use (and depending on what you do
this may or may not be possible), it would actually be more beneficial
to raise max_connections to a sensible value - e.g. 500 (which would
probably need a bump in SEMMNI).
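
Checking first is cheap (the 500 above is only an example figure;
max_connections lives in postgresql.conf and changing it needs a restart):

SHOW max_connections;
SELECT count(*) FROM pg_stat_activity;  -- connections currently in use
-- if needed, in postgresql.conf on each database server:
--   max_connections = 500
-- then restart PostgreSQL (and possibly raise the kernel's SEMMNI).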