Thread: Massively Parallel transactioning?
Is there a way to update a number of databases hosted on a single server
without opening a separate psql connection to each database?

We have a cluster of servers hosting an application on Postgres. Right now, we
have dozens of databases per server, enough that we're starting to have
problems with our update process.

When we release updates, we have a migrate script within our update process
that runs all the database schema updates for all our clients. The way that it
works is to open a transaction on all the databases concurrently, run the
commands in sequence on the databases within the transactions, and then commit
them all (or roll back if there was a problem). This way we can be sure that
either all the databases are in sync, or that we need to roll back the program
patch/update.

So far, it's been a dream, but now, as we continue to grow, we're starting to
reach connection limits per server. Short of raising the number of
simultaneous connections, is there a way to run all the transactions for a
single server for all databases within it on a single (or small number) of
connections?

I've tried the following:

# ATTEMPT 1
$ psql -U postgres template1 -h server1
template1=# begin transaction; create table testtable (name varchar);
BEGIN
CREATE TABLE
template1=# \c somedatabase
psql (8.4.4, server 8.4.0)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
You are now connected to database "somedatabase".
somedatabase=# rollback;
NOTICE:  there is no transaction in progress
ROLLBACK
somedatabase=# \c template1
template1=# rollback;
NOTICE:  there is no transaction in progress
ROLLBACK
template1=#

# ATTEMPT 2
$ psql -U postgres template1 -h server1
template1=# alter table somedatabase.testtable add address varchar;
ERROR:  cross-database references are not implemented: "somedatabase.public.students"
template1=#

Is there a better way?
On Aug 18, 2010, at 4:32 PM, Benjamin Smith wrote:
> Is there a way to update a number of databases hosted on a single server
> without opening a separate psql connection to each database?
>
> We have a cluster of servers hosting an application on Postgres. Right now,
> we have dozens of databases per server, enough that we're starting to have
> problems with our update process.

Any reason you're using multiple databases, rather than multiple schemas?

> Is there a better way?

Schemas, probably. Though lazy schema updates done by the application might
help too.

Cheers,
  Steve
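A minimal sketch of the schema-per-client layout being suggested (the client
names are placeholders, not from the thread; the table matches the OP's
example). With one schema per client in a single database, one connection and
one transaction can cover every client:

-- one database, one schema per client
CREATE SCHEMA client_a;
CREATE SCHEMA client_b;

-- a migration then touches every client atomically, on one connection
BEGIN;
ALTER TABLE client_a.testtable ADD COLUMN address varchar;
ALTER TABLE client_b.testtable ADD COLUMN address varchar;
COMMIT;  -- or ROLLBACK, and no client is left half-migrated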
Hello Benjamin,

On Wed, 2010-08-18 at 16:32 -0700, Benjamin Smith wrote:
> Is there a way to update a number of databases hosted on a single server
> without opening a separate psql connection to each database?

I believe you are more interested in applying an atomic update for all
databases rather than specifically the problem of separate psql connections
to each database. Is that correct?

> This way we can be sure that either all the databases are in sync, or that
> we need to roll back the program patch/update.
>
> Short of raising the number of simultaneous connections, is there a way to
> run all the transactions for a single server for all databases within it on
> a single (or small number) of connections?

It would be easy to extend the ChronicDB live database schema update system
to support an atomic schema change across a multitude of databases.

> Is there a better way?

ChronicDB replicates a database to use a new schema in the background. When
the schema change completes and few incremental changes remain, active
transactions are temporarily paused and then rerouted to the new schema.

This currently works for a schema change of a single database, but the
parallel update you are requesting could be supported.
On Wed, 2010-08-18 at 16:32 -0700, Benjamin Smith wrote:
> So far, it's been a dream, but now, as we continue to grow, we're starting
> to reach connection limits per server. Short of raising the number of
> simultaneous connections, is there a way to run all the transactions for a
> single server for all databases within it on a single (or small number) of
> connections?

Well, if you are just using it for updates to the schema etc., you should
only need to launch a single connection to each database to make those
changes.

Joshua D. Drake
Heyho!

On Thursday 19 August 2010 01.32:06 Benjamin Smith wrote:
> This way we can be sure that either all the databases are in sync, or
> that we need to roll back the program patch/update.

I guess this might be more a hack than a solution: do the updates in batches
and use 2PC (two-phase commit): first connect to batches of databases, but
instead of COMMIT, you issue PREPARE TRANSACTION. Prepared transactions like
this persist across connections, so you can come back later and COMMIT
PREPARED or ROLLBACK PREPARED.

Note that such prepared transactions will hold their locks and use resources
(not sure how many) until they are finally committed or rolled back, so
you'll want to make sure they don't stick around too long.

cheers
-- vbi
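A minimal sketch of that flow on one database (the transaction identifier
'migration_2010_08' and the ALTER are just examples; PREPARE TRANSACTION also
requires max_prepared_transactions to be set above zero on the server):

-- on each database, in its own connection:
BEGIN;
ALTER TABLE testtable ADD COLUMN address varchar;
-- instead of COMMIT, park the transaction under a chosen identifier:
PREPARE TRANSACTION 'migration_2010_08';

-- later, possibly from a different connection, once every database
-- has prepared successfully:
COMMIT PREPARED 'migration_2010_08';

-- or, if any database failed to prepare, undo the ones that did:
ROLLBACK PREPARED 'migration_2010_08';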
On Thu, Aug 19, 2010 at 05:40:21AM +0200, Adrian von Bidder wrote:
> On Thursday 19 August 2010 01.32:06 Benjamin Smith wrote:
> > This way we can be sure that either all the databases are in sync, or
> > that we need to roll back the program patch/update.
>
> I guess this might be more a hack than a solution: do the updates in
> batches and use 2PC (two-phase commit): first connect to batches of
> databases, but instead of COMMIT, you issue PREPARE TRANSACTION. Prepared
> transactions like this persist across connections, so you can come back
> later and COMMIT PREPARED or ROLLBACK PREPARED.

Rather than being a hack, 2PC sounds like it's needed for correctness; how do
you handle the case of only some databases receiving the COMMIT command
otherwise?

--
  Sam  http://samason.me.uk/
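To make that concrete, a rough sketch of how a driver script could sequence
the two phases across all the databases on one server (database names, the
transaction identifier, and the DDL are hypothetical):

#!/bin/sh
# Hypothetical driver: prepare the same migration on every database,
# then commit everywhere only if every PREPARE TRANSACTION succeeded.
GID='migration_2010_08'
DBS='clienta clientb clientc'

prepared=''
failed=0
for db in $DBS; do
  if psql -U postgres -h server1 -d "$db" -v ON_ERROR_STOP=1 <<SQL
BEGIN;
ALTER TABLE testtable ADD COLUMN address varchar;
PREPARE TRANSACTION '$GID';
SQL
  then
    prepared="$prepared $db"
  else
    failed=1
  fi
done

# Second phase: commit everywhere, or roll back only the databases that
# actually managed to prepare.
for db in $prepared; do
  if [ $failed -eq 0 ]; then
    psql -U postgres -h server1 -d "$db" -c "COMMIT PREPARED '$GID'"
  else
    psql -U postgres -h server1 -d "$db" -c "ROLLBACK PREPARED '$GID'"
  fi
done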
On Wednesday, August 18, 2010 08:40:21 pm Adrian von Bidder wrote:
> Heyho!
>
> On Thursday 19 August 2010 01.32:06 Benjamin Smith wrote:
> > This way we can be sure that either all the databases are in sync, or
> > that we need to roll back the program patch/update.
>
> I guess this might be more a hack than a solution: do the updates in
> batches and use 2PC (two-phase commit): first connect to batches of
> databases, but instead of COMMIT, you issue PREPARE TRANSACTION. Prepared
> transactions like this persist across connections, so you can come back
> later and COMMIT PREPARED or ROLLBACK PREPARED.
>
> Note that such prepared transactions will hold their locks and use
> resources (not sure how many) until they are finally committed or rolled
> back, so you'll want to make sure they don't stick around too long.

I can't see how this would be a hack; it's EXACTLY what I'm looking for!

So often I find that when limits in Postgres get in my way, it's because I
don't understand Postgres well enough. Much kudos to all of the Postgres
team!
On Wednesday, August 18, 2010 04:58:08 pm Joshua D. Drake wrote:
> Well, if you are just using it for updates to the schema etc., you should
> only need to launch a single connection to each database to make those
> changes.

And that's exactly the problem. On each server, we have at least dozens of
active databases - one for each client. Opening a connection for each
database starts to become problematic.

This problem is now solved with prepared transactions.
On 08/19/10 20:19, Benjamin Smith wrote:
> On Wednesday, August 18, 2010 04:58:08 pm Joshua D. Drake wrote:
>> Well, if you are just using it for updates to the schema etc., you should
>> only need to launch a single connection to each database to make those
>> changes.
>
> And that's exactly the problem. On each server, we have at least dozens of
> active databases - one for each client. Opening a connection for each
> database starts to become problematic.

Hmm, let's talk about this problem. If I understand you correctly, you have
one server which pushes updates to a number of database servers, with
"dozens" of databases per server?

Let's convert this to numbers and assume that "dozens of databases per
server" means 50, and that you have 50 such servers. This means the server
which pushes the updates needs to connect to 2500 databases. That is way too
small a number of connections (sockets) from a single client to create
problems on the client side.

On the other hand, with 50 databases per server, the update client connects
50 times to the same server, which is only a problem if the server has a
small total number of connections configured (max_connections) - smaller
than 2x the number of databases. Unless you are very careful not to exceed
this number of connections during normal database use (and depending on what
you do, this may or may not be possible), it would actually be more
beneficial to raise max_connections to a sensible value - e.g. 500 (which
would probably need a bump in SEMMNI).
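For reference, a sketch of checking and raising that limit (the value 500
just follows the suggestion above; changing it requires a server restart):

-- check the current limit from any database:
SHOW max_connections;

-- then, in postgresql.conf on each database server (restart required):
-- max_connections = 500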