Thread: 9.1.2: Preventing connections / syncing a database
Our development workstations maintain a local copy of the production database
(which has been small enough that this is workable right now, and preferable
to having a test database with generated fake data). We've been doing this by
rsync'ing a compressed pg_dump from the production server, dropping and
recreating the database, and doing a pg_restore. This is finally causing
enough pain that it's worth revisiting:

1. You can't drop a database with open connections; as our app gets more
   complex, it gets more painful to bring all of it down and back up again,
   since it's so darned smart about reconnecting automatically, and we use
   secondary components like Sphinx and queue-runners which also open the
   database but which are launched under separate cover.

2. pg_restore on a 400MB database is starting to take a few minutes.

My intent was to only worry about #1 for now. With flat files, you usually
solve this by renaming the file away and then creating a new one. But
Postgres is smarter than me, and you can't rename a database with open
connections either. Phooey. So I tried this:

  alter database rails_dev connection limit 0;
  select pg_terminate_backend(procpid) from pg_stat_activity
    where datname = 'rails_dev';

But the app was still able to reconnect. Hmm. Then I tried this:

  revoke connect on database rails_dev from public;
  select pg_terminate_backend(procpid) from pg_stat_activity
    where datname = 'rails_dev';

Still, the app can reconnect. (So can psql.) So...

1. How can I prevent (or redirect, or timeout, or anything) new connections?
   I think superuser roles might be exempt from connection limits and
   privileges. I could repeatedly terminate backends until I'm able to rename
   the database, but... ick.

2. What's a better way to slave to a master database without being a
   read-only slave? In other words, we want to use the production database as
   a starting point each morning, but we'll be creating test rows and tables
   that we wouldn't want to propagate to production. Can I do this while the
   database is live through some existing replication tool? The production
   database is on Ubuntu but the workstations are Macs, FWIW.

Jay
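For reference, the refresh Jay describes boils down to something like the
following sketch. The production host and dump path are placeholders, not
details from the thread; rails_dev is the database name used later on.

  #!/usr/bin/env bash
  # Rough sketch of the nightly refresh described above.
  set -e

  # Pull the latest compressed custom-format dump from production
  # (host and path are placeholders).
  rsync -avz produser@production:/backups/production.dump /tmp/production.dump

  # Drop and recreate the dev database -- this is the step that fails while
  # the app (or Sphinx, queue-runners, psql...) still holds connections.
  dropdb rails_dev
  createdb rails_dev

  # Restore; on a ~400MB database this is the part that takes a few minutes.
  pg_restore -d rails_dev /tmp/production.dump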
> 1. How can I prevent (or redirect, or timeout, or anything) new connections?
> I think superuser roles might be exempt from connection limits and
> privileges. I could repeatedly terminate backends until I'm able to rename
> the database, but... ick.

Superusers have a pool of reserved connections outside of the normal
connection limits for a database. What I usually do is this:

  update pg_database set datallowconn = false where datname = 'foobar';

> 2. What's a better way to slave to a master database without being a
> read-only slave? In other words, we want to use the production database as
> a starting point each morning, but we'll be creating test rows and tables
> that we wouldn't want to propagate to production. Can I do this while the
> database is live through some existing replication tool? The production
> database is on Ubuntu but the workstations are Macs, FWIW.

How incremental does it need to be? You could use Bucardo to create slaves
that can still be written to. Then in the morning you would simply kick off a
sync to bring the slave up to date with the master (and optionally remove any
test rows, etc.) Many caveats there, of course - it depends on your exact
needs. If you have the right hardware/software, using snapshots or clones is
an excellent way to make dev databases as well.

If the pg_dump / restore is working for you, I'd keep that as long as you
can. Try fiddling with some of the compression options, etc. to maximize
speed. Quick ideas: try nc or tweak rsync, and on the slave: turn fsync off,
boost maintenance_work_mem and checkpoint_segments, and look into parallel
restore.

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201201011447
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
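Greg's restore-speed ideas, made concrete as a sketch. The numbers are
illustrative starting points, not recommendations from the thread, and
checkpoint_segments is the 9.1-era setting (later replaced by max_wal_size).

  # In postgresql.conf on the dev box -- safe only because this copy is
  # disposable; never run production with fsync off:
  #   fsync = off
  #   maintenance_work_mem = 1GB
  #   checkpoint_segments = 32

  # Restore from a custom-format dump using several parallel jobs.
  # (pg_restore -j needs the -Fc or -Fd formats, not a plain SQL dump.)
  pg_restore -j 4 -d rails_dev /tmp/production.dump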
On 01/01/2012 11:51 AM, Jay Levitt wrote:
> revoke connect on database rails_dev from public;
> select pg_terminate_backend(procpid) from pg_stat_activity where
> datname='rails_dev';
>
> Still, the app can reconnect. (So can psql.)
>
> So...
>
> 1. How can I prevent (or redirect, or timeout, or anything) new
> connections? I think superuser roles might be exempt from connection
> limits and privileges. I could repeatedly terminate backends until
> I'm able to rename the database, but... ick.
>
> 2. What's a better way to slave to a master database without being a
> read-only slave? In other words, we want to use the production
> database as a starting point each morning, but we'll be creating test
> rows and tables that we wouldn't want to propagate to production. Can
> I do this while the database is live through some existing replication
> tool? The production database is on Ubuntu but the workstations are
> Macs, FWIW.
>
> Jay

More along the lines of what Greg has said. Not clear to me why an individual
dev box needs to be that current data-wise. Of course, stopping and starting
your app should be easy, especially for the developers, so maybe that's a
better place to start. Then dev can do it when and how often suits dev best
(even cronning "shutdown app; reload db" to happen at 3am Sundays).
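Rob's cron suggestion, spelled out as a sketch. Every path and service below
is a placeholder for your own stack, not something from this thread.

  #!/usr/bin/env bash
  # refresh_dev_db.sh -- sketch of "shutdown app; reload db" on a schedule.
  set -e

  # Stop whatever holds connections here: the app server, Sphinx, queue
  # runners, etc. (site-specific commands).

  dropdb rails_dev
  createdb rails_dev
  pg_restore -d rails_dev /tmp/production.dump

  # ...then start everything back up (also site-specific).

  # Crontab entry to run this at 3am on Sundays, as Rob suggests:
  #   0 3 * * 0  $HOME/bin/refresh_dev_db.sh >> /tmp/refresh_dev_db.log 2>&1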
Rob Sargent wrote:
> Not clear to me why an individual dev box needs to be that current
> data-wise.

It's significantly easier to debug a production problem when you can
duplicate the problem on your local machine. "Hey, when I go to JennyC's
activity page and scroll down three times, I see out-of-order postings." Our
site is a dynamic Facebook-style feed, and you're usually seeing only the
newest content, so it's a truism that any noticeable production bug will
involve some user who registered yesterday and posted something this morning.

Likewise, new code has a much better chance of working on production if it's
been tested against production data. I know big shops use fancy technology
like "test plans" and "QA people". But. (There's actually a good argument
that we don't WANT that.) Keeping the dev boxes up to date, daily, is the
poor man's regression test.

> Of course stopping and starting your app should be easy, especially for the
> developers so maybe that's a better place to start. Then dev can do it when
> and how often suits dev best (even cronning shutdown app; reload db; to
> happen 3am Sundays)

Ah ha! Clearly you don't develop on a laptop ☺ cron jobs are no longer useful
for things like that, because the laptop is in your backpack and asleep at
3am. Yeah, it would run when you wake up, but what's more annoying than
opening your laptop and having it freeze while it swaps everything back in,
finds your network, runs your cron jobs, rearranges your windows, etc?

And yes, shutting down the app isn't as hard as I claim - it's two or three
commands - but developers are lazy and avoid friction. If you have to stop
the app for five minutes to update, you'll "do it later". It's like waiting
for a compile; it interrupts your rhythm.

As Rails developers, we're spoiled; there's no compile, there's no local
deploy. You change a line of source code, you switch windows, your editor
automatically saves when it loses focus, you refresh the browser, Rails
automatically reloads the changed code, and you see the change. (There are
three different browser extensions that will automatically refresh your
browser, too, in case that was too hard.)

TL;DR: Reduce friction -> more frequent database updates -> fewer bugs.
Greg Sabino Mullane wrote:
> update pg_database set datallowconn = false where datname = 'foobar';

That's perfect - thanks. Now I can (I think) do this:

  pg_restore -d rails_dev_new
  [wait]
  psql template1
    update pg_database set datallowconn = false where datname = 'rails_dev';
    select pg_terminate_backend(procpid) from pg_stat_activity
      where datname = 'rails_dev';
    begin;
    alter database rails_dev rename to rails_dev_old;
    alter database rails_dev_new rename to rails_dev;
    commit;
    drop database rails_dev_old;
    \q

>> 2. What's a better way to slave to a master database without being a
>> read-only slave? In other words, we want to use the production database as
>> a starting point each morning, but we'll be creating test rows and tables
>> that we wouldn't want to propagate to production. Can I do this while the
>> database is live through some existing replication tool? The production
>> database is on Ubuntu but the workstations are Macs, FWIW.
>
> How incremental does it need to be? You could use Bucardo to create slaves
> that can still be written to. Then in the morning you would simply kick off
> a sync to bring the slave up to date with the master (and optionally remove
> any test rows, etc.) Many caveats there, of course - it depends on your
> exact needs. If you have the right hardware/software, using snapshots or
> clones is an excellent way to make dev databases as well.

Bucardo looks great for replication, but it mentions that it won't do DDL. I
think that means if someone added a new column to production yesterday, but I
haven't run that migration yet locally, Bucardo will choke when it tries to
sync... ya? (Though the easy workaround is to run the darn migration first.)

By "snapshots", do you mean filesystem-level snapshots like XFS or LVM? OS X
has no support for either, sadly. If you mean Postgres snapshots (and can I
mention that I don't yet understand where MVCC snapshots meet WAL/xlog, but
let's say snapshots are a thing), I see in the Slony docs that:

> WAL-based replication duplicates absolutely everything, and nothing extra
> that changes data can run on a WAL-based replica.

That sounds like I couldn't use production log-shipping to sync test
databases. Unless that doc is not *quite* true, and I could somehow:

- Sync from production
- Take a snapshot (whatever that means, exactly)
- Do whatever with the test database
- Tomorrow, just before I sync, roll back to that snapshot
- Repeat

> If the pg_dump / restore is working for you, I'd keep that as long as you
> can. Try fiddling with some of the compression options, etc. to maximize
> speed. Quick ideas: try nc or tweak rsync, and on the slave: turn fsync
> off, boost maintenance_work_mem and checkpoint_segments, and look into
> parallel restore.

Yeah, it's working for now, but we're at that hockey-stick point on the curve
where what works now will be too slow three months from now, so I want to
start thinking ahead. Those are good ideas; I bet the pg_restore can be much
faster with giant checkpoints, lots of memory, etc. I also see Bucardo's
split_pg_dump, which would probably help - no point creating indexes
on-the-fly.

Jay
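Strung together, the whole swap could be wrapped in a script along these
lines. A sketch only: the dump path is an assumption, the database names come
from the thread, pg_terminate_backend needs a superuser, and procpid is the
pre-9.2 column name used throughout this thread (it's pid today).

  #!/usr/bin/env bash
  # swap_dev_db.sh -- sketch of the restore-then-rename dance above.
  set -e

  DUMP=/tmp/production.dump   # placeholder path

  # Restore into a scratch database while the old one stays usable.
  createdb rails_dev_new
  pg_restore -d rails_dev_new "$DUMP"

  # Lock out new connections (Greg's datallowconn trick), kick existing
  # backends, then rename the old database aside and the new one into place.
  psql template1 <<'SQL'
  update pg_database set datallowconn = false where datname = 'rails_dev';
  select pg_terminate_backend(procpid) from pg_stat_activity
    where datname = 'rails_dev';
  begin;
  alter database rails_dev rename to rails_dev_old;
  alter database rails_dev_new rename to rails_dev;
  commit;
  drop database rails_dev_old;
  SQL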
Jay Levitt wrote:
>> Greg Sabino Mullane wrote:
>> update pg_database set datallowconn = false where datname = 'foobar';
>
> That's perfect - thanks. Now I can (I think) do this:
>
>   pg_restore -d rails_dev_new
>   [wait]
>   psql template1
>     update pg_database set datallowconn = false where datname = 'rails_dev';
>     select pg_terminate_backend(procpid) from pg_stat_activity
>       where datname = 'rails_dev';
>     begin;
>     alter database rails_dev rename to rails_dev_old;
>     alter database rails_dev_new rename to rails_dev;
>     commit;
>     drop database rails_dev_old;
>     \q

Yes, but if that's truly the process, you might as well save some steps and
just drop the existing one and do a single rename:

  select pg_terminate_backend(procpid) from pg_stat_activity
    where datname = 'rails_dev';
  drop database rails_dev;
  alter database rails_dev_new rename to rails_dev;
  \q

Of course, a developer will find they have something on there they need
about two minutes after you drop it. :) Space permitting, keeping the old
copy around is the safer route.

> Bucardo looks great for replication, but it mentions that it won't do DDL.
> I think that means if someone added a new column to production yesterday,
> but I haven't run that migration yet locally, Bucardo will choke when it
> tries to sync... ya? (Though the easy workaround is to run the darn
> migration first.)

Yes - the canonical way is to get the schema in sync first, then let Bucardo
handle the data.

> By "snapshots", do you mean filesystem-level snapshots like XFS or LVM?
> OS X has no support for either, sadly.

Yes, that's exactly what I mean. We have clients using that with great
success. Simply make a snapshot of the production database volumes, mount it
on the dev box, and go.

> That sounds like I couldn't use production log-shipping to sync test
> databases. Unless that doc is not *quite* true, and I could somehow:
> ...

Well, you can use Postgres' PITR (point-in-time recovery) aka warm standby to
create standby slaves identical to the master, and then at some point in time
flip them to become live, independent databases that can be modified. The
downside is that you then have to create a new base backup, which means
rsyncing the entire data directory to a new slave/standby box. However, this
might be worth it as you can front-load the time spent doing so - once it is
rsynced and the standby is up and running (and receiving data from the
master), switching it from standby to normal mode (and thus creating a
perfect clone of production at that moment) is pretty much instant.

So the process would be:

* Turn on archive_command on prod, point it to the dev box
* Create a base backup, ship the data dir to the dev box, start up the db
* In the AM, tell the dev box to go into recovery mode. Tell the prod box to
  stop trying to ship logs to it
* Rearrange some dirs on the dev box, and start over again by making a new
  base backup, rsyncing the data dir, etc.
* In the AM, stop the old dev database. Bring the new one up (recover it)

You could even make less frequent base backups if you keep enough logs around
to play forward through more than a day's worth of logs.

> Those are good ideas; I bet the pg_restore can be much faster with giant
> checkpoints, lots of memory, etc. I also see Bucardo's split_pg_dump, which
> would probably help - no point creating indexes on-the-fly.
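To make the log-shipping recipe concrete, a rough sketch of the moving parts.
The hostnames, paths, and the pg_standby trigger-file detail are assumptions
on my part (the classic 9.0/9.1-era warm-standby style), not something Greg
specified.

  # 1. On production (postgresql.conf): ship WAL segments to the dev box.
  #      wal_level = archive
  #      archive_mode = on
  #      archive_command = 'rsync -a %p dev-box:/var/lib/postgresql/wal_archive/%f'

  # 2. Make a base backup and ship the data dir to the dev box
  #    (in practice, exclude pg_xlog and postmaster.pid from the rsync).
  psql -c "select pg_start_backup('dev_refresh');"
  rsync -a /var/lib/postgresql/9.1/main/ dev-box:/usr/local/pgsql/data/
  psql -c "select pg_stop_backup();"

  # 3. On the dev box, create recovery.conf so Postgres replays shipped WAL
  #    until a trigger file appears, then start the server (it stays in
  #    standby/recovery until then):
  #      restore_command = 'pg_standby -t /tmp/pgsql.trigger /var/lib/postgresql/wal_archive %f %p'

  # 4. In the morning, flip the standby live; it becomes an independent,
  #    writable clone of production as of that moment (run on the dev box):
  touch /tmp/pgsql.trigger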
Well, if they are needed on prod, you probably want them on dev :)

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201201021458
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8