Fwd: Re: Restore postgresql data directory to tablespace on new host? Or swap tablespaces? - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Fwd: Re: Restore postgresql data directory to tablespace on new host? Or swap tablespaces? |
Date | |
Msg-id | 52FFE80E.80200@gmail.com Whole thread Raw |
List | pgsql-general |
Forwarding back to list. -------- Original Message -------- Subject: Re: [GENERAL] Restore postgresql data directory to tablespace on new host? Or swap tablespaces? Date: Sat, 15 Feb 2014 22:08:51 +0000 From: Antman, Jason (CMG-Atlanta) <Jason.Antman@coxinc.com> To: Adrian Klaver <adrian.klaver@gmail.com> Replies inline below. Thanks to everyone who's responded so far. The more I explain this, and answer questions, the more I see how my original "brilliant" idea (multiple DBs per postgres instance on one host, instead of 1:1:1 DB:postgres:host) is insane, without some specific support for it in postgres. So... I suppose, barring some suggestion nobody's made yet, the best we'll be able to do is try to run multiple postgres instances on each host, and manage the whole "service postgresql-9.0-24 stop" craziness that comes with that... Is anyone aware of documentation or best practices for running multiple postgres instances on the same host, and how to maximize resource sharing between them? Thanks, Jason On 02/15/2014 04:26 PM, Adrian Klaver wrote: > On 02/15/2014 10:31 AM, Antman, Jason (CMG-Atlanta) wrote: >> Well thanks for someone at least sending a reply, though I suppose I >> should have asked "how do I do this", or "what are the major hurdles to >> doing this", as it obviously has to be *possible* given unlimited >> knowledge, resources and time. >> >> Perhaps I should frame the question differently: >> >> If you had a single ~1TB database, and needed to be able to give fresh >> data copies to dev/test environments (which are usually largely idle) >> either on demand or daily, how would you do it? The only other thing >> that comes to mind is separate postgres instances (running multiple >> postgres instances per server?), one per database, for every >> environment. Which means that if 80% of the environments are idle at a >> given time, I'm effectively wasting 80% of the memory that I have >> allocated to shared buffers, etc. and I actually need 4x the resources >> I'm using? Unless postgres supports balooning of memory? > > <<Thinking out loud>> > > Your requirements: > > If I am following correctly you want a single universal data set that > is accessible by multiple databases at will. There will be 100-150 > different databases to which presumably different things are being > done. With, ideally, the databases having different names. Also it > would be helpful if the data could be refreshed without stopping the > cluster. Yup, that's the gist of it (what we do now for ~50 copies, each on their own virtual machine). > > The process: > > Start at known point, the state of the production server at point in > time. > > Apply that state as the starting point for 100-150 databases. > > Do things to those 100-150 databases that cause them to deviate from > the starting point and each other. > > Periodically reset some portion of the databases to a new starting > point based on a different state of the production database. Yup. Exactly. > > The issue: > > The killer would seem to be the global(cluster)/local(database) > problem. The cluster has the database at one state and then you try > impose a database state from another cluster on it. Even given your > stipulation-.."to be *possible* given unlimited knowledge, resources > and time.", I would say not in a time frame that is going to help you > in the practical future. Yeah, I guess. it's sounding like, aside from getting way down into the internals of postgres and spending months coding a really bad hack that won't work half the time, there's no sane way to stop the cluster (or not?) and force it to throw away everything it knows about a database, or replace that "knowledge" with fixed information... which would be what I'd need to do, I guess. > > Solution or not; > > Not seeing how you can do it without using some variation of what you > do now. Part of the problem in trying to come up with a solution is > not knowing what is being done to the test/dev databases. > > Is the full dataset a requirement for testing or would a subset do? Answered in my last post... effectively, yes, because the application ORM is so broken that nobody can get a valid-but-reduced dataset. The last attempt at deleting old data *through* the application's ORM ended up shrinking the DB size on disk (after vacuum) by under 50% in... I think 2 weeks of running around the clock. > Are there 100-150 users/tests each needing a database or could > databases be shared? That was based on 50 users each needing a maximum of 2-3 databases per day. No sharing, as it invalidates any testing they're doing. > > Is the 80% idle figure you mention manageable? > In other words do the tests run whenever, so the instances have to > always be available or can they be spun up on demand? Well... define "on demand". Currently they're always available, except when someone runs the command to refresh one, which currently takes approx. 120 seconds. If we could figure out how to, perhaps, snapshot virtual machines with the OS already booted, maybe that would work. But given the current expectation (~2 minutes start to finish for a refresh) I'm not sure we could reasonably spin them up on demand. Moreover, if we were spinning up a VM on demand, we'd end up with the same situation we're in now - every dev comes in in the morning and spins up their VM. > > >> >> Thanks, >> Jason >> >> On 02/15/2014 01:20 PM, Tom Lane wrote: >>> "Antman, Jason (CMG-Atlanta)" <Jason.Antman@coxinc.com> writes: >>>> Perhaps there's a postgres internals expert around, someone >>>> intimitely familiar with pg_xlog/pg_clog/pg_control, who can >>>> comment on whether it's possible to take the on-disk files from a >>>> single database in a single tablespace, and make them usable by a >>>> different postgres server, running multiple databases? >>> It is not. There's no need for detailed discussion. >>> >>> regards, tom lane >> >> > > -- Jason Antman | Systems Engineer | CMGdigital jason.antman@coxinc.com | p: 678-645-4155
pgsql-general by date: