Re: Restore postgresql data directory to tablespace on new host? Or swap tablespaces? - Mailing list pgsql-general
From: Antman, Jason (CMG-Atlanta) <Jason.Antman@coxinc.com>
Subject: Re: Restore postgresql data directory to tablespace on new host? Or swap tablespaces?
Msg-id: 52FFE2A0.8070207@coxinc.com
In response to: Restore postgresql data directory to tablespace on new host? Or swap tablespaces? ("Antman, Jason (CMG-Atlanta)" <Jason.Antman@coxinc.com>)
Responses: Re: Restore postgresql data directory to tablespace on new host? Or swap tablespaces? (x2)
List: pgsql-general
On 02/15/2014 02:00 PM, Francisco Olarte wrote:
> Hi:
>
> On Sat, Feb 15, 2014 at 7:31 PM, Antman, Jason (CMG-Atlanta)
> <Jason.Antman@coxinc.com> wrote:
>> Well thanks for someone at least sending a reply, though I suppose I
>> should have asked "how do I do this", or "what are the major hurdles to
>> doing this", as it obviously has to be *possible* given unlimited
>> knowledge, resources and time.
> Some things have been proven impossible, maybe this is one of those.
> Maybe the lack of answers is due to people having the same impression of
> your original message as me. I mean he NEEDS 100-150 dev/test copies?

The short answer is... due to too much technical debt, and some perhaps bad decisions made in the past... yeah. We've dug ourselves into this hole, and there's no feasible way out.

Currently we have ~50 dev/test environments, each of which has a clone of the full production database (on a NetApp, using flexclones, so it's *much* smaller in terms of actual disk usage). Each of them is backed by its own postgres instance running in a dedicated VM. Data refreshes are done on-demand by stopping the postgres instance, unmounting and deleting the volume backing it, making a new clone of the daily snapshot, mounting it, and bringing postgres back up.

But we don't have the money (or, more importantly, datacenter space) to dedicate a piece of hardware to each of these. So most dev environments can only handle ~1 person using the application at a time, due to trying to run a complex ORM (Django) against a ~1TB database on a host with < 24GB RAM. Actual testing happens in a separate environment, which runs on bare metal. Load tests happen in yet another environment, with a master and a slave. So, we really need ~50 copies that can be refreshed at arbitrary intervals.
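For the curious, the refresh cycle described above looks roughly like the following dry-run sketch. The environment name, volume path, and filer commands are illustrative assumptions, not our actual tooling; RUN=echo prints each step instead of executing it:

```shell
#!/bin/sh
# Dry-run sketch of a per-environment data refresh (hypothetical names).
# Drop RUN=echo to actually execute the steps.
RUN=echo
ENV=dev01
VOL=/vol/pgdata_${ENV}          # NFS mount backing this env's data dir
SNAP=pgdata_prod.daily          # assumed name of the daily snapshot

$RUN pg_ctl -D "$VOL" stop -m fast                    # stop this env's postgres
$RUN umount "$VOL"                                    # unmount the backing volume
$RUN ssh filer vol clone destroy "pgdata_${ENV}"      # drop the old flexclone
$RUN ssh filer vol clone create "pgdata_${ENV}" -b pgdata_prod "$SNAP"
$RUN mount "$VOL"                                     # mount the fresh clone
$RUN pg_ctl -D "$VOL" start                           # bring postgres back up
```

Because the clone is copy-on-write on the filer, the whole cycle costs minutes of downtime rather than hours of copying.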
Since I've already determined that's absolutely impossible without stopping postgres, I calculated 100-150 assuming we do bulk refreshes at night, and each developer is allocated 2-3 possible refreshes in a given 24-hour period. Since we've allowed the database to grow without using Django's facilities to split largely unrelated objects into different databases (handling any JOINs in the application), and we've gradually let the database diverge from the models actually in the ORM, every attempt at crafting a DB with a restricted set of recent data has failed. And yes, unfortunately, individual developers have had their own private database instances for long enough that migrating to a shared-DB approach has been discarded as an option.

>> Perhaps I should frame the question differently:
>> If you had a single ~1TB database, and needed to be able to give fresh
>> data copies to dev/test environments (which are usually largely idle)
>> either on demand or daily, how would you do it? The only other thing
>> that comes to mind is separate postgres instances (running multiple
>> postgres instances per server?), one per database, for every
>> environment. Which means that if 80% of the environments are idle at a
>> given time, I'm effectively wasting 80% of the memory that I have
>> allocated to shared buffers, etc., and I actually need 4x the resources
>> I'm using? Unless postgres supports ballooning of memory?
> If I NEEDED to be able to provide 100-150 snapshots to test/dev
> environments, 20% of which may be active, I'd set up a cluster, buy
> somewhere above a quarter terabyte of RAM and some big snapshot
> appliances. They are not that expensive, and someone NEEDING that big a
> dev/test system must be on a humongous project, where the cost of
> throwing hardware at the problem would be dwarfed by any small delay
> trying to set that up with minimal resources.
> A postgres instance should happily run in half a gig, using proper
> snapshots would let you share cache among them, so you could put quite
> a lot of them in one of these 384G machines floating around, even on a
> 192G one; throw a snapshotting filesystem or appliance into the mix and
> it's not that complex.

I'm not sure what you mean by "snapshot appliance". If you mean something like a NetApp filer, yeah, that's what we're using. Unfortunately, no, we're on a project that's far from having an unlimited budget. The requirement for that big a dev/test system comes from the above... essentially, a lot of bad decisions that got us to the point of having a single ~1TB database and no successful attempt at removing any significant amount of data from it.

So you're suggesting running multiple instances of postgres in the same operating system, letting them share cache, instead of trying to get multiple DBs into the same instance? Is there any accepted method of managing N postgres instances on one host? Any tools or documentation for that, or best practices? Seems scary to me, but I suppose it's the best option we have...

By "A postgres instance should happily run in half a gig" I assume you're not counting cache/buffers, etc.? Because when trying to run automation (i.e. automated testing) against our application, we can't get query execution time down to an acceptable level with anything less than ~48GB of memory available on the host.

Thanks for the suggestion though. I was considering this as an alternative, but every experience I've had of running multiple (let's say, more than 2 or 3) instances of the same application/daemon on the same host has inevitably led to all sorts of strange issues and problems... I suppose it sounds like it's going to be the only option, though.

> Francisco Olarte.

-- 
Jason Antman | Systems Engineer | CMGdigital
jason.antman@coxinc.com | p: 678-645-4155
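For reference, the multi-instance layout being suggested can be sketched as below: one data directory and one port per environment, each instance with a modest shared_buffers so idle instances waste little RAM while all of them share the OS page cache. Paths and ports are hypothetical (RUN=echo makes this a dry run); Debian/Ubuntu packages this pattern as pg_createcluster/pg_ctlcluster in postgresql-common.

```shell
#!/bin/sh
# Dry-run sketch: N postgres instances on one host, one per dev environment.
# BASE and FIRST_PORT are assumptions; drop RUN=echo to actually execute.
RUN=echo
BASE=/srv/pg
FIRST_PORT=5433     # 5432 left free for a "main" instance

i=0
for env in dev01 dev02 dev03; do
  port=$((FIRST_PORT + i))
  dir="$BASE/$env"
  $RUN initdb -D "$dir"
  # small per-instance shared_buffers; rely on the shared OS page cache
  $RUN pg_ctl -D "$dir" -o "-p $port -c shared_buffers=512MB" start
  i=$((i + 1))
done
```

Clients then pick an environment by port, e.g. `psql -p 5434` for dev02.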