Re: Restore postgresql data directory to tablespace on new host? Or swap tablespaces? - Mailing list pgsql-general

From Francisco Olarte
Subject Re: Restore postgresql data directory to tablespace on new host? Or swap tablespaces?
Date
Msg-id CA+bJJbzrN8N7GmFzrSSwfjUrVcTC-Pf0tg2anGNCosXTimPFxg@mail.gmail.com
In response to Re: Restore postgresql data directory to tablespace on new host? Or swap tablespaces?  ("Antman, Jason (CMG-Atlanta)" <Jason.Antman@coxinc.com>)
List pgsql-general
Hi Jason:

On Sat, Feb 15, 2014 at 10:55 PM, Antman, Jason (CMG-Atlanta)
<Jason.Antman@coxinc.com> wrote:
> The short answer is... due to too much technical debt, and some perhaps
> bad decisions made in the past... yeah. We've dug ourselves into this
> hole, and there's no feasible way out.

The more I read about your problem, the more I think you've nailed it
in that phrase.

I'll just comment on some things in case you find them useful, but it
seems that, combining the above with the lack of appropriate resources,
you are going to have a bad time. You seem to be in a big shop, terabyte
database, 50 test environments, you must have a lot of developers. In
your case I would start by estimating how much developer time you could
save by buying hardware.

>  Currently we have ~50 dev/test
> environments, each of which has a clone of the full production database
> (on a NetApp, using flexclones, so it's *much* smaller in terms of
> actual disk usage).

I'm not familiar with NetApp, but if your flexclones are exported as
different filesystems you have a problem there. Even if, say, 100
clones of the database use only 5 TB on the filer, once exported the
instances see them as 100 TB.

> Each of them is backed up its own postgres instance
> running in a dedicated VM.

Same problem here: each VM needs its own cache. If I needed to
spin up a lot of databases on limited hardware, what I would try is a
single host with a snapshotting filesystem, i.e. ZFS or BTRFS or
whatever, running a lot of instances. This way you share the code
memory between all the instances and you have a single OS instance,
which can easily save hundreds of megs per machine. But, what is more
important: you cannot do anything about shared buffers, work mem et
al., as they are per instance, but postgres also relies on the OS
cache. If your instances run on the same machine, with say 32 GB of
cache, and the OS on this box holds the snapshots, it can use blocks
cached for one instance on another one; this is not possible if you are
net-mounting them. I do not know how much you would gain, as it will be
heavily dependent on your concrete access patterns, but it's going to
be faster. BTW, I suppose you've already optimized things for tests,
like running without fsync, minimal logging, etc. in the dev
environments.
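For illustration only ( a sketch, with made-up paths and port numbers;
these settings trade away durability, so only use them on throwaway
dev/test data ), a dev instance could be started with something like:

    # start a dev instance with durability features disabled for speed
    pg_ctl -D /srv/pg/dev42/data \
           -o "-p 5442 -c fsync=off -c synchronous_commit=off -c full_page_writes=off" \
           -l /srv/pg/dev42/postmaster.log start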

> Data refreshes are done on-demand by stopping
> the postgres instance, unmounting and deleting the volume backing it,
> making a new clone of the daily snapshot, mounting it, and bringing
> postgres back up.

This would be the same as what I described, just in a different place.
Postgres doesn't mind whether it has a full VM or just a data directory
and a port.
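As a rough sketch of that refresh on a single host ( assuming ZFS and
made-up dataset names, adjust to taste ):

    # stop the instance, throw away its clone, re-clone from the daily snapshot, restart
    pg_ctl -D /tank/pg/dev42/data -m fast stop
    zfs destroy tank/pg/dev42
    zfs clone tank/pg/master@daily tank/pg/dev42
    pg_ctl -D /tank/pg/dev42/data -o "-p 5442" start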

> But we don't have the money (or, more importantly,
> datacenter space) to dedicate a piece of hardware to each of these. So
> most dev environments can only handle ~1 person using the application at
> a time, due to trying to run a complex ORM (Django) against a ~1T
> database on a host with < 24G RAM.

Every time you quote one of these things it seems more and more like
you are in a big place and have a really weird set of requirements.
AFAIK Django is not that big; the problem may be more with the kind of
things you are trying to do with it. And I do not quite understand
what you are calling 'a testing environment'. From only what I've
read, if the NetApp thingy you mention is fast enough to serve the
snapshots, I would normally think: RAM is cheap, why not put 32 GB in
each developer machine and run everything there? But I suppose you have
a lot of other constraints.

...
> So, we really need ~50 copies that can be refreshed at arbitrary
> intervals. Since I've already determined that's absolutely impossible
> without stopping postgres, I calculated 100-150 assuming we do bulk
> refreshes at night, and each developer is allocated 2-3 possible
> refreshes in a given 24-hour period.

That left me still more puzzled.

> Since we've allowed the database to grow without using Django's
> facilities to split largely unrelated objects into different databases
> and handle any JOINs in the application, and we've gradually let the
> database diverge from the models actually in the ORM, every attempt at
> crafting a DB with a restricted set of recent data has failed. And yes,
> unfortunately, individual developers have had their own private database
> instance for long enough that migrating to a shared DB approach has been
> discarded as an option.

As you say in your first paragraph, it seems you have dug a really deep hole.

From what I've read so far, a lot more knowledge about your problem is
needed to design a solution, and I'm not going to try to do it. The only
thing I can do is repeat what I've said above: virtualization is
nice, but running several instances on the same machine ( be it
virtual or real ) will decrease your requirements. NetApps are nice, but
they virtualize the disks and hide potential sharing from the OS
under the db server, so moving the snapshots nearer will decrease them
too.

.....
> Unfortunately, no, we're on a project that's far from having an
> unlimited budget.

That I supposed. The thing is, it seems that, as you stated, you may
already be over budget. The thing about the machines is that, given the
numbers you quote, I have a mental picture of your project. From that
mental picture, for me, the cheapest way out seems to be throwing
hardware at it, as it seems you are trading some k EUR / k$ / whatever
for lost productivity across a whole lot of developers, and, time being
money, especially in cases like this, it seems cheaper. Although, as I
said, I do not know the restrictions.

> So you're suggesting running multiple instances of postgres in the same
> operating system, and they'll be able to share cache, instead of trying
> to get multiple DBs in the same instance? Is there any accepted method
> of managing N postgres instances on one host? Any tools or documentation
> for that, or best practices? Seems scary to me, but I suppose it's the
> best option we have...

You are scared of N servers? Someone who was looking to copy
tablespaces around? Do not worry, that is easy. If you want tools,
Debian and Ubuntu installations ship with the ability to run multiple
server instances; although that may not be adequate for you, you could
grab some ideas from there. We have done this kind of thing for version
upgrades ( running two versions in parallel, using pgbouncer to be
able to migrate databases one by one ), and it was easy. Basically, all
a postgres instance needs is a directory for the data and a port
number. If you configure a master volume which has all the data inside (
even the conf files ), you can then spin up an instance easily by
mounting a COW copy of it elsewhere, changing the port in the conf and
firing up a postmaster. IIRC you do not even need to edit the conf and
can pass the port at startup if it is not in the conf file. You may have
a problem if you are using tablespaces in the snapshot; that would
probably need a little fixing of symlinks before starting the instance.
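Just to illustrate that last bit ( a sketch with hypothetical paths;
the Debian/Ubuntu wrapper commands come from the postgresql-common
package ):

    # the port can be given at start time, overriding whatever postgresql.conf says
    pg_ctl -D /mnt/clone07/data -o "-p 5433" -l /mnt/clone07/postmaster.log start
    # on Debian/Ubuntu, pg_lsclusters / pg_createcluster manage several clusters per host
    pg_lsclusters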


> By "A postgres instance should happily run in half a gig" I assume
> you're not counting cache/buffers, etc.? Because when trying to run
> automation (i.e. automated testing) on our application, we can't get
> query execution time down to an acceptable level with anything less than
> ~48GB memory available in the host.

There it comes again. Big problem, little resources. You use more RAM
for automation testing than I use to run production for our whole shop
( counting every process, not just the DB ). Postgres will run
happily, but slowly, in your case. If you need to run automation on
every one of your 50 copies, you have the minimum requirement: 48 GB
each. With VMs I doubt you could squeeze it into less than 2400 GB,
although memory deduplication may help you. But if you run 50 copies of
postgres in the same OS, and the OS knows the datadirs are COW so it
automatically shares the cache, you may be able to do better. As always
it depends on what your 48 GB are used for: if they are for work mem, it
won't help you. If they are for cache, it might. If they are for shared
buffers, you may investigate some tuning. Anyway, the numbers you
quote, every time you mention one, always remind me of the classic film
line 'He is dead, but still does not know it'; sorry, but that is my
impression.

> Thanks for the suggestion though. I was considering this as an
> alternative, but every experience I've had of running multiple (let's
> say, more than 2 or 3) instances of the same application/daemon on the
> same host has inevitably led to all sorts of strange issues and
> problems... I suppose it sounds like it's going to be the only option
> though...

Running multiple copies of the same program requires a little
planning, but it is a great resource saver. You only have one OS, one
cache, one cpu / disk scheduler running ( and one init daemon, one
cron daemon, etc... ). It's a kind of lightweight virtualization. And
postgres is normally very well behaved in these scenarios.

Francisco Olarte.

