Thread: Hot Backup Ability

Hot Backup Ability

From: Michael Richards
Hi.
I've been mulling over this idea a lot. I've looked around a bit for
info on doing hot backups of a running database, but there isn't much
available. The problem with just pg_dumping the data is that it doesn't
work well with large databases that are expected to be processing
transactions during the backup window.

Dropping postgres down to a select-only lock level on all databases at
once was my thought. In order to keep the system running hot, you'd have
to set a flag to say that the database is being backed up. My idea is to
allow a special directory where the deltas are written. For example:
someone inserts a record; the backend would need to write that page to a
file in the temp directory for both the table and its indexes. Then, when
a select is run, it would first check the delta files, then the real
tables and indexes, for the page it's looking for.

This way, you could guarantee that the files being backed up would not be
altered in any way during the backup, and the deltas would be the only
overhead. Using the hole-in-file feature, I think page changes could be
added to the delta file without making it too large, but I haven't looked
closely at how indexes are physically stored to verify this. I suppose the
NT port would require double the size of the database to do this, since I
don't think winblows supports holes in a file.
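
In rough C, the scheme I have in mind would look something like this (all
names are hypothetical, and this is a sketch rather than real backend code):

#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ    8192           /* PostgreSQL page size */
#define MAX_PAGES 65536          /* sketch-sized table */

static int  base_fd  = -1;       /* real table file, frozen during backup */
static int  delta_fd = -1;       /* sparse side file for changed pages */
static bool in_delta[MAX_PAGES]; /* which pages the side file holds */
static bool backing_up = false;  /* the "being backed up" flag */

void begin_backup(void) { backing_up = true; }

/* While the flag is set, divert page writes into the side file.  Writing
 * at pageno * BLCKSZ leaves holes for untouched pages, so on a filesystem
 * with sparse-file support the side file stays small. */
void write_page(uint32_t pageno, const char *buf)
{
    int fd = backing_up ? delta_fd : base_fd;
    pwrite(fd, buf, BLCKSZ, (off_t)pageno * BLCKSZ);
    if (backing_up)
        in_delta[pageno] = true;
}

/* Reads check the delta first, then fall back to the frozen base file. */
void read_page(uint32_t pageno, char *buf)
{
    int fd = in_delta[pageno] ? delta_fd : base_fd;
    pread(fd, buf, BLCKSZ, (off_t)pageno * BLCKSZ);
}

/* Once the backup completes, fold the deltas back into the base file. */
void end_backup(void)
{
    char buf[BLCKSZ];
    backing_up = false;
    for (uint32_t p = 0; p < MAX_PAGES; p++)
        if (in_delta[p]) {
            pread(delta_fd, buf, BLCKSZ, (off_t)p * BLCKSZ);
            pwrite(base_fd, buf, BLCKSZ, (off_t)p * BLCKSZ);
            in_delta[p] = false;
        }
}

The sparse writes mean the side file only costs disk for pages that actually
change during the backup, which is the whole point of the hole-in-file trick.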

With the database in select-only mode, someone could either do a pg_dump
style backup, or backup the actual tables. I am guessing that it's more of
a restore time / backup size tradeoff with each backup method.

One reason I am looking at this (a possible 6.6 feature?) is that we are
using postgresql for a classifieds database which will replace an
SQL-Server installation. The database will easily be in the tens of
gigabytes range with a few million items. I will of course need to back up
this beast without preventing the clients from adding things.

If someone can point me in the right direction, I can attempt to make it
work and submit a pile o' patches against 6.5.

Comments? 

-Michael



Re: [HACKERS] Hot Backup Ability

From: Bruce Momjian
Hot backups were added in 6.5.


> I've been mulling over this idea a lot. I've looked around a bit for
> info on doing hot backups of a running database, but there isn't much
> available. The problem with just pg_dumping the data is that it doesn't
> work well with large databases that are expected to be processing
> transactions during the backup window.
>
> [...]


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Hot Backup Ability

From: Duane Currie
Hi Mike.

Get that job in Ottawa?

Anyway, version 6.5 apparently supports hot backups by using MVCC
to give a view of a consistent version of the database during a
pg_dump (http://postgresql.nextpath.com/doxlist.html  search for 'backup')
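
Roughly, the visibility rule that makes this work looks like the following
(a heavily simplified sketch; the field names are illustrative, not the
real PostgreSQL structures):

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

typedef struct {
    Xid xmin;          /* transaction that inserted the row version */
    Xid xmax;          /* transaction that deleted it, or 0 if live */
} Tuple;

typedef struct {
    Xid snap_xmax;     /* first xid not yet committed at snapshot time */
} Snapshot;

/* Simplified: pretend every xid older than the snapshot committed. */
static bool committed_before(Xid xid, const Snapshot *s)
{
    return xid != 0 && xid < s->snap_xmax;
}

/* A row version is visible to a snapshot if it was inserted before the
 * snapshot was taken and not deleted before it.  Rows touched by
 * transactions running during the dump fail this test. */
bool visible(const Tuple *t, const Snapshot *s)
{
    return committed_before(t->xmin, s) && !committed_before(t->xmax, s);
}

pg_dump reads everything through the one snapshot it takes at startup, so
updates landing mid-dump never show up in the output.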

What you explain below is basically version-controlled additions anyway.

Anyone correct me if I'm wrong...

Duane

> Dropping postgres down to a select-only lock level on all databases at
> once was my thought. In order to keep the system running hot, you'd have
> to set a flag to say that the database is being backed up. My idea is to
> allow a special directory where the deltas are written. For example:
> someone inserts a record; the backend would need to write that page to a
> file in the temp directory for both the table and its indexes. Then, when
> a select is run, it would first check the delta files, then the real
> tables and indexes, for the page it's looking for.
>
> This way, you could guarantee that the files being backed up would not be
> altered in any way during the backup, and the deltas would be the only
> overhead. [...]


Re: [HACKERS] Hot Backup Ability

From: Michael Richards
On Tue, 29 Jun 1999, Duane Currie wrote:

> Anyway, version 6.5 apparently supports hot backups by using MVCC
> to give a view of a consistent version of the database during a
> pg_dump (http://postgresql.nextpath.com/doxlist.html  search for 'backup')

Hrm. Nothing pops out.

Just out of curiosity, I did a DUMP on the database while running a script
that ran a pile of updates. When I restored the database files, it was so
corrupted that I couldn't even run a select. vacuum just core dumped...

If I can just run pg_dump to back it up, how does this guarantee any sort
of referential integrity? Also, during such a dump, it seems that things
block while waiting for a lock. This also happens during a pg_vacuum. I
thought MVCC was supposed to stop this...

-Michael



Re: [HACKERS] Hot Backup Ability

From: Bruce Momjian
> On Tue, 29 Jun 1999, Duane Currie wrote:
> 
> > Anyway, version 6.5 apparently supports hot backups by using MVCC
> > to give a view of a consistent version of the database during a
> > pg_dump (http://postgresql.nextpath.com/doxlist.html  search for 'backup')
>
> Hrm. Nothing pops out.
>
> Just out of curiosity, I did a DUMP on the database while running a script
> that ran a pile of updates. When I restored the database files, it was so
> corrupted that I couldn't even run a select. vacuum just core dumped...

When you say DUMP, you mean pg_dump, right?  Are you using 6.5?

> If I can just run pg_dump to back it up, how does this guarantee any sort
> of referential integrity? Also, during such a dump, it seems that things
> block while waiting for a lock. This also happens during a pg_vacuum. I
> thought MVCC was supposed to stop this...

OK, sounds like you are using pg_dump, and 6.5.  pg_vacuum still blocks,
but pg_dump shouldn't.  This sounds unusual.  You should have gotten
everything at the time of the _start_ of the dump.
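
Conceptually it is just this sequencing (hypothetical names, purely to
illustrate):

#include <stdint.h>
#include <stdio.h>

typedef uint32_t Xid;
typedef struct { Xid snap_xmax; } Snapshot;

static Xid next_xid = 1000;    /* stand-in for the global xid counter */

/* Stub: a real dumper would scan the table and apply the snapshot's
 * visibility test to every row it copies out. */
static void dump_table_with(Snapshot s, const char *table)
{
    printf("dumping %s as of xid %u\n", table, s.snap_xmax);
}

/* One snapshot, taken before anything is read, covers every table.
 * Transactions committing mid-dump get xids >= snap_xmax and stay
 * invisible, which is why the output reflects the start of the dump. */
void dump_all(const char *tables[], int n)
{
    Snapshot s = { .snap_xmax = next_xid };
    for (int i = 0; i < n; i++)
        dump_table_with(s, tables[i]);
}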

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Hot Backup Ability

From: Todd Graham Lewis
On Tue, 29 Jun 1999, Duane Currie wrote:

> Anyway, version 6.5 apparently supports hot backups by using MVCC
> to give a view of a consistent version of the database during a
> pg_dump (http://postgresql.nextpath.com/doxlist.html  search for 'backup')

The string "backup" does not appear on that page, or on the "Documentation"
page linked off of there.  The section on backups in the integrated manual
doesn't say anything about MVCC.

Did any MVCC docs ever get written?

--
Todd Graham Lewis                        Postmaster, MindSpring Enterprises
tlewis@mindspring.net                                (800) 719-4664, x22804
                         "There is no spoon."



Re: [HACKERS] Hot Backup Ability

From: Vadim Mikheev
Todd Graham Lewis wrote:
> 
> On Tue, 29 Jun 1999, Duane Currie wrote:
> 
> > Anyway, version 6.5 apparently supports hot backups by using MVCC
> > to give a view of a consistent version of the database during a
> > pg_dump (http://postgresql.nextpath.com/doxlist.html  search for 'backup')
> 
> The string "backup" does not appear on that page, or on the "Documentation"
> page linked off of there.  The section on backups in the integrated manual
> doesn't say anything about MVCC.
> 
> Did any MVCC docs ever get written?

http://postgresql.nextpath.com/docs/user/index.html:

10. Multi-Version Concurrency Control

Vadim


Re: [HACKERS] Hot Backup Ability

From: Michael Richards
On Tue, 29 Jun 1999, Bruce Momjian wrote:

> > Just out of curiosity, I did a DUMP on the database while running a script
> > that ran a pile of updates. When I restored the database files, it was so
> > corrupted that I couldn't even run a select. vacuum just core dumped...
> 
> When you say DUMP, you mean pg_dump, right?  Are you using 6.5?

Erm. Well, no. I was running ufsdump. Once I read the section on MVCC and
re-did the test with pg_dump, I realised that it does work as
documented...

I should think this is a good feature to broadcast to everyone. I don't
think other free systems support it.

The thing that got me confused, because it blocked transactions, was
pg_vacuum. Seeing as it physically rearranges data inside the tables and
indexes, is there any hope of not blocking the table for a long time while
it rearranges a 15 gig table?

Will re-usable page support (whenever it is expected) eliminate the need
for vacuum?

Would it be easy to come up with a scheme where the vacuum function defrags
a set number of pages, releases its locks if there is another process
blocked and waiting, then resumes after that process has finished?

-Michael



Re: [HACKERS] Hot Backup Ability

From: Bruce Momjian
> On Tue, 29 Jun 1999, Bruce Momjian wrote:
> 
> > > Just out of curiosity, I did a DUMP on the database while running a script
> > > that ran a pile of updates. When I restored the database files, it was so
> > > corrupted that I couldn't even run a select. vacuum just core dumped...
> > 
> > When you say DUMP, you mean pg_dump, right?  Are you using 6.5?
> 
> Erm. Well, no. I was running ufsdump. Once I read the section on MVCC and
> re-did the test with pg_dump, I realised that it does work as
> documented...


Whoa!  Not a good idea.  We can't get a proper snapshot if the ufs blocks
are moving around while we are doing the backup.  We need pg_dump.

Glad it worked when you did it with pg_dump.

> I should think this is a good feature to broadcast to everyone. I don't
> think other free systems support it.

Probably not.  We have it as one of our main items in the release notes,
and on the web page describing the release.  We need people like you to
tell others about it.

> 
> The thing that got me confused, because it blocked transactions, was
> pg_vacuum. Seeing as it physically rearranges data inside the tables and
> indexes, is there any hope of not blocking the table for a long time
> while it rearranges a 15 gig table?

Not really.  In fact, it even shrinks the table to give back free space.
The 6.5 pg_vacuum is much faster than in earlier versions, but on a
15 gig table, it is going to take some time.

Some day, it would be nice to allow re-use of expired rows without
vacuum.  It is on our TODO list.

> Will re-usable page support (whenever it is expected) eliminate the need
> for vacuum?

It will allow you to vacuum less frequently, and perhaps never if you
don't want space back from expired rows.

> Would it be easy to come up with a scheme where the vacuum function
> defrags a set number of pages, releases its locks if there is another
> process blocked and waiting, then resumes after that process has finished?

That is a very nice idea.  We could just release and reacquire the lock,
knowing that if there is someone waiting, they would get the lock.
Maybe someone can comment on this?
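
Something like this sketch, perhaps (a made-up locking API, not our actual
lock manager):

#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define PAGES_PER_BATCH 32     /* arbitrary batch size for the sketch */

typedef struct {
    pthread_mutex_t mutex;
    atomic_int      waiters;   /* bumped by anyone about to block on us */
} TableLock;

static void table_lock(TableLock *l)
{
    atomic_fetch_add(&l->waiters, 1);
    pthread_mutex_lock(&l->mutex);
    atomic_fetch_sub(&l->waiters, 1);
}

static void table_unlock(TableLock *l)
{
    pthread_mutex_unlock(&l->mutex);
}

/* Hypothetical per-page compaction step, stubbed for the sketch. */
static void compact_page(int pageno) { (void)pageno; }

/* Vacuum a table a few pages at a time, releasing the lock whenever
 * another process is queued behind us, then resuming where we stopped. */
void incremental_vacuum(TableLock *l, int npages)
{
    int next = 0;
    while (next < npages) {
        table_lock(l);
        int stop = next + PAGES_PER_BATCH;
        for (; next < npages && next < stop; next++)
            compact_page(next);
        table_unlock(l);               /* a waiter acquires it here */
        if (atomic_load(&l->waiters) > 0)
            sched_yield();             /* let the blocked process run */
    }
}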

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Hot Backup Ability

From: Duane Currie
Oops... sorry guys...  forgot about the frames.

The doc I was referring to was:
http://www.postgresql.org/docs/admin/release.htm

Duane

> On Tue, 29 Jun 1999, Duane Currie wrote:
> 
> > Anyway, version 6.5 apparently supports hot backups by using MVCC
> > to give a view of a consistent version of the database during a
> > pg_dump (http://postgresql.nextpath.com/doxlist.html  search for 'backup')
> 
> The string "backup" does not appear on that page, or on the "Documentation"
> page linked off of there.  The section on backups in the integrated manual
> doesn't say anything about MVCC.
> 
> Did any MVCC docs ever get written?



Vacuum (Was: Re: [HACKERS] Hot Backup Ability)

From: The Hermit Hacker
On Wed, 30 Jun 1999, Bruce Momjian wrote:

> > Would it be easy to come up with a scheme where the vacuum function
> > defrags a set number of pages, releases its locks if there is another
> > process blocked and waiting, then resumes after that process has
> > finished?
> 
> > That is a very nice idea.  We could just release and reacquire the lock,
> > knowing that if there is someone waiting, they would get the lock.
> > Maybe someone can comment on this?

My first thought is "doesn't this still require the 'page-reusing'
functionality to exist"?  Which virtually eliminates the problem...

If not, then why can't something be done where this is transparent
altogether?  Have some sort of mechanism that keeps track of "dead
space"...a trigger that says after X tuples have been deleted, do an
automatic vacuum of the database?

The automatic vacuum would be done in a way similar to Michael's
suggestion above...scan through for the first 'dead space', lock the table
for a short period of time and "move records up".  How many tuples could
you move in a very short period of time, such that it is virtually
transparent to end-users?

As a table gets larger and larger, a few 'dead tuples' aren't going to
make much of a difference in performance, so make the threshold some
percentage of the size of the table; as it grows, the number of 'dead
tuples' has to be larger...
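
As a sketch, with the threshold fraction pulled out of the air purely for
illustration:

#include <stdbool.h>
#include <stdint.h>

/* Trigger an automatic mini-vacuum only once dead tuples exceed a fixed
 * fraction of the table, so a big table is not vacuumed over a handful
 * of expired rows. */
bool needs_auto_vacuum(uint64_t live_tuples, uint64_t dead_tuples)
{
    const double dead_fraction = 0.10;   /* assumed 10% threshold */
    uint64_t total = live_tuples + dead_tuples;
    return total > 0 && (double)dead_tuples > dead_fraction * (double)total;
}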

And leave out the truncate at the end...

The 'manual vacuum' would still need to be run periodically, for the
truncate and for stats...

Just a thought...:)

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] Hot Backup Ability

From: Don Baccus
At 09:09 PM 6/29/99 -0400, Bruce Momjian wrote: 
>> Just out of curiosity, I did a DUMP on the database while running a script
>> that ran a pile of updates. When I restored the database files, it was so
>> corrupted that I couldn't even run a select. vacuum just core dumped...

>When you say DUMP, you mean pg_dump, right?  Are you using 6.5?

In his first note, he was proposing a scheme that would allow either
filesystem dumps or pg_dumps, which I think a couple of respondents
missed.

So I suspect he means a filesystem dump in this case.  Which of course
won't work in postgres, or in Oracle.


- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, and other goodies at http://donb.photo.net


Re: Vacuum (Was: Re: [HACKERS] Hot Backup Ability)

From: Vince Vielhaber
On Wed, 30 Jun 1999, The Hermit Hacker wrote:

> My first thought is "doesn't this still require the 'page-reusing'
> functionality to exist"?  Which virtually eliminates the problem...
>
> If not, then why can't something be done where this is transparent
> altogether?  Have some sort of mechanism that keeps track of "dead
> space"...a trigger that says after X tuples have been deleted, do an
> automatic vacuum of the database?
>
> The automatic vacuum would be done in a way similar to Michael's
> suggestion above...scan through for the first 'dead space', lock the
> table for a short period of time and "move records up".  How many tuples
> could you move in a very short period of time, such that it is virtually
> transparent to end-users?
>
> [...]
>
> Just a thought...:)

Why not go one step further: constant background vacuuming.  Do away with
the need to vacuum altogether.  Have something in the backend always
scanning for dead tuples/dead space, and when it finds some it can lock-
move-unlock as it goes.  This way it's not working with a set number or
looking for a certain threshold, just a constant maintenance process.

No?
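
In skeleton form, something like this (a pthread-style sketch; the helpers
are stubs, not real backend calls):

#include <pthread.h>
#include <unistd.h>

/* Hypothetical helpers, stubbed so the sketch stands alone. */
static int num_tables(void)                      { return 0; }
static int reclaim_dead_tuples(int t, int limit) { (void)t; (void)limit; return 0; }

/* A thread that never stops: visit every table, reclaim a few dead
 * tuples per visit (lock-move-unlock inside reclaim_dead_tuples), back
 * off briefly, repeat.  No batch vacuums, no thresholds: just constant
 * low-grade maintenance. */
static void *vacuum_daemon(void *arg)
{
    (void)arg;
    for (;;) {
        for (int t = 0; t < num_tables(); t++)
            reclaim_dead_tuples(t, 64);   /* small bite, short lock */
        usleep(100 * 1000);               /* 100 ms between sweeps */
    }
    return NULL;
}

/* Spawned once at startup (illustration only). */
void start_vacuum_daemon(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, vacuum_daemon, NULL);
    pthread_detach(tid);
}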

Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH   email: vev@michvhf.com   flame-mail: /dev/null
        # include <std/disclaimers.h>
   TEAM-OS2   Online Campground Directory   http://www.camping-usa.com
              Online Giftshop Superstore   http://www.cloudninegifts.com
==========================================================================