Thread: Question about pg_dump and heavily used concurrent databases

Question about pg_dump and heavily used concurrent databases

From: Pablo Alcaraz
Hi all,

I have read that pg_dump can run while the database is in use and still
produce "consistent backups".

I have a huge database that is heavily selected from, inserted into, and
updated. Currently a cron task disconnects the database users, takes a
backup using pg_dump, and puts the database back online. The problem is
that the database stores more and more data every day, so the backup
needs more and more time to run, and I am looking for a way to take the
backup with minimal interruption for the users.

I do not need an up-to-the-second backup. I can accept a backup with
"almost all" the data, but the information in it must be coherent. For
example, if the backup stores information about an invoice, it *must*
store both the invoice header and the invoice items. I can live with the
backup missing some invoices created while it runs, because they will be
backed up the next time the backup process runs. But I cannot end up
with only part of an invoice. That is what I call a coherent backup.

The best option for me is a cron job that takes a concurrent backup of
all the information up to the time it starts, while the clients keep
using the database. Example: if cron launches the backup process at
12:30 AM, the backup must be built from all the information present
*until* 12:30 AM. Then, if I need to restore it, I get a coherent
database with the same information it had at 12:30 AM. It does not
matter if the process needs 4 hours to run.

Does pg_dump create this kind of "consistent backup"? Or do I need to do
the backups using another program?

Regards

Pablo


Re: Question about pg_dump and heavily used concurrent databases

From: Heikki Linnakangas
Pablo Alcaraz wrote:
> I have read that pg_dump can run while the database is in use and still
> produce "consistent backups".
> [...]
> Does pg_dump create this kind of "consistent backup"?

Yes, pg_dump is exactly what you need. It will dump the contents of the
database as they were when it started, regardless of how long it takes,
and there's no need to shut down or disconnect concurrent users.
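
For example, a minimal sketch of a crontab entry for a nightly dump taken
while the users stay connected (the database name, user, and paths are
placeholders, not something from your setup):

    # Nightly at 00:30: dump mydb in PostgreSQL's compressed custom
    # format (-Fc), restorable later with pg_restore. The % signs must
    # be escaped in a crontab; auth is assumed to come from ~/.pgpass.
    30 0 * * *  pg_dump -Fc -U backup_user -f /backups/mydb_$(date +\%Y\%m\%d).dump mydb

The dump reflects a single snapshot taken when pg_dump starts, so a run
that takes 4 hours still contains the data exactly as of 00:30.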

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Question about pg_dump and heavily used concurrent databases

From: Erik Jones
On Nov 25, 2007, at 10:46 AM, Pablo Alcaraz wrote:

> Hi all,
>
> I have read that pg_dump can run while the database is in use and still
> produce "consistent backups".
> [...]
> Does pg_dump create this kind of "consistent backup"? Or do I need to do
> the backups using another program?

Yes, that is exactly what pg_dump does.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Question about pg_dump and heavily used concurrent databases

From: Peter Childs

On 25/11/2007, Erik Jones <erik@myemma.com> wrote:
> On Nov 25, 2007, at 10:46 AM, Pablo Alcaraz wrote:
>
>> Does pg_dump create this kind of "consistent backup"? Or do I need to do
>> the backups using another program?
>
> Yes, that is exactly what pg_dump does.


Yes, so long as you are using transactions correctly, i.e. doing a BEGIN
before each invoice and a COMMIT afterwards. If you're not bothering, and
are using autocommit, you *may* have problems. pg_dump will show a
consistent state as of the time the backup was started. If your database
was not "consistent" at that time you may have issues, but it will be
consistent from a database point of view: foreign keys, primary keys,
check constraints, triggers, etc. See the sketch below.

It all depends on what you mean by consistent.
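
As a sketch of what "using transactions correctly" means here (the table
and column names are invented for the example, shown via psql):

    # What the application should do: commit the invoice header and its
    # items atomically, so pg_dump's snapshot sees the whole invoice or
    # none of it, never a header without items.
    psql mydb -c "
    BEGIN;
    INSERT INTO invoice_header (id, customer) VALUES (42, 'ACME');
    INSERT INTO invoice_item (invoice_id, description, qty) VALUES (42, 'Widget', 3);
    COMMIT;"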

Peter.

Re: Question about pg_dump and heavily used concurrent databases

From: Tom Lane
"Peter Childs" <peterachilds@gmail.com> writes:
> On 25/11/2007, Erik Jones <erik@myemma.com> wrote:
>>> Does pg_dump create this kind of "consistent backup"? Or do I
>>> need to do the backups using another program?
>>
>> Yes, that is exactly what pg_dump does.
>>
> Yes, so long as you are using transactions correctly, i.e. doing a BEGIN
> before each invoice and a COMMIT afterwards. If you're not bothering, and
> are using autocommit, you *may* have problems.

I think you need to qualify that a bit more.  What you're saying is that
if an application has consistency requirements that are momentarily
violated during multi-statement updates, and it fails to wrap such
updates into a single transaction, then pg_dump could capture one of the
intermediate states.  That's true, but it's hardly pg_dump's fault.
If there were a system crash partway through such a sequence, the
consistency requirements would be violated afterwards, too.
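
To sketch that failure mode with the same invented names as earlier in
the thread: issued as two autocommit statements, the inserts form two
separate transactions, so a crash (or a pg_dump snapshot) can land
between them and capture a header with no items:

    # Anti-pattern: two independent transactions instead of one.
    psql mydb -c "INSERT INTO invoice_header (id, customer) VALUES (43, 'ACME');"
    psql mydb -c "INSERT INTO invoice_item (invoice_id, description, qty) VALUES (43, 'Widget', 3);"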

            regards, tom lane

Re: Question about pg_dump and heavily used concurrent databases

From: Pablo Alcaraz
Tom Lane wrote:
> I think you need to qualify that a bit more.  What you're saying is that
> if an application has consistency requirements that are momentarily
> violated during multi-statement updates, and it fails to wrap such
> updates into a single transaction, then pg_dump could capture one of the
> intermediate states.  That's true, but it's hardly pg_dump's fault.
> If there were a system crash partway through such a sequence, the
> consistency requirements would be violated afterwards, too.

Agreed. In my case I define a "consistent database state" as the state
the database is in when the program that uses it has been stopped
normally and without errors. In that state the program starts up without
trouble and "everything looks fine". I believe this works because all
the inserts and updates are made using transactions. Anything else would
be a bug, it would get fixed, and it would not be pg_dump's fault.

So if pg_dump can capture a "consistent state" with all the data up to
its start time, leaving out the updates/inserts of transactions still
open at that moment (the same state I got when I stopped the program
before running pg_dump), then it is useful and enough to solve my
problem.

Thanks to all!

Pablo

Re: Question about pg_dump and heavily used concurrent databases

From: Peter Childs

On 25/11/2007, Pablo Alcaraz <pabloa@laotraesquina.com.ar> wrote:
> Agreed. In my case I define a "consistent database state" as the state
> the database is in when the program that uses it has been stopped
> normally and without errors.
> [...]
> Thanks to all!


Given your long description of what you thought "consistent" meant, I
thought it was important to answer "yes, but" rather than just a plain
"yes". I've met quite a few apps that create inconsistent data even when
the database itself is perfectly consistent.

Peter