Thread: I don't want to back up index files
I am wondering about the feasibility of having PG continue to work even if
non-essential indexes are gone or corrupt. I brought this basic concept up
at some point in the past, but now I have a different motivation, so I want
to strike up discussion about it again. This time around, I simply don't
want to back up indexes if I don't have to. Because indexes contain
essentially redundant data, losing one does not equate to losing real data.
Therefore, backing them up represents a lot of overhead for very little
benefit.

Here's the basic idea:

1) New field in pg_index (indvalid boolean).
2) Query planner skips indexes where indvalid = false.
3) Executor does not update indexes where indvalid = false.
4) Executor refuses inserts or updates to unique columns where indvalid =
false, throwing an error.
5) WAL roll-forward marks indvalid = false if index file(s) are missing,
rather than panicking.
6) REINDEX recognizes syntax to build only indexes with indvalid = false,
marking indvalid = true.

Close to 25% of the on-disk bulk of my database is index files. It would
save a significant amount of the system resources used during the backup if
I didn't have to archive the index files. In the unlikely event that a
restore/roll-forward becomes necessary, I could simply issue something like
"REINDEX DATABASE foo INVALID;" to rebuild all the missing indexes and
return the database to full function.

Prior to a reindex, the database would perform poorly and refuse to do
certain inserts and updates, but the data would be available. Backup files
would be smaller, and the restore/roll-forward would be faster. No
downsides jump out at me, and it seems to me that for a regular PG code
hacker this could actually be fairly simple to implement.

Any chance of something like this being done in the future?

-Glen
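[The "close to 25%" figure is easy to check on any cluster; a sketch using
the built-in size functions (available in 8.1 and later), run against a
live database:]

```sql
-- Compare the total on-disk size of all indexes in the current database
-- against the size of the whole database.
SELECT pg_size_pretty(sum(pg_relation_size(indexrelid))) AS index_bytes,
       pg_size_pretty(pg_database_size(current_database())) AS database_bytes
FROM pg_index;
```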
On Tue, 2009-03-10 at 18:54 -0700, Glen Parker wrote:
> I am wondering about the feasibility of having PG continue to work even
> if non-essential indexes are gone or corrupt. I brought this basic
> concept up at some point in the past, but now I have a different
> motivation, so I want to strike up discussion about it again. This time
> around, I simply don't want to back up indexes if I don't have to.
> Because indexes contain essentially redundant data, losing one does not
> equate to losing real data. Therefore, backing them up represents a lot
> of overhead for very little benefit.

Hello,

I am sorry, but this seems very silly. If you don't want to back up
indexes, use pg_dump.

> Any chance of something like this being done in the future?

I am going to go out on a limb here and say, "no".

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake@jabber.postgresql.org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997
On Wed, Mar 11, 2009 at 3:03 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
>> Any chance of something like this being done in the future?
>
> I am going to go out on a limb here and say, "no".

That would probably be possible, by placing all indices in a separate
directory in the data directory, but... well, that would create a whole
other dimension of mess and requirements on restore.

So like JD said, if you don't want to dump indices - just use pg_dump...

--
GJ
Grzegorz Jaśkiewicz wrote:
> On Wed, Mar 11, 2009 at 3:03 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
>>> Any chance of something like this being done in the future?
>>
>> I am going to go out on a limb here and say, "no".
>
> That would probably be possible, by placing all indices in a separate
> directory in data, but...

I could simply use the system catalog to determine which files to back up
and which to ignore. I suppose I would prefer to back up all unique
indexes, and perhaps all system indexes. It's the proliferation of large
indexes that serve only to enhance SELECT performance that I would want to
ignore.

> So like JD said, if you don't want to dump indices - just use pg_dump...

If pg_dump were an acceptable backup tool, we wouldn't need PITR, would we?
We used pg_dump for years. There's a very good reason we no longer do.
That suggestion is silly.

-Glen
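[The catalog lookup Glen describes can be sketched like this; the schema
exclusions are an assumption about which indexes one would consider
"non-essential":]

```sql
-- List the files behind non-unique user indexes, i.e. candidates to skip
-- during a base backup. relfilenode names the file on disk under the
-- relation's tablespace directory.
SELECT n.nspname, c.relname, c.relfilenode, c.reltablespace
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'i'
  AND NOT i.indisunique
  AND n.nspname NOT IN ('pg_catalog', 'pg_toast');
```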
On Wed, 2009-03-11 at 14:25 -0700, Glen Parker wrote:
> Grzegorz Jaśkiewicz wrote:
>> So like JD said, if you don't want to dump indices - just use pg_dump...
>
> If pg_dump were an acceptable backup tool, we wouldn't need PITR, would
> we? We used pg_dump for years. There's a very good reason we no longer
> do. That suggestion is silly.

*shrug* You can consider it silly. It doesn't change the outcome.

Joshua D. Drake
On Wed, Mar 11, 2009 at 3:25 PM, Glen Parker <glenebob@nwlink.com> wrote:
> Grzegorz Jaśkiewicz wrote:
>> So like JD said, if you don't want to dump indices - just use pg_dump...
>
> If pg_dump were an acceptable backup tool, we wouldn't need PITR, would
> we? We used pg_dump for years. There's a very good reason we no longer
> do. That suggestion is silly.

pg_dump is a perfectly acceptable backup tool, as is PITR. They have
different ways of operating based on what you need. Trying to make PITR
act more like pg_dump seems kind of silly to me.
Scott Marlowe wrote:
> pg_dump is a perfectly acceptable backup tool, as is PITR. They have
> different ways of operating based on what you need. Trying to make
> PITR act more like pg_dump seems kind of silly to me.

pg_dump is not acceptable to us because of the potential to lose many hours
of valuable data. Why would pg_dump even be relevant to this discussion?
PITR offers a benefit that pg_dump does not, a benefit that we, and
countless other organizations, obviously find useful. Suggesting that a
person who's been managing PG in a commercial setting since version 6.4
should just use pg_dump as an alternative to PITR is, well, rather
insulting.

That's two people now who have called the idea "silly" without even a hint
of a supporting argument. Why would it be "silly" to improve the
performance of a highly valuable tool set without compromising its utility?
Am I missing something here? That's certainly possible, but the idea didn't
just hatch last night; I've put enough thought into this to have reason to
believe it's more than just "silly".

-Glen
Glen Parker escribió:
> That's two people now who have called the idea "silly" without even a
> hint of a supporting argument. Why would it be "silly" to improve the
> performance of a highly valuable tool set without compromising its
> utility? Am I missing something here? That's certainly possible, but
> the idea didn't just hatch last night; I've put enough thought into this
> to have reason to believe it's more than just "silly".

FWIW I don't think this idea is silly at all. It's so not-silly, in fact,
that we already have some access methods that do this if an index cannot be
recovered (I think at least GiST does it).

--
Alvaro Herrera                         http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
> FWIW I don't think this idea is silly at all. It's so not-silly, in
> fact, that we already have some access methods that do this if an index
> cannot be recovered (I think at least GiST does it).

Well, there's a difference between "rebuild the index when it can't be
recovered" and "lose the index anytime the system burps". AFAICS what Glen
is proposing is to not WAL-log index changes, and with that any crash, no
matter how minor, would have to invalidate indexes.

			regards, tom lane
On Wed, 2009-03-11 at 16:57 -0700, Glen Parker wrote:
> That's two people now who have called the idea "silly" without even a
> hint of a supporting argument. Why would it be "silly" to improve the
> performance of a highly valuable tool set without compromising its
> utility? Am I missing something here? That's certainly possible, but
> the idea didn't just hatch last night; I've put enough thought into this
> to have reason to believe it's more than just "silly".

OK, a couple of things:

1. You could put all your indexes into a tablespace; this would allow you
to "try" different things with the indexes.

2. Even though my peer Alvaro doesn't think the idea is silly, I still do,
and here is why. If you can invalidate the indexes, you will have to
reindex (or recreate) them to make them valid (as you mentioned). That is
an exclusive lock.

If your database has any level of concurrency, the cost to
recreate/reindex those indexes right when you are attempting to get your
standby into production is going to be very high. Basically you are
trading 25% hard disk space for a longer, possibly excessively longer,
outage. Hard disk space is so darn cheap that it doesn't seem to make any
sense.

Creating indexes concurrently is also out, because while you are creating
those indexes your performance will tank (everything is sequential
scanning), and there is a possibility that the concurrent creation will
fail.

Sincerely,

Joshua D. Drake
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> FWIW I don't think this idea is silly at all. It's so not-silly, in
>> fact, that we already have some access methods that do this if an index
>> cannot be recovered (I think at least GiST does it).
>
> Well, there's a difference between "rebuild the index when it can't be
> recovered" and "lose the index anytime the system burps". AFAICS what
> Glen is proposing is to not WAL-log index changes, and with that any
> crash no matter how minor would have to invalidate indexes.

Nooo...! This has nothing to do with WAL-logging index changes. What I
propose would have no effect on an end user who continues to back up
indexes. It would give people the *option* to not back up indexes.

This is about disaster recovery and the backups required to recover
sanely, not about hiccups that cause only the last handful of transactions
to be redone. Causing invalidation of indexes after a crash might be
acceptable in some settings, but not in mine, and that isn't what I'm
after.

-Glen
Joshua D. Drake wrote:
> 1. You could put all your indexes into a tablespace; this would allow
> you to "try" different things with the indexes.

Most of them are, but I still have to back them up in order to have a
valid backup, because the PITR code would choke if any are missing.

> 2. Even though my peer Alvaro doesn't think the idea is silly, I still
> do, and here is why. If you can invalidate the indexes, you will have
> to reindex (or recreate) them to make them valid (as you mentioned).
> That is an exclusive lock.
>
> If your database has any level of concurrency, the cost to
> recreate/reindex those indexes right when you are attempting to get
> your standby into production is going to be very high. Basically you
> are trading 25% hard disk space for a longer, possibly excessively
> longer, outage. Hard disk space is so darn cheap that it doesn't seem
> to make any sense.

We have yet to recover from a PG disaster. We back up every night, and
never use the backups for anything. To me, it seems perfectly reasonable
to get a quicker backup every night, with the remote possibility of ever
having to pay the price for it.

And IMO, the price on recovery would be minor, if not negative. The time
to rebuild the indexes would be offset by the shorter time needed to
restore the database cluster and allow the PITR stuff to work its magic.
And of course keep in mind that while indexes are building, all the tables
not being indexed are accessible. Nothing is accessible during a PITR
operation.

I would also be able to rebuild only the most important indexes and leave
the system to operate with a few missing until things quiet down, and then
finish reindexing.

-Glen
Glen Parker <glenebob@nwlink.com> writes:
> Tom Lane wrote:
>> ... AFAICS what Glen is proposing is to not WAL-log index changes, and
>> with that any crash no matter how minor would have to invalidate
>> indexes.
>
> Nooo...! This has nothing to do with WAL logging index changes.

How so? In any PITR-based situation, it seems to me you need to worry
about the WAL bulk a lot more than the bulk of the base backup.

			regards, tom lane
Glen Parker <glenebob@nwlink.com> writes:
> We have yet to recover from a PG disaster. We back up every night, and
> never use the backups for anything. To me, it seems perfectly
> reasonable to get a quicker backup every night, with the remote
> possibility of ever having to pay the price for it.

Why don't you just switch to a less frequent full-backup schedule?

			regards, tom lane
On Wed, Mar 11, 2009 at 5:57 PM, Glen Parker <glenebob@nwlink.com> wrote:
> Scott Marlowe wrote:
>> pg_dump is a perfectly acceptable backup tool, as is PITR. They have
>> different ways of operating based on what you need. Trying to make
>> PITR act more like pg_dump seems kind of silly to me.
>
> pg_dump is not acceptable to us because of the potential to lose many
> hours of valuable data.

That's why we use pg_dump and Slony. We lose no data short of a giant
hosting center fire.

> Why would pg_dump even be relevant to this discussion?

Because it's one method of backing up a database?

> PITR offers a benefit that pg_dump does not, a benefit that we, and
> countless other organizations, obviously find useful.

And this benefit is? Continuous backup, I assume. Something other forms
of replication can do, at which point pg_dump becomes a useful backup tool
again.

> Suggesting that a person who's been managing PG in a commercial setting
> since version 6.4 should just use pg_dump as an alternative to PITR is,
> well, rather insulting.

Darn, I've only been around since 6.5.2, and have about four years on an
airline reservation system. I guess my opinions just don't count. How
could I possibly understand your advanced thought processes.

> That's two people now who have called the idea "silly" without even a
> hint of a supporting argument. Why would it be "silly" to improve the
> performance of a highly valuable tool set without compromising its
> utility?

Because it's the size of the WAL files that kills most people; not putting
the index updates into the WAL files would be a hack I wouldn't trust, and
having them in the WAL but not applying them is just wasted space. Or
maybe because you didn't explain everything as clearly as you could have,
and I made assumptions based on your incomplete description.
On Wed, 2009-03-11 at 20:59 -0600, Scott Marlowe wrote:
> Darn, I've only been around since 6.5.2, and have about four years on
> an airline reservation system. I guess my opinions just don't count.
> How could I possibly understand your advanced thought processes.

Alright guys, let's not get into a my-elephant-trunk-is-bigger-than-yours
fight.

Joshua D. Drake
On Wednesday 11 March 2009, Glen Parker <glenebob@nwlink.com> wrote:
> We have yet to recover from a PG disaster. We back up every night, and
> never use the backups for anything. To me, it seems perfectly
> reasonable to get a quicker backup every night, with the remote
> possibility of ever having to pay the price for it.

This isn't totally relevant, but as a way to speed up base backups, I keep
a copy of the whole database rsync'd on the backup machine. The base
backup consists of refreshing the rsync and then tarring it offline (of
course with the PITR backup commands in the sequence). My database is
currently 750GB on disk and gets many tens of millions of updates a day,
and the rsync still runs in less than an hour per night.

I have done PITR recoveries (unfortunately). The log replay time dwarfs
the time it takes to copy the index files back over (it is, honestly,
slower than the original transactions were). Additionally, copying them is
faster than rebuilding them would be.

Also, I can't imagine bringing a database online without the majority of
the indexes in place. The first dozen non-indexed queries against large
tables would kill the machine; not only would you not be servicing users,
but the rest of your restore would be slowed immensely.

--
Even a sixth-grader can figure out that you can't borrow money to pay off
your debt
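[Alan's scheme can be sketched roughly as below; the paths, user, and
backup destination are assumptions, and on 8.x the copy must be bracketed
by pg_start_backup()/pg_stop_backup() so the copy is valid for PITR:]

```sh
# Refresh an on-disk mirror of the cluster while it stays online.
psql -U postgres -c "SELECT pg_start_backup('nightly');"
rsync -a --delete /var/lib/pgsql/data/ /backup/pgdata/
psql -U postgres -c "SELECT pg_stop_backup();"

# Tar the mirror offline; the live cluster is no longer involved.
tar -czf /backup/base-$(date +%Y%m%d).tar.gz -C /backup pgdata
```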
Tom Lane wrote:
> Glen Parker <glenebob@nwlink.com> writes:
>> Tom Lane wrote:
>>> ... AFAICS what Glen is proposing is to not WAL-log index changes,
>>> and with that any crash no matter how minor would have to invalidate
>>> indexes.
>
>> Nooo...! This has nothing to do with WAL logging index changes.
>
> How so? In any PITR-based situation it seems to me you need to worry
> about the WAL bulk a lot more than the bulk of the base backup.

It isn't the bulk so much as the amount of time, and the impact to the
running system during that time, that it takes to execute the base backup.

I haven't noticed any real impact related to compressing and exporting WAL
files.

Anyway, more to the point, I'm not knowingly proposing anything that
should cause reduced system reliability in the event of a crash.

-Glen
Tom Lane wrote:
> Glen Parker <glenebob@nwlink.com> writes:
>> We have yet to recover from a PG disaster. We back up every night,
>> and never use the backups for anything. To me, it seems perfectly
>> reasonable to get a quicker backup every night, with the remote
>> possibility of ever having to pay the price for it.
>
> Why don't you just switch to a less frequent full-backup schedule?

Paranoia. Others in the organization have pushed to keep up with the
nightly backups, so that decision is mostly out of my hands.

-Glen
On Thu, 2009-03-12 at 12:59 -0700, Glen Parker wrote:
> Paranoia. Others in the organization have pushed to keep up with the
> nightly backups, so that decision is mostly out of my hands.

Why not just take a backup of the PITR slave instead? If you need to do it
nightly: shut down the standby process, tar, start the standby process.
You never have to bother the master at all.

Sincerely,

Joshua D. Drake
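[JD's suggestion, as a rough sketch; the standby data directory and backup
paths are assumptions, and this presumes a warm standby that resumes WAL
replay from its recovery configuration when restarted:]

```sh
# Pause the warm standby, take the base backup from it, then resume
# recovery. The master never notices.
pg_ctl -D /var/lib/pgsql/standby stop -m fast
tar -czf /backup/base-$(date +%Y%m%d).tar.gz -C /var/lib/pgsql standby
pg_ctl -D /var/lib/pgsql/standby start
```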
Scott Marlowe wrote:
>> That's two people now who have called the idea "silly" without even a
>> hint of a supporting argument. Why would it be "silly" to improve the
>> performance of a highly valuable tool set without compromising its
>> utility?
>
> Because it's the size of the WAL files that kills most people, and not
> putting the index updates into WAL files would be a hack I wouldn't
> trust, and having them on the other side but not adding them is just
> wasting space? Cause maybe, you didn't explain everything as clearly
> as you could, and I made assumptions based on your incomplete
> description?

I've re-read my original email and I just can't see how anybody got the
idea I was suggesting to not WAL-record index changes. That would be
insanity, IMO. So, to be clear: I'm not proposing any change to the way
data is written to the WAL files.

-Glen
Glen Parker escribió:
> I've re-read my original email and I just can't see how anybody got the
> idea I was suggesting to not WAL-record index changes.

The subject says that.

> That would be insanity IMO.

Agreed :-)

--
Alvaro Herrera
Glen Parker <glenebob@nwlink.com> writes:
> I've re-read my original email and I just can't see how anybody got the
> idea I was suggesting to not WAL-record index changes.

Mainly because the idea doesn't seem to make sense unless that's part of
the package. If you don't cut index changes out of the WAL load, then the
savings on the base backup alone aren't going to be all that exciting when
you consider the total cost of PITR backup.

Furthermore, you would need some very ugly hacks on the recovery process
to make it ignore (rather than try to apply) WAL records relating to
indexes. I believe there are a fair number of cases where the recovery
process doesn't even know that a particular file is an index, because the
WAL stream doesn't tell it. The live backends generating the WAL log
entries typically know that (and could suppress the entries), but the
recovery process has only a very limited view of reality. It cannot, for
example, trust the system catalogs to be in a correct/consistent state, so
it couldn't look up the info for itself.

BTW, there's a related problem with the idea, which is that the tools
normally used to take base backups haven't got any way to distinguish
indexes from any other kind of relation.

			regards, tom lane
Alan Hodgson <ahodgson@simkin.ca> writes:
> I have done PITR recoveries (unfortunately). The log replay time
> dwarfs the time it takes to copy the index files back over (it is,
> honestly, slower than the original transactions were).

Yeah :-( ... the problem is that recovery is serialized, while the
original transactions might have run in parallel. There is work going on
to improve that, but I don't know how much better it's likely to get.

			regards, tom lane
Tom Lane wrote:
> Mainly because the idea doesn't seem to make sense unless that's part
> of the package. If you don't cut index changes out of the WAL load
> then the savings on the base backup alone aren't going to be all that
> exciting when you consider the total cost of PITR backup.

In our setting, I think it might be more exciting than you think. As I
said, I've not noticed any real impact to the system related to WAL
exporting, but the nightly backup does indeed have a significant impact
because of how long it runs. WAL export is a couple of seconds every few
minutes, which nobody ever notices. The backup runs for a minimum of an
hour and fifteen minutes, which people definitely notice.

> Furthermore, you would need some very ugly hacks on the recovery process
> to make it ignore (rather than try to apply) WAL records relating to
> indexes. I believe there are a fair number of cases where the recovery
> process doesn't even know that a particular file is an index, because
> the WAL stream doesn't tell it. The live backends generating the WAL
> log entries typically know that (and could suppress the entries) but the
> recovery process has only a very limited view of reality. It cannot,
> for example, trust the system catalogs to be in a correct/consistent
> state, so it couldn't look up the info for itself.

Could the live backends label the log entries with "hints" to be used by
the replay process? In this case, I would think a simple flag indicating
whether replay is critical or not would suffice.

> BTW, there's a related problem with the idea, which is that the
> tools normally used to take base backups haven't got any way to
> distinguish indexes from any other kind of relation.

Yes, there's no doubt it would increase the complexity of the base backup,
IF a person chooses to ignore indexes. The upside is that people who are
happy with the backup as it is would have to do nothing at all; it would
just continue to work as it does now. To ignore indexes (and only certain
indexes at that), you'd have to examine the system catalog as part of each
backup. I already do that to some extent, in order to discover all the
extra tablespaces that need to be backed up.

I guess the biggest problem I see with this is that it would have rather a
small target audience.

-Glen
On Thu, 2009-03-12 at 12:57 -0700, Glen Parker wrote:
> It isn't the bulk so much as the amount of time, and the impact to the
> running system during that time, that it takes to execute the base
> backup.
>
> I haven't noticed any real impact related to compressing and exporting
> WAL files.
>
> Anyway, more to the point, I'm not knowingly proposing anything that
> should cause reduced system reliability in the event of a crash.

Glen,

Thanks for bringing your ideas to the list.

The idea of auto-rebuilding indexes following recovery has already been
proposed, so it is under consideration. It hasn't been proposed in
relation to the use case you mention, so that is new.

If we did as you suggest, it would speed up the base backup but would also
add index rebuild time onto the end of any recovery. If the database is so
large that base backup time is significant, then rebuild time for *all*
indexes will be very significant. It seems an attractive proposition, but
one that could lead to significant regret if disaster did strike.

So I think it's a good idea, but on balance not one that enough people
would use to make it worth implementing of itself. Auto-rebuilding damaged
indexes is on the radar, however, so if we're close enough we may be able
to do something useful along the lines you suggest.

--
Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> The idea of auto-rebuilding indexes following recovery has already been
> proposed, so it is under consideration. It hasn't been proposed in
> relation to the use case you mention, so that is new.
>
> If we did as you suggest, it would speed up the base backup but would
> also add index rebuild time onto the end of any recovery.

Forgive me for being stupid, but isn't a time when all the data for a
table is being streamed in, during restore, the *perfect* time to build an
index? Why wait until after the restore?

Cheers,
  Jeremy
Jeremy Harris wrote:
> Forgive me for being stupid, but isn't a time when all the data for
> a table is being streamed in, during restore, the *perfect* time
> to build an index? Why wait until after the restore?

Not all the data streams in. Most of it is there before the WAL replay
begins.

-Glen
Glen Parker wrote:
> pg_dump is not acceptable to us because of the potential to lose many
> hours of valuable data. Why would pg_dump even be relevant to this
> discussion? PITR offers a benefit that pg_dump does not, a benefit
> that we, and countless other organizations, obviously find useful.
> Suggesting that a person who's been managing PG in a commercial setting
> since version 6.4 should just use pg_dump as an alternative to PITR is,
> well, rather insulting.

Dude, you were the one who started with calling other people's ideas
"silly". To be aggrieved when they respond in kind is, well, silly.

--
Lew