Thread: Re: NOT LOGGED options (was Point in Time Recovery )

Re: NOT LOGGED options (was Point in Time Recovery )

From
"Simon@2ndquadrant.com"
Date:
> Tom Lane wrote
> "Simon@2ndquadrant.com" <simon@2ndquadrant.com> writes:
> > It would be my intention (in 8.1) to make those available via
> switches e.g.
> > NOT LOGGED options on CREATE INDEX and COPY, to allow users to take
> > advantage of the no logging optimization without turning off PITR system
> > wide. (Just as this is possible in Oracle and Teradata).
>
> Isn't this in direct conflict with your opinion above?  And I cannot say
> that I think this one is a good idea.  We do not have support for
> selective catalog xlogging; if you do something like this then you
> *will* have a broken database after recovery, because it will contain
> those indexes but with invalid contents.

No, its not in direct conflict. Turning OFF archive_mode would have a system
wide effect. The options described allow individual applications to make a
choice about whether certain very large operations are recoverable, or not.
I don't ever personally want to turn off system wide PITR, but there will be
times when I choose to avoid overhead on individual ops when the situation
dictates. This goes with your oft-mentioned dislike of systems that think
they know better than you do...

The first two optimizations have been included in 8.0 when archive_mode is
off. If there is a problem, then it will effect crash recovery of those
systems also. I suggest using exactly this optimisation, though under user
(application) control, rather than sysadmin control.

The challenges you mention have a solution. I wanted to add these to TODO,
not yet to discuss detailed implementation.

> > I would also aim to make the first Insert Select into an empty table not
> > logged (optionally). This is an important optimization for
> Oracle, teradata
> > and DB2 (which uses NOT LOGGED INITIALLY).
>
> This is even worse: not only do you have a broken database, but you have
> no way to recover.  (At least with an unlogged index you could fix it by
> REINDEX.)  If you don't care about longevity of the table, then make it
> a temp table.
>

It is frequently possible to use that route, though the option remains in
frequent use in other situations.

> The fact that Oracle does it does not automatically make it a good idea.
>

Amen to that. You will note that unless compatability has been a
requirement, there have been times I have not followed the Oracle path, e.g.
PITR design.

I admit it must seem strange that I tried so hard to put PITR in place, only
to suggest removing it, optionally...

Overall, the options I describe here have been in production use in major
enterprise Data Warehouse systems for almost 15 years now. Oracle and DB2
copied the original Teradata implementation; slowly because, they too,
didn't quickly or easily accept the wisdom. There is abosultely no doubt of
the true value of these optimisations - the TPC-H tests for all vendors make
use of those (hidden in the details of which load utility options are used,
or simply the default behaviour).

Logging only has value when the mean time to recover is low enough to make
recovery worthwhile. This can catch you in a bind because you have to decide
whether to reduce MTTR at the expense of 100% data recovery. For some big
systems, recovery is only an option if you exclude the biggest table(s). In
a Data Warehouse, where data is loaded in large volumes, it may only be
feasible to load it when you have this optimisation. In a recovery
situation, re-loading the largest fact tables from their original source
data files is more likely to be the best option, or in some cases, skipped
entirely in favour of loading new data.

I don't claim that everybody would want this, only that it is an extremely
beneficial optimisation for many very large databases - which is much of my
focus.

You've pointed out that I'm new "round here", which is certainly true - but
I have been many places... There are and will be many differences in
thinking that emerge from this; I regard all of this as synergy, not
argument.

Best Regards, Simon Riggs



Re: NOT LOGGED options (was Point in Time Recovery )

From
Manfred Spraul
Date:
Simon@2ndquadrant.com wrote:

>>Tom Lane wrote
>>    
>>
>>>NOT LOGGED options on CREATE INDEX and COPY, to allow users to take
>>>advantage of the no logging optimization without turning off PITR system
>>>wide. (Just as this is possible in Oracle and Teradata).
>>>      
>>>
>>Isn't this in direct conflict with your opinion above?  And I cannot say
>>that I think this one is a good idea.  We do not have support for
>>selective catalog xlogging;
>>
Is it possible to skip the xlog fsync for NOT LOGGED transactions?

--   Manfred


Re: NOT LOGGED options (was Point in Time Recovery )

From
"Simon Riggs"
Date:
> Manfred Spraul
> Simon@2ndquadrant.com wrote:
>
> >>Tom Lane wrote
> >>
> >>
> >>>NOT LOGGED options on CREATE INDEX and COPY, to allow users to take
> >>>advantage of the no logging optimization without turning off
> PITR system
> >>>wide. (Just as this is possible in Oracle and Teradata).
> >>>
> >>>
> >>Isn't this in direct conflict with your opinion above?  And I cannot say
> >>that I think this one is a good idea.  We do not have support for
> >>selective catalog xlogging;
> >>
> Is it possible to skip the xlog fsync for NOT LOGGED transactions?
>

Hmm...good thinking...however,

For very large operations, its the volume of the xlog writes thats the
problem, not the fsync of the logs. The type of things I'm thinking about
are large CREATE INDEX and large COPY operations, for very large tasks i.e.
> 1Gb. These are most useful in data warehousing operations - which is about
20% of the user base according to the survey stats from www.postgresql.org.

The wal buffer only gets synced at end of transaction, or when the buffer is
full. On long operations there is still only one commit, so not fsyncing
there won't gain much. The buffer will fill up repeatedly and require
flushing - which you can't really skip because when you get to the commit
you need to be certain that everything is down to disk - there's not much
point fsyncing the commit if the previous wal records haven't been.

If there was a way to tell whether a block in the wal buffer had been
written by a NOT LOGGED transaction, then it might be possible to vary the
fsync behaviour accordingly. That's a good idea if thats what you meant,
though it would mean changing some critical, well tested code that every wal
record goes through. I'd rather simply not write wal at all for the certain
specific situations when the user requests it - there are already decision
points in the code for both the situations I've mentioned, since these have
been optimised in 8.0 for when archive_command has not been set. It would be
a simply matter to add in a check at that point.

Anyway...this is probably 8.1 stuff now.

Best Regards, Simon Riggs