Thread: code question: storing INTO relation

code question: storing INTO relation

From

Neil Conway

Date:

18 October 2004, 07:41:15

I've got the CREATE TABLE AS restructuring almost finished, but came
across something that I could use some advice on. The current code
stores the "into" relation (and whether or not that relation has OIDs)
in the Query struct. This is ugly[1], but I'm not sure how to fix it.

The main reason Query needs to hold this data is so that InitPlan() can
produce the correct TupleDesc for the query's result set in
ExecutorStart() (it needs to know, for example, whether to include space
for OIDs in the TupleDesc).

Based on that, I think that moving this information into QueryDesc might
be doable, but I'm not sure if this will have negative implications
anywhere.

Any thoughts?

-Neil

[1] IMHO it is ugly because:

(a) "Query" is the product of the analyzer, but it is possible for the
INTO relation to change between parse-analysis and execution (consider
CREATE TABLE AS ... EXECUTE). We currently cope with this by copying the
Query and modifying it.

(b) The INTO relation is really a property of the statement to which the
SELECT/EXECUTE is attached, not the SELECT itself. In other words, in a
statement like:
   CREATE TABLE xyz AS SELECT ...;

The "destination" of the SELECT query is a property of the CREATE TABLE
AS statement, not the SELECT. The new implementation of CREATE TABLE AS
works just like this: the executor is run as normal, but it just uses a
special DestReceiver that dumps the SELECTs result set into a
newly-created heap table. Needing to modify the attached SELECT's Query
node to "let it know" that it is being invoked by CREATE TABLE AS is
ugly.

Re: code question: storing INTO relation

From

Tom Lane

Date:

19 October 2004, 00:37:21

Neil Conway <neilc@samurai.com> writes:
> I've got the CREATE TABLE AS restructuring almost finished, but came
> across something that I could use some advice on. The current code
> stores the "into" relation (and whether or not that relation has OIDs)
> in the Query struct. This is ugly[1], but I'm not sure how to fix it.

It strikes me that as far as the executor is concerned, CREATE TABLE AS
would be better treated as an INSERT (ie, pretend it was CREATE TABLE
followed by INSERT/SELECT).  If you did that then the idea would be to
create the new table and add it to the query range table.  Then you need
neither any special-case code in execMain, nor a new DestReceiver.

You'd still want to copy/modify the Query, but to convert it from a
SELECT into an INSERT, which is a pretty clean transformation I think.
        regards, tom lane

Re: code question: storing INTO relation

From

Gavin Sherry

Date:

19 October 2004, 01:09:31

On Mon, 18 Oct 2004, Tom Lane wrote:

> Neil Conway <neilc@samurai.com> writes:
> > I've got the CREATE TABLE AS restructuring almost finished, but came
> > across something that I could use some advice on. The current code
> > stores the "into" relation (and whether or not that relation has OIDs)
> > in the Query struct. This is ugly[1], but I'm not sure how to fix it.
>
> It strikes me that as far as the executor is concerned, CREATE TABLE AS
> would be better treated as an INSERT (ie, pretend it was CREATE TABLE
> followed by INSERT/SELECT).  If you did that then the idea would be to
> create the new table and add it to the query range table.  Then you need
> neither any special-case code in execMain, nor a new DestReceiver.
>
> You'd still want to copy/modify the Query, but to convert it from a
> SELECT into an INSERT, which is a pretty clean transformation I think.

Just thinking of optimisations, I wonder if it would be possible to bypass
WAL (like we do for ALTER TABLE ... SET TABLESPACE) if archiving is
disabled, and fsync the newly created relfilenode  after the data is inserted.

Gavin

Re: code question: storing INTO relation

From

"Simon Riggs"

Date:

21 October 2004, 22:54:48

> Gavin Sherry
> Just thinking of optimisations, I wonder if it would be possible to bypass
> WAL (like we do for ALTER TABLE ... SET TABLESPACE) if archiving is
> disabled, and fsync the newly created relfilenode  after the data is
inserted.
>

Yes please. This is good for Data Warehousing ELT-style operations.
(Extract, Load, Transform).

If I could go further, I'd like to add this as an option on the command if
possible, rather than a presumption that all such statements should not be
logged.

Thanks,

Best Regards, Simon Riggs

Re: code question: storing INTO relation

From

Neil Conway

Date:

22 October 2004, 00:48:38

On Fri, 2004-10-22 at 07:54, Simon Riggs wrote:
> If I could go further, I'd like to add this as an option on the command if
> possible, rather than a presumption that all such statements should not be
> logged.

Why is that necessary?

-Neil

Re: code question: storing INTO relation

From

"Simon Riggs"

Date:

22 October 2004, 10:39:24

> Neil Conway
> On Fri, 2004-10-22 at 07:54, Simon Riggs wrote:
> > If I could go further, I'd like to add this as an option on the command
if
> > possible, rather than a presumption that all such statements should not
be
> > logged.
>
> Why is that necessary?
>

So you can choose whether to do this or not.  IMHO, it is important to have
the optimization, but it shouldn't be the case that EVERY statement is
forced not to log.

If I risk data loss, I'd like it to be my choice to do this. This effects
COPY and CREATE INDEX also, though I know you aren't working on those at
present. I will implement thos eif tyou do this for CTAS.

DB2 supports this as a user option via NOT LOGGED INITIALLY
http://publib.boulder.ibm.com/infocenter/db2help/index.jsp?topic=/com.ibm.db2.udb.doc/admin/c0006079.htm

Oracle supports this as a user option via NOLOGGING
http://www.devx.com/dbzone/10MinuteSolution/22191/0/page/3
(The Oracle manuals aren't available online)

Teradata also supports this...

These are quoted only to show that others have found this a useful
optimization, not because I believe we should slavishly follow others in
their syntax.

Tom's view previously was, IIRC, that you can do this with a Temporary
Table, which is true, but that isn't very helpful when you actually want the
table to be permanent.

Best Regards, Simon Riggs

Re: code question: storing INTO relation

From

"Michael Paesold"

Date:

22 October 2004, 11:14:13

Simon Riggs wrote:

> Neil Conway wrote:
>> Why is that necessary?
>
> So you can choose whether to do this or not.  IMHO, it is important to 
> have
> the optimization, but it shouldn't be the case that EVERY statement is
> forced not to log.
>
> If I risk data loss, I'd like it to be my choice to do this. This effects
> COPY and CREATE INDEX also, though I know you aren't working on those at
> present. I will implement thos eif tyou do this for CTAS.

If I understand the original proposal correctly, there is no risk of data 
loss except in a temporary file. The data would be copied into a new file 
(without wal-logging), but after that, the file would be fsynced and the 
resulting changes would indeed be logged to WAL.

So if there is a crash during the copy, nothing valuable will be lost at 
all. If there is a crash after transaction commit of that transaction, 
nothing will be lost, again... the new file will have been fsynced already 
and everything else will be in WAL.

Best Regards,
Michael Paesold

Re: code question: storing INTO relation

From

Greg Stark

Date:

22 October 2004, 17:54:56

"Michael Paesold" <mpaesold@gmx.at> writes:

> If I understand the original proposal correctly, there is no risk of data loss
> except in a temporary file. The data would be copied into a new file (without
> wal-logging), but after that, the file would be fsynced and the resulting
> changes would indeed be logged to WAL.
> 
> So if there is a crash during the copy, nothing valuable will be lost at all.
> If there is a crash after transaction commit of that transaction, nothing will
> be lost, again... the new file will have been fsynced already and everything
> else will be in WAL.

This is one of the reasons CREATE TABLE AS and SELECT ... INTO ... are _not_
necessarily the same. In the first case the table didn't exist at all prior to
the statement. Therefore there's no need to log any modifications to the
table. If there's a crash the initial creation of the table should be rolled
back and none of the data needs to be retained.

In Oracle CREATE TABLE AS is much faster than SELECT ... INTO ... for
basically this reason.

In Postgres CREATE TABLE AS is currently being treated as a synonym for SELECT
... INTO ... So I think this may be an awkward feature to add. Also, like
reindex the logging would still be necessary for online backups. So this may
be a dead-end direction in the long term.

-- 
greg

Re: code question: storing INTO relation

From

Tom Lane

Date:

22 October 2004, 18:34:17

Greg Stark <gsstark@mit.edu> writes:
> This is one of the reasons CREATE TABLE AS and SELECT ... INTO ... are _not_
> necessarily the same.

Sure they are.  Are you confusing this with INSERT ... SELECT ?
        regards, tom lane

Re: code question: storing INTO relation

From

"Michael Paesold"

Date:

22 October 2004, 19:19:53

Greg Stark wrote:
> In Postgres CREATE TABLE AS is currently being treated as a synonym for 
> SELECT
> ... INTO ... So I think this may be an awkward feature to add. Also, like
> reindex the logging would still be necessary for online backups. So this 
> may
> be a dead-end direction in the long term.

Putting aside the CREATE TABLE AS/SELECT ... INTO vs INSERT ... SELECT 
confusion, online backups seem to be an issue.

Is there any command currently, that circumvents WAL and the result is still 
consistent with PITR? How is this done in other databases?
Implementing features that don't work with PITR does not feel right at all, 
of course, since eventually many if not most of the people will use 
continues logging.

Best Regards,
Michael

Re: code question: storing INTO relation

From

Greg Stark

Date:

22 October 2004, 19:26:51

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > This is one of the reasons CREATE TABLE AS and SELECT ... INTO ... are _not_
> > necessarily the same.
> 
> Sure they are.  Are you confusing this with INSERT ... SELECT ?

Uhm. oops.

-- 
greg

Re: code question: storing INTO relation

From

Simon Riggs

Date:

22 October 2004, 20:15:15

On Fri, 2004-10-22 at 19:20, Michael Paesold wrote:
> Greg Stark wrote:
> > In Postgres CREATE TABLE AS is currently being treated as a synonym for 
> > SELECT
> > ... INTO ... So I think this may be an awkward feature to add. Also, like
> > reindex the logging would still be necessary for online backups. So this 
> > may
> > be a dead-end direction in the long term.
> 
> Putting aside the CREATE TABLE AS/SELECT ... INTO vs INSERT ... SELECT 
> confusion, online backups seem to be an issue.

...*the* issue...

> 
> Is there any command currently, that circumvents WAL and the result is still 
> consistent with PITR? 

None, nor could there be.

> How is this done in other databases?

As I have said in previous posts, the user-level feature to turn off
logging for certain large operations is a feature offered by DB2, Oracle
and Teradata and has been in production use for around 15 years in many
large datawarehouses.

> Implementing features that don't work with PITR does not feel right at all, 
> of course, since eventually many if not most of the people will use 
> continues logging.
> 

I agree, hence why this should be a user option. The usage of this is
restricted to particular classes of database usage: data warehousing or
very large database applications. This isn't intended for use in OLTP or
web-site databases.

"feel right" is a subjective experience. I argue for the option only,
not to force everybody to use it if it is not sensible for your
application. Not all databases have the same purpose....

On this, and other issues, I will argue for further Data Warehousing
features, as options, in PostgreSQL. My reading of the user base is that
around 50% of users (Scientific + Data Warehousing users) would benefit
from improvements in this area.

-- 
Best Regards, Simon Riggs

Re: code question: storing INTO relation

From

Greg Stark

Date:

23 October 2004, 00:29:53

Simon Riggs <simon@2ndquadrant.com> writes:

> I agree, hence why this should be a user option. The usage of this is
> restricted to particular classes of database usage: data warehousing or
> very large database applications. This isn't intended for use in OLTP or
> web-site databases.

Well a lot of users also just don't use online backups. For these users
there's no downside to CREATE INDEX/REINDEX/CREATE TABLE AS not logging. 

The logging is not needed for a regular recovery, only for restoring from
and a backup older than the table/index creation.

-- 
greg

Re: code question: storing INTO relation

From

Simon Riggs

Date:

25 October 2004, 20:51:55

On Sat, 2004-10-23 at 00:29, Greg Stark wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> 
> > I agree, hence why this should be a user option. The usage of this is
> > restricted to particular classes of database usage: data warehousing or
> > very large database applications. This isn't intended for use in OLTP or
> > web-site databases.
> 
> Well a lot of users also just don't use online backups. For these users
> there's no downside to CREATE INDEX/REINDEX/CREATE TABLE AS not logging. 
> 

Yes, you're right. I'm just aiming higher, that's all...

A DW with large fact tables will benefit from the optimisation, since
the data loading can often be used to recover the database if required.
Reference data tables don't benefit from the optimization since they are
smaller and much easier to backup/recover. We want to join the fact
tables to the reference data tables, so would like both to exist in a
database that has BOTH PITR and non-logged bulk operations.

The alternative is to have an ODS that uses PITR, alongside a DW that
doesn't, though with data copying from the ODS to the DW. The latter
step is a time-waster I see no reason to encourage.

Anyway... I see no huge agreement with my viewpoint, so I'll just add it
to my own list...

-- 
Best Regards, Simon Riggs

Re: code question: storing INTO relation

From

Bruce Momjian

Date:

12 November 2004, 18:12:18

Added to TODO:
* Eliminate WAL logging for CREATE INDEX/REINDEX/CREATE TABLE AS  when not doing WAL archiving

---------------------------------------------------------------------------

Greg Stark wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> 
> > I agree, hence why this should be a user option. The usage of this is
> > restricted to particular classes of database usage: data warehousing or
> > very large database applications. This isn't intended for use in OLTP or
> > web-site databases.
> 
> Well a lot of users also just don't use online backups. For these users
> there's no downside to CREATE INDEX/REINDEX/CREATE TABLE AS not logging. 
> 
> The logging is not needed for a regular recovery, only for restoring from
> and a backup older than the table/index creation.
> 
> -- 
> greg
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: code question: storing INTO relation

From

Greg Stark

Date:

12 November 2004, 18:48:17

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> Added to TODO:
> 
>     * Eliminate WAL logging for CREATE INDEX/REINDEX/CREATE TABLE AS
>       when not doing WAL archiving

I think that's already done for CREATE INDEX/REINDEX. The reason I mentioned
them was to provide precedent that CREATE TABLE AS could be treated that way.

-- 
greg

Re: code question: storing INTO relation

From

Bruce Momjian

Date:

12 November 2004, 23:01:26

Greg Stark wrote:
> 
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> 
> > Added to TODO:
> > 
> >     * Eliminate WAL logging for CREATE INDEX/REINDEX/CREATE TABLE AS
> >       when not doing WAL archiving
> 
> I think that's already done for CREATE INDEX/REINDEX. The reason I mentioned
> them was to provide precedent that CREATE TABLE AS could be treated that way.

I don't think so.  Can someone confirm?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: code question: storing INTO relation

From

Tom Lane

Date:

12 November 2004, 23:13:36

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Greg Stark wrote:
>> I think that's already done for CREATE INDEX/REINDEX.

> I don't think so.  Can someone confirm?

Greg is correct --- at least for btree build, which is the only index
type we have WAL-ified at all :-(
        regards, tom lane

Re: code question: storing INTO relation

From

Bruce Momjian

Date:

12 November 2004, 23:28:46

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Greg Stark wrote:
> >> I think that's already done for CREATE INDEX/REINDEX.
> 
> > I don't think so.  Can someone confirm?
> 
> Greg is correct --- at least for btree build, which is the only index
> type we have WAL-ified at all :-(

OK, TODO updated:
* Eliminate WAL logging for CREATE TABLE AS when not doing WAL  archiving

FYI, we do have a TODO for the other index methods:
* Add WAL index reliability improvement to non-btree indexes

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073

Re: code question: storing INTO relation

From

Kevin Brown

Date:

12 November 2004, 23:50:53

Bruce Momjian wrote:
> 
> Added to TODO:
> 
>     * Eliminate WAL logging for CREATE INDEX/REINDEX/CREATE TABLE AS
>       when not doing WAL archiving

But that means that these operations can't be performed in a
transaction unless WAL archiving is on, right?  (If I misunderstand
and thus am wrong about this then you can ignore the rest of what
follows, of course)

Suppose I do the following within a transaction with WAL archiving
turned off:

CREATE TABLE blah (x integer primary key, y varchar(32));
INSERT INTO blah SELECT x, y FROM foo;

and then roll it back because something didn't work out the way I
wanted it to.

If CREATE INDEX isn't WAL logged, then either the rollback of the
above will result in an inconsistent database, or the fact that CREATE
INDEX isn't WAL logged will cause the CREATE TABLE to fail because the
index creation (and thus the table creation) can't be done within a
transaction.

Being able to do DDL within a transaction is one of the coolest and
(from the POV of a DBA) most useful features of the database.  If
we're going to eliminate WAL logging of certain operations, it should
be done at the explicit request of the DBA, preferably through a GUC,
and preferably with a relevant GUC for each operation.

Since WAL archiving requires that the operations in question be WAL
logged, the GUCs that control WAL logging of those statements would
clearly be ineffective if WAL archiving is turned on.

-- 
Kevin Brown                          kevin@sysexperts.com

Re: code question: storing INTO relation

From

Alvaro Herrera

Date:

13 November 2004, 03:33:28

On Fri, Nov 12, 2004 at 03:49:44PM -0800, Kevin Brown wrote:
> Bruce Momjian wrote:
> > 
> > Added to TODO:
> > 
> >     * Eliminate WAL logging for CREATE INDEX/REINDEX/CREATE TABLE AS
> >       when not doing WAL archiving
> 
> But that means that these operations can't be performed in a
> transaction unless WAL archiving is on, right?

No, because we ensure the transactionality of the operation by fsync'ing
the involved files.  And the rollback-ability is ensured by registering
the files for deletion at rollback.  (Just as we register the files for
deletion at commit in case of a DROP TABLE, to make sure we can roll it
back.)

So don't worry, because what you want to do still works.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)

Re: code question: storing INTO relation

From

Simon Riggs

Date:

14 November 2004, 11:09:32

On Fri, 2004-11-12 at 23:13, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Greg Stark wrote:
> >> I think that's already done for CREATE INDEX/REINDEX.
> 
> > I don't think so.  Can someone confirm?
> 
> Greg is correct --- at least for btree build, which is the only index
> type we have WAL-ified at all :-(

[well...at least they're optimized then...  :) ]

With regard to the other index types, my opinion was:
HASH - works OK, but a pain to administer, no huge benefit in using
R-TREE - slightly broken in places, limited in usablity
GiST - index of choice for PostGIS, TSearch2, in need of optimization

Following recent optimization work on GiST, it now seems worth the
trouble to add WAL logging to it. ISTM that the other two aren't widely
used enough to make it worthwhile to spend time on, evidence for which
is also that no one ever has, up 'til now.

Time-management seems to be the key to making progress in the most
important areas...

-- 
Best Regards, Simon Riggs

Re: code question: storing INTO relation

From

Simon Riggs

Date:

14 November 2004, 11:30:59

On Fri, 2004-11-12 at 23:13, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Greg Stark wrote:
> >> I think that's already done for CREATE INDEX/REINDEX.
> 
> > I don't think so.  Can someone confirm?
> 
> Greg is correct --- at least for btree build, which is the only index
> type we have WAL-ified at all :-(
> 

Is there a place (or a single best place) to document this behaviour?
- with each command?
- in the backup section?
- in runtime?

Seems a shame to optimize and not tell anyone.

-- 
Best Regards, Simon Riggs

Re: code question: storing INTO relation

From

Tom Lane

Date:

14 November 2004, 17:58:37

Simon Riggs <simon@2ndquadrant.com> writes:
> On Fri, 2004-11-12 at 23:13, Tom Lane wrote:
>> Greg is correct --- at least for btree build, which is the only index
>> type we have WAL-ified at all :-(

> Is there a place (or a single best place) to document this behaviour?

If you're talking about the lack of WAL backup for non-btree indexes,
it is documented (at the end of the PITR section IIRC).

If you're talking about the optimization of not logging index builds,
I don't see a need to document that per se.  Ordinary users shouldn't
need to care, mainly because they can't affect it one way or the other.
Anyone who does care can look at the code and see how it's done.  (Open
source has a big advantage over closed source in that regard, and I
think it's reasonable to have different documentation practices than
closed-source products would use.)
        regards, tom lane

Re: code question: storing INTO relation

From

Neil Conway

Date:

14 November 2004, 23:00:14

On Sun, 2004-11-14 at 11:06 +0000, Simon Riggs wrote:
> HASH - works OK, but a pain to administer, no huge benefit in using

At least in theory, I think this could offer better performance for
equality searches than b+-tree. Given how common those kinds of queries
are, I still think hash indexes are worth putting some time into. My
guess is that their relatively poor performance at present (relative to
b+-trees) is just a reflection of how much more tuning and design work
has gone into the b+-tree code than the hash code.

> R-TREE - slightly broken in places, limited in usablity

I agree. I hope that when we have a good GiST infrastructure,
implementing rtree via GiST will offer performance that is as good as or
better than the builtin rtree.

> GiST - index of choice for PostGIS, TSearch2, in need of optimization

I'm working on adding page-level locking and WAL safety, although this
is a pretty difficult project. Gavin and I are also looking at
algorithms for bulk loading GiST indexes, although I'm not yet sure how
possible that will be.

-Neil

Re: code question: storing INTO relation

From

Simon Riggs

Date:

14 November 2004, 23:42:14

On Sun, 2004-11-14 at 22:59, Neil Conway wrote:
> On Sun, 2004-11-14 at 11:06 +0000, Simon Riggs wrote:
> > HASH - works OK, but a pain to administer, no huge benefit in using
> 
> At least in theory, I think this could offer better performance for
> equality searches than b+-tree. Given how common those kinds of queries
> are, I still think hash indexes are worth putting some time into. My
> guess is that their relatively poor performance at present (relative to
> b+-trees) is just a reflection of how much more tuning and design work
> has gone into the b+-tree code than the hash code.

Can be faster for equality searches on a fairly static table; on a
growing table, could be same or worse. IMHO The theoretical difference
in speed doesn't seem worth the effort of spending additional time in
that part of the code, given the inherent pain of REINDEX.

> > GiST - index of choice for PostGIS, TSearch2, in need of optimization
> 
> I'm working on adding page-level locking and WAL safety, although this
> is a pretty difficult project. 

Difficult, yes. I'm glad you're stepping up to the plate for the WAL
safety.

Two index types is sufficient, and ISTM should be the maximum therefore.
When you've finished tuning GiST, I wager that you will agree :)

-- 
Best Regards, Simon Riggs