Thread: code question: storing INTO relation
I've got the CREATE TABLE AS restructuring almost finished, but came across something that I could use some advice on. The current code stores the "into" relation (and whether or not that relation has OIDs) in the Query struct. This is ugly[1], but I'm not sure how to fix it. The main reason Query needs to hold this data is so that InitPlan() can produce the correct TupleDesc for the query's result set in ExecutorStart() (it needs to know, for example, whether to include space for OIDs in the TupleDesc). Based on that, I think that moving this information into QueryDesc might be doable, but I'm not sure if this will have negative implications anywhere. Any thoughts? -Neil [1] IMHO it is ugly because: (a) "Query" is the product of the analyzer, but it is possible for the INTO relation to change between parse-analysis and execution (consider CREATE TABLE AS ... EXECUTE). We currently cope with this by copying the Query and modifying it. (b) The INTO relation is really a property of the statement to which the SELECT/EXECUTE is attached, not the SELECT itself. In other words, in a statement like: CREATE TABLE xyz AS SELECT ...; The "destination" of the SELECT query is a property of the CREATE TABLE AS statement, not the SELECT. The new implementation of CREATE TABLE AS works just like this: the executor is run as normal, but it just uses a special DestReceiver that dumps the SELECTs result set into a newly-created heap table. Needing to modify the attached SELECT's Query node to "let it know" that it is being invoked by CREATE TABLE AS is ugly.
Neil Conway <neilc@samurai.com> writes: > I've got the CREATE TABLE AS restructuring almost finished, but came > across something that I could use some advice on. The current code > stores the "into" relation (and whether or not that relation has OIDs) > in the Query struct. This is ugly[1], but I'm not sure how to fix it. It strikes me that as far as the executor is concerned, CREATE TABLE AS would be better treated as an INSERT (ie, pretend it was CREATE TABLE followed by INSERT/SELECT). If you did that then the idea would be to create the new table and add it to the query range table. Then you need neither any special-case code in execMain, nor a new DestReceiver. You'd still want to copy/modify the Query, but to convert it from a SELECT into an INSERT, which is a pretty clean transformation I think. regards, tom lane
On Mon, 18 Oct 2004, Tom Lane wrote: > Neil Conway <neilc@samurai.com> writes: > > I've got the CREATE TABLE AS restructuring almost finished, but came > > across something that I could use some advice on. The current code > > stores the "into" relation (and whether or not that relation has OIDs) > > in the Query struct. This is ugly[1], but I'm not sure how to fix it. > > It strikes me that as far as the executor is concerned, CREATE TABLE AS > would be better treated as an INSERT (ie, pretend it was CREATE TABLE > followed by INSERT/SELECT). If you did that then the idea would be to > create the new table and add it to the query range table. Then you need > neither any special-case code in execMain, nor a new DestReceiver. > > You'd still want to copy/modify the Query, but to convert it from a > SELECT into an INSERT, which is a pretty clean transformation I think. Just thinking of optimisations, I wonder if it would be possible to bypass WAL (like we do for ALTER TABLE ... SET TABLESPACE) if archiving is disabled, and fsync the newly created relfilenode after the data is inserted. Gavin
> Gavin Sherry > Just thinking of optimisations, I wonder if it would be possible to bypass > WAL (like we do for ALTER TABLE ... SET TABLESPACE) if archiving is > disabled, and fsync the newly created relfilenode after the data is inserted. > Yes please. This is good for Data Warehousing ELT-style operations. (Extract, Load, Transform). If I could go further, I'd like to add this as an option on the command if possible, rather than a presumption that all such statements should not be logged. Thanks, Best Regards, Simon Riggs
On Fri, 2004-10-22 at 07:54, Simon Riggs wrote: > If I could go further, I'd like to add this as an option on the command if > possible, rather than a presumption that all such statements should not be > logged. Why is that necessary? -Neil
> Neil Conway > On Fri, 2004-10-22 at 07:54, Simon Riggs wrote: > > If I could go further, I'd like to add this as an option on the command if > > possible, rather than a presumption that all such statements should not be > > logged. > > Why is that necessary? > So you can choose whether to do this or not. IMHO, it is important to have the optimization, but it shouldn't be the case that EVERY statement is forced not to log. If I risk data loss, I'd like it to be my choice to do this. This effects COPY and CREATE INDEX also, though I know you aren't working on those at present. I will implement thos eif tyou do this for CTAS. DB2 supports this as a user option via NOT LOGGED INITIALLY http://publib.boulder.ibm.com/infocenter/db2help/index.jsp?topic=/com.ibm.db2.udb.doc/admin/c0006079.htm Oracle supports this as a user option via NOLOGGING http://www.devx.com/dbzone/10MinuteSolution/22191/0/page/3 (The Oracle manuals aren't available online) Teradata also supports this... These are quoted only to show that others have found this a useful optimization, not because I believe we should slavishly follow others in their syntax. Tom's view previously was, IIRC, that you can do this with a Temporary Table, which is true, but that isn't very helpful when you actually want the table to be permanent. Best Regards, Simon Riggs
Simon Riggs wrote: > Neil Conway wrote: >> Why is that necessary? > > So you can choose whether to do this or not. IMHO, it is important to > have > the optimization, but it shouldn't be the case that EVERY statement is > forced not to log. > > If I risk data loss, I'd like it to be my choice to do this. This effects > COPY and CREATE INDEX also, though I know you aren't working on those at > present. I will implement thos eif tyou do this for CTAS. If I understand the original proposal correctly, there is no risk of data loss except in a temporary file. The data would be copied into a new file (without wal-logging), but after that, the file would be fsynced and the resulting changes would indeed be logged to WAL. So if there is a crash during the copy, nothing valuable will be lost at all. If there is a crash after transaction commit of that transaction, nothing will be lost, again... the new file will have been fsynced already and everything else will be in WAL. Best Regards, Michael Paesold
"Michael Paesold" <mpaesold@gmx.at> writes: > If I understand the original proposal correctly, there is no risk of data loss > except in a temporary file. The data would be copied into a new file (without > wal-logging), but after that, the file would be fsynced and the resulting > changes would indeed be logged to WAL. > > So if there is a crash during the copy, nothing valuable will be lost at all. > If there is a crash after transaction commit of that transaction, nothing will > be lost, again... the new file will have been fsynced already and everything > else will be in WAL. This is one of the reasons CREATE TABLE AS and SELECT ... INTO ... are _not_ necessarily the same. In the first case the table didn't exist at all prior to the statement. Therefore there's no need to log any modifications to the table. If there's a crash the initial creation of the table should be rolled back and none of the data needs to be retained. In Oracle CREATE TABLE AS is much faster than SELECT ... INTO ... for basically this reason. In Postgres CREATE TABLE AS is currently being treated as a synonym for SELECT ... INTO ... So I think this may be an awkward feature to add. Also, like reindex the logging would still be necessary for online backups. So this may be a dead-end direction in the long term. -- greg
Greg Stark <gsstark@mit.edu> writes: > This is one of the reasons CREATE TABLE AS and SELECT ... INTO ... are _not_ > necessarily the same. Sure they are. Are you confusing this with INSERT ... SELECT ? regards, tom lane
Greg Stark wrote: > In Postgres CREATE TABLE AS is currently being treated as a synonym for > SELECT > ... INTO ... So I think this may be an awkward feature to add. Also, like > reindex the logging would still be necessary for online backups. So this > may > be a dead-end direction in the long term. Putting aside the CREATE TABLE AS/SELECT ... INTO vs INSERT ... SELECT confusion, online backups seem to be an issue. Is there any command currently, that circumvents WAL and the result is still consistent with PITR? How is this done in other databases? Implementing features that don't work with PITR does not feel right at all, of course, since eventually many if not most of the people will use continues logging. Best Regards, Michael
Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@mit.edu> writes: > > This is one of the reasons CREATE TABLE AS and SELECT ... INTO ... are _not_ > > necessarily the same. > > Sure they are. Are you confusing this with INSERT ... SELECT ? Uhm. oops. -- greg
On Fri, 2004-10-22 at 19:20, Michael Paesold wrote: > Greg Stark wrote: > > In Postgres CREATE TABLE AS is currently being treated as a synonym for > > SELECT > > ... INTO ... So I think this may be an awkward feature to add. Also, like > > reindex the logging would still be necessary for online backups. So this > > may > > be a dead-end direction in the long term. > > Putting aside the CREATE TABLE AS/SELECT ... INTO vs INSERT ... SELECT > confusion, online backups seem to be an issue. ...*the* issue... > > Is there any command currently, that circumvents WAL and the result is still > consistent with PITR? None, nor could there be. > How is this done in other databases? As I have said in previous posts, the user-level feature to turn off logging for certain large operations is a feature offered by DB2, Oracle and Teradata and has been in production use for around 15 years in many large datawarehouses. > Implementing features that don't work with PITR does not feel right at all, > of course, since eventually many if not most of the people will use > continues logging. > I agree, hence why this should be a user option. The usage of this is restricted to particular classes of database usage: data warehousing or very large database applications. This isn't intended for use in OLTP or web-site databases. "feel right" is a subjective experience. I argue for the option only, not to force everybody to use it if it is not sensible for your application. Not all databases have the same purpose.... On this, and other issues, I will argue for further Data Warehousing features, as options, in PostgreSQL. My reading of the user base is that around 50% of users (Scientific + Data Warehousing users) would benefit from improvements in this area. -- Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes: > I agree, hence why this should be a user option. The usage of this is > restricted to particular classes of database usage: data warehousing or > very large database applications. This isn't intended for use in OLTP or > web-site databases. Well a lot of users also just don't use online backups. For these users there's no downside to CREATE INDEX/REINDEX/CREATE TABLE AS not logging. The logging is not needed for a regular recovery, only for restoring from and a backup older than the table/index creation. -- greg
On Sat, 2004-10-23 at 00:29, Greg Stark wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > > I agree, hence why this should be a user option. The usage of this is > > restricted to particular classes of database usage: data warehousing or > > very large database applications. This isn't intended for use in OLTP or > > web-site databases. > > Well a lot of users also just don't use online backups. For these users > there's no downside to CREATE INDEX/REINDEX/CREATE TABLE AS not logging. > Yes, you're right. I'm just aiming higher, that's all... A DW with large fact tables will benefit from the optimisation, since the data loading can often be used to recover the database if required. Reference data tables don't benefit from the optimization since they are smaller and much easier to backup/recover. We want to join the fact tables to the reference data tables, so would like both to exist in a database that has BOTH PITR and non-logged bulk operations. The alternative is to have an ODS that uses PITR, alongside a DW that doesn't, though with data copying from the ODS to the DW. The latter step is a time-waster I see no reason to encourage. Anyway... I see no huge agreement with my viewpoint, so I'll just add it to my own list... -- Best Regards, Simon Riggs
Added to TODO: * Eliminate WAL logging for CREATE INDEX/REINDEX/CREATE TABLE AS when not doing WAL archiving --------------------------------------------------------------------------- Greg Stark wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > > I agree, hence why this should be a user option. The usage of this is > > restricted to particular classes of database usage: data warehousing or > > very large database applications. This isn't intended for use in OLTP or > > web-site databases. > > Well a lot of users also just don't use online backups. For these users > there's no downside to CREATE INDEX/REINDEX/CREATE TABLE AS not logging. > > The logging is not needed for a regular recovery, only for restoring from > and a backup older than the table/index creation. > > -- > greg > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Added to TODO: > > * Eliminate WAL logging for CREATE INDEX/REINDEX/CREATE TABLE AS > when not doing WAL archiving I think that's already done for CREATE INDEX/REINDEX. The reason I mentioned them was to provide precedent that CREATE TABLE AS could be treated that way. -- greg
Greg Stark wrote: > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > Added to TODO: > > > > * Eliminate WAL logging for CREATE INDEX/REINDEX/CREATE TABLE AS > > when not doing WAL archiving > > I think that's already done for CREATE INDEX/REINDEX. The reason I mentioned > them was to provide precedent that CREATE TABLE AS could be treated that way. I don't think so. Can someone confirm? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Greg Stark wrote: >> I think that's already done for CREATE INDEX/REINDEX. > I don't think so. Can someone confirm? Greg is correct --- at least for btree build, which is the only index type we have WAL-ified at all :-( regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Greg Stark wrote: > >> I think that's already done for CREATE INDEX/REINDEX. > > > I don't think so. Can someone confirm? > > Greg is correct --- at least for btree build, which is the only index > type we have WAL-ified at all :-( OK, TODO updated: * Eliminate WAL logging for CREATE TABLE AS when not doing WAL archiving FYI, we do have a TODO for the other index methods: * Add WAL index reliability improvement to non-btree indexes -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian wrote: > > Added to TODO: > > * Eliminate WAL logging for CREATE INDEX/REINDEX/CREATE TABLE AS > when not doing WAL archiving But that means that these operations can't be performed in a transaction unless WAL archiving is on, right? (If I misunderstand and thus am wrong about this then you can ignore the rest of what follows, of course) Suppose I do the following within a transaction with WAL archiving turned off: CREATE TABLE blah (x integer primary key, y varchar(32)); INSERT INTO blah SELECT x, y FROM foo; and then roll it back because something didn't work out the way I wanted it to. If CREATE INDEX isn't WAL logged, then either the rollback of the above will result in an inconsistent database, or the fact that CREATE INDEX isn't WAL logged will cause the CREATE TABLE to fail because the index creation (and thus the table creation) can't be done within a transaction. Being able to do DDL within a transaction is one of the coolest and (from the POV of a DBA) most useful features of the database. If we're going to eliminate WAL logging of certain operations, it should be done at the explicit request of the DBA, preferably through a GUC, and preferably with a relevant GUC for each operation. Since WAL archiving requires that the operations in question be WAL logged, the GUCs that control WAL logging of those statements would clearly be ineffective if WAL archiving is turned on. -- Kevin Brown kevin@sysexperts.com
On Fri, Nov 12, 2004 at 03:49:44PM -0800, Kevin Brown wrote: > Bruce Momjian wrote: > > > > Added to TODO: > > > > * Eliminate WAL logging for CREATE INDEX/REINDEX/CREATE TABLE AS > > when not doing WAL archiving > > But that means that these operations can't be performed in a > transaction unless WAL archiving is on, right? No, because we ensure the transactionality of the operation by fsync'ing the involved files. And the rollback-ability is ensured by registering the files for deletion at rollback. (Just as we register the files for deletion at commit in case of a DROP TABLE, to make sure we can roll it back.) So don't worry, because what you want to do still works. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) "En las profundidades de nuestro inconsciente hay una obsesiva necesidad de un universo lógico y coherente. Pero el universo real se halla siempre un paso más allá de la lógica" (Irulan)
On Fri, 2004-11-12 at 23:13, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Greg Stark wrote: > >> I think that's already done for CREATE INDEX/REINDEX. > > > I don't think so. Can someone confirm? > > Greg is correct --- at least for btree build, which is the only index > type we have WAL-ified at all :-( [well...at least they're optimized then... :) ] With regard to the other index types, my opinion was: HASH - works OK, but a pain to administer, no huge benefit in using R-TREE - slightly broken in places, limited in usablity GiST - index of choice for PostGIS, TSearch2, in need of optimization Following recent optimization work on GiST, it now seems worth the trouble to add WAL logging to it. ISTM that the other two aren't widely used enough to make it worthwhile to spend time on, evidence for which is also that no one ever has, up 'til now. Time-management seems to be the key to making progress in the most important areas... -- Best Regards, Simon Riggs
On Fri, 2004-11-12 at 23:13, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Greg Stark wrote: > >> I think that's already done for CREATE INDEX/REINDEX. > > > I don't think so. Can someone confirm? > > Greg is correct --- at least for btree build, which is the only index > type we have WAL-ified at all :-( > Is there a place (or a single best place) to document this behaviour? - with each command? - in the backup section? - in runtime? Seems a shame to optimize and not tell anyone. -- Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes: > On Fri, 2004-11-12 at 23:13, Tom Lane wrote: >> Greg is correct --- at least for btree build, which is the only index >> type we have WAL-ified at all :-( > Is there a place (or a single best place) to document this behaviour? If you're talking about the lack of WAL backup for non-btree indexes, it is documented (at the end of the PITR section IIRC). If you're talking about the optimization of not logging index builds, I don't see a need to document that per se. Ordinary users shouldn't need to care, mainly because they can't affect it one way or the other. Anyone who does care can look at the code and see how it's done. (Open source has a big advantage over closed source in that regard, and I think it's reasonable to have different documentation practices than closed-source products would use.) regards, tom lane
On Sun, 2004-11-14 at 11:06 +0000, Simon Riggs wrote: > HASH - works OK, but a pain to administer, no huge benefit in using At least in theory, I think this could offer better performance for equality searches than b+-tree. Given how common those kinds of queries are, I still think hash indexes are worth putting some time into. My guess is that their relatively poor performance at present (relative to b+-trees) is just a reflection of how much more tuning and design work has gone into the b+-tree code than the hash code. > R-TREE - slightly broken in places, limited in usablity I agree. I hope that when we have a good GiST infrastructure, implementing rtree via GiST will offer performance that is as good as or better than the builtin rtree. > GiST - index of choice for PostGIS, TSearch2, in need of optimization I'm working on adding page-level locking and WAL safety, although this is a pretty difficult project. Gavin and I are also looking at algorithms for bulk loading GiST indexes, although I'm not yet sure how possible that will be. -Neil
On Sun, 2004-11-14 at 22:59, Neil Conway wrote: > On Sun, 2004-11-14 at 11:06 +0000, Simon Riggs wrote: > > HASH - works OK, but a pain to administer, no huge benefit in using > > At least in theory, I think this could offer better performance for > equality searches than b+-tree. Given how common those kinds of queries > are, I still think hash indexes are worth putting some time into. My > guess is that their relatively poor performance at present (relative to > b+-trees) is just a reflection of how much more tuning and design work > has gone into the b+-tree code than the hash code. Can be faster for equality searches on a fairly static table; on a growing table, could be same or worse. IMHO The theoretical difference in speed doesn't seem worth the effort of spending additional time in that part of the code, given the inherent pain of REINDEX. > > GiST - index of choice for PostGIS, TSearch2, in need of optimization > > I'm working on adding page-level locking and WAL safety, although this > is a pretty difficult project. Difficult, yes. I'm glad you're stepping up to the plate for the WAL safety. Two index types is sufficient, and ISTM should be the maximum therefore. When you've finished tuning GiST, I wager that you will agree :) -- Best Regards, Simon Riggs