Thread: NOLOGGING option, or ?
Recent test results have shown a substantial performance improvement (+25%) if WAL logging is disabled for large COPY statements. This is to be expected, though it has a price attached: losing the ability to crash-recover data loaded in this manner.

There are two parts to this proposal. First, when and whether to do this at all. Second, syntax and invocation.

Why?

Performance.

The performance gain has a price and so should only be enabled if requested explicitly by the user. It is up to the user whether they accept this price, since in many useful cases it is a small price against a huge saving.

The price is that if a crash occurs, then any table that was not empty to begin with would not be in a transactionally consistent state following crash recovery. It may have data in it, but it would be up to the user to determine whether that was satisfactory or not. It could be possible to sense what to do in this situation automatically, by putting the table into a needs-recovery type state... I don't propose to handle this *at this stage*.

Syntax and invocation:

Previously I had discussed adding a NOLOGGING option onto both COPY and CREATE TABLE AS SELECT that would bypass the creation of WAL logging data. That is still a possibility, but would require manual code changes to much of the SQL submitted.

Now, I would like to discuss adding an enable_logging USERSET GUC, that would apply *only* to COPY and CREATE TABLE AS SELECT. The default of this would be true, so that logging stays on unless the user explicitly turns it off.

How can we gain this performance benefit for those willing to accept the restrictions imposed?

Your comments are sought and are most welcome.

Best Regards, Simon Riggs
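For concreteness, the GUC being proposed would have been used roughly as follows. This is a sketch of hypothetical syntax only: the GUC name comes from the proposal above and was never implemented.

    -- hypothetical usage of the proposed enable_logging GUC
    SET enable_logging = off;                  -- request the WAL bypass
    COPY big_table FROM '/data/big_table.dat'; -- bulk load, not crash-safe
    SET enable_logging = on;                   -- restore normal WAL logging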
Simon Riggs <simon@2ndquadrant.com> writes:
> Recent test results have shown a substantial performance improvement
> (+25%) if WAL logging is disabled for large COPY statements.

How much of that is left after we fix the 64-bit-CRC issue?

> Now, I would like to discuss adding an enable_logging USERSET GUC,

[ fear and loathing ... ]

I don't like the idea of a GUC at all, and USERSET is right out. I think it would have to be system-wide (cf fsync) to be even implementable, let alone somewhat predictable. Even if it could be done per-backend with reasonable semantics, random users should not get to make that decision --- it should be the DBA's call, which means it needs at least SUSET permissions.

BTW, I'm sure you are the last one who needs to be reminded that any such thing breaks PITR completely. Which is surely sufficient reason not to let it be USERSET.

			regards, tom lane
On Tue, May 31, 2005 at 10:47:30PM -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> Recent test results have shown a substantial performance improvement
>> (+25%) if WAL logging is disabled for large COPY statements.
>
> BTW, I'm sure you are the last one who needs to be reminded that
> any such thing breaks PITR completely. Which is surely sufficient
> reason not to let it be USERSET.

This doesn't work for COPY, but maybe for CREATE TABLE AS we could log the fact that the command was executed, so the replayer could execute the same command again.

Of course, this handwaving doesn't explain how the system in recovery mode would be able to execute a full query to reconstruct the table, and also it doesn't say a lot about the extra complexity at the source level to implement this option.

For people loading big files into the database, maybe we could think about a command to let a file be loaded directly as initial table content. So all that we'd need is a program to write the file, which could be done externally. (The filewriter would have to have access to the catalog and input functions for the involved types, though I think for simple types it would be straightforward ... we could write frozen tuples to avoid TransactionId problems.)

-- 
Alvaro Herrera (<alvherre[a]surnet.cl>)
www.google.com: command-line interface for the web.
On Wed, 2005-06-01 at 00:40 -0400, Alvaro Herrera wrote:
> This doesn't work for COPY, but maybe for CREATE TABLE AS we could log
> the fact that the command was executed, so the replayer could execute
> the same command again.
>
> Of course, this handwaving doesn't explain how the system in recovery
> mode would be able to execute a full query to reconstruct the table

There's also the typical problem with this kind of approach: how do you handle non-deterministic queries? (e.g. "CREATE TABLE ... AS SELECT random(), gettimeofday(), some_func(t1.x) FROM t1 LIMIT 5")

-Neil
Personally I don't think that it is a good idea to do that. People will tend to corrupt their systems because they want speed (sometimes without thinking about the consequences).

I can only think of one scenario where nologging would actually make sense: Many people use session tables to keep track of user-level information on a website. Corrupting a session table (usually not very large) would not cause a lot of problems.

Doing it for COPY would be fatal. I can tell you from experience that 80% of all users will use that if the manual says that PostgreSQL will perform better this way. This matters for PostgreSQL's reputation for reliability.

Best regards,

Hans

Simon Riggs wrote:
> Now, I would like to discuss adding an enable_logging USERSET GUC, that
> would apply *only* to COPY and CREATE TABLE AS SELECT.

-- 
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/664/393 39 74
www.cybertec.at, www.postgresql.at
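The session-table scenario Hans describes is the one workload where losing a table's contents at restart is usually tolerable. A minimal sketch of such a table (ordinary DDL; the table and column names are illustrative only):

    -- a typical web-session table: small, hot, and disposable on restart
    CREATE TABLE user_session (
        session_id text PRIMARY KEY,
        user_id    integer NOT NULL,
        last_seen  timestamptz NOT NULL DEFAULT now()
    );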
On Tue, 2005-05-31 at 22:47 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> Recent test results have shown a substantial performance improvement
>> (+25%) if WAL logging is disabled for large COPY statements.
>
> How much of that is left after we fix the 64-bit-CRC issue?

Well, I don't know. The I/O is the main thing I'm trying to avoid.

>> Now, I would like to discuss adding an enable_logging USERSET GUC,
>
> [ fear and loathing ... ]

OK. I needed to say the idea, to make sure we had considered it. I now pronounce it dead and buried.

> BTW, I'm sure you are the last one who needs to be reminded that
> any such thing breaks PITR completely. Which is surely sufficient
> reason not to let it be USERSET.

You're right, thank you. But I do need to be reminded to say "this would only apply when archiving is not enabled", just as the other two existing WAL-avoidance optimisations do.

In answer to the other points raised, the main use case for this optimisation is to improve data load performance into an empty table. This is a very time-consuming stage on a big warehouse and needs performance optimisation. I agree with Hans-Jurgen that this is a dangerous option for use on general COPY commands, since these can be used on empty and already-populated tables. I seek a way to improve the main use case, though, without leaving any danger in other situations.

I have two suggested approaches:

1. Introduce a new LOAD command that only works on empty tables. Following a crash, the table is dropped and the user accepts that the action-on-recovery is to reload the table. (Though in PITR mode, the data would be logged.)

2. Introduce NOT LOGGED INITIALLY mode, as DB2 has done. The first COPY into an empty table would avoid WAL logging, if the user invokes that option on the specific COPY command.

There are some other arguments in favour of a LOAD command.... Alon?

Best Regards, Simon Riggs
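For reference, the DB2 feature behind option 2 is activated per transaction with ALTER TABLE ... ACTIVATE NOT LOGGED INITIALLY. A PostgreSQL equivalent would have needed similar syntax; the sketch below is invented purely to make option 2 concrete and matches nothing that was ever implemented:

    BEGIN;
    -- hypothetical spelling of NOT LOGGED INITIALLY mode:
    ALTER TABLE big_table SET NOT LOGGED INITIALLY;
    COPY big_table FROM '/data/file.dat';  -- first COPY into the empty
                                           -- table would skip WAL
    COMMIT;                                -- normal logging resumes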
Neil Conway <neilc@samurai.com> writes:
> On Wed, 2005-06-01 at 00:40 -0400, Alvaro Herrera wrote:
>> This doesn't work for COPY, but maybe for CREATE TABLE AS we could log
>> the fact that the command was executed, so the replayer could execute
>> the same command again.
>>
>> Of course, this handwaving doesn't explain how the system in recovery
>> mode would be able to execute a full query to reconstruct the table
>
> There's also the typical problem with this kind of approach: how do you
> handle non-deterministic queries? (e.g. "CREATE TABLE ... AS SELECT
> random(), gettimeofday(), some_func(t1.x) FROM t1 LIMIT 5")

For CREATE TABLE AS in the non-PITR case you don't really need to WAL-log the records at all. If it fails in the middle you just drop the table. When it completes you do a checkpoint before acknowledging the COMMIT.

I think this is already done for CREATE INDEX/REINDEX, also only in the non-PITR case.

-- 
greg
> There are some other arguments in favour of a LOAD command.... Alon?

We already have LOAD, so you'll have to choose something else :)

Chris
Greg Stark <gsstark@MIT.EDU> writes:
> For CREATE TABLE AS in the non-PITR case you don't really need to WAL-log the
> records at all. If it fails in the middle you just drop the table. When it
> completes you do a checkpoint before acknowledging the COMMIT.
>
> I think this is already done for CREATE INDEX/REINDEX, also only in the
> non-PITR case.

Sorry to follow up to my own message, but it occurs to me that COPY could be made to automatically do this for the case of an empty destination table too.

I'm not sure if it should automatically check for an empty table or if there should be an option for the user to indicate he wants COPY to replace the current contents entirely. The latter might actually be more useful.

But either way, you just WAL-log a record indicating that the table should be entirely empty. Then you fill it up without logging anything. Do a checkpoint and then WAL-log that the COPY is finished. If any failure occurs, replay leaves it empty.

Again this sadly only works in the non-PITR case.

-- 
greg
On Wed, 2005-06-01 at 09:16 +0100, Simon Riggs wrote:
> On Tue, 2005-05-31 at 22:47 -0400, Tom Lane wrote:
>> Simon Riggs <simon@2ndquadrant.com> writes:
>>> Recent test results have shown a substantial performance improvement
>>> (+25%) if WAL logging is disabled for large COPY statements.
>>
>> How much of that is left after we fix the 64-bit-CRC issue?
>
> Well, I don't know. The I/O is the main thing I'm trying to avoid.

While avoiding I/O is a good thing in general, WAL I/O traffic can at least easily be made parallel to other I/O by giving WAL its own disk.

>>> Now, I would like to discuss adding an enable_logging USERSET GUC,
>>
>> [ fear and loathing ... ]
>
> OK. I needed to say the idea, to make sure we had considered it. I now
> pronounce it dead and buried.
>
>> BTW, I'm sure you are the last one who needs to be reminded that
>> any such thing breaks PITR completely.

I don't think we do any WAL-logging of TEMP tables, so it may be easy to extend this to any table with a 'NO_WAL' bit set. That would create a kind of 'extended temp table' - unsafe but fast ;)

-- 
Hannu Krosing <hannu@skype.net>
On Wed, 2005-06-01 at 16:34 +0800, Christopher Kings-Lynne wrote:
>> There are some other arguments in favour of a LOAD command.... Alon?
>
> We already have LOAD, so you'll have to choose something else :)

It's annoying, I grant you. :-)

LOAD 'library' would still need to be the default. LOAD LIBRARY 'library' would be the new recommended usage. LOAD DATA... would be the new command... with most other options hanging off of that. There's no problem with that, since that is then the same as Oracle's syntax for its load utility.

Best Regards, Simon Riggs
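Laid side by side, the proposed disambiguation would look like this; only the first form actually exists, the other two are the hypothetical extensions under discussion:

    LOAD 'mylib';          -- existing command: load a shared library
    LOAD LIBRARY 'mylib';  -- proposed explicit spelling, same effect
    LOAD DATA ...          -- proposed new bulk-load entry point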
On Wed, 2005-06-01 at 04:44 -0400, Greg Stark wrote:
> Greg Stark <gsstark@MIT.EDU> writes:
>
>> For CREATE TABLE AS in the non-PITR case you don't really need to WAL-log the
>> records at all. If it fails in the middle you just drop the table. When it
>> completes you do a checkpoint before acknowledging the COMMIT.
>>
>> I think this is already done for CREATE INDEX/REINDEX, also only in the
>> non-PITR case.
>
> Sorry to follow up to my own message, but it occurs to me that COPY could be
> made to automatically do this for the case of an empty destination table too.
>
> I'm not sure if it should automatically check for an empty table or if there
> should be an option for the user to indicate he wants COPY to replace the
> current contents entirely. The latter might actually be more useful.
>
> But either way, you just WAL-log a record indicating that the table should be
> entirely empty. Then you fill it up without logging anything. Do a checkpoint
> and then WAL-log that the COPY is finished. If any failure occurs, replay
> leaves it empty.
>
> Again this sadly only works in the non-PITR case.

Yes, all of the above could work.

It would use essentially the same functionality that Manfred suggested for handling truncated tables: ignore the first LOAD DATA started message until recovery completes, then truncate the table if the LOAD DATA complete message was not logged in WAL.

Best Regards, Simon Riggs
On Wed, 2005-06-01 at 00:01 +0100, Simon Riggs wrote:
> Recent test results have shown a substantial performance improvement
> (+25%) if WAL logging is disabled for large COPY statements. This is to
> be expected, though it has a price attached: losing the ability to
> crash-recover data loaded in this manner.

Not only recovering the DB itself, but also having a hot standby (and hopefully a read-only replica some time in the future).

> There are two parts to this proposal. First, when and whether to do this
> at all. Second, syntax and invocation.

I think this should be a decision made when creating a table, just like TEMP tables. So you always know whether a certain table is or is not safe/replicated/recoverable.

This also has the advantage of requiring no changes to the actual COPY and INSERT commands.

-- 
Hannu Krosing <hannu@skype.net>
Simon Riggs wrote:
> On Wed, 2005-06-01 at 04:44 -0400, Greg Stark wrote:
>> Greg Stark <gsstark@MIT.EDU> writes:
>>
>>> For CREATE TABLE AS in the non-PITR case you don't really need to WAL-log the
>>> records at all. If it fails in the middle you just drop the table. When it
>>> completes you do a checkpoint before acknowledging the COMMIT.
>>>
>>> I think this is already done for CREATE INDEX/REINDEX, also only in the
>>> non-PITR case.
>>
>> Sorry to follow up to my own message, but it occurs to me that COPY could be
>> made to automatically do this for the case of an empty destination table too.
>>
>> I'm not sure if it should automatically check for an empty table or if there
>> should be an option for the user to indicate he wants COPY to replace the
>> current contents entirely. The latter might actually be more useful.
>>
>> But either way, you just WAL-log a record indicating that the table should be
>> entirely empty. Then you fill it up without logging anything. Do a checkpoint
>> and then WAL-log that the COPY is finished. If any failure occurs, replay
>> leaves it empty.
>>
>> Again this sadly only works in the non-PITR case.
>
> Yes, all of the above could work.
>
> It would use essentially the same functionality that Manfred suggested
> for handling truncated tables: ignore the first LOAD DATA started
> message until recovery completes, then truncate the table if the LOAD
> DATA complete message was not logged in WAL.

Well, why not just add this functionality to COPY rather than create a new command?

One optimization is to write the dirty shared buffers to the kernel, then fsync that relation, rather than do a checkpoint.

-- 
  Bruce Momjian                     |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us            |  (610) 359-1001
  +  If your life is a hard drive,  |  13 Roberts Road
  +  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Greg Stark <gsstark@mit.edu> writes:
> Sorry to follow up to my own message, but it occurs to me that COPY could be
> made to automatically do this for the case of an empty destination table too.

Not unless you are proposing to change COPY to acquire a lock strong enough to lock out other writers to the table for the duration ...

			regards, tom lane
Simon Riggs <simon@2ndquadrant.com> writes:
> On Wed, 2005-06-01 at 16:34 +0800, Christopher Kings-Lynne wrote:
>>> There are some other arguments in favour of a LOAD command.... Alon?
>>
>> We already have LOAD, so you'll have to choose something else :)

> It's annoying, I grant you. :-)
> LOAD 'library' would still need to be the default.
> LOAD LIBRARY 'library' would be the new recommended usage.
> LOAD DATA... would be the new command... with most other options hanging
> off of that. There's no problem with that, since that is then the same
> as Oracle's syntax for its load utility.

Uh, what's wrong with adding an option to COPY? Not like it hasn't got a ton of 'em already.

The Oracle-compatibility angle doesn't interest me at all, mainly because I find it highly improbable that we'd be exactly compatible anyway.

			regards, tom lane
On 01 Jun 2005 04:44:24 -0400, Greg Stark wrote:
> Greg Stark writes:
>>
>> For CREATE TABLE AS in the non-PITR case you don't really need to WAL-log the
>> records at all. If it fails in the middle you just drop the table. When it
>> completes you do a checkpoint before acknowledging the COMMIT.
>>
>> I think this is already done for CREATE INDEX/REINDEX, also only in the
>> non-PITR case.

Checkpoint or fsync?

> Sorry to follow up to my own message, but it occurs to me that COPY could be
> made to automatically do this for the case of an empty destination table too.

Why only on an empty table? What is the problem with bypassing WAL on any table as long as all files of that table are fsync'ed before commit?

> Again this sadly only works in the non-PITR case.

Apart from that problem, of course :)

Jochem
Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
>> Sorry to follow up to my own message, but it occurs to me that COPY could be
>> made to automatically do this for the case of an empty destination table too.
>
> Not unless you are proposing to change COPY to acquire a lock strong
> enough to lock out other writers to the table for the duration ...

Well, if the table is initially empty, what harm is there in locking the table? How many people query the table while it is being loaded? And because the transaction isn't committed, the table is empty to everyone else anyway.

-- 
  Bruce Momjian                     |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us            |  (610) 359-1001
  +  If your life is a hard drive,  |  13 Roberts Road
  +  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Jochem van Dieten wrote:
> On 01 Jun 2005 04:44:24 -0400, Greg Stark wrote:
>> Greg Stark writes:
>>>
>>> For CREATE TABLE AS in the non-PITR case you don't really need to WAL-log the
>>> records at all. If it fails in the middle you just drop the table. When it
>>> completes you do a checkpoint before acknowledging the COMMIT.
>>>
>>> I think this is already done for CREATE INDEX/REINDEX, also only in the
>>> non-PITR case.
>
> Checkpoint or fsync?
>
>> Sorry to follow up to my own message, but it occurs to me that COPY could be
>> made to automatically do this for the case of an empty destination table too.
>
> Why only on an empty table? What is the problem with bypassing WAL on
> any table as long as all files of that table are fsync'ed before
> commit?

Because adding rows to a table might modify existing pages, and if the COPY fails, you have to restore those pages to a consistent state, and make sure they are recovered for partial page writes, which we can't do without WAL. With an initially empty table, you can just throw away the file system file.

-- 
  Bruce Momjian                     |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us            |  (610) 359-1001
  +  If your life is a hard drive,  |  13 Roberts Road
  +  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> Not unless you are proposing to change COPY to acquire a lock strong
>> enough to lock out other writers to the table for the duration ...
>
> Well, if the table is initially empty, what harm is there in locking the
> table?

You cannot *know* whether it is empty unless you lock the table before you look. So your argument is circular.

I think this only makes sense as an explicit option to COPY, one of the effects of which would be to take a stronger lock than COPY normally does.

			regards, tom lane
On 6/1/05, Hans-Jürgen Schönig <postgres@cybertec.at> wrote:
> Personally I don't think that it is a good idea to do that.
> People will tend to corrupt their systems because they want speed
> (sometimes without thinking about the consequences).
>
> I can only think of one scenario where nologging would actually make
> sense: Many people use session tables to keep track of user-level
> information on a website. Corrupting a session table (usually not very
> large) would not cause a lot of problems.

Well, from what I know, TEMPORARY tables are not WAL-logged, since they won't exist after "restart" (since none of the current sessions would exist). The problem with TEMPORARY tables is that they are not globally visible.

I think it would be neat to be able to create "server-life-time" tables, i.e. a table which is initially empty, can be used by all users, but is guaranteed to be empty (truncated) upon server restart. A place to store global temporary variables. A table type to put on a RAM disk.

Potential uses? Imagine a site which has a busy 'session' table and relatively 'static' other tables. Most of the WAL would consist of 'useless' updates to the session table. And recovery using WAL files would take longer (as PostgreSQL would have to dump and restore a whole lot of session data). Having a "global temporary" table would be helpful in such a situation. And theoretically it wouldn't need to "spill to disk" at all, provided it was small enough.

Regards, Dawid
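As it happens, this is very close to the shape the idea eventually took: PostgreSQL 9.1, years after this thread, added unlogged tables, whose writes skip WAL and whose contents are truncated during crash recovery. Hans's session table, for example, could then be declared:

    -- the form this idea eventually took (PostgreSQL 9.1 and later)
    CREATE UNLOGGED TABLE user_session (
        session_id text PRIMARY KEY,
        user_id    integer NOT NULL
    );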
On Wed, 2005-06-01 at 10:18 -0400, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Tom Lane wrote:
>>> Not unless you are proposing to change COPY to acquire a lock strong
>>> enough to lock out other writers to the table for the duration ...
>>
>> Well, if the table is initially empty, what harm is there in locking the
>> table?
>
> You cannot *know* whether it is empty unless you lock the table before
> you look. So your argument is circular.
>
> I think this only makes sense as an explicit option to COPY, one of the
> effects of which would be to take a stronger lock than COPY normally does.

Locking the table is OK. We're loading it for the first time, so we are expecting to be the only users at this time.

Here's a more fully worked-out plan for WAL/concurrency:

1. Lock the table, then scan until we find our first live row (not deleted, insert not aborted), in which case throw an error. If no error, then it is OK to proceed to the next phase.

2. If there are any dead row versions, we truncate the file, since we do not wish to see those rows ever again. No existing transaction can have a lock on them, since we hold it, so it's OK for them to go. We record that we have done this in WAL with a short message saying that we are loading an empty table, whether or not archiving is enabled.

3. We load the table, with all TransactionIds set as FROZEN. We do not generate WAL for each row unless archiving is enabled.

4. Optionally, we set a flag on the table showing the whole table is frozen. Anybody writing to this table subsequently will spoil this flag. If the flag is set, all forms of VACUUM will return success immediately without performing a scan (since it is already in a perfect VACUUM FULL and VACUUM FREEZE state).

If the server crashes, we replay WAL. If we see a load-start message, we truncate the relation and note that a load has started. If there is WAL data for the tuples, we replay it. If WAL replay ends without the load transaction having successfully committed, then we truncate the table.

The above applies to both LOAD/COPY whatever-yer-call-it and, in modified form, to CREATE TABLE AS SELECT. For CTAS, no scan is required in (1), and no truncate is required in (2); otherwise the same.

I'm expecting Alon Goldshuv to join this discussion soon to explain some other ideas, new developments and discuss the potential for a new command, so let's wait for him...

Best Regards, Simon Riggs
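As a user-visible transcript, steps 1-3 of Simon's plan amount to something like the following. LOCK TABLE and TRUNCATE are ordinary SQL; the frozen, WAL-skipping behaviour of the COPY is the hypothetical part. (Much later, PostgreSQL did adopt a close relative of step 3 as COPY ... FREEZE, which likewise requires the table to have been created or truncated in the same transaction.)

    BEGIN;
    LOCK TABLE big_table IN ACCESS EXCLUSIVE MODE;  -- step 1: sole access
    TRUNCATE big_table;                             -- step 2: discard any
                                                    -- dead row versions
    COPY big_table FROM '/data/file.dat';           -- step 3: hypothetically
                                                    -- loaded frozen, no WAL
    COMMIT;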
Hannu Krosing <hannu@skype.net> writes:
> I think this should be a decision made when creating a table, just like
> TEMP tables. So you always know whether a certain table is or is not
> safe/replicated/recoverable.
> This also has the advantage of requiring no changes to the actual COPY
> and INSERT commands.

That doesn't seem right to me; the scenario I envision is that you are willing to do the initial data loading over again (since you presumably still have the source data available). But once you've got it loaded you want full protection.

Perhaps it could work to use an ALTER TABLE command to flip the state. But I'm not really seeing the point compared to treating it as a COPY option. I do not believe that anyone needs this to work on individual INSERT commands --- if you are after max speed, why aren't you using COPY? And treating it as an ALTER property opens the possibility of forgetting to ALTER the table back to normal behavior, which would be a foot-gun of large caliber indeed :-(

			regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Jochem van Dieten wrote:
>> Why only on an empty table? What is the problem with bypassing WAL on
>> any table as long as all files of that table are fsync'ed before
>> commit?
>
> Because adding rows to a table might modify existing pages, and if the
> COPY fails, you have to restore those pages to a consistent state, and
> make sure they are recovered for partial page writes, which we can't do
> without WAL. With an initially empty table, you can just throw away the
> file system file.

You have also got to think about the effects on the table's indexes ...

			regards, tom lane
Simon Riggs <simon@2ndquadrant.com> writes:
> 4. Optionally, we set a flag on the table showing the whole table is
> frozen. Anybody writing to this table subsequently will spoil this flag.
> If the flag is set, all forms of VACUUM will return success immediately
> without performing a scan (since it is already in a perfect VACUUM FULL
> and VACUUM FREEZE state).

This bit strikes me as dangerous and not related to the original proposal.

I don't care for the load-already-frozen-data part at all, either. That's not just giving up WAL protection, that's an outright MVCC semantics violation, in return for which we get ... not much. Certainly not any speedup in the LOAD itself.

			regards, tom lane
On 6/1/05, Bruce Momjian wrote:
> Jochem van Dieten wrote:
>> Why only on an empty table? What is the problem with bypassing WAL on
>> any table as long as all files of that table are fsync'ed before
>> commit?
>
> Because adding rows to a table might modify existing pages, and if the
> COPY fails, you have to restore those pages to a consistent state, and
> make sure they are recovered for partial page writes, which we can't do
> without WAL. With an initially empty table, you can just throw away the
> file system file.

Thank you for the explanation, but I am afraid I still don't get it.

COPY can either fail and do a normal rollback, in which case there is no problem because the xid never made it to the xlog. So I take it you are talking about a hard crash (pull the plug) somewhere during the actual writing to disk. In that case you have updated several pages and overwritten the free space with new tuples. But you have not overwritten live tuples, so why would you need to restore them? I mean, didn't PostgreSQL < 7.1 work without a WAL at all?

Jochem
On Wed, 2005-06-01 at 11:31 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
>> I think this should be a decision made when creating a table, just like
>> TEMP tables. So you always know whether a certain table is or is not
>> safe/replicated/recoverable.
>> This also has the advantage of requiring no changes to the actual COPY
>> and INSERT commands.
>
> That doesn't seem right to me; the scenario I envision is that you are
> willing to do the initial data loading over again (since you presumably
> still have the source data available). But once you've got it loaded
> you want full protection.

Yes, that's the scenario. Believe me, I prefer less code, but I think the general feeling now is that we must provide a data-safe solution to the performance challenge.

> Perhaps it could work to use an ALTER TABLE command to flip the state.
> But I'm not really seeing the point compared to treating it as a COPY
> option. I do not believe that anyone needs this to work on individual
> INSERT commands --- if you are after max speed, why aren't you using
> COPY? And treating it as an ALTER property opens the possibility of
> forgetting to ALTER the table back to normal behavior, which would be
> a foot-gun of large caliber indeed :-(

Oh no, not the foot gun again. I surrender.

Best Regards, Simon Riggs
On Wed, Jun 01, 2005 at 06:00:28PM +0100, Simon Riggs wrote:
> On Wed, 2005-06-01 at 11:31 -0400, Tom Lane wrote:
>> Perhaps it could work to use an ALTER TABLE command to flip the state.
>> But I'm not really seeing the point compared to treating it as a COPY
>> option. I do not believe that anyone needs this to work on individual
>> INSERT commands --- if you are after max speed, why aren't you using
>> COPY? And treating it as an ALTER property opens the possibility of
>> forgetting to ALTER the table back to normal behavior, which would be
>> a foot-gun of large caliber indeed :-(
>
> Oh no, not the foot gun again. I surrender.

Sorry, what are you surrendering from/for/of/to? I think the proposal is making reasonable headway. So far we have:

- it is an option to COPY and CREATE TABLE AS, not a GUC, not ALTER TABLE
- it'd only work on empty tables
- it'd only work if PITR is off
- it needs to follow MVCC semantics, i.e. the tuples need to be written with the correct Xids
- after the command is done, an fsync is applied to the table file(s)

Another point that needs thought is what to do if the table has any indexes. Are operations on said indexes logged or not? Maybe we should just say that indexes are verboten and the user needs to create them afterwards.

-- 
Alvaro Herrera (<alvherre[a]surnet.cl>)
"Things are good or bad according as our opinion makes them" (Lysias)
Alvaro Herrera <alvherre@surnet.cl> writes:
> - it is an option to COPY and CREATE TABLE AS, not a GUC, not ALTER TABLE

AFAICS it could just happen automatically for CREATE TABLE AS; there's no need for an option there, any more than there is for CREATE INDEX. The only reason it needs to be an explicitly specified option for COPY is that it would require taking a sole-writer lock on the table, which COPY does not now do (and I believe I've heard of people using parallel COPYs to load a table faster, so changing the lock type wouldn't be transparent for everyone).

> Another point that needs thought is what to do if the table has any
> indexes. Are operations on said indexes logged or not? Maybe we should
> just say that indexes are verboten and the user needs to create them
> afterwards.

That seems pretty reasonable to me. Again, that's what you'd do anyway if you are after the fastest possible load time, so why should we work much harder to support an inefficient approach?

Thinking about it, maybe the user-visible option should be defined thus:

    LOCK

        Causes COPY FROM to acquire Exclusive lock on the target table,
        rather than RowExclusive lock as it normally does. This ensures
        that no other process is modifying the table while the COPY
        proceeds. In some cases this can allow significantly faster
        operation.

and then the checks on PITR mode, no indexes, and empty starting table could be internal implementation details rather than part of the user-visible spec (ie, we just fall through and do it normally if any of those conditions don't hold). I like this a little better because there might be application-level reasons to want exclusive lock, independently of implementation details.

Also: AFAICS the starting table need not be empty, if we arrange for all inserts done by the COPY to be done into freshly-appended blocks. The initial WAL entry could note the current table length, and instead of "truncate to 0 length" the recovery action is "truncate to noted length". So really the constraints are just "no PITR" and "no indexes".

			regards, tom lane
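Under this proposal the user-visible surface stays small: the option only asks for a stronger lock, and the WAL bypass happens internally when the other conditions hold. A sketch of the hypothetical syntax, following the option style COPY already had:

    -- explicit request for an exclusive lock; the WAL bypass would then
    -- kick in internally iff PITR is off, there are no indexes, etc.
    COPY big_table FROM '/data/file.dat' WITH LOCK;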
I have been working on improving the COPY command performance and as a result also came up with other thoughts that may possibly be better off implemented in a new command (i.e. LOAD DATA) rather than adding them to the existing COPY.

The improvements I made were in the COPY parsing logic - changing char-by-char parsing to multi-char buffered fast parsing, while using a bytebuffer which is faster than StringInfoData and minimizing the number of loads into the buffer. The data conversion and insertion parts of COPY I left untouched.

As a result the parsing performance increased by about 550%, and the overall COPY performance increased by:

- around 40% for a 15-column (mixed types) table;
- around 90% for a 1-column table.

(The difference between the two is caused by data conversion overhead.)

I will post the patch and more numbers to the list later today with more details. I'll just comment now that it is only available for delimited ASCII input data when client and server encodings are the same. CSV and encoding conversions may be added later; this is merely to show that data could be loaded much faster.

Here are some things that make me think a new LOAD command is a good idea:

1) There seems to be a possibility that many COPY modifications/improvements may be problematic to incorporate in the current postgres COPY code. Furthermore, it may be desired to keep the COPY command as is and also have a way to run an improved COPY command, for purposes of backwards compatibility.

2) A modified command syntax for introducing direct single-row error handling. By direct I mean: a row that is rejected from within the COPY command context does not throw an error and roll back the whole transaction. Instead the error is caught and recorded elsewhere, maybe in some error table, with some more information that can later on be retrieved. The following rows continue to be processed. This way there is barely any error-handling overhead. Recursively isolating bad rows into smaller batches is extremely expensive for non-small data sets; it's not an option for serious users.

3) Maybe have an option to indicate the EOL (end of line) format in the command syntax. Current COPY code detects the line end according to the first data line; this is problematic if the 1st data line is malformed, and it also doesn't allow having any CRs, for example, in the data file when the EOL is only a linefeed. That causes extra data errors in COPY processing. A CR is a valid data character. Specifying the line end in the command syntax will save all this badness.

4) Data integrity and escaping improvements. My patch now treats all characters as data (unless it's an escaped delimiter or EOL) and therefore data integrity is preserved. (Take for example the valid data field "file:\new\bang": after COPYing it into the database, querying for it from psql will show wrong data -- the backslashes are gone, the field spans 2 data lines, and the \b becomes a backspace!) However, some people that have already got used to the postgres COPY escaping may want to keep it. They could do so by still using the old COPY.

5) Allow an ERRORLIMIT to control aborting a load after a certain number of errors (a pre-requisite for this is point number 2 above).

6) Allow LIMIT and OFFSET, for files with header rows for example (could be done in COPY too).

7) Allow the blocks to be directly written to the table, rather than via the buffer cache.
8) Allow a bulk index insertion operation at the end of the LOAD step, if the data has been loaded in sorted order. Use something like the SORTED INDEXES statement of Oracle SQL*Loader to specify the sort order of the incoming data, so that the index build step can bypass another external sort before loading directly into the index.

9) Allow for Simon's WAL bypass.

I have surely missed some problems that hide behind the idea, but these points make me believe that LOAD DATA is a good idea.

Alon.

On 5/31/05 7:47 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> Recent test results have shown a substantial performance improvement
>> (+25%) if WAL logging is disabled for large COPY statements.
>
> How much of that is left after we fix the 64-bit-CRC issue?
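Assembled into a single statement, the features Alon lists might have looked roughly like this. Every clause here is hypothetical - the command was never adopted - but it shows how the points fit together:

    LOAD DATA FROM '/data/file.dat'
        INTO TABLE big_table
        DELIMITER '|'
        NEWLINE 'LF'               -- point 3: declare the end-of-line format
        ERRORTABLE load_errors     -- point 2: reject bad rows to a side table
        ERRORLIMIT 100             -- point 5: abort after too many rejects
        OFFSET 1;                  -- point 6: skip a header row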
On Wed, 2005-06-01 at 11:55 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> 4. Optionally, we set a flag on the table showing the whole table is
>> frozen. Anybody writing to this table subsequently will spoil this flag.
>> If the flag is set, all forms of VACUUM will return success immediately
>> without performing a scan (since it is already in a perfect VACUUM FULL
>> and VACUUM FREEZE state).
>
> This bit strikes me as dangerous and not related to the original
> proposal. I don't care for the load-already-frozen-data part at all,
> either. That's not just giving up WAL protection, that's an outright
> MVCC semantics violation, in return for which we get ... not much.
> Certainly not any speedup in the LOAD itself.

I agree it *sounds* dangerous, but is it? If so, how? I will immediately withdraw any idea that proves dangerous.

We're holding the table lock and will continue to do so until end of transaction. No transaction with an earlier id will ever see the data we load, because of the lock. Later transactions will see the data only when we commit. At that stage, all they care about is that the data is visible... it doesn't matter that we have frigged it to look like it's been there a real long time.

We're not giving up WAL protection. The PITR case is covered. Non-PITR cases have exactly the same transactional consistency across a crash as they do now. Nothing is lost through this proposal.

Doing those two additional actions lets us:

1. avoid writing the whole table twice because of SetBufferCommitInfoNeedsSave;
2. avoid pointless VACUUMs of very large tables;
3. load the table direct to read-only media, or copy it to read-only media at some point in the future, without needing to do a VACUUM FREEZE.

All of that means we write the table once, rather than 3 times over the lifetime of the table. And we only need to scan it for user SQL, not for wraparound-avoiding VACUUMs.

I know you have a solution to the second one in mind. I would not argue against the needs-vacuuming bitmap idea when that comes back around.

Best Regards, Simon Riggs
On Wed, Jun 01, 2005 at 06:55:46PM +0100, Simon Riggs wrote:
> On Wed, 2005-06-01 at 11:55 -0400, Tom Lane wrote:
>> Simon Riggs <simon@2ndquadrant.com> writes:
>>> 4. Optionally, we set a flag on the table showing the whole table is
>>> frozen. Anybody writing to this table subsequently will spoil this flag.
>>> If the flag is set, all forms of VACUUM will return success immediately
>>> without performing a scan (since it is already in a perfect VACUUM FULL
>>> and VACUUM FREEZE state).
>>
>> This bit strikes me as dangerous and not related to the original
>> proposal. I don't care for the load-already-frozen-data part at all,
>> either. That's not just giving up WAL protection, that's an outright
>> MVCC semantics violation, in return for which we get ... not much.
>> Certainly not any speedup in the LOAD itself.
>
> I agree it *sounds* dangerous, but is it? If so, how? I will immediately
> withdraw any idea that proves dangerous.
>
> We're holding the table lock and will continue to do so until end of
> transaction. No transaction with an earlier id will ever see the data we
> load because of the lock.

Suppose you load half the tuples and the plug is pulled. After recovery, you have a half-load of tuples that are visible to everyone. This is a no-no.

Plus, what is the benefit of having the tuples in a frozen state to start with? If this is a data warehouse application, I'd expect the table to be dropped or truncated well before the billion-transactions barrier comes to pass.

-- 
Alvaro Herrera (<alvherre[a]surnet.cl>)
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)
Simon Riggs <simon@2ndquadrant.com> writes:
> If the server crashes, we replay WAL. If we see a load-start message, we
> truncate the relation and note that a load has started. If there is WAL
> data for the tuples, we replay it. If WAL replay ends without the load
> transaction having successfully committed, then we truncate the table.

On further thought, this seems both risky and unnecessary.

The reason it's risky is this scenario:

* Backend 1 makes a LOAD-start WAL entry.
* Backend 1 loads some data, extending the table beyond its former end.
* Backend 1 errors out without committing its transaction.
* Backend 2 inserts some data into the no-longer-locked table. It uses
  free space in one of the added pages, or maybe even adds new pages of
  its own.
* Backend 2 commits.
* System crashes, and we have to replay the above actions.

In this scenario you cannot truncate at the end of replay without losing backend 2's committed data. You can think of various ways to avoid this risk (for instance, maybe *any* WAL-logged operation on the table should cause the pending TRUNCATE to be discarded) but they all seem expensive and/or still somewhat unsafe.

The reason it's unnecessary is: what's the point? All you're doing by not truncating is leaving some uncommitted tuples in the table. It's not the job of WAL recovery to get rid of such things; that's VACUUM's job.

So what I'm thinking is we need no special WAL entries for this. What we need is just an operating mode of COPY in which it doesn't WAL-log its inserts, but instead fsyncs before completion, much like index build does. For safety it must do all its inserts into freshly-added pages; this is not to ensure truncatability, because we aren't going to do that anyway, but to ensure that we don't have unlogged operations changing pages that might contain committed tuples. (That would pose a risk of losing committed data to incomplete writes in case of system crash partway through. The same reason is why we need exclusive lock: else we might end up with pages containing a mix of logged and unlogged tuples.) Also there can be no indexes, since we don't want index entries pointing to unlogged tuples. And PITR can't be enabled. Otherwise no problem.

			regards, tom lane
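This is essentially the design PostgreSQL later shipped: when a table is created (or truncated) in the same transaction that loads it, and WAL archiving is off, COPY skips WAL-logging the individual rows and the relation is fsync'ed at commit instead. The user-visible pattern is then just:

    BEGIN;
    CREATE TABLE big_table (id integer, payload text); -- new file, this xact
    COPY big_table FROM '/data/file.dat';              -- rows not WAL-logged
    COMMIT;                                            -- relation fsync'ed
                                                       -- instead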
Alvaro Herrera <alvherre@surnet.cl> writes:
> On Wed, Jun 01, 2005 at 06:55:46PM +0100, Simon Riggs wrote:
>> We're holding the table lock and will continue to do so until end of
>> transaction. No transaction with an earlier id will ever see the data we
>> load because of the lock.
>
> Suppose you load half the tuples and the plug is pulled. After
> recovery, you have a half-load of tuples that are visible to everyone.
> This is a no-no.

Simon is expecting that the loaded tuples are guaranteed to be erased (by table truncation) during recovery. As I just noted, I'm unconvinced of the safety of doing truncations during recovery, so I'd prefer not to depend on that.

The scenario I was thinking of was different: you load pre-frozen tuples, commit, and thereby release the table lock. Now the tuples are visible to transactions that started before you did; that's what violates MVCC.

			regards, tom lane
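The violation is easy to see with two sessions. Tuples loaded pre-frozen carry the frozen TransactionId, which every snapshot considers visible, so a transaction that started before the load would suddenly see the rows appear mid-transaction (table name hypothetical):

    -- session A                            -- session B
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT count(*) FROM t;  -- returns 0
                                            -- load t with pre-frozen tuples
                                            -- (the hypothetical mode), COMMIT
    SELECT count(*) FROM t;  -- now returns the loaded rows:
                             -- session A's snapshot is broken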
On Wed, 2005-06-01 at 14:24 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@surnet.cl> writes:
>> Suppose you load half the tuples and the plug is pulled. After
>> recovery, you have a half-load of tuples that are visible to everyone.
>> This is a no-no.
>
> Simon is expecting that the loaded tuples are guaranteed to be erased
> (by table truncation) during recovery. As I just noted, I'm unconvinced
> of the safety of doing truncations during recovery, so I'd prefer not
> to depend on that.
>
> The scenario I was thinking of was different: you load pre-frozen
> tuples, commit, and thereby release the table lock. Now the tuples
> are visible to transactions that started before you did; that's what
> violates MVCC.

Agreed. MVCC violation. OK, back to the drawing board.

Best Regards, Simon Riggs
On Wed, 2005-06-01 at 14:14 -0400, Tom Lane wrote:
> So what I'm thinking is we need no special WAL entries for this. What
> we need is just an operating mode of COPY in which it doesn't WAL-log
> its inserts, but instead fsyncs before completion, much like index build
> does. For safety it must do all its inserts into freshly-added pages;
> this is not to ensure truncatability, because we aren't going to do that
> anyway, but to ensure that we don't have unlogged operations changing
> pages that might contain committed tuples. (That would pose a risk of
> losing committed data to incomplete writes in case of system crash
> partway through. The same reason is why we need exclusive lock: else
> we might end up with pages containing a mix of logged and unlogged
> tuples.) Also there can be no indexes, since we don't want index
> entries pointing to unlogged tuples. And PITR can't be enabled.
> Otherwise no problem.

What you describe above is a coherent set of features that provide most of the benefits I sought, plus some others. We also don't mess with WAL, which is a grand thing. We gain the ability to load into tables with rows already in them.

I don't agree with all of your other points, but given time schedules, I think that we win with the above, so forget the rest.

The main COPY/LOAD DATA discussion is on another thread of this, from Alon, who has some interesting ideas and some really cool performance results to share.

Best Regards, Simon Riggs
Jochem van Dieten wrote:
> On 6/1/05, Bruce Momjian wrote:
>> Jochem van Dieten wrote:
>>> Why only on an empty table? What is the problem with bypassing WAL on
>>> any table as long as all files of that table are fsync'ed before
>>> commit?
>>
>> Because adding rows to a table might modify existing pages, and if the
>> COPY fails, you have to restore those pages to a consistent state, and
>> make sure they are recovered for partial page writes, which we can't do
>> without WAL. With an initially empty table, you can just throw away the
>> file system file.
>
> Thank you for the explanation, but I am afraid I still don't get it.
>
> COPY can either fail and do a normal rollback, in which case there is
> no problem because the xid never made it to the xlog. So I take it you
> are talking about a hard crash (pull the plug) somewhere during the
> actual writing to disk. In that case you have updated several pages
> and overwritten the free space with new tuples. But you have not
> overwritten live tuples, so why would you need to restore them? I
> mean, didn't PostgreSQL < 7.1 work without a WAL at all?

What if you are adding rows to an existing page --- in that case you are writing a page that also contained valid tuples before the COPY.

-- 
  Bruce Momjian                     |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us            |  (610) 359-1001
  +  If your life is a hard drive,  |  13 Roberts Road
  +  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Alon Goldshuv wrote:
> 9) Allow for Simon's WAL bypass.
>
> I have surely missed some problems that hide behind the idea, but these
> points make me believe that LOAD DATA is a good idea.

The community is unlikely to add a new LOAD DATA command that does _almost_ everything COPY does. We are much more likely to incrementally improve COPY.

The problem with a new command is that it becomes unclear when you should use COPY and when LOAD DATA, and it confuses users, and has maintenance overhead. If Bizgres wants a new command name, go for it, but it is unlikely that the community release is going to go in that direction, unless there is a fundamental agreement that COPY is broken and needs a major revamp, and I have heard no talk of that.

-- 
  Bruce Momjian                     |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us            |  (610) 359-1001
  +  If your life is a hard drive,  |  13 Roberts Road
  +  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Tom Lane wrote:
>>> Not unless you are proposing to change COPY to acquire a lock strong
>>> enough to lock out other writers to the table for the duration ...
>>
>> Well, if the table is initially empty, what harm is there in locking the
>> table?
>
> You cannot *know* whether it is empty unless you lock the table before
> you look. So your argument is circular.
>
> I think this only makes sense as an explicit option to COPY, one of the
> effects of which would be to take a stronger lock than COPY normally does.

One idea would be to look at the table file size first. If it has zero blocks, lock the table, and if it still has zero blocks, do the no-WAL copy. I hate to add a flag to a command when we can automatically handle it ourselves.

Now, you mentioned the idea of doing the optimization on tables that already have data, and if we do that, we would need a flag because the lock is stronger than what we have now. What we could do is to do no-WAL automatically for empty tables (like when a database is first loaded), and use the flag for cases where the table is not zero pages.

The fact is that database loads are a perfect case for this optimization, and old dumps are not going to have that flag anyway, and automatic is better if we can do it.

-- 
  Bruce Momjian                     |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us            |  (610) 359-1001
  +  If your life is a hard drive,  |  13 Roberts Road
  +  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Bruce,

> The problem with a new command is that it becomes unclear when you
> should use COPY and when LOAD DATA, and it confuses users, and has
> maintenance overhead. If Bizgres wants a new command name, go for it,
> but it is unlikely that the community release is going to go in that
> direction, unless there is a fundamental agreement that COPY is broken
> and needs a major revamp, and I have heard no talk of that.

The question of whether COPY should be improved or whether the changes should take the form of a new command is separate from the question of whether the performance of the load path in PostgreSQL needs improvement.

The 90% performance increase (from 12 MB/s to 21 MB/s) that Alon reported comes from replacing the parsing logic within COPY. I believe that the parsing logic in COPY is fundamentally broken from a performance perspective, and may be broken from a functionality perspective WRT embedded backslashes.

One of the reasons to consider a LOAD DATA command is that we can isolate the need for performance improvements and special syntax from the concerns of preserving the legacy behavior of COPY for use as the primary mechanism for DUMP and RESTORE.

- Luke
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> One idea would be to look at the table file size first. If it has zero
> blocks, lock the table, and if it still has zero blocks, do the no-WAL
> copy.

I think that's a bad idea. It would make the behavior unpredictable --- sometimes a COPY will take an exclusive lock, and other times not; and the reason why is at a lower semantic level than the user is supposed to know about.

Before you say "this is not important", consider the nontrivial risk that the stronger lock will cause a deadlock failure. I don't think that it's acceptable for lock strength to be unpredictable.

			regards, tom lane
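The deadlock risk is concrete. With the implicit behaviour, whether the following pair of sessions deadlocks would depend on something as incidental as whether t1 happened to be empty when session 1's COPY started (table names hypothetical):

    -- session 1                          -- session 2
    BEGIN;                                BEGIN;
    COPY t1 FROM '/data/f.dat';           UPDATE t2 SET x = 1 WHERE id = 1;
      -- exclusive lock iff t1 is empty
    UPDATE t2 SET x = 2 WHERE id = 1;     COPY t1 FROM '/data/f.dat';
      -- waits for session 2                -- waits for session 1's
                                            -- exclusive lock: deadlock,
                                            -- but only if t1 was empty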
"Luke Lonergan" <llonergan@greenplum.com> writes: > One of the reasons to consider a LOAD DATA command is that we can isolate > the need for performance improvements and special syntax from the concerns > of preserving the legacy behavior of COPY for use as the primary mechanism > for DUMP and RESTORE. ... and instead, define some new behavior that will soon be considered broken legacy code itself? There isn't any demand for changing the semantics of COPY, as far as I've noticed. If we can make it faster with the same semantics that's great, but I'm not in favor of inventing an alternate that does almost the same thing but (eg) breaks backslash handling in the name of speed. regards, tom lane
On Wed, 2005-06-01 at 10:35 -0700, Alon Goldshuv wrote:
> I have been working on improving the COPY command performance
> - around 40% for a 15-column (mixed types) table;
> - around 90% for a 1-column table.

That's very cool.

> 2) A modified command syntax for introducing direct single-row error
> handling. By direct I mean: a row that is rejected from within the COPY
> command context does not throw an error and roll back the whole
> transaction. Instead the error is caught and recorded elsewhere, maybe
> in some error table, with some more information that can later on be
> retrieved. The following rows continue to be processed. This way there
> is barely any error-handling overhead. Recursively isolating bad rows
> into smaller batches is extremely expensive for non-small data sets;
> it's not an option for serious users.

Can we call this the ERRORTABLE clause?

> 5) Allow an ERRORLIMIT to control aborting a load after a certain
> number of errors (a pre-requisite for this is point number 2 above).

The default would be ERRORLIMIT 0, to give backwards compatibility.

2) and 5) seem critical for combined usability and performance with real-world data.

I'm not clear from all of those options whether we still need a LOAD command, based upon other issues/comments raised on this thread. However, there are some other arguments for why it might be a good idea to have a LOAD DATA command separate from COPY. Certainly long-term features would be easier to add with two commands. Trying to maintain backwards compatibility just because we use COPY seems like an uphill struggle, and is going to mean we have to handle sensible new additions as options so we don't break existing applications. The most important one is the lock type held. [Oracle compatibility isn't one of them, even if it did provide the command name.]

But things will be clearer when we see the patch.

Best Regards, Simon Riggs
On Wed, 2005-06-01 at 18:32 -0400, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> One idea would be to look at the table file size first. If it has zero
>> blocks, lock the table, and if it still has zero blocks, do the no-WAL
>> copy.
>
> I think that's a bad idea. It would make the behavior unpredictable
> --- sometimes a COPY will take an exclusive lock, and other times not;
> and the reason why is at a lower semantic level than the user is
> supposed to know about.
>
> Before you say "this is not important", consider the nontrivial risk
> that the stronger lock will cause a deadlock failure. I don't think
> that it's acceptable for lock strength to be unpredictable.

While I agree with Bruce's sentiment, the locking is an issue with COPY. It is particularly important to be able to plan the concurrency within an application. That is why DB2's interesting ability to perform lock upgrades in a relatively unpredictable manner is not good.

For CREATE TABLE AS SELECT, I think we could do this Bruce's way and win, since we would take the same grade of lock in both cases.

Best Regards, Simon Riggs
Luke Lonergan wrote: > Bruce, > > > The problem with a new command is that it becomes unclear when you > > should use COPY and when LOAD DATA, and it confuses users, and has > > maintenance overhead. If Bizgres wants a new command name, go for it, > > but it is unlikely that the community release is going to go in that > > direction, unless there is a fundamental agreement that COPY is broken > > and needs a major revamp, and I have heard no talk of that. > > The question of whether COPY should be improved or whether the changes > should take the form of a new command is separate from the question of > whether the performance of the load path in PostgreSQL needs improvement. > > The 90% performance increase (from 12 MB/s to 21 MB/s) that Alon reported > comes from replacing the parsing logic within COPY. I believe that the > parsing logic in COPY is fundamentally broken from a performance > perspective, and may be broken from a functionality perspective WRT embedded > backslashes. COPY works as designed. The idea that some guy we have never heard of is going to appear and rewrite COPY's processing and tell us that the existing code is actually broken seems pretty arrogant to me. If it is broken (meaning doesn't work as designed), please show us facts rather than conjecture. Oh, and the "Our COPY improvements are so fundamental that they deserve a new command name" also has a similar flavor. (Please explain how you handle literal delimiters and nulls with no escape processing.) > One of the reasons to consider a LOAD DATA command is that we can isolate > the need for performance improvements and special syntax from the concerns > of preserving the legacy behavior of COPY for use as the primary mechanism > for DUMP and RESTORE. This seems like a case where GreenPlum's priorities and the community's priorities might not match. There is much more work required on your part if you are going to convince the community it needs a new data loading command, and starting out with the assumption in emails that it is going to be a newly named command isn't the best approach. That is my fundamental point. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > One idea would be to look at the table file size first. If it has zero > > blocks, lock the table and if it still has zero blocks, do the no-WAL > > copy. > > I think that's a bad idea. It would make the behavior unpredictable > --- sometimes a COPY will take an exclusive lock, and other times not; > and the reason why is at a lower semantic level than the user is > supposed to know about. > > Before you say "this is not important", consider the nontrivial risk > that the stronger lock will cause a deadlock failure. I don't think > that it's acceptable for lock strength to be unpredictable. Yea, but you are only doing the lock if the table is zero pages. Doesn't that help? Maybe not. I do like the LOCK keyword if we have to use one to enable this functionality, but I am suspecting people will want this functionality in pg_dump output. How do we do that? Just make it the default for pg_dump output? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Simon Riggs wrote: > I'm not clear from all of those options whether we still need a LOAD > command, based upon other issues/comments raised on this thread. > > However, there are some other arguments for why it might be a good idea > to have a LOAD DATA command separate from COPY. Certainly long term > features would be easier to add with two commands. Trying to maintain > backwards compatibility just because we use COPY seems like an uphill > struggle and is going to mean we have to handle sensible new additions > as options so we don't break existing applications. The most important > one is the lock type held. Well, we have had a pretty much unmodified COPY format since like the Berkeley days (I added \N and \.). Please tell us exactly what you want to do that requires a format change, and we can talk about it, but showing up with no proof and expecting a new command is the _wrong_ approach. It actually reminds me of the "our company developed it so it must be great" approach, which doesn't work well in the community. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Simon Riggs wrote: > On Wed, 2005-06-01 at 10:35 -0700, Alon Goldshuv wrote: >>2) A modified command syntax for introducing a direct single row error >>handling. By direct I mean - a row that if rejected from within the COPY >>5) allow an ERRORLIMIT to allow control of aborting a load after a certain >>number of errors (and a pre-requisite for this is point number 2 above). > 2) and 5) seem critical for combined usability & performance with real > world data. I'll second that! This would be a huge win for one of my real world applications, whether implemented as a new command, or as added capability on top of COPY. The other performance enhancements would certainly be nice to have also, but in my experience not nearly as important as these two. Joe
Bruce, The patch is not there to show that something is "broken"; it is there to show how things could be done in another way, which may or may not be more desirable. > COPY works as designed. The idea that some guy we have never heard of > is going to appear and rewrite COPY's processing and tell us that the > existing code is actually broken seems pretty arrogant to me. If it is > broken (meaning doesn't work as designed), please show us facts rather > than conjecture. I am sure that the code works as designed. In my previous email I was referring to the fact that if you use COPY on a data field that happens to have backslashes in it, and then query this data field, you will get different results. For example, do a COPY of a field "c:\\one\ten\nine" and query for it; the result will be much different. This is a problem with clickstream data, for example. That's all. It's very possible that there are reasons that I missed for why things are the way they are. > (Please explain how you handle literal delimiters and nulls with no > escape processing.) Escape processing is done only for these 2 cases. Thx, Alon.
Tom, > > ... and instead, define some new behavior that will soon be considered > broken legacy code itself? I'll not argue further about whether to have a separate LOAD command. That's not as important as fixing the performance issues in the data load path in PostgreSQL to me. However, I find it compelling that the underlying problem confronting high performance data loading in the engine is the slow parse code in psql/copy, etc., and not the WAL or other issues. Next on the list are huge gains from removing things like repetitive calls to strlen(TZ) in the attribute conversion code. > There isn't any demand for changing the semantics of COPY, as far as > I've noticed. If we can make it faster with the same semantics that's > great, but I'm not in favor of inventing an alternate that does almost > the same thing but (eg) breaks backslash handling in the name of speed. A 540% increase in parsing speed while performing substantially the same logic is what the patch accomplishes. The patch is now available on pgsql-patches. In this case the speed did not come from breaking backslash handling, but rather from using faster processing for escape processing and delimiter parsing. We *could* recreate identical escape semantics to COPY at nearly the same speed, but frankly we're puzzled as to why character sequences with "\" automatically imply escape processing. We've found that there are many cases where more sophisticated escape processing options are necessary, including multi-byte delimiters and escape sequences, and the default use of "\" for escaping breaks many data load cases. - Luke
Alon Goldshuv wrote: > Bruce, > > The patch is not there to show that something is "broken"; it is there to > show how things could be done in another way, which may or may not be more > desirable. Sure, we are always looking for ways to improve things. > > COPY works as designed. The idea that some guy we have never heard of > > is going to appear and rewrite COPY's processing and tell us that the > > existing code is actually broken seems pretty arrogant to me. If it is > > broken (meaning doesn't work as designed), please show us facts rather > > than conjecture. > > I am sure that the code works as designed. In my previous email I was > referring to the fact that if you use COPY on a data field that happens to have > backslashes in it, and then query this data field, you will get different > results. For example, do a COPY of a field "c:\\one\ten\nine" and query for it; > the result will be much different. This is a problem with clickstream data > for example. That's all. It's very possible that there are reasons that I > missed for why things are the way they are. Yep, you have to double backslashes coming in as data so we can use backslash for marking nulls, delimiters, etc. I see no way around that, and no one since Berkeley has come up with one either. If you have an idea, we would _love_ to hear it. > > (Please explain how you handle literal delimiters and nulls with no > > escape processing.) > Escape processing is done only for these 2 cases. OK, how is that done? How about newlines in the data? Right now I think those are the only escapes we do. We support more, but basically the only required ones are delimiter and nulls. The others are around just for convenience. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
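A minimal example of the doubling Bruce describes, using COPY's default text format; the table name is invented, and the data columns are separated by a literal tab:

    COPY demo (id, winpath) FROM stdin;
    1	c:\\one\\two
    2	\N
    \.

Row 1 loads the literal string c:\one\two, because each doubled backslash collapses to a single one; \N in row 2 is the null marker, and an undoubled backslash before an ordinary character (\t, for instance) would instead be interpreted as an escape.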
Luke Lonergan wrote: > Tom, > > > > ... and instead, define some new behavior that will soon be considered > > broken legacy code itself? > > I'll not argue further about whether to have a separate LOAD command. > That's not as important as fixing the performance issues in the data load > path in PostgreSQL to me. > > However, I find it compelling that the underlying problem confronting high > performance data loading in the engine is the slow parse code in psql/copy, > etc., and not the WAL or other issues. Next on the list are huge gains from > removing things like repetitive calls to strlen(TZ) in the attribute > conversion code. Yep, we would _love_ those improvements. > > There isn't any demand for changing the semantics of COPY, as far as > > I've noticed. If we can make it faster with the same semantics that's > > great, but I'm not in favor of inventing an alternate that does almost > > the same thing but (eg) breaks backslash handling in the name of speed. > > A 540% increase in parsing speed while performing substantially the same > logic is what the patch accomplishes. The patch is now available on > pgsql-patches. In this case the speed did not come from breaking backslash > handling, but rather from using faster processing for escape processing and > delimiter parsing. We *could* recreate identical escape semantics to COPY > at nearly the same speed, but frankly we're puzzled as to why character > sequences with "\" automatically imply escape processing. I am confused why you are confused. :-) > We've found that there are many cases where more sophisticated escape > processing options are necessary, including multi-byte delimiters and escape > sequences, and the default use of "\" for escaping breaks many data load > cases. Uh, how do you do the escapes if you don't double the escape character on input so you can distinguish a literal escape from one used to mark special data like a literal delimiter or a null? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce, > Yep, you have to double backslashes coming in as data so we can use > backslash for marking nulls, delimiters, etc. I see no way around that, > and no one since Berkeley has come up with one either. If you have an > idea, we would _love_ to hear it. Apropos of the previous thread on escape processing, IMO it needs to be switchable (e.g. no default of "\" as the escape character, escape processing not done by default). Otherwise lots of data loading scenarios (e.g. virtually all weblog sources) break in nasty ways. The remedy to date has been to perform extended preprocessing to "pre-escape" the data so that it will pass through the default escape processing of the PostgreSQL copy processing. I propose an extended syntax to COPY with a change in semantics to remove the default of "WITH ESCAPE '\'". - Luke
> > Yep, we would _love_ those improvements. > Coming soon, probably from the guy you've never heard of :-) > > I am confused why you are confused. :-) > > Uh, how do you do the escapes if you don't double the escape character > on input so you can distinguish a literal escape from one use to mark > special data like a literal delimiter or a null? Escape processing would proceed as before, but the semantics would change to allow the use of different characters as the escape character, in addition to the special characters for delimiter and newline. Also, escape processing would be "false" as the default, so that the only special characters by default would be the newline and delimiter characters. Also of importance is the specification of newline and delimiter as arbitrary double byte or 8-bit characters. - Luke
Luke Lonergan wrote: > I propose an extended syntax to COPY with a change in semantics to remove > the default of "WITH ESCAPE '\'". Er, doesn't this break existing database dumps? -O
Luke Lonergan wrote: > > > > > Yep, we would _love_ those improvements. > > > > Coming soon, probably from the guy you've never heard of :-) LOL > > > > I am confused why you are confused. :-) > > > > Uh, how do you do the escapes if you don't double the escape character > > on input so you can distinguish a literal escape from one used to mark > > special data like a literal delimiter or a null? > > Escape processing would proceed as before, but the semantics would change to > allow the use of different characters as the escape character, in addition > to the special characters for delimiter and newline. Also, escape > processing would be "false" as the default, so that the only special > characters by default would be the newline and delimiter characters. > > Also of importance is the specification of newline and delimiter as > arbitrary double byte or 8-bit characters. I am still confused how you have reliable, never-break semantics without special escaping. How do you distinguish an escape-delimiter used to escape a delimiter in the data from a literal escape-delimiter in the data being loaded --- it seems impossible to do. The idea of allowing a different escape character is interesting, however, and certainly possible. Right now we allow ESCAPE to be changed only in CSV mode, but I suppose it is possible to allow it to be changed in non-CSV mode as well. Or are you saying there would be no escape at all? If you make '@' the escape, you can't just say @n is a newline, because you need to make '@' output as '@@' so you can distinguish @-n from a newline. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Oliver Jowett wrote: > Luke Lonergan wrote: > > > I propose an extended syntax to COPY with a change in semantics to remove > > the default of "WITH ESCAPE '\'". > > Er, doesn't this break existing database dumps? Right, we will not change the default. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Oliver, On 6/1/05 7:13 PM, "Oliver Jowett" <oliver@opencloud.com> wrote: > Luke Lonergan wrote: > >> I propose an extended syntax to COPY with a change in semantics to remove >> the default of "WITH ESCAPE '\'". > > Er, doesn't this break existing database dumps? Yes, one of the previously stated reasons to create another command for loading data. Another possible approach is to keep the default, but allow the escape processing to be turned off. - Luke
> How do you distinguis an escape-delimiter used to escape a delimiter in > the data from a literal escape-delimiter in the data being loaded --- it > seems impossible to do. The usual approach is fine - a pair of escapes is equivalent to a literal escape. > Or are you saying there would be no escape at all. If you make '@' the > escape, you can't just say @n is a newline because you need to make '@' > output as '@@' so you can distinguish @-n from a newline. Yup - which is why when we've implemented this set of semantics in the past we've allowed for the use of arbitrary 8-bit escape characters. Binary characters (128-255) will appear fairly regularly in web log data, and as Alon pointed out previously, many text fields include backslashes (particularly Windows friendly ones). More flexibility in string handling is needed. Luke
On Wed, Jun 01, 2005 at 07:35:33PM -0700, Luke Lonergan wrote: > >> I propose an extended syntax to COPY with a change in semantics to remove > >> the default of "WITH ESCAPE '\'". > > > > Er, doesn't this break existing database dumps? > > Yes, one of the previously stated reasons to create another command for > loading data. > > Another possible approach is to keep the default, but allow the escape > processing to be turned off. I've been following this thread, and I'm a little confused. Could you possibly clarify what you mean, by providing a couple of lines of input as it would be formatted with escape processing turned off - containing a text field with an embedded newline and tab and a null field. Cheers, Steve
On Wed, Jun 01, 2005 at 10:35:30AM -0700, Alon Goldshuv wrote: > 2) A modified command syntax for introducing a direct single row error > handling. By direct I mean - a row that if rejected from within the COPY > command context does not throw an error and roll back the whole transaction. > Instead the error is caught and recorded elsewhere, maybe in some error > table, with some more information that can later on be retrieved. The > following rows continue to be processed. This way there is barely any error > handling overhead. Is there any idea on exactly how this would be done? Do you plan on using savepoints to implement it? I fail to see how this is "barely any overhead". Savepoints are not that expensive, but they are not free either. (No, I haven't measured it.) -- Alvaro Herrera (<alvherre[a]surnet.cl>) "Oh, great altar of passive entertainment, bestow upon me thy discordant images at such speed as to render linear thought impossible" (Calvin a la TV)
Steve, > I've been following this thread, and I'm a little confused. Could you > possibly clarify what you mean, by providing a couple of lines of > input as it would be formatted with escape processing turned off - > containing a text field with an embedded newline and tab and a null field. Using an extended command syntax for a "copy-like" command named LOAD:

    LOAD [schema.]tablename [(column1,column2,...)]
        FROM {'filename' | STDIN}
        [ [WITH]
            [DELIMITER [AS] 'delimiter']
            [NULL [AS] 'null string']
            [ESCAPE [AS] 'escape'] ];

If you intend to support embedded newlines (0x0a) in your character data without escapes, you will need to choose an alternative newline character for formatting the data. An example that uses '0xaa' as the newline might be:

Control statement:

    LOAD webform (formdata) FROM '/home/sample/sample.txt'
        WITH DELIMITER '|' NULL '' NEWLINE '0xaa';

Sample with 2 identical rows (with binary representations depicted between <>):

    Blahblah<0xaa>blahblah<0x09>blahblah<0x00>blahblah<0xaa>Blahblah<0xaa>blahblah<0x09>blahblah<0x00>blahblah<0xaa>

- Luke
yeah, this is all a bit confusing, but I *hope* I can clarify things here as I think I got a better understanding now. Before that, let me just clarify that the performance improvements in the patch have nothing to do with the escaping mechanism. Escapes could change. The performance gain is due to buffered processing with minimal line/attribute buffer loads.

I think that the basic issue is that there are some database users that would like to take their data and put it into the database without pre-processing it - regardless of whether there are any backslashes in it or 0x0D (CR's) etc... These are the users I am targeting in my patch, as these are the users I ran into in the field. The only responsibility of these users is to explicitly escape any delimiter or 0x0A (LF) characters that they intend to have as DATA. That's all.

On the other hand there are users that would like to pre-process their data with C-escape sequences (or alternately, users that already have their data escaped) - this is what the postgres COPY targets these days.

2 different ways to do it... none of them is right or wrong.

Examples: users that my patch targets may have a data row as such (delim = '|', EOL = [LF]):

c:\one\two|d:\ten\nine[LF]

Using the way I do escaping in my patch, those 2 fields of data will end up in the DB as

Field 1: c:\one\two
Field 2: d:\ten\nine

which is what the user would want. If they wanted to have a pipe char in the second field they could escape it as such:

d:\ten\nine here is a pipe \| [LF]

and no error will occur, and the result will be:

Field 2: d:\ten\nine here is a pipe |

If you try to insert that first data line above using the existing COPY command you will get an undesired result:

Field 1: c:one wo
Field 2: d: ine

Now, the other way around: users that do intend for their data to have escape sequences in it may have a line like this:

that's a \t tab| and this is a \nline feed [LF]

and will get the desired result (with the escapes converted to a real tab and line feed) of:

Field 1: that's a tab
Field 2: and this is a line feed

while using my code they will get undesired results:

Field 1: that's a \t tab
Field 2: and this is a \nline feed

So, basically it really depends on the target audience... Bruce, does that sound right to you?

Alon.

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org on behalf of Steve Atkins
Sent: Wed 6/1/2005 10:47 PM
To: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] NOLOGGING option, or ?

On Wed, Jun 01, 2005 at 07:35:33PM -0700, Luke Lonergan wrote: > >> I propose an extended syntax to COPY with a change in semantics to remove > >> the default of "WITH ESCAPE '\'". > > > > Er, doesn't this break existing database dumps? > > Yes, one of the previously stated reasons to create another command for > loading data. > > Another possible approach is to keep the default, but allow the escape > processing to be turned off. I've been following this thread, and I'm a little confused. Could you possibly clarify what you mean, by providing a couple of lines of input as it would be formatted with escape processing turned off - containing a text field with an embedded newline and tab and a null field.

Cheers, Steve
Steve, Oops. The example below should have read differently: > Sample with 2 identical rows (with binary representations depicted between > <>): > Blahblah<0xaa>blahblah<0x09>blahblah<0x00>blahblah<0xaa>Blahblah<0xaa>blahblah<0x09>blahblah<0x00>blahblah<0xaa>

Corrected:

    Blahblah<0x0a>blahblah<0x09>blahblah<0x00>blahblah<0xaa>Blahblah<0x0a>blahblah<0x09>blahblah<0x00>blahblah<0xaa>

This would result in the load of two records, each of which would look like this (when printed on a typical terminal):

    Blahblah
    blahblah        blahblah

Luke
On Thu, Jun 02, 2005 at 12:30:01AM -0400, Alon Goldshuv wrote: > before that, let me just clarify that the performance improvements in > the patch have nothing to do with the escaping mechanism. Escapes > could change. The performance gain is due to buffered processing > with minimal line/attribute buffer loads. May I suggest you present them as separate issues and separate patches? This way, a patch with the performance improvements is very likely to get merged; simultaneously we can discuss changes to the escaping mechanism until everyone agrees (or not) and then produce a patch as appropriate. -- Alvaro Herrera (<alvherre[a]surnet.cl>) This mail is guaranteed to be delivered 100% free of sarcasm.
Alon Goldshuv wrote: > I think that the basic issue is that there are some database users that would like to take their data and put it into thedatabase without pre-processing it [...] > The only responsibility of these users is to explicitly escape any delimiter or 0x0A (LF) characters that they intend tohave as DATA. that's all. Haven't you just replaced one preprocessing step with another, then? -O
On K, 2005-06-01 at 18:05 -0400, Bruce Momjian wrote: > What we could do is to do no-WAL automatically for empty tables (like > when a database is first loaded), You forget that some databases use WAL for PITR / replication, and doing it automatically there would surely mess up their replica. How is index creation handled if it is not logged in WAL? Is it not automatically WAL'ed? Must one recreate indexes after PITR or failover? > and use the flag for cases where the > table is not zero pages. The fact is that database loads are a perfect > case for this optimization and old dumps are not going to have that flag > anyway, and automatic is better if we can do it. -- Hannu Krosing <hannu@tm.ee>
On K, 2005-06-01 at 11:31 -0400, Tom Lane wrote: > Hannu Krosing <hannu@skype.net> writes: > > I think this should be a decision done when creating a table, just like > > TEMP tables. So you always know if a certain table is or is not > > safe/replicated/recoverable. > > This has also the advantage of requiring no changes to actual COPY and > > INSERT commands. > > That doesn't seem right to me; the scenario I envision is that you are > willing to do the initial data loading over again (since you presumably > still have the source data available). But once you've got it loaded > you want full protection. What I meant was that, as it can't be safely replicated using log-shipping, it should be visible as such. > Perhaps it could work to use an ALTER TABLE command to flip the state. No. It would be the same as flipping a TEMP table to an ordinary table, which we don't support, and IMHO for a good reason. > But I'm not really seeing the point compared to treating it as a COPY > option. The point is having a separate (sub)type of storage: a non-WAL/non-replicated table and its indexes. > I do not believe that anyone needs this to work on individual > INSERT commands --- if you are after max speed, why aren't you using > COPY? And treating it as an ALTER property opens the possibility of > forgetting to ALTER the table back to normal behavior, which would be > a foot-gun of large caliber indeed :-( That's what I'm trying to avoid. If it is obvious that the whole table is quasi-stable (in the PITR/log-shipping sense), it is more clearly a user choice what kinds of data can be stored there. Same as TEMP tables, again. -- Hannu Krosing <hannu@tm.ee>
> Escape processing would proceed as before, but the semantics would change to > allow the use of different characters as the escape character, in addition > to the special characters for delimiter and newline. If you mean syntax to specify escape and delimiter (and newline ?), that is a great addition, that imho all would like. > Also, escape > processing would be "false" as the default, so that the only special > characters by default would be the newline and delimiter characters. I don't see how that would be any advantage ? What is so wrong about having syntax to choose no escape processing, like "escape ''" ? Andreas
Oliver, > Haven't you just replaced one preprocessing step with another, then? Generally not. The most common problem with the current choice of escape character is that there are *lots* of data load scenarios with backslashes in the text strings. The extra preprocessing to escape them is unnecessary on other databases and, in effect, makes the load even slower, because you have to prepare the data ahead of time. Also, note that this patch can also do escape processing, and the net result will still be 5+ times faster than what is there. In the data warehousing industry, data conversion and manipulation is normally kept distinct from data loading. Conversion is done by tools called ETL (Extract Transform Load) and the database will have a very fast path for direct loading of the resulting data. PostgreSQL is definitely a strange database right now in that there is a default filter applied to the data on load. It's even more strange because the load path is so slow, now that we've found that the slowness is mostly due to non-optimized parsing and attribute conversion routines. The question of how to do escape processing is a separate one, but it is wrapped up in the question of whether to introduce a new loading routine or optimize the old one. - Luke
Andreas, >> Escape processing would proceed as before, but the semantics would change to >> allow the use of different characters as the escape character, in addition >> to the special characters for delimiter and newline. > > If you mean syntax to specify escape and delimiter (and newline ?), > that is a great addition, that imho all would like. Cool. We've found it invaluable with customers doing web data analysis. > I don't see how that would be any advantage ? What is so wrong about having > syntax to > choose no escape processing, like "escape ''" ? I agree. - Luke
On Thu, Jun 02, 2005 at 07:33:13AM -0700, Luke Lonergan wrote: > Oliver, > > > Haven't you just replaced one preprocessing step with another, then? > > Generally not. The most common problem with the current choice of escape > character is that there are *lots* of data load scenarios with backslashes in > the text strings. I can only think of one where it's common: Windows filenames. But if you're going to support arbitrary data in a load then whatever escape character you choose will appear sometimes. > The extra preprocessing to escape them is unnecessary on > other databases and, in effect, makes the load even slower, because > you have to prepare the data ahead of time. > Also, note that this patch can also do escape processing, and the net result > will still be 5+ times faster than what is there. I strongly suspect that a patch to improve performance without changing behaviour would be accepted with no questions asked. One that allowed specifying the field and record delimiters and the escape character and null symbol might require more discussion about an appropriate syntax at the very least. So you may want to separate the two. > In the data warehousing industry, data conversion and manipulation is > normally kept distinct from data loading. Conversion is done by tools > called ETL (Extract Transform Load) and the database will have a very fast > path for direct loading of the resulting data. PostgreSQL is definitely a > strange database right now in that there is a default filter applied to the > data on load. > It's even more strange because the load path is so slow, now that we've > found that the slowness is mostly due to non-optimized parsing and > attribute conversion routines. The question of how to do escape processing > is a separate one, but it is wrapped up in the question of whether to introduce > a new loading routine or optimize the old one. There are already two loader routines. One of them is text-based and is designed for easy generation of data load format using simple text manipulation tools by using delimiters. It also allows (unlike your suggestion) for loading of arbitrary data from a text file. Because it allows for arbitrary data and uses delimiters to separate fields, it has to use an escaping mechanism. If you want to be able to load arbitrary data and not have to handle escape characters, there are two obvious ways to do it. The first is that used by MIME and suggested by you: use a separator that you believe will not appear in the data. That can be done by using a long multicharacter separator containing random characters and assuming that sequence won't appear; by parsing the input data twice, looking for strings that don't appear, for use as delimiters; or by taking advantage of knowledge about what characters can and cannot appear in the input data. (I can't imagine any case involving data-mining of web logs where the last is likely to be relevant.) The other is to use a length+data format for each tuple, avoiding all issues of escapes in the data and allowing arbitrary data to be represented. That's how the binary load format PG supports works, I believe. If you're really concerned about speed of load that may be a better format for your front-end to generate, perhaps? Cheers, Steve
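For reference, the length+data path Steve mentions is reachable through COPY's binary mode, which frames each field with a length word and so needs no escaping at all; the table and file names below are invented for illustration:

    -- Write out, then reload, a table using the binary COPY format.
    COPY mytable TO '/tmp/mytable.bin' WITH BINARY;
    COPY mytable FROM '/tmp/mytable.bin' WITH BINARY;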
Steve, > I can only think of one where it's common. Windows filenames. Nearly all weblog data then. > But if > you're going to support arbitrary data in a load then whatever escape > character you choose will appear sometimes. If we allow an 8-bit character set in the "text" file, then yes, any delimiter you choose has the potential to appear in your input data. In practice, with *mostly* 7-bit ASCII characters and even with international 8-bit text encodings, you can choose a delimiter and newline that work well. Exceptions are handled by the forthcoming single row error handling patch. > I strongly suspect that a patch to improve performance without changing > behaviour would be accepted with no questions asked. Understood - not sure it's the best thing for support of the users yet. We've found a large number of issues from customers with the unmodified behavior. > There are already two loader routines. One of them is text-based and is > designed for easy generation of data load format using simple text > manipulation tools by using delimiters. It also allows (unlike your > suggestion) for loading of arbitrary data from a text file. Not to distract, but try loading a binary null into a text field. The assumption of null terminated strings penetrates deep into the codebase. The existing system does not allow for loading arbitrary data from a text file. Our suggestion allows for escapes, but requires the ability to specify alternate characters or none. > Because it allows for arbitrary data and uses delimiters to separate > fields it has to use an escaping mechanism. > > If you want to be able to load arbitrary data and not have to handle > escape characters there's are two obvious ways to do it. Let's dispense with the notion that we're suggesting no escapes (see above). Binary with a bookends format is a fine idea and would be my personal preference if it were fast, which it isn't. Customers in the web log analysis and other data warehousing fields prefer "mostly 7-bit" ascii text input, which we're trying to support with this change. - Luke
>> 2) A modified command syntax for introducing a direct single row error >> handling. By direct I mean - a row that if rejected from within the COPY >> command context does not throw an error and roll back the whole transaction. >> Instead the error is caught and recorded elsewhere, maybe in some error >> table, with some more information that can later on be retrieved. The >> following rows continue to be processed. This way there is barely any error >> handling overhead. > > Is there any idea on exactly how this would be done? Do you plan on > using savepoints to implement it? I fail to see how this is "barely any > overhead". Savepoints are not that expensive, but they are not free either. > (No, I haven't measured it.) Good question. I am not entirely sure if this is possible yet, as I haven't thought it through entirely yet. I guess data errors could be divided into 2 main categories: malformed data, where the error is detected even before forming a tuple, and errors caused by some constraint violation, that is, after the tuple is formed and inserted. From what I hear and experience, the big majority of errors are of the first type. In that case the error could be caught, the data line + line number + error description could be inserted into an ERROR table (all TEXT fields), and then COPY can skip forming a tuple and move on to parsing the next. In this process there is barely any overhead. The more difficult part obviously is handling the second error type, which I haven't yet looked at deeply. Hopefully it is not impossible to do while keeping transaction integrity (any ideas, anyone?). The overhead for this one will probably be larger, but again, we expect those to happen less often (in most cases at least). Nevertheless, it is surely much faster than recursively narrowing down batch sizes. Alon.
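A sketch of the first category Alon describes, with rejected lines recorded in a side table of all-TEXT fields; the table name and column layout are assumptions for illustration, since nothing like this existed in COPY at the time:

    -- Hypothetical side table for rows rejected while parsing.
    CREATE TABLE copy_errors (
        relname  text,   -- table being loaded
        lineno   bigint, -- line number within the input file
        rawline  text,   -- the offending input line, verbatim
        errmsg   text    -- why the row was rejected
    );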
"Luke Lonergan" <llonergan@greenplum.com> writes: > In the data warehousing industry, data conversion and manipulation is > normally kept distinct from data loading. It's a bit strange to call this conversion or manipulation. One way or another you have to escape whatever your delimiters are. How would you propose loading strings that contain newlines? The ETL transformations you're talking about are a different beast entirely. You're talking about things like canonicalizing case or looking up foreign key ids to replace strings and such. Simply parsing the file format properly isn't part of that game. Otherwise where do you stop? You could take this to a silly extreme and just say postgres should just load each line as a record with single text field and let "tools" deal with actually parsing. Or better yet, load the whole thing as a single big blob. Personally I would prefer to make prepared inserts as efficient as COPY and deprecate COPY. Then we could have an entirely client-side tool that handled as many formats as people want to implement without complicating the server. Things like various vintages of Excel, fixed column files, etc should all be handled as plugins for such a tool. That would have the side benefit of allowing people to do other batch jobs efficiently. Pipelining parameters to hundreds of executions of a prepared query in the network. Actually it seems like there's no particular reason the NOLOGGING option Tom described (where it only inserts on new pages, doesn't have any special WAL entries, just fsyncs at the end instead of WAL logging) can't work with arbitrary inserts. Somehow some state has to be preserved remembering which pages the nologging inserts have created and hold locks on. -- greg
Alon Goldshuv wrote: > >I've been following this thread, and I'm a little confused. Could you > possibly clarify what you mean, by providing a couple of lines of input > as it would be formatted with escape processing turned off - containing > a text field with an embedded newline and tab and a null field. > > > yeah, this is all a bit confusing, but I *hope* I can clarify things > here as I think I got a better understanding now. The basic problem with this thread is that it started with _conclusions_ (we need a LOAD DATA command, escapes are broken), and not with statements of fact (we need another way of specifying escapes, we have performance improvements). Any discussion that starts with conclusions instead of facts is bound to have this problem, and it often happens when a group discusses among themselves, outside the community, and appears with the conclusions, thinking they are helping us by not going into the details. As you can see, lack of facts actually hampers the discussion. What has me particularly concerned is someone saying that loading C:\TMP must be broken, and not understanding that the doubling of escapes is a major requirement to have data loaded reliably. Now, you can argue that a different escape should be possible, or that some other escape syntax could be used, but the existing mechanism is clearly 100% reliable when used properly and not broken. A quick email asking why C:\TMP doesn't load in properly would have yielded a much clearer conversation about why escaping is required in our current system, and the other options that should be explored. Saying escapes are broken and here is the fix really didn't get very far. I recommend you just start a new thread, with a new topic, and head in the _facts_ direction. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce, Point taken. > Now, you can argue that > a different escape should be possible, or that some other escape syntax > could be used, but the existing mechanism is clearly 100% reliable when > used properly and not broken. I think that having an option for another escape syntax (such as using ESCAPE AS clause in delimited format (non-csv) COPY) with a default of '\\' is a good compromise that will allow users to escape their data (like COPY currently is), or by specifying another escape character allow all of their backslashes to be treated as data. I'll start a new discussion about it on a new thread soon. Thx, Alon.
Alon Goldshuv wrote: > Bruce, > > Point taken. > > > Now, you can argue that > > a different escape should be possible, or that some other escape syntax > > could be used, but the existing mechanism is clearly 100% reliable when > > used properly and not broken. > > I think that having an option for another escape syntax (such as using > ESCAPE AS clause in delimited format (non-csv) COPY) with a default of '\\' > is a good compromise that will allow users to escape their data (like COPY > currently is), or by specifying another escape character allow all of their > backslashes to be treated as data. > > I'll start a new discussion about it on a new thread soon. Yep, great. Right now we only support single-byte escapes (and delimiters), so there should be a discussion of whether multi-byte values are useful too. Also, there was discussion of what the default should be. I don't think there is any agreement to change the existing defaults. You might argue for a new mode that sets certain defaults to be easier to load, but I am afraid of a new format that isn't 100% reliable because it assumes that some sequence of bytes will never appear in the data. Anyway, these are ideas you can consider when making a proposal. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
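Alon's compromise, spelled out as a sketch; at the time ESCAPE was accepted only in CSV mode, so the second form is hypothetical for the text format, and all names are invented:

    -- Default unchanged: backslash remains the escape character.
    COPY clickstream FROM '/data/clicks.dat' WITH DELIMITER '|';

    -- Hypothetical: choose a different escape, so the backslashes in
    -- Windows paths load as plain data.
    COPY clickstream FROM '/data/clicks.dat' WITH DELIMITER '|' ESCAPE '~';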
Bruce, Is there a good source of multi-byte copy data test cases? What is currently done to test the trans-coding support? (where client and server encodings are different) I notice that the regression data in the CVS version of postgres does not seem to include cases other than the ASCII data; is there another source of data/cases we're missing? Also - Alon's looking into this, but it would appear that the presumption on EOL for two-byte encodings is 0x0a+0xNN, where 0x0a is followed by any byte. Similar for other current control characters (escape, delimiter). Is there a definition of format and semantics for COPY with 2-byte encodings we should look at? I've looked at the code and the docs like sql-copy.html, and the question is relevant because of the following case: if newline were defined as 0x0a+0x00, as opposed to 0x0a+0xNN where NN is arbitrary, we could parse using 16-bit logic; however, if newline were defined as 0x0a+0xNN, we must use byte-wise parsing. TIA - Luke
Luke Lonergan wrote: > Bruce, > > Is there a good source of multi-byte copy data test cases? What is > currently done to test the trans-coding support? (where client and server > encodings are different) > > I notice that the regression data in the CVS version of postgres does not > seem to include cases other than the ASCII data, is there another source of > data/cases we're missing? > > Also - Alon's looking into this, but it would appear that the presumption on > EOL for two-byte encodings is 0x0a+0xNN, where 0x0a is followed by any byte. > Similar for other current control characters (escape, delimiter). Is there > a definition of format and semantics for COPY with 2-byte encodings we > should look at? > > I've looked at the code and the docs like sql-copy.html and the question is > relevant because of the following case: > if newline were defined as 0x0a+0x00 as opposed to 0x0a+0xNN where N is > arbitrary, we could parse using 16-bit logic. > however > if newline were defined as 0x0a+0xNN, we must use byte-wise parsing We have two and three-byte encodings, so 16-bit seems like it wouldn't work. I am not aware of any specs except the C code itself. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce, > We have two and three-byte encodings, so 16-bit seems like it wouldn't > work. I am not aware of any specs except the C code itself. Ok - no problem. How about test data and cases? I see the SQL encoding examples used in src/test/regress/sql for testing encoding in SQL, but are there regressions for QA/test of multi-byte encoding support? If not, that's OK, but it would save us time if some were already written. WRT the COPY command, I'd like to have regressions that test the input of matched client/server encodings with different (standardized) multi-byte control characters. The current code seems to allow for an arbitrary second byte in control characters, so if we made a statement about control character support, I think it would be that

    <ctl-byte><any-byte>...mblen...<any-byte>

is allowed for specification of control characters (newline, delimiter). Luke
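One form such a regression could take, exercising the trans-coding case raised earlier (client encoding differing from the server encoding during COPY); the encoding choice, table name, and data are illustrative, and the octal escape \351 in the COPY data stands for byte 0xE9, which is 'é' in LATIN1:

    -- Load data whose client encoding differs from the server encoding,
    -- exercising COPY's trans-coding path.
    SET client_encoding TO 'LATIN1';
    COPY enc_test (val) FROM stdin;
    caf\351
    \.
    RESET client_encoding;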
Luke Lonergan wrote: > Bruce, > > > We have two and three-byte encodings, so 16-bit seems like it wouldn't > > work. I am not aware of any specs except the C code itself. > > Ok - no problem. > > How about test data and cases? I see the SQL encoding examples used in > src/test/regress/sql for testing encoding in SQL, but are there regressions > for QA/test of multi-byte encoding support? If not, that's OK, but it would > save us time if some were already written. No, I don't think so, but the good news is that the existing code has always worked flawlessly. > WRT the COPY command, I'd like to have regressions that test the input of > matched client/server encodings with different (standardized) multi-byte > control characters. Makes sense, but how do we know what encodings the client supports? We would need some tests for that. > The current code seems to allow for an arbitrary second byte in control > characters, so if we made a statement about control character support, I > think it would be > <ctl-byte><any-byte>...mblen...<any-byte> > > is allowed for specification of control characters (newline, delimiter). I have no idea what you are talking about. Again, give me facts about what we currently don't do and what you want to do. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce, > I have no idea what you are talking about. Again, give me facts about > what we currently don't do and what you want to do.

Currently:
- No statement of multi-byte control character format
- No tests to define or prove "works flawlessly" or identify when something breaks the current operational state

Desired:
- Clear statement of multi-byte control character format
- Tests that define what "works flawlessly" means

- Luke