Thread: Implement UNLOGGED clause for COPY FROM
Hello, hackers.

I'd like to propose a performance-oriented feature for COPY FROM: an UNLOGGED clause, which makes data loading skip WAL generation. It would work as follows:

1. Acquire an ACCESS EXCLUSIVE lock on the target table and its indexes.
2. Mark those relations 'unrecoverable' in pg_class/pg_index.
3. Issue one WAL record to indicate that COPY UNLOGGED was executed.
4. Execute the data loading, bypassing WAL generation for the data.
5. Sync the data to disk by performing a checkpoint.

During recovery, I'd like postgres to recognize both the 'unrecoverable' flags from step 2 and the point when the command was issued from step 3, in order to recover the data that the target table held before COPY UNLOGGED was executed.

Oracle's SQL*Loader has such a feature, called UNRECOVERABLE, to boost loading speed for workloads with severe time limits.

Best, Takamichi Osumi
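To visualize the durability contract of the five steps above, here is a deliberately simplified Python model (all names here are invented for illustration; none of this corresponds to actual PostgreSQL code): data loaded without WAL is durable only after the final sync, and a crash before it means the load must be redone.

```python
# Toy model of the proposed COPY UNLOGGED flow (not PostgreSQL code).

class Table:
    def __init__(self):
        self.synced_rows = []    # rows durably on disk (step 5 done)
        self.buffer_rows = []    # rows only in memory, not yet synced
        self.unrecoverable = False

    def copy_unlogged(self, rows):
        # Steps 2-4: mark unrecoverable, skip WAL, load into buffers.
        self.unrecoverable = True
        self.buffer_rows.extend(rows)

    def checkpoint(self):
        # Step 5: force the loaded data to disk; only now is it durable.
        self.synced_rows.extend(self.buffer_rows)
        self.buffer_rows = []
        self.unrecoverable = False

    def crash_and_recover(self):
        # No WAL exists for the load, so recovery can only restore the
        # state the table had before COPY UNLOGGED began.
        self.buffer_rows = []
        self.unrecoverable = False

t = Table()
t.copy_unlogged([1, 2, 3])
t.crash_and_recover()          # crash before step 5
assert t.synced_rows == []     # loaded data is lost; the application reloads

t.copy_unlogged([1, 2, 3])
t.checkpoint()                 # step 5 completed
t.crash_and_recover()
assert t.synced_rows == [1, 2, 3]
```

The model makes step 5 explicit: the checkpoint is what moves the unlogged data from a recoverable-by-reload state into durable storage.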
On 2020/07/09 11:36, osumi.takamichi@fujitsu.com wrote:
> Hello, hackers.
>
> I've been thinking to suggest
> a performance-oriented feature for COPY FROM.
> It's UNLOGGED clause, which means data loading skipping WAL generation.

Can this feature work safely with wal_level=replica or logical? Or can it work only with wal_level=minimal? If yes, what is the main difference between this method and wal_skip_threshold?

> How to make it work is the following.
> 1. Acquire ACCESS EXCLUSIVE mode to lock the target table and its indexes.
> 2. Mark those relations 'unrecoverable' in pg_class/pg_index.
> 3. Issue one WAL to indicate when COPY UNLOGGED is executed.
> 4. Execute the data loading, bypassing WAL generation for data.
> 5. Sync the data to disk by performing checkpoint.

What happens if the server crashes before #5? Since no WAL for the data loading can be replayed, the target table should be truncated?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Fujii-san,

Thank you for your interest in this idea.

> This feature can work safely with wal_level=replica or logical?
> Or it can work only with wal_level=minimal? If yes, what is the main difference
> between this method and wal_skip_threshold?

I'm thinking this feature can be used with any wal_level setting. Besides that, data loading into a table *with some data* should be allowed. In other words, I don't want to limit the usage of this feature to the initial load of an empty table, or to the 'minimal' wal_level.

Let me explain the background in more detail.

I got a report from one of my customers that multiple COPY commands from multiple sessions hit a bottleneck writing WAL. Her use case was a DWH system using postgres mainly to load dozens of GB (or more) of log data from multiple data sources for nightly batch processing every day. Her scenario included both initial loads into empty tables and loads into tables that already had records.

In passing, she also used our company's parallel loader product to load data with dozens of (nearly 100) BGWs at the same time. Through investigation with iostat, they found the same problem: the CPU was working intensively on WAL writes.

This could also happen once Parallel COPY, which is being hotly discussed and reviewed on the mailing lists now, is implemented. So I thought it would be good to discuss this in advance.

> > 4. Execute the data loading, bypassing WAL generation for data.
> > 5. Sync the data to disk by performing checkpoint.
>
> What happens if the server crashes before #5? Since no WAL for data-loading can
> be replayed, the target table should be truncated?

My answer is simply to load that COPY data again, because the application itself knows what kind of data was loaded by the command.

Lastly, let me add some functional specifications for this clause. The syntax is "COPY tbl FROM '/path/to/input/file' UNLOGGED".
In terms of streaming replication, I'd like to ask for the advice of other members of this community. Right now, I think this feature requires re-creating the standby immediately after COPY UNLOGGED, like Oracle's clause, but I want to make postgres more attractive than Oracle to users. Does anyone have ideas?

Regards, Takamichi Osumi
From: David G. Johnston <david.g.johnston@gmail.com>
> This step seems to invalidate the idea outright. The checkpoint command is superuser only and isn’t table specific. This seems to require both those things to be changed.
Perhaps FlushRelationBuffers() followed by smgrsync() can be used instead. Or, depending on the assumed use case (e.g. the DBA adds data regularly for analytics), we may allow COPY UNLOGGED to be used only by superusers and some special pg_ roles, and have COPY UNLOGGED perform the checkpoint. Anyway, I rather feel that COPY UNLOGGED needs some special privilege, because it renders the table unrecoverable and prevents it from being replicated to the standby.
Regards
Takayuki Tsunakawa
On Thu, Jul 9, 2020 at 11:47 AM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:
>
> In terms of streaming replication,
> I'd like to ask for advice of other members in this community.
> Now, I think this feature requires to re-create standby
> immediately after the COPY UNLOGGED like Oracle's clause
>

This seems quite limiting to me, and I think the same will be true for subscribers that get data via logical replication, right? I suspect that the user will perform such an operation from time to time, and creating the replica again each time could be really time-consuming, maybe costing more than the unlogged operation saves. I wonder whether they really need to replicate such a table and its data at all, because re-creating a replica from scratch after each operation on one table doesn't sound advisable to me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2020/07/09 15:17, osumi.takamichi@fujitsu.com wrote:
> Fujii-san
>
> Thank you for your interest in this idea.
>
>> This feature can work safely with wal_level=replica or logical?
>> Or it can work only with wal_level=minimal?
>> If yes, what is the main difference
>> between this method and wal_skip_threshold?
> I'm thinking this feature can be used
> when you set any parameters of wal_level.
> Besides that, data loading into a table *with some data*
> should be allowed. This means I don't want to limit
> the usage of this feature only for initial load
> for empty table or under condition of 'minimal' wal_level in other words.
>
> Let me explain more detail of the background.
>
> I got a report that one of my customers says that
> multiple COPY from multiple sessions have a bottleneck to write WAL.
> Her use case was DWH system using postgres mainly to load dozens of GB (or more) log data
> from multiple data sources to execute night batch processing everyday.
>
> Her scenario included both initial load to empty table
> and load to table that already has records.

Yes, I understand this use case.

> In passing, she also used our company's product of parallel loader,
> to load data with dozens of, nearly 100, BGWs at the same time.
> Through investigation of iostat,
> they found the same problem that CPU worked for WAL write intensively.
>
> This could happen after the implementation
> of Parallel copy that is really hotly discussed and reviewed in the mailing lists now.
> So I thought it's good to discuss this in advance.
>
>>> 4. Execute the data loading, bypassing WAL generation for data.
>>> 5. Sync the data to disk by performing checkpoint.
>>
>> What happens if the server crashes before #5? Since no WAL for data-loading can
>> be replayed, the target table should be truncated?
> My answer for this is just to load that COPY data again.
> It's because the application itself knows what kind of data was loaded
> from the command.
When the server crashes before #5, some table and index pages that #4 loaded the data into might have been partially synced to disk by the bgwriter or by shared buffer replacement. So ISTM that the target table needs to be truncated back to empty during recovery, and users need to load the whole data into the table again. Is my understanding right? If yes, isn't this the same feature that an UNLOGGED table provides?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Hi David Johnston
Thank you for your comment.
> Aside from that, though, how does this improve upon the existing capability to copy into an unlogged temporary table?
[>] First of all, an unlogged temporary table can't be carried over between sessions.
And an unlogged table needs to be recreated when the server crashes,
because its contents are truncated at startup.
If you hold massive data in an unlogged table,
you'd be forced to spend much time recovering it. This isn't good.
So I'm thinking that COPY UNLOGGED would provide a more flexible way
for COPY FROM users to keep their data.
The feature gives them a choice: during ordinary operation
you can keep WAL logging for a target table, and when you need high-speed loading
you can bypass WAL generation for it.
To achieve this, we have to consider a new idea:
for example, loaded data would be appended
after all existing pages, and those pages would be detached
if the server crashes during the UNLOGGED loading.
By the way, "ALTER TABLE tbl SET UNLOGGED" is supported by postgres.
You may think it's OK to change a LOGGED table to UNLOGGED with this command.
But it actually copies the whole relation once. (This isn't written in the manual.)
So this command becomes slow if the table it is applied to contains a lot of data.
Thus changing a table's LOGGED/UNLOGGED status is also costly at the moment, and I think this copy is an obstacle to switching that status.
The discussion of the reason is written in the URL below.
Best
Takamichi Osumi
"osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com> writes:
>> Aside from that, though, how does this improve upon the existing capability to copy into an unlogged temporary table?
> [>] unlogged temporary table can't be inherited over sessions first of all.

Unlogged tables don't have to be temporary.

> And unlogged table needs to be recreated due to startup truncation of the table's content
> when the server crashes.

Indeed, and your proposed feature would extend that un-safety to tables that are NOT marked unlogged, which is not good.

AFAICS, we can already accomplish basically the same thing as what you want to do like this:

alter table foo set unlogged;
copy foo from ...;
alter table foo set logged;

The mechanics of that are already well worked out. It's expensive, no doubt about that, but I think you're just fooling yourself to imagine that any shortcuts are possible. A mix of unlogged and logged data is never going to work safely.

> To achieve this, we have to
> consider a new idea like loaded data'd be added
> at the end of the all other pages and detach those
> if the server crashes during the UNLOGGED loading processing for example.

You keep on ignoring the indexes... not to mention replication.

regards, tom lane
Hi,

> AFAICS, we can already accomplish basically the same thing as what you want to
> do like this:
>
> alter table foo set unlogged;
> copy foo from ...;
> alter table foo set logged;

This doesn't satisfy what I want. In case 'foo' has a huge number of rows at the beginning, this example would spend much time copying the contents of 'foo' twice to swap relfilenodes atomically. When the data loaded by COPY is big too, the execution time becomes much longer.

> You keep on ignoring the indexes... not to mention replication.

Sorry for having given that impression.

When a server crash occurs during the data loading of COPY UNLOGGED, the indexes must of course be kept consistent. I'm thinking that rebuilding the indexes on the target table would work.

In my opinion, the UNLOGGED clause must be designed to guarantee that the data loaded by this clause is written starting from the end of all other data blocks. Furthermore, those blocks need to be protected from writes by other transactions during the copy. Apart from that, the server must be aware of which block is the first one, or the range where the load started and ended, in preparation for a crash.

During crash recovery, those points help recognize and detach such blocks, solving the situation where the loaded data is partially synced to disk and the rest isn't. For the indexes, we can recreate them based on the relation's data protected by the mechanism above.

Another idea for index crash recovery was to copy the indexes on the target table as a backup just before loading, and to write the index entries newly added by the loaded data into those temporary index files, in order to localize the new entries. But my purpose is to accelerate data loading when the target table initially holds a huge amount of data. By that criterion, the initial copy of the indexes would slow down execution. Thus, I chose rebuilding the indexes.
Another point I need to add for recovery is how the startup process would know about the COPY UNLOGGED clause. My current idea is to utilize an existing system file, or create a new one in the cluster, for this clause. At minimum, postgres needs to identify which blocks should be detached, recorded at the point when the command is executed. Therefore, we need to add information for that.

Lastly, I have to admit that the status of a target table loaded by COPY UNLOGGED would be marked as invalid and notified to standbys in a replication environment from the point in time when the operation takes place. But I'm going to allow only users with special privileges (like the DBA) to use this clause, and they would judge that this kind of table should not be replicated. Of course, I'm looking for a better idea, but for now this is what I can say about replication.

Best, Takamichi Osumi
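The block-boundary bookkeeping described in the last two messages can be illustrated with a toy Python model (the names are invented for illustration and this is not PostgreSQL code): the server records the table's size in blocks before the load starts, crash recovery detaches every block at or beyond that boundary, and the indexes are then rebuilt from the surviving heap.

```python
# Toy illustration (not PostgreSQL code) of recovering from a crash during
# COPY UNLOGGED by detaching the blocks appended after a recorded boundary.

def start_unlogged_copy(heap_blocks):
    # Remember where the pre-existing data ends; loaded data starts here.
    return len(heap_blocks)

def crash_recover(heap_blocks, boundary):
    # Detach (truncate away) every block appended by the aborted load,
    # which may be only partially synced to disk, then rebuild the index
    # from the surviving heap data only.
    surviving = heap_blocks[:boundary]
    index = sorted(row for block in surviving for row in block)
    return surviving, index

heap = [[3, 1], [2, 4]]            # blocks present before the load
boundary = start_unlogged_copy(heap)
heap += [[9, 7], [8]]              # blocks appended without WAL
heap, index = crash_recover(heap, boundary)
assert heap == [[3, 1], [2, 4]]    # table is back to its pre-COPY state
assert index == [1, 2, 3, 4]       # index rebuilt from surviving data only
```

This is only the happy path of the proposal; the later messages in the thread show where the scheme runs into trouble once WAL-logged changes land after the unlogged blocks.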
On Fri, Jul 17, 2020 at 9:53 AM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:
>
> Lastly, I have to admit that
> the status of target table where data is loaded by COPY UNLOGGED would be marked
> as invalid and notified to standbys under replication environment
> from the point in time when the operation takes place.
> But, I'm going to allow users with special privileges (like DBA) to use this clause
> and this kind of tables would be judged by them not to replicate.
> Of course, I'm thinking better idea but now what I can say is like this for replication.
>

If you are going to suggest users not replicate such tables, then why can't you suggest they create such tables as UNLOGGED in the first place? Another idea could be that you create an 'unlogged' table and copy the data into it, then perform ALTER TABLE .. SET LOGGED and attach it to the main table. I think for this you need the main table to be partitioned, but if that is possible then it might be better than all the hacking you are proposing to do in the server for this special operation.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Hi, Amit-san

> If you are going to suggest users not to replicate such tables then why can't you
> suggest them to create such tables as UNLOGGED in the first place? Another
> idea could be that you create an 'unlogged'
> table, copy the data to it. Then perform Alter Table .. SET Logged and attach it to
> the main table. I think for this you need the main table to be partitioned but I
> think if that is possible then it might be better than all the hacking you are
> proposing to do in the server for this special operation.

Thank you for your comment.

At the beginning, I should have mentioned that this function is for data warehouses, where you need to load large amounts of data in the shortest amount of time. Sorry for my poor explanation.

Given that a data warehouse cannot in general be separated from applications like B.I. tools, we cannot easily define an unlogged table at the outset. Basically, such tools don't support defining unlogged tables, as far as I know.

If you want to do so anyway, you need to *modify or patch the existing application*, which is implemented by a third party and sold commercially for data analytics. In other words, to make CREATE UNLOGGED TABLE available in that application, you must revise the product's source code directly, which would void the warranty from the software company of the B.I. tool. In my opinion, it would be unrealistic for everyone to do so.

Best, Takamichi Osumi
On Wed, Jul 22, 2020 at 11:11 AM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:
>
> > If you are going to suggest users not to replicate such tables then why can't you
> > suggest them to create such tables as UNLOGGED in the first place? Another
> > idea could be that you create an 'unlogged'
> > table, copy the data to it. Then perform Alter Table .. SET Logged and attach it to
> > the main table. I think for this you need the main table to be partitioned but I
> > think if that is possible then it might be better than all the hacking you are
> > proposing to do in the server for this special operation.
> Thank you for your comment.
>
> At the beginning, I should have mentioned this function was
> for data warehouse, where you need to load large amounts of data
> in the shortest amount of time.
> Sorry for my bad explanation.
>
> Based on the fact that data warehouse cannot be separated from
> usage of applications like B.I. tool in general,
> we cannot define unlogged table at the beginning easily.
> Basically, such tools don't support to define unlogged table as far as I know.
>
> And if you want to do so, you need *modification or fix of existing application*
> which is implemented by a third party and commercially available for data analytics.
> In other words, to make CREATE UNLOGGED TABLE available in that application,
> you must revise the product's source code of the application directly,
> which is an act to invalidate the warranty from the software company of B.I. tool.
> In my opinion, it would be like unrealistic for everyone to do so.
>

So, does this mean that the data warehouse application's need would be satisfied if the table had been UNLOGGED, but you still need 'COPY UNLOGGED ..' syntax because they don't have control over the table definition? If this is the case, then I think the application user should find a way around this.
BTW, if the application is anyway going to execute PG-native syntax like 'COPY UNLOGGED ..', then why can't it simply set the tables as UNLOGGED by using ALTER TABLE? IIUC, the case here is that the applications are not allowed to change the standard definitions of tables owned/created by the B.I. tool; instead they use some "Copy Unlogged" sort of syntax provided by other databases to load data at a faster speed, and now they expect PG to have a similar alternative. If that is true, then it is possible that those databases don't have a 'Create Unlogged Table ...' type of syntax, which PG has, and that is why they have provided such an alternative for COPY-like commands.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, 17 Jul 2020 at 13:23, osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote:
>
> Hi,
>
> > AFAICS, we can already accomplish basically the same thing as what you want to
> > do like this:
> >
> > alter table foo set unlogged;
> > copy foo from ...;
> > alter table foo set logged;
> This didn't satisfy what I wanted.
> In case that 'foo' has huge amount of rows at the beginning,
> this example would spend much time to copy
> the contents of 'foo' twice to swap relfilenodes atomically.
> When that loaded data by COPY is big too, its execution time becomes much longer.
>
> > You keep on ignoring the indexes... not to mention replication.
> Sorry for having made you think like this.
>
> When the server crash occurs during data loading of COPY UNLOGGED,
> it's a must to keep index consistent of course.
> I'm thinking that to rebuild the indexes on the target table would work.
>
> In my opinion, UNLOGGED clause must be designed to guarantee that
> where the data loaded by this clause is written starts from the end of all other data blocks.
> Plus, those blocks needs to be protected by any write of other transactions during the copy.
> Apart from that, the server must be aware of which block is the first block,
> or the range about where it started or ended in preparation for the crash.
>
> During the crash recovery, those points are helpful to recognize and detach such blocks
> in order to solve a situation that the loaded data is partially synced to the disk and the rest isn't.

How would online backup and archive recovery work? Suppose that the user executes pg_basebackup while COPY UNLOGGED is running: the physical backup might contain the portion of tuples loaded by COPY UNLOGGED, but those data are not recovered. That might not be a problem, because the operation is performed without WAL records. But what if an insertion happens after COPY UNLOGGED but before pg_stop_backup()?
I think that a new tuple could be inserted at the end of the table, following the data loaded by COPY UNLOGGED. With your approach described above, the newly inserted tuple will be recovered during archive recovery, but it either will be removed (if we replay the insertion WAL and then truncate the table) or won't be inserted due to the missing block (if we truncate the table and then replay the insertion WAL), resulting in losing the tuple although the user's insertion succeeded.

Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hello. Apologies for the delay.

> > When the server crash occurs during data loading of COPY UNLOGGED,
> > it's a must to keep index consistent of course.
> > I'm thinking that to rebuild the indexes on the target table would work.
> >
> > In my opinion, UNLOGGED clause must be designed to guarantee that
> > where the data loaded by this clause is written starts from the end of all other data blocks.
> > Plus, those blocks needs to be protected by any write of other transactions during the copy.
> > Apart from that, the server must be aware of which block is the first
> > block, or the range about where it started or ended in preparation for the crash.
> >
> > During the crash recovery, those points are helpful to recognize and
> > detach such blocks in order to solve a situation that the loaded data is partially synced to the disk and the rest isn't.
>
> How do online backup and archive recovery work?
>
> Suppose that the user executes pg_basebackup during COPY UNLOGGED running,
> the physical backup might have the portion of tuples loaded by COPY UNLOGGED
> but these data are not recovered. It might not be a problem because the operation
> is performed without WAL records. But what if an insertion happens after COPY
> UNLOGGED but before pg_stop_backup()? I think that a new tuple could be
> inserted at the end of the table, following the data loaded by COPY UNLOGGED.
> With your approach described above, the newly inserted tuple will be recovered
> during archive recovery, but it either will be removed if we replay the insertion
> WAL then truncate the table or won't be inserted due to missing block if we
> truncate the table then replay the insertion WAL, resulting in losing the tuple
> although the user got successful of insertion.

I consider that, from the point in time when COPY UNLOGGED is executed, any subsequent operations on the data that came from the UNLOGGED operation also cannot be recovered, even if they issued WAL.
This is basically inevitable, because subsequent operations after COPY UNLOGGED depend on blocks of data loaded without WAL, which means we cannot replay the exact operations.

Therefore, all I can do is guarantee that when a recovery process ends, the target table returns to the state immediately before COPY UNLOGGED was executed. This could be achieved by issuing an invalidation WAL record, an indicator telling the server to stop WAL application for one specific table after this new type of WAL. I think I need to implement this mechanism as well for this feature. Thus, I'll take a measure against your concern about confusing data loss.

For recovery of the loaded data itself, the user of this clause, like a DBA or data warehouse administrator, would need to make a backup just after the data loading. To some developers this behavior may seem incomplete because of the heavy burden on the user.

On the other hand, I'm aware that Oracle Database has an UNRECOVERABLE clause, which is equivalent to what I'm suggesting in this thread. That no-REDO-log data loading is more convenient than what I described above, because it's supported by a tool named Recovery Manager which enables users to make incremental backups. This backs up only the blocks changed since the previous backup, removing the manual burden on the user mentioned above. Here, I have to admit that I cannot design and implement this kind of synergistic pair of features for data warehousing all at once. So I'd like to make COPY UNLOGGED the first step.

This is the URL describing how Oracle database achieves backup of no-log operations for data warehouses while keeping high-speed data loading:

https://docs.oracle.com/database/121/VLDBG/GUID-42825ED1-C4C5-449B-870F-D2C8627CBF86.htm#VLDBG1578

Best, Takamichi Osumi
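The invalidation-WAL idea in this message can be sketched in a few lines of Python (the record shapes are invented for illustration; this is not PostgreSQL's actual WAL format): after replaying the marker, recovery ignores later records for that table, so replay ends at the state the table had immediately before COPY UNLOGGED.

```python
# Toy sketch (not PostgreSQL code) of the proposed "invalidation WAL":
# once the marker for a relation is replayed, recovery stops applying
# any later WAL records that touch that relation.

def replay(wal_stream):
    state, invalidated = {}, set()
    for rec in wal_stream:
        if rec[0] == "invalidate":           # marker written by COPY UNLOGGED
            invalidated.add(rec[1])
        elif rec[0] == "insert":
            _, rel, row = rec
            if rel not in invalidated:       # skip WAL for invalidated tables
                state.setdefault(rel, []).append(row)
    return state

wal = [
    ("insert", "t1", 1),          # before COPY UNLOGGED: replayed normally
    ("invalidate", "t1"),         # COPY UNLOGGED on t1 starts here
    ("insert", "t1", 2),          # depends on unlogged blocks: not replayed
    ("insert", "t2", 9),          # other tables are unaffected
]
assert replay(wal) == {"t1": [1], "t2": [9]}
```

The sketch also makes the downstream objection concrete: everything after the marker for t1 is simply discarded, which is what Horiguchi-san argues makes the table virtually UNLOGGED anyway.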
At Thu, 20 Aug 2020 00:18:52 +0000, "osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com> wrote in > Hello. > > Apologies for the delay. > > > When the server crash occurs during data loading of COPY UNLOGGED, > > > it's a must to keep index consistent of course. > > > I'm thinking that to rebuild the indexes on the target table would work. > > > > > > In my opinion, UNLOGGED clause must be designed to guarantee that > > > where the data loaded by this clause is written starts from the end of all other > > data blocks. > > > Plus, those blocks needs to be protected by any write of other transactions > > during the copy. > > > Apart from that, the server must be aware of which block is the first > > > block, or the range about where it started or ended in preparation for the crash. > > > > > > During the crash recovery, those points are helpful to recognize and > > > detach such blocks in order to solve a situation that the loaded data is partially > > synced to the disk and the rest isn't. > > > > How do online backup and archive recovery work ? > > > > Suppose that the user executes pg_basebackup during COPY UNLOGGED running, > > the physical backup might have the portion of tuples loaded by COPY UNLOGGED > > but these data are not recovered. It might not be a problem because the operation > > is performed without WAL records. But what if an insertion happens after COPY > > UNLOGGED but before pg_stop_backup()? I think that a new tuple could be > > inserted at the end of the table, following the data loaded by COPY UNLOGGED. > > With your approach described above, the newly inserted tuple will be recovered > > during archive recovery, but it either will be removed if we replay the insertion > > WAL then truncate the table or won’t be inserted due to missing block if we > > truncate the table then replay the insertion WAL, resulting in losing the tuple > > although the user got successful of insertion. 
> I consider that from the point in time when COPY UNLOGGED is executed, > any subsequent operations to the data which comes from UNLOGGED operation > also cannot be recovered even if those issued WAL. > > This is basically inevitable because subsequent operations > after COPY UNLOGGED depend on blocks of loaded data without WAL, > which means we cannot replay exact operations. > > Therefore, all I can do is to guarantee that > when one recovery process ends, the target table returns to the state > immediately before the COPY UNLOGGED is executed. > This could be achieved by issuing and notifying the server of an invalidation WAL, > an indicator to stop WAL application toward one specific table after this new type of WAL. > I think I need to implement this mechanism as well for this feature. > Thus, I'll take a measure against your concern of confusing data loss. > > For recovery of the loaded data itself, the user of this clause, > like DBA or administrator of data warehouse for instance, > would need to make a backup just after the data loading. > For some developers, this behavior would seem incomplete because of the heavy user's burden. > > On the other hand, I'm aware of a fact that Oracle Database has a feature of UNRECOVERABLE clause, > which is equivalent to what I'm suggesting now in this thread. > > This data loading without REDO log by the clause is more convenient than what I said above, > because it's supported by a tool named Recovery Manager which enables users to make an incremental backup. > This works to back up only the changed blocks since the previous backup and > remove the manual burden from the user like above. > Here, I have to admit that I cannot design and implement > this kind of synergistic pair of all features at once for data warehousing. > So I'd like to make COPY UNLOGGED as the first step. > > This is the URL of how Oracle database for data warehouse achieves the backup of no log operation while acquiring highspeed of data loading. 
> https://docs.oracle.com/database/121/VLDBG/GUID-42825ED1-C4C5-449B-870F-D2C8627CBF86.htm#VLDBG1578

Anyway, if the target table is turned back to LOGGED, the succeeding WAL stream can be polluted by the logs on the table, so any operation relying on WAL records cannot be assumed to keep working. That is not only useless, it is harmful. I don't think we can accept intentionally emitting an inconsistent WAL stream while wal_level > minimal.

You assert that we could prevent the WAL from being replayed by the "invalidation", but that is equivalent to turning the table into UNLOGGED. Why do you insist that a table should be labeled as "LOGGED" while it is virtually UNLOGGED? That costs nothing (if the table is almost empty).

If you want to get the table back to LOGGED without emitting WAL, wal_level=minimal works. That requires restarting twice, and copying the table, though. It seems like we could skip the copying in this case, but I'm not sure.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Thu, Aug 20, 2020 at 5:49 AM osumi.takamichi@fujitsu.com <osumi.takamichi@fujitsu.com> wrote: > > Hello. > > > During the crash recovery, those points are helpful to recognize and > > > detach such blocks in order to solve a situation that the loaded data is partially > > synced to the disk and the rest isn't. > > > > How do online backup and archive recovery work ? > > > > Suppose that the user executes pg_basebackup during COPY UNLOGGED running, > > the physical backup might have the portion of tuples loaded by COPY UNLOGGED > > but these data are not recovered. It might not be a problem because the operation > > is performed without WAL records. But what if an insertion happens after COPY > > UNLOGGED but before pg_stop_backup()? I think that a new tuple could be > > inserted at the end of the table, following the data loaded by COPY UNLOGGED. > > With your approach described above, the newly inserted tuple will be recovered > > during archive recovery, but it either will be removed if we replay the insertion > > WAL then truncate the table or won’t be inserted due to missing block if we > > truncate the table then replay the insertion WAL, resulting in losing the tuple > > although the user got successful of insertion. > I consider that from the point in time when COPY UNLOGGED is executed, > any subsequent operations to the data which comes from UNLOGGED operation > also cannot be recovered even if those issued WAL. > > This is basically inevitable because subsequent operations > after COPY UNLOGGED depend on blocks of loaded data without WAL, > which means we cannot replay exact operations. > > Therefore, all I can do is to guarantee that > when one recovery process ends, the target table returns to the state > immediately before the COPY UNLOGGED is executed. > This could be achieved by issuing and notifying the server of an invalidation WAL, > an indicator to stop WAL application toward one specific table after this new type of WAL. 
I don't think we can achieve what you want with one special invalidation WAL record. Consider a case where an update happens on a page that existed before the 'Copy Unlogged' operation, and while writing that page to disk the system crashes, leaving the page half-written. Without the special WAL mechanism you are proposing, during recovery we can replay the full-page image of such a page from WAL and then perform the required update, so after recovery the page won't be torn anymore. Basically, to protect against such torn writes (half-written pages), we have a concept called full-page writes: before the first write to any page after a checkpoint, we write its full-page image to WAL, which lets us recover from such situations. With your proposed mechanism that protection won't work.

Another concern I have with this idea is that you want to keep writing WAL for such a relation but don't want to replay it in recovery, which doesn't sound like a good idea. The idea of keeping part of the table logged and the rest unlogged sounds scary to me.

Now, IIUC, you are trying to come up with these ideas because to use Alter Table .. Set Unlogged one has to rewrite the entire table, and if such a table is large, that is a very time-consuming operation. You might want to explore whether we can avoid rewriting the table for such an operation, but I don't think that is easy either. The two problems I can see immediately are (a) we have to change the BM_PERMANENT marking of existing buffers of such a relation, which again can be time-consuming, especially for a large value of shared_buffers, and (b) we have to create an _init fork of such a relation in sync with the commit. You might want to read the archives to see why we decided in the first place to rewrite the table for the SET UNLOGGED operation; see email [1].
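The torn-page argument above can be made concrete with a toy model (again Python for illustration only, not PostgreSQL code): a page is modeled as two halves, a crash mid-write leaves one half new and one half old, and the full-page image in WAL is what repairs the tear, unless replay for that relation is suppressed.

```python
# Toy model (not PostgreSQL code) of torn-page protection via full-page
# images, and why suppressing replay for a relation defeats it.

def crash_mid_write(old_page, new_page):
    """Simulate a crash while writing a page: first half new, second old."""
    return (new_page[0], old_page[1])  # a "torn" page on disk

def recover_page(on_disk, fpi, later_updates, replay_suppressed):
    """Recovery: restore the full-page image, then apply later updates.

    If replay is suppressed for this relation (the proposed 'invalidation'
    record), the torn on-disk page is left as-is.
    """
    if replay_suppressed:
        return on_disk              # the torn page survives recovery
    page = fpi                      # the full-page image fixes the tear
    for upd in later_updates:
        page = upd(page)            # then redo the update on a clean page
    return page

old = ("old-half-A", "old-half-B")
new = ("new-half-A", "new-half-B")          # the update we were writing
disk = crash_mid_write(old, new)            # crash -> torn page on disk
redo_update = lambda p: ("new-half-A", "new-half-B")

# Normal recovery: FPI + redo yields the fully updated, consistent page.
assert recover_page(disk, old, [redo_update], replay_suppressed=False) == new
# Suppressed replay: the half-written page is left torn.
assert recover_page(disk, old, [redo_update], replay_suppressed=True) \
       == ("new-half-A", "old-half-B")
```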
[1] - https://www.postgresql.org/message-id/CAFcNs%2Bpeg3VPG2%3Dv6Lu3vfCDP8mt7cs6-RMMXxjxWNLREgSRVQ%40mail.gmail.com

--
With Regards,
Amit Kapila.
Hello,

While I think it's worth considering a sophisticated feature like Oracle's UNRECOVERABLE data loading (SQL Server's BCP load utility also has such a feature, though only for an empty table), how about an easier approach like MySQL's? I expect this won't complicate the Postgres code much.

The customer is using Oracle RAC for high availability of a data warehouse. So I think they can use traditional shared-disk-based HA clustering, not streaming replication, when they migrate to Postgres.

They load data into the data warehouse with a nightly ETL or ELT. The loading window is limited, so they run multiple concurrent loading sessions with transaction logging turned off. They probably devote all resources to the data loading during that period.

Then you might think, "How about turning fsync and full_page_writes off?" But the customer doesn't like to be worried about the massive amount of WAL generated during the loading.

OTOH, the latest MySQL 8.0.21 introduced the following feature, although it is intended for the initial data loading into a new database instance.

https://dev.mysql.com/doc/refman/8.0/en/innodb-redo-log.html#innodb-disable-redo-logging

--------------------------------------------------
Disabling Redo Logging

As of MySQL 8.0.21, you can disable redo logging using the ALTER INSTANCE DISABLE INNODB REDO_LOG statement. This functionality is intended for loading data into a new MySQL instance. Disabling redo logging speeds up data loading by avoiding redo log writes and doublewrite buffering.

Warning

This feature is intended only for loading data into a new MySQL instance. Do not disable redo logging on a production system. It is permitted to shut down and restart the server while redo logging is disabled, but an unexpected server stoppage while redo logging is disabled can cause data loss and instance corruption.
Attempting to restart the server after an unexpected server stoppage while redo logging is disabled is refused with the following error:

[ERROR] [MY-013578] [InnoDB] Server was killed when Innodb Redo logging was disabled. Data files could be corrupt. You can try to restart the database with innodb_force_recovery=6

In this case, initialize a new MySQL instance and start the data loading procedure again.
--------------------------------------------------

Following this idea, what do you think about adding a new value "none" to wal_level, where no WAL is generated? The setting of wal_level is recorded in pg_control, so the startup process can see the value and reject recovery after an abnormal shutdown, emitting a message similar to MySQL's.

Just a quick idea. I hope no devil will appear in the details.

Regards
Takayuki Tsunakawa
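The startup-time check proposed above could be sketched roughly as follows. This is a toy model, not actual PostgreSQL code; the constant names merely allude to the pg_control state codes (DB_SHUTDOWNED, DB_IN_PRODUCTION) and the decision rule is my reading of the proposal.

```python
# Sketch (not actual PostgreSQL code) of the proposed startup check:
# pg_control records the wal_level of the last run; if it was "none"
# and the shutdown was not clean, recovery is refused, as MySQL does.

CLEAN_SHUTDOWN = "shutdowned"      # analogous to DB_SHUTDOWNED in pg_control
IN_PRODUCTION  = "in production"   # analogous to DB_IN_PRODUCTION

def startup_allowed(control_wal_level, control_state):
    """Return (ok, message) for the startup process's decision."""
    if control_wal_level == "none" and control_state != CLEAN_SHUTDOWN:
        return (False,
                "server was killed while WAL logging was disabled; "
                "data files could be corrupt, re-initialize and reload")
    return (True, "ok")

assert startup_allowed("replica", IN_PRODUCTION)[0]   # normal crash recovery
assert startup_allowed("none", CLEAN_SHUTDOWN)[0]     # a clean stop is fine
assert not startup_allowed("none", IN_PRODUCTION)[0]  # refuse after a crash
```

The point of the rule is that with wal_level = "none" there is simply nothing to replay, so a crash can only be answered by refusing to start, exactly as the quoted MySQL error message does.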
On Wed, Aug 26, 2020 at 12:54 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> Following this idea, what do you think about adding a new value "none" to wal_level, where
> no WAL is generated? The setting of wal_level is recorded in pg_control. The startup process
> can see the value and reject recovery after an abnormal shutdown, emitting a message similar
> to MySQL's.
>

So you want your users to shut down and restart the server before Copy, because that would be required to change wal_level. However, even if we do that, users who were previously running the server with wal_level = 'replica' won't be happy after this change. If they change wal_level to 'none' for certain operations like bulk load and then change the mode back to 'replica', they need to back up the database again to set up 'replica', because they can't continue replication from the previous point (consider an update on a page for which no WAL was previously written).

--
With Regards,
Amit Kapila.
From: Amit Kapila <amit.kapila16@gmail.com>
> So you want your users to shutdown and restart the server before Copy
> because that would be required if you want to change the wal_level.

Yes. They seem to be fine with it, as far as I heard from a person who is involved in the system design.

> However, even if we do that, users who are running the server
> previously with wal_level as 'replica' won't be happy after doing this
> change. Because if they change the wal_level to 'none' for certain
> operations like bulk load and then again change back the mode to
> 'replica' they need to back up the database again to setup 'replica'
> as they can't continue replication from the previous point (consider
> update on a page for which previously WAL was not written).

Yes, it requires a database backup. The database backup should be a daily task anyway, so I expect it wouldn't impose an extra maintenance burden on the user. Plus, not all users use streaming replication for HA. I think it's helpful for the maturing Postgres to provide some kind of solution in such contexts so that users can scratch their itches.

Regards
Takayuki Tsunakawa
On Thu, Aug 27, 2020 at 7:04 AM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote:
>
> From: Amit Kapila <amit.kapila16@gmail.com>
> > So you want your users to shutdown and restart the server before Copy
> > because that would be required if you want to change the wal_level.
>
> Yes. They seem to be fine with it, as far as I heard from a person who is involved in the
> system design.
>
> > However, even if we do that, users who are running the server
> > previously with wal_level as 'replica' won't be happy after doing this
> > change. Because if they change the wal_level to 'none' for certain
> > operations like bulk load and then again change back the mode to
> > 'replica' they need to back up the database again to setup 'replica'
> > as they can't continue replication from the previous point (consider
> > update on a page for which previously WAL was not written).
>
> Yes, it requires the database backup. The database backup should be a daily task anyway,
> so I expect it wouldn't impose extra maintenance burden on the user.
>

Sure, but on a daily basis one requires only incremental WAL to complete the backup, whereas in this case it would require an entire database backup unless we have some form of block-level incremental backup method. OTOH, I don't deny that there is some use case for wal_level = 'none' for initial data loading, as MySQL provides, but I think that is a separate feature from what is proposed here (Copy Unlogged). It might be better if you start a separate thread for that with some more details on the implementation side as well.

--
With Regards,
Amit Kapila.
From: Amit Kapila <amit.kapila16@gmail.com>
> Sure, but on a daily basis, one requires only incremental WAL to
> complete the backup but in this case, it would require the entire
> database back up unless we have some form of block-level incremental
> backup method.

Regarding the backup time, I think users can shorten it by using the storage device's snapshotting (or split mirroring?) or the filesystem's snapshot feature.

> OTOH, I don't deny that there is some use case with
> wal_level = 'none' for initial data loading as MySQL provides but I
> think that is a separate feature than what is proposed here (Copy
> Unlogged). It might be better if you can start a separate thread for
> that with some more details on the implementation side as well.

Yeah, the feature doesn't match the title of this thread and could confuse other readers who join later. Nevertheless, I think MySQL's feature could be used for additional data loading as well, if the user understands what he/she is doing. So we can discuss it in another thread in case the ongoing discussion gets stuck.

Regards
Takayuki Tsunakawa
Hi. I expect I have some basic misunderstanding, because IMO this thread now seems to have come full circle.

Earlier, Osumi-san was rejecting the idea of using ALTER TABLE tbl SET UNLOGGED on the basis that it is too time-consuming for large data to switch the table modes [1].

Now the latest idea is to introduce wal_level=none. But now apparently full daily backups are OK, and restarting the server daily before the copies is also OK [2].

~

Doesn't wal_level=none essentially just behave as if every table was UNLOGGED; not just the ones we are loading?

Doesn't wal_level=none come with all the same limitations/requirements (full daily backups/restarts etc) that the UNLOGGED TABLE would also have?

So I don't recognise the difference?

If wal_level=none is judged OK as a fast loading solution, then why wasn't an initially UNLOGGED table also judged OK by the same criteria? And if there is no real difference, then why is it necessary to introduce wal_level=none (instead of using the existing UNLOGGED feature) in the first place?

Or, if all this problem is simply due to a quirk that the BI tool referred to does not support the CREATE UNLOGGED TABLE syntax [3], then surely some other workaround could be written to handle that.

What part am I missing?

--
[1] - https://www.postgresql.org/message-id/OSBPR01MB48884832932F93DAA953CEB9ED650%40OSBPR01MB4888.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/TYAPR01MB299005FC543C43348A4993FDFE550%40TYAPR01MB2990.jpnprd01.prod.outlook.com
[3] - https://www.postgresql.org/message-id/OSBPR01MB4888CBD08DDF73721C18D2C0ED790%40OSBPR01MB4888.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia
From: Peter Smith <smithpb2250@gmail.com>
> Earlier, Osumi-san was rejecting the idea of using ALTER TABLE tbl SET
> UNLOGGED on the basis that it is too time-consuming for large data to
> switch the table modes [1].
> Doesn't wal_level=none essentially just behave as if every table was
> UNLOGGED; not just the ones we are loading?
>
> Doesn't wal_level=none come with all the same limitations/requirements
> (full daily backups/restarts etc) that the UNLOGGED TABLE would also
> have?

ALTER TABLE takes a long time proportional to the amount of existing data, while wal_level = none doesn't.

Regards
Takayuki Tsunakawa
On Thu, Sep 10, 2020 at 7:16 PM tsunakawa.takay@fujitsu.com <tsunakawa.takay@fujitsu.com> wrote: > ALTER TABLE takes long time proportional to the amount of existing data, while wal_level = none doesn't. Right, but if wal_level=none is considered OK for that table with existing data, then why not just create the table UNLOGGED in the first place? (or ALTER it to set UNLOGGED just one time and then leave it as UNLOGGED). Kind Regards, Peter Smith. Fujitsu Australia
From: Peter Smith <smithpb2250@gmail.com>
On Thu, Sep 10, 2020 at 7:16 PM tsunakawa.takay@fujitsu.com
> <tsunakawa.takay@fujitsu.com> wrote:
> > ALTER TABLE takes a long time proportional to the amount of existing data,
> > while wal_level = none doesn't.
>
> Right, but if wal_level=none is considered OK for that table with
> existing data, then why not just create the table UNLOGGED in the
> first place? (or ALTER it to set UNLOGGED just one time and then leave
> it as UNLOGGED).

The target tables sometimes receive updates (for data maintenance and/or correction). They don't want those updates to be lost due to a database server crash. Unlogged tables lose their entire contents during crash recovery.

Please think of it like this: logging is the norm, and unlogged operations are exceptions/hacks for some requirement, of which the user wants to minimize the use.

Regards
Takayuki Tsunakawa
At Fri, 11 Sep 2020 05:15:32 +0000, "tsunakawa.takay@fujitsu.com" <tsunakawa.takay@fujitsu.com> wrote in
> From: Peter Smith <smithpb2250@gmail.com>
> On Thu, Sep 10, 2020 at 7:16 PM tsunakawa.takay@fujitsu.com
> > <tsunakawa.takay@fujitsu.com> wrote:
> > > ALTER TABLE takes a long time proportional to the amount of existing data,
> > > while wal_level = none doesn't.
> >
> > Right, but if wal_level=none is considered OK for that table with
> > existing data, then why not just create the table UNLOGGED in the
> > first place? (or ALTER it to set UNLOGGED just one time and then leave
> > it as UNLOGGED).
>
> The target tables sometimes receive updates (for data maintenance and/or correction). They
> don't want those updates to be lost due to a database server crash. Unlogged tables lose
> their entire contents during crash recovery.
>
> Please think of it like this: logging is the norm, and unlogged operations are
> exceptions/hacks for some requirement, of which the user wants to minimize the use.

I suspect that wal_level=none is a bit too toxic.

"ALTER TABLE SET UNLOGGED" doesn't dump a large amount of WAL, so I don't think it can be a problem. "ALTER TABLE SET LOGGED" also doesn't issue WAL while wal_level=minimal, but runs a table copy. I think the only problem with the UNLOGGED table method is that table copy.

If we can skip the table copy when ALTER TABLE SET LOGGED is run under wal_level=minimal, is your objective achieved?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
At Fri, 11 Sep 2020 17:36:19 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> I suspect that wal_level=none is a bit too toxic.
>
> "ALTER TABLE SET UNLOGGED" doesn't dump a large amount of WAL so I don't
> think it can be a problem. "ALTER TABLE SET LOGGED" also doesn't issue

(Oops! this runs a table copy)

> WAL while wal_level=minimal but runs a table copy. I think the only
> problem with the UNLOGGED table method is that table copy.
>
> If we can skip the table copy when ALTER TABLE SET LOGGED is run under
> wal_level=minimal, is your objective achieved?

--
Kyotaro Horiguchi
NTT Open Source Software Center
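The copy-skipping idea above boils down to a small decision rule, sketched here as a toy model. To be clear, this is hypothetical, not current PostgreSQL behavior: today SET LOGGED rewrites the table regardless of wal_level; the proposal is that under wal_level=minimal an fsync of the relation files could stand in for the rewrite.

```python
# Sketch (hypothetical, not current PostgreSQL behavior) of the proposed
# rule: ALTER TABLE ... SET LOGGED normally rewrites the table so its
# contents get WAL-logged, but under wal_level=minimal the contents never
# need to reach a standby or the archive, so syncing the relation files
# to disk would be enough.

def set_logged_needs_rewrite(wal_level):
    """Would SET LOGGED need a table rewrite under the proposed rule?"""
    # With minimal WAL, crash safety can be had by fsyncing the relation
    # at commit instead of copying it with WAL (the proposal's claim).
    return wal_level != "minimal"

assert set_logged_needs_rewrite("replica")      # contents must be WAL-logged
assert set_logged_needs_rewrite("logical")
assert not set_logged_needs_rewrite("minimal")  # proposed: fsync, no copy
```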
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> If we can skip the table copy when ALTER TABLE SET LOGGED is run under
> wal_level=minimal, is your objective achieved?

I expect so, if we can skip the table copy during ALTER TABLE SET LOGGED/UNLOGGED. On the other hand, the two approaches have different pros and cons. It's nice that ALTER TABLE doesn't require a database restart, but the user has to specify the tables; wal_level = none is vice versa. Anyway, wal_level = none would be useful for initial data loading after creating a new database cluster, as MySQL suggests.

Regards
Takayuki Tsunakawa