RE: Implement UNLOGGED clause for COPY FROM - Mailing list pgsql-hackers
From | osumi.takamichi@fujitsu.com |
---|---|
Subject | RE: Implement UNLOGGED clause for COPY FROM |
Date | |
Msg-id | OSBPR01MB488867D6F32664057CD59DC6ED7C0@OSBPR01MB4888.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Implement UNLOGGED clause for COPY FROM (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Implement UNLOGGED clause for COPY FROM
Re: Implement UNLOGGED clause for COPY FROM |
List | pgsql-hackers |
Hi, > AFAICS, we can already accomplish basically the same thing as what you want to > do like this: > > alter table foo set unlogged; > copy foo from ...; > alter table foo set logged; This didn't satisfy what I wanted. In case that 'foo' has huge amount of rows at the beginning, this example would spend much time to copy the contents of 'foo' twice to swap relfilenodes atomically. When that loaded data by COPY is big too, its execution time becomes much longer. > You keep on ignoring the indexes... not to mention replication. Sorry for having made you think like this. When the server crash occurs during data loading of COPY UNLOGGED, it's a must to keep index consistent of course. I'm thinking that to rebuild the indexes on the target table would work. In my opinion, UNLOGGED clause must be designed to guarantee that where the data loaded by this clause is written starts from the end of all other data blocks. Plus, those blocks needs to be protected by any write of other transactions during the copy. Apart from that, the server must be aware of which block is the first block, or the range about where it started or ended in preparation for the crash. During the crash recovery, those points are helpful to recognize and detach such blocks in order to solve a situation that the loaded data is partially synced to the disk and the rest isn't. In terms of index, we can recreate index based on the relation's data protected by this mechanism above. Another idea of index crash recovery was to copy the indexes on the target table as a backup just before loading and write new added indexes from loaded data into this temporary index files in order to localize the new indexes. But, my purpose is to accelerate speed of data loading under the condition that target table has huge amount of data initially. Taking this purpose into an evaluation criterion, the initial copy of indexes would make the execution slow down. Thus, I choose rebuilding index. Another point I need to add for recovery would be how the startup postgres knows the condition of COPY UNLOGGED clause. My current idea is to utilize any other system file or create a new system file in the cluster for this clause. At least, it would become necessary for postgres to identify which blocks should be detached at the beginning when the command is executed. Therefore, we need add information for it. Lastly, I have to admit that the status of target table where data is loaded by COPY UNLOGGED would be marked as invalid and notified to standbys under replication environment from the point in time when the operation takes place. But, I'm going to allow users with special privileges (like DBA) to use this clause and this kind of tables would be judged by them not to replicate. Of course, I'm thinking better idea but now what I can say is like this for replication. Best, Takamichi Osumi
pgsql-hackers by date: