Thread: alter table xxx set unlogged take long time
Hi ,
We have PG v13.6 on RHEL 8.4. We try to set a table to unlogged before loading data. There is a lot of existing data in this table, and when we run 'alter table xxx set unlogged' we found it takes a long time, with the time spent on DataFileRead IO waits. Is that expected?
Thanks,
James
How to make it fast? These are our steps for copying large data from Oracle to Postgres:
1. Create table in Postgres
2. Extract data from Oracle to CSV
3. Alter table xxx set unlogged
4. Run COPY command into Postgres DB
5. Alter table xxx set logged
6. Create index …
Step 5 took a long time, especially for large tables.
Thanks,
James
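The six steps above can be sketched as SQL (table, column, and file names here are placeholders, not from the thread):

```sql
-- 1. Create the table in Postgres (definition translated from Oracle)
CREATE TABLE xxx (id bigint, payload text);

-- 2. Extract data from Oracle to CSV (done outside Postgres, e.g. via JDBC)

-- 3. Switch the table to unlogged before the load
ALTER TABLE xxx SET UNLOGGED;

-- 4. Bulk-load the CSV
COPY xxx FROM '/tmp/xxx.csv' WITH (FORMAT csv);

-- 5. Switch back to logged: this rewrites the whole table into WAL,
--    which is the slow step discussed in this thread
ALTER TABLE xxx SET LOGGED;

-- 6. Create indexes after the data is in place
CREATE INDEX xxx_id_idx ON xxx (id);
```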
From: Jim Mlodgenski <jimmy76@gmail.com>
Sent: Tuesday, July 26, 2022 8:21 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>
Cc: pgsql-performance@lists.postgresql.org
Subject: Re: alter table xxx set unlogged take long time
On Tue, Jul 26, 2022 at 4:53 AM James Pang (chaolpan) <chaolpan@cisco.com> wrote:
Hi ,
We have PG v13.6 in RHEL8.4, we try to set table unlogged before load data. There are a lot of existing data in this table, when ‘alter table xxx set unlogged’, we found it take long time and spend time on IO datafileread. Is it expected?
Yes, the whole table needs to be written to WAL so this could take a long time for a large table
"James Pang (chaolpan)" <chaolpan@cisco.com> writes: > How to make it fast ? These are our steps about copy large data from Oracle to Postgres > 1. Create table in Postgres 2. Extract data from Oracle to CSV 3. Alter table set xxx unlogged, 4. Run copy commandinto Postgres db 5. Alter table set xxx logged 6. Create index … The easy answer is to skip steps 3 and 5. regards, tom lane
Without step 3, COPY takes a long time. Can wal_level=minimal help COPY load data without WAL logging?
Thanks,
James
-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Tuesday, July 26, 2022 8:43 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>
Cc: Jim Mlodgenski <jimmy76@gmail.com>; pgsql-performance@lists.postgresql.org
Subject: Re: alter table xxx set unlogged take long time
"James Pang (chaolpan)" <chaolpan@cisco.com> writes:
> How to make it fast ? These are our steps about copy large data from Oracle to Postgres
> 1. Create table in Postgres 2. Extract data from Oracle to CSV 3. Alter table set xxx unlogged, 4. Run copy command into Postgres db 5. Alter table set xxx logged 6. Create index …
The easy answer is to skip steps 3 and 5.
regards, tom lane
We use JDBC to export data into CSV, then COPY that into Postgres, with multiple sessions working on multiple tables. If we do not set unlogged, how can we make COPY run fast? Is it possible to start a transaction that includes all of these: "truncate table xxx; copy table xxxx; create index on tables…."? With wal_level=minimal, is it OK to run COPY and CREATE INDEX without logging?
James
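For reference, the WAL-skipping optimization James is asking about requires wal_level = minimal *and* that the table be created or truncated in the same transaction as the COPY (per the PostgreSQL bulk-loading documentation). A minimal sketch, with placeholder names:

```sql
-- Assumes wal_level = minimal (a server restart is required to change it).
BEGIN;
TRUNCATE xxx;                                    -- table becomes "new" in this transaction
COPY xxx FROM '/tmp/xxx.csv' WITH (FORMAT csv);  -- can skip WAL under wal_level = minimal
CREATE INDEX xxx_id_idx ON xxx (id);             -- CREATE INDEX can also skip WAL at minimal
COMMIT;                                          -- relation files are fsync'd instead of WAL-logged
```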
From: Jim Mlodgenski <jimmy76@gmail.com>
Sent: Tuesday, July 26, 2022 8:53 PM
To: James Pang (chaolpan) <chaolpan@cisco.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-performance@lists.postgresql.org
Subject: Re: alter table xxx set unlogged take long time
On Tue, Jul 26, 2022 at 8:45 AM James Pang (chaolpan) <chaolpan@cisco.com> wrote:
Without step 3, COPY takes a long time. Can wal_level=minimal help COPY load data without WAL logging?
I assume that you're most concerned with the total time of moving the data from the source database into the final table, so you might get a big win by not moving the data twice: load the table directly through a Foreign Data Wrapper and avoid the CSV export/import. Something like oracle_fdw might help here.
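A sketch of what an oracle_fdw setup could look like; the server name, connect string, credentials, and column list here are all hypothetical:

```sql
CREATE EXTENSION oracle_fdw;

CREATE SERVER oradb FOREIGN DATA WRAPPER oracle_fdw
  OPTIONS (dbserver '//orahost:1521/ORCL');

CREATE USER MAPPING FOR CURRENT_USER SERVER oradb
  OPTIONS (user 'scott', password 'tiger');

-- The column list must mirror the Oracle table's definition
CREATE FOREIGN TABLE ora_xxx (id bigint, payload text)
  SERVER oradb OPTIONS (schema 'SCOTT', table 'XXX');

-- Load directly, skipping the CSV round trip
INSERT INTO xxx SELECT * FROM ora_xxx;
```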
On 7/26/22 08:59, James Pang (chaolpan) wrote:
> We use JDBC to export data into csv ,then copy that to Postgres.
> Multiple sessions working on multiple tables. If not set unlogged , how
> to make COPY run fast ? possible to start a transaction include all of
> these "truncate table xxx; copy table xxxx; create index on tables…."
> With wal_level=minimal, is it ok to make copy and create index without
> logging ?

Not sure if it would work for you, but perhaps a usable strategy would be to partition the existing large table on something (e.g. a new column like batch number?).

Then (completely untested) I *think* you could create the "partition" initially as a free standing unlogged table, load it, index it, switch to logged, and then attach it to the partitioned table.

Perhaps you could also have a background job that periodically aggregates the batch partitions into larger buckets to minimize the overall number of partitions.

-- 
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
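Joe's (explicitly untested) suggestion might look roughly like this, with hypothetical table and column names:

```sql
-- Partitioned target, keyed by a batch number
CREATE TABLE events (batch int NOT NULL, payload text)
    PARTITION BY LIST (batch);

-- Free-standing unlogged table for one batch
CREATE UNLOGGED TABLE events_b1 (LIKE events INCLUDING ALL);

COPY events_b1 FROM '/tmp/batch1.csv' WITH (FORMAT csv);
CREATE INDEX ON events_b1 (payload);

-- Write the batch to WAL once, then attach it
ALTER TABLE events_b1 SET LOGGED;
ALTER TABLE events ATTACH PARTITION events_b1 FOR VALUES IN (1);
```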
Joe Conway <mail@joeconway.com> writes:
> Then (completely untested) I *think* you could create the "partition"
> initially as a free standing unlogged table, load it, index it, switch
> to logged, and then attach it to the partitioned table.

I'm still of the opinion that this plan to load the data unlogged and switch to logged later is a loser. Sooner or later you have got to write the data to WAL, and this approach doesn't eliminate that cost. What it does do is create one whole extra cycle of writing the data to disk and reading it back. I don't think it's an oversight that no such thing is suggested in our standard tips for bulk-loading data:

https://www.postgresql.org/docs/current/populate.html

What perhaps *is* an oversight is that we don't suggest use of COPY FREEZE there. AFAIK that doesn't reduce the initial data loading cost directly, but it would save overhead later.

			regards, tom lane
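COPY's FREEZE option, per the COPY documentation, is only allowed when the table was created or truncated in the current (sub)transaction; a minimal sketch with a placeholder table name:

```sql
BEGIN;
TRUNCATE xxx;   -- FREEZE requires the table to have been created or
                -- truncated in the current subtransaction
COPY xxx FROM '/tmp/xxx.csv' WITH (FORMAT csv, FREEZE);
COMMIT;
-- Rows are loaded already frozen, saving later vacuum/visibility
-- overhead, as Tom notes above.
```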
On 7/27/22 10:46, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> Then (completely untested) I *think* you could create the "partition"
>> initially as a free standing unlogged table, load it, index it, switch
>> to logged, and then attach it to the partitioned table.
>
> I'm still of the opinion that this plan to load the data unlogged
> and switch to logged later is a loser. Sooner or later you have
> got to write the data to WAL, and this approach doesn't eliminate
> that cost. What it does do is create one whole extra cycle of
> writing the data to disk and reading it back. I don't think
> it's an oversight that no such thing is suggested in our standard
> tips for bulk-loading data:

Yeah, agreed. I was mostly responding to the OP desire to use unlogged and not taking a stance on that.

> https://www.postgresql.org/docs/current/populate.html
>
> What perhaps *is* an oversight is that we don't suggest
> use of COPY FREEZE there. AFAIK that doesn't reduce the initial
> data loading cost directly, but it would save overhead later.

Oh, yes, very good point.

-- 
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
At Tue, 26 Jul 2022 12:41:07 +0000, "James Pang (chaolpan)" <chaolpan@cisco.com> wrote in
> How to make it fast ? These are our steps about copy large data from Oracle to Postgres
> 1. Create table in Postgres 2. Extract data from Oracle to CSV 3. Alter table set xxx unlogged, 4. Run copy command into Postgres db 5. Alter table set xxx logged 6. Create index …
> Step 5 took long time ,especially for large tables.

As others pointed out, step 5 inevitably requires WAL emission. On the other hand, there is a proposed patch [1]. It lets ALTER TABLE SET LOGGED/UNLOGGED avoid copying the whole target table and could reduce the amount of WAL emitted (thanks to the difference between tuple-based WAL and per-page WAL) in most cases. Could you try it and see whether it works for you to any extent?

regards.

[1] https://commitfest.postgresql.org/38/3461/

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
Does "wal_level=minimal" help reduce WAL emission a lot for COPY and CREATE INDEX? We plan to remove the "set unlogged/logged" steps; instead, just set "wal_level=minimal", then COPY data in parallel, then create indexes.

Thanks,
James

-----Original Message-----
From: Joe Conway <mail@joeconway.com>
Sent: Wednesday, July 27, 2022 11:02 PM
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: James Pang (chaolpan) <chaolpan@cisco.com>; Jim Mlodgenski <jimmy76@gmail.com>; pgsql-performance@lists.postgresql.org
Subject: Re: alter table xxx set unlogged take long time
On 7/28/22 03:47, James Pang (chaolpan) wrote:
> Does "wal_level=minimal" help reducing wal emitting a lot for COPY
> and CREATE INDEX? We plan to remove "set unlogged/log" , instead ,
> just set "wal_level=minimal" ,then COPY data in parallel, then create
> index.

(Note - please don't top post on these lists)

Yes, wal_level = minimal is a big help in my experience if you can tolerate it. Similarly synchronous_commit = off might help as long as you are prepared to reload some data in the event of a crash (which generally is true when bulk loading). As noted in the docs:

  This parameter can be changed at any time; the behavior for any one
  transaction is determined by the setting in effect when it commits. It
  is therefore possible, and useful, to have some transactions commit
  synchronously and others asynchronously.

-- 
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
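Pulled together, the settings discussed in this thread would look like the following postgresql.conf fragment; note that changing wal_level requires a restart, and wal_level = minimal is incompatible with replication, so max_wal_senders must be 0:

```
# postgresql.conf fragment for bulk loading
wal_level = minimal          # restart required; no WAL archiving or replication
max_wal_senders = 0          # must be 0 when wal_level = minimal
synchronous_commit = off     # per Joe: fine if you can reload data after a crash
```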