Thread: Dirty buffers with suppress_redundant_updates_trigger
Hi,

I am currently working on a data sync solution, where data is synced from
external systems to Postgres. As I do not know what data changed in the
source data store, I have opted for the following flow:

- Load all data into a temp table
- Update existing rows in the main table based on the contents of the temp table
- Insert new rows into the main table
- Delete obsolete rows from the main table

Obviously, I want to minimize WAL logging and data writes. Therefore I have
added suppress_redundant_updates_trigger to the main table.

Nevertheless, I am still seeing pages being dirtied and written to the WAL,
even when no rows changed. Can anyone explain why this is happening? And
does anyone know of better ways to minimize writes in such a 'data
load/refresh' scenario, where the data is the same in 99.9% of the cases?

Small test case on Postgres 12.4 below; as you can see it dirties 8334 blocks:

testdatabase=# create table testtable (id int not null, name varchar(32), primary key(id));
CREATE TABLE
testdatabase=# create temp table testtable_temp (id int not null, name varchar(32), primary key(id));
CREATE TABLE
testdatabase=# insert into testtable (id, name) select i, md5(i::text) from generate_series(1,1000000) s(i);
INSERT 0 1000000
testdatabase=# insert into testtable_temp select * from testtable;
INSERT 0 1000000
testdatabase=# create trigger z_min_update before update on testtable for each row execute procedure suppress_redundant_updates_trigger();
CREATE TRIGGER
testdatabase=# checkpoint;
CHECKPOINT
testdatabase=# explain (analyze, buffers) update testtable as d set name = s.name from testtable_temp as s where d.id = s.id;

                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Update on testtable d  (cost=30315.56..69708.58 rows=591714 width=98) (actual time=4168.901..4168.903 rows=0 loops=1)
   Buffers: shared hit=1008337 dirtied=8334, local read=8334 dirtied=8334 written=8331, temp read=11144 written=11144
   ->  Hash Join  (cost=30315.56..69708.58 rows=591714 width=98) (actual time=486.771..1429.637 rows=1000000 loops=1)
         Hash Cond: (d.id = s.id)
         Buffers: shared hit=8337, local read=8334 dirtied=8334 written=8331, temp read=11144 written=11144
         ->  Seq Scan on testtable d  (cost=0.00..18334.00 rows=1000000 width=10) (actual time=0.026..157.729 rows=1000000 loops=1)
               Buffers: shared hit=8334
         ->  Hash  (cost=14251.14..14251.14 rows=591714 width=92) (actual time=486.128..486.129 rows=1000000 loops=1)
               Buckets: 32768  Batches: 32  Memory Usage: 2583kB
               Buffers: local read=8334 dirtied=8334 written=8331, temp written=7549
               ->  Seq Scan on testtable_temp s  (cost=0.00..14251.14 rows=591714 width=92) (actual time=0.035..226.642 rows=1000000 loops=1)
                     Buffers: local read=8334 dirtied=8334 written=8331
 Planning Time: 0.429 ms
 Trigger z_min_update: time=57.069 calls=1000000
 Execution Time: 4174.785 ms
(15 rows)

Thanks,
Mike
On Sun, 2020-09-13 at 13:49 +0200, Mike Noordermeer wrote:
> Nevertheless, I am still seeing pages being dirtied and written to the
> WAL, even if no rows changed. Is there anyone that could explain why
> this is happening? And does anyone know of better ways to minimize
> writes in such a 'data load/refresh' scenario, where data is the same
> in 99.9% of the cases?

[full test case trimmed]

These are probably the "hint bits" set on newly committed rows by the
first reader. Note that the blocks are dirtied during the sequential
scan, not during the update.

You could try VACUUMing the tables before the update (which will set
hint bits) and see if you still get the dirtied blocks.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
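For anyone wanting to try this suggestion on the test case from earlier in the thread, the check would look something like the sketch below (table names taken from Mike's example; whether the dirtied-block counts drop is exactly what is being tested here):

```sql
-- Sketch: vacuum both tables so hint bits are already set, then
-- re-run the EXPLAIN and compare the "dirtied" buffer counts.
VACUUM testtable;
VACUUM testtable_temp;

EXPLAIN (ANALYZE, BUFFERS)
UPDATE testtable AS d
SET    name = s.name
FROM   testtable_temp AS s
WHERE  d.id = s.id;
```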
On Mon, 14 Sep 2020 at 06:03, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> These are probably the "hint bits" set on newly committed rows by the first reader.
> Note that the blocks are dirtied during the sequential scan, not during the update.
>
> You could try VACUUMing the tables before the update (which will set hint bits)
> and see if you still get the dirtied blocks.

Thanks. I thought about that as well, but unfortunately VACUUMing does
not seem to solve this. Even a full VACUUM does not prevent the dirty
blocks, and even very old, never-updated but often-read tables see this
behaviour.

In my sample, the plain update with suppress_redundant_updates_trigger()
does cause the dirty blocks, but if I do an update that compares the
fields, like this:

update testtable as d set name = s.name
from testtable_temp as s
where d.id = s.id and d.name <> s.name;

then no blocks are dirtied. So it seems
suppress_redundant_updates_trigger() does not entirely avoid writing
_something_ to the blocks, and I don't know what it is or how to avoid
it. I would of course like to avoid having to specify every column in
the WHERE clause.

Kind regards,
Mike

(sorry Laurenz, forgot to include the list in my initial reply)
On Mon, 14 Sep 2020 at 06:32, Mike Noordermeer <mike@normi.net> wrote:
> So it seems suppress_redundant_updates_trigger() does not entirely
> avoid writing _something_ to the blocks, and I don't know what it is
> and how to avoid it.

Looking at the pg_waldump output, it seems to write something concerning
locking to the WAL; I am seeing lots of the following records:

rmgr: Heap        len (rec/tot):     54/    54, tx:       6747, lsn: B/BC7785F0, prev B/BC7785B8, desc: LOCK off 36: xid 6747: flags 0x00 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13408/45636 blk 8332

Still trying to figure out what this means...

Kind regards,
Mike
So it seems that when BEFORE ROW triggers are handled, a row-level lock
(the same lock SELECT ... FOR UPDATE takes) is acquired on the row
before the triggers are run. This causes a write to the heap, as
row-level locks are stored on-heap, in the tuple header. The unfortunate
effect is that suppress_redundant_updates_trigger() cannot prevent all
writes to the heap: even when the trigger skips the update, the lock has
already dirtied the page.

I do not yet understand why these locks are taken in this case, and not
when I compare the fields in the UPDATE WHERE clause, but that may have
something to do with transactional guarantees, or some other tradeoff
made during development.

If anyone has a brilliant idea on how to improve this situation, please
let me know. Otherwise I guess I will have to resort to comparing all
data values, either manually or in the WHERE clause, to prevent the data
loads from flooding the WAL.

Kind regards,
Mike
For future reference: in the end I just added a WHERE clause comparing
all fields with IS DISTINCT FROM. This seems to work well: it prevents
the extra locks and WAL generation, and is about as fast as the previous
solution with suppress_redundant_updates_trigger().

Kind regards,
Mike
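Applied to the test case from earlier in the thread, the final statement would look roughly like this (a sketch; testtable has only one non-key column, so the comparison is a single predicate):

```sql
-- Only touch rows whose payload actually differs. IS DISTINCT FROM is
-- NULL-safe, unlike <>, so rows where either side is NULL still compare
-- correctly. Unchanged rows never reach the executor's update path, so
-- no row locks are taken and no WAL is generated for them.
UPDATE testtable AS d
SET    name = s.name
FROM   testtable_temp AS s
WHERE  d.id = s.id
  AND  d.name IS DISTINCT FROM s.name;
```

With several payload columns, the per-column checks can be collapsed into one row-wise comparison, e.g. `ROW(d.a, d.b) IS DISTINCT FROM ROW(s.a, s.b)`.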
Just curious, are you doing this in a trigger or in your application code? Either way, I'd think you could use the table record type to compare the temp vs real table values as an entire unit.
with cte_indexes as (
select * from pg_indexes limit 10
)
select i1.indexdef, i2.tablename
from cte_indexes AS i1
join cte_indexes AS i2 on i1.indexname = i2.indexname
where
i1 IS DISTINCT FROM i2;
I would probably delete the records in the regular table that do not
exist in the temp table, then delete from the temp table the rows that
already exist (identically) in the main table, and then update the
remaining rows.
On Mon, 14 Sep 2020 at 17:36, Michael Lewis <mlewis@entrata.com> wrote:
> Just curious, are you doing this in a trigger or in your application
> code? Either way, I'd think you could use the table record type to
> compare the temp vs real table values as an entire unit.

Application code - if I put it in a trigger, it would probably start
adding locks to the WAL again. As the application knows the schema, this
was not much of a problem. I initially compared the whole row (not as a
record, but just as i1.* IS DISTINCT FROM i2.*), which worked, but as
there are quite some tables with a rather large primary key and almost
no additional fields, I anticipated it might be faster to only compare
the remaining fields (though I have not verified this).

> I would probably delete records in the regular table that do not exist
> in the temp table, then delete from the temp table that already exists
> in the main table and then update the remaining rows.

That's an interesting approach; it may be that shuffling around the
insert/update/delete order improves things - at the moment I don't do
anything with the temp table after it has been created. I'm a bit short
on time at the moment, but may look into this further when I find some
time for it.

Thanks,
Mike
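For readers landing on this thread later, the reordered flow Michael describes could be sketched as follows against the thread's test tables (hypothetical, not benchmarked in this thread):

```sql
-- 1. Delete rows from the main table that no longer exist in the source.
DELETE FROM testtable d
WHERE NOT EXISTS (SELECT 1 FROM testtable_temp s WHERE s.id = d.id);

-- 2. Drop rows from the temp table that are already identical in the
--    main table, so the remaining temp rows are exactly the changes.
DELETE FROM testtable_temp s
USING testtable d
WHERE d.id = s.id
  AND d.name IS NOT DISTINCT FROM s.name;

-- 3. Update rows that still have a match: these genuinely changed ...
UPDATE testtable d
SET    name = s.name
FROM   testtable_temp s
WHERE  d.id = s.id;

-- 4. ... and insert rows that are new.
INSERT INTO testtable
SELECT s.*
FROM   testtable_temp s
WHERE NOT EXISTS (SELECT 1 FROM testtable d WHERE d.id = s.id);
```

After step 2 the temp table holds only new or changed rows, so steps 3 and 4 never touch an unchanged row in the main table; whether this beats the single filtered UPDATE depends on the workload.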