Home > mailing lists

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From	Önder Kalacı
Subject	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date	March 10, 2023 09:54:48
Msg-id	CACawEhUu6S8E4Oo7+s5iaq=yLRZJb6uOZeEQSGJj-7NVkDzSaw@mail.gmail.com Whole thread
In response to	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List	pgsql-hackers

Tree view

Hi Amit, all

Some of the ideas I can think of are as follows:

1. Combine "SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS"
and "SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS" such that after
verifying updates and deletes of the first test, we can drop some of
the columns on both publisher and subscriber, then use alter
subscription ... refresh publication command and then do the steps of
the second test. Note that we don't add tables after initial setup,
only changing schema.

Done with an important caveat. I think this reorganization of the test helped

us to find one edge case regarding dropped columns.

I realized that the dropped columns also get into the tuples_equal() function. And,

the remote sends NULL to for the dropped columns(i.e., remoteslot), but

index_getnext_slot() (or table_scan_getnextslot) indeed fills the dropped

columns on the outslot. So, the dropped columns are not NULL in the outslot.

This triggers tuples_equal to fail. To fix that, I improved the tuples_equal

such that it skips the dropped columns.

I also spend quite a bit of time understanding how/why this impacts

HEAD. See steps below on HEAD, where REPLICA IDENTITY FULL

fails to replica the data properly:

-- pub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3 timestamptz);
ALTER TABLE test REPLICA IDENTITY FULL;
INSERT INTO test SELECT NULL, i, i, (i)::text, now() FROM generate_series(0,1)i;
CREATE PUBLICATION pub FOR ALL TABLES;

-- sub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3 timestamptz);
CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION pub;

-- show that before dropping the columns, the data in the source and
-- target are deleted properly
DELETE FROM test WHERE x = 0;

-- both on the source and target
SELECT count(*) FROM test WHERE x = 0;
┌───────┐
│ count │
├───────┤
│ 0 │
└───────┘
(1 row)

-- drop columns on both the the source
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- drop columns on both the the target
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- on the target
ALTER SUBSCRIPTION sub REFRESH PUBLICATION;

-- after dropping the columns
DELETE FROM test WHERE x = 1;

-- source
SELECT count(*) FROM test WHERE x = 1;
┌───────┐
│ count │
├───────┤
│ 0 │
└───────┘
(1 row)

-- target, OOPS wrong result!!!!
SELECT count(*) FROM test WHERE x = 1;

┌───────┐
│ count │
├───────┤
│ 1 │
└───────┘
(1 row)

Should we have a separate patch for the tuples_equal change so that we

might want to backport? Attached as v40_0001 on the patch.

Note that I need to have that commit as 0001 so that 0002 patch

passes the tests.

2. We can also combine "Some NULL values" and "PUBLICATION LACKS THE
COLUMN ON THE SUBS INDEX" as both use the same schema. After the first
test, we need to drop the existing index and create a new index on the
subscriber node.

done

3. General comment
+# updates 200 rows
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x IN (5, 6);");

I think here you are updating 20 rows not 200. So, the comment seems
wrong to me.

I think I have fixed that in an earlier version because I cannot see this

comment anymore.

Please think more and see if we can combine some other tests like
"Unique index that is not primary key or replica identity" and the
test we will have after comment#2 above.

I'll look for more opportunities and reply to the thread. I wanted to send

this mail so that you can have a look at (1) earlier.

Thanks,

Onder

Attachment

pgsql-hackers by date:

From: Peter Eisentraut
Date: 10 March 2023, 09:54:14
Subject: Re: pgsql: Allow tailoring of ICU locales with custom rules

From: Peter Eisentraut
Date: 10 March 2023, 09:59:12
Subject: Re: pgsql: Use ICU by default at initdb time.

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

Attachment

Previous

Next