Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Önder Kalacı
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CACawEhUu6S8E4Oo7+s5iaq=yLRZJb6uOZeEQSGJj-7NVkDzSaw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi Amit, all



Some of the ideas I can think of are as follows:

1. Combine "SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS"
and "SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS" such that after
verifying updates and deletes of the first test, we can drop some of
the columns on both publisher and subscriber, then use alter
subscription ... refresh publication command and then do the steps of
the second test. Note that we don't add tables after initial setup,
only changing schema.

Done with an important caveat. I think this reorganization of the test helped
us to find one edge case regarding dropped columns.

I realized that the dropped columns also get into the tuples_equal() function. And,
the remote sends NULL to for the dropped columns(i.e., remoteslot), but 
index_getnext_slot() (or table_scan_getnextslot) indeed fills the dropped
columns on the outslot. So, the dropped columns are not NULL in the outslot.

This triggers tuples_equal to fail. To fix that, I improved the tuples_equal
such that it skips the dropped columns.

I also spend quite a bit of time understanding how/why this impacts
HEAD. See steps below on HEAD, where REPLICA IDENTITY FULL
fails to replica the data properly:


-- pub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3 timestamptz);
ALTER TABLE test REPLICA IDENTITY FULL;
INSERT INTO test SELECT NULL, i, i, (i)::text, now() FROM generate_series(0,1)i;
CREATE PUBLICATION pub FOR ALL TABLES;

-- sub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3 timestamptz);
CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION pub;

-- show that before dropping the columns, the data in the source and
-- target are deleted properly
DELETE FROM test WHERE x = 0;

-- both on the source and target
SELECT count(*) FROM test WHERE x = 0;
┌───────┐
│ count │
├───────┤
│     0 │
└───────┘
(1 row)

-- drop columns on both the the source
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- drop columns on both the the target
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- on the target
ALTER SUBSCRIPTION sub REFRESH PUBLICATION;


-- after dropping the columns
DELETE FROM test WHERE x = 1;

-- source
SELECT count(*) FROM test WHERE x = 1;
┌───────┐
│ count │
├───────┤
│     0 │
└───────┘
(1 row)

-- target, OOPS wrong result!!!!
SELECT count(*) FROM test WHERE x = 1;

┌───────┐
│ count │
├───────┤
│     1 │
└───────┘
(1 row)


Should we have a separate patch for the tuples_equal change so that we
might want to backport? Attached as v40_0001 on the patch.

Note that I need to have that commit as 0001 so that 0002 patch
passes the tests.


2. We can also combine "Some NULL values" and "PUBLICATION LACKS THE
COLUMN ON THE SUBS INDEX" as both use the same schema. After the first
test, we need to drop the existing index and create a new index on the
subscriber node.

done

 

3. General comment
+# updates 200 rows
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x IN (5, 6);");

I think here you are updating 20 rows not 200. So, the comment seems
wrong to me.

I think I have fixed that in an earlier version because I cannot see this 
comment anymore.
 

Please think more and see if we can combine some other tests like
"Unique index that is not primary key or replica identity" and the
test we will have after comment#2 above.


I'll look for more opportunities and reply to the thread. I wanted to send
this mail so that you can have a look at (1) earlier.


Thanks,
Onder  
Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: pgsql: Allow tailoring of ICU locales with custom rules
Next
From: Peter Eisentraut
Date:
Subject: Re: pgsql: Use ICU by default at initdb time.