RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id OS3PR01MB62754E945BD95A9B2A1164789E289@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
List pgsql-hackers
On Fri, Sep 23, 2022 at 0:14 AM Önder Kalacı <onderkalaci@gmail.com> wrote:
> Hii Wang wei,

Thanks for updating the patch and your reply.

> > 1. In the function GetCheapestReplicaIdentityFullPath.
> > +    if (rel->pathlist == NIL)
> > +    {
> > +        /*
> > +         * A sequential scan could have been dominated by by an index
> > scan
> > +         * during make_one_rel(). We should always have a sequential
> > scan
> > +         * before set_cheapest().
> > +         */
> > +        Path       *seqScanPath = create_seqscan_path(root, rel, NULL,
> > 0);
> > +
> > +        add_path(rel, seqScanPath);
> > +    }
> >
> > This is a question I'm not sure about:
> > Do we need this part to add sequential scan?
> >
> > I think in our case, the sequential scan seems to have been added by the
> > function make_one_rel (see function set_plain_rel_pathlist).
> 
> Yes, the sequential scan is added during make_one_rel.
> 
> > If I am missing something, please let me know. BTW, there is a typo in
> > above comment: `by by`.
> 
> As the comment mentions, the sequential scan could have been dominated &
> removed by index scan, see add_path():
> 
> *We also remove from the rel's pathlist any old paths that are dominated
> *  by new_path --- that is, new_path is cheaper, at least as well ordered,
> *  generates no more rows, requires no outer rels not required by the old
> *  path, and is no less parallel-safe.
> 
> Still, I agree that the comment could be improved, which I pushed.

Oh, sorry I didn't realize this part of the logic. Thanks for sharing this.

And I have another confusion about function GetCheapestReplicaIdentityFullPath:
If rel->pathlist is NIL, could we return NULL directly from this function, and
then set idxoid to InvalidOid in function FindUsableIndexForReplicaIdentityFull
in that case?

===

Here are some comments for test file  032_subscribe_use_index.pl on v18 patch:

1.
```
+# Basic test where the subscriber uses index
+# and only updates 1 row for and deletes
+# 1 other row
```
There seems to be an extra "for" here.

2. Typos for subscription name in the error messages.
tap_sub_rep_full_0 -> tap_sub_rep_full

3. Typo in comments
```
+# use the newly created index (provided that it fullfils the requirements).
```
fullfils -> fulfils

4. Some extra single quotes at the end of the error message ('").
For example:
```
# wait until the index is used on the subscriber
$node_subscriber->poll_query_until(
    'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates 200 rows via index'";
```

5. The column names in the error message appear to be a typo.
```
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on
highcardinality column-1";
 
...
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on
highcardinality column-3";
 
...
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on
highcardinality column-4";
 
```
It seems that we need to do the following change: 'column-3' -> 'column-1' and
'column-4' -> 'column-2'.
Or we could use the column names directly like this: 'column-1' -> 'column a',
'column_3' -> 'column a' and 'column_4' -> 'column b'.

6. DELETE action is missing from the error message.
```
+# 2 rows from first command, another 2 from the second command
+# overall index_on_child_1_a is used 4 times
+$node_subscriber->poll_query_until(
+    'postgres', q{select idx_scan=4 from pg_stat_all_indexes where indexrelname = 'index_on_child_1_a';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates child_1 table'";
```
I think we execute both UPDATE and DELETE for child_1 here. Could we add DELETE
action to this error message?

7. Table name in the error message.
```
# check if the index is used even when the index has NULL values
$node_subscriber->poll_query_until(
    'postgres', q{select idx_scan=2 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates parent table'";
```
It seems to be "test_replica_id_full" here instead of "parent'".

Regards,
Wang wei

pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Simplify standby state machine a bit in WaitForWALToBecomeAvailable()
Next
From: "Drouvot, Bertrand"
Date:
Subject: Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf