Thread: RE: Re: User defined data types in Logical Replication

RE: Re: User defined data types in Logical Replication

From

Huong Dangminh

Date:

16 November 2017, 15:55:13

Sawada-san,

Thanks for your response.
# And sorry again because I could not reply to your gmail 
# address from my environment due to security restriction.

> >> We are getting the bellow error while trying use Logical Replication
> >> with user defined data types in a C program (when call elog function).
> >>
> >>  ERROR:  XX000: cache lookup failed for type XXXXX
> >>
> >
> > Sorry for continuously disturbing in this topic, but am I missing something
> here?
> 
> No, but I'd suggest to provide a procedure for reproducing if possible,
> which will be helpful for investigation.

Sorry, I will be careful next time.

> > I mean that in case of type's OID in PUBLICATION host does not exists
> > in SUBSCRIPTION host's pg_type, it could returns unintended error (the
> XX000 above) when elog or ereport is executed.
> >
> > For more details, it happen in slot_store_error_callback when it try to
> call format_type_be(localtypoid) for errcontext.
> > slot_store_error_callback is set in slot_store_cstrings,
> slot_modify_cstrings function and it also be unset here, so the effect here
> is small but it happens.
> >
> 
> I think I found out the cause of this issue, and this is a bug. This can
> be reproduced, for example, if the input function of the data type calls
> elog() during applying on the environment where OIDs of the data type on
> publisher and subscriber are different. The cause of this issue is that
> we call format_type_be() with remotetypoid. If the OIDs of data type on
> publisher and subscriber are different we search it from syscache by the
> OID that doesn't exist on subscriber.

Yes, I also think that.

> On detail of your patch, I don't think this direction is good. Since the
> subscriber already has a LogicalRepTyp cache entry for the type we can report
> the error message using the data type name. So I think this issue can be
> fixed by using the remote type name got from the cache.

Thanks, 
I did not realize the LogicalRepRelMapEntry, remote type name is already here.

> Also I'm confused about the message of errcontext; currently we store the
> local data type OID corresponding to the remote data type name into the
> cache, and then we search the local data type name by the local data type
> OID stored in the cache. So  it means the both the local data type OID and
> the remote data type OID always imply the same data type. We use the both
> data type OIDs for log message in slot_store_error_callback, but I think
> what the function want to do is to show the different type names if the
> table definitions on both server are different (e.g. sending jsonb column
> data to text column data). I think we should use the type of the local relation
> attribute rather than remote's one.
> 
> Attached draft patch fixed this issue, at least on my environment.

It works good for me.

> Please review it.

I will review it soon.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd. 
http://www.nec-solutioninnovators.co.jp/en/

RE: Re: User defined data types in Logical Replication

From

Huong Dangminh

Date:

21 November 2017, 18:25:08

Sorry for not replying sooner.

> > Attached draft patch fixed this issue, at least on my environment.
> 
> It works good for me.
> 
> > Please review it.
> 
> I will review it soon.

There is one more case that user-defined data type is not supported in Logical Replication.
That is when remote data type's name does not exist in SUBSCRIBE.

In relation.c:logicalrep_typmap_gettypname
We search OID in syscache by remote's data type name and mapping it, if it does not exist in syscache 
We will be faced with the bellow error.

    if (!OidIsValid(entry->typoid))
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("data type \"%s.%s\" required for logical replication does not exist",
                        entry->nspname, entry->typname)));

I think, it is not necessary to check typoid here in order to avoid above case, is that right?
I attached a patch based on Sawada-san's patch with a bit of messages modified and remove the above check.
Could somebody check it for me or should I add it into CF?


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/

Attachment

fix_slot_store_error_callback_V2.patch

RE: Re: User defined data types in Logical Replication

From

Huong Dangminh

Date:

04 December 2017, 12:26:00

Hi,

> From: Huong Dangminh [mailto:huo-dangminh@ys.jp.nec.com]
> I attached a patch based on Sawada-san's patch with a bit of messages
> modified and remove the above check.
> Could somebody check it for me or should I add it into CF?

Sorry, I have added this thread to the next CF.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/

Re: Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

04 December 2017, 18:11:27

On Wed, Nov 22, 2017 at 12:25 AM, Huong Dangminh
<huo-dangminh@ys.jp.nec.com> wrote:
> Thanks for your response.
> # And sorry again because I could not reply to your gmail
> # address from my environment due to security restriction.

It's okay. I can understand your environment :-)

> Sorry for not replying sooner.
>
>> > Attached draft patch fixed this issue, at least on my environment.
>>
>> It works good for me.
>>
>> > Please review it.
>>
>> I will review it soon.
>
> There is one more case that user-defined data type is not supported in Logical Replication.
> That is when remote data type's name does not exist in SUBSCRIBE.
>
> In relation.c:logicalrep_typmap_gettypname
> We search OID in syscache by remote's data type name and mapping it, if it does not exist in syscache
> We will be faced with the bellow error.
>
>         if (!OidIsValid(entry->typoid))
>                 ereport(ERROR,
>                                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                                  errmsg("data type \"%s.%s\" required for logical replication does not exist",
>                                                 entry->nspname, entry->typname)));
>
> I think, it is not necessary to check typoid here in order to avoid above case, is that right?

I think it's not right. We should end up with an error in the case
where the same type name doesn't exist on subscriber. With your
proposed patch, logicalrep_typmap_gettypname() can return an invalid
string (entry->typname) in that case, which can be a cause of SEGV.

> Sorry, I have added this thread to the next CF.

Thank you for adding it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: User defined data types in Logical Replication

From

Dang Minh Huong

Date:

06 December 2017, 18:23:52

Hi Sawada-san,

Sorry for my late response.

On 2017/12/05 0:11, Masahiko Sawada wrote:

There is one more case that user-defined data type is not supported in Logical Replication.
That is when remote data type's name does not exist in SUBSCRIBE.

In relation.c:logicalrep_typmap_gettypname
We search OID in syscache by remote's data type name and mapping it, if it does not exist in syscache
We will be faced with the bellow error.
       if (!OidIsValid(entry->typoid))               ereport(ERROR,                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),                                errmsg("data type \"%s.%s\" required for logical replication does not exist",                                               entry->nspname, entry->typname)));

I think, it is not necessary to check typoid here in order to avoid above case, is that right?

I think it's not right. We should end up with an error in the case where the same type name doesn't exist on subscriber. With your proposed patch, logicalrep_typmap_gettypname() can return an invalid string (entry->typname) in that case, which can be a cause of SEGV.

Thanks, I think you are right.
# I thought that entry->typname was received from walsender, and it is already be qualified in logicalrep_write_typ.
# But we also need check it in subscriber, because it could be lost info in transmission.
Also we do not need to fix for the case above too, because user can change type name by ALTER TYPE statement.
I attached the patch, which based on your patch with a bit of modified messages.

---
Thanks and best regards,
Dang Minh Huong

Attachment

fix_slot_store_error_callback_V3.patch

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

07 December 2017, 05:07:55

On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <kakalot49@gmail.com> wrote:
> Hi Sawada-san,
>
> Sorry for my late response.
>
> On 2017/12/05 0:11, Masahiko Sawada wrote:
>
> There is one more case that user-defined data type is not supported in
> Logical Replication.
> That is when remote data type's name does not exist in SUBSCRIBE.
>
> In relation.c:logicalrep_typmap_gettypname
> We search OID in syscache by remote's data type name and mapping it, if it
> does not exist in syscache
> We will be faced with the bellow error.
>
>         if (!OidIsValid(entry->typoid))
>                 ereport(ERROR,
>
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                                  errmsg("data type \"%s.%s\" required for
> logical replication does not exist",
>                                                 entry->nspname,
> entry->typname)));
>
> I think, it is not necessary to check typoid here in order to avoid above
> case, is that right?
>
> I think it's not right. We should end up with an error in the case where the
> same type name doesn't exist on subscriber. With your proposed patch,
> logicalrep_typmap_gettypname() can return an invalid string (entry->typname)
> in that case, which can be a cause of SEGV.
>
> Thanks, I think you are right.
> # I thought that entry->typname was received from walsender, and it is
> already be qualified in logicalrep_write_typ.
> # But we also need check it in subscriber, because it could be lost info in
> transmission.

Oops, the last sentence of my previous mail was wrong.
logicalrep_typmap_gettypname() doesn't return an invalid string since
entry->typname is initialized with a type name got from wal sender.

After more thought, we might not need to raise an error even if there
is not the same data type on both publisher and subscriber. Because
data is sent after converted to the text representation and is
converted to a data type according to the local table definition
subscribers don't always need to have the same data type. If it's
right, slot_store_error_callback() doesn't need to find a
corresponding local data type OID but just finds the corresponding
type name by remote data type OID. What do you think?

--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
DLIST_STATIC_INIT(lsn_mapping);

 typedef struct SlotErrCallbackArg
 {
-    LogicalRepRelation *rel;
-    int            attnum;
+    LogicalRepRelMapEntry *rel;
+    int            remote_attnum;
+    int            local_attnum;
 } SlotErrCallbackArg;

Since LogicalRepRelMapEntry has a map of local attributes to remote
ones we don't need to have two attribute numbers.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

08 December 2017, 06:06:58

On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <kakalot49@gmail.com> wrote:
>> Hi Sawada-san,
>>
>> Sorry for my late response.
>>
>> On 2017/12/05 0:11, Masahiko Sawada wrote:
>>
>> There is one more case that user-defined data type is not supported in
>> Logical Replication.
>> That is when remote data type's name does not exist in SUBSCRIBE.
>>
>> In relation.c:logicalrep_typmap_gettypname
>> We search OID in syscache by remote's data type name and mapping it, if it
>> does not exist in syscache
>> We will be faced with the bellow error.
>>
>>         if (!OidIsValid(entry->typoid))
>>                 ereport(ERROR,
>>
>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>                                  errmsg("data type \"%s.%s\" required for
>> logical replication does not exist",
>>                                                 entry->nspname,
>> entry->typname)));
>>
>> I think, it is not necessary to check typoid here in order to avoid above
>> case, is that right?
>>
>> I think it's not right. We should end up with an error in the case where the
>> same type name doesn't exist on subscriber. With your proposed patch,
>> logicalrep_typmap_gettypname() can return an invalid string (entry->typname)
>> in that case, which can be a cause of SEGV.
>>
>> Thanks, I think you are right.
>> # I thought that entry->typname was received from walsender, and it is
>> already be qualified in logicalrep_write_typ.
>> # But we also need check it in subscriber, because it could be lost info in
>> transmission.
>
> Oops, the last sentence of my previous mail was wrong.
> logicalrep_typmap_gettypname() doesn't return an invalid string since
> entry->typname is initialized with a type name got from wal sender.
>
> After more thought, we might not need to raise an error even if there
> is not the same data type on both publisher and subscriber. Because
> data is sent after converted to the text representation and is
> converted to a data type according to the local table definition
> subscribers don't always need to have the same data type. If it's
> right, slot_store_error_callback() doesn't need to find a
> corresponding local data type OID but just finds the corresponding
> type name by remote data type OID. What do you think?
>
> --- a/src/backend/replication/logical/worker.c
> +++ b/src/backend/replication/logical/worker.c
> @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
> DLIST_STATIC_INIT(lsn_mapping);
>
>  typedef struct SlotErrCallbackArg
>  {
> -    LogicalRepRelation *rel;
> -    int            attnum;
> +    LogicalRepRelMapEntry *rel;
> +    int            remote_attnum;
> +    int            local_attnum;
>  } SlotErrCallbackArg;
>
> Since LogicalRepRelMapEntry has a map of local attributes to remote
> ones we don't need to have two attribute numbers.
>

Attached the patch incorporated what I have on mind. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

fix_slot_store_error_callback_v4.patch

RE: User defined data types in Logical Replication

From

Huong Dangminh

Date:

08 December 2017, 07:18:17

Hi Sawada-san,

> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada <sawada.mshk@gmail.com>
> wrote:
> > On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <kakalot49@gmail.com>
> wrote:
> >> Hi Sawada-san,
> >>
> >> Sorry for my late response.
> >>
> >> On 2017/12/05 0:11, Masahiko Sawada wrote:
> >>
> >> There is one more case that user-defined data type is not supported
> >> in Logical Replication.
> >> That is when remote data type's name does not exist in SUBSCRIBE.
> >>
> >> In relation.c:logicalrep_typmap_gettypname
> >> We search OID in syscache by remote's data type name and mapping it,
> >> if it does not exist in syscache We will be faced with the bellow
> >> error.
> >>
> >>         if (!OidIsValid(entry->typoid))
> >>                 ereport(ERROR,
> >>
> >> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >>                                  errmsg("data type \"%s.%s\" required
> >> for logical replication does not exist",
> >>                                                 entry->nspname,
> >> entry->typname)));
> >>
> >> I think, it is not necessary to check typoid here in order to avoid
> >> above case, is that right?
> >>
> >> I think it's not right. We should end up with an error in the case
> >> where the same type name doesn't exist on subscriber. With your
> >> proposed patch,
> >> logicalrep_typmap_gettypname() can return an invalid string
> >> (entry->typname) in that case, which can be a cause of SEGV.
> >>
> >> Thanks, I think you are right.
> >> # I thought that entry->typname was received from walsender, and it
> >> is already be qualified in logicalrep_write_typ.
> >> # But we also need check it in subscriber, because it could be lost
> >> info in transmission.
> >
> > Oops, the last sentence of my previous mail was wrong.
> > logicalrep_typmap_gettypname() doesn't return an invalid string since
> > entry->typname is initialized with a type name got from wal sender.
> >
> > After more thought, we might not need to raise an error even if there
> > is not the same data type on both publisher and subscriber. Because
> > data is sent after converted to the text representation and is
> > converted to a data type according to the local table definition
> > subscribers don't always need to have the same data type. If it's
> > right, slot_store_error_callback() doesn't need to find a
> > corresponding local data type OID but just finds the corresponding
> > type name by remote data type OID. What do you think?
> >
> > --- a/src/backend/replication/logical/worker.c
> > +++ b/src/backend/replication/logical/worker.c
> > @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
> > DLIST_STATIC_INIT(lsn_mapping);
> >
> >  typedef struct SlotErrCallbackArg
> >  {
> > -    LogicalRepRelation *rel;
> > -    int            attnum;
> > +    LogicalRepRelMapEntry *rel;
> > +    int            remote_attnum;
> > +    int            local_attnum;
> >  } SlotErrCallbackArg;
> >
> > Since LogicalRepRelMapEntry has a map of local attributes to remote
> > ones we don't need to have two attribute numbers.
> >

Sorry for the late reply.

> Attached the patch incorporated what I have on mind. Please review it.

Thanks for the patch, I will do it at this weekend.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/

Re: User defined data types in Logical Replication

From

Dang Minh Huong

Date:

09 December 2017, 18:33:24

On 2017/12/08 13:18, Huong Dangminh wrote:

> Hi Sawada-san,
>
>> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada <sawada.mshk@gmail.com>
>> wrote:
>>> On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <kakalot49@gmail.com>
>> wrote:
>>>> Hi Sawada-san,
>>>>
>>>> Sorry for my late response.
>>>>
>>>> On 2017/12/05 0:11, Masahiko Sawada wrote:
>>>>
>>>> There is one more case that user-defined data type is not supported
>>>> in Logical Replication.
>>>> That is when remote data type's name does not exist in SUBSCRIBE.
>>>>
>>>> In relation.c:logicalrep_typmap_gettypname
>>>> We search OID in syscache by remote's data type name and mapping it,
>>>> if it does not exist in syscache We will be faced with the bellow
>>>> error.
>>>>
>>>>          if (!OidIsValid(entry->typoid))
>>>>                  ereport(ERROR,
>>>>
>>>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>                                   errmsg("data type \"%s.%s\" required
>>>> for logical replication does not exist",
>>>>                                                  entry->nspname,
>>>> entry->typname)));
>>>>
>>>> I think, it is not necessary to check typoid here in order to avoid
>>>> above case, is that right?
>>>>
>>>> I think it's not right. We should end up with an error in the case
>>>> where the same type name doesn't exist on subscriber. With your
>>>> proposed patch,
>>>> logicalrep_typmap_gettypname() can return an invalid string
>>>> (entry->typname) in that case, which can be a cause of SEGV.
>>>>
>>>> Thanks, I think you are right.
>>>> # I thought that entry->typname was received from walsender, and it
>>>> is already be qualified in logicalrep_write_typ.
>>>> # But we also need check it in subscriber, because it could be lost
>>>> info in transmission.
>>> Oops, the last sentence of my previous mail was wrong.
>>> logicalrep_typmap_gettypname() doesn't return an invalid string since
>>> entry->typname is initialized with a type name got from wal sender.
Yeah, so we do not need to check the existing of publisher's type name 
in subscriber.
>>> After more thought, we might not need to raise an error even if there
>>> is not the same data type on both publisher and subscriber. Because
>>> data is sent after converted to the text representation and is
>>> converted to a data type according to the local table definition
>>> subscribers don't always need to have the same data type. If it's
>>> right, slot_store_error_callback() doesn't need to find a
>>> corresponding local data type OID but just finds the corresponding
>>> type name by remote data type OID. What do you think?
I totally agree. It will make logical replication more flexible with 
data type.
>>> --- a/src/backend/replication/logical/worker.c
>>> +++ b/src/backend/replication/logical/worker.c
>>> @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
>>> DLIST_STATIC_INIT(lsn_mapping);
>>>
>>>   typedef struct SlotErrCallbackArg
>>>   {
>>> -    LogicalRepRelation *rel;
>>> -    int            attnum;
>>> +    LogicalRepRelMapEntry *rel;
>>> +    int            remote_attnum;
>>> +    int            local_attnum;
>>>   } SlotErrCallbackArg;
>>>
>>> Since LogicalRepRelMapEntry has a map of local attributes to remote
>>> ones we don't need to have two attribute numbers.
>>>
> Sorry for the late reply.
>
>> Attached the patch incorporated what I have on mind. Please review it.
> Thanks for the patch, I will do it at this weekend.
Your patch is fine for me.
But logicalrep_typmap_getid will be unused.
I attached the patch with removing of it.


---
Thanks and best regards,
Dang Minh Huong

Attachment

fix_slot_store_error_callback_v5.patch

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

12 December 2017, 03:40:56

On Sun, Dec 10, 2017 at 12:33 AM, Dang Minh Huong <kakalot49@gmail.com> wrote:
> On 2017/12/08 13:18, Huong Dangminh wrote:
>
>> Hi Sawada-san,
>>
>>> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada <sawada.mshk@gmail.com>
>>> wrote:
>>>>
>>>> On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <kakalot49@gmail.com>
>>>
>>> wrote:
>>>>>
>>>>> Hi Sawada-san,
>>>>>
>>>>> Sorry for my late response.
>>>>>
>>>>> On 2017/12/05 0:11, Masahiko Sawada wrote:
>>>>>
>>>>> There is one more case that user-defined data type is not supported
>>>>> in Logical Replication.
>>>>> That is when remote data type's name does not exist in SUBSCRIBE.
>>>>>
>>>>> In relation.c:logicalrep_typmap_gettypname
>>>>> We search OID in syscache by remote's data type name and mapping it,
>>>>> if it does not exist in syscache We will be faced with the bellow
>>>>> error.
>>>>>
>>>>>          if (!OidIsValid(entry->typoid))
>>>>>                  ereport(ERROR,
>>>>>
>>>>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>>>>>                                   errmsg("data type \"%s.%s\" required
>>>>> for logical replication does not exist",
>>>>>                                                  entry->nspname,
>>>>> entry->typname)));
>>>>>
>>>>> I think, it is not necessary to check typoid here in order to avoid
>>>>> above case, is that right?
>>>>>
>>>>> I think it's not right. We should end up with an error in the case
>>>>> where the same type name doesn't exist on subscriber. With your
>>>>> proposed patch,
>>>>> logicalrep_typmap_gettypname() can return an invalid string
>>>>> (entry->typname) in that case, which can be a cause of SEGV.
>>>>>
>>>>> Thanks, I think you are right.
>>>>> # I thought that entry->typname was received from walsender, and it
>>>>> is already be qualified in logicalrep_write_typ.
>>>>> # But we also need check it in subscriber, because it could be lost
>>>>> info in transmission.
>>>>
>>>> Oops, the last sentence of my previous mail was wrong.
>>>> logicalrep_typmap_gettypname() doesn't return an invalid string since
>>>> entry->typname is initialized with a type name got from wal sender.
>
> Yeah, so we do not need to check the existing of publisher's type name in
> subscriber.
>>>>
>>>> After more thought, we might not need to raise an error even if there
>>>> is not the same data type on both publisher and subscriber. Because
>>>> data is sent after converted to the text representation and is
>>>> converted to a data type according to the local table definition
>>>> subscribers don't always need to have the same data type. If it's
>>>> right, slot_store_error_callback() doesn't need to find a
>>>> corresponding local data type OID but just finds the corresponding
>>>> type name by remote data type OID. What do you think?
>
> I totally agree. It will make logical replication more flexible with data
> type.
>>>>
>>>> --- a/src/backend/replication/logical/worker.c
>>>> +++ b/src/backend/replication/logical/worker.c
>>>> @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
>>>> DLIST_STATIC_INIT(lsn_mapping);
>>>>
>>>>   typedef struct SlotErrCallbackArg
>>>>   {
>>>> -    LogicalRepRelation *rel;
>>>> -    int            attnum;
>>>> +    LogicalRepRelMapEntry *rel;
>>>> +    int            remote_attnum;
>>>> +    int            local_attnum;
>>>>   } SlotErrCallbackArg;
>>>>
>>>> Since LogicalRepRelMapEntry has a map of local attributes to remote
>>>> ones we don't need to have two attribute numbers.
>>>>
>> Sorry for the late reply.
>>
>>> Attached the patch incorporated what I have on mind. Please review it.
>>
>> Thanks for the patch, I will do it at this weekend.
>
> Your patch is fine for me.
> But logicalrep_typmap_getid will be unused.

Yeah, actually we don't use LogicalRepTyp.typoid as well. If we allow
to replicate data to an another data type of column, what is the data
type mapping on subscriber for? It seems enough to have remote data
type oid, namespace and name.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

RE: User defined data types in Logical Replication

From

Huong Dangminh

Date:

18 December 2017, 16:19:37

Hi Sawada-san,

> Sent: Tuesday, December 12, 2017 9:41 AM
> To: Dang Minh Huong <kakalot49@gmail.com>
> Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>; Yanagisawa
> Hiroshi(柳澤 博) <hir-yanagisawa@ut.jp.nec.com>; Dangminh Huong(ダンＭ
> フーン) <huo-dangminh@ys.jp.nec.com>
> Subject: Re: User defined data types in Logical Replication
> 
> On Sun, Dec 10, 2017 at 12:33 AM, Dang Minh Huong <kakalot49@gmail.com>
> wrote:
> > On 2017/12/08 13:18, Huong Dangminh wrote:
> >
> >> Hi Sawada-san,
> >>
> >>> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada
> >>> <sawada.mshk@gmail.com>
> >>> wrote:
> >>>>
> >>>> On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong
> >>>> <kakalot49@gmail.com>
> >>>
> >>> wrote:
> >>>>>
> >>>>> Hi Sawada-san,
> >>>>>
> >>>>> Sorry for my late response.
> >>>>>
> >>>>> On 2017/12/05 0:11, Masahiko Sawada wrote:
> >>>>>
> >>>>> There is one more case that user-defined data type is not
> >>>>> supported in Logical Replication.
> >>>>> That is when remote data type's name does not exist in SUBSCRIBE.
> >>>>>
> >>>>> In relation.c:logicalrep_typmap_gettypname
> >>>>> We search OID in syscache by remote's data type name and mapping
> >>>>> it, if it does not exist in syscache We will be faced with the
> >>>>> bellow error.
> >>>>>
> >>>>>          if (!OidIsValid(entry->typoid))
> >>>>>                  ereport(ERROR,
> >>>>>
> >>>>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >>>>>                                   errmsg("data type \"%s.%s\"
> >>>>> required for logical replication does not exist",
> >>>>>                                                  entry->nspname,
> >>>>> entry->typname)));
> >>>>>
> >>>>> I think, it is not necessary to check typoid here in order to
> >>>>> avoid above case, is that right?
> >>>>>
> >>>>> I think it's not right. We should end up with an error in the case
> >>>>> where the same type name doesn't exist on subscriber. With your
> >>>>> proposed patch,
> >>>>> logicalrep_typmap_gettypname() can return an invalid string
> >>>>> (entry->typname) in that case, which can be a cause of SEGV.
> >>>>>
> >>>>> Thanks, I think you are right.
> >>>>> # I thought that entry->typname was received from walsender, and
> >>>>> it is already be qualified in logicalrep_write_typ.
> >>>>> # But we also need check it in subscriber, because it could be
> >>>>> lost info in transmission.
> >>>>
> >>>> Oops, the last sentence of my previous mail was wrong.
> >>>> logicalrep_typmap_gettypname() doesn't return an invalid string
> >>>> since
> >>>> entry->typname is initialized with a type name got from wal sender.
> >
> > Yeah, so we do not need to check the existing of publisher's type name
> > in subscriber.
> >>>>
> >>>> After more thought, we might not need to raise an error even if
> >>>> there is not the same data type on both publisher and subscriber.
> >>>> Because data is sent after converted to the text representation and
> >>>> is converted to a data type according to the local table definition
> >>>> subscribers don't always need to have the same data type. If it's
> >>>> right, slot_store_error_callback() doesn't need to find a
> >>>> corresponding local data type OID but just finds the corresponding
> >>>> type name by remote data type OID. What do you think?
> >
> > I totally agree. It will make logical replication more flexible with
> > data type.
> >>>>
> >>>> --- a/src/backend/replication/logical/worker.c
> >>>> +++ b/src/backend/replication/logical/worker.c
> >>>> @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
> >>>> DLIST_STATIC_INIT(lsn_mapping);
> >>>>
> >>>>   typedef struct SlotErrCallbackArg
> >>>>   {
> >>>> -    LogicalRepRelation *rel;
> >>>> -    int            attnum;
> >>>> +    LogicalRepRelMapEntry *rel;
> >>>> +    int            remote_attnum;
> >>>> +    int            local_attnum;
> >>>>   } SlotErrCallbackArg;
> >>>>
> >>>> Since LogicalRepRelMapEntry has a map of local attributes to remote
> >>>> ones we don't need to have two attribute numbers.
> >>>>
> >> Sorry for the late reply.
> >>
> >>> Attached the patch incorporated what I have on mind. Please review it.
> >>
> >> Thanks for the patch, I will do it at this weekend.
> >
> > Your patch is fine for me.
> > But logicalrep_typmap_getid will be unused.
> 
> Yeah, actually we don't use LogicalRepTyp.typoid as well. If we allow to
> replicate data to an another data type of column, what is the data type
> mapping on subscriber for? It seems enough to have remote data type oid,
> namespace and name.

Yeah, so the meaning of the type map will be disappeared, aren't you?
I updated the patch with removing of LogicalRepTyp.typoid and changing
concept "typmap" to "remote type".
How do you think?

---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/

Attachment

fix_slot_store_error_callback_v6.patch

RE: User defined data types in Logical Replication

From

Huong Dangminh

Date:

18 December 2017, 16:28:52

Hi Sawada-san,

> > Sent: Tuesday, December 12, 2017 9:41 AM
> > To: Dang Minh Huong <kakalot49@gmail.com>
> > Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>; Yanagisawa
> > Hiroshi(柳澤 博) <hir-yanagisawa@ut.jp.nec.com>; Dangminh Huong(ダン
> Ｍ
> > フーン) <huo-dangminh@ys.jp.nec.com>
> > Subject: Re: User defined data types in Logical Replication
> >
> > On Sun, Dec 10, 2017 at 12:33 AM, Dang Minh Huong
> > <kakalot49@gmail.com>
> > wrote:
> > > On 2017/12/08 13:18, Huong Dangminh wrote:
> > >
> > >> Hi Sawada-san,
> > >>
> > >>> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada
> > >>> <sawada.mshk@gmail.com>
> > >>> wrote:
> > >>>>
> > >>>> On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong
> > >>>> <kakalot49@gmail.com>
> > >>>
> > >>> wrote:
> > >>>>>
> > >>>>> Hi Sawada-san,
> > >>>>>
> > >>>>> Sorry for my late response.
> > >>>>>
> > >>>>> On 2017/12/05 0:11, Masahiko Sawada wrote:
> > >>>>>
> > >>>>> There is one more case that user-defined data type is not
> > >>>>> supported in Logical Replication.
> > >>>>> That is when remote data type's name does not exist in SUBSCRIBE.
> > >>>>>
> > >>>>> In relation.c:logicalrep_typmap_gettypname
> > >>>>> We search OID in syscache by remote's data type name and mapping
> > >>>>> it, if it does not exist in syscache We will be faced with the
> > >>>>> bellow error.
> > >>>>>
> > >>>>>          if (!OidIsValid(entry->typoid))
> > >>>>>                  ereport(ERROR,
> > >>>>>
> > >>>>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > >>>>>                                   errmsg("data type \"%s.%s\"
> > >>>>> required for logical replication does not exist",
> > >>>>>
> entry->nspname,
> > >>>>> entry->typname)));
> > >>>>>
> > >>>>> I think, it is not necessary to check typoid here in order to
> > >>>>> avoid above case, is that right?
> > >>>>>
> > >>>>> I think it's not right. We should end up with an error in the
> > >>>>> case where the same type name doesn't exist on subscriber. With
> > >>>>> your proposed patch,
> > >>>>> logicalrep_typmap_gettypname() can return an invalid string
> > >>>>> (entry->typname) in that case, which can be a cause of SEGV.
> > >>>>>
> > >>>>> Thanks, I think you are right.
> > >>>>> # I thought that entry->typname was received from walsender, and
> > >>>>> it is already be qualified in logicalrep_write_typ.
> > >>>>> # But we also need check it in subscriber, because it could be
> > >>>>> lost info in transmission.
> > >>>>
> > >>>> Oops, the last sentence of my previous mail was wrong.
> > >>>> logicalrep_typmap_gettypname() doesn't return an invalid string
> > >>>> since
> > >>>> entry->typname is initialized with a type name got from wal sender.
> > >
> > > Yeah, so we do not need to check the existing of publisher's type
> > > name in subscriber.
> > >>>>
> > >>>> After more thought, we might not need to raise an error even if
> > >>>> there is not the same data type on both publisher and subscriber.
> > >>>> Because data is sent after converted to the text representation
> > >>>> and is converted to a data type according to the local table
> > >>>> definition subscribers don't always need to have the same data
> > >>>> type. If it's right, slot_store_error_callback() doesn't need to
> > >>>> find a corresponding local data type OID but just finds the
> > >>>> corresponding type name by remote data type OID. What do you think?
> > >
> > > I totally agree. It will make logical replication more flexible with
> > > data type.
> > >>>>
> > >>>> --- a/src/backend/replication/logical/worker.c
> > >>>> +++ b/src/backend/replication/logical/worker.c
> > >>>> @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
> > >>>> DLIST_STATIC_INIT(lsn_mapping);
> > >>>>
> > >>>>   typedef struct SlotErrCallbackArg
> > >>>>   {
> > >>>> -    LogicalRepRelation *rel;
> > >>>> -    int            attnum;
> > >>>> +    LogicalRepRelMapEntry *rel;
> > >>>> +    int            remote_attnum;
> > >>>> +    int            local_attnum;
> > >>>>   } SlotErrCallbackArg;
> > >>>>
> > >>>> Since LogicalRepRelMapEntry has a map of local attributes to
> > >>>> remote ones we don't need to have two attribute numbers.
> > >>>>
> > >> Sorry for the late reply.
> > >>
> > >>> Attached the patch incorporated what I have on mind. Please review
> it.
> > >>
> > >> Thanks for the patch, I will do it at this weekend.
> > >
> > > Your patch is fine for me.
> > > But logicalrep_typmap_getid will be unused.
> >
> > Yeah, actually we don't use LogicalRepTyp.typoid as well. If we allow
> > to replicate data to an another data type of column, what is the data
> > type mapping on subscriber for? It seems enough to have remote data
> > type oid, namespace and name.
> 
> Yeah, so the meaning of the type map will be disappeared, aren't you?
> I updated the patch with removing of LogicalRepTyp.typoid and changing
> concept "typmap" to "remote type".
> How do you think?

Sorry, Please confirm V7 of patch (attached in this mail).


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/

Attachment

fix_slot_store_error_callback_v7.patch

Re: User defined data types in Logical Replication

From

Petr Jelinek

Date:

18 December 2017, 16:43:52

Hi,

On 18/12/17 14:28, Huong Dangminh wrote:
>>>>
>>>> Your patch is fine for me.
>>>> But logicalrep_typmap_getid will be unused.
>>>
>>> Yeah, actually we don't use LogicalRepTyp.typoid as well. If we allow
>>> to replicate data to an another data type of column, what is the data
>>> type mapping on subscriber for? It seems enough to have remote data
>>> type oid, namespace and name.
>>
>> Yeah, so the meaning of the type map will be disappeared, aren't you?
>> I updated the patch with removing of LogicalRepTyp.typoid and changing
>> concept "typmap" to "remote type".
>> How do you think?
> 
> Sorry, Please confirm V7 of patch (attached in this mail).
> 

I think the changes make sense in terms of how it all works now.

That said I don't think the renaming idea is a good one, the naming was
chosen to be future proof because eventually we'll need to map types to
local oid (and possibly more) where the local info is cached so that we
can interpret binary representation of replicated data (which we'll add
at some point since it's big performance boost).

So I am afraid that if we do the rename of typmap to remotetype in this
patch it will a) make backports of fixes in the related code harder, b)
force us to rename it back again in the future.

I'd keep your general approach but keep using typmap naming.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

RE: User defined data types in Logical Replication

From

Huong Dangminh

Date:

18 December 2017, 17:25:11

Hi Petr Jelineks, Sawada-san

> I think the changes make sense in terms of how it all works now.
> 
> That said I don't think the renaming idea is a good one, the naming was
> chosen to be future proof because eventually we'll need to map types to
> local oid (and possibly more) where the local info is cached so that we
> can interpret binary representation of replicated data (which we'll add
> at some point since it's big performance boost).
> 
> So I am afraid that if we do the rename of typmap to remotetype in this
> patch it will a) make backports of fixes in the related code harder, b)
> force us to rename it back again in the future.

Thanks for your comment.

> I'd keep your general approach but keep using typmap naming.

I update the patch as Petr Jelineks mention, keep using typmap naming.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/

Attachment

fix_slot_store_error_callback_v8.patch

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

20 December 2017, 07:30:47

On Mon, Dec 18, 2017 at 11:25 PM, Huong Dangminh
<huo-dangminh@ys.jp.nec.com> wrote:
>> eventually we'll need to map types to
>> local oid (and possibly more) where the local info is cached so that we
>> can interpret binary representation of replicated data (which we'll add
>> at some point since it's big performance boost).

Sounds good.

>>
>> So I am afraid that if we do the rename of typmap to remotetype in this
>> patch it will a) make backports of fixes in the related code harder, b)
>> force us to rename it back again in the future.
>
> Thanks for your comment.
>
>> I'd keep your general approach but keep using typmap naming.
>
> I update the patch as Petr Jelineks mention, keep using typmap naming.
>

Thank you for updating the patch. Here is a review comment.

-       if (errarg->attnum < 0)
+       rel = errarg->rel;
+       remote_attnum = rel->attrmap[errarg->local_attnum];
+
+       if (remote_attnum < 0)
                return;

I think errarg->local_attnum can be -1 here and access an invalid
address if slot_store_error_callback() is called before setting
errarg.local_attnum. I cannot see such call path in the current code
so far but would need to be fixed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

RE: User defined data types in Logical Replication

From

Huong Dangminh

Date:

20 December 2017, 08:28:39

Hi Sawada-san,

> >> eventually we'll need to map types to local oid (and possibly more)
> >> where the local info is cached so that we can interpret binary
> >> representation of replicated data (which we'll add at some point
> >> since it's big performance boost).
> 
> Sounds good.
> 
> >>
> >> So I am afraid that if we do the rename of typmap to remotetype in
> >> this patch it will a) make backports of fixes in the related code
> >> harder, b) force us to rename it back again in the future.
> >
> > Thanks for your comment.
> >
> >> I'd keep your general approach but keep using typmap naming.
> >
> > I update the patch as Petr Jelineks mention, keep using typmap naming.
> >
> 
> Thank you for updating the patch. Here is a review comment.

Thanks for reviewing.

> -       if (errarg->attnum < 0)
> +       rel = errarg->rel;
> +       remote_attnum = rel->attrmap[errarg->local_attnum];
> +
> +       if (remote_attnum < 0)
>                 return;
> 
> I think errarg->local_attnum can be -1 here and access an invalid address
> if slot_store_error_callback() is called before setting
> errarg.local_attnum. I cannot see such call path in the current code so
> far but would need to be fixed.

I updated the patch to fix it.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/

Attachment

fix_slot_store_error_callback_v9.patch

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

20 December 2017, 10:06:46

On Wed, Dec 20, 2017 at 2:28 PM, Huong Dangminh
<huo-dangminh@ys.jp.nec.com> wrote:
> Hi Sawada-san,
>
>> >> eventually we'll need to map types to local oid (and possibly more)
>> >> where the local info is cached so that we can interpret binary
>> >> representation of replicated data (which we'll add at some point
>> >> since it's big performance boost).
>>
>> Sounds good.
>>
>> >>
>> >> So I am afraid that if we do the rename of typmap to remotetype in
>> >> this patch it will a) make backports of fixes in the related code
>> >> harder, b) force us to rename it back again in the future.
>> >
>> > Thanks for your comment.
>> >
>> >> I'd keep your general approach but keep using typmap naming.
>> >
>> > I update the patch as Petr Jelineks mention, keep using typmap naming.
>> >
>>
>> Thank you for updating the patch. Here is a review comment.
>
> Thanks for reviewing.
>
>> -       if (errarg->attnum < 0)
>> +       rel = errarg->rel;
>> +       remote_attnum = rel->attrmap[errarg->local_attnum];
>> +
>> +       if (remote_attnum < 0)
>>                 return;
>>
>> I think errarg->local_attnum can be -1 here and access an invalid address
>> if slot_store_error_callback() is called before setting
>> errarg.local_attnum. I cannot see such call path in the current code so
>> far but would need to be fixed.
>
> I updated the patch to fix it.
>

Thank you for quick response. The changes look good to me. But I
wonder if the following changes needs some comments to describe what
each checks does for.

-    if (errarg->attnum < 0)
+    if (errarg->local_attnum <0)
+        return;
+
+    rel = errarg->rel;
+    remote_attnum = rel->attrmap[errarg->local_attnum];
+
+    if (remote_attnum < 0)
         return;

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

RE: User defined data types in Logical Replication

From

Huong Dangminh

Date:

20 December 2017, 11:39:46

Hi Sawada-san,

> Thank you for quick response. The changes look good to me. But I wonder
> if the following changes needs some comments to describe what each checks
> does for.
> 
> -    if (errarg->attnum < 0)
> +    if (errarg->local_attnum <0)
> +        return;
> +
> +    rel = errarg->rel;
> +    remote_attnum = rel->attrmap[errarg->local_attnum];
> +
> +    if (remote_attnum < 0)
>          return;

Thanks, I have added some comments as you mentioned.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/

Attachment

fix_slot_store_error_callback_v10.patch

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

21 December 2017, 04:05:35

On Wed, Dec 20, 2017 at 5:39 PM, Huong Dangminh
<huo-dangminh@ys.jp.nec.com> wrote:
> Hi Sawada-san,
>
>> Thank you for quick response. The changes look good to me. But I wonder
>> if the following changes needs some comments to describe what each checks
>> does for.
>>
>> -    if (errarg->attnum < 0)
>> +    if (errarg->local_attnum <0)
>> +        return;
>> +
>> +    rel = errarg->rel;
>> +    remote_attnum = rel->attrmap[errarg->local_attnum];
>> +
>> +    if (remote_attnum < 0)
>>          return;
>
> Thanks, I have added some comments as you mentioned.
>

Thank you for updating the patch.

-       if (errarg->attnum < 0)
+       /* Check case of slot_store_error_callback() is called before
+        * errarg.local_attnum is set. */
+       if (errarg->local_attnum <0)

This comment style isn't preferred by PostgreSQL code. Please refer to
https://www.postgresql.org/docs/current/static/source-format.html

--
$ git diff --check
src/backend/replication/logical/worker.c:291: trailing whitespace.
+       /* Check case of slot_store_error_callback() is called before

There is an extra white space in the patch.

I'm inclined to think SlotErrCallbackArg can have remote_attnum and
pass it to the callback function. That way, we don't need to case the
case where local_attnum is not set yet.
Attached a new version patch incorporated the aboves. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

fix_slot_store_error_callback_v11.patch

Re: User defined data types in Logical Replication

From

Dang Minh Huong

Date:

23 December 2017, 10:08:57

On 2017/12/21 10:05, Masahiko Sawada wrote:

> On Wed, Dec 20, 2017 at 5:39 PM, Huong Dangminh
> <huo-dangminh@ys.jp.nec.com> wrote:
>> Hi Sawada-san,
>>
>>> Thank you for quick response. The changes look good to me. But I wonder
>>> if the following changes needs some comments to describe what each checks
>>> does for.
>>>
>>> -    if (errarg->attnum < 0)
>>> +    if (errarg->local_attnum <0)
>>> +        return;
>>> +
>>> +    rel = errarg->rel;
>>> +    remote_attnum = rel->attrmap[errarg->local_attnum];
>>> +
>>> +    if (remote_attnum < 0)
>>>           return;
>> Thanks, I have added some comments as you mentioned.
>>
> Thank you for updating the patch.
>
> -       if (errarg->attnum < 0)
> +       /* Check case of slot_store_error_callback() is called before
> +        * errarg.local_attnum is set. */
> +       if (errarg->local_attnum <0)
>
> This comment style isn't preferred by PostgreSQL code. Please refer to
> https://www.postgresql.org/docs/current/static/source-format.html
> --
> $ git diff --check
> src/backend/replication/logical/worker.c:291: trailing whitespace.
> +       /* Check case of slot_store_error_callback() is called before
Thanks, I will be careful in the next time.
> There is an extra white space in the patch.
>
> I'm inclined to think SlotErrCallbackArg can have remote_attnum and
> pass it to the callback function. That way, we don't need to case the
> case where local_attnum is not set yet.
Nice.
> Attached a new version patch incorporated the aboves. Please review it.
Thanks for updating the patch.
It looks fine to me.


---
Thanks and best regards,
Dang Minh Huong

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

25 December 2017, 04:58:41

On Sat, Dec 23, 2017 at 4:08 PM, Dang Minh Huong <kakalot49@gmail.com> wrote:
> On 2017/12/21 10:05, Masahiko Sawada wrote:
>
>> On Wed, Dec 20, 2017 at 5:39 PM, Huong Dangminh
>> <huo-dangminh@ys.jp.nec.com> wrote:
>>>
>>> Hi Sawada-san,
>>>
>>>> Thank you for quick response. The changes look good to me. But I wonder
>>>> if the following changes needs some comments to describe what each
>>>> checks
>>>> does for.
>>>>
>>>> -    if (errarg->attnum < 0)
>>>> +    if (errarg->local_attnum <0)
>>>> +        return;
>>>> +
>>>> +    rel = errarg->rel;
>>>> +    remote_attnum = rel->attrmap[errarg->local_attnum];
>>>> +
>>>> +    if (remote_attnum < 0)
>>>>           return;
>>>
>>> Thanks, I have added some comments as you mentioned.
>>>
>> Thank you for updating the patch.
>>
>> -       if (errarg->attnum < 0)
>> +       /* Check case of slot_store_error_callback() is called before
>> +        * errarg.local_attnum is set. */
>> +       if (errarg->local_attnum <0)
>>
>> This comment style isn't preferred by PostgreSQL code. Please refer to
>> https://www.postgresql.org/docs/current/static/source-format.html
>> --
>> $ git diff --check
>> src/backend/replication/logical/worker.c:291: trailing whitespace.
>> +       /* Check case of slot_store_error_callback() is called before
>
> Thanks, I will be careful in the next time.
>>
>> There is an extra white space in the patch.
>>
>> I'm inclined to think SlotErrCallbackArg can have remote_attnum and
>> pass it to the callback function. That way, we don't need to case the
>> case where local_attnum is not set yet.
>
> Nice.
>>
>> Attached a new version patch incorporated the aboves. Please review it.
>
> Thanks for updating the patch.
> It looks fine to me.
>

Thank you for confirmation.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

RE: User defined data types in Logical Replication

From

Huong Dangminh

Date:

25 December 2017, 07:26:11

> From: Masahiko Sawada [mailto:sawada.mshk@gmail.com]
> >> Attached a new version patch incorporated the aboves. Please review it.
> >
> > Thanks for updating the patch.
> > It looks fine to me.

I mean that it passes make check, and subscription TAP tests too.

> >
> 
> Thank you for confirmation.
> 

Also. I have changed status in CF to Ready for Committer.
I would be glad if it can be applied from PostgreSQL 10.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/

Re: User defined data types in Logical Replication

From

Alvaro Herrera

Date:

05 January 2018, 21:53:34

hash_seq_init in logicalrep_typmap_invalidate_cb is useless after your
patch.  If you remove it, the function becomes empty, so why is it there
an invalidation callback at all?

Are we now leaking memory if types keep repeatedly being re-created in
the origin?  I suppose it's not a common use pattern, but it'd be good
to avoid everlasting memleaks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

09 January 2018, 11:22:13

On Sat, Jan 6, 2018 at 3:53 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> hash_seq_init in logicalrep_typmap_invalidate_cb is useless after your
> patch.  If you remove it, the function becomes empty, so why is it there
> an invalidation callback at all?

Thank you for the comment. Yeah, logicalrep_typmap_invalidate_cb is no
longer needed. Attached an updated patch.

> Are we now leaking memory if types keep repeatedly being re-created in
> the origin?

The type name and namespace name in LogicalRepTyp are freed when
updating entries but LogicalRepTyp entry itself could be leaked. It
can happen to relation map as well. Since we don't remove hash entry
during working in the origin the hash map entry for relation map is
leaked if publication repeatedly adds/drops tables and subscription
refreshes it.

> I suppose it's not a common use pattern, but it'd be good
> to avoid everlasting memleaks.

I agree. Can we remove entry from hash table in the callbacks instead
of setting InvalidOid when invalidate caches?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

fix_slot_store_error_callback_v12.patch

Re: User defined data types in Logical Replication

From

atorikoshi

Date:

30 January 2018, 12:36:07

Hi,

It seems to be in the middle of discussion, but I became a reviewer of
this problem several days ago so I've tested the latest patch
'fix_slot_store_error_callback_v12.patch'.

I reproduced the below ERROR by inserting elog() to INPUT function
of CREATE TYPE, and confirmed that above patch solves the problem.

 >>  ERROR:  XX000: cache lookup failed for type XXXXX

I also ran make check-world and there was no error.

Is the only remaining topic memory leaks?

On 2018/01/09 17:22, Masahiko Sawada wrote:

 >> I suppose it's not a common use pattern, but it'd be good
 >> to avoid everlasting memleaks.
 >
 > I agree. Can we remove entry from hash table in the callbacks instead
 > of setting InvalidOid when invalidate caches?

-- 
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

31 January 2018, 04:38:46

On Tue, Jan 30, 2018 at 6:36 PM, atorikoshi
<torikoshi_atsushi_z2@lab.ntt.co.jp> wrote:
> Hi,
>
> It seems to be in the middle of discussion, but I became a reviewer of
> this problem several days ago so I've tested the latest patch
> 'fix_slot_store_error_callback_v12.patch'.
>
> I reproduced the below ERROR by inserting elog() to INPUT function
> of CREATE TYPE, and confirmed that above patch solves the problem.
>
>>>  ERROR:  XX000: cache lookup failed for type XXXXX
>
> I also ran make check-world and there was no error.
>
> Is the only remaining topic memory leaks?
>

Yeah, but I think that the patch for the avoiding memleaks should be a
separate patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: User defined data types in Logical Replication

From

Alvaro Herrera

Date:

06 March 2018, 02:35:22

I noticed that logicalrep_typmap_gettypname's only caller is
slot_store_error_callback, which is an error callback.  So by raising an
error from the former, we would effectively just hide the actual reason
for the error behind the error that the cache entry cannot be found.

Therefore, I'm inclined to make this function raise a warning, then
return a substitute value (something like "unrecognized type XYZ").  Do
the same for the case with the mismatched major versions.  Lastly, if
LogicalRepTypMap is not defined, also throw a warning and return a
substitute value rather than try to initialize the hash table:
initializing the table is not going to make the entry show up in it,
anyway, so it's pretty pointless; and there's plenty chance for things
to go wrong, which is not a good idea in an error callback.

Do we need a new TAP test for this?  Judging by
https://coverage.postgresql.org/src/backend/replication/logical/relation.c.gcov.html
showing all zeroes for the last function, we do.  Same with
slot_store_error_callback in
https://coverage.postgresql.org/src/backend/replication/logical/worker.c.gcov.html

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

06 March 2018, 04:39:16

On Tue, Mar 6, 2018 at 8:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> I noticed that logicalrep_typmap_gettypname's only caller is
> slot_store_error_callback, which is an error callback.  So by raising an
> error from the former, we would effectively just hide the actual reason
> for the error behind the error that the cache entry cannot be found.
>
> Therefore, I'm inclined to make this function raise a warning, then
> return a substitute value (something like "unrecognized type XYZ").  Do
> the same for the case with the mismatched major versions.  Lastly, if
> LogicalRepTypMap is not defined, also throw a warning and return a
> substitute value rather than try to initialize the hash table:
> initializing the table is not going to make the entry show up in it,
> anyway, so it's pretty pointless; and there's plenty chance for things
> to go wrong, which is not a good idea in an error callback.

I agree with you about not hiding the actual reason for the error but
if we raise a warning at logicalrep_typmap_gettypname don't we call
slot_store_error_callback recursively?

>
> Do we need a new TAP test for this?  Judging by
> https://coverage.postgresql.org/src/backend/replication/logical/relation.c.gcov.html
> showing all zeroes for the last function, we do.  Same with
> slot_store_error_callback in
> https://coverage.postgresql.org/src/backend/replication/logical/worker.c.gcov.html
>

Agreed. Will add a TAP test.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: User defined data types in Logical Replication

From

Alvaro Herrera

Date:

06 March 2018, 20:52:21

Masahiko Sawada wrote:
> On Tue, Mar 6, 2018 at 8:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> > Therefore, I'm inclined to make this function raise a warning, then
> > return a substitute value (something like "unrecognized type XYZ").
> > [...]
> 
> I agree with you about not hiding the actual reason for the error but
> if we raise a warning at logicalrep_typmap_gettypname don't we call
> slot_store_error_callback recursively?

Hmm, now that you mention it, I don't really know.  I think it's
supposed not to happen, since calling ereport() again opens a new
recursion level, but then maybe errcontext doesn't depend on the
recursion level ... I haven't checked.  This is why the TAP test would
be handy :-)

> Agreed. Will add a TAP test.

Great.  This patch waits on that, then.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

07 March 2018, 15:19:02

On Wed, Mar 7, 2018 at 2:52 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Masahiko Sawada wrote:
>> On Tue, Mar 6, 2018 at 8:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
>> > Therefore, I'm inclined to make this function raise a warning, then
>> > return a substitute value (something like "unrecognized type XYZ").
>> > [...]
>>
>> I agree with you about not hiding the actual reason for the error but
>> if we raise a warning at logicalrep_typmap_gettypname don't we call
>> slot_store_error_callback recursively?
>
> Hmm, now that you mention it, I don't really know.  I think it's
> supposed not to happen, since calling ereport() again opens a new
> recursion level, but then maybe errcontext doesn't depend on the
> recursion level ... I haven't checked.  This is why the TAP test would
> be handy :-)

The calling ereport opens a new recursion level. The calling ereport
with error doesn't return to caller but the calling with warning does.
So the recursively calling ereport(WARNING) ends up with exceeding the
errordata stack size. So it seems to me that we can set errcontext in
logicalrep_typmap_gettypname() instead of raising warning or error.

>
>> Agreed. Will add a TAP test.
>
> Great.  This patch waits on that, then.
>

Okay. I think the most simple and convenient way to reproduce this
issue is to call an elog(LOG) in input function of a user-defined data
type. So I'm thinking to create the test in src/test/module directory.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

14 March 2018, 05:18:38

On Wed, Mar 7, 2018 at 9:19 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Wed, Mar 7, 2018 at 2:52 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> Masahiko Sawada wrote:
>>> On Tue, Mar 6, 2018 at 8:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>
>>> > Therefore, I'm inclined to make this function raise a warning, then
>>> > return a substitute value (something like "unrecognized type XYZ").
>>> > [...]
>>>
>>> I agree with you about not hiding the actual reason for the error but
>>> if we raise a warning at logicalrep_typmap_gettypname don't we call
>>> slot_store_error_callback recursively?
>>
>> Hmm, now that you mention it, I don't really know.  I think it's
>> supposed not to happen, since calling ereport() again opens a new
>> recursion level, but then maybe errcontext doesn't depend on the
>> recursion level ... I haven't checked.  This is why the TAP test would
>> be handy :-)
>
> The calling ereport opens a new recursion level. The calling ereport
> with error doesn't return to caller but the calling with warning does.
> So the recursively calling ereport(WARNING) ends up with exceeding the
> errordata stack size. So it seems to me that we can set errcontext in
> logicalrep_typmap_gettypname() instead of raising warning or error.
>
>>
>>> Agreed. Will add a TAP test.
>>
>> Great.  This patch waits on that, then.
>>
>
> Okay. I think the most simple and convenient way to reproduce this
> issue is to call an elog(LOG) in input function of a user-defined data
> type. So I'm thinking to create the test in src/test/module directory.
>

After more thought, I think since the errors in
logicalrep_typmap_gettypname are not relevant with the actual error
(i.g. type conversion error) it would not be good idea to show the
error message of "could not found data type entry" in errcontext.
It might be more appropriate if we return a substitute value
("unrecognized type" or "unrecognized built-in type") without raising
neither an error nor a warning. Thoughts?

Regarding to regression test, I added a new test module
test_subscription that creates a new user-defined data type. In a
subscription regression test, using test_subscription we make
subscriber call slot_store_error_callback and check if the subscriber
can correctly look up both remote and local data type strings. One
downside of this regression test is that in a failure case, the
duration of the test will be long (up to 180sec) because it has to
wait for the polling timeout.
Attached an updated patch with a regression test.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

fix_slot_store_error_callback_v13.patch

Re: User defined data types in Logical Replication

From

Alvaro Herrera

Date:

14 March 2018, 20:47:53

Masahiko Sawada wrote:

> After more thought, I think since the errors in
> logicalrep_typmap_gettypname are not relevant with the actual error
> (i.g. type conversion error) it would not be good idea to show the
> error message of "could not found data type entry" in errcontext.
> It might be more appropriate if we return a substitute value
> ("unrecognized type" or "unrecognized built-in type") without raising
> neither an error nor a warning. Thoughts?

Yeah, seems a reasonable idea to me.  Here's a tidied-up version of your
patch, minus the regression test changes (I may end up committing that
one separately).  But I now hesitate to push it because I'm unsure of
how does type mapping actually work, and whether it's still working
after this patch -- for example if we create two user-defined datatypes
in opposite orders in the nodes (so they get different OIDs), are we
able to replicate data correctly from one side to the other?  If there's
code to support this case, it is not at all obvious where it is.

> Regarding to regression test, I added a new test module
> test_subscription that creates a new user-defined data type.

I think this is a good thing to add.  I wonder if we could have the
module create two extensions, and have the TAP test create first A then
B in one node, and first B in the other node, replicate a table that
uses type A, see how it fails, then create type A and ensure replication
works correctly.  (Ideally the types would not be binary compatible, so
that it'd not work by accident.)

> One downside of this regression test is that in a failure case, the
> duration of the test will be long (up to 180sec) because it has to
> wait for the polling timeout.

This is bad.  I haven't read it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: User defined data types in Logical Replication

From

Alvaro Herrera

Date:

14 March 2018, 22:15:52

Alvaro Herrera wrote:

> Yeah, seems a reasonable idea to me.  Here's a tidied-up version of your
> patch, minus the regression test changes (I may end up committing that
> one separately).

... and patch.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

0001-logical-replication-change-how-datatype-mapping-is-u.patch

Re: User defined data types in Logical Replication

From

Alvaro Herrera

Date:

15 March 2018, 03:41:03

Masahiko Sawada wrote:

> Regarding to regression test, I added a new test module
> test_subscription that creates a new user-defined data type. In a
> subscription regression test, using test_subscription we make
> subscriber call slot_store_error_callback and check if the subscriber
> can correctly look up both remote and local data type strings. One
> downside of this regression test is that in a failure case, the
> duration of the test will be long (up to 180sec) because it has to
> wait for the polling timeout.
> Attached an updated patch with a regression test.

Pushed the fix to pg10 and master.  Thanks to all involved for the
report, patches and review.

Here's the regression test patch.  The problem with it is that the TAP
test is not verifying much -- I tried applying it before the fix commit,
and it succeeds!  The only funny is that the errcontext messages are
wrong, they look like this:

2018-03-14 20:31:03.564 -03 [763018] LOG:  input int: 1
2018-03-14 20:31:03.564 -03 [763018] CONTEXT:  processing remote data for replication target relation "public.test"
column"b", remote type dummytext, local type dummyint

2018-03-14 20:31:03.564 -03 [763018] LOG:  input text: "one"
2018-03-14 20:31:03.564 -03 [763018] CONTEXT:  processing remote data for replication target relation "public.test"
column"a", remote type dummyint, local type dummytext

but I think it would be better to verify them.  (With your version I
think you were trusting that the OID would not match anything, giving
raise to the "cache lookup failed" error before the patch.  I'm not sure
that that's very trustworthy either.)

I think this is a worthwhile test, but IMO it should be improved a bit
before we include it.  Also, we can come up with a better name for the
test surely, not just refer to this particular bug. Maybe "typemap".

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: User defined data types in Logical Replication

From

Alvaro Herrera

Date:

15 March 2018, 03:43:59


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

0001-test-module.patch

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

16 March 2018, 03:54:56

On Thu, Mar 15, 2018 at 9:41 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Masahiko Sawada wrote:
>
>> Regarding to regression test, I added a new test module
>> test_subscription that creates a new user-defined data type. In a
>> subscription regression test, using test_subscription we make
>> subscriber call slot_store_error_callback and check if the subscriber
>> can correctly look up both remote and local data type strings. One
>> downside of this regression test is that in a failure case, the
>> duration of the test will be long (up to 180sec) because it has to
>> wait for the polling timeout.
>> Attached an updated patch with a regression test.
>
> Pushed the fix to pg10 and master.  Thanks to all involved for the
> report, patches and review.

Thank you for committing!

>
> Here's the regression test patch.  The problem with it is that the TAP
> test is not verifying much -- I tried applying it before the fix commit,
> and it succeeds!  The only funny is that the errcontext messages are
> wrong, they look like this:
>
> 2018-03-14 20:31:03.564 -03 [763018] LOG:  input int: 1
> 2018-03-14 20:31:03.564 -03 [763018] CONTEXT:  processing remote data for replication target relation "public.test"
column"b", remote type dummytext, local type dummyint
 
> 2018-03-14 20:31:03.564 -03 [763018] LOG:  input text: "one"
> 2018-03-14 20:31:03.564 -03 [763018] CONTEXT:  processing remote data for replication target relation "public.test"
column"a", remote type dummyint, local type dummytext
 
>
> but I think it would be better to verify them.  (With your version I
> think you were trusting that the OID would not match anything, giving
> raise to the "cache lookup failed" error before the patch.  I'm not sure
> that that's very trustworthy either.)

Agreed. Also, since my patch attempted to lead the error we need a
long time to check if it failed, which is not good.

>
> I think this is a worthwhile test, but IMO it should be improved a bit
> before we include it.  Also, we can come up with a better name for the
> test surely, not just refer to this particular bug. Maybe "typemap".
>

It might be useful if we have the facility of TAP test to check the
log message and regexp-match the message to the expected string.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: User defined data types in Logical Replication

From

Alvaro Herrera

Date:

16 March 2018, 04:24:36

Masahiko Sawada wrote:
> On Thu, Mar 15, 2018 at 9:41 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> > I think this is a worthwhile test, but IMO it should be improved a bit
> > before we include it.  Also, we can come up with a better name for the
> > test surely, not just refer to this particular bug. Maybe "typemap".
> 
> It might be useful if we have the facility of TAP test to check the
> log message and regexp-match the message to the expected string.

Something similar to PostgresNode::issues_sql_like() perhaps?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

RE: User defined data types in Logical Replication

From

Huong Dangminh

Date:

17 March 2018, 04:47:38

> Masahiko Sawada wrote:
> 
> > Regarding to regression test, I added a new test module
> > test_subscription that creates a new user-defined data type. In a
> > subscription regression test, using test_subscription we make
> > subscriber call slot_store_error_callback and check if the subscriber
> > can correctly look up both remote and local data type strings. One
> > downside of this regression test is that in a failure case, the
> > duration of the test will be long (up to 180sec) because it has to
> > wait for the polling timeout.
> > Attached an updated patch with a regression test.
> 
> Pushed the fix to pg10 and master.  Thanks to all involved for the report,
> patches and review.
> 

Thank you all for fixing it.


---
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

19 March 2018, 06:50:55

On Fri, Mar 16, 2018 at 10:24 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
> Masahiko Sawada wrote:
>> On Thu, Mar 15, 2018 at 9:41 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
>> > I think this is a worthwhile test, but IMO it should be improved a bit
>> > before we include it.  Also, we can come up with a better name for the
>> > test surely, not just refer to this particular bug. Maybe "typemap".
>>
>> It might be useful if we have the facility of TAP test to check the
>> log message and regexp-match the message to the expected string.
>
> Something similar to PostgresNode::issues_sql_like() perhaps?
>

Yeah, I didn't know that but I think it's a good idea. Unlike
issues_sql_like() we don't issue anything to the subscriber. So maybe
we need truncate logfile before insertion and verify logfile of
particular period. The test code would be like follows.

$node_subscriber->safe_psql('postgres', 'CREATE SUBSCRIPTION...");
truncate $node_subscriber->logfile, 0;
$node_publisher->safe_psql('postgres', 'INSERT .. ')
my $log = TestLib::slurp_file($node_subscriber->logfile);

# Verify logs
like($log, qr/processing remote data for replication target relation
"public.test" column "b", remote type dummyint, local type dummyint/,
'callback function of datatype conversion1');
like($log, qr/processing remote data for replication target relation
"public.test" column "a", remote type dummytext, local type
dummytext/, 'callback function of datatype conversion2');

Thoughts?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: User defined data types in Logical Replication

From

Kyotaro HORIGUCHI

Date:

19 March 2018, 10:11:39

At Mon, 19 Mar 2018 12:50:55 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
<CAD21AoD=kiDVxfZ2aT_Oeg7+5etkxg0eqmsRE-gcUbptKNir6g@mail.gmail.com>
> On Fri, Mar 16, 2018 at 10:24 AM, Alvaro Herrera
> <alvherre@alvh.no-ip.org> wrote:
> > Masahiko Sawada wrote:
> >> On Thu, Mar 15, 2018 at 9:41 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >
> >> > I think this is a worthwhile test, but IMO it should be improved a bit
> >> > before we include it.  Also, we can come up with a better name for the
> >> > test surely, not just refer to this particular bug. Maybe "typemap".
> >>
> >> It might be useful if we have the facility of TAP test to check the
> >> log message and regexp-match the message to the expected string.
> >
> > Something similar to PostgresNode::issues_sql_like() perhaps?
> >
> 
> Yeah, I didn't know that but I think it's a good idea. Unlike
> issues_sql_like() we don't issue anything to the subscriber. So maybe
> we need truncate logfile before insertion and verify logfile of
> particular period. The test code would be like follows.
> 
> $node_subscriber->safe_psql('postgres', 'CREATE SUBSCRIPTION...");
> truncate $node_subscriber->logfile, 0;
> $node_publisher->safe_psql('postgres', 'INSERT .. ')
> my $log = TestLib::slurp_file($node_subscriber->logfile);
> 
> # Verify logs
> like($log, qr/processing remote data for replication target relation
> "public.test" column "b", remote type dummyint, local type dummyint/,
> 'callback function of datatype conversion1');
> like($log, qr/processing remote data for replication target relation
> "public.test" column "a", remote type dummytext, local type
> dummytext/, 'callback function of datatype conversion2');
> 
> Thoughts?

FWIW something like is in a currently proposed patch.

https://www.postgresql.org/message-id/20180129.194023.228030941.horiguchi.kyotaro@lab.ntt.co.jp

0003 contains the following function.

+# find $pat in logfile of $node after $off-th byte
+sub find_in_log
+{
+    my ($node, $pat, $off) = @_;
+
+    $off = 0 unless defined $off;
+    my $log = TestLib::slurp_file($node->logfile);
+    return 0 if (length($log) <= $off);
+
+    $log = substr($log, $off);
+
+    return $log =~ m/$pat/;
+}

It is used as the follows.

+$logstart = get_log_size($node_standby);
+$node_standby->start;
+
+my $failed = 0;
+for (my $i = 0 ; $i < 10000 ; $i++)
+{
+    if (find_in_log($node_standby,
+                    "requested WAL segment [0-9A-F]+ has already been removed",
+                    $logstart))
+    {
+        $failed = 1;
+        last;
+    }
+    usleep(100_000);
+}
+ok($failed, 'check replication has been broken');

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

19 March 2018, 13:57:48

On Mon, Mar 19, 2018 at 12:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Mar 16, 2018 at 10:24 AM, Alvaro Herrera
> <alvherre@alvh.no-ip.org> wrote:
>> Masahiko Sawada wrote:
>>> On Thu, Mar 15, 2018 at 9:41 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>
>>> > I think this is a worthwhile test, but IMO it should be improved a bit
>>> > before we include it.  Also, we can come up with a better name for the
>>> > test surely, not just refer to this particular bug. Maybe "typemap".
>>>
>>> It might be useful if we have the facility of TAP test to check the
>>> log message and regexp-match the message to the expected string.
>>
>> Something similar to PostgresNode::issues_sql_like() perhaps?
>>
>
> Yeah, I didn't know that but I think it's a good idea. Unlike
> issues_sql_like() we don't issue anything to the subscriber. So maybe
> we need truncate logfile before insertion and verify logfile of
> particular period. The test code would be like follows.
>
> $node_subscriber->safe_psql('postgres', 'CREATE SUBSCRIPTION...");
> truncate $node_subscriber->logfile, 0;
> $node_publisher->safe_psql('postgres', 'INSERT .. ')
> my $log = TestLib::slurp_file($node_subscriber->logfile);
>
> # Verify logs
> like($log, qr/processing remote data for replication target relation
> "public.test" column "b", remote type dummyint, local type dummyint/,
> 'callback function of datatype conversion1');
> like($log, qr/processing remote data for replication target relation
> "public.test" column "a", remote type dummytext, local type
> dummytext/, 'callback function of datatype conversion2');
>
> Thoughts?
>

Attached an updated test patch added the above test(0002 patch). Since
for this test case it's enough to use existing test functions I didn't
create new test functions. Also I found that the local data type name
in log for data type conversion isn't qualified whereas the remote
data type is always qualified. Attached 0001 patch fixes that.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: User defined data types in Logical Replication

From

Masahiko Sawada

Date:

11 April 2018, 04:16:44

On Mon, Mar 19, 2018 at 7:57 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Mon, Mar 19, 2018 at 12:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> On Fri, Mar 16, 2018 at 10:24 AM, Alvaro Herrera
>> <alvherre@alvh.no-ip.org> wrote:
>>> Masahiko Sawada wrote:
>>>> On Thu, Mar 15, 2018 at 9:41 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>>
>>>> > I think this is a worthwhile test, but IMO it should be improved a bit
>>>> > before we include it.  Also, we can come up with a better name for the
>>>> > test surely, not just refer to this particular bug. Maybe "typemap".
>>>>
>>>> It might be useful if we have the facility of TAP test to check the
>>>> log message and regexp-match the message to the expected string.
>>>
>>> Something similar to PostgresNode::issues_sql_like() perhaps?
>>>
>>
>> Yeah, I didn't know that but I think it's a good idea. Unlike
>> issues_sql_like() we don't issue anything to the subscriber. So maybe
>> we need truncate logfile before insertion and verify logfile of
>> particular period. The test code would be like follows.
>>
>> $node_subscriber->safe_psql('postgres', 'CREATE SUBSCRIPTION...");
>> truncate $node_subscriber->logfile, 0;
>> $node_publisher->safe_psql('postgres', 'INSERT .. ')
>> my $log = TestLib::slurp_file($node_subscriber->logfile);
>>
>> # Verify logs
>> like($log, qr/processing remote data for replication target relation
>> "public.test" column "b", remote type dummyint, local type dummyint/,
>> 'callback function of datatype conversion1');
>> like($log, qr/processing remote data for replication target relation
>> "public.test" column "a", remote type dummytext, local type
>> dummytext/, 'callback function of datatype conversion2');
>>
>> Thoughts?
>>
>
> Attached an updated test patch added the above test(0002 patch). Since
> for this test case it's enough to use existing test functions I didn't
> create new test functions. Also I found that the local data type name
> in log for data type conversion isn't qualified whereas the remote
> data type is always qualified. Attached 0001 patch fixes that.
>

The original issue has been fixed and this entry on CommitFest has
been marked as "Committed" but there are still works for improving
testing. Perhaps I should register a new entry of remaining patches to
next CommitFest.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: User defined data types in Logical Replication

From

Đặng Minh Hướng

Date:

11 April 2018, 15:26:54

2018-04-11 10:16 GMT+09:00 Masahiko Sawada <sawada.mshk@gmail.com>:

On Mon, Mar 19, 2018 at 7:57 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Attached an updated test patch added the above test(0002 patch). Since
> for this test case it's enough to use existing test functions I didn't
> create new test functions. Also I found that the local data type name
> in log for data type conversion isn't qualified whereas the remote
> data type is always qualified. Attached 0001 patch fixes that.
>

The original issue has been fixed and this entry on CommitFest has
been marked as "Committed" but there are still works for improving
testing. Perhaps I should register a new entry of remaining patches to
next CommitFest.

Thanks. I appreciate it.

---

Dang Minh Huong