Thread: RE: Re: User defined data types in Logical Replication
Sawada-san, Thanks for your response. # And sorry again because I could not reply to your gmail # address from my environment due to security restriction. > >> We are getting the bellow error while trying use Logical Replication > >> with user defined data types in a C program (when call elog function). > >> > >> ERROR: XX000: cache lookup failed for type XXXXX > >> > > > > Sorry for continuously disturbing in this topic, but am I missing something > here? > > No, but I'd suggest to provide a procedure for reproducing if possible, > which will be helpful for investigation. Sorry, I will be careful next time. > > I mean that in case of type's OID in PUBLICATION host does not exists > > in SUBSCRIPTION host's pg_type, it could returns unintended error (the > XX000 above) when elog or ereport is executed. > > > > For more details, it happen in slot_store_error_callback when it try to > call format_type_be(localtypoid) for errcontext. > > slot_store_error_callback is set in slot_store_cstrings, > slot_modify_cstrings function and it also be unset here, so the effect here > is small but it happens. > > > > I think I found out the cause of this issue, and this is a bug. This can > be reproduced, for example, if the input function of the data type calls > elog() during applying on the environment where OIDs of the data type on > publisher and subscriber are different. The cause of this issue is that > we call format_type_be() with remotetypoid. If the OIDs of data type on > publisher and subscriber are different we search it from syscache by the > OID that doesn't exist on subscriber. Yes, I also think that. > On detail of your patch, I don't think this direction is good. Since the > subscriber already has a LogicalRepTyp cache entry for the type we can report > the error message using the data type name. So I think this issue can be > fixed by using the remote type name got from the cache. Thanks, I did not realize the LogicalRepRelMapEntry, remote type name is already here. > Also I'm confused about the message of errcontext; currently we store the > local data type OID corresponding to the remote data type name into the > cache, and then we search the local data type name by the local data type > OID stored in the cache. So it means the both the local data type OID and > the remote data type OID always imply the same data type. We use the both > data type OIDs for log message in slot_store_error_callback, but I think > what the function want to do is to show the different type names if the > table definitions on both server are different (e.g. sending jsonb column > data to text column data). I think we should use the type of the local relation > attribute rather than remote's one. > > Attached draft patch fixed this issue, at least on my environment. It works good for me. > Please review it. I will review it soon. --- Thanks and best regards, Dang Minh Huong NEC Solution Innovators, Ltd. http://www.nec-solutioninnovators.co.jp/en/
Sorry for not replying sooner. > > Attached draft patch fixed this issue, at least on my environment. > > It works good for me. > > > Please review it. > > I will review it soon. There is one more case that user-defined data type is not supported in Logical Replication. That is when remote data type's name does not exist in SUBSCRIBE. In relation.c:logicalrep_typmap_gettypname We search OID in syscache by remote's data type name and mapping it, if it does not exist in syscache We will be faced with the bellow error. if (!OidIsValid(entry->typoid)) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("data type \"%s.%s\" required for logical replication does not exist", entry->nspname, entry->typname))); I think, it is not necessary to check typoid here in order to avoid above case, is that right? I attached a patch based on Sawada-san's patch with a bit of messages modified and remove the above check. Could somebody check it for me or should I add it into CF? --- Thanks and best regards, Dang Minh Huong NEC Solution Innovators, Ltd. http://www.nec-solutioninnovators.co.jp/en/
Attachment
Hi, > From: Huong Dangminh [mailto:huo-dangminh@ys.jp.nec.com] > I attached a patch based on Sawada-san's patch with a bit of messages > modified and remove the above check. > Could somebody check it for me or should I add it into CF? Sorry, I have added this thread to the next CF. --- Thanks and best regards, Dang Minh Huong NEC Solution Innovators, Ltd. http://www.nec-solutioninnovators.co.jp/en/
On Wed, Nov 22, 2017 at 12:25 AM, Huong Dangminh <huo-dangminh@ys.jp.nec.com> wrote: > Thanks for your response. > # And sorry again because I could not reply to your gmail > # address from my environment due to security restriction. It's okay. I can understand your environment :-) > Sorry for not replying sooner. > >> > Attached draft patch fixed this issue, at least on my environment. >> >> It works good for me. >> >> > Please review it. >> >> I will review it soon. > > There is one more case that user-defined data type is not supported in Logical Replication. > That is when remote data type's name does not exist in SUBSCRIBE. > > In relation.c:logicalrep_typmap_gettypname > We search OID in syscache by remote's data type name and mapping it, if it does not exist in syscache > We will be faced with the bellow error. > > if (!OidIsValid(entry->typoid)) > ereport(ERROR, > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("data type \"%s.%s\" required for logical replication does not exist", > entry->nspname, entry->typname))); > > I think, it is not necessary to check typoid here in order to avoid above case, is that right? I think it's not right. We should end up with an error in the case where the same type name doesn't exist on subscriber. With your proposed patch, logicalrep_typmap_gettypname() can return an invalid string (entry->typname) in that case, which can be a cause of SEGV. > Sorry, I have added this thread to the next CF. Thank you for adding it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi Sawada-san,
Sorry for my late response.
On 2017/12/05 0:11, Masahiko Sawada wrote:
Thanks, I think you are right.I think it's not right. We should end up with an error in the case where the same type name doesn't exist on subscriber. With your proposed patch, logicalrep_typmap_gettypname() can return an invalid string (entry->typname) in that case, which can be a cause of SEGV.There is one more case that user-defined data type is not supported in Logical Replication. That is when remote data type's name does not exist in SUBSCRIBE. In relation.c:logicalrep_typmap_gettypname We search OID in syscache by remote's data type name and mapping it, if it does not exist in syscache We will be faced with the bellow error. if (!OidIsValid(entry->typoid)) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("data type \"%s.%s\" required for logical replication does not exist", entry->nspname, entry->typname))); I think, it is not necessary to check typoid here in order to avoid above case, is that right?
# I thought that entry->typname was received from walsender, and it is already be qualified in logicalrep_write_typ.
# But we also need check it in subscriber, because it could be lost info in transmission.
Also we do not need to fix for the case above too, because user can change type name by ALTER TYPE statement.
I attached the patch, which based on your patch with a bit of modified messages.
---
Thanks and best regards,
Dang Minh Huong
Attachment
On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <kakalot49@gmail.com> wrote: > Hi Sawada-san, > > Sorry for my late response. > > On 2017/12/05 0:11, Masahiko Sawada wrote: > > There is one more case that user-defined data type is not supported in > Logical Replication. > That is when remote data type's name does not exist in SUBSCRIBE. > > In relation.c:logicalrep_typmap_gettypname > We search OID in syscache by remote's data type name and mapping it, if it > does not exist in syscache > We will be faced with the bellow error. > > if (!OidIsValid(entry->typoid)) > ereport(ERROR, > > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > errmsg("data type \"%s.%s\" required for > logical replication does not exist", > entry->nspname, > entry->typname))); > > I think, it is not necessary to check typoid here in order to avoid above > case, is that right? > > I think it's not right. We should end up with an error in the case where the > same type name doesn't exist on subscriber. With your proposed patch, > logicalrep_typmap_gettypname() can return an invalid string (entry->typname) > in that case, which can be a cause of SEGV. > > Thanks, I think you are right. > # I thought that entry->typname was received from walsender, and it is > already be qualified in logicalrep_write_typ. > # But we also need check it in subscriber, because it could be lost info in > transmission. Oops, the last sentence of my previous mail was wrong. logicalrep_typmap_gettypname() doesn't return an invalid string since entry->typname is initialized with a type name got from wal sender. After more thought, we might not need to raise an error even if there is not the same data type on both publisher and subscriber. Because data is sent after converted to the text representation and is converted to a data type according to the local table definition subscribers don't always need to have the same data type. If it's right, slot_store_error_callback() doesn't need to find a corresponding local data type OID but just finds the corresponding type name by remote data type OID. What do you think? --- a/src/backend/replication/logical/worker.c +++ b/src/backend/replication/logical/worker.c @@ -100,8 +100,9 @@ static dlist_head lsn_mapping = DLIST_STATIC_INIT(lsn_mapping); typedef struct SlotErrCallbackArg { - LogicalRepRelation *rel; - int attnum; + LogicalRepRelMapEntry *rel; + int remote_attnum; + int local_attnum; } SlotErrCallbackArg; Since LogicalRepRelMapEntry has a map of local attributes to remote ones we don't need to have two attribute numbers. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <kakalot49@gmail.com> wrote: >> Hi Sawada-san, >> >> Sorry for my late response. >> >> On 2017/12/05 0:11, Masahiko Sawada wrote: >> >> There is one more case that user-defined data type is not supported in >> Logical Replication. >> That is when remote data type's name does not exist in SUBSCRIBE. >> >> In relation.c:logicalrep_typmap_gettypname >> We search OID in syscache by remote's data type name and mapping it, if it >> does not exist in syscache >> We will be faced with the bellow error. >> >> if (!OidIsValid(entry->typoid)) >> ereport(ERROR, >> >> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), >> errmsg("data type \"%s.%s\" required for >> logical replication does not exist", >> entry->nspname, >> entry->typname))); >> >> I think, it is not necessary to check typoid here in order to avoid above >> case, is that right? >> >> I think it's not right. We should end up with an error in the case where the >> same type name doesn't exist on subscriber. With your proposed patch, >> logicalrep_typmap_gettypname() can return an invalid string (entry->typname) >> in that case, which can be a cause of SEGV. >> >> Thanks, I think you are right. >> # I thought that entry->typname was received from walsender, and it is >> already be qualified in logicalrep_write_typ. >> # But we also need check it in subscriber, because it could be lost info in >> transmission. > > Oops, the last sentence of my previous mail was wrong. > logicalrep_typmap_gettypname() doesn't return an invalid string since > entry->typname is initialized with a type name got from wal sender. > > After more thought, we might not need to raise an error even if there > is not the same data type on both publisher and subscriber. Because > data is sent after converted to the text representation and is > converted to a data type according to the local table definition > subscribers don't always need to have the same data type. If it's > right, slot_store_error_callback() doesn't need to find a > corresponding local data type OID but just finds the corresponding > type name by remote data type OID. What do you think? > > --- a/src/backend/replication/logical/worker.c > +++ b/src/backend/replication/logical/worker.c > @@ -100,8 +100,9 @@ static dlist_head lsn_mapping = > DLIST_STATIC_INIT(lsn_mapping); > > typedef struct SlotErrCallbackArg > { > - LogicalRepRelation *rel; > - int attnum; > + LogicalRepRelMapEntry *rel; > + int remote_attnum; > + int local_attnum; > } SlotErrCallbackArg; > > Since LogicalRepRelMapEntry has a map of local attributes to remote > ones we don't need to have two attribute numbers. > Attached the patch incorporated what I have on mind. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Hi Sawada-san, > On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada <sawada.mshk@gmail.com> > wrote: > > On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <kakalot49@gmail.com> > wrote: > >> Hi Sawada-san, > >> > >> Sorry for my late response. > >> > >> On 2017/12/05 0:11, Masahiko Sawada wrote: > >> > >> There is one more case that user-defined data type is not supported > >> in Logical Replication. > >> That is when remote data type's name does not exist in SUBSCRIBE. > >> > >> In relation.c:logicalrep_typmap_gettypname > >> We search OID in syscache by remote's data type name and mapping it, > >> if it does not exist in syscache We will be faced with the bellow > >> error. > >> > >> if (!OidIsValid(entry->typoid)) > >> ereport(ERROR, > >> > >> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > >> errmsg("data type \"%s.%s\" required > >> for logical replication does not exist", > >> entry->nspname, > >> entry->typname))); > >> > >> I think, it is not necessary to check typoid here in order to avoid > >> above case, is that right? > >> > >> I think it's not right. We should end up with an error in the case > >> where the same type name doesn't exist on subscriber. With your > >> proposed patch, > >> logicalrep_typmap_gettypname() can return an invalid string > >> (entry->typname) in that case, which can be a cause of SEGV. > >> > >> Thanks, I think you are right. > >> # I thought that entry->typname was received from walsender, and it > >> is already be qualified in logicalrep_write_typ. > >> # But we also need check it in subscriber, because it could be lost > >> info in transmission. > > > > Oops, the last sentence of my previous mail was wrong. > > logicalrep_typmap_gettypname() doesn't return an invalid string since > > entry->typname is initialized with a type name got from wal sender. > > > > After more thought, we might not need to raise an error even if there > > is not the same data type on both publisher and subscriber. Because > > data is sent after converted to the text representation and is > > converted to a data type according to the local table definition > > subscribers don't always need to have the same data type. If it's > > right, slot_store_error_callback() doesn't need to find a > > corresponding local data type OID but just finds the corresponding > > type name by remote data type OID. What do you think? > > > > --- a/src/backend/replication/logical/worker.c > > +++ b/src/backend/replication/logical/worker.c > > @@ -100,8 +100,9 @@ static dlist_head lsn_mapping = > > DLIST_STATIC_INIT(lsn_mapping); > > > > typedef struct SlotErrCallbackArg > > { > > - LogicalRepRelation *rel; > > - int attnum; > > + LogicalRepRelMapEntry *rel; > > + int remote_attnum; > > + int local_attnum; > > } SlotErrCallbackArg; > > > > Since LogicalRepRelMapEntry has a map of local attributes to remote > > ones we don't need to have two attribute numbers. > > Sorry for the late reply. > Attached the patch incorporated what I have on mind. Please review it. Thanks for the patch, I will do it at this weekend. --- Thanks and best regards, Dang Minh Huong NEC Solution Innovators, Ltd. http://www.nec-solutioninnovators.co.jp/en/
On 2017/12/08 13:18, Huong Dangminh wrote: > Hi Sawada-san, > >> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada <sawada.mshk@gmail.com> >> wrote: >>> On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <kakalot49@gmail.com> >> wrote: >>>> Hi Sawada-san, >>>> >>>> Sorry for my late response. >>>> >>>> On 2017/12/05 0:11, Masahiko Sawada wrote: >>>> >>>> There is one more case that user-defined data type is not supported >>>> in Logical Replication. >>>> That is when remote data type's name does not exist in SUBSCRIBE. >>>> >>>> In relation.c:logicalrep_typmap_gettypname >>>> We search OID in syscache by remote's data type name and mapping it, >>>> if it does not exist in syscache We will be faced with the bellow >>>> error. >>>> >>>> if (!OidIsValid(entry->typoid)) >>>> ereport(ERROR, >>>> >>>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), >>>> errmsg("data type \"%s.%s\" required >>>> for logical replication does not exist", >>>> entry->nspname, >>>> entry->typname))); >>>> >>>> I think, it is not necessary to check typoid here in order to avoid >>>> above case, is that right? >>>> >>>> I think it's not right. We should end up with an error in the case >>>> where the same type name doesn't exist on subscriber. With your >>>> proposed patch, >>>> logicalrep_typmap_gettypname() can return an invalid string >>>> (entry->typname) in that case, which can be a cause of SEGV. >>>> >>>> Thanks, I think you are right. >>>> # I thought that entry->typname was received from walsender, and it >>>> is already be qualified in logicalrep_write_typ. >>>> # But we also need check it in subscriber, because it could be lost >>>> info in transmission. >>> Oops, the last sentence of my previous mail was wrong. >>> logicalrep_typmap_gettypname() doesn't return an invalid string since >>> entry->typname is initialized with a type name got from wal sender. Yeah, so we do not need to check the existing of publisher's type name in subscriber. >>> After more thought, we might not need to raise an error even if there >>> is not the same data type on both publisher and subscriber. Because >>> data is sent after converted to the text representation and is >>> converted to a data type according to the local table definition >>> subscribers don't always need to have the same data type. If it's >>> right, slot_store_error_callback() doesn't need to find a >>> corresponding local data type OID but just finds the corresponding >>> type name by remote data type OID. What do you think? I totally agree. It will make logical replication more flexible with data type. >>> --- a/src/backend/replication/logical/worker.c >>> +++ b/src/backend/replication/logical/worker.c >>> @@ -100,8 +100,9 @@ static dlist_head lsn_mapping = >>> DLIST_STATIC_INIT(lsn_mapping); >>> >>> typedef struct SlotErrCallbackArg >>> { >>> - LogicalRepRelation *rel; >>> - int attnum; >>> + LogicalRepRelMapEntry *rel; >>> + int remote_attnum; >>> + int local_attnum; >>> } SlotErrCallbackArg; >>> >>> Since LogicalRepRelMapEntry has a map of local attributes to remote >>> ones we don't need to have two attribute numbers. >>> > Sorry for the late reply. > >> Attached the patch incorporated what I have on mind. Please review it. > Thanks for the patch, I will do it at this weekend. Your patch is fine for me. But logicalrep_typmap_getid will be unused. I attached the patch with removing of it. --- Thanks and best regards, Dang Minh Huong
Attachment
On Sun, Dec 10, 2017 at 12:33 AM, Dang Minh Huong <kakalot49@gmail.com> wrote: > On 2017/12/08 13:18, Huong Dangminh wrote: > >> Hi Sawada-san, >> >>> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada <sawada.mshk@gmail.com> >>> wrote: >>>> >>>> On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong <kakalot49@gmail.com> >>> >>> wrote: >>>>> >>>>> Hi Sawada-san, >>>>> >>>>> Sorry for my late response. >>>>> >>>>> On 2017/12/05 0:11, Masahiko Sawada wrote: >>>>> >>>>> There is one more case that user-defined data type is not supported >>>>> in Logical Replication. >>>>> That is when remote data type's name does not exist in SUBSCRIBE. >>>>> >>>>> In relation.c:logicalrep_typmap_gettypname >>>>> We search OID in syscache by remote's data type name and mapping it, >>>>> if it does not exist in syscache We will be faced with the bellow >>>>> error. >>>>> >>>>> if (!OidIsValid(entry->typoid)) >>>>> ereport(ERROR, >>>>> >>>>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), >>>>> errmsg("data type \"%s.%s\" required >>>>> for logical replication does not exist", >>>>> entry->nspname, >>>>> entry->typname))); >>>>> >>>>> I think, it is not necessary to check typoid here in order to avoid >>>>> above case, is that right? >>>>> >>>>> I think it's not right. We should end up with an error in the case >>>>> where the same type name doesn't exist on subscriber. With your >>>>> proposed patch, >>>>> logicalrep_typmap_gettypname() can return an invalid string >>>>> (entry->typname) in that case, which can be a cause of SEGV. >>>>> >>>>> Thanks, I think you are right. >>>>> # I thought that entry->typname was received from walsender, and it >>>>> is already be qualified in logicalrep_write_typ. >>>>> # But we also need check it in subscriber, because it could be lost >>>>> info in transmission. >>>> >>>> Oops, the last sentence of my previous mail was wrong. >>>> logicalrep_typmap_gettypname() doesn't return an invalid string since >>>> entry->typname is initialized with a type name got from wal sender. > > Yeah, so we do not need to check the existing of publisher's type name in > subscriber. >>>> >>>> After more thought, we might not need to raise an error even if there >>>> is not the same data type on both publisher and subscriber. Because >>>> data is sent after converted to the text representation and is >>>> converted to a data type according to the local table definition >>>> subscribers don't always need to have the same data type. If it's >>>> right, slot_store_error_callback() doesn't need to find a >>>> corresponding local data type OID but just finds the corresponding >>>> type name by remote data type OID. What do you think? > > I totally agree. It will make logical replication more flexible with data > type. >>>> >>>> --- a/src/backend/replication/logical/worker.c >>>> +++ b/src/backend/replication/logical/worker.c >>>> @@ -100,8 +100,9 @@ static dlist_head lsn_mapping = >>>> DLIST_STATIC_INIT(lsn_mapping); >>>> >>>> typedef struct SlotErrCallbackArg >>>> { >>>> - LogicalRepRelation *rel; >>>> - int attnum; >>>> + LogicalRepRelMapEntry *rel; >>>> + int remote_attnum; >>>> + int local_attnum; >>>> } SlotErrCallbackArg; >>>> >>>> Since LogicalRepRelMapEntry has a map of local attributes to remote >>>> ones we don't need to have two attribute numbers. >>>> >> Sorry for the late reply. >> >>> Attached the patch incorporated what I have on mind. Please review it. >> >> Thanks for the patch, I will do it at this weekend. > > Your patch is fine for me. > But logicalrep_typmap_getid will be unused. Yeah, actually we don't use LogicalRepTyp.typoid as well. If we allow to replicate data to an another data type of column, what is the data type mapping on subscriber for? It seems enough to have remote data type oid, namespace and name. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi Sawada-san, > Sent: Tuesday, December 12, 2017 9:41 AM > To: Dang Minh Huong <kakalot49@gmail.com> > Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>; Yanagisawa > Hiroshi(柳澤 博) <hir-yanagisawa@ut.jp.nec.com>; Dangminh Huong(ダンM > フーン) <huo-dangminh@ys.jp.nec.com> > Subject: Re: User defined data types in Logical Replication > > On Sun, Dec 10, 2017 at 12:33 AM, Dang Minh Huong <kakalot49@gmail.com> > wrote: > > On 2017/12/08 13:18, Huong Dangminh wrote: > > > >> Hi Sawada-san, > >> > >>> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada > >>> <sawada.mshk@gmail.com> > >>> wrote: > >>>> > >>>> On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong > >>>> <kakalot49@gmail.com> > >>> > >>> wrote: > >>>>> > >>>>> Hi Sawada-san, > >>>>> > >>>>> Sorry for my late response. > >>>>> > >>>>> On 2017/12/05 0:11, Masahiko Sawada wrote: > >>>>> > >>>>> There is one more case that user-defined data type is not > >>>>> supported in Logical Replication. > >>>>> That is when remote data type's name does not exist in SUBSCRIBE. > >>>>> > >>>>> In relation.c:logicalrep_typmap_gettypname > >>>>> We search OID in syscache by remote's data type name and mapping > >>>>> it, if it does not exist in syscache We will be faced with the > >>>>> bellow error. > >>>>> > >>>>> if (!OidIsValid(entry->typoid)) > >>>>> ereport(ERROR, > >>>>> > >>>>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > >>>>> errmsg("data type \"%s.%s\" > >>>>> required for logical replication does not exist", > >>>>> entry->nspname, > >>>>> entry->typname))); > >>>>> > >>>>> I think, it is not necessary to check typoid here in order to > >>>>> avoid above case, is that right? > >>>>> > >>>>> I think it's not right. We should end up with an error in the case > >>>>> where the same type name doesn't exist on subscriber. With your > >>>>> proposed patch, > >>>>> logicalrep_typmap_gettypname() can return an invalid string > >>>>> (entry->typname) in that case, which can be a cause of SEGV. > >>>>> > >>>>> Thanks, I think you are right. > >>>>> # I thought that entry->typname was received from walsender, and > >>>>> it is already be qualified in logicalrep_write_typ. > >>>>> # But we also need check it in subscriber, because it could be > >>>>> lost info in transmission. > >>>> > >>>> Oops, the last sentence of my previous mail was wrong. > >>>> logicalrep_typmap_gettypname() doesn't return an invalid string > >>>> since > >>>> entry->typname is initialized with a type name got from wal sender. > > > > Yeah, so we do not need to check the existing of publisher's type name > > in subscriber. > >>>> > >>>> After more thought, we might not need to raise an error even if > >>>> there is not the same data type on both publisher and subscriber. > >>>> Because data is sent after converted to the text representation and > >>>> is converted to a data type according to the local table definition > >>>> subscribers don't always need to have the same data type. If it's > >>>> right, slot_store_error_callback() doesn't need to find a > >>>> corresponding local data type OID but just finds the corresponding > >>>> type name by remote data type OID. What do you think? > > > > I totally agree. It will make logical replication more flexible with > > data type. > >>>> > >>>> --- a/src/backend/replication/logical/worker.c > >>>> +++ b/src/backend/replication/logical/worker.c > >>>> @@ -100,8 +100,9 @@ static dlist_head lsn_mapping = > >>>> DLIST_STATIC_INIT(lsn_mapping); > >>>> > >>>> typedef struct SlotErrCallbackArg > >>>> { > >>>> - LogicalRepRelation *rel; > >>>> - int attnum; > >>>> + LogicalRepRelMapEntry *rel; > >>>> + int remote_attnum; > >>>> + int local_attnum; > >>>> } SlotErrCallbackArg; > >>>> > >>>> Since LogicalRepRelMapEntry has a map of local attributes to remote > >>>> ones we don't need to have two attribute numbers. > >>>> > >> Sorry for the late reply. > >> > >>> Attached the patch incorporated what I have on mind. Please review it. > >> > >> Thanks for the patch, I will do it at this weekend. > > > > Your patch is fine for me. > > But logicalrep_typmap_getid will be unused. > > Yeah, actually we don't use LogicalRepTyp.typoid as well. If we allow to > replicate data to an another data type of column, what is the data type > mapping on subscriber for? It seems enough to have remote data type oid, > namespace and name. Yeah, so the meaning of the type map will be disappeared, aren't you? I updated the patch with removing of LogicalRepTyp.typoid and changing concept "typmap" to "remote type". How do you think? --- Thanks and best regards, Dang Minh Huong NEC Solution Innovators, Ltd. http://www.nec-solutioninnovators.co.jp/en/
Attachment
Hi Sawada-san, > > Sent: Tuesday, December 12, 2017 9:41 AM > > To: Dang Minh Huong <kakalot49@gmail.com> > > Cc: PostgreSQL Hackers <pgsql-hackers@postgresql.org>; Yanagisawa > > Hiroshi(柳澤 博) <hir-yanagisawa@ut.jp.nec.com>; Dangminh Huong(ダン > M > > フーン) <huo-dangminh@ys.jp.nec.com> > > Subject: Re: User defined data types in Logical Replication > > > > On Sun, Dec 10, 2017 at 12:33 AM, Dang Minh Huong > > <kakalot49@gmail.com> > > wrote: > > > On 2017/12/08 13:18, Huong Dangminh wrote: > > > > > >> Hi Sawada-san, > > >> > > >>> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada > > >>> <sawada.mshk@gmail.com> > > >>> wrote: > > >>>> > > >>>> On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong > > >>>> <kakalot49@gmail.com> > > >>> > > >>> wrote: > > >>>>> > > >>>>> Hi Sawada-san, > > >>>>> > > >>>>> Sorry for my late response. > > >>>>> > > >>>>> On 2017/12/05 0:11, Masahiko Sawada wrote: > > >>>>> > > >>>>> There is one more case that user-defined data type is not > > >>>>> supported in Logical Replication. > > >>>>> That is when remote data type's name does not exist in SUBSCRIBE. > > >>>>> > > >>>>> In relation.c:logicalrep_typmap_gettypname > > >>>>> We search OID in syscache by remote's data type name and mapping > > >>>>> it, if it does not exist in syscache We will be faced with the > > >>>>> bellow error. > > >>>>> > > >>>>> if (!OidIsValid(entry->typoid)) > > >>>>> ereport(ERROR, > > >>>>> > > >>>>> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > >>>>> errmsg("data type \"%s.%s\" > > >>>>> required for logical replication does not exist", > > >>>>> > entry->nspname, > > >>>>> entry->typname))); > > >>>>> > > >>>>> I think, it is not necessary to check typoid here in order to > > >>>>> avoid above case, is that right? > > >>>>> > > >>>>> I think it's not right. We should end up with an error in the > > >>>>> case where the same type name doesn't exist on subscriber. With > > >>>>> your proposed patch, > > >>>>> logicalrep_typmap_gettypname() can return an invalid string > > >>>>> (entry->typname) in that case, which can be a cause of SEGV. > > >>>>> > > >>>>> Thanks, I think you are right. > > >>>>> # I thought that entry->typname was received from walsender, and > > >>>>> it is already be qualified in logicalrep_write_typ. > > >>>>> # But we also need check it in subscriber, because it could be > > >>>>> lost info in transmission. > > >>>> > > >>>> Oops, the last sentence of my previous mail was wrong. > > >>>> logicalrep_typmap_gettypname() doesn't return an invalid string > > >>>> since > > >>>> entry->typname is initialized with a type name got from wal sender. > > > > > > Yeah, so we do not need to check the existing of publisher's type > > > name in subscriber. > > >>>> > > >>>> After more thought, we might not need to raise an error even if > > >>>> there is not the same data type on both publisher and subscriber. > > >>>> Because data is sent after converted to the text representation > > >>>> and is converted to a data type according to the local table > > >>>> definition subscribers don't always need to have the same data > > >>>> type. If it's right, slot_store_error_callback() doesn't need to > > >>>> find a corresponding local data type OID but just finds the > > >>>> corresponding type name by remote data type OID. What do you think? > > > > > > I totally agree. It will make logical replication more flexible with > > > data type. > > >>>> > > >>>> --- a/src/backend/replication/logical/worker.c > > >>>> +++ b/src/backend/replication/logical/worker.c > > >>>> @@ -100,8 +100,9 @@ static dlist_head lsn_mapping = > > >>>> DLIST_STATIC_INIT(lsn_mapping); > > >>>> > > >>>> typedef struct SlotErrCallbackArg > > >>>> { > > >>>> - LogicalRepRelation *rel; > > >>>> - int attnum; > > >>>> + LogicalRepRelMapEntry *rel; > > >>>> + int remote_attnum; > > >>>> + int local_attnum; > > >>>> } SlotErrCallbackArg; > > >>>> > > >>>> Since LogicalRepRelMapEntry has a map of local attributes to > > >>>> remote ones we don't need to have two attribute numbers. > > >>>> > > >> Sorry for the late reply. > > >> > > >>> Attached the patch incorporated what I have on mind. Please review > it. > > >> > > >> Thanks for the patch, I will do it at this weekend. > > > > > > Your patch is fine for me. > > > But logicalrep_typmap_getid will be unused. > > > > Yeah, actually we don't use LogicalRepTyp.typoid as well. If we allow > > to replicate data to an another data type of column, what is the data > > type mapping on subscriber for? It seems enough to have remote data > > type oid, namespace and name. > > Yeah, so the meaning of the type map will be disappeared, aren't you? > I updated the patch with removing of LogicalRepTyp.typoid and changing > concept "typmap" to "remote type". > How do you think? Sorry, Please confirm V7 of patch (attached in this mail). --- Thanks and best regards, Dang Minh Huong NEC Solution Innovators, Ltd. http://www.nec-solutioninnovators.co.jp/en/
Attachment
Hi, On 18/12/17 14:28, Huong Dangminh wrote: >>>> >>>> Your patch is fine for me. >>>> But logicalrep_typmap_getid will be unused. >>> >>> Yeah, actually we don't use LogicalRepTyp.typoid as well. If we allow >>> to replicate data to an another data type of column, what is the data >>> type mapping on subscriber for? It seems enough to have remote data >>> type oid, namespace and name. >> >> Yeah, so the meaning of the type map will be disappeared, aren't you? >> I updated the patch with removing of LogicalRepTyp.typoid and changing >> concept "typmap" to "remote type". >> How do you think? > > Sorry, Please confirm V7 of patch (attached in this mail). > I think the changes make sense in terms of how it all works now. That said I don't think the renaming idea is a good one, the naming was chosen to be future proof because eventually we'll need to map types to local oid (and possibly more) where the local info is cached so that we can interpret binary representation of replicated data (which we'll add at some point since it's big performance boost). So I am afraid that if we do the rename of typmap to remotetype in this patch it will a) make backports of fixes in the related code harder, b) force us to rename it back again in the future. I'd keep your general approach but keep using typmap naming. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Hi Petr Jelineks, Sawada-san > I think the changes make sense in terms of how it all works now. > > That said I don't think the renaming idea is a good one, the naming was > chosen to be future proof because eventually we'll need to map types to > local oid (and possibly more) where the local info is cached so that we > can interpret binary representation of replicated data (which we'll add > at some point since it's big performance boost). > > So I am afraid that if we do the rename of typmap to remotetype in this > patch it will a) make backports of fixes in the related code harder, b) > force us to rename it back again in the future. Thanks for your comment. > I'd keep your general approach but keep using typmap naming. I update the patch as Petr Jelineks mention, keep using typmap naming. --- Thanks and best regards, Dang Minh Huong NEC Solution Innovators, Ltd. http://www.nec-solutioninnovators.co.jp/en/
Attachment
On Mon, Dec 18, 2017 at 11:25 PM, Huong Dangminh <huo-dangminh@ys.jp.nec.com> wrote: >> eventually we'll need to map types to >> local oid (and possibly more) where the local info is cached so that we >> can interpret binary representation of replicated data (which we'll add >> at some point since it's big performance boost). Sounds good. >> >> So I am afraid that if we do the rename of typmap to remotetype in this >> patch it will a) make backports of fixes in the related code harder, b) >> force us to rename it back again in the future. > > Thanks for your comment. > >> I'd keep your general approach but keep using typmap naming. > > I update the patch as Petr Jelineks mention, keep using typmap naming. > Thank you for updating the patch. Here is a review comment. - if (errarg->attnum < 0) + rel = errarg->rel; + remote_attnum = rel->attrmap[errarg->local_attnum]; + + if (remote_attnum < 0) return; I think errarg->local_attnum can be -1 here and access an invalid address if slot_store_error_callback() is called before setting errarg.local_attnum. I cannot see such call path in the current code so far but would need to be fixed. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi Sawada-san, > >> eventually we'll need to map types to local oid (and possibly more) > >> where the local info is cached so that we can interpret binary > >> representation of replicated data (which we'll add at some point > >> since it's big performance boost). > > Sounds good. > > >> > >> So I am afraid that if we do the rename of typmap to remotetype in > >> this patch it will a) make backports of fixes in the related code > >> harder, b) force us to rename it back again in the future. > > > > Thanks for your comment. > > > >> I'd keep your general approach but keep using typmap naming. > > > > I update the patch as Petr Jelineks mention, keep using typmap naming. > > > > Thank you for updating the patch. Here is a review comment. Thanks for reviewing. > - if (errarg->attnum < 0) > + rel = errarg->rel; > + remote_attnum = rel->attrmap[errarg->local_attnum]; > + > + if (remote_attnum < 0) > return; > > I think errarg->local_attnum can be -1 here and access an invalid address > if slot_store_error_callback() is called before setting > errarg.local_attnum. I cannot see such call path in the current code so > far but would need to be fixed. I updated the patch to fix it. --- Thanks and best regards, Dang Minh Huong NEC Solution Innovators, Ltd. http://www.nec-solutioninnovators.co.jp/en/
Attachment
On Wed, Dec 20, 2017 at 2:28 PM, Huong Dangminh <huo-dangminh@ys.jp.nec.com> wrote: > Hi Sawada-san, > >> >> eventually we'll need to map types to local oid (and possibly more) >> >> where the local info is cached so that we can interpret binary >> >> representation of replicated data (which we'll add at some point >> >> since it's big performance boost). >> >> Sounds good. >> >> >> >> >> So I am afraid that if we do the rename of typmap to remotetype in >> >> this patch it will a) make backports of fixes in the related code >> >> harder, b) force us to rename it back again in the future. >> > >> > Thanks for your comment. >> > >> >> I'd keep your general approach but keep using typmap naming. >> > >> > I update the patch as Petr Jelineks mention, keep using typmap naming. >> > >> >> Thank you for updating the patch. Here is a review comment. > > Thanks for reviewing. > >> - if (errarg->attnum < 0) >> + rel = errarg->rel; >> + remote_attnum = rel->attrmap[errarg->local_attnum]; >> + >> + if (remote_attnum < 0) >> return; >> >> I think errarg->local_attnum can be -1 here and access an invalid address >> if slot_store_error_callback() is called before setting >> errarg.local_attnum. I cannot see such call path in the current code so >> far but would need to be fixed. > > I updated the patch to fix it. > Thank you for quick response. The changes look good to me. But I wonder if the following changes needs some comments to describe what each checks does for. - if (errarg->attnum < 0) + if (errarg->local_attnum <0) + return; + + rel = errarg->rel; + remote_attnum = rel->attrmap[errarg->local_attnum]; + + if (remote_attnum < 0) return; Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi Sawada-san, > Thank you for quick response. The changes look good to me. But I wonder > if the following changes needs some comments to describe what each checks > does for. > > - if (errarg->attnum < 0) > + if (errarg->local_attnum <0) > + return; > + > + rel = errarg->rel; > + remote_attnum = rel->attrmap[errarg->local_attnum]; > + > + if (remote_attnum < 0) > return; Thanks, I have added some comments as you mentioned. --- Thanks and best regards, Dang Minh Huong NEC Solution Innovators, Ltd. http://www.nec-solutioninnovators.co.jp/en/
Attachment
On Wed, Dec 20, 2017 at 5:39 PM, Huong Dangminh <huo-dangminh@ys.jp.nec.com> wrote: > Hi Sawada-san, > >> Thank you for quick response. The changes look good to me. But I wonder >> if the following changes needs some comments to describe what each checks >> does for. >> >> - if (errarg->attnum < 0) >> + if (errarg->local_attnum <0) >> + return; >> + >> + rel = errarg->rel; >> + remote_attnum = rel->attrmap[errarg->local_attnum]; >> + >> + if (remote_attnum < 0) >> return; > > Thanks, I have added some comments as you mentioned. > Thank you for updating the patch. - if (errarg->attnum < 0) + /* Check case of slot_store_error_callback() is called before + * errarg.local_attnum is set. */ + if (errarg->local_attnum <0) This comment style isn't preferred by PostgreSQL code. Please refer to https://www.postgresql.org/docs/current/static/source-format.html -- $ git diff --check src/backend/replication/logical/worker.c:291: trailing whitespace. + /* Check case of slot_store_error_callback() is called before There is an extra white space in the patch. I'm inclined to think SlotErrCallbackArg can have remote_attnum and pass it to the callback function. That way, we don't need to case the case where local_attnum is not set yet. Attached a new version patch incorporated the aboves. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On 2017/12/21 10:05, Masahiko Sawada wrote: > On Wed, Dec 20, 2017 at 5:39 PM, Huong Dangminh > <huo-dangminh@ys.jp.nec.com> wrote: >> Hi Sawada-san, >> >>> Thank you for quick response. The changes look good to me. But I wonder >>> if the following changes needs some comments to describe what each checks >>> does for. >>> >>> - if (errarg->attnum < 0) >>> + if (errarg->local_attnum <0) >>> + return; >>> + >>> + rel = errarg->rel; >>> + remote_attnum = rel->attrmap[errarg->local_attnum]; >>> + >>> + if (remote_attnum < 0) >>> return; >> Thanks, I have added some comments as you mentioned. >> > Thank you for updating the patch. > > - if (errarg->attnum < 0) > + /* Check case of slot_store_error_callback() is called before > + * errarg.local_attnum is set. */ > + if (errarg->local_attnum <0) > > This comment style isn't preferred by PostgreSQL code. Please refer to > https://www.postgresql.org/docs/current/static/source-format.html > -- > $ git diff --check > src/backend/replication/logical/worker.c:291: trailing whitespace. > + /* Check case of slot_store_error_callback() is called before Thanks, I will be careful in the next time. > There is an extra white space in the patch. > > I'm inclined to think SlotErrCallbackArg can have remote_attnum and > pass it to the callback function. That way, we don't need to case the > case where local_attnum is not set yet. Nice. > Attached a new version patch incorporated the aboves. Please review it. Thanks for updating the patch. It looks fine to me. --- Thanks and best regards, Dang Minh Huong
On Sat, Dec 23, 2017 at 4:08 PM, Dang Minh Huong <kakalot49@gmail.com> wrote: > On 2017/12/21 10:05, Masahiko Sawada wrote: > >> On Wed, Dec 20, 2017 at 5:39 PM, Huong Dangminh >> <huo-dangminh@ys.jp.nec.com> wrote: >>> >>> Hi Sawada-san, >>> >>>> Thank you for quick response. The changes look good to me. But I wonder >>>> if the following changes needs some comments to describe what each >>>> checks >>>> does for. >>>> >>>> - if (errarg->attnum < 0) >>>> + if (errarg->local_attnum <0) >>>> + return; >>>> + >>>> + rel = errarg->rel; >>>> + remote_attnum = rel->attrmap[errarg->local_attnum]; >>>> + >>>> + if (remote_attnum < 0) >>>> return; >>> >>> Thanks, I have added some comments as you mentioned. >>> >> Thank you for updating the patch. >> >> - if (errarg->attnum < 0) >> + /* Check case of slot_store_error_callback() is called before >> + * errarg.local_attnum is set. */ >> + if (errarg->local_attnum <0) >> >> This comment style isn't preferred by PostgreSQL code. Please refer to >> https://www.postgresql.org/docs/current/static/source-format.html >> -- >> $ git diff --check >> src/backend/replication/logical/worker.c:291: trailing whitespace. >> + /* Check case of slot_store_error_callback() is called before > > Thanks, I will be careful in the next time. >> >> There is an extra white space in the patch. >> >> I'm inclined to think SlotErrCallbackArg can have remote_attnum and >> pass it to the callback function. That way, we don't need to case the >> case where local_attnum is not set yet. > > Nice. >> >> Attached a new version patch incorporated the aboves. Please review it. > > Thanks for updating the patch. > It looks fine to me. > Thank you for confirmation. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
> From: Masahiko Sawada [mailto:sawada.mshk@gmail.com] > >> Attached a new version patch incorporated the aboves. Please review it. > > > > Thanks for updating the patch. > > It looks fine to me. I mean that it passes make check, and subscription TAP tests too. > > > > Thank you for confirmation. > Also. I have changed status in CF to Ready for Committer. I would be glad if it can be applied from PostgreSQL 10. --- Thanks and best regards, Dang Minh Huong NEC Solution Innovators, Ltd. http://www.nec-solutioninnovators.co.jp/en/
hash_seq_init in logicalrep_typmap_invalidate_cb is useless after your patch. If you remove it, the function becomes empty, so why is it there an invalidation callback at all? Are we now leaking memory if types keep repeatedly being re-created in the origin? I suppose it's not a common use pattern, but it'd be good to avoid everlasting memleaks. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jan 6, 2018 at 3:53 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > hash_seq_init in logicalrep_typmap_invalidate_cb is useless after your > patch. If you remove it, the function becomes empty, so why is it there > an invalidation callback at all? Thank you for the comment. Yeah, logicalrep_typmap_invalidate_cb is no longer needed. Attached an updated patch. > Are we now leaking memory if types keep repeatedly being re-created in > the origin? The type name and namespace name in LogicalRepTyp are freed when updating entries but LogicalRepTyp entry itself could be leaked. It can happen to relation map as well. Since we don't remove hash entry during working in the origin the hash map entry for relation map is leaked if publication repeatedly adds/drops tables and subscription refreshes it. > I suppose it's not a common use pattern, but it'd be good > to avoid everlasting memleaks. I agree. Can we remove entry from hash table in the callbacks instead of setting InvalidOid when invalidate caches? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Hi, It seems to be in the middle of discussion, but I became a reviewer of this problem several days ago so I've tested the latest patch 'fix_slot_store_error_callback_v12.patch'. I reproduced the below ERROR by inserting elog() to INPUT function of CREATE TYPE, and confirmed that above patch solves the problem. >> ERROR: XX000: cache lookup failed for type XXXXX I also ran make check-world and there was no error. Is the only remaining topic memory leaks? On 2018/01/09 17:22, Masahiko Sawada wrote: >> I suppose it's not a common use pattern, but it'd be good >> to avoid everlasting memleaks. > > I agree. Can we remove entry from hash table in the callbacks instead > of setting InvalidOid when invalidate caches? -- Atsushi Torikoshi NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Jan 30, 2018 at 6:36 PM, atorikoshi <torikoshi_atsushi_z2@lab.ntt.co.jp> wrote: > Hi, > > It seems to be in the middle of discussion, but I became a reviewer of > this problem several days ago so I've tested the latest patch > 'fix_slot_store_error_callback_v12.patch'. > > I reproduced the below ERROR by inserting elog() to INPUT function > of CREATE TYPE, and confirmed that above patch solves the problem. > >>> ERROR: XX000: cache lookup failed for type XXXXX > > I also ran make check-world and there was no error. > > Is the only remaining topic memory leaks? > Yeah, but I think that the patch for the avoiding memleaks should be a separate patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
I noticed that logicalrep_typmap_gettypname's only caller is slot_store_error_callback, which is an error callback. So by raising an error from the former, we would effectively just hide the actual reason for the error behind the error that the cache entry cannot be found. Therefore, I'm inclined to make this function raise a warning, then return a substitute value (something like "unrecognized type XYZ"). Do the same for the case with the mismatched major versions. Lastly, if LogicalRepTypMap is not defined, also throw a warning and return a substitute value rather than try to initialize the hash table: initializing the table is not going to make the entry show up in it, anyway, so it's pretty pointless; and there's plenty chance for things to go wrong, which is not a good idea in an error callback. Do we need a new TAP test for this? Judging by https://coverage.postgresql.org/src/backend/replication/logical/relation.c.gcov.html showing all zeroes for the last function, we do. Same with slot_store_error_callback in https://coverage.postgresql.org/src/backend/replication/logical/worker.c.gcov.html Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Mar 6, 2018 at 8:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > I noticed that logicalrep_typmap_gettypname's only caller is > slot_store_error_callback, which is an error callback. So by raising an > error from the former, we would effectively just hide the actual reason > for the error behind the error that the cache entry cannot be found. > > Therefore, I'm inclined to make this function raise a warning, then > return a substitute value (something like "unrecognized type XYZ"). Do > the same for the case with the mismatched major versions. Lastly, if > LogicalRepTypMap is not defined, also throw a warning and return a > substitute value rather than try to initialize the hash table: > initializing the table is not going to make the entry show up in it, > anyway, so it's pretty pointless; and there's plenty chance for things > to go wrong, which is not a good idea in an error callback. I agree with you about not hiding the actual reason for the error but if we raise a warning at logicalrep_typmap_gettypname don't we call slot_store_error_callback recursively? > > Do we need a new TAP test for this? Judging by > https://coverage.postgresql.org/src/backend/replication/logical/relation.c.gcov.html > showing all zeroes for the last function, we do. Same with > slot_store_error_callback in > https://coverage.postgresql.org/src/backend/replication/logical/worker.c.gcov.html > Agreed. Will add a TAP test. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Masahiko Sawada wrote: > On Tue, Mar 6, 2018 at 8:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Therefore, I'm inclined to make this function raise a warning, then > > return a substitute value (something like "unrecognized type XYZ"). > > [...] > > I agree with you about not hiding the actual reason for the error but > if we raise a warning at logicalrep_typmap_gettypname don't we call > slot_store_error_callback recursively? Hmm, now that you mention it, I don't really know. I think it's supposed not to happen, since calling ereport() again opens a new recursion level, but then maybe errcontext doesn't depend on the recursion level ... I haven't checked. This is why the TAP test would be handy :-) > Agreed. Will add a TAP test. Great. This patch waits on that, then. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 7, 2018 at 2:52 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Masahiko Sawada wrote: >> On Tue, Mar 6, 2018 at 8:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > >> > Therefore, I'm inclined to make this function raise a warning, then >> > return a substitute value (something like "unrecognized type XYZ"). >> > [...] >> >> I agree with you about not hiding the actual reason for the error but >> if we raise a warning at logicalrep_typmap_gettypname don't we call >> slot_store_error_callback recursively? > > Hmm, now that you mention it, I don't really know. I think it's > supposed not to happen, since calling ereport() again opens a new > recursion level, but then maybe errcontext doesn't depend on the > recursion level ... I haven't checked. This is why the TAP test would > be handy :-) The calling ereport opens a new recursion level. The calling ereport with error doesn't return to caller but the calling with warning does. So the recursively calling ereport(WARNING) ends up with exceeding the errordata stack size. So it seems to me that we can set errcontext in logicalrep_typmap_gettypname() instead of raising warning or error. > >> Agreed. Will add a TAP test. > > Great. This patch waits on that, then. > Okay. I think the most simple and convenient way to reproduce this issue is to call an elog(LOG) in input function of a user-defined data type. So I'm thinking to create the test in src/test/module directory. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, Mar 7, 2018 at 9:19 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Wed, Mar 7, 2018 at 2:52 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> Masahiko Sawada wrote: >>> On Tue, Mar 6, 2018 at 8:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> >>> > Therefore, I'm inclined to make this function raise a warning, then >>> > return a substitute value (something like "unrecognized type XYZ"). >>> > [...] >>> >>> I agree with you about not hiding the actual reason for the error but >>> if we raise a warning at logicalrep_typmap_gettypname don't we call >>> slot_store_error_callback recursively? >> >> Hmm, now that you mention it, I don't really know. I think it's >> supposed not to happen, since calling ereport() again opens a new >> recursion level, but then maybe errcontext doesn't depend on the >> recursion level ... I haven't checked. This is why the TAP test would >> be handy :-) > > The calling ereport opens a new recursion level. The calling ereport > with error doesn't return to caller but the calling with warning does. > So the recursively calling ereport(WARNING) ends up with exceeding the > errordata stack size. So it seems to me that we can set errcontext in > logicalrep_typmap_gettypname() instead of raising warning or error. > >> >>> Agreed. Will add a TAP test. >> >> Great. This patch waits on that, then. >> > > Okay. I think the most simple and convenient way to reproduce this > issue is to call an elog(LOG) in input function of a user-defined data > type. So I'm thinking to create the test in src/test/module directory. > After more thought, I think since the errors in logicalrep_typmap_gettypname are not relevant with the actual error (i.g. type conversion error) it would not be good idea to show the error message of "could not found data type entry" in errcontext. It might be more appropriate if we return a substitute value ("unrecognized type" or "unrecognized built-in type") without raising neither an error nor a warning. Thoughts? Regarding to regression test, I added a new test module test_subscription that creates a new user-defined data type. In a subscription regression test, using test_subscription we make subscriber call slot_store_error_callback and check if the subscriber can correctly look up both remote and local data type strings. One downside of this regression test is that in a failure case, the duration of the test will be long (up to 180sec) because it has to wait for the polling timeout. Attached an updated patch with a regression test. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Masahiko Sawada wrote: > After more thought, I think since the errors in > logicalrep_typmap_gettypname are not relevant with the actual error > (i.g. type conversion error) it would not be good idea to show the > error message of "could not found data type entry" in errcontext. > It might be more appropriate if we return a substitute value > ("unrecognized type" or "unrecognized built-in type") without raising > neither an error nor a warning. Thoughts? Yeah, seems a reasonable idea to me. Here's a tidied-up version of your patch, minus the regression test changes (I may end up committing that one separately). But I now hesitate to push it because I'm unsure of how does type mapping actually work, and whether it's still working after this patch -- for example if we create two user-defined datatypes in opposite orders in the nodes (so they get different OIDs), are we able to replicate data correctly from one side to the other? If there's code to support this case, it is not at all obvious where it is. > Regarding to regression test, I added a new test module > test_subscription that creates a new user-defined data type. I think this is a good thing to add. I wonder if we could have the module create two extensions, and have the TAP test create first A then B in one node, and first B in the other node, replicate a table that uses type A, see how it fails, then create type A and ensure replication works correctly. (Ideally the types would not be binary compatible, so that it'd not work by accident.) > One downside of this regression test is that in a failure case, the > duration of the test will be long (up to 180sec) because it has to > wait for the polling timeout. This is bad. I haven't read it. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote: > Yeah, seems a reasonable idea to me. Here's a tidied-up version of your > patch, minus the regression test changes (I may end up committing that > one separately). ... and patch. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Masahiko Sawada wrote: > Regarding to regression test, I added a new test module > test_subscription that creates a new user-defined data type. In a > subscription regression test, using test_subscription we make > subscriber call slot_store_error_callback and check if the subscriber > can correctly look up both remote and local data type strings. One > downside of this regression test is that in a failure case, the > duration of the test will be long (up to 180sec) because it has to > wait for the polling timeout. > Attached an updated patch with a regression test. Pushed the fix to pg10 and master. Thanks to all involved for the report, patches and review. Here's the regression test patch. The problem with it is that the TAP test is not verifying much -- I tried applying it before the fix commit, and it succeeds! The only funny is that the errcontext messages are wrong, they look like this: 2018-03-14 20:31:03.564 -03 [763018] LOG: input int: 1 2018-03-14 20:31:03.564 -03 [763018] CONTEXT: processing remote data for replication target relation "public.test" column"b", remote type dummytext, local type dummyint 2018-03-14 20:31:03.564 -03 [763018] LOG: input text: "one" 2018-03-14 20:31:03.564 -03 [763018] CONTEXT: processing remote data for replication target relation "public.test" column"a", remote type dummyint, local type dummytext but I think it would be better to verify them. (With your version I think you were trusting that the OID would not match anything, giving raise to the "cache lookup failed" error before the patch. I'm not sure that that's very trustworthy either.) I think this is a worthwhile test, but IMO it should be improved a bit before we include it. Also, we can come up with a better name for the test surely, not just refer to this particular bug. Maybe "typemap". -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Thu, Mar 15, 2018 at 9:41 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Masahiko Sawada wrote: > >> Regarding to regression test, I added a new test module >> test_subscription that creates a new user-defined data type. In a >> subscription regression test, using test_subscription we make >> subscriber call slot_store_error_callback and check if the subscriber >> can correctly look up both remote and local data type strings. One >> downside of this regression test is that in a failure case, the >> duration of the test will be long (up to 180sec) because it has to >> wait for the polling timeout. >> Attached an updated patch with a regression test. > > Pushed the fix to pg10 and master. Thanks to all involved for the > report, patches and review. Thank you for committing! > > Here's the regression test patch. The problem with it is that the TAP > test is not verifying much -- I tried applying it before the fix commit, > and it succeeds! The only funny is that the errcontext messages are > wrong, they look like this: > > 2018-03-14 20:31:03.564 -03 [763018] LOG: input int: 1 > 2018-03-14 20:31:03.564 -03 [763018] CONTEXT: processing remote data for replication target relation "public.test" column"b", remote type dummytext, local type dummyint > 2018-03-14 20:31:03.564 -03 [763018] LOG: input text: "one" > 2018-03-14 20:31:03.564 -03 [763018] CONTEXT: processing remote data for replication target relation "public.test" column"a", remote type dummyint, local type dummytext > > but I think it would be better to verify them. (With your version I > think you were trusting that the OID would not match anything, giving > raise to the "cache lookup failed" error before the patch. I'm not sure > that that's very trustworthy either.) Agreed. Also, since my patch attempted to lead the error we need a long time to check if it failed, which is not good. > > I think this is a worthwhile test, but IMO it should be improved a bit > before we include it. Also, we can come up with a better name for the > test surely, not just refer to this particular bug. Maybe "typemap". > It might be useful if we have the facility of TAP test to check the log message and regexp-match the message to the expected string. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Masahiko Sawada wrote: > On Thu, Mar 15, 2018 at 9:41 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > I think this is a worthwhile test, but IMO it should be improved a bit > > before we include it. Also, we can come up with a better name for the > > test surely, not just refer to this particular bug. Maybe "typemap". > > It might be useful if we have the facility of TAP test to check the > log message and regexp-match the message to the expected string. Something similar to PostgresNode::issues_sql_like() perhaps? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> Masahiko Sawada wrote: > > > Regarding to regression test, I added a new test module > > test_subscription that creates a new user-defined data type. In a > > subscription regression test, using test_subscription we make > > subscriber call slot_store_error_callback and check if the subscriber > > can correctly look up both remote and local data type strings. One > > downside of this regression test is that in a failure case, the > > duration of the test will be long (up to 180sec) because it has to > > wait for the polling timeout. > > Attached an updated patch with a regression test. > > Pushed the fix to pg10 and master. Thanks to all involved for the report, > patches and review. > Thank you all for fixing it. --- Dang Minh Huong NEC Solution Innovators, Ltd. http://www.nec-solutioninnovators.co.jp/en/
On Fri, Mar 16, 2018 at 10:24 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Masahiko Sawada wrote: >> On Thu, Mar 15, 2018 at 9:41 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > >> > I think this is a worthwhile test, but IMO it should be improved a bit >> > before we include it. Also, we can come up with a better name for the >> > test surely, not just refer to this particular bug. Maybe "typemap". >> >> It might be useful if we have the facility of TAP test to check the >> log message and regexp-match the message to the expected string. > > Something similar to PostgresNode::issues_sql_like() perhaps? > Yeah, I didn't know that but I think it's a good idea. Unlike issues_sql_like() we don't issue anything to the subscriber. So maybe we need truncate logfile before insertion and verify logfile of particular period. The test code would be like follows. $node_subscriber->safe_psql('postgres', 'CREATE SUBSCRIPTION..."); truncate $node_subscriber->logfile, 0; $node_publisher->safe_psql('postgres', 'INSERT .. ') my $log = TestLib::slurp_file($node_subscriber->logfile); # Verify logs like($log, qr/processing remote data for replication target relation "public.test" column "b", remote type dummyint, local type dummyint/, 'callback function of datatype conversion1'); like($log, qr/processing remote data for replication target relation "public.test" column "a", remote type dummytext, local type dummytext/, 'callback function of datatype conversion2'); Thoughts? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
At Mon, 19 Mar 2018 12:50:55 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in <CAD21AoD=kiDVxfZ2aT_Oeg7+5etkxg0eqmsRE-gcUbptKNir6g@mail.gmail.com> > On Fri, Mar 16, 2018 at 10:24 AM, Alvaro Herrera > <alvherre@alvh.no-ip.org> wrote: > > Masahiko Sawada wrote: > >> On Thu, Mar 15, 2018 at 9:41 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > >> > I think this is a worthwhile test, but IMO it should be improved a bit > >> > before we include it. Also, we can come up with a better name for the > >> > test surely, not just refer to this particular bug. Maybe "typemap". > >> > >> It might be useful if we have the facility of TAP test to check the > >> log message and regexp-match the message to the expected string. > > > > Something similar to PostgresNode::issues_sql_like() perhaps? > > > > Yeah, I didn't know that but I think it's a good idea. Unlike > issues_sql_like() we don't issue anything to the subscriber. So maybe > we need truncate logfile before insertion and verify logfile of > particular period. The test code would be like follows. > > $node_subscriber->safe_psql('postgres', 'CREATE SUBSCRIPTION..."); > truncate $node_subscriber->logfile, 0; > $node_publisher->safe_psql('postgres', 'INSERT .. ') > my $log = TestLib::slurp_file($node_subscriber->logfile); > > # Verify logs > like($log, qr/processing remote data for replication target relation > "public.test" column "b", remote type dummyint, local type dummyint/, > 'callback function of datatype conversion1'); > like($log, qr/processing remote data for replication target relation > "public.test" column "a", remote type dummytext, local type > dummytext/, 'callback function of datatype conversion2'); > > Thoughts? FWIW something like is in a currently proposed patch. https://www.postgresql.org/message-id/20180129.194023.228030941.horiguchi.kyotaro@lab.ntt.co.jp 0003 contains the following function. +# find $pat in logfile of $node after $off-th byte +sub find_in_log +{ + my ($node, $pat, $off) = @_; + + $off = 0 unless defined $off; + my $log = TestLib::slurp_file($node->logfile); + return 0 if (length($log) <= $off); + + $log = substr($log, $off); + + return $log =~ m/$pat/; +} It is used as the follows. +$logstart = get_log_size($node_standby); +$node_standby->start; + +my $failed = 0; +for (my $i = 0 ; $i < 10000 ; $i++) +{ + if (find_in_log($node_standby, + "requested WAL segment [0-9A-F]+ has already been removed", + $logstart)) + { + $failed = 1; + last; + } + usleep(100_000); +} +ok($failed, 'check replication has been broken'); regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Mar 19, 2018 at 12:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Fri, Mar 16, 2018 at 10:24 AM, Alvaro Herrera > <alvherre@alvh.no-ip.org> wrote: >> Masahiko Sawada wrote: >>> On Thu, Mar 15, 2018 at 9:41 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >> >>> > I think this is a worthwhile test, but IMO it should be improved a bit >>> > before we include it. Also, we can come up with a better name for the >>> > test surely, not just refer to this particular bug. Maybe "typemap". >>> >>> It might be useful if we have the facility of TAP test to check the >>> log message and regexp-match the message to the expected string. >> >> Something similar to PostgresNode::issues_sql_like() perhaps? >> > > Yeah, I didn't know that but I think it's a good idea. Unlike > issues_sql_like() we don't issue anything to the subscriber. So maybe > we need truncate logfile before insertion and verify logfile of > particular period. The test code would be like follows. > > $node_subscriber->safe_psql('postgres', 'CREATE SUBSCRIPTION..."); > truncate $node_subscriber->logfile, 0; > $node_publisher->safe_psql('postgres', 'INSERT .. ') > my $log = TestLib::slurp_file($node_subscriber->logfile); > > # Verify logs > like($log, qr/processing remote data for replication target relation > "public.test" column "b", remote type dummyint, local type dummyint/, > 'callback function of datatype conversion1'); > like($log, qr/processing remote data for replication target relation > "public.test" column "a", remote type dummytext, local type > dummytext/, 'callback function of datatype conversion2'); > > Thoughts? > Attached an updated test patch added the above test(0002 patch). Since for this test case it's enough to use existing test functions I didn't create new test functions. Also I found that the local data type name in log for data type conversion isn't qualified whereas the remote data type is always qualified. Attached 0001 patch fixes that. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
On Mon, Mar 19, 2018 at 7:57 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Mon, Mar 19, 2018 at 12:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: >> On Fri, Mar 16, 2018 at 10:24 AM, Alvaro Herrera >> <alvherre@alvh.no-ip.org> wrote: >>> Masahiko Sawada wrote: >>>> On Thu, Mar 15, 2018 at 9:41 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: >>> >>>> > I think this is a worthwhile test, but IMO it should be improved a bit >>>> > before we include it. Also, we can come up with a better name for the >>>> > test surely, not just refer to this particular bug. Maybe "typemap". >>>> >>>> It might be useful if we have the facility of TAP test to check the >>>> log message and regexp-match the message to the expected string. >>> >>> Something similar to PostgresNode::issues_sql_like() perhaps? >>> >> >> Yeah, I didn't know that but I think it's a good idea. Unlike >> issues_sql_like() we don't issue anything to the subscriber. So maybe >> we need truncate logfile before insertion and verify logfile of >> particular period. The test code would be like follows. >> >> $node_subscriber->safe_psql('postgres', 'CREATE SUBSCRIPTION..."); >> truncate $node_subscriber->logfile, 0; >> $node_publisher->safe_psql('postgres', 'INSERT .. ') >> my $log = TestLib::slurp_file($node_subscriber->logfile); >> >> # Verify logs >> like($log, qr/processing remote data for replication target relation >> "public.test" column "b", remote type dummyint, local type dummyint/, >> 'callback function of datatype conversion1'); >> like($log, qr/processing remote data for replication target relation >> "public.test" column "a", remote type dummytext, local type >> dummytext/, 'callback function of datatype conversion2'); >> >> Thoughts? >> > > Attached an updated test patch added the above test(0002 patch). Since > for this test case it's enough to use existing test functions I didn't > create new test functions. Also I found that the local data type name > in log for data type conversion isn't qualified whereas the remote > data type is always qualified. Attached 0001 patch fixes that. > The original issue has been fixed and this entry on CommitFest has been marked as "Committed" but there are still works for improving testing. Perhaps I should register a new entry of remaining patches to next CommitFest. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
2018-04-11 10:16 GMT+09:00 Masahiko Sawada <sawada.mshk@gmail.com>:
The original issue has been fixed and this entry on CommitFest hasOn Mon, Mar 19, 2018 at 7:57 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Attached an updated test patch added the above test(0002 patch). Since
> for this test case it's enough to use existing test functions I didn't
> create new test functions. Also I found that the local data type name
> in log for data type conversion isn't qualified whereas the remote
> data type is always qualified. Attached 0001 patch fixes that.
>
been marked as "Committed" but there are still works for improving
testing. Perhaps I should register a new entry of remaining patches to
next CommitFest.
Thanks. I appreciate it.
---
Dang Minh Huong