Thread: [GENERAL] Potential bug with pg_notify

[GENERAL] Potential bug with pg_notify

From
François Beaulieu
Date:
Hi all,

I’m trying to feed a worker process on another server using pg_notify in a trigger. I’m running pgsql 9.4 and hitting
somebehaviour that I’m hoping is just a bug that can be solved with an upgrade, but I’m not finding any references to
itbeing a known bug and the uptime on my database server is critical so I can’t upgrade on a whim. Basically, the
triggerruns on every row insert and notifies me with a few fields from the row. This works perfectly most of the time,
butevery few hundred rows, the notify will return null values. Here are the details:
 

My database uses time-based partitioning (and subpartitioning) with pg_partman. This is done with a before insert
triggeron the main parent table. pg_partman creates subtables dynamically, so I can’t easily run my trigger on each
childtable. I also can’t run it after insert on my parent table because the row never makes it there. Thus, I ave
electedto run my trigger before insert on the parent table, and have named it so that it is alphabetically first and
willrun before the partitioning trigger. Works perfectly most of the time. Here are the trigger and associated plpgsql
function:

—CODE---
CREATE OR REPLACE FUNCTION notify_trigger() RETURNS trigger AS $$
DECLARE
BEGIN
  PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || NEW._id || ',' || NEW.userfield);
  RETURN new;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER aaa_notify BEFORE INSERT ON aaa FOR EACH ROW execute procedure notify_trigger();
—/CODE—

On my listener, every once in a while, this returns rows such as this:

AAA,17947227,XXX
AAA, , 
AAA,17947229,ZZZ 

Notice the second line, which appears to have even the autoincrement ‘_id' empty. It would seem to match _id = 17947228
andthat row does exist in the table and has data populated for all fields. 
 

Has anyone ever seen anything like this?

Thanks,
-=François Beaulieu
SBK Telecom


Re: [GENERAL] Potential bug with pg_notify

From
Adrian Klaver
Date:
On 02/10/2017 02:54 PM, François Beaulieu wrote:
>
> Hi all,
>
> I’m trying to feed a worker process on another server using pg_notify in a trigger. I’m running pgsql 9.4 and hitting
somebehaviour that I’m hoping is just a bug that can be solved with an upgrade, but I’m not finding any references to
itbeing a known bug and the uptime on my database server is critical so I can’t upgrade on a whim. Basically, the
triggerruns on every row insert and notifies me with a few fields from the row. This works perfectly most of the time,
butevery few hundred rows, the notify will return null values. Here are the details: 
>
> My database uses time-based partitioning (and subpartitioning) with pg_partman. This is done with a before insert
triggeron the main parent table. pg_partman creates subtables dynamically, so I can’t easily run my trigger on each
childtable. I also can’t run it after insert on my parent table because the row never makes it there. Thus, I ave
electedto run my trigger before insert on the parent table, and have named it so that it is alphabetically first and
willrun before the partitioning trigger. Works perfectly most of the time. Here are the trigger and associated plpgsql
function:
>
> —CODE---
> CREATE OR REPLACE FUNCTION notify_trigger() RETURNS trigger AS $$
> DECLARE
> BEGIN
>   PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || NEW._id || ',' || NEW.userfield);
>   RETURN new;
> END;
> $$ LANGUAGE plpgsql;
>
> CREATE TRIGGER aaa_notify BEFORE INSERT ON aaa FOR EACH ROW execute procedure notify_trigger();
> —/CODE—
>
> On my listener, every once in a while, this returns rows such as this:
>
> AAA,17947227,XXX
> AAA, ,
> AAA,17947229,ZZZ
>
> Notice the second line, which appears to have even the autoincrement ‘_id' empty. It would seem to match _id =
17947228and that row does exist in the table and has data populated for all fields. 
>
> Has anyone ever seen anything like this?

Can't say I have. I think this is going to need more information:

1) What is the schema for the parent table?

2) What is the pg_partman trigger function definition, as well the
CREATE TRIGGER definition that calls the function?

3) Are the first row and the second row in the same partition?


Just wondering if the pg_partman partitioning is delaying the allocation
of _id from the sequence in a way that your notify_trigger() does not
get it in time. It is just that TG_TABLE_NAME and NEW.userfield are part
of the INSERT, while NEW._id is actually a request for information from
another object.

>
> Thanks,
> -=François Beaulieu
> SBK Telecom
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Potential bug with pg_notify

From
François Beaulieu
Date:
> On Feb 13, 2017, at 10:28 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> 
> On 02/10/2017 02:54 PM, François Beaulieu wrote:
>> 
>> Hi all,
>> 
>> I’m trying to feed a worker process on another server using pg_notify in a trigger. I’m running pgsql 9.4 and
hittingsome behaviour that I’m hoping is just a bug that can be solved with an upgrade, but I’m not finding any
referencesto it being a known bug and the uptime on my database server is critical so I can’t upgrade on a whim.
Basically,the trigger runs on every row insert and notifies me with a few fields from the row. This works perfectly
mostof the time, but every few hundred rows, the notify will return null values. Here are the details:
 
>> 
>> My database uses time-based partitioning (and subpartitioning) with pg_partman. This is done with a before insert
triggeron the main parent table. pg_partman creates subtables dynamically, so I can’t easily run my trigger on each
childtable. I also can’t run it after insert on my parent table because the row never makes it there. Thus, I ave
electedto run my trigger before insert on the parent table, and have named it so that it is alphabetically first and
willrun before the partitioning trigger. Works perfectly most of the time. Here are the trigger and associated plpgsql
function:
>> 
>> —CODE---
>> CREATE OR REPLACE FUNCTION notify_trigger() RETURNS trigger AS $$
>> DECLARE
>> BEGIN
>> PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || NEW._id || ',' || NEW.userfield);
>> RETURN new;
>> END;
>> $$ LANGUAGE plpgsql;
>> 
>> CREATE TRIGGER aaa_notify BEFORE INSERT ON aaa FOR EACH ROW execute procedure notify_trigger();
>> —/CODE—
>> 
>> On my listener, every once in a while, this returns rows such as this:
>> 
>> AAA,17947227,XXX
>> AAA, ,
>> AAA,17947229,ZZZ
>> 
>> Notice the second line, which appears to have even the autoincrement ‘_id' empty. It would seem to match _id =
17947228and that row does exist in the table and has data populated for all fields.
 
>> 
>> Has anyone ever seen anything like this?
> 
> Can't say I have. I think this is going to need more information:
> 
> -- 
> Adrian Klaver
> adrian.klaver@aklaver.com

> 1) What is the schema for the parent table?

                                                         Table "public.cdr"
  Column    |            Type             |                     Modifiers                     | Storage  | Stats target
|Descripti
 
on

-------------+-----------------------------+---------------------------------------------------+----------+--------------+----------
---
_id         | bigint                      | not null default nextval('cdr__id_seq'::regclass) | plain    |
|
calldate    | timestamp without time zone | not null                                          | plain    |
|
callanswer  | timestamp without time zone |                                                   | plain    |
|
callend     | timestamp without time zone | not null                                          | plain    |
|
clid        | character varying(80)       |                                                   | extended |
|
dnid        | character varying(80)       |                                                   | extended |
|
src         | character varying(80)       |                                                   | extended |
|
dst         | character varying(80)       | not null                                          | extended |
|
dcontext    | character varying(80)       | not null                                          | extended |
|
channel     | character varying(80)       | not null                                          | extended |
|
dstchannel  | character varying(80)       |                                                   | extended |
|
lastapp     | character varying(80)       |                                                   | extended |
|
lastdata    | character varying(80)       |                                                   | extended |
|
duration    | integer                     | not null                                          | plain    |
|
billsec     | integer                     | not null                                          | plain    |
|
disposition | character varying(45)       | not null                                          | extended |
|
amaflags    | integer                     | not null                                          | plain    |
|
accountcode | character varying(20)       |                                                   | extended |
|
uniqueid    | character varying(150)      | not null                                          | extended |
|
userfield   | character varying(255)      |                                                   | extended |
|
Indexes:
   "cdr_pkey" PRIMARY KEY, btree (_id, calldate)
   "cdr__id_idx" btree (_id)
   "cdr_accountcode_idx" btree (accountcode)
   "cdr_amaflags_idx" btree (amaflags)
   "cdr_calldate_idx" btree (calldate)
   "cdr_clid_idx" btree (clid)
   "cdr_disposition_idx" btree (disposition)
   "cdr_dnid_idx" btree (dnid)
   "cdr_dst_idx" btree (dst)
   "cdr_src_idx" btree (src)
   "cdr_uniqueid_idx" btree (uniqueid)
   "cdr_userfield_idx" btree (userfield)
Triggers:
   cdr_notify BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE notify_trigger()
   cdr_part_trig BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE cdr_part_trig_func()
Child tables: cdr_p2016w06,
             cdr_p2016w07,
             cdr_p2016w08,
             cdr_p2016w09,
             cdr_p2016w10,
etc…

> 2) What is the pg_partman trigger function definition, as well the CREATE TRIGGER definition that calls the
function?
cdr_part_trig BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE cdr_part_trig_func()

public | cdr_part_trig_func                | trigger          |                     | trigger | invoker  | volatile   |
postgres|
 
plpgsql  |
+|
 
       |                                   |                  |                     |         |          |            |
        |
 
        |         BEGIN
+|
       |                                   |                  |                     |         |          |            |
        |
 
        |         IF TG_OP = 'INSERT' THEN
+|
       |                                   |                  |                     |         |          |            |
        |
 
        |             IF NEW.calldate >= '2017-02-13 00:00:00+00' AND NEW.calldate < '2017-02-20 00:00:00+00' THEN
+|
       |                                   |                  |                     |         |          |            |
        |
 
        |                 INSERT INTO public.cdr_p2017w07 VALUES (NEW.*);
+|
       |                                   |                  |                     |         |
        |             ELSIF NEW.calldate >= '2017-02-06 00:00:00+00' AND NEW.calldate < '2017-02-13 00:00:00+00' THEN
+|
       |                                   |                  |                     |         |          |            |
        |          
 
    |                 INSERT INTO public.cdr_p2017w06 VALUES (NEW.*);                                               +|
     […]
       |                                   |                  |                     |         |          |            |
        |         
 
     |             ELSE
+|
       |                                   |                  |                     |         |          |            |
        |          
 
    |                 RETURN NEW;                                                                                   +|
       |                                   |                  |                     |         |          |            |
        |          
 
    |             END IF;                                                                                           +|
       |                                   |                  |                     |         |          |            |
        |          
 
    |         END IF;                                                                                               +|
       |                                   |                  |                     |         |          |            |
        |          
 
    |         RETURN NULL;                                                                                          +|
       |                                   |                  |                     |         |          |            |
        |          
 
    |         END                                                                                                    |
                                                                    |
 

> 3) Are the first row and the second row in the same partition?

Doubtful, the problem occurs several times a day and we only have one partition a day. Let me check with the above
example.What would be the best way to determine which child a row is in, and the relative position in the child table?
 

Also; my worker in written in perl and uses DBD::Pg. I haven’t been able to 100% eliminate the module itself as the
causeof the bug. Any suggestions on how I might go about doing that efficiently?
 

Thanks,
-=François Beaulieu
SBK Telecom



Re: [GENERAL] Potential bug with pg_notify

From
Adrian Klaver
Date:
On 02/13/2017 07:59 AM, François Beaulieu wrote:
>
>> On Feb 13, 2017, at 10:28 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>> On 02/10/2017 02:54 PM, François Beaulieu wrote:
>>>
>>> Hi all,
>>>
>>> I’m trying to feed a worker process on another server using pg_notify in a trigger. I’m running pgsql 9.4 and
hittingsome behaviour that I’m hoping is just a bug that can be solved with an upgrade, but I’m not finding any
referencesto it being a known bug and the uptime on my database server is critical so I can’t upgrade on a whim.
Basically,the trigger runs on every row insert and notifies me with a few fields from the row. This works perfectly
mostof the time, but every few hundred rows, the notify will return null values. Here are the details: 
>>>
>>> My database uses time-based partitioning (and subpartitioning) with pg_partman. This is done with a before insert
triggeron the main parent table. pg_partman creates subtables dynamically, so I can’t easily run my trigger on each
childtable. I also can’t run it after insert on my parent table because the row never makes it there. Thus, I ave
electedto run my trigger before insert on the parent table, and have named it so that it is alphabetically first and
willrun before the partitioning trigger. Works perfectly most of the time. Here are the trigger and associated plpgsql
function:
>>>
>>> —CODE---
>>> CREATE OR REPLACE FUNCTION notify_trigger() RETURNS trigger AS $$
>>> DECLARE
>>> BEGIN
>>> PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || NEW._id || ',' || NEW.userfield);
>>> RETURN new;
>>> END;
>>> $$ LANGUAGE plpgsql;
>>>
>>> CREATE TRIGGER aaa_notify BEFORE INSERT ON aaa FOR EACH ROW execute procedure notify_trigger();
>>> —/CODE—
>>>
>>> On my listener, every once in a while, this returns rows such as this:
>>>
>>> AAA,17947227,XXX
>>> AAA, ,
>>> AAA,17947229,ZZZ
>>>
>>> Notice the second line, which appears to have even the autoincrement ‘_id' empty. It would seem to match _id =
17947228and that row does exist in the table and has data populated for all fields. 
>>>
>>> Has anyone ever seen anything like this?
>>
>> Can't say I have. I think this is going to need more information:
>>
>> --
>> Adrian Klaver
>> adrian.klaver@aklaver.com
>
>> 1) What is the schema for the parent table?
>
>                                                          Table "public.cdr"
>   Column    |            Type             |                     Modifiers                     | Storage  | Stats
target| Descripti 
> on
>
-------------+-----------------------------+---------------------------------------------------+----------+--------------+----------
> ---
> _id         | bigint                      | not null default nextval('cdr__id_seq'::regclass) | plain    |
 | 
> calldate    | timestamp without time zone | not null                                          | plain    |
 | 
> callanswer  | timestamp without time zone |                                                   | plain    |
 | 
> callend     | timestamp without time zone | not null                                          | plain    |
 | 
> clid        | character varying(80)       |                                                   | extended |
 | 
> dnid        | character varying(80)       |                                                   | extended |
 | 
> src         | character varying(80)       |                                                   | extended |
 | 
> dst         | character varying(80)       | not null                                          | extended |
 | 
> dcontext    | character varying(80)       | not null                                          | extended |
 | 
> channel     | character varying(80)       | not null                                          | extended |
 | 
> dstchannel  | character varying(80)       |                                                   | extended |
 | 
> lastapp     | character varying(80)       |                                                   | extended |
 | 
> lastdata    | character varying(80)       |                                                   | extended |
 | 
> duration    | integer                     | not null                                          | plain    |
 | 
> billsec     | integer                     | not null                                          | plain    |
 | 
> disposition | character varying(45)       | not null                                          | extended |
 | 
> amaflags    | integer                     | not null                                          | plain    |
 | 
> accountcode | character varying(20)       |                                                   | extended |
 | 
> uniqueid    | character varying(150)      | not null                                          | extended |
 | 
> userfield   | character varying(255)      |                                                   | extended |
 | 
> Indexes:
>    "cdr_pkey" PRIMARY KEY, btree (_id, calldate)
>    "cdr__id_idx" btree (_id)
>    "cdr_accountcode_idx" btree (accountcode)
>    "cdr_amaflags_idx" btree (amaflags)
>    "cdr_calldate_idx" btree (calldate)
>    "cdr_clid_idx" btree (clid)
>    "cdr_disposition_idx" btree (disposition)
>    "cdr_dnid_idx" btree (dnid)
>    "cdr_dst_idx" btree (dst)
>    "cdr_src_idx" btree (src)
>    "cdr_uniqueid_idx" btree (uniqueid)
>    "cdr_userfield_idx" btree (userfield)
> Triggers:
>    cdr_notify BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE notify_trigger()
>    cdr_part_trig BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE cdr_part_trig_func()
> Child tables: cdr_p2016w06,
>              cdr_p2016w07,
>              cdr_p2016w08,
>              cdr_p2016w09,
>              cdr_p2016w10,
> etc…
>
>> 2) What is the pg_partman trigger function definition, as well the CREATE TRIGGER definition that calls the
function?
> cdr_part_trig BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE cdr_part_trig_func()
>
> public | cdr_part_trig_func                | trigger          |                     | trigger | invoker  | volatile
|postgres | 
> plpgsql  |
  +| 
>        |                                   |                  |                     |         |          |
|         | 
>         |         BEGIN
 +| 
>        |                                   |                  |                     |         |          |
|         | 
>         |         IF TG_OP = 'INSERT' THEN
 +| 
>        |                                   |                  |                     |         |          |
|         | 
>         |             IF NEW.calldate >= '2017-02-13 00:00:00+00' AND NEW.calldate < '2017-02-20 00:00:00+00' THEN
 +| 
>        |                                   |                  |                     |         |          |
|         | 
>         |                 INSERT INTO public.cdr_p2017w07 VALUES (NEW.*);
 +| 
>        |                                   |                  |                     |         |
>         |             ELSIF NEW.calldate >= '2017-02-06 00:00:00+00' AND NEW.calldate < '2017-02-13 00:00:00+00' THEN
 +| 
>        |                                   |                  |                     |         |          |
|         | 
>     |                 INSERT INTO public.cdr_p2017w06 VALUES (NEW.*);
+|
>      […]
>        |                                   |                  |                     |         |          |
|         | 
>      |             ELSE
+|
>        |                                   |                  |                     |         |          |
|         | 
>     |                 RETURN NEW;
+|
>        |                                   |                  |                     |         |          |
|         | 
>     |             END IF;
+|
>        |                                   |                  |                     |         |          |
|         | 
>     |         END IF;
+|
>        |                                   |                  |                     |         |          |
|         | 
>     |         RETURN NULL;
+|
>        |                                   |                  |                     |         |          |
|         | 
>     |         END
|                                                                      | 
>
>> 3) Are the first row and the second row in the same partition?
>
> Doubtful, the problem occurs several times a day and we only have one partition a day. Let me check with the above
example.What would be the best way to determine which child a row is in, and the relative position in the child table? 

Looks like you are partitioning every 7 days, so could you not use the
calldate to figure out which child table it is in?

As to position, maybe ctid though it has caveats:

https://www.postgresql.org/docs/9.4/static/ddl-system-columns.html

>
> Also; my worker in written in perl and uses DBD::Pg. I haven’t been able to 100% eliminate the module itself as the
causeof the bug. Any suggestions on how I might go about doing that efficiently? 

What does the worker do?

What I know about Perl is that I do not know:)

I did notice one thing, in your table definition you have:

calldate    | timestamp without time zone

yet in the partition function specify an time zone offset(+00), UTC I
presume.

Could it be the module is not dealing with time zones correctly? Though
thinking about this that would seem to manifest a problem only around
the 7th day boundary. So put this down to thinking aloud.

>
> Thanks,
> -=François Beaulieu
> SBK Telecom
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Potential bug with pg_notify

From
François Beaulieu
Date:
> On Feb 13, 2017, at 11:45 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> 
> On 02/13/2017 07:59 AM, François Beaulieu wrote:
>> 
>>> On Feb 13, 2017, at 10:28 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>> 
>>> On 02/10/2017 02:54 PM, François Beaulieu wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> I’m trying to feed a worker process on another server using pg_notify in a trigger. I’m running pgsql 9.4 and
hittingsome behaviour that I’m hoping is just a bug that can be solved with an upgrade, but I’m not finding any
referencesto it being a known bug and the uptime on my database server is critical so I can’t upgrade on a whim.
Basically,the trigger runs on every row insert and notifies me with a few fields from the row. This works perfectly
mostof the time, but every few hundred rows, the notify will return null values. Here are the details:
 
>>>> 
>>>> My database uses time-based partitioning (and subpartitioning) with pg_partman. This is done with a before insert
triggeron the main parent table. pg_partman creates subtables dynamically, so I can’t easily run my trigger on each
childtable. I also can’t run it after insert on my parent table because the row never makes it there. Thus, I ave
electedto run my trigger before insert on the parent table, and have named it so that it is alphabetically first and
willrun before the partitioning trigger. Works perfectly most of the time. Here are the trigger and associated plpgsql
function:
>>>> 
>>>> —CODE---
>>>> CREATE OR REPLACE FUNCTION notify_trigger() RETURNS trigger AS $$
>>>> DECLARE
>>>> BEGIN
>>>> PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || NEW._id || ',' || NEW.userfield);
>>>> RETURN new;
>>>> END;
>>>> $$ LANGUAGE plpgsql;
>>>> 
>>>> CREATE TRIGGER aaa_notify BEFORE INSERT ON aaa FOR EACH ROW execute procedure notify_trigger();
>>>> —/CODE—
>>>> 
>>>> On my listener, every once in a while, this returns rows such as this:
>>>> 
>>>> AAA,17947227,XXX
>>>> AAA, ,
>>>> AAA,17947229,ZZZ
>>>> 
>>>> Notice the second line, which appears to have even the autoincrement ‘_id' empty. It would seem to match _id =
17947228and that row does exist in the table and has data populated for all fields.
 
>>>> 
>>>> Has anyone ever seen anything like this?
>>> 
>>> Can't say I have. I think this is going to need more information:
>>> 
>>> --
>>> Adrian Klaver
>>> adrian.klaver@aklaver.com
>> 
>>> 1) What is the schema for the parent table?
>> 
>>                                                         Table "public.cdr"
>>  Column    |            Type             |                     Modifiers                     | Storage  | Stats
target| Descripti
 
>> on
>>
-------------+-----------------------------+---------------------------------------------------+----------+--------------+----------
>> ---
>> _id         | bigint                      | not null default nextval('cdr__id_seq'::regclass) | plain    |
  |
 
>> calldate    | timestamp without time zone | not null                                          | plain    |
  |
 
>> callanswer  | timestamp without time zone |                                                   | plain    |
  |
 
>> callend     | timestamp without time zone | not null                                          | plain    |
  |
 
>> clid        | character varying(80)       |                                                   | extended |
  |
 
>> dnid        | character varying(80)       |                                                   | extended |
  |
 
>> src         | character varying(80)       |                                                   | extended |
  |
 
>> dst         | character varying(80)       | not null                                          | extended |
  |
 
>> dcontext    | character varying(80)       | not null                                          | extended |
  |
 
>> channel     | character varying(80)       | not null                                          | extended |
  |
 
>> dstchannel  | character varying(80)       |                                                   | extended |
  |
 
>> lastapp     | character varying(80)       |                                                   | extended |
  |
 
>> lastdata    | character varying(80)       |                                                   | extended |
  |
 
>> duration    | integer                     | not null                                          | plain    |
  |
 
>> billsec     | integer                     | not null                                          | plain    |
  |
 
>> disposition | character varying(45)       | not null                                          | extended |
  |
 
>> amaflags    | integer                     | not null                                          | plain    |
  |
 
>> accountcode | character varying(20)       |                                                   | extended |
  |
 
>> uniqueid    | character varying(150)      | not null                                          | extended |
  |
 
>> userfield   | character varying(255)      |                                                   | extended |
  |
 
>> Indexes:
>>   "cdr_pkey" PRIMARY KEY, btree (_id, calldate)
>>   "cdr__id_idx" btree (_id)
>>   "cdr_accountcode_idx" btree (accountcode)
>>   "cdr_amaflags_idx" btree (amaflags)
>>   "cdr_calldate_idx" btree (calldate)
>>   "cdr_clid_idx" btree (clid)
>>   "cdr_disposition_idx" btree (disposition)
>>   "cdr_dnid_idx" btree (dnid)
>>   "cdr_dst_idx" btree (dst)
>>   "cdr_src_idx" btree (src)
>>   "cdr_uniqueid_idx" btree (uniqueid)
>>   "cdr_userfield_idx" btree (userfield)
>> Triggers:
>>   cdr_notify BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE notify_trigger()
>>   cdr_part_trig BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE cdr_part_trig_func()
>> Child tables: cdr_p2016w06,
>>             cdr_p2016w07,
>>             cdr_p2016w08,
>>             cdr_p2016w09,
>>             cdr_p2016w10,
>> etc…
>> 
>>> 2) What is the pg_partman trigger function definition, as well the CREATE TRIGGER definition that calls the
function?
>> cdr_part_trig BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE cdr_part_trig_func()
>> 
>> public | cdr_part_trig_func                | trigger          |                     | trigger | invoker  | volatile
| postgres |
 
>> plpgsql  |
   +|
 
>>       |                                   |                  |                     |         |          |
|         |
 
>>        |         BEGIN
 +|
 
>>       |                                   |                  |                     |         |          |
|         |
 
>>        |         IF TG_OP = 'INSERT' THEN
 +|
 
>>       |                                   |                  |                     |         |          |
|         |
 
>>        |             IF NEW.calldate >= '2017-02-13 00:00:00+00' AND NEW.calldate < '2017-02-20 00:00:00+00' THEN
 +|
 
>>       |                                   |                  |                     |         |          |
|         |
 
>>        |                 INSERT INTO public.cdr_p2017w07 VALUES (NEW.*);
 +|
 
>>       |                                   |                  |                     |         |
>>        |             ELSIF NEW.calldate >= '2017-02-06 00:00:00+00' AND NEW.calldate < '2017-02-13 00:00:00+00' THEN
 +|
 
>>       |                                   |                  |                     |         |          |
|         |
 
>>     |                 INSERT INTO public.cdr_p2017w06 VALUES (NEW.*);
+|
>>     […]
>>       |                                   |                  |                     |         |          |
|         |
 
>>      |             ELSE
+|
>>       |                                   |                  |                     |         |          |
|         |
 
>>     |                 RETURN NEW;
+|
>>       |                                   |                  |                     |         |          |
|         |
 
>>     |             END IF;
+|
>>       |                                   |                  |                     |         |          |
|         |
 
>>     |         END IF;
+|
>>       |                                   |                  |                     |         |          |
|         |
 
>>     |         RETURN NULL;
+|
>>       |                                   |                  |                     |         |          |
|         |
 
>>     |         END
|                                                                      |
 
>> 
>>> 3) Are the first row and the second row in the same partition?
>> 
>> Doubtful, the problem occurs several times a day and we only have one partition a day. Let me check with the above
example.What would be the best way to determine which child a row is in, and the relative position in the child table?
 
> 
> As to position, maybe ctid though it has caveats:

The three rows in my example return a ctid of (742,17), (742,18) and (742,19) respectively, in their child table. So,
probablynot at a partition boundary.
 

>> Also; my worker in written in perl and uses DBD::Pg. I haven’t been able to 100% eliminate the module itself as the
causeof the bug. Any suggestions on how I might go about doing that efficiently?
 
> 
> What does the worker do?

Sorry, that's my employer’s classified IP. :-) 
Does it matter?

> Could it be the module is not dealing with time zones correctly? Though thinking about this that would seem to
manifesta problem only around the 7th day boundary. So put this down to thinking aloud.
 

No, the partitioning scheme seems to be respecting the timezone properly, and my issue is happening every few hours in
themiddle of the day and we’re in UTC+5, so not near the end of the day in UTC. Besides, I believe timestamp without
timezoneassumes the local timezone of the server, which is set to UTC anyway.
 

Has the schema eliminated your original theory regarding the delaying of the generation of the _id? I don’t think that
wouldnormally be an issue that occurs sporadically and the _id seems to be part of the INSERT, which would indicate
that,as it should, it’s done generating before my trigger is called.
 

Thanks,
-=François Beaulieu
SBK Telecom

Re: [GENERAL] Potential bug with pg_notify

From
Adrian Klaver
Date:
On 02/13/2017 09:04 AM, François Beaulieu wrote:
>
>> On Feb 13, 2017, at 11:45 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
                                                                      |
>>>
>>>> 3) Are the first row and the second row in the same partition?
>>>
>>> Doubtful, the problem occurs several times a day and we only have one partition a day. Let me check with the above
example.What would be the best way to determine which child a row is in, and the relative position in the child table? 
>>
>> As to position, maybe ctid though it has caveats:
>
> The three rows in my example return a ctid of (742,17), (742,18) and (742,19) respectively, in their child table. So,
probablynot at a partition boundary. 
>
>>> Also; my worker in written in perl and uses DBD::Pg. I haven’t been able to 100% eliminate the module itself as the
causeof the bug. Any suggestions on how I might go about doing that efficiently? 
>>
>> What does the worker do?
>
> Sorry, that's my employer’s classified IP. :-)
> Does it matter?

Only that it makes it harder to give any suggestions on eliminating it
as a source of error if it is a black box.  I don't think, at this
point, it is necessary to see the actual source. If it is possible a
high level synopsis of what it does might be sufficient.

>
>> Could it be the module is not dealing with time zones correctly? Though thinking about this that would seem to
manifesta problem only around the 7th day boundary. So put this down to thinking aloud. 
>
> No, the partitioning scheme seems to be respecting the timezone properly, and my issue is happening every few hours
inthe middle of the day and we’re in UTC+5, so not near the end of the day in UTC. Besides, I believe timestamp without
timezoneassumes the local timezone of the server, which is set to UTC anyway. 
>
> Has the schema eliminated your original theory regarding the delaying of the generation of the _id? I don’t think
thatwould normally be an issue that occurs sporadically and the _id seems to be part of the INSERT, which would
indicatethat, as it should, it’s done generating before my trigger is called. 

I don't see anything that would explain a delay. Still the fact remains
that in most cases the notify captures the _id, but in some cases it
does not. Going back to your OP I realized I missed that the
NEW.userfield was also not coming through. So that seems to confirm that
pg_notify() is firing before it gets access to NEW.*. Having said that I
have no idea why?

The only thing I can think to do is(untested):

CREATE OR REPLACE FUNCTION notify_trigger() RETURNS trigger AS $$
DECLARE
    _newid integer;
BEGIN
   SELECT NEW._id INTO _newid;
   IF _newid IS NULL OR NOT FOUND THEN
    RAISE NOTICE 'NEW._id is NULL/NOT FOUND';
    pg_sleep(0.1); --Or whatever interval you want.
   END IF;
   PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || NEW._id|| ','
|| NEW.userfield);
   RETURN new;
END;
$$ LANGUAGE plpgsql;

Not really a solution but it might help determine whether it is a timing
issue. Also this is probably something that should be done on a test
server to be safe.


>
> Thanks,
> -=François Beaulieu
> SBK Telecom
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: [GENERAL] Potential bug with pg_notify

From
François Beaulieu
Date:
> On Feb 13, 2017, at 1:56 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> 
> On 02/13/2017 09:04 AM, François Beaulieu wrote:
>> 
>>> On Feb 13, 2017, at 11:45 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>> 
>                                                                     |
>>>> 
>>>>> 3) Are the first row and the second row in the same partition?
>>>> 
>>>> Doubtful, the problem occurs several times a day and we only have one partition a day. Let me check with the above
example.What would be the best way to determine which child a row is in, and the relative position in the child table?
 
>>> 
>>> As to position, maybe ctid though it has caveats:
>> 
>> The three rows in my example return a ctid of (742,17), (742,18) and (742,19) respectively, in their child table.
So,probably not at a partition boundary.
 
>> 
>>>> Also; my worker in written in perl and uses DBD::Pg. I haven’t been able to 100% eliminate the module itself as
thecause of the bug. Any suggestions on how I might go about doing that efficiently?
 
>>> 
>>> What does the worker do?
>> 
>> Sorry, that's my employer’s classified IP. :-)
>> Does it matter?
> 
> Only that it makes it harder to give any suggestions on eliminating it as a source of error if it is a black box.  I
don'tthink, at this point, it is necessary to see the actual source. If it is possible a high level synopsis of what it
doesmight be sufficient.
 

Suffice it to say the worker uses a read-only connection to the database to receive these notices and to query the
tablefor the matching rows in certain circumstances. It never modifies the database in any way; it only uses this
informationto act upon a completely different subsystem. I loosely based it on a snippet of code from this very mailing
list:

https://www.postgresql.org/message-id/20050104031937.GA80695@winnie.fuhr.org

>>> Could it be the module is not dealing with time zones correctly? Though thinking about this that would seem to
manifesta problem only around the 7th day boundary. So put this down to thinking aloud.
 
>> 
>> No, the partitioning scheme seems to be respecting the timezone properly, and my issue is happening every few hours
inthe middle of the day and we’re in UTC+5, so not near the end of the day in UTC. Besides, I believe timestamp without
timezoneassumes the local timezone of the server, which is set to UTC anyway.
 
>> 
>> Has the schema eliminated your original theory regarding the delaying of the generation of the _id? I don’t think
thatwould normally be an issue that occurs sporadically and the _id seems to be part of the INSERT, which would
indicatethat, as it should, it’s done generating before my trigger is called.
 
> 
> I don't see anything that would explain a delay. Still the fact remains that in most cases the notify captures the
_id,but in some cases it does not. Going back to your OP I realized I missed that the NEW.userfield was also not coming
through.So that seems to confirm that pg_notify() is firing before it gets access to NEW.*. Having said that I have no
ideawhy?
 
> 
> The only thing I can think to do is(untested):
> 
> CREATE OR REPLACE FUNCTION notify_trigger() RETURNS trigger AS $$
> DECLARE
>     _newid integer;
> BEGIN
>  SELECT NEW._id INTO _newid;
>  IF _newid IS NULL OR NOT FOUND THEN
>     RAISE NOTICE 'NEW._id is NULL/NOT FOUND';
>     pg_sleep(0.1); --Or whatever interval you want.
>  END IF;
>  PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || NEW._id|| ',' || NEW.userfield);
>  RETURN new;
> END;
> $$ LANGUAGE plpgsql;
> 
> Not really a solution but it might help determine whether it is a timing issue. Also this is probably something that
shouldbe done on a test server to be safe.
 

Thanks for the tip, I’ll try that in my lab. I want to try to replicate the issue more consistently first, so that my
testsafter the change will be more conclusive. 
 

In the meantime, if anyone has any other suggestions, please don’t hesitate.


Thanks,
-=François Beaulieu
SBK Telecom


Re: [GENERAL] Potential bug with pg_notify

From
Adrian Klaver
Date:
On 02/13/2017 11:50 AM, François Beaulieu wrote:
>
>> On Feb 13, 2017, at 1:56 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>> On 02/13/2017 09:04 AM, François Beaulieu wrote:
>>>
>>>> On Feb 13, 2017, at 11:45 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>>>
>>                                                                     |
>>>>>
>>>>>> 3) Are the first row and the second row in the same partition?
>>>>>
>>>>> Doubtful, the problem occurs several times a day and we only have one partition a day. Let me check with the
aboveexample. What would be the best way to determine which child a row is in, and the relative position in the child
table?
>>>>
>>>> As to position, maybe ctid though it has caveats:
>>>
>>> The three rows in my example return a ctid of (742,17), (742,18) and (742,19) respectively, in their child table.
So,probably not at a partition boundary. 
>>>
>>>>> Also; my worker in written in perl and uses DBD::Pg. I haven’t been able to 100% eliminate the module itself as
thecause of the bug. Any suggestions on how I might go about doing that efficiently? 
>>>>
>>>> What does the worker do?
>>>
>>> Sorry, that's my employer’s classified IP. :-)
>>> Does it matter?
>>
>> Only that it makes it harder to give any suggestions on eliminating it as a source of error if it is a black box.  I
don'tthink, at this point, it is necessary to see the actual source. If it is possible a high level synopsis of what it
doesmight be sufficient. 
>
> Suffice it to say the worker uses a read-only connection to the database to receive these notices and to query the
tablefor the matching rows in certain circumstances. It never modifies the database in any way; it only uses this
informationto act upon a completely different subsystem. I loosely based it on a snippet of code from this very mailing
list:
>
> https://www.postgresql.org/message-id/20050104031937.GA80695@winnie.fuhr.org
>
>>>> Could it be the module is not dealing with time zones correctly? Though thinking about this that would seem to
manifesta problem only around the 7th day boundary. So put this down to thinking aloud. 
>>>
>>> No, the partitioning scheme seems to be respecting the timezone properly, and my issue is happening every few hours
inthe middle of the day and we’re in UTC+5, so not near the end of the day in UTC. Besides, I believe timestamp without
timezoneassumes the local timezone of the server, which is set to UTC anyway. 
>>>
>>> Has the schema eliminated your original theory regarding the delaying of the generation of the _id? I don’t think
thatwould normally be an issue that occurs sporadically and the _id seems to be part of the INSERT, which would
indicatethat, as it should, it’s done generating before my trigger is called. 
>>
>> I don't see anything that would explain a delay. Still the fact remains that in most cases the notify captures the
_id,but in some cases it does not. Going back to your OP I realized I missed that the NEW.userfield was also not coming
through.So that seems to confirm that pg_notify() is firing before it gets access to NEW.*. Having said that I have no
ideawhy? 
>>
>> The only thing I can think to do is(untested):
>>
>> CREATE OR REPLACE FUNCTION notify_trigger() RETURNS trigger AS $$
>> DECLARE
>>     _newid integer;
>> BEGIN
>>  SELECT NEW._id INTO _newid;
>>  IF _newid IS NULL OR NOT FOUND THEN
>>     RAISE NOTICE 'NEW._id is NULL/NOT FOUND';
>>     pg_sleep(0.1); --Or whatever interval you want.
>>  END IF;
>>  PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || NEW._id|| ',' || NEW.userfield);
>>  RETURN new;
>> END;
>> $$ LANGUAGE plpgsql;
>>
>> Not really a solution but it might help determine whether it is a timing issue. Also this is probably something that
shouldbe done on a test server to be safe. 
>
> Thanks for the tip, I’ll try that in my lab. I want to try to replicate the issue more consistently first, so that my
testsafter the change will be more conclusive.  
>
> In the meantime, if anyone has any other suggestions, please don’t hesitate.

The only thing I can come up with is in your test lab once you
replicate the issue crank up the logging level:

https://www.postgresql.org/docs/9.4/static/runtime-config-logging.html#RUNTIME-CONFIG-SEVERITY-LEVELS

log_min_messages

to see if more detail sheds any light.

>
>
> Thanks,
> -=François Beaulieu
> SBK Telecom
>


--
Adrian Klaver
adrian.klaver@aklaver.com