Re: [GENERAL] Potential bug with pg_notify - Mailing list pgsql-general

From François Beaulieu
Subject Re: [GENERAL] Potential bug with pg_notify
Date
Msg-id EF14329B-FF19-46E2-A3BF-FDEB4BB0E92E@sbktelecom.com
Whole thread Raw
In response to Re: [GENERAL] Potential bug with pg_notify  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses Re: [GENERAL] Potential bug with pg_notify  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
> On Feb 13, 2017, at 11:45 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> 
> On 02/13/2017 07:59 AM, François Beaulieu wrote:
>> 
>>> On Feb 13, 2017, at 10:28 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>> 
>>> On 02/10/2017 02:54 PM, François Beaulieu wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> I’m trying to feed a worker process on another server using pg_notify in a trigger. I’m running pgsql 9.4 and
hittingsome behaviour that I’m hoping is just a bug that can be solved with an upgrade, but I’m not finding any
referencesto it being a known bug and the uptime on my database server is critical so I can’t upgrade on a whim.
Basically,the trigger runs on every row insert and notifies me with a few fields from the row. This works perfectly
mostof the time, but every few hundred rows, the notify will return null values. Here are the details:
 
>>>> 
>>>> My database uses time-based partitioning (and subpartitioning) with pg_partman. This is done with a before insert
triggeron the main parent table. pg_partman creates subtables dynamically, so I can’t easily run my trigger on each
childtable. I also can’t run it after insert on my parent table because the row never makes it there. Thus, I ave
electedto run my trigger before insert on the parent table, and have named it so that it is alphabetically first and
willrun before the partitioning trigger. Works perfectly most of the time. Here are the trigger and associated plpgsql
function:
>>>> 
>>>> —CODE---
>>>> CREATE OR REPLACE FUNCTION notify_trigger() RETURNS trigger AS $$
>>>> DECLARE
>>>> BEGIN
>>>> PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || NEW._id || ',' || NEW.userfield);
>>>> RETURN new;
>>>> END;
>>>> $$ LANGUAGE plpgsql;
>>>> 
>>>> CREATE TRIGGER aaa_notify BEFORE INSERT ON aaa FOR EACH ROW execute procedure notify_trigger();
>>>> —/CODE—
>>>> 
>>>> On my listener, every once in a while, this returns rows such as this:
>>>> 
>>>> AAA,17947227,XXX
>>>> AAA, ,
>>>> AAA,17947229,ZZZ
>>>> 
>>>> Notice the second line, which appears to have even the autoincrement ‘_id' empty. It would seem to match _id =
17947228and that row does exist in the table and has data populated for all fields.
 
>>>> 
>>>> Has anyone ever seen anything like this?
>>> 
>>> Can't say I have. I think this is going to need more information:
>>> 
>>> --
>>> Adrian Klaver
>>> adrian.klaver@aklaver.com
>> 
>>> 1) What is the schema for the parent table?
>> 
>>                                                         Table "public.cdr"
>>  Column    |            Type             |                     Modifiers                     | Storage  | Stats
target| Descripti
 
>> on
>>
-------------+-----------------------------+---------------------------------------------------+----------+--------------+----------
>> ---
>> _id         | bigint                      | not null default nextval('cdr__id_seq'::regclass) | plain    |
  |
 
>> calldate    | timestamp without time zone | not null                                          | plain    |
  |
 
>> callanswer  | timestamp without time zone |                                                   | plain    |
  |
 
>> callend     | timestamp without time zone | not null                                          | plain    |
  |
 
>> clid        | character varying(80)       |                                                   | extended |
  |
 
>> dnid        | character varying(80)       |                                                   | extended |
  |
 
>> src         | character varying(80)       |                                                   | extended |
  |
 
>> dst         | character varying(80)       | not null                                          | extended |
  |
 
>> dcontext    | character varying(80)       | not null                                          | extended |
  |
 
>> channel     | character varying(80)       | not null                                          | extended |
  |
 
>> dstchannel  | character varying(80)       |                                                   | extended |
  |
 
>> lastapp     | character varying(80)       |                                                   | extended |
  |
 
>> lastdata    | character varying(80)       |                                                   | extended |
  |
 
>> duration    | integer                     | not null                                          | plain    |
  |
 
>> billsec     | integer                     | not null                                          | plain    |
  |
 
>> disposition | character varying(45)       | not null                                          | extended |
  |
 
>> amaflags    | integer                     | not null                                          | plain    |
  |
 
>> accountcode | character varying(20)       |                                                   | extended |
  |
 
>> uniqueid    | character varying(150)      | not null                                          | extended |
  |
 
>> userfield   | character varying(255)      |                                                   | extended |
  |
 
>> Indexes:
>>   "cdr_pkey" PRIMARY KEY, btree (_id, calldate)
>>   "cdr__id_idx" btree (_id)
>>   "cdr_accountcode_idx" btree (accountcode)
>>   "cdr_amaflags_idx" btree (amaflags)
>>   "cdr_calldate_idx" btree (calldate)
>>   "cdr_clid_idx" btree (clid)
>>   "cdr_disposition_idx" btree (disposition)
>>   "cdr_dnid_idx" btree (dnid)
>>   "cdr_dst_idx" btree (dst)
>>   "cdr_src_idx" btree (src)
>>   "cdr_uniqueid_idx" btree (uniqueid)
>>   "cdr_userfield_idx" btree (userfield)
>> Triggers:
>>   cdr_notify BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE notify_trigger()
>>   cdr_part_trig BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE cdr_part_trig_func()
>> Child tables: cdr_p2016w06,
>>             cdr_p2016w07,
>>             cdr_p2016w08,
>>             cdr_p2016w09,
>>             cdr_p2016w10,
>> etc…
>> 
>>> 2) What is the pg_partman trigger function definition, as well the CREATE TRIGGER definition that calls the
function?
>> cdr_part_trig BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE cdr_part_trig_func()
>> 
>> public | cdr_part_trig_func                | trigger          |                     | trigger | invoker  | volatile
| postgres |
 
>> plpgsql  |
   +|
 
>>       |                                   |                  |                     |         |          |
|         |
 
>>        |         BEGIN
 +|
 
>>       |                                   |                  |                     |         |          |
|         |
 
>>        |         IF TG_OP = 'INSERT' THEN
 +|
 
>>       |                                   |                  |                     |         |          |
|         |
 
>>        |             IF NEW.calldate >= '2017-02-13 00:00:00+00' AND NEW.calldate < '2017-02-20 00:00:00+00' THEN
 +|
 
>>       |                                   |                  |                     |         |          |
|         |
 
>>        |                 INSERT INTO public.cdr_p2017w07 VALUES (NEW.*);
 +|
 
>>       |                                   |                  |                     |         |
>>        |             ELSIF NEW.calldate >= '2017-02-06 00:00:00+00' AND NEW.calldate < '2017-02-13 00:00:00+00' THEN
 +|
 
>>       |                                   |                  |                     |         |          |
|         |
 
>>     |                 INSERT INTO public.cdr_p2017w06 VALUES (NEW.*);
+|
>>     […]
>>       |                                   |                  |                     |         |          |
|         |
 
>>      |             ELSE
+|
>>       |                                   |                  |                     |         |          |
|         |
 
>>     |                 RETURN NEW;
+|
>>       |                                   |                  |                     |         |          |
|         |
 
>>     |             END IF;
+|
>>       |                                   |                  |                     |         |          |
|         |
 
>>     |         END IF;
+|
>>       |                                   |                  |                     |         |          |
|         |
 
>>     |         RETURN NULL;
+|
>>       |                                   |                  |                     |         |          |
|         |
 
>>     |         END
|                                                                      |
 
>> 
>>> 3) Are the first row and the second row in the same partition?
>> 
>> Doubtful, the problem occurs several times a day and we only have one partition a day. Let me check with the above
example.What would be the best way to determine which child a row is in, and the relative position in the child table?
 
> 
> As to position, maybe ctid though it has caveats:

The three rows in my example return a ctid of (742,17), (742,18) and (742,19) respectively, in their child table. So,
probablynot at a partition boundary.
 

>> Also; my worker in written in perl and uses DBD::Pg. I haven’t been able to 100% eliminate the module itself as the
causeof the bug. Any suggestions on how I might go about doing that efficiently?
 
> 
> What does the worker do?

Sorry, that's my employer’s classified IP. :-) 
Does it matter?

> Could it be the module is not dealing with time zones correctly? Though thinking about this that would seem to
manifesta problem only around the 7th day boundary. So put this down to thinking aloud.
 

No, the partitioning scheme seems to be respecting the timezone properly, and my issue is happening every few hours in
themiddle of the day and we’re in UTC+5, so not near the end of the day in UTC. Besides, I believe timestamp without
timezoneassumes the local timezone of the server, which is set to UTC anyway.
 

Has the schema eliminated your original theory regarding the delaying of the generation of the _id? I don’t think that
wouldnormally be an issue that occurs sporadically and the _id seems to be part of the INSERT, which would indicate
that,as it should, it’s done generating before my trigger is called.
 

Thanks,
-=François Beaulieu
SBK Telecom

pgsql-general by date:

Previous
From: Stephen Frost
Date:
Subject: Re: [GENERAL] intentional or oversight? pg_dump -c does not restoredefault priviliges on schema public
Next
From: David Hinkle
Date:
Subject: [GENERAL] Bad planning data resulting in OOM killing of postgres