Re: [GENERAL] Potential bug with pg_notify - Mailing list pgsql-general

From Adrian Klaver
Subject Re: [GENERAL] Potential bug with pg_notify
Date
Msg-id 8a4a64fb-bb5c-a22a-a884-f11250a5e741@aklaver.com
Whole thread Raw
In response to Re: [GENERAL] Potential bug with pg_notify  (François Beaulieu <Francois.Beaulieu@sbktelecom.com>)
Responses Re: [GENERAL] Potential bug with pg_notify  (François Beaulieu <Francois.Beaulieu@sbktelecom.com>)
List pgsql-general
On 02/13/2017 07:59 AM, François Beaulieu wrote:
>
>> On Feb 13, 2017, at 10:28 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>> On 02/10/2017 02:54 PM, François Beaulieu wrote:
>>>
>>> Hi all,
>>>
>>> I’m trying to feed a worker process on another server using pg_notify in a trigger. I’m running pgsql 9.4 and
hittingsome behaviour that I’m hoping is just a bug that can be solved with an upgrade, but I’m not finding any
referencesto it being a known bug and the uptime on my database server is critical so I can’t upgrade on a whim.
Basically,the trigger runs on every row insert and notifies me with a few fields from the row. This works perfectly
mostof the time, but every few hundred rows, the notify will return null values. Here are the details: 
>>>
>>> My database uses time-based partitioning (and subpartitioning) with pg_partman. This is done with a before insert
triggeron the main parent table. pg_partman creates subtables dynamically, so I can’t easily run my trigger on each
childtable. I also can’t run it after insert on my parent table because the row never makes it there. Thus, I ave
electedto run my trigger before insert on the parent table, and have named it so that it is alphabetically first and
willrun before the partitioning trigger. Works perfectly most of the time. Here are the trigger and associated plpgsql
function:
>>>
>>> —CODE---
>>> CREATE OR REPLACE FUNCTION notify_trigger() RETURNS trigger AS $$
>>> DECLARE
>>> BEGIN
>>> PERFORM pg_notify('watchers', TG_TABLE_NAME || ',' || NEW._id || ',' || NEW.userfield);
>>> RETURN new;
>>> END;
>>> $$ LANGUAGE plpgsql;
>>>
>>> CREATE TRIGGER aaa_notify BEFORE INSERT ON aaa FOR EACH ROW execute procedure notify_trigger();
>>> —/CODE—
>>>
>>> On my listener, every once in a while, this returns rows such as this:
>>>
>>> AAA,17947227,XXX
>>> AAA, ,
>>> AAA,17947229,ZZZ
>>>
>>> Notice the second line, which appears to have even the autoincrement ‘_id' empty. It would seem to match _id =
17947228and that row does exist in the table and has data populated for all fields. 
>>>
>>> Has anyone ever seen anything like this?
>>
>> Can't say I have. I think this is going to need more information:
>>
>> --
>> Adrian Klaver
>> adrian.klaver@aklaver.com
>
>> 1) What is the schema for the parent table?
>
>                                                          Table "public.cdr"
>   Column    |            Type             |                     Modifiers                     | Storage  | Stats
target| Descripti 
> on
>
-------------+-----------------------------+---------------------------------------------------+----------+--------------+----------
> ---
> _id         | bigint                      | not null default nextval('cdr__id_seq'::regclass) | plain    |
 | 
> calldate    | timestamp without time zone | not null                                          | plain    |
 | 
> callanswer  | timestamp without time zone |                                                   | plain    |
 | 
> callend     | timestamp without time zone | not null                                          | plain    |
 | 
> clid        | character varying(80)       |                                                   | extended |
 | 
> dnid        | character varying(80)       |                                                   | extended |
 | 
> src         | character varying(80)       |                                                   | extended |
 | 
> dst         | character varying(80)       | not null                                          | extended |
 | 
> dcontext    | character varying(80)       | not null                                          | extended |
 | 
> channel     | character varying(80)       | not null                                          | extended |
 | 
> dstchannel  | character varying(80)       |                                                   | extended |
 | 
> lastapp     | character varying(80)       |                                                   | extended |
 | 
> lastdata    | character varying(80)       |                                                   | extended |
 | 
> duration    | integer                     | not null                                          | plain    |
 | 
> billsec     | integer                     | not null                                          | plain    |
 | 
> disposition | character varying(45)       | not null                                          | extended |
 | 
> amaflags    | integer                     | not null                                          | plain    |
 | 
> accountcode | character varying(20)       |                                                   | extended |
 | 
> uniqueid    | character varying(150)      | not null                                          | extended |
 | 
> userfield   | character varying(255)      |                                                   | extended |
 | 
> Indexes:
>    "cdr_pkey" PRIMARY KEY, btree (_id, calldate)
>    "cdr__id_idx" btree (_id)
>    "cdr_accountcode_idx" btree (accountcode)
>    "cdr_amaflags_idx" btree (amaflags)
>    "cdr_calldate_idx" btree (calldate)
>    "cdr_clid_idx" btree (clid)
>    "cdr_disposition_idx" btree (disposition)
>    "cdr_dnid_idx" btree (dnid)
>    "cdr_dst_idx" btree (dst)
>    "cdr_src_idx" btree (src)
>    "cdr_uniqueid_idx" btree (uniqueid)
>    "cdr_userfield_idx" btree (userfield)
> Triggers:
>    cdr_notify BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE notify_trigger()
>    cdr_part_trig BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE cdr_part_trig_func()
> Child tables: cdr_p2016w06,
>              cdr_p2016w07,
>              cdr_p2016w08,
>              cdr_p2016w09,
>              cdr_p2016w10,
> etc…
>
>> 2) What is the pg_partman trigger function definition, as well the CREATE TRIGGER definition that calls the
function?
> cdr_part_trig BEFORE INSERT ON cdr FOR EACH ROW EXECUTE PROCEDURE cdr_part_trig_func()
>
> public | cdr_part_trig_func                | trigger          |                     | trigger | invoker  | volatile
|postgres | 
> plpgsql  |
  +| 
>        |                                   |                  |                     |         |          |
|         | 
>         |         BEGIN
 +| 
>        |                                   |                  |                     |         |          |
|         | 
>         |         IF TG_OP = 'INSERT' THEN
 +| 
>        |                                   |                  |                     |         |          |
|         | 
>         |             IF NEW.calldate >= '2017-02-13 00:00:00+00' AND NEW.calldate < '2017-02-20 00:00:00+00' THEN
 +| 
>        |                                   |                  |                     |         |          |
|         | 
>         |                 INSERT INTO public.cdr_p2017w07 VALUES (NEW.*);
 +| 
>        |                                   |                  |                     |         |
>         |             ELSIF NEW.calldate >= '2017-02-06 00:00:00+00' AND NEW.calldate < '2017-02-13 00:00:00+00' THEN
 +| 
>        |                                   |                  |                     |         |          |
|         | 
>     |                 INSERT INTO public.cdr_p2017w06 VALUES (NEW.*);
+|
>      […]
>        |                                   |                  |                     |         |          |
|         | 
>      |             ELSE
+|
>        |                                   |                  |                     |         |          |
|         | 
>     |                 RETURN NEW;
+|
>        |                                   |                  |                     |         |          |
|         | 
>     |             END IF;
+|
>        |                                   |                  |                     |         |          |
|         | 
>     |         END IF;
+|
>        |                                   |                  |                     |         |          |
|         | 
>     |         RETURN NULL;
+|
>        |                                   |                  |                     |         |          |
|         | 
>     |         END
|                                                                      | 
>
>> 3) Are the first row and the second row in the same partition?
>
> Doubtful, the problem occurs several times a day and we only have one partition a day. Let me check with the above
example.What would be the best way to determine which child a row is in, and the relative position in the child table? 

Looks like you are partitioning every 7 days, so could you not use the
calldate to figure out which child table it is in?

As to position, maybe ctid though it has caveats:

https://www.postgresql.org/docs/9.4/static/ddl-system-columns.html

>
> Also; my worker in written in perl and uses DBD::Pg. I haven’t been able to 100% eliminate the module itself as the
causeof the bug. Any suggestions on how I might go about doing that efficiently? 

What does the worker do?

What I know about Perl is that I do not know:)

I did notice one thing, in your table definition you have:

calldate    | timestamp without time zone

yet in the partition function specify an time zone offset(+00), UTC I
presume.

Could it be the module is not dealing with time zones correctly? Though
thinking about this that would seem to manifest a problem only around
the 7th day boundary. So put this down to thinking aloud.

>
> Thanks,
> -=François Beaulieu
> SBK Telecom
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: Vincent Veyron
Date:
Subject: Re: [GENERAL] Recorded PostgreSQL at 10TB and Beyond
Next
From: Frank van Vugt
Date:
Subject: Re: [GENERAL] intentional or oversight? pg_dump -c does not restore default priviliges on schema public