Thread: stored procedure suddenly runs slowly in HOT STANDBY but fast in primary

We have two database servers running streaming replication between them:

Primary server
==============
OS: Linux version 2.6.18-371.3.1.el5
PostgreSQL: 9.1.11


HOT standby server
==================
OS: Linux version 2.6.32-431.11.2.el6.x86_64
PostgreSQL: 9.1.11

Since July 1, one SP suddenly runs slowly in HOT STANDBY server. After
investigation, I can narrow the problem to one particular query in SP. The
weird things are:

(1) The SP takes about 25 seconds to run in HOT STANDBY only, but only 0.5
second in primary
(2) If I extract the query in the SP and run it in a psql session, it runs
fine even in HOT STANDBY
(3) The SP is:
CREATE OR REPLACE FUNCTION tmp_test (p_beacon_id bigint, p_rpt_start_ts
bigint, p_rpt_end_ts bigint) RETURNS bigint AS $$
DECLARE
   --
   v_min_locate_id     bigint;
   --
BEGIN
   --
   SELECT MIN(locate_id) INTO v_min_locate_id
   FROM   event_startstop
   WHERE  beacon_id = p_beacon_id
   AND    locate_id IS NOT NULL
   AND    network_timestamp BETWEEN p_rpt_start_ts AND p_rpt_end_ts;
   --
   RETURN v_min_locate_id;
   --
EXCEPTION
   WHEN OTHERS THEN
      RAISE EXCEPTION 'tmp_test %, %', SQLSTATE, SQLERRM;
END
$$ LANGUAGE 'plpgsql' STABLE;

(4) explain analyze buffers in HOT STANDBY:
DB=# explain (analyze, buffers true) select * from tmp_test (55627,
1403989199, 1404187199);
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Function Scan on tmp_test  (cost=0.25..0.26 rows=1 width=8) (actual
time=25300.000..25300.002 rows=1 loops=1)
   Buffers: shared hit=25357218 read=880466 written=4235
 Total runtime: 25300.067 ms
(3 rows)

(5) if running the SQL from psql:
DB=# explain (analyze, buffers true) select min(locate_id) from
event_startstop where beacon_id=55627 and locate_id is not null and
network_timestamp between 1403989199 and 1404187199;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=79.11..79.12 rows=1 width=8) (actual time=0.342..0.342
rows=1 loops=1)
   Buffers: shared hit=8 read=7
   ->  Append  (cost=0.00..79.06 rows=20 width=8) (actual time=0.190..0.326
rows=11 loops=1)
         Buffers: shared hit=8 read=7
         ->  Seq Scan on event_startstop  (cost=0.00..0.00 rows=1 width=8)
(actual time=0.002..0.002 rows=0 loops=1)
               Filter: ((locate_id IS NOT NULL) AND (network_timestamp >=
1403989199) AND (network_timestamp <= 1404187199) AND (beacon_id = 55627))
         ->  Bitmap Heap Scan on event_startstop_201406_b54to56k
event_startstop  (cost=4.71..79.06 rows=19 width=8) (actual
time=0.186..0.310 rows=11 loops=1)
               Recheck Cond: ((beacon_id = 55627) AND (network_timestamp >=
1403989199) AND (network_timestamp <= 1404187199))
               Filter: (locate_id IS NOT NULL)
               Buffers: shared hit=8 read=7
               ->  Bitmap Index Scan on
event_startstop_201406_b54to56k_bidntslid_idx  (cost=0.00..4.71 rows=19
width=0) (actual time=0.170..0.170 rows=11 loops=1)
                     Index Cond: ((beacon_id = 55627) AND (network_timestamp
>= 1403989199) AND (network_timestamp <= 1404187199))
                     Buffers: shared hit=5 read=1
 Total runtime: 0.485 ms
(14 rows)

Time: 159.359 ms

(6) the event_startstop is a parent table with 406 children tables



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/stored-procedure-suddenly-runs-slowly-in-HOT-STANDBY-but-fast-in-primary-tp5810599.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


piuschan <pchan@contigo.com> writes:
> PostgreSQL: 9.1.11

> Since July 1, one SP suddenly runs slowly in HOT STANDBY server. After
> investigation, I can narrow the problem to one particular query in SP.

>    SELECT MIN(locate_id) INTO v_min_locate_id
>    FROM   event_startstop
>    WHERE  beacon_id = p_beacon_id
>    AND    locate_id IS NOT NULL
>    AND    network_timestamp BETWEEN p_rpt_start_ts AND p_rpt_end_ts;

> (6) the event_startstop is a parent table with 406 children tables

TBH, the astonishing part of this report is not that it's slow, but
that it ever was not slow.  9.1 is not capable of avoiding scanning
the other 405 child tables when given a parameterized query such as
this one.  (You haven't said, but I suppose that the child tables
are partitioned on beacon_id and/or network_timestamp, so that knowledge
of the constants these columns are being compared to is essential for
doing constraint exclusion.)

You could work around that by inserting constants into the query with
EXECUTE, but a better answer would be to update to 9.2 or later.

            regards, tom lane


Hi Tom,

Thanks for your reply. As you said, the weirdest part is the the SP ran fine
in primary server and hot standby server. I did another test over the
weekend:

(1) restore the production DB dump to an internal server with replication
set up.
(2) ran the SP in internal hot standy DB
(3) surprisingly the SP ran fast in internal hot standby DB

So what is the possible explanations to these test results? Is there a way
to tell if the SP scans through all children event_startstop tables as you
said?

Thanks a lot,

Pius



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/stored-procedure-suddenly-runs-slowly-in-HOT-STANDBY-but-fast-in-primary-tp5810599p5810779.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.