Thread: Help with slow table update

Help with slow table update

From: Pawel Veselov
Hi.

I have a plpgsql procedure that updates a few similar tables.
For some reason, updates on one of the tables take a lot longer than the updates on the other ones. The difference is, say, 7 seconds vs. 80 milliseconds.

The procedure uses cursors and record variables to do the updates. For example:

        update r_agrio_total set
          unserved = unserved + (agrow->>'unserved')::numeric(38),
          r_brkconn = mush_brk_conn(r_brkconn, q_item.r_brkconn)
        where
          tagid = _tagid and
          unitid = (akey->>'unitid')::numeric and
          placement = (akey->>'placement')::numeric and
          device_type = (akey->>'device_type')::numeric;

There is another table (xq_agr) that is read record by record, and for each of those records, such an update is executed.
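
Schematically, the loop looks roughly like this (a simplified sketch, not the actual function: the DO wrapper, the variable names and the queue's column layout are assumptions, and the insert-if-missing branches are left out):

        DO $$
        DECLARE
          q_item xq_agr%ROWTYPE;
          agrow  json;
          akey   json;
          _tagid numeric;
        BEGIN
          FOR q_item IN SELECT * FROM xq_agr ORDER BY id LOOP
            agrow  := q_item.r_agrio;
            akey   := agrow->'key';
            _tagid := (akey->>'tagid')::numeric;

            -- one such update per summary table
            UPDATE r_agrio_total SET
              unserved = unserved + (agrow->>'unserved')::numeric(38),
              r_brkconn = mush_brk_conn(r_brkconn, q_item.r_brkconn)
            WHERE
              tagid = _tagid and
              unitid = (akey->>'unitid')::numeric and
              placement = (akey->>'placement')::numeric and
              device_type = (akey->>'device_type')::numeric;

            -- ... similar updates against the hourly, daily and other tables ...
          END LOOP;
        END $$;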

I was trying to EXPLAIN ANALYZE the updates to see where the time could be spent.
There are only 24 rows in the "bad" table, and 3,400 rows in the "good" table. So, for the "bad" table, most of the updates will be on the same rows. The times were measured on processing 100 original records.

When I'm analyzing pure update statements, I don't see anything strange.

"bad" table: explain analyze update r_agrio_total set unconfirmed = unconfirmed +0 where tagid = 1000 and unitid = 1000 and placement = 0 and device_type = 100;

RESULT:
 Update on r_agrio_total  (cost=0.42..4.46 rows=1 width=321) (actual time=0.253..0.253 rows=0 loops=1)
   ->  Index Scan using tag_r_agrio_total on r_agrio_total  (cost=0.42..4.46 rows=1 width=321) (actual time=0.037..0.041 rows=1 loops=1)
         Index Cond: (tagid = 1000::numeric)
         Filter: ((unitid = 1000::numeric) AND (placement = 0::numeric) AND (device_type = 100::numeric))
         Rows Removed by Filter: 7
 Total runtime: 0.282 ms

"good" table: explain analyze update r_agrio_hourly set unconfirmed = unconfirmed +0 where tagid = 1000 and unitid = 1000 and placement = 0 and device_type = 100 and rowdate = '2015-02-23T13';

RESULT:
 Update on r_agrio_hourly  (cost=0.42..17.36 rows=6 width=329) (actual time=0.102..0.102 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..17.36 rows=6 width=329) (actual time=0.047..0.048 rows=1 loops=1)
         Index Cond: ((tagid = 1000::numeric) AND (unitid = 1000::numeric) AND ((rowdate)::text = '2015-02-23T13'::text) AND (device_type = 100::numeric) AND (placement = 0::numeric))
 Total runtime: 0.135 ms

When I try doing it with a WITH statement (really, to apply the actual data that the plpgsql function uses), there is something strange about the "bad" table.

explain analyze
with SRC as (select * from xq_agr where id = 914830)
        update r_agrio_total set
          unconfirmed = unconfirmed + (SRC.r_agrio->>'unconfirmed')::numeric(38)
        from SRC
        where
          tagid = (SRC.r_agrio->'key'->>'tagid')::numeric and
          unitid = (SRC.r_agrio->'key'->>'unit')::numeric and
          placement = (SRC.r_agrio->'key'->>'placement')::numeric and
          device_type = (SRC.r_agrio->'key'->>'device_type')::numeric;

RESULT:
 Update on r_agrio_total  (cost=8.91..32777.51 rows=19331 width=409) (actual time=0.107..0.107 rows=0 loops=1)
   CTE src
     ->  Index Scan using xq_agr_pkey on xq_agr  (cost=0.42..8.44 rows=1 width=379) (actual time=0.026..0.027 rows=1 loops=1)
           Index Cond: (id = 914830)
   ->  Nested Loop  (cost=0.46..32769.07 rows=19331 width=409) (actual time=0.107..0.107 rows=0 loops=1)
         ->  CTE Scan on src  (cost=0.00..0.02 rows=1 width=88) (actual time=0.032..0.033 rows=1 loops=1)
         ->  Index Scan using u_r_agrio_total on r_agrio_total  (cost=0.46..32285.78 rows=19331 width=321) (actual time=0.001..0.001 rows=0 loops=1)
               Index Cond: ((tagid = (((src.r_agrio -> 'key'::text) ->> 'tagid'::text))::numeric) AND (unitid = (((src.r_agrio -> 'key'::text) ->> 'unit'::text))::numeric) AND (device_type = (((src.r_agrio -> 'key'::text) ->> 'device_type'::text))::numeric) AND (placement = (((src.r_agrio -> 'key'::text) ->> 'placement'::text))::numeric))
 Total runtime: 0.155 ms

explain analyze
with SRC as (select * from xq_agr where id = 914830)
        update r_agrio_hourly set
          unconfirmed = unconfirmed + (SRC.r_agrio->>'unconfirmed')::numeric(38)
        from SRC
        where
          tagid = (SRC.r_agrio->'key'->>'tagid')::numeric and
          unitid = (SRC.r_agrio->'key'->>'unit')::numeric and
          placement = (SRC.r_agrio->'key'->>'placement')::numeric and
          device_type = (SRC.r_agrio->'key'->>'device_type')::numeric and
          rowdate = (SRC.r_agrio->'key'->>'rowdate');

RESULT:
 Update on r_agrio_hourly  (cost=8.91..52.91 rows=20 width=417) (actual time=0.123..0.123 rows=0 loops=1)
   CTE src
     ->  Index Scan using xq_agr_pkey on xq_agr  (cost=0.42..8.44 rows=1 width=379) (actual time=0.023..0.024 rows=1 loops=1)
           Index Cond: (id = 914830)
   ->  Nested Loop  (cost=0.47..44.47 rows=20 width=417) (actual time=0.121..0.121 rows=0 loops=1)
         ->  CTE Scan on src  (cost=0.00..0.02 rows=1 width=88) (actual time=0.030..0.031 rows=1 loops=1)
         ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.47..43.95 rows=20 width=329) (actual time=0.000..0.000 rows=0 loops=1)
               Index Cond: ((tagid = (((src.r_agrio -> 'key'::text) ->> 'tagid'::text))::numeric) AND (unitid = (((src.r_agrio -> 'key'::text) ->> 'unit'::text))::numeric) AND ((rowdate)::text = ((src.r_agrio -> 'key'::text) ->> 'rowdate'::text)) AND (device_type = (((src.r_agrio -> 'key'::text) ->> 'device_type'::text))::numeric) AND (placement = (((src.r_agrio -> 'key'::text) ->> 'placement'::text))::numeric))
 Total runtime: 0.176 ms

I've tried doing VACUUM FULL ANALYZE and rebuilding indexes. Vacuum did help somewhat, but the time differential is still huge (that 7 sec vs. 80 ms).

The table structure (some fields thrown out just to save output length):

=> \d r_agrio_total
      Column       |     Type      |                         Modifiers                          
-------------------+---------------+------------------------------------------------------------
 id                | bigint        | not null default nextval('r_agrio_total_id_seq'::regclass)
 tagid             | numeric(38,0) | not null
 unitid            | numeric(38,0) | not null
 device_type       | numeric(38,0) | not null
 placement         | numeric(38,0) | not null default 0
 unserved          | numeric(38,0) | not null default 0
 r_brkconn         | json          | 
Indexes:
    "r_agrio_total_pkey" PRIMARY KEY, btree (id)
    "u_r_agrio_total" UNIQUE, btree (tagid, unitid, device_type, placement)
    "unit_r_agrio_total" btree (unitid)
    "tag_r_agrio_total" btree (tagid)

=> \d r_agrio_hourly
      Column       |         Type          |                          Modifiers                          
-------------------+-----------------------+-------------------------------------------------------------
 id                | bigint                | not null default nextval('r_agrio_hourly_id_seq'::regclass)
 tagid             | numeric(38,0)         | not null
 unitid            | numeric(38,0)         | not null
 rowdate           | character varying(15) | not null
 device_type       | numeric(38,0)         | not null
 placement         | numeric(38,0)         | not null default 0
 unserved          | numeric(38,0)         | not null default 0
 unconfirmed       | numeric(38,0)         | not null default 0
 r_brkconn         | json                  | 
Indexes:
    "r_agrio_hourly_pkey" PRIMARY KEY, btree (id)
    "u_r_agrio_hourly" UNIQUE, btree (tagid, unitid, rowdate, device_type, placement)
    "unit_r_agrio_hourly" btree (unitid)
    "rowdate_r_agrio_hourly" btree (rowdate)
    "tag_r_agrio_hourly" btree (tagid)

Would appreciate any help on this, thank you!

Re: Help with slow table update

From: Jim Nasby
On 4/9/15 6:18 PM, Pawel Veselov wrote:

What is the ->> operator? I'm not familiar with it.

numeric is FAR slower than int/bigint. Unless you *really* need 38
digits, use one of the other types.

Cursors tend to make things slow. Avoid them if you can.

As for your specific question, I suggest you modify the plpgsql function
so that it's doing an EXPLAIN ANALYZE on the slow table. EXPLAIN ANALYZE
actually returns a recordset the same way a SELECT would, with a single
column of type text. So you just need to do something with that output.
The easiest thing would be to replace this in your function:

UPDATE slow_table SET ...

to this (untested)

RETURN QUERY EXPLAIN ANALYZE UPDATE slow_table SET ...

and change the function so it returns SETOF text instead of whatever it
returns now.
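
For instance, a throwaway diagnostic wrapper might look roughly like this
(untested sketch; the function name, its arguments and the single counter
are made up here, and the statement goes through EXECUTE so the EXPLAIN
runs as dynamic SQL):

CREATE OR REPLACE FUNCTION debug_update_total(_tagid numeric, akey json, agrow json)
RETURNS SETOF text
LANGUAGE plpgsql AS $$
DECLARE
  ln text;
BEGIN
  FOR ln IN EXECUTE format(
    'EXPLAIN ANALYZE
     UPDATE r_agrio_total SET
       unserved = unserved + %L::numeric(38)
     WHERE tagid = %L
       AND unitid = %L
       AND placement = %L
       AND device_type = %L',
    agrow->>'unserved', _tagid,
    akey->>'unitid', akey->>'placement', akey->>'device_type')
  LOOP
    RETURN NEXT ln;   -- each plan line comes back as one row of text
  END LOOP;
END;
$$;
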
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Help with slow table update

From: Pawel Veselov
On Sun, Apr 12, 2015 at 5:40 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

> What is the ->> operator? I'm not familiar with it.

It extracts a JSON value as a string.
 
> numeric is FAR slower than int/bigint. Unless you *really* need 38 digits, use one of the other types.

Yeah, that's on purpose.
 
> Cursors tend to make things slow. Avoid them if you can.

Is there an alternative to iterating over a number of rows, where a direct update query is not an option?

I really doubt that the actual processing logic, including the use of types, has anything to do with my problem. This is based on the fact that, of the tables being changed, only one exhibits the problem. All of the involved tables have nearly the same structure and have the same logical operations performed on them. I thought maybe the "bad" table was slow because it was first in the list and Postgres was caching the function results, but I moved things around, and the pattern is the same.
 
> As for your specific question, I suggest you modify the plpgsql function so that it's doing an EXPLAIN ANALYZE on the slow table. EXPLAIN ANALYZE actually returns a recordset the same way a SELECT would, with a single column of type text. So you just need to do something with that output. The easiest thing would be to replace this in your function:
>
> UPDATE slow_table SET ...
>
> to this (untested)
>
> RETURN QUERY EXPLAIN ANALYZE UPDATE slow_table SET ...
>
> and change the function so it returns SETOF text instead of whatever it returns now.


Thank you, that made it a lot easier to see what's really going on. But the outcome is somewhat the same. The "bad" table analysis shows a very high cost and thousands of estimated rows, where the table contains only 24 rows. This time, however, the actual run time is shown, and one can see where the time is spent (I had been using just a sum of clock_timestamp()s around the update statements to see where the problem was).

r_agrio_hourly - "good", r_agrio_total - "bad".

 Update on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual time=2.248..2.248 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207 rows=1 loops=1)
         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type = 3::numeric) AND (placement = 2::numeric))
 Total runtime: 2.281 ms
 Update on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321) (actual time=106.766..106.766 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_total on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626 rows=1 loops=1)
         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND (device_type = 3::numeric) AND (placement = 2::numeric))
 Total runtime: 106.793 ms
 Update on r_agrio_hourly  (cost=0.42..978.49 rows=252 width=329) (actual time=1.796..1.796 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..978.49 rows=252 width=329) (actual time=1.054..1.090 rows=1 loops=1)
         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type = 1::numeric) AND (placement = 2::numeric))
 Total runtime: 1.824 ms
 Update on r_agrio_total  (cost=0.42..11553.27 rows=3017 width=321) (actual time=102.685..102.685 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_total on r_agrio_total  (cost=0.42..11553.27 rows=3017 width=321) (actual time=0.675..33.375 rows=1 loops=1)
         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND (device_type = 1::numeric) AND (placement = 2::numeric))
 Total runtime: 102.711 ms
 Update on r_agrio_hourly  (cost=0.42..1005.13 rows=259 width=329) (actual time=1.962..1.962 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..1005.13 rows=259 width=329) (actual time=0.978..1.124 rows=1 loops=1)
         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type = 2::numeric) AND (placement = 2::numeric))
 Total runtime: 1.992 ms
 Update on r_agrio_total  (cost=0.42..66541.26 rows=18103 width=321) (actual time=108.574..108.574 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_total on r_agrio_total  (cost=0.42..66541.26 rows=18103 width=321) (actual time=0.761..33.245 rows=1 loops=1)
         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND (device_type = 2::numeric) AND (placement = 2::numeric))
 Total runtime: 108.601 ms

(and there are ~800 more, with the same pattern).

 
Thank you!

Re: Help with slow table update

From: "David G. Johnston"
On Mon, Apr 13, 2015 at 5:01 PM, Pawel Veselov <pawel.veselov@gmail.com> wrote:

r_agrio_hourly - "good", r_agrio_total - "bad".

 Update on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual time=2.248..2.248 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207 rows=1 loops=1)
         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type = 3::numeric) AND (placement = 2::numeric))
 Total runtime: 2.281 ms
 Update on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321) (actual time=106.766..106.766 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_total on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626 rows=1 loops=1)
         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND (device_type = 3::numeric) AND (placement = 2::numeric))
 Total runtime: 106.793 ms

What is it you expect to see here?

What are the results (count and times) for:

SELECT count(*) FROM r_agrio_total WHERE tagid = 1002 and unitid = 1002;
SELECT count(*) FROM r_agrio_hourly WHERE tagid = 1002 and unitid = 1002;

More queries along this line might be needed. The underlying question is how many index rows need to be skipped over on "total" to get the final result, or rather, are the columns in the index in descending order of cardinality?

Any chance you can perform a "REINDEX"? Maybe there is some bloat present. There are queries to help discern whether that may be the case; I do not know them off the top of my head, but just doing the REINDEX might be acceptable and is definitely quicker if so.
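
For example (hypothetical checks, adjust the names as needed): per-column distinct counts answer the cardinality question, and the pgstattuple extension, if you can install it, gives a direct read on bloat:

SELECT count(DISTINCT tagid)       AS tagids,
       count(DISTINCT unitid)      AS unitids,
       count(DISTINCT device_type) AS device_types,
       count(DISTINCT placement)   AS placements,
       count(*)                    AS total_rows
  FROM r_agrio_total;

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('r_agrio_total');    -- dead_tuple_count and free_space hint at table bloat
SELECT * FROM pgstatindex('u_r_agrio_total');  -- avg_leaf_density / leaf_fragmentation for the index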

I'm still not really following your presentation but maybe my thoughts will spark something.

David J.

Re: Help with slow table update

From: Jim Nasby
On 4/13/15 7:01 PM, Pawel Veselov wrote:
>     Cursors tend to make things slow. Avoid them if you can.
>
>
> Is there an alternative to iterating over a number of rows, where a
> direct update query is not an option?
>
> I really doubt that either the actual processing logic, including use of
> types has anything to do with my problem. This is based on the fact that
> out of the tables that are being changed, only one is exhibiting the
> problem. All of the involved tables have nearly the same structure, and
> have the same logical operations performed on them. I thought may be the
> "bad" table is slow because it was first in the list, and Postgres was
> caching the functions results, but I moved things around, and pattern is
> the same.

I'm guessing that you're essentially processing a queue. Take a look at
http://www.postgresql.org/message-id/552C750F.2010401@BlueTreble.com for
some ideas. Basically, not only do cursors have non-trivial overhead, but
doing a ton of single-row queries has non-trivial overhead of its own.

>     As for your specific question, I suggest you modify the plpgsql
>     function so that it's doing an EXPLAIN ANALYZE on the slow table.
>     EXPLAIN ANALYZE actually returns a recordset the same way a SELECT
>     would, with a single column of type text. So you just need to do
>     something with that output. The easiest thing would be to replace
>     this in your function:
>
>     UPDATE slow_table SET ...
>
>     to this (untested)
>
>     RETURN QUERY EXPLAIN ANALYZE UPDATE slow_table SET ...
>
>     and change the function so it returns SETOF text instead of whatever
>     it returns now.
>
>
> Thank you, that made it a lot easier to see into what's really going on.
> But the outcome is somewhat the same. The "bad" table analysis shows a
> very high cost, and thousands of rows, where the table contains only 24
> rows. This time, however, the actual run time is shown, and one can see
> where the time is spent (I was using just a sum of clock_time()s around
> the update statements to see where the problem is).
>
> r_agrio_hourly - "good", r_agrio_total - "bad".
>
>   Update on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329)
> (actual time=2.248..2.248 rows=0 loops=1)
>   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly
>   (cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207
> rows=1 loops=1)
>           Index Cond: ((tagid = 1002::numeric) AND (unitid =
> 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND
> (device_type = 3::numeric) AND (placement = 2::numeric))
>   Total runtime: 2.281 ms
>   Update on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321)
> (actual time=106.766..106.766 rows=0 loops=1)
>   ->  Index Scan using u_r_agrio_total on r_agrio_total
>   (cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626
> rows=1 loops=1)
>           Index Cond: ((tagid = 1002::numeric) AND (unitid =
> 1002::numeric) AND (device_type = 3::numeric) AND (placement = 2::numeric))
>   Total runtime: 106.793 ms

Keep in mind that the estimated cost is not terribly useful; it's the
actual times that matter.

I suspect what's happening here is a combination of things. First, the
hourly table is basically living in cache, but the total table is not.
That means that when you go to find a row in the total table you're
actually hitting the disk instead of pulling the data from memory.

Second, you may have a lot of dead rows in the total table. I suspect
this because of the very large amount of time the index scan is taking.
Unless you're running on an old 10MB MFM drive you'd be pretty hard
pressed for even 2 IO operations (one for the index leaf page and one
for the heap page) to take 32ms. I suspect the index scan is having to
read many dead rows in before it finds a live one, and incurring
multiple IOs. Switching to EXPLAIN (analyze, buffers) would help confirm
that.

Third, I think something odd is happening with the update itself. I'm
pretty sure that the index scan itself is visiting the heap pages, so
each page should be in shared buffers by the time each tuple hits the
update node. That makes me wonder what on earth is taking 60ms to update
the tuple. I suspect it's going into either finding a free buffer to put
the new tuple on, or waiting to try and extend the relation. Selecting
ctid from the freshly updated rows and comparing the first number to the
total number of pages in the heap would show if the new tuples are all
ending up at the end of the heap.
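
For example, something along these lines (key values borrowed from earlier in the thread):

-- ctid is (page, item); compare the page numbers against the heap size
SELECT ctid FROM r_agrio_total WHERE tagid = 1002 AND unitid = 1002;

-- pages the planner thinks the heap has (refreshed by VACUUM/ANALYZE)
SELECT relpages FROM pg_class WHERE relname = 'r_agrio_total';

-- exact current heap size in pages
SELECT pg_relation_size('r_agrio_total') / current_setting('block_size')::int AS pages;
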
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Help with slow table update

From: Pawel Veselov
On Mon, Apr 13, 2015 at 6:03 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Mon, Apr 13, 2015 at 5:01 PM, Pawel Veselov <pawel.veselov@gmail.com> wrote:

r_agrio_hourly - "good", r_agrio_total - "bad".

 Update on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual time=2.248..2.248 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..970.32 rows=250 width=329) (actual time=0.968..1.207 rows=1 loops=1)
         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type = 3::numeric) AND (placement = 2::numeric))
 Total runtime: 2.281 ms
 Update on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321) (actual time=106.766..106.766 rows=0 loops=1)
   ->  Index Scan using u_r_agrio_total on r_agrio_total  (cost=0.42..45052.56 rows=12068 width=321) (actual time=0.936..32.626 rows=1 loops=1)
         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND (device_type = 3::numeric) AND (placement = 2::numeric))
 Total runtime: 106.793 ms

What it is you expect to see here?

​What are the results (count and times) for:

SELECT count(*) FROM r_agrio_total WHERE tagid = 1002 and unitid = 1002;

Result: 8 (the whole table is 24 rows). It returns with a bit of a stumble, but relatively quickly.
db=> explain analyze SELECT count(*) FROM r_agrio_total WHERE tagid = 1002 and unitid = 1002;
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4.45..4.46 rows=1 width=0) (actual time=327.194..327.195 rows=1 loops=1)
   ->  Index Scan using tag_r_agrio_total on r_agrio_total  (cost=0.42..4.45 rows=1 width=0) (actual time=0.039..327.189 rows=8 loops=1)
         Index Cond: (tagid = 1002::numeric)
         Filter: (unitid = 1002::numeric)
 Total runtime: 327.228 ms
 
> SELECT count(*) FROM r_agrio_hourly WHERE tagid = 1002 and unitid = 1002;

Result is 2869. Returns somewhat quickly. Explain analyze is crazy though:
db=> explain analyze SELECT count(*) FROM r_agrio_hourly WHERE tagid = 1002 and unitid = 1002;
                                                                         QUERY PLAN                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=68134.68..68134.69 rows=1 width=0) (actual time=15177.211..15177.211 rows=1 loops=1)
   ->  Index Scan using adunit_r_agrio_hourly on r_agrio_hourly  (cost=0.42..67027.10 rows=443035 width=0) (actual time=0.096..15175.730 rows=2869 loops=1)
         Index Cond: (unitid = 1002::numeric)
         Filter: (tagid = 1002::numeric)
 Total runtime: 15177.240 ms

> More queries along this line might be needed. The underlying question is how many index rows need to be skipped over on "total" to get the final result, or rather, are the columns in the index in descending order of cardinality?

The idea is this: both tables have unique multi-field indexes, and each update hits exactly one row from that index, no more, and all fields from the index are constrained with equality conditions in the update. All of the updates (within a transaction) always work on a small subset of rows (max a few hundred, ever; in this case, maybe around 10). I expect it to be possible for the server to keep the active working set in the cache at all times. Since the index is unique, there shouldn't be a reason to re-scan the table if a cached row is found, no?
 
> Any chance you can perform a "REINDEX"? Maybe there is some bloat present. There are queries to help discern whether that may be the case; I do not know them off the top of my head, but just doing the REINDEX might be acceptable and is definitely quicker if so.

That's the thing: I've done both VACUUM FULL and REINDEX. The very first time I did VACUUM FULL, things improved (60 seconds to 7 seconds). REINDEX didn't improve anything (but it was done after the VACUUM FULL).

> I'm still not really following your presentation but maybe my thoughts will spark something.

Thank you! I hope I clarified this some :)


Re: Help with slow table update

From: Pawel Veselov
On Mon, Apr 13, 2015 at 7:37 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

> I'm guessing that you're essentially processing a queue. Take a look at http://www.postgresql.org/message-id/552C750F.2010401@BlueTreble.com for some ideas. Basically, not only do cursors have non-trivial overhead, but doing a ton of single-row queries has non-trivial overhead of its own.

Thank you for the pointers. PgQ sounds interesting; it would have to be remote for RDS (I use RDS), but I'll try implementing a solution based on it.
However, for all the time that is being spent during this update, the breakdown is (in seconds):

update "total" table: 10.773033
update "hourly" table: 00.179711
update "daily" table: 01.082467
update some other table (actually, it has cardinality similar to "total" table): 00.168287
clean the queue table: 00.021733
overhead: 00.014922

The "overhead" is time taken to run the whole procedure, minus all these other times that have been counted.

(some notes about the "daily" table below)
 

> Keep in mind that the estimated cost is not terribly useful; it's the actual times that matter.
>
> I suspect what's happening here is a combination of things. First, the hourly table is basically living in cache, but the total table is not. That means that when you go to find a row in the total table you're actually hitting the disk instead of pulling the data from memory.
>
> Second, you may have a lot of dead rows in the total table. I suspect this because of the very large amount of time the index scan is taking. Unless you're running on an old 10MB MFM drive you'd be pretty hard pressed for even 2 IO operations (one for the index leaf page and one for the heap page) to take 32ms. I suspect the index scan is having to read many dead rows in before it finds a live one, and incurring multiple IOs. Switching to EXPLAIN (analyze, buffers) would help confirm that.

That looks most likely to me as well. Most of the updates in a single "batch" for the total table would be on the same record, while for the hourly table there are far fewer repeats. Logically, the tables contain identical data, except that the hourly table breaks it down per hour, and the total table contains the data for all time. The daily table contains the same data per day.

So, comparing the tables: the "total" table has the smallest number of rows, followed by daily, then hourly. However, the total table gets the most updates on the same row, followed by daily, followed by hourly. It seems to me that the more of these repeated updates are done on the same rows, the worse this particular situation gets (as you can see from the above, the daily table already takes over a second to get updated).

Here is the explain analyze output with buffers:

 Update on r_agrio_hourly  (cost=0.42..986.10 rows=254 width=329) (actual time=5.945..5.945 rows=0 loops=1)
   Buffers: shared hit=827 read=431 dirtied=17
   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..986.10 rows=254 width=329) (actual time=2.220..2.645 rows=1 loops=1)
         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type = 3::numeric) AND (placement = 2::numeric))
         Buffers: shared hit=402 read=14 dirtied=11
 Total runtime: 6.005 ms
 Update on r_agrio_total  (cost=0.42..6.49 rows=1 width=321) (actual time=320.494..320.494 rows=0 loops=1)
   Buffers: shared hit=170298 read=79791 dirtied=6 written=41
   ->  Index Scan using tag_r_agrio_total on r_agrio_total  (cost=0.42..6.49 rows=1 width=321) (actual time=0.977..233.330 rows=1 loops=1)
         Index Cond: (tagid = 1002::numeric)
         Filter: ((unitid = 1002::numeric) AND (placement = 2::numeric) AND (device_type = 3::numeric))
         Rows Removed by Filter: 7
         Buffers: shared hit=26817 read=59973 written=41
 Total runtime: 320.530 ms
 Update on r_agrio_hourly  (cost=0.42..989.90 rows=255 width=329) (actual time=2.605..2.605 rows=0 loops=1)
   Buffers: shared hit=817 read=429 dirtied=16
   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..989.90 rows=255 width=329) (actual time=1.606..1.644 rows=1 loops=1)
         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type = 1::numeric) AND (placement = 2::numeric))
         Buffers: shared hit=133 read=283 dirtied=12
 Total runtime: 2.634 ms
 Update on r_agrio_total  (cost=0.42..6.49 rows=1 width=321) (actual time=283.720..283.720 rows=0 loops=1)
   Buffers: shared hit=196398 read=53955 dirtied=5 written=18
   ->  Index Scan using tag_r_agrio_total on r_agrio_total  (cost=0.42..6.49 rows=1 width=321) (actual time=0.868..210.946 rows=1 loops=1)
         Index Cond: (tagid = 1002::numeric)
         Filter: ((unitid = 1002::numeric) AND (placement = 2::numeric) AND (device_type = 1::numeric))
         Rows Removed by Filter: 7
         Buffers: shared hit=39758 read=47033 written=18
 Total runtime: 283.757 ms
 Update on r_agrio_hourly  (cost=0.42..1020.91 rows=263 width=329) (actual time=1.850..1.850 rows=0 loops=1)
   Buffers: shared hit=1211 read=35 dirtied=15
   ->  Index Scan using u_r_agrio_hourly on r_agrio_hourly  (cost=0.42..1020.91 rows=263 width=329) (actual time=0.976..1.087 rows=1 loops=1)
         Index Cond: ((tagid = 1002::numeric) AND (unitid = 1002::numeric) AND ((rowdate)::text = '2015-04-09T23'::text) AND (device_type = 2::numeric) AND (placement = 2::numeric))
         Buffers: shared hit=399 read=17 dirtied=11
 Total runtime: 1.878 ms
 Update on r_agrio_total  (cost=0.42..6.49 rows=1 width=321) (actual time=287.481..287.481 rows=0 loops=1)
   Buffers: shared hit=181548 read=68207 dirtied=4 written=24
   ->  Index Scan using tag_r_agrio_total on r_agrio_total  (cost=0.42..6.49 rows=1 width=321) (actual time=0.785..206.947 rows=1 loops=1)
         Index Cond: (tagid = 1002::numeric)
         Filter: ((unitid = 1002::numeric) AND (placement = 2::numeric) AND (device_type = 2::numeric))
         Rows Removed by Filter: 7
         Buffers: shared hit=31168 read=55623 written=24
 Total runtime: 287.506 ms

 
> Third, I think something odd is happening with the update itself. I'm pretty sure that the index scan itself is visiting the heap pages, so each page should be in shared buffers by the time each tuple hits the update node. That makes me wonder what on earth is taking 60ms to update the tuple. I suspect it's going into either finding a free buffer to put the new tuple on, or waiting to try and extend the relation. Selecting ctid from the freshly updated rows and comparing the first number to the total number of pages in the heap would show if the new tuples are all ending up at the end of the heap.

before:
    ctid    
-----------
 (25536,8)
 (77292,7)
 (77292,6)
 (77292,5)
 (77292,4)
 (77292,3)
 (77292,2)
 (77292,1)

after:
   ctid    
-----------
 (77314,2)
 (77314,1)
 (77313,7)
 (77313,6)
 (77313,5)
 (77313,4)
 (77313,8)
 (77312,9)

But, I'm sorry, I don't know how to determine "total number of pages in the heap"...

I wonder if what I need to do, considering that I update a lot of "the same" rows as I process this queue, is to create a temp table, update the rows there, and then update the actual tables once at the end...

Re: Help with slow table update

From: Jim Nasby
On 4/14/15 1:28 PM, Pawel Veselov wrote:
>
> I wonder if what I need to do, considering that I update a lot of "the
> same" rows as I process this queue, is to create a temp table, update
> the rows there, and then update the actual tables once at the end...

That's what I'd do.

The other option would be to use a constraint trigger paired with a
per-row trigger on the hourly table to drive the daily table, and on the
daily table to drive the total table. The way that would work is the
per-row trigger would simply keep track of all the unique records that
were changed in a statement (presumably by putting them in a temp
table). Once the statement is "done", the constraint trigger would fire;
it would summarize all the changed data and do a much smaller number of
updates to the table being summarized into.

BTW, you also made a comment about not having to hit the table if you
look at something in an index. You can only do that if all the data you
need is in the index, AND the page with the record is marked as being
all-visible (google for Postgres Visibility Map). If that's not the case
then you still have to pull the row in the table in, in order to
determine visibility. The only case where you can still avoid hitting
the table is something like a NOT EXISTS; if you can't find any entries
in the index for something then they definitely won't be in the table.
But remember that if you update or delete a row, removing it from an
index, the data will stay in that index until vacuum comes along.

Also, there's no point in doing a REINDEX after a VACUUM FULL; vacuum
full rebuilds all the indexes for you.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Help with slow table update

From: Pawel Veselov
On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 4/14/15 1:28 PM, Pawel Veselov wrote:

>     I wonder if what I need to do, considering that I update a lot of "the
>     same" rows as I process this queue, is to create a temp table, update
>     the rows there, and then update the actual tables once at the end...
>
> That's what I'd do.

Well, in short, I changed this (repeating the body of the loop for however many tables there are):

LOOP (item)
  UPDATE table with item
  IF not found INSERT item INTO table; END IF;
END LOOP;

to:

CREATE TEMP TABLE xq_table (like table) on commit drop;
LOOP (item)
  LOOP
    UPDATE xq_table with item;
    exit when found;
    INSERT INTO xq_table select * from table for update;
    continue when found;
    INSERT item INTO xq_table;
    exit;
  END LOOP;
END LOOP;
UPDATE table a set (rows) = (xq.rows)
  FROM xq_table xq
  WHERE (a.keys) = (xq.keys)

That works significantly faster. The final update statement is very fast. The process is somewhat slow in the beginning as it sucks in records from "total" into "xq_total", but once all of that is moved into the temp table, it rushes through the rest.
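
Concretely, for the "total" table the write-back part might look roughly like this (a sketch carrying only the unserved counter and the key columns from the \d output earlier; the real statements set all the value columns, and brand-new keys would also need the INSERT at the end):

CREATE TEMP TABLE xq_total (LIKE r_agrio_total INCLUDING DEFAULTS) ON COMMIT DROP;

-- ... per-item UPDATE/INSERT against xq_total, as in the loop above ...

UPDATE r_agrio_total a
   SET unserved  = xq.unserved,
       r_brkconn = xq.r_brkconn
  FROM xq_total xq
 WHERE a.tagid = xq.tagid
   AND a.unitid = xq.unitid
   AND a.device_type = xq.device_type
   AND a.placement = xq.placement;

INSERT INTO r_agrio_total (tagid, unitid, device_type, placement, unserved, r_brkconn)
SELECT xq.tagid, xq.unitid, xq.device_type, xq.placement, xq.unserved, xq.r_brkconn
  FROM xq_total xq
 WHERE NOT EXISTS (SELECT 1 FROM r_agrio_total a
                    WHERE a.tagid = xq.tagid
                      AND a.unitid = xq.unitid
                      AND a.device_type = xq.device_type
                      AND a.placement = xq.placement);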
 
> The other option would be to use a constraint trigger paired with a per-row trigger on the hourly table to drive the daily table, and on the daily table to drive the total table. The way that would work is the per-row trigger would simply keep track of all the unique records that were changed in a statement (presumably by putting them in a temp table). Once the statement is "done", the constraint trigger would fire; it would summarize all the changed data and do a much smaller number of updates to the table being summarized into.

I'm not sure how I would be able to avoid the same number of changes on the total table; the trigger would fire on each update, wouldn't it? So, same problem with a lot of changes on a table...
 
> BTW, you also made a comment about not having to hit the table if you look at something in an index. You can only do that if all the data you need is in the index, AND the page with the record is marked as being all-visible (google for Postgres Visibility Map). If that's not the case then you still have to pull the row in the table in, in order to determine visibility. The only case where you can still avoid hitting the table is something like a NOT EXISTS; if you can't find any entries in the index for something then they definitely won't be in the table.

What I was saying is that if a table has a unique index, and there is a cached fact that a particular index value points to a particular row, there shouldn't be a need to re-scan the index to search for any more matching values (which would be necessary if the index was not unique). Again, all considering the size of the index, the number of different index values that are being queried, etc.
 
> But remember that if you update or delete a row, removing it from an index, the data will stay in that index until vacuum comes along.
>
> Also, there's no point in doing a REINDEX after a VACUUM FULL; vacuum full rebuilds all the indexes for you.

I was being desperate :) 

I still think there is something very wrong with this particular table. First, I have production systems that employ this function on way larger data sets, and there is no problem (so far, but still). This machine is part of a test deployment: there is no constant load, and the only data being written now is from these tests. Vacuuming should prune all that dead stuff, and if it's absent, it's unclear where the time is spent navigating/updating a table with 24 rows :)


Re: Help with slow table update

From: Jim Nasby
On 4/14/15 4:44 PM, Pawel Veselov wrote:
> On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <Jim.Nasby@bluetreble.com
> <mailto:Jim.Nasby@bluetreble.com>> wrote:
>
>     On 4/14/15 1:28 PM, Pawel Veselov wrote:
>
>
>         I wonder if what I need to do, considering that I update a lot
>         of "the
>         same" rows as I process this queue, is to create a temp table,
>         update
>         the rows there, and then update the actual tables once at the end...
>
>
>     That's what I'd do.
>
>
> Well, in short, I changed (repeat the body of loop for how many tables
> are there)
>
> LOOP (item)
>    UPDATE table with item
>    IF not found INSERT item INTO table; END IF;
> END LOOP;
>
> to:
>
> CREATE TEMP TABLE xq_table (like table) on commit drop;
> LOOP (item)
>    LOOP
>      UPDATE xq_table with item;
>      exit when found;
>      INSERT INTO xq_table select * from table for update;
>      continue when found;
>      INSERT item INTO xq_table;
>      exit;
>    END LOOP;
> END LOOP;
> UPDATE table a set (rows) = (xq.rows)
>    FROM xq_table xq
>    WHERE (a.keys) = (xq.keys)
>
> That works significantly faster. The final update statement is very
> fast. The process is somewhat slow in the beginning as it sucks in
> records from "total" into "xq_total", but once all of that is moved into
> the temp table, it rushes through the rest.

Databases like to think in sets. It will generally be more efficient to
do set operations instead of a bunch of row-by-row stuff.

Since you're pulling all of this from some other table your best bet is
probably something like:

CREATE TEMP TABLE raw AS DELETE FROM queue WHERE ... RETURNING *;

CREATE TEMP VIEW hourly_v AS SELECT ... FROM raw GROUP BY;
UPDATE ar_hourly SET ... FROM hourly_v JOIN ...;
INSERT INTO ar_hourly SELECT FROM hourly_v LEFT JOIN ar_hourly ON ...;

-- Same thing for daily
-- Same thing for total
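(Spelled out for just the hourly step, with invented table/column names ("queue" plus a single unconfirmed counter) and assuming the queued rows are already flat columns; in reality xq_agr stores json, so the SELECT lists would be the ->> extractions instead. Note that CREATE TABLE ... AS cannot consume a DELETE ... RETURNING directly, so the first step routes it through a writable CTE:)

-- 1. pull the queued rows out of the queue in one shot
CREATE TEMP TABLE raw (LIKE queue) ON COMMIT DROP;
WITH moved AS (DELETE FROM queue RETURNING *)
INSERT INTO raw SELECT * FROM moved;

-- 2. summarize once per target granularity
CREATE TEMP VIEW hourly_v AS
  SELECT tagid, unitid, placement, device_type, rowdate,
         sum(unconfirmed) AS unconfirmed
    FROM raw
   GROUP BY tagid, unitid, placement, device_type, rowdate;

-- 3. add the deltas to the rows that already exist...
UPDATE r_agrio_hourly h
   SET unconfirmed = h.unconfirmed + v.unconfirmed
  FROM hourly_v v
 WHERE h.tagid = v.tagid AND h.unitid = v.unitid
   AND h.placement = v.placement AND h.device_type = v.device_type
   AND h.rowdate = v.rowdate;

-- 4. ...and insert the ones that don't exist yet; the LEFT JOIN / IS NULL
--    keeps only summary rows with no match in r_agrio_hourly
INSERT INTO r_agrio_hourly
       (tagid, unitid, placement, device_type, rowdate, unconfirmed)
  SELECT v.tagid, v.unitid, v.placement, v.device_type, v.rowdate,
         v.unconfirmed
    FROM hourly_v v
    LEFT JOIN r_agrio_hourly h
      ON h.tagid = v.tagid AND h.unitid = v.unitid
     AND h.placement = v.placement AND h.device_type = v.device_type
     AND h.rowdate = v.rowdate
   WHERE h.tagid IS NULL;

(This still has the usual pre-9.5 race if another session can insert matching rows concurrently; with a single writer it is fine.)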

>     The other option would be to use a constraint trigger paired with a
>     per-row trigger on the hourly table to drive the daily table, and on
>     the daily table to drive the total table. The way that would work is
>     the per-row trigger would simply keep track of all the unique records
>     that were changed in a statement (presumably by putting them in a
>     temp table). Once the statement is "done", the constraint trigger
>     would fire; it would summarize all the changed data and do a much
>     smaller number of updates to the table being summarized into.
>
>
> I'm not sure how I would be able to avoid the same number of changes on
> the total table, trigger would fire on each update, won't it? So, same
> problem with a lot of changes on a table...

The difference is that you'd be doing plain INSERTs into a temp table
and then summarizing that. That's going to be a LOT more efficient than
a slew of updates on an existing table.

>     BTW, you also made a comment about not having to hit the table if
>     you look at something in an index. You can only do that if all the
>     data you need is in the index, AND the page with the record is
>     marked as being all-visible (google for Postgres Visibility Map). If
>     that's not the case then you still have to pull the row in the table
>     in, in order to determine visibility. The only case where you can
>     still avoid hitting the table is something like a NOT EXISTS; if you
>     can't find any entries in the index for something then they
>     definitely won't be in the table.
>
>
> What I was saying is that if a table has a unique index, and there is
> cached fact that a particular index value points to a particular row,
> there shouldn't be a need to re-scan the index again to search for any
> more matching values (which would be necessary if the index was not
> unique). Again, all considering the size of the index, the amount of
> different index values that are being queried, etc.

It still has to rescan because of visibility concerns.

>     But remember that if you update or delete a row, removing it from an
>     index, the data will stay in that index until vacuum comes along.
>
>     Also, there's no point in doing a REINDEX after a VACUUM FULL;
>     vacuum full rebuilds all the indexes for you.
>
>
> I was being desperate :)
>
> I still think there is something very wrong with this particular table.
> First, I have production systems that employ this function on way larger
> data set, and there is no problem (so far, but still). This machine is
> part of a test deployment, there is no constant load, the only data that
> is being written now is when I do these tests. Vacuuming should prune
> all that dead stuff, and if it's absent, it's unclear where is the time
> spent navigating/updating the table with 24 rows :)

I think you definitely have a problem with dead rows, as evidenced by
the huge improvement VACUUM FULL made.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: Help with slow table update

From
Pawel Veselov
Date:
On Tue, Apr 14, 2015 at 3:29 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
On 4/14/15 4:44 PM, Pawel Veselov wrote:
On Tue, Apr 14, 2015 at 1:15 PM, Jim Nasby <Jim.Nasby@bluetreble.com
<mailto:Jim.Nasby@bluetreble.com>> wrote:

    On 4/14/15 1:28 PM, Pawel Veselov wrote:


        I wonder if what I need to do, considering that I update a lot
        of "the
        same" rows as I process this queue, is to create a temp table,
        update
        the rows there, and then update the actual tables once at the end...


    That's what I'd do.


Well, in short, I changed (repeat the body of loop for how many tables
are there)

LOOP (item)
   UPDATE table with item
   IF not found INSERT item INTO table; END IF;
END LOOP;

to:

CREATE TEMP TABLE xq_table (like table) on commit drop;
LOOP (item)
   LOOP
     UPDATE xq_table with item;
     exit when found;
     INSERT INTO xq_table select * from table for update;
     continue when found;
     INSERT item INTO xq_table;
     exit;
   END LOOP;
END LOOP;
UPDATE table a set (rows) = (xq.rows)
   FROM xq_table xq
   WHERE (a.keys) = (xq.keys)

That works significantly faster. The final update statement is very
fast. The process is somewhat slow in the beginning as it sucks in
records from "total" into "xq_total", but once all of that is moved into
the temp table, it rushes through the rest.

Databases like to think in sets. It will generally be more efficient to do set operations instead of a bunch of row-by-row stuff.

Since you're pulling all of this from some other table your best bet is probably something like:

CREATE TEMP TABLE raw AS DELETE FROM queue WHERE ... RETURNING *;

CREATE TEMP VIEW hourly_v AS SELECT ... FROM raw GROUP BY;
UPDATE ar_hourly SET ... FROM hourly_v JOIN ...;
INSERT INTO ar_hourly SELECT FROM hourly_v LEFT JOIN ar_hourly ON ...;

-- Same thing for daily
-- Same thing for total

In my previous post, there was a problem with that pseudo-code: it's missing the inserts into the final table at the end of the loop, for those records that need to be inserted rather than updated.

This is where using sets becomes really tedious, as Postgres severely lacks an upsert-like statement.
I don't think joins are allowed in an UPDATE statement, so I will need to use a WITH query, right?
Also, I'm not sure how a LEFT JOIN will help me isolate and insert the "missed" entries...

Would it be OK to replace the upsert part with merging into a temp table, then deleting and inserting from the temp table? Is there any penalty for insert/delete compared to update?
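(One way to keep that in a single statement without a real upsert is a writable CTE: update what matches, then insert whatever the update didn't touch. A sketch, with "summary" standing in for the summarized batch and only one counter column:)

WITH upd AS (
  UPDATE r_agrio_total t
     SET unconfirmed = t.unconfirmed + s.unconfirmed
    FROM summary s
   WHERE t.tagid = s.tagid AND t.unitid = s.unitid
     AND t.placement = s.placement AND t.device_type = s.device_type
  RETURNING s.tagid, s.unitid, s.placement, s.device_type
)
INSERT INTO r_agrio_total (tagid, unitid, placement, device_type, unconfirmed)
SELECT s.tagid, s.unitid, s.placement, s.device_type, s.unconfirmed
  FROM summary s
 WHERE NOT EXISTS (
         SELECT 1 FROM upd u
          WHERE u.tagid = s.tagid AND u.unitid = s.unitid
            AND u.placement = s.placement AND u.device_type = s.device_type);

(Still not safe against concurrent inserts of the same keys, which is what INSERT ... ON CONFLICT in later releases addresses, but with a single writer it avoids a separate delete/insert pass.)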

[skipped]
 
    But remember that if you update or delete a row, removing it from an
    index, the data will stay in that index until vacuum comes along.

    Also, there's no point in doing a REINDEX after a VACUUM FULL;
    vacuum full rebuilds all the indexes for you.


I was being desperate :)

I still think there is something very wrong with this particular table.
First, I have production systems that employ this function on way larger
data set, and there is no problem (so far, but still). This machine is
part of a test deployment, there is no constant load, the only data that
is being written now is when I do these tests. Vacuuming should prune
all that dead stuff, and if it's absent, it's unclear where is the time
spent navigating/updating the table with 24 rows :)

I think you definitely have a problem with dead rows, as evidenced by the huge improvement VACUUM FULL made.

But it's not clear why (and it doesn't seem reasonable, IMHO, that) it wouldn't improve past the current point.

Re: Help with slow table update

From
Igor Neyman
Date:
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Pawel Veselov
Sent: Tuesday, April 14, 2015 8:01 PM
To: Jim Nasby
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Help with slow table update

[skipped]

This is where using sets becomes really tedious, as Postgres severely lacks an upsert-like statement.

I don't think there are joins allowed in UPDATE statement, so I will need to use WITH query, right?

Also, I'm not sure how LEFT JOIN will help me isolate and insert "missed" entries...

Would it be OK to replace upsert part with merging into a temp table, then deleting and inserting from temp table? Is there any penalty for insert/delete comparing to update?

[skipped]

Yes, you can do UPDATE with joins (http://www.postgresql.org/docs/9.4/static/sql-update.html) like this:

UPDATE table1 A SET col1 = B.col2
  FROM table2 B
  WHERE A.col3 = B.col4;

Regards,

Igor Neyman

Re: Help with slow table update

From
Pawel Veselov
Date:

[skipped]

This is where using sets becomes really tedious, as Postgres severely lacks an upsert-like statement.

I don't think there are joins allowed in UPDATE statement, so I will need to use WITH query, right?

Also, I'm not sure how LEFT JOIN will help me isolate and insert "missed" entries...

[skipped]

Yes, you can do UPDATE with joins (http://www.postgresql.org/docs/9.4/static/sql-update.html) like this:

UPDATE table1 A SET col1 = B.col2
  FROM table2 B
  WHERE A.col3 = B.col4;

I meant using the JOIN operator inside the UPDATE statement. But it's still possible, though, through a WITH query.
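(For what it's worth, an explicit JOIN is also accepted directly in the FROM list of an UPDATE, without wrapping it in a WITH; extending the placeholder sketch above with an invented third table:)

UPDATE table1 A SET col1 = C.col5
  FROM table2 B
  JOIN table3 C ON C.col6 = B.col2   -- explicit JOIN inside FROM is fine
  WHERE A.col3 = B.col4;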

Re: Help with slow table update

From
Pawel Veselov
Date:
[skipped]
 
    But remember that if you update or delete a row, removing it from an
    index, the data will stay in that index until vacuum comes along.

    Also, there's no point in doing a REINDEX after a VACUUM FULL;
    vacuum full rebuilds all the indexes for you.


I was being desperate :)

I still think there is something very wrong with this particular table.
First, I have production systems that employ this function on way larger
data set, and there is no problem (so far, but still). This machine is
part of a test deployment, there is no constant load, the only data that
is being written now is when I do these tests. Vacuuming should prune
all that dead stuff, and if it's absent, it's unclear where is the time
spent navigating/updating the table with 24 rows :)

I think you definitely have a problem with dead rows, as evidenced by the huge improvement VACUUM FULL made.

But it's not clear why (and not reasonable, IMHO, that) it wouldn't improve past current point.

What I should've done is 'VACUUM FULL VERBOSE'. Once I did, it told me there were 800k dead rows that can't be removed. After digging around I found some dangling prepared transactions, going back months. Once I threw those away, and re-vacuumed, things got back to normal.

Thanks for all your help and advice.

Re: Help with slow table update

From
Tim Uckun
Date:

On Sat, Apr 18, 2015 at 10:24 AM, Pawel Veselov <pawel.veselov@gmail.com> wrote:
I found some dangling prepared transactions


How do you find and remove these?



Re: Help with slow table update

From
Jim Nasby
Date:
On 4/19/15 9:53 PM, Tim Uckun wrote:
>
> On Sat, Apr 18, 2015 at 10:24 AM, Pawel Veselov <pawel.veselov@gmail.com
> <mailto:pawel.veselov@gmail.com>> wrote:
>
>     I found some dangling prepared transactions
>
>
>
> How do you find and remove these?

SELECT * FROM pg_prepared_xacts;
ROLLBACK PREPARED 'gid';
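(Slightly expanded: what ROLLBACK PREPARED wants is the gid column from that view, quoted as a string literal; the gid below is a placeholder.)

-- list prepared transactions, oldest first
SELECT gid, prepared, owner, database
  FROM pg_prepared_xacts
 ORDER BY prepared;

-- then, for each stale one:
ROLLBACK PREPARED 'gid-from-the-list-above';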
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com