Thread: Slow update
Hello.
I have a daily process that synchronizes our reports database from our production databases. In the past few days, it happened a couple of times that an update query took around 7-8 hours to complete, which seems a bit excessive. This is the query:
UPDATE rb
SET service = b.service,
status = b.status,
has_notification = gateway_id NOT IN (4,101,102),
operator = COALESCE(
b.actual_target_network_id,
b.requested_target_network_id
)
FROM sms.billing b
WHERE b.time_arrived >= :date_start
AND rb.time_stamp >= :date_start
AND rb.delivered = 0
AND rb.sms_user = b.user_id
AND rb.reference = b.user_reference
AND OVERLAY( rb.msisdn placing '972' from 1 for 1 ) = b.msisdn
AND NOT mo_billed
AND system_id <> 6 -- Exclude Corporate, as it already has service/status
;
The variable ":date_start" is set to a date 3 days ago.
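As context for the join condition: OVERLAY(rb.msisdn placing '972' from 1 for 1) replaces the first character of the msisdn with the string '972', presumably converting a locally formatted number to the international format stored in billing. A quick illustration (the sample number is made up):

```sql
SELECT OVERLAY('0541234567' placing '972' from 1 for 1);
-- replaces the 1 character starting at position 1 with '972',
-- i.e. '0541234567' becomes '972541234567'
```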
I ran explain for this query and it gave me this:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=21567.12..854759.82 rows=1 width=210)
   Join Filter: (((rb.sms_user)::text = (b.user_id)::text) AND (rb.reference = b.user_reference))
   ->  Bitmap Heap Scan on rb  (cost=21546.02..23946.16 rows=819 width=198)
         Recheck Cond: ((delivered = 0) AND (time_stamp >= '2009-01-18 00:00:00'::timestamp without time zone) AND (time_stamp < '2009-01-21 00:00:00'::timestamp without time zone))
         Filter: ((NOT mo_billed) AND (system_id <> 6))
         ->  BitmapAnd  (cost=21546.02..21546.02 rows=819 width=0)
               ->  Bitmap Index Scan on rb_delivered_ind  (cost=0.00..1419.99 rows=45768 width=0)
                     Index Cond: (delivered = 0)
               ->  Bitmap Index Scan on rb_timestamp_ind  (cost=0.00..20125.37 rows=188994 width=0)
                     Index Cond: ((time_stamp >= '2009-01-18 00:00:00'::timestamp without time zone) AND (time_stamp < '2009-01-21 00:00:00'::timestamp without time zone))
   ->  Bitmap Heap Scan on billing b  (cost=21.10..1004.77 rows=351 width=49)
         Recheck Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
         Filter: ((b.time_arrived >= '2009-01-18 00:00:00'::timestamp without time zone) AND (b.time_arrived < '2009-01-21 00:00:00'::timestamp without time zone))
         ->  Bitmap Index Scan on billing_msisdn_sme_reference  (cost=0.00..21.10 rows=351 width=0)
               Index Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
I'm not an expert on reading plans, but it seems to me that it uses indices on both tables that participate in this query, so it shouldn't take such a long time.
The number of records in the table rb for the past three days is 386833. On the sms.billing table, it seems to pick the index on the msisdn and sme_reference fields and use it only partially (only the msisdn field). Looking at that table, the frequency of each value in the msisdn field is at most 17678 for the current data, and mostly it's a couple of thousand. How can this take so long?
Thanks,
Herouth
Hello,
- did you vacuum your tables recently?
- What I miss in your query is a check for the rows that do not need to be updated:

AND NOT (service = b.service
AND status = b.status
AND has_notification = gateway_id NOT IN (4,101,102)
AND operator = COALESCE( b.actual_target_network_id, b.requested_target_network_id )
)

Depending on the fraction of rows that are already up to date, this might speed up your process quite a lot...
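Spelled out against the original statement, the idea would look something like this (a sketch only; note that if any of these columns can be NULL, a plain = comparison makes the guard evaluate to NULL and the row would be skipped, so IS DISTINCT FROM is used here as a NULL-safe variant):

```sql
UPDATE rb
SET service = b.service,
    status = b.status,
    has_notification = gateway_id NOT IN (4,101,102),
    operator = COALESCE(b.actual_target_network_id,
                        b.requested_target_network_id)
FROM sms.billing b
WHERE b.time_arrived >= :date_start
  AND rb.time_stamp >= :date_start
  AND rb.delivered = 0
  AND rb.sms_user = b.user_id
  AND rb.reference = b.user_reference
  AND OVERLAY(rb.msisdn placing '972' from 1 for 1) = b.msisdn
  AND NOT mo_billed
  AND system_id <> 6
  -- NULL-safe "row actually changes" guard:
  AND (rb.service IS DISTINCT FROM b.service
    OR rb.status IS DISTINCT FROM b.status
    OR rb.has_notification IS DISTINCT FROM (gateway_id NOT IN (4,101,102))
    OR rb.operator IS DISTINCT FROM COALESCE(b.actual_target_network_id,
                                             b.requested_target_network_id));
```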
Hope To Help,
Marc Mamin
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Herouth Maoz
Sent: Wednesday, January 21, 2009 10:30 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Slow update
Marc Mamin wrote:

> - did you vacuum your tables recently?
> - What I miss in your query is a check for the rows that do not need to be updated... depending on the fraction of rows that are already up to date, this might speed up your process quite a lot...

I don't see why it would. As far as I know, the big saving in update time comes from using the indices. All the other conditions, which are not on indices, are checked by sequentially scanning the rows that were fetched via the index, so adding more conditions wouldn't make this much faster - it might even be slower, because more comparisons are made.
In any case, the logic of the database is that the records that have delivered = 0 are always a subset of the records that are changed in this query, so querying on delivered = 0 - which is an indexed query - actually makes the above redundant.
Thanks for your response,
Herouth
Filip Rembiałkowski wrote:

> 1. which postgres version?

8.3.1

> 2. can you post results of EXPLAIN ANALYZE (please note it actually executes the query)?

Well, if it executes the query it's a problem. I might be able to do so during the weekend, when I can play with the scripts and get away with failures, but of course there is less data in the tables then.

Thanks,
Herouth
2009/1/21 Herouth Maoz <herouth@unicell.co.il>
> I have a daily process that synchronizes our reports database from our production databases. In the past few days, it happened a couple of times that an update query took around 7-8 hours to complete, which seems a bit excessive. [full query and plan quoted]
1. which postgres version?
2. can you post results of EXPLAIN ANALYZE (please note it actually executes the query)?
--
Filip Rembiałkowski
On Wed, Jan 21, 2009 at 12:55 PM, Herouth Maoz <herouth@unicell.co.il> wrote:

> Well, if it executes the query it's a problem. I might be able to do so during the weekend, when I can play with the scripts and get away with failures, but of course there is less data in the tables then.

you should seriously think about having a test machine

--
GJ
On Wed, Jan 21, 2009 at 02:55:00PM +0200, Herouth Maoz wrote:

> Filip Rembiałkowski wrote:
> > 2. can you post results of EXPLAIN ANALYZE (please note it actually executes the query)?
>
> Well, if it executes the query it's a problem.

You can wrap an EXPLAIN ANALYSE up in BEGIN;...ROLLBACK;. That way PG won't record any of the changes permanently. More explanation is available in the docs:

http://www.postgresql.org/docs/current/static/sql-explain.html

--
Sam  http://samason.me.uk/
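In other words (a sketch; the body of the UPDATE is elided here - it is the same statement posted at the top of the thread):

```sql
BEGIN;
EXPLAIN ANALYSE
UPDATE rb
   SET ...            -- the full SET list from the original query
  FROM sms.billing b
 WHERE ...;           -- the full WHERE clause from the original query
ROLLBACK;  -- the timing output is printed, but no changes are committed
```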
Grzegorz Jaśkiewicz wrote:

> you should seriously think about having a test machine

I have a test machine - but the data in there is test data, and it's a slower machine. A testing environment is good for development, but can hardly be used to really simulate the production machine for performance.
Herouth
> I don't see why it would
This may reduce I/O activity and reduce the vacuum activity on this table.
Here is a small example:

-- assuming a table created as: create table update_test (a int);
insert into update_test select * from generate_series(1,100000);
vacuum full verbose update_test;
-> INFO: "update_test": found 0 removable, 100000 nonremovable row versions in 393 pages

-- now update one row:
-- (or, in your case, only the rows that would get modified - my query proposal)
update update_test set a=1 where a=1;
vacuum full verbose update_test;
-> INFO: "update_test": found 1 removable, 100000 nonremovable row versions in 393 pages

-- update all rows:
-- (or, in your case, all rows that match your update query)
update update_test set a=a;
vacuum full verbose update_test;
-> INFO: "update_test": found 100000 removable, 100000 nonremovable row versions in 785 pages

Adding elements to the where clause will slow down the "recheck" operations, but your indexes will probably be used as in your query. By limiting the number of rows being updated, you will reduce I/O activity and reduce the need to vacuum your table. This approach may be superfluous if the extra conditions do not reduce the number of updated rows significantly...
cheers,
Marc
From: Herouth Maoz [mailto:herouth@unicell.co.il]
Sent: Wednesday, January 21, 2009 12:50 PM
To: Marc Mamin
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Slow update
On Wed, Jan 21, 2009 at 9:36 AM, Herouth Maoz <herouth@unicell.co.il> wrote:

> I have a test machine - but the data in there is test data, and it's a slower machine. A testing environment is good for development, but can hardly be used to really simulate the production machine for performance.

What the other poster was referring to was a staging system. A staging system has the same amount of data and the same type of servers, and is necessary if you really want to performance-test something before you do it on production.
Filip Rembiałkowski wrote:
I'm sorry it took some time to answer these questions - as I explained, I needed an opportunity to make the change on our production machine, since the data on the development machine wouldn't do at all.

2009/1/21 Herouth Maoz <herouth@unicell.co.il>:
1. which postgres version?
2. can you post results of EXPLAIN ANALYZE (please note it actually executes the query)?
So the answer is PostgreSQL v. 8.3.1. The output of explain analyze is at the end of this message. Note that the run started at 04:20:50 and finished at 11:29:30. Also, a full vacuum was run on the entire database a day before, and I run analyze on each table whenever there is a bulk insert or update to it - and the only way data comes in is in bulk.

Could the delay have anything to do with disk space? The device where the database is located has around 3G left (98% full). If that's the problem, I suppose I can add a tablespace on another device and move the four largest tables into it, but if it has nothing to do with it, I'd rather stay on the one device. The machine has 4G of RAM.
I hope someone can clue me in based on the results of explain analyze.
Herouth
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=27968.36..645009.77 rows=1 width=210) (actual time=57128.573..25674240.262 rows=166844 loops=1)
   Join Filter: (((rb.sms_user)::text = (b.user_id)::text) AND (rb.reference = b.user_reference))
   ->  Bitmap Heap Scan on rb  (cost=27943.66..29556.78 rows=549 width=198) (actual time=37105.248..54856.430 rows=151332 loops=1)
         Recheck Cond: ((delivered = 0) AND (time_stamp >= '2009-02-06 00:00:00'::timestamp without time zone))
         Filter: ((NOT mo_billed) AND (system_id <> 6))
         ->  BitmapAnd  (cost=27943.66..27943.66 rows=549 width=0) (actual time=37049.818..37049.818 rows=0 loops=1)
               ->  Bitmap Index Scan on rb_delivered_ind  (cost=0.00..884.99 rows=24367 width=0) (actual time=8147.209..8147.209 rows=871142 loops=1)
                     Index Cond: (delivered = 0)
               ->  Bitmap Index Scan on rb_timestamp_ind  (cost=0.00..27058.14 rows=235424 width=0) (actual time=28884.578..28884.578 rows=1333295 loops=1)
                     Index Cond: (time_stamp >= '2009-02-06 00:00:00'::timestamp without time zone)
   ->  Bitmap Heap Scan on billing b  (cost=24.70..1110.35 rows=389 width=49) (actual time=165.740..169.260 rows=10 loops=151332)
         Recheck Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
         Filter: (b.time_arrived >= '2009-02-06 00:00:00'::timestamp without time zone)
         ->  Bitmap Index Scan on billing_msisdn_sme_reference  (cost=0.00..24.70 rows=389 width=0) (actual time=21.418..21.418 rows=252 loops=151332)
               Index Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
 Total runtime: 25720210.772 ms
(16 rows)
On Mon, Feb 9, 2009 at 12:50 PM, Herouth Maoz <herouth@unicell.co.il> wrote:

> I hope someone can clue me in based on the results of explain analyze.

Did you have a chance to run vmstat on it, and post it here? Maybe - if the db resides on the same disc with everything else, something (ab)uses that much io, and it has to wait. Also, I don't know - but personally I didn't like this line in the explain:

-> Bitmap Index Scan on billing_msisdn_sme_reference (cost=0.00..24.70 rows=389 width=0) (actual time=21.418..21.418 rows=252 loops=151332)
      Index Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))

But the cost is next to none, so that's not it.

--
GJ
Grzegorz Jaśkiewicz wrote:

> Did you have a chance to run vmstat on it, and post it here? Maybe - if the db resides on the same disc with everything else, something (ab)uses that much io, and it has to wait.

I haven't run it, but it's worth mentioning that the data is on a separate device from the system/users. Also, the system is used only for the reporting system. Other than PostgreSQL and the cron job that runs this query, nothing runs on the system at night. Reports start being produced around 7:00 AM, which should give the query ample time to finish (almost 3 hours) before any contention starts.
Herouth
On Feb 9, 2009, at 2:07 PM, Grzegorz Jaśkiewicz wrote:

> Also, I don't know - but personaly I didn't like the line in explain:
>
> -> Bitmap Index Scan on billing_msisdn_sme_reference (cost=0.00..24.70 rows=389 width=0) (actual time=21.418..21.418 rows=252 loops=151332)
>       Index Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
>
> But the cost is next to none, so that's not it.

Actually, it's inside a nested loop and if I read correctly it gets looped over 151332 times. That means it takes 151332 * (21.418 - 1.418) = 3026640 ms, which is almost 12% of the total time.

The biggie seems to be the bitmap heap scan on rb though. The row estimates for that one are way off (estimated 549 rows vs actual 151332).

Alban Hertroys

--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
Alban Hertroys wrote:

> Actually, it's inside a nested loop and if I read correctly it gets looped over 151332 times. That means it takes 151332 * (21.418 - 1.418) = 3026640 ms, which is almost 12% of the total time.
>
> The biggie seems to be the bitmap heap scan on rb though. The row estimates for that one are way off (estimated 549 rows vs actual 151332).

To be quite honest, I'm not sure exactly how to read this plan, and what the row values mean. The issue here is that sometimes the query works in reasonable time, and sometimes it takes half a day. Of course, this may be because the data size is different, but I don't know which part of the plan tells me that. What do rows vs. loops signify? How can the estimate be so far off if I'm running analyze on all the tables right after I make any big updates to them?

I find it hard to believe that the problem is with the complex comparison caused by the different formats of the fields in rb and in billing. This should add a constant multiplier to the time it takes to run the query, but not cause the query to run one time in 5 minutes, and the next day in 12 hours!
Thanks for the assistance,
Herouth

Here is the plan collected tonight, which took a reasonable amount of time (5 minutes). It seems to be a totally different plan, isn't it?

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1157750.08..1167132.31 rows=1 width=210) (actual time=238247.983..239980.264 rows=111676 loops=1)
   Merge Cond: ((rb.reference = b.user_reference) AND ((rb.sms_user)::text = (b.user_id)::text) AND (((("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2))) = (b.msisdn)::text))
   ->  Sort  (cost=31137.76..31141.31 rows=1423 width=198) (actual time=117858.431..117932.544 rows=111676 loops=1)
         Sort Key: rb.reference, rb.sms_user, ((("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
         Sort Method: quicksort  Memory: 30941kB
         ->  Bitmap Heap Scan on rb  (cost=26968.95..31063.23 rows=1423 width=198) (actual time=113615.187..116935.502 rows=111676 loops=1)
               Recheck Cond: ((delivered = 0) AND (time_stamp >= '2009-02-12 00:00:00'::timestamp without time zone))
               Filter: ((NOT mo_billed) AND (system_id <> 6))
               ->  BitmapAnd  (cost=26968.95..26968.95 rows=1423 width=0) (actual time=113454.761..113454.761 rows=0 loops=1)
                     ->  Bitmap Index Scan on rb_delivered_ind  (cost=0.00..2522.46 rows=69896 width=0) (actual time=9358.397..9358.397 rows=150651 loops=1)
                           Index Cond: (delivered = 0)
                     ->  Bitmap Index Scan on rb_timestamp_ind  (cost=0.00..24445.53 rows=213475 width=0) (actual time=104091.620..104091.620 rows=303308 loops=1)
                           Index Cond: (time_stamp >= '2009-02-12 00:00:00'::timestamp without time zone)
   ->  Sort  (cost=1117952.26..1120779.49 rows=1130889 width=50) (actual time=119485.709..120263.045 rows=756135 loops=1)
         Sort Key: b.user_reference, b.user_id, b.msisdn
         Sort Method: external sort  Disk: 60976kB
         ->  Bitmap Heap Scan on billing b  (cost=36754.98..1004246.88 rows=1130889 width=50) (actual time=24409.448..101034.765 rows=896474 loops=1)
               Recheck Cond: (time_arrived >= '2009-02-12 00:00:00'::timestamp without time zone)
               ->  Bitmap Index Scan on billing_time_arrived  (cost=0.00..36472.26 rows=1130889 width=0) (actual time=23936.245..23936.245 rows=1166881 loops=1)
                     Index Cond: (time_arrived >= '2009-02-12 00:00:00'::timestamp without time zone)
 Total runtime: 307958.152 ms
(21 rows)