Thread: Slow update

Slow update

From
Herouth Maoz
Date:
Hello.

I have a daily process that synchronizes our reports database from our production databases. In the past few days, it happened a couple of times that an update query took around 7-8 hours to complete, which seems a bit excessive. This is the query:

    UPDATE rb
    SET service = b.service,
    status = b.status,
    has_notification = gateway_id NOT IN (4,101,102),
    operator = COALESCE(
        b.actual_target_network_id,
        b.requested_target_network_id
    )
    FROM sms.billing b
    WHERE b.time_arrived >= :date_start
    AND   rb.time_stamp >= :date_start
    AND   rb.delivered = 0
    AND   rb.sms_user = b.user_id
    AND   rb.reference = b.user_reference
    AND   OVERLAY( rb.msisdn placing '972' from 1 for 1 ) = b.msisdn
    AND   NOT mo_billed
    AND   system_id <> 6  -- Exclude Corporate, as it already has service/status
    ;

The variable ":date_start" is set to a date 3 days ago.
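The OVERLAY expression in the join condition rewrites the local dialing prefix into an international one; assuming rb.msisdn holds numbers in local format with a leading '0' (my reading of the query, not stated explicitly in the post), it behaves like this:

```sql
-- OVERLAY(string PLACING replacement FROM start FOR length):
-- delete <length> characters starting at <start>, insert <replacement> there.
-- Here: drop the leading '0' and prepend the '972' country code.
SELECT OVERLAY('0541234567' placing '972' from 1 for 1);
-- → '972541234567'
```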

I ran explain for this query and it gave me this:


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=21567.12..854759.82 rows=1 width=210)
   Join Filter: (((rb.sms_user)::text = (b.user_id)::text) AND (rb.reference = b.user_reference))
   ->  Bitmap Heap Scan on rb  (cost=21546.02..23946.16 rows=819 width=198)
         Recheck Cond: ((delivered = 0) AND (time_stamp >= '2009-01-18 00:00:00'::timestamp without time zone) AND (time_stamp < '2009-01-21 00:00:00'::timestamp without time zone))
         Filter: ((NOT mo_billed) AND (system_id <> 6))
         ->  BitmapAnd  (cost=21546.02..21546.02 rows=819 width=0)
               ->  Bitmap Index Scan on rb_delivered_ind  (cost=0.00..1419.99 rows=45768 width=0)
                     Index Cond: (delivered = 0)
               ->  Bitmap Index Scan on rb_timestamp_ind  (cost=0.00..20125.37 rows=188994 width=0)
                     Index Cond: ((time_stamp >= '2009-01-18 00:00:00'::timestamp without time zone) AND (time_stamp < '2009-01-21 00:00:00'::timestamp without time zone))
   ->  Bitmap Heap Scan on billing b  (cost=21.10..1004.77 rows=351 width=49)
         Recheck Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
         Filter: ((b.time_arrived >= '2009-01-18 00:00:00'::timestamp without time zone) AND (b.time_arrived < '2009-01-21 00:00:00'::timestamp without time zone))
         ->  Bitmap Index Scan on billing_msisdn_sme_reference  (cost=0.00..21.10 rows=351 width=0)
               Index Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))

I'm not an expert on reading plans, but it seems to me that it uses indices on both tables that participate in this query, so it shouldn't take such a long time.

The number of records in the table rb for the past three days is 386833. On the sms.billing table it seems to select the index on the msisdn and sme_reference fields and use it only partially (only the msisdn field). Looking at that table, the frequency of each value in the msisdn field is at most 17678 for the current data, and mostly it's a couple of thousand. How can this take so long?

Thanks,
Herouth

Re: Slow update

From
"Marc Mamin"
Date:
Hello,
 
- did you vacuum your tables recently ?
 
- What I miss in your query is a check for the rows that do not need to be updated:
 
AND NOT (    service = b.service
         AND status = b.status
         AND has_notification = (gateway_id NOT IN (4,101,102))
         AND operator = COALESCE( b.actual_target_network_id, b.requested_target_network_id )
        )
 
 
Depending on the fraction of rows that are already up to date, this might speed up your process quite a lot...
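Spelled out, Marc's suggestion amounts to something like this (a sketch only; the rb. qualifiers in the added predicate are my guesses, since the original SET clause leaves gateway_id unqualified):

```sql
-- Sketch of the original UPDATE with Marc's "skip already-correct rows"
-- predicate appended.  Caveat: if any compared column is nullable, the
-- NOT (...) can evaluate to NULL and the row would be skipped; comparing
-- with IS DISTINCT FROM per column would be NULL-safe.
UPDATE rb
SET service = b.service,
    status = b.status,
    has_notification = gateway_id NOT IN (4,101,102),
    operator = COALESCE(b.actual_target_network_id,
                        b.requested_target_network_id)
FROM sms.billing b
WHERE b.time_arrived >= :date_start
AND   rb.time_stamp >= :date_start
AND   rb.delivered = 0
AND   rb.sms_user = b.user_id
AND   rb.reference = b.user_reference
AND   OVERLAY( rb.msisdn placing '972' from 1 for 1 ) = b.msisdn
AND   NOT mo_billed
AND   system_id <> 6
AND   NOT (    rb.service = b.service
           AND rb.status = b.status
           AND rb.has_notification = (gateway_id NOT IN (4,101,102))
           AND rb.operator = COALESCE(b.actual_target_network_id,
                                      b.requested_target_network_id) );
```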
 
 
Hope To Help,
 
Marc Mamin

 
 
 


From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Herouth Maoz
Sent: Wednesday, January 21, 2009 10:30 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Slow update

Hello.

I have a daily process that synchronizes our reports database from our production databases. In the past few days, it happened a couple of times that an update query took around 7-8 hours to complete, which seems a bit excessive. This is the query:

    UPDATE rb
    SET service = b.service,
    status = b.status,
    has_notification = gateway_id NOT IN (4,101,102),
    operator = COALESCE(
        b.actual_target_network_id,
        b.requested_target_network_id
    )
    FROM sms.billing b
    WHERE b.time_arrived >= :date_start
    AND   rb.time_stamp >= :date_start
    AND   rb.delivered = 0
    AND   rb.sms_user = b.user_id
    AND   rb.reference = b.user_reference
    AND   OVERLAY( rb.msisdn placing '972' from 1 for 1 ) = b.msisdn
    AND   NOT mo_billed
    AND   system_id <> 6  -- Exclude Corporate, as it already has service/status
    ;

The variable ":date_start" is set to a date 3 days ago.

I ran explain for this query and it gave me this:


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=21567.12..854759.82 rows=1 width=210)
   Join Filter: (((rb.sms_user)::text = (b.user_id)::text) AND (rb.reference = b.user_reference))
   ->  Bitmap Heap Scan on rb  (cost=21546.02..23946.16 rows=819 width=198)
         Recheck Cond: ((delivered = 0) AND (time_stamp >= '2009-01-18 00:00:00'::timestamp without time zone) AND (time_stamp < '2009-01-21 00:00:00'::timestamp without time zone))
         Filter: ((NOT mo_billed) AND (system_id <> 6))
         ->  BitmapAnd  (cost=21546.02..21546.02 rows=819 width=0)
               ->  Bitmap Index Scan on rb_delivered_ind  (cost=0.00..1419.99 rows=45768 width=0)
                     Index Cond: (delivered = 0)
               ->  Bitmap Index Scan on rb_timestamp_ind  (cost=0.00..20125.37 rows=188994 width=0)
                     Index Cond: ((time_stamp >= '2009-01-18 00:00:00'::timestamp without time zone) AND (time_stamp < '2009-01-21 00:00:00'::timestamp without time zone))
   ->  Bitmap Heap Scan on billing b  (cost=21.10..1004.77 rows=351 width=49)
         Recheck Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
         Filter: ((b.time_arrived >= '2009-01-18 00:00:00'::timestamp without time zone) AND (b.time_arrived < '2009-01-21 00:00:00'::timestamp without time zone))
         ->  Bitmap Index Scan on billing_msisdn_sme_reference  (cost=0.00..21.10 rows=351 width=0)
               Index Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))

I'm not an expert on reading plans, but it seems to me that it uses indices on both tables that participate in this query, so it shouldn't take such a long time.

The number of records in the table rb for the past three days is 386833. On the sms.billing table it seems to select the index on the msisdn and sme_reference fields and use it only partially (only the msisdn field). Looking at that table, the frequency of each value in the msisdn field is at most 17678 for the current data, and mostly it's a couple of thousand. How can this take so long?

Thanks,
Herouth

Re: Slow update

From
Herouth Maoz
Date:

Marc Mamin wrote:

Hello,
 
- did you vacuum your tables recently ?
 
- What I miss in your query is a check for the rows that do not need to be updated:
 
AND NOT (    service = b.service
         AND status = b.status
         AND has_notification = (gateway_id NOT IN (4,101,102))
         AND operator = COALESCE( b.actual_target_network_id, b.requested_target_network_id )
        )
 
 
Depending on the fraction of rows that are already up to date, this might speed up your process quite a lot...
I don't see why it would. As far as I know, the big saving in update time comes from using the indices. All the other conditions, which are not on indexed columns, are checked sequentially on the rows fetched via the indices, so adding more conditions wouldn't make this much faster - it might even be slower, because more comparisons are made.

In any case, the logic of the database is that the records that have delivered = 0 are always a subset of the records that are changed in this query, so querying on delivered = 0 - which is an indexed condition - actually makes the above redundant.

Thanks for your response,
Herouth

Re: Slow update

From
Herouth Maoz
Date:
Filip Rembiałkowski wrote:

>
> 1. which postgres version?
8.3.1
> 2. can you post results of EXPLAIN ANALYZE (please note it actually
> executes the query)?
>
Well, if it executes the query it's a problem. I might be able to do so
during the weekend, when I can play with the scripts and get away with
failures, but of course there is less data in the tables then.

Thanks,
Herouth

Re: Slow update

From
Filip Rembiałkowski
Date:

2009/1/21 Herouth Maoz <herouth@unicell.co.il>
Hello.

I have a daily process that synchronizes our reports database from our production databases. In the past few days, it happened a couple of times that an update query took around 7-8 hours to complete, which seems a bit excessive. This is the query:

    UPDATE rb
    SET service = b.service,
    status = b.status,
    has_notification = gateway_id NOT IN (4,101,102),
    operator = COALESCE(
        b.actual_target_network_id,
        b.requested_target_network_id
    )
    FROM sms.billing b
    WHERE b.time_arrived >= :date_start
    AND   rb.time_stamp >= :date_start
    AND   rb.delivered = 0
    AND   rb.sms_user = b.user_id
    AND   rb.reference = b.user_reference
    AND   OVERLAY( rb.msisdn placing '972' from 1 for 1 ) = b.msisdn
    AND   NOT mo_billed
    AND   system_id <> 6  -- Exclude Corporate, as it already has service/status
    ;

The variable ":date_start" is set to a date 3 days ago.

I ran explain for this query and it gave me this:


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=21567.12..854759.82 rows=1 width=210)
   Join Filter: (((rb.sms_user)::text = (b.user_id)::text) AND (rb.reference = b.user_reference))
   ->  Bitmap Heap Scan on rb  (cost=21546.02..23946.16 rows=819 width=198)
         Recheck Cond: ((delivered = 0) AND (time_stamp >= '2009-01-18 00:00:00'::timestamp without time zone) AND (time_stamp < '2009-01-21 00:00:00'::timestamp without time zone))
         Filter: ((NOT mo_billed) AND (system_id <> 6))
         ->  BitmapAnd  (cost=21546.02..21546.02 rows=819 width=0)
               ->  Bitmap Index Scan on rb_delivered_ind  (cost=0.00..1419.99 rows=45768 width=0)
                     Index Cond: (delivered = 0)
               ->  Bitmap Index Scan on rb_timestamp_ind  (cost=0.00..20125.37 rows=188994 width=0)
                     Index Cond: ((time_stamp >= '2009-01-18 00:00:00'::timestamp without time zone) AND (time_stamp < '2009-01-21 00:00:00'::timestamp without time zone))
   ->  Bitmap Heap Scan on billing b  (cost=21.10..1004.77 rows=351 width=49)
         Recheck Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
         Filter: ((b.time_arrived >= '2009-01-18 00:00:00'::timestamp without time zone) AND (b.time_arrived < '2009-01-21 00:00:00'::timestamp without time zone))
         ->  Bitmap Index Scan on billing_msisdn_sme_reference  (cost=0.00..21.10 rows=351 width=0)
               Index Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))

I'm not an expert on reading plans, but it seems to me that it uses indices on both tables that participate in this query, so it shouldn't take such a long time.

The number of records in the table rb for the past three days is 386833. On the sms.billing table it seems to select the index on the msisdn and sme_reference fields and use it only partially (only the msisdn field). Looking at that table, the frequency of each value in the msisdn field is at most 17678 for the current data, and mostly it's a couple of thousand. How can this take so long?

1. which postgres version?
2. can you post results of EXPLAIN ANALYZE (please note it actually executes the query)? 




--
Filip Rembiałkowski

Re: Slow update

From
Grzegorz Jaśkiewicz
Date:
On Wed, Jan 21, 2009 at 12:55 PM, Herouth Maoz <herouth@unicell.co.il> wrote:
> Well, if it executes the query it's a problem. I might be able to do so
> during the weekend, when I can play with the scripts and get away with
> failures, but of course there is less data in the tables then.
>

you should seriously think about having a test machine

--
GJ

Re: Slow update

From
Sam Mason
Date:
On Wed, Jan 21, 2009 at 02:55:00PM +0200, Herouth Maoz wrote:
> Filip Rembiakowski wrote:
> > 2. can you post results of EXPLAIN ANALYZE (please note it actually
> > executes the query)?
>
> Well, if it executes the query it's a problem.

You can wrap an EXPLAIN ANALYSE up in BEGIN;...ROLLBACK;.  That way
PG won't record any of the changes permanently.  More explanation is
available in the docs:

  http://www.postgresql.org/docs/current/static/sql-explain.html
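In practice that looks roughly like this (a sketch; note that dead row versions created by the update are still left behind for VACUUM to reclaim, even though the changes themselves are rolled back):

```sql
BEGIN;
EXPLAIN ANALYZE
UPDATE rb
SET ...            -- the full SET clause from the original post
FROM sms.billing b
WHERE ...          -- the full WHERE clause from the original post
;
ROLLBACK;          -- timings are reported, no changes are kept
```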

--
  Sam  http://samason.me.uk/

Re: Slow update

From
Herouth Maoz
Date:

Grzegorz Jaśkiewicz wrote:

On Wed, Jan 21, 2009 at 12:55 PM, Herouth Maoz <herouth@unicell.co.il> wrote: 
Well, if it executes the query it's a problem. I might be able to do so
during the weekend, when I can play with the scripts and get away with
failures, but of course there is less data in the tables then.
   
you should seriously think about having a test machine
 
I have a test machine - but the data in there is test data, and it's a slower machine. A testing environment is good for development, but it can hardly be used to realistically simulate the production machine's performance.

Herouth

Re: Slow update

From
"Marc Mamin"
Date:
> I don't see why it would
 
This may reduce I/O activity and reduce the vacuum activity on this table.
 
Here a small example:

create table update_test (a int);
insert into update_test select * from generate_series (1,100000);
vacuum full verbose update_test;
-> INFO:  "update_test": found 0 removable, 100000 nonremovable row versions in 393 pages
 
--now update one row:
-- or in your case, only the rows that would get modified (my query proposal)
 
update update_test set a=1 where a=1;
 
vacuum full verbose update_test;
-> INFO:  "update_test": found 1 removable, 100000 nonremovable row versions in 393 pages
 
--update all rows
-- or in your case, all rows that match your update query
update update_test set a=a;
vacuum full verbose update_test;
 
-> INFO:  "update_test": found 100000 removable, 100000 nonremovable row versions in 785 pages
 

Adding elements to the WHERE clause will slow down the "recheck" operations, but your indexes will probably still be used as in your original query.
 
By limiting the number of rows being updated, you will reduce I/O activity and the need to vacuum your table...
This approach may be superfluous if the extra conditions do not reduce the number of updated rows significantly...
 
 
 
cheers,
 
Marc
 


From: Herouth Maoz [mailto:herouth@unicell.co.il]
Sent: Wednesday, January 21, 2009 12:50 PM
To: Marc Mamin
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Slow update

Marc Mamin wrote:

Hello,
 
- did you vacuum your tables recently ?
 
- What I miss in your query is a check for the rows that do not need to be updated:
 
AND NOT (    service = b.service
         AND status = b.status
         AND has_notification = (gateway_id NOT IN (4,101,102))
         AND operator = COALESCE( b.actual_target_network_id, b.requested_target_network_id )
        )
 
 
Depending on the fraction of rows that are already up to date, this might speed up your process quite a lot...
I don't see why it would. As far as I know, the big saving in update time comes from using the indices. All the other conditions, which are not on indexed columns, are checked sequentially on the rows fetched via the indices, so adding more conditions wouldn't make this much faster - it might even be slower, because more comparisons are made.

In any case, the logic of the database is that the records that have delivered = 0 are always a subset of the records that are changed in this query, so querying on delivered = 0 - which is an indexed condition - actually makes the above redundant.

Thanks for your response,
Herouth

Re: Slow update

From
Scott Marlowe
Date:
On Wed, Jan 21, 2009 at 9:36 AM, Herouth Maoz <herouth@unicell.co.il> wrote:
> I have a test machine - but the data in there is test data, and it's a
> slower machine. A testing environment is good for development, but can
> hardly be used to really simulate the production machine for performance.
>

What the other poster was referring to was a staging system.  A
staging system has the same amount of data and type of servers, and is
necessary if you really want to performance test something before you
do it on production.

Re: Slow update

From
Herouth Maoz
Date:

Filip Rembiałkowski wrote:


2009/1/21 Herouth Maoz <herouth@unicell.co.il>
Hello.

I have a daily process that synchronizes our reports database from our production databases. In the past few days, it happened a couple of times that an update query took around 7-8 hours to complete, which seems a bit excessive. This is the query:

    UPDATE rb
    SET service = b.service,
    status = b.status,
    has_notification = gateway_id NOT IN (4,101,102),
    operator = COALESCE(
        b.actual_target_network_id,
        b.requested_target_network_id
    )
    FROM sms.billing b
    WHERE b.time_arrived >= :date_start
    AND   rb.time_stamp >= :date_start
    AND   rb.delivered = 0
    AND   rb.sms_user = b.user_id
    AND   rb.reference = b.user_reference
    AND   OVERLAY( rb.msisdn placing '972' from 1 for 1 ) = b.msisdn
    AND   NOT mo_billed
    AND   system_id <> 6  -- Exclude Corporate, as it already has service/status
    ;

The variable ":date_start" is set to a date 3 days ago.

I ran explain for this query and it gave me this:


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=21567.12..854759.82 rows=1 width=210)
   Join Filter: (((rb.sms_user)::text = (b.user_id)::text) AND (rb.reference = b.user_reference))
   ->  Bitmap Heap Scan on rb  (cost=21546.02..23946.16 rows=819 width=198)
         Recheck Cond: ((delivered = 0) AND (time_stamp >= '2009-01-18 00:00:00'::timestamp without time zone) AND (time_stamp < '2009-01-21 00:00:00'::timestamp without time zone))
         Filter: ((NOT mo_billed) AND (system_id <> 6))
         ->  BitmapAnd  (cost=21546.02..21546.02 rows=819 width=0)
               ->  Bitmap Index Scan on rb_delivered_ind  (cost=0.00..1419.99 rows=45768 width=0)
                     Index Cond: (delivered = 0)
               ->  Bitmap Index Scan on rb_timestamp_ind  (cost=0.00..20125.37 rows=188994 width=0)
                     Index Cond: ((time_stamp >= '2009-01-18 00:00:00'::timestamp without time zone) AND (time_stamp < '2009-01-21 00:00:00'::timestamp without time zone))
   ->  Bitmap Heap Scan on billing b  (cost=21.10..1004.77 rows=351 width=49)
         Recheck Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
         Filter: ((b.time_arrived >= '2009-01-18 00:00:00'::timestamp without time zone) AND (b.time_arrived < '2009-01-21 00:00:00'::timestamp without time zone))
         ->  Bitmap Index Scan on billing_msisdn_sme_reference  (cost=0.00..21.10 rows=351 width=0)
               Index Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))

I'm not an expert on reading plans, but it seems to me that it uses indices on both tables that participate in this query, so it shouldn't take such a long time.

The number of records in the table rb for the past three days is 386833. On the sms.billing table it seems to select the index on the msisdn and sme_reference fields and use it only partially (only the msisdn field). Looking at that table, the frequency of each value in the msisdn field is at most 17678 for the current data, and mostly it's a couple of thousand. How can this take so long?

1. which postgres version?
2. can you post results of EXPLAIN ANALYZE (please note it actually executes the query)? 
I'm sorry it took some time to answer these questions - as I explained, I needed an opportunity to make the change in our production machine since the data in the development machine wouldn't do at all.

So the answer is PostgreSQL v. 8.3.1. The output of explain analyze is at the end of this message. Note that the run started at 04:20:50 and finished at 11:29:30. Also, a full vacuum was run on the entire database a day before, and I run analyze on each table whenever there is a bulk insert or update to it - and the only way data comes in is in bulk.

Could the delay have anything to do with disk space? The device where the database is located has around 3G left (98% full). If that's the problem, I suppose I can add a tablespace on another device and move the four largest tables into it, but if it has nothing to do with it, I'd rather stay on the one device. The machine has 4G of RAM.
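As a side note on the disk-space question: PostgreSQL's built-in size functions can show how much of the device the tables and their indexes actually occupy (a sketch, using the table names from the thread):

```sql
-- Total on-disk size of a table including its indexes and TOAST data:
SELECT pg_size_pretty(pg_total_relation_size('rb'));
SELECT pg_size_pretty(pg_total_relation_size('sms.billing'));
```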

I hope someone can clue me in based on the results of explain analyze.

Herouth



                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=27968.36..645009.77 rows=1 width=210) (actual time=57128.573..25674240.262 rows=166844 loops=1)
   Join Filter: (((rb.sms_user)::text = (b.user_id)::text) AND (rb.reference = b.user_reference))
   ->  Bitmap Heap Scan on rb  (cost=27943.66..29556.78 rows=549 width=198) (actual time=37105.248..54856.430 rows=151332 loops=1)
         Recheck Cond: ((delivered = 0) AND (time_stamp >= '2009-02-06 00:00:00'::timestamp without time zone))
         Filter: ((NOT mo_billed) AND (system_id <> 6))
         ->  BitmapAnd  (cost=27943.66..27943.66 rows=549 width=0) (actual time=37049.818..37049.818 rows=0 loops=1)
               ->  Bitmap Index Scan on rb_delivered_ind  (cost=0.00..884.99 rows=24367 width=0) (actual time=8147.209..8147.209 rows=871142 loops=1)
                     Index Cond: (delivered = 0)
               ->  Bitmap Index Scan on rb_timestamp_ind  (cost=0.00..27058.14 rows=235424 width=0) (actual time=28884.578..28884.578 rows=1333295 loops=1)
                     Index Cond: (time_stamp >= '2009-02-06 00:00:00'::timestamp without time zone)
   ->  Bitmap Heap Scan on billing b  (cost=24.70..1110.35 rows=389 width=49) (actual time=165.740..169.260 rows=10 loops=151332)
         Recheck Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
         Filter: (b.time_arrived >= '2009-02-06 00:00:00'::timestamp without time zone)
         ->  Bitmap Index Scan on billing_msisdn_sme_reference  (cost=0.00..24.70 rows=389 width=0) (actual time=21.418..21.418 rows=252 loops=151332)
               Index Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
 Total runtime: 25720210.772 ms
(16 rows)


Re: Slow update

From
Grzegorz Jaśkiewicz
Date:
On Mon, Feb 9, 2009 at 12:50 PM, Herouth Maoz <herouth@unicell.co.il> wrote:
> I hope someone can clue me in based on the results of explain analyze.

Did you have a chance to run vmstat on it, and post it here? Maybe - if the db resides on the same disc as everything else, something (ab)uses that much IO, and it has to wait.
Also, I don't know - but personally I didn't like the line in explain:

 ->  Bitmap Index Scan on billing_msisdn_sme_reference  (cost=0.00..24.70 rows=389 width=0) (actual time=21.418..21.418 rows=252 loops=151332)
        Index Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))

But the cost is next to none, so that's not it.


--
GJ

Re: Slow update

From
Herouth Maoz
Date:

Grzegorz Jaśkiewicz wrote:

On Mon, Feb 9, 2009 at 12:50 PM, Herouth Maoz <herouth@unicell.co.il> wrote: 
I hope someone can clue me in based on the results of explain analyze.   
Did you have a chance to run vmstat on it, and post it here ? Maybe -
if db resides on the same disc with everything else, something
(ab)uses that much io, and it has to wait.
I haven't run it, but it's worth mentioning that the data is on a separate device from the system/users. Also, the system is used only for the reporting system. Other than PostgreSQL and the cron job that runs this query, nothing runs on the machine at night. Reports start being produced around 7:00 AM, which should give the query ample time to finish (almost 3 hours) before any contention starts.

Herouth

Re: Slow update

From
Alban Hertroys
Date:
On Feb 9, 2009, at 2:07 PM, Grzegorz Jaśkiewicz wrote:

> On Mon, Feb 9, 2009 at 12:50 PM, Herouth Maoz
> <herouth@unicell.co.il> wrote:
>> I hope someone can clue me in based on the results of explain
>> analyze.
>
> Did you have a chance to run vmstat on it, and post it here ? Maybe -
> if db resides on the same disc with everything else, something
> (ab)uses that much io, and it has to wait.
> Also, I don't know - but personally I didn't like the line in explain:
>
> ->  Bitmap Index Scan on billing_msisdn_sme_reference  (cost=0.00..24.70 rows=389 width=0) (actual time=21.418..21.418 rows=252 loops=151332)
>        Index Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
>
> But the cost is next to none, so that's not it.


Actually, it's inside a nested loop and if I read correctly it gets
looped over 151332 times. That means it takes 151332 * (21.418 -
1.418) = 3026640 ms, which is almost 12% of the total time.

The biggie seems to be the bitmap heap scan on rb though. The row
estimates for that one are way off (estimated 549 rows vs actual
151332).
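When estimates are off by this much, one conventional knob (not mentioned explicitly in the thread) is to raise the per-column statistics target on the columns involved and re-analyze, a sketch:

```sql
-- Gather finer-grained statistics for the columns whose combined
-- selectivity the planner is misjudging (8.3 allows targets up to 1000):
ALTER TABLE rb ALTER COLUMN delivered  SET STATISTICS 1000;
ALTER TABLE rb ALTER COLUMN time_stamp SET STATISTICS 1000;
ANALYZE rb;
```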

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





Re: Slow update

From
Herouth Maoz
Date:
Alban Hertroys wrote:
> On Feb 9, 2009, at 2:07 PM, Grzegorz Jaśkiewicz wrote:
>
>> On Mon, Feb 9, 2009 at 12:50 PM, Herouth Maoz <herouth@unicell.co.il>
>> wrote:
>>> I hope someone can clue me in based on the results of explain analyze.
>>
>> Did you have a chance to run vmstat on it, and post it here ? Maybe -
>> if db resides on the same disc with everything else, something
>> (ab)uses that much io, and it has to wait.
>> Also, I don't know - but personally I didn't like the line in explain:
>>
>> ->  Bitmap Index Scan on billing_msisdn_sme_reference  (cost=0.00..24.70 rows=389 width=0) (actual time=21.418..21.418 rows=252 loops=151332)
>>        Index Cond: ((b.msisdn)::text = (("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
>>
>> But the cost is next to none, so that's not it.
>
>
> Actually, it's inside a nested loop and if I read correctly it gets
> looped over 151332 times. That means it takes 151332 * (21.418 -
> 1.418) = 3026640 ms, which is almost 12% of the total time.
>
> The biggie seems to be the bitmap heap scan on rb though. The row
> estimates for that one are way off (estimated 549 rows vs actual 151332).
>
> Alban Hertroys
>
To be quite honest, I'm not sure exactly how to read this plan, and what
the row values mean. The issue here is that sometimes the query works in
reasonable time, and sometimes it takes half a day. Of course, this may
be because the data size is different, but I don't know which part of
the plan tells me that.

What do rows vs. loops signify? How can the estimate be so far off if
I'm running analyze on all the tables right after I make any big updates
to them?

I find it hard to believe that the problem is with the complex
comparison caused by the different formats of the fields in rb and in
billing. This should add a constant multiplier to the time it takes to
run the query, but not cause the query to run one time in 5 minutes, and
the next day in 12 hours!

Thanks for the assistance.

Herouth

Here is the plan collected tonight, which took a reasonable amount of
time (5 minutes). It seems to be a totally different plan, doesn't it?:


                                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=1157750.08..1167132.31 rows=1 width=210) (actual time=238247.983..239980.264 rows=111676 loops=1)
   Merge Cond: ((rb.reference = b.user_reference) AND ((rb.sms_user)::text = (b.user_id)::text) AND (((("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2))) = (b.msisdn)::text))
   ->  Sort  (cost=31137.76..31141.31 rows=1423 width=198) (actual time=117858.431..117932.544 rows=111676 loops=1)
         Sort Key: rb.reference, rb.sms_user, ((("substring"((rb.msisdn)::text, 1, 0) || '972'::text) || "substring"((rb.msisdn)::text, 2)))
         Sort Method:  quicksort  Memory: 30941kB
         ->  Bitmap Heap Scan on rb  (cost=26968.95..31063.23 rows=1423 width=198) (actual time=113615.187..116935.502 rows=111676 loops=1)
               Recheck Cond: ((delivered = 0) AND (time_stamp >= '2009-02-12 00:00:00'::timestamp without time zone))
               Filter: ((NOT mo_billed) AND (system_id <> 6))
               ->  BitmapAnd  (cost=26968.95..26968.95 rows=1423 width=0) (actual time=113454.761..113454.761 rows=0 loops=1)
                     ->  Bitmap Index Scan on rb_delivered_ind  (cost=0.00..2522.46 rows=69896 width=0) (actual time=9358.397..9358.397 rows=150651 loops=1)
                           Index Cond: (delivered = 0)
                     ->  Bitmap Index Scan on rb_timestamp_ind  (cost=0.00..24445.53 rows=213475 width=0) (actual time=104091.620..104091.620 rows=303308 loops=1)
                           Index Cond: (time_stamp >= '2009-02-12 00:00:00'::timestamp without time zone)
   ->  Sort  (cost=1117952.26..1120779.49 rows=1130889 width=50) (actual time=119485.709..120263.045 rows=756135 loops=1)
         Sort Key: b.user_reference, b.user_id, b.msisdn
         Sort Method:  external sort  Disk: 60976kB
         ->  Bitmap Heap Scan on billing b  (cost=36754.98..1004246.88 rows=1130889 width=50) (actual time=24409.448..101034.765 rows=896474 loops=1)
               Recheck Cond: (time_arrived >= '2009-02-12 00:00:00'::timestamp without time zone)
               ->  Bitmap Index Scan on billing_time_arrived  (cost=0.00..36472.26 rows=1130889 width=0) (actual time=23936.245..23936.245 rows=1166881 loops=1)
                     Index Cond: (time_arrived >= '2009-02-12 00:00:00'::timestamp without time zone)
 Total runtime: 307958.152 ms
(21 rows)