Thread: Problems with inconsistent query performance.

Problems with inconsistent query performance.

From
Matthew Schumacher
Date:
List,

I posted a little about this a while back to the general list, but never
really got anywhere with it, so I'll try again, this time with a little
more detail, and hopefully someone can send me in the right direction.

Here is the problem: I have a procedure that is called 100k times a day.
Most of the time it's screaming fast, but other times it takes a few
seconds.  When it lags, my machine can get behind, which causes other
problems, so I'm trying to figure out why there is such a huge delta in
performance with this proc.

The proc is pretty large (due to the number of vars) so I will summarize
it here:

==========================================================================
CREATE FUNCTION acctmessage( <lots of accounting columns> ) RETURNS void AS $$
BEGIN
INSERT into tmpaccounting_tab ( ... ) values ( ... );

IF _acctType = 'start' THEN
  BEGIN
  INSERT into radutmp_tab ( ... ) values ( ... );
  EXCEPTION WHEN UNIQUE_VIOLATION THEN
    NULL;
  END;
ELSIF _acctType = 'stop' THEN
  UPDATE radutmp_tab SET ... WHERE sessionId = _sessionId AND userName =
_userName;
  IF (NOT FOUND) THEN
    INSERT into radutmp_tab ( ... ) values ( ... );
  END IF;

END IF;
END;
$$
LANGUAGE plpgsql;
==========================================================================

So in a nutshell: when I get an accounting record, I put it in
tmpaccounting_tab and then insert into or update radutmp_tab based on
what kind of record it is.  If for some reason the message is a start
message and a duplicate, drop it; if the message is a stop message and
we don't have the start, insert it.

The tmpaccounting_tab table doesn't have any indexes and gets flushed to
the accounting_tab table nightly so it should have very good insert
performance as the table is kept small (compared to accounting_tab) and
doesn't have index overhead.  The radutmp_tab is also kept small as
completed sessions are flushed to another table nightly, but I do keep
an index on sessionId and userName so the update isn't slow.

Now that you have the layout, the problem: I log whenever a query takes
more than 250ms and have logged this query:

duration: 3549.229 ms  statement: select acctMessage( 'stop',
'username', 'address', 'time', 'session', 'port', 'address', 'bytes',
'bytes', 0, 0, 1, 'reason', '', '', '', 'proto', 'service',  'info')

But when I go do an explain analyze it is very fast:

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.03 rows=1 width=0) (actual time=6.812..6.813
rows=1 loops=1)
 Total runtime: 6.888 ms

So the question is: why, on a relatively simple proc, am I getting a
query performance delta between 3549 ms and 7 ms?

Here are some values from my postgresql.conf to look at:

shared_buffers = 60000                  # min 16 or max_connections*2,
8KB each
temp_buffers = 5000                     # min 100, 8KB each
#max_prepared_transactions = 5          # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared
memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 131072                       # min 64, size in KB
maintenance_work_mem = 262144           # min 1024, size in KB
max_stack_depth = 2048                  # min 100, size in KB
effective_cache_size = 65536            # typically 8KB each


Thanks for any help you can give,
schu


Re: Problems with inconsistent query performance.

From
"Jim C. Nasby"
Date:
Periodically taking longer is probably a case of some other process in
the database holding a lock you need, or otherwise bogging the system
down, especially if you're always running acctmessage from the same
connection (because the query plans shouldn't be changing then). I'd
suggest looking at what else is happening at the same time.

Also, it's more efficient to operate on chunks of data rather than one
row at a time whenever possible. If you have to log each row
individually, consider simply logging them into a table, and then
periodically pulling data out of that table to do additional processing
on it.

BTW, your detection of duplicates/row existence has a race condition.
Take a look at example 36-1 at
http://www.postgresql.org/docs/8.1/interactive/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
for a better way to handle it.
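
Roughly, the pattern in that example looks like this when adapted to your
stop case (the column names and types are guesses on my part, and it assumes
a unique index on (sessionId, userName)):

CREATE FUNCTION merge_radutmp(_sessionId text, _userName text, _stopTime timestamptz)
RETURNS void AS $$
BEGIN
    LOOP
        -- first try to update the existing session row
        UPDATE radutmp_tab SET stopTime = _stopTime
         WHERE sessionId = _sessionId AND userName = _userName;
        IF FOUND THEN
            RETURN;
        END IF;
        -- not there, so try to insert it; if someone else inserts the same
        -- key first, we get a unique_violation and loop back to the UPDATE
        BEGIN
            INSERT INTO radutmp_tab (sessionId, userName, stopTime)
            VALUES (_sessionId, _userName, _stopTime);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- do nothing; retry the UPDATE
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

The point is that the UPDATE/INSERT decision is retried until one of them
sticks, instead of assuming nothing changed between the two statements.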

On Wed, Sep 27, 2006 at 10:37:22AM -0800, Matthew Schumacher wrote:
> List,
>
> I posted a little about this a while back to the general list, but never
> really got anywhere with it so I'll try again, this time with a little
> more detail and hopefully someone can send me in the right direction.
>
> Here is the problem, I have a procedure that is called 100k times a day.
>  Most of the time it's screaming fast, but other times it takes a few
> seconds.  When it does lag my machine can get behind which causes other
> problems, so I'm trying to figure out why there is such a huge delta in
> performance with this proc.
>
> The proc is pretty large (due to the number of vars) so I will summarize
> it here:
>
> ==========================================================================
> CREATE FUNCTION acctmessage( <lots of accounting columns> ) RETURNS void AS $$
> BEGIN
> INSERT into tmpaccounting_tab ( ... ) values ( ... );
>
> IF _acctType = 'start' THEN
>   BEGIN
>   INSERT into radutmp_tab ( ... ) values ( ... );
>   EXCEPTION WHEN UNIQUE_VIOLATION THEN
>     NULL;
>   END;
> ELSIF _acctType = 'stop' THEN
>   UPDATE radutmp_tab SET ... WHERE sessionId = _sessionId AND userName =
> _userName;
>   IF (NOT FOUND) THEN
>     INSERT into radutmp_tab ( ... ) values ( ... );
>   END IF;
>
> END IF;
> END;
> $$
> LANGUAGE plpgsql;
> ==========================================================================
>
> So in a nutshell, if I get an accounting record put it in the
> tmpaccounting_tab and then insert or update the radutmp_tab based on
> what kind of record it is.  If for some reason the message is a start
> message and a duplicate, drop it, and if the message is a stop message
> and we don't have the start then insert it.
>
> The tmpaccounting_tab table doesn't have any indexes and gets flushed to
> the accounting_tab table nightly so it should have very good insert
> performance as the table is kept small (compared to accounting_tab) and
> doesn't have index overhead.  The radutmp_tab is also kept small as
> completed sessions are flushed to another table nightly, but I do keep
> an index on sessionId and userName so the update isn't slow.
>
> Now that you have the layout, the problem: I log whenever a query takes
> more than 250ms and have logged this query:
>
> duration: 3549.229 ms  statement: select acctMessage( 'stop',
> 'username', 'address', 'time', 'session', 'port', 'address', 'bytes',
> 'bytes', 0, 0, 1, 'reason', '', '', '', 'proto', 'service',  'info')
>
> But when I go do an explain analyze it is very fast:
>
>                                      QUERY PLAN
> ------------------------------------------------------------------------------------
>  Result  (cost=0.00..0.03 rows=1 width=0) (actual time=6.812..6.813
> rows=1 loops=1)
>  Total runtime: 6.888 ms
>
> So the question is: why, on a relatively simple proc, am I getting a query
> performance delta between 3549 ms and 7 ms?
>
> Here are some values from my postgresql.conf to look at:
>
> shared_buffers = 60000                  # min 16 or max_connections*2,
> 8KB each
> temp_buffers = 5000                     # min 100, 8KB each
> #max_prepared_transactions = 5          # can be 0 or more
> # note: increasing max_prepared_transactions costs ~600 bytes of shared
> memory
> # per transaction slot, plus lock space (see max_locks_per_transaction).
> work_mem = 131072                       # min 64, size in KB
> maintenance_work_mem = 262144           # min 1024, size in KB
> max_stack_depth = 2048                  # min 100, size in KB
> effective_cache_size = 65536            # typically 8KB each
>
>
> Thanks for any help you can give,
> schu
>
>

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Problems with inconsistent query performance.

From
Matthew Schumacher
Date:
Jim,

Thanks for the help.  I went and looked at that example and I don't see
how it's different from the "INSERT into radutmp_tab" I'm already doing.
Both trap an exception; the only difference is that I'm not doing
anything with it.  Perhaps you are talking about the "IF (NOT FOUND)" I
put after the "UPDATE radutmp_tab".  Should this be an EXCEPTION
instead?  Also, I don't know how this could cause a race condition.  As
far as I understand, each proc is run in its own transaction, and the
code in the proc is run serially.  Can you explain more about why this
could cause a race?

Thanks,
schu



Jim C. Nasby wrote:
> Periodically taking longer is probably a case of some other process in
> the database holding a lock you need, or otherwise bogging the system
> down, especially if you're always running acctmessage from the same
> connection (because the query plans shouldn't be changing then). I'd
> suggest looking at what else is happening at the same time.
>
> Also, it's more efficient to operate on chunks of data rather than one
> row at a time whenever possible. If you have to log each row
> individually, consider simply logging them into a table, and then
> periodically pulling data out of that table to do additional processing
> on it.
>
> BTW, your detection of duplicates/row existence has a race condition.
> Take a look at example 36-1 at
> http://www.postgresql.org/docs/8.1/interactive/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
> for a better way to handle it.

>> ==========================================================================
>> CREATE FUNCTION acctmessage( <lots of accounting columns> ) RETURNS void AS $$
>> BEGIN
>> INSERT into tmpaccounting_tab ( ... ) values ( ... );
>>
>> IF _acctType = 'start' THEN
>>   BEGIN
>>   INSERT into radutmp_tab ( ... ) values ( ... );
>>   EXCEPTION WHEN UNIQUE_VIOLATION THEN
>>     NULL;
>>   END;
>> ELSIF _acctType = 'stop' THEN
>>   UPDATE radutmp_tab SET ... WHERE sessionId = _sessionId AND userName =
>> _userName;
>>   IF (NOT FOUND) THEN
>>     INSERT into radutmp_tab ( ... ) values ( ... );
>>   END IF;
>>
>> END IF;
>> END;
>> $$
>> LANGUAGE plpgsql;
>> ==========================================================================

Re: Problems with inconsistent query performance.

From
"Jim C. Nasby"
Date:
On Wed, Sep 27, 2006 at 01:33:09PM -0800, Matthew Schumacher wrote:
> Jim,
>
> Thanks for the help.  I went and looked at that example and I don't see
> how it's different than the "INSERT into radutmp_tab" I'm already doing.
>  Both raise an exception, the only difference is that I'm not doing
> anything with it.  Perhaps you are talking about the "IF (NOT FOUND)" I
> put after the "UPDATE radutmp_tab".  Should this be an EXCEPTION
> instead?  Also I don't know how this could cause a race condition.  As
> far as I understand, each proc is run in its own transaction, and the
> code in the proc is run serially.  Can you explain more why this could
> cause a race?

It can cause a race if another process could be performing those same
inserts or updates at the same time.

I know the UPDATE case can certainly cause a race. 2 connections try to
update, both hit NOT FOUND, both try to insert... only one will get to
commit.

I think that the UNIQUE_VIOLATION case should be safe, since a second
inserter should block if there's another insert that's waiting to
commit.
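
Assuming there is a unique index on the key, you can see that behavior with
two psql sessions (the column list here is made up):

-- session 1
BEGIN;
INSERT INTO radutmp_tab (sessionId, userName) VALUES ('s1', 'bob');
-- don't commit yet

-- session 2: this blocks until session 1 finishes, then either fails with
-- a unique_violation (session 1 committed) or succeeds (session 1 rolled back)
INSERT INTO radutmp_tab (sessionId, userName) VALUES ('s1', 'bob');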

DELETEs are something else to think about for both cases.

If you're certain that only one process will be performing DML on those
tables at any given time, then what you have is safe. But if that's the
case, I'm thinking you should be able to group things into chunks, which
should be more efficient.

> Thanks,
> schu
>
>
>
> Jim C. Nasby wrote:
> > Periodically taking longer is probably a case of some other process in
> > the database holding a lock you need, or otherwise bogging the system
> > down, especially if you're always running acctmessage from the same
> > connection (because the query plans shouldn't be changing then). I'd
> > suggest looking at what else is happening at the same time.
> >
> > Also, it's more efficient to operate on chunks of data rather than one
> > row at a time whenever possible. If you have to log each row
> > individually, consider simply logging them into a table, and then
> > periodically pulling data out of that table to do additional processing
> > on it.
> >
> > BTW, your detection of duplicates/row existence has a race condition.
> > Take a look at example 36-1 at
> > http://www.postgresql.org/docs/8.1/interactive/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
> > for a better way to handle it.
>
> >> ==========================================================================
> >> CREATE FUNCTION acctmessage( <lots of accounting columns> ) RETURNS void AS $$
> >> BEGIN
> >> INSERT into tmpaccounting_tab ( ... ) values ( ... );
> >>
> >> IF _acctType = 'start' THEN
> >>   BEGIN
> >>   INSERT into radutmp_tab ( ... ) values ( ... );
> >>   EXCEPTION WHEN UNIQUE_VIOLATION THEN
> >>     NULL;
> >>   END;
> >> ELSIF _acctType = 'stop' THEN
> >>   UPDATE radutmp_tab SET ... WHERE sessionId = _sessionId AND userName =
> >> _userName;
> >>   IF (NOT FOUND) THEN
> >>     INSERT into radutmp_tab ( ... ) values ( ... );
> >>   END IF;
> >>
> >> END IF;
> >> END;
> >> $$
> >> LANGUAGE plpgsql;
> >> ==========================================================================
>

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Problems with inconsistent query performance.

From
Matthew Schumacher
Date:
Jim C. Nasby wrote:
>
> It can cause a race if another process could be performing those same
> inserts or updates at the same time.

There are inserts and updates running all of the time, but never on the
same data.  I'm not sure how I can get around this, since the queries are
coming from my radius system, which is not able to queue this stuff up
because it waits for a successful query before returning an OK packet
back to the client.

>
> I know the UPDATE case can certainly cause a race. 2 connections try to
> update, both hit NOT FOUND, both try to insert... only one will get to
> commit.

Why is that?  Doesn't the first update lock the row causing the second
one to wait, then the second one stomps on the row allowing both to
commit?  I must be confused....

>
> I think that the UNIQUE_VIOLATION case should be safe, since a second
> inserter should block if there's another insert that's waiting to
> commit.

Are you saying that inserts inside of an EXCEPTION block will block, but
normal inserts don't?

>
> DELETEs are something else to think about for both cases.

I only do one delete, and that is every night when I move the data to the
primary table and remove that day's worth of data from the tmp table.
This is done at non-peak times with a vacuum, so I think I'm good here.

>
> If you're certain that only one process will be performing DML on those
> tables at any given time, then what you have is safe. But if that's the
> case, I'm thinking you should be able to group things into chunks, which
> should be more efficient.

Yea, I wish I could, but I really need to do one at a time because of
how radius waits for a successful query before telling the access server
all is well.  If the query fails, the access server won't get the 'OK'
packet and will send the data to the secondary radius system where it
gets queued.

Do you know of a way to see what is going on with the locking system
other than "select * from pg_locks"?  I can't ever seem to catch the
system when queries start to lag.

Thanks again,
schu

Re: Problems with inconsistent query performance.

From
"Jim C. Nasby"
Date:
On Wed, Sep 27, 2006 at 02:17:23PM -0800, Matthew Schumacher wrote:
> Jim C. Nasby wrote:
> >
> > It can cause a race if another process could be performing those same
> > inserts or updates at the same time.
>
> There are inserts and updates running all of the time, but never the
> same data.  I'm not sure how I can get around this since the queries are
> coming from my radius system which is not able to queue this stuff up
> because it waits for a successful query before returning an OK packet
> back to the client.
>
> >
> > I know the UPDATE case can certainly cause a race. 2 connections try to
> > update, both hit NOT FOUND, both try to insert... only one will get to
> > commit.
>
> Why is that?  Doesn't the first update lock the row causing the second
> one to wait, then the second one stomps on the row allowing both to
> commit?  I must be confused....

What if there's no row to update?

Process A               Process B
UPDATE .. NOT FOUND
                        UPDATE .. NOT FOUND
                        INSERT
INSERT blocks
                        COMMIT
UNIQUE_VIOLATION

That's assuming that there's a unique index. If there isn't one, you'd
get duplicate records.
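
If radutmp_tab doesn't already have one, the unique index is what turns the
second INSERT into an error instead of a silent duplicate.  Guessing that
(sessionId, userName) is the natural key, it would look something like:

CREATE UNIQUE INDEX radutmp_session_user_idx
    ON radutmp_tab (sessionId, userName);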

> > I think that the UNIQUE_VIOLATION case should be safe, since a second
> > inserter should block if there's another insert that's waiting to
> > commit.
>
> Are you saying that inserts inside of an EXCEPTION block will block, but
> normal inserts don't?

No... if there's a unique index, a second INSERT attempting to create a
duplicate record will block until the first INSERT either commits or
rolls back.

> > DELETEs are something else to think about for both cases.
>
> I only do one delete and that is every night when I move the data to the
> primary table and remove that day's worth of data from the tmp table.
> This is done at non-peak times with a vacuum, so I think I'm good here.

Except that you might still have someone fire off that function while
the delete's running, or vice-versa. So there could be a race condition
(I haven't thought enough about what race conditions that could cause).

> > If you're certain that only one process will be performing DML on those
> > tables at any given time, then what you have is safe. But if that's the
> > case, I'm thinking you should be able to group things into chunks, which
> > should be more efficient.
>
> Yea, I wish I could, but I really need to do one at a time because of
> how radius waits for a successful query before telling the access server
> all is well.  If the query fails, the access server won't get the 'OK'
> packet and will send the data to the secondary radius system where it
> gets queued.

In that case, the key is to do the absolute smallest amount of work
possible as part of that transaction. Ideally, you would only insert a
record into a queue table somewhere, and then periodically process
records out of that table in batches.
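
As a rough sketch (the table and columns here are made up, not your actual
schema), the per-packet work shrinks to one insert into a queue table, and a
periodic job does the radutmp_tab logic in bulk:

-- hypothetical queue table; the real column list would mirror the
-- accounting fields acctmessage() receives
CREATE TABLE accounting_queue (
    queued_at  timestamptz NOT NULL DEFAULT now(),
    acctType   text NOT NULL,
    sessionId  text NOT NULL,
    userName   text NOT NULL
);

-- this is all that has to happen before radius gets its OK back
INSERT INTO accounting_queue (acctType, sessionId, userName)
VALUES ('stop', 'session', 'username');

-- then a cron job every minute or so applies the queued rows to
-- radutmp_tab in one set-oriented pass and deletes what it processed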

> Do you know of a way to see what is going on with the locking system
> other than "select * from pg_locks"?  I can't ever seem to catch the
> system when queries start to lag.

No. Your best bet is to open two psql sessions and step through things
in different combinations (make sure to do this in transactions).
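
For example, assuming a row for a given session exists (stopTime is just a
made-up column here):

-- session 1
BEGIN;
UPDATE radutmp_tab SET stopTime = now()
 WHERE sessionId = 'session' AND userName = 'username';
-- leave the transaction open

-- session 2: the same UPDATE now blocks until session 1 commits or rolls
-- back; while it's stuck, a third session can look at pg_locks and see the
-- ungranted lock
UPDATE radutmp_tab SET stopTime = now()
 WHERE sessionId = 'session' AND userName = 'username';
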
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Problems with inconsistent query performance.

From
"Marcin Mank"
Date:
> So the question is: why, on a relatively simple proc, am I getting a query
> performance delta between 3549 ms and 7 ms?

What version of PG is it?

I had such problems in a pseudo-realtime app I use here with Postgres, and
they went away when I moved to 8.1 (from 7.4).  I guess it is the better
shared buffer management code (don't you see a
big_query_searching_through_half_the_db just before you get this slow
insert?).

Greetings
Marcin


Re: Problems with inconsistent query performance.

From
Matthew Schumacher
Date:
Marcin Mank wrote:
>> So the question is: why, on a relatively simple proc, am I getting a query
>> performance delta between 3549 ms and 7 ms?
>
> What version of PG is it?
>
> I had such problems in a pseudo-realtime app I use here with Postgres, and
> they went away when I moved to 8.1 (from 7.4). I guess it is better shared
> buffer management code (don't you see a
> big_query_searching_through_half_the_db just before you get this slow
> insert?).
>
> Greetings
> Marcin
>

Marcin,

It is 8.1.4, and there is searching being done on the radutmp_tab all of
the time.  It is relatively small though, only a couple of thousand
rows.  The tmpaccounting_tab table, which also gets inserts, is only used
for inserting; then one large query every night flushes the data to an
indexed table.

What I really need is a way to profile my proc when it runs slow so that
I can resolve which of the queries is really slow.  Anyone with an idea
on how to do this?

schu

Re: Problems with inconsistent query performance.

From
Bill Moran
Date:
In response to Matthew Schumacher <matt.s@aptalaska.net>:
>
> What I really need is a way to profile my proc when it runs slow so that
> I can resolve which of the queries is really slow.  Anyone with an idea
> on how to do this?

You could turn on statement logging and duration logging.  This would
give you a record of when things run and how long they take.  A little
work analyzing should show you which queries are running when your
favorite query slows down.
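
Something along these lines in postgresql.conf would do it (you already seem
to have the 250 ms threshold in place; the exact values are up to you):

log_statement = 'all'              # log every statement as it arrives
log_duration = on                  # log how long each statement took
log_min_duration_statement = 250   # and flag anything slower than 250 ms
log_line_prefix = '%t [%p] '       # timestamp and PID, to correlate sessions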

--
Bill Moran
Collaborative Fusion Inc.

Re: Problems with inconsistent query performance.

From
"Jim C. Nasby"
Date:
On Thu, Sep 28, 2006 at 11:28:43AM -0400, Bill Moran wrote:
> In response to Matthew Schumacher <matt.s@aptalaska.net>:
> >
> > What I really need is a way to profile my proc when it runs slow so that
> > I can resolve which of the queries is really slow.  Anyone with an idea
> > on how to do this?
>
> You could turn on statement logging and duration logging.  This would
> give you a record of when things run and how long they take.  A little
> work analyzing should show you which queries are running when your
> favorite query slows down.

By default, that doesn't help you debug what's happening inside a
function, because you only get the call to the function. I don't know if
you can increase verbosity to combat that.
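
One crude workaround (just a sketch, and only worth doing temporarily) is to
have the function time its own steps with timeofday() and RAISE NOTICE the
results into the log; now() won't work for this because it's frozen for the
whole transaction:

CREATE OR REPLACE FUNCTION timing_demo() RETURNS void AS $$
DECLARE
    _t0      timestamptz;
    _elapsed interval;
BEGIN
    _t0 := timeofday()::timestamptz;
    PERFORM count(*) FROM radutmp_tab;   -- stand-in for the statement being timed
    _elapsed := timeofday()::timestamptz - _t0;
    RAISE NOTICE 'step took %', _elapsed;
END;
$$ LANGUAGE plpgsql;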

Something else to consider is that gettimeofday() on some platforms is
painfully slow, which could completely skew all your numbers.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Problems with inconsistent query performance.

From
Bill Moran
Date:
In response to "Jim C. Nasby" <jim@nasby.net>:

> On Thu, Sep 28, 2006 at 11:28:43AM -0400, Bill Moran wrote:
> > In response to Matthew Schumacher <matt.s@aptalaska.net>:
> > >
> > > What I really need is a way to profile my proc when it runs slow so that
> > > I can resolve which of the queries is really slow.  Anyone with an idea
> > > on how to do this?
> >
> > You could turn on statement logging and duration logging.  This would
> > give you a record of when things run and how long they take.  A little
> > work analyzing should show you which queries are running when your
> > favorite query slows down.
>
> By default, that doesn't help you debug what's happening inside a
> function, because you only get the call to the function. I don't know if
> you can increase verbosity to combat that.

Right, but my point was that he believes another query is interfering
when the target query is slow.  Turning on those logging statements
will:
a) Allow him to identify times when the query is slow.
b) Identify other queries that are running at the same time.

If he sees a pattern (e.g., "my query is always slow if query X5 is
running at the same time"), he'll have a good lead for further research.

--
Bill Moran
Collaborative Fusion Inc.