Thread: How expensive are row level statistics?

How expensive are row level statistics?

From: Carlos Benkendorf

Hi,
 
I would like to use autovacuum, but isn't collecting row-level statistics too expensive?
 
Are there some numbers that I could use?
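 
For reference, these are the settings I believe are involved, if I am
reading the 8.1 documentation correctly:

    # postgresql.conf (8.1) -- as I understand it, autovacuum needs these
    stats_start_collector = on   # collector must be running
    stats_row_level       = on   # autovacuum relies on per-row counts
    autovacuum            = on   # enable the integrated autovacuum daemon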
 
Thanks in advance!
 
Benkendorf



Re: How expensive are row level statistics?

From: Michael Fuhr

On Sun, Dec 11, 2005 at 11:53:36AM +0000, Carlos Benkendorf wrote:
> I would like to use autovacuum, but isn't collecting row-level
> statistics too expensive?

The cost depends on your usage patterns.  I did tests with one of
my applications and saw no significant performance difference for
simple selects, but a series of insert/update/delete operations ran
about 30% slower when block- and row-level statistics were enabled
versus when the statistics collector was disabled.
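
For what it's worth, the two configurations I compared were essentially
these (collector fully off versus block- and row-level stats on):

    # run 1: statistics collector disabled
    stats_start_collector = off

    # run 2: block- and row-level statistics enabled
    stats_start_collector = on
    stats_block_level     = on
    stats_row_level       = on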

--
Michael Fuhr

Re: How expensive are row level statistics?

From: Alvaro Herrera

Michael Fuhr wrote:
> On Sun, Dec 11, 2005 at 11:53:36AM +0000, Carlos Benkendorf wrote:
> > I would like to use autovacuum, but isn't collecting row-level
> > statistics too expensive?
>
> The cost depends on your usage patterns.  I did tests with one of
> my applications and saw no significant performance difference for
> simple selects, but a series of insert/update/delete operations ran
> about 30% slower when block- and row-level statistics were enabled
> versus when the statistics collector was disabled.

This series of i/u/d operations ran with no sleep in between, right?
I wouldn't expect a normal OLTP workload to look like this.  (If it
does, you have a serious shortage of hardware ...)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: How expensive are row level statistics?

From: "Merlin Moncure"

>
> On Sun, Dec 11, 2005 at 11:53:36AM +0000, Carlos Benkendorf wrote:
> > I would like to use autovacuum, but isn't collecting row-level
> > statistics too expensive?
>
> The cost depends on your usage patterns.  I did tests with one of
> my applications and saw no significant performance difference for
> simple selects, but a series of insert/update/delete operations ran
> about 30% slower when block- and row-level statistics were enabled
> versus when the statistics collector was disabled.

That approximately confirms my results, except that the penalty may even
be a little bit higher in the worst-case scenario.  Row-level stats hit
hardest if you are doing one-row-at-a-time operations over a persistent
connection.  Since my apps inherited this behavior from their COBOL
legacy, I keep the stats off.  If your app follows the monolithic query
approach to problem solving (pull lots of rows in, edit them on the
client, and send them back), the penalty is basically zero.
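
A rough sketch of the difference (table and values made up):

    -- one-row-at-a-time over a persistent connection (the expensive pattern):
    UPDATE orders SET status = 'shipped' WHERE order_id = 1001;
    UPDATE orders SET status = 'shipped' WHERE order_id = 1002;
    -- ...repeated a few thousand times...

    -- monolithic: pull rows to the client, edit there, push one batch back:
    UPDATE orders SET status = 'shipped'
     WHERE order_id IN (1001, 1002, 1003);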

Merlin


Re: How expensive are row level statistics?

From: Michael Fuhr

On Mon, Dec 12, 2005 at 01:33:27PM -0500, Merlin Moncure wrote:
> > The cost depends on your usage patterns.  I did tests with one of
> > my applications and saw no significant performance difference for
> > simple selects, but a series of insert/update/delete operations ran
> > about 30% slower when block- and row-level statistics were enabled
> > versus when the statistics collector was disabled.
>
> That approximately confirms my results, except that the penalty may even
> be a little bit higher in the worst-case scenario.  Row-level stats hit
> hardest if you are doing one-row-at-a-time operations over a persistent
> connection.

That's basically how the application I tested works: it receives
data from a stream and performs whatever insert/update/delete
statements are necessary to update the database for each chunk of
data.  Repeat a few thousand times.

--
Michael Fuhr

Re: How expensive are row level statistics?

From: Michael Fuhr

On Mon, Dec 12, 2005 at 10:23:42AM -0300, Alvaro Herrera wrote:
> Michael Fuhr wrote:
> > The cost depends on your usage patterns.  I did tests with one of
> > my applications and saw no significant performance difference for
> > simple selects, but a series of insert/update/delete operations ran
> > about 30% slower when block- and row-level statistics were enabled
> > versus when the statistics collector was disabled.
>
> This series of i/u/d operations ran with no sleep in between, right?
> I wouldn't expect a normal OLTP workload to look like this.  (If it
> does, you have a serious shortage of hardware ...)

There's no sleeping but there is some client-side processing between
groups of i/u/d operations.  As I mentioned in another message, the
application reads a chunk of data from a stream, does a few i/u/d
operations to update the database, and repeats several thousand times.

The hardware is old but it's adequate for this application.  What
kind of overhead would you expect?

--
Michael Fuhr

Re: How expensive are row level statistics?

From: Tom Lane

"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
>> The cost depends on your usage patterns.  I did tests with one of
>> my applications and saw no significant performance difference for
>> simple selects, but a series of insert/update/delete operations ran
>> about 30% slower when block- and row-level statistics were enabled
>> versus when the statistics collector was disabled.

> That approximately confirms my results, except that the penalty may even
> be a little bit higher in the worst-case scenario.  Row-level stats hit
> hardest if you are doing one-row-at-a-time operations over a persistent
> connection.

IIRC, the only significant cost from enabling stats is the cost of
transmitting the counts to the stats collector, which is a cost
basically paid once at each transaction commit.  So short transactions
will definitely have more overhead than longer ones.  Even for a really
simple transaction, though, 30% seems high --- the stats code is
designed deliberately to minimize the penalty.
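
Roughly speaking (made-up table, just to make the point concrete):

    -- autocommit: every statement is its own transaction, so each one
    -- reports its counts to the collector at commit
    INSERT INTO t VALUES (1);
    INSERT INTO t VALUES (2);

    -- single transaction: the counts are accumulated and sent once at COMMIT
    BEGIN;
    INSERT INTO t VALUES (1);
    INSERT INTO t VALUES (2);
    COMMIT;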

            regards, tom lane

Re: How expensive are row level statistics?

From: Michael Fuhr

On Mon, Dec 12, 2005 at 06:01:01PM -0500, Tom Lane wrote:
> IIRC, the only significant cost from enabling stats is the cost of
> transmitting the counts to the stats collector, which is a cost
> basically paid once at each transaction commit.  So short transactions
> will definitely have more overhead than longer ones.  Even for a really
> simple transaction, though, 30% seems high --- the stats code is
> designed deliberately to minimize the penalty.

Now there goes Tom with his skeptical eye again, and here comes me
saying "oops" again.  Further tests show that for this application
the killer is stats_command_string, not stats_block_level or
stats_row_level.  Here are timings for the same set of operations
(thousands of insert, update, and delete statements in one transaction)
run under various settings:

stats_command_string = off
stats_block_level = off
stats_row_level = off
time: 2:09.46

stats_command_string = off
stats_block_level = on
stats_row_level = off
time: 2:12.28

stats_command_string = off
stats_block_level = on
stats_row_level = on
time: 2:14.38

stats_command_string = on
stats_block_level = off
stats_row_level = off
time: 2:50.58

stats_command_string = on
stats_block_level = on
stats_row_level = on
time: 2:53.76
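
(In relative terms that's 170.58s / 129.46s, i.e. about a 32% slowdown
for stats_command_string alone, while block- and row-level stats alone
cost only about 4% here.)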

[Wanders off, swearing that he ran these tests before and saw higher
penalties for block- and row-level statistics.]

--
Michael Fuhr

Re: How expensive are row level statistics?

From: Tom Lane

Michael Fuhr <mike@fuhr.org> writes:
> Further tests show that for this application
> the killer is stats_command_string, not stats_block_level or
> stats_row_level.

I tried it with pgbench -c 10, and got these results:
    41% reduction in TPS rate for stats_command_string
    9% reduction in TPS rate for stats_block/row_level (any combination)

strace'ing a backend confirms my belief that stats_block/row_level send
just one stats message per transaction (at least for the relatively
small number of tables touched per transaction by pgbench).  However
stats_command_string sends 14(!) --- there are seven commands per
pgbench transaction and each results in sending a <command> message and
later an <IDLE> message.
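
(For anyone who wants to reproduce this, something along these lines
will do, toggling the stats settings in postgresql.conf between runs;
the scale factor and transaction count here are arbitrary:)

    createdb bench
    pgbench -i -s 10 bench        # initialize the standard pgbench tables
    pgbench -c 10 -t 1000 bench   # 10 clients; compare the reported tps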

Given the rather lackadaisical way in which the stats collector makes
the data available, it seems like the backends are being much too
enthusiastic about posting their stats_command_string status
immediately.  Might be worth thinking about how to cut back the
overhead by suppressing some of these messages.

            regards, tom lane

Re: How expensive are row level statistics?

From: Michael Fuhr

On Mon, Dec 12, 2005 at 10:20:45PM -0500, Tom Lane wrote:
> Given the rather lackadaisical way in which the stats collector makes
> the data available, it seems like the backends are being much too
> enthusiastic about posting their stats_command_string status
> immediately.  Might be worth thinking about how to cut back the
> overhead by suppressing some of these messages.

Would a GUC setting akin to log_min_duration_statement be feasible?
Does the backend support, or could it be easily modified to support,
a mechanism that would post the command string after a configurable
amount of time had expired, and then continue processing the query?
That way admins could avoid the overhead of posting messages for
short-lived queries that nobody's likely to see in pg_stat_activity
anyway.
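
Something like this is what I have in mind (a hypothetical setting,
invented here; it doesn't exist today):

    # hypothetical: only post the command string for statements that have
    # been running at least this long (milliseconds)
    stats_command_string_min_duration = 250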

--
Michael Fuhr

Re: How expensive are row level statistics?

From: Tom Lane

Michael Fuhr <mike@fuhr.org> writes:
> Does the backend support, or could it be easily modified to support,
> a mechanism that would post the command string after a configurable
> amount of time had expired, and then continue processing the query?

Not really, unless you want to add the overhead of setting a timer
interrupt for every query.  Which is sort of counterproductive when
the motivation is to reduce overhead ...

(It might be more or less free if you have statement_timeout set, since
there would be a setitimer call anyway.  But I don't think that's the
norm.)

            regards, tom lane

Re: How expensive are row level statistics?

From: Kevin Brown

Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > Does the backend support, or could it be easily modified to support,
> > a mechanism that would post the command string after a configurable
> > amount of time had expired, and then continue processing the query?
>
> Not really, unless you want to add the overhead of setting a timer
> interrupt for every query.  Which is sort of counterproductive when
> the motivation is to reduce overhead ...
>
> (It might be more or less free if you have statement_timeout set, since
> there would be a setitimer call anyway.  But I don't think that's the
> norm.)

Actually, it's probably not necessary to set the timer at the
beginning of every query.  It's probably sufficient to have it go
off periodically, e.g. once every second, re-arming it each time it
fires.  The running command wouldn't need to be re-posted if it's the
same as last time around.  Turn off the timer if the connection is
idle now and was also idle last time around (or don't, if there's no
harm in leaving the timer running all the time), and turn it back on
at the start of the next transaction.

In essence, the backend would be "polling" itself every second or so
and recording its state at that time, rather than on every
transaction.

Assuming that doing all that wouldn't screw something else up...
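
A standalone sketch of the idea (not backend code, just the shape of
it: a once-a-second timer sets a flag, and the status is only posted
when the flag is set and the command has changed):

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>
    #include <time.h>

    static volatile sig_atomic_t report_due = 0;

    static void on_alarm(int signo)
    {
        (void) signo;
        report_due = 1;           /* just set a flag; no work in the handler */
    }

    int main(void)
    {
        struct itimerval tick = { {1, 0}, {1, 0} };   /* fire every second */
        const char *commands[] = { "INSERT ...", "INSERT ...", "UPDATE ...", "<IDLE>" };
        const char *last_reported = "";
        struct timespec work = { 1, 0 };              /* stand-in for real work */

        signal(SIGALRM, on_alarm);
        setitimer(ITIMER_REAL, &tick, NULL);

        for (int i = 0; i < 8; i++)
        {
            const char *current = commands[i % 4];

            nanosleep(&work, NULL);                   /* "execute" the command */

            /* post only when the timer has fired and the command changed */
            if (report_due && strcmp(current, last_reported) != 0)
            {
                printf("posting status: %s\n", current);
                last_reported = current;
                report_due = 0;
            }
        }
        return 0;
    }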



--
Kevin Brown                          kevin@sysexperts.com

Re: How expensive are row level statistics?

From: Simon Riggs

On Thu, 2005-12-15 at 19:06 -0500, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > Does the backend support, or could it be easily modified to support,
> > a mechanism that would post the command string after a configurable
> > amount of time had expired, and then continue processing the query?
>
> Not really, unless you want to add the overhead of setting a timer
> interrupt for every query.  Which is sort of counterproductive when
> the motivation is to reduce overhead ...
>
> (It might be more or less free if you have statement_timeout set, since
> there would be a setitimer call anyway.  But I don't think that's the
> norm.)

We could do the deferred send fairly easily. You need only set a timer
when stats_command_string = on, so we'd only do that when requested by
the admin. Overall, that would be a cheaper way of doing it than now.

However, I'm more inclined to the idea of a set of functions that allow
an administrator to retrieve the full SQL text executing in a backend,
with an option to return an EXPLAIN of the currently executing plan.
Right now, stats only gives you the first 1000 chars, so you're always
stuck if it's a big query.  Plus we don't yet have a way of getting the
exact EXPLAIN of a running query (you can get close, but it could
differ).
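
To be concrete, I'm thinking of something along these lines (function
names invented here, nothing like this exists yet):

    -- hypothetical interface, names made up for illustration
    SELECT pg_backend_current_query(12345);  -- full SQL text for backend pid 12345
    SELECT pg_backend_explain(12345);        -- EXPLAIN of its currently running plan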

Pull is better than push. Asking specific backends what they're doing
when you need to know will be efficient; asking them to send their
command strings all of the time, deferred or not, will always be more
wasteful.  Plus, if you forgot to turn on stats_command_string before
execution, you've no way of knowing anyhow.

Best Regards, Simon Riggs




Re: How expensive are row level statistics?

From: "Merlin Moncure"

> Now there goes Tom with his skeptical eye again, and here comes me
> saying "oops" again.  Further tests show that for this application

I made the same mistake, fwiw.  The big hit comes with command_string.
However, row level stats bring a big enough penalty (~10% on my usage)
that I keep them turned off.  The penalty is not just run time either,
but increased CPU time.  It just isn't an essential feature, so unless
it causes near-zero extra load it will stay off on my servers.

Additionally, back when I was testing the win32/pg platform I was
getting random restarts of the stats collector when the server was under
high load and row-level stats were on.  This was a while back, so the
issue may or may not be resolved... it was really nasty because it
cleared out pg_stat_activity, which in turn ruined my admin tools.  I
should probably give that another look.

Merlin

Re: How expensive are row level statistics?

From: Bruce Momjian

Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > Further tests show that for this application
> > the killer is stats_command_string, not stats_block_level or
> > stats_row_level.
>
> I tried it with pgbench -c 10, and got these results:
>     41% reduction in TPS rate for stats_command_string

Whoa, 41%.  That's just off the charts!  What are we doing internally
that would cause that?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073