Re: Add min and max execute statement time in pg_stat_statement - Mailing list pgsql-hackers

From KONDO Mitsumasa
Subject Re: Add min and max execute statement time in pg_stat_statement
Date
Msg-id 52E9D98A.4000007@lab.ntt.co.jp
Whole thread Raw
In response to Re: Add min and max execute statement time in pg_stat_statement  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Add min and max execute statement time in pg_stat_statement
List pgsql-hackers
(2014/01/29 16:58), Peter Geoghegan wrote:
> On Tue, Jan 28, 2014 at 10:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp> writes:
>>> By the way, latest pg_stat_statement might affect performance in Windows system.
>>> Because it uses fflush() system call every creating new entry in
>>> pg_stat_statements, and it calls many fread() to warm file cache.
>>
>> This statement doesn't seem to have much to do with the patch as
>> committed.
>
> You could make a strong case for the patch improving performance in
> practice for many users, by allowing us to increase the
> pg_stat_statements.max default value to 5,000, 5 times the previous
> value. The real expense of creating a new entry is the exclusive
> locking of the hash table, which blocks *everything* in
> pg_stat_statements. That isn't expanded at all, since queries are only
> written with a shared lock, which only blocks the creation of new
> entries which was already relatively expensive. In particular, it does
> not block the maintenance of costs for all already observed entries in
> the hash table. It's obvious that simply reducing the pressure on the
> cache will improve matters a lot, which for many users the external
> texts patch does.
>
> Since Mitsumasa has compared that patch and at least one of his
> proposed pg_stat_statements patches on several occasions, I will
> re-iterate the difference: any increased overhead from the external
> texts patch is paid only when a query is first entered into the hash
> table. Then, there is obviously and self-evidently no additional
> overhead from contention for any future execution of the same query,
> no matter what, indefinitely (the exclusive locking section of
> creating a new entry does not do I/O, except in fantastically unlikely
> circumstances). So for many of the busy production systems I work
> with, that's the difference between a cost paid perhaps tens of
> thousands of times a second, and a cost paid every few days or weeks.
> Even if he is right about a write() taking an unreasonably large
> amount of time on Windows, the consequences felt as pg_stat_statements
> LWLock contention would be very limited.
>
> I am not opposed in principle to adding new things to the counters
> struct in pg_stat_statements. I just think that the fact that the
> overhead of installing the module on a busy production system is
> currently so low is of *major* value, and therefore any person that
> proposes to expand that struct should be required to very conclusively
> demonstrate that there is no appreciably increase in overhead. Having
> a standard deviation column would be nice, but it's still not that
> important. Maybe when we have portable atomic addition we can afford
> to be much more inclusive of that kind of thing.

I'd like to know the truth and the fact in your patch, rather than your argument:-)

So I create detail pg_stat_statements benchmark tool using pgbench.
This tool can create 10000 pattern unique sqls in a file, and it is only
for measuring pg_stat_statements performance. Because it only updates
pg_stat_statements data and doesn't write to disk at all. It's fair benchmark.

[usage]
perl create_sql.pl >file.sql
psql -h -h xxx.xxx.xxx.xxx mitsu-ko -c 'SELECT pg_stat_statements_reset()'
pgbench -h xxx.xxx.xxx.xxx mitsu-ko -c64 -j32 -n -T 180 -f file.sql

[part of sample sqls file, they are executed in pgbench]
SELECT
1-1/9+5*8*6+5+9-6-3-4/9%7-2%7/5-9/7+2+9/7-1%5%9/3-4/4-9/1+5+5/6/5%2+1*2*3-8/8+5-3-8-4/8+5%2*2-2-5%6+4
SELECT
1%9%7-8/5%6-1%2*2-7+9*6-2*6-9+1-2*9+6+7*8-4-2*1%3/7-1%4%9+4+7/5+4/2-3-5%8/3/7*6-1%8*6*1/7/2%5*6/2-3-9
SELECT
1%3*2/8%6%5-3%1+1/6*6*5/9-2*5+6/6*5-1/2-6%4%8/7%2*7%5%9%5/9%1%1-3-9/2*1+1*6%8/2*4/1+6*7*1+5%6+9*1-9*6
...

I test some methods that include old pgss, old pgss with my patch, and new pgss.
Test server and postgresql.conf are same in I tested last day in this ML-thread.
And test methods and test results are here,

[methods]
method 1: with old pgss, pg_stat_statements.max=1000(default)
method 2: with old pgss with my patch, pg_stat_statements.max=1000(default)
peter 1 : with new pgss(Peter's patch), pg_stat_statements.max=5000(default)
peter 2 : with new pgss(Peter's patch), pg_stat_statements.max=1000

[for reference]
method 5:with old pgss, pg_stat_statements.max=5000
method 6:with old pgss with my patch, pg_stat_statements.max=5000

[results]
  method   |  try1  |  try2  |  try3  | degrade performance ratio
-----------------------------------------------------------------
  method 1 | 6.546  | 6.558  | 6.638  | 0% (reference score)
  method 2 | 6.527  | 6.556  | 6.574  | 1%
  peter 1  | 5.204  | 5.203  | 5.216  |20%
  peter 2  | 4.241  | 4.236  | 4.262  |35%

  method 5 | 5.931  | 5.848  | 5.872  |11%
  method 6 | 5.794  | 5.792  | 5.776  |12%


New pgss is extremely degrade performance than old pgss, and I cannot see
performance scaling.
I understand that his argument was "My patch reduces time of LWLock in
pg_stat_statements,
it is good for performance. However, Kondo's patch will be degrade performance in
pg_stat_statements. so I give it -1.". But unless seeing this test results, his
claim seems
to contradict it even in Linux OS.

I think that his patch's strong point is only storing long query, at the sacrifice of
significant performance. Does it hope in community? At least, I don't hope it.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

Attachment

pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: updated emacs configuration
Next
From: KONDO Mitsumasa
Date:
Subject: Re: Add min and max execute statement time in pg_stat_statement