Re: Any better plan for this query?.. - Mailing list pgsql-performance

From Dimitri
Subject Re: Any better plan for this query?..
Date
Msg-id 5482c80a0905120319s36699564iaa4216f4ce011f2f@mail.gmail.com
Whole thread Raw
In response to Re: Any better plan for this query?..  (Dimitri Fontaine <dfontaine@hi-media.com>)
Responses Re: Any better plan for this query?..
Re: Any better plan for this query?..
Re: Any better plan for this query?..
Re: Any better plan for this query?..
List pgsql-performance
Folks, before you start to think "what a dumb guy doing a dumb thing" :-))
I'll explain you few details:

it's for more than 10 years I'm using a db_STRESS kit
(http://dimitrik.free.fr/db_STRESS.html) to check databases
performance and scalability. Until now I was very happy with results
it gave me as it stress very well each database engine internals an
put on light some things I should probably skip on other workloads.
What do you want, with a time the "fast" query executed before in
500ms now runs within 1-2ms  - not only hardware was improved but also
database engines increased their performance a lot! :-))

In 2007 I've published the first public results with PostgreSQL, and
it was 2 times faster on that time comparing to MySQL
(http://dimitrik.free.fr/db_STRESS_BMK_Part1.html)

Last month for the launching of MySQL 5.4 I've done a long series of
tests and at the end for my curiosity I've executed the same load
against PostgreSQL 8.3.7 to see if MySQL is more close now. For my big
surprise, MySQL was faster! As well observations on PG processing
bring me a lot of questions - I supposed something was abnormal on PG
side, but I did not have too much time to understand what it was
exactly (http://dimitrik.free.fr/db_STRESS_MySQL_540_and_others_Apr2009.html#note_5443)

What I'm trying to do now is to understand what exactly is the problem.

What I discovered so far with all your help:
  - the impact of a planner
  - the impact of the analyze target
  - the impact of prepare / execute
  - scalability limit on 32 cores

I'll also try to adapt prepare/execute solution to see how much it
improves performance and/or scalability.

As well helping from the other thread I was able to improve a lot the
TPS stability on read+write workload! :-)

Any other comments are welcome!

Rgds,
-Dimitri

On 5/12/09, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
> Hi,
>
> Dimitri <dimitrik.fr@gmail.com> writes:
>
>>>> So, why I don't use prepare here: let's say I'm testing the worst
>>>> stress case :-)  Imagine you have thousands of such kind of queries -
>>>> you cannot prepare all of them! :-)
>>>
>>> Thousands?  Surely there'll be a dozen or three of most common queries,
>>> to which you pass different parameters.  You can prepare thoseu
>>
>> Ok, and if each client just connect to the database, execute each kind
>> of query just *once* and then disconnect?..  - cost of prepare will
>> kill performance here if it's not reused at least 10 times within the
>> same session.
>
> In a scenario which looks like this one, what I'm doing is using
> pgbouncer transaction pooling. Now a new connection from client can be
> served by an existing backend, which already has prepared your
> statement.
>
> So you first SELECT name FROM pg_prepared_statements; to know if you
> have to PREPARE or just EXECUTE, and you not only maintain much less
> running backends, lower fork() calls, but also benefit fully from
> preparing the statements even when you EXECUTE once per client
> connection.
>
>> Well, I know, we always can do better, and even use stored procedures,
>> etc. etc.
>
> Plain SQL stored procedure will prevent PostgreSQL to prepare your
> queries, only PLpgSQL functions will force transparent plan caching. But
> calling this PL will cost about 1ms per call in my tests, so it's not a
> good solution.
>
> It's possible to go as far as providing your own PostgreSQL C module
> where you PREPARE at _PG_init() time and EXECUTE in a SQL callable
> function, coupled with pgbouncer it should max out the perfs. But maybe
> you're not willing to go this far.
>
> Anyway, is hammering the server with always the same query your real
> need or just a simplified test-case? If the former, you'll see there are
> good ways to theorically obtain better perfs than what you're currently
> reaching, if the latter I urge you to consider some better benchmarking
> tools, such as playr or tsung.
>
>   https://area51.myyearbook.com/trac.cgi/wiki/Playr
>   http://tsung.erlang-projects.org/
>   http://pgfouine.projects.postgresql.org/tsung.html
>   http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php
>
> Regards,
> --
> dim
>

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Timestamp index not used in some cases
Next
From: Simon Riggs
Date:
Subject: Re: Any better plan for this query?..