Re: Any better plan for this query?.. - Mailing list pgsql-performance

From Aidan Van Dyk
Subject Re: Any better plan for this query?..
Date
Msg-id 20090511194615.GT3305@yugib.highrise.ca
Whole thread Raw
In response to Re: Any better plan for this query?..  (Dimitri <dimitrik.fr@gmail.com>)
Responses Re: Any better plan for this query?..
List pgsql-performance
* Dimitri <dimitrik.fr@gmail.com> [090511 11:18]:
> Folks, it's completely crazy, but here is what I found:
>
> - if HISTORY table is analyzed with target 1000 my SELECT response
> time is jumping to 3ms, and the max throughput is limited to 6.000 TPS
> (it's what happenned to 8.3.7)
>
> -if HISTORY table is analyzed with target 5 - my SELECT response time
> is decreasing to 1.2ms (!)  and then my max TPS level is ~12.000 !
> and CPU is used up to 95% even by 8.3.7 :-)  and 8.4 performed better
> just because I left its analyze target to default 100 value.
>
> Anyone may explain me why analyze target may have so huge negative
> secondary effect?..

It's actually pretty straight forward.

The PostgreSQL query planner is a "smart planner".  It takes into
consideration all the statistics available on the columns/tables,
expected outputs based on inputs, etc, to choose what it thinks will be
the best plan.  The more data you have in statistics (the larger
statistics target you have), the more CPU time and longer it's going to
take to "plan" your queries.  The tradeoff is hopefully better plans.

But, in your scenario, where you are hitting the database with the
absolute worst possible way to use PostgreSQL, with small, repeated,
simple queries, you're not getting the advantage  of "better" plans.  In
your case, you're throwing absolutely simple queries at PG as fast as
you can, and for each query, PostgreSQL has to:

1) Parse the given "query string"
2) Given the statistics available, plan the query and pick the best one
3) Actually run the query.

Part 2 is going to dominate the CPU time in your tests, more so the more
statistics it has to evaluate, and unless the data has to come from the
disks (i.e. not in shared buffers or cache) is thus going to dominate the
time before you get your results.  More statistics means more time
needed to do the planning/picking of the query.

If you were to use prepared statements, the cost of #1 and #2 is done
once, and then every time you throw a new execution of the query to
PostgreSQL, you get to just do #3, the easy quick part, especially for
small simple queries where all the data is in shared buffers or the cache.

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

Attachment

pgsql-performance by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Any better plan for this query?..
Next
From: Cory Coager
Date:
Subject: Query planner making bad decisions