Thread: prepared statements suboptimal?

prepared statements suboptimal?

From
rihad
Date:
Hi, I'm planning to use prepared statements of indefinite lifetime in a
daemon that will execute same statements rather frequently in reply to
client requests.

This link:
http://www.postgresql.org/docs/8.3/static/sql-prepare.html
has a note on performance:

In some situations, the query plan produced for a prepared statement
will be inferior to the query plan that would have been chosen if the
statement had been submitted and executed normally. This is because when
the statement is planned and the planner attempts to determine the
optimal query plan, the actual values of any parameters specified in the
statement are unavailable. PostgreSQL collects statistics on the
distribution of data in the table, and can use constant values in a
statement to make guesses about the likely result of executing the
statement. Since this data is unavailable when planning prepared
statements with parameters, the chosen plan might be suboptimal.

I don't understand why postgres couldn't plan this:
SELECT foo.e, foo.f
FROM foo
WHERE pk=$1 AND b=$2 AND status='1' AND c <= $3;

to be later executed any slower than

SELECT foo.e, foo.f
FROM foo
WHERE pk='abcabc' AND b='13' AND status='1' AND c <= '2007-11-20 13:14:15';

Can I help it make more educated guesses? In what scenarios could
prepared statements turn around and bite me, being slower than simple
queries? Is this a real problem in practice? Should I "refresh" prepared
statements from time to time? If so, how? Only by deallocating them and
preparing anew? Any knob to tweak for that?

Okay, enough questions :)

Thank you for any insights.
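For reference, the "deallocate and prepare anew" cycle asked about above looks like this at the SQL level (a sketch, using the hypothetical table and columns from the example query):

```sql
-- The plan is built at PREPARE time, before any parameter values are known.
PREPARE get_foo (text, text, timestamp) AS
    SELECT foo.e, foo.f
    FROM foo
    WHERE pk = $1 AND b = $2 AND status = '1' AND c <= $3;

EXECUTE get_foo('abcabc', '13', '2007-11-20 13:14:15');

-- "Refreshing" means dropping the statement and preparing it again;
-- there is no knob to replan an existing prepared statement in place.
DEALLOCATE get_foo;
PREPARE get_foo (text, text, timestamp) AS
    SELECT foo.e, foo.f
    FROM foo
    WHERE pk = $1 AND b = $2 AND status = '1' AND c <= $3;
```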

Re: prepared statements suboptimal?

From
Reg Me Please
Date:
On Wednesday 07 November 2007 12:25:46, rihad wrote:
> I don't understand why postgres couldn't plan this:
> SELECT foo.e, foo.f
> FROM foo
> WHERE pk=$1 AND b=$2 AND status='1' AND c <= $3;
>
> to be later executed any slower than
>
> SELECT foo.e, foo.f
> FROM foo
> WHERE pk='abcabc' AND b='13' AND status='1' AND c <= '2007-11-20 13:14:15';

One explanation is that the planner does its work when it first sees the
query. At that time the comparisons are all against unknown values.

Try the same with dynamic SQL, that is, build the query string dynamically
by placing the current values where the $1, $2 and $3 placeholders are.
In that case the planner will see the query with the actual values.
Then execute it and compare the timings. You may get better numbers, but
there is no guarantee of better performance, because you will be sending
the whole query to the planner again and again.

Of course you need a plpgsql function for this.
--
Reg me Please
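A minimal sketch of such a function, assuming the table and columns from the original query (on 8.3, dynamic SQL that returns rows is run with FOR ... IN EXECUTE; quote_literal() keeps the interpolated values safe):

```sql
CREATE OR REPLACE FUNCTION get_foo(p_pk text, p_b text, p_c timestamp)
RETURNS SETOF foo AS $$
DECLARE
    r foo%ROWTYPE;
BEGIN
    -- The query string is built with the real values, so the planner
    -- sees them -- at the cost of planning on every call.
    FOR r IN EXECUTE
        'SELECT * FROM foo WHERE pk = ' || quote_literal(p_pk)
        || ' AND b = ' || quote_literal(p_b)
        || ' AND status = ''1'' AND c <= ' || quote_literal(p_c)
    LOOP
        RETURN NEXT r;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```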

Re: prepared statements suboptimal?

From
rihad
Date:
rihad wrote:
> Hi, I'm planning to use prepared statements of indefinite lifetime in a
> daemon that will execute same statements rather frequently in reply to
> client requests.
>
> This link:
> http://www.postgresql.org/docs/8.3/static/sql-prepare.html
> has a note on performance:
>
> In some situations, the query plan produced for a prepared statement
> will be inferior to the query plan that would have been chosen if the
> statement had been submitted and executed normally. This is because when
> the statement is planned and the planner attempts to determine the
> optimal query plan, the actual values of any parameters specified in the
> statement are unavailable. PostgreSQL collects statistics on the
> distribution of data in the table, and can use constant values in a
> statement to make guesses about the likely result of executing the
> statement. Since this data is unavailable when planning prepared
> statements with parameters, the chosen plan might be suboptimal.
>
> I don't understand why postgres couldn't plan this:
> SELECT foo.e, foo.f
> FROM foo
> WHERE pk=$1 AND b=$2 AND status='1' AND c <= $3;
>
> to be later executed any slower than
>
> SELECT foo.e, foo.f
> FROM foo
> WHERE pk='abcabc' AND b='13' AND status='1' AND c <= '2007-11-20 13:14:15';
>
> Can I help it make more educated guesses? In what scenarios could
> prepared statements turn around and bite me, being slower than simple
> queries? Is this a real problem in practice? Should I "refresh" prepared
> statements from time to time? If so, how? Only by deallocating them and
> preparing anew? Any knob to tweak for that?
>
> Okay, enough questions :)
>
> Thank you for any insights.
>


From http://www.postgresql.org/docs/8.3/static/protocol-flow.html I
just read that "This possible penalty is avoided when using the unnamed
statement, since it is not planned until actual parameter values are
available."

Since I'm using Perl's DBI/DBD::Pg, in postmaster's logs I can see that
DBI's prepare() seems to be using named prepared statements:

Nov  7 15:57:46 sol postgres[1685]: [2-1] LOG:  execute dbdpg_1:
Nov  7 15:57:46 sol postgres[1685]: [2-2]               SELECT
...
Is there any way to tell it to use unnamed prepared statements? I
understand this is not strictly a PostgreSQL question, so sorry if I'm
off topic.
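One lever that does exist in DBD::Pg is the pg_server_prepare attribute: setting it to 0 makes DBD::Pg quote and interpolate the bind values client-side, so the server plans each query with the real values. (This skips server-side preparation entirely rather than switching to the unnamed statement, but the planning effect is the same.) A sketch, with connection details hypothetical:

```perl
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=mydb', $user, $pass);

# For all statements on this handle:
$dbh->{pg_server_prepare} = 0;

# Or per statement:
my $sth = $dbh->prepare(
    'SELECT e, f FROM foo WHERE pk = ? AND b = ? AND c <= ?',
    { pg_server_prepare => 0 }
);
$sth->execute('abcabc', '13', '2007-11-20 13:14:15');
```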

Re: prepared statements suboptimal?

From
"Albe Laurenz"
Date:
rihad wrote:
> Hi, I'm planning to use prepared statements of indefinite
> lifetime in a daemon that will execute same statements
> rather frequently in reply to client requests.
>
> This link:
> http://www.postgresql.org/docs/8.3/static/sql-prepare.html
> has a note on performance:
>
> In some situations, the query plan produced for a prepared statement
> will be inferior to the query plan that would have been chosen if the
> statement had been submitted and executed normally. [...]
>
> I don't understand why postgres couldn't plan this:
> SELECT foo.e, foo.f
> FROM foo
> WHERE pk=$1 AND b=$2 AND status='1' AND c <= $3;
>
> to be later executed any slower than
>
> SELECT foo.e, foo.f
> FROM foo
> WHERE pk='abcabc' AND b='13' AND status='1' AND c <=
> '2007-11-20 13:14:15';

For example, if the table contains almost no rows in the
beginning, the planner will choose a full table scan
even if - say - 'pk' is the primary key.

If you use the same execution plan later when the table is big,
the full table scan will hurt considerably, and you would
be much better off with an index lookup.

Other scenarios are certainly conceivable, but this one is
easy to understand.
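As a sketch of that scenario (hypothetical table, 8.3 syntax):

```sql
CREATE TABLE foo (pk text PRIMARY KEY, e int, f int);

-- With the table empty, a sequential scan is the cheapest plan,
-- and it gets baked into the prepared statement:
PREPARE q (text) AS SELECT e, f FROM foo WHERE pk = $1;

-- ... a few million rows are inserted and ANALYZE runs ...

-- EXECUTE q('abcabc') may still use the cached seqscan plan,
-- while a freshly planned query would use the primary key index.
-- The only fix is to drop the statement and prepare it again:
DEALLOCATE q;
PREPARE q (text) AS SELECT e, f FROM foo WHERE pk = $1;
```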

> Can I help it make more educated guesses? In what scenarios could
> prepared statements turn around and bite me, being slower than simple
> queries? Is this a real problem in practice? Should I
> "refresh" prepared statements from time to time? If so, how? Only by
> deallocating them and preparing anew? Any knob to tweak for that?

You'll probably have to deallocate them and allocate them anew.

Yours,
Laurenz Albe

Re: prepared statements suboptimal?

From
Tom Lane
Date:
rihad <rihad@mail.ru> writes:
> I don't understand why postgres couldn't plan this:
> SELECT foo.e, foo.f
> FROM foo
> WHERE pk=$1 AND b=$2 AND status='1' AND c <= $3;

> to be later executed any slower than

> SELECT foo.e, foo.f
> FROM foo
> WHERE pk='abcabc' AND b='13' AND status='1' AND c <= '2007-11-20 13:14:15';

The reason is that without knowing the parameter values, the planner has
to pick a "generic" plan that will hopefully not be too awful regardless
of what the actual values end up being.  When it has the actual values
it can make much tighter estimates of the number of matching rows, and
possibly choose a much better but special-purpose plan.  As an example,
if the available indexes are on b and c then the best query plan for the
first case is probably bitmap indexscan on b.  But in the second case,
the planner might be able to determine (by consulting the ANALYZE stats)
that there are many rows matching b='13' but very few rows with c <=
'2007-11-20 13:14:15', so for those specific parameter values an
indexscan on c would be better.  It would be folly to choose that as the
generic plan, though, since on the average a one-sided inequality on c
could be expected to not be very selective at all.

            regards, tom lane
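The two plans described here can be inspected with EXPLAIN, assuming the hypothetical indexes on b and c:

```sql
CREATE INDEX foo_b_idx ON foo (b);
CREATE INDEX foo_c_idx ON foo (c);

-- The generic plan chosen at PREPARE time:
PREPARE q (text, timestamp) AS
    SELECT e, f FROM foo WHERE b = $1 AND c <= $2;
EXPLAIN EXECUTE q('13', '2007-11-20 13:14:15');
-- likely a Bitmap Index Scan on foo_b_idx

-- The same query planned with literal values:
EXPLAIN SELECT e, f FROM foo
WHERE b = '13' AND c <= '2007-11-20 13:14:15';
-- may switch to an Index Scan on foo_c_idx if the stats say
-- few rows satisfy the inequality
```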

Re: prepared statements suboptimal?

From
rihad
Date:
Tom Lane wrote:
> rihad <rihad@mail.ru> writes:
>> I don't understand why postgres couldn't plan this:
>> SELECT foo.e, foo.f
>> FROM foo
>> WHERE pk=$1 AND b=$2 AND status='1' AND c <= $3;
>
>> to be later executed any slower than
>
>> SELECT foo.e, foo.f
>> FROM foo
>> WHERE pk='abcabc' AND b='13' AND status='1' AND c <= '2007-11-20 13:14:15';
>
> The reason is that without knowing the parameter values, the planner has
> to pick a "generic" plan that will hopefully not be too awful regardless
> of what the actual values end up being.  When it has the actual values
> it can make much tighter estimates of the number of matching rows, and
> possibly choose a much better but special-purpose plan.  As an example,
> if the available indexes are on b and c then the best query plan for the
> first case is probably bitmap indexscan on b.  But in the second case,
> the planner might be able to determine (by consulting the ANALYZE stats)
> that there are many rows matching b='13' but very few rows with c <=
> '2007-11-20 13:14:15', so for those specific parameter values an
> indexscan on c would be better.  It would be folly to choose that as the
> generic plan, though, since on the average a one-sided inequality on c
> could be expected to not be very selective at all.
>
Aha, thanks for the thorough explanation. Now I understand that when
looking for a way to fulfill the query, postgres will try hard to pick
the plan requiring the fewest row visits. I've skimmed over my
queries: almost all of them use the primary key as the first condition
in the WHERE clause (say, a username, which is the only pk in the
table): shouldn't that be enough for postgres to *always* decide to scan
the pk's index (since a query on a pk always returns either one or zero
results)?

Same question for any number of joins where bar.id or baz.id is always
a PK:

select ... from foo JOIN bar ON(foo.bar_id=bar.id) JOIN baz
ON(foo.baz_id=baz.id) WHERE asd=? AND dsa=?;


Re: prepared statements suboptimal?

From
Tom Lane
Date:
rihad <rihad@mail.ru> writes:
> Aha, thanks for a thorough explanation. Now I understand that while
> looking for a way to fulfill the query postgres will try hard to pick
> the one requiring the least number of rows visits. I've skimmed over my
> queries: almost all of them make use of the primary key as the first
> thing in the WHERE clause (say, a username, which is the only pk in the
> table): shouldn't that be enough for postgres to *always* decide to scan
> the pk's index (since a query on a pk always returns either one or zero
> results)?

Yeah, if there's always a PK equality constraint then the dependence on
specific parameter values is much weaker, so you could probably use a
prepared statement without worrying.  The cases where prepared
statements tend to suck usually involve either inequalities, or
equalities on non-unique columns where the number of matches varies
wildly for different data values.  In cases like that, knowing the exact
value being compared to makes a very large difference in the rowcount
estimate.

            regards, tom lane