Thread: performance inconsistency

performance inconsistency

From
Phuong Ma
Date:
Good day,

We're running a week-old CVS snapshot of PostgreSQL 7.1, and I'm not
sure
if this performance inconsistency is specific to it, or if this is just
something in PostgreSQL in general, but it seems kind of odd, and I
could
use some help here. ;)

I have run two queries in a table full of invoices where the only
difference
between the queries is the where clause:

SELECT COUNT(DISTINCT l."ORDER_NUM") AS line_orders,
        SUM(l."GROSS_PROFIT" + "REPLACEMENT_COST") AS grand_total,
        SUM(l."LIST_PRICE" * l."SHIPPED_QUANTITY") AS
grand_total_alternative,
        SUM(l."GROSS_PROFIT") AS grand_total_profit
 FROM   cu_invoice_li1_tbl l INNER JOIN cu_invoice_03_tbl h
   ON   l."ORDER_NUM"=h."ORDER_NUM"

WHERE substr("ORDER_NUM", 1, 1) != 'W'; # (Orders NOT prefixed with W.)

OR...

WHERE substr("ORDER_NUM", 1, 1) = 'I';  # (Orders prefixed with I.)

The first query returns more rows than the second (because it includes
orders prefixed with I, C, F, etc), whereas the second only returns
orders
prefixed with I. It seems that the logic of identifying a single
character
as *not* being a single value would be identical or at least similar in
efficiency to the logic of a single character *being* being a single
value.

However, the first query runs in about 10-15 seconds, and the second
query
ran for over 40 minutes before I cancelled it.

Now, I can easily use a series of "!=" statements to get it down so that
it's only the 'I' orders (which runs even faster, at about 5-6
seconds!),
but can anyone explain this to me? :)

Are "!=" substring evaluations inherently faster by an obscene order of
magnitude, or is something really wrong here?



J.

Re: performance inconsistency

From
Tom Lane
Date:
Phuong Ma <pma@commandprompt.com> writes:
> WHERE substr("ORDER_NUM", 1, 1) != 'W'; # (Orders NOT prefixed with W.)

> OR...

> WHERE substr("ORDER_NUM", 1, 1) = 'I';  # (Orders prefixed with I.)

> However, the first query runs in about 10-15 seconds, and the second
> query
> ran for over 40 minutes before I cancelled it.

What query plans does EXPLAIN show for each case?  (And have you done
VACUUM ANALYZE lately?)

The change in the WHERE clause may be altering the planner's selectivity
guesstimates enough to cause choice of a less appropriate plan.  Hard to
tell without seeing what EXPLAIN has to say, though.

            regards, tom lane