Re: reducing number of ANDs speeds up query - Mailing list pgsql-general

From Alban Hertroys
Subject Re: reducing number of ANDs speeds up query
Date
Msg-id CAF-3MvNS94v9+gPFvW=eF8WO9=QsbwAP3pK3PmPuSg+ism5fpA@mail.gmail.com
Whole thread Raw
In response to Re: reducing number of ANDs speeds up query  ("T. E. Lawrence" <t.e.lawrence@icloud.com>)
Responses Re: reducing number of ANDs speeds up query RESOLVED  ("T. E. Lawrence" <t.e.lawrence@icloud.com>)
List pgsql-general
On 12 January 2013 12:41, T. E. Lawrence <t.e.lawrence@icloud.com> wrote:
Hi and thank you for your notes!

> You really ought to include the output of EXPLAIN ANALYZE in cases such as these (if it doesn't already point you to the culprit).

I'll do so, it takes quite long...

> Most likely you'll find that the last condition added a sequential scan to the query plan,

Exactly! EXPLAIN says so.

> which can have several causes/reasons. Are the estimated #rows close to the actual #rows?

Yes, this is the problem. I read that in such cases indexes are not read. However if the previous conditions are executed first, the result is zero or just a few rows and there is no need seq scan the whole values column.

You mean they don't match, do you?

The database doesn't know what you know and its making the wrong decision based on incorrect data.

The database won't use an index if it thinks that there aren't many rows to check against a condition. Most likely (the results from explain analyze would tell) the database thinks there are much fewer rows in table b than there actually are.

You'll probably want to read about database maintenance for Postgres and how to keep its statistics up to date. Autovacuum may need some tuning or you need to run manual VACUUM more frequently.
In fact, run VACUUM now and see if the problem goes away.

You'll usually also want to run VACUUM after a large batch job.
 
> Meanwhile, it looks like most of your AND's are involved in joining tables a and b. Perhaps it helps to use an explicit join instead of an implicit one?

I am not quite sure what this means, but will read about it.

You're currently using implicit joins by combining your join conditions in the WHERE clause of your query, like this:
SELECT *
FROM a, b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND b.value = 'yadayada';

You can also explicitly put your join conditions with the joins, like so:
SELECT *
FROM a INNER JOIN b ON (a.col1 = b.col1 AND a.col2 = b.col2)
WHERE b.value = 'yadayada';

You explicitly tell the database that those are the conditions to be joined on and that the remaining conditions are filters on the result set. With just two tables the need for such isn't that obvious, but with more tables it quickly becomes difficult to see what condition in an implicit join is part of the joins and which is the result set filter. With explicit joins that's much clearer.
It wouldn't be the first time that I rewrite a query to use explicit joins, only to find that the original query was incorrect.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

pgsql-general by date:

Previous
From: Tony Theodore
Date:
Subject: Re: reducing number of ANDs speeds up query
Next
From: Jeff Janes
Date:
Subject: Re: changes "during checkpointing"