Thread: reducing number of ANDs speeds up query

reducing number of ANDs speeds up query

From
"T. E. Lawrence"
Date:
Hello,

I have a pretty standard query with two tables:

SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND b.value=...;

With the last "AND b.value=..." the query is extremely slow (did not wait for it to end, but more than a minute),
because the value column is not indexed (contains items longer than 8K).

However the previous conditions "WHERE ... AND ... AND" should have already reduced the candidate rows to just a few
(table_b contains over 50m rows). And indeed, removing the last "AND b.value=..." speeds the query to just a
millisecond.

Is there a way to instruct PostgreSQL to do first the initial "WHERE ... AND ... AND" and then the last "AND
b.value=..." on the (very small) result?

Thank you and kind regards,
T.


Re: reducing number of ANDs speeds up query

From
Amit kapila
Date:
On Saturday, January 12, 2013 7:17 AM T. E. Lawrence wrote:
> Hello,

> I have a pretty standard query with two tables:

> SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND b.value=...;

> With the last "AND b.value=..." the query is extremely slow (did not wait for it to end, but more than a minute),
> because the value column is not indexed (contains items longer than 8K).

> However the previous conditions "WHERE ... AND ... AND" should have already reduced the candidate rows to just a few
> (table_b contains over 50m rows). And indeed, removing the last "AND b.value=..." speeds the query to just a
> millisecond.

> Is there a way to instruct PostgreSQL to do first the initial "WHERE ... AND ... AND" and then the last "AND
> b.value=..." on the (very small) result?

You can try once with the below query:
SELECT * FROM (SELECT a.id, b.value FROM table_a a, table_b b WHERE ... AND ...) X WHERE X.value = ...;

If this doesn't work, can you send the EXPLAIN output for both queries (the query you are using and the query I have
suggested)?


With Regards,
Amit Kapila.

Re: reducing number of ANDs speeds up query

From
Alban Hertroys
Date:
You really ought to include the output of EXPLAIN ANALYZE in cases such as these (if it doesn't already point you to the culprit).

Most likely you'll find that the last condition added a sequential scan to the query plan, which can have several causes/reasons. Are the estimated #rows close to the actual #rows? Is b.value indexed? How selective is the value you're matching it against (is it uncommon or quite common)? Etc, etc.

Meanwhile, it looks like most of your AND's are involved in joining tables a and b. Perhaps it helps to use an explicit join instead of an implicit one?


On 12 January 2013 02:47, T. E. Lawrence <t.e.lawrence@icloud.com> wrote:
Hello,

I have a pretty standard query with two tables:

SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND b.value=...;

With the last "AND b.value=..." the query is extremely slow (did not wait for it to end, but more than a minute), because the value column is not indexed (contains items longer than 8K).

However the previous conditions "WHERE ... AND ... AND" should have already reduced the candidate rows to just a few (table_b contains over 50m rows). And indeed, removing the last "AND b.value=..." speeds the query to just a millisecond.

Is there a way to instruct PostgreSQL to do first the initial "WHERE ... AND ... AND" and then the last "AND b.value=..." on the (very small) result?

Thank you and kind regards,
T.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Re: reducing number of ANDs speeds up query

From
Eduardo Morras
Date:
On Sat, 12 Jan 2013 02:47:26 +0100
"T. E. Lawrence" <t.e.lawrence@icloud.com> wrote:

> Hello,
>
> I have a pretty standard query with two tables:
>
> SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND b.value=...;
>
> With the last "AND b.value=..." the query is extremely slow (did not wait for it to end, but more than a minute),
> because the value column is not indexed (contains items longer than 8K).

You can construct your own home-made index: add a new column in table b with the first 8-16 bytes/chars of b.value,
use this column in your query and refine against the complete b.value. Don't forget to create an index for it too. You can keep
this column updated with a trigger.

Perhaps you can use a partial index for the b.value column; I never used that feature, so the documentation/others can point you
how to do it.
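A sketch of the prefix-column idea (all names here are illustrative, and it assumes b.value is of type text on 9.1 or later, where left() is available):

```sql
-- Add a short prefix column, backfill it, and index it.
ALTER TABLE table_b ADD COLUMN value_prefix text;
UPDATE table_b SET value_prefix = left(value, 16);
CREATE INDEX table_b_value_prefix_idx ON table_b (value_prefix);

-- Keep the prefix in sync with a trigger.
CREATE OR REPLACE FUNCTION sync_value_prefix() RETURNS trigger AS $$
BEGIN
  NEW.value_prefix := left(NEW.value, 16);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER table_b_value_prefix_trg
  BEFORE INSERT OR UPDATE ON table_b
  FOR EACH ROW EXECUTE PROCEDURE sync_value_prefix();

-- Query: hit the index on the prefix, then refine on the full value.
SELECT a.id
FROM table_a a, table_b b
WHERE ... AND b.value_prefix = left('...', 16) AND b.value = '...';
```

On 9.1+ an expression index, CREATE INDEX ON table_b ((left(value, 16))), achieves much the same without the extra column and trigger, as long as the query filters on exactly that expression.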


> However the previous conditions "WHERE ... AND ... AND" should have already reduced the candidate rows to just a few
> (table_b contains over 50m rows). And indeed, removing the last "AND b.value=..." speeds the query to just a
> millisecond.
>
> Is there a way to instruct PostgreSQL to do first the initial "WHERE ... AND ... AND" and then the last "AND
> b.value=..." on the (very small) result?
>
> Thank you and kind regards,
> T.

---   ---
Eduardo Morras <emorrasg@yahoo.es>


Re: reducing number of ANDs speeds up query

From
"T. E. Lawrence"
Date:
On 12.01.2013, at 07:10, Amit kapila <amit.kapila@huawei.com> wrote:
> You can try once with below query:
> Select * from (SELECT a.id,b.value FROM table_a a, table_b b WHERE ... AND ... ) X where X.value=...;
>
> If this doesn't work, can you send the EXPLAIN output for both queries (the query you are using and the query I have
> suggested)?
>
> With Regards,
> Amit Kapila.

Hi and thank you!

I will try this and let you know!

T.


Re: reducing number of ANDs speeds up query

From
"T. E. Lawrence"
Date:
Hi and thank you for your notes!

> You really ought to include the output of EXPLAIN ANALYZE in cases such as these (if it doesn't already point you to
> the culprit).

I'll do so, it takes quite long...

> Most likely you'll find that the last condition added a sequential scan to the query plan,

Exactly! EXPLAIN says so.

> which can have several causes/reasons. Are the estimated #rows close to the actual #rows?

Yes, this is the problem. I read that in such cases indexes are not read. However if the previous conditions are
executed first, the result is zero or just a few rows and there is no need to seq scan the whole values column.

> Is b.value indexed?

No, because it contains too long values for indexing.

> How selective is the value you're matching it against (is it uncommon or quite common)? Etc, etc.

Zero to a few.

> Meanwhile, it looks like most of your AND's are involved in joining tables a and b. Perhaps it helps to use an
> explicit join instead of an implicit one?

I am not quite sure what this means, but will read about it.

There were 2 more suggestions, I'll try now everything and write back.

Thank you very much for your help!
T.


Re: reducing number of ANDs speeds up query

From
"T. E. Lawrence"
Date:
Hi and thank you!

On 12.01.2013, at 11:52, Eduardo Morras <emorrasg@yahoo.es> wrote:

>> With the last "AND b.value=..." the query is extremely slow (did not wait for it to end, but more than a minute),
>> because the value column is not indexed (contains items longer than 8K).
>
> You can construct your own home-made index: add a new column in table b with the first 8-16 bytes/chars of b.value,
> use this column in your query and refine against the complete b.value. Don't forget to create an index for it too. You can keep
> this column updated with a trigger.

Yes, I have been considering this in a slightly different way. value contains short and long values (mixed). Only the
short values are queried directly. The long values are queried in a tsearch column or in an external Sphinx Search. So
probably I should split the short and long values and then index the short values.

It is nevertheless slightly annoying that I cannot make the query do the value thing last...

> Perhaps you can use a partial index for the b.value column; I never used that feature, so the documentation/others can point
> you how to do it.

Did not know of them, reading. Thank you!
T.


Re: reducing number of ANDs speeds up query

From
Tony Theodore
Date:
On 12/01/2013, at 12:47 PM, T. E. Lawrence <t.e.lawrence@icloud.com> wrote:

> Hello,
>
> I have a pretty standard query with two tables:
>
> SELECT table_a.id FROM table_a a, table_b b WHERE ... AND ... AND b.value=...;
>
> With the last "AND b.value=..." the query is extremely slow (did not wait for it to end, but more than a minute),
> because the value column is not indexed (contains items longer than 8K).
>
> However the previous conditions "WHERE ... AND ... AND" should have already reduced the candidate rows to just a few
> (table_b contains over 50m rows). And indeed, removing the last "AND b.value=..." speeds the query to just a
> millisecond.
>
> Is there a way to instruct PostgreSQL to do first the initial "WHERE ... AND ... AND" and then the last "AND
> b.value=..." on the (very small) result?

Have you looked at the WITH clause [1,2]:

WITH filtered AS (SELECT a.id, b.value AS val FROM table_a a, table_b b WHERE … AND …)
SELECT * FROM filtered WHERE filtered.val=…

It evaluates the first SELECT once, then applies the second SELECT to the first in memory (at least that's the way
I think about them).

Cheers,

Tony


[1] http://www.postgresql.org/docs/9.2/static/queries-with.html
[2] http://www.postgresql.org/docs/9.2/static/sql-select.html#SQL-WITH



Re: reducing number of ANDs speeds up query

From
Alban Hertroys
Date:
On 12 January 2013 12:41, T. E. Lawrence <t.e.lawrence@icloud.com> wrote:
Hi and thank you for your notes!

> You really ought to include the output of EXPLAIN ANALYZE in cases such as these (if it doesn't already point you to the culprit).

I'll do so, it takes quite long...

> Most likely you'll find that the last condition added a sequential scan to the query plan,

Exactly! EXPLAIN says so.

> which can have several causes/reasons. Are the estimated #rows close to the actual #rows?

Yes, this is the problem. I read that in such cases indexes are not read. However if the previous conditions are executed first, the result is zero or just a few rows and there is no need seq scan the whole values column.

You mean they don't match, right?

The database doesn't know what you know and it's making the wrong decision based on incorrect data.

The database won't use an index if it thinks that there aren't many rows to check against a condition. Most likely (the results from explain analyze would tell) the database thinks there are much fewer rows in table b than there actually are.

You'll probably want to read about database maintenance for Postgres and how to keep its statistics up to date. Autovacuum may need some tuning or you need to run manual VACUUM more frequently.
In fact, run VACUUM now and see if the problem goes away.

You'll usually also want to run VACUUM after a large batch job.
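In this thread's terms, that advice amounts to something like the following sketch (table names and the elided conditions are from the original query):

```sql
-- Refresh planner statistics for the large table...
VACUUM ANALYZE table_b;

-- ...then compare estimated vs. actual row counts in the plan.
EXPLAIN ANALYZE
SELECT a.id FROM table_a a, table_b b WHERE ... AND ... AND b.value=...;
```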
 
> Meanwhile, it looks like most of your AND's are involved in joining tables a and b. Perhaps it helps to use an explicit join instead of an implicit one?

I am not quite sure what this means, but will read about it.

You're currently using implicit joins by combining your join conditions in the WHERE clause of your query, like this:
SELECT *
FROM a, b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND b.value = 'yadayada';

You can also explicitly put your join conditions with the joins, like so:
SELECT *
FROM a INNER JOIN b ON (a.col1 = b.col1 AND a.col2 = b.col2)
WHERE b.value = 'yadayada';

You explicitly tell the database that those are the conditions to be joined on and that the remaining conditions are filters on the result set. With just two tables the need for such isn't that obvious, but with more tables it quickly becomes difficult to see what condition in an implicit join is part of the joins and which is the result set filter. With explicit joins that's much clearer.
It wouldn't be the first time that I rewrite a query to use explicit joins, only to find that the original query was incorrect.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Re: reducing number of ANDs speeds up query RESOLVED

From
"T. E. Lawrence"
Date:
RESOLVED
--
Dear all,

Thank you for your great help and multiple advices.

I discovered the problem and I have to say that it is very stupid and strange.

Here is what happened.

From all the advice, I first tried the partial index. The index was built and there was no change in the speed of the slow
query, which depressed me greatly. In the midst of my depression I ran VACUUM ANALYZE, which took about 10 hours (the db
is about 170 GB and has more than 500m rows in some tables, running on a 4 core, 8 GB RAM dedicated PostgreSQL cloud
server). Towards the end of VACUUM ANALYZE I was playing with some queries and suddenly the slow query became fast!
(which partially defeated the notion that one does not need ANALYZE upon CREATE INDEX) And I said "Aha!".

So I decided to try the whole thing properly from the beginning. Dropped the index, did VACUUM ANALYZE again and tried
the queries, in order to measure them without and with the index. Surprise! - the slow query was blazing fast. The previous
indexes (not the dropped partial index) were properly used. All was fine.

Which makes me think that, as we grew the database more than 250 times in size over a 2-3 month period, relying on
autovacuum (some tables grew from 200k to 50m records, others from 1m to 500m records), the autovacuum has either let us
down or something has happened to the ANALYZE.

Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?

Thank you and all the best,
T.


Re: reducing number of ANDs speeds up query RESOLVED

From
Jeff Janes
Date:
On Monday, January 14, 2013, T. E. Lawrence wrote:
RESOLVED
--
Dear all,

Thank you for your great help and multiple advices.

I discovered the problem and I have to say that it is very stupid and strange.

Here is what happened.

 
...
 
So I decided to try the whole thing properly from the beginning. Dropped the index, did again VACUUM ANALYZE and tried the queries, in order to measure them without and with index. Surprise! - the slow query was blazing fast. The previous indexes (not the dropped partial index) were properly used. All was fine.

Which makes me think that, as we grew the database more than 250 times in size over a 2-3 months period, relying on autovacuum (some tables grew from 200k to 50m records, other from 1m to 500m records), the autovacuum has either let us down or something has happen to the ANALYZE.

What do pg_stat_user_tables tell you about last_vacuum, last_autovacuum, last_analyze, last_autoanalyze ?
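That view can be queried directly; a minimal sketch:

```sql
-- Per-table timestamps of the last manual and automatic vacuum/analyze.
SELECT relname, last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;
```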


Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?

No.  For example, if you constantly do things that need an access exclusive lock, then autovac will keep getting interrupted and never finish.

Cheers,

Jeff


Re: reducing number of ANDs speeds up query RESOLVED

From
"T. E. Lawrence"
Date:
On 15.01.2013, at 05:45, Jeff Janes <jeff.janes@gmail.com> wrote:

>> Which makes me think that, as we grew the database more than 250 times in size over a 2-3 month period, relying on
>> autovacuum (some tables grew from 200k to 50m records, others from 1m to 500m records), the autovacuum has either let us
>> down or something has happened to the ANALYZE.
>
> What do pg_stat_user_tables tell you about last_vacuum, last_autovacuum, last_analyze, last_autoanalyze?

              relname               |          last_vacuum          |        last_autovacuum        |         last_analyze          |       last_autoanalyze
------------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------
 elements                           | 2013-01-14 16:14:48.963573+00 |                               | 2013-01-14 16:19:48.651155+00 | 2012-12-12 12:23:31.308877+00

This is the problematic table. I think it is clear: the last autovacuum never ran and the last autoanalyze was in
mid-December.

Thank you!

>> Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?
>
> No.  For example, if you constantly do things that need an access exclusive lock, then autovac will keep getting
> interrupted and never finish.

I see.

So, apparently, we need to interrupt the heavy imports on some reasonable intervals and do manual VACUUM ANALYZE?

> Cheers,
>
> Jeff


Thank you very much,
T.



Re: reducing number of ANDs speeds up query RESOLVED

From
Tom Lane
Date:
"T. E. Lawrence" <t.e.lawrence@icloud.com> writes:
> On 15.01.2013, at 05:45, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?

>> No.  For example, if you constantly do things that need an access exclusive lock, then autovac will keep getting
>> interrupted and never finish.

> I see.

> So, apparently, we need to interrupt the heavy imports on some reasonable intervals and do manual VACUUM ANALYZE?

Data import as such, no matter how "heavy", shouldn't be a problem.
The question is what are you doing that takes access-exclusive table
locks frequently, and do you really need to do that?

A quick look at the docs suggests that ALTER TABLE, REINDEX, or CLUSTER
would be the most likely candidates for taking exclusive table locks.

            regards, tom lane


Re: reducing number of ANDs speeds up query RESOLVED

From
Jeff Janes
Date:
On Tue, Jan 15, 2013 at 7:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "T. E. Lawrence" <t.e.lawrence@icloud.com> writes:
>> On 15.01.2013, at 05:45, Jeff Janes <jeff.janes@gmail.com> wrote:
>>>> Is the autovacuum 100% reliable in relation to VACUUM ANALYZE?
>
>>> No.  For example, if you constantly do things that need an access exclusive lock, then autovac will keep getting
>>> interrupted and never finish.
>
>> I see.
>
>> So, apparently, we need to interrupt the heavy imports on some reasonable intervals and do manual VACUUM ANALYZE?
>
> Data import as such, no matter how "heavy", shouldn't be a problem.
> The question is what are you doing that takes access-exclusive table
> locks frequently, and do you really need to do that?
>
> A quick look at the docs suggests that ALTER TABLE, REINDEX, or CLUSTER
> would be the most likely candidates for taking exclusive table locks.

But that isn't an exhaustive list--weaker locks will also cancel
autovacuum; for example, I think the SHARE lock taken by CREATE INDEX
will, and the even weaker one taken by CREATE INDEX CONCURRENTLY will
too.

But will all of those cancel auto-analyze as well as auto-vac?  I
guess they will because they use the same lock level.

T.E., Fortunately in point releases from August 2012 (9.0.9, 9.1.5,
etc.), the default server log settings will log both the cancel and
the command triggering the cancel.  So if you are running an up to
date server, you can just look in the logs to see what is happening.

Cheers,

Jeff


Re: reducing number of ANDs speeds up query RESOLVED

From
"T. E. Lawrence"
Date:
On 15.01.2013, at 16:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "T. E. Lawrence" <t.e.lawrence@icloud.com> writes:
>> So, apparently, we need to interrupt the heavy imports on some reasonable intervals and do manual VACUUM ANALYZE?
>
> Data import as such, no matter how "heavy", shouldn't be a problem.
> The question is what are you doing that takes access-exclusive table
> locks frequently, and do you really need to do that?
>
> A quick look at the docs suggests that ALTER TABLE, REINDEX, or CLUSTER
> would be the most likely candidates for taking exclusive table locks.
>
>             regards, tom lane

Thank you for this.

We will have to research into this, it will take a while.

T.


Re: reducing number of ANDs speeds up query RESOLVED

From
"T. E. Lawrence"
Date:
On 15.01.2013, at 17:32, Jeff Janes <jeff.janes@gmail.com> wrote:
> T.E., Fortunately in point releases from August 2012 (9.0.9, 9.1.5,
> etc.), the default server log settings will log both the cancel and
> the command triggering the cancel.  So if you are running an up to
> date server, you can just look in the logs to see what is happening.
>
> Cheers,
>
> Jeff

That's interesting, I'll check it. Thank you.
T.


Re: reducing number of ANDs speeds up query RESOLVED

From
Eduardo Morras
Date:
On Wed, 16 Jan 2013 23:42:23 +0100
"T. E. Lawrence" <t.e.lawrence@icloud.com> wrote:

>
> On 15.01.2013, at 17:32, Jeff Janes <jeff.janes@gmail.com> wrote:
> > T.E., Fortunately in point releases from August 2012 (9.0.9, 9.1.5,
> > etc.), the default server log settings will log both the cancel and
> > the command triggering the cancel.  So if you are running an up to
> > date server, you can just look in the logs to see what is happening.
> >
> > Cheers,
> >
> > Jeff
>
> That's interesting, I'll check it. Thank you.

And now the million dollars question, do you have any transaction in 'IDLE IN TRANSACTION' state?

If yes, there's one possible problem. For example, we used (note the past tense) a message queue middleware which uses a
postgres db for message passing, but it keeps the transaction in 'IDLE IN TRANSACTION' state. So if I do a select * from
message_tbl I get 1-5 rows, but the tbl used (note again the past tense) several GB of hd space because autovacuum couldn't
clean. Of course, once the problem was discovered, the middleware was replaced by another (home-built) one which doesn't keep
the IDLE IN TRANSACTION state.
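On 9.2 and later, such sessions can be spotted in pg_stat_activity (on older releases the column names differ: procpid instead of pid, and current_query showing '<IDLE> in transaction' instead of a state column):

```sql
-- Sessions holding a transaction open while doing nothing,
-- which prevents (auto)vacuum from reclaiming dead rows.
SELECT pid, usename, xact_start, state_change
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;
```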

> T.
>
>


---   ---
Eduardo Morras <emorrasg@yahoo.es>