Thread: index not used in joins

index not used in joins

From
Sebastian Böck
Date:
Hello all,

i have a problem with index usage and joins.
Attached is some SQL demonstrating my problem;

Why is the index only used in the 2nd query?

Can anybody explain me how to avoid/fix this.

Thanks in advance

Sebastian
CREATE TABLE users (
    login NAME NOT NULL PRIMARY KEY,
    datum TIMESTAMP,
    version INTEGER
);

CREATE TABLE test (
    datum TIMESTAMP NOT NULL,
    version INTEGER NOT NULL,
    approved TIMESTAMP
);

CREATE OR REPLACE VIEW v AS
    SELECT t.*
    FROM test AS t
    INNER JOIN users AS u ON
        t.datum <= u.datum AND
        (t.version = u.version OR
         t.approved IS NOT NULL);

CREATE OR REPLACE FUNCTION fill () RETURNS BOOLEAN AS '
    DECLARE
        i INTEGER;
    BEGIN
        FOR i IN 1..1000 LOOP
            EXECUTE ''INSERT INTO test (datum,version)
                VALUES (now(),''|| i || '')'';
        END LOOP;
        RETURN TRUE;
    END;
' LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION approved () RETURNS BOOLEAN AS '
    DECLARE
        i INTEGER;
    BEGIN
        FOR i IN 1..1000 LOOP
            EXECUTE ''INSERT INTO test (datum,version,approved)
                VALUES (now(),''|| i || '',now())'';
        END LOOP;
        RETURN TRUE;
    END;
' LANGUAGE plpgsql;

SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT fill();
SELECT approved();

INSERT INTO users (login,datum,version) VALUES ('sb',now(),'999');

CREATE INDEX test_ ON test (datum);
CREATE INDEX test_999 ON test (datum) WHERE version = '999' OR approved IS NOT NULL;

ANALYZE;

EXPLAIN ANALYZE SELECT * FROM v;
EXPLAIN ANALYZE SELECT t.* FROM test AS t INNER JOIN users AS u ON t.datum <= u.datum AND (t.version = '999' OR
t.approvedIS NOT NULL); 

Re: index not used in joins

From
Richard Huxton
Date:
Sebastian Böck wrote:
> Richard Huxton wrote:
>> Can you post the output from your "explain analyse" calls too? The
>> statistics aren't going to be the same on different machines.
>>
>
> Sure, here it is.

Thanks. (PS - remember to cc the list too).

> EXPLAIN ANALYZE SELECT * FROM v;
>                                                           QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------

>
>  Nested Loop  (cost=0.00..263.12 rows=116 width=20) (actual
> time=5.171..109.910 rows=1020 loops=1)
>    Join Filter: (("inner"."version" = "outer"."version") OR
> ("inner".approved IS NOT NULL))
>    ->  Seq Scan on users u  (cost=0.00..1.01 rows=1 width=12) (actual
> time=0.005..0.009 rows=1 loops=1)
>    ->  Index Scan using test_ on test t  (cost=0.00..155.74 rows=7092
> width=20) (actual time=0.012..64.873 rows=21000 loops=1)
>          Index Cond: (t.datum <= "outer".datum)
>  Total runtime: 111.879 ms

> EXPLAIN ANALYZE SELECT t.* FROM test AS t INNER JOIN users AS u ON
> t.datum <= u.datum AND (t.version = '999' OR t.approved IS NOT NULL);
>                                                          QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------

>
>  Nested Loop  (cost=0.00..7.78 rows=133 width=20) (actual
> time=0.035..7.733 rows=1020 loops=1)
>    ->  Seq Scan on users u  (cost=0.00..1.01 rows=1 width=8) (actual
> time=0.006..0.010 rows=1 loops=1)
>    ->  Index Scan using test_999 on test t  (cost=0.00..5.11 rows=132
> width=20) (actual time=0.017..3.358 rows=1020 loops=1)
>          Index Cond: (t.datum <= "outer".datum)
>          Filter: (("version" = 999) OR (approved IS NOT NULL))
>  Total runtime: 9.528 ms

OK - so what you want to know is why index "test_999" is used in the
second but not the first, even though both return the same rows.

The fact is that the conditional index:
   CREATE INDEX test_999 ON test (datum)
   WHERE version = '999' OR approved IS NOT NULL;
AFAIK looks at the WHERE clause of your query to determine where it can
run. Don't forget that the planner needs to pick which index is best
*before* it starts fetching data.

So - in the first example there might be rows where e.g. t.version=998
which means test_999 would be a poor choice of index.

--
   Richard Huxton
   Archonet Ltd

Re: index not used in joins

From
Sebastian Böck
Date:
Richard Huxton wrote:
> Sebastian Böck wrote:
>
>> Richard Huxton wrote:
>>
>>> Can you post the output from your "explain analyse" calls too? The
>>> statistics aren't going to be the same on different machines.
>>>
>>
>> Sure, here it is.
>
>
> Thanks. (PS - remember to cc the list too).

[output of EXPLAIN ANALYZE]

> OK - so what you want to know is why index "test_999" is used in the
> second but not the first, even though both return the same rows.
>
> The fact is that the conditional index:
>   CREATE INDEX test_999 ON test (datum)
>   WHERE version = '999' OR approved IS NOT NULL;
> AFAIK looks at the WHERE clause of your query to determine where it can
> run. Don't forget that the planner needs to pick which index is best
> *before* it starts fetching data.
>
> So - in the first example there might be rows where e.g. t.version=998
> which means test_999 would be a poor choice of index.

But what if the table users contains only 1 row and the column "version"
has a value of "999"?

Are there any other options to speed up this kind of query?

Thanks so far

Sebastian

Re: index not used in joins

From
Richard Huxton
Date:
Sebastian Böck wrote:
> Richard Huxton wrote:
>
>> Sebastian Böck wrote:
>>
>>> Richard Huxton wrote:
>>>
>>>> Can you post the output from your "explain analyse" calls too? The
>>>> statistics aren't going to be the same on different machines.
>>>>
>>>
>>> Sure, here it is.
>>
>>
>>
>> Thanks. (PS - remember to cc the list too).
>
>
> [output of EXPLAIN ANALYZE]
>
>> OK - so what you want to know is why index "test_999" is used in the
>> second but not the first, even though both return the same rows.
>>
>> The fact is that the conditional index:
>>   CREATE INDEX test_999 ON test (datum)
>>   WHERE version = '999' OR approved IS NOT NULL;
>> AFAIK looks at the WHERE clause of your query to determine where it
>> can run. Don't forget that the planner needs to pick which index is
>> best *before* it starts fetching data.
>>
>> So - in the first example there might be rows where e.g. t.version=998
>> which means test_999 would be a poor choice of index.
>
>
> But what if the table users contains only 1 row and the column "version"
> has a value of "999"?

It still doesn't know that the only value in "version" is 999(*). Let's
say there were 2000 rows and 1900 had the value 999 - the index is still
useless because we'd have to do a sequential scan to check the remaining
200 rows.

> Are there any other options to speed up this kind of query?

Well, your problem is the (version=X OR approved IS NOT NULL) clause. I
must admit I can't quite see what this is supposed to do. The "test"
table connects to the "users" table via "version" (and "datum", though
not a simple check) unless the "test" has been "approved", in which case
it applies to all users?
Can you explain what the various tables/columns are really for?

(*) Don't forget the statistics for column values are usually
out-of-date compared to the actual data, so you can't rely on it.

--
   Richard Huxton
   Archonet Ltd

Re: index not used in joins

From
Sebastian Böck
Date:
Richard Huxton wrote:
> Sebastian Böck wrote:
>
>> Richard Huxton wrote:
>>
>>> Sebastian Böck wrote:
>>>
>>>> Richard Huxton wrote:
>>>>
>>>>> Can you post the output from your "explain analyse" calls too? The
>>>>> statistics aren't going to be the same on different machines.
>>>>>
>>>>
>>>> Sure, here it is.
>>>
>>>
>>>
>>>
>>> Thanks. (PS - remember to cc the list too).
>>
>>
>>
>> [output of EXPLAIN ANALYZE]
>>
>>> OK - so what you want to know is why index "test_999" is used in the
>>> second but not the first, even though both return the same rows.
>>>
>>> The fact is that the conditional index:
>>>   CREATE INDEX test_999 ON test (datum)
>>>   WHERE version = '999' OR approved IS NOT NULL;
>>> AFAIK looks at the WHERE clause of your query to determine where it
>>> can run. Don't forget that the planner needs to pick which index is
>>> best *before* it starts fetching data.
>>>
>>> So - in the first example there might be rows where e.g.
>>> t.version=998 which means test_999 would be a poor choice of index.
>>
>>
>>
>> But what if the table users contains only 1 row and the column "version"
>> has a value of "999"?
>
>
> It still doesn't know that the only value in "version" is 999(*). Let's
> say there were 2000 rows and 1900 had the value 999 - the index is still
> useless because we'd have to do a sequential scan to check the remaining
> 200 rows.
>
>> Are there any other options to speed up this kind of query?
>
>
> Well, your problem is the (version=X OR approved IS NOT NULL) clause. I
> must admit I can't quite see what this is supposed to do. The "test"
> table connects to the "users" table via "version" (and "datum", though
> not a simple check) unless the "test" has been "approved", in which case
> it applies to all users?
> Can you explain what the various tables/columns are really for?

The whole thing is a multiuser facility managment application.
Every user can plan things like he wants (different versions).
All these changes apply to a common (approved) version.
Things get complicated as everybody should be able to "travel"
through the history via the "datum" field.

That's why i need this "silly OR" in my where-clause.

At the moment i get very exciting results using immutable
functions, but i have another question.

In the docs it is stated that:

IMMUTABLE indicates that the function always returns the same
result when given the same argument values;

What if i define my functions like:

CREATE OR REPLACE FUNCTION datum () RETURNS TIMESTAMP AS '
   SELECT datum FROM public.benutzer;
' LANGUAGE sql IMMUTABLE;

They normally (untill now) give the correct results,
also if the values in the underlaying view changes.

Can i relay on this or is it only luck.

> (*) Don't forget the statistics for column values are usually
> out-of-date compared to the actual data, so you can't rely on it.

I'm aware of that.

Thanks

Sebastian