Thread: Incorrect index being used

Incorrect index being used

From

Jesse Long

Date:

09 October 2013, 07:50:13

Hi PostgreSQL community,

I have the following query, run immediately after executing VACUUM in
the database. There is only one connection to the database.

The query runs for much longer than I expect it to run for, and I think
this is due to it using the incorrect subplan. As you can see, subplans
1 and 3 make use of and index, but these subplans are not used.
Subplans  and 4 are seqscan, and they are used.

How can I get PostgreSQL to use subplan 1 and 3?

Thanks,
Jesse

testdb=> explain analyse SELECT * FROM ARCHIVE_DOCUMENT AS r0 WHERE r0.NODE_ID = 29 AND r0.ARCHIVE_DATE >= '2013-07-08
18:28:00'AND (EXISTS (SELECT r1.* FROM ARCHIVE_DOCUMENT_INDEX AS r1 WHERE r1.ARCHIVE_ID = r0.ID AND r1.NODE_ID =
r0.NODE_IDAND r1.VALUE = 'BSH70002152' ) OR EXISTS ( SELECT r2.* FROM ARCHIVE_DOCUMENT_INDEX AS r2 WHERE r2.ARCHIVE_ID
=r0.ID AND r2.NODE_ID = r0.NODE_ID AND r2.VALUE = 'TC212592' ) ) ORDER BY r0.ARCHIVE_DATE DESC LIMIT 10; 

                                                                                              QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------

  Limit  (cost=0.56..151.79 rows=10 width=122) (actual time=44601.350..97649.196 rows=2 loops=1)

    ->  Index Scan Backward using idx_archive_document_x_node_id_archive_date on archive_document r0
(cost=0.56..8258406.24rows=546105 width=122) (actual time=44601.33 

          Index Cond: ((node_id = 29) AND (archive_date >= '2013-07-08 18:28:00'::timestamp without time zone))

          Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4))

          Rows Removed by Filter: 710851

          SubPlan 1

            ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on archive_document_index r1
(cost=0.57..4.59rows=1 width=0) (never executed) 

                  Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value = 'BSH70002152'::text))

                  Heap Fetches: 0

          SubPlan 2

            ->  Seq Scan on archive_document_index r1_1  (cost=0.00..1958104.00 rows=1520 width=16) (actual
time=44418.383..44558.293rows=4 loops=1) 

                  Filter: ((value)::text = 'BSH70002152'::text)

                  Rows Removed by Filter: 95009919

          SubPlan 3

            ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on archive_document_index r2
(cost=0.57..4.59rows=1 width=0) (never executed) 

                  Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value = 'TC212592'::text))

                  Heap Fetches: 0

          SubPlan 4

            ->  Seq Scan on archive_document_index r2_1  (cost=0.00..1958104.00 rows=1520 width=16) (actual
time=41659.464..41663.342rows=1 loops=1) 

                  Filter: ((value)::text = 'TC212592'::text)

                  Rows Removed by Filter: 95009922

  Total runtime: 97683.836 ms

(22 rows)

Re: Incorrect index being used

From

Albe Laurenz

Date:

09 October 2013, 10:10:06

Jesse Long wrote:
> I have the following query, run immediately after executing VACUUM in
> the database. There is only one connection to the database.

You should run ANALYZE, not VACUUM.

> The query runs for much longer than I expect it to run for, and I think
> this is due to it using the incorrect subplan. As you can see, subplans
> 1 and 3 make use of and index, but these subplans are not used.
> Subplans  and 4 are seqscan, and they are used.
> 
> How can I get PostgreSQL to use subplan 1 and 3?

They are only possible if an "Index Only Scan" is possible, which
can only be used if the respective table entries are visible for
all transactions.

> testdb=> explain analyse SELECT * FROM ARCHIVE_DOCUMENT AS r0 WHERE r0.NODE_ID = 29 AND
> r0.ARCHIVE_DATE >= '2013-07-08 18:28:00' AND (EXISTS (SELECT r1.* FROM ARCHIVE_DOCUMENT_INDEX AS r1
> WHERE r1.ARCHIVE_ID = r0.ID AND r1.NODE_ID = r0.NODE_ID AND r1.VALUE = 'BSH70002152' ) OR EXISTS (
> SELECT r2.* FROM ARCHIVE_DOCUMENT_INDEX AS r2 WHERE r2.ARCHIVE_ID = r0.ID AND r2.NODE_ID = r0.NODE_ID
> AND r2.VALUE = 'TC212592' ) ) ORDER BY r0.ARCHIVE_DATE DESC LIMIT 10;

[...]

>           Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed
> SubPlan 4))
> 
>           Rows Removed by Filter: 710851
> 
>           SubPlan 1
> 
>             ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on
> archive_document_index r1  (cost=0.57..4.59 rows=1 width=0) (never executed)
> 
>                   Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value =
> 'BSH70002152'::text))
> 
>                   Heap Fetches: 0
> 
>           SubPlan 2
> 
>             ->  Seq Scan on archive_document_index r1_1  (cost=0.00..1958104.00 rows=1520 width=16)
> (actual time=44418.383..44558.293 rows=4 loops=1)
> 
>                   Filter: ((value)::text = 'BSH70002152'::text)
> 
>                   Rows Removed by Filter: 95009919
> 
>           SubPlan 3
> 
>             ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on
> archive_document_index r2  (cost=0.57..4.59 rows=1 width=0) (never executed)
> 
>                   Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value =
> 'TC212592'::text))
> 
>                   Heap Fetches: 0
> 
>           SubPlan 4
> 
>             ->  Seq Scan on archive_document_index r2_1  (cost=0.00..1958104.00 rows=1520 width=16)
> (actual time=41659.464..41663.342 rows=1 loops=1)
> 
>                   Filter: ((value)::text = 'TC212592'::text)
> 
>                   Rows Removed by Filter: 95009922

The estimates are quite off.
Does "ANALYZE archive_document", possibly after increasing
default_statistics_target, make a difference?

Yours,
Laurenz Albe

Re: Incorrect index being used

From

Jesse Long

Date:

09 October 2013, 10:58:00

On 09/10/2013 12:10, Albe Laurenz wrote:
> Jesse Long wrote:
>> I have the following query, run immediately after executing VACUUM in
>> the database. There is only one connection to the database.
> You should run ANALYZE, not VACUUM.
>
>> The query runs for much longer than I expect it to run for, and I think
>> this is due to it using the incorrect subplan. As you can see, subplans
>> 1 and 3 make use of and index, but these subplans are not used.
>> Subplans  and 4 are seqscan, and they are used.
>>
>> How can I get PostgreSQL to use subplan 1 and 3?
> They are only possible if an "Index Only Scan" is possible, which
> can only be used if the respective table entries are visible for
> all transactions.
>
>> testdb=> explain analyse SELECT * FROM ARCHIVE_DOCUMENT AS r0 WHERE r0.NODE_ID = 29 AND
>> r0.ARCHIVE_DATE >= '2013-07-08 18:28:00' AND (EXISTS (SELECT r1.* FROM ARCHIVE_DOCUMENT_INDEX AS r1
>> WHERE r1.ARCHIVE_ID = r0.ID AND r1.NODE_ID = r0.NODE_ID AND r1.VALUE = 'BSH70002152' ) OR EXISTS (
>> SELECT r2.* FROM ARCHIVE_DOCUMENT_INDEX AS r2 WHERE r2.ARCHIVE_ID = r0.ID AND r2.NODE_ID = r0.NODE_ID
>> AND r2.VALUE = 'TC212592' ) ) ORDER BY r0.ARCHIVE_DATE DESC LIMIT 10;
> [...]
>
>>            Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed
>> SubPlan 4))
>>
>>            Rows Removed by Filter: 710851
>>
>>            SubPlan 1
>>
>>              ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on
>> archive_document_index r1  (cost=0.57..4.59 rows=1 width=0) (never executed)
>>
>>                    Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value =
>> 'BSH70002152'::text))
>>
>>                    Heap Fetches: 0
>>
>>            SubPlan 2
>>
>>              ->  Seq Scan on archive_document_index r1_1  (cost=0.00..1958104.00 rows=1520 width=16)
>> (actual time=44418.383..44558.293 rows=4 loops=1)
>>
>>                    Filter: ((value)::text = 'BSH70002152'::text)
>>
>>                    Rows Removed by Filter: 95009919
>>
>>            SubPlan 3
>>
>>              ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on
>> archive_document_index r2  (cost=0.57..4.59 rows=1 width=0) (never executed)
>>
>>                    Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value =
>> 'TC212592'::text))
>>
>>                    Heap Fetches: 0
>>
>>            SubPlan 4
>>
>>              ->  Seq Scan on archive_document_index r2_1  (cost=0.00..1958104.00 rows=1520 width=16)
>> (actual time=41659.464..41663.342 rows=1 loops=1)
>>
>>                    Filter: ((value)::text = 'TC212592'::text)
>>
>>                    Rows Removed by Filter: 95009922
> The estimates are quite off.
> Does "ANALYZE archive_document", possibly after increasing
> default_statistics_target, make a difference?
>
> Yours,
> Laurenz Albe
>

Hi Laurenz,

Thank you for the feedback.

There is no problem with row visibility, there is only one connection to
the database - the connection I am using to do these selects.

Thanks you for the advise regarding ANALYZE. I ran ANALYZE on both
tables concerned, but not much changed:

                                                                                              QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------

  Limit  (cost=0.56..151.85 rows=10 width=122) (actual time=40841.984..85668.213 rows=2 loops=1)

    ->  Index Scan Backward using idx_archive_document_x_node_id_archive_date on archive_document r0
(cost=0.56..7627640.20rows=504186 width=122) (actual time=40841.98 

          Index Cond: ((node_id = 29) AND (archive_date >= '2013-07-08 18:28:00'::timestamp without time zone))

          Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4))

          Rows Removed by Filter: 710851

          SubPlan 1

            ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on archive_document_index r1
(cost=0.57..4.59rows=1 width=0) (never executed) 

                  Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value = 'BSH70002152'::text))

                  Heap Fetches: 0

          SubPlan 2

            ->  Seq Scan on archive_document_index r1_1  (cost=0.00..1958101.80 rows=1568 width=16) (actual
time=36633.365..40841.909rows=4 loops=1) 

                  Filter: ((value)::text = 'BSH70002152'::text)

                  Rows Removed by Filter: 95009919

          SubPlan 3

            ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on archive_document_index r2
(cost=0.57..4.59rows=1 width=0) (never executed) 

                  Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value = 'TC212592'::text))

                  Heap Fetches: 0

          SubPlan 4

            ->  Seq Scan on archive_document_index r2_1  (cost=0.00..1958101.80 rows=1568 width=16) (actual
time=40241.599..44462.485rows=1 loops=1) 

                  Filter: ((value)::text = 'TC212592'::text)

                  Rows Removed by Filter: 95009922

  Total runtime: 85676.734 ms

(22 rows)

Thanks,
Jesse

Re: Incorrect index being used

From

Jesse Long

Date:

09 October 2013, 11:02:16

On 09/10/2013 12:57, Jesse Long wrote:
> On 09/10/2013 12:10, Albe Laurenz wrote:
>> Jesse Long wrote:
>>> I have the following query, run immediately after executing VACUUM in
>>> the database. There is only one connection to the database.
>> You should run ANALYZE, not VACUUM.
>>
>>> The query runs for much longer than I expect it to run for, and I think
>>> this is due to it using the incorrect subplan. As you can see, subplans
>>> 1 and 3 make use of and index, but these subplans are not used.
>>> Subplans  and 4 are seqscan, and they are used.
>>>
>>> How can I get PostgreSQL to use subplan 1 and 3?
>> They are only possible if an "Index Only Scan" is possible, which
>> can only be used if the respective table entries are visible for
>> all transactions.
>>
>>> testdb=> explain analyse SELECT * FROM ARCHIVE_DOCUMENT AS r0 WHERE
>>> r0.NODE_ID = 29 AND
>>> r0.ARCHIVE_DATE >= '2013-07-08 18:28:00' AND (EXISTS (SELECT r1.*
>>> FROM ARCHIVE_DOCUMENT_INDEX AS r1
>>> WHERE r1.ARCHIVE_ID = r0.ID AND r1.NODE_ID = r0.NODE_ID AND r1.VALUE
>>> = 'BSH70002152' ) OR EXISTS (
>>> SELECT r2.* FROM ARCHIVE_DOCUMENT_INDEX AS r2 WHERE r2.ARCHIVE_ID =
>>> r0.ID AND r2.NODE_ID = r0.NODE_ID
>>> AND r2.VALUE = 'TC212592' ) ) ORDER BY r0.ARCHIVE_DATE DESC LIMIT 10;
>> [...]
>>
>>>            Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR
>>> (alternatives: SubPlan 3 or hashed
>>> SubPlan 4))
>>>
>>>            Rows Removed by Filter: 710851
>>>
>>>            SubPlan 1
>>>
>>>              ->  Index Only Scan using
>>> archive_document_index_x_archive_id_node_id_value on
>>> archive_document_index r1  (cost=0.57..4.59 rows=1 width=0) (never
>>> executed)
>>>
>>>                    Index Cond: ((archive_id = r0.id) AND (node_id =
>>> r0.node_id) AND (value =
>>> 'BSH70002152'::text))
>>>
>>>                    Heap Fetches: 0
>>>
>>>            SubPlan 2
>>>
>>>              ->  Seq Scan on archive_document_index r1_1
>>> (cost=0.00..1958104.00 rows=1520 width=16)
>>> (actual time=44418.383..44558.293 rows=4 loops=1)
>>>
>>>                    Filter: ((value)::text = 'BSH70002152'::text)
>>>
>>>                    Rows Removed by Filter: 95009919
>>>
>>>            SubPlan 3
>>>
>>>              ->  Index Only Scan using
>>> archive_document_index_x_archive_id_node_id_value on
>>> archive_document_index r2  (cost=0.57..4.59 rows=1 width=0) (never
>>> executed)
>>>
>>>                    Index Cond: ((archive_id = r0.id) AND (node_id =
>>> r0.node_id) AND (value =
>>> 'TC212592'::text))
>>>
>>>                    Heap Fetches: 0
>>>
>>>            SubPlan 4
>>>
>>>              ->  Seq Scan on archive_document_index r2_1
>>> (cost=0.00..1958104.00 rows=1520 width=16)
>>> (actual time=41659.464..41663.342 rows=1 loops=1)
>>>
>>>                    Filter: ((value)::text = 'TC212592'::text)
>>>
>>>                    Rows Removed by Filter: 95009922
>> The estimates are quite off.
>> Does "ANALYZE archive_document", possibly after increasing
>> default_statistics_target, make a difference?
>>
>> Yours,
>> Laurenz Albe
>>
>
> Hi Laurenz,
>
> Thank you for the feedback.
>
> There is no problem with row visibility, there is only one connection
> to the database - the connection I am using to do these selects.
>
> Thanks you for the advise regarding ANALYZE. I ran ANALYZE on both
> tables concerned, but not much changed:
>
> QUERY PLAN
>
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>
>
>  Limit  (cost=0.56..151.85 rows=10 width=122) (actual
> time=40841.984..85668.213 rows=2 loops=1)
>
>    ->  Index Scan Backward using
> idx_archive_document_x_node_id_archive_date on archive_document r0
> (cost=0.56..7627640.20 rows=504186 width=122) (actual time=40841.98
>
>          Index Cond: ((node_id = 29) AND (archive_date >= '2013-07-08
> 18:28:00'::timestamp without time zone))
>
>          Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR
> (alternatives: SubPlan 3 or hashed SubPlan 4))
>
>          Rows Removed by Filter: 710851
>
>          SubPlan 1
>
>            ->  Index Only Scan using
> archive_document_index_x_archive_id_node_id_value on
> archive_document_index r1  (cost=0.57..4.59 rows=1 width=0) (never
> executed)
>
>                  Index Cond: ((archive_id = r0.id) AND (node_id =
> r0.node_id) AND (value = 'BSH70002152'::text))
>
>                  Heap Fetches: 0
>
>          SubPlan 2
>
>            ->  Seq Scan on archive_document_index r1_1
> (cost=0.00..1958101.80 rows=1568 width=16) (actual
> time=36633.365..40841.909 rows=4 loops=1)
>
>                  Filter: ((value)::text = 'BSH70002152'::text)
>
>                  Rows Removed by Filter: 95009919
>
>          SubPlan 3
>
>            ->  Index Only Scan using
> archive_document_index_x_archive_id_node_id_value on
> archive_document_index r2  (cost=0.57..4.59 rows=1 width=0) (never
> executed)
>
>                  Index Cond: ((archive_id = r0.id) AND (node_id =
> r0.node_id) AND (value = 'TC212592'::text))
>
>                  Heap Fetches: 0
>
>          SubPlan 4
>
>            ->  Seq Scan on archive_document_index r2_1
> (cost=0.00..1958101.80 rows=1568 width=16) (actual
> time=40241.599..44462.485 rows=1 loops=1)
>
>                  Filter: ((value)::text = 'TC212592'::text)
>
>                  Rows Removed by Filter: 95009922
>
>  Total runtime: 85676.734 ms
>
> (22 rows)

Sorry, I neglected to mention that this is on PostgreSQL 9.3.0.

Thanks,
Jesse

Re: Incorrect index being used

From

Albe Laurenz

Date:

09 October 2013, 13:20:53

Jesse Long wrote:
> There is no problem with row visibility, there is only one connection to
> the database - the connection I am using to do these selects.

No idea why the plans cannot be used.

It might be helpful to see the table and index definitions.

> Thanks you for the advise regarding ANALYZE. I ran ANALYZE on both
> tables concerned, but not much changed:

Did you try increasing default_statistics_target before ANALYZE?

Yours,
Laurenz Albe

Re: Incorrect index being used

From

Jesse Long

Date:

09 October 2013, 14:57:51

On 09/10/2013 12:10, Albe Laurenz wrote:
> Jesse Long wrote:
>> I have the following query, run immediately after executing VACUUM in
>> the database. There is only one connection to the database.
> You should run ANALYZE, not VACUUM.
>
>> The query runs for much longer than I expect it to run for, and I think
>> this is due to it using the incorrect subplan. As you can see, subplans
>> 1 and 3 make use of and index, but these subplans are not used.
>> Subplans  and 4 are seqscan, and they are used.
>>
>> How can I get PostgreSQL to use subplan 1 and 3?
> They are only possible if an "Index Only Scan" is possible, which
> can only be used if the respective table entries are visible for
> all transactions.
>
>> testdb=> explain analyse SELECT * FROM ARCHIVE_DOCUMENT AS r0 WHERE r0.NODE_ID = 29 AND
>> r0.ARCHIVE_DATE >= '2013-07-08 18:28:00' AND (EXISTS (SELECT r1.* FROM ARCHIVE_DOCUMENT_INDEX AS r1
>> WHERE r1.ARCHIVE_ID = r0.ID AND r1.NODE_ID = r0.NODE_ID AND r1.VALUE = 'BSH70002152' ) OR EXISTS (
>> SELECT r2.* FROM ARCHIVE_DOCUMENT_INDEX AS r2 WHERE r2.ARCHIVE_ID = r0.ID AND r2.NODE_ID = r0.NODE_ID
>> AND r2.VALUE = 'TC212592' ) ) ORDER BY r0.ARCHIVE_DATE DESC LIMIT 10;
> [...]
>
>>            Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed
>> SubPlan 4))
>>
>>            Rows Removed by Filter: 710851
>>
>>            SubPlan 1
>>
>>              ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on
>> archive_document_index r1  (cost=0.57..4.59 rows=1 width=0) (never executed)
>>
>>                    Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value =
>> 'BSH70002152'::text))
>>
>>                    Heap Fetches: 0
>>
>>            SubPlan 2
>>
>>              ->  Seq Scan on archive_document_index r1_1  (cost=0.00..1958104.00 rows=1520 width=16)
>> (actual time=44418.383..44558.293 rows=4 loops=1)
>>
>>                    Filter: ((value)::text = 'BSH70002152'::text)
>>
>>                    Rows Removed by Filter: 95009919
>>
>>            SubPlan 3
>>
>>              ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on
>> archive_document_index r2  (cost=0.57..4.59 rows=1 width=0) (never executed)
>>
>>                    Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value =
>> 'TC212592'::text))
>>
>>                    Heap Fetches: 0
>>
>>            SubPlan 4
>>
>>              ->  Seq Scan on archive_document_index r2_1  (cost=0.00..1958104.00 rows=1520 width=16)
>> (actual time=41659.464..41663.342 rows=1 loops=1)
>>
>>                    Filter: ((value)::text = 'TC212592'::text)
>>
>>                    Rows Removed by Filter: 95009922
> The estimates are quite off.
> Does "ANALYZE archive_document", possibly after increasing
> default_statistics_target, make a difference?
>
> Yours,
> Laurenz Albe
>

Hi Laurenz,

Thank you for the feedback.

There is no problem with row visibility, there is only one connection to
the database - the connection I am using to do these selects.

Thanks you for the advise regarding ANALYZE. I ran ANALYZE on both
tables concerned, but not much changed:

                                                                                              QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------------------

  Limit  (cost=0.56..151.85 rows=10 width=122) (actual time=40841.984..85668.213 rows=2 loops=1)

    ->  Index Scan Backward using idx_archive_document_x_node_id_archive_date on archive_document r0
(cost=0.56..7627640.20rows=504186 width=122) (actual time=40841.98 

          Index Cond: ((node_id = 29) AND (archive_date >= '2013-07-08 18:28:00'::timestamp without time zone))

          Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4))

          Rows Removed by Filter: 710851

          SubPlan 1

            ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on archive_document_index r1
(cost=0.57..4.59rows=1 width=0) (never executed) 

                  Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value = 'BSH70002152'::text))

                  Heap Fetches: 0

          SubPlan 2

            ->  Seq Scan on archive_document_index r1_1  (cost=0.00..1958101.80 rows=1568 width=16) (actual
time=36633.365..40841.909rows=4 loops=1) 

                  Filter: ((value)::text = 'BSH70002152'::text)

                  Rows Removed by Filter: 95009919

          SubPlan 3

            ->  Index Only Scan using archive_document_index_x_archive_id_node_id_value on archive_document_index r2
(cost=0.57..4.59rows=1 width=0) (never executed) 

                  Index Cond: ((archive_id = r0.id) AND (node_id = r0.node_id) AND (value = 'TC212592'::text))

                  Heap Fetches: 0

          SubPlan 4

            ->  Seq Scan on archive_document_index r2_1  (cost=0.00..1958101.80 rows=1568 width=16) (actual
time=40241.599..44462.485rows=1 loops=1) 

                  Filter: ((value)::text = 'TC212592'::text)

                  Rows Removed by Filter: 95009922

  Total runtime: 85676.734 ms

(22 rows)

Thanks,
Jesse

Re: Incorrect index being used

From

Jesse Long

Date:

09 October 2013, 15:02:05

On 09/10/2013 12:57, Jesse Long wrote:
> On 09/10/2013 12:10, Albe Laurenz wrote:
>> Jesse Long wrote:
>>> I have the following query, run immediately after executing VACUUM in
>>> the database. There is only one connection to the database.
>> You should run ANALYZE, not VACUUM.
>>
>>> The query runs for much longer than I expect it to run for, and I think
>>> this is due to it using the incorrect subplan. As you can see, subplans
>>> 1 and 3 make use of and index, but these subplans are not used.
>>> Subplans  and 4 are seqscan, and they are used.
>>>
>>> How can I get PostgreSQL to use subplan 1 and 3?
>> They are only possible if an "Index Only Scan" is possible, which
>> can only be used if the respective table entries are visible for
>> all transactions.
>>
>>> testdb=> explain analyse SELECT * FROM ARCHIVE_DOCUMENT AS r0 WHERE
>>> r0.NODE_ID = 29 AND
>>> r0.ARCHIVE_DATE >= '2013-07-08 18:28:00' AND (EXISTS (SELECT r1.*
>>> FROM ARCHIVE_DOCUMENT_INDEX AS r1
>>> WHERE r1.ARCHIVE_ID = r0.ID AND r1.NODE_ID = r0.NODE_ID AND r1.VALUE
>>> = 'BSH70002152' ) OR EXISTS (
>>> SELECT r2.* FROM ARCHIVE_DOCUMENT_INDEX AS r2 WHERE r2.ARCHIVE_ID =
>>> r0.ID AND r2.NODE_ID = r0.NODE_ID
>>> AND r2.VALUE = 'TC212592' ) ) ORDER BY r0.ARCHIVE_DATE DESC LIMIT 10;
>> [...]
>>
>>>            Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR
>>> (alternatives: SubPlan 3 or hashed
>>> SubPlan 4))
>>>
>>>            Rows Removed by Filter: 710851
>>>
>>>            SubPlan 1
>>>
>>>              ->  Index Only Scan using
>>> archive_document_index_x_archive_id_node_id_value on
>>> archive_document_index r1  (cost=0.57..4.59 rows=1 width=0) (never
>>> executed)
>>>
>>>                    Index Cond: ((archive_id = r0.id) AND (node_id =
>>> r0.node_id) AND (value =
>>> 'BSH70002152'::text))
>>>
>>>                    Heap Fetches: 0
>>>
>>>            SubPlan 2
>>>
>>>              ->  Seq Scan on archive_document_index r1_1
>>> (cost=0.00..1958104.00 rows=1520 width=16)
>>> (actual time=44418.383..44558.293 rows=4 loops=1)
>>>
>>>                    Filter: ((value)::text = 'BSH70002152'::text)
>>>
>>>                    Rows Removed by Filter: 95009919
>>>
>>>            SubPlan 3
>>>
>>>              ->  Index Only Scan using
>>> archive_document_index_x_archive_id_node_id_value on
>>> archive_document_index r2  (cost=0.57..4.59 rows=1 width=0) (never
>>> executed)
>>>
>>>                    Index Cond: ((archive_id = r0.id) AND (node_id =
>>> r0.node_id) AND (value =
>>> 'TC212592'::text))
>>>
>>>                    Heap Fetches: 0
>>>
>>>            SubPlan 4
>>>
>>>              ->  Seq Scan on archive_document_index r2_1
>>> (cost=0.00..1958104.00 rows=1520 width=16)
>>> (actual time=41659.464..41663.342 rows=1 loops=1)
>>>
>>>                    Filter: ((value)::text = 'TC212592'::text)
>>>
>>>                    Rows Removed by Filter: 95009922
>> The estimates are quite off.
>> Does "ANALYZE archive_document", possibly after increasing
>> default_statistics_target, make a difference?
>>
>> Yours,
>> Laurenz Albe
>>
>
> Hi Laurenz,
>
> Thank you for the feedback.
>
> There is no problem with row visibility, there is only one connection
> to the database - the connection I am using to do these selects.
>
> Thanks you for the advise regarding ANALYZE. I ran ANALYZE on both
> tables concerned, but not much changed:
>
> QUERY PLAN
>
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>
>
>  Limit  (cost=0.56..151.85 rows=10 width=122) (actual
> time=40841.984..85668.213 rows=2 loops=1)
>
>    ->  Index Scan Backward using
> idx_archive_document_x_node_id_archive_date on archive_document r0
> (cost=0.56..7627640.20 rows=504186 width=122) (actual time=40841.98
>
>          Index Cond: ((node_id = 29) AND (archive_date >= '2013-07-08
> 18:28:00'::timestamp without time zone))
>
>          Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR
> (alternatives: SubPlan 3 or hashed SubPlan 4))
>
>          Rows Removed by Filter: 710851
>
>          SubPlan 1
>
>            ->  Index Only Scan using
> archive_document_index_x_archive_id_node_id_value on
> archive_document_index r1  (cost=0.57..4.59 rows=1 width=0) (never
> executed)
>
>                  Index Cond: ((archive_id = r0.id) AND (node_id =
> r0.node_id) AND (value = 'BSH70002152'::text))
>
>                  Heap Fetches: 0
>
>          SubPlan 2
>
>            ->  Seq Scan on archive_document_index r1_1
> (cost=0.00..1958101.80 rows=1568 width=16) (actual
> time=36633.365..40841.909 rows=4 loops=1)
>
>                  Filter: ((value)::text = 'BSH70002152'::text)
>
>                  Rows Removed by Filter: 95009919
>
>          SubPlan 3
>
>            ->  Index Only Scan using
> archive_document_index_x_archive_id_node_id_value on
> archive_document_index r2  (cost=0.57..4.59 rows=1 width=0) (never
> executed)
>
>                  Index Cond: ((archive_id = r0.id) AND (node_id =
> r0.node_id) AND (value = 'TC212592'::text))
>
>                  Heap Fetches: 0
>
>          SubPlan 4
>
>            ->  Seq Scan on archive_document_index r2_1
> (cost=0.00..1958101.80 rows=1568 width=16) (actual
> time=40241.599..44462.485 rows=1 loops=1)
>
>                  Filter: ((value)::text = 'TC212592'::text)
>
>                  Rows Removed by Filter: 95009922
>
>  Total runtime: 85676.734 ms
>
> (22 rows)

Sorry, I neglected to mention that this is on PostgreSQL 9.3.0.

Thanks,
Jesse

Re: Incorrect index being used

From

Tom Lane

Date:

09 October 2013, 16:06:37

Jesse Long <jpl@unknown.za.net> writes:
> The query runs for much longer than I expect it to run for, and I think
> this is due to it using the incorrect subplan. As you can see, subplans
> 1 and 3 make use of and index, but these subplans are not used.
> Subplans  and 4 are seqscan, and they are used.
> How can I get PostgreSQL to use subplan 1 and 3?

You can't, and you would not like the results if you did.

The construct that's being described (perhaps not very intelligibly)
by this EXPLAIN output is an alternative pair of subplans.  Actually
there are two such alternative pairs in this example.  The indexscan
variants are subplans that would be fast if executed only once or
twice.  The seqscan variants, if used, are used to load a hashtable
that is then probed for each row of the outer plan.  If there are a
lot of rows to be considered in the outer plan, then it's better to
pay the price of loading the hashtable, because each hashtable probe
will be a lot cheaper than doing a fresh indexscan with the comparison
value from the current outer row.

In this example, we can see that the outer scan that the subplans
are attached to eliminated 710851 rows by means of the subplan filters,
meaning that the subplans were probed 710851+2 times.  If each of those
probes had been done with a separate indexscan, you'd likely still be
waiting for the result.  Using the seqscan+hashtable was definitely the
right choice here.

BTW, the reason it looks like this rather than just hard-wiring the
seqscan choice is a planner implementation artifact --- at the time
that the subplan plans are created, we don't know how many rows are
expected to pass through the outer plan level.  So we plan it both
ways and leave the choice to be made during executor startup.

What I'd suggest is that you see if you can't get rid of the "EXISTS() OR
EXISTS()" construction in favor of a single EXISTS clause --- I'm too lazy
to work out the details but it looks like you could do the OR in the WHERE
clause of a single EXISTS sub-select.  That would allow the planner to
convert the EXISTS into a semi-join, which might work better than what
you've got.  As is, you're dealing with fairly generic sub-select logic
that isn't going to be terribly well optimized.

            regards, tom lane

Re: Incorrect index being used

From

Jesse Long

Date:

10 October 2013, 07:48:25

On 09/10/2013 15:20, Albe Laurenz wrote:
> Jesse Long wrote:
>> There is no problem with row visibility, there is only one connection to
>> the database - the connection I am using to do these selects.
> No idea why the plans cannot be used.
>
> It might be helpful to see the table and index definitions.
>
>> Thanks you for the advise regarding ANALYZE. I ran ANALYZE on both
>> tables concerned, but not much changed:
> Did you try increasing default_statistics_target before ANALYZE?
>
> Yours,
> Laurenz Albe
>

Hi Laurenz,

After running VACUUM ANALYZE, the query did run the preferred sub plan,
at least once. This was before making any changes.

I did ALTER COLUMN SET STATISTICS 10000 for each column involved after
that (is that what you meant?). But it did not make much difference, but
I have read the manual regarding this setting and think I understand it.

Would it be a bad idea to ALTER COLUMN SET STATISTICS 100000? Would that
have a very negative impact on disk usage or performace?

Thanks,
Jesse

Re: Incorrect index being used

From

Jesse Long

Date:

10 October 2013, 07:51:23

On 09/10/2013 18:06, Tom Lane wrote:
> Jesse Long <jpl@unknown.za.net> writes:
>> The query runs for much longer than I expect it to run for, and I think
>> this is due to it using the incorrect subplan. As you can see, subplans
>> 1 and 3 make use of and index, but these subplans are not used.
>> Subplans  and 4 are seqscan, and they are used.
>> How can I get PostgreSQL to use subplan 1 and 3?
> You can't, and you would not like the results if you did.
>
> The construct that's being described (perhaps not very intelligibly)
> by this EXPLAIN output is an alternative pair of subplans.  Actually
> there are two such alternative pairs in this example.  The indexscan
> variants are subplans that would be fast if executed only once or
> twice.  The seqscan variants, if used, are used to load a hashtable
> that is then probed for each row of the outer plan.  If there are a
> lot of rows to be considered in the outer plan, then it's better to
> pay the price of loading the hashtable, because each hashtable probe
> will be a lot cheaper than doing a fresh indexscan with the comparison
> value from the current outer row.
>
> In this example, we can see that the outer scan that the subplans
> are attached to eliminated 710851 rows by means of the subplan filters,
> meaning that the subplans were probed 710851+2 times.  If each of those
> probes had been done with a separate indexscan, you'd likely still be
> waiting for the result.  Using the seqscan+hashtable was definitely the
> right choice here.
>
> BTW, the reason it looks like this rather than just hard-wiring the
> seqscan choice is a planner implementation artifact --- at the time
> that the subplan plans are created, we don't know how many rows are
> expected to pass through the outer plan level.  So we plan it both
> ways and leave the choice to be made during executor startup.
>
> What I'd suggest is that you see if you can't get rid of the "EXISTS() OR
> EXISTS()" construction in favor of a single EXISTS clause --- I'm too lazy
> to work out the details but it looks like you could do the OR in the WHERE
> clause of a single EXISTS sub-select.  That would allow the planner to
> convert the EXISTS into a semi-join, which might work better than what
> you've got.  As is, you're dealing with fairly generic sub-select logic
> that isn't going to be terribly well optimized.
>

Hi Tom,

I am very grateful for your detailed reply. I have not had much time to
pursue this issue further, but as soon as I have I will investigate and
study what you have written.

Thanks for taking the time to write your thoughts in detail.

Cheers,
Jesse

Re: Incorrect index being used

From

Jesse Long

Date:

10 October 2013, 11:48:11

On 09/10/2013 15:20, Albe Laurenz wrote:
> Jesse Long wrote:
>> There is no problem with row visibility, there is only one connection to
>> the database - the connection I am using to do these selects.
> No idea why the plans cannot be used.
>
> It might be helpful to see the table and index definitions.
>
>> Thanks you for the advise regarding ANALYZE. I ran ANALYZE on both
>> tables concerned, but not much changed:
> Did you try increasing default_statistics_target before ANALYZE?
>
> Yours,
> Laurenz Albe
>

Hi Laurenz,

After running VACUUM ANALYZE, the query did run the preferred sub plan,
at least once. This was before making any changes.

I did ALTER COLUMN SET STATISTICS 10000 for each column involved after
that (is that what you meant?). But it did not make much difference, but
I have read the manual regarding this setting and think I understand it.

Would it be a bad idea to ALTER COLUMN SET STATISTICS 100000? Would that
have a very negative impact on disk usage or performace?

Thanks,
Jesse

Re: Incorrect index being used

From

Jesse Long

Date:

10 October 2013, 11:51:17

On 09/10/2013 18:06, Tom Lane wrote:
> Jesse Long <jpl@unknown.za.net> writes:
>> The query runs for much longer than I expect it to run for, and I think
>> this is due to it using the incorrect subplan. As you can see, subplans
>> 1 and 3 make use of and index, but these subplans are not used.
>> Subplans  and 4 are seqscan, and they are used.
>> How can I get PostgreSQL to use subplan 1 and 3?
> You can't, and you would not like the results if you did.
>
> The construct that's being described (perhaps not very intelligibly)
> by this EXPLAIN output is an alternative pair of subplans.  Actually
> there are two such alternative pairs in this example.  The indexscan
> variants are subplans that would be fast if executed only once or
> twice.  The seqscan variants, if used, are used to load a hashtable
> that is then probed for each row of the outer plan.  If there are a
> lot of rows to be considered in the outer plan, then it's better to
> pay the price of loading the hashtable, because each hashtable probe
> will be a lot cheaper than doing a fresh indexscan with the comparison
> value from the current outer row.
>
> In this example, we can see that the outer scan that the subplans
> are attached to eliminated 710851 rows by means of the subplan filters,
> meaning that the subplans were probed 710851+2 times.  If each of those
> probes had been done with a separate indexscan, you'd likely still be
> waiting for the result.  Using the seqscan+hashtable was definitely the
> right choice here.
>
> BTW, the reason it looks like this rather than just hard-wiring the
> seqscan choice is a planner implementation artifact --- at the time
> that the subplan plans are created, we don't know how many rows are
> expected to pass through the outer plan level.  So we plan it both
> ways and leave the choice to be made during executor startup.
>
> What I'd suggest is that you see if you can't get rid of the "EXISTS() OR
> EXISTS()" construction in favor of a single EXISTS clause --- I'm too lazy
> to work out the details but it looks like you could do the OR in the WHERE
> clause of a single EXISTS sub-select.  That would allow the planner to
> convert the EXISTS into a semi-join, which might work better than what
> you've got.  As is, you're dealing with fairly generic sub-select logic
> that isn't going to be terribly well optimized.
>

Hi Tom,

I am very grateful for your detailed reply. I have not had much time to
pursue this issue further, but as soon as I have I will investigate and
study what you have written.

Thanks for taking the time to write your thoughts in detail.

Cheers,
Jesse

Re: Incorrect index being used

From

Albe Laurenz

Date:

11 October 2013, 09:23:36

Jesse Long wrote:
> I did ALTER COLUMN SET STATISTICS 10000 for each column involved after
> that (is that what you meant?). But it did not make much difference, but
> I have read the manual regarding this setting and think I understand it.
> 
> Would it be a bad idea to ALTER COLUMN SET STATISTICS 100000? Would that
> have a very negative impact on disk usage or performace?

No, in the light of what Tom wrote, I was on the wrong track entirely
and your best bet is probably to rewrite the query.

Yours,
Laurenz Albe

Re: Incorrect index being used

From

Jesse Long

Date:

11 October 2013, 13:32:33

On 09/10/2013 18:06, Tom Lane wrote:
> Jesse Long <jpl@unknown.za.net> writes:
>> The query runs for much longer than I expect it to run for, and I think
>> this is due to it using the incorrect subplan. As you can see, subplans
>> 1 and 3 make use of and index, but these subplans are not used.
>> Subplans  and 4 are seqscan, and they are used.
>> How can I get PostgreSQL to use subplan 1 and 3?
> You can't, and you would not like the results if you did.
>
> The construct that's being described (perhaps not very intelligibly)
> by this EXPLAIN output is an alternative pair of subplans.  Actually
> there are two such alternative pairs in this example.  The indexscan
> variants are subplans that would be fast if executed only once or
> twice.  The seqscan variants, if used, are used to load a hashtable
> that is then probed for each row of the outer plan.  If there are a
> lot of rows to be considered in the outer plan, then it's better to
> pay the price of loading the hashtable, because each hashtable probe
> will be a lot cheaper than doing a fresh indexscan with the comparison
> value from the current outer row.
>
> In this example, we can see that the outer scan that the subplans
> are attached to eliminated 710851 rows by means of the subplan filters,
> meaning that the subplans were probed 710851+2 times.  If each of those
> probes had been done with a separate indexscan, you'd likely still be
> waiting for the result.  Using the seqscan+hashtable was definitely the
> right choice here.
>
> BTW, the reason it looks like this rather than just hard-wiring the
> seqscan choice is a planner implementation artifact --- at the time
> that the subplan plans are created, we don't know how many rows are
> expected to pass through the outer plan level.  So we plan it both
> ways and leave the choice to be made during executor startup.
>
> What I'd suggest is that you see if you can't get rid of the "EXISTS() OR
> EXISTS()" construction in favor of a single EXISTS clause --- I'm too lazy
> to work out the details but it looks like you could do the OR in the WHERE
> clause of a single EXISTS sub-select.  That would allow the planner to
> convert the EXISTS into a semi-join, which might work better than what
> you've got.  As is, you're dealing with fairly generic sub-select logic
> that isn't going to be terribly well optimized.
>

Hi Tom,

I understand what you say about using the index (archive_id, node_id,
value) to do a separate lookup for each row in the archive_document
table that would be filtered. I understand that this would be constly.

However, the seqscan is killing me. I have another index on
archive_document_index which has been there all along - (node_id,
value). Would it not be better for PostgreSQL to use this index to
perform an index scan instead of a seqscan when populating the hash table?

explain select * from archive_document_index where node_id = 29 and
value = 'BSH70002152';
                                                         QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
  Index Scan using archive_document_index_node_id_value_idx on
archive_document_index  (cost=0.57..36.13 rows=14 width=33)
    Index Cond: ((node_id = 29) AND ((value)::text = 'BSH70002152'::text))
(2 rows)

Thanks,
Jesse

Re: Incorrect index being used

From

Jesse Long

Date:

11 October 2013, 17:32:21

On 09/10/2013 18:06, Tom Lane wrote:
> Jesse Long <jpl@unknown.za.net> writes:
>> The query runs for much longer than I expect it to run for, and I think
>> this is due to it using the incorrect subplan. As you can see, subplans
>> 1 and 3 make use of and index, but these subplans are not used.
>> Subplans  and 4 are seqscan, and they are used.
>> How can I get PostgreSQL to use subplan 1 and 3?
> You can't, and you would not like the results if you did.
>
> The construct that's being described (perhaps not very intelligibly)
> by this EXPLAIN output is an alternative pair of subplans.  Actually
> there are two such alternative pairs in this example.  The indexscan
> variants are subplans that would be fast if executed only once or
> twice.  The seqscan variants, if used, are used to load a hashtable
> that is then probed for each row of the outer plan.  If there are a
> lot of rows to be considered in the outer plan, then it's better to
> pay the price of loading the hashtable, because each hashtable probe
> will be a lot cheaper than doing a fresh indexscan with the comparison
> value from the current outer row.
>
> In this example, we can see that the outer scan that the subplans
> are attached to eliminated 710851 rows by means of the subplan filters,
> meaning that the subplans were probed 710851+2 times.  If each of those
> probes had been done with a separate indexscan, you'd likely still be
> waiting for the result.  Using the seqscan+hashtable was definitely the
> right choice here.
>
> BTW, the reason it looks like this rather than just hard-wiring the
> seqscan choice is a planner implementation artifact --- at the time
> that the subplan plans are created, we don't know how many rows are
> expected to pass through the outer plan level.  So we plan it both
> ways and leave the choice to be made during executor startup.
>
> What I'd suggest is that you see if you can't get rid of the "EXISTS() OR
> EXISTS()" construction in favor of a single EXISTS clause --- I'm too lazy
> to work out the details but it looks like you could do the OR in the WHERE
> clause of a single EXISTS sub-select.  That would allow the planner to
> convert the EXISTS into a semi-join, which might work better than what
> you've got.  As is, you're dealing with fairly generic sub-select logic
> that isn't going to be terribly well optimized.
>

Hi Tom,

I understand what you say about using the index (archive_id, node_id,
value) to do a separate lookup for each row in the archive_document
table that would be filtered. I understand that this would be constly.

However, the seqscan is killing me. I have another index on
archive_document_index which has been there all along - (node_id,
value). Would it not be better for PostgreSQL to use this index to
perform an index scan instead of a seqscan when populating the hash table?

explain select * from archive_document_index where node_id = 29 and
value = 'BSH70002152';
                                                         QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
  Index Scan using archive_document_index_node_id_value_idx on
archive_document_index  (cost=0.57..36.13 rows=14 width=33)
    Index Cond: ((node_id = 29) AND ((value)::text = 'BSH70002152'::text))
(2 rows)

Thanks,
Jesse

Re: Incorrect index being used

From

BladeOfLight16

Date:

11 October 2013, 23:46:27

On Fri, Oct 11, 2013 at 9:32 AM, Jesse Long <jpl@unknown.za.net> wrote:

explain select * from archive_document_index where node_id = 29 and value = 'BSH70002152';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Index Scan using archive_document_index_node_id_value_idx on archive_document_index (cost=0.57..36.13 rows=14 width=33)
Index Cond: ((node_id = 29) AND ((value)::text = 'BSH70002152'::text))
(2 rows)

I believe that this is what Tom is suggesting:

SELECT *
FROM ARCHIVE_DOCUMENT AS ad
WHERE ad.NODE_ID = 29
AND ad.ARCHIVE_DATE >= '2013-07-08 18:28:00'
AND EXISTS (SELECT *
              FROM ARCHIVE_DOCUMENT_INDEX AS adi
              WHERE adi.ARCHIVE_ID = ad.ID
                AND adi.NODE_ID = ad.NODE_ID
                AND (adi.VALUE = 'BSH70002152' OR adi.VALUE = 'TC212592')
             )
ORDER BY ad.ARCHIVE_DATE DESC
LIMIT 10;

Forgive my changing of the aliases. I found names like r0 difficult to interpret easily.

This is a much simpler query since it only requires one subselect, and I believe Tom is suggesting that this query may be able to make use of the index or at least find some more efficient plan. How does this perform?

Re: Incorrect index being used

From

Jesse Long

Date:

14 October 2013, 07:51:19

On 12/10/2013 01:45, BladeOfLight16 wrote:
> On Fri, Oct 11, 2013 at 9:32 AM, Jesse Long <jpl@unknown.za.net
> <mailto:jpl@unknown.za.net>> wrote:
>
>     explain select * from archive_document_index where node_id = 29
>     and value = 'BSH70002152';
>     QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------
>      Index Scan using archive_document_index_node_id_value_idx on
>     archive_document_index  (cost=0.57..36.13 rows=14 width=33)
>        Index Cond: ((node_id = 29) AND ((value)::text =
>     'BSH70002152'::text))
>     (2 rows)
>
>
> I believe that this is what Tom is suggesting:
>
> SELECT *
> FROM ARCHIVE_DOCUMENT AS ad
> WHERE ad.NODE_ID = 29
>   AND ad.ARCHIVE_DATE >= '2013-07-08 18:28:00'
>   AND EXISTS (SELECT *
>               FROM ARCHIVE_DOCUMENT_INDEX AS adi
>               WHERE adi.ARCHIVE_ID = ad.ID
>                 AND adi.NODE_ID = ad.NODE_ID
>                 AND (adi.VALUE = 'BSH70002152' OR adi.VALUE = 'TC212592')
>              )
> ORDER BY ad.ARCHIVE_DATE DESC
> LIMIT 10;
>
> Forgive my changing of the aliases. I found names like r0 difficult to
> interpret easily.
>
> This is a much simpler query since it only requires one subselect, and
> I believe Tom is suggesting that this query may be able to make use of
> the index or at least find some more efficient plan. How does this
> perform?

Indeed it does perform very much better. With the modified query the
plan and the execution time are excellent.

However, my previous question remains - in the original query plan,
there are two hash tables being populated using seqscan + filter. During
each seqscan, over 95 million records were wastefully read. I think that
this could have been dramatically improved by using an index scan. Am I
mistaken? Is it a optimisation that could be implemented but has not
been implemented yet? Is it a bug?

Thanks,
Jesse

Re: Incorrect index being used

From

Jesse Long

Date:

14 October 2013, 11:51:04

On 12/10/2013 01:45, BladeOfLight16 wrote:

On Fri, Oct 11, 2013 at 9:32 AM, Jesse Long <jpl@unknown.za.net> wrote:
explain select * from archive_document_index where node_id = 29 and value = 'BSH70002152';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Index Scan using archive_document_index_node_id_value_idx on archive_document_index (cost=0.57..36.13 rows=14 width=33)
Index Cond: ((node_id = 29) AND ((value)::text = 'BSH70002152'::text))
(2 rows)

I believe that this is what Tom is suggesting:

SELECT *
FROM ARCHIVE_DOCUMENT AS ad
WHERE ad.NODE_ID = 29
AND ad.ARCHIVE_DATE >= '2013-07-08 18:28:00'
AND EXISTS (SELECT *
              FROM ARCHIVE_DOCUMENT_INDEX AS adi
              WHERE adi.ARCHIVE_ID = ad.ID
                AND adi.NODE_ID = ad.NODE_ID
                AND (adi.VALUE = 'BSH70002152' OR adi.VALUE = 'TC212592')
             )
ORDER BY ad.ARCHIVE_DATE DESC
LIMIT 10;

Forgive my changing of the aliases. I found names like r0 difficult to interpret easily.

This is a much simpler query since it only requires one subselect, and I believe Tom is suggesting that this query may be able to make use of the index or at least find some more efficient plan. How does this perform?

Indeed it does perform very much better. With the modified query the plan and the execution time are excellent.

However, my previous question remains - in the original query plan, there are two hash tables being populated using seqscan + filter. During each seqscan, over 95 million records were wastefully read. I think that this could have been dramatically improved by using an index scan. Am I mistaken? Is it a optimisation that could be implemented but has not been implemented yet? Is it a bug?

Thanks,
Jesse