Re: Incorrect index being used - Mailing list pgsql-general

From Jesse Long
Subject Re: Incorrect index being used
Date
Msg-id 525537AD.50106@iso-8859-1.za.net
Whole thread Raw
In response to Re: Incorrect index being used  (Jesse Long <jpl@iso-8859-1.za.net>)
List pgsql-general
On 09/10/2013 12:57, Jesse Long wrote:
> On 09/10/2013 12:10, Albe Laurenz wrote:
>> Jesse Long wrote:
>>> I have the following query, run immediately after executing VACUUM in
>>> the database. There is only one connection to the database.
>> You should run ANALYZE, not VACUUM.
>>
>>> The query runs for much longer than I expect it to run for, and I think
>>> this is due to it using the incorrect subplan. As you can see, subplans
>>> 1 and 3 make use of and index, but these subplans are not used.
>>> Subplans  and 4 are seqscan, and they are used.
>>>
>>> How can I get PostgreSQL to use subplan 1 and 3?
>> They are only possible if an "Index Only Scan" is possible, which
>> can only be used if the respective table entries are visible for
>> all transactions.
>>
>>> testdb=> explain analyse SELECT * FROM ARCHIVE_DOCUMENT AS r0 WHERE
>>> r0.NODE_ID = 29 AND
>>> r0.ARCHIVE_DATE >= '2013-07-08 18:28:00' AND (EXISTS (SELECT r1.*
>>> FROM ARCHIVE_DOCUMENT_INDEX AS r1
>>> WHERE r1.ARCHIVE_ID = r0.ID AND r1.NODE_ID = r0.NODE_ID AND r1.VALUE
>>> = 'BSH70002152' ) OR EXISTS (
>>> SELECT r2.* FROM ARCHIVE_DOCUMENT_INDEX AS r2 WHERE r2.ARCHIVE_ID =
>>> r0.ID AND r2.NODE_ID = r0.NODE_ID
>>> AND r2.VALUE = 'TC212592' ) ) ORDER BY r0.ARCHIVE_DATE DESC LIMIT 10;
>> [...]
>>
>>>            Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR
>>> (alternatives: SubPlan 3 or hashed
>>> SubPlan 4))
>>>
>>>            Rows Removed by Filter: 710851
>>>
>>>            SubPlan 1
>>>
>>>              ->  Index Only Scan using
>>> archive_document_index_x_archive_id_node_id_value on
>>> archive_document_index r1  (cost=0.57..4.59 rows=1 width=0) (never
>>> executed)
>>>
>>>                    Index Cond: ((archive_id = r0.id) AND (node_id =
>>> r0.node_id) AND (value =
>>> 'BSH70002152'::text))
>>>
>>>                    Heap Fetches: 0
>>>
>>>            SubPlan 2
>>>
>>>              ->  Seq Scan on archive_document_index r1_1
>>> (cost=0.00..1958104.00 rows=1520 width=16)
>>> (actual time=44418.383..44558.293 rows=4 loops=1)
>>>
>>>                    Filter: ((value)::text = 'BSH70002152'::text)
>>>
>>>                    Rows Removed by Filter: 95009919
>>>
>>>            SubPlan 3
>>>
>>>              ->  Index Only Scan using
>>> archive_document_index_x_archive_id_node_id_value on
>>> archive_document_index r2  (cost=0.57..4.59 rows=1 width=0) (never
>>> executed)
>>>
>>>                    Index Cond: ((archive_id = r0.id) AND (node_id =
>>> r0.node_id) AND (value =
>>> 'TC212592'::text))
>>>
>>>                    Heap Fetches: 0
>>>
>>>            SubPlan 4
>>>
>>>              ->  Seq Scan on archive_document_index r2_1
>>> (cost=0.00..1958104.00 rows=1520 width=16)
>>> (actual time=41659.464..41663.342 rows=1 loops=1)
>>>
>>>                    Filter: ((value)::text = 'TC212592'::text)
>>>
>>>                    Rows Removed by Filter: 95009922
>> The estimates are quite off.
>> Does "ANALYZE archive_document", possibly after increasing
>> default_statistics_target, make a difference?
>>
>> Yours,
>> Laurenz Albe
>>
>
> Hi Laurenz,
>
> Thank you for the feedback.
>
> There is no problem with row visibility, there is only one connection
> to the database - the connection I am using to do these selects.
>
> Thanks you for the advise regarding ANALYZE. I ran ANALYZE on both
> tables concerned, but not much changed:
>
> QUERY PLAN
>
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>
>
>  Limit  (cost=0.56..151.85 rows=10 width=122) (actual
> time=40841.984..85668.213 rows=2 loops=1)
>
>    ->  Index Scan Backward using
> idx_archive_document_x_node_id_archive_date on archive_document r0
> (cost=0.56..7627640.20 rows=504186 width=122) (actual time=40841.98
>
>          Index Cond: ((node_id = 29) AND (archive_date >= '2013-07-08
> 18:28:00'::timestamp without time zone))
>
>          Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR
> (alternatives: SubPlan 3 or hashed SubPlan 4))
>
>          Rows Removed by Filter: 710851
>
>          SubPlan 1
>
>            ->  Index Only Scan using
> archive_document_index_x_archive_id_node_id_value on
> archive_document_index r1  (cost=0.57..4.59 rows=1 width=0) (never
> executed)
>
>                  Index Cond: ((archive_id = r0.id) AND (node_id =
> r0.node_id) AND (value = 'BSH70002152'::text))
>
>                  Heap Fetches: 0
>
>          SubPlan 2
>
>            ->  Seq Scan on archive_document_index r1_1
> (cost=0.00..1958101.80 rows=1568 width=16) (actual
> time=36633.365..40841.909 rows=4 loops=1)
>
>                  Filter: ((value)::text = 'BSH70002152'::text)
>
>                  Rows Removed by Filter: 95009919
>
>          SubPlan 3
>
>            ->  Index Only Scan using
> archive_document_index_x_archive_id_node_id_value on
> archive_document_index r2  (cost=0.57..4.59 rows=1 width=0) (never
> executed)
>
>                  Index Cond: ((archive_id = r0.id) AND (node_id =
> r0.node_id) AND (value = 'TC212592'::text))
>
>                  Heap Fetches: 0
>
>          SubPlan 4
>
>            ->  Seq Scan on archive_document_index r2_1
> (cost=0.00..1958101.80 rows=1568 width=16) (actual
> time=40241.599..44462.485 rows=1 loops=1)
>
>                  Filter: ((value)::text = 'TC212592'::text)
>
>                  Rows Removed by Filter: 95009922
>
>  Total runtime: 85676.734 ms
>
> (22 rows)

Sorry, I neglected to mention that this is on PostgreSQL 9.3.0.

Thanks,
Jesse


pgsql-general by date:

Previous
From: Jesse Long
Date:
Subject: Re: Incorrect index being used
Next
From: Bob Futrelle
Date:
Subject: I need more specific instructions for switching to digest mode for this list