Thread: Need help with one query

Need help with one query

From
Anne Rosset
Date:
Hi,
We have the following 2  tables:
\d audit_change
             Table "public.audit_change"
     Column     |          Type          | Modifiers
----------------+------------------------+-----------
 id             | character varying(32)  | not null
audit_entry_id | character varying(32)  |
...
Indexes:
    "audit_change_pk" primary key, btree (id)
    "audit_change_entry" btree (audit_entry_id)

and
\d audit_entry;
              Table "public.audit_entry"
     Column     |           Type           | Modifiers
----------------+--------------------------+-----------
 id             | character varying(32)    | not null
 object_id      | character varying(32)    | not null
...
Indexes:
    "audit_entry_pk" primary key, btree (id)
      "audit_entry_object" btree (object_id)


We do the following query:
EXPLAIN ANALYZE
SELECT audit_change.id AS id,
 audit_change.audit_entry_id AS auditEntryId,
 audit_entry.object_id AS objectId,
 audit_change.property_name AS propertyName,
 audit_change.property_type AS propertyType,
 audit_change.old_value AS
 oldValue, audit_change.new_value AS newValue,
 audit_change.flexfield AS flexField
FROM audit_entry audit_entry, audit_change audit_change
WHERE  audit_change.audit_entry_id=audit_entry.id
AND audit_entry.object_id='artf414029';

                                                                 QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=8.79..253664.55 rows=4 width=136) (actual
time=4612.674..6683.158 rows=4 loops=1)
   Hash Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
   ->  Seq Scan on audit_change  (cost=0.00..225212.52 rows=7584852
width=123) (actual time=0.009..2838.216 rows=7584852 loops=1)
   ->  Hash  (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049
rows=4 loops=1)
         ->  Index Scan using audit_entry_object on audit_entry
(cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1)
               Index Cond: ((object_id)::text = 'artf414029'::text)
 Total runtime: 6683.220 ms
(7 rows)


Why does the query not use the index on audit_entry_id and do a seq scan
(as you see the table has many rows)?



If we split the query into 2 queries, it only takes less than 0.3 ms
EXPLAIN ANALYZE select * from audit_entry WHERE
audit_entry.object_id='artf414029';
                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
 Index Scan using audit_entry_object on audit_entry  (cost=0.00..8.75
rows=3 width=111) (actual time=0.037..0.044 rows=4 loops=1)
   Index Cond: ((object_id)::text = 'artf414029'::text)
 Total runtime: 0.073 ms
(3 rows)

EXPLAIN ANALYZE select * from audit_change WHERE audit_entry_id in
('adte1DDFEA5B011C8988C3928752', 'adte5DDFEA5B011D441230BD20CC',
'adte5DDFEA5B011E40601E8DA10F', 'adte5DDFEA5B011E8CC26071627C') ORDER BY
property_name ASC;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=30.25..30.27 rows=10 width=123) (actual time=0.190..0.192
rows=4 loops=1)
   Sort Key: property_name
   ->  Bitmap Heap Scan on audit_change  (cost=9.99..30.08 rows=10
width=123) (actual time=0.173..0.177 rows=4 loops=1)
         Recheck Cond: ((audit_entry_id)::text = ANY

(('{adte1DDFEA5B011C8988C3928752,adte5DDFEA5B011D441230BD20CC,adte5DDFEA5B011E40601E8DA10F,adte5DDFEA5B011E8CC26071627C}'::character

varying[])::text[]))
         ->  Bitmap Index Scan on audit_change_entry  (cost=0.00..9.99
rows=10 width=0) (actual time=0.167..0.167 rows=4 loops=1)
               Index Cond: ((audit_entry_id)::text = ANY

(('{adte1DDFEA5B011C8988C3928752,adte5DDFEA5B011D441230BD20CC,adte5DDFEA5B011E40601E8DA10F,adte5DDFEA5B011E8CC26071627C}'::character

varying[])::text[]))
 Total runtime: 0.219 ms
(7 rows)

Thanks for your help,
Anne

Re: Need help with one query

From
Richard Huxton
Date:
Anne Rosset wrote:
> EXPLAIN ANALYZE
> SELECT
>   audit_change.id             AS id,
>   audit_change.audit_entry_id AS auditEntryId,
>   audit_entry.object_id       AS objectId,
>   audit_change.property_name  AS propertyName,
>   audit_change.property_type  AS propertyType,
>   audit_change.old_value      AS oldValue,
>   audit_change.new_value      AS newValue,
>   audit_change.flexfield      AS flexField
> FROM
>   audit_entry audit_entry, audit_change audit_change
> WHERE
>   audit_change.audit_entry_id = audit_entry.id
>   AND audit_entry.object_id = 'artf414029';
[query reformatted to make it more readable]

Not quite clear why you are aliasing the tables to their own names...

>
---------------------------------------------------------------------------------------------------------------------------------------------
>
> Hash Join  (cost=8.79..253664.55 rows=4 width=136) (actual
> time=4612.674..6683.158 rows=4 loops=1)
>   Hash Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
>   ->  Seq Scan on audit_change  (cost=0.00..225212.52 rows=7584852
> width=123) (actual time=0.009..2838.216 rows=7584852 loops=1)
>   ->  Hash  (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049
> rows=4 loops=1)
>         ->  Index Scan using audit_entry_object on audit_entry
> (cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1)
>               Index Cond: ((object_id)::text = 'artf414029'::text)
> Total runtime: 6683.220 ms

Very odd. It knows the table is large and that the seq-scan is going to
be expensive.

Try issuing "set enable_seqscan = off" and run the explain analyse
again. That should show the cost of using the indexes.

--
  Richard Huxton
  Archonet Ltd

Re: Need help with one query

From
Tom Lane
Date:
Richard Huxton <dev@archonet.com> writes:
>> Hash Join  (cost=8.79..253664.55 rows=4 width=136) (actual
>> time=4612.674..6683.158 rows=4 loops=1)
>> Hash Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
>> ->  Seq Scan on audit_change  (cost=0.00..225212.52 rows=7584852
>> width=123) (actual time=0.009..2838.216 rows=7584852 loops=1)
>> ->  Hash  (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049
>> rows=4 loops=1)
>> ->  Index Scan using audit_entry_object on audit_entry
>> (cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1)
>> Index Cond: ((object_id)::text = 'artf414029'::text)
>> Total runtime: 6683.220 ms

> Very odd. It knows the table is large and that the seq-scan is going to
> be expensive.

Yeah, *very* odd.  A nestloop with inner indexscan should have an
estimated cost far lower than this plan.  What Postgres version is
this exactly?  Do you have any nondefault planner parameter settings?

            regards, tom lane

Re: Need help with one query

From
Anne Rosset
Date:
Richard Huxton wrote:

>Anne Rosset wrote:
>
>
>>EXPLAIN ANALYZE
>>SELECT
>>  audit_change.id             AS id,
>>  audit_change.audit_entry_id AS auditEntryId,
>>  audit_entry.object_id       AS objectId,
>>  audit_change.property_name  AS propertyName,
>>  audit_change.property_type  AS propertyType,
>>  audit_change.old_value      AS oldValue,
>>  audit_change.new_value      AS newValue,
>>  audit_change.flexfield      AS flexField
>>FROM
>>  audit_entry audit_entry, audit_change audit_change
>>WHERE
>>  audit_change.audit_entry_id = audit_entry.id
>>  AND audit_entry.object_id = 'artf414029';
>>
>>
>[query reformatted to make it more readable]
>
>Not quite clear why you are aliasing the tables to their own names...
>
>
>

>>---------------------------------------------------------------------------------------------------------------------------------------------
>>
>>Hash Join  (cost=8.79..253664.55 rows=4 width=136) (actual
>>time=4612.674..6683.158 rows=4 loops=1)
>>  Hash Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
>>  ->  Seq Scan on audit_change  (cost=0.00..225212.52 rows=7584852
>>width=123) (actual time=0.009..2838.216 rows=7584852 loops=1)
>>  ->  Hash  (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049
>>rows=4 loops=1)
>>        ->  Index Scan using audit_entry_object on audit_entry
>>(cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1)
>>              Index Cond: ((object_id)::text = 'artf414029'::text)
>>Total runtime: 6683.220 ms
>>
>>
>
>Very odd. It knows the table is large and that the seq-scan is going to
>be expensive.
>
>Try issuing "set enable_seqscan = off" and run the explain analyse
>again. That should show the cost of using the indexes.
>
>
>

With "set enable_seqscan = off":

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=11.35..12497.53 rows=59 width=859) (actual
time=46.074..49.742 rows=7 loops=1)
-> Index Scan using audit_entry_pk on audit_entry (cost=0.00..7455.95
rows=55 width=164) (actual time=45.940..49.541 rows=2 loops=1)
Filter: ((object_id)::text = 'artf1024'::text)
-> Bitmap Heap Scan on audit_change (cost=11.35..90.93 rows=59
width=777) (actual time=0.086..0.088 rows=4 loops=2)
Recheck Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
-> Bitmap Index Scan on audit_change_entry (cost=0.00..11.33 rows=59
width=0) (actual time=0.076..0.076 rows=4 loops=2)
Index Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
Total runtime: 49.801 ms


The db version is 8.2.4

We are wondering if it is because of our audit_entry_id's format (like
'adte1DDFEA5B011C8988C3928752').  Any inputs?
Thanks,
Anne

Re: Need help with one query

From
Robert Haas
Date:
On Fri, Mar 20, 2009 at 1:16 PM, Anne Rosset <arosset@collab.net> wrote:
> Richard Huxton wrote:
>> Anne Rosset wrote:
>>> EXPLAIN ANALYZE
>>> SELECT
>>>  audit_change.id             AS id,
>>>  audit_change.audit_entry_id AS auditEntryId,
>>>  audit_entry.object_id       AS objectId,
>>>  audit_change.property_name  AS propertyName,
>>>  audit_change.property_type  AS propertyType,
>>>  audit_change.old_value      AS oldValue,
>>>  audit_change.new_value      AS newValue,
>>>  audit_change.flexfield      AS flexField
>>> FROM
>>>  audit_entry audit_entry, audit_change audit_change
>>> WHERE
>>>  audit_change.audit_entry_id = audit_entry.id
>>>  AND audit_entry.object_id = 'artf414029';
>>>
>>
>> [query reformatted to make it more readable]
>>
>> Not quite clear why you are aliasing the tables to their own names...
>>
>>
>>>
>>>
>>>
---------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>> Hash Join  (cost=8.79..253664.55 rows=4 width=136) (actual
>>> time=4612.674..6683.158 rows=4 loops=1)
>>>  Hash Cond: ((audit_change.audit_entry_id)::text =
>>> (audit_entry.id)::text)
>>>  ->  Seq Scan on audit_change  (cost=0.00..225212.52 rows=7584852
>>> width=123) (actual time=0.009..2838.216 rows=7584852 loops=1)
>>>  ->  Hash  (cost=8.75..8.75 rows=3 width=45) (actual time=0.049..0.049
>>> rows=4 loops=1)
>>>       ->  Index Scan using audit_entry_object on audit_entry
>>> (cost=0.00..8.75 rows=3 width=45) (actual time=0.033..0.042 rows=4 loops=1)
>>>             Index Cond: ((object_id)::text = 'artf414029'::text)
>>> Total runtime: 6683.220 ms
>>>
>>
>> Very odd. It knows the table is large and that the seq-scan is going to
>> be expensive.
>>
>> Try issuing "set enable_seqscan = off" and run the explain analyse
>> again. That should show the cost of using the indexes.
>>
>>
>
> With "set enable_seqscan = off":
>
> QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=11.35..12497.53 rows=59 width=859) (actual
> time=46.074..49.742 rows=7 loops=1)
> -> Index Scan using audit_entry_pk on audit_entry (cost=0.00..7455.95
> rows=55 width=164) (actual time=45.940..49.541 rows=2 loops=1)
> Filter: ((object_id)::text = 'artf1024'::text)
> -> Bitmap Heap Scan on audit_change (cost=11.35..90.93 rows=59 width=777)
> (actual time=0.086..0.088 rows=4 loops=2)
> Recheck Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
> -> Bitmap Index Scan on audit_change_entry (cost=0.00..11.33 rows=59
> width=0) (actual time=0.076..0.076 rows=4 loops=2)
> Index Cond: ((audit_change.audit_entry_id)::text = (audit_entry.id)::text)
> Total runtime: 49.801 ms
>
>
> The db version is 8.2.4
>
> We are wondering if it is because of our audit_entry_id's format (like
> 'adte1DDFEA5B011C8988C3928752').  Any inputs?
> Thanks,
> Anne

Something is wrong here.  How can setting enable_seqscan to off result
in a plan with a far lower estimated cost than the original plan?  If
the planner thought the non-seq-scan plan is cheaper, it would have
picked that one to begin with.

...Robert

Re: Need help with one query

From
Alvaro Herrera
Date:
Robert Haas escribió:

> Something is wrong here.  How can setting enable_seqscan to off result
> in a plan with a far lower estimated cost than the original plan?  If
> the planner thought the non-seq-scan plan is cheaper, it would have
> picked that one to begin with.

GEQO?  Anne, what's geqo_threshold set to?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Need help with one query

From
Anne Rosset
Date:
Alvaro Herrera wrote:

>Robert Haas escribió:
>
>
>
>>Something is wrong here.  How can setting enable_seqscan to off result
>>in a plan with a far lower estimated cost than the original plan?  If
>>the planner thought the non-seq-scan plan is cheaper, it would have
>>picked that one to begin with.
>>
>>
>
>GEQO?  Anne, what's geqo_threshold set to?
>
>
>
Hi Alvaro:
It is turned off
geqo | off | Enables genetic query optimization.
Thanks,
Anne

Re: Need help with one query

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Mar 20, 2009 at 1:16 PM, Anne Rosset <arosset@collab.net> wrote:
>> The db version is 8.2.4

> Something is wrong here.  How can setting enable_seqscan to off result
> in a plan with a far lower estimated cost than the original plan?

Planner bug no doubt ... given how old the PG release is, I'm not
particularly interested in probing into it now.  If Anne can still
reproduce this behavior on 8.2.something-recent, we should look closer.

            regards, tom lane

Re: Need help with one query

From
Robert Haas
Date:
On Fri, Mar 20, 2009 at 4:29 PM, Anne Rosset <arosset@collab.net> wrote:
> Alvaro Herrera wrote:
>> Robert Haas escribió:
>>> Something is wrong here.  How can setting enable_seqscan to off result
>>> in a plan with a far lower estimated cost than the original plan?  If
>>> the planner thought the non-seq-scan plan is cheaper, it would have
>>> picked that one to begin with.
>>>
>>
>> GEQO?  Anne, what's geqo_threshold set to?
> Hi Alvaro:
> It is turned off
> geqo | off | Enables genetic query optimization.
> Thanks,
> Anne

Can you please send ALL your non-commented postgresql.conf lines?

...Robert

Re: Need help with one query

From
Anne Rosset
Date:
Tom Lane wrote:

>Robert Haas <robertmhaas@gmail.com> writes:
>
>
>>On Fri, Mar 20, 2009 at 1:16 PM, Anne Rosset <arosset@collab.net> wrote:
>>
>>
>>>The db version is 8.2.4
>>>
>>>
>
>
>
>>Something is wrong here.  How can setting enable_seqscan to off result
>>in a plan with a far lower estimated cost than the original plan?
>>
>>
>
>Planner bug no doubt ... given how old the PG release is, I'm not
>particularly interested in probing into it now.  If Anne can still
>reproduce this behavior on 8.2.something-recent, we should look closer.
>
>            regards, tom lane
>
>
Thanks Tom, Richard.
Here are our postgres conf :


shared_buffers = 2000MB
sort_mem = 150000
vacuum_mem = 100000
work_mem = 20MB                # min 64kB
maintenance_work_mem = 256MB        # min 1MB
max_fsm_pages = 204800
full_page_writes = off            # recover from partial page writes
wal_buffers = 1MB            # min 32kB
                    # (change requires restart)
commit_delay = 20000            # range 0-100000, in microseconds
commit_siblings = 3            # range 1-1000
checkpoint_segments = 128
checkpoint_warning = 240s
enable_indexscan = on
enable_mergejoin = on
enable_nestloop = off
random_page_cost = 2.0
effective_cache_size = 2500MB
geqo = off
default_statistics_target = 750
stats_command_string = on
update_process_title = on

stats_start_collector = on
stats_row_level = on

autovacuum = on
autovacuum_vacuum_threshold = 500    # min # of tuple updates before
                    # vacuum
autovacuum_analyze_threshold = 250    # min # of tuple updates before
                    # analyze
autovacuum_vacuum_scale_factor = 0.2    # fraction of rel size before
                    # vacuum
autovacuum_analyze_scale_factor = 0.1



Anne

Re: Need help with one query

From
Robert Haas
Date:
On Mon, Mar 23, 2009 at 1:08 PM, Anne Rosset <arosset@collab.net> wrote:
> enable_nestloop = off

That may be the source of your problem.  Generally setting enable_* to
off is a debugging tool, not something you ever want to do in
production.

...Robert

Re: Need help with one query

From
Anne Rosset
Date:
Robert Haas wrote:

>On Mon, Mar 23, 2009 at 1:08 PM, Anne Rosset <arosset@collab.net> wrote:
>
>
>>enable_nestloop = off
>>
>>
>
>That may be the source of your problem.  Generally setting enable_* to
>off is a debugging tool, not something you ever want to do in
>production.
>
>...Robert
>
>
Thanks Robert. It seems to have solved the problem. Thanks a lot,

Anne