Re: PostgreSQL 12.3 slow index scan chosen - Mailing list pgsql-performance

From Kenneth Marshall
Subject Re: PostgreSQL 12.3 slow index scan chosen
Date 2020-06-20
Msg-id 20200620195543.GR1497@aart.rice.edu
In response to Re: PostgreSQL 12.3 slow index scan chosen  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Sat, Jun 20, 2020 at 02:22:03PM -0400, Tom Lane wrote:
> I wrote:
> > ... oh, now I see: apparently, your filter condition is such that *no*
> > rows of the objectcustomfieldvalues table get past the filter:
> >
> >               ->  Index Scan using objectcustomfieldvalues3 on objectcustomfieldvalues objectcustomfieldvalues_1  (cost=0.56..807603.40 rows=915 width=4) (actual time=21165.441..21165.441 rows=0 loops=1)
> >                      Filter: ((disabled = 0) AND ((largecontent ~~* '%958575%'::text) OR ((content)::text ~~* '%958575%'::text)))
> >                      Rows Removed by Filter: 19030904
> 
> > That's kind of annoying :-(.  I wonder if there's a way to be smarter?
> > This case would work a lot better if the filter conditions were not
> > applied till after the merge; but of course that wouldn't be an
> > improvement in general.  Or maybe we should penalize the mergejoin
> > cost estimate if there's a highly selective filter in the way.
> 
> I experimented with this some more, with the intention of creating a
> planner patch that would do the latter, and was surprised to find that
> there already is such a penalty.  It's sort of indirect and undocumented,
> but nonetheless the estimate for whether a mergejoin can stop early
> does get heavily de-rated if the planner realizes that the table is
> being heavily filtered.  So now I'm thinking that your problem is that
> "rows=915" is not a good enough estimate of what will happen in this
> indexscan.  It looks good in comparison to the table size of 19M rows,
> but on a percentage basis compared to the true value of 0 rows, it's
> still pretty bad.  You said you'd increased the stats target for
> objectcustomfieldvalues.objectid, but maybe the real problem is needing
> to increase the targets for content and largecontent, in hopes of driving
> down the estimate for how many rows will pass this filter condition.
> 
>             regards, tom lane

Hi Tom,

I increased the statistics target on the content field because it has by far
the most values (19,000,000 versus 1,000 for largecontent). When I reached a
target of 8000, the plan changed.
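
For anyone following along, the change amounted to roughly this (a sketch;
8000 was simply the first target value that tipped the plan, and the ANALYZE
rebuilds the statistics afterwards):

# ALTER TABLE objectcustomfieldvalues ALTER COLUMN content SET STATISTICS 8000;
# ANALYZE objectcustomfieldvalues;

With the refreshed statistics, the new plan is: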

# explain (analyze,buffers) SELECT DISTINCT main.* FROM Articles main
    JOIN ObjectCustomFieldValues ObjectCustomFieldValues_1
      ON ( ObjectCustomFieldValues_1.Disabled = '0' )
     AND ( ObjectCustomFieldValues_1.ObjectId = main.id )
    WHERE (ObjectCustomFieldValues_1.LargeContent ILIKE '%958575%'
       OR ObjectCustomFieldValues_1.Content ILIKE '%958575%')
      AND (main.Disabled = '0')
    ORDER BY main.SortOrder ASC, main.Name ASC;
 
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Unique  (cost=1151.07..1151.10 rows=1 width=137) (actual time=1.782..1.782 rows=0 loops=1)
   Buffers: shared hit=158
   ->  Sort  (cost=1151.07..1151.08 rows=1 width=137) (actual time=1.781..1.781 rows=0 loops=1)
         Sort Key: main.sortorder, main.name, main.id, main.summary, main.class, main.parent, main.uri, main.creator, main.created, main.lastupdatedby, main.lastupdated
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=158
         ->  Hash Join  (cost=185.12..1151.06 rows=1 width=137) (actual time=1.777..1.777 rows=0 loops=1)
               Hash Cond: (objectcustomfieldvalues_1.objectid = main.id)
               Buffers: shared hit=158
               ->  Bitmap Heap Scan on objectcustomfieldvalues objectcustomfieldvalues_1  (cost=177.36..1141.46 rows=699 width=4) (actual time=1.704..1.704 rows=0 loops=1)
                     Recheck Cond: ((largecontent ~~* '%958575%'::text) OR ((content)::text ~~* '%958575%'::text))
                     Filter: (disabled = 0)
                     Buffers: shared hit=154
                     ->  BitmapOr  (cost=177.36..177.36 rows=868 width=0) (actual time=1.703..1.703 rows=0 loops=1)
                           Buffers: shared hit=154
                           ->  Bitmap Index Scan on objectcustomfieldvalues_largecontent_trgm  (cost=0.00..30.80 rows=1 width=0) (actual time=0.282..0.282 rows=0 loops=1)
                                 Index Cond: (largecontent ~~* '%958575%'::text)
                                 Buffers: shared hit=28
                           ->  Bitmap Index Scan on objectcustomfieldvalues_content_trgm  (cost=0.00..146.21 rows=868 width=0) (actual time=1.421..1.421 rows=0 loops=1)
                                 Index Cond: ((content)::text ~~* '%958575%'::text)
                                 Buffers: shared hit=126
               ->  Hash  (cost=5.91..5.91 rows=148 width=137) (actual time=0.071..0.071 rows=148 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 34kB
                     Buffers: shared hit=4
                     ->  Seq Scan on articles main  (cost=0.00..5.91 rows=148 width=137) (actual time=0.007..0.044 rows=148 loops=1)
                           Filter: (disabled = '0'::smallint)
                           Rows Removed by Filter: 5
                           Buffers: shared hit=4
 Planning Time: 15.568 ms
 Execution Time: 1.818 ms
(30 rows)

Time: 17.679 ms
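
As an aside, for anyone who wants to verify what the planner is working with
after a change like this, something along these lines shows the per-column
targets and the gathered stats (just a sketch, limited to the two columns from
this thread):

# SELECT attname, attstattarget FROM pg_attribute
  WHERE attrelid = 'objectcustomfieldvalues'::regclass
    AND attname IN ('content', 'largecontent');
# SELECT attname, n_distinct, null_frac FROM pg_stats
  WHERE tablename = 'objectcustomfieldvalues'
    AND attname IN ('content', 'largecontent');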

It is too bad that the statistics target has to be set that high before the
right plan is chosen. I wonder if this class of problem would be handled by
some of the ideas discussed in other threads about pessimizing plans that
have extremely large downsides. Thank you again for looking into this; I have
learned a lot.

Regards,
Ken


