Re: Index no longer being used, destroying and recreating it restores use. - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Index no longer being used, destroying and recreating it restores use.
Date
Msg-id 92ea007a-abe0-1dd2-7e5c-080da70c2e2c@aklaver.com
In response to Re: Index no longer being used, destroying and recreating it restores use.  (Koen De Groote <kdg.dev@gmail.com>)
Responses Re: Index no longer being used, destroying and recreating it restores use.  (Koen De Groote <kdg.dev@gmail.com>)
List pgsql-general
On 6/5/20 7:05 AM, Koen De Groote wrote:
> I've collected all relevant info (I think so, at least) and put it here:
> 
> The table in question is used to keep filepath data of files on a hard
> drive.
> The query in question is used to retrieve items which should be backed
> up, but have not yet been.
> 
> The relevant columns of the table:
> 
>                                Table "public.item"
>        Column       |            Type             | Collation | Nullable |             Default
> --------------------+-----------------------------+-----------+----------+----------------------------------
>  id                 | bigint                      |           | not null | nextval('item_id_seq'::regclass)
>  shouldbebackedup   | boolean                     |           | not null | true
>  backupperformed    | boolean                     |           | not null | false
>  itemCreated        | timestamp without time zone |           |          | now()
>  filepath           | text                        |           |          |
> 
> 
> The existing index, which no longer gets used:
> "index_in_question" btree (shouldbebackedup, backupperformed, itemCreated, filepath)
>     WHERE shouldbebackedup = true AND backupperformed = false
> 
> The new index, made out of the exact same columns and conditions, gets
> used immediately after creation:
> CREATE INDEX CONCURRENTLY index_test ON item USING btree
>     (shouldbebackedup, backupperformed, itemCreated, filepath)
>     WHERE shouldbebackedup = true AND backupperformed = false;
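If the old index really had bloated (see Tom's note further down), comparing the physical sizes of the two indexes while both still exist would show it fairly directly. A quick sketch, using the index names from your message; pgstatindex requires the pgstattuple extension:

SELECT pg_size_pretty(pg_relation_size('index_in_question')) AS old_index_size,
       pg_size_pretty(pg_relation_size('index_test'))        AS new_index_size;

-- with the pgstattuple extension installed, leaf density shows bloat more precisely:
-- CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation FROM pgstatindex('index_in_question');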
> 
> 
> The query in question will look something like this:
> select * from item
>     where shouldbebackedup = true
>     and itemCreated <= '2020-06-05 00:00:00.000'
>     and backupperformed = false
>     order by filepath asc, id asc limit 100 offset 10400;

Please post the result of EXPLAIN ANALYZE for the above query.
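Something along these lines, so we can see both the planner's estimates and the actual row counts for this exact query:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM item
WHERE shouldbebackedup = true
  AND itemCreated <= '2020-06-05 00:00:00.000'
  AND backupperformed = false
ORDER BY filepath ASC, id ASC
LIMIT 100 OFFSET 10400;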

> 
> Having done a count, there are around 13000 items here, without the
> offset and limit.
> That being said, the amount is entirely dependent on what was added on a
> previous day.
> 
> 
> I tried creating an extended statistic, like this, but it had no effect:
> CREATE STATISTICS s1 (dependencies) ON shouldbebackedup, backupperformed 
> FROM item;
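One thing to double-check there: extended statistics are only populated by ANALYZE, so the statistics object has no effect until the table has been analyzed after it was created. On PostgreSQL 12 or later you can confirm the dependencies were actually built via the pg_stats_ext view; on older releases the raw data lives in pg_statistic_ext instead. A sketch:

ANALYZE item;

-- PostgreSQL 12+: confirm the dependencies were computed
SELECT statistics_name, dependencies
FROM pg_stats_ext
WHERE tablename = 'item';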
> 
> Settings from the conf file that I think are relevant:
> 
> shared_buffers = 1024MB
> effective_cache_size = 2048MB
> random_page_cost = 1.1
> effective_io_concurrency = 200
> work_mem = 32MB
> 
> Finally, I state again that this database gets a nightly "vacuum analyze".
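To rule out that nightly job silently failing, it is also worth confirming when the table was actually last vacuumed and analyzed:

SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'item';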
> 
> My thanks for looking at this and any suggestions one might have.
> 
> Regards,
> Koen
> 
> On Thu, Jun 4, 2020 at 7:08 PM Tom Lane <tgl@sss.pgh.pa.us 
> <mailto:tgl@sss.pgh.pa.us>> wrote:
> 
>     Adrian Klaver <adrian.klaver@aklaver.com
>     <mailto:adrian.klaver@aklaver.com>> writes:
>      > On 6/4/20 9:43 AM, Tom Lane wrote:
>      >> It's possible that the index had bloated to the point where the
>     planner
>      >> thought it was cheaper to use a seqscan.  Did you make a note of the
>      >> cost estimates for the different plans?
> 
>      > I missed the part where the OP pointed to a SO question. In that
>      > question were links to explain.depesz.com
>     <http://explain.depesz.com> output.
> 
>     Ah, I didn't bother to chase that link either.
> 
>     So the cost estimates are only a fraction of a percent apart, making
>     it unsurprising for not-so-large changes in the index size to cause
>     a flip in the apparently-cheapest plan.  The real question then is
>     why the cost estimates aren't actually modeling the real execution
>     times very well; and I'd venture that that question boils down to
>     why is this rowcount estimate so far off:
> 
>      >                ->  Parallel Seq Scan on oscar mike_three
>      > (cost=0.000..1934568.500 rows=2385585 width=3141) (actual
>      > time=159.800..158018.961 rows=23586 loops=3)
>      >                        Filter: (four AND (NOT bravo) AND (zulu <=
>      > 'echo'::timestamp without time zone))
>      >                        Rows Removed by Filter: 8610174
> 
>     We're not going to be able to answer that if the OP doesn't wish
>     to decloak his data a bit more ... but a reasonable guess is that
>     those filter conditions are correlated.  With late-model Postgres
>     you might be able to improve matters by creating extended statistics
>     for this table.
> 
>                              regards, tom lane
> 
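Following up on Tom's suggestion: since the dependencies statistics object didn't move the estimate, and if you are on PostgreSQL 12 or later, an MCV-type statistics object on the same two columns may work better here, because it records the frequency of the actual value combinations rather than a general functional dependency. A sketch, with a made-up statistics name and the column names from your message:

CREATE STATISTICS s2 (mcv) ON shouldbebackedup, backupperformed FROM item;
ANALYZE item;

-- then re-check the planner's row estimate for the filter alone:
EXPLAIN SELECT * FROM item
WHERE shouldbebackedup = true AND backupperformed = false;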


-- 
Adrian Klaver
adrian.klaver@aklaver.com


