Re: [EXT] Re: Why using a partial index is doing slightly more logical I/O than a normal index - Mailing list pgsql-general

From Jeff Janes
Subject Re: [EXT] Re: Why using a partial index is doing slightly more logical I/O than a normal index
Date
Msg-id CAMkU=1zBz4pRLiZaxMXrQPwD9Ee-XZLGHt00+QxiFf3_PamArQ@mail.gmail.com
Whole thread Raw
In response to RE: [EXT] Re: Why using a partial index is doing slightly more logical I/O than a normal index  ("Dirschel, Steve" <steve.dirschel@thomsonreuters.com>)
List pgsql-general
On Wed, May 3, 2023 at 2:00 PM Dirschel, Steve <steve.dirschel@thomsonreuters.com> wrote:

Thanks for the reply Jeff.  Yes-  more of an academic question.  Regarding this part:

 

   Index Cond: ((workflow_id = 1070) AND ((status)::text = ANY ('{NOT_STARTED,PAUSED,PENDING,RUNNING}'::text[])))

   Filter: (deleted_millis <= 0)

   Buffers: shared hit=24

 

For this usage, the =ANY is applied as an "in-index filter".  It only descends the index once, to where workflow_id=1070, and then scans forward applying the =ANY to each index-tuple until it exhausts the =1070 condition.  As long as all the =1070 entries fit into just a few buffers, the count of buffers accessed by doing this is fewer than doing the re-descents.  (Stepping from tuple to tuple in the same index page doesn't count as a new access.  While a re-descent releases and reacquires the buffer)

 

There are 2,981,425 rows where workflow_id = 1070.  Does that change your theory of using an “in-index filter” for that plan? 


Yes.  There is no way that that many index tuples will fit in just 24 index leaf pages, so I think it must be using the re-descending method for both plans.  Then I don't know why there is a difference in the number of buffer accesses.  But the difference seems trivial, so I wouldn't put much effort into investigating it.

 

When you say there was a bit of speculation on the “boundard condition” vs “in-index filter” is the speculation on if Postgres has 2 different ways of processing a =ANY filter or is the speculation that one is being used by one plan and the other is being used by the other plan?


The speculation was that this applied to your query.  But going back to some of my original tests, I see that I remembered some of the details wrong on the broader topic as well.  When it voluntarily doesn't use the =ANY as a boundary condition, that shows up in the plan as having the condition evicted from "Index Cond" line and instead show up in a "Filter" line, at least in my one test case (which means it is no longer an in-index filter, as it jumps to the table and verifies visibility before applying the filter).  So the thing that the plans really don't distinguish is between when it just chooses not to use the extra index column for cost reasons, from when it thinks it is unable to use it for correctness/safety reasons.

Cheers,

Jeff

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Invoking SQL function while doing CREATE OR REPLACE on it
Next
From: Evan Rempel
Date:
Subject: Re: RHEL repo package crc mismatches