Thread: Question about behavior of conditional indexes

Question about behavior of conditional indexes

From
Koen De Groote
Date:
Greetings all,

Working on postgres 11.

I'm researching an index growing in size and never shrinking, and not being used anymore after a while.

The index looks like this:

"index002" btree (action_performed, should_still_perform_action, action_performed_at DESC) WHERE should_still_perform_action = false AND action_performed = true

So, there are 2 boolean fields, and a conditional clause for both. The table has about 50M rows, the index barely ever goes over 100K matched rows.

The idea is to search for rows matching these conditions quickly, and then update them. This happens daily.

This means the condition no longer match the index. At this point, does the row get removed from the index? Or does it stay in there as a dead row?

I'm noticing index bloat on this index and am wondering if all these updated rows just stay in the index?

The index currently sits at 330MB. If I drop and re-create it, it's 1.5MB.

A cronjob runs a vacuum once per day, I can see the amount of dead rows dropping in monitoring software.

But should this also take care of indexes? In postgres 11, you can't reindex concurrently, so I was wondering if indexes are skipped by vacuum? Or only in case of conditional indexes?



So I'm wondering if the behavior is as I described.

Regards,
Koen De Groote

Re: Question about behavior of conditional indexes

From
Gavin Flower
Date:
On 21/09/21 22:28, Koen De Groote wrote:
> Greetings all,
>
> Working on postgres 11.
>
> I'm researching an index growing in size and never shrinking, and not 
> being used anymore after a while.
>
> The index looks like this:
>
> |"index002" btree (action_performed, should_still_perform_action, 
> action_performed_at DESC) WHERE should_still_perform_action = false 
> AND action_performed = true |
> So, there are 2 boolean fields, and a conditional clause for both. The 
> table has about 50M rows, the index barely ever goes over 100K matched 
> rows.
>
> The idea is to search for rows matching these conditions quickly, and 
> then update them. This happens daily.
>
> This means the condition no longer match the index. At this point, 
> does the row get removed from the index? Or does it stay in there as a 
> dead row?
>
> I'm noticing index bloat on this index and am wondering if all these 
> updated rows just stay in the index?
>
> The index currently sits at 330MB. If I drop and re-create it, it's 1.5MB.
>
> A cronjob runs a vacuum once per day, I can see the amount of dead 
> rows dropping in monitoring software.
>
> But should this also take care of indexes? In postgres 11, you can't 
> reindex concurrently, so I was wondering if indexes are skipped by 
> vacuum? Or only in case of conditional indexes?
>
>
>
> So I'm wondering if the behavior is as I described.
>
> Regards,
> Koen De Groote

Can you upgrade to pg13?  A lot of work was done on indexes in pg12 & 
13.  So possibly your problem may have been resolved, at least in part.

Note that pg 14 is due out this month.


Cheers,
Gavin




Re: Question about behavior of conditional indexes

From
Ninad Shah
Date:


On Tue, 21 Sept 2021 at 15:59, Koen De Groote <kdg.dev@gmail.com> wrote:
Greetings all,

Working on postgres 11.

I'm researching an index growing in size and never shrinking, and not being used anymore after a while.

The index looks like this:

"index002" btree (action_performed, should_still_perform_action, action_performed_at DESC) WHERE should_still_perform_action = false AND action_performed = true

So, there are 2 boolean fields, and a conditional clause for both. The table has about 50M rows, the index barely ever goes over 100K matched rows.

The idea is to search for rows matching these conditions quickly, and then update them. This happens daily.

This means the condition no longer match the index. At this point, does the row get removed from the index? Or does it stay in there as a dead row?

I'm noticing index bloat on this index and am wondering if all these updated rows just stay in the index?

The index currently sits at 330MB. If I drop and re-create it, it's 1.5MB.

A cronjob runs a vacuum once per day, I can see the amount of dead rows dropping in monitoring software.
   - This doesn't reclaim the space. VACUUM operation cleans up space above the upper edge(High-water mark). Interleaved fragmentation will be marked for reuse.

But should this also take care of indexes? In postgres 11, you can't reindex concurrently, so I was wondering if indexes are skipped by vacuum? Or only in case of conditional indexes?
  - They aren't left untouched by VACUUM, but as I mentioned reclaiming space is not a job of VACUUM operation. You must execute VACUUM FULL. AFAIK, reindexing the 1.5MB index may not need a lot of time(regardless of concurrent reindexing feature).



So I'm wondering if the behavior is as I described.

Regards,
Koen De Groote

Re: Question about behavior of conditional indexes

From
Michael Lewis
Date:
You likely need to tune your autovacuum settings such that the index bloat reaches a steady-state and does not continue to increase. When vacuum runs, it will remove dead (and no longer visible to any running transactions) tuples aka row versions from each page (8KB block by default) in the file for that table's data. It will also update the index, except in newer versions of Postgres where that behavior becomes optional (see manual for which version and whether it still runs by default). If you are I/O bound and cannot tune the system defaults to autovacuum more proactively (when a smaller percent of rows are dead), then perhaps just change the settings for that table as it seems to be functioning a bit like a queue. Or you might consider setting up a daily job to vacuum analyze on all tables, if there is a period of low system activity. If you didn't have the index on the columns you are updating, then reducing fillfactor would be an option to increase HOT updates and thereby prevent the index bloat. Alas, that's not an option with the index needing to reference those values that are being changed.

"index002" btree (action_performed, should_still_perform_action, action_performed_at DESC) WHERE should_still_perform_action = false AND action_performed = true
That index seems odd to me. Why index a single value for the first two columns? I would recreate it with those skipped. Also, based on the names, I'd expect the opposite for true and false. That is, the "interesting" rows are where the action has NOT yet been performed yet and it is needed. I'd expect the index as defined to cover most of the table rather than a small fraction. Perhaps just a typo from manual obfuscation.
For what it's worth, you can create new concurrently, drop old concurrently, then rename new to old. That would be the same result as a reindex concurrently.

Re: Question about behavior of conditional indexes

From
Koen De Groote
Date:
Thanks for the replies, everyone.

Gavin - I can't upgrade to a more recent version, at least not for the foreseeable future. From what I'm reading, it's the best path forward, but there's considerations to be made that I can't overrule.

Ninad - As I suspected about VACUUM and VACUUM FULL. Thanks for confirming

Michael - Yes, my mistakes in manually obfuscating.

Going over the documentation, is seems after pg 11, several additions are made, like "vacuum_index_cleanup" and "VACUUM INDEX_CLEANUP".
From what I'm reading, these did not exist in pg11.Googling a bit shows me this: https://www.depesz.com/2019/05/01/waiting-for-postgresql-12-allow-vacuum-to-be-run-with-index-cleanup-disabled/

From which I assume that index cleaning is the default in pg11 and cannot be controlled. Though it appears to not be "enough" for my use case.

There is already a cronjob that does "VACUUM ANALYZE" during the night. Though it seems the index keeps growing, regardless of cleanup... I'm not sure what to think of that, or what to look for, apart from more agressive autovacuum.

I'll look into dropping and recreating this index without those fields, and the fillfactor you mention.







On Tue, Sep 21, 2021 at 6:15 PM Michael Lewis <mlewis@entrata.com> wrote:
You likely need to tune your autovacuum settings such that the index bloat reaches a steady-state and does not continue to increase. When vacuum runs, it will remove dead (and no longer visible to any running transactions) tuples aka row versions from each page (8KB block by default) in the file for that table's data. It will also update the index, except in newer versions of Postgres where that behavior becomes optional (see manual for which version and whether it still runs by default). If you are I/O bound and cannot tune the system defaults to autovacuum more proactively (when a smaller percent of rows are dead), then perhaps just change the settings for that table as it seems to be functioning a bit like a queue. Or you might consider setting up a daily job to vacuum analyze on all tables, if there is a period of low system activity. If you didn't have the index on the columns you are updating, then reducing fillfactor would be an option to increase HOT updates and thereby prevent the index bloat. Alas, that's not an option with the index needing to reference those values that are being changed.

"index002" btree (action_performed, should_still_perform_action, action_performed_at DESC) WHERE should_still_perform_action = false AND action_performed = true
That index seems odd to me. Why index a single value for the first two columns? I would recreate it with those skipped. Also, based on the names, I'd expect the opposite for true and false. That is, the "interesting" rows are where the action has NOT yet been performed yet and it is needed. I'd expect the index as defined to cover most of the table rather than a small fraction. Perhaps just a typo from manual obfuscation.
For what it's worth, you can create new concurrently, drop old concurrently, then rename new to old. That would be the same result as a reindex concurrently.

Re: Question about behavior of conditional indexes

From
Michael Lewis
Date:
Just to clarify, I suggested fillfactor likely could *not* help because you are indexing a column that has the value change. The idea with reducing fillfactor is that the rows can be updated with the new versions remaining in the same 8KB block in the file for table/row storage. If the new version of the row remains in the same page, then the index pointer doesn't have to be updated until that old version of the row gets vacuumed away. But alas, when the value in the index changes, then all bets are off. Although, I suppose in your workflow you might update these rows frequently and NOT change the status column, then I would certainly consider reducing the fillfactor, but it will mean perpetual "reserved space" (normally called bloat though that has a different implication) in the blocks/pages that only hold old records that won't be getting updates anymore.

If you check pg_stat_user_tables, then you will see autovauum count and can check it periodically to see how often it is doing work. I'd lower autovacuum_vacuum_scale_factor and perhaps autovacuum_vacuum_cost_delay (default was 20ms and lowered to 2ms in PG12) to ensure work is done more often, and more work is done in each cycle before it pauses to avoid overloading the I/O system.