Thread: Corrupt index lead to skipped autovacuum

From
高偉鈞
Date:
Postgres version: postgres (PostgreSQL) 16.3 (Ubuntu 16.3-1.pgdg20.04+1)
Related extension: vector 0.6.2 (schema public), vector data type and ivfflat and hnsw access methods

We have a database running with the vector extension, and somehow one of the indexes became corrupted. During autovacuum, the following messages show up:

different vector dimensions 256 and 0
while vacuuming index "ai_user_embedding_idx" of relation "public.ma_ai_aiuser"
automatic vacuum of table "analytics_vector.public.ma_ai_aiuser"

The problem is that after this error, the autovacuum process seems to stop there. All other tables and indexes are left unvacuumed, even if they already exceed the emergency vacuum threshold. Once we reindex the index, autovacuum starts working properly again.

We would expect autovacuum to continue vacuuming the other tables and indexes, and perhaps to mark the index as invalid. Skipping vacuum for all other indexes and tables can be quite dangerous if it goes unnoticed.

Thanks!

高偉鈞|Wei-Chun Kao

Re: Corrupt index lead to skipped autovacuum

From
Tomas Vondra
Date:
On 12/24/24 06:43, 高偉鈞 wrote:
> Postgres version: postgres (PostgreSQL) 16.3 (Ubuntu 
> 16.3-1.pgdg20.04+1) Related extension:  vector             | 0.6.2 |
> public     | vector data type and ivfflat and hnsw access methods
> 
> We have a database running with vector extension and somehow one of 
> the index is corrupted. During autovacuum the following message 
> shows up:
> 
> different vector dimensions 256 and 0 while vacuuming index 
> "ai_user_embedding_idx" of relation "public.ma_ai_aiuser" automatic 
> vacuum of table "analytics_vector.public.ma_ai_aiuser"
> 
> The problem is that after this error, the autovacuum process seems 
> to stop here. All other tables and indexes are left unvacuumed even 
> if they already exceed emergency vacuum threshold. Once we reindex 
> the index, autovacuum starts to work properly.
> 
> We will expect the autovacuum will continue to vacuum other tables/ 
> indexes and maybe mark the index as invalid?

We don't have any logic to automatically disable corrupted indexes (or
indexes suspected to be corrupted). I don't know how feasible such
behavior is. It may seem simple, but in my experience there are often
many corner cases that make it unexpectedly tricky. E.g. it might be a
bad idea to disable indexes supporting constraints, because that would
cause a complete outage (while now the impact is limited to the
corrupted pages).

For now, the best thing you can do is to monitor the system for
unexpected ERRORs and check that vacuum is performing cleanup as needed.
If autovacuum fails (which should be very rare), either rebuild the
index or run VACUUM with INDEX_CLEANUP OFF.
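
As a rough illustration of the advice above, here is a minimal Python sketch
that scans server log lines for the "while vacuuming index" context message
quoted in this thread and prints the two suggested remedies as SQL. The
function names (quote_ident, suggest_repairs) and the idea of feeding it plain
log lines are assumptions made for this example, not an existing tool. It also
assumes the schema name contains no dot or double quote.

```python
import re

# Matches the context line reported when vacuuming an index fails,
# in the form quoted earlier in this thread.
VACUUM_INDEX_CONTEXT = re.compile(
    r'while vacuuming index "(?P<index>[^"]+)" '
    r'of relation "(?P<schema>[^".]+)\.(?P<table>[^"]+)"'
)


def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def suggest_repairs(log_lines):
    """Return one suggestion per distinct failing index found in the log.

    Hypothetical helper: each suggestion holds a REINDEX statement and a
    VACUUM (INDEX_CLEANUP OFF) statement for the affected table.
    """
    seen = set()
    suggestions = []
    for line in log_lines:
        match = VACUUM_INDEX_CONTEXT.search(line)
        if match is None:
            continue
        schema = match["schema"]
        table = match["table"]
        index = match["index"]
        if (schema, index) in seen:
            continue  # report each failing index only once
        seen.add((schema, index))
        # An index always lives in the same schema as its table.
        qualified_index = quote_ident(schema) + "." + quote_ident(index)
        qualified_table = quote_ident(schema) + "." + quote_ident(table)
        suggestions.append({
            "index": index,
            "table": table,
            "reindex_sql": "REINDEX INDEX " + qualified_index + ";",
            "vacuum_sql": "VACUUM (INDEX_CLEANUP OFF) " + qualified_table + ";",
        })
    return suggestions


if __name__ == "__main__":
    sample = [
        "different vector dimensions 256 and 0",
        'while vacuuming index "ai_user_embedding_idx" '
        'of relation "public.ma_ai_aiuser"',
    ]
    for item in suggest_repairs(sample):
        print(item["reindex_sql"])
        print(item["vacuum_sql"])
```

The sketch only prints statements. Whether to rebuild the index or to vacuum
the table with index cleanup turned off is still a decision for the operator.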

> Skip vacuum for all other indexes / tables can be quite dangerous if 
> un-noticed.
> 

If you don't notice a failing vacuum, ERRORs about data corruption, or
that vacuum is falling behind, why would you notice that some indexes
got disabled? Surely corrupted indexes can lead to all kinds of serious
problems ...


regards

-- Tomas Vondra