Machine learning embedding sizes have been growing rapidly over the last few years, from 128 up to 4K dimensions, with the larger vectors yielding additional value and quality improvements, and it's not clear when that growth will ease. The vectors produced by today's leading text embedding models now exceed the index tuple size that IndexTupleData.t_info can represent.
The index tuple size is currently stored in 13 bits of IndexTupleData.t_info, which limits the maximum size of an index tuple to 2^13 = 8192 bytes. pgvector stores vector elements as 32-bit floats, so this cap limits indexable vectors to roughly 2K dimensions, which is no longer state of the art.
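For reference, here is roughly how the size is packed today, paraphrased from src/include/access/itup.h (check the actual header for the authoritative layout), along with the back-of-the-envelope arithmetic on pgvector's element size:

/* Paraphrased from src/include/access/itup.h (simplified). */
typedef struct IndexTupleData
{
    ItemPointerData t_tid;      /* reference TID to heap tuple */

    /*
     * t_info layout:
     *   bit 15:     has nulls
     *   bit 14:     has var-width attributes
     *   bit 13:     AM-defined meaning
     *   bits 0-12:  size of tuple (13 bits)
     */
    unsigned short t_info;      /* various info about tuple */
} IndexTupleData;

#define INDEX_SIZE_MASK 0x1FFF  /* largest representable size: 8191 bytes */

/*
 * Rough arithmetic for pgvector (4-byte float elements plus a small
 * varlena header): 2000 dims * 4 bytes + ~8 bytes of header ~= 8008 bytes,
 * which just squeezes under the cap; 2048 dims would not fit.
 */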
I've attached a patch that widens IndexTupleData.t_info from 16 bits to 32 bits, allowing significantly larger index tuples. I would guess this patch is not a complete implementation (it does not allow for migration from previous versions, for one), but it does compile and initdb succeeds. I'd be happy to continue the work if the core team is receptive to an update in this area, and I'd appreciate any feedback the community has on the approach.
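To illustrate the general shape of the change (this is only a sketch of the idea, not the attached patch), the flag bits would move to the top of a wider field and the size mask would grow accordingly:

/* Sketch only: a widened t_info with the flag bits moved to the top. */
typedef struct IndexTupleData
{
    ItemPointerData t_tid;      /* reference TID to heap tuple */

    /*
     * Hypothetical 32-bit layout:
     *   bit 31:     has nulls
     *   bit 30:     has var-width attributes
     *   bit 29:     AM-defined meaning
     *   bits 0-28:  size of tuple
     */
    uint32      t_info;         /* various info about tuple */
} IndexTupleData;

#define INDEX_SIZE_MASK       0x1FFFFFFF
#define INDEX_AM_RESERVED_BIT 0x20000000
#define INDEX_VAR_MASK        0x40000000
#define INDEX_NULL_MASK       0x80000000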
I imagine it might be worth hiding this change behind a compile-time configuration option, similar to the block size. I'm sure there are implications I'm unaware of, but I wanted to start the discussion around a bit of code to see how much would actually need to change.
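Purely as a hypothetical sketch of what that could look like (the option and symbol names below are made up, by analogy to --with-blocksize / BLCKSZ):

/*
 * Hypothetical pg_config.h symbol set by a new configure option
 * (e.g. --with-wide-index-tuples); names here are illustrative only.
 */
#ifdef USE_WIDE_INDEX_TUPLES
typedef uint32 IndexTupleInfo;
#define INDEX_SIZE_MASK 0x1FFFFFFF
#define INDEX_NULL_MASK 0x80000000
#else
typedef uint16 IndexTupleInfo;
#define INDEX_SIZE_MASK 0x1FFF
#define INDEX_NULL_MASK 0x8000
#endif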
Also, I believe this is my first mailing list post in a decade or two, so let me know if I've missed something important. BTW, thanks for all your work over the decades!