Hi
Please forgive this question if it is naïve.
Would the following be practical to implement:
A btree-like index type that points to *pages* rather than individual rows. Ie if there are many rows in a page with the same data (in the indexed columns), only one index entry will exist. In its normal use case, this index would be much smaller than a regular index on the same columns which would contain one entry for each individual row.
To reduce complexity (eg MVCC/snapshot related issues), index entries would be added when a row is inserted, but they would not be removed when the row is updated/deleted (or when an insert is rolled back): this would cause index bloat over time in volatile tables but this would be acceptable for the use case I have in mind. So in essence, an entry in the index would indicate that there *may* be matching rows in the page, not that there actually are.
For data where there is a high degree of clustering on the columns of the index, this index would be quite efficient at locating matching rows: the index itself would be small and most pages would contain largely rows that match. (In practice it would behave similar to a ‘cluster’ index in Oracle, without the guarantee of clustering).
This brings me on to my use case: a method to efficiently keep a table largely clustered (with ‘clustered’ more loosely defined than the usual Postgres definition) without locking more than a page at a time. The above index could be scanned periodically for entries where multiple keys point to the same page. Of that set of keys/pages, the subset where the same key appears in multiple pages can be clustered by simply deleting and reinserting the rows, key by key. In this case, the pages affected by the deletes could be locked briefly in order to safely delete the corresponding index entries without conflict with other transactions. If a lock can’t be obtained the process could simply skip and move on to the next candidate key/pages.
Although this kind of ‘clustered’ data is less ordered that a normal Postgres cluster, it retains some of the useful properties: particularly that data for a particular key is likely to be found in a small number of pages rather than scattered evenly through the entire table, requiring potentially far more io to select.
Many thanks in advance and my apologies once again if this kind of thing has been rejected before or is obviously impractical.
Jack