Re: Is it possible to have a "fast-write" Index? - Mailing list pgsql-hackers

From Gavin Flower
Subject Re: Is it possible to have a "fast-write" Index?
Date
Msg-id 55722CB2.3030806@archidevsys.co.nz
In response to Is it possible to have a "fast-write" Index?  (deavid <deavidsedice@gmail.com>)
Responses Re: Is it possible to have a "fast-write" Index?  (deavid <deavidsedice@gmail.com>)
List pgsql-hackers
On 06/06/15 04:07, deavid wrote:
> There are several use cases where an index would be useful, but adding
> it would slow down inserts and updates too much.
> For example, when we have 10 million rows in a table with frequent
> updates, we need several indexes to speed up selects, but then updates
> slow down a lot, especially when we have 10 or more indexes.
> Other cases involve indexes for text search, which are used only for
> user searches and aren't that important, so we want to have them, but
> we don't want the overhead they add whenever we write to the table.
> I know of approaches that already solve some of these problems in some
> ways (table partitioning, partial indexes, etc.), but I don't feel they
> are the solution to every problem of this kind.
>
> Some people have already asked for "delayed write" indexes, but the
> idea gets discarded because the index could get out of sync and omit
> results, which is unacceptable. But I think that could be fixed in
> several ways, and we could have a fast and reliable index (though maybe
> not as fast on selects).
>
> Since I don't know every internal of Postgres, I find it simpler to
> share the idea here and ask what can and cannot be done.
>
> Let's imagine there is a new type of index called "weird_btree", where
> we trade simplicity for speed. In almost every mode, we will rely on
> VACUUM to put the index back into its optimal state.
>
> Mode 1: on "aminsert", mark the index as INVALID. So, if you modify the
> table, you need to run REINDEX / CREATE INDEX CONCURRENTLY before doing
> a SELECT. This is almost the same as creating the index concurrently;
> the main difference is that you don't have to remember to drop the
> index before writing. (I don't see much benefit here.)
>
> Mode 2: on "aminsert", put the new entry in a plain, unordered list
> instead of the btree. Appending to the end of a list should be faster
> than inserting into a big btree, and you'll know later which entries
> you missed indexing.
>
> Mode 2.a: on index scan (amrescan, amgettuple), pretend that after the
> btree there is the list, and output every row, out of order. You have
> to tell Postgres that the index isn't sorted, and it will have to
> recheck every row.
>
> Mode 2.b: mark the index invalid instead. On the next vacuum, sort the
> list and insert it into the btree in a bulk operation. If that
> succeeds, mark the index valid again.
>
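
(Just to check my reading of Modes 2, 2.a and 2.b, here is a toy,
self-contained C sketch - made-up names, nothing to do with the real
index AM code. Writes append to an unordered overflow array, a lookup
that hits the overflow would need a recheck, and a "vacuum" step sorts
the overflow and bulk-merges it into the ordered part.)

/* Sketch of Modes 2 / 2.a / 2.b (illustration only, not PostgreSQL code).
 * The "btree" is just a sorted array here; new keys go to an unordered
 * overflow array instead of being inserted in order. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int sorted[1000], nsorted;       /* the ordered part ("btree")  */
static int overflow[1000], noverflow;   /* plain unordered append list */

/* Mode 2: the write path just appends, no matter where the key belongs. */
static void weird_insert(int key)
{
    overflow[noverflow++] = key;
}

/* Mode 2.a: the read path checks the ordered part, then scans the
 * overflow list; anything found there is "out of order", so a real
 * executor would have to recheck the condition on the heap tuple. */
static int weird_lookup(int key)
{
    int lo = 0, hi = nsorted;
    while (lo < hi) {                    /* binary search in the ordered part */
        int mid = (lo + hi) / 2;
        if (sorted[mid] < key) lo = mid + 1; else hi = mid;
    }
    if (lo < nsorted && sorted[lo] == key)
        return 1;
    for (int i = 0; i < noverflow; i++)  /* linear scan, needs recheck */
        if (overflow[i] == key)
            return 1;
    return 0;
}

static int cmp_int(const void *a, const void *b)
{
    return (*(const int *) a > *(const int *) b) - (*(const int *) a < *(const int *) b);
}

/* Mode 2.b: "vacuum" sorts the overflow and bulk-merges it into the
 * ordered part, after which the index is fully sorted again. */
static void weird_vacuum(void)
{
    int merged[2000], n = 0, i = 0, j = 0;

    qsort(overflow, noverflow, sizeof(int), cmp_int);
    while (i < nsorted && j < noverflow)
        merged[n++] = (sorted[i] <= overflow[j]) ? sorted[i++] : overflow[j++];
    while (i < nsorted)   merged[n++] = sorted[i++];
    while (j < noverflow) merged[n++] = overflow[j++];
    memcpy(sorted, merged, n * sizeof(int));
    nsorted = n;
    noverflow = 0;
}

int main(void)
{
    weird_insert(42);
    weird_insert(7);
    printf("before vacuum: %d\n", weird_lookup(7));  /* found via overflow    */
    weird_vacuum();
    printf("after vacuum:  %d\n", weird_lookup(7));  /* found via sorted part */
    return 0;
}
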
> Mode 3: on "aminsert", put the new entry in a second btree, leaving the
> first one untouched. Because the second btree is new, it will be small
> and writes should be faster. When doing an index scan, read tuples from
> both at the same time (like a merge sort). On vacuum, merge the second
> btree into the first. In this mode, the index is sorted and there's no
> need for a recheck.
>
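
(The nice property of Mode 3, if I understand it, is that the scan stays
ordered by merging the two trees on the fly, so no recheck is needed. A
trivial sketch of that merge step, with plain arrays standing in for the
two btrees:)

/* Sketch of the Mode 3 read path (illustration only): two ordered
 * sources - the big old "btree" and the small new one - are read in
 * lockstep, always emitting the smaller head, so the combined output
 * stays sorted and no recheck is needed. */
#include <stdio.h>

int main(void)
{
    int big[]   = { 1, 4, 9, 12 };   /* the large, rarely-touched index   */
    int small[] = { 3, 10 };         /* the small index taking new writes */
    int nbig = 4, nsmall = 2, i = 0, j = 0;

    while (i < nbig || j < nsmall) {
        if (j >= nsmall || (i < nbig && big[i] <= small[j]))
            printf("%d ", big[i++]);    /* next tuple comes from the big tree   */
        else
            printf("%d ", small[j++]);  /* next tuple comes from the small tree */
    }
    printf("\n");                       /* prints: 1 3 4 9 10 12 */
    return 0;
}
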
> Does anyone think this would be an interesting feature for PostgreSQL?
> Did I miss something?
>
> PS: Maybe it's also possible to take advantage of clustering and have
> indexes whose entries are ranges of TIDs, but I'm not sure whether this
> is too exotic or whether it would make a difference.
>
> Sincerely,
> David.
How about a hybrid indexing system, with 2 parts:

(1) the existing index system, which is checked first and has been
mostly optimised for read speed.  If there are only a few
inserts/updates and the system is not heavily loaded, then it gets
modified immediately.  The thresholds for being too busy, and for few
enough changes, could be configurable.

(2) an overflow index optimised for writing.  Possibly in memory and not
backed by permanent storage.  A crash would require a complete index
rebuild - but only when there were entries in it (or at least more than
some configurable threshold, to allow for cases where some missing index
entries are acceptable).

So when the index is needed for a query, part 1 is checked first, and
then part 2 if necessary.

Have a background process that moves entries from part 2 to part 1 when
the system is less busy.
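
Very roughly, the write path I have in mind looks like the sketch below
(plain C, purely illustrative; the thresholds, the load metric and the
index operations are all made up). The read side would check part 1 and
then part 2, much like in the earlier Mode 2.a sketch.

/* Rough sketch of the hybrid write path and background drain
 * (illustration only; not PostgreSQL code). */
#include <stdio.h>

#define MAX_PENDING 1000   /* "few enough changes" threshold (configurable) */
#define BUSY_LOAD   0.75   /* "too busy" threshold (configurable)           */

static int    pending;       /* entries sitting in the in-memory overflow */
static double current_load;  /* whatever load metric the server exposes   */

static void insert_into_main_index(int key)   { (void) key; /* slow, durable path   */ }
static void append_to_overflow_index(int key) { (void) key; pending++; /* fast path */ }

static void hybrid_insert(int key)
{
    if (current_load < BUSY_LOAD && pending < MAX_PENDING)
        insert_into_main_index(key);    /* quiet system: pay the cost now */
    else
        append_to_overflow_index(key);  /* busy system: defer the work    */
}

/* Background worker: when the system goes quiet, move everything from
 * part 2 into part 1.  After a crash, a non-empty overflow (above some
 * threshold) would instead force a full index rebuild. */
static void drain_overflow_when_idle(void)
{
    while (pending > 0 && current_load < BUSY_LOAD) {
        insert_into_main_index(0 /* the key popped from the overflow */);
        pending--;
    }
}

int main(void)
{
    current_load = 0.9;          /* pretend the box is busy */
    hybrid_insert(42);           /* goes to the overflow    */
    current_load = 0.1;          /* load drops              */
    drain_overflow_when_idle();  /* background merge        */
    printf("pending after drain: %d\n", pending);
    return 0;
}
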


Cheers,
Gavin





