Re: TODO item: Allow data to be pulled directly from indexes - Mailing list pgsql-hackers

From Karl Schnaitter
Subject Re: TODO item: Allow data to be pulled directly from indexes
Date
Msg-id 4867E384.9080406@soe.ucsc.edu
Whole thread Raw
In response to Re: TODO item: Allow data to be pulled directly from indexes  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
Gregory Stark wrote:
>> (1) & (4) require an UPDATE or DELETE to twiddle the old index tuple. Tom has
>> noted (in the linked message) that this is not reliable if the index has any
>> expression-valued columns, because it is not always possible to find the old
>> index entry. For this reason, the proposed patch does not keep visibility
>> metadata for indexes on expressions. This seems like a reasonable limitation
>> --- indexed expressions are just less efficient.
>>     
>
> Or if the index operators and btproc aren't nearly as immutable as they claim.
> Probably less likely than non-immutable index expressions but also possible.
>
>   
Your point is well taken... I'll have to look into that more.

>> (2) & (3) can work for any index, and they are quite elegant in the way that
>> the overhead does not change with the number of indexes. The TODO also notes
>> the benefit of (2) for efficient vacuuming. Thus, I think that (2) is a great
>> idea in general, but it does not serve the intended purpose of this TODO item.
>> Once a page gets marked as requiring visibility checks, it cannot be unmarked
>> until the next VACUUM. The whole point of this feature is that we are willing
>> to be more proactive during updates in order to make index access more
>> efficient.
>>     
>
> Well I think that's precisely the point. If you're trading off work done at
> update time against work done for index accesses then you're only going to win
> if the tuples are relatively static and have lots of accesses done against
> them between updates. In which case having the optimization only kick in when
> the page has been static for long enough that all the tuples are globally
> visible should be good enough
I really don't understand this point. The way I see the visibility map 
working is as follows: we set a page to "requires visibility check" when 
a tuple on the page is inserted, deleted, or non-HOT updated. If the 
only modifications have been inserts, we can reset the status to "all 
tuples visible" when these tuples become universally visible, which 
matches your description. But in the presence of deletes and updates, we 
can only reset the status of a page after a VACUUM (I know that dead HOT 
tuples can be pruned without VACUUM, but I don't think that's the case 
for indexed tuples). We can't reset the status earlier because we don't 
know what indexes still have pointers to the dead tuples. So a page can 
be static indefinitely (after a single modification) without ever 
getting to enjoy the optimization.

This is a really important point, so please let me know if I'm missing 
something.

Thanks for your response!
Karl


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: WIP patch: reducing overhead for repeat de-TOASTing
Next
From: Robert Treat
Date:
Subject: Re: Does anything dump per-database config settings? (was Re: ALTER DATABASE vs pg_dump)