Re: TODO item: Allow data to be pulled directly from indexes - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: TODO item: Allow data to be pulled directly from indexes
Date
Msg-id 48688C50.2090702@enterprisedb.com
Whole thread Raw
In response to TODO item: Allow data to be pulled directly from indexes  (Karl Schnaitter <karlsch@soe.ucsc.edu>)
Responses Re: TODO item: Allow data to be pulled directly from indexes  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
Karl Schnaitter wrote:
> The main difference between (1) & (4) is that (1) will sometimes require 
> heap lookups and (4) never will. Moreover, the heap lookups in (1) will 
> be difficult for the optimizer to estimate, unless some special 
> statistics can be maintained for this purpose.

Yeah, we certainly should maintain a statistic for it.

> (2) & (3) can work for any index, and they are quite elegant in the way 
> that the overhead does not change with the number of indexes. The TODO 
> also notes the benefit of (2) for efficient vacuuming. Thus, I think 
> that (2) is a great idea in general, but it does not serve the intended 
> purpose of this TODO item. Once a page gets marked as requiring 
> visibility checks, it cannot be unmarked until the next VACUUM. The 
> whole point of this feature is that we are willing to be more proactive 
> during updates in order to make index access more efficient.

In some cases we can mark a page earlier, as soon as we see that the 
condition is true. Most importantly, when new tuples are inserted, we 
can mark the page as soon as the inserting transaction is visible to all.

Also, the visibility map ought to make vacuums cheaper, as you only need 
to scan the parts of the table that have beem modified since last 
vacuum. You still need to scan all indexes, though. But assuming that 
you somehow solve the correctness issues in the "add visibility fields 
to index tuples" approach, we can use the same solution to perform 
retail vacuums, which would bring vacuuming and the visibility map 
approach on par with that approach anyway.

> Please let me know if you agree/disagree with anything here. I need to get this feature implemented for my research,
butI would also love to contribute it to the community so your opinions matter a lot.
 

Well, I think the visibility map is a much better approach. This has 
been discussed many times before, so I don't really have anything new to 
add.

I've been working adding support for so-called "relation forks" 
(http://archives.postgresql.org/message-id/48651722.8020600@enterprisedb.com), 
to allow attaching metadata to relations, like the visibility map. I'm 
going to use the facility for a new FSM implementation, which I'm 
working on at the moment, but after that's done I'm going to start 
working on the visibility map. And after that's done and working for 
VACUUM, I'm going to work on using it for index-only-scans.

I'm not sure I have enough time to get all that done for 8.4, it's 
looking bad at the moment, so help would be much appreciated. If you 
don't agree with taking the visibility map approach, I would suggest 
working on the indexam API changes first, to allow returning index 
tuples from an index. I believe that part is the same regardless of how 
we check the visibility.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Join Removal/ Vertical Partitioning
Next
From: Richard Huxton
Date:
Subject: Re: Does anything dump per-database config settings? (was Re: ALTER DATABASE vs pg_dump)