Thread: No heap lookups on index

No heap lookups on index

From
David Scott
Date:
Allow me a brief introduction.  I work in a company who contracts 
intelligence analysis software to the government.  We are currently 
developing a product which is using PostgreSQL at it's core.  Due to the 
licensing of the product and the integration with perl this is our first 
choice in database solutions.

We are, however, currently stuck.  We are storing millions of rows and 
require very high query performance.  We have spent the last several 
months tweaking, list lurking and researching all the various tweaks and 
performance enhancements and have come to the conclusion that our 
biggest slowdown is validating the index rows which match our selection 
criteria against the heap values.  In general cases there is a very 
small amount required for this, but in our extreme use cases we are 
finding this to slow our queries by an unacceptable amount of time. 

We would like to resolve this issue.  In that endeavor we have done some 
feasibility analysis (either to write a patch ourselves or attempt to 
commission an expert to do so), starting with the archives for this 
list.  We found several posts discussing the issue and it seems that the 
complexity of storing the tuple visibility information inside of the 
index rows is prohibitive for simple indexes. 

I have used SQL Server in the past and have noticed that bookmark 
lookups are avoided because they force the query executor to actually 
fetch the data page off of disk, rather then return the values that 
exist in the index.  I have verified times against the PostgreSQL 
installation and SQL Server to verify that the SQL Server queries come 
back at roughly the same speed when avoiding bookmark lookups as 
Postgres queries accessing clustered tables using the index the table is 
clustered on.

Since I am sure everyone is tired of the intro by now, I'll get to the 
questions:   Do commercial databases implement MVCC in a way that allows an 
efficient implementation of index lookups that can avoid heap lookups?    Is there any way to modify PostgreSQL to
allowindex lookups without 
 
heap validation that doesn't involve re-writing the MVCC implementation 
of keeping dead rows on the live table?    Is the additional overhead of keeping full tuple visibility 
information inside of the index so odious to the Postgres community as 
to prevent a patch with this solution from being applied back to the 
head?  Maybe as an optional use feature?  We would prefer this solution 
for our needs over the bitmap of heap pages listed in the TODO list 
because we want to ensure optimal query times, regardless of the state 
of the cache and because we are concerned with performance in the face 
of concurrent updates on the page level.

Thanks for any thoughts on this, I know this is a  perennial topic, but 
we are seriously considering contributing either code or money to the 
solution of this problem.

David Scott
Applied Technical Systems, Inc.





Re: No heap lookups on index

From
Tom Lane
Date:
David Scott <davids@apptechsys.com> writes:
>     Is the additional overhead of keeping full tuple visibility 
> information inside of the index so odious to the Postgres community as 
> to prevent a patch with this solution from being applied back to the 
> head?

This has been discussed and rejected before (multiple times).  If you
want it considered you'll have to present stronger arguments than have
so far been made.  The current consensus is that the probability of a
net performance win is not good enough to justify the large amount of
development effort that would be required.

What sort of problems are you dealing with exactly?  There has been
some discussion of changes that would improve certain scenarios.  For
instance it might be plausible to do joins using index information and
only go back to the heap for entries that appear to pass the join test.
        regards, tom lane


Re: No heap lookups on index

From
"Jonah H. Harris"
Date:
David,

You can find some of this discussion in "Much Ado About COUNT(*)".  Related to that discussion, I had written a patch which added visibility information to the indexes.

If you're interested in the patch and/or consulting, contact me offline.

-Jonah


On 1/18/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
David Scott <davids@apptechsys.com> writes:
>     Is the additional overhead of keeping full tuple visibility
> information inside of the index so odious to the Postgres community as
> to prevent a patch with this solution from being applied back to the
> head?

This has been discussed and rejected before (multiple times).  If you
want it considered you'll have to present stronger arguments than have
so far been made.  The current consensus is that the probability of a
net performance win is not good enough to justify the large amount of
development effort that would be required.

What sort of problems are you dealing with exactly?  There has been
some discussion of changes that would improve certain scenarios.  For
instance it might be plausible to do joins using index information and
only go back to the heap for entries that appear to pass the join test.

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Re: No heap lookups on index

From
Simon Riggs
Date:
On Wed, 2006-01-18 at 12:14 -0800, David Scott wrote:
> Is the additional overhead of keeping full tuple visibility 
> information inside of the index so odious to the Postgres community
> as 
> to prevent a patch with this solution from being applied back to the 
> head?  Maybe as an optional use feature?

You might want to consider the thought of "organised heaps" as an
alternative thought to index improvements. That way there is no heap to
avoid visiting because the index is also the main data structure.

Teradata provides hash or value-ordered tables
Oracle offers index organised tables
DB2 offers multi-dimensional clustering
Tandem offered value ordered tables

This would offer performance, but would be one of the largest patches
seen in recent times. You may find some co-backers.

Best Regards, Simon Riggs




Re: No heap lookups on index

From
"Jim C. Nasby"
Date:
On Wed, Jan 18, 2006 at 12:14:12PM -0800, David Scott wrote:
>    Do commercial databases implement MVCC in a way that allows an 
> efficient implementation of index lookups that can avoid heap lookups? 

Oracle does, but you pay in other ways. Instead of keeping dead tuples
in the main heap, they shuffle them off to an 'undo log'. This has some
downsides:

Rollbacks take *forever*, though this usually isn't much of an issue
unless you need to abort a really big transaction.

Every update/delete means two seperate writes to disk, one for the base
table and one for the undo log (well, there's also the redo log,
equivalent to our WAL). Though writes to undo can (and presumably are)
grouped together, so they should normally be a lot more efficient than
the updates to the base table unless you're updating data in table
order.

Of course there's downsides to our MVCC as well; the cost of index scans
is just one.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: No heap lookups on index

From
"Jim C. Nasby"
Date:
On Wed, Jan 18, 2006 at 04:02:45PM -0500, Jonah H. Harris wrote:
> David,
> 
> You can find some of this discussion in "Much Ado About COUNT(*)".  Related
> to that discussion, I had written a patch which added visibility information
> to the indexes.
> 
> If you're interested in the patch and/or consulting, contact me offline.

Does the patch change all indexes across the board? Do you have any
performance numbers?

I suspect that in some situations storing visibility info in the index
would be a big win; if that's the case it would be very good if there
was an option that allowed it. Perhaps this could be done using a
different index access method...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: No heap lookups on index

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> You might want to consider the thought of "organised heaps" as an
> alternative thought to index improvements. That way there is no heap to
> avoid visiting because the index is also the main data structure.
> This would offer performance, but would be one of the largest patches
> seen in recent times. You may find some co-backers.

Either way it would be a pretty monstrous patch :-( ... in this case
because of the amount of code that knows about the properties of heap
storage, and in what David is thinking about because of the implications
of trying to keep multiple copies of tuple state up-to-date.

We'd probably end up with a cleaner system structure if we tried to
create an API separating out the knowledge of heap structure, but the
amount of work needed seems out of proportion to the benefit.

It might be possible to compromise though.  Imagine an index that
contains only the upper levels of a search tree --- links to what
would be the leaf level point into the associated heap.  In this design
the heap is still a heap in the sense that you can seqscan it without
any awareness of the index structure.  What you can't do is insert
tuples or move them around without the index AM's say-so.
RelationGetBufferForTuple would become an index AM call, but otherwise
I think the impact on existing code wouldn't be large.

There are some limitations.  For instance I don't think that the index
AM could control the order of items within a heap page, because of the
need for TIDs to be persistent; so within-page searches would still be
kinda slow.  But it's interesting to think about.
        regards, tom lane


Re: No heap lookups on index

From
Simon Riggs
Date:
On Wed, 2006-01-18 at 18:27 -0500, Tom Lane wrote:
> Imagine an index that
> contains only the upper levels of a search tree --- links to what
> would be the leaf level point into the associated heap.  In this
> design
> the heap is still a heap in the sense that you can seqscan it without
> any awareness of the index structure.  What you can't do is insert
> tuples or move them around without the index AM's say-so.
> RelationGetBufferForTuple would become an index AM call, but otherwise
> I think the impact on existing code wouldn't be large.

Eureka! I had been thinking of a "block level index" which sounds almost
the same thing (as opposed to the row level indexes we have now). We
only need to index the row with the lowest value on any page so the main
index would get 100 times smaller. The main part of the index would not
need to be written to except when a block overflows.

I had imagined an ordering within a block to allow fast uniqueness
checks, but it would be pretty fast either way.

Merge joins with the same index become block-level joins without sorts.
We would just do an individual block sort before merging, so no need for
very large sort-merges. Even if the block level indexes differ, we only
need to sort one of the tables.

Hopefully we could avoid trying to support GIST-heaps?

Best Regards, Simon Riggs




Re: No heap lookups on index

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> Hopefully we could avoid trying to support GIST-heaps?

Well, that would be an extra index AM that someone might or might not
get around to writing someday.  I was thinking that both btree and hash
index AMs might be interesting for this, though.  Hash in particular
would adapt pretty trivially ...
        regards, tom lane


Re: No heap lookups on index

From
Tom Lane
Date:
Simon Riggs <simon@2ndquadrant.com> writes:
> We only need to index the row with the lowest value on any page so the main
> index would get 100 times smaller. The main part of the index would not
> need to be written to except when a block overflows.

BTW, the above is equivalent to saying that the leaf-level index pages
aren't there: the downlink pointers on the level-1 index pages are
pointers to heap pages, instead, and you're right that they effectively
only index the lowest value per page (actually IIRC the highest value
per page, but same difference).

I think the "100x" figure is overoptimistic though.  There will be a lot
fewer entries per leaf page because actual heap tuples will be a lot
larger than index entries (typically at least).  Hence, you need more
level-1 entries and so the upper index levels are bigger than in a
simple index.  Another point is that the heap will be somewhat bloated
compared to a simple heap because of containing more unused space.
The traditional rule-of-thumb is that a btree index is only about 2/3rds
full at steady state, and I suppose this would apply to a
btree-organized heap too.

Still, it seems like an idea worth investigating.

> Merge joins with the same index become block-level joins without sorts.
> We would just do an individual block sort before merging, so no need for
> very large sort-merges. Even if the block level indexes differ, we only
> need to sort one of the tables.

I'd phrase that a little differently: an indexscan on such an index
would normally deliver unordered output, but you could demand ordered
output and get it by doing successive one-page sorts.  I doubt it's
worth inventing a new layer of mergejoin code to do this rather than
keeping it at the index access level.

Come to think of it, the idea also seems to map nicely into bitmap index
scans: the index will directly hand back a list of potential pages to
look at, but they are all marked "lossy" because the index doesn't know
exactly which tuple(s) on the target pages match the query.  The
existing bitmap-heap-scan code can take it from there.
        regards, tom lane


Re: No heap lookups on index

From
Christopher Kings-Lynne
Date:
> Oracle does, but you pay in other ways. Instead of keeping dead tuples
> in the main heap, they shuffle them off to an 'undo log'. This has some
> downsides:
> 
> Rollbacks take *forever*, though this usually isn't much of an issue
> unless you need to abort a really big transaction.

It's a good point though.  Surely a database should be optimised for the 
most common operation - commits, rather than rollbacks?

Chris



Re: No heap lookups on index

From
"Jim C. Nasby"
Date:
On Thu, Jan 19, 2006 at 09:18:55AM +0800, Christopher Kings-Lynne wrote:
> >Oracle does, but you pay in other ways. Instead of keeping dead tuples
> >in the main heap, they shuffle them off to an 'undo log'. This has some
> >downsides:
> >
> >Rollbacks take *forever*, though this usually isn't much of an issue
> >unless you need to abort a really big transaction.
> 
> It's a good point though.  Surely a database should be optimised for the 
> most common operation - commits, rather than rollbacks?

Generally true, but keep in mind this counter-argument... our MVCC
performs fewer disk writes (since generally you can find some free space
on the page you're modifying) and you can control when you take the hit
of cleaning up dead space. In fact, you can take that hit at a reduced
priority (vacuum_cost_*).
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: No heap lookups on index

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> Oracle does, but you pay in other ways. Instead of keeping dead tuples
>> in the main heap, they shuffle them off to an 'undo log'. This has some
>> downsides:
>> Rollbacks take *forever*, though this usually isn't much of an issue
>> unless you need to abort a really big transaction.

> It's a good point though.  Surely a database should be optimised for the 
> most common operation - commits, rather than rollbacks?

The "shuffling off" of the data is expensive in itself, so I'm not sure
you can argue that the Oracle way is more optimal for commits either.
        regards, tom lane


Re: No heap lookups on index

From
"Jim C. Nasby"
Date:
On Wed, Jan 18, 2006 at 08:13:59PM -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > We only need to index the row with the lowest value on any page so the main
> > index would get 100 times smaller. The main part of the index would not
> > need to be written to except when a block overflows.
> 
> BTW, the above is equivalent to saying that the leaf-level index pages
> aren't there: the downlink pointers on the level-1 index pages are
> pointers to heap pages, instead, and you're right that they effectively
> only index the lowest value per page (actually IIRC the highest value
> per page, but same difference).

Would this open the door for allowing tables to be maintained in CLUSTER
order (at least at the block level if not within the blocks)? Though I
have no idea how you'd handle page splits without a lot of pain, but
perhaps it would be possible to strive for a certain tuple ordering that
would allow for a periodic re-cluster that doesn't have to move a lot of
data. One thought is to strive for the same amount of free space on each
page, so if you're touching a tuple on a page that has less than desired
free space you move it's new version to either the next or previous
page.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: No heap lookups on index

From
Tom Lane
Date:
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> Would this open the door for allowing tables to be maintained in CLUSTER
> order (at least at the block level if not within the blocks)? Though I
> have no idea how you'd handle page splits without a lot of pain

I think the way you'd attack that is by building the table with a pretty
low fill factor, so that there's room on each page for a number of
updates before you have to split.  Since the index AM is going to be
dictating space allocation, this is all in its hands.

The existing CLUSTER code would probably be totally inapplicable to
this sort of organization --- we'd have to provide some alternate code
path for index-organized heaps.
        regards, tom lane


Re: No heap lookups on index

From
Rod Taylor
Date:
On Thu, 2006-01-19 at 09:18 +0800, Christopher Kings-Lynne wrote:
> > Oracle does, but you pay in other ways. Instead of keeping dead tuples
> > in the main heap, they shuffle them off to an 'undo log'. This has some
> > downsides:
> > 
> > Rollbacks take *forever*, though this usually isn't much of an issue
> > unless you need to abort a really big transaction.
> 
> It's a good point though.  Surely a database should be optimised for the 
> most common operation

Yes.

>  - commits, rather than rollbacks?

Commits are most common because most databases are optimized for them.
Lots of programs go through a ton pre-checking to avoid a rollback that
they don't need to do under PostgreSQL.

I've found that for small systems I tend to rely very heavily on
frequent vacuums and database level exceptions for virtually all data
checking. Rollbacks are nearly as common as commits in those
environments if not more-so.
-- 



Re: No heap lookups on index

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> >> Oracle does, but you pay in other ways. Instead of keeping dead tuples
> >> in the main heap, they shuffle them off to an 'undo log'. This has some
> >> downsides:
> >> Rollbacks take *forever*, though this usually isn't much of an issue
> >> unless you need to abort a really big transaction.
> 
> > It's a good point though.  Surely a database should be optimised for the 
> > most common operation - commits, rather than rollbacks?
> 
> The "shuffling off" of the data is expensive in itself, so I'm not sure
> you can argue that the Oracle way is more optimal for commits either.

You pay in Oracle when you read these records too. If there are pending
updates you have to do a second read to the rollback segment to get the old
record. This hits long-running batch queries especially hard since by the time
they finish a large number of the records they're reading could have been
updated and require a second read to the rollback segments.

You also pay if the new value is too big to fit in the same space as the old
record. Then you get to have to follow a pointer to the new location. Oracle
tries to minimize that by intentionally leaving extra free space but that has
costs too.

And lastly rollback segments are of limited size. No matter how big you make
them there's always the risk that a long running query will take long enough
that data it needs will have expired from the rollback segments.

Oh, and note that optimizing for the common case has limits. Rollbacks may be
rare but one of the cases where they are effectively happening is on recovery
after a crash. And that's one process you *really* don't want to take longer
than necessary...

-- 
greg



Re: No heap lookups on index

From
Greg Stark
Date:
David Scott <davids@apptechsys.com> writes:

> Since I am sure everyone is tired of the intro by now, I'll get to the
> questions:
...
> Is there any way to modify PostgreSQL to allow index lookups without heap
> validation that doesn't involve re-writing the MVCC implementation of
> keeping dead rows on the live table? Is the additional overhead of keeping
> full tuple visibility information inside of the index so odious to the
> Postgres community as to prevent a patch with this solution from being
> applied back to the head?

The consequences of full visibility information in indexes would indeed be
pretty odious.

However the general gist the conversation led last time it came up had what
sounded like a feasible compromise:

Keep a very compact bitmap outside the table (not attached to any single
index) with one bit per tuple indicating whether the tuple was known to be
visible to every transaction. The hope being this bitmap would be small enough
to sit in memory pretty much permanently. Even if not then it should be much
smaller than the table and impose a pretty small i/o overhead.

If most of the records in the table are old records that are visible to every
transaction then the index scan would be able to avoid reading in pages of the
heap. "Most" would have to be a pretty big percentage though since even a
single tuple with unknown visibility would have to be read in.

The bitmap would be useful for vacuum too. Any page that contained only tuples
with known visibility could be skipped. That would mean running vacuum for
extremely large tables that have only moderate activity wouldn't have to scan
all those static pages. (There could be an issue with people whose FSM can't
track all the free space but expect it to be found on subsequent vacuums, but
details details.)

I wonder if the bitmap can actually be one bit per page actually. A single
update has to set the bit for the tuple, and that will make the whole page
have to be read in for both vacuum and index lookups. Only a vacuum will be
able to verify that all the tuples in the page are known-visible and index
entries have been cleaned up, and the vacuum is going to be operating on the
whole page anyways. A one-bit-per-page bitmap will easily fit in RAM even for
very large tables.

-- 
greg



Re: No heap lookups on index

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> You pay in Oracle when you read these records too. If there are pending
> updates you have to do a second read to the rollback segment to get the old
> record. This hits long-running batch queries especially hard since by the time
> they finish a large number of the records they're reading could have been
> updated and require a second read to the rollback segments.

If not third or fourth read, by the time you've found the version you're
supposed to be able to see.

I recall discussing this several years ago with somebody who knew quite
a bit about Oracle innards (though he didn't say how he knew...)
According to him, heavy accesses to the rollback segments have another
problem, which is contention for ownership of locks protecting access
to the rollback segments.  I got the impression that it would be like
us needing to take the WALWriteLock anytime we wanted to look at any
not-the-very-latest row version --- there's plenty of write traffic
that needs that lock, and you don't want to load it down with read
traffic too.
        regards, tom lane


Re: No heap lookups on index

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> I wonder if the bitmap can actually be one bit per page actually.

Yeah, I think we'd agreed that per-page was the way to go.  Per-tuple
bitmaps are painful to manage because of the variable number of tuples
per page.  And really all you need to know is whether to read the page
or not --- once you have, examining multiple tuples on it doesn't cost
much.
        regards, tom lane


Re: No heap lookups on index

From
"Jonah H. Harris"
Date:
As an Oracle internals person myself, I don't see how making a comparison between the specifics of Oracle's MVCC to PostgreSQL's MVCC is relevant to this discussion.

As does *MOST* other commercial databases, Oracle's storage manager performs an update-in-place whereas PostgreSQL's (for the most part) does not.  There are several ways to implement update-in-place, and Oracle has chosen their own rollback segment methodology which has issues that without tuning, are major hassles.  I'm not saying that one is better than the other in ALL cases, but I and many other Oracle consultants have tuned Oracle installations to eliminate the headaches others in this list have mentioned.  Any knowledgable Oracle person evaluating PostgreSQL that may be reading this list is just going to see it as a lot of anti-Oracle discussion with no basis in fact.

Regardless, there is NO WAY to perform an apples-to-apples comparison between the implementations, locking strategies, etc. as the MVCC implementations and goals are completely different.  In short, Oracle's implementation is not perfect; neither is ours.  Oracle's initial design (as a commercial database) is much different than PostgreSQL's (as a research database).

While I'm always game for looking at other implementations when designing and discussing new features for PostgreSQL, let's focus on making PostgreSQL better rather than spending time discussing unrealistic comparisons.

If we want to do a comparison on the how/why Oracle's index implementation is faster in the context of this situation and how we could make PostgreSQL's faster, let's stick to that.



On 1/19/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Greg Stark <gsstark@mit.edu> writes:
> I wonder if the bitmap can actually be one bit per page actually.

Yeah, I think we'd agreed that per-page was the way to go.  Per-tuple
bitmaps are painful to manage because of the variable number of tuples
per page.  And really all you need to know is whether to read the page
or not --- once you have, examining multiple tuples on it doesn't cost
much.

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Re: No heap lookups on index

From
"Jim C. Nasby"
Date:
On Thu, Jan 19, 2006 at 01:56:51AM -0500, Greg Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> > >> Oracle does, but you pay in other ways. Instead of keeping dead tuples
> > >> in the main heap, they shuffle them off to an 'undo log'. This has some
> > >> downsides:
> > >> Rollbacks take *forever*, though this usually isn't much of an issue
> > >> unless you need to abort a really big transaction.
> > 
> > > It's a good point though.  Surely a database should be optimised for the 
> > > most common operation - commits, rather than rollbacks?
> > 
> > The "shuffling off" of the data is expensive in itself, so I'm not sure
> > you can argue that the Oracle way is more optimal for commits either.
> 
> You pay in Oracle when you read these records too. If there are pending
> updates you have to do a second read to the rollback segment to get the old
> record. This hits long-running batch queries especially hard since by the time
> they finish a large number of the records they're reading could have been
> updated and require a second read to the rollback segments.

You pay the same cost in PostgreSQL though... If you index-scan to a
dead tuple, you get pointed to where the new one is. And if you're
seqscanning, well, you'll be reading everything anyway.

> You also pay if the new value is too big to fit in the same space as the old
> record. Then you get to have to follow a pointer to the new location. Oracle
> tries to minimize that by intentionally leaving extra free space but that has
> costs too.

Again, similar to the cost with our MVCC.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: No heap lookups on index

From
Greg Stark
Date:
"Jonah H. Harris" <jonah.harris@gmail.com> writes:

> As an Oracle internals person myself, I don't see how making a comparison
> between the specifics of Oracle's MVCC to PostgreSQL's MVCC is relevant to
> this discussion.
> 
> As does *MOST* other commercial databases, Oracle's storage manager performs
> an update-in-place whereas PostgreSQL's (for the most part) does not.  There
> are several ways to implement update-in-place, and Oracle has chosen their
> own rollback segment methodology which has issues that without tuning, are
> major hassles.  I'm not saying that one is better than the other in ALL
> cases, but I and many other Oracle consultants have tuned Oracle
> installations to eliminate the headaches others in this list have
> mentioned.  Any knowledgable Oracle person evaluating PostgreSQL that may be
> reading this list is just going to see it as a lot of anti-Oracle discussion
> with no basis in fact.
>
> Regardless, there is NO WAY to perform an apples-to-apples comparison
> between the implementations, locking strategies, etc. as the MVCC
> implementations and goals are completely different.  
...

Well it seems there were lots of facts posted. Yes you can avoid headaches
caused by these issues, but we're not really talking about the headaches. 

We're comparing the performance costs of what are update-in-place and
non-update-in-place approach. All of the costs named so far are to some degree
fundamental costs of update-in-place. All you can hope to do in tuning a
system is make sure the costs are kept within manageable bounds. 

There are fundamental costs to non-update-in-place as well. The table sizes
are bloated by the amount of space used to store older versions and the dead
tuples that haven't been reused yet. Whether this slows down Postgres as much
as having to do a second (or third or fourth) read to a rollback segment is a
valid area for discussion. It's especially interesting to discuss since the
two costs hit different sets of queries unequally.

> If we want to do a comparison on the how/why Oracle's index implementation
> is faster in the context of this situation and how we could make
> PostgreSQL's faster, let's stick to that.

Well the main difference is the MVCC implementation. Talking about Oracle's
index implementation while avoiding mentioning the elephant in the room would
be sort of pointless.

-- 
greg



Re: No heap lookups on index

From
"Jonah H. Harris"
Date:

On 19 Jan 2006 11:25:21 -0500, Greg Stark <gsstark@mit.edu> wrote:
Well it seems there were lots of facts posted. Yes you can avoid headaches
caused by these issues, but we're not really talking about the headaches.

Several were mentioned; some of which could generally be avoided by good tuning.

We're comparing the performance costs of what are update-in-place and
non-update-in-place approach.

As PostgreSQL is not an update-in-place system, what is the point in discussing the costs?  How does this solve David's original problem?

There are fundamental costs to non-update-in-place as well. The table sizes
are bloated by the amount of space used to store older versions and the dead
tuples that haven't been reused yet. Whether this slows down Postgres as much
as having to do a second (or third or fourth) read to a rollback segment is a
valid area for discussion. It's especially interesting to discuss since the
two costs hit different sets of queries unequally.

I agree, but again, we're not talking apples-to-apples.  There's far too many variables to compare Oracle's speed to PostgreSQL's for most types of operations in the varying types of database deployments.

Well the main difference is the MVCC implementation. Talking about Oracle's
index implementation while avoiding mentioning the elephant in the room would
be sort of pointless.

I agree that Oracle's MVCC plays *a little* into this index discussion, but isn't it pointless to discuss the pitfalls of an MVCC implementation that PostgreSQL does not have?  Similarly, how does it solve David's original question.

Again, I'm fine with discussing these things, but let's keep on topic for David's sake.  He posted a problem that we have discussed many times over.  Let's focus on that problem and give him possible options.

David has stated that the index to heap visibility check is slowing him down, so what are the possible options:

- Visibility in indexes (-hackers archives cover the pros/cons)
- True organized heaps
- Block level index (Tom/Simon's earlier discussion)


Re: No heap lookups on index

From
Josh Berkus
Date:
Jonah,

> David has stated that the index to heap visibility check is slowing him 
> down, so what are the possible options:
> 
> - Visibility in indexes (-hackers archives cover the pros/cons)
> - True organized heaps
> - Block level index (Tom/Simon's earlier discussion)

also  - Frozen relations

This last solution was proposed as a possibility for the data 
warehousing case.  For a time-partitioned table, we're going to know 
that all but one of the partitions has not been updated anywhere within 
visible transaction scope, and therefore index-only access is a possibility.

also  - join tables

One of the other most valuable targets for index-only access is the 
"many-to-many join table" whose primary key consists of two (or more) 
foreign keys to two (or more) other tables.  It's actually not necessary 
to check visibility on this kind of table as the visibility of tuples in 
the join table will be determined by the visibility of tuples in the two 
data tables.  Since often join tables consist *only* of the join key, 
being able to do index-only access on them could dramatically speed up 
certian kinds of queries.

Both of the above are "corner cases" but are very common ones and might 
be much easier to implement than the other solutions.

--Josh



Re: No heap lookups on index

From
David Scott
Date:
Thanks for all the help and thought to our problem.

Jonah H. Harris wrote:

>
> David has stated that the index to heap visibility check is slowing 
> him down, so what are the possible options:
>
> - Visibility in indexes (-hackers archives cover the pros/cons)
> - True organized heaps
> - Block level index (Tom/Simon's earlier discussion)
>
>
Several people seem to have a problem with full index visibility, 
however both last thread and this thread people seem to agree that 
keeping a page level visibility would be an acceptable compromise.  
Perhaps we should implement a rough patch and run some time stats to 
post for everyone to weigh the pros and cons against hard figures?


The discussion on block level indexing and true organized heaps, if I am 
interpreting this topic correctly, would essentially keep the table 
optimized for accessing the data through one index?  This seems like a 
very interesting idea, but I don't know that it would really solve our 
current problem because we need to use more then one index to 
efficiently access a table.

We have a very driving need to solve this problem and so don't mind 
doing legwork to do feasibility testing or stat collection to help the 
community decide the best solution for Postgres.


Re: No heap lookups on index

From
David Scott
Date:
Tom Lane wrote:

>What sort of problems are you dealing with exactly?  There has been
>some discussion of changes that would improve certain scenarios.  For
>instance it might be plausible to do joins using index information and
>only go back to the heap for entries that appear to pass the join test.
>
>  
>
   We tried that scenario, writing a "dirty index hack" to experiment 
with, that returned values whether they were valid or not.  We saw some 
definite improvements inside of joins and sub queries, but we were still 
slowed down at the end because we still had to validate every row being 
returned.
   My hands are very tied as to what specific examples I can send, so I 
apologize for how long it took to get back to you on this.  A simple 
(generalized) example of one the types of queries we are running:
   SELECT col1, col2, cool_func(stat_count, 
COALESCE(raw_counts.raw_count, (SELECT alt_count FROM alt_raw_table 
WHERE alt_raw_table.pk = col2))) as cool_func
   FROM       (SELECT col1, col2, stat_count FROM pair_table WHERE col1 = $1) 
pair_table       LEFT JOIN       raw_counts       ON pair_table.col2 = raw_counts.pk 
We tried not validating the return of both of these as we only want to 
see the rows which have a high value for cool_func, but it was still 
necessary to validate the rows which did match the criteria.  So we did 
see an improvement, but not enough.


Re: No heap lookups on index

From
"Jim C. Nasby"
Date:
On Thu, Jan 19, 2006 at 10:19:01AM -0800, Josh Berkus wrote:
> One of the other most valuable targets for index-only access is the 
> "many-to-many join table" whose primary key consists of two (or more) 
> foreign keys to two (or more) other tables.  It's actually not necessary 
> to check visibility on this kind of table as the visibility of tuples in 
> the join table will be determined by the visibility of tuples in the two 
> data tables.  Since often join tables consist *only* of the join key, 
> being able to do index-only access on them could dramatically speed up 
> certian kinds of queries.

How would that handle 'delinking' item A from foobaz 2? (IE: DELETE FROM
join_table WHERE id1=231 and id2=24842)

The only way I can see this working is if it is required that items in
both tables as well as the link in the many-many table are only inserted
and deleted in the same transaction, which seems to be really pushing
this into corner-case territory.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: No heap lookups on index

From
"Jim C. Nasby"
Date:
On Thu, Jan 19, 2006 at 10:35:30AM -0800, David Scott wrote:
> Tom Lane wrote:
> 
> >What sort of problems are you dealing with exactly?  There has been
> >some discussion of changes that would improve certain scenarios.  For
> >instance it might be plausible to do joins using index information and
> >only go back to the heap for entries that appear to pass the join test.

Do you still have that patch that folks could look at? ISTM that this
technique would be rather dependant on your actual workload, and as such
could result in a big win for certain types of queries.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: No heap lookups on index

From
Simon Riggs
Date:
On Wed, 2006-01-18 at 20:13 -0500, Tom Lane wrote:

> Come to think of it, the idea also seems to map nicely into bitmap index
> scans: the index will directly hand back a list of potential pages to
> look at, but they are all marked "lossy" because the index doesn't know
> exactly which tuple(s) on the target pages match the query.  The
> existing bitmap-heap-scan code can take it from there.

Yes, I've privately suggested this solution in that context.

I think there is enough meat there to make this topic worth discussing
further, but not on list again just yet.

Best Regards, Simon Riggs




Re: No heap lookups on index

From
Jeremy Drake
Date:
On Thu, 19 Jan 2006, Jim C. Nasby wrote:

> Do you still have that patch that folks could look at? ISTM that this
> technique would be rather dependant on your actual workload, and as such
> could result in a big win for certain types of queries.


It is not a patch, per se.  It is a c language function which calls some
of the nbtree functions to return things from the index.  The syntax for
calling it is rather obtuse, since those of us who don't understand the
parser are doomed to attempt circumventing it ;P.

I tarred up the code, and put it on a web server so that interested
parties can play with it.  The url is
http://linux.apptechsys.com/~jeremyd/postgresql/fakeidxscan.tar.gz

It is very hackish, so definately do not assume that it is in any way
correct, rather assume the opposite.  I have run it on x86 and x86_64
boxes, and it compiles and runs on those.

Here is an example of its usage, so you can see the nasty syntax required
and perhaps grok how to use it better.


create table test_table (a integer, b integer);
create index test_table_a_b_idx on test_table (a, b);
insert into test_table (a, b) select a, b from generate_series(1,100) a,generate_series(1,100) b;

select * from fakeidxrowscomposite(
'test_table', -- relation
'test_table_a_b_idx', -- index
1, --number of scan keys
ARRAY[1, 2]::smallint[], -- numbers of the index attributes to return
ARRAY[1]::smallint[], -- numbers of the attrs the scankeys apply to
ARRAY['=(integer,integer)'::regoperator], -- operators for the scankeys
ARRAY[3]::smallint[], -- btree strategy for the scankeys
(42,0) -- values for the scankeys to compare against (if there is only      -- one, you have to put a fake one in since
otherwisethe parser      -- does not think it is a record)
 
) AS (a integer, b integer); -- tell the parser what columns to expect



This example returns 100 rows in which the first column contains 42 and
the second column contains the numbers between 1 and 100, in order.

Feel free to do whatever with this, it's pretty fast for tables where
seeks to validate tuples would hurt, but you do get back dead things...


-- 
When you know absolutely nothing about the topic, make your forecast by
asking a carefully selected probability sample of 300 others who don't
know the answer either.    -- Edgar R. Fiedler


Re: No heap lookups on index

From
"Jim C. Nasby"
Date:
On Thu, Jan 19, 2006 at 02:50:39PM -0800, Jeremy Drake wrote:
> On Thu, 19 Jan 2006, Jim C. Nasby wrote:
> 
> > Do you still have that patch that folks could look at? ISTM that this
> > technique would be rather dependant on your actual workload, and as such
> > could result in a big win for certain types of queries.
...
> Feel free to do whatever with this, it's pretty fast for tables where
> seeks to validate tuples would hurt, but you do get back dead things...

How'd you then weed out the dead tuples?

Basically, numbers talk. If there were convincing numbers for something
that wasn't a corner-case that showed a marked improvement then there'd
be much more interest in getting this into the backend in some fashion.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: No heap lookups on index

From
Tom Lane
Date:
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> Basically, numbers talk. If there were convincing numbers for something
> that wasn't a corner-case that showed a marked improvement then there'd
> be much more interest in getting this into the backend in some fashion.

I've got no doubt that there are *some* "non corner" cases for which
this would be a win.  The lower the update load on the database, the
better it's going to look.  The issue really is where does it start
being a loss, and can you convince us that those cases are all "corner"
ones?

(The subtext here, of course, is the assumption that it's an
all-or-nothing choice.  I'm of the opinion that supporting both options
would be infeasible from a code complexity and maintenance standpoint;
but a simple patch that did both would of course prove that opinion
wrong ...)
        regards, tom lane


Re: No heap lookups on index

From
Jeremy Drake
Date:
On Thu, 19 Jan 2006, Jim C. Nasby wrote:

> > Feel free to do whatever with this, it's pretty fast for tables where
> > seeks to validate tuples would hurt, but you do get back dead things...
>
> How'd you then weed out the dead tuples?

I didn't get that far with it.  The purpose of this function was to
quickly put something together to demonstrate that the overhead of seeking
to the proper tuples in the heap to determine their visibility was the
main component of the time being spent to satisfy our queries.

> Basically, numbers talk. If there were convincing numbers for something
> that wasn't a corner-case that showed a marked improvement then there'd
> be much more interest in getting this into the backend in some fashion.

I could get some numbers of how much time validating tuples adds to a
query, but I don't think that that would be horribly novel.  BTW,
hopefully I did not make you think that I intended to get this into
the official backend.  This function was only meant to demonstrate to the
people around here that the visibility check was the bottleneck we were
seeing.  The function may also be interesting as a demonstration of how
indexes are handled in postgres, as you can see when tuples are flagged as
no longer valid and when they are not.  I have put xmin into an index so
that I could use this function to better visualize when index tuples are
left behind (I tried to put xmax in there too, but I never saw them
change, after checking the code it turns out that the index is never told
about changes in xmax).


We were seeing this case: All rows in our table are visible (we are the
only transaction on the machine and we did a VACUUM FULL ANALYZE before).
We rebooted to ensure no caching.  We were seeing times which, upon
division by the number of rows returned by the index scan, were remarkably
close to the average seek time listed on the specs for the hard drive in
the testing box.  This was about 5ms, which doesn't sound like much, but
given a large enough number of rows and a few joins, 5ms per tuple adds up
quickly.  This implies that we were seeing approximately the worst case as
far as the distribution of the relevant tuples on pages, ie each tuple we
wanted was on a different heap page.

Digging back to some times we had collected from this experiment,
apparently we were taking about 15 to 20 seconds to run a particular
query, and when we used the function I previously posted those times were
reduced to 5 seconds.  This was a while ago, however, so these times are
probably not very accurate and we probably made other tweaks to speed
things up since then.  But it gives an idea.  We could come up with more
absolute numbers, but I think people already know what they would look
like.