Thread: MVCC and index-only read

MVCC and index-only read

From
Scara Maccai
Date:
Hi,

if I got it right the reason some aggregates (such as COUNT) using only index columns are "slow" on postgresql is that
ituses MVCC, so it has to read the data as well as the index. It makes sense to me, but I don't understand is how other
databases(such as Oracle) do it. 
Can someone explain?

Thank you.



      Unisciti alla community di Io fotografo e video, il nuovo corso di fotografia di Gazzetta dello sport:
http://www.flickr.com/groups/iofotografoevideo

Re: MVCC and index-only read

From
Sam Mason
Date:
On Tue, Nov 18, 2008 at 04:49:35PM +0000, Scara Maccai wrote:
> if I got it right the reason some aggregates (such as COUNT) using
> only index columns are "slow" on postgresql is that it uses MVCC, so
> it has to read the data as well as the index.

Every aggregate (of which COUNT is just one example) has to read data
from both the index and the table.  The reason is that each row in a
table has two important identifiers; the transaction that created it and
the transaction that killed it.  Every time a query scans the table it
looks to see that both the transaction that created it COMMITed and that
transaction that killed it (if any) didn't COMMIT.  The index doesn't
contain these two identifiers so when scanning the index the code needs
to go and check what these are.

There are various optimizations in PG so that it doesn't need to
actually check the transaction numbers the whole time, thus speeding
things up a bit, but the semantics/behavior is the same.

> It makes sense to me,
> but I don't understand is how other databases (such as Oracle) do it.

I believe Oracle maintains a separate log (not sure how it's structured)
that contains this information and all the data in both the main table
and index can be considered committed.

There are tradeoffs in both directions; PG's implementation allows
greater concurrency, but Oracle's way is more optimized for read access.
Which implementation is better depends a lot on your work load.

There has been talk of adding the transaction identifiers into the
indexes in PG, which would mean that index scans wouldn't need to go
to the table.  The problem is that the indexes would be larger and
modifying data would incur larger overheads as both the data and index
would have to be updated.

I hope someone will point out any mistakes I've made!


  Sam

Re: MVCC and index-only read

From
Tom Lane
Date:
Sam Mason <sam@samason.me.uk> writes:
> On Tue, Nov 18, 2008 at 04:49:35PM +0000, Scara Maccai wrote:
>> It makes sense to me,
>> but I don't understand is how other databases (such as Oracle) do it.

> I believe Oracle maintains a separate log (not sure how it's structured)
> that contains this information and all the data in both the main table
> and index can be considered committed.

FWIW, I believe that count(*) is pretty slow in Oracle too.  The DBs
that can do it fast are the ones that maintain a centralized counter
of the number of rows in each table.  Which makes count(*) nice and
fast, at the cost of horrendous concurrency impacts for updates; plus
there's no chance of real MVCC operation.  (In an MVCC world the correct
answer for count(*) can vary depending on who's asking --- there's no
hope of doing that with a single counter.)

            regards, tom lane

Re: MVCC and index-only read

From
Scara Maccai
Date:
> FWIW, I believe that count(*) is pretty slow in Oracle too.

Well COUNT was only an example. I think (but I'm not sure AT ALL) that

SELECT A FROM myTAB where A <10000

only uses the index (if there's an index defined for A) in Oracle.

But mine was just curiosity... which I think you and Sam answered.

Thank you.







      Unisciti alla community di Io fotografo e video, il nuovo corso di fotografia di Gazzetta dello sport:
http://www.flickr.com/groups/iofotografoevideo

Re: MVCC and index-only read

From
"Jonah H. Harris"
Date:
On Tue, Nov 18, 2008 at 12:48 PM, Sam Mason <sam@samason.me.uk> wrote:
>> It makes sense to me,
>> but I don't understand is how other databases (such as Oracle) do it.
>
> There are tradeoffs in both directions; [...] but Oracle's way is more optimized

<response type="snarky">
For the most part, that's all you needed to say :)
</response>

--
Jonah H. Harris, Senior DBA
myYearbook.com

Re: MVCC and index-only read

From
"Jonah H. Harris"
Date:
On Tue, Nov 18, 2008 at 2:02 PM, Scara Maccai <m_lists@yahoo.it> wrote:
> SELECT A FROM myTAB where A <10000
>
> only uses the index (if there's an index defined for A) in Oracle.

Well, not exactly.  That's called a "covered" index because the query
could be satisfied directly from the index (the attribute is covered
by the index).  Oracle sometimes satisfies it with an index fast full
scan, but not always; it depends on the cost of other access methods
and/or what Oracle believes is currently in cache.

--
Jonah H. Harris, Senior DBA
myYearbook.com

Re: MVCC and index-only read

From
Thomas Kellerer
Date:
Jonah H. Harris wrote on 18.11.2008 20:15:
> On Tue, Nov 18, 2008 at 2:02 PM, Scara Maccai <m_lists@yahoo.it> wrote:
>> SELECT A FROM myTAB where A <10000
>>
>> only uses the index (if there's an index defined for A) in Oracle.
>
> Well, not exactly.  That's called a "covered" index because the query
> could be satisfied directly from the index (the attribute is covered
> by the index).  Oracle sometimes satisfies it with an index fast full
> scan, but not always; it depends on the cost of other access methods
> and/or what Oracle believes is currently in cache.
>
If all the columns from the select list are available in the index, then Oracle
will always prefer the index scan over a table scan (at least I have never seen
something else). Even for a SELECT that returns all rows of the table.

They are taking this concept even further with index organized tables, where no
real "table data" exists, everything is stored in the index (quited nice for
e.g. link tables that only consist of two or three integer columns)


Thomas

Re: MVCC and index-only read

From
"Scott Marlowe"
Date:
On Tue, Nov 18, 2008 at 12:33 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> If all the columns from the select list are available in the index, then
> Oracle will always prefer the index scan over a table scan (at least I have
> never seen something else). Even for a SELECT that returns all rows of the
> table.
>
> They are taking this concept even further with index organized tables, where
> no real "table data" exists, everything is stored in the index (quited nice
> for e.g. link tables that only consist of two or three integer columns)

Sounds like they're borrowing the code from innodb that does much the
same thing.  In Innodb, if a field is indexed, it lives only as an
index, not in the table and an index at the same time.

Re: MVCC and index-only read

From
"Jonah H. Harris"
Date:
On Tue, Nov 18, 2008 at 2:33 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> If all the columns from the select list are available in the index, then
> Oracle will always prefer the index scan over a table scan (at least I have
> never seen something else). Even for a SELECT that returns all rows of the
> table.

No, it doesn't always prefer index fast full scan.

> They are taking this concept even further with index organized tables, where
> no real "table data" exists, everything is stored in the index (quited nice
> for e.g. link tables that only consist of two or three integer columns)

Those are essentially clustered indexes, and they're not quite stored
exactly the same..

--
Jonah H. Harris, Senior DBA
myYearbook.com

Re: MVCC and index-only read

From
"Scott Marlowe"
Date:
On Tue, Nov 18, 2008 at 1:03 PM, Jonah H. Harris <jonah.harris@gmail.com> wrote:
> On Tue, Nov 18, 2008 at 2:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> Sounds like they're borrowing the code from innodb that does much the
>> same thing.  In Innodb, if a field is indexed, it lives only as an
>> index, not in the table and an index at the same time.
>
> They aren't borrowing anything, Oracle has had this functionality
> since at least Oracle 8i (1999).

Whoa, calm down Francis.  I'm not suggesting they stole it or
something.  Just that they're using the same basic concepts.

Re: MVCC and index-only read

From
"Jonah H. Harris"
Date:
On Tue, Nov 18, 2008 at 2:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Sounds like they're borrowing the code from innodb that does much the
> same thing.  In Innodb, if a field is indexed, it lives only as an
> index, not in the table and an index at the same time.

They aren't borrowing anything, Oracle has had this functionality
since at least Oracle 8i (1999).

--
Jonah H. Harris, Senior DBA
myYearbook.com

Re: MVCC and index-only read

From
"Scott Marlowe"
Date:
On Tue, Nov 18, 2008 at 1:07 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Tue, Nov 18, 2008 at 1:03 PM, Jonah H. Harris <jonah.harris@gmail.com> wrote:
>> On Tue, Nov 18, 2008 at 2:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>>> Sounds like they're borrowing the code from innodb that does much the
>>> same thing.  In Innodb, if a field is indexed, it lives only as an
>>> index, not in the table and an index at the same time.
>>
>> They aren't borrowing anything, Oracle has had this functionality
>> since at least Oracle 8i (1999).
>
> Whoa, calm down Francis.  I'm not suggesting they stole it or
> something.  Just that they're using the same basic concepts.

Oh, and citation needed.  I don't remember seeing anything about
oracle using indexes as sole storage units back in 8i


--
When fascism comes to America, it will be draped in a flag and
carrying a cross - Sinclair Lewis

Re: MVCC and index-only read

From
"Jonah H. Harris"
Date:
On Tue, Nov 18, 2008 at 3:07 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> They aren't borrowing anything, Oracle has had this functionality
>> since at least Oracle 8i (1999).
>
> Whoa, calm down Francis.

My name's not Francis :)

> I'm not suggesting they stole it or something.  Just that they're using
> the same basic concepts.

Hmm...

--- snip
> Sounds like they're borrowing the code from innodb that does much the same thing

You can't borrow something you started developing prior to InnoDB's release.

--
Jonah H. Harris, Senior DBA
myYearbook.com

Re: MVCC and index-only read

From
"Jonah H. Harris"
Date:
On Tue, Nov 18, 2008 at 3:09 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Oh, and citation needed.  I don't remember seeing anything about
> oracle using indexes as sole storage units back in 8i

Your memory-foo is weak.  See ORGANIZATION INDEX:

http://download-west.oracle.com/docs/cd/A87860_01/doc/server.817/a85397/statem3e.htm#2061671

--
Jonah H. Harris, Senior DBA
myYearbook.com

Re: MVCC and index-only read

From
"Joshua D. Drake"
Date:
On Tue, 2008-11-18 at 15:28 -0500, Jonah H. Harris wrote:
> On Tue, Nov 18, 2008 at 3:09 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> > Oh, and citation needed.  I don't remember seeing anything about
> > oracle using indexes as sole storage units back in 8i
>
> Your memory-foo is weak.  See ORGANIZATION INDEX:
>
> http://download-west.oracle.com/docs/cd/A87860_01/doc/server.817/a85397/statem3e.htm#2061671
>

Off topic much?

> --
> Jonah H. Harris, Senior DBA
> myYearbook.com
>
--


Re: MVCC and index-only read

From
"Jonah H. Harris"
Date:
On Tue, Nov 18, 2008 at 3:45 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> Off topic much?

Hey, all I did was make a joke; other people wanted to get all
*correct* about it :)

Anyway, as this has been discussed at least twenty times before, this
is a waste of a thread.

--
Jonah H. Harris, Senior DBA
myYearbook.com

Re: MVCC and index-only read

From
"Daniel Verite"
Date:
    Scott Marlowe wrote:

> >> They aren't borrowing anything, Oracle has had this functionality
> >> since at least Oracle 8i (1999).
> >
> > Whoa, calm down Francis.  I'm not suggesting they stole it or
> > something.    Just that they're using the same basic concepts.
>
> Oh, and citation needed.  I don't remember seeing anything about
> oracle using indexes as sole storage units back in 8i

They call that IOT, for Index Organized Tables, and they were
introduced with Oracle8 (1997). Check out
http://www.orafaq.com/wiki/Oracle_8

 Best regards,
--
 Daniel
 PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org

Re: MVCC and index-only read

From
Thomas Kellerer
Date:
Jonah H. Harris wrote on 18.11.2008 20:58:
> On Tue, Nov 18, 2008 at 2:33 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
>> If all the columns from the select list are available in the index, then
>> Oracle will always prefer the index scan over a table scan (at least I have
>> never seen something else). Even for a SELECT that returns all rows of the
>> table.
>
> No, it doesn't always prefer index fast full scan.

Hmm. I was not talking about an index _fast full_ scan, I was talking about
index scans in general. Personally I have never seen Oracle using a table scan
(whatever kind) if all columns in the select are present in the index.

And the manual actually suggests the same:

"If the statement accesses only columns of the index, then Oracle reads the
indexed column values directly from the index, rather than from the table"
http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#i52300

>> They are taking this concept even further with index organized tables, where
>> no real "table data" exists, everything is stored in the index (quited nice
>> for e.g. link tables that only consist of two or three integer columns)
>
> Those are essentially clustered indexes, and they're not quite stored
> exactly the same..
>
Hmm, my understanding of a clustered index, that it "orders" the table data
according to the index, but there is still "table data" and "index data", right?

That is a bit different to an index-organized table were only a B-Tree index
exists. This is not mandatory, but for my example (a link table with two PK
columns) only a B-Tree index is created.

(I have to admit I don't really know the concept of clustered indexes)

Thomas



Re: MVCC and index-only read

From
"Jonah H. Harris"
Date:
On Tue, Nov 18, 2008 at 3:54 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Hmm. I was not talking about an index _fast full_ scan, I was talking about
> index scans in general. Personally I have never seen Oracle using a table
> scan (whatever kind) if all columns in the select are present in the index.
>
> And the manual actually suggests the same:
>
> "If the statement accesses only columns of the index, then Oracle reads the
> indexed column values directly from the index, rather than from the table"
> http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#i52300

The manual is wrong.

>> Those are essentially clustered indexes, and they're not quite stored
>> exactly the same..
>>
> Hmm, my understanding of a clustered index, that it "orders" the table data
> according to the index, but there is still "table data" and "index data",
> right?
>
> That is a bit different to an index-organized table were only a B-Tree index
> exists. This is not mandatory, but for my example (a link table with two PK
> columns) only a B-Tree index is created.

Well, clustered indexes mean different things to different vendors.
Oracle's implementation stores the data with the index as does SQL
Server, but in a little different fashion.

--
Jonah H. Harris, Senior DBA
myYearbook.com