Thread: MVCC and index-only read
Hi, if I got it right the reason some aggregates (such as COUNT) using only index columns are "slow" on postgresql is that ituses MVCC, so it has to read the data as well as the index. It makes sense to me, but I don't understand is how other databases(such as Oracle) do it. Can someone explain? Thank you. Unisciti alla community di Io fotografo e video, il nuovo corso di fotografia di Gazzetta dello sport: http://www.flickr.com/groups/iofotografoevideo
On Tue, Nov 18, 2008 at 04:49:35PM +0000, Scara Maccai wrote: > if I got it right the reason some aggregates (such as COUNT) using > only index columns are "slow" on postgresql is that it uses MVCC, so > it has to read the data as well as the index. Every aggregate (of which COUNT is just one example) has to read data from both the index and the table. The reason is that each row in a table has two important identifiers; the transaction that created it and the transaction that killed it. Every time a query scans the table it looks to see that both the transaction that created it COMMITed and that transaction that killed it (if any) didn't COMMIT. The index doesn't contain these two identifiers so when scanning the index the code needs to go and check what these are. There are various optimizations in PG so that it doesn't need to actually check the transaction numbers the whole time, thus speeding things up a bit, but the semantics/behavior is the same. > It makes sense to me, > but I don't understand is how other databases (such as Oracle) do it. I believe Oracle maintains a separate log (not sure how it's structured) that contains this information and all the data in both the main table and index can be considered committed. There are tradeoffs in both directions; PG's implementation allows greater concurrency, but Oracle's way is more optimized for read access. Which implementation is better depends a lot on your work load. There has been talk of adding the transaction identifiers into the indexes in PG, which would mean that index scans wouldn't need to go to the table. The problem is that the indexes would be larger and modifying data would incur larger overheads as both the data and index would have to be updated. I hope someone will point out any mistakes I've made! Sam
Sam Mason <sam@samason.me.uk> writes: > On Tue, Nov 18, 2008 at 04:49:35PM +0000, Scara Maccai wrote: >> It makes sense to me, >> but I don't understand is how other databases (such as Oracle) do it. > I believe Oracle maintains a separate log (not sure how it's structured) > that contains this information and all the data in both the main table > and index can be considered committed. FWIW, I believe that count(*) is pretty slow in Oracle too. The DBs that can do it fast are the ones that maintain a centralized counter of the number of rows in each table. Which makes count(*) nice and fast, at the cost of horrendous concurrency impacts for updates; plus there's no chance of real MVCC operation. (In an MVCC world the correct answer for count(*) can vary depending on who's asking --- there's no hope of doing that with a single counter.) regards, tom lane
> FWIW, I believe that count(*) is pretty slow in Oracle too. Well COUNT was only an example. I think (but I'm not sure AT ALL) that SELECT A FROM myTAB where A <10000 only uses the index (if there's an index defined for A) in Oracle. But mine was just curiosity... which I think you and Sam answered. Thank you. Unisciti alla community di Io fotografo e video, il nuovo corso di fotografia di Gazzetta dello sport: http://www.flickr.com/groups/iofotografoevideo
On Tue, Nov 18, 2008 at 12:48 PM, Sam Mason <sam@samason.me.uk> wrote: >> It makes sense to me, >> but I don't understand is how other databases (such as Oracle) do it. > > There are tradeoffs in both directions; [...] but Oracle's way is more optimized <response type="snarky"> For the most part, that's all you needed to say :) </response> -- Jonah H. Harris, Senior DBA myYearbook.com
On Tue, Nov 18, 2008 at 2:02 PM, Scara Maccai <m_lists@yahoo.it> wrote: > SELECT A FROM myTAB where A <10000 > > only uses the index (if there's an index defined for A) in Oracle. Well, not exactly. That's called a "covered" index because the query could be satisfied directly from the index (the attribute is covered by the index). Oracle sometimes satisfies it with an index fast full scan, but not always; it depends on the cost of other access methods and/or what Oracle believes is currently in cache. -- Jonah H. Harris, Senior DBA myYearbook.com
Jonah H. Harris wrote on 18.11.2008 20:15: > On Tue, Nov 18, 2008 at 2:02 PM, Scara Maccai <m_lists@yahoo.it> wrote: >> SELECT A FROM myTAB where A <10000 >> >> only uses the index (if there's an index defined for A) in Oracle. > > Well, not exactly. That's called a "covered" index because the query > could be satisfied directly from the index (the attribute is covered > by the index). Oracle sometimes satisfies it with an index fast full > scan, but not always; it depends on the cost of other access methods > and/or what Oracle believes is currently in cache. > If all the columns from the select list are available in the index, then Oracle will always prefer the index scan over a table scan (at least I have never seen something else). Even for a SELECT that returns all rows of the table. They are taking this concept even further with index organized tables, where no real "table data" exists, everything is stored in the index (quited nice for e.g. link tables that only consist of two or three integer columns) Thomas
On Tue, Nov 18, 2008 at 12:33 PM, Thomas Kellerer <spam_eater@gmx.net> wrote: > If all the columns from the select list are available in the index, then > Oracle will always prefer the index scan over a table scan (at least I have > never seen something else). Even for a SELECT that returns all rows of the > table. > > They are taking this concept even further with index organized tables, where > no real "table data" exists, everything is stored in the index (quited nice > for e.g. link tables that only consist of two or three integer columns) Sounds like they're borrowing the code from innodb that does much the same thing. In Innodb, if a field is indexed, it lives only as an index, not in the table and an index at the same time.
On Tue, Nov 18, 2008 at 2:33 PM, Thomas Kellerer <spam_eater@gmx.net> wrote: > If all the columns from the select list are available in the index, then > Oracle will always prefer the index scan over a table scan (at least I have > never seen something else). Even for a SELECT that returns all rows of the > table. No, it doesn't always prefer index fast full scan. > They are taking this concept even further with index organized tables, where > no real "table data" exists, everything is stored in the index (quited nice > for e.g. link tables that only consist of two or three integer columns) Those are essentially clustered indexes, and they're not quite stored exactly the same.. -- Jonah H. Harris, Senior DBA myYearbook.com
On Tue, Nov 18, 2008 at 1:03 PM, Jonah H. Harris <jonah.harris@gmail.com> wrote: > On Tue, Nov 18, 2008 at 2:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >> Sounds like they're borrowing the code from innodb that does much the >> same thing. In Innodb, if a field is indexed, it lives only as an >> index, not in the table and an index at the same time. > > They aren't borrowing anything, Oracle has had this functionality > since at least Oracle 8i (1999). Whoa, calm down Francis. I'm not suggesting they stole it or something. Just that they're using the same basic concepts.
On Tue, Nov 18, 2008 at 2:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > Sounds like they're borrowing the code from innodb that does much the > same thing. In Innodb, if a field is indexed, it lives only as an > index, not in the table and an index at the same time. They aren't borrowing anything, Oracle has had this functionality since at least Oracle 8i (1999). -- Jonah H. Harris, Senior DBA myYearbook.com
On Tue, Nov 18, 2008 at 1:07 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Tue, Nov 18, 2008 at 1:03 PM, Jonah H. Harris <jonah.harris@gmail.com> wrote: >> On Tue, Nov 18, 2008 at 2:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >>> Sounds like they're borrowing the code from innodb that does much the >>> same thing. In Innodb, if a field is indexed, it lives only as an >>> index, not in the table and an index at the same time. >> >> They aren't borrowing anything, Oracle has had this functionality >> since at least Oracle 8i (1999). > > Whoa, calm down Francis. I'm not suggesting they stole it or > something. Just that they're using the same basic concepts. Oh, and citation needed. I don't remember seeing anything about oracle using indexes as sole storage units back in 8i -- When fascism comes to America, it will be draped in a flag and carrying a cross - Sinclair Lewis
On Tue, Nov 18, 2008 at 3:07 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >> They aren't borrowing anything, Oracle has had this functionality >> since at least Oracle 8i (1999). > > Whoa, calm down Francis. My name's not Francis :) > I'm not suggesting they stole it or something. Just that they're using > the same basic concepts. Hmm... --- snip > Sounds like they're borrowing the code from innodb that does much the same thing You can't borrow something you started developing prior to InnoDB's release. -- Jonah H. Harris, Senior DBA myYearbook.com
On Tue, Nov 18, 2008 at 3:09 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > Oh, and citation needed. I don't remember seeing anything about > oracle using indexes as sole storage units back in 8i Your memory-foo is weak. See ORGANIZATION INDEX: http://download-west.oracle.com/docs/cd/A87860_01/doc/server.817/a85397/statem3e.htm#2061671 -- Jonah H. Harris, Senior DBA myYearbook.com
On Tue, 2008-11-18 at 15:28 -0500, Jonah H. Harris wrote: > On Tue, Nov 18, 2008 at 3:09 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > > Oh, and citation needed. I don't remember seeing anything about > > oracle using indexes as sole storage units back in 8i > > Your memory-foo is weak. See ORGANIZATION INDEX: > > http://download-west.oracle.com/docs/cd/A87860_01/doc/server.817/a85397/statem3e.htm#2061671 > Off topic much? > -- > Jonah H. Harris, Senior DBA > myYearbook.com > --
On Tue, Nov 18, 2008 at 3:45 PM, Joshua D. Drake <jd@commandprompt.com> wrote: > Off topic much? Hey, all I did was make a joke; other people wanted to get all *correct* about it :) Anyway, as this has been discussed at least twenty times before, this is a waste of a thread. -- Jonah H. Harris, Senior DBA myYearbook.com
Scott Marlowe wrote: > >> They aren't borrowing anything, Oracle has had this functionality > >> since at least Oracle 8i (1999). > > > > Whoa, calm down Francis. I'm not suggesting they stole it or > > something. Just that they're using the same basic concepts. > > Oh, and citation needed. I don't remember seeing anything about > oracle using indexes as sole storage units back in 8i They call that IOT, for Index Organized Tables, and they were introduced with Oracle8 (1997). Check out http://www.orafaq.com/wiki/Oracle_8 Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
Jonah H. Harris wrote on 18.11.2008 20:58: > On Tue, Nov 18, 2008 at 2:33 PM, Thomas Kellerer <spam_eater@gmx.net> wrote: >> If all the columns from the select list are available in the index, then >> Oracle will always prefer the index scan over a table scan (at least I have >> never seen something else). Even for a SELECT that returns all rows of the >> table. > > No, it doesn't always prefer index fast full scan. Hmm. I was not talking about an index _fast full_ scan, I was talking about index scans in general. Personally I have never seen Oracle using a table scan (whatever kind) if all columns in the select are present in the index. And the manual actually suggests the same: "If the statement accesses only columns of the index, then Oracle reads the indexed column values directly from the index, rather than from the table" http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#i52300 >> They are taking this concept even further with index organized tables, where >> no real "table data" exists, everything is stored in the index (quited nice >> for e.g. link tables that only consist of two or three integer columns) > > Those are essentially clustered indexes, and they're not quite stored > exactly the same.. > Hmm, my understanding of a clustered index, that it "orders" the table data according to the index, but there is still "table data" and "index data", right? That is a bit different to an index-organized table were only a B-Tree index exists. This is not mandatory, but for my example (a link table with two PK columns) only a B-Tree index is created. (I have to admit I don't really know the concept of clustered indexes) Thomas
On Tue, Nov 18, 2008 at 3:54 PM, Thomas Kellerer <spam_eater@gmx.net> wrote: > Hmm. I was not talking about an index _fast full_ scan, I was talking about > index scans in general. Personally I have never seen Oracle using a table > scan (whatever kind) if all columns in the select are present in the index. > > And the manual actually suggests the same: > > "If the statement accesses only columns of the index, then Oracle reads the > indexed column values directly from the index, rather than from the table" > http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/optimops.htm#i52300 The manual is wrong. >> Those are essentially clustered indexes, and they're not quite stored >> exactly the same.. >> > Hmm, my understanding of a clustered index, that it "orders" the table data > according to the index, but there is still "table data" and "index data", > right? > > That is a bit different to an index-organized table were only a B-Tree index > exists. This is not mandatory, but for my example (a link table with two PK > columns) only a B-Tree index is created. Well, clustered indexes mean different things to different vendors. Oracle's implementation stores the data with the index as does SQL Server, but in a little different fashion. -- Jonah H. Harris, Senior DBA myYearbook.com