Thread: 9.2 and index only scans

9.2 and index only scans

From

Thomas Kellerer

Date:

26 August 2012, 09:04:48

Hi,

I'm playing around with 9.2 beta4 and was looking into the new Index Only Scan feature.

I was a bit surprised that a "count(*)" query does not use an index.
Not even a count(col) where col is the PK of the table.

Is that intended? If so, why is that the case?
I would have thought that this is an operation which is suitable to be served by an index only scan.

Regards
Thomas

Re: 9.2 and index only scans

From

Tom Lane

Date:

26 August 2012, 14:31:54

Thomas Kellerer <spam_eater@gmx.net> writes:
> I'm playing around with 9.2 beta4 and was looking into the new Index Only Scan feature.
> I was a bit surprised that a "count(*)" query does not use an index.

Works for me.  However, the cost estimate for that is heavily dependent
on how much of the table is known all-visible.  If the table is getting
a lot of churn, or even just hasn't been vacuumed since it quiesced,
the planner will prefer a seqscan for this --- and it will be right.

            regards, tom lane

Re: 9.2 and index only scans

From

Thomas Kellerer

Date:

26 August 2012, 15:02:56

Tom Lane wrote on 26.08.2012 16:31:
> Thomas Kellerer <spam_eater@gmx.net> writes:
>> I'm playing around with 9.2 beta4 and was looking into the new Index Only Scan feature.
>> I was a bit surprised that a "count(*)" query does not use an index.
>
> Works for me.  However, the cost estimate for that is heavily dependent
> on how much of the table is known all-visible.  If the table is getting
> a lot of churn, or even just hasn't been vacuumed since it quiesced,
> the planner will prefer a seqscan for this --- and it will be right.
>

Hmm. So it's something with my environment.

Should the following setup qualify for an index scan?

postgres=# select version();
                             version
----------------------------------------------------------------
  PostgreSQL 9.2beta4, compiled by Visual C++ build 1600, 32-bit
(1 row)


postgres=#
postgres=# create table foo (id integer not null primary key, some_data text);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo"
CREATE TABLE
postgres=#
postgres=# insert into foo (id, some_data)
postgres-# select i, rpad('x',2500,'*')
postgres-# from generate_series(1,100000) i;
INSERT 0 100000
postgres=#
postgres=# vacuum analyze foo;
VACUUM
postgres=#
postgres=# explain (analyze on, buffers on, verbose on)  select count(*) from foo;
                                                        QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=2185.00..2185.01 rows=1 width=0) (actual time=67.622..67.622 rows=1 loops=1)
    Output: count(*)
    Buffers: shared hit=935
    ->  Seq Scan on public.foo  (cost=0.00..1935.00 rows=100000 width=0) (actual time=0.020..37.531 rows=100000
loops=1)
          Output: id, some_data
          Buffers: shared hit=935
  Total runtime: 67.670 ms
(7 rows)

Regards
Thomas

Re: 9.2 and index only scans

From

Jeff Janes

Date:

26 August 2012, 18:45:52

On Sun, Aug 26, 2012 at 8:02 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Tom Lane wrote on 26.08.2012 16:31:
>
>> Thomas Kellerer <spam_eater@gmx.net> writes:
>>>
>>> I'm playing around with 9.2 beta4 and was looking into the new Index Only
>>> Scan feature.
>>> I was a bit surprised that a "count(*)" query does not use an index.
>>
>>
>> Works for me.  However, the cost estimate for that is heavily dependent
>> on how much of the table is known all-visible.  If the table is getting
>> a lot of churn, or even just hasn't been vacuumed since it quiesced,
>> the planner will prefer a seqscan for this --- and it will be right.
>>
>
> Hmm. So it's something with my environment.
>
> Should the following setup qualify for an index scan?

The seq scan is estimated to use sequential reads, while the
index-only scan is estimated to use random reads (because the index is
scanned in logical order, not physical order).

If you set random_page_cost equal to seq_page_cost, that would
artificially favor the index only scan.

Also, your filler is highly compressible, which means the table is
much smaller than you might think.

Cheers,

Jeff

Re: 9.2 and index only scans

From

Tom Lane

Date:

26 August 2012, 19:34:38

Jeff Janes <jeff.janes@gmail.com> writes:
> On Sun, Aug 26, 2012 at 8:02 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
>> Should the following setup qualify for an index scan?

> ... Also, your filler is highly compressible, which means the table is
> much smaller than you might think.

Yeah.  I see something like 100 rows per page with this example; the
heap is 935 pages, the index 276, which makes things about a wash I/O
wise when you assume that random reads from the index will cost 4x what
sequential reads from the heap will.

You can force an index scan to occur anyway by setting enable_seqscan to
zero.  When I do that, I see an estimated cost that is marginally more
than for the seqscan, and the actual runtime is too.  I'm not sure I'd
put a whole lot of stock in that considering the example is small enough
to be fully cached, but it does show that index-only scans aren't a
magic bullet.

            regards, tom lane

Re: 9.2 and index only scans

From

Pavel Stehule

Date:

26 August 2012, 19:43:30

2012/8/26 Tom Lane <tgl@sss.pgh.pa.us>:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> On Sun, Aug 26, 2012 at 8:02 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
>>> Should the following setup qualify for an index scan?
>
>> ... Also, your filler is highly compressible, which means the table is
>> much smaller than you might think.
>
> Yeah.  I see something like 100 rows per page with this example; the
> heap is 935 pages, the index 276, which makes things about a wash I/O
> wise when you assume that random reads from the index will cost 4x what
> sequential reads from the heap will.
>
> You can force an index scan to occur anyway by setting enable_seqscan to
> zero.  When I do that, I see an estimated cost that is marginally more
> than for the seqscan, and the actual runtime is too.  I'm not sure I'd
> put a whole lot of stock in that considering the example is small enough
> to be fully cached, but it does show that index-only scans aren't a
> magic bullet.

is possible use seqscan for index? When index is small - and can be
smaller than related table.

Regards

Pavel

>
>                         regards, tom lane
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Re: 9.2 and index only scans

From

Tom Lane

Date:

26 August 2012, 19:52:39

Pavel Stehule <pavel.stehule@gmail.com> writes:
> is possible use seqscan for index?

No, not for a normal indexscan --- concurrent page splits would break
it.

VACUUM can do that, mainly because it doesn't care if it visits some
entries twice (and even then, it has to add a lot of pushups to ensure
it doesn't miss any entries).

            regards, tom lane

Re: 9.2 and index only scans

From

Thomas Kellerer

Date:

26 August 2012, 19:57:54

Jeff Janes wrote on 26.08.2012 20:45:
> The seq scan is estimated to use sequential reads, while the
> index-only scan is estimated to use random reads (because the index is
> scanned in logical order, not physical order).
>
> If you set random_page_cost equal to seq_page_cost, that would
> artificially favor the index only scan.
>
> Also, your filler is highly compressible, which means the table is
> much smaller than you might think.

I tried it also with 750000 rows filled with 3 text columns of random string (between 20 and 15000 characters long).
But also with that bigger data I just don't get an index scan.

Seems that the prerequisites for an index only scan to happen are quite narrow.
But given the fact that it's a brand new feature I guess it will improve over time ;)

Regards
Thomas

Re: 9.2 and index only scans

From

Jeff Janes

Date:

26 August 2012, 20:26:18

On Sun, Aug 26, 2012 at 12:58 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Jeff Janes wrote on 26.08.2012 20:45:
>
>> The seq scan is estimated to use sequential reads, while the
>> index-only scan is estimated to use random reads (because the index is
>> scanned in logical order, not physical order).
>>
>> If you set random_page_cost equal to seq_page_cost, that would
>> artificially favor the index only scan.
>>
>> Also, your filler is highly compressible, which means the table is
>> much smaller than you might think.
>
>
> I tried it also with 750000 rows filled with 3 text columns of random string
> (between 20 and 15000 characters long).

Could you show that in a copy-and-paste-able example?

> But also with that bigger data I just don't get an index scan.

Did you change random_page_cost?

> Seems that the prerequisites for an index only scan to happen are quite
> narrow.

Unrestricted count(*) is itself a pretty narrow use case.  It is not
the one for which IOS is most likely to prove useful.

> But given the fact that it's a brand new feature I guess it will improve
> over time ;)

When I force an index-only scan (set enable_seqscan=off) it turned out
to be very slightly slower than the sequential scan, so the planner
was getting it right, but perhaps not for the right reason.

Cheers,

Jeff

Re: 9.2 and index only scans

From

Thomas Kellerer

Date:

26 August 2012, 21:00:59

Jeff Janes wrote on 26.08.2012 22:26:
>>> The seq scan is estimated to use sequential reads, while the
>>> index-only scan is estimated to use random reads (because the index is
>>> scanned in logical order, not physical order).

Sounds like scanning the index in physical order would be an enhancement.
That could improve the performance of for queries that don't require a special order and can be retrieved from an
index.

>> I tried it also with 750000 rows filled with 3 text columns of random string
>> (between 20 and 15000 characters long).
>
> Could you show that in a copy-and-paste-able example?

Not really - it's about 70MB.

I used a Benerator script (http://databene.org/databene-benerator) to generate the test
data which is attached to this email. The table itself is pretty straight forward:

CREATE TABLE pages
(
    id          integer    NOT NULL,
    url         text,
    last_error  text,
    content     text
);

ALTER TABLE pages
    ADD CONSTRAINT pages_pkey PRIMARY KEY (id);

> Did you change random_page_cost?
Yes I did. When setting it to 3.0 the planner will indeed use an index only scan.

And it does confirm Tom's suspicion that the planner is actually correct, because the index scan is indeed slower than
theseq scan. 

I was inspired by this question on StackOverflow:
http://stackoverflow.com/questions/12128501/fastest-way-to-count-the-rows-in-any-database-table/12128545#12128545

Which shows Oracle's behaviour with an index scan for the count(*) operation.

I thought I'd try 9.2 to see how it compares, but it seems there is quite some way to go for my
beloved Postgres to catch up ;)


Thanks for all the input.

Regards
Thomas

Attachment

pages.xml

Re: 9.2 and index only scans

From

Pavel Stehule

Date:

27 August 2012, 04:05:13

2012/8/26 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> is possible use seqscan for index?
>
> No, not for a normal indexscan --- concurrent page splits would break
> it.
>

and what about seq scan for prefetch index - and processing should be
random, but over pages in cache?

Pavel

> VACUUM can do that, mainly because it doesn't care if it visits some
> entries twice (and even then, it has to add a lot of pushups to ensure
> it doesn't miss any entries).
>
>                         regards, tom lane

Re: 9.2 and index only scans

From

Martijn van Oosterhout

Date:

28 August 2012, 08:02:22

On Sun, Aug 26, 2012 at 11:01:31PM +0200, Thomas Kellerer wrote:
> I was inspired by this question on StackOverflow:
> http://stackoverflow.com/questions/12128501/fastest-way-to-count-the-rows-in-any-database-table/12128545#12128545
>
> Which shows Oracle's behaviour with an index scan for the count(*) operation.

Interesting, It shows indeed Oracle uses the index to do the operation.

For postgres it's not so simple for a few reasons, I'm not sure how
oracle avoids the same issues:

- The index has no visibility information, so you can't tell if an
  index entry refers to a row you can actually see in your session.
  The visibility map might help here in the future.

- Different versions of the same row (after an UPDATE for example) may
  both be in the index, Now if you're counting a primary key column you
  can work around that.

But frankly, counting all the rows in a table is something I never do.
The system tables carry estimates which have proved good enough for
statistical purposes when I need them.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

Attachment

signature.asc

Re: 9.2 and index only scans

From

Thomas Kellerer

Date:

28 August 2012, 09:51:26

Martijn van Oosterhout, 28.08.2012 10:02:
> I'm not sure how oracle avoids the same issues:
> - The index has no visibility information, so you can't tell if an
>    index entry refers to a row you can actually see in your session.
>    The visibility map might help here in the future.

In Oracle an index (entry) has the information about transactional visibility.

> - Different versions of the same row (after an UPDATE for example) may
>    both be in the index, Now if you're counting a primary key column you
>    can work around that.

This also works fine in Oracle due to the visibility information inside the index.
I did a test where I deleted and inserted a bunch of rows in the test table (in a different transaction).
The execution plan - even the real one - still used the index.

> But frankly, counting all the rows in a table is something I never do.

I agree, but I thought it was a nice example to test out this new PostgreSQL feature after seeing the SO question.

Regards
Thomas

Re: 9.2 and index only scans

From

Craig Ringer

Date:

28 August 2012, 13:05:14

On 08/28/2012 05:51 PM, Thomas Kellerer wrote:
> Martijn van Oosterhout, 28.08.2012 10:02:
>> I'm not sure how oracle avoids the same issues:
>> - The index has no visibility information, so you can't tell if an
>>    index entry refers to a row you can actually see in your session.
>>    The visibility map might help here in the future.
>
> In Oracle an index (entry) has the information about transactional
> visibility.

Wow. Doesn't that mean that indexes are insanely expensive to update,
since each index (and possibly also the table its self) needs updating?

I can see that making sense for index-oriented tables, but otherwise ...
ugh.

--
Craig Ringer

Re: 9.2 and index only scans

From

Thomas Kellerer

Date:

28 August 2012, 13:43:38

Craig Ringer, 28.08.2012 15:04:

>> In Oracle an index (entry) has the information about transactional visibility.
>
> Wow. Doesn't that mean that indexes are insanely expensive to update,
> since each index (and possibly also the table its self) needs
> updating?
>
No, I don't think so.

It's the same mechanism that is used to maintain visibility for table rows.

This information is part of the database block. Blocks for indexes are treated the same way as blocks for tables, so
it'sa uniform implementation. 

It's maintained in a structure called "ITL" (Interested Transaction List) but I think this is getting a bit off-topic
now;) 

If you are interested, this is documented in the Oracle concepts manual:
http://docs.oracle.com/cd/E11882_01/server.112/e25789/consist.htm

Thomas

Re: 9.2 and index only scans

From

Chris Travers

Date:

28 August 2012, 13:55:28

On Tue, Aug 28, 2012 at 6:04 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:

On 08/28/2012 05:51 PM, Thomas Kellerer wrote:
Martijn van Oosterhout, 28.08.2012 10:02:
I'm not sure how oracle avoids the same issues:
- The index has no visibility information, so you can't tell if an
index entry refers to a row you can actually see in your session.
The visibility map might help here in the future.

In Oracle an index (entry) has the information about transactional
visibility.

Wow. Doesn't that mean that indexes are insanely expensive to update, since each index (and possibly also the table its self) needs updating?

I was thinking of read performance. But it might be a case where the global optimization might be worth the local cost. I don't know.

Best Wishes,

Chris Travers

Re: 9.2 and index only scans

From

Tom Lane

Date:

28 August 2012, 14:30:15

Thomas Kellerer <spam_eater@gmx.net> writes:
> Martijn van Oosterhout, 28.08.2012 10:02:
>> I'm not sure how oracle avoids the same issues:
>> - The index has no visibility information, so you can't tell if an
>> index entry refers to a row you can actually see in your session.
>> The visibility map might help here in the future.

> In Oracle an index (entry) has the information about transactional visibility.

You sure about that?

What I always understood about Oracle is that the main table (and by
implication, also the indexes) has *only* the most up-to-date version
of any row.  If you want back versions, you have to go trawling for them
in the rollback segments.  That approach has its pluses and minuses
compared to ours, but it doesn't seem to lead to keeping visibility info
in the index.

            regards, tom lane

Re: 9.2 and index only scans

From

Thomas Kellerer

Date:

28 August 2012, 14:55:33

Tom Lane, 28.08.2012 16:30:
>> In Oracle an index (entry) has the information about transactional visibility.
>
> You sure about that?

Yes, although technically it's not the index *entry*, but the index *block*.
But the result is the same thing.

The visibility information is stored on data block level. And an index block is not really different to a table block
whenit comes to transaction handling. 

> What I always understood about Oracle is that the main table (and by
> implication, also the indexes) has *only* the most up-to-date version
> of any row.

Yes that is true. The block stores a pointer to the "old" versions (through the so called SCN - comparable somehow to
PG'stxid). 

Depending on the transaction reading the block it will either follow that link (if it's a different transaction) or use
thatblock (if it's the transaction that modified the block in the first place). 

This is somewhat simplified, but I think it's a good enough picture.

If you are interested in more details, see the link to the concepts manual I posted in the response to Craig

Thomas