Thread: help understanding analyze

help understanding analyze

From

Luca Ferrari

Date:

09 December 2006, 00:44:19

Hi all,
excuse me for this trivial question, but here's my doubt:
create table person(varchar id, varchar surname, varchar name)
with id primary key. Now, the query:
select * from person order by surname,name
provide me an explaination that is sequential scan + sort, as I expected.
After that I build an index on surname,name (clustered) and run vacuum to
update statistics. Then I ran again the query and got the same results (scan
+ sort) with the same time.
Now my trivial question is: why another sort? The index is clustered so the
database should not need to sort the output, or am I using wrong the tools?
Someone can explain me that?

Thanks,
Luca

Re: help understanding analyze

From

Bill Moran

Date:

09 December 2006, 01:47:26

Luca Ferrari <fluca1978@infinito.it> wrote:
>
> Hi all,
> excuse me for this trivial question, but here's my doubt:
> create table person(varchar id, varchar surname, varchar name)
> with id primary key. Now, the query:
> select * from person order by surname,name
> provide me an explaination that is sequential scan + sort, as I expected.
> After that I build an index on surname,name (clustered) and run vacuum to
> update statistics. Then I ran again the query and got the same results (scan
> + sort) with the same time.
> Now my trivial question is: why another sort? The index is clustered so the
> database should not need to sort the output, or am I using wrong the tools?
> Someone can explain me that?

I doubt that the planner has any way to know that the table, at any point
in time, is still 100% clustered.  If even one row has been added since
the cluster was done, the table will need resorted.

Might be an optimization that could be done, except that I expect there
will be very few cases where it will actually make a difference.  How
often do you have a table that never changes and can always be assured
of being in index order?

-Bill

Re: help understanding analyze

From

Tom Lane

Date:

09 December 2006, 01:48:31

Luca Ferrari <fluca1978@infinito.it> writes:
> excuse me for this trivial question, but here's my doubt:
> create table person(varchar id, varchar surname, varchar name)
> with id primary key. Now, the query:
> select * from person order by surname,name
> provide me an explaination that is sequential scan + sort, as I expected.
> After that I build an index on surname,name (clustered) and run vacuum to
> update statistics. Then I ran again the query and got the same results (scan
> + sort) with the same time.
> Now my trivial question is: why another sort? The index is clustered so the
> database should not need to sort the output, or am I using wrong the tools?

Well, CLUSTER does not guarantee that the data remains sorted --- as
soon as you do any updates it won't be anymore.  So the planner can
never assume that a plain seqscan delivers correctly sorted output.

The real question you should be asking in the above case is why it
didn't use an indexscan on that index, and the answer is probably
that you didn't ANALYZE.  VACUUM does not update the statistics
about index correlation.

            regards, tom lane

Re: help understanding analyze

From

Luca Ferrari

Date:

09 December 2006, 09:35:44

On Saturday 09 December 2006 03:48 Tom Lane's cat, walking on the keyboard,
wrote:
> Well, CLUSTER does not guarantee that the data remains sorted --- as
> soon as you do any updates it won't be anymore.  So the planner can
> never assume that a plain seqscan delivers correctly sorted output.

And when the cluster is rebuilt? I mean, in theory, a clustered index should
be sorted at any point in time, that means I've got much cost for
insert/update cause I need to sort again the index when I'm performing the
insert/update, isn't it? For me, at least in theory, a clustered index is
always sorted. Now, assuming that my table is not changing (the number of
people hired/fired is very low!), it makes sense to me use a clustered index
cause I should not have the cost of insert/update but should have better
performances. Maybe I cannot understand something...

>
> The real question you should be asking in the above case is why it
> didn't use an indexscan on that index, and the answer is probably
> that you didn't ANALYZE.  VACUUM does not update the statistics
> about index correlation.

I did run analyze, and the explain shows me the seq scan and then a sort. The
only difference I've seen between a only vacuum and a analyze is that the
seq. scan cost changes, but the final cost (i.e., seq. scan and sort) is the
same either with or without the index. This is the point I cannot understand.
And of course, as you stated, the problem is that the system is not
considering the created index (of course I can suggest it within the select
statement), and I don't know why.
Any explaination?

Thanks,
Luca

Re: help understanding analyze

From

Martijn van Oosterhout

Date:

09 December 2006, 09:46:07

On Sat, Dec 09, 2006 at 11:35:39AM +0100, Luca Ferrari wrote:
> And when the cluster is rebuilt? I mean, in theory, a clustered index should
> be sorted at any point in time, that means I've got much cost for
> insert/update cause I need to sort again the index when I'm performing the
> insert/update, isn't it? For me, at least in theory, a clustered index is
> always sorted. Now, assuming that my table is not changing (the number of
> people hired/fired is very low!), it makes sense to me use a clustered index
> cause I should not have the cost of insert/update but should have better
> performances. Maybe I cannot understand something...

I think you're confused about what CLUSTER does. There's is no such
thing as a "clustered index". An index is always organised in some way,
if it's a b-tree index it has the information of the key in sorted
order. When you cluster a table it rearranges the data so it is in the
same order as the index. But it's not kept that way. The index is kept
sorted but the data is not.

> I did run analyze, and the explain shows me the seq scan and then a sort. The
> only difference I've seen between a only vacuum and a analyze is that the
> seq. scan cost changes, but the final cost (i.e., seq. scan and sort) is the
> same either with or without the index. This is the point I cannot understand.

If it doesn't say "Index Scan" it's not using the index.

At a guess your table is not big enough to make an index worthwhile. If
your table is only a few pages long, it's just not efficient to lookup
an index first.

If you post the results of EXPLAIN ANALYZE we can tell you for sure.

Have a ncie day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

signature.asc

Re: help understanding analyze

From

Tom Lane

Date:

09 December 2006, 16:49:51

Martijn van Oosterhout <kleptog@svana.org> writes:
> At a guess your table is not big enough to make an index worthwhile. If
> your table is only a few pages long, it's just not efficient to lookup
> an index first.
> If you post the results of EXPLAIN ANALYZE we can tell you for sure.

Actually, it would be interesting to see EXPLAIN ANALYZE for the query
both with enable_sort = on and enable_sort = off.  That would show what
the planner thinks the relative costs are as well as what the true costs
are.

            regards, tom lane

Re: help understanding analyze

From

"Shoaib Mir"

Date:

10 December 2006, 18:40:31

You will need to first ANALYZE the table so that the stats are updated, and then it depends on the number of records in the table as well because if the number is low then a Sequence will be much better cost wise then choosing an index scan...

Thanks,
---------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 12/8/06, Luca Ferrari < fluca1978@infinito.it> wrote:

Hi all,
excuse me for this trivial question, but here's my doubt:
create table person(varchar id, varchar surname, varchar name)
with id primary key. Now, the query:
select * from person order by surname,name
provide me an explaination that is sequential scan + sort, as I expected.
After that I build an index on surname,name (clustered) and run vacuum to
update statistics. Then I ran again the query and got the same results (scan
+ sort) with the same time.
Now my trivial question is: why another sort? The index is clustered so the
database should not need to sort the output, or am I using wrong the tools?
Someone can explain me that?

Thanks,
Luca

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/

Re: help understanding analyze

From

Luca Ferrari

Date:

10 December 2006, 19:12:28

On Saturday 09 December 2006 11:46 Martijn van Oosterhout's cat, walking on
the keyboard, wrote:
> If it doesn't say "Index Scan" it's not using the index.
>
> At a guess your table is not big enough to make an index worthwhile. If
> your table is only a few pages long, it's just not efficient to lookup
> an index first.

Dear Martijn,
thanks for your explainations! I have checked the siz eof my table and,
despite the number of rows, it has less pages then the index, thus the seq.
scan is of course the faster one!
Now I begin to understand a little better how the optimizer works.

Thanks everyone,
Luca