Thread: help understanding analyze
Hi all, excuse me for this trivial question, but here's my doubt: create table person(varchar id, varchar surname, varchar name) with id primary key. Now, the query: select * from person order by surname,name provide me an explaination that is sequential scan + sort, as I expected. After that I build an index on surname,name (clustered) and run vacuum to update statistics. Then I ran again the query and got the same results (scan + sort) with the same time. Now my trivial question is: why another sort? The index is clustered so the database should not need to sort the output, or am I using wrong the tools? Someone can explain me that? Thanks, Luca
Luca Ferrari <fluca1978@infinito.it> wrote: > > Hi all, > excuse me for this trivial question, but here's my doubt: > create table person(varchar id, varchar surname, varchar name) > with id primary key. Now, the query: > select * from person order by surname,name > provide me an explaination that is sequential scan + sort, as I expected. > After that I build an index on surname,name (clustered) and run vacuum to > update statistics. Then I ran again the query and got the same results (scan > + sort) with the same time. > Now my trivial question is: why another sort? The index is clustered so the > database should not need to sort the output, or am I using wrong the tools? > Someone can explain me that? I doubt that the planner has any way to know that the table, at any point in time, is still 100% clustered. If even one row has been added since the cluster was done, the table will need resorted. Might be an optimization that could be done, except that I expect there will be very few cases where it will actually make a difference. How often do you have a table that never changes and can always be assured of being in index order? -Bill
Luca Ferrari <fluca1978@infinito.it> writes: > excuse me for this trivial question, but here's my doubt: > create table person(varchar id, varchar surname, varchar name) > with id primary key. Now, the query: > select * from person order by surname,name > provide me an explaination that is sequential scan + sort, as I expected. > After that I build an index on surname,name (clustered) and run vacuum to > update statistics. Then I ran again the query and got the same results (scan > + sort) with the same time. > Now my trivial question is: why another sort? The index is clustered so the > database should not need to sort the output, or am I using wrong the tools? Well, CLUSTER does not guarantee that the data remains sorted --- as soon as you do any updates it won't be anymore. So the planner can never assume that a plain seqscan delivers correctly sorted output. The real question you should be asking in the above case is why it didn't use an indexscan on that index, and the answer is probably that you didn't ANALYZE. VACUUM does not update the statistics about index correlation. regards, tom lane
On Saturday 09 December 2006 03:48 Tom Lane's cat, walking on the keyboard, wrote: > Well, CLUSTER does not guarantee that the data remains sorted --- as > soon as you do any updates it won't be anymore. So the planner can > never assume that a plain seqscan delivers correctly sorted output. And when the cluster is rebuilt? I mean, in theory, a clustered index should be sorted at any point in time, that means I've got much cost for insert/update cause I need to sort again the index when I'm performing the insert/update, isn't it? For me, at least in theory, a clustered index is always sorted. Now, assuming that my table is not changing (the number of people hired/fired is very low!), it makes sense to me use a clustered index cause I should not have the cost of insert/update but should have better performances. Maybe I cannot understand something... > > The real question you should be asking in the above case is why it > didn't use an indexscan on that index, and the answer is probably > that you didn't ANALYZE. VACUUM does not update the statistics > about index correlation. I did run analyze, and the explain shows me the seq scan and then a sort. The only difference I've seen between a only vacuum and a analyze is that the seq. scan cost changes, but the final cost (i.e., seq. scan and sort) is the same either with or without the index. This is the point I cannot understand. And of course, as you stated, the problem is that the system is not considering the created index (of course I can suggest it within the select statement), and I don't know why. Any explaination? Thanks, Luca
On Sat, Dec 09, 2006 at 11:35:39AM +0100, Luca Ferrari wrote: > And when the cluster is rebuilt? I mean, in theory, a clustered index should > be sorted at any point in time, that means I've got much cost for > insert/update cause I need to sort again the index when I'm performing the > insert/update, isn't it? For me, at least in theory, a clustered index is > always sorted. Now, assuming that my table is not changing (the number of > people hired/fired is very low!), it makes sense to me use a clustered index > cause I should not have the cost of insert/update but should have better > performances. Maybe I cannot understand something... I think you're confused about what CLUSTER does. There's is no such thing as a "clustered index". An index is always organised in some way, if it's a b-tree index it has the information of the key in sorted order. When you cluster a table it rearranges the data so it is in the same order as the index. But it's not kept that way. The index is kept sorted but the data is not. > I did run analyze, and the explain shows me the seq scan and then a sort. The > only difference I've seen between a only vacuum and a analyze is that the > seq. scan cost changes, but the final cost (i.e., seq. scan and sort) is the > same either with or without the index. This is the point I cannot understand. If it doesn't say "Index Scan" it's not using the index. At a guess your table is not big enough to make an index worthwhile. If your table is only a few pages long, it's just not efficient to lookup an index first. If you post the results of EXPLAIN ANALYZE we can tell you for sure. Have a ncie day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
Martijn van Oosterhout <kleptog@svana.org> writes: > At a guess your table is not big enough to make an index worthwhile. If > your table is only a few pages long, it's just not efficient to lookup > an index first. > If you post the results of EXPLAIN ANALYZE we can tell you for sure. Actually, it would be interesting to see EXPLAIN ANALYZE for the query both with enable_sort = on and enable_sort = off. That would show what the planner thinks the relative costs are as well as what the true costs are. regards, tom lane
You will need to first ANALYZE the table so that the stats are updated, and then it depends on the number of records in the table as well because if the number is low then a Sequence will be much better cost wise then choosing an index scan...
Thanks,
---------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
Thanks,
---------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
On 12/8/06, Luca Ferrari < fluca1978@infinito.it> wrote:
Hi all,
excuse me for this trivial question, but here's my doubt:
create table person(varchar id, varchar surname, varchar name)
with id primary key. Now, the query:
select * from person order by surname,name
provide me an explaination that is sequential scan + sort, as I expected.
After that I build an index on surname,name (clustered) and run vacuum to
update statistics. Then I ran again the query and got the same results (scan
+ sort) with the same time.
Now my trivial question is: why another sort? The index is clustered so the
database should not need to sort the output, or am I using wrong the tools?
Someone can explain me that?
Thanks,
Luca
---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
http://archives.postgresql.org/
On Saturday 09 December 2006 11:46 Martijn van Oosterhout's cat, walking on the keyboard, wrote: > If it doesn't say "Index Scan" it's not using the index. > > At a guess your table is not big enough to make an index worthwhile. If > your table is only a few pages long, it's just not efficient to lookup > an index first. Dear Martijn, thanks for your explainations! I have checked the siz eof my table and, despite the number of rows, it has less pages then the index, thus the seq. scan is of course the faster one! Now I begin to understand a little better how the optimizer works. Thanks everyone, Luca