Thread: does "select count(*) from mytable" always do a seq scan?
Hi,

I am using PostgreSQL 7.4. I have a table with 1.5 million rows. It has a
primary key. VACUUM FULL ANALYZE is run every night. There are 2000-5000
inserts on this table every day but very few updates and deletes. When I
select count(*) from this table it uses a sequential scan. Is this just
life, or is there some way to get this to do an index scan?

culley
On Friday, 2005-01-07 at 06:45 -0800, Culley Harrelson wrote:
> When I select count(*) from this table it uses a sequential scan. Is
> this just life, or is there some way to get this to do an index scan?

How do you think an index would help if you do an unconditional count(*)?

Regards
Tino
On Fri, Jan 07, 2005 at 16:17:16 +0100, Tino Wildenhain <tino@wildenhain.de> wrote:
> How do you think an index would help if you do an unconditional
> count(*)?

Some systems can just run through the index without having to access the
tuples. This can result in having to read significantly fewer disk blocks
to get the count. Unfortunately, Postgres still needs to check visibility
for each tuple, so using an index scan for count will be slower than a
sequential scan if a significant fraction of the table is being counted.

If an approximate answer is OK, there is some information calculated when
you vacuum a table, and you could query this value in the pg catalog. I
don't remember the name of what you want, but it should be in the
archives.

Another solution is to use a trigger to keep a count in another table.
From what you say above, this might be a practical solution for you.
Doing this has also been discussed in the archives.
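A minimal sketch of the trigger approach Bruno describes, assuming the
table from the original post is called mytable, that the plpgsql language
is installed (createlang plpgsql), and with made-up names for the counter
table and trigger function:

    -- One-row table holding the current count; seed it once.
    CREATE TABLE mytable_count (n bigint NOT NULL);
    INSERT INTO mytable_count SELECT count(*) FROM mytable;

    -- Keep the counter in step with inserts and deletes.
    -- (Pre-8.0 syntax: function body in single quotes, inner quotes
    -- doubled.)
    CREATE FUNCTION maintain_mytable_count() RETURNS trigger AS '
    BEGIN
        IF TG_OP = ''INSERT'' THEN
            UPDATE mytable_count SET n = n + 1;
        ELSIF TG_OP = ''DELETE'' THEN
            UPDATE mytable_count SET n = n - 1;
        END IF;
        RETURN NULL;  -- return value of an AFTER trigger is ignored
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER mytable_count_trig
        AFTER INSERT OR DELETE ON mytable
        FOR EACH ROW EXECUTE PROCEDURE maintain_mytable_count();

    -- Exact count without touching mytable at all:
    SELECT n FROM mytable_count;

One caveat: every writer updates the same counter row, which serializes
concurrent inserts; at a few thousand inserts a day that should be
harmless.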
On Fri, 07 Jan 2005 16:17:16 +0100, Tino Wildenhain <tino@wildenhain.de> wrote:
> How do you think an index would help if you do an unconditional
> count(*)?

I really don't know <grin>. I don't know the inner workings of database
internals, but I would guess that there would be some optimized way of
counting the nodes in an index tree that would be faster than sequentially
going through a table. I suppose there is no free lunch.

One row, two rows, three rows, four rows, five rows... <snore>

culley
This is interesting... Perhaps a more knowledgeable pgsql person could
help us here. I seem to remember something to do with the fact that you
can't use aggregate functions over an index; I'm not sure why, though.
You can do:

    create index foo on my_table (lower(my_column))

but not

    create index foo on my_table (min(my_column))

I guess it wouldn't be much of an index - it would be a single value. You
could reproduce that functionality with a trigger that updated a table
containing the value of min(my_column), and I guess you could do the same
for count(*) too.

I guess what I'm really asking is: why can't you run aggregates over an
index?

Alex Turner
NetEconomist
> I guess what I"m really asking is why can't you run aggregates over an index? It's got to do with MVCC and transaction consistency. Running count(*) or an aggregate function on an index could include records that should not be visible to your current transaction. -- Scott Ribe scott_ribe@killerbytes.com http://www.killerbytes.com/ (303) 665-7007 voice
No offense or anything, but that doesn't make any sense. If you are
running count(*) against a table, it still has to worry about MVCC and
which rows are visible to your transaction. What difference does it make,
table or index? The system still has to figure out which rows are visible
in the current transaction, so why not use the index?

(The example is really count(pkey), because count(*) is always going to do
a seq scan, I reckon - and could probably never use an index.)

Alex Turner
NetEconomist
> No offense or anything, but that doesn't make any sense. If you are
> running count(*) against a table, it still has to worry about MVCC and
> which rows are visible to your transaction.

Your mistake seems to be assuming that row visibility is tracked in the
index. As was stated earlier in the thread, row visibility information is
not available in the index, so the rows themselves have to be looked at to
determine whether they're visible. This means that using the index would
only add an unnecessary extra step.

> (The example is really count(pkey), because count(*) is always going to
> do a seq scan, I reckon - and could probably never use an index.)

No. If there is an index on a column that is required, such as a primary
key, then count(pkey) is equal to count(*). Many databases use this fact
to optimize count(*) with an index scan.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 665-7007 voice
> (The example is really count(pkey), because count(*) is always going to
> do a seq scan, I reckon - and could probably never use an index.)

Postgres knows that count(*) just means "count the rows"; you can use
count(1), it makes no difference.
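A quick illustration of the difference, and of why Scott's equivalence
needs the column to be NOT NULL (throwaway table, made-up names):

    CREATE TABLE t (pkey int PRIMARY KEY, val int);
    INSERT INTO t VALUES (1, 10);
    INSERT INTO t VALUES (2, NULL);

    SELECT count(*)    FROM t;  -- 2: counts rows
    SELECT count(1)    FROM t;  -- 2: same thing
    SELECT count(pkey) FROM t;  -- 2: pkey can never be NULL
    SELECT count(val)  FROM t;  -- 1: NULLs are not counted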
Forgive my ignorance, but I'm still learning about much of this stuff. If
you perform:

    select an_id, int_value from my_table where int_value > 400;

and the table has an index on int_value and there are enough rows to
warrant using it, doesn't the database perform an index scan on int_value
followed by a retrieve of the data blocks with the relevant oids to get
the an_id field?

If another transaction has inserted rows into this table, won't the index
have been updated and contain new row references? Does this imply that the
database must retrieve the row information to determine if the row is a
row from a different transaction?

thanks,
Alex Turner
On Mon, Jan 10, 2005 at 10:26:46 -0500, Alex Turner <armtuk@gmail.com> wrote:
> If you perform:
>
>     select an_id, int_value from my_table where int_value > 400;
>
> and the table has an index on int_value and there are enough rows to
> warrant using it, doesn't the database perform an index scan on
> int_value followed by a retrieve of the data blocks with the relevant
> oids to get the an_id field?

I don't think that oids are used in the process, but if the planner thinks
an index scan would be better, it will use one.

> If another transaction has inserted rows into this table, won't the
> index have been updated and contain new row references? Does this imply
> that the database must retrieve the row information to determine if the
> row is a row from a different transaction?

When doing an index scan, the heap tuples still need to be checked for
visibility to the current transaction.
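For anyone following along, EXPLAIN ANALYZE shows which plan was actually
run; using Alex's example names:

    -- Should show an Index Scan if the planner judges it cheaper:
    EXPLAIN ANALYZE
        SELECT an_id, int_value FROM my_table WHERE int_value > 400;

    -- On 7.4 this one comes back as an Aggregate over a Seq Scan:
    EXPLAIN ANALYZE
        SELECT count(*) FROM my_table;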
I'm no database-writing guru, but wouldn't it just be a matter of adding a
transaction number to an index entry so as to determine its newness, and
only retrieving entries with an older transaction number?

I'm guessing that the theory is that most insert transactions will be
committed, or will only contain a small number of rows relative to the
overall size of the table, and therefore the extra overhead of checking
newer tuples won't impact the overall performance that much?

I know I'm asking kind of deep questions that really don't affect much of
anything, but I'm a devilishly curious individual, and I like
understanding the things that I use well. Feel free to tell me that it's
irrelevant, or that I'm full of hot air and I don't have a good question ;)

Alex Turner
NetEconomist
On Mon, Jan 10, 2005 at 11:51:51AM -0500, Alex Turner wrote:
> I'm no database-writing guru, but wouldn't it just be a matter of adding
> a transaction number to an index entry so as to determine its newness,
> and only retrieving entries with an older transaction number?

No, it's more complex than that. Index entries would have to be labeled
with both a creation transaction id and a destruction transaction id (xmin
and xmax; probably it'd also need cmin and cmax to be completely
consistent). Keeping them in sync would be prone to deadlock, because it
would have to simultaneously update the table proper and the possibly
multiple indexes. Plus, having all those identifiers in the index file
would imply more I/O costs.

> I'm guessing that the theory is that most insert transactions will be
> committed, or will only contain a small number of rows relative to the
> overall size of the table, and therefore the extra overhead of checking
> newer tuples won't impact the overall performance that much?

The choice is yours (or whoever's): if you absolutely need exact numbers,
you pay the cost of having a trigger. OTOH if you can do with estimates,
you can use the reltuples column from pg_class.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"There is evil in the world. There are dark, awful things. Occasionally,
we get a glimpse of them. But there are dark corners; horrors almost
impossible to imagine... even in our worst nightmares."
(Van Helsing, Dracula A.D. 1972)
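The estimate Alvaro means is the one VACUUM and ANALYZE store in the
system catalog, so the approximate answer for the original poster's table
(assuming it is named mytable) is just:

    -- Updated each time the table is vacuumed or analyzed:
    SELECT reltuples FROM pg_class WHERE relname = 'mytable';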
On Mon, Jan 10, 2005 at 11:51:51 -0500, Alex Turner <armtuk@gmail.com> wrote:
> I'm no database-writing guru, but wouldn't it just be a matter of adding
> a transaction number to an index entry so as to determine its newness,
> and only retrieving entries with an older transaction number?

No, because transactions don't complete in the order they are started, so
you would also need to maintain some kind of list to cover the exceptions.

> I know I'm asking kind of deep questions that really don't affect much
> of anything, but I'm a devilishly curious individual, and I like
> understanding the things that I use well.

There have been discussions in the past about why the core developers feel
that moving visibility status into indexes would be a net loss on average.
I don't think there has been one for a while, but you can try searching
the hackers archive.
On Mon, Jan 10, 2005 at 11:23:42AM -0600, Bruno Wolff III wrote:
> There have been discussions in the past about why the core developers
> feel that moving visibility status into indexes would be a net loss on
> average. I don't think there has been one for a while, but you can try
> searching the hackers archive.

I wonder if a technical discussion, or a link to one, could find its way
into the FAQ. Has anybody ever put together a single document that
describes in detail the issue, the various proposed solutions and their
tradeoffs, and why the developers think the chosen implementation is best?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/