Thread: no index usage on aggregate functions?
Hi,

I've experienced that PG up to the current release does not make use of an index when aggregating, which of course may result in unacceptable response times.

This behaviour is reproducible on any table with any aggregate function in all of my databases on every machine (PostgreSQL 7.4.2 on i386-redhat-linux-gnu and PostgreSQL 7.2.1 on i686-pc-linux-gnu).

E.g. querying against a 2.8 million record (2,800,000) table the_table:

SELECT count(*) FROM the_table
=> Seq scan -> takes about 12 sec

SELECT Avg(num_found) AS NumFound FROM the_table  --(index on num_found)
=> Seq scan -> takes about 10 sec

SELECT Sum(num_found) AS TotalFound FROM the_table  --(index on num_found)
=> Seq scan -> takes about 11 sec

SELECT Max(date_) AS LatestDate FROM the_table  --(index on date_)
=> Seq scan -> takes about 14 sec

But
SELECT date_ AS LatestDate FROM the_table ORDER BY date_ DESC LIMIT 1;
=> Index scan -> takes 0.18 msec

MS SQLServer 2000: use of an appropriate index _whenever_ aggregating.

Am I doing something wrong?

Greetings
Harald
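For anyone reproducing this: EXPLAIN ANALYZE (available since PostgreSQL 7.2) shows the plan actually chosen and the real runtime, which is more precise than timing by hand. A minimal sketch against the table described above:

EXPLAIN ANALYZE SELECT max(date_) AS LatestDate FROM the_table;
-- expect a Seq Scan + Aggregate node here
EXPLAIN ANALYZE SELECT date_ AS LatestDate FROM the_table ORDER BY date_ DESC LIMIT 1;
-- expect a Limit over an (backward) Index Scan here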
On Tue, 2004-06-29 at 00:42, Harald Lau (Sector-X) wrote:
> I've experienced that PG up to the current release does not make use
> of an index when aggregating. [...]
> Am I doing something wrong?

Yes, you're expecting an MVCC database to behave like a row-locking database. Due to the way PostgreSQL is put together, it can't count on an index giving it values, only pointers to values, so to speak. This means it can use an index, but it will still go to the table to get the right value.

On the other hand, the trade-off is that MVCC can handle much higher parallel loads, usually.

Note that if your aggregate is on a subset of a table, then an index scan can often be a big win, such as:

create table z (a int, b int);
insert into z values (1, 1);  -- (repeat a couple thousand times)

select avg(b) from z where a = 3;  -- this can use the index

But note that in the above, the table's rows will still have to be accessed to get the right values.
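A runnable version of that sketch, with made-up names and row counts; generate_series() only appeared in PostgreSQL 8.0, so on the 7.x releases in this thread you'd populate the table with a loop or repeated inserts instead:

create table z (a int, b int);
-- ~100 rows per distinct value of a, 10,000 rows total
insert into z select i % 100, i from generate_series(1, 10000) as g(i);
create index z_a_idx on z (a);
analyze z;
explain analyze select avg(b) from z where a = 3;
-- expect an Index Scan on z_a_idx; the heap is still visited per match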
> E.g. querying against a 2.8 million record (2,800,000) table the_table
> SELECT count(*) FROM the_table
> => Seq scan -> takes about 12 sec

This cannot be made O(1) in postgres due to MVCC. You just have to live with it.

> SELECT Avg(num_found) AS NumFound FROM the_table  --(index on num_found)
> => Seq scan -> takes about 10 sec
>
> SELECT Sum(num_found) AS TotalFound FROM the_table  --(index on num_found)
> => Seq scan -> takes about 11 sec

Average and sum can never use an index AFAIK, in any db server. You need information from every row.

> SELECT Max(date_) AS LatestDate FROM the_table  --(index on date_)
> => Seq scan -> takes about 14 sec

Yep, that's due to postgresql's type extensibility. You should use the workaround you point out below.

> But
> SELECT date_ AS LatestDate FROM the_table ORDER BY date_ DESC LIMIT 1;
> => Index scan -> takes 0.18 msec
>
> MS SQLServer 2000: use of an appropriate index _whenever_ aggregating.
>
> Am I doing something wrong?

Nope.

Chris
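The same ORDER BY ... LIMIT 1 rewrite covers min() as well, with one caveat worth hedging on: PostgreSQL sorts NULLs after non-NULLs in ascending order and before them in descending order, so on a nullable column the max() rewrite needs a NULL filter to return the same answer as the aggregate (max() ignores NULLs, ORDER BY DESC does not):

-- min() equivalent
SELECT date_ FROM the_table ORDER BY date_ ASC LIMIT 1;

-- max() equivalent, safe when date_ can be NULL
SELECT date_ FROM the_table WHERE date_ IS NOT NULL ORDER BY date_ DESC LIMIT 1;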
@Chris:

> > SELECT count(*) FROM the_table
> > => Seq scan -> takes about 12 sec
> This cannot be made O(1) in postgres due to MVCC. You just have to live
> with it.

Bad news.
BTW: in this case you could work around it with

select reltuples from pg_class where relname = 'the_table';

(yes, I know: presumes a regular vacuum analyze)

> Average and sum can never use an index AFAIK, in any db server. You
> need information from every row.

Take a look at the SQL Server counterpart:

create index x_1 on the_table (num_found)
select avg(num_found) from the_table
-> Index Scan(OBJECT:([midata].[dbo].[THE_TABLE].[x_1])

(I'm not sure what Oracle does - have to re-install it first ...)

@Scott:
> Yes, you're expecting an MVCC database to behave like a row-locking
> database.

Hmmmm ...
So it seems that PG is not soooo well suited for a data warehouse and/or performing extensive statistics/calculations/reporting on large tables, is it?

Greetings
Harald
On Tue, 29 Jun 2004, Harald Lau (Sector-X) wrote:

> > Average and sum can never use an index AFAIK, in any db server. You
> > need information from every row.
>
> Take a look at the SQL Server counterpart:
> create index x_1 on the_table (num_found)
> select avg(num_found) from the_table
> -> Index Scan(OBJECT:([midata].[dbo].[THE_TABLE].[x_1])

But the question is whether it is really faster. This sum needs all the values in that column. As far as I know it uses the index because the index takes less space on disk and is thus a little faster to read due to less IO. In pg the index doesn't work like that, so in pg it's faster to sum all values using the table itself.

If you have a WHERE clause to only sum some values, then pg will use an index (if applicable) and you will see a speedup.

For min and max the situation is different: there an index can give you the answer without scanning all rows. For that, the workaround exists in pg. The pg aggregate functions are very general and no one has special-cased min/max yet. Until that happens, the workaround works and is fast.

> So it seems that PG is not soooo well suited for a data warehouse and/or
> performing extensive statistics/calculations/reporting on large tables,
> is it?

I don't see how you can say that from your example. Just because it uses an index for the sum above does not mean that it is a lot faster. It still has to do as many additions as pg has to do.

Sure, MVCC is best when you have both reads and writes. But it should still be comparable in speed even if you only do reads.

--
/Dennis Björklund
On Tue, Jun 29, 2004 at 10:46:27 +0200,
  "Harald Lau (Sector-X)" <harald@sector-x.de> wrote:
>
> Hmmmm ...
> So it seems that PG is not soooo well suited for a data warehouse and/or performing extensive statistics/calculations/reporting on large tables, is it?

If you are doing lots of selects of aggregates relative to the number of updates, you can cache the values of interest in derived tables and use triggers to keep those tables up to date.
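A minimal sketch of that approach for count(*) and sum(num_found), with made-up names; it assumes PL/pgSQL is installed in the database (createlang plpgsql on the 7.x releases discussed here) and RETURNS trigger needs 7.3 or later. The single summary row serializes concurrent writers, which is exactly the update cost mentioned in the next reply:

-- one-row summary table, seeded from the current contents
CREATE TABLE the_table_summary (n bigint NOT NULL, total bigint NOT NULL);
INSERT INTO the_table_summary
  SELECT count(*), coalesce(sum(num_found), 0) FROM the_table;

-- row-level trigger keeps the summary in sync
CREATE FUNCTION the_table_summary_trg() RETURNS trigger AS '
BEGIN
  IF TG_OP = ''INSERT'' THEN
    UPDATE the_table_summary
       SET n = n + 1, total = total + coalesce(NEW.num_found, 0);
  ELSIF TG_OP = ''DELETE'' THEN
    UPDATE the_table_summary
       SET n = n - 1, total = total - coalesce(OLD.num_found, 0);
  ELSE  -- UPDATE of num_found
    UPDATE the_table_summary
       SET total = total - coalesce(OLD.num_found, 0)
                         + coalesce(NEW.num_found, 0);
  END IF;
  RETURN NULL;  -- AFTER trigger, return value is ignored
END;
' LANGUAGE plpgsql;

CREATE TRIGGER the_table_summary_upd
  AFTER INSERT OR UPDATE OR DELETE ON the_table
  FOR EACH ROW EXECUTE PROCEDURE the_table_summary_trg();

-- count(*), sum() and avg() then become a single-row read
-- (guard against n = 0 in real use):
SELECT n, total, total::numeric / n AS avg_found FROM the_table_summary;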
On Tue, 2004-06-29 at 02:46, Harald Lau (Sector-X) wrote:
> @Chris:
>
> > > SELECT count(*) FROM the_table
> > > => Seq scan -> takes about 12 sec
> > This cannot be made O(1) in postgres due to MVCC. You just have to live
> > with it.
>
> Bad news.
> BTW: in this case you could work around it with
> select reltuples from pg_class where relname = 'the_table'
> (yes, I know: presumes a regular vacuum analyze)

Note that there ARE other options. While the inability to provide a speedy count is a "cost" of using an MVCC system, the ability to allow thousands of readers to run while updates are happening underneath them more than makes up for the slower aggregate performance.

The other option to this problem involves maintaining another table that has a single (visible) row, kept current by a trigger on the main table that fires and updates that single row to reflect the count of the table. This is costly on updates, but may be worth doing for certain situations.

Personally, I haven't had a great need to do a count(*) on my tables that much. And on large tables, approximations are usually fine.

> > Average and sum can never use an index AFAIK, in any db server. You
> > need information from every row.
>
> Take a look at the SQL Server counterpart:
> create index x_1 on the_table (num_found)
> select avg(num_found) from the_table
> -> Index Scan(OBJECT:([midata].[dbo].[THE_TABLE].[x_1])
>
> (I'm not sure what Oracle does - have to re-install it first ...)

There's a good chance Oracle can use the index too. That's because Oracle is still a row-locking database at heart; its MVCC system sits on top of it in rollback segments. So the main store is serialized and can be indexed, while the updates live in the rollback segment. This, however, is not paradise: it limits Oracle's performance for things like long-running transactions and makes it slower as the amount of information in the rollback segment grows.

Meanwhile, PostgreSQL uses an in-store MVCC mechanism. This means that all index accesses must then hit the actual MVCC storage, since indexes aren't easily serialized.

> @Scott:
> > Yes, you're expecting an MVCC database to behave like a row-locking
> > database.
>
> Hmmmm ...
> So it seems that PG is not soooo well suited for a data warehouse and/or performing extensive statistics/calculations/reporting on large tables, is it?

On the contrary, it makes it GREAT for data warehousing. Not because any one child process will be super fast, but because ALL the child processes will run reasonably fast, even under very heavy read and write load.

Note that if you've got the memory for the hash agg algo to fire into shared memory, it's pretty darned fast now, so if the data (mostly) fit into kernel cache you're gold. And 12 gig Intel boxes aren't that expensive, compared to an Oracle license.
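On the hash-agg point: in 7.4 the planner only considers a HashAggregate plan when it expects the hash table to fit in sort_mem (renamed work_mem in 8.0), so a reporting session can raise it locally; the value below is an arbitrary example:

-- sort_mem is in kilobytes; SET affects the current session only
SET sort_mem = 65536;  -- 64 MB (on 8.0+: SET work_mem = ...)
EXPLAIN ANALYZE SELECT num_found, count(*) FROM the_table GROUP BY num_found;
-- look for a HashAggregate node instead of Sort + GroupAggregate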
> Note that there ARE other options. While the inability to provide a
> speedy count is a "cost" of using an MVCC system, the ability to allow
> thousands of readers to run while updates are happening underneath them
> more than makes up for the slower aggregate performance.

IMO this depends on the priority of your application resp. the customer's intentions and wishes.

> This, however, is not paradise.

You can't have it all ;-)

> On the contrary, it makes it GREAT for data warehousing. Not because any
> one child process will be super fast, but because ALL the child
> processes will run reasonably fast, even under very heavy read and write
> load.

What I meant by data warehouse is many DBs at many locations whose data are to be collected in one central DB in order to mix them up, sum them up, or do anything equivalent.

But in fact my quite heavily read/write-accessed DB has been running really fast for 1 1/2 years now, even though still on PG 7.2. The one and only bottleneck is the statistics and the reports - and the tables are getting larger and larger ...

> Note that if you've got the memory for the hash agg algo to fire
> into shared memory, it's pretty darned fast now,

Yes, I've noticed that here on the testing server.

> so if the data (mostly)
> fit into kernel cache you're gold. And 12 gig Intel boxes aren't that
> expensive, compared to an Oracle license.

*That's* the point ...

Anyway: greetings and thanks for your answers
Harald