Thread: Bug#48582: psql spends hours computing results it already knows (fwd)
This is a Debian bug report, which needs upstream attention.

------- Forwarded Message

Date: Thu, 28 Oct 1999 13:45:18 -0400
From: Brian Ristuccia <brianr@osiris.978.org>
To: submit@bugs.debian.org
Subject: Bug#48582: psql spends hours computing results it already knows

Package: postgresql
Version: 6.5.2-3

massive_db=> explain select count(*) from huge_table;
NOTICE:  QUERY PLAN:

Aggregate  (cost=511.46 rows=9923 width=12)
  ->  Seq Scan on huge_table  (cost=511.46 rows=9923 width=12)

EXPLAIN

If huge_table really is huge -- like 9,000,000 rows instead of 9923 -- then
even though PostgreSQL already knows the number of rows (that's how it
determines the cost), it proceeds to do a very long and CPU/IO-intensive
sequential scan to determine the count().

--
Brian Ristuccia
brianr@osiris.978.org
bristucc@nortelnetworks.com
bristucc@cs.uml.edu

------- End of Forwarded Message

--
Vote against SPAM: http://www.politik-digital.de/spam/
========================================
Oliver Elphick                 Oliver.Elphick@lfix.co.uk
Isle of Wight                  http://www.lfix.co.uk/oliver
PGP key from public servers; key ID 32B8FAA1
========================================
"Cast thy burden upon the LORD, and he shall sustain thee;
he shall never allow the righteous to fall."   Psalms 55:22
Re: [HACKERS] Bug#48582: psql spends hours computing results it already knows (fwd)
From: "Ross J. Reedstrom"
Date:
Oliver (& Brian) -

Hmm, that happens not to be the case. The rows=XXXX number is drawn
from the statistics for the table, which are only updated on VACUUM
ANALYZE of that table. Easily tested: just INSERT a couple of rows and
do the EXPLAIN again. The rows=XXXX won't change. Ah, here's an example:
a table I've never vacuumed, until now (it's in a test copy of a db):

test=> select count(*) from "Personnel";
count
-----
  177
(1 row)

test=> explain select count(*) from "Personnel";
NOTICE:  QUERY PLAN:

Aggregate  (cost=43.00 rows=1000 width=4)
  ->  Seq Scan on Personnel  (cost=43.00 rows=1000 width=4)

EXPLAIN
test=> vacuum analyze "Personnel";
VACUUM
test=> explain select count(*) from "Personnel";
NOTICE:  QUERY PLAN:

Aggregate  (cost=7.84 rows=177 width=4)
  ->  Seq Scan on Personnel  (cost=7.84 rows=177 width=4)

EXPLAIN
test=>

Ross

On Thu, Oct 28, 1999 at 08:53:31PM +0100, Oliver Elphick wrote:
> If huge_table really is huge -- like 9,000,000 rows instead of 9923, after
> postgresql already knows the number of rows (that's how it determines the
> cost), it proceeds to do a very long and CPU/IO intensive seq scan to
> determine the count().
Re: [HACKERS] Bug#48582: psql spends hours computing results it already knows (fwd)
From: Brian Ristuccia
Date:
On Thu, Oct 28, 1999 at 03:32:17PM -0500, Ross J. Reedstrom wrote:
> Hmm, that happens to not be the case. The rows=XXXX number is drawn
> from the statistics for the table, which are only updated on VACUUM
> ANALYZE of that table. Easily tested: just INSERT a couple rows and do
> the EXPLAIN again. The rows=XXX won't change.

Aah. Is there any other, more efficient way of determining the number
of rows in a table? It seems a sequential scan takes forever, but the
database must already have some idea (somewhere) of how many records
are in the table; otherwise, how would it know where to start/stop the
sequential scan?

--
Brian Ristuccia
brianr@osiris.978.org
bristucc@nortelnetworks.com
bristucc@cs.uml.edu
Re: [HACKERS] Bug#48582: psql spends hours computing results it already knows (fwd)
From: Tom Lane
Date:
"Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu> writes:
> Hmm, that happens to not be the case. The rows=XXXX number is drawn
> from the statistics for the table, which are only updated on VACUUM
> ANALYZE of that table. Easily tested: just INSERT a couple rows and do
> the EXPLAIN again. The rows=XXX won't change.

The short answer to this is that maintaining a perfectly accurate tuple
count on the fly would almost certainly cost more, totalled over all
operations that modify a table, than we could ever hope to make back by
short-circuiting "select count(*)" operations. (Consider concurrent
transactions running in multiple backends, some of which may abort
instead of committing, and others of which may already have committed
but whose effects your transaction is not supposed to be able to
see...)

The optimizer is perfectly happy with approximate tuple counts, so it
makes do with the stats recorded at the last VACUUM. This has been
discussed quite recently on pgsql-hackers; see the archives for more
info.

			regards, tom lane
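[Editorial aside: one practical consequence of "the optimizer makes do with
stats recorded at the last VACUUM" is that those same statistics can be read
directly when an approximate count is good enough. This is a sketch, not
something proposed in the thread; the table name is hypothetical, and the
value is only as fresh as the last VACUUM of the table.]

```sql
-- Read the planner's row-count estimate straight from the pg_class
-- system catalog, avoiding any scan of the table itself:
SELECT relname, reltuples
FROM pg_class
WHERE relname = 'huge_table';

-- Compare with the exact, but potentially very expensive, count:
SELECT count(*) FROM huge_table;
```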
Re: [HACKERS] Bug#48582: psql spends hours computing results it already knows (fwd)
From: Brian E Gallew
Date:
Then <tgl@sss.pgh.pa.us> spoke up and said:
> The short answer to this is that maintaining a perfectly accurate tuple
> count on-the-fly would almost certainly cost more, totalled over all
> operations that modify a table, than we could ever hope to make back
> by short-circuiting "select count(*)" operations.

So, does the planner allow counting from a unique index (if one
exists)? In general, an index scan on a unique index should be faster
than a table scan. Of course, I'm sure someone already thought of
this...

--
=====================================================================
| JAVA must have been developed in the wilds of West Virginia.      |
| After all, why else would it support only single inheritance??    |
=====================================================================
| Finger geek@cmu.edu for my public key.                            |
=====================================================================
Re: [HACKERS] Bug#48582: psql spends hours computing results it already knows (fwd)
From: Tom Lane
Date:
Brian E Gallew <geek+@cmu.edu> writes:
> So, does the planner allow counting from a unique index (if one
> exists)? In general, an index scan on a unique index should be faster
> than a table scan.

Vadim will have to check me on this, but I believe that index entries
don't contain transaction information --- that is, you can determine
whether a tuple matches a specified search key by examining the index,
but in order to discover whether the tuple is actually *valid*
(according to your transaction's worldview) you must fetch the tuple
itself from the main table. So scanning an index cannot be cheaper than
a sequential scan of the main table, except when the index allows you
to avoid visiting most of the tuples in the main table.

There has been some discussion of allowing scans of indexes without
fetching the underlying tuples, but AFAICS that would mean replicating
the tuple transaction-status information into (each!) index, which'd be
a big hit in both disk space and the number of disk writes implied by
committing a tuple. I've got my doubts about its being a win...

			regards, tom lane
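[Editorial aside: the "except when the index allows you to avoid visiting
most of the tuples" case can be illustrated with a sketch. The schema below
is hypothetical, and whether the planner actually chooses the index scan
depends on the table's statistics; the point is only that a selective
qualification narrows the set of heap tuples that must be fetched for the
visibility check, whereas an unqualified count must visit them all.]

```sql
-- Hypothetical table and unique index, for illustration only.
CREATE TABLE events (id integer, payload text);
CREATE UNIQUE INDEX events_id_idx ON events (id);

-- An unqualified count must check the visibility of every heap tuple,
-- so the index cannot make it cheaper:
EXPLAIN SELECT count(*) FROM events;

-- A selective predicate lets the index limit which heap tuples are
-- fetched (each still visited for the visibility check); this is where
-- an index scan can beat a sequential scan:
EXPLAIN SELECT count(*) FROM events WHERE id < 100;
```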
Re: [HACKERS] Bug#48582: psql spends hours computing results it already knows (fwd)
From: Brian E Gallew
Date:
Then <tgl@sss.pgh.pa.us> spoke up and said:
> So scanning an index cannot be cheaper than a sequential scan of the
> main table, except when the index allows you to avoid visiting most
> of the tuples in the main table.

Right. As usual, I've overlooked something obvious. So this really
wouldn't work unless we had an exclusive table lock (because then there
wouldn't be any other transactions to worry about, except for our own).
Feh.