Thread: Various performance questions
Hi,

We're in the process of setting up a new database server. The application is an online RSS aggregator, which you can see at www.fastbuzz.com (still running on the old hardware).

The new machine is a dual Xeon with 2 Gigs of RAM. The OS is FreeBSD 4.9.

shared_buffers = 10000
sort_mem = 32768
effective_cache_size = 25520    -- freebsd formula: vfs.hibufspace / 8192

1. While it seems to work correctly, I'm unclear on why this number is correct. 25520 * 8 = 204160, or 200 Megs. On a machine with 2 Gigs it seems like the number should be more like 1 - 1.5 Gigs.

2. The main performance challenges are with the items table, which has around five million rows and grows at a rate of more than 100,000 rows a day. If I do a select count(*) on the items table it takes 55 - 60 seconds to execute. I find it interesting that it takes that long whether it's doing it the first time and fetching the pages from disk, or on subsequent requests where it fetches the pages from memory. I know that it's not touching the disks because I'm running iostat in a different window.

Here's the explain analyze:

explain analyze select count(*) from items;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=245377.53..245377.53 rows=1 width=0) (actual time=55246.035..55246.040 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..233100.62 rows=4910762 width=0) (actual time=0.054..30220.641 rows=4910762 loops=1)
 Total runtime: 55246.129 ms
(3 rows)

and the number of pages:

select relpages from pg_class where relname = 'items';
 relpages
----------
   183993

So does it make sense that it would take close to a minute to count the 5 million rows, even if all the pages are in memory?

3. Relpages is 183993, so the file size should be 183993 * 8192 = 1507270656, roughly 1.5 gig. The actual disk size is 1073741824, or roughly 1 gig. Why the difference?

4. If I put a certain filter/condition on the query, it tells me that it's doing a sequential scan, and yet it takes less time than the full sequential scan:

explain analyze select count(*) from items where channel < 5000;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057 loops=1)
         Filter: (channel < 5000)
 Total runtime: 26224.703 ms

How can it do a sequential scan and apply a filter to it in less time than the full sequential scan? Is it actually using an index without really telling me?

Here's the structure of the items table:

    Column     |           Type           | Modifiers
---------------+--------------------------+-----------
 articlenumber | integer                  | not null
 channel       | integer                  | not null
 title         | character varying        |
 link          | character varying        |
 description   | character varying        |
 comments      | character varying(500)   |
 dtstamp       | timestamp with time zone |
 signature     | character varying(32)    |
 pubdate       | timestamp with time zone |
Indexes:
    "item_channel_link" btree (channel, link)
    "item_created" btree (dtstamp)
    "item_signature" btree (signature)
    "items_channel_article" btree (channel, articlenumber)
    "items_channel_tstamp" btree (channel, dtstamp)

5. Any other comments/suggestions on the above setup?

Thanks,

Dror

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com
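The 8 KB-page arithmetic behind questions 1 and 3 can be checked directly. A quick sketch (note: the segment-file point at the end is my guess at the answer to question 3, not something established in the thread — PostgreSQL splits heap storage into 1 GB segment files, so measuring a single file understates a 1.5 GB table):

```python
# PostgreSQL counts both effective_cache_size and relpages in 8 KB pages.
PAGE_SIZE = 8192  # bytes per page (default BLCKSZ)

# Question 1: effective_cache_size = 25520 pages -> roughly 200 MB.
cache_mb = 25520 * PAGE_SIZE // (1024 * 1024)
print(cache_mb)              # 199

# Question 3: relpages = 183993 -> expected heap size in bytes.
expected = 183993 * PAGE_SIZE
print(expected)              # 1507270656, ~1.4 GB

# The "actual disk size" reported was exactly 2**30 bytes -- the size of
# one full 1 GB segment file; the remainder would likely live in a
# second segment (the file named items.1 on disk).
print(expected - 2**30)      # bytes beyond the first segment
```

The suspicious exactness of 1073741824 (precisely 2**30) is what suggests only the first segment file was measured.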
Dror Matalon <dror@zapatec.com> writes:

> explain analyze select count(*) from items where channel < 5000;
>  Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
>    ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057 loops=1)
>          Filter: (channel < 5000)
>  Total runtime: 26224.703 ms
>
> How can it do a sequential scan and apply a filter to it in less time
> than the full sequential scan? Is it actually using an index without
> really telling me?

It's not using the index, and not telling you.

It's possible the count(*) operator itself is taking some time. Postgres doesn't have to call it on the rows that don't match the where clause. How long does "explain analyze select 1 from items" take, with and without the where clause?

What version of postgres is this? In 7.4 (and maybe 7.3?) count() uses an int8 to store its count, so it's not limited to 4 billion records. Unfortunately int8 is somewhat inefficient, as it has to be dynamically allocated repeatedly. It's possible it's making a noticeable difference, especially with all the pages in cache, though I'm a bit surprised. There's some thought about optimizing this in 7.5.

--
greg
On Sun, Oct 26, 2003 at 10:49:29PM -0500, Greg Stark wrote:
> Dror Matalon <dror@zapatec.com> writes:
>
> > explain analyze select count(*) from items where channel < 5000;
[snip]
>
> It's not using the index, and not telling you.
>
> It's possible the count(*) operator itself is taking some time.

I find it hard to believe that the actual counting would take a significant amount of time.

> Postgres doesn't have to call it on the rows that don't match the where
> clause. How long does "explain analyze select 1 from items" take, with
> and without the where clause?

Same as count(*). Around 55 secs with no where clause, around 25 secs with.

> What version of postgres is this?

This is 7.4.
dror@zapatec.com (Dror Matalon) wrote:
> On Sun, Oct 26, 2003 at 10:49:29PM -0500, Greg Stark wrote:
>> It's not using the index, and not telling you.
>>
>> It's possible the count(*) operator itself is taking some time.
>
> I find it hard to believe that the actual counting would take a
> significant amount of time.

Most of the time involves:

 a) Reading each page of the table, and
 b) Figuring out which records on those pages are still "live."

What work were you thinking was involved in doing the counting?

>> Postgres doesn't have to call it on the rows that don't match the where
>> clause. How long does "explain analyze select 1 from items" take, with
>> and without the where clause?
>
> Same as count(*). Around 55 secs with no where clause, around 25 secs
> with.

Good; at least that's consistent...

--
(format nil "~S@~S" "cbbrowne" "acm.org") http://www3.sympatico.ca/cbbrowne/postgresql.html
Signs of a Klingon Programmer #2: "You question the worthiness of my code? I should kill you where you stand!"
On Mon, Oct 27, 2003 at 01:04:49AM -0500, Christopher Browne wrote:
> Most of the time involves:
>
>  a) Reading each page of the table, and
>  b) Figuring out which records on those pages are still "live."

The table has been VACUUM ANALYZEd, so there are no "dead" records. It's still not clear why select count(*) would be slower than a select with a "where" clause.

> What work were you thinking was involved in doing the counting?

I was answering an earlier response that suggested that maybe the actual counting itself took time, so that it would take quite a bit longer when there are more rows to count.
Dror Matalon wrote:
> On Mon, Oct 27, 2003 at 01:04:49AM -0500, Christopher Browne wrote:
>> Most of the time involves:
>>
>>  a) Reading each page of the table, and
>>  b) Figuring out which records on those pages are still "live."
>
> The table has been VACUUM ANALYZEd, so there are no "dead" records.
> It's still not clear why select count(*) would be slower than a select
> with a "where" clause.

Do a VACUUM FULL VERBOSE, and then everything should be within a small range of each other.

Also, in the where clause, does explicitly typecasting help? Like 'where channel < 5000::int2;'

HTH

Shridhar
On Mon, Oct 27, 2003 at 12:52:27PM +0530, Shridhar Daithankar wrote:
> Do a VACUUM FULL VERBOSE, and then everything should be within a small
> range of each other.

I did vacuum full verbose, and the results are the same as before: 55 seconds for count(*), and 26 seconds for count(*) where channel < 5000.

> Also, in the where clause, does explicitly typecasting help?
>
> Like 'where channel < 5000::int2;'

It makes no difference.
In the last exciting episode, dror@zapatec.com (Dror Matalon) wrote:
> I was answering an earlier response that suggested that maybe the actual
> counting took time, so it would take quite a bit longer when there are
> more rows to count.

Well, if a "where clause" allows the system to use an index to search for the subset of elements, that would reduce the number of pages that have to be examined, thereby diminishing the amount of work.

Why don't you report what EXPLAIN ANALYZE returns as output for the query with the WHERE clause? That would allow us to get more of an idea of what is going on...

--
(format nil "~S@~S" "cbbrowne" "acm.org") http://www3.sympatico.ca/cbbrowne/spiritual.html
When replying, it is often possible to cleverly edit the original message in such a way as to subtly alter its meaning or tone to your advantage while appearing that you are taking pains to preserve the author's intent. As a bonus, it will seem that your superior intellect is cutting through all the excess verbiage to the very heart of the matter. -- from the Symbolics Guidelines for Sending Mail
Christopher Browne <cbbrowne@acm.org> writes:

> In the last exciting episode, dror@zapatec.com (Dror Matalon) wrote:
> > I was answering an earlier response that suggested that maybe the actual
> > counting took time, so it would take quite a bit longer when there are
> > more rows to count.

That was my theory. I guess it's wrong. There is other work involved in processing a record, but I'm surprised it's as large as the work to actually pull the record from the kernel and check if it's visible.

> Well, if a "where clause" allows the system to use an index to search
> for the subset of elements, that would reduce the number of pages that
> have to be examined, thereby diminishing the amount of work.

It's not. Therein lies the mystery.

> Why don't you report what EXPLAIN ANALYZE returns as output for the
> query with the WHERE clause?

He did, right at the start of the thread. For his multi-million-record table, without a where clause he's seeing

  select 1 from tab
  select count(*) from tab

being comparable, with only a slight delay for the count(*), whereas

  select 1 from tab where c < 1000
  select count(*) from tab where c < 1000

are much faster, even though they still use a sequential scan.

I'm still puzzled why the where clause speeds things up as much as it does.

--
greg
On Sun, 26 Oct 2003, Dror Matalon wrote:

> Here's the structure of the items table
[snip]
> pubdate       | timestamp with time zone |
> Indexes:
>     "item_channel_link" btree (channel, link)
>     "item_created" btree (dtstamp)
>     "item_signature" btree (signature)
>     "items_channel_article" btree (channel, articlenumber)
>     "items_channel_tstamp" btree (channel, dtstamp)
>
> 5. Any other comments/suggestions on the above setup.

Try

  set enable_seqscan = off;
  set enable_indexscan = on;

to force the planner to use one of the indexes. Analyze the queries from your application, see which columns are used most often in WHERE clauses, and recreate the indexes accordingly. select count(*) from items where channel < 5000; will never use any of the current indexes, because none matches your WHERE clause (channel now appears only in multicolumn indexes).

--
Any views or opinions presented within this e-mail are solely those of the author and do not necessarily represent those of any company, unless otherwise expressly stated.
On Mon, 2003-10-27 at 10:15, Tarhon-Onu Victor wrote:
> select count(*) from items where channel < 5000; will never use any of
> the current indexes because none matches your WHERE clause (channel
> appears now only in multicolumn indexes).

No -- a multi-column index can be used to answer queries on a prefix of the index's column list. So an index on (channel, xyz) can be used to answer queries on (just) "channel".

-Neil
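Neil's prefix rule can be demonstrated outside Postgres as well. A sketch using SQLite via Python (same B-tree principle, different engine — the table and index names mirror the thread, but the data is made up):

```python
import sqlite3

# A composite index on (channel, articlenumber) answering a predicate on
# just its leading column, "channel".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (articlenumber INTEGER, channel INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, i % 10000) for i in range(5000)])
conn.execute("CREATE INDEX items_channel_article"
             " ON items (channel, articlenumber)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT count(*) FROM items WHERE channel < 50"
).fetchall()
for row in plan:
    print(row[-1])  # the plan names the composite index despite the
                    # single-column filter
```

The same holds in Postgres: the planner can range-scan a (channel, articlenumber) btree using only the channel bound, though whether it chooses to depends on selectivity, as the earlier EXPLAIN output shows.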
On Sun, 2003-10-26 at 22:49, Greg Stark wrote:
> What version of postgres is this? In 7.4 (and maybe 7.3?) count() uses an
> int8 to store its count, so it's not limited to 4 billion records.
> Unfortunately int8 is somewhat inefficient, as it has to be dynamically
> allocated repeatedly.

Uh, what? Why would an int8 need to be "dynamically allocated repeatedly"?

-Neil
>>>>> "DM" == Dror Matalon <dror@zapatec.com> writes:

DM> effective_cache_size = 25520 -- freebsd formula: vfs.hibufspace / 8192

DM> 1. While it seems to work correctly, I'm unclear on why this number is
DM> correct. 25520*8 = 204160 or 200 Megs. On a machine with 2 Gigs it
DM> seems like the number should be more like 1 - 1.5 Gigs.

Nope, that's correct...

--
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD  +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
On Mon, Oct 27, 2003 at 11:12:37AM -0500, Vivek Khera wrote:
> DM> 1. While it seems to work correctly, I'm unclear on why this number is
> DM> correct. 25520*8 = 204160 or 200 Megs. On a machine with 2 Gigs it
> DM> seems like the number should be more like 1 - 1.5 Gigs.
>
> Nope, that's correct...

I know it's correct. I was asking why it's correct.
On Mon, Oct 27, 2003 at 07:52:06AM -0500, Christopher Browne wrote:
> Well, if a "where clause" allows the system to use an index to search
> for the subset of elements, that would reduce the number of pages that
> have to be examined, thereby diminishing the amount of work.
>
> Why don't you report what EXPLAIN ANALYZE returns as output for the
> query with the WHERE clause? That would allow us to get more of an idea
> of what is going on...

Here it is once again, and I've added another data point, "channel < 1000", which takes even less time than channel < 5000. It almost seems like the optimizer knows that it can skip certain rows -- "rows=4910762" vs "rows=1505605". But how can it do that without using an index or actually looking at each row?

zp1936=> EXPLAIN ANALYZE select count(*) from items;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=245044.53..245044.53 rows=1 width=0) (actual time=55806.893..55806.897 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..232767.62 rows=4910762 width=0) (actual time=0.058..30481.482 rows=4910762 loops=1)
 Total runtime: 55806.992 ms
(3 rows)

zp1936=> EXPLAIN ANALYZE select count(*) from items where channel < 5000;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=248808.54..248808.54 rows=1 width=0) (actual time=26071.264..26071.269 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245044.52 rows=1505605 width=0) (actual time=0.161..17623.033 rows=1632057 loops=1)
         Filter: (channel < 5000)
 Total runtime: 26071.361 ms
(4 rows)

zp1936=> EXPLAIN ANALYZE select count(*) from items where channel < 1000;
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=245429.74..245429.74 rows=1 width=0) (actual time=10225.272..10225.276 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245044.52 rows=154085 width=0) (actual time=7.633..10083.246 rows=25687 loops=1)
         Filter: (channel < 1000)
 Total runtime: 10225.373 ms
(4 rows)
Neil Conway <neilc@samurai.com> writes:

> On Sun, 2003-10-26 at 22:49, Greg Stark wrote:
> > Unfortunately int8 is somewhat inefficient, as it has to be dynamically
> > allocated repeatedly.
>
> Uh, what? Why would an int8 need to be "dynamically allocated
> repeatedly"?

Perhaps I'm wrong; I'm extrapolating from a comment Tom Lane made that profiling showed that the bulk of the cost in count() went to allocating int8s. He commented that this could be optimized by having count() and sum() bypass the regular API. I don't have the original message handy.

--
greg
On Mon, 2003-10-27 at 12:56, Greg Stark wrote:
> Perhaps I'm wrong; I'm extrapolating from a comment Tom Lane made that
> profiling showed that the bulk of the cost in count() went to allocating
> int8s. He commented that this could be optimized by having count() and
> sum() bypass the regular API. I don't have the original message handy.

I'm still confused: an int64 should be stack-allocated, AFAICS. Tom, do you recall what the issue here is?

-Neil
Neil Conway <neilc@samurai.com> writes:

> I'm still confused: an int64 should be stack-allocated, AFAICS. Tom, do
> you recall what the issue here is?

Greg is correct. int8 is a pass-by-reference datatype, and so every aggregate state-transition function cycle requires at least one palloc (to return the function result). I think in the current state of the code it requires two pallocs :-(, because we can't trust the transition function to palloc its result in the right context without palloc'ing leaked junk in that context, so an extra palloc is needed to copy the result Datum into a longer-lived context than the one we call the function in.

There was some speculation a few weeks ago about devising a way to let performance-critical transition functions avoid the excess pallocs by working with a specialized API instead of the standard function-call API, but I think it'll have to wait for 7.5.

			regards, tom lane
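A toy model of the state-transition loop Tom describes (purely illustrative Python, not Postgres code — `BoxedInt` stands in for a palloc'd int8 Datum, and `count_transition` for a transition function like int8's increment):

```python
class BoxedInt:
    """Stand-in for a pass-by-reference int8 Datum."""
    def __init__(self, value):
        self.value = value

def count_transition(state, _row):
    # Like the int8 increment function: the result comes back as a
    # freshly allocated object every cycle, rather than being updated
    # in place in the existing state.
    return BoxedInt(state.value + 1)

state = BoxedInt(0)
for row in range(1_000_000):
    state = count_transition(state, row)  # one short-lived allocation per row

print(state.value)  # correct count, at the cost of ~1M allocations
```

The answer is right either way; the point is that a pass-by-reference accumulator turns counting N rows into N (or, per Tom, 2N) heap allocations, where a pass-by-value accumulator would need none.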
On Mon, 2003-10-27 at 13:52, Tom Lane wrote:
> Greg is correct. int8 is a pass-by-reference datatype, and so every
> aggregate state-transition function cycle requires at least one palloc
> (to return the function result).

Interesting. Is there a reason why int8 is pass-by-reference? (ISTM that pass-by-value would be sufficient...)

Thanks for the information, Tom & Greg.

-Neil
Neil Conway <neilc@samurai.com> writes:

> Interesting. Is there a reason why int8 is pass-by-reference?

Pass-by-value types have to fit into a Datum.

On a 64-bit machine (ie, one where pointers are 64 bits anyway) it would make sense to convert int8 (and float8 too) into pass-by-value types. If the machine does not already need Datum to be 8 bytes, though, I think that widening Datum to 8 bytes just for the benefit of these two datatypes would be a serious net loss. Not to mention that it would just plain break everything on machines with no native 8-byte-int datatype.

One of the motivations for the version-1 function-call protocol was to allow the pass-by-value-or-by-ref nature of these datatypes to be hidden from most of the code, with an eye to someday making this a platform-specific choice.

			regards, tom lane
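Tom's constraint can be checked for any given platform. A small sketch using Python's struct module to report the native widths involved:

```python
import struct

# A Datum is pointer-sized, so int8 can only be pass-by-value on
# platforms where a native pointer is at least 8 bytes wide.
pointer_bytes = struct.calcsize("P")   # width of a native pointer
int8_bytes = struct.calcsize("q")      # width of a 64-bit integer

print(pointer_bytes, int8_bytes)
if pointer_bytes >= int8_bytes:
    print("int8 fits in a Datum: pass-by-value would be possible here")
else:
    print("Datum too narrow: int8 must be pass-by-reference here")
```

On the 32-bit machines common when this thread was written, a pointer is 4 bytes, which is exactly why int8 had to be pass-by-reference without widening Datum everywhere.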
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg is correct. int8 is a pass-by-reference datatype

Just to keep the conversation on track: the evidence from this particular post seems to indicate that my theory was wrong, and the overhead for count(*) is _not_ a big time sink. It seems to be at most 10%, and usually less. A simple "select 1 from tab" takes nearly as long.

I'm still puzzled why the times on these are so different when the latter returns fewer records and both are doing sequential scans:

  select 1 from tab
  select 1 from tab where a < 1000

--
greg
In fact the number of records seems to be almost irrelevant. A sequential scan takes almost exactly the same amount of time up until a critical region (for me around 100000 records), at which point it starts going up very quickly.

It's almost as if it's doing some disk i/o, but I'm watching vmstat and don't see anything. And in any case it would have to read all the same blocks to do the sequential scan regardless of how many records match, no?

I don't hear the disk seeking either -- though oddly there is some sound coming from the computer while this query is running. It's a high-pitched sound, almost like a floppy drive reading without seeking. Perhaps there is some i/o happening and linux is lying about it? Perhaps I'm not hearing seeking because it's reading everything from one track and not seeking? Very strange.

slo=> explain analyze select 1::int4 from test where a < 1;
 Seq Scan on test  (cost=0.00..1693.00 rows=11 width=0) (actual time=417.468..417.468 rows=0 loops=1)
   Filter: (a < 1)
 Total runtime: 417.503 ms
(3 rows)

Time: 418.181 ms

slo=> explain analyze select 1::int4 from test where a < 100;
 Seq Scan on test  (cost=0.00..1693.00 rows=53 width=0) (actual time=0.987..416.224 rows=50 loops=1)
   Filter: (a < 100)
 Total runtime: 416.301 ms
(3 rows)

Time: 417.008 ms

slo=> explain analyze select 1::int4 from test where a < 10000;
 Seq Scan on test  (cost=0.00..1693.00 rows=5283 width=0) (actual time=0.812..434.967 rows=5000 loops=1)
   Filter: (a < 10000)
 Total runtime: 439.620 ms
(3 rows)

Time: 440.665 ms

slo=> explain analyze select 1::int4 from test where a < 100000;
 Seq Scan on test  (cost=0.00..1693.00 rows=50076 width=0) (actual time=0.889..458.623 rows=50000 loops=1)
   Filter: (a < 100000)
 Total runtime: 491.281 ms
(3 rows)

Time: 491.998 ms

slo=> explain analyze select 1::int4 from test where a < 1000000;
 Seq Scan on test  (cost=0.00..1693.00 rows=99991 width=0) (actual time=0.018..997.421 rows=715071 loops=1)
   Filter: (a < 1000000)
 Total runtime: 1461.851 ms
(3 rows)

Time: 1462.898 ms

slo=> explain analyze select 1::int4 from test where a < 10000000;
 Seq Scan on test  (cost=0.00..1693.00 rows=99991 width=0) (actual time=0.015..1065.456 rows=800000 loops=1)
   Filter: (a < 10000000)
 Total runtime: 1587.481 ms
(3 rows)

--
greg
Greg Stark <gsstark@mit.edu> writes:

> I'm still puzzled why the times on these are so different when the latter
> returns fewer records and both are doing sequential scans:

My best guess is that it's simply the per-tuple overhead of cycling tuples through the two plan nodes. When you have no actual I/O happening, the seqscan runtime is going to be all CPU time, something of the form

    cost_per_page * number_of_pages_processed
  + cost_per_tuple_scanned * number_of_tuples_scanned
  + cost_per_tuple_returned * number_of_tuples_returned

I don't have numbers for the relative sizes of those three costs, but I doubt that any of them are negligible compared to the other two. Adding a WHERE clause increases cost_per_tuple_scanned but reduces number_of_tuples_returned, and so it cuts the contribution from the third term, evidently by more than the WHERE clause adds to the second term.

My own profiling had suggested that the cost-per-tuple-scanned in the aggregate node dominated the seqscan CPU costs, but that might be platform-specific, or possibly have something to do with the fact that I was profiling an assert-enabled build.

It might be worth pointing out that EXPLAIN ANALYZE adds two kernel calls (gettimeofday or some such) into each cycle of the plan nodes; that's probably inflating the cost_per_tuple_returned by a noticeable amount.

			regards, tom lane
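Tom's three-term model can be played with numerically. A sketch with invented per-item costs (the three cost constants below are made up for illustration; only the page and row counts come from the thread):

```python
def seqscan_ms(pages, tuples_scanned, tuples_returned,
               per_page=0.01, per_scanned=0.004, per_returned=0.008):
    """Tom's linear model for an all-in-cache sequential scan."""
    return (per_page * pages
            + per_scanned * tuples_scanned
            + per_returned * tuples_returned)

pages, tuples = 183993, 4910762  # relpages and row count from the thread

# No WHERE clause: every scanned tuple is also cycled up to the Aggregate.
no_where = seqscan_ms(pages, tuples, tuples)

# WHERE channel < 5000: all tuples are still scanned (it's a seqscan),
# but only ~1.6M of them are returned to the Aggregate node.
with_where = seqscan_ms(pages, tuples, 1632057)

print(round(no_where), round(with_where))
```

With any plausible constants, the shape matches the thread's observation: the first two terms are identical for both queries, so the filtered query wins by exactly the per-returned-tuple cost of the ~3.3M tuples it never hands to the Aggregate.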