Thread: Various performance questions

Various performance questions

From
Dror Matalon
Date:
Hi,

We're in the process of setting up a new database server. The
application is an online rss aggregator which you can see at
www.fastbuzz.com (still running with the old hardware).

The new machine is a dual Xeon with 2 Gigs of RAM.

The OS is FreeBSD 4.9.

shared_buffers = 10000
sort_mem = 32768
effective_cache_size = 25520  -- freebsd formula: vfs.hibufspace / 8192

1. While it seems to work correctly, I'm unclear on why this number is
correct.  25520*8 = 204160 or 200 Megs. On a machine with 2 Gigs it
seems like the number should be more like 1 - 1.5 Gigs.
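The arithmetic behind that setting can be sanity-checked in a few lines; a sketch, where the hibufspace figure is inferred back from the setting above and everything else is generic:

```python
# Sanity-check of the FreeBSD effective_cache_size formula discussed above.
# vfs.hibufspace is reported in bytes; PostgreSQL wants the setting in
# 8 KB pages, hence the division by 8192.
PAGE_SIZE = 8192                      # PostgreSQL page size in bytes
hibufspace = 25520 * PAGE_SIZE        # implied by the setting above

effective_cache_size = hibufspace // PAGE_SIZE
assert effective_cache_size == 25520

# Expressed in kilobytes and megabytes:
assert effective_cache_size * 8 == 204160       # 25520 * 8 KB = 204160 KB
assert round(hibufspace / 2**20) == 199         # i.e. roughly 200 MB
```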

2.  The main performance challenges are with the items table which has around
five million rows and grows at the rate of more than 100,000 rows a day.

If I do a select count(*) from the items table it takes 55 - 60 seconds
to execute. I find it interesting that it takes that long whether it's
doing it the first time and fetching the pages from disk, or on
subsequent requests where it fetches the pages from memory.
I know that it's not touching the disks because I'm running an iostat in
a different window. Here's the explain analyze:

explain analyze select count(*) from items;
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=245377.53..245377.53 rows=1 width=0) (actual time=55246.035..55246.040 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..233100.62 rows=4910762 width=0)
(actual time=0.054..30220.641 rows=4910762 loops=1)
 Total runtime: 55246.129 ms
(3 rows)

and the number of pages:

select relpages from pg_class where relname = 'items';
 relpages
----------
   183993


So does it make sense that it would take close to a minute to count the 5 million rows
even if all pages are in memory?

3. Relpages is 183993 so file size should be  183993*8192 = 1507270656,
roughly 1.5 gig. The actual disk size is 1073741824 or roughly 1 gig.
Why the difference?
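The numbers in question 3 can be checked, and the on-disk figure is suspicious in a specific way. The segment-file explanation in the comments below is an editorial guess, not something stated in the thread:

```python
PAGE_SIZE = 8192
relpages = 183993
expected = relpages * PAGE_SIZE       # size implied by pg_class.relpages
assert expected == 1507270656         # "roughly 1.5 gig", as stated above

on_disk = 1073741824
assert on_disk == 2**30               # exactly 1 GiB, which is suspicious:
# PostgreSQL stores large tables in 1 GiB segment files (items, items.1,
# ...), so a data file of exactly 2**30 bytes suggests only the first
# segment was measured. relpages can also lag reality until the next
# VACUUM/ANALYZE.
```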



4. If I put a certain filter/condition on the query it tells me that it's doing
a sequential scan, and yet it takes less time than a full sequential
scan:

explain analyze select count(*) from items where channel < 5000;
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057
loops=1)
         Filter: (channel < 5000)
 Total runtime: 26224.703 ms


How can it do a sequential scan and apply a filter to it in less time
than the full sequential scan? Is it actually using an index without
really telling me?


Here's the structure of the items table

    Column     |           Type           | Modifiers
---------------+--------------------------+-----------
 articlenumber | integer                  | not null
 channel       | integer                  | not null
 title         | character varying        |
 link          | character varying        |
 description   | character varying        |
 comments      | character varying(500)   |
 dtstamp       | timestamp with time zone |
 signature     | character varying(32)    |
 pubdate       | timestamp with time zone |
Indexes:
    "item_channel_link" btree (channel, link)
    "item_created" btree (dtstamp)
    "item_signature" btree (signature)
    "items_channel_article" btree (channel, articlenumber)
    "items_channel_tstamp" btree (channel, dtstamp)


5. Any other comments/suggestions on the above setup?

Thanks,

Dror

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: Various performance questions

From
Greg Stark
Date:
Dror Matalon <dror@zapatec.com> writes:

> explain analyze select count(*) from items where channel < 5000;
>                                                         QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
>    ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057
loops=1)
>          Filter: (channel < 5000)
>  Total runtime: 26224.703 ms
>
>
> How can it do a sequential scan and apply a filter to it in less time
> than the full sequential scan? Is it actually using an index without
> really telling me?

It's not using the index and not telling you.

It's possible the count(*) operator itself is taking some time. Postgres
doesn't have to call it on the rows that don't match the where clause. How
long does "explain analyze select 1 from items" with and without the where
clause take?

What version of postgres is this? In 7.4 (and maybe 7.3?) count() uses an
int8 to store its count so it's not limited to 4 billion records.
Unfortunately int8 is somewhat inefficient as it has to be dynamically
allocated repeatedly. It's possible it's making a noticeable difference,
especially with all the pages in cache, though I'm a bit surprised. There's
some thought about optimizing this in 7.5.

--
greg

Re: Various performance questions

From
Dror Matalon
Date:
On Sun, Oct 26, 2003 at 10:49:29PM -0500, Greg Stark wrote:
> Dror Matalon <dror@zapatec.com> writes:
>
> > explain analyze select count(*) from items where channel < 5000;
> >                                                         QUERY PLAN
> >
--------------------------------------------------------------------------------------------------------------------------
> >  Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
> >    ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057
loops=1)
> >          Filter: (channel < 5000)
> >  Total runtime: 26224.703 ms
> >
> >
> > How can it do a sequential scan and apply a filter to it in less time
> > than the full sequential scan? Is it actually using an index without
> > really telling me?
>
> It's not using the index and not telling you.
>
> It's possible the count(*) operator itself is taking some time. Postgres

I find it hard to believe that the actual counting would take a
significant amount of time.

> doesn't have to call it on the rows that don't match the where clause. How
> long does "explain analyze select 1 from items" with and without the where
> clause take?

Same as count(*). Around 55 secs with no where clause, around 25 secs
with.

>
> What version of postgres is this?. In 7.4 (and maybe 7.3?) count() uses an

This is 7.4.

> int8 to store its count so it's not limited to 4 billion records.
> Unfortunately int8 is somewhat inefficient as it has to be dynamically
> allocated repeatedly. It's possible it's making a noticeable difference,
> especially with all the pages in cache, though I'm a bit surprised. There's
> some thought about optimizing this in 7.5.
>
> --
> greg
>

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: Various performance questions

From
Christopher Browne
Date:
dror@zapatec.com (Dror Matalon) wrote:
> On Sun, Oct 26, 2003 at 10:49:29PM -0500, Greg Stark wrote:
>> Dror Matalon <dror@zapatec.com> writes:
>>
>> > explain analyze select count(*) from items where channel < 5000;
>> >                                                         QUERY PLAN
>> >
--------------------------------------------------------------------------------------------------------------------------
>> >  Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
>> >    ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057
loops=1)
>> >          Filter: (channel < 5000)
>> >  Total runtime: 26224.703 ms
>> >
>> >
>> > How can it do a sequential scan and apply a filter to it in less time
>> > than the full sequential scan? Is it actually using an index without
>> > really telling me?
>>
>> It's not using the index and not telling you.
>>
>> It's possible the count(*) operator itself is taking some time. Postgres
>
> I find it hard to believe that the actual counting would take a
> significant amount of time.

Most of the time involves:

 a) Reading each page of the table, and
 b) Figuring out which records on those pages are still "live."

What work were you thinking was involved in doing the counting?

>> doesn't have to call it on the rows that don't match the where clause. How
>> long does "explain analyze select 1 from items" with and without the where
>> clause take?
>
> Same as count(*). Around 55 secs with no where clause, around 25 secs
> with.

Good; at least that's consistent...
--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/postgresql.html
Signs of a Klingon  Programmer #2: "You  question the worthiness of my
code? I should kill you where you stand!"

Re: Various performance questions

From
Dror Matalon
Date:
On Mon, Oct 27, 2003 at 01:04:49AM -0500, Christopher Browne wrote:
> dror@zapatec.com (Dror Matalon) wrote:
> > On Sun, Oct 26, 2003 at 10:49:29PM -0500, Greg Stark wrote:
> >> Dror Matalon <dror@zapatec.com> writes:
> >>
> >> > explain analyze select count(*) from items where channel < 5000;
> >> >                                                         QUERY PLAN
> >> >
--------------------------------------------------------------------------------------------------------------------------
> >> >  Aggregate  (cost=249141.54..249141.54 rows=1 width=0) (actual time=26224.603..26224.608 rows=1 loops=1)
> >> >    ->  Seq Scan on items  (cost=0.00..245377.52 rows=1505605 width=0) (actual time=7.599..17686.869 rows=1632057
loops=1)
> >> >          Filter: (channel < 5000)
> >> >  Total runtime: 26224.703 ms
> >> >
> >> >
> >> > How can it do a sequential scan and apply a filter to it in less time
> >> > than the full sequential scan? Is it actually using an index without
> >> > really telling me?
> >>
> >> It's not using the index and not telling you.
> >>
> >> It's possible the count(*) operator itself is taking some time. Postgres
> >
> > I find it hard to believe that the actual counting would take a
> > significant amount of time.
>
> Most of the time involves:
>
>  a) Reading each page of the table, and
>  b) Figuring out which records on those pages are still "live."

The table has been VACUUM ANALYZED so that there are no "dead" records.
It's still not clear why select count() would be slower than select with
a "where" clause.

>
> What work were you thinking was involved in doing the counting?

I was answering an earlier response that suggested that maybe the actual
counting took time so it would take quite a bit longer when there are
more rows to count.

>
> >> doesn't have to call it on the rows that don't match the where clause. How
> >> long does "explain analyze select 1 from items" with and without the where
> >> clause take?
> >
> > Same as count(*). Around 55 secs with no where clause, around 25 secs
> > with.
>
> Good; at least that's consistent...

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: Various performance questions

From
Shridhar Daithankar
Date:
Dror Matalon wrote:

> On Mon, Oct 27, 2003 at 01:04:49AM -0500, Christopher Browne wrote:
>>Most of the time involves:
>>
>> a) Reading each page of the table, and
>> b) Figuring out which records on those pages are still "live."
>
>
> The table has been VACUUM ANALYZED so that there are no "dead" records.
> It's still not clear why select count() would be slower than select with
> a "where" clause.

Do a vacuum full verbose and then everything should be within a small
range of each other.

Also, in the where clause, does explicit typecasting help?

Like 'where channel<5000::int2;'

  HTH

  Shridhar


Re: Various performance questions

From
Dror Matalon
Date:
On Mon, Oct 27, 2003 at 12:52:27PM +0530, Shridhar Daithankar wrote:
> Dror Matalon wrote:
>
> >On Mon, Oct 27, 2003 at 01:04:49AM -0500, Christopher Browne wrote:
> >>Most of the time involves:
> >>
> >>a) Reading each page of the table, and
> >>b) Figuring out which records on those pages are still "live."
> >
> >
> >The table has been VACUUM ANALYZED so that there are no "dead" records.
> >It's still not clear why select count() would be slower than select with
> >a "where" clause.
>
> Do a vacuum verbose full and then everything should be within small range
> of each other.
>

I did vacuum full verbose and the results are the same as before, 55
seconds for count(*) and 26 seconds for count(*) where channel < 5000.

> Also in the where clause, does explicitly typecasting helps?
>
> Like 'where channel<5000::int2;'

It makes no difference.

>
>  HTH
>
>  Shridhar
>

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: Various performance questions

From
Christopher Browne
Date:
In the last exciting episode, dror@zapatec.com (Dror Matalon) wrote:
> I was answering an earlier response that suggested that maybe the actual
> counting took time so it would take quite a bit longer when there are
> more rows to count.

Well, if a "where clause" allows the system to use an index to search
for the subset of elements, that would reduce the number of pages that
have to be examined, thereby diminishing the amount of work.

Why don't you report what EXPLAIN ANALYZE returns as output for the
query with WHERE clause?  That would allow us to get more of an idea
of what is going on...
--
(format nil "~S@~S" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/spiritual.html
When  replying, it  is  often possible  to cleverly edit  the original
message in such a way  as to subtly alter  its meaning or tone to your
advantage while  appearing that you are  taking pains  to preserve the
author's intent.   As a   bonus,   it will   seem that your   superior
intellect is cutting through all the excess verbiage to the very heart
of the matter.  -- from the Symbolics Guidelines for Sending Mail

Re: Various performance questions

From
Greg Stark
Date:
Christopher Browne <cbbrowne@acm.org> writes:

> In the last exciting episode, dror@zapatec.com (Dror Matalon) wrote:
> > I was answering an earlier response that suggested that maybe the actual
> > counting took time so it would take quite a bit longer when there are
> > more rows to count.

That was my theory. I guess it's wrong. There is other work involved in
processing a record, but I'm surprised it's as long as the work to actually
pull the record from the kernel and check if it's visible.

> Well, if a "where clause" allows the system to use an index to search
> for the subset of elements, that would reduce the number of pages that
> have to be examined, thereby diminishing the amount of work.

It's not; therein lies the mystery.

> Why don't you report what EXPLAIN ANALYZE returns as output for the
> query with WHERE clause?  That would allow us to get more of an idea
> of what is going on...

He did, right at the start of the thread.

For a 1 million record table he's seeing

 select 1 from tab
 select count(*) from tab

being comparable with only a slight delay for the count(*) whereas

 select 1 from tab where c < 1000
 select count(*) from tab where c < 1000

are much faster even though they still use a sequential scan.

I'm puzzled why the where clause speeds things up as much as it does.

--
greg

Re: Various performance questions

From
Tarhon-Onu Victor
Date:
On Sun, 26 Oct 2003, Dror Matalon wrote:

> Here's the structure of the items table
[snip]
>  pubdate       | timestamp with time zone |
> Indexes:
>     "item_channel_link" btree (channel, link)
>     "item_created" btree (dtstamp)
>     "item_signature" btree (signature)
>     "items_channel_article" btree (channel, articlenumber)
>     "items_channel_tstamp" btree (channel, dtstamp)
>
>
> 5. Any other comments/suggestions on the above setup.

    Try set enable_seqscan = off; set enable_indexscan = on; to
force the planner to use one of the indexes. Analyze the queries from
your application and see what are the most used columns in WHERE clauses
and recreate the indexes. select count(*) from items where channel <
5000; will never use any of the current indexes because none matches
your WHERE clause (channel appears now only in multicolumn indexes).


Re: Various performance questions

From
Neil Conway
Date:
On Mon, 2003-10-27 at 10:15, Tarhon-Onu Victor wrote:
> select count(*) from items where channel <
> 5000; will never use any of the current indexes because none matches
> your WHERE clause (channel appears now only in multicolumn indexes).

No -- a multi-column index can be used to answer queries on a prefix of
the index's column list. So an index on (channel, xyz) can be used to
answer queries on (just) "channel".
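Neil's prefix rule is easy to demonstrate; the sketch below uses SQLite purely because it is convenient to script (the thread itself is about PostgreSQL, where the same leading-column rule applies):

```python
import sqlite3

# A composite index on (channel, articlenumber), mirroring
# items_channel_article from the table definition earlier in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (channel INTEGER, articlenumber INTEGER)")
conn.execute(
    "CREATE INDEX items_channel_article ON items (channel, articlenumber)"
)

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT count(*) FROM items WHERE channel < 5000"
).fetchall()

# The plan mentions the composite index even though the WHERE clause only
# constrains its leading column.
assert any("items_channel_article" in row[-1] for row in plan)
```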

-Neil



Re: Various performance questions

From
Neil Conway
Date:
On Sun, 2003-10-26 at 22:49, Greg Stark wrote:
> What version of postgres is this?. In 7.4 (and maybe 7.3?) count() uses an
> int8 to store its count so it's not limited to 4 billion records.
> Unfortunately int8 is somewhat inefficient as it has to be dynamically
> allocated repeatedly.

Uh, what? Why would an int8 need to be "dynamically allocated
repeatedly"?

-Neil



Re: Various performance questions

From
Vivek Khera
Date:
>>>>> "DM" == Dror Matalon <dror@zapatec.com> writes:

DM> effective_cache_size = 25520  -- freebsd forumla: vfs.hibufspace / 8192

DM> 1. While it seems to work correctly, I'm unclear on why this number is
DM> correct.  25520*8 = 204160 or 200 Megs. On a machine with 2 Gigs it
DM> seems like the number should be more like 1 - 1.5 Gigs.

Nope, that's correct...


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: Various performance questions

From
Dror Matalon
Date:
On Mon, Oct 27, 2003 at 11:12:37AM -0500, Vivek Khera wrote:
> >>>>> "DM" == Dror Matalon <dror@zapatec.com> writes:
>
> DM> effective_cache_size = 25520  -- freebsd forumla: vfs.hibufspace / 8192
>
> DM> 1. While it seems to work correctly, I'm unclear on why this number is
> DM> correct.  25520*8 = 204160 or 200 Megs. On a machine with 2 Gigs it
> DM> seems like the number should be more like 1 - 1.5 Gigs.
>
> Nope, that's correct...

I know it's correct. I was asking why it's correct.


--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: Various performance questions

From
Dror Matalon
Date:
On Mon, Oct 27, 2003 at 07:52:06AM -0500, Christopher Browne wrote:
> In the last exciting episode, dror@zapatec.com (Dror Matalon) wrote:
> > I was answering an earlier response that suggested that maybe the actual
> > counting took time so it would take quite a bit longer when there are
> > more rows to count.
>
> Well, if a "where clause" allows the system to use an index to search
> for the subset of elements, that would reduce the number of pages that
> have to be examined, thereby diminishing the amount of work.
>
> Why don't you report what EXPLAIN ANALYZE returns as output for the
> query with WHERE clause?  That would allow us to get more of an idea
> of what is going on...


Here it is once again, and I've added another data point, "channel <
1000", which takes even less time than channel < 5000. It almost seems
like the optimizer knows that it can skip certain rows ("rows=4910762" vs
"rows=1505605"). But how can it do that without using an index or
actually looking at each row?

zp1936=> EXPLAIN ANALYZE select count(*) from items;
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=245044.53..245044.53 rows=1 width=0) (actual time=55806.893..55806.897 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..232767.62 rows=4910762 width=0)
(actual time=0.058..30481.482 rows=4910762 loops=1)
 Total runtime: 55806.992 ms
(3 rows)

zp1936=> EXPLAIN ANALYZE select count(*) from items where channel < 5000;
                                                        QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=248808.54..248808.54 rows=1 width=0) (actual time=26071.264..26071.269 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245044.52 rows=1505605 width=0)
(actual time=0.161..17623.033 rows=1632057 loops=1)
         Filter: (channel < 5000)
 Total runtime: 26071.361 ms
(4 rows)

zp1936=> EXPLAIN ANALYZE select count(*) from items where channel < 1000;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=245429.74..245429.74 rows=1 width=0) (actual time=10225.272..10225.276 rows=1 loops=1)
   ->  Seq Scan on items  (cost=0.00..245044.52 rows=154085 width=0) (actual time=7.633..10083.246 rows=25687 loops=1)
         Filter: (channel < 1000)
 Total runtime: 10225.373 ms
(4 rows)



--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.zapatec.com

Re: Various performance questions

From
Greg Stark
Date:
Neil Conway <neilc@samurai.com> writes:

> On Sun, 2003-10-26 at 22:49, Greg Stark wrote:
> > What version of postgres is this?. In 7.4 (and maybe 7.3?) count() uses an
> > int8 to store its count so it's not limited to 4 billion records.
> > Unfortunately int8 is somewhat inefficient as it has to be dynamically
> > allocated repeatedly.
>
> Uh, what? Why would an int8 need to be "dynamically allocated
> repeatedly"?

Perhaps I'm wrong; I'm extrapolating from a comment Tom Lane made that
profiling showed that the bulk of the cost in count() went to allocating
int8s. He commented that this could be optimized by having count() and sum()
bypass the regular api. I don't have the original message handy.

--
greg

Re: Various performance questions

From
Neil Conway
Date:
On Mon, 2003-10-27 at 12:56, Greg Stark wrote:
> Neil Conway <neilc@samurai.com> writes:
> > Uh, what? Why would an int8 need to be "dynamically allocated
> > repeatedly"?
>
> Perhaps I'm wrong, I'm extrapolating from a comment Tom Lane made that
> profiling showed that the bulk of the cost in count() went to allocating
> int8s. He commented that this could be optimized by having count() and sum()
> bypass the regular api. I don't have the original message handy.

I'm still confused: int64 should be stack-allocated, AFAICS. Tom, do you
recall what the issue here is?

-Neil



Re: Various performance questions

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Mon, 2003-10-27 at 12:56, Greg Stark wrote:
>> Neil Conway <neilc@samurai.com> writes:
>>> Uh, what? Why would an int8 need to be "dynamically allocated
>>> repeatedly"?
>>
>> Perhaps I'm wrong, I'm extrapolating from a comment Tom Lane made that
>> profiling showed that the bulk of the cost in count() went to allocating
>> int8s. He commented that this could be optimized by having count() and sum()
>> bypass the regular api. I don't have the original message handy.

> I'm still confused: int64 should be stack-allocated, AFAICS. Tom, do you
> recall what the issue here is?

Greg is correct.  int8 is a pass-by-reference datatype and so every
aggregate state-transition function cycle requires at least one palloc
(to return the function result).  I think in the current state of the
code it requires two pallocs :-(, because we can't trust the transition
function to palloc its result in the right context without palloc'ing
leaked junk in that context, so an extra palloc is needed to copy the
result Datum into a longer-lived context than we call the function in.

There was some speculation a few weeks ago about devising a way to let
performance-critical transition functions avoid the excess palloc's by
working with a specialized API instead of the standard function call
API, but I think it'll have to wait for 7.5.

            regards, tom lane

Re: Various performance questions

From
Neil Conway
Date:
On Mon, 2003-10-27 at 13:52, Tom Lane wrote:
> Greg is correct.  int8 is a pass-by-reference datatype and so every
> aggregate state-transition function cycle requires at least one palloc
> (to return the function result).

Interesting. Is there a reason why int8 is pass-by-reference? (ISTM that
pass-by-value would be sufficient...)

Thanks for the information, Tom & Greg.

-Neil



Re: Various performance questions

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Interesting. Is there a reason why int8 is pass-by-reference?

Pass-by-value types have to fit into Datum.

On a 64-bit machine (ie, one where pointers are 64-bits anyway) it would
make sense to convert int8 (and float8 too) into pass-by-value types.
If the machine does not already need Datum to be 8 bytes, though, I
think that widening Datum to 8 bytes just for the benefit of these two
datatypes would be a serious net loss.  Not to mention that it would
just plain break everything on machines with no native 8-byte-int
datatype.
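The constraint Tom describes reduces to a size comparison; a trivial sketch, with the bit widths as the only real inputs:

```python
# A Datum is pointer-sized, so a type is eligible for pass-by-value only
# if its values fit in a pointer.
def fits_by_value(type_bits: int, datum_bits: int) -> bool:
    return type_bits <= datum_bits

assert not fits_by_value(64, 32)  # int8 on a 32-bit machine: by-reference
assert fits_by_value(64, 64)      # on a 64-bit machine it could fit
assert fits_by_value(32, 32)      # plain int4 passes by value everywhere
```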

One of the motivations for the version-1 function call protocol was to
allow the pass-by-value-or-by-ref nature of these datatypes to be hidden
from most of the code, with an eye to someday making this a
platform-specific choice.

            regards, tom lane

Re: Various performance questions

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg is correct.  int8 is a pass-by-reference datatype

Just to keep the conversation on track: the evidence from this particular post
seems to indicate that my theory was wrong and the overhead for count(*) is
_not_ a big time sink. It seems to be at most 10% and usually less. A simple
"select 1 from tab" takes nearly as long.

I'm still puzzled why the times on these are so different when the latter
returns fewer records and both are doing sequential scans:

 select 1 from tab

 select 1 from tab where a < 1000

--
greg

Re: Various performance questions

From
Greg Stark
Date:
In fact the number of records seems to be almost irrelevant. A sequential scan
takes almost exactly the same amount of time up until a critical region (for
me around 100000 records) at which point it starts going up very quickly.

It's almost as if it's doing some disk i/o, but I'm watching vmstat and don't
see anything. And in any case it would have to read all the same blocks to do
the sequential scan regardless of how many records match, no?

I don't hear the disk seeking either -- though oddly there is some sound
coming from the computer while this query is running. It's a high-pitched
sound, almost like a floppy drive reading without seeking. Perhaps there
is some i/o happening and Linux is lying about it? Perhaps I'm not
hearing seeking because it's reading everything from one track and not
seeking? Very strange.


slo=> explain analyze select 1::int4 from test where a < 1 ;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=11 width=0) (actual time=417.468..417.468 rows=0 loops=1)
   Filter: (a < 1)
 Total runtime: 417.503 ms
(3 rows)

Time: 418.181 ms


slo=> explain analyze select 1::int4 from test where a < 100 ;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=53 width=0) (actual time=0.987..416.224 rows=50 loops=1)
   Filter: (a < 100)
 Total runtime: 416.301 ms
(3 rows)

Time: 417.008 ms


slo=> explain analyze select 1::int4 from test where a < 10000 ;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=5283 width=0) (actual time=0.812..434.967 rows=5000 loops=1)
   Filter: (a < 10000)
 Total runtime: 439.620 ms
(3 rows)

Time: 440.665 ms


slo=> explain analyze select 1::int4 from test where a < 100000 ;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=50076 width=0) (actual time=0.889..458.623 rows=50000 loops=1)
   Filter: (a < 100000)
 Total runtime: 491.281 ms
(3 rows)

Time: 491.998 ms


slo=> explain analyze select 1::int4 from test where a < 1000000 ;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=99991 width=0) (actual time=0.018..997.421 rows=715071 loops=1)
   Filter: (a < 1000000)
 Total runtime: 1461.851 ms
(3 rows)

Time: 1462.898 ms


slo=> explain analyze select 1::int4 from test where a < 10000000 ;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1693.00 rows=99991 width=0) (actual time=0.015..1065.456 rows=800000 loops=1)
   Filter: (a < 10000000)
 Total runtime: 1587.481 ms
(3 rows)

--
greg

Re: Various performance questions

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> I'm still puzzled why the times on these are so different when the latter
> returns fewer records and both are doing sequential scans:

My best guess is that it's simply the per-tuple overhead of cycling
tuples through the two plan nodes.  When you have no actual I/O happening,
the seqscan runtime is going to be all CPU time, something of the form
    cost_per_page * number_of_pages_processed +
    cost_per_tuple_scanned * number_of_tuples_scanned +
    cost_per_tuple_returned * number_of_tuples_returned
I don't have numbers for the relative sizes of those three costs, but
I doubt that any of them are negligible compared to the other two.

Adding a WHERE clause increases cost_per_tuple_scanned but reduces the
number_of_tuples_returned, and so it cuts the contribution from the
third term, evidently by more than the WHERE clause adds to the second
term.
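Plugging the thread's row counts into that three-term formula shows how a WHERE clause can make a seqscan cheaper; the coefficients below are invented for illustration, only the shape of the formula is Tom's:

```python
# Tom's cost model: cost_per_page * pages
#                 + cost_per_tuple_scanned * tuples_scanned
#                 + cost_per_tuple_returned * tuples_returned
def seqscan_cost(pages, scanned, returned,
                 per_page=1.0, per_scanned=0.004, per_returned=0.007):
    return per_page * pages + per_scanned * scanned + per_returned * returned

pages, total = 183993, 4910762        # from the EXPLAIN output above
matching = 1632057                    # rows where channel < 5000

full = seqscan_cost(pages, total, total)  # no WHERE: every tuple returned
# The filter makes each scanned tuple slightly more expensive, but far
# fewer tuples get cycled up to the Aggregate node:
filtered = seqscan_cost(pages, total, matching, per_scanned=0.005)

assert filtered < full
```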

My own profiling had suggested that the cost-per-tuple-scanned in the
aggregate node dominated the seqscan CPU costs, but that might be
platform-specific, or possibly have something to do with the fact that
I was profiling an assert-enabled build.

It might be worth pointing out that EXPLAIN ANALYZE adds two kernel
calls (gettimeofday or some such) into each cycle of the plan nodes;
that's probably inflating the cost_per_tuple_returned by a noticeable
amount.

            regards, tom lane