Thread: Re: [HACKERS] []performance issues
On Fri, Aug 02, 2002 at 03:48:39PM +0400, Yaroslav Dmitriev wrote:
>
> So I am still interested in PostgreSQL's ability to deal with
> multimillion records tables.

[x-posted and Reply-To: to -general; this isn't a development
problem.]

We have tables with multimillion records, and they are fast.  But not
fast to count().  The MVCC design of PostgreSQL will give you very
few concurrency problems, but you pay for that in the response time
of certain kinds of aggregates, which cannot use an index.

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                                   Toronto, Ontario Canada
<andrew@libertyrms.info>                      M6K 3E3
                                              +1 416 646 3304 x110
Count() is slow even on your Sun server with 16gb ram?  How big is the
database?

David Blood

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Andrew Sullivan
Sent: Friday, August 02, 2002 9:40 AM
To: PostgreSQL-development
Cc: PostgreSQL general list
Subject: Re: [GENERAL] [HACKERS] []performance issues

On Fri, Aug 02, 2002 at 03:48:39PM +0400, Yaroslav Dmitriev wrote:
>
> So I am still interested in PostgreSQL's ability to deal with
> multimillion records tables.

[x-posted and Reply-To: to -general; this isn't a development
problem.]

We have tables with multimillion records, and they are fast.  But not
fast to count().  The MVCC design of PostgreSQL will give you very
few concurrency problems, but you pay for that in the response time
of certain kinds of aggregates, which cannot use an index.

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                                   Toronto, Ontario Canada
<andrew@libertyrms.info>                      M6K 3E3
                                              +1 416 646 3304 x110
On Fri, Aug 02, 2002 at 09:57:16AM -0600, David Blood wrote:
>
> Count() is slow even on your Sun server with 16gb ram?  How big is the
> database?

Well, just relatively slow!  It's always going to be relatively slow
to seqscan a few million records.  We have some tables which have
maybe 4 or 4.5 million records in them.  (I don't spend a lot of time
count()ing them ;-)

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                                   Toronto, Ontario Canada
<andrew@libertyrms.info>                      M6K 3E3
                                              +1 416 646 3304 x110
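(To make the seqscan point concrete, here is a small sketch; the table
and column names are invented, and the comments describe the typical
plan rather than output from any particular server.)

    -- A bare count(*) must visit every row: under MVCC, row visibility
    -- lives in the heap tuples, so no index alone can answer it.
    SELECT count(*) FROM bigtable;                        -- sequential scan

    -- A selective predicate on an indexed column stays fast even on
    -- millions of rows, because only the matching rows are fetched
    -- (assumes an index on "created").
    SELECT count(*) FROM bigtable WHERE created >= '2002-08-01';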
We have tables of over 3.1 million records.  Performance is fine for
most things as long as access hits an index.  As already stated,
count(*) takes a long time; it just took over a minute for me to check
the record count.  Our DB is primarily in a data warehouse role.

Creating an index on a char(43) field on that table from scratch takes
a while, but I think that's expected.  Under normal loads we have
well-under-1-second "LIKE" queries on the indexed char(43) field in
that table, with a join on a table of 1.1 million records using a
char(12) primary key.

Server is a Dell PowerEdge 2400, dual PIII 667s with a gig of memory,
800-something megs allocated to postgres shared buffers.

-Pete

Andrew Sullivan wrote:

> On Fri, Aug 02, 2002 at 03:48:39PM +0400, Yaroslav Dmitriev wrote:
>
>> So I am still interested in PostgreSQL's ability to deal with
>> multimillion records tables.
>
> [x-posted and Reply-To: to -general; this isn't a development
> problem.]
>
> We have tables with multimillion records, and they are fast.  But not
> fast to count().  The MVCC design of PostgreSQL will give you very
> few concurrency problems, but you pay for that in the response time
> of certain kinds of aggregates, which cannot use an index.
>
> A
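(Roughly the shape of query described above, with invented table and
column names; note that a plain btree index only helps anchored
LIKE 'prefix%' patterns, and on the 7.x series only when the database
uses the C locale.)

    -- hypothetical schema matching the description
    CREATE TABLE items (
        item_code  char(43),
        detail_id  char(12)
    );
    CREATE TABLE details (
        detail_id  char(12) PRIMARY KEY,
        label      text
    );
    CREATE INDEX items_item_code_idx ON items (item_code);

    -- anchored LIKE can use items_item_code_idx; the join lookups go
    -- through details' primary key index
    SELECT i.item_code, d.label
      FROM items i, details d
     WHERE i.item_code LIKE 'ABC-2002%'
       AND d.detail_id = i.detail_id;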
On Fri, 2002-08-02 at 11:39, Andrew Sullivan wrote:
> On Fri, Aug 02, 2002 at 03:48:39PM +0400, Yaroslav Dmitriev wrote:
> >
> > So I am still interested in PostgreSQL's ability to deal with
> > multimillion records tables.
>
> [x-posted and Reply-To: to -general; this isn't a development
> problem.]
>
> We have tables with multimillion records, and they are fast.  But not
> fast to count().  The MVCC design of PostgreSQL will give you very
> few concurrency problems, but you pay for that in the response time
> of certain kinds of aggregates, which cannot use an index.

Of course, as suggested, this is easily overcome by keeping your own
counter:

begin;
insert into bigtable values (...);
update counttable set count = count + 1;
commit;

Now you get all the fun concurrency issues -- but fetching the
information will be quick.  What happens more often, the counts or the
inserts? :)
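(Spelled out a bit more, assuming a one-row counter table; the table
names and the placeholder insert are schematic.)

    -- one-row side table holding the running total
    CREATE TABLE counttable (count bigint NOT NULL);
    INSERT INTO counttable VALUES (0);

    BEGIN;
    INSERT INTO bigtable VALUES (...);          -- the real row goes here
    UPDATE counttable SET count = count + 1;    -- concurrent inserters
                                                -- serialize on this row
    COMMIT;

    -- reading the count is now a single-row fetch, no seqscan of bigtable
    SELECT count FROM counttable;

The update on the single counter row is where the concurrency pain
mentioned above shows up: every inserting transaction has to wait its
turn on that row.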
On Fri, Aug 02, 2002 at 02:08:02PM -0400, Rod Taylor wrote:
>
> Of course, as suggested, this is easily overcome by keeping your own
> counter:
>
> begin;
> insert into bigtable values (...);
> update counttable set count = count + 1;
> commit;
>
> Now you get all the fun concurrency issues -- but fetching the
> information will be quick.  What happens more often, the counts or the
> inserts? :)

You could get around this with a trigger that just inserts 1 into one
table (call it counter_unposted), and then use an external process to
take those units, add them to the value in counter_posted, and delete
them from counter_unposted.  You'd always be a few minutes behind, but
you'd get a counter that's pretty close without too much overhead.

Of course, this raises the obvious question: why use count() at all?

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                                   Toronto, Ontario Canada
<andrew@libertyrms.info>                      M6K 3E3
                                              +1 416 646 3304 x110
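(A sketch of that scheme for the 7.x series; the table, function, and
trigger names are invented, and it assumes plpgsql has been installed
in the database with createlang.)

    CREATE TABLE counter_unposted (delta int NOT NULL);
    CREATE TABLE counter_posted   (count bigint NOT NULL);
    INSERT INTO counter_posted VALUES (0);

    -- trigger: record one unit per inserted row
    CREATE FUNCTION bigtable_count_trig() RETURNS opaque AS '
    BEGIN
        INSERT INTO counter_unposted VALUES (1);
        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER bigtable_count
        AFTER INSERT ON bigtable
        FOR EACH ROW EXECUTE PROCEDURE bigtable_count_trig();

    -- external process, run every few minutes: fold the unposted units
    -- into the posted total, then clear them.  The lock keeps new units
    -- from slipping in between the UPDATE and the DELETE.
    BEGIN;
    LOCK TABLE counter_unposted IN EXCLUSIVE MODE;
    UPDATE counter_posted
       SET count = count + (SELECT count(*) FROM counter_unposted);
    DELETE FROM counter_unposted;
    COMMIT;

    -- cheap, slightly stale answer
    SELECT count FROM counter_posted;

Reads of counter_posted are never blocked, and the count only lags by
however often the posting job runs.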