Thread: Re: [HACKERS] []performance issues
On Fri, Aug 02, 2002 at 03:48:39PM +0400, Yaroslav Dmitriev wrote:
>
> So I am still interested in PostgreSQL's ability to deal with
> multimillion records tables.

[x-posted and Reply-To: to -general; this isn't a development
problem.]

We have tables with multimillion records, and they are fast.  But not
fast to count().  The MVCC design of PostgreSQL will give you very
few concurrency problems, but you pay for that in the response time
of certain kinds of aggregates, which cannot use an index.

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                                   Toronto, Ontario Canada
<andrew@libertyrms.info>                      M6K 3E3
                                              +1 416 646 3304 x110
Count() is slow even on your Sun server with 16gb ram?  How big is the
database?

David Blood

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Andrew Sullivan
Sent: Friday, August 02, 2002 9:40 AM
To: PostgreSQL-development
Cc: PostgreSQL general list
Subject: Re: [GENERAL] [HACKERS] []performance issues

On Fri, Aug 02, 2002 at 03:48:39PM +0400, Yaroslav Dmitriev wrote:
>
> So I am still interested in PostgreSQL's ability to deal with
> multimillion records tables.

[x-posted and Reply-To: to -general; this isn't a development
problem.]

We have tables with multimillion records, and they are fast.  But not
fast to count().  The MVCC design of PostgreSQL will give you very
few concurrency problems, but you pay for that in the response time
of certain kinds of aggregates, which cannot use an index.

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                                   Toronto, Ontario Canada
<andrew@libertyrms.info>                      M6K 3E3
                                              +1 416 646 3304 x110
On Fri, Aug 02, 2002 at 09:57:16AM -0600, David Blood wrote:
>
> Count() is slow even on your Sun server with 16gb ram?  How big is the
> database?

Well, just relatively slow!  It's always going to be relatively slow
to seqscan a few million records.  We have some tables which have
maybe 4 or 4.5 million records in them.  (I don't spend a lot of time
count()ing them ;-)

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                                   Toronto, Ontario Canada
<andrew@libertyrms.info>                      M6K 3E3
                                              +1 416 646 3304 x110
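(To make the seqscan point concrete, here is a small sketch; the table
and column names are invented, and the comments describe the typical
plan rather than output from any particular server.)

    -- A bare count(*) must visit every row: under MVCC, row visibility
    -- lives in the heap tuples, so no index alone can answer it.
    SELECT count(*) FROM bigtable;                        -- sequential scan

    -- A selective predicate on an indexed column stays fast even on
    -- millions of rows, because only the matching rows are fetched
    -- (assumes an index on "created").
    SELECT count(*) FROM bigtable WHERE created >= '2002-08-01';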
We have tables of over 3.1 million records.  Performance is fine for
most things as long as access hits an index.  As already stated,
count(*) takes a long time; it just took over a minute for me to check
the record count.  Our DB is primarily in a data warehouse role.

Creating an index on a char(43) field on that table from scratch takes
a while, but I think that's expected.  Under normal loads we have
well-under-1-second "LIKE" queries on the indexed char(43) field in
that table, with a join on a table of 1.1 million records using a
char(12) primary key.

Server is a Dell PowerEdge 2400, dual PIII 667s with a gig of memory,
800-something megs allocated to postgres shared buffers.

-Pete

Andrew Sullivan wrote:

> On Fri, Aug 02, 2002 at 03:48:39PM +0400, Yaroslav Dmitriev wrote:
>
>> So I am still interested in PostgreSQL's ability to deal with
>> multimillion records tables.
>
> [x-posted and Reply-To: to -general; this isn't a development
> problem.]
>
> We have tables with multimillion records, and they are fast.  But not
> fast to count().  The MVCC design of PostgreSQL will give you very
> few concurrency problems, but you pay for that in the response time
> of certain kinds of aggregates, which cannot use an index.
>
> A
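(Roughly the shape of query described above, with invented table and
column names; note that a plain btree index only helps anchored
LIKE 'prefix%' patterns, and on the 7.x series only when the database
uses the C locale.)

    -- hypothetical schema matching the description
    CREATE TABLE items (
        item_code  char(43),
        detail_id  char(12)
    );
    CREATE TABLE details (
        detail_id  char(12) PRIMARY KEY,
        label      text
    );
    CREATE INDEX items_item_code_idx ON items (item_code);

    -- anchored LIKE can use items_item_code_idx; the join lookups go
    -- through details' primary key index
    SELECT i.item_code, d.label
      FROM items i, details d
     WHERE i.item_code LIKE 'ABC-2002%'
       AND d.detail_id = i.detail_id;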
On Fri, 2002-08-02 at 11:39, Andrew Sullivan wrote:
> On Fri, Aug 02, 2002 at 03:48:39PM +0400, Yaroslav Dmitriev wrote:
> >
> > So I am still interested in PostgreSQL's ability to deal with
> > multimillion records tables.
>
> [x-posted and Reply-To: to -general; this isn't a development
> problem.]
>
> We have tables with multimillion records, and they are fast.  But not
> fast to count().  The MVCC design of PostgreSQL will give you very
> few concurrency problems, but you pay for that in the response time
> of certain kinds of aggregates, which cannot use an index.

Of course, as suggested, this is easily overcome by keeping your own
counter:

begin;
insert into bigtable values (...);
update counttable set count = count + 1;
commit;

Now you get all the fun concurrency issues -- but fetching the
information will be quick.  What happens more often, the counts or the
inserts? :)
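(Spelled out a bit more, assuming a one-row counter table; the table
names and the placeholder insert are schematic.)

    -- one-row side table holding the running total
    CREATE TABLE counttable (count bigint NOT NULL);
    INSERT INTO counttable VALUES (0);

    BEGIN;
    INSERT INTO bigtable VALUES (...);          -- the real row goes here
    UPDATE counttable SET count = count + 1;    -- concurrent inserters
                                                -- serialize on this row
    COMMIT;

    -- reading the count is now a single-row fetch, no seqscan of bigtable
    SELECT count FROM counttable;

The update on the single counter row is where the concurrency pain
mentioned above shows up: every inserting transaction has to wait its
turn on that row.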
On Fri, Aug 02, 2002 at 02:08:02PM -0400, Rod Taylor wrote:
>
> Of course, as suggested, this is easily overcome by keeping your own
> counter:
>
> begin;
> insert into bigtable values (...);
> update counttable set count = count + 1;
> commit;
>
> Now you get all the fun concurrency issues -- but fetching the
> information will be quick.  What happens more often, the counts or the
> inserts? :)

You could get around this with a trigger that just inserts 1 into one
table (call it counter_unposted), and then use an external process to
take those units, add them to the value in counter_posted, and delete
them from counter_unposted.  You'd always be a few minutes behind, but
you'd get a counter that's pretty close without too much overhead.

Of course, this raises the obvious question: why use count() at all?

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                                   Toronto, Ontario Canada
<andrew@libertyrms.info>                      M6K 3E3
                                              +1 416 646 3304 x110
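(A sketch of that scheme for the 7.x series; the table, function, and
trigger names are invented, and it assumes plpgsql has been installed
in the database with createlang.)

    CREATE TABLE counter_unposted (delta int NOT NULL);
    CREATE TABLE counter_posted   (count bigint NOT NULL);
    INSERT INTO counter_posted VALUES (0);

    -- trigger: record one unit per inserted row
    CREATE FUNCTION bigtable_count_trig() RETURNS opaque AS '
    BEGIN
        INSERT INTO counter_unposted VALUES (1);
        RETURN NEW;
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER bigtable_count
        AFTER INSERT ON bigtable
        FOR EACH ROW EXECUTE PROCEDURE bigtable_count_trig();

    -- external process, run every few minutes: fold the unposted units
    -- into the posted total, then clear them.  The lock keeps new units
    -- from slipping in between the UPDATE and the DELETE.
    BEGIN;
    LOCK TABLE counter_unposted IN EXCLUSIVE MODE;
    UPDATE counter_posted
       SET count = count + (SELECT count(*) FROM counter_unposted);
    DELETE FROM counter_unposted;
    COMMIT;

    -- cheap, slightly stale answer
    SELECT count FROM counter_posted;

Reads of counter_posted are never blocked, and the count only lags by
however often the posting job runs.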