Thread: SQL performance issue with PostgreSQL compared to MySQL
This popped up on a local mailing list and I was alarmed at the results the user
was getting regarding MySQL and PostgreSQL. The query he was running on both
databases was much faster in MySQL than in PostgreSQL. Here's what he wrote:

**************************************************************************
I'll post this message to the list, in case anyone wants to follow - or,
if it's too OT, let me know, and I'll stop. The query "SELECT authuser,
SUM(bytes) AS traffic FROM logfile GROUP BY authuser ORDER BY traffic"
on a DB of approx. 1.8 million rows (same data in Pgsql and in mysql)
takes 1.83min. in mysql, and 7.36min. on pgsql. The mysql db is raw...no
indexes or anything 'tuning' done. The pgsql db is indexed on the
'authuser' field, and I've run 'analyze logfile'. The machine is a PIII
600 w/728Mb RAM - and it's definitely CPU bound (both scream up to 100%
and stay ;). As far as efficient queries, I'm not too sure how much more
efficient that query can be made - I'm a netadmin, not a DBA :)
**************************************************************************

My guess is he hasn't optimized PostgreSQL at all on his system. I will
try to find out more from him as to what version of PostgreSQL he's
running and try to get a copy of his postgresql.conf file. But can anyone
think of how the SQL statement could be written to be more efficient?

--
Jeff Self
Information Technology Analyst
Department of Personnel
City of Newport News
2400 Washington Avenue
Newport News, VA 23607
757-926-6930
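A first diagnostic step (a sketch, assuming a 7.x-era PostgreSQL and the
poster's `logfile` table) would be to look at the plan the planner actually
chooses, which would show whether the time is going into a big sort:

```sql
-- Show the chosen plan with actual row counts and timings.
-- On a PostgreSQL without hash-based GROUP BY, expect a Sort node
-- feeding the Aggregate over all ~1.8M rows.
EXPLAIN ANALYZE
SELECT authuser, SUM(bytes) AS traffic
FROM logfile
GROUP BY authuser
ORDER BY traffic;
```

The output would tell us whether the index on authuser is even being used, and
how much of the runtime is the sort step versus the aggregation itself.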
Jeff Self <jself@nngov.com> writes:
> if it's too OT, let me know, and I'll stop. The query "SELECT authuser,
> SUM(bytes) AS traffic FROM logfile GROUP BY authuser ORDER BY traffic"
> [is slow]

What datatype are the columns used in the query?  And what PG version
are we talking about?

			regards, tom lane
Jeff,

**************************************************************************
> I'll post this message to the list, in case anyone wants to follow - or,
> if it's too OT, let me know, and I'll stop. The query "SELECT authuser,
> SUM(bytes) AS traffic FROM logfile GROUP BY authuser ORDER BY traffic"
> on a DB of approx. 1.8 million rows (same data in Pgsql and in mysql)
> takes 1.83min. in mysql, and 7.36min. on pgsql. The mysql db is raw...no
> indexes or anything 'tuning' done.

First off, this is nonsense. One of its benefits for web developers is
that MySQL automatically indexes everything.

> The pgsql db is indexed on the 'authuser' field, and I've run
> 'analyze logfile'.

He also needs to index the bytes field and the traffic field. And run
VACUUM, not just ANALYZE, if this is a high-activity table, which I
suspect.

> The machine is a PIII 600 w/728Mb RAM - and it's definitely CPU bound
> (both scream up to 100% and stay ;). As far as efficient queries, I'm
> not too sure how much more efficient that query can be made - I'm a
> netadmin, not a DBA :)

We can tell. To be blunt, MySQL is the database for non-DBAs. He should
probably stick to using it rather than Postgres, which requires some
knowledge of performance tuning and query structure.

> My guess is he hasn't optimized PostgreSQL at all on his system. I will
> try and find out more from him as to what version of PostgreSQL he's
> running and try and get a copy of his postgresql.conf file. But can
> anyone think of how the SQL statement could be written to be more
> efficient?

He also needs to up his sort_mem to the max his system and load will
allow. He's sorting 1.8 million rows.

-Josh Berkus
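As a sketch of that last suggestion (the 65536 figure is purely illustrative,
not a recommendation for his particular box), sort_mem can be raised for a
single session before rerunning the query, rather than editing
postgresql.conf and restarting:

```sql
-- sort_mem is measured in kilobytes; 65536 = 64MB for this session only.
-- A server-wide change would instead go in postgresql.conf.
SET sort_mem = 65536;

SELECT authuser, SUM(bytes) AS traffic
FROM logfile
GROUP BY authuser
ORDER BY traffic;
```

If the sort previously spilled to disk, a per-session bump like this is a
quick way to test how much of the 7-minute runtime is sort-related before
committing to a config change.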
Tom Lane wrote:
> Jeff Self <jself@nngov.com> writes:
> > if it's too OT, let me know, and I'll stop. The query "SELECT authuser,
> > SUM(bytes) AS traffic FROM logfile GROUP BY authuser ORDER BY traffic"
> > [is slow]
>
> What datatype are the columns used in the query?  And what PG version
> are we talking about?

My guess is that it is the lack of hashing for GROUP BY aggregation that
is killing us. TODO has:

	* Add hash for evaluating GROUP BY aggregates

and I specifically remember MySQL has such a hash. The big question is
how many unique values there are in the group. If it is large, the
missing hashing could be the problem.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
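To answer that "big question" empirically (a sketch, again assuming the
poster's `logfile` table), the number of groups can be measured directly:

```sql
-- How many distinct groups will the GROUP BY produce?
-- Compare against the ~1.8M total rows: a small distinct count
-- relative to the row count is where a hash aggregate shines.
SELECT COUNT(DISTINCT authuser) AS groups,
       COUNT(*) AS total_rows
FROM logfile;
```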
On Wed, 19 Jun 2002, Josh Berkus wrote:

> > The pgsql db is indexed on the 'authuser' field, and I've run
> > 'analyze logfile'.
>
> He also needs to index the bytes field and the traffic field. And run
                                                 ^^^^^^^

Index on an aggregate function???

I thought we could only index plain normal columns and iscachable
functions. Am I missing something?

--
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
tel: +30-10-8981112
fax: +30-10-8981877
email: achill@matrix.gatewaynet.com
       mantzios@softlab.ece.ntua.gr
Achilleus,

> > > The pgsql db is indexed on the 'authuser' field, and I've run
> > > 'analyze logfile'.
> >
> > He also needs to index the bytes field and the traffic field. And run
>                                                  ^^^^^^^
>
> Index on an aggregate function???
>
> I thought we could only index plain normal columns and iscachable
> functions.

No, you're correct. I meant just to index on the bytes field.

--
-Josh Berkus