Thread: Postgres slower than MS ACCESS
I am running some simple queries to benchmark Postgres 8.1 against MS Access, and Postgres is 2 to 3 times slower than Access.
Hardware:
Dell Optiplex GX280
P4 3.20 GHz
3GB RAM
Windows XP SP1
Database has one table with 1.2 million rows
Query:
UPDATE ntdn SET gha=area/10000
I could post the EXPLAIN ANALYZE results, but it's 4,000+ lines long.
I've run various tests on a number of Postgres parameters; none has come close to Access's time of 5.00 minutes. Postgres times range between 24 min and 121 min.
Some of the Postgres variables and ranges I've tested (a config sketch follows the list):
work_mem: 1,000 to 2,000,000
temp_buffers: 1,000 to 10,000
shared_buffers: 1,000 to 64,000
sort_mem: 1,024,000
fsync on / off
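As a postgresql.conf sketch (example values drawn from the ranges above, not recommendations; sort_mem is the older, pre-8.0 name for work_mem):

    shared_buffers = 10000   # tested 1,000 to 64,000 (counted in 8kB pages)
    work_mem = 1024000       # in kB; tested 1,000 to 2,000,000
    temp_buffers = 10000     # tested 1,000 to 10,000
    fsync = off              # tested both on and off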
Why does Access run so much faster? How can I get Postgres to run as fast as Access?
Thanks,
Jay
On Tue, 2006-02-14 at 09:51, Jay Greenfield wrote:
> I am running some simple queries to benchmark Postgres 8.1 against MS
> Access, and Postgres is 2 to 3 times slower than Access.

A BUNCH OF STUFF SNIPPED

> Why does Access run so much faster? How can I get Postgres to run as
> fast as Access?

Because Access is not a multi-user database management system designed
to handle anywhere from a couple to several thousand users at the same
time?

PostgreSQL can do this update while still allowing users to access the
data in the database, and can handle updates to the same table at the
same time, as long as they aren't hitting the same rows.

They're two entirely different beasts. One is good at batch processing
moderate amounts of data for one user at a time. The other is good for
real-time processing of very large amounts of data for a fairly large
number of users while running at an acceptable, if slower, speed.
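To illustrate that point, here is a sketch of two sessions updating
different rows of the same table at once (the key column "id" is
hypothetical, since the original post doesn't name one):

    -- Session 1
    BEGIN;
    UPDATE ntdn SET gha = area / 10000 WHERE id = 1;

    -- Session 2, at the same time: a different row, so it does not
    -- block on session 1, and readers see consistent data throughout
    BEGIN;
    UPDATE ntdn SET gha = area / 10000 WHERE id = 2;
    COMMIT;

    -- Session 1, later
    COMMIT;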
Is it possible to configure Postgres to behave like Access - a single
user, using as much of the resources as required?

Thanks,
Jay
On Tue, 2006-02-14 at 10:17, Jay Greenfield wrote:
> Is it possible to configure Postgres to behave like Access - a single
> user, using as much of the resources as required?

No. If you want something akin to that, try SQLite. It's not as
featureful as PostgreSQL, but it's closer to it than Access.
* Jay Greenfield (jag@timberline.ca) wrote:
> Database has one table with 1.2 million rows
> Query:
>
> UPDATE ntdn SET gha=area/10000
>
> I could post the EXPLAIN ANALYZE results, but it's 4,000+ lines long.

How do you get 4,000+ lines of explain analyze for one update query in a
database with only one table? Something a bit fishy there. Perhaps you
mean explain verbose, though I don't really see how that'd be so long
either, but it'd be closer. Could you provide some more sane information?

> I've run various tests on a number of Postgres parameters; none has
> come close to Access's time of 5.00 minutes. Postgres times range
> between 24 min and 121 min.
>
> Some of the Postgres variables and ranges I've tested:
> work_mem: 1,000 to 2,000,000
> temp_buffers: 1,000 to 10,000
> shared_buffers: 1,000 to 64,000
> sort_mem: 1,024,000
> fsync on / off
>
> Why does Access run so much faster? How can I get Postgres to run as
> fast as Access?

While it's true that Access almost certainly takes some shortcuts, 24
minutes for an update across 1.2 million rows seems an awfully long time
for Postgres. Is this table exceptionally large in some way (i.e., lots
of columns)?

I expect running with fsync off would be closer to 'Access mode', though
it has risks (of course). Also, it might be faster to insert into a
separate table rather than run a huge update like that in Postgres. And
if there are indexes on the table in question, you might drop them before
doing the update/insert and recreate them after the query has finished.

You really haven't provided anywhere near enough information to figure
out what the actual problem is here. Access does take shortcuts, but the
times you're posting for Postgres seem quite far off based on the
hardware and commands you've described...

Thanks,
Stephen
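A rough sketch of those two suggestions (the constraint name and key
column below are hypothetical, since the thread doesn't give them):

    -- drop the index, run the bulk update, then recreate the index
    ALTER TABLE ntdn DROP CONSTRAINT ntdn_pkey;
    UPDATE ntdn SET gha = area / 10000;
    ALTER TABLE ntdn ADD PRIMARY KEY (id);

    -- or build a fresh table instead of updating every row in place
    -- (column list abbreviated; the real table is much wider)
    CREATE TABLE ntdn_new AS
        SELECT id, area, area / 10000 AS gha
        FROM ntdn;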
Stephen Frost <sfrost@snowman.net> writes:
> While it's true that Access almost certainly takes some shortcuts, 24
> minutes for an update across 1.2 million rows seems an awfully long
> time for Postgres.

I did some experiments along this line with a trivial table (2 integer
columns) of 1.28M rows. I used CVS tip with all parameters at defaults.
With no indexes, an UPDATE took about 50 seconds. With one index, it
took 628 seconds. It's not hard to believe you could get to Jay's
figures with multiple indexes.

Looking in the postmaster log, I see I was getting checkpoints every few
seconds. Increasing checkpoint_segments to 30 (a factor of 10) brought
it down to 355 seconds, and then increasing shared_buffers to 20000
brought it down to 165 sec. Separating WAL and data onto different disks
would have helped too, no doubt, but I'm too lazy to try it.

regards, tom lane
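In postgresql.conf terms, the two changes Tom describes are (8.1-era
syntax, where shared_buffers is counted in 8kB pages):

    checkpoint_segments = 30   # default is 3; this alone: 628s -> 355s
    shared_buffers = 20000     # combined with the above: down to 165s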
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > While it's true that Access almost certainly takes some shortcuts, 24
> > minutes for an update across 1.2 million rows seems an awfully long
> > time for Postgres.
>
> I did some experiments along this line with a trivial table (2 integer
> columns) of 1.28M rows. I used CVS tip with all parameters at defaults.
> With no indexes, an UPDATE took about 50 seconds. With one index, it
> took 628 seconds. It's not hard to believe you could get to Jay's
> figures with multiple indexes.

With multiple indexes, you might want to drop them and recreate them when
you're updating an entire table.

> Looking in the postmaster log, I see I was getting checkpoints every
> few seconds. Increasing checkpoint_segments to 30 (a factor of 10)
> brought it down to 355 seconds, and then increasing shared_buffers to
> 20000 brought it down to 165 sec. Separating WAL and data onto
> different disks would have helped too, no doubt, but I'm too lazy to
> try it.

Sure, this was kind of my point: we need more information about the
database if we're going to have much of a chance of improving the results
he's seeing. 165 seconds is certainly a great deal better than 24
minutes. :)

Thanks,
Stephen
> How do you get 4,000+ lines of explain analyze for one update query in
> a database with only one table? Something a bit fishy there. Perhaps
> you mean explain verbose, though I don't really see how that'd be so
> long either, but it'd be closer. Could you provide some more sane
> information?

My mistake - there were 4,000 lines in the EXPLAIN ANALYZE VERBOSE
output. Here is the output of EXPLAIN ANALYZE:

QUERY PLAN
"Seq Scan on ntdn  (cost=0.00..3471884.39 rows=1221391 width=1592)
(actual time=57292.580..1531300.003 rows=1221391 loops=1)"
"Total runtime: 4472646.988 ms"

> Is this table exceptionally large in some way (i.e., lots of columns)?

The table is 1.2 million rows X 246 columns. The only index is the
primary key. I will try to remove that index to see if that improves
performance at all.

Jay
"Jay Greenfield" <jag@timberline.ca> writes: > The table is 1.2 million rows X 246 columns. The only index is the primary > key. I will try to remove that index to see if that improves performance at > all. Hmm, the large number of columns might have something to do with it ... what datatypes are the columns? regards, tom lane
> Hmm, the large number of columns might have something to do with it ...
> what datatypes are the columns?

All sorts, but mostly float4 and varchar(2 to 10).

Jay
On Feb 14, 2006, at 3:56 PM, Jay Greenfield wrote:
>> How do you get 4,000+ lines of explain analyze for one update query
>> in a database with only one table? Something a bit fishy there.
>> Perhaps you mean explain verbose, though I don't really see how
>> that'd be so long either, but it'd be closer. Could you provide some
>> more sane information?
>
> My mistake - there were 4,000 lines in the EXPLAIN ANALYZE VERBOSE
> output. Here is the output of EXPLAIN ANALYZE:
>
> QUERY PLAN
> "Seq Scan on ntdn  (cost=0.00..3471884.39 rows=1221391 width=1592)
> (actual time=57292.580..1531300.003 rows=1221391 loops=1)"
> "Total runtime: 4472646.988 ms"

Have you been vacuuming or running autovacuum? If you keep running
queries like this you're certainly going to have a ton of dead tuples,
which would definitely explain these times too.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
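For example, between test runs (8.1 also ships an integrated autovacuum,
though it is off by default):

    VACUUM ANALYZE ntdn;   -- reclaims dead row versions left behind by
                           -- earlier UPDATE runs

or, in postgresql.conf:

    autovacuum = on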
I've been vacuuming between each test run. Not vacuuming results in times
all the way up to 121 minutes. For a direct comparison with Access, the
vacuuming time with Postgres should really be included, as this is not
required with Access.

By removing all of the indexes I have been able to get the Postgres time
down to 4.35 minutes with default settings for all except the following:

fsync: off
work_mem: 1024000
shared_buffers: 10000

I did a run with checkpoint_segments @ 30 (from 3 in the 4.35 min run)
and posted a time of 6.78 minutes. Any idea why this would increase the
time?

Thanks,
Jay
"Jay Greenfield" <jag@timberline.ca> writes: > I did a run with checkpoint_segments @ 30 (from 3 in 4.35 min run) and > posted a time of 6.78 minutes. Any idea why this would increase the time? The first time through might take longer while the machine creates empty xlog segment files (though I'd not have expected a hit that big). Once it's fully populated pg_xlog it'll just recycle the files, so you might find that a second try is faster. regards, tom lane
> I've been vacuuming between each test run.
> Not vacuuming results in times all the way up to 121 minutes. For a
> direct comparison with Access, the vacuuming time with Postgres should
> really be included, as this is not required with Access.

Hmm, but then you would have to include Access's vacuum too. I think you
will find "Tools -> Database Utils -> Compact Database" performs a
similar purpose and is just as important; I've seen many Access databases
bloat in my time.
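The closest PostgreSQL equivalent of that compact step would be something
like:

    VACUUM FULL ntdn;   -- compacts the table and reclaims the space;
                        -- takes an exclusive lock while it runs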
Peter Childs