Thread: Sort performance on large tables
Hi everyone, I have a question about the performance of sort. Setup: Dell Dimension 3000, Suse 10, 1GB ram, PostgreSQL 8.1 RC 1 with PostGIS, 1 built-in 80 GB IDE drive, 1 SATA Seagate 400GB drive. The IDE drive has the OS and the WAL files, the SATA drive the database. From hdparm the max IO for the IDE drive is about 50Mb/s and the SATA drive is about 65Mb/s. Thus a very low-end machine - but it used just for development (i.e., it is not a production machine) and the only thing it does is run a PostgresSQL database. I have a staging table called completechain that holds US tiger data (i.e., streets and addresses for the US). The table is approximately 18GB. Its big because there is a lot of data, but also because the table is not normalized (it comes that way). I want to extract data out of the file, with the most important values being stored in a column called tlid. The tlid field is an integer, and the values are 98% unique. There is a second column called ogc_fid which is unique (it is a serial field). I need to extract out unique TLID's (doesn't matter which duplicate I get rid of). To do this I am running this query: SELECT tlid, min(ogc_fid) FROM completechain GROUP BY tlid; The results from explain analyze are: "GroupAggregate (cost=10400373.80..11361807.88 rows=48071704 width=8) (actual time=7311682.715..8315746.835 rows=47599910 loops=1)" " -> Sort (cost=10400373.80..10520553.06 rows=48071704 width=8) (actual time=7311682.682..7972304.777 rows=48199165 loops=1)" " Sort Key: tlid" " -> Seq Scan on completechain (cost=0.00..2228584.04 rows=48071704 width=8) (actual time=27.514..773245.046 rows=48199165 loops=1)" "Total runtime: 8486057.185 ms" Doing a similar query produces the same results: SELECT DISTINCT ON (tlid), tlid, ogc_fid FROM completechain; Note it takes over 10 times longer to do the sort than the full sequential scan. Should I expect results like this? I realize that the computer is quite low-end and is very IO bound for this query, but I'm still surprised that the sort operation takes so long. Out of curiosity, I setup an Oracle database on the same machine with the same data and ran the same query. Oracle was over an order of magnitude faster. Looking at its query plan, it avoided the sort by using "HASH GROUP BY." Does such a construct exist in PostgreSQL (I see only hash joins)? Also as an experiment I forced oracle to do a sort by running this query: SELECT tlid, min(ogc_fid) FROM completechain GROUP BY tlid ORDER BY tlid; Even with this, it was more than a magnitude faster than Postgresql. Which makes me think I have somehow misconfigured postgresql (see the relevant parts of postgresql.conf below). Any idea/help appreciated. Thanks, Charlie ------------------------------- #--------------------------------------------------------------------------- # RESOURCE USAGE (except WAL) #--------------------------------------------------------------------------- shared_buffers = 40000 # 40000 buffers * 8192 bytes/buffer = 327,680,000 bytes #shared_buffers = 1000 # min 16 or max_connections*2, 8KB each temp_buffers = 5000 #temp_buffers = 1000 # min 100, 8KB each #max_prepared_transactions = 5 # can be 0 or more # note: increasing max_prepared_transactions costs ~600 bytes of shared memory # per transaction slot, plus lock space (see max_locks_per_transaction). work_mem = 16384 # in Kb #work_mem = 1024 # min 64, size in KB maintenance_work_mem = 262144 # in kb #maintenance_work_mem = 16384 # min 1024, size in KB #max_stack_depth = 2048 # min 100, size in KB # - Free Space Map - max_fsm_pages = 60000 #max_fsm_pages = 20000 # min max_fsm_relations*16, 6 bytes each #max_fsm_relations = 1000 # min 100, ~70 bytes each # - Kernel Resource Usage - #max_files_per_process = 1000 # min 25 #preload_libraries = '' # - Cost-Based Vacuum Delay - #vacuum_cost_delay = 0 # 0-1000 milliseconds #vacuum_cost_page_hit = 1 # 0-10000 credits #vacuum_cost_page_miss = 10 # 0-10000 credits #vacuum_cost_page_dirty = 20 # 0-10000 credits #vacuum_cost_limit = 200 # 0-10000 credits # - Background writer - #bgwriter_delay = 200 # 10-10000 milliseconds between rounds #bgwriter_lru_percent = 1.0 # 0-100% of LRU buffers scanned/round #bgwriter_lru_maxpages = 5 # 0-1000 buffers max written/round #bgwriter_all_percent = 0.333 # 0-100% of all buffers scanned/round #bgwriter_all_maxpages = 5 # 0-1000 buffers max written/round #--------------------------------------------------------------------------- # WRITE AHEAD LOG #--------------------------------------------------------------------------- # - Settings - fsync = on # turns forced synchronization on or off #wal_sync_method = fsync # the default is the first option # supported by the operating system: # open_datasync # fdatasync # fsync # fsync_writethrough # open_sync #full_page_writes = on # recover from partial page writes wal_buffers = 128 #wal_buffers = 8 # min 4, 8KB each #commit_delay = 0 # range 0-100000, in microseconds #commit_siblings = 5 # range 1-1000 # - Checkpoints - checkpoint_segments = 256 # 256 * 16Mb = 4,294,967,296 bytes checkpoint_timeout = 1200 # 1200 seconds (20 minutes) checkpoint_warning = 30 # in seconds, 0 is off #checkpoint_segments = 3 # in logfile segments, min 1, 16MB each #checkpoint_timeout = 300 # range 30-3600, in seconds #checkpoint_warning = 30 # in seconds, 0 is off # - Archiving - #archive_command = '' # command to use to archive a logfile # segment #--------------------------------------------------------------------------- # QUERY TUNING #--------------------------------------------------------------------------- # - Planner Method Configuration - #enable_bitmapscan = on #enable_hashagg = on #enable_hashjoin = on #enable_indexscan = on #enable_mergejoin = on #enable_nestloop = on #enable_seqscan = on #enable_sort = on #enable_tidscan = on # - Planner Cost Constants - effective_cache_size = 80000 # 80000 * 8192 = 655,360,000 bytes #effective_cache_size = 1000 # typically 8KB each random_page_cost = 2.5 # units are one sequential page fetch #random_page_cost = 4 # units are one sequential page fetch # cost #cpu_tuple_cost = 0.01 # (same) #cpu_index_tuple_cost = 0.001 # (same) #cpu_operator_cost = 0.0025 # (same) # - Genetic Query Optimizer - #geqo = on #geqo_threshold = 12 #geqo_effort = 5 # range 1-10 #geqo_pool_size = 0 # selects default based on effort #geqo_generations = 0 # selects default based on effort #geqo_selection_bias = 2.0 # range 1.5-2.0 # - Other Planner Options - default_statistics_target = 100 # range 1-1000 #default_statistics_target = 10 # range 1-1000 #constraint_exclusion = off #from_collapse_limit = 8 #join_collapse_limit = 8 # 1 disables collapsing of explicit # JOINs #--------------------------------------------------------------------------- #--------------------------------------------------------------------------- # RUNTIME STATISTICS #--------------------------------------------------------------------------- # - Statistics Monitoring - #log_parser_stats = off #log_planner_stats = off #log_executor_stats = off #log_statement_stats = off # - Query/Index Statistics Collector - stats_start_collector = on stats_command_string = on stats_block_level = on stats_row_level = on #stats_start_collector = on #stats_command_string = off #stats_block_level = off #stats_row_level = off #stats_reset_on_server_start = off #--------------------------------------------------------------------------- # AUTOVACUUM PARAMETERS #--------------------------------------------------------------------------- autovacuum = true autovacuum_naptime = 600 #autovacuum = false # enable autovacuum subprocess? #autovacuum_naptime = 60 # time between autovacuum runs, in secs #autovacuum_vacuum_threshold = 1000 # min # of tuple updates before # vacuum #autovacuum_analyze_threshold = 500 # min # of tuple updates before # analyze #autovacuum_vacuum_scale_factor = 0.4 # fraction of rel size before # vacuum #autovacuum_analyze_scale_factor = 0.2 # fraction of rel size before # analyze #autovacuum_vacuum_cost_delay = -1 # default vacuum cost delay for # autovac, -1 means use # vacuum_cost_delay #autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for # autovac, -1 means use # vacuum_cost_ ---------------------- CREATE TABLE tiger.completechain ( ogc_fid int4 NOT NULL DEFAULT nextval('completechain_ogc_fid_seq'::regclass), module varchar(8) NOT NULL, tlid int4 NOT NULL, side1 int4, source varchar(1) NOT NULL, fedirp varchar(2), fename varchar(30), fetype varchar(4), fedirs varchar(2), cfcc varchar(3) NOT NULL, fraddl varchar(11), toaddl varchar(11), fraddr varchar(11), toaddr varchar(11), friaddl varchar(1), toiaddl varchar(1), friaddr varchar(1), toiaddr varchar(1), zipl int4, zipr int4, aianhhfpl int4, aianhhfpr int4, aihhtlil varchar(1), aihhtlir varchar(1), census1 varchar(1), census2 varchar(1), statel int4, stater int4, countyl int4, countyr int4, cousubl int4, cousubr int4, submcdl int4, submcdr int4, placel int4, placer int4, tractl int4, tractr int4, blockl int4, blockr int4, wkb_geometry public.geometry NOT NULL, CONSTRAINT enforce_dims_wkb_geometry CHECK (ndims(wkb_geometry) = 2), CONSTRAINT enforce_geotype_wkb_geometry CHECK (geometrytype(wkb_geometry) = 'LINESTRING'::text OR wkb_geometry IS NULL), CONSTRAINT enforce_srid_wkb_geometry CHECK (srid(wkb_geometry) = 4269) ) WITHOUT OIDS; ALTER TABLE tiger.completechain OWNER TO postgres;
Charlie Savage wrote: > Hi everyone, > > I have a question about the performance of sort. > Note it takes over 10 times longer to do the sort than the full > sequential scan. > > Should I expect results like this? I realize that the computer is quite > low-end and is very IO bound for this query, but I'm still surprised > that the sort operation takes so long. The sort will be spilling to disk, which will grind your I/O to a halt. > work_mem = 16384 # in Kb Try upping this. You should be able to issue "set work_mem = 100000" before running your query IIRC. That should let PG do its sorting in larger chunks. Also, if your most common access pattern is ordered via tlid look into clustering the table on that. -- Richard Huxton Archonet Ltd
Charlie, > Should I expect results like this? I realize that the > computer is quite low-end and is very IO bound for this > query, but I'm still surprised that the sort operation takes so long. It's the sort performance of Postgres that's your problem. > Out of curiosity, I setup an Oracle database on the same > machine with the same data and ran the same query. Oracle > was over an order of magnitude faster. Looking at its query > plan, it avoided the sort by using "HASH GROUP BY." Does > such a construct exist in PostgreSQL (I see only hash joins)? Yes, hashaggregate does a similar thing. You can force the planner to do it, don't remember off the top of my head but someone else on-list will. > Also as an experiment I forced oracle to do a sort by running > this query: > > SELECT tlid, min(ogc_fid) > FROM completechain > GROUP BY tlid > ORDER BY tlid; > > Even with this, it was more than a magnitude faster than Postgresql. > Which makes me think I have somehow misconfigured postgresql > (see the relevant parts of postgresql.conf below). Just as we find with a similar comparison (with a "popular commercial, proprietary database" :-) Though some might suggest you increase work_mem or other tuning suggestions to speed sorting, none work. In fact, we find that increasing work_mem actually slows sorting slightly. We are commissioning an improved sorting routine for bizgres (www.bizgres.org) which will be contributed to the postgres main, but won't come out at least until 8.2 comes out, possibly 12 mos. In the meantime, you will be able to use the new routine in the bizgres version of postgres, possibly in the next couple of months. Also - we (Greenplum) are about to announce the public beta of the bizgres MPP database, which will use all of your CPUs, and those of other nodes in a cluster, for sorting. We see a linear scaling of sort performance, so you could add CPUs and/or hosts and scale out of the problem. Cheers, - Luke
I have run into this type of query problem as well. I solved it in my application by the following type of query. SELECT tlid FROM completechain AS o WHERE not exists ( SELECT 1 FROM completechain WHERE tlid=o.tlid and ogc_fid!=o.ogc_fid ); Assumes of course that you have an index on tlid. > -----Original Message----- > From: pgsql-performance-owner@postgresql.org > [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of > Charlie Savage > Sent: Tuesday, November 08, 2005 2:05 AM > To: pgsql-performance@postgresql.org > Subject: [PERFORM] Sort performance on large tables > > Hi everyone, > > I have a question about the performance of sort. > > Setup: Dell Dimension 3000, Suse 10, 1GB ram, PostgreSQL 8.1 > RC 1 with PostGIS, 1 built-in 80 GB IDE drive, 1 SATA Seagate > 400GB drive. The IDE drive has the OS and the WAL files, the > SATA drive the database. > From hdparm the max IO for the IDE drive is about 50Mb/s and > the SATA drive is about 65Mb/s. Thus a very low-end machine > - but it used just for development (i.e., it is not a > production machine) and the only thing it does is run a > PostgresSQL database. > > I have a staging table called completechain that holds US > tiger data (i.e., streets and addresses for the US). The > table is approximately 18GB. Its big because there is a lot > of data, but also because the table is not normalized (it > comes that way). > > I want to extract data out of the file, with the most > important values being stored in a column called tlid. The > tlid field is an integer, and the values are 98% unique. > There is a second column called ogc_fid which is unique (it > is a serial field). I need to extract out unique TLID's > (doesn't matter which duplicate I get rid of). To do this I > am running this query: > > SELECT tlid, min(ogc_fid) > FROM completechain > GROUP BY tlid; > > The results from explain analyze are: > > "GroupAggregate (cost=10400373.80..11361807.88 rows=48071704 > width=8) (actual time=7311682.715..8315746.835 rows=47599910 loops=1)" > " -> Sort (cost=10400373.80..10520553.06 rows=48071704 > width=8) (actual time=7311682.682..7972304.777 rows=48199165 loops=1)" > " Sort Key: tlid" > " -> Seq Scan on completechain (cost=0.00..2228584.04 > rows=48071704 width=8) (actual time=27.514..773245.046 > rows=48199165 loops=1)" > "Total runtime: 8486057.185 ms" > > Doing a similar query produces the same results: > > SELECT DISTINCT ON (tlid), tlid, ogc_fid FROM completechain; > > Note it takes over 10 times longer to do the sort than the > full sequential scan. > > Should I expect results like this? I realize that the > computer is quite low-end and is very IO bound for this > query, but I'm still surprised that the sort operation takes so long. > > Out of curiosity, I setup an Oracle database on the same > machine with the same data and ran the same query. Oracle > was over an order of magnitude faster. Looking at its query > plan, it avoided the sort by using "HASH GROUP BY." Does > such a construct exist in PostgreSQL (I see only hash joins)? > > Also as an experiment I forced oracle to do a sort by running > this query: > > SELECT tlid, min(ogc_fid) > FROM completechain > GROUP BY tlid > ORDER BY tlid; > > Even with this, it was more than a magnitude faster than Postgresql. > Which makes me think I have somehow misconfigured postgresql > (see the relevant parts of postgresql.conf below). > > Any idea/help appreciated. > > Thanks, > > Charlie > > > ------------------------------- > > #------------------------------------------------------------- > -------------- > # RESOURCE USAGE (except WAL) > #------------------------------------------------------------- > -------------- > > shared_buffers = 40000 # 40000 buffers * 8192 > bytes/buffer = 327,680,000 bytes > #shared_buffers = 1000 # min 16 or > max_connections*2, 8KB each > > temp_buffers = 5000 > #temp_buffers = 1000 # min 100, 8KB each > #max_prepared_transactions = 5 # can be 0 or more > # note: increasing max_prepared_transactions costs ~600 bytes > of shared memory # per transaction slot, plus lock space (see > max_locks_per_transaction). > > work_mem = 16384 # in Kb > #work_mem = 1024 # min 64, size in KB > > maintenance_work_mem = 262144 # in kb > #maintenance_work_mem = 16384 # min 1024, size in KB > #max_stack_depth = 2048 # min 100, size in KB > > # - Free Space Map - > > max_fsm_pages = 60000 > #max_fsm_pages = 20000 # min > max_fsm_relations*16, 6 bytes each > > #max_fsm_relations = 1000 # min 100, ~70 bytes each > > # - Kernel Resource Usage - > > #max_files_per_process = 1000 # min 25 > #preload_libraries = '' > > # - Cost-Based Vacuum Delay - > > #vacuum_cost_delay = 0 # 0-1000 milliseconds > #vacuum_cost_page_hit = 1 # 0-10000 credits > #vacuum_cost_page_miss = 10 # 0-10000 credits > #vacuum_cost_page_dirty = 20 # 0-10000 credits > #vacuum_cost_limit = 200 # 0-10000 credits > > # - Background writer - > > #bgwriter_delay = 200 # 10-10000 milliseconds > between rounds > #bgwriter_lru_percent = 1.0 # 0-100% of LRU buffers > scanned/round > #bgwriter_lru_maxpages = 5 # 0-1000 buffers max > written/round > #bgwriter_all_percent = 0.333 # 0-100% of all buffers > scanned/round > #bgwriter_all_maxpages = 5 # 0-1000 buffers max > written/round > > > #------------------------------------------------------------- > -------------- > # WRITE AHEAD LOG > #------------------------------------------------------------- > -------------- > > # - Settings - > > fsync = on # turns forced > synchronization on or off > #wal_sync_method = fsync # the default is the > first option > # supported by the > operating system: > # open_datasync > # fdatasync > # fsync > # fsync_writethrough > # open_sync > #full_page_writes = on # recover from > partial page writes > > wal_buffers = 128 > #wal_buffers = 8 # min 4, 8KB each > > #commit_delay = 0 # range 0-100000, in > microseconds > #commit_siblings = 5 # range 1-1000 > > # - Checkpoints - > > checkpoint_segments = 256 # 256 * 16Mb = > 4,294,967,296 bytes > checkpoint_timeout = 1200 # 1200 seconds (20 minutes) > checkpoint_warning = 30 # in seconds, 0 is off > > #checkpoint_segments = 3 # in logfile segments, > min 1, 16MB each > #checkpoint_timeout = 300 # range 30-3600, in seconds > #checkpoint_warning = 30 # in seconds, 0 is off > > # - Archiving - > > #archive_command = '' # command to use to > archive a logfile > # segment > > > #------------------------------------------------------------- > -------------- > # QUERY TUNING > #------------------------------------------------------------- > -------------- > > # - Planner Method Configuration - > > #enable_bitmapscan = on > #enable_hashagg = on > #enable_hashjoin = on > #enable_indexscan = on > #enable_mergejoin = on > #enable_nestloop = on > #enable_seqscan = on > #enable_sort = on > #enable_tidscan = on > > # - Planner Cost Constants - > > effective_cache_size = 80000 # 80000 * 8192 = > 655,360,000 bytes > #effective_cache_size = 1000 # typically 8KB each > > random_page_cost = 2.5 # units are one > sequential page fetch > #random_page_cost = 4 # units are one > sequential page fetch > # cost > #cpu_tuple_cost = 0.01 # (same) > #cpu_index_tuple_cost = 0.001 # (same) > #cpu_operator_cost = 0.0025 # (same) > > # - Genetic Query Optimizer - > > #geqo = on > #geqo_threshold = 12 > #geqo_effort = 5 # range 1-10 > #geqo_pool_size = 0 # selects default based > on effort > #geqo_generations = 0 # selects default based > on effort > #geqo_selection_bias = 2.0 # range 1.5-2.0 > > # - Other Planner Options - > > default_statistics_target = 100 # range 1-1000 > #default_statistics_target = 10 # range 1-1000 > #constraint_exclusion = off > #from_collapse_limit = 8 > #join_collapse_limit = 8 # 1 disables collapsing > of explicit > # JOINs > > > #------------------------------------------------------------- > -------------- > #------------------------------------------------------------- > -------------- > # RUNTIME STATISTICS > #------------------------------------------------------------- > -------------- > > # - Statistics Monitoring - > > #log_parser_stats = off > #log_planner_stats = off > #log_executor_stats = off > #log_statement_stats = off > > # - Query/Index Statistics Collector - > > stats_start_collector = on > stats_command_string = on > stats_block_level = on > stats_row_level = on > > #stats_start_collector = on > #stats_command_string = off > #stats_block_level = off > #stats_row_level = off > #stats_reset_on_server_start = off > > > #------------------------------------------------------------- > -------------- > # AUTOVACUUM PARAMETERS > #------------------------------------------------------------- > -------------- > > autovacuum = true > autovacuum_naptime = 600 > > #autovacuum = false # enable autovacuum subprocess? > #autovacuum_naptime = 60 # time between > autovacuum runs, in secs > #autovacuum_vacuum_threshold = 1000 # min # of tuple updates before > # vacuum > #autovacuum_analyze_threshold = 500 # min # of tuple updates before > # analyze > #autovacuum_vacuum_scale_factor = 0.4 # fraction of rel size before > # vacuum > #autovacuum_analyze_scale_factor = 0.2 # fraction of > rel size before > # analyze > #autovacuum_vacuum_cost_delay = -1 # default vacuum cost delay for > # autovac, -1 means use > # vacuum_cost_delay > #autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for > # autovac, -1 means use > # vacuum_cost_ > > > ---------------------- > > CREATE TABLE tiger.completechain > ( > ogc_fid int4 NOT NULL DEFAULT > nextval('completechain_ogc_fid_seq'::regclass), > module varchar(8) NOT NULL, > tlid int4 NOT NULL, > side1 int4, > source varchar(1) NOT NULL, > fedirp varchar(2), > fename varchar(30), > fetype varchar(4), > fedirs varchar(2), > cfcc varchar(3) NOT NULL, > fraddl varchar(11), > toaddl varchar(11), > fraddr varchar(11), > toaddr varchar(11), > friaddl varchar(1), > toiaddl varchar(1), > friaddr varchar(1), > toiaddr varchar(1), > zipl int4, > zipr int4, > aianhhfpl int4, > aianhhfpr int4, > aihhtlil varchar(1), > aihhtlir varchar(1), > census1 varchar(1), > census2 varchar(1), > statel int4, > stater int4, > countyl int4, > countyr int4, > cousubl int4, > cousubr int4, > submcdl int4, > submcdr int4, > placel int4, > placer int4, > tractl int4, > tractr int4, > blockl int4, > blockr int4, > wkb_geometry public.geometry NOT NULL, > CONSTRAINT enforce_dims_wkb_geometry CHECK > (ndims(wkb_geometry) = 2), > CONSTRAINT enforce_geotype_wkb_geometry CHECK > (geometrytype(wkb_geometry) = 'LINESTRING'::text OR > wkb_geometry IS NULL), > CONSTRAINT enforce_srid_wkb_geometry CHECK > (srid(wkb_geometry) = 4269) > ) > WITHOUT OIDS; > ALTER TABLE tiger.completechain OWNER TO postgres; > > > > > > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > >
On Tue, 8 Nov 2005, Luke Lonergan wrote: > > SELECT tlid, min(ogc_fid) > > FROM completechain > > GROUP BY tlid > > ORDER BY tlid; > > > > Even with this, it was more than a magnitude faster than Postgresql. > > Which makes me think I have somehow misconfigured postgresql > > (see the relevant parts of postgresql.conf below). > > Just as we find with a similar comparison (with a "popular commercial, > proprietary database" :-) Though some might suggest you increase > work_mem or other tuning suggestions to speed sorting, none work. In > fact, we find that increasing work_mem actually slows sorting slightly. I wish you'd qualify your statements, because I can demonstrably show that I can make sorts go faster on my machine at least by increasing work_mem under some conditions.
Stephan,
On 11/8/05 9:38 AM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote:
Recognize also that we’re looking for a factor of 10 or more improvement here – this is not a small increase that’s needed.
- Luke
On 11/8/05 9:38 AM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote:
>Cool – can you provide your test case please? I’ll ask our folks to do the same, but as I recall we did some pretty thorough testing and found that it doesn’t help. Moreover, the conclusion was that the current algorithm isn’t designed to use memory effectively.
> Just as we find with a similar comparison (with a "popular commercial,
> proprietary database" :-) Though some might suggest you increase
> work_mem or other tuning suggestions to speed sorting, none work. In
> fact, we find that increasing work_mem actually slows sorting slightly.
I wish you'd qualify your statements, because I can demonstrably show that
I can make sorts go faster on my machine at least by increasing work_mem
under some conditions.
Recognize also that we’re looking for a factor of 10 or more improvement here – this is not a small increase that’s needed.
- Luke
On Tue, 8 Nov 2005, Luke Lonergan wrote: > Stephan, > > On 11/8/05 9:38 AM, "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote: > > >> > > >> > Just as we find with a similar comparison (with a "popular commercial, > >> > proprietary database" :-) Though some might suggest you increase > >> > work_mem or other tuning suggestions to speed sorting, none work. In > >> > fact, we find that increasing work_mem actually slows sorting slightly. > > > > I wish you'd qualify your statements, because I can demonstrably show that > > I can make sorts go faster on my machine at least by increasing work_mem > > under some conditions. > > > Cool � can you provide your test case please? I probably should have added the wink smiley to make it obvious I was talking about the simplest case, things that don't fit in work_mem at the current level but for which it's easy to raise work_mem to cover. It's not a big a gain as one might hope, but it does certainly drop again. > Recognize also that we�re looking for a factor of 10 or more improvement > here � this is not a small increase that�s needed. I agree that we definately need help on that regard. I do see the effect where raising work_mem lowers the performance up until that point. I just think that it requires more care in the discussion than disregarding the suggestions entirely especially since people are going to see this in the archives.
Thanks everyone for the feedback. I tried increasing work_mem: set work_mem to 300000; select tlid, min(ogc_fid) from completechain group by tld; The results are: "GroupAggregate (cost=9041602.80..10003036.88 rows=48071704 width=8) (actual time=4371749.523..5106162.256 rows=47599910 loops=1)" " -> Sort (cost=9041602.80..9161782.06 rows=48071704 width=8) (actual time=4371690.894..4758660.433 rows=48199165 loops=1)" " Sort Key: tlid" " -> Seq Scan on completechain (cost=0.00..2228584.04 rows=48071704 width=8) (actual time=49.518..805234.970 rows=48199165 loops=1)" "Total runtime: 5279988.127 ms" Thus the time decreased from 8486 seconds to 5279 seconds - which is a nice improvement. However, that still leaves postgresql about 9 times slower. I tried increasing work_mem up to 500000, but at that point the machine started using its swap partition and performance degraded back to the original values. Charlie Richard Huxton wrote: > Charlie Savage wrote: >> Hi everyone, >> >> I have a question about the performance of sort. > >> Note it takes over 10 times longer to do the sort than the full >> sequential scan. >> >> Should I expect results like this? I realize that the computer is >> quite low-end and is very IO bound for this query, but I'm still >> surprised that the sort operation takes so long. > > The sort will be spilling to disk, which will grind your I/O to a halt. > >> work_mem = 16384 # in Kb > > Try upping this. You should be able to issue "set work_mem = 100000" > before running your query IIRC. That should let PG do its sorting in > larger chunks. > > Also, if your most common access pattern is ordered via tlid look into > clustering the table on that. Richard Huxton wrote: > Charlie Savage wrote: >> Hi everyone, >> >> I have a question about the performance of sort. > >> Note it takes over 10 times longer to do the sort than the full >> sequential scan. >> >> Should I expect results like this? I realize that the computer is >> quite low-end and is very IO bound for this query, but I'm still >> surprised that the sort operation takes so long. > > The sort will be spilling to disk, which will grind your I/O to a halt. > >> work_mem = 16384 # in Kb > > Try upping this. You should be able to issue "set work_mem = 100000" > before running your query IIRC. That should let PG do its sorting in > larger chunks. > > Also, if your most common access pattern is ordered via tlid look into > clustering the table on that.
Charlie Savage <cfis@interserv.com> writes: > Thus the time decreased from 8486 seconds to 5279 seconds - which is a > nice improvement. However, that still leaves postgresql about 9 times > slower. BTW, what data type are you sorting, exactly? If it's a string type, what is your LC_COLLATE setting? regards, tom lane
Its an int4. Charlie Tom Lane wrote: > Charlie Savage <cfis@interserv.com> writes: >> Thus the time decreased from 8486 seconds to 5279 seconds - which is a >> nice improvement. However, that still leaves postgresql about 9 times >> slower. > > BTW, what data type are you sorting, exactly? If it's a string type, > what is your LC_COLLATE setting? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend >
I'd set up a trigger to maintain summary tables perhaps... Chris Charlie Savage wrote: > Thanks everyone for the feedback. > > I tried increasing work_mem: > > set work_mem to 300000; > > select tlid, min(ogc_fid) > from completechain > group by tld; > > The results are: > > "GroupAggregate (cost=9041602.80..10003036.88 rows=48071704 width=8) > (actual time=4371749.523..5106162.256 rows=47599910 loops=1)" > " -> Sort (cost=9041602.80..9161782.06 rows=48071704 width=8) (actual > time=4371690.894..4758660.433 rows=48199165 loops=1)" > " Sort Key: tlid" > " -> Seq Scan on completechain (cost=0.00..2228584.04 > rows=48071704 width=8) (actual time=49.518..805234.970 rows=48199165 > loops=1)" > "Total runtime: 5279988.127 ms" > > Thus the time decreased from 8486 seconds to 5279 seconds - which is a > nice improvement. However, that still leaves postgresql about 9 times > slower. > > I tried increasing work_mem up to 500000, but at that point the machine > started using its swap partition and performance degraded back to the > original values. > > Charlie > > > Richard Huxton wrote: > > Charlie Savage wrote: > >> Hi everyone, > >> > >> I have a question about the performance of sort. > > > >> Note it takes over 10 times longer to do the sort than the full > >> sequential scan. > >> > >> Should I expect results like this? I realize that the computer is > >> quite low-end and is very IO bound for this query, but I'm still > >> surprised that the sort operation takes so long. > > > > The sort will be spilling to disk, which will grind your I/O to a halt. > > > >> work_mem = 16384 # in Kb > > > > Try upping this. You should be able to issue "set work_mem = 100000" > > before running your query IIRC. That should let PG do its sorting in > > larger chunks. > > > > Also, if your most common access pattern is ordered via tlid look into > > clustering the table on that. > > > > Richard Huxton wrote: > >> Charlie Savage wrote: >> >>> Hi everyone, >>> >>> I have a question about the performance of sort. >> >> >>> Note it takes over 10 times longer to do the sort than the full >>> sequential scan. >>> >>> Should I expect results like this? I realize that the computer is >>> quite low-end and is very IO bound for this query, but I'm still >>> surprised that the sort operation takes so long. >> >> >> The sort will be spilling to disk, which will grind your I/O to a halt. >> >>> work_mem = 16384 # in Kb >> >> >> Try upping this. You should be able to issue "set work_mem = 100000" >> before running your query IIRC. That should let PG do its sorting in >> larger chunks. >> >> Also, if your most common access pattern is ordered via tlid look into >> clustering the table on that. > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org
Very interesting technique. It doesn't actually do quite what I want since it returns all rows that do not have duplicates and not a complete list of unique tlid values. But I could massage it to do what I want. Anyway, the timing: "Seq Scan on completechain t1 (cost=0.00..218139733.60 rows=24099582 width=4) (actual time=25.890..3404650.452 rows=47000655 loops=1)" " Filter: (NOT (subplan))" " SubPlan" " -> Index Scan using idx_completechain_tlid on completechain t2 (cost=0.00..4.48 rows=1 width=0) (actual time=0.059..0.059 rows=0 loops=48199165)" " Index Cond: ($0 = tlid)" " Filter: ($1 <> ogc_fid)" "Total runtime: 3551423.162 ms" Marc Morin wrote: So a 60% reduction in time. Thanks again for the tip. Charlie > I have run into this type of query problem as well. I solved it in my > application by the following type of query. > > > Assumes of course that you have an index on tlid. > >> -----Original Message----- >> From: pgsql-performance-owner@postgresql.org >> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of >> Charlie Savage >> Sent: Tuesday, November 08, 2005 2:05 AM >> To: pgsql-performance@postgresql.org >> Subject: [PERFORM] Sort performance on large tables >> >> Hi everyone, >> >> I have a question about the performance of sort. >> >> Setup: Dell Dimension 3000, Suse 10, 1GB ram, PostgreSQL 8.1 >> RC 1 with PostGIS, 1 built-in 80 GB IDE drive, 1 SATA Seagate >> 400GB drive. The IDE drive has the OS and the WAL files, the >> SATA drive the database. >> From hdparm the max IO for the IDE drive is about 50Mb/s and >> the SATA drive is about 65Mb/s. Thus a very low-end machine >> - but it used just for development (i.e., it is not a >> production machine) and the only thing it does is run a >> PostgresSQL database. >> >> I have a staging table called completechain that holds US >> tiger data (i.e., streets and addresses for the US). The >> table is approximately 18GB. Its big because there is a lot >> of data, but also because the table is not normalized (it >> comes that way). >> >> I want to extract data out of the file, with the most >> important values being stored in a column called tlid. The >> tlid field is an integer, and the values are 98% unique. >> There is a second column called ogc_fid which is unique (it >> is a serial field). I need to extract out unique TLID's >> (doesn't matter which duplicate I get rid of). To do this I >> am running this query: >> >> SELECT tlid, min(ogc_fid) >> FROM completechain >> GROUP BY tlid; >> >> The results from explain analyze are: >> >> "GroupAggregate (cost=10400373.80..11361807.88 rows=48071704 >> width=8) (actual time=7311682.715..8315746.835 rows=47599910 loops=1)" >> " -> Sort (cost=10400373.80..10520553.06 rows=48071704 >> width=8) (actual time=7311682.682..7972304.777 rows=48199165 loops=1)" >> " Sort Key: tlid" >> " -> Seq Scan on completechain (cost=0.00..2228584.04 >> rows=48071704 width=8) (actual time=27.514..773245.046 >> rows=48199165 loops=1)" >> "Total runtime: 8486057.185 ms" >> >> Doing a similar query produces the same results: >> >> SELECT DISTINCT ON (tlid), tlid, ogc_fid FROM completechain; >> >> Note it takes over 10 times longer to do the sort than the >> full sequential scan. >> >> Should I expect results like this? I realize that the >> computer is quite low-end and is very IO bound for this >> query, but I'm still surprised that the sort operation takes so long. >> >> Out of curiosity, I setup an Oracle database on the same >> machine with the same data and ran the same query. Oracle >> was over an order of magnitude faster. Looking at its query >> plan, it avoided the sort by using "HASH GROUP BY." Does >> such a construct exist in PostgreSQL (I see only hash joins)? >> >> Also as an experiment I forced oracle to do a sort by running >> this query: >> >> SELECT tlid, min(ogc_fid) >> FROM completechain >> GROUP BY tlid >> ORDER BY tlid; >> >> Even with this, it was more than a magnitude faster than Postgresql. >> Which makes me think I have somehow misconfigured postgresql >> (see the relevant parts of postgresql.conf below). >> >> Any idea/help appreciated. >> >> Thanks, >> >> Charlie >> >> >> ------------------------------- >> >> #------------------------------------------------------------- >> -------------- >> # RESOURCE USAGE (except WAL) >> #------------------------------------------------------------- >> -------------- >> >> shared_buffers = 40000 # 40000 buffers * 8192 >> bytes/buffer = 327,680,000 bytes >> #shared_buffers = 1000 # min 16 or >> max_connections*2, 8KB each >> >> temp_buffers = 5000 >> #temp_buffers = 1000 # min 100, 8KB each >> #max_prepared_transactions = 5 # can be 0 or more >> # note: increasing max_prepared_transactions costs ~600 bytes >> of shared memory # per transaction slot, plus lock space (see >> max_locks_per_transaction). >> >> work_mem = 16384 # in Kb >> #work_mem = 1024 # min 64, size in KB >> >> maintenance_work_mem = 262144 # in kb >> #maintenance_work_mem = 16384 # min 1024, size in KB >> #max_stack_depth = 2048 # min 100, size in KB >> >> # - Free Space Map - >> >> max_fsm_pages = 60000 >> #max_fsm_pages = 20000 # min >> max_fsm_relations*16, 6 bytes each >> >> #max_fsm_relations = 1000 # min 100, ~70 bytes each >> >> # - Kernel Resource Usage - >> >> #max_files_per_process = 1000 # min 25 >> #preload_libraries = '' >> >> # - Cost-Based Vacuum Delay - >> >> #vacuum_cost_delay = 0 # 0-1000 milliseconds >> #vacuum_cost_page_hit = 1 # 0-10000 credits >> #vacuum_cost_page_miss = 10 # 0-10000 credits >> #vacuum_cost_page_dirty = 20 # 0-10000 credits >> #vacuum_cost_limit = 200 # 0-10000 credits >> >> # - Background writer - >> >> #bgwriter_delay = 200 # 10-10000 milliseconds >> between rounds >> #bgwriter_lru_percent = 1.0 # 0-100% of LRU buffers >> scanned/round >> #bgwriter_lru_maxpages = 5 # 0-1000 buffers max >> written/round >> #bgwriter_all_percent = 0.333 # 0-100% of all buffers >> scanned/round >> #bgwriter_all_maxpages = 5 # 0-1000 buffers max >> written/round >> >> >> #------------------------------------------------------------- >> -------------- >> # WRITE AHEAD LOG >> #------------------------------------------------------------- >> -------------- >> >> # - Settings - >> >> fsync = on # turns forced >> synchronization on or off >> #wal_sync_method = fsync # the default is the >> first option >> # supported by the >> operating system: >> # open_datasync >> # fdatasync >> # fsync >> # fsync_writethrough >> # open_sync >> #full_page_writes = on # recover from >> partial page writes >> >> wal_buffers = 128 >> #wal_buffers = 8 # min 4, 8KB each >> >> #commit_delay = 0 # range 0-100000, in >> microseconds >> #commit_siblings = 5 # range 1-1000 >> >> # - Checkpoints - >> >> checkpoint_segments = 256 # 256 * 16Mb = >> 4,294,967,296 bytes >> checkpoint_timeout = 1200 # 1200 seconds (20 minutes) >> checkpoint_warning = 30 # in seconds, 0 is off >> >> #checkpoint_segments = 3 # in logfile segments, >> min 1, 16MB each >> #checkpoint_timeout = 300 # range 30-3600, in seconds >> #checkpoint_warning = 30 # in seconds, 0 is off >> >> # - Archiving - >> >> #archive_command = '' # command to use to >> archive a logfile >> # segment >> >> >> #------------------------------------------------------------- >> -------------- >> # QUERY TUNING >> #------------------------------------------------------------- >> -------------- >> >> # - Planner Method Configuration - >> >> #enable_bitmapscan = on >> #enable_hashagg = on >> #enable_hashjoin = on >> #enable_indexscan = on >> #enable_mergejoin = on >> #enable_nestloop = on >> #enable_seqscan = on >> #enable_sort = on >> #enable_tidscan = on >> >> # - Planner Cost Constants - >> >> effective_cache_size = 80000 # 80000 * 8192 = >> 655,360,000 bytes >> #effective_cache_size = 1000 # typically 8KB each >> >> random_page_cost = 2.5 # units are one >> sequential page fetch >> #random_page_cost = 4 # units are one >> sequential page fetch >> # cost >> #cpu_tuple_cost = 0.01 # (same) >> #cpu_index_tuple_cost = 0.001 # (same) >> #cpu_operator_cost = 0.0025 # (same) >> >> # - Genetic Query Optimizer - >> >> #geqo = on >> #geqo_threshold = 12 >> #geqo_effort = 5 # range 1-10 >> #geqo_pool_size = 0 # selects default based >> on effort >> #geqo_generations = 0 # selects default based >> on effort >> #geqo_selection_bias = 2.0 # range 1.5-2.0 >> >> # - Other Planner Options - >> >> default_statistics_target = 100 # range 1-1000 >> #default_statistics_target = 10 # range 1-1000 >> #constraint_exclusion = off >> #from_collapse_limit = 8 >> #join_collapse_limit = 8 # 1 disables collapsing >> of explicit >> # JOINs >> >> >> #------------------------------------------------------------- >> -------------- >> #------------------------------------------------------------- >> -------------- >> # RUNTIME STATISTICS >> #------------------------------------------------------------- >> -------------- >> >> # - Statistics Monitoring - >> >> #log_parser_stats = off >> #log_planner_stats = off >> #log_executor_stats = off >> #log_statement_stats = off >> >> # - Query/Index Statistics Collector - >> >> stats_start_collector = on >> stats_command_string = on >> stats_block_level = on >> stats_row_level = on >> >> #stats_start_collector = on >> #stats_command_string = off >> #stats_block_level = off >> #stats_row_level = off >> #stats_reset_on_server_start = off >> >> >> #------------------------------------------------------------- >> -------------- >> # AUTOVACUUM PARAMETERS >> #------------------------------------------------------------- >> -------------- >> >> autovacuum = true >> autovacuum_naptime = 600 >> >> #autovacuum = false # enable autovacuum subprocess? >> #autovacuum_naptime = 60 # time between >> autovacuum runs, in secs >> #autovacuum_vacuum_threshold = 1000 # min # of tuple updates before >> # vacuum >> #autovacuum_analyze_threshold = 500 # min # of tuple updates before >> # analyze >> #autovacuum_vacuum_scale_factor = 0.4 # fraction of rel size before >> # vacuum >> #autovacuum_analyze_scale_factor = 0.2 # fraction of >> rel size before >> # analyze >> #autovacuum_vacuum_cost_delay = -1 # default vacuum cost delay for >> # autovac, -1 means use >> # vacuum_cost_delay >> #autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for >> # autovac, -1 means use >> # vacuum_cost_ >> >> >> ---------------------- >> >> CREATE TABLE tiger.completechain >> ( >> ogc_fid int4 NOT NULL DEFAULT >> nextval('completechain_ogc_fid_seq'::regclass), >> module varchar(8) NOT NULL, >> tlid int4 NOT NULL, >> side1 int4, >> source varchar(1) NOT NULL, >> fedirp varchar(2), >> fename varchar(30), >> fetype varchar(4), >> fedirs varchar(2), >> cfcc varchar(3) NOT NULL, >> fraddl varchar(11), >> toaddl varchar(11), >> fraddr varchar(11), >> toaddr varchar(11), >> friaddl varchar(1), >> toiaddl varchar(1), >> friaddr varchar(1), >> toiaddr varchar(1), >> zipl int4, >> zipr int4, >> aianhhfpl int4, >> aianhhfpr int4, >> aihhtlil varchar(1), >> aihhtlir varchar(1), >> census1 varchar(1), >> census2 varchar(1), >> statel int4, >> stater int4, >> countyl int4, >> countyr int4, >> cousubl int4, >> cousubr int4, >> submcdl int4, >> submcdr int4, >> placel int4, >> placer int4, >> tractl int4, >> tractr int4, >> blockl int4, >> blockr int4, >> wkb_geometry public.geometry NOT NULL, >> CONSTRAINT enforce_dims_wkb_geometry CHECK >> (ndims(wkb_geometry) = 2), >> CONSTRAINT enforce_geotype_wkb_geometry CHECK >> (geometrytype(wkb_geometry) = 'LINESTRING'::text OR >> wkb_geometry IS NULL), >> CONSTRAINT enforce_srid_wkb_geometry CHECK >> (srid(wkb_geometry) = 4269) >> ) >> WITHOUT OIDS; >> ALTER TABLE tiger.completechain OWNER TO postgres; >> >> >> >> >> >> >> ---------------------------(end of >> broadcast)--------------------------- >> TIP 5: don't forget to increase your free space map settings >> >> > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
On Tue, 2005-11-08 at 00:05 -0700, Charlie Savage wrote: > Setup: Dell Dimension 3000, Suse 10, 1GB ram, PostgreSQL 8.1 RC 1 with > I want to extract data out of the file, with the most important values > being stored in a column called tlid. The tlid field is an integer, and > the values are 98% unique. There is a second column called ogc_fid > which is unique (it is a serial field). I need to extract out unique > TLID's (doesn't matter which duplicate I get rid of). To do this I am > running this query: > > SELECT tlid, min(ogc_fid) > FROM completechain > GROUP BY tlid; > > The results from explain analyze are: > > "GroupAggregate (cost=10400373.80..11361807.88 rows=48071704 width=8) > (actual time=7311682.715..8315746.835 rows=47599910 loops=1)" > " -> Sort (cost=10400373.80..10520553.06 rows=48071704 width=8) > (actual time=7311682.682..7972304.777 rows=48199165 loops=1)" > " Sort Key: tlid" > " -> Seq Scan on completechain (cost=0.00..2228584.04 > rows=48071704 width=8) (actual time=27.514..773245.046 rows=48199165 > loops=1)" > "Total runtime: 8486057.185 ms" > Should I expect results like this? I realize that the computer is quite > low-end and is very IO bound for this query, but I'm still surprised > that the sort operation takes so long. > > Out of curiosity, I setup an Oracle database on the same machine with > the same data and ran the same query. Oracle was over an order of > magnitude faster. Looking at its query plan, it avoided the sort by > using "HASH GROUP BY." Does such a construct exist in PostgreSQL (I see > only hash joins)? PostgreSQL can do HashAggregates as well as GroupAggregates, just like Oracle. HashAggs avoid the sort phase, so would improve performance considerably. The difference in performance you are getting is because of the different plan used. Did you specifically do anything to Oracle to help it get that plan, or was it a pure out-of-the-box install (or maybe even a "set this up for Data Warehousing" install)? To get a HashAgg plan, you need to be able to fit all of the unique values in memory. That would be 98% of 48071704 rows, each 8+ bytes wide, giving a HashAgg memory sizing of over 375MB. You must allocate memory of the next power of two above the level you want, so we would need to allocate 512MB to work_mem before it would consider using a HashAgg. Can you let us know how high you have to set work_mem before an EXPLAIN (not EXPLAIN ANALYZE) chooses the HashAgg plan? Please be aware that publishing Oracle performance results is against the terms of their licence and we seek to be both fair and legitimate, especially within this public discussion forum. Best Regards, Simon Riggs
Hi Simon, Thanks for the response Simon. > PostgreSQL can do HashAggregates as well as GroupAggregates, just like > Oracle. HashAggs avoid the sort phase, so would improve performance > considerably. The difference in performance you are getting is because > of the different plan used. Did you specifically do anything to Oracle > to help it get that plan, or was it a pure out-of-the-box install (or > maybe even a "set this up for Data Warehousing" install)? It was an out-of-the-box plan with the standard database install option (wasn't a Data Warehousing install). > Can you let us know how high you have to set work_mem before an EXPLAIN > (not EXPLAIN ANALYZE) chooses the HashAgg plan? The planner picked a HashAggregate only when I set work_mem to 2097151 - which I gather is the maximum allowed value according to a message returned from the server. > Please be aware that publishing Oracle performance results is against > the terms of their licence and we seek to be both fair and legitimate, > especially within this public discussion forum. Sorry, I didn't realize - I'll be more vague next time. Charlie
...and on those notes, let me repeat my often stated advice that a DB server should be configured with as much RAM as isfeasible. 4GB or more strongly recommended. I'll add that the HW you are using for a DB server should be able to hold _at least_ 4GB of RAM (note that modern _laptops_can hold 2GB. Next year's are likely to be able to hold 4GB.). I can't casually find specs on the D3000, but ifit can't be upgraded to at least 4GB, you should be looking for new DB server HW. At this writing, 4 1GB DIMMs (4GB) should set you back ~$300 or less. 4 2GB DIMMs (8GB) should cost ~$600. As of now, very few mainboards support 4GB DIMMs and I doubt the D3000 has such a mainboard. If you can use them, 4 4GBDIMMs (16GB) will currently set you back ~$1600-$2400. Whatever the way you do it, it's well worth the money to have at least 4GB of RAM in a DB server. It makes all kinds ofproblems just not exist. Ron -----Original Message----- From: Simon Riggs <simon@2ndquadrant.com> Sent: Nov 9, 2005 4:35 AM To: Charlie Savage <cfis@interserv.com>, Luke Lonergan <llonergan@greenplum.com> Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Sort performance on large tables On Tue, 2005-11-08 at 00:05 -0700, Charlie Savage wrote: > Setup: Dell Dimension 3000, Suse 10, 1GB ram, PostgreSQL 8.1 RC 1 with > I want to extract data out of the file, with the most important values > being stored in a column called tlid. The tlid field is an integer, and > the values are 98% unique. There is a second column called ogc_fid > which is unique (it is a serial field). I need to extract out unique > TLID's (doesn't matter which duplicate I get rid of). To do this I am > running this query: > > SELECT tlid, min(ogc_fid) > FROM completechain > GROUP BY tlid; > > The results from explain analyze are: > > "GroupAggregate (cost=10400373.80..11361807.88 rows=48071704 width=8) > (actual time=7311682.715..8315746.835 rows=47599910 loops=1)" > " -> Sort (cost=10400373.80..10520553.06 rows=48071704 width=8) > (actual time=7311682.682..7972304.777 rows=48199165 loops=1)" > " Sort Key: tlid" > " -> Seq Scan on completechain (cost=0.00..2228584.04 > rows=48071704 width=8) (actual time=27.514..773245.046 rows=48199165 > loops=1)" > "Total runtime: 8486057.185 ms" > Should I expect results like this? I realize that the computer is quite > low-end and is very IO bound for this query, but I'm still surprised > that the sort operation takes so long. > > Out of curiosity, I setup an Oracle database on the same machine with > the same data and ran the same query. Oracle was over an order of > magnitude faster. Looking at its query plan, it avoided the sort by > using "HASH GROUP BY." Does such a construct exist in PostgreSQL (I see > only hash joins)? PostgreSQL can do HashAggregates as well as GroupAggregates, just like Oracle. HashAggs avoid the sort phase, so would improve performance considerably. The difference in performance you are getting is because of the different plan used. Did you specifically do anything to Oracle to help it get that plan, or was it a pure out-of-the-box install (or maybe even a "set this up for Data Warehousing" install)? To get a HashAgg plan, you need to be able to fit all of the unique values in memory. That would be 98% of 48071704 rows, each 8+ bytes wide, giving a HashAgg memory sizing of over 375MB. You must allocate memory of the next power of two above the level you want, so we would need to allocate 512MB to work_mem before it would consider using a HashAgg. Can you let us know how high you have to set work_mem before an EXPLAIN (not EXPLAIN ANALYZE) chooses the HashAgg plan? Please be aware that publishing Oracle performance results is against the terms of their licence and we seek to be both fair and legitimate, especially within this public discussion forum. Best Regards, Simon Riggs ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly
My original post did not take into account VAT, I apologize for that oversight. However, unless you are naive, or made of gold, or have some sort of "special" relationship that requires you to, _NE VER_buy RAM from your computer HW OEM. For at least two decades it's been a provable fact that OEMs like DEC, Sun, HP, Compaq,Dell, etc, etc charge far more per GB for the RAM they sell. Same goes for HDs. Buy your memory and HDs direct fromreputable manufacturers, you'll get at least the same quality and pay considerably less. Your Dell example is evidence that supports my point. As of this writing, decent RAM should cost $75-$150 pr GB (not includingVAT ;-) ). Don't let yourself be conned into paying more. I'm talking about decent RAM from reputable direct suppliers like Corsair and Kingston (_not_ their Value RAM, the actualKingston branded stuff), OCZ, etc. Such companies sell via multiple channels, including repuatble websites like dealtime.com,pricewatch.com, newegg.com, etc, etc. You are quite correct that there's poor quality junk out there. I was not talking about it, only reasonable quality components. Ron -----Original Message----- From: Kurt De Grave <Kurt.DeGrave@student.kuleuven.be> Sent: Nov 10, 2005 5:40 AM To: Ron Peacetree <rjpeace@earthlink.net> Cc: Charlie Savage <cfis@interserv.com>, pgsql-performance@postgresql.org Subject: Re: [PERFORM] Sort performance on large tables On Wed, 9 Nov 2005, Ron Peacetree wrote: > At this writing, 4 1GB DIMMs (4GB) should set you back ~$300 or less. > 4 2GB DIMMs (8GB) should cost ~$600. As of now, very few mainboards > support 4GB DIMMs and I doubt the D3000 has such a mainboard. If you > can use them, 4 4GB DIMMs (16GB) will currently set you back > ~$1600-$2400. Sorry, but every time again I see unrealistic memory prices quoted when the buy-more-memory argument passes by. What kind of memory are you buying for your servers? Non-ECC no-name memory that doesn't even pass a one-hour memtest86 for 20% of the items you buy? Just checked at Dell's web page: adding 4 1GB DIMMs to a PowerEdge 2850 sets you back _1280 EURO_ excluding VAT. And that's after they already charged you 140 euro for replacing the obsolete standard 4 512MB DIMMs with the same capacity in 1GB DIMMs. So the 4GB upgrade actually costs 1420 euro plus VAT, which is quite a bit more than $300. Okay, few people will happily buy at those prices. You can get the exact same goods much cheaper elsewhere, but it'll still cost you way more than the number you gave, plus you'll have to drive to the server's location, open up the box yourself, and risk incompatibilities and support problems if there's ever something wrong with that memory. Disclaimers: I know that you're talking about a desktop in this particular case. I wouldn't see a need for ECC in a development box either. I know a Dell hasn't been the smartest choice for a database box lately (but politics...). kurt.
We use this memory in all our servers (well - the 512 sticks). 0 problems to date: http://www.newegg.com/Product/Product.asp?Item=N82E16820145513 $163 for 1GB. This stuff is probably better than the Samsung RAM dell is selling you for 3 times the price. Alex On 11/10/05, Ron Peacetree <rjpeace@earthlink.net> wrote: > My original post did not take into account VAT, I apologize for that oversight. > > However, unless you are naive, or made of gold, or have some sort of "special" relationship that requires you to, _NE VER_buy RAM from your computer HW OEM. For at least two decades it's been a provable fact that OEMs like DEC, Sun, HP, Compaq,Dell, etc, etc charge far more per GB for the RAM they sell. Same goes for HDs. Buy your memory and HDs direct fromreputable manufacturers, you'll get at least the same quality and pay considerably less. > > Your Dell example is evidence that supports my point. As of this writing, decent RAM should cost $75-$150 pr GB (not includingVAT ;-) ). Don't let yourself be conned into paying more. > > I'm talking about decent RAM from reputable direct suppliers like Corsair and Kingston (_not_ their Value RAM, the actualKingston branded stuff), OCZ, etc. Such companies sell via multiple channels, including repuatble websites like dealtime.com,pricewatch.com, newegg.com, etc, etc. > > You are quite correct that there's poor quality junk out there. I was not talking about it, only reasonable quality components. > > Ron > > > -----Original Message----- > From: Kurt De Grave <Kurt.DeGrave@student.kuleuven.be> > Sent: Nov 10, 2005 5:40 AM > To: Ron Peacetree <rjpeace@earthlink.net> > Cc: Charlie Savage <cfis@interserv.com>, pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Sort performance on large tables > > > > On Wed, 9 Nov 2005, Ron Peacetree wrote: > > > At this writing, 4 1GB DIMMs (4GB) should set you back ~$300 or less. > > 4 2GB DIMMs (8GB) should cost ~$600. As of now, very few mainboards > > support 4GB DIMMs and I doubt the D3000 has such a mainboard. If you > > can use them, 4 4GB DIMMs (16GB) will currently set you back > > ~$1600-$2400. > > Sorry, but every time again I see unrealistic memory prices quoted when > the buy-more-memory argument passes by. > What kind of memory are you buying for your servers? Non-ECC no-name > memory that doesn't even pass a one-hour memtest86 for 20% of the items > you buy? > > Just checked at Dell's web page: adding 4 1GB DIMMs to a PowerEdge 2850 > sets you back _1280 EURO_ excluding VAT. And that's after they already > charged you 140 euro for replacing the obsolete standard 4 512MB DIMMs > with the same capacity in 1GB DIMMs. So the 4GB upgrade actually costs > 1420 euro plus VAT, which is quite a bit more than $300. > > Okay, few people will happily buy at those prices. You can get the > exact same goods much cheaper elsewhere, but it'll still cost you way > more than the number you gave, plus you'll have to drive to the server's > location, open up the box yourself, and risk incompatibilities and > support problems if there's ever something wrong with that memory. > > Disclaimers: > I know that you're talking about a desktop in this particular case. > I wouldn't see a need for ECC in a development box either. > I know a Dell hasn't been the smartest choice for a database box lately > (but politics...). > > kurt. > > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >