Re: Sort performance on large tables - Mailing list pgsql-performance
From: Charlie Savage
Subject: Re: Sort performance on large tables
Date:
Msg-id: 43718DF7.6060609@interserv.com
In response to: Re: Sort performance on large tables ("Marc Morin" <marc@sandvine.com>)
List: pgsql-performance
Very interesting technique.  It doesn't do quite what I want, since it
returns all rows that have no duplicates rather than a complete list of
unique tlid values, but I could massage it to do what I want.  Anyway,
the timing:

"Seq Scan on completechain t1  (cost=0.00..218139733.60 rows=24099582
width=4) (actual time=25.890..3404650.452 rows=47000655 loops=1)"
"  Filter: (NOT (subplan))"
"  SubPlan"
"    ->  Index Scan using idx_completechain_tlid on completechain t2
(cost=0.00..4.48 rows=1 width=0) (actual time=0.059..0.059 rows=0
loops=48199165)"
"          Index Cond: ($0 = tlid)"
"          Filter: ($1 <> ogc_fid)"
"Total runtime: 3551423.162 ms"

So a 60% reduction in time.  Thanks again for the tip.

Charlie
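[Marc's query itself was lost in the quoting below; judging from the plan
above (a NOT over a correlated subplan that probes idx_completechain_tlid
with Index Cond ($0 = tlid) and Filter ($1 <> ogc_fid)), it was presumably
an anti-join of roughly this shape -- a sketch, not Marc's exact SQL:

SELECT t1.tlid, t1.ogc_fid
FROM completechain t1
WHERE NOT EXISTS (
    SELECT 1
    FROM completechain t2
    WHERE t2.tlid = t1.tlid           -- matches Index Cond: ($0 = tlid)
      AND t2.ogc_fid <> t1.ogc_fid    -- matches Filter: ($1 <> ogc_fid)
);

This keeps only rows whose tlid occurs exactly once, which matches
Charlie's remark that it returns rows with no duplicates rather than one
row per distinct tlid.  The supporting index Marc assumes is the one
named in the plan, presumably:

CREATE INDEX idx_completechain_tlid ON tiger.completechain (tlid);]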
Marc Morin wrote:
> I have run into this type of query problem as well.  I solved it in my
> application by the following type of query.
>
>
> Assumes of course that you have an index on tlid.
>
>> -----Original Message-----
>> From: pgsql-performance-owner@postgresql.org
>> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of
>> Charlie Savage
>> Sent: Tuesday, November 08, 2005 2:05 AM
>> To: pgsql-performance@postgresql.org
>> Subject: [PERFORM] Sort performance on large tables
>>
>> Hi everyone,
>>
>> I have a question about the performance of sort.
>>
>> Setup: Dell Dimension 3000, Suse 10, 1GB ram, PostgreSQL 8.1 RC 1
>> with PostGIS, 1 built-in 80 GB IDE drive, 1 SATA Seagate 400GB
>> drive.  The IDE drive has the OS and the WAL files, the SATA drive
>> the database.  From hdparm the max IO for the IDE drive is about
>> 50 MB/s and the SATA drive is about 65 MB/s.  Thus a very low-end
>> machine - but it is used just for development (i.e., it is not a
>> production machine) and the only thing it does is run a PostgreSQL
>> database.
>>
>> I have a staging table called completechain that holds US TIGER
>> data (i.e., streets and addresses for the US).  The table is
>> approximately 18GB.  It's big because there is a lot of data, but
>> also because the table is not normalized (it comes that way).
>>
>> I want to extract data out of the table, with the most important
>> values being stored in a column called tlid.  The tlid field is an
>> integer, and the values are 98% unique.  There is a second column
>> called ogc_fid which is unique (it is a serial field).  I need to
>> extract the unique tlids (it doesn't matter which duplicate I get
>> rid of).  To do this I am running this query:
>>
>> SELECT tlid, min(ogc_fid)
>> FROM completechain
>> GROUP BY tlid;
>>
>> The results from explain analyze are:
>>
>> "GroupAggregate  (cost=10400373.80..11361807.88 rows=48071704
>> width=8) (actual time=7311682.715..8315746.835 rows=47599910 loops=1)"
>> "  ->  Sort  (cost=10400373.80..10520553.06 rows=48071704
>> width=8) (actual time=7311682.682..7972304.777 rows=48199165 loops=1)"
>> "        Sort Key: tlid"
>> "        ->  Seq Scan on completechain  (cost=0.00..2228584.04
>> rows=48071704 width=8) (actual time=27.514..773245.046
>> rows=48199165 loops=1)"
>> "Total runtime: 8486057.185 ms"
>>
>> A similar query produces the same results:
>>
>> SELECT DISTINCT ON (tlid) tlid, ogc_fid FROM completechain;
>>
>> Note it takes over 10 times longer to do the sort than the full
>> sequential scan.
>>
>> Should I expect results like this?  I realize that the computer is
>> quite low-end and is very IO bound for this query, but I'm still
>> surprised that the sort operation takes so long.
>>
>> Out of curiosity, I set up an Oracle database on the same machine
>> with the same data and ran the same query.  Oracle was over an
>> order of magnitude faster.  Looking at its query plan, it avoided
>> the sort by using "HASH GROUP BY."  Does such a construct exist in
>> PostgreSQL (I see only hash joins)?
>>
>> Also as an experiment I forced Oracle to do a sort by running this
>> query:
>>
>> SELECT tlid, min(ogc_fid)
>> FROM completechain
>> GROUP BY tlid
>> ORDER BY tlid;
>>
>> Even with this, it was more than an order of magnitude faster than
>> PostgreSQL.  Which makes me think I have somehow misconfigured
>> PostgreSQL (see the relevant parts of postgresql.conf below).
>>
>> Any ideas/help appreciated.
>>
>> Thanks,
>>
>> Charlie
>>
>>
>> -------------------------------
>>
>> #---------------------------------------------------------------------------
>> # RESOURCE USAGE (except WAL)
>> #---------------------------------------------------------------------------
>>
>> shared_buffers = 40000          # 40000 buffers * 8192 bytes/buffer
>>                                 # = 327,680,000 bytes
>> #shared_buffers = 1000          # min 16 or max_connections*2, 8KB each
>>
>> temp_buffers = 5000
>> #temp_buffers = 1000            # min 100, 8KB each
>> #max_prepared_transactions = 5  # can be 0 or more
>> # note: increasing max_prepared_transactions costs ~600 bytes of shared
>> # memory per transaction slot, plus lock space (see
>> # max_locks_per_transaction).
>>
>> work_mem = 16384                # in KB
>> #work_mem = 1024                # min 64, size in KB
>>
>> maintenance_work_mem = 262144   # in KB
>> #maintenance_work_mem = 16384   # min 1024, size in KB
>> #max_stack_depth = 2048         # min 100, size in KB
>>
>> # - Free Space Map -
>>
>> max_fsm_pages = 60000
>> #max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
>>
>> #max_fsm_relations = 1000       # min 100, ~70 bytes each
>>
>> # - Kernel Resource Usage -
>>
>> #max_files_per_process = 1000   # min 25
>> #preload_libraries = ''
>>
>> # - Cost-Based Vacuum Delay -
>>
>> #vacuum_cost_delay = 0          # 0-1000 milliseconds
>> #vacuum_cost_page_hit = 1       # 0-10000 credits
>> #vacuum_cost_page_miss = 10     # 0-10000 credits
>> #vacuum_cost_page_dirty = 20    # 0-10000 credits
>> #vacuum_cost_limit = 200        # 0-10000 credits
>>
>> # - Background writer -
>>
>> #bgwriter_delay = 200           # 10-10000 milliseconds between rounds
>> #bgwriter_lru_percent = 1.0     # 0-100% of LRU buffers scanned/round
>> #bgwriter_lru_maxpages = 5      # 0-1000 buffers max written/round
>> #bgwriter_all_percent = 0.333   # 0-100% of all buffers scanned/round
>> #bgwriter_all_maxpages = 5      # 0-1000 buffers max written/round
>>
>>
>> #---------------------------------------------------------------------------
>> # WRITE AHEAD LOG
>> #---------------------------------------------------------------------------
>>
>> # - Settings -
>>
>> fsync = on                      # turns forced synchronization on or off
>> #wal_sync_method = fsync        # the default is the first option
>>                                 # supported by the operating system:
>>                                 #   open_datasync
>>                                 #   fdatasync
>>                                 #   fsync
>>                                 #   fsync_writethrough
>>                                 #   open_sync
>> #full_page_writes = on          # recover from partial page writes
>>
>> wal_buffers = 128
>> #wal_buffers = 8                # min 4, 8KB each
>>
>> #commit_delay = 0               # range 0-100000, in microseconds
>> #commit_siblings = 5            # range 1-1000
>>
>> # - Checkpoints -
>>
>> checkpoint_segments = 256       # 256 * 16MB = 4,294,967,296 bytes
>> checkpoint_timeout = 1200       # 1200 seconds (20 minutes)
>> checkpoint_warning = 30         # in seconds, 0 is off
>>
>> #checkpoint_segments = 3        # in logfile segments, min 1, 16MB each
>> #checkpoint_timeout = 300       # range 30-3600, in seconds
>> #checkpoint_warning = 30        # in seconds, 0 is off
>>
>> # - Archiving -
>>
>> #archive_command = ''           # command to use to archive a logfile
>>                                 # segment
>>
>>
>> #---------------------------------------------------------------------------
>> # QUERY TUNING
>> #---------------------------------------------------------------------------
>>
>> # - Planner Method Configuration -
>>
>> #enable_bitmapscan = on
>> #enable_hashagg = on
>> #enable_hashjoin = on
>> #enable_indexscan = on
>> #enable_mergejoin = on
>> #enable_nestloop = on
>> #enable_seqscan = on
>> #enable_sort = on
>> #enable_tidscan = on
>>
>> # - Planner Cost Constants -
>>
>> effective_cache_size = 80000    # 80000 * 8192 = 655,360,000 bytes
>> #effective_cache_size = 1000    # typically 8KB each
>>
>> random_page_cost = 2.5          # units are one sequential page fetch cost
>> #random_page_cost = 4           # units are one sequential page fetch cost
>> #cpu_tuple_cost = 0.01          # (same)
>> #cpu_index_tuple_cost = 0.001   # (same)
>> #cpu_operator_cost = 0.0025     # (same)
>>
>> # - Genetic Query Optimizer -
>>
>> #geqo = on
>> #geqo_threshold = 12
>> #geqo_effort = 5                # range 1-10
>> #geqo_pool_size = 0             # selects default based on effort
>> #geqo_generations = 0           # selects default based on effort
>> #geqo_selection_bias = 2.0      # range 1.5-2.0
>>
>> # - Other Planner Options -
>>
>> default_statistics_target = 100 # range 1-1000
>> #default_statistics_target = 10 # range 1-1000
>> #constraint_exclusion = off
>> #from_collapse_limit = 8
>> #join_collapse_limit = 8        # 1 disables collapsing of explicit JOINs
>>
>>
>> #---------------------------------------------------------------------------
>> # RUNTIME STATISTICS
>> #---------------------------------------------------------------------------
>>
>> # - Statistics Monitoring -
>>
>> #log_parser_stats = off
>> #log_planner_stats = off
>> #log_executor_stats = off
>> #log_statement_stats = off
>>
>> # - Query/Index Statistics Collector -
>>
>> stats_start_collector = on
>> stats_command_string = on
>> stats_block_level = on
>> stats_row_level = on
>>
>> #stats_start_collector = on
>> #stats_command_string = off
>> #stats_block_level = off
>> #stats_row_level = off
>> #stats_reset_on_server_start = off
>>
>>
>> #---------------------------------------------------------------------------
>> # AUTOVACUUM PARAMETERS
>> #---------------------------------------------------------------------------
>>
>> autovacuum = true
>> autovacuum_naptime = 600
>>
>> #autovacuum = false                    # enable autovacuum subprocess?
>> #autovacuum_naptime = 60               # time between autovacuum runs, in secs
>> #autovacuum_vacuum_threshold = 1000    # min # of tuple updates before vacuum
>> #autovacuum_analyze_threshold = 500    # min # of tuple updates before analyze
>> #autovacuum_vacuum_scale_factor = 0.4  # fraction of rel size before vacuum
>> #autovacuum_analyze_scale_factor = 0.2 # fraction of rel size before analyze
>> #autovacuum_vacuum_cost_delay = -1     # default vacuum cost delay for
>>                                        # autovac, -1 means use vacuum_cost_delay
>> #autovacuum_vacuum_cost_limit = -1     # default vacuum cost limit for
>>                                        # autovac, -1 means use vacuum_cost_limit
>>
>>
>> ----------------------
>>
>> CREATE TABLE tiger.completechain
>> (
>>   ogc_fid int4 NOT NULL DEFAULT nextval('completechain_ogc_fid_seq'::regclass),
>>   module varchar(8) NOT NULL,
>>   tlid int4 NOT NULL,
>>   side1 int4,
>>   source varchar(1) NOT NULL,
>>   fedirp varchar(2),
>>   fename varchar(30),
>>   fetype varchar(4),
>>   fedirs varchar(2),
>>   cfcc varchar(3) NOT NULL,
>>   fraddl varchar(11),
>>   toaddl varchar(11),
>>   fraddr varchar(11),
>>   toaddr varchar(11),
>>   friaddl varchar(1),
>>   toiaddl varchar(1),
>>   friaddr varchar(1),
>>   toiaddr varchar(1),
>>   zipl int4,
>>   zipr int4,
>>   aianhhfpl int4,
>>   aianhhfpr int4,
>>   aihhtlil varchar(1),
>>   aihhtlir varchar(1),
>>   census1 varchar(1),
>>   census2 varchar(1),
>>   statel int4,
>>   stater int4,
>>   countyl int4,
>>   countyr int4,
>>   cousubl int4,
>>   cousubr int4,
>>   submcdl int4,
>>   submcdr int4,
>>   placel int4,
>>   placer int4,
>>   tractl int4,
>>   tractr int4,
>>   blockl int4,
>>   blockr int4,
>>   wkb_geometry public.geometry NOT NULL,
>>   CONSTRAINT enforce_dims_wkb_geometry CHECK (ndims(wkb_geometry) = 2),
>>   CONSTRAINT enforce_geotype_wkb_geometry CHECK
>>     (geometrytype(wkb_geometry) = 'LINESTRING'::text OR wkb_geometry IS NULL),
>>   CONSTRAINT enforce_srid_wkb_geometry CHECK (srid(wkb_geometry) = 4269)
>> )
>> WITHOUT OIDS;
>> ALTER TABLE tiger.completechain OWNER TO postgres;
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: don't forget to increase your free space map settings
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
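[On the "HASH GROUP BY" question quoted above: PostgreSQL does have a
hash-based grouping node, HashAggregate, governed by the enable_hashagg
setting visible in the posted postgresql.conf.  The planner picks it only
when it estimates the hash table (one entry per group, here roughly 48
million) fits in work_mem, which at the posted 16384 KB it clearly does
not.  A per-session probe of roughly this shape could test that -- a
sketch, not advice from the thread, and the planner may still refuse the
hash plan if the estimated table exceeds work_mem:

SET work_mem = 262144;    -- in KB on 8.1, i.e. 256 MB (posted config: 16384)
SET enable_sort = off;    -- steer the planner away from the sort-based plan
EXPLAIN
SELECT tlid, min(ogc_fid)
FROM completechain
GROUP BY tlid;]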