Re: Sort performance on large tables - Mailing list pgsql-performance

From: Charlie Savage
Subject: Re: Sort performance on large tables
Date:
Msg-id: 43718DF7.6060609@interserv.com
In response to: Re: Sort performance on large tables ("Marc Morin" <marc@sandvine.com>)
List: pgsql-performance
Very interesting technique.  It doesn't do quite what I want, since it
returns the rows whose tlid is not duplicated rather than a complete
list of unique tlid values.  But I could massage it to do what I want.
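
Something like this should do it (a sketch, not something I have run
yet): keep, for each tlid, only the row with the smallest ogc_fid, so
exactly one row per tlid survives:

SELECT t1.tlid, t1.ogc_fid
FROM completechain t1
WHERE NOT EXISTS (SELECT 1
                  FROM completechain t2
                  WHERE t2.tlid = t1.tlid
                    AND t2.ogc_fid < t1.ogc_fid);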

Anyway, the timing:


"Seq Scan on completechain t1  (cost=0.00..218139733.60 rows=24099582
width=4) (actual time=25.890..3404650.452 rows=47000655 loops=1)"
"  Filter: (NOT (subplan))"
"  SubPlan"
"    ->  Index Scan using idx_completechain_tlid on completechain t2
(cost=0.00..4.48 rows=1 width=0) (actual time=0.059..0.059 rows=0
loops=48199165)"
"          Index Cond: ($0 = tlid)"
"          Filter: ($1 <> ogc_fid)"
"Total runtime: 3551423.162 ms"

So a 60% reduction in time.  Thanks again for the tip.

Charlie


Marc Morin wrote:

> I have run into this type of query problem as well.  I solved it in my
> application by the following type of query.
>
>
> Assumes of course that you have an index on tlid.
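
That would be the idx_completechain_tlid index used in the plan
above; a minimal sketch of its definition, assuming a plain btree:

CREATE INDEX idx_completechain_tlid ON completechain (tlid);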
>
>> -----Original Message-----
>> From: pgsql-performance-owner@postgresql.org
>> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of
>> Charlie Savage
>> Sent: Tuesday, November 08, 2005 2:05 AM
>> To: pgsql-performance@postgresql.org
>> Subject: [PERFORM] Sort performance on large tables
>>
>> Hi everyone,
>>
>> I have a question about the performance of sort.
>>
>> Setup: Dell Dimension 3000, SuSE 10, 1 GB RAM, PostgreSQL 8.1
>> RC1 with PostGIS, one built-in 80 GB IDE drive, one 400 GB SATA
>> Seagate drive.  The IDE drive has the OS and the WAL files, the
>> SATA drive the database.  From hdparm, the max IO for the IDE
>> drive is about 50 MB/s and for the SATA drive about 65 MB/s.
>> Thus it is a very low-end machine - but it is used just for
>> development (i.e., it is not a production machine) and the only
>> thing it does is run a PostgreSQL database.
>>
>> I have a staging table called completechain that holds US TIGER
>> data (i.e., streets and addresses for the US).  The table is
>> approximately 18 GB.  It's big because there is a lot of data,
>> but also because the table is not normalized (it comes that way).
>>
>> I want to extract data out of this table, with the most
>> important values stored in a column called tlid.  The tlid field
>> is an integer, and its values are 98% unique.  There is a second
>> column called ogc_fid which is unique (it is a serial field).  I
>> need to extract the unique tlids (it doesn't matter which
>> duplicate I get rid of).  To do this I am running this query:
>>
>> SELECT tlid, min(ogc_fid)
>> FROM completechain
>> GROUP BY tlid;
>>
>> The results from explain analyze are:
>>
>> "GroupAggregate  (cost=10400373.80..11361807.88 rows=48071704
>> width=8) (actual time=7311682.715..8315746.835 rows=47599910 loops=1)"
>> "  ->  Sort  (cost=10400373.80..10520553.06 rows=48071704
>> width=8) (actual time=7311682.682..7972304.777 rows=48199165 loops=1)"
>> "        Sort Key: tlid"
>> "        ->  Seq Scan on completechain  (cost=0.00..2228584.04
>> rows=48071704 width=8) (actual time=27.514..773245.046
>> rows=48199165 loops=1)"
>> "Total runtime: 8486057.185 ms"
>>
>> Doing a similar query produces the same results:
>>
>> SELECT DISTINCT ON (tlid) tlid, ogc_fid FROM completechain;
>>
>> Note it takes over 10 times longer to do the sort than the
>> full sequential scan.
>>
>> Should I expect results like this?  I realize that the
>> computer is quite low-end and is very IO bound for this
>> query, but I'm still surprised that the sort operation takes so long.
>>
>> Out of curiosity, I set up an Oracle database on the same
>> machine with the same data and ran the same query.  Oracle was
>> over an order of magnitude faster.  Looking at its query plan,
>> it avoided the sort by using "HASH GROUP BY."  Does such a
>> construct exist in PostgreSQL (I see only hash joins)?
>>
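>> I suppose I could probe for it with something like this (a
>> sketch only; I have not run it here):
>>
>> SET enable_sort = off;
>> EXPLAIN SELECT tlid, min(ogc_fid) FROM completechain GROUP BY tlid;
>>
>> If that shows a HashAggregate node, the construct exists and the
>> planner is simply not choosing it for this query.
>>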
>> Also, as an experiment, I forced Oracle to do a sort by running
>> this query:
>>
>> SELECT tlid, min(ogc_fid)
>> FROM completechain
>> GROUP BY tlid
>> ORDER BY tlid;
>>
>> Even with this, it was more than an order of magnitude faster
>> than PostgreSQL, which makes me think I have somehow
>> misconfigured PostgreSQL (see the relevant parts of
>> postgresql.conf below).
>>
>> Any idea/help appreciated.
>>
>> Thanks,
>>
>> Charlie
>>
>>
>> -------------------------------
>>
>> #---------------------------------------------------------------------------
>> # RESOURCE USAGE (except WAL)
>> #---------------------------------------------------------------------------
>>
>> shared_buffers = 40000           # 40000 buffers * 8192 bytes/buffer = 327,680,000 bytes
>> #shared_buffers = 1000           # min 16 or max_connections*2, 8KB each
>>
>> temp_buffers = 5000
>> #temp_buffers = 1000            # min 100, 8KB each
>> #max_prepared_transactions = 5        # can be 0 or more
>> # note: increasing max_prepared_transactions costs ~600 bytes of shared
>> # memory per transaction slot, plus lock space (see max_locks_per_transaction).
>>
>> work_mem = 16384                 # in KB
>> #work_mem = 1024                 # min 64, size in KB
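>> # note: a sort that does not fit in work_mem spills to disk as an
>> # external merge sort, which is much slower than an in-memory sort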
>>
>> maintenance_work_mem = 262144            # in kb
>> #maintenance_work_mem = 16384        # min 1024, size in KB
>> #max_stack_depth = 2048            # min 100, size in KB
>>
>> # - Free Space Map -
>>
>> max_fsm_pages = 60000
>> #max_fsm_pages = 20000           # min max_fsm_relations*16, 6 bytes each
>>
>> #max_fsm_relations = 1000        # min 100, ~70 bytes each
>>
>> # - Kernel Resource Usage -
>>
>> #max_files_per_process = 1000        # min 25
>> #preload_libraries = ''
>>
>> # - Cost-Based Vacuum Delay -
>>
>> #vacuum_cost_delay = 0            # 0-1000 milliseconds
>> #vacuum_cost_page_hit = 1        # 0-10000 credits
>> #vacuum_cost_page_miss = 10        # 0-10000 credits
>> #vacuum_cost_page_dirty = 20        # 0-10000 credits
>> #vacuum_cost_limit = 200        # 0-10000 credits
>>
>> # - Background writer -
>>
>> #bgwriter_delay = 200            # 10-10000 milliseconds between rounds
>> #bgwriter_lru_percent = 1.0      # 0-100% of LRU buffers scanned/round
>> #bgwriter_lru_maxpages = 5       # 0-1000 buffers max written/round
>> #bgwriter_all_percent = 0.333    # 0-100% of all buffers scanned/round
>> #bgwriter_all_maxpages = 5       # 0-1000 buffers max written/round
>>
>>
>> #---------------------------------------------------------------------------
>> # WRITE AHEAD LOG
>> #---------------------------------------------------------------------------
>>
>> # - Settings -
>>
>> fsync = on                       # turns forced synchronization on or off
>> #wal_sync_method = fsync         # the default is the first option
>>                                  # supported by the operating system:
>>                                  #   open_datasync
>>                                  #   fdatasync
>>                                  #   fsync
>>                                  #   fsync_writethrough
>>                                  #   open_sync
>> #full_page_writes = on           # recover from partial page writes
>>
>> wal_buffers = 128
>> #wal_buffers = 8            # min 4, 8KB each
>>
>> #commit_delay = 0                # range 0-100000, in microseconds
>> #commit_siblings = 5            # range 1-1000
>>
>> # - Checkpoints -
>>
>> checkpoint_segments = 256        # 256 * 16MB = 4,294,967,296 bytes
>> checkpoint_timeout = 1200        # 1200 seconds (20 minutes)
>> checkpoint_warning = 30          # in seconds, 0 is off
>>
>> #checkpoint_segments = 3         # in logfile segments, min 1, 16MB each
>> #checkpoint_timeout = 300        # range 30-3600, in seconds
>> #checkpoint_warning = 30         # in seconds, 0 is off
>>
>> # - Archiving -
>>
>> #archive_command = ''            # command to use to archive a logfile segment
>>
>>
>> #---------------------------------------------------------------------------
>> # QUERY TUNING
>> #---------------------------------------------------------------------------
>>
>> # - Planner Method Configuration -
>>
>> #enable_bitmapscan = on
>> #enable_hashagg = on
>> #enable_hashjoin = on
>> #enable_indexscan = on
>> #enable_mergejoin = on
>> #enable_nestloop = on
>> #enable_seqscan = on
>> #enable_sort = on
>> #enable_tidscan = on
>>
>> # - Planner Cost Constants -
>>
>> effective_cache_size = 80000     # 80000 * 8192 = 655,360,000 bytes
>> #effective_cache_size = 1000     # typically 8KB each
>>
>> random_page_cost = 2.5           # units are one sequential page fetch cost
>> #random_page_cost = 4            # units are one sequential page fetch cost
>> #cpu_tuple_cost = 0.01           # (same)
>> #cpu_index_tuple_cost = 0.001    # (same)
>> #cpu_operator_cost = 0.0025      # (same)
>>
>> # - Genetic Query Optimizer -
>>
>> #geqo = on
>> #geqo_threshold = 12
>> #geqo_effort = 5            # range 1-10
>> #geqo_pool_size = 0              # selects default based on effort
>> #geqo_generations = 0            # selects default based on effort
>> #geqo_selection_bias = 2.0        # range 1.5-2.0
>>
>> # - Other Planner Options -
>>
>> default_statistics_target = 100        # range 1-1000
>> #default_statistics_target = 10        # range 1-1000
>> #constraint_exclusion = off
>> #from_collapse_limit = 8
>> #join_collapse_limit = 8         # 1 disables collapsing of explicit JOINs
>>
>>
>> #---------------------------------------------------------------------------
>> # RUNTIME STATISTICS
>> #---------------------------------------------------------------------------
>>
>> # - Statistics Monitoring -
>>
>> #log_parser_stats = off
>> #log_planner_stats = off
>> #log_executor_stats = off
>> #log_statement_stats = off
>>
>> # - Query/Index Statistics Collector -
>>
>> stats_start_collector = on
>> stats_command_string = on
>> stats_block_level = on
>> stats_row_level = on
>>
>> #stats_start_collector = on
>> #stats_command_string = off
>> #stats_block_level = off
>> #stats_row_level = off
>> #stats_reset_on_server_start = off
>>
>>
>> #---------------------------------------------------------------------------
>> # AUTOVACUUM PARAMETERS
>> #---------------------------------------------------------------------------
>>
>> autovacuum = true
>> autovacuum_naptime = 600
>>
>> #autovacuum = false            # enable autovacuum subprocess?
>> #autovacuum_naptime = 60         # time between autovacuum runs, in secs
>> #autovacuum_vacuum_threshold = 1000    # min # of tuple updates before
>>                     # vacuum
>> #autovacuum_analyze_threshold = 500    # min # of tuple updates before
>>                     # analyze
>> #autovacuum_vacuum_scale_factor = 0.4    # fraction of rel size before
>>                     # vacuum
>> #autovacuum_analyze_scale_factor = 0.2    # fraction of rel size before
>>                     # analyze
>> #autovacuum_vacuum_cost_delay = -1    # default vacuum cost delay for
>>                     # autovac, -1 means use
>>                     # vacuum_cost_delay
>> #autovacuum_vacuum_cost_limit = -1    # default vacuum cost limit for
>>                     # autovac, -1 means use
>>                     # vacuum_cost_limit
>>
>>
>> ----------------------
>>
>> CREATE TABLE tiger.completechain
>> (
>>    ogc_fid int4 NOT NULL DEFAULT
>> nextval('completechain_ogc_fid_seq'::regclass),
>>    module varchar(8) NOT NULL,
>>    tlid int4 NOT NULL,
>>    side1 int4,
>>    source varchar(1) NOT NULL,
>>    fedirp varchar(2),
>>    fename varchar(30),
>>    fetype varchar(4),
>>    fedirs varchar(2),
>>    cfcc varchar(3) NOT NULL,
>>    fraddl varchar(11),
>>    toaddl varchar(11),
>>    fraddr varchar(11),
>>    toaddr varchar(11),
>>    friaddl varchar(1),
>>    toiaddl varchar(1),
>>    friaddr varchar(1),
>>    toiaddr varchar(1),
>>    zipl int4,
>>    zipr int4,
>>    aianhhfpl int4,
>>    aianhhfpr int4,
>>    aihhtlil varchar(1),
>>    aihhtlir varchar(1),
>>    census1 varchar(1),
>>    census2 varchar(1),
>>    statel int4,
>>    stater int4,
>>    countyl int4,
>>    countyr int4,
>>    cousubl int4,
>>    cousubr int4,
>>    submcdl int4,
>>    submcdr int4,
>>    placel int4,
>>    placer int4,
>>    tractl int4,
>>    tractr int4,
>>    blockl int4,
>>    blockr int4,
>>    wkb_geometry public.geometry NOT NULL,
>>    CONSTRAINT enforce_dims_wkb_geometry CHECK (ndims(wkb_geometry) = 2),
>>    CONSTRAINT enforce_geotype_wkb_geometry CHECK
>>      (geometrytype(wkb_geometry) = 'LINESTRING'::text OR wkb_geometry IS NULL),
>>    CONSTRAINT enforce_srid_wkb_geometry CHECK (srid(wkb_geometry) = 4269)
>> )
>> WITHOUT OIDS;
>> ALTER TABLE tiger.completechain OWNER TO postgres;
