Thread: ported application having performance issues
pgsql performance gurus,

We ported an application from Oracle to PostgreSQL. We are experiencing
approximately a 50% performance hit. I am in the process of isolating the
problem. I have searched the internet (Google) and tried various things.
Only one thing seems to work. I am trying to find out if our solution is
the only option, or if I am doing something terribly wrong.

The original application runs on the following:

hw: cpu0: SUNW,UltraSPARC-IIi (upaid 0 impl 0x12 ver 0x12 clock 302 MHz)
    mem = 393216K (0x18000000)
sw: Solaris 5.6
    Oracle 7.3.2.2.0
    Apache 1.3.27
    Perl 5.004_04
    mod_perl 1.27
    DBI 1.20
    DBD::Oracle 1.12

The ported application runs on the following:

hw: unix: [ID 389951 kern.info] mem = 262144K (0x10000000)
    rootnex: [ID 466748 kern.info] root nexus = Sun Ultra 5/10 UPA/PCI (UltraSPARC-IIi 360MHz)
sw: Solaris 5.9
    PostgreSQL 7.4.6
    Apache 1.3.33
    Perl 5.8.6
    mod_perl 1.29
    DBI 1.46
    DBD::Pg 1.40.1

Based on assistance from another list, we have tried the following:

(1) Upgraded DBD::Pg to current version 1.43
(2) Ensured all tables are analyzed regularly
(3) Set some memory options in postgresql.conf
(4) Located a handful of slow queries by setting
    log_min_duration_statement to 250

Future options we will consider are:

(1) Attempting other option settings, like random_page_cost
(2) Upgrading the db server to current version 8.0.3

With our handful of slow queries, we have done several iterations of
changes to determine what will address the issues. We have narrowed it
down to the direction of a join and the enable_seqscan setting. The
table definitions are at the bottom of this e-mail. There is one large
table (contacts) and one smaller table (lead_requests). The final SQL is
as follows:

SELECT
  c.id AS contact_id,
  lr.id AS lead_request_id
FROM
  lead_requests lr
  JOIN contacts c ON (c.id = lr.contact_id)
WHERE
  c.partner_id IS NULL
ORDER BY
  contact_id

I ran this query against freshly vacuum analyzed tables. The first run
is as follows:

db=> explain analyze SELECT
db->   c.id AS contact_id,
db->   lr.id AS lead_request_id
db-> FROM
db->   lead_requests lr
db->   JOIN contacts c ON (c.id = lr.contact_id)
db-> WHERE
db->   c.partner_id IS NULL
db-> ORDER BY
db->   contact_id
db-> ;
LOG:  duration: 4618.133 ms  statement: explain analyze SELECT c.id AS contact_id, lr.id AS lead_request_id FROM lead_requests lr JOIN contacts c ON (c.id = lr.contact_id) WHERE c.partner_id IS NULL ORDER BY contact_id

                                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=4272.84..4520.82 rows=1230 width=21) (actual time=3998.771..4603.739 rows=699 loops=1)
   Merge Cond: ("outer".contact_id = "inner".id)
   ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr  (cost=0.00..74.09 rows=1435 width=21) (actual time=0.070..22.431 rows=1430 loops=1)
   ->  Sort  (cost=4272.84..4352.28 rows=31775 width=11) (actual time=3998.554..4130.067 rows=32640 loops=1)
         Sort Key: c.id
         ->  Seq Scan on contacts c  (cost=0.00..1896.77 rows=31775 width=11) (actual time=0.040..326.135 rows=32501 loops=1)
               Filter: (partner_id IS NULL)
 Total runtime: 4611.323 ms
(8 rows)

As you can see, run time is over 4 seconds.

Then, I set enable_seqscan = off.
db=> set enable_seqscan=off;
SET

Then I ran the exact same query:

db=> explain analyze SELECT
db->   c.id AS contact_id,
db->   lr.id AS lead_request_id
db-> FROM
db->   lead_requests lr
db->   JOIN contacts c ON (c.id = lr.contact_id)
db-> WHERE
db->   c.partner_id IS NULL
db-> ORDER BY
db->   contact_id
db-> ;
LOG:  duration: 915.304 ms  statement: explain analyze SELECT c.id AS contact_id, lr.id AS lead_request_id FROM lead_requests lr JOIN contacts c ON (c.id = lr.contact_id) WHERE c.partner_id IS NULL ORDER BY contact_id

                                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..4749.84 rows=1230 width=21) (actual time=0.213..901.315 rows=699 loops=1)
   Merge Cond: ("outer".contact_id = "inner".id)
   ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr  (cost=0.00..74.09 rows=1435 width=21) (actual time=0.073..21.448 rows=1430 loops=1)
   ->  Index Scan using contacts_pkey on contacts c  (cost=0.00..4581.30 rows=31775 width=11) (actual time=0.038..524.217 rows=32640 loops=1)
         Filter: (partner_id IS NULL)
 Total runtime: 903.638 ms
(6 rows)

Under 1 second. Excellent.

The contacts table has 30000+ records. The lead_requests table has
around 1500 records. I want the query to start the join at the
lead_requests table since its row count is so much smaller.

So, bottom line is this: In order to get the performance to an
acceptable level (I can live with under 1 second, though less time
would be better), do I have to set enable_seqscan to off every time I
run this query? Is there a better or more acceptable way to handle
this?

Thank you very much in advance for any pointers you can provide. And,
if this is the wrong forum for this question, please let me know and
I'll ask it elsewhere.
JohnM

-----
table definitions
-----

-----
db=> \d contacts
                            Table "db.contacts"
            Column            |            Type             | Modifiers
------------------------------+-----------------------------+-----------
 id                           | numeric(38,0)               | not null
 db_id                        | character varying(32)       |
 firstname                    | character varying(64)       |
 lastname                     | character varying(64)       |
 company                      | character varying(128)      |
 email                        | character varying(256)      |
 phone                        | character varying(64)       |
 address                      | character varying(128)      |
 city                         | character varying(128)      |
 state                        | character varying(32)       |
 postalcode                   | character varying(16)       |
 country                      | character varying(2)        | not null
 contact_info_modified        | character(1)                |
 token_id                     | numeric(38,0)               |
 status_id                    | numeric(38,0)               |
 status_last_modified         | timestamp without time zone |
 notes                        | character varying(2000)     |
 demo_schedule                | timestamp without time zone |
 partner_id                   | numeric(38,0)               |
 prev_partner_id              | numeric(38,0)               |
 prev_prev_partner_id         | numeric(38,0)               |
 site_last_visited            | timestamp without time zone |
 source_id                    | numeric(4,0)                |
 demo_requested               | timestamp without time zone |
 sourcebook_requested         | timestamp without time zone |
 zip                          | numeric(8,0)                |
 suffix                       | numeric(8,0)                |
 feedback_request_sent        | timestamp without time zone |
 products_sold                | character varying(512)     |
 other_brand                  | character varying(512)      |
 printsample_requested        | timestamp without time zone |
 indoor_media_sample          | timestamp without time zone |
 outdoor_media_sample         | timestamp without time zone |
 printers_owned               | character varying(256)      |
 business_type                | character varying(256)      |
 printers_owned2              | character varying(256)      |
 contact_quality_id           | numeric(38,0)               |
 est_annual_value             | numeric(38,2)               |
 likelyhood_of_closing        | numeric(38,0)               |
 priority                     | numeric(38,0)               |
 business_type_id             | numeric(38,0)               |
 lead_last_modified           | timestamp without time zone |
 lead_value                   | numeric(38,2)               |
 channel_contact_flag         | character(1)                |
 request_status_last_modified | timestamp without time zone |
 master_key_number            | numeric(38,0)               |
 master_key_token             | character varying(32)       |
 current_media_cust           | character(1)                |
 kodak_media_id               | numeric(38,0)               |
 printer_sample_id            | numeric(38,0)               |
 quantity_used_id             | numeric(38,0)               |
 rip_used_id                  | numeric(38,0)               |
 language_code                | character varying(3)        |
 region_id                    | numeric(38,0)               | not null
 lead_deleted                 | timestamp without time zone |
 last_request_set_status_id   | numeric(38,0)               |
 address2                     | character varying(128)      |
 media_usage_id               | numeric(38,0)               |
Indexes:
    "contacts_pkey" primary key, btree (id)
    "contacts_partner_id_idx" btree (partner_id)
    "contacts_partner_id_null_idx" btree (partner_id) WHERE (partner_id IS NULL)
    "contacts_token_id_idx" btree (token_id)
Check constraints:
    "sys_c0050644" CHECK (country IS NOT NULL)
    "sys_c0050643" CHECK (id IS NOT NULL)
    "sys_c0050645" CHECK (region_id IS NOT NULL)
Triggers:
    insert_master_key BEFORE INSERT ON contacts FOR EACH ROW EXECUTE PROCEDURE pg_fct_insert_master_key()
-----

-----
db=> \d lead_requests
                      Table "db.lead_requests"
        Column         |            Type             | Modifiers
-----------------------+-----------------------------+-----------
 id                    | numeric(38,0)               | not null
 contact_id            | numeric(38,0)               | not null
 request_id            | numeric(38,0)               | not null
 date_requested        | timestamp without time zone | not null
 must_update_by        | timestamp without time zone |
 date_satisfied        | timestamp without time zone |
 status_id             | numeric(38,0)               |
 request_scheduled     | timestamp without time zone |
 session_log_id        | numeric(38,0)               |
 notes                 | character varying(2000)     |
 status_last_modified  | timestamp without time zone |
 reminder_last_sent    | timestamp without time zone |
 data                  | character varying(2000)     |
 fulfillment_status_id | numeric(38,0)               |
Indexes:
    "lead_requests_pkey" primary key, btree (id)
    "lead_requests_contact_id_idx" btree (contact_id)
    "lead_requests_request_id_idx" btree (request_id)
Check constraints:
    "sys_c0049877" CHECK (request_id IS NOT NULL)
    "sys_c0049876" CHECK (contact_id IS NOT NULL)
    "sys_c0049878" CHECK (date_requested IS NOT NULL)
-----

-- 
John Mendenhall
john@surfutopia.net
surf utopia
internet services
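On the closing question of whether enable_seqscan has to be toggled for
every run: if per-query control turns out to be the only option, the
override can at least be scoped to a single transaction with SET LOCAL
rather than flipped for the whole session. A minimal sketch:

    BEGIN;
    SET LOCAL enable_seqscan = off;   -- reverts automatically at COMMIT/ROLLBACK
    SELECT c.id AS contact_id, lr.id AS lead_request_id
      FROM lead_requests lr
      JOIN contacts c ON (c.id = lr.contact_id)
     WHERE c.partner_id IS NULL
     ORDER BY contact_id;
    COMMIT;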
> Thank you very much in advance for any pointers you can
> provide. And, if this is the wrong forum for this question,
> please let me know and I'll ask it elsewhere.

I think you may want to increase your default_statistics_target, plus
make sure you are running analyze. explain analyze would do.

Sincerely,

Joshua D. Drake

-- 
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
John Mendenhall <john@surfutopia.net> writes:
>  Merge Join  (cost=4272.84..4520.82 rows=1230 width=21) (actual time=3998.771..4603.739 rows=699 loops=1)
>    Merge Cond: ("outer".contact_id = "inner".id)
>    ->  Index Scan using lead_requests_contact_id_idx on lead_requests lr  (cost=0.00..74.09 rows=1435 width=21) (actual time=0.070..22.431 rows=1430 loops=1)
>    ->  Sort  (cost=4272.84..4352.28 rows=31775 width=11) (actual time=3998.554..4130.067 rows=32640 loops=1)
>          Sort Key: c.id
>          ->  Seq Scan on contacts c  (cost=0.00..1896.77 rows=31775 width=11) (actual time=0.040..326.135 rows=32501 loops=1)
>                Filter: (partner_id IS NULL)
>  Total runtime: 4611.323 ms

Hmm ... even on a SPARC, it doesn't seem like it should take 4 seconds
to sort 30000 rows. You can certainly see that the planner is not
expecting that (it's estimating a sort cost comparable to the scan
cost, which if true would put this in the sub-second ballpark). Does
increasing sort_mem help?

Have you considered using some other datatype than "numeric" for your
keys? Numeric may be fast on Oracle, but it's not amazingly fast on
Postgres. bigint would be better, if you don't really need 38 digits;
if you do, I'd be inclined to think about plain char or varchar keys.

			regards, tom lane
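For reference, a hedged sketch of that datatype change, assuming the
keys fit in 64 bits: on 8.0 and later, ALTER TABLE ... ALTER COLUMN ...
TYPE can rewrite the columns in place (on the poster's 7.4 they would
have to be rebuilt by hand via a new column or table):

    -- assumes PostgreSQL 8.0+; any other columns referencing these
    -- keys would need the same change
    ALTER TABLE contacts      ALTER COLUMN id         TYPE bigint;
    ALTER TABLE lead_requests ALTER COLUMN id         TYPE bigint;
    ALTER TABLE lead_requests ALTER COLUMN contact_id TYPE bigint;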
pgsql performance gurus,

I truly appreciate the suggestions provided. I have tried each one
separately to determine the best fit. I have included results for each
suggestion. I have also included my entire postgresql.conf file so you
can see our base configuration.

Each result is based on an in-session variable setting, so it only
affected the current session.

(1) Increase default_statistics_target, then run vacuum analyze on
    each table for each setting

    The default setting is 10. I tried the following settings, with
    the corresponding results:

    default_statistics_target =   10     time approximately 4500ms
    default_statistics_target =  100     time approximately 3900ms
    default_statistics_target =  500     time approximately 3900ms
    default_statistics_target = 1000     time approximately 3900ms

    So, this option does not quite get us there.

(2) Increase the sort_mem value

    The current setting for sort_mem is 2048.

    sort_mem =  2048     time approximately 4500ms
    sort_mem =  8192     time approximately 2750ms
    sort_mem = 16384     time approximately 2650ms
    sort_mem =  1024     time approximately 1000ms

    Interesting to note... When I set sort_mem to 1024, the plan
    started the join with the lead_requests table and used the
    contacts index. None of the other attempts used this plan.

(3) Decrease random_page_cost, increase effective_cache_size

    The default setting for random_page_cost is 4. Our setting for
    effective_cache_size is 2048.

    random_page_cost = 4, effective_cache_size = 2048     time approximately 4500ms
    random_page_cost = 3, effective_cache_size = 2048     time approximately 1050ms
    random_page_cost = 3, effective_cache_size = 4096     time approximately 1025ms

    The decrease of random_page_cost to 3 caused the planner to choose
    the desired plan, using the lead_requests table as the join
    starting point and using the contacts index.

*****

It appears we learned the following:

(a) For some reason, setting sort_mem smaller than our current setting
    caused the planner to choose the desired plan.
(b) Lowering random_page_cost also caused the planner to choose the
    desired plan.

This brings up the following questions:

(i)   What is the ideal configuration for this query to work?
(ii)  Will this ideal configuration work for all our other queries, or
      is it specific to this query only?
(iii) Should I try additional variable changes, or lower/raise the
      variables I have already changed even more?

Thanks again for the suggestions provided. And, thanks in advance for
any additional thoughts or suggestions.

JohnM

::::::::::::::
postgresql.conf
::::::::::::::
-----
# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are
# introduced with '#' anywhere on a line. The complete list of option
# names and allowed values can be found in the PostgreSQL
# documentation. The commented-out settings shown in this file
# represent the default values.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you
# have to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload".
#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------

# - Connection Settings -

tcpip_socket = false
max_connections = 128
        # note: increasing max_connections costs about 500 bytes of shared
        # memory per connection slot, in addition to costs from shared_buffers
        # and max_locks_per_transaction.
#superuser_reserved_connections = 2
#port = 5432
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal
#virtual_host = ''              # what interface to listen on; defaults to any
#rendezvous_name = ''           # defaults to the computer name

# - Security & Authentication -

#authentication_timeout = 60    # 1-600, in seconds
#ssl = false
#password_encryption = true
#krb_server_keyfile = ''
#db_user_namespace = false

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 4096           # min 16, at least max_connections*2, 8KB each
sort_mem = 2048                 # min 64, size in KB
#vacuum_mem = 8192              # min 1024, size in KB

# - Free Space Map -

#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~50 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000   # min 25
#preload_libraries = ''

#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync, open_sync, or open_datasync
#wal_buffers = 8                # min 4, 8KB each

# - Checkpoints -

#checkpoint_segments = 3        # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300       # range 30-3600, in seconds
#checkpoint_warning = 30        # 0 is off, in seconds
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------

# - Planner Method Enabling -

#enable_hashagg = true
#enable_hashjoin = true
#enable_indexscan = true
#enable_mergejoin = true
#enable_nestloop = true
#enable_seqscan = true
#enable_sort = true
#enable_tidscan = true

# - Planner Cost Constants -

effective_cache_size = 2048     # typically 8KB each
#random_page_cost = 4           # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01          # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025     # (same)

# - Genetic Query Optimizer -

#geqo = true
#geqo_threshold = 11
#geqo_effort = 1
#geqo_generations = 0
#geqo_pool_size = 0             # default based on tables in statement,
                                # range 128-1024
#geqo_selection_bias = 2.0      # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10 # range 1-1000
#from_collapse_limit = 8
#join_collapse_limit = 8        # 1 disables collapsing of explicit JOINs

#---------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#---------------------------------------------------------------------------

# - Syslog -

syslog = 1                      # range 0-2; 0=stdout; 1=both; 2=syslog
syslog_facility = 'LOCAL0'
syslog_ident = 'postgres'

# - When to Log -

client_min_messages = log       # Values, in order of decreasing detail:
                                #   debug5, debug4, debug3, debug2, debug1,
                                #   log, info, notice, warning, error

log_min_messages = info         # Values, in order of decreasing detail:
                                #   debug5, debug4, debug3, debug2, debug1,
                                #   info, notice, warning, error, log, fatal,
                                #   panic

log_error_verbosity = verbose   # terse, default, or verbose messages

log_min_error_statement = info  # Values in order of increasing severity:
                                #   debug5, debug4, debug3, debug2, debug1,
                                #   info, notice, warning, error, panic(off)

log_min_duration_statement = 250 # Log all statements whose
                                # execution time exceeds the value, in
                                # milliseconds. Zero prints all queries.
                                # Minus-one disables.

silent_mode = false             # DO NOT USE without Syslog!

# - What to Log -

#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
debug_pretty_print = true
log_connections = true
#log_duration = true
#log_pid = false
#log_statement = true
log_timestamp = true
#log_hostname = false
#log_source_port = false

#---------------------------------------------------------------------------
# RUNTIME STATISTICS
#---------------------------------------------------------------------------

# - Statistics Monitoring -

#log_parser_stats = false
#log_planner_stats = false
#log_executor_stats = false
#log_statement_stats = false

# - Query/Index Statistics Collector -

#stats_start_collector = true
#stats_command_string = false
#stats_block_level = false
#stats_row_level = false
#stats_reset_on_server_start = true

#---------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#---------------------------------------------------------------------------

# - Statement Behavior -

#search_path = '$user,public'   # schema names
#check_function_bodies = true
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = false
#statement_timeout = 0          # 0 is disabled, in milliseconds

# - Locale and Formatting -

#datestyle = 'iso, mdy'
#timezone = unknown             # actually, defaults to TZ environment setting
#australian_timezones = false
#extra_float_digits = 0         # min -15, max 2
#client_encoding = sql_ascii    # actually, defaults to database encoding

# These settings are initialized by initdb -- they may be changed
lc_messages = 'C'               # locale for system error message strings
lc_monetary = 'C'               # locale for monetary formatting
lc_numeric = 'C'                # locale for number formatting
lc_time = 'C'                   # locale for time formatting

# - Other Defaults -

#explain_pretty_print = true
#dynamic_library_path = '$libdir'
#max_expr_depth = 10000         # min 10

#---------------------------------------------------------------------------
# LOCK MANAGEMENT
#---------------------------------------------------------------------------

#deadlock_timeout = 1000        # in milliseconds
#max_locks_per_transaction = 64 # min 10, ~260*max_connections bytes each

#---------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
#---------------------------------------------------------------------------

# - Previous Postgres Versions -

#add_missing_from = true
#regex_flavor = advanced        # advanced, extended, or basic
#sql_inheritance = true

# - Other Platforms & Clients -

#transform_null_equals = false
-----

-- 
John Mendenhall
john@surfutopia.net
surf utopia
internet services
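All of the settings tested above can be tried per-session before
editing postgresql.conf, as the poster did. A sketch of one iteration,
using values from the post (the statistics change only takes effect
after a fresh ANALYZE):

    db=> SET default_statistics_target = 100;
    db=> ANALYZE contacts;
    db=> ANALYZE lead_requests;
    db=> SET sort_mem = 8192;
    db=> SET random_page_cost = 3;
    db=> SET effective_cache_size = 4096;
    db=> EXPLAIN ANALYZE ... ;  -- re-run the test query after each change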
On Thu, 30 Jun 2005, John Mendenhall wrote:

> Our setting for effective_cache_size is 2048.
>
> random_page_cost = 4, effective_cache_size = 2048     time approximately 4500ms
> random_page_cost = 3, effective_cache_size = 2048     time approximately 1050ms
> random_page_cost = 3, effective_cache_size = 4096     time approximately 1025ms
>
> The decrease of random_page_cost to 3 caused the planner to choose
> the desired plan, using the lead_requests table as the join starting
> point and using the contacts index.

The effective_cache_size still looks small. As a rule of thumb you
might want effective_cache_size to be something like 1/2 or 2/3 of
your total memory. I don't know how much you have, but
effective_cache_size = 4096 is only 32M.

shared_buffers and effective_cache_size are normally the two most
important settings in my experience.

-- 
/Dennis Björklund
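For scale, a sketch of that rule of thumb applied to the poster's
256 MB Ultra 5, assuming roughly half of RAM ends up in the OS
filesystem cache:

    # effective_cache_size is measured in 8 KB pages:
    # 128 MB / 8 KB per page = 16384 pages
    effective_cache_size = 16384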
Dennis,

On Fri, 01 Jul 2005, Dennis Bjorklund wrote:

> On Thu, 30 Jun 2005, John Mendenhall wrote:
>
> > Our setting for effective_cache_size is 2048.
> >
> > random_page_cost = 4, effective_cache_size = 2048     time approximately 4500ms
> > random_page_cost = 3, effective_cache_size = 2048     time approximately 1050ms
> > random_page_cost = 3, effective_cache_size = 4096     time approximately 1025ms
>
> The effective_cache_size still looks small. As a rule of thumb you
> might want effective_cache_size to be something like 1/2 or 2/3 of
> your total memory. I don't know how much you have, but
> effective_cache_size = 4096 is only 32M.
>
> shared_buffers and effective_cache_size are normally the two most
> important settings in my experience.

I have increased effective_cache_size to 16384 (128M). I have kept
random_page_cost at 3 for now. This appears to give me the performance
I need at this time.

In the future, we'll look at other methods of increasing the
performance.

Thank you all for your suggestions.

JohnM

-- 
John Mendenhall
john@surfutopia.net
surf utopia
internet services
Hi all,
Could you help me understand the planner's choice here?
Here is the query:
explain analyse SELECT IRNUM FROM IR
INNER JOIN IT ON IT.ITIRNUM = ANY ('{1000, 2000}') AND IT.ITYPNUM = 'M' AND IR.IRYPNUM = IT.ITYPNUM AND IR.IRNUM = IT.ITIRNUM
WHERE IRNUM = ANY ('{1000, 2000}') and IRYPNUM = 'M'
Here is the query plan:

 Hash Join  (cost=1142.47..5581.75 rows=87 width=4) (actual time=125.000..203.000 rows=2 loops=1)
   Hash Cond: ("outer".itirnum = "inner".irnum)
   ->  Seq Scan on it  (cost=0.00..3093.45 rows=31646 width=9) (actual time=0.000..78.000 rows=2 loops=1)
         Filter: ((itirnum = ANY ('{1000,2000}'::integer[])) AND ((itypnum)::text = 'M'::text))
   ->  Hash  (cost=1142.09..1142.09 rows=151 width=37) (actual time=125.000..125.000 rows=0 loops=1)
         ->  Index Scan using ir_pk on ir  (cost=0.00..1142.09 rows=151 width=37) (actual time=0.000..125.000 rows=2 loops=1)
               Index Cond: ((irypnum)::text = 'M'::text)
               Filter: (irnum = ANY ('{1000,2000}'::integer[]))
 Total runtime: 203.000 ms

I don't understand why the planner does a Seq Scan on table IT instead
of using the following index:

ALTER TABLE IT ADD CONSTRAINT IT_IR_FK foreign key (ITYPNUM,ITIRNUM) references IR (IRYPNUM, IRNUM) ON UPDATE CASCADE;

I tried some things but I'm not able to change this behavior. The IT
and IR tables may be quite huge (from 20k to 1600k rows), so I think
doing a seq scan is not a good idea... am I wrong? Does this query plan
look OK to you?
Thanks for your help.
/David
P.S.: I'm using PostgreSQL 8.0.3 on Windows, and I changed these settings in my postgresql.conf:
shared_buffers = 12000 # min 16, at least max_connections*2, 8KB each
work_mem = 15000 # min 64, size in KB
On Mon, 4 Jul 2005, David Gagnon wrote:

> Could you help me understand the planner's choice here?
>
> Here is the query:
> explain analyse SELECT IRNUM FROM IR
>  INNER JOIN IT ON IT.ITIRNUM = ANY ('{1000, 2000}') AND
>   IT.ITYPNUM = 'M' AND IR.IRYPNUM = IT.ITYPNUM AND IR.IRNUM = IT.ITIRNUM
>  WHERE IRNUM = ANY ('{1000, 2000}') and IRYPNUM = 'M'
>
> Here is the query plan:
>
>  Hash Join  (cost=1142.47..5581.75 rows=87 width=4) (actual time=125.000..203.000 rows=2 loops=1)
>    Hash Cond: ("outer".itirnum = "inner".irnum)
>    ->  Seq Scan on it  (cost=0.00..3093.45 rows=31646 width=9) (actual time=0.000..78.000 rows=2 loops=1)
>          Filter: ((itirnum = ANY ('{1000,2000}'::integer[])) AND ((itypnum)::text = 'M'::text))
>    ->  Hash  (cost=1142.09..1142.09 rows=151 width=37) (actual time=125.000..125.000 rows=0 loops=1)
>          ->  Index Scan using ir_pk on ir  (cost=0.00..1142.09 rows=151 width=37) (actual time=0.000..125.000 rows=2 loops=1)
>                Index Cond: ((irypnum)::text = 'M'::text)
>                Filter: (irnum = ANY ('{1000,2000}'::integer[]))
>  Total runtime: 203.000 ms
>
> I don't understand why the planner does a Seq Scan on table IT instead
> of using the following index:
>
> ALTER TABLE IT ADD CONSTRAINT IT_IR_FK foreign key (ITYPNUM,ITIRNUM)
> references IR (IRYPNUM, IRNUM) ON UPDATE CASCADE;

That doesn't create an index on IT. Primary keys (and unique
constraints) create indexes, but foreign keys do not. Did you also
create an index on those fields?

Also, it looks like it's way overestimating the number of rows that
condition would match. You might consider raising the statistics
targets on those columns and reanalyzing.
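If no such index exists, the usual fix is to create one on the
referencing columns, then re-analyze so the planner sees it. A minimal
sketch using the column names from the FK above (the index name is
illustrative):

    -- hypothetical name; columns taken from the FK definition
    CREATE INDEX it_itypnum_itirnum_idx ON IT (ITYPNUM, ITIRNUM);
    ANALYZE IT;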
Thanks... I missed that FKs don't create indexes, the way primary keys implicitly do...

I'm a bit surprised by that behavior though, since it means that if we delete a row from table A, all tables (B, C, D) with an FK pointing to table A must be scanned.

If there is no index on those tables, that means sequential scans on all of them. That can cause significant performance problems!

Is there a reason why an implicit index isn't created when an FK is declared? I looked in the documentation and haven't found a way to tell PostgreSQL to automatically create an index when creating an FK. Does that mean I need to manage it explicitly with CREATE INDEX statements? Is there another way?

Thanks for your help; that simple answer will solve a lot of the performance problems I have...
/David
> I'm a bit surprised by that behavior though, since it means that if
> we delete a row from table A, all tables (B, C, D) with an FK
> pointing to table A must be scanned.
> If there is no index on those tables, that means sequential scans on
> all of them. That can cause significant performance problems!

Correct.

> Is there a reason why an implicit index isn't created when an FK is
> declared?

Because it's not a requirement...

> I looked in the documentation and haven't found a way to tell
> PostgreSQL to automatically create an index when creating an FK.
> Does that mean I need to manage it explicitly with CREATE INDEX
> statements? Is there another way?

No other way - you need to explicitly create them. It's not that hard
either to write a query to search the system catalogs for unindexed
FKs.

Chris
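As a rough illustration of such a catalog query (a sketch, not from the
thread; it only checks whether the FK's first column is the leading
column of some index on the referencing table):

    -- list foreign keys whose first referencing column is not the
    -- leading column of any index on the referencing table;
    -- conkey is a 1-based array, indkey is a 0-based int2vector
    SELECT c.conrelid::regclass AS referencing_table,
           c.conname            AS fk_name
    FROM pg_constraint c
    WHERE c.contype = 'f'
      AND NOT EXISTS (
            SELECT 1
            FROM pg_index i
            WHERE i.indrelid = c.conrelid
              AND i.indkey[0] = c.conkey[1]
          );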
On Mon, Jul 04, 2005 at 20:29:50 -0400,
  David Gagnon <dgagnon@siunik.com> wrote:
> Thanks... I missed that FKs don't create indexes, the way primary
> keys implicitly do...
>
> I'm a bit surprised by that behavior though, since it means that if
> we delete a row from table A, all tables (B, C, D) with an FK
> pointing to table A must be scanned.

But in some applications you don't ever do that, so you don't save
anything by having the index for deletes, yet you have to pay the cost
of updating it whenever you modify the referencing table. If you think
an index will help in your case, just create one.
On Mon, 4 Jul 2005, David Gagnon wrote:

> Thanks... I missed that FKs don't create indexes, the way primary
> keys implicitly do...
>
> I'm a bit surprised by that behavior though, since it means that if
> we delete a row from table A, all tables (B, C, D) with an FK
> pointing to table A must be scanned.
> If there is no index on those tables, that means sequential scans on
> all of them. That can cause significant performance problems!
>
> Is there a reason why an implicit index isn't created when an FK is
> declared? I looked in the documentation and haven't found a way to

The reason is that it's not always useful to have an index for that
purpose. You could either have low selectivity (in which case the
index wouldn't be used) or low/batched changes to the referenced table
(in which case the cost of maintaining the index may be greater than
the value of having it) or other such cases. For primary key and
unique constraints, we currently have no choice but to make an index,
because that's how the constraint is currently implemented.

> tell PostgreSQL to automatically create an index when creating an
> FK. Does that mean I need to manage it explicitly with CREATE INDEX
> statements?

Yeah.

> Is there another way?

Not that I can think of without changing the source.
David Gagnon <dgagnon@siunik.com> writes:
> explain analyse SELECT IRNUM FROM IR
>  INNER JOIN IT ON IT.ITIRNUM = ANY ('{1000, 2000}') AND
>   IT.ITYPNUM = 'M' AND IR.IRYPNUM = IT.ITYPNUM AND IR.IRNUM = IT.ITIRNUM
>  WHERE IRNUM = ANY ('{1000, 2000}') and IRYPNUM = 'M'

Those =ANY constructs are not currently optimizable at all. You might
get better results with "IT.ITIRNUM IN (1000, 2000)" etc.

			regards, tom lane
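Applied to the query above, that rewrite would look something like
this (a sketch; literal values stand in for the array):

    explain analyse SELECT IRNUM FROM IR
     INNER JOIN IT ON IT.ITIRNUM IN (1000, 2000) AND IT.ITYPNUM = 'M'
       AND IR.IRYPNUM = IT.ITYPNUM AND IR.IRNUM = IT.ITIRNUM
     WHERE IRNUM IN (1000, 2000) and IRYPNUM = 'M'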
Tom Lane wrote:

> David Gagnon <dgagnon@siunik.com> writes:
>
>> explain analyse SELECT IRNUM FROM IR
>>  INNER JOIN IT ON IT.ITIRNUM = ANY ('{1000, 2000}') AND
>>   IT.ITYPNUM = 'M' AND IR.IRYPNUM = IT.ITYPNUM AND IR.IRNUM = IT.ITIRNUM
>>  WHERE IRNUM = ANY ('{1000, 2000}') and IRYPNUM = 'M'
>
> Those =ANY constructs are not currently optimizable at all. You might
> get better results with "IT.ITIRNUM IN (1000, 2000)" etc.
>
> 	regards, tom lane

I already tried that construct. But the statement comes from a stored
procedure, where {1000, 2000} is an array variable (requestIds). I
tried IT.ITIRNUM IN (requestIds) and several other variants without
success. Is there a way to make it work?

Here is the statement from the stored procedure. Remember, requestIds
is an array of int:

FOR inventoryTransaction IN
  SELECT DISTINCT IRNUM, IRAENUM, IRSTATUT, IRSENS, IRSOURCE,
                  IRDATE, IRQTE
  FROM IR
  WHERE IRNUM = ANY (requestIds) and IRYPNUM = companyId
LOOP

Thanks for your help!

/David
* David Gagnon <dgagnon@siunik.com> wrote:

> FOR inventoryTransaction IN
>   SELECT DISTINCT IRNUM, IRAENUM, IRSTATUT, IRSENS, IRSOURCE,
>                   IRDATE, IRQTE
>   FROM IR
>   WHERE IRNUM = ANY (requestIds) and IRYPNUM = companyId
> LOOP

hmm. You probably could create the query dynamically and then execute
it.

BTW: why isn't IN usable with arrays?

cu
-- 
---------------------------------------------------------------------
 Enrico Weigelt    ==   metux IT service
  phone:     +49 36207 519931         www:       http://www.metux.de/
  fax:       +49 36207 519932         email:     contact@metux.de
---------------------------------------------------------------------
 Realtime Forex/Stock Exchange trading powered by postgresSQL :))
                                            http://www.fxignal.net/
---------------------------------------------------------------------
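A sketch of that dynamic approach in PL/pgSQL, assuming requestIds is
an int[] and companyId is a text-like value (untested against the
thread's schema):

    -- Build the IN list from the array, then EXECUTE the statement so
    -- the planner sees literal values instead of an =ANY over a variable.
    FOR inventoryTransaction IN EXECUTE
        'SELECT DISTINCT IRNUM, IRAENUM, IRSTATUT, IRSENS, IRSOURCE, '
        || 'IRDATE, IRQTE FROM IR '
        || 'WHERE IRNUM IN (' || array_to_string(requestIds, ',') || ') '
        || 'AND IRYPNUM = ' || quote_literal(companyId)
    LOOP
        -- process inventoryTransaction here, as in the original loop body
    END LOOP;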