Fw: Infinite CPU loop due to field ::type casting, Take II :-) - Mailing list pgsql-performance
From | Steven Rosenstein |
---|---|
Subject | Fw: Infinite CPU loop due to field ::type casting, Take II :-) |
Date | |
Msg-id | OF31A95FE9.5950C366-ON85256F17.0070339C-85256F17.0070E90F@us.ibm.com Whole thread Raw |
Responses |
Re: Fw: Infinite CPU loop due to field ::type casting, Take II :-)
|
List | pgsql-performance |
I just realized in my haste to send this email out I provided the wrong table in my example. Below is the same email, but with vsa.dtbl_logged_event_20040922 substituted for vsa.tbl_sad_event. Sorry for the inconvenience. --- Steve Gentlefolk, I'm not sure if this is the proper forum for this question, and it might have been answered in a previous thread, but I'm new to PostgreSQL and the research I did in the archives did not turn up anything addressing this issue. Please direct me to the proper forum is this is not the correct venue. Environment: Red Hat Enterprise Linux 3 Workstation, PostgreSQL V7.3.6 (stock with the RHEL distribution) The two tables I used in the example are tbl_device and dtbl_logged_event_20040922: vsa=# \d vsa.tbl_device; Table "vsa.tbl_device" Column | Type | Modifiers ----------------+--------------------------+--------------------------------------------------------- id | integer | not null default nextval('vsa.tbl_device_id_seq'::text) name | character varying(100) | not null account_id | bigint | not null vss_site_id | bigint | not null org_site_id | bigint | not null default 0 device_type_id | integer | not null default 1 os_family_id | integer | not null default 0 status_id | integer | not null default 0 timezone | character varying(80) | clientkey | character varying(2048) | not null record_created | timestamp with time zone | default now() Indexes: pk_tbl_device primary key btree (id), idx_d_uniq_name_site_account_key unique btree (name, vss_site_id, account_id, clientkey), tbl_device_clientkey_key unique btree (clientkey), idx_d_account_id btree (account_id), idx_d_account_site_name btree (account_id, vss_site_id, name), idx_d_device_type_id btree (device_type_id), idx_d_name btree (name), idx_d_org_site_id btree (org_site_id), idx_d_os_family_id btree (os_family_id), idx_d_status_id btree (status_id), idx_d_vss_site_id btree (vss_site_id) Foreign Key constraints: fk_d_va FOREIGN KEY (account_id) REFERENCES vsa.tbl_vsa_account(id) ON UPDATE NO ACTION ON DELETE NO ACTION, fk_d_vs FOREIGN KEY (vss_site_id) REFERENCES vsa.tbl_vss_site(id) ON UPDATE NO ACTION ON DELETE NO ACTION, fk_d_dof FOREIGN KEY (os_family_id) REFERENCES vsa.enum_device_os_family(id) ON UPDATE NO ACTION ON DELETE NO ACTION, fk_d_dsc FOREIGN KEY (status_id) REFERENCES vsa.enum_device_status_code(id) ON UPDATE NO ACTION ON DELETE NO ACTION, fk_d_dt FOREIGN KEY (device_type_id) REFERENCES vsa.enum_device_type(id) ON UPDATE NO ACTION ON DELETE NO ACTION Triggers: trg_clean_device_name vsa=# \d vsa.dtbl_logged_event_20040922 Table "vsa.dtbl_logged_event_20040922" Column | Type | Modifiers -----------------+-----------------------------+------------------------------------------------------------------------- id | integer | not null default nextval('vsa.dtbl_logged_event_20040922_id_seq'::text) device_id | bigint | not null report_datetime | timestamp without time zone | time_logged | timestamp without time zone | log_type | integer | not null type | character varying(50) | severity | character varying(30) | source_tag | character varying(30) | remote_host | character varying(100) | user_name | character varying(50) | event_code | character varying(10) | description | text | record_created | timestamp with time zone | default now() event_code_new | character varying(30) | remote_user | character varying(50) | Indexes: pk_dtbl_logged_event_20040922 primary key btree (id), idx_le_device_id_20040922 btree (device_id), idx_le_id_source_event_20040922 btree (device_id, source_tag, event_code), idx_le_id_src_20040922 btree (device_id, source_tag), idx_le_id_type_severity_evtcode_20040922 btree (device_id, log_type, severity, event_code), idx_le_log_type_20040922 btree (log_type), idx_le_source_tag_20040922 btree (source_tag), idx_le_time_logged_20040922 btree (time_logged), idx_le_time_type_20040922 btree (time_logged, log_type) Foreign Key constraints: fk_le_lelt_20040922 FOREIGN KEY (log_type) REFERENCES vsa.enum_le_log_type(id) ON UPDATE NO ACTION ON DELETE NO ACTION, fk_le_d_20040922 FOREIGN KEY (device_id) REFERENCES vsa.tbl_device(id) ON UPDATE NO ACTION ON DELETE CASCADE Here is my original query, and the query plan generated by the planner: vsa=# explain SELECT dev.name, dev.vss_site_id, tbl.log_type, tbl.severity, tbl.count FROM vsa.tbl_device AS dev LEFT OUTER JOIN (SELECT stbl.device_id, stbl.log_type, stbl.severity, count(*) FROM vsa.dtbl_logged_event_20040922 AS stbl WHERE stbl.log_type IN (2, 3, 4, 5) GROUP BY stbl.device_id, stbl.log_type, stbl.severity) AS tbl ON (dev.id=tbl.device_id) ORDER BY dev.name; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Sort (cost=40893.18..40960.93 rows=27100 width=79) Sort Key: dev.name -> Merge Join (cost=38417.13..38897.77 rows=27100 width=79) Merge Cond: ("outer".id = "inner".device_id) -> Sort (cost=869.52..872.70 rows=1275 width=26) Sort Key: dev.id -> Seq Scan on tbl_device dev (cost=0.00..803.75 rows=1275 width=26) -> Sort (cost=37547.62..37615.37 rows=27100 width=26) Sort Key: tbl.device_id -> Subquery Scan tbl (cost=0.00..35552.21 rows=27100 width=26) -> Aggregate (cost=0.00..35552.21 rows=27100 width=26) -> Group (cost=0.00..34874.70 rows=271005 width=26) -> Index Scan using idx_le_id_type_severity_evtcode_20040922 on dtbl_logged_event_20040922stbl (cost=0.00..32842.16 rows=271005 width=26) Filter: ((log_type = 2) OR (log_type = 3) OR (log_type = 4) OR (log_type = 5)) (14 rows) Time: 1.43 ms Late in the development I realized that we had created an inconsistency in our design by having vsa.tbl_device.id defined as "int", and vsa.dtbl_logged_event_20040922.device_id defined as "bigint". These two fields are used in the ON clause (ON (dev.id=tbl.device_id)), and my understanding is that they should be of the same type cast. Trying to remedy this situation, I explicitly tried casting vsa.dtbl_logged_event_20040922.device_id as "int" (::int): vsa=# explain SELECT dev.name, dev.vss_site_id, tbl.log_type, tbl.severity, tbl.count FROM vsa.tbl_device AS dev LEFT OUTER JOIN (SELECT stbl.device_id, stbl.log_type, stbl.severity, count(*) FROM vsa.dtbl_logged_event_20040922 AS stbl WHERE stbl.log_type IN (2, 3, 4, 5) GROUP BY stbl.device_id, stbl.log_type, stbl.severity) AS tbl ON (dev.id=tbl.device_id::int) ORDER BY dev.name; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.00..45848850.65 rows=27100 width=79) Join Filter: ("outer".id = ("inner".device_id)::integer) -> Index Scan using idx_d_name on tbl_device dev (cost=0.00..1490.19 rows=1275 width=26) -> Subquery Scan tbl (cost=0.00..35552.21 rows=27100 width=26) -> Aggregate (cost=0.00..35552.21 rows=27100 width=26) -> Group (cost=0.00..34874.70 rows=271005 width=26) -> Index Scan using idx_le_id_type_severity_evtcode_20040922 on dtbl_logged_event_20040922 stbl (cost=0.00..32842.16 rows=271005 width=26) Filter: ((log_type = 2) OR (log_type = 3) OR (log_type = 4) OR (log_type = 5)) (8 rows) Time: 1.62 ms Notice that the query plan changes completely when I cast device_id as int. What is worse (and why I'm writing) is that when I run the second query, it goes into an infinite CPU loop. The original query completed in under 4 seconds. I've left the second query running for 30 minutes or more, and TOP show 100% CPU utilization and 0% disk I/O (0% iowait). We are starting to see this phenomenon in other queries which do *not* have any explicit type casting, but in which something like "cast(vsa.dtbl_logged_event_20040922.time_logged AS date)" is used in a WHERE clause. It's becoming a show-stopper until we understand what is happening. Any information or suggestions about this problem or making the query more efficient will be greatly appreciated. Thanks! --- Steve
pgsql-performance by date: