Re: Comparitive UPDATE speed - Mailing list pgsql-performance
From | Josh Berkus |
---|---|
Subject | Re: Comparitive UPDATE speed |
Date | |
Msg-id | web-1771475@davinci.ethosmedia.com Whole thread Raw |
In response to | Comparitive UPDATE speed (Josh Berkus <josh@agliodbs.com>) |
Responses |
Re: Comparitive UPDATE speed
|
List | pgsql-performance |
Folks, Sorry for the double-quoting here. I sent this to just Ron by accident. My original question is double-quoted, Ron is quoted, and my responses are below. Thanks! > > Ok, I'm still confused. > > > > I'm updating a (not not indexed) field in a 117,000 row table based > on > > information in another 117,000 row table. The update is an > integer, and the > > linking fields are indexed. Yet the two queries are flattening my > > > dual-processor, RAID5 database server for up to 11 minutes ... > using 230mb > > ram the entire time. I simply can't believe that these two > queries are that > > difficult. > > So there's no index on elbs_matter_links.case_id? From your original > > post, I thought that there *is* an index on that field. I'm now dropping it before the update. Unfortunately, dropping the index made no appreciable gain in performance. > > I've increased the memory available to the update to 256mb, and > tried forcing > > an index scan ... to no avail. Ideas, please? > > > > The queries: > > > > UPDATE elbs_matter_links SET case_id = case_clients.case_id > > FROM case_clients > > WHERE elbs_matter_links.mmatter = case_clients.matter_no; > > What happens if you run the query: > SELECT eml.case_id, cc.case_id, eml.mmatter, cc.matter_no > FROM elbs_matter_links eml, > case_clients cc > WHERE eml.mmatter = cc.matter_no; > > That, for all intents and purposes, is your UPDATE statement, just > without doing the UPDATE. How fast does it run? Slowly. It takes about 60 seconds to return data. This may be the problem. Thoughts? Here's EXPLAIN output: Hash Join (cost=3076.10..91842.88 rows=108648 width=40) -> Seq Scan on elbs_matter_links eml (cost=0.00..85641.87 rows=117787 width=20) -> Hash (cost=2804.48..2804.48 rows=108648 width=20) -> Seq Scan on case_clients cc (cost=0.00..2804.48 rows=108648 width=20) According to the parser, using the indexes would be worse: Merge Join (cost=0.00..520624.38 rows=108648 width=40) -> Index Scan using idx_eml_mmatter on elbs_matter_links eml (cost=0.00..451735.00 rows=117787 width=20) -> Index Scan using idx_caseclients_matter on case_clients cc (cost=0.00..66965.20 rows=108648 width=20) Though in practice, a forced index scan returns rows in about 60 seconds, same as the SeqScan version. All of this seems very costly for a query that, while it does return a lot of rows, is essentially a very simple query. More importantly, on the hardware I'm using, I would expect better performance that I get on my laptop ... and I'm not seeing it. I just can't believe that the simple query above could soak 200mb of RAM for a full 60 seconds to return a result. It's like queries over a certain result size on the system choke postgres. My reference data below: ============================================== > > > UPDATE elbs_matter_links SET case_id = cases.case_id > > FROM cases > > WHERE elbs_matter_links.docket = cases.docket > > AND elbs_matter_links.case_id IS NULL; > > > > > > EXPLAIN output: > > > > Hash Join (cost=4204.83..39106.77 rows=8473 width=299) > > -> Index Scan using idx_eml_mmatter on elbs_matter_links > > (cost=0.00..34668.94 rows=8473 width=279) > > -> Hash (cost=2808.38..2808.38 rows=109038 width=20) > > -> Seq Scan on case_clients (cost=0.00..2808.38 > rows=109038 > > width=20) > > > > Nested Loop (cost=0.00..32338.47 rows=99 width=300) > > -> Seq Scan on cases (cost=0.00..9461.97 rows=4297 width=21) > > -> Index Scan using idx_eml_docket on elbs_matter_links > (cost=0.00..5.31 > > rows=1 width=279) > > > > Table defintions: > > > > Table "elbs_matter_links" > > Column | Type | Modifiers > > ------------------+-----------------------+----------------------- > > mmatter | character varying(15) | not null > > case_id | integer | > > matter_check | character varying(20) | not null default 'OK' > > docket | character varying(50) | > > case_name | character varying(50) | > > practice | character varying(50) | > > opp_counsel_name | character varying(50) | > > opp_counsel_id | integer | > > act_type | character varying(10) | > > lead_case_id | integer | > > lead_case_docket | character varying(50) | > > disease | character varying(50) | > > docket_no | character varying(25) | > > juris_state | character varying(6) | > > juris_local | character varying(20) | > > status | smallint | not null default 1 > > client_id | integer | > > office_loc | character varying(5) | > > date_filed | date | > > date_served | date | > > date_resolved | date | > > case_status | character varying(5) | > > settle_amount | numeric(12,2) | default 0 > > narrative | text | > > comment | character varying(50) | > > client_no | character varying(10) | > > juris_id | integer | > > Indexes: idx_eml_check, > > idx_eml_docket, > > idx_eml_mmatter > > Primary key: elbs_matter_links_pkey > > > > Table "case_clients" > > Column | Type | > Modifiers > > > ------------------+-----------------------+---------------------------------------------------- > > case_client_id | integer | not null default > > nextval('case_clients_seq'::text) > > case_id | integer | not null > > client_id | integer | not null > > office_loc | character varying(5) | > > date_filed | date | > > date_served | date | > > date_resolved | date | > > matter_no | character varying(15) | not null > > case_status | character varying(5) | not null > > settle_amount | numeric(14,2) | not null default 0 > > matter_narrative | text | > > comment | character varying(50) | > > Indexes: idx_case_clients_client, > > idx_caseclients_case, > > idx_caseclients_matter, > > idx_caseclients_resolved, > > idx_caseclients_served, > > idx_caseclients_status > > Primary key: case_clients_pkey > > > > > > Table "cases" > > Column | Type | > Modifiers > > > ------------------+-----------------------+--------------------------------------------- > > case_id | integer | not null default > > nextval('cases_seq'::text) > > docket | character varying(50) | not null > > case_name | character varying(50) | not null > > practice | character varying(50) | not null > > opp_counsel_name | character varying(50) | > > opp_counsel_id | integer | > > act_type | character varying(10) | > > lead_case_id | integer | > > lead_case_docket | character varying(50) | > > disease | character varying(50) | > > docket_no | character varying(25) | not null > > juris_state | character varying(6) | not null > > juris_local | character varying(20) | > > tgroup_id | integer | > > status | smallint | not null default 1 > > juris_id | integer | > > Indexes: idx_case_cases_juris, > > idx_cases_docket, > > idx_cases_lead, > > idx_cases_name, > > idx_cases_status, > > idx_cases_tgroup, > > idx_lower_case_name > > > > > > > > -- > > Josh Berkus > > josh@agliodbs.com > > Aglio Database Solutions > > San Francisco > -- > +------------------------------------------------------------+ > | Ron Johnson, Jr. mailto:ron.l.johnson@cox.net | > | Jefferson, LA USA http://members.cox.net/ron.l.johnson | > | | > | "What other evidence do you have that they are terrorists, | > | other than that they trained in these camps?" | > | 17-Sep-2002 Katie Couric to an FBI agent regarding the 5 | > | men arrested near Buffalo NY | > +------------------------------------------------------------+ > ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete information technology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
pgsql-performance by date: