Thread: Comparative UPDATE speed
Relative performance question:

I have 2 UPDATE queries in a function.

table_a: 117,000 records
table_b: 117,000 records
table_c: 1.5 million records

#1 updates table_a, field_2 from table_b, field_1 based on a joining field_3. Around 110,000 updates.
#2 updates table_a, field_5 from table_c, field_2 joining on field_3. Around 110,000 updates.

#1 takes 5-7 minutes; #2 takes about 15 seconds. The only difference I can discern is that table_a, field_2 is indexed and table_a, field_5 is not.

Is it reasonable that updating the index would actually make the query take 20x longer? If not, I'll post actual table defs and query statements.

--
Josh Berkus
josh@agliodbs.com
Aglio Database Solutions
San Francisco
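With the generic names above, the two statements would look roughly like this (a sketch only; the actual tables and queries are posted later in the thread):

    UPDATE table_a
    SET    field_2 = table_b.field_1
    FROM   table_b
    WHERE  table_a.field_3 = table_b.field_3;

    UPDATE table_a
    SET    field_5 = table_c.field_2
    FROM   table_c
    WHERE  table_a.field_3 = table_c.field_3;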
Randy,

> I'm not sure about 20 times longer but you would have index records
> that would need to be changed. Is field_3 indexed in all 3 tables?
> If table_b does not have an index on field_3 and the other tables do,
> I'd guess that would make this take longer too.

Yeah, they're indexed. I'm going to try the updates without the index on field_2 tonight.

-Josh Berkus
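A quick way to double-check which columns are indexed is the pg_indexes system view (or \d tablename in psql); with the generic names used so far, something along these lines:

    SELECT tablename, indexname, indexdef
    FROM   pg_indexes
    WHERE  tablename IN ('table_a', 'table_b', 'table_c');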
On Tue, 2002-10-01 at 16:51, Josh Berkus wrote:
> Relative performance question:
>
> I have 2 UPDATE queries in a function.
>
> table_a: 117,000 records
> table_b: 117,000 records
> table_c: 1.5 million records
>
> #1 updates table_a, field_2 from table_b, field_1 based on a joining field_3.
> Around 110,000 updates.
> #2 updates table_a, field_5 from table_c, field_2 joining on field_3.
> Around 110,000 updates.
>
> #1 takes 5-7 minutes; #2 takes about 15 seconds. The only difference I can
> discern is that table_a, field_2 is indexed and table_a, field_5 is not.
>
> Is it reasonable that updating the index would actually make the query take
> 20x longer? If not, I'll post actual table defs and query statements.

Absolutely. You are doing lots of extra work. For each of the 110,000 updates, you are deleting a leaf node from one part of the index tree and then inserting it into another part of the tree. It will get even worse as you add more rows to table_a, since the index tree will get deeper, and more work must be done during each insert and delete.

--
+------------------------------------------------------------+
| Ron Johnson, Jr.        mailto:ron.l.johnson@cox.net        |
| Jefferson, LA  USA      http://members.cox.net/ron.l.johnson|
|                                                             |
| "What other evidence do you have that they are terrorists,  |
|  other than that they trained in these camps?"              |
|  17-Sep-2002 Katie Couric to an FBI agent regarding the 5   |
|  men arrested near Buffalo NY                               |
+------------------------------------------------------------+
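If the index on the updated column is not needed while the batch runs, the usual workaround is to drop it for the duration of the UPDATE and rebuild it afterwards, roughly as follows (idx_table_a_field_2 is only a placeholder name for whatever the index on the updated column is actually called):

    DROP INDEX idx_table_a_field_2;

    UPDATE table_a
    SET    field_2 = table_b.field_1
    FROM   table_b
    WHERE  table_a.field_3 = table_b.field_3;

    CREATE INDEX idx_table_a_field_2 ON table_a (field_2);

Rebuilding the index once at the end is a single bulk operation, instead of 110,000 individual delete/insert pairs in the index tree.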
Folks,

Sorry for the double-quoting here. I sent this to just Ron by accident. My original question is double-quoted, Ron is quoted, and my responses are below. Thanks!

> > Ok, I'm still confused.
> >
> > I'm updating a (now not indexed) field in a 117,000 row table based on
> > information in another 117,000 row table. The update is an integer, and
> > the linking fields are indexed. Yet the two queries are flattening my
> > dual-processor, RAID5 database server for up to 11 minutes ... using
> > 230mb ram the entire time. I simply can't believe that these two
> > queries are that difficult.
>
> So there's no index on elbs_matter_links.case_id? From your original
> post, I thought that there *is* an index on that field.

I'm now dropping it before the update. Unfortunately, dropping the index made no appreciable difference in performance.

> > I've increased the memory available to the update to 256mb, and tried
> > forcing an index scan ... to no avail. Ideas, please?
> >
> > The queries:
> >
> > UPDATE elbs_matter_links SET case_id = case_clients.case_id
> > FROM case_clients
> > WHERE elbs_matter_links.mmatter = case_clients.matter_no;
>
> What happens if you run the query:
>
> SELECT eml.case_id, cc.case_id, eml.mmatter, cc.matter_no
> FROM elbs_matter_links eml,
>      case_clients cc
> WHERE eml.mmatter = cc.matter_no;
>
> That, for all intents and purposes, is your UPDATE statement, just
> without doing the UPDATE. How fast does it run?

Slowly. It takes about 60 seconds to return data. This may be the problem. Thoughts? Here's EXPLAIN output:

Hash Join  (cost=3076.10..91842.88 rows=108648 width=40)
  ->  Seq Scan on elbs_matter_links eml  (cost=0.00..85641.87 rows=117787 width=20)
  ->  Hash  (cost=2804.48..2804.48 rows=108648 width=20)
        ->  Seq Scan on case_clients cc  (cost=0.00..2804.48 rows=108648 width=20)

According to the planner, using the indexes would be worse:

Merge Join  (cost=0.00..520624.38 rows=108648 width=40)
  ->  Index Scan using idx_eml_mmatter on elbs_matter_links eml  (cost=0.00..451735.00 rows=117787 width=20)
  ->  Index Scan using idx_caseclients_matter on case_clients cc  (cost=0.00..66965.20 rows=108648 width=20)

Though in practice, a forced index scan returns rows in about 60 seconds, same as the SeqScan version.

All of this seems very costly for a query that, while it does return a lot of rows, is essentially a very simple query. More importantly, on the hardware I'm using, I would expect better performance than I get on my laptop ... and I'm not seeing it. I just can't believe that the simple query above could soak 200mb of RAM for a full 60 seconds to return a result. It's like queries over a certain result size on this system choke postgres.
My reference data below:
==============================================

> > UPDATE elbs_matter_links SET case_id = cases.case_id
> > FROM cases
> > WHERE elbs_matter_links.docket = cases.docket
> >   AND elbs_matter_links.case_id IS NULL;
> >
> > EXPLAIN output:
> >
> > Hash Join  (cost=4204.83..39106.77 rows=8473 width=299)
> >   ->  Index Scan using idx_eml_mmatter on elbs_matter_links
> >         (cost=0.00..34668.94 rows=8473 width=279)
> >   ->  Hash  (cost=2808.38..2808.38 rows=109038 width=20)
> >         ->  Seq Scan on case_clients  (cost=0.00..2808.38 rows=109038 width=20)
> >
> > Nested Loop  (cost=0.00..32338.47 rows=99 width=300)
> >   ->  Seq Scan on cases  (cost=0.00..9461.97 rows=4297 width=21)
> >   ->  Index Scan using idx_eml_docket on elbs_matter_links  (cost=0.00..5.31 rows=1 width=279)
> >
> > Table definitions:
> >
> >              Table "elbs_matter_links"
> >       Column      |         Type          |       Modifiers
> > ------------------+-----------------------+-----------------------
> >  mmatter          | character varying(15) | not null
> >  case_id          | integer               |
> >  matter_check     | character varying(20) | not null default 'OK'
> >  docket           | character varying(50) |
> >  case_name        | character varying(50) |
> >  practice         | character varying(50) |
> >  opp_counsel_name | character varying(50) |
> >  opp_counsel_id   | integer               |
> >  act_type         | character varying(10) |
> >  lead_case_id     | integer               |
> >  lead_case_docket | character varying(50) |
> >  disease          | character varying(50) |
> >  docket_no        | character varying(25) |
> >  juris_state      | character varying(6)  |
> >  juris_local      | character varying(20) |
> >  status           | smallint              | not null default 1
> >  client_id        | integer               |
> >  office_loc       | character varying(5)  |
> >  date_filed       | date                  |
> >  date_served      | date                  |
> >  date_resolved    | date                  |
> >  case_status      | character varying(5)  |
> >  settle_amount    | numeric(12,2)         | default 0
> >  narrative        | text                  |
> >  comment          | character varying(50) |
> >  client_no        | character varying(10) |
> >  juris_id         | integer               |
> > Indexes: idx_eml_check,
> >          idx_eml_docket,
> >          idx_eml_mmatter
> > Primary key: elbs_matter_links_pkey
> >
> >                Table "case_clients"
> >       Column      |         Type          |                     Modifiers
> > ------------------+-----------------------+----------------------------------------------------
> >  case_client_id   | integer               | not null default nextval('case_clients_seq'::text)
> >  case_id          | integer               | not null
> >  client_id        | integer               | not null
> >  office_loc       | character varying(5)  |
> >  date_filed       | date                  |
> >  date_served      | date                  |
> >  date_resolved    | date                  |
> >  matter_no        | character varying(15) | not null
> >  case_status      | character varying(5)  | not null
> >  settle_amount    | numeric(14,2)         | not null default 0
> >  matter_narrative | text                  |
> >  comment          | character varying(50) |
> > Indexes: idx_case_clients_client,
> >          idx_caseclients_case,
> >          idx_caseclients_matter,
> >          idx_caseclients_resolved,
> >          idx_caseclients_served,
> >          idx_caseclients_status
> > Primary key: case_clients_pkey
> >
> >                    Table "cases"
> >       Column      |         Type          |                  Modifiers
> > ------------------+-----------------------+---------------------------------------------
> >  case_id          | integer               | not null default nextval('cases_seq'::text)
> >  docket           | character varying(50) | not null
> >  case_name        | character varying(50) | not null
> >  practice         | character varying(50) | not null
> >  opp_counsel_name | character varying(50) |
> >  opp_counsel_id   | integer               |
> >  act_type         | character varying(10) |
> >  lead_case_id     | integer               |
> >  lead_case_docket | character varying(50) |
> >  disease          | character varying(50) |
> >  docket_no        | character varying(25) | not null
> >  juris_state      | character varying(6)  | not null
> >  juris_local      | character varying(20) |
> >  tgroup_id        | integer               |
> >  status           | smallint              | not null default 1
> >  juris_id         | integer               |
> > Indexes: idx_case_cases_juris,
> >          idx_cases_docket,
> >          idx_cases_lead,
> >          idx_cases_name,
> >          idx_cases_status,
> >          idx_cases_tgroup,
> >          idx_lower_case_name
> >
> > --
> > Josh Berkus
> > josh@agliodbs.com
> > Aglio Database Solutions
> > San Francisco
>
> --
> +------------------------------------------------------------+
> | Ron Johnson, Jr.        mailto:ron.l.johnson@cox.net        |
> | Jefferson, LA  USA      http://members.cox.net/ron.l.johnson|
> |                                                             |
> | "What other evidence do you have that they are terrorists,  |
> |  other than that they trained in these camps?"              |
> |  17-Sep-2002 Katie Couric to an FBI agent regarding the 5   |
> |  men arrested near Buffalo NY                               |
> +------------------------------------------------------------+

______AGLIO DATABASE SOLUTIONS___________________________
                                    Josh Berkus
  Complete information technology   josh@agliodbs.com
   and data management solutions    (415) 565-7293
  for law firms, small businesses   fax 621-2533
   and non-profit organizations.    San Francisco
On Fri, Oct 04, 2002 at 08:54:56AM -0700, Josh Berkus wrote:

> Slowly. It takes about 60 seconds to return data. This may be the
> problem. Thoughts? Here's EXPLAIN output:

[. . .]

> According to the planner, using the indexes would be worse:

Have you run this with EXPLAIN ANALYSE? It will actually perform the necessary steps, so it will reveal if the planner is getting something wrong.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
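For reference, EXPLAIN ANALYZE (PostgreSQL accepts both the ANALYZE and ANALYSE spellings) is simply prepended to the query in question; it executes the statement for real, so actual row counts and timings appear alongside the planner's estimates:

    EXPLAIN ANALYZE
    SELECT eml.case_id, cc.case_id, eml.mmatter, cc.matter_no
    FROM   elbs_matter_links eml, case_clients cc
    WHERE  eml.mmatter = cc.matter_no;

Note that EXPLAIN ANALYZE of the UPDATE itself would actually perform the update, so it is safest to test with the equivalent SELECT, as above.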
Andrew,

> Have you run this with EXPLAIN ANALYSE? It will actually perform the
> necessary steps, so it will reveal if the planner is getting
> something wrong.

Here it is:

Hash Join  (cost=3076.10..91842.88 rows=108648 width=40) (actual time=18625.19..22823.39 rows=108546 loops=1)
  ->  Seq Scan on elbs_matter_links eml  (cost=0.00..85641.87 rows=117787 width=20) (actual time=18007.69..19515.63 rows=117787 loops=1)
  ->  Hash  (cost=2804.48..2804.48 rows=108648 width=20) (actual time=602.12..602.12 rows=0 loops=1)
        ->  Seq Scan on case_clients cc  (cost=0.00..2804.48 rows=108648 width=20) (actual time=5.18..370.68 rows=108648 loops=1)
Total runtime: 22879.26 msec

The above doesn't seem bad, except that there is some serious hardware in this system and 23 seconds right after a VACUUM ANALYZE is too long. I have a feeling that I botched one of my postgresql.conf parameters or something. I'll do an EXPLAIN for the UPDATE query later, when the users are off the system.

-Josh Berkus

--
Josh Berkus
josh@agliodbs.com
Aglio Database Solutions
San Francisco
On Fri, Oct 04, 2002 at 11:13:09AM -0700, Josh Berkus wrote:
>
> Andrew,
>
> > Have you run this with EXPLAIN ANALYSE? It will actually perform the
> > necessary steps, so it will reveal if the planner is getting
> > something wrong.
>
> Here it is:

Oops, sorry. What if you force the index use here? Just because the planner thinks that's more expensive doesn't mean that it is.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
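"Forcing the index use" here would typically mean disabling the planner's sequential-scan choice for the session and re-running the query, along these lines:

    SET enable_seqscan TO off;

    EXPLAIN ANALYZE
    SELECT eml.case_id, cc.case_id, eml.mmatter, cc.matter_no
    FROM   elbs_matter_links eml, case_clients cc
    WHERE  eml.mmatter = cc.matter_no;

    SET enable_seqscan TO on;

This does not guarantee an index plan, but it makes sequential scans look prohibitively expensive to the planner, so the merge-join/index-scan plan quoted earlier should be chosen instead.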
Andrew,

> Oops, sorry. What if you force the index use here? Just because the
> planner thinks that's more expensive doesn't mean that it is.

Yeah, I tried it ... no faster, no slower, really.

BTW, in case you missed it, the real concern is that an UPDATE query similar to the SELECT query we are discussing takes over 10 minutes, which on this hardware is ridiculous. Robert suggested that we test the SELECT query to see if there were general performance problems; apparently, there are.

The hardware I'm using is:
  dual-processor Athlon 1400 MHz motherboard
  RAID 5 UW SCSI drive array with 3 drives
  512 MB DDR RAM
  SuSE Linux 7.3 (kernel 2.4.10)
  Postgres is on its own LVM partition
  PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.95.3
  (will upgrade to 7.2.3 very soon)

postgresql.conf has: fdatasync, various shared memory settings tuned to allocate 256 MB to Postgres (which seems to be working correctly). Debug level 2.

When the UPDATE query takes a long time, I generally can watch the log hover in the land of "Reaping dead child processes" for 30-90 seconds per iteration.

--
Josh Berkus
josh@agliodbs.com
Aglio Database Solutions
San Francisco
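Presumably the postgresql.conf settings being described are along these lines (7.2-era parameter names; the values shown are purely illustrative, not the ones actually in use on this box):

    wal_sync_method = fdatasync
    shared_buffers = 32768          # 8 KB pages, so roughly 256 MB of shared buffers
    sort_mem = 8192                 # per-sort working memory, in KB
    effective_cache_size = 65536    # planner's estimate of the OS disk cache, in 8 KB pages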
On Fri, Oct 04, 2002 at 12:09:42PM -0700, Josh Berkus wrote:

> BTW, in case you missed it, the real concern is that an UPDATE query similar
> to the SELECT query we are discussing takes over 10 minutes, which on this
> hardware is ridiculous. Robert suggested that we test the SELECT query to
> see if there were general performance problems; apparently, there are.

Yes, that's my thought, too.

> postgresql.conf has: fdatasync, various shared memory settings tuned to
> allocate 256 MB to Postgres (which seems to be working correctly).

Hmm. Are you swapping? Lots of temp files? (I presume you've been over all that.) Half your physical memory seems pretty dangerous to me. If you reduce that, does it help?

> When the UPDATE query takes a long time, I generally can watch the log hover
> in the land of "Reaping dead child processes" for 30-90 seconds per
> iteration.

Ick. Hmm. What sort of numbers do you get from vmstat, iostat, sar, and friends?

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
Josh Berkus <josh@agliodbs.com> writes:
> Hash Join  (cost=3076.10..91842.88 rows=108648 width=40) (actual
> time=18625.19..22823.39 rows=108546 loops=1)
>   ->  Seq Scan on elbs_matter_links eml  (cost=0.00..85641.87 rows=117787
>       width=20) (actual time=18007.69..19515.63 rows=117787 loops=1)
>   ->  Hash  (cost=2804.48..2804.48 rows=108648 width=20) (actual
>       time=602.12..602.12 rows=0 loops=1)
>         ->  Seq Scan on case_clients cc  (cost=0.00..2804.48 rows=108648
>             width=20) (actual time=5.18..370.68 rows=108648 loops=1)
> Total runtime: 22879.26 msec

Hm. Why does it take 19500 milliseconds to read 117787 rows from elbs_matter_links, if 108648 rows can be read from case_clients in 370 msec? And why does the output show that the very first of those rows was returned only after 18000 msec?

I am suspicious that this table has a huge number of empty pages in it, mostly at the beginning. If so, a VACUUM FULL would help. (Try "vacuum full verbose elbs_matter_links" and see if it indicates it's reclaiming any large number of pages.)

If that proves to be the answer, you need to look to your FSM parameters, and perhaps arrange for more frequent regular vacuums of this table.

			regards, tom lane
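In concrete terms, the check Tom suggests and the free-space-map settings he mentions would look something like this (the FSM values are only illustrative, not recommendations):

    VACUUM FULL VERBOSE elbs_matter_links;

    -- in postgresql.conf, if the table routinely turns over a large number of rows:
    --   max_fsm_relations = 1000
    --   max_fsm_pages = 100000

    -- and schedule routine vacuums of the table, e.g.
    VACUUM ANALYZE elbs_matter_links;

The FSM settings matter because dead space can only be reused by later inserts and updates if the free space map is large enough to remember it; otherwise the table keeps growing until a VACUUM FULL compacts it.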
Josh Berkus <josh@agliodbs.com> writes:
> When the UPDATE query takes a long time, I generally can watch the log hover
> in the land of "Reaping dead child processes" for 30-90 seconds per
> iteration.

Uh ... would you translate that observation into English please? Or better, provide the log output you're looking at?

			regards, tom lane
Tom,

> I am suspicious that this table has a huge number of empty pages in it,
> mostly at the beginning. If so, a VACUUM FULL would help. (Try
> "vacuum full verbose elbs_matter_links" and see if it indicates it's
> reclaiming any large number of pages.)

Thank you. Aha. That appears to have been the main problem; apparently, at some time during my tinkering, I dumped most of the rows from elbs_matter_links a couple of times. Ooops.

I'll post the new situation when I test the update queries tonight.

--
Josh Berkus
josh@agliodbs.com
Aglio Database Solutions
San Francisco