Re: Comparitive UPDATE speed - Mailing list pgsql-performance

From Josh Berkus
Subject Re: Comparitive UPDATE speed
Date
Msg-id web-1771475@davinci.ethosmedia.com
Whole thread Raw
In response to Comparitive UPDATE speed  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Comparitive UPDATE speed
List pgsql-performance
Folks,

Sorry for the double-quoting here.   I sent this to just Ron by
accident.  My original question is double-quoted, Ron is quoted, and my
responses are below.  Thanks!

> > Ok, I'm still confused.
> >
> > I'm updating a (not not indexed) field in a 117,000 row table based
> on
> > information in another 117,000 row table.   The update is an
> integer, and the
> > linking fields are indexed.   Yet the two queries are flattening my
>
> > dual-processor, RAID5 database server for up to 11 minutes ...
> using 230mb
> > ram the entire time.   I simply can't believe that these two
> queries are that
> > difficult.
>
> So there's no index on elbs_matter_links.case_id?  From your original
>
> post, I thought that there *is* an index on that field.

I'm now dropping it before the update.   Unfortunately, dropping the
index made no appreciable gain in performance.

> > I've increased the memory available to the update to 256mb, and
> tried forcing
> > an index scan ... to no avail.   Ideas, please?
> >
> > The queries:
> >
> > UPDATE elbs_matter_links SET case_id = case_clients.case_id
> > FROM case_clients
> > WHERE elbs_matter_links.mmatter = case_clients.matter_no;
>
> What happens if you run the query:
> SELECT eml.case_id, cc.case_id, eml.mmatter, cc.matter_no
> FROM elbs_matter_links eml,
>      case_clients cc
> WHERE eml.mmatter = cc.matter_no;
>
> That, for all intents and purposes, is your UPDATE statement, just
> without doing the UPDATE.  How fast does it run?

Slowly.  It takes about 60 seconds to return data.  This may be the
problem.  Thoughts?  Here's EXPLAIN output:

Hash Join  (cost=3076.10..91842.88 rows=108648 width=40)
  ->  Seq Scan on elbs_matter_links eml  (cost=0.00..85641.87
rows=117787 width=20)
  ->  Hash  (cost=2804.48..2804.48 rows=108648 width=20)
        ->  Seq Scan on case_clients cc  (cost=0.00..2804.48
rows=108648 width=20)

According to the parser, using the indexes would be worse:

Merge Join  (cost=0.00..520624.38 rows=108648 width=40)
  ->  Index Scan using idx_eml_mmatter on elbs_matter_links eml
 (cost=0.00..451735.00 rows=117787 width=20)
  ->  Index Scan using idx_caseclients_matter on case_clients cc
 (cost=0.00..66965.20 rows=108648 width=20)

Though in practice, a forced index scan returns rows in about 60
seconds, same as the SeqScan version.

All of this seems very costly for a query that, while it does return a
lot of rows, is essentially a very simple query.

More importantly, on the hardware I'm using, I would expect better
performance that I get on my laptop ... and I'm not seeing it.  I just
can't believe that the simple query above could soak 200mb of RAM for a
full 60 seconds to return a result.    It's like queries over a certain
result size on the system choke postgres.


My reference data below:
==============================================

>
> > UPDATE elbs_matter_links SET case_id = cases.case_id
> > FROM cases
> > WHERE elbs_matter_links.docket = cases.docket
> >  AND elbs_matter_links.case_id IS NULL;
> >
> >
> > EXPLAIN output:
> >
> > Hash Join  (cost=4204.83..39106.77 rows=8473 width=299)
> >   ->  Index Scan using idx_eml_mmatter on elbs_matter_links
> > (cost=0.00..34668.94 rows=8473 width=279)
> >   ->  Hash  (cost=2808.38..2808.38 rows=109038 width=20)
> >         ->  Seq Scan on case_clients  (cost=0.00..2808.38
> rows=109038
> > width=20)
> >
> > Nested Loop  (cost=0.00..32338.47 rows=99 width=300)
> >   ->  Seq Scan on cases  (cost=0.00..9461.97 rows=4297 width=21)
> >   ->  Index Scan using idx_eml_docket on elbs_matter_links
>  (cost=0.00..5.31
> > rows=1 width=279)
> >
> > Table defintions:
> >
> > Table "elbs_matter_links"
> >       Column      |         Type          |       Modifiers
> > ------------------+-----------------------+-----------------------
> >  mmatter          | character varying(15) | not null
> >  case_id          | integer               |
> >  matter_check     | character varying(20) | not null default 'OK'
> >  docket           | character varying(50) |
> >  case_name        | character varying(50) |
> >  practice         | character varying(50) |
> >  opp_counsel_name | character varying(50) |
> >  opp_counsel_id   | integer               |
> >  act_type         | character varying(10) |
> >  lead_case_id     | integer               |
> >  lead_case_docket | character varying(50) |
> >  disease          | character varying(50) |
> >  docket_no        | character varying(25) |
> >  juris_state      | character varying(6)  |
> >  juris_local      | character varying(20) |
> >  status           | smallint              | not null default 1
> >  client_id        | integer               |
> >  office_loc       | character varying(5)  |
> >  date_filed       | date                  |
> >  date_served      | date                  |
> >  date_resolved    | date                  |
> >  case_status      | character varying(5)  |
> >  settle_amount    | numeric(12,2)         | default 0
> >  narrative        | text                  |
> >  comment          | character varying(50) |
> >  client_no        | character varying(10) |
> >  juris_id         | integer               |
> > Indexes: idx_eml_check,
> >          idx_eml_docket,
> >          idx_eml_mmatter
> > Primary key: elbs_matter_links_pkey
> >
> >                                      Table "case_clients"
> >       Column      |         Type          |
>                     Modifiers
> >
>
------------------+-----------------------+----------------------------------------------------
> >  case_client_id   | integer               | not null default
> > nextval('case_clients_seq'::text)
> >  case_id          | integer               | not null
> >  client_id        | integer               | not null
> >  office_loc       | character varying(5)  |
> >  date_filed       | date                  |
> >  date_served      | date                  |
> >  date_resolved    | date                  |
> >  matter_no        | character varying(15) | not null
> >  case_status      | character varying(5)  | not null
> >  settle_amount    | numeric(14,2)         | not null default 0
> >  matter_narrative | text                  |
> >  comment          | character varying(50) |
> > Indexes: idx_case_clients_client,
> >          idx_caseclients_case,
> >          idx_caseclients_matter,
> >          idx_caseclients_resolved,
> >          idx_caseclients_served,
> >          idx_caseclients_status
> > Primary key: case_clients_pkey
> >
> >
> >                                      Table "cases"
> >       Column      |         Type          |
>                  Modifiers
> >
>
------------------+-----------------------+---------------------------------------------
> >  case_id          | integer               | not null default
> > nextval('cases_seq'::text)
> >  docket           | character varying(50) | not null
> >  case_name        | character varying(50) | not null
> >  practice         | character varying(50) | not null
> >  opp_counsel_name | character varying(50) |
> >  opp_counsel_id   | integer               |
> >  act_type         | character varying(10) |
> >  lead_case_id     | integer               |
> >  lead_case_docket | character varying(50) |
> >  disease          | character varying(50) |
> >  docket_no        | character varying(25) | not null
> >  juris_state      | character varying(6)  | not null
> >  juris_local      | character varying(20) |
> >  tgroup_id        | integer               |
> >  status           | smallint              | not null default 1
> >  juris_id         | integer               |
> > Indexes: idx_case_cases_juris,
> >          idx_cases_docket,
> >          idx_cases_lead,
> >          idx_cases_name,
> >          idx_cases_status,
> >          idx_cases_tgroup,
> >          idx_lower_case_name
> >
> >
> >
> > --
> > Josh Berkus
> > josh@agliodbs.com
> > Aglio Database Solutions
> > San Francisco
> --
> +------------------------------------------------------------+
> | Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net          |
> | Jefferson, LA  USA   http://members.cox.net/ron.l.johnson  |
> |                                                            |
> | "What other evidence do you have that they are terrorists, |
> |  other than that they trained in these camps?"             |
> |   17-Sep-2002 Katie Couric to an FBI agent regarding the 5 |
> |   men arrested near Buffalo NY                             |
> +------------------------------------------------------------+
>

______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses        fax 621-2533
    and non-profit organizations.      San Francisco

pgsql-performance by date:

Previous
From: "Shridhar Daithankar"
Date:
Subject: Re: [HACKERS] Large databases, performance
Next
From: "scott.marlowe"
Date:
Subject: Re: [GENERAL] [HACKERS] Large databases, performance