Re: same plan, add 1 condition, 1900x slower - Mailing list pgsql-performance

From Mitchell Skinner
Subject Re: same plan, add 1 condition, 1900x slower
Date
Msg-id 1131728255.10481.51.camel@firebolt
Whole thread Raw
In response to Re: same plan, add 1 condition, 1900x slower  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: same plan, add 1 condition, 1900x slower  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Fri, 2005-11-11 at 10:53 -0500, Tom Lane wrote:
> After re-reading your explanation of what you're doing with the data,
> I thought of a possible explanation.  Is the "source" value exactly
> correlated with the external_id_map primary key?

Sort of.  In this case, at the beginning of external_id_map, yes, though
further down the table they're not.  For example, if we got new subjects
from 'SCH' at this point, they'd get assigned external_id_map.target_id
(the primary key) values that are totally unrelated to what the current
set are (the values in the external_id_map primary key just come off of
a sequence that we use for everything).

Right now though, since the 'SCH' data came in a contiguous chunk right
at the beginning and hasn't changed or grown since then, the correlation
is pretty exact, I think.  It's true that there are no 'SCH' rows in the
table after the first contiguous set (when I get back to work I'll check
exactly what row that is).  It's interesting that there are these
correlations in the the data that didn't exist at all in my mental
model.

> what you need to do is incorporate the "source" value into the
> external_id_map index key somehow.  Then the index scan would be able to
> realize that there is no possibility of finding another row with source
> = 'SCH'.  The simplest way is just to make a 2-column index

I thought that's what I had done with the
external_id_map_source_target_id index:

statgen=> \d util.external_id_map
         Table "util.external_id_map"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 source_id | character varying(32) | not null
 source    | character(3)          | not null
 target_id | bigint                | not null
Indexes:
    "external_id_map_primary_key" PRIMARY KEY, btree (target_id)
    "external_id_map_source_source_id_unique" UNIQUE, btree (source,
source_id)
    "external_id_map_source" btree (source)
    "external_id_map_source_target_id" btree (source, target_id)
Foreign-key constraints:
    "external_id_map_source" FOREIGN KEY (source) REFERENCES
util.source(id)

So if I understand your suggestion correctly, we're back to the "why
isn't this query using index foo" FAQ.  For the external_id_map table,
the statistics target for "source" is 200; the other two columns are at
the default level because I didn't think of them as being very
interesting statistics-wise.  I suppose I should probably go ahead and
raise the targets for every column of that table; I expect the planning
time is negligible, and our queries tend to be large data-wise.  Beyond
that, I'm not sure how else to encourage the use of that index.  If I
changed that index to be (target_id, source) would it make a difference?

Thanks for your help,
Mitch

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: same plan, add 1 condition, 1900x slower
Next
From: Tom Lane
Date:
Subject: Re: same plan, add 1 condition, 1900x slower