Re: same plan, add 1 condition, 1900x slower - Mailing list pgsql-performance
From | Mitchell Skinner |
---|---|
Subject | Re: same plan, add 1 condition, 1900x slower |
Date | |
Msg-id | 1131728255.10481.51.camel@firebolt Whole thread Raw |
In response to | Re: same plan, add 1 condition, 1900x slower (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: same plan, add 1 condition, 1900x slower
|
List | pgsql-performance |
On Fri, 2005-11-11 at 10:53 -0500, Tom Lane wrote: > After re-reading your explanation of what you're doing with the data, > I thought of a possible explanation. Is the "source" value exactly > correlated with the external_id_map primary key? Sort of. In this case, at the beginning of external_id_map, yes, though further down the table they're not. For example, if we got new subjects from 'SCH' at this point, they'd get assigned external_id_map.target_id (the primary key) values that are totally unrelated to what the current set are (the values in the external_id_map primary key just come off of a sequence that we use for everything). Right now though, since the 'SCH' data came in a contiguous chunk right at the beginning and hasn't changed or grown since then, the correlation is pretty exact, I think. It's true that there are no 'SCH' rows in the table after the first contiguous set (when I get back to work I'll check exactly what row that is). It's interesting that there are these correlations in the the data that didn't exist at all in my mental model. > what you need to do is incorporate the "source" value into the > external_id_map index key somehow. Then the index scan would be able to > realize that there is no possibility of finding another row with source > = 'SCH'. The simplest way is just to make a 2-column index I thought that's what I had done with the external_id_map_source_target_id index: statgen=> \d util.external_id_map Table "util.external_id_map" Column | Type | Modifiers -----------+-----------------------+----------- source_id | character varying(32) | not null source | character(3) | not null target_id | bigint | not null Indexes: "external_id_map_primary_key" PRIMARY KEY, btree (target_id) "external_id_map_source_source_id_unique" UNIQUE, btree (source, source_id) "external_id_map_source" btree (source) "external_id_map_source_target_id" btree (source, target_id) Foreign-key constraints: "external_id_map_source" FOREIGN KEY (source) REFERENCES util.source(id) So if I understand your suggestion correctly, we're back to the "why isn't this query using index foo" FAQ. For the external_id_map table, the statistics target for "source" is 200; the other two columns are at the default level because I didn't think of them as being very interesting statistics-wise. I suppose I should probably go ahead and raise the targets for every column of that table; I expect the planning time is negligible, and our queries tend to be large data-wise. Beyond that, I'm not sure how else to encourage the use of that index. If I changed that index to be (target_id, source) would it make a difference? Thanks for your help, Mitch
pgsql-performance by date: