Thread: same plan, add 1 condition, 1900x slower
This is with Postgres 8.0.3. Any advice is appreciated. I'm not sure exactly
what I expect, but I was hoping that if it used the
external_id_map_source_target_id index it would be faster. Mainly I was
surprised that the same plan could perform so differently with just one
extra condition. I've increased the statistics target on
util.external_id_map.source, but I'm fuzzy on exactly which columns the
planner could use more information about.

statgen=> explain analyze select * from subject_source;
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..316.79 rows=1186 width=46) (actual time=0.136..9.808 rows=1186 loops=1)
   Merge Cond: ("outer".id = "inner".target_id)
   ->  Index Scan using subject_pkey on subject norm  (cost=0.00..63.36 rows=1186 width=28) (actual time=0.050..1.834 rows=1186 loops=1)
   ->  Index Scan using external_id_map_primary_key on external_id_map eim  (cost=0.00..2345747.01 rows=15560708 width=26) (actual time=0.061..2.944 rows=2175 loops=1)
 Total runtime: 10.745 ms
(5 rows)

statgen=> explain analyze select * from subject_source where source='SCH';
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..640.95 rows=1 width=46) (actual time=0.043..21074.403 rows=1186 loops=1)
   Merge Cond: ("outer".id = "inner".target_id)
   ->  Index Scan using subject_pkey on subject norm  (cost=0.00..63.36 rows=1186 width=28) (actual time=0.014..1.478 rows=1186 loops=1)
   ->  Index Scan using external_id_map_primary_key on external_id_map eim  (cost=0.00..2384648.78 rows=4150 width=26) (actual time=0.020..21068.508 rows=1186 loops=1)
         Filter: (source = 'SCH'::bpchar)
 Total runtime: 21075.142 ms
(6 rows)

statgen=> \d subject
         Table "public.subject"
 Column  |  Type   | Modifiers
---------+---------+-----------
 id      | bigint  | not null
 sex     | integer |
 parent1 | bigint  |
 parent2 | bigint  |
Indexes:
    "subject_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "subject_parent1" FOREIGN KEY (parent1) REFERENCES subject(id) DEFERRABLE INITIALLY DEFERRED
    "subject_parent2" FOREIGN KEY (parent2) REFERENCES subject(id) DEFERRABLE INITIALLY DEFERRED
    "subject_id_map" FOREIGN KEY (id) REFERENCES util.external_id_map(target_id) DEFERRABLE INITIALLY DEFERRED

statgen=> \d subject_source
       View "public.subject_source"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 id        | bigint                |
 sex       | integer               |
 parent1   | bigint                |
 parent2   | bigint                |
 source    | character(3)          |
 source_id | character varying(32) |
View definition:
 SELECT norm.id, norm.sex, norm.parent1, norm.parent2, eim.source, eim.source_id
   FROM subject norm
   JOIN util.external_id_map eim ON norm.id = eim.target_id;

statgen=> \d util.external_id_map
     Table "util.external_id_map"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 source_id | character varying(32) | not null
 source    | character(3)          | not null
 target_id | bigint                | not null
Indexes:
    "external_id_map_primary_key" PRIMARY KEY, btree (target_id)
    "external_id_map_source_source_id_unique" UNIQUE, btree (source, source_id)
    "external_id_map_source" btree (source)
    "external_id_map_source_target_id" btree (source, target_id)
Foreign-key constraints:
    "external_id_map_source" FOREIGN KEY (source) REFERENCES util.source(id)

Thanks in advance,
Mitch
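For readers following along: raising a per-column statistics target and
re-analyzing, as mentioned above, looks roughly like the sketch below. The
target value of 200 is the one mentioned later in the thread for "source";
the exact commands Mitch ran are not shown, so treat this as illustrative
only.

-- Sketch: raise the statistics target for one column, then re-gather stats.
ALTER TABLE util.external_id_map ALTER COLUMN source SET STATISTICS 200;
ANALYZE util.external_id_map;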
Mitch Skinner <mitch@egcrc.net> writes:
> This is with Postgres 8.0.3. Any advice is appreciated.

These are exactly the same plan, except for the addition of the extra
filter condition ...

>    ->  Index Scan using external_id_map_primary_key on external_id_map
> eim  (cost=0.00..2345747.01 rows=15560708 width=26) (actual
> time=0.061..2.944 rows=2175 loops=1)

>    ->  Index Scan using external_id_map_primary_key on external_id_map
> eim  (cost=0.00..2384648.78 rows=4150 width=26) (actual
> time=0.020..21068.508 rows=1186 loops=1)
>          Filter: (source = 'SCH'::bpchar)

Apparently, you are using a platform and/or locale in which strcoll() is
spectacularly, god-awfully slow --- on the order of 10 msec per comparison.
This is a bit hard to believe but I can't make sense of those numbers
any other way.  What is the platform exactly, and what database locale
and encoding are you using?

			regards, tom lane
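For anyone wanting to check the settings Tom asks about on their own
install, a minimal sketch (either form works; the second is what Mitch uses
in his reply):

-- Show the collation locale and the server-side encoding.
SHOW lc_collate;
SHOW server_encoding;
-- Or equivalently:
SELECT current_setting('lc_collate'), current_setting('server_encoding');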
On Thu, 2005-11-10 at 12:23 -0500, Tom Lane wrote:
> Apparently, you are using a platform and/or locale in which strcoll() is
> spectacularly, god-awfully slow --- on the order of 10 msec per comparison.

The version with the condition is definitely doing more I/O.  The version
without the condition doesn't read at all.  I strace'd an explain analyze
for each separately, and this is what I ended up with (the first is with
the condition, the second is without):

bash-2.05b$ cut '-d(' -f1 subsourcestrace | sort | uniq -c
   7127 gettimeofday
  75213 _llseek
      1 Process 30227 attached - interrupt to quit
      1 Process 30227 detached
 148671 read
      2 recv
      4 semop
      4 send

bash-2.05b$ cut '-d(' -f1 subsourcestrace-nocond | sort | uniq -c
   9103 gettimeofday
      7 _llseek
      1 Process 30227 attached - interrupt to quit
      1 Process 30227 detached
      2 recv
      4 send

For the moment, all of the rows in the view I'm selecting from satisfy the
condition, so the output of both queries is the same.  The relevant rows of
the underlying tables are probably pretty contiguous (all of the rows
satisfying the condition and the join were inserted at the same time).
Could it just be the result of a weird physical distribution of data in the
table/index files?  For the fast query, the actual number of rows is a lot
less than the planner expects.

> This is a bit hard to believe but I can't make sense of those numbers
> any other way.  What is the platform exactly, and what database locale
> and encoding are you using?

It's RHEL 3 on x86:

[root@rehoboam root]# uname -a
Linux rehoboam 2.4.21-32.0.1.ELsmp #1 SMP Tue May 17 17:52:23 EDT 2005 i686 i686 i386 GNU/Linux

The glibc version is 2.3.2.

statgen=# select current_setting('lc_collate');
 current_setting
-----------------
 en_US.UTF-8

Not sure what's relevant, but here's some more info: the machine has 4.5GiB
of RAM and a 5-disk RAID 5.  It's a dual Xeon 3.2GHz.

           relname           | relpages |  reltuples
-----------------------------+----------+-------------
 external_id_map             |   126883 | 1.55625e+07
 external_id_map_primary_key |    64607 | 1.55625e+07
 subject                     |       31 |        1186
 subject_pkey                |       19 |        1186

I've attached the output of "select name, setting from pg_settings".
And, in case my original message isn't handy, the explain analyze output
and table/view info is below.
Thanks for taking a look,
Mitch

statgen=> explain analyze select * from subject_source;
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..330.72 rows=1186 width=46) (actual time=0.051..8.890 rows=1186 loops=1)
   Merge Cond: ("outer".id = "inner".target_id)
   ->  Index Scan using subject_pkey on subject norm  (cost=0.00..63.36 rows=1186 width=28) (actual time=0.022..1.441 rows=1186 loops=1)
   ->  Index Scan using external_id_map_primary_key on external_id_map eim  (cost=0.00..2485226.70 rows=15562513 width=26) (actual time=0.016..2.532 rows=2175 loops=1)
 Total runtime: 9.592 ms
(5 rows)

statgen=> explain analyze select * from subject_source where source='SCH';
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..1147.33 rows=1 width=46) (actual time=0.054..20258.161 rows=1186 loops=1)
   Merge Cond: ("outer".id = "inner".target_id)
   ->  Index Scan using subject_pkey on subject norm  (cost=0.00..63.36 rows=1186 width=28) (actual time=0.022..1.478 rows=1186 loops=1)
   ->  Index Scan using external_id_map_primary_key on external_id_map eim  (cost=0.00..2524132.99 rows=2335 width=26) (actual time=0.022..20252.326 rows=1186 loops=1)
         Filter: (source = 'SCH'::bpchar)
 Total runtime: 20258.922 ms
(6 rows)

statgen=> \d subject_source
       View "public.subject_source"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 id        | bigint                |
 sex       | integer               |
 parent1   | bigint                |
 parent2   | bigint                |
 source    | character(3)          |
 source_id | character varying(32) |
View definition:
 SELECT norm.id, norm.sex, norm.parent1, norm.parent2, eim.source, eim.source_id
   FROM subject norm
   JOIN util.external_id_map eim ON norm.id = eim.target_id;

statgen=> \d subject
         Table "public.subject"
 Column  |  Type   | Modifiers
---------+---------+-----------
 id      | bigint  | not null
 sex     | integer |
 parent1 | bigint  |
 parent2 | bigint  |
Indexes:
    "subject_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "subject_parent1" FOREIGN KEY (parent1) REFERENCES subject(id) DEFERRABLE INITIALLY DEFERRED
    "subject_parent2" FOREIGN KEY (parent2) REFERENCES subject(id) DEFERRABLE INITIALLY DEFERRED
    "subject_id_map" FOREIGN KEY (id) REFERENCES util.external_id_map(target_id) DEFERRABLE INITIALLY DEFERRED

statgen=> \d util.external_id_map
     Table "util.external_id_map"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 source_id | character varying(32) | not null
 source    | character(3)          | not null
 target_id | bigint                | not null
Indexes:
    "external_id_map_primary_key" PRIMARY KEY, btree (target_id)
    "external_id_map_source_source_id_unique" UNIQUE, btree (source, source_id)
    "external_id_map_source" btree (source)
    "external_id_map_source_target_id" btree (source, target_id)
Foreign-key constraints:
    "external_id_map_source" FOREIGN KEY (source) REFERENCES util.source(id)
Mitch Skinner wrote:
>
> The version with the condition is definitely doing more I/O.  The
> version without the condition doesn't read at all.
[snip]
>            relname           | relpages |  reltuples
> -----------------------------+----------+-------------
>  external_id_map             |   126883 | 1.55625e+07
>  external_id_map_primary_key |    64607 | 1.55625e+07
>  subject                     |       31 |        1186
>  subject_pkey                |       19 |        1186

Does external_id_map really have 15 million rows?  If not, try a VACUUM
FULL on it.  Be prepared to give it some time to complete.

--
  Richard Huxton
  Archonet Ltd
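Richard's question can be answered directly from the catalogs and a count;
a quick sketch (the count(*) scans the whole table, so it is slow on a
table this size):

-- The planner's idea of the table and index sizes...
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname LIKE 'external_id_map%';

-- ...versus the actual row count.
SELECT count(*) FROM util.external_id_map;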
Mitch Skinner <mitch@egcrc.net> writes:
> On Thu, 2005-11-10 at 12:23 -0500, Tom Lane wrote:
>> Apparently, you are using a platform and/or locale in which strcoll() is
>> spectacularly, god-awfully slow --- on the order of 10 msec per comparison.

> The version with the condition is definitely doing more I/O.  The
> version without the condition doesn't read at all.

That's pretty interesting, but what file(s) is it reading exactly?

It could still be strcoll's fault.  The only plausible explanation I can
think of for strcoll being so slow is if for some reason it were re-reading
the locale definition file every time, instead of setting up just once.

If it is hitting Postgres files, it'd be interesting to look at exactly
which files and what the distribution of seek offsets is.

			regards, tom lane
Richard Huxton <dev@archonet.com> writes:
> Mitch Skinner wrote:
>> The version with the condition is definitely doing more I/O.  The
>> version without the condition doesn't read at all.

> Does external_id_map really have 15 million rows?  If not, try a VACUUM
> FULL on it.  Be prepared to give it some time to complete.

Please don't, actually, until we understand what's going on.

The thing is that the given plan will fetch every row indicated by the
index in both cases, in order to check the row's visibility.  I don't
see how an additional test on a non-indexed column would cause any
additional I/O.  If the value were large enough to be toasted out-of-line
then it could cause toast table accesses ... but we're speaking of a
char(3).

			regards, tom lane
On Fri, 2005-11-11 at 11:51 +0000, Richard Huxton wrote:
> Does external_id_map really have 15 million rows?  If not, try a VACUUM
> FULL on it.  Be prepared to give it some time to complete.

Thanks for the reply.  It does indeed have that many rows:

statgen=> select count(*) from util.external_id_map ;
  count
----------
 15562513
(1 row)

That table never gets deletions or updates, only insertions and reads.
For fun and base-covering, I'm running a full vacuum now.  Usually there's
just a nightly lazy vacuum.

If it helps, here's some background on what we're doing and why (plus some
stuff at the end about how it relates to Postgres):

We get very similar data from multiple sources, and I want to be able to
combine it all into one schema.  The data from different sources is similar
enough (it's generally constrained by the underlying biology, e.g., each
person has a father and a mother, two versions of each regular chromosome,
etc.) that I think putting it all into one set of tables makes sense.
Different people in our group use different tools (Python, R, Java), so
instead of integrating at the code level (like a shared class hierarchy) we
use the schema as our shared idea of the data.  This helps make my analyses
comparable to the analyses from my co-workers.

We don't all want to have to write basic sanity checks in each of our
languages, so we want to be able to have foreign keys in the schema.
Having foreign keys and multiple data sources means that we have to
generate our own internal identifiers (otherwise we'd expect to have ID
collisions from different sources).  I'd like to be able to have a stable
internal-external ID mapping (this is actually something we spent a lot of
time arguing about), so we have a table that does exactly that.

When we import data, we do a bunch of joins against the external_id_map
table to translate external IDs into internal IDs.  It means that the
external_id_map table gets pretty big and the joins can take a long time
(it takes four hours to import one 11-million-row source table into our
canonical schema, because we have to do 5 ID translations per row on that
one), but we don't need to import data too often, so it works.  The main
speed concern is that exploratory data analyses are pretty interactive, and
also sometimes you want to run a bunch of analyses in parallel, and if the
queries are slow that can be a bottleneck.

I'm looking forward to partitioning the external_id_map table with 8.1, and
when Greenplum comes out with their stuff we'll probably take a look.  If
the main Postgres engine had parallel query execution, I'd be pretty happy.
I also followed the external sort thread with interest, but I didn't get
the impression that there was a very clear consensus there.

Since some of our sources change over time, and I can't generally expect
them to have timestamps on their data, what we do when we re-import from a
source is delete everything out of the canonical tables from that source
and then re-insert.  It sounds like mass deletions are not such a common
thing to do; I think there was a thread about this recently and Tom
questioned the real-world need to worry about that workload.  I was
thinking that maybe the foreign key integrity checks might be better done
by a join rather than a per-deleted-row trigger queue, but since all my
foreign keys are indexed on both ends it doesn't look like a bottleneck.

Anyway, all that probably has an effect on the data distribution in our
tables and indexes.  I'll report back on the effect of the full vacuum.
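For readers trying to picture the import step described above, the
ID-translation joins look roughly like the sketch below.  The staging table
raw_subject and its columns are hypothetical, invented here for
illustration; only util.external_id_map, subject, and their columns come
from the thread, and the real import translates several IDs per row rather
than one.

-- Sketch: translate one source's external IDs into internal target_ids
-- by joining the staged rows against the mapping table.
INSERT INTO subject (id, sex)
SELECT eim.target_id, raw.sex
FROM raw_subject raw                    -- hypothetical staging table
JOIN util.external_id_map eim
  ON eim.source = 'SCH'                 -- the source being imported
 AND eim.source_id = raw.subject_external_id;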
Thanks for reading,
Mitch
On Fri, 2005-11-11 at 09:17 -0500, Tom Lane wrote:
> Richard Huxton <dev@archonet.com> writes:
>> Does external_id_map really have 15 million rows?  If not, try a VACUUM
>> FULL on it.  Be prepared to give it some time to complete.
>
> Please don't, actually, until we understand what's going on.

Ack, I was in the middle of the vacuum full already when I got this.  I
still have the strace and lsof output from before the vacuum full.

It's definitely reading Postgres files:

bash-2.05b$ grep '^read' subsourcestrace | cut -d, -f1 | sort | uniq -c
 100453 read(44
  48218 read(47

bash-2.05b$ grep 'seek' subsourcestrace | cut -d, -f1 | sort | uniq -c
      1 _llseek(40
      1 _llseek(43
  35421 _llseek(44
      1 _llseek(45
      1 _llseek(46
  39787 _llseek(47
      1 _llseek(48

File handles:
44 - external_id_map
47 - external_id_map_primary_key
40 - subject
43 - subject_pkey
45 - external_id_map_source
46 - external_id_map_source_target_id
48 - external_id_map_source_source_id_unique

As far as the seek offsets go, R doesn't want to do a histogram for me
without using up more RAM than I have.  I put up some files at:
http://arctur.us/pgsql/

They are:
subsourcestrace - the strace output from "select * from subject_source where source='SCH'"
subsourcestrace-nocond - the strace output from "select * from subject_source"
subsourcelsof - the lsof output (for mapping from file handles to file names)
relfilenode.html - for mapping from file names to table/index names (I think I've gotten all the relevant file handle-table name mappings above, though)
seekoff-44 - just the beginning seek offsets for the 44 file handle (external_id_map)
seekoff-47 - just the beginning seek offsets for the 47 file handle (external_id_map_primary_key)

The vacuum full is still going; I'll let you know if it changes things.

> The thing is that the given plan will fetch every row indicated by the
> index in both cases, in order to check the row's visibility.  I don't
> see how an additional test on a non-indexed column would cause any
> additional I/O.  If the value were large enough to be toasted
> out-of-line then it could cause toast table accesses ... but we're
> speaking of a char(3).

Pardon my ignorance, but do the visibility check and the check of the
condition happen at different stages of execution?  Would it end up
checking the condition for all 15M rows, but only checking visibility for
the 1200 rows that come back from the join?  I guess I'm confused about
what "every row indicated by the index" means in the context of the join.

Thanks for taking an interest,
Mitch
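The file-name-to-relation mapping Mitch built (relfilenode.html) can be
produced straight from the catalogs; a minimal sketch, with placeholder
relfilenode values standing in for the ones seen in lsof:

-- Map on-disk data file names (their relfilenode values) back to
-- table/index names.  Replace the numbers with the ones from lsof.
SELECT relname, relfilenode
FROM pg_class
WHERE relfilenode IN (12345, 12346);    -- placeholder values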
Mitch Skinner <mitch@egcrc.net> writes:
> On Fri, 2005-11-11 at 09:17 -0500, Tom Lane wrote:
>> Please don't, actually, until we understand what's going on.

> Ack, I was in the middle of the vacuum full already when I got this.

Given what you said about no deletions or updates, the vacuum should have
no effect anyway, so don't panic.

> I put up some files at: http://arctur.us/pgsql/

Great, I'll take a look ...

> Pardon my ignorance, but do the visibility check and the check of the
> condition happen at different stages of execution?  Would it end up
> checking the condition for all 15M rows, but only checking visibility
> for the 1200 rows that come back from the join?

No, the visibility check happens first.  The timing does seem consistent
with the idea that the comparison is being done at all 15M rows, but your
other EXPLAIN shows that only 2K rows are actually retrieved, which
presumably is because the merge doesn't need the rest.  (Merge will stop
scanning either input when it runs out of rows on the other side; so this
sort of plan is very fast if the range of keys on one side is smaller than
the range on the other.  The numbers from the no-comparison EXPLAIN ANALYZE
indicate that that is happening for your case.)  So the comparison should
happen for at most 2K rows too.

			regards, tom lane
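The key-range asymmetry Tom describes here can be confirmed with a couple
of quick aggregates; a sketch:

-- If subject's ids all fall at the low end of external_id_map's target_id
-- range, the merge join can stop after reading only a few thousand index
-- entries from the big table.
SELECT min(id), max(id) FROM subject;
SELECT min(target_id), max(target_id) FROM util.external_id_map;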
I wrote:
> No, the visibility check happens first.  The timing does seem consistent
> with the idea that the comparison is being done at all 15M rows, but
> your other EXPLAIN shows that only 2K rows are actually retrieved, which
> presumably is because the merge doesn't need the rest.  (Merge will stop
> scanning either input when it runs out of rows on the other side; so
> this sort of plan is very fast if the range of keys on one side is
> smaller than the range on the other.  The numbers from the no-comparison
> EXPLAIN ANALYZE indicate that that is happening for your case.)  So the
> comparison should happen for at most 2K rows too.

After re-reading your explanation of what you're doing with the data, I
thought of a possible explanation.  Is the "source" value exactly
correlated with the external_id_map primary key?

What could be happening is this:

1. We can see from the EXPLAIN ANALYZE for the no-comparison case that the
merge join stops after fetching only 2175 rows from external_id_map.  This
implies that the subject table joins to the first couple thousand entries
in external_id_map and nothing beyond that.  In particular, the merge join
must have observed that the join key in the 2175'th row (in index order) of
external_id_map was larger than the last (largest) join key in subject.

2. Let's suppose that source = 'SCH' is false for the 2175'th row of
external_id_map and every one after that.  Then what will happen is that
the index scan will vainly seek through the entire external_id_map, looking
for a row that its filter allows it to return, not knowing that the merge
join has no use for any of those rows.

If this is the story, and you need to make this sort of query fast, then
what you need to do is incorporate the "source" value into the
external_id_map index key somehow.  Then the index scan would be able to
realize that there is no possibility of finding another row with
source = 'SCH'.  The simplest way is just to make a 2-column index, but I
wonder whether the source isn't actually redundant with the external_id_map
primary key already ...

			regards, tom lane
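Tom's closing question, whether "source" lines up with the primary key, can
be checked directly; a sketch:

-- Does each source occupy its own contiguous band of target_id values?
-- Non-overlapping per-source ranges would mean source is effectively
-- redundant with the primary key for the rows loaded so far.
SELECT source, min(target_id), max(target_id), count(*)
FROM util.external_id_map
GROUP BY source
ORDER BY min(target_id);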
On Fri, 2005-11-11 at 10:53 -0500, Tom Lane wrote:
> After re-reading your explanation of what you're doing with the data,
> I thought of a possible explanation.  Is the "source" value exactly
> correlated with the external_id_map primary key?

Sort of.  In this case, at the beginning of external_id_map, yes, though
further down the table they're not.  For example, if we got new subjects
from 'SCH' at this point, they'd get assigned external_id_map.target_id
(the primary key) values that are totally unrelated to what the current set
are (the values in the external_id_map primary key just come off of a
sequence that we use for everything).

Right now though, since the 'SCH' data came in a contiguous chunk right at
the beginning and hasn't changed or grown since then, the correlation is
pretty exact, I think.  It's true that there are no 'SCH' rows in the table
after the first contiguous set (when I get back to work I'll check exactly
what row that is).  It's interesting that there are these correlations in
the data that didn't exist at all in my mental model.

> what you need to do is incorporate the "source" value into the
> external_id_map index key somehow.  Then the index scan would be able to
> realize that there is no possibility of finding another row with source
> = 'SCH'.  The simplest way is just to make a 2-column index

I thought that's what I had done with the
external_id_map_source_target_id index:

statgen=> \d util.external_id_map
     Table "util.external_id_map"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 source_id | character varying(32) | not null
 source    | character(3)          | not null
 target_id | bigint                | not null
Indexes:
    "external_id_map_primary_key" PRIMARY KEY, btree (target_id)
    "external_id_map_source_source_id_unique" UNIQUE, btree (source, source_id)
    "external_id_map_source" btree (source)
    "external_id_map_source_target_id" btree (source, target_id)
Foreign-key constraints:
    "external_id_map_source" FOREIGN KEY (source) REFERENCES util.source(id)

So if I understand your suggestion correctly, we're back to the "why isn't
this query using index foo" FAQ.  For the external_id_map table, the
statistics target for "source" is 200; the other two columns are at the
default level because I didn't think of them as being very interesting
statistics-wise.  I suppose I should probably go ahead and raise the
targets for every column of that table; I expect the planning time is
negligible, and our queries tend to be large data-wise.

Beyond that, I'm not sure how else to encourage the use of that index.  If
I changed that index to be (target_id, source) would it make a difference?

Thanks for your help,
Mitch
Mitchell Skinner <mitch@egcrc.net> writes:
> On Fri, 2005-11-11 at 10:53 -0500, Tom Lane wrote:
>> what you need to do is incorporate the "source" value into the
>> external_id_map index key somehow.  Then the index scan would be able to
>> realize that there is no possibility of finding another row with source
>> = 'SCH'.  The simplest way is just to make a 2-column index

> I thought that's what I had done with the
> external_id_map_source_target_id index:
>     "external_id_map_source_target_id" btree (source, target_id)

> If I changed that index to be (target_id, source) would it make a difference?

[ fools around with a test case ... ]  Seems like not :-(.

PG is not bright enough to realize that an index on (source, target_id) can
be used with a mergejoin on target_id, because the index sort order isn't
compatible.  (Given the equality constraint on source, there is an
effective compatibility.  I had thought that 8.1 might be able to detect
this, but it seems not to in a simple test case --- there may be a bug
involved there.  In any case 8.0 definitely won't see it.)

An index on (target_id, source) would be recognized as mergejoinable, but
that doesn't solve the problem because an index condition on the second
column doesn't provide enough information to know that the scan can stop
early.

Given your comment that the correlation is accidental, it may be that
there's not too much point in worrying.  The planner is picking this plan
only because it notices the asymmetry in key ranges, and as soon as some
more rows get added with higher-numbered target_ids it will shift to
something else (probably a hash join).

			regards, tom lane
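To see what the planner would fall back to without the merge join (the hash
join Tom mentions it will likely shift to once the key ranges overlap), one
can temporarily disable merge joins for the session; a sketch of that
experiment:

-- Session-local experiment: disable merge joins and inspect the
-- alternative plan, then put the setting back.
SET enable_mergejoin = off;
EXPLAIN ANALYZE SELECT * FROM subject_source WHERE source = 'SCH';
RESET enable_mergejoin;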