On Fri, 2005-11-11 at 09:17 -0500, Tom Lane wrote:
> Richard Huxton <dev@archonet.com> writes:
> > Does external_id_map really have 15 million rows? If not, try a VACUUM
> > FULL on it. Be prepared to give it some time to complete.
>
> Please don't, actually, until we understand what's going on.
Ack, I was the middle of the vacuum full already when I got this. I
still have the strace and lsof output from before the vacuum full. It's
definitely reading Postgres files:
bash-2.05b$ grep '^read' subsourcestrace | cut -d, -f1 | sort | uniq -c
100453 read(44
48218 read(47
bash-2.05b$ grep 'seek' subsourcestrace | cut -d, -f1 | sort | uniq -c
1 _llseek(40
1 _llseek(43
35421 _llseek(44
1 _llseek(45
1 _llseek(46
39787 _llseek(47
1 _llseek(48
File handles:
44 - external_id_map
47 - external_id_map_primary_key
40 - subject
43 - subject_pkey
45 - external_id_map_source
46 - external_id_map_source_target_id
48 - external_id_map_source_source_id_unique
As far as the seek offsets go, R doesn't want to do a histogram for me
without using up more RAM than I have. I put up some files at:
http://arctur.us/pgsql/
They are:
subsourcestrace - the strace output from "select * from subject_source
where source='SCH'"
subsourcestrace-nocond - the strace output from "select * from
subject_source"
subsourcelsof - the lsof output (for mapping from file handles to file
names)
relfilenode.html - for mapping from file names to table/index names (I
think I've gotten all the relevant file handle-table name mappings
above, though)
seekoff-44 - just the beginning seek offsets for the 44 file handle
(external_id_map)
seekoff-47 - just the beginning seek offsets for the 47 file handle
(external_id_map_primary_key)
The vacuum full is still going; I'll let you know if it changes things.
> The thing is that the given plan will fetch every row indicated by the
> index in both cases, in order to check the row's visibility. I don't
> see how an additional test on a non-indexed column would cause any
> additional I/O. If the value were large enough to be toasted
> out-of-line then it could cause toast table accesses ... but we're
> speaking of a char(3).
Pardon my ignorance, but do the visibility check and the check of the
condition happen at different stages of execution? Would it end up
checking the condition for all 15M rows, but only checking visibility
for the 1200 rows that come back from the join? I guess I'm confused
about what "every row indicated by the index" means in the context of
the join.
Thanks for taking an interest,
Mitch