Re: same plan, add 1 condition, 1900x slower - Mailing list pgsql-performance

From Mitch Skinner
Subject Re: same plan, add 1 condition, 1900x slower
Date
Msg-id 1131722682.29496.229.camel@enzian
Whole thread Raw
In response to Re: same plan, add 1 condition, 1900x slower  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: same plan, add 1 condition, 1900x slower  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Fri, 2005-11-11 at 09:17 -0500, Tom Lane wrote:
> Richard Huxton <dev@archonet.com> writes:
> > Does external_id_map really have 15 million rows? If not, try a VACUUM
> > FULL on it. Be prepared to give it some time to complete.
>
> Please don't, actually, until we understand what's going on.

Ack, I was the middle of the vacuum full already when I got this.  I
still have the strace and lsof output from before the vacuum full.  It's
definitely reading Postgres files:

bash-2.05b$ grep '^read' subsourcestrace | cut -d, -f1 | sort |  uniq -c
 100453 read(44
  48218 read(47
bash-2.05b$ grep 'seek' subsourcestrace | cut -d, -f1 | sort |  uniq -c
      1 _llseek(40
      1 _llseek(43
  35421 _llseek(44
      1 _llseek(45
      1 _llseek(46
  39787 _llseek(47
      1 _llseek(48

File handles:
44 - external_id_map
47 - external_id_map_primary_key
40 - subject
43 - subject_pkey
45 - external_id_map_source
46 - external_id_map_source_target_id
48 - external_id_map_source_source_id_unique

As far as the seek offsets go, R doesn't want to do a histogram for me
without using up more RAM than I have.  I put up some files at:
http://arctur.us/pgsql/
They are:
subsourcestrace - the strace output from "select * from subject_source
where source='SCH'"
subsourcestrace-nocond - the strace output from "select * from
subject_source"
subsourcelsof - the lsof output (for mapping from file handles to file
names)
relfilenode.html - for mapping from file names to table/index names (I
think I've gotten all the relevant file handle-table name mappings
above, though)
seekoff-44 - just the beginning seek offsets for the 44 file handle
(external_id_map)
seekoff-47 - just the beginning seek offsets for the 47 file handle
(external_id_map_primary_key)

The vacuum full is still going; I'll let you know if it changes things.

> The thing is that the given plan will fetch every row indicated by the
> index in both cases, in order to check the row's visibility.  I don't
> see how an additional test on a non-indexed column would cause any
> additional I/O.  If the value were large enough to be toasted
> out-of-line then it could cause toast table accesses ... but we're
> speaking of a char(3).

Pardon my ignorance, but do the visibility check and the check of the
condition happen at different stages of execution?  Would it end up
checking the condition for all 15M rows, but only checking visibility
for the 1200 rows that come back from the join?  I guess I'm confused
about what "every row indicated by the index" means in the context of
the join.

Thanks for taking an interest,
Mitch


pgsql-performance by date:

Previous
From: Mitch Skinner
Date:
Subject: Re: same plan, add 1 condition, 1900x slower
Next
From: Tom Lane
Date:
Subject: Re: same plan, add 1 condition, 1900x slower