Kurt Overberg <kurt@hotdogrecords.com> writes:
> I have the following query on postgresql 7.3.2 on RedHat 7.
> select *, (select count(*) from xrefmembergroup where membergroupid =
> m.id) as numberingroup from membergroup m;
> The xrefmembergroup table has about 120,000 rows, membergroup has 90.
> This query has been running very quickly, but has suddenly started
> taking a LONG LONG time.
Presumably the plan changed, but without any reasonable way to tell what
the old plan was, there's no way to be sure. (Possibly comparing
explain plans from both servers would be useful, though.)
> Now, when I do run this query my postmaster process spikes from around
> 10Megs (normal size) to around 250Megs and just kinda sits there until
> it eventually returns 5 minutes later.
What was the new plan, exactly? I don't see any reason for this query
to chew a lot of memory.
I think that the query is inherently inefficient as written, since
it forces a separate scan of xrefmembergroup for every membergroup row.
I don't really see how it could ever have been done in subsecond time,
unless perhaps a large fraction of the xrefmembergroup entries did not
match any membergroup row, which seems unlikely.
I'd suggest doing something that will allow the counts to be accumulated
in just one xrefmembergroup scan, with GROUP BY. A straightforward way
is
select m.*, numberingroup
from membergroup m,
     (select membergroupid, count(*) as numberingroup
      from xrefmembergroup
      group by membergroupid) as c
where m.id = c.membergroupid;
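One caveat: the plain-join form drops any membergroup row with no matching xrefmembergroup entries, whereas the original subselect returned 0 for those; a LEFT JOIN with COALESCE preserves them. A minimal sketch of both forms, using Python's sqlite3 for portability (table and column names come from the post; the sample data is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table membergroup (id integer primary key, name text)")
cur.execute("create table xrefmembergroup (membergroupid integer, memberid integer)")
cur.executemany("insert into membergroup values (?, ?)",
                [(1, "a"), (2, "b"), (3, "empty")])
cur.executemany("insert into xrefmembergroup values (?, ?)",
                [(1, 10), (1, 11), (2, 12)])

# Original form: one subselect per membergroup row.
correlated = cur.execute("""
    select m.id, (select count(*) from xrefmembergroup
                  where membergroupid = m.id) as numberingroup
    from membergroup m
    order by m.id
""").fetchall()

# Rewrite: a single scan with GROUP BY; the left join plus coalesce
# keeps groups that have no members (a plain join would drop them).
grouped = cur.execute("""
    select m.id, coalesce(c.numberingroup, 0) as numberingroup
    from membergroup m
    left join (select membergroupid, count(*) as numberingroup
               from xrefmembergroup
               group by membergroupid) as c
      on m.id = c.membergroupid
    order by m.id
""").fetchall()

assert correlated == grouped == [(1, 2), (2, 1), (3, 0)]
print(grouped)
```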
I'm not convinced this will actually be much of a win in 7.3
unfortunately ... but it should fly in 7.4, because of the new
hash aggregation code.
regards, tom lane