Re: [INTERFACES] Back end memory consumption (was Front end memory consumption in SELECT) - Mailing list pgsql-interfaces

From Douglas Thomson
Subject Re: [INTERFACES] Back end memory consumption (was Front end memory consumption in SELECT)
Date
Msg-id 199912030551.QAA09331@mugca.cc.monash.edu.au
Whole thread Raw
In response to Re: [INTERFACES] Back end memory consumption (was Front end memory consumption in SELECT)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-interfaces
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> > If I hadn't checked that I have indexes on
> > all the attributes (and attribute combinations) that make up the keys
> > needed for the join, I would think the back end was trying to read in
> > and sort a large part of the data just for the query!
>
> Well, of course it *is* ... the trick is to make sure that anything
> really large ends up on disk (in a temp file) and not in memory.

What I don't understand is, why would it bother trying to read in
lots of data and sort it when it already has an index sorted in the
required order? I was trying to set up the individual table indexes
so that the join could be done efficiently, and not need either lots
of memory or temporary disk files. Do indexes not get used for joins?

> Offhand the plan looks pretty reasonable for a 9-way join.  I see that
> you have two sorts and five hashes going on here.  Each of those will
> feel entitled to use as much memory as whatever your -S setting is,
> which IIRC is 512K by default.  But 3.5meg altogether isn't enough to
> explain your memory usage ... unless you are using a larger-than-normal
> -S switch?

No, I use the default.

> Also, while the sorts are pretty certain to honor your -S limit (plus or
> minus some slop), the hashes are much more probabilistic affairs.  If a
> table is much larger than the planner guesses, or if the distribution of
> tuple values is very uneven, a hash join might use a lot more memory
> than -S.  It's hard to tell whether this might be happening without a
> lot more info than EXPLAIN provides, however.  EXPLAIN shows that your
> tables being hashed are all pretty small, the largest being
> 't24 users' at 6955 rows.  If that's accurate then I doubt hash overrun
> is the explanation...

6955 rows is accurate for that table. Values could well be highly
skewed though. The joining attribute for t24 is a student ID "number"
text field, and the numbers will be grouped.

Doug.

pgsql-interfaces by date:

Previous
From: "Ross J. Reedstrom"
Date:
Subject: pgsql, ODBC and ColdFusion
Next
From: Peter Wiley
Date:
Subject: Re: pgsql-interfaces-digest V1 #570