On May 15, 2007, at 12:02 PM, Bill Moseley wrote:
> On Tue, May 15, 2007 at 01:42:18PM -0400, Chris Browne wrote:
>> lists@peufeu.com (PFC) writes:
>>>> SELECT o.id
>>>> FROM order o
>>>> JOIN customer c on o.customer = c.id
>>>>
>>>> Does that bring into memory all columns from both order and
>>>> customer?
>>>> Maybe that's not a good example due to indexes.
>>>
>>> No, it just pulls the columns you ask from the table, nothing
>>> less, nothing more.
>>
>> That's not quite 100% accurate.
>>
>> In order to construct the join, the entire pages of the relevant
>> tuples in tables "order" and "customer" will need to be drawn into
>> memory.
>>
>> Thus, if there are a whole bunch of columns on each table, the
>> data in
>> those extra columns (e.g. - all columns aside from "id", the one that
>> was asked for in the result set) will indeed be drawn into memory.
>
> Is that specific to Postgresql? From an outside perspective it just
> seems odd that potentially a large amount of data would be pulled off
> disk into memory that is never used. Perhaps there's an overriding
> reason for this.
The columns are next to each other on the disk. You need to read
the entire block off disk into system cache, so you'll be reading all
the columns of all the rows in that block into memory.
That's just the way that most (all?) modern filesystems work, and so the
way that most filesystem based databases are going to work. I've seen
some databases that don't store all the main columns of a table together
on disk, but they're fairly rare.
Pushing data into lookaside tables either manually or automatically
via toast changes the tradeoffs.
Cheers,
Steve