Re: out-of-line (TOAST) storage ineffective when loading from dump? - Mailing list pgsql-general

From Markus Bertheau
Subject Re: out-of-line (TOAST) storage ineffective when loading from dump?
Date
Msg-id 684362e10802190155v1fdbcb31m7d7fb3353026a802@mail.gmail.com
Whole thread Raw
In response to Re: out-of-line (TOAST) storage ineffective when loading from dump?  (Richard Huxton <dev@archonet.com>)
Responses Re: out-of-line (TOAST) storage ineffective when loading from dump?
Re: out-of-line (TOAST) storage ineffective when loading from dump?
List pgsql-general
2008/2/19, Richard Huxton <dev@archonet.com>:
> Markus Bertheau wrote:
> > Afaics, TOAST was invented so that big attributes wouldn't be in the
> > way when working with the other attributes.
>
> Actually, I think it was designed as a way of working around PG's 8KB
> block-size. That imposed a maximum row size of the same and, well meant
> you couldn't have 2 x 5KB text fields for example.

Ah, ok. So what I'm talking about is a possible positive side effect of
out-of-line storage.

> The "toast" pages are stored in a separate table - see manual for
> details. There's a whole chapter (53.2) on this.

Yes, but I assume that on disk the pages will be laid out sequentially
- not intentionally so, of course. See below.

> I'm not sure what you mean by this. The page-cache will cache individual
> pages regardless of type afaik. A large data-load will probably mess up
> your cache for other processes. On the other hand, assuming you'll be
> building indexes etc. too then it's going to be in cache one way or another.

I'm loading a table with some short attributes and a large toastable attribute.
That means that for every main table heap page several toast table heap pages
are written. This happens through the buffer cache and the background writer,
so maybe the pages aren't written in the order in which they were created in
the buffer cache, but if they are, they end up on disk (assuming that the file
system is not fragmented) roughly like that:

main table heap page 1
toast table heap page 1
toast table heap page .
toast table heap page n
main table heap page 2
toast table heap page n+1
toast table heap page .
toast table heap page 2n

Significantly later a sequential scan of the table has to be made, the
toastable attribute is not needed for the operation. The caches are cold or
otherwise occupied. If the granularity of caches that are nearer to the disk in
the cache hierarchy than the PG buffer cache is higher than the PG page size
(i.e. a cache unit is bigger than the PG page size), then every read of a main
table heap page will inescapably read some toast table heap pages into the
cache (whichever cache that may be).  If all the main table heap pages were
laid out adjecently on disk, they could be read faster and caches be polluted
less.

Markus

pgsql-general by date:

Previous
From: Richard Huxton
Date:
Subject: Re: out-of-line (TOAST) storage ineffective when loading from dump?
Next
From: Gordon
Date:
Subject: Re: Auto incrementing primary keys