Re: out-of-line (TOAST) storage ineffective when loading from dump? - Mailing list pgsql-general

From Richard Huxton
Subject Re: out-of-line (TOAST) storage ineffective when loading from dump?
Date
Msg-id 47BAB68A.1060709@archonet.com
Whole thread Raw
In response to Re: out-of-line (TOAST) storage ineffective when loading from dump?  ("Markus Bertheau" <mbertheau.pg@googlemail.com>)
Responses Re: out-of-line (TOAST) storage ineffective when loading from dump?
List pgsql-general
Markus Bertheau wrote:
> 2008/2/19, Richard Huxton <dev@archonet.com>:
> I'm loading a table with some short attributes and a large toastable attribute.
> That means that for every main table heap page several toast table heap pages
> are written. This happens through the buffer cache and the background writer,
> so maybe the pages aren't written in the order in which they were created in
> the buffer cache, but if they are, they end up on disk (assuming that the file
> system is not fragmented) roughly like that:
>
> main table heap page 1
> toast table heap page 1
> toast table heap page .
> toast table heap page n
> main table heap page 2
> toast table heap page n+1
> toast table heap page .
> toast table heap page 2n

Well, that's assuming:
1. You're not re-using space from previously deleted/updated rows.
2. You've not got a RAID array striping writes over multiple disks
3. The underlying filesystem + buffering isn't doing anything too clever.

> Significantly later a sequential scan of the table has to be made, the
> toastable attribute is not needed for the operation. The caches are cold or
> otherwise occupied.

OK

 > If the granularity of caches that are nearer to the disk in
> the cache hierarchy than the PG buffer cache is higher than the PG page size
> (i.e. a cache unit is bigger than the PG page size), then every read of a main
> table heap page will inescapably read some toast table heap pages into the
> cache (whichever cache that may be).

I think disk blocks on your more common file-systems are 4KB by default
(ext2/3 and ntfs for example). I'm not aware of any default disk-block
sizes more than the 8KB page-size of PG. Of course, the OS may read
ahead if it sees you scanning, but it will do that on a file basis.

 > If all the main table heap pages were
> laid out adjecently on disk, they could be read faster and caches be polluted
> less.

True enough. The key word there though is "if" - it means that PG is
trying to out-think the filesystem, OS and hardware.

It should be easy enough to test on a particular system though.
1. pg_restore a sample table with TOASTed data.
2. pg_restore the same data but no TOASTed data.
3. cluster the table with TOASTed data (which should force a rewrite of
the whole table but not its TOASTed data)

If the timing of various selects differ hugely then there's something
worth investigating.

--
   Richard Huxton
   Archonet Ltd

pgsql-general by date:

Previous
From: Martijn van Oosterhout
Date:
Subject: Re: out-of-line (TOAST) storage ineffective when loading from dump?
Next
From: Magnus Hagander
Date:
Subject: Re: Analogue to SQL Server UniqueIdentifier?