Thread: Column storage (EXTERNAL/EXTENDED) settings for bytea/text column
Hi all, I'm trying to do some comparisons between the EXTERNAL and the EXTENDED storage methods on a bytea column and from the outsidethe setting doesn't appear to affect the value stored on initial insert, but perhaps I'm looking at the wrong numbers. If I create two new tables with a single bytea column and set one of them to external storage, then insert an existingbytea value from another table into each one, they appear to be of exactly the same size. This is using PG 9.0.3on Debian Lenny, using the backports-sloppy deb package of PG 9. (I've verified that the first table has "extended" storage via pg_attribute and that the second table has external.) create table obj1 ( object bytea ); create table obj2 ( object bytea ); alter table obj2 alter column object set storage external; insert into obj1 ( object ) select object from serialized_content where id = 12345; insert into obj2 ( object ) select object from obj1; select pg_total_relation_size('obj1') as o1, pg_total_relation_size( (select reltoastrelid from pg_class where relname ='obj1' ) ) as otoast1, pg_total_relation_size('obj2') as o2, pg_total_relation_size( (select reltoastrelid from pg_classwhere relname = 'obj2' ) ) as otoast2; o1 | otoast1 | o2 | otoast2 -------+---------+-------+--------- 65536 | 57344 | 65536 | 57344 (1 row) Now at this point if I perform a vacuum full on one or both, they'll both shrink down to a bit over half that size: vacuum full obj1; vacuum full obj2; select pg_total_relation_size('obj1') as o1, pg_total_relation_size( (select reltoastrelid from pg_class where relname ='obj1' ) ) as otoast1, pg_total_relation_size('obj2') as o2, pg_total_relation_size( (select reltoastrelid from pg_classwhere relname = 'obj2' ) ) as otoast2; o1 | otoast1 | o2 | otoast2 -------+---------+-------+--------- 40960 | 32768 | 40960 | 32768 This shrinkage doesn't occur with a normal vacuum, in fact the reported sizes increase if I do a normal vacuum at that point,but that's not germane to my question AFAICT. Can I use the relation size like this to determine whether or not compression is happening for these toast columns? If not,is there a way that I can confirm that it is or isn't active? The results appear to be similar for text columns. Thanks, Joel
In response to Joel Stevenson <jstevenson@bepress.com>: > Hi all, > > I'm trying to do some comparisons between the EXTERNAL and the EXTENDED storage methods on a bytea column and from theoutside the setting doesn't appear to affect the value stored on initial insert, but perhaps I'm looking at the wrongnumbers. If I create two new tables with a single bytea column and set one of them to external storage, then insertan existing bytea value from another table into each one, they appear to be of exactly the same size. This is usingPG 9.0.3 on Debian Lenny, using the backports-sloppy deb package of PG 9. > > (I've verified that the first table has "extended" storage via pg_attribute and that the second table has external.) > > create table obj1 ( object bytea ); > create table obj2 ( object bytea ); > alter table obj2 alter column object set storage external; > insert into obj1 ( object ) select object from serialized_content where id = 12345; > insert into obj2 ( object ) select object from obj1; > select pg_total_relation_size('obj1') as o1, pg_total_relation_size( (select reltoastrelid from pg_class where relname= 'obj1' ) ) as otoast1, pg_total_relation_size('obj2') as o2, pg_total_relation_size( (select reltoastrelid frompg_class where relname = 'obj2' ) ) as otoast2; > o1 | otoast1 | o2 | otoast2 > -------+---------+-------+--------- > 65536 | 57344 | 65536 | 57344 > (1 row) > > Now at this point if I perform a vacuum full on one or both, they'll both shrink down to a bit over half that size: > > vacuum full obj1; > vacuum full obj2; > select pg_total_relation_size('obj1') as o1, pg_total_relation_size( (select reltoastrelid from pg_class where relname= 'obj1' ) ) as otoast1, pg_total_relation_size('obj2') as o2, pg_total_relation_size( (select reltoastrelid frompg_class where relname = 'obj2' ) ) as otoast2; > o1 | otoast1 | o2 | otoast2 > -------+---------+-------+--------- > 40960 | 32768 | 40960 | 32768 I'm not an expert, but it looks like you're not storing enough data to actually see the difference, since the actual sizes of the tables will always be rounded to an even page size. With only 1 row, it's always going to take a minimum amount. Also, are you sure you're storing compressible data? For example, if you're putting PNG or JPEG images in there, they're not going to compress any. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
On Apr 12, 2011, at 10:33 AM, Bill Moran wrote: > In response to Joel Stevenson <jstevenson@bepress.com>: > >> select pg_total_relation_size('obj1') as o1, pg_total_relation_size( (select reltoastrelid from pg_class where relname= 'obj1' ) ) as otoast1, pg_total_relation_size('obj2') as o2, pg_total_relation_size( (select reltoastrelid frompg_class where relname = 'obj2' ) ) as otoast2; >> o1 | otoast1 | o2 | otoast2 >> -------+---------+-------+--------- >> 40960 | 32768 | 40960 | 32768 > > I'm not an expert, but it looks like you're not storing enough data to > actually see the difference, since the actual sizes of the tables will > always be rounded to an even page size. With only 1 row, it's always > going to take a minimum amount. > > Also, are you sure you're storing compressible data? For example, if > you're putting PNG or JPEG images in there, they're not going to compress > any. > Thanks for the reply, Bill. The data is very compressible, the raw data is 33392 bytes long and gzips down to 6965 bytes. As far as not storing enough,the description of the 'SET STORAGE' clause and the TOAST strategy it sounds like the TOASTer will try to compressanything that doesn't fit into the PG page ( default 8Kb ) so I would've thought that compression would be used forthe EXTENDED column and not used for the EXTERNAL column since my single-row data is larger than that. To be certain I stored 10 rows of that data and rechecked the reported size after a vacuum full: select pg_total_relation_size('obj1') as o1, pg_total_relation_size( (select reltoastrelid from pg_class where relname ='obj1' ) ) as otoast1, pg_total_relation_size('obj2') as o2, pg_total_relation_size( (select reltoastrelid from pg_classwhere relname = 'obj2' ) ) as otoast2; o1 | otoast1 | o2 | otoast2 --------+---------+--------+--------- 147456 | 139264 | 147456 | 139264 So, again from the outside, the column storage settings don't appear to be behaving as I'd expect them too. Stumped. - Joel
On Mon, Apr 11, 2011 at 03:19:23PM -0700, Joel Stevenson wrote: > I'm trying to do some comparisons between the EXTERNAL and the EXTENDED storage methods on a bytea column and from theoutside the setting doesn't appear to affect the value stored on initial insert, but perhaps I'm looking at the wrongnumbers. If I create two new tables with a single bytea column and set one of them to external storage, then insertan existing bytea value from another table into each one, they appear to be of exactly the same size. This is usingPG 9.0.3 on Debian Lenny, using the backports-sloppy deb package of PG 9. > > (I've verified that the first table has "extended" storage via pg_attribute and that the second table has external.) > > create table obj1 ( object bytea ); > create table obj2 ( object bytea ); > alter table obj2 alter column object set storage external; > insert into obj1 ( object ) select object from serialized_content where id = 12345; > insert into obj2 ( object ) select object from obj1; If the value that shows up for insertion is already compressed, EXTERNAL storage will not decompress it. Change this line to insert into obj2 ( object ) select object || '' from obj1; to observe the effect you seek. Given the purpose of EXTERNAL storage, this might qualify as a bug. > select pg_total_relation_size('obj1') as o1, pg_total_relation_size( (select reltoastrelid from pg_class where relname= 'obj1' ) ) as otoast1, pg_total_relation_size('obj2') as o2, pg_total_relation_size( (select reltoastrelid frompg_class where relname = 'obj2' ) ) as otoast2; > o1 | otoast1 | o2 | otoast2 > -------+---------+-------+--------- > 65536 | 57344 | 65536 | 57344 > Can I use the relation size like this to determine whether or not compression is happening for these toast columns? Ifnot, is there a way that I can confirm that it is or isn't active? The results appear to be similar for text columns. Yes; the sizes you're seeing through that method should be accurate. nm
Thanks for the help with that, Noah. Indeed the sizes do look like I'd expect them to if I force deflation of the byteavalue before inserting it into the EXTENDED column. On Apr 21, 2011, at 2:02 PM, Noah Misch wrote: > On Mon, Apr 11, 2011 at 03:19:23PM -0700, Joel Stevenson wrote: >> create table obj1 ( object bytea ); >> create table obj2 ( object bytea ); >> alter table obj2 alter column object set storage external; >> insert into obj1 ( object ) select object from serialized_content where id = 12345; >> insert into obj2 ( object ) select object from obj1; > > If the value that shows up for insertion is already compressed, EXTERNAL storage > will not decompress it. Change this line to > > insert into obj2 ( object ) select object || '' from obj1; > > to observe the effect you seek. > > Given the purpose of EXTERNAL storage, this might qualify as a bug.