Thread: Table size does not include toast size
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello I wonder why the function pg_relation_size(text) does not take into account the space used by toast data in a table when returning the space used by the table. As an administrator I would expect pg_total_relation_size() to return data+toast+indexes and pg_relation_size() to return data+toast. Is this a deliberate decision? Could we change this behavior in the future? We are using a 8.3 database. Thanks in advance. regards, - --Rafael Martinez, <r.m.guerrero@usit.uio.no>Center for Information Technology ServicesUniversity of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.7 (GNU/Linux) iD8DBQFLL3m+BhuKQurGihQRAgBdAKCV5ZIBJyDOzGWh/En4sTvWSW67ZwCfYoYx iUYIMJCbk6li2BhYcR7JB5M= =l2YF -----END PGP SIGNATURE-----
Rafael Martinez <r.m.guerrero@usit.uio.no> writes: > I wonder why the function pg_relation_size(text) does not take into > account the space used by toast data in a table when returning the space > used by the table. It's not supposed to. Use pg_total_relation_size if you want a number that includes index and toast space. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tom Lane wrote: > Rafael Martinez <r.m.guerrero@usit.uio.no> writes: >> I wonder why the function pg_relation_size(text) does not take into >> account the space used by toast data in a table when returning the space >> used by the table. > > It's not supposed to. Use pg_total_relation_size if you want a number > that includes index and toast space. > I am probably missing the point here, why is it not supposed to show the size of the table(data) *without* indexes? My question was because I can not understand the use and usefulness of pg_relation_size() (as it works today) in a table that use toast. - From an administrator point of view, there are two numbers that are interesting, the total size of a table (indexes included) and the size of the table without taking into account the space used by its indexes. At least, if there is a logic in this behavior, it should be documented in "9.23. System Administration Functions". The documentation only says "Disk space used by the table or index with ...." It is not the first time confused users have asked me why pg_relation_size() does not show the space used by the table without indexes. Many do not know what 'toast' is, and most probably they do not need to know about this either. regards, - --Rafael Martinez, <r.m.guerrero@usit.uio.no>Center for Information Technology ServicesUniversity of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.7 (GNU/Linux) iD8DBQFLL5NCBhuKQurGihQRAmtZAJ99wJPKbS1u2RUGxO4G++X7nbqt2gCeJubn b+328nrEICsXPS7kgD4bq68= =bBO8 -----END PGP SIGNATURE-----
--On 21. Dezember 2009 10:01:37 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > It's not supposed to. Use pg_total_relation_size if you want a number > that includes index and toast space. I've created a C-Function a while ago that extracts the TOAST size for a given relation. This gave me the opportunity to do a pg_relation_size(oid) + pg_relation_toast_size(oid) for a given table oid to calculate on disk data size required by a table. Maybe we should include such a function in core? -- Thanks Bernd
Rafael Martinez <r.m.guerrero@usit.uio.no> writes: > I am probably missing the point here, why is it not supposed to show the > size of the table(data) *without* indexes? Because pg_relation_size is defined at the "physical" level of showing one relation, where relation means a pg_class entry. If you want agglomerations of multiple relations, you can use pg_total_relation_size, or build your own total if you have some other usage in mind. The one you propose seems fairly arbitrary --- for example, if it includes the toast relation, why not the toast relation's index too? It's not like either one is optional from the user's standpoint. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Bernd Helmle wrote: > > > --On 21. Dezember 2009 10:01:37 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> It's not supposed to. Use pg_total_relation_size if you want a number >> that includes index and toast space. > > I've created a C-Function a while ago that extracts the TOAST size for a > given relation. This gave me the opportunity to do a > pg_relation_size(oid) + pg_relation_toast_size(oid) for a given table > oid to calculate on disk data size required by a table. Maybe we should > include such a function in core? > It is a possibility. But I really think that pg_relation_size() not reporting the total size of the table (without indexes) is useless. toast is an internal way of organizing/saving data for tuples larger than the page size used by PostgreSQL. It is a mechanism transparent to the user and therefore pg_relation_size() should not differentiate between data saved via toast or not. The size of the table without the indexes should be reported regardless the technique used to save the data on the disk. regards, - --Rafael Martinez, <r.m.guerrero@usit.uio.no>Center for Information Technology ServicesUniversity of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.7 (GNU/Linux) iD8DBQFLL5ZHBhuKQurGihQRAoR8AJ97RoST3VHGCmcIOhkdRbJIWb3mnwCeN7Mm 7Oja4kmyrQfM6/RxyUE4K2A= =kxO9 -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Tom Lane wrote: > Rafael Martinez <r.m.guerrero@usit.uio.no> writes: >> I am probably missing the point here, why is it not supposed to show the >> size of the table(data) *without* indexes? > > Because pg_relation_size is defined at the "physical" level of showing > one relation, where relation means a pg_class entry. If you want > agglomerations of multiple relations, you can use > pg_total_relation_size, Ok, thanks for the clarification :-) The 'problem' is that as a developer with advanced knowledge of the postgres internals, you see a table as a group of relations (toast, indexes, toast relation's index, etc) A 'normal' user only sees a table and its indexes and this user misinterpret the use of the function "pg_relation_size() when it reads in the documentation "pg_relation_size(): Disk space used by the table or index ... " regards, - --Rafael Martinez, <r.m.guerrero@usit.uio.no>Center for Information Technology ServicesUniversity of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.7 (GNU/Linux) iD8DBQFLL5lwBhuKQurGihQRApt1AJ4wQS9+WSiUSAB6sSV6i/z0y0gZhwCfWq1Y BnnbddNedMMGCUGJ+X4eMMY= =yUsa -----END PGP SIGNATURE-----
Bernd Helmle wrote: > I've created a C-Function a while ago that extracts the TOAST size for > a given relation. This gave me the opportunity to do a > pg_relation_size(oid) + pg_relation_toast_size(oid) for a given table > oid to calculate on disk data size required by a table. Maybe we > should include such a function in core? Writing such a thing is already on my to-do list; it's absolutely a missing piece of the puzzle here. If you've got such a patch, by all means submit that. I just ran into my first heavily TOASTy database recently and the way I'm computing sizes on the relations there is too complicated for my tastes, so it's completely unreasonable to expect regular users to do that. To answer Rafael's concerns directly: you're right that this is confusing. pg_relation_size is always going to do what it does right now just because of how that fits into the design of the database. However, the documentation should be updated to warn against the issue with TOAST here. And it should be easier to get the total you're like to see here: main relation + toasted parts, since that's what most DBAs want in this area. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
Greg Smith <greg@2ndquadrant.com> writes: > To answer Rafael's concerns directly: you're right that this is > confusing. pg_relation_size is always going to do what it does right > now just because of how that fits into the design of the database. > However, the documentation should be updated to warn against the issue > with TOAST here. And it should be easier to get the total you're like > to see here: main relation + toasted parts, since that's what most DBAs > want in this area. Perhaps invent pg_table_size() = base table + toast table + toast index and pg_indexes_size() = all other indexes for table giving us the property pg_table_size + pg_indexes_size = pg_total_relation_size I think the 8.4 documentation already makes it apparent that pg_relation_size is a pretty low-level number. If we invent other functions with obvious names, that should be sufficient. regards, tom lane
Tom Lane wrote: > Perhaps invent pg_table_size() = base table + toast table + toast index > and pg_indexes_size() = all other indexes for table > giving us the property pg_table_size + pg_indexes_size = > pg_total_relation_size > Right; that's exactly the way I'm computing things now, I just have to crawl way too much catalog data to do it. I also agree that if we provide pg_table_size, the issue of "pg_relation_size doesn't do what I want" goes away without needing to even change the existing documentation--people don't come to that section looking for "relation", they're looking for "table". Bernd, there's a basic spec if you have time to work on this. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
On Mon, Dec 21, 2009 at 5:02 PM, Greg Smith <greg@2ndquadrant.com> wrote: > Tom Lane wrote: >> >> Perhaps invent pg_table_size() = base table + toast table + toast index >> and pg_indexes_size() = all other indexes for table >> giving us the property pg_table_size + pg_indexes_size = >> pg_total_relation_size >> > > Right; that's exactly the way I'm computing things now, I just have to crawl > way too much catalog data to do it. I also agree that if we provide > pg_table_size, the issue of "pg_relation_size doesn't do what I want" goes > away without needing to even change the existing documentation--people don't > come to that section looking for "relation", they're looking for "table". > > Bernd, there's a basic spec if you have time to work on this. What about, the visibility maps and free space maps? -- greg
Greg Stark <gsstark@mit.edu> writes: > On Mon, Dec 21, 2009 at 5:02 PM, Greg Smith <greg@2ndquadrant.com> wrote: >> Right; that's exactly the way I'm computing things now, I just have to crawl >> way too much catalog data to do it. �I also agree that if we provide >> pg_table_size, the issue of "pg_relation_size doesn't do what I want" goes >> away without needing to even change the existing documentation--people don't >> come to that section looking for "relation", they're looking for "table". >> >> Bernd, there's a basic spec if you have time to work on this. > What about, the visibility maps and free space maps? Those would be included for each relation, I should think. The objective here is not to break things down even more finely than pg_relation_size does, but to aggregate into terms that are meaningful to the user --- which is to say, "the table" and "its indexes". Anything you can't get rid of by dropping indexes/constraints is part of "the table" at this level of detail. regards, tom lane
2009/12/21 Tom Lane <tgl@sss.pgh.pa.us>: > Greg Smith <greg@2ndquadrant.com> writes: >> To answer Rafael's concerns directly: you're right that this is >> confusing. pg_relation_size is always going to do what it does right >> now just because of how that fits into the design of the database. >> However, the documentation should be updated to warn against the issue >> with TOAST here. And it should be easier to get the total you're like >> to see here: main relation + toasted parts, since that's what most DBAs >> want in this area. > > Perhaps invent pg_table_size() = base table + toast table + toast index > and pg_indexes_size() = all other indexes for table > giving us the property pg_table_size + pg_indexes_size = > pg_total_relation_size Did you mean :pg_table_size() = base table + toast tablepg_indexes_size() = base indexes + toast indexes ? > > I think the 8.4 documentation already makes it apparent that > pg_relation_size is a pretty low-level number. If we invent other > functions with obvious names, that should be sufficient. > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
--On 22. Dezember 2009 11:46:32 +0100 Cédric Villemain <cedric.villemain.debian@gmail.com> wrote: > Did you mean : > pg_table_size() = base table + toast table > pg_indexes_size() = base indexes + toast indexes > ? Since you always have a toast index automatically it makes sense to include them in pg_table_size(). -- Thanks Bernd
--On 21. Dezember 2009 12:02:02 -0500 Greg Smith <greg@2ndquadrant.com> wrote: > Tom Lane wrote: >> Perhaps invent pg_table_size() = base table + toast table + toast index >> and pg_indexes_size() = all other indexes for table >> giving us the property pg_table_size + pg_indexes_size = >> pg_total_relation_size >> > Right; that's exactly the way I'm computing things now, I just have to > crawl way too much catalog data to do it. I also agree that if we > provide pg_table_size, the issue of "pg_relation_size doesn't do what I > want" goes away without needing to even change the existing > documentation--people don't come to that section looking for "relation", > they're looking for "table". > > Bernd, there's a basic spec if you have time to work on this. I see if i can get some time for it during christmas vacation (its on my radar for a longer period of time). I'm still working on this NOT NULL pg_constraint representation and would like to propose a patch fairly soon for this. -- Thanks Bernd
Cédric Villemain <cedric.villemain.debian@gmail.com> writes: > 2009/12/21 Tom Lane <tgl@sss.pgh.pa.us>: >> Perhaps invent �pg_table_size() = base table + toast table + toast index >> and � � � � � � pg_indexes_size() = all other indexes for table >> giving us the property pg_table_size + pg_indexes_size = >> pg_total_relation_size > Did you mean : > pg_table_size() = base table + toast table > pg_indexes_size() = base indexes + toast indexes > ? No. regards, tom lane
--On 22. Dezember 2009 15:11:40 +0100 Bernd Helmle <mailings@oopsware.de> wrote: >> Bernd, there's a basic spec if you have time to work on this. > > I see if i can get some time for it during christmas vacation (its on my > radar for a longer period of time). I'm still working on this NOT NULL > pg_constraint representation and would like to propose a patch fairly > soon for this. Since i'm not able to finish those other things in time, i wrapped up my existing code for this issue and came up with the attached patch, which should implement the behavior Tom proposed. These are two new functions pg_table_size() and pg_indexes_size(). This patch also changes pg_total_relation_size() to be a shorthand for pg_table_size() + pg_indexes_size(). Barring any objections i'm adding this to the CF. -- Thanks Bernd
Attachment
Bernd Helmle wrote: > These are two new functions pg_table_size() and pg_indexes_size(). > This patch also changes pg_total_relation_size() to be a shorthand for > pg_table_size() + pg_indexes_size(). Attached is a test program to exercise these new functions. I thoroughly abuse generate_series and arrays to create a table with a few megabytes of both regular and TOAST-ed text, and with two indexes on it. Here's the results from a sample run (it's random data so each run will be a bit different): pg_relation_size | 11,755,520 pages_size | 11,755,520 toast_and_fsm | 22,159,360 pg_table_size | 33,914,880 pg_indexes_size | 524,288 pkey | 262,144 i | 262,144 pg_total_relation_size | 34,439,168 computed_total | 34,439,168 This seems to work as expected. You can see that pg_relation_size gives a really misleading value for this table, whereas the new pg_table_size does what DBAs were asking for here. Having pg_indexes_size around is handy too. I looked over the code a bit, everything in the patch looks clean too. The only question I'm left with after browsing the patch and staring at the above results is whether it makes sense to expose a pg_toast_size function. That would make the set available here capable of handling almost every situation somebody might want to know about, making this area completely done as I see it. In addition to being a useful shorthand on its own, that would then allow you to indirectly compute just the FSM size, which seems like an interesting number to know as feedback on what VACUUM is up to. It's easy enough to add, too: the calculate_toast_table_size code needed is already in the patch, just have to add another external function to expose it. I don't think there's any useful case for further exposing the two component parts of the toast size. If you're enough of a hacker to know what to do with those, you can certainly break them down yourself. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com \x DROP TABLE test; CREATE TABLE test(s SERIAL PRIMARY KEY,d TEXT); CREATE INDEX i on TEST(s); INSERT INTO test(d) SELECT array_to_string(array( SELECT chr(ascii('A') + (random() * 64)::integer) FROM generate_series(1,20000)),'') -- Size FROM generate_series(1,1000); -- Rows INSERT INTO test(d) SELECT array_to_string(array( SELECT chr(ascii('A') + (random() * 64)::integer) FROM generate_series(1,1000)),'') -- Size FROM generate_series(1,10000); -- Rows --insert into test (d) SELECT repeat('xyz123'::text,(1+random() * 1000)::integer) FROM generate_series(1,100000); ANALYZE test; SELECT pg_relation_size(relname::regclass),relpages * 8192 AS pages_size FROM pg_class where relname='test'; SELECT pg_table_size('test'::regclass) - pg_relation_size('test'::regclass) AS toast_and_fsm; SELECT pg_table_size('test'::regclass); SELECT pg_indexes_size('test'::regclass),pg_relation_size('test_pkey'::regclass) as pkey,pg_relation_size('i'::regclass)AS i; SELECT pg_total_relation_size('test'::regclass),pg_table_size('test'::regclass)+pg_indexes_size('test'::regclass) AS computed_total;
Greg Smith <greg@2ndquadrant.com> writes: > The only question I'm left with after browsing the patch and staring at > the above results is whether it makes sense to expose a pg_toast_size > function. That would make the set available here capable of handling > almost every situation somebody might want to know about, making this > area completely done as I see it. In addition to being a useful > shorthand on its own, that would then allow you to indirectly compute > just the FSM size, which seems like an interesting number to know as > feedback on what VACUUM is up to. It's easy enough to add, too: the > calculate_toast_table_size code needed is already in the patch, just > have to add another external function to expose it. > I don't think there's any useful case for further exposing the two > component parts of the toast size. If you're enough of a hacker to know > what to do with those, you can certainly break them down yourself. Hmm ... those opinions seem a bit contradictory. If you're enough of a hacker to know what FSM is, you can subtract off the toast size for yourself no? I'm inclined to think that table vs. index is the right level of abstraction for these functions, and that breaking it down further than that isn't all that helpful. We have the bottom-level information (per-fork relation size) available for those who really want the details. regards, tom lane
Tom Lane wrote: > I'm inclined to think that table vs. index is the right level of > abstraction for these functions, and that breaking it down further than > that isn't all that helpful. We have the bottom-level information > (per-fork relation size) available for those who really want the > details. > Fair enough; this certainly knocks off all the important stuff already, just wanted final sanity check opinion. This one is ready for a committer to look at now. My test case seems to work fine with a moderately complex set of things to navigate. The main think I'm not familiar enough with to have looked at deeply is exactly how the FSM and toast computations are done, to check if there's any corner cases in how it navigates forks and such that aren't considered. -- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
Bernd Helmle <mailings@oopsware.de> writes: > Since i'm not able to finish those other things in time, i wrapped up my > existing code for this issue and came up with the attached patch, which > should implement the behavior Tom proposed. These are two new functions > pg_table_size() and pg_indexes_size(). This patch also changes > pg_total_relation_size() to be a shorthand for pg_table_size() + > pg_indexes_size(). Applied with minor corrections. regards, tom lane