Thread: pg_class.relpages/allvisible probably shouldn't be a int4
Hi, postgres=# SELECT relpages FROM pg_class WHERE relname = 'large'; relpages --------------1804468224 (1 row) postgres=# \dt+ large List of relationsSchema | Name | Type | Owner | Size | Description --------+-------+-------+--------+-------+-------------public | large | table | andres | 19 TB | (1 row) That's nothing for 9.4 anymore, but shouldn't we make pg_class.relpages a int8 (sounds slightly better than float to me) or somesuch? I think most calculations actually work out because they're performed after casting to BlockNumber, but ... Greetings, Andres Freund PS: _mdfd_getseg is considered harmful(). Not fun once you have couple of hundred gigabytes. --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > That's nothing for 9.4 anymore, but shouldn't we make pg_class.relpages > a int8 (sounds slightly better than float to me) or somesuch? No; those are really BlockNumbers, and have always been. float4 would lose information and float8 or int8 would waste space. If we had an unsigned int type it'd be better. I suppose we could declare them as OID, but that would probably confuse people no end. regards, tom lane
On May 9, 2014 10:37:49 PM CEST, Tom Lane <tgl@sss.pgh.pa.us> wrote: >Andres Freund <andres@2ndquadrant.com> writes: >> That's nothing for 9.4 anymore, but shouldn't we make >pg_class.relpages >> a int8 (sounds slightly better than float to me) or somesuch? > >No; those are really BlockNumbers, and have always been. float4 would >lose information and float8 or int8 would waste space. If we had an >unsigned int type it'd be better. I suppose we could declare them as >OID, >but that would probably confuse people no end. Well negative numbers aren't great either. Although admittedly it's not yet affecting many... I think the waste of storing 2*4 additional bytes isn't going to hurt much. And adding a proper unsigned type doesn't sound like a small amount of work. Not to speak of overloading troubles.... I realize they are block numbers and casted in most places - that's why the overflow doesn't seem to cause too many troubles. Andres -- Please excuse brevity and formatting - I am writing this on my mobile phone. Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, May 9, 2014 at 1:50 PM, Andres Freund <andres@2ndquadrant.com> wrote: > And adding a proper unsigned type doesn't sound like a small amount of work. Perhaps not, but it's overdue. We ought to have one. -- Peter Geoghegan
On 2014-05-10 23:21:34 -0700, Peter Geoghegan wrote: > On Fri, May 9, 2014 at 1:50 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > And adding a proper unsigned type doesn't sound like a small amount of work. > > Perhaps not, but it's overdue. We ought to have one. Maybe. But there's so many things to decide around it that I don't think it's a good prerequisite for not showing essentially corrupted values in a supported scenario. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-05-10 23:21:34 -0700, Peter Geoghegan wrote: >> On Fri, May 9, 2014 at 1:50 PM, Andres Freund <andres@2ndquadrant.com> wrote: >>> And adding a proper unsigned type doesn't sound like a small amount of work. >> Perhaps not, but it's overdue. We ought to have one. > Maybe. But there's so many things to decide around it that I don't think > it's a good prerequisite for not showing essentially corrupted values in > a supported scenario. It's a lot harder than it sounds at first; see past discussions about how we could shoehorn one into the numeric type hierarchy. And think about how C expressions that mix signed and unsigned inputs tend to give surprising results :-( The bigger picture though is that it's hard to get excited about this particular scenario, because if you are up to the point where your table size overflows int32, you are less than a factor of 2 away from the hard limit of BlockNumber, and will therefore soon have much bigger problems to worry about than whether pg_class.relpages prints confusingly. My advice to anyone who reported this from the field would certainly be "time to think about partitioning that table". So if I were to take Andres' complaint seriously at all, I'd be thinking in terms of "do we need to widen BlockNumber to int64?", not "how do we make this print as unsigned?". But I doubt such a proposal would fly, because of the negative impact on index sizes. A possible cosmetic fix is to just clamp the value vacuum/analyze will store into relpages at INT_MAX, while scaling reltuples so that the all-important reltuples/relpages ratio stays at reality. But that might be harder than it sounds too, because of the moving average tracking behavior for reltuples. And it'd be a code path that gets no meaningful testing :-( regards, tom lane
On 2014-05-11 12:24:30 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2014-05-10 23:21:34 -0700, Peter Geoghegan wrote: > >> On Fri, May 9, 2014 at 1:50 PM, Andres Freund <andres@2ndquadrant.com> wrote: > >>> And adding a proper unsigned type doesn't sound like a small amount of work. > > >> Perhaps not, but it's overdue. We ought to have one. > > > Maybe. But there's so many things to decide around it that I don't think > > it's a good prerequisite for not showing essentially corrupted values in > > a supported scenario. > > It's a lot harder than it sounds at first; see past discussions about how > we could shoehorn one into the numeric type hierarchy. And think about > how C expressions that mix signed and unsigned inputs tend to give > surprising results :-( Yea. I really don't like to take on such a major project to solve a minor problem. What I am thinking about right now is to expose a 'pg_blocknumber' type. That only does very basic operations and implicitly casts to int64. That's probably a much more handleable problem and it also might give us some more experience with unsigned types. Comments? > The bigger picture though is that it's hard to get excited about this > particular scenario, because if you are up to the point where your table > size overflows int32, you are less than a factor of 2 away from the hard > limit of BlockNumber, and will therefore soon have much bigger problems to > worry about than whether pg_class.relpages prints confusingly. Well. It takes several years to get to 16TB, so properly supporting >16TB && < 32TB is a pretty good step. It'll allow a couple of years continued operation. > My advice > to anyone who reported this from the field would certainly be "time to > think about partitioning that table". That definitely got to be the mid to longterm plan. The problem is that our partitioning sucks. Majorly. My hope is that by allowing that large tables to work more properly we have time to improve partitioning to the level that it doesn't basically requires to remove all nontrivial constraints. > So if I were to take Andres' > complaint seriously at all, I'd be thinking in terms of "do we need to > widen BlockNumber to int64?", not "how do we make this print as > unsigned?". But I doubt such a proposal would fly, because of the > negative impact on index sizes. Yea, I am not wild for that either. I guess migrating to a postgres with a larger blocksize is the next step. > A possible cosmetic fix is to just clamp the value vacuum/analyze > will store into relpages at INT_MAX, while scaling reltuples so that > the all-important reltuples/relpages ratio stays at reality. But > that might be harder than it sounds too, because of the moving > average tracking behavior for reltuples. And it'd be a code path > that gets no meaningful testing :-( I am absolutely not a fan of that either :(. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 05/12/2014 12:30 AM, Andres Freund wrote: >> >So if I were to take Andres' >> >complaint seriously at all, I'd be thinking in terms of "do we need to >> >widen BlockNumber to int64?", not "how do we make this print as >> >unsigned?". But I doubt such a proposal would fly, because of the >> >negative impact on index sizes. > Yea, I am not wild for that either. I guess migrating to a postgres with > a larger blocksize is the next step. A larger block size won't buy you very much time either. We could steal some bits from the OffsetNumber portion of an ItemPointer. If we assume the max. block size of 32kb, and that each Item takes at least 16 bytes, you only need 11 bits for the offset number. That leaves 5 bits unused, and if we use them to expand the block number to 37 bits in total, that's enough for 1 PB with the default 8k block size. But I concur that in practice, if you're dealing with 16TB tables, it's time to partition. - Heikki
On 2014-05-12 10:07:29 +0300, Heikki Linnakangas wrote: > On 05/12/2014 12:30 AM, Andres Freund wrote: > >>>So if I were to take Andres' > >>>complaint seriously at all, I'd be thinking in terms of "do we need to > >>>widen BlockNumber to int64?", not "how do we make this print as > >>>unsigned?". But I doubt such a proposal would fly, because of the > >>>negative impact on index sizes. > >Yea, I am not wild for that either. I guess migrating to a postgres with > >a larger blocksize is the next step. > > A larger block size won't buy you very much time either. Well. If you mean 'a year or five with that... :) > We could steal some bits from the OffsetNumber portion of an ItemPointer. If > we assume the max. block size of 32kb, and that each Item takes at least 16 > bytes, you only need 11 bits for the offset number. That leaves 5 bits > unused, and if we use them to expand the block number to 37 bits in total, > that's enough for 1 PB with the default 8k block size. Hm. That's not a generally bad idea. I think we'll have to do that in a couple of years. Regardless of better partitioning. > But I concur that in practice, if you're dealing with 16TB tables, it's time > to partition. Well, we need to improve our partitioning for that to be viable for all relations. Not having usable foreign and unique keys makes it a pita in some cases. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-05-12 10:07:29 +0300, Heikki Linnakangas wrote: >> But I concur that in practice, if you're dealing with 16TB tables, it's time >> to partition. > Well, we need to improve our partitioning for that to be viable for all > relations. Not having usable foreign and unique keys makes it a pita in > some cases. Well, yeah, but that's on the to-do list in any case. regards, tom lane
On 2014-05-11 23:30:37 +0200, Andres Freund wrote: > On 2014-05-11 12:24:30 -0400, Tom Lane wrote: > > Andres Freund <andres@2ndquadrant.com> writes: > > > On 2014-05-10 23:21:34 -0700, Peter Geoghegan wrote: > > >> On Fri, May 9, 2014 at 1:50 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > >>> And adding a proper unsigned type doesn't sound like a small amount of work. > > > > >> Perhaps not, but it's overdue. We ought to have one. > > > > > Maybe. But there's so many things to decide around it that I don't think > > > it's a good prerequisite for not showing essentially corrupted values in > > > a supported scenario. > > > > It's a lot harder than it sounds at first; see past discussions about how > > we could shoehorn one into the numeric type hierarchy. And think about > > how C expressions that mix signed and unsigned inputs tend to give > > surprising results :-( > > Yea. I really don't like to take on such a major project to solve a > minor problem. > What I am thinking about right now is to expose a 'pg_blocknumber' > type. That only does very basic operations and implicitly casts to > int64. That's probably a much more handleable problem and it also might > give us some more experience with unsigned types. Comments? As a note towards that: e.g. pageinspect deals with blocknumbers and uses int4 for that. That makes accessing the higher blocks really awkward... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 12 May 2014 08:15, Andres Freund <andres@2ndquadrant.com> wrote: >> But I concur that in practice, if you're dealing with 16TB tables, it's time >> to partition. > > Well, we need to improve our partitioning for that to be viable for all > relations. Not having usable foreign and unique keys makes it a pita in > some cases. As discussed, declarative partitioning is on the roadmap for this next release, so I would say lets just document that tablesizes above 16TB don't report correctly and move on. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-05-30 12:00:47 +0100, Simon Riggs wrote: > On 12 May 2014 08:15, Andres Freund <andres@2ndquadrant.com> wrote: > > >> But I concur that in practice, if you're dealing with 16TB tables, it's time > >> to partition. > > > > Well, we need to improve our partitioning for that to be viable for all > > relations. Not having usable foreign and unique keys makes it a pita in > > some cases. > > As discussed, declarative partitioning is on the roadmap for this next > release, so I would say lets just document that tablesizes above 16TB > don't report correctly and move on. I doubt we'll fix all the snags - like foreign keys, unique keys, etc - that partitioning has in this release... Introducing a blocknumber type seems easy and mechanical enough. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote: > On 2014-05-11 23:30:37 +0200, Andres Freund wrote: > > Yea. I really don't like to take on such a major project to solve a > > minor problem. > > What I am thinking about right now is to expose a 'pg_blocknumber' > > type. That only does very basic operations and implicitly casts to > > int64. That's probably a much more handleable problem and it also might > > give us some more experience with unsigned types. Comments? > > As a note towards that: e.g. pageinspect deals with blocknumbers and > uses int4 for that. That makes accessing the higher blocks really > awkward... Sounds like pg_blocknumber could be a good idea, but I wouldn't add the bigint implicit casts to int64. We might enlarge BlockNumber to unsigned 64 bits in the future -- we will regret the implicit casts to signed int64 then. Anyway it's not like implicit casts will get us any extra functionality. We can just add + and - operators, and that should cover 99% of what people would initially want ... Now, adding casts FROM int32 and int64 to pg_blocknumber doesn't sound bad. (Maybe have them error out if the value is negative?) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services