Thread: pg_class -> reltuples?

pg_class -> reltuples?

From
Neil Conway
Date:
Hi all,

Is there a reason why the reltuples column of pg_class is stored as a
"real", rather than one of the integer data types? Are there any
situations in which there will be a non-integer value stored in this
column?

Cheers,

Neil

P.S. I tried to search the archives, but archives.postgresql.org is so
slow, it's basically unusable. So my apologies if this has already been
discussed...
-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC



Re: pg_class -> reltuples?

From
Bruce Momjian
Date:
Neil Conway wrote:
> Hi all,
> 
> Is there a reason why the reltuples column of pg_class is stored as a
> "real", rather than one of the integer data types? Are there any
> situations in which there will be a non-integer value stored in this
> column?

That is an excellent question.  I assume it is related to having > 4
billion rows, but we have int8 for that.  The value is used mostly by
the optimizer, which does most of its calcultions using float8 (real),
so that may be why.

> P.S. I tried to search the archives, but archives.postgresql.org is so
> slow, it's basically unusable. So my apologies if this has already been
> discussed...

Yes, it is hampering me from researching some of these patches too, and
fts is completely down.  If I could just get a web page of all the
threads (forget searching), I would be happy. The archives site contents
hasn't been updated since Feb 28.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_class -> reltuples?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Neil Conway wrote:
>> Is there a reason why the reltuples column of pg_class is stored as a
>> "real", rather than one of the integer data types?

> That is an excellent question.  I assume it is related to having > 4
> billion rows, but we have int8 for that.

1. We support tables > 4G rows.

2. int8 is not available on all platforms.

3. The only use for reltuples is in the optimizer, which is perfectly  content with approximate values.
        regards, tom lane


Archive search (was: pg_class -> reltuples?)

From
Ian Barwick
Date:
On Thursday 07 March 2002 23:11, Bruce Momjian wrote:
> Neil Conway wrote:
>
> > P.S. I tried to search the archives, but archives.postgresql.org is so
> > slow, it's basically unusable. So my apologies if this has already been
> > discussed...
>
> Yes, it is hampering me from researching some of these patches too, and
> fts is completely down.  If I could just get a web page of all the
> threads (forget searching), I would be happy. The archives site contents
> hasn't been updated since Feb 28.

maybe google?

http://groups.google.com/groups?hl=en&group=comp.databases.postgresql.hackers

Though a cursory glance shows some mails which went over the
list aren't there, particularly the most recent threads are pretty patchy, pun
unintended..

Ian Barwick


Re: Archive search (was: pg_class -> reltuples?)

From
Bruce Momjian
Date:
> maybe google?
> 
> http://groups.google.com/groups?hl=en&group=comp.databases.postgresql.hackers
> 
> Though a cursory glance shows some mails which went over the
> list aren't there, particularly the most recent threads are pretty patchy, pun
> unintended..

Thanks. That is a huge help.  In fact, this lists all the groups:
http://groups.google.com/groups?hl=en&group=comp.databases.postgresql

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: pg_class -> reltuples?

From
Neil Conway
Date:
On Thu, 2002-03-07 at 17:51, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Neil Conway wrote:
> >> Is there a reason why the reltuples column of pg_class is stored as a
> >> "real", rather than one of the integer data types?
> 
> > That is an excellent question.  I assume it is related to having > 4
> > billion rows, but we have int8 for that.
> 
> 1. We support tables > 4G rows.

I agree we should try to support very large tables -- so why waste space
on storing floating point? And am I missing something, or is a "real"
only 4 bytes?

> 2. int8 is not available on all platforms.

I have no problem making restrictions on data types for portability, but
at least we should be consistent:

% grep -rI 'long long' * | wc -l    37
% grep -rI 'int64' * | wc -l   191

On all the platforms I tested (x86, SPARC, PPC, PA-RISC, Alpha), a 'long
long' is supported, and is 8 bytes. Which platforms don't have this, and
are we actively supporting them?

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC



Re: pg_class -> reltuples?

From
Tom Lane
Date:
Neil Conway <nconway@klamath.dyndns.org> writes:
> I have no problem making restrictions on data types for portability, but
> at least we should be consistent:

We *are* consistent.  int8 is not used in the system catalogs, and where
it is used, the system will continue to function if it's implemented as
a 32-bit datatype.  (At least, things still worked the last time I tried
turning off HAVE_LONG_LONG_INT.  If someone broke it since then, it
needs to be fixed.)
        regards, tom lane


Re: pg_class -> reltuples?

From
Neil Conway
Date:
On Thu, 2002-03-07 at 19:54, Tom Lane wrote:
> Neil Conway <nconway@klamath.dyndns.org> writes:
> > I have no problem making restrictions on data types for portability, but
> > at least we should be consistent:
> 
> We *are* consistent.  int8 is not used in the system catalogs, and where
> it is used, the system will continue to function if it's implemented as
> a 32-bit datatype.  (At least, things still worked the last time I tried
> turning off HAVE_LONG_LONG_INT.  If someone broke it since then, it
> needs to be fixed.)

9 regression tests fail without HAVE_LONG_LONG_INT on a 32-bit machine
(int8, constraints, select_implicit, select_having, subselect, union,
aggregates, misc, rules). It's pretty obvious that int8 should fail, but
the others look like bugs.

As for the original question, maybe I'm missing something obvious, but
is there a reason why reltuples can't be an int8? (which is already
typedef'ed to a int4 on broken machines/compilers) This would mean that
on machines without a 64-bit int type, tables greater than 2^32 rows
can't be stored (or at least, reltuples breaks). But I'm inclined to
dismiss those platforms as broken, anyway...

In any case, I think the current situation is the wrong way around:
we're using a workaround on _all_ platforms, just to avoid breaking a
few old systems. Wouldn't it make more sense to use an int8 by default,
and fall back to a floating-point workaround if the default, optimal
solution isn't available?

Cheers,

Neil

-- 
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC



Re: pg_class -> reltuples?

From
Tom Lane
Date:
Neil Conway <nconway@klamath.dyndns.org> writes:
> 9 regression tests fail without HAVE_LONG_LONG_INT on a 32-bit machine
> (int8, constraints, select_implicit, select_having, subselect, union,
> aggregates, misc, rules). It's pretty obvious that int8 should fail, but
> the others look like bugs.

I think int8_tbl may be used in some of the other tests, so diffs there
are not necessarily a big deal.  Did you examine the diffs closely?

> As for the original question, maybe I'm missing something obvious, but
> is there a reason why reltuples can't be an int8? (which is already
> typedef'ed to a int4 on broken machines/compilers)

Yes: it won't work.  If reltuples is construed to be 8 bytes by some
compilers and 4 bytes by others, then the struct definition will fail to
overlay onto the storage as seen by the general-purpose tuple access
routines.  (We could maybe fix that by having pg_type.h and some other
places conditionally compile the declared size of type int8, but it
ain't worth the trouble.)

> This would mean that
> on machines without a 64-bit int type, tables greater than 2^32 rows
> can't be stored (or at least, reltuples breaks). But I'm inclined to
> dismiss those platforms as broken, anyway...

Sorry, but I have very little patience for arguments that "if it works
on all the machines I use, it's good enough".  Especially for a case
like this, where there is zero advantage to using int8 anyway.
Using a float here is not a "workaround", it's the right thing to do.
(The optimizer would only have to convert it to float anyway for its
internal calculations.)

> Wouldn't it make more sense to use an int8 by default,
> and fall back to a floating-point workaround if the default, optimal
> solution isn't available?

So the user-visible column types of pg_class would vary depending on
this implementation detail?  Not a good idea IMHO.
        regards, tom lane