Thread: Is MaxHeapAttributeNumber a reasonable restriction for foreign-tables?

Is MaxHeapAttributeNumber a reasonable restriction for foreign-tables?

From
Kohei KaiGai
Date:
Hello,

I noticed that CheckAttributeNamesTypes() prevents creating a table with
more than MaxHeapAttributeNumber (1600) columns, and this applies to foreign tables too.
IIUC, this magic number comes from the maximum null-bitmap length that can be
covered by t_hoff in HeapTupleHeaderData.
For heap tables, this seems to me a reasonable restriction to prevent overrunning
the null bitmap. On the other hand, do we have a good reason to apply the same
restriction to foreign tables?

Foreign tables have their own internal data structures instead of
PostgreSQL's heap format, and some foreign data sources have thousands of
attributes in their structured data.
I think that MaxHeapAttributeNumber is a senseless restriction for foreign
tables. What do you think?

Best regards,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>



Re: Is MaxHeapAttributeNumber a reasonable restriction for foreign-tables?

From
Amit Langote
Date:
Hello,

On Thu, Feb 4, 2021 at 4:24 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> I noticed that CheckAttributeNamesTypes() prevents creating a table with
> more than MaxHeapAttributeNumber (1600) columns, and this applies to foreign tables too.
> IIUC, this magic number comes from the maximum null-bitmap length that can be
> covered by t_hoff in HeapTupleHeaderData.
> For heap tables, this seems to me a reasonable restriction to prevent overrunning
> the null bitmap. On the other hand, do we have a good reason to apply the same
> restriction to foreign tables?
>
> Foreign tables have their own internal data structures instead of
> PostgreSQL's heap format, and some foreign data sources have thousands of
> attributes in their structured data.
> I think that MaxHeapAttributeNumber is a senseless restriction for foreign
> tables. What do you think?

My first reaction to this was a suspicion that the
MaxHeapAttributeNumber limit would be too ingrained in PostgreSQL's
architecture to consider this matter lightly, but actually browsing
the code, that may not really be the case.  Other than
src/backend/access/heap/*, here are the places that check it:

catalog/heap.c: CheckAttributeNamesTypes() that you mentioned:

    /* Sanity check on column count */
    if (natts < 0 || natts > MaxHeapAttributeNumber)
        ereport(ERROR,
                (errcode(ERRCODE_TOO_MANY_COLUMNS),
                 errmsg("tables can have at most %d columns",
                        MaxHeapAttributeNumber)));

tablecmds.c: MergeAttributes():

    /*
     * Check for and reject tables with too many columns. We perform this
     * check relatively early for two reasons: (a) we don't run the risk of
     * overflowing an AttrNumber in subsequent code (b) an O(n^2) algorithm is
     * okay if we're processing <= 1600 columns, but could take minutes to
     * execute if the user attempts to create a table with hundreds of
     * thousands of columns.
     *
     * Note that we also need to check that we do not exceed this figure after
     * including columns from inherited relations.
     */
    if (list_length(schema) > MaxHeapAttributeNumber)
        ereport(ERROR,
                (errcode(ERRCODE_TOO_MANY_COLUMNS),
                 errmsg("tables can have at most %d columns",
                        MaxHeapAttributeNumber)));


tablecmds.c: ATExecAddColumn():

    /* Determine the new attribute's number */
    newattnum = ((Form_pg_class) GETSTRUCT(reltup))->relnatts + 1;
    if (newattnum > MaxHeapAttributeNumber)
        ereport(ERROR,
                (errcode(ERRCODE_TOO_MANY_COLUMNS),
                 errmsg("tables can have at most %d columns",
                        MaxHeapAttributeNumber)));

So, unless I am terribly wrong, we may have a shot at revisiting the
decision that set this limit.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Is MaxHeapAttributeNumber a reasonable restriction for foreign-tables?

From
Tom Lane
Date:
Amit Langote <amitlangote09@gmail.com> writes:
> On Thu, Feb 4, 2021 at 4:24 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
>> I think that MaxHeapAttributeNumber is a senseless restriction for foreign
>> tables. What do you think?

> My first reaction to this was a suspicion that the
> MaxHeapAttributeNumber limit would be too ingrained in PostgreSQL's
> architecture to consider this matter lightly, but actually browsing
> the code, that may not really be the case.

You neglected to search for MaxTupleAttributeNumber...

I'm quite skeptical of trying to raise this limit significantly.

In the first place, you'd have to worry about the 2^15 limit on
int16 AttrNumbers --- and keep in mind that that has to be enough
for reasonable-size joins, not only an individual table.  If you
join a dozen or so max-width tables, you're already most of the way
to that limit.

In the second place, as noted by the comment you quoted, there are
algorithms in various places that are O(N^2) (or maybe even worse?)
in the number of columns they're dealing with.

In the third place, I've yet to see a use-case that didn't represent
crummy table design.  Pushing the table off to a remote server doesn't
make it less crummy design.

            regards, tom lane



Re: Is MaxHeapAttributeNumber a reasonable restriction for foreign-tables?

From
Amit Langote
Date:
On Thu, Feb 4, 2021 at 11:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > On Thu, Feb 4, 2021 at 4:24 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> >> I think that MaxHeapAttributeNumber is a senseless restriction for foreign
> >> tables. What do you think?
>
> > My first reaction to this was a suspicion that the
> > MaxHeapAttributeNumber limit would be too ingrained in PostgreSQL's
> > architecture to consider this matter lightly, but actually browsing
> > the code, that may not really be the case.
>
> You neglected to search for MaxTupleAttributeNumber...

Ah, I did.  That said, even its usage seems mostly limited to modules
under src/backend/access/heap.

> I'm quite skeptical of trying to raise this limit significantly.
>
> In the first place, you'd have to worry about the 2^15 limit on
> int16 AttrNumbers --- and keep in mind that that has to be enough
> for reasonable-size joins, not only an individual table.  If you
> join a dozen or so max-width tables, you're already most of the way
> to that limit.
>
> In the second place, as noted by the comment you quoted, there are
> algorithms in various places that are O(N^2) (or maybe even worse?)
> in the number of columns they're dealing with.

Those are certainly intimidating considerations.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: Is MaxHeapAttributeNumber a reasonable restriction for foreign-tables?

From
Kohei KaiGai
Date:
On Thu, Feb 4, 2021 at 11:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Langote <amitlangote09@gmail.com> writes:
> > On Thu, Feb 4, 2021 at 4:24 PM Kohei KaiGai <kaigai@heterodb.com> wrote:
> >> I think that MaxHeapAttributeNumber is a senseless restriction for foreign
> >> tables. What do you think?
>
> > My first reaction to this was a suspicion that the
> > MaxHeapAttributeNumber limit would be too ingrained in PostgreSQL's
> > architecture to consider this matter lightly, but actually browsing
> > the code, that may not really be the case.
>
> You neglected to search for MaxTupleAttributeNumber...
>
> I'm quite skeptical of trying to raise this limit significantly.
>
> In the first place, you'd have to worry about the 2^15 limit on
> int16 AttrNumbers --- and keep in mind that that has to be enough
> for reasonable-size joins, not only an individual table.  If you
> join a dozen or so max-width tables, you're already most of the way
> to that limit.
>
free_parsestate() also rejects target lists with more than
MaxTupleAttributeNumber entries.
(But that is a reasonable restriction, because we cannot guarantee that a
HeapTupleTableSlot is not used during query execution.)

> In the second place, as noted by the comment you quoted, there are
> algorithms in various places that are O(N^2) (or maybe even worse?)
> in the number of columns they're dealing with.
>
Isn't that only at table creation time?
If N is not small (say, more than 100), we could use a temporary HTAB to
check that no duplicate column names are supplied.

> In the third place, I've yet to see a use-case that didn't represent
> crummy table design.  Pushing the table off to a remote server doesn't
> make it less crummy design.
>
I ran into this limitation when creating a foreign table that maps an Apache
Arrow file containing ~2,500 attributes of scientific observation data.
Apache Arrow uses a columnar format internally, and queries on this data set
reference only 10-15 columns on average, so this layout makes query execution
much more efficient.

Thanks,
--
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei <kaigai@heterodb.com>