Thread: TOAST condition for column size

TOAST condition for column size

From: torikoshia
Hi,

When I created a table consisting of 400 VARCHAR columns and tried
to INSERT a record whose columns were all the same size, there were
cases where I got an error for exceeding the size limit per row.

   =# -- create a table consisting of 400 VARCHAR columns
   =# CREATE TABLE t1 (c1 VARCHAR(100),
                       c2 VARCHAR(100),
                       ...
                       c400 VARCHAR(100));

   =# -- insert one record whose columns are all 20 bytes
   =# INSERT INTO t1 VALUES (repeat('a', 20),
                             repeat('a', 20),
                             ...
                             repeat('a', 20));
     ERROR:  row is too big: size 8424, maximum size 8160
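
(The reported size seems to add up: a 24-byte heap tuple header plus
400 columns, each stored as a 1-byte short varlena header followed
by 20 data bytes, gives 24 + 400 * 21 = 8424 bytes.)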

What is interesting is that it failed only when the size of each
column was 20~23 bytes, as shown below.

   size of each column  |  result
   -------------------------------
           18 bytes     |  success
           19 bytes     |  success
           20 bytes     |  failure
           21 bytes     |  failure
           22 bytes     |  failure
           23 bytes     |  failure
           24 bytes     |  success
           25 bytes     |  success


When the size of each column is 19 bytes or less, the INSERT
succeeds because the row fits within one page.
When the size of each column is 24 bytes or more, it also succeeds
because the columns are TOASTed and the row size is reduced to less
than one page.
OTOH, when the size is more than 19 bytes and less than 24 bytes,
the columns aren't TOASTed because they don't meet the condition of
the following if statement.

  -- src/backend/access/table/toast_helper.c

    toast_tuple_find_biggest_attribute(ToastTupleContext *ttc,
                          bool for_compression, bool check_main)
        ...(snip)...
        int32        biggest_size = MAXALIGN(TOAST_POINTER_SIZE);
        ...(snip)...
        if (ttc->ttc_attr[i].tai_size > biggest_size) // <- here
        {
            biggest_attno = i;
            biggest_size = ttc->ttc_attr[i].tai_size;
        }


Since TOAST_POINTER_SIZE is 18 bytes but
MAXALIGN(TOAST_POINTER_SIZE) is 24 bytes, columns are not TOASTed
until their size becomes larger than 24 bytes.

I confirmed these sizes in my environment, but AFAIU they would be
the same in any environment.
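
Just to double-check that arithmetic, here is a minimal standalone
sketch; the macro definitions are copied from c.h, with
MAXIMUM_ALIGNOF and TOAST_POINTER_SIZE hard-coded to their values on
typical 64-bit platforms:

    #include <stdio.h>
    #include <stdint.h>

    /* Copied from c.h, assuming MAXIMUM_ALIGNOF is 8 */
    #define MAXIMUM_ALIGNOF 8
    #define TYPEALIGN(ALIGNVAL, LEN) \
        (((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & \
         ~((uintptr_t) ((ALIGNVAL) - 1)))
    #define MAXALIGN(LEN) TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))

    /* 18 bytes: 2 header bytes plus a 16-byte varatt_external */
    #define TOAST_POINTER_SIZE 18

    int main(void)
    {
        /* Prints 18 and 24: attrs of 19~24 bytes never exceed the
         * MAXALIGN'd threshold, so they are skipped for toasting. */
        printf("%d\n", TOAST_POINTER_SIZE);
        printf("%lu\n", (unsigned long) MAXALIGN(TOAST_POINTER_SIZE));
        return 0;
    }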

So, as a result of the alignment adjustment, column sizes of 20~23
bytes fail.

I wonder whether it might be better not to apply the alignment
here, as in the attached patch, since that allowed records with
20~23 byte columns to be inserted.
Or is there a reason to apply the alignment here?

I understand that TOAST is not effective for small data and that
creating a table with hundreds of columns is not recommended, but I
think cases that can succeed should succeed.

Any thoughts?


Regards,

--
Atsushi Torikoshi

Re: TOAST condition for column size

From: Amit Kapila
On Mon, Jan 18, 2021 at 7:53 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> ...(snip)...
>
> I confirmed these sizes in my environment, but AFAIU they would be
> the same in any environment.
>
> So, as a result of the alignment adjustment, column sizes of 20~23
> bytes fail.
>
> I wonder whether it might be better not to apply the alignment
> here, as in the attached patch, since that allowed records with
> 20~23 byte columns to be inserted.
> Or is there a reason to apply the alignment here?
>

Because no benefit is to be expected from compressing it; the size
will be mostly the same. Also, even if we somehow try to fit this
data via TOAST, I think reading speed will be slower because an
extra fetch from the toast table would be required for all such
columns. Another thing is that you or others can still face the same
problem with 17-byte column data. I don't think this is the right
way to fix it. I don't have many good ideas, but you could try (a)
increasing the block size at configure time, (b) reducing the number
of columns, or (c) creating char columns of a somewhat bigger size,
say greater than 24 bytes, to accommodate your case.
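
For (a), with a source build that would be something like running
configure with --with-blocksize=16 to get 16kB pages; note that the
block size is part of the on-disk format, so it requires a fresh
initdb.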

I know none of these are good workarounds, but at this moment I
can't think of better alternatives.

-- 
With Regards,
Amit Kapila.



Re: TOAST condition for column size

From: Dilip Kumar
On Mon, Jan 18, 2021 at 7:53 PM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> ...(snip)...
>
> I wonder whether it might be better not to apply the alignment
> here, as in the attached patch, since that allowed records with
> 20~23 byte columns to be inserted.
> Or is there a reason to apply the alignment here?
>
> ...(snip)...
>
> Any thoughts?

How can this be correct? While forming the tuple you might need the
alignment. So basically, while computing the size we are not
considering alignment, but later, while actually forming the tuple,
we might have to align it, so it seems like this could create
corruption while forming the tuple.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: TOAST condition for column size

From: Amit Kapila
On Tue, Jan 19, 2021 at 5:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> ...(snip)...
>
> How can this be correct? While forming the tuple you might need
> the alignment.
>

Won't it be safe because we don't align individual attrs of type
varchar whose length is less than or equal to 127?
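
For reference, here is a minimal standalone sketch of that rule,
modeled on VARATT_CAN_MAKE_SHORT in postgres.h with the constants
hard-coded for illustration; attrs that pass this check are stored
with a 1-byte header and no alignment padding:

    #include <stdio.h>

    /* Hard-coded for illustration; cf. postgres.h */
    #define VARHDRSZ_SHORT   1     /* 1-byte varlena header */
    #define VARATT_SHORT_MAX 0x7F  /* 127: max total short-varlena size */

    /*
     * A datum with 'datalen' payload bytes can use the unaligned
     * 1-byte header form when payload + header fits in 127 bytes.
     */
    static int
    can_make_short(int datalen)
    {
        return datalen + VARHDRSZ_SHORT <= VARATT_SHORT_MAX;
    }

    int main(void)
    {
        printf("%d\n", can_make_short(20));   /* 1: stored unaligned */
        printf("%d\n", can_make_short(126));  /* 1: still short form */
        printf("%d\n", can_make_short(127));  /* 0: needs 4-byte header */
        return 0;
    }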

-- 
With Regards,
Amit Kapila.



Re: TOAST condition for column size

From: Dilip Kumar
On Tue, 19 Jan 2021 at 6:28 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Jan 19, 2021 at 5:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > ...(snip)...
> >
> > How can this be correct? While forming the tuple you might need
> > the alignment.
> >
>
> Won't it be safe because we don't align individual attrs of type
> varchar whose length is less than or equal to 127?

Yeah, right. I just missed that point.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: TOAST condition for column size

From: Tom Lane
Dilip Kumar <dilipbalaut@gmail.com> writes:
> On Tue, 19 Jan 2021 at 6:28 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Won't it be safe because we don't align individual attrs of type
>> varchar whose length is less than or equal to 127?

> Yeah, right. I just missed that point.

Yeah, the minimum on biggest_size has nothing to do with alignment
decisions.  It's just a filter to decide whether it's worth trying
to toast anything.

Having said that, I'm pretty skeptical of this patch: I think its
most likely real-world effect is going to be to waste cycles (and
create TOAST-table bloat) on the way to failing anyway.  I do not
think that toasting a 20-byte field down to 18 bytes is likely to be
a productive thing to do in typical situations.  The given example
looks like a cherry-picked edge case rather than a useful case to
worry about.

IOW, if I were asked to review whether the current minimum is
well-chosen, I'd be wondering if we should increase it, not
decrease it.

            regards, tom lane



Re: TOAST condition for column size

From: torikoshia
On 2021-01-19 19:32, Amit Kapila wrote:
> Because no benefit is to be expected from compressing it; the size
> will be mostly the same. Also, even if we somehow try to fit this
> data via TOAST, I think reading speed will be slower because an
> extra fetch from the toast table would be required for all such
> columns. Another thing is that you or others can still face the
> same problem with 17-byte column data. I don't think this is the
> right way to fix it. I don't have many good ideas, but you could
> try (a) increasing the block size at configure time, (b) reducing
> the number of columns, or (c) creating char columns of a somewhat
> bigger size, say greater than 24 bytes, to accommodate your case.
>
> I know none of these are good workarounds, but at this moment I
> can't think of better alternatives.

Thanks for your explanation and workarounds!



On 2021-01-20 00:40, Tom Lane wrote:
> Dilip Kumar <dilipbalaut@gmail.com> writes:
>> On Tue, 19 Jan 2021 at 6:28 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> Won't it be safe because we don't align individual attrs of type
>>> varchar whose length is less than or equal to 127?
> 
>> Yeah, right. I just missed that point.
>
> Yeah, the minimum on biggest_size has nothing to do with alignment
> decisions.  It's just a filter to decide whether it's worth trying
> to toast anything.
>
> Having said that, I'm pretty skeptical of this patch: I think its
> most likely real-world effect is going to be to waste cycles (and
> create TOAST-table bloat) on the way to failing anyway.  I do not
> think that toasting a 20-byte field down to 18 bytes is likely to be
> a productive thing to do in typical situations.  The given example
> looks like a cherry-picked edge case rather than a useful case to
> worry about.

I agree with you; it seems to help only when there are many columns
with 19~23 bytes of data, which is not a normal case.
I'm not sure, but a rare exception might be some geographic data;
that's the situation in which I heard this problem happened.


Regards,

--
Atsushi Torikoshi