Thread: large INSERT leads to "invalid memory alloc"

large INSERT leads to "invalid memory alloc"

From: Stefan Froehlich
While converting a mysql database into postgres, I stumbled over the
following problem:

| INSERT INTO a (id, fkid, displayorder, name, description, internal, mimetype, mimedata, filesize)
| VALUES (73, 6, 5, E'Seefeld.rar', E'Seefeld', 0, E'application/octet-stream', decode('5261...0700', 'hex'), 311484587);

As the value for "filesize" suggests, this is a very large BYTEA
(formerly: LONGBLOB) entry of about 300 MB. This is atypical; all the
other values are a couple of MB at most and cause no problems. This
particular line leads to:

| sfroehli@host:~$ psql dbname < statement.sql
| Password:
| ERROR:  invalid memory alloc request size 1073741824

I have not found any configuration directive similar to MySQL's
"max_allowed_packet" for increasing the buffer size. And besides, I
don't understand why Postgres wants to allocate 1 GB to store
300 MB (which takes 600 MB as ASCII text in the decode() string).

Any idea how to put this into the target database?

Stefan


Re: large INSERT leads to "invalid memory alloc"

From: Pavel Stehule
Hello

2012/11/25 Stefan Froehlich <postgresql@froehlich.priv.at>:
> While converting a mysql database into postgres, I stumbled over the
> following problem:
>
> | INSERT INTO a (id, fkid, displayorder, name, description, internal, mimetype, mimedata, filesize)
> | VALUES (73, 6, 5, E'Seefeld.rar', E'Seefeld', 0, E'application/octet-stream', decode('5261...0700', 'hex'), 311484587);
>
> As the value for "filesize" suggests, this is a very large BYTEA
> (formerly: LONGBLOB) entry of about 300 MB. This is atypical; all the
> other values are a couple of MB at most and cause no problems. This
> particular line leads to:

Usually you need two to three times as much memory as the query size
for parsing and execution - and you have probably tripped an internal
sanity check on allocation size, which treats any single request over
1 GB as suspect.

Attention - BYTEA is not BLOB, and although the physical limit is 1 GB,
the real limit is significantly lower - it depends on RAM. Seven years
ago we found the practical limit to be about 20 MB.

If you need more, use blobs instead, or divide the value into multiple blocks:

http://www.fuzzy.cz/en/articles/storing-files-in-a-postgresql-database/
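
For the divide-into-blocks route, a minimal sketch (the a_chunks table
and the 10 MB chunk size are only illustrative, not from the article):

| CREATE TABLE a_chunks (
|     id    integer NOT NULL,   -- references a.id
|     seq   integer NOT NULL,   -- chunk order within one file
|     chunk bytea   NOT NULL,   -- keep each INSERT at e.g. 10 MB
|     PRIMARY KEY (id, seq)
| );
|
| -- reassemble on read (string_agg over bytea needs 9.0+):
| SELECT string_agg(chunk, ''::bytea ORDER BY seq)
|   FROM a_chunks
|  WHERE id = 73;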

Regards

Pavel Stehule


>
> | sfroehli@host:~$ psql dbname < statement.sql
> | Password:
> | ERROR:  invalid memory alloc request size 1073741824
>
> I have not found any configuration directive similar to MySQL's
> "max_allowed_packet" for increasing the buffer size. And besides, I
> don't understand why Postgres wants to allocate 1 GB to store
> 300 MB (which takes 600 MB as ASCII text in the decode() string).
>
> Any idea how to put this into the target database?
>
> Stefan


Re: large INSERT leads to "invalid memory alloc"

From: Stefan Froehlich
On Sun, Nov 25, 2012 at 06:57:22PM +0100, Pavel Stehule wrote:
> > | INSERT INTO a (id, fkid, displayorder, name, description, internal, mimetype, mimedata, filesize)
> > | VALUES (73, 6, 5, E'Seefeld.rar', E'Seefeld', 0, E'application/octet-stream', decode('5261...0700', 'hex'), 311484587);

> Attention - BYTEA is not BLOB, and although the physical limit is 1 GB,
> the real limit is significantly lower - it depends on RAM. Seven years
> ago we found the practical limit to be about 20 MB.

Oops, that's not very much. In the docs I had seen a 4-byte length
descriptor, thus expected a size limit of 4 GB, and felt quite safe
with a maximum size of 300 MB.

> If you need more, use blobs instead, or divide the value into multiple blocks:
> http://www.fuzzy.cz/en/articles/storing-files-in-a-postgresql-database/

Yes, storing large data objects in the file system is advisable for
several reasons - we had the same discussion with MySQL some time ago.
But the decision was made to keep everything in the database (and there
is only one object of this size anyway). Rewriting the framework is not
an option at the moment.

If I fail to migrate this to PostgreSQL, we'd rather cancel the
transition.

Stefan


Re: large INSERT leads to "invalid memory alloc"

From: Pavel Stehule
2012/11/25 Stefan Froehlich <postgresql@froehlich.priv.at>:
> On Sun, Nov 25, 2012 at 06:57:22PM +0100, Pavel Stehule wrote:
>> > | INSERT INTO a (id, fkid, displayorder, name, description, internal, mimetype, mimedata, filesize)
>> > | VALUES (73, 6, 5, E'Seefeld.rar', E'Seefeld', 0, E'application/octet-stream', decode('5261...0700', 'hex'), 311484587);
>
>> Attention - BYTEA is not BLOB, and although the physical limit is 1 GB,
>> the real limit is significantly lower - it depends on RAM. Seven years
>> ago we found the practical limit to be about 20 MB.
>
> Oops, that's not very much. In the docs I had seen a 4-byte length
> descriptor, thus expected a size limit of 4 GB, and felt quite safe
> with a maximum size of 300 MB.
>

You didn't read it closely - it is a 4-byte header, but two of those 32
bits are reserved for flags, leaving 30 bits for the length, so the
theoretical limit is 2^30 bytes = 1 GB.

>> If you need more, use blobs instead, or divide the value into multiple blocks:
>> http://www.fuzzy.cz/en/articles/storing-files-in-a-postgresql-database/
>
> Yes, storing large data objects in the file system is advisable for
> several reasons - we had the same discussion with MySQL some time ago.
> But the decision was made to keep everything in the database (and there
> is only one object of this size anyway). Rewriting the framework is not
> an option at the moment.

It depends a lot on RAM and on the API used - if you use prepared
statements and binary transmission, you can probably reduce memory
usage significantly.

But I think around +/- 50 MB is the practical limit - and the LO (large
object) interface will be faster and better.
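
From psql, the client-side version of the LO route is a one-liner; a
sketch (the OID is just an example, and mimedata_oid is a hypothetical
oid column that would replace the bytea column):

| -- \lo_import reads the file on the client and prints the new OID:
| \lo_import '/tmp/Seefeld.rar'
| lo_import 16403
|
| -- store the OID instead of the bytes:
| UPDATE a SET mimedata_oid = 16403 WHERE id = 73;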

Regards

Pavel

>
> If I fail to migrate this to PostgreSQL, we'd rather cancel the
> transition.
>
> Stefan


Re: large INSERT leads to "invalid memory alloc"

From: Tom Lane
Stefan Froehlich <postgresql@froehlich.priv.at> writes:
> On Sun, Nov 25, 2012 at 06:57:22PM +0100, Pavel Stehule wrote:
>>> | INSERT INTO a (id, fkid, displayorder, name, description, internal, mimetype, mimedata, filesize)
>>> | VALUES (73, 6, 5, E'Seefeld.rar', E'Seefeld', 0, E'application/octet-stream', decode('5261...0700', 'hex'), 311484587);

>> Attention - BYTEA is not BLOB, and although the physical limit is 1 GB,
>> the real limit is significantly lower - it depends on RAM. Seven years
>> ago we found the practical limit to be about 20 MB.

> Oops, that's not very much. In the docs I had seen a 4-byte length
> descriptor, thus expected a size limit of 4 GB, and felt quite safe
> with a maximum size of 300 MB.

I replicated this case and found that the immediate cause of the problem
is addlit() in the scanner, which is trying to double its work buffer
size until it's larger than the literal string --- so even though the
string is "only" 600MB, it tries to make a 1GB buffer.

We could fix that particular case but there are doubtless others.
It's not really a good idea to be pushing query texts of hundreds of
megabytes through the system.  Quite aside from whether you'd hit the
1GB-per-alloc hard limit, the system is likely to make quite a few
copies of any constant value in the process of parsing/planning a query.

You might have better luck if you treated the large value as an
out-of-line parameter instead of a literal constant.  Aside from dodging
the costs of a very large query string and a large Const value, you
could send the parameter value in binary and avoid hex-to-binary
conversion costs.
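
If everything has to go through a plain SQL script, one server-side
workaround that likewise keeps the value out of the query text is to
stage it as a large object (a sketch only; lo_get() requires 9.4+, and
lo_import() reads a file on the server, so it needs appropriate
privileges):

| SELECT lo_import('/tmp/Seefeld.rar');  -- stages the file as a large object, returns an OID, e.g. 16403
|
| UPDATE a SET mimedata = lo_get(16403) WHERE id = 73;
| SELECT lo_unlink(16403);               -- drop the staging object again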

            regards, tom lane


interpret bytea output as text / double encode()

From: Stefan Froehlich
In one of my databases, I have mistakenly double-encoded bytea
values (i.e. the content is literally '\x202020...', which would have
to be decoded once more to get the actually desired content).

But how do I get at the content? This:

# select encode(column, 'escape')

gives me the once-encoded value, which I'd like to decode once more
to get the original content. Using this:

# select encode(encode(column, 'escape'), 'escape')

does not work, because PostgreSQL knows that the result of the inner
operation is text and thus may not be used as an argument for
encode(). Casting the result does not work either:

# select encode(encode(column, 'escape')::bytea, 'escape')

because the cast reverts the effect of the first encode(), so the result
does not change.

If I take the result of the first encode() via copy/paste and put
it through another encode() statement, everything is fine - but I
have not found a way to automate this.

Is there a chance? Can anyone help out?

Stefan


Re: interpret bytea output as text / double encode()

From: Rob Sargent

> On Jun 6, 2014, at 7:00 AM, Stefan Froehlich <postgresql@froehlich.priv.at> wrote:
>
> In one of my databases, I have mistakenly double-encoded bytea
> values (i.e. the content is literally '\x202020...', which would have
> to be decoded once more to get the actually desired content).
> [...]
> Is there a chance? Can anyone help out?

Any chance you can encode() into a temp table, then encode() the result?

Re: interpret bytea output as text / double encode()

From: Stefan Froehlich
> > # select encode(encode(column, 'escape'), 'escape')

> Any chance you can encode() into a temp table, then encode() the result?

I don't think so; PostgreSQL will treat the temp column as either text
or bytea and then refuse either the INSERT or the encode() because of
a type mismatch.
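
For instance, a sketch of the failure mode (the exact error wording may
vary by version):

| # select encode(encode(column, 'escape'), 'escape');
| ERROR:  function encode(text, unknown) does not exist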

Any workaround for this would most likely solve my original problem
without the use of a temp table.

Stefan


Re: interpret bytea output as text / double encode()

From: Tom Lane
Stefan Froehlich <postgresql@froehlich.priv.at> writes:
> ... Casting the result does not work either:

> # select encode(encode(column, 'escape')::bytea, 'escape')

> because the cast reverts the effect of the first encode(), so the result
> does not change.

Since there's no explicitly defined cast from text to bytea according
to pg_cast, what happens when you do this is that the textual result
of the inner encode() is fed to bytea's input function, which of course
is supposed to decode escaped data.
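
A quick way to see the input function at work (a sketch, assuming
standard_conforming_strings is on, the default since 9.1): the
eight-character text '\x313233' comes back as only three bytes,
because bytea's input function parses it as hex:

| # select octet_length('\x313233'::text::bytea);
|  octet_length
| --------------
|             3
| (1 row)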

> If I take the result of the first encode() via copy/paste and put
> it through another encode() statement, everything is fine - but I
> have not found a way to automate this.

I'm a bit confused by this statement, because it sounds like what
you did manually is precisely to feed the text string to bytea's
input function.  So I don't see quite what the difference is.

It's possible that what you are looking for is a binary-equivalent
cast from text to bytea, which you could create like this:

# create cast (text as bytea) without function;

However, when I experiment with

# select encode(encode(column, 'escape')::bytea, 'escape')

after doing that, it doesn't seem like the results are very useful,
so I think I'm misunderstanding.

            regards, tom lane


Re: interpret bytea output as text / double encode()

From: Stefan Froehlich
On Fri, Jun 06, 2014 at 10:30:56AM -0400, Tom Lane wrote:
> It's possible that what you are looking for is a binary-equivalent
> cast from text to bytea, which you could create like this:

> # create cast (text as bytea) without function;

Hm. No, actually it does not help. But playing around with the cast
revealed something else to me. If I double-encode "12345", I get:

| # select encode('\x5c7833313332333333343335', 'escape');
|     encode
| ---------------
|  \\x3132333435
| (1 row)

Now the intermediate result contains a '\\', which I had simply
ignored as the usual quoting of a '\'. *But* of course this prevents
the string from being correctly interpreted as a bytea sequence, so
the solution to the problem is:

| # select encode(right(encode('\x5c7833313332333333343335', 'escape'), -1)::bytea, 'escape');
|  encode
| --------
|  12345
| (1 row)

Kind of a hack, but this works on tables as well and will save me a
lot of trouble. Thank you for the inspiration!
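
For the archive: applied to a whole table, the same hack becomes one
UPDATE. A sketch with hypothetical names (table a, column mimedata),
assuming the default text-to-bytea I/O cast is in effect (i.e. the
experimental WITHOUT FUNCTION cast from above has been dropped again):

| UPDATE a
|    SET mimedata = right(encode(mimedata, 'escape'), -1)::bytea
|  WHERE substring(mimedata from 1 for 2) = '\x5c78'::bytea;  -- only rows whose bytes start with the text "\x"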

Stefan