Re: badly scaling performance with appending to bytea - Mailing list pgsql-performance

From Rick Otten
Subject Re: badly scaling performance with appending to bytea
Date
Msg-id CAMAYy4+E9-0+Zee7ed9H6QB_kqp2oW8Q=QY8c7Pywinp0eSBnA@mail.gmail.com
In response to badly scaling performance with appending to bytea  (Gary Cowell <gary.cowell@gmail.com>)
List pgsql-performance
Can you use a materialized view to do the bytea_agg() and then refresh concurrently whenever you need updated data?
The refresh concurrently might take a few hours or days to run to keep the matview up to date, but your queries would be pretty fast.
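A minimal sketch of that approach, with hypothetical names (and assuming the chunks live in a permanent table, since a materialized view cannot reference the on-commit-drop temporary table from the original post):

```sql
-- Hypothetical names; bytea_agg is the custom aggregate defined later in the thread.
CREATE MATERIALIZED VIEW attachment_blob AS
SELECT 1 AS id, bytea_agg(chunk ORDER BY seq) AS data
FROM attachment_chunks;

-- REFRESH ... CONCURRENTLY requires a unique index on the matview.
CREATE UNIQUE INDEX ON attachment_blob (id);

-- Rebuild in the background whenever new chunks have been loaded;
-- readers keep seeing the old contents until the refresh finishes.
REFRESH MATERIALIZED VIEW CONCURRENTLY attachment_blob;
```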

A possible problem is that you are running out of memory, so the larger queries are going to disk. If you can set up temp space on a faster volume, or bump up your memory configuration, it might help.
i.e., work_mem, shared_buffers, and the file system cache could all play into larger aggregations running faster.
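For experimenting, work_mem can be raised per session, while shared_buffers requires a configuration change and a server restart. A sketch (the values here are illustrative, not recommendations):

```sql
-- Per-session experiment: let sorts and aggregations use more
-- memory before spilling to temp files on disk.
SET work_mem = '256MB';

-- Inspect the server-wide buffer cache size.
SHOW shared_buffers;

-- Changing it takes effect only after a server restart.
ALTER SYSTEM SET shared_buffers = '4GB';
```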


On Wed, Mar 21, 2018 at 8:03 AM, Gary Cowell <gary.cowell@gmail.com> wrote:
We are trying to implement PostgreSQL code to load a large object into
a bytea column in chunks, to avoid loading the whole file into memory
in the client.

First attempt was to do

update build_attachment set chunk = chunk || newdata ;

This did not scale: it got significantly slower after 4,000-5,000 updates.

The chunks are 4K in size, and I'm testing with a 128MB input file,
requiring 32,774 chunk updates.
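The slowdown is consistent with quadratic cost: each UPDATE rewrites the entire accumulated value, so with n chunks of size c the total bytes copied is roughly c·n²/2 — for 32,774 chunks of 4 KB, on the order of 2 TB of copying. The same pattern can be reproduced in isolation (illustrative only; the table names above are unaffected):

```sql
-- Repeated append, copying the whole buffer on every iteration.
DO $$
DECLARE
  buf   bytea := '';
  chunk bytea := convert_to(repeat('x', 4096), 'UTF8');  -- a 4 KB chunk
BEGIN
  FOR i IN 1..4000 LOOP
    buf := buf || chunk;  -- each || copies all of buf accumulated so far
  END LOOP;
END $$;
```

Doubling the loop count should roughly quadruple the runtime, matching the behavior reported above.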

Next, I tried creating an aggregate, thus:

(taken from Stack Overflow)

CREATE AGGREGATE bytea_agg(bytea) (SFUNC=byteacat,STYPE=bytea);

then changed the code to insert the chunks into a temporary table:

create temporary table build_attachment (seq bigserial primary key,
chunk bytea ) on commit drop;

We then insert our 4K chunks into this, which takes very little time
(20 seconds for the 32,774 inserts).

Here's an example though of trying to select the aggregate:

gary=> \timing
Timing is on.
gary=> select bytea_agg(chunk order by seq) from build_attachment
where seq < 4000 \g output
Time: 13372.843 ms
gary=> select bytea_agg(chunk order by seq) from build_attachment
where seq < 8000 \g output
Time: 54447.541 ms
gary=> select bytea_agg(chunk order by seq) from build_attachment
where seq < 16000 \g output
Time: 582219.773 ms

So those partial aggregates completed in somewhat acceptable times, but ...

gary=> select bytea_agg(chunk order by seq) from build_attachment
where seq < 32000 \g output
This one hadn't completed after an hour. The PostgreSQL backend
process for my connection on the server goes to 100% CPU and stays
there, while not using much RAM and, oddly, not doing much I/O.

EXPLAINing these aggregate selects doesn't show anything useful.

Am I doomed to not be able to update a bytea this way? Is there some
way I can tune this?

