Re: How to store "blobs" efficiently for small and large sizes, with random access - Mailing list pgsql-general

From Daniel Verite
Subject Re: How to store "blobs" efficiently for small and large sizes, with random access
Date
Msg-id 08dacb6b-09ea-4c04-b3db-b308e8b208ab@manitou-mail.org
In response to How to store "blobs" efficiently for small and large sizes, with random access  (Dominique Devienne <ddevienne@gmail.com>)
Responses Re: How to store "blobs" efficiently for small and large sizes, with random access
List pgsql-general
    Dominique Devienne wrote:

> PostgreSQL bytea is much better and simpler, except limited to 1GB...
> Bytea also has no direct random access, except via substr[ing], but
> how efficient and "random access" is that?

Bytea contents are compressed before being sliced into chunks of
TOAST_MAX_CHUNK_SIZE bytes (typically about 2000, IIRC), so it's not
possible to access a piece of data without first decompressing
everything that precedes it.

By contrast, large objects are sliced before compression, so the
performance of random access is likely to be completely different.
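
For illustration, here's how each form of random access could look
(a sketch: lo_get() exists since PostgreSQL 9.4, and the table and
column names below are made up):

-- bytea: to serve this, the server has to detoast (and, when
-- compressed, decompress) everything before byte 1000001:
select substring(data from 1000001 for 65536)
  from my_table where id = 1;

-- large object: slices are stored separately, so the server can
-- seek straight to the offset (16403 is a made-up large object OID):
select lo_get(16403, 1000000, 65536);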

> Here are the main requirements I need to fulfil:
> 1) store literally millions of rows, 1 "blob" per row. (scientific data).
> 2) most "blobs" are smallish, below a few KBs. bytea's perfect for that.
> 3) yet many blobs are in the dozens of MBs. bytea's still OK.
> 4) but some blobs exceed the 1GB byte limit. Found at least a dozen
> just in our test data, so clients will have them.
> 5) for accessing larger blobs, the API I must implement accesses
> contiguous chunks of the blobs. Thus I need random access.

In your case I would probably opt for bytea (as opposed to large
objects), slicing the blobs in the application into chunks of a fixed
size much larger than what TOAST uses (for instance, somewhere
between 128 kB and 32 MB).

That is, having a table like:

create table blobs (
  blob_id bigint,  /* any suitable key type; bigint is just an example */
  chunk_no int,    /* 0..N: position of the chunk within the blob */
  chunk bytea,
  primary key (blob_id, chunk_no)
);

It's not as simple as using a single bytea field or large objects,
but overall it avoids the management difficulties of both large
objects and very large contents in bytea columns.
Random access is achieved by skipping the chunks before the
requested piece of data.
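
For instance, with a fixed chunk size of 1 MB (1048576 bytes),
reading $3 bytes at byte offset $2 of blob $1 comes down to fetching
only the chunks that overlap that range (a sketch, relying on integer
division):

select chunk_no, chunk
  from blobs
  where blob_id = $1
    and chunk_no between $2 / 1048576 and ($2 + $3 - 1) / 1048576
  order by chunk_no;

The application then trims the first and last chunks to keep only the
requested byte range.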

If the app is able to read/write the chunks in binary mode, its
queries should perform as well as the large object functions.
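
The write path is just as simple; a sketch, where the application
splits the contents into fixed-size chunks and binds $3 (the chunk
contents) as a binary parameter:

insert into blobs (blob_id, chunk_no, chunk)
  values ($1, $2, $3);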


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite


