Re: CLOB & BLOB limitations in PostgreSQL - Mailing list pgsql-general
From | Jack.O'Sullivan@tessella.com |
---|---|
Subject | Re: CLOB & BLOB limitations in PostgreSQL |
Date | |
Msg-id | OFAB0BD7CA.F3DEEF64-ON80257CB7.0056CD13-80257CB7.0057DA7B@tessella.co.uk Whole thread Raw |
In response to | Re: CLOB & BLOB limitations in PostgreSQL (Andy Colson <andy@squeakycode.net>) |
List | pgsql-general |
Hi Andy,
Thanks for getting those thoughts out so quickly.
As far as points 1 and 2 go, it is definitely something to think about, but they are largely tangential to what I need to worry about at this moment. I am less concerned about "how much disk do we need to store this" than "is it even possible to store this".
If I'm understanding your point 3 correctly, you are referring to the OID/LOB type, which, from what I have read, is even more restrictive than bytea or text in that you have a limit of 4 billion objects per database (https://wiki.postgresql.org/wiki/BinaryFilesInDB). The fact that each object can be 2G rather than 1G is not much help to us as our objects are unlikely to ever been more than ~100k.
The table in question supports storing information either as XML, which we store in a text/char type column, or binary compressed xml, which is in the BLOB type column. The particular client I'm doing this for uses the compressed version, so all of their data in this table is binary.
Thanks
Jack
From: Andy Colson <andy@squeakycode.net>
To: Jack.O'Sullivan@tessella.com, pgsql-general@postgresql.org,
Date: 11/04/2014 16:24
Subject: Re: [GENERAL] CLOB & BLOB limitations in PostgreSQL
On 4/11/2014 9:45 AM, Jack.O'Sullivan@tessella.com wrote:
> I am working for a client who is interested in migrating from Oracle to
> Postgres. Their database is currently ~20TB in size, and is growing. The
> biggest table in this database is effectively a BLOB store and currently
> has around 1 billion rows.
>
> From reading around Postgres, there are a couple of limits which are
> concerning in terms of being able to migrate this database. We are not
> up against these limits just yet, but it is likely that they will be a
> potential blocker within the next few years.
>
> 1) Table can be maximum of 32TB (http://www.postgresql.org/about/)
>
> 2) When storing bytea or text datatypes there is a limit of 4 billion
> entries per table (https://wiki.postgresql.org/wiki/BinaryFilesInDB)
>
> With both of these, are they hard limits or can they be worked around
> with partitioning of tables? Could we set the table up in such a way
> that each child table was limited, but there was no limit on the number
> of children?
>
> With point two, does this mean that any table with a bytea datatype is
> limited to 4 billion rows (which would seem in conflict with the
> "unlimited rows" shown by http://www.postgresql.org/about)? If we had
> rows where the bytea was a "null" entry would they contribute towards
> this total or is it 4 billion non-null entries?
>
> Thanks.
Sorry I cant answer any of your questions, but I do have a few more to
raise:
1) I assume Oracle is pretty efficient on disk. You might wanna do a
quick test of a million rows or so and compare the on disk size of an
Oracle db vs PG. It wouldn't surprise me if PG used more space. (I
mean regular varchar, integer, etc.)
2) Does the Oracle blob compress? PG will compress but I'd bet they
compress differently. Again, you might wanna dump out a million blobs
and compare their space usage. At 20TB, a 10% increase in disk usage is
quite a bit.
3) There are two ways to store blob data. Bytea in your table and Large
Object support (in a separate table). Google "postgres bytea vs large
object" might offer useful reading.
I don't know if bytea or large object offer more efficient storage, but
it might be another thing you can test. Large object might be a little
more work to use, but if it saves lots of disk space, it might be worth it.
4) is this blob data binary'ish? We have json/hstore if its text'ish,
which might make it more usable.
-Andy
Thanks for getting those thoughts out so quickly.
As far as points 1 and 2 go, it is definitely something to think about, but they are largely tangential to what I need to worry about at this moment. I am less concerned about "how much disk do we need to store this" than "is it even possible to store this".
If I'm understanding your point 3 correctly, you are referring to the OID/LOB type, which, from what I have read, is even more restrictive than bytea or text in that you have a limit of 4 billion objects per database (https://wiki.postgresql.org/wiki/BinaryFilesInDB). The fact that each object can be 2G rather than 1G is not much help to us as our objects are unlikely to ever been more than ~100k.
The table in question supports storing information either as XML, which we store in a text/char type column, or binary compressed xml, which is in the BLOB type column. The particular client I'm doing this for uses the compressed version, so all of their data in this table is binary.
Thanks
Jack
From: Andy Colson <andy@squeakycode.net>
To: Jack.O'Sullivan@tessella.com, pgsql-general@postgresql.org,
Date: 11/04/2014 16:24
Subject: Re: [GENERAL] CLOB & BLOB limitations in PostgreSQL
On 4/11/2014 9:45 AM, Jack.O'Sullivan@tessella.com wrote:
> I am working for a client who is interested in migrating from Oracle to
> Postgres. Their database is currently ~20TB in size, and is growing. The
> biggest table in this database is effectively a BLOB store and currently
> has around 1 billion rows.
>
> From reading around Postgres, there are a couple of limits which are
> concerning in terms of being able to migrate this database. We are not
> up against these limits just yet, but it is likely that they will be a
> potential blocker within the next few years.
>
> 1) Table can be maximum of 32TB (http://www.postgresql.org/about/)
>
> 2) When storing bytea or text datatypes there is a limit of 4 billion
> entries per table (https://wiki.postgresql.org/wiki/BinaryFilesInDB)
>
> With both of these, are they hard limits or can they be worked around
> with partitioning of tables? Could we set the table up in such a way
> that each child table was limited, but there was no limit on the number
> of children?
>
> With point two, does this mean that any table with a bytea datatype is
> limited to 4 billion rows (which would seem in conflict with the
> "unlimited rows" shown by http://www.postgresql.org/about)? If we had
> rows where the bytea was a "null" entry would they contribute towards
> this total or is it 4 billion non-null entries?
>
> Thanks.
Sorry I cant answer any of your questions, but I do have a few more to
raise:
1) I assume Oracle is pretty efficient on disk. You might wanna do a
quick test of a million rows or so and compare the on disk size of an
Oracle db vs PG. It wouldn't surprise me if PG used more space. (I
mean regular varchar, integer, etc.)
2) Does the Oracle blob compress? PG will compress but I'd bet they
compress differently. Again, you might wanna dump out a million blobs
and compare their space usage. At 20TB, a 10% increase in disk usage is
quite a bit.
3) There are two ways to store blob data. Bytea in your table and Large
Object support (in a separate table). Google "postgres bytea vs large
object" might offer useful reading.
I don't know if bytea or large object offer more efficient storage, but
it might be another thing you can test. Large object might be a little
more work to use, but if it saves lots of disk space, it might be worth it.
4) is this blob data binary'ish? We have json/hstore if its text'ish,
which might make it more usable.
-Andy
pgsql-general by date: