Thread: Am I best off keeping large chunks of text in a separate table?
I have a table that looks something like this:

url - character varying(1024)
date - timestamptz
body - text

Url is a unique primary key. Body can potentially be a couple hundred
k of text.

There will at first be perhaps 100,000 rows in this table, but at some
point it might get into the millions.

I need to be able to quickly insert into this table (I might be
inserting several rows per second at times). I also need to be able
to very quickly see if a URL already exists in the table, and what the
date value is. Or, query for all "urls" that have a "date" older than
x days.

Am I better off with two tables such as:

Table1:
id - uuid or integer (primary key)
url - unique index
date

Table2:
id - FK to Table1.id
body - text

It makes the program flow a bit more complicated, and I'd have to use
transactions when inserting new rows. However, for years I've been
told that having rows with large chunks of text is bad for performance,
forces that data to be paged into memory, and causes various other
issues. Any advice on this one? Thanks!

Mike
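For concreteness, a minimal sketch of the single-table design in SQL
(the table name "pages" is made up; the rest follows the description
above). The primary key covers the fast URL lookup, and a second index
covers the date-based query:

  CREATE TABLE pages (
      url  varchar(1024) PRIMARY KEY,  -- unique index comes free with the PK
      date timestamptz NOT NULL,
      body text
  );

  CREATE INDEX pages_date_idx ON pages (date);

  -- Does this URL exist, and what is its date?
  SELECT date FROM pages WHERE url = 'http://example.com/';

  -- All urls with a date older than 30 days:
  SELECT url FROM pages WHERE date < now() - interval '30 days';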
On Sat, Sep 17, 2011 at 6:46 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> I have a table that looks something like this:
>
> url - character varying(1024)
> date - timestamptz
> body - text
>
> Url is a unique primary key. Body can potentially be a couple hundred
> k of text.
>
> There will at first be perhaps 100,000 rows in this table, but at some
> point it might get into the millions.
>
> I need to be able to quickly insert into this table (I might be
> inserting several rows per second at times). I also need to be able
> to very quickly see if a URL already exists in the table, and what the
> date value is. Or, query for all "urls" that have a "date" older than
> x days.
>
> Am I better off with two tables such as:
>
> Table1:
> id - uuid or integer (primary key)
> url - unique index
> date
>
> Table2:
> id - FK to Table1.id
> body - text
>
> It makes the program flow a bit more complicated, and I'd have to use
> transactions when inserting new rows. However, for years I've been
> told that having rows with large chunks of text is bad for performance,
> forces that data to be paged into memory, and causes various other
> issues. Any advice on this one? Thanks!

What would be really cool is if postgresql took values for body that
were over a few k and compressed them and stored them out of line in
another table. Luckily for you, that's EXACTLY what it already does.

http://www.postgresql.org/docs/9.1/static/storage-toast.html

Cool eh?
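If you want to see it for yourself, here's one quick check, assuming
the hypothetical "pages" table sketched earlier: psql's \d+ shows each
column's storage strategy, and the catalog will point you at the
out-of-line TOAST table:

  -- Which TOAST table backs "pages"? (shows "-" if there isn't one)
  SELECT reltoastrelid::regclass AS toast_table
  FROM pg_class
  WHERE relname = 'pages';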
> On Sat, Sep 17, 2011 at 6:46 PM, Mike Christensen <mike@kitchenpc.com> wrote:
>> [original schema and requirements snipped]
>
> What would be really cool is if postgresql took values for body that
> were over a few k and compressed them and stored them out of line in
> another table. Luckily for you, that's EXACTLY what it already does.
>
> http://www.postgresql.org/docs/9.1/static/storage-toast.html
>
> Cool eh?

Man I've been reading this list for years now, and I kept on seeing
this "TOAST" thing and just figured you people liked it for sandwiches
or something.

I feel like the programmer who thinks he's smart using a left bitshift
operator to double an integer value, only to find out the compiler
already applies that optimization anyway. Are you saying I don't
actually need to de-frag my hard drive these days either?

Thanks for the quick reply! I will design my table in a way that
logically makes sense to me.

Mike
On Sat, Sep 17, 2011 at 7:15 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> Man I've been reading this list for years now, and I kept on seeing
> this "TOAST" thing and just figured you people liked it for sandwiches
> or something.
>
> I feel like the programmer who thinks he's smart using a left bitshift
> operator to double an integer value, only to find out the compiler
> already applies that optimization anyway. Are you saying I don't
> actually need to de-frag my hard drive these days either?

Not if you're running most flavors of unix and not filling up the hard
drive too much. Ext3 and ext4 both tend to defrag as they go, and don't
fragment much anyway, especially if you leave a little bit of free
space for overhead. :)

> Thanks for the quick reply! I will design my table in a way that
> logically makes sense to me.

Cool. TOAST is one of those mad scientist ideas that really works, and
works well. Before TOAST, pg had a pretty severe limit on text /
varchar columns.
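And if you ever do need to tune TOAST behavior, the strategy is
settable per column. For example (still using the hypothetical "pages"
table), to force out-of-line storage without compression, say because
the body text is already compressed:

  ALTER TABLE pages ALTER COLUMN body SET STORAGE EXTERNAL;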
Mike Christensen <mike@kitchenpc.com> writes:

>> What would be really cool is if postgresql took values for body that
>> were over a few k and compressed them and stored them out of line in
>> another table. Luckily for you, that's EXACTLY what it already does.
>>
>> http://www.postgresql.org/docs/9.1/static/storage-toast.html
>
> [...]
>
> Thanks for the quick reply! I will design my table in a way that
> logically makes sense to me.

Do so at your own risk :-)

Depending on the data model, it could still be good for performance to
store the large payload data in a secondary table linked to the main
table containing the other data or metadata. That's especially true if
the payload fields are quite often small enough to *not* cause
toasting. In that scenario, you could end up with some very long
physical tuples in the main table, resulting in a low tuple/page ratio
and heavy disk reading for any query, including those that don't care
about the payload data itself. It's even more true if some of the
non-payload data is frequently updated. YMMV.

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 305.321.1144
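For reference, a sketch of the split Jerry is describing (the table
and column names here are invented for illustration, not taken from
the thread):

  -- The narrow, hot metadata stays by itself, keeping the tuple/page
  -- ratio high for url/date lookups and for frequent updates.
  CREATE TABLE page_meta (
      id   serial PRIMARY KEY,
      url  varchar(1024) NOT NULL UNIQUE,
      date timestamptz NOT NULL
  );

  -- The payload is only touched when a body is actually read or written.
  CREATE TABLE page_body (
      id   integer PRIMARY KEY REFERENCES page_meta(id),
      body text
  );

  -- Inserting a new page now spans two tables, hence the transaction
  -- Mike mentioned:
  BEGIN;
  INSERT INTO page_meta (url, date) VALUES ('http://example.com/', now());
  INSERT INTO page_body (id, body)
      VALUES (currval('page_meta_id_seq'), '... a couple hundred k of text ...');
  COMMIT;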