Thread: Am I best off keeping large chunks of text in a separate table?
I have a table that looks something like this:

url - character varying(1024)
date - timestamptz
body - text

Url is a unique primary key. Body can potentially be a couple hundred
k of text.

There will at first be perhaps 100,000 rows in this table, but at some
point it might get into the millions.

I need to be able to quickly insert into this table (I might be
inserting several rows per second at times). I also need to be able
to very quickly see if a URL already exists in the table, and what the
date value is. Or, query for all "urls" that have a "date" older than
x days.

Am I better off with two tables such as:

Table1:
id - uuid or integer (primary key)
url - unique index
date

Table2:
id - FK to Table1.id
body - text

It makes the program flow a bit more complicated, and I'd have to use
transactions when inserting new rows. However, for years I've been
told that having rows with large chunks of text is bad for performance,
forces that data to be paged into memory, and causes various other
issues. Any advice on this one? Thanks!

Mike
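For concreteness, a minimal sketch of the single-table design in SQL
(the table name "pages" is made up; the rest follows the description
above). The primary key covers the fast URL lookup, and a second index
covers the date-based query:

  CREATE TABLE pages (
      url  varchar(1024) PRIMARY KEY,  -- unique index comes free with the PK
      date timestamptz NOT NULL,
      body text
  );

  CREATE INDEX pages_date_idx ON pages (date);

  -- Does this URL exist, and what is its date?
  SELECT date FROM pages WHERE url = 'http://example.com/';

  -- All urls with a date older than 30 days:
  SELECT url FROM pages WHERE date < now() - interval '30 days';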
On Sat, Sep 17, 2011 at 6:46 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> I have a table that looks something like this:
>
> url - character varying(1024)
> date - timestamptz
> body - text
>
> Url is a unique primary key. Body can potentially be a couple hundred
> k of text.
>
> There will at first be perhaps 100,000 rows in this table, but at some
> point it might get into the millions.
>
> I need to be able to quickly insert into this table (I might be
> inserting several rows per second at times). I also need to be able
> to very quickly see if a URL already exists in the table, and what the
> date value is. Or, query for all "urls" that have a "date" older than
> x days.
>
> Am I better off with two tables such as:
>
> Table1:
> id - uuid or integer (primary key)
> url - unique index
> date
>
> Table2:
> id - FK to Table1.id
> body - text
>
> It makes the program flow a bit more complicated, and I'd have to use
> transactions when inserting new rows. However, for years I've been
> told that having rows with large chunks of text is bad for performance,
> forces that data to be paged into memory, and causes various other
> issues. Any advice on this one? Thanks!

What would be really cool is if postgresql took values for body that
were over a few k and compressed them and stored them out of line in
another table. Luckily for you, that's EXACTLY what it already does.

http://www.postgresql.org/docs/9.1/static/storage-toast.html

Cool eh?
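If you want to see it for yourself, here's one quick check, assuming
the hypothetical "pages" table sketched earlier: psql's \d+ shows each
column's storage strategy, and the catalog will point you at the
out-of-line TOAST table:

  -- Which TOAST table backs "pages"? (shows "-" if there isn't one)
  SELECT reltoastrelid::regclass AS toast_table
  FROM pg_class
  WHERE relname = 'pages';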
> On Sat, Sep 17, 2011 at 6:46 PM, Mike Christensen <mike@kitchenpc.com> wrote:
>> [original schema and requirements snipped]
>
> What would be really cool is if postgresql took values for body that
> were over a few k and compressed them and stored them out of line in
> another table. Luckily for you, that's EXACTLY what it already does.
>
> http://www.postgresql.org/docs/9.1/static/storage-toast.html
>
> Cool eh?

Man I've been reading this list for years now, and I kept on seeing
this "TOAST" thing and just figured you people liked it for sandwiches
or something.

I feel like the programmer who thinks he's smart using a left bitshift
operator to double an integer value, only to find out the compiler
already applies that optimization anyway. Are you saying I don't
actually need to de-frag my hard drive these days either?

Thanks for the quick reply! I will design my table in a way that
logically makes sense to me.

Mike
On Sat, Sep 17, 2011 at 7:15 PM, Mike Christensen <mike@kitchenpc.com> wrote:
> Man I've been reading this list for years now, and I kept on seeing
> this "TOAST" thing and just figured you people liked it for sandwiches
> or something.
>
> I feel like the programmer who thinks he's smart using a left bitshift
> operator to double an integer value, only to find out the compiler
> already applies that optimization anyway. Are you saying I don't
> actually need to de-frag my hard drive these days either?

Not if you're running most flavors of unix and not filling up the hard
drive too much. Ext3 and ext4 both tend to defrag as they go, and don't
fragment much anyway, especially if you leave a little bit of free
space for overhead. :)

> Thanks for the quick reply! I will design my table in a way that
> logically makes sense to me.

Cool. TOAST is one of those mad scientist ideas that really works, and
works well. Before TOAST, pg had a pretty severe limit on text /
varchar columns.
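And if you ever do need to tune TOAST behavior, the strategy is
settable per column. For example (still using the hypothetical "pages"
table), to force out-of-line storage without compression, say because
the body text is already compressed:

  ALTER TABLE pages ALTER COLUMN body SET STORAGE EXTERNAL;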
Mike Christensen <mike@kitchenpc.com> writes:

>> What would be really cool is if postgresql took values for body that
>> were over a few k and compressed them and stored them out of line in
>> another table. Luckily for you, that's EXACTLY what it already does.
>>
>> http://www.postgresql.org/docs/9.1/static/storage-toast.html
>
> [...]
>
> Thanks for the quick reply! I will design my table in a way that
> logically makes sense to me.

Do so at your own risk :-)

Depending on the data model, it could still be good for performance to
store the large payload data in a secondary table linked to the main
table containing the other data or metadata. That's especially true if
the payload fields are quite often small enough to *not* cause
toasting. In that scenario, you could end up with some very long
physical tuples in the main table, resulting in a low tuple/page ratio
and heavy disk reading for any query, including those that don't care
about the payload data itself. It's even more true if some of the
non-payload data is frequently updated. YMMV.

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 305.321.1144
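For reference, a sketch of the split Jerry is describing (the table
and column names here are invented for illustration, not taken from
the thread):

  -- The narrow, hot metadata stays by itself, keeping the tuple/page
  -- ratio high for url/date lookups and for frequent updates.
  CREATE TABLE page_meta (
      id   serial PRIMARY KEY,
      url  varchar(1024) NOT NULL UNIQUE,
      date timestamptz NOT NULL
  );

  -- The payload is only touched when a body is actually read or written.
  CREATE TABLE page_body (
      id   integer PRIMARY KEY REFERENCES page_meta(id),
      body text
  );

  -- Inserting a new page now spans two tables, hence the transaction
  -- Mike mentioned:
  BEGIN;
  INSERT INTO page_meta (url, date) VALUES ('http://example.com/', now());
  INSERT INTO page_body (id, body)
      VALUES (currval('page_meta_id_seq'), '... a couple hundred k of text ...');
  COMMIT;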