Thread: BLOB
Hello
I would like to know
- how to create under psql a BLOB field in sql language
- how store a value (and not a file) in sql language
- the maximum size of the BLOB field
- how to create under psql a BLOB field in sql language
- how store a value (and not a file) in sql language
- the maximum size of the BLOB field
Thank you
pg
pg
> Hello > > I would like to know > - how to create under psql a BLOB field in sql language > - how store a value (and not a file) in sql language > - the maximum size of the BLOB field For now (and 7.0), there is no way to have rows >8K in a regular table. But I'm actually making alot of progress in breaking it. Seems we'll get rid of this restriction in 7.1. To share some info with the other developers: I just successfully stored the entire PG sources (~10M) in one table. Stored are all 936 .[chyl] files. The tables schema is: sources ( s_path text, s_data clob ) clob is a user defined, very simple varsize datatype, I created for testing. The entire source tree inserts in 23 seconds. If I cut off all sources at 7K and insert into a text field it needs 8 seconds. Not bad,because the amount of data shrinks <4M due to the cutoff. Reading the data back shows similar results, 0.8 secs vs. 0.3 secs. And the toaster already uses a heap+index to store values external. This all done with full toasting - i.e. try to compress, and only if it still doesn't fit move already compressed attribute into secondary relation. As a side effect, the 10M of sources only need 3.4M main and 2.6M secondaryheap. I experienced some trouble with occational "Deadlock" detections, when using multiple processes hammeringon the toaster. But I was able to reproduce the same error with a plain 7.0 and "text" attributes (onlyfar less frequent). Seems there's something wrong in our deadlock detection algorithm. Another thing is, that ISTM that index's never shrink on vacuum. One requirement of TOAST is, that the secondaries tables index-OID is remembered in the external reference, so the ref can be passed around in the entire backend and whenever needed, it's real value can be fetched quickly by an index scan. Thus, it's not possibleto drop/recreate an index on that right now. Seems we need a "REBUILD INDEX" utility or the like. Anyway, TOAST is on it's way. And as soon as we go for 7.1, I'll need alot of help to make all our existingtypes toastable, add administrative utility commands and teach pg_dump to deal with all that. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
wieck@debis.com (Jan Wieck) writes: > To share some info with the other developers: > I just successfully stored the entire PG sources (~10M) in > one table. Stored are all 936 .[chyl] files. Cool! Sounds like you are making great progress. > Anyway, TOAST is on it's way. And as soon as we go for 7.1, > I'll need alot of help to make all our existing types > toastable, add administrative utility commands and teach > pg_dump to deal with all that. Question: pg_dump shouldn't need to know about toasted tuples as such, should it? I'm guessing that what you are thinking of is adding administration capabilities to control whether particular columns are toast-able, and then pg_dump would need to know about saving and loading those control fields. But toasting a particular field value ought to be invisible to applications... regards, tom lane
Tom Lane wrote: > wieck@debis.com (Jan Wieck) writes: > > To share some info with the other developers: > > I just successfully stored the entire PG sources (~10M) in > > one table. Stored are all 936 .[chyl] files. > > Cool! Sounds like you are making great progress. At least I think so :-) > > Anyway, TOAST is on it's way. And as soon as we go for 7.1, > > I'll need alot of help to make all our existing types > > toastable, add administrative utility commands and teach > > pg_dump to deal with all that. > > Question: pg_dump shouldn't need to know about toasted tuples as > such, should it? I'm guessing that what you are thinking of is > adding administration capabilities to control whether particular > columns are toast-able, and then pg_dump would need to know about > saving and loading those control fields. But toasting a particular > field value ought to be invisible to applications... Answer: Right! The only change from a clients PoV is that attribute values can be bigger now. So it's only to teach pg_dump to correctly restore all the possible tuning. For now there is an attstorage(char) attribute telling in the TupleDesc, what can be done with a particular value in a HeapTuple. Themeanings are: 'p' don't touch, either DBA doesn't want or the type isn't capable of dealing with it. 'e' never try tocompress, if needed store external as is. 'x' try to compress and if needed store compression result external. 'm' try to keep in main tuple as long as possible, if needed handle like 'x'. If a tuple doesn't fit into a block, the toaster performs 4 loops over the attributes, each one stops immediately when the tuple fits. 1. Inline compress storage 'x' attributes. 2. Move storage 'x' and 'e' attributes to secondary relation. 3. Inline compress storage 'm' attributes. 4. Move storage 'm' attributes to secondary relation. My plan now is to add a default storage value to pg_type. At CREATE TABLE time, a secondary relation plus it's indexare automatically created if attributes of the table are toastable. This is questionable, because it'llcost 2 inodes and 16K per table, but we can change it later and have special admin commands to deal withit. The toaster doesn't jump in if a relation doesn't have a secondary one. But it will expand values from otherrelations in that case (INSERT ... SELECT). With the pg_type entry telling what to do by default, we can work on all our types one by one, and enable them at thetime all their type specific functions can handle it. Finally, I'd like to give the secondary relations another relkind, so they are ignored by vacuum, and then vacuumthem explicitly while the LOCK on the master table is still held. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
> -----Original Message----- > From: pgsql-sql-owner@hub.org [mailto:pgsql-sql-owner@hub.org]On Behalf > Of Jan Wieck > > > Hello > > > > I would like to know > > - how to create under psql a BLOB field in sql language > > - how store a value (and not a file) in sql language > > - the maximum size of the BLOB field > > For now (and 7.0), there is no way to have rows >8K in a > regular table. > > But I'm actually making alot of progress in breaking it. > Seems we'll get rid of this restriction in 7.1. > > To share some info with the other developers: > > I experienced some trouble with occational "Deadlock" > detections, when using multiple processes hammering on the > toaster. But I was able to reproduce the same error with a > plain 7.0 and "text" attributes (only far less frequent). > Seems there's something wrong in our deadlock detection > algorithm. > Does the trouble mean that "Deadlock" message was issued for non-deadlock cases ? > Another thing is, that ISTM that index's never shrink on > vacuum. One requirement of TOAST is, that the secondaries > tables index-OID is remembered in the external reference, so > the ref can be passed around in the entire backend and > whenever needed, it's real value can be fetched quickly by an > index scan. Thus, it's not possible to drop/recreate an index > on that right now. Seems we need a "REBUILD INDEX" utility or > the like. > REINDEX command could do it in a sense. However current implementation couldn't guarantee the existence of proper index in case of abort. Proper implementation would require the standard transactional control mechanism to handle relation files such as new naming rule of relation files.... Regards. Hiroshi Inoue Inoue@tpf.co.jp
Jan Wieck writes: > I just successfully stored the entire PG sources (~10M) in > one table. Stored are all 936 .[chyl] files. Cool, then we can get rid of CVS. :-) > clob is a user defined, very simple varsize datatype, I > created for testing. Keep it, it's SQL3. > Seems there's something wrong in our deadlock detection > algorithm. Our deadlock detection "algorithm" is that when nothing happens for 1 sec then that's a deadlock. Increasing that number might make those messages go away but that's still far from an algorithm, of course. > I'll need alot of help to make all our existing types > toastable, I'm wondering how transparent all of this will be. What is involved in making existing types toastable? How does that affect user defined datatypes now and in the future? -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Peter Eisentraut <peter_e@gmx.net> writes: >> I'll need alot of help to make all our existing types >> toastable, > I'm wondering how transparent all of this will be. What is involved in > making existing types toastable? How does that affect user defined > datatypes now and in the future? I'd like to think that we can fold the TOAST support into the argument-access macros that I plan to be defining for the fmgr rewrite. See previous rants on this subject, eg 14-Jun-99, 23-Oct-99 (in pgsql-hackers archives). Net result for user-defined-datatype authors will be "if you revise your routines, they will be easier to read, more portable, and will support TOASTed values. If you don't, they'll still work about as well (or poorly) as they did before." regards, tom lane
Peter Eisentraut wrote: > > clob is a user defined, very simple varsize datatype, I > > created for testing. > > Keep it, it's SQL3. We don't need it later. At the time text is toastable, we can simply create an alias for it and are done. And I think blob is bytea then, isn't it? > > Seems there's something wrong in our deadlock detection > > algorithm. > > Our deadlock detection "algorithm" is that when nothing happens for 1 sec > then that's a deadlock. Increasing that number might make those messages > go away but that's still far from an algorithm, of course. Really? I remember that someday it was "if I'm waiting 60 seconds for my lock, ...". What a change. What's the exact location of that parameter to change? > > I'll need alot of help to make all our existing types > > toastable, > > I'm wondering how transparent all of this will be. What is involved in > making existing types toastable? How does that affect user defined > datatypes now and in the future? Let's look at the input/output functions of clob: void * clob_in(char *s) { void *c; int32 l; if (s == NULL) return NULL; l = strlen(s); c = (void *)palloc(VARHDRSZ + l); VARATT_SIZEP(c) = l + VARHDRSZ; memcpy(VARATT_DATA(c), s, l); return c; } char * clob_out(void *c) { void *p; char *s; int32 l; if (c == NULL) { s = palloc(2); s[0] = '-'; s[1] = '\0'; return s; } VARATT_GETPLAIN(c, p); l = VARATT_SIZE(p) - VARHDRSZ; s = (char *)palloc(l + 1); memcpy(s, VARATT_DATA(p), l); s[l] = '\0'; VARATT_FREE(c, p); return s; } So the input function doesn't change at all. Only functions that have a toastable type as argument need to wraparound with a local variable and the VARATT_GETPLAIN(), VARATT_FREE() macros. VARATT_GETPLAIN(arg, ptr) places the plain value of arg in ptr. If the argument wasn't toasted, it's assigned asis. If compressed or stored external, the original value is reconstructed in palloc()'d memory and assignedto ptr. VARATT_FREE(arg, ptr) free()'s ptr if it is different from arg. It all ain't that complicated. Easy enough to use it in user defined types too (a must because today's user defined functions usually use our base types too). Only that there are hundreds of functions in utils/adt and contribthat need to be looked at. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
Tom Lane wrote: > I'd like to think that we can fold the TOAST support into the > argument-access macros that I plan to be defining for the fmgr rewrite. > See previous rants on this subject, eg 14-Jun-99, 23-Oct-99 (in > pgsql-hackers archives). So we need to fold the VARATT_... macro's I'm using now into the FMGR_... macros. Or define the macro's you want now. > Net result for user-defined-datatype authors will be "if you revise > your routines, they will be easier to read, more portable, and will > support TOASTed values. If you don't, they'll still work about as > well (or poorly) as they did before." Sorry, but that ain't true. Making an existing type toastable means that all functions, receiving that type needto be revised. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
wieck@debis.com (Jan Wieck) writes: >> Net result for user-defined-datatype authors will be "if you revise >> your routines, they will be easier to read, more portable, and will >> support TOASTed values. If you don't, they'll still work about as >> well (or poorly) as they did before." > Sorry, but that ain't true. Making an existing type toastable > means that all functions, receiving that type need to be > revised. What I meant was that they'd still work, with a limit on field size, just like before. ie, no TOAST support. regards, tom lane
> wieck@debis.com (Jan Wieck) writes: > >> Net result for user-defined-datatype authors will be "if you revise > >> your routines, they will be easier to read, more portable, and will > >> support TOASTed values. If you don't, they'll still work about as > >> well (or poorly) as they did before." > > > Sorry, but that ain't true. Making an existing type toastable > > means that all functions, receiving that type need to be > > revised. > > What I meant was that they'd still work, with a limit on field size, > just like before. ie, no TOAST support. Yes, but at the first time, a toasted value is handed to them the result (up to backend crash) is unpredictable. So any user defined function taking "text" as argument is potentially in danger! Better tell them they have to revise. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
wieck@debis.com (Jan Wieck) writes: >> What I meant was that they'd still work, with a limit on field size, >> just like before. ie, no TOAST support. > Yes, but at the first time, a toasted value is handed to them > the result (up to backend crash) is unpredictable. So any > user defined function taking "text" as argument is > potentially in danger! Oh, right, user-defined functions on system types will have problems if the system type has been marked toastable. I was thinking of user-defined datatypes, for which it'd be OK to leave the type untoastable if you didn't want to fix the associated functions right away. > Better tell them they have to revise. Well, hmm. It's not out of the question that we could continue to support old user-defined functions even on toastable types. There is going to be a compatibility wrapper anyway around any function that's adhering to the old fmgr interface. So with just a little more work, we could make that wrapper expand any toasted values that are being presented to the function. Only new-style functions would ever get passed toasted values directly. This'd also make it a lot less painful to convert the built-in functions, of course. The only downside I see in this is that it'd take a few extra catalog lookups to determine which arguments are toastable types and thus potentially in need of untoasting. I don't think that'd be a big loss, since we normally only do the catalog lookups for a function reference once per query anyway. Seem reasonable? regards, tom lane
Jan Wieck writes: > Really? I remember that someday it was "if I'm waiting 60 > seconds for my lock, ...". What a change. > > What's the exact location of that parameter to change? include/config.h:#define DEADLOCK_CHECK_TIMER 1 This only has second resolution. I changed it to milliseconds for my configuration clean-up (details coming soon), but I'm not sure if that's worth a lot. The logic behind this is in backend/storage/lmgr/proc.c. > Only that there > are hundreds of functions in utils/adt and contrib that need > to be looked at. What datatypes besides text'ish and binary data'ish do you want to toast? SQL(3) only defines CLOB and BLOB, so they might be thinking along my lines. I mean surely having toastable polygons and numerics has some theoretical value but is it worth bothering? -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Peter Eisentraut <peter_e@gmx.net> writes: > What datatypes besides text'ish and binary data'ish do you want to toast? > SQL(3) only defines CLOB and BLOB, so they might be thinking along my > lines. I mean surely having toastable polygons and numerics has some > theoretical value but is it worth bothering? Good point. The only possible candidates for toasting are varlena types, which are select oid,typname from pg_type where typtype='b' and typlen < 0 and typname !~ '^_';oid | typname ------+--------- 17 | bytea 25 | text 32 | SET 602 | path 604 | polygon 705 | unknown 869 | inet 650 | cidr1042 | bpchar1043| varchar1560 | bit1562 | varbit1625 | lztext1700 | numeric (14 rows) plus array types (of which there are lots, but only one set of access routines needs to deal with toasting). I find it hard to foresee any need for toasted inet/cidr data ;-), and I'm not even sure what SET is. And unknown is just a placeholder, and lztext is going away anyway. So I'd say we have MUSTs: bytea, text, bpchar, varchar, arrays WANTs: bit, varbit, numeric LOW PRIORITY: path, polygon which is not so many types after all... although some of these are used by a lot of routines: select pg_type.oid,typname,count(*) from pg_type, pg_proc where typtype='b' and typlen < 0 and typname !~ '^_' and pg_type.oid in (proargtypes[0],proargtypes[1],proargtypes[2],proargtypes[3], proargtypes[4],proargtypes[5],proargtypes[6],proargtypes[7]) group by pg_type.oid,typname;oid | typname | count ------+---------+------- 17 | bytea | 6 25 | text | 88 602 | path | 29 604 | polygon | 24 869 | inet | 161042 | bpchar | 221043 | varchar | 141560 | bit | 151562 | varbit | 151625 | lztext | 121700 | numeric | 45 (11 rows) (This is an overestimate since routines with multiple arguments may be counted in more than one category...) regards, tom lane