Thread: Toast, Text, blob bytea Huh?
I'm trying my best to convert from MySQL to PgSQL but I cant get a good clear answer about certian issures.Mainly TEXT, TOAST,BLOB , BYTEA etc. It was an easy task in mysql but everything in the archives about , text , toast and bytea is just confusing me with postgresql. I have Bruces's book and I've searched the archives years back with all the right keywords with not luck.Here is my situation--> WHAT I WAS DOING IN MYSQL Via the web my clients are uploading basic text/data files, sometimes > than 30MB. In the past ,via CGI I have been parsing the file into one STL string, using mysql_escape_string to escape it and then using an INSERT to place the ,\'+"stlstring+"\' , into a BLOB column. "dont want to use a temp. file or files in general anywhere. The data will always be passed via the database and buffers for certian reasons."Thus no OID's THIS IS WHAT I CANT SEEM TO FIGURE OUT IN POSTGRESQL 1. I cant get a clear answer on what kind of data type to use for my large text string? TEXT, ???, ??? or something about TOAST I have seen in the e-mail archive but cant find any documentaion? 2. I've written my own escape method ,("cant find one for Pgsql") , BUT i don't know what to escape and not to escape. So it keeps failing. I cand find any docs. on what to escape either? SUMMARY What is the best datatype to use, for large raw text and/or binary if i choose? Once I know this,how Im a supposed to escape my string and get it through the parser correctly so i can retrieve it correctly? Thanks for your time. PS: Using RedHat. Jason H. Ory Medprint+ Software Developer jason.ory@ndchealth.com
On Thu, 23 Aug 2001 13:47:01 -0400, you wrote: >1. I cant get a clear answer on what kind of data type to use for my large >text string? TEXT, ???, ??? or something about TOAST >I have seen in the e-mail archive but cant find any documentaion? TOAST is not a data type, but a project that extended the capacity of PostgreSQL from version 7.1 onwards to support fields up to 1 GB in length and rows of (practically) unlimited length. This makes TEXT and BYTEA good data types for storing large character and binary data. In the hands of a marketing department toasted data would be a great feature that justified the release of version 8 all by itself :-) However, unlike SQL3 Lobs, values of type TEXT and BYTEA are always transfered as one unit to the client. As a Java programmer I like that feature. A string is a string and a byte array is a byte array, no matter what it's length may be. Large Objects in PostgreSQL (http://www.ca.postgresql.org/users-lounge/docs/7.1/programmer/largeobjects.html) support file oriented access to large data. This may be a more convenient programming model, depending on the client interface that you use and the requirements of your application. However, Large Objects in PostgreSQL are objects that exist independently from the rows in which you hold a reference to them. If you delete or update a row, your application may need to delete certain Large Objects as well. In this respect, PostgreSQL Large Objects differ from SQL3 Lobs. >2. I've written my own escape method ,("cant find one for Pgsql") , BUT i >don't know what >to escape and not to escape. So it keeps failing. I cand find any docs. on >what to escape either? I'm not sure what you mean by that. What characters would you want to escape in what way and why? And by the way, what client interface are you using? Regards, René Pijlman
> I'm trying my best to convert from MySQL to PgSQL but I cant get a good > clear answer about > certian issures.Mainly TEXT, TOAST,BLOB , BYTEA etc. > It was an easy task in mysql but everything in the archives about , text , > toast and bytea is just > confusing me with postgresql. I have Bruces's book and I've searched the Quick glossary: TEXT is a datatype which stores character data of unspecified length (up to the max value of a 4 byte integer in length, although I've seen comments indicating that the practical limit is closer to 1 GB -- not sure why). TEXT is not intended to hold arbitrary binary data. If you want to store binary in a text column, encode it to hex or base64 or something first. TOAST is an internal database concept. If I understand it correctly, it refers to a combination of compression and out-of-line storage for large length values of a charater datatype. This happens transparently to you. BLOB is otherwise known as LO or Large Object datatype in PostgreSQL. These are always stored out-of-line, I don't believe they are compressed, and they have their own special access methods (for dealing with data a "chunk" at a time). BYTEA is very similar to TEXT, except that it is intended for binary data. I recently posted a PHP function which escapes binary in order to allow inserting it into a bytea column (on the pgsql-general list). At a minimum there are 4 characters which must be escaped. They are ACSII 0, 10, 39, and 92. They must reach PostgreSQL looking like \\000, \\012, \\047, and \\134 respectively (actually 39 could be \' and 92 could be \\\\, but it is simpler to be consistent). > THIS IS WHAT I CANT SEEM TO FIGURE OUT IN POSTGRESQL > 1. I cant get a clear answer on what kind of data type to use for my large > text string? TEXT, ???, ??? or something about TOAST > I have seen in the e-mail archive but cant find any documentaion? So, you can use TEXT if you encode to hex or base64 in your app first, or you bytea if you escape as I described above in your app. Or you can use the LO functions to manipulate large objects (see http://www.postgresql.org/idocs/index.php?lo-interfaces.html). > > 2. I've written my own escape method ,("cant find one for Pgsql") , BUT i > don't know what > to escape and not to escape. So it keeps failing. I cand find any docs. on > what to escape either? See above. > > > SUMMARY > What is the best datatype to use, for large raw text and/or binary if i > choose? > Once I know this,how Im a supposed to escape my string and get it through > the parser correctly so i can retrieve it correctly? If you use TEXT, you will have to decode the hex/base64 back into binary in your app. Similarly, if you use bytea, although stored as binary, the data is returned with "unprintable" values escaped as octals*. Your app will have to decode the octals back into binary. *NOTE to hackers: is there a good reason for this? ISTM that the client should be responsible for any encoding needed when bytea is returned. The server should return bytea as straight varlena. If you use LO, you have to use the interface functions instead of standard SQL. Hope this helps, -- Joe
Quoting Joe Conway <joseph.conway@home.com>: > TEXT is a datatype which stores character data of unspecified length (up > to > the max value of a 4 byte integer in length, although I've seen > comments > indicating that the practical limit is closer to 1 GB -- not sure why). It may be something to do with the 1Gb splitting of the physical files representing a table... Unless it changed recently, a table was split over multiple files at the 1Gb mark. Peter -- Peter Mount peter@retep.org.uk PostgreSQL JDBC Driver: http://www.retep.org.uk/postgres/ RetepPDF PDF library for Java: http://www.retep.org.uk/pdf/
Peter T Mount <peter@retep.org.uk> writes: > Quoting Joe Conway <joseph.conway@home.com>: >> indicating that the practical limit is closer to 1 GB -- not sure why). > It may be something to do with the 1Gb splitting of the physical files > representing a table... No, that's just a coincidence. The reason that TOAST limits fields to 1Gb is that the high-order two bits of the varlena length word were commandeered as TOAST state indicators. There are now only thirty bits available to represent the length of a variable-length Datum; hence the hard limit on field width is 1Gb. I'd think that the "practical" limit is quite a bit less than that, at least until we devise an API that lets you read and write toasted values in sections. regards, tom lane
> No, that's just a coincidence. The reason that TOAST limits fields to > 1Gb is that the high-order two bits of the varlena length word were > commandeered as TOAST state indicators. There are now only thirty bits > available to represent the length of a variable-length Datum; hence the > hard limit on field width is 1Gb. > > I'd think that the "practical" limit is quite a bit less than that, at > least until we devise an API that lets you read and write toasted values > in sections. Yes, passing around multi-gigabytes memory chunks in a process is pretty slow. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Peter T Mount wrote: > Quoting Joe Conway <joseph.conway@home.com>: > > > TEXT is a datatype which stores character data of unspecified length (up > > to > > the max value of a 4 byte integer in length, although I've seen > > comments > > indicating that the practical limit is closer to 1 GB -- not sure why). > > It may be something to do with the 1Gb splitting of the physical files > representing a table... Unless it changed recently, a table was split over > multiple files at the 1Gb mark. No, it's because the upper two bits of the variable size field are used as flags. But in practice there are other limits that force you to keep the objects you throw into text or bytea fields alot smaller. When your INSERT query is received, parsed, planned and a heap tuple created, there are at least fourcopies of that object in the backends memory. How much virtual memory does your OS support for one single process? And by the way, TOAST is not only used for character data types. All variable size data types in the base system are toastable. Well, arrays might be considered sort of pop-tarts here, but anyway, they get toasted. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
joseph.conway@home.com ("Joe Conway") wrote in message news:<02a101c12c0e$37856d60$48d210ac@jecw2k1>... ... snip > BYTEA is very similar to TEXT, except that it is intended for binary data. I > recently posted a PHP function which escapes binary in order to allow > inserting it into a bytea column (on the pgsql-general list). At a minimum > there are 4 characters which must be escaped. They are ACSII 0, 10, 39, and > 92. They must reach PostgreSQL looking like \\000, \\012, \\047, and \\134 > respectively (actually 39 could be \' and 92 could be \\\\, but it is > simpler to be consistent). ... snip Is it actually necessary to escape \012 (linefeed) in a query? My brief testing using psql or python pygresql seemed to work ok with only \000, \', and \\ escaped. Gosh, maybe all my data is corrupted (!!) Ryan
> joseph.conway@home.com ("Joe Conway") wrote in message news:<02a101c12c0e$37856d60$48d210ac@jecw2k1>... > ... snip > > BYTEA is very similar to TEXT, except that it is intended for binary data. I > > recently posted a PHP function which escapes binary in order to allow > > inserting it into a bytea column (on the pgsql-general list). At a minimum > > there are 4 characters which must be escaped. They are ACSII 0, 10, 39, and > > 92. They must reach PostgreSQL looking like \\000, \\012, \\047, and \\134 > > respectively (actually 39 could be \' and 92 could be \\\\, but it is > > simpler to be consistent). > ... snip > > Is it actually necessary to escape \012 (linefeed) in a query? My > brief testing using psql or python pygresql seemed to work ok with > only \000, \', and \\ escaped. Gosh, maybe all my data is corrupted > (!!) > Sorry the response is a week late, but your post just hit the list (at least I just got it). I found after sending this, that the problem with linefeeds was in my PHP code, so you should be OK :-) Sorry for the confusion I may have caused! Joe
> joseph.conway@home.com ("Joe Conway") wrote in message news:<02a101c12c0e$37856d60$48d210ac@jecw2k1>... > ... snip > > BYTEA is very similar to TEXT, except that it is intended for binary data. I > > recently posted a PHP function which escapes binary in order to allow > > inserting it into a bytea column (on the pgsql-general list). At a minimum > > there are 4 characters which must be escaped. They are ACSII 0, 10, 39, and > > 92. They must reach PostgreSQL looking like \\000, \\012, \\047, and \\134 > > respectively (actually 39 could be \' and 92 could be \\\\, but it is > > simpler to be consistent). > ... snip > > Is it actually necessary to escape \012 (linefeed) in a query? My > brief testing using psql or python pygresql seemed to work ok with > only \000, \', and \\ escaped. Gosh, maybe all my data is corrupted > (!!) The linefeed escape was reported by a PHP users and perhaps there is an issue with PHP only. Not sure. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026