Thread: Fixed length data types issue
So I'm thinking again about the problems with fixed length data types not having typmod available when they would need it. But I'm having trouble finding enough old posts to get a handle on exactly what the problem is. This would make a nice test of the new wiki. I would be indebted to whoever could summarize the root of the problem and explain exactly in what circumstances the typmod is unavailable. I would summarize the responses and put them up on the wiki. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > So I'm thinking again about the problems with fixed length data types not > having typmod available when they would need it. But I'm having trouble > finding enough old posts to get a handle on exactly what the problem is. The problem is it isn't available ;-) AFAIR the only context where datatype-specific functions *do* get passed typmod is in the invocation of a datatype input function or length coercion function. And in those contexts the semantics are really "convert the input to match this typmod", not "this typmod describes what you've been passed". The basic rule here is that you have to be able to find out everything you need to know about a given instance of a datatype just by looking at the Datum. If you try to rely on external data then you have the same security problems that we had to redesign output functions to get rid of: there's not sufficient guarantee that the external data actually matches the datum. regards, tom lane
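For concreteness, the one place a datatype-specific C function does see a typmod is the input function Tom mentions: a V1-convention input function receives the text to convert, the typioparam OID, and the destination typmod as its third argument. Below is a minimal sketch for a hypothetical integer-like type whose typmod is treated as an upper bound on the value; the type name, function name, and that interpretation are all invented for illustration, but the argument layout is the standard one.

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/builtins.h"

    PG_FUNCTION_INFO_V1(mytype_in);

    Datum
    mytype_in(PG_FUNCTION_ARGS)
    {
        char   *str     = PG_GETARG_CSTRING(0);   /* text form to convert    */
        Oid     typelem = PG_GETARG_OID(1);       /* typioparam; unused here */
        int32   typmod  = PG_GETARG_INT32(2);     /* "coerce to this", or -1 */
        int32   value   = pg_atoi(str, sizeof(int32), '\0');

        /* the typmod says what to convert *to*; it does not describe some
         * value we were handed, and it is simply -1 when the context
         * doesn't know */
        if (typmod >= 0 && value > typmod)
            ereport(ERROR,
                    (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                     errmsg("value exceeds limit implied by typmod %d", typmod)));

        (void) typelem;
        PG_RETURN_INT32(value);
    }

Everywhere else, per Tom's rule, a function has only the Datum itself to go on.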
On Tue, Sep 05, 2006 at 02:48:45PM +0100, Gregory Stark wrote: > > > So I'm thinking again about the problems with fixed length data types not > having typmod available when they would need it. But I'm having trouble > finding enough old posts to get a handle on exactly what the problem is. Like Tom said, the problem is you don't have it. In the specific case of type input functions, what typmod does the output have? For type output functions, relying on a passed typmod is a security risk. So you end up storing the typmod in the Datum itself, which brings you right back to varlena. > This would make a nice test of the new wiki. I would be indebted to whoever > could summarize the root of the problem and explain exactly in what circumstances > the typmod is unavailable. I would summarize the responses and put them up on > the wiki. Well, the root of the problem depends on your perspective. If the purpose behind all of this is to save disk space, perhaps the root of the problem is that disk representation and memory representation are intimately tied? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > So you end up storing the typmod in the Datum itself, which brings you > right back to varlena. Not really, since the Datum doesn't actually end up on disk in the case of pass-by-reference, which leads us to: > Well, the root of the problem depends on your perspective. If the > purpose behind all of this is to save disk space, perhaps the root of > the problem is that disk representation and memory representation are > intimately tied? Indeed. Consider this real table definition I found in a few moments searching for schemas on google:

PRVDR_CTRL_TYPE_CD: CHAR(2)
PRVDR_NUM: CHAR(6)
NPI: NUMBER
RPT_STUS_CD: CHAR(1)
FY_BGN_DT: DATE
FY_END_DT: DATE
PROC_DT: DATE
INITL_RPT_SW: CHAR(1)
LAST_RPT_SW: CHAR(1)
TRNSMTL_NUM: CHAR(2)
FI_NUM: CHAR(5)
ADR_VNDR_CD: CHAR(1)
FI_CREAT_DT: DATE
UTIL_CD: CHAR(1)
NPR_DT: DATE
SPEC_IND: CHAR(1)
FI_RCPT_DT: DATE

By my count postgres would use 154 bytes for this record. Whereas in fact there's no need for it to take more than 87 bytes. Almost 100% overhead for varattlen headers and the padding they necessitate. This is not a pathological example. This is a very common style of database schema definition. Many many database tables in the real world are 1-1 translations of existing flat file databases which have lots of short fixed length ascii codes. Any database interacting with any old school inventory management systems, financial databases, marketing databases, etc. is likely to be of this form. So it seems what has to happen here is that we need a way of defining a data type that has a different on-disk representation from its in-memory representation. That means a lot more CPU overhead, since I imagine it will mean pallocing the in-memory representation before you can actually do anything with the data. The disk reader and writer functions could probably use the typmod, but it seems what they really want to have access to is the attlen, because what they really want to know is the length of the object that their pointer refers to. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
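As a rough sketch of where the extra CPU cost would show up under Gregory's split-representation idea: a hypothetical "disk reader" (the function name and calling convention are invented, and attlen would come from the catalog) has to palloc and build an ordinary in-memory varlena before anything else can touch the value. Using the VARATT_SIZEP macro quoted later in the thread:

    #include "postgres.h"

    /* expand a headerless on-disk value of known length into a normal varlena */
    static struct varlena *
    disk_read_fixed(const char *diskptr, int attlen)
    {
        struct varlena *result = (struct varlena *) palloc(VARHDRSZ + attlen);

        VARATT_SIZEP(result) = VARHDRSZ + attlen;   /* 4-byte length header  */
        memcpy(VARDATA(result), diskptr, attlen);   /* copy the packed bytes */
        return result;
    }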
Gregory Stark <stark@enterprisedb.com> writes: > Martijn van Oosterhout <kleptog@svana.org> writes: > >> So you end up storing the typmod in the Datum itself, which brings you >> right back to varlena. > > Not really since the Datum doesn't actually end up on disk in the case of > pass-by-reference. Just brain storming here. But what happens if we make Datum 2*sizeof(pointer) and stored the typmod and/or attlen in it? Obviously it means the memory use goes up dramatically. But the disk i/o could potentially be reduced dramatically as well. Does it let us do anything else we've been dreaming of but not thought doable? Does it cause any fundamental problems? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Thu, Sep 07, 2006 at 11:57:26AM +0100, Gregory Stark wrote: > Just brain storming here. But what happens if we make Datum 2*sizeof(pointer) > and stored the typmod and/or attlen in it? The fundamental property of a Datum is that you can pass it by value to a C function. This generally means it has to fit in a register. On the whole, the CPU register size is the same as the pointer size, so 2*sizeof(pointer) is unlikely to fit... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On 2006-09-07, Gregory Stark <stark@enterprisedb.com> wrote: > Consider this real table definition I found in a few moments searching for > schemas on google: [snip table with lots of fixed-length char fields] > > By my count postgres would use 154 bytes for this record. Whereas in fact > there's no need for it to take more than 87 bytes. Are you sure? Perhaps you are assuming that a char(1) field can be made to be fixed-length; this is not the case (consider utf-8 for example). -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
Andrew - Supernews <andrew+nonews@supernews.com> writes: > Are you sure? Perhaps you are assuming that a char(1) field can be made > to be fixed-length; this is not the case (consider utf-8 for example). Well, that could still be fixed length, it would just be a longer fixed length. (In theory it would have to be 6 bytes long which I suppose would open up the argument that if you're usually storing 7-bit ascii then a varlena would usually be shorter.) In any case I think the intersection of columns for which you care about i18n and columns that you're storing according to an old-fashioned fixed column layout is pretty much nil. And not just because it hasn't been updated to modern standards either. If you look again at the columns in my example you'll see none of them are appropriate targets for i18n anyways. They're all codes and even numbers. In other words if you're actually storing localized text then you almost certainly will be using a text or varchar and probably won't even have a maximum size. The use case for CHAR(n) is when you have fixed length statically defined strings that are always the same length. It doesn't make sense to store these in UTF8. Currently Postgres has a limitation that you can only have one encoding per database and one locale per cluster. Personally I'm of the opinion that the only correct choice for that is "C" and all localization should be handled in the client and with pg_strxfrm. Putting the whole database into non-C locales guarantees that the columns that should not be localized will have broken semantics and there's no way to work around things in the other direction. Perhaps given the current situation what we should have is cvarchar and cchar data types that are like varchar and char but guaranteed to always be interpreted in the c locale with ascii encoding. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Thu, Sep 07, 2006 at 01:11:49PM +0100, Gregory Stark wrote: > > Martijn van Oosterhout <kleptog@svana.org> writes: > > > The fundamental property of a Datum is that you can pass it by value to > > a C function. This generally means it has to fit in a register. On the > > whole, the CPU register size is the same as the pointer size, so > > 2*sizeof(pointer) is unlikely to fit... > > Not having it fit in a register might impact performance but it certainly > isn't a requirement. You can pass whole structs by value in modern C. (And by > modern here I don't mean C99, this has been supported since before ANSI and is > required by *C89*). Sure, the C compiler pushes it on the stack and passes a pointer to the function. Pass-by-value in this context means "pass a reference to a copy". It works, but it's not very efficient. The C compiler also allows you create struct variables and assign them as if they were plain variables. The assembly code to make this work isn't pretty. You're proposing doing it for everywhere in the backend, which seems like a huge cost for very little gain. A better approach would be to revive the proposal for a variable-length varlena header. It's four-bytes fixed because that's easy, but given most values are under 4K you could come up with a coding scheme that cut the header for such Datums to only 2 bytes, or less... Have a ncie day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > The fundamental property of a Datum is that you can pass it by value to > a C function. This generally means it has to fit in a register. On the > whole, the CPU register size is the same as the pointer size, so > 2*sizeof(pointer) is unlikely to fit... Not having it fit in a register might impact performance but it certainly isn't a requirement. You can pass whole structs by value in modern C. (And by modern here I don't mean C99, this has been supported since before ANSI and is required by *C89*). -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
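A trivial standalone illustration of that point about struct arguments (the names are invented and this is not a proposal for the real Datum; whether the copy travels in registers or on the stack is up to the ABI, which is the performance question Martijn raises):

    #include <stdio.h>

    typedef struct
    {
        void *ptr;      /* would point at the data */
        int   typmod;   /* piggy-backed type information */
    } FatDatum;

    /* the struct is copied into the callee, just like any other by-value arg */
    static int
    fat_typmod(FatDatum d)
    {
        return d.typmod;
    }

    int
    main(void)
    {
        FatDatum d = { NULL, 42 };

        printf("%d\n", fat_typmod(d));   /* prints 42 */
        return 0;
    }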
On Thu, Sep 07, 2006 at 01:27:01PM +0100, Gregory Stark wrote: > ... If you look again at the columns in my example you'll > see none of them are appropriate targets for i18n anyways. They're all codes > and even numbers. Which begs the question of why they don't store the numbers in numeric columns? That'll take far less space than any string. > In other words if you're actually storing localized text then you almost > certainly will be using a text or varchar and probably won't even have a > maximum size. The use case for CHAR(n) is when you have fixed length > statically defined strings that are always the same length. It doesn't make > sense to store these in UTF8. It makes sense to store them as numbers, or perhaps an enum. > Currently Postgres has a limitation that you can only have one encoding per > database and one locale per cluster. Personally I'm of the opinion that the > only correct choice for that is "C" and all localization should be handled in > the client and with pg_strxfrm. Putting the whole database into non-C locales > guarantees that the columns that should not be localized will have broken > semantics and there's no way to work around things in the other direction. Quite. So if someone would code up SQL COLLATE support and integrate ICU, everyone would be happy and we could all go home. BTW, requiring localisation to happen in the client is silly. SQL provides the ORDER BY clause for strings and it'd be silly to have the client resort them just because they're not using C locale. The point of a database was to make your life easier, right? > Perhaps given the current situation what we should have is cvarchar and > cchar data types that are like varchar and char but guaranteed to always be > interpreted in the c locale with ascii encoding. I think bytea gives you that, pretty much. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Thu, Sep 07, 2006 at 11:57:26AM +0100, Gregory Stark wrote: >> Just brain storming here. But what happens if we make Datum 2*sizeof(pointer) >> and stored the typmod and/or attlen in it? > The fundamental property of a Datum is that you can pass it by value to > a C function. This generally means it has to fit in a register. On the > whole, the CPU register size is the same as the pointer size, so > 2*sizeof(pointer) is unlikely to fit... Not to mention the problem that such a change would break every single datatype-manipulation function in both the core backend and every user-written datatype. I don't think we're going there. regards, tom lane
Martijn van Oosterhout <kleptog@svana.org> writes: > On Thu, Sep 07, 2006 at 01:27:01PM +0100, Gregory Stark wrote: >> ... If you look again at the columns in my example you'll >> see none of them are appropriate targets for i18n anyways. They're all codes >> and even numbers. > > Which begs the question of why they don't store the numbers in numeric > columns? That'll take far less space than any string. Sure, unless by PRVDR_NUM they mean things like '000001' or even 'C00001'. ... > It makes sense to store them as numbers, or perhaps an enum. Sure. If you're designing the schema from scratch and don't have to interoperate with any other systems. But if you're receiving a flat ascii text file and it has a 5 character opaque identifier called "FI_NUM" which do you think is the safer approach for storing these opaque identifiers? You can suggest that there are other ways of designing a schema that will work better with Postgres but I think you're just working around Postgres deficiencies. These may be deficiencies that are pretty low down your priority list but they may be higher up my list. I just don't think you can argue they're not deficiencies just because you know better than to get tripped up by them. I'm discussing these things with an eye to getting some kind of consensus on what should be done about them so I can go do it, not because I'm trying to get you to work on it :) >> Currently Postgres has a limitation that you can only have one encoding per >> database and one locale per cluster. Personally I'm of the opinion that the >> only correct choice for that is "C" and all localization should be handled in >> the client and with pg_strxfrm. Putting the whole database into non-C locales >> guarantees that the columns that should not be localized will have broken >> semantics and there's no way to work around things in the other direction. > > Quite. So if someone would code up SQL COLLATE support and integrate > ICU, everyone would be happy and we could all go home. Well I for one would be pretty unhappy if ICU were integrated. It seems like a whole pile of code and complexity for no particular gain. The standard i18n support with a few extensions (namely strcoll_l) seems to be adequate for us and not introduce huge new dependencies and code burdens. > BTW, requiring localisation to happen in the client is silly. SQL > provides the ORDER BY clause for strings and it'd be silly to have the > client resort them just because they're not using C locale. The point > of a database was to make your life easier, right? That's why I mentioned pg_strxfrm. It doesn't solve all your problems if you're doing lots of string manipulations in queries but it can handle collation so you can at least execute ORDER BY clauses which of course you can't efficiently do in the client. For anything more complex you're probably happier doing your string manipulations in the client just because SQL's string primitives are so, well, primitive. >> Perhaps given the current situation what we should have is cvarchar and >> cchar data types that are like varchar and char but guaranteed to always be >> interpreted in the c locale with ascii encoding. > > I think bytea gives you that, pretty much. Hm, that's an interesting idea. We could define all the string functions for bytea as well. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Tom Lane <tgl@sss.pgh.pa.us> writes: > Martijn van Oosterhout <kleptog@svana.org> writes: >> On Thu, Sep 07, 2006 at 11:57:26AM +0100, Gregory Stark wrote: >>> Just brain storming here. But what happens if we make Datum 2*sizeof(pointer) >>> and stored the typmod and/or attlen in it? > >> The fundamental property of a Datum is that you can pass it by value to >> a C function. This generally means it has to fit in a register. On the >> whole, the CPU register size is the same as the pointer size, so >> 2*sizeof(pointer) is unlikely to fit... > > Not to mention the problem that such a change would break every single > datatype-manipulation function in both the core backend and every > user-written datatype. I don't think we're going there. Sure, I'm just brain storming. Sometimes thinking about outlandish ideas can result in quite reasonable ideas appearing down the line. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Thu, Sep 07, 2006 at 03:38:10PM +0100, Gregory Stark wrote: > Well I for one would be pretty unhappy if ICU were integrated. It seems like a > whole pile of code and complexity for no particular gain. The standard i18n > support with a few extensions (namely strcoll_l) seems to be adequate for us > and not introduce huge new dependencies and code burdens. Let's be serious here. The patch is 18k (729 lines), hardly "whole pile of code and complexity". The patch has been in the FreeBSD ports collection for a rather long time, so it's not like it's not tested. http://www.freebsd.org/cgi/cvsweb.cgi/ports/databases/postgresql81-server/Makefile?rev=1.156&content-type=text/x-cvsweb-markup and search for "ICU". The actual patches are here: http://people.freebsd.org/~girgen/postgresql-icu/ The point is that strcoll_l doesn't exist on most platforms, so unless someone is going to write another locale library, why not just use one that's available? > That's why I mentioned pg_strxfrm. It doesn't solve all your problems if > you're doing lots of string manipulations in queries but it can handle > collation so you can at least execute ORDER BY clauses which of course you > can't efficiently do in the client. For anything more complex you're probably > happier doing your string manipulations in the client just because SQL's > string primitives are so, well, primitive. I think you're making the assumption that client locale support is going to be better than the server's. Besides, pg_strxfrm doesn't help you if you want to do accent-insensitive matching. Sometimes you don't just want to change the order, you also want to change what is equal. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Gregory Stark wrote: > By my count postgres would use 154 bytes for this record. Whereas in > fact there's no need for it to take more than 87 bytes. Almost 100% > overhead for varattlen headers and the padding they necessitate. The thing is, 100% extra space is cheap, but the processing power for making the need for that extra space go away is not. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut wrote: > Gregory Stark wrote: > > By my count postgres would use 154 bytes for this record. Whereas in > > fact there's no need for it to take more than 87 bytes. Almost 100% > > overhead for varattlen headers and the padding they necessitate. > > The thing is, 100% extra space is cheap, but the processing power for > making the need for that extra space go away is not. I think it would be good to see if we can extend the varlena data types to support a shorter header for storing short byte values. Looking at the header now we have:

#define VARATT_FLAG_EXTERNAL 0x80000000
#define VARATT_FLAG_COMPRESSED 0x40000000
#define VARATT_MASK_FLAGS 0xc0000000
#define VARATT_MASK_SIZE 0x3fffffff
#define VARATT_SIZEP(_PTR) (((varattrib *)(_PTR))->va_header)

so there is precedent for overloading that header, but currently all the headers are four bytes. The big question is can a bit be allocated to indicate a short byte header is being used? Can we do this with minimal performance impact for non-short values? One test would be to adjust the masks above to assign one bit to be the "I am a short value" header, and I think that leaves you with 5 bits == 32, which is probably enough for a test. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
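A sketch of what that test encoding might look like (macro names invented, byte-order details ignored, looking only at the first header byte): keep the two existing flag bits, spend one more as the "I am a short value" flag, and the remaining five bits give lengths 0..31, matching Bruce's arithmetic.

    /* hypothetical short-header experiment; not actual PostgreSQL macros */
    #define VARATT_FLAG_SHORT  0x20     /* third flag bit: "I am a short value" */
    #define VARATT_SHORT_MASK  0x1f     /* 5 remaining bits => lengths 0..31    */

    #define VARATT_IS_SHORT(hdrbyte)   (((hdrbyte) & VARATT_FLAG_SHORT) != 0)
    #define VARATT_SHORT_SIZE(hdrbyte) ((hdrbyte) & VARATT_SHORT_MASK)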
Martijn van Oosterhout <kleptog@svana.org> writes: > On Thu, Sep 07, 2006 at 03:38:10PM +0100, Gregory Stark wrote: > > Well I for one would be pretty unhappy if ICU were integrated. It seems like a > > whole pile of code and complexity for no particular gain. The standard i18n > > support with a few extensions (namely strcoll_l) seems to be adequate for us > > and not introduce huge new dependencies and code burdens. > > Let's be serious here. The patch is 18k (729 lines), hardly "whole pile > of code and complexity". The patch has been in the FreeBSD ports > collection for a rather long time, so it's not like it's not tested. Uhm, an ICU source tree is over 40 *megabytes*. That's almost as much as the rest of Postgres itself and that doesn't even include documentation. Even if you exclude the data and regression tests you're still talking about depending on the portability and correctness of over 10 megabytes of new code. > The point is that strcoll_l doesn't exist on most platforms, so unless > someone is going to write another locale library, why not just use one > that's available? Neither is ICU available on most platforms. In any case we only need strcoll_l as a performance optimization, the regular interface works, it's just slow. > I think you're making the assumption that client locale support is > going to be better than the server's. Well we know it is because Postgres's support is basically nonexistent. > Besides, pg_strxfrm doesn't help you if you want to do > accent-insensetive matching. Sometimes you don't just want to change > the order, you also want to change what is equal. Well equal is part of collation at least in the sense you mean. What it doesn't help with is things like tolower or regexp matching. These are the things that I would suggest you usually want to be doing on the client because SQL's string manipulation facilities are so poor compared to most client languages. -- greg
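For reference, the difference between the "regular interface" and the strcoll_l extension looks roughly like this, assuming a platform that provides newlocale()/strcoll_l() (the locale name is only an example and error handling is omitted). The per-call variant avoids touching the process-wide locale, which is where the speed difference comes from; a real implementation would cache the locale_t rather than rebuild it on every call.

    #define _GNU_SOURCE            /* for newlocale()/strcoll_l() on glibc */
    #include <locale.h>
    #include <string.h>

    /* portable today: compare under whatever setlocale(LC_COLLATE, ...) set up */
    int
    collate_global(const char *a, const char *b)
    {
        return strcoll(a, b);
    }

    /* the extension: compare under an explicit collation, no global state */
    int
    collate_explicit(const char *a, const char *b)
    {
        locale_t loc = newlocale(LC_COLLATE_MASK, "sv_SE.UTF-8", (locale_t) 0);
        int      cmp = strcoll_l(a, b, loc);

        freelocale(loc);
        return cmp;
    }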
Peter Eisentraut <peter_e@gmx.net> writes: > Gregory Stark wrote: > > By my count postgres would use 154 bytes for this record. Whereas in > > fact there's no need for it to take more than 87 bytes. Almost 100% > > overhead for varattlen headers and the padding they necessitate. > > The thing is, 100% extra space is cheap, but the processing power for > making the need for that extra space go away is not. That's simply untrue for most applications. Unless you can fit much of your database into RAM that 100% extra space translates directly into 100% slower. This is most obviously the case for data warehouses that are doing lots of sequential scans of tables that don't fit in cache. But it's largely true for OLTP applications too. The more compact the data the more tuples fit on a page and the greater the chance you have the page you need in cache. -- greg
Bruce Momjian <bruce@momjian.us> writes: > I think it would be good to see if we can extend the varlena data types > to support a shorter header for storing short byte values. Looking at > the header now we have: This isn't the first time we've been down that route. There were some extensive discussions a while back. I think there were even patches. I don't remember why it was eventually rejected. I suspect it simply got too complex. But I think this is a dead-end route. What you're looking at is the number "1" repeated for *every* record in the table. And what you're proposing amounts to noticing that the number "4" fits in a byte and doesn't need a whole word to store it. Well sure, but you don't even need a byte if it's going to be the same for every record in the table. If someone popped up on the list asking about whether Postgres compressed their data efficiently if they stored a column that was identical throughout the whole table you would tell them to normalize their data. -- greg
Gregory Stark wrote: > Bruce Momjian <bruce@momjian.us> writes: > > > I think it would be good to see if we can extend the varlena data types > > to support a shorter header for storing short byte values. Looking at > > the header now we have: > > This isn't the first time we've been down that route. There were some > extensive discussions a while back. I think there were even patches. > I don't remember why it was eventually rejected. I suspect it simply got too > complex. > > But I think this is a dead-end route. What you're looking at is the number "1" > repeated for *every* record in the table. And what your proposing amounts to > noticing that the number "4" fits in a byte and doesn't need a whole word to > store it. Well sure, but you don't even need a byte if it's going to be the > same for every record in the table. > > If someone popped up on the list asking about whether Postgres compressed > their data efficiently if they stored a column that was identical throughout > the whole table you would tell them to normalize their data. I am confused. You don't want to shrink the header but instead compress duplicate values in the same row to a single entry? -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Gregory Stark <gsstark@mit.edu> writes: > Peter Eisentraut <peter_e@gmx.net> writes: >> The thing is, 100% extra space is cheap, but the processing power for >> making the need for that extra space go away is not. > That's simply untrue for most applications. Well, it's true for some and not true for others: we hear from plenty of people who seem to be more CPU-bound than IO-bound, and the former group would not like a change along this line. The trick with any space-saving change would be to not expend so many cycles as to make things a lot worse for the CPU-bound crowd. regards, tom lane
Gregory Stark wrote: > This is most obviously the case for data warehouses that are doing > lots of sequential scans of tables that don't fit in cache. In a data warehouse, you won't have many caching effects anyway. > But it's largely true for OLTP applications too. The more compact the > data the more tuples fit on a page and the greater the chance you > have the page you need in cache. But a linear amount of more RAM is still more affordable than a CPU that is 100 times faster, which is about what some of the proposed schemes would require. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Gregory Stark wrote: > I think we have to find a way to remove the varlena length header > entirely for fixed length data types since it's going to be the same > for every single record in the table. But that won't help in the example you posted upthread, because char(N) is not fixed-length. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Bruce Momjian <bruce@momjian.us> writes: > Gregory Stark wrote: > > But I think this is a dead-end route. What you're looking at is the number "1" > > repeated for *every* record in the table. And what you're proposing amounts to > > noticing that the number "4" fits in a byte and doesn't need a whole word to > > store it. Well sure, but you don't even need a byte if it's going to be the > > same for every record in the table. > > > > If someone popped up on the list asking about whether Postgres compressed > > their data efficiently if they stored a column that was identical throughout > > the whole table you would tell them to normalize their data. > > I am confused. You don't want to shrink the header but instead compress > duplicate values in the same row to a single entry? I think we have to find a way to remove the varlena length header entirely for fixed length data types since it's going to be the same for every single record in the table. It might be useful to find a way to have 1-byte or 2-byte length headers too since I suspect most legitimately variable columns like text or array[] are also going to be under 256 bytes. -- greg
Peter Eisentraut <peter_e@gmx.net> writes: > Gregory Stark wrote: > > I think we have to find a way to remove the varlena length header > > entirely for fixed length data types since it's going to be the same > > for every single record in the table. > > But that won't help in the example you posted upthread, because char(N) > is not fixed-length. Sure it is because any sane database--certainly any sane database using char(N)--is in C locale anyways. In any case if you disagree about that (and you're wrong) then substitute some other data type. Defining such a data type may be part of the problem that has to be solved here. -- greg
Peter Eisentraut <peter_e@gmx.net> writes: > Gregory Stark wrote: > > > But that won't help in the example you posted upthread, because > > > char(N) is not fixed-length. > > > > Sure it is because any sane database--certainly any sane database > > using char(N)--is in C locale anyways. > > This matter is completely independent of the choice of locale and > therefore any unilateral redefinition of sanity that you might come up > with. Except it isn't. If you're dealing with fixed length ascii codes from existing databases you interoperate with then you will have problems if you initialize your database in a non-C locale. Interpreting those codes in your locale will do incorrect things like treat them as case insensitive or ignore spaces in collation, etc. -- greg
On 2006-09-08, Gregory Stark <gsstark@MIT.EDU> wrote: >> But that won't help in the example you posted upthread, because char(N) >> is not fixed-length. > > Sure it is because any sane database--certainly any sane database using > char(N)--is in C locale anyways. You're confusing locale and charset. -- Andrew, Supernews http://www.supernews.com - individual and corporate NNTP services
Gregory Stark <gsstark@MIT.EDU> writes: > Peter Eisentraut <peter_e@gmx.net> writes: > > > Gregory Stark wrote: > > > > But that won't help in the example you posted upthread, because > > > > char(N) is not fixed-length. > > > > > > Sure it is because any sane database--certainly any sane database > > > using char(N)--is in C locale anyways. > > > > This matter is completely independent of the choice of locale and > > therefore any unilateral redefinition of sanity that you might come up > > with. > > Except it isn't. If you're dealing with fixed length ascii codes from existing > databases you interoperate with then you will have problems if you initialize > your database in a non-C locale. Interpreting those codes in your locale will > be do incorrect things like treat them as case insensitive or ignore spaces in > collation, etc. Oh, I think I misread your comment. You're saying the choice of encoding is independent of the choice of locale. Sure, if you're using UTF8 then how efficiently Postgres stores fixed length data types isn't terribly relevant to you. Just as it isn't relevant if you're storing other variable length data types. But why would you use UTF8 to encode fixed length ascii strings? -- greg
Gregory Stark wrote: > > But that won't help in the example you posted upthread, because > > char(N) is not fixed-length. > > Sure it is because any sane database--certainly any sane database > using char(N)--is in C locale anyways. This matter is completely independent of the choice of locale and therefore any unilateral redefinition of sanity that you might come up with. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Gregory Stark wrote: > But why would you use UTF8 to encode fixed length ascii strings? The encoding is set per-database. Even if you need UTF-8 to encode user-supplied strings, there can still be many small ASCII fields in the database. Country code, currency code etc. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Gregory Stark wrote: >> But why would you use UTF8 to encode fixed length ascii strings? > > The encoding is set per-database. Even if you need UTF-8 to encode > user-supplied strings, there can still be many small ASCII fields in > the database. Country code, currency code etc. > > ISTM we should revisit this when we get per-column encoding. cheers andrew
On Fri, Sep 08, 2006 at 05:54:01AM -0400, Andrew Dunstan wrote: > >The encoding is set per-database. Even if you need UTF-8 to encode > >user-supplied strings, there can still be many small ASCII fields in > >the database. Country code, currency code etc. > > ISTM we should revisit this when we get per-column encoding. I think that if SQL COLLATE gets in we'll get this almost for free. Collation and charset are both properties of strings. Once you've got a mechanism to know the collation of a string, you just attach the charset to the same place. The only difference is that changing charsets requires recoding, whereas changing collation does not. I think it'd just become a special case of the Relabel node. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote: > I think that if SQL COLLATE gets in we'll get this almost for free. > Collation and charset are both properties of strings. Once you've got a > mechanism to know the collation of a string, you just attach the > charset to the same place. The only difference is that changing charsets > requires recoding, wheres changing collation does not. Not quite. Collation is a property of the operation that you're doing. For example, if you're doing a sort, you might do it in different collation depending on the user that's doing it, or it might even be chosen by the user case-by-case. Of course, usually you have a default set per-database, per-table or per-column, but it's not a property of the actual value of a field. I think that the phrase "collation of a string" doesn't make sense. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Sep 08, 2006 at 11:58:59AM +0100, Heikki Linnakangas wrote: > Martijn van Oosterhout wrote: > >I think that if SQL COLLATE gets in we'll get this almost for free. > >Collation and charset are both properties of strings. Once you've got a > >mechanism to know the collation of a string, you just attach the > >charset to the same place. The only difference is that changing charsets > >requires recoding, wheres changing collation does not. > > Not quite. Collation is a property of the operation that you're doing. > For example, if you're doing a sort, you might do it in different > collation depending on the user that's doing it, or it might even be > chosen by the user case-by-case. Of course, usually you have a default > set per-database, per-table or per-column, but it's not a property of > the actual value of a field. I think that the phrase "collation of a > string" doesn't make sense. Sorry, you're quite right. The collation is the property of an executor node, or at least that's how I thought of it while I was working on it. By that I mean that each source (say column) has a defined value for collation and charset (the SQL defines the rules for determining collation, don't know about charset). At each point in the query you can point at the charset and collation applying to that node. However, I think my point that charset and collations could be treated via the same mechanism is still valid. If I get time I might rework the COLLATE spec I wrote to include charset stuff. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Thu, Sep 07, 2006 at 04:57:04PM -0400, Gregory Stark wrote: > Uhm, an ICU source tree is over 40 *megabytes*. That's almost as much as the > rest of Postgres itself and that doesn't even include documentation. Even if > you exclude the data and regression tests you're still talking about depending > on the portability and correctness of over 10 megabytes of new code. I don't understand this argument. No-one asked what size the LDAP libraries were when we added support for them. No-one cares that libssl/libcrypto is as large as glibc. What size the libraries are that postgresql uses is somewhat irrelevent. It's not like we're forcing people to install them. > Neither is ICU available on most platforms. In any case we only need strcoll_l > as a performance optimization, the regular interface works, it's just slow. Can you point me to a common platform where postgresql runs and ICU doesn't? http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/readme.html?rev=HEAD#HowToBuildSupported The only one I can see in the buildfarm that isn't mentioned is Unixware. > Well equal is part of collation at least in the sense you mean. What it > doesn't help with is things like tolower or regexp matching. These are the > things that I would suggest you usually want to be doing on the client because > SQL's string manipulation facilities are so poor compared to most client > languages. If I specify a collation where case and accents are ignored, then GROUP BY should ignore them too, and regexps should honour that. Moving all this to the client doesn't seem like a good move at all. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Heikki Linnakangas wrote: > have a default set per-database, per-table or per-column, but it's > not a property of the actual value of a field. I think that the > phrase "collation of a string" doesn't make sense. The real problem is that the established method of dividing up the locale categories ignores both the technological and the linguistic reality. In reality, all properties like lc_collate, lc_ctype, and lc_numeric are dependent on the property "language of the text". In general, it doesn't make sense to sort a text by Spanish rules, downcase by Turkish rules, and embed numbers using English punctuation. Of course you can do all that, but it's generally not very useful and might give inconsistent results. (For extra credit: how do you do case-insensitive sorts with inconsistent lc_collate and lc_ctype settings?) So "mathematically", you are right, the collation is a property of the operation, not of the operands. But semantically, the operands do carry the information of what collation order they would like to be compared under, and if two pieces of data with different choices meet, you need an override. Incidentally, if you buy into that, this would also neatly solve the problem of how to arrange for column-specific case conversion rules, which SQL does not address at all. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut wrote: > The real problem is that the established method of dividing up the locale > categories ignores both the technological and the linguistic reality. > In reality, all properties like lc_collate, lc_ctype, and lc_numeric > are dependent on the property "language of the text". I don't buy that. lc_collate, lc_ctype and lc_numeric are certainly related, but they're not a property of the "language of the text". For example, imagine an employee database for an international company. When a user wants to print out a sorted list of employees, the language of the text in the database (name of an employee) is irrelevant. A German user would like to see the names in a different order than an English-speaking user. I've seen this in practice. Also, see: http://www.unicode.org/unicode/reports/tr10/#Common_Misperceptions for another example. > In general, it > doesn't make sense to sort a text by Spanish rules, downcase by Turkish > rules, and embed numbers using English punctuation. Of course you can > do all that, but it's generally not very useful and might give > inconsistent results. (For extra credit: how do you do > case-insensitive sorts with inconsistent lc_collate and lc_ctype > settings?) Sure. Don't do that, that's just silly. But I don't see how that's relevant. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Sep 08, 2006 at 02:14:58PM +0200, Peter Eisentraut wrote: > So "mathematically", you are right, the collation is a property of the > operation, not of the operands. But semantically, the operands do > carry the information of what collation order they would like to be > compared under, and if two pieces of data with different choices meet, > you need an override. Sure, but SQL COLLATE handles all that just fine. At no point is the collation a property of the operands. At best it is a property of the source of the operands but can be overridden at any point. SQL also covers the case where there is ambiguity, and the writer of the query has to clarify. Collation is hard precisely because it's not a property of the operands, which makes it very difficult to make postgresql do it. > Incidentally, if you buy into that, this would also neatly solve the > problem of how to arrange for column-specific case conversion rules, > which SQL does not address at all. SQL does say that UPPER and LOWER should be handled by Unicode rules, however the notes do mention that they should probably pay attention to the collation and character set, since the results are dependent on them. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Fri, Sep 08, 2006 at 08:57:12AM +0200, Peter Eisentraut wrote: > Gregory Stark wrote: > > I think we have to find a way to remove the varlena length header > > entirely for fixed length data types since it's going to be the same > > for every single record in the table. > But that won't help in the example you posted upthread, because char(N) > is not fixed-length. It can be fixed-length, or at least, have an upper bound. If marked up to contain only ascii characters, it doesn't need more than one byte per character, at least in theory, and even if it is unicode, it's not going to need more than 4 bytes per character. char(2) through char(16) only require 4 bits to store the length header, leaving 4 bits for encoding information. bytea(2) through bytea(16), at least in theory, should require none. For my own uses, I would like for bytea(16) to have no length header. The length is constant. UUID or MD5SUM. Store the length at the head of the table, or look up the information from the schema. I see the complexity argument. Existing code is too heavy to change completely. People are talking about compromises such as allowing the on disk layout to be different from the in memory layout. I wonder whether the change could be small enough to not significantly increase CPU, while still having significant effect. I find myself doubting the CPU bound numbers. If even 20% data is saved, this means 20% more RAM for caching, 20% less pages touched when scanning, and 20% less RAM read. When people say CPU-bound, are we sure they do not mean RAM speed bound? How do they tell the difference between the two? RAM lookups count as CPU on most performance counters I've ever used. RAM speed is also slower than CPU speed, allowing for calculations between accesses assuming that the loop allows for prefetching to be possible and accurate. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com http://mark.mielke.cc/
On Fri, Sep 08, 2006 at 08:50:57AM +0200, Peter Eisentraut wrote: > Gregory Stark wrote: > > But it's largely true for OLTP applications too. The more compact the > > data the more tuples fit on a page and the greater the chance you > > have the page you need in cache. > But a linear amount of more RAM is still more affordable than a CPU that > is 100 times faster, which is about what some of the proposed schemes > would require. 100 times faster? I don't think it has been proven that a change in how data is stored would result in an increase in CPU usage. It's an assumption. It might be correct. It might not. I guess this is where patches speak louder than words... :-) Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com http://mark.mielke.cc/
On Fri, Sep 08, 2006 at 09:28:21AM -0400, mark@mark.mielke.cc wrote: > > But that won't help in the example you posted upthread, because char(N) > > is not fixed-length. > > It can be fixed-length, or at least, have an upper bound. If marked > up to contain only ascii characters, it doesn't need more than one byte > per character, at least in theory, and even if it is unicode, it's not > going to need more than 4 bytes per character. char(2) through char(16) > only require 4 bits to store the length header, leaving 4 bits for > encoding information. bytea(2) through bytea(16), at least in theory, > should require none. If you're talking about an upper bound, then it's not fixed length anymore, and you need to expend bytes storing the length. ASCII bytes only take one byte in most encodings, including UTF8. Doodling this morning I remember why the simple approach didn't work. If you look at the varlena header, 2 bits are reserved. Say you take one bit to indicate "short header". Then lengths 0-31 bytes can be represented with a one byte header, yay! However, now you only have enough bits leftover to store 29 bits for the length, so we've just cut the maximum datum size from 1GB to 512MB. Is that a fair trade? Probably not, so you'd need a more sophisticated scheme. > For my own uses, I would like for bytea(16) to have no length header. > The length is constant. UUID or MD5SUM. Store the length at the head > of the table, or look up the information from the schema. I'm still missing the argument of why you can't just make a 16-byte type. Around half the datatypes in postgresql are fixed-length and have no header. I'm completely confused about why people are hung up about bytea(16) not being fixed length when it's trivial to create a type that is. > I see the complexity argument. Existing code is too heavy to change > completely. People are talking about compromises such as allowing the > on disk layout to be different from the in memory layout. The biggest cost of having differing memory and disk layouts is that you have to "unpack" each disk page as it's read in. This means an automatic doubling of memory usage for the buffer cache. If you're RAM limited, that's the last thing you want. Currently, the executor will use the contents of the actual disk page when possible, saving a lot of copying. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
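To make the "just make a 16-byte type" suggestion concrete, here is a sketch of such a user-defined type (all names are invented, only the input function is shown, and hex validation is abbreviated). Because the type is registered with a fixed internal length, each value is stored as exactly 16 bytes with no varlena header at all.

    #include "postgres.h"
    #include "fmgr.h"

    /*
     * Hypothetical fixed-width type.  The SQL side would register it with
     * something like:
     *   CREATE TYPE fixed16 ( internallength = 16,
     *                         input = fixed16_in, output = fixed16_out );
     */
    typedef struct
    {
        unsigned char bytes[16];        /* e.g. a UUID or an MD5 digest */
    } fixed16;

    PG_FUNCTION_INFO_V1(fixed16_in);

    Datum
    fixed16_in(PG_FUNCTION_ARGS)
    {
        char    *str    = PG_GETARG_CSTRING(0);   /* expect 32 hex digits */
        fixed16 *result = (fixed16 *) palloc(sizeof(fixed16));
        int      i;

        if (strlen(str) != 32)
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                     errmsg("invalid input for type fixed16: \"%s\"", str)));

        for (i = 0; i < 16; i++)
        {
            unsigned int byte;

            sscanf(str + 2 * i, "%2x", &byte);    /* hex checking abbreviated */
            result->bytes[i] = (unsigned char) byte;
        }
        PG_RETURN_POINTER(result);
    }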
Martijn van Oosterhout <kleptog@svana.org> writes: > On Thu, Sep 07, 2006 at 04:57:04PM -0400, Gregory Stark wrote: >> Uhm, an ICU source tree is over 40 *megabytes*. > I don't understand this argument. No-one asked what size the LDAP > libraries were when we added support for them. No-one cares that > libssl/libcrypto is as large as glibc. The reason this is a relevant consideration: we are talking about changes that would remove existing functionality for people who don't have that library. People who don't have LDAP don't care that the PG sources have some LDAP functionality they're not getting, people who don't have SSL evidently don't care about that, etc. But there is existing, portable locale and multi-charset support in PG, and even though it's rather limited it's still useful. So you're telling people "to maintain the same functionality you have today, you will have to add this rather large library". That is only zero-cost from the perspective of someone who already has ICU installed; from everyone else, you should expect pushback. I suppose it might be possible to do

#ifdef HAVE_ICU
... new code ...
#else
... existing code ...
#endif

but given the differences in API I can't believe this would be readable or maintainable. Another problem is that AFAICT, depending on ICU would force us to standardize on Unicode as the *only* server internal encoding; what's more, the docs suggest that it doesn't support anything wider than UTF16. From the point of view of some of our far eastern users, both of those are serious steps backward. "Add large library, get *less* functionality" is an even harder sell. regards, tom lane
Martijn van Oosterhout <kleptog@svana.org> writes: > I'm still missing the argument of why you can't just make a 16-byte > type. Around half the datatypes in postgresql are fixed-length and have > no header. I'm completely confused about why people are hung up about > bytea(16) not being fixed length when it's trivial to create a type > that is. Because by the time you have a CHAR(1), CHAR(2), CHAR(4), and CHAR(8) your head is already swimming trying to keep track of all the casts and cross-data-type comparators and you haven't even covered all the cases. If you define types just for the lengths up to 128 you would have 16,384 casts and 114,688 different cross-data-type comparisons just between them. Without them you wouldn't be able to have things like

phone_number char(10)
area_code char(3)

and do things like:

WHERE phone_number LIKE area_code||'%'

And before you say so, sure this isn't the only way to do this and there are reasons why this may not be the best. But if you were shipping separate data types for char(3) and char(10) I think it would be a bug if the above didn't work. The problem is worse with numeric in that it would definitely be a bug if you couldn't use an index when comparing two numeric columns just because one had less precision than the other. There wouldn't be nearly as many types but even with just three such types you're already talking about hundreds of cross-data-type comparisons. Would others really consider shipping hundreds of new types to take care of this problem? I was looking for a more general solution. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
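As an aside, one accounting that reproduces those numbers is to count every ordered pair of the 128 length variants (a type paired with itself included) and assume seven routines per pair: the six comparison operators plus a btree support function. The factor of seven is an inference, not something stated in the thread.

    #include <stdio.h>

    int
    main(void)
    {
        int types = 128;    /* char(1) .. char(128) as distinct types           */
        int ops   = 7;      /* <, <=, =, >=, >, <> plus a btree comparison proc */

        printf("casts:       %d\n", types * types);         /* 16384  */
        printf("comparisons: %d\n", types * types * ops);   /* 114688 */
        return 0;
    }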
On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote: > The reason this is a relevant consideration: we are talking about > changes that would remove existing functionality for people who don't > have that library. Huh? If you don't select ICU at compile time you get no difference from what we have now. I'm not sure I'm seeing your point. My COLLATE patches did allow both to coexist, but no-one appeared to like that idea either. > I suppose it might be possible to do > #ifdef HAVE_ICU > ... new code ... > #else > ... existing code ... > #endif > but given the differences in API I can't believe this would be readable > or maintainable. That's what the patch does. And the api differences are marginal. They even have C compatibility functions to make it easier. > Another problem is that AFAICT, depending on ICU would force us to > standardize on Unicode as the *only* server internal encoding; Huh? You can use whatever encoding you like... Actual collations are determined on the basis of unicode properties, but I don't think that is what you're referring to. > what's more, the docs suggest that it doesn't support anything wider > than UTF16. Well, that's not true, which part of the docs were you looking at? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote: >> what's more, the docs suggest that it doesn't support anything wider >> than UTF16. > Well, that's not true, which part of the docs were you looking at? AFAICT, most of the useful operations work on UChar, which is uint16: http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b305324ef288165e2ac regards, tom lane
On Fri, Sep 08, 2006 at 12:19:19PM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > On Fri, Sep 08, 2006 at 10:35:58AM -0400, Tom Lane wrote: > >> what's more, the docs suggest that it doesn't support anything wider > >> than UTF16. > > > Well, that's not true, which part of the docs were you looking at? > > AFAICT, most of the useful operations work on UChar, which is uint16: > http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b305324ef288165e2ac Oh, you're confusing UCS-2 with UTF-16, UCS-2 is a subset of UTF-16 that only handles the basic plane. Just like no-one is surprised that UTF-8 handles more than 256 characters, it shouldn't surprise you that UTF-16 handles more than 65536. ICU hasn't used UCS-2 since 1996. It's in the FAQ: http://icu.sourceforge.net/userguide/icufaq.html Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
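A quick illustration of why UTF-16 is not capped at 65536 characters: any code point above U+FFFF is stored as a surrogate pair, two 16-bit units derived like this (U+10400 is just an arbitrary supplementary-plane example):

    #include <stdio.h>

    int
    main(void)
    {
        unsigned int cp   = 0x10400;              /* supplementary code point */
        unsigned int v    = cp - 0x10000;         /* 20-bit value: 0x00400    */
        unsigned int high = 0xD800 + (v >> 10);   /* high surrogate: 0xD801   */
        unsigned int low  = 0xDC00 + (v & 0x3FF); /* low surrogate:  0xDC00   */

        printf("U+%04X -> %04X %04X\n", cp, high, low);  /* U+10400 -> D801 DC00 */
        return 0;
    }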
Martijn van Oosterhout <kleptog@svana.org> writes: >> AFAICT, most of the useful operations work on UChar, which is uint16: >> http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b305324ef288165e2ac > Oh, you're confusing UCS-2 with UTF-16, Ah, you're right, I did misunderstand that. However, it's still apparently the case that ICU works mostly with UTF16 and handles other encodings only via conversion to UTF16. That's a pretty serious mismatch with our needs --- we'll end up converting to UTF16 all the time. We're certainly not going to change to using UTF16 as the actual native string representation inside the backend, both because of the space penalty and incompatibility with tools like bison. regards, tom lane
On Fri, Sep 08, 2006 at 12:57:29PM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > >> AFAICT, most of the useful operations work on UChar, which is uint16: > >> http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b305324ef288165e2ac > > Oh, you're confusing UCS-2 with UTF-16, > Ah, you're right, I did misunderstand that. However, it's still > apparently the case that ICU works mostly with UTF16 and handles other > encodings only via conversion to UTF16. That's a pretty serious > mismatch with our needs --- we'll end up converting to UTF16 all the > time. We're certainly not going to change to using UTF16 as the actual > native string representation inside the backend, both because of the > space penalty and incompatibility with tools like bison. I think I've been involved in a discussion like this in the past. Was it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding means that UTF-8 applications are at a disadvantage when using the library. UTF-16 is considered more efficient to work with for everybody except ASCII users. :-) No opinion on the matter though. Changing PostgreSQL to UTF-16 would be an undertaking... :-) Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com http://mark.mielke.cc/
On Fri, Sep 08, 2006 at 12:57:29PM -0400, Tom Lane wrote: > Ah, you're right, I did misunderstand that. However, it's still > apparently the case that ICU works mostly with UTF16 and handles other > encodings only via conversion to UTF16. That's a pretty serious > mismatch with our needs --- we'll end up converting to UTF16 all the > time. We're certainly not going to change to using UTF16 as the actual > native string representation inside the backend, both because of the > space penalty and incompatibility with tools like bison. No need to do anything like that. We'd probably use the u_strCompareIter() interface, where the two strings are defined as iterators. We set up the iterator to understand whatever charset postgres is currently running. Many of the other functions have iterator versions also, so you can avoid UTF-16 entirely if you like. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
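For concreteness, a minimal sketch of the iterator-based comparison Martijn describes, assuming ICU's C API (uiter_setUTF8() from <unicode/uiter.h> and u_strCompareIter() from <unicode/ustring.h>); collation, error handling, and the PostgreSQL glue are deliberately omitted:

    #include <unicode/uiter.h>
    #include <unicode/ustring.h>

    /*
     * Compare two UTF-8 strings in code point order using UCharIterators,
     * without materializing UTF-16 copies first.  Sketch only: a real
     * locale-aware comparison would go through a UCollator instead.
     */
    static int32_t
    utf8_compare(const char *a, int32_t a_len, const char *b, int32_t b_len)
    {
        UCharIterator ia, ib;

        uiter_setUTF8(&ia, a, a_len);
        uiter_setUTF8(&ib, b, b_len);

        return u_strCompareIter(&ia, &ib, 1);   /* nonzero = code point order */
    }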
mark@mark.mielke.cc wrote: > I think I've been involved in a discussion like this in the past. Was > it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding > means that UTF-8 applications are at a disadvantage when using the > library. UTF-16 is considered more efficient to work with for everybody > except ASCII users. :-) Uh, is it? By whom? And why? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Gregory Stark wrote: > Bruce Momjian <bruce@momjian.us> writes: > > > Gregory Stark wrote: > > > But I think this is a dead-end route. What you're looking at is the number "1" > > > repeated for *every* record in the table. And what you're proposing amounts to > > > noticing that the number "4" fits in a byte and doesn't need a whole word to > > > store it. Well sure, but you don't even need a byte if it's going to be the > > > same for every record in the table. > > > > > > If someone popped up on the list asking about whether Postgres compressed > > > their data efficiently if they stored a column that was identical throughout > > > the whole table you would tell them to normalize their data. > > > > I am confused. You don't want to shrink the header but instead compress > > duplicate values in the same row to a single entry? > > I think we have to find a way to remove the varlena length header entirely for > fixed length data types since it's going to be the same for every single > record in the table. What fixed-length data type has a header? > It might be useful to find a way to have 1-byte or 2-byte length headers too > since I suspect most legitimately variable columns like text or array[] are > also going to be under 256 bytes. I think the point you are making is that fixed length fields, like GUID, don't need a header, while short fields like VARCHAR() and NUMERIC() need some shorter header. No one has mentioned that we pad values on disk to match the CPU alignment. This is done for efficiency, but is not strictly required. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > No one has mentioned that we pad values on disk to match the CPU > alignment. This is done for efficiency, but is not strictly required. Well, it is unless you are willing to give up support of non-Intel CPUs; most other popular chips are strict about alignment, and will fail an attempt to do a nonaligned fetch. The only way we could pack stuff without alignment is to go over to the idea that memory and disk representations are different --- where in this case the "conversion" might just be a memcpy to a known-aligned location. The performance costs of that seem pretty daunting, however, especially when you reflect that simply stepping over a varlena field would require memcpy'ing its length word to someplace. regards, tom lane
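For illustration, a minimal sketch of the memcpy-based fetch Tom mentions (the function name and the assumption that the 4-byte length word counts itself are illustrative, not actual backend code):

    #include <stdint.h>
    #include <string.h>

    /*
     * Step over a varlena field whose length word may not be aligned:
     * copy the length into an aligned local first, which is safe even
     * on strict-alignment CPUs.
     */
    static const char *
    skip_varlena_unaligned(const char *ptr)
    {
        uint32_t len;

        memcpy(&len, ptr, sizeof(len));
        return ptr + len;       /* assumes the length word includes itself */
    }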
Bruce Momjian wrote: > > No one has mentioned that we pad values on disk to match the CPU > alignment. This is done for efficiency, but is not strictly required. > > From time to time the idea of a logical vs physical mapping for columns has been mentioned. Among other benefits, that might allow us to do some rearrangement of physical ordering to reduce space wasted on alignment in some cases. There might be a small amount of additional computation required, but I suspect it would be lost in the noise, and swamped by any increased efficiency we got from putting more tuples in a page. cheers andrew
Martijn van Oosterhout wrote: > On Fri, Sep 08, 2006 at 09:28:21AM -0400, mark@mark.mielke.cc wrote: > > > But that won't help in the example you posted upthread, because char(N) > > > is not fixed-length. > > > > It can be fixed-length, or at least, have an upper bound. If marked > > up to contain only ascii characters, it doesn't, at least in theory, > > and even if it is unicode, it's not going to need more than 4 bytes > > per character. char(2) through char(16) only require 4 bits to > > store the length header, leaving 4 bits for encoding information. > > bytea(2) through bytea(16), at least in theory, should require none. > > If you're talking about an upper-bound, then it's not fixed length > anymore, and you need to expend bytes storing the length. ASCII bytes > only take one byte in most encodings, including UTF8. > > Doodling this morning I remember why the simple approach didn't work. > If you look at the varlena header, 2 bits are reserved. Say you take > one bit to indicate "short header". Then lengths 0-31 bytes can be > represented with a one byte header, yay! > > However, now you only have enough bits leftover to store 29 bits for > the length, so we've just cut the maximum datum size from 1GB to 512MB. > Is that a fair trade? Probably not, so you'd need a more sophisticated > scheme. I was hoping we could have both bits true mean short header, but that is also used by our system to indicate compressed and TOAST usage. For testing, I would just grab a bit and see how things go. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > No one has mentioned that we pad values on disk to match the CPU > > alignment. This is done for efficiency, but is not strictly required. > > Well, it is unless you are willing to give up support of non-Intel CPUs; > most other popular chips are strict about alignment, and will fail an > attempt to do a nonaligned fetch. > > The only way we could pack stuff without alignment is to go over to the > idea that memory and disk representations are different --- where in > this case the "conversion" might just be a memcpy to a known-aligned > location. The performance costs of that seem pretty daunting, however, > especially when you reflect that simply stepping over a varlena field > would require memcpy'ing its length word to someplace. Agreed, but I thought I would point it out. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Fri, Sep 08, 2006 at 02:39:03PM -0400, Alvaro Herrera wrote: > mark@mark.mielke.cc wrote: > > I think I've been involved in a discussion like this in the past. Was > > it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding > > means that UTF-8 applications are at a disadvantage when using the > > library. UTF-16 is considered more efficient to work with for everybody > > except ASCII users. :-) > Uh, is it? By whom? And why? The authors of the library in question? Java? Anybody whose primary alphabet isn't LATIN1 based? :-) Only ASCII values store more space efficiently in UTF-8. All values over 127 store more space efficiently using UTF-16. UTF-16 is easier to process. UTF-8 requires too many bit checks with single character offsets. I'm not an expert - I had this question before a year or two ago, and read up on the ideas of experts. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com http://mark.mielke.cc/
mark@mark.mielke.cc wrote: > On Fri, Sep 08, 2006 at 02:39:03PM -0400, Alvaro Herrera wrote: > > mark@mark.mielke.cc wrote: > > > I think I've been involved in a discussion like this in the past. Was > > > it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding > > > means that UTF-8 applications are at a disadvantage when using the > > > library. UTF-16 is considered more efficient to work with for everybody > > > except ASCII users. :-) > > Uh, is it? By whom? And why? > > The authors of the library in question? Java? Anybody whose primary > alphabet isn't LATIN1 based? :-) Well, for Latin-9 alphabets, Latin-9 is still more space-efficient than UTF-8. That covers a lot of the world. Forcing those people to change to UTF-16 does not strike me as a very good idea. But Martijn already clarified that ICU does not actually force you to switch everything to UTF-16, so this is not an issue anyway. > Only ASCII values store more space efficiently in UTF-8. All values > over 127 store more space efficiently using UTF-16. UTF-16 is easier > to process. UTF-8 requires too many bit checks with single character > offsets. I'm not an expert - I had this question before a year or two > ago, and read up on the ideas of experts. Well, I was not asking about "UTF-8 vs UTF-16," but rather "anything vs. UTF-16". I don't much like UTF-8 myself, but that's not a very informed opinion, just like a feeling of "fly-killing-cannon" (when it's used to store Latin-9-fitting text). -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Sep 08, 2006 at 04:42:09PM -0400, Alvaro Herrera wrote: > mark@mark.mielke.cc wrote: > > The authors of the library in question? Java? Anybody whose primary > > alphabet isn't LATIN1 based? :-) > Well, for Latin-9 alphabets, Latin-9 is still more space-efficient than > UTF-8. That covers a lot of the world. Forcing those people to change > to UTF-16 does not strike me as a very good idea. Ah. Thought you were talking UTF-8 vs UTF-16. > But Martijn already clarified that ICU does not actually force you to > switch everything to UTF-16, so this is not an issue anyway. If my memory is correct, it does this by converting it to UTF-16 first. This is a performance disadvantage (although it may not be worse than PostgreSQL's current implementation :-) ). > > Only ASCII values store more space efficiently in UTF-8. All values > > over 127 store more space efficiently using UTF-16. UTF-16 is easier > > to process. UTF-8 requires too many bit checks with single character > > offsets. I'm not an expert - I had this question before a year or two > > ago, and read up on the ideas of experts. > Well, I was not asking about "UTF-8 vs UTF-16," but rather "anything vs. > UTF-16". I don't much like UTF-8 myself, but that's not a very informed > opinion, just like a feeling of "fly-killing-cannon" (when it's used to > store Latin-9-fitting text). *nod* Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com http://mark.mielke.cc/
mark@mark.mielke.cc wrote: > On Fri, Sep 08, 2006 at 04:42:09PM -0400, Alvaro Herrera wrote: > > But Martijn already clarified that ICU does not actually force you to > > switch everything to UTF-16, so this is not an issue anyway. > > If my memory is correct, it does this by converting it to UTF-16 first. > This is a performance disadvantage (although it may not be worse than > PostgreSQL's current implementation :-) ). Actually he muttered something about iterators, and not needing to convert anything. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
mark@mark.mielke.cc wrote: > Only ASCII values store more space efficiently in UTF-8. All values > over 127 store more space efficiently using UTF-16. > > This second statement is demonstrably not true. Only values above 0x07ff require more than 2 bytes in UTF-8. All chars up to that point are stored in UTF-8 with greater or equal efficiency than that of UTF-16. See http://www.zvon.org/tmRFC/RFC2279/Output/chapter2.html cheers andrew
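To make the byte-count comparison concrete, here is a small sketch of the UTF-8 length rule being cited (the function name is illustrative):

    /*
     * Bytes needed to encode a Unicode code point in UTF-8.  Everything
     * up to U+07FF (Latin extensions, Greek, Cyrillic, Hebrew, Arabic)
     * fits in 1-2 bytes, matching UTF-16; only code points above that
     * need 3 bytes, and the supplementary planes need 4 in both encodings.
     */
    static int
    utf8_encoded_length(unsigned int cp)
    {
        if (cp < 0x80)      return 1;
        if (cp < 0x800)     return 2;
        if (cp < 0x10000)   return 3;
        return 4;
    }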
On Fri, Sep 08, 2006 at 04:49:49PM -0400, Alvaro Herrera wrote: > Actually he muttered something about iterators, and not needing to > convert anything. Yes, many of the useful functions accept strings in two forms, either UTF-16 or CharacterIterators. The iterator pretty much only has to know how to step forward through the string and return the code point at each point. Here are the docs for the C++ class, but there's an equivalent C interface. http://icu.sourceforge.net/apiref/icu4c/classCharacterIterator.html#_details Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
On Fri, Sep 08, 2006 at 04:49:22PM -0400, Andrew Dunstan wrote: > mark@mark.mielke.cc wrote: > >Only ASCII values store more space efficiently in UTF-8. All values > >over 127 store more space efficiently using UTF-16. > This second statement is demonstrably not true. Only values above 0x07ff > require more than 2 bytes in UTF-8. All chars up to that point are > stored in UTF-8 with greater or equal efficiency than that of UTF-16. > See http://www.zvon.org/tmRFC/RFC2279/Output/chapter2.html You are correct - I should have said "All values over 127 store at least as space efficiently using UTF-16 as UTF-8." From the ICU page: "Most of the time, the memory throughput of the hard drive and RAM is the main performance constraint. UTF-8 is 50% smaller than UTF-16 for US-ASCII, but UTF-8 is 50% larger than UTF-16 for East and South Asian scripts. There is no memory difference for Latin extensions, Greek, Cyrillic, Hebrew, and Arabic. For processing Unicode data, UTF-16 is much easier to handle. You get a choice between either one or two units per character, not a choice among four lengths. UTF-16 also does not have illegal 16-bit unit values, while you might want to check for illegal bytes in UTF-8. Incomplete character sequences in UTF-16 are less important and more benign. If you want to quickly convert small strings between the different UTF encodings or get a UChar32 value, you can use the macros provided in utf.h and ..." I didn't think of the iterators for simple uses. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com http://mark.mielke.cc/
Tom Lane <tgl@sss.pgh.pa.us> writes: > The performance costs of that seem pretty daunting, however, especially when > you reflect that simply stepping over a varlena field would require > memcpy'ing its length word to someplace. I think if you give up on disk and in-memory representations being the same then there are ways of finessing that. For example you could have all the lengths together in the header prior to the variable length fields. In a separate unrelated thought, if we bring back the idea of having logical and physical field orders be distinct then we could also have the initial table creation sort the fields to minimize padding. It won't always be perfect but sometimes it could help quite a bit. It also wouldn't help much if you start altering the table afterward but even then the next time you pg_dump and reload you'll get a more efficient layout. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > > The performance costs of that seem pretty daunting, however, especially when > > you reflect that simply stepping over a varlena field would require > > memcpy'ing its length word to someplace. > > I think if you give up on disk and in-memory representations being the same > then there are ways of finessing that. For example you could have all the > lengths together in the header prior to the variable length fields. Hm, this might have nice cache effects when reading in a tuple too. Since all the lengths would likely fit in a single cache line, probably even the same cache line as the null bitmap, it means you can find all the offsets without actually having to bring the rest of the tuple into the processor. I don't think that alone would be enough to outweigh the costs of having to convert to an in-memory representation though. Even if that was still just a pointer to the buffer memory in the simple case. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: >> No one has mentioned that we pad values on disk to match the CPU >> alignment. This is done for efficiency, but is not strictly required. > > Well, it is unless you are willing to give up support of non-Intel CPUs; > most other popular chips are strict about alignment, and will fail an > attempt to do a nonaligned fetch. Intel CPUs are detectable at compile time, right? Do we use less padding in the layout for tables on Intel-based servers? If not, could we? I would be particularly interested in the creation of a 24-bit integer if it could pack into only three bytes. (If the layout forces an extra byte of padding per integer, the advantage is lost.) For argument's sake, if I created a contrib extension called "int3" which stored 24-bit integers, in the int3.source file I could write: CREATE TYPE int3 ( internallength = 3, input = int3_in, output = int3_out, alignment = <ALIGNMENT> ); And then have sed replace <ALIGNMENT> with either "char" or "int4" depending on the architecture. Is there a reason this wouldn't work? For the example schema which started this thread, a contrib extension for ascii fields could be written, with types like ascii1, ascii2, ascii3, and ascii4, each with implicit upcasts to text. A contrib for int1 and uint1 could be written to store single byte integers in a single byte, performing math on them correctly, etc. mark > The only way we could pack stuff without alignment is to go over to the > idea that memory and disk representations are different --- where in > this case the "conversion" might just be a memcpy to a known-aligned > location. The performance costs of that seem pretty daunting, however, > especially when you reflect that simply stepping over a varlena field > would require memcpy'ing its length word to someplace. > > regards, tom lane
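As a companion to that CREATE TYPE line, a rough sketch of what the input/output pair might look like (int3_in/int3_out are just the names used in the example above, not an existing contrib module; the little-endian packing and range check are illustrative assumptions):

    #include "postgres.h"
    #include "fmgr.h"

    /* 3-byte signed integer, stored little-endian; internallength = 3 */
    typedef struct
    {
        unsigned char b[3];
    } int3;

    PG_FUNCTION_INFO_V1(int3_in);
    Datum
    int3_in(PG_FUNCTION_ARGS)
    {
        char   *str = PG_GETARG_CSTRING(0);
        long    val = strtol(str, NULL, 10);
        int3   *result;

        if (val < -8388608L || val > 8388607L)
            ereport(ERROR,
                    (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                     errmsg("value \"%s\" is out of range for type int3", str)));

        result = (int3 *) palloc(sizeof(int3));
        result->b[0] = val & 0xFF;
        result->b[1] = (val >> 8) & 0xFF;
        result->b[2] = (val >> 16) & 0xFF;
        PG_RETURN_POINTER(result);
    }

    PG_FUNCTION_INFO_V1(int3_out);
    Datum
    int3_out(PG_FUNCTION_ARGS)
    {
        int3   *v = (int3 *) PG_GETARG_POINTER(0);
        int32   x = v->b[0] | (v->b[1] << 8) | (v->b[2] << 16);
        char   *result = palloc(12);

        if (x & 0x800000)
            x -= 0x1000000;         /* sign-extend the 24-bit value */
        snprintf(result, 12, "%d", x);
        PG_RETURN_CSTRING(result);
    }

This only covers the in/out pair named by the CREATE TYPE; the operators, comparison functions, and casts to the built-in integer types discussed later in the thread would still have to be added separately.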
On Sun, Sep 10, 2006 at 11:55:35AM -0700, Mark Dilger wrote: > >Well, it is unless you are willing to give up support of non-Intel CPUs; > >most other popular chips are strict about alignment, and will fail an > >attempt to do a nonaligned fetch. > > Intel CPUs are detectable at compile time, right? Do we use less > padding in the layout for tables on Intel-based servers? If not, could we? Intel CPUs may not complain about unaligned reads, but they're still inefficient. Internally it does two aligned reads and rearranges the bytes. On other architectures the OS can emulate that but postgres doesn't use that for obvious reasons. > For the example schema which started this thread, a contrib extension > for ascii fields could be written, with types like ascii1, ascii2, > ascii3, and ascii4, each with implicit upcasts to text. A contrib for > int1 and uint1 could be written to store single byte integers in a > single byte, performing math on them correctly, etc. The problem is that for each of those ascii types, to actually use them they would have to be converted, which would amount to allocating some memory, copying and adding a length header. At some point you have to wonder whether you're actually saving anything. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote: > On Sun, Sep 10, 2006 at 11:55:35AM -0700, Mark Dilger wrote: >>> Well, it is unless you are willing to give up support of non-Intel CPUs; >>> most other popular chips are strict about alignment, and will fail an >>> attempt to do a nonaligned fetch. >> Intel CPUs are detectable at compile time, right? Do we use less >> padding in the layout for tables on Intel-based servers? If not, could we? > > Intel CPUs may not complain about unaligned reads, but they're still > inefficient. Internally it does two aligned reads and rearranges the > bytes. On other architectures the OS can emulate that but postgres > doesn't use that for obvious reasons. This gets back to the CPU vs. I/O bound issue, right? Might not some people (with heavily taxed disks but lightly taxed CPU) prefer that trade-off? >> For the example schema which started this thread, a contrib extension >> for ascii fields could be written, with types like ascii1, ascii2, >> ascii3, and ascii4, each with implicit upcasts to text. A contrib for >> int1 and uint1 could be written to store single byte integers in a >> single byte, performing math on them correctly, etc. > > The problem is that for each of those ascii types, to actually use them > they would have to be converted, which would amount to allocating some > memory, copying and adding a length header. At some point you have to > wonder whether you're actually saving anything. > > Have a nice day, I'm not sure what you mean by "actually use them". The types could have their own comparator operators. So you could use them for sorting and indexing, and use them in WHERE clauses with these comparisons without any conversion to/from text. I mentioned implicit upcasts to text merely to handle other cases, such as using them in a LIKE or ILIKE, or concatenation, etc., where the work of providing this functionality for each contrib datatype would not really be justified. I'm not personally as interested in the aforementioned ascii types as I am in the int1 and int3 types, but the argument in favor of each is about the same. If a person has a large table made of small data, it seems really nuts to have 150% - 400% bloat on that table, when such a small amount of work is needed to write the contrib datatypes necessary to store the data compactly. The argument made upthread that a quadratic number of conversion operators is necessitated doesn't seem right to me, given that each type could upcast to the canonical built in type. (int1 => smallint, int3 => integer, ascii1 => text, ascii2 => text, ascii3 => text, etc.) Operations on data of differing type can be done in the canonical type, but the common case for many users would be operations between data of the same type, for which no conversion is required. Am I missing something that would prevent this approach from working? I am seriously considering writing these contrib datatypes for use either on pgFoundry or the contrib/ subdirectory for the 8.3 release, but am looking for advice if I am really off-base. Thanks, mark
Mark Dilger <pgsql@markdilger.com> writes: > ... The argument made upthread that a > quadratic number of conversion operators is necessitated doesn't seem > right to me, given that each type could upcast to the canonical built in > type. (int1 => smallint, int3 => integer, ascii1 => text, ascii2 => > text, ascii3 => text, etc.) This would work all right for the string-category cases, since TEXT is the only thing you really care about having them cast to anyway. It probably won't work all that well for int1/int3, because you really want them to coerce implicitly to all the "wider" numeric types. Otherwise, perfectly sane queries like "int8 + int1" fail. Part of the issue here is that we deliberately keep the parser from searching for multi-step coercions. So for example if you only provide int1->int2 then the existence of up-casts from int2 doesn't help you use an int1 with anything except int2. I am not sure whether any problems would be created if you did provide the full spectrum of up-casts. I remember having argued that there would be problems with trying to invent uint2/uint4 types, but that was a very long time ago, before we had pg_cast and some other changes in the type resolution rules. With the current system it might work OK. regards, tom lane
Added to TODO:

* Consider ways of storing rows more compactly on disk
  o Store disk pages with no alignment/padding?
  o Reorder physical storage order to reduce padding?
  o Support a smaller header for short variable-length fields?
  o Reduce the row header size?

--------------------------------------------------------------------------- Gregory Stark wrote: > Bruce Momjian <bruce@momjian.us> writes: > > > I think it would be good to see if we can extend the varlena data types > > to support a shorter header for storing short byte values. Looking at > > the header now we have: > > This isn't the first time we've been down that route. There were some > extensive discussions a while back. I think there were even patches. > I don't remember why it was eventually rejected. I suspect it simply got too > complex. > > But I think this is a dead-end route. What you're looking at is the number "1" > repeated for *every* record in the table. And what you're proposing amounts to > noticing that the number "4" fits in a byte and doesn't need a whole word to > store it. Well sure, but you don't even need a byte if it's going to be the > same for every record in the table. > > If someone popped up on the list asking about whether Postgres compressed > their data efficiently if they stored a column that was identical throughout > the whole table you would tell them to normalize their data. > > -- > greg -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Tom Lane wrote: > Mark Dilger <pgsql@markdilger.com> writes: >> ... The argument made upthread that a >> quadratic number of conversion operators is necessitated doesn't seem >> right to me, given that each type could upcast to the canonical built in >> type. (int1 => smallint, int3 => integer, ascii1 => text, ascii2 => >> text, ascii3 => text, etc.) > > This would work all right for the string-category cases, since TEXT is > the only thing you really care about having them cast to anyway. > It probably won't work all that well for int1/int3, because you really > want them to coerce implicitly to all the "wider" numeric types. > Otherwise, perfectly sane queries like "int8 + int1" fail. > > Part of the issue here is that we deliberately keep the parser from > searching for multi-step coercions. So for example if you only provide > int1->int2 then the existence of up-casts from int2 doesn't help you > use an int1 with anything except int2. > > I am not sure whether any problems would be created if you did provide > the full spectrum of up-casts. I remember having argued that there > would be problems with trying to invent uint2/uint4 types, but that was > a very long time ago, before we had pg_cast and some other changes in > the type resolution rules. With the current system it might work OK. > > regards, tom lane Thanks Tom, I will try this then. I won't be proposing to ever put this in core, as the increased code size isn't justified for people who aren't using these types (IMHO). Any further feedback on why this wouldn't work is appreciated, as it might save me some time learning on my own. But otherwise I'll post back in a few days when this is finished. mark
Bruce Momjian <bruce@momjian.us> writes: > * Consider ways of storing rows more compactly on disk > o Support a smaller header for short variable-length fields? With respect to the business of having different on-disk and in-memory representations, we have that already today: see TOAST. It strikes me that it might be useful to think about solving the problem with a "second generation toast mechanism". The first generation only worried about storing large values, but the second generation would also address the problem of storing small values efficiently. Or you could think about it as a "second generation varlena". This mindset would make for a slightly different set of choices about where the work gets done. I'm not sure which is better. Either way, I think it would be interesting to consider (a) length word either one or two bytes, not four. You can't need more than 2 bytes for a datum that fits in a disk page ... (b) alignment either one or two bytes, not four. TEXT would be perfectly happy with 1-byte alignment, but for NUMERIC we might want 2. I'm inclined to bag the idea of storing the length words separately from the data proper. Although it probably would make for some marginal gain in cache efficiency, I don't see any reasonable way at all to fit it into the current system structure, whereas either the "toast" or "next gen varlena" approaches seem fairly straightforward. And having to track an additional pointer inside the inner loops of heap_form_tuple and heap_deform_tuple could eat up any performance gain anyway. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > * Consider ways of storing rows more compactly on disk > > o Support a smaller header for short variable-length fields? > > With respect to the business of having different on-disk and in-memory > representations, we have that already today: see TOAST. It strikes me > that it might be useful to think about solving the problem with a > "second generation toast mechanism". The first generation only worried > about storing large values, but the second generation would also address > the problem of storing small values efficiently. > > Or you could think about it as a "second generation varlena". This > mindset would make for a slightly different set of choices about where > the work gets done. I'm not sure which is better. > > Either way, I think it would be interesting to consider > > (a) length word either one or two bytes, not four. You can't need more > than 2 bytes for a datum that fits in a disk page ... That is an interesting observation, though could compressed inline values exceed two bytes? > (b) alignment either one or two bytes, not four. TEXT would be > perfectly happy with 1-byte alignment, but for NUMERIC we might want 2. > > I'm inclined to bag the idea of storing the length words separately from > the data proper. Although it probably would make for some marginal gain > in cache efficiency, I don't see any reasonable way at all to fit it > into the current system structure, whereas either the "toast" or "next > gen varlena" approaches seem fairly straightforward. And having to > track an additional pointer inside the inner loops of heap_form_tuple > and heap_deform_tuple could eat up any performance gain anyway. Good point. How do we do it now? I assume we store just the fixed-size toast pointer length in the heap attribute, not the toast length. Why haven't we investigated shrinking the varlena header before? -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Tom Lane wrote: >> Either way, I think it would be interesting to consider >> >> (a) length word either one or two bytes, not four. You can't need more >> than 2 bytes for a datum that fits in a disk page ... > That is an interesting observation, though could compressed inline > values exceed two bytes? After expansion, perhaps, but it's the on-disk footprint that concerns us here. I thought a bit more about this and came up with a zeroth-order sketch: The "length word" for an on-disk datum could be either 1 or 2 bytes; in the 2-byte case we'd need to be prepared to fetch the bytes separately to avoid alignment issues. The high bits of the first byte say what's up:

* First two bits 00: 2-byte length word, uncompressed inline data follows. This allows a maximum on-disk size of 16K for an uncompressed datum, so we lose nothing at all for standard-size disk pages and not much for 32K pages (remember the toaster will try to compress any tuple exceeding 1/4 page anyway ... this just makes it mandatory).

* First two bits 01: 2-byte length word, compressed inline data follows. Again, hard limit of 16K, so if your data exceeds that you have to push it out to the toast table. Again, this policy costs zero for standard size disk pages and not much for 32K pages.

* First two bits 10: 1-byte length word, zero to 62 bytes of uncompressed inline data follows. This is the case that wins for short values.

* First two bits 11: 1-byte length word, pointer to out-of-line toast data follows. We may as well let the low 6 bits of the length word be the size of the toast pointer, same as it works now. Since the toast pointer is not guaranteed aligned anymore, we'd have to memcpy it somewhere before using it ... but compared to the other costs of fetching a toast value, that's surely down in the noise. The distinction between compressed and uncompressed toast data would need to be indicated in the body of the toast pointer, not in the length word as today, but nobody outside of tuptoaster.c would care.

Notice that heap_deform_tuple only sees 2 cases here: high bit 0 means 2-byte length word, high bit 1 means 1-byte. It doesn't care whether the data is compressed or toasted, same as today. There are other ways we could divvy up the bit assignments of course. The main issue is keeping track of whether any given Datum is in this compressed-for-disk format or in the uncompressed 4-byte-length-word format. regards, tom lane
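For illustration, the first-byte dispatch under those bit assignments could look roughly like this (the names and helpers are hypothetical, not actual backend code):

    #include <stdint.h>

    typedef enum
    {
        TWO_BYTE_UNCOMPRESSED,      /* 00xxxxxx */
        TWO_BYTE_COMPRESSED,        /* 01xxxxxx */
        ONE_BYTE_INLINE,            /* 10xxxxxx: low 6 bits are the length */
        ONE_BYTE_TOAST_POINTER      /* 11xxxxxx: low 6 bits are the pointer size */
    } HeaderKind;

    /* Classify an on-disk datum by the top two bits of its first byte. */
    static HeaderKind
    classify_header(uint8_t b0)
    {
        switch (b0 >> 6)
        {
            case 0:  return TWO_BYTE_UNCOMPRESSED;
            case 1:  return TWO_BYTE_COMPRESSED;
            case 2:  return ONE_BYTE_INLINE;
            default: return ONE_BYTE_TOAST_POINTER;
        }
    }

    /*
     * For the 2-byte cases the second byte is fetched separately to avoid
     * alignment assumptions; 14 bits give the 16K limit mentioned above.
     */
    static uint32_t
    two_byte_length(uint8_t b0, uint8_t b1)
    {
        return ((uint32_t) (b0 & 0x3F) << 8) | b1;
    }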
Tom Lane <tgl@sss.pgh.pa.us> writes: > Bruce Momjian <bruce@momjian.us> writes: > > Tom Lane wrote: > >> Either way, I think it would be interesting to consider > >> > >> (a) length word either one or two bytes, not four. You can't need more > >> than 2 bytes for a datum that fits in a disk page ... > > > That is an interesting observation, though could compressed inline > > values exceed two bytes? > > After expansion, perhaps, but it's the on-disk footprint that concerns > us here. I'm a bit confused by this and how it would be handled in your sketch. I assumed we needed a bit pattern dedicated to 4-byte length headers because even though it would never occur on disk it would be necessary for the uncompressed and/or detoasted data. In your scheme what would PG_GETARG_TEXT() give you if the data was detoasted to larger than 16k? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > > Bruce Momjian <bruce@momjian.us> writes: > > > Tom Lane wrote: > > >> Either way, I think it would be interesting to consider > > >> > > >> (a) length word either one or two bytes, not four. You can't need more > > >> than 2 bytes for a datum that fits in a disk page ... > > > > > That is an interesting observation, though could compressed inline > > > values exceed two bytes? > > > > After expansion, perhaps, but it's the on-disk footprint that concerns > > us here. > > I'm a bit confused by this and how it would be handled in your sketch. I > assumed we needed a bit pattern dedicated to 4-byte length headers because > even though it would never occur on disk it would be necessary for the > uncompressed and/or detoasted data. Well, we have to expand the TOAST anyway in memory, so when we do that we already give it the right length header. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Gregory Stark <gsstark@mit.edu> writes: > I'm a bit confused by this and how it would be handled in your sketch. I > assumed we needed a bit pattern dedicated to 4-byte length headers because > even though it would never occur on disk it would be necessary for the > uncompressed and/or detoasted data. > In your scheme what would PG_GETARG_TEXT() give you if the data was detoasted > to larger than 16k? I'm imagining that it would give you the same old uncompressed in-memory representation as it does now, ie, 4-byte length word and uncompressed data. The weak spot of the scheme is that it assumes different, incompatible in-memory and on-disk representations. This seems to require either (a) coercing values to in-memory form before they ever get handed to any datatype manipulation function, or (b) thinking of some magic way to pass out-of-band info about the contents of the datum. (b) is the same stumbling block we have in connection with making typmod available to datatype manipulation functions. I don't want to reject (b) entirely, but it seems to require some pretty major structural changes. OTOH (a) is not very pleasant either, and so what would be nice is if we could tell by inspection of the Datum alone which format it's in. After further thought I have an alternate proposal that does that, but it's got its own disadvantage: it requires storing uncompressed 4-byte length words in big-endian byte order everywhere. This might be a showstopper (does anyone know the cost of ntohl() on modern Intel CPUs?), but if it's not then I see things working like this:

* If high order bit of datum's first byte is 0, then it's an uncompressed datum in what's essentially the same as our current in-memory format except that the 4-byte length word must be big-endian (to ensure that the leading bit can be kept zero). In particular this format will be aligned on 4- or 8-byte boundary as called for by the datatype definition.

* If high order bit of first byte is 1, then it's some compressed variant. I'd propose divvying up the code space like this:

* 0xxxxxxx  uncompressed 4-byte length word as stated above
* 10xxxxxx  1-byte length word, up to 62 bytes of data
* 110xxxxx  2-byte length word, uncompressed inline data
* 1110xxxx  2-byte length word, compressed inline data
* 1111xxxx  1-byte length word, out-of-line TOAST pointer

This limits us to 8K uncompressed or 4K compressed inline data without toasting, which is slightly annoying but probably still an insignificant limitation. It also means more distinct cases for the heap_deform_tuple inner loop to think about, which might be a problem. Since the compressed forms would not be aligned to any boundary, there's an important special case here: how can heap_deform_tuple tell whether the next field is compressed or not? The answer is that we'll have to require pad bytes between fields to be zero. (They already are zeroed by heap_form_tuple, but now it'd be a requirement.) So the algorithm for decoding a non-null field is:

* if looking at a byte with high bit 0, then we are either on the start of an uncompressed field, or on a pad byte before such a field. Advance to the declared alignment boundary for the datatype, read a 4-byte length word, and proceed.

* if looking at a byte with high bit 1, then we are at the start of a compressed field (which will never have any preceding pad bytes). Decode length as per rules above.

The good thing about this approach is that it requires zero changes to fundamental system structure.
The pack/unpack rules in heap_form_tuple and heap_deform_tuple change a bit, and the mechanics of PG_DETOAST_DATUM change, but a Datum is still just a pointer and you can always tell what you've got by examining the pointed-to data. regards, tom lane
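To make that decode rule concrete, here is a rough sketch of stepping over one non-null field under the encoding described above (the helper names, the ALIGN_UP macro, and the assumption that every length word counts itself follow the proposal text; this is not actual heap_deform_tuple code, and the 1111xxxx case is an assumption about where the TOAST-pointer length would live):

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>              /* ntohl() */

    #define ALIGN_UP(p, a) \
        ((const uint8_t *) ((((uintptr_t) (p)) + ((a) - 1)) & ~((uintptr_t) (a) - 1)))

    static const uint8_t *
    skip_field(const uint8_t *p, int type_align)
    {
        uint8_t  b0 = *p;
        uint32_t len;

        if ((b0 & 0x80) == 0)
        {
            /*
             * 0xxxxxxx: an uncompressed, aligned datum, or a zero pad byte
             * in front of one.  Advance to the type's alignment boundary,
             * then fetch the big-endian 4-byte length word via memcpy to
             * avoid an unaligned read.
             */
            p = ALIGN_UP(p, type_align);
            memcpy(&len, p, sizeof(len));
            len = ntohl(len);
        }
        else if ((b0 & 0x40) == 0)
            len = b0 & 0x3F;                                /* 10xxxxxx */
        else if ((b0 & 0x20) == 0)
            len = ((uint32_t) (b0 & 0x1F) << 8) | p[1];     /* 110xxxxx */
        else if ((b0 & 0x10) == 0)
            len = ((uint32_t) (b0 & 0x0F) << 8) | p[1];     /* 1110xxxx */
        else
            len = b0 & 0x0F;            /* 1111xxxx: TOAST pointer (assumed) */

        return p + len;                 /* each length word counts itself */
    }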
Tom Lane wrote: > After further thought I have an alternate proposal that does that, > but it's got its own disadvantage: it requires storing uncompressed > 4-byte length words in big-endian byte order everywhere. This might > be a showstopper (does anyone know the cost of ntohl() on modern > Intel CPUs?), but if it's not then I see things working like this: > > * If high order bit of datum's first byte is 0, then it's an > uncompressed datum in what's essentially the same as our current > in-memory format except that the 4-byte length word must be big-endian > (to ensure that the leading bit can be kept zero). In particular this > format will be aligned on 4- or 8-byte boundary as called for by the > datatype definition. > > * If high order bit of first byte is 1, then it's some compressed > variant. I'd propose divvying up the code space like this: > > * 0xxxxxxx uncompressed 4-byte length word as stated above > * 10xxxxxx 1-byte length word, up to 62 bytes of data > * 110xxxxx 2-byte length word, uncompressed inline data > * 1110xxxx 2-byte length word, compressed inline data > * 1111xxxx 1-byte length word, out-of-line TOAST pointer Great. I assumed we would have to use a variable-length header, as you described. I don't think ntohl() is going to be a problem. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Tom Lane wrote: > (does anyone know the cost of ntohl() on modern > Intel CPUs?) I wrote a simple test program to determine this:

    #include <stdint.h>
    #include <arpa/inet.h>

    int main (int argc, char *argv[])
    {
        unsigned long i;
        uint32_t a;

        a = 0;
        for (i = 0 ; i < 4000000000L ; ++i)
        {
    #ifdef CALL_NTOHL
            a = ntohl(i);
    #endif
        }
        return a;
    }

I have a system with an Athlon 64 3200+ (2.0 GHz) running in 64-bit mode, another one with the same processor running in 32-bit mode, a third running a Pentium 4 1.5 GHz processor, and a fourth running a pair of 2.8 GHz Xeons in hyperthreading mode. I compiled the test program on the 32-bit systems with the -std=c9x option so that the constant would be treated as unsigned. Other than that, the compilation method I used was identical: no optimization, since it would skip the loop entirely in the version without the ntohl() call. I compiled it both with and without defining CALL_NTOHL, and measured the difference in billed CPU seconds. Based on the above, on both Athlon 64 systems, each ntohl() invocation and assignment takes 1.04 nanoseconds to complete (I presume the assignment is to a register, but I'd have to examine the assembly to know for sure). On the 1.5 GHz P4 system, each iteration takes 8.49 nanoseconds. And on the 2.8 GHz Xeon system, each iteration takes 5.01 nanoseconds. That seems reasonably fast to me... -- Kevin Brown kevin@sysexperts.com
On Sun, 10 Sep 2006, Kevin Brown wrote: > Tom Lane wrote: > > (does anyone know the cost of ntohl() on modern > > Intel CPUs?) > > I have a system with an Athlon 64 3200+ (2.0 GHz) running in 64-bit > mode, another one with the same processor running in 32-bit mode, a > third running a Pentium 4 1.5 GHz processor, and a fourth running a > pair of 2.8 GHz Xeons in hyperthreading mode. > > I compiled the test program on the 32-bit systems with the -std=c9x > option so that the constant would be treated as unsigned. Other than > that, the compilation method I used was identical: no optimization, > since it would skip the loop entirely in the version without the > ntohl() call. I compiled it both with and without defining > CALL_NTOHL, and measured the difference in billed CPU seconds. > > Based on the above, on both Athlon 64 systems, each ntohl() invocation > and assignment takes 1.04 nanoseconds to complete (I presume the > assignment is to a register, but I'd have to examine the assembly to > know for sure). On the 1.5 GHz P4 system, each iteration takes 8.49 > nanoseconds. And on the 2.8 GHz Xeon system, each iteration takes > 5.01 nanoseconds.

Of course, that depends on the particular OS and variant as well. IIRC, at some point an instruction was added to the x86 instruction set to do byte swapping. This is from /usr/include/netinet/in.h on a gentoo linux box with glibc 2.3:

    #ifdef __OPTIMIZE__
    /* We can optimize calls to the conversion functions.  Either nothing has
       to be done or we are using directly the byte-swapping functions which
       often can be inlined.  */
    # if __BYTE_ORDER == __BIG_ENDIAN
    /* The host byte order is the same as network byte order,
       so these functions are all just identity.  */
    # define ntohl(x)   (x)
    # define ntohs(x)   (x)
    # define htonl(x)   (x)
    # define htons(x)   (x)
    # else
    # if __BYTE_ORDER == __LITTLE_ENDIAN
    # define ntohl(x)   __bswap_32 (x)
    # define ntohs(x)   __bswap_16 (x)
    # define htonl(x)   __bswap_32 (x)
    # define htons(x)   __bswap_16 (x)
    # endif
    # endif
    #endif

And from bits/byteswap.h:

    /* To swap the bytes in a word the i486 processors and up provide the
       `bswap' opcode.  On i386 we have to use three instructions.  */
    # if !defined __i486__ && !defined __pentium__ && !defined __pentiumpro__ \
         && !defined __pentium4__
    # define __bswap_32(x) \
         (__extension__ \
          ({ register unsigned int __v, __x = (x); \
             if (__builtin_constant_p (__x)) \
               __v = __bswap_constant_32 (__x); \
             else \
               __asm__ ("rorw $8, %w0;" \
                        "rorl $16, %0;" \
                        "rorw $8, %w0" \
                        : "=r" (__v) \
                        : "0" (__x) \
                        : "cc"); \
             __v; }))
    # else
    # define __bswap_32(x) \
         (__extension__ \
          ({ register unsigned int __v, __x = (x); \
             if (__builtin_constant_p (__x)) \
               __v = __bswap_constant_32 (__x); \
             else \
               __asm__ ("bswap %0" : "=r" (__v) : "0" (__x)); \
             __v; }))
    # endif

/me searches around his hard drive for the ia32 developers reference

    BSWAP
    Opcode     Instruction    Description
    0F C8+rd   BSWAP r32      Reverse the byte order of a 32-bit register
    ...
    The BSWAP instruction is not supported on IA-32 processors earlier
    than the Intel486 processor family.
    ...

I have read some odd stuff about instructions like these. Apparently the fact that this is a "prefixed instruction" (the 0F byte at the beginning) costs an extra clock cycle, so though this instruction should take 1 cycle, it ends up taking 2. I am unclear whether or not this is rectified in later pentium chips. So to answer the question about how much ntohl costs on recent Intel boxes, a properly optimized build with a friendly libc like I quoted should be able to do it in 2 cycles.
Tom Lane <tgl@sss.pgh.pa.us> writes: > Gregory Stark <gsstark@mit.edu> writes: >> I'm a bit confused by this and how it would be handled in your sketch. I >> assumed we needed a bit pattern dedicated to 4-byte length headers because >> even though it would never occur on disk it would be necessary for the >> uncompressed and/or detoasted data. > >> In your scheme what would PG_GETARG_TEXT() give you if the data was detoasted >> to larger than 16k? > > I'm imagining that it would give you the same old uncompressed in-memory > representation as it does now, ie, 4-byte length word and uncompressed > data. Sure, but how would you know? Sometimes you would get a pointer to a varlena starting with a byte with a leading 00 indicating a 1-byte varlena header and sometimes you would get a pointer to a varlena with the old uncompressed representation with a 4-byte length header which may well start with a 00. > * If high order bit of first byte is 1, then it's some compressed > variant. I'd propose divvying up the code space like this: > > * 0xxxxxxx uncompressed 4-byte length word as stated above > * 10xxxxxx 1-byte length word, up to 62 bytes of data > * 110xxxxx 2-byte length word, uncompressed inline data > * 1110xxxx 2-byte length word, compressed inline data > * 1111xxxx 1-byte length word, out-of-line TOAST pointer I'm unclear how you're using the remaining bits. Are you saying you would have a 4-byte length word following this bit-flag byte? Or are you saying we would use 31 bits for the 4-byte length word, 13 bits for the 2-byte uncompressed length word and 12 bits for the compressed length word? Also Heikki points out here that it would be nice to allow for the case for a 0-byte header. So for example if the leading bit is 0 then the remaining 7 bits are available for the datum itself. This would actually vacate much of my argument for a fixed length char(n) data type. The most frequent use case is for things like CHAR(1) fields containing 'Y' or 'N'. In any case it seems a bit backwards to me. Wouldn't it be better to preserve bits in the case of short length words where they're precious rather than long ones? If we make 0xxxxxxx the 1-byte case it means limiting our maximum datum size to something like .5G but if you're working with .5G data wouldn't you be using an api that lets you access it by chunks anyways? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Hi, Tom, Tom Lane wrote: > The only way we could pack stuff without alignment is to go over to the > idea that memory and disk representations are different --- where in > this case the "conversion" might just be a memcpy to a known-aligned > location. The performance costs of that seem pretty daunting, however, > especially when you reflect that simply stepping over a varlena field > would require memcpy'ing its length word to someplace. AFAICS, PostGIS already uses this approach internally, mostly because its current format requires a mix of byte-sized and larger (int, double) fields. Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
Tom Lane <tgl@sss.pgh.pa.us> writes: > Mark Dilger <pgsql@markdilger.com> writes: > > ... The argument made upthread that a > > quadratic number of conversion operators is necessitated doesn't seem > > right to me, given that each type could upcast to the canonical built in > > type. (int1 => smallint, int3 => integer, ascii1 => text, ascii2 => > > text, ascii3 => text, etc.) > > This would work all right for the string-category cases, since TEXT is > the only thing you really care about having them cast to anyway. > It probably won't work all that well for int1/int3, because you really > want them to coerce implicitly to all the "wider" numeric types. > Otherwise, perfectly sane queries like "int8 + int1" fail. Actually that one works, you get numeric addition which then would get cast back to an int8 if you tried to store it in an int8 column. Not necessarily the most efficient way to go about it though. However you do have to provide all the cross-data-type comparisons if you want indexes to work right and that alone gives you a couple hundred catalog entries. > Part of the issue here is that we deliberately keep the parser from > searching for multi-step coercions. So for example if you only provide > int1->int2 then the existence of up-casts from int2 doesn't help you > use an int1 with anything except int2. After my initial plea for multi-step coercions I've thought about it a bit further and I think I can make a stronger case for them now: Consider that in the current situation there's an asymmetry between function calls and casts. If you call a function and there's an implicit cast that matches the argument then we'll use it. But if you use a cast implemented with that same function it won't work. For example say you implement a numeric data type called mynumber and you define function mynumber(numeric) => mynumber and use it to implement the cast numeric::mynumber. You'll find mynumber(integer) works just fine but integer::mynumber doesn't. To make the situation parallel we would have to find two-step casts only for explicit casts and if and only if the intermediate cast is an implicit cast. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> I'm imagining that it would give you the same old uncompressed in-memory >> representation as it does now, ie, 4-byte length word and uncompressed >> data. > Sure, but how would you know? Sometimes you would get a pointer to a varlena > starting with a byte with a leading 00 indicating a 1-byte varlena header and > sometimes you would get a pointer to a varlena with the old uncompressed > representation with a 4-byte length header which may well start with a 00. Yeah, in that scheme you need some out-of-band information telling you if the datum is compressed or not. The second scheme I posted avoids that problem. >> * If high order bit of first byte is 1, then it's some compressed >> variant. I'd propose divvying up the code space like this: >> >> * 0xxxxxxx uncompressed 4-byte length word as stated above >> * 10xxxxxx 1-byte length word, up to 62 bytes of data >> * 110xxxxx 2-byte length word, uncompressed inline data >> * 1110xxxx 2-byte length word, compressed inline data >> * 1111xxxx 1-byte length word, out-of-line TOAST pointer > I'm unclear how you're using the remaining bits. Length (or high order bits of it, if the length covers more than 1 byte). > Also Heikki points out here that it would be nice to allow for the case for a > 0-byte header. I don't think there's enough code space for that; at least not compared to its use case. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: >> Also Heikki points out here that it would be nice to allow for the case for a >> 0-byte header. > > I don't think there's enough code space for that; at least not compared > to its use case. Well it's irrelevant if we add a special data type to handle CHAR(1). But if we don't it's pretty important. Even with 1-byte varlena headers you can have approaching 100% bloat if you have a table with lots of CHAR(1) fields. That said I'm not sure whether it's worth it over having a special CHAR(1) data type which would have the benefit of handling other 1-byte encodings aside from ascii. We would probably still need a CHAR(2) data type too where the overhead is still 50%. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > >> Also Heikki points out here that it would be nice to allow for the case for a > >> 0-byte header. > > > > I don't think there's enough code space for that; at least not compared > > to its use case. > > Well it's irrelevant if we add a special data type to handle CHAR(1). In that case you should probably be using "char" ... -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Mon, Sep 11, 2006 at 03:13:36PM +0100, Gregory Stark wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > >> Also Heikki points out here that it would be nice to allow for the case for a > >> 0-byte header. > > > > I don't think there's enough code space for that; at least not compared > > to its use case. > > Well it's irrelevant if we add a special data type to handle CHAR(1). We already have a CHAR(1), it's called "char" and it's exactly one byte. This discussion should probably be about strings longer than that. It's a pity arrays have so much overhead, otherwise you could work with arrays of "char". Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Alvaro Herrera <alvherre@commandprompt.com> writes: > > Well it's irrelevant if we add a special data type to handle CHAR(1). > > In that case you should probably be using "char" ... Well "char" doesn't have quite the same semantics as CHAR(1). If that's the consensus though then I can work on either fixing "char" semantics to match CHAR(1) or adding a separate type instead. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Sun, 2006-09-10 at 21:16 -0400, Tom Lane wrote: > After further thought I have an alternate proposal (snip) > * If high order bit of datum's first byte is 0, then it's an > uncompressed datum in what's essentially the same as our current > in-memory format except that the 4-byte length word must be big-endian > (to ensure that the leading bit can be kept zero). In particular this > format will be aligned on 4- or 8-byte boundary as called for by the > datatype definition. > > * If high order bit of first byte is 1, then it's some compressed > variant. I'd propose divvying up the code space like this: > > * 0xxxxxxx uncompressed 4-byte length word as stated above > * 10xxxxxx 1-byte length word, up to 62 bytes of data > * 110xxxxx 2-byte length word, uncompressed inline data > * 1110xxxx 2-byte length word, compressed inline data > * 1111xxxx 1-byte length word, out-of-line TOAST pointer > > This limits us to 8K uncompressed or 4K compressed inline data without > toasting, which is slightly annoying but probably still an insignificant > limitation. It also means more distinct cases for the heap_deform_tuple > inner loop to think about, which might be a problem. > > Since the compressed forms would not be aligned to any boundary, > there's an important special case here: how can heap_deform_tuple tell > whether the next field is compressed or not? The answer is that we'll > have to require pad bytes between fields to be zero. (They already are > zeroed by heap_form_tuple, but now it'd be a requirement.) So the > algorithm for decoding a non-null field is: > > * if looking at a byte with high bit 0, then we are either > on the start of an uncompressed field, or on a pad byte before > such a field. Advance to the declared alignment boundary for > the datatype, read a 4-byte length word, and proceed. > > * if looking at a byte with high bit 1, then we are at the > start of a compressed field (which will never have any preceding > pad bytes). Decode length as per rules above. > > The good thing about this approach is that it requires zero changes to > fundamental system structure. The pack/unpack rules in heap_form_tuple > and heap_deform_tuple change a bit, and the mechanics of > PG_DETOAST_DATUM change, but a Datum is still just a pointer and you > can always tell what you've got by examining the pointed-to data. Seems like a great approach to this pain point. More fun than lots of new datatypes also. Is this an 8.2 thing? If not, is Numeric508 applied? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote: > > Alvaro Herrera <alvherre@commandprompt.com> writes: > > > > Well it's irrelevant if we add a special data type to handle CHAR(1). > > > > In that case you should probably be using "char" ... > > Well "char" doesn't have quite the same semantics as CHAR(1). If that's the > consensus though then I can work on either fixing "char" semantics to match > CHAR(1) or adding a separate type instead. What semantics? I thought you would just store a byte there, retrieve it and compare to something else. Anything beyond this probably doesn't make much sense (to me anyway). Are you thinking of concatenating it, etc.? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Gregory Stark <stark@enterprisedb.com> writes:
> In any case it seems a bit backwards to me. Wouldn't it be better to preserve bits in the case of short length words where they're precious rather than long ones? If we make 0xxxxxxx the 1-byte case it means ...

Well, I don't find that real persuasive: you're saying that it's important to have a 1-byte not 2-byte header for datums between 64 and 127 bytes long. Which is by definition less than a 2% savings for those values. I think it's more important to pick bitpatterns that reduce the number of cases heap_deform_tuple has to think about while decoding the length of a field --- every "if" in that inner loop is expensive.

I realized this morning that if we are going to preserve the rule that 4-byte-header and compressed-header cases can be distinguished from the data alone, there is no reason to be very worried about whether the 2-byte cases can represent the maximal length of an in-line datum. If you want to do 16K inline (and your page is big enough for that) you can just fall back to the 4-byte-header case. So there's no real disadvantage if the 2-byte headers can only go up to 4K or so. This gives us some more flexibility in the bitpattern choices.

Another thought that occurred to me is that if we preserve the convention that a length word's value includes itself, then for a 1-byte header the bit pattern 10000000 is meaningless --- the count has to be at least 1. So one trick we could play is to take over this value as the signal for "toast pointer follows", with the assumption that the tuple-decoder code knows a-priori how big a toast pointer is. I am not real enamored of this, because it certainly adds one case to the inner heap_deform_tuple loop and it'll give us problems if we ever want more than one kind of toast pointer. But it's a possibility.

Anyway, a couple of encodings that I'm thinking about now involve limiting uncompressed data to 1G (same as now), so that we can play with the first 2 bits instead of just 1:

00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
01xxxxxx 4-byte length word, aligned, compressed data (up to 1G)
100xxxxx 1-byte length word, unaligned, TOAST pointer
1010xxxx 2-byte length word, unaligned, uncompressed data (up to 4K)
1011xxxx 2-byte length word, unaligned, compressed data (up to 4K)
11xxxxxx 1-byte length word, unaligned, uncompressed data (up to 63b)

or

00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
010xxxxx 2-byte length word, unaligned, uncompressed data (up to 8K)
011xxxxx 2-byte length word, unaligned, compressed data (up to 8K)
10000000 1-byte length word, unaligned, TOAST pointer
1xxxxxxx 1-byte length word, unaligned, uncompressed data (up to 127b) (xxxxxxx not all zero)

This second choice allows longer datums in both the 1-byte and 2-byte header formats, but it hardwires the length of a TOAST pointer and requires four cases to be distinguished in the inner loop; the first choice only requires three cases, because TOAST pointer and 1-byte header can be handled by the same rule "length is low 6 bits of byte". The second choice also loses the ability to store in-line compressed data above 8K, but that's probably an insignificant loss.

There's more than one way to do it ...

regards, tom lane
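For comparison, here is a sketch of the three-way branch the first of the two encodings above would need in the decode loop. The names are hypothetical, not PostgreSQL code, and the value returned is simply the raw contents of the length bits (big-endian for the 4-byte word, as in the earlier proposal).

/*
 * Sketch of the three-case decode described for the first alternative
 * encoding.  Illustrative names only.
 */
#include <stdint.h>
#include <stddef.h>

static size_t
header_and_length(const uint8_t *p, size_t *header_bytes)
{
    if ((p[0] & 0x80) == 0)
    {
        /* 00xxxxxx / 01xxxxxx: aligned 4-byte length word, up to 1G */
        *header_bytes = 4;
        return ((size_t) (p[0] & 0x3F) << 24) | ((size_t) p[1] << 16) |
               ((size_t) p[2] << 8) | (size_t) p[3];
    }
    if ((p[0] & 0xE0) == 0xA0)
    {
        /* 1010xxxx / 1011xxxx: unaligned 2-byte length word, up to 4K */
        *header_bytes = 2;
        return ((size_t) (p[0] & 0x0F) << 8) | (size_t) p[1];
    }

    /* 100xxxxx (TOAST pointer) and 11xxxxxx (short inline data):
     * "length is low 6 bits of byte", which is what collapses these
     * two formats into a single case in the inner loop. */
    *header_bytes = 1;
    return (size_t) (p[0] & 0x3F);
}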
Tom Lane <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> In any case it seems a bit backwards to me. Wouldn't it be better to >> preserve bits in the case of short length words where they're precious >> rather than long ones? If we make 0xxxxxxx the 1-byte case it means ... > > Well, I don't find that real persuasive: you're saying that it's > important to have a 1-byte not 2-byte header for datums between 64 and > 127 bytes long. Which is by definition less than a 2% savings for those > values. Sure, but my thinking was that saving one byte on data between 64 and 127 bytes long is more important than saving two bytes on data between 4k and 8k or whatever the range was in that proposal. > I think it's more important to pick bitpatterns that reduce the number of > cases heap_deform_tuple has to think about while decoding the length of a > field --- every "if" in that inner loop is expensive. I'll have to spend a few hours tomorrow becoming one with that section of code. I looked at it already and was surprised at how short it was already so I can understand what you mean. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes: > Is this an 8.2 thing? You are joking, no? > If not, is Numeric508 applied? No, that got rejected as being too much of a restriction of the dynamic range, eg John's comment here: http://archives.postgresql.org/pgsql-general/2005-12/msg00246.php I think a more practical way of shaving 2 bytes from NUMERIC would be to invent a fixed-2-byte-header variant of varlena. That's something we talked about in the NUMERIC thread but ultimately dropped --- it'd be worth reconsidering along with the current ideas about multiple varlena header formats, however. The reason to not just make it use the generalized varlena format under discussion is that it'd really like to have at least 2-byte alignment; that would be enough to avoid memcpy-for-alignment. Another interesting subset of the problem is the inet/cidr datatypes, which I think would be perfectly happy with the 1-byte-header variants we've talked about --- AFAIR the inet code doesn't really have any alignment requirements on its data, and it certainly doesn't need values longer than 63 bytes. So that subset of the concept might need to be broken out as a separately usable thing too. regards, tom lane
On Mon, Sep 11, 2006 at 01:15:43PM -0400, Tom Lane wrote: > Gregory Stark <stark@enterprisedb.com> writes: > > In any case it seems a bit backwards to me. Wouldn't it be better to > > preserve bits in the case of short length words where they're precious > > rather than long ones? If we make 0xxxxxxx the 1-byte case it means ... > Well, I don't find that real persuasive: you're saying that it's > important to have a 1-byte not 2-byte header for datums between 64 and > 127 bytes long. Which is by definition less than a 2% savings for those > values. I think it's more important to pick bitpatterns that reduce > the number of cases heap_deform_tuple has to think about while decoding > the length of a field --- every "if" in that inner loop is expensive. I like your thought process on this, Tom. I read your suggestions and didn't respond because I was in full agreement with them. The 1-byte header would be valuable even if it only worked for 32 bytes. It is important to keep CPU overhead down by making it easy to switch off the bit patterns. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com | Neighbourhood Coder | Ottawa, Ontario, Canada | One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/
Tom Lane <tgl@sss.pgh.pa.us> writes: > No, that got rejected as being too much of a restriction of the dynamic > range, eg John's comment here: > http://archives.postgresql.org/pgsql-general/2005-12/msg00246.php That logic seems questionable. John makes two points: a) crypto applications are within a factor of two of the proposed limitation. Firstly, nobody does actual crypto work using Postgres's numeric data type. It would be ridiculously slow. They wouldn't even store numbers used for crypto in it, they would use bytea or something like that to store a binary bitstring. Secondly, there's nothing blocking us from changing it again in the future. It would make pg_upgrade a pain, but handling redefinition of user-defined datatypes would be a necessity anyway. A future version could always revert the change. b) Because we're usually not especially concerned with CPU usage of numeric we're also not concerned with space usage of numeric. I'm not sure what the arguments were that he's referring to but I have trouble imagining a credible argument against being concerned for cpu usage that wouldn't result in the conclusion that space usage was *more* important. I was actually going to suggest going back and looking for *more* space savings in numeric. I had assumed this first step had gone in long ago. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <gsstark@mit.edu> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> No, that got rejected as being too much of a restriction of the dynamic >> range, eg John's comment here: >> http://archives.postgresql.org/pgsql-general/2005-12/msg00246.php > That logic seems questionable. John makes two points: > a) crypto applications are within a factor of two of the proposed limitation. > Firstly, nobody does actual crypto work using Postgres's numeric data type. > It would be ridiculously slow. That's utterly irrelevant. The point is that there are standard applications today in which people need that much precision; therefore, the argument that "10^508 is far more than anyone could want" is on exceedingly shaky ground. Besides, isn't "it's too slow" a bug we'd like to fix someday? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > That's utterly irrelevant. The point is that there are standard > applications today in which people need that much precision; therefore, > the argument that "10^508 is far more than anyone could want" is on > exceedingly shaky ground. My point is those applications aren't practical in our current implementation and we can always extend the precision later if we decide we want to. > Besides, isn't "it's too slow" a bug we'd like to fix someday? The only way I see to do that is to replace our implementation entirely with something like libgmp. At first I meant that as a reductio ad absurdum argument, but, uh, come to think of it why *do* we have our own arbitrary precision library? Is there any particular reason we can't use one of the existing binary implementations? I think libgmp itself is GPL'd but there are others and even if libgmp is GPL'd that just puts it into the same camp as readline. It would have to be an option, and even under the strictest interpretations of the GPL, as long as there are alternative implementations it's fine. I was going to spend time looking at optimising numeric's storage but it seems like a waste of time if we could just use an implementation that's better. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes: > At first I meant that as a reductio ad absurdum argument, but, uh, > come to think of it why *do* we have our own arbitrary precision > library? Is there any particular reason we can't use one of the > existing binary implementations? Going over to binary storage would trade off I/O speed for calculation speed, which is probably not a win for everyone; and even more seriously, how are you going to represent decimal fractions exactly? The fact that 0.01 is 0.01 and not just a near approximation thereto is critical for a lot of our users. I have no objection to relying on someone else's package if it actually solves our problem, but not if it just solves a related problem. (It might be interesting to offer a "bignum" datatype that uses binary math internally, but replacing numeric with it would be a hard sell.) regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: > > At first I meant that as a reductio ad absurdum argument, but, uh, > > come to think of it why *do* we have our own arbitrary precision > > library? Is there any particular reason we can't use one of the > > existing binary implementations? > > Going over to binary storage would trade off I/O speed for calculation > speed, which is probably not a win for everyone; Huh? Which would you expect binary to be worse at than decimal? I would expect it to be both faster and denser. > and even more seriously, how are you going to represent decimal fractions > exactly? The fact that 0.01 is 0.01 and not just a near approximation > thereto is critical for a lot of our users. Certainly any arbitrary precision library isn't worth beans if it can't represent values accurately. I'm not sure how gmp and the others represent their data but my first guess is that there's no particular reason the base of the mantissa and exponent have to be the same as the base the exponent is interpreted as. That is, you can store a base 10 exponent but store it and the mantissa in two's complement integers. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
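A tiny sketch of the representation Gregory describes above: keep the mantissa as an ordinary binary integer but interpret the exponent as a power of ten, so decimal fractions such as 0.01 stay exact. This is only an illustration of the idea; it is not how PostgreSQL's numeric or GMP actually store values, and the names are made up.

/*
 * Illustration only: binary mantissa with a base-10 exponent,
 * value = mantissa * 10^dexp.
 */
#include <stdint.h>
#include <stdio.h>

typedef struct
{
    int64_t mantissa;   /* two's complement integer        */
    int32_t dexp;       /* decimal exponent (power of ten) */
} dec_t;

/* 0.01 is represented exactly as 1 * 10^-2 */
static const dec_t one_hundredth = { 1, -2 };

/* multiplication stays exact: multiply mantissas, add exponents
 * (overflow handling, which a real library must do, is ignored) */
static dec_t
dec_mul(dec_t a, dec_t b)
{
    dec_t r = { a.mantissa * b.mantissa, a.dexp + b.dexp };
    return r;
}

int
main(void)
{
    dec_t r = dec_mul(one_hundredth, one_hundredth);   /* 0.0001 */
    printf("%lld * 10^%d\n", (long long) r.mantissa, (int) r.dexp);
    return 0;
}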
On Mon, Sep 11, 2006 at 07:05:12PM -0400, Gregory Stark wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > Gregory Stark <stark@enterprisedb.com> writes: > > > At first I meant that as a reductio ad absurdum argument, but, uh, > > > come to think of it why *do* we have our own arbitrary precision > > > library? Is there any particular reason we can't use one of the > > > existing binary implementations? > > Going over to binary storage would trade off I/O speed for calculation > > speed, which is probably not a win for everyone; > Huh? Which would you expect binary to be worse at than decimal? I > would expect it to be both faster and denser. Representation is the difficult part. > > and even more seriously, how are you going to represent decimal fractions > > exactly? The fact that 0.01 is 0.01 and not just a near approximation > > thereto is critical for a lot of our users. > Certainly any arbitrary precision library isn't worth beans if it can't > represent values accurately. This isn't correct. Try representing 0.01 "accurately" in binary. See what you come up with. :-) > I'm not sure how gmp and the others represent their data but my > first guess is that there's no particular reason the base of the > mantissa and exponent have to be the same as the base the exponent > is interpreted as. That is, you can store a base 10 exponent but > store it and the mantissa in two's complement integers. I don't think gmp does this, nor do I expect it would be trivial to author a package that was both efficient and able to operate in any base. I believe gmp operates in a base that is the size of the CPU word, usually 32 or 64 bits. It does not offer the ability to calculate or store using base 10. I've seen libraries that do an acceptable job storing items in base 1000 or higher for use in decimal calculations. I have no idea what PostgreSQL itself does... :-) Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com | Neighbourhood Coder | Ottawa, Ontario, Canada | One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/
On Mon, 2006-09-11 at 14:25 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndquadrant.com> writes: > > Is this an 8.2 thing? > > You are joking, no? Confirming, using an open question, and a smile. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Alvaro Herrera <alvherre@commandprompt.com> writes: > Gregory Stark wrote: >> >> Well "char" doesn't have quite the same semantics as CHAR(1). If that's the >> consensus though then I can work on either fixing "char" semantics to match >> CHAR(1) or adding a separate type instead. > > What semantics? The main bit that comes to mind is 32::CHAR(1) gives you '3' but 32::"char" gives you ' '. Really it makes more sense if you think of "char" as a 1 byte integer type with some extra text casts and operators to make C programmers happy, not a 1 byte character type. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Sep 11, 2006, at 1:57 PM, Gregory Stark wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> I think it's more important to pick bitpatterns that reduce the >> number of >> cases heap_deform_tuple has to think about while decoding the >> length of a >> field --- every "if" in that inner loop is expensive. > > I'll have to spend a few hours tomorrow becoming one with that > section of > code. I looked at it already and was surprised at how short it was > already so > I can understand what you mean. I'm guessing this won't change the design, but I'll throw it out anyway. I'd love to have the ability to control toasting thresholds manually. This could result in a lot of speed improvements in cases where a varlena field isn't frequently accessed and will be fairly large, yet not large enough to normally trigger toasting. An address field would be a good example. Being able to force a field to be toasted before it normally would could drastically improve tuple density without requiring the developer to use a 'side table' to store the data. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Mark Dilger wrote: > Tom Lane wrote: >> Mark Dilger <pgsql@markdilger.com> writes: >>> ... The argument made upthread that a quadratic number of conversion >>> operators is necessitated doesn't seem right to me, given that each >>> type could upcast to the canonical built in type. (int1 => smallint, >>> int3 => integer, ascii1 => text, ascii2 => text, ascii3 => text, etc.) >> >> This would work all right for the string-category cases, since TEXT is >> the only thing you really care about having them cast to anyway. >> It probably won't work all that well for int1/int3, because you really >> want them to coerce implicitly to all the "wider" numeric types. >> Otherwise, perfectly sane queries like "int8 + int1" fail. >> >> Part of the issue here is that we deliberately keep the parser from >> searching for multi-step coercions. So for example if you only provide >> int1->int2 then the existence of up-casts from int2 doesn't help you >> use an int1 with anything except int2. >> >> I am not sure whether any problems would be created if you did provide >> the full spectrum of up-casts. I remember having argued that there >> would be problems with trying to invent uint2/uint4 types, but that was >> a very long time ago, before we had pg_cast and some other changes in >> the type resolution rules. With the current system it might work OK. >> >> regards, tom lane > > Thanks Tom, > > I will try this then. I won't be proposing to ever put this in core, as > the increased code size isn't justified for people who aren't using > these types (IMHO). Any further feedback on why this wouldn't work is > appreciated, as it might save me some time learning on my own. But > otherwise I'll post back in a few days when this is finished. > I've created the int1 and int3 types, with casts to/from each other as well as to/from int2, int4, int8, float4, float8, and numeric. They also have comparison operators for themselves, though you have to use casts if you want to compare against other numeric types. int1 works perfectly, as far as I can tell. int3 works great in memory, but can't be stored to a table. The problem seems to be that store_att_byval allows data of size 1 byte but not size 3 bytes, forcing me to pass int3 by reference. But when I pass either of these types by reference the backend exits when trying to store to a table. Does anybody know whether storing data "by reference" works for data smaller than 4 bytes? If not, then I seem to be out of options for creating a 3-byte datatype. This would also seem to prevent the creation of an ascii3 type as well. mark
Mark Dilger <pgsql@markdilger.com> writes: > int1 works perfectly, as far as I can tell. int3 works great in memory, > but can't be stored to a table. The problem seems to be that > store_att_byval allows data of size 1 byte but not size 3 bytes, forcing > me to pass int3 by reference. But when I pass either of these types by > reference the backend exits when trying to store to a table. Please provide a stack trace --- AFAIK there shouldn't be any reason why a pass-by-ref 3-byte type wouldn't work. I'm wondering though what alignment you expect it to have. You'd need some pretty ugly code to pick up an unaligned 3-byte integer portably ... but if you align it, the space savings probably goes out the window. regards, tom lane
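For what it's worth, the "pretty ugly code" Tom alludes to is roughly the following: a byte-at-a-time read that never dereferences the field as a wider integer, so it works regardless of alignment. This is just a sketch of the technique under an assumed little-endian byte layout; it is not code from Mark's patch.

/* Portable read of an unaligned 3-byte little-endian integer,
 * with sign extension into a normal 32-bit value.  Sketch only. */
#include <stdint.h>

static int32_t
read_int24(const unsigned char *p)
{
    uint32_t u = (uint32_t) p[0]
               | ((uint32_t) p[1] << 8)
               | ((uint32_t) p[2] << 16);

    /* sign-extend bit 23 */
    if (u & 0x800000)
        u |= 0xFF000000;
    return (int32_t) u;
}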
In article <EC43916E-2290-4882-B64F-DD81866386DC@nasby.net>, jim@nasby.net (Jim Nasby) wrote: > I'd love to have the ability to control toasting thresholds > manually. ... Being able to force a field to be > toasted before it normally would could drastically improve tuple > density without requiring the developer to use a 'side table' to > store the data. +1 :-) -arturo
Tom Lane wrote:
> Mark Dilger <pgsql@markdilger.com> writes:
>> int1 works perfectly, as far as I can tell. int3 works great in memory, but can't be stored to a table. The problem seems to be that store_att_byval allows data of size 1 byte but not size 3 bytes, forcing me to pass int3 by reference. But when I pass either of these types by reference the backend exits when trying to store to a table.
>
> Please provide a stack trace --- AFAIK there shouldn't be any reason why a pass-by-ref 3-byte type wouldn't work. I'm wondering though what alignment you expect it to have. You'd need some pretty ugly code to pick up an unaligned 3-byte integer portably ... but if you align it, the space savings probably goes out the window.
>
> regards, tom lane

Program received signal SIGSEGV, Segmentation fault.
0xb7e01d45 in memcpy () from /lib/libc.so.6
(gdb) bt
#0  0xb7e01d45 in memcpy () from /lib/libc.so.6
#1  0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7, values=0x83c2e84, isnull=0x83c2e98 "", data=0x83c2ef4 "", infomask=0x83c2ef0, bit=0x0) at heaptuple.c:181
#2  0x08078b0d in heap_form_tuple (tupleDescriptor=0x83c2d78, values=0x83c2e84, isnull=0x83c2e98 "") at heaptuple.c:749
#3  0x0815d2d9 in ExecCopySlotTuple (slot=0x83c26f4) at execTuples.c:558
#4  0x0815d393 in ExecMaterializeSlot (slot=0x83c26f4) at execTuples.c:639
#5  0x081560ca in ExecutorRun (queryDesc=0x83c2834, direction=ForwardScanDirection, count=0) at execMain.c:1401
#6  0x081e78e4 in ProcessQuery (parsetree=0x83c2240, plan=0x83b837c, params=0x3, dest=0x83b8290, completionTag=0xbfedffa0 "") at pquery.c:174
#7  0x081e89f9 in PortalRun (portal=0x83c0064, count=2147483647, dest=0x83b8290, altdest=0x83b8290, completionTag=0xbfedffa0 "") at pquery.c:1076
#8  0x081e4060 in exec_simple_query (query_string=0x83b7bbc "insert into test (a) values (3::int3);") at postgres.c:1004
#9  0x081e6074 in PostgresMain (argc=4, argv=0x836fab4, username=0x836fa8c "mark") at postgres.c:3219
#10 0x081b89b3 in ServerLoop () at postmaster.c:2854
#11 0x081ba21b in PostmasterMain (argc=1, argv=0x836d9f8) at postmaster.c:941
#12 0x081764a8 in main (argc=1, argv=0x836d9f8) at main.c:265
Mark Dilger <pgsql@markdilger.com> writes: > Tom Lane wrote: >> Please provide a stack trace --- AFAIK there shouldn't be any reason why >> a pass-by-ref 3-byte type wouldn't work. > (gdb) bt > #0 0xb7e01d45 in memcpy () from /lib/libc.so.6 > #1 0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7, > values=0x83c2e84, isnull=0x83c2e98 "", data=0x83c2ef4 "", > infomask=0x83c2ef0, bit=0x0) > at heaptuple.c:181 Hm, are you sure you provided a valid pointer (not the integer value itself) as the Datum output from int3_in? (Looks at patch ... ) Um, I think you didn't, although that coding is far too cute to be actually readable ... regards, tom lane
Tom Lane wrote: > Mark Dilger <pgsql@markdilger.com> writes: >> Tom Lane wrote: >>> Please provide a stack trace --- AFAIK there shouldn't be any reason why >>> a pass-by-ref 3-byte type wouldn't work. > >> (gdb) bt >> #0 0xb7e01d45 in memcpy () from /lib/libc.so.6 >> #1 0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7, >> values=0x83c2e84, isnull=0x83c2e98 "", data=0x83c2ef4 "", >> infomask=0x83c2ef0, bit=0x0) >> at heaptuple.c:181 > > Hm, are you sure you provided a valid pointer (not the integer value > itself) as the Datum output from int3_in? > > (Looks at patch ... ) Um, I think you didn't, although that coding > is far too cute to be actually readable ... > > regards, tom lane I tracked this down to my implementation of Int24GetDatum. I've got that fixed now and have hit another bug, but I'm still working on it so I won't bother you about that yet. As for the patch, I will eventually submit a version without the "cute" code autogeneration stuff. mark
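For readers following along, the bug Tom spotted is the classic one of stuffing the integer value itself into the Datum instead of a pointer to palloc'd storage. One plausible shape for the fix is sketched below; int3 and Int24GetDatum are Mark's experimental names rather than PostgreSQL APIs, the byte order chosen is an assumption, and the actual patch may well differ. palloc and PointerGetDatum, by contrast, are the real server primitives for this pattern.

/*
 * Sketch of a pass-by-reference conversion for a hypothetical
 * 3-byte integer type.
 */
#include "postgres.h"

typedef struct
{
    unsigned char bytes[3];     /* low 24 bits, little-endian here */
} int3;

static Datum
Int24GetDatum(int32 value)
{
    int3 *result = (int3 *) palloc(sizeof(int3));

    result->bytes[0] = value & 0xFF;
    result->bytes[1] = (value >> 8) & 0xFF;
    result->bytes[2] = (value >> 16) & 0xFF;

    /* pass-by-reference: the Datum must carry the pointer,
     * not the integer bits themselves */
    return PointerGetDatum(result);
}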
Hi, Jim, Jim Nasby wrote: > I'd love to have the ability to control toasting thresholds manually. > This could result in a lot of speed improvements in cases where a > varlena field isn't frequently accessed and will be fairly large, yet > not large enough to normally trigger toasting. An address field would be > a good example. Being able to force a field to be toasted before it > normally would could drastically improve tuple density without requiring > the developer to use a 'side table' to store the data. Sounds good. But I remember that the query planner underestimated the cost of sequential scans when lots of TOAST data was in the table. IIRC, the specific case (that was discussed on pgperform) was about 70 PostGIS geometries, amounting to about 35MB of TOAST data and only 2 or 3 pages in the actual table. The query planner used a sequential scan instead of a GIST index scan (&& operator), leading to deTOASTing and processing all 35 MB of geometries, instead of just those 2 small ones that matched the index condition. So I think before we start toasting more, we should check whether the query planner could be affected negatively. It should have statistics about TOAST data, and then see whether it'd need to detoast for condition checking and for actual data fetching. Thanks, Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in Europe! www.ffii.org www.nosoftwarepatents.org
Gregory Stark wrote: > > Alvaro Herrera <alvherre@commandprompt.com> writes: > > > Gregory Stark wrote: > >> > >> Well "char" doesn't have quite the same semantics as CHAR(1). If that's the > >> consensus though then I can work on either fixing "char" semantics to match > >> CHAR(1) or adding a separate type instead. > > > > What semantics? > > The main bit that comes to mind is 32::CHAR(1) gives you '3' but 32::"char" > gives you ' '. > > Really it makes more sense if you think of "char" as a 1 byte integer type > with some extra text casts and operators to make C programmers happy, not a 1 > byte character type. One very nifty trick would be to fix "char" to act as CHAR(), and map CHAR(1) automatically to "char". -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
My apologies if you are seeing this twice. I posted it last night, but it still does not appear to have made it to the group. Mark Dilger wrote: > Tom Lane wrote: >> Mark Dilger <pgsql@markdilger.com> writes: >>> Tom Lane wrote: >>>> Please provide a stack trace --- AFAIK there shouldn't be any reason >>>> why >>>> a pass-by-ref 3-byte type wouldn't work. >> >>> (gdb) bt >>> #0 0xb7e01d45 in memcpy () from /lib/libc.so.6 >>> #1 0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7, >>> values=0x83c2e84, isnull=0x83c2e98 "", data=0x83c2ef4 "", >>> infomask=0x83c2ef0, bit=0x0) >>> at heaptuple.c:181 >> >> Hm, are you sure you provided a valid pointer (not the integer value >> itself) as the Datum output from int3_in? >> >> (Looks at patch ... ) Um, I think you didn't, although that coding >> is far too cute to be actually readable ... >> >> regards, tom lane > > Ok, I have it working on my intel architecture machine. Here are some > of my findings. Disk usage is calculated by running 'du -b' in > /usr/local/pgsql/data before and after loading the table, and taking the > difference. That directory is deleted, recreated, and initdb rerun > between each test. The host system is a dual processor, dual core 2.4 > GHz system, 2 GB DDR400 memory, 10,000 RPM SCSI ultra160 hard drive with > the default postgresql.conf file as created by initdb. The code is the > stock postgresql-8.1.4 release tarball compiled with gcc and configured > without debug or cassert options enabled. > > > INT3 VS INT4 > ------------ > Using a table of 8 integers per row and 16777216 rows, I can drop the > disk usage from 1.2 GB down to 1.0 GB by defining those integers as int3 > rather than int4. (It works out to about 70.5 bytes per row vs. 62.5 > bytes per row.) However, the load time actually increases, probably due > to CPU/memory usage. The time increased from 197 seconds to 213 > seconds. Note that int3 is defined pass-by-reference due to a > limitation in the code that prevents pass-by-value for any datasize > other than 1, 2, or 4 bytes. > > Using a table of only one integer per row, the table size is exactly the > same (down to the byte) whether I use int3 or int4. I suspect this is > due to data alignment for the row being on at least a 4 byte boundary. > > Creating an index on a single column of the 8-integer-per-row table, the > index size is exactly the same whether the integers are int3 or int4. > Once again, I suspect that data alignment is eliminating the space savings. > > I haven't tested this, but I suspect that if the column following an > int3 is aligned on 4 or 8 byte boundaries, that the int3 column will > have an extra byte padded and hence will have no performance gain. > > > INT1 VS INT2 > ------------ > Once again using a table of 8 integers per row and 16777216 rows, I can > drop the disk usage from 909 MB down to 774 MB by defining those > integers as int1 rather than int2. (54 bytes per row vs 46 bytes per > row.) The load time also drops, from 179 seconds to 159 seconds. Note > that int1 is defined pass-by-value. > > > mark
Bruce Momjian wrote: > Gregory Stark wrote: > > > > Alvaro Herrera <alvherre@commandprompt.com> writes: > > > > > Gregory Stark wrote: > > >> > > >> Well "char" doesn't have quite the same semantics as CHAR(1). If that's the > > >> consensus though then I can work on either fixing "char" semantics to match > > >> CHAR(1) or adding a separate type instead. > > > > > > What semantics? > > > > The main bit that comes to mind is 32::CHAR(1) gives you '3' but 32::"char" > > gives you ' '. > > > > Really it makes more sense if you think of "char" as a 1 byte integer type > > with some extra text casts and operators to make C programmers happy, not a 1 > > byte character type. > > One very nifty trick would be to fix "char" to act as CHAR(), and map > CHAR(1) automatically to "char". Sorry, probably a stupid idea considering multi-byte encodings. I suppose it could be an optimization for single-byte encodings, but that seems very limiting. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: >> One very nifty trick would be to fix "char" to act as CHAR(), and map >> CHAR(1) automatically to "char". > > Sorry, probably a stupid idea considering multi-byte encodings. I > suppose it could be an optimization for single-byte encodings, but that > seems very limiting. No, there are lots of single-byte encoding databases. And one day we'll have per-column encoding anyways and there are lots of databases that have columns that want to be one-character ascii encoded fields. It's limited but I wouldn't say it's very limiting. In the cases where it doesn't apply there's no way out anyways. A UTF8 field will need a length header in some form. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Thu, Sep 14, 2006 at 10:21:30PM +0100, Gregory Stark wrote: > >> One very nifty trick would be to fix "char" to act as CHAR(), and map > >> CHAR(1) automatically to "char". > > Sorry, probably a stupid idea considering multi-byte encodings. I > > suppose it could be an optimization for single-byte encodings, but that > > seems very limiting. > No, there are lots of single-byte encoding databases. And one day we'll have > per-column encoding anyways and there are lots of databases that have columns > that want to be one-character ascii encoded fields. > > It's limited but I wouldn't say it's very limiting. In the cases where it > doesn't apply there's no way out anyways. A UTF8 field will need a length > header in some form. Declaring a column as ASCII should allow for char(8) to mean the same as byte(8) with text semantics. byte(8) shouldn't require a length header. :-) Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com | Neighbourhood Coder | Ottawa, Ontario, Canada | One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/
Gregory Stark wrote: > It's limited but I wouldn't say it's very limiting. In the cases where it > doesn't apply there's no way out anyways. A UTF8 field will need a length > header in some form. Actually, you can determine the length of a UTF-8 encoded character by looking at the most significant bits of the first byte. So we could store a UTF-8 encoded CHAR(1) field without any additional length header. See http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8 for the bit patterns. AFAIK, UTF-16 works similarly. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
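Heikki's observation in code form, restricted to the four-byte forms of RFC 3629; a sketch only, not a patch against any PostgreSQL file.

/* Length of a UTF-8 sequence, determined from its first byte alone. */
#include <stdint.h>

static int
utf8_char_len(uint8_t lead)
{
    if ((lead & 0x80) == 0x00) return 1;   /* 0xxxxxxx                */
    if ((lead & 0xE0) == 0xC0) return 2;   /* 110xxxxx                */
    if ((lead & 0xF0) == 0xE0) return 3;   /* 1110xxxx                */
    if ((lead & 0xF8) == 0xF0) return 4;   /* 11110xxx                */
    return -1;                             /* continuation or invalid */
}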
On Fri, Sep 15, 2006 at 10:01:19AM +0100, Heikki Linnakangas wrote: > Gregory Stark wrote: > >It's limited but I wouldn't say it's very limiting. In the cases where it > >doesn't apply there's no way out anyways. A UTF8 field will need a length > >header in some form. > > Actually, you can determine the length of a UTF-8 encoded character by > looking at the most significant bits of the first byte. So we could > store a UTF-8 encoded CHAR(1) field without any additional length header. Except in postgres the length of a datum is currently only determined from the type, or from a standard varlena header. Going down the road of having to call type specific length functions for the values in columns 1 to n-1 just to read column n seems like a really bad idea. We want to make access to later columns *faster* not slower, which means keeping to the simplest (code-wise) scheme possible. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote: > On Fri, Sep 15, 2006 at 10:01:19AM +0100, Heikki Linnakangas wrote: >> Actually, you can determine the length of a UTF-8 encoded character by >> looking at the most significant bits of the first byte. So we could >> store a UTF-8 encoded CHAR(1) field without any additional length header. > > Except in postgres the length of a datum is currently only determined > from the type, or from a standard varlena header. Going down the road > of having to call type specific length functions for the values in > columns 1 to n-1 just to read column n seems like a really bad idea. > > We want to make access to later columns *faster* not slower, which > means keeping to the simplest (code-wise) scheme possible. We really have two goals. We want to reduce on-disk storage size to save I/O, and we want to keep processing simple to save CPU. Some ideas help one goal but hurt the other so we have to strike a balance between the two. My gut feeling is that it wouldn't be that bad compared to what we have now or the new proposed varlena scheme, but before someone actually tries it and shows some numbers, this is just hand-waving. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Sep 15, 2006 at 11:43:52AM +0100, Heikki Linnakangas wrote: > My gut feeling is that it wouldn't be that bad compared to what we have > now or the new proposed varlena scheme, but before someone actually > tries it and shows some numbers, this is just hand-waving. Well, that depends on whether you're going to make a special typlen value for *just* UTF-8, which would probably cost about the same. Or allow any type to have its own Datum length function, which would be very expensive. Calling user-defined functions is not cheap. I don't think making a special typlen value just for a type that can store a single UTF-8 character is smart. I just can't see enough use to make it worth it. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote: > I don't think making a special typlen value just for a type that can > store a single UTF-8 character is smart. I just can't see enough use to > make it worth it. > Assuming that we can set encoding per-column one day, I agree. If you have a CHAR(1) field, you're going to store codes like 'A', 'B', 'C' or '1', '2', '3' in it, and you don't need UTF-8 for that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
What about the "char" type? Isn't it designed for that? Or will this type disappear in future releases? -----Ursprüngliche Nachricht----- Von: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] Im Auftrag von Heikki Linnakangas Gesendet: Freitag, 15. September 2006 13:35 An: Martijn van Oosterhout Cc: stark@enterprisedb.com; pgsql-hackers@postgresql.org Betreff: Re: [HACKERS] Fixed length data types issue Martijn van Oosterhout wrote: > I don't think making a special typlen value just for a type that can > store a single UTF-8 character is smart. I just can't see enough use > to make it worth it. > Assuming that we can set encoding per-column one day, I agree. If you have a CHAR(1) field, you're going to store codes like'A', 'B', 'C' or '1', '2', '3' in it, and you don't need UTF-8 for that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend
On Fri, Sep 15, 2006 at 01:38:54PM +0200, Mario Weilguni wrote: > What about the "char" type? Isn't it designed for that? Or will this type disappear in future releases? "char" is used in the system catalogs; I don't think it's going to go any time soon. There it's used as a (surprise) single byte indicator, with different letters meaning different things. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > I don't think making a special typlen value just for a type that can > store a single UTF-8 character is smart. I just can't see enough use to > make it worth it. Well there are lots of data types that can probably tell how long they are based on internal state. And they can often store that state much more compactly because they know more about the possible values. Consider for example a network data type that can store either ipv4 or ipv6 addresses -- it only needs a single bit to indicate the length. While I agree that having to invoke data type specific functions just to do a heap_deform_tuple would probably be far outside the bounds of possibility I think it's still an interesting direction to ponder. Sometimes you reach entirely practical ideas indirectly by brainstorming about outrageous ideas. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
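As a concrete version of Gregory's example, a network-address payload could recover its own stored size from a single leading flag bit, along the lines of the sketch below. This is purely illustrative and is not the actual inet/cidr on-disk format; all names are made up.

/* Illustrative only: a self-describing address payload whose size
 * follows from one flag bit (0 = IPv4, 1 = IPv6). */
#include <stdint.h>
#include <stddef.h>

static size_t
packed_addr_size(const uint8_t *datum)
{
    int is_ipv6 = (datum[0] & 0x80) != 0;

    /* one flag byte followed by either a 4- or 16-byte address */
    return 1 + (is_ipv6 ? 16 : 4);
}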
On Fri, Sep 08, 2006 at 15:08:18 -0400, Andrew Dunstan <andrew@dunslane.net> wrote: > > From time to time the idea of a logical vs physical mapping for columns > has been mentioned. Among other benefits, that might allow us to do some > rearrangement of physical ordering to reduce space wasted on alignment > in some cases. There might be a small addition on computation required, > but I suspect it would be lost in the noise, and swamped by any > increased efficiency we got from putting more tuples in a page. I believe another counter-argument raised is that this would be a source of a lot of bugs.
On Mon, Sep 11, 2006 at 19:05:12 -0400, Gregory Stark <gsstark@mit.edu> wrote: > > I'm not sure how gmp and the others represent their data but my first guess is > that there's no particular reason the base of the mantissa and exponent have > to be the same as the base the exponent is interpreted as. That is, you can > store a base 10 exponent but store it and the mantissa in two's complement > integers. You can also store numbers as a relatively prime numerator and denominator, which will let you store rational numbers exactly. Doing this isn't going to help with speed of operations though.