Thread: Maximum number of columns in a table
The FAQs state "Maximum number of columns in a table? 250-1600 depending on column types".
Is there a location which gives instructions for calculating whether a number of columns of different types can be accommodated?
I have a situation where I need to load tables from another database which may theoretically contain up to 1000 columns per table and this information will determine whether I need to rebuild PostgreSQL with a larger blocksize. Splitting into multiple tables is not always an easy option.
Many thanks
Gerry
That's a lot of columns.
You might want to normalise your data before settling on a fixed DB schema to feed:
http://www.datamodel.org/NormalizationRules.html
1)
Are there any repeating patterns, such as repeating groups or redundant data? e.g.
OrderLineItem1 ... OrderLineItem2
Splitting a group of related data points such as OrderLineItems into a new table would allow you to create a unique key such as OrderLineItemID (a foreign key in your original table) that uniquely identifies and accesses only the related data points in the new table (see the sketch at the end of this message).
2)
Are there any data columns that do not really belong in the data?
I once worked with patient data where someone had put the patient's phone number in Diagnosis.
(A patient's diagnosis has nothing to do with their phone, unless the bill is off the charts.)
Diagnosis belongs with the patient's DRG code and the diagnosing physician's name;
the patient's phone number is more closely associated with the Contact record for that patient.
Up-front schema design is a vital and necessary activity for ensuring you know how your data can be aggregated into functional groups before
1) any database/table creation (or creation of any other DB entity), and
2) any data feeds from external sources, or form population of your new DB entities,
take place.
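To sketch point 1 in SQL (all table and column names here are invented for illustration):

-- Unnormalised: a repeating group of line-item columns caps how many
-- items an order can carry and widens every row.
CREATE TABLE orders_flat (
    order_id         integer PRIMARY KEY,
    customer_name    text,
    order_line_item1 text,
    order_line_item2 text,
    order_line_item3 text
);

-- Normalised: the repeating group moves to its own table, keyed back to
-- the order, so an order can have any number of items.
CREATE TABLE orders (
    order_id      integer PRIMARY KEY,
    customer_name text
);

CREATE TABLE order_line_items (
    order_line_item_id serial PRIMARY KEY,
    order_id           integer NOT NULL REFERENCES orders(order_id),
    item               text
);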
HTH
Martin
This is a generic problem - I have no control over the data and need to solve the problem by programming without analyzing each application. However, I do have control over the way the data is stored and can use user-defined types.
I guess I could partially restate the problem as: How many varlena columns can be supported in a table with a standard 8K blocksize?
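For concreteness, a toy version of the shape being asked about might look like this (names invented; every column here is a varlena type, i.e. variable-length and TOAST-able):

CREATE TABLE wide_varlena (
    c1 text,        -- varlena
    c2 varchar(50), -- still varlena, despite the declared limit
    c3 numeric,     -- numeric is varlena too
    c4 bytea        -- as is bytea
    -- the real case would have hundreds more columns like these
);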
Gerry
On Sun, Apr 12, 2009 at 05:33:35AM +1000, Gerry Scales wrote:
> The FAQs state "Maximum number of columns in a table? 250-1600
> depending on column types".
>
> Is there a location which gives instructions for calculating whether a
> number of columns of different types can be accommodated?
>
> I have a situation where I need to load tables from another database
> which may theoretically contain up to 1000 columns per table and this
> information will determine whether I need to rebuild PostgreSQL with
> a larger blocksize. Splitting into multiple tables is not always an
> easy option.

I think the only definitive advice I can give would be to try it and
see how it works. That said, the pg_column_size function may be of use,
and I'd have a look at how variable-length fields (i.e. CHAR(n),
VARCHAR(n), TEXT, NUMERIC) and TOAST are handled. Fixed-size fields,
such as integers, are going to be fine for you and you should be able to
fit more than a thousand integers into a row, but text values will
depend on the size of the text you're inserting. I think that either
short strings (i.e. less than 8 characters on average) or large bits of
text, because they're getting TOASTed, should be OK. Also note that NULL
values get compressed into a bitmap at the beginning of the tuple and so
don't take up much space.

Hope that helps!

--
Sam  http://samason.me.uk/
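One minimal way to "try it and see", as a sketch (table name invented; note that DO, string_agg and format() only exist in releases newer than the 8.3-era versions under discussion):

DO $$
DECLARE
    cols text;
BEGIN
    -- build "c1 text, c2 text, ..., c1000 text"
    SELECT string_agg(format('c%s text', i), ', ' ORDER BY i)
      INTO cols
      FROM generate_series(1, 1000) AS i;
    EXECUTE format('CREATE TABLE wide_test (%s)', cols);
END
$$;

-- A mostly-NULL row stays small, since NULLs live in the null bitmap.
INSERT INTO wide_test (c1) VALUES ('hello');
SELECT pg_column_size(t.*) AS whole_row_bytes,
       pg_column_size(t.c1) AS c1_bytes
FROM wide_test AS t;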
Sam Mason <sam@samason.me.uk> writes:
> On Sun, Apr 12, 2009 at 05:33:35AM +1000, Gerry Scales wrote:
>> I have a situation where I need to load tables from another database
>> which may theoretically contain up to 1000 columns per table and this
>> information will determine whether I need to rebuild PostgreSQL with
>> a larger blocksize. Splitting into multiple tables is not always an
>> easy option.

> ... Fixed-size fields,
> such as integers, are going to be fine for you and you should be able
> to fit more than a thousand integers into a row, but text values will
> depend on the size of the text you're inserting. I think that either
> short strings (i.e. less than 8 characters on average) or large bits of
> text, because they're getting TOASTed, should be OK.

In principle, every varchar column could get toasted into an 18-byte
(more in pre-8.3 versions) TOAST pointer, meaning you could get to
somewhere around 450 columns in 8K regardless of how wide they are.
In practice the performance would probably suck too much to be useful;
you don't want to be toasting key columns, in particular.

So really the question is uselessly vague as posed. We'd need to know a
lot more about the columns' average widths and usage patterns before we
could say much about how well it's likely to work. I rather suspect
that most columns in a thousand-column table will be null in any
particular row, in which case it would likely work all right. We used
to have some issues with O(N^2) performance on lots of columns, but I
think those are largely gone in recent releases.

regards, tom lane
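As a back-of-envelope check on the "around 450" figure (this deliberately ignores page, tuple-header and null-bitmap overhead, which is what pulls the raw quotient down toward 450):

SELECT 8192 / 18 AS toast_pointers_per_8k_page;  -- 455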
Thanks Sam, that's excellent advice. If a developer doesn't jump in
with a definitive formula, that is the path I will take.

Regards
Gerry
On Sat, Apr 11, 2009 at 05:43:06PM -0400, Tom Lane wrote:
> In principle, every varchar column could get toasted into an 18-byte
> (more in pre-8.3 versions) TOAST pointer, meaning you could get to
> somewhere around 450 columns in 8K regardless of how wide they are.

Wow, never realised it was that much before. Where is all that?
varattrib_1b_e = 3 bytes, but will overlap one byte with
varatt_external = 16 bytes. Is that right?

> In practice the performance would probably suck too much to be useful;
> you don't want to be toasting key columns, in particular.

Good point; if you're referring to any reasonable number of these
columns in each query it's going to be somewhat grim.

> So really the question is uselessly vague as posed. We'd need to know a
> lot more about the columns' average widths and usage patterns before we
> could say much about how well it's likely to work.

I guess it's things like "lots of NULLs = good" that the OP was trying
to find out.

Just out of interest, what would happen if you had:

CREATE TABLE wider (
    col0001 TEXT,
    col0002 TEXT,
    -- [ .... ]
    col9998 TEXT,
    col9999 TEXT
);

CREATE TABLE store (
    pk1 INTEGER,
    pk2 TEXT,
    PRIMARY KEY (pk1, pk2),
    data wider
);

Would the "data" tend to end up toasted, or would PG try and expand the
data inline and fail some of the time?

Also, if I ran the following query:

SELECT pk1, pk2, (data).col0001, (data).col0101 FROM store;

would "data" get detoasted once per row, or per column referenced?

--
Sam  http://samason.me.uk/
Sam Mason <sam@samason.me.uk> writes:
> Just out of interest, what would happen if you had:
> [the wider/store definitions above]
> Would the "data" tend to end up toasted, or would PG try and expand the
> data inline and fail some of the time?

The toast code doesn't recurse into composite values. It would see the
"data" column as one single value, so most of the time data would get
toasted and pushed out as a unit. You probably don't want to adopt the
above design. (Also, you'd still be subject to the 1600 column limit
on the number of fields within "data", because that comes from a tuple
header field width limit that has nothing to do with total tuple size.)

> Also, if I ran the following query:
> SELECT pk1, pk2, (data).col0001, (data).col0101 FROM store;
> would "data" get detoasted once per row, or per column referenced?

Probably the latter. I did some work a few months ago trying to make
the former happen, but it crashed and burned for reasons I don't recall
at the moment.

regards, tom lane
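A small runnable illustration of the "one single value" point (type and table names invented):

CREATE TYPE pair AS (a text, b text);

CREATE TABLE holder (
    id   integer PRIMARY KEY,
    data pair
);

INSERT INTO holder
VALUES (1, ROW(repeat('x', 10000), repeat('y', 10000))::pair);

-- pg_column_size reports the stored size of the single (possibly
-- compressed or toasted) composite datum; its fields are not handled
-- individually by the toast code.
SELECT pg_column_size(data) AS stored_bytes FROM holder;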
Thank you gentlemen. I believe I am now enlightened.

1. The maximum number of columns per table is not limited by any
algorithm at CREATE/ALTER TABLE time, other than the enforcement of an
absolute maximum of 1600 columns.

2. The ability to store any individual row depends on its content.

I hasten to add that I am simply catering for a boundary condition which
would (surely) never occur, but I have to be defensive here. My aim is
simply to cover this without unduly impacting the normal case.

I have been researching this for some time and I am truly grateful for
your expert and thoughtful assistance.

Gerry
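Conclusion 1 is easy to check empirically; a sketch like the following (same caveat as above about needing a release with DO and string_agg) is expected to fail outright at CREATE TABLE time, before any row content enters the picture, whereas conclusion 2 only bites row by row at insert time:

DO $$
DECLARE
    ddl text;
BEGIN
    SELECT 'CREATE TABLE over_limit ('
           || string_agg('c' || i || ' int', ', ' ORDER BY i)
           || ')'
      INTO ddl
      FROM generate_series(1, 1601) AS i;
    EXECUTE ddl;  -- rejected: tables can have at most 1600 columns
END
$$;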
*Guess I'm not used to seeing such thoughtful responses!*
Please keep us all apprised of how your implementation progresses!
Martin