Thread: how to add more than 1600 columns in a table?
Hello
Sadly, today we hit the 1600-column limit of PostgreSQL 11.
How could we add more columns?
Note: The tables are fine. We really do have 2400 columns now. Each column represents a value in a matrix.
We have millions of rows, so I would prefer not to transpose each row to (x, y, column_value) triplets because it will make all our code more difficult than it already is.
Any help is very appreciated.
Pablo
pabloa98 wrote on 24.04.2019 at 22:17:
> Sadly, today we hit the 1600-column limit of PostgreSQL 11.
>
> How could we add more columns?
>
> Note: The tables are fine. We really do have 2400 columns now. Each
> column represents a value in a matrix.
>
> We have millions of rows, so I would prefer not to transpose each row
> to (x, y, column_value) triplets because it will make all our code
> more difficult than it already is.

What about using an array?
On 4/24/19 1:17 PM, pabloa98 wrote:
> Hello
>
> Sadly, today we hit the 1600-column limit of PostgreSQL 11.
>
> How could we add more columns?
>
> Note: The tables are fine. We really do have 2400 columns now. Each
> column represents a value in a matrix.

Not sure how hitting the 1600-column limit = having 2400 columns?

Offhand I would say you are using the wrong tool for the job. More
information on what you are trying to achieve might help get you to the
correct tool.

> We have millions of rows, so I would prefer not to transpose each row
> to (x, y, column_value) triplets because it will make all our code
> more difficult than it already is.
>
> Any help is very appreciated.
>
> Pablo

--
Adrian Klaver
adrian.klaver@aklaver.com
On 4/24/19 3:17 PM, pabloa98 wrote:
> Hello
>
> Sadly, today we hit the 1600-column limit of PostgreSQL 11.
>
> How could we add more columns?
>
> Note: The tables are fine. We really do have 2400 columns now. Each
> column represents a value in a matrix.
>
> We have millions of rows, so I would prefer not to transpose each row
> to (x, y, column_value) triplets because it will make all our code
> more difficult than it already is.
>
> Any help is very appreciated.

Is a relational database the proper tool for the job?

--
Angular momentum makes the world go 'round.
On 2019-Apr-24, pabloa98 wrote:

> How could we add more columns?

Sorry.

> Note: The tables are fine. We really do have 2400 columns now. Each
> column represents a value in a matrix.

Maybe you could use arrays?

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 4/24/19 3:22 PM, Adrian Klaver wrote:
> On 4/24/19 1:17 PM, pabloa98 wrote:
>> Hello
>>
>> Sadly, today we hit the 1600-column limit of PostgreSQL 11.
>>
>> How could we add more columns?
>>
>> Note: The tables are fine. We really do have 2400 columns now. Each
>> column represents a value in a matrix.
>
> Not sure how hitting the 1600-column limit = having 2400 columns?

I think he means that the design has 2400 columns, and Pg is failing to
implement it.

--
Angular momentum makes the world go 'round.
pabloa98 <pabloa98@gmail.com> writes:
> Sadly, today we hit the 1600-column limit of PostgreSQL 11.
> How could we add more columns?

You can't, at least not without some pretty fundamental changes; that
value is limited by field sizes within tuple headers. You'd also, more
than likely, find yourself hitting problems with the
tuples-can't-cross-page-boundary restrictions. (Our TOAST mechanism
avoids that problem for large individual fields, but not for many small
fields.)

It seems pretty unlikely to me that any sane table design actually has
thousands of truly independent columns. Consider using arrays, or
perhaps composite sub-structures (JSON maybe?).

			regards, tom lane
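For concreteness, a minimal sketch of the composite-sub-structure idea
Tom mentions, using JSONB (the table and key names here are illustrative,
not from the thread):

CREATE TABLE matrix_docs (
    id  bigint PRIMARY KEY,
    doc jsonb NOT NULL      -- {"reading_1": 0.5, "reading_2": 1.7, ...}
);

-- Pull one named value out of the document:
SELECT (doc ->> 'reading_423')::double precision
FROM matrix_docs
WHERE id = 1;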
On Wed, Apr 24, 2019 at 1:17 PM pabloa98 <pabloa98@gmail.com> wrote:
> Hello
>
> Sadly, today we hit the 1600-column limit of PostgreSQL 11.
>
> How could we add more columns?
Add a second table and relate them together via a one-to-one-required relationship?
I'm unsure whether a final result set is limited or just tables...
David J.
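A minimal sketch of that one-to-one split (all names hypothetical): each
physical table stays under the limit, and a join on the shared key
reassembles the logical row:

CREATE TABLE matrix_part1 (
    id    bigint PRIMARY KEY,
    c0001 double precision,
    -- ... roughly the first half of the columns ...
    c1200 double precision
);

CREATE TABLE matrix_part2 (
    id    bigint PRIMARY KEY REFERENCES matrix_part1 (id),
    c1201 double precision,
    -- ... the remaining columns ...
    c2400 double precision
);

SELECT p1.c0042, p2.c1300
FROM matrix_part1 p1
JOIN matrix_part2 p2 USING (id)
WHERE p1.c0001 > 0;

Note David's caveat: a result set formed during execution is itself
limited (to 1664 columns in stock builds), so queries would still need
to select subsets rather than all 2400 columns at once.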
On 4/24/19 4:17 PM, pabloa98 wrote:
> Sadly, today we hit the 1600-column limit of PostgreSQL 11.
>
> How could we add more columns?
>
> Note: The tables are fine. We really do have 2400 columns now. Each
> column represents a value in a matrix.

As everyone else has mentioned, your use case sounds like arrays are
appropriate. Note that PostgreSQL supports two-dimensional (actually
more than 2-D if you wanted) arrays, which are essentially perfect for
representing a matrix. If this works for your data model, it will
likely save a ton of storage space and perform much better than the
alternatives.

FWIW, if you need to do matrix math, you might want to look into PL/R
(https://github.com/postgres-plr/plr), as it supports 2-D arrays as
arguments, which are converted directly into R matrices. I don't know
for sure, but PL/Python could likely be used to process matrices as
well.

HTH,

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
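A minimal sketch of the 2-D array shape Joe describes (names
illustrative):

CREATE TABLE matrices (
    id bigint PRIMARY KEY,
    m  double precision[][]    -- one whole matrix per row
);

INSERT INTO matrices VALUES (1, '{{1.0, 2.0}, {3.0, 4.0}}');

-- Array subscripts are 1-based: row 2, column 1 -> 3.0
SELECT m[2][1] FROM matrices WHERE id = 1;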
Tom,
We are currently using 32 kB pages. That was working fine with our
1200-column tables. The new table design has around 2400 columns, and it
does not fit in our modified PostgreSQL 11.

We are ready (meaning, as ready as we can be with Java and Python
backgrounds but not much C) to modify the database source code if it is
possible. We tried that a few months ago and we did not get too far. :)

I believe we replaced an 8-bit int with a 32-byte int (perhaps in
t_heap? I am not sure), but it looked like we missed something, because
it did not work. Several tests failed and some of them page-faulted the
server.

I think we had to modify other data structures and some code in places
like the JIT to make it work.
A word about the database design: these tables are vectors (or matrices)
of numbers, booleans, and other vectors. They are normalized, and all
the columns depend on their PK column.

We used tables because we have two types of queries on these tables:

    SELECT * FROM table_with_lots_of_columns
    WHERE <condition involving a lot of columns>;

These queries read a lot of rows.

or

    SELECT columnX FROM table_with_lots_of_columns
    WHERE <condition involving a lot of columns>;

These queries read very few rows.

UPDATEs are at the row level and usually involve all the columns.
We like having ACID semantics on these tables. So we chose PostgreSQL,
because we get transactions, joins with other (more conventional)
tables, and all that (compared with MongoDB it looks far better).

We considered several alternatives:

* Storing rows as JSONB: we would have to add a parsing stage to our
code that would generate a lot of temporary objects. Besides, it would
need a LOT more space.

* Storing rows as ARRAY[]: some columns are arrays themselves, and
others are booleans (which could be converted to numbers). The worst
part is that we would have to reference a "column" using something like
row_array[423] and trust that it has the right data type. It would make
our code far more complicated.

* Storing the info in vertical columns using arrays: the same problem
as before, but adding much more complexity to the code.

* Storing it outside as text files: no more ACID semantics. :/

Having a plain table design lets us write very clear code.

I think that PostgreSQL supporting wide tables would give the database a
definite advantage over other technologies used in ML and scientific
programming (like MongoDB and Cassandra).
Pablo
On Wed, Apr 24, 2019 at 1:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> pabloa98 <pabloa98@gmail.com> writes:
>> Sadly, today we hit the 1600-column limit of PostgreSQL 11.
>> How could we add more columns?
>
> You can't, at least not without some pretty fundamental changes; that
> value is limited by field sizes within tuple headers. You'd also, more
> than likely, find yourself hitting problems with the
> tuples-can't-cross-page-boundary restrictions. (Our TOAST mechanism
> avoids that problem for large individual fields, but not for many small
> fields.)
>
> It seems pretty unlikely to me that any sane table design actually has
> thousands of truly independent columns. Consider using arrays, or
> perhaps composite sub-structures (JSON maybe?).
>
> 			regards, tom lane
Thank you, Joe! I will take a look.
Pablo
On Wed, Apr 24, 2019 at 1:47 PM Joe Conway <mail@joeconway.com> wrote:
> On 4/24/19 4:17 PM, pabloa98 wrote:
>> Sadly, today we hit the 1600-column limit of PostgreSQL 11.
>>
>> How could we add more columns?
>>
>> Note: The tables are fine. We really do have 2400 columns now. Each
>> column represents a value in a matrix.
>
> As everyone else has mentioned, your use case sounds like arrays are
> appropriate. Note that PostgreSQL supports two-dimensional (actually
> more than 2-D if you wanted) arrays, which are essentially perfect for
> representing a matrix.
>
> If this works for your data model, it will likely save a ton of storage
> space and perform much better than the alternatives.
>
> FWIW, if you need to do matrix math, you might want to look into PL/R
> (https://github.com/postgres-plr/plr), as it supports 2-D arrays as
> arguments, which are converted directly into R matrices.
>
> I don't know for sure, but PL/Python could likely be used to process
> matrices as well.
>
> HTH,
>
> Joe
>
> --
> Crunchy Data - http://crunchydata.com
> PostgreSQL Support for Secure Enterprises
> Consulting, Training, & Open Source Development
Arrays could work, but they would make our code less clear. It is better
to read a meaningful column name than a number. We could use constants,
but then we would have to maintain them...
Pablo
On Wed, Apr 24, 2019 at 1:24 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2019-Apr-24, pabloa98 wrote:
>> How could we add more columns?
>
> Sorry.
>
>> Note: The tables are fine. We really do have 2400 columns now. Each
>> column represents a value in a matrix.
>
> Maybe you could use arrays?
>
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
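One middle ground between arrays and named columns, sketched here with
hypothetical names: keep the storage as an array, but expose meaningful
names through views, so application code never sees raw indexes.
(Caveat: a view is also a relation and is subject to the same column
limit, so 2400 names would have to be spread over several narrower,
per-query views.)

CREATE TABLE matrix_rows (
    id   bigint PRIMARY KEY,
    vals double precision[]
);

-- The index-to-name mapping lives in exactly one place:
CREATE VIEW matrix_readings AS
SELECT id,
       vals[1]   AS temperature,
       vals[2]   AS pressure,
       vals[423] AS flow_rate
FROM matrix_rows;

SELECT temperature FROM matrix_readings WHERE id = 1;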
If there were some NoSQL database supporting ACID plus generic queries
like a SQL database, I would consider it.

However, the column limit is the problem, not the database. 1200 columns
are reasonable for standard problems. A PostgreSQL that supports
thousands of columns would be useful in Machine Learning and similar
domains.
Pablo
On Wed, Apr 24, 2019 at 1:23 PM Ron <ronljohnsonjr@gmail.com> wrote:
> On 4/24/19 3:17 PM, pabloa98 wrote:
>> Hello
>>
>> Sadly, today we hit the 1600-column limit of PostgreSQL 11.
>>
>> How could we add more columns?
>>
>> Note: The tables are fine. We really do have 2400 columns now. Each
>> column represents a value in a matrix.
>>
>> We have millions of rows, so I would prefer not to transpose each row
>> to (x, y, column_value) triplets because it will make all our code
>> more difficult than it already is.
>>
>> Any help is very appreciated.
>
> Is a relational database the proper tool for the job?
>
> --
> Angular momentum makes the world go 'round.
On 25/04/2019 10:11, pabloa98 wrote:
> Thank you, Joe! I will take a look.
>
> Pablo
>
> On Wed, Apr 24, 2019 at 1:47 PM Joe Conway <mail@joeconway.com> wrote:
[...]

Hi Pablo,

The convention here is to bottom post, or to intersperse comments, like
in all the replies to you. So it would be appreciated if you did that,
rather than top post as you have been doing.

I strongly suspect that:

(1) making pg handle more than 1600 columns would be way more
complicated than you can imagine;

(2) such a change would be unlikely to be accepted into the main line,
which would mean you'd have to reapply your patch for every new version
of pg you wanted to use!

Cheers,
Gavin
On Wed, Apr 24, 2019 at 3:28 PM Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
> The convention here is to bottom post, or to intersperse comments, like
> in all the replies to you. So it would be appreciated if you did that,
> rather than top post as you have been doing.

Thanks for the advice. I will follow the convention.

> I strongly suspect that:
>
> (1) making pg handle more than 1600 columns would be way more
> complicated than you can imagine;
>
> (2) such a change would be unlikely to be accepted into the main line,
> which would mean you'd have to reapply your patch for every new version
> of pg you wanted to use!

Yes. Our first attempt showed us (1).

Regarding (2), we are fine with adding a patch and recompiling a patched
version for our database servers.

But we are open to helping add support for thousands of columns as a
compile-time parameter if other people are interested. Something like a
--with_thousands_of_columns_support feature.

We could add a node to the buildfarm to test this functionality on
Ubuntu, and help by writing tests.

Pablo

> Cheers,
> Gavin
On 2019-Apr-24, pabloa98 wrote:

> Regarding (2), we are fine with adding a patch and recompiling a
> patched version for our database servers.
>
> But we are open to helping add support for thousands of columns as a
> compile-time parameter if other people are interested.

It's hard to say what you're doing wrong when we don't know what you are
actually doing.

I think raising the limit requires changing ItemIdData, t_hoff, and a
few members of PageHeaderData at the very least. Reading the three
header files involved carefully would probably point out areas I've
forgotten to mention. I think if you enlarge t_hoff and lp_off/lp_len
to 16 bits, you can use 64 kB blocks, which might be useful too.

Note that with pg12 you could have your own table AM that supported
wider ItemIds as a (small?) change on heapam, rather than supplanting it
for all tables. That way you would only pay the (probably considerable)
cost of the wider line pointers on all tables ...

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Apr-24, Alvaro Herrera wrote:

> Note that with pg12 you could have your own table AM that supported
> wider ItemIds as a (small?) change on heapam, rather than supplanting
> it for all tables. That way you would only pay the (probably
> considerable) cost of the wider line pointers on all tables ...

"That way you would only pay the (probably considerable) cost of the
wider line pointers on the tables that need it, rather than all of
them."

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
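At the SQL level, the table AM route Alvaro describes would be wired up
roughly like this (PostgreSQL 12+ syntax; the "wideheap" extension and
its handler are hypothetical, and the C implementation of the widened
page format is where all the real work would live):

-- Handler function provided by a hypothetical C extension that
-- implements the widened line-pointer / tuple-header format:
CREATE FUNCTION wideheap_tableam_handler(internal)
RETURNS table_am_handler
AS 'wideheap', 'wideheap_tableam_handler'
LANGUAGE C;

CREATE ACCESS METHOD wideheap TYPE TABLE
HANDLER wideheap_tableam_handler;

-- Only the wide tables pay the cost; everything else stays on heap:
CREATE TABLE matrix_rows (/* ... thousands of columns ... */)
USING wideheap;

This would complement, not replace, the compile-time limit changes
discussed above; the AM only confines the wider on-disk format to the
tables that opt in with USING.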
On 4/24/19 5:55 PM, Alvaro Herrera wrote:
> [...]
>
> Note that with pg12 you could have your own table AM that supported
> wider ItemIds as a (small?) change on heapam, rather than supplanting
> it for all tables.

Do views have the same 1600-column limit as tables?

--
Angular momentum makes the world go 'round.
On 25/04/2019 10:55, Alvaro Herrera wrote:
> [...]
>
> Note that with pg12 you could have your own table AM that supported
> wider ItemIds as a (small?) change on heapam, rather than supplanting
> it for all tables. That way you would only pay the (probably
> considerable) cost of the wider line pointers on all tables ...

I wonder if it might prove a killer feature for some niche uses!
Stranger things have come to pass.

I suspect that going beyond 1600 columns would never be the default,
even if the pg core devs were happy to allow it as an official option
(presumably at compile time?), as I think it would have negative
performance impacts on most uses of pg - as Alvaro hinted at. IMnsHO.

Certainly there will be many people intrigued as to what you are trying
to do, even if we never want to do the same ourselves.
On Wed, Apr 24, 2019 at 3:11 PM pabloa98 <pabloa98@gmail.com> wrote:
> We used tables because we have two types of queries on these tables:
>
>     SELECT * FROM table_with_lots_of_columns
>     WHERE <condition involving a lot of columns>;
>
> These queries read a lot of rows.
>
> or
>
>     SELECT columnX FROM table_with_lots_of_columns
>     WHERE <condition involving a lot of columns>;
>
> These queries read very few rows.
Everyone else has given great advice on this; I'd like to add that
arrays of any dimension are limited to 1 GB, like all varlena objects.

You should check out pg-strom; it's highly optimized for running exactly
these kinds of queries on a GPU, and it comes with a native matrix type
that can exceed the 1 GB limit.
-Michel