Re: how to add more than 1600 columns in a table? - Mailing list pgsql-general

From: pabloa98
Subject: Re: how to add more than 1600 columns in a table?
Date:
Msg-id: CAEjudX72SXL9hKBL+ctKbWb64=n53NxvVWmvBx-Gz1doB1d_BQ@mail.gmail.com
In response to: Re: how to add more than 1600 columns in a table? (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: how to add more than 1600 columns in a table?
List: pgsql-general
Tom,
We are currently using 32 kB pages. That was working fine with our 1200-column tables, but the new table design has around 2400 columns and does not fit in our modified PostgreSQL 11.
We are ready (meaning, as ready as we can be with Java and Python backgrounds but not much C) to modify the database source code if that is possible. We tried a few months ago and did not get very far. :)
I believe we replaced an 8-bit int with a 32-bit int (perhaps in t_heap? I am not sure), but it looks like we missed something, because it did not work: several tests failed and some of them made the server crash with page faults. I think we would have had to modify other data structures, and some code in places like the JIT, to make it work.
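For what it's worth, my current understanding (quoting the PostgreSQL 11 sources from memory, so please treat the details as assumptions and verify before patching) is that the relevant definitions live in src/include/access/htup_details.h:

    /* t_infomask2 is a uint16 in HeapTupleHeaderData; its low 11 bits
     * hold the attribute count, so natts can never exceed 2047. */
    #define HEAP_NATTS_MASK         0x07FF  /* 11 bits for number of attributes */

    /* The user-visible limit is capped further by these constants: */
    #define MaxTupleAttributeNumber 1664    /* 8 * 208 */
    #define MaxHeapAttributeNumber  1600    /* 8 * 200 */

So going past 2047 columns would mean widening t_infomask2 itself (and every place that packs or unpacks it), not just bumping the Max* constants, which would match our experience that changing a single field was not enough.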
A word about the database design: these tables are vectors (or matrices) of numbers, booleans, and other vectors. They are normalized, and all the columns depend directly on the primary key column.
We used tables because we have two types of queries on this table:

SELECT * FROM table_with_lots_of_columns WHERE <condition involving a lot of columns>;

These queries read a lot of rows.

or

SELECT columnX FROM table_with_lots_of_columns WHERE <condition involving a lot of columns>;

These queries read very few rows.

UPDATEs are at the row level and usually involve all the columns.
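To make that concrete, here is a minimal sketch of the shape of the schema and the access patterns (the table and column names are made up for illustration; the real table has around 2400 columns):

    CREATE TABLE samples (
        id bigint PRIMARY KEY,
        f1 double precision,
        f2 double precision,
        b1 boolean,
        v1 double precision[]
        -- ... and so on, up to ~2400 columns
    );

    -- Pattern 1: wide result, reads a lot of rows.
    SELECT * FROM samples
    WHERE f1 > 0.5 AND b1 AND v1[3] < 1.0;

    -- Pattern 2: single column, reads very few rows.
    SELECT f2 FROM samples
    WHERE f1 > 0.5 AND b1 AND v1[3] < 1.0;

    -- Updates are per row and usually touch most columns.
    UPDATE samples
    SET f1 = 0.1, f2 = 0.2, b1 = false
    WHERE id = 42;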
We want ACID semantics on these tables, so we chose PostgreSQL: we get transactions, joins with the other (more conventional) tables, and so on (compared with MongoDB it looks far better).
We considered several alternatives (the first two are sketched after this list):
* Storing rows as JSONB: we would have to add a parsing stage to our code that would generate a lot of temporary objects. Besides, it would need a LOT more space.
* Storing rows as an ARRAY[]: some columns are arrays themselves and others are booleans (which could be converted to numbers). The worst part is that we would have to reference a "column" as something like row_array[423] and trust that it holds the right data type. It would make our code far more complicated.
* Storing the info in vertical columns using arrays: the same problem as before, but adding much more complexity to the code.
* Storing it outside the database as text files: no more ACID semantics :/
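To illustrate, here is roughly what the first two alternatives would look like (again with made-up names):

    -- JSONB alternative: one document per row. Every access needs a
    -- cast, and the key names are repeated in every row, so it takes
    -- a lot more space.
    CREATE TABLE samples_jsonb (
        id  bigint PRIMARY KEY,
        doc jsonb
    );
    SELECT doc->>'f2'
    FROM samples_jsonb
    WHERE (doc->>'f1')::double precision > 0.5;

    -- Array alternative: purely positional access, with no per-column
    -- names or types. "Column" 423 is just a position we would have to
    -- track ourselves.
    CREATE TABLE samples_array (
        id        bigint PRIMARY KEY,
        row_array double precision[]
    );
    SELECT row_array[423] FROM samples_array WHERE row_array[1] > 0.5;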
Having a plain table design lets us write very clear code.
I think that PostgreSQL supporting wider tables would give the database a definite advantage over other technologies used in ML and scientific programming (like MongoDB and Cassandra).

Best,
Pablo
On Wed, Apr 24, 2019 at 1:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
pabloa98 <pabloa98@gmail.com> writes:
> Sadly today we hit the 1600 columns limit of Postgresql 11.
> How could we add more columns?
You can't, at least not without some pretty fundamental changes;
that value is limited by field sizes within tuple headers. You'd
also, more than likely, find yourself hitting problems with the
tuples-can't-cross-page-boundary restrictions. (Our TOAST mechanism
avoids that problem for large individual fields, but not for many small
fields.)
It seems pretty unlikely to me that any sane table design actually
has thousands of truly-independent columns. Consider using arrays,
or perhaps composite sub-structures (JSON maybe?).
regards, tom lane