On 2024-02-12 11:28:41 -0500, Greg Sabino Mullane wrote:
> On Mon, Feb 12, 2024 at 1:50 AM veem v <veema0000@gmail.com> wrote:
>
> So we were thinking, adding many columns to a table should be fine in
> postgres (as here we have a use case in which the total number of columns
> may go up to ~500+). But then, accessing columns towards the end of a row
> is going to take more time than accessing a column at the start of the
> row. As you mentioned, accessing the 100th column may take 4 to 5 times
> longer than accessing the first column. So, is it advisable here to take
> a similar approach of breaking the table into two if the total number of
> columns for a table reaches a certain threshold?
>
>
> I'm not sure of what Peter was testing exactly to get those 4-5x figures,
Sorry, I should have included my benchmark code (it's short enough - see
below).
What I was actually timing was
    select count(*) from t_postgresql_column_bench where v{i} = 'a'
for various i.
> but I presume that is column access time,
That was the goal. Of course there is always some overhead but I figured
that by counting rows where a column has a constant value the overhead
is minimal or at least constant.
> which would not mean a direct effect on your total query time of 4-5x.
Right. In any real application the column access time is only a part of
the total processing time, and probably a small part, so the effect on
total processing time is correspondingly smaller.
hp
------------------------------------------------------------------------
#!/usr/bin/python3
import random
import time

import psycopg2

n_cols = 100
n_rows = 100000

db = psycopg2.connect("")
csr = db.cursor()

# Create a table with n_cols text columns v0 .. v{n_cols-1}.
csr.execute("drop table if exists t_postgresql_column_bench")
q = "create table t_postgresql_column_bench ("
q += ", ".join(f"v{i} text" for i in range(n_cols))
q += ")"
csr.execute(q)

# Fill it with random single lowercase letters.
q = "insert into t_postgresql_column_bench values("
q += ", ".join("%s" for i in range(n_cols))
q += ")"
for j in range(n_rows):
    v = [chr(random.randint(96 + 1, 96 + 26)) for i in range(n_cols)]
    csr.execute(q, v)
db.commit()

# Time a filtered count(*) on each column in turn.
for i in range(n_cols):
    q = f"select count(*) from t_postgresql_column_bench where v{i} = 'a'"
    t0 = time.clock_gettime(time.CLOCK_MONOTONIC)
    csr.execute(q)
    r = csr.fetchall()
    print(r)
    t1 = time.clock_gettime(time.CLOCK_MONOTONIC)
    print(i, t1 - t0)
db.commit()
------------------------------------------------------------------------
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"