Thread: Column as arrays.. more efficient than columns?
Table is like

create table foo (
  number int,
  subset int,
  value int
)

select * from foo;
 number | subset | value
      1 |      1 |     1
      1 |      2 |     2
      1 |      3 |    10
      1 |      4 |     3

current query is like

select number,
  avg(case when subset = 1 then value else null end) as v1,
  avg(case when subset = 2 then value else null end) as v2,
  avg(case when subset = 3 then value else null end) as v3,
  avg(case when subset = 4 then value else null end) as v4
from foo
group by number

results
------
 number | v1 | v2 | v3 | v4
      1 |  1 |  2 | 10 |  4

I'm thinking of denormalising it a bit and putting it either into an array or just creating a new table with the end result like the above.

I just want to know which is more efficient. Users can just do a

select * from new_foo where number = 'X';

Thanks.
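For concreteness, the two denormalised layouts I'm weighing would look roughly like this (the table and column names are just illustrative, not an actual schema):

-- option 1: a plain wide table, one row per number, averages pre-computed
create table new_foo (
    number int primary key,
    v1 numeric,
    v2 numeric,
    v3 numeric,
    v4 numeric
);

-- option 2: one row per number, the per-subset averages kept in an array
create table new_foo_arr (
    number int primary key,
    vals   numeric[]   -- e.g. '{1,2,10,3}'
);

-- either way, readers would just do something like:
-- select * from new_foo where number = 1;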
Nobody has any comments on this??
On Sep 6, 2007, at 19:37 , Ow Mun Heng wrote:

> Nobody has any comments on this??

Don't do it.

Michael Glaesemann
grzm seespotcode net
On Thu, 2007-09-06 at 19:52 -0500, Michael Glaesemann wrote:
> On Sep 6, 2007, at 19:37 , Ow Mun Heng wrote:
>
> > Nobody has any comments on this??
>
> Don't do it.

don't do what? Don't denormalise the table? don't put them into arrays?

Thing is, the end result is always for them to be in 1 row, and all the columns are needed for data analysis. So it's between a

select * from foo

vs

select avg(...), avg(...) ... group by ...
Michael Glaesemann wrote:
> On Sep 6, 2007, at 19:37 , Ow Mun Heng wrote:
>
> > Nobody has any comments on this??
>
> Don't do it.

HAHAHAHAHAHAHA....

Joshua D. Drake
=== The PostgreSQL Company: Command Prompt, Inc. ===
On Sep 6, 2007, at 19:58 , Ow Mun Heng wrote:

> Don't denormalise the table?

Yes. Don't denormalize the tables.

> don't put them into arrays?

Yes. Don't use arrays. Caveat: if the data is *naturally* an array and you will not be doing any relational operations on individual elements of the arrays, then it makes sense to use arrays. Treat arrays as you would any other opaque type.

Michael Glaesemann
grzm seespotcode net
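To make the caveat concrete, a quick sketch (using the hypothetical new_foo_arr layout from earlier in the thread) of the difference between treating the array as opaque and doing relational operations on its elements:

-- fine as an opaque value: fetch or replace the whole array at once
select vals from new_foo_arr where number = 1;
update new_foo_arr set vals = '{2,3,11,4}' where number = 1;

-- these operate on individual elements; if queries like them are
-- needed, the normalized (number, subset, value) form is a better fit
select number from new_foo_arr where vals[3] > 5;
select avg(vals[2]) from new_foo_arr;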
On Thu, 2007-09-06 at 20:20 -0500, Michael Glaesemann wrote:
> On Sep 6, 2007, at 19:58 , Ow Mun Heng wrote:
>
> > Don't denormalise the table?
>
> Yes. Don't denormalize the tables.

I would believe performance would be better with it being denormalised (in this case).

> > don't put them into arrays?
>
> Yes. Don't use arrays. Caveat: if the data is *naturally* an array
> and you will not be doing any relational operations on individual
> elements of the arrays, then it makes sense to use arrays. Treat
> arrays as you would any other opaque type.

The data is naturally an array, and will be used as an array in any case, since there will not be queries where users select any one of the values in that array, only the whole array itself.

The data will be used in this form:

 code | v1 | v2 | v3 | v4
 A    |  1 |  2 | 10 | 23
 B    | 10 | 12 | 15 | 22
 C    | 11 | 24 | 18 | 46
 D    | 21 | 22 | 20 | 41

which will be imported into statistical software/Excel for further manipulation. If I give it to them in the normalised form, it'll take them an additional 30 min or so to get it back into the form above, and it'll make the queries more efficient too: index on code,

select * from foo where code = 'B';

By denormalising, I will also get the benefit of reducing the # of rows by a factor of 20 (20 rows = 1 code).
On 9/6/07, Ow Mun Heng <Ow.Mun.Heng@wdc.com> wrote:
> Table is like
>
> create table foo (
>   number int,
>   subset int,
>   value int
> )
>
> select * from foo;
>  number | subset | value
>       1 |      1 |     1
>       1 |      2 |     2
>       1 |      3 |    10
>       1 |      4 |     3
>
> current query is like
>
> select number,
>   avg(case when subset = 1 then value else null end) as v1,
>   avg(case when subset = 2 then value else null end) as v2,
>   avg(case when subset = 3 then value else null end) as v3,
>   avg(case when subset = 4 then value else null end) as v4
> from foo
> group by number

arrays are interesting and have some useful properties. however, we must first discuss the problems...first and foremost, if you need to read any particular item off the array you must read the entire array from disk, and you must write all items back to disk for writes. also, they cause some problems with constraints and other issues that come up with de-normalization tactics. however, if a particular piece of data is actually expressed as an array of items (the polygon type comes to mind), then why not?

that said, let's look at a better way to express this query. what jumps out at me right away is:

select number, subset, avg(value) from foo group by subset;

does this give you the answer that you need? If not we can proceed and look at why arrays may or may not be appropriate (i suspect I am not seeing the whole picture here).

merlin
On Sep 6, 2007, at 20:46 , Ow Mun Heng wrote:

> I would believe performance would be better with it being denormalised
> (in this case).

I assume you've arrived at the conclusion because you have
(a) shown that the performance with a normalized schema does not meet
    your needs;
(b) benchmarked the normalized schema under production conditions;
(c) benchmarked the denormalized schema under production conditions; and
(d) shown that performance is improved in the denormalized case to
    arrive at that conclusion.

I'm interested to see the results of your comparisons.

Regardless, it sounds like you've already made up your mind. Why ask for comments?

Michael Glaesemann
grzm seespotcode net
On Thu, 2007-09-06 at 21:53 -0400, Merlin Moncure wrote:
> arrays are interesting and have some useful properties. however, we
> must first discuss the problems...first and foremost, if you need to
> read any particular item off the array you must read the entire array
> from disk, and you must write all items back to disk for writes. also,
> they cause some problems with constraints and other issues that come
> up with de-normalization tactics.

I see. Didn't know that.. Good to know.

> select number, subset, avg(value) from foo group by subset;
>
> does this give you the answer that you need?

No it doesn't.

select * from foo order by subset;
 code | subset | value
------+--------+-------
 A    |      0 |    98
 A    |      1 |    20
 A    |      2 |    98
 A    |      3 |    98
 A    |      4 |    98

=> select code, subset, avg(value) from foo group by subset;
ERROR:  column "foo.code" must appear in the GROUP BY clause or be used in an aggregate function

=> select code, subset, avg(value) from foo group by subset, code;
 code | subset |         avg
------+--------+---------------------
 A    |      3 | 98.0000000000000000
 A    |      1 | 20.0000000000000000
 A    |      4 | 98.0000000000000000
 A    |      0 | 98.0000000000000000
 A    |      2 | 98.0000000000000000

=> select code,
     round(avg(case when subset = '0' then value else null end),0) as v0,
     round(avg(case when subset = '1' then value else null end),0) as v1,
     round(avg(case when subset = '2' then value else null end),0) as v2,
     round(avg(case when subset = '3' then value else null end),0) as v3,
     round(avg(case when subset = '4' then value else null end),0) as v4
   from foo
   group by code;
 code | v0 | v1 | v2 | v3 | v4
------+----+----+----+----+----
 A    | 98 | 20 | 98 | 98 | 98
On Thu, 2007-09-06 at 20:57 -0500, Michael Glaesemann wrote:
> On Sep 6, 2007, at 20:46 , Ow Mun Heng wrote:
>
> > I would believe performance would be better with it being denormalised
> > (in this case).
>
> I assume you've arrived at the conclusion because you have
> (a) shown that the performance with a normalized schema does not meet
>     your needs;
> (b) benchmarked the normalized schema under production conditions;
> (c) benchmarked the denormalized schema under production conditions; and
> (d) shown that performance is improved in the denormalized case to
>     arrive at that conclusion.
>
> I'm interested to see the results of your comparisons.
>
> Regardless, it sounds like you've already made up your mind. Why ask
> for comments?

You've assumed wrong. I've not arrived at any conclusion but am merely exploring my options on which way would be the best to tread. I'm asking the list because I'm new to PG, and after reading all those articles on highscalability etc., the majority of them are using some kind of denormalised tables.

Right now, there's 8 million rows of data in this one table, and it's growing at a rapid rate of ~2 million/week. I can significantly reduce this number down to 200K (I think, by denormalising it) and shrink the table size.

I would appreciate your guidance on this before I go knock my head on the wall. :-)
Ow Mun Heng wrote:
> => select code, subset, avg(value) from foo group by subset, code;
>  code | subset |         avg
> ------+--------+---------------------
>  A    |      3 | 98.0000000000000000
>  A    |      1 | 20.0000000000000000
>  A    |      4 | 98.0000000000000000
>  A    |      0 | 98.0000000000000000
>  A    |      2 | 98.0000000000000000

An alternative way to get the output below would be to feed your aggregate query above to the crosstab() function in contrib/tablefunc.

Joe

> => select code,
>      round(avg(case when subset = '0' then value else null end),0) as v0,
>      round(avg(case when subset = '1' then value else null end),0) as v1,
>      round(avg(case when subset = '2' then value else null end),0) as v2,
>      round(avg(case when subset = '3' then value else null end),0) as v3,
>      round(avg(case when subset = '4' then value else null end),0) as v4
>    from foo
>    group by code;
>  code | v0 | v1 | v2 | v3 | v4
> ------+----+----+----+----+----
>  A    | 98 | 20 | 98 | 98 | 98
On Thu, 2007-09-06 at 20:19 -0700, Joe Conway wrote:
> An alternative way to get the output below would be to feed your
> aggregate query above to the crosstab() function in contrib/tablefunc.

I just looked at it and seems like the
... row_name and value must be of type text ...
On 09/06/07 21:26, Ow Mun Heng wrote:
> You've assumed wrong. I've not arrived at any conclusion but am merely
> exploring my options on which way would be the best to tread. I'm
> asking the list because I'm new to PG, and after reading all those
> articles on highscalability etc., the majority of them are using some
> kind of denormalised tables.

Correlation != causation.

There *might* be a causal relationship between high scalability and table denormalization, but I seriously doubt it.

> Right now, there's 8 million rows of data in this one table, and it's
> growing at a rapid rate of ~2 million/week. I can significantly reduce
> this number down to 200K (I think, by denormalising it) and shrink the
> table size.

Even presuming you only insert data SIX hours per day, that's only 13.3 inserts per second. Not very impressive.

- Ron Johnson, Jr.
  Jefferson LA USA
On 09/06/07 20:53, Merlin Moncure wrote:
[snip]
> arrays are interesting and have some useful properties. however, we
> must first discuss the problems...first and foremost, if you need to
> read any particular item off the array you must read the entire array
> from disk, and you must write all items back to disk for writes.

Reads and writes are done at the page level, so I'm not sure this is valid.

> also, they cause some problems with constraints and other issues that
> come up with de-normalization tactics.

ACK.

- Ron Johnson, Jr.
  Jefferson LA USA
On Fri, 2007-09-07 at 00:18 -0500, Ron Johnson wrote:
> On 09/06/07 21:26, Ow Mun Heng wrote:
> > I've not arrived at any conclusion but am merely exploring my options
> > on which way would be the best to tread. I'm asking the list because
> > I'm new to PG, and after reading all those articles on
> > highscalability etc., the majority of them are using some kind of
> > denormalised tables.
>
> Correlation != causation.
>
> There *might* be a causal relationship between high scalability and
> table denormalization, but I seriously doubt it.

I can't refute you on this since I have no experience in this arena, only what I read on highscalability.com (IIRC).

> > Right now, there's 8 million rows of data in this one table, and it's
> > growing at a rapid rate of ~2 million/week. I can significantly reduce
> > this number down to 200K (I think, by denormalising it) and shrink the
> > table size.
>
> Even presuming you only insert data SIX hours per day, that's only
> 13.3 inserts per second. Not very impressive.

Data is inserted 24 hours a day, but not at the same rate each sec/minute. The problem isn't really the data insertion; it's already inserted in a normalised manner. It's the selection of data (OLTP datahouse) which takes a longer time and which is the area of worry.
On 09/07/07 02:49, Ow Mun Heng wrote:
> Data is inserted 24 hours a day, but not at the same rate each
> sec/minute. The problem isn't really the data insertion; it's already
> inserted in a normalised manner. It's the selection of data (OLTP
> datahouse) which takes a longer time and which is the area of worry.

Datahouse or "data warehouse"?

- Ron Johnson, Jr.
  Jefferson LA USA
On 9/7/07, Ron Johnson <ron.l.johnson@cox.net> wrote:
> On 09/06/07 20:53, Merlin Moncure wrote:
> [snip]
> > arrays are interesting and have some useful properties. however, we
> > must first discuss the problems...first and foremost, if you need to
> > read any particular item off the array you must read the entire array
> > from disk, and you must write all items back to disk for writes.
>
> Reads and writes are done at the page level, so I'm not sure this is
> valid.

sure it is...since the denormalized record is much larger (especially in array scenarios), the tuple is much larger, meaning the page will fill up much more quickly, meaning more dead pages, more vacuuming, etc. Besides that, the server has to do some work presenting the array as part of the read, which is overhead. I didn't go into a lot of detail but the reasoning is sound. Here is a quick example showing the problem.

merlin

create table denormalized
(
  data int[]
);

create table normalized
(
  id int primary key,
  datum int
);

insert into normalized select v, v from generate_series(1, 100) v;
insert into denormalized select array(select generate_series(1,100));

create sequence rotator maxvalue 100 cycle;

-- bench denormalized (d.sql) --
update denormalized set data[n] = data[n] + 1
  from (select nextval('rotator') as n) q;

merlin@mernix:~$ pgbench -c 4 -t 1000 -f d.sql
starting vacuum...end.
transaction type: Custom query
scaling factor: 1
number of clients: 4
number of transactions per client: 1000
number of transactions actually processed: 4000/4000
tps = 2452.188456 (including connections establishing)
tps = 2465.262905 (excluding connections establishing)

INFO:  "normalized": found 0 removable, 100 nonremovable row versions in 38 pages

-- bench normalized (n.sql) --
update normalized set datum = datum + 1 where id = (select nextval('rotator'));

merlin@mernix:~$ pgbench -c 4 -t 1000 -f n.sql
starting vacuum...end.
transaction type: Custom query
scaling factor: 1
number of clients: 4
number of transactions per client: 1000
number of transactions actually processed: 4000/4000
tps = 6494.402637 (including connections establishing)
tps = 6594.087741 (excluding connections establishing)

INFO:  "denormalized": found 0 removable, 1 nonremovable row versions in 223 pages

merlin
On 09/07/07 07:49, Merlin Moncure wrote:
> On 9/7/07, Ron Johnson <ron.l.johnson@cox.net> wrote:
> > Reads and writes are done at the page level, so I'm not sure this is
> > valid.
>
> sure it is...since the denormalized record is much larger (especially
> in array scenarios), the tuple is much larger, meaning the page will
> fill up much more quickly, meaning more dead pages, more vacuuming,
> etc. Besides that, the server has to do some work presenting the
> array as part of the read, which is overhead. I didn't go into a lot
> of detail but the reasoning is sound. Here is a quick example showing
> the problem.

We agree. What I meant was that reads and writes are done at the page level no matter whether the table is normalized or not. Thus, to say "if you need to read any particular item off the array you must read the entire array from disk and you must write all items back to disk for writes" is... irrelevant. That's probably an imprecise word, but it's all I can think of at the moment.

We also agree regarding big records filling pages faster.

- Ron Johnson, Jr.
  Jefferson LA USA
What is the best way to store clob data in postgresql?

Currently I store my clob data as text. The data consists of large xml files. When I access the text field using the getString method in JDBC, the program is not able to hold a huge string in memory. If I can stream this data to and from the database it will be good. Is there a way to do this?
Are there any other solutions to this problem?
Thanks in advance
shar joe
> Datahouse or "data warehouse"?
OLTP data warehouse.
On 9/7/07, Ron Johnson <ron.l.johnson@cox.net> wrote:
> On 09/06/07 20:53, Merlin Moncure wrote:
> [snip]
> >
> > arrays are interesting and have some useful properties. however, we
> > must first discuss the problems...first and foremost, if you need to
> > read any particular item off the array you must read the entire array
> > from disk, and you must write all items back to disk for writes.
>
> Reads and writes are done at the page level, so I'm not sure this is
> valid.
>>sure it is...since the denormalized record is much larger (especially
>>in array scenarios), the tuple is much larger meaning the page will
>>fill up much more quickly meaning more dead pages, more vacuuming,
>>etc. Besides that, the server has to do some work presenting the
>>array as part of the read which is overhead. I didn't go into a lot
>>of detail but the reasoning is sound. Here is a quick example showing
>>the problem.
[snip]
It's a data warehouse, so once the data has been denormalised it doesn't need to be updated again, so I would think Merlin's tps numbers don't really come into play here.
Anyway.. here are some stats on the table I was working on.
Denormalising the table reduced the # of rows quite a bit, and the time taken to return 2.8K rows worth of results is only ~2 secs on the denormalised table vs. 76 secs on the original table.
AFAICT, this shows me better performance in terms of read-back.
Comments please, since I may not necessarily know what I'm doing.
BTW, I don't really get all the talk about the dead tuples, pages filling up more quickly etc. (BTW, the data below is based on denormalising the table into columns rather than into an array.)
normalised table = 8 million rows
denormalised table = 328K rows
join table sfoo = 3.6 million rows
join table dbar = 1.5 million rows
join table smallfoo = 108 rows
Nested Loop (cost=0.00..15022.75 rows=1 width=280) (actual time=0.597..1345.239 rows=2872 loops=1)
Join Filter: ((dbar.famid)::text = (fam.famid)::text)
-> Nested Loop (cost=0.00..15017.32 rows=1 width=274) (actual time=0.310..247.265 rows=2872 loops=1)
Join Filter: ((sfoo.date_time = denorm.date_time) AND (sfoo.ttype = denorm.ttype))
-> Nested Loop (cost=0.00..5767.36 rows=71 width=281) (actual time=0.246..85.985 rows=2872 loops=1)
-> Index Scan using idx_dbar on dbar (cost=0.00..1175.61 rows=332 width=28) (actual time=0.154..46.172 rows=482 loops=1)
Index Cond: ((code)::text = 'AAA71'::text)
-> Index Scan using idx_denorm on denorm (cost=0.00..13.74 rows=7 width=253) (actual time=0.017..0.055 rows=6 loops=482)
Index Cond: ((denorm.snum)::text = (dbar.snum)::text)
-> Index Scan using idx_ts_sn on sfoo (cost=0.00..129.48 rows=46 width=37) (actual time=0.010..0.022 rows=6 loops=2872)
Index Cond: ((sfoo.snum)::text = (norm.snum)::text)
-> Seq Scan on fam (cost=0.00..4.08 rows=108 width=18) (actual time=0.004..0.169 rows=108 loops=2872)
Total runtime: 1350.234 ms
returned 2.8K rows
HashAggregate (cost=61819.46..61819.67 rows=1 width=73) (actual time=76251.012..76586.406 rows=2872 loops=1)
-> Nested Loop (cost=20.55..61819.40 rows=1 width=73) (actual time=140.007..36979.539 rows=57440 loops=1)
Join Filter: ((dbar.famid)::text = (fam.famid)::text)
-> Nested Loop (cost=20.55..61813.97 rows=1 width=67) (actual time=139.585..3412.300 rows=57440 loops=1)
Join Filter: (sfoo.ttype = norm.ttype)
-> Nested Loop (cost=0.00..42351.18 rows=792 width=65) (actual time=0.117..464.893 rows=2275 loops=1)
-> Index Scan using idx_dbar on bar (cost=0.00..1175.61 rows=332 width=28) (actual time=0.058..7.275 rows=482 loops=1)
Index Cond: ((code)::text = 'AAA71'::text)
-> Index Scan using idx_sfoo on sfoo (cost=0.00..123.45 rows=46 width=37) (actual time=0.761..0.929 rows=5 loops=482)
Index Cond: ((sfoo.snum)::text = (dbar.snum)::text)
-> Bitmap Heap Scan on norm (cost=20.55..24.56 rows=1 width=46) (actual time=1.144..1.202 rows=25 loops=2275)
Recheck Cond: (((norm.snum)::text = (dbar.snum)::text) AND (sfoo.date_time = norm.date_time))
-> BitmapAnd (cost=20.55..20.55 rows=1 width=0) (actual time=0.929..0.929 rows=0 loops=2275)
-> Bitmap Index Scan on idx_norm (cost=0.00..6.70 rows=166 width=0) (actual time=0.056..0.056 rows=142 loops=2275)
Index Cond: ((norm.snum)::text = (dbar.snum)::text)
-> Bitmap Index Scan on idx_trz_rundate (cost=0.00..13.40 rows=604 width=0) (actual time=0.977..0.977 rows=55 loops=2021)
Index Cond: (sfoo.date_time = norm.date_time)
-> Seq Scan on fam fam_id (cost=0.00..4.08 rows=108 width=18) (actual time=0.008..0.287 rows=108 loops=57440)
Total runtime: 76591.106 ms
Ow Mun Heng wrote:
> On Thu, 2007-09-06 at 20:19 -0700, Joe Conway wrote:
>> An alternative way to get the output below would be to feed your
>> aggregate query above to the crosstab() function in contrib/tablefunc.
>
> I just looked at it and seems like the
> ... row_name and value must be of type text ...

It doesn't say that, and apparently you didn't bother to test...

create table foo (number int, subset int, value int);
INSERT INTO foo VALUES(1,1,11),(1,2,22),(1,3,30),(1,4,43);
INSERT INTO foo VALUES(2,1,10),(2,2,23),(2,3,31),(2,4,46);
INSERT INTO foo VALUES(3,1,12),(3,2,24),(3,3,34),(3,4,47);
INSERT INTO foo VALUES(4,1,9),(4,2,23),(4,3,35),(4,4,42);
INSERT INTO foo VALUES(5,1,10),(5,2,22),(5,3,33),(5,4,45);

select number,
  round(avg(case when subset = '0' then value else null end),0) as v0,
  round(avg(case when subset = '1' then value else null end),0) as v1,
  round(avg(case when subset = '2' then value else null end),0) as v2,
  round(avg(case when subset = '3' then value else null end),0) as v3,
  round(avg(case when subset = '4' then value else null end),0) as v4
from foo group by number order by number;
 number | v0 | v1 | v2 | v3 | v4
--------+----+----+----+----+----
      1 |    | 11 | 22 | 30 | 43
      2 |    | 10 | 23 | 31 | 46
      3 |    | 12 | 24 | 34 | 47
      4 |    |  9 | 23 | 35 | 42
      5 |    | 10 | 22 | 33 | 45
(5 rows)

select * from crosstab(
  'select number, subset, round(avg(value)) from foo group by number, subset order by number',
  'select * from (values(0),(1),(2),(3),(4)) as vc')
AS ct(code int, v0 int, v1 int, v2 int, v3 int, v4 int);
 code | v0 | v1 | v2 | v3 | v4
------+----+----+----+----+----
    1 |    | 11 | 22 | 30 | 43
    2 |    | 10 | 23 | 31 | 46
    3 |    | 12 | 24 | 34 | 47
    4 |    |  9 | 23 | 35 | 42
    5 |    | 10 | 22 | 33 | 45
(5 rows)

create table foo2 (number int, subset int, value float8);
INSERT INTO foo2 VALUES(1,1,1.5*11),(1,2,1.4*22),(1,3,1.3*30),(1,4,1.2*43);
INSERT INTO foo2 VALUES(2,1,1.5*10),(2,2,1.4*23),(2,3,1.3*31),(2,4,1.2*46);
INSERT INTO foo2 VALUES(3,1,1.5*12),(3,2,1.4*24),(3,3,1.3*34),(3,4,1.2*47);
INSERT INTO foo2 VALUES(4,1,1.5*9),(4,2,1.4*23),(4,3,1.3*35),(4,4,1.2*42);
INSERT INTO foo2 VALUES(5,1,1.5*10),(5,2,1.4*22),(5,3,1.3*33),(5,4,1.2*45);
INSERT INTO foo2 VALUES(1,1,2.5*11),(1,2,2.4*22),(1,3,2.3*30),(1,4,2.2*43);
INSERT INTO foo2 VALUES(2,1,2.5*10),(2,2,2.4*23),(2,3,2.3*31),(2,4,2.2*46);
INSERT INTO foo2 VALUES(3,1,2.5*12),(3,2,2.4*24),(3,3,2.3*34),(3,4,2.2*47);
INSERT INTO foo2 VALUES(4,1,2.5*9),(4,2,2.4*23),(4,3,2.3*35),(4,4,2.2*42);
INSERT INTO foo2 VALUES(5,1,2.5*10),(5,2,2.4*22),(5,3,2.3*33),(5,4,2.2*45);

select number,
  avg(case when subset = '0' then value else null end) as v0,
  avg(case when subset = '1' then value else null end) as v1,
  avg(case when subset = '2' then value else null end) as v2,
  avg(case when subset = '3' then value else null end) as v3,
  avg(case when subset = '4' then value else null end) as v4
from foo2 group by number order by number;
 number | v0 | v1 |  v2  |  v3  |  v4
--------+----+----+------+------+------
      1 |    | 22 | 41.8 |   54 | 73.1
      2 |    | 20 | 43.7 | 55.8 | 78.2
      3 |    | 24 | 45.6 | 61.2 | 79.9
      4 |    | 18 | 43.7 |   63 | 71.4
      5 |    | 20 | 41.8 | 59.4 | 76.5
(5 rows)

select * from crosstab(
  'select number, subset, avg(value) from foo2 group by number, subset order by number',
  'select * from (values(0),(1),(2),(3),(4)) as vc')
AS ct(code int, v0 float8, v1 float8, v2 float8, v3 float8, v4 float8);
 code | v0 | v1 |  v2  |  v3  |  v4
------+----+----+------+------+------
    1 |    | 22 | 41.8 |   54 | 73.1
    2 |    | 20 | 43.7 | 55.8 | 78.2
    3 |    | 24 | 45.6 | 61.2 | 79.9
    4 |    | 18 | 43.7 |   63 | 71.4
    5 |    | 20 | 41.8 | 59.4 | 76.5
(5 rows)

Joe
On 09/07/07 09:00, Ow Mun Heng wrote:
> > Datahouse or "data warehouse"?
>
> OLTP data warehouse.

But OLTP & DW are diametrically opposed in how you design, structure, load and use them.

- Ron Johnson, Jr.
  Jefferson LA USA
On 9/7/07, SHARMILA JOTHIRAJAH <sharmi_jo@yahoo.com> wrote:
> What is the best way to store clob data in postgresql?
> Currently I store my clob data as text. The data consists of large xml
> files. When I access the text field using the getString method in JDBC,
> the program is not able to hold a huge string in memory. If I can stream
> this data to and from the database it will be good. Is there a way to do
> this?
> Are there any other solutions to this problem?

Look into large objects. They stream.
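If it helps, here is a minimal sketch of the server-side half (the table name and file path are made up for illustration). On the client side, the JDBC driver's large object support (or getBinaryStream on the oid column) can then read the object in chunks rather than materialising one huge String:

-- hypothetical sketch: keep each XML document in a large object,
-- referenced from a regular table by its oid
create table xml_doc (
    id      serial primary key,
    content oid
);

-- server-side import of a file into a new large object
-- (file path is made up; lo_import runs on the server)
insert into xml_doc (content)
    values (lo_import('/tmp/big_document.xml'));

-- when a row is deleted, the large object must be unlinked separately:
-- select lo_unlink(content) from xml_doc where id = 1;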