Thread: Front end memory consumption in SELECT
I have a large table that I need to traverse in full. I currently start with a simple unrestricted SELECT, and then fetch each and every row one at a time. I thought that by fetching just one row at a time I would not consume any significant amount of memory. However, judging by the memory consumption of my front-end process, it would seem that the SELECT is loading the entire table into memory before I even fetch the first row!

Can anyone confirm that this is in fact what goes on? If so, is there any way to avoid it? The obvious solution would seem to be to use LIMIT and OFFSET to get just a few thousand rows at a time, but will that suffer from a time overhead while the backend skips over millions of rows to get to the ones it needs?

Thanks for any clues anyone can provide!

Doug.

P.S. If it matters, I am using the Perl interface. I am also running in SERIALIZABLE mode...
Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> However, judging by the memory consumption of my front-end process,
> it would seem that the SELECT is loading the entire table into memory
> before I even fetch the first row! Can anyone confirm that this is in
> fact what goes on?

libpq handles SELECTs that way. You should consider DECLARE CURSOR and FETCH if you need to retrieve a large query result in chunks.

It was probably bad design for libpq to offer random access to query results --- I'm sure there are few applications that really care, and the memory-consumption problem is a very real one for many apps. But I see no way to fix it without fundamental changes to libpq's API, and it's not clear it's worth that kind of pain. Maybe there will be a deliberately-incompatible libpq Mark II someday ... or maybe we'll switch to a whole new client interface like CORBA.

> If so, is there any way to avoid it? The obvious solution would seem
> to be to use LIMIT and OFFSET to get just a few thousand rows at a
> time, but will that suffer from a time overhead while the backend
> skips over millions of rows to get to the ones it needs?

Yes. See the CURSOR stuff instead.

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> > However, judging by the memory consumption of my front-end process,
> > it would seem that the SELECT is loading the entire table into memory
> > before I even fetch the first row! Can anyone confirm that this is in
> > fact what goes on?
>
> libpq handles SELECTs that way. You should consider DECLARE CURSOR
> and FETCH if you need to retrieve a large query result in chunks.

Thanks for this suggestion, it has solved my problem very nicely.

In case any Perl interface users are reading this and interested, this meant changing my old code that used to work like:

    my $result = $conn->exec("SELECT * from big_table");
    $result->resultStatus == PGRES_TUPLES_OK or die(...);
    while (my @tuple = $result->fetchrow) {
        ...
    }

to instead work like:

    my $result = $conn->exec(
        "DECLARE big_cursor CURSOR FOR SELECT * FROM big_table");
    $result->resultStatus == PGRES_COMMAND_OK or die(...);
    do {
        $result = $conn->exec("FETCH 100 FROM big_cursor");
        $result->resultStatus == PGRES_TUPLES_OK or die(...);
        while (my @tuple = $result->fetchrow) {
            ...
        }
    } while ($result->ntuples == 100);
    $conn->exec("CLOSE big_cursor");    # if not about to COMMIT anyway

which limits to 100 the number of tuples in memory at any one time.

I only found two cases that I could not handle this way: SELECT FOR UPDATE (since PostgreSQL cursors are read-only) and selects done outside of transactions. In my application both these exceptions only affected quite small selections (usually just one tuple) so I was able to revert to the non-cursor select with no problem.

For what it is worth, I measured no significant difference in the total time taken to traverse the entire table, but the reduction in memory consumption was dramatic. Note that fetching only a single tuple at a time (instead of the 100 shown above) did take much longer.

Hope this information helps someone,
Doug.
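The same chunked-fetch pattern can be wrapped up once and reused. The sketch below only illustrates the technique Tom and Doug describe above; it is not code from the thread. The subroutine name, database name, chunk size and callback are made up, and it assumes the old Pg (libpq) Perl module that Doug's example uses. The explicit BEGIN/COMMIT is there because DECLARE CURSOR only works inside a transaction block, which is also why Doug's cursor approach could not cover selects done outside of transactions.

    #!/usr/bin/perl -w
    use strict;
    use Pg;

    # Fetch an arbitrary query through a cursor, $chunk rows at a time,
    # handing each row to a callback, so at most $chunk tuples are held
    # in client memory at once.  (Sketch only: names are illustrative.)
    sub fetch_in_chunks {
        my ($conn, $query, $chunk, $callback) = @_;

        # Cursors only live inside a transaction block.
        $conn->exec("BEGIN")->resultStatus == PGRES_COMMAND_OK
            or die $conn->errorMessage;
        $conn->exec("DECLARE chunk_cursor CURSOR FOR $query")->resultStatus
            == PGRES_COMMAND_OK or die $conn->errorMessage;

        my $result;
        do {
            $result = $conn->exec("FETCH $chunk FROM chunk_cursor");
            $result->resultStatus == PGRES_TUPLES_OK
                or die $conn->errorMessage;
            while (my @tuple = $result->fetchrow) {
                $callback->(@tuple);
            }
        } while ($result->ntuples == $chunk);

        $conn->exec("CLOSE chunk_cursor");
        $conn->exec("COMMIT");
    }

    my $conn = Pg::connectdb("dbname=test");
    die $conn->errorMessage unless $conn->status == PGRES_CONNECTION_OK;

    fetch_in_chunks($conn, "SELECT * FROM big_table", 100,
                    sub { print join("|", @_), "\n" });

As in Doug's version, the loop stops as soon as a FETCH returns fewer rows than were asked for, so the final short batch is still processed.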
Re: [INTERFACES] Back end memory consumption (was Front end memory consumption in SELECT)
From: Douglas Thomson
A few weeks ago I reported a problem I was having with front end memory consumption in SELECT, and Tom's suggestion about using a cursor solved that problem very nicely at the front end.

However, my front and back end processes are running on the same server, and I have just noticed that the back end process swells to the total size of all the tables involved in my query for the duration of the SELECT (or is it the duration of the declared cursor now?).

To summarise the situation:
 - back end processes on my system are usually about 3M
   (PostgreSQL 6.5.1, Debian slink, Intel)
 - I am attempting a join on 9 tables, using a cursor to fetch the
   selected data
 - I am only selecting a tiny fraction of the total data
 - I am running in serializable mode if it matters
 - the files used to store my tables (including associated keys and
   indexes etc) take just under 17M on disk
 - the back end process grows to over 20M while processing my query

This is a serious concern, because 17M is just some sample data, and the real data will eventually be much larger.

Any hints for ways of limiting back end memory consumption (hopefully to something constant rather than something proportional to the table sizes)?

Doug.
Re: [INTERFACES] Back end memory consumption (was Front end memory consumption in SELECT)
From: Tom Lane
Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> I have just noticed that the back end process swells to
> the total size of all the tables involved in my query for the
> duration of the SELECT (or is it the duration of the declared cursor
> now?).

The life of the cursor, probably, but I'm a little surprised by this.

> To summarise the situation:
> - I am attempting a join on 9 tables, using a cursor to fetch the
>   selected data
> - the back end process grows to over 20M while processing my query

What query plan does EXPLAIN show for your SELECT?

			regards, tom lane
Re: [INTERFACES] Back end memory consumption (was Front end memory consumption in SELECT)
From: Douglas Thomson
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> > I have just noticed that the back end process swells to
> > the total size of all the tables involved in my query for the
> > duration of the SELECT (or is it the duration of the declared cursor
> > now?).
>
> The life of the cursor, probably, but I'm a little surprised by this.
>
> > To summarise the situation:
> > - I am attempting a join on 9 tables, using a cursor to fetch the
> >   selected data
> > - the back end process grows to over 20M while processing my query
>
> What query plan does EXPLAIN show for your SELECT?

OK, here goes! I don't profess to understand the output from EXPLAIN, so here it is verbatim. If I hadn't checked that I have indexes on all the attributes (and attribute combinations) that make up the keys needed for the join, I would think the back end was trying to read in and sort a large part of the data just for the query!

NOTICE: _outNode: don't know how to print type 21
NOTICE: QUERY DUMP:

{ NESTLOOP :cost 0 :size 600832 :width 240 :state <> :qptargetlist
({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 25 :restypmod -1 :resname "name"
:reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false } :expr { VAR :varno 65000
:varattno 11 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 9 :varoattno 2}}
{ TARGETENTRY :resdom { RESDOM :resno 2 :restype 25 :restypmod -1 :resname "canonical"
:reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false } :expr { VAR :varno 65000
:varattno 15 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 5 :varoattno 2}})
:qpqual ({ EXPR :typeOid 0 :opType op :oper { OPER :opno 98 :opid 67 :opresulttype 16 }
:args ({ VAR :varno 65000 :varattno 4 :vartype 25 :vartypmod -1 :varlevelsup 0
:varnoold 2 :varoattno 6} { VAR :varno 65001 :varattno 2 :vartype 25 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 2})} { EXPR :typeOid 0 :opType op :oper
{ OPER :opno 98 :opid 67 :opresulttype 16 } :args ({ VAR :varno 65000 :varattno 3
:vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 2 :varoattno 4} { VAR :varno 65001
:varattno 1 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1})})
:lefttree { SEQSCAN :cost 0 :size 0 :width 24 :state <> :qptargetlist
({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 25 :restypmod -1 :resname "<>"
:reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 1
:vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 1}}
{ TARGETENTRY :resdom { RESDOM :resno 2 :restype 25 :restypmod -1 :resname "<>"
:reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false } :expr { VAR :varno 1 :varattno 2
:vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 2}})
:qpqual ({ EXPR :typeOid 0 :opType op :oper { OPER :opno 98 :opid 67 :opresulttype 16 }
:args ({ VAR :varno 1 :varattno 3 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 1
:varoattno 3} { CONST :consttype 25 :constlen -1 :constisnull false :constvalue 32
[ 32 0 0 0 47 83 117 98 106 101 99 116 115 47 49 57 57 57 47 115 101 109 49 47 71 67 79 47 49 56 49 50 ]
:constbyval false})}) :lefttree <> :righttree <> :extprm () :locprm () :initplan <>
:nprm 0 :scanrelid 1 } :righttree { SEQSCAN :cost 3.64259e+07 :size 1073741849
:width 216 :state <> :qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 25
:restypmod -1 :resname "<>" :reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false }
:expr { VAR :varno 0 :varattno 1 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 0
:varoattno 1}} { TARGETENTRY :resdom { RESDOM :resno 2 :restype 25 :restypmod -1
:resname "<>" :reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false } :expr { VAR :varno 0
:varattno 2 :vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 0 :varoattno 3}}
{ TARGETENTRY :resdom { RESDOM :resno 3 :restype 25 :restypmod -1 :resname "<>"
:reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false } :expr { VAR :varno 0 :varattno 3
:vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 0 :varoattno 4}}
{ TARGETENTRY :resdom { RESDOM :resno 4 :restype 25 :restypmod -1 :resname "<>"
:reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false } :expr { VAR :varno 0 :varattno 4
:vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 0 :varoattno 6}}
{ TARGETENTRY :resdom { RESDOM :resno 5 :restype 25 :restypmod -1 :resname "<>"
:reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false } :expr { VAR :varno 0 :varattno 5
:vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 0 :varoattno 4}}
{ TARGETENTRY :resdom { RESDOM :resno 6 :restype 25 :restypmod -1 :resname "<>"
:reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false } :expr { VAR :varno 0 :varattno 6
:vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 0 :varoattno 8}}
{ TARGETENTRY :resdom { RESDOM :resno 7 :restype 25 :restypmod -1 :resname "<>"
:reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false } :expr { VAR :varno 0 :varattno 7
:vartype 25 :vartypmod -1 :varlevelsup 0 :varnoold 0 :varoattno 7}}
{ TARGETENTRY :resdom { RESDOM :resno 8 :restype 25 :restypmod -1 :resname "<>"
:reskey 0 :reskeyop 0 :resgroupref 0 :resjunk false }

NOTICE: QUERY PLAN:

Nested Loop  (cost=0.00 rows=600832 width=240)
  ->  Seq Scan on t25 agents  (cost=0.00 rows=0 width=24)
  ->  Seq Scan  (cost=36425892.00 rows=1073741849 width=216)
    ->  ???  (cost=36425892.00 rows=1073741849 width=216)
      ->  Merge Join  (cost=36425892.00 rows=1073741849 width=216)
        ->  Merge Join  (cost=8371644.00 rows=850088153 width=192)
          ->  Seq Scan  (cost=8110708.50 rows=7333003 width=144)
            ->  Sort  (cost=8110708.50 rows=7333003 width=144)
              ->  Hash Join  (cost=13897.26 rows=7333003 width=144)
                ->  Hash Join  (cost=4635.32 rows=280070 width=132)
                  ->  Hash Join  (cost=2674.97 rows=59324 width=120)
                    ->  Hash Join  (cost=2180.41 rows=14894 width=108)
                      ->  Hash Join  (cost=1705.82 rows=14315 width=96)
                        ->  Seq Scan on t30 enrolments  (cost=667.40 rows=14315 width=72)
                        ->  Hash  (cost=302.51 rows=6955 width=24)
                          ->  Seq Scan on t24 users  (cost=302.51 rows=6955 width=24)
                      ->  Hash  (cost=1.10 rows=3 width=12)
                        ->  Seq Scan on t21 studymodes  (cost=1.10 rows=3 width=12)
                    ->  Hash  (cost=1.53 rows=16 width=12)
                      ->  Seq Scan on t29 periods  (cost=1.53 rows=16 width=12)
                  ->  Hash  (cost=1.33 rows=10 width=12)
                    ->  Seq Scan on t23 campuses  (cost=1.33 rows=10 width=12)
                ->  Hash  (cost=10.32 rows=252 width=12)
                  ->  Seq Scan on t28 courses  (cost=10.32 rows=252 width=12)
          ->  Seq Scan  (cost=17962.40 rows=29831 width=48)
            ->  Sort  (cost=17962.40 rows=29831 width=48)
              ->  Seq Scan on t27 offerings  (cost=1457.42 rows=29831 width=48)
        ->  Index Scan using t26_unq_0 on t26 subjects  (cost=894.95 rows=13439 width=24)

NOTICE: _outNode: don't know how to print type 21

Hope this means something to you,
Doug.
Re: [INTERFACES] Back end memory consumption (was Front end memory consumption in SELECT)
From: Tom Lane
Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> If I hadn't checked that I have indexes on
> all the attributes (and attribute combinations) that make up the keys
> needed for the join, I would think the back end was trying to read in
> and sort a large part of the data just for the query!

Well, of course it *is* ... the trick is to make sure that anything really large ends up on disk (in a temp file) and not in memory.

Offhand the plan looks pretty reasonable for a 9-way join. I see that you have two sorts and five hashes going on here. Each of those will feel entitled to use as much memory as whatever your -S setting is, which IIRC is 512K by default. But 3.5meg altogether isn't enough to explain your memory usage ... unless you are using a larger-than-normal -S switch?

Also, while the sorts are pretty certain to honor your -S limit (plus or minus some slop), the hashes are much more probabilistic affairs. If a table is much larger than the planner guesses, or if the distribution of tuple values is very uneven, a hash join might use a lot more memory than -S. It's hard to tell whether this might be happening without a lot more info than EXPLAIN provides, however. EXPLAIN shows that your tables being hashed are all pretty small, the largest being 't24 users' at 6955 rows. If that's accurate then I doubt hash overrun is the explanation...

Anyone have any other ideas?

			regards, tom lane
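For anyone wanting to experiment with the -S setting mentioned above: -S is a per-backend option (the sort/hash memory budget, in kilobytes), and with a libpq-based client such as the Perl Pg module it can usually be passed through via the PGOPTIONS environment variable at connection time. The snippet below is only a sketch under that assumption for this 6.5-era setup; the value of 256 and the database name are made up, and whether shrinking -S actually caps the kind of growth Doug is seeing is exactly what is in question in this thread.

    #!/usr/bin/perl -w
    use strict;
    use Pg;

    # Sketch: lower the per-backend sort/hash memory budget (-S, in KB)
    # for this connection only.  libpq reads PGOPTIONS and hands the
    # options to the backend at startup; Tom's quoted default is 512K.
    $ENV{PGOPTIONS} = '-S 256';

    my $conn = Pg::connectdb("dbname=test");
    die $conn->errorMessage unless $conn->status == PGRES_CONNECTION_OK;

    # Each sort node in a plan should now aim to stay within ~256K
    # before spilling to temporary files; as Tom notes, hash joins are
    # much less strict about honouring this limit than sorts are.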
Re: [INTERFACES] Back end memory consumption (was Front end memory consumption in SELECT)
From: Douglas Thomson
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Douglas Thomson <dougt@mugc.cc.monash.edu.au> writes:
> > If I hadn't checked that I have indexes on
> > all the attributes (and attribute combinations) that make up the keys
> > needed for the join, I would think the back end was trying to read in
> > and sort a large part of the data just for the query!
>
> Well, of course it *is* ... the trick is to make sure that anything
> really large ends up on disk (in a temp file) and not in memory.

What I don't understand is, why would it bother trying to read in lots of data and sort it when it already has an index sorted in the required order? I was trying to set up the individual table indexes so that the join could be done efficiently, and not need either lots of memory or temporary disk files. Do indexes not get used for joins?

> Offhand the plan looks pretty reasonable for a 9-way join. I see that
> you have two sorts and five hashes going on here. Each of those will
> feel entitled to use as much memory as whatever your -S setting is,
> which IIRC is 512K by default. But 3.5meg altogether isn't enough to
> explain your memory usage ... unless you are using a larger-than-normal
> -S switch?

No, I use the default.

> Also, while the sorts are pretty certain to honor your -S limit (plus or
> minus some slop), the hashes are much more probabilistic affairs. If a
> table is much larger than the planner guesses, or if the distribution of
> tuple values is very uneven, a hash join might use a lot more memory
> than -S. It's hard to tell whether this might be happening without a
> lot more info than EXPLAIN provides, however. EXPLAIN shows that your
> tables being hashed are all pretty small, the largest being
> 't24 users' at 6955 rows. If that's accurate then I doubt hash overrun
> is the explanation...

6955 rows is accurate for that table. Values could well be highly skewed though. The joining attribute for t24 is a student ID "number" text field, and the numbers will be grouped.

Doug.