"John Smith" <sodgodofall@gmail.com> writes:
> Consider the following query:
> postgres=# select count(*) from generate_series(1,100000000000000000);
> Watching vmstat while this query was running suggests that the
> generate_series() result was being materialized to disk first, with
> the count computed afterwards.
Yeah, that's what I'd expect given the way nodeFunctionscan.c works.
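
As a scaled-down illustration of that materialization (not from the
original report, and assuming a server new enough to have EXPLAIN
(ANALYZE, BUFFERS)): the Function Scan's tuplestore spills to a temp
file once it exceeds work_mem, and the spill shows up as temp block
I/O:

postgres=# set log_temp_files = 0;   -- also logs each temp file's size
postgres=# explain (analyze, buffers)
           select count(*) from generate_series(1,100000000);

Look for "Buffers: temp read=... written=..." under the Function Scan
node.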
> A small variation in the query (i.e. placing the generate_series in
> the select list without a from clause) exhibited a different
> behaviour:
> postgres=# select count(*) from (select
> generate_series(1,100000000000000000)) as A;
> This time, Postgres seemed to be trying to build the entire
> generate_series() result set in memory; it kept swapping until it hit
> the swap space limit, at which point it crashed.
Hmm, at worst you should get an "out of memory" error --- that's what
I get. I think you're suffering from the dreaded "OOM kill" disease.
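
To tell the two cases apart: a genuine out-of-memory ERROR leaves a
memory-context statistics dump in the server log, whereas an OOM kill
shows the backend being terminated by signal 9, with a matching entry
in the kernel log, e.g. the usual Linux check:

$ dmesg | grep -i 'out of memory'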
> Interestingly though, when the range in the generate_series() was
> small enough to fit in 4 bytes of memory (e.g.
> generate_series(1,1000000000)), the above query completed while
> consuming only a negligible amount of memory. So, it looked like the
> aggregate computation was being pipelined with the tuples returned
> from generate_series().
It's pipelined either way. But int8 is a pass-by-reference data type,
and it sounds like we have a memory leak for this case.
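
A scaled-down reproducer, if anyone wants to watch it without a
17-digit range (the numbers are only illustrative): force the bigint
variant explicitly and watch the backend's memory while it runs. On a
build where int8 is pass-by-reference, the second form is the one that
should show the leak; the first stays flat.

-- int4 result, pass-by-value:
postgres=# select count(*) from (select generate_series(1,100000000)) as A;
-- int8 result, each value separately palloc'd:
postgres=# select count(*) from (select generate_series(1::bigint,100000000::bigint)) as A;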
> Third question: more generally, is there a way I can ask Postgres to
> report an out-of-memory error the moment it tries to consume more
> than a certain percentage of memory (instead of letting it hit swap
> and eventually die or thrash)?
First, you want to turn off memory overcommit in your kernel;
second, you might find that running the postmaster under conservative
ulimit settings would make the behavior nicer.
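
A concrete starting point on Linux (the ulimit value is only an
example; size it to the machine):

# kernel: disallow overcommit, so big allocations fail up front
# instead of the backend getting OOM-killed later
sysctl -w vm.overcommit_memory=2

# in the script that starts the postmaster: cap per-process virtual
# memory (bash's ulimit -v takes kB; this is ~2GB)
ulimit -v 2097152
pg_ctl start -D "$PGDATA"

With overcommit disabled, a failed allocation comes back as a plain
malloc failure, which the backend reports as an ordinary out-of-memory
ERROR rather than getting killed.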
regards, tom lane