Problem with joining queries. - Mailing list pgsql-performance

From Konstantinos Krikellas
Subject Problem with joining queries.
Date
Msg-id 1171544313.31678.22.camel@kkrik-desktop
Whole thread Raw
Responses Re: Problem with joining queries.
List pgsql-performance
Hi,

I am using PostgreSQL for benchmarking and I study the following query:

SELECT *
FROM k10_1, k10_2, k10_3, k10_4, k10_5
WHERE k10_1.times4 = k10_2.times4
AND   k10_2.times4 = k10_3.times4
AND   k10_3.times4 = k10_4.times4
AND   k10_4.times4 = k10_5.times4

The used schema for all the tables is:

Column  |     Type      | Modifiers
---------+---------------+-----------
unique1 | integer       | not null
unique2 | integer       |
times4  | integer       |
times10 | integer       |
ten     | integer       |
stringu | character(52) |
Indexes:
    "k10_*_pkey" PRIMARY KEY, btree (unique1)


Each table has 10000 tuples of 72 bytes each (constant). The field times4 in every table is valued in [0,2500), each value appearing exactly four times but in random order. It is easy to extract that the result has exactly 2,560,000 tuples or approximate size 185 MB. The database has been 'VACUUM FULL'-ed and is static.

When I execute this query with EXPLAIN ANALYSE, the query is executed in 10-20 sec and consumes only 8Mb of memory, depending to the machine (I have tried it on P4-2.0GHz, P4-2.2GHz and Athlon 4200++ 64x2, all with 2 Gb RAM and Linux OS, Ubuntu Edgy or Fedora 6). However, when I execute exactly the same query normally and direct the output to /dev/null, PostgreSQL consumes all the available memory (RAM and swap), and the query cannot be executed, as I receive the message 'Out of memory'. The same thing happened to all the machines. I have tried to adjust working memory and shared buffers but it still performed in the same way.

Since this is not exactly an extreme query, as its input is 5 tables with 10 thousands tuples and its output is 2.6 millions, it seems that a problem exists in this case. I would like to pose the following questions:

1. Why PostgreSQL fails to execute the query? Is there any parameter that specifies when the buffer manager tries to store intermediate and final results to the disc and how much of disk space it can occupy for temporary results?

2. How does EXPLAIN ANALYSE work? Does it create all the intermediate results as in the normal execution? Does it call the print function at the final result and direct the output to /dev/null or it doesn't call it at all? This is important as, if the cardinality of the final result is high, the print function callls impose a significant penalty on execution time

3. Is there any way of executing a query without materialising the final result but only the intermediate results, if the query plans demand it?

I hope you can enlighten me with these questions.

Kind regards,
Konstantinos Krikellas
PhD student, Database Group
University of Edinburgh
Email: K.Krikellas@sms.ed.ac.uk
Pnone number: +44 (0) 131 651 3769 

pgsql-performance by date:

Previous
From: Brian Herlihy
Date:
Subject: Re: An unwanted seqscan
Next
From: "Merlin Moncure"
Date:
Subject: Re: Benchmarking PGSQL?