Thread: Problem with joining queries.

Problem with joining queries.

From: Konstantinos Krikellas
Date:
Hi,

I am using PostgreSQL for benchmarking and I am studying the following query:

SELECT *
FROM k10_1, k10_2, k10_3, k10_4, k10_5
WHERE k10_1.times4 = k10_2.times4
AND   k10_2.times4 = k10_3.times4
AND   k10_3.times4 = k10_4.times4
AND   k10_4.times4 = k10_5.times4

The schema used for all the tables is:

Column  |     Type      | Modifiers
---------+---------------+-----------
unique1 | integer       | not null
unique2 | integer       |
times4  | integer       |
times10 | integer       |
ten     | integer       |
stringu | character(52) |
Indexes:
    "k10_*_pkey" PRIMARY KEY, btree (unique1)


Each table has 10,000 tuples of 72 bytes each (constant). The field times4 in every table takes values in [0, 2500), each value appearing exactly four times, in random order. It follows that the result has exactly 2,560,000 tuples, or an approximate size of 185 MB. The database has been 'VACUUM FULL'-ed and is static.
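For reference, the arithmetic behind these figures can be checked directly. Note that 72 bytes is the per-table tuple width; as Richard points out later in the thread, an output row concatenates columns from all five tables and is therefore considerably wider:

```python
# Each of the 2500 distinct times4 values appears exactly 4 times per
# table, so a 5-way equi-join produces 4**5 matching combinations per value.
result_tuples = 2500 * 4**5
print(result_tuples)               # 2560000

# Approximate size using the 72-byte single-table tuple width from the
# thread (this undercounts the real output rows, which span five tables).
print(result_tuples * 72 / 10**6)  # 184.32 (MB), i.e. ~185 MB
```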

When I execute this query with EXPLAIN ANALYSE, it runs in 10-20 seconds and consumes only about 8 MB of memory, depending on the machine (I have tried it on a P4 2.0 GHz, a P4 2.2 GHz and an Athlon 64 X2 4200+, all with 2 GB of RAM and running Linux, Ubuntu Edgy or Fedora 6). However, when I execute exactly the same query normally and direct the output to /dev/null, PostgreSQL consumes all the available memory (RAM and swap) and the query cannot be executed: I receive the message 'Out of memory'. The same thing happened on all the machines. I have tried adjusting work_mem and shared_buffers, but the behaviour did not change.

Since this is not exactly an extreme query - its input is five tables of 10,000 tuples each and its output is 2.6 million tuples - it seems that a problem exists in this case. I would like to pose the following questions:

1. Why does PostgreSQL fail to execute the query? Is there a parameter that specifies when the buffer manager spills intermediate and final results to disk, and how much disk space temporary results may occupy?

2. How does EXPLAIN ANALYSE work? Does it create all the intermediate results, as in normal execution? Does it call the print function on the final result and direct the output to /dev/null, or does it not call it at all? This is important because, if the cardinality of the final result is high, the print function calls impose a significant penalty on execution time.

3. Is there any way to execute a query without materialising the final result, materialising only the intermediate results where the query plan demands it?

I hope you can enlighten me on these questions.

Kind regards,
Konstantinos Krikellas
PhD student, Database Group
University of Edinburgh
Email: K.Krikellas@sms.ed.ac.uk
Phone number: +44 (0) 131 651 3769

Re: Problem with joining queries.

From: Richard Huxton
Date:
Konstantinos Krikellas wrote:
> Hi,
>
> I am using PostgreSQL for benchmarking and I study the following query:
>
> SELECT *
> FROM k10_1, k10_2, k10_3, k10_4, k10_5
> WHERE k10_1.times4 = k10_2.times4
> AND   k10_2.times4 = k10_3.times4
> AND   k10_3.times4 = k10_4.times4
> AND   k10_4.times4 = k10_5.times4

> Each table has 10000 tuples of 72 bytes each (constant). The field

That's 72 bytes plus about 24 for each row header. Plus 4 bytes for the
text length, and you can't assume each character is only one byte if
you're using UTF-8 or similar.

> times4 in every table is valued in [0,2500), each value appearing
> exactly four times but in random order. It is easy to extract that the
> result has exactly 2,560,000 tuples or approximate size 185 MB. The
> database has been 'VACUUM FULL'-ed and is static.
>
> When I execute this query with EXPLAIN ANALYSE, the query is executed in
> 10-20 sec and consumes only 8Mb of memory, depending to the machine (I
> have tried it on P4-2.0GHz, P4-2.2GHz and Athlon 4200++ 64x2, all with 2
> Gb RAM and Linux OS, Ubuntu Edgy or Fedora 6). However, when I execute
> exactly the same query normally and direct the output to /dev/null,
> PostgreSQL consumes all the available memory (RAM and swap), and the
> query cannot be executed, as I receive the message 'Out of memory'. The
> same thing happened to all the machines. I have tried to adjust working
> memory and shared buffers but it still performed in the same way.

Not happening here (8.2.x, output redirected using "\o /dev/null") - are
you sure it's not psql (or whatever client) that's using up your memory,
as it tries to build the entire result set before sending it to
/dev/null? Don't forget, you've got 5 copies of the columns so that
would be ~ 700MB.

If it is the backend, you'll need to give some of the tuning parameters
you're using, since it works here on my much smaller dev server (1GB RAM
and plenty of other stuff using it).
--
   Richard Huxton
   Archonet Ltd

Re: Problem with joining queries.

From: Konstantinos Krikellas
Date:
> Not happening here (8.2.x, output redirected using "\o /dev/null") - are
> you sure it's not psql (or whatever client) that's using up your memory,
> as it tries to build the entire result set before sending it to
> /dev/null? Don't forget, you've got 5 copies of the columns so that
> would be ~ 700MB.
>
> If it is the backend, you'll need to give some of the tuning parameters
> you're using, since it works here on my much smaller dev server (1GB RAM
> and plenty of other stuff using it).

You are absolutely right about the result size.

I tried '\o /dev/null' and it worked; I had not realised the client was buffering the final result.
Until now I executed the command 'psql < query.sql > /dev/null', so the psql process consumed all the available memory.
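To make the distinction concrete, here is a sketch of the two invocations (a minimal illustration; the memory behaviour described in the comments is as reported in this thread):

```sql
-- Shell-level redirection: psql builds the entire formatted result
-- set (several hundred MB here) in client memory before the shell
-- redirection ever sees it:
--   $ psql < query.sql > /dev/null     -- psql process ran out of memory
--
-- psql-level redirection: setting the output target inside psql
-- avoided the client-side blow-up:
\o /dev/null
SELECT *
FROM k10_1, k10_2, k10_3, k10_4, k10_5
WHERE k10_1.times4 = k10_2.times4
  AND k10_2.times4 = k10_3.times4
  AND k10_3.times4 = k10_4.times4
  AND k10_4.times4 = k10_5.times4;
\o          -- restore output to stdout
```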

Thank you very much for your prompt and helpful response.

Kind regards, 
Konstantinos Krikellas
PhD student, Database Group
University of Edinburgh
Email: K.Krikellas@sms.ed.ac.uk
Phone number: +44 (0) 131 651 3769