Thread: Problem with join queries.
Hi,
I am using PostgreSQL for benchmarking, and I am studying the following query:
SELECT *
FROM k10_1, k10_2, k10_3, k10_4, k10_5
WHERE k10_1.times4 = k10_2.times4
AND k10_2.times4 = k10_3.times4
AND k10_3.times4 = k10_4.times4
AND k10_4.times4 = k10_5.times4
The schema used for all the tables is:
Column | Type | Modifiers
---------+---------------+-----------
unique1 | integer | not null
unique2 | integer |
times4 | integer |
times10 | integer |
ten | integer |
stringu | character(52) |
Indexes:
"k10_*_pkey" PRIMARY KEY, btree (unique1)
Each table has 10,000 tuples of 72 bytes each (constant). The field times4 in every table takes values in [0, 2500), each value appearing exactly four times, in random order. It follows that the result has exactly 2,560,000 tuples (2,500 distinct values, each contributing 4^5 = 1,024 combinations), or an approximate size of 185 MB. The database has been 'VACUUM FULL'-ed and is static.
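As a sanity check that avoids shipping any rows to the client, the cardinality can be confirmed entirely on the server side:

SELECT count(*)
FROM k10_1, k10_2, k10_3, k10_4, k10_5
WHERE k10_1.times4 = k10_2.times4
  AND k10_2.times4 = k10_3.times4
  AND k10_3.times4 = k10_4.times4
  AND k10_4.times4 = k10_5.times4;
-- expected result: 2560000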
When I execute this query with EXPLAIN ANALYSE, it runs in 10-20 seconds and consumes only 8 MB of memory, depending on the machine (I have tried it on a P4 2.0 GHz, a P4 2.2 GHz, and an Athlon 64 X2 4200+, all with 2 GB RAM, running Linux, Ubuntu Edgy or Fedora 6). However, when I execute exactly the same query normally and direct the output to /dev/null, PostgreSQL consumes all the available memory (RAM and swap) and the query cannot be executed; I receive the message 'Out of memory'. The same thing happens on all the machines. I have tried adjusting work_mem and shared_buffers, but the behaviour remains the same.
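For reference, the adjustments were along these lines (the values shown here are illustrative, not the exact ones I used):

-- per session, before running the query:
SET work_mem = '64MB';
-- in postgresql.conf (requires a server restart):
-- shared_buffers = 128MB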
Since this is not exactly an extreme query (its input is five tables of 10,000 tuples each and its output is 2.6 million tuples), it seems that a problem exists in this case. I would like to pose the following questions:
1. Why does PostgreSQL fail to execute the query? Is there a parameter that specifies when the buffer manager spills intermediate and final results to disk, and how much disk space it may occupy for temporary results?
2. How does EXPLAIN ANALYSE work? Does it create all the intermediate results, as in normal execution? Does it call the print function on the final result and direct the output to /dev/null, or does it not call it at all? This is important because, if the cardinality of the final result is high, the print-function calls impose a significant penalty on execution time.
3. Is there any way of executing a query without materialising the final result, but only the intermediate results if the query plan demands it?
I hope you can enlighten me on these questions.
Kind regards,
Konstantinos Krikellas
PhD student, Database Group
University of Edinburgh
Email: K.Krikellas@sms.ed.ac.uk
Phone number: +44 (0) 131 651 3769
Konstantinos Krikellas wrote:
> Hi,
>
> I am using PostgreSQL for benchmarking, and I am studying the following query:
>
> SELECT *
> FROM k10_1, k10_2, k10_3, k10_4, k10_5
> WHERE k10_1.times4 = k10_2.times4
> AND k10_2.times4 = k10_3.times4
> AND k10_3.times4 = k10_4.times4
> AND k10_4.times4 = k10_5.times4
>
> Each table has 10,000 tuples of 72 bytes each (constant). The field

That's 72 bytes plus about 24 for each row header. Plus 4 bytes for the text length, and you can't assume each character is only one byte if you're using UTF-8 or similar.

> times4 in every table takes values in [0, 2500), each value appearing
> exactly four times, in random order. It follows that the result has
> exactly 2,560,000 tuples, or an approximate size of 185 MB. The
> database has been 'VACUUM FULL'-ed and is static.
>
> When I execute this query with EXPLAIN ANALYSE, it runs in 10-20
> seconds and consumes only 8 MB of memory, depending on the machine (I
> have tried it on a P4 2.0 GHz, a P4 2.2 GHz, and an Athlon 64 X2
> 4200+, all with 2 GB RAM, running Linux, Ubuntu Edgy or Fedora 6).
> However, when I execute exactly the same query normally and direct the
> output to /dev/null, PostgreSQL consumes all the available memory (RAM
> and swap) and the query cannot be executed; I receive the message 'Out
> of memory'. The same thing happens on all the machines. I have tried
> adjusting work_mem and shared_buffers, but the behaviour remains the
> same.

Not happening here (8.2.x, output redirected using "\o /dev/null") - are you sure it's not psql (or whatever client) that's using up your memory, as it tries to build the entire result set before sending it to /dev/null? Don't forget, you've got 5 copies of the columns, so that would be ~700 MB.

If it is the backend, you'll need to give some of the tuning parameters you're using, since it works here on my much smaller dev server (1 GB RAM, with plenty of other stuff using it).

--
Richard Huxton
Archonet Ltd
You are absolutely right about the result size.
I tried '\o /dev/null' and it worked; I had not realised that the client buffers the entire result.
Previously, I was executing the command 'psql < query.sql > /dev/null', so it was the psql process that consumed all the available memory.
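For anyone hitting the same problem: the result can also be pulled through a cursor, so that only a chunk of rows is held in client memory at a time (a sketch; the chunk size of 10000 is arbitrary):

BEGIN;
DECLARE c NO SCROLL CURSOR FOR
SELECT *
FROM k10_1, k10_2, k10_3, k10_4, k10_5
WHERE k10_1.times4 = k10_2.times4
  AND k10_2.times4 = k10_3.times4
  AND k10_3.times4 = k10_4.times4
  AND k10_4.times4 = k10_5.times4;
FETCH FORWARD 10000 FROM c;  -- repeat until no rows are returned
CLOSE c;
COMMIT;

If I read the 8.2 psql documentation correctly, '\set FETCH_COUNT 1000' makes psql do this automatically.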
Thank you very much for your prompt and accurate response.
Kind regards,
Konstantinos Krikellas
PhD student, Database Group
University of Edinburgh
Email: K.Krikellas@sms.ed.ac.uk
Phone number: +44 (0) 131 651 3769