Thread: Too many blocks read
Hello everybody,

I'm doing some tests with a large table of about 7 million tuples. I need to retrieve only the last value for some key. That key has about 20,000 tuples in this table.

SELECT field1
FROM table_7milions
WHERE field1 = 'my_key'
ORDER BY field1 DESC
LIMIT 1

The statistics tables show that Postgres read about 782,656 blocks from disk for the table plus another 24,704 blocks from disk for the index. A simple SELECT is reading about 98 MB from disk and putting it into shared memory.

So I did a test like this:

-- I created a partial table for that key
SELECT * INTO part_table
FROM table_7milions
WHERE field1 = 'my_key'

-- Now I run the same select over the same 20,000 tuples, but on part_table
SELECT field1
FROM part_table
WHERE field1 = 'my_key'
ORDER BY field1 DESC
LIMIT 1

Now the statistics show that Postgres read 54,016 blocks from disk, all for the table, because it has no index. The same select reads about 6 MB from disk into shared memory.

I think this happens because in the 7-million-row table the same 8 kB block holds records with different keys, so only a few records with 'my_key' are retrieved each time I read an 8 kB block. In part_table, all the records stored in an 8 kB block have 'my_key', so it is much better optimized.

My question: is there a way to defragment my 7-million-row table so that all records with the same key end up in the same 8 kB blocks? How can I do that? If there isn't, I think it would be a good feature for a future version.

Thank you,

Waldomiro Caraiani
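For reference, one way to observe the block counts the post describes is the standard statistics view pg_statio_user_tables (a sketch; the table name is taken from the post, and pg_stat_reset() clears all counters for the database, not just this table):

```sql
-- Zero the statistics counters, run the query, then read the counters back
SELECT pg_stat_reset();

SELECT field1
FROM table_7milions
WHERE field1 = 'my_key'
ORDER BY field1 DESC
LIMIT 1;

-- heap_blks_read / idx_blks_read count blocks read from disk;
-- heap_blks_hit / idx_blks_hit count blocks found in shared buffers
SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'table_7milions';
```

On sufficiently recent PostgreSQL versions, EXPLAIN (ANALYZE, BUFFERS) gives the same information per query without resetting global counters.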
On Wed, 18 Nov 2009, Waldomiro wrote:

> So, I need to retrieve only the last value for some key. That key has about
> 20,000 tuples in this table.
>
> SELECT field1
> FROM table_7milions
> WHERE field1 = 'my_key'
> ORDER BY field1 DESC
> LIMIT 1

What's the point of this query? You are forcing PostgreSQL to read in all the rows where field1 = 'my_key' so that they can be sorted, but the sort order will be completely unpredictable because all the values will be the same. If you just want to grab any matching row, remove the ORDER BY and it will return the first one it finds.

Matthew

--
The best way to accelerate a Microsoft product is at 9.8 metres per second per second.
        - Anonymous
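Matthew's suggestion amounts to dropping the redundant sort, for example:

```sql
-- Any matching row will do: field1 is the only selected column,
-- and every match holds the same value, so no ordering is needed
SELECT field1
FROM table_7milions
WHERE field1 = 'my_key'
LIMIT 1;
```

This lets the planner stop after the first matching row instead of fetching and sorting all 20,000 of them.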
Waldomiro wrote:
> ...
> I think this happens because in the 7-million-row table the same 8k
> block has different records with different keys, so only a few records
> with 'my_key' are retrieved when I read an 8k block.
> In part_table, all records stored in an 8k block have 'my_key', so
> it's much better optimized.
>
> My question: is there a way to defragment my 7-million-row table to put all
> records with the same key in the same 8k block?

Read about the "CLUSTER ON index-name" SQL command. It does exactly what you're asking.

Craig
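A sketch of Craig's suggestion, assuming an index on field1 exists (the index name here is hypothetical):

```sql
-- Rewrite the table physically ordered by the index on field1,
-- so rows sharing a key end up packed into the same 8 kB blocks
CLUSTER table_7milions USING table_7milions_field1_idx;

-- On older releases, the equivalent form is:
-- CLUSTER table_7milions_field1_idx ON table_7milions;

-- Refresh planner statistics after the rewrite
ANALYZE table_7milions;
```

Note that CLUSTER takes an exclusive lock on the table while it rewrites it, and the ordering is not maintained for rows inserted afterwards, so it needs to be rerun periodically on tables with heavy write traffic.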
In response to Waldomiro:

> I'm thinking it happens because in the 7-million-row table, the same 8k
> block has different records with different keys, so only a few records
> with 'my_key' are retrieved when I read an 8k block.
> In part_table, all records stored in an 8k block have 'my_key', so
> it's much better optimized.
>
> My question: is there a way to defragment my 7-million-row table to put all
> records with the same key in the same 8k block?
>
> How can I do that?

CLUSTER your table:
http://www.postgresql.org/docs/current/static/sql-cluster.html

Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99