Re: TB-sized databases - Mailing list pgsql-performance

From Matthew
Subject Re: TB-sized databases
Date
Msg-id Pine.LNX.4.58.0712061626370.3731@aragorn.flymine.org
Whole thread Raw
In response to Re: TB-sized databases  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: TB-sized databases
List pgsql-performance
On Thu, 6 Dec 2007, Tom Lane wrote:
> Indeed, and if you've got examples where it's that far off, you should
> report them.

Oo, oo, I have one!

So, this query bit us a while back. We had two tables being joined
together in a query by a key column. The key column was an integer, and
for the first table it had a range from zero to a bazillion. For the
second table, it had a range from half a bazillion to half a bazillion
plus a hundred. The first table had a bazillion rows, and the second table
had only about a hundred. Both tables were well ordered on the key column.
Both tables had an index on the key column.

So, our query was like this:

SELECT * FROM table1, table2 WHERE table1.key = table2.key LIMIT 1

... because we wanted to find out if there were *any* hits between the two
tables. The query took hours. To understand why, let's look at the query
without the LIMIT. For this query, Postgres would perform a nested loop,
iterating over all rows in the small table, and doing a hundred index
lookups in the big table. This completed very quickly. However, adding the
LIMIT meant that suddenly a merge join was very attractive to the planner,
as it estimated the first row to be returned within milliseconds, without
needing to sort either table.

The problem is that Postgres didn't know that the first hit in the big
table would be about half-way through, after doing a index sequential scan
for half a bazillion rows.

We fixed this query by changing it to:

SELECT * FROM table1, table2 WHERE table1.key = table2.key
       AND table1.key >= (SELECT MIN(key) FROM table2)
       AND table1.key <= (SELECT MAX(key) FROM table2)

... which artificially limited the index sequential scan of table2 to
start from the earliest possible hit and only continue to the last
possible hit. This query completed quickly, as the min and max could be
answered quickly by the indexes.

Still, it's a pity Postgres couldn't work that out for itself, having all
the information present in its statistics and indexes. AIUI the planner
doesn't peek into indexes - maybe it should.

Matthew

--
In the beginning was the word, and the word was unsigned,
and the main() {} was without form and void...

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: TB-sized databases
Next
From: Tom Lane
Date:
Subject: Re: TB-sized databases