Re: Digesting explain analyze - Mailing list pgsql-performance

From Ron Mayer
Subject Re: Digesting explain analyze
Msg-id 4B44EC89.7060200@cheapcomplexdevices.com
In response to Digesting explain analyze  (Jesper Krogh <jesper@krogh.cc>)
Responses Re: Digesting explain analyze  (Jesper Krogh <jesper@krogh.cc>)
List pgsql-performance
Jesper Krogh wrote:
> I have a table with somewhere on the order of 100,000,000 rows,
> and all rows are tuples of this form:
>
> (id1,id2,evalue);
>
> Then I'd like to speed up a query like this:
>
> explain analyze select id from table where id1 = 2067 or id2 = 2067 order
> by evalue asc limit 100;
>
> ...The inner sets average 3,000 rows for
> both id1 and id2, and a typical limit would be 100, so if I could convince
> postgresql not to fetch all of them I would reduce the set retrieved
> by around a factor of 60. The dataset is quite large, so a random query
> is not very likely to hit the same part of the dataset again; there is
> going to be a fair amount of going to disk.

If disk seeks are killing you, a kinda crazy idea would be to
duplicate the table: cluster one copy by an index on (id1) and
the other by an index on (id2), and union the results of each.

Since each of these duplicates of the table will be clustered
by the column you're querying it on, it should just take one
seek in each table.
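
A minimal sketch of that setup (assuming the original table is
named t, and using the hypothetical names t1/t2 to match the
query below):

  -- Two physical copies of t, each laid out on disk in the
  -- order of the column it will be searched on.
  create table t1 as select * from t;
  create index t1_id1_idx on t1 (id1);
  cluster t1 using t1_id1_idx;

  create table t2 as select * from t;
  create index t2_id2_idx on t2 (id2);
  cluster t2 using t2_id2_idx;

  analyze t1;
  analyze t2;

Keep in mind that CLUSTER is a one-time physical sort; rows
inserted afterwards are not kept in order, so it would need to be
re-run periodically.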

Then your query could be something like

  select * from (
    (select * from t1 where id1=2067 order by evalue limit 100)
    union
    (select * from t2 where id2=2067 order by evalue limit 100)
  ) as foo order by evalue limit 100;

(The parentheses around each branch are needed; postgresql won't
accept an order by / limit directly before a UNION.)
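
One thing to watch: if a single row can match on both id1 and id2,
the plain UNION's duplicate-elimination is what removes the double
hit; if that can't happen, UNION ALL should be a bit cheaper, since
the outer order by / limit does the trimming anyway.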

Hmm..  and I wonder whether, if you put evalue into the clustering
criteria too (i.e. cluster on (id1,evalue)), you could make it so
the limit finds the right 100 evalues first for each table....
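
A sketch of that variant, again on the hypothetical t1/t2:

  -- With a composite index, rows for a given id are stored
  -- contiguously and already sorted by evalue, so the inner
  -- "order by evalue limit 100" can walk the index and stop
  -- after 100 rows, with no separate sort step.
  create index t1_id1_evalue_idx on t1 (id1, evalue);
  cluster t1 using t1_id1_evalue_idx;

  create index t2_id2_evalue_idx on t2 (id2, evalue);
  cluster t2 using t2_id2_evalue_idx;

If that works, each inner branch should show up in explain analyze
as an index scan that stops early rather than a sort over the full
~3,000-row set.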
