Thread: Optimize a big matrix select

From
Florian Schaetz
Date:

Hello everyone,

my name is Florian (OK, that much was obvious) and I’m using PostgreSQL as well as MySQL for a caching solution on a routing server. One thing I am always trying to do is optimize the database query times, so here I go again…

I have a table myTable with many rows (millions, potentially billions). This table looks approximately like this:

- attribute_id

- source

- destination

- payload1

- payload2

- payload3

All of these columns are integers. The primary key is (attribute_id, source, destination), but there are also single-column indexes on each of the first three columns.
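For reference, the layout described above could be created roughly like this (a sketch reconstructed from the description; the exact types, NOT NULL constraints, and index definitions are assumptions):

```sql
CREATE TABLE myTable (
    attribute_id integer NOT NULL,
    source       integer NOT NULL,
    destination  integer NOT NULL,
    payload1     integer,
    payload2     integer,
    payload3     integer,
    PRIMARY KEY (attribute_id, source, destination)
);

-- single-column indexes on the first three columns, as described
CREATE INDEX ON myTable (attribute_id);
CREATE INDEX ON myTable (source);
CREATE INDEX ON myTable (destination);
```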


What I want to do is get the “matrix” for n source/destination points. For example, if I had the points 1 and 2, I would want the payload for…


source = 1, destination = 1

source = 1, destination = 2

source = 2, destination = 1

source = 2, destination = 2


So, my query, using attribute_id = 1 as an example, looks like this:

select c.source, c.destination, c.payload1, c.payload2, c.payload3 from myTable c
inner join TEMP_TABLE t1 ON t1.point = c.source
inner join TEMP_TABLE t2 ON t2.point = c.destination
where c.attribute_id = 1;
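To see whether the time goes into the join itself or into producing and transferring the rows, the plan and per-node timings can be inspected with EXPLAIN, for example (the BUFFERS option needs PostgreSQL 9.0 or later):

```sql
EXPLAIN (ANALYZE, BUFFERS)
select c.source, c.destination, c.payload1, c.payload2, c.payload3 from myTable c
inner join TEMP_TABLE t1 ON t1.point = c.source
inner join TEMP_TABLE t2 ON t2.point = c.destination
where c.attribute_id = 1;
```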


The TEMP_TABLE simply contains the list of integers as “point” (with an index).
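Such a temp table might be set up along these lines (a sketch; whether the index is a primary key or a plain index is an assumption):

```sql
CREATE TEMP TABLE TEMP_TABLE (point integer PRIMARY KEY);

-- load the ~1700 point ids, e.g.:
INSERT INTO TEMP_TABLE (point) VALUES (1), (2), (3);
```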


With the temp table filled with around 1700 numbers, the result will be approx. 2.9 million entries (1700 * 1700). Unfortunately, it’s quite slow (10-15 secs). With MySQL, the main problem is the fetch, while the query itself only takes milliseconds, but I don’t know where the problem with PostgreSQL is (if any). When only selecting count(*), the whole thing is much faster (1.3 seconds), so I guess the fetch part is the problem here, too.


I have already tried more indexes, sub-queries, IN, = ANY, etc.; nothing seems to speed it up.


Does anyone see a chance to speed this query up significantly?


Regards,


Flo

Re: Optimize a big matrix select

From
Kevin Grittner
Date:
Florian Schaetz <Florian.Schaetz@Optitool.DE> wrote:

> The TEMP_TABLE simply contains the list of integers as “point”
> (with an index).

Did you run ANALYZE on TEMP_TABLE after loading data and before
running your query?

You're not doing a lot of deletes or updates in TEMP_TABLE at any
point, are you?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Optimize a big matrix select

From
Florian Schaetz
Date:
Hello,

> Did you run ANALYZE on TEMP_TABLE after loading data and before running your query?

Nope, not yet. What will the effect be? The temp table is small compared to the main table (and has, as yet, no foreign keys).

> You're not doing a lot of deletes or updates in TEMP_TABLE at any point, are you?

Only one insert. The temp table gets created, used (hopefully, if I get this query to run faster, only once) and then dropped.

Regards, 

Flo

Re: Optimize a big matrix select

From
Kevin Grittner
Date:
Florian Schaetz <Florian.Schaetz@Optitool.DE> wrote:

>> Did you run ANALYZE on TEMP_TABLE after loading data and before
>> running your query?
>
> Nope, not yet. What will the effect be? The temp table is
> comparatively small compared to the main table (and has, yet, no
> foreign keys).

A temp table is never vacuumed or analyzed by autovacuum, so unless
you do so using explicit commands it will have default "magic
numbers" statistics.  You are likely to get a better plan if you
run ANALYZE (or perhaps VACUUM ANALYZE) on the temp table after you
load data and before you use it in a query.
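Concretely, the sequence suggested here would look something like this (a sketch; table layout and placeholder values assumed):

```sql
CREATE TEMP TABLE TEMP_TABLE (point integer PRIMARY KEY);
INSERT INTO TEMP_TABLE (point) VALUES (1), (2), (3);  -- load the real points here
ANALYZE TEMP_TABLE;  -- gives the planner real row counts and value statistics

-- ...then run the matrix query...
```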



Re: Optimize a big matrix select

From
Florian Schaetz
Date:
> A temp table is never vacuumed or analyzed by autovacuum, so unless you 
> do so using explicit commands it will have default "magic numbers" statistics.
>  You are likely to get a better plan if you run ANALYZE (or perhaps VACUUM
>  ANALYZE) on the temp table after you load data and before you use it in a query.

Thanks for the tip. Unfortunately, while testing, this doesn't seem to make any difference, not even when using non-temporary tables (with ANALYZE/VACUUM). Still more than 10 seconds for the fetch, 1-2 for the count. It seems the problem is simply the fact that more than 100MB of data needs to be fetched.
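A back-of-the-envelope calculation supports that: 1700 × 1700 = 2,890,000 rows, each carrying five 4-byte integers plus per-row protocol overhead, so the result set is indeed on the order of 100MB. A lower bound on the raw payload can be computed directly (a sketch reusing the query above):

```sql
-- lower bound on the result size: five 4-byte integers per row,
-- before any per-row wire-protocol overhead
select count(*) * 5 * 4 as payload_bytes
from myTable c
inner join TEMP_TABLE t1 ON t1.point = c.source
inner join TEMP_TABLE t2 ON t2.point = c.destination
where c.attribute_id = 1;
```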

Regards,

Flo