Re: How to read query plan - Mailing list pgsql-performance

From John Arbash Meinel
Subject Re: How to read query plan
Date
Msg-id 42347D5E.9000900@arbash-meinel.com
In response to Re: How to read query plan  (Miroslav Šulc <miroslav.sulc@startnet.cz>)
Responses Re: How to read query plan
List pgsql-performance
Miroslav Šulc wrote:

> Hi John,
>
> thank you for your response.
>
I will comment on things separately.

> John Arbash Meinel wrote:
>
...

> These external tables contain information that are a unique parameter
> of the AdDevice (like Position, Region, County, City etc.), in some
> containing localized description of the property attribute. Some of
> them could be moved into the main table but that would create a
> redundancy, some of them cannot be moved into the main table (like
> information about Partners which is definitely another object with
> respect to AdDevices). I think the names of the tables are
> self-explanatory so it should be clear what each table stores. Is this
> design incorrect?
>
It's actually more of a question as to why you are doing left outer
joins, rather than simple joins.
Are the tables not fully populated? If so, why not?

How are you using this information? Why is it useful to get back rows
that don't have all of their information filled out?
Why is it useful to have so many columns returned? It seems like in most
cases you are only going to be able to use *some* of the information, so
why not create several specialized queries rather than one
get-everything query?
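To make the distinction between the join types concrete, here is a
minimal sketch using two of your tables (the column names
AdDevicesSiteRegionId, Name, and Id are my guesses, not from your
actual schema):

```sql
-- LEFT OUTER JOIN keeps every AdDevicesSites row, even those whose
-- region reference is NULL or dangling:
SELECT s.*, r.Name
  FROM AdDevicesSites s
  LEFT OUTER JOIN AdDevicesSiteRegions r
    ON s.AdDevicesSiteRegionId = r.Id;

-- A plain (inner) join drops rows with no matching region, which often
-- gives the planner more freedom to pick a cheaper plan:
SELECT s.*, r.Name
  FROM AdDevicesSites s
  JOIN AdDevicesSiteRegions r
    ON s.AdDevicesSiteRegionId = r.Id;
```

If every row really does have a region, the two queries return the same
thing, and the inner join is the better choice.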

> In fact, we only need about 30 records at a time but LIMIT can
> speed-up the query only when looking for the first 30 records. Setting
> OFFSET slows the query down.
>
Have you thought about using a cursor instead of limit + offset?
This may not reduce the overall time, but it would let you split up
where the time is spent.
BEGIN;
DECLARE <cursor_name> CURSOR FOR SELECT ... FROM ...;
FETCH FORWARD 30 FROM <cursor_name>;
FETCH FORWARD 30 FROM <cursor_name>;
...
END;

>> I don't
>> really know what you are looking for, but you are joining against enough
>> tables, that I think this query is always going to be slow.
>
>
> In MySQL the query was not so slow and I don't see any reason why
> there should be large differences in SELECT speed. But if the design
> of the tables is incorrect, we will correct it.
>
In the other post I asked about your postgres settings. The defaults are
pretty stingy, so that *might* be an issue.

>> From what I can tell, you have 1 table which has 6364 rows, and you are
>> grabbing all of those rows, and then outer joining it with about 11
>> other tables.
>
>
> Here are the exact numbers:
>
> AdDevicesSites - 6364
> AdDevicesSiteRegions - 15
> AdDevicesSiteCounties - 110
> AdDevicesSiteCities - 124
> AdDevicesSiteStreets - 2858
> AdDevicesSiteStreetDescriptions - 4585
> AdDevicesSiteDistricts - 344
> AdDevicesSiteSizes - 110
> AdDevicesSiteVisibilities - 4
> AdDevicesSitePositions - 3
> AdDevicesSiteStatusTypes - 5
> PartnerIdentifications - 61
> Partners - 61
> CadastralUnits - 13027
>
And if I understand correctly, you consider all of these to be outer
joins. Meaning you want *all* of AdDevicesSites, and whatever info goes
along with it, but there are no restrictions as to what rows you want.
You want everything you can get.

Do you actually need *everything*? You mention only needing 30, what for?

>> I would actually guess that the most expensive parts of the plan are the
>> NESTED LOOPS which when they go to materialize have to do a sequential
>> scan, and they get executed 6364 times. It looks like the other tables
>> are small (only 3-5 rows), so it takes about 0.05 ms for each seqscan,
>> the problem is that because you are doing it 6k times, it ends up taking
>> about 300ms of your time.
>>
>> You could try setting "set enable_nestloop to off".
>> I don't know that it will be faster, but it could be.
>
>
> I have tried that and it resulted in about 2 sec slowdown :-(

Generally, the optimizer *does* select the best query plan, as long as
it has accurate statistics, which it seems to have in this case.
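If you want to double-check that, you can refresh the statistics on the
larger tables and compare the planner's row estimates against reality
(a sketch; substitute your actual query for the elided SELECT):

```sql
-- Refresh the planner's statistics for the biggest tables:
ANALYZE AdDevicesSites;
ANALYZE CadastralUnits;

-- Then re-run the plan and compare "rows=" (the estimate) against
-- "actual rows" in each node:
EXPLAIN ANALYZE SELECT ... ;
```

If the estimates and actuals are close, the planner is working with good
information and the plan it picks is probably about as good as it gets
for this query shape.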

>
>> In general, though, it seems like you should be asking a different
>> question, rather than trying to optimize the query that you have.
>
>
> You mean "how should I improve the design to make the query faster"?
>
There is one possibility if we don't find anything nicer, which is to
create a lazy materialized view. Basically, you run this query once and
store the result in a table. Then when you want to do the SELECT, you
just do it against the unrolled table.
You can then create triggers, etc. to keep the data up to date.
Here is good documentation on it:
http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

It is basically a safe way to de-normalize your data.
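The mechanics look roughly like this (the table name
AdDevicesSitesFlat is made up, and the `...` stands in for your full
12-way join, which wasn't posted):

```sql
-- Build the flattened table once from the expensive query:
CREATE TABLE AdDevicesSitesFlat AS
  SELECT ...              -- your full column list
  FROM AdDevicesSites s
  LEFT JOIN ...;          -- the rest of the joins

-- Reads become a single-table scan, and LIMIT works cheaply:
SELECT * FROM AdDevicesSitesFlat LIMIT 30;

-- Keep it fresh with triggers on the base tables, or just rebuild
-- periodically:
TRUNCATE AdDevicesSitesFlat;
INSERT INTO AdDevicesSitesFlat
  SELECT ... FROM AdDevicesSites s LEFT JOIN ...;
```

The trade-off is write-time cost and some staleness in exchange for
fast reads, which is usually the right trade for a listing screen.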

Also, another thing you can do, instead of using a cursor, is to create
a temporary table with the results of the query, and give it a primary
key that is just a simple counter. Then instead of limit + offset, you
can select * from the temp table where id > 0 and id <= 30; ... where
id > 30 and id <= 60; etc.

It still requires the original query to be run once, though, so it is
not necessarily optimal for you.
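A sketch of that approach (temp_sites is a made-up name, and the column
list and query body are elided since the original SQL wasn't posted):

```sql
-- Materialize the result once, numbering the rows as they arrive:
CREATE TEMPORARY TABLE temp_sites (
  id  serial PRIMARY KEY
  -- , ... the columns your query returns ...
);

INSERT INTO temp_sites (...)          -- column list elided
  SELECT ... FROM AdDevicesSites s LEFT JOIN ...;

-- Page through it by key range instead of LIMIT/OFFSET; each page is
-- an index lookup rather than a re-scan of skipped rows:
SELECT * FROM temp_sites WHERE id >  0 AND id <= 30;
SELECT * FROM temp_sites WHERE id > 30 AND id <= 60;
```

Unlike OFFSET, which has to walk past all the skipped rows each time,
the range condition on the primary key stays cheap no matter how deep
you page.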

>> Can you post the original SQL statement, and maybe describe what you are
>> trying to do?
>
>
> I hope the explanation above is clear and sufficient :-)
>
>>
>> John
>> =:->
>

Unfortunately, I don't really see any obvious problems with your query
in the way that you are using it. The problem is that you are not
applying any selectivity, so postgres has to go to all the tables, get
all the rows, and then try to logically merge them together. It is
doing a hash join, which is generally one of the faster methods, and it
seems to be doing the right thing.

I would be curious to see how mysql was handling this query, to see if
there was something different it was trying to do. I'm also curious how
much of a difference there was.

John
=:->

