Re: Erratically behaving query needs optimization - Mailing list pgsql-performance

From Barbu Paul - Gheorghe
Subject Re: Erratically behaving query needs optimization
Date
Msg-id CAB14va8NdtNqz9aNqrpm4yEkBkwFX5V5jcjAUKf-T=XGsS-MOA@mail.gmail.com
Whole thread Raw
In response to Re: Erratically behaving query needs optimization  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
On Tue, Sep 3, 2019 at 12:57 AM Jeff Janes <jeff.janes@gmail.com> wrote:
>
> On Mon, Aug 26, 2019 at 4:26 AM Barbu Paul - Gheorghe <barbu.paul.gheorghe@gmail.com> wrote:
>>
>> On Sun, Aug 25, 2019 at 5:51 PM Jeff Janes <jeff.janes@gmail.com> wrote:
>> >
>> > Yes, it certainly looks like it is due to cold caches.  But you say it is slow at first, and then say it varies
greatlyduring a run.  Is being slow at first the only way it varies greatly, or is there large variation even beyond
that?
>>
>> There is a great variation in run times (hundreds of ms to several
>> seconds) even beyond the start of the server.
>> The query runs several times with a different device_id, object_id and
>> another list of attribute_ids and it varies from one another.
>
>
> If you run the exact same query (with the same parameters) once the cache is hot, is the performance than pretty
consistentwithin a given parameterization?  Or is still variable even within one parameterization. 
>
> If they are consistent, could you capture a fast parameterizaton and a slow parameterization and show then and the
plansor them? 

Cannot test right now, but I think I had both cases.
In the same parametrization I had both fast and slow runs and of
course it varied when changed parametrization.

>>
>> EXPLAIN (ANALYZE,BUFFERS)
>>  SELECT DISTINCT ON (results.attribute_id) results.timestamp,
>> results.data FROM results
>>  WHERE
>>      results.data <> '<NullData/>'
>>      AND results.data IS NOT NULL
>>      AND results.object_id = 1955
>>      AND results.attribute_id IN (4, 5) -- possibly a longer list here
>>      AND results.data_access_result = 'SUCCESS'
>>  ORDER BY results.attribute_id, results.timestamp DESC
>>  LIMIT 2 -- limit by the length of the attributes list
>>
>> Limit  (cost=166793.28..167335.52 rows=2 width=54) (actual
>> time=134783.510..134816.941 rows=2 loops=1)
>>   Buffers: shared hit=19086 read=46836, temp read=1522 written=3311
>>   ->  Unique  (cost=166793.28..168420.01 rows=6 width=54) (actual
>> time=134783.507..134816.850 rows=2 loops=1)
>>         Buffers: shared hit=19086 read=46836, temp read=1522 written=3311
>>         ->  Sort  (cost=166793.28..167606.64 rows=325346 width=54)
>> (actual time=134783.505..134802.602 rows=205380 loops=1)
>>               Sort Key: attribute_id, "timestamp" DESC
>
>
> Do you have an index on (attribute_id, "timestamp" DESC)?  That might really help if it can step through the rows
alreadysorted, filter out the ones that need filtering out (building the partial index might help here), hit the other
twotables for each of those rows using a nested loop, and stop after 2 rows which meet those conditions.  The problem
isif you have to step through an enormous number for rows before finding 2 of them with device_id=97. 

I tried that index and it wasn't used, it still chose to do an
in-memory quicksort of ~600 kB. I wonder why?

>>
>> So maybe I should de-normalize and place the device_id column into the
>> "results" table and add it to the index in your suggestion above?
>
>
> Yes, if nothing else works, that should.  How hard would it be to maintain that column in the correct state?

In the end I used this solution. It works ... fine, still I see slow
response times when the caches are cold, but afterwards things seem to
be fine (for now at least).
I had this in mind for a while, but wasn't convinced it was "good
design" since I had to denormalize the DB, but seeing the erratic
behaviour of the query, I finally gave up on using smart indices
trying to satisfy the planner.

It's also the first time I do this outside of a controlled learning
environment so there could be things that I missed.

Thanks for the help, all of you!

> Cheers,
>
> Jeff



--

Barbu Paul - Gheorghe



pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: Erratically behaving query needs optimization
Next
From: Merlin Moncure
Date:
Subject: Re: UPGRADE TO PG11 CAUSED DEGREDATION IN PERFORMANCE