Home > mailing lists

Re: Why could different data in a table be processed with different performance? - Mailing list pgsql-performance

From	Vladimir Ryabtsev
Subject	Re: Why could different data in a table be processed with different performance?
Date	September 25, 2018 01:28:15
Msg-id	CAMqTPqnUB54YXQutPKPWdoUKEZOnV58WndfTvZagrNLiTr6RHw@mail.gmail.com Whole thread Raw
In response to	Re: Why could different data in a table be processed with differentperformance? (Fabio Pardi <f.pardi@portavita.eu>)
Responses	Re: Why could different data in a table be processed with differentperformance? Re: Why could different data in a table be processed with differentperformance?
List	pgsql-performance

Tree view

> You can create 2 partial indexes and the planner will pick it up for you. (and the planning time will go a bit up).
Created two partial indexes and ensured planner uses it. But the result is still the same, no noticeable difference.

> it is not unusual to have 1GB cache or more... and do not forget to drop the cache between tests + do a sync
I conducted several long runs of dd, so I am sure that this numbers are fairly correct. However, what worries me is that I test sequential read speed while during my experiments Postgres might need to read from random places thus reducing real read speed dramatically. I have a feeling that this can be the reason.
I also reviewed import scripts and found the import was done in DESCENDING order of IDs. It was so to get most recent records sooner, may be it caused some inefficiency in the storage... But again, it was so for both ranges.

> - how big is your index?
pg_table_size('articles_pkey') = 1561 MB

> - how big is the table?
pg_table_size('articles') = 427 GB
pg_table_size('pg_toast.pg_toast_221558') = 359 GB

> - given the size of shared_buffers, almost 2M blocks should fit, but you say 2 consecutive runs still are hitting the disk. That's strange indeed since you are using way more than 2M blocks.

TBH, I cannot say I understand your calculations with number of blocks... But to clarify: consecutive runs with SAME parameters do NOT hit the disk, only the first one does, consequent ones read only from buffer cache.

> Did you check that perhaps are there any other processes or cronjobs (on postgres and on the system) that are maybe reading data and flushing out the cache?

I checked with iotop than nothing else reads intensively from any disk in the system. And again, the result is 100% reproducible and depends on ID range only, if there were any thing like these I would have noticed some fluctuations in results.

> You can make use of pg_buffercache in order to see what is actually cached.

It seems that there is no such a view in my DB, could it be that the module is not installed?

> - As Laurenz suggested (VACUUM FULL), you might want to move data around. You can try also a dump + restore to narrow the problem to data or disk
I launched VACUUM FULL, but it ran very slowly, according to my calculation it might take 17 hours. I will try to do copy data into another table with the same structure or spin up another server, and let you know.

> - You might also want to try to see the disk graph of Windows, while you are running your tests. It can show you if data (and good to know how much) is actually fetching from disk or not.
I wanted to do so but I don't have access to Hyper-V server, will try to request credentials from admins.

Couple more observations:

1) The result of my experiment is almost not affected by other server load. Another user was running a query (over this table) with read speed ~130 MB/s, while with my query read at 1.8-2 MB/s.

2) iotop show higher IO % (~93-94%) with slower read speed (though it is not quite clear what this field is). A process from example above had ~55% IO with 130 MB/s while my process had ~93% with ~2MB/s.

Regards,

Vlad

pgsql-performance by date:

From: Andres Freund
Date: 24 September 2018, 22:51:53
Subject: Re: Explain is slow with tables having many columns

From: Justin Pryzby
Date: 25 September 2018, 02:34:23
Subject: Re: Why could different data in a table be processed with differentperformance?

Re: Why could different data in a table be processed with different performance? - Mailing list pgsql-performance

Previous

Next