Re: Parallel Seq Scan vs kernel read ahead - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Parallel Seq Scan vs kernel read ahead
Msg-id CA+TgmoZ-zE=XsHFnwiK5ZMnGv6WvW+oJnRXTeL=p16X0=nrDeg@mail.gmail.com
In response to Re: Parallel Seq Scan vs kernel read ahead  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Jun 16, 2020 at 6:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I agree that won't be a common scenario but apart from that also I am
> not sure if we can conclude that the proposed patch won't cause any
> regressions.  See one of the tests [1] done by Soumyadeep where the
> patch caused a regression in one of the cases.  Now we can either try
> to improve the patch and make sure we don't cause any regressions, or
> assume that those are minority cases we don't care about.  Another point
> is that this thread has started with a theory that this idea can give
> benefits on certain filesystems and AFAICS we have tested it on one
> other type of system, so not sure if that is sufficient.

Yeah, it seems like those cases might need some more investigation,
but they're also not necessarily an argument for a configuration
setting. It's not so much that I dislike the idea of being able to
configure something here; it's really that I don't want a reloption
that feels like magic. For example, we know that work_mem can be
really hard to configure because there may be no value that's high
enough to make your queries run fast during normal periods but low
enough to avoid running out of memory during busy periods. That kind
of thing sucks, and we should avoid creating more such cases.

One problem here is that the best value might depend not only on the
relation but on the individual query. A GUC could be changed
per-query, but different tables in the query might need different
values. Changing a reloption requires locking, and you wouldn't want
to have to keep changing it for each different query. Now if we figure
out that something is hardware-dependent -- like we come up with a
good formula that adjusts the value automatically most of the time,
but say it needs to be more on SSDs than on spinning disks or the
other way around, well then that's a good candidate for some kind of
setting, maybe a tablespace option. But if it seems to depend on the
query, we need a better idea, not a user-configurable setting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company