Thread: [PERFORMANCE] Insights: fseek OR read_cluster?

[PERFORMANCE] Insights: fseek OR read_cluster?

From: Antonio Rodriges
Date: 24 September 2011, 06:49:49

Hello,

I am curious how PostgreSQL reads the table file. Do its indexes
store/use the locations of the filesystem clusters containing the
required data (so it can issue a low-level cluster read), or does it
just fseek inside a file?

Thank you

Re: [PERFORMANCE] Insights: fseek OR read_cluster?

From: Craig Ringer
Date: 26 September 2011, 08:26:59

On 24/09/2011 2:49 PM, Antonio Rodriges wrote:
> Hello,
>
> I am curious how PostgreSQL reads the table file. Do its indexes
> store/use the locations of the filesystem clusters containing the
> required data (so it can issue a low-level cluster read), or does it
> just fseek inside a file?

What is read_cluster()? Are you talking about some kind of async
and/or direct I/O? If so, PostgreSQL is not designed for direct I/O, it
benefits from using the OS's buffer cache, I/O scheduler, etc.

IIRC Pg uses pread() to read from its data files, but I didn't
double-check the sources to make sure.
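A minimal sketch of what a pread()-based block read looks like, assuming fixed-size 8192-byte blocks (PostgreSQL's default BLCKSZ, a compile-time setting). The function name `read_block` is hypothetical and this is not PostgreSQL's actual storage-manager code. The point is that the application only computes a byte offset (block number times block size); the kernel and filesystem decide which physical sectors hold those bytes, and a cached page may not touch the disk at all.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed block size: PostgreSQL's default BLCKSZ is 8192 bytes. */
#define BLOCK_SIZE 8192

/* Hypothetical helper: read block number blkno of an open file into buf
 * (which must hold BLOCK_SIZE bytes). pread() takes the offset as an
 * argument, so no separate lseek() is needed and the file descriptor's
 * current offset is left unchanged.
 * Returns the number of bytes read (0 at end of file) or -1 on error. */
ssize_t read_block(int fd, off_t blkno, char *buf)
{
    off_t offset = blkno * (off_t) BLOCK_SIZE;
    size_t done = 0;

    while (done < BLOCK_SIZE) {
        ssize_t n = pread(fd, buf + done, BLOCK_SIZE - done,
                          offset + (off_t) done);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;                 /* end of file: short or empty block */
        done += (size_t) n;
    }
    return (ssize_t) done;
}
```

Because the offset is passed explicitly, several threads or processes sharing one descriptor can read different blocks without racing on a shared file position.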

--
Craig Ringer

Re: [PERFORMANCE] Insights: fseek OR read_cluster?

From: Antonio Rodriges
Date: 26 September 2011, 12:51:25

Thank you, Craig; your answers are always insightful.

> What is read_cluster()? Are you talking about some kind of async and/or

I meant that, when you want to read a chunk of data from a file, you
(1) might skip the traditional fseek and instead remember the hard-drive
cluster numbers to speed up disk seeks, and (2) read the disk cluster
directly.

> direct I/O? If so, PostgreSQL is not designed for direct I/O, it benefits
> from using the OS's buffer cache, I/O scheduler, etc.
>
> IIRC Pg uses pread() to read from its data files, but I didn't
> double-check the sources to make sure.

Re: [PERFORMANCE] Insights: fseek OR read_cluster?

From: Marti Raudsepp
Date: 26 September 2011, 19:07:30

On Mon, Sep 26, 2011 at 15:51, Antonio Rodriges <antonio.rrz@gmail.com> wrote:
>> What is read_cluster()? Are you talking about some kind of async and/or
>
> I meant that, when you want to read a chunk of data from a file, you
> (1) might skip the traditional fseek and instead remember the hard-drive
> cluster numbers to speed up disk seeks, and (2) read the disk cluster
> directly.

PostgreSQL accesses regular files on a file system via lseek(), read()
and write() calls, no magic.
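For comparison with a single pread() call, here is a sketch of the two-step lseek() + read() pattern, again assuming 8192-byte blocks; `read_block_lseek` is a hypothetical name, not a PostgreSQL function. lseek() does no disk I/O at all: it only updates the file offset stored in the kernel. The translation from that offset to physical disk blocks happens later, inside the kernel, when read() runs.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed block size for illustration (PostgreSQL's default BLCKSZ). */
#define BLOCK_SIZE 8192

/* Hypothetical helper: position the descriptor with lseek(), then read
 * one block. lseek() merely sets the kernel-side file offset; read()
 * then fetches bytes from that offset, and the filesystem code (not the
 * application) maps the offset to disk locations.
 * Returns bytes read (0 at end of file) or -1 on error. */
ssize_t read_block_lseek(int fd, off_t blkno, char *buf)
{
    if (lseek(fd, blkno * (off_t) BLOCK_SIZE, SEEK_SET) == (off_t) -1)
        return -1;

    size_t done = 0;
    while (done < BLOCK_SIZE) {
        ssize_t n = read(fd, buf + done, BLOCK_SIZE - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted: retry the read */
            return -1;
        }
        if (n == 0)
            break;                 /* end of file */
        done += (size_t) n;        /* read() advanced the offset itself */
    }
    return (ssize_t) done;
}
```

The cost of the extra lseek() is one cheap system call; it is the read() that may (or, on a page-cache hit, may not) reach the disk.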

In modern extent-based file systems, mapping a file offset to a
physical disk sector is very fast -- compared to the time of actually
accessing the disk.
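A toy model of why that mapping is cheap, assuming a simplified, hypothetical extent list (the `struct extent` layout and `map_block` function below are illustrative only, not the on-disk format of ext4, XFS, or any real filesystem). Each extent maps a contiguous run of logical file blocks to a contiguous run of disk blocks, so finding the physical block for a file offset is a binary search over a few in-memory entries, a handful of comparisons versus milliseconds for a seek on a spinning disk.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: a contiguous run of file blocks stored in a
 * contiguous run of disk blocks. */
struct extent {
    uint64_t logical_start;   /* first file block covered */
    uint64_t physical_start;  /* disk block holding logical_start */
    uint64_t length;          /* number of blocks in the run */
};

/* Binary-search a list of extents, sorted by logical_start and
 * non-overlapping, for the extent covering file block lblk.
 * Returns 0 and stores the disk block in *pblk on success, or -1 if
 * lblk lies in a hole (no extent covers it). */
int map_block(const struct extent *ext, size_t n, uint64_t lblk,
              uint64_t *pblk)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (lblk < ext[mid].logical_start)
            hi = mid;                       /* look left */
        else if (lblk >= ext[mid].logical_start + ext[mid].length)
            lo = mid + 1;                   /* look right */
        else {
            *pblk = ext[mid].physical_start
                    + (lblk - ext[mid].logical_start);
            return 0;
        }
    }
    return -1;
}
```

With a mostly contiguous file the list stays short, which is one reason an application gains little by remembering "cluster numbers" itself.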

I can't see how direct cluster access would even work unless you gave
the database direct access to a raw partition, in which case Postgres
would effectively have to implement its own file system. The gains are
simply not worth it for Postgres; our developer resources are better
spent elsewhere.

Regards,
Marti

Re: [PERFORMANCE] Insights: fseek OR read_cluster?

From: Antonio Rodriges
Date: 27 September 2011, 16:12:44

Thank you, Marti,

Is there a comprehensive survey of (at least most, if not all) modern
operating-system features, for example I/O scheduling, extent-based
filesystems, etc.?