Thread: PG index architecture

PG index architecture

From: Andy Colson
Date: 15 July 2014, 20:26:59

I was thinking about indexes, and am kinda curious about sequential access.

I know nothing of PG guts, so this might even be a dumb question.

As I understand indexes, they are key-value pairs that contain a value
and a position.  You look up the value, then use the position to seek
into the database to load the record.
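To make that mental model concrete, here is a toy Python sketch. It assumes the "position" is PostgreSQL's tuple identifier (TID), a block number plus an item offset; the names `TID`, `index`, `heap`, `index_lookup` and `fetch` are all hypothetical, and a real index is a paged on-disk tree rather than a sorted list.

```python
from bisect import bisect_left
from typing import List, NamedTuple


class TID(NamedTuple):
    """Hypothetical tuple address: heap block number plus item offset."""
    block: int
    offset: int


# Toy "index": (key, TID) pairs kept sorted by key.
index = sorted([
    ("carol", TID(7, 2)),
    ("alice", TID(3, 1)),
    ("dave", TID(10, 4)),
    ("bob", TID(1, 5)),
])

# Toy "heap": maps each tuple address to the stored row.
heap = {
    TID(3, 1): {"name": "alice", "age": 31},
    TID(1, 5): {"name": "bob", "age": 45},
    TID(7, 2): {"name": "carol", "age": 27},
    TID(10, 4): {"name": "dave", "age": 52},
}


def index_lookup(key: str) -> List[TID]:
    """Binary-search the sorted entries and collect every TID stored under key."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    tids = []
    while i < len(index) and index[i][0] == key:
        tids.append(index[i][1])
        i += 1
    return tids


def fetch(key: str) -> List[dict]:
    """Look the key up in the index, then use each TID to read the row from the heap."""
    return [heap[tid] for tid in index_lookup(key)]
```

For example, `fetch("carol")` finds `TID(7, 2)` in the index and returns the single row stored at that address.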

Do we, or could we, load all the matching index records, then sort
them by position?  (Maybe not all; maybe large batches.)
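The batch-and-sort idea can be sketched as follows, again assuming positions are (block, offset) pairs; `sorted_batches` is a hypothetical helper, not anything taken from PostgreSQL.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

TID = Tuple[int, int]  # (block number, item offset)


def sorted_batches(tids: Iterable[TID], batch_size: int) -> Iterator[List[TID]]:
    """Yield the TIDs in fixed-size batches, each sorted into physical order."""
    it = iter(tids)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        batch.sort()  # tuples sort by block number first, then by offset
        yield batch
```

Within a batch the heap is read in ascending block order; between batches the order can still jump backwards. Larger batches buy more sequential access at the cost of memory.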

When loading from the database, if access was slightly more sequential
(vs very random), would it increase performance?

Said another way:

I think of table scanning as sequential, and fast.  That would be
loading db record 1,2,3, etc.

Would it be faster to load db records "mostly sequential": 1,3,4,7,10
compared to randomly: 7,3,10,1,4

-Andy
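One crude way to compare the two access orders is to add up how far the disk head travels between reads. This assumes a single rotating disk whose cost grows with seek distance, and it ignores rotational latency, caching and OS read-ahead, so treat it as an illustration only.

```python
from typing import Sequence


def total_seek_distance(addresses: Sequence[int]) -> int:
    """Sum of the absolute jumps between consecutive reads (crude seek-cost model)."""
    return sum(abs(b - a) for a, b in zip(addresses, addresses[1:]))


mostly_sequential = [1, 3, 4, 7, 10]
random_order = [7, 3, 10, 1, 4]
print(total_seek_distance(mostly_sequential))  # 9
print(total_seek_distance(random_order))       # 23
```

Both orders touch the same five addresses, but under this model the ascending order moves the head well under half as far.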

Re: PG index architecture

From: Igor Neyman
Date: 15 July 2014, 20:44:59

It is called CLUSTER:

http://www.postgresql.org/docs/9.2/static/sql-cluster.html

Regards,
Igor Neyman
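As a rough Python model of what CLUSTER does: the table is rewritten in index order, so afterwards an index scan walks the heap in ascending block order. In PostgreSQL itself this is `CLUSTER tablename USING indexname`; the documentation notes that clustering is a one-time operation, so rows inserted or updated later are not kept in that order. The `cluster` function and its `rows_per_block` parameter are hypothetical simplifications.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]
TID = Tuple[int, int]  # (block number, item offset)


def cluster(rows: List[Row], key: str, rows_per_block: int) -> Tuple[List[Row], List[Tuple[int, TID]]]:
    """Rewrite the rows in key order, then rebuild a (key, TID) index over the new layout."""
    ordered = sorted(rows, key=lambda r: r[key])
    index = []
    for pos, row in enumerate(ordered):
        tid = (pos // rows_per_block, pos % rows_per_block)
        index.append((row[key], tid))
    return ordered, index


rows = [{"id": 5}, {"id": 2}, {"id": 9}, {"id": 1}]
_, index = cluster(rows, "id", rows_per_block=2)
print(index)  # [(1, (0, 0)), (2, (0, 1)), (5, (1, 0)), (9, (1, 1))]
```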

Re: PG index architecture

From: Tom Lane
Date: 15 July 2014, 20:51:59

Andy Colson <andy@squeakycode.net> writes:
> As I understand indexes, they are a key value pair, that contain a value
> and a position.  You lookup the value then use the position to seek into
> the database to load the record.

> Do we, or could we, load all the matching index records, then sort
> them by position?  (maybe not all, maybe large batches)

This is more or less what a "bitmap index scan" does.

            regards, tom lane
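A simplified Python model of the two phases behind a bitmap scan: the bitmap index scan gathers every matching TID first, grouped by heap block, and the bitmap heap scan then visits those blocks in ascending order, so each page is read once and access is as sequential as the matches allow. In the real executor the bitmap can become lossy (page-level only) when it outgrows `work_mem`, and rows on such pages are rechecked against the condition; the `predicate` argument stands in for that recheck. All names here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

TID = Tuple[int, int]  # (block number, item offset)


def build_bitmap(tids: Iterable[TID]) -> Dict[int, Set[int]]:
    """Phase 1: collect all matching TIDs from the index, grouped by heap block."""
    pages: Dict[int, Set[int]] = defaultdict(set)
    for block, offset in tids:
        pages[block].add(offset)
    return pages


def bitmap_heap_scan(
    pages: Dict[int, Set[int]],
    heap: Dict[int, Dict[int, str]],
    predicate: Callable[[str], bool],
) -> List[str]:
    """Phase 2: visit each heap block once, in ascending block order."""
    out = []
    for block in sorted(pages):
        for offset in sorted(pages[block]):
            row = heap[block][offset]
            if predicate(row):  # stands in for the recheck of lossy pages
                out.append(row)
    return out


# The index returns the matches in scattered block order: 7, 3, 7, 1.
matches = [(7, 1), (3, 2), (7, 0), (1, 4)]
heap = {1: {4: "a"}, 3: {2: "b"}, 7: {0: "c", 1: "d"}}
print(bitmap_heap_scan(build_bitmap(matches), heap, lambda row: True))
```

Block 7 holds two matches but is visited only once, after blocks 1 and 3.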

Re: PG index architecture

From: John R Pierce
Date: 15 July 2014, 20:54:37

On 7/15/2014 1:26 PM, Andy Colson wrote:
> As I understand indexes, they are a key value pair, that contain a
> value and a position.  You lookup the value then use the position to
> seek into the database to load the record.

Indexes are (by default) stored as a B-tree.  Each leaf entry holds the
TID of the target row: a block number plus an item offset within it.

>
> Do we, or could we, load all the matching index records, then sort
> them by position?  (maybe not all, maybe large batches)
>

B-trees are inherently ordered.  Data records, however, are not.
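A small illustration of that point, using a plain sorted Python list as a hypothetical stand-in for a B-tree's leaf level (a real B-tree descends from the root rather than filtering a list): scanning the entries yields keys in order, while the heap blocks their TIDs point to come out in no particular order.

```python
from typing import List, Tuple

TID = Tuple[int, int]  # (block number, item offset)

# Hypothetical leaf-level entries of a B-tree on "name", kept in key order.
leaf_entries: List[Tuple[str, TID]] = [
    ("alice", (3, 1)),
    ("bob", (1, 5)),
    ("carol", (7, 2)),
    ("dave", (10, 4)),
    ("erin", (4, 3)),
]


def range_scan(entries: List[Tuple[str, TID]], lo: str, hi: str) -> List[Tuple[str, TID]]:
    """Return entries with lo <= key <= hi, in key order, as an index scan would."""
    return [(k, tid) for k, tid in entries if lo <= k <= hi]


hits = range_scan(leaf_entries, "a", "z")
print([k for k, _ in hits])         # keys come out sorted
print([tid[0] for _, tid in hits])  # heap blocks do not: [3, 1, 7, 10, 4]
```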


> When loading from the database, if access was slightly more sequential
> (vs very random), would it increase performance?
>
> Said another way:
>
> I think of table scanning as sequential, and fast.  That would be
> loading db record 1,2,3, etc.

Database tables are unordered sets; there is no record 1, 2, 3.
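A toy model of why physical order drifts: under PostgreSQL's MVCC an UPDATE writes a new row version rather than overwriting the old one, so a sequential scan, which returns rows in storage order, can see a row "move". This sketch assumes the new version always lands at the end of the heap; it ignores HOT updates, free-space reuse and VACUUM, and the names are hypothetical.

```python
from typing import Iterator, List


def seq_scan(heap: List[dict]) -> Iterator[dict]:
    """Return the live row versions in storage order, as a sequential scan would."""
    return (row for row in heap if not row["dead"])


def update(heap: List[dict], row_id: int, **changes) -> None:
    """MVCC-style update: mark the live version dead and append a new version."""
    for row in heap:
        if row["id"] == row_id and not row["dead"]:
            row["dead"] = True
            heap.append({**row, **changes, "dead": False})
            return  # stop before the loop reaches the version just appended
    raise KeyError(row_id)


heap = [{"id": i, "val": 0, "dead": False} for i in (1, 2, 3)]
update(heap, 1, val=42)
print([r["id"] for r in seq_scan(heap)])  # [2, 3, 1]
```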

>
> Would it be faster to load db records "mostly sequential": 1,3,4,7,10
> compared to randomly: 7,3,10,1,4

It's unclear to me what you mean here.


--
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: PG index architecture

From: Andy Colson
Date: 15 July 2014, 21:54:10

On 7/15/2014 3:54 PM, John R Pierce wrote:
> database tables are unordered sets, there is no record 1,2,3.
>
>>
>> Would it be faster to load db records "mostly sequential": 1,3,4,7,10
>> compared to randomly: 7,3,10,1,4
>
> its unclear to me what you mean here.
>
>

Ah, yeah, sorry.  I don't literally mean record #1; I mean a hard-drive
seek to address 1 (where the address came from the index lookup).

Know what I mean?

-Andy

Re: PG index architecture

From: John R Pierce
Date: 15 July 2014, 22:03:16

On 7/15/2014 2:54 PM, Andy Colson wrote:
>
> Ah, yea, sorry, I don't really mean record #1, I mean hard drive seek
> to address 1.  (where address came from the index lookup)

Of course, it's a file + offset, so there are at least one or two more
levels of indirection before you get to physical addresses on the hard disk.
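The first level of that indirection can be sketched, assuming a default build: 8 kB pages (`BLCKSZ`) and tables split into 1 GB segment files named `relfilenode`, `relfilenode.1`, and so on. The relfilenode value used in the example is hypothetical. Beneath this, the filesystem maps file offsets to device blocks and the drive or controller may remap them again, so ascending block numbers only approximate ascending physical addresses.

```python
from typing import Tuple

BLCKSZ = 8192         # default page size in bytes
RELSEG_SIZE = 131072  # default blocks per segment file (1 GB / 8 kB)


def block_to_file_offset(relfilenode: int, block: int) -> Tuple[str, int]:
    """Map a heap block number to (segment file name, byte offset within that file)."""
    segno, block_in_seg = divmod(block, RELSEG_SIZE)
    name = str(relfilenode) if segno == 0 else f"{relfilenode}.{segno}"
    return name, block_in_seg * BLCKSZ
```

For instance, with a hypothetical relfilenode of 16384, block 10 sits at byte 81920 of file `16384`, while block 131074 sits at byte 16384 of `16384.1`.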
