Thread: Index size

Index size

From
Samuel Williams
Date:
I'd like to understand a bit more about indexes in PG.

When I have a row in a table, and an index, say, for a single column,
does that duplicate the entire row on disk? Or is there some kind of
id lookup involved?

Thanks.


Re: Index size

From
Adrian Klaver
Date:
On 11/29/2016 03:30 PM, Samuel Williams wrote:
> I'd like to understand a bit more about indexes in PG.

https://www.postgresql.org/docs/9.5/static/indexam.html

"An index is effectively a mapping from some data key values to tuple
identifiers, or TIDs, of row versions (tuples) in the index's parent
table. A TID consists of a block number and an item number within that
block (see Section 63.6). This is sufficient information to fetch a
particular row version from the table. ..."

>
> When I have a row in a table, and an index, say, for a single column,
> does that duplicate the entire row on disk? Or is there some kind of
> id lookup involved?
>
> Thanks.
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Index size

From
Samuel Williams
Date:
Is there any reason why for the same data set, and same indexes, that
the data in postgres would be significantly larger than
innodb/mariadb?


Re: Index size

From
Karsten Hilbert
Date:
On Thu, Dec 01, 2016 at 12:38:37AM +1300, Samuel Williams wrote:

> Is there any reason why for the same data set, and same indexes, that
> the data in postgres would be significantly larger than
> innodb/mariadb?

Sure: because they do entirely different things on-disk.

Regards,
Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346


Re: Index size

From
Adrian Klaver
Date:
On 11/30/2016 03:38 AM, Samuel Williams wrote:
> Is there any reason why for the same data set, and same indexes, that
> the data in postgres would be significantly larger than
> innodb/mariadb?
>

Hard to say without the table/index definitions and some indication of
what the data is.

What version of Postgres?

Also how are you measuring the size of the data set?

Have you looked at/tried the functions here?:

https://www.postgresql.org/docs/9.5/static/functions-admin.html#FUNCTIONS-ADMIN-DBOBJECT





--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Index size

From
Samuel Williams
Date:
Thanks everyone for your feedback so far. I've done a bit more digging:

MySQL in MBytes (about 350 million rows):

index_user_event_on_what_category_id_created_at_latlng | 22806.00
index_user_event_for_reporting | 18211.00
index_user_event_on_created_at | 9519.00
index_user_event_on_user_id | 6884.00
index_user_event_on_poi_id | 4891.00
index_user_event_on_deal_id | 3979.00

Postgres (about 250 million rows):

index_user_event_on_what_category_id_created_at_latlng | 25 GB
index_user_event_for_reporting | 19 GB
index_user_event_on_created_at | 7445 MB
index_user_event_on_user_id | 7274 MB
index_user_event_on_deal_id | 7132 MB
index_user_event_on_poi_id | 7099 MB

So, the index is a bit bigger, plus there is also the PKEY index which
increases disk usage by another whole index. Keep in mind in the
above, MySQL has about 40% more data.

With some indexes, it looks like MySQL might not be adding all data to
the index (e.g. ignoring NULL values). Does MySQL ignore null values
in an index? Can we get the same behaviour in Postgres to minimise
usage? What would be the recommendation here?

For the composite indexes, if any value is null, we are also less
interested in it, so it could be ignored from the index.

Thanks,
Samuel


Re: Index size

From
Steve Atkins
Date:
> On Dec 3, 2016, at 3:57 PM, Samuel Williams <space.ship.traveller@gmail.com> wrote:
>
> Thanks everyone for your feedback so far. I've done a bit more digging:
>
> MySQL in MBytes (about 350 million rows):
>
> index_user_event_on_what_category_id_created_at_latlng | 22806.00
> index_user_event_for_reporting | 18211.00
> index_user_event_on_created_at | 9519.00
> index_user_event_on_user_id | 6884.00
> index_user_event_on_poi_id | 4891.00
> index_user_event_on_deal_id | 3979.00
>
> Postgres (about 250 million rows):
>
> index_user_event_on_what_category_id_created_at_latlng | 25 GB
> index_user_event_for_reporting | 19 GB
> index_user_event_on_created_at | 7445 MB
> index_user_event_on_user_id | 7274 MB
> index_user_event_on_deal_id | 7132 MB
> index_user_event_on_poi_id | 7099 MB
>
> So, the index is a bit bigger, plus there is also the PKEY index which
> increases disk usage by another whole index. Keep in mind in the
> above, MySQL has about 40% more data.
>
> With some indexes, it looks like MySQL might not be adding all data to
> the index (e.g. ignoring NULL values). Does MySQL ignore null values
> in an index? Can we get the same behaviour in Postgres to minimise
> usage? What would be the recommendation here?

It's unlikely anyone will be able to usefully answer the questions you
should be asking without seeing the schema and index definitions,
and maybe some clues about how you're querying the data.

Cheers,
  Steve

Re: Index size

From
Melvin Davidson
Date:


On Sat, Dec 3, 2016 at 9:32 PM, Steve Atkins <steve@blighty.com> wrote:

> On Dec 3, 2016, at 3:57 PM, Samuel Williams <space.ship.traveller@gmail.com> wrote:
>
> Thanks everyone for your feedback so far. I've done a bit more digging:
>
> MySQL in MBytes (about 350 million rows):
>
> index_user_event_on_what_category_id_created_at_latlng | 22806.00
> index_user_event_for_reporting | 18211.00
> index_user_event_on_created_at | 9519.00
> index_user_event_on_user_id | 6884.00
> index_user_event_on_poi_id | 4891.00
> index_user_event_on_deal_id | 3979.00
>
> Postgres (about 250 million rows):
>
> index_user_event_on_what_category_id_created_at_latlng | 25 GB
> index_user_event_for_reporting | 19 GB
> index_user_event_on_created_at | 7445 MB
> index_user_event_on_user_id | 7274 MB
> index_user_event_on_deal_id | 7132 MB
> index_user_event_on_poi_id | 7099 MB
>
> So, the index is a bit bigger, plus there is also the PKEY index which
> increases disk usage by another whole index. Keep in mind in the
> above, MySQL has about 40% more data.
>
> With some indexes, it looks like MySQL might not be adding all data to
> the index (e.g. ignoring NULL values). Does MySQL ignore null values
> in an index? Can we get the same behaviour in Postgres to minimise
> usage? What would be the recommendation here?

It's unlikely anyone will be able to usefully answer the questions you
should be asking without seeing the schema and index definitions,
and maybe some clues about how you're querying the data.

Cheers,
  Steve

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

You are comparing apples to oranges. MySQL and PostgreSQL engines are different by design, so likewise
the size of the indexes will be different.
You may as well ask why a 2015 Porsche 911 Carrera 6 cyl, 3.4 L, Auto(AM-S7) gets 22 MPG,
but the 2016 Jaguar XF MPG gets 24 mpg.

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: Index size

From
Samuel Williams
Date:
So, uh, my main question was, does MySQL add null values to an index, and is this different from Postgres. The schema is irrelevant, except that the column allows null values. I noticed when you create an index you can add a where clause. Could it be I should add WHERE the fields are not null?

Here is the schema. Any advice or suggestions most welcome.

--

-- PostgreSQL database dump

--


-- Dumped from database version 9.5.4

-- Dumped by pg_dump version 9.5.4


SET statement_timeout = 0;

SET lock_timeout = 0;

SET client_encoding = 'UTF8';

SET standard_conforming_strings = on;

SET check_function_bodies = false;

SET client_min_messages = warning;

SET row_security = off;


SET search_path = public, pg_catalog;


SET default_tablespace = '';


SET default_with_oids = false;


--

-- Name: user_event; Type: TABLE; Schema: public; Owner: http

--


CREATE TABLE user_event (

    id integer NOT NULL,

    user_id integer NOT NULL,

    latitude numeric(9,6),

    longitude numeric(9,6),

    active boolean DEFAULT true NOT NULL,

    poi_id integer,

    deal_id integer,

    category_id integer,

    what character varying(32) NOT NULL,

    locale character varying(8),

    created_at timestamp without time zone NOT NULL,

    parameters jsonb

);



ALTER TABLE user_event OWNER TO http;


--

-- Name: user_event_id_seq; Type: SEQUENCE; Schema: public; Owner: http

--


CREATE SEQUENCE user_event_id_seq

    START WITH 1

    INCREMENT BY 1

    NO MINVALUE

    NO MAXVALUE

    CACHE 1;



ALTER TABLE user_event_id_seq OWNER TO http;


--

-- Name: user_event_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: http

--


ALTER SEQUENCE user_event_id_seq OWNED BY user_event.id;



--

-- Name: id; Type: DEFAULT; Schema: public; Owner: http

--


ALTER TABLE ONLY user_event ALTER COLUMN id SET DEFAULT nextval('user_event_id_seq'::regclass);



--

-- Name: user_event_pkey; Type: CONSTRAINT; Schema: public; Owner: http

--


ALTER TABLE ONLY user_event

    ADD CONSTRAINT user_event_pkey PRIMARY KEY (id);



--

-- Name: index_user_event_for_reporting; Type: INDEX; Schema: public; Owner: http

--


CREATE INDEX index_user_event_for_reporting ON user_event USING btree (latitude, longitude, created_at, user_id);



--

-- Name: index_user_event_on_created_at; Type: INDEX; Schema: public; Owner: http

--


CREATE INDEX index_user_event_on_created_at ON user_event USING btree (created_at);



--

-- Name: index_user_event_on_deal_id; Type: INDEX; Schema: public; Owner: http

--


CREATE INDEX index_user_event_on_deal_id ON user_event USING btree (deal_id);



--

-- Name: index_user_event_on_poi_id; Type: INDEX; Schema: public; Owner: http

--


CREATE INDEX index_user_event_on_poi_id ON user_event USING btree (poi_id);



--

-- Name: index_user_event_on_user_id; Type: INDEX; Schema: public; Owner: http

--


CREATE INDEX index_user_event_on_user_id ON user_event USING btree (user_id);



--

-- Name: index_user_event_on_what_category_id_created_at_latlng; Type: INDEX; Schema: public; Owner: http

--


CREATE INDEX index_user_event_on_what_category_id_created_at_latlng ON user_event USING btree (what, category_id, created_at, latitude, longitude);



--

-- PostgreSQL database dump complete

--




Re: Index size

From
Samuel Williams
Date:
Melvin, of course there are differences. However, I suspect there are at least SOME tangible differences which can be identified.

On 4 December 2016 at 15:53, Melvin Davidson <melvin6925@gmail.com> wrote:


On Sat, Dec 3, 2016 at 9:32 PM, Steve Atkins <steve@blighty.com> wrote:

> On Dec 3, 2016, at 3:57 PM, Samuel Williams <space.ship.traveller@gmail.com> wrote:
>
> Thanks everyone for your feedback so far. I've done a bit more digging:
>
> MySQL in MBytes (about 350 million rows):
>
> index_user_event_on_what_category_id_created_at_latlng | 22806.00
> index_user_event_for_reporting | 18211.00
> index_user_event_on_created_at | 9519.00
> index_user_event_on_user_id | 6884.00
> index_user_event_on_poi_id | 4891.00
> index_user_event_on_deal_id | 3979.00
>
> Postgres (about 250 million rows):
>
> index_user_event_on_what_category_id_created_at_latlng | 25 GB
> index_user_event_for_reporting | 19 GB
> index_user_event_on_created_at | 7445 MB
> index_user_event_on_user_id | 7274 MB
> index_user_event_on_deal_id | 7132 MB
> index_user_event_on_poi_id | 7099 MB
>
> So, the index is a bit bigger, plus there is also the PKEY index which
> increases disk usage by another whole index. Keep in mind in the
> above, MySQL has about 40% more data.
>
> With some indexes, it looks like MySQL might not be adding all data to
> the index (e.g. ignoring NULL values). Does MySQL ignore null values
> in an index? Can we get the same behaviour in Postgres to minimise
> usage? What would be the recommendation here?

It's unlikely anyone will be able to usefully answer the questions you
should be asking without seeing the schema and index definitions,
and maybe some clues about how you're querying the data.

Cheers,
  Steve

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

You are comparing apples to oranges. MySQL and PostgreSQL engines are different by design, so likewise
the size of the indexes will be different.
You may as well ask why a 2015 Porsche 911 Carrera 6 cyl, 3.4 L, Auto(AM-S7) gets 22 MPG,
but the 2016 Jaguar XF MPG gets 24 mpg.

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: Index size

From
Melvin Davidson
Date:

On Sun, Dec 4, 2016 at 4:43 AM, Samuel Williams <space.ship.traveller@gmail.com> wrote:
Melvin, of course there are differences. However, I suspect there are at least SOME tangible differences which can be identified.

On 4 December 2016 at 15:53, Melvin Davidson <melvin6925@gmail.com> wrote:


On Sat, Dec 3, 2016 at 9:32 PM, Steve Atkins <steve@blighty.com> wrote:

> On Dec 3, 2016, at 3:57 PM, Samuel Williams <space.ship.traveller@gmail.com> wrote:
>
> Thanks everyone for your feedback so far. I've done a bit more digging:
>
> MySQL in MBytes (about 350 million rows):
>
> index_user_event_on_what_category_id_created_at_latlng | 22806.00
> index_user_event_for_reporting | 18211.00
> index_user_event_on_created_at | 9519.00
> index_user_event_on_user_id | 6884.00
> index_user_event_on_poi_id | 4891.00
> index_user_event_on_deal_id | 3979.00
>
> Postgres (about 250 million rows):
>
> index_user_event_on_what_category_id_created_at_latlng | 25 GB
> index_user_event_for_reporting | 19 GB
> index_user_event_on_created_at | 7445 MB
> index_user_event_on_user_id | 7274 MB
> index_user_event_on_deal_id | 7132 MB
> index_user_event_on_poi_id | 7099 MB
>
> So, the index is a bit bigger, plus there is also the PKEY index which
> increases disk usage by another whole index. Keep in mind in the
> above, MySQL has about 40% more data.
>
> With some indexes, it looks like MySQL might not be adding all data to
> the index (e.g. ignoring NULL values). Does MySQL ignore null values
> in an index? Can we get the same behaviour in Postgres to minimise
> usage? What would be the recommendation here?

It's unlikely anyone will be able to usefully answer the questions you
should be asking without seeing the schema and index definitions,
and maybe some clues about how you're querying the data.

Cheers,
  Steve

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

You are comparing apples to oranges. MySQL and PostgreSQL engines are different by design, so likewise
the size of the indexes will be different.
You may as well ask why a 2015 Porsche 911 Carrera 6 cyl, 3.4 L, Auto(AM-S7) gets 22 MPG,
but the 2016 Jaguar XF MPG gets 24 mpg.

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.



Samuel,
Please note that the accepted convention in this forum is to bottom post. Please do not top post.

>Melvin, of course there are differences. However, I suspect there are at least SOME tangible differences which can be identified.
I have to ask, WHY do you think it is important to identify any differences in the index methods.
What is important is that you understand why you need indexes and how they are used.


Tutorial - Indexes
https://www.tutorialspoint.com/postgresql/postgresql_indexes.htm

CREATE INDEX
https://www.postgresql.org/docs/9.4/static/sql-createindex.html

--

Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: Index size

From
Andreas Joseph Krogh
Date:
På søndag 04. desember 2016 kl. 16:51:54, skrev Melvin Davidson <melvin6925@gmail.com>:

On Sun, Dec 4, 2016 at 4:43 AM, Samuel Williams <space.ship.traveller@gmail.com> wrote:
Melvin, of course there are differences. However, I suspect there are at least SOME tangible differences which can be identified.
 
 
On 4 December 2016 at 15:53, Melvin Davidson <melvin6925@gmail.com> wrote:
 
 
On Sat, Dec 3, 2016 at 9:32 PM, Steve Atkins <steve@blighty.com> wrote:

> On Dec 3, 2016, at 3:57 PM, Samuel Williams <space.ship.traveller@gmail.com> wrote:
>
> Thanks everyone for your feedback so far. I've done a bit more digging:
>
> MySQL in MBytes (about 350 million rows):
>
> index_user_event_on_what_category_id_created_at_latlng | 22806.00
> index_user_event_for_reporting | 18211.00
> index_user_event_on_created_at | 9519.00
> index_user_event_on_user_id | 6884.00
> index_user_event_on_poi_id | 4891.00
> index_user_event_on_deal_id | 3979.00
>
> Postgres (about 250 million rows):
>
> index_user_event_on_what_category_id_created_at_latlng | 25 GB
> index_user_event_for_reporting | 19 GB
> index_user_event_on_created_at | 7445 MB
> index_user_event_on_user_id | 7274 MB
> index_user_event_on_deal_id | 7132 MB
> index_user_event_on_poi_id | 7099 MB
>
> So, the index is a bit bigger, plus there is also the PKEY index which
> increases disk usage by another whole index. Keep in mind in the
> above, MySQL has about 40% more data.
>
> With some indexes, it looks like MySQL might not be adding all data to
> the index (e.g. ignoring NULL values). Does MySQL ignore null values
> in an index? Can we get the same behaviour in Postgres to minimise
> usage? What would be the recommendation here?


It's unlikely anyone will be able to usefully answer the questions you
should be asking without seeing the schema and index definitions,
and maybe some clues about how you're querying the data.

Cheers,
  Steve

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
You are comparing apples to oranges. MySQL and PostgreSQL engines are different by design, so likewise
the size of the indexes will be different.
You may as well ask why a 2015 Porsche 911 Carrera 6 cyl, 3.4 L, Auto(AM-S7) gets 22 MPG,
but the 2016 Jaguar XF MPG gets 24 mpg.

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.
 
Samuel,
Please note that the accepted convention in this forum is to bottom post. Please do not top post.
 
The accepted convention in this forum is to not have purple signature:-)
 
--
Andreas Joseph Krogh
 

Re: Index size

From
Samuel Williams
Date:
Melvin, uh... I'm a software engineer... since when was it a problem to want to know how things work and why they are different? If you have nothing to contribute of a relevant technical nature, please don't reply, I'm really not interested.....

Re: Index size

From
"Joshua D. Drake"
Date:
On 12/03/2016 03:57 PM, Samuel Williams wrote:

> With some indexes, it looks like MySQL might not be adding all data to
> the index (e.g. ignoring NULL values). Does MySQL ignore null values
> in an index? Can we get the same behaviour in Postgres to minimise
> usage? What would be the recommendation here?

I don't think you are going to find many answers about the MySQL
implementation (since it will be storage engine specific and it isn't
Postgres) on a PostgreSQL list.

Sincerely,

JD

--
Command Prompt, Inc.                  http://the.postgres.company/
                         +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


Re: Index size

From
Melvin Davidson
Date:

On Mon, Dec 5, 2016 at 8:06 PM, Samuel Williams <space.ship.traveller@gmail.com> wrote:
Melvin, uh... I'm a software engineer... since when was it a problem to want to know how things work and why they are different? If you have nothing to contribute of a relevant technical nature, please don't reply, I'm really not interested.....

Samuel,

>since when was it a problem to want to know how things work and why they are different?
I am a retired DBA/Software Engineer/Customer Engineer. I have worked all aspects of IT.
Hardware, Software, Database, etc. I never said there was a problem wanting to know
the difference between things. What I did say was that there IS a difference. You are
concentrating your inquiry in the wrong area. If you want to know something specific about
a PostgreSQL index, then ask that, but don't expect anyone to explain why it's different
from a MySQL index, because there is no point in it.

> Could it be I should add WHERE the fields are not null?
That depends on the queries they will be used with.
If you have a WHERE clause in your query that requires a not null field, then the index
will be useful, providing the table is large enough to make the index scan faster than
a sequential scan.

So if you have a query like:
SELECT id, user_id, poi_id
  FROM user_event
 WHERE deal_id IS NULL;
 
 Then, yes, adding a WHERE clause to index_user_event_on_deal_id for NULL deal_id's
 is fine.
 
 But if you want
SELECT id, user_id, poi_id
  FROM user_event
 WHERE deal_id = <some_value> ;
 Then you don't need the WHERE deal_id IS NULL in your index.
 
 BTW, most dba's prefer simpler, shorter, index names.
 EG: idx_user_event_deal_id


--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: Index size

From
""
Date:
Samuel Williams <space.ship.traveller@gmail.com> wrote:
> So, uh, my main question was, does MySQL add null values to an index, and is this different from Postgres...

Samuel,

A quick google says that Mysql does index NULLs. Ask a Mysql group to get a more definitive answer.

More relevant to your original question, I'll go out on a limb as I struggle to recall a fuzzy memory.

The difference between Mysql and Postgresql is fundamental architecture, so yes the index creation will be very
different,as others have said. IIRC (and I may not be), Mysql stores where a row is on the disk via the PK index. That
meansthat secondary indexes point to the proper row in the PK index, which does mean that when you use a secondary
indexto get data that there is a double look up. They claim that's faster for updates and other stuff because a change
toa row only requires 1 index to be changed. 

Postgresql stores the direct disk location in each index, which slows down updates a little, but makes selects faster
(andI find this really amusing because so many people say Mysql is great because it's fast at reads, yet
architecturallyPG is faster). If I'm wrong here, I'm sure I'll be corrected. :) 

So you can see that Mysql indexes should be smaller than PG indexes because of what they carry. Personally, I think the
diffis small enough I'm not going to worry about it, but math is such that some numbers are smaller than others. :) So
thatshould explain what you're seeing. 

My opinion is that you shouldn't worry about the index size. Which DB does what you want the best? That obviously
dependson what your needs are, but after using both Mysql and PG, I'll take PG whenever possible, thank you. Mysql has
gottenbetter over the last 5-8 years, but there are still many pits of quicksand ready to swallow you up there that PG
doesn'thave. If you know where those pits are and/or don't travel into that part of the jungle, great for you;
personally,I prefer to get the job done without having to watch where I'm walking. ;) 

HTH,
Kevin


Re: [GENERAL] Index size

From
Samuel Williams
Date:
Thanks Kevin, that makes sense. Yeah, I understand the architectural
difference a bit more now. I also read that when you change a column
which is not index, all the indexes for that row need to be updated
anyway. Is that correct?

On 7 December 2016 at 05:27,  <kbrannen@pwhome.com> wrote:
> Samuel Williams <space.ship.traveller@gmail.com> wrote:
>> So, uh, my main question was, does MySQL add null values to an index, and is this different from Postgres...
>
> Samuel,
>
> A quick google says that Mysql does index NULLs. Ask a Mysql group to get a more definitive answer.
>
> More relevant to your original question, I'll go out on a limb as I struggle to recall a fuzzy memory.
>
> The difference between Mysql and Postgresql is fundamental architecture, so yes the index creation will be very
different,as others have said. IIRC (and I may not be), Mysql stores where a row is on the disk via the PK index. That
meansthat secondary indexes point to the proper row in the PK index, which does mean that when you use a secondary
indexto get data that there is a double look up. They claim that's faster for updates and other stuff because a change
toa row only requires 1 index to be changed. 
>
> Postgresql stores the direct disk location in each index, which slows down updates a little, but makes selects faster
(andI find this really amusing because so many people say Mysql is great because it's fast at reads, yet
architecturallyPG is faster). If I'm wrong here, I'm sure I'll be corrected. :) 
>
> So you can see that Mysql indexes should be smaller than PG indexes because of what they carry. Personally, I think
thediff is small enough I'm not going to worry about it, but math is such that some numbers are smaller than others. :)
Sothat should explain what you're seeing. 
>
> My opinion is that you shouldn't worry about the index size. Which DB does what you want the best? That obviously
dependson what your needs are, but after using both Mysql and PG, I'll take PG whenever possible, thank you. Mysql has
gottenbetter over the last 5-8 years, but there are still many pits of quicksand ready to swallow you up there that PG
doesn'thave. If you know where those pits are and/or don't travel into that part of the jungle, great for you;
personally,I prefer to get the job done without having to watch where I'm walking. ;) 
>
> HTH,
> Kevin
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Index size

From
Melvin Davidson
Date:

On Fri, Dec 9, 2016 at 6:40 PM, Samuel Williams <space.ship.traveller@gmail.com> wrote:
Thanks Kevin, that makes sense. Yeah, I understand the architectural
difference a bit more now. I also read that when you change a column
which is not index, all the indexes for that row need to be updated
anyway. Is that correct?

On 7 December 2016 at 05:27,  <kbrannen@pwhome.com> wrote:
> Samuel Williams <space.ship.traveller@gmail.com> wrote:
>> So, uh, my main question was, does MySQL add null values to an index, and is this different from Postgres...
>
> Samuel,
>
> A quick google says that Mysql does index NULLs. Ask a Mysql group to get a more definitive answer.
>
> More relevant to your original question, I'll go out on a limb as I struggle to recall a fuzzy memory.
>
> The difference between Mysql and Postgresql is fundamental architecture, so yes the index creation will be very different, as others have said. IIRC (and I may not be), Mysql stores where a row is on the disk via the PK index. That means that secondary indexes point to the proper row in the PK index, which does mean that when you use a secondary index to get data that there is a double look up. They claim that's faster for updates and other stuff because a change to a row only requires 1 index to be changed.
>
> Postgresql stores the direct disk location in each index, which slows down updates a little, but makes selects faster (and I find this really amusing because so many people say Mysql is great because it's fast at reads, yet architecturally PG is faster). If I'm wrong here, I'm sure I'll be corrected. :)
>
> So you can see that Mysql indexes should be smaller than PG indexes because of what they carry. Personally, I think the diff is small enough I'm not going to worry about it, but math is such that some numbers are smaller than others. :) So that should explain what you're seeing.
>
> My opinion is that you shouldn't worry about the index size. Which DB does what you want the best? That obviously depends on what your needs are, but after using both Mysql and PG, I'll take PG whenever possible, thank you. Mysql has gotten better over the last 5-8 years, but there are still many pits of quicksand ready to swallow you up there that PG doesn't have. If you know where those pits are and/or don't travel into that part of the jungle, great for you; personally, I prefer to get the job done without having to watch where I'm walking. ;)
>
> HTH,
> Kevin
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

>I also read that when you change a column which is not index, all the
>indexes for that row need to be updated anyway. Is that correct?

That is not correct. Indexes are changed under the following conditions:
A. An insert is done to the table which involves an index.
B. A delete is done to the table which involves an index.
C. An update is done that involves columns included in an index.
D. An index is REINDEXed

Indexes point to the tid of the row for which the column(s) in the index
are involved. So if columns updated are not involved in the index,
there is no need to change the index.

https://www.postgresql.org/docs/9.4/static/sql-createindex.html

https://www.postgresql.org/docs/9.4/static/sql-reindex.html

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: [GENERAL] Index size

From
"Peter J. Holzer"
Date:
On 2016-12-09 21:45:35 -0500, Melvin Davidson wrote:
> On Fri, Dec 9, 2016 at 6:40 PM, Samuel Williams <space.ship.traveller@gmail.com>
> wrote:
> >I also read that when you change a column which is not index, all the
> >indexes for that row need to be updated anyway. Is that correct?
>
> That is not correct. Indexes are changed under the following conditions:
> A. An insert is done to the table which involves an index.
> B. A delete is done to the table which involves an index.
> C. An update is done that involves columns included in an index.
> D. An index is REINDEXed
>
> Indexes point to the tid of the row for which the column(s) in the index
> are involved. So if columns updated are not involved in the index,
> there is no need to change the index.

I don't think this is generally correct. The TID is a (block,item)
tuple. It the updated version of the row doesn't fit into the same block
it has to be stored in a different block, so the TID will change (AIUI
there is a bit of trickery to avoid changing the TID if the new version
is stored in the same block). This means that all the index entries for
this row (not just for the changed field) will have to be updated. You
can set fillfactor to a smaller value to make this less likely.

        hp

--
   _  | Peter J. Holzer    | A coding theorist is someone who doesn't
|_|_) |                    | think Alice is crazy.
| |   | hjp@hjp.at         | -- John Gordon
__/   | http://www.hjp.at/ |    http://downlode.org/Etext/alicebob.html

Attachment

Re: [GENERAL] Index size

From
Melvin Davidson
Date:


On Sat, Dec 10, 2016 at 5:42 PM, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2016-12-09 21:45:35 -0500, Melvin Davidson wrote:
> On Fri, Dec 9, 2016 at 6:40 PM, Samuel Williams <space.ship.traveller@gmail.com>
> wrote:
> >I also read that when you change a column which is not index, all the
> >indexes for that row need to be updated anyway. Is that correct?
>
> That is not correct. Indexes are changed under the following conditions:
> A. An insert is done to the table which involves an index.
> B. A delete is done to the table which involves an index.
> C. An update is done that involves columns included in an index.
> D. An index is REINDEXed
>
> Indexes point to the tid of the row for which the column(s) in the index
> are involved. So if columns updated are not involved in the index,
> there is no need to change the index.

I don't think this is generally correct. The TID is a (block,item)
tuple. It the updated version of the row doesn't fit into the same block
it has to be stored in a different block, so the TID will change (AIUI
there is a bit of trickery to avoid changing the TID if the new version
is stored in the same block). This means that all the index entries for
this row (not just for the changed field) will have to be updated. You
can set fillfactor to a smaller value to make this less likely.

        hp

--
   _  | Peter J. Holzer    | A coding theorist is someone who doesn't
|_|_) |                    | think Alice is crazy.
| |   | hjp@hjp.at         | -- John Gordon
__/   | http://www.hjp.at/ |    http://downlode.org/Etext/alicebob.html

Yes, I see your point, but the case where the row does not fit into the same block
would only occur with unlimited field types such as var[], bytea[], etc. I believe
that to be the exception, and not the rule, so can we agree that we are both right
in that for the general case indexes are updated as I have described and for the
exception they act as you describe?
--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: [GENERAL] Index size

From
Christophe Pettus
Date:
> On Dec 11, 2016, at 08:37, Melvin Davidson <melvin6925@gmail.com> wrote:
>
> Yes, I see your point, but the case where the row does not fit into the same block
> would only occur with unlimited field types such as var[], bytea[], etc.

No, that's not correct either.  A block can fill up for any number of reasons: Dead tuples, a long HOT chain, or just a
lotof data packed into the block.  The first UPDATE to a tuple in a block that was populated by continuous INSERTs or a
COPY,if the table is using the default fillfactor, will cause the new tuple to be written to a different block, which
willcause an index update. 

The important thing to remember is that the process you are describing (in which indexes are not updated unless a
columninvolved in the index changes) is an optimization, Heap-Only Tuples.  It's a very common optimization, but it's
notguaranteed. 

--
-- Christophe Pettus
   xof@thebuild.com