Re: Why Not MySQL? - Mailing list pgsql-hackers

From Mitch Vincent
Subject Re: Why Not MySQL?
Date
Msg-id 042101bfb52e$2ec13400$4100000a@venux.net
Whole thread Raw
In response to Re: Why Not MySQL?  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: Why Not MySQL?
List pgsql-hackers
> > ipa=> select count(app_id) from applicants_states;
> >  count
> > ------
> > 244367
>
> > ipa=> select count(app_id) from applicants;
> > count
> > -----
> >  9791
>
> Now I'm confused --- what's the data model here?  I guess each applicants
> row must match many entries in applicants_states?

Well, that's one possible search a person could do.

applicants is the 63 field table that hold general info about the person.

applicants_states is a table having only two fields --> app_id int4, rstate
varchar(2) -- this holds all the state abbreviations of the states that an
applicant will relocate too. It was either break it out into a different
table or make a ver large varchar() field in the applicant table that would
most of the time be totally blank but would have to be able to have every
state abbreviation in there, including a delimiter (comma or something)
between each one.. We brokw it out into another table so each applicant can
have virtually unlimited number of states (or other countries) in the
database.

So for each applicant there could be 53 records in the applicants_states
table.. (there are 53 different abbreviations that are valid in this
application)..

Clear as mud? :-)

Looking back I think that it would have almost been better to make a 150-200
character varchar field  -- perhaps not though..

> Anyway, I suspect you definitely want to avoid a nested-loop join :-).

*cough cough cough* I use to have a subselect there -- in fact you smacked
me around and told me to change it!

> It'd be fairly reasonable for the system to use either hash or merge
> join, I think.  What does EXPLAIN show that the system is actually
> doing with this query?

6.5.3 :

ipa=> explain select * from applicants as a,applicants_states as s where
a.firstname ~*
ipa-> '^mitch' and s.rstate='AL' and s.app_id=a.app_id limit 10 offset 0;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=1693.76 rows=6 width=615) ->  Seq Scan on applicants a  (cost=1554.71 rows=50 width=599) ->  Index
Scanusing applicants_states_app_id on applicants_states s
 
(cost=2.78 rows=1023 width=16)

7.0RC3 (with the exact same data) :

ipa=# explain select * from applicants as a,applicants_states as s where
a.firstname ~*
ipa-# '^mitch' and s.rstate='AL' and s.app_id=a.app_id limit 10 offset 0;
NOTICE:  QUERY PLAN:

Merge Join  (cost=0.00..29675.86 rows=2397 width=615) ->  Index Scan using applicants_states_app_id on
applicants_statess
 
(cost=0.00..23062.15 rows=2445 width=16) ->  Index Scan using applicants_app_id on applicants a
(cost=0.00..6581.91 rows=98 width=599)

NOTICE:  QUERY PLAN:

Merge Join  (cost=0.00..29675.86 rows=2397 width=615) ->  Index Scan using applicants_states_app_id on
applicants_statess
 
(cost=0.00..23062.15 rows=2445 width=16) ->  Index Scan using applicants_app_id on applicants a
(cost=0.00..6581.91 rows=98 width=599)

I did get two errors importing data from the 6.5.3 database into the 7.0
database :

ERROR:  DefineIndex: opclass "date_ops" does not accept datatype "timestamp"
ERROR:  DefineIndex: opclass "date_ops" does not accept datatype "timestamp"

And I have several fields of type 'datetime' in 6.5.3 which seem to get
translated to 'timestamp' in 7.0 -- then I try to index them using
date_ops..


-Mitch






pgsql-hackers by date:

Previous
From: The Hermit Hacker
Date:
Subject: Re: 7.0RC2 compile error !
Next
From: Tom Lane
Date:
Subject: Re: 7.0RC2 compile error !