Home > Uncategorized > Naive SQL(ite) Question

Naive SQL(ite) Question

August 7th, 2006

The best thing about writing Data Crunching wasn’t that it let me work out some ideas for material that’s now in the Software Carpentry course (although that was nice). The best part was that it forced me to finally learn a little SQL. For reasons I’ve now forgotten, I developed a dislike for databases when I was an undergrad. As a result, I was one of the few developers I knew who couldn’t do anything more than “select * from table”.

But I still don’t know very much, which is why I’m asking for help. I’ve inherited an SQLite database showing who’s been involved in projects of various kinds. One of the tables looks like this:

People
Ident Surname Forename Affiliation
7701 Turing Alan Cambridge University
6903 Hopper Grace US Navy
0055 Newton Isaac Cambridge University

There’s a lot of redundancy in the “Affiliation” entries (only about 1200 different values, out of more than 7000 records). I’d therefore like to split the table in two:

People
Ident Surname Forename AffilId
7701 Turing Alan 0
6903 Hopper Grace 1
0055 Newton Isaac 0

and:

Institutions
Ident Name
0 Cambridge University
1 US Navy

Here’s what I’ve done so far:

  1. Used “select distinct” to pull the institution names out of the “Person” table and stuff them into “Institutions” with an auto-incremented integer ID.
  2. Added an integer-valued “AffilId” column to “Person”.

The next step is to put the right institution ID values into the “AffilId” column (they’re currently all zeroes), and then delete the no-longer-needed “Affiliation” column. I thought the first half of this would look something like:

update People set AffilId = (
select Institutions.Ident
from Institutions, People
where Institutions.Name = People.Affiliation
);

but SQLite rejects that. I’ve tried several variations without success; if anyone knows the answer, I’d welcome advice.


In response to Shahan’s comment, here’s what I’m doing:

-- Create the 'People' table, and show its contents
create table People(Ident integer not null, Surname text not null, Affil text not null);
insert into People values(123, "Newton", "Cambridge");
insert into People values(456, "Darwin", "London");
insert into People values(789, "Turing", "Cambridge");
select * from People;
123 Newton Cambridge
456 Darwin London
789 Turing Cambridge
-- Create the 'Places' table and show its contents
create table Places(Ident integer not null, Name text not null);
insert into Places values(0, "Cambridge");
insert into Places values(1, "London");
select * from Places;
0 Cambridge
1 London
-- Create the table that will hold the refactored data
create table Result(Ident integer not null, Surname text not null, AffilId integer not null);
insert into Result select Ident, Surname, 999 from People;
select * from Result;
123 Newton 999
456 Darwin 999
789 Turing 999
-- Test the intended subquery: it seems to do what I want
select People.Surname, Places.Ident from Places, People, Result
where (People.Ident = Result.Ident)
and (People.Affil = Places.Name);
Newton 0
Turing 0
Darwin 1
-- Try to update the final table in place
update Result set AffilId = (
select Places.Ident from Places, People, Result
where (People.Ident = Result.Ident)
and (People.Affil = Places.Name) );
select * from Result;
123 Newton 0
456 Darwin 0
789 Turing 0

Whoops — ‘Darwin’ should have a location ID of ’1′, not ’0′. What’s going on?

Uncategorized

  1. Shahan K
    August 7th, 2006 at 21:07 | #1

    good thing i had my 343 assgn due last week, and i don’t have sqlite so here goes:
    this is a correlated subquery…

    in posgres it looks like this:
    update people
    set affilid = af.ident
    from
    (select i.ident, i.institution
    from institutions i
    ) af
    where people.affiliation = af.institution;

    from http://support.realsoftware.com/listarchives/realbasic-nug/2005-10/msg00632.html
    i assume it might look like this:

    update poeple set affilid = (select institutions.ident from institutions where institution.name = people.affiliation);

  2. Shahan K
    August 11th, 2006 at 02:43 | #2

    Thanks for posting the sql.
    In your update query you have the Result table in your subquery. By doing this it’s no longer correlated, try taking it out of the subquery so that it references the outer Result table. The same is done for my prev comment. If it still doesn’t work then it may be the ver, correlated subqueries were introduced in ver 3.1 while ver 2.18 complains that the column doesn’t exist. Hope this helps.

  3. Shahan K
    August 11th, 2006 at 02:46 | #3

    update Result set AffilId = (
    select Places.Ident from Places, People
    where People.Ident = Result.Ident
    and People.Affil = Places.Name );

Comments are closed.