The best thing about writing Data Crunching wasn't that it let me work out some ideas for material that's now in the Software Carpentry course (although that was nice). The best part was that it forced me to finally learn a little SQL. For reasons I've now forgotten, I developed a dislike for databases when I was an undergrad. As a result, I was one of the few developers I knew who couldn't do anything more than "select * from table". But I still don't know very much, which is why I'm asking for help. I've inherited an SQLite database showing who's been involved in projects of various kinds. One of the tables looks like this:
People
Ident Surname Forename Affiliation
7701 Turing Alan Cambridge University
6903 Hopper Grace US Navy
0055 Newton Isaac Cambridge University
... ... ... ...
There's a lot of redundancy in the "Affiliation" entries (only about 1200 different values, out of more than 7000 records). I'd therefore like to split the table in two:
People
Ident Surname Forename AffilId
7701 Turing Alan 0
6903 Hopper Grace 1
0055 Newton Isaac 0
... ... ... ...
and:
Institutions
Ident Name
0 Cambridge University
1 US Navy
... ...
Here's what I've done so far:
  1. Used "select distinct" to pull the institution names out of the "Person" table and stuff them into "Institutions" with an auto-incremented integer ID.
  2. Added an integer-valued "AffilId" column to "Person".
The next step is to put the right institution ID values into the "AffilId" column (they're currently all zeroes), and then delete the no-longer-needed "Affiliation" column. I thought the first half of this would look something like:
update People set AffilId = (
select Institutions.Ident
from Institutions, People
where Institutions.Name = People.Affiliation
);
but SQLite rejects that. I've tried several variations without success; if anyone knows the answer, I'd welcome advice.
In response to Shahan's comment, here's what I'm doing:
-- Create the 'People' table, and show its contents
create table People(Ident integer not null, Surname text not null, Affil text not null);
insert into People values(123, "Newton", "Cambridge");
insert into People values(456, "Darwin", "London");
insert into People values(789, "Turing", "Cambridge");
select * from People;
123 Newton Cambridge
456 Darwin London
789 Turing Cambridge
-- Create the 'Places' table and show its contents
create table Places(Ident integer not null, Name text not null);
insert into Places values(0, "Cambridge");
insert into Places values(1, "London");
select * from Places;
0 Cambridge
1 London
-- Create the table that will hold the refactored data
create table Result(Ident integer not null, Surname text not null, AffilId integer not null);
insert into Result select Ident, Surname, 999 from People;
select * from Result;
123 Newton 999
456 Darwin 999
789 Turing 999
-- Test the intended subquery: it seems to do what I want
select People.Surname, Places.Ident from Places, People, Result
where (People.Ident = Result.Ident)
and (People.Affil = Places.Name);
Newton 0
Turing 0
Darwin 1
-- Try to update the final table in place
update Result set AffilId = (
select Places.Ident from Places, People, Result
where (People.Ident = Result.Ident)
and (People.Affil = Places.Name) );
select * from Result;
123 Newton 0
456 Darwin 0
789 Turing 0
Whoops --- 'Darwin' should have a location ID of '1', not '0'. What's going on?