I have a bunch of old mailboxes (created by Pine), and I'd like to extract the message bodies. Problem is, some of the messages were sent as HTML, so the bodies are multipart MIME messages: the first part is plain text (or close to it), while the second is the marked-up HTML.

I thought that parsing each mailbox file with Python's email module would be simple, but as is often the case:

  • it's designed to handle industrial-strength problems, which means that there are a lot of things I don't need getting in the way of my finding the 5% I actually want; and
  • it assumes I know a lot more about the topic than I do.

This posting isn't really a plea for help (although if you do have something lying around that will do what I want, I'd appreciate a ping). What I'm really asking for is layered library design: when you're creating a package for general use, please remember the 80/20 rule, and make the 20% that will satisfy 80% of users very simple, and very easy to find.
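
For illustration, here is a minimal sketch of what the simple case described above (get the plain-text body of every message in a mailbox) might look like using the standard library's mailbox and email modules. The function names plain_text_body and mailbox_bodies are made up for this example. It assumes the files are in mbox format, and that the first text/plain part that isn't an attachment is the body you want; real mailboxes may well need more care than this.

```python
import mailbox
from email.message import Message


def plain_text_body(msg: Message) -> str:
    """Return the first non-attachment text/plain part as a str ('' if none)."""
    # walk() yields the message itself and then every nested part, so this
    # handles single-part and multipart messages the same way.
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Assumption: fall back to UTF-8 when the charset is missing or unknown.
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:
            return payload.decode("utf-8", errors="replace")
    return ""


def mailbox_bodies(path):
    """Yield the plain-text body of each message in the mbox file at path."""
    # create=False so that a mistyped path raises instead of making a new file.
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            yield plain_text_body(msg)
    finally:
        box.close()
```

Something like `for body in mailbox_bodies("saved-messages"): print(body)` would then dump the bodies one after another, where "saved-messages" stands in for whatever the mailbox file is actually called.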