Size and Activity

November 11th, 2009

Next term, I’m going to be teaching CSC302 (the second of our two-course sequence in software engineering). The mandate for the course is to introduce students to the tools and methods they need to deal with large applications; as part of it, I’m thinking of having each group of students go spelunking in a different pre-existing code base.  I’d therefore like to find 15-20 applications that are:

  1. Relatively well written in C, Java, or Python (the three languages I can be sure the students know).
  2. Open source (for obvious reasons).
  3. About 50,000 lines long (yes, I know that lines of code is a weak measure of complexity, but it’s easy to calculate).
  4. Able to build and run on Windows, Linux, and Mac OS X.
  5. Under active development (so that students have someone to turn to when they have questions).

I’d prefer complete applications to libraries, toolkits, or frameworks.  Vim is a good example of what I’m after; I’d welcome pointers to others.
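For the size criterion, a rough count is enough to tell whether a candidate is in the right range. Here is a minimal sketch in Python, assuming that "lines" means physical lines (blanks and comments included) and that a file's language can be guessed from its extension; the function name `count_lines` and the default extension list are my own choices for illustration, not part of any existing tool.

```python
import os


def count_lines(root, extensions=(".c", ".h", ".java", ".py")):
    """Return the total number of physical lines in matching files under root.

    Blank lines and comments are counted too, so this is a crude size
    estimate rather than a measure of complexity.
    """
    suffixes = tuple(extensions)  # str.endswith needs a tuple, not a list
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                # Read as bytes so unusual encodings do not raise errors.
                with open(path, "rb") as handle:
                    total += sum(1 for _ in handle)
    return total
```

Calling `count_lines("path/to/checkout")` on an unpacked source tree (the path here is a placeholder) gives a single number to compare against the 50,000-line target; passing `extensions=[".py"]` restricts the count to one language.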

  1. guest
    November 11th, 2009 at 11:17 | #1

    Not sure about the LOC measure but things that come to mind are Azureus (Java), Audacity (C++), NumPy + SciPy (Python – prob much bigger but lots of cleanly separated sections so it would be easy to get started), matplotlib (Python), spyder (http://packages.python.org/spyder/, Python – bit smaller but really cool and could probably use the help), pidgin (C), jabref (Java), ardour (C++).

    That’s pretty much all my favorite open source apps – apart from some being C++ I think they all meet the requirements.

  2. Darren
    November 11th, 2009 at 11:41 | #2

    I remember digging through FFMPEG (http://ffmpeg.sourceforge.net/) a few years ago; it might be about the right size. It contains libavcodec, which is used in VLC player.

  3. Kalle Svensson
    November 11th, 2009 at 12:25 | #3

    What about SQLite or WebKit? Perhaps SQLite isn’t that fun to explore unless you are into relational algebra, but it is very well written!

  4. Stan
    November 11th, 2009 at 12:56 | #4

    I would personally stay away from codecs like FFMPEG. It is much harder to understand what is going on without first becoming fairly knowledgeable about video compression. (A nice thing, but you probably don’t have time for that in a semester.) Avoiding programs that require a lot of domain-specific knowledge is probably a good idea.

    That said, for people with a bit of mathematical leaning, I think cairo (the 2D graphics library) could be interesting. According to sloccount, it is 65,000 lines of C, plus it has a full test suite. Students could start by using the library to make some pretty pictures before digging into how it works.

  5. michael dillon
    November 11th, 2009 at 13:30 | #5

    Leo, the Python editor. Quite a contrast to vim with its tree-structured approach. http://webpages.charter.net/edreamleo/front.html

  6. November 11th, 2009 at 13:42 | #6

    I like the Python-based roundup issue tracker: http://roundup.sourceforge.net. Doing “find roundup -name '*.py' | xargs wc -l” yields a line count of about 45KLOC. You can run the tracker without integrating it into a web server like Apache.

    GNU wget is also about 50KLOC, and it should build on Windows. http://www.gnu.org/software/wget/

  7. November 11th, 2009 at 14:29 | #7

    Yea, +1 on Roundup (I’ve been poking around in the code, and it’s fairly decent).
    I’d be worried about C code, though… Given that, by 302, they won’t have taken 369 (read: they won’t know C), and “real” C applications are orders of magnitude more complex than anything they will have seen in 209 (when I looked into Vim’s source, probably 1/10th of the lines were #ifdefs), they won’t stand a chance. Or, at least, they would be at a very, very, very significant disadvantage.

  8. November 11th, 2009 at 14:31 | #8

    That said, maybe the source for Python would be interesting to look at? You’d have the standard library (a truck load of Python) and the interpreter (a truck load of decently comprehensible C) to look at, and I know first hand that it’s very easy to build across platforms.

  9. November 11th, 2009 at 15:58 | #9

    Hi Greg, I’d like to suggest Task Coach. Actively developed, Python + wxPython, runs on Windows, Linux and Mac OS X. Uses design patterns, has over 3000 unittests.

  10. November 12th, 2009 at 00:22 | #10

    MoinMoin Wiki?

    Python, GPL, about 50 kLOC (depends on what you look at, how you count), platform-independent, under active development (we like to work with interested students and have done so in recent years in Google Summer of Code).

    We use: Mercurial DVCS, py.test, epydoc, PEP8, WSGI, werkzeug, pygments, xapian, wiki, irc, eclipse/pydev, vim, … and we like clean and easy readable python code.

    1.9 is the upcoming stable release and thus nice for developments like plugins that should be in production soon.

    2.0 is unstable/experimental and nice for exciting hacks at the core (and this is what we currently do there).

    If that sounds interesting, please contact us early (email, IRC #moin-dev on freenode).

  11. Adrian
    November 12th, 2009 at 02:22 | #11

    PostgreSQL. Very much recommended as a case study in software construction.

  12. November 13th, 2009 at 16:51 | #12

    For context, in previous years we’ve used:

    – JEdit
    – JFreeChart
    – UMLLet

    Some years we allowed the students to vote which codebase to use; losers in these votes included Lobo, Violet, TWiki and the Google Web Toolkit.

    One additional criterion to add to Greg’s: it needs to be an application domain that the students already understand fairly well.
