Archive for the ‘Making Software’ Category

“Making Software” Screencast

November 17th, 2010

A screencast about Making Software is now up on Amazon. I had to talk pretty fast to fit their four-minute limit, but I think I hit the high points.

Making Software

More Good Science

November 12th, 2010

We’re starting to get feedback on Making Software, most of it positive (but some of it grumpy: “how dare your evidence contradict my cherished belief!”). Here are two recent papers that aren’t in the book, but will give you a taste of what is:

Rossbach, Hofmann, and Witchel: “Is Transactional Programming Actually Easier?” In Proc. 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. The question they set out to answer is, does software transactional memory (STM) make parallel programming easier or not? From their abstract:

In this paper, we describe a user-study in which 147 undergraduate students in an operating systems course implemented the same programs using coarse and fine-grain locks, monitors, and transactions. We surveyed the students after the assignment, and examined their code to determine the types and frequency of programming errors for each synchronization technique. Inexperienced programmers found baroque syntax a barrier to entry for transactional programming. On average, subjective evaluation showed that students found transactions harder to use than coarse-grain locks, but slightly easier to use than fine-grained locks. Detailed examination of synchronization errors in the students’ code tells a rather different story. Overwhelmingly, the number and types of programming errors the students made was much lower for transactions than for locks. On a similar programming problem, over 70% of students made errors with fine-grained locking, while less than 10% made errors with transactions.

In other words, students did better, but thought they did worse. This is interesting for a whole bunch of reasons (not least for highlighting how flaky subjective self-assessment is).
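
For readers who haven't met the terms, here is a minimal sketch of the difference between coarse-grained and fine-grained locking. It is not the assignment from the study: the bank-account example, the class names `CoarseBank` and `FineBank`, and the helper `run_transfers` are all made up for illustration. The coarse version guards everything with one lock; the fine version uses one lock per account and must always acquire them in the same order to avoid deadlock, which is the kind of extra rule that is easy to get wrong.

```python
import threading


class CoarseBank:
    """Coarse-grained locking: one lock guards every account."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self._lock = threading.Lock()

    def transfer(self, src, dst, amount):
        # Simple to reason about, but all transfers are serialized.
        with self._lock:
            self.balances[src] -= amount
            self.balances[dst] += amount


class FineBank:
    """Fine-grained locking: one lock per account."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self._locks = {name: threading.Lock() for name in self.balances}

    def transfer(self, src, dst, amount):
        if src == dst:
            return  # nothing to do, and we must not take the same lock twice
        # Always lock in sorted name order so two opposite transfers
        # cannot each hold one lock while waiting for the other.
        first, second = sorted((src, dst))
        with self._locks[first], self._locks[second]:
            self.balances[src] -= amount
            self.balances[dst] += amount


def run_transfers(bank, n_threads=4, n_transfers=1000):
    """Hammer the bank from several threads and return the total balance."""
    names = sorted(bank.balances)

    def worker(offset):
        for i in range(n_transfers):
            src = names[(i + offset) % len(names)]
            dst = names[(i + offset + 1) % len(names)]
            bank.transfer(src, dst, 1)

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(bank.balances.values())
```

Calling `run_transfers` on either class with the same starting balances should return the same total, since money only moves between accounts. The fine-grained version allows unrelated transfers to proceed in parallel, at the cost of the lock-ordering rule.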

Bird, Nagappan, Murphy, Gall, and Devanbu: “An Analysis of the Effect of Code Ownership on Software Quality across Windows, Eclipse, and Firefox.”

From their abstract:

We examine the relationship between different ownership measures and software faults/failures in three large software projects drawn from different process domains: Windows Vista, the Eclipse Java IDE, and the Firefox Web Browser. We find that in all cases, measures of ownership such as the number of low-expertise developers, and the proportion of ownership for the top owner have a relationship with both pre-release faults and post-release failures. However, we find that the strength of the effects is related to the development process used. Vista shows the strongest relationship with ownership level, followed by Eclipse, and then Firefox, suggesting that the more that a project uses an open source style process, the more that team sizes rather than ownership levels affect failures. We also find reasons that low-expertise developers make changes to components and show that the removal of low-expertise contributions dramatically decreases the performance of contribution-based defect prediction.

They are painstaking in defining what they mean by “ownership”, and how they measure it, so that other people can (and should!) replicate their work. Drilling down, their conclusions are:

  • Vista:
    1. The number of minor contributors has a strong positive relationship with both pre- and post-release failures even when controlling for metrics such as size, churn, and complexity.
    2. Higher levels of ownership for the top contributor to a component results in fewer failures when controlling for the same metrics, but the effect is smaller than the number of minor contributors.
    3. Ownership has a stronger relationship with pre-release failures than post-release failures.
  • Eclipse:
    1. Both MINOR and TOTAL (defined in the paper) have a positive relationship with pre- and post-release defects. However, neither is consistently a better indicator, and the effect is weaker than in Vista.
    2. Higher levels of Ownership sometimes have a positive relationship with pre- and post-release quality, but the effect is small when it is statistically significant.
    3. Ownership measures have a slightly larger effect on pre-release failures than post-release failures.
  • Firefox:
    1. Team size has a stronger relationship with defects than ownership levels.
    2. Team size and ownership metrics have a much stronger relationship with pre-release defects than post-release defects.
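
To make the measures in the list above a bit more concrete, here is a rough sketch of how one might compute them for a single component from its commit history. This is my reading of the idea, not the authors' code: the function name `ownership_metrics`, the input format (one author name per commit), and the 5% cutoff for a "minor" contributor are all assumptions, so check the paper's definitions before relying on it.

```python
from collections import Counter

# Assumed cutoff: a contributor with less than this share of a
# component's commits is counted as "minor". Check the paper for
# the threshold the authors actually use.
MINOR_THRESHOLD = 0.05


def ownership_metrics(commit_authors, minor_threshold=MINOR_THRESHOLD):
    """Summarize who owns a component, given one author name per commit."""
    counts = Counter(commit_authors)
    total_commits = sum(counts.values())
    if total_commits == 0:
        raise ValueError("component has no commits")

    # Each contributor's share of the component's commits.
    shares = {author: n / total_commits for author, n in counts.items()}
    minor = sum(1 for share in shares.values() if share < minor_threshold)

    return {
        "ownership": max(shares.values()),  # share held by the top contributor
        "minor": minor,                     # number of low-share contributors
        "major": len(shares) - minor,       # everyone else
        "total": len(shares),               # team size for this component
    }
```

For example, a component with 45 commits from one person, 4 from a second, and 1 from a third has three contributors in total, one of whom falls under the assumed 5% cutoff. Metrics like these can then be compared against defect counts per component, which is the kind of relationship the findings above describe.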

This is cool: we can measure important things, we can see how they relate to other important things, and (crucially) we can act on what we see. I’m looking forward to seeing what both groups do next.

Making Software

“Making Software” Covers

October 8th, 2010

“Making Software” Now Available on Rough Cuts

July 16th, 2010

Making Software (the collection on empirical software engineering that I helped edit) is now available on Safari Rough Cuts — chapters include:

  1. A Communal Workshop or Doors that Close?
  2. Learning through Application: The Maturing of the Quality Improvement Paradigm in the SEL
  3. Conway’s Corollary
  4. Architecting: How Much and When
  5. How Usable Are Your APIs?
  6. Modern Code Review
  7. Quality Wars: Open Source vs. Proprietary Software
  8. Personality, Intelligence, and Expertise: Impacts on Software Development
  9. Mining Your Own Evidence
  10. What We Can Learn From Systematic Reviews
  11. Understanding Software Engineering through Qualitative Methods
  12. What Does 10x Mean? Measuring Variations in Programmer Productivity
  13. Code Talkers
  14. Why Aren’t More Women in Computer Science?
  15. Pair Programming
  16. The Art of Collecting Bug Reports
  17. Identifying and Managing Dependencies in Global Software Development
  18. Why Is It So Hard to Learn to Program?
  19. Beyond Lines of Code: Do We Need More Complexity Metrics?
  20. The Quest for Convincing Evidence
  21. Copy-Paste as a Principled Engineering Tool
  22. Two Comparisons of Programming Languages
  23. How Effective is Test Driven Development?
  24. How Effective Is Modularization?
  25. The Evidence for Design Patterns

We hope you enjoy it!

Making Software

It’s Gone to Production

July 7th, 2010

The collection of essays on evidence-based software engineering that Andy Oram and I edited has gone to production. The final title is Making Software: What Really Works, and Why We Believe It. Individual chapters will be available as Rough Cuts from O’Reilly next month, and the book itself should be on the shelves not long after.

I’d like to thank all the people who volunteered their time; in no particular order, they and their chapters are:

  1. Tim Menzies and Forrest Shull: The Quest for Convincing Evidence
  2. Lutz Prechelt and Marian Petre: Credibility, or Why Should I Insist on Being Convinced?
  3. Barbara Kitchenham: What We Can Learn From Systematic Reviews
  4. Andrew Ko: Understanding Software Engineering through Qualitative Methods
  5. Victor R. Basili: Learning through Application: The Maturing of the Quality Improvement Paradigm in the SEL
  6. Jo E. Hannay: Personality, Intelligence, and Expertise: Impacts on Software Development
  7. Mark Guzdial: Why Is It So Hard to Learn to Program?
  8. Israel Herraiz and Ahmed E. Hassan: Beyond Lines of Code: Do We Need More Complexity Metrics?
  9. Elaine J. Weyuker and Thomas J. Ostrand: Finding Fault: Developing an Automated System for Predicting Which Files Will Contain Defects
  10. Barry Boehm: Architecting: How Much and When
  11. Christian Bird: Conway’s Corollary
  12. Burak Turhan, Lucas Layman, Madeline Diep, Hakan Erdogmus, and Forrest Shull: How Effective is Test Driven Development?
  13. Michele A. Whitecraft and Wendy M. Williams: Why Aren’t More Women in Computer Science?
  14. Lutz Prechelt: Two Comparisons of Programming Languages
  15. Diomidis Spinellis: Quality Wars: Open Source vs. Proprietary Software
  16. Robert DeLine: Code Talkers
  17. Laurie Williams: Pair Programming
  18. Jason Cohen: Modern Code Review
  19. Jorge Aranda: A Communal Workshop or Doors that Close?
  20. Steve McConnell: What Does 10x Mean? Measuring Variations in Programmer Productivity
  21. Neil Thomas and Gail Murphy: How Effective Is Modularization?
  22. Walter Tichy: The Evidence for Design Patterns
  23. Tom Ball and Nachi Nagappan: Evidence-Based Failure Prediction
  24. Rahul Premraj and Thomas Zimmermann: The Art of Collecting Bug Reports
  25. Dewayne Perry: Where Do Most Software Flaws Come From?
  26. Andrew Begel and Beth Simon: Novice Professionals: How Newly-Hired Recently-Graduated Software Developers Fare in their First Software Engineering Job
  27. Kim Sebastian Herzig and Andreas Zeller: Mining Your Own Evidence
  28. Michael Godfrey and Cory Kapser: Copy-Paste as a Principled Engineering Tool
  29. Steven Clarke: How Usable Are Your APIs?
  30. Marcelo Cataldo: Identifying and Managing Dependencies in Global Software Development

Making Software

The Jolts Are Back

June 22nd, 2010

The Jolt Awards for best software (and book) are back: this page on the Dr. Dobb’s Journal site has the schedule and categories. It’s a shame that neither of the collections I’m helping edit right now (one on evidence-based software engineering, the other on the architecture of open source applications) will be in print in time to qualify this year, but there’s always 2011 :-)

Announcements, Architecture of Open Source Applications, Making Software

Communication Matters Most

April 6th, 2010

Tania Samsonova has posted an interesting article discussing the importance of communication skills to job success for junior developers. Drawing on the work of people like Andrew Begel and Beth Simon (who are contributing a chapter to our upcoming book on evidence-based software engineering), Tania talks about how the ability to ask questions and share ideas is a lot more important than specific technical skills. I particularly like this quote:

- Anna, congratulations: your understanding of spoken English improved a lot.

- How do you know? You rarely talk to me anyway.

- You’ve stopped smiling and nodding all the time when people talk to you.

Making Software

Currently Juggling

March 15th, 2010

I keep telling my students not to over-commit themselves. It’s a shame I don’t take my own advice :-). Here’s what I’ve currently got on the go:

Software Carpentry teaches basic software development skills to scientists and engineers. I have 80% of the funding I need to spend a year upgrading its content and delivery. I hope to raise the last 20% of the money in the next few weeks. If I can pull it off, the major challenges will be:

  1. Learning how to create effective online course material: there’s lots of handwaving out there about wikis in the classroom, but nothing substantive about instructional design for mature learners using present-day internet technologies.
  2. Assessment. We don’t know how to measure the productivity of programmers, or the productivity of scientists; trying to gauge this course’s impact on the productivity of scientific programmers will therefore be something of a challenge. (One of the reasons I left industry for academia in 2006 was to figure out how to do this, but my attempts to find research funding all failed.)
  3. Mechanics. Site5 only allows one shell account per domain, which makes it difficult to open up the project’s Subversion repository to other contributors. And I’ll have to choose a format for the lecture notes: LaTeX, plain HTML, S5, one of the many wiki formats… And figure out a better way to create and manage images and video. And pick a bibliography format. And…

A professional Master’s degree in Computer Science at the University of Toronto to complement the department’s existing research Master’s. The program consists of five regular graduate courses, a course each on business skills and professional communication, and an eight-month industrial internship in which students have to show that they can translate theory into practice. We are now accepting applications for September 2010 entry, so if you’d like to learn leading-edge ideas from some of the best researchers in the world, please check it out.

Basie, our replacement for Trac, built on Django and jQuery, is coming along nicely, but I don’t know what will happen to it once I leave U of T. A few non-students are now involved in its development, but we aren’t big enough to bid for our own Google Summer of Code students. If anyone would like to get involved, please give me a shout. (I’d particularly like to hear from ex-project students—it would be nice to have an excuse to stay in touch.)

UCOSP stands for “undergraduate capstone open source projects”. Since September 2008, undergraduates from several universities in Canada and the US have been taking part in joint capstone projects in order to learn first-hand what distributed development is like. Each team has students from two or three schools, and works for a term under the supervision of a faculty or industry lead on an open source project. We’re currently trying to find $35,000 to hire a half-time administrator to run the program from September 2010 so that we can scale up from the present 45 students/term to 80, 90, or more. Again, if you’re interested, please give me a shout.

CSC302 is my regular undergraduate software engineering course. This term, six teams of students are porting Django to Python 3, adding pivot tables to Gnumeric, parallelizing parts of ILUTE, upgrading PyLint, pluginifying Selenium, and extending SpatiaLite. It could be the last regular course I teach at the University of Toronto; it has been a bit bumpy, but I’m glad the students are getting to work on real things.

Grad student supervision: Alecia, Zuzel, and Mike all have topics nailed down, and Jason is writing up. I plan to spend one morning a week in the department working with them from now through next January; I’m looking forward to seeing what they produce.

The Cowichan Problems. This one goes back to the mid-1990s, when I first realized that human performance was at least as important to overall productivity in computational science as machine performance. The idea is to use a suite of fairly simple applications, all stitched together, to benchmark the usability of parallel programming systems. A couple of undergrads updated the code last year; I’m hoping to revisit it as part of my work on Software Carpentry.

Book #1, called What Really Works?, is a Beautiful Code-style book that presents evidence-based results in software engineering. Where do bugs actually come from? Does pair programming get the job done faster? Can code metrics predict post-release fault rates? Are some programming languages intrinsically more productive than others? Each of our authors will explore one such question in a chapter-length essay; contributions are now coming in, and we’re still on track to have the book on the shelves this summer. (I’ve been talking about this subject and this book for a few months now; if you’re interested, you can view the slides.)

Book #2 is yet another collection, this time exploring the architecture of open source applications. As I said in my lightning talk at PyCon, the aim isn’t really to explain the internals of Hadoop, Parrot, and Mercurial (though I think that’s worth doing). The real aim is to teach people how to think about software architecture by showing them how architects think. We’re hoping to have chapters in for review by November, and the book out this time next year.

Book #3 is an illustrated children’s book about the universe, life, science, and global warming. I’ve had some good feedback from the editor who handled my last children’s book, but most of the work is still in front of me.

Projects I’m not working on:

Government 2.0: I enjoyed working on open data/open government projects with my students last term, but I couldn’t find any faculty at U of T willing to keep it going. I could have found Gov 2.0 stuff for CSC302, but I thought open source work would be better for them.

Two novels and half a dozen short stories. I enjoy writing fiction, but it feels like an indulgence, and I keep pushing it aside to do “serious” stuff. I’m sure that when I’m seventy I’ll regret having done that, so I hope to spend one hour a day writing fiction once I start full-time on Software Carpentry.

Jazz: I haven’t touched my sax since this time last year—it may be vanity, but I’d rather not play at all than play badly. Maybe when my daughter’s a little older…

Exercise: yeah… exercise. Maybe I’ll get my bike back on the road this week…

Basie, Government 2.0, Making Software, Research, Teaching, Uncategorized

I Don’t Care Until I Can Check

January 31st, 2010

Over in the Agile Usability group, Larry Constantine writes:

…Capers Jones has been sharing with me some hard data summaries on a variety of development methods and practices gathered from a very large number of projects undertaken by varied organizations that contribute data on bugs, costs, etc., to his company….An interesting thing is that agile methods fare better in most measures, including total cost of ownership of final software product, than practices associated with CMM level 3 but are NOT as good as the Rational Unified Process and all three are trumped by CMM level 5…I don’t want to get into the specific numbers (the data set is proprietary anyway)…I want to raise a very different issue: What would it mean to the agile community IF these findings really were valid and true?

To which I can only reply, “Show me the data.” Seriously: if you’re not willing to show people your data and explain where and how it was collected, and how it was analyzed, we should pay exactly as much attention to you as we do to the guy in the bar who claims to have met a guy who met the guy who actually shot JFK. My greatest hope for our upcoming collection on evidence-based software engineering is that it will remind people that neither anecdotes nor trade secrets constitute proof.

Making Software

CUSEC 2010

January 25th, 2010

If it’s Monday, I must be catching up… I spoke at CUSEC 2010 last week to about 250 students and others about evidence-based software engineering. The talk is an update of the one I gave at DevDays last October; it’s basically a pitch for an upcoming O’Reilly collection on the subject, and the slides are up on SlideShare. You can find Joey de Villa’s detailed notes on the CUSEC keynotes on his blog:

and more to come.

Making Software