Projects

Altruism in Software Teams

The aim of this project is to see if it is possible to detect altruism in software teams (i.e., to measure how much time developer A spends helping developer B even though B's problem isn't officially A's concern). If so, the research will try to determine if there is any correlation between altruism and (for example) staff turnover or the long-term maintainability of the code base.
Keywords: software development, team dynamics

Browsercast

Tools like PowerPoint aren't web-friendly. When you export a slideshow to the web, what you get is a bunch of images, while screencasts are opaque to search engines and disability aids. In contrast, Browsercast plays snippets of audio in the browser as the viewer moves through the slides, so "View Source", links, CSS, screen readers, and search work as they should. The prototype uses just 5kb of JavaScript; the aim of this project is to turn it into a functional tool.
Keywords: JavaScript, UI design, multimedia, accessibility

Validity of Claims

Are some programmers really ten times more productive than others? Does test-driven development actually make programmers more productive? And do people actually believe these claims? This project will conduct a quantitative survey of best-selling books on software development to measure how many of their claims are backed by citations, and of those, how many are considered valid, then survey programmers to see which (if any) they believe.
Keywords: evidence-based software engineering

The Impact of Calibrated Code Review

Give a novice programmer a one-page program and have them score it using a checklist, then grade them on how closely their scoring matches the instructor’s. (They start with 100%, and lose one mark for each false positive or false negative.) After doing this a handful of times, they should learn to see code through the instructor’s eyes. Does this help them write better code? If so, how quickly and how well? This project will attempt to answer these questions.
Keywords: code review, education
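
The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `calibration_score`, the representation of a review as a set of checklist item labels, and the floor at zero are all hypothetical choices rather than part of the project description.

```python
def calibration_score(student_flags, instructor_flags, start=100):
    """Score a student's checklist review against the instructor's.

    Both arguments are collections of checklist item labels that the
    reviewer flagged as problems (a hypothetical representation).
    """
    student = set(student_flags)
    instructor = set(instructor_flags)
    false_positives = student - instructor   # flagged by the student only
    false_negatives = instructor - student   # missed by the student
    # One mark is lost per disagreement; clamping at zero is an assumption.
    return max(0, start - len(false_positives) - len(false_negatives))
```

For example, a student who flags "naming" and "magic-number" when the instructor flagged "naming" and "missing-docstring" has one false positive and one false negative, so the sketch returns 98.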

Code Selectors

CSS Selectors allow developers to select elements in a web page, and jq offers a similar notation for selecting elements in JSON documents. The aim of this project is to develop a similar notation for selecting blocks of source code, e.g., to say, "Get the first for loop in the method m in the class C". The primary use case will be including snippets of code in books and tutorials, so the notation must be able to handle multiple languages.
Keywords: Python, programming languages, parsing
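
To make the example query concrete, here is a minimal sketch of what "Get the first for loop in the method m in the class C" might do for Python source, using the standard library's `ast` module. The function name `select_first_for` is hypothetical, the query is hard-wired rather than parsed from a notation, and only Python is handled, so this shows the kind of result a selector could return rather than the notation the project would design.

```python
import ast


def select_first_for(source, class_name, method_name):
    """Return the text of the first `for` loop in class_name.method_name, or None."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not (isinstance(node, ast.ClassDef) and node.name == class_name):
            continue
        for item in node.body:
            is_function = isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            if not (is_function and item.name == method_name):
                continue
            loops = [n for n in ast.walk(item) if isinstance(n, ast.For)]
            if loops:
                # ast.walk is breadth-first, so pick the earliest loop by position.
                first = min(loops, key=lambda n: (n.lineno, n.col_offset))
                return ast.get_source_segment(source, first)
    return None


SAMPLE = '''
class C:
    def m(self, items):
        total = 0
        for x in items:
            total += x
        for y in items:
            total -= y
        return total
'''

snippet = select_first_for(SAMPLE, "C", "m")
```

A multi-language version would presumably need a parser that covers many grammars, which is outside what this sketch attempts.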

Simulations of Distributed Systems

Software Design by Example in Python deliberately ignored concurrency, partial failure, and everything else associated with modern distributed applications. The aim of this project is to (start to) fix that by building scale models of distributed protocols and systems from TCP to BitTorrent and load-balancing tools using either Py-DES or SimPy. The tutorials will use simulators so that the accompanying lessons can illustrate edge cases in reproducible ways.
Keywords: Python, distributed systems, discrete event simulation, education
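
As a rough illustration of why simulation makes edge cases reproducible, the sketch below models a stop-and-wait sender over a lossy channel using a hand-rolled event queue. It deliberately uses only the standard library (`heapq` and a seeded `random.Random`) as a stand-in for Py-DES or SimPy; the function name, parameters, and event names are hypothetical, and the protocol is far simpler than TCP.

```python
import heapq
import random


def simulate_stop_and_wait(num_messages, loss_rate, seed, delay=1.0, timeout=3.0):
    """Simulate a sender that waits for each acknowledgment before sending more.

    Returns a list of (time, event, message_id) tuples in the order they occur.
    Assumes timeout > 2 * delay, so a delivered message never also times out.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1) or the run never ends")
    if num_messages <= 0:
        return []

    rng = random.Random(seed)  # seeded, so each run with this seed is identical
    queue = []                 # heap of (time, tie_breaker, event, message_id)
    tie_breaker = 0            # keeps events at the same time in schedule order

    def schedule(time, event, message_id):
        nonlocal tie_breaker
        heapq.heappush(queue, (time, tie_breaker, event, message_id))
        tie_breaker += 1

    log = []
    schedule(0.0, "send", 0)
    while queue:
        now, _, event, message_id = heapq.heappop(queue)
        log.append((now, event, message_id))
        if event == "send":
            if rng.random() < loss_rate:
                # Lost in transit: the sender finds out when its timer fires.
                schedule(now + timeout, "timeout", message_id)
            else:
                # Delivered: the acknowledgment arrives one round trip later.
                schedule(now + 2 * delay, "ack", message_id)
        elif event == "timeout":
            schedule(now, "send", message_id)      # retransmit immediately
        elif event == "ack" and message_id + 1 < num_messages:
            schedule(now, "send", message_id + 1)  # move on to the next message
    return log
```

Because the random number generator is seeded, a run that happens to lose the same message several times in a row can be replayed exactly by passing the same seed again, which is hard to arrange on a real network.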

Developer Discussions

Which of the techniques catalogued in The Discussion Book are programmers familiar with? Which ones are supported by their tools? Which ones do they use informally without explicit tool support, and how do they operationalize them? These questions cannot be answered by mining software repositories; instead, the student(s) doing this project will have to administer surveys and conduct observational studies.
Keywords: observational study, surveys

Dragnet

One type of exercise that H5P doesn't support is adding labels to diagrams. This prototype takes an SVG with some specially-marked labels, moves those labels to the side, and then lets the user try to drag them back into the right places. A deployable version would need to do a lot more, such as dealing with scaling transformations; the goal of this project is to turn the demo into something a classroom teacher could use.
Keywords: JavaScript, SVG, UI design, education

Drawing Execution Order

Philip Guo's Python Tutor helps novice programmers visualize the data structures in their programs. The aim of this project is to build a similar tool that can display the order in which statements are executed in a more readable way than is shown below:
[Figure: execution tracer]
Keywords: education, program analysis
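
The raw data such a tool would draw from can be collected with Python's built-in `sys.settrace` hook. The sketch below is one possible starting point, not the project's design: the helper name `trace_lines` is hypothetical, it follows only a single function (not the functions it calls), and it reports line offsets relative to the `def` line rather than anything visual.

```python
import sys


def trace_lines(func, *args, **kwargs):
    """Call func and record the order in which its own lines execute.

    Returns (result, offsets), where each offset is a line number
    relative to the function's `def` line.
    """
    code = func.__code__
    executed = []

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace into other functions
        if event == "line":
            executed.append(frame.f_lineno - code.co_firstlineno)
        return tracer    # keep receiving events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever hook was installed
    return result, executed


def count_down(n):
    while n > 0:
        n -= 1
    return n


final_value, order = trace_lines(count_down, 2)
```

Here `order` holds a sequence of offsets in which the loop header and loop body repeat; turning a sequence like that into something a novice can read is the actual subject of the project.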

Extending Lox

Lox is a simple interpreted language created by Robert Nystrom for his book Crafting Interpreters. Many people have extended it in various ways; in this project, students would re-create Lox by working through the second half of Nystrom's book, then add operator overloading, cooperative concurrency, and a few other features to bring the language up to par with Lua.
Keywords: C, compilers, programming languages

Generative Art

The goal of this project is to translate Danielle Navarro's tutorial on generative art that starts with this blog post from R to Python. By the end, learners should be able to build software that creates art like the examples in this gallery.
Keywords: Python, R, generative art

Ghost Engineers

A junk "study" about "ghost engineers" that appeared in late 2024 was probably viewed more times than every carefully-done study in empirical software engineering published in that year. The aim of this project is to see if that claim is true, i.e., to estimate how many people read or re-posted that study and compare it to estimates of the number of people outside academia who read or quoted reputable peer-reviewed studies.
Keywords: engagement, bullshit, empirical software engineering

The Hoye Test

The Turing Test classifies a machine as "intelligent" if an independent observer can't distinguish between it and a human being in conversation. This project will implement a similar test for malicious software (which we call the Hoye Test in honor of the person who proposed it): pick an application (e.g., a discussion forum), build a work-alike that is deliberately malicious in some way (e.g., designed to radicalize its users), and then have people use both and guess which is which.
Keywords: software design, radicalization

Student Adoption of IDE Tools

Which features of integrated development environments (IDEs) do students actually use? To find out, this project will have a set of students record their screens while solving a set of programming and debugging problems, then analyze those recordings to see whether and when students use breakpointing debuggers, multiple cursors, refactoring tools, and other features.
Keywords: observational study, integrated development environments

Marimo and H5P

Marimo is a next-generation computational notebook that enables data scientists to mix code, discussion, and results in a reproducible way. Its plugin system relies on AnyWidget, which specifies a simple contract between extensions and Marimo's rendering and execution engine. The aim of this project is to design, build, and test a set of Marimo plugins that can be used for classroom exercises similar to those in the H5P toolkit: multiple choice, fill in the blanks, and so on.
Keywords: Python, JavaScript, computational notebooks, education

Markdown to DOM

Python-Markdown converts Markdown to HTML; if an application needs a DOM tree that it can check or manipulate, it must then parse the HTML using a library like BeautifulSoup, perform whatever operations it needs to, and then convert the DOM back to HTML. In this project, students will refactor Python-Markdown so that it can generate a Beautiful Soup-compatible DOM tree directly.
Keywords: Python, open source, parsing

What "Business of Software" Doesn't Teach

Many universities offer an undergraduate course on entrepreneurship or the business of software. This project will survey these courses to determine what they don't teach. For example, how many of these courses (if any) devote time to labor rights? How many discuss anti-trust legislation? And how does that vary by country and by the nature of the institution?
Keywords: business of software, entrepreneurship, lesson content

Narwhals

Narwhals is a Python package that provides compatibility between dataframe libraries, allowing applications to use Pandas, Polars, and other libraries through a common API. Students working on this project will contribute directly to Narwhals, and will be responsible for fixing bugs, designing new features, and shepherding their work through review into production.
Keywords: Python, data science, open source

Parallelizing Marimo Notebooks

Marimo is a next-generation computational notebook that (a) stores everything as Python source code and (b) analyzes code to prevent out-of-order execution of cells. Dagster and Metaflow are computational workflow tools that allow users to add decorators to functions and methods to specify computational chunks. The goal of this project is to see if the two can be married, i.e., to see if it's possible to add decorators to cell functions in Marimo to parallelize notebooks directly.
Keywords: Python, parallel computing, workflows, computational notebooks

Software Design for Everyone

Each lesson in this tutorial will present a "what if?" scenario and then explore its implications for software design. How would you redesign a cell phone app if you had crippling arthritis (which you can simulate by taping popsicle sticks to your fingers)? What if you thought your government might take a sharp turn to the right and retroactively weaponize women's health records: (how) could you reconcile doctors' need for information with patient safety? The practical exercises will assume enough programming skill to build simple web applications.
Keywords: accessibility, UI design, education

Software Design by Example in Gleam

Gleam is a modern functional language that runs on the Erlang/OTP platform (and can also be compiled to JavaScript). The aim of this project is to translate examples from Software Design by Example into Gleam to help people coming from Python and other mainstream languages understand how to use FP in practice.
Keywords: Gleam, functional programming, software design, education

Software Performance by Example

Each lesson in this tutorial will take a simple application, analyze its performance, and then make it faster. Along the way, the lessons will present general tips for improving performance similar to those in Jon Bentley's classic book Writing Efficient Programs, update them, and show how to apply them in practice.
Keywords: C, Python, JavaScript, SQL, distributed systems, performance

Software Security by Example

The first lesson in this tutorial will present a simple implementation of a wiki designed for shared note-taking. Each of the following lessons will fix one of its security shortcomings (or one of the shortcomings introduced by an earlier fix). Some will be vulnerabilities such as cross-site scripting or SQL injection; others will be missing features such as basic authentication or OAuth, role-based access control, the kind of logging that every sys admin wishes they had, static code analysis, and eventually the audit and emergency response procedures that such tools are meant to support.
Keywords: web programming, digital security, education

Session Recording and Playback

asciinema and similar tools can record a terminal window session and play it back in another terminal or in a browser. This prototype adds audio recording and synchronized playback so that (for example) an instructor can record a live coding session with a voiceover for a learner to go through later. This project will extend that prototype to replay sessions in the browser.
Keywords: JavaScript, UI design, multimedia, accessibility

Comparing Slideshow Tools

Are slideshows written using HTML- or Markdown-based tools more text-intensive than those written in PowerPoint? Putting it another way, are slides written in formats that version control understands (text) less likely to use diagrams than slides written with GUI tools? To answer this question, the student(s) doing this project will have to develop ways to quantify how graphical or textful a presentation is, and learn how to make work of this kind reproducible.
Keywords: slideshows, tools, reproducible research

Testing Research Software

The JavaScript and Python versions of Software Design by Example showed readers how to design programs by working through scaled-down examples. In contrast, this project will develop scaled-down versions of things like fluid flow simulators and data analysis pipelines, and then show readers how to test them. Each lesson will open with a short recap of the science and a walk-through of the untested code, then explore how that code can be tested.
Keywords: Python, computational science, software testing, education

A Blocks-Based Data Science Tool

TidyBlocks was a prototype of a Scratch-like tool for teaching introductory data science. It turned out to be an inappropriate visual paradigm, as there was no natural way to represent join operations as nested blocks. The aim of this project is to explore an alternative using a node-and-connector model like that of Node-RED or Yahoo! Pipes.
Keywords: JavaScript, UI design, programming tools, education

Tooling Effort over Time

How does the percentage of effort devoted to tooling and deployment change as a project grows and/or ages? And how has it changed as we've moved from desktop applications to cloud-based applications? Once a project reaches a certain size, does the amount of tooling (measured by number of files or lines of configuration) level off? Does the effort required to maintain the tooling grow as the code base grows, or does it level off as well? Answering these questions will give the student(s) a chance to learn how to mine software repositories.
Keywords: mining software repositories, tooling, reproducible research

A Tower Support Game

A tower defense game is one in which the player builds fixed defenses against incoming waves of attackers. (Kingdom Rush is a personal favorite.) The objective of this project is to prototype a simple tower support game, in which the player builds bridges, first aid stations, and so on to help travelers reach their destination.
Keywords: games, JavaScript

Unbreaking Software

Most programmers spend a large part of their time debugging, but most courses only show working code, and most textbooks don't discuss how to prevent, diagnose, and fix errors. This tutorial will fill that gap by presenting dozens of case studies showing how to find and fix real-world problems. Along the way, it will present examples of what programmers can do to handle errors gracefully, from data structure repair to automatically restarting servers.
Keywords: software design, debugging, education

Analysis of Undergrad Textbooks

Most undergraduate computer science programs have a first- or second-year course on data structures and algorithms. What do these courses actually teach, how has their content changed since Wirth's classic book appeared in 1976, and which of these algorithms and data structures are used in upper-year courses? To answer these questions, this project will assemble and apply tools to analyze the text of several dozen textbooks; along the way, the students doing the project will have to decide how to identify topics, how to count them, and how to make their work reproducible.
Keywords: natural language processing, education

Understanding Ethics

This project will start by creating a set of scenarios in which a programmer needs to make an ethical decision, each with multiple-choice options. An expert will determine the best answer for each; students and professionals will then be asked to answer the same questions, and the results will be analyzed to see how well each group matches the experts' opinions and whether practitioners' opinions are any better than those of students.
Keywords: ethics, professional development

Identification of Variable Roles

Sajaniemi et al.'s work on roles of variables identified and named ten small patterns in the way variables are used in novice programs. This project would build static and dynamic analysis tools to detect those patterns (and possibly others) in programs as an aid to teaching, debugging, and code review.
Keywords: program analysis, education

Human-Scale Web Programming

This incomplete tutorial is an introduction to web programming aimed at scientists and others with little or no experience with JavaScript, HTTP requests, and related technologies. In this project, a student (or team of students) with an interest in teaching would fill it in and offer it at least once in order to learn more about how to create and deliver high-quality lessons.
Keywords: JavaScript, Python, web programming, education

A Little WYSIWYG Editor

Panchekha and Harrelson's Web Browser Engineering builds a small but fully-functional web browser step by step to show students how real ones work. The aim of this project is to build an equally simple desktop WYSIWYG editor in Python that supports both styled text and embedded sketching.
Keywords: Python, UI design

A WYSIWYG Computational Notebook

Jupyter uses JSON as its storage format, while Marimo and Quarto use Python with embedded strings and Markdown with embedded code respectively. This project will explore a third option by building an extension for LibreOffice using the Jupyter messaging protocol so that people who prefer WYSIWYG editors can embed code and its output alongside diagrams, tables, and other media.
Keywords: Java, computational notebooks, UI design

XKCD Charts

Chart.xkcd is a JavaScript library that displays charts in the sketchy hand-drawn style of XKCD. Its creator is no longer maintaining it; this project will fork the original code, fix outstanding issues, and add new features such as axis limits and stable coloring schemes.
Keywords: JavaScript, SVG, UI design, data visualization, open source