Projects
Altruism in Software Teams
The aim of this project is to see if it is possible to detect
altruism in software teams (i.e., to measure how much time developer
A spends helping developer B even though B's problem isn't
officially A's concern). If so, the research will try to determine
if there is any correlation between altruism and (for example) staff
turnover or the long-term maintainability of the code base.
Keywords: software development, team dynamics
Browsercast
Tools like PowerPoint aren't web-friendly. When you export a
slideshow to the web, what you get is a bunch of images, while
screencasts are opaque to search engines and disability aids. In
contrast, Browsercast
plays snippets of audio in the browser as the viewer moves through
the slides, so "View Source", links, CSS, screen readers, and search
work as they should. The prototype uses just 5 KB of JavaScript; the
aim of this project is to turn it into a functional tool.
Keywords: JavaScript, UI design, multimedia, accessibility
Validity of Claims
Are some programmers really ten times more productive than others?
Does test-driven development actually make programmers more
productive? And do people actually believe these claims? This
project will conduct a quantitative survey of best-selling books on
software development to measure how many of their claims are backed
by citations, and of those, how many are considered valid, then
survey programmers to see which (if any) they believe.
Keywords: evidence-based software engineering
The Impact of Calibrated Code Review
Give a novice programmer a one-page program and have them score it
using a checklist, then grade them on how closely their scoring
matches the instructor’s. (They start with 100%, and lose one mark
for each false positive or false negative.) After doing this a
handful of times, they should learn to see code through the
instructor’s eyes. Does this help them write better code? If so, how
quickly and how well? This project will attempt to answer these
questions.
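The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the checklist items, the dictionary representation, and the function name calibration_score are all hypothetical, and a real study would need to decide how partial agreement is handled.

```python
def calibration_score(student, instructor, start=100):
    """Compare a student's checklist scoring with the instructor's.

    Both arguments map checklist item names to True (problem flagged)
    or False (no problem flagged). Returns (score, false_pos, false_neg).
    """
    if student.keys() != instructor.keys():
        raise ValueError("student and instructor must score the same checklist")
    # False positive: the student flags something the instructor did not.
    false_pos = sum(1 for k in instructor if student[k] and not instructor[k])
    # False negative: the student misses something the instructor flagged.
    false_neg = sum(1 for k in instructor if instructor[k] and not student[k])
    return start - false_pos - false_neg, false_pos, false_neg


# Hypothetical checklist items, for illustration only.
instructor = {
    "unclear variable names": True,
    "dead code": False,
    "unhandled empty input": True,
    "missing docstrings": False,
}
student = {
    "unclear variable names": True,
    "dead code": True,               # false positive
    "unhandled empty input": False,  # false negative
    "missing docstrings": False,
}
print(calibration_score(student, instructor))  # (98, 1, 1)
```

Returning the false positive and false negative counts separately makes it possible to see whether a novice is over-flagging or under-flagging as the rounds progress.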
Keywords: code review, education
Code Selectors
CSS
Selectors allow developers to select elements in a web page,
and jq offers a
similar notation for selecting elements in JSON documents. The aim
of this project is to develop a similar notation for selecting
blocks of source code, e.g., to say, "Get the first for loop in the
method m in the class C". The primary use
case will be including snippets of code in books and tutorials, so
the notation must be able to handle multiple languages.
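As a rough illustration of what such a selector would have to do behind the scenes, the query "first for loop in the method m in the class C" can be answered for Python source with the standard library's ast module. The function name first_for_loop and the sample source are assumptions made for this sketch; the project's actual notation and its multi-language back end are still to be designed.

```python
import ast

SOURCE = '''
class C:
    def helper(self):
        for x in []:
            pass

    def m(self, items):
        total = 0
        for item in items:
            total += item
        for item in items:
            print(item)
        return total
'''


def first_for_loop(source, class_name, method_name):
    """Return the text of the first for loop in class_name.method_name, or None."""
    tree = ast.parse(source)
    for cls in ast.walk(tree):
        if not (isinstance(cls, ast.ClassDef) and cls.name == class_name):
            continue
        for func in cls.body:
            if isinstance(func, ast.FunctionDef) and func.name == method_name:
                loops = [n for n in ast.walk(func) if isinstance(n, ast.For)]
                if loops:
                    # ast.walk is breadth-first, so pick the earliest by position.
                    first = min(loops, key=lambda n: (n.lineno, n.col_offset))
                    return ast.get_source_segment(source, first)
    return None


print(first_for_loop(SOURCE, "C", "m"))
```

A selector notation would compile a short expression into this kind of tree walk, with a different parser plugged in for each supported language.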
Keywords: Python, programming languages, parsing
Simulations of Distributed Systems
Software Design by
Example in Python deliberately ignored concurrency, partial
failure, and everything else associated with modern distributed
applications. The aim of this project is to (start to) fix that by
building scale models of distributed protocols and systems from TCP
to BitTorrent and load-balancing tools using either
Py-DES
or SimPy. The tutorials
will use simulators so that the accompanying lessons can
illustrate edge cases in reproducible ways.
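To show why simulation makes edge cases reproducible, here is a minimal discrete event simulation written with only the Python standard library rather than Py-DES or SimPy. It models a sender that retransmits one message over a lossy link until it is delivered. The function name simulate, its parameters, and the simplifying assumption that acknowledgments are instantaneous and never lost are all choices made for this sketch.

```python
import heapq
import random


def simulate(seed, loss_rate=0.3, delay=1.0, timeout=3.0, max_time=100.0):
    """Simulate delivery of one message over a lossy link.

    Returns a list of (time, event) pairs in the order they occurred.
    """
    rng = random.Random(seed)   # a seeded generator makes every run repeatable
    queue = [(0.0, 0, "send")]  # entries are (time, tie-breaker, event name)
    counter = 1                 # simultaneous events run in scheduling order
    log = []
    acked = False

    def schedule(time, name):
        nonlocal counter
        heapq.heappush(queue, (time, counter, name))
        counter += 1

    while queue:
        now, _, name = heapq.heappop(queue)
        if now > max_time:
            break
        if name == "send":
            log.append((now, "send"))
            if rng.random() >= loss_rate:
                schedule(now + delay, "deliver")
            else:
                log.append((now, "lost"))
            schedule(now + timeout, "timeout")
        elif name == "deliver":
            log.append((now, "deliver"))
            acked = True  # assumption: the acknowledgment is instant and reliable
        elif name == "timeout" and not acked:
            schedule(now, "send")  # retransmit after a timeout
    return log


for time, event in simulate(seed=1):
    print(f"{time:5.1f} {event}")
```

Because all randomness comes from the seeded generator and time is just a number in the event queue, calling simulate twice with the same seed yields the same trace, which is what lets a lesson show a rare failure on demand.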
Keywords: Python, distributed systems, discrete event simulation, education
Developer Discussions
Which of the techniques catalogued in
The
Discussion Book are programmers familiar with? Which ones
are supported by their tools? Which ones do they use informally
without explicit tool support, and how do they operationalize them?
These questions cannot be answered by mining software repositories;
instead, the student(s) doing this project will have to administer
surveys and conduct observational studies.
Keywords: observational study, surveys
Dragnet
One type of exercise that H5P doesn't support is adding
labels to diagrams. This
prototype takes an SVG with some specially-marked labels, moves
those labels to the side, and then lets the user try to drag them
back into the right places. A deployable version would need to do a
lot more, such as dealing with scaling transformations; the goal of
this project is to turn the demo into something a classroom teacher
could use.
Keywords: JavaScript, SVG, UI design, education
Drawing Execution Order
Philip Guo's Python Tutor
helps novice programmers visualize the data structures in their
programs. The aim of this project is to build a similar tool that
can display the order in which statements are executed in a more
readable way than is shown below:
Keywords: education, program analysis
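For the Drawing Execution Order project, the raw data to be displayed can be collected in Python with the standard library's sys.settrace hook. The sketch below is one possible starting point, not a description of how Python Tutor works; the function name trace_lines and the "<traced>" file name are assumptions made for illustration.

```python
import sys


def trace_lines(source):
    """Run source and return the line numbers in the order they executed."""
    order = []
    code = compile(source, "<traced>", "exec")

    def tracer(frame, event, arg):
        # Ignore frames that belong to other files, such as library code.
        if frame.f_code.co_filename != "<traced>":
            return None
        if event == "line":
            order.append(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return order


program = "total = 0\nfor i in range(2):\n    total += i\nresult = total\n"
print(trace_lines(program))
```

The output is a flat list of line numbers in which loop lines repeat; turning that list into something a novice can read at a glance is the design problem this project would tackle.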
Extending Lox
Lox is a simple interpreted language created by Robert Nystrom for
his book Crafting
Interpreters. Many people have extended it in various ways;
in this project, students would re-create Lox by working through the
second half of Nystrom's book, then add operator overloading,
cooperative concurrency, and a few other features to bring the
language up to par with Lua.
Keywords: C, compilers, programming languages
Generative Art
The goal of this project is to translate Danielle Navarro's tutorial
on generative art that starts
with this
blog post from R to Python. By the end, learners should be able
to build software that creates art like the examples
in this gallery.
Keywords: Python, R, generative art
Ghost Engineers
A junk "study" about "ghost engineers" that appeared in late 2024
was probably viewed more times than every carefully-done study in
empirical software engineering published in that year. The aim of
this project is to see if that claim is true, i.e., to estimate how
many people read or re-posted that study and compare it to estimates
of the number of people outside academia who read or quoted
reputable peer-reviewed studies.
Keywords: engagement, bullshit, empirical software engineering
The Hoye Test
The Turing Test classifies a machine as "intelligent" if an
independent observer can't distinguish between it and a human being
in conversation. This project will implement a similar test for
malicious software (which we call
the Hoye Test in honor
of the person who proposed it): pick an application (e.g., a
discussion forum), build a work-alike that is deliberately malicious
in some way (e.g., designed to radicalize its users), and then have
people use both and guess which is which.
Keywords: software design, radicalization
Student Adoption of IDE Tools
Which features of integrated development environments (IDEs) do
students actually use? To find out, this project will have a set of
students record their screens while solving a set of programming and
debugging problems, then analyze those recordings to see whether and
when students use breakpointing debuggers, multiple cursors,
refactoring tools, and other features.
Keywords: observational study, integrated development environments
Marimo and H5P
Marimo is a next-generation
computational notebook that enables data scientists to mix code,
discussion, and results in a reproducible way. Its plugin system
relies on AnyWidget, which
specifies a simple contract between extensions and Marimo's
rendering and execution engine. The aim of this project is to
design, build, and test a set of Marimo plugins that can be used for
classroom exercises similar to those in the
H5P
toolkit: multiple choice, fill in the blanks, and so on.
Keywords: Python, JavaScript, computational notebooks, education
Markdown to DOM
Python-Markdown
converts Markdown to HTML; if an application needs a DOM tree that
it can check or manipulate, it must then parse the HTML using a
library like BeautifulSoup,
perform whatever operations it needs to, and then convert the DOM
back to HTML. In this project, students will refactor
Python-Markdown so that it can generate a Beautiful Soup-compatible
DOM tree directly.
Keywords: Python, open source, parsing
What "Business of Software" Doesn't Teach
Many universities offer an undergraduate course on entrepreneurship
or the business of software. This project will survey these courses
to determine what they don't teach. For example, how many
of these courses (if any) devote time to labor rights? How many
discuss anti-trust legislation? And how does that vary by country
and by the nature of the institution?
Keywords: business of software, entrepreneurship, lesson content
Narwhals
Narwhals is a
Python package that provides compatibility between dataframe
libraries, allowing applications to use
Pandas, Polars,
and other libraries through a common API. Students working on this
project will contribute directly to Narwhals, and will be
responsible for fixing bugs, designing new features, and shepherding
their work through review into production.
Keywords: Python, data science, open source
Parallelizing Marimo Notebooks
Marimo is a next-generation
computational notebook that (a) stores everything as Python source
code and (b) analyzes code to prevent out-of-order execution of
cells. Dagster and
Metaflow are computational
workflow tools that allow users to add decorators to functions and
methods to specify computational chunks. The goal of this project is
to see if the two can be married, i.e., to see if it's possible to
add decorators to cell functions in Marimo to parallelize notebooks
directly.
Keywords: Python, parallel computing, workflows, computational notebooks
Software Design for Everyone
Each lesson in this tutorial will present a "what if?" scenario and
then explore its implications for software design. How would you
redesign a cell phone app if you had crippling arthritis (which you
can simulate by taping popsicle sticks to your fingers)? What if
you thought your government might take a sharp turn to the right and
retroactively weaponize women's health records: (how) could you
reconcile doctors' need for information with patient safety? The
practical exercises will assume enough programming skill to build
simple web applications.
Keywords: accessibility, UI design, education
Software Design by Example in Gleam
Gleam is a modern functional language that runs on the
Erlang/OTP platform (and can
also be compiled to JavaScript). The aim of this project is to
translate examples from Software Design
by Example into Gleam to help people coming from Python and
other mainstream languages understand how to use FP in practice.
Keywords: Gleam, functional programming, software design, education
Software Performance by Example
Each lesson in this tutorial will take a simple application, analyze
its performance, and then make it faster. Along the way, the lessons
will present general tips for improving performance similar to those
in Jon Bentley's classic book Writing Efficient Programs,
update them, and show how to apply them in practice.
Keywords: C, Python, JavaScript, SQL, distributed systems, performance
Software Security by Example
The first lesson in this tutorial will present a simple
implementation of a wiki designed for shared note-taking. Each of
the following lessons will fix one of its security shortcomings (or
one of the shortcomings introduced by an earlier fix). Some will be
vulnerabilities such as cross-site scripting or SQL injection;
others will be missing features such as basic authentication or
OAuth, role-based
access control, the kind of logging that every sysadmin wishes they
had, static code analysis, and eventually the audit and emergency
response procedures that such tools are meant to support.
Keywords: web programming, digital security, education
Session Recording and Playback
asciinema and similar tools can
record a terminal window session and play it back in another
terminal or in a browser.
This prototype
adds audio recording and synchronized playback so that (for example)
an instructor can record a live coding session with a voiceover for
a learner to go through later. This project will extend that
prototype to replay sessions in the browser.
Keywords: JavaScript, UI design, multimedia, accessibility
Comparing Slideshow Tools
Are slideshows written using HTML- or Markdown-based tools more
text-intensive than those written in PowerPoint? Putting it another
way, are slides written in formats that version control understands
(text) less likely to use diagrams than slides written with GUI
tools? To answer this question, the student(s) doing this project
will have to develop ways to quantify how graphical or textful a
presentation is, and learn how to make work of this kind
reproducible.
Keywords: slideshows, tools, reproducible research
Testing Research Software
The JavaScript and
Python versions of Software Design by
Example showed readers how to design programs by working
through scaled-down examples. In contrast, this project will develop
scaled-down versions of things like
fluid
flow simulators and data analysis pipelines, and then show
readers how to test them. Each lesson will open with a short recap
of the science and a walk-through of the untested code, then explore
how that code can be tested.
Keywords: Python, computational science, software testing, education
A Blocks-Based Data Science Tool
TidyBlocks
was a prototype of a
Scratch-like tool for
teaching introductory data science. It turned out to be an
inappropriate visual paradigm, as there was no natural way to
represent join operations as nested blocks. The aim of this project
is to explore an alternative using a node-and-connector model like
that of Node-RED
or Yahoo!
Pipes.
Keywords: JavaScript, UI design, programming tools, education
Tooling Effort over Time
How does the percentage of effort devoted to tooling and deployment
change as a project grows and/or ages? And how has it changed as
we've moved from desktop applications to cloud-based applications?
Once a project reaches a certain size, does the amount of tooling
(measured by number of files or lines of configuration) level off?
Does the effort required to maintain the tooling grow as the code
base grows, or does it level off as well? Answering these questions
will give the student(s) a chance to learn how to mine software
repositories.
Keywords: mining software repositories, tooling, reproducible research
A Tower Support Game
A tower defense game
is one in which the player builds fixed defenses against incoming waves of
attackers. (Kingdom Rush
is a personal favorite.) The objective of this project is to prototype a simple
tower support game, in which the player builds bridges, first aid
stations, and so on to help travelers reach their destination.
Keywords: games, JavaScript
Unbreaking Software
Most programmers spend a large part of their time debugging, but
most courses only show working code, and most textbooks don't
discuss how to prevent, diagnose, and fix errors. This tutorial
will fill that gap by presenting dozens of case studies showing how
to find and fix real-world problems. Along the way, it will present
examples of what programmers can do to handle errors gracefully,
from data structure repair to automatically restarting servers.
Keywords: software design, debugging, education
Analysis of Undergrad Textbooks
Most undergraduate computer science programs have a first- or
second-year course on data structures and algorithms. What do these
courses actually teach, how has their content changed since
Wirth's classic
book appeared in 1976, and which of these algorithms and data
structures are used in upper-year courses? To answer these
questions, this project will assemble and apply tools to analyze the
text of several dozen textbooks; along the way, the students doing
the project will have to decide how to identify topics, how to count
them, and how to make their work reproducible.
Keywords: natural language processing, education
Understanding Ethics
This project will start by creating a set of scenarios in which a
programmer needs to make an ethical decision, each with
multiple-choice options. An expert will determine the best answer
for each; students and professionals will then be asked to answer
the same questions, and the results will be analyzed to see how well
each group matches the experts' opinions and whether practitioners'
opinions are any better than those of students.
Keywords: ethics, professional development
Identification of Variable Roles
Sajaniemi et al.'s work
on roles
of variables identified and named ten small patterns in the way
variables are used in novice programs. This project would build
static and dynamic analysis tools to detect those patterns (and
possibly others) in programs as an aid to teaching, debugging, and
code review.
Keywords: program analysis, education
Human-Scale Web Programming
This
incomplete tutorial is an introduction to web programming aimed
at scientists and others with little or no experience with
JavaScript, HTTP requests, and related technologies. In this
project, a student (or team of students) with an interest in
teaching would fill it in and offer it at least once in order to
learn more about how to create and deliver high-quality lessons.
Keywords: JavaScript, Python, web programming, education
A Little WYSIWYG Editor
Panchekha and Harrelson's
Web Browser Engineering
builds a small but fully-functional web browser step by step to show
students how real ones work. The aim of this project is to build an
equally simple desktop WYSIWYG editor in Python that supports both
styled text and embedded sketching.
Keywords: Python, UI design
A WYSIWYG Computational Notebook
Jupyter uses JSON as its storage format, while
Marimo
and Quarto use Python with
embedded strings and Markdown with embedded code respectively. This
project will explore a third option by building an
extension
for LibreOffice using
the
Jupyter messaging protocol so that people who prefer WYSIWYG
editors can embed code and its output alongside diagrams, tables,
and other media.
Keywords: Java, computational notebooks, UI design
XKCD Charts
Chart.xkcd is a
JavaScript library that displays charts in the sketchy hand-drawn
style of XKCD. Its creator is no
longer maintaining it; this project will fork the original code, fix
outstanding issues, and add new features such as axis limits and
stable coloring schemes.
Keywords: JavaScript, SVG, UI design, data visualization, open source