A Gini coefficient is a simple measure of income inequality. A coefficient of 0 indicates perfect equality (everyone has the same amount), while a coefficient of 1 indicates perfect inequality (one person has everything while everyone else has nothing). Highly unequal countries like South Africa have an (official) Gini coefficient around 0.65, while less unequal countries like Canada have a coefficient around 0.35.
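To make the definition concrete, here is a minimal sketch of one common way to compute a Gini coefficient from a list of non-negative amounts. The function name `gini` and the choice of the sorted-rank formula are assumptions made for illustration, not necessarily what any official statistic uses.

```python
def gini(values):
    """Gini coefficient of a collection of non-negative numbers.

    Uses the sorted-rank formula
        G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    where x_1 <= x_2 <= ... <= x_n and i runs from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # Treat an empty or all-zero population as perfectly equal.
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


equal = gini([5, 5, 5, 5])      # everyone has the same amount
skewed = gini([0, 0, 0, 10])    # one person has everything
```

With this formula a finite population of size n tops out at (n - 1) / n rather than exactly 1, so the "one person has everything" case only approaches 1 as the population grows.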
Gini coefficients can be applied to other things, such as the number of commits made by each contributor to a Git repository. I was curious: how (un)equal are contributions to software projects? And are contributions to lessons more or less unequal? To find out, I crawled the histories of five Software Carpentry lessons and five widely-used numerical Python libraries. The results are undoubtedly wrong (I’m not trying to merge records for people with multiple email addresses, for example), but I was surprised:
- that the coefficients are so similar,
- and that they are so unequal when measuring the number of commits,
- but that they are so much more equal when counting insertions minus deletions (i.e., number of lines contributed),
- and that there is so much more variability when counting lines. (I don’t know if taking the ratio of Gini coefficients is meaningful, but it gives an idea of the scale of the disparities.)
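The two measures above (commits per contributor, and insertions minus deletions per contributor) can be tallied roughly as follows. This is a sketch under stated assumptions rather than the script used for the numbers in this post: the helper names `read_log` and `parse_numstat` are hypothetical, contributors are keyed by author email with no merging of duplicate identities, merge commits are skipped, and binary files (which `git log --numstat` reports as `-`) are ignored.

```python
import subprocess
from collections import Counter


def read_log(repo_path):
    """Return `git log` output with one '@email' line per commit,
    followed by that commit's numstat lines (insertions, deletions, path)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--no-merges",
         "--numstat", "--format=@%ae"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_numstat(log_text):
    """Tally commits and net lines (insertions minus deletions) per author."""
    commits = Counter()
    net_lines = Counter()
    author = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            # Start of a new commit: the rest of the line is the author email.
            author = line[1:]
            commits[author] += 1
        elif line.strip() and author is not None:
            parts = line.split("\t")
            # Skip binary files, which numstat reports as "-\t-\tpath".
            if len(parts) >= 3 and parts[0].isdigit() and parts[1].isdigit():
                net_lines[author] += int(parts[0]) - int(parts[1])
    return commits, net_lines


# Hand-written sample in the same shape as the git output, for illustration.
sample = (
    "@a@x.org\n\n3\t1\tfoo.py\n-\t-\timg.png\n"
    "@b@y.org\n\n10\t0\tbar.md\n"
    "@a@x.org\n\n0\t2\tfoo.py\n"
)
commits, net_lines = parse_numstat(sample)
```

One wrinkle worth noting: a contributor who deletes more than they insert ends up with a negative net line count, and the Gini coefficient is not well defined for negative values, so any real analysis has to decide how to handle those cases.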
I have a stack of background reading to do, but it’s shaping up to be a fun little project. If you know of other simple ways to measure the evenness of contributorship, or if you can explain why measuring by commits and by lines gives such different answers, I’d enjoy hearing from you.
| Project | Gini (Commits) | Gini (Lines) | Ratio |
|---|---|---|---|