Gini Coefficients

A Gini coefficient is a simple measure of income inequality. A coefficient of 0 indicates perfect equality (everyone has the same amount), while a coefficient of 1 indicates perfect inequality (one person has everything while everyone else has nothing). Highly unequal countries like South Africa have an (official) Gini coefficient around 0.65, while less unequal countries like Canada have a coefficient around 0.35.
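To make the definition concrete, here is a minimal sketch of one standard way to compute the coefficient from a list of non-negative amounts, using the sorted-rank formula. The function name `gini` is my own choice for illustration, and this is not necessarily the code used for the numbers in this post.

```python
def gini(values):
    """Return the Gini coefficient of a collection of non-negative numbers.

    Illustrative sketch only: assumes every value is >= 0.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # Nothing to distribute, so treat it as perfectly equal.
        return 0.0
    # With values sorted ascending and ranked 1..n:
    #   G = 2 * sum(rank * x) / (n * total) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([1, 1, 1, 1]))   # everyone has the same amount: 0.0
print(gini([0, 0, 0, 10]))  # one person has everything: 0.75
```

Note that with this formula a finite group of n people tops out at (n - 1) / n rather than exactly 1, which is why the second example gives 0.75 for four people; the limit of 1 is only approached as the group gets large.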

Gini coefficients can be applied to other things, such as the number of commits made by each contributor to a Git repository. I was curious: how (un)equal are contributions to software projects? And are contributions to lessons more or less unequal? To find out, I crawled the histories of five Software Carpentry lessons and five widely-used numerical Python libraries. The results are undoubtedly wrong (I’m not trying to merge records for people with multiple email addresses, for example), but I was surprised:

  1. that the coefficients are so similar across lessons and libraries,
  2. and that contributions are so unequal when measured by number of commits,
  3. but that they are so much more equal when measured by insertions minus deletions (i.e., net number of lines contributed),
  4. and that there is so much more variability when counting lines. (I don’t know if taking the ratio of Gini coefficients is meaningful, but it gives an idea of the scale of the disparities.)
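For anyone who wants to try something similar, here is a rough sketch of how per-author commit counts and net line counts could be pulled out of a repository's history. It is an assumption about one reasonable approach, not the script behind the numbers here: it relies on `git log --numstat` with a custom `--format` line, and the `MARK` prefix and the `tally` and `read_log` names are made up for illustration. Like the analysis above, it keys on author email and makes no attempt to merge people with multiple addresses.

```python
import subprocess
from collections import Counter

MARK = "@@"  # hypothetical prefix used to spot the author line of each commit


def read_log(repo_path):
    """Return `git log --numstat` output with one MARK + author-email line per commit."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", f"--format={MARK}%ae"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def tally(log_text):
    """Count commits and net lines (insertions minus deletions) per author email."""
    commits = Counter()
    net_lines = Counter()
    author = None
    for line in log_text.splitlines():
        if line.startswith(MARK):
            # Start of a new commit: remember whose it is.
            author = line[len(MARK):].strip().lower()
            commits[author] += 1
            continue
        parts = line.split("\t")
        if author is None or len(parts) < 3:
            continue  # blank separator or something unexpected
        added, deleted = parts[0], parts[1]
        if not (added.isdigit() and deleted.isdigit()):
            continue  # binary files are reported as "-" and have no line counts
        net_lines[author] += int(added) - int(deleted)
    return commits, net_lines
```

Calling `tally(read_log("path/to/repo"))` gives two counters that can be fed to whatever inequality measure you like. One thing to watch: an author's net line count can be negative if they delete more than they add, and the usual Gini definition assumes non-negative values, so those cases need a deliberate decision rather than being passed through as-is.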

I have a stack of background reading to do, but it’s shaping up to be a fun little project. If you know of other simple ways to measure the evenness of contributorship, or if you can explain why measuring by commits and by lines gives such different answers, I’d enjoy hearing from you.

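| Project          | Gini (Commits) | Gini (Lines) | Ratio (Commits / Lines) |
|------------------|----------------|--------------|-------------------------|
| Git Lesson       | 0.7867         | 0.0306       | 25.69                   |
| Python Lesson    | 0.8250         | 0.0982       | 8.40                    |
| R Lesson         | 0.7899         | 0.0299       | 26.42                   |
| Shell Lesson     | 0.7955         | 0.0362       | 21.96                   |
| SQL Lesson       | 0.8101         | 0.0462       | 17.52                   |
| Lessons Average  | 0.8014         | 0.0482       | 20.00                   |
| NumPy            | 0.9097         | 0.0105       | 86.89                   |
| Pandas           | 0.8743         | 0.0306       | 28.55                   |
| Scikit-Image     | 0.8547         | 0.2495       | 3.42                    |
| Scikit-Learn     | 0.8836         | 0.0049       | 179.51                  |
| SciPy            | 0.8821         | 0.0047       | 185.94                  |
| Projects Average | 0.8809         | 0.0601       | 96.86                   |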
