Software Design by Example: A Package Manager

class: slide-title

<p>
  <a href="https://third-bit.com/sdxpy/">Software Design by Example</a>
</p>
<h1>A Package Manager</h1>
<div class="bottom">
  <a href="../">chapter</a>
</div>

---

## The Problem

-   Packages and their dependencies change over time

-   If A and B require different versions of C
    it might not be possible to use A and B together

-   Need a systematic way to find globally-consistent sets of packages

---

## Identifying Versions

-   <a class="gl-ref" href="../../glossary/#gl:semantic_versioning" title="A standard for identifying software releases. In the version identifier `major.minor.patch`, `major` changes when a new version of software is incompatible with old versions, `minor` changes when new features are added to an existing version, and `patch` changes when small bugs are fixed." markdown="1">Semantic versioning</a> uses three integers `X.Y.Z`

-   `X` is the major version (breaking changes)

-   `Y` is the minor version (new features)

-   `Z` is the <a class="gl-ref" href="../../glossary/#gl:patch" title="A single file containing a set of changes to a set of files, separated by markers that indicate where each individual change should be applied, or the semantic versioning identifier for such a file." markdown="1">patch</a> (bug fixes)

-   Notation

-   `>=1.2.3` means "any version from 1.2.3 onward"

-   `<4` means "any version before 4.anything"

-   `1.0-3.1` means "any version in the specified range"

---

## A Simplified Version

```json
{
  "A": {
    "3": {"B": ["3", "2"], "C": ["2"]},
    "2": {"B": ["2"],      "C": ["2", "1"]},
    "1": {"B": ["1"]}
  },
  "B": {
    "3": {"C": ["2"]},
    "2": {"C": ["1"]},
    "1": {"C": ["1"]}
  },
  "C": {
    "2": [],
    "1": []
  }
}
```

---

## Multiple Dimensions

-   Imagine a multi-dimensional grid with one axis per package

-   Possible combinations are points in this grid

-   But how do we find the legal ones?

---

## Estimate Work

-   Our example has 3×3×2=18 combinations

-   Adding one more package with two versions doubles
    the <a class="gl-ref" href="../../glossary/#gl:search_space" title="The set of all possible solutions to a problem, i.e., the set of possibilities that an algorithm must search through to find an answer." markdown="1">search space</a>

-   A <a class="gl-ref" href="../../glossary/#gl:combinatorial_explosion" title="The exponential growth in the size of a problem or the time required to solve it that arises when all possible combinations of a set of items must be searched." markdown="1">combinatorial explosion</a>

-   Brute force solutions are impractical even for small problems

-   But worth implementing as a starting point and for testing

---

## Brute Force

-   Generate all possible combinations of package versions

-   Then eliminate ones that aren't compatible with the manifest

```py
def main():
    manifest = json.load(sys.stdin)
    possible = make_possibilities(manifest)
    print(f"{len(possible)} possibilities")
    allowed = [p for p in possible if compatible(manifest, p)]
    print(f"{len(allowed)} allowed")
    for a in allowed:
        print(a)
```

---

## Generating Possibilities

-   Create a list of the available versions of each package

-   Generate their <a class="gl-ref" href="../../glossary/#gl:cross_product" title="The set of all possible combinations of items from one or more sets." markdown="1">cross product</a>

```py
def make_possibilities(manifest):
    available = []
    for package, versions in manifest.items():
        available.append([(package, v) for v in versions])
    return list(itertools.product(*available))
```

---

## Checking Validity

-   Compare every entry X against every other entry Y

-   If they are the same package, keep looking

-   If package X's requirements say nothing about package Y,
    keep searching

-   If X *does* depend on Y
    but this particular version of X doesn't list this particular version of Y
    as a dependency,
    rule out this combination

-   If combination is still a candidate, add to list

---

## Checking Validity

```py
def compatible(manifest, combination):
    for package_i, version_i in combination:
        lookup_i = manifest[package_i][version_i]
        for package_j, version_j in combination:
            if package_i == package_j:
                continue
            if package_j not in lookup_i:
                continue
            if version_j not in lookup_i[package_j]:
                return False
    return True
```

-   Finds 3 valid combinations among our 18 possibilities

```
18 possibilities
3 allowed
(('A', '3'), ('B', '3'), ('C', '2'))
(('A', '2'), ('B', '2'), ('C', '1'))
(('A', '1'), ('B', '1'), ('C', '1'))
```

---

## Manual Generation

-   Create our own cross-product function
    in preparation for doing something more efficient

```py
def make_possibilities(manifest):
    available = []
    for package, versions in manifest.items():
        available.append([(package, v) for v in versions])

accum = []
    _make_possible(available, [], accum)
    return accum
```

-   Create a list of lists of available versions

-   Create an empty accumulator

-   Do some magic

---

## Manual Generation

-   Each call to `_make_possible` handles one package's worth of work

-   Loop over available versions of current package

-   Add that package to the combination in progress

-   If more packages, recurse

-   Otherwise, append to accumulator

```py
def _make_possible(remaining, current, accum):
    if not remaining:
        accum.append(current)
    else:
        head, tail = remaining[0], remaining[1:]
        for h in head:
            _make_possible(tail, current + [h], accum)
```

---

## Incremental Search

-   Generate-and-discard is inefficient

-   Stop immediately if a partial combination of packages is illegal

```py
def main():
    manifest = json.load(sys.stdin)
    packages = list(manifest.keys())
    if len(sys.argv) > 1:
        packages.reverse()

accum = []
    count = find(manifest, packages, accum, [], 0)

print(f"count {count}")
    for a in accum:
        print(a)
```

-   `reverse` to allow experimentation

---

## Finding Possibilities

```py
def find(manifest, remaining, accum, current, count):
    count += 1
    if not remaining:
        accum.append(current)
    else:
        head, tail = remaining[0], remaining[1:]
        for version in manifest[head]:
            candidate = current + [(head, version)]
            if compatible(manifest, candidate):
                count = find(
                    manifest, tail, accum, candidate, count
                )
    return count
```

---

## Finding Possibilities

1.  The manifest that tells us what's compatible with what.

2.  The names of the packages we've haven't considered yet.

3.  An accumulator to hold all the valid combinations we've found so far.

4.  The partially-completed combination we're going to extend next.

5.  A count of the number of combinations we've considered so far,
    which we will use as a measure of efficiency.

---

## Savings

-   We only create 11 candidates instead of 18

```sh
python incremental.py < triple.json
```

```
count 11
[('A', '3'), ('B', '3'), ('C', '2')]
[('A', '2'), ('B', '2'), ('C', '1')]
[('A', '1'), ('B', '1'), ('C', '1')]
```

-   Reversing the order cuts it to 9

```sh
python incremental.py reversed < triple.json
```

```
count 9
[('C', '2'), ('B', '3'), ('A', '3')]
[('C', '1'), ('B', '2'), ('A', '2')]
[('C', '1'), ('B', '1'), ('A', '1')]
```

---

## Using a Theorem Prover

-   An automated theorem prover can do much better

-   But the algorithms quickly become very complex

-   Prove that a set of logical propositions (e.g., dependencies) are satisfiable

-   We will use the [Z3 theorem prover][z3]

-   Whose documentation is unfortunately a barrier to entry

---

## Using Z3

```py
from z3 import Bool, Solver

A = Bool("A")
B = Bool("B")
C = Bool("C")
```

-   `A`, `B`, and `C` don't have values

-   Instead, each represents the set of possible Boolean values

---

## Using Z3

-   Specify constraints such as `A == B`

```py
solver = Solver()
solver.add(A == B)
solver.add(B == C)
report("A == B & B == C", solver.check())
```

-   Then ask Z3 to find a <a class="gl-ref" href="../../glossary/#gl:model" title="A set of values for variables that satisfies a specific set of constraints." markdown="1">model</a> that satisfies those constraints

```
A == B & B == C: sat
A False
B False
C False
```

---

## Unsatisfiable

-   Require `A` to equal `B` and `B` to equal `C`
    but `A` and `C` to be unequal

```py
solver = Solver()
solver.add(A == B)
solver.add(B == C)
solver.add(A != C)
report("A == B & B == C & B != C", solver.check())
```

```
A == B & B == C & B != C: unsat
```

---

## Packaging

-   Represent our package versions

```py
A1 = Bool("A.1")
A2 = Bool("A.2")
A3 = Bool("A.3")

B1 = Bool("B.1")
B2 = Bool("B.2")
B3 = Bool("B.3")

C1 = Bool("C.1")
C2 = Bool("C.2")
```

---

## Packaging

-   We want one version of `A`

```py
solver = Solver()
solver.add(Or(A1, A2, A3))
```

-   But the versions of `A` are mutually exclusive

```py
solver.add(Implies(A1, Not(Or(A2, A3))))
solver.add(Implies(A2, Not(Or(A1, A3))))
solver.add(Implies(A3, Not(Or(A1, A2))))
```

-   Do the same for `B` and `C`

---

## Dependencies

-   Add inter-package dependencies and solve

```py
solver.add(Implies(A3, And(Or(B3, B2), C2)))
solver.add(Implies(A2, And(B2, Or(C2, C1))))
solver.add(Implies(A1, B1))
solver.add(Implies(B3, C2))
solver.add(Implies(B2, C1))
solver.add(Implies(B1, C1))

print("result", solver.check(), solver.model())
```

```
result sat [B.3 = True,
 A.1 = False,
 C.2 = True,
 C.1 = False,
 B.2 = False,
 A.3 = True,
 A.2 = False,
 B.1 = False]
```

---

## Summary

[z3]: https://en.wikipedia.org/wiki/Z3_Theorem_Prover