class: slide-title
Software Design by Example
Object Persistence
chapter
--- ## How to Save Data? - Prose as plain text - Tables as CSV - What about objects? - List of dictionaries of lists of dictionaries --- ## Existing Options - [JSON][py_json] or [YAML][py_yaml]: language-neutral - But therefore lowest common denominator - Boolean, number, string, list, dictionary (with string keys) - [pickle][py_pickle] module: Python-specific - Arbitrary nested objects (good) - Other languages can't read its files (bad) --- ## Getting Started - Store each __atomic value__ on a line of its own - `type_name:value` ```txt bool:True int:123 ``` - Split strings on newlines - Save the number of lines ```txt # input this is two lines ``` ```txt # output str:2 this is two lines ``` --- ## Implementation ```py def save(writer, thing): if isinstance(thing, bool): print(f"bool:{thing}", file=writer) elif isinstance(thing, float): print(f"float:{thing}", file=writer) elif isinstance(thing, int): print(f"int:{thing}", file=writer) else: raise ValueError(f"unknown type of thing {type(thing)}") ``` --- ## Collections - Save type and number of elements ```py elif isinstance(thing, list): print(f"list:{len(thing)}", file=writer) for item in thing: save(writer, item) ``` --- ## What This Looks Like ```py save(sys.stdout, [False, 3.14, "hello", {"left": 1, "right": [2, 3]}]) ``` ``` list:4 bool:False float:3.14 str:1 hello dict:2 str:1 left int:1 str:1 right list:2 int:2 int:3 ``` -- - Computer doesn't need indentation or end markers - But we might add them for readability --- ## Reading Data ```py def load(reader): line = reader.readline()[:-1] assert line, "Nothing to read" fields = line.split(":", maxsplit=1) assert len(fields) == 2, f"Badly-formed line {line}" key, value = fields if key == "bool": names = {"True": True, "False": False} assert value in names, f"Unknown Boolean {value}" return names[value] elif key == "float": return float(value) elif key == "int": return int(value) else: raise ValueError(f"unknown type of thing {line}") ``` --- ## Reading Multi-line Values ```py elif key == "list": return [load(reader) for _ in range(int(value))] ``` - Use a list comprehension instead of a loop --- ## Open-Closed Principle - Software should be open for extension but closed for modification - I.e., should be able to add new code without rewriting existing code - Create a dispatch function that figures out what reader or writer to call - Find appropriate things to call dynamically - Instead of looking for functions, look for methods - If the type of the thing we're saving is `something`, provide a method `_something` --- ## Saving ```py class SaveObjects: def __init__(self, writer): self.writer = writer def save(self, thing): typename = type(thing).__name__ method = f"save_{typename}" assert hasattr(self, method), \ f"Unknown object type {typename}" getattr(self, method)(thing) ``` - Handle loading the same way --- ## Next Steps - What to do with user-defined classes? - Or things from the standard library, for that matter? - Convert user types to built-in types - Either the object tells us how… - …or we do it generically - Either way, how to convert back? - Save class definitions as well as objects' values - Most general (code is just data) - But most difficult to implement - And a potential security hole --- ## Aliasing
```py shared = ["shared"] fixture = [shared, shared] ``` - "Surely nobody would ever do this!" - But every child node in an HTML tree has a reference to its parent --- ## Aliasing - Store a unique ID for every object using Python's `id` - Keep track of the objects seen so far - Write that ID the first time we see the object - Write a special entry when we see the object again --- ## Saving ```py def save(self, thing): thing_id = id(thing) if thing_id in self.seen: self._write("alias", thing_id, "") return self.seen.add(thing_id) typename = type(thing).__name__ method = f"save_{typename}" assert hasattr(self, method), \ f"Unknown object type {typename}" getattr(self, method)(thing) ``` --- ## What It Looks Like ```py word = "word" child = [word, word] parent = [] parent.append(parent) parent.append(child) saver = SaveAlias(sys.stdout) saver.save(parent) ``` ``` list:4539747200:2 alias:4539747200: list:4539552960:2 str:4539552048:1 word alias:4539552048: ``` --- class: summary ## Summary
[py_json]: https://docs.python.org/3/library/json.html [py_pickle]: https://docs.python.org/3/library/pickle.html [py_yaml]: https://pyyaml.org/wiki/PyYAMLDocumentation