Concurrent Examples
I would like to write a short tutorial on web programming for data scientists. I want to be able to re-run the examples and capture their output automatically, but concurrency makes this much harder than it is for something like this SQL tutorial. I need to be able to:
- launch two or more processes (clients and servers);
- make sure clients don’t try running before their servers are ready to accept requests; and
- capture the messages printed by these processes in the order in which they appear (rather than all of one process’s output followed by all of another’s).
Robert Kern kindly provided a predecessor for this script:
#!/usr/bin/env bash
# Save the process group ID of this script.
pgid=`ps -o pgid= $$`
# Trap a Ctrl-C SIGINT and kill everything running inside this script.
trap "pkill -KILL -g $pgid" INT
# 1. Redirect server stderr to stdout.
# 2. Prefix each line with 'S: '.
# 3. Background the process.
($1 2>&1 | while IFS= read -r server; do echo 'S: ' "${server}"; done) &
# Wait for the server to start.
sleep 1
# 1. Redirect client stderr to stdout.
# 2. Prefix each line with 'c: '.
$2 2>&1 | while IFS= read -r client; do echo 'c: ' "${client}"; done
# Kill this script and its children (client and server) when client finishes.
pkill -KILL -g $pgid
When run like this:
run2.sh "python -m http.server -d site" "python simple_request.py" > output.txt
it launches a simple HTTP server, sleeps for one second, and then runs the client program simple_request.py. The output from the server uses S as a prefix, while the output from the client uses c.
This script works, but only sort of:
- The one-second delay before launching the client isn’t always enough.
- When I’m using Python’s socket, socketserver, and ssl modules, processes don’t always relinquish sockets cleanly upon exit (particularly if there’s a deliberate bug in the code to illustrate errors and error handling), which means I can’t re-run the job for several seconds.
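A several-second wait before a port can be reused is typical of a socket whose earlier connections are still in TCP's TIME_WAIT state. Assuming that is the cause here, one common mitigation is to set SO_REUSEADDR before binding; socketserver exposes this through the allow_reuse_address class attribute. The sketch below shows both forms. The names EchoHandler, ReusableServer, and make_raw_listener are invented for illustration and are not part of the tutorial's code.

```python
import socket
import socketserver


class EchoHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: send each received line straight back."""

    def handle(self):
        for line in self.rfile:
            self.wfile.write(line)


class ReusableServer(socketserver.TCPServer):
    # Ask socketserver to set SO_REUSEADDR on the listening socket before bind().
    allow_reuse_address = True


def make_raw_listener(host="127.0.0.1", port=0):
    """Same idea with the plain socket module; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set before bind() to have any effect.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock
```

This only helps when the old process has really exited; if a crashed server is still alive and holding the port, the new bind will fail regardless.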
Later, Jean-Marc Saffroy provided this script, which uses lsof to wait until servers start listening on ports before launching clients:
#!/usr/bin/env bash
# The first argument is a space-separated list of TCP ports.
# It is followed by one command per server listening on those ports,
# and then by the client commands, e.g.:
#   ./runner.sh "8081 8082" "servercmd 8081" "servercmd 8082" "client1" "client2" ...
PORTS="$1"
shift
CHILDREN=
# Wait for a port to be available.
await_port_free() {
  PORTNUM=$1
  while lsof -n -iTCP:${PORTNUM} ; do
    sleep 0.5
    # printf "*"
  done
  # printf "\nport $PORTNUM free\n"
}
# Wait for a port to be in the listening state.
await_port_listen() {
  PORTNUM=$1
  while ! lsof -n -iTCP:${PORTNUM}|grep -qw LISTEN ; do
    sleep 0.5
    # printf "*"
  done
  # printf "\nport $PORTNUM in LISTEN state\n"
}
# Kill all child processes (suppressing messages so as not to clutter output).
on_exit(){
  # disable trap
  trap - exit int
  # gently kill every child
  kill -INT $CHILDREN &>/dev/null
  sleep 1
  # thorough cleanup
  pkill -TERM -g 0
}
# exiting or ^C runs on_exit
trap on_exit exit int
# Launch the servers as their ports become available, and wait until each one
# has started listening before starting the next one.
for PORT in $PORTS; do
  await_port_free $PORT
  CMD="$1"
  shift
  $CMD &
  CHILDREN="$CHILDREN $!"
  await_port_listen $PORT
done
# Launch all of the clients.
for CMD in "$@"; do
  $CMD &
  CHILDREN="$CHILDREN $!"
done
# Wait until any child process exits
while true; do
  for CHILD in $CHILDREN; do
    if ! kill -0 $CHILD &>/dev/null; then
      exit # to on_exit
    fi
  done
  sleep 0.5
done
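For comparison, the wait-until-listening step can be sketched in Python without lsof by polling the port with a trial connection. This is a different technique from the one in the script above: it opens and immediately closes a real TCP connection, which some servers may log or count. The function name mirrors the bash one; port_is_listening, the host, interval, and timeout parameters, and the TimeoutError on expiry are assumptions added for the sketch.

```python
import socket
import time


def port_is_listening(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # The connection is closed as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=0.5):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False


def await_port_listen(port, host="127.0.0.1", interval=0.1, timeout=10.0):
    """Poll until host:port accepts connections; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while not port_is_listening(port, host):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing listening on {host}:{port}")
        time.sleep(interval)
```

A runner written in Python could call await_port_listen(8081) after starting a server with subprocess.Popen and before starting its clients. Unlike the bash loop, the timeout means a server that fails to start produces an error instead of an endless wait.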
Is there a better way? I feel like there must be—I’m hardly the first person to wrestle with these issues—but it has been decades (literally) since I programmed at this level. One possibility is to borrow some code from VCR.py and run everything in a single process after replacing the underlying socket library with something that captures and forwards messages. That would have the advantage of being reproducible—I’m going to use this to re-run examples for a tutorial, and while I don’t care what the order of messages is, I do care that it’s the same each time—but this kind of mocking will only work if everything is in Python. If you have another solution you can share, please reach out.