Code on GitHub: https://github.com/evanfletcher42/neural-quines
Quines are programs that take no input and output their own source code. Can we do something like that with a neural network? Specifically: Can we train a neural network to output its own weights?
We are effectively looking for fixpoints, where f(x, W) = W, such that f is a neural network constructed out of common ML building blocks containing weights W. Unlike a traditional machine-learning problem, where we are fitting to a dataset of some variety, this network must be trained to hit a moving target: changing any weight will change the target of the optimization. Makes it fun!
This post describes a few different shapes and sizes of neural-network quines, and how to train them.
(This work is inspired by a 2018 paper by Oscar Chang and Hod Lipson. It’s a good read.)
The goal is to create a machine-learning system that produces outputs which describe the machine-learning system in some meaningful manner.
It’s important to note that there is an easy answer to making a neural quine: Just set all the weights to zero! Given any input, this will spit out zero(s). Done! Also, boring.
If a zero quine or other trivial solution shows up naturally, I’ll take it, but I’ll also try to guide training towards non-trivial solutions wherever possible. If only for the sake of making more interesting plots.
As Chang & Lipson observe, some trickery is required to get a network to output its own weights. To use their example: say the last layer in a feed-forward network has A inputs and B outputs; the linear transformation in this layer alone has A*B weights, which is strictly greater than B for any A > 1 – so a single forward pass can’t even output that one layer’s weights, let alone the whole network’s.
We can dodge this by:
(1) Querying the network for one weight at a time – giving it the coordinates of a weight as input and having it output just that one value; and/or
(2) Wrapping the trainable layers in fixed, untrained input/output projections whose weights we don’t count as part of the target.
(2) stretches the rules a bit – these are just more layers in the network – but, since we’re not training them, we can either ignore them as “part of the environment,” or pretend they’re cleverly-chosen bespoke transforms that make the method work.
For this one, we make a simple three-layer network, with one trainable 16×16 linear layer (no biases) in the middle. The goal is to query for weights in this layer by row & column indices. The first and last layers are held fixed at their random initializations, and serve only to adapt shape between 2D input (row, column) and 1D output (weight).
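A minimal sketch of what this might look like in PyTorch – the layer sizes follow the description above, but the module layout, the names (CoordinateQuine, proj_in, core, proj_out), and the raw-index queries are my own assumptions:

import torch
import torch.nn as nn

class CoordinateQuine(nn.Module):
    def __init__(self, n=16):
        super().__init__()
        self.proj_in = nn.Linear(2, n)           # fixed random projection: (row, col) -> n features
        self.core = nn.Linear(n, n, bias=False)  # trainable n x n weight matrix (the quine target)
        self.proj_out = nn.Linear(n, 1)          # fixed random projection: n features -> one predicted weight
        for p in list(self.proj_in.parameters()) + list(self.proj_out.parameters()):
            p.requires_grad = False              # projections stay at their random initializations

    def forward(self, coords):                   # coords: (batch, 2) float tensor of (row, column)
        return self.proj_out(self.core(self.proj_in(coords)))

# Query every entry of the trainable matrix in one batch.
n = 16
rows, cols = torch.meshgrid(torch.arange(n), torch.arange(n), indexing="ij")
coords = torch.stack([rows, cols], dim=-1).reshape(-1, 2).float()  # (256, 2)
model = CoordinateQuine(n)
predicted = model(coords)                        # (256, 1): one prediction per weight

Training then amounts to pushing predicted toward model.core.weight.flatten() – which, of course, moves every time we take a step.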
Simply training this network to minimize mean-squared error does not work well. RMSprop works best, but even then it fails to produce a compelling result, with residual errors at the same scale as the weights themselves.
Adding in “regeneration” from Chang & Lipson [1], where we alternate between gradient-descent steps and simply replacing the model’s weights with its own predictions, yields a significant improvement: the system almost instantly converges on the zero quine! (Or mild transformations of it, depending on whether we enable biases in the projection layers.)
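Here is roughly how an alternating train/regenerate loop could look, continuing the sketch above (the update schedule and learning rate are guesses, not the exact settings used for the results below):

opt = torch.optim.RMSprop([model.core.weight], lr=1e-4)

def regenerate(model, coords, n=16):
    # Overwrite the trainable weights with the network's own predictions for them.
    with torch.no_grad():
        model.core.weight.copy_(model(coords).reshape(n, n))

for step in range(10_000):
    target = model.core.weight.detach().reshape(-1, 1)   # moving target: the current weights
    loss = nn.functional.mse_loss(model(coords), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 10 == 0:
        regenerate(model, coords)                        # the "regeneration" step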
We can forcibly avoid the zero quine by adding a parameter-free normalization layer after the trained dense layer. Both instance normalization and batch normalization (with all learned & running parameters disabled – no extra weights!) produce imperfect, but non-trivial, quines with relatively low error.
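In PyTorch terms, I take “parameter-free” to mean something like the batch-norm variant below, dropped in right after the trained dense layer of the CoordinateQuine sketch from before:

class NormalizedQuine(CoordinateQuine):
    def __init__(self, n=16):
        super().__init__(n)
        # No learned affine parameters, no running statistics: the layer is purely a
        # function of the current batch, so it adds no weights to reproduce.
        self.norm = nn.BatchNorm1d(n, affine=False, track_running_stats=False)

    def forward(self, coords):
        return self.proj_out(self.norm(self.core(self.proj_in(coords))))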
Increasing the size of the queried matrix yields some very impressive results: here, a 64×64 weight matrix with values in the range [-2.1 … 0.8] is accurately replicated with RMSE ~2e-7.
It’s worth noting that, for all these tests, I’m inputting all (row, column) indices in one batch. Using batch norm as I am here, with all learned & running-stats parameters disabled, means the normalization depends on the whole batch; the network’s output would change if I did not query all the weights at once. That additional distribution information may explain why batch norm works much better here.
One issue with querying by row & column is that this requires outside knowledge of the shape of the weights matrix. We can encode this information in the inputs by making them one-hot vectors. This change of input does considerably increase the number of random-but-fixed weights in the input-projection layer – but, as noted before, we’re ignoring those.
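Concretely, the one-hot queries might be built something like this (again continuing the 16×16 sketch; the exact encoding is my guess):

n = 16
eye = torch.eye(n)
rows, cols = torch.meshgrid(torch.arange(n), torch.arange(n), indexing="ij")
# Each query is a one-hot row index concatenated with a one-hot column index: (n*n, 2n).
one_hot_queries = torch.cat([eye[rows.flatten()], eye[cols.flatten()]], dim=-1)
# The fixed input projection grows accordingly: nn.Linear(2 * n, n) instead of nn.Linear(2, n).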
This time, the best results were found by training this naively – no regeneration or normalization layers:
Though personally, I think the output is more visually recognizable after further training. The loss may be strictly larger, but it is small relative to the range of the weights.
Training with regeneration gets us back to the zero quine – sort of. The reported mean-squared error is floating-point equal to 0.0, but the true error is clearly not zero, nor is the network actually the trivial zero quine; the output still resembles the weights matrix. Weight magnitudes are just driven down to ~10^-22 to reduce the mean-squared error, and we’re running into the limits of 32-bit floats.
Interestingly, training with normalization layers made things considerably worse this time; outputs did not even slightly resemble the ground-truth matrix.
Rather than querying for specific coordinates, this approach asks a recurrent neural network to predict a long sequence of values – specifically, all of the weights in the network, concatenated together.
# Target: every weight in the network, flattened and concatenated into one long sequence.
y_true = torch.cat([w.flatten() for w in model.parameters()])[..., None]
This has the benefit of being able to represent all the weights in the network. No need to ignore any fixed input or output projections! Also, since we don’t need to worry about querying nice rectangles, we’re free to use biases, normalizations with learned parameters, or anything else.
To be specific: this sequence-to-sequence RNN is trained to predict the next weight in a sequence, given the previous weight (teacher forcing). At evaluation time, a user would input a 0 to get the first weight, then re-input whatever the network outputs to get each subsequent weight.
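A rough sketch of that setup, using a GRU (the hidden size, depth, and training details are assumptions; only the previous-weight-in, next-weight-out scheme comes from the description above):

import torch
import torch.nn as nn

class WeightRNN(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, state=None):
        out, state = self.rnn(x, state)
        return self.head(out), state

model = WeightRNN()
y_true = torch.cat([w.detach().flatten() for w in model.parameters()])[..., None]  # (num_weights, 1)

# Teacher forcing: the input sequence is the true weights shifted right, starting from a 0 token.
x = torch.cat([torch.zeros(1, 1), y_true[:-1]])[None]   # (1, num_weights, 1)
pred, _ = model(x)
loss = nn.functional.mse_loss(pred, y_true[None])

# Evaluation: feed each output back in as the next input (no teacher).
with torch.no_grad():
    token, state, outputs = torch.zeros(1, 1, 1), None, []
    for _ in range(y_true.shape[0]):
        token, state = model(token, state)
        outputs.append(token.squeeze())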
I tried both LSTM-based and GRU-based nets, using PyTorch’s built-in multi-layer RNN primitives for both. In all cases, the network rapidly converged on predicting an almost-entirely-constant sequence, apart from a few values at the beginning and end. The network correctly predicts those first few weights and the constant stretch, but reliably fails to predict the weights in the last layer. This holds regardless of the width or depth of the networks, and remains true even with added normalizations.
Technically, this is already a non-trivial quine, if a poor result that isn’t very interesting to look at. I may need to revisit this in the future; constructing a more custom RNN with normalization at each layer might help.
Outputting weights, while interesting, is still not particularly self-descriptive; to use them, we still need to know something about the structure of the network, how to drive it, and how to interpret its results. We can do better: Why not just output the Python code that trains & executes the model?
Bootstrapping this is straightforward; LSTMs have a well-studied ability to memorize and output long sequences, and all we need to do here is memorize ~2.6kB of ASCII-only Python code. This model, which is probably oversized for this task, is a bog-standard sequence-to-sequence character-level prediction net. We train the net via teacher forcing, using the source code file as the dataset, and terminate when accuracy hits 100%.
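For concreteness, a character-level memorizer along these lines might look like the sketch below; the filename, layer sizes, optimizer, and stopping check are all my assumptions, not the post’s exact code:

import torch
import torch.nn as nn

# Bootstrap dataset: the script's own source (hypothetical filename), ~2.6 kB of ASCII.
text = open("quine.py", encoding="ascii").read()
data = torch.tensor([ord(c) for c in text])

class CharLSTM(nn.Module):
    def __init__(self, vocab=128, embed=64, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed)
        self.lstm = nn.LSTM(embed, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state

model = CharLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = data[:-1][None], data[1:][None]             # teacher forcing: predict the next character
while True:
    logits, _ = model(x)
    loss = nn.functional.cross_entropy(logits.squeeze(0), y.squeeze(0))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if (logits.argmax(-1) == y).all():              # stop at 100% next-character accuracy
        break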
This is, of course, cheating; quines aren’t supposed to read their own source files. Well, fair – but once that first model is trained, we can repeat the process using only the model’s output as the source. It’s an ouroboros, and I think that’s neat.
Besides, when else will I get a chance to write code like this?
# Hey, this inscrutable black box of linear algebra outputs code!
# Let's execute it. What could go wrong?
exec(predict(model))