Neural Quines

Code on Github: https://github.com/evanfletcher42/neural-quines

Quines are programs that take no input and output their own source code. Can we do something like that with a neural network? Specifically: Can we train a neural network to output its own weights?

We are effectively looking for fixpoints, where f(x, W) = W, such that f is a neural network constructed out of common ML building blocks containing weights W. Unlike a traditional machine-learning problem, where we are fitting to a dataset of some variety, this network must be trained to hit a moving target: changing any weight will change the target of the optimization. Makes it fun!
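As a minimal sketch of this objective (the layer sizes and batching trick here are my own toy choices, not from any specific experiment below), the fixpoint can be framed as a regression against the network's own detached weight vector:

```python
import torch

# Toy sketch of the quine objective: the regression target is the network's
# own flattened weight vector, detached so gradients flow only through the
# prediction. Every optimizer step moves the target.
def quine_loss(model, inputs):
    pred = model(inputs)
    target = torch.cat([w.flatten() for w in model.parameters()]).detach()
    return torch.nn.functional.mse_loss(pred.flatten(), target)

model = torch.nn.Linear(2, 6, bias=False)   # 12 trainable weights
inputs = torch.randn(2, 2)                  # batch of 2 queries -> 12 outputs total
loss = quine_loss(model, inputs)
```

Note the `.detach()`: without it, gradients would also flow through the "target," which changes the optimization problem entirely.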

This post describes a few different shapes and sizes of neural-network quines, and how to train them.

(This work is inspired by a 2018 paper by Oscar Chang and Hod Lipson. It’s a good read.)

The Rules

The goal is to create a machine-learning system that produces outputs which describe the machine-learning system in some meaningful manner.

  1. Output All Trainable Parameters (Or Something Equally Self-Descriptive)
    Usually this will be the network’s weights, directly. The point is to create a moving target for optimization, where the act of training any parameter in the network changes the desired output.

  2. No Self-Descriptive Input
    Quines take no input and produce their own code as output. They are not permitted to read their own source files, or to use any introspective capability to print their own representation. To whatever degree this is possible, a neural network quine should mimic this behavior.

    “No input” won’t quite be possible for these networks – they are functions that expect inputs – but we can settle for inputting constants, random numbers, “queries” (e.g. row/column indices), intermediate outputs from the network (as in RNNs), or other things that do not directly describe the expected output.

    There are a few interesting papers that focus on the “self replication” / “artificial life” angle, and do produce non-trivial self-replicating nets – but these use reductions of their own weights as input, which I’m arbitrarily going to say is against the rules.

  3. It’s Okay to Know How to Use the Network
    “Outside” knowledge of the range of an input, or how to encode, normalize, or otherwise represent any input, is fine. Same goes for interpreting outputs. We can consider this part of the “environment” that would “run” our quine.

  4. Absolute Perfection Is Not Required
    Aside from certain trivial situations, it’s unlikely that any of these networks will be able to output their own weights out to the end of floating-point precision. Absolutely perfect zero-residual results are rare for any optimization. For this, we’ll just say that “pretty close is good enough,” and we’ll consider final error numbers when assessing quality.

The Zero Quine & Auxiliary Goals

It’s important to note that there is an easy answer to making a neural quine: Just set all the weights to zero! Given any input, this will spit out zero(s). Done! Also, boring.

If a zero quine or other trivial solution shows up naturally, I’ll take it, but I’ll also try to guide training towards non-trivial solutions wherever possible. If only for the sake of making more interesting plots.

Addressing Shape

As Chang & Lipson observe, some trickery is required to get a network to output its own weights. To use their example: Say the last layer in a feed-forward network has A inputs and B outputs; the linear transformation in this layer would have A*B weights, which is strictly greater than B for any A > 1.

We can dodge this by:

  1. Allowing the network to run several times: either “querying” for specific weights (or subsets of weights), or running recurrently
  2. Surrounding trainable layer(s) with random, fixed projections, which serve only to adapt shape from inputs & to outputs.

(2) stretches the rules a bit – these are just more layers in the network – but, since we’re not training them, we can either ignore them as “part of the environment,” or pretend they’re cleverly-chosen bespoke transforms that make the method work.

Querying Weights by Row and Column Indices

For this one, we make a simple three-layer network, with one trainable 16×16 linear layer (no biases) in the middle. The goal is to query for weights in this layer by row & column indices. The first and last layers are held fixed at their random initializations, and serve only to adapt shape between 2D input (row, column) and 1D output (weight).
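A minimal sketch of this architecture (the nonlinearity and exact wiring are my assumptions, not the post's exact code): fixed random projections on either side of a single trainable 16×16 linear layer, queried with (row, column) pairs.

```python
import torch
import torch.nn as nn

# Sketch of the query network. Only the 16x16 core is trainable; the input
# and output projections stay frozen at their random initializations and
# exist purely to adapt shape (2 -> 16 and 16 -> 1).
class QueryQuine(nn.Module):
    def __init__(self, n=16):
        super().__init__()
        self.proj_in = nn.Linear(2, n, bias=False)    # fixed random projection
        self.core = nn.Linear(n, n, bias=False)       # the trainable 16x16 layer
        self.proj_out = nn.Linear(n, 1, bias=False)   # fixed random projection
        for p in (*self.proj_in.parameters(), *self.proj_out.parameters()):
            p.requires_grad = False

    def forward(self, rc):
        # tanh is an assumption; the original activation choice isn't specified
        return self.proj_out(torch.tanh(self.core(self.proj_in(rc))))

net = QueryQuine()
# Query every (row, col) index of the 16x16 core weight matrix in one batch.
idx = torch.cartesian_prod(torch.arange(16.0), torch.arange(16.0))
pred = net(idx)   # one predicted weight per (row, col) query
```

The training target is then `net.core.weight.flatten()`, one entry per query.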

Simply training this network to minimize mean-squared error does not work well. RMSprop performed best of the optimizers I tried, but even so it fails to produce a compelling result: residual errors are at the same scale as the weights themselves.

Adding in “regeneration” from [1], where we alternate between gradient-descent steps & just replacing the model’s weights with its own predicted results, yields a significant improvement: the system almost instantly converges on the zero quine! (Or mild transformations of it, depending on whether we enable biases in the projection layers.)
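The regeneration step itself is simple. A hedged sketch, where `predict_weights` stands in for querying the network over the full (row, column) grid (here replaced by a stub):

```python
import torch

# Regeneration, as in Chang & Lipson: periodically overwrite the trainable
# layer's weights with the network's own predictions for them, outside of
# any gradient computation.
@torch.no_grad()
def regenerate(core_layer, predict_weights):
    core_layer.weight.copy_(predict_weights())

core = torch.nn.Linear(4, 4, bias=False)
regenerate(core, lambda: torch.zeros(4, 4))   # stub prediction: all zeros
```

In training, this is interleaved with ordinary optimizer steps; it is also exactly the mechanism that snaps the system onto the zero quine once predictions get small.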

Training normally, versus alternating optimizer steps & regeneration. Regeneration instantly converges to the zero quine.

We can forcibly avoid the zero quine by adding a parameter-free normalization layer after the trained dense layer. Both instance normalization and batch normalization (with all learned & running parameters disabled – no extra weights!) produce imperfect, but non-trivial, quines with relatively low error.
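In PyTorch, a truly parameter-free batch norm looks like the following (the width of 16 matches the trained layer here; whether the original experiments placed it exactly this way is my assumption):

```python
import torch
import torch.nn as nn

# affine=False drops the learned scale/shift; track_running_stats=False drops
# the running mean/var. The layer adds zero trainable or persistent state,
# but its output can never be identically zero for a non-constant batch.
norm = nn.BatchNorm1d(16, affine=False, track_running_stats=False)
x = torch.randn(256, 16)
y = norm(x)   # per-feature mean ~0, std ~1 across the batch
```

Because the layer always standardizes its input, the all-zero output that defines the zero quine is simply unreachable downstream of it.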

Increasing the size of the queried matrix yields some very impressive results: here, a 64×64 weights matrix with values in the range [-2.1 … 0.8] is accurately replicated with RMSE ~2e-7.

It’s worth noting that, for all these tests, I’m inputting all (row, column) indices in one batch. Using batch norm as I am here, with all trained & running-stats parameters disabled, means the normalization will depend on the input; the network’s output would change if I did not query all the weights at once. The additional distribution information may explain why batchnorm works much better here.

Querying Weights by One-Hot Row and Column Indices

One issue with querying by row & column is that this requires outside knowledge of the shape of the weights matrix. We can encode this information in the inputs by making them one-hot vectors. This change of input does considerably increase the number of random-but-fixed weights in the input-projection layer – but, as noted before, we’re ignoring those.

This time, the best results were found by training this naively – no regeneration or normalization layers:

Not bad…

Though personally, I think the output is more visually recognizable after further training. The loss may be strictly larger, but its magnitude is smaller relative to the range of the weights.

Not bad!

Training with regeneration gets us back to the zero quine – sort of. The reported mean-squared error is floating-point equal to 0.0, but is clearly not zero, nor is the network actually the trivial zero quine; the output still resembles the weights matrix. Weight magnitudes are just driven to ~10^-22 to reduce mean-squared error, and we’re running into the limits of 32-bit floats.

“Zero RMS error,” for certain limited-precision definitions of zero.

Interestingly, training with normalization layers made things considerably worse this time; outputs did not even slightly resemble the ground-truth matrix.

An RNN That Outputs Its Own Weights

Rather than querying for specific coordinates, this approach asks a recurrent neural network to predict a long sequence of values – specifically, all of the weights in the network, concatenated together.

y_true = torch.cat([w.flatten() for w in model.parameters()])[..., None]

This has the benefit of being able to represent all the weights in the network. No need to ignore any fixed input or output projections! Also, since we don’t need to worry about querying nice rectangles, we’re free to use biases, normalizations with learned parameters, or anything else.

To be specific: this sequence-to-sequence RNN is trained to predict the next weight in a sequence, given the previous weight (teacher forcing). At evaluation time, a user would input a 0 to get the first weight, then re-input whatever the network outputs to get each subsequent weight.
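A hedged sketch of that evaluation loop (the architecture and sizes are my assumptions, not the exact models tested below):

```python
import torch
import torch.nn as nn

# Toy weight-predicting RNN: GRU + linear head, scalar in, scalar out.
class WeightRNN(nn.Module):
    def __init__(self, hidden=8):
        super().__init__()
        self.rnn = nn.GRU(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, h=None):
        out, h = self.rnn(x, h)
        return self.head(out), h

# Evaluation: feed 0 to get the first weight, then feed each prediction
# back in as the next input, carrying the hidden state forward.
@torch.no_grad()
def generate(model, n_weights):
    x, h, outs = torch.zeros(1, 1, 1), None, []
    for _ in range(n_weights):
        y, h = model(x, h)
        outs.append(y.item())
        x = y
    return outs

model = WeightRNN()
n = sum(p.numel() for p in model.parameters())
seq = generate(model, n)   # one predicted value per parameter in the model
```

Training uses the same model with teacher forcing: the input sequence is the true weight vector shifted by one position.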

I tried both LSTM-based and GRU-based nets, using pytorch’s built-in multi-layer RNN primitives for both. In all cases, the network rapidly converged on nearly constant weights, except for a few at the beginning and end of the sequence. The network could correctly predict these first few weights and the constant run, but reliably failed to predict the weights in the last layer. This held regardless of the width or depth of the networks, and remained true even with added normalizations.

Weights & predictions for a 2-layer 4-wide LSTM + output fully-connected layer. Sequence reshaped into a rectangle for easy viewing.

Technically, this is already non-trivial, if a poor result that isn’t very interesting to look at. I may need to revisit this in the future; perhaps constructing a more custom RNN with normalizations at each layer may help.

Bonus: RNN That Outputs Its Own Python Source Code

Outputting weights, while interesting, is still not particularly self-descriptive; to use them, we still need to know something about the structure of the network, how to drive it, and how to interpret its results. We can do better: Why not just output the Python code that trains & executes the model?

Bootstrapping this is straightforward; LSTMs have a well-studied ability to memorize and output long sequences, and all we need to do here is memorize ~2.6kB of ASCII-only Python code. This model, which is probably oversized for this task, is a bog-standard sequence-to-sequence character-level prediction net. We train the net via teacher forcing, using the source code file as the dataset, and terminate when accuracy hits 100%.
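The training loop is ordinary character-level teacher forcing. A toy-sized sketch (the string, sizes, and learning rate below are stand-ins, not the actual ~2.6 kB file or model):

```python
import torch
import torch.nn as nn

# Character-level memorization via teacher forcing, on a toy string.
source = "print('hello, quine')\n"
vocab = sorted(set(source))
stoi = {c: i for i, c in enumerate(vocab)}
data = torch.tensor([stoi[c] for c in source])

emb = nn.Embedding(len(vocab), 16)
lstm = nn.LSTM(16, 64, batch_first=True)
head = nn.Linear(64, len(vocab))
opt = torch.optim.Adam(
    [*emb.parameters(), *lstm.parameters(), *head.parameters()], lr=1e-2)

for step in range(1000):
    x, y = data[:-1][None], data[1:][None]        # predict the next character
    logits = head(lstm(emb(x))[0])
    loss = nn.functional.cross_entropy(logits.transpose(1, 2), y)
    opt.zero_grad(); loss.backward(); opt.step()
    acc = (logits.argmax(-1) == y).float().mean().item()
    if acc == 1.0:                                # stop at 100% accuracy
        break
```

Scaling the string up to a full source file changes nothing structurally; it just takes more steps (and, as the caption below notes, a comically large pile of floats).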

Representing a 2.6 kB string with 944 kB of floats. Now that’s efficiency!

This is, of course, cheating; quines aren’t supposed to read their own source files. Fair – but once the model is bootstrapped, we can repeat the process using only its own output. It’s an ouroboros, and I think that’s neat.

Besides, when else will I get a chance to write code like this?

# Hey, this inscrutable black box of linear algebra outputs code!
# Let's execute it.  What could go wrong?  
eval(predict(model))
