By extending the form of the output of RNN, this work enables RNN to solve problems whose output is a sequence of discrete tokens corresponding to positions in the input sequence. For example, in the Convex Hull problem or TSP problem, the RNN should output a given point each time to generate a path. The motivation is well-expressed and the method is simple and elegant.

Traditional RNN can also achieve those problems but the output is without restriction (e.g. index or coordinate), which will cause wrong or vague output when the test sequences is longer than the training sequences. However, PointerNet at each timestep can output the probability for each position of the input, namely, point to each position of the input with different confidence. This output form can restrict the output space and better focus on the insight logit.

So let’s begin with the sequence-to-sequence model. Assume is the input sequence and is the output sequence, where each element in is an index (position) of . And the parameter of RNN is . Then the conditional probability can be calculate by the chain rule:

And the optimal parameter of the model should maximize the conditional probability on all training pairs:

Classical methods usually encode the input sequence into a compressed code and then decode that code into the output sequence. This method runs fast (O(n)) but it is hard to compress all information into a small coding space without distortion after multiple steps. To resolve that, Content Based Input Attention mechanism allows the decoder to look up the hidden states of all previous encoders.

In this case, the information of previous hidden states can bypass processing and directly be used by the decoder, which prevents the distortion of information.

Pointer Net however employs the same attention not to grant decoder access to previous hidden states but to output the index (position). Since the output sequence consists of indices (positions) of the input sequence, the attention, which naturally satisfies this restriction, can be treated as the confidence of each position being the predicted position. The formular is quite similar with Sequence-to-Sequence:

Three experiment that requires indices output is tested: Convex Hull, Delaunay Triangulation and Travelling Salesman Problem. Compared with previous methods, PointerNet achieves almost perfect on the first two experiments but still fails to handle TSP problem with large N.

[DR015]: Strictly speaking, this paper does not propose a new formula to tackle the problem with indices output. It directly takes the attention distribution from content-based method as the output of the network. In spite of that, they tested this method on three untried experiments and the performance is somehow acceptable. PtrNet works well on both Convex Hull and Delaunay Triangulation problems and poorly on TSP problem (from my perspective). Unfortunately, the first two problems can be solved with a deterministic algorithm with smaller computational complexity than PtrNet.

Human being solves the aforementioned problems with visual instinct rather than a complex algorithm. It may not produce an optimal solution, but is more human-alike. So is it possible to solve those dot-connecting puzzles with visual features rather than absolute coordinate?

Thinking outside the box, I am more interested in how neural network solves logical puzzles than whether it is doing a good job. For instance, we can solve the Sudoku puzzle with the searching algorithm with DFS or BFS. Human nevertheless cannot handle deep recursion but can make a complex deduction. With attention read/write, can a neural network learn to deduct from the current board without deep recursion? It would be definitely fun to watch a NN to play Sudoku.