Introduction
During the last several months, my wife and I went through a Kenken phase. For those not familiar with Kenken, check out this website. Kenken is a great example of a class of problems that are more broadly categorized as constraint satisfaction problems (CSPs). I wrote some code that generates and solves Kenkens, which I’d like to write about when I have more time, but in this post I want to talk about an interesting, nonstandard way to set up and solve a different, far more popular CSP: sudoku!
What is sudoku?
For those who are not familiar with sudoku puzzles, they work as follows. The initial board is a partially filled-in 9x9 grid. The goal is to find the (unique) values for the empty cells so that the completed board satisfies the following rules:
- There can be no repeated values in any row. (Each row must contain each value from 1 up to 9 exactly once.)
- There can be no repeated values in any column. (Each column must contain each value from 1 up to 9 exactly once.)
- There can be no repeated values in any of the nine 3x3 boxes outlined with bold black lines in the figures below. (Each box must contain each value from 1 to 9 exactly once.)


An unfilled board (left) and its solution (right).
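To make the rules concrete, here is a minimal sketch of a checker for a completed board (the function name is_valid_solution is mine, chosen for illustration):
import numpy as np

def is_valid_solution(board):
    """board: 9x9 list of lists with entries 1..9. True iff all three rules hold."""
    digits = set(range(1, 10))
    rows = [set(row) for row in board]
    cols = [{board[i][j] for i in range(9)} for j in range(9)]
    boxes = [{board[3 * bi + di][3 * bj + dj] for di in range(3) for dj in range(3)}
             for bi in range(3) for bj in range(3)]
    return all(group == digits for group in rows + cols + boxes)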
Modeling the problem
In order to formulate this as an optimization problem we need to come up with a variable to optimize and constraints that tell the optimization algorithm what kinds of solutions are valid. In any sudoku problem, there are 81 cells that need to be filled in (including those that are already populated at the start). Each of those cells has 9 possible values, so we will index our variable $x$ with 3 indices: the first will indicate the row, the second will indicate the column, and the third will refer to a value between 1 and 9. Furthermore, we will require each $x_{ijk}$ to take either the value 0 or the value 1: $x_{ijk} = 1$ if the cell at position $(i, j)$ on the board has the value $k$ and 0 otherwise.
This way of modeling a sudoku board is not the most intuitive one possible, but it will make formulating our constraints easier, which is the topic we turn to next. (If the above wasn’t clear, give it another read before moving on.)
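To make the encoding concrete, here is a minimal sketch (the helper name encode_board is mine, not from the original code) that turns a board into the corresponding 0/1 array of $x_{ijk}$ values, using 0-based indices in code:
import numpy as np

def encode_board(board):
    """board: 9x9 array with entries 1..9, or 0 for an empty cell.
    Returns x with x[i, j, k] == 1 when cell (i, j) holds the value k + 1."""
    x = np.zeros((9, 9, 9))
    for i in range(9):
        for j in range(9):
            v = board[i][j]
            if v:
                x[i, j, v - 1] = 1
    return x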
Constraints
There are a few types of constraints we need to respect to make sure that our optimization algorithm comes up with a valid solution to our sudoku problems. We will discuss each type of constraint in turn.
Respecting given values
The first constraint the solution needs to satisfy is that the values that are already provided must be respected. That is, if the first cell is already filled with the value 5, the optimizer is not allowed to change that. By the way we defined $x$ earlier, this would correspond to the constraint $x_{115} = 1$. Similarly, if the lower right value were set to 1, we would have another constraint corresponding to this value given by $x_{991} = 1$. We add a constraint like this for every value that is provided on the initial board.
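As a sketch of how these constraints might be collected (representing each constraint as a list of "active" $(i, j, k)$ index triples is my own illustration, not necessarily how the post's full code does it):
def clue_constraints(board):
    """One constraint per given clue: the single variable x_{ijk} must equal 1."""
    constraints = []
    for i in range(9):
        for j in range(9):
            v = board[i][j]
            if v:
                constraints.append([(i, j, v - 1)])  # one active variable per clue
    return constraints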
Each cell contains a single value
This is a relatively simple constraint, but an important one. To model this constraint for cell $(1, 1)$, we would require that $\sum_{k=1}^9 x_{11k} = 1$. Because each entry of $x$ is either 0 or 1, this constraint says that exactly one of the entries corresponding to the first cell must be set. We add a constraint like the one I described for the first cell for each of the 81 cells.
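Continuing the same sketch, each of the 81 one-value-per-cell constraints involves the nine variables of a single cell:
def cell_constraints():
    """For each cell (i, j): sum over k of x_{ijk} = 1 (81 constraints)."""
    return [[(i, j, k) for k in range(9)] for i in range(9) for j in range(9)]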
Row, column, and box constraints
We require that each row contain each digit from 1 to 9. To encode that the digits in row $i$ are all distinct, we require that for each value $k$, $\sum_{j=1}^9 x_{ijk} = 1$ (in the sum, $i$ and $k$ are fixed). There are 81 such constraints for the rows (9 rows times 9 values); another 81 can be formulated analogously for the columns, and another 81 for the boxes, where the sum runs over the 9 cells of each box.
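In the same sketch representation, the row, column, and box constraints fix a value $k$ and a group of nine cells, and sum the corresponding variables:
def group_constraints():
    """Row, column, and box uniqueness: 81 constraints each."""
    constraints = []
    for k in range(9):  # each value must appear exactly once per group
        for i in range(9):  # row i
            constraints.append([(i, j, k) for j in range(9)])
        for j in range(9):  # column j
            constraints.append([(i, j, k) for i in range(9)])
        for bi in range(3):  # 3x3 box with top-left cell (3*bi, 3*bj)
            for bj in range(3):
                constraints.append([(3 * bi + di, 3 * bj + dj, k)
                                    for di in range(3) for dj in range(3)])
    return constraints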
Consolidating constraints
Once we’ve modeled all the constraints, we can stack them into a single matrix equality constraint $Ax = b$. Although we’ve been discussing $x$ as though it were three dimensional, when we pass the optimization problem to the computer, we flatten it into one long 729-vector (9 rows $\times$ 9 columns $\times$ 9 candidate values per cell). Each row of $A$ corresponds to a single constraint. Coefficients corresponding to the variables that are active in that constraint are set to 1 and the other coefficients in that row are all set to 0. Because all of our constraints have 1s on their right hand sides, we have $b = \mathbf 1$.
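Here is a sketch of the flattening step, continuing the helpers above (the row-major index order is an assumption on my part, and the post's full code may organize this differently; any consistent ordering works):
def flat_index(i, j, k):
    """Position of x_{ijk} in the flattened 729-vector (row-major)."""
    return 81 * i + 9 * j + k

def build_A_b(board):
    """Stack every constraint into a row of A; b is all ones."""
    constrs = clue_constraints(board) + cell_constraints() + group_constraints()
    A = np.zeros((len(constrs), 9 * 9 * 9))
    for r, triples in enumerate(constrs):
        for (i, j, k) in triples:
            A[r, flat_index(i, j, k)] = 1
    return A, np.ones(len(constrs))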
Formulating and solving the problem
We can now formulate our optimization problem:
$$
\begin{array}{ll}
\text{minimize} & 0 \\
\text{subject to} & Ax = \mathbf 1 \\
& x \in \{0, 1\}^{729}
\end{array}
$$
Because sudokus have unique solutions, we just need to find the $x$ that satisfies our constraints – there will only be one such $x$! Because we just need to find that $x$, the objective value doesn’t have to help us discriminate between competing feasible points for this problem, so we can safely use 0 as our objective function. (This type of problem is known as a feasibility problem.)*
With the optimization problem in hand, the following short piece of code uses CVXPY to solve any sudoku puzzle very quickly:
import cvxpy as cp
import numpy as np
# ... <code that formulates the constraints> ...
x = cp.Variable(N, boolean=True)   # N = 729 binary entries, one per (row, column, value) triple
A = np.array(list(constrs))        # one row per constraint
b = np.ones(len(constrs))          # every constraint has a 1 on its right-hand side
objective = cp.Minimize(0)         # feasibility problem: constant objective
constraints = [A @ x == b]
problem = cp.Problem(objective, constraints)
problem.solve(solver=cp.ECOS_BB)   # ECOS_BB handles the boolean (integer) constraints
The full code (~100 lines) can be found here.
Conclusion
Sudoku is typically framed and solved as a CSP with algorithms that involve some guessing and checking. I thought this was an interesting application of optimization to a problem that maybe doesn’t immediately lend itself to such a formulation. Hope you enjoyed!
*As a note for a slightly more technically inclined reader, a friend of mine pointed out that the algorithm that the solver uses, called branch and bound, ends up degenerating into an exponential tree search, akin to just trying out all possibilities with no way of discriminating between “better” and “worse” points. To remedy this (at least in part), we can use the objective function $x^T \mathbf 1$, which counts the number of ones in any solution. By using this objective instead of $0$, we give the algorithm a way to “prune” the tree it is searching, which may lead to performance benefits. The code implementing this approach would only be slightly different:
import cvxpy as cp
import numpy as np
# ... <code that formulates the constraints> ...
x = cp.Variable(N, boolean=True)
A = np.array(list(constrs))
b = np.ones(len(constrs))
objective = cp.Minimize(x.T @ np.ones(N)) # <-- new objective function
constraints = [A @ x == b]
problem = cp.Problem(objective, constraints)
problem.solve(solver=cp.ECOS_BB)