# Machine Learning and the Continuum Hypothesis

Not even Machine Learning is safe from Set Theory, or so it seems. On the website of the journal Nature there is an article about a paper in Nature Machine Intelligence that connects a certain kind of learnability to the Continuum Hypothesis. The conclusion of the paper is that certain abstract learnability questions are undecidable on the basis of the normal ZFC axioms of Set Theory.

The article tries to explain what is going on but seems to confuse two disparate things: Gödel’s (First) Incompleteness Theorem on the one hand and the undecidability of the Continuum Hypothesis on the other hand.
The first is a very general statement about first-order theories; it states that for every theory that satisfies a number of technical conditions there are statements that have no formal proof and neither do their negations. Elementary number theory is subject to this theorem, as is ZFC Set Theory.
The second is a Set-theoretical statement for which we can prove that there is no formal proof, nor for its negation. It is also more interesting than Gödel’s statements; the latter `simply’ assert their own unprovability, whereas the Continuum Hypothesis is a fundamental statement/question about the set of real numbers.

The confusion manifests itself when the Continuum Hypothesis is called a paradox. It is not. The statements from the Incompleteness Theorem on the other hand are usually likened to the Liar Paradox in that “This formula if unprovable” looks a lot like the paradox that is “This sentence is false”.

The paper itself also alludes to the Incompleteness Theorem; it even states that is used in the argument. It is not. No use is made of Gödel’s abstract unprovable sentences.

## The Set Theory

So, what is the Set Theory in the paper? The learnability question is shown to be equivalent to the existence of a natural number m and a map η from the family of m-element subsets of [0,1] to the family F of finite subsets of [0,1] that satisfies the following condition: if A is a subset of [0,1] with m+1 elements then it has a subset B with m elements such that η(B) contains A.

The main theorem of the paper states that an arbitrary set X admits such a map with m=k+1 if and only if X has cardinality at most ℵk.

If the Continuum Hypothesis holds then [0,1] has cardinality ℵ1, hence there is a map as required with m=2. More generally there is a map as required if the cardinality of [0,1] is equal to ℵk for some natural number k. These possibilities do not lead to contradictions, hence neither does the learnability statement. On the other hand, the statement that the cardinality of [0,1] is larger than all these ℵk does not lead to contradictions either, hence neither does the negation of the learnability statement.

The derivation of the main statement parallells that of the main result of the paper Sur une caractérisation des alephs by Kuratowski from 1951: a set X has cardinality at most ℵk if and only if its power Xk+2 can be written as the union of k+2 sets A1, …, Ak+2 such that for every i and every point (x1,…,xk+2) in the power the set of points y in Ai that satisfy yj=xj for j≠i is finite; in Kuratowski’s words “Ai is finite in the direction of the ith axis”.
Indeed one can even construct of a map η for m=k+1 from this decomposition of the canonical set ωk of cardinality ℵk.
For notational simplicity we take k=2, so m=3, and ω24 has a decomposition into four sets A1, A2, A3, and A4. Given a subset F of ω2 of 3 elements enumerate it in increasing order: x1<x2<x3. The set η(F) will consist of F itself together with

• all x for which (x,x1,x2,x3) belongs to A1,
• all x for which (x1,x,x2,x3) belongs to A2,
• all x for which (x1,x2,x,x3) belongs to A3,
• all x for which (x1,x2,x3,x) belongs to A4

To see that this works let G be a four-element subset of ω2, enumerate it as y1<y2<y3<y4. Then (y1,y2,y3,y4) belongs to one of the four sets, say it belongs to A2; then G is a subset of η({y1,y3,y4}): the point y2 is included in the second line in the list above.

Note. The proof of the main theorem (Theorem 1) of the paper is not quite correct: it fails for k=1 for example as one encounters the cardinal number ℵ-1. Worse: in that case the ordering <1 seems to have order type ω1 and ω0 simultaneously. All this can be repaired with a better write-up.