# Machine Learning and the Continuum Hypothesis

Not even Machine Learning is safe from Set Theory, or so it seems. On the website of the journal Nature there is an article about a paper in Nature Machine Intelligence that connects a certain kind of learnability to the Continuum Hypothesis. The conclusion of the paper is that certain abstract learnability questions are undecidable on the basis of the normal ZFC axioms of Set Theory.

The article tries to explain what is going on but seems to confuse two disparate things: Gödel’s (First) Incompleteness Theorem on the one hand and the undecidability of the Continuum Hypothesis on the other hand.

The first is a very general statement about first-order theories; it states that for every theory that satisfies a number of technical conditions there are statements that have no formal proof *and neither do their negations*. Elementary number theory is subject to this theorem, as is ZFC Set Theory.

The second is a Set-theoretical statement for which we can prove that there is no formal proof, nor for its negation. It is also more interesting than Gödel’s statements; the latter `simply’ assert their own unprovability, whereas the Continuum Hypothesis is a fundamental statement/question about the set of real numbers.

The confusion manifests itself when the Continuum Hypothesis is called a paradox. It is not. The statements from the Incompleteness Theorem on the other hand are usually likened to the Liar Paradox in that “This formula if unprovable” looks a lot like the paradox that is “This sentence is false”.

The paper itself also alludes to the Incompleteness Theorem; it even states that is used in the argument. It is not. No use is made of Gödel’s abstract unprovable sentences.

## The Set Theory

So, what is the Set Theory in the paper? The learnability question is shown to be equivalent to the existence of a natural number m and a map η from the family of m-element subsets of [0,1] to the family F of finite subsets of [0,1] that satisfies the following condition: if A is a subset of [0,1] with m+1 elements then it has a subset B with m elements such that η(B) contains A.

The main theorem of the paper states that an arbitrary set X admits such a map with m=k+1 if and only if X has cardinality at most ℵ_{k}.

If the Continuum Hypothesis holds then [0,1] has cardinality ℵ_{1}, hence there is a map as required with m=2. More generally there is a map as required if the cardinality of [0,1] is equal to ℵ_{k} for some natural number k. These possibilities do not lead to contradictions, hence neither does the learnability statement. On the other hand, the statement that the cardinality of [0,1] is larger than all these ℵ_{k} does not lead to contradictions either, hence neither does the negation of the learnability statement.

The derivation of the main statement parallells that of the main result of the paper Sur une caractérisation des alephs by Kuratowski from 1951: a set X has cardinality at most ℵ_{k} if and only if its power X^{k+2} can be written as the union of k+2 sets A_{1}, …, A_{k+2} such that for every i and every point (x_{1},…,x_{k+2}) in the power the set of points y in A_{i} that satisfy y_{j}=x_{j} for j≠i is finite; in Kuratowski’s words “A_{i} is finite in the direction of the ith axis”.

Indeed one can even construct of a map η for m=k+1 from this decomposition of the canonical set ω_{k} of cardinality ℵ_{k}.

For notational simplicity we take k=2, so m=3, and ω_{2}^{4} has a decomposition into four sets A_{1}, A_{2}, A_{3}, and A_{4}. Given a subset F of ω_{2} of 3 elements enumerate it in increasing order: x_{1}<x_{2}<x_{3}. The set η(F) will consist of F itself together with

- all x for which (x,x
_{1},x_{2},x_{3}) belongs to A_{1}, - all x for which (x
_{1},x,x_{2},x_{3}) belongs to A_{2}, - all x for which (x
_{1},x_{2},x,x_{3}) belongs to A_{3}, - all x for which (x
_{1},x_{2},x_{3},x) belongs to A_{4}

To see that this works let G be a four-element subset of ω_{2}, enumerate it as y_{1}<y_{2}<y_{3}<y_{4}. Then (y_{1},y_{2},y_{3},y_{4}) belongs to one of the four sets, say it belongs to A_{2}; then G is a subset of η({y_{1},y_{3},y_{4}}): the point y_{2} is included in the second line in the list above.

**Note**. The proof of the main theorem (Theorem 1) of the paper is not quite correct: it fails for k=1 for example as one encounters the cardinal number ℵ_{-1}. Worse: in that case the ordering <_{1} seems to have order type ω_{1} and ω_{0} simultaneously. All this can be repaired with a better write-up.

## 2 comments

Abstract: We comment on a recent paper that connects certain forms of machine learning to Set Theory. We point out that part of the set-theoretic machinery is related to a result of Kuratowski about decompositions of finite powers of sets and we show that there is no Borel measurable monotone compression function on the unit interval.

Artificial intelligence (AI) is trending globally in commerce, science, health care, geopolitics, and more. Deep learning, a subset of machine learning, is the lever that launched the worldwide rush—an area of strategic interest for researchers, scientists, visionary CEOs, academics, geopolitical think tanks, pioneering entrepreneurs, astute venture capitalists, strategy consultants, and management executives from companies of all sizes. Yet in the midst of this AI renaissance is a relatively fundamental yet unsolvable problem with machine learning that is not commonly known, nor frequently discussed outside of the small cadre of philosophers, and artificial intelligence experts.