# Posted in January 2019

## More on machine learning and CH

A few days ago I wrote about a paper establishing an independence result in the field of machine learning. Here I offer a few more comments.

In the paper the authors comment on the relation between their result and actual machine learning. That relation may seem tenuous because none of the functions involved in the arguments is related to any kind of algorithm.

Indeed the constructions of the compression schemes are very non-constructive in that they use repeated applications of the Axiom of Choice.

Now the inequality 2^{ℵ0}>ℵ_{ω} implies there are no compression schemes whatsoever. But it may be of interest to know that consistent examples of compression schemes must be nonconstructive. It turns out that, with the aid of a few standard results from Descriptive Set Theory it is relatively easy to show outright that there are no Borel measurable monotone compression schemes and hence no Borel measurable learning functions for the class of problems studied in the paper mentioned above either.

The details can be found in this note.

## Machine Learning and the Continuum Hypothesis

Not even Machine Learning is safe from Set Theory, or so it seems. On the website of the journal Nature there is an article about a paper in Nature Machine Intelligence that connects a certain kind of learnability to the Continuum Hypothesis. The conclusion of the paper is that certain abstract learnability questions are undecidable on the basis of the normal ZFC axioms of Set Theory.

The article tries to explain what is going on but seems to confuse two disparate things: Gödel’s (First) Incompleteness Theorem on the one hand and the undecidability of the Continuum Hypothesis on the other hand.

The first is a very general statement about first-order theories; it states that for every theory that satisfies a number of technical conditions there are statements that have no formal proof *and neither do their negations*. Elementary number theory is subject to this theorem, as is ZFC Set Theory.

The second is a Set-theoretical statement for which we can prove that there is no formal proof, nor for its negation. It is also more interesting than Gödel’s statements; the latter `simply’ assert their own unprovability, whereas the Continuum Hypothesis is a fundamental statement/question about the set of real numbers.

The confusion manifests itself when the Continuum Hypothesis is called a paradox. It is not. The statements from the Incompleteness Theorem on the other hand are usually likened to the Liar Paradox in that “This formula if unprovable” looks a lot like the paradox that is “This sentence is false”.

The paper itself also alludes to the Incompleteness Theorem; it even states that is used in the argument. It is not. No use is made of Gödel’s abstract unprovable sentences.

## The Set Theory

So, what is the Set Theory in the paper? The learnability question is shown to be equivalent to the existence of a natural number m and a map η from the family of m-element subsets of [0,1] to the family F of finite subsets of [0,1] that satisfies the following condition: if A is a subset of [0,1] with m+1 elements then it has a subset B with m elements such that η(B) contains A.

The main theorem of the paper states that an arbitrary set X admits such a map with m=k+1 if and only if X has cardinality at most ℵ_{k}.

If the Continuum Hypothesis holds then [0,1] has cardinality ℵ_{1}, hence there is a map as required with m=2. More generally there is a map as required if the cardinality of [0,1] is equal to ℵ_{k} for some natural number k. These possibilities do not lead to contradictions, hence neither does the learnability statement. On the other hand, the statement that the cardinality of [0,1] is larger than all these ℵ_{k} does not lead to contradictions either, hence neither does the negation of the learnability statement.

The derivation of the main statement parallells that of the main result of the paper Sur une caractérisation des alephs by Kuratowski from 1951: a set X has cardinality at most ℵ_{k} if and only if its power X^{k+2} can be written as the union of k+2 sets A_{1}, …, A_{k+2} such that for every i and every point (x_{1},…,x_{k+2}) in the power the set of points y in A_{i} that satisfy y_{j}=x_{j} for j≠i is finite; in Kuratowski’s words “A_{i} is finite in the direction of the ith axis”.

Indeed one can even construct of a map η for m=k+1 from this decomposition of the canonical set ω_{k} of cardinality ℵ_{k}.

For notational simplicity we take k=2, so m=3, and ω_{2}^{4} has a decomposition into four sets A_{1}, A_{2}, A_{3}, and A_{4}. Given a subset F of ω_{2} of 3 elements enumerate it in increasing order: x_{1}<x_{2}<x_{3}. The set η(F) will consist of F itself together with

- all x for which (x,x
_{1},x_{2},x_{3}) belongs to A_{1}, - all x for which (x
_{1},x,x_{2},x_{3}) belongs to A_{2}, - all x for which (x
_{1},x_{2},x,x_{3}) belongs to A_{3}, - all x for which (x
_{1},x_{2},x_{3},x) belongs to A_{4}

To see that this works let G be a four-element subset of ω_{2}, enumerate it as y_{1}<y_{2}<y_{3}<y_{4}. Then (y_{1},y_{2},y_{3},y_{4}) belongs to one of the four sets, say it belongs to A_{2}; then G is a subset of η({y_{1},y_{3},y_{4}}): the point y_{2} is included in the second line in the list above.

**Note**. The proof of the main theorem (Theorem 1) of the paper is not quite correct: it fails for k=1 for example as one encounters the cardinal number ℵ_{-1}. Worse: in that case the ordering <_{1} seems to have order type ω_{1} and ω_{0} simultaneously. All this can be repaired with a better write-up.

## Recent Comments