Machine Learning and the Continuum Hypothesis

Posted on 2019-01-09 by Klaas Pieter Hart

Not even Machine Learning is safe from Set Theory, or so it seems. On the website of the journal Nature there is an article about a paper in Nature Machine Intelligence that connects a certain kind of learnability to the Continuum Hypothesis. The conclusion of the paper is that certain abstract learnability questions are undecidable on the basis of the normal ZFC axioms of Set Theory.

The article tries to explain what is going on but seems to confuse two disparate things: Gödel’s (First) Incompleteness Theorem on the one hand and the undecidability of the Continuum Hypothesis on the other hand.
The first is a very general statement about first-order theories; it states that for every theory that satisfies a number of technical conditions there are statements that have no formal proof and neither do their negations. Elementary number theory is subject to this theorem, as is ZFC Set Theory.
The second is a Set-theoretical statement for which we can prove that neither it nor its negation has a formal proof from the ZFC axioms. It is also more interesting than Gödel’s statements; the latter ‘simply’ assert their own unprovability, whereas the Continuum Hypothesis is a fundamental statement/question about the set of real numbers.

The confusion manifests itself when the Continuum Hypothesis is called a paradox. It is not. The statements from the Incompleteness Theorem, on the other hand, are usually likened to the Liar Paradox in that “This formula is unprovable” looks a lot like the paradox that is “This sentence is false”.

The paper itself also alludes to the Incompleteness Theorem; it even states that it is used in the argument. It is not. No use is made of Gödel’s abstract unprovable sentences.

The Set Theory

So, what is the Set Theory in the paper? The learnability question is shown to be equivalent to the existence of a natural number m and a map η from the family of m-element subsets of [0,1] to the family F of finite subsets of [0,1] that satisfies the following condition: if A is a subset of [0,1] with m+1 elements then it has a subset B with m elements such that η(B) contains A.
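Written out with quantifiers, the condition reads as follows. The notation is an assumption for this sketch and is not taken from the paper: [X]^m stands for the family of m-element subsets of X, and [X]^{<ω} for the family of finite subsets of X (called F above).

```latex
\[
  \exists m \in \mathbb{N}\;
  \exists \eta \colon [X]^{m} \to [X]^{<\omega}\;
  \forall A \in [X]^{m+1}\;
  \exists B \in [A]^{m} :\quad A \subseteq \eta(B),
  \qquad X = [0,1].
\]
```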

The main theorem of the paper states that an arbitrary set X admits such a map with m=k+1 if and only if X has cardinality at most ℵ_k.
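The smallest case, k=0 and m=1, can be checked by hand on the natural numbers, which have cardinality ℵ₀: send {x} to {0, …, x}. The Python sketch below is an illustration, not something from the paper; the names eta and satisfies_condition are hypothetical, and the check only runs over a finite initial segment, so it illustrates the condition rather than proving it.

```python
from itertools import combinations

def eta(b):
    """Map a 1-element set {x} of naturals to the finite set {0, ..., x}."""
    (x,) = b
    return frozenset(range(x + 1))

def satisfies_condition(universe, m, eta_map):
    """Check: every (m+1)-subset A has an m-subset B with A inside eta_map(B)."""
    for a in combinations(universe, m + 1):
        a_set = frozenset(a)
        if not any(a_set <= eta_map(frozenset(b)) for b in combinations(a, m)):
            return False
    return True

# Only a finite initial segment of the naturals is examined here.
print(satisfies_condition(range(50), 1, eta))
```

For a pair {a, b} with a<b the witness is B={b}, because η({b}) contains everything up to b.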

If the Continuum Hypothesis holds then [0,1] has cardinality ℵ₁, hence there is a map as required with m=2. More generally there is a map as required if the cardinality of [0,1] is equal to ℵ_k for some natural number k. These possibilities do not lead to contradictions, hence neither does the learnability statement. On the other hand, the statement that the cardinality of [0,1] is larger than all these ℵ_k does not lead to contradictions either, hence neither does the negation of the learnability statement.

The derivation of the main statement parallels that of the main result of the paper Sur une caractérisation des alephs by Kuratowski from 1951: a set X has cardinality at most ℵ_k if and only if its power X^{k+2} can be written as the union of k+2 sets A₁, …, A_{k+2} such that for every i and every point (x₁,…,x_{k+2}) in the power the set of points y in A_i that satisfy y_j=x_j for j≠i is finite; in Kuratowski’s words “A_i is finite in the direction of the ith axis”.
Indeed one can even construct a map η for m=k+1 from such a decomposition for the canonical set ω_k of cardinality ℵ_k.
For notational simplicity we take k=2, so m=3, and ω₂⁴ has a decomposition into four sets A₁, A₂, A₃, and A₄. Given a subset F of ω₂ of 3 elements enumerate it in increasing order: x₁<x₂<x₃. The set η(F) will consist of F itself together with

all x for which (x,x₁,x₂,x₃) belongs to A₁,
all x for which (x₁,x,x₂,x₃) belongs to A₂,
all x for which (x₁,x₂,x,x₃) belongs to A₃,
all x for which (x₁,x₂,x₃,x) belongs to A₄

To see that this works let G be a four-element subset of ω₂, enumerate it as y₁<y₂<y₃<y₄. Then (y₁,y₂,y₃,y₄) belongs to one of the four sets, say it belongs to A₂; then G is a subset of η({y₁,y₃,y₄}): the point y₂ is included in the second line in the list above.
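The same recipe can be tried out in the case k=0, where m=1 and the square of the natural numbers splits into two pieces. The sketch below is an illustration under stated assumptions, not code from the paper: the decomposition A₁ = {(x₁,x₂) : x₁ ≤ x₂}, A₂ = {(x₁,x₂) : x₁ > x₂} is chosen here for convenience, the function names are hypothetical, and the search is restricted to a finite initial segment of the naturals (which loses nothing for this particular decomposition, since every x that gets added is at most the largest element of F).

```python
from itertools import combinations

# Assumed decomposition of the square of the naturals (the case k=0):
# A1 is finite in the direction of the first axis, A2 of the second.
def in_a1(point):
    x1, x2 = point
    return x1 <= x2

def in_a2(point):
    x1, x2 = point
    return x1 > x2

def eta_from_pieces(f, pieces, universe):
    """Build eta(F): F plus every x that, inserted at position i, lands in A_i."""
    xs = sorted(f)
    result = set(xs)
    for i, in_piece in enumerate(pieces):
        for x in universe:
            # Insert x as the i-th coordinate among the sorted elements of F.
            point = tuple(xs[:i]) + (x,) + tuple(xs[i:])
            if in_piece(point):
                result.add(x)
    return frozenset(result)

def covers_all(universe, pieces):
    """Check that every (m+1)-subset G lies inside eta(B) for some m-subset B."""
    m = len(pieces) - 1
    for g in combinations(universe, m + 1):
        g_set = frozenset(g)
        if not any(g_set <= eta_from_pieces(b, pieces, universe)
                   for b in combinations(g, m)):
            return False
    return True

UNIVERSE = range(20)
print(sorted(eta_from_pieces({5}, [in_a1, in_a2], UNIVERSE)))
print(covers_all(UNIVERSE, [in_a1, in_a2]))
```

Here η({x₁}) comes out as {0, …, x₁}, and a pair y₁<y₂ is covered by dropping y₁, exactly as in the argument above with A₁ in the role of A₂.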

Note. The proof of the main theorem (Theorem 1) of the paper is not quite correct: it fails for k=1, for example, as one encounters the cardinal number ℵ_{-1}. Worse: in that case the ordering <₁ seems to have order type ω₁ and ω₀ simultaneously. All this can be repaired with a better write-up.


Posted in Georg Cantor, Logica, onmogelijkheden, Set Theory, verzamelingen, Wiskunde

2 comments

Abstract: We comment on a recent paper that connects certain forms of machine learning to Set Theory. We point out that part of the set-theoretic machinery is related to a result of Kuratowski about decompositions of finite powers of sets and we show that there is no Borel measurable monotone compression function on the unit interval.

Artificial intelligence (AI) is trending globally in commerce, science, health care, geopolitics, and more. Deep learning, a subset of machine learning, is the lever that launched the worldwide rush—an area of strategic interest for researchers, scientists, visionary CEOs, academics, geopolitical think tanks, pioneering entrepreneurs, astute venture capitalists, strategy consultants, and management executives from companies of all sizes. Yet in the midst of this AI renaissance is a relatively fundamental yet unsolvable problem with machine learning that is not commonly known, nor frequently discussed outside of the small cadre of philosophers, and artificial intelligence experts.
