A totally ordered bijective mapping from string to integer

Here’s an interesting math problem that I discussed with my roommates.

Give a bijective function f that maps from the set of all strings to the set of integers such that for every pair of strings S and T, S < T iff f(S) < f(T) and S = T iff f(S) = f(T) (i.e. its lexicographical order is preserved) or show that there is no such a function.

We will show that there is only one case in which there is such a function. Credit: George Chen, Jonahan Kotker, Kaushik Iyer, and Ryosuke Niwa.

Sketch of Proof

Ok, this proof is bad. Someone needs to improve it.

First we define a utility function R(S, i) that maps the i-th letter in the string S to a fixed-width positive integer lexicographically (e.g. use UTF-16 so that each digit is expressible in four hexadecimal digits); if there is no i-th letter in S, then R(S, i) = 0. Let us call this width W. We also define a N(S, T) that takes two integers and concatenate them so that N(S,T)=S×bW+T where b is the base used in R(S, i). It is intuitively clear that we can recursively apply N to concatenate arbitrary number of integers of width W. We will divide the problems into several cases depending on the cardinality of the set of strings and of string.

  1. If the set of strings is consisting of a finite number of finite-length strings, then there is such a function. Find the longest string in the set, and let us call its length L. Then for each string S, obtain R(S,k) for k=1,2,...,(L1) and concatenate them in the respective order; i.e. N(N(...N(R(S,1),R(S,2))...),R(S,L1)). But then the resultant integer has the desired property.
  2. If there are infinite number of finite-length strings, then suppose there is such a function f. Let N=f("a") and M=f("c"), then we know that N<M since f("b")<f("c"). Because N and M are integers and they are finite. In particular, there are MN1 such strings. Enumerate all MN1 strings, and append “a” at the end. Clearly, all of newly created strings lie between “b” and “c” lexicographically. But we know have 2(MN1) strings even though we have only MN1 integers between “b” and “c”. We have reached a contradiction. Thus our supposition is false, and there is no such function f.
  3. If there are finite number of infinite-length strings, then what??
  4. In the case that there are infinite number of infinite-length strings, consider the relationship between an infinite-length string and a real number. Namely, we can express any real number as an infinite-length string by writing each digit down although infinite. Clearly, this implies that such set of strings has the cardinality at least as large as that of the set of all real numbers. Then, it is certainly impossible to have a bijective function from this set to the set of integers.

Correction per Min Xu’s Comments

For 3, enumerate all strings in the lexicographical order and assign integers.

For 4, If there are infinite number of infinite length strings, then this is a superset of the set considered in the case 2.  This clearly indicates that there is no function f.