Propositions (Lecture 1)

Outline

Our focus in this lecture will be on these 3 questions.

  1. What is a proposition?
  2. What is a logical connective?
  3. How do logical connectives interact?

Definition (Discussion)

A proposition (in this course at least) is always assumed to be unambiguous and definite, so that any proposition is always either true or false. We also assume propositions are objective, so that a proposition's truth value does not depend on who is considering it or on whether anyone knows or believes it to be true.


Examples

  1. London is the capital of the UK.
  2. Berlin is not the German capital.
  3. \(3 > 2\).
  4. \(12345678^{87654321}\) is a prime number.
  5. For every real number, there is an integer such that the former is smaller than the latter.

Nonexamples

  1. Real analysis is easy.
  2. \(x > 2\).
  3. Is there a number whose square is 2?

Connectives

Negation

Let \(A\) be any proposition; we denote by \(\neg A\) the negation of \(A\). For example, if \(A\) is the proposition "\(3 > 2\)", then \(\neg A\) is "\(3 \le 2\)".

From the previous discussion, we can see:

| \(A\) | \(\neg A\) |
|---|---|
| T | F |
| F | T |

Conjunction

Conjunction is just a fancy way to say "and". For any two propositions \(A\) and \(B\), we denote their conjunction by \(A\land B\). \(A \land B\) is true whenever \(A\) and \(B\) are simultaneously true. For example, "\(3 > 2 \land 2 > 1\)" asserts that three is bigger than two and two is bigger than one.

| \(A\) | \(B\) | \(A \land B\) |
|---|---|---|
| T | T | T |
| T | F | F |
| F | T | F |
| F | F | F |

Conjunction in some older references might be known as the logical product.


Disjunction

For any two propositions \(A\) and \(B\), we denote their disjunction by \(A\lor B\). \(A\lor B\) is true whenever at least one of \(A\), \(B\) is true. This may differ from the everyday "or", which sometimes has an exclusive flavour.

| \(A\) | \(B\) | \(A \lor B\) |
|---|---|---|
| T | T | T |
| T | F | T |
| F | T | T |
| F | F | F |

Disjunction in some older references might be known as the logical sum.


Implication

Implication loosely translates to "if ... then ...". Let us denote \(A\) implying \(B\) by \(A \implies B\) or \(A \to B\), whichever you prefer. In this case we call \(A\) the antecedent or premise, and \(B\) the consequent or conclusion.

| \(A\) | \(B\) | \(A \implies B\) |
|---|---|---|
| T | T | T |
| T | F | F |
| F | T | T |
| F | F | T |

The last two rows encode the baffling principle ex falso quodlibet ("from falsehood, anything follows"), which might be worthy of a philosophical debate. But we will content ourselves with the following discussion.


For all \(x\), \(x > 2 \implies x^2>4\)

We hope this to be true (intuitively). Let us work out explicitly what this says for different \(x\).

| \(x\) | \(x>2\) | \(x^2>4\) | \(x>2\implies x^2>4\) |
|---|---|---|---|
| 3 | T | T | T |
| -3 | F | T | T |
| 0 | F | F | T |

So I mean, why not?

This is why \(\implies\) only loosely translates to "if ... then ...": the latter suggests a causal relation. But for our usage, "Snow is black \(\implies\) grass is red" is a true statement.


Equivalence

For propositions \(A\) and \(B\), to say that \(A\) and \(B\) are equivalent, denoted by \(A \iff B\), is to say that \(A\) is true whenever \(B\) is true and vice versa.

| \(A\) | \(B\) | \(A \iff B\) |
|---|---|---|
| T | T | T |
| T | F | F |
| F | T | F |
| F | F | T |

Exercises

  1. Fill in the blanks and convince yourself that we are defining equivalence to be \((A\implies B) \land (B \implies A)\):

| \(A\) | \(B\) | \(A\implies B\) | \(B\implies A\) | \(A \iff B\) |
|---|---|---|---|---|
| T | T |  |  | T |
| T | F |  |  | F |
| F | T |  |  | F |
| F | F |  |  | T |
  2. Convince yourself that \(A \iff \neg\neg A\). This is called the double negation law.

  3. Convince yourself that \((A\implies B)\iff(\neg A\lor B)\). (All three exercises are brute-force checked in the sketch below.)
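These can all be checked by brute force over the four truth assignments. Here is a minimal Python sketch (Python plays no official role in these notes; this is just for experimenting — the table for \(\implies\) is spelled out explicitly so that nothing is circular):

```python
# Brute-force check of the three exercises over all truth values.
from itertools import product

IMPLIES = {(True, True): True, (True, False): False,
           (False, True): True, (False, False): True}

for A, B in product([True, False], repeat=2):
    iff = (A == B)                                        # A <=> B
    assert iff == (IMPLIES[(A, B)] and IMPLIES[(B, A)])   # exercise 1
    assert A == (not (not A))                             # exercise 2: double negation
    assert IMPLIES[(A, B)] == ((not A) or B)              # exercise 3
print("all three equivalences hold")
```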


Quantifiers

\(\forall\)

It is somewhat cumbersome to always write statements like "\(P\) is true for every \(x\)". Instead we write \(\forall x P\), or \(\forall x, P\) if adding a comma makes the meaning clearer.

\(\exists\)

We also denote statements like "there is some \(x\) making \(P\) true" by \(\exists x P\) or \(\exists x, P\).

We also have \[\neg\forall x P\iff \exists x\neg P\]


Exercises

Assume all we know are the natural numbers. What does each of the following mean, and is it true? (A bounded-search check in code follows the list.)

  1. \(\forall N,\exists p, P(p)\land (p>N)\) where \(P(p)\) is to say \(p\) is prime.
  2. \(\forall m\forall n,m+n>m\)
  3. \(\exists z\exists o\forall m,m+z=m \land m\times o=m\)
  4. come up with more examples until you are comfortable with \(\forall\) and \(\exists\).
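If it helps, here is a bounded-search version of statements 1–3 in Python. This is an illustrative sketch only: searching over \(0,\dots,59\) gives evidence, not a proof, since the real quantifiers range over all of \(\mathbb N\).

```python
# Bounded-search versions of statements 1-3 over 0..59.
nats = range(60)

def is_prime(p):
    return p >= 2 and all(p % d != 0 for d in range(2, p))

# 1. for all N there exists p, p prime and p > N   (True: infinitely many primes)
print(all(any(is_prime(p) and p > n for p in nats) for n in range(50)))
# 2. for all m, n: m + n > m                        (False: take n = 0)
print(all(m + n > m for m in nats for n in nats))
# 3. exists z, o such that for all m: m+z = m and m*o = m   (True: z = 0, o = 1)
print(any(all(m + z == m and m * o == m for m in nats)
          for z in nats for o in nats))
```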

Laws of propositions and connectives

Let \(A\), \(B\), \(C\) be propositions

Commutative laws

  1. of conjunction \[A \land B \iff B\land A\]
  2. of disjunction \[A \lor B \iff B\lor A\]

Associative laws

  1. of conjunction \[[(A\land B)\land C] \iff [A\land(B\land C)]\]
  2. of disjunction \[[(A\lor B)\lor C] \iff [A\lor(B\lor C)]\]

Let \(A\), \(B\), \(C\) be propositions

Distributive laws

  1. \([A\land (B\lor C)] \iff [(A\land B)\lor(A\land C)]\)
  2. \([A\lor (B\land C)] \iff [(A\lor B)\land(A\lor C)]\)

Idempotent laws

  1. of conjunction \[(A\land A)\iff A\]
  2. of disjunction \[(A\lor A)\iff A\]

Transitive laws

  1. of implication \[[(A\implies B)\land (B\implies C)]\implies (A\implies C)\]
  2. of equivalence \[[(A\iff B)\land (B\iff C)]\implies(A\iff C)\]

Let \(A\), \(B\) be propositions

De Morgan's Laws

  1. \(\neg(A\lor B)\iff(\neg A\land\neg B)\)
  2. \(\neg(A\land B)\iff(\neg A\lor\neg B)\)

Law of contraposition

\((A\implies B)\iff(\neg B\implies \neg A)\)

Laws of excluded middle and non-contradiction

  1. \((A\lor\neg A)\) is a tautology.
  2. \((A\land\neg A)\) is a contradiction.

Identity and domination laws

Let \(T\) be any true proposition and \(F\) be any false proposition

  1. \((A\land T)\iff A\)
  2. \((A\lor T)\iff T\)
  3. \((A\land F)\iff F\)
  4. \((A\lor F)\iff A\)

Exercises

Let \(A\) and \(B\) be propositions; express the following using connectives. Try to find some equivalent representations using the "laws".

  1. one of and only one of \(A\), \(B\) is true;
  2. \(A\) and \(B\) are never true at the same time;
  3. at least one of \(A\), \(B\) is true;
  4. create your own until you feel comfortable with connectives.

Let \(A\) be "something only about \(x\)" so that \(\forall xA\) and \(\exists xA\) are propositions, express the following:

  1. there is a unique x to make \(A\) true. (From now on, we denote this as \(\exists!x,A\))
  2. If you know group theory, try to express the fact that \(\mathbb{Z}\) is an additive group.

Summary

Please summarise this lecture by answering:

  1. So what is a proposition?
  2. So what is a logical connective?
  3. So how do logical connectives interact?

For the more attentive reader, you might find these useful for a deeper view towards first-order logic:

  1. Hamilton (1988), chapters 1 and 2;
  2. Van Dalen (2004), chapter 1.

Basic naïve set theory (Lecture 2)

Disclaimer

In this section we sometimes need to purposely avoid the most rigorous approach, because otherwise we would have to develop first-order logic and axiomatic set theory, which are really quite advanced. In particular, we will not really define what a set is; instead we just pretend that this is an intrinsically clear notion and ignore set-theoretic paradoxes such as Russell's.

What to expect in this lecture

  1. roughly what a set is;
  2. operations on sets;
  3. properties of sets and their operations;
  4. relations and functions

"Definition"

For the purpose of this course only, a set is something about which we can talk about "membership".

So for \(A\) any set, if we want to say "\(x\) is a member of \(A\)", we write conveniently \(x\in A\); if we want to say "\(x\) is not a member of \(A\)", we write "\(x\not\in A\)", short for "\(\neg(x\in A)\)".

  • We can write down a set simply by listing all the elements (if this is possible). For example, \(\{1,2\}\) is the set with only two elements, namely 1 and 2.
  • We can write down a set by its defining property: \(\{x | P\}\) where P is some proposition. Then any element in this set satisfies \(P\), and everything satisfying \(P\) is in the set. For example \(\{x|x\in\mathbb N\land x \mathrm{\ even}\}\) is the set \(\{0,2,4,6,8,10,...\}\).

For \(A\) any set and \(p\) some well-defined property, let us agree to shorten \(\forall x,x\in A\implies x \textrm{ has property p}\) to \(\forall x\in A, x\textrm{ has property p}\), and \(\exists x,x\in A\land x\textrm{ has property p}\) to \(\exists x\in A, x\textrm{ has property p}\).

Little exercise: check \(\neg(\forall x\in A, x\textrm{ has property p})\iff\exists x\in A,\neg(x \textrm{ has property p})\).

Axiom of extensionality

We agree that all that matters about a set is its elements and nothing else: if \(A\) and \(B\) are sets then \(A=B\iff \forall x, (x\in A\iff x\in B)\). This is the axiom of extensionality.

For any set \(A\), we write \(|A|\) or \(\mathrm {card}\, A\) for the cardinality of \(A\) when \(A\) is finite. So \(|\{1,2,3\}|=3\).


Perhaps the most important set of all (Exercise)

  1. There is a trivial example of a set, namely the set containing nothing. Prove using the axiom of extensionality that there is only one empty set. Hence we denote this unique empty set by \(\emptyset\). If you want to know why I claim such a boring set to be vitally important, consult perhaps Halmos (2017); in some (weak) sense maths is really about the empty set.

  2. Please explain, with the axiom of extensionality, why \(\emptyset\) and \(\{\emptyset\}\) are two different objects.

In the notation above, \(\emptyset\) is the only set with size zero. If you did the exercise in the last lecture: \[\exists!\emptyset,\forall x,x\notin\emptyset\]


Subset and power set

Let \(A\) be any set.

  • We say a set \(B\) is a subset of \(A\) if and only if \(\forall x,x\in B\implies x\in A\). In this case, we write \(B\subseteq A\). Note that in particular \(A\subseteq A\) and \(\emptyset\subseteq A\). We also shorten \(B\subseteq A\land B\neq A\) to \(B\subsetneq A\); in this case \(B\) is a proper subset of \(A\). By \(\{x\in A|P\}\), we mean \(\{x|x\in A\land P\}\) -- the subset of \(A\) whose elements satisfy \(P\).

  • By power set of \(A\), written as \(P(A)\) or \(\mathcal P (A)\), we mean the set of all subsets of \(A\).

Exercise

Prove by ax. of ext., \(A=B\iff A\subseteq B\land B\subseteq A\)


Say \(A=\{1,2,3\}\). Some examples of subsets:

  1. \(\emptyset\)
  2. \(\{1,2\}\)
  3. \(\{1,2,3\}\)

Some nonexamples of subsets

  1. \(\{\emptyset\}\)
  2. \(1\)
  3. \(\{1,\{2\}\}\)

\(\mathcal P (A) = \{\emptyset,\{1\},\{2\},\{3\},\{1,2\},\{1,3\},\{2,3\},\{1,2,3\}\}\)

Note that in some older references, \(\mathcal P(A)\) is sometimes denoted as \(2^A\), could you think why? (Hint: size)
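For a concrete check of the hint, here is a short Python sketch (not part of the notes proper) that builds \(\mathcal P(A)\) and confirms \(|\mathcal P(A)|=2^{|A|}\):

```python
# Building the power set by choosing subsets of every size;
# len(P) == 2**len(A) is where the 2^A notation comes from.
from itertools import chain, combinations

def power_set(xs):
    xs = list(xs)
    subsets = chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))
    return [set(s) for s in subsets]

A = {1, 2, 3}
P = power_set(A)
print(P)                      # 8 subsets, from set() up to {1, 2, 3}
assert len(P) == 2 ** len(A)  # |P(A)| = 2^|A|
```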


Operations on sets

Let \(A\) and \(B\) be sets:

Union

We write \(A\cup B\) to be the union of \(A\) and \(B\): \[A\cup B:=\{x|x\in A\lor x\in B\}\]

Let \(I\) be an indexing set such that for any \(i\in I\), \(A_i\) is a set; we define \(\bigcup_{i\in I}A_i:=\{x|\exists i\in I,x\in A_i\}\)

Intersection

We write \(A\cap B\) to be the intersection of \(A\) and \(B\): \[A\cap B:=\{x|x\in A\land x\in B\}\]

Let \(I\) be an indexing set such that for any \(i\in I\), \(A_i\) is a set; we define \(\bigcap_{i\in I}A_i:=\{x|\forall i\in I,x\in A_i\}\)

Difference

We write the difference between \(A\) and \(B\) as \(A-B\) or \(A\setminus B\): \[A-B:=\{x|x\in A\land x\notin B\}\] In many scenarios we will fix a set \(A\), and everything we do will be within this set. We call this set the universe. In this case, instead of \(A-B\) we write \(B^c\) (for complement). For example, the universe of real analysis is often \(\mathbb R\), and the set of irrational numbers is \(\mathbb Q^c\).

Symmetric difference

The symmetric difference of \(A\) and \(B\), denoted \(A\Delta B\), is: \[A\Delta B=(A-B)\cup(B-A)\]

Cartesian product

From the axiom of extensionality, \(\{a,b\}=\{b,a\}\). However, we often need to care about the order of elements. Thus we define the ordered pair (2-tuple): \[(a,b)=\{\{a\},\{a,b\}\}\]

We then inductively define the ordered \(n\)-tuple: for \(n\) elements \(a_1,...,a_n\), we define \((a_1,...,a_n)\) to be \((a_1,(a_2,...,a_n))\), where \((a_2,...,a_n)\) is the ordered \((n-1)\)-tuple formed by \(a_2,...,a_n\).

Theorem

\[[(a,b)=(x,y)]\iff[(a=x)\land(b=y)]\]


Proof

For the forward direction: Suppose \((a,b)=(x,y)\), then \(\{\{a\},\{a,b\}\}=\{\{x\},\{x,y\}\}\).

Either \(a=b\) or \(a\ne b\).

If \(a=b\) then \((a,b)=\{\{a\}\}=(x,y)=\{\{x\},\{x,y\}\}\). By ax. of ext., \(\{x\}\in\{\{a\}\}\) and \(\{x,y\}\in\{\{a\}\}\). Hence \(\{x\}=\{a\}\) and \(\{x,y\}=\{a\}\), by ax. of ext., \(x=a=y\).

Suppose now \(a\ne b\), by ax. of ext., \(\{a\}\in \{\{x\},\{x,y\}\}\). However, if \(\{a\}=\{x,y\}\), by ax. of ext., we must have \(x=y=a\). Then \(\{\{x\},\{x,y\}\}=\{\{a\}\}\). Then \(\{\{a\},\{a,b\}\}=\{\{a\}\}\). Then \(\{a,b\}=\{a\}\), then by ax. of ext., \(a=b\), a contradiction. Then \(\{a\}=\{x\}\), by ax. of ext., \(a=x\). Also \(\{a,b\}=\{x,y\}\), for otherwise \(a=b\). Then \(b=x\) or \(b=y\). The former contradicts \(b\ne a\) via \(a=x\). Hence \(b=y\).

The other direction is more or less trivial.

Come back here after lecture 3:

Verify \((a_1,...,a_n)=(b_1,...,b_n)\iff\forall i\in\{1,...,n\},a_i=b_i\)


{Cartesian product}

Let \(A\) and \(B\) be sets. The Cartesian product, written \(A\times B\), is defined as: \[\{(a,b)|a\in A\land b\in B\}\] Instead of \(A\times A\), we write \(A^2\). By ax. of ext., \(p\in A\times B\iff\exists a\in A\exists b\in B,p=(a,b)\).


Example

Let \(A=\{0,1,2,3,4,5\}\) and \(B=\{1,3,5,7,9\}\).

  • \(A\cup B=\{0,1,2,3,4,5,7,9\}\)
  • \(A\cap B=\{1,3,5\}\)
  • \(A-B = \{0,2,4\}\)
  • \(B-A = \{7,9\}\)
  • \(A\Delta B=\{0,2,4,7,9\}\)
  • \(A\cup B=(A\cap B)\cup(A\Delta B)\)
  • \(A\times B\) is \[\begin{equation}\nonumber \begin{aligned} \{&(0,1),(0,3),(0,5),(0,7),(0,9),\\ &(1,1),(1,3),(1,5),(1,7),(1,9),\\ &(2,1),(2,3),(2,5),(2,7),(2,9),\\ &(3,1),(3,3),(3,5),(3,7),(3,9),\\ &(4,1),(4,3),(4,5),(4,7),(4,9),\\ &(5,1),(5,3),(5,5),(5,7),(5,9)\} \end{aligned} \end{equation}\] I regret choosing \(A\) and \(B\) this way. (All of the identities above are checked in the sketch after this list.)
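Everything in this example can be double-checked with Python's built-in set type (a quick illustrative sketch, not part of the notes proper):

```python
# Checking the worked example with built-in set operations.
A = {0, 1, 2, 3, 4, 5}
B = {1, 3, 5, 7, 9}
assert A | B == {0, 1, 2, 3, 4, 5, 7, 9}      # union
assert A & B == {1, 3, 5}                     # intersection
assert A - B == {0, 2, 4}                     # difference
assert B - A == {7, 9}
assert A ^ B == {0, 2, 4, 7, 9}               # symmetric difference
assert A | B == (A & B) | (A ^ B)
product = {(a, b) for a in A for b in B}      # Cartesian product as a set of pairs
assert len(product) == len(A) * len(B)        # 30 ordered pairs
print("all identities from the example hold")
```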

Laws for operations on sets

Let \(A\), \(B\) and \(C\) be sets

Commutative law

  1. of union: \(A\cup B=B\cup A\)
  2. of intersection: \(A\cap B = B\cap A\)
\[\begin{align} x \in A\cup B &\iff x\in \{t| t\in A\lor t\in B\}\\ &\iff x\in A \lor x\in B\\ &\iff x\in B \lor x\in A\\ &\iff x\in \{t| t\in B\lor t\in A\}\\ &\iff x\in B\cup A \end{align}\]

By the axiom of extensionality, this concludes the proof of 1; 2 is left as an exercise.


Associative laws (whose proofs are left as exercises)

  1. of union: \(A\cup(B\cup C)=(A\cup B)\cup C\)
  2. of intersection: \(A\cap(B\cap C)=(A\cap B)\cap C\)

Distributive laws

  1. \(A\cap(B\cup C)=(A\cap B)\cup (A\cap C)\)
  2. \(A\cup(B\cap C)=(A\cup B)\cap (A\cup C)\)
\[\begin{align} x \in A\cap(B\cup C) &\iff x\in A\land(x\in B\cup C) \\ &\iff x\in A\land(x\in B\lor x\in C)\\ \end{align}\]

Please continue using the laws from lecture 1. The proof of 2 is left as an exercise.


Idempotent laws

  1. \(A\cup A=A\)
  2. \(A\cap A=A\)
\[\begin{align} a\in A\cup A &\iff a\in \{x|x\in A\lor x\in A \} \\ &\iff a\in A\lor a\in A \\ &\iff a \in A \end{align}\]

By the axiom of extensionality, we have \(A\cup A=A\). The proof of 2 is an exercise.


Exercise (a bit more involved)

Prove the following

  1. \(A\cup (B-A) = A\cup B\) (Hint: use \(x\in A\lor x\notin A\) is a tautology.)
  2. \(A-B=A-(A\cap B)\) (Hint: use \(x\in A\land x\notin A\) is a contradiction)
  3. De Morgan's Laws (Hint: use De Morgan's law from lecture 1)
  • \(A-(B\cap C)=(A-B)\cup(A-C)\)
  • \(A-(B\cup C)=(A-B)\cap(A-C)\)

Laws about Cartesian product

Let \(A\), \(B\), \(C\) and \(D\) be sets

  1. \(A\times(B\cup C)=(A\times B)\cup(A\times C)\)
  2. \(A\times(B\cap C)=(A\times B)\cap(A\times C)\)
  3. \((A\times B)\cup(C\times D)\subseteq(A\cup C)\times(B\cup D)\) (only an inclusion; find a counterexample to equality)
  4. \((A\times B)\cap(C\times D)=(A\cap C)\times(B\cap D)\)

Proof

  1. \[ \begin{aligned} (x,y)\in A\times(B\cup C) &\iff x\in A\land y\in B\cup C \\ &\iff x\in A\land(y\in B\lor y\in C) \end{aligned} \] You can take it from here using lecture 1;
  2. similar to 1;

  3. \[ \begin{aligned} (x,y)\in (A\times B)\cup(C\times D) \iff& (x,y)\in (A\times B)\\ \lor& (x,y)\in (C\times D)\\ \iff& (x\in A\land y\in B)\\ \lor&(x\in C\land y\in D) \end{aligned} \] You can take it from here using lecture 1 (note that you only get an inclusion, not an equality);
  4. similar to 3.

Relations

Let \(X\) and \(Y\) be sets. A (binary) relation from \(X\) to \(Y\) is simply a subset \(R\subseteq X\times Y\). When dealing with relations, instead of \((x,y)\in R\) we write \(xRy\).

An \(n\)-ary relation amongst \(A_1\),...,\(A_n\) is a subset of \(A_1\times...\times A_n\).

Let \(R\) be a binary relation from \(X\) to \(Y\), we define

  1. The domain of \(R\): \(\mathrm{dom}R:=\{x\in X|\exists y\in Y,xRy\}\)
  2. The range of \(R\): \(\mathrm{ran}R:=\{y\in Y|\exists x\in X,xRy\}\)
  3. The inverse relation of \(R\): \(R^{-1}:=\{(y,x)\in Y\times X|xRy\}\)

Example

The "\(<\)" relation on \(\mathbb N\) is the set \(\{(0,1),(0,2),...,(1,2),(1,3),...\}\).

  • \(\mathrm{dom}(<)=\mathbb N\)
  • \(\mathrm{ran}(<)=\mathbb N-\{0\}\)
  • inverse of \(<\) is \(\{(1,0),(2,0),...,(2,1),(3,1),...\}\), the \(>\) relation on \(\mathbb{N}\)

We can compose relations: let \(R\) be a relation from \(X\) to \(Y\) and \(S\) be a relation from \(Y\) to \(Z\). The composition of \(S\) and \(R\) is a relation from \(X\) to \(Z\) defined as: \[S\circ R:=\{(x,z)\in X\times Z|\exists y\in Y,xRy\land ySz\}\]

Let \(R\), \(S\), \(T\) be relations from \(A\) to \(B\), \(B\) to \(C\), \(C\) to \(D\) respectively. We have the following:

  1. \((R^{-1})^{-1}=R\)
  2. \(\mathrm{dom}(R^{-1})=\mathrm{ran}R\)
  3. \(\mathrm{ran}(R^{-1})=\mathrm{dom}R\)
  4. \(T\circ(S\circ R)=(T\circ S)\circ R\)
  5. \((S\circ R)^{-1}=R^{-1}\circ S^{-1}\)

Try these yourself, or otherwise see Stewart and Tall (2015) for proofs.
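Relations on small finite sets are easy to experiment with. Here is a minimal Python sketch of composition and inverse that checks laws 1 and 5 on a tiny made-up example (the particular sets and pairs are chosen purely for illustration):

```python
# Relations as sets of pairs: composition, inverse, and laws 1 and 5.
def compose(S, R):
    # S o R = {(x, z) | exists y with x R y and y S z}
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

def inverse(R):
    return {(y, x) for (x, y) in R}

R = {(1, 'a'), (2, 'a'), (2, 'b')}           # relation from X to Y
S = {('a', 'u'), ('b', 'v')}                 # relation from Y to Z
assert inverse(inverse(R)) == R                                    # law 1
assert compose(inverse(R), inverse(S)) == inverse(compose(S, R))   # law 5
print(compose(S, R))   # {(1, 'u'), (2, 'u'), (2, 'v')}
```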


Let \(R\) be a relation from \(X\) to \(X\). \(R\) is

  • reflexive if and only if \(\forall x\in X,xRx\);
  • symmetric if and only if \(\forall x\in X\forall y\in X,xRy\implies yRx\)
  • transitive if and only if \(\forall x\in X\forall y\in X\forall z\in X,(xRy\land yRz)\implies xRz\)

If R is reflexive, symmetric and transitive then R is called an equivalence relation. In this case we sometimes write \(xRy\) as \(x\equiv y\textrm{ mod }R\).

  • \(\le\) on \(\mathbb Z\) is reflexive, transitive but not symmetric.
  • \(=\) is an equivalence relation.
  • \(\{(x,y)\in \mathbb Z^2||x-y|\le 1\}\) is reflexive, symmetric but not transitive.
  • \(\equiv\textrm{mod }m:=\{(x,y)\in\mathbb Z^2|\exists k\in \mathbb Z, x-y=km\}\) is an equivalence relation. (The three properties are tested computationally in the sketch below.)
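Here is a small Python sketch testing the three defining properties on the third example above, restricted to a finite window of \(\mathbb Z\) (chosen arbitrarily for illustration; reflexivity is only tested on that window):

```python
# Testing reflexivity, symmetry, transitivity of a finite relation.
def is_reflexive(R, X): return all((x, x) in R for x in X)
def is_symmetric(R):    return all((y, x) in R for (x, y) in R)
def is_transitive(R):   return all((x, w) in R
                                   for (x, y) in R for (z, w) in R if y == z)

X = range(-5, 6)
near = {(x, y) for x in X for y in X if abs(x - y) <= 1}   # the |x-y| <= 1 relation
print(is_reflexive(near, X), is_symmetric(near), is_transitive(near))
# True True False: reflexive and symmetric but not transitive, as claimed
```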

Let \(R\) be an equivalence relation on \(X\) and \(x\in X\). The equivalence class of \(x\) with respect to \(R\), written as \([x]_R\), is \(\{y\in X|(y,x)\in R\}\)

Trivially, since \(R\) is in particular reflexive, \(\forall x\in X,xRx\) hence \(\forall x,x\in[x]_R\).


The following is really quite important. The proof is trivial, but equivalence relations are ubiquitous in mathematics.

Theorem

\(\forall x\in X,\forall y\in X, y\in[x]_R\iff([y]_R=[x]_R)\)

Proof

Fix \(x\in X\) and \(y\in X\). The forward direction: suppose \(y\in[x]_R\). Then for any \(z\) \[ \begin{aligned} z\in[y]_R&\iff yRz & [\textrm{by definition}]\\ &\iff xRz & [R\textrm{ is an equivalence relation}^\dagger]\\ &\iff z\in[x]_R &[\textrm{by definition}] \end{aligned} \]

The other direction is trivial.

For lack of time, the \(\dagger\) part is not explained in its entirety, but do it yourself to test whether you understand transitivity and symmetry. Since equivalence relations are important, read more in Stewart and Tall (2015).

Function

Structurally speaking, a function is just a special type of relation, but we seldom think of it this way; rather, we tend to think of a function as a rule of assignment.

Let \(f\) be a relation from \(X\) to \(Y\). We say \(f\) is a function from \(X\) to \(Y\) if and only if \[\forall x\in X,\exists!y\in Y,(x,y)\in f\] (in particular, \(\mathrm{dom}f=X\)).

In this case, instead of \((x,y)\in f\) or \(xfy\), we write \(f(x)=y\) or \(x\mapsto f(x)\); we also denote \(f\) being a function from \(X\) to \(Y\) by \(f:X\to Y\). We call \(Y\) the codomain of \(f\), denoted \(\mathrm{codom} f\). Note \(\mathrm{codom}\) is different from \(\mathrm{ran}\).

For any set \(X\), we have a function from \(X\) to \(X\) usually called the identity on \(X\): \(i_X:=\mathrm{diag}X=\{(x,x)|x\in X\}\).


Theorem

Let \(X\), \(Y\) be sets and \(f\), \(g\) two functions from \(X\) to \(Y\). Then \[f=g\iff\forall x\in X,f(x)=g(x)\]

Proof

The forward direction is trivial. For the converse, let \(x\in X\); since \(f\) is a function, there is a unique \(y\in Y\) such that \(y=f(x)\), and by assumption \((x,y)\in g\). This shows \(f\subseteq g\). Similarly \(g\subseteq f\). Note that for functions \(f\), \(g\) to be equal, the domains of \(f\) and \(g\) must be equal, and the codomains must be equal as well.

So if \(f:\mathbb Z\to\mathbb Z\), \(g:\mathbb Z\to \mathbb Z\), \(h:\mathbb Z\to\mathbb N\) are given by \(f(x)=(x+1)^2\), \(g(x)=x^2+2x+1\) and \(h(x)=(x+1)^2\) respectively, then \(f=g\), but \(f\ne h\) since their codomains are different.


Theorem

For \(f:X\to Y\), \(g:Y\to Z\), we have that \(g\circ f\) is a function and \(\forall x\in X,(g\circ f)(x)=g(f(x))\).

Proof

Recall that \(g\circ f:=\{(x,z)|\exists y\in Y,(x,y)\in f\land (y,z)\in g\}\), i.e. \(g\circ f=\{(x,z)|\exists y\in Y, y=f(x)\land z=g(y)\}\). Since \(f\) is a function, this is \(\{(x,z)|\exists!y\in Y,y=f(x)\land z=g(y)\}\). So for any \(x\in X\), first \((x,g(f(x)))\in g\circ f\); and if there are \(z_1\) and \(z_2\) in \(Z\) such that \((x,z_1)\) and \((x,z_2)\) are both in \(g\circ f\), then there is a unique \(y\in Y\) such that \(y=f(x)\), and \(z_1=g(y)=z_2\). This concludes the proof.


Example

  1. Let \(X=\{1,2,3\}\), \(Y=\{4,5,6\}\); \(f=\{(1,4),(2,5),(3,6)\}\) is a function. We more customarily write it as \[f:\{1,2,3\}\to\{4,5,6\}\]\[x\mapsto x+3\] In contrast, \(g=\{(1,4),(1,5),(2,6),(3,6)\}\) is not a function.

  2. \(f=\{(x,y)\in \mathbb R^2|y=x^2\}\) is a function from \(\mathbb R\) to \(\mathbb R\). In contrast \(g_1=\{(x,y)\in \mathbb R^2|x=y^2\}\) is not a function from \(\mathbb R\) to \(\mathbb R\). However, \(g_2=\{(x,y)\in \mathbb R\times \mathbb R_{\ge0}|x=y^2\}\) is a function from \(\mathbb R\) to \(\mathbb R_{\ge0}\), the non-negative real numbers.

  3. Let \(f:\mathbb Z\to\mathbb Z\) and \(g:\mathbb Z\to\mathbb Z\) be given by \(f(x)=x^2+1\) and \(g(x)=2x-1\) respectively. Then \(f\circ g:\mathbb Z\to\mathbb Z\) and \(g\circ f:\mathbb Z\to\mathbb Z\) are both functions, given by \(f\circ g(x)=f(g(x))=g(x)^2+1=(2x-1)^2+1=4x^2-4x+2\) and \(g\circ f(x)=g(f(x))=2f(x)-1=2x^2+1\). So composition is not commutative (see the sketch after this list).
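A quick pointwise check of example 3 (an illustrative sketch; the range of test values is arbitrary):

```python
# Verifying the composite formulas and that f o g differs from g o f.
f = lambda x: x * x + 1
g = lambda x: 2 * x - 1

for x in range(-10, 11):
    assert f(g(x)) == 4 * x * x - 4 * x + 2
    assert g(f(x)) == 2 * x * x + 1
print("at x = 1:", f(g(1)), "vs", g(f(1)))   # 2 vs 3, so f o g != g o f
```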


Generalized Cartesian product

For an indexing set \(I\) such that for any \(i\in I\), \(A_i\) is a set, we can define the Cartesian product of \(\{A_i|i\in I\}\), denoted by \(\prod_{i\in I} A_i\), to be \(\{f:I\to\bigcup_{i\in I}A_i|\forall i\in I,f(i)\in A_i\}\). Think about in which sense this is a generalization of the original notion. (Hint: think about the set \(\{1,2\}\).)

Axiom of choice

In the above notation, if \(I\ne\emptyset\) and \(\forall i\in I,A_i\ne\emptyset\), then \(\prod_{i\in I}A_i\) is not empty.

The axiom of choice is an axiom of vital importance, but it has some consequences weird enough for it to be debatable. Try googling the Banach–Tarski paradox or look at Halmos (2017). Really, it is fun.


Image and inverse image

Let \(f:X\to Y\), \(A\subseteq X\) and \(B\subseteq Y\). By \(f(A)\), the image of \(A\) under \(f\), we mean \(\{y\in Y|\exists x\in A,y=f(x)\}\); by \(f^{-1}(B)\), the preimage of \(B\) under \(f\), we mean \(\{x\in X|\exists y\in B,y=f(x)\}\).

Not important to us, but we have two induced functions:

  1. \(\mathrm{im}:\mathcal P(X)\to\mathcal P(Y)\) given by \(A\mapsto f(A)\);
  2. \(\mathrm{preim}:\mathcal P(Y)\to\mathcal P(X)\) given by \(B\mapsto f^{-1}(B)\)

Theorem

Let \(f:X\to Y\), \(A_1, A_2\subseteq X\) and \(B_1, B_2\subseteq Y\), then

  1. \(\mathrm{ran}f=f(X)\);
  2. \(f(A_1\cup A_2)=f(A_1)\cup f(A_2)\);
  3. \(f(A_1\cap A_2)\subseteq f(A_1)\cap f(A_2)\);
  4. \(A_1\subseteq A_2\implies f(A_1)\subseteq f(A_2)\)
  5. \(f^{-1}(B_1\cup B_2)=f^{-1}(B_1)\cup f^{-1}(B_2)\)
  6. \(f^{-1}(B_1\cap B_2)=f^{-1}(B_1)\cap f^{-1}(B_2)\)
  7. \(B_1\subseteq B_2\implies f^{-1}(B_1)\subseteq f^{-1}(B_2)\).

Proof

  1. By definition;
  2. \[ \begin{aligned} y\in f(A_1\cup A_2) \iff&\exists x\in A_1\cup A_2,y=f(x)\\ \iff&\exists x,x\in A_1\cup A_2\land y=f(x)\\ \iff&\exists x,(x\in A_1\lor x\in A_2)\land y=f(x)\\ \iff&\exists x, (x\in A_1\land y=f(x))\\&\lor(x\in A_2\land y=f(x))\\ ^\dagger\iff&(\exists x, (x\in A_1\land y=f(x)))\\&\lor(\exists x, (x\in A_2\land y=f(x)))\\ \iff&(\exists x\in A_1,y=f(x))\\&\lor(\exists x\in A_2,y=f(x))\\ \iff&y\in f(A_1)\cup f(A_2) \end{aligned}\]

The rest are exercises. Pay special attention to 3 to see why the step corresponding to \(\dagger\) doesn't work there, or maybe carefully think about why \(\dagger\) works in 2. (A concrete counterexample for 3 is sketched below.)
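For item 3, a two-point example already shows the inclusion can be strict; the \(\dagger\) step needs the same \(x\) to witness both conjuncts, which injectivity would guarantee. The following sketch uses a hypothetical non-injective \(f\):

```python
# A witness that f(A1 ∩ A2) can be strictly smaller than f(A1) ∩ f(A2).
f = {1: 0, 2: 0}                       # f(1) = f(2) = 0, not injective
image = lambda A: {f[x] for x in A}

A1, A2 = {1}, {2}
print(image(A1 & A2))                  # set()  -- since A1 ∩ A2 is empty
print(image(A1) & image(A2))           # {0}
```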


Injection, surjection and inverse functions

Let \(f:X\to Y\) be a function

\(f\) is an injection (also one-to-one or a monomorphism) if and only if: \[ \forall x_1\in X,\forall x_2\in X,[(f(x_1)=f(x_2))\implies x_1=x_2]; \] \(f\) is a surjection (also onto or an epimorphism) if and only if \(\mathrm{ran}f=Y\), equivalently \[ \forall y\in Y,\exists x\in X,y=f(x) \] If \(f\) is both injective and surjective, \(f\) is a bijection (or isomorphism).

Remark

If \(f:X\to Y\) is an injection, then \(\bar f:X\to\mathrm{ran}f\) defined by \(x\mapsto f(x)\) is a bijection.


Example

  1. For any set \(X\), \(i_X\) is a bijection.
  2. Let \(X = \{1,2,3,4\}\), \(Y = \{5,6,7\}\). Define \(f:X\to Y\) as \(f:=\{(1,5), (2, 6), (3, 7), (4, 6)\}\). Then \(f\) is a surjection. But define \(g:X\to Y\) as \(g:=\{(1,5), (2, 6), (3, 5), (4, 6)\}\); then \(g\) is not surjective, for example \(7\notin\mathrm{ran}g\)
  3. Let \(f :\mathbb N \to\mathbb N\) be defined by \(f(n) = n^2\). Then \(f\) is an injection. In contrast, let \(g : \mathbb Z \to \mathbb Z\) be defined by \(g(n) = n^2\). Then \(g\) is not an injection.

Inverses

Let \(f:X\to Y\) be a function.

A function \(g:Y\to X\) is said to be a left inverse (or retraction) of \(f\) if and only if \(g\circ f=i_X\);

A function \(h:Y\to X\) is said to be a right inverse (or section) of \(f\) if and only if \(f\circ h=i_Y\).

A function \(j:Y\to X\) is said to be an inverse of \(f\) if and only if \(f\circ j=i_Y\) and \(j\circ f=i_X\). In particular, an inverse is both a left inverse and a right inverse.


Theorem

  1. If a left inverse of \(f\) exists, then \(f\) is injective;
  2. If a right inverse of \(f\) exists, then \(f\) is surjective;
  3. If an inverse exists then \(f\) is bijective;

Proof

  1. Fix \(x_1\), \(x_2\), two elements of \(X\). Suppose \(f(x_1)=f(x_2)\); then \(g\circ f(x_1)=g\circ f(x_2)\), so by \(g\) being a left inverse, \(x_1=x_2\).
  2. For any \(y\in Y\), \(f(h(y))=y\).
  3. 1 and 2.

Assume \(X\) and \(Y\) are nonempty and \(f:X\to Y\) is a function

Theorem

  1. If \(f\) is injective, then a left inverse exists;
  2. If \(f\) is surjective, then a right inverse exists;
  3. If \(f\) is bijective, then an inverse exists;

Proof

  1. Fix \(a\in X\). Define \(g:Y\to X\) by \[ y\mapsto\left\{ \begin{aligned} x &\ \ \ \exists x\in X,y=f(x)^\dagger\\ a &\ \ \ y\notin\mathrm{ran}f\\ \end{aligned}\right. \]

This is well-defined because of injectivity of \(f\): the \(x\) is unique in \(\dagger\).

Then \(g\circ f(x)=x\)

  2. By surjectivity of \(f\), for any \(y\in Y\), \(f^{-1}(\{y\})\) is non-empty, and \(X=\bigcup_{y\in Y}f^{-1}(\{y\})\). By the axiom of choice, there is a function \(h:Y\to X\) such that for any \(y\in Y\), \(h(y)\in f^{-1}(\{y\})\); then \(f(h(y))=y\).

  3. The \(g\) in part 1 is actually also a right inverse (the second case never occurs when \(f\) is surjective).


Theorem

If \(f:X\to Y\) is bijective, then the inverse function is unique. Henceforward we denote it by \(f^{-1}\).

Proof

Suppose \(g_1:Y\to X\) and \(g_2:Y\to X\) are two inverse functions of \(f\). Then \(g_1=g_1\circ i_Y=g_1\circ(f\circ g_2)=(g_1\circ f)\circ g_2=i_X\circ g_2=g_2\).


Exercise

  • Let \(\mathfrak F\subseteq \mathcal P(\mathbb N)\) be the collection of all finite subsets of the natural numbers. Define a relation on \(\mathfrak F\) by \(S_1\approx S_2\iff\exists f:S_1\to S_2, f\textrm{ bijective}\).
  1. Prove \(\approx\) is an equivalence relation.
  2. What is \([\emptyset]_\approx\)?
  3. In general, what is \([\{1,2,...,n\}]_\approx\)?
  • [Hard] Prove that if there is an injective function \(f:X\to Y\) and an injective function \(g:Y\to X\), then there is a bijective function \(h:X\to Y\). If you find this too hard, read and understand the Schröder–Bernstein theorem.

Summary

In this lecture we covered basic naïve set theory and defined the notions of relation and function. Perhaps the two most important points (not just for this course) you should take away are the notion of an equivalence relation and the notion of a bijective function.

Mainly \(\mathbb N\) and \(\mathbb Z\) (Lecture 3)

In this lecture, we'll be talking about the construction of the natural numbers, integers and rational numbers. Their arithmetic will be defined. In the end you basically recover everything you already knew about natural numbers, integers and rational numbers, but now you can say with (more) confidence that you actually know what a natural number, integer and rational number are.

\(\mathbb N\)

We don't have the time to do the full picture properly, but here it comes:

A set \(x\) is inductive if:

  • \(\emptyset\in x\)
  • \(\forall a\in x, a\cup\{a\}\in x\)

\(a\cup\{a\}\) is called the successor of \(a\); sometimes we write it as \(S(a)\).

Another axiom of set theory is the axiom of infinity, asserting the existence of an inductive set.

We have the following theorem

For any two sets \(x\) and \(y\), if \(x\) and \(y\) are both inductive, then so is \(x\cap y\).

\(\emptyset \in x\cap y\) because \(\emptyset\in x\) and \(\emptyset\in y\). Suppose \(a\in x\cap y\); then since \(a\in x\) (\(\in y\), resp.) we have \(a\cup\{a\}\in x\) (\(\in y\), resp.), thus \(a\cup\{a\}\in x\cap y\).

Let \(x\) be the inductive set provided by the axiom of infinity. We define the natural numbers as the smallest inductive subset of \(x\): \[\mathbb N:=\bigcap\{y\in\mathcal P(x)\,|\,y\textrm{ inductive}\}\]

This intimidating definition is actually quite nice: \(\mathbb N\), being inductive, must contain \(\{\emptyset, S(\emptyset), S(S(\emptyset)), ...\}\), and \(\{\emptyset, S(\emptyset), S(S(\emptyset)), ...\}\) is itself inductive, hence \(\mathbb N = \{\emptyset, S(\emptyset), S(S(\emptyset)), ...\}\). Still intimidating? (This is only an aid for clarification.) Maybe instead of \(\emptyset\) we write \(0\), and instead of \(S(\emptyset)\) we write \(1\), etc. Then

  1. \(0 = \emptyset\)
  2. \(1 = S(0) = S(\emptyset)=\emptyset\cup\{\emptyset\}=\{\emptyset\}=\{0\}\)
  3. \(2 = S(1) = 1\cup\{1\}=\{0\}\cup\{1\}=\{0,1\}\)
  4. In general, \(n+1 = \{0,...,n\}\)

So \(n\) is a set with \(n\) elements. In particular \(0\ne S(n)\) for any \(n\). It requires more set theory than I would like to develop to prove that \(S\) is injective, i.e. \(S(a)=S(b)\implies a=b\), but this is true.
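If it helps to see the construction concretely, here is a toy Python model (a sketch only, for experimenting; real set theory does not run on a computer):

```python
# Von Neumann naturals as frozensets: 0 is the empty set and S(a) = a ∪ {a};
# the numeral n is then literally a set with n elements.
def S(a):
    return frozenset(a | {a})

n = frozenset()          # 0 = ∅
for k in range(5):
    assert len(n) == k   # the numeral k has exactly k elements
    n = S(n)
print("checked |n| = n for n = 0,...,4")
```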

Now we can show the principle of induction:

Let \(P\) be some property of natural numbers. If \(P(0)\) and \(\forall m\in\mathbb N,P(m)\implies P(m+1)\), then all natural numbers satisfy \(P\).

Let \(y:=\{n\in\mathbb N|P(n)\}\). Then \(y\) is inductive: \(0\in y\), and for any \(m\in y\), since \(P(m)\) we have \(P(m+1)\), hence \(m+1\in y\). So \(y\) is an inductive subset of \(\mathbb N\), which is the smallest inductive set. Hence \(y=\mathbb N\). In other words, every natural number has the property \(P\).

For every \(n \in\mathbb N\), we can define an "adding \(n\)" function; for convenience we write it as \(n+\):

  1. \(n + 0 := n\)
  2. \(n + S(m) := S(n+m)\)

For example the function "adding 0" does nothing, and we prove this by induction:

  1. \(0+0=0\) by definition
  2. Suppose \(0+m=m\), then \(0+S(m)=S(0+m)=S(m)\)

From this we can prove the commutativity: \(\forall n\in\mathbb N\forall m\in\mathbb N,n+m=m+n\).

  1. From above we proved \(\forall m\in\mathbb N,0+m=m=m+0\)
  2. Suppose \(\forall m\in\mathbb N,n+m=m+n\). Let's now prove \(\forall m\in\mathbb N,S(n)+m=m+S(n)\) by induction on \(m\):
  • \(S(n)+0=S(n)=0+S(n)\)
  • Suppose \(S(n)+m=m+S(n)\); then \(S(n)+S(m)=S(S(n)+m)=S(m+S(n))=S(S(m+n))\). Similarly, using the outer induction hypothesis, \(S(m)+S(n)=S(S(m)+n)=S(n+S(m))=S(S(n+m))=S(S(m+n))\).

Associativity is proved in a similar (but more sophisticated) manner.

The function "adding \(n\)" is injective: if \(n+a=n+b\) then \(a=b\). This is more wildly known as the cancellation property. - "adding \(0\)" is injective, indeed: \(0+a=0+b\) by definition is \(a=b\). - Suppose "adding \(n\)" is injective then \(S(n)+a=S(n)+b\implies a+S(n)=b+S(n)\implies S(a+n)=S(b+n)\implies a+n=b+n\implies n+a=n+b\implies a=b\).

Similarly we can define a function "multiplying \(n\)", denoted by \(n\times\):

  1. \(n\times 0:=0\)
  2. \(n\times S(m) := n\times m+ n\)

So for example \(n\times 1=n\times S(0)=n\times0+n=n\). We can of course prove the commutativity and associativity of multiplication. These are left as exercises (a computational sketch follows).
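As referenced above, here is a computational sketch of the two pairs of defining equations, checked against ordinary arithmetic on small inputs (Python ints stand in for numerals; illustration, not proof):

```python
# Peano addition and multiplication by the defining equations,
# with m - 1 playing the role of "the m with S(m) = our m".
def add(n, m):
    # n + 0 = n;  n + S(m) = S(n + m)
    return n if m == 0 else add(n, m - 1) + 1

def mul(n, m):
    # n × 0 = 0;  n × S(m) = n × m + n
    return 0 if m == 0 else add(mul(n, m - 1), n)

assert all(add(a, b) == add(b, a) == a + b for a in range(8) for b in range(8))
assert all(mul(a, b) == mul(b, a) == a * b for a in range(8) for b in range(8))
print("commutativity of + and × checked on 0..7")
```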

From addition and multiplication we can define "less than" (\(<\)) and "divisibility" (\(|\)), amongst other things. Let \(m\), \(n\) be natural numbers:

  1. \(n < m\iff\exists c\in\mathbb N,m=n+c\)
  2. \(n\le m\iff n<m\lor n=m\)
  3. \(n | m\iff\exists k\in\mathbb N,m=n\times k\).

The next important theorem is the well-ordering property of the natural numbers.

Let \(s\) be a non-empty subset of the natural numbers; then \(s\) has a least element. That is, \(\exists n_0\in s,\forall m\in s,n_0 \le m\).

We prove this by contradiction. Suppose \(s\) is non-empty without a least element. Define \(Z:=\{k\in\mathbb N|\forall n\in\mathbb N,(n<k)\implies(n\notin s)\}\). Then \(0\in Z\). Suppose \(k\in Z\) and let \(m<S(k)\); then either \(m<k\) or \(m=k\). If \(m<k\), by \(k\in Z\) we have \(m\notin s\). If \(m=k\), then \(m\notin s\), for otherwise \(m\) would be the least element of \(s\). Hence as long as \(m< S(k)\), we have \(m\notin s\), so \(S(k)\in Z\). Hence \(Z\) is inductive. Then \(Z=\mathbb N\). Then \(s=\emptyset\), which is a contradiction.

From the well-ordering property, we have the strong induction principle:

Let \(P\) be a property of natural numbers. Suppose \(P(0)\) and, for any natural number \(n\), \(P(0)\land P(1)\land...\land P(n)\implies P(n+1)\). Then all natural numbers have property \(P\).

Let \(s:=\{n\in\mathbb N|\neg P(n)\}\). If \(s\) is non-empty, then \(s\) has a least element \(n_0\). \(n_0\) cannot be \(0\), since \(P(0)\). Then \(P(0)\), \(P(1)\), ..., \(P(n_0-1)\) all hold, so by assumption \(P(n_0)\): a contradiction. Hence \(s=\emptyset\).

In fact, the well-ordering principle \(\iff\) strong induction. Please fill in the other direction.

Example and practice using induction

\[\sum_{i=0}^n i^2=\frac{n(n+1)(2n+1)}{6}\]

  • For \(n=0\), the left hand side is \(0\), and so is the right hand side.
  • Suppose \(\sum_{i=0}^n i^2=\frac{n(n+1)(2n+1)}{6}\) then \(\sum_{i=0}^{n+1} i^2=\sum_{i=0}^n i^2+(n+1)^2=\frac{n(n+1)(2n+1)}{6}+(n+1)^2=\frac{n(n+1)(2n+1)+6(n+1)^2}{6}=\frac{(n+1)(n(2n+1)+6(n+1))}{6}=\frac{(n+1)(n+2)(2n+3)}{6}\)

\[\sum_{i=0}^n (2i+1)=(n+1)^2\]

  • For \(n=0\), the left hand side is \(1\), which is indeed \((0+1)^2\).
  • Suppose \(\sum_{i=0}^n (2i+1)=(n+1)^2\) then \(\sum_{i=0}^{n+1} (2i+1)=(n+1)^2+(2n+3)=n^2+2n+1+2n+3=n^2+4n+4=(n+2)^2\)

Let \(0!=1\) and \((n+1)!=n!\times(n+1)\). Let \({n\choose k}=\frac{n!}{(n-k)!k!}\) for \(k\le n\), and \({n\choose k}=0\) for \(k>n\).


\[{n\choose k+1}+{n\choose k}={n+1\choose k+1}\]

  • For \(k>n\) both lhs, rhs are 0.
  • For \(k=n\), the lhs is \(0+1=1\) and the rhs is \(1\).
  • For \(k<n\) then
\[\begin{equation}\nonumber \begin{aligned} {n+1\choose k+1}&=\frac{(n+1)!}{(k+1)!(n-k)!}\\ {n\choose k+1}+{n\choose k}&=\frac{n!}{(n-k-1)!(k+1)!}+\frac{n!}{k!(n-k)!}\\ &=\frac{n!(n-k)+n!(k+1)}{(k+1)!(n-k)!}\\ &=\frac{n!(n+1)}{(k+1)!(n-k)!}=\frac{(n+1)!}{(k+1)!(n-k)!} \end{aligned} \end{equation}\]
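A quick numerical sanity check of Pascal's rule against the factorial formula (an illustrative sketch; Python's built-in math.comb would do the same job):

```python
# Checking choose(n, k+1) + choose(n, k) == choose(n+1, k+1) on a grid,
# including the k >= n edge cases where the convention gives 0.
from math import factorial

def choose(n, k):
    return factorial(n) // (factorial(n - k) * factorial(k)) if 0 <= k <= n else 0

assert all(choose(n, k + 1) + choose(n, k) == choose(n + 1, k + 1)
           for n in range(20) for k in range(25))
print("Pascal's rule holds for all n < 20")
```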

\[(a+b)^n=\sum_{i=0}^n{n\choose i}a^ib^{n-i}\]

  • \(n=0\), lhs is \(1\); rhs is \({0\choose 0}a^0b^0=1\).
  • Suppose \((a+b)^n=\sum_{k=0}^n{n\choose k}a^kb^{n-k}\), then
\[\begin{equation}\nonumber \begin{aligned} (a+b)^{n+1}&=\left (\sum_{k=0}^n {n\choose k}a^k b^{n-k}\right)\times(a+b)\\ &=\sum_{k=0}^n{n\choose k}a^{k+1}b^{n-k} + \sum_{k=0}^n{n\choose k}a^kb^{n-k+1} \\ &=a^{n+1}+\sum_{k=0}^{n-1}{n\choose k}a^{k+1} b^{n-k}+b^{n+1}+\sum_{k=1}^n{n\choose k}a^k b^{n-k+1}\\ \end{aligned} \end{equation}\]
\[\begin{equation} \nonumber \begin{aligned} (a+b)^{n+1}&=a^{n+1}+\sum_{k=0}^{n-1}{n\choose k}a^{k+1} b^{n-k}+b^{n+1}+\sum_{k=0}^{n-1}{n\choose k+1}a^{k+1} b^{n-k}\\ &=a^{n+1}+b^{n+1}+\sum_{k=0}^{n-1}{n+1\choose k+1}a^{k+1}b^{n-k}\\ &=a^{n+1}+b^{n+1}+\sum_{k=1}^{n}{n+1\choose k}a^{k}b^{n+1-k}\\ &=\sum_{k=0}^{n+1}{n+1\choose k}a^{k}b^{n+1-k} \end{aligned} \end{equation}\]

Define the Fibonacci numbers by \(F_0 = 0\), \(F_1=1\) and \(F_n = F_{n-1}+F_{n-2}\). Write \(\phi = \frac{1+\sqrt 5}{2}\).

\[F_n=\frac{1}{\sqrt 5}\left(\left(\frac{1+\sqrt 5}{2}\right)^n-\left(\frac{1-\sqrt 5}{2}\right)^n\right)\]

In this example, we need to use strong induction.

  • \(n=0\), check
  • \(n=1\), rhs = \(\frac{1}{\sqrt 5}\left(\frac{1+\sqrt 5-1+\sqrt 5}{2}\right)=1\)

  • Suppose the formula holds for \(0\) up to \(n\). Note \(-\frac{1}{\phi}=\frac{1-\sqrt 5}{2}\). Then \(F_n=\frac{1}{\sqrt 5}\left(\phi^n-(-\frac{1}{\phi})^n\right)\) and \(F_{n-1}=\frac{1}{\sqrt 5}\left(\phi^{n-1}-(-\frac{1}{\phi})^{n-1}\right)\). Note \(\phi ^ {n+1}-\phi^n-\phi^{n-1}=\phi^{n-1}\times(\phi^2-\phi-1)=\phi^{n-1}\times(\frac{3+\sqrt 5}{2}-\frac{1+\sqrt 5}{2}-1)=0\). Similarly \(\left(-\frac{1}{\phi}\right)^{n+1}-\left(-\frac{1}{\phi}\right)^n-\left(-\frac{1}{\phi}\right)^{n-1}=\left(-\frac{1}{\phi}\right)^{n-1}\times\left(\left(-\frac{1}{\phi}\right)^2-\left(-\frac{1}{\phi}\right)-1\right)=\left(-\frac{1}{\phi}\right)^{n-1}\times\left(\frac{2}{3+\sqrt 5}+\frac{2}{\sqrt 5+1}-1\right)=\left(-\frac{1}{\phi}\right)^{n-1}\times\left(\frac{3-\sqrt5}{2}+\frac{\sqrt 5-1}{2}-1\right)=0\). Hence \(F_{n+1}=F_n+F_{n-1}=\frac{1}{\sqrt 5}\left(\left(\frac{1+\sqrt 5}{2}\right)^{n+1}-\left(\frac{1-\sqrt 5}{2}\right)^{n+1}\right)\). (A numerical spot check follows.)
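A floating-point spot check of the closed form for small \(n\) (evidence only; the strong induction above is the proof, and floats stay exact enough here since rounding errors are well below 0.5):

```python
# Comparing Binet's formula against the recurrence for n = 0..29.
from math import sqrt

def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

phi = (1 + sqrt(5)) / 2
psi = (1 - sqrt(5)) / 2          # this is -1/phi
for n in range(30):
    assert fib(n) == round((phi**n - psi**n) / sqrt(5))
print("Binet's formula matches F_0,...,F_29")
```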

\(\mathbb Z\)

Definition

This is the first decent application of equivalence relations.

Let us define a relation \(\sim\) on \(\mathbb N\times\mathbb N\) to be \((a,b)\sim(x,y)\iff a+y=b+x\).

\(\sim\) is an equivalence relation.

  • reflexivity: for all \((a,b)\in\mathbb N^2\), since \(a+b=b+a\), we must have \((a,b)\sim(a,b)\).
  • symmetry: \((a,b)\sim(x,y)\iff a+y=b+x\iff x+b=y+a\iff(x,y)\sim(a,b)\).
  • transitivity: \((a,b)\sim(x,y)\land(x,y)\sim(\phi,\psi)\) then \(a+y=b+x\land x+\psi=y+\phi\) then \(a+y+\phi=b+x+\phi\) then \(a+x+\psi=b+x+\phi\) then \(a+\psi = b+\phi\).

\(\mathbb Z\) is the set of equivalence classes under \(\sim\). In maths jibber jabber, this is \(\mathbb N^2\) modulo \(\sim\), written as \(\frac{\mathbb{N}^2}{\sim}\).


Let us look at some examples of elements in \(\mathbb Z\)

  • \([(3,0)]_\sim\) is also \([(4,1)]_\sim\) or \([(5,2)]_\sim\) etc.
  • \([(0,3)]_\sim\) is also \([(1,4)]_\sim\) or \([(2,5)]_\sim\) etc.
  • \([(0,0)]_\sim\) is also \([(1,1)]_\sim\) or \([(2,2)]_\sim\) etc.

Every element of \(\mathbb Z\) can be written as \([(a,0)]_\sim\) or \([(0,a)]_\sim\). Such an \(a\) is unique.

Say we are talking about \([(x,y)]_\sim\). If \(x<y\), then by definition there is some \(z\) such that \(y=x+z\), and then \([(x,y)]_\sim=[(0,z)]_\sim\). If \(x=y\), then \([(x,y)]_\sim=[(0,0)]_\sim\). If \(y<x\), then for some \(z\) we have \(y+z=x\), and \([(z,0)]_\sim\) will do.

\([(a,0)]_\sim=[(b,0)]_\sim\implies(a,0)\sim(b,0)\implies a=b\). The other case is basically the same.

So now the definition may be clearer: \([(a,0)]_\sim\) are the "positive integers" and \([(0, a)]_\sim\) are the "negative integers".


Arithmetic on \(\mathbb Z\)

Addition and subtraction

Let \([(a,b)]\) and \([(x,y)]\) be integers

We define \([(a,b)]+[(x,y)]=[(a+x, b+y)]\). We need to check whether this is well-defined: Suppose \((a,b)\sim(a',b')\) and \((x,y)\sim(x',y')\). Then \((a+x,b+y)\sim(a'+x',b'+y')\) because \((a+x)+(b'+y')=(a+b')+(x+y')=(b+a')+(y+x')=(b+y)+(a'+x')\)

Some examples:

  • \([(1,0)]+[(0,1)]=[(1,1)]=[(0,0)]\). (Informally \(1+(-1)=0\))
  • \([(0,1)]+[(0,2)]=[(0,3)]\) (Informally \((-1)+(-2)=-3\))
  • \([(4,5)]+[(12,6)]=[(16,11)]=[(5,0)]\). (Informally \((-1)+6=5\))

We can define \([(a,b)]-[(x,y)]=[(a,b)]+[(y,x)]\). (Informally, \(z_1-z_2=z_1+(-z_2)\).) Thankfully, since we checked addition is well defined, we don't have to check again.

We can of course check commutativity, associativity of addition on \(\mathbb Z\), and so on.


Multiplication

We define \([(a,b)]\times[(x,y)]=[(ax+by,ay+bx)]\). We also need to check whether this is well-defined. Suppose \((a,b)\sim(a',b')\) and \((x,y)\sim(x',y')\); then \(x(a+b')+y(a'+b)+a'(x+y')+b'(x'+y)=x(a'+b)+y(a+b')+a'(x'+y)+b'(x+y')\). Expanding out and cancelling what can be cancelled, we get \(ax+by+a'y'+b'x'=bx+ay+a'x'+b'y'\). Rearranging, we get what we want.

Some examples (verified in the sketch after this list):

  • \([(1,1)]\times[(1,0)]=[(1,1)]=[(0,0)]\). (Informally \(0\times1=0\))
  • \([(0,1)]\times[(0,1)]=[(1,0)]\). (Informally \((-1)\times(-1)=1\))
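Here is a minimal Python sketch of the construction, using the canonical representatives \([(a,0)]\) and \([(0,a)]\) from the earlier theorem (normalization borrows ordinary subtraction purely for brevity; an illustration, not the official definition):

```python
# Integers as pairs of naturals, reduced to the representative (a,0) or (0,a).
def normalize(p):
    a, b = p
    return (a - b, 0) if a >= b else (0, b - a)

def add(p, q):
    return normalize((p[0] + q[0], p[1] + q[1]))

def mul(p, q):
    (a, b), (x, y) = p, q
    return normalize((a * x + b * y, a * y + b * x))

assert add((1, 0), (0, 1)) == (0, 0)       # 1 + (-1) = 0
assert add((4, 5), (12, 6)) == (5, 0)      # (-1) + 6 = 5
assert mul((0, 1), (0, 1)) == (1, 0)       # (-1) × (-1) = 1
print("the examples from the text check out")
```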

Order

Let \(x\) be an element of \(\mathbb Z\). We say \(x\ge 0\) (\(x\) is non-negative) if \(x=[(k,0)]\) for some \(k\in\mathbb N\). We say \(x\le 0\) (\(x\) is non-positive) if \(x=[(0,k)]\) for some \(k\in\mathbb N\). By the previous theorem, this is always possible.

Thus we can define an order on \(\mathbb Z\): for any two integers \(x\) and \(y\), \(x\le y\iff 0\le y-x\).

Let \(x\) and \(y\) be two non-positive integers; then \(x\times y\) is non-negative.

So for some \(k_1,k_2\in\mathbb N\), \(x=[(0,k_1)]\) and \(y=[(0,k_2)]\). Then \(x\times y\) by definition is \([(k_1\times k_2,0)]\), so non-negative.

So you may have noticed that \([(k,0)]\) is just \(k\) and \([(0,k)]\) is just \(-k\), like what we learnt in primary school.


Exercise

  1. Let \(x,y,z\) be integers, prove \(x(y+z)=xy+xz\).
  2. Let \(a,b\) be integers, prove \(a-b+b=a\)
  3. Google what an integral domain is and prove that \(\mathbb Z\) is an integral domain. If too boring, assume \(\mathbb Z\) is a ring, so just prove \(xy=0\implies x=0\lor y=0\).

Field

A field is a set \(F\) with addition and multiplication on \(F\) such that both operations are commutative and associative, both have identities (\(0\) and \(1\), with \(0\ne1\)), every element has an additive inverse, every nonzero element has a multiplicative inverse, and multiplication distributes over addition.

So \(\mathbb N\) is not a field, for \(1\) doesn't have an additive inverse "\(-1\in\mathbb N\)"; \(\mathbb Z\) is not a field, for \(2\) doesn't have a multiplicative inverse \(\frac{1}{2}\in\mathbb Z\). Let's fix this by constructing \(\mathbb Q\).


\(\mathbb Q\)

The construction of the rationals from the integers is awfully similar to the construction of the integers. We give the definitions but leave everything else as exercises.

We define a relation on \(\mathbb Z\times(\mathbb Z-\{0\})\) to be \((a,b)\approx(c,d)\iff ad=bc\).

\(\approx\) is an equivalence relation.

\(\mathbb Q:=\frac{\mathbb Z\times(\mathbb Z-\{0\})}{\approx}\) is the set of equivalence classes.

Let \([(a,b)]_\approx\) and \([(c,d)]_\approx\) be rational numbers.

  • addition is defined as \([(a,b)]_\approx+[(c,d)]_\approx=[(ad+bc,bd)]_\approx\);
  • \(-[(a,b)]_\approx=[(-a,b)]_\approx\). \([(a,b)]_\approx-[(c,d)]_\approx=[(a,b)]_\approx+(-[(c,d)]_\approx)\)
  • multiplication is defined as \([(a,b)]_\approx\times[(c,d)]_\approx=[(ac,bd)]_\approx\);

Addition and multiplication are well defined (exercise). Hence subtraction is well defined.

  • \([(a,b)]_\approx\ge0\) if and only if \(a\times b\ge0\) as an inequality of integers.
  • \([(a,b)]_\approx\ge[(c,d)]_\approx\) if and only if \([(a,b)]_\approx-[(c,d)]_\approx\ge0\)

In the end instead of \([(a,b)]_\approx\), we write \(\frac{a}{b}\) like we did in primary school.

Big Exercise

Prove that \(\mathbb Q\) is indeed a field.

Summary

So the natural numbers form the smallest inductive set. So the integers are \(\frac{\mathbb N^2}{\sim}\). So the rationals are \(\frac{\mathbb Z\times(\mathbb Z-\{0\})}{\approx}\). So we learnt so much, yet in a sense nothing new.

\(\mathbb R\) and the completeness axiom (Lecture 4)

In this lecture, we are going to construct the real numbers and discuss the completeness axiom.

So first of all, why isn't \(\mathbb Q\) enough; after all, \(\mathbb Q\) is a field already? In some sense \(\mathbb Q\) is big enough already: for any two distinct elements \(p, q \in \mathbb Q\), there is a third rational number in between, namely \(\frac{p+q}{2}\). But in some other sense \(\mathbb Q\) is not very big; indeed \(\mathbb Q\) is countable (as will be explained below), loosely meaning \(\mathbb Q\) is only as big as \(\mathbb N\). But the main problem is that \(\mathbb Q\) has a lot of "holes". Let me explain. \(\{x\in\mathbb Q|x^2<2\}\) and \(\{x\in\mathbb Q|x^2 > 2\}\) cover \(\mathbb Q\) (meaning their union is the whole of \(\mathbb Q\)). However, this cover is missing the "\(x\)" such that \(x^2=2\). We will fix this in this lecture.

Countability

A set \(X\) is said to be countable if and only if there is a bijection from \(X\) to some subset of \(\mathbb N\). So all finite sets are countable. \(\mathbb Z\) is countable. Indeed, define a function \(f : \mathbb Z\to\mathbb N\) as follows:

\[ \left\{\begin{aligned}0&\mapsto0\\n&\mapsto 2n\quad(n>0)\\-n&\mapsto 2n-1\quad(n>0)\end{aligned}\right. \] This is a bijection.
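Here is a finite-window check that this map is injective and fills an initial segment of \(\mathbb N\) (evidence on a window, not a proof):

```python
# The pairing of Z with N given above, checked on -50..50.
def f(z):
    if z == 0:
        return 0
    return 2 * z if z > 0 else -2 * z - 1   # -n maps to 2n - 1 for n > 0

values = [f(z) for z in range(-50, 51)]
assert sorted(values) == list(range(101))   # hits 0..100, each exactly once
print("f is injective and covers an initial segment of N")
```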


\(\mathbb Q\) is countable as well; we use Cantor's diagonal enumeration (the zigzag argument). This is just to illustrate: to prove it formally, we need something like "a Cartesian product of countable sets is countable" and so on, perhaps even Schröder–Bernstein.

\[ \frac{1}{1}, \frac{2}{1}, \frac{3}{1}, \frac{4}{1},...\] \[\frac{1}{2}, \frac{3}{2}, \frac{5}{2}, \frac{7}{2},...\] \[\frac{1}{3}, \frac{2}{3}, \frac{4}{3}, \frac{5}{3},...\]

Definition of \(\mathbb R\)

In this section we are following Zorich, Mathematical Analysis.

Any set \(\mathbb R\) is called a set of real numbers if the following axioms hold:

Axioms for addition

An operation \(+:\mathbb R \times \mathbb R \to \mathbb R\) should satisfy:

  1. There is an additive identity called \(0\) such that for every \(x\in\mathbb R\), \(x+0=0+x=x\).
  2. For every \(x\in\mathbb R\),there exists \(-x\in\mathbb R\) such that \(x+(-x)=(-x)+x=0\).
  3. \(+\) is associative: for every \(a,b,c\in\mathbb R\), \(a+(b+c)=(a+b)+c\).
  4. \(+\) is commutative: for every \(a,b\in\mathbb R\), \(a+b=b+a\).

Axioms for multiplication

An operation \(\cdot:\mathbb R\times\mathbb R\to\mathbb R\) should satisfy:

  1. there is a multiplicative identity \(1\in\mathbb R-\{0\}\) such that for all \(x\in\mathbb R\), \(x1=1x=x\)
  2. For every \(x\in\mathbb R-\{0\}\), there exists an element \(x^{-1}\in\mathbb R\) such that \(xx^{-1}=x^{-1}x=1\).
  3. multiplication is associative: for any \(a,b,c\in\mathbb R\), \(a(bc)=(ab)c\).
  4. multiplication is commutative: for any \(a,b\in\mathbb R\), \(ab=ba\).

Connection between addition and multiplication

Multiplication should distribute with respect to addition: for every \(a,b,c\in\mathbb R\),\((a+b)c=ac+bc\).

Order axioms

There is \(\le\) on \(\mathbb R\) such that:

  1. \(\forall x\in\mathbb R(x\le x)\)
  2. \(\forall x,y\in\mathbb R,(x\le y)\land (y\le x)\implies(x=y)\)
  3. \(\forall x,y,z\in\mathbb R,(x\le y)\land(y\le z)\implies(x\le z)\)
  4. \(\forall x,y\in\mathbb R, (x\le y)\lor (y\le x)\)

Connection between order and addition

For any \(x,y,z\in\mathbb R\), \((x\le y)\implies(x+z\le y+z)\).

Connection between order and multiplication

For any \(x,y\in\mathbb R\), \((0\le x)\land (0\le y)\implies(0\le xy)\).

Axiom of completeness

A set \(X\subseteq\mathbb R\) is said to be bounded above if and only if \(\exists K\in\mathbb R,\forall x\in X,x\le K\). Such a \(K\) is called an upper bound. A supremum of \(X\) is an upper bound \(a\) such that \(\forall \epsilon>0\exists b\in X,b>a-\epsilon\). The axiom: let \(\emptyset\ne X\subseteq\mathbb R\) be bounded above; then \(X\) attains a supremum in \(\mathbb R\).

The dual notions of bounded above and supremum are bounded below and infimum: a set \(X\subseteq\mathbb R\) is said to be bounded below if and only if \(\exists K\in\mathbb R,\forall x\in X,x\ge K\). Such a \(K\) is called a lower bound. An infimum of \(X\) is a lower bound \(a\) such that \(\forall \epsilon>0\exists b\in X,b<a+\epsilon\).

\(\mathbb N\) lives in \(\mathbb R\) as \(\{0,1,1+1,1+1+1,...\}\). \(\mathbb Z\) lives in \(\mathbb R\) as \(\mathbb N\) together with all the additive inverses of elements of \(\mathbb N\). \(\mathbb Q\) lives in \(\mathbb R\) as integer multiples of the multiplicative inverses of nonzero integers.


Consequences of axiom of completeness

Let \(\emptyset\ne X\subseteq\mathbb R\) be bounded below. Then \(X\) attains an infimum in \(\mathbb R\).

Let \(Y\) be the set \(\{-x|x\in X\}\). \(Y\) is bounded above: since \(\exists K\in\mathbb R,\forall x\in X, x\ge K\), we have \(-x\le -K\) for all \(x\in X\). Hence \(-K\) is an upper bound of \(Y\). By the axiom of completeness, \(Y\) has a supremum \(a\). Then \(-a\) is the infimum of \(X\): \(a\) is an upper bound of \(Y\), so \(-a\) is a lower bound of \(X\). Fix \(\epsilon>0\); then \(\exists y\in Y,y>a-\epsilon\). Then \(-y\in X\) and \(-y<-a+\epsilon\).


[Archimedean Principle] \[\forall x\in\mathbb R,\exists n\in\mathbb Z,n\ge x\]

Otherwise, for some \(x\), \(\forall n\in\mathbb Z,n<x\). So \(\mathbb Z\) is bounded above. Then \(\sup \mathbb Z\) exists. Then \(\sup \mathbb Z-1\) is not an upper bound for \(\mathbb Z\). So for some integer \(m\in \mathbb Z\), \(m>\sup\mathbb Z-1\). Then \(m+1\), being an integer, is \(>\sup\mathbb Z\). Contradiction.

So \(\mathbb Z\) (and hence \(\mathbb N\)) is not bounded above. So \(\forall x>0,\exists n\in\mathbb N,\frac{1}{n}<x\): because \(\exists n\in\mathbb N,n>\frac{1}{x}\), then rearrange.


[well-ordering of \(\mathbb Z\)] Any non-empty, bounded-below subset \(S\) of \(\mathbb Z\) has a minimum (i.e. the infimum actually lies in \(S\)).

So \(S\) has an infimum \(s\). Then \(s+1\) is not a lower bound, hence for some \(m\in S\), \(m<s+1\). Then \(m-1<s\le m\). Actually \(m\) is the minimum of \(S\), because otherwise there is some \(n\in S\) with \(n<m\). Then \(m>n\ge s>m-1\). So \(n>m-1\), which for integers is the same as \(n\ge m\). So \(m>n\ge m\), i.e. \(m>m\). Contradiction.

Similarly, any nonempty, bounded-above subset of \(\mathbb Z\) has a maximum.


\[\forall a\in\mathbb R,\forall b\in \mathbb R [(b>a)\implies(\exists r\in\mathbb Q, a<r<b)]\]

By the Archimedean principle, \(\exists n\in\mathbb N,\frac{1}{n}<(b-a)\). So \(na < nb-1\). Let \(S:=\{m\in\mathbb Z|m<nb\}\). \(S\) is bounded above and nonempty, so \(S\) has a maximum \(m\), and \(m<nb\le m+1\). The first inequality is from \(m\in S\); the second holds because otherwise \(m+1<nb\), so \(m+1\in S\), contradicting maximality. So \(na<nb-1\le (m+1)-1=m<nb\). So \(na<m<nb\); dividing by \(n\), \(a<\frac{m}{n}<b\).


\[\exists! x>0,x^2=2\]

Existence: take \(x\) to be \(\sup\{y|y^2\le2\}\), which is nonempty (\(1\) is in it) and bounded above by \(2\). We now prove \(x^2\) is indeed \(2\). Otherwise \(x^2<2\) or \(x^2 > 2\).

  • If \(x^2>2\): let \(\epsilon = \frac{x^2-2}{2x}\). Then \(\epsilon>0\) and \((x-\epsilon)^2=x^2-2x\epsilon+\epsilon^2>x^2-2x\epsilon=x^2-2x\frac{x^2-2}{2x}=2\). Then \(x-\epsilon\) is also an upper bound, but \(x\) is the smallest. Contradiction.

  • If \(x^2<2\): let \(\epsilon =\frac{2-x^2}{2x+1}\). Then \(0<\epsilon<1\) (\(<1\) because \((x+1)^2>2\); otherwise \(x+1\in\{y|y^2\le 2\}\), so \(x+1\le x\)). So \((x+\epsilon)^2=x^2+2x\epsilon+\epsilon^2<x^2+2x\epsilon+\epsilon=x^2+\epsilon(2x+1)=x^2+\frac{2-x^2}{2x+1}(2x+1)=2\). Then \(x+\epsilon\in\{y|y^2\le2\}\) but \(x+\epsilon>x=\sup\{y|y^2\le2\}\). Contradiction.

Uniqueness is because the supremum is unique: let \(S\) be any set and suppose \(a,b\) are both suprema of \(S\); then \(a\le b\) and \(b\le a\). (Alternatively: if \(x_1,x_2>0\) and \(x_1^2=x_2^2=2\), then \(0=x_1^2-x_2^2=(x_1-x_2)(x_1+x_2)\) and \(x_1+x_2>0\), so \(x_1=x_2\).)


Exercises

Note that we are using a lot of properties of addition, multiplication and order on \(\mathbb R\) that we haven't proved but which can be proven easily from the axioms. Try to spot them and prove them. For example, we used somewhere that if \(x,y>0\) then \(\frac{1}{x}<y\iff\frac{1}{y}<x\), etc.


Just from the list of axioms, we can say nothing about the uniqueness or existence of the real numbers; there might be many sets of real numbers, or none at all. In this section we sketch a proof that the real numbers, if they exist, are unique up to order-preserving isomorphism. In the next section we will actually construct a set of real numbers.

Uniqueness of \(\mathbb R\)

The above axioms define what we call a complete ordered field. We are going to prove that any two complete ordered fields are isomorphic.

Let \((k_1,+_1,0_1,\times_1,1_1,\le_1),(k_2,+_2,0_2,\times_2,1_2,\le_2)\) be two complete ordered fields. An order-preserving isomorphism \(\phi:k_1\to k_2\) is a bijection such that \(\phi(x+_1y)=\phi(x)+_2\phi(y)\), \(\phi(x\times_1y)=\phi(x)\times_2\phi(y)\), and \(x\le_1y\implies\phi(x)\le_2\phi(y)\).

Let \(\mathbb N_i=\{0_i,1_i,1_i+_i1_i,1_i+_i1_i+_i1_i,\cdots\}\) be the natural numbers inside \(k_i\). Let \(\mathbb Z_i\) be the corresponding integers inside \(k_i\) and \(\mathbb Q_i\) the rationals (\(\{a\times_i b^{-1}|a\in\mathbb Z_i,b\in\mathbb Z_i-\{0\}\}\)).

To prove the existence of such order-preserving isomorphism, we deploy the following strategy:

  1. We construct an isomorphism from \(\mathbb N_1\) to \(\mathbb N_2\);
  2. then extend it to an isomorphism from \(\mathbb Z_1\) to \(\mathbb Z_2\);
  3. then extend it to an isomorphism from \(\mathbb Q_1\) to \(\mathbb Q_2\);
  4. then finally extend it to \(k_1\to k_2\).

So let us define \(\phi:\mathbb N_1\to\mathbb N_2\) by \(\phi(0_1)=0_2\) and \(\phi(1_1)=1_2\). Then everything else is forced: \(\phi(2_1)=\phi(1_1+_11_1)=\phi(1_1)+_2\phi(1_1)=1_2+_21_2=2_2\). By induction we can see that in general \(\phi(n_1)=n_2\). (\(n_i\) is \(1_i+_i1_i+_i\cdots+_i1_i\), \(n\) times.)

Then, since \((-a)=(-1)\times a\) for all \(a\in k_i\), if we require \(\phi((-1)_1)=(-1)_2\), everything is defined from \(\mathbb Z_1\to\mathbb Z_2\). Similarly we can extend it again by \(\phi(a\times_1 b^{-1})=\phi(a)\times_2\phi(b)^{-1}\). Check that everything defined so far is order-preserving and bijective. For example, assume \(n_1,m_1,a_1,b_1\) are positive integers in \(\mathbb Z_1\); if \(n_1\times_1m_1^{-1}\le_1 a_1\times_1b_1^{-1}\), by the order axioms we can rearrange to \(n_1\times_1 b_1\le_1 a_1\times_1 m_1\). Then apply \(\phi\): \(n_2\times_2 b_2\le_2a_2\times_2m_2\). So everything works if you want to check it.

So far everything is easy, but extending to the whole of \(k_1\to k_2\) is slightly trickier.

For an arbitrary \(r_1\in k_1\), the set \(S(r_1):=\phi(\{q\in\mathbb Q_1|q\le r_1\})\) is non-empty. We can easily prove this using the theorem above: \(\exists q\in\mathbb Q_1,r_1-1<q<r_1\), and then \(\phi(q)\in S(r_1)\); also \(\exists q\in\mathbb Q_1,q>r_1\) for a similar reason, and then \(\phi(q)\) is an upper bound for \(S(r_1)\). So we can define \(\phi(r_1):=\sup S(r_1)\). We can check order preservation: if \(r_1\le_1 t_1\) then \(\{q\in\mathbb Q_1|q\le r_1\}\subseteq\{q\in\mathbb Q_1|q\le t_1\}\), so \(S(r_1)\subseteq S(t_1)\). Then \[S(r_1)\subseteq S(t_1)\implies \sup S(r_1)\le_2 \sup S(t_1)\implies\phi(r_1)\le_2\phi(t_1)\]

The first implication is general and requires a little bit of thought: for any two sets \(S\subseteq T\), if their suprema exist then \(\sup S\le \sup T\). Clearly upper bounds for \(T\) are upper bounds for \(S\) as well. For the purpose of contradiction, say \(\sup S > \sup T\); then \(\exists s\in S,s>\sup T\). But this \(s\in T\). Contradiction.

\(\phi\) being injective is relatively straightforward: if \(r\ne s\), then either \(r<_1 s\) or \(s<_1 r\); assume without loss of generality \(r<_1 s\). Then \(r<_1 s-\epsilon\) for some small enough \(\epsilon>0\) (again by the Archimedean principle). Then \(\phi(s-\epsilon)\) is still an upper bound for \(S(r)\) but not for \(S(s)\). We also need to prove surjectivity: for any \(r_2\in k_2\), there is \(r_1\in k_1\) such that \(\phi(r_1)=r_2\). Define \(r_1:=\sup \phi^{-1}(\{t_2\in k_2|t_2\le r_2\})\). We first need to show \(r_1\) exists, i.e. that \(\phi^{-1}(\{t_2\in k_2|t_2\le r_2\})\) is nonempty and bounded above. Here is a sketch: there are rationals \(\alpha_2,\beta_2\in\mathbb Q_2\) such that \(\alpha_2<r_2\) and \(\beta_2>r_2\); \(\phi:\mathbb Q_1\to\mathbb Q_2\) is bijective and order-preserving, so \(\phi^{-1}(\alpha_2)\in\phi^{-1}(\{t_2\in k_2|t_2\le r_2\})\) and \(\phi^{-1}(\beta_2)\) is an upper bound.

Exercise

  1. We also need to check \(\phi(r_1)=r_2\).
  2. For any two sets \(A,B\), write \(A+B:=\{a+b|a\in A,b\in B\}\). Then \(\sup(A+B)=\sup A+\sup B\) (when these exist). Use this to prove \(\phi(s+_1 t)=\phi(s)+_2\phi(t)\).
  3. Similarly prove \(\phi(s\times_1t)=\phi(s)\times_2\phi(t)\).

This finishes the sketch that any two sets of real numbers are isomorphic.

Existence of \(\mathbb R\)

Let \(A,B\) be a partition of \(\mathbb Q\) (i.e. \(A\cup B=\mathbb Q\) and \(A\cap B=\emptyset\)). \((A, B)\) is a Dedekind cut if the following hold: \(A\) and \(B\) are both nonempty; every element of \(A\) is less than every element of \(B\); and \(A\) has no greatest element.

Then we define the real numbers to be the set of all Dedekind cuts. Let \((A_1,B_1)\) and \((A_2,B_2)\) be two Dedekind cuts; we define

  1. \((A_1,B_1)+(A_2,B_2):=(A_1+A_2,B_1+B_2)\)
  2. \((A_1,B_1)\times(A_2,B_2):=(C,\mathbb Q-C)\), where \(C:=\{a_1a_2|a_1\in A_1,a_2\in A_2,\textrm{at least one of } a_1, a_2\textrm{ is nonnegative}\}\).
  3. \((A_1,B_1)\le (A_2,B_2)\) if and only if \(A_1\subseteq A_2\).

The exercise is to prove the axioms. Quite painful; if you are not masochistic, ignore it. The important bit: if \(S=\{(A_i,B_i)|i\in I\}\) is a set of Dedekind cuts, nonempty and bounded above, then one can prove \(\sup S=\left(\bigcup_{i\in I}A_i,\mathbb Q-\bigcup_{i\in I}A_i\right)\).

Example

  1. Let \(r\) be in \(\mathbb Q\), \((\{t\in\mathbb Q|t<r\},\{t\in\mathbb Q|t\ge r\})\) represents \(r\) in \(\mathbb R\).
  2. \((\{x\in\mathbb Q|x\le 0\lor x^2<2\},\{x\in\mathbb Q|x>0\land x^2>2\})\) represents \(\sqrt2\in\mathbb R\) (note the lower set must contain all negative rationals); a small numerical sketch follows this list.
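To make the \(\sqrt2\) cut concrete, here is a minimal, hypothetical Python sketch (the names `in_lower_set` and `approximate_sup` are mine): a cut is represented by the membership predicate of its lower set \(A\), and we home in on \(\sup A\) by bisection with exact rationals.

```python
from fractions import Fraction

def in_lower_set(q: Fraction) -> bool:
    # Lower set A of the cut for sqrt(2): A = {q in Q : q <= 0 or q^2 < 2}
    return q <= 0 or q * q < 2

def approximate_sup(lo: Fraction, hi: Fraction, steps: int = 60) -> Fraction:
    # Invariant: lo is in A, hi is in B; bisect towards the boundary sup A.
    assert in_lower_set(lo) and not in_lower_set(hi)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if in_lower_set(mid):
            lo = mid
        else:
            hi = mid
    return lo

print(float(approximate_sup(Fraction(1), Fraction(2))))  # ~1.41421356..., i.e. sqrt(2)
```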

See Rudin and others (1964) for more details on Dedekind cuts.

\(\mathbb R\) is uncountable

There is a theorem (see Zorich Vladimir (2016) page 63) saying that any real number \(r\) can be uniquely represented by an \(f:\mathbb N\to\{0,1,2,3,4,5,6,7,8,9\}\) (together with an integer part and a sign; uniqueness requires forbidding expansions ending in all \(9\)s). For example \(\sqrt2\) is represented by \(f(0)=1,f(1)=4,f(2)=1,f(3)=4\) etc (i.e. the decimal expansion).

Let us prove there is no bijection \(\mathbb R\to\mathbb N\). Suppose otherwise, so \(\mathbb R\) is countable, and suppose the bijection \(g\) is \(n\mapsto a_{n0}a_{n1}a_{n2}a_{n3}...\). We can define a number \(b\) whose decimal expansion \(b_0b_1b_2b_3...\) satisfies \(b_i\ne a_{ii}\) (choosing each \(b_i\in\{1,\dots,8\}\) also avoids the ambiguity of expansions ending in all \(9\)s). Then no natural number maps to \(b_0b_1b_2...\), because \(g(i)\) has \(i\)-th digit different from that of \(b\).

Hence there is no bijection \(\mathbb N\to\mathbb R\), so \(\mathbb R\) is not countable.
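The diagonal construction is completely mechanical. A small sketch (the finite `digits` table is hypothetical, standing in for the supposed enumeration \(g\)):

```python
# Row n lists the digits a_n0, a_n1, ... of the n-th number in the supposed enumeration.
digits = [
    [1, 4, 1, 4, 2],  # g(0)
    [3, 1, 4, 1, 5],  # g(1)
    [0, 0, 0, 0, 0],  # g(2)
    [9, 9, 9, 9, 9],  # g(3)
    [2, 7, 1, 8, 2],  # g(4)
]

# Choose b_i != a_ii; picking digits in {1, 2} also sidesteps the 0.999... = 1.000... issue.
b = [1 if digits[i][i] != 1 else 2 for i in range(len(digits))]

for i in range(len(digits)):
    assert b[i] != digits[i][i]  # b differs from g(i) in the i-th digit
print(b)  # e.g. [2, 2, 1, 1, 1] -> the number 0.22111... is missed by g
```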

Summary

I think the pièce de résistance in this lecture is not how the real numbers are actually constructed, but the axioms themselves. There are many constructions of the real numbers, but by our sketch of proof they are all essentially the same (up to an order-preserving isomorphism). \(\mathbb R\) is still not algebraically closed (i.e. not every polynomial has a root in \(\mathbb R\)). To fix that we go to the construction of \(\mathbb C\) (normally as \(\mathbb R[X]/(X^2+1)\), if you know more algebra); however, we then lose the ordering property on \(\mathbb C\).

Something being unique up to isomorphism, and properties being preserved under isomorphism, are important themes in mathematics: we can forget about the actual details and work with an abstract entity, or, if we don't want to work abstractly, we can pick a concrete realisation. In our case, Dedekind cuts are hard to work with. Because of the uniqueness, we don't actually care about Dedekind cuts much beyond the fact that the construction can be done, so that the real numbers exist; the axioms are all we need.

Sequences and limits (Lecture 5)

In this lecture, we will discuss sequences and relevant notions. Sequences come in many forms: there can be sequences of real numbers, of functions, of integrals etc. We are going to make sense of the notions of limit, convergence etc.

Sequence

A sequence of real numbers is a function \(f:\mathbb N\to \mathbb R\). Often we write \((a_n)_{n\in\mathbb N}\) to mean the sequence with \(n\)-th element to be \(a_n:=f(n)\).

Often we omit the subscript \(n\in\mathbb N\). Or we may write \((a_n)_{n=i}^\infty\) to mean the sequence \(f(n)=a_{n+i}\), i.e. the sequence \((a_i,a_{i+1},a_{i+2},...)\). Again we usually discard the \(_{n=i}^{\infty}\) part if it is clear. For example \(\left(\frac{1}{n}\right)\) means \(\left(\frac{1}{n}\right)_{n=1}^\infty\), so it is \(\left(\frac{1}{1},\frac{1}{2},...\right)\), because dividing by zero is not allowed. Normally exactly which index we start with is either clear from context or doesn't really matter.

  • So \((n)_{n\in\mathbb N}\) is the sequence \((0,1,2,3,\cdots)\),
  • So \((\frac{1}{n+1})_{n\in\mathbb N}\) is the sequence \(\left(1,\frac{1}{2},\frac{1}{3},...\right)\),
  • etc.

A null sequence \((a_n)\) is a sequence such that \[ \forall \epsilon>0\exists N\in\mathbb N\forall n>N, |a_n|<\epsilon \]

Whether we require \(|a_n|<\epsilon\) or \(|a_n|\le \epsilon\) doesn't really matter: we claim \(\forall \epsilon>0\exists N\in\mathbb N\forall n>N, |a_n|\le\epsilon \iff \forall \epsilon>0\exists N\in\mathbb N\forall n>N, |a_n|<\epsilon\). From right to left is trivial. From left to right is as follows: fix any \(\epsilon > 0\); consider \(\frac{\epsilon}{2}\), and take the \(N\in\mathbb N\) such that \(\forall n>N, |a_n|\le \frac{\epsilon}{2}<\epsilon\).

Because for real number \(a\), \(||a||=|a|\), we have \((a_n)_{n\in\mathbb N}\) is null if and only if \((|a_n|)_{n\in\mathbb N}\) is null. Here are some examples of null sequences:

  • \((a_n)_{n\in\mathbb N}\) such that \(\forall n\in\mathbb N,a_n=0\). For any \(\epsilon>0\), taking \(N=0\) will do.
  • \((\frac{1}{n})_{n=1}^{\infty}\) is a null sequence. Take \(\epsilon>0\); by the Archimedean property, there is some \(N\in\mathbb N\) such that \(N>\frac{1}{\epsilon}\). Then for any \(n>N\), \(|\frac{1}{n}|=\frac{1}{n}<\frac{1}{N}<\epsilon\).
  • \(\left(\frac{(-1)^n}{n}\right)_{n=1}^{\infty}\).
  • If \((a_n)_{n\in\mathbb N}\) is null then \((b_n):=(a_n)_{n=i}^{\infty}\) is null for any \(i\). For any \(\epsilon>0\), there is some \(N\in\mathbb N\) such that \(\forall n>N,|a_n|<\epsilon\). Because \(b_n=a_{n+i}\) and \(n>N\) implies \(n+i>N\), the same \(N\) works: \(\forall n>N,|b_n|=|a_{n+i}|<\epsilon\). (A numerical sketch of the quantifier game follows this list.)
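For intuition, the \(\forall\epsilon\,\exists N\) game can be played numerically. A minimal sketch (the helper `witness_N` is my name, following the Archimedean argument above):

```python
import math

def witness_N(eps: float) -> int:
    # Any N > 1/eps works for the null sequence (1/n): n > N implies |1/n| < eps.
    return math.floor(1 / eps) + 1

for eps in (0.1, 0.01, 0.001):
    N = witness_N(eps)
    assert all(abs(1 / n) < eps for n in range(N + 1, N + 1000))  # spot check
    print(f"eps={eps}: N={N} works")
```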

Cauchy sequence

A sequence \((a_n)\) of real numbers is Cauchy if and only if: \[\forall \epsilon>0\exists N\in\mathbb N \forall n>N\forall m>N, |a_n - a_m|<\epsilon\]

\(\left(\frac{1}{2^n}\right)\) is Cauchy: fix \(\epsilon>0\); there is some \(N\in\mathbb N\) such that \(2^N>\frac{2}{\epsilon}\), i.e. \(2^{-N}<\frac{\epsilon}{2}\). Then for all \(n,m>N\), \[\left|\frac{1}{2^n}-\frac{1}{2^m}\right|\le\frac{1}{2^n}+\frac{1}{2^m}\le\frac{1}{2^N}+\frac{1}{2^N}<\epsilon\]
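The same game works for the Cauchy condition; a quick sketch following the proof (again, the function name is mine):

```python
def cauchy_witness(eps: float) -> int:
    # Find N with 2^(-N) < eps/2, as in the proof.
    N = 0
    while 2.0 ** (-N) >= eps / 2:
        N += 1
    return N

eps = 1e-3
N = cauchy_witness(eps)
# Spot-check |1/2^n - 1/2^m| < eps over a window of n, m > N.
assert all(abs(2.0 ** (-n) - 2.0 ** (-m)) < eps
           for n in range(N + 1, N + 40) for m in range(N + 1, N + 40))
print(f"eps={eps}: N={N} works")
```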

Limit

Let \((a_n)\) be a sequence; we say \((a_n)\) converges to \(a\in\mathbb R\) if \((a_n-a)_{n\in\mathbb N}\) is null. Or equivalently:

\[ \forall\epsilon>0\exists N\in\mathbb N\forall n>N,|a_n-a|<\epsilon \]

We write this as \(\lim_{n\to\infty} a_n=a\). If \((a_n)\) doesn't converge to anything in \(\mathbb R\), we say that \((a_n)\) diverges. If \(\forall M>0\exists N\in\mathbb N\forall n>N,a_n>M\) we say \((a_n)\) diverges to \(\infty\). If \(\forall M<0\exists N\in\mathbb N\forall n>N,a_n<M\), we say \((a_n)\) diverges to \(-\infty\).

So by definition a null sequence converges to \(0\).

  • \(\left(\frac{n}{n+1}\right)\) converges to \(1\). Indeed \(\left(\frac{1}{n+1}\right)\) is null.
  • But \(((-1)^n)\) is not convergent. Suppose \(\lim_{n\to\infty}(-1)^n=x\) for some \(x\), and take \(\epsilon=\frac{1}{2}\). If \(x\ge 0\) then for \(n\) odd, \(|x-(-1)^n|=|x+1|\ge 1>\epsilon\). If \(x<0\), then for \(n\) even, \(|x-(-1)^n|=|x-1|=|{-x}+1|>1>\epsilon\). So for any \(N\in\mathbb N\), there is always some \(n>N\) such that \(|x-a_n|\ge \epsilon\). \(((-1)^n)\) neither diverges to \(\infty\) nor to \(-\infty\).
  • \((n^2)\) diverges to \(\infty\).
  • \((-n)\) diverges to \(-\infty\).

Properties of limit

A convergent sequence \((a_n)\) has only one limit.

Say \(\lim a_n=a\) and \(\lim a_n=b\). Then for any \(\epsilon>0\), there is an \(N_1\in\mathbb N\) such that \(|a_n-a|<\frac{\epsilon}{2}\) for all \(n>N_1\), and there is \(N_2\in\mathbb N\) such that \(|a_n-b|<\frac{\epsilon}{2}\) for all \(n>N_2\). Then for all \(n>\max(N_1,N_2)\), we have \(|b-a|=|b-a_n+a_n-a|\le|b-a_n|+|a_n-a|\le\frac{\epsilon}{2}+\frac{\epsilon}{2}=\epsilon\). If \(a\ne b\), then \(|a-b|>0\), so by (a corollary of) the Archimedean principle \(\frac{1}{M}<|a-b|\) for some \(M\in\mathbb N\). Taking \(\epsilon=\frac{1}{M}\), we get a contradiction; hence \(a=b\).

Any convergent sequence \((a_n)\) is bounded, i.e. \(\exists K_1,K_2\in\mathbb R\forall n\in\mathbb N,K_2<a_n<K_1\).

Say \(\lim a_n=a\). Then there is some \(N\in\mathbb N\) such that \(\forall n>N,|a_n-a|<1\), i.e. \(a-1<a_n<a+1\). So taking \(K_1:=\max\left(a_0,a_1,\cdots,a_{N},a+1\right)+1\) and \(K_2:=\min\left(a_0,a_1,\cdots,a_{N},a-1\right)-1\) works.

Note the converse is not true. \(((-1)^n)\) is bounded but not convergent.

Let \((a_n)\) and \((b_n)\) be 2 sequences such that \(\lim a_n=a\) and \(\lim b_n=b\) and \(\forall n\in\mathbb N,a_n\le b_n\). Then \(a\le b\).

Say otherwise, \(a>b\). Let \(\epsilon=\frac{a-b}{2}\). Take \(N_1,N_2\in\mathbb N\) to be natural numbers such that \(\forall n>N_1,|a_n-a|<\epsilon\) and \(\forall n>N_2,|b_n-b|<\epsilon\). Let \(N=\max(N_1,N_2)\); then for any \(n>N\), \(a-a_n\le|a_n-a|<\epsilon\) and \(b_n-b\le|b-b_n|<\epsilon\), i.e. \(a-\epsilon<a_n\) and \(b_n<b+\epsilon\). So \(b_n<b+\epsilon=a-\epsilon<a_n\). Contradiction.

[Sandwich rule] Let \(a\in\mathbb R\) and \((a_n),(b_n)\) and \((c_n)\) be sequences. Suppose \(\forall n\in\mathbb N, a_n\le b_n\le c_n\) and \((a_n)\) and \((c_n)\) both converge to \(a\). Then \((b_n)\) converges to \(a\) as well.

Fix an \(\epsilon > 0\). We can find \(N_1,N_2\in\mathbb N\) such that \[ (\forall n>N_1,\ a-\epsilon<a_n)\land(\forall n>N_2,\ c_n<a+\epsilon) \]

Then for all \(n>\max(N_1,N_2)\), we have \(a-\epsilon<a_n\le b_n\le c_n<a+\epsilon\). So \(|b_n-a|<\epsilon\). Since the above argument works for any \(\epsilon>0\), \((b_n)\) converges to \(a\).

This theorem is handy to compute limits.

Let \(\lim a_n=a\) and \(\lim b_n=b\). Then

  • \(\lim (a_n+b_n)=a+b\);
  • \(\lim (a_n b_n)=ab\);
  • provided that \(b\ne 0\) and \(\forall n\in\mathbb N, b_n\ne 0\), \(\lim \left(\frac{a_n}{b_n}\right)=\frac{a}{b}\).
  • Fix \(\epsilon > 0\). Choose \(N_1\) such that \(\forall n>N_1,|a_n-a|<\frac{\epsilon}{2}\) and \(N_2\) such that \(\forall n>N_2, |b_n-b|<\frac{\epsilon}{2}\). Let \(N=\max(N_1,N_2)\). Then for \(n>N\), \(|(a_n+b_n)-(a+b)|=|(a_n-a)+(b_n-b)|\le|a_n-a|+|b_n-b|<\frac{\epsilon}{2}+\frac{\epsilon}{2}=\epsilon\).
  • By one of the previous theorems \((b_n)\) is bounded. Let \(K>0\) be a real number such that \(\forall n\in\mathbb N, |b_n|\le K\); replacing \(K\) with \(\max(K,|a|)\) if necessary, we can assume \(|a|\le K\). So we have \[ |a_nb_n-ab|=|a_nb_n-ab_n+ab_n-ab|\le|a_n-a||b_n|+|a||b_n-b|\le K(|a_n-a|+|b_n-b|) \] So choose \(N_1\) such that \(\forall n>N_1, |a_n-a|<\frac{\epsilon}{2K}\), and \(N_2\) such that \(\forall n>N_2, |b_n-b|<\frac{\epsilon}{2K}\). Let \(N=\max(N_1,N_2)\); then for \(n>N\), \(|a_nb_n-ab|<\epsilon\).
  • We have \[ \left|\frac{a_n}{b_n}-\frac{a}{b}\right|=\left|\frac{a_n b-ab_n}{b_n b}\right|\le\frac{|a_n-a||b|+|a||b_n-b|}{|b_n||b|} \]

Since \(b\ne 0\), we can choose \(N_1\in\mathbb N\) such that \(\forall n>N_1, |b-b_n|<\frac{|b|}{2}\). Since \(|b-b_n|\ge |b|-|b_n|\), we have \(|b_n|>\frac{1}{2}|b|\) for all \(n>N_1\), so the denominator above is \(>\frac12 b^2\). Fix \(\epsilon > 0\). Choose \(N_2\) such that for all \(n>N_2\), \(|a_n-a|<\frac{|b|\epsilon}{4}\). If \(a\ne0\), choose \(N_3\) such that for all \(n>N_3\), \(|b_n-b|<\frac{b^2\epsilon}{4|a|}\) (if \(a=0\) the corresponding term vanishes and \(N_3\) is not needed). So for all \(n>\max(N_1,N_2,N_3)\), we have \(\left|\frac{a_n}{b_n}-\frac{a}{b}\right|<\epsilon\).

Monotone sequence

  • A sequence \((a_n)\) is said to be increasing if \(\forall n\in\mathbb N,(a_n\le a_{n+1})\); strictly increasing if \(\forall n\in\mathbb N, a_n<a_{n+1}\). Decreasing and strictly decreasing sequences are similarly defined.

  • A monotone sequence is either an increasing or decreasing sequence.
  • \((n^2)\) is increasing hence monotone.
  • \(\left(\frac{1}{n}\right)\) is decreasing hence monotone.
  • \((a_n)\) where all \(a_n=0\) is monotone, because it is an increasing (also decreasing) sequence.
  • \(\left((-1)^n \frac{1}{n}\right)\) is not monotone.

For an increasing sequence, bounded above \(\iff\) convergent

From right to left is general (convergent implies bounded; increasingness is not needed). The other direction:

Just view \((a_n)\) as a set \(A:=\{a_n:n\in\mathbb N\}\). It is bounded above and nonempty, hence \(\sup A\) exists; call it \(a\). Let \(\epsilon>0\); then there is \(N\in\mathbb N\) such that \(a_N>a-\epsilon\) (otherwise \(a-\epsilon\) would be a smaller upper bound). Since the sequence is increasing, for all \(n>N\), \(a-\epsilon<a_N\le a_n\le a\), so \(|a_n-a|<\epsilon\).

Here is the dual theorem

For a decreasing sequence, bounded below \(\iff\) convergent.

The proof is too similar to be worth repeating. Left as an exercise.

Examples and exercises

Consider \((a_n)\) such that \(a_n = \sqrt{a_{n-1}+2}\) and \(a_0=\sqrt{2}\).

  • Then \((a_n)\) is bounded. Indeed \(\forall n\in\mathbb N, 0<a_n\le 2\).
    Positivity: \(a_0=\sqrt2 > 0\). Since \(\sqrt{}\) is defined to be the positive square root, all we need is that \(a_{n-1}+2\) is positive; by the induction hypothesis \(a_{n-1}\) is positive, so \(a_{n-1}+2\) is (even more) positive. Let's focus on the \(\le\) part via induction. \(\sqrt{2}\le2\). Assume \(a_{n-1}\le2\); then \(a_n=\sqrt{a_{n-1}+2}\le\sqrt{2+2}=2\). Here we used \(0\le x\le y\implies \sqrt{x}\le\sqrt{y}\). This follows from the axioms of the real numbers: otherwise \(\sqrt x>\sqrt y\ge0\), and then \(x > y\).
  • \((a_n)\) is increasing.
    Let \(n\in\mathbb N\), then \(\sqrt{a_n+2}\ge a_n\iff a_n+2\ge a_n^2\\ \iff a_n+2-a_n^2\ge0\iff (2-a_n)(a_n+1)\ge0\). But \(a_n+1>0\) so \(2-a_n\ge 0\).
  • \((a_n)\) converges to 2.
    So by the previous theorem \((a_n)\) is convergent, say \(\lim a_n=x\). By one of the previous theorems, since \(a_n^2=a_{n-1}+2\) we have \(x^2=x+2\), so \(x=2\) or \(x=-1\). But since all \(a_n>0\), \(x\ge 0\). So \(x=2\). (A numerical sketch follows this list.)
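A minimal numerical sketch of this example: the iterates increase, stay below \(2\), and approach \(2\).

```python
import math

a = math.sqrt(2)          # a_0
for n in range(1, 16):
    a = math.sqrt(a + 2)  # a_n = sqrt(a_{n-1} + 2)
    print(n, a)           # increasing, bounded above by 2, tending to 2
```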

(Zorich Vladimir 2016, page 88)

If \(q>1\), then \(\lim_{n\to\infty}\frac{n}{q^n}=0\).

Let us denote \(x_n=\frac{n}{q^n}\); then \(x_{n+1}=\frac{n+1}{nq}x_n\). \(\lim \frac{n+1}{nq}=\lim \left(\frac{n+1}{n}\times\frac{1}{q}\right)=\frac{1}{q}<1\). Then there is some \(N\in\mathbb N\) such that for all \(n>N,\frac{n+1}{nq}<1\). So \(x_{n+1}<x_n\) for \(n>N\). So the sequence \((x_n)_{n=N}^\infty\) is decreasing and bounded below by \(0\), so a limit exists, say \(x=\lim x_n\). Then \(x=\lim (x_{n+1})=\lim \left(\frac{n+1}{nq} x_n\right)=\frac{1}{q}x\). So \(x=0\).

From this we have some interesting corollaries

\[\lim_{n\to\infty}\sqrt[n]{n}=1\]

For any \(\epsilon > 0\), there is an \(N\in\mathbb N\) such that \(1\le n<(1+\epsilon)^n\) for all \(n>N\). Here we use \(\lim \frac{n}{(1+\epsilon)^n}=0\): for some \(N\in\mathbb N\), we have \(n>N\implies \frac{n}{(1+\epsilon)^n}<1\). So for \(n>N\), we have \(1\le \sqrt[n]{n}<1+\epsilon\).
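Both limits are easy to watch numerically; a quick sketch:

```python
q = 1.5
for n in (10, 100, 1000):
    print(n, n / q ** n, n ** (1 / n))  # first column -> 0, second column -> 1
```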

For any \(a>0\), \[\lim_{n\to\infty} \sqrt[n]{a}=1\]

Let's work with \(a\ge 1\) first. For any \(\epsilon>0\), there is \(N\in\mathbb N\) such that \(1\le a<(1+\epsilon)^n\) for all \(n>N\) (by Bernoulli, \((1+\epsilon)^n\ge1+n\epsilon\to\infty\)). Then \(1\le \sqrt[n]{a}<1+\epsilon\).

For \(0<a<1\), we have \(1<\frac{1}{a}\). So \(\lim \sqrt[n]{a}=\lim \frac{1}{\sqrt[n]{\frac{1}{a}}}=\frac{1}{\lim \sqrt[n]{\frac{1}{a}}}=1\)

Limits and Cauchy sequences

Every convergent sequence is Cauchy.

Let \((a_n)\) be a convergent sequence converging to \(a\). Take any \(\epsilon>0\) and find \(N\in\mathbb N\) such that \(|a_n-a|<\frac{\epsilon}{2}\) for all \(n>N\). Then for \(n>N\) and \(m>N\), \[|a_n - a_m|=|(a_n-a)+(a-a_m)|\le|a_n-a|+|a_m-a|<\frac{\epsilon}{2}+\frac{\epsilon}{2}=\epsilon\]

This concludes the proof: for any \(\epsilon>0\) we produced an \(N\) with \(\forall m,n>N\), \(|a_n-a_m|<\epsilon\).

[Nested interval] Let \((a_n)\) and \((b_n)\) be real sequences such that:

  1. \(\forall n\in\mathbb N,b_n>a_n\);
  2. \(\forall n\in\mathbb N, [a_{n+1},b_{n+1}]\subseteq [a_n, b_n]\);
  3. \(\lim_{n\to\infty}(b_n-a_n)=0\). Then \(\bigcap_{n\in\mathbb N}[a_n,b_n]\) is nonempty and contains one and only one point.

From 2, \(a_{n+1}\ge a_n\) and \(b_{n+1}\le b_n\) for all \(n\). So \((a_n)\) is increasing and bounded above by \(b_0\), hence convergent, say \(\lim a_n = a\); \((b_n)\) is decreasing and bounded below by \(a_0\), hence convergent, say \(\lim b_n=b\). Then \(a\le b\). But \(b-a=\lim(b_n-a_n)=0\), so \(a=b\). Since \(a\ge a_n\) for all \(n\) and \(a=b \le b_n\) for all \(n\), we have \(a\in\bigcap_{n\in\mathbb N}[a_n,b_n]\). So the intersection is nonempty. Why unique? Otherwise say \(x\ne y\in\bigcap_{n\in\mathbb N}[a_n,b_n]\), wlog \(y> x\). Then for any \(n\in\mathbb N\), \(x,y\in[a_n,b_n]\), so \(b_n-a_n\ge y-x > 0\), contradicting 3.

Every real Cauchy sequence is convergent.

Say \((x_n)\) is a Cauchy sequence. For \(n\in\mathbb N\), set \(a_n:=\inf\{x_k:k\ge n\}\) and \(b_n:=\sup\{x_k|k\ge n\}\) (these exist since a Cauchy sequence is bounded). So we have \(a_n\le a_{n+1}\le b_{n+1}\le b_n\). Let's show \(\lim (b_n-a_n)=0\). We need the fact that \(\sup (A-B)=\sup A - \inf B\) for any sets \(A,B\) where the \(\sup,\inf\) exist, and \(A-B:=\{a-b|a\in A, b\in B\}\). So \(b_n-a_n=\sup\{x_i-x_j|i\ge n, j\ge n\}\). But \((x_n)\) is Cauchy: for any \(\epsilon >0\) there is \(N\in\mathbb N\) such that \(\forall i,j>N\) we have \(x_i-x_j\le|x_i-x_j|<\epsilon\). So \(b_{N+1} - a_{N+1} \le \epsilon\). Since for all \(n>N\) we have \(b_n\le b_{N+1}\) and \(a_{N+1}\le a_n\), we get \(b_n - a_n \le \epsilon\) for all \(n>N\) (and we saw earlier that \(\le\epsilon\) here is as good as \(<\epsilon\)). This finishes the proof.

So by the nested interval principle, there is an \(A\in\bigcap_{n\in\mathbb N}[a_n, b_n]\). Since \(a_n\le A\le b_n\) for all \(n\in\mathbb N\), and for any \(k\ge n\) we have \(a_n=\inf\{x_j:j\ge n\}\le x_k\le \sup\{x_j:j\ge n\}=b_n\), we must have \(|A-x_k|\le b_n-a_n\). By the above argument \(A\) is the limit of \((x_n)\): fix \(\epsilon > 0\); for some \(N\in\mathbb N\), for all \(n>N\) we have \(b_n-a_n<\epsilon\) (using the \(\epsilon/2\) trick from before), so \(|A-x_k|<\epsilon\) for all \(k\ge n>N\).

Combining these, we get:

[Cauchy criterion for convergence] A real number sequence is convergent if and only if it is Cauchy.

Since the proof relies on the completeness axiom, the requirement of being real cannot be dropped. Indeed, a Cauchy sequence in \(\mathbb Q\) may not converge in \(\mathbb Q\).

Let \((a_n)\) be the sequence in \(\mathbb Q\) of decimal approximations of \(\sqrt 2\in\mathbb R\). So \(a_0 = 1\), \(a_1=\frac{7}{5}(=1.4)\), \(a_2=\frac{141}{100}(=1.41)\), \(a_3=\frac{707}{500}(=1.414)\) and so on. Then the sequence is Cauchy, but the limit is \(\sqrt{2}\), which is not in \(\mathbb Q\) but in \(\mathbb R\).

\(e\)

There is not much material in this part, but \(e\) is too important, so we list it as a separate part. We first begin with the Bernoulli inequality:

[Bernoulli's inequality] For any \(n\in\mathbb N\) and \(\alpha>-1\) \[ (1+\alpha)^n\ge 1 + n\alpha \]

True for \(n=1\). Suppose \((1+\alpha)^n\ge 1 + n\alpha\), then: \[ (1+\alpha)^{n+1}=(1+\alpha)(1+\alpha)^n\ge(1+\alpha)(1+n\alpha)=1+(n+1)\alpha+n\alpha^2\\ \ge 1+(n+1)\alpha \]

We then consider the sequence \((y_n)_{n=2}^{\infty}\) where \(y_n=\left(1+\frac{1}{n}\right)^{n+1}\).

\((y_n)\) is decreasing.

\[\frac{y_{n-1}}{y_n}=\frac{\left(1+\frac{1}{n-1}\right)^n}{\left(1+\frac{1}{n}\right)^{n+1}}=\frac{n^{2n}}{(n^2-1)^n}\times\frac{n}{n+1}=\left(1+\frac{1}{n^2-1}\right)^n\frac{n}{n+1}\ge\\\left(1+\frac{n}{n^2-1}\right)\frac{n}{n+1}>\left(1+\frac{1}{n}\right)\frac{n}{n+1}=1\] The \(\ge\) is Bernoulli, and the last \(>\) is because \(\frac{n}{n^2-1}-\frac{1}{n}=\frac{1}{n(n^2-1)}>0\) for all \(n\ge2\).

\(\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^n\) exists.

\[\lim \left(1+\frac{1}{n}\right)^n=\lim\left(1+\frac{1}{n}\right)^{n+1}\left(1+\frac{1}{n}\right)^{-1}\\=\lim \left(1+\frac{1}{n}\right)^{n+1}\times \lim\frac{1}{1+\frac{1}{n}}=\lim \left(1+\frac{1}{n}\right)^{n+1} \] Here \(\lim y_n=\lim\left(1+\frac1n\right)^{n+1}\) exists because \((y_n)\) is decreasing and bounded below by \(0\).

\[e:=\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^n\]
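A quick numerical look at the definition (a sketch; note how slow the convergence is):

```python
import math

for n in (10, 1000, 100_000, 10_000_000):
    print(n, (1 + 1 / n) ** n)  # creeps up towards e = 2.718281828...
print(math.e)
```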

Subsequences

Let \((x_n)_{n\in\mathbb N}\) be a sequence and \((n_k)_{k\in\mathbb N}\) be a strictly increasing sequence of natural numbers. We use \((x_{n_k})_{k\in\mathbb N}\) to denote the sequence \((x_{n_0},x_{n_1},...)\); we call this a subsequence of \((x_n)_{n\in\mathbb N}\).

Every sequence \((x_{n})\) of real numbers has a monotone subsequence.

Let us call a positive integer \(n\) a "peak of the sequence" if \(n<m\) implies \(x_{n}>x_{m}\). Suppose first that the sequence has infinitely many peaks, \(n_0<n_{1}<n_{2}<n_{3}<\dots <n_{j}<\dots\). Then the subsequence \((x_{n_{j}})\) corresponding to these peaks is monotonically decreasing. So suppose now that there are only finitely many peaks, let \(N\) be the last peak and \(n_0=N+1\). Then \(n_0\) is not a peak, since \(N<n_{0}\), which implies the existence of \(n_1\) with \(n_{0}<n_{1}\) and \(x_{n_{0}}\leq x_{n_{1}}\). Again, \(n_{1}>N\) is not a peak, hence there is an \(n_{2}\) where \(n_{1}<n_{2}\) with \(x_{n_{1}}\le x_{n_{2}}\). Repeating this process leads to an infinite increasing subsequence \(x_{n_{0}}\leq x_{n_{1}}\leq x_{n_{2}}\leq ...\) as desired.
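The peak argument is effectively an algorithm. Here is a minimal sketch for a finite prefix of a sequence (for a finite list I call an index a "peak" if it has a nonempty tail and dominates every later entry; the function name is mine, and a finite prefix can of course only illustrate the idea):

```python
def monotone_subsequence(xs):
    n = len(xs)
    # Peaks: indices (with a nonempty tail) whose entry exceeds every later entry.
    peaks = [i for i in range(n - 1) if all(xs[j] < xs[i] for j in range(i + 1, n))]
    if len(peaks) >= 2:
        return [xs[i] for i in peaks]  # decreasing along the peaks
    # Few peaks: greedily build an increasing subsequence just past the last peak.
    start = peaks[-1] + 1 if peaks else 0
    sub, cur = [xs[start]], start
    for j in range(start + 1, n):
        if xs[j] >= xs[cur]:
            sub.append(xs[j])
            cur = j
    return sub

print(monotone_subsequence([3, 1, 4, 1, 5, 9, 2, 6]))  # e.g. [2, 6]
print(monotone_subsequence([5, 4, 3, 2, 1]))           # [5, 4, 3, 2]
```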

[Bolzano-Weierstrass] Every bounded sequence of real numbers has a convergent subsequence.

Let \(E\) be the set \(\{x_n|n\in\mathbb N\}\). If \(E\) is finite then there must be some \(x\in E\) and indices \(n_0 < n_1 < n_2<...\) such that \(x_{n_0}=x_{n_1}=x_{n_2}=...=x\). So \((x_{n_k})_{k\in\mathbb N}\) is constant, hence converges.

If \(E\) is infinite, we can use the theorem above to extract a monotone subsequence, which is bounded (the whole sequence is) and therefore converges by the previous theorems.

Summary

In this lecture, we studied (not in this order)

  • What a sequence is, what a monotone sequence is and what a Cauchy sequence is;
  • What the limit of a sequence is
  • In \(\mathbb R\), convergent sequence \(\iff\) Cauchy sequence
  • We defined \(e\)

Series (\(\sum\)) and mode of convergence (Lecture 6)

Let \((a_n)\) be a sequence of real numbers. Let \(s_n=\sum_{i=0}^n a_i\). We say the series \(\sum_{i=0}^\infty a_i\) converges if and only if \((s_n)\) converges. If the series doesn't converge we call it a divergent series.

  • Recall from high school the geometric progression \[1+\frac{1}{2}+\frac{1}{4}+\frac{1}{8}+...\] The partial sum \(s_n=1+\frac{1}{2}+...+\frac{1}{2^n}\) is given by \(2-\frac{1}{2^n}\). From the previous lecture, this is a convergent series, converging to 2.

  • Consider the series \(\sum_{i=1}^\infty \frac{1}{i}\). This is a divergent series: consider the partial sum \(s_n=1+\frac{1}{2}+\frac{1}{3}+...+\frac{1}{n}\). Then \(s_{2n}-s_n=\frac{1}{n+1}+...+\frac{1}{n+n}>n\times\frac{1}{2n}=\frac{1}{2}\). So the partial sums don't form a Cauchy sequence, hence the series is not convergent. This series is called the harmonic series. (A numerical sketch follows this list.)
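A numerical sketch of the harmonic series: the partial sums keep growing, and the block \(s_{2n}-s_n\) never drops below \(\frac12\), exactly as in the estimate above.

```python
def s(n):  # harmonic partial sum 1 + 1/2 + ... + 1/n
    return sum(1 / k for k in range(1, n + 1))

for n in (10, 100, 10_000):
    print(n, s(n), s(2 * n) - s(n))  # s(n) grows like log n; the gap stays above 1/2
```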

If the series \(\sum_{n=0}^\infty a_n\) converges then \(\lim_{n\to\infty} a_n = 0\)

Let \(s_n:=\sum_{i=0}^n a_i\) be the partial sums. By definition \((s_n)\) converges, say to \(s\). Then \(\lim s_{n+1}=s\), hence \(\lim (s_{n+1}-s_n)=0\). Then \(\lim a_{n+1}=0\), and so \(\lim a_n=0\).

If the series \(\sum_{n=0}^\infty a_n\) converges and \(s_n:=\sum_{i=0}^n a_i\), then \(s_{2n}-s_n\) converges to \(0\).

Trivial, because \(\lim s_{2n}=\lim s_n=s\). Just think about it.

  • \(1+1+1+1+1+...\) cannot converge because \(s_{2n}=2n\) and \(s_n=n\). So \(s_{2n}-s_n=n\) doesn't converge.
  • \(\sum_{i=0}^\infty (-1)^i\) cannot converge because \(s_{2n}=1\) while \(s_n\) is \(1\) for \(n\) even and \(0\) for \(n\) odd. So \((s_{2n}-s_n)\) is the sequence \((0,1,0,1,0,1,...)\). This visibly does not converge.

[Cauchy's criterion for convergence of series] \(\sum_{n=0}^\infty a_n\) converges if and only if for every \(\epsilon>0\) there exists \(N\in\mathbb N\) such that for all \(m\ge n>N\) we have \(|a_n+...+a_m|<\epsilon\).

Let \(s_n:=\sum_{i=0}^n a_i\) be the partial sums. Then \(\sum_{n=0}^\infty a_n\) converges \(\iff (s_n)\) converges \(\iff (s_n)\) Cauchy \(\iff \forall \epsilon>0\exists N\in\mathbb N\forall m\ge n>N, |s_{n-1} - s_m|<\epsilon\). But \(|s_{n-1} - s_m| = |a_n+...+a_m|\) (the shift of index by one is absorbed into the choice of \(N\)).

The series \(\sum_{i=0}^\infty a_i\) is said to be absolutely convergent if the series \(\sum_{i=0}^\infty |a_i|\) is convergent.

If the series \(\sum_{i=0}^\infty a_i\) is absolutely convergent, then \(\sum_{i=0}^\infty a_i\) is convergent

Since \(|a_n+...+a_m|\le |a_n| + ... + |a_m|\), by Cauchy's criterion the series converges.

[Criterion for convergence of series of nonnegative terms] A series whose terms are nonnegative converges if and only if the sequence of partial sums is bounded above.

This is the monotone convergence theorem: since the terms are non-negative, the sequence of partial sums is non-decreasing, so bounded above \(\iff\) convergent.

Comparison test

[Comparison theorem] Let \(\sum_{i=0}^\infty a_i\) and \(\sum_{i=0}^\infty b_i\) be series with non-negative terms. If there exists \(N\in\mathbb N\) such that for all \(n>N\), \(a_n\le b_n\), then if \(\sum_{i=0}^\infty b_i\) is convergent (say to \(B\)), so is \(\sum_{i=0}^\infty a_i\).

Let \(s_n\) and \(s'_n\) be the partial sums of \(\sum_{i=0}^\infty a_i\) and \(\sum_{i=0}^\infty b_i\). Discarding the first \(N\) terms (which does not affect convergence), we may assume \(\forall n\in \mathbb N, s_n\le s'_n\). \((s_n)\) is increasing because the terms are non-negative, and bounded above by \(s_n'\le B\). So \((s_n)\) is convergent.

Note that it doesn't really matter whether \(s_n\le s_n'\) exactly: the proof goes through in almost exactly the same manner as long as we know that for some \(K>0\), \(\forall n\in\mathbb N,a_n\le Kb_n\), because then \(s_n\le Ks_n'\). Also, a series' convergence is unaffected if we insert or delete finitely many terms.

Let \((a_n)\) and \((b_n)\) be two sequences of positive numbers. Assume \(\lim \frac{a_n}{b_n}=L>0\); then \(\sum_{i=0}^\infty a_i\) converges if and only if \(\sum_{i=0}^\infty b_i\) converges.

By definition of limit, taking \(\epsilon=\frac{L}{2}>0\), we have for some \(N\in\mathbb N\) and any \(n>N\) that \(|\frac{a_n}{b_n}-L|<\frac{L}{2}\), i.e. \[ \frac{1}{2}L b_n <a_n < \frac{3}{2}L b_n \] So each series is bounded by a constant multiple of the other, and the claim follows from the remark above.

[The Weierstrass M-test for absolute convergence] Let \(\sum_{i=0}^\infty a_i\) and \(\sum_{i=0}^\infty b_i\) be two series. If for some \(N\in\mathbb N\) and for all \(n>N,|a_n|\le b_n\), then convergence of \(\sum_{i=0}^\infty b_i\) implies absolute convergence of \(\sum_{i=0}^\infty a_i\).

  • \(\sum_{n=1}^\infty\frac{2n}{n^2+1}\) is divergent, because \[\lim\frac{\frac{2n}{n^2+1}}{\frac{1}{n}}=\lim\frac{2n^2}{n^2+1}=2\] If the series were convergent then \(\sum \frac{1}{n}\) would be convergent, which it is not.
  • \(\sum_{n=1}^\infty \frac{1}{n^2}\) is convergent because \[ \frac{1}{n(n+1)}<\frac{1}{n^2}<\frac{1}{n(n-1)}\] and \(\frac1{k(k+1)}=\frac1k-\frac1{k+1}\). So the partial sum for \(\sum\frac1{n(n+1)}\) telescopes to \(1-\frac 1{n+1}\), hence that series is convergent. Similarly \(\sum_{n=2}^\infty\frac 1{n(n-1)}\) is convergent. So \(\sum_{n=2}^\infty\frac 1{n^2}\) is convergent, and so \(\sum_{n=1}^\infty\frac1{n^2}\) is convergent. (A numerical check follows this list.)
  • The series \(\sum_{n=1}^\infty\frac{n}{2n^3+2}\) is convergent because \(\lim\frac{\frac{n}{2n^3+2}}{\frac{1}{n^2}}=\frac{1}{2}\) and \(\sum\frac{1}{n^2}\) is convergent.
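A numerical sketch of the telescoping bound for \(\sum\frac1{n^2}\): the partial sums increase but stay under \(2\).

```python
N = 100_000
partial = sum(1 / n ** 2 for n in range(1, N + 1))
telescoped = 2 - 1 / N   # 1 + sum_{n=2}^{N} 1/(n(n-1)) = 2 - 1/N, an upper bound
print(partial, telescoped)  # ~1.6449 (in fact pi^2/6) vs ~2.0
```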

Root test

[limit superior and limit inferior] The limit superior and limit inferior of a sequence \((x_n)\) are defined as \[\limsup_{n\to\infty}x_n:=\lim_{n\to\infty}\sup\{x_k|k\ge n\}\\\liminf_{n\to\infty}x_n:=\lim_{n\to\infty}\inf\{x_k|k\ge n\}\] provided the suprema/infima are finite; the limits then exist, since \(\sup\{x_k|k\ge n\}\) is decreasing in \(n\) and \(\inf\{x_k|k\ge n\}\) is increasing. If the sequence is unbounded above we set \(\limsup_{n\to\infty} {x_n}:=\infty\), and if unbounded below, \(\liminf_{n\to\infty} {x_n}:=-\infty\).

The theory of limit superior and limit inferior can be developed further together with subsequences; see for example Zorich Vladimir (2016). In short, \(\limsup\) is the \(\sup\) of all subsequential limits and \(\liminf\) is the \(\inf\) of all subsequential limits.

[Root test or Cauchy's test] Let \((a_n)\) be a sequence of numbers and let \(\alpha:=\limsup\sqrt[n] {|a_n|}\). Then

  • if \(\alpha < 1\) then the series \(\sum a_n\) converges absolutely;
  • if \(\alpha > 1\) then the series diverges;
  • if \(\alpha = 1\) no conclusion can be made: there are both convergent and divergent series with \(\alpha=1\).

  • If \(\alpha<1\), pick \(q\) with \(\alpha<q<1\); since \(\limsup\sqrt[n]{|a_n|}<q\), there is \(N\in\mathbb N\) such that \(\forall n>N, \sqrt[n]{|a_n|}<q\), i.e. \(|a_n|<q^n\). Comparing with \(\sum_{i=N+1}^\infty q^i\) we conclude \(\sum_{i=N+1}^\infty a_i\) converges absolutely, so \(\sum_{i=0}^\infty a_i\) converges (absolutely) as well.
  • If \(\alpha>1\), then \(\sqrt[n]{|a_n|}>1\), i.e. \(|a_n|>1\), for infinitely many \(n\). So \(a_n\not\to0\), and thus the series diverges.
  • Consider \(\sum\frac1n\) and \(\sum\frac1{n^2}\).

Let us see for which values of \(x\) the following converges: \[\sum_{n=1}^\infty\left(2+\left(-1\right)^n\right)^nx^n\] Note that the sequence \(\left(\sqrt[n]{\left|\left(2+\left(-1\right)^n\right)^nx^n\right|}\right)\) does not converge for any \(x\ne 0\). But \(\sup\left\{\sqrt[k]{\left|\left(2+\left(-1\right)^k\right)^kx^k\right|}\,\middle|\,k\ge n\right\}=3|x|\), so for \(|x|<\frac13\) the series converges (absolutely) and for \(|x|>\frac13\) the series diverges. For \(|x|=\frac13\), the test tells us nothing, so we have to work harder: \(\left|\left(2+\left(-1\right)^n\right)^nx^n\right|=1\) for \(n\) even, so the terms do not tend to \(0\), hence the series cannot converge for \(|x|=1/3\).
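A quick numerical sketch of the \(\limsup\) computation in this example:

```python
x = 0.3
roots = [abs((2 + (-1) ** k) ** k * x ** k) ** (1 / k) for k in range(1, 60)]
print(max(roots), 3 * abs(x))  # k-th roots oscillate between |x| and 3|x|; sup = 0.9 < 1
```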

d’Alembert’s test or ratio test

[ratio test or d'Alembert's test] Let \((a_n)\) be a sequence of nonzero numbers and let \(\alpha:=\lim\left|\frac{a_{n+1}}{a_n}\right|\) (assuming this limit exists). Then

  • if \(\alpha < 1\) then the series \(\sum a_n\) converges absolutely;
  • if \(\alpha > 1\) then the series diverges;
  • if \(\alpha = 1\) no conclusion can be made: there are both convergent and divergent series with \(\alpha=1\).

  • If \(\alpha<1\), there is some \(q\) with \(\alpha<q<1\) and \(N\in\mathbb N\) such that for all \(n>N,\left|\frac{a_{n+1}}{a_n}\right|<q\). Since changing finitely many terms does not affect convergence, we may assume \(\left|\frac{a_{n+1}}{a_n}\right|<q\) for all \(n\in\mathbb N\). Then \[ \left|\frac{a_{n+1}}{a_1}\right|=\left|\frac{a_{n+1}}{a_n}\right|\left|\frac{a_n}{a_{n-1}}\right|...\left|\frac{a_2}{a_1}\right|<q^n \] So \(|a_{n+1}|<|a_1|q^n\) with \(q<1\). By the comparison test, since \(\sum|a_1|q^n=|a_1|\sum q^n\) converges, so does \(\sum |a_n|\).
  • If \(\alpha>1\), then from some \(N\in\mathbb N\) onwards \(\left|\frac{a_{n+1}}{a_n}\right|>1\). Then \(|a_n|\) is strictly increasing from \(N\) onwards, so \(\lim a_n\) cannot be \(0\). Hence the series must diverge.
  • Consider \(\sum\frac1n\) and \(\sum\frac1{n^2}\).

Let us work out for which \(x\) the following converges: \[\sum_{n=1}^\infty\frac1{n!}x^n\]

  • For \(x=0\) every term is \(0\), so the series converges (to \(0\)).
  • For \(x\ne 0\), \(\left|\frac{a_{n+1}}{a_n}\right|=\frac{|x|}{n+1}\to0<1\) for any \(x\). So the series converges everywhere.
  • In fact \(\sum_{n=0}^\infty\frac{x^n}{n!}=e^x\) (the series starting at \(n=1\) gives \(e^x-1\)). But this is hard; maybe we will prove it after Taylor series.
  • Let's just work with \(x=1\) for now: \(\sum_{n=1}^\infty\frac1{n!}=e-1\), equivalently \(\sum_{n=0}^\infty\frac1{n!}=e\). Let us write \(e_n:=\left(1+\frac1n\right)^n\) and \(s_n:=\sum_{k=0}^n\frac1{k!}\). Then \[ e_n=1+{n\choose1}\frac1n+{n\choose 2}\frac{1}{n^2}+\cdots+\frac1{n^n}=1+\frac{n}{1!}\frac1n+\frac{n(n-1)}{2!}\frac1{n^2}+\cdots+\frac1{n^n}\\ =1+1+\frac1{2!}\left(1-\frac1n\right)+\cdots+\frac1{k!}\left(1-\frac1n\right)\left(1-\frac2n\right)\cdots\left(1-\frac{k-1}n\right)+\cdots+\frac1{n!}\left(1-\frac1n\right)\cdots\left(1-\frac{n-1}n\right) \] This is because \({n\choose k}\frac1{n^k}=\frac{n(n-1)\cdots(n-k+1)}{k!}\frac1{n^k}=\frac1{k!}\frac nn\frac{n-1}n\frac{n-2}n\cdots\frac{n-k+1}n\). Then \(e_n\le s_n\). For any fixed \(k\) and then all \(n>k\), \[1+1+\frac1{2!}\left(1-\frac1n\right)+\cdots+\frac1{k!}\left(1-\frac1n\right)\left(1-\frac2n\right)\cdots\left(1-\frac{k-1}n\right)<e_n\] As \(n\to\infty\), the lhs tends to \(s_k\). So \(s_k\le e\) for any \(k\in\mathbb N\). So \(e_n\le s_n\le e\), and by the sandwich theorem \(\lim s_n=e\). (A numerical sketch follows.)
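A numerical sketch comparing the two descriptions of \(e\): the factorial series converges far faster than \(\left(1+\frac1n\right)^n\).

```python
import math

s, fact = 1.0, 1.0        # s_0 = 1/0!
for n in range(1, 13):
    fact *= n
    s += 1 / fact         # s_n = sum_{k=0}^{n} 1/k!
    e_n = (1 + 1 / n) ** n
    print(n, s, e_n)      # e_n <= s_n <= e, and s_n -> e very quickly
print(math.e)
```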

\(\sum_{n=1}^\infty\frac1{n^p}\)

The following odd-looking theorem is Cauchy's condensation test.

If \(a_1\ge a_2\ge a_3\ge...\ge0\) then \(\sum_{i=1}^\infty a_i\) converges if and only if \(\sum_{i=0}^\infty2^ia_{2^i}=a_1+2a_2+4a_4+8a_8+...\) converges.

  • Both directions follow from the inequalities \[a_2\le a_2\le a_1\\ 2a_4\le a_3+a_4\le2a_2\\4a_8\le a_5+a_6+a_7+a_8\le4a_4\\...\\2^na_{2^{n+1}}\le a_{2^n+1}+\cdots+a_{2^{n+1}}\le2^na_{2^n}\] If we write \(A_k=a_1+\cdots+a_k\) and \(S_n=a_1+2a_2+\cdots+2^na_{2^n}\) and add the above inequalities together, we get \(\frac12\left(S_{n+1}-a_1\right)\le A_{2^{n+1}}-a_1\le S_n\). This shows that either both \((A_k)\) and \((S_n)\) are bounded or neither is. Since \((A_k)\) and \((S_n)\) are both increasing, the two series either both converge or both diverge.

The importance of the above theorem lies in the following corollary.

\(\sum_{n=1}^\infty\frac1{n^p}\) converges for \(p>1\) and diverges for \(p\le1\).

  • For \(p\ge0\), the terms \(\frac1{n^p}\) are decreasing, so \(\sum_{n=1}^\infty\frac1{n^p}\) converges if and only if \(\sum_{n=0}^\infty2^n\frac1{\left(2^n\right)^p}\) converges. The latter equals \(\sum_{n=0}^\infty\left(2^{1-p}\right)^n\), which converges if and only if \(2^{1-p}<1\), if and only if \(p > 1\).
  • For \(p<0\) the series obviously diverges: just look at it, \(\sum n^q\) with \(q=-p> 0\), whose terms do not even tend to \(0\). (A numerical sketch follows this list.)
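A numerical sketch of the corollary: for \(p>1\) the partial sums settle down, while for \(p\le1\) they keep growing.

```python
def partial(p, N):
    return sum(1 / n ** p for n in range(1, N + 1))

for N in (100, 10_000, 1_000_000):
    print(N, partial(2, N), partial(1, N), partial(0.5, N))
    # p=2 stabilises near pi^2/6; p=1 grows like log N; p=0.5 grows like 2*sqrt(N)
```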

\(\sum_{n=1}^\infty\frac1{n^p}\) is important because we often use it in the comparison test to study convergence of other series.

Summary

In this lecture we studied in detail:

  • what a series is and what it means to say it converges;
  • some examples of convergent and divergent series, in particular \(\sum\frac1{n^p}\);
  • some tests we can use to see if a series is convergent;
  • we also briefly mentioned \(e\).

Riemann Rearrangement Theorem and Power Series (Lecture 7)

Alternating series

Last time we mainly talked about absolute convergence. This may not be enough: recall that the harmonic series \(\sum\frac1n\) diverges. In this lecture we are going to focus on series like \(\sum(-1)^n\frac1n\), which converges but not absolutely.

[Alternating series test] Let \((a_n)\) be a sequence of positive numbers such that

  1. \(\forall n\in\mathbb N, a_n\ge a_{n+1}\)
  2. \(\lim a_n=0\)

Then the series \[\sum_{n=1}^\infty(-1)^{n-1}a_n=a_1-a_2+a_3-a_4+\cdots \] is convergent.

Denote by \[s_n := a_1-a_2+a_3+\cdots+(-1)^{n-1}a_n\] the \(n\)-th partial sum of the alternating series. Set \[t_n := s_{2n}=a_1-a_2+a_3+\cdots+a_{2n-1}-a_{2n}\\ u_n:=s_{2n-1}=a_1-a_2+a_3+\cdots+a_{2n-1} \]

So \(t_{n+1}-t_n=s_{2n+2}-s_{2n}=a_{2n+1}-a_{2n+2}\ge0\) and \(u_{n+1}-u_n=-a_{2n}+a_{2n+1}\le0\). Also \(u_n-t_n=a_{2n}\ge 0\). So we have \((t_n)\) increasing, \((u_n)\) decreasing and \(u_n\ge t_n\). Then by the monotone convergence theorem \(t_n\) converges to some \(t\) and \(u_n\) converges to some \(u\). Also \(u-t=\lim(u_n-t_n)=\lim a_{2n}=0\). So \(t=u\). This is saying \(\lim s_{2n}=\lim s_{2n+1}=t\). We claim that \(\lim s_n=t\). For any \(\epsilon>0\), there is \(N\in\mathbb N\) such that \(\forall k>N\), \(|s_{2k}-t|<\epsilon\) and \(|s_{2k+1}-t|<\epsilon\). So for all \(n>2N+1\): if \(n\) is even then \(n=2k\) with \(k>N\), otherwise \(n\) is odd and \(n=2k+1\) with \(k>N\); in either case \(|s_n-t|<\epsilon\).

I would like to point out that knowing the even-indexed subsequence and the odd-indexed subsequence both converge to the same real number is indeed sufficient for convergence of the whole sequence (essentially because there are not many ways of getting larger and larger in the natural numbers), but this requires the reasoning above. It is not generally true that a whole limit exists just because we choose one (or several) ways of letting \(n\) get large. This matters particularly when dealing with complex functions.

The prototypical example is the alternating harmonic series: \[1-\frac12+\frac13-\frac14+\cdots\] Sometimes, just to emphasise that the series converges but not absolutely, we say it converges conditionally. The terminology will become clearer once we know about rearranging series.

The series \(\sum_{n=1}^\infty\frac{x^n}n\) converges if and only if \(x\in[-1,1)\)

  • If \(x>0\) then \(\lim\frac{\frac{x^{n+1}}{n+1}}{\frac{x^n}{n}}=\lim\frac{xn}{n+1}=x\). So by the ratio test, if \(x < 1\) the series converges, and if \(x>1\) the series diverges. If \(x=1\), it is the harmonic series, which diverges. For \(x<0\), we write \(y=-x>0\). The series is \(\sum\frac{(-1)^ny^n}n\). For \(y\le1\) we have \(\frac{y^n}n\ge\frac{y^{n+1}}{n+1}\) and \(\lim \frac{y^n}n=0\), so by the previous theorem the series converges. On the other hand, for \(y>1\), \(\frac {y^n} n\) does not tend to zero, so the series diverges.

Rearranging series

[Illustrative example] Let us work with \(1-\frac12+\frac13-\frac14+\cdots=:\sum_{n=1}^\infty(-1)^{n-1}\frac1n\). Note we are fixing the order of summation, namely we are considering the limit of the partial sums \(s_n:=\sum_{i=1}^n(-1)^{i-1}\frac1i\), converging to some \(s\). But what if we use a different order of summation: \[1-\frac12-\frac14+\frac13-\frac16-\frac18+\frac15-\frac1{10}-\frac1{12}+\cdots\] If we denote the partial sum by \(t_n\), then

\[t_{3n}=\left(1-\frac12-\frac14\right)+\left(\frac13-\frac16-\frac18\right)+\cdots+\left(\frac1{2n-1}-\frac1{4n-2}-\frac1{4n}\right)\\ =\frac12\left(1-\frac12\right)+\frac12\left(\frac13-\frac14\right)+\cdots+\frac12\left(\frac1{2n-1}-\frac1{2n}\right)=\frac12s_{2n}\] Also \(t_{3n+1}=t_{3n}+\frac1{2n+1}\) and \(t_{3n+2}=t_{3n+1}-\frac1{4n+2}\). So \(\lim t_{3n}=\frac12s\), \(\lim t_{3n+1}=\lim t_{3n}+\lim\frac1{2n+1}=\frac12s\) and \(\lim t_{3n+2}=\lim t_{3n+1}-\lim\frac1{4n+2}=\frac12s\). With an argument like the one in the last section, we conclude \(\lim t_n=\frac12 s\).

So just by rearranging the series, we have halved its sum. Similarly, by rearranging, we can make the series diverge. This is why we call it conditional convergence.
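A numerical sketch of this example (using the known value \(s=\ln 2\) of the alternating harmonic series, which these notes do not prove):

```python
import math

def t_blocks(n):
    # Partial sums of 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + ..., in whole blocks of three
    total = 0.0
    for k in range(1, n + 1):
        total += 1 / (2 * k - 1) - 1 / (4 * k - 2) - 1 / (4 * k)
    return total

print(t_blocks(100_000), math.log(2) / 2)  # the rearrangement converges to s/2
```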

Let us now work with more generality:

We say \(\sum_{n=0}^\infty b_n\) is a rearrangement of \(\sum_{n=0}^\infty a_n\) if there is a bijection \(\pi:\mathbb N\to\mathbb N\) such that \(b_n=a_{\pi(n)}\) for all \(n\in\mathbb N\). Of course, if our series starts at \(i\), then a rearrangement is a bijection \(\pi:\{i,i+1,\cdots\}\to\{i,i+1,\cdots\}\) such that \(b_n=a_{\pi(n)}\) for all \(n\ge i\).

[Rearrangement Theorem I] Let \(\sum a_n\) be an absolutely convergent series converging to \(s\). Then every rearrangement also converges to \(s\).

  • First, we restrict our attention to the case where \(a_n\ge 0\) for all \(n\). Suppose \(b_1+b_2+\cdots\) is a rearrangement of \(a_1+a_2+a_3+\cdots\). Let \(s_n\) be partial sums for \(a\)'s and \(t_n\) be partial sums for \(b\)'s. Then both \((s_n)\) and \((t_n)\) are increasing sequences and \(\lim s_n=s\). So \(s_n\le s\). Also \[t_n=b_1+b_2+\cdots+b_n=a_{\pi(1)}+a_{\pi(2)}+\cdots+a_{\pi(n)}\] So we also have \(t_n \le s_{\max\{\pi(i)|i=1,\cdots,n\}}\le s\). So we know \(t_n\) converges. Say \(\lim t_n=t\). Then \(t\le s\). But similarly \[s_n=a_1+a_2+\cdots+a_n=b_{\pi^{-1}(1)}+\cdots+b_{\pi^{-1}(n)} \le t_{\max\{\pi^{-1}(i)|i=1,\cdots,n\}}\] We can conclude in a similar manner that \(s\le t\). So we have \(s=t\).

  • In general, we can consider the positive part and negative part of \(a_n\), namely \(a_n^+:=\frac{a_n+|a_n|}{2}\) and \(a_n^-:=\frac{-a_n+|a_n|}2\). This is:

\[\begin{equation} a_n^+:=\left\{ \begin{aligned} &a_n & a_n\ge 0\\ &0 & a_n < 0 \end{aligned} \right.\\ a_n^-:=\left\{ \begin{aligned} &0 & a_n\ge 0\\ &-a_n & a_n<0 \end{aligned} \right. \end{equation}\]

By comparison test both \(\sum a_n^+\) and \(\sum a_n^-\) converges (absolutely) because \(0\le a_n^+\le|a_n|\) and \(0\le a_n^-\le|a_n|\). Let \(a^+:=\sum a_n^+\) and \(a^-:=\sum a_n^-\). Note that \(a_n=a_n^+-a_n^-\). So \[s=\sum a_n=\sum \left(a_n^+-a_n^-\right)=a^+-a^-\]

Then if \(b_1+b_2+\cdots\) is a rearrangement of \(a_1+a_2+\cdots\), let us write \(b_n^+\) and \(b_n^-\) for positive part and negative part of \(b_n\). Then by our previous discussion \(b_1^++b_2^++\cdots=a_1^++a_2^++\cdots\) and \(b_1^-+b_2^-+\cdots=a_1^-+a_2^-+\cdots\) and \(b_n=b_n^+-b_n^-\). So \[ b_1+b_2+\cdots=\sum b_n=\sum\left(b_n^+-b_n^-\right)\\ =\sum b_n^+-\sum b_n^-=\sum a_n^+-\sum a_n^-\\ =\sum\left(a_n^+-a_n^-\right)=\sum a_n=s \]

[Rearrangement Theorem II] Let \(\sum a_n\) be a conditionally convergent series. Then

  • for any \(s\in\mathbb R\), there is a rearrangement \(\sum b_n\) such that \(\sum b_n=s\).

  • there is a rearrangement \(\sum b_n\) such that the series diverges to \(+\infty\)

  • there is a rearrangement \(\sum b_n\) such that the series diverges to \(-\infty\).

Before we prove this fun little theorem, let us see what these two theorems are saying. The first one says that all rearrangements of an absolutely convergent series converge to the same number. The second says that for a series which converges but not absolutely, rearrangement can make the series converge to anywhere, or nowhere. This is the meaning of what we call conditional convergence: the convergence depends on the ordering, while absolute convergence is independent of the ordering.

  • Let us prove the cases for which none of \(a_n\) is zero and \(s>0\).

  • Let \((a_{n_k})\) and \((a_{m_j})\) be the subsequence of positive terms and the subsequence of negative terms respectively.

  • Then both of the series \(\sum_k a_{n_k}\) and \(\sum_j a_{m_j}\) diverge; for otherwise

  • they both converges
  • \((a_{n_k})\) converges but \((a_{m_j})\) diverges
  • \((a_{n_k})\) diverges but \((a_{m_j})\) converges

but none is possible

  • then \(\sum_n |a_n|=\sum_k a_{n_k}-\sum_j a_{m_j}\) converges; but we assumed \(\sum_n a_n\) converges only conditionally;
  • \(\sum_n a_n=\sum_k a_{n_k}+\sum_j a_{m_j}\) diverges;
  • \(\sum_n a_n=\sum_k a_{n_k}+\sum_j a_{m_j}\) diverges

  • Since the partial sums of these two series are monotone, the series diverge to \(+\infty\) and \(-\infty\) respectively.

  • Then there exists \(k_1\) such that \[ a_{n_1}+a_{n_2}+\cdots+a_{n_{k_1-1}}<s\le a_{n_1}+a_{n_2}+\cdots+a_{n_{k_1}} \]

  • Then there exists \(j_1\) such that (remember the \(a_{m_j}\) are negative) \[ a_{n_1}+a_{n_2}+\cdots+a_{n_{k_1}}+\\a_{m_1}+a_{m_2}+\cdots+a_{m_{j_1+1}} \\< s \le\\ a_{n_1}+a_{n_2}+\cdots+a_{n_{k_1}}+\\a_{m_1}+a_{m_2}+\cdots+a_{m_{j_1}} \]

  • Continue this way we get our rearrangement:

\[ \begin{equation} \pi(i)=\left\{ \begin{aligned} & n_i & 1\le i \le k_1\\ & m_{i-k_1} & k_1 < i \le k_1+j_1\\ & n_{i-j_1} & k_1+j_1 < i \le k_1+j_1+k_2\\ & \cdots \end{aligned} \right. \end{equation} \]

  • Note that \(|s-(a_{n_1}+a_{n_2}+\cdots+a_{n_{k_1}})|\le a_{n_{k_1}}\) and \(|s-(a_{n_1}+a_{n_2}+\cdots+a_{n_{k_1}}+a_{m_1}+a_{m_2}+\cdots+a_{m_{j_1+1}})|\le-a_{m_{j_1+1}}\) etc.: at each crossing of \(s\), the error is at most the last term added, and in between the partial sums stay between the two crossing values. As \(\sum_n a_n\) converges we have \(\lim_n a_n=0\), so \(\lim_k a_{n_k}=\lim_j a_{m_j}=0\). So the rearranged series converges to \(s\).

  • For \(\sum a_n\) with some \(a_n=0\): if the number of zero terms is finite, say \(m\), first let \(\sum b_n\) be a rearrangement which moves all the zeros to the front; then apply the argument above to rearrange \(b_n\) from \(m+1\) onwards, keeping the leading zeros in place. If the number of zero terms is infinite, use the above argument to rearrange the nonzero terms into \(b_1+b_2+\cdots\) with \(\sum b_n=s\), then define \(c_{2k+1}:=b_k\) and \(c_{2k}:=0\). Then we have rearranged all of \((a_n)\) and \(\sum c_n = s\).

  • For \(s<0\), and for divergence to \(\pm\infty\), the proof is similar. (A numerical sketch of the greedy construction follows.)
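The construction in the proof is a greedy algorithm, and it is easy to run. A minimal sketch for \(\sum(-1)^{n-1}\frac1n\) (the function name is mine):

```python
def rearrange_to(target, n_terms):
    # Add unused positive terms 1, 1/3, 1/5, ... while the sum is <= target,
    # and unused negative terms -1/2, -1/4, ... while it is > target.
    pos, neg, total = 1, 2, 0.0
    for _ in range(n_terms):
        if total <= target:
            total += 1 / pos
            pos += 2
        else:
            total -= 1 / neg
            neg += 2
    return total

print(rearrange_to(0.123, 100_000))   # ~0.123
print(rearrange_to(-3.0, 100_000))    # ~-3.0: any target is reachable
```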

If you find the above theorem hard to follow, don't worry too much about it; I wasn't very careful when proving it. The theorem is more of a demonstration, or a warning: do not rearrange a conditionally convergent series.

Cauchy product of series

Let \(\sum_{n=0}^\infty a_n\) and \(\sum_{n=0}^\infty b_n\) be two convergent series with sums \(A\) and \(B\), at least one of them absolutely convergent. Then the series \(\sum c_n\) converges to \(AB\), where \[ c_n=\sum_{i=0}^na_ib_{n-i} \]

Assume \(\sum a_n\) is the absolutely convergent one. Let us write \(A_n\) for the partial sums of \(\sum_n a_n\), \(B_n\) for the partial sums of \(\sum_n b_n\) and \(S_n\) for the partial sums of \(\sum_n c_n\).

  • \(AB = (A-A_n)B+A_nB\)
  • \(S_n= \sum_{k=0}^n a_kB_{n-k}\) because \(\sum_{k=0}^n a_kB_{n-k}=\sum_{k=0}^n a_k\sum_{i=0}^{n-k}b_i=\sum_{k=0}^n\sum_{i=0}^{n-k}a_kb_i=\sum_{k=0}^nc_k\).
  • So \(AB-S_n=(A-A_n)B+A_nB-\sum_{k=0}^n a_kB_{n-k}=(A-A_n)B+\sum_{k=0}^{n}a_k(B-B_{n-k})\)
  • We know that \(\lim_n(A-A_n)=0\)
  • If we set \(N:=[\frac n2]\). Then \(\sum_{k=0}^{n}a_k(B-B_{n-k})=\sum_{k=0}^{N}a_k(B-B_{n-k})+\sum_{k=N+1}^{n}a_k(B-B_{n-k})\).
  • \(|\sum_{k=0}^{N}a_k(B-B_{n-k})|\le\sum_{k=0}^{N}|a_k||B-B_{n-k}|\le\sup\{|B-B_k|\,:\,k\ge n-N\}\sum_{k=0}^N|a_k|\). Since \(\lim_k B_k=B\), the sequence \((B-B_k)\) is bounded, say \(|B-B_k|\le M\), and \(\sup\{|B-B_k|:k\ge n-N\}\to0\) as \(n\to\infty\) (note \(n-N\to\infty\)), so \(|\sum_{k=0}^{N}a_k(B-B_{n-k})|\to 0\). Also \(|\sum_{k=N+1}^{n}a_k(B-B_{n-k})|\le\sum_{k=N+1}^{n}|a_k||B-B_{n-k}|\le M\sum_{k=N+1}^\infty|a_k|\). Since \(\sum |a_n|\) converges, \(\lim_n|\sum_{k=N+1}^{n}a_k(B-B_{n-k})| = 0\). So \(\lim_n|\sum_{k=0}^{n}a_k(B-B_{n-k})|=0\).
  • So \(\lim_n (AB-S_n)=(\lim_n(A-A_n))B+\lim_n \sum_{k=0}^{n}a_k(B-B_{n-k})=0\).

Last time we proved (in this example we are writing \(0^0=1\), so that the series below can start at index \(0\), making everything slightly easier) \[ e(x):=\sum_{n=0}^\infty\frac1{n!}x^n \] converges absolutely for all \(x\in\mathbb R\).

Let us compute \(e(x)e(y)=\left(\sum_{n=0}^\infty\frac1{n!}x^n\right)\left(\sum_{n=0}^\infty\frac1{n!}y^n\right)\). Then the \(n\)-th term of \(e(x)e(y)\) is

\[ \sum_{i=0}^n\frac{x^i y^{n-i}}{i!(n-i)!} \] The \(n\)-th term of \(e(x+y)\) is

\[ \frac{(x+y)^n}{n!}=\frac1{n!}\sum_{i=0}^n{n\choose i}x^iy^{n-i}=\frac1{n!}\sum_{i=0}^n\frac{n!}{i!(n-i)!}x^iy^{n-i}=\sum_{i=0}^n\frac{x^i y^{n-i}}{i!(n-i)!} \] So we have \(e(x)e(y)=e(x+y)\).
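A numerical check of \(e(x)e(y)=e(x+y)\) through the Cauchy product, as a quick sketch:

```python
import math

def cauchy_product_partial(x, y, N):
    # Partial sum of the Cauchy product of the two exponential series.
    total = 0.0
    for n in range(N + 1):
        total += sum((x ** i / math.factorial(i)) * (y ** (n - i) / math.factorial(n - i))
                     for i in range(n + 1))
    return total

x, y = 0.7, 1.1
print(cauchy_product_partial(x, y, 30), math.exp(x + y))  # agree to high precision
```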

Power series

A power series is a series of the form \(\sum_n a_nx^n\) where the \(a_n\) and \(x\) are real numbers.

\[ \sum_{n=0}^\infty\frac1{n!}x^n \] is a power series convergent for all \(x\in\mathbb R\).

\[ \sum_{n=1}^\infty n^nx^n \] is a power series convergent only for \(x=0\). Indeed, for \(x\ne0\), \(n^nx^n\not\to 0\).

\[ \sum_{n=0}^\infty x^n \] converges if and only if \(x\in(-1,1)\).

\[ \sum_{n=1}^\infty(-1)^{n-1}\frac{x^n}n \] converges if and only if \(x\in(-1,1]\) (at \(x=-1\) every term is \(-\frac1n\), giving the divergent harmonic series)

Let \(r\in\mathbb R\) be such that \(\sum a_nr^n\) converges. Then the power series \(\sum a_n x^n\) converges absolutely for all \(|x|<|r|\).

If \(r=0\) there is nothing to prove. So assume \(r\ne0\). Since \(\sum a_n r^n\) converges, we have \(\lim a_nr^n=0\). So the sequence \((a_nr^n)\) is bounded, say by \(K\). Let \(x\in\mathbb R\) with \(|x|<|r|\). Let \(y:=\frac{|x|}{|r|}\). Then \(0\le y<1\). So \(|a_n x^n|=|a_n||x|^n=|a_n||r^n|y^n\le Ky^n\). So the series \(\sum |a_n x^n|\) converges by comparison test.

Let \(\sum a_nx^n\) be a power series. Then exactly one of the following occurs:

  1. The series converges only when \(x=0\);
  2. The series converges absolutely for all \(x\in\mathbb R\);
  3. There is some \(r>0\) such that the series converges absolutely for all \(x\in\mathbb R\) such that \(|x|<r\) and divergent for all \(x\in\mathbb R\) such that \(|x|>r\).

Let

\[ E:=\{x\in\mathbb R|x\ge 0\land \sum a_nx^n \mathrm{ converges}\} \] Obviously \(0\in E\). If \(E=\{0\}\), then 1 is true. Otherwise there is some \(0<\alpha\in E\), and by the previous theorem the power series converges absolutely for all \(y\in\mathbb R\) with \(|y|<\alpha\). If \(E\) is not bounded above, then any \(0<y\in\mathbb R\) is not an upper bound for \(E\), so there is \(x\in E\) with \(x>y\), and the power series converges absolutely at \(y\) by the previous theorem. Thus the series converges absolutely for all \(y\in\mathbb R\), so 2 is true in this case. Otherwise \(0<r:=\sup E\) exists. For any \(x\in(-r,r)\) we can find some \(y\in E\) with \(|x|<y\); then \(\sum a_n y^n\) converges, so \(\sum a_n x^n\) converges absolutely by the previous theorem. If \(|x|>r\), there is \(y\in\mathbb R\) with \(r<y<|x|\) and \(y\notin E\), so the series does not converge at \(y\); then the series cannot converge at \(x\) either, for otherwise by the previous theorem it would converge absolutely at \(y\), which is not true. So 3 is true.

Naturally only one of the 3 cases can be true.

The radius of convergence \(R\) of the power series \(\sum a_n x^n\) is defined as follows:

  • if the power series converges for all \(x\in\mathbb R\), then \(R:=\infty\)
  • if the power series converges only for \(0\), then \(R:=0\)
  • if the power series converges absolutely for \(|x|<r\) and diverges for \(|x|>r\) for some \(0<r<\infty\), then \(R:=r\)

\[R=\frac1{\limsup_{n\to\infty}|a_n|^{1/n}}\] (with the conventions \(\frac10:=\infty\) and \(\frac1\infty:=0\))

  • Left as an exercise. See Rudin and others (1964) Theorem 3.39.
  • Or use root test.
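A rough numerical estimate of the formula, working with \(\log|a_n|\) to avoid overflow (a sketch; taking the max over a late tail stands in for the \(\limsup\), and the function name is mine):

```python
import math

def radius_estimate(log_abs_coeff, n_max=500):
    # 1 / limsup |a_n|^(1/n), with the limsup replaced by a max over a tail.
    tail = [math.exp(log_abs_coeff(n) / n) for n in range(n_max // 2, n_max)]
    return 1 / max(tail)

print(radius_estimate(lambda n: 0.0))                          # a_n = 1: R = 1
print(radius_estimate(lambda n: n * math.log(2 + (-1) ** n)))  # a_n = (2+(-1)^n)^n: R = 1/3
```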

\[1-\frac{x^2}{2!}+\frac{x^4}{4!}-\frac{x^6}{6!}+\cdots\] has radius of convergence \(R=\infty\).

Set \(y=x^2\) then the series is \[a_0+a_1y+a_2y^2+\cdots\] where \(a_n=\frac{(-1)^n}{(2n)!}\).

Using the ratio test we have \(\lim\frac{|a_{n+1}y^{n+1}|}{|a_n y^n|}=\lim\frac{|y|}{(2n+1)(2n+2)}=0\). So for any \(y\ge 0\) the series converges, and so for all \(x\in\mathbb R\) the original series converges.

\[x-\frac{x^3}{3!}+\frac{x^5}{5!}-\frac{x^7}{7!}+\cdots\] has infinite radius of convergence. The proof is similar to the previous one.

\[\sum_{n=0}^\infty\frac{(-1)^nx^n}{2n+1}\] has radius of convergence 1.

We use ratio test: \[\lim\frac{|a_{n+1}x^{n+1}|}{|a_n x^n|}=\lim\frac{|x|(2n+1)}{2n+3}=|x|\] So the series is absolutely convergent if \(|x|<1\) and diverges if \(|x|>1\).

For the sake of completeness let us see what happens when \(|x|=1\). For \(x=1\) we have \(1-\frac13+\frac15-\frac17+\cdots\), which converges by the alternating series test. For \(x=-1\) we have \(1+\frac13+\frac15+\frac17+\cdots\), which diverges by the comparison test with the harmonic series.

Summary

  • This lecture together with the last one concludes all we want to say about series;
  • We learned a test for alternating series;
  • We learned that it is safe to rearrange absolutely convergent series;
  • We learned that it is dangerous to rearrange conditionally convergent series;
  • We learned (basic) power series and radius of convergence.

Basic Topology on \(\mathbb R^k\) (Lecture 8)

Last lecture we concluded our study of sequences and series. From this lecture on, we will focus on functions and their properties, such as continuity, differentiability, integrability etc. In this lecture we will focus on the topology of \(\mathbb R^k\), to lay a strong foundation for the study of functions.

Metric Spaces

A set \(X\) whose elements we shall call points is said to be a metric space if for any points \(p\) and \(q\) there is a real number \(d(p,q)\) called the distance from \(p\) to \(q\), such that

  • [M1] \(d(p,q)>0\) if \(p\ne q\);
  • [M2] \(d(p,p) = 0\);
  • [M3] \(d(p,q)=d(q,p)\);
  • [M4] \(d(p,q)\le d(p,r)+d(r,q)\).

Any function \(X\times X\to \mathbb R\) satisfying above is called a metric on \(X\).

The most important example of a metric space is \(\mathbb R^k\), with the distance between \(x=(x_1,\dots,x_k)\) and \(y=(y_1,\dots,y_k)\) being \(|x-y|\), where \(|x|=\sqrt{\sum_{i=1}^k x_i^2}\).

Let us check it is a metric: M1–M3 are trivial, and M4 is a consequence of the Cauchy-Schwarz inequality:

[Cauchy-Schwarz inequality] \[ |x\cdot y|\le|x||y| \] where \(x\cdot y=\sum_{i=1}^k x_iy_i\)

Let us write \(A=\sum |x_i^2|\), \(B=\sum|y_i^2|\) and \(C=\sum x_i y_i\). If \(B=0\), then \(y_1=y_2=\dots=y_k=0\) and there is nothing to prove. Otherwise \(B > 0\), then \[ \begin{aligned} 0\le \sum|Bx_i-Cy_i|^2&=\sum(Bx_i-Cy_i)(Bx_i-Cy_i)\\&=\sum B^2|x_i^2|-BC\sum x_i y_i-BC\sum x_i y_i+C^2\sum |y_i|^2\\ &= B^2A-BC^2-BC^2+C^2B\\ &=B(AB-C^2) \end{aligned} \] Thus \(0\le B(AB-C^2)\) then \(0 \le AB-C^2\) or \(C^2\le AB\). So \[ |\sum x_i y_i|\le \sqrt{\sum |x_i^2|}\sqrt{\sum|y_i^2|} \] This is exactly what we want.

So from the Cauchy-Schwarz inequality we have \[ \begin{aligned} |x+y|^2&=(x+y)\cdot(x+y)\\ &=x\cdot x+2x\cdot y+y\cdot y\\ &\le x\cdot x+2|x\cdot y|+y\cdot y\\ &= |x|^2+2|x||y|+|y|^2\\ &=(|x|+|y|)^2 \end{aligned} \] Thus \(|x+y|\le |x|+|y|\). Then \(|x-y|=|(x-z)+(z-y)|\le|x-z|+|z-y|\), as required.
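A quick randomized sanity check of both inequalities in \(\mathbb R^4\), as a sketch:

```python
import math
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(4)]
    y = [random.uniform(-5, 5) for _ in range(4)]
    assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-9                       # Cauchy-Schwarz
    assert norm([a + b for a, b in zip(x, y)]) <= norm(x) + norm(y) + 1e-9  # triangle
print("both inequalities hold on 1000 random samples")
```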

If \((X, d)\) is a metric space and \(Y\subseteq X\), then \((Y, d|_{Y\times Y})\) is also a metric space.

Topology induced by the metric

  • If \(a_i<b_i\) for \(i=1,\dots,k\), then the set of all points \(x=(x_1,\dots,x_k)\in\mathbb R^k\) with \(x_i\in[a_i,b_i]\) for all \(i\) is called a \(k\)-cell. (So a 1-cell is a closed interval, a 2-cell is a rectangle etc.)
  • If \(x\in\mathbb R^k\) and \(r>0\), the open (or closed) ball \(B\) with center at \(x\) and radius \(r\) is the set of all \(y\in\mathbb R^k\) such that \(|x-y|<r\) (or \(|x-y|\le r\)).
  • \(E\subset \mathbb R^k\) is convex if \(\forall x, y\in E,\forall \lambda\in(0,1)\) we have \(\lambda x+(1-\lambda)y\in E\)

Balls are convex: if \(|y-x|<r\), \(|z-x|<r\) and \(0<\lambda<1\), we have \[ \begin{aligned} |\lambda y+(1-\lambda)z - x| &= |\lambda (y-x)+(1-\lambda)(z-x)| \\ &\le \lambda |y-x|+(1-\lambda)|z-x| \\ &< \lambda r+(1-\lambda) r\\ &=r \end{aligned} \] Similarly, cells are convex as well.

Let \((X,d)\) be a metric space. All points and sets mentioned below are understood to be elements and subsets of \(X\).

  • A neighborhood of \(p\) is a set \(N_r(p)\) consisting of all \(q\) such that \(d(p,q)<r\), for some \(r>0\); \(r\) is called the radius of \(N_r(p)\).
  • A point \(p\) is a limit point of a set \(E\) if every neighborhood of \(p\) contains a point \(q\ne p\) such that \(q\in E\).
  • If \(p\in E\) but is not a limit point of \(E\), then \(p\) is called an isolated point of \(E\)
  • \(E\) is closed iff every limit point of \(E\) is a point of \(E\)
  • A point \(p\) is an interior point of \(E\) if there is a neighborhood \(N\) of \(p\) such that \(N\subseteq E\).
  • \(E\) is open iff every point of \(E\) is an interior point of \(E\).
  • \(E\) is perfect iff \(E\) is closed and every point of \(E\) is a limit point of \(E\)
  • \(E\) is bounded if there is a real number \(M\) and a point \(q\in X\) such that \(\forall p\in E, d(p,q)<M\).
  • \(E\) is dense in \(X\) if every point of \(X\) is either a limit point of \(E\) or a point of \(E\) (or both).

Every neighborhood is an open set

Consider \(E=N_r(p)\) and \(q\in E\). Then there is a positive real number \(h\) such that \(d(p,q)=r-h\). For all points \(s\) such that \(d(q,s) < h\) we have \[ d(p,s)\le d(p,q) + d(q,s) < r-h+h=r \] so \(s\in N_r(p)\). Hence \(q\) is an interior point of \(E\), and \(E\) is open.

If \(p\) is a limit point of a set \(E\), then every neighbourhood of \(p\) contains infinitely many points of \(E\)

Suppose there is a neighbourhood \(N\) of \(p\) which contains only a finite number of points of \(E\). Let \(q_1,\dots,q_n\) be the points of \(N\cap E\) that are not \(p\), and put \(r = \min_{1\le m\le n} d(p,q_m)\). Then \(r>0\). The neighbourhood \(N_r(p)\) contains no point \(q\) of \(E\) with \(q\ne p\), so \(p\) is not a limit point of \(E\), a contradiction.

Hence a finite set has no limit points, and so a finite set is (vacuously) closed.

Let us consider the following subsets of \(\mathbb R^2\)

  1. The set of all \(z\in \mathbb R^2\) such that \(|z|<1\).
  2. The set of all \(z\in \mathbb R^2\) such that \(|z|\le 1\).
  3. A nonempty finite set.
  4. \(\mathbb Z\)
  5. \(E=\{\frac 1 {n+1}|n\in\mathbb N\}\). Then \(0\) is a limit point of \(E\), but no point of \(E\) is a limit point of \(E\).
  6. \(\mathbb R^2\)
  7. \(\{x|a< x < b\}\)

Then

|   | Closed | Open | Perfect | Bounded |
|---|--------|------|---------|---------|
| 1 | No     | Yes  | No      | Yes     |
| 2 | Yes    | No   | Yes     | Yes     |
| 3 | Yes    | No   | No      | Yes     |
| 4 | Yes    | No   | No      | No      |
| 5 | No     | No   | No      | Yes     |
| 6 | Yes    | Yes  | Yes     | No      |
| 7 | No     | *    | No      | Yes     |

(*) the segment \((a,b)\) is open as a subset of \(\mathbb R\) but not open as a subset of \(\mathbb R^2\).

A set \(E\) is open if and only if its complement is closed

Suppose \(E^c\) is closed. Let \(x\in E\); then \(x\notin E^c\) and \(x\) is not a limit point of \(E^c\). Hence there is a neighbourhood \(N\) of \(x\) such that \(E^c\cap N\) is empty, that is, \(N\subseteq E\). Hence \(x\) is an interior point of \(E\). So \(E\) is open.

Next suppose \(E\) is open. Let \(x\) be a limit point of \(E^c\). Then every neighbourhood of \(x\) contains a point of \(E^c\), so \(x\) is not an interior point of \(E\). Since \(E\) is open, this means \(x\notin E\), i.e. \(x\in E^c\). Hence \(E^c\) is closed.

A set \(F\) is closed if and only if its complement is open

Recall the generalised de Morgan Law: \[ \left(\bigcup_\alpha E_\alpha\right)^c=\bigcap_\alpha E_\alpha^c \]

  1. For any collection \(\{G_\alpha\}\) of open sets, \(\bigcup_\alpha G_\alpha\) is open;
  2. For any collection \(\{F_\alpha\}\) of closed sets, \(\bigcap_\alpha F_\alpha\) is closed;
  3. For any finite collection \(G_1,\dots,G_n\) of open sets, \(\bigcap_{i=1}^n G_i\) is open;
  4. For any finite collection \(F_1,\dots,F_n\) of closed sets, \(\bigcup_{i=1}^n F_i\) is closed.
  1. For simplicity, write \(G=\bigcup_\alpha G_\alpha\). If \(x\in G\), then for some \(\alpha\) \(x\in G_\alpha\). Then \(x\) is an interior point of \(G_\alpha\) hence an interior point of \(G\). Hence \(G\) is open.
  2. Then \(\{F_\alpha^c\}\) is a collection of open sets, then \(\bigcup_\alpha F_\alpha^c\) is open then \((\bigcup_\alpha F_\alpha^c)^c\) is closed, but \((\bigcup_\alpha F_\alpha^c)^c=\bigcap_\alpha (F_\alpha^c)^c=\bigcap_\alpha F_\alpha\).
  3. For simplicity write \(H=\bigcap_{i=1}^n G_i\). For any \(x\in H\), there exist neighbourhoods \(N_i\) of \(x\) with radii \(r_i\) such that \(N_i\subseteq G_i\). Let \(r=\min(r_1,\dots,r_n)\) and let \(N\) be the neighbourhood of \(x\) with radius \(r\). Then \(N\subseteq N_i\subseteq G_i\) for all \(i=1,\dots,n\). Hence \(N\subseteq H\), so \(H\) is open.
  4. Then \(F_i^c\) are open, then \(\bigcap_{i=1}^nF_i^c\) is open then \(\left(\bigcap_{i=1}^nF_i^c\right)^c\) is closed. \(\left(\bigcap_{i=1}^nF_i^c\right)^c=\bigcup_{i=1}^n\left(F_i^c\right)^c=\bigcup_{i=1}^nF_i\)

In 3 and 4 of the previous theorem, the finiteness condition is essential. For example, let \(G_n:=\left(-\frac1n,\frac1n\right)\) for \(n=1,2,\dots\). Then \(G_n\) is open as a subset of \(\mathbb R\) for all \(n\), but \(G=\bigcap_{n=1}^\infty G_n\) contains only \(0\) and is therefore not open.

Similarly, the union of an infinite collection of closed sets need not be closed. For example, for each point \(x\in\mathbb Q\), \(\{x\}\) is closed, but \(\mathbb Q\) is not: \(\pi\) is a limit point of \(\mathbb Q\) but not a rational number. (\(\pi\) being irrational is actually hard to prove.)

Let \((X,d)\) be a metric space, \(E\subseteq X\), and let \(E'\) be the set of all limit points of \(E\) in \(X\). Then the closure of \(E\) is \(\bar E:=E\cup E'\).

Let \((X,d)\) be a metric space and \(E\subseteq X\), then

  1. \(\bar E\) is closed;
  2. \(E=\bar E\) if and only if \(E\) is closed;
  3. \(\bar E\subseteq F\) for every closed set \(F\subseteq X\) such that \(E\subseteq F\).
  1. Let \(p\in X\) such that \(p\notin \bar E\). Then \(p\) is neither a point of \(E\) nor a limit point of \(E\), hence \(p\) has a neighbourhood which does not intersect \(E\). The complement of \(\bar E\) is therefore open. Hence \(\bar E\) is closed.
  2. Left to right is trivial by 1; if \(E\) is closed, then \(E'\subseteq E\), so \(\bar E=E\).
  3. If \(F\) is closed and \(E\subseteq F\), then \(F'\subseteq F\), hence \(E'\subseteq F'\subseteq F\), hence \(\bar E\subseteq F\).

By the previous theorem, \[\bar E=\bigcap_{\substack{F\in\mathcal P(X)\\ F \textrm{ closed}\\ E\subseteq F}} F\]

Let \(E\) be a nonempty set of real numbers which is bounded above. Let \(y=\sup E\). Then \(y\in\bar E\). Hence \(y\in E\) if \(E\) is closed

If \(y\in E\) then certainly \(y\in\bar E\). Assume \(y\notin E\). For every \(h>0\), there exists a point \(x\in E\) such that \(y-h < x < y\), for otherwise \(y-h\) would be an upper bound of \(E\). Thus \(y\) is a limit point of \(E\). Then \(y\in\bar E\).

Let \((X, d)\) be a metric space and suppose \(E\subseteq Y\subseteq X\). Then \(Y\) is also a metric space, by restricting \(d\) to \(Y\times Y\). To say \(E\) is open in \(X\) means that to each point \(p\in E\) there is associated a positive number \(r\) such that \(\forall q\in X,\ d(p,q)< r \implies q\in E\). We can also say \(E\) is an open subset of \(Y\), or open relative to \(Y\); that is, for each \(p\in E\) there is associated a positive number \(r\) such that \(\forall q\in Y,\ d(p,q)< r\implies q\in E\). We saw in a previous example that open relative to \(Y\) doesn't necessarily mean open relative to \(X\). But we still have the following theorem.

Suppose \(Y\subseteq X\). \(E\subseteq Y\) is open relative to \(Y\) if and only if \(E=Y\cap G\) for some \(G\subseteq X\) open in \(X\).

From right to left is easier: if \(G\) is open in \(X\) and \(E=G\cap Y\), then every \(p\in E\) has a neighbourhood \(V_p\subseteq G\), and then \(V_p\cap Y\subseteq G\cap Y=E\). So \(E\) is open relative to \(Y\). From left to right: suppose \(E\) is open relative to \(Y\). For each \(p\in E\), choose \(r_p>0\) such that \(q\in E\) whenever \(q\in Y\) and \(d(p,q)<r_p\), and let \(G=\bigcup_{p\in E}N_{r_p}(p)\). Then \(G\) is open in \(X\), clearly \(E\subseteq Y\cap G\), and the choice of the \(r_p\) gives \(Y\cap G\subseteq E\).

Compact sets

  • By an open cover of a set \(E\) in a metric space \((X, d)\) we mean a collection \(\{G_\alpha\}\) of open subsets of \(X\) such that \(E\subseteq \bigcup_\alpha G_\alpha\).
  • A subset \(K\subseteq X\) is said to be compact if and only if every open cover of \(K\) contains a finite subcover. More explicitly, if \(\{G_\alpha\}\) is an open cover of \(K\), then there are finitely many indices \(\alpha_1,\dots,\alpha_n\) such that \(K\subseteq G_{\alpha_1}\cup\dots\cup G_{\alpha_n}\).

Obviously every finite subset is compact.

We have seen that if \(E\subseteq Y\subseteq X\), then \(E\) might be open relative to \(Y\) without being open relative to \(X\). However, compactness behaves better:

Suppose \(K\subseteq Y\subseteq X\). Then \(K\) is compact relative to \(X\) if and only if \(K\) is compact relative to \(Y\).

Because of this theorem, we are able, in many situations, to regard compact sets as metric spaces in their own right, without paying attention to any ambient space. In particular, although it makes little sense to talk about open spaces or closed spaces (every metric space \(X\) is an open and closed subset of itself), it does make sense to talk about compact metric spaces.

Suppose \(K\) is compact relative to \(X\) and let \(\{V_\alpha\}\) be a collection of sets open relative to \(Y\) such that \(K\subseteq \bigcup_\alpha V_\alpha\). Then there are sets \(G_\alpha\) open relative to \(X\) such that \(V_\alpha=Y\cap G_\alpha\) for all \(\alpha\). Since \(K\) is compact relative to \(X\), we can find finitely many indices \(\alpha_1,\dots,\alpha_n\) such that \(K\subseteq G_{\alpha_1}\cup\dots\cup G_{\alpha_n}\). Then \(K\subseteq V_{\alpha_1}\cup\dots\cup V_{\alpha_n}\), since \(K\subseteq Y\). We have found a finite subcover of \(\{V_\alpha\}\).

Conversely, suppose \(K\) is compact relative to \(Y\) and \(\{G_\alpha\}\) is a collection of open subsets of \(X\) which covers \(K\). Put \(V_\alpha=Y\cap G_\alpha\). Then there are finitely many indices \(\alpha_1,\dots,\alpha_n\) such that \(K\subseteq V_{\alpha_1}\cup\dots\cup V_{\alpha_n}\); certainly \(K\subseteq G_{\alpha_1}\cup\dots\cup G_{\alpha_n}\), because \(V_{\alpha_i}\subseteq G_{\alpha_i}\).

Compact subsets are closed.

Let \(K\) be compact. We will prove that \(K^c\) is open. Suppose \(p\notin K\). For each \(q\in K\), let \(V_q\) and \(W_q\) be neighbourhoods of \(p\) and \(q\) respectively, each with radius less than \(\frac12 d(p,q)\). Then \(K\) is covered by \(\{W_q|q\in K\}\), so we can find finitely many \(q_1,\dots,q_n\in K\) such that \(K\subseteq W_{q_1}\cup\dots\cup W_{q_n}=: W\). Write \(V:= V_{q_1}\cap\dots\cap V_{q_n}\); then \(V\) is a neighbourhood of \(p\) which does not intersect \(W\). Hence \(V\subseteq K^c\), and \(p\) is an interior point of \(K^c\).

Closed subsets of compact sets are compact.

Suppose \(F\subseteq K\subseteq X\), \(F\) is closed relative to \(X\), and \(K\) is compact. Let \(\{V_\alpha\}\) be an open cover of \(F\). If \(F^c\) is adjoined to \(\{V_\alpha\}\), we obtain an open cover \(\Omega\) of \(K\). Since \(K\) is compact, there is a finite subcollection \(\Phi\) of \(\Omega\) covering \(K\), hence covering \(F\). If \(F^c\) is a member of \(\Phi\), then \(\Phi-\{F^c\}\) still covers \(F\). We have thus found a finite subcover.

If \(F\) is closed and \(K\) compact, then \(F\cap K\) is compact.

If \(\{K_\alpha\}\) is a collection of compact subsets of a metric space \(X\) such that the intersection of every finite subcollection of \(\{K_\alpha\}\) is nonempty, then \(\bigcap K_\alpha\) is nonempty.

Fix a member \(K_1\) of \(\{K_\alpha\}\) and put \(G_\alpha=K_\alpha^c\). For contradiction, assume that no point of \(K_1\) belongs to every \(K_\alpha\). Then the sets \(G_\alpha\) form an open cover of \(K_1\); since \(K_1\) is compact, there are finitely many indices \(\alpha_1,\dots,\alpha_n\) such that \(K_1\subseteq G_{\alpha_1}\cup\dots\cup G_{\alpha_n}\). Then \(K_1\cap K_{\alpha_1}\cap\dots\cap K_{\alpha_n}\) is empty, contradicting the hypothesis.

If \(\{K_n\}\) is a sequence of nonempty compact sets such that \(K_{n+1}\subseteq K_n\), then \(\bigcap_{n=1}^\infty K_n\) is nonempty.

If \(E\) is an infinite subset of a compact set \(K\), then \(E\) has a limit point in \(K\).

Otherwise, each \(q\in K\) would have a neighbourhood \(V_q\) containing at most one point of \(E\) (namely \(q\), if \(q\in E\)). Clearly no finite subcollection of \(\{V_q\}\) can cover the infinite set \(E\), and the same is true of \(K\), since \(E\subseteq K\). But \(K\) is compact, a contradiction.

Let us recall one of the previous theorems, namely the nested interval principle:

If \(\{I_n\}\) is a sequence of closed bounded intervals in \(\mathbb R^1\) such that \(I_{n+1}\subseteq I_n\), then \(\bigcap I_n\) is not empty.
Write \(I_n=[a_n,b_n]\). Let \(E=\{a_n\}\). Then \(E\) is nonempty and bounded above by \(b_1\). Let \(x=\sup E\). If \(m,n\) are positive integers, then \[ a_n\le a_{m+n}\le b_{m+n} \le b_m, \] so that \(x\le b_m\) for each \(m\). Since it is obvious that \(a_m\le x\), we see that \(x\in I_m\) for \(m=1,2,3\dots\).
Let \(k\) be a positive integer. If \(\{I_n\}\) is a sequence of \(k\)-cells such that \(I_{n+1}\subseteq I_n\), then \(\bigcap_{n=1}^\infty I_n\) is nonempty.

The proof has the same idea as the above, mutatis mutandis. Let \(I_n\) be the cell of points \(x=(x_1,\dots,x_k)\) such that, for all \(1\le j\le k\) and \(n=1,2,3,\dots\), \[ a_{n,j}\le x_j\le b_{n,j} \]

Write \(I_{n,j}=[ a_{n,j}, b_{n,j}]\). For each \(j\), by previous theorem there is a real number \(x_j^*\in\bigcap_n I_{n,j}\). Then \((x_1^*,\dots,x_k^*)\in I_n\) for all \(n\).

All of the above is in service of proving:

Every \(k\)-cell is compact

Let \(I\) be a \(k\)-cell, consisting of all points \(x=(x_1,\dots,x_k)\) such that \(a_j\le x_j\le b_j\) for all \(1\le j\le k\). Put \[ \delta=\sqrt{\sum_1^k (b_j-a_j)^2}. \] Then for all \(x,y\in I, |x-y|\le \delta\).

Suppose for contradiction that there exists an open cover \(\{G_\alpha\}\) of \(I\) with no finite subcover of \(I\). Put \(c_j=\frac{a_j+b_j}2\). Then the intervals \([a_j, c_j]\) and \([c_j, b_j]\) determine \(2^k\) \(k\)-cells \(Q_i\) whose union is \(I\). At least one of the \(Q_i\), call it \(I_1\), cannot be covered by any finite subcollection of \(\{G_\alpha\}\); otherwise \(I\) could be covered by a finite subcollection. We can divide \(I_1\) in the same way and continue the whole process. We then obtain a sequence \(\{I_n\}\) such that

  1. \(\dots\subseteq I_3\subseteq I_2\subseteq I_1\subseteq I\);
  2. \(I_n\) is not covered by any finite subcollection of \(\{G_\alpha\}\);
  3. if \(x, y\in I_n\), then \(|x-y|\le 2^{-n}\delta\).

Then by the previous theorem, there is a point \(\iota\) in every \(I_n\). For some \(\alpha\), \(\iota\in G_\alpha\). Since \(G_\alpha\) is open, there is some \(r>0\) such that \(|y-\iota|< r\) implies \(y\in G_\alpha\). Then for \(n\) large enough, \(2^{-n}\delta< r\), so 3 implies \(I_n\subseteq G_\alpha\), contradicting 2.

[Heine-Borel] For a set \(E\subseteq \mathbb R^k\), TFAE:

  1. \(E\) is closed and bounded;
  2. \(E\) is compact;
  3. Every infinite subset of \(E\) has a limit point in \(E\).
  • Suppose 1. Then \(E\subseteq I\) for some \(k\)-cell \(I\), and \(E\) is compact, being a closed subset of the compact set \(I\).
  • 2 implies 3 is a previous theorem
  • Let us now prove 3 implying 1.
  • Suppose \(E\) is not bounded. Then \(E\) contains points \(x_n\) with \(|x_n|>n\); the set of these \(x_n\) is an infinite subset of \(E\) without a limit point in \(\mathbb R^k\), hence without one in \(E\). So 3 implies that \(E\) is bounded.
  • If \(E\) is not closed, then there is a point \(x_0\in\mathbb R^k\) which is a limit point of \(E\) but not a point of \(E\). For \(n=1,2,3,\dots\), there are points \(x_n\in E\) such that \(|x_n-x_0|<\frac1n\). Let \(S=\{x_n\}\). Then \(S\) is infinite (for otherwise \(|x_n-x_0|\) would have a constant positive value for infinitely many \(n\)). \(S\) has \(x_0\) as a limit point and no other limit point in \(\mathbb R^k\). For if \(y\ne x_0\) then \[ \begin{aligned} |x_n-y|&\ge|x_0-y|-|x_n-x_0|\\ &\ge|x_0-y|-\frac1n\ge\frac12|x_0-y| \end{aligned} \] for all but finitely many \(n\); this shows that \(y\) is not a limit point of \(S\) (otherwise every neighbourhood of \(y\) would contain infinitely many points of \(S\), by a previous theorem). Thus \(S\) has no limit point in \(E\); hence \(E\) must be closed if 3 holds.
We can now generalise a theorem proved before

[Weierstrass] Every bounded infinite subset of \(\mathbb R^k\) has a limit point in \(\mathbb R^k\).

Being bounded, the set \(E\) is a subset of a \(k\)-cell \(I\subseteq \mathbb R^k\). \(I\) is compact, so \(E\) has a limit point in \(I\).

Perfect sets

Every nonempty perfect set in \(\mathbb R^k\) is uncountable.

Since \(P\) is nonempty and every point of \(P\) is a limit point of \(P\), \(P\) must be infinite (finite sets have no limit points). Suppose for contradiction that \(P\) is countable, and enumerate \(P\) as \(x_1,x_2,x_3,\dots\). Let us construct a sequence \(\{V_n\}\) of neighbourhoods as follows:

  • Let \(V_1\) be an arbitrary neighbourhood of \(x_1\). If \(V_1\) consists of all \(y\in\mathbb R^k\) such that \(|y-x_1|< r\), the closure \(\bar {V_1}\) is the set of all points \(y\in\mathbb R^k\) such that \(|y-x_1|\le r\).
  • Suppose \(V_n\) has been constructed, so that \(V_n \cap P\) is not empty. Since every point of \(P\) is a limit point of \(P\), there is a neighbourhood \(V_{n+1}\) such that
  1. \(\bar{V}_{n+1}\subseteq V_n\)
  2. \(x_n\notin \bar{V}_{n+1}\)
  3. \(V_{n+1}\cap P\) is nonempty. Hence by induction we can construct a sequence of \(\{V_n\}\) such that each \(V_n\cap P\) is nonempty.

Put \(K_n=\bar{V}_n\cap P\). \(\bar{V}_n\) is closed and bounded, hence compact, so \(K_n\) is compact. Since \(x_n\notin K_{n+1}\), no point of \(P\) lies in \(\bigcap K_n\). But \(K_n\subseteq P\), so \(\bigcap K_n\) is empty. Since each \(K_n\) is nonempty and \(K_{n+1}\subseteq K_n\), this contradicts a previous corollary.

Every interval \([a,b]\) with \(a<b\) is uncountable. In particular, \(\mathbb R\) is uncountable.

Summary

In this lecture, we learned the basic concepts of topology:

  • we defined what is a metric space and what is a metric;
  • we defined the topology induced by metrics;
  • we discussed compact sets and perfect sets;
  • we generalised some theorems from \(\mathbb R^1\) to higher dimensions \(\mathbb R^k\);
  • we actually proved the uncountability of \(\mathbb R\) rigorously.

Function and continuity (Lecture 9)

Last lecture, we talked about the standard, or Euclidean, topology on \(\mathbb R^k\). In this lecture, we are going to use this topology to build the notions of the limit of a function and the continuity of a function.

Limit of a function

Let \((X,d_X)\) and \((Y,d_Y)\) be metric spaces; suppose \(E\subseteq X\), \(f : E\to Y\) and \(p\) is a limit point of \(E\). We write \(f(x)\to q\) as \(x\to p\), or \[\lim_{x\to p} f(x)=q,\] if there is a point \(q\in Y\) such that \[\forall \epsilon>0\,\exists\delta>0,\ \forall x\in E,\ 0< d_X(x,p)<\delta\implies d_Y(f(x),q)<\epsilon\]

Note that \(p\) does not have to be a point of \(E\).

[Heine's Definition] Let \(X,Y,f,p,q\) be as above. Then \[\lim_{x\to p}f(x)=q\] if and only if \[\lim_{n\to\infty} f(p_n)=q\] for every sequence \((p_n)\) in \(E\) such that \(p_n\ne p\) and \(\lim_{n\to\infty} p_n=p\).

  • From left to right: choose any sequence \((p_n)\) with the required properties. Let \(\epsilon>0\) be given; there is a \(\delta>0\) such that for all \(x\in E\) with \(0 < d_X(x,p) < \delta\) we have \(d_Y(f(x),q)<\epsilon\). There is also an \(N\in\mathbb N\) such that \(\forall n>N\) we have \(0< d_X(p_n,p)<\delta\). Thus for all \(n>N\) we have \(d_Y(f(p_n),q)<\epsilon\). This is what we want.

  • From right to left: we prove the contrapositive. Assume \(\lim_{x\to p}f(x)\ne q\). Then there is some \(\epsilon>0\) such that for every \(\delta > 0\) there is some \(x\in E\) with \(0< d_X(x,p) < \delta\) but \(d_Y(f(x),q)\ge \epsilon\). Taking \(\delta_n=\frac1n\), for each \(n\) we find \(p_n\in E\) with \(0<d_X(p_n,p)<\frac1n\) and \(d_Y(f(p_n),q)\ge\epsilon\). This sequence satisfies the required properties, but \(f(p_n)\not\to q\).
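Here is a small numerical sketch of Heine's criterion (an illustration only, assuming the usual behaviour of \(\sin\), which we take for granted later in the notes anyway): approaching \(p=0\) along two different sequences, \(f(p_n)\) settles on the same value for \(f(x)=\frac{\sin x}{x}\).

```python
import math

# Two different sequences p_n -> 0 (with p_n != 0); by Heine's criterion,
# if lim_{x->0} f(x) = q, then f(p_n) -> q along every such sequence.
def f(x: float) -> float:
    return math.sin(x) / x

seq_a = [1 / n for n in range(1, 7)]             # approaching from the right
seq_b = [-(2.0 ** -n) for n in range(1, 7)]      # approaching from the left
print([round(f(t), 6) for t in seq_a])
print([round(f(t), 6) for t in seq_b])
# Both rows creep towards 1, consistent with lim_{x->0} sin(x)/x = 1.
```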

From the above theorem and the uniqueness of the limit of a sequence, the limit of a function is unique if it exists.

From the above theorem and properties of limits, immediately we have: let \(f,g:E\to\mathbb R\) and suppose \[\lim_{x\to p} f(x)=A,\qquad\lim_{x\to p} g(x)=B.\] Then:

  1. \(\lim_{x\to p} (f+g)(x) = A + B\)
  2. \(\lim_{x\to p} (fg)(x) = AB\)
  3. \(\lim_{x\to p} \left(\frac f g\right)(x)=\frac A B\), provided \(B\ne 0\) (and \(g\ne 0\) on \(E\)).

If \(f,g:E\to\mathbb R^k\) (vector-valued), then 1 remains true, and 2 becomes \(\lim_{x\to p} (f\cdot g)(x)=A\cdot B\).

Continuous functions

  • Suppose \(X\) and \(Y\) are metric spaces, \(E\subseteq X\), \(p\in E\), and \(f\) maps \(E\) into \(Y\). Then \(f\) is said to be continuous at \(p\) if for every \(\epsilon>0\) there is a \(\delta>0\) such that for all points \(x\in E\) with \(d_X(x,p)<\delta\), we have \(d_Y(f(x),f(p))<\epsilon\).
  • If \(f\) is continuous at every point of \(E\), we say \(f\) is continuous on \(E\).

Note that:

  • In contrast with the definition of limit, \(f\) has to be defined at the point \(p\) (i.e. \(p\in E\)) for this definition to make sense.
  • If \(p\) is an isolated point of \(E\), then our definition implies that every function which has \(E\) as domain is automatically continuous at \(p\): no matter which \(\epsilon>0\) is given, we can find a \(\delta>0\) such that the only point of \(E\) in the neighbourhood of \(p\) with radius \(\delta\) is \(p\) itself; then \(d_Y(f(x),f(p))=0<\epsilon\) for every \(x\in E\) with \(d_X(x,p)<\delta\).

If \(p\) is a limit point of \(E\), then \(f\) is continuous at \(p\) if and only if \(\lim_{x\to p} f(x)=f(p)\).

This is really just a rephrasing of the definitions.

Suppose \(X,Y,Z\) are metric spaces with \(E\subseteq X\), \(f:E\to Y\), \(g:f(E)\to Z\). If \(f\) is continuous at a point \(p\in E\) and \(g\) is continuous at the point \(f(p)\), then \(g\circ f\) is continuous at \(p\).

Let \(\epsilon > 0\). Since \(g\) is continuous at \(f(p)\), there is some \(\eta>0\) such that \[d_Z(g(y), g(f(p)))<\epsilon \textrm{ if } d_Y(y,f(p))<\eta\textrm{ and }y\in f(E).\]

Since \(f\) is continuous at \(p\), there is a \(\delta>0\) such that \[d_Y(f(x),f(p))<\eta\textrm{ if }d_X(x,p)<\delta \textrm{ and }x\in E\].

Combining these: if \(d_X(x,p)<\delta\) and \(x\in E\), then \[d_Z(g\circ f(x), g\circ f(p))<\epsilon.\] This is continuity at \(p\).

A mapping \(f:X\to Y\) is continuous if and only if \(f^{-1}(V)\) is open for every open set \(V\) in \(Y\).

  • From left to right: suppose \(f\) is continuous on \(X\) and \(V\) is an open set in \(Y\). We have to show that every point of \(f^{-1}(V)\) is an interior point of \(f^{-1}(V)\). So suppose \(p\in f^{-1}(V)\), i.e. \(f(p)\in V\). Since \(V\) is open, there is \(\epsilon > 0\) such that \(y\in V\) if \(d_Y(f(p),y)<\epsilon\). Since \(f\) is continuous, there is \(\delta>0\) such that \(d_Y(f(x), f(p))<\epsilon\) if \(d_X(x,p)<\delta\). Thus \(x\in f^{-1}(V)\) as long as \(d_X(x,p)<\delta\).
  • From right to left: fix \(p\in X\) and \(\epsilon > 0\). Let \(V\) be the ball of radius \(\epsilon\) around \(f(p)\). Then \(V\) is open, so \(f^{-1}(V)\) is open and contains \(p\); hence there is \(\delta > 0\) such that \(x\in f^{-1}(V)\) as long as \(d_X(p,x)<\delta\). But if \(x\in f^{-1}(V)\), then \(f(x)\in V\), so \(d_Y(f(x),f(p))<\epsilon\).

A mapping \(f:X\to Y\) is continuous if and only if \(f^{-1}(C)\) is closed for every closed set \(C\) in \(Y\).

A set is closed if and only if its complement is open, and \(f^{-1}(E^c)=[f^{-1}(E)]^c\); now apply the previous theorem.

Let \(f\) and \(g\) be continuous functions from \(X\) to \(\mathbb R\). Then \(f+g\), \(fg\) and \(f/g\) (the latter where \(g\ne 0\)) are continuous.

At isolated points there is nothing to prove; at limit points, use the previous theorems about limits.

  1. Let \(f_1,\dots,f_k\) be real functions on a metric space \(X\), and let \(f:X\to \mathbb R^k\) be defined as \[f(x)=(f_1(x),\dots,f_k(x)).\] Then \(f\) is continuous if and only if \(f_i\) is continuous for each \(i=1,\dots,k\).
  2. Thus if \(f,g:X\to \mathbb R^k\) are continuous, then \(f+g:X\to\mathbb R^k\) and \(f\cdot g:X\to\mathbb R\) are continuous.

Part 2 follows from part 1 and the limit theorems, so we just prove part 1. It follows from the inequality \[|f_j(x)-f_j(y)|\le |f(x)-f(y)|=\sqrt{\sum_{i=1}^k |f_i(x)-f_i(y)|^2}\] for \(j=1,\dots,k\). Assuming continuity of \(f\), continuity of each \(f_j\) is then immediate. Conversely, for any \(\epsilon>0\), there is \(\delta_i>0\) such that \(d(x,y)<\delta_i\) implies \(|f_i(x)-f_i(y)|\le \frac{\epsilon}{\sqrt k}\); taking \(\delta=\min(\delta_1,\dots,\delta_k)\), we get \(|f(x)-f(y)| \le \epsilon\) whenever \(d(x,y)<\delta\).

If we define \(\phi_i: \mathbb R^k \to \mathbb R\) by \(\phi_i(x)=x_i\), then the \(\phi_i\) are continuous. Applying the above theorems repeatedly, we can show that every multivariable polynomial \[P(x)=\sum c_{n_1\dots n_k}x_1^{n_1}\cdots x_k^{n_k}\] is continuous.

Since \(\left||x|-|y|\right|\le|x-y|\), we have \(x\mapsto |x|\) is continuous. Hence if \(f:X\to\mathbb R^k\) is continuous, \(\phi(x):=|f(x)|\) is continuous.

Being continuous is a "local" property. In the definition, the complement of \(E\) never played any role. Hence nothing of interest is lost if we just think of \(E\) as a metric space in its own right, rather than as a subset of an ambient space. Similarly, we think of the image as a metric space in its own right (with the subspace topology, of course). In this way we don't have to pass subsets around.

Continuity and compactness

A mapping \(f:E\to \mathbb R^k\) is said to be bounded if there is a real number \(M\) such that \(|f(x)|\le M\) for all \(x\in E\).

Suppose \(f\) is a continuous mapping of a compact metric space \(X\) into a metric space \(Y\). Then the image \(f(X)\) is compact.

Let \(\{V_\alpha\}\) be an open cover of \(f(X)\). Since \(f\) is continuous, each \(f^{-1}(V_\alpha)\) is open, and together they cover \(X\). Hence there are finitely many indices \(\alpha_1,\dots,\alpha_n\) such that \(X\subseteq f^{-1}(V_{\alpha_1})\cup\dots \cup f^{-1}(V_{\alpha_n})\). Since \(f(f^{-1}(E))\subseteq E\) for any \(E\), we get \(f(X)\subseteq V_{\alpha_1}\cup\dots\cup V_{\alpha_n}\). This is everything.

If \(f\) is a continuous mapping from a compact \(X\) to \(\mathbb R^k\), then the image is closed and bounded. In particular, \(f\) is bounded.

Previous theorem + Heine-Borel.

The following is super important! If you decide to remember just one thing from this lecture, remember this!

[IMPORTANT] Suppose \(f\) is continuous \(X\to \mathbb R\) with \(X\) compact, and \(M=\sup f(X)\), \(m=\inf f(X)\). Then there exist points \(p,q\in X\) such that \(f(p)=M\) and \(f(q)=m\).

Previous theorem + a theorem from last lecture. Recall:

Let \(E\) be a nonempty set of real numbers which is bounded above. Let \(y=\sup E\). Then \(y\in\bar E\). Hence \(y\in E\) if \(E\) is closed.

Take \(E\) to be \(f(X)\), which is closed and bounded. The \(\inf\) case is similar.

Suppose \(f\) is a continuous bijection from compact \(X\) to (not necessarily compact) \(Y\). Then the inverse \(f^{-1}:Y\to X\) is continuous

It suffices to prove that \(f(V)\) is open in \(Y\) for every open set \(V\) in \(X\). Fix \(V\). Then \(V^c\) is closed, hence compact, so \(f(V^c)\) is compact in \(Y\), hence closed in \(Y\). Since \(f\) is a bijection, \(f(V^c)=f(V)^c\) is closed. Then \(f(V)\) is open.

Here compactness of \(X\) is really important. Consider \(X=[0,2\pi)\), \(Y\) the unit circle in \(\mathbb R^2\), and \(f(t)=(\cos t, \sin t)\). Though we haven't formally defined the trigonometric functions, let us just work from intuition. \(f\) is indeed a bijection, but \(X\) is not compact in \(\mathbb R\). The inverse function is not continuous at \((1,0)\): \(f^{-1}(1,0)=0\), but just above \((1,0)\) on the circle the preimage is close to \(0\), while just below \((1,0)\) the preimage is close to \(2\pi\). Note that \(Y\) is compact and this still fails: \(X\) being compact is the key.
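A numerical look at this example (a sketch; the helper `f_inv` below is our hypothetical inverse, computed with `atan2`): points on the circle just above and just below \((1,0)\) are sent to parameters near \(0\) and near \(2\pi\) respectively.

```python
import math

# f(t) = (cos t, sin t) on [0, 2*pi); f_inv recovers the parameter t.
def f_inv(x: float, y: float) -> float:
    t = math.atan2(y, x)                      # angle in (-pi, pi]
    return t if t >= 0 else t + 2 * math.pi   # shifted into [0, 2*pi)

eps = 1e-6
print(f_inv(math.cos(eps), math.sin(eps)))    # ~ 0.000001: just above (1, 0)
print(f_inv(math.cos(-eps), math.sin(-eps)))  # ~ 6.283184: just below (1, 0)
# Two points arbitrarily close on the circle, images far apart: f_inv is
# discontinuous at (1, 0).
```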

Uniform continuity

This part is potentially confusing, and I am not sure how much of it we will use in future lectures. If you find yourself wondering too much about the difference between uniform continuity and continuity, perhaps save this bit for later. But uniform continuity is nevertheless an important topic and should be understood fully.

Let \(f:X\to Y\) be a mapping between metric spaces. We say \(f\) is uniformly continuous on \(X\) if for every \(\epsilon > 0\) there exists \(\delta>0\) such that \(d_Y(f(p),f(q))<\epsilon\) for all \(p, q\in X\) for which \(d_X(p,q)<\delta\).

For continuity, the choice of \(\delta\) may depend on the point \(p\). But for uniform continuity, \(\delta\) may depend only on \(\epsilon\) and nothing else. Trivially, uniform continuity \(\implies\) continuity, but generally not vice versa unless \(X\) is compact. See below.

Let \(f\) be a continuous mapping from compact \(X\) to \(Y\). Then \(f\) is uniformly continuous.

Let \(\epsilon > 0\). Since \(f\) is continuous, we can associate to each \(p\in X\) a positive number \(\phi(p)\) such that \[q\in X, d_X(p,q)<\phi(p) \implies d_Y(f(p), f(q))<\frac\epsilon 2.\] Let \(J(p)\) be the open neighbourhood around \(p\) with radius \(\frac12 \phi(p)\). Then \(p\in J(p)\), hence the collection \(\{J(p)\}\) forms an open cover of \(X\), and so it admits a finite subcover \(X\subseteq J(p_1)\cup\dots\cup J(p_n)\). Let \[\delta=\frac12\min(\phi(p_1),\dots,\phi(p_n)).\] Then by finiteness \(\delta > 0\). Suppose \(p,q\) are points of distance less than \(\delta\). Then there is an integer \(m\) between \(1\) and \(n\) such that \(p\in J(p_m)\), hence \[d_X(p, p_m)< \frac12\phi(p_m).\] Also, \[d_X(q, p_m)\le d_X(p,q)+d_X(p,p_m)<\delta+\frac12\phi(p_m)\le\phi(p_m).\] Then by the definition of \(\phi\) we have \[d_Y(f(p),f(q))\le d_Y(f(p),f(p_m))+d_Y(f(q),f(p_m))<\frac\epsilon2+\frac\epsilon2=\epsilon\]

However,

Let \(E\) be a noncompact set in \(\mathbb R\). Then

  1. there is a continuous function on \(E\) that is not bounded;
  2. there is a continuous and bounded function on \(E\) without maximum;
  3. If \(E\) is bounded, there is a continuous function on \(E\) that is not uniformly continuous.

If \(E\) is bounded, then (being noncompact) \(E\) is not closed. Hence there is a limit point \(x_0\) of \(E\) which is not in \(E\). Consider \[f(x)=\frac1{x-x_0}\] This is continuous on \(E\) but not bounded. It is also not uniformly continuous: let \(\delta>0\) be arbitrary and choose a point \(x\in E\) so that \(|x-x_0|<\delta\). Taking \(t\in E\) close enough to \(x_0\), we can make \(|f(t)-f(x)|>1\) while \(|t-x|<\delta\). The details are left as an exercise in algebraic manipulation.
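A numerical probe of this failure (an illustration, not the omitted algebra), taking \(x_0=0\) and \(E=(0,1)\): for every \(\delta\), there are points of \(E\) within \(\delta\) of each other whose images under \(f(x)=\frac1x\) differ by more than \(1\).

```python
# f(x) = 1/x on E = (0, 1), where x0 = 0 is the missing limit point.
def f(x: float) -> float:
    return 1.0 / x

for delta in [0.1, 0.01, 0.001]:
    x, t = delta / 2, delta / 4               # both in E, |x - t| < delta
    print(delta, abs(x - t) < delta, abs(f(x) - f(t)) > 1)
# Every row prints True True: |f(x) - f(t)| = 2/delta > 1, so no single
# delta works for epsilon = 1, and f is not uniformly continuous.
```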

Consider \[g(x)=\frac1{1+(x-x_0)^2}\] This function is bounded, since \(0< g(x) < 1\). Its supremum is \(1\), but the supremum is never attained, so \(g\) has no maximum.

If \(E\) is unbounded, then \(f(x)=x\) is a continuous function that is not bounded, and \[h(x)=\frac{x^2}{1+x^2}\] is a bounded continuous function whose supremum \(1\) is never attained. This finishes the proof.

Discontinuity

If \(x\) is a point in the domain of definition of the function \(f\) at which \(f\) is not continuous, we say that \(f\) is discontinuous at \(x\), or that \(f\) has a discontinuity at \(x\). In this part we are going to classify two types of discontinuities.

[One-sided limits] Let \(f\) be defined on \((a,b)\). Consider any point \(x\) such that \(a\le x< b\). We write \[f(x+)=q\] if \(f(t_n)\to q\) as \(n\to\infty\) for all sequences \((t_n)\) in \((x,b)\) such that \(t_n\to x\). For the definition of \(f(x-)\), for \(a< x\le b\), we restrict ourselves to sequences \((t_n)\) in \((a,x)\).

It is clear that at any point \(x\) of \((a,b)\), \(\lim_{t\to x} f(t)\) exists if and only if \[f(x+)=f(x-)=\lim_{t\to x} f(t)\]

Let \(f\) be defined on \((a,b)\). If \(f\) is discontinuous at a point \(x\) and if \(f(x+)\) and \(f(x-)\) exist, then \(f\) is said to have a discontinuity of the first kind, or a simple discontinuity, at \(x\). Otherwise the discontinuity is said to be of the second kind.

There are two ways in which a function can have a simple discontinuity:

  • \(f(x+)\ne f(x-)\); in this case the value of \(f(x)\) doesn't really matter.
  • \(f(x+)=f(x-)\ne f(x)\)
  1. Define: \[ f(x)=\left\{ \begin{aligned} & 1 & (x\textrm{ rational})\\ & 0 & (x\textrm{ irrational}) \end{aligned}\right. \] Then \(f\) has a discontinuity of the second kind at every point \(x\), since neither \(f(x+)\) nor \(f(x-)\) exists

  2. Define: \[ f(x)=\left\{ \begin{aligned} & x & (x\textrm{ rational}) \\ & 0 & (x\textrm{ irrational}) \end{aligned} \right. \] Then \(f\) is continuous at \(x=0\) and has a discontinuity of the second kind at every other point.

  3. Define: \[ f(x)=\left\{ \begin{aligned} & x+2 & (-3< x <-2)\\ & -x-2 & (-2\le x<0)\\ & x+2 & (0\le x<1) \end{aligned} \right. \] Then \(f\) has a simple discontinuity at \(x=0\) and is continuous at every other point of \((-3,1)\).

  4. Define \[ f(x)=\left\{ \begin{aligned} & \sin\frac1x & (x\ne 0)\\ & 0 & (x = 0) \end{aligned} \right. \] Neither \(f(0+)\) nor \(f(0-)\) exists, so \(f\) has a discontinuity of the second kind at \(x=0\). Assuming the continuity of \(\sin\), \(f\) is continuous at every other point \(x\ne 0\). The one-sided limit \(f(0+)\) doesn't exist because \(\sin(2n\pi)=0\) but \(\sin\left(\left(2n+\frac12\right)\pi\right)=1\) for all \(n\in\mathbb Z\): picking \((t_n)=\left(\frac1{2n\pi}\right)\) and \((s_n)=\left(\frac1{\left(2n+\frac12\right)\pi}\right)\), both tend to \(0\) but result in different limits after applying \(f\). Similarly for \(f(0-)\).
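The two sequences from the argument above, numerically (a sketch, again taking the usual behaviour of \(\sin\) for granted):

```python
import math

# t_n = 1/(2n*pi) and s_n = 1/((2n + 1/2)*pi) both tend to 0 from the right,
# but sin(1/t_n) = 0 while sin(1/s_n) = 1, so f(0+) cannot exist.
t = [1 / (2 * n * math.pi) for n in range(1, 6)]
s = [1 / ((2 * n + 0.5) * math.pi) for n in range(1, 6)]
print([round(math.sin(1 / x), 6) for x in t])   # all ~ 0.0
print([round(math.sin(1 / x), 6) for x in s])   # all ~ 1.0
```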

Monotonic functions

Let \(f:(a,b)\to\mathbb R\). We say \(f\) is monotonically increasing (decreasing resp.) on \((a,b)\) if \(a< x < y< b\) implies \(f(x)\le f(y)\) (\(f(y) \le f(x)\) resp.).

Let \(f\) be monotonically increasing on \((a,b)\). Then \(f(x+)\) and \(f(x-)\) exist at every point \(x\in(a,b)\). More precisely, \[\sup_{a < t < x}f(t)=f(x-)\le f(x)\le f(x+)=\inf_{x < t < b} f(t)\] Furthermore, if \(a < x < y < b\), then \[f(x+)\le f(y-).\]

For monotonically decreasing functions, analogous results hold.

By hypothesis, the set of numbers \(f(t)\), where \(a < t < x\), is bounded above by \(f(x)\). Hence a least upper bound exists; call it \(A\). Then \(A\le f(x)\). We need to show \(A=f(x-)\).

Let \(\epsilon>0\). Then by the definition of \(\sup\), there is a \(\delta>0\) such that \(a< x-\delta < x\) and \[A-\epsilon< f(x-\delta)\le A\] By monotonicity, for all \(x-\delta< t < x\), \[f(x-\delta)\le f(t)\le A\] Thus we have \[|f(t)-A|\le \epsilon\] for all \(x-\delta < t < x\). Hence \(f(x-)=A\). The other half uses the same method, mutatis mutandis.

Assume \(a < x < y < b\). Then \[f(x+)=\inf_{x< t < b}f(t)=\inf_{x < t < y}f(t)\] by monotonicity. Similarly we have \[f(y-)=\sup_{a < t < y} f(t)=\sup_{x< t < y} f(t).\] Comparing the two, \(f(x+)\le f(y-)\). This finishes the proof.

Monotonic functions have no discontinuity of the second kind.

Let \(f\) be monotonic on \((a,b)\). Then the set of points of \((a,b)\) at which \(f\) is discontinuous is at most countable.

Wlog, \(f\) is monotonically increasing. Let \(E\) be the set of discontinuities of \(f\).

With every point \(x\in E\) we associate a rational number \(r(x)\) such that \(f(x-)< r(x) < f(x+)\). Since \(x_1 < x_2\) implies \(f(x_1 + )\le f(x_2-)\), we can see that \(r(x_1)\ne r(x_2)\). So \(r\) is an injection from \(E\) into the rationals, which are countable.

Summary

In this lecture we discussed:

  • limits of functions and continuity;
  • continuity interacting with compactness;
  • two types of discontinuities;
  • monotonic functions and their discontinuities.

Differentiation (Lecture 10)

Last time we discussed the notion of continuity; this time we shall focus on differentiability. Let us consider the function \(x\mapsto |x|\). This function is continuous at \(x=0\), however it has a sharp corner at \(x=0\). In this lecture we will see in what sense this corner is troublesome.

Definition and basic properties

Let \(f : [a,b]\to\mathbb R\). For any \(x\in[a,b]\), we consider the quotient \[ \phi(t)=\frac{f(t)-f(x)}{t-x} \] for \(t\in(a,b)\) and \(t\ne x\). Define \[ f'(x)=\lim_{t\to x}\phi(t) \]

provided the limit exists.

We thus associate the function \(f\) with another function \(f'\) whose domain is the set of points \(x\) at which the limit exists. \(f'\) is called the derivative of \(f\).

If \(f'\) is defined at a point \(x\), we say that \(f\) is differentiable at \(x\). If \(f'\) is defined everywhere on a set \(E\subseteq[a,b]\), we say that \(f\) is differentiable on \(E\). We can consider one-sided limits to define one-sided derivatives at the endpoints \(a\) and \(b\), but we won't discuss this in detail.

Let \(f:[a,b]\to\mathbb R\). If \(f\) is differentiable at \(x\in[a,b]\), then \(f\) is continuous at \(x\).

As \(t\to x\), \[ f(t)-f(x)=\frac{f(t)-f(x)}{t-x}(t-x) \] will approach \(f'(x)\cdot 0 = 0\).

The converse is not true; in fact there exist functions that are continuous everywhere but differentiable nowhere.

Suppose \(f\) and \(g\) are defined on \([a,b]\) and differentiable at \(x\in [a,b]\). Then \(f+g\), \(fg\) and \(\frac{f}{g}\) are differentiable at \(x\) with

  • \((f+g)'(x)=f'(x)+g'(x)\);
  • \((fg)'(x) = f'(x)g(x) + f(x)g'(x)\);
  • \(\left(\frac{f}{g}\right)'(x)=\frac{g(x)f'(x)-g'(x)f(x)}{g^2(x)}\) (Assuming \(g(x)\ne 0\)).
  • Clear
  • Write \(h=fg\). Then \[ h(t)-h(x)=f(t)[g(t)-g(x)]+g(x)[f(t)-f(x)] \] Divide by \(t-x\), since \(f\) is continuous at \(x\), \(f(t)\to f(x)\) as \(t\to x\). The result follows.
  • Write \(h=\frac f g\), then \[ \frac{h(t)-h(x)}{t-x}=\frac{1}{g(t)g(x)}\left[g(x)\frac{f(t)-f(x)}{t-x}-f(x)\frac{g(t)-g(x)}{t-x} \right] \] by continuity of \(f\) and \(g\) and results from previous lecture, the result follows.
  • Clearly, constant functions are differentiable and the derivative is 0.
  • If \(f(x)=x\), then \(f'(x)=1\).
  • If \(g(x)=x^2=f(x)^2\), then \(g'(x)=f'(x)f(x)+f(x)f'(x)=2f(x)=2x\)
  • By induction, suppose the derivative of \(g_n(x)=x^n\) is \(g_n'(x)=nx^{n-1}\). Then \(g_{n+1}(x)=x^{n+1}=g_n(x)f(x)\), so the derivative is \(g_n'(x)f(x)+g_n(x)f'(x)=nx^{n-1}x+x^n=(n+1)x^n\).
  • Hence for all \(n\in\mathbb N\), we have the derivative of \(x\mapsto x^n\) is \(x\mapsto nx^{n-1}\).
  • Using the first and second parts of the previous theorem, we know every polynomial is differentiable. Using the third part, we know that every quotient of polynomials is differentiable, except at the finitely many points where the denominator is zero.
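A quick numerical sanity check of the power rule (an illustration via the difference quotient \(\phi\), not a proof):

```python
# phi(t) = (f(t) - f(x)) / (t - x) should approach n * x^(n-1) for f(t) = t^n.
def diff_quotient(f, x: float, h: float) -> float:
    return (f(x + h) - f(x)) / h

n, x = 5, 1.3
f = lambda t: t ** n
exact = n * x ** (n - 1)                      # the claimed derivative 5 * 1.3^4
for h in [1e-2, 1e-4, 1e-6]:
    print(h, round(diff_quotient(f, x, h), 8), round(exact, 8))
# The quotients approach the exact value as h -> 0.
```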

[Chain rule] Suppose \(f\) is continuous on \([a,b]\), \(f'(x)\) exists at some point \(x\in[a,b]\), \(g\) is defined on an interval \(I\) which contains the range of \(f\), and \(g\) is differentiable at the point \(f(x)\). Let us write \(h(t)=g(f(t))\). Then \(h\) is differentiable at \(x\) and

\[ h'(x)=g'(f(x))f'(x) \]

Write \(y=f(x)\). Then by definition: \[ f(t)-f(x)=(t-x)[f'(x)+u(t)]\\ g(s)-g(y)=(s-y)[g'(y)+v(s)] \] for some \(u\) and \(v\) such that \(u(t)\to 0\) as \(t\to x\) and \(v(s)\to 0\) as \(s\to y\). Let \(s=f(t)\), we have \[ h(t)-h(x)=g(f(t))-g(f(x))\\ =[f(t)-f(x)][g'(y)+v(s)]\\ =(t-x)[f'(x)+u(t)][g'(y)+v(s)] \]

In other words, for \(t\ne x\): \[ \frac{h(t)-h(x)}{t-x}=[g'(y)+v(s)][f'(x)+u(t)] \] As \(t\to x\), we have \(s\to y\) by the continuity of \(f\). Hence the right-hand side approaches \(g'(y)f'(x)\).

This is exactly what we want.

Chain rule is of vital importance, one should practice it a lot.

  • Define \(f\) to be \[ f(x)=\left\{ \begin{aligned} & x\sin{\frac1 x} & (x\ne 0)\\ & 0 & (x = 0) \end{aligned} \right. \] Take it for granted that \(\sin\) has \(\cos\) as its derivative for now. Let us calculate derivative of \(f\) for \(x\ne 0\), we can see \[ f'(x)=\sin{\frac1 x}-\frac{1}{x}\cos{\frac1x} \]

However at \(x=0\), we cannot use any of the theorems, since \(\frac 1 x\) is not defined at \(0\). Let us work harder: for \(t\ne 0\), \[ \frac{f(t)-f(0)}{t-0}=\sin{\frac 1t} \] As \(t\to 0\), this does not attain any limit. Hence \(f'\) is not defined at \(0\).

  • Let \(f\) be defined as \[ f(x)=\left\{ \begin{aligned} &x^2\sin{\frac1x}&(x\ne 0)\\ &0&(x=0) \end{aligned} \right. \] For \(x\ne 0\), the theorems do everything for us: \[ f'(x)=2x\sin{\frac1x}-\cos{\frac1x} \quad (x\ne 0) \] At \(x=0\), let us work harder: \[ 0\le\left|\frac{f(t)-f(0)}{t-0}\right|=\left|t\sin{\frac1t}\right|\le|t| \] Thus as \(t\to 0\), we see \(f'(0)=0\). In this case \(f\) is differentiable at every point \(x\), but its derivative is not continuous at \(0\), since \(\cos{\frac1x}\) has no limit there (for reasons similar to \(\sin{\frac1x}\)).
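Numerically (a sketch of the two computations above): the difference quotient of \(f\) at \(0\) is squeezed to \(0\), while \(f'\) keeps oscillating between values near \(\pm1\) arbitrarily close to \(0\).

```python
import math

# f(x) = x^2 sin(1/x) with f(0) = 0; the difference quotient at 0 is t*sin(1/t).
def fprime(x: float) -> float:
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

for t in [1e-2, 1e-4, 1e-6]:
    print(t, t * math.sin(1 / t))             # -> 0 = f'(0)
print([round(fprime(1 / (n * math.pi)), 4) for n in range(100, 106)])
# f'(1/(n*pi)) = -cos(n*pi) alternates near +-1, so f' has no limit at 0.
```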

Mean value theorems and other theorems

Let \(f:X\to\mathbb R\) where \(X\) is a metric space. We say \(f\) has a local maximum at a point \(p\) if there exists some \(\delta>0\) such that \(f(q)\le f(p)\) for all \(q\in X\) with \(d(p,q)<\delta\).

Local minimum is similar.

(draw a little picture to demonstrate)

[Rolle] Let \(f:[a,b]\to\mathbb R\). If \(x\in(a,b)\) is a local maximum of \(f\) and \(f'(x)\) exists, then \(f'(x)=0\).

Use the \(\delta\) from the definition of local maximum, shrunk if necessary so that \(a< x-\delta< x< x+\delta< b\). If \(x-\delta< t < x\), then \[ \frac{f(t)-f(x)}{t-x}\ge 0 \] Letting \(t\to x\), we see \(f'(x)\ge 0\). If \(x< t< x+\delta\), then \[ \frac{f(t)-f(x)}{t-x}\le 0 \] Hence \(f'(x)\le 0\). Together: \(f'(x)=0\)

For a local minimum \(x\), if \(f'(x)\) exists, it is \(0\).

Apply the previous theorem to \(g=-f\).

[Cauchy] If \(f\) and \(g\) are continuous real functions on \([a,b]\) which are differentiable in \((a,b)\) then there is a point \(x\in(a,b)\) at which \[ [f(b)-f(a)]g'(x)=[g(b)-g(a)]f'(x) \] Note that, we do not require differentiability at \(a\) and \(b\).

Let us consider \(h:[a,b]\to\mathbb R\) \[ h(t)=[f(b)-f(a)]g(t)-[g(b)-g(a)]f(t) \] We can see \(h\) is continuous on \([a,b]\) and differentiable on \((a,b)\) and \[ h(a)=f(b)g(a)-f(a)g(b)=h(b). \] To prove the theorem, we need to show that \(h'(x)=0\) for some \(x\in(a,b)\).

If \(h\) is constant, this holds for every \(x\in(a,b)\). If \(h(t)>h(a)\) for some \(t\in(a,b)\), let \(x\) be a point in \([a,b]\) at which \(h\) attains its maximum (remember the important theorem: this is it). Since \(h(x)\ge h(t)>h(a)=h(b)\), we have \(x\in(a,b)\), and the previous theorem shows that \(h'(x)=0\). If \(h(t)< h(a)\) for some \(t\), we can use the important theorem to pick a minimum instead.

This theorem is sometimes referred to as the generalized mean value theorem. We more often just use a corollary.

If \(f:[a,b]\to\mathbb R\) is continuous on \([a,b]\) and differentiable in \((a,b)\), then there is some point \(x\in(a,b)\) at which \[ f(b)-f(a)=(b-a)f'(x) \] (draw a picture here)

Take \(g(x)=x\) in the previous theorem.
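To see the mean value theorem in action numerically (a sketch with the hypothetical example \(f(x)=x^3\) on \([0,2]\)), we can hunt for the promised point \(x\) by bisection, since \(f'\) happens to be increasing here:

```python
# Find x in (0, 2) with f'(x) = (f(2) - f(0)) / (2 - 0) = 4 for f(x) = x^3.
f = lambda x: x ** 3
fp = lambda x: 3 * x ** 2
a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)               # = 4

lo, hi = a, b                                 # fp(lo) < slope < fp(hi)
for _ in range(60):
    mid = (lo + hi) / 2
    if fp(mid) < slope:
        lo = mid
    else:
        hi = mid
print(lo, 2 / 3 ** 0.5)                       # both ~ 1.1547: the promised x
```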

Suppose \(f\) is differentiable in \((a,b)\).

  • If \(f'(x)\ge 0\) for all \(x\in(a,b)\), then \(f\) is monotonically increasing;
  • If \(f'(x)=0\) for all \(x\in(a,b)\), then \(f\) is constant;
  • If \(f'(x)\le 0\) for all \(x\in(a,b)\), then \(f\) is monotonically decreasing.

For any pair \(x_1<x_2\) in \((a,b)\), there is some \(x\) between \(x_1\) and \(x_2\) such that \[ f(x_2)-f(x_1)=(x_2-x_1)f'(x). \]

Everything follows from this equation.

Suppose \(f:[a,b]\to\mathbb R\) is differentiable on \([a,b]\) and suppose \(f'(a)<\lambda< f'(b)\). Then there is a point \(x\in(a,b)\) such that \(f'(x)=\lambda\).

A similar result holds if \(f'(a)>f'(b)\).

Put \(g(t)=f(t)-\lambda t\). Then \(g'(a)< 0\), so \(g(t_1)< g(a)\) for some \(t_1\in(a,b)\); similarly \(g'(b)>0\), so \(g(t_2) < g(b)\) for some \(t_2\in(a,b)\). Hence \(g\) attains its minimum on \([a,b]\) (the important theorem again) at some point \(x\in(a,b)\). At the minimum \(x\), \(g'(x)=0\). Hence \(f'(x)=\lambda\).

If \(f\) is differentiable on \([a,b]\) then \(f'\) cannot have any simple discontinuities on \([a,b]\).

But \(f'\) may have discontinuities of the second kind. See the example \(x^2\sin{\frac1x}\) above.

We are going to state and prove L'Hospital's Rule which is useful in the evaluation of limits.

[L'Hospital's rule] Suppose \(f\) and \(g\) are real and differentiable in \((a,b)\) and \(g'(x)\ne 0\) for all \(x\in(a,b)\), where \(-\infty\le a< b\le\infty\). Suppose as \(x\to a\) \[ \frac{f'(x)}{g'(x)}\to A. \] If \[f(x)\to 0\text{ and }g(x)\to 0\] as \(x\to a\), then \[ \frac{f(x)}{g(x)}\to A \] as \(x\to a\).

The analogous statement is also true if \(x\to b\). This is not the fullest version of L'Hospital's rule. For more information, one can read Rudin and others (1964), Theorem 5.13.

We choose a real number \(q\) such that \(A< q\) and then choose \(A< r < q\). There is a point \(c\in(a,b)\) such that \(a< x < c\) implies \[ \frac{f'(x)}{g'(x)}< r. \]

If \(a< x < y < c\), then by the generalized mean value theorem, there is a point \(t\in(x,y)\) such that \[ \frac{f(x)-f(y)}{g(x)-g(y)}=\frac{f'(t)}{g'(t)}< r. \] Since \(f(x)\to 0\) and \(g(x)\to 0\) as \(x\to a\), letting \(x\to a\) we see that \[ \frac{f(y)}{g(y)}\le r < q \] for every \(y\in(a,c)\). Since \(q>A\) was arbitrary (with \(c\) depending on \(q\)), we get \(\limsup_{y\to a}\frac{f(y)}{g(y)}\le A\).

Similarly, choosing any \(p\) such that \(p < A\), we can find \(c_3\) such that for \(a< x < c_3\), \[ p<\frac{f(x)}{g(x)}. \] Hence \(\liminf_{x\to a}\frac{f(x)}{g(x)}\ge A\). The result follows.

\(\lim_{x\to 0}\frac{\sin x}{x}=\lim_{x\to 0}\frac{\cos x}{1}=1\)

Taylor's theorem

If \(f\) has a derivative \(f'\) on an interval and if \(f'\) is itself differentiable we denote the derivative of \(f'\) by \(f''\), the second derivative of \(f\). Continuing in this manner, we obtain functions \(f,f',f'',f^{(3)},\dots,f^{(n)}\), each of which is the derivative of the preceding one. \(f^{(n)}\) is called the \(n\)-th derivative of \(f\).

In order for \(f^{(n)}\) to exist at a point \(x\), \(f^{(n-1)}\) needs to exist in a neighbourhood of \(x\) (or a one-sided neighbourhood if \(x\) is an endpoint), and \(f^{(n-2)}\) must exist in a neighbourhood of \(x\), etc.

[Taylor's theorem] Suppose \(f:[a,b]\to\mathbb R\), \(n\) is a positive integer, \(f^{(n-1)}\) is continuous on \([a,b]\), and \(f^{(n)}(t)\) exists for every \(t\in(a,b)\). Let \(\alpha\) and \(\beta\) be distinct points of \([a,b]\) and define \[ P(t):=\sum_{k=0}^{n-1}\frac{f^{(k)}(\alpha)}{k!}(t-\alpha)^k. \] Then there exists a point \(x\) between \(\alpha\) and \(\beta\) such that \[ f(\beta)=P(\beta)+\frac{f^{(n)}(x)}{n!}(\beta-\alpha)^n. \]

For \(n=1\), this is just the mean value theorem. In general, the theorem shows that \(f\) can be approximated by a polynomial of degree \(n-1\), and the error can be estimated once we know bounds on \(|f^{(n)}(x)|\).

Let \(M\) be the number defined by \[ f(\beta)=P(\beta)+M(\beta-\alpha)^n \] and for \(a\le t\le b\), put \[ g(t)=f(t)-P(t)-M(t-\alpha)^n. \] We need to show that \(n!M=f^{(n)}(x)\) for some \(x\) between \(\alpha\) and \(\beta\).

We know that \[ g^{(n)}(t)=f^{(n)}(t)-n!M. \] Hence the proof is complete if \(g^{(n)}(x)=0\) for some \(x\) between \(\alpha\) and \(\beta\).

Since \(P^{(k)}(\alpha)=f^{(k)}(\alpha)\) for \(k=0,\dots,n-1\), we have \[ g(\alpha)=g'(\alpha)=\dots=g^{(n-1)}(\alpha)=0 \] Our choice of \(M\) shows that \(g(\beta)=0\). Thus there is some \(x_1\) between \(\alpha\) and \(\beta\) such that \(g'(x_1)=0\), by the mean value theorem. Since \(g'(\alpha)=0\), we can use the mean value theorem again to find \(x_2\) between \(\alpha\) and \(x_1\) such that \(g''(x_2)=0\). After \(n\) steps we arrive at the conclusion that \(g^{(n)}(x_n)=0\) for some \(x_n\) between \(\alpha\) and \(x_{n-1}\).

Let \(r\in\mathbb R\) and let \(f:(-1,1]\to\mathbb R\) be given by \(f(t)=(1+t)^r\). Then \[ f^{(n)}(x)=r(r-1)\cdots(r-n+1)(1+x)^{r-n} \] and \[ f^{(n)}(0)=r(r-1)\cdots(r-n+1). \]

By Taylor's theorem, we can see that \[ (1+x)^{r}=1+\frac r{1!}x+\frac{r(r-1)}{2!}x^2+\dots+\frac{r(r-1)\cdots(r-n+1)}{n!}x^n+\frac{f^{(n+1)}(x_0)}{(n+1)!}x^{n+1} \] for some \(x_0\) between \(0\) and \(x\).
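A numerical sketch of this expansion for \(r=\frac12\) (so \(f(x)=\sqrt{1+x}\)); the helper `taylor` below is hypothetical, accumulating the coefficients \(\frac{r(r-1)\cdots(r-k+1)}{k!}\) by the recurrence \(c_{k+1}=c_k\frac{r-k}{k+1}\):

```python
# Partial Taylor sums P_n(x) of (1+t)^r at 0, compared with the true value.
def taylor(r: float, x: float, n: int) -> float:
    total, coeff = 0.0, 1.0                   # coeff = r(r-1)...(r-k+1)/k!
    for k in range(n + 1):
        total += coeff * x ** k
        coeff *= (r - k) / (k + 1)            # next binomial-series coefficient
    return total

r, x = 0.5, 0.3
for n in [1, 2, 4, 8]:
    print(n, round(taylor(r, x, n), 8), round((1 + x) ** r, 8))
# The error shrinks as n grows, as the remainder term predicts.
```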

We conclude this lecture with a brief introduction to the differentiation of vector-valued functions.

Summary

In this lecture, we talked about the differentiation of functions \(f:[a,b]\to\mathbb R\):

  • Definition and basic properties
  • Mean value theorem
  • Taylor's theorem

Riemann Integral (Lecture 11)

Last time, we talked about differentiation; this time, we embark on the Riemann integration of a function \(f:[a,b]\to \mathbb R\), in the general form known as the Riemann-Stieltjes integral. Our definition, compared to the standard approach of fixing \(\alpha(x)=x\), may seem unnecessarily complicated, and this criticism is perhaps fair. But allowing \(\alpha(x)\) to be any monotonically increasing function adds little pedagogically (a careless reader could assume \(\alpha(x)=x\) anyway), and the advantage is clear when introducing integration by parts and change of variables.

Definition and existence of integral

Before we can define the integral, we define partitions of an interval.

Let \([a,b]\) be a given interval. By a partition \(P\) of \([a,b]\), we mean a finite set of points \(x_0,x_1,\dots,x_n\) where \[ a=x_0\le x_1\le\dots\le x_{n-1}\le x_n=b \] For \(i=1,\dots,n\), we write \[ \Delta x_i=x_i-x_{i-1} \] Suppose \(f\) is a bounded real function defined on \([a,b]\). Corresponding to each partition \(P\) of \([a,b]\), we write \[ \begin{aligned} M_i &= \sup_{x_{i-1}\le x\le x_i}f(x)\\ m_i &= \inf_{x_{i-1}\le x\le x_i}f(x)\\ U(P,f) &= \sum_{i=1}^n M_i \Delta x_i \\ L(P,f) &= \sum_{i=1}^n m_i \Delta x_i \end{aligned} \] and finally we define the upper and lower Riemann integrals of \(f\) over \([a,b]\) respectively, \[ \begin{aligned} \overline{\int_a^b}f\mathrm{d}x=\inf_{P\text{ is a partition}}U(P,f)\\ \underline{\int_a^b}f\mathrm{d}x=\sup_{P\text{ is a partition}}L(P,f)\\ \end{aligned} \]

If the upper and lower integrals are equal, we say \(f\) is Riemann integrable on \([a,b]\) and we denote the common value as \[ \int_a^b f\mathrm{d}x \] This is called the Riemann integral of \(f\) over \([a,b]\).

Draw a picture here.
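In place of a picture, here is a small numerical sketch of the upper and lower sums for \(f(x)=x^2\) on \([0,1]\) with uniform partitions (using that this particular \(f\) is increasing, so the sup and inf on each subinterval sit at the endpoints):

```python
# U(P, f) and L(P, f) for f(x) = x^2 on [0, 1] with n equal subintervals.
def upper_lower(f, a: float, b: float, n: int):
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    U = sum(f(x1) * (x1 - x0) for x0, x1 in zip(xs, xs[1:]))  # sup at right end
    L = sum(f(x0) * (x1 - x0) for x0, x1 in zip(xs, xs[1:]))  # inf at left end
    return U, L

for n in [10, 100, 1000]:
    print(n, upper_lower(lambda x: x * x, 0.0, 1.0, n))
# U and L squeeze towards 1/3, the Riemann integral of x^2 over [0, 1].
```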

Since \(f\) is assumed to be bounded, there exist two numbers \(m\) and \(M\) such that for all \(a\le x\le b\) we have \(m\le f(x)\le M\). Then for any partition \(P\), we have \[ m(b-a)\le L(P, f)\le U(P,f)\le M(b-a) \] so the upper and lower sums are bounded, hence their infimum and supremum exist.

We can generalise the notion of the Riemann integral a little, to the Riemann-Stieltjes integral.

Let \(\alpha\) be a monotonically increasing function on \([a,b]\) and \(P\) be a partition of \([a,b]\). We write \[ \Delta \alpha_i=\alpha(x_i)-\alpha(x_{i-1}) \] By monotonicity of \(\alpha\), each \(\Delta \alpha_i\ge 0\). For any bounded function \(f:[a,b]\to\mathbb R\), we write \[ \begin{aligned} U(P,f,\alpha)&=\sum_{i=1}^n M_i\Delta\alpha_i\\ L(P,f,\alpha)&=\sum_{i=1}^n m_i\Delta\alpha_i \end{aligned} \] and we define \[ \begin{aligned} \overline{\int_a^b}f\mathrm{d}\alpha=\inf_{P\text{ is a partition}}U(P,f,\alpha)\\ \underline{\int_a^b}f\mathrm{d}\alpha=\sup_{P\text{ is a partition}}L(P,f,\alpha)\\ \end{aligned} \] If the upper integral and lower integral are equal, we denote the common value by \[ \int_a^b f\mathrm{d}\alpha \] This is called the Riemann-Stieltjes integral of \(f\) with respect to \(\alpha\) over \([a,b]\).

Note that by taking \(\alpha(x)=x\), we recover the Riemann integral. We now investigate the existence of the integral. From this point in this lecture, we assume \(f\) to be a bounded real function and \(\alpha\) to be monotonically increasing on \([a,b]\).

Let \(P, P^*\) be partitions. We say \(P^*\) is a refinement of \(P\) if \(P\subseteq P^*\). Given two partitions \(P_1\) and \(P_2\), we say that \(P^*\) is their common refinement if \(P^*=P_1\cup P_2\).

If \(P^*\) is a refinement of \(P\), then \[ \begin{aligned} L(P,f,\alpha)&\le L(P^*,f,\alpha)\\ U(P^*,f,\alpha)&\le U(P,f,\alpha) \end{aligned} \]

Let us assume that \(P^*\) contains just one point more than \(P\). Say the extra point is \(x^*\), and suppose \(x_{i-1}< x^* < x_i\), where \(x_{i-1}\) and \(x_i\) are two consecutive points of \(P\). Let \[ \begin{aligned} w_1 &= \inf_{x_{i-1}\le x\le x^*} f(x) \\ w_2 &= \inf_{x^*\le x \le x_i}f(x) \end{aligned} \] Clearly \(w_1\ge m_i\) and \(w_2\ge m_i\). Hence \[ \begin{aligned} L(P^*,f,\alpha)-L(P,f,\alpha)&=w_1[\alpha(x^*)-\alpha(x_{i-1})]+w_2[\alpha(x_i)-\alpha(x^*)]-m_i[\alpha(x_i)-\alpha(x_{i-1})]\\ &=(w_1-m_i)[\alpha(x^*)-\alpha(x_{i-1})]+(w_2-m_i)[\alpha(x_i)-\alpha(x^*)]\ge 0 \end{aligned} \]

If \(P^*\) contains \(k\) points more than \(P\), say the \(k\) points are \(y_1,\dots,y_k\), then \(L(P^*,f,\alpha)\ge L(P\cup\{y_1,\dots,y_{k-1}\},f,\alpha)\), and we can conclude by induction.

For upper sum, the proof is analogous hence left as an exercise.

\[ \underline{\int_a^b}f\mathrm{d}\alpha\le\overline{\int_a^b}f\mathrm{d}\alpha \]

For any partitions \(P_1\) and \(P_2\), let \(P^*\) be their common refinement. By previous theorem, we have \[ L(P_1,f,\alpha)\le L(P^*,f,\alpha)\le U(P^*,f,\alpha)\le U(P_2,f,\alpha) \] Hence \[ L(P_1,f,\alpha)\le U(P_2,f,\alpha) \] Since this is true for all \(P_1\) we have \[ \sup_{P_1\text{ is a partition}}L(P_1,f,\alpha)\le U(P_2,f,\alpha) \] This is: \[ \underline{\int_a^b}f\mathrm{d}\alpha\le U(P_2,f,\alpha) \] Since this is true for any \(P_2\), we have: \[ \underline{\int_a^b}f\mathrm{d}\alpha\le\inf_{P_2\text{ is a partition}} U(P_2,f,\alpha) \] This is exactly what we want: \[ \underline{\int_a^b}f\mathrm{d}\alpha\le\overline{\int_a^b}f\mathrm{d}\alpha \]

\(f:[a,b]\to\mathbb R\) is integrable with respect to \(\alpha\) if and only if for every \(\epsilon>0\) there is a partition \(P\) such that \[ U(P,f,\alpha)-L(P,f,\alpha)<\epsilon \]

For any partition \(P\) we have \[ L(P,f,\alpha)\le\underline{\int_a^b}f\mathrm{d}\alpha\le\overline{\int_a^b}f\mathrm{d}\alpha\le U(P,f,\alpha) \]

Let us prove from right to left: the inequality implies that \[ 0\le\overline{\int_a^b}f\mathrm{d}\alpha-\underline{\int_a^b}f\mathrm{d}\alpha<\epsilon \] Since this is true for any \(\epsilon>0\), we have to conclude that \[ \overline{\int_a^b}f\mathrm{d}\alpha=\underline{\int_a^b}f\mathrm{d}\alpha \] This is precisely what we mean by \(f\) being integrable with respect to \(\alpha\).

Let us prove from left to right: suppose \(f:[a,b]\to\mathbb R\) is integrable with respect to \(\alpha\), and let \(\epsilon>0\) be given. There exist partitions \(P_1\) and \(P_2\), by the definition of \(\sup\) and \(\inf\), such that \[ \begin{aligned} U(P_2,f,\alpha)-\int_a^b f\mathrm{d}\alpha&<\frac\epsilon2\\ \int_a^bf\mathrm d\alpha-L(P_1,f,\alpha)&<\frac\epsilon2 \end{aligned} \] So let \(P\) be the common refinement of \(P_1\) and \(P_2\). Then by the previous theorems and inequalities, \[ U(P,f,\alpha)\le U(P_2,f,\alpha)<\int_a^b f\mathrm d\alpha+\frac\epsilon2< L(P_1,f,\alpha)+\epsilon\le L(P,f,\alpha)+\epsilon \] This is what we want.

This theorem is a convenient criterion for integrability. Before we apply it, we state some closely related facts.

  • If \(U(P,f,\alpha)-L(P,f,\alpha)<\epsilon\) then for any refinement \(P^*\) of \(P\) we have \(U(P^*,f,\alpha)-L(P^*,f,\alpha)<\epsilon\)
  • If \(U(P,f,\alpha)-L(P,f,\alpha)<\epsilon\) for \(P=\{x_0,\dots,x_n\}\) and if \(s_i\) and \(t_i\) are points in \([x_{i-1},x_i]\) then \[ \sum_{i=1}^n\left|f(s_i)-f(t_i)\right|\Delta\alpha_i<\epsilon \]
  • If \(U(P,f,\alpha)-L(P,f,\alpha)<\epsilon\) for \(P=\{x_0,\dots,x_n\}\), if \(t_i\) are points in \([x_{i-1},x_i]\), and if \(f\) is integrable with respect to \(\alpha\), then \[ \left|\sum_{i=1}^nf(t_i)\Delta\alpha_i-\int_a^bf\mathrm d\alpha\right|<\epsilon \]
  • Trivial from a theorem above.
  • We know \(f(s_i)\) and \(f(t_i)\) are in \([m_i,M_i]\), hence \(|f(s_i)-f(t_i)|\le M_i-m_i\). Thus \(\sum_{i=1}^n |f(s_i)-f(t_i)|\Delta\alpha_i\le U(P,f,\alpha)-L(P,f,\alpha)<\epsilon\).
  • We know, obviously, \[ L(P,f,\alpha)\le\sum_{i=1}^n f(t_i)\Delta\alpha_i\le U(P,f,\alpha) \] and \[ L(P,f,\alpha)\le\int_a^b f\mathrm d\alpha\le U(P,f,\alpha) \] Both quantities lie between \(L(P,f,\alpha)\) and \(U(P,f,\alpha)\), which differ by less than \(\epsilon\). This proves the last part.

If \(f:[a,b]\to\mathbb R\) is continuous on \([a,b]\), then \(f\) is integrable with respect to \(\alpha\).

Let \(\epsilon>0\) be arbitrary. Choose \(\eta>0\) such that \([\alpha(b)-\alpha(a)]\eta<\epsilon\). Then since \(f\) is continuous on \([a,b]\) and \([a,b]\) is compact, \(f\) is uniformly continuous on \([a,b]\). Then there is a \(\delta>0\) such that for any \(x,t\in[a,b]\) with \(|x-t|<\delta\) we have \[ |f(x)-f(t)|<\eta. \]

If \(P\) is any partition of \([a,b]\) such that \(\Delta x_i<\delta\) for all \(i\), then by the choice of \(\delta\) we have \(M_i-m_i\le\eta\). Therefore \[ U(P,f,\alpha)-L(P,f,\alpha)=\sum_{i=1}^n(M_i-m_i)\Delta\alpha_i\le\eta\sum_{i=1}^n\Delta\alpha_i=\eta[\alpha(b)-\alpha(a)]<\epsilon. \] By the above criterion we are done.

We missed a result, a version of the intermediate value theorem, in previous lectures; we quote it now.

Let \(g:[a,b]\to\mathbb R\) be a continuous function with \(g(a)< g(b)\), and let \(c\) be a number such that \(g(a)< c< g(b)\). Then there is a point \(x\in(a,b)\) such that \(g(x)=c\).

Let \(E=\{y\in[a,b]|g(y)\le c\}\). Then \(a\in E\) and \(E\) is bounded above. Then \(x_0=\sup E\) exists.

Prove that \(g(x_0)=c\) as an exercise, using continuity and the definition of supremum.

If \(f\) is monotonic on \([a,b]\) and if \(\alpha\) is continuous on \([a,b]\), then \(f\) is integrable with respect to \(\alpha\) over \([a,b]\).

Let \(\epsilon>0\) and let \(n\) be a positive integer. Since \(\alpha\) is continuous, by the intermediate value theorem we can choose a partition \(P=\{x_0,\dots,x_n\}\) such that \[ \Delta\alpha_i=\frac{\alpha(b)-\alpha(a)}n \]

Without loss of generality suppose \(f\) is monotonically increasing. Then \(M_i=f(x_i)\) and \(m_i=f(x_{i-1})\). So \[ U(P,f,\alpha)-L(P,f,\alpha)=\frac{\alpha(b)-\alpha(a)}n\sum_{i=1}^n[f(x_i)-f(x_{i-1})]=\frac{\alpha(b)-\alpha(a)}n(f(b)-f(a)) \] For \(n\) large enough, this is less than \(\epsilon\). So \(f\) is integrable.

We quote a theorem without proof.

Suppose \(f\) is bounded on \([a,b]\) with finitely many discontinuities, and suppose \(\alpha\) is continuous at every discontinuity of \(f\). Then \(f\) is integrable with respect to \(\alpha\).

Suppose \(f:[a,b]\to\mathbb R\) is integrable with respect to \(\alpha\), \(m\le f\le M\), \(\phi\) is continuous on \([m,M]\). Write \(h(x)=\phi(f(x))\) on \([a,b]\). Then \(h\) is integrable over \([a,b]\) with respect to \(\alpha\).

Choose \(\epsilon>0\). Since \(\phi\) is uniformly continuous on \([m, M]\), there exists \(\delta>0\) such that \(\delta<\epsilon\) and \(|\phi(s)-\phi(t)|<\epsilon\) for any \(s,t\in[m, M]\) with \(|s-t|\le\delta\).

Since \(f\) is integrable over \([a,b]\) with respect to \(\alpha\), there is a partition \(P=\{x_0,\dots,x_n\}\) such that \[ U(P,f,\alpha)-L(P,f,\alpha)<\delta^2 \] Let us write \(M_i\) and \(m_i\) for the supremum and infimum of \(f\) on \([x_{i-1},x_i]\), and \(M_i^*\) and \(m_i^*\) for those of \(h\). We split the indices \(i=1,\dots,n\) into two sets \(A\) and \(B\): put \(i\in A\) if \(M_i-m_i<\delta\) and \(i\in B\) otherwise.

  • For any \(i\in A\), our choice of \(\delta\) shows that \(M_i^*-m_i^*\le\epsilon\)
  • For any \(i\in B\), \(M_i^*-m_i^*\le2K\) where \(K=\sup_{m\le t\le M}|\phi(t)|\). By \(U(P,f,\alpha)-L(P,f,\alpha)<\delta^2\) and the definition of \(B\), we get \[ \delta\sum_{i\in B}\Delta\alpha_i\le\sum_{i\in B}(M_i-m_i)\Delta \alpha_i<\delta^2 \] so that \(\sum_{i\in B}\Delta\alpha_i<\delta\). It follows that \[ U(P,h,\alpha)-L(P,h,\alpha)=\sum_{i\in A}(M_i^*-m_i^*)\Delta\alpha_i+\sum_{i\in B}(M_i^*-m_i^*)\Delta\alpha_i\le\epsilon[\alpha(b)-\alpha(a)]+2K\delta<\epsilon[\alpha(b)-\alpha(a)+2K]. \] Since \(\epsilon\) is arbitrary, we have proved the integrability of \(h\) by the previous criterion.

Properties of integral

Assume \(f\) is an integrable function on \([a,b]\) with respect to \(\alpha\) and \(c\in\mathbb R\) is a constant then \(c\cdot f\) is integrable and \(\int_a^b cf\mathrm d\alpha=c\int_a^b f\mathrm d\alpha\).

If \(c=0\), this is trivial. We may assume \(c>0\), since \(cf=(-c)(-f)\). Let \(\epsilon>0\) be given; by the integrability of \(f\) we can find a partition \(P\) such that \[ U(P,f,\alpha)-L(P,f,\alpha)<\frac\epsilon c. \] So \[ U(P,cf,\alpha)-L(P,cf,\alpha)<\epsilon. \]

Since for any partition \(P\), \(U(P,cf,\alpha)=cU(P,f,\alpha)\) and \(L(P,cf,\alpha)=cL(P,f,\alpha)\), we have \[ \begin{aligned} \overline{\int_a^b}cf\mathrm{d}\alpha=c\overline{\int_a^b}f\mathrm{d}\alpha\\ \underline{\int_a^b}cf\mathrm{d}\alpha=c\underline{\int_a^b}f\mathrm{d}\alpha\\ \end{aligned} \] This is what we want.

Assume \(f_1\) and \(f_2\) are real functions on \([a,b]\) that are integrable with respect to \(\alpha\). Then

  • \(f_1+f_2\) is also integrable with respect to \(\alpha\)
  • \(\int_a^b(f_1+f_2)\mathrm d\alpha=\int_a^bf_1\mathrm d\alpha+\int_a^bf_2\mathrm d\alpha\)

Write \(f=f_1+f_2\). Then for any partition \(P\) of \([a,b]\), we have \[ L(P,f_1,\alpha)+L(P,f_2,\alpha)\le L(P,f,\alpha)\le U(P,f,\alpha)\le U(P,f_1,\alpha)+U(P,f_2,\alpha). \] Let \(\epsilon>0\). Then there are two partitions \(P_1\) and \(P_2\) such that \(U(P_1,f_1,\alpha)-L(P_1,f_1,\alpha)<\epsilon\) and \(U(P_2,f_2,\alpha)-L(P_2,f_2,\alpha)<\epsilon\). By one of the previous theorems, letting \(P\) be the common refinement of \(P_1\) and \(P_2\), we have \(U(P,f_i,\alpha)-L(P,f_i,\alpha)<\epsilon\) for \(i=1,2\). So \[ U(P,f,\alpha)-L(P,f,\alpha)< 2\epsilon. \] With the same \(P\) we have \[ U(P,f_i,\alpha)<\int f_i\mathrm d\alpha+\epsilon \] So \[ \int f\mathrm d\alpha\le U(P,f,\alpha)<\int f_1\mathrm d\alpha+\int f_2\mathrm d\alpha+2\epsilon. \] Since \(\epsilon\) is arbitrary, we conclude that \(\int f\mathrm d\alpha\le\int f_1\mathrm d\alpha+\int f_2\mathrm d\alpha\)

If we use the same method but consider \(-f_1\) and \(-f_2\), we get \(\int ((-f_1)+(-f_2))\mathrm d\alpha\le\int -f_1\mathrm d\alpha+\int -f_2\mathrm d\alpha\), that is, \[ -\int (f_1+f_2)\mathrm d\alpha\le -\int f_1\mathrm d\alpha-\int f_2\mathrm d\alpha. \] Combined with the last inequality, this gives equality. We rest our case.

By similar methods, we obtain:
  • If \(f_1(x)\le f_2(x)\) for any \(x\in[a,b]\), we have \[ \int_a^b f_1\mathrm d\alpha\le\int_a^b f_2\mathrm d\alpha \]
  • If \(f\) is integrable on \([a,b]\) and \(a< c< b\), then \(f\) is integrable on \([a,c]\) and \([c,b]\), and \[ \int_a^b f\mathrm d\alpha=\int_a^c f\mathrm d\alpha+\int_c^b f\mathrm d\alpha. \]
  • If \(f\) is integrable on \([a,b]\) and \(|f(x)|< M\) for all \(x\in[a,b]\) then \[ |\int_a^b f\mathrm d\alpha|\le M[\alpha(b)-\alpha(a)] \]
  • If \(f\) is integrable with respect to \(\alpha_1\) and \(\alpha_2\) then \(f\) is integrable with respect to \(\alpha_1+\alpha_2\) and \[ \int_a^b f\mathrm d{(\alpha_1+\alpha_2)}=\int_a^b f\mathrm d\alpha_1+\int_a^b f\mathrm d\alpha_2 \]
  • If \(f\) is integrable with respect to \(\alpha\) and \(c\) is a positive constant, then \(f\) is integrable with respect to \(c\alpha\) and \[ \int_a^b f\mathrm d(c\alpha)=c\int_a^b f\mathrm d\alpha \]

The proofs are similar to the proof of the above theorem. In the second part, by passing to refinements we may restrict ourselves to partitions which contain the point \(c\) in approximating \(\int f\mathrm d\alpha\).

This is a good exercise for testing understanding of the definition of Riemann-Stieltjes integral.

If \(f\) and \(g\) are integrable function on \([a,b]\) with respect to \(\alpha\) then

  • \(fg\) is integrable
  • \(|f|\) is integrable and \[ \left|\int_a^b f\mathrm d\alpha\right|\le\int_a^b|f|\mathrm d\alpha \]
  • \(\phi(t)=t^2\) is certainly continuous, so \(f^2=\phi\circ f\) and \(g^2\) are integrable, as are \((f+g)^2\) and \((f-g)^2\). Basic arithmetic: \[ 4fg=(f+g)^2-(f-g)^2. \] We are done.
  • \(\phi(t)=|t|\) is continuous, so \(|f|\) is integrable. Depending on the sign of \(\int f \mathrm d\alpha\), choose \(c=1\) or \(c=-1\) so that \(c\int f\mathrm d\alpha \ge 0\). Then \[ \left|\int f\mathrm d\alpha\right|=c\int f\mathrm d\alpha=\int cf\mathrm d\alpha\le\int|f|\mathrm d\alpha \] (since \(cf\le |f|\)).

The unit step function \(I\) is defined by \[ I(x)=\left\{ \begin{aligned} &0&(x\le 0)\\ &1&(x>0) \end{aligned} \right. \]

Let \(a< s< b\), assume \(f\) to be bounded on \([a,b]\), continuous at \(s\), and \(\alpha(x)=I(x-s)\). Then \[ \int_a^b f\mathrm d\alpha=f(s) \]

Consider the partition \(P=\{x_0,x_1,x_2,x_3\}\) where \(x_0=a\) and \(x_1=s< x_2 < x_3=b\). Since \(\alpha\) jumps from \(0\) to \(1\) exactly once, across \([x_1,x_2]\), only that interval contributes: \[ U(P,f,\alpha)=M_2,\qquad L(P,f,\alpha)=m_2, \] where \(M_2\) and \(m_2\) are the supremum and infimum of \(f\) on \([x_1,x_2]\). Since \(f\) is continuous at \(s\), we see that \(M_2\) and \(m_2\) converge to \(f(s)\) as \(x_2\to s\).

Suppose \(c_n\ge 0\) for all \(n\in\mathbb N\) and \(\sum c_n\) converges, \((s_n)\) is a sequence of distinct points in \((a,b)\) and write \[ \alpha(x)=\sum_{n=1}^\infty c_nI(x-s_n). \] Let \(f:[a,b]\to\mathbb R\) be continuous then \[ \int_a^b f\mathrm d\alpha=\sum_{n=1}^\infty c_n f(s_n). \]

The comparison test shows that \(\alpha(x)\) converges for all \(x\). \(\alpha\) is evidently monotonic since \(c_n\ge 0\). We have \(\alpha(a)=0\) and \(\alpha(b)=\sum c_n\).

Let \(\epsilon>0\) and choose \(N\) such that \[ \sum_{n=N+1}^\infty c_n<\epsilon. \] Split \(\alpha\) as \[ \alpha_1(x)=\sum_{n=1}^N c_nI(x-s_n),\qquad\alpha_2(x)=\sum_{n=N+1}^\infty c_n I(x-s_n). \] By the previous theorems we have \[ \int_a^b f\mathrm d\alpha_1=\sum_{n=1}^N c_n f(s_n). \] Since \(\alpha_2(b)-\alpha_2(a)<\epsilon\), \[ \left|\int_a^b f\mathrm d\alpha_2\right|\le M\epsilon, \] where \(M=\sup|f(x)|\). Since \(\alpha=\alpha_1+\alpha_2\), we have \[ \left|\int_a^b f\mathrm d\alpha-\sum_{n=1}^Nc_nf(s_n)\right|\le M\epsilon. \] Letting \(\epsilon\to 0\) (so \(N\to\infty\)), we conclude.
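As a quick numerical sanity check (not part of the proof), here is a short Python sketch; the weights, jump points and the choice \(f=\sin\) are made up for illustration.

```python
import math

# alpha is a finite sum of weighted unit steps, so the theorem predicts
# that Riemann-Stieltjes sums approach the sum of c_n * f(s_n).
a, b = 0.0, 1.0
c = [0.5, 0.25, 0.125]   # weights c_n >= 0 (illustrative values)
s = [0.2, 0.5, 0.9]      # distinct jump points in (a, b)
f = math.sin             # any continuous f on [a, b]

def alpha(x):
    # unit step I(x - s_n): 0 for x <= s_n, 1 for x > s_n
    return sum(cn for cn, sn in zip(c, s) if x > sn)

n = 100_000
xs = [a + (b - a) * i / n for i in range(n + 1)]
rs_sum = sum(f(xs[i]) * (alpha(xs[i]) - alpha(xs[i - 1]))
             for i in range(1, n + 1))

expected = sum(cn * f(sn) for cn, sn in zip(c, s))
print(rs_sum, expected)  # the two values should agree closely
```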

Assume \(\alpha\) increases monotonically and is differentiable, and assume \(\alpha'\) is Riemann integrable (i.e. integrable with respect to \(x\)). Let \(f\) be a bounded real function on \([a,b]\). Then \(f\) is integrable with respect to \(\alpha\) if and only if \(f\alpha'\) is Riemann integrable. In that case we have \[ \int_a^b f\mathrm d\alpha=\int_a^b f(x)\alpha'(x)\mathrm dx. \]

Let \(\epsilon>0\). Since \(\alpha'\) is integrable, there is a partition \(P=\{x_0,\dots,x_n\}\) such that \[ U(P,\alpha')-L(P,\alpha')<\epsilon. \] Use the mean value theorem to find points \(t_i\in[x_{i-1},x_i]\) such that \(\Delta\alpha_i=\alpha'(t_i)\Delta x_i\). Then if \(s_i\in[x_{i-1},x_i]\) we have \[ \sum_{i=1}^n|\alpha'(s_i)-\alpha'(t_i)|\Delta x_i<\epsilon \] by the theorem immediately after the criterion of integrability. Write \(M=\sup|f(x)|\). By the previous inequality and \[ \sum_{i=1}^nf(s_i)\Delta\alpha_i=\sum_{i=1}^nf(s_i)\alpha'(t_i)\Delta x_i, \] we have \[ \left|\sum_{i=1}^nf(s_i)\Delta \alpha_i-\sum_{i=1}^nf(s_i)\alpha'(s_i)\Delta x_i\right|\le M\epsilon. \] In particular, we have \[ \sum_{i=1}^n f(s_i)\Delta\alpha_i\le U(P,f\alpha')+M\epsilon \] for any \(s_i\in[x_{i-1},x_i]\). Then \[ U(P,f,\alpha)\le U(P, f\alpha')+M\epsilon. \] Similarly we have \[ U(P,f\alpha')\le U(P,f,\alpha)+M\epsilon. \] So \[ |U(P,f,\alpha)-U(P,f\alpha')|\le M\epsilon. \] Now the above argument remains true for any refinement of \(P\). Hence we have \[ \left|\overline{\int_a^b}f\mathrm{d}\alpha-\overline{\int_a^b}f(x)\alpha'(x)\mathrm{d}x\right|\le M\epsilon. \] Since \(\epsilon\) is arbitrary, we have \[ \overline{\int_a^b}f\mathrm{d}\alpha=\overline{\int_a^b}f(x)\alpha'(x)\mathrm{d}x \] for any bounded \(f\). The equality of the lower integrals is similar. The theorem follows.
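A quick numerical sanity check of the reduction, with the made-up choices \(\alpha(x)=x^2\) (so \(\alpha'(x)=2x\)) and \(f=\cos\) on \([0,1]\):

```python
import math

# The theorem predicts: integral of f d(alpha) = integral of f(x)*2x dx.
f = math.cos
alpha = lambda x: x * x

n = 100_000
xs = [i / n for i in range(n + 1)]

# Riemann-Stieltjes sum for the left-hand side (midpoint evaluation of f)
rs = sum(f((xs[i - 1] + xs[i]) / 2) * (alpha(xs[i]) - alpha(xs[i - 1]))
         for i in range(1, n + 1))

# Closed form of the right-hand side, using that x*sin(x) + cos(x) is an
# antiderivative of x*cos(x)
exact = 2 * (math.cos(1) + math.sin(1) - 1)
print(rs, exact)  # should agree to several decimal places
```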

Suppose \(\phi\) is a strictly increasing continuous function that maps \([A,B]\) onto \([a,b]\). Suppose \(\alpha\) is a monotonically increasing function on \([a,b]\) and \(f:[a,b]\to\mathbb R\) is integrable with respect to \(\alpha\). Define \(\beta\) and \(g\) to be functions on \([A,B]\) by \[ \beta(y)=\alpha(\phi(y)), g(y)=f(\phi(y)) \] Then \(g\) is integrable with respect to \(\beta\) and \[ \int_A^B g\mathrm d\beta=\int_a^b f\mathrm d\alpha. \]

To each partition \(P=\{x_0,x_1,\dots,x_n\}\) of \([a,b]\) there corresponds a partition \(Q=\{y_0,\dots,y_n\}\) of \([A,B]\) such that \(x_i=\phi(y_i)\), and all partitions of \([A,B]\) can be obtained this way. By the intermediate value theorem, we see the values taken by \(f\) on \([x_{i-1},x_i]\) are the same as those taken by \(g\) on \([y_{i-1},y_i]\), hence we have \[ U(Q,g,\beta)=U(P,f,\alpha),L(Q,g,\beta)=L(P,f,\alpha). \] \(f\) is integrable, so \(P\) can be chosen so that \(U(P,f,\alpha)\) and \(L(P,f, \alpha)\) are arbitrarily close to \(\int f\mathrm d\alpha\). Passing to the corresponding \(Q\), we conclude the proof.

Note the following special case: Take \(\alpha(x)=x\). Then \(\beta=\phi\). Assume \(\phi'\) is integrable on \([A,B]\). Then \[ \int_a^b f(x)\mathrm dx=\int_A^Bf(\phi(y))\mathrm d\phi=\int_A^Bf(\phi(y))\phi'(y)\mathrm dy. \]
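Here is a quick numerical check of the special case, with the made-up choices \(\phi(y)=y^2\) on \([0,1]\) and \(f=\exp\):

```python
import math

# phi(y) = y^2 maps [0, 1] onto [0, 1] and is strictly increasing, so the
# special case predicts: integral of f(x) dx = integral of f(y^2)*2y dy.
f = math.exp
phi, dphi = lambda y: y * y, lambda y: 2 * y

def midpoint(g, a, b, n=100_000):
    # composite midpoint rule for the Riemann integral of g over [a, b]
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

lhs = midpoint(f, 0.0, 1.0)
rhs = midpoint(lambda y: f(phi(y)) * dphi(y), 0.0, 1.0)
print(lhs, rhs, math.e - 1)  # all three should agree closely
```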

Integration and differentiation

Let \(f:[a,b]\to\mathbb R\) be integrable. For \(a\le x\le b\), write \[ F(x)=\int_a^x f(t)\mathrm dt. \] Then \(F\) is continuous on \([a,b]\). If \(f\) is continuous at \(x_0\in[a,b]\), then \(F\) is differentiable at \(x_0\) with \(F'(x_0)=f(x_0)\).

Since \(f\) is integrable, \(f\) is bounded, say \(|f(t)|\le M\) for all \(a\le t\le b\). If \(a\le x< y\le b\), we have \[ |F(y)-F(x)|=\left|\int_x^yf(t)\mathrm dt\right|\le M(y-x). \] Given \(\epsilon>0\), we see that \[ |F(y)-F(x)|<\epsilon \] whenever \(|y-x|<\frac\epsilon M\). This proves continuity (indeed uniform continuity) of \(F\).

Suppose \(f\) is continuous at \(x_0\). Given \(\epsilon>0\), choose \(\delta>0\) such that \(|f(t)-f(x_0)|<\epsilon\) for all \(t\in[a,b]\) with \(|t-x_0|<\delta\). Hence, if \[ x_0-\delta< s\le x_0\le t< x_0+\delta\text{ and }a\le s< t\le b, \] we have \[ \left|\frac{F(t)-F(s)}{t-s}-f(x_0)\right|=\left|\frac1{t-s}\int_s^t[f(u)-f(x_0)]\mathrm du\right|<\epsilon. \] (The above equality used \(\int_s^t f(x_0)\mathrm du=(t-s)f(x_0)\), which is trivial so long as anyone cares to prove it.) Hence \(F'(x_0)=f(x_0)\).
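The statement can be checked numerically. A sketch under made-up assumptions (\(f=\cos\), midpoint quadrature for \(F\)):

```python
import math

# Take f = cos on [0, 2], compute F(x) as the integral of f over [0, x] by
# a midpoint rule, and compare a difference quotient of F with f(x0).
f = math.cos

def F(x, n=20_000):
    h = x / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

x0, h = 1.0, 1e-4
diff_quot = (F(x0 + h) - F(x0 - h)) / (2 * h)
print(diff_quot, f(x0))  # both approximately cos(1) = 0.5403...
```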

[Fundamental theorem of calculus] If \(f:[a,b]\to\mathbb R\) is integrable and there is a differentiable function \(F:[a,b]\to\mathbb R\) such that \(F'=f\), then \[ \int_a^b f(x)\mathrm dx=F(b)-F(a) \]

Let \(\epsilon>0\). Choose a partition \(P=\{x_0,\dots,x_n\}\) of \([a,b]\) such that \(U(P,f)-L(P,f)<\epsilon\). Use the mean value theorem to find points \(t_i\in[x_{i-1},x_i]\) such that \[ F(x_i)-F(x_{i-1})=f(t_i)\Delta x_i. \] Then \[ \sum_{i=1}^nf(t_i)\Delta x_i=F(b)-F(a), \] and \[ \left|F(b)-F(a)-\int_a^b f(x)\mathrm dx\right|<\epsilon \] by the theorem immediately after the criterion of integrability. Since \(\epsilon\) is arbitrary, the proof is complete.

[Integration by parts] Suppose \(F\) and \(G\) are differentiable functions on \([a,b]\) such that \(f:=F'\) and \(g:=G'\) are both integrable on \([a,b]\). Then \[ \int_a^b Fg\mathrm dx=F(b)G(b)-F(a)G(a)-\int_a^bfG\mathrm dx. \]

Apply the fundamental theorem of calculus to \(H(x)=F(x)G(x)\), using the fact that \(H'(x)=f(x)G(x)+F(x)g(x)\). Just note that \(H'\) is integrable by the previous theorems.
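A numerical sanity check of the parts formula, with the made-up choices \(F(x)=x\), \(G(x)=\sin x\) on \([0,\pi]\) (so \(f=1\), \(g=\cos\)):

```python
import math

# The formula predicts:
# integral of F*g = F(b)G(b) - F(a)G(a) - integral of f*G.
a, b = 0.0, math.pi

def midpoint(h, lo, hi, n=100_000):
    step = (hi - lo) / n
    return step * sum(h(lo + (i + 0.5) * step) for i in range(n))

lhs = midpoint(lambda x: x * math.cos(x), a, b)  # integral of F*g
rhs = b * math.sin(b) - a * math.sin(a) - midpoint(math.sin, a, b)
print(lhs, rhs)  # both approximately -2
```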

For lack of time, we omit integration of functions \([a,b]\to\mathbb R^k\); interested readers may consult Rudin and others (1964), 6.23 to 6.27. The exercises in Rudin and others (1964) are strongly recommended.

Summary

This lecture is about the Riemann-Stieltjes integral:

  • its definition
  • its properties
  • Fundamental theorem of calculus

Some special functions

In this lecture we will formally define exponentiation, logarithm, sine and cosine functions.

\(\exp\) and \(\log\)

We define the exponential function \(\exp:\mathbb R\to\mathbb R\) by the power series \[ \exp(x)=\sum_{n=0}^\infty\frac{x^n}{n!} \]

We have previously seen that the series defining \(\exp\) converges for all \(x\in\mathbb R\) and that \(\exp(x)\exp(y)=\exp(x+y)\). We have the following corollaries.

  • \(\exp(0)=1\)
  • \(\exp(-x)=\frac1{\exp(x)}\) for all \(x\in\mathbb R\)
  • \(\exp(x)>0\) for all \(x\in\mathbb R\)
  • Straight from the definition, substituting \(x=0\).
  • \(1=\exp(0)=\exp(-x+x)=\exp(x)\exp(-x)\).
  • For \(x\ge 0\) the partial sums are \(\ge 1>0\); for \(x<0\) use \(\exp(x)=\frac1{\exp(-x)}>0\).
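As a quick numerical sanity check (not part of the lecture), partial sums of the defining series can be compared with Python's math.exp, together with the functional equation:

```python
import math

def exp_series(x, terms=30):
    # partial sum of x^n / n! for n = 0, ..., terms - 1
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)  # x^n/n!  ->  x^(n+1)/(n+1)!
    return total

x, y = 1.3, -0.7  # arbitrary sample points
print(exp_series(x), math.exp(x))                        # should agree
print(exp_series(x) * exp_series(y), exp_series(x + y))  # functional equation
```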

\(f(x)=\exp(x)\) is differentiable everywhere and \(f'=f\).

Let \(a\in\mathbb R\), then \[ \frac{\exp(a+h)-\exp(a)}h=\exp(a)\frac{\exp(h)-1}h=\exp(a)\left(1+\frac h{2!}+\frac {h^2}{3!}+\dots\right). \]

For \(|h|<2\), we have \[ \left|\frac h{2!}+\frac{h^2}{3!}+\dots\right|\le\frac{|h|}2+\frac{|h|^2}{2^2}+\dots+\frac{|h|^n}{2^n}+\dots=\frac{\frac{|h|}2}{1-\frac{|h|}2} \] and this tends to zero as \(h\to 0\). Hence \[ f'(a)=\lim_{h\to0}\frac{\exp(a+h)-\exp(a)}h=\exp(a)\lim_{h\to0}\frac{\exp(h)-1}h=\exp(a). \]

Since \(\exp\) is everywhere differentiable, it is everywhere continuous.

\(\lim_{x\to+\infty}x^n\exp(-x)=0\) for every \(n\in\mathbb N\).

By definition, for \(x>0\), \[ \exp(x)>\frac{x^{n+1}}{(n+1)!}, \] hence \[ x^n\exp(-x)<\frac{(n+1)!}x. \] As \(x\to\infty\), the right-hand side tends to zero.
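A small numerical illustration (the sample values are arbitrary):

```python
import math

# x^n * e^(-x) for fixed n = 10 shrinks rapidly once x is large.
n = 10
for x in [10, 50, 100, 200]:
    print(x, x**n * math.exp(-x))
```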

We often write \(\exp(x)\) as \(e^x\); this is because

  • for \(n\in\mathbb N\), \(\exp(n)=\exp(1+1+\dots+1)=\exp(1)\exp(1)\dots\exp(1)=e\cdot e\dots e=e^n\);
  • then if \(p=\frac nm\in\mathbb Q\), \(\exp(p)^m=\exp(pm)=\exp(n)=e^n\), hence \(\exp(p)=e^{\frac nm}\);
  • then for any \(x\in\mathbb R\), using continuity one can verify that \(\exp(x)=\sup_{\{y\in\mathbb Q|y < x\}}e^y\).

By the intermediate value theorem and strict monotonicity, the range of \(\exp\) is \(\mathbb R_{> 0}\), i.e. \(\exp:\mathbb R\to\mathbb R_{> 0}\) is bijective. Hence there is an inverse function \(\log:\mathbb R_{> 0}\to\mathbb R\) that is also strictly increasing and differentiable (think about why).

By definition of the inverse function: for all \(y>0\), \[ \exp(\log y)=y, \] and for all \(x\in\mathbb R\), \[ \log(\exp x)=x. \] Differentiating the latter, we get \[ \log'(\exp(x))\exp'(x)=1; \] since \(\exp'=\exp\), we get, for all \(y>0\), \[ \log'(y)=\frac1y. \] Using the fundamental theorem of calculus and the fact that \(\log(1)=\log(\exp(0))=0\), we get \[ \log(y)=\log(y)-\log(1)=\int_1^y\frac1x\mathrm dx. \]

Note that this integral is sometimes taken as the definition of the logarithm.

Since \(\log:\mathbb R_{>0}\to\mathbb R\) is bijective, for any \(u,v>0\) we can write \(u=\exp(x)\) and \(v=\exp(y)\) for some \(x,y\in\mathbb R\) then \[ \log(uv)=\log(\exp(x)\exp(y))=\log(\exp(x+y))=x+y=\log(u)+\log(v) \]

We can see that for \(x>0\) and a positive integer \(n\), \[ x^n=x\cdot x\dots x=\exp(\log x)\dots\exp(\log x)=\exp(\log x+\dots+\log x)=\exp(n\log(x)) \] Then for any positive integer \(m\), we have \[ x^{\frac1m}=\exp\left(\frac1m\log(x)\right) \] Then for any \(\alpha\in\mathbb Q\), we have \[ x^\alpha=\exp(\alpha\log x)=e^{\alpha\log x}. \] Then for any real \(\alpha\), by continuity, \(\sup_{\{y\in\mathbb Q|y< \alpha\}}x^y=e^{\alpha\log x}\), so we take this as the definition of \(x^\alpha\). Thus we can calculate the derivative of \(x^\alpha\) for any real number \(\alpha\): \[ (x^\alpha)'=\exp(\alpha\log x)\cdot\frac\alpha x=\alpha x^{\alpha-1}. \]

This differs from the earlier result, where we only considered integer exponents.
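A quick numerical check of both the identity \(x^\alpha=e^{\alpha\log x}\) and the derivative formula, using an arbitrary irrational exponent:

```python
import math

x, alpha = 2.0, math.sqrt(2)  # arbitrary base and irrational exponent

# x^alpha agrees with exp(alpha * log x)
print(x ** alpha, math.exp(alpha * math.log(x)))

# a central difference quotient matches alpha * x^(alpha - 1)
h = 1e-6
numeric = ((x + h) ** alpha - (x - h) ** alpha) / (2 * h)
print(numeric, alpha * x ** (alpha - 1))
```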

Contrary to \(\exp\), \(\log\) grows extremely slowly:

For all \(\alpha>0\), we have \[ \lim_{x\to\infty}x^{-\alpha}\log x=0 \]

For any \(0<\epsilon<\alpha\) and \(x>1\) we have \[ \begin{aligned} x^{-\alpha}\log x&=x^{-\alpha}\int_1^x\frac1t\mathrm dt < x^{-\alpha}\int_1^x t^{\epsilon-1}\mathrm dt\\ &=x^{-\alpha}\cdot\frac{x^{\epsilon}-1}{\epsilon}<\frac{x^{\epsilon-\alpha}}\epsilon. \end{aligned} \] Since \(\epsilon-\alpha<0\), the right-hand side tends to zero as \(x\to\infty\), so the theorem follows.

Sine and cosine

We define \(\sin:\mathbb R\to\mathbb R\) and \(\cos:\mathbb R\to\mathbb R\) by \[ \begin{aligned} \sin(x)&=\sum_{n=0}^\infty\frac{(-1)^n}{(2n+1)!}x^{2n+1}\\ \cos(x)&=\sum_{n=0}^\infty\frac{(-1)^n}{(2n)!}x^{2n} \end{aligned} \]

One can show that the radius of convergence of both series is infinite.

\[ \begin{aligned} \sin(0) &=0\\ \cos(0) &=1\\ \sin(x+y)&=\sin(x)\cos(y)+\cos(x)\sin(y)\\ \cos(x+y)&=\cos(x)\cos(y)-\sin(x)\sin(y)\\ \sin^2(x)+\cos^2(x)&=1 \end{aligned} \]

The proofs basically consist of expanding the series. It is a complicated exercise, but the idea is really simple.
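Partial sums of the two series can be checked numerically against Python's built-ins, along with the Pythagorean identity (a sanity check, not a proof):

```python
import math

def sin_series(x, terms=25):
    # partial sum of (-1)^n x^(2n+1) / (2n+1)!
    total, term = 0.0, x
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

def cos_series(x, terms=25):
    # partial sum of (-1)^n x^(2n) / (2n)!
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total

x = 1.1  # arbitrary sample point
print(sin_series(x), math.sin(x))
print(cos_series(x), math.cos(x))
print(sin_series(x) ** 2 + cos_series(x) ** 2)  # approximately 1
```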

From the above theorem we get that for all \(x\in\mathbb R\) we have \(-1\le\sin x\le 1\) and \(-1\le\cos x\le 1\).

\[ \sin'=\cos\text{ and }\cos'=-\sin \]

We prove the first and leave \(\cos'\) as exercise:

\[ \begin{aligned} \sin'(x)&=\lim_{y\to0}\frac{\sin(x+y)-\sin(x)}{y}\\ &=\lim_{y\to0}\frac{\sin x\cos y+\cos x\sin y-\sin x}{y}\\ &=\lim_{y\to 0}\left[\frac{\sin(x)(\cos y-1)}{y}+\frac{\cos x\sin y}{y} \right]\\ &=\sin(x)\lim_{y\to 0}\frac{\cos y-1}{y}+\cos x\lim_{y\to 0}{\frac{\sin{y}}{y}}. \end{aligned} \]

By definition: \[ \left|\frac{\cos y-1}{y}\right|=\left|-\frac{y}{2!}+\frac{y^3}{4!}-\frac{y^5}{6!}+\dots\right|\le\frac{|y|}{2}+\frac{|y|^3}{2^3}+\frac{|y|^5}{2^5}+\dots \] for \(|y|<2\). Thus \(\lim_{y\to 0}\frac{\cos y-1}{y}=0\). By definition \[ \frac{\sin{y}}{y}-1=-\frac{y^2}{3!}+\frac{y^4}{5!}-\frac{y^6}{7!}+\dots \] By similar reasoning, \(\lim_{y\to 0}\frac{\sin y}y=1\). Hence \(\sin'(x)=\cos(x)\).

Since \(\sin,\cos\) are differentiable they are continuous.

If \(0 < x < 2\), we have \[ \sin(x)=\left(x-\frac{x^3}{3!}\right)+\left(\frac{x^5}{5!}-\frac{x^7}{7!}\right)+\dots>0. \] Hence for \(0< x < 2\) we have \((\cos x)'=-\sin x< 0\), so \(\cos\) is strictly decreasing on \((0,2)\). Note that \[ \cos(x)=\left(1-\frac{x^2}{2!}\right)+\left(\frac{x^4}{4!}-\frac{x^6}{6!}\right)+\dots, \] hence \(\cos(\sqrt2)>0\). We rearrange \(\cos(x)\) as follows: \[ \cos x=1-\frac{x^2}{2!}+\frac{x^4}{4!}-\left(\frac{x^6}{6!}-\frac{x^8}{8!}\right)-\dots \] This shows that \(\cos(\sqrt3)<0\). Together with the intermediate value theorem, this proves that some number, which we call \(\frac\pi 2\), satisfies \(\cos(\frac\pi 2)=0\); since \(\sin^2+\cos^2=1\) and \(\sin>0\) on \((0,2)\), we get \(\sin(\frac\pi 2)=1\).
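The argument locates \(\frac\pi2\) between \(\sqrt2\) and \(\sqrt3\); bisection on the sign change recovers it numerically (using Python's math.cos for convenience):

```python
import math

# cos is positive at sqrt(2), negative at sqrt(3), and decreasing between
# them, so bisection on the sign change converges to pi/2.
lo, hi = math.sqrt(2), math.sqrt(3)
for _ in range(60):
    mid = (lo + hi) / 2
    if math.cos(mid) > 0:
        lo = mid  # the zero lies to the right
    else:
        hi = mid  # the zero lies to the left
print(lo, math.pi / 2)  # both approximately 1.5707963...
```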

\[ \begin{aligned} \sin\left(x+\frac\pi2\right)&=\cos x, & \cos\left(x+\frac\pi2\right)&=-\sin x,\\ \sin(x+\pi)&=-\sin x, & \cos(x+\pi)&=-\cos x,\\ \sin(x+2\pi)&=\sin x, & \cos(x+2\pi)&=\cos x \end{aligned} \]

Since \(\sin(x+2\pi)=\sin(x)\) and \(\cos(x+2\pi)=\cos(x)\), we say \(\sin\) and \(\cos\) have period \(2\pi\).

Let us consider the unit disk in \(\mathbb R^2\), defined by \(\{(x,y)|x^2+y^2\le1\}\). The area of its upper half can be calculated as

\[ \int_{-1}^1\sqrt{1-x^2}\mathrm dx=\int_0^\pi\sin^2\theta\mathrm d\theta=\frac12\int_0^\pi(1-\cos2\theta)\mathrm d\theta=\frac12\pi. \] Here we used the change of variables \(x=\cos\theta\). This shows that our definition of \(\pi\) does not contradict the usual formula for the area of a circle.

We are going to briefly overview the gamma function, which appears in many areas of analysis. This section is not intended to be fully rigorous; readers are welcome to fill in the details.

Gamma function \(\Gamma\)

For \(0< x <\infty\), we define \[ \Gamma(x)=\int_0^\infty t^{x-1}e^{-t}\mathrm dt \] Check that the integral converges for all such \(x\), i.e. \[ \lim_{y\to\infty}\int_0^y t^{x-1}e^{-t}\mathrm dt \] exists. (For \(0<x<1\) one also needs convergence at \(t=0\), which holds since \(\int_0^1 t^{x-1}\mathrm dt<\infty\).)
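A rough numerical check of the definition (truncating the integral at a made-up cutoff \(T\) and comparing with Python's math.gamma):

```python
import math

def gamma_quad(x, T=60.0, n=200_000):
    # midpoint rule for the truncated integral; the midpoints also avoid
    # evaluating the integrand at t = 0, where it blows up for x < 1
    h = T / n
    return h * sum(((i + 0.5) * h) ** (x - 1) * math.exp(-(i + 0.5) * h)
                   for i in range(n))

for x in [0.5, 1.0, 2.5, 5.0]:
    # agreement is coarser near x = 0.5 because of the singularity at t = 0
    print(x, gamma_quad(x), math.gamma(x))
```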

  • For \(0< x <\infty\), \(\Gamma(x+1)=x\Gamma(x)\);
  • \(\Gamma(n+1)=n!\) for \(n=1,2,3,\dots\);
  • \(\log\Gamma\) is convex on \((0,\infty)\), where \(f:E\to\mathbb R\) is said to be a convex function if \(E\) is a convex set and for all \(x_1,x_2\in E\) and \(t\in[0,1]\) we have \(f(tx_1+(1-t)x_2)\le tf(x_1)+(1-t)f(x_2)\)
  • Integration by parts.
  • \(\Gamma(1)=1\); then use the first part and induction.
  • This requires Hölder's inequality,4 omitted.

Surprisingly, these three properties determine \(\Gamma\) completely:

If \(f\) is a positive function on \((0,\infty)\) such that

  • \(f(x+1)=x f(x)\)
  • \(f(1)=1\)
  • \(\log f\) is convex

then \(f\) is the Gamma function.

Since \(\Gamma\) satisfies all three properties, we just need to prove that these three properties determine a function uniquely. By the first property, we only need to check everything on \((0,1)\); the values for all \(x\ge 1\) are then determined using \(f(x+1)=xf(x)\).

Put \(\phi=\log f\). Then \(\phi(x+1)=\log x+\phi(x)\) by the first property and \(\phi(1)=0\) by the second. In particular, \(\phi(n+1)=\phi(n)+\log(n)=\phi(n-1)+\log(n-1)+\log(n)=\dots=\log(n!)\). Now suppose \(0 < x < 1\) and \(n\) is a positive integer. By convexity, comparing difference quotients at \(n+1\): \[ \log n\le\frac{\phi(n+1+x)-\phi(n+1)}{x}\le\log(n+1). \] But \(\phi(n+1+x)=\phi(x)+\log[x(x+1)\dots(x+n)]\). Thus \[ 0\le\phi(x)-\log\left[\frac{n!n^x}{x(x+1)\cdots(x+n)} \right]\le x\log\left(1+\frac1n\right). \] As \(n\to\infty\), \(x\log\left(1+\frac1n\right)\to 0\). Hence \(\phi(x)\) is uniquely determined, and so is \(f\). Then \(f\) must be \(\Gamma\).

As a by-product, we found that \[ \Gamma(x)=\lim_{n\to\infty}\frac{n!n^x}{x(x+1)\cdots(x+n)} \] for \(0 < x < 1\); using \(\Gamma(x+1)=x\Gamma(x)\), one deduces that this holds for all \(x>0\).
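This product formula is easy to test numerically (convergence is slow, so only a few digits will match; the sample points are arbitrary):

```python
import math

def gamma_limit(x, n=100_000):
    # n! n^x / (x(x+1)...(x+n)), computed in log space to avoid overflow
    log_val = math.lgamma(n + 1) + x * math.log(n)
    log_val -= sum(math.log(x + k) for k in range(n + 1))
    return math.exp(log_val)

for x in [0.3, 0.5, 1.7]:
    print(x, gamma_limit(x), math.gamma(x))
```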

We define the beta function \(B(x,y)\), for \(x,y>0\), to be \[ \int_0^1t^{x-1}(1-t)^{y-1}\mathrm dt \]

\[ B(x,y)=\frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)} \]

The idea is to define \[ f(x)=\frac{\Gamma(x+y)}{\Gamma(y)}B(x,y) \] and prove that \(f\) has the three properties, hence \(f(x)=\Gamma(x)\).

Let us not worry about it too much. See Rudin and others (1964) 8.20 for details.

If we substitute \(t=\sin^2\theta\) we have: \[ 2\int_0^{\frac\pi2}(\sin\theta)^{2x-1}(\cos\theta)^{2y-1}\mathrm d\theta=\frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)} \] We put \(x=y=\frac12\), then \[ 2\int_0^{\frac\pi2}1\mathrm d\theta=\pi=\frac{\Gamma(\frac12)\Gamma(\frac12)}{\Gamma(1)}=\Gamma\left(\frac12\right)^2, \] hence \(\Gamma\left(\frac12\right)=\sqrt{\pi}\).

Summary

In this lecture we defined \(\exp,\log,\sin,\cos,\Gamma\). When dealing with \(\sin\) and \(\cos\), one can avoid some of the arithmetic tedium by going into the complex plane, but since we did not develop the complex plane, we avoided that approach.

  • Definition of \(\exp\);
  • \(\log\) as inverse of \(\exp\);
  • \(\sin\) and \(\cos\) as power series;
  • crash course of \(\Gamma\) function.

If anyone is interested in \(\Gamma\), Artin (2015) is a good place to start.

References

Artin, Emil. 2015. The Gamma Function. Courier Dover Publications.

Halmos, Paul R. 2017. Naive Set Theory. Courier Dover Publications.

Hamilton, Alan G. 1988. Logic for Mathematicians. Cambridge University Press.

Jech, Thomas J. 2008. The Axiom of Choice. Courier Corporation.

Rudin, Walter, and others. 1964. Principles of Mathematical Analysis. Vol. 3. McGraw-hill New York.

Stewart, Ian, and David Tall. 2015. The Foundations of Mathematics. OUP Oxford.

Van Dalen, Dirk. 2004. Logic and Structure. Springer.

Zorich, Vladimir A. 2016. Mathematical Analysis I. Springer.


  1. see for example https://plato.stanford.edu/entries/conditionals/ and relevant pages for a start.

  2. If you don't believe this, try to find black snow together with grass that is not red.

  3. We need so called transitive sets (sets without \(\in\)-holes). Interested reader may find Jech (2008) helpful.

  4. For example, see https://en.wikipedia.org/wiki/H%C3%B6lder%27s_inequality and Rudin and others (1964) 8.18(c)