[summary: The Free group may be constructed formally using van der Waerden's trick, which is not intuitive at all but leads to a definition that is very easy to work with. This page will detail van der Waerden's construction, and will prove that the trick yields a group which has all the right properties to be the free group.]
The Free group may be constructed formally using van der Waerden's trick, which is not intuitive at all but leads to a definition that is very easy to work with. This page will detail van der Waerden's construction, and will prove that the trick yields a group which has all the right properties to be the free group.
The construction
Write $~$X^r$~$ for the set that contains all the freely reduced words over $~$X \cup X^{-1}$~$ (so, for instance, excluding the word $~$aa^{-1}$~$). %%note:We use the superscript $~$r$~$ to denote "reduced".%%
We define the free group $~$F(X)$~$, or $~$FX$~$, on the set $~$X$~$ to be a certain Subgroup of the Symmetric group $~$\mathrm{Sym}(X^r)$~$: namely that which is generated by the following elements, one for each $~$x \in X \cup X^{-1}$~$:
- $~$\rho_x : \mathrm{Sym}(X^r) \to \mathrm{Sym}(X^r)$~$, sending $~$a_1 a_2 \dots a_n \mapsto a_1 a_2 \dots a_n x$~$ if $~$a_n \not = x^{-1}$~$, and $~$a_1 a_2 \dots a_{n-1} x^{-1} \mapsto a_1 a_2 \dots a_{n-1}$~$.
- $~$\rho_{x^{-1}} : \mathrm{Sym}(X^r) \to \mathrm{Sym}(X^r)$~$, sending $~$a_1 a_2 \dots a_n \mapsto a_1 a_2 \dots a_n x^{-1}$~$ if $~$a_n \not = x$~$, and $~$a_1 a_2 \dots a_{n-1} x \mapsto a_1 a_2 \dots a_{n-1}$~$.
Recall that each $~$\rho_x$~$ lies in $~$\mathrm{Sym}(X^r)$~$, so each is a Bijective function from $~$X^r$~$ to $~$X^r$~$. We specify it by stating what it does to every element of $~$X^r$~$ (that is, to every freely reduced word over $~$X$~$).
We first specify what it does to those words which don't end in $~$x^{-1}$~$: $~$\rho_x$~$ simply appends an $~$x$~$ to such words. We then specify it to the remaining words, those which do end in $~$x^{-1}$~$: then $~$\rho_x$~$ just removes the $~$x^{-1}$~$.
It's easy to check that if $~$\rho_x$~$ is given a freely-reduced word as input, then it produces a freely-reduced word as output, because the only change to the word is at the end and we make sure to provide a separate definition if $~$x$~$ is to be cancelled. Therefore each $~$\rho_x$~$ is a function $~$X^r \to X^r$~$.
Then we do it all again for all the inverses $~$x^{-1}$~$, creating the functions $~$\rho_{x^{-1}}$~$; and finally, we add in the Identity element, denoted $~$\rho_{\varepsilon}$~$, which simply returns its input unchanged.
Notice that the $~$\rho_x$~$ and $~$\rho_{x^{-1}}$~$ are all indeed bijective (and therefore members of $~$\mathrm{Sym}(X^r)$~$), because in fact $~$\rho_x$~$ and $~$\rho_{x^{-1}}$~$ are inverse to each other (each cancelling off what the other did), and a function with an inverse is bijective.
So, we've defined the free group as a certain subgroup of the symmetric group. Remember that the subgroup has as its group operation "function composition"; so $~$\rho_x \cdot \rho_y = \rho_x \circ \rho_y$~$, for instance. We will write $~$\rho_x \rho_y$~$ for this, omitting the group operation.
Something key to notice is that if we apply $~$\rho_{a_n} \rho_{a_{n-1}} \dots \rho_{a_1}$~$ to the empty word $~$\varepsilon$~$, we get $$~$\rho_{a_n} \rho_{a_{n-1}} \dots \rho_{a_1}(\varepsilon) = \rho_{a_n} \rho_{a_{n-1}} \dots \rho_{a_3}(\rho_{a_2}(a_1)) = \rho_{a_n a_{n-1} \dots a_3}(a_1 a_2) = \dots = a_1 a_2 \dots a_n$~$$ if $~$a_1 a_2 \dots a_n$~$ is a freely reduced word. (Indeed, if the word is freely reduced then none of the successive $~$\rho_{a_i}, \rho_{a_{i+1}}$~$ can have cancelled each other's effect out, so every application of a $~$\rho_{a_i}$~$ must be appending a letter.) Hence we might hope to have captured the freely reduced words in our subgroup.
The formal definition is the same as the intuitive definition
We'll show that there is a bijection between the free group and the set of reduced words, by "converting" each reduced word into a corresponding member of the free group.
Take a reduced word, $~$w = a_1 a_2 \dots a_n$~$, and produce the member of the free group (that is, the function) $~$\rho_{a_1} \rho_{a_2} \dots \rho_{a_n}$~$. %%note:Recall that the group operation here is composition of functions, so this is actually the function $~$\rho_{a_1} \circ \rho_{a_2} \circ \dots \circ \rho_{a_n}$~$.%% This really does produce a member of the free group (i.e. of the subgroup of the symmetry group), because each $~$a_i$~$ is an element of $~$X \cup X^{-1}$~$ and we have already specified how to make $~$\rho_{a_i}$~$ from such an element.
Now, we claim that in fact this map is injective: that is, we can't take two words $~$a_1 a_2 \dots a_n$~$ and $~$b_1 b_2 \dots b_m$~$ and produce the same member of the free group. (That is, we show that $~$\rho_{a_1} \rho_{a_2} \dots \rho_{a_n} = \rho_{b_1} \rho_{b_2} \dots \rho_{b_m}$~$ implies $~$a_1 \dots a_n = b_1 \dots b_m$~$.) Indeed, if the two functions ("elements of the free group") are equal, then they must in particular do the same thing when they are applied to the empty word $~$\varepsilon$~$. But by the "key notice" above, when we evaluate $~$\rho_{a_1} \rho_{a_2} \dots \rho_{a_n}$~$ at the empty word, we get $~$a_n a_{n-1} \dots a_2 a_1$~$; and when we evaluate $~$\rho_{b_1} \rho_{b_2} \dots \rho_{b_m}$~$ at the empty word, we get $~$b_m b_{m-1} \dots b_2 b_1$~$; so the two words must be equal after all. [todo: we've got this backwards]
Finally, the map is surjective: we can make any member of the free group by "converting the appropriate reduced word into a function". Indeed, the free group is generated by the $~$\rho_x$~$ and $~$\rho_{x^{-1}}$~$ for $~$x \in X$~$, so every element is some $~$\rho_{x_1} \dots \rho_{x_n}$~$ for some selection of $~$x_1, \dots, x_n \in X \cup X^{-1}$~$. Note that $~$x_1 \dots x_n$~$ need not necessarily be freely reduced as a word at the moment; but if it is indeed not freely reduced, so some $~$x_i, x_{i+1}$~$ cancel each other out, then removing that pair completely doesn't change the function $~$\rho_{x_1} \dots \rho_{x_n}$~$. For example, $~$\rho_{x_1} \rho_{x_1^{-1}} \rho_{x_2} = \rho_{x_2}$~$. Hence the process of "performing one step of a free reduction" (i.e. removing a cancelling pair) doesn't change the member of the free group as a function; and since each such removal makes the word shorter, it must eventually terminate. It remains to show that it doesn't matter in what order we remove the cancelling pairs; but that is immediate because we've already shown that our "conversion" process is injective: we started with a member of the free group, so if it corresponds to a freely reduced word then it corresponds to a unique freely reduced word. Since we've just shown that it does indeed correspond to a freely reduced word (by repeatedly removing cancelling pairs), we are done.
The above shows that the free group can be considered just to be the set of reduced words.