
Thursday, March 10, 2011

Counting the number of characters in TeX (part 2)

In a previous post, I explained how to get a control sequence \char to be \let to a character in a "TeX string". I will now show how to iterate over a whole string such as Hello world!, for instance to count its characters.

The idea is to put a marker at the end of the string and to have a macro that keeps eating characters until the marker is found. I will use the macro \nil (which does not even need to be defined) as a marker. As an example, look at the following code:
\let\char=a
\ifx\char\nil yes\else no\fi.
\let\char=\nil
\ifx\char\nil yes\else no\fi.
This produces "no. yes.", which is exactly what we need.

Now the problem is that TeX processes tokens one at a time, as they come, so we need a macro that behaves like this: "if the next token is not \nil, execute me again, i.e., put me back in front of the remaining tokens."
The magic TeX construct that we need is called \afterassignment. It saves the next token and inserts it back in the stream after the next assignment, which will be a \let in our case. As an example, consider the following code (with \gobblechar defined as \def\gobblechar{\let\char= }, as in the previous post):
\def\showchar{Char is [\char].}
\afterassignment\showchar\gobblechar Hello world!
It produces "Char is [H].ello world!" since \showchar is put just after `H' has been assigned to \char.
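
\afterassignment is not tied to \let: it waits for whatever assignment comes next. As a small sketch in plain TeX (the names \demo and \report are hypothetical, chosen only for this illustration), here it is with a counter assignment:

```tex
% Plain TeX sketch; \demo and \report are made-up names for illustration.
\newcount\demo
\def\report{[value is \the\demo]}
% \report is saved, then put back right after the assignment \demo=42.
% The space after 42 ends the number and is absorbed by TeX.
\afterassignment\report \demo=42 pt
\bye
```

This should typeset "[value is 42]pt": \report is inserted as soon as the counter has been assigned, before the remaining "pt" is read.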

I will now use \ifx and \afterassignment in two macros that call each other. The first one handles the assignment of one token (character) to \char and calls the second; the second checks whether the token is \nil and, if not, calls the first.

\def\assignthencheck{\afterassignment\checknil\gobblechar}

\def\checknil{%
  \ifx\char\nil%
     STOP%
     \let\next=\relax%
  \else%
     (\char)\let\next=\assignthencheck%
  \fi%
  \next%
}

\assignthencheck Hello world!\nil

This produces "(H)(e)(l)(l)(o)( )(w)(o)(r)(l)(d)(!)STOP"

The \assignthencheck macro should be clear after the explanation above. So, after \char has been \let to the first character `H', \checknil is called. This macro assigns \next depending on what \char is: since it is not \nil, I ask TeX to print \char between parentheses (to see what happens), then \next is \let to \assignthencheck. At the end of the macro, \next is inserted back in the stream of tokens, i.e., before ello world!\nil. Hence, all characters are processed and printed in parentheses until \nil is found. In that case there is nothing left to do, so I print "STOP" and \let \next to \relax, which means "do nothing", to ensure that the \next at the end of the macro does not do anything.
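
As a side note, the same loop can be written without the \next helper. The sketch below is a variant, not the code used in the rest of this post: it relies on \expandafter to close the conditional before \assignthencheck runs, so that the macro reads the character that follows \fi.

```tex
% Plain TeX sketch of a variant without \next.
\def\gobblechar{\let\char= }
\def\assignthencheck{\afterassignment\checknil\gobblechar}
\def\checknil{%
  \ifx\char\nil
    STOP%
  \else
    % \expandafter expands \fi first, so the conditional is closed
    % before \assignthencheck grabs the next character.
    (\char)\expandafter\assignthencheck
  \fi
}
\assignthencheck Hello world!\nil
\bye
```

It should produce the same "(H)(e)(l)(l)(o)( )(w)(o)(r)(l)(d)(!)STOP" as the version with \next.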

It is now easy to combine everything to count the number of characters, using a TeX counter. For completeness, here is the whole example from the beginning.

\def\gobblechar{\let\char= }
\newcount\charcount
\def\countunlessnil{%
  \ifx\char\nil \let\next=\relax%
  \else%
    \let\next=\auxcountchar%
    \advance\charcount by 1\relax%
  \fi\next
}%

\def\auxcountchar{%
  \afterassignment\countunlessnil\gobblechar%
}
\def\countchar#1{\edef\xx{#1}\charcount=0 \expandafter\auxcountchar\xx\nil}

\def\shownumchar#1{%
  \countchar{#1}%
  There are \the\charcount\ characters in [#1].%
}

\shownumchar{Hello world!}

\shownumchar{ Hello world!}

\def\text{Hello world!}
\def\atext{ \text\ }
\shownumchar{\atext}

This produces
There are 12 characters in [Hello world!].
There are 13 characters in [ Hello world!].
There are 14 characters in [ Hello world! ].


The \shownumchar macro is easy: it just calls \countchar, then uses the TeX counter \charcount to display the number of characters in its argument.

The \countchar macro uses another trick that allows it to be called on macros that themselves call macros (last example, with \atext): the \edef expands its argument as much as possible before assigning it to \xx, then the \expandafter ensures \xx is expanded before \auxcountchar (otherwise, the number of characters would be 1, since we are in fact counting tokens and an unexpanded macro is a single token).
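
To see the effect of the \edef and \expandafter, here is a sketch that compares \countchar with a variant that skips the expansion. The name \countnoexpand is hypothetical and only serves this comparison; the other macros are those defined above, with \relax ending the number after \advance.

```tex
% Plain TeX sketch; \countnoexpand is a made-up name for the comparison.
\def\gobblechar{\let\char= }
\newcount\charcount
\def\countunlessnil{%
  \ifx\char\nil \let\next=\relax
  \else
    \let\next=\auxcountchar
    \advance\charcount by 1\relax
  \fi\next
}
\def\auxcountchar{\afterassignment\countunlessnil\gobblechar}
% Version from the post: expand the argument fully, then iterate.
\def\countchar#1{\edef\xx{#1}\charcount=0 \expandafter\auxcountchar\xx\nil}
% Variant without expansion: iterates over the raw tokens of #1.
\def\countnoexpand#1{\charcount=0 \auxcountchar#1\nil}

\def\text{Hello world!}
\countnoexpand{\text}Unexpanded: \the\charcount.
\countchar{\text}Expanded: \the\charcount.
\bye
```

The first line should report 1, since \text is a single token, and the second 12.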


Final note: for completeness, I must mention that LaTeX has a \@tfor macro that iterates over a list of tokens. However, it does not handle spaces: they are skipped unless they are explicit, like "\ ".
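
As a rough sketch of what that looks like in LaTeX (the wrapper \counttokens and the loop variable \tok are hypothetical names introduced here), one could count tokens with \@tfor as follows:

```latex
\documentclass{article}
\newcount\charcount
\makeatletter
% \counttokens is a made-up helper: it runs the loop body once per token.
\newcommand\counttokens[1]{%
  \charcount=0
  \@tfor\tok:=#1\do{\advance\charcount by 1\relax}%
}
\makeatother
\begin{document}
\counttokens{Hello world!}Count: \the\charcount.
\end{document}
```

Because the space between the two words is skipped, this should report 11 rather than 12.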

Counting the number of characters in TeX

I was recently asked whether it is possible to know if there is only one character in a TeX "string," so that the style could be changed accordingly. The reason was that the text usually did not fit in a box of fixed size, so the font had to be smaller, unless there was only one character.
This is a good example to introduce the notions of scanning in TeX and the usage of \let. We want to create a macro \countchar so that for instance \countchar{Hello world!} would set a TeX counter to 12.
In this first part, I will explain how to catch one character into a control sequence \char, using a macro \gobblechar. In a second post, I describe how to use this macro to count the number of characters.

Gobbling one character using \def


First of all, we need to be able to look at one character at a time. Let us see the different options. One possibility is to define a macro with one argument, then use the macro without {}:
\def\gobblechar#1{\def\char{#1}}
\gobblechar Hello world!. Char is [\char].

This produces the following output: "ello world!. Char is [H]." This is the expected result: without curly brackets around the "Hello world!" sentence, the argument of the macro becomes the first non-blank token, i.e., H; it is saved in the \char control sequence and the remaining "ello world!" is printed as usual.
    Let us try now with a control sequence as argument:
    \def\text{Hello world!}
    \gobblechar \text. Char is [\char].

    This produces ". Char is [Hello world!]." because now TeX takes the macro \text as a single token, feeding it as argument to \gobblechar. To prevent this behavior, we need \text to be expanded before \gobblechar. We achieve this with the following code:
    \expandafter\gobblechar \text. Char is [\char].
    This again produces "ello world!. Char is [H]." There is still one problem with this macro: it is not possible to catch spaces that way:
    \def\text{ Hello world!}
    
    \expandafter\gobblechar \text. Char is [\char].
    This produces "ello world!. Char is [H]." and the space before the `H' is forgotten. As I stated above, the macro will look for the first non-blank token. This is where we leave the \def solution and go with the \let construct.

    Gobbling one character using \let

    \let is a powerful TeX construct that many people do not know. While \def is used to build a control sequence (or macro) that will expand to something else, \let creates an alias to something else. Compare for instance:
    \def\deftext{\text}
    \let\lettext=\text
    \ifx\deftext\text yes\else no\fi.
    \ifx\lettext\text yes\else no\fi.
    
    This produces "no. yes." The \ifx construct can tell if two macros are "the same" (it is in fact a bit more complicated; I may write about it in the future). Using \let, it is for instance possible to swap the meaning of \a and \b by using:
    \let\tmp=\a \let\a=\b \let\b=\tmp
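
To try the swap, here is a small plain TeX sketch. The definitions of \a and \b are placeholders chosen only for this illustration; the swap works because each \let copies the meaning the control sequence has at that moment.

```tex
% Plain TeX sketch; the contents of \a and \b are placeholders.
\def\a{first}
\def\b{second}
Before: \a, \b.
% \tmp keeps the old meaning of \a while \a and \b are reassigned.
\let\tmp=\a \let\a=\b \let\b=\tmp
After: \a, \b.
\bye
```

This should typeset "Before: first, second. After: second, first."
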
    And it is possible to \let to a character, for instance,
    \let\a=a
    This is \a\ sentence th\a t h\a s some \a's.
    
    produces "This is a sentence that has some a's." So, let us now redefine our \gobblechar as follows:
    \def\gobblechar{\let\char=}
    \expandafter\gobblechar \text. Char is [\char].
    
    This again produces "ello world!. Char is [H]." Indeed, TeX expands the beginning to \let\char= Hello world!, and the definition of \let states that there can be one optional space between the equals sign and the token. The leading space of \text is taken as that optional space, so \char is \let to `H'. Let us modify the macro to:
    \def\gobblechar{\let\char= }
    \expandafter\gobblechar \text. Char is [\char].
    
    This now produces "Hello world!. Char is [ ]." with the leading space correctly caught in \char. Note that simply writing
    \let\char=  Hello world!. Char is [\char].
    
    would not work, since TeX would directly convert the two spaces between `=' and `H' to a single space, and then \let \char to `H'.