The idea is to put a marker at the end of the string and to have a macro that keeps eating characters until the marker is found. I will use the macro \nil (which does not even need to be defined) as the marker. As an example, look at the following code:
\let\char=a \ifx\char\nil yes\else no\fi.
\let\char=\nil \ifx\char\nil yes\else no\fi.
This produces "no. yes.", which is exactly what we need.
Now the problem is that TeX processes tokens one at a time, so we need a macro that behaves like this: "if the next token is not \nil, execute me again, i.e., put me back in front of the remaining tokens".
The magic TeX construct that we need is called \afterassignment. It saves the next token and inserts it back into the stream after the next assignment, which will be a \let in our case. As an example, consider the following code:
\def\showchar{Char is [\char].}
\afterassignment\showchar\gobblechar Hello world!
It produces "Char is [H].ello world!" since \showchar is put just after `H' has been assigned to \char.
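\afterassignment is not tied to \let: it fires after any assignment. Here is a minimal plain TeX sketch with a counter assignment instead; \democount and \report are names made up for this illustration only.

```latex
% Plain TeX sketch; \democount and \report are hypothetical names.
\newcount\democount
\def\report{[democount is now \the\democount]}
% \report is saved, then put back in the stream once the assignment
% that follows (\democount=42) is complete.
\afterassignment\report \democount=42
\bye
```

This should typeset "[democount is now 42]": \report only runs once the counter already holds its new value.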
I will now use \ifx and \afterassignment in two macros that call each other. The first one handles the assignment of one token (character) to \char and calls the second; the second checks whether the token is \nil and, if not, calls the first.
\def\assignthencheck{\afterassignment\checknil\gobblechar}
\def\checknil{%
  \ifx\char\nil%
    STOP%
    \let\next=\relax%
  \else%
    (\char)\let\next=\assignthencheck%
  \fi%
  \next%
}
\assignthencheck Hello world!\nil
This produces "(H)(e)(l)(l)(o)( )(w)(o)(r)(l)(d)(!)STOP"
The \assignthencheck macro should be clear after the explanation above. So, after \char has been \let to the first character `H', \checknil is called. This macro sets \next depending on what \char is: since it is not \nil, I ask TeX to print \char between parentheses (to see what happens), then \next is \let to \assignthencheck. At the end of the macro, \next is inserted back into the stream of tokens, i.e., before "ello world!\nil". Hence, all characters are processed and printed in parentheses until \nil is found. In that case there is nothing left to do, so I print "STOP" and \let \next to \relax, which means "do nothing", to ensure that the \next at the end of the macro does not do anything.
It is now easy to combine everything to count the number of characters, using a TeX counter. I put here the complete example from the beginning for completeness.
\def\gobblechar{\let\char= }
\newcount\charcount
\def\countunlessnil{%
  \ifx\char\nil \let\next=\relax%
  \else%
    \let\next=\auxcountchar%
    \advance\charcount by 1%
  \fi\next
}%
\def\auxcountchar{%
  \afterassignment\countunlessnil\gobblechar%
}
\def\countchar#1{\edef\xx{#1}\charcount=0 \expandafter\auxcountchar\xx\nil}
\def\shownumchar#1{%
  \countchar{#1}%
  There are \the\charcount\ characters in [#1].%
}
\shownumchar{Hello world!}
\shownumchar{ Hello world!}
\def\text{Hello world!}
\def\atext{ \text\ }
\shownumchar{\atext}
This produces
There are 12 characters in [Hello world!].
There are 13 characters in [ Hello world!].
There are 14 characters in [ Hello world! ].
The \shownumchar macro is simple: it just calls \countchar, then uses the TeX counter \charcount to display the number of characters in its argument.
The \countchar macro uses another trick that allows it to be called on macros that themselves call macros (last example, with \atext): the \edef expands its argument as much as possible before assigning it to \xx, then the \expandafter ensures \xx is expanded before \auxcountchar (otherwise, the number of characters would be 1, since we are in fact counting tokens and an unexpanded macro is a single token).
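To make the role of \expandafter concrete, here is a small plain TeX sketch that runs the same loop on a macro with and without expanding it first. The loop macros are repeated in compact form so the snippet stands alone; \sample is a hypothetical macro introduced only for this test.

```latex
% Plain TeX sketch; \sample is a hypothetical macro used for illustration.
\def\gobblechar{\let\char= }
\newcount\charcount
\def\countunlessnil{%
  \ifx\char\nil \let\next=\relax
  \else \let\next=\auxcountchar \advance\charcount by 1
  \fi\next}
\def\auxcountchar{\afterassignment\countunlessnil\gobblechar}

\def\sample{Hello}

% No \expandafter: \sample is still a single token when the loop sees it.
\charcount=0 \auxcountchar\sample\nil
Unexpanded: \the\charcount.

% With \expandafter: \sample is replaced by H, e, l, l, o before the loop starts.
\charcount=0 \expandafter\auxcountchar\sample\nil
Expanded: \the\charcount.
\bye
```

The first count should come out as 1 and the second as 5, which is the difference between counting the macro token and counting its expansion.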
Final note: for completeness, I must mention that LaTeX has a \@tfor macro that iterates over a list of tokens. However, it does not handle spaces: they are skipped unless they are explicit, like "\ ".
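For comparison, a minimal LaTeX sketch of \@tfor on the same string; \x is a hypothetical loop variable, and \makeatletter is needed because the macro name contains @.

```latex
% LaTeX sketch; \x is a hypothetical loop variable.
\documentclass{article}
\begin{document}
\makeatletter
% Each token of the list is stored in \x in turn, then the body after \do runs.
\@tfor\x:=Hello world!\do{(\x)}
\makeatother
\end{document}
```

Each letter should appear in its own pair of parentheses, but no "( )" is printed for the space between the two words, unlike with the \afterassignment loop.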
Thanks a lot!
I have used the example to create my own parser for strings like "c|c|l|r", where I ignore the character '|'. A check like \ifx\char| works perfectly.
But I would also like to handle cases where there are characters like '{', '}', '>'.
How would one deal with such cases?
Thank you!
Hi Vasil,
Indeed, '{' and '}' are special characters. However, they are not special "by nature", but because they have a different category code (catcode) than other "regular" characters. You can change these codes by using for instance:
\catcode`{=12
\catcode`}=12
I made a post on this subject some time ago:
http://tex-and-stuff.blogspot.fr/2011/02/category-codes-in-tex-and-hash-sign.html
Cheers
Thank you very much! Was very helpful!
Thanks a lot for this very useful page.
I can't make your \countchar macro work if there's another instruction in it (e.g. \emph); I get an error:
Use of \@rule doesn't match its definition
Is there any way to work around this?
Hi Benjamin,
Sorry, I have only now found the time to look at your problem.
What I explained here is for plain TeX, not LaTeX. I just checked, and it works fine in TeX with {\it ...} for instance. It also "works" in LaTeX, but the number of characters is strange; I suspect LaTeX adds a lot more stuff in macros. I don't know how to make it work with complicated macros such as \emph (yes, it is a complicated macro, just look at how it is defined in texinfo.tex).
This is a great set of macros, very useful and quick. Do you mind if I use it in a package I'm working on? Of course, you'd be given credit for it.
Hi Donald,
Absolutely not, feel free to use it in whichever package you want. I'll be interested to know what it is about when it's finished :-)
Very helpful, so many thanks!