Friday, October 7, 2011

TeX capacity exceeded, sorry!



Every once in a while, someone in my company who tries to generate our documentation gets the following error:

TeX capacity exceeded, sorry [main memory size=1000000].

This message is even accompanied (in the log file) by the following message:

If you really absolutely need more capacity, you can ask a wizard to enlarge me.

Since I am the de facto "wizard" around here, people turn to me, and every time we struggle to fix this problem, since no two of us have the same Linux installation.

So I decided to make a note on the main ways to solve this problem.
  • Often, this means there is a problem in the LaTeX code, such as an infinite loop (if you are a guru and use \loop commands), but more often there is a forgotten group somewhere.
  • In our case, we really have a memory problem, as the document we generate has 600+ pages; hence we need to increase the main memory size:
    1. Locate texmf.cnf and edit it (as root). (On some distributions, you may prefer to edit the files in /etc/texmf/texmf.d.)
    2. Search for main_memory and change the size (for instance to 2000000).
    3. Recreate the LaTeX format files. Depending on your distribution, you may do so with update-texmf (Debian) or fmtutil-sys --all (Red Hat).
    4. If this does not work (i.e., LaTeX still complains about having a main memory of only 1000000), look in your home directory: you may have an old ~/.texmf-var directory that LaTeX uses first, so it ignores your system-wide configuration. (This mistake cost me hours on more than one occasion!)
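
To illustrate the first case (a problem in the LaTeX code rather than a genuine lack of memory), here is a minimal sketch of a runaway macro. It is not taken from our documentation, and the name \runaway is made up for the example. The macro keeps adding text to a paragraph that never ends, so the run should eventually stop with a capacity error, whatever the configured memory size:

```latex
% Minimal sketch: a macro that calls itself forever.
% \runaway is a hypothetical name used only for this illustration.
\documentclass{article}
\newcommand\runaway{x \runaway}
\begin{document}
% The paragraph never ends, so its material piles up in TeX's memory
% until the capacity is exhausted.
\runaway
\end{document}
```

If enlarging main_memory only delays the error, the cause is likely of this kind, and the code needs fixing rather than the configuration.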

Monday, April 11, 2011

Removing garbage from the pdf bookmarks

In the process of generating proceedings for the Workshop on Intermediate Representation, I explained previously how to include pdf pages and I mixed this with a technique so that the titles are aligned on the left and the authors on the right (see adjusting paragraphs).
Since I also wanted slightly different colors (red!40!black for the title, and blue!60!black for the authors), all this ended up in the bookmark section of the pdf file:

red!40!black Tirex: A Textual Target-level...Exchangeto 1em. to 0pt

And this for all included files.

I have yet to work out a clean way to get rid of this garbage, but I found that bookmarks are saved by the hyperref package in the .out file, which contains lines like the following:

\BOOKMARK [0][]{chapter*.3}{red!40!black \040Tirex: A Textual 
Target-Level Intermediate Representation for Compiler Exchangeto 
1em. to 0pt to 1em. blue!50!black \040Artur Pietrek, Florent 
Bouchez and Benoit Dupont De Dinechin}{}

So the obvious trick is to clean this .out file by hand, save it, and keep a backup copy:

\BOOKMARK [0][]{chapter*.3}{Tirex: A Textual Target-Level 
Intermediate Representation for Compiler Exchange -- Artur 
Pietrek, Florent Bouchez and Benoit Dupont De Dinechin}{}

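Then run pdflatex again, exactly once. Running it only once is important: each run reads the existing .out file to build the bookmarks and then overwrites it, so after this run the .out file contains the garbage again, and a further run would bring it back into the pdf (hence the backup).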

This produces the expected result, with clean bookmarks in my pdf. However, I would prefer a cleaner way to do this. It is probably possible with the hyperref package, but the difficulty will likely come from the interaction with pdfpages.
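
One candidate for such a cleaner way is the \texorpdfstring macro of hyperref, which takes the text to typeset as first argument and a plain-text replacement for the bookmark as second argument. The sketch below only shows it on an ordinary \chapter; whether it survives the addtotoc option of pdfpages together with my alignment macro is an assumption I have not checked. The title "Some title" is a placeholder.

```latex
% Sketch only: colored chapter title, plain-text bookmark.
\documentclass{report}
\usepackage{xcolor}
\usepackage{hyperref}
\begin{document}
% First argument: what TeX typesets; second: what goes in the bookmark.
\chapter{\texorpdfstring{\textcolor{red!40!black}{Some title}}{Some title}}
Some text.
\end{document}
```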

Thursday, March 24, 2011

Publishing conference proceedings using LaTeX part III: polishing the proceedings

This is the last part of the "creating proceedings using LaTeX" series. In the two previous posts, I explained some guidelines to give to the writers of articles, then how to include the resulting pdf files in LaTeX. I will now detail some polishing for the proceedings.

Having a table of contents


With the pdfpages package, I explained how to add lines in the TOC for every inserted pdf. You then just have to use the \tableofcontents macro.
Since the workshop will be organized into sessions, I wanted to group the articles by session in the table of contents (TOC). I created a counter (to number the sessions), and a macro to create a new session (with a title).

\newcounter{sessioncount}
\setcounter{sessioncount}{0}

\newcommand\addsession[1]{
  \stepcounter{sessioncount}
  \addtocontents{toc}{\vskip 8pt {\noindent\Large 
    {\sc Session \arabic{sessioncount}} #1}\protect\par}
}

The \addsession macro takes a title as argument and adds a line in the TOC, looking like this:
SESSION X: session title

So I inserted three of these between the inclusion of articles.

Finally, I wanted to have the workshop name & date at the top of all pages, plus page numbers at the bottom. I used the fancyhdr package which allows "fancy headers."

\usepackage{fancyhdr}
\pagestyle{fancy}      % don't forget this or it won't work!

This initializes the package, then we can make our own design using:

\fancyhead{}
\fancyfoot{}
\renewcommand\headrulewidth{0pt}
\fancyhead[L]{\it Workshop title}
\fancyhead[R]{\it Some location, Year}
\fancyfoot[C]{\thepage}

First, we remove any existing headers and footers, then remove the line below the headers (by setting its width to 0pt). Then I choose to add the workshop name at the top left, the location and date at the top right, and finally page numbers for all pages at the bottom center. These are inserted on all pages, in particular pages included from other pdfs!

Publishing conference proceedings using LaTeX part II: concatenating pdf files

This is the second post in the series "Publishing conference proceedings using LaTeX". The first one dealt with guidelines sent to authors to have coherent files. In this part, I will explain how to merge all articles into a single file.

Concatenating pdf files in LaTeX



It would be easy to do so using pdftk for instance; however, I wanted to add page numbers and some information in the "header", i.e., the first line at the top of a page. So I went for another option, the pdfpages package.

This package lets you include pages of a pdf file directly into your document. I won't go into details (you can read the documentation for that) and only show how I used it:

\includepdf[pages=1-, pagecommand={},%
  addtotoc={1,chapter,0,Title\qquad{\it Authors},paper:X}]%
  {papers/paper_X.pdf} 

This includes all pages of the file given as argument. The pagecommand option is used to pass LaTeX code that will be executed on each added page. The default is
pagecommand={\thispagestyle{empty}}
which is not what I wanted (I wanted page numbers and headers!), so I passed empty code.
The addtotoc option creates an entry in the table of contents (TOC), which is nice to have at the beginning of the proceedings. Its arguments are:
  • 1: the hyperlink will jump to page 1 of the included file.
  • chapter: tells LaTeX to typeset the line in the TOC like that of a \chapter command.
  • 0: is the "depth" in sectioning (section is 1, chapter 0, etc.).
  • "Title..." is the text displayed in the TOC.
  • "paper:X" is a label on the included pages (probably the first one), in case you want to \ref it.
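
To try these options in isolation, here is a minimal complete document. It is only a sketch: instead of a real paper it includes example-image-a, a one-page test pdf that I assume is available through the mwe package of your TeX distribution, and the title and label are placeholders.

```latex
% Sketch: include one pdf and give it a TOC entry.
% Assumes the test file example-image-a.pdf (mwe package) is installed.
\documentclass{report}
\usepackage{pdfpages}
\begin{document}
\tableofcontents
% pages=1-       : all pages of the file
% pagecommand={} : keep the current page style (headers, page numbers)
% addtotoc       : page, TOC style, depth, text, label
\includepdf[pages=1-, pagecommand={},
  addtotoc={1,chapter,0,Some title,paper:1}]{example-image-a}
\end{document}
```

As with any table of contents, two runs are needed before the entry shows up.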


To include all files more easily, I used the \foreach command of the TikZ package. I also used my trick for adjusting two paragraphs (the \alignpars macro) so that the article title is aligned on the left and the authors on the right, with "leaders" (a dotted line) linking them (although I had a problem with the \vphantom trick, which does not seem to be allowed in a TOC, and had to replace it with \hbox to 0pt{}).

So I first devised a macro to add an article:
\newcommand\addarticle[3]{
  \clearpage{\pagestyle{empty}\cleardoublepage}
  \includepdf[pages=1-, pagecommand={}, 
  addtotoc={1,chapter,1,{\alignpars{#1}{\it #2}},paper:#3}]  
     {papers/paper_#3} 
}

The first line of the macro ends the current page; then, if the next page is even-numbered, it also clears that one (so that every article starts on an odd-numbered page) while removing any headers/footers (e.g., page numbers) from this blank page. Note that \cleardoublepage only inserts the blank page in two-sided mode.

I then called the macro from inside a \foreach loop. Here is an example with only two dummy articles (I had 10 articles):

\foreach \file/\title/\authors in {
   1.pdf/
   {First article}/
   {Some Authors},
%
   2.pdf/
   {Second article}/
{Some other Authors}}
%
{\addarticle{\title}{\authors}{\file}}

In the next (and last) part, I will explain how to do some more polishing to the proceedings, with sessions in the TOC for instance.

Publishing conference proceedings using LaTeX part I: guidelines to authors

Yesterday, I had to create the proceedings for the workshop I am co-organizing (Workshop on Intermediate Representations).

My first idea was to create some introductory pages in pdf format using LaTeX and then use pdftk to assemble the files. But I also wanted all papers to look similar, and the whole proceedings to have page numbers, so I chose another way. As the whole procedure might be a bit long (and my posts are usually long enough already), I will split this "tutorial" into several posts. This one deals with the guidelines to authors, and will be followed by part II and part III.

Guidelines to authors


First of all, to maintain coherence, all authors should abide by the same rules when producing the final version of their articles. Here is the set of rules I gave them:
  • Use the standard sigplanconf template with 9pt fonts, and a maximum of 8 pages (this was the template required for submission). Since those are not copyrighted proceedings, there was no need for the copyright notice (end of the first column of the first page), so the option nocopyrightspace had to be added to the class.
  • Since the workshop is held in France, I wanted A4 paper format. This is however not an available option for sigplanconf, so I chose margins that more or less preserve the original "letter" layout:
    • A4 paper
    • 1 inch margins at top and bottom
    • .65 inch margins at left and right
    This was easily achieved with the help of the geometry package:
    \usepackage[ a4paper,
      top=1in,
      bottom=1in,
      left=.65in,
      right=.65in,
      offset={0pt,0pt}
      ]{geometry}
    

  • Some people reported problems when compiling with latex and then producing the pdf with dvipdf (the layout was right in A4 format in the .dvi, but back to letter format in the final pdf). Since I wanted pdf files anyway, I recommended compiling the LaTeX files with pdflatex.


  • It is common to balance the two columns on the last page of an article, as it looks better. The last page is usually the bibliography, and I used to do this manually by using \input{article.bbl} once the bibliography was completely done, and inserting a \newpage at the right position (which, in two-column mode, starts a new column, leaving blank space at the bottom of the current one). But I discovered a package to do this:
    \usepackage{balance}
    
    And then at the end of the article:
    \balance
    \bibliography{article}
    

  • Finally, you should make sure that there is no page number, no header, no footer, as these will be added later when compiling all articles. This is the default with the sigplanconf document class, but with another class you may have for instance to issue a \pagestyle{empty} command.

Wednesday, March 23, 2011

Adjusting end of paragraph with start of the next one

I had to compile a selection of articles to create so-called "proceedings" for a workshop. One problem was to join together multiple pdf files; I will address this problem in a later post.
The other problem was that I wanted to display the list of article title / article authors pairs in a style similar to a "table of contents", i.e., the title on the left with a line of dots (a.k.a. "leaders" in TeX) joining the names of the authors on the right.

If everything fits on one line


Here is such an example:

My dummy title . . . . . . . . . . . . . . . . . . . . . . A. Nonymous



In this case, it is very easily achieved using the following code:

\noindent
My dummy title\dotfill {\it A. Nonymous}

Note that if you want more control over the way leaders are displayed, you can use the generic macro \leaders, of which \dotfill is just a specialization:

\noindent
My dummy title\leaders\hbox to 1em{\hss.\hss}
\hfill {\it A. Nonymous}

The \leaders macro first takes a box, and then "repeats" this box over the length of the following \hskip, i.e., some horizontal "glue". In this case, \hfill is a shortcut for
\hskip 0pt plus 1fill

which means "a horizontal space of length 0 and infinite stretchability". In our case, since the title and author do not completely fill the line, the \hfill will take up all the remaining space between the two, and then the \leaders macro will fill this space with horizontal boxes of width exactly 1em (i.e., approximately the width of an "M"), each containing a centered dot ".".
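
As a small variation on the code above, the box width can be changed to space the dots differently. The following is only a sketch: the macro name \tightdotfill and the 0.5em width are arbitrary choices for the example, and I wrapped it in a LaTeX document so it can be compiled on its own.

```latex
% Sketch: leaders with dots twice as dense as in the 1em version.
% \tightdotfill is a hypothetical name; 0.5em is an arbitrary width.
\documentclass{article}
\newcommand\tightdotfill{%
  \leavevmode\leaders\hbox to 0.5em{\hss.\hss}\hfill\kern0pt}
\begin{document}
\noindent My dummy title\tightdotfill{\it A. Nonymous}
\end{document}
```

The trailing \kern0pt only makes sure the glue is not discarded if it ends up at a line break.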

When one line is not enough

In that case, we start to have some problems. Consider the following example:

Lorem ipsum dolor sit amet, consectetur.
\dotfill{\it A. Nonymous and U. Known}

This produces the following
Lorem ipsum dolor sit amet, consectetur. A. Nonymous
and U. Known
But we in fact want:
Lorem ipsum dolor sit amet, consectetur. . . . . . . .
. . . . . . . . . . . . . . . . . . A. Nonymous and U. Known
Because the line where the \dotfill appears is already full, there is no more space to "fill"!

The idea to solve this problem is to separate the title from the authors with two such fills, and to encourage TeX to break between them. Here is the solution:

\noindent
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do.\dotfill
\penalty0\hskip 0pt plus -1fill%
\vphantom{x}\dotfill{\it A. Nonymous and U. Known}\break

Let us review this code in detail:
  • The \penalty0 tells TeX that there is no penalty for breaking the line here.
  • The \hskip 0pt plus -1fill is a negative fill and has the following effect: if TeX does not break at the penalty (because there is enough space on the line), then it cancels the first \dotfill; otherwise, it does nothing (there is nothing to cancel at the beginning of a new line). This avoids a blank being inserted between the two dotfills if they fit on a single line.
  • The \vphantom{x} before the second \dotfill ensures there is something at the beginning of the second line after which to insert the dotfill. This macro inserts a "phantom" box of width 0 (with the same height as an "x", but this is a side effect).
  • Finally, we end the paragraph with \break, which makes TeX break the paragraph without adding any glue, i.e., the end of the paragraph will be stuck to the right. This is what forces the second \dotfill to take the remaining space.

To conclude, here is the final code embedded in a macro:
\def\alignpars#1#2{
\noindent \ignorespaces #1
\dotfill%
\penalty0\hskip 0pt plus -1fill\relax
\vphantom{x}\dotfill
#2\unskip\break
}

In this macro, I added \ignorespaces so that any spaces at the beginning of the first argument do not produce blanks, and \unskip after the second argument for the same reason (it removes spaces at the end so there is no blank).

You can try this macro on variable-length paragraphs and see for yourself:

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
laborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
laborum.
}

\alignpars{
Lorem ipsum dolor sit amet.
laborum.}{
Excepteur sint proident, sunt in laborum.
}

\alignpars{
Lorem ipsum dolor sit amet snatohu sntahx laborum.}{
Excepteur sint proident, sunt in laborum.}


\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat.
laborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
laborum.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat.
laborum.}{
Lorem ipsum dolor sit amet, 
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
laborum.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat.
laborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat. santoheu satoheu snthao eusnth
aoeu laborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
toto nulla pariatur. Excepteur sint occaecat. santoheu satoheu snthao eusnth
lfugiat aborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
nulla pariatur. Excepteur.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
toto nulla pariatur. Excepteur sint occaecat. santoheu satoheu snthao eusnth
sotnuh saousnth lfugiat aborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
pariatur. Excepteur.
}

Tuesday, March 22, 2011

Modifying some heading styles in LaTeX table of contents

I had several questions asked about the style of section titles in the table of contents. These concern the display or not of section numbers and/or pages for these sections.

Sectioning depth in TOC

The first easy question concerns the sectioning depth at which to stop using numbers. Usually, in the article class, sections are numbered up to the \subsubsection level (e.g., "1.1.1 My sub-sub-section"), and not numbered below that, i.e., not for paragraphs or sub-paragraphs.

Sometimes, you may not want the sub-sub-sections to be in the table of contents (TOC), or you may want the paragraphs. For this, LaTeX provides a counter, tocdepth that we can set to determine the sectioning depth at which to stop putting sections in TOC. For instance:
\setcounter{tocdepth}{4}
will display up to the paragraph, while
\setcounter{tocdepth}{1}
will only display the sections.

Additionally, you may not want any number associated with sub-sections, or you may want numbers associated with sub-paragraphs. This is controlled by the secnumdepth counter, respectively with:
\setcounter{secnumdepth}{1}
or
\setcounter{secnumdepth}{5}
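
To see how the two counters interact, here is a small self-contained sketch for the article class (the section titles are placeholders). With these values, the TOC lists sections and sub-sections, but only sections carry a number, both in the TOC and in the body:

```latex
% Sketch: tocdepth and secnumdepth set independently.
\documentclass{article}
\setcounter{tocdepth}{2}    % TOC goes down to sub-sections
\setcounter{secnumdepth}{1} % only sections are numbered
\begin{document}
\tableofcontents
\section{A section}
\subsection{A sub-section}        % in the TOC, but without a number
\subsubsection{A sub-sub-section} % not in the TOC at all
\end{document}
```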

Removing page numbers in TOC

Someone wanted more control over the table of contents. He wanted sub-sub-sections to appear but without any page number associated with them, i.e., no dotted line with the page number at the far right. This is trickier than the previous customization, because LaTeX provides no mechanism to control this. So we need to modify the core sectioning macro to achieve this. We can find this macro by looking first at the table of contents file, i.e., the one with the .toc extension.
In this file there are lines like:
\contentsline {subsection}{\numberline {1.1}Test}{1}{section.1.1}

By searching the latex.ltx source file, we find that the \contentsline macro is defined as follows:
\def\contentsline#1{\csname l@#1\endcsname}

This means that it expands to a macro whose name is constructed from its first argument. In the example above, it would become \csname l@subsection\endcsname, i.e., the control sequence \l@subsection.
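
The same \csname mechanism can be tried in isolation. In the sketch below, \dispatch and \sayhello are made-up names for the illustration: \dispatch builds a control sequence name from its argument, exactly as \contentsline does with the l@ prefix.

```latex
% Sketch: building a macro name from an argument with \csname.
% \dispatch and \sayhello are hypothetical names.
\documentclass{article}
\newcommand\sayhello{Hello!}
\newcommand\dispatch[1]{\csname say#1\endcsname}
\begin{document}
\dispatch{hello}% same as writing \sayhello
\end{document}
```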

In turn we find the definition of this macro in the article.cls class file:
\newcommand*\l@subsection{\@dottedtocline{2}{1.5em}{2.3em}}

This brings us to the final macro, \@dottedtocline. This is the macro that displays the contents of one line of the TOC when you call \tableofcontents. Here is its definition (warning: unreadable code!):
\def\@dottedtocline#1#2#3#4#5{%
  \ifnum #1>\c@tocdepth \else
    \vskip \z@ \@plus.2\p@
    {\leftskip #2\relax \rightskip \@tocrmarg \parfillskip -\rightskip
     \parindent #2\relax\@afterindenttrue
     \interlinepenalty\@M
     \leavevmode
     \@tempdima #3\relax
     \advance\leftskip \@tempdima \null\nobreak\hskip -\leftskip
     {#4}\nobreak
     \leaders\hbox{$\m@th
        \mkern \@dotsep mu\hbox{.}\mkern \@dotsep
        mu$}\hfill
     \nobreak
     \hb@xt@\@pnumwidth{\hfil\normalfont \normalcolor #5}%
     \par}%
  \fi}

This macro is pretty horrible, but fortunately we don't have to understand all of it! In fact, we are only interested in the part that displays the dotted line and the page number. This is the part where the \leaders macro is used, and the fifth parameter is the page number. Replacing this part by a simple \hfill\kern0pt removes the "leaders" and page number for all sectioning commands (starting at sub-sections in the article class, as the \l@section macro and above use different code).

To have control on the depth, we can create a new counter:
\newcounter{tocnopages}
\setcounter{tocnopages}{2} % display page number up to sub-sections

and then redefine the macros with a condition on this counter
(the \ifnum #1>\c@tocnopages... \else... \fi part):

\makeatletter
\def\@dottedtocline#1#2#3#4#5{%
  \ifnum #1>\c@tocdepth \else
    \vskip \z@ \@plus.2\p@
    {\leftskip #2\relax \rightskip \@tocrmarg \parfillskip -\rightskip
     \parindent #2\relax\@afterindenttrue
     \interlinepenalty\@M
     \leavevmode
     \@tempdima #3\relax
     \advance\leftskip \@tempdima \null\nobreak\hskip -\leftskip
     {#4}\nobreak
     \ifnum #1>\c@tocnopages \hfill \kern0pt \else
       \leaders\hbox{$\m@th
          \mkern \@dotsep mu\hbox{.}\mkern \@dotsep
          mu$}\hfill
       \nobreak
       \hb@xt@\@pnumwidth{\hfil\normalfont \normalcolor #5}%
     \fi
     \par}%
  \fi}
\makeatother


Removing numbering but only in the TOC

This question was asked on StackOverflow: how to display, for instance, "I am a subsection" in the TOC, while still having "2.1 I am a subsection" in the body of the document. I answered directly on the StackOverflow website, so go have a look at it if you want to know more. The idea is to conditionally redefine the \numberline macro (see the \contentsline above) to something empty in the \@dottedtocline macro.
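
A cruder variant of this idea, which is a simplification of mine and not the StackOverflow answer itself, is to make \numberline swallow its argument for the whole TOC. This removes the numbers from every TOC line instead of doing it conditionally:

```latex
% Sketch: no section numbers in the TOC, numbers kept in the body.
\documentclass{article}
\begin{document}
% The braces keep the redefinition local to the table of contents.
{\renewcommand\numberline[1]{}%
 \tableofcontents}
\section{I am a section}
\subsection{I am a subsection}
\end{document}
```

The body headings are untouched because \numberline is only used when the .toc file is read.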

Thursday, March 10, 2011

Counting the number of characters in TeX (part 2)

In a previous post, I explained how to get a control sequence \char to be \let to a character in a "TeX string". I will now show how to iterate over a whole string such as Hello world!, using it for instance to count the number of characters.

The idea is to put a marker at the end of the string, and to have a function that keeps eating characters until the marker is found. I will use the macro \nil (which does not even need to be defined) as a marker. As an example, look at the following code:
\let\char=a
\ifx\char\nil yes\else no\fi.
\let\char=\nil
\ifx\char\nil yes\else no\fi.
This produces "no. yes.", which is exactly what we need.

Now the problem is that TeX processes tokens on the fly, so we need a macro that behaves like this: "if the next token is not \nil, execute me again, i.e., put me again before the remaining tokens."
The magic TeX construct that we need is called \afterassignment. It saves the next token and inserts it back into the stream after the next assignment, which will be a \let in our case. As an example, consider the following code:
\def\showchar{Char is [\char].}
\afterassignment\showchar\gobblechar Hello world!
It produces "Char is [H].ello world!" since \showchar is put just after `H' has been assigned to \char.

I will now use \ifx and \afterassignment in two macros that call each other. The first one handles the assignment of one token (character) to \char and calls the second; the second checks whether the token is \nil and, if not, calls the first.

\def\assignthencheck{\afterassignment\checknil\gobblechar}

\def\checknil{%
  \ifx\char\nil%
     STOP%
     \let\next=\relax%
  \else%
     (\char)\let\next=\assignthencheck%
  \fi%
  \next%
}

\assignthencheck Hello world!\nil

This produces "(H)(e)(l)(l)(o)( )(w)(o)(r)(l)(d)(!)STOP"

The \assignthencheck macro should be clear after the explanation above. After \char has been \let to the first character `H', \checknil is called. It assigns \next depending on what \char is: since it is not \nil, I ask TeX to print \char between parentheses (to see what happens), then \next is \let to \assignthencheck. At the end of the macro, \next is inserted back into the stream of tokens, i.e., before ello world!\nil. Hence, all characters are processed and printed in parentheses until \nil is found. In that case there is nothing left to do, so I print "STOP" and \let \next to \relax, which means "do nothing", to ensure the \next at the end of the macro will not do anything.

It is now easy to combine everything to count the number of characters, using a TeX counter. I put here the complete example from the beginning for completeness.

\def\gobblechar{\let\char= }
\newcount\charcount
\def\countunlessnil{%
  \ifx\char\nil \let\next=\relax%
  \else%
    \let\next=\auxcountchar%
    \advance\charcount by 1%
  \fi\next
}%

\def\auxcountchar{%
  \afterassignment\countunlessnil\gobblechar%
}
\def\countchar#1{\edef\xx{#1}\charcount=0 \expandafter\auxcountchar\xx\nil}

\def\shownumchar#1{%
  \countchar{#1}%
  There are \the\charcount\ characters in [#1].%
}

\shownumchar{Hello world!}

\shownumchar{ Hello world!}

\def\text{Hello world!}
\def\atext{ \text\ }
\shownumchar{\atext}

This produces
There are 12 characters in [Hello world!].
There are 13 characters in [ Hello world!].
There are 14 characters in [ Hello world! ].


The \shownumchar macro is easy, it just calls \countchar then uses the TeX counter \charcount to display the number of characters in its argument.

The \countchar macro uses another trick that allows it to be called on macros that themselves call macros (last example, with \atext): the \edef expands its argument as much as possible before assigning it to \xx, then the \expandafter ensures \xx is expanded before \auxcountchar (otherwise, the number of characters would be 1, since we are in fact counting tokens and an unexpanded macro is a single token).


Final note: for the sake of completeness, I must mention that LaTeX has a \@tfor macro that iterates over a list of tokens. However, it does not handle spaces: they are skipped over unless they are explicit, like "\ ".
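
For comparison, here is a sketch of the same count done with \@tfor. The macro name \counttokens and the loop variable \tok are made up for the example. Because the space is skipped, the counter ends at 11 for Hello world! instead of 12:

```latex
% Sketch: counting tokens with LaTeX's \@tfor (spaces are skipped).
% \counttokens and \tok are hypothetical names.
\documentclass{article}
\makeatletter
\newcount\charcount
\newcommand\counttokens[1]{%
  \charcount=0
  \@tfor\tok:=#1\do{\advance\charcount by 1 }%
}
\makeatother
\begin{document}
\counttokens{Hello world!}%
Counted \the\charcount\ tokens.
\end{document}
```

Unlike \countchar above, this version does not expand its argument first, so a macro passed as argument counts as a single token.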

Counting the number of characters in TeX

I was recently asked if it is possible to know whether there is only one character in a TeX "string," so that we could change the style. This was because the text usually did not fit in a box of fixed size, so the font had to be smaller, unless there was only one character.
This is a good example to introduce the notions of scanning in TeX and the usage of \let. We want to create a macro \countchar so that, for instance, \countchar{Hello world!} would set a TeX counter to 12.
In this first part, I will explain how to catch one character in a control sequence \char, using a macro \gobblechar. In a second post, I describe how to use this macro to count the number of characters.

Gobbling one character using \def


First of all, we need to be able to look at one character at a time. Let us see the different options. One possibility is to define a macro with one argument, then use the macro without {}:
\def\gobblechar#1{\def\char{#1}}
\gobblechar Hello world!. Char is [\char].

This produces the following output: "ello world!. Char is [H]." This is the expected result: without the curly brackets around the "Hello world!" sentence, the argument of the macro becomes the first non-blank token, i.e., H; it is saved in the \char control sequence and the remaining "ello world!" is printed as usual.
    Let us try now with a control sequence as argument:
    \def\text{Hello world!}
    \gobblechar \text. Char is [\char].

    This produces ". Char is [Hello world!]." because now TeX takes the macro \text as a single token, feeding it as argument to \gobblechar. To prevent this behavior, we need \text to be expanded before \gobblechar. We achieve this with the following code:
    \expandafter\gobblechar \text. Char is [\char].
    Which produces again "ello world!. Char is [H]." There is still one problem with this function: it is not possible to catch spaces that way:
    \def\text{ Hello world!}
    
    \expandafter\gobblechar \text. Char is [\char].
    This produces "ello world!. Char is [H]." and the space before the `H' is forgotten. As I stated above, the macro will look for the first non-blank token. This is where we leave the \def solution and go with the \let construct.

    Gobbling one character using \let

    \let is a powerful TeX construct that is often unknown to people. While \def is used to build a control sequence (or macro) that will expand to something else, \let creates an alias to something else. Compare for instance:
    \def\deftext{\text}
    \let\lettext=\text
    \ifx\deftext\text yes\else no\fi.
    \ifx\lettext\text yes\else no\fi.
    
    Which produces "no. yes." The \ifx construct can tell if two macros are "the same" (it is in fact a bit more complicated, I'll maybe write about it in the future). Using \let, it is for instance possible to swap the meaning of \a and \b by using:
    \let\tmp=\a \let\a=\b \let\b=\tmp
    And it is possible to \let to a character, for instance,
    \let\a=a
    This is \a\ sentence th\a t h\a s some \a's.
    
    produces "This is a sentence that has some a's." So, let us now redefine our \gobblechar as follows:
    \def\gobblechar{\let\char=}
    \expandafter\gobblechar \text. Char is [\char].
    
    This produces again "ello world!. Char is [H]." Indeed, TeX will expand the beginning to \let\char= Hello world!, and the definition of \let states that there can be one optional space between the equals sign and the token, so the leading space is skipped. Let us modify the macro to:
    \def\gobblechar{\let\char= }
    \expandafter\gobblechar \text. Char is [\char].
    
    This now produces "Hello world!. Char is [ ]." with the leading space correctly caught in \char. Note that simply writing
    \let\char=  Hello world!. Char is [\char].
    
    would not work, since TeX would directly convert the two spaces between `=' and `H' to a single space, and then \let \char to `H'.

    Friday, February 25, 2011

    TeX source for the "category codes" post

    This post gives the TeX source for a previous post on Category codes in TeX. Save this text in a "hash-in-refs.tex" file and process it with the following command: tex hash-in-refs.tex
    \input eplain
    \enablehyperlinks
    
    \newcount\exno
    \exno=0
    \newcount\answ
    \answ=0
    
    {
      \catcode`#=12
      \catcode`!=6
      \gdef\exercise{
        \global\advance\exno by 1 \href{#ex\the\exno}{Exercise \the\exno}
      }
      \gdef\exercisearg!1{
        \global\advance\exno by 1 \href{#ex\the\exno}{Exercise \the\exno} ({\it !1})
      }
    }
    
    \def\answer{
      \global\advance\answ by 1
      {\xrdef{ex\the\answ} Answer of exercise \the\answ}
    }
    
    \exercise
    This is the first exercise.
    \medskip
    
    \exercise
    This is the second exercise.
    \medskip
    
    \exercisearg{Category codes\dots}
    \par
    {
    An exercise on category codes: how to use '/' for commands instead of
    \catcode`/=0
    \catcode`\\=12
    '{/tt \}' ?
    % '$/backslash$'?
    }
    
    (For instance, {\tt /def/a$\{$macro a$\}$}).
    
    
    \vfill
    
    \eject %page break
    
    \answer
    Answer to the first exercise.
    
    \eject
    
    \answer
    Answer to the second exercise.
    
    \eject
    
    \answer
    
    The ``escape'' character '$\backslash$' is of category 0. Hence,
    {
     \catcode`/=0
     \catcode`\\=12
     /tt \catcode/catcode`//=12 `/=0
    } does the trick.
     
    \vfill
    
    \end
    

    Category codes in TeX and hash sign "#" problem


    Back when I didn't have this blog, I had already started to write a few things down so that I wouldn't forget them later. Also, why not put them on the Internet for everyone to see? I prefer to have all similar information grouped in the same place, so here it is, but the original version is still available on my website at
    http://florent.bouchez.free.fr/?page=TeX/category

    Character category codes, token expanding and 'hash' ("#") problem in TeX


    I recently finally understood what "category codes" are in TeX. Coupled with token expansion, these are very delicate things to manipulate. To introduce the problem I will give you an example that comes from my actual experience.

    Suppose you are using plain TeX, and would like to insert hyperlinks into your document. A way to do this is to use the eplain set of macros, which defines in particular \href and \xrdef:
    • \xrdef defines a position in the document as a target for a hyperlink. For instance, \xrdef{label}
    • \href creates a hyperlink to a target, for instance
      \href{http://perso.ens-lyon.fr/florent.bouchez}{Florent's web page}
      However, to reference a target inside the document, one needs to add a hash sign ("#") at the beginning of the first argument, i.e.:
      \href{#label}{link to label}

    Now, here is the problem. Suppose you want to create a macro which creates hyperlinks in your document. For instance, a macro \exercise that increments a counter and prints "Exercise XX" as a hyperlink to the answer. The first solution that comes to mind is the following:

    \newcount\exno \exno=0
    \newcount\answ \answ=0
    
    \def\exercise{
      \global\advance\exno by 1
      \href{#ex\the\exno}{Exercise \the\exno}
    }
    
    \def\answer{
      \global\advance\answ by 1
      {\xrdef{ex\the\answ} Answer of exercise \the\answ\quad}
    }
      

    Suppose we put this code in a file exercises.tex; we can then include it in a document:

    \input eplain
    \enablehyperlinks
    
    \input exercises
    
    \exercise
    This is the first exercise.
    
    \exercise
    This is the second exercise.
    
    \eject %page break
    
    \answer
    Answer to the first exercise.
    
    \answer
    Answer to the second exercise.
      

    However, this solution does not work and the following error occurs:

    ! Illegal parameter number in definition of \exercise.
    <to be read again> 
                       e
    l.10   \global\advance\exno by 1 \href{#e
                                             x\the\exno}{Exercise \the\exno}
    ? 
      

    The problem is the following: the character "#" is normally used for macro parameters, like #1, #2 and so forth. So TeX expects a number to follow "#" and not the letter "e". However, it is possible to use \href{#label}{link to label} directly (not in a macro) without any problem. Why so?

    In fact, \href contains some TeX trickery to allow the use of "#" as a "normal" character. So, when TeX encounters \href in normal text, it does a bit of magic before continuing to read. However, when reading the definition of macro \exercise, it does not expand (i.e., "execute") \href since it only wants to compute the list of tokens that will be associated with macro \exercise. When converting the body of the macro into tokens, it then finds the hash "#" to be a problem. To understand how to bypass this problem, we need to know what "category codes" are in TeX.
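
    The error does not depend on eplain. As a minimal sketch (the names \good and \bad are only illustrative), any category-6 "#" in a macro body that is not followed by a valid parameter digit or by another "#" triggers it:
    \def\good{\#e}    % \# is a control symbol that typesets a hash sign
    \good
    % \def\bad{#e}   % uncommenting this line gives "Illegal parameter number"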


    Category codes in TeX


    In fact, TeX does not "know" that "#" is the character for arguments, nor that "{" and "}" are used for grouping. All it knows is that a character of category code 1 (like "{") opens a group, a character of category code 2 (like "}") closes a group, and a character of category code 6 (like "#") is used for macro arguments. Category codes are a kind of label attached to characters, but they are not fixed. When TeX reads a character, its actions are determined by the category of that character. There are 16 categories that I will not describe here (one can find them in the TeXbook), but what is important to know is that it is possible to change the category of a character by using the command \catcode. For instance, the category codes of "{", "}", and "#" are set by the plain TeX format with the following commands:
    \catcode`{=1
    \catcode`}=2
    \catcode`#=6
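
    To check the current category code of a character, one can print it with \the. A small sketch for plain TeX (\message writes to the terminal and the log file; the values in the comments are the defaults of the plain format):
    \message{\the\catcode`\{}  % 1, begin-group
    \message{\the\catcode`\}}  % 2, end-group
    \message{\the\catcode`\#}  % 6, parameter
    \message{\the\catcode`a}   % 11, letter
    \message{\the\catcode`@}   % 12, other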
      

    Suppose you prefer angle brackets ("<" and ">") to be used for grouping; you can just add at the beginning of your file:
    \catcode`<=1
    \catcode`>=2
      

    And then you can write some TeX using angle brackets instead of braces, and it is even possible to mix them:
    \def\a< this is a >
    \def\b{ this is b >
      

    Two other category codes that matter for us now are category 11, "letter" (a-z and A-Z), and category 12, "other", which contains for instance "@" or "!". TeX does not do anything special when it encounters a character of category 12 (it just prints it), so if you want for instance to type a lot of "#" without using "\#", it is possible to do the following:
    \catcode`#=12
    This is # just some normal \TeX with a lot # of # hash characters # scattered around.
    

    However, this will prevent you from using arguments in later macro definitions, so it is advisable to restore the category of "#" afterwards:
    \catcode`#=12
    Text with lots of #.
    \catcode`#=6
    Normal code should now use \#.
    

    A better solution is to use braces, since category code assignments obey grouping (i.e., as soon as the group ends, the category code is restored to the value it had before the group began):
    { \catcode`#=12
      Text with lots of #.
    }
    % category code of # is restored
    Normal code should now use \#.
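
    A quick way to convince oneself that the assignment is local (a sketch; the values in the comments assume that "#" had category 6 before the group):
    { \catcode`\#=12
      \message{inside the group: \the\catcode`\#} % 12
    }
    \message{after the group: \the\catcode`\#}    % 6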
    

    Back to the problem of # in \href


    It now seems possible to solve our problem by changing the category code of "#" in the \exercise macro:
    \def\exercise{
      \catcode`\#=12
      \global\advance\exno by 1 \href{#ex\the\exno}{Exercise \the\exno}
    }
    

    But this gives exactly the same error as above:
    ! Illegal parameter number in definition of \exercise.
    <to be read again> 
                       e
    l.10   \global\advance\exno by 1 \href{#e
                                             x\the\exno}{Exercise \the\exno}
    ?         
    

    Now, to understand the problem, it is really important to know what \catcode does and does not do. It changes the category code of characters that will be read afterwards, but it does not change the category code of characters already read and converted to tokens. So the problem is again that, when reading the definition of macro \exercise, the whole body is converted to tokens without the \catcode command being executed, and whenever TeX reads "#" it is still of category code 6. The solution is then to change the category code of "#" before defining \exercise, as follows:
    {
      \catcode`#=12
      \gdef\exercise{
        \global\advance\exno by 1 \href{#ex\the\exno}{Exercise \the\exno}
      }
    }
    

    Notice the grouping so that the category of "#" reverts back to 6 after the definition. Notice also that the definition of \exercise should now be global (using \gdef) since \def also obeys grouping. Without it, \exercise would be defined only until the current group ends, hence not defined after the last "}".
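
    The same mechanism can be seen in isolation with a small sketch (the macro name \hashchar is only illustrative): a "#" tokenized while it has category 12 keeps that category inside the macro body, even after the group has ended and "#" is back to category 6.
    {
      \catcode`\#=12
      \gdef\hashchar{#}% this # is an ordinary character, not a parameter
    }
    Room \hashchar 42 % no error: the stored token still has category 12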


    Now, to conclude, suppose we want the \exercise macro to have an argument so that it prints as "Exercise XX (argument)", how can we do that since "#" is not available for arguments anymore?

    Answer: as explained before, we are not restricted to the hash sign for macro arguments; any character with category code 6 will do the trick, for instance "a" after \catcode`a=6. However, this would prevent us from using "a" as a letter, which would be a problem since we need it (e.g., for \advance). It is better to use for instance "!" or "@":
    {
      \catcode`#=12
      \catcode`!=6
      \gdef\exercisearg!1{
        \global\advance\exno by 1 \href{#ex\the\exno}{Exercise \the\exno} ({\it !1})
      }
    }
    \exercisearg{category codes\dots}
    

    You can download the files I've used on my website: the TeX file, and the .dvi file (with clickable links) obtained after TeXing the .tex file. For completeness, I've also posted the TeX file in this blog.

    Category code 13: active characters


    One special category code deserves a name of its own: category 13, denoted by \active. Active characters behave like normal control sequences, but are not prefixed with an escape character (\). For instance, ~ is an active character that has been def'd to a non-breakable space (a space preceded by an infinite penalty, which forbids a line break there). It is possible to make any character active in order to use it as a control sequence. For instance,
    \catcode`?=\active
    \catcode`a=\active
    \def?{coucou}
    \defa{hello tout le monde}
    
    ?a?a?a?a?a?a?
    

    will print coucouhello tout le mondecoucouhello tout le mondecoucouhello tout le mondecoucouhello tout le mondecoucouhello tout le mondecoucouhello tout le mondecoucou, though it is usually not advisable to change the category codes of letters or digits, for obvious reasons... (Except maybe the letter 'e', but only if your name is Georges Perec and you want to TeX-typeset La disparition. But remember to redefine all sectioning commands first.)
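
    For comparison, here is a sketch of how the tie can be set up, roughly as the plain format does it (plain.tex writes 10000 as \@M), followed by an active character confined to a group so that it is only active locally. The use of * as a bullet is just an illustration:
    \catcode`\~=\active
    \def~{\penalty10000\ } % no line break allowed before the control space
    {\catcode`\*=\active
     \def*{$\bullet$}
     * first item\par
     * second item\par
    }
    % here * is an ordinary character again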


    List of LaTeX macros for writing packages

    Martin Scharrer has compiled a list of LaTeX macros useful for people who are building packages. This is a great document. I have personally spent a lot of time source-diving into the base LaTeX file /usr/share/texmf/tex/latex/base/latex.ltx to find out how things work, scratching my head very hard to understand what macros like \@dblarg do. This is even more difficult than it seems, as macros tend to be used at the end of other macros, i.e., without their arguments.

    More information at:
    http://www.scharrer-online.de/wiki/LaTeX/Docs/macros2e
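
    As an illustration of macros being used without their arguments, here is a sketch of a typical use of \@dblarg (the names \mytitle and \my@title are made up for this example). \@dblarg appears at the end of \mytitle with no visible arguments; it picks up what follows in the document and, when no optional argument is given, passes the mandatory one twice:
    \documentclass{article}
    \makeatletter
    \newcommand\mytitle{\@dblarg\my@title}
    \def\my@title[#1]#2{Short: #1, long: #2\par}
    \makeatother
    \begin{document}
    \mytitle{Only one}     % same as \mytitle[Only one]{Only one}
    \mytitle[Short]{Long}
    \end{document}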

    Purpose of this blog

    Over time, I've had the chance to work with LaTeX and became increasingly interested in this typesetting language. But the more you want to do, the more you have to know about the underlying structure it is built on: TeX. One problem I had to deal with was that LaTeX is now so widespread that it is very hard to find specific information on plain TeX. The best source for this is the TeXbook by its creator, D. Knuth.

    I will try to relate in this blog some of my experience in playing with plain TeX constructs, and also probably post solutions (in TeX or LaTeX) to typesetting problems I had to solve.