Thursday, March 24, 2011

Publishing conference proceedings using LaTeX part III: polishing the proceedings

This is the last part of the "creating proceedings using LaTeX" series. In the two previous posts, I explained some guidelines to give to the authors of the articles, then how to include the resulting pdf files in LaTeX. I will now detail some polishing of the proceedings.

Having a table of contents


In part II, I explained how to add a line to the TOC for every pdf inserted with the pdfpages package. You then just have to use the \tableofcontents macro.
Since the workshop is organized into sessions, I wanted to group the articles by session in the table of contents (TOC). I created a counter (to number the sessions), and a macro to create a new session (with a title).

\newcounter{sessioncount}
\setcounter{sessioncount}{0}

\newcommand\addsession[1]{%
  \stepcounter{sessioncount}%
  \addtocontents{toc}{\vskip 8pt {\noindent\protect\Large
    {\scshape Session \arabic{sessioncount}:} #1}\protect\par}%
}

The \addsession macro takes a title as argument and adds a line to the TOC that looks like this:
SESSION X: session title

I then inserted three of these calls between the article inclusions, one before each group of articles.
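Put together, the body of the proceedings might look like the following sketch. The session titles and the third article are placeholders, and \addarticle{title}{authors}{file suffix} is the macro from part II. Since the TOC lines are written in the order of the calls, each session line ends up just before the entries of the articles that follow it.

```latex
\tableofcontents

% Placeholder session titles; \addarticle is the macro from part II
% (it includes papers/paper_<third argument>).
\addsession{First session title}
\addarticle{First article}{Some Authors}{1.pdf}
\addarticle{Second article}{Some other Authors}{2.pdf}

\addsession{Second session title}
\addarticle{Third article}{Yet other Authors}{3.pdf}
```

With the \foreach loop of part II, this simply means one loop per session, each preceded by its \addsession call.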

Finally, I wanted the workshop name and date at the top of all pages, plus page numbers at the bottom. I used the fancyhdr package, which provides "fancy headers."

\usepackage{fancyhdr}
\pagestyle{fancy}      % don't forget this or it won't work!

This loads the package and activates its page style; we can then make our own design using:

\fancyhead{}
\fancyfoot{}
\renewcommand\headrulewidth{0pt}
\fancyhead[L]{\it Workshop title}
\fancyhead[R]{\it Some location, Year}
\fancyfoot[C]{\thepage}

First, we remove any existing headers and footers, then remove the line below the header (by setting its width to 0pt). Then I chose to add the workshop name at the top left, the location and date at the top right, and finally page numbers at the bottom center of all pages. These are inserted on all pages, in particular on pages included from other pdfs, as long as pdfpages is told not to change their page style (the pagecommand={} option of part II)!
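For reference, here is a minimal sketch of a main file combining this header setup with pdfpages. It assumes the book class (so that the chapter level used by addtotoc exists) and a hypothetical papers/paper_1.pdf. The important detail is pagecommand={}: the pdfpages default, \thispagestyle{empty}, would hide the headers and footers on the included pages.

```latex
\documentclass[a4paper]{book}  % assumption: any class providing \chapter
\usepackage{pdfpages}
\usepackage{fancyhdr}
\pagestyle{fancy}

\fancyhead{}                   % clear all header fields
\fancyfoot{}                   % clear all footer fields
\renewcommand\headrulewidth{0pt}
\fancyhead[L]{\itshape Workshop title}
\fancyhead[R]{\itshape Some location, Year}
\fancyfoot[C]{\thepage}

\begin{document}
\tableofcontents
% pagecommand={} keeps the fancy page style on every included page
\includepdf[pages=1-, pagecommand={},
  addtotoc={1,chapter,0,Title,paper:1}]{papers/paper_1.pdf}
\end{document}
```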

Publishing conference proceedings using LaTeX part II: concatenating pdf files

This is the second post in the series "Publishing conference proceedings using LaTeX". The first one dealt with the guidelines sent to authors to get consistent files. In this part, I will explain how to merge all articles into a single file.

Concatenating pdf files in LaTeX



This would be easy to do using, for instance, pdftk; however, I also wanted to add page numbers and some information in the "header", i.e., the first line at the top of a page. So I went for another option, the pdfpages package.

This package lets you include pages of a pdf file directly into your document. I won't go into details (you can read the documentation for that) and will only show how I used it:

\includepdf[pages=1-, pagecommand={},%
  addtotoc={1,chapter,0,Title\qquad{\it Authors},paper:X}]%
  {papers/paper_X.pdf} 

This includes all pages of the file given as argument. The pagecommand option passes LaTeX code that is executed on each included page. The default is
pagecommand={\thispagestyle{empty}}
which is not what I wanted (I wanted page numbers and headers!), so I passed empty code.
The addtotoc option creates an entry in the table of contents (TOC), which is nice to have at the beginning of the proceedings; its arguments are:
  • 1: the TOC entry (and its hyperlink) points to page 1 of the included file.
  • chapter: the line is typeset in the TOC like a \chapter entry.
  • 0: the "depth" in sectioning (section is 1, chapter 0, etc.).
  • "Title...": the text displayed in the TOC.
  • "paper:X": a label set on the included pages (on the page given by the first argument, here page 1), in case you want to refer to it.
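The label can then be used like any other one. A tiny sketch, assuming the paper:X placeholder label from the example above and that it is set on page 1 of the included file, for instance in a preface:

```latex
% As for any cross-reference, this needs two LaTeX runs.
The invited paper starts on page~\pageref{paper:X}.
```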


To include all files more easily, I used the \foreach command of the TikZ package. I also used my trick for adjusting two paragraphs (the \alignpars macro) so that the article title is aligned to the left and the authors to the right, with "leaders" (a dotted line) linking them. I did have a problem with the \vphantom trick, which does not seem to work in a TOC entry (most likely because \vphantom is fragile and breaks when the entry is written to the .toc file), so I had to replace it with \hbox to 0pt{}.
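\foreach actually comes from the pgffor package, which TikZ loads, so it can also be used on its own. A minimal sketch of the slash syntax used below: each list item is split at the / characters into the loop variables (\itemname and \itemval are arbitrary names chosen for this example).

```latex
\documentclass{article}
\usepackage{pgffor}   % the package that provides \foreach (loaded by TikZ)
\begin{document}
% Each item "name/value" is split at the slash into \itemname and \itemval.
\foreach \itemname/\itemval in {alpha/1, beta/2, gamma/3}{%
  \itemname{} has value \itemval.\par
}
\end{document}
```

This prints three lines, "alpha has value 1." and so on. Each iteration runs inside a TeX group, so a loop variable such as \title below only shadows LaTeX's \title within the loop body.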

So I first devised a macro to add an article:
\newcommand\addarticle[3]{
  \clearpage{\pagestyle{empty}\cleardoublepage}
  \includepdf[pages=1-, pagecommand={}, 
  addtotoc={1,chapter,0,{\alignpars{#1}{\it #2}},paper:#3}]
     {papers/paper_#3} 
}

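The first line of the macro finishes the current page with \clearpage; then, inside a group where the page style is temporarily empty, \cleardoublepage adds a blank page if we are on an even page, so that every article starts on an odd-numbered page (this only happens with the twoside option, the default for the book class). Thanks to the group, this blank page has no header or footer (e.g., no page number), and the normal page style is restored afterwards.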

I then called the macro from inside a \foreach loop. Here is an example with only two dummy articles (I had 10 articles):

\foreach \file/\title/\authors in {
   1.pdf/
   {First article}/
   {Some Authors},
%
   2.pdf/
   {Second article}/
   {Some other Authors}}
%
{\addarticle{\title}{\authors}{\file}}

In the next (and last) part, I will explain how to polish the proceedings further, with sessions in the TOC for instance.

Publishing conference proceedings using LaTeX part I: guidelines to authors

Yesterday, I had to create the proceedings for the workshop I am co-organizing (Workshop on Intermediate Representations).

My first idea was to create some introductory pages in pdf format using LaTeX and then use pdftk to assemble the files. But I also wanted all papers to look similar, and the whole proceedings to have page numbers, so I chose another way. As the whole procedure is a bit long (and my posts are usually long enough already), I will split this "tutorial" into several posts. This one deals with the guidelines to authors, and will be followed by part II (concatenating pdf files) and part III (polishing the proceedings).

Guidelines to authors


First of all, to maintain consistency, all authors should abide by the same rules when producing the final version of their articles. Here is the set of rules I gave them:
  • Use the standard sigplanconf template with 9pt fonts, and at most 8 pages (this was the template required for submission). Since these are not copyrighted proceedings, there was no need for the copyright notice space (at the bottom of the first column of the first page), so the nocopyrightspace option had to be added to the class options.
  • Since the workshop is held in France, I wanted the A4 paper format. This is, however, not an available option for sigplanconf, so I chose margins that more or less preserve the original "letter" layout:
    • A4 paper
    • 1 inch margins at top and bottom
    • .65 inch margins at left and right
    This was easily achieved with the help of the geometry package:
    \usepackage[ a4paper,
      top=1in,
      bottom=1in,
      left=.65in,
      right=.65in,
      offset={0pt,0pt}
      ]{geometry}
    

  • Some people reported problems when compiling with latex and then producing the pdf with dvipdf (the layout was correctly in A4 format in the .dvi, but back to letter format in the final pdf). Since I wanted pdf files anyway, I recommended compiling the LaTeX files with pdflatex.


  • It is common to balance the two columns on the last page of an article, as it looks better. At this point, this is usually the bibliography, and I used to do this manually: once the bibliography was final, I replaced \bibliography with \input{article.bbl} and inserted a \newpage at the right position (in two-column mode, this starts a new column, leaving blank space at the bottom of the current one). But I discovered a package that does this automatically:
    \usepackage{balance}
    
    And then at the end of the article:
    \balance
    \bibliography{article}
    

  • Finally, you should make sure that there is no page number, no header and no footer, as these will be added later when compiling all articles together. This is the default with the sigplanconf document class, but with another class you may, for instance, have to issue a \pagestyle{empty} command.
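Put together, these rules give an article skeleton like the following sketch. The title, author information and .bib file name are placeholders, and \authorinfo and the abbrvnat style are what the sigplanconf template of that time used, as far as I remember; check the template you actually have.

```latex
\documentclass[9pt,nocopyrightspace]{sigplanconf}
\usepackage[ a4paper,
  top=1in, bottom=1in,
  left=.65in, right=.65in,
  offset={0pt,0pt}
  ]{geometry}
\usepackage{balance}   % balances the two columns on the last page

\begin{document}
\title{Article title}                                   % placeholder
\authorinfo{Some Author}{Some affiliation}{some@email}  % placeholder
\maketitle

Body of the article.

\balance                  % just before the bibliography
\bibliographystyle{abbrvnat}
\bibliography{article}    % placeholder .bib file name
\end{document}
```

Compiled with pdflatex, this keeps the A4 paper size in the final pdf.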

Wednesday, March 23, 2011

Adjusting end of paragraph with start of the next one

I had to compile a selection of articles to create so-called "proceedings" for a workshop. One problem was to join together multiple pdf files; I will address this problem in a later post.
The other problem I had is that I wanted to display the list of article title / authors pairs in a style similar to a "table of contents", i.e., the title on the left, with a line of dots (a.k.a. "leaders" in TeX) joining it to the names of the authors on the right.

If everything fits on one line


Here is such an example:

My dummy title . . . . . . . . . . . . . . . . . . . . . . A. Nonymous



In this case, it is very easily achieved using the following code:

\noindent
My dummy title\dotfill {\it A. Nonymous}

Note that if you want more control over the way leaders are displayed, you can use the generic \leaders primitive, of which \dotfill is just a specialization:

\noindent
My dummy title\leaders\hbox to 1em{\hss.\hss}
\hfill {\it A. Nonymous}

The \leaders primitive takes a box, followed by some glue (here horizontal glue, i.e., an \hskip), and repeats the box over the length of that glue. In this case, \hfill is a shortcut for
\hskip 0pt plus 1fill

which means "a horizontal space of length 0 and infinite stretchability". Since the title and author do not completely fill the line, the \hfill takes up all the remaining space between the two, and \leaders fills this space with horizontal boxes of width exactly 1em (roughly the width of an "M"), each containing a centered dot ".".
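As an illustration of that extra control, here is a hypothetical \widedotfill that puts one dot every 1em. It is modeled on LaTeX's \dotfill, which, if I remember correctly, uses \cleaders with .44em boxes and ends with \kern0pt so that a later \unskip cannot remove the leaders. The comments list the three leader primitives TeX offers.

```latex
% Hypothetical variant of \dotfill: one dot every 1em.
%   \leaders  : dots aligned on a common grid, so they line up across lines
%   \cleaders : dots centered in the filled space
%   \xleaders : leftover space spread evenly around the dots
\newcommand\widedotfill{%
  \leavevmode             % make sure we are in horizontal mode
  \leaders\hbox to 1em{\hss.\hss}\hfill
  \kern0pt}               % protects the leaders from a later \unskip

\noindent
My dummy title\widedotfill{\itshape A. Nonymous}
```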

When one line is not enough

In that case, we start to have some problems. Consider the following example:

Lorem ipsum dolor sit amet, consectetur.
\dotfill{\it A. Nonymous and U. Known}

This produces the following:
Lorem ipsum dolor sit amet, consectetur. A. Nonymous
and U. Known
But what we actually want is:
Lorem ipsum dolor sit amet, consectetur. . . . . . . .
. . . . . . . . . . . . . . . . . . A. Nonymous and U. Known
Because the line on which the \dotfill appears is already full, there is no space left to "fill"!

The idea to solve this problem is to separate the title from the authors with two such fills, and to encourage TeX to break the line between them. Here is the solution:

\noindent
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do.\dotfill
\penalty0\hskip 0pt plus -1fill%
\vphantom{x}\dotfill{\it A. Nonymous and U. Known}\break

Let us review this code in detail:
  • The \penalty0 marks a legal breakpoint that costs nothing, i.e., TeX is free to break the line here.
  • The \hskip 0pt plus -1fill is a negative fill with the following effect: if TeX does not break at the penalty (because everything fits on the line), it cancels the stretchability of the first \dotfill, so that the two \dotfill behave like a single one and their dots are drawn over the same space instead of leaving a blank between them; if TeX does break there, this glue is discarded at the beginning of the new line, like any glue following a line break.
  • The \vphantom{x} before the second \dotfill is needed for the same reason: TeX discards glue at the beginning of a line, so without a non-discardable item in front of it, the second \dotfill would vanish after the break. This macro inserts a "phantom" box of width 0 (with the height of an "x", a harmless side effect).
  • Finally, we end the paragraph with \break, which forces a line break right after the authors: the \parfillskip glue that TeX normally adds at the end of a paragraph is therefore not put on that line, so the authors are pushed against the right margin. This is what forces the second \dotfill to take the remaining space.

To conclude, here is the final code embedded in a macro:
\def\alignpars#1#2{% the % avoids a spurious space when used inside a paragraph
\noindent \ignorespaces #1
\dotfill%
\penalty0\hskip 0pt plus -1fill\relax
\vphantom{x}\dotfill
#2\unskip\break
}

I added \ignorespaces to this macro so that any spaces at the beginning of the first argument do not produce blanks, and \unskip after the second argument for the same reason (it removes spaces at its end, so there is no blank there either).

You can try this macro on variable-length paragraphs and see for yourself:

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
laborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
laborum.
}

\alignpars{
Lorem ipsum dolor sit amet.
laborum.}{
Excepteur sint proident, sunt in laborum.
}

\alignpars{
Lorem ipsum dolor sit amet snatohu sntahx laborum.}{
Excepteur sint proident, sunt in laborum.}


\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat.
laborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
laborum.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat.
laborum.}{
Lorem ipsum dolor sit amet, 
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
laborum.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat.
laborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat. santoheu satoheu snthao eusnth
aoeu laborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
toto nulla pariatur. Excepteur sint occaecat. santoheu satoheu snthao eusnth
lfugiat aborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
nulla pariatur. Excepteur.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
toto nulla pariatur. Excepteur sint occaecat. santoheu satoheu snthao eusnth
sotnuh saousnth lfugiat aborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
pariatur. Excepteur.
}
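One side effect of ending with \break is worth knowing: as far as I can tell, TeX still has to set a last line after the forced break, containing only the discarded \parfillskip, so it produces an empty line that it usually reports as an underfull \hbox. A possible variant, sketched below under the hypothetical name \alignparsnofill, sets \parfillskip to 0pt inside a group instead, so the last line is not padded and the second \dotfill still takes the remaining space. It also uses \hbox to 0pt{} instead of \vphantom{x}, the replacement used for TOC entries in the proceedings series.

```latex
% Hypothetical variant of \alignpars that avoids the final \break.
% \parfillskip=0pt (local to the group) means the last line is not padded
% with the usual \hfil, so the second \dotfill fills up to the right margin.
\newcommand\alignparsnofill[2]{%
  {\noindent\ignorespaces #1\unskip
   \dotfill
   \penalty0\hskip 0pt plus -1fill\relax
   \hbox to 0pt{}\dotfill
   #2\unskip
   \parfillskip=0pt\par}%
}

\alignparsnofill{My dummy title}{\itshape A. Nonymous}
```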

Tuesday, March 22, 2011

Modifying some heading styles in LaTeX table of contents

I have been asked several questions about the style of section titles in the table of contents, namely whether section numbers and/or page numbers are displayed for these sections.

Sectioning depth in TOC

The first, easy question concerns sectioning depth: down to which level entries appear in the TOC, and down to which level they are numbered. Usually, in the article class, sections are numbered down to the \subsubsection level (e.g., "1.1.1 My sub-sub-section"), and not numbered below that, i.e., not for paragraphs or sub-paragraphs.

Sometimes, you may not want the sub-sub-sections to be in the table of contents (TOC), or you may also want the paragraphs there. For this, LaTeX provides a counter, tocdepth, that sets the sectioning depth at which to stop putting entries in the TOC. For instance:
\setcounter{tocdepth}{4}
will list entries down to paragraphs, while
\setcounter{tocdepth}{1}
will only list the sections.

Additionally, you may not want any number associated with sub-sections, or you may want numbers associated with sub-paragraphs. This is controlled by the secnumdepth counter, respectively with:
\setcounter{secnumdepth}{1}
or
\setcounter{secnumdepth}{5}
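A small test document, assuming the article class, to see both counters in action (remember that the TOC needs two LaTeX runs):

```latex
\documentclass{article}
\setcounter{tocdepth}{2}     % TOC lists sections and sub-sections only
\setcounter{secnumdepth}{4}  % numbers go down to \paragraph
\begin{document}
\tableofcontents
\section{Introduction}       % "1 Introduction", in the TOC
\subsection{Context}         % "1.1 Context", in the TOC
\subsubsection{Details}      % "1.1.1 Details", numbered but not in the TOC
Some text.
\paragraph{A paragraph}      % "1.1.1.1 A paragraph" (run-in heading)
Some more text.
\end{document}
```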

Removing page numbers in TOC

Someone wanted more control over the table of contents: sub-sub-sections should appear, but without any page number, i.e., no dotted line with the page number at the far right. This is trickier than the previous customization, because LaTeX provides no mechanism to control this, so we need to modify the internal macro that typesets the TOC lines. We can find this macro by looking first at the table of contents file, i.e., the one with the .toc extension.
In this file, there are lines like:
\contentsline {subsection}{\numberline {1.1}Test}{1}{section.1.1}

By searching the latex.ltx source file, we find that the \contentsline macro is defined as follows:
\def\contentsline#1{\csname l@#1\endcsname}

This means that it expands to a control sequence whose name is built from its first argument. In the example above, it becomes \csname l@subsection\endcsname, i.e., the control sequence \l@subsection.
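Instead of searching the source files by hand, one can also ask TeX directly. The sketch below builds the same names with \csname...\endcsname and uses \show, which prints the meaning of a control sequence on the terminal and in the .log file (pausing in interactive mode). No \makeatletter is needed, because \csname accepts the @ character whatever its category code.

```latex
\documentclass{article}
\begin{document}
% Build \l@subsection from its name, as \contentsline does, and show
% its definition on the terminal and in the .log file.
\expandafter\show\csname l@subsection\endcsname
% Same for the macro it calls:
\expandafter\show\csname @dottedtocline\endcsname
\end{document}
```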

In turn, we find the definition of \l@subsection in the article.cls class file:
\newcommand*\l@subsection{\@dottedtocline{2}{1.5em}{2.3em}}

This brings us to the final macro, \@dottedtocline, which displays the contents of one line of the TOC when you call \tableofcontents. Here is its definition (warning: unreadable code!):
\def\@dottedtocline#1#2#3#4#5{%
  \ifnum #1>\c@tocdepth \else
    \vskip \z@ \@plus.2\p@
    {\leftskip #2\relax \rightskip \@tocrmarg \parfillskip -\rightskip
     \parindent #2\relax\@afterindenttrue
     \interlinepenalty\@M
     \leavevmode
     \@tempdima #3\relax
     \advance\leftskip \@tempdima \null\nobreak\hskip -\leftskip
     {#4}\nobreak
     \leaders\hbox{$\m@th
        \mkern \@dotsep mu\hbox{.}\mkern \@dotsep
        mu$}\hfill
     \nobreak
     \hb@xt@\@pnumwidth{\hfil\normalfont \normalcolor #5}%
     \par}%
  \fi}

This macro is pretty horrible, but fortunately we don't have to understand all of it! We are only interested in the part that displays the dotted line and the page number: the part where the \leaders primitive is used, up to the box containing the fifth parameter, which is the page number. Replacing this part with a simple \hfill\kern0pt removes the leaders and the page number for all sectioning commands that use \@dottedtocline (starting at sub-sections in the article class, as \l@section and the levels above use different code).

To have control on the depth, we can create a new counter:
\newcounter{tocnopages}
\setcounter{tocnopages}{2} % display page number up to sub-sections

and then redefine the macro with a condition on this counter
(the \ifnum #1>\c@tocnopages... \else... \fi part):

\makeatletter
\def\@dottedtocline#1#2#3#4#5{%
  \ifnum #1>\c@tocdepth \else
    \vskip \z@ \@plus.2\p@
    {\leftskip #2\relax \rightskip \@tocrmarg \parfillskip -\rightskip
     \parindent #2\relax\@afterindenttrue
     \interlinepenalty\@M
     \leavevmode
     \@tempdima #3\relax
     \advance\leftskip \@tempdima \null\nobreak\hskip -\leftskip
     {#4}\nobreak
     \ifnum #1>\c@tocnopages \hfill \kern0pt \else
       \leaders\hbox{$\m@th
          \mkern \@dotsep mu\hbox{.}\mkern \@dotsep
          mu$}\hfill
       \nobreak
       \hb@xt@\@pnumwidth{\hfil\normalfont \normalcolor #5}%
     \fi
     \par}%
  \fi}
\makeatother
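For instance, with the counter (set to 2 as above) and this redefinition in the preamble, the following fragment lists sub-sub-sections in the TOC without dots or page numbers, while sections and sub-sections keep them:

```latex
% Preamble: \newcounter{tocnopages}, \setcounter{tocnopages}{2} and the
% \@dottedtocline redefinition above, then:
\setcounter{tocdepth}{3}   % list down to sub-sub-sections

\begin{document}
\tableofcontents
\section{Introduction}     % dots and page number (\l@section is not patched)
\subsection{Context}       % level 2 <= tocnopages: dots and page number
\subsubsection{Details}    % level 3 > tocnopages: title only
\end{document}
```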


Removing numbering but only in the TOC

This question was asked on StackOverflow: how to display, for instance, "I am a subsection" in the TOC, while still having "2.1 I am a subsection" in the body of the document. I answered directly on the StackOverflow website, so have a look there if you want to know more. The idea is to conditionally redefine
the \numberline macro (see the \contentsline line above) so that, inside the \@dottedtocline macro, it discards its argument (the section number).
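A minimal sketch of this idea, not necessarily the exact code of the StackOverflow answer: wrap \@dottedtocline so that, for levels above 1 (sub-sections and below), \numberline is locally \let to \@gobble, which throws away its argument. The extra group keeps the redefinition from leaking into the following TOC lines; \orig@dottedtocline is a hypothetical name for the saved original.

```latex
\makeatletter
% Keep the original definition under a new (hypothetical) name.
\let\orig@dottedtocline\@dottedtocline
% Same five arguments as the original: level, indent, number width,
% title (which starts with \numberline{<number>}), page.
\def\@dottedtocline#1#2#3#4#5{%
  {% group: the \let below stays local to this TOC line
   \ifnum #1>1 \let\numberline\@gobble \fi
   \orig@dottedtocline{#1}{#2}{#3}{#4}{#5}}%
}
\makeatother
```

If combined with the tocnopages redefinition, this wrapper should come after it, so that it saves the patched version.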

Thursday, March 10, 2011

Counting the number of characters in TeX (part 2)

In a previous post, I explained how to \let a control sequence \char to a character of a "TeX string". I will now show how to iterate over the whole string Hello world!, for instance to count its characters.

The idea is to put a marker at the end of the string, and to have a macro that keeps eating characters until it finds the marker. I will use the macro \nil (which does not even need to be defined) as the marker. As an example, look at the following code:
\let\char=a
\ifx\char\nil yes\else no\fi.
\let\char=\nil
\ifx\char\nil yes\else no\fi.
This produces "no. yes.", which is exactly what we need.

Now the problem is that TeX processes tokens one at a time, as they come, so we need a macro that behaves like this: "if the next token is not \nil, execute me again", i.e., put me back in front of the remaining tokens.
The magic TeX construct that we need is called \afterassignment. It saves the next token and inserts it back into the stream after the next assignment, which will be a \let in our case. As an example, consider the following code:
\def\showchar{Char is [\char].}
\afterassignment\showchar\gobblechar Hello world!
It produces "Char is [H].ello world!" (with \gobblechar defined using \let, as at the end of the previous post), since \showchar is inserted just after `H' has been assigned to \char.

I will now use \ifx and \afterassignment in two macros that call each other. The first one assigns one token (character) to \char and then calls the second; the second checks whether the token is \nil and, if not, calls the first.

\def\assignthencheck{\afterassignment\checknil\gobblechar}

\def\checknil{%
  \ifx\char\nil%
     STOP%
     \let\next=\relax%
  \else%
     (\char)\let\next=\assignthencheck%
  \fi%
  \next%
}

\assignthencheck Hello world!\nil

This produces "(H)(e)(l)(l)(o)( )(w)(o)(r)(l)(d)(!)STOP"

The \assignthencheck macro should be clear after the explanation above. After \char has been \let to the first character `H', \checknil is called. It sets \next depending on \char: since \char is not \nil, TeX prints \char between parentheses (to show what happens), then \next is \let to \assignthencheck. The \next at the end of the macro then expands to \assignthencheck, right in front of the remaining tokens, i.e., ello world!\nil. Hence, all characters are processed and printed in parentheses until \nil is found. At that point there is nothing left to do, so TeX prints "STOP" and \next is \let to \relax, which means "do nothing", so that the \next at the end of the macro does not do anything.

It is now easy to combine everything to count the number of characters, using a TeX counter. For completeness, here is the full example, including the definition of \gobblechar.

\def\gobblechar{\let\char= }
\newcount\charcount
\def\countunlessnil{%
  \ifx\char\nil \let\next=\relax%
  \else%
    \let\next=\auxcountchar%
    \advance\charcount by 1\relax % \relax ends the number before \fi
  \fi\next
}%

\def\auxcountchar{%
  \afterassignment\countunlessnil\gobblechar%
}
\def\countchar#1{\edef\xx{#1}\charcount=0 \expandafter\auxcountchar\xx\nil}

\def\shownumchar#1{%
  \countchar{#1}%
  There are \the\charcount\ characters in [#1].%
}

\shownumchar{Hello world!}

\shownumchar{ Hello world!}

\def\text{Hello world!}
\def\atext{ \text\ }
\shownumchar{\atext}

This produces
There are 12 characters in [Hello world!].
There are 13 characters in [ Hello world!].
There are 14 characters in [ Hello world! ].


The \shownumchar macro is simple: it calls \countchar, then uses the TeX counter \charcount to display the number of characters in its argument.

The \countchar macro uses another trick that allows it to be called on macros that themselves call macros (last example, with \atext): the \edef expands its argument as much as possible before assigning it to \xx, then the \expandafter ensures \xx is expanded before \auxcountchar (otherwise, the number of characters would be 1, since we are in fact counting tokens and an unexpanded macro is a single token).
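Coming back to the original question (a smaller font unless there is only one character), \charcount can then drive the choice of style. A sketch under assumptions: \fitlabel is a hypothetical name, the two font sizes are arbitrary, and it relies on the \countchar and \charcount definitions above (in LaTeX, for \newcommand and the size commands).

```latex
% Hypothetical \fitlabel: a single character is set large, anything
% longer in a smaller size. Requires \countchar and \charcount above.
\newcommand\fitlabel[1]{%
  \countchar{#1}%
  \ifnum\charcount=1 {\Huge #1}\else {\small #1}\fi
}

\fitlabel{A} \fitlabel{ABC}
```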


Final note: for completeness, I should mention that LaTeX provides a \@tfor macro that iterates over a list of tokens. However, it does not handle spaces: they are skipped unless they are explicit, like "\ ".
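For comparison, here is a sketch of the same count with \@tfor (\tforcount, \tfortok and \tforcountchars are hypothetical names). Since \@tfor reads each item as an undelimited macro argument, the space is skipped, and Hello world! gives 11 instead of 12:

```latex
\makeatletter
\newcount\tforcount
% Count the items \@tfor iterates over: spaces are skipped, so this
% counts the non-space characters only.
\def\tforcountchars#1{%
  \tforcount=0
  \@tfor\tfortok:=#1\do{\advance\tforcount by 1\relax}%
}
\makeatother

\tforcountchars{Hello world!}\the\tforcount   % prints 11
```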

Counting the number of characters in TeX

I was recently asked whether it is possible to know if there is only one character in a TeX "string," so that the style could be changed accordingly. The reason was that the text usually did not fit in a fixed-size box, so the font had to be made smaller, unless there was only one character.
This is a good example to introduce the notions of scanning in TeX and the usage of \let. We want to create a macro \countchar so that for instance \countchar{Hello world!} would set a TeX counter to 12.
In this first part, I will explain how to catch one character into a control sequence with a macro, \gobblechar. In a second post, I describe how to use this macro to count the number of characters.

Gobbling one character using \def


First of all, we need to be able to look at one character at a time. Let us see the different options. One possibility is to define a macro with one argument, then use the macro without {}:
\def\gobblechar#1{\def\char{#1}}
\gobblechar Hello world!. Char is [\char].

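This produces the following output: "ello world!. Char is [H]." This is the expected result: without curly brackets around the "Hello world!" sentence, the argument of the macro is the first non-blank token, i.e., H; it is saved in the \char control sequence, and the remaining "ello world!" is printed as usual. (A word of caution: \char is actually a TeX primitive, which typesets the character with a given code, and redefining it may break other macros that rely on it; in real code, a name such as \nextchar is safer. I keep \char in these examples.)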
    Let us try now with a control sequence as argument:
    \def\text{Hello world!}
    \gobblechar \text. Char is [\char].

    This produces ". Char is [Hello world!]." because now TeX takes the macro \text as a single token and feeds it as the argument to \gobblechar. To prevent this, we need \text to be expanded before \gobblechar reads its argument. We achieve this with \expandafter:
    \expandafter\gobblechar \text. Char is [\char].
    This again produces "ello world!. Char is [H]." There is still one problem with this macro: it cannot catch spaces that way:
    \def\text{ Hello world!}
    
    \expandafter\gobblechar \text. Char is [\char].
    This produces "ello world!. Char is [H]." and the space before the `H' is lost. As I stated above, the macro looks for the first non-blank token. This is where we leave the \def solution and go with the \let construct.

    Gobbling one character using \let

    \let is a powerful TeX construct that many people do not know. While \def is used to build a control sequence (or macro) that will expand to something else, \let makes a control sequence an alias of another token, i.e., gives it the same current meaning. Compare for instance:
    \def\deftext{\text}
    \let\lettext=\text
    \ifx\deftext\text yes\else no\fi.
    \ifx\lettext\text yes\else no\fi.
    
    This produces "no. yes." The \ifx construct can tell whether two macros are "the same" (it is in fact a bit more complicated; I may write about it in the future). Using \let, it is for instance possible to swap the meanings of \a and \b with:
    \let\tmp=\a \let\a=\b \let\b=\tmp
    And it is possible to \let to a character, for instance,
    \let\a=a
    This is \a\ sentence th\a t h\a s some \a's.
    
    produces "This is a sentence that has some a's." So, let us now redefine our \gobblechar as follows:
    \def\gobblechar{\let\char=}
    \expandafter\gobblechar \text. Char is [\char].
    
    This again produces "ello world!. Char is [H]." Indeed, TeX expands the beginning to \let\char= Hello world!, and the syntax of \let allows one optional space between the equal sign and the token, so the leading space is swallowed. Let us modify the macro to:
    \def\gobblechar{\let\char= }
    \expandafter\gobblechar \text. Char is [\char].
    
    This now produces "Hello world!. Char is [ ]." with the leading space correctly caught in \char. Note that simply writing
    \let\char=  Hello world!. Char is [\char].
    
    would not work, since TeX would directly convert the two spaces between `=' and `H' into a single space token, which \let would then skip as the optional space, assigning `H' to \char.
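To see the difference between \def and \let more directly, \meaning prints the current meaning of a control sequence. In the sketch below (reusing \deftext and \lettext from the comparison above), \text is redefined after both assignments: \deftext still expands to \text and follows the change, while \lettext keeps the old definition, since \let copied the meaning at the time of the assignment.

```latex
\def\text{Hello world!}
\def\deftext{\text}   % a macro whose body is the single token \text
\let\lettext=\text    % a copy of the current meaning of \text
\def\text{Goodbye!}   % redefine \text after both assignments

\deftext\ / \lettext  % typesets "Goodbye! / Hello world!"

% \meaning shows the stored definitions (in a typewriter font, so that
% the backslash is printed correctly):
\texttt{\meaning\deftext}   % macro:->\text

\texttt{\meaning\lettext}   % macro:->Hello world!
```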