Friday, October 7, 2011

Tex capacity exceeded, sorry!



Every once in a while, someone in my company who tries to generate our documentation gets the following error:

TeX capacity exceeded, sorry [main memory size=1000000].

This message is even accompanied (in the log file) by the following message:

If you really absolutely need more capacity, you can ask a wizard to enlarge me.

Since I am the de facto "wizard" around here, people turn to me, and every time we struggle to fix this problem, since no two of us have the same Linux installation.

So I decided to make a note on the main ways to solve this problem.
  • Often, this means there is a problem in the LaTeX code, such as an infinite loop (if you are a guru and use \loop commands), but more often there is a forgotten group somewhere.
  • In our case, we really have a memory problem, as the document we generate has 600+ pages, hence we need to increase the main memory size:
    1. Locate texmf.cnf and edit it (as root). (On some distributions, you may prefer to edit the files in /etc/texmf/texmf.d.)
    2. Search for main_memory and change the size (for instance to 2000000).
    3. Recreate the LaTeX format files. Depending on your distribution, you may do so using update-texmf (Debian) or fmtutil-sys --all (Red Hat).
    4. If this does not work (i.e., LaTeX still complains about having only 1000000 of main memory), look in your home directory: you may have an old ~/.texmf-var directory that LaTeX uses first, ignoring your system-wide configuration. (This mistake cost me hours on more than one occasion!)
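
To tell the first case (a bug in the code) apart from a genuine lack of memory, it can help to see what a runaway macro looks like. The following is a minimal sketch, with a made-up macro name \runaway, of a definition that keeps adding material to the current paragraph. On a typical installation I would expect this kind of input to end in a "capacity exceeded" error no matter how large main_memory is set, which is the sign that the code, and not the configuration, needs fixing.

```latex
\documentclass{article}
\begin{document}
% Deliberately broken example: \runaway (a hypothetical name) calls
% itself forever. Each call adds an "x" and a space to the current
% paragraph, so memory use grows until TeX gives up.
\def\runaway{x \runaway}
\runaway
\end{document}
```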

Monday, April 11, 2011

Removing garbage from the pdf bookmarks

In the process of generating proceedings for the Workshop on Intermediate Representation, I explained previously how to include pdf pages and I mixed this with a technique so that the titles are aligned on the left and the authors on the right (see adjusting paragraphs).
Since I also wanted slightly different colors (red!40!black for the title, and blue!60!black for the authors), all this ended up in the bookmark section of the pdf file:

red!40!black Tirex: A Textual Target-level...Exchangeto 1em. to 0pt

And this for all included files.

I have yet to work out a clean way to get rid of this garbage, but I found that bookmarks are saved by the hyperref package in the .out file, which contains lines like the following:

\BOOKMARK [0][]{chapter*.3}{red!40!black \040Tirex: A Textual 
Target-Level Intermediate Representation for Compiler Exchangeto 
1em. to 0pt to 1em. blue!50!black \040Artur Pietrek, Florent 
Bouchez and Benoit Dupont De Dinechin}{}

So the obvious trick is to clean this .out file by hand, save it, and keep a backup copy:

\BOOKMARK [0][]{chapter*.3}{Tirex: A Textual Target-Level 
Intermediate Representation for Compiler Exchange -- Artur 
Pietrek, Florent Bouchez and Benoit Dupont De Dinechin}{}

Then run pdflatex again, exactly once. This is important, since that run uses the cleaned bookmarks but also erases the .out file and replaces it with a garbled one again (hence the backup).

This produces the expected result, with a clean index bookmark in my pdf. However, I would prefer a cleaner way to perform this. This is probably possible in the hyperref package, but the difficulty will likely come from the interaction with pdfpages.
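
One candidate for such a cleaner way is the \texorpdfstring macro of hyperref, which takes the typeset version of a title as first argument and a plain-text version for the bookmark as second argument. The following is only a sketch on a dummy entry added with \addcontentsline; I have not verified that it behaves the same way when passed through the addtotoc option of pdfpages.

```latex
\documentclass{report}
\usepackage{xcolor}
\usepackage{hyperref}
\begin{document}
\tableofcontents
\clearpage
\phantomsection % gives the bookmark a target to jump to
\addcontentsline{toc}{chapter}{%
  \texorpdfstring
    % first argument: what is typeset in the table of contents
    {\textcolor{red!40!black}{My dummy title}\quad
     \textcolor{blue!60!black}{\itshape A. Nonymous}}%
    % second argument: plain text used for the pdf bookmark
    {My dummy title -- A. Nonymous}}
Body of the first article.
\end{document}
```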

Thursday, March 24, 2011

Publishing conference proceedings using LaTeX part III: polishing the proceedings

This is the last part of the "creating proceedings using LaTeX" series. In two previous posts, I explained some guidelines to give to the writers of articles, then how to include the resulting pdf in LaTeX. I will now detail some polishing for the proceedings.

Having a table of contents


With the pdfpages package, I explained how to add lines in the TOC for every inserted pdf. You then just have to use the \tableofcontents macro.
Since the workshop will be organized into sessions, I wanted to group the articles by session in the table of contents (TOC). I created a counter (to number the sessions), and a macro to create a new session (with a title).

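\newcounter{sessioncount}
\setcounter{sessioncount}{0}

% Adds a "Session N: title" line to the TOC.
% The trailing % signs avoid spurious spaces in the output.
\newcommand\addsession[1]{%
  \stepcounter{sessioncount}%
  \addtocontents{toc}{\vskip 8pt {\noindent\Large 
    {\sc Session \arabic{sessioncount}:} #1}\protect\par}%
}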

The \addsession macro takes a title as argument and adds a line in the TOC, looking like this:
SESSION X: session title

So I inserted three of these between the inclusion of articles.
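
For illustration, the calls could be interleaved as in the sketch below. The session and article titles are placeholders, and \addarticle is the macro defined in part II of this series, so this fragment only makes sense in a document where that macro (and \addsession) is defined.

```latex
% Placeholder titles; each \addsession line appears in the TOC
% just before the articles that follow it.
\addsession{First session title}
\addarticle{First article}{Some Authors}{1.pdf}
\addsession{Second session title}
\addarticle{Second article}{Some other Authors}{2.pdf}
```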

Finally, I wanted to have the workshop name & date at the top of all pages, plus page numbers at the bottom. I used the fancyhdr package which allows "fancy headers."

\usepackage{fancyhdr}
\pagestyle{fancy}      % don't forget this or it won't work!

This initializes the package, then we can make our own design using:

\fancyhead{}
\fancyfoot{}
\renewcommand\headrulewidth{0pt}
\fancyhead[L]{\it Workshop title}
\fancyhead[R]{\it Some location, Year}
\fancyfoot[C]{\thepage}

First, we remove any existing headers and footers, then remove the line below the headers (by setting its width to 0pt). Then I chose to add the workshop name at the top left, the location and date at the top right, and finally page numbers for all pages at the bottom center. These are inserted on all pages, in particular on pages included from other pdfs!

Publishing conference proceedings using LaTeX part II: concatenating pdf files

This is the second post in the series "Publishing conference proceedings using LaTeX". The first one dealt with the guidelines sent to authors to get coherent files. In this part, I will explain how to merge all articles into a single file.

Concatenating pdf files in LaTeX



It would be easy to do so using pdftk, for instance. However, I wanted to add page numbers and some information in the "header", i.e., the first line at the top of a page. So I went for another option, the pdfpages package.

This package lets you include pages of a pdf file directly into your document. I won't go into details (you can read the documentation for that) and will only show how I used it:

\includepdf[pages=1-, pagecommand={},%
  addtotoc={1,chapter,0,Title\qquad{\it Authors},paper:X}]%
  {papers/paper_X.pdf} 

This includes all pages of the file given as argument. The pagecommand option is used to pass LaTeX code that will be executed on each added page. The default is
pagecommand={\thispagestyle{empty}}
which is not what I wanted (I wanted page numbers and headers!), so I passed empty code.
The addtotoc option creates an entry in the table of contents (TOC), which is nice to have at the beginning of the proceedings. Its arguments are:
  • 1: the hyperlink will jump to page 1 of the included file.
  • chapter: typeset the line in the TOC the way the \chapter command does.
  • 0: the "depth" in sectioning (chapter is 0, section is 1, etc.).
  • "Title...": the text displayed in the TOC.
  • "paper:X": a label on the included pages (probably the first one), in case you want to \ref it.


To include all files more easily, I used the \foreach command of the TikZ package. I also used my trick for adjusting two paragraphs (the \alignpars macro) so that the article title is aligned on the left and the authors on the right, with "leaders" (a dotted line) linking them (although I had a problem with the \vphantom trick, which does not seem to be allowed in a TOC, and had to replace it with \hbox to 0pt{}).

So I first devised a macro to add an article:
\newcommand\addarticle[3]{
  \clearpage{\pagestyle{empty}\cleardoublepage}
  \includepdf[pages=1-, pagecommand={}, 
  addtotoc={1,chapter,1,{\alignpars{#1}{\it #2}},paper:#3}]  
     {papers/paper_#3} 
}

The first line of the macro ends the current page. Then, if the next page is an even one, it clears that page as well (so that every article starts on an odd-numbered page) while removing any headers/footers (e.g., page numbers) from it.

I then called the macro from inside a \foreach loop. Here is an example with only two dummy articles (I had 10 articles):

\foreach \file/\title/\authors in {
   1.pdf/
   {First article}/
   {Some Authors},
%
   2.pdf/
   {Second article}/
{Some other Authors}}
%
{\addarticle{\title}{\authors}{\file}}
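
For completeness, here is a sketch of the document skeleton that the snippets above assume. The class and its options are my assumption (any two-sided class with chapters should do, since \cleardoublepage only skips to an odd page in two-sided mode); pgffor is the part of TikZ that provides \foreach, so it can be loaded on its own.

```latex
% Assumed skeleton: the class choice is illustrative, not prescribed.
\documentclass[a4paper,twoside]{report}
\usepackage{pdfpages} % provides \includepdf
\usepackage{pgffor}   % provides \foreach without loading all of TikZ
% ... definitions of \alignpars and \addarticle go here ...
\begin{document}
\tableofcontents
% ... the \foreach loop calling \addarticle goes here ...
\end{document}
```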

In the next (and last) part, I will explain how to do some more polishing to the proceedings, with sessions in the TOC for instance.

Publishing conference proceedings using LaTeX part I: guidelines to authors

Yesterday, I had to create the proceedings for the workshop I am co-organizing (Workshop on Intermediate Representations).

My first idea was to create some introductory pages in pdf format using LaTeX and then use pdftk to assemble the files. But I also wanted all papers to look similar, and the whole proceedings to have page numbers, so I chose another way. As the whole procedure might be a bit long (and my posts usually are already quite long enough), I will split this "tutorial" into several posts. This one deals with the guidelines to authors, and is followed by part II and part III.

Guidelines to authors


First of all, to maintain coherence, all authors should abide by the same rules when producing the final version of their articles. Here is the set of rules I gave them:
  • Use the standard sigplanconf template with 9pt fonts, and a maximum of 8 pages (this was the template required for the submission). Since these are not copyrighted proceedings, there was no need for the copyright notice (end of the first column of the first page), so the option nocopyrightspace had to be added to the class.
  • Since the workshop is held in France, I wanted A4 paper format. This is however not an available option for sigplanconf, so I chose margins that more or less preserved the original "letter" layout:
    • A4 paper
    • 1 inch margins at top and bottom
    • .65 inch margins at left and right
    This was easily achieved with the help of the geometry package:
    \usepackage[ a4paper,
      top=1in,
      bottom=1in,
      left=.65in,
      right=.65in,
      offset={0pt,0pt}
      ]{geometry}
    

  • Some people reported problems when compiling with latex and then producing the pdf with dvipdf (the layout was right, in A4 format, in the .dvi, but back to letter format in the final pdf). Since I wanted pdf files anyway, I recommended compiling the LaTeX files with pdflatex.


  • It is common to balance the two columns on the last page of an article, as it looks better. The last page usually contains the bibliography, and I used to do this manually by using \input{article.bbl} once the bibliography was completely done, and inserting a \newpage at the right position (which, in two-column mode, starts a new column, leaving blank space at the bottom of the current one). But I discovered a package to do this:
    \usepackage{balance}
    
    And then at the end of the article:
    \balance
    \bibliography{article}
    

  • Finally, you should make sure that there is no page number, no header, no footer, as these will be added later when compiling all articles. This is the default with the sigplanconf document class, but with another class you may have for instance to issue a \pagestyle{empty} command.

Wednesday, March 23, 2011

Adjusting end of paragraph with start of the next one

I had to compile a selection of articles to create so-called "proceedings" for a workshop. One problem was to join together multiple pdf files; I will address this problem in a later post.
The other problem was that I wanted to display the list of pairs article title / article authors in a style similar to a "table of contents", i.e., the title on the left with a line of dots (a.k.a. "leaders" in TeX) joining it to the names of the authors on the right.

If everything fits on one line


Here is such an example:

My dummy title . . . . . . . . . . . . . . . . . . . . . . A. Nonymous



In this case, it is very easily achieved using the following code:

\noindent
My dummy title\dotfill {\it A. Nonymous}

Note that if you want more control over the way leaders are displayed, you can use the generic \leaders command, of which \dotfill is just a specialization:

\noindent
My dummy title\leaders\hbox to 1em{\hss.\hss}
\hfill {\it A. Nonymous}

The \leaders command first takes a box, and then "repeats" this box over the length of the next \hskip, i.e., some horizontal "glue". In this case, \hfill is a shortcut for
\hskip 0pt plus 1fill

which means "a horizontal space of length 0 and infinite stretchability". In our case, since the title and author do not completely fill the line, the \hfill takes up all the remaining space between the two, and \leaders fills this space with horizontal boxes of width exactly 1em (i.e., approximately the width of an "M"), each containing a centered dot ".".
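
As a small sketch of that extra control (the widths are arbitrary values chosen for illustration), changing the box width changes the spacing of the dots, and the \cleaders variant centers the run of boxes within the available space instead of aligning them on a fixed grid:

```latex
% Wider boxes: dots are spaced 2em apart instead of 1em.
\noindent
My dummy title\leaders\hbox to 2em{\hss.\hss}\hfill {\it A. Nonymous}

% \cleaders: the leftover space is split equally before and after
% the row of boxes, so the dots are centered in the gap.
\noindent
My dummy title\cleaders\hbox to 1em{\hss.\hss}\hfill {\it A. Nonymous}
```

With plain \leaders the boxes are aligned relative to the enclosing box, which is what makes the dots line up vertically from one line to the next in a table of contents.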

When one line is not enough

In that case, we start to have some problems. Consider the following example:

Lorem ipsum dolor sit amet, consectetur.
\dotfill{\it A. Nonymous and U. Known}

This produces the following:
Lorem ipsum dolor sit amet, consectetur. A. Nonymous
and U. Known
But we in fact want:
Lorem ipsum dolor sit amet, consectetur. . . . . . . .
. . . . . . . . . . . . . . . . . . A. Nonymous and U. Known
Because the line where the \dotfill appears is already full, there is no more space to "fill"!

The idea to solve this problem is to separate the title from the authors by two such fills, and to encourage TeX to break the line between those fills. Here is the solution:

\noindent
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do.\dotfill
\penalty0\hskip 0pt plus -1fill%
\vphantom{x}\dotfill{\it A. Nonymous and U. Known}\break

Let us review this code in detail:
  • The \penalty0 tells TeX that there is no penalty for breaking the line here.
  • The \hskip 0pt plus -1fill is a negative fill and has the following effect: if TeX does not break at the penalty (because there is enough space on the line), it cancels the first \dotfill; otherwise, it does nothing (there is nothing to cancel at the beginning of a new line). This avoids a blank being inserted between the two \dotfill when everything fits on a single line.
  • The \vphantom{x} before the second \dotfill ensures that there is something at the beginning of the second line after which to insert the \dotfill. This macro inserts a "phantom" box of width 0 (and of the same height as an "x", but this is a side effect).
  • Finally, we end the paragraph with \break, which makes TeX break the paragraph without adding any glue, i.e., the end of the paragraph is pushed against the right margin. This is what forces the second \dotfill to take up the remaining space.

To conclude, here is the final code embedded in a macro:
\def\alignpars#1#2{
\noindent \ignorespaces #1
\dotfill%
\penalty0\hskip 0pt plus -1fill\relax
\vphantom{x}\dotfill
#2\unskip\break
}

In this macro, I added \ignorespaces so that any spaces at the beginning of the first argument do not produce blanks, and \unskip after the second argument for the same reason (it removes spaces at the end so there is no blank).

You can try this macro on variable-length paragraphs and see for yourself:

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
laborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
laborum.
}

\alignpars{
Lorem ipsum dolor sit amet.
laborum.}{
Excepteur sint proident, sunt in laborum.
}

\alignpars{
Lorem ipsum dolor sit amet snatohu sntahx laborum.}{
Excepteur sint proident, sunt in laborum.}


\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat.
laborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
laborum.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat.
laborum.}{
Lorem ipsum dolor sit amet, 
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
laborum.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat.
laborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur sint occaecat. santoheu satoheu snthao eusnth
aoeu laborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
fugiat nulla pariatur. Excepteur.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
toto nulla pariatur. Excepteur sint occaecat. santoheu satoheu snthao eusnth
lfugiat aborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
nulla pariatur. Excepteur.
}

\alignpars{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
toto nulla pariatur. Excepteur sint occaecat. santoheu satoheu snthao eusnth
sotnuh saousnth lfugiat aborum.}{
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
pariatur. Excepteur.
}

Tuesday, March 22, 2011

Modifying some heading styles in LaTeX table of contents

I had several questions asked about the style of section titles in the table of contents. These concern the display or not of section numbers and/or pages for these sections.

Sectioning depth in TOC

The first easy question concerns the sectioning depth at which to stop using numbers. Usually, in the article class, sections are numbered up to the \subsubsection level, (e.g., "1.1.1 My sub-sub-section"), and not numbered below that, i.e., not for paragraphs or sub-paragraphs.

Sometimes, you may not want the sub-sub-sections to be in the table of contents (TOC), or you may want the paragraphs to be there. For this, LaTeX provides a counter, tocdepth, that we can set to determine the sectioning depth at which to stop putting sections in the TOC. For instance:
\setcounter{tocdepth}{4}
will display up to the paragraph, while
\setcounter{tocdepth}{1}
will only display the sections.

Additionally, you may not want any number associated with sub-sections, or you may want numbers associated with sub-paragraphs. This is controlled by the secnumdepth counter, respectively by:
\setcounter{secnumdepth}{1}
or
\setcounter{secnumdepth}{5}
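
To see how the two counters interact, here is a minimal sketch in the article class (the values are chosen only for illustration); the comments state what each heading should look like:

```latex
\documentclass{article}
\setcounter{tocdepth}{2}     % the TOC lists sections and sub-sections
\setcounter{secnumdepth}{1}  % only sections get a number
\begin{document}
\tableofcontents
\section{A section}               % numbered, listed in the TOC
\subsection{A sub-section}        % not numbered, still listed in the TOC
\subsubsection{A sub-sub-section} % not numbered, not in the TOC
\end{document}
```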

Removing page numbers in TOC

Someone wanted more control over the table of contents. He wanted sub-sub-sections to appear but without any page number associated with them, i.e., no dotted line with the page number at the far right. This is trickier than the previous customization, because LaTeX provides no mechanism to control this. So we need to modify the core macro that typesets these lines. We can find this macro by looking first at the table of contents file, i.e., the one with the .toc extension.
In this file there are lines like:
\contentsline {subsection}{\numberline {1.1}Test}{1}{section.1.1}

By searching the latex.ltx source file, we find that the \contentsline macro is defined as follows:
\def\contentsline#1{\csname l@#1\endcsname}

This means that it expands to a macro whose name is constructed from its first argument. In the example above, it would become \csname l@subsection\endcsname, i.e., the control sequence \l@subsection.

In turn we find the definition of this macro in the article.cls class file:
\newcommand*\l@subsection{\@dottedtocline{2}{1.5em}{2.3em}}

This brings us to the final macro, \@dottedtocline. This is the macro that displays the contents of one line of the TOC when you call \tableofcontents. Here is its definition (warning: unreadable code!):
\def\@dottedtocline#1#2#3#4#5{%
  \ifnum #1>\c@tocdepth \else
    \vskip \z@ \@plus.2\p@
    {\leftskip #2\relax \rightskip \@tocrmarg \parfillskip -\rightskip
     \parindent #2\relax\@afterindenttrue
     \interlinepenalty\@M
     \leavevmode
     \@tempdima #3\relax
     \advance\leftskip \@tempdima \null\nobreak\hskip -\leftskip
     {#4}\nobreak
     \leaders\hbox{$\m@th
        \mkern \@dotsep mu\hbox{.}\mkern \@dotsep
        mu$}\hfill
     \nobreak
     \hb@xt@\@pnumwidth{\hfil\normalfont \normalcolor #5}%
     \par}%
  \fi}

This macro is pretty horrible, but fortunately we don't have to understand all of it! In fact, we are interested only in the part that displays the dotted line and the page number. This is the part where \leaders is used, and the fifth parameter is the page number. Replacing this part by a simple \hfill\kern0pt removes the "leaders" and the page number for all sectioning commands (starting at sub-sections in the article class, as the \l@section macro and above use different code).
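
Spelled out, this unconditional variant could look like the sketch below: it is the definition above, copied as is, with the \leaders line and the page-number box replaced.

```latex
\makeatletter
\def\@dottedtocline#1#2#3#4#5{%
  \ifnum #1>\c@tocdepth \else
    \vskip \z@ \@plus.2\p@
    {\leftskip #2\relax \rightskip \@tocrmarg \parfillskip -\rightskip
     \parindent #2\relax\@afterindenttrue
     \interlinepenalty\@M
     \leavevmode
     \@tempdima #3\relax
     \advance\leftskip \@tempdima \null\nobreak\hskip -\leftskip
     {#4}\nobreak
     % no dotted leaders, and #5 (the page number) is not used at all
     \hfill\kern0pt
     \par}%
  \fi}
\makeatother
```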

To control the depth at which page numbers stop being displayed, we can create a new counter:
\newcounter{tocnopages}
\setcounter{tocnopages}{2} % display page number up to sub-sections

and then redefine the macro with a condition on this counter (the \ifnum #1>\c@tocnopages... \else... \fi part):

\makeatletter
\def\@dottedtocline#1#2#3#4#5{%
  \ifnum #1>\c@tocdepth \else
    \vskip \z@ \@plus.2\p@
    {\leftskip #2\relax \rightskip \@tocrmarg \parfillskip -\rightskip
     \parindent #2\relax\@afterindenttrue
     \interlinepenalty\@M
     \leavevmode
     \@tempdima #3\relax
     \advance\leftskip \@tempdima \null\nobreak\hskip -\leftskip
     {#4}\nobreak
     \ifnum #1>\c@tocnopages \hfill \kern0pt \else
       \leaders\hbox{$\m@th
          \mkern \@dotsep mu\hbox{.}\mkern \@dotsep
          mu$}\hfill
       \nobreak
       \hb@xt@\@pnumwidth{\hfil\normalfont \normalcolor #5}%
     \fi
     \par}%
  \fi}
\makeatother


Removing numbering but only in the TOC

This question was asked on StackOverflow: how to display, for instance, "I am a subsection" in the TOC, while still having "2.1 I am a subsection" in the body of the document. I answered directly on the StackOverflow website, so go take a look at it if you want to know more. The idea is to conditionally redefine the \numberline macro (see the \contentsline above) to something empty inside the \@dottedtocline macro.
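
As a rough sketch of that idea (this is not a copy of the StackOverflow answer, and \orig@dottedtocline is a helper name I made up), one can wrap the original macro in a group in which \numberline swallows its argument. It is shown here without the condition, so it applies to every line typeset through \@dottedtocline, i.e., sub-sections and below in the article class:

```latex
\makeatletter
% Keep a copy of the original macro (hypothetical helper name).
\let\orig@dottedtocline\@dottedtocline
\renewcommand\@dottedtocline[5]{%
  \begingroup
    % Inside this group only, \numberline{2.1} produces nothing,
    % so the number disappears from the TOC line but not from the body.
    \renewcommand\numberline[1]{}%
    \orig@dottedtocline{#1}{#2}{#3}{#4}{#5}%
  \endgroup}
\makeatother
```

Adding a test such as \ifnum #1>... around the \renewcommand\numberline line would restrict the effect to some sectioning levels, in the same spirit as the tocnopages counter above.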