Learning the GNU development tools

Edition 1

1998-07-29

Eleftherios Gkioulekas

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the sections entitled "Copying" and "Philosophical issues" are included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice and the sections entitled "Copying" and "Philosophical issues" may be stated in a translation approved by the Free Software Foundation instead of the original English.

Preface

The purpose of this document is to introduce you to the GNU build system, and show you how to use it to write good code. It is also meant to serve as a manual for Autotools, an additional package that provides a variety of additional features. Finally it discusses peripheral topics such as how to use GNU Emacs as a source code navigator and how to make heads and tails out of Texinfo. The intended reader should be a software developer who understands his programming languages, and wants to learn how to put together his programs the way a typical FSF program is put together.

When we speak of the GNU build system we refer primarily to the following three programs:

Autoconf produces a configuration shell script, named `configure', which probes the installer platform for portability related information which is required to customize makefiles, configuration header files, and other application specific files. Then it proceeds to generate customized versions of these files from generic templates. This way, the user will not need to customize these files manually.
Automake produces makefile templates, `Makefile.in' to be used by Autoconf, from a very high level specification stored in a file called `Makefile.am'. Automake produces makefiles that conform to the GNU makefile standards, taking away the extraordinary effort required to produce them by hand. Automake requires Autoconf in order to be used properly.
Libtool makes it possible to compile position indepedent code and build shared libraries in a portable manner. It does not require either Autoconf, or Automake and can be used indepedently. Automake however supports libtool and interoperates with it in a seamless manner.

The GNU build system has two goals. The first is to simplify the development of portable programs. The second is to simplify the building of programs that are distributed as source code. The first goal is achieved by the automatic generation of a `configure' shell script. The second goal is achieved by the automatic generation of Makefiles and other shell scripts that are typically used in the building process. This way the developer can concentrate on debugging his source code, instead of his overly complex Makefiles. And the installer can compile and install the program directly from the source code distribution by a simple and automatic procedure.

The GNU build system needs to be installed only when you are developing programs that are meant to be distributed. To build a program from distributed source code, you only need make, the compiler, a shell, and occasionally standard Unix utilities like sed, awk, yacc, lex.

Some tasks that are simplified by the GNU build system include:

Building multidirectory software packages. It is much more difficult to use raw make recursively. Having simplified this step, the developer is encouraged to organize his source code in a deep directory tree rather than lump everything under the same directory. Developers that use raw make often can't justify the inconvenience of recursive make and prefer to disorganize their source code. With the GNU tools this is no longer necessary.
Automatic configuration. You will never have to tell your users that they need to edit your Makefile. You yourself will not have to edit your Makefiles as you move new versions of your code back and forth between different machines.
Automatic makefile generation. Writing makefiles involves a lot of repetition, and in large projects it will get on your nerves. The GNU build system instead requires you to write `Makefile.am' files that are much more terse and easy to maintain.
Support for test suites. You can very easily write test suite code, and by adding one extra line in your `Makefile.am' make a check target available such that you can compile and run the entire test suite by running make check.
Automatic distribution building. The GNU build tools are meant to be used in the development of free software, therefore if you have a working build system in place for your programs, you can create a source code distribution out of it by running make distcheck.
Shared libraries. Building shared libraries becomes as easy as building static libraries.

The Autotools package complements the GNU build system by providing the following additional features:

Automatic generation of legal notices that are needed in order to apply the GNU GPL license.
Automatic generation of directory trees for new software packages, such that they conform to the GNITS standard (more or less).
Some rudimentary portability framework for C++ programs. There is a lot of room for improvement here, in the future. Also a framework for embedding text into your executable and handling include files accross multiple directories.
Support for writing portable software that uses both Fortran and C++.
Additional support for writing software documentation in Texinfo, but also in LaTeX.

Autotools is still under development and there may still be bugs. At the moment Autotools doesn't do shared libraries, but that will change in the future.

This effort began by my attempt to write a tutorial for Autoconf. It involved into "Learning Autoconf and Automake". Along the way I developed Autotools to deal with things that annoyed me or to cover needs from my own work. Ultimately I want this document to be both a unified introduction of the GNU build system as well as documentation for the Autotools package.

I believe that knowing these tools and having this know-how is very important, and should not be missed from engineering or science students who will one day go out and do software development for academic or industrial research. Many students are incredibly undertrained in software engineering and write a lot of bad code. This is very very sad because of all people, it is them that have the greatest need to write portable, robust and reliable code. I found from my own experience that moving away from Fortran and C, and towards C++ is the first step in writing better code. The second step is to use the sophisticated GNU build system and use it properly, as described in this document. Ultimately, I am hoping that this document will help people get over the learning curve of the second step, so they can be productive and ready to study the reference manuals that are distributed with all these tools.

This manual of course is still under construction. When I am done constructing it some paragraph somewhere will be inserted with the traditional run-down of summaries about each chapter. I write this manual in a highly non-linear way, so while it is under construction you will find that some parts are better-developed than others. If you wish to contribute sections of the manual that I haven't written or haven't yet developed fully, please contact me.

Chapters 1,2,3,4 are okey. Chapter 5 is okey to, but needs a little more work. I removed the other chapters to minimize confusion, but the sources for them are still being distributed as part of the Autotools package for those that found them useful. The other chapters need a lot of rewriting and they would do more harm than good at this point to the unsuspecting reader. Please contact me if you have any suggestions for improving this manual.

Acknowledgements

This document and the Autotools package have originally been written by Eleftherios Gkioulekas. Many people have further contributed to this effort, directly or indirectly, in various way. Here is a list of these people. Please help me keep it complete and exempt of errors.

The appendix Philosophical issues has been written by Richard Stallman. (see section Philosophical issues)
The chapter on Fortran, and the Autotools support for developing software that is partly written in Fortran is derived from the work of John Eaton on GNU Octave, which I mainly generalized for use in other programs. (see section Fortran with Autoconf).
Mark Galassi was the first person, to the best of my knowledge, who tried to write an Autoconf tutorial. It is thanks to his work that I was inspired to begin this work.

FIXME: I need to start keeping track of acknowledgements here

Copying

The following notice refers to the Autotools package with which this document is being distributed. The following notice refers to the Autotools package, which includes this documentation, as well as the source code for utilities like `acmkdir' and for additional Autoconf macros. The complete GNU build system involves other packages also, such as Autoconf, Automake, Libtool and a few other accessories. These packages are also free software, and you can obtain them from the Free Software Foundation. For details on doing so, please visit their web site http://www.fsf.org/. Although Autotools has been designed to work with the GNU build system, it is not yet an official part of the GNU project.

The Autotools package is "free"; this means that everyone is free to use it and free to redistribute it on a free basis. The Autotools package is not in the public domain; it is copyrighted and there are restrictions on its distribution, but these restrictions are designed to permit everything that a good cooperating citizen would want to do. What is not allowed is to try to prevent others from further sharing any version of this package that they might get from you.

Specifically, we want to make sure that you have the right to give away copies of the programs that relate to Autotools, that you receive source code or else can get it if you want it, that you can change these programs or use pieces of them in new free programs, and that you know you can do these things.

To make sure that everyone has such rights, we have to forbid you to deprive anyone else of these rights. For example, if you distribute copies of the Autotools-related code, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must tell them their rights.

Also, for our own protection, we must make certain that everyone finds out that there is no warranty for the programs that relate to Autotools. If these programs are modified by someone else and passed on, we want their recipients to know that what they have is not what we distributed, so that any problems introduced by others will not reflect on our reputation.

The precise conditions of the licenses for the programs currently being distributed that relate to Autotools are found in the General Public Licenses that accompany it.

Introduction to the GNU build system

Installing a GNU package

When you download an autoconfiguring package , it usually has a filename like: `foo-1.0.tar.gz' where the number is a version number. To install it, first you have to unpack the package to a directory someplace:

% gunzip foo-1.0.tar.gz
% tar xf foo-1.0.tar

Then you enter the directory and look for files like `README' or `INSTALL' that explain what you need to do. Almost always this amounts to typing the following commands:

% cd foo-1.0
% configure 
% make
% make check
% su
# make install

The `configure' command invokes a shell script that is distributed with the package that configures the package for you automatically. First it probes your system through a set of tests that allow it to determine things it needs to know, and then it uses this knowledge to generate automatically a `Makefile' from a template stored in a file called `Makefile.in'. When you invoke `make' with no argument, it executes the default target of the generated `Makefile'. That target will compile your source code, but will not install it. If your software comes with self-tests then you can compile and run them by typing `make check'. To install your software, you need to explicitly invoke `make' again with the target `install'. In order for `make' to work, you must make the directory where the `Makefile' is located the current directory.

During installation, the following files go to the following places:

Executables   -> /usr/local/bin
Libraries     -> /usr/local/lib
Header files  -> /usr/local/include
Man pages     -> /usr/local/man/man?
Info files    -> /usr/local/info

where `foo' is the name of the package. The `/usr/local' directory is called the prefix. The default prefix is always `/usr/local' but you can set it to anything you like when you call `configure'. For example, if you want to install the package to your home directory instead, you will have to do this instead:

% configure --prefix=/home/skeletor
% make
% make check
% make install

The `--prefix' argument tells `configure' where you want to install your package, and `configure' will take that into account and build the proper makefile automatically.

The `configure' script is compiled by `autoconf' from the contents of a file called `configure.in'. These files are very easy to maintain, and in this tutorial we will teach you how they work. The `Makefile.in' file is also compiled by `automake' from a very high-level specification stored in a file called `Makefile.am'. The developer then only needs to maintain `configure.in' and `Makefile.am'. As it turns out, these are so much easier to work with than Makefiles and so much more powerful, that you will find that you will not want to go back to Makefiles ever again once you get the hang of it.

In some packages, the `configure' script supports many more options than just `--prefix'. To find out about these options you should consult the file `INSTALL' and `README' that are traditionally distributed with the package, and also look at `configure''s self documenting facility:

% configure --help

Configure scripts can also report the version of Autoconf that generated them:

% configure --version

The makefiles generated by `automake' support a few more targets for undoing the installation process to various levels. More specifically:

If configure or make did it, make distclean undoes it.
If make did it, make clean undoes it.
If make install did it, make uninstall undoes it.

Also, in the spirit of free redistributable code, there are targets for cutting a source code distribution. If you type

% make dist

it will rebuild the `foo-1.0.tar.gz' file that you started with. If you modified the source, the modifications will be included in the distribution (and you should probably change the version number). Before putting a distribution up on FTP, you can test its integrity with:

% make distcheck

This makes the distribution, then unpacks it in a temporary subdirectory and tries to configure it, build it, run the test-suite, and check if the installation script works. If everything is okey then you're told that your distribution is ready.

Once you go through this tutorial, you'll have the know-how you need to develop autoconfiguring programs with such powerful Makefiles.

Installing the GNU build system

It is not unusual to be stuck on a system that does not have the GNU build tools installed. If you do have them installed, check to see whether you have the most recent versions. To do that type:

% autoconf --version
% automake --version
% libtool --version

If you don't have any of the above packages, you need to get a copy and install them on your computer. The distribution filenames for the GNU build tools are:

autoconf-2.12.tar.gz
automake-1.3.tar.gz
libtool-1.3.tar.gz

Before installing these packages however, you will need to install the following needed packages from the FSF:

make-3.76.1.tar.gz
m4-1.4.tar.gz
texinfo-3.9.tar.gz
tar-1.12.shar.gz

You will need the GNU versions of make, m4 and tar even if your system already has native versions of these utilities. To check whether you do have the GNU versions see whether they accept the --version flag. If you have proprietory versions of make or m4, rename them and then install the GNU ones. You will also need to install Perl, the GNU C compiler, and the TeX typesetter.

It is important to note that the end user will only need a decent shell and a working make to build a source code distribution. The developer however needs to gather all of these tools in order to create the distribution.

Finally, to install Autotools begin by installing the following additional utilities from FSF:

bash-2.01.tar.gz
sharutils-4.2.tar.gz

and then install

autotools-X.X.tar.gz

You should be able to obtain a copy of Autotools from the same site from which you received this document.

The installation process, for most of these tools is rather straightforward:

% ./configure
% make
% make check
% make install

Most of these tools include documentation which you can build with

% make dvi

Exceptions to the rule are Perl, the GNU C compiler and TeX which have a more complicated installation procedure. However, you are very likely to have these installed already.

The versions numbers indicated above were the current ones at the time of this writing. If more recent versions are available, you may want to use them instead.

Hello world example

To get your feet wet we will show you how to do the Hello world program using `autoconf' and `automake'. In the fine tradition of K&R, the C version of the hello world program is:

#include <stdio.h>
main()
{
 printf("Howdy world!\n");
}

Let's say we've put this in a file called `hello.c'. Please place this file under an empty directory since we will be producing a lot of clutter soon enough! It can be compiled and ran directly with the following commands:

% gcc hello.c -o hello
% hello

If you are on a non-GNU variant of Unix, your compiler might be called `cc' but the usage will be pretty much the same.

Now to do the same thing the `autoconf' and `automake' way create first the following files:

`Makefile.am'

bin_PROGRAMS = hello
hello_SOURCES = hello.c

`configure.in'

AC_INIT(hello.c)
AM_INIT_AUTOMAKE(hello,1.0)
AC_PROG_CC
AC_PROG_INSTALL
AC_OUTPUT(Makefile)

Now run `autoconf':

% aclocal
% autoconf

This will create the shell script `configure'. Next, run `automake':

% automake -a
required file "./install-sh" not found; installing
required file "./mkinstalldirs" not found; installing
required file "./missing" not found; installing
required file "./INSTALL" not found; installing
required file "./NEWS" not found
required file "./README" not found
required file "./COPYING" not found; installing
required file "./AUTHORS" not found
required file "./ChangeLog" not found

The first time you do this, you get a spew of messages. It says that `automake' installed a whole bunch of cryptic stuff: `install-sh', `mkinstalldirs' and `missing'. These are shell scripts that are needed by the makefiles that `automake' generates. You don't have to worry about what they do. It also complains that the following files are not around:

INSTALL, COPYING, NEWS, README, AUTHORS, ChangeLog

These files are required to be present by the GNU coding standards, and we will discuss them in detail later. Nevertheless, it is important that these files are at least touched, because when we try to make a test distribution by calling `make distcheck' later on, it will cause a fatal error if any of these files are missing. Eventually, we will suggest that you use the `acmkdir' utility to automatically generate templates for these files which you can edit at will. To make these files exist, now please type:

% touch NEWS README AUTHORS ChangeLog

and to make Automake aware of the existence of these files, please rerun it:

% automake -a

Only when Automake completes without error messages, you can assume that the generated `Makefile.in' might be correct.

Now you are "all set" in the sense that your package is in the state that will allow you, as well as the end-user to type:

% configure
% make
% hello

to compile and run the hello world program. The idea of course is that the end-user will get the package "all-set" and will not have to have a copy of `automake' and `autoconf' to get it compiled. This is the developer's responsibility. If you really want to install it, go ahead and do it:

# make install

Oops, you changed your mind! Then uninstall it:

# make uninstall

If you didn't use the `--prefix' argument to point to your home directory you may need to be superuser to invoke the install commands.

Please note that in order for the above to work you need to use the GNU `gcc' compiler. Automake dependends on `gcc''s ability to compute the dependencies, so without `gcc' this example will not work. If you do have `gcc' installed, then the `configure' script will select it for you.

If you feel like cutting a distribution, you can do it with:

% make distcheck

This will create a file called `hello-1.0.tar.gz' in the current working directory so that when unpacked it is "all-set" for the user to fire away `configure' and start building. While building that file, Automake includes the precomputed dependencies and disables the dependencies from the end-user makefiles. This way the end-user will not have to have `gcc' to compile the package.

Now pretend that you are the end-user, unpack this file, enter it and compile it all over again:

% gunzip hello-1.0.tar.gz
% tar xf hello-1.0.tar
% cd hello-1.0
% configure
% make 
% hello

And this is the full circle.

It is very important that when you run Automake the `configure' file already exists, otherwise Automake will not include it in the distribution when you do `make dist' and the target `distcheck' will fail to build. This means that you should run Autoconf before running Automake. To see this effect go back up to the toplevel directory and do the following:

% rm -f configure
% automake
% make distcheck

You will notice that the `distcheck' target fails. Before you ever cut a distribution and put it up on FTP, you should put content to the files

INSTALL, COPYING, NEWS, README, AUTHORS, ChangeLog

The file `COPYING' has to do with copyright issues, which we will discuss on a separate chapter. The other files are part of the software documentation. The GNU coding standards require that these files be present when you distribute your source code.

Maintaining the documentation files

In this section we give a summary overview of how you should maintain these files. For more details, please see the GNU coding standards, as published by the FSF.

The README file: Every distribution must contain this file. This is the file that the installer must read fully after unpacking the distribution and before configuring it. You should briefly explain the purpose of the distribution, and reference all other documentation available. Instructions for installing the package normally belong in the `INSTALL' file. However if you have something that you feel the installer should know then mention it in this file.
The INSTALL file: Because the GNU installation procedure is streamlined, a standard `INSTALL' file will be created for you automatically by Automake. If you have something very important to say, it may be best to say it in the `README' file instead. the `INSTALL' file is mostly for the benefit of people who've never installed a GNU package before. However, if your package is very unusual, you may decide that it is best to modify the standard INSTALL file or write your own.
The AUTHORS file: The purpose of this file is to record for copyright purposes who wrote what. The file must state the name of the owner, and it must also list the names of contributors and for each contributor the set of files they modified and the set of new files that they created. For example, typical contents of the `AUTHORS' file might look like:
```
Authors of FOO
See also the files THANKS and ChangeLog

Bart Simpson designed and implemented FOO
Principal Skinner:
  entire files  bob1.cc, bob2.cc, bob3.cc
  extensive changes in foo1.cc, foo2.cc, foo3.cc
```
The THANKS file: Here you list the names and email addresses of the people who contributed to the development of your package either through actual contributed work or through comments and suggestions. There is no legal need to maintain but it is good manners to credit and thank all who have helped you, besides those listed in `AUTHORS'. The wording of the `THANKS' file may look like this:
```
FOO THANKS file

FOO has originally been written by Your Name. Many people have further
contributed to FOO by reporting problems, suggesting various improvements,
or submitting actual code. Here is a list of these people. Help me keep
it complete and exempt of errors

Name1 <email address1>
Name2 <email address2>
....
```
A good habit is to use the `THANKS' file to record people's email addresses instead of having them in many places (like `AUTHORS', `ChangeLog'). This will make it easier to keep them updated.
The NEWS file: This is where you tell the users about the major features of this distribution. You needn't go into details, but you do need to list all the major features. The GNU coding standards explain in more detail how to structure this file.
The ChangeLog file: This file you are meant to maintain as you develop your code. It is meant to be like a diary where you write down in detail what changes you made to your code as you develop it. The GNU coding standards explain in a lot of detail how you should structure a `ChangeLog', so you should read about it there. The basic idea is to record modifications you make to code that already works. It is not necessary to detail how you developed a new piece of code, until you get it to work. It is important however to note down version releases. This way by viewing the `ChangeLog' file you can tell rather quickly the changes between versions. This file also records the names of the people making the modifications. This way if development is done by a team, and you want to address a bug in a certain part of the code, you can see who has been playing with it recently, get them on a mailing list and talk. The Gnu coding standards go into the details about how to do this. You can automate `ChangeLog' maintance with Emacs using the M-x add-change-log-entry-other-window. It may be easier to bind a key (for example f8) to this command by adding:
```
(global-set-key [f8] 'add-change-log-entry-other-window)
```
to your `.emacs' file. Then, after having made a modification and while the cursor is still at the place where you made the modification, press f8 and record your entry. Recently Emacs has decided to use the ISO 8601 standard for dates which is: YYYY-MM-DD (year-month-date). A typical `ChangeLog' entry looks like this:
```
1998-05-17  Eleftherios Gkioulekas  <lf@amath.washington.edu>

 * src/acmkdir.sh: Now acmkdir will put better default content
   to the files README, NEWS, AUTHORS, THANKS
```
Every entry contains all the changes you made within the period of a day. The most recent changes are listed at the top, the older changes slowly scroll to the bottom. If you are a vi user, please read my Emacs for Vi users document (which I still have not written). Emacs has excellent support for vi emulation, and it comes with a lot more helpful features than merely ChangeLog maintance, such as editing files over an FTP link, highlighting your language syntax with colors, and many others.
COPYING Here you list the terms of your distribution, in particular the GPL. This file will be generated for you automatically. You can also generate it with the gpl utility:
```
% gpl -l COPYING
```

Most of these files are easy to maintain. Later we will show you how to use `acmkdir' to create a new directory for a new distribution. The `acmkdir' utility will provide you with templates for all of these files from which you can begin editing.

Copyright and Free Software

Understanding Copyright

If you are just writing programs for your own internal use and you don't plan to redistribute them, you don't really need to worry too much about copyright. However, if you want to give your programs to other people then copyright issues become relavant. The main reason why `autoconf' and `automake' were developed was to facilitate the distribution of source code by making packages autoconfiguring. So, if you want to use these tools, you probably also want to know something about copyright issues. The following sections will focus primarily on the legal issues surrounding software. For a discussion of the philosophical issues please see section Philosophical issues. At this point, I should point out that I am not a lawyer, this is not legal advice, and I do not represent the opinions of the Free Software Foundation.

When you create a work, like a computer program, or a novel, and so on, you automatically have a set of legal rights called copyright. This means that you have the right to forbid others to use, modify and redistribute your work. By default no-one, except you the owner, is allowed to do any of these things. To relax these restrictions, you need to enter into an agreement with other people individually when they receive a copy from you. Such an agreement is called a License Agreement, which potentially entails rights and obligations to both you and them. It is very important that the License is written by a lawyer, and invoked from every file that is part of the work, in order for that file to fall under the terms the License. This can be done either by including the full text of the license or by including a legalese reference to the full text of the License. In the free software community, we standardize on using primarily the GNU General Public License, which we will discuss in the next section.

Copyright is transferable. This means that you have the right to transfer most your rights, that we call copyright, to another person or organization, with the exception of the moral right. The moral right is your right to say that you were the first owner of the work. This transfer is called copyright assignment. The moral right will force others to credit you, even if you must assign your copyright to them. When a work is being developed by a team, it makes legal sense to transfer the copyright to a single organization that can then coordinate enforcement of the copyright. In the free software community, some people assign their software to the Free Software Foundation. The arrangement is that copyright is transfered to the FSF. The FSF then grants you all the rights back in the form of a License Agreement, and commits itself legally to distribute the work only as free software. If you want to do this, you should contact the FSF for more information. It is not a good idea to assign your copyright to anyone else, unless you know very well that this is what you want to do.

The legal meaning of the word "use", as it refers to software, is peculiar, because software itself is very peculiar compared to all other forms of copyrighted work. For an executable program, "use" means to run it. But, for a library "use" refers to the act of linking it to your program. Copyright also covers derived work. If someone takes your code and modifies it, he is legally bound by the conditions under which you permitted him to do that. Similarly, if he links against a library that you wrote, then although he has a copyright to his code, he is bound by the license agreement that allowed him to do the linking, and he can only license his work in way that is also consistent with that agreement.

The concept of derived work is actually very slippery ground. Supposedly, what is copyrighted is not the algorithm but the implementation. What this means is that if you take someone's code, fire up an editor and modify it, then the resulting code is derived work. If you take someone's code understand the idea behind the implementation and reimplement the idea, then it is not derived work, even if the two end-results are remarkably similar, which they will be if the idea is very simple. So the property of a work being derived is not an inherent property of the work itself, but of the process with which you created the work. In practical terms, it's derived work if a judge says so in court.

Because copyright law is by default restrictive, you must explicitly grant permissions to your users to enable them to use your work. You do this, when you grant them a License Agreement. Even though the user never signs the agreement, nothing else grants the user any rights, so merely by using the program, the user is bound by the agreement. With some proprietary software, you are bound by the agreement the minute you break the seal in the packaging to unpack the box that contains the media with your software.

In addition to copyright law, there is another legal beast: the patent law. Unlike copyright, which you own automatically by the act of creating the work, you don't get a patent unless you file an application for it. If approved, the work is published but others must pay you royalties in order to use it in any way.

The problem with patents is that they cover algorithms, and if an algorithm is patented you can't write an implementation for it without a license. What makes it worse is that it is very difficult and expensive to find out whether the algorithms that you use are patented or will be patented in the future. What makes it insane is that the patent office, in its infinite stupidity, has patented algorithms that are very trivial with nothing innovative about them. For example, the use of backing store in a multiprocesing window system, like X11, is covered by patent 4,555,775. In the spring of 1991, the owner of the patent, AT&T, threatened to sue every member of the X Consortium including MIT. Backing store is the idea that the windowing system save the contents of all windows at all times. This way, when a window is covered by another window and then exposed again, it is redrawn by the windowing system, and not the code responsible for the application. Other insane patents include the IBM patent 4,674,040 which covers "cut and paste between files" in a text editor. Recently, a Microsoft backed company called "Wang" took Netscape to court over a patent that covered "bookmarks"! Wang lost.

Although most of these patents don't stand a chance in court, the cost of litigation is sufficient to terrorize small bussinesses, non-profit organizations like the Free Software Foundation, as well as individual software developers. For this reason, companies are all too eager to patent whatever they can get away with patenting to protect themselves from being sued by others, further complicating this problem. In practice, you will not be sued unless your code threatens the interests of a big corporation, or if a big corporation's lawyers get too much time on their hands.

Both copyright and patent laws are being used mainly to destroy our freedom. By freedom we refer to three things: the freedom to use software, the freedom to modify it and improve it, and the freedom to redistribute it with the modifications and improvements so that the whole community benefits. When you purchase commercial software, you are not really purchasing the software but a license that gives you limitted rights to the software. You never ever get source code and you are most definitely not granted rights to redistribute it. Finally your rights to use are also restricted in many ways. There are licenses that require that you use the software on only one computer screen. Other licenses allow only a limitted number of users to use the software at the same time. And to top this, some licenses expire after a year and you have to renew them, and other licenses even have illegal terms such as granting right to use on the condition that you do not compete with the company that produced it!

The opposite to this is free software. We must emphasize that by free we mean freedom, and not price. For example, although Internet Explorer is distributed for free, it is not free software, because you don't get the source code and permission to modify. The price is not as important as the freedom. You may have to pay money to obtain free software. In fact it is ok to sell free software and use some of the funds raised to develop more free software. Obscene pricing and rent-like licensing that requires you to pay thousands of dollars per year is only a consequence of not having freedom. But freedom is about more than just that. It is about being free from other people controlling our computer lifes. With free software, the only restrictions that you operate under are the technical limits of the software itself. With non-free software, you are imposed additional legal restrictions, and these restrictions reduce essentially to other people controlling your life. You find suddenly that you can't grab a copy of your software to install to your laptop. You find that you can't share copies with your students for classroom use. You find that you can not modify it to suit your needs, or maintain it when the owner goes out of bussiness. You find that you can not verify that the software does not contain Trojan horses. Software freedom refers to breaking these walls. Because making software free, increases the usefulness of your work, you are encouraged to free your software.

Freeing your software

The prefered way to free your software is to distribute it under the terms of the GNU General Public License, also known as the "GPL". To best understand the GPL, you simply have to sit down and read the original document very carefully. In broad strokes, the license does the following:

Grants you unrestricted permission to run the software and maintain as many copies as you like. You can also make personal custom modifications and use them freely. The GNU GPL does not restrict what you do in the privacy of your own computer in any way. On the contrary it waives all the restrictions that the copyright law applies to private use. It only kicks in when you distribute the software you have received to other people.
Grants permission to redistribute the software or modified versions (derived work) provided that you do so only under the terms of the GNU public license. This means that if you make any changes, they must also be free. This protects the freedom of the package by forbidding someone to make a free version proprietary. This measure is called copyleft because it subverts the copyright law to do the opposite of what copyright is normally used for (i.e. to prevent a user to make a proprietary package free). The GNU GPL does not oblidge you to distribute your modifications. It only requires that you distribute them freely, if you choose to distribute them at all. The decision of whether or not to distribute your derived work is entirely up to you.
Disclaims all warranty to protect the authors from lawsuits, including various implied warranties such as the warranty for merchantability and fitness for a particular purpose.
Protects the software's freedom from patents as much as this is possible. The scenario that we want to avoid is someone receiving the source code, and then filing a patent application for the algorithms it uses, effectively making the software proprietary. If you do that, the GPL will forbid you from redistributing the software as well as derived work. So you will not want to do it. The patent law itself will prevent someone else, who distributes a competing product, to patent the algorithm if your implementation predates his.

The purpose of the GPL is to use the copyright law to encourage a world in which software is not copyrighted. If copyright didn't cover software, then we would all be free to use, modify and redistribute software, and we would not be able to restrict others from enjoying these freedoms because there would be no law giving anyone such power. One way to grant the freedoms to the users of your software is to revoke your copyright on the software completely. This is called putting your work in the public domain. The problem with this is that it only grants the freedoms. It does not create the reality in which no-one can take these freedoms away from derived works. In fact the copyright law covers by default derived works regardless of whether the original was public domain or copyrighted. By distributing your work under the GPL, you grant the same freedoms, and at the same time you protect these freedoms from hoarders.

The philosophy behind the GPL is that software should not be copyrighted. Copyright is more appropriate in artistic expression, because the purpose of art is to express an idea in an "artistic" manner. The copyright law itself makes a distinction between idea and expression and covers only the expression, not the idea. If any number of writers are asked to write a story based on a given story idea, they will end up writing completely different stories. Even when different writers enliven a historical event, there is still plenty of room to write about it from many different angles. As a result, what promotes the growth of the arts is not the sharing of the expression but the sharing of ideas, and copyright does promote the growth of the arts by protecting the expression but not covering the ideas.

This seperation of idea and expression breaks down in the field of software. The established legal thinking is that algorithms are ideas and software is expression. The mathematical way of thinking about algorithms however points out that algorithms must be precisely defined, and that can only be done if the algorithms are completely "expressed" in terms of a notational system. In mathematics, equations are such a notational system that can represent a certain set of algorithms. Software is just a different notational system, which can represent all the Turing-computable algorithms. Unlike art, the point of the exercise is to represent the algorithm, not to write a poem about it. As modern research begins to move away from inventing equations, and towards inventing algorithms, it becomes increasingly important that we be free to use, modify and redistribute algorithms, in their software representation, in the same way as in mathematical equations.

The GNU GPL is a legal instrument that has been designed to create a safe haven in which software can be written free from copyright law encumberence. It creates a notion of public good which is similar to public domain with the difference that if you derive your work from public good, the result must also be a public good.

To apply the GPL to your programs you need to do the following things:

Attach a copy of the GNU public license to the toplevel directory of your source code in a file called `COPYING'.
Include a legal notice to every file that you want covered by the GPL, saying that it is covered by the GPL. It is important that all files that constitute source code must include this notice, including `Makefile.am', `configure.in' files and shell scripts. The legal notice should look like this:
```
// Copyright (C) (year) (Your Name) <your@email.address>
// 
// This program is free software; you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation; either version 2 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.
// 
// You should have received a copy of the GNU General Public License
// along with this program; if not, write to the Free Software
// Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
```
If you have assigned your copyright to an organization, like the Free Software Foundation, then you should probably fashion your copyright notice like this:
```
// Copyright (C) (year) Free Software Foundation
// (your name) <your@email.address> (initial year)
// etc...
```
This legal notice works like a subroutine. By invoking it, you invoke the full text of the GNU General Public License which is too lengthy to include in every source file. Where you see `(year)' you need to list all the years in which you finished preparing a version that was actually released, and which was an ancestor to the current version. This list is not the list of years in which versions were released. It is a list of years in which versions, later released, were completed. If you finish a version on Dec 31, 1997 and release it on Jan 1, 1998, you need to include 1997, but you do not need to include 1998. This rule is complicated, but it is dictated by international copyright law.
If your file is part of a library, and it invokes the GPL, then this causes the entire library to be under the GPL. When you link a GPLed library to produce an executable, then the executable is also under the GPL. This prevents people from distributing proprietary software that uses your library. In general, this is a good thing because it promotes the development of more free software. In some cases however, if your library aspires to implement a standard which you want to advocate, you may want to allow proprietary developers to use your library, while at the same time protecting the freedom of the library. The main reason for wanting to do this is not to give proprietary developers a freebie, but to benefit free software by allowing proprietary developers to make their software interopable with free software. One way to do this is to invoke the GNU GPL and then add the following additional legal wording:
```
// As a special exception, permission is granted for additional uses of
// the text contained in its release of LIB.
// 
// The exception is that, if you link the LIB library with other files
// to produce an executable, this does not by itself cause the
// resulting executable to be covered by the GNU General Public License.
// Your use of that executable is in no way restricted on account of
// linking the LIB library code into it.
// 
// This exception does not however invalidate any other reasons why
// the executable file might be covered by the GNU General Public License.
// 
// This exception applies only to the code released under the 
// name LIB.  If you copy code from other releases into a copy of
// LIB, as the General Public License permits, the exception does
// not apply to the code that you add in this way.  To avoid misleading
// anyone as to the status of such modified files, you must delete
// this exception notice from them.
// 
// If you write modifications of your own for LIB, it is your choice
// whether to permit this exception to apply to your modifications.
// If you do not wish that, delete this exception notice.  
```
Make sure to substitute "LIB" with the name of your library. This wording is used by the GNU project in the GUILE library. Similar terms are also used for the GNU C++ Standard Library to allow proprietary developers to use the GNU compiler. It is important to understand that you can not take a GPLed file, and slap this additional wording to it, without the original authors permission. You can however, at your option, use this wording in files that are your own original work. Also, if a file already invokes such wording, you can at your option retain or discard the wording in derived versions. Finally, if the library is linking any files that do not invoke this wording, then the permissions do not apply to the library as a whole, and if you link it to an executable, then you can not distribute that executable under a proprietary license. The individual files however, that do contain this notice, can be used to form a library that you can link into a proprietary executable, if that is technically possible.

For files hat are not really source code, like Makefiles and Autoconf macros, you should use this alternative wording instead:

// This file is free software; as a special exception the author gives
// unlimited permission to copy and/or distribute it, with or without 
// modifications, as long as this notice is preserved.
// 
// This program is distributed in the hope that it will be useful, but
// WITHOUT ANY WARRANTY, to the extent permitted by law; without even the
// implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Note that this notice is very liberal. Please use it mainly for cases where you'd find it absurd to claim copyright. Such cases are mainly cases where it is rather hard to distinguish derived work from original work because it's all so similar.

Use the `AUTHORS' file to keep records of who wrote what. These records will be important if you decide at some point to register your copyright. They are also important if you plan to assign your work to the FSF, because they will want to register the copyright. We discuss this in more detail later.
If you modify someone else's GPL covered file make sure to comply with section 2 of the GPL. To do that place notices stating that you changed the file and the date of the change. Also your program should advertise the fact that it is free software, that there is no warranty and that it can be redistributed under the condition of the GPL. A standard way of doing this is to make your program output this notice when it is passed the --version command-line flag. For details please read the GPL. Also see the GNITS conding standards for suggestions on the output of --version and --help, keeping in mind that these are simply suggestions.

You may feel that all this legal crap is stupid, and you just want to write code and get your work done. Every true hacker feels the same way and resents the fact that to code nowadays we have to have this sort of legal education. However unless you apply some license to your program, you are not granting anyone any permissions whatsoever. The `gpl' utility has been written to help you bother maintain the legalese. Many people write free software with ambiguous copyright terms, making it unusable to people who want to be precise about using the GNU GPL.

Invoking the `gpl' utility

Maintaining these legalese notices can be quite painful after some time. To ease the burden, Autotools distributes a utility called `gpl'. This utility will conveniently generate for you all the legal wording you will ever want to use. It is important to know that this application is not approved in any way by the Free Software Foundation. By this I mean that I haven't asked their opinion of it yet.

To create the file `COPYING' type:

% gpl -l COPYING

If you want to include a copy of the GPL in your documentation, you can generate a copy in texinfo format like this:

% gpl -lt gpl.texi

Also, every time you want to create a new file, use the `gpl' to generate the copyright notice. If you want it covered by the GPL use the standard notice. If you want to invoke the Guile-like permissions, then also use the library notice. If you want to grant unlimited permissions, meaning no copyleft, use the special notice. The `gpl' utility takes many different flags to take into account the different commenting conventions.

For a C file, create the standard notice with
```
% gpl -c file.c
```
the library notice with
```
% gpl -cL file.c
```
and the special notice with
```
% gpl -cS file.c
```
For a C++ file, create the standard notice with
```
% gpl -cc file.cc
```
the library notice with
```
% gpl -ccL file.cc
```
and the special notice with
```
% gpl -ccS file.cc
```
For a shells script (bash, perl) that uses hash marks for commenting, create the standard notice with
```
% gpl -sh foo.pl
```
the library notice with
```
% gpl -shL foo.tcl
```
and the special notice with
```
% gpl -shS foo.pl
```
It does not make sense to use the library notice, if no executable is being formed from this file. If however, you parse that file into C code that is then compiled into object code, then you may consider using the library notice on it instead of the special notice. One of the features provided by Autotools allows you to embed text, such as Tcl scripts, into the executable. In that case, you can use the library notice to license the original text.
For files that define autoconf macros:
```
% gpl -m4 file.m4
```
In general, we exempt autoconf macro files from the GNU GPL because the terms of autoconf also exclude its output, the `configure' script, from the GPL.
For `Makefile.am', or files that describe targets:
```
% gpl -am Makefile.am
```
For these we also exempt them from the GPL because they are so trivial that it makes no sense to add copyleft protection.

Inserting notices with Emacs

If you are using GNU Emacs, then you can insert these copyright notices on-demand while you're editing your source code. Autotools bundles two Emacs packages: gpl and gpl-copying which provide you with equivalents of the `gpl' command that can be run under Emacs. These packages will be byte-compiled and installed automatically for you while installing Autotools.

To use these packages, in your `.emacs' you must declare your identity by adding the following commands:

(setq user-mail-address "me@here.com")
(setq user-full-name "My Name")

Then you must require the packages to be loaded:

(require 'gpl)
(require 'gpl-copying)

These packages introduce a set of Emacs commands all of which are prefixed as gpl-. To invoke any of these commands press M-x, type the name of the command and press enter.

The following commands will generate notices for your source code:

`gpl-c': Insert the standard GPL copyright notice using C commenting.
`gpl-cL': lnsert the standard GPL copyright notice using C commenting, followed by a Guile-like library exception. This notice is used by the Guile library. You may want to use it for libraries that you write that implement some type of a standard that you wish to encourage. You will be prompted for the name of your package.
`gpl-cc': Insert the standard GPL copyright notice using C++ commenting.
`gpl-ccL': Insert the standard GPL copyright notice using C++ commenting, followed by a Guile-like library exception. You will be prompted for the name of your package
`gpl-sh': Insert the standard GPL copyright notice using shell commenting (i.e. has marks).
`gpl-shL': Insert the standard GPL copyright notice using shell commenting, followed by a Guile-like library exception. This can be useful for source files, like Tcl files, which are executable code that gets linked in to form an executable, and which use hash marks for commenting.
`gpl-shS': Insert the standard GPL notice using shell commenting, followed by the special Autoconf exception. This is useful for small shell scripts that are distributed as part of a build system.
`gpl-m4': Insert the standard GPL copyright notice using m4 commenting (i.e. dnl) and the special Autoconf exception. This is the prefered notice for new Autoconf macros.
`gpl-el': Insert the standard GPL copyright notice using Elisp commenting. This is useful for writing Emacs extension files in Elisp.

The following commands will generate notices for your source code:

`gpl-insert-copying-texinfo': Insert a set of paragraphs very similar to the ones appearing at the Copying section of this manual. It is a good idea to include this notice in an unnumbered chapter titled "Copying" in the Texinfo documentation of your source code. You will be prompted for the title of your package. That title will substitute the word Autotools as it appears in the corresponding section in this manual.
`gpl-insert-license-texinfo': Insert the full text of the GNU General Public License in Texinfo format. If your documentation is very extensive, it may be a good idea to include this notice either at the very beginning of your manual, or at the end. You should include the full license, if you plan to distribute the manual separately from the package as a printed book.

Compiling with Makefiles

Direct compilation

We begin at the beginning. If you recall, we showed to you that the hello world program can be compiled very simply with the following command:

% gcc hello.c -o hello

Even in this simple case you have quite a few options:

The `-g' flag causes the compiler to output debugging information to the executable. This way, you can step your program through a debugger if it crashes.
The `-O', `-O2', `-O3' flags activate optimization. Some compilers can go as far as `-O6'. These numbers are called optimization levels. When you compile your program with optimization enabled, the compiler applies certain algorithms to the machine code output to make it go faster. The cost is that your program compiles much more slowly and that although you can step it through a debugger if you used the `-g' flag, things will be a little strange. During development the programmer usually uses no optimization, and only activates it when he is about to run the program for a production run. A good advice: always test your code with optimization activated as well. If optimization breaks your code, then this is telling you that you have a memory bug. Good luck finding it.
The `-Wall' flag tells the compiler to issue warnings when it sees bad programming style. Some of these warning catch actual bugs, but occasionally some of the warnings complain about something correct that you did on purpose. For this reason you control whether you want to see these warnings or not with this flag.

Here are some variations of the above example:

% gcc -g -O3 hello.c hello
% gcc -g -Wall hello.c -o hello
% gcc -g -Wall -O3 hello.c -o hello

Compilers have many more flags like that, and some of these flags are compiler dependent.

Now let's consider the case where you have a much larger program. made of source files `foo1.c', `foo2.c', `foo3.c' and header files `header1.h' and `header2.h'. One way to compile the program is like this:

% gcc foo1.c foo2.c foo3.c -o foo

This is fine when you have only a few files to deal with. Eventually when you have more than a hundred files, this is very slow and inefficient, because everytime you change one of the `foo' files, all of them have to be recompiled. In large projects this can very well take a quite a few minutes, and in very large projects hours. The solution is to compile each part seperately and put them all together at the end, like this:

% gcc -c foo1.c
% gcc -c foo2.c
% gcc -c foo3.c
% gcc foo1.o foo2.o foo3.o -o foo

The first three lines compile the three parts seperately and generate output in the files `foo1.o', `foo2.o', `foo3.o'. The fourth line puts it all back together. This way if you make a change only in `foo1.o' you just do:

% gcc -c foo1.c
% gcc foo1.o foo2.o foo3.o -o foo

This feature of the compiler offers a way out, but it's hardly a solution.

Writing out these commands everytime becomes annoying very soon.
When it is a header file that you modified, you need to figure out which source files included it and rebuild all of them for your change to take effect. If you forget to rebuild just one of the source files, this bug fix you just made may not work, and you will be all confused.
Once you are done compiling you have all these `.o' files sitting around and you may want to have a safe way of getting rid of them. Typing
```
rm -f *.o
```
is dangerous because you may misspell `o' for `c' or you may do this:
```
rm -f * .o
```
and become depressed.

The `make' utility was written to address these problems.

Enter Makefiles

The `make' utility takes its instructions from a file called `Makefile' in the directory in which it was invoked. The `Makefile' involves four concepts: the target, the dependencies, the rules, and the source. Before we illustrate these concepts with examples we will explain them in abstract terms for those who are mathematically minded:

Source are the files that you wrote and which are present when the distribution is unpacked. A source file has no dependencies, but itself it may be a dependency to a target.
Target is a file that you wish to generate from the source, or an action that you wish to cause (called phony target since no file by the name of the target is generated). For example when you type `make install' you do not generate a file called `install'. What you generate is the action of moving certain files in certain places. Generating the file or taking the action is called building the target. A target may have dependencies and it may itself be a dependency to another target.
A Dependency can be either a source file or a target. If it is a source file, then we say that, as a dependency to a certain target, it has changed if and only if that target's latest update happened before the corresponding source file's latest update. If it is a target, then we define recursively that, as a dependency to a certain target, it has changed if and only if it has a dependency of its own that has changed. Okey, take a big breath and read this again a few times out loud.
Rules is a set of instructions that detail how to build a target if the dependencies of that target have changed. These instructions need to be issued if and only if at least one of the dependencies has changed, and they must be issued only after the corresponding rules for the dependencies that have changed are recursively invoked. You may want to take a big breath here as well. Note that this recursion will be terminated only when it hits source files because they don't have any dependencies. Then it will wrap itself back up and work from bottom to top all the way to building the target at the top of the recursion.

The `Makefile' is essentially a collection of logical statements about these four concepts. The content of each statement in English is:

To build this target, first make sure that these dependencies are up to date. If not build them first in the order in which they are listed. Then execute these rules to build this target.

Given a complete collection of such statements it is possible to infer what action needs to be taken to build a specific target, from the source files and the current state of the distribution. By action we mean passing commands to the shell. One reason why this is useful is because if part of the building process does not need to be repeated, it will not be repeated. The `make' program will detect that certain dependencies have not changed and skip the action required for rebuilding their targets. Another reason why this approach is useful is because it is intuitive in human terms. At least, it will be intuitive when we illustrate it to you.

In make-speak each statement has the following form:

target: dependency1 dependency2 ....
       shell-command-1
       shell-command-2
       shell-command-3

where target is the name of the target and dependency* the name of the dependencies, which can be either source files or other targets. The shell commands that follow are the commands that need to be passed to the shell to build the target after the dependencies have been built. To be compatible with most versions of make, you must seperate these statements with a blank line. Also, the shell-command* must be indented with the tab key. Don't forget your tab keys otherwise make will not work.

When you run make you can pass the target that you want to build as an argument. If you omit arguments and call make by itself then the first target mentioned in the Makefile is the one that gets built. The makefiles that Automake generates have the phony target all be the default target. That target will compile your code but not install it. They also provide a few more phony targets such as install, check, dist, distcheck, clean, distclean as we have discussed earlier. So Automake is saving you quite a lot of work because without it you would have to write a lot of repetitive code to provide all these phony targets.

To illustrate these concepts with an example suppose that you have this situation:

Four source files: `foo1.c', `foo2.c', `foo3.c',`foo4.c'
Three include files: `gleep1.h',`gleep2.h',`gleep3.h'
`foo1.c' includes `gleep2.h' and `gleep3.h'
`foo2.c' includes `gleep1.h'
`foo3.c' includes `gleep1.h' and `gleep2.h'
`foo4.c' includes `gleep3.h'

To build an executable `foo' you need to build object files and then link them together. We say that the executable depends on the object files and that each object file depends on a corresponding `*.c' file and the `*.h' files that it includes. Then to get to an executable `foo' you need to go through the following dependencies:

foo: foo1.o foo2.o foo3.o foo4.o
foo1.o: foo1.c gleep2.h gleep3.h
foo2.o: foo2.c gleep1.h
foo3.o: foo3.c gleep1.h gleep2.h
foo4.o: foo4.c gleep3.h

The thing on the left-hand-side is the target, the thing on the right-hand-side is the dependencies. The logic is that to build the thing on the left, you need to build the things on the right first. So, if `foo1.c' changes, `foo1.o' must be rebuilt. If `gleep3.h' changes then `foo1.o' and `foo4.o' must be rebuilt. That's the game.

The way the `Makefile' actually looks like is like this:

foo: foo1.o foo2.o foo3.o foo4.o
        gcc foo1.o foo2.o foo3.o foo4.o -o foo
 
foo1.o: foo1.c gleep2.h gleep3.h
        gcc -c foo1.c

foo2.o: foo2.c gleep1.h
        gcc -c foo2.c

foo3.o: foo3.c gleep1.h gleep2.h
        gcc -c foo3.c

foo4.o: foo4.c gleep3.h
        gcc -c foo4.c

It's the same thing as before except that we have supplemented the rules by which the target is built from the dependencies. Things to note about syntax:

The rules must be indented with a tab. To get a tab you must press the TAB key on your keyboard.
Each statement must be separated from the next statement with a blank line.
The first target is what gets built when you type simply
```
% make
```
Therefore, the target for the executable must go at the beginning.

If you omit the tabs or the blank line, then the Makefile will not work. Some versions of `make' have relaxed the blank line rule, since it's redundant, but to be portable, just put the damn blank line in.

You may ask, "how does `make' know what I changed?". It knows because UNIX keeps track of the exact date and time in which every file and directory was modified. This is called the Unix time-stamp. What happens then is that `make' checks whether any of the dependencies is newer than the main target. If so, then the target must be rebuilt. Cool. Now do the target's dependencies have to be rebuilt? Let's look at their dependencies and find out! In this recursive fashion, the logic is untangled and `make' does the Right Thing.

The `touch' command allows you to fake time-stamps and make a file look as if it has been just modified. This way you can force make to rebuild everything by saying something like:

% touch *.c *.h

If you are building more than one executable, then you may want to make a phony target all be the first target:

all: foo1 foo2 foo3

Then calling make will attempt to build all and that will cause make to loop over `foo1', `foo2', `foo3' and get them built. Of course you can also tell make to build these individually by typing:

% make foo1
% make foo2
% make foo3

Anything that is a target can be an argument. You might even say

% make bar.o

if all you want is to build a certain object file and then stop.

Problems with Makefiles and workarounds

The main problem with maintaining Makefiles, in fact what we mean when we complain about maintaining Makefiles, is keeping track of the dependencies. The `make' utility will do its job if you tell it what the dependencies are, but it won't figure them out for you. There's a good reason for this of course, and herein lies the wisdom of Unix. To figure out the dependencies, you need to know something about the syntax of the files that you are working with!. And syntax is the turf of the compiler, and not `make'. The GNU compiler honors this responsibility and if you type:

% gcc -MM foo1.c
% gcc -MM foo2.c
% gcc -MM foo3.c
% gcc -MM foo4.c

it will compute the dependencies and put them out in standard output. Even so, it is clear that something else is needed to take advantage of this feature, if available, to generate a correct `Makefile' automatically. This is the main problem for which the only work-around is to use another tool that generates Makefiles.

The other big problem comes about with situations in which a software project spans many subdirectories. Each subdirectory needs to have a Makefile, and every Makefile must have a way to make sure that `make' gets called recursively to handle the subdirectories. This can be done, but it is quite cumbersome and annoying. Some programmers may choose to do without the advantages of a well-organized directory tree for this reason.

There are a few other little problems, but they have for most part solutions within the realm of the `make' utility. One such problem is that if you move to a system where the compiler is called `cc' instead of `gcc' you need to edit the Makefile everywhere. Here's a solution:

CC = gcc 

#CFLAGS = -Wall -g -O3
CFLAGS = -Wall -g

foo: foo1.o foo2.o foo3.o foo4.o
        $(CC) $(CFLAGS) foo1.o foo2.o foo3.o foo4.o -o foo

foo1.o: foo1.c gleep2.h gleep3.h
        $(CC) $(CFLAGS) -c foo1.c

foo2.o: foo2.c gleep1.h
        $(CC) $(CFLAGS) -c foo2.c

foo3.o: foo3.c gleep1.h gleep2.h
        $(CC) $(CFLAGS) -c foo3.c

foo4.o: foo4.c gleep3.h
        $(CC) $(CFLAGS) -c foo4.c

Now the user just has to modify the first line where he defines the macro-variable `CC', and whatever he puts there gets substituted in the rules bellow. The other macro variable, `CFLAGS' can be used to turn optimization on and off. Putting a `#' mark in the beginning of a line, makes the line a comment, and the line is ignored.

Another problem is that there is a lot of redundancy in this makefile. Every object file is built from the source file the same way. Clearly there should be a way to take advantage of that right? Here it is:

CC = gcc 
CFLAGS = -Wall -g

.SUFFIXES: .c .o 

.c.o:
        $(CC) $(CFLAGS) -c $<

.o:
        $(CC) $(CFLAGS) $< -o $@

foo: foo1.o foo2.o foo3.o foo4.o
foo1.o: foo1.c gleep2.h gleep3.h
foo2.o: foo2.c gleep1.h
foo3.o: foo3.c gleep1.h gleep2.h
foo4.o: foo4.c gleep3.h

Now this is more abstract, and has some cool punctuation. The `SUFFIXES' thing tells `make' that files that are possible targets, fall under three categories: files that end in `.c', files that end in `.o' and files that end in nothing. Now let's look at the next line:

.c.o:
        $(CC) $(CFLAGS) -c $<

This line is an abstract rule that tells `make' how to make `.o' files from `.c' files. The punctuation marks have the following meanings:

`$<': are the dependencies that changed causing the target to need to be rebuilt
`$@': is the target
`$^': are all the dependencies for the current rule

In the same spirit, the next rule tells how to make the executable file from the `.o' files.

.o:
        $(CC) $(CFLAGS) $< -o $@

All that has to follow the abstract rules is the dependencies, without the specific rules! If you are using `gcc' these dependencies can be generated automatically and then you can include them from your Makefile. Unfortunately this approach doesn't work with all of the other compilers. And there is no standard way to include another file into Makefile source. (1) Of course, what we will point out eventually is that `automake' can take care of the dependencies for you.

The Makefile in our example can be enhanced in the following way:

CC = gcc
CFLAGS = -Wall -g
OBJECTS = foo1.o foo2.o foo3.o foo4.o
PREFIX = /usr/local

.SUFFIXES: .c .o

.c.o:
        $(CC) $(CFLAGS) -c $<

.o:
        $(CC) $(CFLAGS) $< -o $@

foo: $(OBJECTS)
foo1.o: foo1.c gleep2.h gleep3.h
foo2.o: foo2.c gleep1.h
foo3.o: foo3.c gleep1.h gleep2.h
foo4.o: foo4.c gleep3.h

clean:
        rm -f $(OBJECTS)

distclean:
        rm -f $(OBJECTS) foo

install:
        rm -f $(PREFIX)/bin/foo
        cp foo $(PREFIX)/bin/foo

We've added three fake targets called `clean' and `distclean', `install' and introduced a few more macro-variables to control redundancy. I am sure some bells are ringing now. When you type:

% make

the first target (which is `foo') gets build, and your program compiles. When you type

% make install

since there is no file called `install' anywhere, the rule there is executed which has the effect of copying the executable over at `/usr/local/bin'. To get rid of the object files,

% make clean

and to get rid of the executable as well

% make distclean

Such fake targets are called phony targets in makefile parlance. As you can see, the `make' utility is quite powerful and there's a lot it can do. If you want to become a `make' wizard, all you need to do is read the GNU Make Manual and waste a lot of time spiffying up your makefiles, instead of getting your programs debugged, The GNU Make manual is extremely well written, and will make for enjoyable reading. It is also free, unlike "published" books.

The reason we went to the trouble to explain `make' is because it is important to understand what happens behind the hood, and because in many cases, `make' is a fine thing to use. It works for simple programs. And it works for many other things such as formatting TeX documents and so on.

As we evolve to more and more complicated projects, there's two things that we need. A more high-level way of specifying what you want to build, and a way of automatically determining the values that you want to put to things like CFLAGS, PREFIX and so on. The first thing is what `automake' does, the second thing is what `autoconf' does.

Building libraries

There's one last thing that we need to mention before moving on, and that's libraries. As you recall, to put together an executable, we make a whole bunch of `.o' files and then put them all together. It just so happens in many cases that a set of `.o' files together forms a cohesive unit that can be reused in many applications, and you'd like to use them in other programs. To make things simpler, what you do is put the `.o' files together and make a library.

A library is usually composed of many `.c' files and hopefully only one or at most two `.h' files. It's a good practice to minimize the use of header files and put all your gunk in one header file, because this way the user of your library won't have to be typing an endless stream of `#include' directives for every `.c' file he writes that depends on the library. Be considerate. The user might be you! Header files fall under two categories: public and private. The public header files must be installed at `/prefix/include' whereas the private ones are only meant to be used internally. The public header files export documented library features to the user. The private header files export undocumented library features that are to be used only by the developer of the library and only for the purpose of developing the library.

Suppose that we have a library called `barf' that's made of the following files:

`barf.h', `barf1.c', `barf2.c', `barf3.c'

In real life, the names should be more meaningful than that, but we're being general here. To build it, you first make the `.o' files:

% gcc -c barf1.c
% gcc -c barf2.c
% gcc -c barf3.c

and then you do this magic:

% rm -f libbarf.a
% ar cru libbarf.a barf1.o barf2.o barf3.o

This will create a file libbarf.a from the object files `barf1.o', `barf2.o', `barf3.p'. On most Unix systems, the library won't work unless it's "blessed" by a program called `ranlib':

% ranlib libbarf.a

On other Unix systems, you might find that `ranlib' doesn't even exist because it's not needed.

The reason for this is historical. Originally ar was meant to be used merely for packaging files together. The more well known program tar is a descendent of ar that was designed to handle making such archives on a tape device. Now that tape devices are more or less obsolete, tar is playing the role that was originally meant for ar. As for ar, way back, some people thought to use it to package *.o files. However the linker wanted a symbol table to be passed along with the archive for the convenience of the people writing the code for the linker. Perhaps also for efficiency. So the ranlib program was written to generate that table and add it to the *.a file. Then some Unix vendors thought that if they incorporated ranlib to ar then users wouldn't have to worry about forgetting to call ranlib. So they provided ranlib but it did nothing. Some of the more evil ones dropped it all-together breaking many people's makefiles that tried to run ranlib. In the next chapter we will show you that Autoconf and Automake will automatically determine for you how to deal with ranlib in a portable manner.

Anyway, once you have a library, you put the header file `barf.h' under `/usr/local/include' and the `libbarf.a' file under `/usr/local/lib'. If you are in development phase, you put them somewhere else, under a prefix different other than `/usr/local'.

Now, how do we use libraries? Well, suppose that a program uses the barf function defined in the barf library. Then a typical program might look like:

// -* main.c *-
#include <stdio.h>
#include <barf.h>
main()
{
 printf("This is barf!\n");
 barf();
 printf("Barf me!\n");
}

If the library was installed in `/usr/local' then you can compile like this:

% gcc -c main.c
% gcc main.o -o main -lbarf

Of course, if you did not install in `/prefix' instead of `/usr/local' or `/usr' then you are in trouble. Now you have to do it this way:

% gcc -I/prefix/include -c main.c
% gcc main.o -o main -L/prefix/lib -lbarf

The `-I' flag tells the compiler where to find any extra header files (like `barf.h') and the `-L' flag tells the compiler where to find any extra libraries (like `libbarf.a'). The `-lbarf' flag tells the compiler to bring in the entire `libbarf.a' library with all its enclosed `.o' files and link it in with whathaveyou to produce the executable.

If the library hasn't been installed yet, and is present in the same directory as the object file `main.o' then you can link them by passing its filename instead:

% gcc main.o libbarf.a -o main

Please link libraries with their full names if they haven't yet been installed under the prefix directory and reserve using the -l flag only for libraries that have already been installed. This is very important. When you use Automake it helps it keep the dependencies straight. And when you use shared libraries, it is absolutely essential.

Also, please pay attention to the order with which you link your libraries. When the linker links a library, it does not embed into the executable code the entire library, but only the symbols that are needed from the library. In order for the linker to know what symbols are really needed from any given library, it must have already parsed all the other libraries and object files that depend on that library! This implies that you first link your object files, then you link the higher-level libraries, then the lower-level libraries. If you are the author of the libraries, you must write your libraries in such a manner, that the dependency graph of your libraries is a tree. If two libraries depend on each other bidirectionally, then you may have trouble linking them in. This suggests that they should be one library instead!

While we are at the topic, when you compile ordinary programs like the hello world program what really goes on behind the scenes is this:

% gcc -c hello.c
% gcc -o hello hello.o -lc

This links in the C system library `libc.a'. The standard include files that you use, such as `stdio.h', `stdlib.h' and whathaveyou are all refering to various parts of these libraries. These libraries get linked in by default when the `-o' flag is present. Note that other C compilers may be calling their system libraries something else. For this reason the corresponding flags are assumed and you don't have to supply them.

The catch is that there are many functions that you think of as standard that are not included in the `libc.a' library. For example all the math functions that are declared in `math.h' are defined in a library called `libm.a' which is not linked by default. So if the hello world program needed the math library you should be doing this instead:

% gcc -c hello.c
% gcc -o hello hello.o -lm

On some old Linux systems it used to be required that you also link a `libieee.a' library:

% gcc -o hello hello.o -lieee -lm

More problems of this sort occur when you use more esoteric system calls like sockets. Some systems require you to link in additional system libraries such as `libbsd.a', `libsocket.a', `libnsl.a'. Also if you are linking Fortran and C code together you must also link the Fortran run-time libraries. These libraries have non-standard names and depend on the Fortran compiler you use. Finally, a very common problem is encountered when you are writing X applications. The X libraries and header files like to be placed in non-standard locations so you must provide system-dependent -I and -L flags so that the compiler can find them. Also the most recent version of X requires you to link in some additional libraries on top of libX11.a and some rare systems require you to link some additional system libraries to access networking features (recall that X is built on top of the sockets interface and it is essentially a communications protocol between the computer running the program and computer that controls the screen in which the X program is displayed.) Fortunately, Autoconf can help you deal with all of this. We will cover these issues in more detail in subsequent chapters.

Because it is necessary to link system libraries to form an executable, under copyright law, the executable is derived work from the system libraries. This means that you must pay attention to the license terms of these libraries. The GNU `libc' library is under the LGPL license which allows you to link and distribute both free and proprietary executables. The `stdc++' library is also under terms that permit the distribution of proprietary executables. The `libg++' library however only permits you to build free executables. If you are on a GNU system, including Linux-based GNU systems, the legalese is pretty straightforward. If you are on a proprietary Unix system, you need to be more careful. The GNU GPL does not allow GPLed code to be linked against proprietary library. Because on Unix systems, the system libraries are proprietary, their terms may not allow you to distribute executables derived from them. In practice, they do however, since proprietary Unix systems do want to attract proprietary applications. In the same spirit, the GNU GPL also makes an exception and explicitly permits the linking of GPL code with proprietary system libraries, provided that said libraries are system libraries. This includes proprietary `libc.a' libraries, the `libdxml.a' library in Digital Unix, proprietary Fortran system libraries like `libUfor.a', and the X11 libraries.

Using Automake and Autoconf

Hello World revisited

To begin, let's review the simplest example, the hello world program:

`hello.c'

#include <stdio.h>
main()
{
 printf("Howdy, world!\n");
}

`Makefile.am'

bin_PROGRAMS = hello
hello_SOURCES = hello.c

`configure.in'

AC_INIT(hello.cc)
AM_INIT_AUTOMAKE(hello,1.0)
AC_PROG_CC
AC_PROG_INSTALL
AC_OUTPUT(Makefile)

The language of `Makefile.am' is a logic language. There is no explicit statement of execution. Only a statement of relations from which execution is inferred. On the other hand, the language of `configure.in' is procedural. Each line of `configure.in' is a command that is executed.

Seen in this light, here's what the `configure.in' commands shown do:

The AC_INIT command initializes the configure script. It must be passed as argument the name of one of the source files. Any source file will do.
The AM_INIT_AUTOMAKE performs some further initializations that are related to the fact that we are using `automake'. If you are writing your `Makefile.in' by hand, then you don't need to call this command. The two comma-separated arguments are the name of the package and the version number.
The AC_PROG_CC checks to see which C compiler you have.
The AC_PROG_INSTALL checks to see whether your system has a BSD compatible install utility. If not then it uses `install-sh' which `automake' will install at the root of your package directory if it's not there yet.
The AC_OUTPUT tells the configure script to generate `Makefile' from `Makefile.in'

The `Makefile.am' is more obvious. The first line specifies the name of the program we are building. The second line specifies the source files that compose the program.

For now, as far as `configure.in' is concerned you need to know the following additional facts:

If you are building a library, then your configure script must determine how to handle `ranlib'. To do that, add the AC_PROG_RANLIB command.
If you want to have your makefiles call recursively makefiles at subdirectories then the configure script needs to be told to find out how to do that. For this purpose you add the AC_PROG_MAKE_SET command.
If you have any makefiles in subdirectories you must also put them in the AC_OUTPUT statement like this:
```
AC_OUTPUT(Makefile          \
          dir1/Makefile     \
          dir2/Makefile     \
         )
```
Note that the backslashes are not needed if you are using the bash shell. For portability reasons, however, it is a good idea to include them.

As we explained before to build this package you need to execute the following commands:

% aclocal
% autoconf
% touch README AUTHORS NEWS ChangeLog
% automake -a 
% configure
% make

The first three commands, are for the maintainer only. When the user unpacks a distribution, he should be able to start from `configure' and move on.

The `aclocal' command installs a file called `aclocal.m4'. Normally, in that file you are supposed to place the definitions of any `autoconf' macros that you've written that happen to be in use in `configure.in'. We will teach you how to write `autoconf' macros later. The `automake' utility uses the AM_INIT_AUTOMAKE macro which is not part of the standard `autoconf' macros. For this reason, it's definition needs to be placed in `aclocal.m4'. If you call `aclocal' with no arguments then it will generate the appropriate `aclocal.m4' file. Later we will show you how to use `aclocal' to also install your own `autoconf' macros.
The `autoconf' command combines the `aclocal.m4' and `configure.in' files and produces the `configure' script. And now we are in bussiness.
The `touch' command makes the files `README' and friends exist. It is important that these files exist before calling Automake, because Automake decides whether to include them in a distribution by checking if they exist at the time that you invoke `automake'. Automake must decide to include these files, because when you type `make distcheck' the presense of these files will be required.
The `automake' command compiles a `Makefile.in' file from `Makefile.am' and if absent it installs various files that are required either by the GNU coding standards or by the makefile that will be generated.

If you are curious you can take a look at the generated `Makefile'. It looks like gorilla spit but it will give you an idea of how one gets there from the `Makefile.am'.

The `configure' script is an information gatherer. It finds out things about your system. That information is given to you in two ways. One way is through defining C preprocessor macros that you can test for directly in your source code with preprocessor directives. This is done by passing -D flags to the compiler. The other way is by making certain variables defined at the `Makefile.am' level. This way you can, for example, have the configure script find out how a certain library is linked, export is as a `Makefile.am' variable and use that variable in your `Makefile.am'. Also, through certain special variables, `configure' can control how the compiler is invoked by the `Makefile'.

Using configuration headers

As you may have noticed, the `configure' script in the previous example defines two preprocessor macros that you can use in your code: PACKAGE and VERSION. As you become a power-user of `autoconf' you will get define even more such macros. If you inspect the output of `make' during compilation, you will see that these macros get defined by passing `-D' flags to the compiler, one for each macro. When there is too many of these flags getting passed around, this can cause two problems: it can make the `make' output hard to read, and more importantly it can hit the buffer limits of various braindead implementations of `make'. To work around this problem, an alternative approach is to define all these macros in a special header file and include it in all the sources.

A hello world program using this technique looks like this

`configure.in'

AC_INIT
AM_CONFIG_HEADER(config.h)
AM_INIT_AUTOMAKE(hello,0.1)
AC_PROG_CXX
AC_PROG_INSTALL
AC_OUTPUT(Makefile)

`Makefile.am'

bin_PROGRAMS = hello
hello_SOURCES = hello.c

`hello.c'

#ifdef HAVE_CONFIG_H
#include <config.h>
#endif

#include <stdio.h>
main()
{
 printf("Howdy, pardner!\n");
}

Note that we call a new macro in `configure.in': AM_CONFIG_HEADER. Also we include the configuration file conditionally with the following three lines:

#ifdef HAVE_CONFIG_H
#include <config.h>
#endif

It is important to make sure that the `config.h' file is the first thing that gets included. Now do the usual routine:

% aclocal
% autoconf
% touch NEWS README AUTHORS ChangeLog
% automake -a

Automake will give you an error message saying that it needs a file called `config.h.in'. You can generate such a file with the `autoheader' program. So run:

% autoheader
Symbol `PACKAGE' is not covered by acconfig.h
Symbol `VERSION' is not covered by acconfig.h

Again, you get error messages. The problem is that autoheader is bundled with the autoconf distribution, not the automake distribution, and consequently doesn't know how to deal with the PACKAGE and VERSION macros. Of course, if `configure' defines a macro, there's nothing to know. On the other hand, when a macro is not defined then there are at least two possible defaults:

#undef PACKAGE
#define PACKAGE 0

The autoheader program here complains that it doesn't know the defaults for the PACKAGE and VERSION macros. To provide the defaults, create a new file `acconfig.h':

`acconfig.h'

#undef PACKAGE
#undef VERSION

and run `autoheader' again:

% autoheader

At this point you must run autoconf again, so that it takes into account the presense of acconfig.h:

% aclocal
% autoconf

Now you can go ahead and build the program:

% configure
% make
Computing dependencies for hello.cc...
echo > .deps/.P
gcc -DHAVE_CONFIG_H -I. -I. -I.   -g -O2 -c hello.cc
gcc -g -O2  -o hello  hello.o

Note that now instead of multiple -D flags, there is only one such flag passed: -DHAVE_CONFIG_H. Also, appropriate -I flags are passed to make sure that `hello.cc' can find and include `config.h'. To test the distribution, type:

% make distcheck
......
========================
hello-0.1.tar.gz is ready for distribution
========================

and it should all work out.

The `config.h' files go a long way back in history. In the past, there used to be packages where you would have to manually edit `config.h' files and adjust the macros you wanted defined by hand. This made these packages very difficult to install because they required intimate knowledge of your operating system. For example, it was not unusual to see a comment saying "if your system has a broken vfork, then define this macro". How the hell are you supposed to know if your systems vfork is broken?? With auto-configuring packages all of these details are taken care of automatically, shifting the burden from the user to the developer where it belongs.

Normally in the `acconfig.h' file you put statements like

#undef MACRO
#define MACRO default

These values are copied over to `config.h.in' and are supplemented with additional defaults for C preprocessor macros that get defined by native autoconf macros like AC_CHECK_HEADERS, AC_CHECK_FUNCS, AC_CHECK_SIZEOF, AC_CHECK_LIB.

If the file `acconfig.h' contains the string @TOP@ then all the lines before the string will be included verbatim to `config.h' before the custom definitions. Also, if the file `acconfig.h' contains the string @BOTTOM@ then all the lines after the string will be included verbatim to `config.h' after the custom definitions. This allows you to include further preprocessor directives that are related to configuration. Some of these directives may be using the custom definitions to conditionally issue further preprocessor directives. Due to a bug in some versions of autoheader if the strings @TOP@ and @BOTTOM@ do appear in your acconfig.h file, then you must make sure that there is at least one line appearing before @TOP@ and one line after @BOTTOM@, even if it has to be a comment. Otherwise, autoheader may not work correctly.

With `autotools' we distribute a utility called `acconfig' which will build `acconfig.h' automatically. By default it will always make sure that

#undef PACKAGE
#undef VERSION

are there. Additionally, if you install macros that are `acconfig' friendly then `acconfig' will also install entries for these macros. The acconfig program may be revised in the future and perhaps it might be eliminated. There is an unofficial patch to Autoconf that will automate the maintance of `acconfig.h', eliminating the need for a seperate program. I am not yet certain if that patch will be part of the official next version of Autoconf, but I very much expect it to. Until then, if you are interested, see: http://www.clark.net/pub/dickey/autoconf/autoconf.html This situation creates a bit of a dilemma about whether I should document and encourage acconfig in this tutorial or not. I believe that the Autoconf patch is a superior solution. However since I am not the one maintaining Autoconf, my hands are tied. For now let's say that if you confine yourself to using only the macros provided by autoconf, automake, and autotools then `acconfig.h' will be completely taken care for you by `acconfig'. In the future, I hope that acconfig.h will be generated by configure and be the sole responsibility of Autoconf.

You may be wondering whether it is worth using `config.h' files in the programs you develop if there aren't all that many macros being defined. My personal recommendation is yes. Use `config.h' files because perhaps in the future your `configure' might need to define even more macros. So get started on the right foot from the beginning. Also, it is nice to just have a config.h file lying around because you can have all your configuration specific C preprocessor directives in one place. In fact, if you are one of these people writing peculiar system software where you get to #include 20 header files on every single source file you write, you can just have them on all thrown into config.h once and for all. In the next chapter we will tell you about the LF macros that get distributed with autotools and this tutorial. These macros do require you to use the `config.h' file. The bottom line is: `config.h' is your friend; trust the config.h.

The building process

FIXME: write about VPATH builds and how to modify optimization

Some general advice

In software engineering, people start from a precise, well-designed specification and proceed to implementation. In research, the specification is fluid and immaterial and the goal is to be able to solve a slightly different problem every day. To have the flexibility to go from variation to variation with the least amount of fuss is the name of the game. By fuss, we refer to debugging, testing and validation. Once you have a code that you know gives the right answer to a specific set of problems, you want to be able to move on to a different set of similar problems with reinventing, debugging and testing as little as possible. These are the two distinct situations that computer programmers get to confront in their lives.

Software engineers can take good care of themselves in both situations. It's part of their training. However, people whose specialty is the scientific problem and not software engineering, must confront the hardest of the two cases, the second one, with very little training in software engineering. As a result they develop code that's clumsy in implementation, clumsy in usage, and with only redeeming quality the fact that it gives the right answer. This way, they do get the work of the day done, but they leave behind them no legacy to do the work of tomorrow. No general-purpose tools, no documentation, no reusable code.

The key to better software engineering is to focus away from developing monolithic applications that do only one job, and focus on developing libraries. One way to think of libraries is as a program with multiple entry points. Every library you write becomes a legacy that you can pass on to other developers. Just like in mathematics you develop little theorems and use the little theorems to hide the complexity in proving bigger theorems, in software engineering you develop libraries to take care of low-level details once and for all so that they are out of the way everytime you make a different implementation for a variation of the problem.

On a higher level you still don't create just one application. You create many little applications that work together. The centralized all-in-one approach in my experience is far less flexible than the decentralized approach in which a set of applications work together as a team to accomplish the goal. In fact this is the fundamental principle behind the design of the Unix operating system. Of course, it is still important to glue together the various components to do the job. This you can do either with scripting or with actually building a suite of specialized monolithic applications derived from the underlying tools.

The name of the game is like this: Break down the program to parts. And the parts to smaller parts, until you get down to simple subproblems that can be easily tested, and from which you can construct variations of the original problem. Implement each one of these as a library, write test code for each library and make sure that the library works. It is very important for your library to have a complete test suite, a collection of programs that are supposed to run silently and return normally (exit(0);) if they execute successfully, and return abnormally (assert(false); exit(1);) if they fail. The purpose of the test suite is to detect bugs in the library, and to convince you, the developer, that the library works. The best time to write a test program is as soon as it is possible! Don't be lazy. Don't just keep throwing in code after code after code. The minute there is enough new code in there to put together some kind of test program, just do it! I can not emphasize that enough. When you write new code you have the illusion that you are producing work, only to find out tomorrow that you need an entire week to debug it. As a rule, internalize the reality that you know you have produced new work everytime you write a working test program for the new features, and not a minute before. Another time when you should definetly write a test suite is when you find a bug while ordinarily using the library. Then, before you even fix the bug, write a test program that detects the bug. Then go fix it. This way, as you add new features to your libraries you have insurance that they won't reawaken old bugs.

Please keep documentation up to date as you go. The best time to write documentation is right after you get a few new test programs working. You might feel that you are too busy to write documentation, but the truth of the matter is that you will always be too busy. After long hours debugging these seg faults, think of it as a celebration of triumph to fire up the editor and document your brand-spanking new cool features.

Please make sure that computational code is completely seperated from I/O code so that someone else can reuse your computational code without being forced to also follow your I/O model. Then write programs that invoke your collection of libraries to solve various problems. By dividing and conquering the problem library by library with a test suite for each step along the way, you can write good and robust code. Also, if you are developing numerical software, please don't expect that other users of your code will be getting a high while entering data for your input files. Instead write an interactive utility that will allow users to configure input files in a user friendly way. Granted, this is too much work in Fortran. Then again, you do know more powerful languages, don't you?

Examples of useful libraries are things like linear algebra libraries, general ODE solvers, interpolation algorithms, and so on. As a result you end up with two packages. A package of libraries complete with a test suite, and a package of applications that invoke the libraries. The package of libraries is well-tested code that can be passed down to future developers. It is code that won't have to be rewritten if it's treated with respect. The package of applications is something that each developer will probably rewrite since different people will probably want to solve different problems. The effect of having a package of libraries is that C++ is elevated to a Very High Level Language that's closer to the problems you are solving. In fact a good rule of thumb is to make the libraries sufficiently sophisticated so that each executable that you produce can be expressed in one source file. All this may sound like common sense, but you will be surprised at how many scientific developers maintain just one does-everything-program that they perpetually hack until it becomes impossible to maintain. And then you will be even more surprised when you find that some professors don't understand why a "simple mathematical modification" of someone else's code is taking you so long.

Every library must have its own directory and Makefile. So a library package will have many subdirectories, each directory being one library. And perhaps if you have too many of them, you might want to group them even further down. Then, there's the applications. If you've done everything right, there should be enough stuff in your libraries to enable you to have one source file per application. Which means that all the source files can probably go down under the same directory.

Very often you will come to a situation where there's something that your libraries to-date can't do, so you implement it and stick it along in your source file for the application. If you find yourself cut and pasting that implementation to other source files, then this means that you have to put this in a library somewhere. And if it doesn't belong to any library you've written so far, maybe to a new library. When you are in a deadline crunch, there's a tendency not to do this since it's easier to cut and paste. The problem is that if you don't take action right then, eventually your code will degenerate to a hard-to-use mess. Keeping the entropy down is something that must be done on a daily basis.

Finally, a word about the age-old issue of language-choice. The GNU coding standards encourage you to program in C and avoid using languages other than C, such as C++ or Fortran. The main advantage of C over C++ and Fortran is that it produces object files that can be linked by any C or C++ compiler. In contrast, C++ object files can only be linked by the compiler that produced them. As for Fortran, aside from the fact that Fortran 90 and 95 have no free compilers, it is not very trivial to mix Fortran 77 with C/C++, so it makes no sense to invite all that trouble without a compelling reason. Nevertheless, my suggestion is to code in C++. The main benefit you get with C++ is robustness. Having constructors and destructors and references can go a long way towayrds helping you to void memory errors, if you know how to make them work for you.

Standard organization with Automake

Now we get into the gory details of software organization. I'll tell you one way to do it. This is advice, not divine will. It's simply a way that works well in general, and a way that works well with autoconf and automake in particular.

The first principle is to maintain the package of libraries seperate from the package of applications. This is not an iron-clad rule. In software engineering, where you have a crystal clear specification, it makes no sense to keep these two seperate. I found from experience that it makes a lot more sense in research. Either of these two packages must have a toplevel directory under which live all of its guts. Now what do the guts look like?

First of all you have the traditional set of information files that we described in Chapter 1:

README, AUTHORS, NEWS, ChangeLog, INSTALL, COPYING

You also have the following subdirectories:

`m4': Here, you install any new `m4' files that your package may want to install. These files define new `autoconf' commands that you may want to make available to other developers who want to use your libraries.
`doc': Here you put the documentation for your code. You have the creative freedom to present the documentation in any way you desire. However, the prefered way to document software is to use Texinfo. Texinfo has the advantage that you can produce both on-line help as well as a nice printed book from the same source. We will say something about Texinfo later.
`src': Here's the source code. You could put it at the toplevel directory as many developers do, but I find it more convenient to keep it away in a subdirectory. Automake makes it trivially easy to do recursive `make', so there is no reason not to take advantage of it to keep your files more organized.
`include': This is an optional directory for distributions that use many libraries. You can have the configure script link all public header files in all the subdirectories under src to this directory. This way it will only be necessary to pass one -I flag to test suites that want to access the include files of other libraries in the distribution. We will discuss this later.
`lib': This is an optional directory where you put portability-related source code. This is mainly replacement implementations for system calls that may not exist on some systems. You can also put tools here that you commonly use accross many different packages, tools that are too simple to just make libraries out of every each one of them. It is suggested that you maintain these tools in a central place. We will discuss this much later.

Together with these subdirectories you need to put a `Makefile.am' and a `configure.in' file. I also suggest that you put a shell script, which you can call `reconf', that contains the following:

#!/bin/sh
rm -f config.cache
rm -f acconfig.h
touch acconfig.h
aclocal -I m4
autoconf
autoheader
acconfig
automake -a
exit

This will generate `configure' and `Makefile.in' and needs to be called whenever you change a `Makefile.am' or a `configure.in' as well as when you change something under the `m4' directory. It will also call acconfig which automatically generates acconfig.h and calle `autoheader' to make config.h.in. The `acconfig' utility is part of `autotools', and if you are maintaining `acconfig.h' by hand, then you want to use this script instead:

#!/bin/sh
rm -f config.cache
aclocal -I m4
autoconf
autoheader
automake -a
exit

At the toplevel directory, you need to put a `Makefile.am' that will tell the computer that all the source code is under the `src' directory. The way to do it is to put the following lines in `Makefile.am':

EXTRA_DIST = reconf
SUBDIRS = m4 doc src

The first line tells automake that the `reconf' script is part of the distribution and must be included when you do make dist.
The second line tells automake that the rest of the distribution is in the subdirectories `m4', `doc' and `src'. It instructs `make' to recursively call itself in these subdirectories. It is important to include the `doc' and `m4' subdirectories here and enhance them with `Makefile.am' so that make dist includes them into the distribution.

If you are also using a `lib' subdirectory, then it should be built before `src':

EXTRA_DIST = reconf
SUBDIRS = m4 doc lib src

The `lib' subdirectory should build a static library that is linked by your executables in `src'. There should be no need to install that library.

At the toplevel directory you also need to put the `configure.in' file. That should look like this:

AC_INIT
AM_INIT_AUTOMAKE(packagename,versionnumber)
[...put your tests here...]
AC_OUTPUT(Makefile                   \
          doc/Makefile               \
          m4/Makefile                \
          src/Makefile               \
          src/dir1/Makefile          \
          src/dir2/Makefile          \
          src/dir3/Makefile          \
          src/dir1/foo1/Makefile     \
          ............               \
         )

You will not need another `configure.in' file. However, every directory level on your tree must have a `Makefile.am'. When you call automake on the top-level directory, it looks at `AC_OUTPUT' at your `configure.in' to decide what other directories have a `Makefile.am' that needs parsing. As you can see from above, a `Makefile.am' file is needed even under the `doc' and `m4' directories. How to set that up is up to you. If you aren't building anything, but just have files and directories hanging around, you must declare these files and directories in the `Makefile.am' like this:

SUBDIRS = dir1 dir2 dir3
EXTRA_DIST = file1 file2 file3

Doing that will cause make dist to include these files and directories to the package distribution.

This tedious setup work needs to be done everytime that you create a new package. If you create enough packages to get sick of it, then you want to look into the `acmkdir' utility that is distributed by Autotools. We will describe it at the next chapter.

Programs and Libraries with Automake

Next we explain how to develop `Makefile.am' files for the source code directory levels. A `Makefile.am' is a set of assignments. These assignments imply the Makefile, a set of targets, dependencies and rules, and the Makefile implies the execution of building.

The first set of assignments going at the beginning look like this:

INCLUDES = -I/dir1 -I/dir2 -I/dir3 ....
LDFLAGS = -L/dir1 -L/dir2 -L/dir3 .... 
LDADD = -llib1 -llib2 -llib3 ...

The `INCLUDES' assignment is where you insert the -I flags that you need to pass to your compiler. If the stuff in this directory is dependent on a library in another directory of the same package, then the -I flag must point to that directory.
The `LDFLAGS' assignment is where you insert the -L flags that are needed by the compiler when it links all the object files to an executable.
The `LDADD' assignment is where you list a long set of installed libraries that you want to link in with all of your executables. Use the -l flag only for installed libraries. You can list libraries that have been built but not installed yet as well, but do this only be providing the full path to these libraries.

If your package contains subdirectories with libraries and you want to link these libraries in another subdirectory you need to put `-I' and `-L' flags in the two variables above. To express the path to these other subdirectories, use the `$(top_srcdir)' variable. For example if you want to access a library under `src/libfoo' you can put something like:

INCLUDES = ... -I$(top_srcdir)/src/libfoo ...
LDFLAGS  = ... -L$(top_srcdir)/src/libfoo ...

on the `Makefile.am' of every directory level that wants access to these libraries. Also, you must make sure that the libraries are built before the directory level is built. To guarantee that, list the library directories in `SUBDIRS' before the directory levels that depend on it. One way to do this is to put all the library directories under a `lib' directory and all the executable directories under a `bin' directory and on the `Makefile.am' for the directory level that contains `lib' and `bin' list them as:

SUBDIRS = lib bin

This will guarantee that all the libraries are available before building any executables. Alternatively, you can simply order your directories in such a way so that the library directories are built first.

Next we list the things that are to be built in this directory level:

bin_PROGRAMS    = prog1 prog2 prog3 ....
lib_LIBRARIES   = libfoo1.a libfoo2.a libfoo3.a ....
check_PROGRAMS  = test1 test2 test3 ....
TESTS           = $(check_PROGRAMS)
include_HEADERS = header1.h header2.h ....

The `bin_PROGRAMS' line lists all the executable files that will be compiled with make and installed with make install under `/prefix/bin', where `prefix' is usually `/usr/local'.
The `lib_LIBRARIES' line lists all the library files that will be compiled with make and installed with make install under `/prefix/lib'.
The `check_PROGRAMS' line lists executable files that are not compiled with a simple make but only with a make check. These programs serve as tests that you, the user can use to test the library.
The `TESTS' line lists executable files which are to be compiled and executed when you run make check. These programs constitute the test suite and they are indispensible when you develop a library. It is common to just set
```
TESTS = $(check_PROGRAMS)
```
This way by commenting the line in and out, you can modify the behaviour of make check. While debugging your test suite, you will want to comment out this line so that make check doesn't run it. However, in the end product, you will want to comment it back in.
The `include_HEADERS' line lists public headers present in this directory that you want to install in /prefix/include. You must list a header file here if you want to cause it to be installed. You can also list it under libfoo_a_SOURCES for the library that it belongs to, but it is imperative to list public headers here so that they can be installed.

It is good programming practice to keep libraries and executables under seperate directory levels. However, it is okey to keep the library and the check executables that test the library under the same directory level because that makes it easier for you to link them with the library.

For each of these types of targets, we must state information that will allow automake and make to infer the building process.

For each Program: You need to declare the set of files that are sources of the program, the set of libraries that must be linked with the program and (optionally) a set of dependencies that need to be built before the program is built. These are declared in assignments that look like this:
```
prog1_SOURCES = foo1.cc foo2.cc ... header1.h header2.h ....
prog1_LDADD   = -lbar1 -lbar2 -lbar3
prog1_LDFLAGS = -L/dir1 -L/dir2 -L/dir3 ...
prog1_DEPENDENCIES = dep1 dep2 dep3 ...
```
In each assignment substitute `prog1' with the name of the program that you are building as it appeared in `bin_PROGRAMS' or `check_PROGRAMS'.
- `prog1_SOURCES': Here you list all the `*.cc' and `*.h' files that compose the source code of the program. The presense of a header file here doesn't cause the file to be installed at `/prefix/include' but it does cause it to be added to the distribution when you do make dist. To cause header files to be installed you must also put them in `include_HEADERS'.
- `prog1_LDADD': Here you add primarily the -l flags for linking whatever libraries are needed by your code. You may also list object files, which have been compiled in an exotic way, as well as paths to uninstalled yet libraries.
- `prog_LDFLAGS': Here you add the -L flags that are needed to resolve the libraries you passed in `prog_LDADD'. Certain flags that need to be passed on every program can be expressed on a global basis by assigning them at `LDFLAGS'.
- `prog1_DEPENDENCIES': If for any reason you want certain other targets to be built before building this program, you can list them here.
This is all you need to do. There is no need to write an extended Makefile with all the targets, dependencies and rules that are required to build the program. They are computed for you by this minimal information by `automake'. Moreover, the targets `dist', `install', `clean' and `distclean' are appropriately setup to handle the program. You don't need to take care of them by yourself.
For each Library: There's a total of four assignments that are relevant to building libraries:
```
lib_LIBRARIES = ... libfoo1.a ...
libfoo1_a_SOURCES      = foo1.cc foo2.cc private1.h private2.h ...
libfoo1_a_LIBADD       = obj1.o obj2.o obj3.o
libfoo1_a_DEPENDENCIES = dep1 dep2 dep3 ...
```
Note that if the name of the library is `libfoo1.a' the prefix that appears in the variables that are related with that library is `libfoo1_a_'.
- `libfoo1_a_SOURCES': Just like with programs, here you list all the `*.cc' files as well as all the private header files that compose the library. By private header file we mean a header file that is used internally by the library and the maintainers of the library, but is not exported to the end-user. You can list public header files also if you like, and perhaps you should for documentation purposes, but if you mention them in include_HEADERS it is not required to repeat them a second time here.
- `libfoo1_a_DEPENDENCIES': If there are any other targets that need to be built before this library is built, list them here.
- `libfoo1_a_LIBADD': If there are any other object files that you want to include in the library list them here. You might be tempted to list them as dependencies in `libfoo1_a_DEPENDENCIES', but that will not work. If you do that, the object files will be built before the library is built but they will not be included in the library! By listing an object file here, you are stating that you want it to be built and you want it to be included in the library.

General Automake principles

In the previous section we described how to use Automake to compile programs, libraries and test suites. To exploit the full power of Automake however, it is important to understand the fundamental ideas behind it.

The simplest way to look at a `Makefile.am' is as a collection of assignments which infer a set of Makefile rules, which in turn infer the building process. There are three types of such assignments:

Global assignments modify the behaviour of the entire Makefile for the given subdirectory. Examples of such assignments are `INCLUDES', `LDADD', `LDFLAGS', `TESTS'. These assignments affect the behaviour of the Makefile in the given directory indepedent of what gets built. In order for an assignment to be global, the name of the variable to which you are assigning must have a special meaning to Automake. If it does not, then the assignment has no effect, but it may be used as a variable in other assignments.
Primitive assignments declare the primitives that we want to build. Such assignments are `bin_PROGRAMS', `lib_LIBRARIES', and others. The general pattern of these assignments is two words seperated by an underscore. The second word is the type of the primitive being built, and it affects what Makefile rules are generated for building the primitive. The first word contains information about where to install the primitive once its built, so it affects the Makefile rules that handle the `install' and `uninstall' targets. The way this works is that for `bin' there corresponds a global assignment for `bindir' containing the installation directory. For example the symbols `bin', `lib', `include' have the following default assignments:
```
bindir     = $(prefix)/bin
libdir     = $(prefix)/lib
includedir = $(prefix)/include
```
These are the directories where you install executables, libraries and public header files. You can override the defaults by inserting different assignments in your `Makefile.am', but please don't do that. Instead you can define new assignments. For example, if you do
```
foodir = $(prefix)/foo
```
then that makes writing `foo_PROGRAMS', `foo_LIBRARIES' install in the `$(prefix)/foo' direcory instead. The symbols `check' and `noinst' have special meanings and you should not ever try to assign to `checkdir' and `noinstdir'.
- The `check' symbol, suggests that the primitive should only be built when the user invokes `make check' and it should not be installed. It is only meant to be executed as part of a test suite and then get scrapped.
- The `noinst' symbol, suggests that the primitive should not be installed. It will be built however normally, when you invoke `make'. You could use this to build convenience libraries which you intend to link in statically to executables which you do plan to install. You could also use this to build executables which will generate source code that will subsequently be used to build something installable.
Property assignments define the properties for every primitive that you declare. A property is also made of two words that are seperated by an underscore. The first word names the primitive to which the property refers to. The second word names the name of the property itself. For example when you define
```
bin_PROGRAMS = hello
```
this means that you can then say:
```
hello_SOURCES = ...
hello_LDADD   = ...
```
and so on. The `SOURCES' and `LDADD' are properties of `hello' which is a `PROGRAMS' primitive.

In addition to all this, you may include ordinary targets in a `Makefile.am' just as you would in an ordinary `Makefile.in'. If you do that however, then please check at some point that your distribution can properly build with `make distcheck'. It is very important that when you define your own rules, to build whatever you want to build, to follow the following guidelines:

Prepend all source files both in the dependencies and the rules with $(srcdir). This variable points to the directory where your source code is located during the current `make', which is not necessarily the same directory as the one returned by ``pwd`'. It is possible to do what is called a VPATH build where the generated files are created in a seperate directory tree from the source code. What ``pwd`' would return to you in that case would be the directory in which files are written, not the directory from which files are read. If you mess this up, then you will know when make distcheck fails, which attempts to do a VPATH build. The directory in which files are written can be accessed by the dot. For example, `./foo'.
If you need to get to any files from the top-level directory use $(top_srcdir) for files which you wrote (and your compiler tools read) and $(top_builddir) for files which the compiler wrote.

For your rules use only the following commands directly:

ar cat chmod cmp cp diff echo egrep expr false grep ls 
mkdir mv pwd rm rmdir sed sleep sort tar test touch true

Any other programs that you want to use, you must do so through make variables. That includes programs such as these:

awk bash bison cc flex install latex ld ldconfig lex ln make
makeinfo perl ranlib shar texi2dvi yacc

The make variables you define through Autoconf in your configure.in. For special-purpose tools, use the AC_PATH_PROGS macro. For example:

AC_PATH_PROGS(BASH, bash sh)
AC_PATH_PROGS(PERL, perl perl5.005 perl5.004 perl5.003 perl5.002 perl5.001)
AC_PATH_PROGS(SHAR, shar)
AC_PATH_PROGS(BISON, bison)

Some special tools have their own macros:

AC_PROG_MAKE_SET -> $(MAKE)   -> make
AC_PROG_RANLIB   -> $(RANLIB) -> ranlib | (do-nothing)
AC_PROG_AWK      -> $(AWK)    -> mawk | gawk | nawk | awk
AC_PROG_LEX      -> $(LEX)    -> flex | lex
AC_PROG_YACC     -> $(YACC)   -> 'bison -y' | byacc | yacc
AC_PROG_LN_S     -> $(LN_S)   -> ln -s

Before using any of these macros, consult the Autoconf documentation to see exactly what it is that they do.

Simple Automake examples

A real life example of a `Makefile.am' for libraries is the one I use to build the Blas-1 library. It looks like this:

* `blas1/Makefile.am'

SUFFIXES = .f
.f.o:
       $(F77) $(FFLAGS) -c $<

lib_LIBRARIES = libblas1.a
libblas1_a_SOURCES = f2c.h caxpy.f ccopy.f cdotc.f cdotu.f crotg.f cscal.f \
 csrot.f csscal.f cswap.f dasum.f daxpy.f dcabs1.f dcopy.f ddot.f dnrm2.f \
 drot.f drotg.f drotm.f drotmg.f dscal.f dswap.f dzasum.f dznrm2.f icamax.f \
 idamax.f isamax.f izamax.f sasum.f saxpy.f scasum.f scnrm2.f scopy.f \ 
 sdot.f snrm2.f srot.f srotg.f srotm.f srotmg.f sscal.f sswap.f zaxpy.f \ 
 zcopy.f zdotc.f zdotu.f zdrot.f zdscal.f zrotg.f zscal.f zswap.f

Because the Blas library is written in Fortran, I need to declare the Fortran suffix at the beginning of the `Makefile.am' with the `SUFFIXES' assignment and then insert an implicit rule for building object files from fortran files. The variables `F77' and `FFLAGS' are defined by Autoconf, by using the Fortran support provided by Autotools. For C or C++ files there is no need to include implicit rules. We discuss Fortran support at a later chapter.

Another important thing to note is the use of the symbol `$<'. We introduced these symbols in Chapter 2, where we mentioned that `$<' is the dependencies that changed causing the target to need to be rebuilt. If you've been paying attention you may be wondering why we didn't say `$(srcdir)/$<' instead. The reason is because for VPATH builds, `make' is sufficiently intelligent to substitute `$<' with the Right Thing.

Now consider the `Makefile.am' for building a library for solving linear systems of equations in a nearby directory:

* `lin/Makefile.am'

SUFFIXES = .f
.f.o:
       $(F77) $(FFLAGS) -c $<
INCLUDES = -I../blas1 -I../mathutil

lib_LIBRARIES = liblin.a
include_HEADERS = lin.h
liblin_a_SOURCES = dgeco.f dgefa.f dgesl.f f2c.h f77-fcn.h lin.h lin.cc

check_PROGRAMS = test1 test2 test3
TESTS = $(check_PROGRAMS)
LDADD = liblin.a ../blas1/libblas1.a ../mathutil/libmathutil.a $(FLIBS) -lm

test1_SOURCES = test1.cc f2c-main.cc
test2_SOURCES = test2.cc f2c-main.cc  
test3_SOURCES = test3.cc f2c-main.cc

In this case, we have a library that contains mixed Fortran and C++ code. We also have an example of a test suite, which in this case contains three test programs. What's new here is that in order to link the test suite properly we need to link in libraries that have been built already in other directories but haven't been installed yet. Because every test program requires to be linked against the same libraries, we set these libraries globally with an `LDADD' assignment for all executables. Because the libraries have not been installed yet we specify them with their full path. This will allow Automake to track dependencies correctly; if `libblas1.a' is modified, it will cause the test suite to be rebuilt. Also the variable `INCLUDES' is globally assigned to make the header files of the other two libraries accessible to the source code in this directory. The variable `$(FLIBS)' is assigned by Autoconf to link the run-time Fortran libraries, and then we link the installed `libm.a' library. Because that library is installed, it must be linked with the `-l' flag. Another peculiarity in this example is the file `f2c-main.cc' which is shared by all three executables. As we will explain later, when you link executables that are derived from mixed Fortran and C or C++ code, then you need to link with the executable this kludge file.

The test-suite files for numerical code will usually invoke the library to perform a computation for which an exact result is known and then verify that the result is true. For non-numerical code, the library will need to be tested in different ways depending on what it does.

Built sources

In some complicated packages, you want to generate part of their source code by executing a program at compile time. For example, in one of the packages that I wrote for an assignment, I had to generate a file `incidence.out' that contained a lot of hairy matrix definitions that were too ugly to just compute and write by hand. That file was then included by `fem.cc' which was part of a library that I wrote to solve simple finite element problems, with a preprocessor statement:

#include "incidence.out"

All source code files that are to be generated during compile time should be listed in the global definition of `BUILT_SOURCES'. This will make sure that these files get compiled before anything else. In our example, the file `incidence.out' is computed by running a program called `incidence' which of course also needs to be compiled before it is run. So the `Makefile.am' that we used looked like this:

noinst_PROGRAMS = incidence
lib_LIBRARIES = libpmf.a

incidence_SOURCES = incidence.cc mathutil.h
incidence_LDADD = -lm

incidence.out: incidence
      ./incidence > incidence.out

BUILT_SOURCES = incidence.out
libpmf_a_SOURCES = laplace.cc laplace.h fem.cc fem.h mathutil.h

check_PROGRAMS = test1 test2
TESTS = $(check_PROGRAMS)

test1_SOURCES = test1.cc
test1_LDADD = libpmf.a -lm

test2_SOURCES = test2.cc
test2_LDADD = libpmf.a -lm

Note that because the executable `incidence' has been created at compile time, the correct path is `./incidence'. Always keep in mind, that the correct path to source files, such as `incidence.cc' is `$(srcdir)/incidence.cc'. Because the `incidence' program is used temporarily only for the purposes of building the `libpmf.a' library, there is no reason to install it. So, we use the `noinst' prefix to instruct Automake not to install it.

Installation directories.

Previously, we mentioned that the symbols `bin', `lib' and `include' refer to installation locations that are defined respectively by the variables `bindir', `libdir' and `includedir'. For completeness, we will now list the installation locations available by default by Automake and describe their purpose.

All installation locations are placed under one of the following directories:

`prefix'

The default value of `$(prefix)' is `/usr/local' and it is used to construct installation locations for machine-indepedent files. The actual value is specified at configure-time with the `--prefix' argument. For example:

configure --prefix=/home/lf

`exec_prefix'

The default value of `$(exec_prefix)' is `$(prefix)' and it used to construct installation location for machine-dependent files. The actual value is specified at configure-time with the `--exec-prefix' argument. For example:

configure --prefix=/home/lf --exec-prefix=/home/lf/gnulinux

The purpose of using a seperate location for machine-dependent files is because then it makes it possible to install the software on a networked file server and make that available to machines with different architectures. To do that there must be seperate copies of all the machine-dependent files for each architecture in use.

Executable files are installed in one of the following locations:

bindir     = $(exec_prefix)/bin
sbindir    = $(exec_prefix)/sbin
libexecdir = $(exec_prefix)/libexec

`bin': Executable programs that users can run.
`sbin': Executable programs for the super-user.
`libexec': Executable programs to be called by other programs.

Library files are installed under

libdir = $(exec_prefix)/lib

Include files are installed under

includedir = $(prefix)/include

Data files are installed in one of the following locations:

datadir        = $(prefix)/share
sysconfdir     = $(prefix)/etc
sharedstatedir = $(prefix)/com
localstatedir  = $(prefix)/var

`data': Read-only architecture indepedent data files.
`sysconf': Read-only configuration files that pertain to a specific machine. All the files in this directory should be ordinary ASCII files.
`sharedstate': Architecture indepedent data files which programs modify while they run.
`localstate': Data files which programs modify while they run that pertain to a specific machine.

Autoconf macros should be installed in `$(datadir)/aclocal'. There is no symbol defined for this location, so you need to define it yourself:

m4dir = $(datadir)/aclocal

FIXME: Emacs Lisp files?

FIXME: Documentation?

Automake, to encourage tidyness, also provides the following locations such that each package can keep its stuff under its own subdirectory:

pkglibdir         = $(libdir)/@PACKAGE@
pkgincludedir     = $(includedir)/@PACKAGE@
pkgdatadir        = $(datadir)/@PACKAGE@

There are a few other such `pkg' locations, but they are not practically useful.

Handling shell scripts

Sometimes you may feel the need to implement some of your programs in a scripting language like Bash or Perl. For example, the `autotools' package is exclusively a collection of shell scripts. Theoretically, a script does not need to be compiled. However, there are still issues pertaining to scripts such as:

You want scripts to be installed with make install, uninstalled with make uninstall and distributed with make dist.
You want scripts to get the path in the #! right.

To let Automake deal with all this, you need to use the `SCRIPTS' primitive. By listing a file under a `SCRIPTS' primitive assignment, you are telling Automake that this file needs to be built, and must be allowed to be installed in a location where executable files are normally installed. Automake by default will not clean scripts when you invoke the `clean' target. To force Automake to clean all the scripts, you need to add the following line to your `Makefile.am':

CLEANFILES = $(bin_SCRIPTS)

You also need to write your own targets for building the script by hand.

For example:

`hello1.sh'

# -* bash *-
echo "Howdy, world!"
exit 0

`hello2.pl'

# -* perl *-
print "Howdy, world!\n";
exit(0);

`Makefile.am'

bin_SCRIPTS = hello1 hello2
CLEANFILES = $(bin_SCRIPTS)
EXTRA_DIST = hello1.sh hello2.pl

hello1: $(srcdir)/hello1.sh
      rm -f hello1
      echo "#! " $(BASH) > hello1
      cat $(srcdir)/hello1.sh >> hello1
      chmod ugo+x hello1

hello2: $(srcdir)/hello2.pl
      $(PERL) -c hello2.pl
      rm -f hello2
      echo "#! " $(PERL) > hello2
      cat $(srcdir)/hello2.pl >> hello2
      chmod ugo+x hello2

`configure.in'

AC_INIT
AM_INIT_AUTOMAKE(hello,0.1)
AC_PATH_PROGS(BASH, bash sh)
AC_PATH_PROGS(PERL, perl perl5.004 perl5.003 perl5.002 perl5.001 perl5)
AC_OUTPUT(Makefile)

Note that in the "source" files `hello1.sh' and `hello2.pl' we do not include a line like

#!/bin/bash
#!/usr/bin/perl

Instead we let Autoconf pick up the correct path, and then we insert it during make. Since we omit the #! line, we leave a comment instead that indicates what kind of file this is.

In the special case of perl we also invoke

perl -c hello2.pl

This checks the perl script for correct syntax. If your scripting language supports this feature I suggest that you use it to catch errors at "compile" time. The AC_PATH_PROGS macro looks for a specific utility and returns the full path.

If you wish to conform to the GNU coding standards, you may want your script to support the --help and --version flags, and you may want --version to pick up the version number from AM_INIT_AUTOMAKE.

Here's an enhanced hello world scripts:

version.sh.in
```
VERSION=@VERSION@
```
version.pl.in
```
$VERSION="@VERSION@";
```

hello1.sh

# -* bash *-
function usage
{
 cat << EOF
Usage: 
% hello [OPTION]
 
Options:
  --help     Print this message
  --version  Print version information

Bug reports to: monica@whitehouse.gov
EOF 
}
 
function version
{
 cat << EOF
hello $VERSION - The friendly hello world program
Copyright (C) 1997 Monica Lewinsky <monica@whitehouse.gov>
This is free software, and you are welcome to redistribute it and modify it 
under certain conditions. There is ABSOLUTELY NO WARRANTY for this software.
For legal details see the GNU General Public License.
EOF
}

function invalid
{
 echo "Invalid usage. For help:"
 echo "% hello --help"
}

# -------------------------
if test $# -ne 0
then
  case $1 in
  --help)
    usage
    exit
    ;;
  --version)
    version
    exit
    ;;
  *)
    invalid
    exit
    ;;
fi

# ------------------------ 
echo "Howdy world"
exit

hello2.pl

# -* perl *-
sub usage
{
 print <<"EOF";
Usage: 
% hello [OPTION]

Options:
  --help     Print this message
  --version  Print version information

Bug reports to: monica@whitehouse.gov
EOF
exit(1);
}

sub version
{
 print <<"EOF";
hello $VERSION - The friendly hello world program
Copyright (C) 1997 Monica Lewinsky <monica@whitehouse.gov>
This is free software, and you are welcome to redistribute it and modify it 
under certain conditions. There is ABSOLUTELY NO WARRANTY for this software.
For legal details see the GNU General Public License.
EOF
 exit(1);
}

sub invalid
{
 print "Invalid usage. For help:\n";
 print "% hello --help\n";
 exit(1);
}

# --------------------------
if ($#ARGV == 0)
{
 do version() if ($ARGV[0] eq "--version");
 do usage()   if ($ARGV[0] eq "--help");
 do invalid();
}
# --------------------------
print "Howdy world\n";
exit(0);

Makefile.am

bin_SCRIPTS = hello1 hello2
CLEANFILES = $(bin_SCRIPTS)
EXTRA_DIST = hello1.sh hello2.pl

hello1: $(srcdir)/hello1.sh $(srcdir)/version.sh
      rm -f hello1
      echo "#! " $(BASH) > hello1
      cat $(srcdir)/version.sh $(srcdir)/hello1.sh >> hello1
      chmod ugo+x hello1

hello2: $(srcdir)/hello2.pl $(srcdir)/version.pl
      $(PERL) -c hello2.pl
      rm -f hello2
      echo "#! " $(PERL) > hello2
      cat $(srcdir)/version.pl $(srcdir)/hello2.pl >> hello2
      chmod ugo+x hello2

configure.in

AC_INIT
AM_INIT_AUTOMAKE(hello,0.1)
AC_PATH_PROGS(BASH, bash sh)
AC_PATH_PROGS(PERL, perl perl5.004 perl5.003 perl5.002 perl5.001 perl5)
AC_OUTPUT(Makefile
          version.sh
          version.pl
         )

Basically the idea with this approach is that when configure calls AC_OUTPUT it will substitute the files version.sh and version.pl with the correct version information. Then, during building, the version files are merged with the scripts. The scripts themselves need some standard boilerplate code to handle the options. I've included that code here as a sample implementation, which I hereby place in the public domain.

This approach can be easily generalized with other scripting languages as well, like Python and Guile.

Handling other obscurities

To install data files, you should use the `DATA' primitive instead of the `SCRIPTS'. The main difference is that `DATA' will allow you to install files in data installation locations, whereas `SCRIPTS' will only allow you to install files in executable installation locations.

Normally it is assumed that the files listed in `DATA' are not derived, so they are not cleaned. If you do want to derive them however from an executable file, then you can do so like this:

bin_PROGRAMS = mkdata
mkdata_SOURCES = mkdata.cc

pkgdata_DATA = thedata
CLEANFILES = $(datadir_DATA)

thedata: mkdata
      ./mkdata > thedata

In general however, data files are boring. You just write them, and list them in a `DATA' assignment:

pkgdata_DATA = foo1.dat foo2.dat foo3.dat ...

If your package requires you to edit a certain type of files, you might want to write an Emacs editing mode for that file type. Emacs modes are written in Elisp files that are prefixed with `.el' like in `foo.el'. Automake will byte-compile and install Elisp files using Emacs for you. You need to invoke the

AM_PATH_LISPDIR

macro in your `configure.in' and list your Elisp files under the `LISP' primitive:

lisp_LISP = mymode.el

The `LISP' primitive also accepts the `noinst' location.

There is also support for installing Autoconf macros, documentation and dealing with shared libraries. These issues however are complicated, and they will be discussed in seperate chapters.

Using Autotools

Introduction

At the moment Autotools distributes the following additional utilitities:

The `gpl' utility, for generating copyright notices.
The `acmkdir' utility for generating directory trees for new packages
The LF macros which introduce mainly support for C++, Fortran and embedded text.

We have already discussed the `gpl' utility in Chapter 1. In this chapter we will focus mainly on the LF macros and the `acmkdir' utility but we will postpone our discussion of Fortran support until the next chapter.

Compiler configuration with the `LF` macros

In last chapter we explained that a minimal `configure.in' file looks like this:

AC_INIT
AM_CONFIG_HEADER(config.h)
AM_INIT_AUTOMAKE(package,version)
AC_PROG_CXX
AC_PROG_RANLIB
AC_OUTPUT(Makefile ... )

If you are not building libraries, you can omit AC_PROG_RANLIB.

Alternatively you can use the following macros that are distributed with Autotools, and made accessible through the `aclocal' utility. All of them are prefixed with `LF' to distinguish them from the standard macros:

LF_CONFIGURE_CC

This macro is equivalent to the following invokation:

AC_PROG_CC
AC_PROG_CPP
AC_AIX
AC_ISC_POSIX
AC_MINIX
AC_HEADER_STDC

which is a traditional Autoconf idiom for setting up the C compiler.

LF_CONFIGURE_CXX

This macro calls

AC_PROG_CXX
AC_PROG_CXXCPP

and then invokes the portability macro:

LF_CPP_PORTABILITY

This is the recommended way for configuring your C++ compiler.

LF_HOST_TYPE

This is here mainly because it is required by `LF_CONFIGURE_FORTRAN'. This macro determines your operating system and defines the C preprocessor macro `YOUR_OS' with the answer. You can use this in your program for spiffiness purposes such as when the program identifies itself at the user's request, or during initialization.

LF_CPP_PORTABILITY

This macro allows you to make your `C++' code more portable and a little nicer.

It detects whether your compiler supports `bool'.
It checks whether your compiler deals with for loops correctly.
At the users option, it introduces nice and powerful assertions.
It defines syntactic sugar, that I am personally addicted to

If you must call this macro, do so after calling `LF_CONFIGURE_CXX'. We describe the features in more detail in the next section. To take advantage of these features, all you have to do is

#include <config.h>

In the past it used to be necessary to have to include a file called `cpp.h'. I've sent this file straight to hell.

LF_SET_WARNINGS

This macro enables you to activate warnings at configure time. If called, then the user can request warnings by passing the `--with-warnings' flag to the compiler like this:

$ configure ... --with-warnings ...

Warnings can help you find out many bugs, as well as help you improve your coding habits. On the other hand, in many cases, many of these warnings are false alarms, which is why the default behaviour of the compiler is to not show them to you. You are probably interested in warnings if you are the developer, or a paranoid end-user.

The minimal recommended `configure.in' file for a pure C++ project is:

AC_INIT
AM_CONFIG_HEADER(config.h)
AM_INIT_AUTOMAKE(package,version)
LF_CONFIGURE_CXX
AC_PROG_RANLIB
AC_OUTPUT(Makefile .... )

A full-blown `configure.in' file for projects that mix Fortran and C++ (and may need the C compiler also if using `f2c') invokes all of the above macros:

AC_INIT
AM_INIT_AUTOMAKE(package,version)
LF_CANONICAL_HOST
LF_CONFIGURE_CC
LF_CONFIGURE_CXX
LF_CONFIGURE_FORTRAN
LF_SET_WARNINGS
AC_PROG_RANLIB
AC_CONFIG_SUBDIRS(fortran/f2c fortran/libf2c)
AC_OUTPUT(Makefile ...)

The features of `LF_CPP_PORTABILITY'

In order for LF_CPP_PORTABILITY to work correctly you need to append certain things at the bottom of your `acconfig.h'. This is done for you automatically by acmkdir. When the LF_CPP_PORTABILITY macro is invoked by `configure.in' then the following portability problems are checked:

The bool data type: It is rather unfortunate that most proprietary compilers don't have such a beautiful and handy feature. If the configure script detects that you have no bool, then it defines the macro CXX_HAS_NO_BOOL. It is possible to emulate bool with the following C preprocessor directives:
```
#ifdef CXX_HAS_NO_BOOL
#define bool int
#define true 1
#define false 0
#endif
```
To make your code portable to compilers that don't support bool, through this workaround, you must follow one rule: never overload your functions in a way in which the only distinguishing feature is bool vs int. This workaround is included in the default `acconfig.h' after @BOTTOM@ that gets installed by acmkdir.
Incorrect for-loop scoping: Another obnoxious bug with many compilers is that they refuse to compile the following code:
```
#include <iostream.h>
main()
{
 for (int i=0;i<10;i++) { }
 for (int i=0;i<10;i++) { }
}
```
This is legal C++ and the variable i is supposed to have scope only inside the forloop braces and the parentheses. Unfortunately, most C++ compilers use an obsolete version of the standard's draft in which the scope of i is the entire main in this example. The workaround we use is as follows:
```
#ifdef CXX_HAS_BUGGY_FOR_LOOPS
#define for if(1) for
#endif
```
By nesting the forloop inside an if-statement, the variable i is assigned the correct scope. Now if your if-statement scoping is also broken then you really need to get another compiler. The macro CXX_HAS_BUGGY_FOR_LOOPS is defined for you if appropriate, and the code for the work-around is included with the default acconfig.h.

In addition to these workarounds, the following additional features are introduced at the end of the default acconfig.h. The features are enabled only if your `configure.in' calls LF_CPP_PORTABILITY.

Looping macros: The macro loop is defined such that
```
loop(i,a,b)
```
is equivalent
```
for (int i = a; i <= b; i++)
```
This is syntactic sugar that makes it easier on the hand to write nested loops like:
```
int Ni,Nj,Nk;
loop(i,0,Ni) loop(j,0,Nj) loop(k,0,Nk) { ... }
```
minimizing the probability of making a spelling bug. If you need to do more unusual looping you can use one of the following macros:
```
inverse_loop(i,a,b)   <--> for (int i = a; i >= b; i--)
integer_loop(i,a,b,s) <--> for (int i = a; i <= b; i += s)
```
This feature depends on having correct scoping in `for' which fortunately is easily taken care of.
Class protection levels: The following macros are defined:
```
#define pub public:
#define pro protected:
#define pri private:
```
Now you can declare a class prototype in a java-like style like this:
```
class foo
{
 pri double a,b;
 pub double c,d;
 
 pub foo();
 pub virtual ~foo();
 
 pri void method1(void);
 pub void method2(void);
};
```
Personally I find this notation more lucid than the standard C++ syntax because this way I can see the protection level of each variable and method without having to possibly scroll up to see what it is. Also, it is less bug-prone this way.
The Pi: Every mathematician would like to know what pi is, so this is as good a place as any to throw it in:
```
const double pi = 3.14159265358979324;
```
Assert: The idea behind assert is simple. Suppose that at a certain point in your code, you expect two variables to be equal. If this expectation is a precondition that must be satisfied in order for the subsequent code to execute correctly, you must assert it with a statement like this:
```
assert(var1 == var2);
```
In general assert takes as argument a boolean expression. If the boolean expression is true, execution continues. Otherwise the `abort' system call is invoked and the program execution is stopped. If a bug prevents the precondition from being true, then you can trace the bug at the point where the precondition breaks down instead of further down in execution or not at all. The `assert' call is implemented as a C preprocessor macro, so it can be enabled or disabled at will. One way to enable assertions is to include `assert.h'.
```
#include <assert.h>
```
Then it's possible to disable them by defining the `NDEBUG' macro. Alternatively, because it is easy to provide our own assert, if your `configure.in' invokes `LF_CPP_PORTABILITY' then `assert' will be conditionally defined for you in the `config.h' file. By default, the `configure' script will enable assertions. You can disable assertions at configure-time like this:
```
% configure ... --disable-assert ...
```
During debugging and testing it is a good idea to leave assertions enabled. However, for production runs it's best to disable them. If your program crashes at an assertion, then the first thing you should do is to find out where the error happens. To do this, run the program under the `gdb' debugger. First invoke the debugger:
```
% gdb
...copyright notice...
```
Then load the executable and set a breakpoint at the `abort' system call:
```
(gdb) file "executable"
(gdb) break abort
```
Now run the program:
```
(gdb) run
```
Instead of crashing, under the debugger the program will be paused when the `abort' system call is invoked, and you will get back the debugger prompt. Now type:
```
(gdb) where
```
to see where the crash happened. You can use the `print' command to look at the contents of variables and you can use the `up' and `down' commands to navigate the stack. For more information, see the GDB documentation or type `help' at the prompt of gdb. Another suggestion is to never call the abort system call directly. Instead, please do this:
```
assert(false);
exit(1);
```
This way if assertions are enabled, the program will stop and the stack will be retained. Otherwise the program will simply exit.

Writing portable C++

The C++ language has been standardized very recently. As a result, not all compilers fully support all the features that the ANSI C++ standard requires, including the g++ compiler itself. Some of the problems commonly encountered, such as incorrect scoping in for-loops and lack of the bool data type can be easily worked around. In this section we give some tips for avoiding more portability problems. I welcome people on the net reading this to email me their tips, to be included in this tutorial.

The following code, as much as it may seem reasonable is not correct:
```
int n = 10;
double **foo;
foo = new (double *)[i];
```
The g++ compiler will parse this and do the right thing, but other compilers are more picky. The correct way to do it is:
```
int n = 10;
double **foo;
foo = new double * [i];
```
Do not use exceptions, RTTI and STL. Yes, they can make your code more spiffy, but almost certainly you can live without them, and save yourself serious portability pains.
Do not use templates, unless it is absolutely necessary. If you believe it is possible to implement an idea without templates, then do it without templates. If you do need to use templates, try to limit the amount of features you invoke to the absolute minimum. In particular:
- A class can have a template function as a member function. Unfortunately this is not supported by g++.
- Some people discovered that by using templates with types that are templates of types that are templates of types that are templates .... recursively, they can force the C++ compiler to do computation at compile time and generate C++ code that is highly optimized. Problems in linear algebra, FFTs and such have been implemented in this manner, and the performance of the corresponding C++ code beat out Fortran implementations. This is a great idea of the future, but at the moment many compilers, including g++ can't properly cope with such code. For now don't make templates of templates of things. Only make templates of things.

FIXME: I need to add some stuff here.

Hello world revisited again

Putting all of this together, we will now show you how to create a super Hello World package, using the LF macros and the utilities that are distributed with the `autotools' distribution.

The first step is to build a directory tree for the new project. Instead of doing it by hand, use the `acmkdir' utility. Type:

% acmkdir hello

`acmkdir' prompts you with the current directory pathname. Make sure that this is indeed the directory where you want to install the directory tree for the new package. You will be prompted for some information about the newly created package. When you are done, `acmkdir' will ask you if you really want to go for it. Say `y'. Then `acmkdir' will do the following:

Create the `hello-0.1' directory and the `doc', `m4' and `src' subdirectories.
Generate the following `configure.in':
```
AC_INIT
AM_CONFIG_HEADER(config.h)
AM_INIT_AUTOMAKE(test,0.1)
LF_HOST_TYPE
LF_CONFIGURE_CXX
LF_SET_WARNINGS
AC_PROG_RANLIB
AC_OUTPUT(Makefile doc/Makefile m4/Makefile src/Makefile)
```
You can edit this and customize it to your needs. More specifically, you will need to update the version number here everytime to you cut a new distribution.
Place boilerplate `Makefile.am' files on the toplevel directory as well as the `doc', `m4' and `src' subdirectories. The toplevel `Makefile.am' contains:
```
EXTRA_DIST = reconf configure
SUBDIRS = m4 doc src
```
The ones in the src and doc subdirectories are empty. The one in `m4' contains a template `Makefile.am' which you should edit if you want to add new macros.
Install the GPL in `COPYING' and a standard `INSTALL' file which you can customize.
Create the files `AUTHORS', `NEWS', `README', `THANKS' and `ChangeLog' and place some default content in them which you should edit further.
Create a `reconf' script for reconfiguring your package every time you modify the contents of the `m4' subdirectory and need to rebuild `configure' and the makefiles:
```
#!/bin/sh
rm -f config.cache
rm -f acconfig.h
aclocal -I m4
autoconf
acconfig
autoheader
automake -a
exit
```
The makes sure that all the utilities are invoked, and in the right order. Before `acmkdir' exits, it will call the `reconf' script for you once to set things up.

It must be obvious that having to do these tasks manually for every package you write can get to be tiring. With `acmkdir' you can slap together all this grunt-work in a matter of seconds.

Now enter the directory `hello-0.1/src' and start coding:

% cd hello-0.1/src
% gpl -cc hello.cc
% vi hello.cc
% vi Makefile.am

This time we will use the following modified hello world program:

#ifdef HAVE_CONFIG_H
#include <config.h>
#endif

#include <iostream.h>

main()
{
 cout << "Welcome to " << PACKAGE << " version " << VERSION;
 cout << " for " << YOUR_OS << endl;
 cout << "Hello World!" << endl;
}

and for `Makefile.am' the same old thing:

bin_PROGRAMS = hello
hello_SOURCES = hello.cc

Now back to the toplevel directory:

% cd ..
% reconf
% configure
% make
% src/hello
Welcome to test version 0.1 for i486-pc-linux-gnulibc1
Hello World!

Note that by using the special macros PACKAGE, VERSION, YOUR_OS the program can identify itself, its version number and the operating system for which it was compiled. The PACKAGE and VERSION are defined by AM_INIT_AUTOMAKE and YOUR_OS by LF_HOST_TYPE.

Now you can experiment with the various options that configure offers. You can do:

% make distclean

and reconfigure the package with one of the following variations in options:

% configure --disable-assert
% configure --with-warnings

or a combination of the above. You can also build a distribution of your hello world and feel cool about yourself:

% make distcheck

The important thing is that you can write extensive programs like this and stay focused on writing code instead of maintaining stupid header file, scripts, makefiles and all that.

Invoking `acmkdir'

The `acmkdir' utility can be invoked in the simple manner that we showed in the last chapter to prepare the directory tree for writing C++ code. Alternatively, it can be instructed to create directory trees for Fortran/C++ code as well as documentation directories.

In general, you invoke `acmkdir' in the following manner:

% acmkdir [OPTIONS] "dirname"

If you are creating a toplevel directory, then everything will appear under `dirname-0.1'. Otherwise, the name `dirname' will be used instead.

`acmkdir' supports the following options:

`--help'

Print a short message reminding the usage of the `acmkdir' command.

`--version'

Print the version information and copyright notice for `acmkdir'.

`-doc'

Instruct `acmkdir' to create a texidoc documentation directory. What to put under that directory will be explained in more detail on a separate chapter about documentation. If your package will have more than one documentation texts, you usually want to invoke this under the `doc' subdirectory:

% cd doc
% acmkdir -doc tutorial
% acmkdir -doc manual

Of course, the `Makefile.am' under the `doc' directory will need to refer to these subdirectories with a SUBDIRS entry:

SUBDIRS = tutorial manual

Alternatively, if you decide to use the `doc' directory itself for documentation (and you are massively sure about this), then you can

% rm -rf doc
% acmkdir -doc doc

Note that this is not the FSF standard way of handling documentation. This is an Autotools feature.

`-latex'

Instruct `acmkdir' to create a latex documentation directory. Again, the details of how to do this will be explained in a separate chapter. The disadvantage of using `latex' for your documentation is that you can only produce a printed book; you can not also generate on-line documentation. The advantage is that you can typeset very complex mathematics, something which you can not do under Texinfo since it only uses plain TeX. If you are documentating mathematical software, you may prefer to write the documentation in Latex. Autotools will provide you with LaTeX macros for making your documentation look like Texinfo documentation.

`-t, --type=TYPE'

Instruct `acmkdir' to create a top-level directory of type TYPE. The types available are: default, traditional, fortran. Eventually I may implement two additional types: f77, f90.

Now, a brief description of these toplevel types:

default

This is the default type of toplevel directory. It is intended for C++ programs and uses the LF macros installed by Autotools. The `acconfig.h' file is automagically generated and a custom `INSTALL' file is installed. The defaults reflect my own personal habits.

traditional

This is much closer to the FSF default habits. The default language is C, the traditional Autoconf macros are used and the `acconfig.h' file is not automatically generated, except for adding the lines

#undef PACKAGE
#undef VERSION

which are required by Automake.

fortran

This is a rather complicated type. It is intended for programs that mix C++ and Fortran. It installs an appropriate `configure.in', and creates an entire directory under the toplevel directory called `fortran'. In that directory, there's installed a copy of the f2c translator. The software is configured such that if a Fortran compiler is not available, f2c is built instead, and then used to compile the Fortran code. We will explain all about Fortran in the next chapter.

Handling Embedded text

In some cases, we want to embed text to the executable file of an application. This may be on-line help pages, or it may be a script of some sort that we intend to execute by an interpreter library that we are linking with, like Guile or Tcl. Whatever the reason, if we want to compile the application as a stand-alone executable, it is necessary to embed the text in the source code. Autotools provides with the build tools necessary to do this painlessly.

As a tutorial example, we will write a simple program that prints the contents of the GNU General Public License. First create the directory tree for the program:

% acmkdir copyleft

Enter the directory and create a copy of the txtc compiler:

% cd copyleft-0.1
% mktxtc

Then edit the file `configure.in' and add a call to the LF_PROG_TXTC macro. This macro depends on

AC_PROG_CC
AC_PROG_AWK

so make sure that these are invoked also. Finally add `txtc.sh' to your AC_OUTPUT. The end-result should look like this:

AC_INIT(reconf)
AM_CONFIG_HEADER(config.h)
AM_INIT_AUTOMAKE(copyleft,0.1)
LF_HOST_TYPE
LF_CONFIGURE_CC
LF_CONFIGURE_CXX
LF_SET_OPTIMIZATION
LF_SET_WARNINGS
AC_PROG_RANLIB
AC_PROG_AWK
LF_PROG_TXTC
AC_OUTPUT(Makefile txtc.sh doc/Makefile m4/Makefile src/Makefile)

Then, enter the `src' directory and create the following files:

% cd src
% gpl -l gpl.txt
% gpl -cc gpl.h
% gpl -cc copyleft.cc

The `gpl.txt' file is the text that we want to print. You can substitute it with any text you want. This file will be compiled into `gpl.o' during the build process. The `gpl.h' file is a header file that gives access to the symbols defined by `gpl.o'. The file `copyleft.cc' is where the main will be written.

Next, add content to these files as follows:

gpl.h

extern int gpl_txt_length;
extern char *gpl_txt[];

copyleft.cc

#ifdef HAVE_CONFIG_H
#include <config.h>
#endif

#include <iostream.h>
#include "gpl.h"
 
main()
{
 loop(i,1,gpl_txt_length)
 { cout << gpl_txt[i] << endl; }
}

Makefile.am

SUFFIXES = .txt
.txt.o:
       $(TXTC) $<
 
bin_PROGRAMS = copyleft
foo_SOURCES = copyleft.cc gpl.h gpl.txt

and now you're set to build. Go back to the toplevel directory and go for it:

$ cd ..
$ reconf
$ configure
$ make
$ src/copyleft | less

To verify that this works properly, do the following:

$ cd src
$ copyleft > copyleft.out 
$ diff gpl.txt copyleft.out

The two files should be identical. Finally, convince yourself that you can make a distribution:

$ make distcheck

and there you are.

Note that in general the text file, as encoded by the text compiler, will not be always identical to the original. There is one and only one modification being made: If any line has any blank spaces at the end, they are trimmed off. This feature was introduced to deal with a bug in the Tcl interpreter, and it is in general a good idea since it conserves a few bytes, it never hurts, and additional whitespace at the end of a line shouldn't really be there.

This magic is put together from many different directions. It begins with the LF_PROG_TXTC macro:

LF_PROG_TXTC

This macro will define the variable TXTC to point to a Text-to-C compiler. To create a copy of the compiler at the toplevel directory of your source code, use the mktxtc command:

% mktxtc

The compiler is implemented as a shell script, and it depends on sed, awk and the C compiler, so you should call the following two macros before invoking AC_PROG_TXTC:

AC_PROG_CC
AC_PROG_AWK

The compiler is intended to be used as follows:

$(TXTC) text1.txt text2.txt text3.txt ...

such that given the files `text1.txt', `text2.txt', etc. object files `text1.o', `text2.o', etc, are generated that contains the text from these files.

From the Automake point of view, you need to add the following two lines to Automake:

SUFFIXES = .txt
.txt.o:
        $(TXTC) $<

assuming that your text files will end in the .txt suffix. The first line informs Automake that there exist source files using non-standard suffixes. Then we describe, in terms of an abstract Makefile rule, how to build an object file from these non-standard suffixes. Recall the use of the symbol $<. Also note that it is not necessary to use $(srcdir) on $< for VPATH builds. If you embed more than one type of files, then you may want to use more than one suffixes. For example, you may have `.hlp' files containing online help and `.scm' files containing Guile code. Then you want to write a rule for each suffix as follows:

SUFFIXES = .hlp .scm
.hlp.o:
        $(TXTC) $<
.scm.o:
        $(TXTC) $<

It is important to put these lines before mentioning any SOURCES assignments. Automake is smart enough to parse these abstract makefile rules and recognize that files ending in these suffixes are valid source code that can be built to object code. This allows you to simply list `gpl.txt' with the other source files in the SOURCES assignment:

copyleft_SOURCES = copyleft.cc gpl.h gpl.txt

In order for this to work however, Automake must be able to see your abstract rules first.

When you "compile" a text file `foo.txt' this makes an object file that defines the following two symbols:

int foo_txt_length;
char *foo_txt[];

Note that the dot characters are converted into underscores. To make these symbols accessible, you need to define an appropriate header file with the following general form:

extern int foo_txt_length; 
extern char *foo_txt[];

When you include this header file into your other C or C++ files then:

You can obtain the filename containing the original text from
```
foo_txt[0];
```
and use it to print diagnostic messages.

You can obtain the text itself line by line:

char *foo_txt[1];   -> first line
char *foo_txt[2];   -> second line
...

The last line is set to NULL and foo_txt_length is defined such that
```
char *foo_txt[foo_txt_length+1] == NULL
```
The last line of the text is:
```
char *foo_txt[foo_txt_length];
```
You can use a for loop (or the loop macro defined by LF_CPP_PORTABILITY) together with foo_txt_length to loop over the entire text, or you can exploit the fact that the last line points to NULL and do a while loop.

and that's all there is to it.

Handling very deep packages

When making a package, you can organize it as a flat package or a deep package. In a flat package, all the source files are placed under src without any subdirectory structure. In a deep package, libraries and groups of executables are seperated by a subdirectory structure. The perennial problem with deep packages is dealing with interdirectory dependencies. What do you do if to compile one library you need header files from another library in another directory? What do you do if to compile the test suite of your library you need to link in another library that has just been compiled in a different directory?

One approach is to just put all these interdependent things in the same directory. This is not very unreasonable since the Makefile.am can document quite thoroughly where each file belongs, in case you need to split them up in the future. On the other hand, this solution becomes less and less preferable as your project grows. You may not want to clutter a directory with source code for too many different things. What do you do then?

The second approach is to be careful about these dependencies and just invoke the necessary features of Automake to make everything work out.

For *.a files (library binaries), the recommended thing to do is to link them by giving the full relative pathname. Doing that allows Automake to work out the dependencies correctly accross multiple directories. It also allows you to easily upgrade to shared libraries with Libtool. To retain some flexibility it may be best to list these interdirectory link sequences in variables and then use these variables. This way, when you move things around you minimize the amount of editing you have to do. In fact, if all you need these library binaries for is to build a test suite you can simply assign them to LDFLAGS. To make these assignments more uniform, you may want to start your pathnames with $(top_builddir).

For *.h files (header files), you can include an

INCLUDES = -I../dir1 -I../dir2 -I../dir3 ...

assignment on every `Makefile.am' of every directory level listing the directories that contain include files that you want to use. If your directory tree is very complicated, you may want to make these assignments more uniform by starting your pathnames from $(top_srcdir). In your source code, you should use the syntax

#include "foo.h"

for include files in the current directory and

#include <foo.h>

for include files in other directories.

There is a better third approach, provided by Autotools, but it only applies to include files. There is nothing more that can be done with library binaries; you simply have to give the path. But with header files, it is possible to arrange at configure-time that all header files are symlinked under the directory $(top_builddir)/include. Then you will only need to list one directory instead of many.

Autotools provides two Autoconf macros: LF_LINK_HEADERS and LF_SET_INCLUDES, to handle this symlinking.

LF_LINK_HEADERS

This macro links the public header files under a certain set of directories under an include directory from the toplevel. A simple way to invoke this macro is by listing the set of directories that contain public header files:

LF_LINK_HEADERS(src/dir1 src/dir2 src/dir3 ... src/dirN)

When this macro is invoked for the first time, the directory `$(top_srcdir)/include' is erased. Then for each directory `src/dirK' listed, we look for the file `src/dirK/Headers' and link the public header files mentioned in that file under `$(top_srcdir)/include'. The link will be either symbolic or hard, depending on the capabilities of your operating system. If possible, a symbolic link will be prefered. You can invoke the same macro by passing an optional argument that specifies a directory name. For example:

LF_LINK_HEADERS(src/dir1 src/dir2 ... src/dirN , foo)

Then the symlinks will be created under the `$(top_srcdir)/include/foo' directory instead. This can be significantly useful if you have very many header files to install and you'd like to call them something like:

#include <foo/file1.h>

During compilation, when you try to

LF_SET_INCLUDES

This macro will cause the `Makefile.am' variable $(default_includes) to contain the correct collection of -I flags, such that the include files are accessible. If you invoke it with no arguments as

LF_SET_INCLUDES

then the following assignment will take place:

default_includes = -I$(prefix) -I$(top_srcdir)/include

If you invoke it with arguments:

LF_SET_INCLUDES(dir1 dir2 ... dirN)

then the following assignment will take place instead:

default_includes = -I$(prefix) -I$(top_srcdir)/include/dir1 \
                   -I$(top_srcdir)/include/dir2 ...         \
                   -I$(top_srcdir)/include/dirN

You may use this variable as part of your INCLUDES assignment.

A typical use of this system involves invoking

LF_LINK_HEADERS(src/dir1 src/dir2 ... src/dirN)
LF_SET_INCLUDES

in your `configure.in' and adding the following two lines in your `Makefile.am':

INCLUDES = $(default_includes)
EXTRA_DIST = Headers

The variable $(default_includes) will be assigned by the configure script to point to the Right Thing. You will also need to include a file called `Headers' in every directory level that you mention in LF_LINK_HEADERS containing the public header files that you wish to symlink. The filenames need to be separated by carriage returns in the `Headers' file. You also need to mention these public header files in a

include_HEADERS = foo1.h foo2.h ...

assignment, in your `Makefile.am', to make sure that they are installed.

With this usage, other programs can access the installed header files as:

#include <foo1.h>

Other directories within the same package can access the uninstalled yet header files in exactly the same manner. Finally, in the same directory you should access the header files as

#include "foo1.h"

This will force the header file in the current directory to be installed, even when there is a similar header file already installed. This is very important when you are rebuilding a new version of an already installed library. Otherwise, building might be confused if your code tries to include the already installed, and not up-to-date, header files from the older version.

Alternatively, you can categorize the header files under a directory, by invoking

LF_LINK_HEADERS(src/dir1 src/dir2 , name1)
LF_LINK_HEADERS(src/dir3 src/dir4 , name2)
LF_SET_INCLUDES(name1 name2)

in your `configure.in'. In your `Makefile.am' files you still add the same two lines:

INCLUDES = $(default_includes)
EXTRA_DIST = Headers

and maintain the `Headers' file as before. However, now the header files will be symlinked to subdirectories of `$(top_srcdir)/include'. This means that although uninstalled header files in all directories must be included by code in the same directory as:

#include "header.h"

code in other directories must access these uninstalled header files as

#include <name1/header.h>

if the header file is under `src/dir1' or `src/dir2' or as

#include <name2/header.h>

if the header file is under `src/dir3' or `src/dir4'. It follows that you probably intend for these header files to be installed correspondingly in such a manner so that other programs can also include them the same way. To accomplish that, under `src/dir1' and `src/dir2' you should list the header files in your `Makefile.am' like this:

name1dir = $(includedir)/name1
name1_HEADERS = header.h ...

and under `src/dir3' and `src/dir4' like this:

name2dir = $(includedir)/name2
name2_HEADERS = header.h

One disadvantage of this approach is that the source tree is modified during configure-time, even during a VPATH build. Some may not like that, but it suits me just fine. Unfortunately, because Automake requires the GNU compiler to compute dependencies, the header files need to be placed in a constant location with respect to the rest of the source code. If a mkdep utility were to be distributed by Automake to compute dependencies when the installer installs the software and not when the developer builds a source code distribution, then it would be possible to allow the location of the header files to be dynamic. If that development ever takes place in Automake, Autotools will immediate follow. If you really don't like this, then don't use this feature.

Usually, if you are installing one or two header files per library you want them to be installed under $(includedir) and be includable with

#include <foo.h>

On the other hand, there are many applications that install a lot of header files, just for one library. In that case, you should put them under a prefix and let them be included as:

#include <prefix/foo.h>

Examples of libraries doing this X11 and Mesa.

This mechanism for tracking include files is most useful for very large projects. You may not want to bother for simple homework-like throwaway hacks. When a project starts to grow, it is very easy to switch.

Fortran with Autoconf

Introduction to Fortran support

This chapter is devoted to Fortran. We will show you how to build programs that combine Fortran and C or C++ code in a portable manner. The main reason for wanting to do this is because there is a lot of free software written in Fortran. If you browse `http://www.netlib.org/' you will find a repository of lots of old, archaic, but very reliable free sources. These programs encapsulate a lot of experience in numerical analysis research over the last couple of decades, which is crucial to getting work done. All of these sources have been written in Fortran. As a developer today, if you know other programming languages, it is unlikely that you will want to write original code in Fortran. You may need, however, to use legacy Fortran code, or the code of a neighbour who still writes in Fortran.

The most portable way to mix Fortran with your C/C++ programs is to translate the Fortran code to C with the `f2c' compiler and compile everything with a C/C++ compiler. The `f2c' compiler is available at `http://www.netlib.org/' but as we will soon explain, it is also distributed with the `autotools' package. Another alternative is to use the GNU Fortran compiler `g77' with `g++' and `gcc'. This compiler is portable among many platforms, so if you want to use a native Fortran compiler without sacrificing portability, this is one way to do it. Another way is to use your OS's native Fortran compiler, which is usually called `f77', if it is compatible with `g77' and `f77'. Because performance is also very important in numerical codes, a good strategy is to prefer to use the native compiler if it is compatible, and support `g77' as a fall-back option. Because many sysadmins don't install `g77' supporting `f2c' as a third fall-back is also a good idea.

Autotools provides support for configuring and building source code written in part or in whole in Fortran. The implementation is based on the build system used by GNU Octave, which has been generalized for use by any program.

Fortran compilers and linkage

The traditional Hello world program in Fortran looks like this:

c....:++++++++++++++=
      PROGRAM MAIN
      PRINT*,'Hello World!'
      END

All lines that begin with `c' are comments. The first line is the equivalent of main() in C. The second line says hello, and the third line indicates the end of the code. It is important that all command lines are indented by 7 spaces, otherwise the compiler will issue a syntax error. Also, if you want to be ANSI compliant, you must write your code all in caps. Nowadays most compilers don't care, but some may still do.

To compile this with `g77' (or `f77') you do something like:

% g77 -o hello hello.f
% hello

To compile it with the f2c translator:

% f2c hello.f
% gcc -o hello hello.c -lf2c -lm

where `-lf2c' links in the translator's system library. In order for this to work, you will have to make sure that the header file f2c.h is present since the translated code in `hello.c' includes it with a statement like

#include "f2c.h"

which explicitly requires it to be present in the current working directory.

In this case, the `main' is written in Fortran. However most of the Fortran you will be using will actually be subroutines and functions. A subroutine looks like this:

c....:++++++++++++++
      SUBROUTINE FHELLO (C)
      CHARACTER *(*) C
      PRINT*,'From Fortran: ',C
      RETURN
      END

This is the analog of a `void' function in C, because it takes arguments but doesn't return anything. The prototype declaration is K&R style: you list all the arguments in parenthesis, seperated with commas, and you declare the types of the variables in the subsequent lines.

Suppose that this subroutine is saved as `fhello.f'. To call it from C you need to know what it looks like from the point of the C compiler. To find out type:

% f2c -P fhello.f
% cat fhello.P

You will find that this subroutine has the following prototype declaration:

extern int fhello_(char *c__, ftnlen c_len);

It may come as a surprise, and this is a moment of revelation, but although in Fortran it appears that the subroutine is taking one argument, in C it appears that it takes two! And this is what makes it difficult to link code in a portable manner between C and Fortran. In C, everything is what it appears to be. If a function takes two arguments, then this means that down to the machine language level, there is two arguments that are being passed around. In Fortran, things are being hidden from you and done in a magic fashion. The Fortran programmer thinks that he is passing one argument, but the compiler compiles code that actually passes two arguments around. In this particular case, the reason for this is that the argument you are passing is a string. In Fortran, strings are not null-terminated, so the `f2c' compiler passes the length of the string as an extra hidden argument. This is called the linkage method of the compiler. Unfortunately, linkage in Fortran is not standard, and there exist compilers that handle strings differently. For example, some compilers will prepend the string with a few bytes containing the length and pass a pointer to the whole thing. This problem is not limitted to strings. It happens in many other instances. The `f2c' and `g77' compilers follow compatible linkage, and we will use this linkage as the ad-hoc standard. A few proprietary Fortran compilers like the Dec Alpha `f77' and the Irix `f77' are also `f2c'-compatible. The reason for this is because most of the compiler developers derived their code from `f2c'. So although a standard was not really intended, there we have one anyway.

A few things to note about the above prototype declaration is that the symbol `fhello' is in lower-case, even though in Fortran we write everything uppercase, and it is appended with an underscore. On some platforms, the proprietary Fortran compiler deviates from the `f2c' standard either by forcing the name to be in upper-case or by omitting the underscore. Fortunately, these cases can be detected with Autoconf and can be worked around with conditional compilation. However, beyond this, other portability problems, such as the strings issue, are too involved to deal with and it is best in these cases that you fall back to `f2c' or `g77'. A final thing to note is that although `fhello' doesn't return anything, it has return type `int' and not `void'. The reason for this is that `int' is the default return type for functions that are not declared. Therefore, to prevent compilation problems, in case the user forgets to declare a Fortran function, `f2c' uses `int' as the return type for subroutines.

In Fortran parlance, a subroutine is what we'd call a `void' function. To Fortran programmers in order for something to be a function it has to return something back. This reflects on the syntax. For example, here's a function that adds two numbers and returns the result:

c....:++++++++++++++++
      DOUBLE PRECISION FUNCTION ADD(A,B)
      DOUBLE PRECISION A,B
      ADD = A + B
      RETURN
      END

The name of the function is also the name of the return variable. If you run this one through `f2c -P' you will find that the C prototype is:

extern doublereal add_(doublereal *a, doublereal *b);

There's plenty of things to note here:

The typenames being used are funny. `doublereal'? what's that!? These are all defined in a header file called `f2c.h' which you are supposed to include in your source code before declaring any prototypes. We will show you how this is all done in the next section. The following table showes the types that are most likely to interest you. For more info, take a look at the `f2c.h' file itself:
```
  integer         -> int
  real            -> float
  doublereal      -> double
  complex         -> struct { real r,i; };
  doublecomplex   -> struct { doublereal r,i; };
```
The arguments are passed by pointer. In Fortran all arguments are passed by reference. The `f2c' compiler implements this by passing the arguments by pointer. On the C/C++ level you may want to wrap the fortran routine with another routine so that you don't have to directly deal with pointers all of the time.
The value returned now is not an `int' but `doublereal'. Of course, the name of the function is lower-case, as always, and there is an underscore at the end.

A more interesting case is when we deal with complex numbers. Consider a function that multiplies two complex numbers:

c....:++++++++++++++++++++++++++++++
      COMPLEX*16 FUNCTION MULT(A,B)
      COMPLEX*16 A,B
      MULT = A*B
      RETURN
      END

As it turns out, the prototype for this function is:

extern Z_f mult_(doublecomplex *ret_val, doublecomplex *a, doublecomplex *b);

Because complex numbers are not a native type in C, they can not be returned efficiently without going through at least one copy. Therefore, for this special case the return value is placed as the first argument in the prototype! Actually despite many people's feelings that Fortran must die, it is still the best tool to use to write optimized functions that are heavy on complex arithmetic.

Walkthrough a simple example

Now that we have brought up some of the issues about Fortran linkage, let's show you how to work around them. We will write a simple Fortran function, and a C program that calls it, and then show you how to turn these two into a GNU-like package, enhanced with a configure script and the works. This discussion assumes that you have installed the utilities in `autotools', the package with which this tutorial is being distributed.

First, begin by building a directory for your new package. Because this project will involve Fortran, you need to pass the `-f' flag to `acmkdir':

% acmkdir -t fortran foo

The `-t' flag directs `acmkdir' to unpack a copy of the `f2c' translator and to build proper toplevel `configure.in' and `Makefile.am' files. This will take a while, so relax and stretch a little bit.

Now enter the `foo-0.1' directory and look around:

% cd foo-0.1
% cat configure.in
AC_INIT
AM_CONFIG_HEADER(config.h)
AM_INIT_AUTOMAKE(hello,0.1)
LF_CONFIGURE_CC
LF_CONFIGURE_CXX
AC_PROG_RANLIB
LF_HOST_TYPE
LF_PROG_F77_PREFER_F2C_COMPATIBILITY
dnl LF_PROG_F77_PREFER_NATIVE_VERSION
LF_PROG_F77
LF_SET_WARNINGS
AC_CONFIG_SUBDIRS(fortran/f2c fortran/libf2c)
AC_OUTPUT([Makefile fortran/Makefile f2c_comp
        doc/Makefile m4/Makefile src/Makefile ])
   
% cat Makefile.am
EXTRA_DIST = reconf configure
SUBDIRS = fortran m4 doc src

There are some new macros in `configure.in' and a new subdirectory: `fortran'. There is also a file that looks like a shell script called `f2c_comp.in'. We will discuss the gory details about all this in the next section. Now let's write the code. Enter the `src' directory and type:

$ cd src
$ mkf2c

This creates the following files:

`f2c.h'

This is the header file that we alluded to in the previous section. It needs to be present on all directory levels that contain Fortran code. It defines all the funny typenames that appear in `f2c' compatible prototype declarations.

`f2c-main.c'

This file contains some silly definitions. You need to link it in whenever you link together a program, but don't add it to various libraries, because then, when you link some of the libraries together you will get error messages for duplicate symbols. The contents of this file are:

#ifdef __cplusplus
extern "C" {
#endif

#if defined (sun)
int MAIN_ () { return 0; }
#elif defined (linux) && defined(__ELF__)
int MAIN__ () { return 0; }
#endif

#ifdef __cplusplus
}
#endif

Now, time to write some code:

$ vi fhello.f
$ vi hello.cc

with

`fhello.f'

c....:++++++++++++++++++++++++++++++
      SUBROUTINE FHELLO (C)
      CHARACTER *(*) C
      PRINT*,'From Fortran: ',C
      RETURN
      END

`hello.cc'

#ifdef HAVE_CONFIG_H
#include <config.h>
#endif
#include <string.h>
#include "f2c.h"
#include "f77-fcn.h"

extern "C"
{
 extern int f77func(fhello,FHELLO)(char *c__, ftnlen c_len);
}

main()
{
 char s[30];
 strcpy(s,"Hello world!");
 f77func(fhello,FHELLO)(s,ftnlen(strlen(s)));
}

The definition of the f77func macro is included in `acconfig.h' automatically for you if the LF_CONFIGURE_FORTRAN macro is included in your `configure.in'. The definition is as follows:

#ifndef f77func
#if defined (F77_APPEND_UNDERSCORE)
#  if defined (F77_UPPERCASE_NAMES)
#    define f77func(f, F) F##_
#  else
#    define f77func(f, F) f##_
#  endif
#else
#  if defined (F77_UPPERCASE_NAMES)
#    define f77func(f, F) F
#  else
#    define f77func(f, F) f
#  endif
#endif
#endif

Recall that we said that the issue of whether to add an underscore and whether to capitalize the name of the routine can be dealt with conditional compilation. This macro is where this conditional compilation happens. The LF_PROG_F77 macro will define

F77_APPEND_UNDERSCORE
F77_UPPERCASE_NAMES

appropriately so that f77func does the right thing.

To compile this, create a `Makefile.am' as follows:

SUFFIXES = .f
.f.o:
        $(F77) -c $<
         
bin_PROGRAMS = hello
hello_SOURCES = hello.cc fhello.f f2c.h f2c-main.c
hello_LDADD = $(FLIBS)

Note that the above `Makefile.am' is only compatible with version 1.3 of Automake, or newer versions. The previous versions don't grok Fortran filenames on the `hello_SOURCES' so you may want to upgrade.

Now you can compile and run the program:

$ cd ..
$ reconf
$ configure
$ make
$ src/hello
 From Fortran: Hello world!

If you have a native `f77' compiler that was used, or the portable `g77' compiler you missed out the coolness of using `f2c'. In order to check that out do:

$ make distclean
$ configure --with-f2c
$ make

and witness the beauty! The package will begin by building an `f2c' binary for your system. Then it will build the Fortran libraries. And finally, it will build the hello world program which you can run as before:

$ src/hello

It may seem an overkill to carry around a Fortran compiler. On the other hand you will find it very convenient, and the `f2c' compiler isn't really that big. If you are spoiled on a system that is well equiped and with a good system administrator, you may find it a nasty surprise one day when you discover that the rest of the world is not necessarily like that.

If you download a real Fortran package from Netlib you might find it very annoying having to enter the filenames for all the Fortran files in `*_SOURCES'. A work-around is to put all these files in their own directory and then do this awk trick:

% ls *.f | awk '{ printf("%s ", $1) }' > tmp

The awk filter will line-up the output of ls in one line. You can use your editor to insert its contents to your `Makefile.am'. Eventually I may come around to write a utility for doing this automagically.

The gory details

The best way to get started is by building the initial directory tree with `acmkdir' like this:

% acmkdir -t fortran <directory-filename>

This will install all the standard stuff. It will also install a directory called `fortran' containing a copy of the f2c compiler and `f2c_comp', a shell script invoking the compiler in a way that it looks the same as invoking a real compiler

The file `configure.in' uses the following special macros:

LF_PROG_F77_PREFER_F2C_COMPATIBILITY

This macro directs Autoconf that the user prefers f2c compatibility over performance. In general Fortran programmers are willing to sacrifice everything for the sake of performance. However, if you want to use Fortran code with C and C++ code, you will have many reasons to also give importance to f2c compatibility. Use this macro to state this preference. The effect is that if the installer's platform has a native Fortran compiler installed, it will be used only if it is f2c compatible. This macro must be invoked before invoking LF_PROG_F77.

LF_PROG_F77_PREFER_NATIVE_VERSION

This macro directs Autoconf that the user prefers performance and doesn't care about f2c compatibility. You may want to invoke this instead if your entire program is written in Fortran. This macro must be invoked before invoking LF_PROG_F77.

LF_PROG_F77

This macro probes the installer platform for an appropriate Fortran compiler. It exports the following variables to Automake:

`F77': The name of the Fortran compiler
`FFLAGS': Flags for the Fortran compiler
`FLIBS': The link sequence for the compiler runtime libraries

It also checks whether the compiler appends underscores to the symbols and whether the symbols are written in lowercase or uppercase characters and defines the following preprocessor macros:

F77_APPEND_UNDERSCORE: Define if the compiler appends an underscore to the symbol names.
F77_UPPERCASE_NAMES: Define if the compiler uses uppercase for symbol names.

These macros are used to define `f77func' macro which takes two arguments; the name of the Fortan subroutine or function in lower case, and then in upper case, and returns the correct symbol name to use for invoking it from C or C++. To obtain the calling sequence for the symbol do:

% f2c -P foo.f

on the file containing the subroutine and examine the file `foo.P'. In order for this macro to work properly you must precede it with calls to

AC_PROG_CC
AC_PROG_RANLIB
LF_HOST_TYPE

You also need to call one of the two *_PREFER_* macros. The default is to prefer f2c compatibility.

In addition to invoking all of the above, you need to make provision for the bundled fortran compiler by adding the following lines at the end of your `configure.in':

AC_CONFIG_SUBDIRS(fortran/f2c fortran/libf2c)
AC_OUTPUT([Makefile fortran/Makefile f2c_comp
           doc/Makefile m4/Makefile src/Makefile])

The AC_CONFIG_SUBDIRS macro directs `configure' to execute the configure scripts in `fortran/f2c' and `fortran/libf2c'. The stuff in AC_OUTPUT that are important to Fortran support are building `fortran/Makefile' and `f2c_comp'. Because, `f2c_comp' is mention in AC_OUTPUT, Automake will automagically bundle it when you build a source code distribution.

If you have originally set up your directory tree for a C or C++ only project and later you realize that you need to also use Fortran, you can upgrade your directory tree to Fortran as follows:

Create the `fortran' directory by invoking
```
% mkfortran
```
and the `f2c_oomp' by invoking
```
% mkf2c_comp
```
both on the toplevel directory level.
Add the following macro invokations in the middle of `configure.in', in this order:
```
AC_PROG_CC
AC_PROG_RANLIB
LF_HOST_TYPE
LF_PROG_F77_PREFER_F2C_COMPATIBILITY
LF_PROG_F77
```
If you have invoked LF_CONFIGURE_CC then there is no need to invoke AC_PROG_CC again.
Add the following invokation just before AC_OUTPUT:
```
AC_CONFIG_SUBDIRS([fortran/f2c fortran/libf2c])
```
and add the following files to AC_OUTPUT:
```
fortran/Makefile f2c_comp
```
Rebuild with
```
% make distclean
% ./reconf
% ./configure
% make
```
It is important to call `reconf' for the changes to take effect.

If a directory level contains Fortran source code, then it is important to let Automake know about it by adding the following lines in the beginning.

SUFFIXES = .f
.f.o:
        $(F77) -c $<

This is pretty much the same idea with the embedded text compiler. You can list the Fortran source code filenames in the SOURCES assignments together with your C and C++ code. To link executables, you must add $(FLIBS) to LDADD and link against `f2c-main.c' just as in the hello world example. Please do not include `f2c-main.c' in any libraries however.

Now consider the file `hello.cc' line by line. First we include the standard configuration stuff:

#ifdef HAVE_CONFIG_H
#include <config.h>
#endif
#include <string.h>

Then we include the Fortran related header files:

#include "f2c.h"

Then we declare the prototypes for the Fortran subroutine:

extern "C"
{
 extern int f77func(fhello,FHELLO)(char *c__, ftnlen c_len);
}

There is a few things to note here:

You should never ever declare Fortran prototypes in header files. The definition of prototypes is dependent on the `f77func' macro, which is dependent on a correct definition of the `F77_APPEND_UNDERSCORE' and `F77_UPPERCASE_NAMES' C preprocessor macros. You should be nice and not force people who don't want to use Fortran to have to get these macros defined in order to use your header files. If you want to export functionality written in Fortran to C users, then wrap the Fortran subroutines and functions with corresponding C subroutines and functions and export these instead on your header file.
You should never use the actual name of the Fortran routine in C, because doing so is not portable. Instead you should use the `f77func' macro, which takes two arguments: the name of the routine in small letters, and the name of the routine in all-caps. The right name of the two is chosen, and the underscore is appended if necessary.
If the source file in which you are writing down the Fortran declarations is compiled with a C++ compiler, as is true in this case, then you must surround the declarations with:
```
extern "C"
{
}
```
The C++ language uses name mangling to support function overloading. This means that if you have two C++ functions called:
```
int foo(double x);
int foo(double x,double y);
```
the C++ compiler internally assigns them different names in an intelligent fashion to avoid conflict. Just like the Fortran compiler does things behind your back, so does the C++ compiler to support some of its special features. Any code written between `extern "C"' is compiled with name mangling disabled. This is necessary for the Fortran declarations because we don't want the names of the Fortran subroutines to be mangled.
When we actually invoke the Fortran subroutine in the `main' we make sure to type-cast all the types to what appears on the prototype above. In this case we do:
```
f77func(fhello,FHELLO)(s,ftnlen(strlen(s)));
```
This may seem pedantic but it is necessary for the C++ compiler, and it is a good habit even for C programmers. Since Fortran routines are supposed to be wrapped, this is not too much to ask.
You need to be very careful with integers. On some systems `long int' is 64 bit and `int' is 32 bit. This means that you should avoid `long int' like the plague! The safest way to deal with integers is to make sure that you always cast your integer stuff to integer explicitly. Unfortunately the standard header file distributed with f2c defines integer as long int to account for 16-bit machines. That's a bad idea, and on the 64-bit Dec Alpha it is a bug. The header file distributed with `mkf2c' does the right thing.
Make sure to list both `f2c.h' and `f2c-main.c' in SOURCES assignments on your `Makefile.am' to make sure that they are included in the source code distribution.

Portability problems with Fortran

Fortran is infested with portability problems. There exist two important Fortran standards: one that was written in 1966 and one that was written in 1977. The 1977 standard is considered to be the standard Fortran. Most of the Fortran code is written by scientists who have never had any formal training in computer programming. As a result, they often write code that is dependent on vendor-extensions to the standard, and not necessarily easy to port. The standard itself is to blame as well, since it is sorely lacking in many aspects. For example, even though standard Fortran has both REAL and DOUBLE PRECISION data types (corresponding to float and double) the standard only supports single precision complex numbers (COMPLEX). Since many people will also want double precision complex numbers, many vendors provided extensions. Most commonly, the double precision complex number is called COMPLEX*16 but you might also see it called DOUBLE COMPLEX. Other such vendors extensions include providing a flush operation of some sort for file I/O, and other such esoteric things.

To make things worse (or better) now there are two more standards out there: the 1990 standard and the 1995 standard. A 2000 standard is also at work. Fortran 90 and its successors try to make Fortran more like C and C++, and even though there are no free compilers for both variants, they are becoming alarmingly popular with the scientific community. In fact, I think that the main reason why these variants of Fortran are being developed is to make more bussiness for proprietary compiler developers. So far as I know, Fortran 90 does not provide any features that C++ can not support with a class library extension. Moreover Fortran 90 does not have the comprehensive foundation that allows C++ to be a self-extensible language. This makes it less worthwhile to invest effort on Fortran 90, because it means that eventually people will want features that can only be implemented by redefining the language and rewriting the compilers for it. Instead, in C++, you can add features to the language simply by writing C++ code, because it has enough core features to allow virtually unlimited self-extensibility.

If your primary interest is portability and free software, you should stay away from Fortran 90 as well as Fortran 95, until someone writes a free compiler for them. You will be better off developing in C++ and only migrating to Fortran 77 the parts that are performance critical. This way you get the best of both worlds.

On the flip side, if you limit your Fortran code just to number-crunching, then it becomes much easier to write portable code. There are still a few things you should take into account however. Some Fortran code has been written in the archaic 1966 style. An example of such code is the fftpack package from netlib. The main problems with such code are the following:

Implicit types: In Fortran 66, programmers were too lazy to define the types of their variables. The idea was that the type was inferred by the first letter of the variable name. That's horror for you! The convention then is that all variables with initial I,J,...,N are type INTEGER. All others are REAL To compile this code with modern compilers it is necessary to add the following line to every source file:
```
IMPLICIT DOUBLE PRECISION (A-H,O-Z)
```
This instructs the compiler to do the right thing, which is to implicitly assume that all variables starting with A-H and O-Z are double precision and all other variables are integers. Alternatively you can say
```
IMPLICIT REAL (A-H,O-Z)
```
but it is very rarely that you will ever want to go with single precision. Occasionally, you may find that the programmer breaks the rules. For example, in fftpack the array IFAC is supposed to be a double even though implicitly it is suggested to be an int. Such inconstances will probably show up in compiler errors. To fix them, declare the type of these variables explicitly. If it's an array then you do it like this:
```
DOUBLE PRECISION IFAC(*)
```
If the variable also appears in a DIMENSION declaration, then you should remove it from the declaration since the two can't coexist in some compilers.
Pseudo-pointers: In archaic Fortran, a dimension declaration of the form:
```
DIMENSION C(1)
```
means that C has an unknown length, instead of meaning that it has length 1. In modern Fortran, this is an unacceptable notation and modern compilers do get confused over it. So all such instances must be replaced with the correct form which is:
```
DIMENSION C(*)
```
Such "arrays" in reality are just pointers. The user can reference the array as far as he likes, but of course, if he takes it too far, the program will either do the Wrong Thing or crash with a segmentation fault.
Constants: A most insidious problem has to do with constants and it is confined, to the best of my knowledge, to the GNU Fortran compiler, but it could very well be a problem in other compilers to which I have no access to. Constants tend to appear in `DATA' statements or variable assignments. The problem is that whenever a constant is in use, the context is never a determining factor for the type of the constant, unlike C which does automatic casting. Examples: `1' is always type INTEGER, `9.435784839284958' is always type REAL (even if the additional precision specified is lost, and even when used in a `DOUBLE PRECISION' context such as being assigned to a `DOUBLE PRECISION' variable!). On the other hand, 1E0 is always REAL and 1D0 is always `DOUBLE PRECISION'. If you want your code to be exclusively double precision, then you should scan the entire source for constants, and make sure that they all have the D0 suffix at the end. Many compilers will tolerate this omission while others will not and go ahead and introduce single precision error to your computations leading to hard to find bugs.

In general the code in http://www.netlib.org/ is very reliable and portable, but you do need to keep your eyes open for little problems like the above.

The appendices

Philosophical issues

The GNU development tools were written primarily to aid the development of free software. Even though software development is mainly a technical issue, the free software movement has always be driven by many philosophical concerns as well.

In this appendix we include a few articles written by Richard Stallman that discuss these concerns. The text of these articles is copyrighted and included here with permission from the following terms:

Copying Notice

Copyright (C) 1998 Free Software Foundation Inc
59 Temple Place, Suite 330, Boston, MA 02111, USA
Verbatim copying and distribution is permitted in any medium,
provided this notice is preserved.

With the advent of the Linux movement, many people nowadays use free software without being informed of the philosophy and culture and its importance. It is our hope that by including some of these articles here, we'll help spread the word.

All of these articles, and others are also distributed on the web at:
@uref{http://www.gnu.org/philosophy/index.html}

Why software should not have owners

Digital information technology contributes to the world by making it easier to copy and modify information. Computers promise to make this easier for all of us.

Not everyone wants it to be easier. The system of copyright gives software programs "owners", most of whom aim to withhold software's potential benefit from the rest of the public. They would like to be the only ones who can copy and modify the software that we use.

The copyright system grew up with printing--a technology for mass production copying. Copyright fit in well with this technology because it restricted only the mass producers of copies. It did not take freedom away from readers of books. An ordinary reader, who did not own a printing press, could copy books only with pen and ink, and few readers were sued for that.

Digital technology is more flexible than the printing press: when information has digital form, you can easily copy it to share it with others. This very flexibility makes a bad fit with a system like copyright. That's the reason for the increasingly nasty and draconian measures now used to enforce software copyright. Consider these four practices of the Software Publishers Association (SPA):

Massive propaganda saying it is wrong to disobey the owners to help your friend.
Solicitation for stool pigeons to inform on their coworkers and colleagues.
Raids (with police help) on offices and schools, in which people are told they must prove they are innocent of illegal copying.
Prosecution (by the US government, at the SPA's request) of people such as MIT's David LaMacchia, not for copying software (he is not accused of copying any), but merely for leaving copying facilities unguarded and failing to censor their use.

All four practices resemble those used in the former Soviet Union, where every copying machine had a guard to prevent forbidden copying, and where individuals had to copy information secretly and pass it from hand to hand as "samizdat". There is of course a difference: the motive for information control in the Soviet Union was political; in the US the motive is profit. But it is the actions that affect us, not the motive. Any attempt to block the sharing of information, no matter why, leads to the same methods and the same harshness.

Owners make several kinds of arguments for giving them the power to control how we use information:

Name calling: Owners use smear words such as "piracy" and "theft", as well as expert terminology such as "intellectual property" and "damage", to suggest a certain line of thinking to the public--a simplistic analogy between programs and physical objects. Our ideas and intuitions about property for material objects are about whether it is right to take an object away from someone else. They don't directly apply to making a copy of something. But the owners ask us to apply them anyway.
Exaggeration: Owners say that they suffer "harm" or "economic loss" when users copy programs themselves. But the copying has no direct effect on the owner, and it harms no one. The owner can lose only if the person who made the copy would otherwise have paid for one from the owner. A little thought shows that most such people would not have bought copies. Yet the owners compute their "losses" as if each and every one would have bought a copy. That is exaggeration--to put it kindly.
The law: Owners often describe the current state of the law, and the harsh penalties they can threaten us with. Implicit in this approach is the suggestion that today's law reflects an unquestionable view of morality--yet at the same time, we are urged to regard these penalties as facts of nature that can't be blamed on anyone. This line of persuasion isn't designed to stand up to critical thinking; it's intended to reinforce a habitual mental pathway. It's elementary that laws don't decide right and wrong. Every American should know that, forty years ago, it was against the law in many states for a black person to sit in the front of a bus; but only racists would say sitting there was wrong.
Natural rights: Authors often claim a special connection with programs they have written, and go on to assert that, as a result, their desires and interests concerning the program simply outweigh those of anyone else--or even those of the whole rest of the world. (Typically companies, not authors, hold the copyrights on software, but we are expected to ignore this discrepancy.) To those who propose this as an ethical axiom--the author is more important than you--I can only say that I, a notable software author myself, call it bunk. But people in general are only likely to feel any sympathy with the natural rights claims for two reasons. One reason is an overstretched analogy with material objects. When I cook spaghetti, I do object if someone else eats it, because then I cannot eat it. His action hurts me exactly as much as it benefits him; only one of us can eat the spaghetti, so the question is, which? The smallest distinction between us is enough to tip the ethical balance. But whether you run or change a program I wrote affects you directly and me only indirectly. Whether you give a copy to your friend affects you and your friend much more than it affects me. I shouldn't have the power to tell you not to do these things. No one should. The second reason is that people have been told that natural rights for authors is the accepted and unquestioned tradition of our society. As a matter of history, the opposite is true. The idea of natural rights of authors was proposed and decisively rejected when the US Constitution was drawn up. That's why the Constitution only permits a system of copyright and does not require one; that's why it says that copyright must be temporary. It also states that the purpose of copyright is to promote progress--not to reward authors. Copyright does reward authors somewhat, and publishers more, but that is intended as a means of modifying their behavior. The real established tradition of our society is that copyright cuts into the natural rights of the public--and that this can only be justified for the public's sake.
Economics The final argument made for having owners of software is that this leads to production of more software. Unlike the others, this argument at least takes a legitimate approach to the subject. It is based on a valid goal--satisfying the users of software. And it is empirically clear that people will produce more of something if they are well paid for doing so. But the economic argument has a flaw: it is based on the assumption that the difference is only a matter of how much money we have to pay. It assumes that "production of software" is what we want, whether the software has owners or not. People readily accept this assumption because it accords with our experiences with material objects. Consider a sandwich, for instance. You might well be able to get an equivalent sandwich either free or for a price. If so, the amount you pay is the only difference. Whether or not you have to buy it, the sandwich has the same taste, the same nutritional value, and in either case you can only eat it once. Whether you get the sandwich from an owner or not cannot directly affect anything but the amount of money you have afterwards. This is true for any kind of material object--whether or not it has an owner does not directly affect what it is, or what you can do with it if you acquire it. But if a program has an owner, this very much affects what it is, and what you can do with a copy if you buy one. The difference is not just a matter of money. The system of owners of software encourages software owners to produce something--but not what society really needs. And it causes intangible ethical pollution that affects us all. What does society need? It needs information that is truly available to its citizens--for example, programs that people can read, fix, adapt, and improve, not just operate. But what software owners typically deliver is a black box that we can't study or change. Society also needs freedom. When a program has an owner, the users lose freedom to control part of their own lives. And above all society needs to encourage the spirit of voluntary cooperation in its citizens. When software owners tell us that helping our neighbors in a natural way is "piracy", they pollute our society's civic spirit. This is why we say that free software is a matter of freedom, not price. The economic argument for owners is erroneous, but the economic issue is real. Some people write useful software for the pleasure of writing it or for admiration and love; but if we want more software than those people write, we need to raise funds. For ten years now, free software developers have tried various methods of finding funds, with some success. There's no need to make anyone rich; the median US family income, around $35k, proves to be enough incentive for many jobs that are less satisfying than programming. For years, until a fellowship made it unnecessary, I made a living from custom enhancements of the free software I had written. Each enhancement was added to the standard released version and thus eventually became available to the general public. Clients paid me so that I would work on the enhancements they wanted, rather than on the features I would otherwise have considered highest priority. The Free Software Foundation (FSF), a tax-exempt charity for free software development, raises funds by selling GNU CD-ROMs, T-shirts, manuals, and deluxe distributions, (all of which users are free to copy and change), as well as from donations. It now has a staff of five programmers, plus three employees who handle mail orders. Some free software developers make money by selling support services. Cygnus Support, with around 50 employees [when this article was written], estimates that about 15 per cent of its staff activity is free software development--a respectable percentage for a software company. Companies including Intel, Motorola, Texas Instruments and Analog Devices have combined to fund the continued development of the free GNU compiler for the language C. Meanwhile, the GNU compiler for the Ada language is being funded by the US Air Force, which believes this is the most cost-effective way to get a high quality compiler. [Air Force funding ended some time ago; the GNU Ada Compiler is now in service, and its maintenance is funded commercially.] All these examples are small; the free software movement is still small, and still young. But the example of listener-supported radio in this country [the US] shows it's possible to support a large activity without forcing each user to pay.

As a computer user today, you may find yourself using a proprietary program. If your friend asks to make a copy, it would be wrong to refuse. Cooperation is more important than copyright. But underground, closet cooperation does not make for a good society. A person should aspire to live an upright life openly with pride, and this means saying "No" to proprietary software.

You deserve to be able to cooperate openly and freely with other people who use software. You deserve to be able to learn how the software works, and to teach your students with it. You deserve to be able to hire your favorite programmer to fix it when it breaks.

You deserve free software.

Categories of software

Here is a glossary of various categories of software that are often mentioned in discussions of free software. It explains which categories overlap or are part of other categories.

Free software: Free software is software that comes with permission for anyone to use, copy, and distribute, either verbatim or with modifications, either gratis or for a fee. In particular, this means that source code must be available. "If it's not source, it's not software." If a program is free, then it can potentially be included in a free operating system such as GNU, or free GNU/Linux systems . There are many different ways to make a program free--many questions of detail, which could be decided in more than one way and still make the program free. Some of the possible variations are described below. Free software is a matter of freedom, not price. But proprietary software companies sometimes use the term "free software" to refer to price. Sometimes they mean that you can obtain a binary copy at no charge; sometimes they mean that a copy is included on a computer that you are buying. This has nothing to do with what we mean by free software in the GNU project. Because of this potential confusion, when a software company says its product is free software, always check the actual distribution terms to see whether users really have all the freedoms that free software implies. Sometimes it really is free software; sometimes it isn't. Many languages have two separate words for "free" as in freedom and "free" as in zero price. For example, French has "libre" and "gratuit". English has a word "gratis" that refers unambiguously to price, but no common adjective that refers unambiguously to freedom. This is unfortunate, because such a word would be useful here. Free software is often more reliable than non-free software.
Open Source software: The term "open source" software is used by some people to mean more or less the same thing as free software.
Public domain software: Public domain software is software that is not copyrighted. It is a special case of non-copylefted free software, which means that some copies or modified versions may not be free at all. Sometimes people use the term "public domain" in a loose fashion to mean "free" or "available gratis." However, "public domain" is a legal term and means, precisely, "not copyrighted". For clarity, we recommend using "public domain" for that meaning only, and using other terms to convey the other meanings.
Copylefted software: Copylefted software is free software whose distribution terms do not let redistributors add any additional restrictions when they redistribute or modify the software. This means that every copy of the software, even if it has been modified, must be free software. In the GNU Project, we copyleft almost all the software we write, because our goal is to give every user the freedoms implied by the term "free software." See Copylefted for more explanation of how copyleft works and why we use it. Copyleft is a general concept; to actually copyleft a program, you need to use a specific set of distribution terms. There are many possible ways to write copyleft distribution terms.
Non-copylefted free software: Non-copylefted free software comes from the author with permission to redistribute and modify, and also to add additional restrictions to it. If a program is free but not copylefted, then some copies or modified versions may not be free at all. A software company can compile the program, with or without modifications, and distribute the executable file as a proprietary software product. The X Window System illustrates this. The X Consortium releases X11 with distribution terms that make it non-copylefted free software. If you wish, you can get a copy which has those distribution terms and is free. However, there are non-free versions as well, and there are popular workstations and PC graphics boards for which non-free versions are the only ones that work. If you are using this hardware, X11 is not free software for you.
GPL-covered software: The GNU GPL is one specific set of distribution terms for copylefting a program. The GNU Project uses it as the distribution terms for most GNU software.
The GNU system: The GNU system is a complete free Unix-like operating system. A Unix-like operating system consists of many programs. We have been accumulating components for this system since 1984; the first test release of a "complete GNU system" was in 1996. We hope that in a year or so this system will be mature enough to recommend it for ordinary users. The GNU system includes all the GNU software, as well as many other packages such as the X Window System and TeX which are not GNU software. Since the purpose of GNU is to be free, every single component in the GNU system has to be free software. They don't all have to be copylefted, however; any kind of free software is legally suitable to include if it helps meet technical goals. We can and do use non-copylefted free software such as the X Window System.
GNU software: GNU software is software that is released under the auspices of the GNU Project. Most GNU software is copylefted, but not all; however, all GNU software must be free software. Some GNU software is written by staff of the Free Software Foundation, but most GNU software is contributed by volunteers. Some contributed software is copyrighted by the Free Software Foundation; some is copyrighted by the contributors who wrote it.
Semi-free software: Semi-free software is software that is not free, but comes with permission for individuals to use, copy, distribute, and modify (including distribution of modified versions) for non-profit purposes. PGP is an example of a semi-free program. Semi-free software is much better than proprietary software, but it still poses problems, and we cannot use it in a free operating system. The restrictions of copyleft are designed to protect the essential freedoms for all users. For us, the only justification for any substantive restriction on using a program is to prevent other people from adding other restrictions. Semi-free programs have additional restrictions, motivated by purely selfish goals. It is impossible to include semi-free software in a free operating system. This is because the distribution terms for the operating system as a whole are the conjunction of the distribution terms for all the programs in it. Adding one semi-free program to the system would make the system as a whole just semi-free. There are two reasons we do not want that to happen:
- We believe that free software should be for everyone--including businesses, not just schools and hobbyists. We want to invite business to use the whole GNU system, and therefore we must not include a semi-free program in it.
- Commercial distribution of free operating systems, including Linux-based GNU systems, is very important, and users appreciate being able to buy commercial CD-ROM distributions. Including one semi-free program in an operating system would cut off commercial CD-ROM distribution for it.
The Free Software Foundation itself is non-commercial, and therefore we would be legally permitted to use a semi-free program "internally". But we don't do that, because that would undermine our efforts to obtain a program which we could also include in GNU. If there is a job that needs doing with software, then until we have a free program to do the job, the GNU system has a gap. We have to tell volunteers, "We don't have a program yet to do this job in GNU, so we hope you will write one." If we ourselves used a semi-free program to do the job, that would undermine what we say; it would take away the impetus (on us, and on others who might listen to our views) to write a free replacement. So we don't do that.
Proprietary software: Proprietary software is software that is not free or semi-free. Its use, redistribution or modification is prohibited, or requires you to ask for permission, or is restricted so much that you effectively can't do it freely. The Free Software Foundation follows the rule that we cannot install any proprietary program on our computers except temporarily for the specific purpose of writing a free replacement for that very program. Aside from that, we feel there is no possible excuse for installing a proprietary program. For example, we felt justified in installing Unix on our computer in the 1980s, because we were using it to write a free replacement for Unix. Nowadays, since free operating systems are available, the excuse is no longer applicable; we have eliminated all our non-free operating systems, and any new computer we install must run a completely free operating system. We don't insist that users of GNU, or contributors to GNU, have to live by this rule. It is a rule we made for ourselves. But we hope you will decide to follow it too.
Freeware: The term "freeware" has no clear accepted definition, but it is commonly used for packages which permit redistribution but not modification (and their source code is not available). These packages are not free software, so please don't use "freeware" to refer to free software.
Shareware: Shareware is software which comes with permission for people to redistribute copies, but says that anyone who continues to use a copy is required to pay a license fee. Shareware is not free software, or even semi-free. There are two reasons it is not:
- For most shareware, source code is not available; thus you cannot modiy the program at all.
- Shareware does not come with permission to make a copy and install it without paying a license fee, not even for individuals engaging in nonprofit activity. (In practice, people often disregard the distribution terms and do this anyway, but the terms don't permit it.)
Commercial Software: Commercial software is software being developed by a business which aims to make money from the use of the software. "Commercial" and "proprietary" are not the same thing! Most commercial software is proprietary , but there is commercial free software, and there is non-commercial non-free software. For example, GNU Ada is always distributed under the terms of the GNU GPL, and every copy is free software; but its developers sell support contracts. When their salesmen speak to prospective customers, sometimes the customers say, "We would feel safer with a commercial compiler." The salesmen reply, "GNU Ada is a commercial compiler; it happens to be free software." For the GNU Project, the emphasis is in the other order: the important thing is that GNU Ada is free software; whether it is commercial is not a crucial question. However, the additional development of GNU Ada that results from the business that supports it is definitely beneficial.

Confusing words

There are a number of words and phrases which we recommend avoiding, either because they are ambiguous or because they imply an opinion that we hope you may not entirely agree with.

For free: If you want to say that a program is free software, please don't say that it is available "for free." That term specifically means "for zero price." Free software is a matter of freedom, not price. Free software is often available for free--for example, on many FTP servers. But free software copies are also available for a price on CD-ROMs, and proprietary software copies may occasionally be available for free.
Freeware: Please don't use the term "freeware" as a synonym for "free software." The term "freeware" was used often in the 1980s for programs released only as executables, with source code not available. Today it has no clear definition.
Give away software: It's misleading to use the term "give away" to mean "distribute a program as free software." It has the same problem as "for free": it implies the issue is price, not freedom.
Intellectual property: Publishers and lawyers like to describe copyright as "intellectual property." This term carries a hidden assumption--that the most natural way to think about the issue of copying is based on an analogy with physical objects, and our ideas of them as property. But this analogy overlooks the crucial difference between material objects and information: information can be copied and shared almost effortlessly, while material objects can't be. Basing your thinking on this analogy is tantamount to ignoring that difference. Even the US legal system does not entirely accept this analogy, since it does not treat copyrights just like physical object property rights. If you don't want to limit yourself to this way of thinking, it is best to avoid using the term "intellectual property" in your words and thoughts. A suggestion has been made to use the term "intellectual policy" instead of `intellectual property."
Piracy: Publishers often refer to prohibited copying as "piracy." In this way, they imply that illegal copying is ethically equivalent to attacking ships on the high seas, kidnaping and murdering the people on them. If you don't believe that illegal copying is just like kidnaping and murder, you might prefer not to use the word "piracy" to describe it. Neutral terms such as "prohibited copying" or "illegal copying" are available for use instead. Some of us might even prefer to use a positive term such as "sharing information with your neighbor."
Protection: Publishers' lawyers love to use the term "protection" to describe copyright. This word carries the implication of preventing destruction or suffering; therefore, it encourages people to identify with the owner and publisher who benefit from copyright, rather than with the users who are restricted by it. It is easy to avoid "protection" and use neutral terms instead. For example, instead of "Copyright protection lasts a very long time," you can say, "Copyright lasts a very long time."
Sell software: The term "sell software" is ambiguous. Strictly speaking, exchanging a copy of a free program for a sum of money is "selling"; but people usually associate the term "sell" with proprietary restrictions on the subsequent use of the software. You can be more precise, and prevent confusion, by saying either "distributing copies of a program for a fee" or "imposing proprietary restrictions on the use of a program," depending on what you mean.
Theft: Copyright apologists often use words like "stolen" and "theft" to describe copyright infringement. At the same time, they ask us to treat the legal system as an authority on ethics: if copying is forbidden, it must be wrong. So it is pertinent to mention that the legal system--at least in the US--rejects the idea that copyright infringement is "theft". Copyright advocates who use terms like "stolen" are misrepresenting the authority that they appeal to. The idea that laws decide what is right or wrong is mistaken in general. Laws are, at their best, an attempt to achieve justice; to say that laws define justice or ethical conduct is turning things upside down.

The X Windows Trap

To copyleft or not to copyleft? That is one of the major controversies in the free software community. The idea of copyleft is that we should fight fire with fire--that we should use copyright to make sure our code stays free. The GNU GPL is one example of a copyleft license.

Some free software developers prefer non-copyleft distribution. Non-copyleft licenses such as the XFree86 and BSD licenses are based on the idea of never saying no to anyone--not even to someone who seeks to use your work as the basis for restricting other people. Non-copyleft licensing does nothing wrong, but it misses the opportunity to actively protect our freedom to change and redistribute software. For that, we need copyleft.

For many years, the X Consortium was the chief opponent of copyleft. It exerted both moral suasion and pressure to discourage free software developers from copylefting their programs. It used moral suasion by suggesting that it is not nice to say no. It used pressure through its rule that copylefted software could not be in the X Distribution.

Why did the X Consortium adopt this policy? It had to do with their definition of success. The X Consortium defined success as popularity--specifically, getting computer companies to use X Windows. This definition put the computer companies in the driver's seat. Whatever they wanted, the X Consortium had to help them get it.

Computer companies normally distribute proprietary software. They wanted free software developers to donate their work for such use. If they had asked for this directly, people would have laughed. But the X Consortium, fronting for them, could present this request as an unselfish one. "Join us in donating our work to proprietary software developers," they said, suggesting that this is a noble form of self-sacrifice. "Join us in achieving popularity", they said, suggesting that it was not even a sacrifice.

But self-sacrifice is not the issue: tossing away the defenses of copyleft, which protect the freedom of everyone in the community, is sacrificing more than yourself. Those who granted the X Consortium's request entrusted the community's future to the good will of the X Consortium.

This trust was misplaced. In its last year, the X Consortium made a plan to restrict the forthcoming X11R6.4 release so that it will not be free software. They decided to start saying no, not only to proprietary software developers, but to our community as well.

There is an irony here. If you said yes when the X Consortium asked you not to use copyleft, you put the X Consortium in a position to license and restrict its version of your program, along with its own code.

Te X Consortium did not carry out this plan. Instead it closed down and transferred X development to the Open Group, whose staff are now carrying out a similar plan. To give them credit, when I asked them to release X11R6.4 under the GNU GPL in parallel with their planned restrictive license, they were willing to consider the idea. (They were firmly against staying with the old X11 distribution terms.) Before they said yes or no to this proposal, it had already failed for another reason: the XFree86 group follows the X Consortium's old policy, and will not accept copylefted software.

Even if the X Consortium and the Open Group had never planned to restrict X, someone else could have done it. Non-copylefted software is vulnerable from all directions; it lets anyone make a non-free version dominant, if he will invest sufficient resources to add some important feature using proprietary code. Users who choose software based on technical characteristics, rather than on freedom, could easily be lured to the non-free version for short term convenience.

The X Consortium and Open Group can no longer exert moral suasion by saying that it is wrong to say no. This will make it easier to decide to copyleft your X-related software.

When you work on the core of X, on programs such as the X server, Xlib, and Xt, there is a practical reason not to use copyleft. The XFree86 group does an important job for the community in maintaining these programs, and the benefit of copylefting our changes would be less than the harm done by a fork in development. So it is better to work with the XFree86 group and not copyleft our changes on these programs. Likewise for utilities such as xset and xrdb, which are close to the core of X, and which do not need major improvements. At least we know that the XFree86 group has a firm commitment to developing these programs as free software.

The issue is different for programs outside the core of X: applications, window managers, and additional libraries and widgets. There is no reason not to copyleft them, and we should copyleft them.

In case anyone feels the pressure exerted by the criteria for inclusion in X Distributions, the GNU project will undertake to publicize copylefted packages that work with X. If you would like to copyleft something, and you worry that its omission from X Distributions will impede its popularity, please ask us to help.

At the same time, it is better if we do not feel too much need for popularity. When a businessman tempts you with "more popularity", he may try to convince you that his use of your program is crucial to its success. Don't believe it! If your program is good, it will find many users anyway; you don't need to feel desperate for any particular users, and you will be stronger if you do not. You can get an indescribable sense of joy and freedom by responding, "Take it or leave it--that's no skin off my back." Often the businessman will turn around and accept the program with copyleft, once you call the bluff.

Friends, free software developers, don't repeat a mistake. If we do not copyleft our software, we put its future at the mercy of anyone equipped with more resources than scruples. With copyleft, we can defend freedom, not just for ourselves, but for our whole community.

Why free software needs free documentation

The biggest deficiency in free operating systems is not in the software--it is the lack of good free manuals that we can include in these systems. Many of our most important programs do not come with full manuals. Documentation is an essential part of any software package; when an important free software package does not come with a free manual, that is a major gap. We have many such gaps today.

Once upon a time, many years ago, I thought I would learn Perl. I got a copy of a free manual, but I found it hard to read. When I asked Perl users about alternatives, they told me that there were better introductory manuals--but those were not free.

Why was this? The authors of the good manuals had written them for O'Reilly Associates, which published them with restrictive terms--no copying, no modification, source files not available--which exclude them from the free software community.

That wasn't the first time this sort of thing has happened, and (to our community's great loss) it was far from the last. Proprietary manual publishers have enticed a great many authors to restrict their manuals since then. Many times I have heard a GNU user eagerly tell me about a manual that he is writing, with which he expects to help the GNU project--and then had my hopes dashed, as he proceeded to explain that he had signed a contract with a publisher that would restrict it so that we cannot use it.

Given that writing good English is a rare skill among programmers, we can ill afford to lose manuals this way.

Free documentation, like free software, is a matter of freedom, not price. The problem with these manuals was not that O'Reilly Associates charged a price for printed copies--that in itself is fine. (The Free Software Foundation sells printed copies of free GNU manuals, too.) But GNU manuals are available in source code form, while these manuals are available only on paper. GNU manuals come with permission to copy and modify; the Perl manuals do not. These restrictions are the problems.

The criterion for a free manual is pretty much the same as for free software: it is a matter of giving all users certain freedoms. Redistribution (including commercial redistribution) must be permitted, so that the manual can accompany every copy of the program, on-line or on paper. Permission for modification is crucial too.

As a general rule, I don't believe that it is essential for people to have permission to modify all sorts of articles and books. The issues for writings are not necessarily the same as those for software. For example, I don't think you or I are obliged to give permission to modify articles like this one, which describe our actions and our views.

But there is a particular reason why the freedom to modify is crucial for documentation for free software. When people exercise their right to modify the software, and add or change its features, if they are conscientious they will change the manual too--so they can provide accurate and usable documentation with the modified program. A manual which forbids programmers to be conscientious and finish the job, or more precisely requires them to write a new manual from scratch if they change the program, does not fill our community's needs.

While a blanket prohibition on modification is unacceptable, some kinds of limits on the method of modification pose no problem. For example, requirements to preserve the original author's copyright notice, the distribution terms, or the list of authors, are ok. It is also no problem to require modified versions to include notice that they were modified, even to have entire sections that may not be deleted or changed, as long as these sections deal with nontechnical topics. (Some GNU manuals have them.)

These kinds of restrictions are not a problem because, as a practical matter, they don't stop the conscientious programmer from adapting the manual to fit the modified program. In other words, they don't block the free software community from doing its thing with the program and the manual together.

However, it must be possible to modify all the technical content of the manual; otherwise, the restrictions do block the community, the manual is not free, and so we need another manual.

Unfortunately, it is often hard to find someone to write another manual when a proprietary manual exists. The obstacle is that many users think that a proprietary manual is good enough--so they don't see the need to write a free manual. They do not see that the free operating system has a gap that needs filling.

Why do users think that proprietary manuals are good enough? Some have not considered the issue. I hope this article will do something to change that.

Other users consider proprietary manuals acceptable for the same reason so many people consider proprietary software acceptable: they judge in purely practical terms, not using freedom as a criterion. These people are entitled to their opinions, but since those opinions spring from values which do not include freedom, they are no guide for those of us who do value freedom.

Please spread the word about this issue. We continue to lose manuals to proprietary publishing. If we spread the word that proprietary manuals are not sufficient, perhaps the next person who wants to help GNU by writing documentation will realize, before it is too late, that he must above all make it free.

We can also encourage commercial publishers to sell free, copylefted manuals instead of proprietary ones. One way you can help this is to check the distribution terms of a manual before you buy it, and prefer copylefted manuals to non-copylefted ones.

This document was generated on 22 August 1998 using the texi2html translator version 1.51.