

Choosing Files and Names for tar

@UNREVISED

@FIXME{Melissa (still) Doesn't Really Like This "Intro" Paragraph!!!}

Certain options to tar enable you to specify a name for your archive. Other options let you decide which files to include or exclude from the archive, based on when or whether files were modified, whether the file names do or don't match specified patterns, or whether files are in specified directories.

Choosing and Naming Archive Files

@UNREVISED

@FIXME{should the title of this section actually be, "naming an archive"?}

By default, tar uses an archive file name that was compiled in when it was built on the system; usually this name refers to some physical tape drive on the machine. However, the person who installed tar on the system may not have set the default to a value that is meaningful to most users. As a result, you will usually want to tell tar where to find (or create) the archive. The --file=archive-name (-f archive-name) option lets you name the file to use as the archive instead of the default archive file location.

--file=archive-name
-f archive-name
Name the archive to create or operate on. Use in conjunction with any operation.

For example, in this tar command,

$ tar -cvf collection.tar blues folk jazz

`collection.tar' is the name of the archive. It must directly follow the `-f' option, since whatever directly follows `-f' will end up naming the archive. If you neglect to specify an archive name, you may end up overwriting a file in the working directory with the archive you create, since tar will use that file's name as the archive name. (Had `collection.tar' been left out of the command above, tar would have taken `blues' as the archive name.)

An archive can be saved as a file in the file system, sent through a pipe or over a network, or written to an I/O device such as a tape, floppy disk, or CD write drive.

If you do not name the archive, tar uses the value of the environment variable TAPE as the file name for the archive. If that is not available, tar uses a default, compiled-in archive name, usually that for tape unit zero (i.e., `/dev/tu00'). tar always needs an archive name.
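As a rough illustration of the TAPE fallback, the following sketch creates an archive without any `-f' option. The directory `blues', the file inside it, and the archive name `collection.tar' are made up for the example, and everything happens in a scratch directory.

```shell
# Sketch: name the archive through TAPE instead of --file.
# All file and directory names here are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir blues
echo "twelve bars" > blues/song.txt

# No -f option is given, so tar falls back to the TAPE variable.
TAPE="$work/collection.tar" tar -c blues

# Naming the same file explicitly with -f reads the archive back.
tape_listing=$(tar -t -f collection.tar)
echo "$tape_listing"
```

Setting TAPE only for the one command, as here, avoids changing the default for later tar invocations in the same shell.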

If you use `-' as an archive-name, tar reads the archive from standard input (when listing or extracting files), or writes it to standard output (when creating an archive). If you use `-' as an archive-name when modifying an archive, tar reads the original archive from its standard input and writes the entire new archive to its standard output.

@FIXME{might want a different example here; this is already used in "notable tar usages".}

$ cd sourcedir; tar -cf - . | (cd targetdir; tar -xf -)

@FIXME{help!}

To specify an archive file on a device attached to a remote machine, use the following:

--file=hostname:/dev/file name

tar will complete the remote connection, if possible, and prompt you for a username and password. If you use `--file=@hostname:/dev/file name', tar will complete the remote connection, if possible, using your username as the username on the remote machine.

If the archive file name includes a colon (`:'), then it is assumed to be a file on another machine. If the archive file is `user@host:file', then file is used on the host host. The remote host is accessed using the rsh program, with a username of user. If the username is omitted (along with the `@' sign), then your user name will be used. (This is the normal rsh behavior.) It is necessary for the remote machine, in addition to permitting your rsh access, to have the `/usr/ucb/rmt' program installed. If you need to use a file whose name includes a colon, then the remote tape drive behavior can be inhibited by using the --force-local option.
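The effect of --force-local can be sketched with a local archive whose name happens to contain a colon. The names `notes' and `backup:monday.tar' are invented for the illustration.

```shell
# Sketch: a local archive whose name contains a colon.
# Directory and archive names are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir notes
echo "remember the milk" > notes/todo.txt

# Without --force-local, tar would treat `backup' as a remote host name.
tar -c --force-local -f 'backup:monday.tar' notes

# The same option is needed again when reading the archive.
colon_listing=$(tar -t --force-local -f 'backup:monday.tar')
echo "$colon_listing"
```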

@FIXME{i know we went over this yesterday, but bob (and now i do again, too) thinks it's out of the middle of nowhere. it doesn't seem to tie into what came before it well enough <<i moved it now, is it better here?>>. bob also comments that if Amanda isn't free software, we shouldn't mention it..}

When the archive is being created to `/dev/null', GNU tar tries to minimize input and output operations. The Amanda backup system, when used with GNU tar, has an initial sizing pass which uses this feature.

Selecting Archive Members

File name arguments specify which files in the file system tar operates on, when creating or adding to an archive, or which archive members tar operates on, when reading or deleting from an archive. See section The Five Advanced tar Operations.

To specify file names, you can include them as the last arguments on the command line, as follows:

tar operation [option1 option2 ...] [file name-1 file name-2 ...]

If you specify a directory name as a file name argument, all the files in that directory are operated on by tar.

If you do not specify files when tar is invoked with --create (-c), tar operates on all the non-directory files in the working directory. If you specify either --list (-t) or --extract (--get, -x), tar operates on all the archive members in the archive. If you specify any operation other than one of these three, tar does nothing.

By default, tar takes file names from the command line. However, there are other ways to specify file or member names, or to modify the manner in which tar selects the files or members upon which to operate; @FIXME{add xref here}. In general, these methods work both for specifying the names of files and archive members.

Reading Names from a File

@UNREVISED

Instead of giving the names of files or archive members on the command line, you can put the names into a file, and then use the --files-from=file-of-names (-T file-of-names) option to tar. Give the name of the file which contains the list of files to include as the argument to `--files-from'. In the list, the file names should be separated by newlines. You will frequently use this option when you have generated the list of files to archive with the find utility.

--files-from=file name
-T file name
Get names to extract or create from file file name.

If you give a single dash as a file name for `--files-from', (i.e., you specify either `--files-from=-' or `-T -'), then the file names are read from standard input.

Unless you are running tar with `--create', you cannot use both `--files-from=-' and `--file=-' (`-f -') in the same command.
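Reading the list from standard input makes it possible to skip the intermediate file altogether. The sketch below pipes the output of find straight into tar; the directory `src' and the archive `sources.tar' are hypothetical names used only for the illustration.

```shell
# Sketch: feed file names to tar on standard input with `-T -'.
# All names are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir src
echo 'int main(void) { return 0; }' > src/main.c
echo 'scratch' > src/main.o

# find writes one name per line; tar reads the names from standard
# input, while -f names a regular file for the archive itself.
find src -type f -name '*.c' -print | tar -c -T - -f sources.tar

stdin_listing=$(tar -t -f sources.tar)
echo "$stdin_listing"
```

Only the names that find prints end up in the archive, so `src/main.o' is left out here.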

@FIXME{add bob's example, from his message on 2-10-97}

The following example shows how to use find to generate a list of files smaller than 400K in length and put that list into a file called `small-files'. You can then use the `-T' option to tar to specify the files from that file, `small-files', to create the archive `little.tgz'. (The `-z' option to tar compresses the archive with gzip; see section Creating and Reading Compressed Archives for more information.)

$ find . -type f -size -400k -print > small-files
$ tar -c -v -z -T small-files -f little.tgz

@FIXME{say more here to conclude the example/section?}

The --null option causes --files-from=file-of-names (-T file-of-names) to read file names terminated by a NUL instead of a newline, so files whose names contain newlines can be archived using `--files-from'.

--null
Only consider NUL-terminated file names, instead of file names terminated by a newline.

The `--null' option is just like the one in GNU xargs and cpio, and is useful with the `-print0' predicate of GNU find. In tar, `--null' also causes --directory=directory (-C directory) options to be treated as file names to archive, in case there are any files out there called `-C'.

This example shows how to use find to generate a list of files larger than 800K in length and put that list into a file called `long-files'. The `-print0' option to find is just like `-print', except that it separates files with a NUL rather than with a newline. You can then run tar with both the `--null' and `-T' options to specify that tar get the files from that file, `long-files', to create the archive `big.tar'. The `--null' option to tar will cause tar to recognize the NUL separator between files.

$ find . -type f -size +800k -print0 > long-files
$ tar -c -v --null --files-from=long-files --file=big.tar
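To see why NUL termination matters, consider a file whose name really does contain a newline. The following sketch, which uses made-up names throughout, archives such a file and extracts it again into a second directory.

```shell
# Sketch: archive a file whose name contains a newline.
# All names are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir odd out

# Build a name with an embedded newline and create two files.
nl_name=$(printf 'odd/two\nlines.txt')
echo "contents" > "$nl_name"
echo "contents" > odd/plain.txt

# -print0 and --null agree on NUL as the separator, so the newline
# inside the first name is not mistaken for the end of a name.
find odd -type f -print0 | tar -c --null -T - -f odd.tar

# Extract into another directory to check that the name survived.
tar -x -f odd.tar -C out
ls out/odd
```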

@FIXME{say anything else here to conclude the section?}

Excluding Some Files

@UNREVISED

To avoid operating on files whose names match a particular pattern, use the --exclude=pattern or --exclude-from=file-of-patterns (-X file-of-patterns) options.

--exclude=pattern
Causes tar to ignore files that match the pattern.

The --exclude=pattern option prevents any file or member that matches the shell wildcard pattern from being operated on (the pattern can be a single file name or a more complex expression). For example, if you want to create an archive with all the contents of `/tmp' except the file `/tmp/foo', you can use the command `tar --create --file=arch.tar --exclude=foo /tmp'. You may give multiple `--exclude' options.

--exclude-from=file
-X file
Causes tar to ignore files that match the patterns listed in file.

Use the `--exclude-from=file-of-patterns' option to read a list of shell wildcards, one per line, from file-of-patterns; tar will ignore files matching those patterns. Thus if tar is called as `tar -c -X foo .' and the file `foo' contains a single line `*.o', no files whose names end in `.o' will be added to the archive.
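The two exclude options can be combined in one command. The sketch below is only an illustration: the directory `proj', the pattern file `skip-patterns', and the archive `proj.tar' are invented names. The exclude options are placed before the file name argument they are meant to affect.

```shell
# Sketch: combine --exclude with a pattern file read through -X.
# All names are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir -p proj/build
echo 'source' > proj/main.c
echo 'object' > proj/main.o
echo 'output' > proj/build/app

# One pattern per line in the exclude file.
printf '%s\n' '*.o' > skip-patterns

# Quote patterns on the command line so that the shell does not
# expand them before tar sees them.
tar -c -f proj.tar --exclude='build' -X skip-patterns proj

exclude_listing=$(tar -t -f proj.tar | LC_ALL=C sort)
echo "$exclude_listing"
```

Excluding `build' also leaves out `proj/build/app', since the pattern matches the directory that contains it.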

@FIXME{do the exclude options files need to have stuff separated by newlines the same as the files-from option does?}

Problems with Using the exclude Options

@FIXME{put in for the editor's/editors' amusement, but should be taken out in the final draft, just in case! : }

Some users find `exclude' options confusing. Here are some common pitfalls:

Wildcards Patterns and Matching

Globbing is the operation by which wildcard characters, `*' or `?' for example, are replaced and expanded into all existing files matching the given pattern. However, tar often uses wildcard patterns for matching (or globbing) archive members instead of actual files in the filesystem. Wildcard patterns are also used for verifying volume labels of tar archives. This section explains the wildcard syntax that tar uses.

@FIXME{the next few paragraphs need work.}

A pattern should be written according to shell syntax, using wildcard characters to effect globbing. Most characters in the pattern stand for themselves in the matched string, and case is significant: `a' will match only `a', and not `A'. The character `?' in the pattern matches any single character in the matched string. The character `*' in the pattern matches zero or more characters in the matched string. The character `\' says to take the following character of the pattern literally; it is useful when one needs to match the `?', `*', `[' or `\' characters themselves.

The character `[', up to the matching `]', introduces a character class. A character class is a list of acceptable characters for the next single character of the matched string. For example, `[abcde]' would match any of the first five letters of the alphabet. Note that within a character class, all of the "special characters" listed above other than `\' lose their special meaning; for example, `[-\\[*?]]' would match any of the characters, `-', `\', `[', `*', `?', or `]'. (Due to parsing constraints, the characters `-' and `]' must either come first or last in a character class.)

If the first character of the class after the opening `[' is `!' or `^', then the meaning of the class is reversed. Rather than listing characters to match, it lists those characters which are forbidden as the next single character of the matched string.

Other characters of the class stand for themselves. The special construction `[a-e]', using a hyphen between two letters, is meant to represent all characters between a and e, inclusive.

@FIXME{need to add a sentence or so here to make this clear for those who don't have dan around.}

Periods (`.') or forward slashes (`/') are not considered special for wildcard matches. However, if a pattern completely matches a directory prefix of a matched string, then it matches the full matched string: excluding a directory also excludes all the files beneath it.
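The wildcard syntax described above can be tried out with an exclude pattern. This is a minimal sketch under assumed names (`logs', `logs.tar', and the files inside are hypothetical); it uses a character class together with `?'.

```shell
# Sketch: a character class and `?' in an exclude pattern.
# All names are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir logs
for name in a1.log b2.log c3.log notes.txt; do
    echo "data" > "logs/$name"
done

# `[ab]' matches one character from the class, `?' matches any single
# character, and the remaining characters stand for themselves.
tar -c -f logs.tar --exclude='[ab]?.log' logs

class_listing=$(tar -t -f logs.tar | LC_ALL=C sort)
echo "$class_listing"
```

Here `a1.log' and `b2.log' match the pattern and are skipped, while `c3.log' and `notes.txt' are archived.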

There are ongoing discussions asking for modifications in the way GNU tar accomplishes wildcard matches. We perceive any change of semantics in this area as a delicate thing to impose on GNU tar users. On the other hand, the GNU project should be progressive enough to correct any ill design: compatibility at any price is not always a good attitude. In conclusion, slight amendments may later be made to the previous description. Your opinions on the matter are welcome.

Operating Only on New Files

@UNREVISED

The --after-date=date (--newer=date, -N date) option causes tar to only work on files whose modification or inode-changed times are newer than the date given. If you use this option when creating or appending to an archive, the archive will only include new files. If you use `--after-date' when extracting an archive, tar will only extract files newer than the date you specify.

If you only want tar to make the date comparison based on modification of the actual contents of the file (rather than inode changes), then use the --newer-mtime=date option.

You may use these options with any operation. Note that these options differ from the --update (-u) operation in that they allow you to specify a particular date against which tar can compare when deciding whether or not to archive the files.

--after-date=date
--newer=date
-N date
Only store files newer than date. Acts on files only if their modification or inode-changed times are later than date. Use in conjunction with any operation.
--newer-mtime=date
Acts like --after-date=date (--newer=date, -N date), but only looks at modification times.

These options limit tar to only operating on files which have been modified after the date specified. A file is considered to have changed if the contents have been modified, or if the owner, permissions, and so forth, have been changed. (For more information on how to specify a date, see section Date input formats; remember that the entire date argument must be quoted if it contains any spaces.)

Gurus would say that --after-date=date (--newer=date, -N date) tests both the mtime (time the contents of the file were last modified) and ctime (time the file's status was last changed: owner, permissions, etc.) fields, while --newer-mtime=date tests only the mtime field.

To be precise, --after-date=date (--newer=date, -N date) checks both mtime and ctime and processes the file if either one is more recent than date, while --newer-mtime=date only checks mtime and disregards ctime. Neither uses atime (the last time the contents of the file were looked at).

Date specifiers can have embedded spaces. Because of this, you may need to quote date arguments to keep the shell from parsing them as separate arguments.
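A quoted date argument might look like the following sketch. The file names, the archive name `recent.tar', and the dates are all chosen for the illustration; one file is given an old modification time with touch so that the comparison has something to reject.

```shell
# Sketch: --newer-mtime with a date that contains a space.
# All names and dates are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir docs
echo "old" > docs/old.txt
echo "new" > docs/new.txt

# Give one file a modification time in 1999
# (touch -t takes [[CC]YY]MMDDhhmm).
touch -t 199901010000 docs/old.txt

# The quotes keep the date and the time together as one argument.
tar -c -f recent.tar --newer-mtime='2010-01-01 00:00:00' docs

newer_listing=$(tar -t -f recent.tar | LC_ALL=C sort)
echo "$newer_listing"
```

Without the quotes, the shell would pass `00:00:00' to tar as a separate argument, and tar would take it for a file name.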

@FIXME{Need example of --newer-mtime with quoted argument.}

Please Note: --after-date=date (--newer=date, -N date) and --newer-mtime=date should not be used for incremental backups. Some files (such as those in renamed directories) are not selected properly by these options. See section The Incremental Options.

To select files newer than the modification time of a file that already exists, you can use the `--reference' (`-r') option of GNU date, available in GNU shell utilities 1.13 or later. It returns the timestamp of that already existing file; this timestamp expands to become the referent date which `--newer' uses to determine which files to archive. For example, you could say,

$ tar -cf archive.tar --newer="`date -r file`" /home

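which tells tar to archive only those files under `/home' whose modification or status-change time is later than the modification time of `file'.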

Descending into Directories

@UNREVISED

@FIXME{arrggh! this is still somewhat confusing to me. :-< }

@FIXME{show dan bob's comments, from 2-10-97}

Usually, tar will recursively explore all directories (either those given on the command line or through the --files-from=file-of-names (-T file-of-names) option) for the various files they contain. However, you may not always want tar to act this way.

The --no-recursion option inhibits tar's recursive descent into specified directories. If you specify `--no-recursion', you can use the find utility for hunting through levels of directories to construct a list of file names which you could then pass to tar. find allows you to be more selective when choosing which files to archive; see section Reading Names from a File for more information on using find with tar.

--no-recursion
Prevents tar from recursively descending directories.

When you use `--no-recursion', GNU tar grabs directory entries themselves, but does not descend into them recursively. Many people use find for locating files they want to back up, and since tar usually recursively descends into directories, they have to use the `! -type d' test of find @FIXME{needs more explanation or a cite to another info file} as they usually do not want all the files in a directory. They then use the `--files-from' option to archive the files located via find.

The problem when restoring files archived in this manner is that the directories themselves are not in the archive; so the --same-permissions (--preserve-permissions, -p) option does not affect them--while users might really like it to. Specifying --no-recursion is a way to tell tar to grab only the directory entries given to it, adding no new files on its own.
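One way this might look in practice is sketched below, with invented names (`site', `site.tar'). find selects every directory plus the files of interest, and `--no-recursion' keeps tar from adding anything that find did not list, so the directory entries themselves end up in the archive.

```shell
# Sketch: let find choose the members and stop tar from recursing.
# All names are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir -p site/cache
echo "page" > site/index.html
echo "temp" > site/cache/blob

# find lists every directory plus the HTML files. --no-recursion is
# given before -T so that it applies to the names read from the list.
find site \( -type d -o -name '*.html' \) -print |
    tar -c --no-recursion -T - -f site.tar

norec_listing=$(tar -t -f site.tar | LC_ALL=C sort)
echo "$norec_listing"
```

The directory `site/cache' is recorded as a member, but its contents are not, because find did not print `site/cache/blob'.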

@FIXME{example here}

Crossing Filesystem Boundaries

@UNREVISED

tar will normally automatically cross file system boundaries in order to archive files which are part of a directory tree. You can change this behavior by running tar and specifying --one-file-system (-l). This option only affects files that are archived because they are in a directory that is being archived; tar will still archive files explicitly named on the command line or through --files-from=file-of-names (-T file-of-names), regardless of where they reside.

--one-file-system
-l
Prevents tar from crossing file system boundaries when archiving. Use in conjunction with any write operation.

The `--one-file-system' option causes tar to modify its normal behavior in archiving the contents of directories. If a file in a directory is not on the same filesystem as the directory itself, then tar will not archive that file. If the file is a directory itself, tar will not archive anything beneath it; in other words, tar will not cross mount points.

It is reported that, when this option is used, the mount point is archived, but nothing under it.

This option is useful for making full or incremental archival backups of a file system. If this option is used in conjunction with --verbose (-v), files that are excluded are mentioned by name on the standard error.

Changing the Working Directory

@FIXME{need to read over this node now for continuity; i've switched things around some.}

@UNREVISED

To change the working directory in the middle of a list of file names, either on the command line or in a file specified using --files-from=file-of-names (-T file-of-names), use --directory=directory (-C directory). This will change the working directory to the directory directory after that point in the list.

--directory=directory
-C directory
Changes the working directory in the middle of a command line.

For example,

$ tar -c -f jams.tar grape prune -C food cherry

will place the files `grape' and `prune' from the current directory into the archive `jams.tar', followed by the file `cherry' from the directory `food'. This option is especially useful when you have several widely separated files that you want to store in the same archive.

Note that the file `cherry' is recorded in the archive under the precise name `cherry', not `food/cherry'. Thus, the archive will contain three files that all appear to have come from the same directory; if the archive is extracted with plain `tar --extract', all three files will be written in the current directory.

Contrast this with the command,

$ tar -c -f jams.tar grape prune -C food red/cherry

which records the third file in the archive under the name `red/cherry' so that, if the archive is extracted using `tar --extract', the third file will be written in a subdirectory named `red'.

You can use the `--directory' option to make the archive independent of the original name of the directory holding the files. The following command places the files `/etc/passwd', `/etc/hosts', and `/lib/libc.a' into the archive `foo.tar':

$ tar -c -f foo.tar -C /etc passwd hosts -C /lib libc.a

However, the names of the archive members will be exactly what they were on the command line: `passwd', `hosts', and `libc.a'. They will not appear to be related by file name to the original directories where those files were located.

Note that `--directory' options are interpreted consecutively. If `--directory' specifies a relative file name, it is interpreted relative to the then current directory, which might not be the same as the original current working directory of tar, due to a previous `--directory' option.
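The consecutive interpretation can be sketched as follows. The directory layout and the archive name `pantry.tar' are made up for the example; the archive is named with an absolute path so that its location does not depend on the directory changes.

```shell
# Sketch: a second, relative -C is resolved against the first one.
# All names are hypothetical.
work=$(mktemp -d)
cd "$work"
mkdir -p food/fruit
echo "bread" > food/loaf
echo "plum" > food/fruit/prune

# After `-C food', the option `-C fruit' means food/fruit, not a
# directory `fruit' in the starting directory.
tar -c -f "$work/pantry.tar" -C food loaf -C fruit prune

chdir_listing=$(tar -t -f "$work/pantry.tar")
echo "$chdir_listing"
```

Both members are stored under their bare names, `loaf' and `prune', in the order in which they were given.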

@FIXME{dan: does this mean that you *can* use the short option form, but you can *not* use the long option form with --files-from? or is this totally screwed?}

When using `--files-from' (see section Reading Names from a File), you can put `-C' options in the file list. Unfortunately, you cannot put `--directory' options in the file list. (This interpretation can be disabled by using the --null option.)

Absolute File Names

@UNREVISED

-P
--absolute-names
Do not strip leading slashes from file names.

By default, GNU tar drops a leading `/' on input or output. This option turns off this behavior; it is equivalent to changing to the root directory before running tar (except it also turns off the usual warning message).

When tar extracts archive members from an archive, it strips any leading slashes (`/') from the member name. This causes absolute member names in the archive to be treated as relative file names. This allows you to have such members extracted wherever you want, instead of being restricted to extracting the member in the exact directory named in the archive. For example, if the archive member has the name `/etc/passwd', tar will extract it as if the name were really `etc/passwd'.

Other tar programs do not do this. As a result, if you create an archive whose member names start with a slash, they will be difficult for other people with a non-GNU tar program to use. Therefore, GNU tar also strips leading slashes from member names when putting members into the archive. For example, if you ask tar to add the file `/bin/ls' to an archive, it will do so, but the member name will be `bin/ls'.

If you use the --absolute-names (-P) option, tar will do neither of these transformations.

To archive or extract files relative to the root directory, specify the --absolute-names (-P) option.

Normally, tar acts on files relative to the working directory--ignoring superior directory names when archiving, and ignoring leading slashes when extracting.

When you specify --absolute-names (-P), tar stores file names including all superior directory names, and preserves leading slashes. If you only invoked tar from the root directory you would never need the --absolute-names (-P) option, but using this option may be more convenient than switching to root.
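The difference between the two behaviors can be observed with a small experiment. In this sketch the file `report.txt' and both archive names are hypothetical, and the absolute path comes from a scratch directory created by mktemp.

```shell
# Sketch: member names with and without --absolute-names (-P).
# All names are hypothetical.
work=$(mktemp -d)
echo "data" > "$work/report.txt"

# Without -P, the leading slash is dropped from the member name,
# and tar mentions this on standard error.
tar -c -f "$work/relative.tar" "$work/report.txt"
relative_member=$(tar -t -f "$work/relative.tar")
echo "$relative_member"

# With -P, the member is stored under its absolute name; listing
# with -P as well shows the name as it was stored.
tar -c -P -f "$work/absolute.tar" "$work/report.txt"
tar -t -P -f "$work/absolute.tar"
```

Extracting `relative.tar' recreates the file beneath whatever directory tar is run in, whereas extracting `absolute.tar' with `-P' writes back to the original absolute location.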

@FIXME{Should be an example in the tutorial/wizardry section using this to transfer files between systems.}

@FIXME{Is write access an issue?}

--absolute-names
Preserves full file names (including superior directory names) when archiving files. Preserves leading slash when extracting files.

@FIXME{this is still horrible; need to talk with dan on monday.}

tar prints out a message about removing the `/' from file names. This message appears once per GNU tar invocation. It represents something which ought to be told; ignoring what it means can cause very serious surprises, later.

Some people, nevertheless, do not want to see this message. Wanting to play really dangerously, one may of course redirect tar's standard error to `/dev/null'. For example, under sh:

$ tar -c -f archive.tar /home 2> /dev/null

Another solution, both nicer and simpler, would be to change to the `/' directory first, and then avoid absolute notation. For example:

$ (cd / && tar -c -f archive.tar home)
$ tar -c -f archive.tar -C / home

