@c Copyright (c) 1998 - 2001 Kungliga Tekniska Högskolan
@c (Royal Institute of Technology, Stockholm, Sweden).
@c All rights reserved.

@c $arla: storage.texi,v 1.9 2003/04/24 11:50:42 lha Exp $

@node Organization of data, AFS and the real world, AFS infrastructure, Top
@comment  node-name,  next,  previous,  up
@chapter Organization of data

This chapter describes how data is stored and how AFS differs from,
for example, NFS. It also describes how data is kept consistent, what
the requirements were, and how they impacted the design.

@menu
* Requirements::
* Data organization::
* Callbacks::
* Volume management::
* Relationship between pts uid and unix uid::
@end menu

@node Requirements, Data organization, Organization of data, Organization of data
@comment  node-name,  next,  previous,  up
@section Requirements

@itemize @bullet
@item Scalability

It should be possible to use AFS with hundreds of thousands of users
without problems.

Writes to different parts of the filesystem should not affect each
other. It should be possible to distribute reads and writes over many
fileservers. If a file is accessed by many clients, it should be
possible to distribute the load.

@comment What has this to do with requirements?
@comment If there is multiple writes to the same file, are you sure that isn't a
@comment database.

@item Transparent to users

Users should not need to know where their files are stored. It should
be possible to move their files while they are using them.

@item Easy to administer

It should be easy for an administrator to make changes to the
filesystem, for example to change the quota for a user or a project.
It should also be possible to move a user's data from one fileserver
to a less loaded one, or to one with more disk space available.

Some benefits of using AFS are:

@itemize @bullet
@item user-transparent data migration
@item the ability to do on-line backups
@item data replication that provides both load balancing and
robustness for critical data
@item a global name space without automounters and other add-ons
@item @@sys variables for platform-independent paths to binaries
@item enhanced security
@item client-side caching
@end itemize

@end itemize

@section Anti-requirements

@itemize @bullet
@item No databases

AFS is not designed for storing databases. It would be possible to
use AFS for a database if a layer above it provided locking and data
synchronization. One of the problems is that AFS does not provide
mandatory byte-range locks; AFS uses advisory locking on whole files.

If you need a real database, use one; databases are much more
efficient at solving database problems. Don't use AFS.

@end itemize

@node Data organization, Callbacks, Requirements, Organization of data
@comment  node-name,  next,  previous,  up
@section Volume

A volume is a unit that is smaller than a partition. It is usually
(or should be) a well-defined area, such as a user's home directory,
a project work area, or a program distribution.

Quota is controlled per volume, and all day-to-day management is done
on volumes.

@section Partition

In AFS, a partition is what is normally called a partition. All
partitions that AFS uses are named in a special way, @file{/vicepNN},
where NN ranges from a to z, continuing with aa to zz. The fileserver
(and volser) automatically picks up all partitions whose names start
with @file{/vicep}.

Volumes are stored in a partition. Volumes can't span several
partitions.
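
As an illustration, you can ask a fileserver which @file{/vicep}
partitions it serves and which volumes live on one of them. The
server name @samp{myserver} and the partition @samp{a} below are only
placeholders; the subcommands are the standard @code{vos} ones, but
the exact output format differs between AFS implementations.

@example
# show the /vicep partitions on a fileserver and their free space
vos partinfo myserver

# list all volumes stored on partition /vicepa of that server
vos listvol myserver a
@end example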
Partitions are added when the fileserver is created or when a new
disk is added to a filesystem.

@section Volume cloning and read-only clones

A clone of a volume is often needed for volume operations. A clone is
a copy-on-write copy of a volume; the clone is the read-only version.

Two special kinds of clone are the read-only volume and the backup
volume.

The read-only volume is a snapshot of a read-write volume (that is
what a clone is) that can be replicated to several fileservers to
distribute the load. Each fileserver and partition where a read-only
clone is located is called a replication site. It usually does not
make sense to have more than one read-only clone on each fileserver.

The backup volume is a clone that is typically made (with @code{vos
backupsys}) each night, so that users can retrieve yesterday's data
when they happen to remove a file. This is a very useful feature,
since it lessens the load on the system administrators to restore
files from backup. The volume is usually mounted in the root of the
user's home directory under the name OldFiles. A special feature of
the backup volume is that you can't follow mountpoints out of a
backup volume.

@section Mountpoints

Volumes are independent of each other. To glue the file tree together
there are @samp{mountpoint}s. Mountpoints are really symlinks that
are formatted in a special way so that they point out a volume and an
optional cell. An AFS cache manager will show a mountpoint as a
directory; in fact it will be the root directory of the target
volume.

@node Callbacks, Volume management, Data organization, Organization of data
@comment  node-name,  next,  previous,  up
@section Callbacks

Callbacks are what enable the AFS cache manager to keep cached files
without asking the server whether there is a newer version of the
file.

A callback is a promise from the fileserver that it will notify the
client if the file (or directory) changes within the time limit of
the callback.

For the contents of read-only volumes there is only one callback per
volume, called a volume callback, and it is broken when the read-only
volume is updated.

The time range of callbacks is from 5 to 60 minutes, depending on how
many users of the file there are.

@node Volume management, Relationship between pts uid and unix uid, Callbacks, Organization of data
@comment  node-name,  next,  previous,  up
@section Volume management

All volume management is done with the @code{vos} command. To get a
list of all subcommands, use @code{vos help}. For help on a specific
subcommand, use @code{vos subcommand -h}. A sketch of how the
operations below fit together is shown after this list.

@itemize @bullet
@item Create

@example
vos create mim c HO.staff.lha.fluff -quota 400000
@end example

@item Move

Volumes can be moved from one server to another, even while users are
using the volume.

@item Replicate

Read-only volumes can be replicated over several servers. Replication
sites are first added with @code{vos addsite}, and the volume is then
distributed to them with @code{vos release}.

@item Release

Used when you want to distribute the changes in the read-write volume
to the read-only clones.

@item Remove

Volumes can be removed. Note that you shouldn't remove the last
read-only volume, since this makes clients misbehave. If you are
moving the volume, you should rather add a new RO on the new server
and then remove the one on the old server.

@item Backup and restoration of volumes

@code{vos backup} and @code{vos backupsys} create the backup volume.
To stream a volume out to a @file{file} or to @file{stdout} you use
@code{vos dump}; the opposite command is @code{vos restore}.

@end itemize
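
The following sketch ties these subcommands together. The volume name
@samp{HO.staff.lha.fluff} is taken from the create example above; the
servers @samp{mim} and @samp{sara} and the partitions are
placeholders. The subcommands themselves are the standard @code{vos}
ones, though option names can differ slightly between AFS
implementations.

@example
# move the volume from partition c on mim to partition b on sara,
# while users keep working in it
vos move HO.staff.lha.fluff mim c sara b

# add two replication sites, then push the contents of the
# read-write volume out to the read-only clones
vos addsite mim c HO.staff.lha.fluff
vos addsite sara b HO.staff.lha.fluff
vos release HO.staff.lha.fluff

# create the backup volume, dump the volume to a file, and restore
# the dump under a new name
vos backup HO.staff.lha.fluff
vos dump HO.staff.lha.fluff -file fluff.dump
vos restore sara b HO.staff.lha.restored -file fluff.dump
@end example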
@node Relationship between pts uid and unix uid, , Volume management, Organization of data
@comment  node-name,  next,  previous,  up
@section Relationship between pts uid and unix uid
@cindex pts
@cindex uid

Files in AFS are created with the pts uid of the token that was valid
at the time. Commands like @code{ls -l} then interpret the pts uid as
a unix uid and translate it into a username. If the pts and unix uids
differ, this may confuse the user, since it looks as if her files are
owned by someone else. This is, however, not the case.

Complications can occur if programs impose further access
restrictions based on these wrongly interpreted uids instead of using
the @code{access()} system call for that purpose. Graphical file
browsers are typically prone to this problem, with the effect that
users are not able to see their own files in these tools.
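
To see whether the two uids match for a user, you can compare the pts
entry with the numeric owner that @code{ls} reports. The username
@samp{lha}, the path, and the numbers below are only placeholders,
and the output lines are illustrative:

@example
# the id known to the protection server (pts uid)
pts examine lha
# => Name: lha, id: 3437, ...

# the numeric owner of a file created in AFS with lha's token
ls -ln /afs/example.org/home/lha/somefile
# => -rw-r--r--  1 3437  ...  somefile

# the unix uid the local system knows for lha
id -u lha
# => 1237
@end example

If the two numbers differ, as in this made-up output, @code{ls -l}
will look up 3437 in the local passwd database and may show an
unrelated username (or no name at all), even though the files really
belong to lha.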