@c Copyright (c) 1998 - 2001 Kungliga Tekniska Högskolan
@c (Royal Institute of Technology, Stockholm, Sweden).
@c All rights reserved.

@c $arla: storage.texi,v 1.9 2003/04/24 11:50:42 lha Exp $

@node Organization of data, AFS and the real world, AFS infrastructure, Top
@comment  node-name,  next,  previous,  up
@chapter Organization of data

This chapter describes how data is stored and how AFS differs from,
for example, NFS. It also describes how data is kept consistent, what
the requirements were, and how they impacted the design.

@menu
* Requirements::
* Data organization::
* Callbacks::
* Volume management::
* Relationship between pts uid and unix uid::
@end menu

@node Requirements, Data organization, Organization of data, Organization of data
@comment  node-name,  next,  previous,  up
@section Requirements

@itemize @bullet
@item Scalability

It should be possible to use AFS with hundreds of thousands of users
without problems.

Writes to different parts of the filesystem should not affect each
other. It should be possible to distribute reads and writes over many
fileservers. If a file is accessed by many clients, it should be
possible to distribute the load.

@comment What has this to do with requirements?
@comment If there is multiple writes to the same file, are you sure that isn't a
@comment database.

@item Transparent to users

Users should not need to know where their files are stored. It should
be possible to move their files while they are using them.

@item Easy to administer

It should be easy for an administrator to make changes to the
filesystem, for example to change the quota for a user or a project.
It should also be possible to move a user's data from one fileserver
to a less loaded one, or to one with more disk space available.

Some benefits of using AFS are:

@itemize @bullet
@item user-transparent data migration
@item the ability to do on-line backups
@item data replication that provides both load balancing and
robustness for critical data
@item a global name space without automounters and other add-ons
@item @@sys variables for platform-independent paths to binaries
@item enhanced security
@item client-side caching
@end itemize

@end itemize

@section Anti-requirements

@itemize @bullet
@item No databases

AFS is not designed for storing databases. It would be possible to
use AFS for a database if a layer above it provided locking and data
synchronization. One of the problems is that AFS does not provide
mandatory byte-range locks; AFS uses advisory locking on whole files.

If you need a real database, use one; databases are much more
efficient at solving database problems. Don't use AFS.

@end itemize

@node Data organization, Callbacks, Requirements, Organization of data
@comment  node-name,  next,  previous,  up
@section Volume

A volume is a unit that is smaller than a partition. It is usually
(or should be) a well-defined area, such as a user's home directory,
a project work area, or a program distribution.

Quota is controlled per volume, and all day-to-day management is done
on volumes.

@section Partition

In AFS, a partition is what is normally called a partition. All
partitions that AFS uses are named in a special way, @file{/vicepNN},
where NN ranges from a to z, continuing with aa to zz. The fileserver
(and volser) automatically picks up all partitions whose names start
with @file{/vicep}.

Volumes are stored in a partition. Volumes can't span several
partitions.
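
As an illustration, you can ask a fileserver which @file{/vicep}
partitions it serves and which volumes live on one of them. The
server name @samp{myserver} and the partition @samp{a} below are only
placeholders; the subcommands are the standard @code{vos} ones, but
the exact output format differs between AFS implementations.

@example
# show the /vicep partitions on a fileserver and their free space
vos partinfo myserver

# list all volumes stored on partition /vicepa of that server
vos listvol myserver a
@end example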
Partitions are added when the fileserver is created or when a new
disk is added to a filesystem.

@section Volume cloning and read-only clones

A clone of a volume is often needed for volume operations. A clone is
a copy-on-write copy of a volume; the clone is the read-only version.

Two special kinds of clone are the read-only volume and the backup
volume.

The read-only volume is a snapshot of a read-write volume (that is
what a clone is) that can be replicated to several fileservers to
distribute the load. Each fileserver and partition where a read-only
clone is located is called a replication site. It usually does not
make sense to have more than one read-only clone on each fileserver.

The backup volume is a clone that is typically made (with @code{vos
backupsys}) each night, so that users can retrieve yesterday's data
when they happen to remove a file. This is a very useful feature,
since it lessens the load on the system administrators to restore
files from backup. The volume is usually mounted in the root of the
user's home directory under the name OldFiles. A special feature of
the backup volume is that you can't follow mountpoints out of a
backup volume.

@section Mountpoints

Volumes are independent of each other. To glue the file tree together
there are @samp{mountpoint}s. Mountpoints are really symlinks that
are formatted in a special way so that they point out a volume and an
optional cell. An AFS cache manager will show a mountpoint as a
directory; in fact it will be the root directory of the target
volume.

@node Callbacks, Volume management, Data organization, Organization of data
@comment  node-name,  next,  previous,  up
@section Callbacks

Callbacks are what enable the AFS cache manager to keep cached files
without asking the server whether there is a newer version of the
file.

A callback is a promise from the fileserver that it will notify the
client if the file (or directory) changes within the time limit of
the callback.

For the contents of read-only volumes there is only one callback per
volume, called a volume callback, and it is broken when the read-only
volume is updated.

The time range of callbacks is from 5 to 60 minutes, depending on how
many users of the file there are.

@node Volume management, Relationship between pts uid and unix uid, Callbacks, Organization of data
@comment  node-name,  next,  previous,  up
@section Volume management

All volume management is done with the @code{vos} command. To get a
list of all subcommands, use @code{vos help}. For help on a specific
subcommand, use @code{vos subcommand -h}. A sketch of how the
operations below fit together is shown after this list.

@itemize @bullet
@item Create

@example
vos create mim c HO.staff.lha.fluff -quota 400000
@end example

@item Move

Volumes can be moved from one server to another, even while users are
using the volume.

@item Replicate

Read-only volumes can be replicated over several servers. Replication
sites are first added with @code{vos addsite}, and the volume is then
distributed to them with @code{vos release}.

@item Release

Used when you want to distribute the changes in the read-write volume
to the read-only clones.

@item Remove

Volumes can be removed. Note that you shouldn't remove the last
read-only volume, since this makes clients misbehave. If you are
moving the volume, you should rather add a new RO on the new server
and then remove the one on the old server.

@item Backup and restoration of volumes

@code{vos backup} and @code{vos backupsys} create the backup volume.
To stream a volume out to a @file{file} or to @file{stdout} you use
@code{vos dump}; the opposite command is @code{vos restore}.

@end itemize
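
The following sketch ties these subcommands together. The volume name
@samp{HO.staff.lha.fluff} is taken from the create example above; the
servers @samp{mim} and @samp{sara} and the partitions are
placeholders. The subcommands themselves are the standard @code{vos}
ones, though option names can differ slightly between AFS
implementations.

@example
# move the volume from partition c on mim to partition b on sara,
# while users keep working in it
vos move HO.staff.lha.fluff mim c sara b

# add two replication sites, then push the contents of the
# read-write volume out to the read-only clones
vos addsite mim c HO.staff.lha.fluff
vos addsite sara b HO.staff.lha.fluff
vos release HO.staff.lha.fluff

# create the backup volume, dump the volume to a file, and restore
# the dump under a new name
vos backup HO.staff.lha.fluff
vos dump HO.staff.lha.fluff -file fluff.dump
vos restore sara b HO.staff.lha.restored -file fluff.dump
@end example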
@node Relationship between pts uid and unix uid, , Volume management, Organization of data
@comment  node-name,  next,  previous,  up
@section Relationship between pts uid and unix uid
@cindex pts
@cindex uid

Files in AFS are created with the pts uid of the token that was valid
at the time. Commands like @code{ls -l} then interpret the pts uid as
a unix uid and translate it into a username. If the pts and unix uids
differ, this may confuse the user, since it looks as if her files are
owned by someone else. This is, however, not the case.

Complications can occur if programs impose further access
restrictions based on these wrongly interpreted uids instead of using
the @code{access()} system call for that purpose. Graphical file
browsers are typically prone to this problem, with the effect that
users are not able to see their own files in these tools.
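
To see whether the two uids match for a user, you can compare the pts
entry with the numeric owner that @code{ls} reports. The username
@samp{lha}, the path, and the numbers below are only placeholders,
and the output lines are illustrative:

@example
# the id known to the protection server (pts uid)
pts examine lha
# => Name: lha, id: 3437, ...

# the numeric owner of a file created in AFS with lha's token
ls -ln /afs/example.org/home/lha/somefile
# => -rw-r--r--  1 3437  ...  somefile

# the unix uid the local system knows for lha
id -u lha
# => 1237
@end example

If the two numbers differ, as in this made-up output, @code{ls -l}
will look up 3437 in the local passwd database and may show an
unrelated username (or no name at all), even though the files really
belong to lha.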