When a file is annexed, a key is generated from its content and/or metadata. The file checked into git symlinks to the key. This key can later be used to retrieve the file's content (its value).
Multiple pluggable key-value backends are supported, and a single repository can use different ones for different files.
WORM
("Write Once, Read Many") This assumes that any file with the same basename, size, and modification time has the same content. This is the default, and the least expensive backend.SHA1
-- This uses a key based on a sha1 checksum. This allows verifying that the file content is right, and can avoid duplicates of files with the same content. Its need to generate checksums can make it slower for large files.SHA512
,SHA384
,SHA256
,SHA224
-- Like SHA1, but larger checksums. Mostly useful for the very paranoid, or anyone who is researching checksum collisions and wants to annex their colliding data. ;)SHA1E
,SHA512E
, etc -- Variants that preserve filename extension as part of the key. Useful for archival tasks where the filename extension contains metadata that should be preserved.
The annex.backends
git-config setting can be used to list the backends
git-annex should use. The first one listed will be used by default when
new files are added.
For finer control of what backend is used when adding different types of
files, the .gitattributes
file can be used. The annex.backend
attribute can be set to the name of the backend to use for matching files.
For example, to use the SHA1 backend for sound files, which tend to be
smallish and might be modified or copied over time, you could set in
.gitattributes
:
*.mp3 annex.backend=SHA1
*.ogg annex.backend=SHA1