zfs

Updated: February 11, 2024

Pronounced "zed" FS or "zee" FS, ZFS is a file system and volume manager created by Sun Microsystems, which was later bought out by Oracle.


INSTALL

If doing a fresh install of Ubuntu with a full disk erase, formatting to ZFS is an option that can be selected.

# on modern Ubuntu releases (16.04 and later) ZFS is in the main archive;
# the old zfs-native PPA is no longer needed
sudo apt update
sudo apt install zfsutils-linux

ABOUT

Z File System - computes a checksum for every block of data. It works as both a file system and a volume manager, building storage pools out of virtual devices (vdevs); a single pool can reach 256 zettabytes (2^78 bytes). A storage pool may contain up to 2^64 devices, and a single host can have up to 2^64 storage pools. ZFS uses 128 bits to store most of its values. One directory can hold 2^48 files, each up to 16 exabytes.

Scalability

ZFS can manage zettabytes of data, so a home user will never realistically hit its limits on drive count or capacity.

Integrity

ZFS keeps a checksum with every block of data, so corruption can be detected even in redundant copies. When it can, it repairs the damage automatically; when it cannot, it at least reports that the data is corrupt.

Drive Pooling

ZFS can include drives of different types and sizes in a pool. Think of it like RAM: just as adding another stick gives you more memory, adding drives to ZFS gives you more storage. However, drives are added to a pool as a vdev, so you need enough drives to satisfy the vdev's RAID layout (see the sketch below).
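A minimal sketch of growing a pool by one vdev, assuming a pool named tank and placeholder device names:

# extend the pool with a new two-disk mirror vdev
zpool add tank mirror sde sdf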

Datasets

A dataset is basically just a named chunk of data. It can be made to resemble a traditional filesystem or to act as a block device (a zvol).
ZVOLs reserve the requested volume size plus metadata, so 4 GB may become about 4.13 GB. ZVOLs also have no mount point; they get a device node under /dev/zvol/ so they can be accessed like any other block device.
A snapshot is a read-only copy of a dataset at a specific moment in time. A clone is a new dataset based on a snapshot. More datasets give more control through dataset properties, which work similarly to pool properties. ZFS uses datasets roughly the way other systems use partitions. There are five types: filesystems, volumes (zvols), snapshots, clones, and bookmarks.

zfs list                # show datasets and basic information
zfs list -t snapshot    # list only snapshots
zfs rename db/production db/old     # rename a dataset
    -f      # forces an unmount during rename in case processes are running and won't unmount
zfs destroy db/old      # delete a dataset (-vn to see as a dry run)
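
A short sketch of creating the other dataset types mentioned above; pool, dataset, and snapshot names are placeholders:

zfs create -V 4G db/vol1                        # create a 4 GB volume (zvol)
zfs snapshot db/production@2024-02-11           # take a read-only snapshot
zfs clone db/production@2024-02-11 db/clone1    # create a writable clone from the snapshot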

Properties

Most properties apply only to data written AFTER the property is changed. For instance, the compression property tells ZFS to compress data before writing it to disk.
Disabling compression won't uncompress data written before the change. So, to apply a change to existing data, every file must be rewritten: create a new dataset, copy the data over with 'zfs send', then destroy the original dataset. A sketch follows.
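A hedged sketch of that rewrite, assuming a pool named tank and placeholder dataset names:

zfs set compression=lz4 tank/data                       # only affects data written from now on
zfs snapshot tank/data@migrate                          # snapshot the existing dataset
zfs send tank/data@migrate | zfs recv tank/data_new     # rewrite the data into a new dataset
zfs destroy -r tank/data                                # once verified, destroy the original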

readonly        No writing to the dataset.  
atime           Records when a file was last accessed.  If on, increases snapshot size.  If off, significant performance gains.  
exec            Determines whether binaries and commands can run on the filesystem.  (Does not prohibit running interpreted scripts.)  
setuid          Determines whether the setuid bit is honored.  Setuid programs like passwd and login run with elevated privileges, so it is safest to leave this off where it is not needed.
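
Properties are read and changed with zfs get / zfs set; a minimal sketch with a placeholder dataset:

zfs get atime tank/data         # show a single property
zfs set atime=off tank/data     # change it (also inherited by child datasets by default)
zfs set exec=off tank/data
zfs get all tank/data           # list every property on the dataset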

RAID

RAID 0 = No redundancy. Greater performance with multiple disks. Data is written evenly across the drives to take advantage of their combined throughput.

RAID 1 = Two disks are mirrored. Each disk gets a complete copy of the data.

RAID 2, 3, 4 = Obsolete and not used.

RAID 5 = Needs at least 3 disks. Uses striping to divide data across all drives, with additional parity data. Has a lower storage cost than RAID 1. It survives a single drive failure; if 2 drives fail, you lose all your data.

RAID 6 = Similar to RAID 5, but adds an additional parity block, writing two parity blocks for each stripe of data across the disks. If 2 drives fail, you still have your data.

RAID 10 = AKA RAID 1+0. Stripes data across mirrored pairs: data is divided between primary disks and mirrored to secondary disks.

RAID Z = AKA RAID Z1. Requires at least 3 disks, which are combined into a virtual device (vdev). Data is stored in blocks of a fixed length instead of files being divided equally. This allows recovery from the failure of any single disk in the vdev. It is analogous to RAID 5 and can tolerate the failure of a single drive. The minimum number of devices in a raidz group is one more than the number of parity disks; the recommended number is between 3 and 9 to help performance.

RAID Z2 = Writes 2 parity blocks for each stripe of data. It is analogous to RAID 6 and can withstand 2 drive failures at once. Requires at least 4 drives.

RAID Z3 = Requires at least 5 disks and can survive up to 3 disks failing at once. Rarely used because it is not space efficient.
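
A minimal sketch of creating pools at each parity level (pool and device names are placeholders):

zpool create tank raidz  sda sdb sdc            # single parity, needs at least 3 disks
zpool create tank raidz2 sda sdb sdc sdd        # double parity, needs at least 4 disks
zpool create tank raidz3 sda sdb sdc sdd sde    # triple parity, needs at least 5 disks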

CLI TOOLS

ZPOOL

zpool get all           # view properties of all pools
zpool get all zroot     # get property of specific pool (zroot)
zpool get size          # get property by name (size)
zpool history zroot     # see all changes that have ever been made

ZFSUTILS

ZSYS

DATA RECOVERY

SCRUBBING

Scrubbing quarterly is good practice; if you are abusing cheap hardware, scrub every month.

zpool scrub zroot       # methodically check entire pool for errors
zpool status            # check progress of running scrub
zpool status -x         # only report pools with problems; otherwise prints "all pools are healthy"
zpool scrub -s zroot    # cancel ongoing scrub

HOT SPARES

Devices designated as hot spares are not actively used in the pool. When a device fails, it is automatically replaced by a hot spare.

Spares can be shared across multiple pools; they are added with zpool add and removed with zpool remove. Once a replacement is initiated, a spare vdev exists until the original device is replaced, after which the spare becomes available again in case another device fails.

# create a new pool with a hot spare
zpool create tank mirror sda sdb spare sdc
zfs create tank/dataset1    # create a dataset on the tank pool

# to replace failed sda device
zpool replace tank sda sdd

# to remove the spare
zpool remove tank sdc
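
A spare can also be added to a pool after creation; a brief sketch with a placeholder device name:

# add a hot spare to an already-created pool
zpool add tank spare sde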

POOL CHECKPOINT

Before performing a risky action (e.g. zfs destroy), a pool checkpoint can be created so that, in case of a mistake or failure, the pool can be rewound to that state. If the action succeeds, the checkpoint is discarded.

# Create a checkpoint for a pool
zpool checkpoint pool

# To rewind, first export, then rewind
zpool export pool
zpool import --rewind-to-checkpoint pool

# To discard checkpoint from a pool
zpool checkpoint -d pool

ZPOOL FLAGS

-?					# help message
-V, --version		# shows version

zpool add
	-f				# forces use of vdevs even if they appear in use or have a replication-level conflict
	-g				# display vdev GUIDs, which can be used in place of device names for detach/offline/remove/replace
	-L				# display real paths for vdevs, resolving all symbolic links.  Used to look up block device names.
	-n				# displays the config that would be used without actually adding the vdev
	-P				# display real paths for vdevs (can be used with -L)

zpool create
	-o				# set a pool property at creation
	-O				# set a property on the pool's root dataset

zfs create
	-V <size>		# create a volume dataset (zvol) of the given size
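
A quick sketch of a dry run, assuming a pool named tank and placeholder device names:

# preview what adding a mirror vdev would do, without modifying the pool
zpool add -n tank mirror sde sdf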

zpool list
	-o [allocated, capacity, expandsize, fragmentation, free, health, guid, size]

zpool clear pool [device] 	# clear device errors in a pool

EXAMPLES

pool name = tank

# create a single raidz root vdev with 6 disks
zpool create tank raidz sda sdb sdc sdd sde sdf

# create a mirrored storage pool
zpool create tank mirror sda sdb mirror sdc sdd

# create a zfs storage pool with partitions
zpool create tank sda1 sdb2

# adding a mirror to a ZFS storage pool
zpool add tank mirror sda sdb

# list available storage pools
zpool list

# destroy a ZFS storage pool
zpool destroy -f tank

# export a pool so it can be relocated or later imported
zpool export tank

# display available pools for import
zpool import

# import a pool
zpool import tank

# upgrade all ZFS pools to current version
zpool upgrade -a

# add cache devices to pool
zpool add pool cache sdc sdd

# monitor capacity and write speeds of cache
zpool iostat -v pool 5

# add output columns to status and iostat with -c
zpool status -c vendor,model,size
zpool iostat -vc slaves