What do you do to use a large disk with NetBSD?
Here are two such disks, as the kernel reports them when they attach:
wd1 at atabus3 drive 0
wd1:
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 5589 GB, 11628021 cyl, 16 head, 63 sec, 512 bytes/sect x 11721045168 sectors
wd1: GPT GUID: 8ee69292-5099-11e4-833b-001cc4d779ed
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
wd2 at atabus4 drive 0
wd2:
wd2: drive supports 16-sector PIO transfers, LBA48 addressing
wd2: 5589 GB, 11628021 cyl, 16 head, 63 sec, 512 bytes/sect x 11721045168 sectors
wd2: GPT GUID: 902717d6-5099-11e4-833b-001cc4d779ed
wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd2(ahcisata0:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
Disklabel
The NetBSD kernel seems to recognize the disk size fine. But when you run the disklabel program to partition the disk, you get this:
% disklabel wd1
# /dev/rwd1d:
type: ESDI
disk: WDC WD60EFRX-68M
label: fictitious
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 11628021
total sectors: 4294967295
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # microseconds
track-to-track seek: 0 # microseconds
drivedata: 0
4 partitions:
# size offset fstype [fsize bsize cpg/sgs]
a: 4294967295 0 4.2BSD 0 0 0 # (Cyl. 0 - 4294967295*)
d: 4294967295 0 unused 0 0 # (Cyl. 0 - 4294967295*)
disklabel: boot block size 0
disklabel: super block size 0
While the disk geometry seems to be correct, the sector count and partition sizes are limited to 4294967295, which is 2^32-1. Older NetBSD kernels even produce a size from the lower 32 bits of the real sector count. This is because the disklabel data structure has only 32bit fields to store partition sizes, offsets and the total sector count. That is enough for disks of up to 2TB with the standard sector size of 512 bytes.
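The arithmetic behind that limit can be sketched in a few lines of Python. This is only an illustration of the two behaviours described above (saturating at 2^32-1 versus keeping the lower 32 bits); the function names are made up for this example and the sector count is the one from the dmesg output.

```python
# Sector count reported by the kernel for wd1 (from the dmesg output).
REAL_SECTORS = 11721045168
SECTOR_SIZE = 512
UINT32_MAX = 2**32 - 1


def clamp_to_u32(sectors: int) -> int:
    """Value a current kernel puts in the label: saturated at 2^32-1."""
    return min(sectors, UINT32_MAX)


def low_32_bits(sectors: int) -> int:
    """Value an older kernel puts in the label: the lower 32 bits only."""
    return sectors & UINT32_MAX


def max_label_bytes(sector_size: int = SECTOR_SIZE) -> int:
    """Largest size a 32bit sector count can describe."""
    return UINT32_MAX * sector_size


if __name__ == "__main__":
    print(clamp_to_u32(REAL_SECTORS))    # 4294967295
    print(low_32_bits(REAL_SECTORS))     # 3131110576
    print(max_label_bytes() / 2**40)     # just under 2 (TiB)
```

With the truncating behaviour the label would claim a disk of roughly 1.5TB, which is arguably worse than the saturated value because it looks plausible.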
The disklabel is part of the original disk driver interface and the driver will use that data inside its strategy routine to validate disk accesses.
You can still use the disk somewhat. While normal partitions are limited by the 32bit values in the disklabel (or even the truncated values set by older kernels), there is the "raw" partition (on i386/amd64 that's partition 'd', other archs mostly use 'c') where the driver ignores the disklabel and validates the access against an internal value which is not truncated.
There are a few obstacles though.
formatting
% sudo newfs /dev/rwd1d
newfs: /dev/rwd1d partition type is not `4.2BSD'
The raw partition has no correct type and newfs refuses to work, so we have to persuade it a little:
% sudo newfs -O2 -F -s 11721045168 /dev/rwd1d
/dev/rwd1d: 5723166.5MB (11721045168 sectors) block size 32768, fragment size 4096
using 7710 cylinder groups of 742.31MB, 23754 blks, 46848 inodes.
super-block backups (for fsck_ffs -b #) at:
192, 1520448, 3040704, 4560960, 6081216, 7601472, 9121728, 10641984, 12162240, 13682496, 15202752,
.......................................................................................................
We need -O2 for a filesystem > 1TB, -F to prevent newfs from checking a disklabel and -s to tell it the disk size.
using
Then we can mount and use the disk:
% sudo mount /dev/wd1d /mnt
% sudo mkdir -m 1777 /mnt/scratch
% dd if=/dev/zero bs=1024k count=1000 of=/mnt/scratch/testfile
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 10.350 secs (101311690 bytes/sec)
% ls -la /mnt/scratch/
total 2048592
drwxrwxrwt 2 root wheel 512 Oct 25 17:34 .
drwxr-xr-x 3 root wheel 512 Oct 25 17:34 ..
-rw-r--r-- 1 mlelstv wheel 1048576000 Oct 25 17:35 testfile
% sudo umount /mnt
checking
Filesystem checking needs extra effort:
% sudo fsck /dev/rwd1d
fsck: vfstype `unused' on partition `/dev/rwd1d' is not supported
Again you need to augment information from the disklabel for the raw partition:
% sudo fsck -t ffs /dev/rwd1d
** /dev/rwd1d
** File system is clean; not checking
% sudo fsck -t ffs -f /dev/rwd1d
** /dev/rwd1d
** File system is already clean
** Last Mounted on /mnt
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
3 files, 256074 used, 1442176277 free (21 frags, 180272032 blocks, 0.0% fragmentation)
and more
You can also modify the disklabel, set the raw partition to type 4.2BSD (including fsize+bsize parameters) and delete the 'a' partition to avoid overlap warnings. The result looks like:
4 partitions:
# size offset fstype [fsize bsize cpg/sgs]
d: 4294967295 0 4.2BSD 4096 65536 0 # (Cyl. 0 - 4294967295*)
This makes fsck recognize the disk correctly, but newfs still requires the disk size parameter.
One more caveat: when you write the disklabel to the disk, it has to coexist with the filesystem data. Since the raw partition starts at offset 0 (the disklabel data is ignored by the driver!), this does not work with every filesystem, but FFS is safe.
Wedges
Wedges solve two problems in NetBSD. They support larger disks without the 32bit limitation of the disklabel, and they can be used with any partition information on the disk which makes it possible to exchange disks between different platforms.
Here is a disk with a wedge:
% sudo dkctl wd2 listwedges
/dev/rwd2d: 1 wedge:
dk6: hugedisk2, 11721043968 blocks at 1024, type: ccd
Wedges can be created and removed at any time with the dkctl command by specifying a name, start offset on the disk, the wedge size and a type.
% sudo dkctl wd2 addwedge testwedge 34 500 ffs
dk5 created successfully.
% sudo dkctl wd2 addwedge testwedge 500 100 unused
dkctl: /dev/rwd2d: addwedge: Invalid argument
% sudo dkctl wd2 addwedge testwedge 534 100 unused
dkctl: /dev/rwd2d: addwedge: File exists
% sudo dkctl wd2 addwedge testwedge2 534 100 unused
dk10 created successfully.
% sudo dkctl wd2 listwedges
/dev/rwd2d: 3 wedges:
dk5: testwedge, 500 blocks at 34, type: ffs
dk10: testwedge2, 100 blocks at 534, type: unused
dk6: hugedisk2, 11721043968 blocks at 1024, type: ccd
You can see that the creation of a wedge is validated: it requires a unique name, and wedges must not overlap.
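The two checks can be modelled with a small Python sketch. It is not the kernel's code, only a toy model that reproduces the errors seen in the transcript; the `Wedge` class and `add_wedge` function are hypothetical names, and the order of the checks (overlap first, then name) is inferred from the first failed command, which has both problems but reports "Invalid argument".

```python
import errno
from dataclasses import dataclass


@dataclass(frozen=True)
class Wedge:
    name: str
    start: int    # first sector on the parent disk
    size: int     # number of sectors
    ptype: str

    @property
    def end(self) -> int:
        """First sector after the wedge (exclusive end)."""
        return self.start + self.size


def add_wedge(existing: list, new: Wedge) -> int:
    """Return 0 on success, or an errno value as seen in the transcript."""
    for w in existing:
        if new.start < w.end and w.start < new.end:
            return errno.EINVAL    # "Invalid argument": ranges overlap
    for w in existing:
        if w.name == new.name:
            return errno.EEXIST    # "File exists": name already in use
    existing.append(new)
    return 0


if __name__ == "__main__":
    wedges = [Wedge("hugedisk2", 1024, 11721043968, "ccd")]
    for args in [("testwedge", 34, 500, "ffs"),
                 ("testwedge", 500, 100, "unused"),
                 ("testwedge", 534, 100, "unused"),
                 ("testwedge2", 534, 100, "unused")]:
        rc = add_wedge(wedges, Wedge(*args))
        print(args[0], args[1], args[2], "->", errno.errorcode.get(rc, "ok"))
```

The model keeps a single list for simplicity and therefore only covers wedges on one disk.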
% sudo newfs NAME=testwedge
/dev/rdk5: 0.2MB (500 sectors) block size 4096, fragment size 512
using 3 cylinder groups of 0.08MB, 21 blks, 32 inodes.
super-block backups (for fsck_ffs -b #) at:
32, 200, 368,
Older versions of newfs do not support wedge names; there you need to specify the device instead, e.g. /dev/rdk5.
GPT
If wedges could only be created by running a command, they wouldn't be useful. But the NetBSD kernel can generate wedges automatically when a disk is attached. Most NetBSD architectures will scan for MBR (PC BIOS Master Boot Record) and GPT (GUID Partition Table). There is also support for standard BSD disklabels, but it is currently disabled.
A GPT can be displayed, created and edited with the gpt command:
% sudo gpt show wd2
start size index contents
0 1 PMBR
1 1 Pri GPT header
2 32 Pri GPT table
34 990
1024 11721043968 1 GPT part - NetBSD ccd component
11721044992 143
11721045135 32 Sec GPT table
11721045167 1 Sec GPT header
% sudo gpt show wd1
start size index contents
0 11721045168
% sudo gpt create wd1
% sudo gpt show wd1
start size index contents
0 1 PMBR
1 1 Pri GPT header
2 32 Pri GPT table
34 11721045101
11721045135 32 Sec GPT table
11721045167 1 Sec GPT header
% sudo gpt add -a 512k -l hugedisk1 -t ccd wd1
Partition 1 added, use:
dkctl wd1 addwedge 1024 11721043968
to create a wedge for it
% sudo gpt show wd1
start size index contents
0 1 PMBR
1 1 Pri GPT header
2 32 Pri GPT table
34 990
1024 11721043968 1 GPT part - NetBSD ccd component
11721044992 143
11721045135 32 Sec GPT table
11721045167 1 Sec GPT header
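The start, size and gaps in this table follow from the 512k alignment. The Python sketch below reproduces the numbers; it assumes the layout shown by gpt show (34 sectors reserved at the front, 33 at the end) and assumes that both the start and the size are rounded to the alignment. The function name is invented for this example and this is not the gpt tool's own code.

```python
SECTOR_SIZE = 512
GPT_FRONT = 34    # PMBR + primary header + 32 table sectors
GPT_BACK = 33     # 32 table sectors + secondary header


def aligned_partition(total_sectors: int, align_bytes: int):
    """Return (start, size, head_gap, tail_gap) in sectors for a single
    partition that fills the disk. Assumes start is rounded up and size
    is rounded down to the alignment."""
    align = align_bytes // SECTOR_SIZE
    end_usable = total_sectors - GPT_BACK           # exclusive
    start = -(-GPT_FRONT // align) * align          # round up
    size = (end_usable - start) // align * align    # round down
    head_gap = start - GPT_FRONT
    tail_gap = end_usable - (start + size)
    return start, size, head_gap, tail_gap


if __name__ == "__main__":
    print(aligned_partition(11721045168, 512 * 1024))
    # (1024, 11721043968, 990, 143)
```

The two gaps of 990 and 143 sectors are the unnamed entries in the gpt show output.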
Since wedges are currently only created when a device is attached, we need either a reboot or some magic using drvctl.
% sudo drvctl -d wd1
% sudo drvctl -d wd2
% sudo drvctl -a ata_hl -r atabus3
% sudo drvctl -a ata_hl -r atabus4
NetBSD-current as of 20141104 can rescan a device for wedges instead. This will delete all unused wedges for a device and re-add them according to the label.
% sudo dkctl wd1 makewedges
successfully scanned /dev/rwd1d.
% sudo dkctl wd2 makewedges
successfully scanned /dev/rwd2d.
The console or dmesg will reveal that both disks have been reattached and wedges have been created:
wd1 at atabus3 drive 0
wd1:
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 5589 GB, 11628021 cyl, 16 head, 63 sec, 512 bytes/sect x 11721045168 sectors
wd1: GPT GUID: 38ba6ff4-e48a-42e4-a513-fe217d7fa013
dk5 at wd1: hugedisk1
dk5: 11721043968 blocks at 1024, type: ccd
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
wd2 at atabus4 drive 0
wd2:
wd2: drive supports 16-sector PIO transfers, LBA48 addressing
wd2: 5589 GB, 11628021 cyl, 16 head, 63 sec, 512 bytes/sect x 11721045168 sectors
wd2: GPT GUID: 902717d6-5099-11e4-833b-001cc4d779ed
dk6 at wd2: hugedisk2
dk6: 11721043968 blocks at 1024, type: ccd
wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd2(ahcisata0:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
CCD
If the partitions (and wedges) had been created with type ffs, you could just format filesystems on them and mount them. But I created them with type ccd, to be used by the concatenating disk driver.
ccd is a very old driver and has some deficiencies. For one, the ccdconfig tool didn't understand wedge names until very recently.
% sudo ccdconfig -c ccd0 16 none NAME=hugedisk1 NAME=hugedisk2
ccdconfig: NAME=hugedisk1: No such file or directory
configuring
For now we use the wedge device directly.
% sudo ccdconfig -c ccd0 16 none /dev/dk5 /dev/dk6
% disklabel ccd0
# /dev/rccd0d:
type: ccd
disk: ccd
label: fictitious
flags:
bytes/sector: 512
sectors/track: 2048
tracks/cylinder: 1
sectors/cylinder: 2048
cylinders: 11446332
total sectors: 4294967295
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # microseconds
track-to-track seek: 0 # microseconds
drivedata: 0
4 partitions:
# size offset fstype [fsize bsize cpg/sgs]
a: 4294967295 0 4.2BSD 0 0 0 # (Cyl. 0 - 4294967295*)
d: 4294967295 0 unused 0 0 # (Cyl. 0 - 4294967295*)
disklabel: boot block size 0
disklabel: super block size 0
using
Obviously the device that spans two disks is also too large for disklabel. What about wedges?
% sudo dkctl ccd0 listwedges
dkctl: /dev/rccd0d: listwedges: Inappropriate ioctl for device
The ccd driver does not support wedges at all. But we can still use the raw partition.
% sudo newfs -O2 -F -s 23442087936 /dev/rccd0d
/dev/rccd0d: 11446332.0MB (23442087936 sectors) block size 32768, fragment size 4096
using 15420 cylinder groups of 742.31MB, 23754 blks, 46848 inodes.
super-block backups (for fsck_ffs -b #) at:
192, 1520448, 3040704, 4560960, 6081216, 7601472, 9121728, 10641984, 12162240, 13682496, 15202752, 16723008, 18243264,
......................................................................................................................
% sudo mount /dev/ccd0d /mnt
% df -h /mnt
Filesystem Size Used Avail %Cap Mounted on
/dev/ccd0d 11T 4.0K 10T 0% /mnt
% sudo umount /mnt
% sudo ccdconfig -u ccd0
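The sector count passed to newfs -s above is simply the combined size of the two wedges. A short Python sketch of that calculation follows; rounding each component down to a multiple of the interleave is an assumption about how ccd sizes its components (it makes no difference here because both wedges are already multiples of 16), and the function name is made up.

```python
SECTOR_SIZE = 512


def ccd_sectors(components, interleave):
    """Total size of a ccd in sectors.
    ASSUMPTION: each component is rounded down to a whole number of
    interleave blocks before the sizes are added up."""
    return sum(n // interleave * interleave for n in components)


if __name__ == "__main__":
    wedges = [11721043968, 11721043968]     # dk5 and dk6
    total = ccd_sectors(wedges, 16)
    print(total)                              # 23442087936
    print(total * SECTOR_SIZE / 2**20)        # 11446332.0 (MB, as newfs prints)
    print(total // 2048)                      # 11446332 cylinders of 2048 sectors
```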
Raidframe
Since ccd wasn't that good, let's create a RAID0 using the raidframe driver.
First, change the partition types:
% sudo gpt type -i 1 -t ccd -T raid wd1
partition 1 type changed
% sudo gpt type -i 1 -t ccd -T raid wd2
partition 1 type changed
% sudo gpt show wd1
start size index contents
0 1 PMBR
1 1 Pri GPT header
2 32 Pri GPT table
34 990
1024 11721043968 1 GPT part - NetBSD RAIDFrame component
11721044992 143
11721045135 32 Sec GPT table
11721045167 1 Sec GPT header
% sudo gpt show wd2
start size index contents
0 1 PMBR
1 1 Pri GPT header
2 32 Pri GPT table
34 990
1024 11721043968 1 GPT part - NetBSD RAIDFrame component
11721044992 143
11721045135 32 Sec GPT table
11721045167 1 Sec GPT header
And the reattachment magic (or remake the wedges on recent NetBSD):
% sudo drvctl -d wd1
% sudo drvctl -d wd2
% sudo drvctl -a ata_hl -r atabus3
% sudo drvctl -a ata_hl -r atabus4
% sudo dkctl wd1 listwedges
/dev/rwd1d: 1 wedge:
dk5: hugedisk1, 11721043968 blocks at 1024, type: raidframe
% sudo dkctl wd2 listwedges
/dev/rwd2d: 1 wedge:
dk6: hugedisk2, 11721043968 blocks at 1024, type: raidframe
configuring
For raidframe we need a configuration file:
% cat raid0.conf
START array
# numRow numCol numSpare
1 2 0
START disks
/dev/dk5
/dev/dk6
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
16 1 1 0
START queue
fifo 100
% sudo raidctl -C raid0.conf raid0
% sudo raidctl -I 7894737 raid0
% sudo raidctl -A yes raid0
raid0: Autoconfigure: Yes
raid0: Root: No
% disklabel raid0
# /dev/rraid0d:
type: RAID
disk: raid
label: fictitious
flags:
bytes/sector: 512
sectors/track: 32
tracks/cylinder: 8
sectors/cylinder: 256
cylinders: 91570655
total sectors: 4294967295
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # microseconds
track-to-track seek: 0 # microseconds
drivedata: 0
4 partitions:
# size offset fstype [fsize bsize cpg/sgs]
a: 4294967295 0 4.2BSD 0 0 0 # (Cyl. 0 - 4294967295*)
d: 4294967295 0 unused 0 0 # (Cyl. 0 - 4294967295*)
disklabel: boot block size 0
disklabel: super block size 0
% sudo dkctl raid0 listwedges
/dev/rraid0d: no wedges configured
formatting
The raidframe driver has no problems with wedges. We can create a GPT on the raid device, create the wedge and format a filesystem. This time the wedge is added manually, but the reattachment magic would do the same.
% sudo gpt create raid0
% sudo gpt add -a 1024 -t ffs -l stripes raid0
Partition 1 added, use:
dkctl raid0 addwedge 34 23442087740
to create a wedge for it
% sudo dkctl raid0 addwedge stripes 34 23442087740 ffs
dk10 created successfully.
% sudo newfs -O2 NAME=stripes
/dev/rdk10: 11446332.0MB (23442087736 sectors) block size 32768, fragment size 4096
using 15420 cylinder groups of 742.31MB, 23754 blks, 46848 inodes.
super-block backups (for fsck_ffs -b #) at:
192, 1520448, 3040704, 4560960, 6081216, 7601472, 9121728, 10641984, 12162240, 13682496, 15202752, 16723008, 18243264,
......................................................................................................................
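The sizes reported by gpt and newfs for the raid device can be reconstructed from the wedge sizes. The Python sketch below does so under three loudly stated assumptions that are not in the transcript: RAIDframe reserves 64 sectors at the start of every component, "-a 1024" means 1024 bytes (2 sectors), and newfs only uses a whole number of 4096-byte fragments. The function names are hypothetical; the sketch is only meant to show that the numbers are consistent.

```python
SECTOR_SIZE = 512
RF_PROTECTED_SECTORS = 64    # ASSUMPTION: reserved per component by RAIDframe
GPT_FRONT = 34               # PMBR + primary header + 32 table sectors
GPT_BACK = 33                # 32 table sectors + secondary header


def raid0_sectors(components, sect_per_su):
    """Sum of the components after the reserved area, in whole stripe units."""
    return sum((n - RF_PROTECTED_SECTORS) // sect_per_su * sect_per_su
               for n in components)


def gpt_partition(total_sectors, align_bytes):
    """(start, size) of one partition filling the device, both aligned."""
    align = align_bytes // SECTOR_SIZE
    start = -(-GPT_FRONT // align) * align
    size = (total_sectors - GPT_BACK - start) // align * align
    return start, size


def newfs_sectors(partition_sectors, frag_bytes=4096):
    """ASSUMPTION: newfs rounds down to a whole number of fragments."""
    per_frag = frag_bytes // SECTOR_SIZE
    return partition_sectors // per_frag * per_frag


if __name__ == "__main__":
    raid = raid0_sectors([11721043968, 11721043968], 16)
    start, size = gpt_partition(raid, 1024)
    print(raid)                   # 23442087808
    print(start, size)            # 34 23442087740
    print(newfs_sectors(size))    # 23442087736
```

Under these assumptions the raid device also holds 91570655 whole cylinders of 256 sectors, which matches the disklabel output.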
using
Wedges can be mounted by name.
% sudo mount NAME=stripes /mnt
% df -h /mnt
Filesystem Size Used Avail %Cap Mounted on
/dev/dk10 11T 4.0K 10T 0% /mnt
% sudo mkdir -m 1777 /mnt/scratch
% dd if=/dev/zero bs=1024k count=1000 of=/mnt/scratch/testfile
1000+0 records in
1000+0 records out
1048576000 bytes transferred in 4.961 secs (211363837 bytes/sec)
% sudo umount /mnt
checking
Wedges can also be checked by name.
% sudo fsck NAME=stripes
** /dev/rdk10
** File system is clean; not checking
Older versions of fsck do not support wedge names; there you need to specify the device instead, e.g. /dev/rdk10.
and more
Verify that this would reconfigure automatically after a reboot:
% sudo raidctl -u raid0
% sudo drvctl -d wd1
% sudo drvctl -d wd2
% sudo drvctl -a ata_hl -r atabus3
% sudo drvctl -a ata_hl -r atabus4
% sudo raidctl -c raid0.conf raid0
% sudo mount NAME=stripes /mnt
% df -h /mnt
Filesystem Size Used Avail %Cap Mounted on
/dev/dk10 11T 1.0G 10T 0% /mnt
% ls -l /mnt/scratch/testfile
-rw-r--r-- 1 mlelstv wheel 1048576000 Oct 25 21:47 /mnt/scratch/testfile
Unlike wedges, the raidframe devices are not automatically created when a disk attaches. That's why we needed the raidctl -c command. The raidframe driver however scans all disks available when booting and autoconfigures devices for all raidsets found that have the autoconfig flag set.
LVM
Linux LVM is a different scheme to manage disk space. It uses its own label to group multiple disks together, and it carves out blocks using the device mapper driver to form logical volumes.
The device mapper provides logical block and character devices that route I/O to physical disks or some other logical devices. Such devices can be used for filesystems like a disk partition or wedge.
LVM is mostly used on raw disks as there is rarely a necessity to partition a disk first. But it can also be used on disk partitions or wedges, which has advantages if the disks are used for booting or are moved between different systems or platforms.
LVM on raw disks
LVM disks are labeled first with the pvcreate command and then coalesced into a volume group.
% sudo lvm pvcreate /dev/rwd1d
Physical volume "/dev/rwd1d" successfully created
% sudo lvm pvcreate /dev/rwd2d
Physical volume "/dev/rwd2d" successfully created
% sudo lvm vgcreate vg0 /dev/rwd1d /dev/rwd2d
Volume group "vg0" successfully created
% sudo lvm pvs
PV VG Fmt Attr PSize PFree
/dev/rwd1d vg0 lvm2 a- 2.00t 2.00t
/dev/rwd2d vg0 lvm2 a- 2.00t 2.00t
This shows a problem with large disks: the LVM tools only understand conventional disk partitions, which are limited to 2TB each. However, neither LVM, the device mapper driver nor the disk drivers (when using the raw partition) are bound to the disklabel information. You just need to tell LVM the real size to override the synthesized disklabel. The real size is 11721045168 sectors of 512 bytes, which gives 5723166 megabytes.
% sudo lvm vgremove vg0
Volume group "vg0" successfully removed
% sudo lvm pvcreate --setphysicalvolumesize=5723166 /dev/rwd1d
WARNING: /dev/rwd1d: Overriding real size. You could lose data.
Physical volume "/dev/rwd1d" successfully created
% sudo lvm pvcreate --setphysicalvolumesize=5723166 /dev/rwd2d
WARNING: /dev/rwd2d: Overriding real size. You could lose data.
Physical volume "/dev/rwd2d" successfully created
% sudo lvm vgcreate vg0 /dev/rwd1d /dev/rwd2d
Volume group "vg0" successfully created
% sudo lvm pvs
PV VG Fmt Attr PSize PFree
/dev/rwd1d vg0 lvm2 a- 5.46t 5.46t
/dev/rwd2d vg0 lvm2 a- 5.46t 5.46t
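The value given to --setphysicalvolumesize and the sizes printed by pvs can be checked with a little Python. This is a sketch with made-up helper names; it assumes megabytes of 2^20 bytes and terabytes of 2^40 bytes, which is what makes the numbers in the transcripts come out.

```python
def sectors_to_mb(sectors: int, sector_size: int = 512) -> int:
    """Whole megabytes (2^20 bytes) in the given number of sectors."""
    return sectors * sector_size // 2**20


def mb_to_tb(megabytes: int) -> float:
    """Terabytes (2^40 bytes), rounded to two digits like the pvs output."""
    return round(megabytes / 2**20, 2)


if __name__ == "__main__":
    real = sectors_to_mb(11721045168)        # size from dmesg
    truncated = sectors_to_mb(4294967295)    # size from the synthesized disklabel
    print(real, mb_to_tb(real))              # 5723166 5.46
    print(truncated, mb_to_tb(truncated))    # 2097151 2.0
```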
LVM on wedges
Here again are the two disks with a large wedge each. The wedge type is unknown because the GPT lists the partition as type linux-lvm, which has no well-known wedge type.
% sudo dkctl wd1 listwedges
/dev/rwd1d: 1 wedge:
dk8: hugedisk1, 11721045100 blocks at 34, type:
% sudo dkctl wd2 listwedges
/dev/rwd2d: 1 wedge:
dk9: hugedisk2, 11721045100 blocks at 34, type:
Now label the disks, form a volume group, create a logical volume and a filesystem:
% sudo lvm pvcreate /dev/rdk8
Physical volume "/dev/rdk8" successfully created
% sudo lvm pvcreate /dev/rdk9
Physical volume "/dev/rdk9" successfully created
% sudo lvm vgcreate vg0 /dev/rdk8 /dev/rdk9
Volume group "vg0" successfully created
% sudo lvm pvs
PV VG Fmt Attr PSize PFree
/dev/rdk8 vg0 lvm2 a- 5.46t 5.46t
/dev/rdk9 vg0 lvm2 a- 5.46t 5.46t
% sudo lvm lvcreate -L 500m -n lvtest vg0
Logical volume "lvtest" created
% sudo newfs -O2 /dev/vg0/lvtest
/dev/mapper/rvg0-lvtest: 500.0MB (1024000 sectors) block size 8192, fragment size 1024
using 11 cylinder groups of 45.46MB, 5819 blks, 10976 inodes.
super-block backups (for fsck_ffs -b #) at:
144, 93248, 186352, 279456, 372560, 465664, 558768, 651872, 744976, 838080,
...............................................................................
% sudo mount /dev/vg0/lvtest /mnt
mount_ffs: "/dev/vg0/lvtest" is a non-resolved or relative path.
mount_ffs: using "/dev/mapper/vg0-lvtest" instead.
% df -h /mnt
Filesystem Size Used Avail %Cap Mounted on
/dev/mapper/vg0-lvtest 470M 1.0K 447M 0% /mnt
There is one issue with LVM on wedges. LVM scans disks for its label to coalesce them into volume groups and to find the logical volumes. You can restrict the search in the LVM configuration, but if you use wedges, you must scan all wedges as the name of the device may change. Also, the result of a scan is saved to optimize subsequent scans. If your disk configuration changes, you either need to remove the cached result or configure LVM to not save it with the write_cache_state option.