Contents

  1. What do you do to use a large disk with NetBSD?
  2. Disklabel
    1. formatting
    2. using
    3. checking
    4. and more
  3. Wedges
  4. GPT
  5. CCD
    1. configuring
    2. using
  6. Raidframe
    1. configuring
    2. formatting
    3. using
    4. checking
    5. and more
  7. LVM
    1. LVM on raw disks
    2. LVM on wedges

What do you do to use a large disk with NetBSD?

Here are two such disks, as the kernel identifies them at attach time:


    wd1 at atabus3 drive 0
    wd1: 
    wd1: drive supports 16-sector PIO transfers, LBA48 addressing
    wd1: 5589 GB, 11628021 cyl, 16 head, 63 sec, 512 bytes/sect x 11721045168 sectors
    wd1: GPT GUID: 8ee69292-5099-11e4-833b-001cc4d779ed
    wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
    wd1(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
    
    wd2 at atabus4 drive 0
    wd2: 
    wd2: drive supports 16-sector PIO transfers, LBA48 addressing
    wd2: 5589 GB, 11628021 cyl, 16 head, 63 sec, 512 bytes/sect x 11721045168 sectors
    wd2: GPT GUID: 902717d6-5099-11e4-833b-001cc4d779ed
    wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
    wd2(ahcisata0:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)

Disklabel

The NetBSD kernel seems to recognize the disk size fine. But when you run the disklabel program to partition the disk, you get this:


    % disklabel wd1
    # /dev/rwd1d:
    type: ESDI
    disk: WDC WD60EFRX-68M
    label: fictitious
    flags:
    bytes/sector: 512
    sectors/track: 63
    tracks/cylinder: 16
    sectors/cylinder: 1008
    cylinders: 11628021
    total sectors: 4294967295
    rpm: 3600
    interleave: 1
    trackskew: 0
    cylinderskew: 0
    headswitch: 0           # microseconds
    track-to-track seek: 0  # microseconds
    drivedata: 0 

    4 partitions:
    #        size    offset     fstype [fsize bsize cpg/sgs]
    a: 4294967295         0     4.2BSD      0     0     0  # (Cyl.      0 - 4294967295*)
    d: 4294967295         0     unused      0     0        # (Cyl.      0 - 4294967295*)
    disklabel: boot block size 0
    disklabel: super block size 0

While the disk geometry seems to be correct, the total sector count and the partition sizes are limited to 4294967295, which is 2^32-1. Older NetBSD kernels even produce a size from the lower 32 bits of the real sector count. This is because the disklabel data structure has only 32-bit fields to store partition sizes and offsets and the total sector count. That is good for disks of up to 2TB with the standard sector size of 512 bytes.
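As a quick sanity check, the 2TB figure follows directly from those 32-bit fields; this is plain shell arithmetic, nothing NetBSD-specific:

```shell
# A 32-bit sector count tops out at 2^32 - 1; with 512-byte sectors
# that is just short of 2 TiB, which is why larger disks overflow.
max_sectors=$(( (1 << 32) - 1 ))        # 4294967295, as seen above
max_bytes=$(( max_sectors * 512 ))
echo "$max_sectors sectors = $max_bytes bytes"
```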

The disklabel is part of the original disk driver interface and the driver will use that data inside its strategy routine to validate disk accesses.

You can still use the disk to some extent. While normal partitions are limited by the 32-bit values in the disklabel (or even the truncated values set by older kernels), there is the "raw" partition (on i386/amd64 that's partition 'd'; most other archs use 'c') where the driver ignores the disklabel and validates accesses against an internal value that is not truncated.

There are a few obstacles though.

formatting


    % sudo newfs /dev/rwd1d
    newfs: /dev/rwd1d partition type is not `4.2BSD'

The raw partition has no correct type and newfs refuses to work; we have to persuade it a little bit:


    % sudo newfs -O2 -F -s 11721045168 /dev/rwd1d
    /dev/rwd1d: 5723166.5MB (11721045168 sectors) block size 32768, fragment size 4096
            using 7710 cylinder groups of 742.31MB, 23754 blks, 46848 inodes.
    super-block backups (for fsck_ffs -b #) at:
    192, 1520448, 3040704, 4560960, 6081216, 7601472, 9121728, 10641984, 12162240, 13682496, 15202752,
    .......................................................................................................

We need -O2 for a filesystem larger than 1TB, -F to prevent newfs from checking the disklabel, and -s to tell it the disk size.
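The value for -s is the true sector count printed at attach time. One way to recover it is to parse the dmesg line quoted at the top; this is only an illustration and assumes the exact "... x <count> sectors" wording of the wd(4) attach message, which is not a stable interface:

```shell
# Extract the real sector count from the attach line shown above.
# The sed pattern assumes the "... x <count> sectors" wording.
line='wd1: 5589 GB, 11628021 cyl, 16 head, 63 sec, 512 bytes/sect x 11721045168 sectors'
sectors=$(printf '%s\n' "$line" | sed -n 's/.* x \([0-9][0-9]*\) sectors$/\1/p')
echo "$sectors"
```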

using

Then we can mount and use the disk:


    % sudo mount /dev/wd1d /mnt
    % sudo mkdir -m 1777 /mnt/scratch
    % dd if=/dev/zero bs=1024k count=1000 of=/mnt/scratch/testfile
    1000+0 records in
    1000+0 records out
    1048576000 bytes transferred in 10.350 secs (101311690 bytes/sec)
    % ls -la /mnt/scratch/
    total 2048592
    drwxrwxrwt  2 root     wheel         512 Oct 25 17:34 .
    drwxr-xr-x  3 root     wheel         512 Oct 25 17:34 ..
    -rw-r--r--  1 mlelstv  wheel  1048576000 Oct 25 17:35 testfile
    % sudo umount /mnt

checking

Filesystem checking needs extra effort:


    % sudo fsck /dev/rwd1d
    fsck: vfstype `unused' on partition `/dev/rwd1d' is not supported

Again you have to supply information that the disklabel lacks for the raw partition:


    % sudo fsck -t ffs /dev/rwd1d
    ** /dev/rwd1d
    ** File system is clean; not checking
    % sudo fsck -t ffs -f /dev/rwd1d
    ** /dev/rwd1d
    ** File system is already clean
    ** Last Mounted on /mnt
    ** Phase 1 - Check Blocks and Sizes
    ** Phase 2 - Check Pathnames
    ** Phase 3 - Check Connectivity
    ** Phase 4 - Check Reference Counts
    ** Phase 5 - Check Cyl groups
    3 files, 256074 used, 1442176277 free (21 frags, 180272032 blocks, 0.0% fragmentation)

and more

You can also modify the disklabel: set the raw partition to type 4.2BSD (including the fsize and bsize parameters) and delete the 'a' partition to avoid overlap warnings. The result looks like this:


    4 partitions:
    #        size    offset     fstype [fsize bsize cpg/sgs]
     d: 4294967295         0     4.2BSD   4096 65536     0  # (Cyl.      0 - 4294967295*)

This makes fsck recognize the disk correctly; newfs still requires the disk size parameter.

One more caveat: when you write the disklabel to the disk, it has to coexist with the filesystem data. Since the raw partition starts at offset 0 (the disklabel data is ignored by the driver!), this does not work with every filesystem, but FFS is safe.

Wedges

Wedges solve two problems in NetBSD. They support larger disks without the 32-bit limitation of the disklabel, and they can be used with any partition information on the disk, which makes it possible to exchange disks between different platforms.

Here is a disk with a wedge:


    % sudo dkctl wd2 listwedges
    /dev/rwd2d: 1 wedge:
    dk6: hugedisk2, 11721043968 blocks at 1024, type: ccd

Wedges can be created and removed at any time with the dkctl command by specifying a name, the start offset on the disk, the wedge size, and a type.


    % sudo dkctl wd2 addwedge testwedge 34 500 ffs
    dk5 created successfully.
    % sudo dkctl wd2 addwedge testwedge 500 100 unused
    dkctl: /dev/rwd2d: addwedge: Invalid argument
    % sudo dkctl wd2 addwedge testwedge 534 100 unused
    dkctl: /dev/rwd2d: addwedge: File exists
    % sudo dkctl wd2 addwedge testwedge2 534 100 unused
    dk10 created successfully.
    % sudo dkctl wd2 listwedges
    /dev/rwd2d: 3 wedges:
    dk5: testwedge, 500 blocks at 34, type: ffs
    dk10: testwedge2, 100 blocks at 534, type: unused
    dk6: hugedisk2, 11721043968 blocks at 1024, type: ccd

You can see that the creation of a wedge is validated: it requires a unique name, and wedges must not overlap.
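The overlap check can be sketched as plain interval arithmetic; this is my own illustration of the rule, not kernel code. A wedge covering sectors [start, start+size) conflicts with another wedge if the two half-open intervals intersect:

```shell
# testwedge occupies sectors 34..533; the rejected request started at 500.
s1=34;  n1=500     # existing wedge: [34, 534)
s2=500; n2=100     # requested wedge: [500, 600)
if [ "$s1" -lt "$((s2 + n2))" ] && [ "$s2" -lt "$((s1 + n1))" ]; then
    echo "overlap"    # 500 < 534, so the intervals intersect
else
    echo "ok"
fi
```

This is exactly why the second addwedge attempt above failed with EINVAL while the attempt starting at sector 534 succeeded.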


    % sudo newfs NAME=testwedge
    /dev/rdk5: 0.2MB (500 sectors) block size 4096, fragment size 512
            using 3 cylinder groups of 0.08MB, 21 blks, 32 inodes.
        super-block backups (for fsck_ffs -b #) at:
        32, 200, 368,

Older versions of newfs do not support wedge names; you need to specify the device, e.g. /dev/rdk5.

GPT

If wedges could only be created by running a command, they wouldn't be very useful. But the NetBSD kernel can generate wedges automatically when a disk is attached. Most NetBSD architectures will scan for an MBR (PC BIOS Master Boot Record) and a GPT (GUID Partition Table). There is also support for standard BSD disklabels, but it is currently disabled.

A GPT can be displayed, created and edited with the gpt command:


    % sudo gpt show wd2
            start         size  index  contents
                0            1         PMBR
                1            1         Pri GPT header
                2           32         Pri GPT table
               34          990         
             1024  11721043968      1  GPT part - NetBSD ccd component
      11721044992          143         
      11721045135           32         Sec GPT table
      11721045167            1         Sec GPT header
    % sudo gpt show wd1
            start         size  index  contents
                0  11721045168         
    % sudo gpt create wd1
    % sudo gpt show wd1
            start         size  index  contents
                0            1         PMBR
                1            1         Pri GPT header
                2           32         Pri GPT table
               34  11721045101         
      11721045135           32         Sec GPT table
      11721045167            1         Sec GPT header
    % sudo gpt add -a 512k -l hugedisk1 -t ccd wd1
    Partition 1 added, use:
            dkctl wd1 addwedge  1024 11721043968 
    to create a wedge for it
    % sudo gpt show wd1
            start         size  index  contents
                0            1         PMBR
                1            1         Pri GPT header
                2           32         Pri GPT table
               34          990         
             1024  11721043968      1  GPT part - NetBSD ccd component
      11721044992          143         
      11721045135           32         Sec GPT table
      11721045167            1         Sec GPT header

Since wedges are currently only created when a device is attached, we need either a reboot or some magic using drvctl.


    % sudo drvctl -d wd1
    % sudo drvctl -d wd2
    % sudo drvctl -a ata_hl -r atabus3
    % sudo drvctl -a ata_hl -r atabus4

NetBSD-current as of 20141104 can instead rescan a device for wedges. This deletes all unused wedges of a device and re-adds them according to the label.


    % sudo dkctl wd1 makewedges
    successfully scanned /dev/rwd1d.
    % sudo dkctl wd2 makewedges
    successfully scanned /dev/rwd2d.

The console or dmesg will reveal that both disks have been reattached and wedges have been created:


    wd1 at atabus3 drive 0
    wd1: 
    wd1: drive supports 16-sector PIO transfers, LBA48 addressing
    wd1: 5589 GB, 11628021 cyl, 16 head, 63 sec, 512 bytes/sect x 11721045168 sectors
    wd1: GPT GUID: 38ba6ff4-e48a-42e4-a513-fe217d7fa013
    dk5 at wd1: hugedisk1
    dk5: 11721043968 blocks at 1024, type: ccd
    wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
    wd1(ahcisata0:1:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
    wd2 at atabus4 drive 0
    wd2: 
    wd2: drive supports 16-sector PIO transfers, LBA48 addressing
    wd2: 5589 GB, 11628021 cyl, 16 head, 63 sec, 512 bytes/sect x 11721045168 sectors
    wd2: GPT GUID: 902717d6-5099-11e4-833b-001cc4d779ed
    dk6 at wd2: hugedisk2
    dk6: 11721043968 blocks at 1024, type: ccd
    wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
    wd2(ahcisata0:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)

CCD

If the partitions (and wedges) had been created with type ffs, you could simply put filesystems on them and mount them. But I created them with type ccd so that they can be used by the concatenating disk driver.

ccd is a very old driver and has some deficiencies. For one, the ccdconfig tool didn't understand wedge names until very recently.


    % sudo ccdconfig -c ccd0 16 none NAME=hugedisk1 NAME=hugedisk2
    ccdconfig: NAME=hugedisk1: No such file or directory

configuring

For now, we use the wedge devices directly.


    % sudo ccdconfig -c ccd0 16 none /dev/dk5 /dev/dk6
    % disklabel ccd0
    # /dev/rccd0d:
    type: ccd
    disk: ccd
    label: fictitious
    flags:
    bytes/sector: 512
    sectors/track: 2048
    tracks/cylinder: 1
    sectors/cylinder: 2048
    cylinders: 11446332
    total sectors: 4294967295
    rpm: 3600
    interleave: 1
    trackskew: 0
    cylinderskew: 0
    headswitch: 0           # microseconds
    track-to-track seek: 0  # microseconds
    drivedata: 0 
    
    4 partitions:
    #        size    offset     fstype [fsize bsize cpg/sgs]
    a: 4294967295         0     4.2BSD      0     0     0  # (Cyl.      0 - 4294967295*)
    d: 4294967295         0     unused      0     0        # (Cyl.      0 - 4294967295*)
    disklabel: boot block size 0
    disklabel: super block size 0

using

Obviously the concatenated device that spans both disks is also too large for a disklabel. What about wedges?


    % sudo dkctl ccd0 listwedges
    dkctl: /dev/rccd0d: listwedges: Inappropriate ioctl for device

The ccd driver does not support wedges at all. But we can still use the raw partition.


    % sudo newfs -O2 -F -s 23442087936 /dev/rccd0d
    /dev/rccd0d: 11446332.0MB (23442087936 sectors) block size 32768, fragment size 4096
            using 15420 cylinder groups of 742.31MB, 23754 blks, 46848 inodes.
    super-block backups (for fsck_ffs -b #) at:
    192, 1520448, 3040704, 4560960, 6081216, 7601472, 9121728, 10641984, 12162240, 13682496, 15202752, 16723008, 18243264,
    ......................................................................................................................
    % sudo mount /dev/ccd0d /mnt
    % df -h /mnt
    Filesystem         Size       Used      Avail %Cap Mounted on
    /dev/ccd0d          11T       4.0K        10T   0% /mnt
    % sudo umount /mnt
    % sudo ccdconfig -u ccd0
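The -s value passed to newfs above is simply the two component wedges added together; ccd stripes them with interleave 16, and since both sizes are already a multiple of 16 sectors there is no rounding loss here:

```shell
# Total ccd size: two identical component wedges of 11721043968
# sectors each, concatenated/striped by the ccd driver.
component=11721043968
total=$(( component * 2 ))
echo "$total"
```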

Raidframe

Since ccd wasn't that good, let's create a RAID0 using the raidframe driver.

First, change the partition types:


    % sudo gpt type -i 1 -t ccd -T raid wd1
    partition 1 type changed
    % sudo gpt type -i 1 -t ccd -T raid wd2
    partition 1 type changed
    % sudo gpt show wd1
            start         size  index  contents
                0            1         PMBR
                1            1         Pri GPT header
                2           32         Pri GPT table
               34          990         
             1024  11721043968      1  GPT part - NetBSD RAIDFrame component
      11721044992          143         
      11721045135           32         Sec GPT table
      11721045167            1         Sec GPT header
    % sudo gpt show wd2
            start         size  index  contents
                0            1         PMBR
                1            1         Pri GPT header
                2           32         Pri GPT table
               34          990         
             1024  11721043968      1  GPT part - NetBSD RAIDFrame component
      11721044992          143         
      11721045135           32         Sec GPT table
      11721045167            1         Sec GPT header

And the reattachment magic (or remake the wedges on recent NetBSD):


    % sudo drvctl -d wd1
    % sudo drvctl -d wd2
    % sudo drvctl -a ata_hl -r atabus3
    % sudo drvctl -a ata_hl -r atabus4
    % sudo dkctl wd1 listwedges
    /dev/rwd1d: 1 wedge:
    dk5: hugedisk1, 11721043968 blocks at 1024, type: raidframe
    % sudo dkctl wd2 listwedges
    /dev/rwd2d: 1 wedge:
    dk6: hugedisk2, 11721043968 blocks at 1024, type: raidframe

configuring

For raidframe we need a configuration file:


    % cat raid0.conf
    START array
    # numRow numCol numSpare
    1 2 0
    
    START disks
    /dev/dk5
    /dev/dk6
    
    START layout
    # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
    16 1 1 0
    
    START queue
    fifo 100
    % sudo raidctl -C raid0.conf raid0
    % sudo raidctl -I 7894737 raid0
    % sudo raidctl -A yes raid0
    raid0: Autoconfigure: Yes
    raid0: Root: No
    % disklabel raid0
    # /dev/rraid0d:
    type: RAID
    disk: raid
    label: fictitious
    flags:
    bytes/sector: 512
    sectors/track: 32
    tracks/cylinder: 8
    sectors/cylinder: 256
    cylinders: 91570655
    total sectors: 4294967295
    rpm: 3600
    interleave: 1
    trackskew: 0
    cylinderskew: 0
    headswitch: 0           # microseconds
    track-to-track seek: 0  # microseconds
    drivedata: 0 
    
    4 partitions:
    #        size    offset     fstype [fsize bsize cpg/sgs]
    a: 4294967295         0     4.2BSD      0     0     0  # (Cyl.      0 - 4294967295*)
    d: 4294967295         0     unused      0     0        # (Cyl.      0 - 4294967295*)
    disklabel: boot block size 0
    disklabel: super block size 0
    % sudo dkctl raid0 listwedges
    /dev/rraid0d: no wedges configured

formatting

The raidframe driver has no problems with wedges. We can create a GPT on the raid device, create the wedge, and format a filesystem. This time we do it manually, but the reattachment magic would do the same.


    % sudo gpt create raid0
    % sudo gpt add -a 1024 -t ffs -l stripes raid0
    Partition 1 added, use:
            dkctl raid0 addwedge  34 23442087740 
    to create a wedge for it
    % sudo dkctl raid0 addwedge stripes 34 23442087740 ffs
    dk10 created successfully.
    % sudo newfs -O2 NAME=stripes
    /dev/rdk10: 11446332.0MB (23442087736 sectors) block size 32768, fragment size 4096
            using 15420 cylinder groups of 742.31MB, 23754 blks, 46848 inodes.
    super-block backups (for fsck_ffs -b #) at:
    192, 1520448, 3040704, 4560960, 6081216, 7601472, 9121728, 10641984, 12162240, 13682496, 15202752, 16723008, 18243264,
    ......................................................................................................................

using

Wedges can be mounted by name.


    % sudo mount NAME=stripes /mnt
    % df -h /mnt
    Filesystem         Size       Used      Avail %Cap Mounted on
    /dev/dk10           11T       4.0K        10T   0% /mnt
    % sudo mkdir -m 1777 /mnt/scratch
    % dd if=/dev/zero bs=1024k count=1000 of=/mnt/scratch/testfile
    1000+0 records in
    1000+0 records out
    1048576000 bytes transferred in 4.961 secs (211363837 bytes/sec)
    % sudo umount /mnt

checking

And they can be checked by name as well.


    % sudo fsck NAME=stripes
    ** /dev/rdk10
    ** File system is clean; not checking

Older versions of fsck do not support wedge names; you need to specify the device, e.g. /dev/rdk10.

and more

Verify that this would reconfigure automatically after a reboot:


    % sudo raidctl -u raid0
    % sudo drvctl -d wd1
    % sudo drvctl -d wd2
    % sudo drvctl -a ata_hl -r atabus3
    % sudo drvctl -a ata_hl -r atabus4
    % sudo raidctl -c raid0.conf raid0
    % sudo mount NAME=stripes /mnt
    % df -h /mnt
    Filesystem         Size       Used      Avail %Cap Mounted on
    /dev/dk10           11T       1.0G        10T   0% /mnt
    % ls -l /mnt/scratch/testfile
    -rw-r--r--  1 mlelstv  wheel  1048576000 Oct 25 21:47 /mnt/scratch/testfile

Unlike wedges, raidframe devices are not automatically created when a disk attaches; that's why we needed the raidctl -c command. However, at boot time the raidframe driver scans all available disks and autoconfigures devices for all RAID sets it finds that have the autoconfigure flag set.

LVM

Linux LVM is a different scheme for managing disk space. It uses its own label to group multiple disks together and carves out blocks, using the device mapper driver, to form logical volumes.

The device mapper provides logical block and character devices that route I/O to physical disks or some other logical devices. Such devices can be used for filesystems like a disk partition or wedge.

LVM is mostly used on raw disks, as there is rarely a need to partition a disk first. But it can also be used on disk partitions or wedges; this has advantages if the disks are used for booting or are moved between different systems or platforms.

LVM on raw disks

LVM disks are first labeled with the pvcreate command and then coalesced into a volume group.


    % sudo lvm pvcreate /dev/rwd1d
    Physical volume "/dev/rwd1d" successfully created
    % sudo lvm pvcreate /dev/rwd2d
    Physical volume "/dev/rwd2d" successfully created
    % sudo lvm vgcreate vg0 /dev/rwd1d /dev/rwd2d
    Volume group "vg0" successfully created
    % sudo lvm pvs
    PV         VG   Fmt  Attr PSize PFree
    /dev/rwd1d vg0  lvm2 a-   2.00t 2.00t
    /dev/rwd2d vg0  lvm2 a-   2.00t 2.00t

This reveals a problem with large disks: the LVM tools only understand conventional disk partitions, which are limited to 2TB each. However, neither LVM, the device mapper driver, nor the disk drivers (when using the raw partition) are bound to the disklabel information. But you need to tell LVM the real size to override the synthesized disklabel. The real size is 11721045168 sectors of 512 bytes, giving 5723166 megabytes.
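The megabyte figure is straightforward integer arithmetic:

```shell
# 11721045168 sectors of 512 bytes, expressed in whole megabytes
# (the plain number that --setphysicalvolumesize expects).
sectors=11721045168
echo $(( sectors * 512 / 1024 / 1024 ))
```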


    % sudo lvm vgremove vg0
    Volume group "vg0" successfully removed
    % sudo lvm pvcreate --setphysicalvolumesize=5723166 /dev/rwd1d
    WARNING: /dev/rwd1d: Overriding real size. You could lose data.
    Physical volume "/dev/rwd1d" successfully created
    % sudo lvm pvcreate --setphysicalvolumesize=5723166 /dev/rwd2d
    WARNING: /dev/rwd2d: Overriding real size. You could lose data.
    Physical volume "/dev/rwd2d" successfully created
    % sudo lvm vgcreate vg0 /dev/rwd1d /dev/rwd2d
    Volume group "vg0" successfully created
    % sudo lvm pvs
    PV         VG   Fmt  Attr PSize PFree
    /dev/rwd1d vg0  lvm2 a-   5.46t 5.46t
    /dev/rwd2d vg0  lvm2 a-   5.46t 5.46t

LVM on wedges

Here are again the two disks, with one large wedge each. The wedge type field is empty because the GPT lists the partitions as type linux-lvm, which has no well-known wedge type.


    % sudo dkctl wd1 listwedges
    /dev/rwd1d: 1 wedge:
    dk8: hugedisk1, 11721045100 blocks at 34, type: 
    % sudo dkctl wd2 listwedges
    /dev/rwd2d: 1 wedge:
    dk9: hugedisk2, 11721045100 blocks at 34, type: 

Now label the disks, form a volume group, create a logical partition and a filesystem:


    % sudo lvm pvcreate /dev/rdk8
    Physical volume "/dev/rdk8" successfully created
    % sudo lvm pvcreate /dev/rdk9
    Physical volume "/dev/rdk9" successfully created
    % sudo lvm vgcreate vg0 /dev/rdk8 /dev/rdk9
    Volume group "vg0" successfully created
    % sudo lvm pvs
    PV         VG   Fmt  Attr PSize PFree
    /dev/rdk8  vg0  lvm2 a-   5.46t 5.46t
    /dev/rdk9  vg0  lvm2 a-   5.46t 5.46t
    % sudo lvm lvcreate -L 500m -n lvtest vg0
    Logical volume "lvtest" created
    % sudo newfs -O2 /dev/vg0/lvtest
    /dev/mapper/rvg0-lvtest: 500.0MB (1024000 sectors) block size 8192, fragment size 1024
    using 11 cylinder groups of 45.46MB, 5819 blks, 10976 inodes.
    super-block backups (for fsck_ffs -b #) at:
    144, 93248, 186352, 279456, 372560, 465664, 558768, 651872, 744976, 838080,
    ...............................................................................
    % sudo mount /dev/vg0/lvtest /mnt
    mount_ffs: "/dev/vg0/lvtest" is a non-resolved or relative path.
    mount_ffs: using "/dev/mapper/vg0-lvtest" instead.
    % df -h /mnt
    Filesystem                   Size       Used      Avail %Cap Mounted on
    /dev/mapper/vg0-lvtest       470M       1.0K       447M   0% /mnt

There is one issue with LVM on wedges. LVM scans disks for its label to coalesce them into volume groups and to find the logical volumes. You can restrict the search in the LVM configuration, but if you use wedges you must scan all wedges, because the device names may change. Also, the result of a scan is saved to optimize subsequent scans. If your disk configuration changes, you either need to remove the cached result or configure LVM not to save it, using the write_cache_state option.