Unix holey files
2014-04-18 09:51
Unix has sparse files. If you seek() past the end of a file and write() a byte there, then every unwritten byte before the written byte reads back as zero. Those zeroed bytes take no storage space on the disk (although the accounting for the storage does take some space). You can think of the file as having a "hole".
Sparse files are useful for network testing, as they allow the performance of the storage and I/O hardware to be taken out of the test, leaving the performance of the operating system and the network.
Sparse files for testing are conveniently created using dd(1). For example, to create a 10GiB test file named ‘test-10gibyte.bin’:
$ dd if=/dev/zero of=test-10gibyte.bin bs=1 count=1 seek=$(( (10 * 1024 * 1024 * 1024) - 1))
and to create a 10GB file named ‘test-10gbyte.bin’:
$ dd if=/dev/zero of=test-10gbyte.bin bs=1 count=1 seek=$(( (10 * 1000 * 1000 * 1000) - 1))
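To check that a file created this way really has a hole, you can compare its apparent length with the space allocated for it. The following is a minimal sketch, not a definitive method: the helper names make_sparse, apparent_bytes and allocated_kib are made up for illustration, the file is a small 1MiB one so it runs quickly, and it assumes the filesystem supports sparse files and that mktemp(1) is available.

```shell
#!/bin/sh
# make_sparse FILE SIZE_BYTES: hypothetical helper wrapping the dd idiom,
# writing a single zero byte at offset SIZE_BYTES - 1.
make_sparse() {
    dd if=/dev/zero of="$1" bs=1 count=1 seek=$(( $2 - 1 )) 2>/dev/null
}

# apparent_bytes FILE: the file's length as seen by a reader.
apparent_bytes() {
    wc -c < "$1" | tr -d ' '
}

# allocated_kib FILE: space allocated on disk, in KiB, as reported by du.
allocated_kib() {
    du -k "$1" | awk '{ print $1 }'
}

workdir=$(mktemp -d)
make_sparse "$workdir/test-1mibyte.bin" $(( 1024 * 1024 ))
echo "apparent:  $(apparent_bytes "$workdir/test-1mibyte.bin") bytes"
echo "allocated: $(allocated_kib "$workdir/test-1mibyte.bin") KiB"
rm -rf "$workdir"
```

On a filesystem with sparse file support the allocated figure should be far smaller than the apparent one; the exact number depends on the filesystem's block size.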
Aside: Units for networking and units for RAM
Networking uses SI units for bandwidth, due to the close relationship of bandwidth with signalling frequencies, measured in the SI unit hertz. The relative difference between (10³)ⁿ and (2¹⁰)ⁿ increases with n, becoming concerning when n=3 (GB versus GiB) and being unsustainably large when n≥4 (TB versus TiB).
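The size of that difference is easy to compute. A small sketch, using integer arithmetic only: prefix_error_pct is a made-up helper name, and the code assumes the shell's $(( )) arithmetic is 64-bit. It prints how much larger the binary prefix is than the SI prefix, as a percentage truncated to two decimal places.

```shell
#!/bin/sh
# prefix_error_pct N: hypothetical helper printing how much larger
# 1024^N is than 1000^N, as a percentage truncated to two decimals.
prefix_error_pct() {
    dec=1
    bin=1
    i=0
    while [ "$i" -lt "$1" ]; do
        dec=$(( dec * 1000 ))
        bin=$(( bin * 1024 ))
        i=$(( i + 1 ))
    done
    # Work in hundredths of a percent to stay in integer arithmetic.
    hundredths=$(( (bin - dec) * 10000 / dec ))
    printf '%d.%02d\n' $(( hundredths / 100 )) $(( hundredths % 100 ))
}

for n in 1 2 3 4; do
    echo "n=$n: $(prefix_error_pct "$n")%"
done
```

This prints 2.40% for n=1 (kB versus KiB), 4.85% for n=2, 7.37% for n=3 and 9.95% for n=4.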
Networking also uses bits as the basic unit rather than bytes, again due to the closer relationship of bits to signalling frequencies. In networking there are 8 bits per byte. Care is taken to distinguish Gbps (gigabits per second) and GBps (gigabytes per second) due to the eight-fold difference. Incorrect casing of the ‘b’ leads to exasperated coworkers.
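To see both unit traps at once, here is a rough sketch of the best-case time to move a 10GB and a 10GiB file over a 1Gbps link. The helper name ideal_transfer_ms is made up for illustration, and the figure is a lower bound only: it ignores all protocol overhead and assumes 64-bit shell arithmetic.

```shell
#!/bin/sh
# ideal_transfer_ms BYTES BITS_PER_SECOND: hypothetical helper giving the
# minimum time in milliseconds to move BYTES over a link, at 8 bits per
# byte and with no protocol overhead.
ideal_transfer_ms() {
    echo $(( $1 * 8 * 1000 / $2 ))
}

gbps=$(( 1000 * 1000 * 1000 ))          # 1 Gbps, in bits per second
gb10=$(( 10 * 1000 * 1000 * 1000 ))     # 10 GB, in bytes
gib10=$(( 10 * 1024 * 1024 * 1024 ))    # 10 GiB, in bytes

echo "10 GB  at 1 Gbps: $(ideal_transfer_ms "$gb10" "$gbps") ms"
echo "10 GiB at 1 Gbps: $(ideal_transfer_ms "$gib10" "$gbps") ms"
```

The 10GB file needs at least 80000 ms and the 10GiB file at least 85899 ms; reading ‘Gbps’ as ‘GBps’ would make both look eight times faster.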