Recently we received an email from a customer who noticed a slight variation in the reported space used for an archive. Below is the question he asked and the answer we sent back. I thought it might help others who are trying to fully understand LTFS archiving.
Our customer writes:
… I offloaded 2.20TB (2207.51GB) to the LTO6, the database details tells me it is actually 2.21TB in size, and the CSV report tells me it is 2.19TB used….
Imagine Products’ Response:
Several things come into play when you’re talking about file sizes. The bottom line is that it’s not as simple as one might think, and comparing file sizes is not a reliable way to confirm that two copies of the data match.
First, please realize that a tape is NOT a hard disk. LTFS is a great invention that lets us “see” a tape’s contents much as a hard disk’s contents appear to the computer, but a tape and a disk are physically different and store data in quite different ways.
Also please know that Finder uses Spotlight to index the contents of mounted volumes, and often yields very confusing if not nonsensical results when dealing with large data sets, especially if they’re being browsed or changed. And Spotlight doesn’t work well (or at all) with LTO tapes.
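The short explanation is that there’s a difference between “size” and “size on disk”. “Size” is the number of bytes of data in the file; “size on disk” is the amount of space the volume sets aside to hold it.
Why Is There a Big Difference Between ‘Size’ and ‘Size on Disk’?
Most of the time, the values for ‘Size’ and ‘Size on Disk’ are very close when you check a file or folder. A large discrepancy can appear when a volume holds many small files, for the allocation reasons explained further down.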
Another thing that comes into play with tapes is that the available and consumed space calculations depend on responses from the deck. In the real world, tape sometimes has bad spots, and the tape deck is designed to check for those and automatically skip the bad sections. When that happens, it simply rewrites the file it was working on to the next segment of tape and marks the bad section as deleted. This of course consumes what was thought to be usable space, but the drive doesn’t communicate that to our software, so we really only have an approximation of how much data might fit on any given tape. To allow for this tolerance, we give PreRoll Post a cushion of 5% of the reported space. In other words, we won’t let you attempt to add more data to a tape than 95% of its reported available space. This reserve is purely to allow for any bad tape sections.
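As a rough illustration of that 95% rule, here is a minimal sketch in Python. It is not PreRoll Post’s actual code: the function names and the reported free space figure are hypothetical, and only the 5% reserve comes from the explanation above.

```python
# Hypothetical sketch of a 5% reserve check; not PreRoll Post's real implementation.
RESERVE_PERCENT = 5  # the cushion described above


def usable_bytes(reported_available: int, reserve_percent: int = RESERVE_PERCENT) -> int:
    """Bytes we would allow to be written, given the drive's reported free space.

    Integer arithmetic avoids floating-point rounding on very large byte counts.
    """
    return reported_available * (100 - reserve_percent) // 100


def fits_on_tape(job_bytes: int, reported_available: int) -> bool:
    """True if the job stays within 95% of the reported available space."""
    return job_bytes <= usable_bytes(reported_available)


# Assumed figure for illustration only: the deck reports 2.4 TB available.
reported = 2_400_000_000_000
limit = usable_bytes(reported)
small_job_ok = fits_on_tape(2_207_510_000_000, reported)
large_job_ok = fits_on_tape(2_300_000_000_000, reported)
```

With these assumed numbers, the first job fits under the limit and the second is refused, even though both are smaller than the space the deck reported.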
More to the point, a hard disk allocates space in fixed-size chunks (typically 4KB) regardless of the actual data size, so even a tiny file occupies a whole chunk. Tape doesn’t behave in that manner; it is not exFAT formatted. So when you add lots of files, this difference accumulates, and the number of files matters more than their overall aggregate size.
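To see how the per-file rounding adds up, here is a small Python sketch. It assumes a 4KB allocation unit as in the paragraph above and uses made-up file sizes; real volumes may use a different block size.

```python
BLOCK_SIZE = 4096  # assumed 4 KB allocation unit


def size_on_disk(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Round a file's logical size up to a whole number of allocation blocks."""
    blocks = (file_size + block_size - 1) // block_size
    return blocks * block_size


def totals(file_sizes, block_size: int = BLOCK_SIZE):
    """Return (total logical size, total allocated size) for a list of file sizes."""
    logical = sum(file_sizes)
    allocated = sum(size_on_disk(size, block_size) for size in file_sizes)
    return logical, allocated


# Two made-up data sets with the same total "size" of 100,000,000 bytes.
many_small = [1_000] * 100_000  # 100,000 files of 1,000 bytes each
one_large = [100_000_000]       # a single file

small_logical, small_allocated = totals(many_small)
large_logical, large_allocated = totals(one_large)
```

Both sets hold the same amount of data, but the many small files occupy roughly four times their logical size on a block-allocated disk, while the single large file wastes less than one block.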
With an application like PreRoll Post, we use file copy routines to copy the files exactly and then double-check them with checksums (which also test the byte sequences, not just the total bytes). So you can rest assured that the copies are exact duplicates of all your data, regardless of the approximate size calculations.
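The general idea of checksum verification can be sketched in Python as follows. This is an illustration only, not PreRoll Post’s implementation: the choice of SHA-256, the function names, and the file names are all assumptions.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_checksum(path, algorithm: str = "sha256", chunk_size: int = 1024 * 1024) -> str:
    """Hash a file's bytes in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination) -> bool:
    """Copy a file, then confirm the copy's checksum matches the source's."""
    shutil.copyfile(source, destination)
    return file_checksum(source) == file_checksum(destination)


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "clip.mov"  # hypothetical file name
    original.write_bytes(b"ABCD" * 1000)
    copied_ok = copy_and_verify(original, Path(tmp) / "clip_copy.mov")

    # Same total byte count, but the bytes are in a different order.
    scrambled = Path(tmp) / "scrambled.mov"
    scrambled.write_bytes(b"DCBA" * 1000)
    same_size = original.stat().st_size == scrambled.stat().st_size
    same_content = file_checksum(original) == file_checksum(scrambled)
```

The scrambled file has exactly the same size as the original yet a different checksum, which is why a checksum comparison is a far stronger test than comparing sizes.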
About PreRoll Post:
PreRoll Post is an LTFS archiving application optimized for the media and entertainment industry. PreRoll Post securely archives assets to LTO tapes or ODA cartridges using simple drag-and-drop functionality, as well as checksum technology to ensure archives are 100% accurate. PreRoll Post uses the open-source LTFS format, so even tapes not created in PreRoll Post can be imported and retrieved (*only those using LTFS). PreRoll Post is compatible with any LTO tape drive as well as Sony’s Optical Disc Archive. For more information, visit the Archive Home Page.