Understanding File Systems

Complete Developer Podcast - A podcast by BJ Burns and Will Gant - Thursdays


Back in the day, before you learned to work with things like HTTP, networks, and databases, you learned to work directly with the file system. While relatively simple on the surface, the file system of any modern operating system can be more complex than you might realize. Worse still, if you started programming in the last ten years or so, the use of the file system for data storage was probably (and rightly so) de-emphasized in favor of learning about databases. While you are less likely to have to use the file system these days, that doesn’t mean that you can entirely ignore it. Plenty of tools still make heavy use of the file system, including developer tools, video encoding, image processing, bulk data import/export, and the like. In addition, legacy systems may still be dependent on the file system for some of their processing, so you may have to work with files on occasion.

If you haven’t done a lot with the file system recently (or ever), there are plenty of things that can cause you problems that you aren’t prepared for. Many of these problems only appear on production systems, under load, or in unusual circumstances that are hard to replicate in a development environment. While mucking about with the file system is fairly straightforward, there are a few pitfalls for the unwary, especially on systems accessing large numbers of files, large files, or doing so in unusual circumstances. However, once built well, such systems can often run for decades without modification, as the means of working with files are very old, very stable, and very well-established in most environments. Learning to do this well is a skill that will pay off over time.

Episode Breakdown

Locks, and the lack thereof.

When you open a file, you specify how you plan to work with it (Read/Write/Both) as well as whether to allow other processes to read or write the file while open.
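To make the open-and-lock idea concrete, here is a minimal Python sketch. It is only one way to do it, and it rests on some assumptions: it uses POSIX advisory locks through the standard `fcntl` module, so it will not run on Windows (where sharing is set when the file is opened), and the lock only constrains other programs that also ask for it. The helper name `locked_open` is made up for this example.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def locked_open(path, mode="r+", exclusive=True):
    """Open `path` and hold an advisory lock on it inside the `with` block.

    Hypothetical helper for illustration. POSIX only: advisory locks are
    honored only by processes that also call flock() on the same file.
    """
    lock_type = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    f = open(path, mode)
    try:
        # LOCK_NB: raise BlockingIOError right away instead of waiting
        # if a conflicting lock is already held through another open().
        fcntl.flock(f.fileno(), lock_type | fcntl.LOCK_NB)
        yield f
    finally:
        # Closing the file also releases any lock taken on it.
        f.close()
```

A writer would wrap its work in `with locked_open(path) as f:`, while readers that are happy to share with each other could pass `exclusive=False`. A caller that gets `BlockingIOError` knows someone else holds a conflicting lock and can decide whether to wait, retry, or give up.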
You have to be careful with this, because you could break other processes with the wrong locking settings, either by keeping them from accessing the file, or by altering the file while they are accessing it. The same principle can work in reverse on you. For instance, if you are reading a file while allowing other processes to write to the file, you could run into a situation where the file changes between reads. You may also find that you are denied access to a file due to the constraints placed on it by other apps. Those other apps may not be running (or may be running at a much lower scale) in your development environment.

Removable (or disconnect-able) media.

Back in the day, we had to worry about floppy disks (or even zip disks, or CDs) getting ejected while we were attempting to use them. While less of a worry today, it’s still a potential problem. Things like USB drives, network shares, and even the occasional DVD (or even a floppy!!) could be disconnected or removed while you are accessing them. Worse still, you aren’t necessarily going to get a lot of notice if it is going to happen. You probably also don’t want to immediately fail if the device is temporarily disconnected (USB hub power issues, or network blips, for instance), but rather have some retry logic. In addition to total failure, at least some of these devices could remain connected, but become horribly slow. This is especially true of networked devices. You may need to take this into account as well.

Permissions

Just because you can see (or even open) a file, doesn’t mean that you can write to it. File system permissions are one of the primary ways that you’ll get burned here. Files aren’t the only things that are secured. Paths are as well, so you may not even be able to see a folder in which your file resides. Permissions also may be more granular than you want. For instance, you might be able to read a file,
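Two of the points above, retrying through a temporary disconnect and not being able to wait your way out of a permissions problem, can be combined in one small Python sketch. Treat it as an illustration under stated assumptions rather than a recommended implementation: the function name `open_with_retry`, the attempt count, and the delays are all invented for the example, and it assumes that a `PermissionError` will not fix itself while other `OSError`s (a missing path on an unplugged drive, a network blip) might.

```python
import time


def open_with_retry(path, mode="rb", attempts=4, base_delay=0.5,
                    opener=open, sleep=time.sleep):
    """Open `path`, retrying transient failures with exponential backoff.

    Hypothetical helper; the defaults are illustrative, not tuned.
    `opener` and `sleep` are injectable so the logic can be tested
    without real devices or real waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return opener(path, mode)
        except PermissionError:
            # Assumption: access problems are not transient, so fail fast.
            raise
        except OSError:
            # Covers FileNotFoundError and other I/O errors that a
            # reconnecting USB drive or network share could produce.
            if attempt == attempts:
                raise
            # Wait 0.5s, 1s, 2s, ... between attempts by default.
            sleep(base_delay * 2 ** (attempt - 1))
```

The caller still has to decide what to do when the final attempt fails, and a retry loop like this does nothing for a device that stays connected but becomes very slow; that case needs a timeout or a progress check of its own.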
