XP, Vista and File Metadata
This page describes the evolution of metadata storage from XP to Vista (and later), and shows how File Metadata restores old capabilities in a new context.
If metadata, such as tags and comments, is added to a file, then the key question is: where is it stored? There are two main options:
- Inside the file. The format of a file must include well-defined locations for the metadata properties. For example, Word documents know how to store Subject and Author; .jpg files know how to store Tags.
- Outside the file. This can be in a central store, or in a store unique to each file, a sort of companion file. Because there is no dependency on the file format, any property can be added to a file of any type.
The advantage of keeping metadata inside a file is that it intrinsically stays with the file when the file is moved around, or even sent as an e-mail attachment. The drawback is that, if the metadata is to be displayed in a generic tool such as Explorer, a piece of code has to be plugged in to read and write each file format, because the metadata will not always be stored in the same place. Another potential difficulty is that changing the metadata will update the file version and modification date, as the system has no way of telling a metadata change from any other change. It is also only possible to store a given metadata property if the file format specifies a place for it: for instance, PDFs do not have a well-defined location for a comment property.
Conversely, with metadata kept outside the file, it can be read and written using a single mechanism, and version information is preserved when it is updated. However, maintaining the connection between the file and its metadata as the file is moved about is a major problem.
In XP, Explorer presented metadata through a single user experience: the Summary tab in the Properties dialog. Under the covers, however, different storage mechanisms were used. Windows shipped with one main plug-in, to support image files (specifically .bmp, .db, .gif, .ico, .jfif, .jpe, .jpeg, .jpg, .png, .rle, .tif, .tiff and .wdp). Explorer also had built-in (as far as I can tell) support for OLE Compound Documents, which meant that metadata in Office files (from 1997 to 2007), or in any custom format based on the OLE standards, could be read and updated.
These mechanisms stored metadata 'inside the file'. But there was also an 'outside the file' fallback. If the file did not have an extension for which a plug-in was defined, and was not an OLE Compound Document, but was on an NTFS drive, then Explorer would use NTFS's built-in capability to store properties in an annex to a file, an 'alternate stream'. This mechanism partially addresses the main problem with 'outside the file' storage, in that the metadata moves around with the file as long as the file always stays on an NTFS drive. The metadata would still be lost, however, if the file was moved to a FAT drive, or sent as an e-mail attachment.
If none of the above mechanisms were available, then the Summary tab simply did not appear.
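The order in which XP chose a storage mechanism can be summarised as a simple fallback chain. The sketch below is only an illustration of the behaviour described above, not Explorer's actual code: the function name `xp_summary_store` and the `SummaryStore` values are hypothetical, and it assumes that an OLE Compound Document is recognised by its standard 8-byte file signature (D0 CF 11 E0 A1 B1 1A E1). The image extension set is a subset of the list above.

```python
from enum import Enum
from pathlib import Path


class SummaryStore(Enum):
    """Hypothetical labels for where XP kept Summary-tab metadata."""
    IMAGE_PLUGIN = "image plug-in (inside the file)"
    OLE_COMPOUND = "OLE property sets (inside the file)"
    NTFS_STREAM = "NTFS alternate stream (outside the file)"
    NONE = "no Summary tab"


# Standard signature at the start of every OLE Compound File.
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")

# Subset of the extensions handled by XP's image plug-in.
IMAGE_EXTENSIONS = {
    ".bmp", ".gif", ".ico", ".jfif", ".jpe", ".jpeg", ".jpg",
    ".png", ".rle", ".tif", ".tiff", ".wdp",
}


def xp_summary_store(name: str, header: bytes, on_ntfs: bool) -> SummaryStore:
    """Sketch of XP's fallback order (assumed, not Explorer's real logic)."""
    # 1. A registered plug-in for the extension wins.
    if Path(name).suffix.lower() in IMAGE_EXTENSIONS:
        return SummaryStore.IMAGE_PLUGIN
    # 2. Otherwise, OLE Compound Documents store properties internally.
    if header[:8] == OLE_SIGNATURE:
        return SummaryStore.OLE_COMPOUND
    # 3. Otherwise, fall back to an alternate stream, but only on NTFS.
    if on_ntfs:
        return SummaryStore.NTFS_STREAM
    # 4. No mechanism available: the Summary tab is not shown.
    return SummaryStore.NONE
```

For example, `xp_summary_store("notes.txt", b"hello", on_ntfs=True)` falls through to the alternate stream, while the same file on a FAT drive gets no Summary tab at all.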
During the lifetime of XP, Microsoft introduced Windows Desktop Search. This naturally introduced a need to read a broader range of properties from a wide variety of file formats, which resulted in the standardisation of an interface to be implemented by a format-specific piece of code that could present metadata properties to Windows Desktop Search in a simple, consistent way, whatever the file format. In Vista, support for metadata properties was refactored out of Explorer and Search into an extensible property system built into Windows and consumed by both. Specific formats are handled by Property Handlers, which support the standard property access interface carried forward from Windows Desktop Search.
However, support for 'outside the file' storage was removed. Here's a quote from a blog entry from Vista’s beta period which appears to have been written by one of the developers:
In my opinion, XP was too aggressive in the way it handled properties. It assumed everything was an OLE Compound Document, or failing that, would store data in NTFS secondary streams. This works great for setting properties on arbitrary types, but it leads to a whole slew of other bugs, including several where a user's data is lost.
Vista avoids this assumption. Instead, property handlers are explicitly registered and are assumed to provide a stronger guarantee that if they are storing properties, they are storing them properly in the file itself.
http://blogs.msdn.com/b/benkaras/archive/2007/01/21/what-do-property-handlers-accomplish.aspx
So it seems there were complaints based on the genuine limitations of 'outside the file' storage, and that Microsoft pulled the capability in response.
File Metadata takes the view that the choice should be yours. It provides a Property Handler that plugs into the Vista and later property system and stores metadata in the same NTFS 'alternate stream' property store that was used by XP. It recognises the limitations of this 'outside the file' storage mechanism, but prefers to explain them rather than withdraw the ability to store metadata on files of any type.
This comes close to restoring XP’s ability to store metadata on files of any type (provided that they are on an NTFS drive). One difference is that the capability must be turned on per extension, because there is no place to plug in default handling for 'all other file types'. Another is that the range of properties that can be stored is massively increased, as it includes all of the 600 or so properties defined by the Vista property system. One thing that is carried forward: metadata written by XP on a file without a recognised extension or storage format (i.e. written to an alternate stream) will be read by File Metadata.