Simplify Searching with Multiple File Extensions

Authored April 2000

Search worms (file system explorers like Verity and Index Server) generally use a file's extension and folder to determine which files to include in an index. So, for example, we could instruct Verity to create a collection out of all the ".HTM" and ".CFM" files in the "/content" folder.

Although this works very well for content-heavy sites, it begins to fail for application-like sites. There are always many files that you don't want indexed: functional includes such as shopping carts and database update pages, custom tags, and so on.

A Solution Using Extensions

This problem has most often been addressed through directory segmentation: placing "non-searchable" files in a separate directory, often called "Scripts". This can cause portability and maintenance issues, since files that work together are not kept together.

To solve this problem we have developed a simple solution: create multiple file extension types for ColdFusion (or ASP, for that matter). We began doing this in IIS 3.0, which annoyingly required direct modification of the ColdFusion registry keys. IIS 4 and 5 make creating additional extensions and associating them with a script engine quite easy. We've been told that Apache and most other HTTP servers also handle these types of changes handily, but unfortunately we lack experience with them.

For example purposes, let's create a new ColdFusion extension: "CFMS" ("CFM System"). "CFMS" will be used on "code only" templates such as form and calculator results, files that make up multi-template "wizards", shopping cart templates, and so on: basically, any file that would be confusing as a search result. Note that the actual extension used is up to you. We prefer appending a single letter to "CFM", but the extension could be "Fred" just as easily. You may also create as many extensions as make sense to you.

Adding the Extension

We can't do a complete tutorial on setting extensions and script mappings, so we'll just do a quick rundown of the steps using IIS 5. The process is very similar in IIS 4.

  1. Open the Internet Information Services control panel. Right click the root of the server and select "Properties".
  2. Select the Master Properties for the WWW Service and hit "Edit".
  3. Select the "Home Directory" tab.
  4. In the "Application Settings" area click the "Configuration" button.
  5. Select the "App Mappings" tab.
  6. Note the "Executable Path" for the .CFM mapping (this is usually "C:\ColdFusion\Bin\iscf.dll").
  7. Click the "Add" button. Put the path you noted in the "Executable" field and your new extension in the "Extension" field. Make sure that the "Script Engine" control is checked and then click "OK".
  8. Your new extension should appear in the list. "OK" out of all the dialogs and you're done.

You can now create files with the new extension and they will be parsed by the ColdFusion engine.

The Pros

Once configured properly, there is no functional difference between your new extension and the default extension(s). ColdFusion will parse "Index.CFMS" just as it would "Index.CFM". The main benefit is that it's now very easy to "hide" these files from the Verity Search Engine or Index Server: simply don't include "CFMS" in the searchable extensions list. No fuss, no muss and, most importantly, no loss in performance.
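To make the filtering idea concrete, here is a minimal sketch in Python of how an indexer might pick files by extension. It is not how Verity or Index Server actually work internally; the function name `files_to_index` and the `SEARCHABLE_EXTENSIONS` list are our own assumptions for illustration.

```python
from pathlib import Path

# Assumed searchable list for illustration; ".cfms" is deliberately left out.
SEARCHABLE_EXTENSIONS = {".htm", ".cfm"}


def files_to_index(root, searchable=SEARCHABLE_EXTENSIONS):
    """Return the files under root whose extension is in the searchable list."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        # The comparison is on the whole extension, so ".cfms" never matches ".cfm".
        if path.is_file() and path.suffix.lower() in searchable
    )
```

Because the match is on the complete extension, "Cart.CFMS" is skipped while "Index.CFM" in the same folder is still picked up.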

You can easily group files virtually without resorting to physical directory segmentation. For example, if all of your CFML custom tags had a "CFMX" extension, they could reside near whatever code used them without being returned by Verity, yet a simple file search for the new extension would return a list of all your custom tags. You can also do text searches over specific file groups in ColdFusion Studio.
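The "virtual grouping" can be sketched as a small Python helper that lists files by extension no matter which folder they live in. The helper name `group_by_extension` is hypothetical, and the "cfmx" extension follows the custom tag example above.

```python
from collections import defaultdict
from pathlib import Path


def group_by_extension(root):
    """Map each extension (lower case, no dot) to the files that carry it."""
    root = Path(root)
    groups = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            extension = path.suffix.lower().lstrip(".")
            # Store paths relative to root, with forward slashes on every platform.
            groups[extension].append(path.relative_to(root).as_posix())
    return {extension: sorted(names) for extension, names in groups.items()}
```

Calling `group_by_extension("/content")["cfmx"]` would then list every custom tag under that folder, wherever it sits.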

Applying this technique to certain application architectures can be very useful. Many developers, for example, use common prefixes to denote specific file types. Prepending "DB_" to files that do only database access is one example. Such prefixes can be replaced with custom extensions to gain a bit more flexibility.
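As a rough sketch of swapping a prefix for an extension, the Python function below proposes a new name for a prefixed template. The "cfmd" extension, the mapping table and the function name `proposed_name` are all assumptions made up for this example. The function only computes names; it renames nothing, and any links or includes that point at the old names would still need updating by hand.

```python
from pathlib import PurePath

# Hypothetical mapping: "DB_" templates get a made-up ".cfmd" extension.
PREFIX_TO_EXTENSION = {"db_": ".cfmd"}


def proposed_name(filename, mapping=PREFIX_TO_EXTENSION):
    """Suggest a custom-extension name for a prefixed ".cfm" template."""
    path = PurePath(filename)
    lowered = path.name.lower()
    for prefix, new_extension in mapping.items():
        if lowered.startswith(prefix) and path.suffix.lower() == ".cfm":
            # Drop the prefix from the stem and attach the custom extension.
            stem = path.stem[len(prefix):]
            return str(path.with_name(stem + new_extension))
    # Anything that does not match keeps its current name.
    return filename
```

For instance, `proposed_name("DB_Orders.cfm")` gives "Orders.cfmd", while "Index.cfm" is returned unchanged.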

The Cons

Although not technically a con, it should be obvious that a site could easily be architected in such a way as to make this technique unnecessary. It should also be fairly clear that retrofitting a site with this technique may be more trouble than it's worth. This technique makes the most sense for small sites or new development. However, a largish site that's running into problems creating a workable search engine may find this technique easier than a complete rebuild.

The biggest potential problem with this technique is an obvious lack of portability. The non-standard extensions mean confusion when distributing your code outside your organization. Even in a smooth transfer there is still added server configuration time, at least initially, to handle the new extensions.

Conclusion

Although not for everybody, this technique can greatly simplify searchable sites with high code content. Unlike other possible solutions (strict code segmentation or advanced search logic, to name two), there is no loss of performance or development time.
