As a developer or webmaster, whenever you take in large volumes of content from elsewhere in your company, you have to deal with the biggest interface of them all: the one between the word processor and the Web site.
May 26, 2000
Many webmasters never face this challenge. They work only with writers and designers who know exactly what the webmaster needs and who deliver text in small quantities, in just the right formats. But other webmasters have to work with files that may have originated in Microsoft Word, Excel, WordPerfect, Emacs, vi, PowerPoint, PageMaker, FrameMaker, or even more obscure formats. These files may be sales material, product data, press releases, annual reports, white papers, or product manuals.
The easy way out is to leave everything as it is and offer a link to download it. This can work well on an intranet, but not so well on the open Web, where you can't know what software your visitors have. Unix users, for example, probably can't play PowerPoint presentations.
The next easiest option is to turn everything into Adobe Acrobat PDFs (Portable Document Format). There's a PDF reader for almost every platform, so portability isn't an issue. But speed may be. A PowerPoint presentation converted to PDF can run a meg or two: small enough for any connection faster than ISDN, but murderous on a 56K modem. If many of your visitors are on dial-up modems, they may balk at downloading several PDFs.
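To put rough numbers on that, here is a small Python sketch of idealized transfer time. It assumes nominal line rates (56 kbps for a modem, 128 kbps for two-channel ISDN, 1.544 Mbps for a T1), takes "a meg" as 2^20 bytes, and ignores protocol overhead and compression, so real downloads would be somewhat slower.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Idealized transfer time: ignores protocol overhead, latency, and compression."""
    return size_bytes * 8 / link_bits_per_second


MEGABYTE = 1024 * 1024  # assumption: "a meg" taken as 2**20 bytes

# Nominal line rates; real dial-up throughput was usually below 56 kbps.
LINKS = {
    "56K modem": 56_000,
    "ISDN (2 x 64 kbps)": 128_000,
    "T1": 1_544_000,
}

if __name__ == "__main__":
    for megs in (1, 2):
        for name, bps in LINKS.items():
            secs = transfer_seconds(megs * MEGABYTE, bps)
            print(f"{megs} MB over {name}: {secs / 60:.1f} min")
```

At those nominal rates, a 1 MB PDF takes about two and a half minutes over a 56K modem and a 2 MB file about five, versus roughly one to two minutes over ISDN.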
A third option is to have everybody give you text files, which you then render as HTML. Aside from the obvious labor costs, you'll lose some visual polish and perhaps some support from content providers, because, by gum, they've worked hard to make that material look good, only to have you make them strip away everything but the cold ASCII or Unicode characters. This approach does, however, fit more neatly into on-demand schemes such as Active Server Pages, because you can pop the text into a database. You'll probably also be able to automate some of your conversions, but almost certainly not all of them. The reason? No standardization. You're likely to get a half-dozen different file types, none with any discernible structural regularity. Content generators often ignore style tagging, so you have no handles by which to map the text. This manual conversion approach is almost always a major labor sponge.
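As a minimal sketch of why plain text offers so little to automate against, the Python function below renders a text file as HTML. It assumes paragraphs are separated by blank lines, which is about the only structural cue a bare text file reliably carries; anything that was a heading or list item in the original simply becomes another paragraph.

```python
import html
import re


def text_to_html(text: str) -> str:
    """Render plain text as HTML paragraphs.

    The only structural cue used is the blank line, so every block
    becomes a <p>; headings, lists, and emphasis are lost.
    """
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Join hard-wrapped lines, then escape &, <, > and quote characters.
        joined = " ".join(line.strip() for line in block.splitlines())
        if joined:
            paragraphs.append(f"<p>{html.escape(joined)}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    print(text_to_html("Q1 Results\n\nSales rose 5% & costs\nfell."))
```

Note that "Q1 Results", presumably a heading in the writer's original, comes out as an ordinary paragraph; recovering it would take hand work.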
Of course, there's a final option: get content generators to cooperate with you by "single sourcing" their material. "Single source" is a newly emerging approach, not really a technology. As budgets have tightened, more companies are looking for ways to reuse what they already have, and writing things once for multiple uses has great appeal. A Word user, for example, would write a white paper, then simply pass it through a filter to create HTML or even database-ready text formats. Sounds great. And the results can be. But the implementation is a major exercise in social engineering.
Users must agree to use a predictable structure, or at least a regular tagging scheme. Otherwise, there's nothing for the output filter to work with. Most users push back at this, especially when they're used to "just writing". The filter itself matters, too. Word 2000's built-in HTML output filter uses Word's styles to generate an embedded CSS style sheet, which can swell files from a few K to a meg or more. For instance, this article is 8K when saved as Word HTML, but a mere 4K as plain text. Furthermore, that CSS isn't configurable. You'd be well advised to invest in a third-party filter such as Webconvert (http://www.webconvert.com/).
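Here is a rough illustration of what a style-driven output filter can do once writers follow a regular tagging scheme. It is not Webconvert's or Word's actual method. It assumes the paragraphs have already been extracted from the word-processor file as (style name, text) pairs (extraction not shown), and the style names in the hypothetical `STYLE_MAP` stand in for whatever the writers agree to use. Paragraphs in an unmapped style are still output as `<p>`, but are also returned in a separate list so they can go back to the author.

```python
import html

# Hypothetical style-to-element map, agreed with the writers in advance.
STYLE_MAP = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Body Text": "p",
    "List Bullet": "li",
}


def styled_to_html(paragraphs):
    """Convert (style_name, text) pairs to HTML using STYLE_MAP.

    Returns (html_string, unmapped), where unmapped lists the pairs whose
    style was not in the map, so ad hoc formatting can be reported.
    """
    out, unmapped = [], []
    in_list = False
    for style, text in paragraphs:
        tag = STYLE_MAP.get(style)
        if tag is None:
            unmapped.append((style, text))
            tag = "p"  # fall back to a plain paragraph, but flag it
        # Open or close a <ul> around runs of consecutive list items.
        if tag == "li" and not in_list:
            out.append("<ul>")
            in_list = True
        elif tag != "li" and in_list:
            out.append("</ul>")
            in_list = False
        out.append(f"<{tag}>{html.escape(text)}</{tag}>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out), unmapped
```

The design choice worth noting is the `unmapped` list: the filter can only be as regular as the tagging, so it reports where writers strayed rather than guessing.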
You'll have to be involved in setting up the technology. At the very least, you must supply the specifications. You may also have to help select filters and even get involved in setting up training sessions. And there will be training sessions.
The cost of all this up-front preparation is substantial. But in many circumstances, it can pay off handsomely. You, of course, will be happier and more productive. But eventually, so will the content generators themselves. Once they get past their initial reluctance to "write to a damned plan", writers often find that they finish faster. Management is often much happier too, as efficiency climbs.
Not every content application is single-source-friendly. Word is pretty safe, if only because third-party filters exist for it. Excel is also a reasonably safe choice, for the same reason. PowerPoint is less safe. PageMaker and QuarkXPress are almost pathological in their unwillingness to play well with other formats. FrameMaker is fabulously friendly, albeit more expensive and harder to use.
Single source can work with active pages, too, with only a little more work to direct the various pieces to a database. Again, however, it requires strong structural typing on the part of the content generators.
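A minimal sketch of that extra step, using Python's built-in `sqlite3` module as a stand-in for whatever database drives your active pages. The `pieces` table, its columns, the document ID, and the (style name, text) input format are all hypothetical; the point is only that each tagged piece becomes a row a page can pull by style.

```python
import sqlite3


def load_pieces(conn, doc_id, paragraphs):
    """Store each (style, text) paragraph as a row, keeping document order."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pieces (
               doc_id TEXT, seq INTEGER, style TEXT, body TEXT,
               PRIMARY KEY (doc_id, seq))"""
    )
    conn.executemany(
        "INSERT INTO pieces (doc_id, seq, style, body) VALUES (?, ?, ?, ?)",
        [(doc_id, i, style, text) for i, (style, text) in enumerate(paragraphs)],
    )
    conn.commit()


def pieces_by_style(conn, doc_id, style):
    """Return the bodies of one document's pieces in a given style, in order."""
    rows = conn.execute(
        "SELECT body FROM pieces WHERE doc_id = ? AND style = ? ORDER BY seq",
        (doc_id, style),
    )
    return [body for (body,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_pieces(conn, "white-paper-1", [
        ("Heading 1", "Widget 3000 Overview"),
        ("Body Text", "The Widget 3000 is faster."),
        ("Heading 1", "Pricing"),
    ])
    print(pieces_by_style(conn, "white-paper-1", "Heading 1"))
```

Without consistent style names, a query like `pieces_by_style` has nothing reliable to select on, which is the same structural dependency the HTML filter has.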
If the culture can sustain it and you can go single source, there's almost always a quick uptick in productivity that persists over time. If you can't, PDF is generally a good fallback. It's easy to produce, reasonably flexible, and simple to add to your site.
Tim Altom is Head TechnoDude and VP at Simply Written Inc. in Indianapolis. His company has helped many firms move to single-source operations. He also teaches the Clustar System to geeks who have to write manuals, helping them finish with minimal time and trouble. The Simply Written site is at www.simplywritten.com.