WebDevelopersJournal.com: Tips on Web Page Design, HTML and Graphics

XML Content Syndication:
Part 4

Multiple Formats From One XML File

"Applied XML Solutions," a new book from Benoît Marchal, shows professional developers how to apply XML to a variety of real-world applications. These include using XML as a scripting substitute and using XSLT to facilitate communication between incompatible systems. Here we present the fourth and final part of the chapter devoted to content syndication: producing HTML, WML and RSS from XML.
December 13, 2000
Extract published courtesy of Sams Publishing.


The Style Sheets

This section presents the three style sheets.

HTML Style Sheet

The style sheet for HTML documents is in Listing 4.5. It builds a short table of contents before listing the various news items.

Listing 4.5 html.xsl
<?xml version="1.0"?>

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/TR/REC-html40"
  version="1.0">

<xsl:output method="html"/>

<xsl:template match="/">
<HTML><HEAD><TITLE>Pineapplesoft Daily</TITLE></HEAD>
<BODY><H1>Pineapplesoft Daily</H1>
  <H2><A NAME="toc">Today's News</A></H2>
  <xsl:for-each select="News/Item">
   <P><A HREF="#{generate-id(.)}">
     <xsl:value-of select="Title"/></A><BR/>
     <SMALL>by <xsl:value-of select="Author"/></SMALL><BR/>
     <xsl:value-of select="Abstract"/></P>
  </xsl:for-each>
  <H2>News Items</H2>
  <xsl:for-each select="News/Item">
    <H3><A NAME="{generate-id(.)}"><xsl:value-of
     select="Title"/></A></H3>
    <P><I>by <xsl:value-of select="Author"/></I></P>
    <xsl:for-each select="Para">
     <P><xsl:value-of select="."/></P>
    </xsl:for-each>
    <A HREF="#toc"><SMALL>More News</SMALL></A><BR/>
  </xsl:for-each>
</BODY></HTML>
</xsl:template>


</xsl:stylesheet>

Figure 4.7 presents the result in a browser.



Figure 4.7 The HTML document in a browser.


Crash Course in XSL - The basics of XSL are easy to learn. An XSL style sheet is a list of one or more templates. Each template describes how to transform one element (and its descendants) from the original XML document into the output format.

Let's look at a very simple template:

  <xsl:template match="/">
    <HTML>
      <HEAD><TITLE>Today's News</TITLE></HEAD>
      <BODY><H1> Today's News </H1>
        <xsl:for-each select="News/Item/Para">
          <P><xsl:value-of select="."/></P>
        </xsl:for-each>
      </BODY>
    </HTML>
  </xsl:template>

As you can see, most of the template is a regular HTML document peppered with XSL statements (xsl:for-each and xsl:value-of) to extract information from the original XML document.

The match attribute points to the element to which the template applies: / is the root of the document.

You will find more information in Appendix C of the book, "XSLT Reference," or in a tutorial such as XML by Example.
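
To try the simple template outside the servlet, it can be wrapped in a complete style sheet and run through the standard javax.xml.transform API. The following is a minimal sketch, not code from the book: the class name CrashCourseDemo, the transform() helper, and the sample News document are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CrashCourseDemo {
    // The simple template from the text, wrapped in a complete style sheet.
    static final String STYLE_SHEET =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='html'/>"
      + "<xsl:template match='/'>"
      + "<HTML><HEAD><TITLE>Today's News</TITLE></HEAD>"
      + "<BODY><H1>Today's News</H1>"
      + "<xsl:for-each select='News/Item/Para'>"
      + "<P><xsl:value-of select='.'/></P>"
      + "</xsl:for-each>"
      + "</BODY></HTML>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical sample input; the real News document is defined earlier in the chapter.
    static final String SAMPLE =
        "<News><Item><Title>First story</Title>"
      + "<Para>Hello</Para><Para>World</Para></Item></News>";

    // Applies a style sheet to an XML document and returns the result as a string.
    static String transform(String styleSheet, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(styleSheet)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(STYLE_SHEET, SAMPLE));
    }
}
```

Only the Para elements are selected, so the Title text does not appear in the output. The template decides what is copied from the source document.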


WML Style Sheet

Listing 4.6 is the style sheet to generate WML. Again, it starts with a table of contents. Unlike the HTML document, however, it places each news item on a different card.

Listing 4.6 wml.xsl

<?xml version="1.0"?>

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<xsl:output
  method="xml"
  doctype-public="-//WAPFORUM//DTD WML 1.1//EN"
  doctype-system="http://www.wapforum.org/DTD/wml_1.1.xml"/>

<xsl:template match="/">
<wml>
<card id="toc" title="Pineapplesoft Daily">
  <p align="center"><b>Today's News</b></p>
  <xsl:for-each select="News/Item">
   <p><anchor><xsl:value-of select="Title"/><go
     href="#{generate-id(.)}"/></anchor></p>
  </xsl:for-each>
</card>
<xsl:for-each select="News/Item">
  <card id="{generate-id(.)}" title="Pineapplesoft Daily">
   <p align="center"><b><xsl:value-of select="Title"/></b></p>
   <p><small>by <xsl:value-of select="Author"/></small></p>
   <p><xsl:value-of select="Abstract"/></p>
   <p><small><anchor>More News...<go
     href="#toc"/></anchor></small></p>
   <xsl:for-each select="Para">
     <p><xsl:value-of select="."/></p>
   </xsl:for-each>
   <p><anchor>More News...<go href="#toc"/></anchor></p>
  </card>
</xsl:for-each>
</wml>
</xsl:template>


</xsl:stylesheet>

Unlike Listing 4.5, this style sheet generates an XML document, because WML follows XML syntax. The style sheet also emits a DOCTYPE declaration, as required by the WML specification.
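
The effect of the xsl:output settings can be checked in isolation. The sketch below runs a cut-down version of the WML style sheet through javax.xml.transform and prints the result, which should begin with an XML declaration followed by the WML DOCTYPE. The class name DoctypeDemo, the reduced style sheet, and the sample document are assumptions for illustration only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class DoctypeDemo {
    // Reduced form of Listing 4.6: same xsl:output settings, table of contents only.
    static final String STYLE_SHEET =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='xml'"
      + " doctype-public='-//WAPFORUM//DTD WML 1.1//EN'"
      + " doctype-system='http://www.wapforum.org/DTD/wml_1.1.xml'/>"
      + "<xsl:template match='/'>"
      + "<wml><card id='toc' title='Pineapplesoft Daily'>"
      + "<xsl:for-each select='News/Item'>"
      + "<p><xsl:value-of select='Title'/></p>"
      + "</xsl:for-each>"
      + "</card></wml>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical sample input.
    static final String SAMPLE =
        "<News><Item><Title>First story</Title></Item></News>";

    // Applies a style sheet to an XML document and returns the result as a string.
    static String transform(String styleSheet, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(styleSheet)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(STYLE_SHEET, SAMPLE));
    }
}
```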

Figure 4.8 shows the result in a WAP browser.



Figure 4.8 The WML document in a WAP browser.

RSS Style Sheet

The last style sheet is in Listing 4.7. This style sheet generates an RSS document. Unlike the previous two style sheets, this one limits itself to a table of contents—you will recall that RSS is not designed to handle large documents. RSS is essentially a table of contents of the portal Web site.

Listing 4.7 rss.xsl

<?xml version="1.0"?>

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<xsl:output
  method="xml"
  doctype-public="-//Netscape Communications//DTD RSS 0.91//EN"
  doctype-system="http://my.netscape.com/publish/formats/rss-0.91.dtd"/>

<xsl:template match="/">
<rss version="0.91"><channel>
<title>Pineapplesoft Daily</title>
<description>Your source for technology news, trends and
  facts of interest to web developers.</description>
<link>http://www.pineapplesoft.com</link>
<language>en</language>
<xsl:for-each select="News/Item">
  <item>
   <title><xsl:value-of select="Title"/></title>
   <link><xsl:value-of select="/News/URL"/>#<xsl:value-of
    select="generate-id(.)"/></link>
   <description><xsl:value-of select="Abstract"/></description>
  </item>
</xsl:for-each>
</channel></rss>
</xsl:template>


</xsl:stylesheet>

Warning - The RSS style sheet is tricky—it can be used safely with Xalan, but it might not work with other XSL processors.

The problem is that the RSS document must include links to the news items. The easiest approach is to link from the RSS document to the servlet. When a visitor follows a link, the servlet returns the news in the appropriate format: HTML or WML.

But how do you link to a specific news item on the news page? You can simply append a fragment identifier to the URL: http://localhost:8080/publish/index#N8. This is where it gets tricky.

The style sheet uses generate-id() to create the references. This works because Xalan always generates an identifier based on the original XML document. If you run the style sheet twice on the same document, you get the same identifier.

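However, the XSLT specification is not so strict. It guarantees only that, within a single transformation, generate-id() returns the same identifier for the same node and different identifiers for different nodes. A processor is under no obligation to generate the same identifiers each time a document is transformed, so in theory you could run the style sheet twice on the same document and get different identifiers.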

Therefore, you might find that this trick does not work with other processors. In that case, simply include identifiers in the original XML document, for example <Item id="N8"/>, and have the style sheets use that attribute instead of generate-id().
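
The explicit-identifier fallback can be sketched as follows. This is an illustration under stated assumptions rather than the book's code: the class name StableIdDemo, the reduced RSS style sheet, and the sample document (including the id value N8 and the URL content) are hypothetical. The only change from Listing 4.7 that matters is that the link is built from the id attribute instead of generate-id().

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StableIdDemo {
    // Reduced RSS style sheet: the fragment comes from @id, not generate-id().
    static final String STYLE_SHEET =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='xml'/>"
      + "<xsl:template match='/'>"
      + "<rss version='0.91'><channel>"
      + "<title>Pineapplesoft Daily</title>"
      + "<xsl:for-each select='News/Item'>"
      + "<item><title><xsl:value-of select='Title'/></title>"
      + "<link><xsl:value-of select='/News/URL'/>#<xsl:value-of select='@id'/></link>"
      + "</item>"
      + "</xsl:for-each>"
      + "</channel></rss>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical source document with an explicit identifier on the item.
    static final String SAMPLE =
        "<News><URL>http://localhost:8080/publish/index</URL>"
      + "<Item id='N8'><Title>First story</Title></Item></News>";

    // Applies a style sheet to an XML document and returns the result as a string.
    static String transform(String styleSheet, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(styleSheet)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Two independent runs produce the same link because the id lives in the source.
        String first = transform(STYLE_SHEET, SAMPLE);
        String second = transform(STYLE_SHEET, SAMPLE);
        System.out.println(first);
        System.out.println("Identical across runs: " + first.equals(second));
    }
}
```

For the links to resolve, the HTML and WML style sheets would need the same substitution in their NAME and id attributes, so that all three formats agree on the identifier.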


Figure 4.9 illustrates the result with My Userland, an RSS portal.



Figure 4.9 Registering the RSS file on My Userland.

Building and Running the Project

The publishing project is available on a CD-ROM included with the book. Copy the project directory from the CD-ROM to your hard disk. Under Windows, start the server by double-clicking publish.bat. Next, open a browser and type the following URL (refer to Figure 4.7 above):

http://localhost:8080/publish

If possible, you should download at least one WAP browser and test the document again. You also might want to register the RSS channel with a portal.


Warning - This project uses Xalan 1.0 as the XSLT processor. If you are using another processor, you will need to adapt style().

The project also uses Jetty as the Web server. However, because it is based on servlets, it should be easy to adapt to another Web server. You can add servlet support to most Web servers through JRun.
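
One way to avoid tying style() to a single processor is the javax.xml.transform API (JAXP), which is part of the standard Java library and postdates Xalan 1.0. The sketch below is an assumption about what such an adaptation might look like, not the book's style(): the class name StyleSketch, the method signatures, and the small text style sheet are hypothetical. The processor used is whichever one TransformerFactory.newInstance() finds.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StyleSketch {
    // Hypothetical style sheet that lists item titles as plain text, one per line.
    static final String TITLES =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='News/Item'>"
      + "<xsl:value-of select='Title'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compiles a style sheet once. A Templates object can be shared across threads,
    // so a servlet could keep one per output format.
    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalStateException("Invalid style sheet", e);
        }
    }

    // Hypothetical stand-in for style(): applies a compiled style sheet to an
    // XML document and writes the result to the given writer.
    static void style(String xml, Templates sheet, Writer out) {
        try {
            sheet.newTransformer().transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        Templates sheet = compile(TITLES);
        StringWriter out = new StringWriter();
        style("<News><Item><Title>First story</Title></Item></News>", sheet, out);
        System.out.print(out);
    }
}
```

Compiling to a Templates object separates the costly parsing of the style sheet from each request. In a servlet, the writer passed to style() would typically be the response writer.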


If you develop your own documents, register the corresponding RSS channels with http://www.xmltree.com, my.netscape.com, and my.userland.com.

Additional Resources

If you find this project useful, be sure you read Chapter 8 as well. Chapter 8 presents a different twist on the same technique and many useful extensions to the servlet.

DocBook

As noted earlier, for more complex documents you can turn to the DocBook DTD, available from http://www.docbook.org. DocBook is a powerful DTD for document publishing and is available in both SGML and XML.

XHTML

WML is the most popular markup language for mobile users, but the W3C is working on its own solution. The W3C has developed XHTML, an XML version of HTML. The recommendation is available from http://www.w3.org/TR/xhtml1.

The major advantage of XHTML is that it is based on HTML, so it will be familiar to Web designers. The major drawback of XHTML is also that it is based on HTML, which makes it a large and complex markup language. For now, XHTML is too complex for mobile phones.

The W3C is working to simplify XHTML. Only time will tell whether XHTML will achieve widespread acceptance.

Open eBook

Another interesting format for mobile users is the Open eBook specification. Open eBook was designed for eBook readers, which serve a different group of mobile users. An eBook reader can take many forms, but it is generally a palm-sized device onto which readers download books.

You will find more information on Open eBook from the Open eBook Forum at http://www.openebook.org. A popular eBook reader is the Rocket eBook, available from http://www.rocketebook.com. Unfortunately, it does not support the Open eBook format yet.

ICE

I introduced RSS as the content syndication format in this chapter because RSS is very popular. RSS is not the only choice, however.

An alternative is ICE (the Information and Content Exchange protocol). ICE is a more ambitious project that aims to link content providers and publishers. You can find more information on ICE at http://www.w3.org/TR/NOTE-ice.



Copyright Sams Publishing. All rights reserved.
