Information Supply Chain

Sharing Is a Virtue

The benefits of peer-to-peer networking are a force too strong for legal problems to stop

Napster Inc. and MP3 have recently been in the news because of copyright issues. But that is just a diversion from the real story: a grassroots effort to build peer-to-peer networking (PPN) over the Internet. In spite of the court injunction against Napster, the importance of PPN to the future of the Internet, the Web, and the Extensible Markup Language (XML) is undiminished. Because the Internet is decentralized by nature, the issues that surround Napster have no significant impact on it. PPN is a protocol, and a protocol stands apart from any legal issue.

However, the final outcome of these legal battles will clearly affect how people choose to use or abuse this protocol. If people use PPN, or even the Internet, for copyright infringement, they are breaking the law. But the technology is independent of the crime. Legal issues aside, PPN is an exciting capability. And, in fact, users are already migrating to other PPN technologies, such as Gnutella or Scour.

These networking efforts are effectively open, yet nonmainstream, network communities within the Web that operate without centralized servers. These distributed network communities will have a significant impact on traditional content providers. But I'm getting ahead of myself. Let's talk about current events first, before digging into the more interesting issues.

What Is Napster?

For those of you who are busy leveraging rather than enjoying the Web, MP3 is an audio compression standard released a few years ago. Although this was not its intent, you can use MP3 to record music from existing music CDs and upload the resulting files to the Web. The words piracy, copyright infringement, and violation easily come to mind. It is reminiscent of the old Internet cry that "information wants to be free."

Okay, so you have some cool music. How do you find other folks with some different, cool music to swap? Search engines are slow and swamped. Enter Napster to save the day. Napster software (named for its creator, Shawn Fanning, whose nickname is Napster because of his nappy hair) indexes MP3 music files on a user's PC and displays this index to other Napster users over the Internet. A simple search locates the music of interest, which users can then download through the free Napster network. I don't think the Napster folks were originally thinking about piracy, any more than Tim Berners-Lee was worried about copyright infringement when he created the Web.
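The Napster model boils down to a central server that indexes which peers hold which files; the file transfer itself then happens peer to peer. A minimal sketch of that idea follows (the class, method names, and peer addresses are all illustrative, not Napster's actual protocol):

```python
# Minimal sketch of the Napster model: a central server indexes which
# peers hold which files; the actual transfer happens peer to peer.
# All names and addresses here are illustrative.

class CentralIndex:
    def __init__(self):
        self.index = {}  # file name -> set of peer addresses

    def register(self, peer, files):
        """A peer announces the files it is willing to share."""
        for name in files:
            self.index.setdefault(name, set()).add(peer)

    def search(self, name):
        """Return the peers that claim to host the named file."""
        return sorted(self.index.get(name, set()))

index = CentralIndex()
index.register("10.0.0.5:6699", ["song_a.mp3", "song_b.mp3"])
index.register("10.0.0.9:6699", ["song_b.mp3"])

print(index.search("song_b.mp3"))  # both peers host this file
```

The point of the sketch is the asymmetry: only the index is centralized (and therefore only the index can be enjoined or shut down); the content never touches the server.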

None other than Andy Grove, chairman of Intel, expressed an interest in Napster in the May issue of Fortune. He is not interested in the music or the specifics; he is interested in the architecture. Farsighted, that guy -- he knows a turning point when he sees one. Napster (as well as other network software listed in the sidebar, www.intelligententerprise.com) enables PPN. This means that I can connect directly to your computer and you to mine, without the intervention of an external server. (Reminds me of AppleTalk.)

If Napster were an isolated instance of PPN, then I am not sure that Grove would be so interested. But five or more different PPN offerings are available. One of the more flexible PPN systems is Gnutella, which works with more than just MP3. It also works with corporate data.

Gnutella Network

Gnutella (its name a play on GNU and Nutella, the European hazelnut spread) is a protocol for connecting computers on a peer-to-peer basis across the Internet, in contrast to Napster, which is closer to a centralized index of file servers. With Gnutella client software on a local computer, users can select what they want to share, index that information, and search for shared files and information across a distributed Gnutella network.

Gnutella provides more than simply a search engine and file server. It also provides the protocol for a distributed capability that is similar to the founding concept and protocol behind the Internet itself.

In fact, Gnutella is like a mini-Internet within the Internet. Where the Internet at large relies on big servers and Cisco switches, Gnutella rides on the resulting bandwidth to let local computers function as both nodes and servers of the Gnutella network.

Software and protocols such as Napster and Gnutella provide the pipe, the pathway, the journey, and the network rolled all in one. One user can connect successively through other users to gain access to the entire network of content providers. The destination is the millions of distributed PCs or other content providers. And the content can be music, multimedia, or corporate data. This is where PPN gets interesting.
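That "connect successively through other users" behavior can be sketched as query flooding: each node forwards a search to its neighbors, decrementing a time-to-live so the query fans out across the network and then dies off. The sketch below is a simplification under stated assumptions (the real Gnutella protocol also has ping/pong discovery, message GUIDs, and routed responses; all names here are invented):

```python
# Sketch of Gnutella-style query flooding: each node forwards a query
# to its neighbors, decrementing a time-to-live (TTL) so the search
# fans out and then dies off. A "seen" set prevents loops.

class Node:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbors = []

    def query(self, keyword, ttl, seen=None):
        """Flood a query; return the (node, file) hits found."""
        seen = seen if seen is not None else set()
        if self.name in seen or ttl < 0:
            return []
        seen.add(self.name)
        hits = [(self.name, f) for f in self.files if keyword in f]
        for n in self.neighbors:
            hits.extend(n.query(keyword, ttl - 1, seen))
        return hits

# A tiny four-node chain: a - b - c - d
a, b, c, d = (Node(name, files) for name, files in
              [("a", []), ("b", ["report.xml"]),
               ("c", []), ("d", ["report_q3.xml"])])
a.neighbors, b.neighbors = [b], [a, c]
c.neighbors, d.neighbors = [b, d], [c]

print(a.query("report", ttl=3))  # reaches both b and d within 3 hops
```

Note what the TTL buys: with ttl=1 the same query from node a finds only b's file, which is exactly how a flooding network trades search reach against bandwidth.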

The PPN protocols, architecture, and approach that let content flow directly from client to client, without the intervention of a middleman, middleware, or central server, present both a threat and an opportunity to traditional content providers. Indeed, this kind of PPN across the Web has two interrelated implications. First, it changes the nature of search engines and content providers, because PPN enables direct and fresh access to content without the time delays of centralized indexing. Second, PPN may be a way to implement some of the advanced, complex features of the XML Linking Language (XLink, sometimes known as XLL), bringing us closer to the Knowledge Navigator, a concept described by John Sculley of Apple Computer in the late 1980s.

This futuristic, intelligent assistant could search the global network to collect and combine information from across the world. A tool that lets one Web site automatically collect information from many other Web sites and then combine the results into a coherent presentation for the user is a sophisticated application, but it is both possible and not that difficult to design.

Content Is King

Currently, the Internet is a massive source of content for the user. Millions of servers throughout the world store this content. A company such as Intel may spend as much as $80M to support the servers and network infrastructure that supply Internet content to the world. As Grove states in the Fortune article, if the bandwidth requirements shift from a server model to a PPN model, then that $80M budget will have to be reconsidered.

Rather than building up bandwidth resources for centralized access to the content server, corporations will have to redirect these resources to satisfy PPN software requests from individual computers. This is not a bad thing, just another decentralized thing -- and that can be significantly leveraged into a good thing.

Today, standard search engines catalog server-based, centralized Web-based content and store the indexed information in a centralized database for retrieval. As they have demonstrated a few times, hackers can attack and shut down a centralized content server. However, a distributed content network, just like the Internet itself, is much more difficult to attack successfully. Sure, hackers can shut down one segment of a network, denying access to its information. But for the most part, traffic will be diverted to other segments.

Another limitation of conventional search engines is that they retrieve Web page information by using intelligent agents or automated Web crawlers to index Web sites. However, search engine technology has not been able to keep up with the increasing number of Web pages, estimated at more than one billion and growing. In fact, according to George Cybenko, a Dartmouth computer scientist, the Web is growing so fast that a search engine needs a T3 (45Mbps) network line just to keep up with its automated Web crawlers and spiders as they index new Web sites. The unverified extrapolation is that the number of Web pages doubles every 60 days. This estimate is not unreasonable when you consider that a Web site contains many Web pages and corporations can publish or update hundreds or thousands of Web pages per day.

Furthermore, much of the information on Web sites is static, out of date, or incorrect, pointing the user at broken links and nonexistent pages. Even Yahoo, which screens its information manually, has more information than it can keep up to date. Moreover, centralized search engines cannot retrieve dynamic content from pages that are built on the fly by e-commerce sites, database searches, or user interactions. Users have been screaming for better search engine technology since the start of the Web's popularity in 1994.

So the next step in the Web's information-sharing evolution is a PPN parallel search and file-sharing community, distributed throughout the regions of the Web and the Internet. PPNs decentralize information as well as the search capability. They also provide access to dynamic content, rather than the static information provided by traditional search engines.

Content providers can define which files and content are shareable. This approach is a departure from today's search paradigm, in which search engines merely point users to the correct Web site, forcing the user to navigate to the information of interest. With PPN capability, the journey is also the destination, because users can search for and go directly to the information of interest, as defined by the content providers.

The potential behind this is remarkable. Simply select content and share it. It's really that easy, and the power of sharing content is limitless. The file and information formats are not important. Any media can be shared. And pushed. Because the user has control over what is shareable, the user can also "push" information as responses. This capability provides a significant opportunity for portals, traditional search engines, and other commercial content providers.

Just as search engines such as Yahoo and AltaVista make money by selling keywords and advertising, any commercial content provider can create interesting, content-rich advertisements that are pushed in answer to search queries. However, these ads will be a double-edged sword, because they cannot be purely self-serving.

In a competitive information space like the Web, where time and attention are golden, users resent content-free ads that waste time because they provide no information or entertainment. PPN software works both ways. Users can boycott a blatant ad and filter out an entire Web site. Just as spam refers to unwanted junk email, the PPN user community will coin a term to describe content-free ads to be ostracized.

But the financial opportunities are too great, so commercial content providers will learn quickly what works and what does not. Until now, companies have not been able to manage this part of the Web-browsing experience; with PPN, they will be able to answer search queries more intelligently, taking charge and driving the flow of information traffic. PPN-based content will do for searching what the Web did for information, and it will do for the Web what advertising did for radio and TV. The die is cast, and the map is in hand; the only question now is how long the journey will take. Like everything else that is Web-based, probably not very long.

XML Linking Language

The second implication deals with XLink, a powerful language that extends document linking well beyond the abilities of mere HTML Web pages. XLink enables bidirectional linking, multiway linking, and out-of-line linking. Bidirectional linking is simply the idea of visiting a Web page and returning to the starting point by clicking on the same link again. Multiway linking is the implementation of Web rings by using XLink: rather than using the browser's navigation buttons, a user can traverse back and forth and all around within a predefined set of Web pages. Out-of-line linking is the concept of hyperlinking between two or more pages that were not originally linked, by using a separate file for the links that connect the Web pages. XLink includes these abilities, along with a few others defined in its specification.
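Out-of-line linking is the least familiar of the three, so a small sketch may help: the links live in a separate "linkbase" file that connects two documents which themselves contain no links at all. The snippet below parses such a linkbase with Python's standard xml.etree parser; the markup follows the XLink 1.0 attribute vocabulary, but the element names (links, link, loc, go) and the document URIs are invented for illustration:

```python
# Sketch: parsing an out-of-line XLink "linkbase" -- an extended link
# stored in a separate file that connects two documents which contain
# no links themselves. Element names and URIs are illustrative.
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

linkbase = """
<links xmlns:xlink="http://www.w3.org/1999/xlink">
  <link xlink:type="extended">
    <loc xlink:type="locator" xlink:href="reviews.xml" xlink:label="review"/>
    <loc xlink:type="locator" xlink:href="products.xml" xlink:label="product"/>
    <go xlink:type="arc" xlink:from="review" xlink:to="product"/>
  </link>
</links>
"""

root = ET.fromstring(linkbase)

# Map each locator label to the document it points at
label_to_href = {
    loc.get(f"{{{XLINK}}}label"): loc.get(f"{{{XLINK}}}href")
    for loc in root.iter("loc")
}

# Each arc defines a traversal between two labeled resources
arcs = [(arc.get(f"{{{XLINK}}}from"), arc.get(f"{{{XLINK}}}to"))
        for arc in root.iter("go")]

for src, dst in arcs:
    print(f"{label_to_href[src]} -> {label_to_href[dst]}")
```

Neither reviews.xml nor products.xml needs to know the link exists; the linkbase alone establishes the traversal, which is exactly what makes out-of-line links a natural fit for distributed, third-party content.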

These extended linking capabilities support the virtual XML document feature, because these links let a developer access specific content within a variety of distributed documents and then display the results to a user. Users are not aware that the "current" document exists only while they are looking at it. (Shades of quantum mechanics!)

Another type of individualized, virtual document already exists. When a user interacts with a database-generated or script-based Web page, the resulting Web page of dynamic content exists only in response to a user's unique parameters. Although traditional search engines cannot index dynamic content from other Web sites, these same search engines create dynamic content themselves in the form of retrieval results. Dynamic content was once the realm of the Perl or Java programmer. XLink provides this ability to the non-programmer.

The issue with XLink is that it is not yet clear how best to implement these specifications. One possibility may be a relative of Napster or Gnutella that defines a robust PPN protocol for delivering the XLink capabilities. A protocol that combines the distributed appeal of PPN with the power of the XML and XLink standards would be a formidable agent for change. In fact, PPN may be the spark that fans Web and Internet access into the roaring flame of universal information retrieval, meeting the business needs of the 21st century.

Contact Mr. Hank Simon

Hank Simon (hank.simon@lmco.com) has been working with artificial intelligence and knowledge discovery, in various forms, for the past 22 years. He is currently consulting and writing about XML, WAP, and Bluetooth Web technologies.


Source: Intelligent Enterprise Magazine (intelligententerprise.com)