A technical overview of the Xanadu Hypertext System
The Xanadu Hypertext System is a unified system for storing and managing certain forms of information, namely: bodies of documents. It is built around a small number of relatively simple concepts:
o DOCUMENTS
o multiple VERSIONS of documents
o LINKS between documents
o distributed storage and distributed computation -- NETWORKING
o isolation of storage functions from display functions -- BACKEND/FRONTEND
We will discuss each of these in turn and thereby build up a picture of what the Xanadu System is, what it does, and what makes it unique.
Documents
A "document" is a chunk of contiguous characters (or other data) which functions in some way as a conceptual unit. It is simply a piece of text which some user has declared to be a single thing. It corresponds roughly to the notion of a "file" as found in most computer systems. Beware of this analogy, however, because it breaks down quite thoroughly in the face of the other properties of the system.
A document may contain whatever the user decides to put in it. While documents will generally contain text, the system assumes nothing about their contents. Any arbitrary set of data which can be represented as a string of bytes may be stored in a document. Possible documents include letters, books, chapters of books, memos, marginal notes, musical scores, reports, pictures, computer programs, maps, diagrams, recordings, etc.: whatever the user desires the document to be.
Each document stored in the system is assigned a unique identifier when it is created. This identifier can then be used in later transactions with the system to designate that particular document. This document identifier is a special sort of number called a "tumbler". Tumblers are multipart numbers similar to those used to number sections of technical manuals and to index books in the Dewey Decimal System. For example, 1.0.3.15 and 123.5.6391.0.8.1 are both tumblers. They have the property that they can be expanded at any point, and it is therefore possible to implement a numbering system that never runs out of precision. This means that there is no limit to the number of document identifiers possible and therefore no theoretical limit to the number of possible documents due to a finite address space.
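As a minimal sketch of the idea, a tumbler can be modelled as a tuple of integers. The names below (parse_tumbler, format_tumbler, child) are hypothetical, and the lexicographic ordering is an assumption made for illustration; the text above does not define tumbler arithmetic.

```python
from typing import Tuple

Tumbler = Tuple[int, ...]


def parse_tumbler(text: str) -> Tumbler:
    """Parse a dotted string such as '1.0.3.15' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def format_tumbler(t: Tumbler) -> str:
    """Render a tumbler back into its dotted form."""
    return ".".join(str(part) for part in t)


def child(parent: Tumbler, n: int) -> Tumbler:
    """Expand an address by appending one more field (assumed form of expansion)."""
    return parent + (n,)


a = parse_tumbler("1.0.3.15")
b = parse_tumbler("1.0.3.16")
between = child(a, 1)  # 1.0.3.15.1, a new address between two existing ones
```

Because a new field can always be appended, there is always room for another address between two existing ones, which is one way to read the claim that the numbering never runs out.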
Since a document is a linear array of bytes, we can refer to any particular character in a document by its position in the document. Thus, by combining a document's unique identifier with a character position within that document, we can uniquely address any character stored in the entire system.
By designating a particular (starting) character and a length we can address any contiguous string of characters in the system. Such a string is called a "span". By clever use of tumblers instead of plain integers to represent character positions and span lengths, it is possible to have spans which contain whole documents and cross document boundaries.
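The addressing scheme can be illustrated with a small sketch. It assumes zero-based integer positions and an in-memory dictionary standing in for storage; Span, resolve and store are hypothetical names. Since it uses plain integers rather than tumblers for positions and lengths, it cannot express spans that cross document boundaries.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Tumbler = Tuple[int, ...]


@dataclass(frozen=True)
class Span:
    doc: Tumbler  # document identifier
    start: int    # zero-based position of the first byte (assumption)
    length: int   # number of bytes covered


def resolve(span: Span, store: Dict[Tumbler, bytes]) -> bytes:
    """Return the bytes that a span designates."""
    data = store[span.doc]
    return data[span.start:span.start + span.length]


store = {(1, 0, 3, 15): b"here is some text."}
span = Span(doc=(1, 0, 3, 15), start=8, length=4)
```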
Versions
The Xanadu Hypertext System can manage multiple differing versions of a single document. Typically, documents will change with time, as they are corrected or revised. Parallel versions may be made, as when writers weigh alternate drafts of a chapter or when a computer program and its documentation are maintained for several different installations.
Whenever a document is changed, the changes are recorded. The system remembers both the old and the new versions and can retrieve either on command. Thus it is possible to retrieve a document in the state it was in at any point in time since its creation. This can be particularly useful, for example, in software development or creative writing when one may make false starts and wish to back up to a previous state. In addition, the system can indicate the changes which caused two alternate versions to differ. Thus a complete "audit trail" through the history of a document can be constructed. Such a trail is in the form of an historical trace tree which branches whenever parallel versions are maintained.
In the general case, finding the smallest set of edits that separates two pieces of text is computationally expensive, and once moves and copies are allowed the problem becomes intractable (NP-hard). It may seem curious then that the Xanadu System is able to do this. The system can determine how documents differ because it does not compare them. Rather, it performs the actual changes itself and "remembers" them, instead of having a whole new document passed to it by some external editor, as most systems would do.
Xanadu understands a simple set of editing primitives by which it can be directed to make changes to a document:
o INSERT characters at some point in the document.
o DELETE characters from some point in the document.
o MOVE a span of characters from one location in the document to another.
o COPY a span of characters from one location in the document (or from anywhere in the rest of the system, for that matter) to some location in the document.
A moment's thought will reveal that these operations are sufficient to perform any more complex editing operations in their entirety.
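The four primitives can be sketched on a plain list of characters. This is an illustration only, with hypothetical function names; it says nothing about how the system stores documents. The replace function shows one compound operation built from the primitives.

```python
from typing import List


def insert(doc: List[str], pos: int, chars: str) -> None:
    """INSERT characters at position pos."""
    doc[pos:pos] = list(chars)


def delete(doc: List[str], pos: int, length: int) -> None:
    """DELETE length characters starting at pos."""
    del doc[pos:pos + length]


def move(doc: List[str], src: int, length: int, dst: int) -> None:
    """MOVE a span; dst is a position in the document after the span is removed."""
    chunk = doc[src:src + length]
    del doc[src:src + length]
    doc[dst:dst] = chunk


def copy(doc: List[str], pos: int, source: List[str], src: int, length: int) -> None:
    """COPY a span from source (which may be doc itself) to position pos."""
    doc[pos:pos] = source[src:src + length]


def replace(doc: List[str], pos: int, length: int, chars: str) -> None:
    """A compound operation expressed as DELETE followed by INSERT."""
    delete(doc, pos, length)
    insert(doc, pos, chars)


doc = list("here is some text.")
insert(doc, 13, "more ")   # "here is some more text."
delete(doc, 8, 2)          # "here is me more text."
```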
This sort of arrangement is not unique. However, most systems which do this, such as Bell Laboratories' SCCS (Source Code Control System) and DEC's CMS (Code Management System), merely keep a log of the edit operations and then perform those operations in chronological order to produce the desired version. While this is sufficient in many cases, it is unwieldy when there are many parallel branching versions and it is costly in terms of computational overhead, becoming more so as more and more changes are made to a document.
In the Xanadu System, these primitives correspond to operations on the proprietary data structures in which the documents are stored. These data structures tell the system which characters are in which versions and in what order. This allows the proper retrieval of any version quickly and efficiently. The historical traceback facility is built in implicitly. To the best of our knowledge, the Xanadu System is the only system in existence able to manage multiple versions of data in such a general and efficient manner.
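The actual data structures are proprietary and are not described here. As a stand-in, the sketch below uses a simple piece-table arrangement: characters live in an append-only pool, and each version is an ordered list of (start, length) pieces into that pool. VersionStore and its methods are hypothetical names, and this swapped-in technique should not be read as the system's own design.

```python
from typing import List, Tuple

Piece = Tuple[int, int]  # (start in pool, length)


class VersionStore:
    def __init__(self) -> None:
        self.pool = ""                            # append-only character pool
        self.versions: List[List[Piece]] = [[]]   # version 0 is the empty document

    def _split(self, pieces: List[Piece], pos: int) -> Tuple[List[Piece], List[Piece]]:
        """Split a piece list at document position pos."""
        left: List[Piece] = []
        right: List[Piece] = []
        for start, length in pieces:
            if pos >= length:
                left.append((start, length))
                pos -= length
            elif pos > 0:
                left.append((start, pos))
                right.append((start + pos, length - pos))
                pos = 0
            else:
                right.append((start, length))
        return left, right

    def insert(self, base: int, pos: int, chars: str) -> int:
        """Create a new version from base with chars inserted at pos."""
        if not chars:
            raise ValueError("nothing to insert")
        left, right = self._split(self.versions[base], pos)
        piece = (len(self.pool), len(chars))
        self.pool += chars
        self.versions.append(left + [piece] + right)
        return len(self.versions) - 1

    def delete(self, base: int, pos: int, length: int) -> int:
        """Create a new version from base with length characters removed at pos."""
        left, rest = self._split(self.versions[base], pos)
        _, right = self._split(rest, length)
        self.versions.append(left + right)
        return len(self.versions) - 1

    def text(self, version: int) -> str:
        """Assemble the text of any version directly, without replaying a log."""
        return "".join(self.pool[s:s + n] for s, n in self.versions[version])


store = VersionStore()
v1 = store.insert(0, 0, "here is some text.")
v2 = store.insert(v1, 13, "more ")
v3 = store.delete(v2, 8, 2)
```

Every earlier version stays retrievable because edits never overwrite the pool; they only produce a new piece list. Parallel versions fall out naturally, since any existing version can serve as the base of an edit.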
Links
A "link" is merely an explicit logical connection between one piece of data and another. The Xanadu Hypertext System supports a completely generalized linking facility. Any body of text in the entire system can be linked to any other body of text in the system. Links may connect text within a document or run between documents.
Any given link connects from one set of text in the system to another. The bodies of text at either end of the link are therefore called the link's "end-sets". The first end-set is called the "from-set" and the second is called the "to-set". By convention, links have an implied direction from the from-set to the to-set, although the relationship is entirely symmetrical and a link may be followed in either direction.
The potential contents of an end-set are totally unrestricted. An end-set may consist of an unlimited number of spans. The characters linked to or from need not be contiguous or even in the same document.
Links themselves are said to reside inside documents, located in a separate logical address space from the text. A link may reside in one document and link from a second document to yet a third. The location of residence of a link is entirely independent of the contents of the link's end-sets. The fact that links are among the potential contents of documents means that links themselves may be linked from or to, just as characters may. Thus very general sorts of indirect structures may be assembled.
It was mentioned above that the end-sets are symmetric and only distinguished by convention. In fact, a link may have any number of end-sets. The use of these extra end-sets may vary depending on the purpose of the link. We have found it convenient to define, again by convention, a third end-set called the "three-set".
The purpose of the three-set is to designate information pertaining to the link itself. By this means we can define "link types" which may in turn be used to determine how the link is to be interpreted. For example, we might create a "quotation link" type, which designates that the passage linked to is to be displayed in quotation marks at the location linked from. A "footnote link" on the other hand, might simply designate that a superscripted asterisk be shown at the "from" location and that the to-set contents be displayed only when the user so requests. In a more advanced system, the three-set might point to a program or subroutine to be downloaded into the display computer. This program would then decide how to display the link and its end-sets. In any case, the mechanism is totally general and constrained only by convention.
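A link with three end-sets, and a frontend that uses the three-set to choose a display behavior, might be sketched as follows. All names here (Span, Link, RENDERERS, the "types/..." document identifiers) are hypothetical; the text above fixes only the convention, not any representation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Span:
    doc: str      # document identifier (a plain string for readability)
    start: int
    length: int


@dataclass(frozen=True)
class Link:
    home: str                          # document in which the link resides
    from_set: Tuple[Span, ...]
    to_set: Tuple[Span, ...]
    three_set: Tuple[Span, ...] = ()   # by convention, designates the link type


Fetch = Callable[[Span], str]


def render_quotation(link: Link, fetch: Fetch) -> str:
    """Show the linked-to passage in quotation marks."""
    return '"' + " ".join(fetch(s) for s in link.to_set) + '"'


def render_footnote(link: Link, fetch: Fetch) -> str:
    """Show only a marker; the to-set is displayed on request."""
    return "*"


# Assumed convention: the document named by the first three-set span is the type.
RENDERERS: Dict[str, Callable[[Link, Fetch], str]] = {
    "types/quotation": render_quotation,
    "types/footnote": render_footnote,
}


def render(link: Link, fetch: Fetch) -> str:
    if not link.three_set:
        return ""
    renderer = RENDERERS.get(link.three_set[0].doc)
    return renderer(link, fetch) if renderer else ""


TEXTS = {"essay": "see the source", "source": "here is some text."}


def fetch(span: Span) -> str:
    return TEXTS[span.doc][span.start:span.start + span.length]


quote = Link(
    home="notes",                              # resides in a third document
    from_set=(Span("essay", 0, 3),),
    to_set=(Span("source", 8, 9),),
    three_set=(Span("types/quotation", 0, 0),),
)
note = Link("notes", quote.from_set, quote.to_set, (Span("types/footnote", 0, 0),))
```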
There are very few systems which attempt to implement a link facility at all like this. We are aware of only one system which even comes close, Engelbart's "Augment", available on Tymnet. Augment implements links by embedding code sequences directly into the text. This technique has a number of drawbacks: it rapidly becomes unworkable when more than a small number of links are defined, it is very difficult to maintain the bidirectionality of links using such a scheme, discontinuous end-sets are just about impossible, and embedded code sequences would interfere with the versioning mechanism. As with versions, proprietary data structures enable our system to work. Xanadu links are not embedded in the text. Although links are said to reside in documents, this is simply a means to allow links to be addressed and thus linked to. There are a number of tricky problems that a linkage system such as this presents. In particular, the question, "what links connect from other places to a particular location?", is very difficult to answer using most conventional schemes of organizing data. Consider, for example, the problems inherent in assembling such things as citation indices. Nevertheless, we believe we have the general-case solution to these problems.
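To make the "what links point here?" question concrete, the sketch below keeps a reverse index from documents to the to-set spans that land in them. BacklinkIndex is a hypothetical name, and the per-document linear scan is a naive stand-in chosen for brevity, not the general-case solution referred to above.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Span = Tuple[str, int, int]  # (document id, start, length)


class BacklinkIndex:
    def __init__(self) -> None:
        # document id -> list of (start, end, link id) for every to-set span
        self._by_doc: DefaultDict[str, List[Tuple[int, int, str]]] = defaultdict(list)

    def add_link(self, link_id: str, to_set: List[Span]) -> None:
        """Record every span of a link's to-set under its document."""
        for doc, start, length in to_set:
            self._by_doc[doc].append((start, start + length, link_id))

    def links_to(self, doc: str, start: int, length: int) -> List[str]:
        """Return ids of links whose to-set overlaps the given region."""
        end = start + length
        hits = {
            link_id
            for s, e, link_id in self._by_doc.get(doc, [])
            if s < end and start < e  # half-open intervals overlap
        }
        return sorted(hits)


index = BacklinkIndex()
index.add_link("citation-1", [("paper", 0, 10)])
index.add_link("citation-2", [("paper", 5, 10), ("book", 0, 4)])
```

A citation index is then a matter of calling links_to for each passage of interest. The difficulty the text points to is doing this efficiently at scale and across changing versions, which this sketch does not attempt.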
One might ask what happens to the end-sets of links when the documents containing the characters in the end-sets are changed. There is a very close relationship between the linking facility and the versioning facility. When new versions of a document are created, links are preserved. It is possible to link to a particular version of a document, to all versions, or to the most current version. Links will always point to the same text even though the document containing the text may change. For example:
    here is some text.         (some text in a document)
    here is some more text.    (some characters are inserted)
    here is me more text.      (some characters are deleted)
[in the original figure, the underlined characters represent an end-set of some link]
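The example above can be reproduced with a sketch in which every character receives a permanent identity when it is first stored, and an end-set refers to identities rather than positions. Which characters were underlined is not recoverable from the example, so linking the word "some" is an assumption, as are all the names used here.

```python
from itertools import count
from typing import List, Set, Tuple

_ids = count()                  # source of permanent character identities
Doc = List[Tuple[int, str]]     # a version is an ordered list of (id, character)


def make(chars: str) -> Doc:
    """Store new characters, giving each a fresh permanent id."""
    return [(next(_ids), c) for c in chars]


def insert(doc: Doc, pos: int, chars: str) -> Doc:
    """Return a new version; the old version is left untouched."""
    return doc[:pos] + make(chars) + doc[pos:]


def delete(doc: Doc, pos: int, length: int) -> Doc:
    return doc[:pos] + doc[pos + length:]


def text(doc: Doc) -> str:
    return "".join(c for _, c in doc)


def linked_text(doc: Doc, end_set: Set[int]) -> str:
    """The part of an end-set that is present in this version."""
    return "".join(c for i, c in doc if i in end_set)


v1 = make("here is some text.")
end_set = {i for i, _ in v1[8:12]}   # assumed: the link covers "some"
v2 = insert(v1, 13, "more ")         # some characters are inserted
v3 = delete(v2, 8, 2)                # some characters are deleted
```

In this sketch the link follows the surviving characters: it still covers "some" after the insertion and covers "me" after the deletion, while the earlier versions and their full end-set remain available.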
The ability to maintain links across multiple versions of documents is one of the unique and most powerful aspects of the Xanadu System. The problems inherent in doing this are almost unimaginably subtle, but we believe we have found the solution.
Multi-user
The facilities of the Xanadu Hypertext System would be of limited utility if each user functioned in isolation. Xanadu is therefore designed to be a multi-user system, allowing independent users to access and manipulate the same body of information.
Each user in the system has a unique user identifier tumbler. Each document in the system has an owner, and the owner's user identifier tumbler comprises one of the fields of the document identifier tumbler. By this means we can immediately determine the ownership of any document.
The owner of a document may grant other users permission to access it. As with most aspects of the Xanadu System, the access privilege mechanism is as general as possible. Permission to access a particular document may be granted to an arbitrary number of individuals, to groups of users, or to the entire user community. Type of access permitted (Read, Write, etc.) is not a relevant concept in the Xanadu System. You may always edit another user's document to which you have access, but in doing so you create a new version which is your own document. The other user is unaware of your version unless you inform him or her.
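The access rule can be sketched as follows: there is a single kind of permission, and every edit yields a new version owned by the editor. Library, Document and their fields are hypothetical names, integer document ids stand in for tumblers, and group permissions are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Document:
    owner: str
    text: str
    readers: Set[str] = field(default_factory=set)  # individuals granted access
    public: bool = False                            # open to the whole community
    parent: Optional[int] = None                    # version this one derives from


class Library:
    def __init__(self) -> None:
        self.docs: Dict[int, Document] = {}
        self._next_id = 1

    def create(self, owner: str, text: str, parent: Optional[int] = None) -> int:
        doc_id = self._next_id
        self._next_id += 1
        self.docs[doc_id] = Document(owner=owner, text=text, parent=parent)
        return doc_id

    def can_access(self, user: str, doc_id: int) -> bool:
        doc = self.docs[doc_id]
        return user == doc.owner or doc.public or user in doc.readers

    def edit(self, user: str, doc_id: int, new_text: str) -> int:
        """Editing never alters the original; it creates a version owned by user."""
        if not self.can_access(user, doc_id):
            raise PermissionError(f"{user} may not access document {doc_id}")
        return self.create(owner=user, text=new_text, parent=doc_id)


library = Library()
original = library.create("alice", "here is some text.")
library.docs[original].readers.add("bob")
bobs_version = library.edit("bob", original, "here is some more text.")
```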
The multi-user capability of Xanadu makes a number of useful activities possible. For example, you can write marginal notes commenting on someone else's work. You write your comments in a document of your own and then link them to the other person's document. The other user need never be aware of this. It is like scribbling in a library book without defacing the book to the eyes of others. As another example, it is possible, using documents, links, and the permission facility, to create a quite sophisticated "electronic mail" system. By passing a link to another user you can point him at something you want him to read.
Networking
As the number of users on an installation of the Xanadu System grows larger, the processing and storage capability of even the largest and fastest computer will eventually be used up. The only viable solution to the fundamental resource limitations of centralized facilities is distributed processing and storage. The Xanadu System is designed to be able to grow arbitrarily large without having a significant adverse impact on overall system performance. It does this by allowing the system to spread out over as many machines as are needed to support all users effectively.
The portion of the system which stores and retrieves documents and links is designed to be able to run simultaneously on an arbitrary number of machines which are tied together in a network by some sort of communication medium. If a piece of information is requested at one node of the network but is not stored at that node, the Xanadu software running on that node's computer sends a request via the network to the machine in which the information is stored. This machine in turn sends back the requested data.
The system does not care what communications technology is used, as long as it gets the bits from one place to the other. No particular network topology is assumed, as long as paths exist between any nodes which might wish to communicate. Nodes may be added to or deleted from the network continuously with the minimum possible disruption. If a node is deleted, data that was stored solely on that node becomes unavailable, of course, and any communication paths that went through that node may need to be rerouted, possibly delaying some transmissions.
Communications routing from one node to another is automatic and implicit in the network design. Nodes are not required to have full knowledge of the entire network, so the network can grow without every node being informed of every change to the network topology.
Various optimizations are built in. If a node finds that it is being given frequent requests for data stored on a distant node, it can make a copy of that data and store it locally. The system does not mind that there may be multiple copies of some information. If a request for such information is made, the system tries to get it from the closest or most quickly accessible source. If changes are made to one copy, the system keeps track. All of this is accomplished internally by the same data structures that manage all the other functions of the system.
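Request forwarding and the local-copy optimization might look like the sketch below. It makes several simplifying assumptions that the text does not: each node holds a directory telling it which node stores a document, a copy is made after a fixed number of remote requests (cache_after), and changes to copies are not tracked. Node and its attributes are hypothetical names.

```python
from collections import Counter
from typing import Dict


class Node:
    def __init__(self, name: str, cache_after: int = 2) -> None:
        self.name = name
        self.local: Dict[str, bytes] = {}        # data stored at this node
        self.copies: Dict[str, bytes] = {}       # local copies of remote data
        self.directory: Dict[str, "Node"] = {}   # assumed: doc id -> storing node
        self.remote_requests: Counter = Counter()
        self.cache_after = cache_after           # assumed threshold for copying
        self.forwarded = 0                       # how many requests left this node

    def fetch(self, doc_id: str) -> bytes:
        if doc_id in self.local:
            return self.local[doc_id]
        if doc_id in self.copies:
            return self.copies[doc_id]
        # Not stored here: ask the node that has it and relay the answer.
        data = self.directory[doc_id].fetch(doc_id)
        self.forwarded += 1
        self.remote_requests[doc_id] += 1
        if self.remote_requests[doc_id] >= self.cache_after:
            self.copies[doc_id] = data           # keep a local copy from now on
        return data


near = Node("near")
far = Node("far")
far.local["report"] = b"quarterly figures"
near.directory["report"] = far

results = [near.fetch("report") for _ in range(3)]
```

With the default threshold, the first two requests travel to the distant node and the third is served from the local copy.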
Backend/Frontend
All of the capabilities described above are part of what is called the "backend". The backend handles storage, retrieval, versioning, linking and editing of data. However, it "knows" nothing about the data, per se. The backend is the portion which does all the "impossible" things. It contains the proprietary components which we have developed.
User interaction - the display of old data and the acceptance of new data - is accomplished by the other major portion of the system, the "frontend". The frontend worries about whether the data is text or pictures or music or whatever. The frontend decides how the data is to be formatted for display. The frontend knows how to interface with various input or output devices.
The frontend and the backend communicate using a standardized request and data transmission protocol. The frontend sends commands and data to the backend which responds with other data and status information. The specifications for this protocol are free for the asking to anyone. The frontend will, in general, be on a separate computer which is connected to the backend by a communications line (not to be confused with the communications lines which connect the Xanadu network - the whole network is the backend). Any program which understands the protocol, running on any computer connected to the backend, can be a frontend. The frontend need not be designed by us, written by us, or purchased from us. While we will be producing and selling frontends, we expect others to do so as well.
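The division of labor can be illustrated with an invented request format. The protocol itself is not reproduced in this overview, so the JSON messages, command names and status values below are purely hypothetical; the point is only that any program speaking the agreed format can act as a frontend.

```python
import json
from typing import Dict


class Backend:
    """Stores and retrieves data; knows nothing about what the data means."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def handle(self, raw: str) -> str:
        request = json.loads(raw)
        command = request.get("command")
        if command == "store":
            self._docs[request["doc"]] = request["data"]
            return json.dumps({"status": "ok"})
        if command == "retrieve":
            if request["doc"] in self._docs:
                return json.dumps({"status": "ok", "data": self._docs[request["doc"]]})
            return json.dumps({"status": "not-found"})
        return json.dumps({"status": "unknown-command"})


def frontend_display(backend: Backend, doc: str) -> str:
    """A toy frontend: it decides how retrieved data is formatted for display."""
    reply = json.loads(backend.handle(json.dumps({"command": "retrieve", "doc": doc})))
    if reply["status"] != "ok":
        return "<missing>"
    return reply["data"].upper()


backend = Backend()
backend.handle(json.dumps({"command": "store", "doc": "memo", "data": "meet at noon"}))
```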
Frontends may be general purpose or special purpose. Different applications, different forms of data, different display and input devices and different user preferences can all be accommodated. For example, one might create an office automation frontend which handles text processing, electronic mail and project management functions through Xanadu, and does accounting and database work locally. A software development frontend might use Xanadu to manage source and object code and documentation and have local capabilities for compiling, executing and debugging programs. As a third example, a CAD/CAM frontend might use Xanadu to store drawings and documentation and have simulation and modeling functions built in. All of these different activities can be accommodated using the same backend design - the backend is application independent.
Summary
The Xanadu Hypertext System is the most powerful system in existence for managing text and other similar information. It can store arbitrary numbers of documents and retrieve them on demand. It can organize them by using a highly generalized link facility and by allowing (and keeping track of) multiple differing versions of documents. It can perform these services for multiple users on the same system, and allow those users to work with each other on a common document base. It can grow indefinitely over a large distributed network with minimal degradation in performance. It can accommodate almost any conceivable display or input device or mode of user interaction by separating these functions into an independent frontend. The Xanadu Hypertext System is the only system which can do all of these things.