OUTLINE OF A NEW MODEL OF INFORMATION CONTENT AND STRUCTURE

October 7, 1998 Draft

by David Wojick (dwojick@shentel.net)

Introduction

This is the information age, but what is information? Several years ago, while working on an information system design problem, I realized that there is no scientific definition of "information content", as that term is ordinarily used. Traditional information theory does not provide one, because it does not take into account that information has content; it accepts random strings of letters as information.

So I set out to do what in mathematical logic is called a "rational reconstruction", that is, to develop a formal definition of an ordinary language concept. In fact, I have found not one but three definitions of information content, nested one within another. These three definitions turn out to be quite powerful. They provide the basis for a new model of information content and structure, one that is applicable to any information design activity. Indeed, the model supports several new design methods, including actually measuring and optimizing content in various interesting ways.

In my vision people cruise through the structure of their information as one would fly a starship. They visit strange and beautiful worlds, worlds that are in a sense already real. We just can't see them yet.

Some aspects of this model are merely conjectural at present. A research program will be required to test and refine its many facets. However, because the model grows out of 25 years of study of the structure of complex situations, many of its applications have already been proven in areas like engineering, management, and public policy. In what follows the examples will be from my experience in information system design.

(Historical note: The information system referred to above was a hypermedia system to support training for the Standard Army Automated Contracting Software. My goal was to develop a taxonomy of hyperlink types, so I asked myself "what are all the ways that two or more pieces of information can be related?" It was at this point that I realized I did not know what information content was. The model proposed here is the beginning of a precise answer to my question.)

What is information content?

The three proposed definitions, or types, of information content are:

Type 1: The propositional content of expressed thought.
Type 2: The propositional content of symbolic expression or display.
Type 3: The propositional content of meaningful expression or display.

In all three types the information content is propositional in nature. Propositions are the fundamental objects in mathematical logic, first postulated by Boole in 1847. At its simplest, a proposition is the meaning of a statement, independent of the language used to express that meaning. You can say that snow is white in many languages, so there is something -- the proposition that snow is white -- that is independent of the words you use.

While unfamiliar to most people, this concept is explained in detail in most logic texts. In fact, many practitioners of mathematical logic consider that discipline to be the science of propositions. So if these definitions are accepted as capturing the ordinary language concept of information content, then we have a solid grounding in a well-developed discipline.

The basic difference between the three types of information content is this. Type 1 information requires an expression of thought, so it can only be produced by thinking beings using language. This is the core definition, as in "he informed us that he was leaving". Type 2 information includes type 1. It can be produced by devices as well as by people, but still requires something like language. This captures the fact that we get information from speedometers and spreadsheets. Type 3 information allows for non-symbolic sources of content, such as a video.

Relation of my model of information content to information theory

Before proceeding, it is important to explain how my model differs from what has traditionally been called "information theory". Simply put, information theory is about the transmission of information, not its content. This is stated quite nicely by James Gleick in his book CHAOS: Making a New Science (Viking, New York, 1987, p. 255). Gleick is referring to fundamental research done at the University of California at Santa Cruz:

"The most characteristically Santa Cruzian imprint on chaos research involved a piece of mathematics cum philosophy known as information theory, invented in the late 1940s by a researcher at the Bell Telephone Laboratories, Claude Shannon. Shannon called his work 'The Mathematical Theory of Communication', but it concerned a rather special quantity called information, and the name information theory stuck. The theory was a product of the electronic age. Communication lines and radio transmissions were carrying a certain thing, and computers would soon be storing this same thing on punch cards or magnetic cylinders, and the thing was neither knowledge or meaning. Its basic units were not ideas or concepts or even, necessarily, words or numbers."

"This thing could be sense or nonsense -- but the engineers and mathematicians could measure it, transmit it, and test the transmission for accuracy. Information proved as good a word as any, but people had to remember that they were using a specialized value-free term without the usual connotations of facts, learning, wisdom, understanding, enlightenment."

Unlike the information theory described above, my model of information content has precisely to do with information as a meaningful thing. I would argue that something that is not meaningful has no information content. In fact, propositions are units of meaning. That is why all three of the proposed information types are defined so as to be propositional in nature. This is how we use the words "information" and "information content".

(Caveat: The model has nothing per se to do with issues of the utility of information. It does not distinguish good information from bad, true from false, important from trivial. Using the model should make it easier to make these distinctions, but they are not specifically addressed. The focus is on the nature and structure of content, not its quality.)

The power of these definitions derives from the fact that expressing a proposition is in a way a very simple act. Indeed, I argue below that any such act can be considered as consisting of just three basic elements. Understanding these elements leads us to a new understanding of information content, as well as to the discovery of a rich world of information structures that underlies all bodies of information.

The basic elements of information content.

From the point of view of mathematical logic, any body of expressed propositions is made up of the following three basic elements:

1. The context of the expression.

For type 1 this is typically who said or wrote what -- the actual language used, when, where, why, etc. For type 2 devices this will include actual displays, printouts, etc. For type 3 non-symbolic expressions it is the facts about the video, etc.

Note that according to our three definitions information is always a tangible thing created by a specific act of expression at a specific time and place. It may be sounds, marks on paper, a dial reading, a video, even an action, but it is always tangible. Thus on this view information is never an intangible something in someone's head. The latter may be knowledge or belief, but it is not information. We are therefore talking about something that always has a physical aspect.

Note too that information content has to be expressed. Thus simply seeing that a tree is in the road, an act of perception, does not involve information.

2. The propositions expressed.

Already discussed.

3. The things referred to by these propositions.

These things are called in logic the "referents" of the propositions. Reference is discussed in most logic textbooks. Note that referents can be activities, properties, or anything that can be talked about. To say that the snow is white is to refer to snow the stuff and whiteness the attribute. To say that snow is melting is to refer to snow the stuff and the activity of melting. Referents do not have to be real, nor do the propositions have to be true. Novels and lies have content.

Thus any instance of information content, call it a piece of information, involves (1) propositions, (2) expressed in a given context, and (3) referring to certain things. This sounds abstract, but in any given case these elements are pretty obvious. (Or some of them are obvious, others are not, but I will not go into that issue here.) What is important is that different pieces of information can be fit together according to how their elements are related. This leads to a rich new science that I call information structures.
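
To make these three elements concrete, the following is a minimal sketch, in Python, of how a single piece of information might be represented as a data structure. The three fields follow the elements just listed; the particular classes, attribute names, and the example record are illustrative assumptions, not part of the model itself:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Context:
        """The circumstances of the act of expression (element 1)."""
        source: str          # who or what device produced the expression
        medium: str          # speech, print, dial reading, video, etc.
        when: str            # date or time of the expression
        where: str = ""      # place of expression, if known

    @dataclass
    class Proposition:
        """A unit of meaning (element 2) together with its referents (element 3)."""
        statement: str                                        # one wording of the proposition
        referents: List[str] = field(default_factory=list)    # the things referred to

    @dataclass
    class PieceOfInformation:
        """A single instance of information content: propositions expressed
        in a given context and referring to certain things."""
        context: Context
        propositions: List[Proposition]

    # Example: "snow is white", spoken by someone at a particular time.
    piece = PieceOfInformation(
        context=Context(source="a speaker", medium="speech", when="1998-10-07"),
        propositions=[Proposition("Snow is white", ["snow", "whiteness"])],
    )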

Information structures.

Given the three basic elements of information content, we note the following. Any two pieces of information content can be related to one another by features of one, or a combination, of these basic elements. The question then becomes: what are the most important relationships that underlie a given body of information? How do the pieces fit together? As it turns out, some of the most important relations are well known, while others are less so.

Moreover, there may be a number of very different relationships that are important in a given body of information. If so then when we try to understand that information we are in fact trying to do several things at once, that is, to master several relationships at once. I believe this is one of the chief obstacles to efficient learning. We try to grasp several different relationships without differentiating them.

I call the array of information units that are related by a given relationship an "information structure". I also conjecture that any body of important information includes a number of distinct yet significant structures. Certainly this is true for every body of information I have analyzed so far. Using the model, one can systematically identify the most important information structures that underlie a given body of information, and even quantify, measure, and display these structures in various useful ways.
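
To illustrate what it means for a relationship to define a structure, the sketch below treats a body of information as a list of pieces and a relation as any yes-or-no test on pairs of pieces; the links the test produces are one information structure over that body. The representation of the pieces and the sample relation (sharing a referent) are assumptions chosen for brevity, and any of the relations discussed below could be substituted:

    from typing import Callable, List, Tuple, TypeVar

    Piece = TypeVar("Piece")   # any representation of a piece of information

    def build_structure(pieces: List[Piece],
                        related: Callable[[Piece, Piece], bool]) -> List[Tuple[int, int]]:
        """Return the pairs of pieces (by position) that stand in the given relation.
        The resulting set of links is one information structure over the body."""
        links = []
        for i in range(len(pieces)):
            for j in range(i + 1, len(pieces)):
                if related(pieces[i], pieces[j]):
                    links.append((i, j))
        return links

    # Example: pieces reduced to the sets of things they refer to, related whenever
    # they share a referent (one of the referent-based relations discussed below).
    pieces = [{"snow", "whiteness"}, {"snow", "melting"}, {"contracts", "training"}]
    print(build_structure(pieces, lambda a, b: bool(a & b)))   # [(0, 1)]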

Let us consider briefly some of the more common and typically important systems of relations, i.e. structures, that obtain among the three basic elements in a body of information.

a. Context-based relations.

Alphabetical listing is a common way of relating pieces of information content based on their physical form (i.e., the language used). Chronologies of the events that produce information, such as speeches or scientific articles, are another example. So are logs of gauge readings, for that matter.
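
As a small illustration, both of these context-based structures amount to orderings over features of the context of expression. The records below are invented for the example; the point is only that the same collection yields a listing order and a log order depending on which context feature is used:

    # Hypothetical records: each piece of information reduced to two context
    # features, the words of the expression and the date it was produced.
    records = [
        {"text": "Gauge read 80 psi at shutdown", "date": "1998-03-12"},
        {"text": "Award the contract to the low bidder", "date": "1997-11-02"},
        {"text": "Snow is forecast for Tuesday", "date": "1998-01-05"},
    ]

    alphabetical = sorted(records, key=lambda r: r["text"].lower())   # listing order
    chronological = sorted(records, key=lambda r: r["date"])          # log order

    for r in chronological:
        print(r["date"], "-", r["text"])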

b. Proposition-based relations.

Relations of mathematics and logic are typically proposition-based. The former include spreadsheet information, business or engineering calculations, computer software functional designs, etc. (Of course the referents of these manipulations are also going to be related in important ways, as discussed below.) Logic relations include implication and contradiction, both of which are important in computer science, problem solving, the law, etc. Some of these relations are well known.

Less well known is the system of propositional relations displayed by the issue tree diagram I developed at Carnegie Mellon in the 1970s. I now conjecture that this diagram displays the fundamental relationship between the propositions expressed in most bodies of information. If our definitions are correct, the issue tree is the fundamental structure of information content.
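
As a rough illustration only, the sketch below stores a tree of propositions with parent-to-child links and prints it as an indented outline. It assumes nothing about the actual issue-tree relation beyond its tree shape, and the node contents are placeholders:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class IssueNode:
        """A node in a tree of propositions. The particular relation that links a
        parent to its children is the issue-tree relation, left abstract here."""
        proposition: str
        children: List["IssueNode"] = field(default_factory=list)

    def print_tree(node: IssueNode, depth: int = 0) -> None:
        """Display the tree with indentation, one proposition per line."""
        print("  " * depth + node.proposition)
        for child in node.children:
            print_tree(child, depth + 1)

    # Placeholder content only; a real tree would come from analyzing a body of
    # information, not from hand-typed nodes.
    root = IssueNode("A top-level proposition", [
        IssueNode("A subordinate proposition"),
        IssueNode("Another subordinate proposition", [
            IssueNode("A further refinement"),
        ]),
    ])
    print_tree(root)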

c. Referent-based relations.

Identical reference (i.e., being about the same thing) is a common relation among some of the pieces of content in a body or system of information. However, it is seldom the case that all of the pieces of information content in a given body of information are related by identical reference.

Rather, the propositions typically refer to various members of one or more systems of related referents. This is because things can be related to one another in so many different and important ways. Great care is necessary to distinguish the different kinds of relations between the referents in a body of information, because they define different information structures. Moreover, in many cases we are ignorant of the important ways that things are related. That's what science is about after all.
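
One simple way to surface identical reference, sketched below with invented data, is to invert a body of information into an index from each referent to the pieces that mention it; pieces listed under the same referent stand in the identical-reference relation:

    from collections import defaultdict

    # Hypothetical body of information: each piece reduced to the set of things
    # it refers to.
    pieces = {
        "memo-1":   {"snow", "road conditions"},
        "report-2": {"snow", "melting"},
        "log-3":    {"contract award", "low bidder"},
    }

    # Invert the body into an index from referent to the pieces that mention it.
    by_referent = defaultdict(list)
    for piece_id, referents in pieces.items():
        for referent in referents:
            by_referent[referent].append(piece_id)

    # Pieces listed under the same referent stand in the identical-reference relation.
    for referent, ids in by_referent.items():
        if len(ids) > 1:
            print(referent, "->", ids)   # e.g. snow -> ['memo-1', 'report-2']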

d. Hybrid relations.

There are some important relations among pieces of information that are hybrids of the above. For example, we sometimes express propositions that refer to other, previously expressed, propositions or groups of propositions; in logic such expressions are called meta-level expressions. Likewise, poetry often seems to depend on relations that combine the physical properties of the language with propositional properties.

The case of information system design.

Our model of information content and structure has major implications for the design of information systems. For example, because of a failure to distinguish between the various important structures that relate information content, many information systems reflect a jumble of partial structures.

At the other extreme are systems designed around a single structure; a telephone book is an example. This design tends to minimize the availability of the other, often equally important, structures to the user -- the kinship or geographical relations of the listed parties, for example. These structures are often available only through laborious analysis.

"Information system" as I use it here is a very broad term, encompassing such diverse creations as databases, financial systems, management and executive information systems, office automation systems, interactive courseware, command and control systems, etc. -- even books and magazines. While many design issues differ among these sorts of systems, two fundamental issues are always present:

Design issue #1. What information is to be included in the system?

Design issue #2. How should this information be organized?

To vastly oversimplify the matter, using our model these two issues come down to the questions of which structures to incorporate (issue #2) and how much of each (issue #1). But what would traditionally be the second design issue is in fact now the first. Thus we seem to be designing our information systems in a backward fashion. Moreover, once the significant structures are selected, the question of what information to include becomes largely one of level of detail and unit cost.

Then there is the issue of display of information. I envision the visual navigation of important information structures in virtual reality. Indeed I have already started experimenting with such navigation, using simple 3D navigation software. In my vision people cruise through the structure of their information as one would fly a starship. They visit strange and beautiful worlds, worlds that are in a sense already real.

We just can't see them yet.

David Wojick