Subject (documents)
In library and information science, documents (such as books, articles and pictures) are classified and searched by subject, as well as by other attributes such as author, genre and document type. This makes "subject" a fundamental term in this field. Library and information specialists assign subject labels to documents to make them findable. There are many ways to do this, and there is often no consensus about which subject should be assigned to a given document. To optimize subject indexing and searching, we need a deeper understanding of what a subject is. The question "what is to be understood by the statement 'document A belongs to subject category X'?" has been debated in the field for more than 100 years (see below).

Definition

Hjørland (1992, p. 185) defined subjects as the epistemological potentials of documents.
This definition is in line with the request-oriented understanding of indexing quoted below. The idea is that a document is assigned a subject to ease retrieval and findability. The criteria for what should be found, that is, for what constitutes knowledge, are in the end an epistemological question.

Charles Ammi Cutter (1837–1903)

For Cutter the stability of subjects depends on a social process in which their meaning is stabilized in a name or a designation. A subject "referred . . . to those intellections . . . that had received a name that itself represented a distinct consensus in usage" (Miksa, 1983a, p. 60) and: the "systematic structure of established subjects" is "resident in the public realm" (Miksa, 1983a, p. 69); "[s]ubjects are by their very nature locations in a classificatory structure of publicly accumulated knowledge" (Miksa, 1983a, p. 61). Bernd Frohmann adds:

"The stability of the public realm in turn relies upon natural and objective mental structures which, with proper education, govern a natural progression from particular to general concepts.
Since for Cutter, mind, society, and SKO [Systems of Knowledge Organization] stand one behind the other, each supporting each, all manifesting the same structure, his discursive construction of subjects invites connections with discourses of mind, education, and society. The Dewey Decimal Classification (DDC), by contrast, severs those connections. Melvil Dewey emphasized more than once that his system maps no structure beyond its own; there is neither a "transcendental deduction" of its categories nor any reference to Cutter's objective structure of social consensus. It is content-free: Dewey disdained any philosophical excogitation of the meaning of his class symbols, leaving the job of finding verbal equivalents to others. His innovation and the essence of the system lay in the notation. The DDC is a poorly semiotic system of expanding nests of ten digits, lacking any referent beyond itself. In it, a subject is wholly constituted in terms of its position in the system. The essential characteristic of a subject is a class symbol which refers only to other symbols. Its verbal equivalent is accidental, a merely pragmatic characteristic...
....
The conflict of interpretations over "subjects" became explicit in the battles between "bibliography" (an approach to subjects having much in common with Cutter's) and Dewey's "close classification". William Fletcher spoke for the scholarly bibliographer.... Fletcher's "subjects", like Cutter's, referred to the categories of a fantasized, stable social order, whereas Dewey's subjects were elements of a semiological system of standardized, techno-bureaucratic administrative software for the library in its corporate, rather than high culture, incarnation" (Frohmann, 1994, pp. 112-113).

Cutter's early view of what a subject is was probably wiser than most of the understandings that dominated the 20th century, including the understanding reflected in the ISO standard quoted below. The early statements quoted by Frohmann indicate that subjects are somehow shaped in social processes. That said, these statements are not particularly detailed or clear. We get only a vague idea of the social nature of subjects.

S. R. Ranganathan (1892–1972)

One system that has an explicit theoretical foundation is Ranganathan's Colon Classification. Ranganathan provided an explicit definition of the concept of "subject":

"Subject - an organized body of ideas, whose extension and intention are likely to fall coherently within the field of interests
and comfortably within the intellectual competence and the field of inevitable specialization of a normal person".
(Ranganathan, 1967, p. 82).

A related definition is given by one of Ranganathan's students:

"A subject is an organized and systematized body of ideas. It may consist of one idea or a combination of several..."
(Gopinath, 1976, p. 51).

Ranganathan's definition of "subject" is strongly influenced by his Colon Classification system. The colon system is based on combining single elements from facets into a subject designation. This is why the combined nature of subjects is emphasized so strongly. It leads, however, to absurdities such as the claim that gold cannot be a subject (it is instead termed "an isolate"). This aspect of the theory has been criticized by Metcalfe (1973, p. 318). Metcalfe's skepticism regarding Ranganathan's theory is expressed in harsh words (op. cit., p. 317): "This pseudo-science imposed itself on British disciples from about 1950 on...".

It seems unacceptable that Ranganathan defines the word subject in a way that favors his own system. A scientific concept like "subject" should make it possible to compare different ways of establishing access to information. Whether or not subjects are combined should be examined once their definition has been given; it should not be determined a priori, in the definition.

Besides the emphasis on the combined, organizing and systematizing nature of subjects, Ranganathan's definition of subject contains the pragmatic demand that a subject should be determined in a way that suits a normal person's competence or specialization. Again we see a strange kind of wishful thinking that mixes a general understanding of a concept with demands imposed by his own specific system. What the word subject means is one thing; how to provide subject descriptions that meet the demands placed on a given information retrieval language, such as specificity, precision and recall, is quite another. If researchers define terms in ways that favor specific kinds of systems, such definitions are of little use for building more general theories about subjects, subject analysis and IR. Among other things, comparative studies of different kinds of systems are made difficult.
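Precision and recall, mentioned above as demands placed on a retrieval system, can be stated compactly. The following is a minimal sketch in Python, assuming the standard definitions (precision is the fraction of retrieved items that are relevant, recall is the fraction of relevant items that are retrieved). The function name and the document identifiers are hypothetical and serve only as illustration. Note that both measures presuppose a prior judgement of which documents are relevant, which is exactly the kind of judgement a theory of "subject" has to account for.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids.

    Both measures depend on `relevant`, a relevance judgement that must be
    supplied from outside; the code cannot decide relevance by itself.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search result and hypothetical relevance judgement.
retrieved_ids = ["d1", "d2", "d3", "d4"]
relevant_ids = ["d2", "d4", "d5"]
precision, recall = precision_recall(retrieved_ids, relevant_ids)
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were retrieved.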

Based on these arguments (as well as additional arguments which have been used in the literature) we may conclude that Ranganathan's definition of the concept "subject" is not suited for scientific use. Like the definition of "subject" given by the ISO standard for topic maps, Ranganathan's definition may be useful within his own closed system. The purpose of a scientific and scholarly field is, however, to examine the relative fruitfulness of systems such as topic maps and Colon Classification. For that purpose another understanding of "subject" is necessary.

Patrick Wilson (1927–2003)

In his book Wilson (1968) examined - in particular by thought experiments - the suitability of different methods of examining the subject of a document. The methods were:

- To identify the author's purpose for writing the document
- To weigh the relative dominance and subordination of different elements in the picture which the reading imposes on the reader
- To group or count the document's use of concepts and references
- To construct a set of rules for selecting the elements which are necessary, as opposed to unnecessary, for the work as a whole

Patrick Wilson shows convincingly that each of these methods is insufficient to determine the subject of a document and is led to conclude (p. 89): "The notion of the subject of a writing is indeterminate..." or, on p. 92 (about what users may expect to find using a particular position in a library classification system): "For nothing definite can be expected of the things found at any given position". In connection with the last quotation, Wilson has an interesting footnote in which he writes that authors of documents often use terms in ambiguous ways ("hostility" is used as an example). Even if the librarian could personally develop a very precise understanding of a concept, he would be unable to use it in his classification, because none of the documents use the term in the same precise way. On the basis of this argument, Wilson is led to conclude: "If people write on what are for them ill-defined phenomena, a correct description of their subjects must reflect the ill-definedness".

Wilson's concept of subject was discussed by Hjørland (1992), who found it problematic to give up a precise understanding of such a basic term in LIS. Wilson's arguments led him to an agnostic position which Hjørland found unacceptable and unnecessary. Concerning authors' use of ambiguous terms, the role of subject analysis is to determine which documents it would be fruitful for users to identify, regardless of whether the documents use one term or another, or of the meaning in which a given term is used in a document. Clear and relevant concepts and distinctions in classification systems and controlled vocabularies may be fruitful even if they are applied to documents with ambiguous terminology.

"Content oriented" versus "Request oriented" views

Request-oriented indexing is indexing in which the anticipated requests from users influence how documents are indexed. The indexer asks himself: “Under which descriptors should this entity be found?” and “think of all the possible queries and decide for which ones the entity at hand is relevant” (Soergel, 1985, p. 230).

Request-oriented indexing may be indexing that is targeted towards a particular audience or user group. For example, a library or a database for feminist studies may index documents differently from a historical library. It is probably better, however, to understand request-oriented indexing as policy-based indexing: the indexing is done according to some ideals and reflects the purpose of the library or database doing the indexing. In this way it is not necessarily a kind of indexing based on user studies. Only if empirical data about use or users are applied should request-oriented indexing be regarded as a user-based approach.
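The policy-based reading of request-oriented indexing can be illustrated with a small sketch in Python. Everything in it is an assumption made for illustration: the class AnticipatedRequest, the function index_document, the descriptors and the cue words are all hypothetical, and the keyword overlap merely stands in for the human indexer's judgement of whether a document is relevant to an anticipated request. The point is only that the same document receives different subject terms under two different indexing policies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnticipatedRequest:
    """One request the library expects, with the descriptor that answers it."""
    descriptor: str
    cues: frozenset  # hypothetical stand-in for the indexer's relevance judgement


def index_document(text, policy):
    """Assign every descriptor whose anticipated request the document serves."""
    words = set(text.lower().split())
    return sorted(r.descriptor for r in policy if words & r.cues)


document = "a study of women textile workers in 19th century manchester factories"

# Two hypothetical indexing policies reflecting two institutions' purposes.
feminist_policy = [
    AnticipatedRequest("Women's labour", frozenset({"women", "workers"})),
    AnticipatedRequest("Gender and industry", frozenset({"women", "factories"})),
]
history_policy = [
    AnticipatedRequest("Industrial Revolution", frozenset({"factories", "textile"})),
    AnticipatedRequest("Manchester (England)", frozenset({"manchester"})),
]

feminist_terms = index_document(document, feminist_policy)
history_terms = index_document(document, history_policy)
```

Neither policy in this sketch draws on empirical data about users, which matches the observation above that policy-based indexing is not in itself a user-based approach.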

The problem of whether the subject is in the content of a document (objectively) or in the mind of the individual users (subjectively) or in a community (intersubjectively, as a social construction) is a part of the philosophical subject–object problem.

The subject knowledge view

Rowley & Hartley (2008, p. 109) wrote: “In order to achieve good consistent indexing, the indexer must have a thorough appreciation of the structure of the subject and the nature of the contribution that the document is making to the advancement of knowledge within a particular discipline”. This is in accordance with Hjørland's definition given above.

Other views and definitions

In the ISO-standard for topic maps the concept of subject is defined this way:

"Subject
Anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever." (ISO 13250-1, here cited from draft: http://www1.y12.doe.gov/capabilities/sgml/sc34/document/0446.htm#overview)

This definition may work well within the closed system of concepts provided by the topic maps standard. In broader contexts, however, it is not fruitful, because it does not specify what to identify in a document or in a discourse when ascribing subject identification terms or symbols to it. If different methods of subject analysis yield different results, which of these results can then be said to reflect the (true) subject? (This assumes that the expression "a true subject assignment" is meaningful at all, which is an important part of the problem.) Different persons may have different opinions about what the subject of a specific document is. How can a theoretical understanding of the term "subject" help in deciding principles of subject analysis?

Indexing words versus concepts versus subjects

A proposal for differentiating between concept indexing and subject indexing was given by Bernier (1980). In his opinion subject indexes are different from, and can be contrasted with, indexes to concepts, topics and words. Subjects are what authors are working and reporting on. A document can have the subject of Chromatography if this is what the author wishes to inform readers about. Papers using Chromatography as a research method or discussing it in a subsection do not have Chromatography as their subject. Indexers can easily drift into indexing concepts and words rather than subjects, but this is not good indexing. Bernier does not, however, differentiate the author’s subjects from those of the information seeker. A user may want a document about a subject which is different from the one intended by its author. From the point of view of information systems, the subject of a document is related to the questions that the document can answer for the users (cf. the distinction between a content-oriented and a request-oriented approach).

Hjørland & Nicolaisen (2005) investigated the concept of subject in relation to Bradford's law of scattering and made a distinction between three kinds of scattering:

• Lexical scattering is the scattering of words in texts and in collections of texts.
• Semantic scattering is the scattering of concepts in texts and in collections of texts.
• Subject scattering is the scattering of items useful to a given task or problem.
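The three kinds of scattering listed above can be contrasted in a small sketch in Python. The collection, the source names, the concept labels and the usefulness judgements are all invented for illustration, and the fields "concepts" and "useful" stand in for human judgements rather than anything computed from the text. The sketch is not taken from Hjørland & Nicolaisen; it only shows that counting by word, by concept and by usefulness for a task (here, a hypothetical task concerning urban air pollution) can distribute the same documents differently across sources.

```python
from collections import Counter

# Toy collection, invented for illustration. "concepts" and "useful" are
# hand-assigned judgements, not derived automatically from "text".
DOCS = [
    {"source": "Journal A", "text": "car emissions in cities",
     "concepts": {"motor car", "air pollution"}, "useful": True},
    {"source": "Journal A", "text": "automobile exhaust and health",
     "concepts": {"motor car", "air pollution"}, "useful": True},
    {"source": "Journal B", "text": "dining car design on trains",
     "concepts": {"railway carriage"}, "useful": False},
    {"source": "Journal C", "text": "air quality near busy roads",
     "concepts": {"air pollution"}, "useful": True},
]


def scatter(docs, matches):
    """Count, per source, the documents for which `matches` is true."""
    return dict(Counter(d["source"] for d in docs if matches(d)))


# Lexical: where does the word "car" occur?
lexical = scatter(DOCS, lambda d: "car" in d["text"].split())
# Semantic: where does the concept "motor car" occur, whatever word expresses it?
semantic = scatter(DOCS, lambda d: "motor car" in d["concepts"])
# Subject: where are the items judged useful for the task at hand?
subject = scatter(DOCS, lambda d: d["useful"])
```

In this toy data the word "car" is spread over Journals A and B, the concept is found only in Journal A, and the useful items are spread over Journals A and C, including one document that contains neither the word nor the concept.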

Isness

"The FRSAR Working Group is aware that some controlled vocabularies provide terminology to express other aspects of works in addition to subject (such as form, genre, and target audience of resources). While very important and the focus of many user queries, these aspects describe isness or what class the work belongs to based on form or genre (e.g., novel, play, poem, essay, biography, symphony, concerto, sonata, map, drawing, painting, photograph, etc.) rather than what the work is about." (IFLA, 2010, p. 10).

Ofness

"Those LIS authors who have focused on the subjects of visual resources, such as artworks and photographs, have often been concerned with how to distinguish between the "aboutness" and the "ofness" (both specific and generic depiction or representation) of such works (Shatford, 1986). In this sense, "aboutness" has a narrower meaning than that used above. A painting of a sunset over San Francisco, for instance, might be analyzed as being (generically) "of" sunsets and (specifically) "of" San Francisco, but also "about" the passage of time." (IFLA, 2010, p. 11).
See also: Baca & Harpring (2000) and Shatford (1986).

See also

  • Aboutness
  • Document classification
  • Subject indexing
  • Subject access
  • Subject term
  • Topic-comment


Literature

Drake, C. L. (1960). What is a subject? Australian Library Journal, 9, 34-41.

Hjørland, B. (1997). Information Seeking and Subject Representation. An Activity-theoretical approach to Information Science. Westport & London: Greenwood Press.

Hjørland, Birger (2009). Book review of: Rowley, Jennifer & Hartley, Richard (2008). Organizing Knowledge. An Introduction to Managing Access to Information. Aldershot: Ashgate Publishing Limited. IN: Journal of Documentation, 65(1), 166-169. Manuscript retrieved 2011-10-15 from: http://arizona.openrepository.com/arizona/bitstream/10150/106533/1/Book_review_Rowley_&_Hartley.doc

IFLA (2010). Functional Requirements for Subject Authority Data (FRSAD): A Conceptual Model. By IFLA Working Group on the Functional Requirements for Subject Authority Records (FRSAR). Edited by Marcia Lei Zeng, Maja Žumer, Athena Salaba. International Federation of Library Associations and Institutions. Berlin: De Gruyter. Retrieved 2011-09-14 from: http://www.ifla.org/files/classification-and-indexing/functional-requirements-for-subject-authority-data/frsad-final-report.pdf

Miksa, F. (1983b). The Subject in the Dictionary Catalog from Cutter to the Present. Chicago: American Library Association.

Welty, C. A. (1998). The Ontological Nature of Subject Taxonomies. IN: N. Guarino (ed.), Proceedings of the First Conference on Formal Ontology and Information Systems, Amsterdam, IOS Press. http://www.cs.vassar.edu/faculty/welty/papers/fois-98/fois-98-1.html
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 