written 5.6 years ago by | • modified 5.6 years ago |
Subject: Storage Network Management and Retrieval
Difficulty: Medium
Marks: 05M
Indexing, the act of assigning index terms to a document, may be carried out either manually or automatically. In either situation the indexing language may be controlled, that is, limited to a predefined set of index terms, or uncontrolled, allowing use of any term that fits some broad criteria. One source of problems with author-developed indexes is that the process is generally manual, with an uncontrolled vocabulary and no predefined inclusion rules. There is no reason to expect consistency of indexing done in this manner across a document collection.
Indexing has three primary purposes in information retrieval:
To permit easy location of documents by topic,
To define topic areas, and hence relate one document to another, and
To predict the relevance of a given document to a specified information need.
There are two types of indexing mechanisms:
Manual indexing
Automatic indexing
In practice, manually assigned indexes often come with no control over the vocabulary, whereas automatically assigned indexes tend to be controlled, since they follow predefined rules or a fixed set of index terms. Whichever method is used, the indexing should be controlled in some way, because consistency across the indexes in a database can be expected only under a controlled method, not an uncontrolled one.
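The difference between a controlled and an uncontrolled indexing language can be sketched in a few lines of Python. This is only an illustration: the vocabulary, the candidate terms, and the function name assign_terms are all made up for the example and do not come from any particular system.

```python
# Hypothetical controlled vocabulary: the only terms an indexer may assign.
CONTROLLED_VOCABULARY = {"indexing", "retrieval", "storage", "network"}


def assign_terms(candidates, vocabulary=None):
    """Return the index terms accepted for a document.

    With vocabulary=None the language is uncontrolled and every candidate
    is kept; otherwise only candidates found in the vocabulary survive.
    """
    normalized = {term.strip().lower() for term in candidates}
    if vocabulary is None:
        return sorted(normalized)
    return sorted(normalized & vocabulary)


# Illustrative candidate terms proposed for one document.
candidates = ["Indexing", "retrieval", "search engines", "Storage"]
uncontrolled = assign_terms(candidates)
controlled = assign_terms(candidates, CONTROLLED_VOCABULARY)
print(uncontrolled)
print(controlled)
```

Under the controlled language the free-form term "search engines" is rejected, so every document ends up described with the same limited set of terms.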
Manual indexing: Manual indexing typically uses an uncontrolled indexing language. It relies on the intellectual effort of the author or indexer to identify and describe the content of a document, and the result is a lack of consistency. Two indexers are unlikely to assign the same set of index terms to a given document, because each draws on a different vocabulary. Even a single indexer may not assign the same terms to the same document at different times. For a large document, several index terms may be assigned, and these may not be consistent with one another.
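One rough way to put a number on this lack of consistency is to measure how much two indexers' term sets overlap. The sketch below uses the Jaccard overlap (shared terms divided by all distinct terms). This measure is chosen here only for illustration, and the two term lists are invented.

```python
def indexing_consistency(terms_a, terms_b):
    """Jaccard overlap between two sets of index terms (1.0 = identical)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # two empty indexes agree trivially
    return len(a & b) / len(a | b)


# Invented example: two indexers describing the same document.
indexer_1 = ["information retrieval", "indexing", "vocabulary"]
indexer_2 = ["document search", "indexing", "thesaurus", "vocabulary"]

score = indexing_consistency(indexer_1, indexer_2)
print(round(score, 2))
```

Here the indexers share two of five distinct terms, so the score is well below 1.0 even though both described the same document.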
Automatic indexing: Automatic indexing determines the index terms for a document automatically. An algorithm selects the terms that represent a given document so that the document can be located later. The algorithm avoids the consistency problem of manual indexing to some extent, but it does not offer an individual the flexibility to define terms that manual indexing does. The quality of the resulting index terms depends on the knowledge and viewpoint of the programmer who built the algorithm.
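As a minimal sketch of what such an algorithm might look like, the Python below picks the most frequent non-stopword terms in a document. Term frequency is just one simple selection rule, assumed here for illustration. The stopword list, the cutoff max_terms, and the sample text are all hypothetical choices, which is exactly where the programmer's viewpoint enters.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "by", "or"}


def automatic_index(text, max_terms=3):
    """Pick the most frequent non-stopword terms as index terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    # Sort by descending frequency, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:max_terms]]


document = (
    "Indexing assigns index terms to a document. Automatic indexing selects "
    "the terms by an algorithm, so indexing is repeatable."
)
terms = automatic_index(document)
print(terms)
```

Running the function twice on the same text always gives the same terms, which is the consistency advantage described above. Changing the stopword list or the cutoff changes the result, which is the dependence on the programmer's choices.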