10 Things You Should Know About Document Indexing

Monday, 07 February 2022, 11:46

Document indexing is what makes fast document retrieval possible. As you may have noticed, Internet search engines retrieve documents highly relevant to your query from among billions of documents on the Web in under a second. This would be impossible if they had to scan through all of those billions in response to each query.

1. Search engines use what is called an inverted index, which lists the documents against each word instead of the words in each document. In response to a query, the engine looks up the query words in its index and then lists the documents recorded against those words.

2. Typically there will be hundreds of documents, if not thousands, against each word. It then becomes necessary to rank the documents in order of relevance to the query. Relevance is determined by rules set by the engine, and typically involves more than the density of the query words in each document.

3. The major search engines do what is known as full-text indexing, i.e. they go through all the words in the document's content and list the document against each of those words (except perhaps very common words like 'the').

4. Not all indexing is full-text indexing. Full-text indexes tend to be huge and require a lot of storage space of their own. Indexes built from document meta tags occupy much less space. The meta tags provide details about the document that help retrieve it. For example, a short note about the content of the document, its date of creation or modification, and the author's name might be attached to each document as meta tags.

5. Meta tag indexing requires that the user has an idea of what the tags contain, so that they can query using those values. This is typically achieved by having standard practices for describing document contents and naming documents. Often, drop-down selection boxes of such descriptions and names are used when manually tagging a document, so that different users apply the same terms to similar documents.

6. Indexing is mainly used with unstructured documents, such as correspondence, reports, articles and so on. Structured documents such as transaction records are generally stored in databases and have a unique ID for every record. A database query can then point to the exact document in very little time (instead of the many documents returned by a search query).

7. Computer systems typically add certain meta information automatically to each document they create or modify. The date of creation and the author's name are examples of such automatically added data. Other data, such as a description of the document's content, may be added manually by a user, or added using such devices as standard-description barcode cards.

8. Indexing may be specialized, as when scientific documents are indexed using scientific notation rather than standard words. The key issue is ease of subsequent retrieval. People searching for scientific documents, for example, will typically find it easier to retrieve documents using the specialized notation.

9. When paper documents are scanned into digital images, they cannot be indexed as such. Instead, the images need to be processed further with tools such as OCR (Optical Character Recognition) software, which converts the images of text characters into standard, machine-readable ASCII or Unicode characters.

10. Indexing is not the only way to make documents easy to retrieve later. A hierarchical directory structure with meaningfully named folders and subfolders, together with proper classification of documents and their storage in the relevant subfolders, can enable quick browsing to the right folder and retrieval from it. Where necessary, this is combined with folder-level indexing and search.
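
To make points 1 to 3 more concrete, here is a minimal Python sketch of an inverted index with a simple ranking step. Everything in it is an assumption made for illustration: the names `build_index` and `search`, the tiny stop-word list, and the sample documents are hypothetical, and ranking by raw word counts is a deliberate simplification, since real engines use far more elaborate relevance rules.

```python
import re
from collections import defaultdict

# Assumed, deliberately tiny list of "too common" words to skip.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: word -> {doc_id: occurrences of the word}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            if word in STOP_WORDS:
                continue
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index


def search(index, query):
    """Return ids of documents containing every query word, best match first.

    The score is simply the total number of query-word occurrences,
    with ties broken alphabetically by document id.
    """
    words = [w for w in tokenize(query) if w not in STOP_WORDS]
    if not words:
        return []
    postings = [index.get(w, {}) for w in words]
    matching = set(postings[0])
    for posting in postings[1:]:
        matching &= set(posting)
    scores = {d: sum(p[d] for p in postings) for d in matching}
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical sample documents.
docs = {
    "memo": "The quarterly report covers indexing costs.",
    "guide": "Indexing guide: full-text indexing and meta tag indexing.",
    "letter": "A letter about the office move.",
}
index = build_index(docs)
results = search(index, "indexing")  # "guide" (3 hits) ranks above "memo" (1 hit)
```

The point of the structure is that a query only touches the entries for its own words, so the cost of a search does not grow with the total number of documents the way a full scan would.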

Without the ability to index tens of thousands of documents using, say, a desktop search tool, businesses might find that retrieving unstructured documents is a tough, and often simply impossible, task. Indexing, whether full-text or meta tag based, changes the situation dramatically, making it possible to retrieve even a specific e-mail comparatively quickly. Indexing is thus a powerful business tool.
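
For the meta tag approach described in points 4, 5 and 7, a rough Python sketch might look like the following. It assumes a hypothetical `DocumentMeta` record, an invented controlled vocabulary standing in for the drop-down list of descriptions, and made-up sample records; none of these come from any particular product.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Assumed controlled vocabulary, playing the role of a drop-down list.
ALLOWED_DESCRIPTIONS = {"invoice", "contract", "report"}


@dataclass(frozen=True)
class DocumentMeta:
    """Hypothetical meta tags attached to one document."""
    doc_id: str
    author: str
    description: str
    created: date


def build_meta_index(records):
    """Index documents by (field, value) pairs taken from their meta tags."""
    index = defaultdict(set)
    for rec in records:
        if rec.description not in ALLOWED_DESCRIPTIONS:
            raise ValueError(f"unknown description: {rec.description!r}")
        index[("author", rec.author.lower())].add(rec.doc_id)
        index[("description", rec.description)].add(rec.doc_id)
        index[("year", rec.created.year)].add(rec.doc_id)
    return index


def find(index, **criteria):
    """Return sorted ids of documents that match every field=value criterion."""
    sets = []
    for field, value in criteria.items():
        key = value.lower() if isinstance(value, str) else value
        sets.append(index.get((field, key), set()))
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Hypothetical sample records.
records = [
    DocumentMeta("d1", "Ana Lee", "report", date(2021, 3, 1)),
    DocumentMeta("d2", "Ana Lee", "contract", date(2022, 1, 15)),
    DocumentMeta("d3", "Raj Patel", "report", date(2022, 2, 7)),
]
meta_index = build_meta_index(records)
by_author = find(meta_index, author="Ana Lee")
reports_2022 = find(meta_index, description="report", year=2022)
```

Rejecting descriptions outside the agreed vocabulary is what keeps different users from tagging similar documents with different terms, which is the reason the drop-down lists exist in the first place.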
