PostgreSQL provides the function to_tsvector for converting a document to the tsvector data type.

    to_tsvector([ config regconfig, ] document text) returns tsvector

to_tsvector parses a textual document into tokens, reduces the tokens to lexemes, and returns a tsvector that lists the lexemes together with their positions in the document. The document is processed according to the specified or default text search configuration. Here is a simple example:

    SELECT to_tsvector('english', 'a fat cat sat on a mat - it ate a fat rats');
                      to_tsvector
    -----------------------------------------------------
     'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4

In the example above, the resulting tsvector does not contain the words a, on, or it. The word rats became rat, and the punctuation sign - was ignored.

The to_tsvector function internally calls a parser that breaks the document text into tokens and assigns a type to each token. For each token, a list of dictionaries (Section 12.6) is consulted, and the list can vary depending on the token type. The first dictionary that recognizes the token emits one or more normalized lexemes to represent it. For example, rats became rat because one of the dictionaries recognized that the word rats is a plural form of rat. Some words are recognized as stop words (Section 12.6.1), which causes them to be ignored because they occur too frequently to be useful in searching. In our example these are a, on, and it. If no dictionary in the list recognizes a token, that token is also ignored. In this example that happened to the punctuation sign - because no dictionaries are assigned to its token type (Space symbols), which means space tokens will never be indexed. The choice of parser, the dictionaries, and the types of tokens to index are all determined by the selected text search configuration (Section 12.7). It is possible to have many different configurations in the same database, and predefined configurations are available for various languages. In our example we used the configuration english for the English language.
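The lookup order described above (parse, pick the dictionary list for the token type, let the first recognizing dictionary emit lexemes) can be sketched as a small Python toy model. This is not PostgreSQL's implementation. The parser, the token type names `word` and `blank`, the three-word stop list, and the crude plural-stripping rule in `toy_english` are all hypothetical simplifications. The sketch also assumes, consistent with the sample output above, that a stop word still consumes a position while a token no dictionary recognizes (such as the `-` sign) does not.

```python
import re
from collections import defaultdict
from typing import Callable, Optional

# A toy "dictionary" takes a token and returns:
#   None       -> not recognized; consult the next dictionary in the list
#   []         -> recognized as a stop word; emit no lexemes
#   [lexemes]  -> recognized; emit these normalized lexemes
Dictionary = Callable[[str], Optional[list[str]]]

STOP_WORDS = {"a", "on", "it"}  # hypothetical, tiny stop-word list


def toy_english(token: str) -> Optional[list[str]]:
    """Hypothetical stand-in for a real stemming dictionary such as Snowball."""
    word = token.lower()
    if word in STOP_WORDS:
        return []
    # Crude plural stripping ("rats" -> "rat"); far simpler than real stemming.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return [word[:-1]]
    return [word]


# Hypothetical configuration: token type -> ordered list of dictionaries.
# "blank" has no dictionaries, so blank tokens (spaces, " - ") are never indexed.
CONFIG: dict[str, list[Dictionary]] = {
    "word": [toy_english],
    "blank": [],
}

TOKEN_RE = re.compile(r"(?P<word>[A-Za-z0-9]+)|(?P<blank>[^A-Za-z0-9]+)")


def parse(text: str) -> list[tuple[str, str]]:
    """Toy parser: split text into (token_type, token) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def toy_to_tsvector(text: str) -> dict[str, list[int]]:
    """Map each lexeme to the positions where it occurs, sorted by lexeme."""
    positions: dict[str, list[int]] = defaultdict(list)
    pos = 0
    for token_type, token in parse(text):
        for dictionary in CONFIG.get(token_type, []):
            lexemes = dictionary(token)
            if lexemes is None:
                continue  # not recognized: try the next dictionary
            pos += 1  # a recognized token (even a stop word) takes a position
            for lexeme in lexemes:
                positions[lexeme].append(pos)
            break  # the first recognizing dictionary wins
    return dict(sorted(positions.items()))


def format_tsvector(vec: dict[str, list[int]]) -> str:
    """Render the toy result in tsvector-like text form."""
    return " ".join(f"'{lex}':{','.join(map(str, ps))}" for lex, ps in vec.items())


if __name__ == "__main__":
    vec = toy_to_tsvector("a fat cat sat on a mat - it ate a fat rats")
    print(format_tsvector(vec))
```

Run on the example sentence, this toy model prints `'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4`, matching the output shown earlier. In PostgreSQL itself, the real per-token decisions can be inspected with the `ts_debug` function rather than reimplemented.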