Subject: Re: [SQL] making 'like' queries quicker
Date: Dec 18, 1999 1:53:29 pm
A general rule of thumb is that indexes only work on exact matches.
The reason indexes are faster is that they store, in a separate file, a pointer to each of the actual records in a table. Each pointer is associated with a number which corresponds to the record's "hashed value."
E.g. if you have a record containing a field whose value is "dog ate my sandal," then, when the record is indexed, the hashing algorithm calculates a number based on the letters in the value; that number is the hashed value (imagine adding the letters together to arrive at a sum) *. That hashed value is then stored in a hash or btree (in Postgres) index.
When you search for a word, if an index search is used, the same hashing algorithm is applied to the search term to arrive at its hash value. Then all records with that hash value are checked. If you are looking for "dog ate my sandal," the hash value of your search term matches that of the record in the database, so that record is returned.
Other values, such as "oh oh, I ate my dog," can have the same hash value, depending on the algorithm used. If the index search finds multiple records, each one is checked in turn for an exact match.
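To make this concrete, here is roughly what an exact-match lookup through a hash index looks like in SQL (the table and column names are made up for illustration):

```sql
-- A hash index only helps exact-match lookups.
CREATE INDEX notes_body_idx ON notes USING hash (body);

-- The search term is hashed with the same algorithm, rows in the
-- matching hash bucket are fetched, and each candidate row is
-- rechecked for an exact match before being returned:
SELECT * FROM notes WHERE body = 'dog ate my sandal';
```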
The moral of the story: the hashed value only matches records with identical data (plus the occasional random collision), and all other records in the database are never even looked at. So using an index when you don't know the exact value won't work.
Because Postgres allows function return values to be used as the basis for an index, it is possible to find "non-exact matches." E.g. a lowercase form of a column's value can be indexed: "lower(fieldname)" is a valid "column name" to use when building an index in Postgres. Similarly, you could use a function which returns only the first three letters of the value, if those are the only letters you care about. You would have to write a C function to do that, and then use it when creating the index. You cannot use SQL functions for creating an index in Postgres. Built-in functions (e.g. substring()), on the other hand, could be used, but I believe the parser can't handle them if they take more than one argument. I could be wrong on this; if I am, I would love to be enlightened by somebody.
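The lower() trick described above might look like this (table and column names are hypothetical):

```sql
-- Build the index on the lowercase form of the column:
CREATE INDEX people_lower_name_idx ON people (lower(name));

-- A case-insensitive "exact" match that can use the index,
-- because both sides are reduced to the same lowercase form:
SELECT name FROM people WHERE lower(name) = lower('McIlroy');
```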
Just to elaborate on this a bit, more complex schemes for calculating index entries are possible, such as finding all the words in a value and adding an index entry for each word, pointing back at the matching record. This approach would take more space and be slower to maintain, but it would allow searches for individual words within strings. Postgres doesn't have an indexing scheme like this built in; this is just an example of what you could do with an index.
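As a sketch of such a word-level scheme, you could maintain a side table by hand, with one row per (word, record) pair and an ordinary index on the word column (all names here are hypothetical):

```sql
-- One entry per word per document; this table acts as the "word index."
CREATE TABLE doc_words (word text, doc_id int);
CREATE INDEX doc_words_word_idx ON doc_words (word);

-- Find documents containing an individual word via the side table:
SELECT d.* FROM docs d, doc_words w
WHERE w.word = 'sandal' AND w.doc_id = d.id;
```

The application would have to insert and delete doc_words rows itself whenever a document changes, which is the maintenance cost mentioned above.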
I don't know all the internals of how indexes are implemented in Postgres, so if I am off, please correct me.
* A sum is not actually used. The algorithm calculates a value with the best possible distribution within a certain range, so the index has an even spread of values in all areas.
What about queries which only end with a wildcard? Is there any way to accelerate such a query?
Is there a way to make queries using the 'like' operator quicker, more specifically for queries that look like: select name from table where name like '%abc%';
These kinds of queries (where the search string starts with a wildcard) never use indexes, so you're stuck with the sequential scan. A faster computer is probably your best option. :(
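For contrast, when the pattern is anchored at the start (the wildcard comes only at the end), the fixed prefix bounds a range of btree keys, so an index scan is at least possible, depending on version and locale:

```sql
-- Anchored prefix: 'abc' fixes a range of keys, so a btree
-- index on name may be usable (locale permitting):
SELECT name FROM tbl WHERE name LIKE 'abc%';

-- Leading wildcard: there is no prefix to anchor on, so every
-- row must be scanned sequentially:
SELECT name FROM tbl WHERE name LIKE '%abc%';
```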