IMHO, yes, I think this is a bad approach. If you knew that the first 20 characters will be unique, then it should be a first class column with a unique constraint. If you want to have better searching on the Name column, then using full-text search is the right way to go.
If you want to ensure that the varchar(max) column is unique, then create a computed column that generates a hash off the value and place a unique constraint off that.
IMHO, yes, I think this is a bad approach. If you knew that the first 20 characters will be unique, then it should be a first class column with a unique constraint. If you want to have better searching on the Name column, then using full-text search is the right way to go.
If you want to ensure that the varchar(max) column is unique, then create a computed column that generates a hash off the value and place a unique constraint off that. Alter Table Add NameHash Hashbytes('SHA1', Name) ADDITION Given our discussion, if your searches are always going to be on a exact match, then you could hash your search parameter and compare it to NameHash above. However, the catch is that the match must be an exact match (i.e.
Case-sensitive). I still content that the FTS will be your best bet. Even though there is overhead in breaking up your text into words, FTS is the tool best designed for doing searches against large amounts of text.
The longer your search criteria, the more exact it will be, the faster the search.
Yep, we know all about HASHBYTES. Still forces us to write a funky query to exploit the index though. – luksan Jun 2 '10 at 20:29 1 @luksan - How so?
You would never use NameHash in your search. If you are searching on name, you would use full-text search and that will definitely use the (full-text) index. – Thomas Jun 2 '10 at 20:31 @Thomas If we enabled full-text indexing, yes.
Problem is I don't know the overhead of doing that, whether it is as optimized for a scenario like this (matching the entire string as opposed to individual words), whether it is as fast as a regular index, etc. That's why I posted the question. – luksan Jun 2 '10 at 20:34 @luksan - Given the data you have show us and how long it is, the benefit would be enormous. FTS breaks up the data into words where "words" are determined by word breaks which are culture specific.
In your case, you have obvious word breaks (the commas) in your text so FTS would work very well. In fact, I think the benefit would be enormous. The overhead is minimal and can be adjusted to scale well.
– Thomas Jun 2 '10 at 20:38 @luksan - Keep in mind that the NameHash computed column I suggested was only to provide a means to ensure uniqueness on a varchar(max) column. It is entirely orthogonal to the need for an index to speed searches on names and for that Full Text Search would be the right solution. – Thomas Jun 2 '10 at 20:40.
Your index is on name_indexable, not on name. Since name_indexable is generated from a function involving name instead of directly on the column name, the optimizer won't automatically use the index when your where clause includes a reference to name. You must search on name_indexable in order to use the index.
Since you have a middle tier, your best bet is probably to provide a function that searches on name_indexable if the given name is.
Yeah, this is what I am trying to avoid though. – luksan Jun 2 '10 at 19:46 1 You could try using a WITH (INDEX xxx) query hint. I'm not sure if this will actually work or not with the way you're computing the column, but it's worth trying.
– Donnie Jun 2 '10 at 19:57 interesting idea, still results in the same query though. Actually the index does get used in the original query, but it is performing a scan rather than a seek. – luksan Jun 2 '10 at 20:31.
Make the Name_Index column a persisted computed column and the primary key and enforce uniqueness by appending the ChemicalId instead of relying on triggers. CREATE TABLE dbo. Chemical (ChemicalId int NOT NULL IDENTITY(1,1), Name Nvarchar(max) NOT NULL, Description Nvarchar(max) NOT NULL, Name_Index AS (CONVERT(VARCHAR(20), LEFT(Name, 20)) + CONVERT(VARCHAR(20), ChemicalId)) PERSISTED PRIMARY KEY).
If making it persisted allows the index to be used, good. If not, maybe you should bite the bullet and make it a "full-time" column. – Philip Kelley Jun 2 '10 at 20:03 That would guarantee that any value inserted into Name (even duplicates) could never violate the unique constraint, since the ID is always going to be unique.
– luksan Jun 2 '10 at 20:08 Goal is not really to make it the primary key, we like our identity keys. I checked the query plan, your solution still does not allow us to cause an index seek without explicitly specifying Name_Index in the where clause. – luksan Jun 2 '10 at 20:17.
Did you try WHERE Name_Indexable='1,2,3-Propanetriol' After all that is where your index is created on.
1 makes sense to me. – KM. Jun 2 '10 at 19:41 Yes, sorry.
I fixed my example to make it clearer as to what I am trying to achieve. – luksan Jun 2 '10 at 19:41.
I find your solution from the question (the last query) very good, but I personally prefer to say SQL more exactly what and how I want to do. So if you works with Microsoft SQL Server or with some other SQL Server which supports CTE (common table expression) you can rewrite your query like following: DECLARE @data nvarchar(max); SET @data = '1,1''-Bicyclohexyl-2-carboxylic acid, 4'',5-dihydroxy-2'',3-dimethyl-5'',6-bis(1-oxo-2-propen-1-yl)oxy-, methyl ester'; WITH ReduceData (ChemicalId, Name, Description) AS ( SELECT ChemicalId, Name, Description FROM dbo. Chemical WHERE Name_Indexable=LEFT(@data,20) ) SELECT ChemicalId, Name, Description FROM ReduceData WHERE Name=@data (In the real implementation you probably don't need to define @data.
Instead you can just use a parameterized query. ). What I suggest is just to say SQL more explicit what you want.
All CTE Queries can be very good optimized. It can be that your original query will be compiled to the absolutely the same execution plan as my CTE version. You can look at the both plans and compare there.In your project you have probably much more complex queries as from your question.
If you'll use more CTE, your SQL code will be easy to read, it can be very good optimized and you can be sure that SQL Server do exactly what you want. UPDATED: By the way the line ALTER TABLE Chemical ADD Name_Indexable AS LEFT(Name, 20) should be changed to ALTER TABLE Chemical ADD Name_Indexable AS CAST(LEFT(Name, 20) AS varchar(20)) PERSISTED to make a Name_Indexable column of the type varchar(20) on the Microsoft SQL Server 2008 and mark it PERSISTED to stores the computed values in the table, and updates them when any other columns on which the computed column depends are updated.
I don't think you really need the CTE (just AND the two filter criteria), but the example of having a single parameter used twice is great. I totally support this and furthermore, you can wrap this in a parameterized inline table-valued UDF so that the index usage is enforced transparently. Obviously for set-based join operations, this is a bit more problematic, but common joins could be represented with views with the appropriate LEFT() usage.
– Cade Roux Jun 3 '10 at 2:31 Why does persisting the column matter? The column will be "persisted" in the index, which is the only place where it will really be needed (never will be used in a SELECT). Won't persisting it in the table just waste space?
Won't the index nodes automatically be updated if the computed column is not persisted? What is the advantage of persisting? – luksan Jun 3 '10 at 12:07.
Fix your data model. You have a comma-delimited list in the name column, to me that means you would better be able to query if you had a related table. Your name appears to be a list of ingredients not a name.
If this really truly is a real name, then Registered User has a good plan.
It's not a comma-delimited list, it's a chemical name. Really. I can't make this stuff up.
– luksan Jun 2 '10 at 20:14.
I cant really gove you an answer,but what I can give you is a way to a solution, that is you have to find the anglde that you relate to or peaks your interest. A good paper is one that people get drawn into because it reaches them ln some way.As for me WW11 to me, I think of the holocaust and the effect it had on the survivors, their families and those who stood by and did nothing until it was too late.