PDA

View Full Version : IsUpTextAcc



Kevin
05-25-2013, 09:54 AM
It's fairly common for uptext to have errors depending on the situation, which has lead to the UpTextMulti functions. But what if instead of trying to break a full string into substrings for exact searches, we instead search based on an accuracy of our string comparison? Overall it would be far more accurate and would reduce the headache of creating a multi-String that can still fail simply by mixing up with another similarly named object.

But instead of being so hypothetical, let's just say the following snippet should settle the job just fine, with your own defined accuracy variable (1.0 being 100%). The disadvantage of this function is that it's slightly slow handled in this manner, example following.

The comparison of: 'Rune kiteshield' (actual name) with 'R!1n!e lritesl.!ield' (uptext viewed name) scores a 46.6(recurring) % match takes an average of 21ms to determine (1000 run test).

The much better comparison of 'Uncut dragonstone' (actual name) with 'Un!cut dragonston!e' (uptext viewed name) scores an 88% match and takes an average of 23ms to determine (1000 run test).


var
arr: array of array of integer;
//always call this with a true (for recursive reset of the array purposes)
function LevenshteinRec(s, t: string; called: boolean): integer;
var
sLen, tLen, cost, a, b, c: integer;
sShort, tShort: string;
begin
sLen:= Length(s);
tLen:= Length(t);
if(sLen<=1)then
Result:= tLen
else if(tLen<=1)then
Result:= sLen
else begin
if(called)then
begin
SetLength(arr, sLen);
for a:= 0 to (sLen-1) do//cost is simply a re-used variable here
begin
SetLength(arr[a], tLen);
for b:= 0 to (tLen-1) do
begin
arr[a][b]:= 0;
end;
end;
end;
if(arr[slen-1][tlen-1] <> 0) then
begin
Result:= arr[slen-1][tlen-1];
Exit;
end;
if(s[sLen-1]=t[tLen-1])then
cost:= 0
else
cost:= 1;
sShort:= s;
SetLength(sShort, (sLen-1));
tShort:= t;
SetLength(tShort, (tLen-1));
a:= LevenshteinRec(sShort, t, false)+1;
b:= LevenshteinRec(s, tShort, false)+1;
c:= LevenshteinRec(sShort, tShort, false)+cost;
Result:= Min(Min(a, b), c);
arr[slen-1][tlen-1]:= Result;
end;
end;

function IsUpTextAcc(upText: String; desiredAcc: extended): boolean;
var
mismatch, len, acc: extended;
begin
mismatch:= LevenshteinRec(upText, GetUpText, true)-1;
len:= length(upText);
acc:= (len-mismatch) / len;
if (acc >= desiredAcc)then
Result:= true;
end;

function WaitUpTextAcc(UpText: String; acc: Extended; Time: integer): boolean;
var
t: integer;
begin
T := GetSystemTime + Time;
while (GetSystemTime < T) do
begin
if (IsUpTextAcc(UpText, acc)) then
begin
Result := True;
Exit;
end;
Wait(20 + Random(20));
end;
end;

Justin
05-25-2013, 10:25 AM
Nice work Kevin, keep it up :)

Olly
05-25-2013, 02:02 PM
http://villavu.com/forum/showthread.php?t=68176&p=932706#post932706 done before, and will be automatically included in srl6, but none of the less nice. :)

Kevin
05-25-2013, 06:53 PM
Wow, it appears every time I think of something cool it's already done ;)

I did some testing on the jaro-winkler implementation and while I admit unfamiliarity with it, it seems to be showing far better match percents than what I would expect. My example of 'Uncut dragonstone' vs 'Un!cut dragonston!e' returns a massive 98% match with the jaro-winker algorithm - and then for a more ridiculous comparison: ('Rune Kiteshield' vs 'Red Dragon') returns a 52.5% match with the jaro-winkler algrotihm, while Levenshtein should return what is equivalent to a 20% match.
I'm gonna go ahead and say since his Levenshtein implementation doesn't use recursion, his is faster; however I think we should be using his Levenshtein (slightly altered to return a % match) instead of his Jaro-Winkler due to how the match percents are being determined. Are there any worries on this far higher than (what I would expect to be) normal matches by using this separate fomula?

senrath
05-25-2013, 07:54 PM
Are there any worries on this far higher than (what I would expect to be) normal matches by using this separate fomula?

As long as it's consistent, I don't think there's a problem. You just have to know what accuracies are good to use. Now, if it's inconsistent then there's a problem.