Foreword | Introduction
Despite writing this post, I know next to nothing about how Tesseract works its magic under the hood.
I still have a lot to learn about Tesseract and OCR in general. That said, you
don't need to know much about how it works to use it. I remember feeling like Tesseract was some
sort of super complex thing that I wouldn't have time for. This tutorial only aims to
provide enough 'activation energy' to get started with reading text with Tesseract (as it did with me),
and enough is very little. That's what I'm going to try to show today.
Enjoy!
Basic Idea
We'll be using a single function call, tesseractGetText(), that takes a TBox (the area that
will be searched for text) and a filter constant that is used as the 'criteria'
for accurately detecting our text. It returns a string.
In this tutorial, I'm going to show you an alternative method of getting prayer info for your
character (convenient for a tutorial, I suppose), and ways to massage the data (if applicable)
into something you can use.
Tools Used
We'll be using the TesseractTool by @Olly to help us test and create our filters.
Screenshot software; I personally like ShareX, but for the purposes of this tutorial, I'll be
using the Windows Snipping Tool. This will be used in tandem with the TesseractTool.
Step 1: Screencap
So, we know that we're going to be trying to read our prayer info. The font and background colors
contrast fairly strongly, which makes it a very suitable victim for Tesseract.
Target acquired.
Go ahead and run TesseractTool.simba.
I'm going to be trying to read both sides of the '/' there, getting both current and maximum
prayer points. To get this process underway, screencap the tightest crop you can manage
of the text you want to read. In this case, I'm going to go ahead
and use Snipping Tool to get a capture of it.
I took a screencap of myself screencapping.
After getting a good clean capture, go ahead and Right Click -> Copy from Snipping Tool. Don't use Save
in the Snipping Tool: the image becomes too compressed and loses a bit of its quality.
Man, I really oughtta train prayer someday.
From there, you're going to want to simply tab back to your instance of TesseractTool (referred
to as 'TT' from now on), and CTRL+V-slap that copied image right in there to begin carving out your filter.
Blown away.
Step 2: Make Filter
Anyways, here comes the sorta trial-and-error part of this process. From here, you essentially need to
play around with these settings until the output accurately matches what you're
looking for. You're going to stretch, squeeze, and carve your way to victory, and I believe in you.
The 'Resize' area of the TT deals with scaling the width or height up from the 1:1 ratio you start at.
This is useful for creating more pixels for the tool to work with. Remember, all of this just ends up
feeding a math function via the filter we're going to create. Set the scale
parameters you want to try out, then hit the resize button.
NOTE: Bigger isn't always better.
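To make 'creating more pixels' a bit more concrete, here is a minimal Python sketch of scaling a tiny grayscale grid by separate width and height factors. It assumes plain nearest-neighbour scaling; I don't know which resampling the TT actually uses, and upscale() is a made-up name for illustration, not a Simba or SRL function.

```python
def upscale(pixels, scale_w, scale_h):
    """Nearest-neighbour upscale of a 2D grid (list of rows) by integer factors."""
    if scale_w < 1 or scale_h < 1:
        raise ValueError("scale factors must be >= 1")
    scaled = []
    for row in pixels:
        # repeat every pixel scale_w times to widen the row
        wide_row = [value for value in row for _ in range(scale_w)]
        # repeat the widened row scale_h times to make the image taller
        for _ in range(scale_h):
            scaled.append(list(wide_row))
    return scaled


if __name__ == "__main__":
    tiny = [[0, 255],
            [255, 0]]
    big = upscale(tiny, 3, 2)  # 3x wider, 2x taller
    for row in big:
        print(row)
```

Note that no new information appears: each original pixel just covers more area, which is one reason bigger isn't automatically better.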
Next up is the 'Threshold' area of the TT, which deals with masking out the pixels we
want gone (anything that isn't a character, pretty much). You'll see an invert toggle, which
helps very situationally (in my experience, it's worth a try in lower-contrast challenges). You can keep the
drop-down at the default 'TM_Mean' (I don't know a lot about this one tbh). Then comes the amount, which is
arguably the only setting you need to touch in this area. Essentially, the higher the threshold amount, the more it
'carves' away at the letters, which might be good for getting rid of straggling colors or pulling together
a lopsided '1' or something. A good range to start in is typically 20-40. Hit Apply Threshold to see
the result, which also lets you hit that big ol' Tesseract! button. This will print out what it saw with
your settings.
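For a rough feel of what the amount and invert settings do, here is a small Python sketch. It assumes a simple mean-based rule: a pixel is kept as text when it is brighter than the image mean plus the amount. That is my guess at the general idea only, not a description of what TM_Mean does inside Simba, and mean_threshold() is a hypothetical name.

```python
def mean_threshold(pixels, amount, invert=False):
    """Return a 0/1 mask: 1 where a pixel is kept as text, 0 where it is masked out."""
    flat = [value for row in pixels for value in row]
    if not flat:
        return []
    # assumed rule: cutoff sits `amount` above the average brightness
    cutoff = sum(flat) / len(flat) + amount
    mask = []
    for row in pixels:
        mask_row = []
        for value in row:
            keep = value > cutoff
            if invert:
                keep = not keep
            mask_row.append(1 if keep else 0)
        mask.append(mask_row)
    return mask


if __name__ == "__main__":
    # bright vertical stroke (200) with one fainter straggler pixel (120)
    glyph = [[10, 10, 200, 10],
             [10, 120, 200, 10],
             [10, 10, 200, 10]]
    print(mean_threshold(glyph, 20))  # straggler survives
    print(mean_threshold(glyph, 60))  # straggler carved away
```

Under this assumption, raising the amount thins out everything that is only a little brighter than the background, which matches the 'carving' behaviour described above.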
Rinse + wash + repeat until desired results!
- At this point, since this is such a physically small bit of text without a lot of pixels to work with, one can't
readily expect the default filter to work. fig. 1
Default settings.
- I suppose I'll try straight-up enlargement of our sample by setting both width and height scales to 5
(retaining the proportions as well). As you can see, this was maybe too much, as our '/' now reads as a capital I. fig. 2
Enhance.
- From experience, I know that, typically, one can get more accurate '/' readings by having more width, as it
squashes it down to be more horizontal, so I'm going to try that next. I'll also kick the threshold up a bit
just for fun. fig. 3
Squash.
- Maybe I was a little too heavy-handed with that adjustment, so at this point I'm just going to try going
to 3W 3H and see what happens, as well as bring the threshold amount up to 30 (I probably seem like a crazy
person, but I'm kinda winging it too). fig. 4
Dehance.
- Wow! It's completely accurate! However, don't fully celebrate yet! The next step is to change what the text says,
re-screencap, copy+paste it into the TT again, and make sure the same settings work for different text! fig. 5
Double check.
Wooo! In an ideal world, you want to be certain this works with 3 or 4+ different caps of text. If you have
to adjust your filter for the newest cap, go back and cross-check it against your older text samples to make
sure the adjusted filter still works for every one of them. This makes your filter quite robust and less likely to
return data you don't want. With that said, I'm pretty confident that for this situation, two iterations are
going to be good enough, so I'll move on to the last part of this step...
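The cross-checking habit can be sketched as a tiny regression check. This is plain Python, not TT or SRL code: read_text stands in for whatever OCR call you use, and fake_ocr() is a made-up stub (it treats the 'image' as a string and pretends a scale of 5 turns '/' into 'I') so the example runs on its own. The sample values are invented too.

```python
def failing_samples(read_text, filter_settings, samples):
    """Return (name, expected, got) for every sample the filter misreads."""
    failures = []
    for name, image, expected in samples:
        got = read_text(image, filter_settings)
        if got != expected:
            failures.append((name, expected, got))
    return failures


def fake_ocr(image, filter_settings):
    """Stand-in for a real OCR call; misreads '/' as 'I' when the width scale is 5+."""
    scale_w, _scale_h = filter_settings
    return image.replace("/", "I") if scale_w >= 5 else image


if __name__ == "__main__":
    # each sample: (label, captured image, text you can read with your own eyes)
    samples = [("cap 1", "10/10", "10/10"), ("cap 2", "7/10", "7/10")]
    print(failing_samples(fake_ocr, (5, 5), samples))  # both caps fail
    print(failing_samples(fake_ocr, (3, 3), samples))  # empty list: filter holds up
```

The point of keeping every old sample in the list is that a tweak made for the newest cap gets re-run against all the earlier ones automatically.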
Generating that there fancy-shmancy TTesseractFilter!!! Wooooooooooooooo -
Simple enough - once you've found your filter settings that you feel satisfied with, head on up to your menu and
go to 'Tesseract' -> 'Write Filter'.
In the console area of TT you should see something like
const myFilter: TTesseractFilter = [ s t u f f ].
Hey, guess what? We now get to see a little bit of code (well actually, you could've set up your Simba project before
you took screencaps and fiddled your way around the TT, but you probably didn't, did you?
If you did - damn, you're good).
Anyways, let's go ahead and mock up a quick function that returns a string, call it something like 'readPrayer',
and give it a TTesseractFilter const matching TT's output.
Simba Code:
function readPrayer: string;
const
  filter: TTesseractFilter = [3, 3, [False, 30, TM_Mean]];
var
  searchArea: TBox;
begin
  result := tesseractGetText(searchArea, filter);
end;
Oh wait, oh no! We're not ready! We don't yet have any values for our TBox - tragedy! What will we do!?
Oh wait, don't worry, we can do that now ez pz.
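Reading the filter literal against the TT settings that produced it (3W, 3H, invert off, amount 30, TM_Mean), the positions appear to line up as in the Python sketch below. The field names are my own labels for illustration, not the names SRL uses, and the ordering is inferred from this one example.

```python
from typing import NamedTuple


class ThresholdSettings(NamedTuple):
    invert: bool   # the invert toggle (assumed position)
    amount: int    # the threshold amount (assumed position)
    method: str    # the drop-down, e.g. TM_Mean (assumed position)


class FilterSettings(NamedTuple):
    scale_width: int
    scale_height: int
    threshold: ThresholdSettings


def describe(settings: FilterSettings) -> str:
    """Render the settings in the same order as the Simba literal."""
    t = settings.threshold
    return "[{}, {}, [{}, {}, {}]]".format(
        settings.scale_width, settings.scale_height, t.invert, t.amount, t.method
    )


if __name__ == "__main__":
    tutorial_filter = FilterSettings(3, 3, ThresholdSettings(False, 30, "TM_Mean"))
    print(describe(tutorial_filter))
```

Since width and height are both 3 here, this example can't tell you which of the two comes first, so double check against TT's output when yours differ.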
Step 3: Define Search Area
Finding the points that make up our box is likely nothing new to you, but I'ma walk through it anyways.
I'm simply going to make sure Simba is paired with our SMART client, then hover my mouse over the two points that
make up the box we want to read our text from, reading the x and y coordinates off the Simba window.
You generally want this to be as tight to the data as you can get, but I usually like to leave a 1 or 2 pixel buffer
around the bounds of the text. If you end up cutting into your text, you won't be getting accurate results!
In this case, I'll hover over the top left and bottom right corners of my prayer stats there on the action bar.
Top left corner:
<- Hover, Coords ->
Bottom right corner:
<- Hover, Coords ->
(Optional: Load a full viewport screencap from your SMART client into something like Photoshop or GIMP, where you
can zoom in and see a coordinate system for super accurate and super tight search boxes for added consistency)
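The 1 or 2 pixel buffer can be applied mechanically once you have the tight corners. A minimal Python sketch follows; pad_box(), the example corners, and the 800x600 client size are all assumptions for illustration, not SRL functions or real client dimensions.

```python
def pad_box(x1, y1, x2, y2, padding, max_width, max_height):
    """Grow a box by `padding` pixels on every side, clamped to the client area."""
    if x1 > x2 or y1 > y2:
        raise ValueError("expected top-left corner first, bottom-right second")
    return (
        max(x1 - padding, 0),
        max(y1 - padding, 0),
        min(x2 + padding, max_width - 1),
        min(y2 + padding, max_height - 1),
    )


if __name__ == "__main__":
    # hypothetical tight corners measured around some text
    print(pad_box(100, 50, 140, 60, 2, 800, 600))
    # near the edge, the padding is clamped instead of leaving the client
    print(pad_box(1, 1, 50, 20, 2, 800, 600))
```

Measuring tight and then padding by a fixed amount keeps the buffer consistent from one search box to the next.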
Now with these two points, (318, 311) and (364, 323), I'll use a simple intToBox(x1, y1, x2, y2) to
assign our searchArea. In the var section of our function, we now have this.
Simba Code:
var
  searchArea: TBox := intToBox(318, 311, 364, 323);
Now we can finally test this in game to see if everything came out as planned!
Step 4: Test
For the purposes of testing, I've put our function into a repeat..until false; loop with a wait, to keep printing
out results for us.
Simba Code:
program ReadSomeDangPrayer;
{$DEFINE SMART}
{$I SRL-6/srl.simba}

function readPrayer: string;
const
  filter: TTesseractFilter = [3, 3, [False, 30, TM_Mean]];
var
  searchArea: TBox := intToBox(318, 311, 364, 323);
begin
  result := tesseractGetText(searchArea, filter);
end;

begin
  setupSRL();

  repeat
    writeLn(readPrayer);
    wait(1000);
  until false;
end.
Now, if all goes well, you should be getting something like this output in your Simba console:
Which would be great, soooo we can move on to the next step, an oh-so-sensual step ...
Step 5: Massage
Don't let the name or others fool you. This step is strictly for business only.
Now that we have that straightened out, we can move on to turning this string into smaller, more digestible pieces.
First off, we're going to want to expand our var section a bit: let's go ahead and add a TStringArray named arr. You know. Like a pirate.
We're going to be storing the stringed numbers that represent our current and maximum prayer points, respectively, in this array with a
handy-dandy function called explodeWrap(). We'll also create a local string variable for us to mangle and contort in any way
we please before handing it over as a result. Here's what our function looks like now ...
Simba Code:
function readPrayer: string;
const
  filter: TTesseractFilter = [3, 3, [False, 30, TM_Mean]];
var
  searchArea: TBox := intToBox(318, 311, 364, 323);
  ourString: string;
  arr: TStringArray;
begin
  ourString := tesseractGetText(searchArea, filter);
  result := ourString;

  // split our string at the slash into our array in order
  explodeWrap('/', ourString, arr);

  // a misread without a '/' leaves fewer than two pieces, so stop before indexing
  if (length(arr) < 2) then
    exit;

  // left side of our slash is current prayer
  // therefore arr[0] is our current prayer
  writeLn('Current Prayer Points: ' + arr[0]);

  // right side of our slash is maximum prayer
  // therefore arr[1] is our maximum prayer
  writeLn('Maximum Prayer Points: ' + arr[1]);
end;
... and with that, when we run the script ...
#dataisbeautiful
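One thing worth keeping in mind about splitting at the slash: it only gives you two sensible pieces when the OCR result really looks like 'number/number'. Here is a minimal Python sketch of that check, outside Simba; split_reading() is a hypothetical helper and the sample strings are made up.

```python
def split_reading(text, separator="/"):
    """Split 'current/maximum' into its two parts, or return None if malformed."""
    parts = [part.strip() for part in text.split(separator)]
    # exactly two pieces, and each piece must be plain ASCII digits
    if len(parts) != 2:
        return None
    if not all(part.isascii() and part.isdigit() for part in parts):
        return None
    return parts[0], parts[1]


if __name__ == "__main__":
    print(split_reading("43/52"))   # a clean reading
    print(split_reading("43I52"))   # '/' misread as a capital I
    print(split_reading("43/"))     # right-hand side missing
```

Returning None for anything unexpected lets the caller skip a bad frame and simply try again on the next read.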
Very nice! However, this doesn't seem very useful yet... really at all, so let's see if we can do something about that. Let's add
a boolean parameter to our function that picks whether it returns the current or the maximum point value.
I'll have it default to true (current) for me, but you can do whatever you wish. It's your script. You should know this.
BONUS TIP: Sometimes you may end up with characters you don't want, like spaces, commas, or other stray marks that shouldn't be where they are.
Aside from further honing your filter, you can also use something like replace() to get rid of those insubordinates.
With that said, make sure your filter is as good as you can get it before resorting to this. An example to remove spaces would be:
ourString := replace(ourString, ' ', '', [rfReplaceAll]); There's also an rfIgnoreCase flag you could add to that set for other uses!
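The same clean-up idea, sketched in Python rather than Simba: instead of replacing unwanted characters one at a time, keep only the characters that can legitimately appear in a 'current/maximum' reading. clean_reading() and the allowed set are assumptions for illustration.

```python
def clean_reading(text, allowed="0123456789/"):
    """Drop every character that cannot be part of a 'current/maximum' reading."""
    return "".join(ch for ch in text if ch in allowed)


if __name__ == "__main__":
    print(clean_reading(" 4 3/5,2 "))  # spaces and a stray comma removed
    print(clean_reading("43/52."))     # trailing speck removed
```

A whitelist like this is blunt: it will happily hide a genuinely bad read (a '5' that came out as 'S' just vanishes), which is one more reason to get the filter right first.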
Here's what my final program looks like:
Simba Code:
program ReadSomeDangPrayer;
{$DEFINE SMART}
{$I SRL-6/srl.simba}

// our text finding function
function readPrayer(current: boolean = true): string;
const
  filter: TTesseractFilter = [3, 3, [False, 30, TM_Mean]];
var
  searchArea: TBox := intToBox(318, 311, 364, 323);
  ourString: string;
  arr: TStringArray;
begin
  result := '';
  ourString := tesseractGetText(searchArea, filter);

  // split our string at the slash into our array in order
  explodeWrap('/', ourString, arr);

  // a misread without a '/' leaves fewer than two pieces; return '' instead of indexing past the array
  if (length(arr) < 2) then
    exit;

  if (current) then
  begin
    writeLn('Current Prayer Points: ' + arr[0]);
    result := arr[0];
  end
  else
  begin
    writeLn('Maximum Prayer Points: ' + arr[1]);
    result := arr[1];
  end;
end;

begin
  setupSRL();

  repeat
    writeLn(readPrayer);
    wait(1000);
  until false;
end.
And here's what it might return:
Current
Maximum
Now THAT'S some usable data right there, I tell ye!
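To go one step further from strings to numbers you can actually compare, here is a small Python sketch that takes the two strings (current and maximum) and works out the percentage of prayer remaining. to_points() and prayer_percent() are hypothetical helpers, not part of SRL.

```python
def to_points(text):
    """Convert one OCR'd number to an int, or None if it is not purely digits."""
    stripped = text.strip()
    if not (stripped.isascii() and stripped.isdigit()):
        return None
    return int(stripped)


def prayer_percent(current_text, maximum_text):
    """Percentage of prayer points remaining, or None if either value is unusable."""
    current = to_points(current_text)
    maximum = to_points(maximum_text)
    if current is None or maximum is None or maximum == 0:
        return None
    return 100.0 * current / maximum


if __name__ == "__main__":
    print(prayer_percent("26", "52"))  # half of the points left
    print(prayer_percent("2B", "52"))  # misread digit, so no answer
```

Getting None back for a misread is deliberate: a script deciding when to recharge should skip a bad reading rather than act on a wrong number.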
Conclusion
Now, this tutorial was kept pretty simple on purpose, but the main takeaway should be that if you
can predict where text will show up on screen, and you wish to read it, Tesseract is a very viable solution that
doesn't take a whole lot of effort. This post might be several real feet long, but it probably translates into
about a 5 minute process for simple applications like this. The real time-and-complexity-suck comes from the last
step, massaging whatever data you get into the data you want.
Good luck, and thanks for reading!
- Lama