Regular Expressions. What are they? A regular expression is a way to do pattern matching, and most programming languages support them. They are usually hard to learn well enough to use efficiently, but they are totally worth it and far more flexible than Pos or PosEx.
Definitions:
Pos:
Finds the first occurrence of a string literal within another string. For example, if we have "This is a string" and we want to find "is", we'd do:
Simba Code:
Pos('is', 'This is a string');
The result will be the position of the first occurrence of "is" in the string. Here that is 3, because the first "is" is the one inside "This" and string positions are counted from 1.
PosEx:
Does the exact same thing, except that you get to specify an offset to start searching from. Let's say:
Simba Code:
PosEx('is', OurStringHere, 10);
It will look for the first occurrence of "is" starting at character 10 of our string.
Simba Code:
PosEx('is', OurStringHere, SomePos + 10);
This will look for the first occurrence of "is" starting at character SomePos + 10 of our string.
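If you want to play with the same searches outside Simba, here is a rough Python sketch. The helper names pos and pos_ex are made up for this illustration and are not Simba functions. They wrap Python's str.find and convert to 1-based positions, returning 0 when nothing is found, which is how I'm assuming Pos and PosEx behave.

```python
def pos(needle: str, haystack: str) -> int:
    """1-based index of the first occurrence of needle, or 0 if absent."""
    return haystack.find(needle) + 1


def pos_ex(needle: str, haystack: str, offset: int) -> int:
    """Like pos, but the search starts at the 1-based position `offset`."""
    if offset < 1:
        return 0
    return haystack.find(needle, offset - 1) + 1


text = 'This is a string'
first = pos('is', text)                 # the "is" inside "This"
second = pos_ex('is', text, first + 1)  # the standalone word "is"
print(first, second)
```

Note that both helpers can only look for one fixed piece of text, which is exactly the limitation the rest of this tutorial is about.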
Regular Expressions:
Matches a specific pattern or sequence of strings/literals within another string.
Simba Code:
ExecRegExpr('^\[\([0-9]*,(\s*|^\s)[0-9]*\)*', TPA) and ExecRegExpr('\)\]$', TPA)
Will match a PointArray containing one or more points.
Differences:
So what can a regular expression do that Pos and PosEx cannot? Pos and PosEx only find one fixed piece of text.
A regular expression can match HTML tags, or a TPoint whose values you don't know in advance. It can match pretty much any pattern or sequence of strings you can think of.
An example would be a function I wrote called StringToTPointArray. It uses the following Regexes:
Simba Code:
if (ExecRegExpr('^\[\([0-9]*,(\s*|^\s)[0-9]*\)*', TPA) and ExecRegExpr('\)\]$', TPA)) then
begin
if (Not ExecRegExpr('~|!|@|#|\$|%|\^|&|\*|_|\+|=|{|}|"|''|:|;|\.|<|>|\?|/|[a-z]|[A-Z]|(\[-|-\(|\)-|\),-|-\])|\\|\.|(\((\s*|^\s)\()|(,(\s*|^\s),)|(\)(\s*|^\s)\))', TPA)) then
Now that looks like gibberish and a bunch of crap and yes, it's even a bit scary to think that someone wrote that just to turn a string into a point or an array of points.
The check above will match any point you can think of. It is a strict regex with rules, so it is hard to trick. An example TPA would look like:
Simba Code:
StringToTPA('[(10, 10), (9, 5), (6, 87), (300, 190), (4, 270), (934, 5345)]');
In the above, you can enter any values you wish and it will tell you whether the syntax matches that of a TPointArray or not. You can change any of those numbers or remove points, and it will still be found.
Now let's say you added some random crap. It will know that you added random crap, it will not find your TPA, and it will not convert your string into points, because you are deliberately trying to trick it =].
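To make the idea concrete, here is a small sketch in Python. This is not my StringToTPointArray and it is a lot less strict. The names POINT, TPA and string_to_tpa are made up for the example, and it assumes points are written as non-negative integers. It validates the whole string first and only then pulls the numbers out.

```python
import re

# One point: "(x, y)" with optional whitespace around the numbers.
POINT = r'\(\s*(\d+)\s*,\s*(\d+)\s*\)'
# A whole array: "[" point ("," point)* "]".
TPA = re.compile(r'\[\s*' + POINT + r'(?:\s*,\s*' + POINT + r')*\s*\]')


def string_to_tpa(text: str):
    """Return a list of (x, y) tuples, or None if the syntax is wrong."""
    # fullmatch only succeeds if the pattern covers the entire string.
    if TPA.fullmatch(text) is None:
        return None
    return [(int(x), int(y)) for x, y in re.findall(POINT, text)]


print(string_to_tpa('[(10, 10), (9, 5), (6, 87)]'))
print(string_to_tpa('[(10, 10), (9, crap)]'))
```

The first call gives back the points and the second gives None, because the random crap breaks the pattern.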
Symbols and Meanings:
From O'Reilly:
Symbol   Meaning
c        Match the literal character c once, unless it is one of the special characters.
^        Match the beginning of a line.
.        Match any character that isn't a newline.
$        Match the end of a line.
|        Logical OR between expressions.
()       Group subexpressions.
[]       Define a character class.
*        Match the preceding expression zero or more times.
+        Match the preceding expression one or more times.
?        Match the preceding expression zero or one time.
{n}      Match the preceding expression n times.
{n,}     Match the preceding expression at least n times.
{n,m}    Match the preceding expression at least n times and at most m times.
\d       Match a digit.
\D       Match a character that is not a digit.
\w       Match a word character (a letter, a digit, or the underscore).
\W       Match a character that is not a word character.
\s       Match a whitespace character (any of \t, \n, \r, or \f).
\S       Match a non-whitespace character.
\t       Tab.
\n       Newline.
\r       Carriage return.
\f       Form feed.
\m       Escape m, where m is one of the metacharacters described above: ^, ., $, |, (), [], *, +, ?, \, or /.
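A quick way to get a feel for the table is to run a few tiny patterns against strings that should and should not match. The sketch below uses Python's re module, so small details may differ from Simba's regex engine. The sample strings are made up.

```python
import re

samples = [
    (r'^\d{3}$',       '123',      True),   # exactly three digits
    (r'^\d{3}$',       '1234',     False),  # {3} means three, not more
    (r'^\d{2,}$',      '1234',     True),   # {2,} means two or more
    (r'colou?r',       'color',    True),   # ? makes the "u" optional
    (r'^(cat|dog)s?$', 'dogs',     True),   # | picks one alternative
    (r'^\w+$',         'player_1', True),   # \w covers letters, digits, _
    (r'^\w+$',         'player 1', False),  # the space is not a \w char
    (r'^a\.b$',        'a.b',      True),   # \. is a literal dot
    (r'^a\.b$',        'axb',      False),  # an escaped dot is not a wildcard
]

for pattern, text, expected in samples:
    found = re.search(pattern, text) is not None
    print(f'{pattern!r:18} {text!r:12} {found}')
    assert found == expected
```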
Examples:
Now that you have seen what each of the above characters does, how will this be useful to you? Well, in Runescape or anywhere else, you may want to match a series of strings to find out if someone talked to you, if a command was sent to you, or to pick out a player name, etc.
Pos and PosEx only find exact literals, so this gets painful with them as soon as the text can vary. A regular expression is the right tool.
Let's say the OCR in Simba is a bit off and you want to figure out if an option is on screen, but the text may not be exact. Well, you'd do the following:
Example string we want to find:
Simba Code:
'PlayerName' or 'Player Name' or 'Player  Name'
With PosEx or Pos, we'd either have to use a for-loop, know exactly how many spaces are in between Player and Name, or subtract two positions and copy the result.
With a RegEx, we'd do:
Simba Code:
ExecRegExpr('Player(\s)*Name', OurString);
See how simple that is? It will look for Player, then 0 or more whitespace characters, then Name. If it finds that, it returns true, which lets you know that it's found.
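Here is the same pattern tried against a few strings in Python, just to show what it accepts and rejects. The test strings are made up, and Python's re stands in for ExecRegExpr here.

```python
import re

pattern = re.compile(r'Player\s*Name')

for text in ['PlayerName', 'Player Name', 'Player    Name', 'Player-Name']:
    # search returns a match object if the pattern occurs anywhere in text.
    print(repr(text), bool(pattern.search(text)))
```

Only the last string fails, because a dash is not whitespace.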
Using Pos/PosEx:
Simba Code:
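Pos1 := PosEx('Player', OurString, 1); //Start at the first character (positions count from 1).
Pos2 := PosEx('Name', OurString, Pos1 + Length('Player')); //Start searching right after 'Player'.
//Found only if both parts exist and nothing but whitespace sits between them.
Result := (Pos1 > 0) and (Pos2 > 0) and (Trim(Copy(OurString, Pos1 + Length('Player'), Pos2 - Pos1 - Length('Player'))) = '');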
Now if the string was more sophisticated, such as testing whether a string of points really matches the syntax of a TPA, this would be very hard to do with Pos and PosEx. Even something as simple as an HTML tag is awkward with Pos or PosEx.
Example:
'<td(.*?)</td>' will match a table cell (td) tag with any attributes, classes and IDs, along with its contents.
To do this with Pos or PosEx, you would need to know every position of every occurrence of a character, sequence or space. That totally defeats the purpose of Pos and PosEx, which are supposed to do this for you!
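The ? after .* is what keeps the match from running too far. The Python sketch below shows the difference on a made-up row of HTML. Note that I changed the pattern slightly for the example: [^>]* skips the attributes so that the group captures only the cell contents.

```python
import re

html = '<tr><td class="a">first</td><td id="b">second</td></tr>'

# Lazy: (.*?) stops at the nearest </td>, so each cell is matched separately.
lazy = re.findall(r'<td[^>]*>(.*?)</td>', html)

# Greedy: (.*) runs on to the last </td> and swallows both cells at once.
greedy = re.findall(r'<td[^>]*>(.*)</td>', html)

print(lazy)
print(greedy)
```

The lazy version returns the two cells one by one, while the greedy version returns a single match that spans both of them.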
Another one is to find a comment in a file. Let's say you want to read a file but you don't want to read in the comments too. What do you do? Well, you write a regex for a comment pattern such as:
Comments using the hash sign(#):
Simba Code:
'((^|(\s*))|(^(\s*)))#(.*)$'
Comments using the double slash (//):
Simba Code:
'((^|(\s*))|(^(\s*)))//(.*)$'
Pattern for matching letters and numbers ONLY, any number of times:
Simba Code:
'^([A-Za-z0-9])*$' //Will match it ZERO or MORE times.
In the above, it will start at the beginning of a line (symbol: ^) and match the pattern until it reaches the end of that line (symbol: $). Because of the *, an empty line matches too.
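The anchors are what turn "contains some letters or digits" into "contains nothing else". Here is a small Python sketch of that. The variable names and test strings are made up, and the + variant is my own addition to show how to reject empty input.

```python
import re

unanchored = re.compile(r'[A-Za-z0-9]*')   # can match an empty stretch anywhere
anchored = re.compile(r'^[A-Za-z0-9]*$')   # the whole line, zero or more
at_least_one = re.compile(r'^[A-Za-z0-9]+$')  # the whole line, one or more

for text in ['Player1', 'Player 1', '']:
    print(repr(text),
          bool(unanchored.search(text)),
          bool(anchored.search(text)),
          bool(at_least_one.search(text)))
```

Without the anchors every string matches, even 'Player 1', because zero characters is a valid match for *.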
Checks like these are impractical with Pos or PosEx alone.
Now one more thing: you can even REPLACE a found pattern with another string!
Simba Code:
InputString := ReplaceRegExpr('((^|(\s*))|(^(\s*)))//(.*)$', InputString, '7', true);
The above will replace any double slash comment (//...) in our input string with a 7 and give back the new string.
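For comparison, here is a sketch of stripping comments out of a multi-line string with Python's re.sub. The sample source text is made up and the pattern is a simplified one, not the one above. It also knows nothing about string literals, so a // inside quotes (a URL, say) would get cut as well.

```python
import re

source = '''x := 1; // set x
// a full-line comment
y := 2;'''

# re.MULTILINE lets $ match at the end of every line, not just the last one.
# [ \t]* eats the spaces before the comment but leaves the line breaks alone.
comment = re.compile(r'[ \t]*//.*$', re.MULTILINE)

stripped = comment.sub('', source)
print(stripped)
```

The replacement string is empty here, so the comments simply disappear. Pass any other string to sub to swap them for something else.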
For more crazy string handling commands, visit Janilabo's thread here: http://villavu.com/forum/showthread.php?t=82205
There you can also view my TPA pattern and probably add to or remove from it. Remember, regexes aren't limited and can be shortened or lengthened. It can be the difference between finding a 2 vs. a 22 within a string.
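To show what I mean by 2 vs. 22, here is one last Python sketch on a made-up string. The \b (word boundary) is not in the table above. It works in Python's re, and I'm only assuming your regex engine has something similar.

```python
import re

text = 'level 22, slot 2, id 220'

loose = re.findall(r'2', text)      # every single "2" character
runs = re.findall(r'\d+', text)     # whole runs of digits
exact = re.findall(r'\b2\b', text)  # a "2" standing on its own

print(loose)
print(runs)
print(exact)
```

The loose pattern finds five 2's, the second finds the three numbers, and only the last one finds the lone 2.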