Thread: Regular Expressions

  1. #1
    Join Date
    Feb 2011
    Location
    The Future.
    Posts
    5,600
    Mentioned
    396 Post(s)
    Quoted
    1598 Post(s)

    Regular Expressions

    Regular Expressions. What are they? A regular expression is a means of pattern matching that is available in most programming languages. They are usually hard to learn if you want to use them efficiently, but they are totally worth it and more powerful than Pos or PosEx.

    Definitions:

    Pos:
    Finds the first occurrence of a string literal within another string. For example, if we have "This is a string" and we wanted to find "is", we'd do:
    Simba Code:
    Pos('is', 'This is a string');

    The result will be the 1-based position of the first occurrence of "is" in the string. Here that is 3, because Pos matches substrings and "This" already contains "is".

    PosEx:
    Does the exact same thing, except that you get to specify an offset. Let's say:
    Simba Code:
    PosEx('is', OurStringHere, 10);
    It will look for the first occurrence of "is", starting the search at character 10 of our string.

    Simba Code:
    PosEx('is', OurStringHere, SomePos + 10);
    Will look for the first occurrence of "is", starting the search at character SomePos + 10 of our string.
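
    To make the offset behaviour concrete, here is a minimal sketch in Python (not Simba). The helpers pos and pos_ex are hypothetical stand-ins that imitate 1-based Pos/PosEx on top of Python's str.find. Returning 0 for an offset below 1 is an assumption modelled on how Free Pascal's PosEx behaves.

```python
def pos(needle: str, haystack: str) -> int:
    """1-based index of the first occurrence of needle, or 0 if absent."""
    return haystack.find(needle) + 1


def pos_ex(needle: str, haystack: str, offset: int) -> int:
    """Like pos, but the search starts at the 1-based position `offset`."""
    if offset < 1:  # assumption: an invalid offset means "not found"
        return 0
    return haystack.find(needle, offset - 1) + 1


text = "This is a string"
first = pos("is", text)                 # the "is" inside "This"
second = pos_ex("is", text, first + 1)  # the stand-alone word "is"
print(first, second)
```

    Searching again from first + 1 is the usual way to walk through every occurrence one by one.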


    Regular Expressions:
    Matches a specific pattern or sequence of strings/literals within another string.
    Simba Code:
    ExecRegExpr('^\[\([0-9]*,(\s*|^\s)[0-9]*\)*', TPA) and ExecRegExpr('\)\]$', TPA)
    Checks that the string starts and ends like a PointArray containing one or more points.



    Differences:
    So what is the difference between Pos/PosEx and a regular expression, which matches patterns rather than fixed text?

    A regular expression can match HTML tags, or a TPoint whose values are not known in advance. It can match almost any pattern or sequence of characters you can think of.

    An example would be a function I wrote called StringToTPointArray. It uses the following Regexes:
    Simba Code:
    if (ExecRegExpr('^\[\([0-9]*,(\s*|^\s)[0-9]*\)*', TPA) and ExecRegExpr('\)\]$', TPA)) then
      begin
        if (Not  ExecRegExpr('~|!|@|#|\$|%|\^|&|\*|_|\+|=|{|}|"|''|:|;|\.|<|>|\?|/|[a-z]|[A-Z]|(\[-|-\(|\)-|\),-|-\])|\\|\.|(\((\s*|^\s)\()|(,(\s*|^\s),)|(\)(\s*|^\s)\))',  TPA)) then

    Now that looks like gibberish and a bunch of crap, and yes, it's even a bit scary to think that someone wrote that just to turn a string into a point or an array of points.

    The above will match any point you can think of. It is hard to trick because it's a strict regex with explicit rules. An example TPA would look like:
    Simba Code:
    StringToTPA('[(10, 10), (9, 5), (6, 87), (300, 190), (4, 270), (934, 5345)]');

    In the above, you can enter any values you wish and it will tell you whether the syntax matches that of a TPointArray or not. You can change any of those numbers or remove points, and it will still match.
    Now let's say you added some random crap. It will notice, it will not find your TPA, and it will not convert your string into points, because you are deliberately trying to trick it =].
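
    The same validate-then-convert idea can be sketched in Python (not Simba). This is not the StringToTPointArray code from this post. The name string_to_points and the grammar it accepts (square brackets around comma-separated (x, y) pairs of non-negative integers) are assumptions made for illustration.

```python
import re

# Assumed grammar: "[(x, y), (x, y), ...]" with non-negative integers.
POINT = r"\(\s*\d+\s*,\s*\d+\s*\)"
TPA_PATTERN = re.compile(rf"\[\s*{POINT}(?:\s*,\s*{POINT})*\s*\]")


def string_to_points(text: str) -> list[tuple[int, int]]:
    """Return the points in text, or an empty list if the syntax is wrong."""
    if TPA_PATTERN.fullmatch(text) is None:
        return []
    pairs = re.findall(r"\(\s*(\d+)\s*,\s*(\d+)\s*\)", text)
    return [(int(x), int(y)) for x, y in pairs]


points = string_to_points("[(10, 10), (9, 5), (6, 87)]")
rejected = string_to_points("[(10, 10), junk, (9, 5)]")
print(points, rejected)
```

    Describing what a valid string looks like (a whitelist) tends to be shorter than listing every character that must not appear.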

    Symbols and Meanings:
    Progress Report:
    From O'Reilly:
    
    Symbol     Meaning
    c          Match the literal character c once, unless it is one of the special characters.
    ^          Match the beginning of a line.
    .          Match any character that isn't a newline.
    $          Match the end of a line.
    |          Logical OR between expressions.
    ()         Group subexpressions.
    []         Define a character class.
    *          Match the preceding expression zero or more times.
    +          Match the preceding expression one or more times.
    ?          Match the preceding expression zero or one time.
    {n}        Match the preceding expression n times.
    {n,}       Match the preceding expression at least n times.
    {n,m}      Match the preceding expression at least n times and at most m times.
    \d         Match a digit.
    \D         Match a character that is not a digit.
    \w         Match a word character: a letter, a digit, or the underscore.
    \W         Match a character that is not a word character.
    \s         Match a whitespace character (a space, \t, \n, \r, or \f).
    \S         Match a non-whitespace character.
    \t         Tab.
    \n         Newline.
    \r         Carriage return.
    \f         Form feed.
    \m         Escape m, where m is one of the metacharacters described above: ^, ., $, |, (), [], *, +, ?, \, or /.
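
    A few of the symbols above, tried out in a small Python sketch (not Simba). Python's re module reads these particular patterns the same way, but the sample strings are made up for illustration.

```python
import re

# Each pattern is paired with a sample string it should match.
samples = {
    r"^\d+$": "2012",          # digits only, anchored at both ends
    r"colou?r": "color",       # ? makes the u optional
    r"a{2,3}": "caaat",        # between two and three a's in a row
    r"cat|dog": "hotdog",      # alternation
    r"\$\d+\.\d{2}": "$9.99",  # escaped metacharacters $ and .
}

results = {pattern: bool(re.search(pattern, text)) for pattern, text in samples.items()}
for pattern, matched in results.items():
    print(pattern, matched)
```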



    Examples:

    Now that you have seen what each of the above characters does, how will this be useful to you? Well, in Runescape or anywhere else, you may want to match a series of strings to find out whether someone talked to you, whether a command was sent to you, or even to pick out a player name, etc.

    This is hard or impossible to do with Pos or PosEx alone. A regular expression is the right tool.

    Let's say the OCR in Simba is a bit off and you want to figure out whether an option is on screen, but the text may not be exact. Well, you'd do the following:

    Example string we want to find:
    Simba Code:
    'PlayerName'  or 'Player Name' or 'Player       Name'

    With PosEx or Pos, we'd either have to use a for-loop, know exactly how many spaces are in between Player and Name, or subtract two positions and copy the result.

    With a RegEx, we'd do:
    Simba Code:
    ExecRegExpr('Player(\s)*Name', OurString);

    See how simple that is? It will look for Player, followed by zero or more whitespace characters, followed by Name. If it finds that, it returns true, which lets you know the text was found.
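
    The same pattern, checked against a few candidate strings in a Python sketch (not Simba). The last candidate is an extra, made-up negative case.

```python
import re

pattern = re.compile(r"Player\s*Name")
candidates = ["PlayerName", "Player Name", "Player       Name", "Player_Name"]

# search() succeeds if the pattern occurs anywhere in the string.
found = [bool(pattern.search(text)) for text in candidates]
print(found)
```

    Only the underscore version fails, because \s accepts whitespace and nothing else.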

    Using Pos/PosEx:
    Simba Code:
    Pos1:= PosEx('Player', OurString, 1);    //Start at the first character (Pascal strings are 1-based).
    Pos2:= PosEx('Name', OurString, Pos1);   //Start searching at Pos1.

    Result:= (Length(Copy(OurString, Pos1, Pos2 - Pos1)) > 0);   //Copy from Pos1 up to Pos2 and test the length. This only shows that 'Name' comes after 'Player', not that only spaces separate them.

    Now if the string were more sophisticated, such as testing whether a string of points really matches the syntax of a TPA, this would be very hard to do with Pos and PosEx. Even something as simple as an HTML tag is painful to handle with Pos or PosEx.

    Example:
    '<td(.*?)</td>' will match a table cell (td) tag with any attributes, classes and IDs, together with its contents.

    To do this with Pos or PosEx would require that you know every position of every occurrence of a character, sequence or space. That totally defeats the purpose of Pos and PosEx, which are supposed to do this for you!
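
    The ? after .* makes the match lazy, so it stops at the first closing tag instead of the last one. A Python sketch (not Simba) of the difference, using a made-up HTML snippet:

```python
import re

# Hypothetical HTML row with two cells.
html = '<tr><td class="name">Brandon</td><td id="posts">5,600</td></tr>'

lazy = re.findall(r"<td(.*?)</td>", html)   # stops at the first </td>
greedy = re.findall(r"<td(.*)</td>", html)  # runs on to the last </td>
print(lazy)
print(greedy)
```

    The lazy version returns one entry per cell. The greedy version swallows both cells in a single match.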

    Another one is to find a comment in a file. Let's say you want to read a file but you don't want to read in the comments too. What do you do? Well, you write a regex for a comment pattern such as:

    Comments using the hash sign(#):
    Simba Code:
    '((^|(\s*))|(^(\s*)))#(.*)$'

    Comments using the double slash (//):
    Simba Code:
    '((^|(\s*))|(^(\s*)))//(.*)$'
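
    A Python sketch (not Simba) of separating code from a trailing hash comment. It uses a simplified pattern rather than the exact ones above, and split_comment is a hypothetical helper name. It does not understand string literals, so a # inside quotes would be treated as a comment too.

```python
import re

# Optional whitespace, a hash sign, then everything up to the end of the line.
COMMENT = re.compile(r"\s*#(.*)$")


def split_comment(line: str) -> tuple[str, str]:
    """Return (code, comment_text); comment_text is '' when there is none."""
    match = COMMENT.search(line)
    if match is None:
        return line, ""
    return line[:match.start()], match.group(1).strip()


print(split_comment("x = 5   # set x"))
print(split_comment("# whole-line comment"))
print(split_comment("no comment here"))
```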

    Pattern for matching exactly one letter or number ONLY!:
    Simba Code:
    '^[A-Za-z0-9]$'

    Pattern for matching the above multiple times:
    Simba Code:
    '^([A-Za-z0-9])*$'  //Will match it ZERO or MORE times.

    The pattern above starts at the beginning of a line (Symbol: ^) and must keep matching until it reaches the end of that line (Symbol: $).
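
    A quick Python check (not Simba) of how the two anchored patterns differ. The test strings are made up.

```python
import re

single = re.compile(r"^[A-Za-z0-9]$")  # exactly one letter or digit
many = re.compile(r"^[A-Za-z0-9]*$")   # zero or more letters or digits

single_results = [bool(single.search(s)) for s in ("a", "ab")]
many_results = [bool(many.search(s)) for s in ("abc123", "", "abc 123")]
print(single_results, many_results)
```

    Because of the *, the second pattern also accepts an empty string. Use + instead if at least one character is required.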

    Some of these cannot be done with Pos or PosEx.

    Now one more thing, you can even REPLACE a found pattern with another string!

    Simba Code:
    InputString := ReplaceRegExpr('((^|(\s*))|(^(\s*)))//(.*)$', InputString, '7', true);   //The replacement must be a string, and the new string is returned.

    The above will replace any double slash comment (//...) it matches in our input string with a 7.
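
    Here is the replace idea as a Python sketch (not Simba), using re.sub and a simplified comment pattern. The sample source text is made up. The MULTILINE flag is what lets $ match at the end of every line rather than only at the end of the whole string.

```python
import re

source = "x := 5; // set x\ny := 7;\n// whole line\n"

# Replace each comment with a 7, as in the example above.
marked = re.sub(r"//.*$", "7", source, flags=re.MULTILINE)

# More usefully, delete each comment along with the blanks before it.
stripped = re.sub(r"[ \t]*//.*$", "", source, flags=re.MULTILINE)

print(marked)
print(stripped)
```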

    For more crazy string handling commands, visit Janilabo's thread here: http://villavu.com/forum/showthread.php?t=82205

    There you can also view my TPA pattern and probably add to or remove from it. Remember, regexes aren't limited and can be shortened or lengthened. It can be the difference between finding a 2 vs. 22 within a string.
    Last edited by Brandon; 06-17-2012 at 05:08 PM.
    I am Ggzz..
    Hackintosher

  2. #2
    Join Date
    Feb 2006
    Location
    Helsinki, Finland
    Posts
    1,395
    Mentioned
    30 Post(s)
    Quoted
    107 Post(s)

    Default

    Very nice guide on Regular Expressions - highly useful!
    Cheers Brandon.

    Hopefully you'll keep adding more and more regex stuff for this guide (I see you have already been updating a bit)

    -Jani

  3. #3

    Quote Originally Posted by Janilabo View Post
    Very nice guide on Regular Expressions - highly useful!
    Cheers Brandon.

    Hopefully you'll keep adding more and more regex stuff for this guide (I see you have already been updating a bit)

    -Jani
    Definitely. Just need some time to write more and explain more. I prefer to write it and then break it down piece by piece, as it'll get difficult if I just throw it out there. In time I'll upload the rest. Hope to see some scripters using Regular Expressions soon.

  4. #4
    Join Date
    Jan 2012
    Posts
    2,568
    Mentioned
    35 Post(s)
    Quoted
    356 Post(s)

    Default

    Great tutorial! A few things I don't understand:
    Quote Originally Posted by Brandon View Post
    Using Pos/PosEx:
    Simba Code:
    Pos1:= PosEx('Player', OurString, 0);    //Offset of 0.
    Pos2:= PosEx('Name', OurString, Pos1);   //Start searching After Position 1.

    Result:= (Length(Copy(OurString, Pos1, Pos2 - Pos1)) > 0);   //Copy  from position 1 to 2 and test for the length of the string.
    Why do we have to test for the length of string to confirm the string match? Wouldn't (Pos1 > 0) and (Pos2 > 0) be sufficient?

    Quote Originally Posted by Brandon View Post
    Simba Code:
    '((^|(\s*))|(^(\s*)))#(.*)$'
    Doesn't ^ match the start of string/line? (or exclude chars when used as [^ ])
    So if it's followed by the metachar |, what does it mean?

  5. #5

    Quote Originally Posted by riwu View Post
    Great tutorial! A few things i dont understand:

    Why do we have to test for the length of string to confirm the string match? Wouldn't (Pos1 > 0) and (Pos2 > 0) be sufficient?
    Because that regex is a very loose regex. A regular expression can match partially or fully depending on how you write it. Testing for the length was (at the time) a good way to check for a full match only: not a part, not more, but exact.

    Doesn't ^ match the start of string/line? (or exclusion of chars used as [^ ])
    So if it's followed by the metachar | what it means?
    It does match the beginning of a line. However, that regex is different. It says: "Match the beginning of a line OR zero or more white space characters".. that's because comments don't have to be at the beginning of a line. A comment can be anywhere in-between as well..

    A better solution would be to check for // or /* or {* or (* or whatever is specific to your comment type, but I chose not to, to show the different combinations you can use regexes for.

    Simba Code:
    procedure meh; //do something comment here.. this is not at the beginning of the line.

    You are right that [^...] means "match anything except", but that's only when ^ is used inside class brackets, not capture brackets.
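
    The two roles of ^ side by side, as a Python sketch (not Simba) with made-up sample text:

```python
import re

text = "42 apples, 7 pears"

# Outside brackets, ^ anchors the match to the start of the string.
anchored = re.findall(r"^\d+", text)

# As the first character inside brackets, ^ negates the class:
# here it means "anything except digits, whitespace and commas".
negated = re.findall(r"[^\d\s,]+", text)

print(anchored, negated)
```

    The anchored search finds only the leading number. The negated class picks out the words.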


    Btw, this tutorial is completely outdated for lape and should only be used for simple things. If you want full regex abuse (with grouping and all the good stuff), use TRegex.

  6. #6
    Join Date
    Dec 2007
    Posts
    289
    Mentioned
    4 Post(s)
    Quoted
    86 Post(s)

    Default

    Personally I've always used a bit of trial and error when it comes to regular expressions (in fact, probably programming in general...).

    I'd recommend https://jex.im/regulex/ (or any of the other top results on Google when searching for regex visualisers) to help you understand exactly what the regex is doing.

    --

    Any clue as to why this tutorial is in forum guides and not the programming help/tutorial sub-forum?

  7. #7
    Join Date
    Feb 2006
    Location
    Tracy/Davis, California
    Posts
    12,631
    Mentioned
    135 Post(s)
    Quoted
    418 Post(s)

    Default

    Great guide, nice detail!

    Is there any way to return the match string?
    ex:
    Simba Code:
    ExecRegExpr('\cool', 'simba in cool')
    Something like that, to return the matched string 'cool'.

  8. #8

    Quote Originally Posted by YoHoJo View Post
    Great guide, nice detail!

    Is there any way to return the match string?
    ex:
    Simba Code:
    ExecRegExpr('\cool', 'simba in cool')
    Something like that, to return the matched string 'cool'.
    https://github.com/MerlijnWajer/Simb...PS/regex.simba

  9. #9

    Error: Unknown declaration "TRegExp" at line 3

    Simba Rev 1100

  10. #10

    Quote Originally Posted by YoHoJo View Post
    Error: Unknown declaration "TRegExp" at line 3

    Simba Rev 1100
    Lape:
    Code:
    program new;
    var
      x : TRegExpr;
    begin
      x.init();
      x.setExpression('W?alk');
      x.setInputString('Lets talk bitch!');
      writeln(x.execPos(1));
      writeln(x.getMatch(0));
      x.free();
    end.
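
    For comparison, getting the matched text back looks like this in a Python sketch (not Simba or Lape), using the 'simba in cool' example from earlier in the thread. Python's re.search returns a match object whose group(0) is the whole match and whose start() is a 0-based position.

```python
import re

match = re.search(r"cool", "simba in cool")
if match is not None:
    # group(0) is the full matched text; start() is where it begins.
    print(match.group(0), match.start())
else:
    print("no match")
```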
