Results 1 to 12 of 12

Thread: Array indexer

  1. #1
    Join Date
    Oct 2013
    Location
    East Coast USA
    Posts
    770
    Mentioned
    61 Post(s)
    Quoted
    364 Post(s)

    Default Array indexer

    Array Indexer (beta)

    A lape "object" to create an index on an array and provide basic searching.

    What is an index?

    If I sent you down to my basement to find some duct tape you will take a long time to come back. Not me... I know exactly where it is. It would take me no time at all.

    In my mind I have a good idea where the items in my basement are. And that, my friends, is an index. A little bit of information that points you in the right direction. That way you don't have to search the whole basement to know there's no duct tape down there.


    When is it worthwhile?

    You probably wouldn't want to inventory the basement unless you know we were playing this game all day. Our computer index is much the same.

    Consider these things:

    • Searches are much faster using the index
    • But it takes a fair amount of time to create
    • And we have to recreate it when the data changes

    An index is worthwhile if the data is stable long enough to have to search it enough times to make up for the indexing time. That's a mouthful.


    How fast is fast?

    Here are some numbers. The test program filled an array of records with random data then performed searches for random data against it. Traditional array searching was done, in one case just returning the first match, in the other returning all matches (i.e. searching the whole array). Index searches were performed, also with both cases of finding one or finding all.



    First off, I'll admit it. Those red boxes are because I didn't get proper numbers - I think the indexer hit a bug or limit. No one's using ten million row arrays anyway

    Of note here is how expensive the indexing is. The "Indexing cost" column is how many full array searches you would have to do before you made up for the indexing time. 276-731 times depending on the array size.

    But look at the gains, too! Even on a small array it's 3.75 times faster to find things. On the big array it gets as high as 1830 times faster.

    Also note how fast lape chunks though those arrays! I expected far worse.


    Pointers

    Yeah, sorry, we can't pull off this voodoo without some pointers.

    If you don't know what a pointer is, every variable you create is sitting in memory somewhere and the memory is numbered to keep track of it. These numbers are called addresses. If I know the address of something I can look there and check it out.

    There is a 'pointer' type. Pointers hold addresses like this. It doesn't hold the data at that address, just the address itself.

    Let's look at this in lape:
    Simba Code:
    var
      i: integer;
      p: pointer;
    begin
      i := 22;
      p := @i;
      writeln('i is a variable = ', i);
      writeln('@i is the address of i = ', @i);
      writeln('p is a pointer to something = ', p);
      writeln('p^ is the thing it points to = ', p^)
    end;
    Progress Report:
    i is a variable = 22
    @i is the address of i = "i"::0x5F58320 (22)
    p is a pointer to something = "i"::0x5F58320
    Error: Access violation at line 10

    Dang it, I always do that! At least simba didn't crash. It failed because 'p' is just a plain old pointer. It has no idea what it's pointing to. We can use what is called 'casting' to be more explicit:
    Simba Code:
    var
      i: integer;
      p: pointer;
    begin
      i := 22;
      p := @i;
      writeln('i is a variable = ', i);
      writeln('@i is the address of i = ', @i);
      writeln('p is a pointer to something = ', p);
      writeln('p^ is the thing it points to = ', integer(p^))
    end;
    Progress Report:
    i is a variable = 22
    @i is the address of i = "i"::0x5F58320 (22)
    p is a pointer to something = "i"::0x5F58320
    p^ is the thing it points to = 22

    That's better. One last thing and we'll get to the real code. Pointers don't have to be generic like above. We can make a pointer to a specific datatype and avoid the casting in the last example:
    Simba Code:
    var
      i: integer;
      p: ^integer;
    begin
      i := 22;
      p := @i;
      writeln('i is a variable = ', i);
      writeln('@i is the address of i = ', @i);
      writeln('p is a pointer to to an integer = ', p);
      writeln('p^ is the thing it points to = ', p^)
    end;


    Usage

    Function list:

    Simba Code:
    procedure TIndex.init(IarrayPtr, IfieldP: pointer; IarraySize, IentrySize, IfieldSize: integer);
    function TIndex.contains(item: Pointer): Boolean;
    function TIndex.min(): Pointer;
    function TIndex.max(): Pointer;
    function TIndex.occurrencesOf(item: Pointer): integer;
    function TIndex.findAll(item: Pointer): TpointerArray;
    function TIndex.find(item: Pointer): Pointer;

    Example usage:

    Simba Code:
    {$include indexer.simba}
    type
      TGrade = record
        name:       string;
        id:         integer;
        bestScore:  extended;
        grade:      char;
        timeSpent:  int64;
      end;

    var
      studentRecords: array of TGrade;
      finder: TGrade;
      idIndex: TIndex;
      studentPointer: ^TGrade;

    begin
      {pretend we have records}
      idIndex.init(@studentRecords[0],
          @(studentRecords[0].id),
          length(studentRecords),
          sizeof(TTest),
          sizeof(studentRecords[0].id));  

      finder.id = 1357; {let's find this guy}
      studentPointer := idIndex.find(@finder);
      writeln('The student with id 1357 is ', studentPointer^.name,
        ' and his grade is ', studentPointer^.grade);
    end.
    As you can see, the initializer looks a little messy but it's not so bad. Even a novice should be able to mimic the example.

    It needs to know:

    • The address of the first record in the array
    • The address of the first field (in the first record)... the one you want to index.
    • How big is the array (and the record and the field).


    When you search, you fill in the search field on a record of that same type and give its address to the search function.

    The find functions return either a pointer to a single record or an array of pointers. You can turn those into usable variables with the '^' notation.


    CODE

    At last, the code.

    Simba Code:
    // 7/2015 -- bonsai
    //
    {$f-}

    type
      TPointerArray = array of Pointer;
      PTreeNode =^ TTreeNode;
      TTreeNode = record
        data: Pointer;
        left: PTreeNode;
        right: PTreeNode;
      end;
      TIndex = record
        head: PTreeNode;
        offset: integer;
        fieldSize: integer;
      end;

    procedure TPointerArray.append(p: Pointer);
    begin
      setLength(self, length(self) + 1);
      self[high(self)] := p;
    end;

    function TIndex._pcompare(a, b: Pointer): integer;
    var
      i: integer;
      p1, p2: pointer;
    begin
      p1 := a + self.offset;
      p2 := b + self.offset;
      case self.fieldSize of
        1:
          begin
            if byte(p1^) = byte(p2^) then
              exit(0)
            else if byte(p1^) < byte(p2^) then
              exit( - 1)
            else
              exit(1);
          end;
        2:
          begin
            if word(p1^) = word(p2^) then
              exit(0)
            else if word(p1^) < word(p2^) then
              exit( - 1)
            else
              exit(1);
          end;
        4:
          begin
            if uint32(p1^) = uint32(p2^) then
              exit(0)
            else if uint32(p1^) < uint32(p2^) then
              exit( - 1)
            else
              exit(1);
          end;
        8:
          begin
            if uint64(p1^) = uint64(p2^) then
              exit(0)
            else if uint64(p1^) < uint64(p2^) then
              exit( - 1)
            else
              exit(1);
          end;
        10:
          begin
            if extended(p1^) = extended(p2^) then
              exit(0)
            else if extended(p1^) < extended(p2^) then
              exit( - 1)
            else
              exit(1);
          end;
      end;
    end;

    function TIndex._newNode(item: Pointer): PTreeNode;
    var
      p: PTreeNode;
    begin
      new(p);
      p^.data := item;
      p^.left := p^.right := nil;
      result := p;
    end;

    procedure TIndex._clearFrom(var node: PTreeNode);
    begin
      if (node = nil) then
        exit;
      _clearFrom(node^.left);
      _clearFrom(node^.right);
      dispose(node);
      node := nil;
    end;

    procedure TIndex._clear();
    begin
      self._clearFrom(self.head);
    end;

    function TIndex._findAllFrom(node: PTreeNode; item: Pointer): TpointerArray;
    var
      tmp: TPointerArray;
    begin
      setLength(result, 0);
      while (node <> nil) do
      begin
        case self._pcompare(node^.data, item) of
         -1: node := node^.right;
          1: node := node^.left;
          0:
            begin
              result.append(node^.data);
              node := node^.left;
            end;
        end;
      end;
    end;

    function TIndex._minFrom(node: PTreeNode): Pointer;
    var
      p: PTreeNode;
    begin
      p := node;
      while (p^.left <> nil) do
        p := p^.left;
      result := p^.data;
    end;

    function TIndex._maxFrom(node: PTreeNode): Pointer;
    var
      p: PTreeNode;
    begin
      p := node;
      while (p^.right <> nil) do
        p := p^.right;
      result := p^.data;
    end;

    function TIndex._insertFrom(var node: PTreeNode; item: Pointer): PTreeNode;
    begin
      if (node = nil) then
      begin
        result := _newNode(item);
        exit
      end;
      if (self._pcompare(item, node^.data) <= 0) then
        node^.left := self._insertFrom(node^.left, item)
      else
        node^.right := self._insertFrom(node^.right, item);
      result := node;
    end;

    procedure TIndex._insert(item: Pointer);
    begin
      self.head := self._insertFrom(self.head, item);
    end;

    function TIndex._occurrencesOfFrom(node: PTreeNode; item: Pointer): integer;
    begin
      if (node = nil) then
        exit(0);
      case self._pcompare(node^.data, item) of
         - 1: result := self._occurrencesOfFrom(node^.right, item);
        0: result := 1 + self._occurrencesOfFrom(node^.left, item);
        1: result := self._occurrencesOfFrom(node^.left, item);
      end;
    end;

    function TIndex.contains(item: Pointer): Boolean;
    begin
      result := (self.find(item) <> nil);
    end;

    function TIndex.min(): Pointer;
    begin
      result := self._minFrom(self.head);
    end;

    function TIndex.max(): Pointer;
    begin
      result := self._maxFrom(self.head);
    end;

    function TIndex.occurrencesOf(item: Pointer): integer;
    begin
      result := self._occurrencesOfFrom(self.head, item);
    end;

    function TIndex.findAll(item: Pointer): TpointerArray;
    begin
      result := self._findAllFrom(self.head, item);
    end;

    function TIndex.find(item: Pointer): Pointer;
    var
      p: PTreeNode;
      tst: integer;
    begin
      result := nil;
      p := self.head;
      while (p <> nil) do
      begin
        tst := self._pcompare(p^.data, item);
        if (tst = 0) then
        begin
          result := p^.data;
          break;
        end
        else
        begin
          if (tst > 0) then
            p := p^.left
          else
            p := p^.right;
        end;
      end;
    end;

    procedure TIndex.debugFrom(node: PTreeNode;
       printData: procedure(p: pointer) = nil);
    begin
       if (node = nil) then exit;
       
       writeln('.. Node: ', node);
       if (printData <> nil) then
          printData(node^.data);
       
       self.debugFrom(node^.left, printData);
       self.debugFrom(node^.right, printData);
    end;

    procedure TIndex.debug(printData: procedure(p: pointer) = nil);
    begin
       writeln('--------------------------------------------');
       writeln('Tree head=', head);
       self.debugFrom(self.head, printData);
    end;  
       
    procedure TIndex.init(IarrayPtr, IfieldP: pointer; IarraySize, IentrySize, IfieldSize: integer);
    var
      i: integer;
      topBound: Pointer;
    begin
      if (IfieldP < IarrayPtr) or
        ((ifieldSize <> 1) and (ifieldSize <> 2) and (ifieldSize <> 4) and (ifieldSize <> 8) and (ifieldSize <> 10)) then
      begin
        writeln('error');
        exit;
      end;
      self.offset := uint32(IfieldP) - uint32(IarrayPtr);
      self.fieldSize := IfieldSize;
     
      if (self.head <> nil) then
      begin
        self._clear();
      end
      self.head := nil;

      topBound := IArrayPtr + (IarraySize * IentrySize);
      repeat
        self._insert(IarrayPtr);
        IarrayPtr += IentrySize;
      until IarrayPtr >= topBound;
    end;

    indexer.simba

    indexerMain.simba

    indexerPerf.simba

  2. #2
    Join Date
    Apr 2015
    Location
    FireFox
    Posts
    528
    Mentioned
    10 Post(s)
    Quoted
    227 Post(s)

    Default

    Nice work, I'll begin delving into this now! Anything that can help improve my knowledge and increase my scripting skills is greatly appreciated.
    REP+

    E: Lets just hope I can understand this Bonsai language
    Scripting with ogLib

  3. #3
    Join Date
    Feb 2011
    Location
    The Future.
    Posts
    5,600
    Mentioned
    396 Post(s)
    Quoted
    1598 Post(s)

    Default

    BinaryTree = ArrayIndexer?
    I am Ggzz..
    Hackintosher

  4. #4
    Join Date
    Oct 2013
    Location
    East Coast USA
    Posts
    770
    Mentioned
    61 Post(s)
    Quoted
    364 Post(s)

    Default

    Quote Originally Posted by Brandon View Post
    BinaryTree = ArrayIndexer?
    Would you be happier with field indexer?

    Or is the word index bothering you?

  5. #5
    Join Date
    Aug 2012
    Posts
    188
    Mentioned
    9 Post(s)
    Quoted
    71 Post(s)

    Default

    Great idea and read bonsai! I love data structures! This is the implementation of a binary search tree correct? I'd love to see a quicksort next

  6. #6
    Join Date
    Oct 2013
    Location
    East Coast USA
    Posts
    770
    Mentioned
    61 Post(s)
    Quoted
    364 Post(s)

    Default

    It's a simplified version packaged up for the purpose. I might play with some other algorithms to see how they compare.

    I put out a more complete bst implementation a while back. I had to rework it for something and ended up putting this together.

    For quicksort are you looking for code? Or are you more interested in the howto and how it performs?

  7. #7
    Join Date
    Aug 2012
    Posts
    188
    Mentioned
    9 Post(s)
    Quoted
    71 Post(s)

    Default

    No I've already had fun with all the sorts I just though a quicksort would be useful for finding data in a large array! Also, what can you use this for in game? I can't think of many large static arrays at the moment

  8. #8
    Join Date
    Sep 2006
    Posts
    6,089
    Mentioned
    77 Post(s)
    Quoted
    43 Post(s)

    Default

    Very cool! I like the idea
    For pcompare you might want to take a look at CompareMem, because I'm not sure if your implementation will work for bytesize 10 (using extended floating point to compare). Also, it might be good to note that references (such as strings and dynamic arrays) cannot be used as in index this way.
    Hup Holland Hup!

  9. #9
    Join Date
    Feb 2011
    Location
    The Future.
    Posts
    5,600
    Mentioned
    396 Post(s)
    Quoted
    1598 Post(s)

    Default

    Quote Originally Posted by bonsai View Post
    It's a simplified version packaged up for the purpose. I might play with some other algorithms to see how they compare.

    I put out a more complete bst implementation a while back. I had to rework it for something and ended up putting this together.

    For quicksort are you looking for code? Or are you more interested in the howto and how it performs?

    Nothing is bothering me. I was just a bit confused as to why you called it indexer since it uses nodes. Nonetheless, good job.
    I am Ggzz..
    Hackintosher

  10. #10
    Join Date
    Oct 2013
    Location
    East Coast USA
    Posts
    770
    Mentioned
    61 Post(s)
    Quoted
    364 Post(s)

    Default

    Quote Originally Posted by Swag Bag View Post
    I just though a quicksort would be useful for finding data in a large array!
    With a sorted array something like a binary search could be used for queries. It would be interesting to compare.

    Also, what can you use this for in game? I can't think of many large static arrays at the moment
    Not a ton of obvious uses for RS scripting. I originally wrote the BST funcs because my autocoloring was pushing the boundaries of what I could do in real-time. It slow sorts big arrays and sifts through them a lot. Very inefficient design.

    I hacked around some with @Obscurity; to see if something like this was helpful in processing the data returned from the opengl plugin.

    Maybe some sort of bitmap processing would have use for such ideas.

    Quote Originally Posted by nielsie95 View Post
    For pcompare you might want to take a look at CompareMem, because I'm not sure if your implementation will work for bytesize 10 (using extended floating point to compare). Also, it might be good to note that references (such as strings and dynamic arrays) cannot be used as in index this way.
    Thanks. I was looking for memcopy/copymemory last week. I think I had it stuck in my head that I had to do it more explicitly.

    Is there some hack to do a memcopy?

    Quote Originally Posted by Brandon View Post
    Nothing is bothering me. I was just a bit confused as to why you called it indexer since it uses nodes. Nonetheless, good job.
    I think bothering was a poor word choice. I have a tendency to be loose with my terminology and sloppy with my code so please do keep me honest.

  11. #11
    Join Date
    Sep 2006
    Posts
    6,089
    Mentioned
    77 Post(s)
    Quoted
    43 Post(s)

    Default

    When you have the pointers and the size, you can just use CompareMem to compare and Move to copy data. They are both available in Lape.

    EDIT: Although I do see now that CompareMem returns a boolean, not really what you need.
    Hup Holland Hup!

  12. #12
    Join Date
    May 2012
    Location
    Moscow, Russia
    Posts
    661
    Mentioned
    35 Post(s)
    Quoted
    102 Post(s)

    Default

    Why not use something like a hash tables?
    Per aspera ad Astra!
    ----------------------------------------
    Slow and steady wins the race.

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •