Array indexer

**bonsai** · 07-27-2015, 03:37 AM

Array Indexer (beta)

A lape "object" to create an index on an array and provide basic searching.

What is an index?

If I sent you down to my basement to find some duct tape you will take a long time to come back. Not me... I know exactly where it is. It would take me no time at all.

In my mind I have a good idea where the items in my basement are. And that, my friends, is an index. A little bit of information that points you in the right direction. That way you don't have to search the whole basement to know there's no duct tape down there.

When is it worthwhile?

You probably wouldn't want to inventory the basement unless you know we were playing this game all day. Our computer index is much the same.

Consider these things:

Searches are much faster using the index
But it takes a fair amount of time to create
And we have to recreate it when the data changes

An index is worthwhile if the data is stable long enough to have to search it enough times to make up for the indexing time. That's a mouthful.

How fast is fast?

Here are some numbers. The test program filled an array of records with random data then performed searches for random data against it. Traditional array searching was done, in one case just returning the first match, in the other returning all matches (i.e. searching the whole array). Index searches were performed, also with both cases of finding one or finding all.

First off, I'll admit it. Those red boxes are because I didn't get proper numbers - I think the indexer hit a bug or limit. No one's using ten million row arrays anyway

Of note here is how expensive the indexing is. The "Indexing cost" column is how many full array searches you would have to do before you made up for the indexing time. 276-731 times depending on the array size.

But look at the gains, too! Even on a small array it's 3.75 times faster to find things. On the big array it gets as high as 1830 times faster.

Also note how fast lape chunks though those arrays! I expected far worse.

Pointers

Yeah, sorry, we can't pull off this voodoo without some pointers.

If you don't know what a pointer is, every variable you create is sitting in memory somewhere and the memory is numbered to keep track of it. These numbers are called addresses. If I know the address of something I can look there and check it out.

There is a 'pointer' type. Pointers hold addresses like this. It doesn't hold the data at that address, just the address itself.

Let's look at this in lape:

Simba Code:

var
  i: integer;
  p: pointer;
begin
  i := 22;
  p := @i;
  writeln('i is a variable = ', i);
  writeln('@i is the address of i = ', @i);
  writeln('p is a pointer to something = ', p);
  writeln('p^ is the thing it points to = ', p^)
end;

Progress Report:

i is a variable = 22
@i is the address of i = "i"::0x5F58320 (22)
p is a pointer to something = "i"::0x5F58320
Error: Access violation at line 10

Dang it, I always do that! At least simba didn't crash. It failed because 'p' is just a plain old pointer. It has no idea what it's pointing to. We can use what is called 'casting' to be more explicit:

Simba Code:

var
  i: integer;
  p: pointer;
begin
  i := 22;
  p := @i;
  writeln('i is a variable = ', i);
  writeln('@i is the address of i = ', @i);
  writeln('p is a pointer to something = ', p);
  writeln('p^ is the thing it points to = ', integer(p^))
end;

Progress Report:

i is a variable = 22
@i is the address of i = "i"::0x5F58320 (22)
p is a pointer to something = "i"::0x5F58320
p^ is the thing it points to = 22

That's better. One last thing and we'll get to the real code. Pointers don't have to be generic like above. We can make a pointer to a specific datatype and avoid the casting in the last example:

Simba Code:

var
  i: integer;
  p: ^integer;
begin
  i := 22;
  p := @i;
  writeln('i is a variable = ', i);
  writeln('@i is the address of i = ', @i);
  writeln('p is a pointer to to an integer = ', p);
  writeln('p^ is the thing it points to = ', p^)
end;

Usage

Function list:

Simba Code:

procedure TIndex.init(IarrayPtr, IfieldP: pointer; IarraySize, IentrySize, IfieldSize: integer);
function TIndex.contains(item: Pointer): Boolean;
function TIndex.min(): Pointer;
function TIndex.max(): Pointer;
function TIndex.occurrencesOf(item: Pointer): integer;
function TIndex.findAll(item: Pointer): TpointerArray;
function TIndex.find(item: Pointer): Pointer;

Example usage:

Simba Code:

{$include indexer.simba}
type
  TGrade = record
    name:       string;
    id:         integer;
    bestScore:  extended;
    grade:      char;
    timeSpent:  int64;
  end;

var
  studentRecords: array of TGrade;
  finder: TGrade;
  idIndex: TIndex;
  studentPointer: ^TGrade;

begin
  {pretend we have records}
  idIndex.init(@studentRecords[0], 
      @(studentRecords[0].id), 
      length(studentRecords), 
      sizeof(TTest), 
      sizeof(studentRecords[0].id));  

  finder.id = 1357; {let's find this guy}
  studentPointer := idIndex.find(@finder);
  writeln('The student with id 1357 is ', studentPointer^.name, 
    ' and his grade is ', studentPointer^.grade);
end.

As you can see, the initializer looks a little messy but it's not so bad. Even a novice should be able to mimic the example.

It needs to know:

The address of the first record in the array
The address of the first field (in the first record)... the one you want to index.
How big is the array (and the record and the field).

When you search, you fill in the search field on a record of that same type and give its address to the search function.

The find functions return either a pointer to a single record or an array of pointers. You can turn those into usable variables with the '^' notation.

CODE

At last, the code.

Simba Code:

// 7/2015 -- bonsai
//
{$f-}

type
  TPointerArray = array of Pointer;
  PTreeNode =^ TTreeNode;
  TTreeNode = record
    data: Pointer;
    left: PTreeNode;
    right: PTreeNode;
  end;
  TIndex = record
    head: PTreeNode;
    offset: integer;
    fieldSize: integer;
  end;

procedure TPointerArray.append(p: Pointer);
begin
  setLength(self, length(self) + 1);
  self[high(self)] := p;
end;

function TIndex._pcompare(a, b: Pointer): integer;
var
  i: integer;
  p1, p2: pointer;
begin
  p1 := a + self.offset;
  p2 := b + self.offset;
  case self.fieldSize of
    1:
      begin
        if byte(p1^) = byte(p2^) then
          exit(0)
        else if byte(p1^) < byte(p2^) then
          exit( - 1)
        else
          exit(1);
      end;
    2:
      begin
        if word(p1^) = word(p2^) then
          exit(0)
        else if word(p1^) < word(p2^) then
          exit( - 1)
        else
          exit(1);
      end;
    4:
      begin
        if uint32(p1^) = uint32(p2^) then
          exit(0)
        else if uint32(p1^) < uint32(p2^) then
          exit( - 1)
        else
          exit(1);
      end;
    8:
      begin
        if uint64(p1^) = uint64(p2^) then
          exit(0)
        else if uint64(p1^) < uint64(p2^) then
          exit( - 1)
        else
          exit(1);
      end;
    10:
      begin
        if extended(p1^) = extended(p2^) then
          exit(0)
        else if extended(p1^) < extended(p2^) then
          exit( - 1)
        else
          exit(1);
      end;
  end;
end;

function TIndex._newNode(item: Pointer): PTreeNode;
var
  p: PTreeNode;
begin
  new(p);
  p^.data := item;
  p^.left := p^.right := nil;
  result := p;
end;

procedure TIndex._clearFrom(var node: PTreeNode);
begin
  if (node = nil) then
    exit;
  _clearFrom(node^.left);
  _clearFrom(node^.right);
  dispose(node);
  node := nil;
end;

procedure TIndex._clear();
begin
  self._clearFrom(self.head);
end;

function TIndex._findAllFrom(node: PTreeNode; item: Pointer): TpointerArray;
var
  tmp: TPointerArray;
begin
  setLength(result, 0);
  while (node <> nil) do
  begin
    case self._pcompare(node^.data, item) of
     -1: node := node^.right;
      1: node := node^.left;
      0:
        begin
          result.append(node^.data);
          node := node^.left;
        end;
    end;
  end;
end;

function TIndex._minFrom(node: PTreeNode): Pointer;
var
  p: PTreeNode;
begin
  p := node;
  while (p^.left <> nil) do
    p := p^.left;
  result := p^.data;
end;

function TIndex._maxFrom(node: PTreeNode): Pointer;
var
  p: PTreeNode;
begin
  p := node;
  while (p^.right <> nil) do
    p := p^.right;
  result := p^.data;
end;

function TIndex._insertFrom(var node: PTreeNode; item: Pointer): PTreeNode;
begin
  if (node = nil) then
  begin
    result := _newNode(item);
    exit
  end;
  if (self._pcompare(item, node^.data) <= 0) then
    node^.left := self._insertFrom(node^.left, item)
  else
    node^.right := self._insertFrom(node^.right, item);
  result := node;
end;

procedure TIndex._insert(item: Pointer);
begin
  self.head := self._insertFrom(self.head, item);
end;

function TIndex._occurrencesOfFrom(node: PTreeNode; item: Pointer): integer;
begin
  if (node = nil) then
    exit(0);
  case self._pcompare(node^.data, item) of
     - 1: result := self._occurrencesOfFrom(node^.right, item);
    0: result := 1 + self._occurrencesOfFrom(node^.left, item);
    1: result := self._occurrencesOfFrom(node^.left, item);
  end;
end;

function TIndex.contains(item: Pointer): Boolean;
begin
  result := (self.find(item) <> nil);
end;

function TIndex.min(): Pointer;
begin
  result := self._minFrom(self.head);
end;

function TIndex.max(): Pointer;
begin
  result := self._maxFrom(self.head);
end;

function TIndex.occurrencesOf(item: Pointer): integer;
begin
  result := self._occurrencesOfFrom(self.head, item);
end;

function TIndex.findAll(item: Pointer): TpointerArray;
begin
  result := self._findAllFrom(self.head, item);
end;

function TIndex.find(item: Pointer): Pointer;
var
  p: PTreeNode;
  tst: integer;
begin
  result := nil;
  p := self.head;
  while (p <> nil) do
  begin
    tst := self._pcompare(p^.data, item);
    if (tst = 0) then
    begin
      result := p^.data;
      break;
    end
    else
    begin
      if (tst > 0) then
        p := p^.left
      else
        p := p^.right;
    end;
  end;
end;

procedure TIndex.debugFrom(node: PTreeNode; 
   printData: procedure(p: pointer) = nil);
begin
   if (node = nil) then exit;
   
   writeln('.. Node: ', node);
   if (printData <> nil) then
      printData(node^.data);
   
   self.debugFrom(node^.left, printData);
   self.debugFrom(node^.right, printData);
end; 

procedure TIndex.debug(printData: procedure(p: pointer) = nil);
begin
   writeln('--------------------------------------------');
   writeln('Tree head=', head);
   self.debugFrom(self.head, printData);
end;   
   
procedure TIndex.init(IarrayPtr, IfieldP: pointer; IarraySize, IentrySize, IfieldSize: integer);
var
  i: integer;
  topBound: Pointer;
begin
  if (IfieldP < IarrayPtr) or 
    ((ifieldSize <> 1) and (ifieldSize <> 2) and (ifieldSize <> 4) and (ifieldSize <> 8) and (ifieldSize <> 10)) then
  begin
    writeln('error');
    exit;
  end;
  self.offset := uint32(IfieldP) - uint32(IarrayPtr);
  self.fieldSize := IfieldSize;
  
  if (self.head <> nil) then
  begin
    self._clear();
  end
  self.head := nil;

  topBound := IArrayPtr + (IarraySize * IentrySize);
  repeat
    self._insert(IarrayPtr);
    IarrayPtr += IentrySize;
  until IarrayPtr >= topBound;
end;

indexer.simba

indexerMain.simba

indexerPerf.simba

**srlMW** · 07-27-2015, 03:52 AM

Nice work, I'll begin delving into this now! Anything that can help improve my knowledge and increase my scripting skills is greatly appreciated.

REP+

E: Lets just hope I can understand this Bonsai language

**Brandon** · 07-27-2015, 11:27 AM

BinaryTree = ArrayIndexer?

**bonsai** · 07-27-2015, 01:16 PM

Originally Posted by Brandon

BinaryTree = ArrayIndexer?

Would you be happier with field indexer?

Or is the word index bothering you?

**Swag Bag** · 07-27-2015, 02:18 PM

Great idea and read bonsai! I love data structures! This is the implementation of a binary search tree correct? I'd love to see a quicksort next

**bonsai** · 07-27-2015, 06:46 PM

It's a simplified version packaged up for the purpose. I might play with some other algorithms to see how they compare.

I put out a more complete bst implementation a while back. I had to rework it for something and ended up putting this together.

For quicksort are you looking for code? Or are you more interested in the howto and how it performs?

**Swag Bag** · 07-27-2015, 08:12 PM

No I've already had fun with all the sorts I just though a quicksort would be useful for finding data in a large array! Also, what can you use this for in game? I can't think of many large static arrays at the moment

**nielsie95** · 07-27-2015, 09:31 PM

Very cool! I like the idea

For pcompare you might want to take a look at CompareMem, because I'm not sure if your implementation will work for bytesize 10 (using extended floating point to compare). Also, it might be good to note that references (such as strings and dynamic arrays) cannot be used as in index this way.

**Brandon** · 07-28-2015, 12:05 AM

Originally Posted by bonsai

It's a simplified version packaged up for the purpose. I might play with some other algorithms to see how they compare.

I put out a more complete bst implementation a while back. I had to rework it for something and ended up putting this together.

For quicksort are you looking for code? Or are you more interested in the howto and how it performs?

Nothing is bothering me. I was just a bit confused as to why you called it indexer since it uses nodes. Nonetheless, good job.

**bonsai** · 07-28-2015, 02:24 AM

Originally Posted by Swag Bag

I just though a quicksort would be useful for finding data in a large array!

With a sorted array something like a binary search could be used for queries. It would be interesting to compare.

Also, what can you use this for in game? I can't think of many large static arrays at the moment

Not a ton of obvious uses for RS scripting. I originally wrote the BST funcs because my autocoloring was pushing the boundaries of what I could do in real-time. It slow sorts big arrays and sifts through them a lot. Very inefficient design.

I hacked around some with @Obscurity; to see if something like this was helpful in processing the data returned from the opengl plugin.

Maybe some sort of bitmap processing would have use for such ideas.

Originally Posted by nielsie95

For pcompare you might want to take a look at CompareMem, because I'm not sure if your implementation will work for bytesize 10 (using extended floating point to compare). Also, it might be good to note that references (such as strings and dynamic arrays) cannot be used as in index this way.

Thanks. I was looking for memcopy/copymemory last week. I think I had it stuck in my head that I had to do it more explicitly.

Is there some hack to do a memcopy?

Originally Posted by Brandon

Nothing is bothering me. I was just a bit confused as to why you called it indexer since it uses nodes. Nonetheless, good job.

I think bothering was a poor word choice. I have a tendency to be loose with my terminology and sloppy with my code so please do keep me honest.

**nielsie95** · 07-28-2015, 08:41 AM

When you have the pointers and the size, you can just use CompareMem to compare and Move to copy data. They are both available in Lape.

EDIT: Although I do see now that CompareMem returns a boolean, not really what you need.

**CynicRus** · 07-28-2015, 07:53 PM

Why not use something like a hash tables?

Thread: Array indexer

Thread Tools

Array indexer

Thread Information

Users Browsing this Thread

Posting Permissions