Sep 07 2009
[AS3] Finding Occurrences Of A Sequence Of Bytes Within A ByteArray
The following is a simple way to search within a ByteArray for specific patterns of bytes. At the end of this post is a ByteArrayUtils class that contains the indexOf(), indicesOf and lastIndexOf() methods discussed below. If you just want to grab the code and skip all the walk-through, please feel free to jump there.
Finding The First Occurrence Of The Pattern – indexOf()
- We want to know the index position of the first occurrence of the pattern, if any.
- We take the first byte of the sequence we are looking for, and scan through the target
ByteArraybyte-by-byte to find a match. - Once a match is found, we take the last byte of the sequence we are looking for, and check if the byte at the corresponding position in the target
ByteArrayalso matches. If not, we have found a false candidate and therefore the bytes in-between need not be tested. - We continue working backwards until all the bytes in the sequence are tested, aborting if a comparison is false.
- Once all the bytes in the sequence are tested positive, we have found an occurrence of the pattern.
public static function indexOf(target:ByteArray, pattern:*, fromIndex:int = 0):int { var arr:Array, end:Boolean, found:Boolean, a:int, i:int, j:int, k:int; var toFind:ByteArray = toByteArray(pattern); if (toFind == null) { // ** type of pattern unsupported ** throw new Error("Unsupported Pattern"); return; } a = toFind.length; j = target.length - a; if (fromIndex < 0) { i = j + fromIndex; if (i < 0) { return -1; } } else { i = fromIndex; } while (!end) { if (target[i] == toFind[0]) { // ** found a possible candidate ** found = true; k = a; while (--k) { if (target[i + k] != toFind[k]) { // ** doesn't match, false candidate ** found = false; break; } } if (found) { return i; } } if (fromIndex < 0) { end = (--i < 0); } else { end = (++i > j); } } return -1; }
Defining The Pattern
The indexOf() method has the following signature:
public static function indexOf(target:ByteArray, pattern:*, fromIndex:int = 0):int
If you have the sequence of bytes to find already defined as a ByteArray, you can use that for the pattern parameter (ie, you can look for a ByteArray within a ByteArray):
ByteArrayUtils.indexOf(target, sequence); // where sequence is a ByteArray object
Otherwise, you can define the sequence of bytes as an Array of byte values:
ByteArrayUtils.indexOf(target, [0x72, 0x69, 0x76, 0x76, 0x79]);
You can also define a String to look up:
ByteArrayUtils.indexOf(target, "HELLO");
The toByteArray() method helps to convert Array or String to ByteArray:
public static function toByteArray(obj:*):ByteArray { var toFind:ByteArray; if (obj is ByteArray) { toFind = obj; } else { toFind = new ByteArray(); if (obj is Array) { // ** looking for a sequence of target ** var i:int = obj.length; while (i--) { toFind[i] = obj[i]; } } else if (obj is String) { // ** looking for a sequence of string characters ** toFind.writeUTFBytes(obj); } else { return null; } } return toFind; }
Specifying Where To Begin Searching – fromIndex
- By default, we will scan from index position zero to the last valid index position (which is “length of target byte array less the length of sequence of bytes to find”).
- To begin searching with an offset, specify the
fromIndexparameter.
ByteArrayUtils.indexOf(target, "HELLO", 1000); // skip the first 1000 bytes
Finding Multiple Occurrences – indicesOf()
- The
indicesOf()method returns an array containing the index positions found. If no match is found, an empty array is returned.
var indices:Array = ByteArrayUtils.indicesOf(target, [0x72, 0x69, 0x76, 0x76, 0x79]);
public static function indicesOf(target:ByteArray, pattern:*, fromIndex:uint = 0):Array { var arr:Array = []; var toFind:ByteArray = toByteArray(pattern); if (toFind == null) { // ** type of pattern unsupported ** throw new Error("Unsupported Pattern"); return; } var i:int = indexOf(target, toFind, fromIndex); while (i != -1) { arr[arr.length] = i; i = indexOf(target, toFind, i + 1); } return arr; }
Finding The Last Occurrence – lastIndexOf()
- Use the
lastIndexOf()method to find the last, instead of the first, occurrence of the pattern. - The
indexOf()method has been coded to search backwards if thefromIndexparameter is a negative integer. - The
lastIndexOf()therefore simply calls theindexOf()method passing-1for thefromIndexparameter.
Testing By Unsigned Integers
I tried testing four bytes at a time, ie by comparing unsigned integers, but the performance ended up worse off. I have therefore removed that implementation. The following was the idea behind that implementation:
- If the sequence we are looking for is more than five bytes, we will test for matches using unsigned integers (so four bytes at a time). If the sequence is five bytes or less, we stick to byte-by-byte scan because the first and last byte would have been tested (so left three or less bytes to test).
- As long as there are four or more bytes left to test, unsigned integer comparison is used, otherwise byte value comparison is used.
ByteArrayUtils Class
ByteArrayUtils ActionScript 3.0 Class







