Bloom filter

A Bloom filter is a data structure that allows holding a huge amount of data in a very limited space and is used for checking if an element is in a set. The way it works is simple. It is an array of bits where initially every bit is set to 0. Every time we want to add an element to the filter, its collection of independent hash functions generates hashes of that element that are turned into integers used as indexes in the bit array of which bits in those positions are changed to 1. To check if an element exists in the set, the same process is used, but this time the filter checks if bits match those in array. More detailed text about this data structure I wrote here.

My implementation of a Bloom filter uses hash functions of my own design, details of which you can find here. Here is GitHub repository for the project with the most recent code. The filter’s code is as follows (as it is on 3 June 2018):

package bloomfilter;

/**
 * This class is a simple implementation of a Bloom filter to test whether an 
 * element is a member of a set.
 * @author Maciej Modzelewski <https://maciejmodzelewski.com/contact/>
 * @version 0.1 (14 April 2018)
 */
public class BloomFilter {
    private final int[] bitArray;
    private final Hasher hasher;

    /**
     * Constructs a new <code>BloomFilter</code> with the specified bit array.
     * @param aBitArray the bit array to be used by this filter
     */
    public BloomFilter(int[] aBitArray) {
        bitArray = aBitArray;
        hasher = new Hasher();
    }
    
    private class Hasher {
        private static final int NUMBER_OF_HASH_FUNCTIONS = 3;
        private final int[] hashes;
        
        private Hasher() {
            hashes = new int[NUMBER_OF_HASH_FUNCTIONS];
        }
        
        private int[] hashString(String aKey, int size) {
            for (int i = 0; i < hashes.length; i++) {
                int hashFunction = i;
                int hash = 0;
                
                switch (hashFunction) {
                    case 0: hash = StringHashFunctions.crossHash(aKey, size);
                            break;
                    case 1: hash = StringHashFunctions.indexValueHash(aKey, size);
                            break;
                    case 2: hash = StringHashFunctions.primeHash(aKey, size);
                            break;
                    default: break;
                }
                
                hashes[i] = hash;
            }
            
            return hashes;
        }
    }
    
    /**
     * Adds the specified element to the filter.
     * @param aKey the element to be added to this filter
     */
    public void addKey(String aKey) {
        int size = bitArray.length;
        int[] indexes = hasher.hashString(aKey, size);
        for (int index : indexes) {
            bitArray[index] = 1;
        }
    }
    
    /**
     * Adds the element with the specified keys to the filter.
     * @param hashes an array of three hashes that are keys for the added 
     * element
     */
    public void addKey(int[] hashes) {
        int size = bitArray.length;
        int[] indexes = hashes;
        for (int index : indexes) {
            bitArray[index] = 1;
        }
    }
    
    /**
     * Returns <code>true</code> if this filter contains the specified element.
     * @param aKey the element whose presence in this filter is to be tested
     * @return <code>true</code> if this filter contains the specified element
     */
    public boolean findKey(String aKey) {
        int size = bitArray.length;
        int[] indexes = hasher.hashString(aKey, size);
        for (int index : indexes) {
            if (bitArray[index] == 0) {
                return false;
            }
        }
        return true;
    }
    
    /**
     * Returns <code>true</code> if this filter contains the element with 
     * the specified keys.
     * @param hashes an array of three hashes that are keys whose presence 
     * in this filter is to be tested
     * @return <code>true</code> if this filter contains all specified keys
     */
    public boolean findKey(int[] hashes) {
        int size = bitArray.length;
        int[] indexes = hashes;
        for (int index : indexes) {
            if (bitArray[index] == 0) {
                return false;
            }
        }
        return true;
    }
}

Here is a ZIP containing files with the source code, BloomFilter.java and required StringHashFunctions.java: BloomFilter

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.