Description
-
Boyer-Moore algorithm
In part 1 of HW4, you will implement the Boyer-Moore algorithm as described in lecture.
TODO In the provided class (boyer moore.py), implement
add next element get majority
You may assume in your code and do not need to check that each call to add next element will be a single 1-character string drawn from the upper- and lowercase alphabet. We will check that the values of counter and guess are correct, although we will not check the value of guess when counter is zero.
1
CS 5112 Algorithms and Data Structures for Applications Fall 2019
Testing Boyer-Moore
When you run python3 test boyer moore.py, by default it should terminate when 5 cases failed1. You can change this behavior by changing max test failures, or by passing command-line argu-ments(e.g. python3 test boyer moore.py –max-test-failures 10).
For Boyer-Moore, in addition to the nal result, we will also grade your code by checking if the counter is correct at each step, and if the guess is correct at each step where the counter is not 0. The test cases are in (testcases boyer moore.txt). Each line has 4 semicolon-delimited elds, which are:
-
name
name of the test
symbols
string of symbols
expected guess
symbol, or ‘!’
expected counter
integer
-
Bloom lter
In part 2 of HW4, you will implement a simple Bloom lter.
TODO In the provided class (bloom_filter.py), implement
add_elem
check_membership
Important: Your Bloom lter implementation must use the three hash functions that we have provided (cs5112_hash1, cs5112_hash2, and cs5112_hash3).
Do not use Python’s built-in hash function! You must use these three.
You are asked to implement a Bloom lter with k=3 hash functions and a m=10 bit array.
The comments in bloom_filter.py say more about what we expect for inputs and outputs.
2.1 Testing Bloom lter
For your Bloom lter, in addition to testing the output of your functions, we will also be inspecting how you’re storing data in the underlying array. Therefore, be sure to implement the algorithms to spec.
Run python3 test bloom filter.py for a very simple test.
Note on Python 3’s hash function TL;DR: This note shouldn’t a ect your implementation in bloom_filter.py at all, but might be useful when you are debugging.
Note that Python 3’s built-in hash is non-deterministic. For example, every time you run python3 -c “print(hash(‘hello’))”, the result will be di erent. This is great for security reasons, but it can also be a pain when debugging or for research as we want to be able to regenerate previous experimental results in these scenarios.
-
Some cases may pass \by luck” even when you haven’t implemented the algorithm yet, because we initialize guess as None and counter as 0.
2
CS 5112 Algorithms and Data Structures for Applications Fall 2019
Figure 1: Default behavior since Python 3.3, good for security reasons
One way2 to get past this when you are debugging or testing is by setting the environment variable PYTHONHASHSEED to an integer value, like PYTHONHASHSEED=0 python3 -c “print(hash(’hello’))”.
Figure 2: However, determinism might be more desirable when debugging and testing
-
Admittedly, this approach is intrinsically hacky. In real-world production code, if you actually want a deterministic hash, you should almost always use hashlib.
3