Description
Setup
Update your repository to gain access to the files necessary for this assignment: BufferPool.java, Permissions.java, and an updated version of Database.java. This assignment does rely on a working version of assignment 1. Specifically, you will be reading and writing tuples, and manipulating HeapFiles and HeapPages. If you are concerned that you implementation for assignment 1 is not correct, please meet with the instructor as soon as possible to discuss your options.
Overview
In this assignment, you will implement locking that can be used with transactions. These transactions will use the two phase locking routine that we have discussed in class. These transactions are specifically expected to implement strict two phase locking, meaning that they will acquire all locks before performing any modifications on data. Locks should generally be kept until the transaction is complete (either aborted or committed), though it may be possible to release some locks earlier than that.
Note that you are not being tasked with creating transactions themselves. We are simply providing the methods that transactions would use to access data. It is then the transaction’s responsibility to use these methods properly.
BufferPool
The majority of the work for this assignment is in the BufferPool. A BufferPool can be thought of as an object that manages access to pages within our database. It will manage locks, make sure that a transaction has the necessary locks before reading or writing data, and manage the data access (reads and writes) to the data itself.
It does this by managing a cache of HeapPages
. Any HeapPages
that need to be read or written to will first be stored in this cache. If a page is modified, we will mark that page dirty so that we know that it is not currently the same as the data on disk. This is important, as if we were to modify the data directly on disk then it would be much more difficult to rollback a transaction upon abort.
Examine the methods in BufferPool
and make sure you understand what their purpose is before you begin. Some implementation hints follow:
* You will need to decide on a structure for the page cache. We also need a way of marking whether pages are dirty or not, i.e. if they have been modified.
* You will also need to decide on a structure to manage what locks currently exist. All locks should be done on the page level. It is possible for more than one transaction to have a read (shared) lock on a page, but it is only possible for one write lock to exist for a page. Write locks cannot be acquired if any other locks already exist for a given page (read or write). It is possible (and may be convenient) to upgrade a lock from a read lock to a write lock. Typically this structure would be considered part of the database Catalog
.
* In order for a page to be read or written to, it must be in the BufferPool cache. If a transaction wishes to modify data, then it will call either insertTuple
or deleteTuple
which must verify that the given transaction has the write lock for the requested page before performing the operation. This write lock will have been acquired via a previous call to the getPage()
method.
* Since space in the BufferPool cache is limited, you may have to sometimes evict pages when reading in a new page. We want to ensure that we do not evict any dirty pages, as these pages are currently being worked on by a transaction. Complete your evict()
method such that it evicts the first non-dirty page that it can find. If no such page exists, throw an exception.
* It is possible that a transaction requests a lock that is not currently available. If this occurs, you should block, i.e. periodically try to acquire the lock again until it is recieved (or the transaction is aborted).
* If a transaction has completed successfully (commit), then the BufferPool should make sure that each of the pages modified by the transaction are written to disk. It should then mark each of those pages as clean (not dirty) so that they can be used by other transactions, if necessary. It should also release all locks held by the transaction. The flushPage
and releasePage
methods exist for these purposes.
* If a transaction completes unsuccessfully (abort), then the BufferPool should discard any pages that were modified by that transaction. This is as simple as replacing the modified copy of the page in cache with a fresh copy of the page from disk.
* It may be tempting to remove the page from the BufferPool cache when it is flushed after a transaction. It is actually better to keep this page in cache in case another transaction wishes to use it.
* Please keep in mind that BufferPool
is primarily in charge of monitoring data access. It does not actually perform the read/write operations that are requested, rather it should call methods in HeapPage
or HeapFile
to handle this (or perhaps the HeapFile methods should call BufferPool? This is an implementation decision that you will need to make.). Depending on your implementation, this means that you may not be calling those methods directly, rather, any data access could now go through the BufferPool
(again this is an implementation decision). Depending on your implementation, this could require some modification to the code you wrote for previous homework assignments (specifically executing queries from assignment 2) but you are not required to make those changes for this assignment.
* Care is needed when inserting a tuple into a table. We do not know what page that tuple will be inserted into, since the HeapFile must first scan the pages and find an empty slot. Note, however, that the addTuple
method returns the page that contains the inserted tuple. Therefore, you can use this method to insert the tuple, mark the resulting page as dirty, and insert that page into the buffer pool cache (if necessary) so that it can be properly flushed when the transaction is complete.
* It may be necessary to make modifications to HeapFile
, HeapPage
, and/or Catalog
to complete this assignment. Consider adding transaction id parameters to important methods within these classes to help track operations performed by particular transactions and whether they are valid.
Deadlock
As discussed in class, it is possible for deadlock to occur when working with multiple transactions. We also discussed a few techniques that can be used to avoid deadlocks. You should choose a deadlock avoidance technique and implement it, so that if and when deadlock does occur, at least one of the transactions involved is allowed to proceed.
You may need to abort a transaction as a result of deadlock resolution. You may assume that this transaction will be manually restarted at a later time. In other words, you are not required to restart an aborted transaction.
As a hint, the simplest form of deadlock avoidance would likely involve some sort of timeout – if a lock cannot be acquired in a certain period of time then assume deadlock has occured and abort the transaction.
Testing
You are expected to write some unit tests for this assignment. Each group should submit two unit tests by noon on Thursday, March 27th. I will then go through those unit tests, compile them, and give them back to you to use as your work on the assignment. These tests will also be used to grade your work.
Rubric
The below rubric is intended as a guideline and will be followed as closely as possible, however submissions that deviate too much from the specification may be graded using an alternate rubric.
Page tracking structure (10 points), Lock tracking structure (10 points), Deadlock resolution (5 points)
BufferPool: Constructor and instance variables (10 points), getPage (10points), releasePage (5 points), committing transactions (10 points), aborting transactions (10 points), holdsLock (5 points), insertTuple (5 points), deleteTuple (5 points), flushPage (5 points), evictPage(5 points)
Unit tests: 5 points Total: 100 points
Submission
To submit your homework, simply commit and push your code to your repository by the due date. Be sure to put your names at the top of each file. Any submissions that are committed and pushed after the due date will be discarded.
Acknowledgements
This lab is a heavily modified version of a lab provided from MIT Open Courseware. I am grateful to them for providing this service.
Sam Madden, Robert Morris, Michael Stonebraker, Carlo Curino, 6.830 / 6.814 Database Systems, Fall 2010. (MIT OpenCourseWare: Massachusetts Institute of Technology), https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-830-database-systems-fall-2010/index.htm (Accessed 9/11/17). License: Creative commons BY-NC-SA