One way to handle the problem of reliably storing large files is to break them into pieces that are placed on multiple disks or machines. You're probably familiar with this idea from BitTorrent, the peer-to-peer file sharing protocol, and it's used in a variety of other contexts. By downloading files in multiple small parts, the chance of the entire download failing is reduced (since each part can be restarted individually) and load is spread more evenly across the network.
In this project, you will build a multipart downloader that assembles a large file from multiple parts downloaded individually from multiple machines. It allows the same part to be stored redundantly in multiple locations so it can be resilient to failures. Your program will assemble the file incrementally from its parts as they are downloaded, displaying the current version of the assembled file so that you can see the progress visually.
In later projects, you'll be asked to refine the problem itself. In this case, however, the problem is largely fixed. Furthermore, your solution will be an API that satisfies a given specification. Being an API ('application programmers interface') simply means your solution can be accessed 'programmatically', that is by method calls, rather than through a GUI, for example. Separating out an API is generally good practice because it allows the code to be used in a variety of contexts, separates the user interface from the core functionality, and makes testing easier. We are providing you with a simple GUI that can be easily combined with the API, but in fact it won't make full use of the API. The key method of the API downloads a single file part; in the GUI, this method is just called repeatedly to assemble a single file, but it could be used more ambitiously, for interleaving the downloading of multiple files, for example. The API also includes a method to skip the current sequence of downloadings and start a new sequence from a different location; this could be used with timeouts or under user control to improve performance when some servers are slow.
The breakdown of the file into parts is specified in a special manifest file. Your code will have to parse this file, and then use it to determine which files to download. Both this parsing and the downloading process itself are naturally expressed as state machines.
A variety of failures can occur, and it will be up to you to figure out what they are and decide how they should be handled.
The purpose of this project is to help you (1) become familiar with the basic tools of software development in Java -- the language, the Eclipse IDE, and the JUnit testing framework; (2) get you started programming in Java in a simple imperative style; (3) learn how to conceive of software systems and the environments in which they operate as state machines, and to record designs and environmental assumptions with state machine diagrams; (4) begin to appreciate the importance of testing in building high quality software; (5) acquire some familiarity with some important computer system concepts and technologies, such as URLs and HTTP, redundancy and fault tolerance.
Manifest Files
A manifest file specifies how a given file is broken into parts. It can contain simply a list of URLs pointing to the constituent files.
For example, the file picture.jpg may be broken into three parts all stored on the same machine:
http://mymachine.mit.edu/picture.jpg-part1 http://mymachine.mit.edu/picture.jpg-part2 http://mymachine.mit.edu/picture.jpg-part3The manifest file can give alternatives, so that a part, of several parts, can be stored redundantly in different locations:
http://mymachine.mit.edu/picture.jpg-part1 http://mymachine.mit.edu/picture.jpg-part2 http://mymachine.mit.edu/picture.jpg-part3 -- http://yourmachine.mit.edu/picture.jpg-part1 http://yourmachine.mit.edu/picture.jpg-part2 http://yourmachine.mit.edu/picture.jpg-part3In this case, the three parts are all stored on the machine mymachine.mit.edu and also on the machine yourmachine.mit.edu; if the first machine is not accessible, the downloader can try the second. Note how the alternatives are separated by a line containing two hyphens.
Application Behavior
The GUI offers a text field and an adjacent button marked Download. The user enters a URL in the text field pointing to the manifest file (which may be local or on another machine) and clicks on the button. The application then downloads the file part-by-part, displaying the file progressively in a window for the user to see. If an error occurs in the current series of parts, the next alternative is automatically attempted, starting over at the first part.
API
The GUI component of the application is provided for you. Your task is to implement a class named Multipart in a top-level package named multipart, satisfying the specification in the javadocs for the Multipart class. The class defines the following methods:
Handling faults
A variety of failures may occur during execution, which your code should deal with gracefully. These include network failures (e.g. the program cannot download a file) and other I/O failures (e.g. the program cannot write to an output stream due to file or memory problems), and manifest file syntax errors.
We provide you with two implementations of this interface: a working implementation for use in the application itself, and a stub CannedDownloaderStub that does not actually connect to the network; instead, it returns canned responses. The test case in MultipartTest, a skeleton JUnit test file that we provide to get you going, illustrates the use of this stub.
We also provide some sample manifest files with their associated parts. In approximate order of complexity, these are:
For comparison, the original single-part files are in the same directory.
For the first deadline (Feb 21, 2008), your deliverables are:
and for the second deadline (Feb 28, 2008):
Your code should be committed in the repository you share with your teammate by the deadline; all other parts of the project should be handed in on paper, and stored also in your respository as two separate PDF documents, one for each deadline.
Grades will be allotted according to the following breakdown: problem refinement -- 20%; abstract design -- 20%; code design -- 20%; implementation -- 20%; testing -- 10%; reflection -- 10%.
Your reflections should be done as a team, but don't forget your individual project introspection, as described in the Lab Notebook handout.
For the first deadline, you're asked to hand in the problem refinement, abstract design and code design. A few comments about the form these should take:
The problem refinement should just be a paragraph or two or text, explaining assumptions you're making about the problem. For the midi piano example we did in lecture, this would have included all the issues we discussed in class, such as automatic key repeats, allowing multiple keys to be pressed at once, recording that starts with key releases, record/playback interactions, etc. For this project, you'll want to consider various issues, but especially the content of the manifest file and what assumptions are being made about it beyond the syntax outlined above.
The abstract designs should be given as state machines, with commentary.
The code design should be presented as a dependency diagram, and class/interface skeletons (that is, method declarations) of your state machine implementations.
Code that can be ignored, but might be helpful. You do not need to understand the GUI code in ui/Main.java, but you may find it useful to consider what is going on in the methods startDownload() and downloadMore(). More specifically, you will better understand the way your code is being called by the GUI if you can answer these questions:
Understanding the protocol. Look at the contents of the manifest files we provide, using a web browser. The structure is quite simple, and examining the manifest files and the parts of text files (which, unlike image files, have well-formed subparts) will give you an intuition for what your program should be doing.
Testing. To test your code, you'll need to simulate failures. For failures during downloading, the easiest way to do this is to write an implementation of the interface IDownloader that throws an exception. You'll probably want to start with the code of CannedDownloaderStub that we provide, and modify that.