6.005 Elements of Software Construction
Spring 2008
Project 1: Multipart File Transfer
Wednesday, February 13, 2008
Due: February 21, 2008 and February 28, 2008 (See Deliverables and Grading for details)

Problem
Purpose
Specification
Tasks
Infrastructure
Deliverables and Grading
Hints

Problem

One way to handle the problem of reliably storing large files is to break them into pieces that are placed on multiple disks or machines. You're probably familiar with this idea from BitTorrent, the peer-to-peer file sharing protocol, and it's used in a variety of other contexts. By downloading files in multiple small parts, the chance of the entire download failing is reduced (since each part can be restarted individually) and load is spread more evenly across the network.

In this project, you will build a multipart downloader that assembles a large file from multiple parts downloaded individually from multiple machines. It allows the same part to be stored redundantly in multiple locations so it can be resilient to failures. Your program will assemble the file incrementally from its parts as they are downloaded, displaying the current version of the assembled file so that you can see the progress visually.

In later projects, you'll be asked to refine the problem itself. In this case, however, the problem is largely fixed. Furthermore, your solution will be an API that satisfies a given specification. Being an API ('application programmers interface') simply means your solution can be accessed 'programmatically', that is by method calls, rather than through a GUI, for example. Separating out an API is generally good practice because it allows the code to be used in a variety of contexts, separates the user interface from the core functionality, and makes testing easier. We are providing you with a simple GUI that can be easily combined with the API, but in fact it won't make full use of the API. The key method of the API downloads a single file part; in the GUI, this method is just called repeatedly to assemble a single file, but it could be used more ambitiously, for interleaving the downloading of multiple files, for example. The API also includes a method to skip the current sequence of downloadings and start a new sequence from a different location; this could be used with timeouts or under user control to improve performance when some servers are slow.

The breakdown of the file into parts is specified in a special manifest file. Your code will have to parse this file, and then use it to determine which files to download. Both this parsing and the downloading process itself are naturally expressed as state machines.

A variety of failures can occur, and it will be up to you to figure out what they are and decide how they should be handled.

Purpose

The purpose of this project is to help you (1) become familiar with the basic tools of software development in Java -- the language, the Eclipse IDE, and the JUnit testing framework; (2) get you started programming in Java in a simple imperative style; (3) learn how to conceive of software systems and the environments in which they operate as state machines, and to record designs and environmental assumptions with state machine diagrams; (4) begin to appreciate the importance of testing in building high quality software; (5) acquire some familiarity with some important computer system concepts and technologies, such as URLs and HTTP, redundancy and fault tolerance.

Specification

Manifest Files

A manifest file specifies how a given file is broken into parts. It can contain simply a list of URLs pointing to the constituent files.

For example, the file picture.jpg may be broken into three parts all stored on the same machine:

http://mymachine.mit.edu/picture.jpg-part1
http://mymachine.mit.edu/picture.jpg-part2
http://mymachine.mit.edu/picture.jpg-part3

The manifest file can give alternatives, so that a part, of several parts, can be stored redundantly in different locations:

http://mymachine.mit.edu/picture.jpg-part1
http://mymachine.mit.edu/picture.jpg-part2
http://mymachine.mit.edu/picture.jpg-part3
--
http://yourmachine.mit.edu/picture.jpg-part1
http://yourmachine.mit.edu/picture.jpg-part2
http://yourmachine.mit.edu/picture.jpg-part3

In this case, the three parts are all stored on the machine mymachine.mit.edu and also on the machine yourmachine.mit.edu; if the first machine is not accessible, the downloader can try the second. Note how the alternatives are separated by a line containing two hyphens.

Application Behavior

The GUI offers a text field and an adjacent button marked Download. The user enters a URL in the text field pointing to the manifest file (which may be local or on another machine) and clicks on the button. The application then downloads the file part-by-part, displaying the file progressively in a window for the user to see. If an error occurs in the current series of parts, the next alternative is automatically attempted, starting over at the first part.

API

The GUI component of the application is provided for you. Your task is to implement a class named Multipart in a top-level package named multipart, satisfying the specification in the javadocs for the Multipart class. The class defines the following methods:

a constructor Multipart (IDownloader downloader, String manifestURL) that creates a downloading state machine for the file whose manifest is at the given URL;
a method void getPart (OutputStream stream) that gets the next part (if any) and adds its contents to the given stream;
a method void skip () that skips any remaining parts in the current alternative, so that the next time getPart is called, the series of parts specified in the manifest as the next alternative is used instead.
a getter boolean success () that returns true if the parts were all downloaded successfully.
a getter boolean error () that returns true if an error occurred in the current alternative (which is thus reset by skip()), or in loading the manifest file or in finding the next alternative.

Handling faults

A variety of failures may occur during execution, which your code should deal with gracefully. These include network failures (e.g. the program cannot download a file) and other I/O failures (e.g. the program cannot write to an output stream due to file or memory problems), and manifest file syntax errors.

Tasks

You should perform the following tasks:

Laboratory. Complete the first lab of this project. A separate handout describes the lab activities.
Problem Refinement. Analyze the problem statement carefully, and resolve any ambiguities or omissions. In particular, you should clarify any assumptions that are necessary, and should specify what failures will be handled and how. In most cases, there will be no obvious best way to refine the problem, so you should use your own judgment and justify your decisions carefully (but briefly!). Your work should include a refined specification for the Multipart class itself.
Abstract Design. Design a multipart downloader by constructing a state machine diagram that describes the behavior of the downloader itself. You will probably find it easiest to start with the assumption that there are no failures, so that you can identify the major modes and the transitions between them, and then to consider the various failures that can occur.
Code Design. Sketch the code for two possible implementations of your abstract design. List their relative merits, select one of them, and explain why you chose it.
Implementation. Implement your chosen code design in Java. You might find that you want to make changes to your design. You are free to do this, but should record the changes so it is clear what the original design was and what changes you made to it.
Test. Run the staff test suite using the test framework provided. If your program fails any of the tests, correct the problem, and rerun the test suite. Create at least three additional test cases of your own, that demonstrate the robustness of your Multipart implementation.
Reflection. Write a brief commentary saying what you learned from this experience. What was easy? What was hard? What was unexpected? Briefly evaluate your solution, pointing out its key merits and deficiencies. Discuss how you would design a more flexible system in which the manifest file could itself point to other manifest files, and what additional complications this would cause.

Infrastructure

Note that the Multipart constructor method takes as input an IDownloader. The IDownloader provides functionality for downloading single files, and abstracts away the details of reading and writing files and accessing files with a URL using the HTTP protocol.

We provide you with two implementations of this interface: a working implementation for use in the application itself, and a stub CannedDownloaderStub that does not actually connect to the network; instead, it returns canned responses. The test case in MultipartTest, a skeleton JUnit test file that we provide to get you going, illustrates the use of this stub.

We also provide some sample manifest files with their associated parts. In approximate order of complexity, these are:

For comparison, the original single-part files are in the same directory.

Deliverables and Grading

For the first deadline (Feb 21, 2008), your deliverables are:

the problem refinement;
the abstract design;
and the code design;

and for the second deadline (Feb 28, 2008):

the implementation;
the tests;
and your reflections.

Your code should be committed in the repository you share with your teammate by the deadline; all other parts of the project should be handed in on paper, and stored also in your respository as two separate PDF documents, one for each deadline.

Grades will be allotted according to the following breakdown: problem refinement -- 20%; abstract design -- 20%; code design -- 20%; implementation -- 20%; testing -- 10%; reflection -- 10%.

Your reflections should be done as a team, but don't forget your individual project introspection, as described in the Lab Notebook handout.

Clarifications

For the first deadline, you're asked to hand in the problem refinement, abstract design and code design. A few comments about the form these should take:

The problem refinement should just be a paragraph or two or text, explaining assumptions you're making about the problem. For the midi piano example we did in lecture, this would have included all the issues we discussed in class, such as automatic key repeats, allowing multiple keys to be pressed at once, recording that starts with key releases, record/playback interactions, etc. For this project, you'll want to consider various issues, but especially the content of the manifest file and what assumptions are being made about it beyond the syntax outlined above.

The abstract designs should be given as state machines, with commentary.

The code design should be presented as a dependency diagram, and class/interface skeletons (that is, method declarations) of your state machine implementations.

Hints

Code that can be ignored, but might be helpful. You do not need to understand the GUI code in ui/Main.java, but you may find it useful to consider what is going on in the methods startDownload() and downloadMore(). More specifically, you will better understand the way your code is being called by the GUI if you can answer these questions:

How would you get the GUI to display "Download failed: error beginning download."?
What does timer do?
In downloadMore(), there are two lines that say if(multipart.error()). What is the difference between the two?
What are the possible outcomes of downloadMore(), and how would you get to each of them?

Understanding the protocol. Look at the contents of the manifest files we provide, using a web browser. The structure is quite simple, and examining the manifest files and the parts of text files (which, unlike image files, have well-formed subparts) will give you an intuition for what your program should be doing.

Testing. To test your code, you'll need to simulate failures. For failures during downloading, the easiest way to do this is to write an implementation of the interface IDownloader that throws an exception. You'll probably want to start with the code of CannedDownloaderStub that we provide, and modify that.

6.005 Elements of Software Construction Spring 2008 Project 1: Multipart File Transfer Wednesday, February 13, 2008 Due: February 21, 2008 and February 28, 2008 (See Deliverables and Grading for details)