M.I.T. DEPARTMENT OF EECS

6.033 - Computer System Engineering Valgrind Assignment

Hands-on 3: valgrind

Complete the following hands-on assignment. Do the activities described, and submit your solutions via Gradescope by 11:59pm on 2/28

This assignment relates to the Eraser paper, and exposes you to parallel programming with pthreads, and understanding and fixing concurrency errors. You might find it helpful to skim the Eraser paper, which is assigned for the next section. You also might want to refresh your memory on what race conditions are and the troubles that they can cause by revisiting sections 5.2.2, 5.2.3, and 5.2.4 of the textbook.

This assignment involves a small amount of programming.

I. Warmup

Add the 6.033 and gcc locker (we're using > here to represent the command prompt):
  > add 6.033; add gcc
Download the program ph.c and store it in a file ph.c. On a linux system, you might do:
  > wget http://web.mit.edu/6.033/www/assignments/ph.c
Then compile and run it with one thread:
  > gcc -g -o ph ph.c -pthread
  > ./ph 1
    completion time for put phase = 2.758351
    0: 0 keys missing
    completion time for get phase = 2.270119

The 1 specifies the number of threads that execute put and get operations on the the hash table. The program inserts NKEYS into a hash table in the put phase and then looks them all up in the get phase.

Let's run with two threads: each thread does NKEYS/2 puts in the put phase and NKEYS/2 lookups in the get phase. The program is written in C, the default language for low-level systems programming, and uses pthreads for creating several threads that can run in parallel if our machine has several cores (most processors today have at least two cores).

If we achieve perfect parallelism, then the two phases complete in half the time. Here is the output:

  > ./ph 2
    completion time for put phase = 1.688109
    0: 27 keys missing
    1: 39 keys missing
    completion time for get phase = 3.215020

We see that put phase is faster now, so we were able to exploit two cores (but it didn't get twice as fast). The get phase, however, is slower with two cores, which is disappointing: using more cores, the program ran slower. More on that later, because there is a bigger problem.

You will likely observe that the code is incorrect. The application inserted 27+39 keys in phase 1 that phase 2 couldn't find. In your runs, there may be more or fewer keys missing. There may be even zero keys missing in some runs. If you run with one thread, there will never be any keys missing.

Run the application with 4 threads:

> ./ph 4
  completion time for put phase = 1.113811
  0: 39 keys missing
  2: 31 keys missing
  3: 15 keys missing
  1: 37 keys missing
  completion time for get phase = 3.436008

Notice that: 1) the put runs faster with four cores than two cores, but not four times as fast as a single core; 2) the get phase ran even slower; 3) more keys are missing.

II. Using valgrind to find race conditions

The ph.c program is simple, and you may be able to spot the mistake immediately, but we will use a tool, helgrind that helps us find such errors automatically. helgrind is similar in spirit to the Eraser system that you are reading about, and is part of a suite of tools collectively called valgrind.

Run helgrind with ph. You will see output as follows:

> valgrind --tool=helgrind ./ph 2
  ==5143== Helgrind, a thread error detector
  ==5143== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al.
  ==5143== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
  ==5143== Command: ./ph 2
  ==5143== 
  ==5143== ---Thread-Announcement------------------------------------------
  ==5143== 
  ==5143== Thread #3 was created
  ==5143==    at 0x515543E: clone (clone.S:74)
  ==5143==    by 0x4E44199: do_clone.constprop.3 (createthread.c:75)
  ==5143==    by 0x4E458BA: create_thread (createthread.c:245)
  ==5143==    by 0x4E458BA: pthread_create@@GLIBC_2.2.5 (pthread_create.c:611)
  ==5143==    by 0x4C30E0D: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==5143==    by 0x40108B: main (ph.c:148)
  ==5143== 
  ==5143== ---Thread-Announcement------------------------------------------
  ==5143== 
  ==5143== Thread #2 was created
  ==5143==    at 0x515543E: clone (clone.S:74)
  ==5143==    by 0x4E44199: do_clone.constprop.3 (createthread.c:75)
  ==5143==    by 0x4E458BA: create_thread (createthread.c:245)
  ==5143==    by 0x4E458BA: pthread_create@@GLIBC_2.2.5 (pthread_create.c:611)
  ==5143==    by 0x4C30E0D: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==5143==    by 0x40108B: main (ph.c:148)
  ==5143== 
  ==5143== ----------------------------------------------------------------
  ==5143== 
  ==5143== Possible data race during read of size 4 at 0x1CE5728 by thread #3
  ==5143== Locks held: none
  ==5143==    at 0x400C1D: put (ph.c:61)
  ==5143==    by 0x400E74: put_thread (ph.c:98)
  ==5143==    by 0x4C30FA6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==5143==    by 0x4E45181: start_thread (pthread_create.c:312)
  ==5143==    by 0x515547C: clone (clone.S:111)
  ==5143== 
  ==5143== This conflicts with a previous write of size 4 by thread #2
  ==5143== Locks held: none
  ==5143==    at 0x400CB0: put (ph.c:64)
  ==5143==    by 0x400E74: put_thread (ph.c:98)
  ==5143==    by 0x4C30FA6: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
  ==5143==    by 0x4E45181: start_thread (pthread_create.c:312)
  ==5143==    by 0x515547C: clone (clone.S:111)
  ==5143==  Address 0x1ce5728 is 24000008 bytes inside data symbol "table"
  ==5143== 
  ==5143== 
  ==5143== More than 10000000 total errors detected.  I'm not reporting any more.
  ==5143== Final error counts will be inaccurate.  Go fix your program!
  ==5143== Rerun with --error-limit=no to disable this cutoff.  Note
  ==5143== that errors may occur in your program without prior warning from
  ==5143== Valgrind, because errors are no longer being displayed.
  ==5143== 
  ==5143== 
  ==5143== For counts of detected and suppressed errors, rerun with: -v
  ==5143== Use --history-level=approx or =none to gain increased speed, at
  ==5143== the cost of reduced accuracy of conflicting-access information
  ==5143== ERROR SUMMARY: 10000000 errors from 1 contexts (suppressed: 0 from 0)

III. Questions

Now you're ready for this week's questions.

Like before, the questions are in a read-only google doc. Make sure to enter quesitons in the page indicated (please do not erase the question text) and upload them as a PDF to Gradescope. See more detailed instructions at the end of the first week's hands-on. If you are having Gradescope problems, please post a question on Piazza!