For perspective, this was my final class in the program. I do not write C/C++ regularly, but I did take CS6200: GIOS already.
No, this class is not about writing unit tests. This class is focused on static & dynamic code analysis. That said, I believe this is one of the most industry-relevant classes in the program, and it is sorely underrated. Once you get past the LLVM learning curve, the class is smooth sailing.
In general, the pace of the class is slow. Combined with the predictable grading, this makes it a good class for a Summer semester or a double-up.
Grading
60% Labs, 20% Quizzes/Surveys, 20% Midterm Exam. Final grades were distributed on a standard 10-point grading scale. The class also offers 5% extra credit for participation. If you take the quizzes & exam seriously, it's quite easy to earn an A.
Underrated feature of the class: almost all grades were published within 48 hours of the close of the assignment, including the exam grade.
Labs
We had 2 weeks to complete most labs, with the exceptions of Lab 0 (1 week), Lab 4 (1 week), and Lab 7 (3 weeks, including Thanksgiving). This was way more than enough time. I achieved 100% on all labs without substantial effort: invest the necessary time to get the desired result and avoid obviously incorrect workarounds.
The class has a convenient Docker image for completing the labs. It's worth spending the time to get a good setup with automatic code completion. I used Docker + VSCode + Remote Containers + C/C++ extension.
The best resource was the `LLVM_Primer.pdf` document provided by the class. It contains the majority of the LLVM API needed to complete the labs.
The challenge is mostly connecting the lecture concepts to the specific tools & implementation. All labs require <200 lines of code, and some (Labs 4 & 7) require adding just a few lines.
All labs were weighted at 8%, with the exception of Lab 0 at 4%. Every lab opened during the 2nd week of classes, so students can (and will) work far ahead on the labs.
Lab 0: Intro to LLVM
This was a very gentle introduction to LLVM. The only goal is to loop through the hierarchy of LLVM functions, basic blocks, & instructions. You can effectively find the answer verbatim in the primer document.
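For reference, the traversal itself is just nested loops over the IR hierarchy. A minimal sketch against the public LLVM API (the lab's actual pass scaffolding differs):

```cpp
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

// Walk Module -> Function -> BasicBlock -> Instruction and print each level.
void walkModule(Module &M) {
  for (Function &F : M) {            // every function in the module
    errs() << "Function: " << F.getName() << "\n";
    for (BasicBlock &BB : F)         // every basic block in the function
      for (Instruction &I : BB)      // every instruction in the block
        errs() << "  " << I << "\n"; // prints the instruction's IR text
  }
}
```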
Lab 1: Fuzzing (LLVM)
This was the first "real" lab and generally pretty tough while students are still acclimating to LLVM. Due to the random nature of fuzzing, I had to spend a decent amount of time tuning the randomness of my strategies. It's easy to underestimate how much change is expected between iterations.
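To make the tuning problem concrete, here's a generic sketch of one byte-level mutation strategy; the `mutate` function & its parameters are illustrative, not the lab's required interface:

```cpp
#include <cstdlib>
#include <string>

// Randomly overwrite a few bytes of the input. The tuning problem lives in
// numMutations: too few and consecutive inputs barely differ; too many and
// you throw away the coverage signal from the previous input.
std::string mutate(std::string input, int numMutations) {
  for (int i = 0; i < numMutations; ++i) {
    if (input.empty()) break;
    size_t pos = std::rand() % input.size();
    input[pos] = static_cast<char>(std::rand() % 256); // overwrite one byte
  }
  return input;
}
```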
Lab 2: Dataflow (LLVM)
I really enjoyed this lab. I had to spend some time understanding how the algorithm from the lectures mapped to the data structures provided in the lab, but after figuring that out, it was quite easy & interesting.
Some students lost points for hardcoding a number of iterations for chaotic iteration. Don't do that: iterate until the analysis reaches a fixpoint.
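For intuition, here's a worklist-driven sketch of chaotic iteration; the int nodes, set-of-int facts, and `transfer` function are hypothetical stand-ins for the data structures the lab provides:

```cpp
#include <deque>
#include <map>
#include <set>
#include <vector>

using Fact = std::set<int>;

// Chaotic iteration with a worklist. A node is re-queued only when its OUT
// fact actually changed, and the loop ends only when the worklist drains
// (the fixpoint) -- never after a fixed number of iterations.
void chaoticIteration(const std::map<int, std::vector<int>> &succ,
                      std::map<int, Fact> &out,
                      Fact (*transfer)(int node, const Fact &in)) {
  // Precompute predecessors so IN sets are cheap to rebuild.
  std::map<int, std::vector<int>> pred;
  for (const auto &kv : succ)
    for (int s : kv.second) pred[s].push_back(kv.first);

  std::deque<int> worklist;
  for (const auto &kv : succ) worklist.push_back(kv.first);

  while (!worklist.empty()) {
    int node = worklist.front();
    worklist.pop_front();

    Fact in;                               // IN = union of predecessors' OUT
    for (int p : pred[node])
      in.insert(out[p].begin(), out[p].end());

    Fact newOut = transfer(node, in);
    if (newOut != out[node]) {             // fact changed: propagate
      out[node] = newOut;
      auto it = succ.find(node);
      if (it != succ.end())
        for (int s : it->second) worklist.push_back(s);
    }
  }
}
```

The moment no node's OUT fact changes, the worklist drains and the analysis is done, regardless of how many iterations that took.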
Lab 3: Datalog (LLVM + Z3)
This was effectively Lab 2, but solved by feeding constraints to the Z3 solver. Conceptually, this lab required a lot of time & experimentation to figure out; I somewhat brute-forced the solution by experimenting until the output worked.
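For flavor, the encode-facts-as-constraints workflow with Z3's C++ API looks roughly like this; the lab actually drives Z3's Datalog-style engine with its own relations, which this plain `z3::solver` example (with made-up variables `x` & `y`) doesn't reproduce:

```cpp
#include <z3++.h>
#include <iostream>

int main() {
  z3::context ctx;
  z3::solver solver(ctx);

  z3::expr x = ctx.int_const("x");
  z3::expr y = ctx.int_const("y");

  solver.add(x > 0);               // each "fact" becomes a constraint
  solver.add(y == x + 1);
  solver.add(y < 10);

  if (solver.check() == z3::sat)   // sat: a satisfying assignment exists
    std::cout << solver.get_model() << "\n";
  return 0;
}
```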
Lab 4: Type Systems (TypeScript)
This was a pretty interesting lab: given some JavaScript code & failing unit tests, add type info & fix the bugs for 3 different programs. I use TypeScript professionally on a daily basis, so this lab was quite easy. The only challenge is fixing the issue in a correct way: there are many ways to pass the unit tests without using the intended values.
There was also some gray-area grading where the staff manually inspected the code for a sufficient amount of typing. In other words, you must add type annotations to the functions or data structures relevant to the issues.
Lab 5: Cooperative Bug Isolation (LLVM)
In my opinion, this is the most conceptually & technically challenging lab of the class. Some of the LLVM instrumentation can be borrowed from Lab 1, but it requires additional API such as `IRBuilder`. The challenge increases in Part 2 with the implementation of the pseudocode algorithm from the lectures: the solution requires reading log files generated for each iteration and building an algorithm with a feedback loop.
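The instrumentation pattern looks roughly like this; the runtime function `__log_branch` & the predicate id are hypothetical, since the lab defines its own logging runtime:

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Module.h"

using namespace llvm;

// Insert a call to an external logging function right before an instruction,
// so every execution records which predicates it observed.
void instrument(Instruction &I) {
  Module *M = I.getModule();
  LLVMContext &Ctx = M->getContext();

  // Declare (or fetch) the external runtime function: void __log_branch(i32)
  FunctionCallee logFn = M->getOrInsertFunction(
      "__log_branch", Type::getVoidTy(Ctx), Type::getInt32Ty(Ctx));

  IRBuilder<> builder(&I);           // insertion point: right before I
  builder.CreateCall(logFn, {builder.getInt32(42)}); // 42 = a predicate id
}
```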
The lab tests against randomized programs, which makes solution verification much more difficult. I had to rely on Gradescope for most of my solution validation.
Lab 6: Delta Debugging (Java)
This lab was conceptually quite easy. Similar to Lab 2, most of the work was implementing an algorithm from pseudocode. It was a relatively rote lab after figuring out the stopping conditions & potential off-by-one issues (see the sketch below); the only real Java knowledge required was a small amount of string manipulation.
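The lab is in Java, but the loop's shape is language-agnostic. Here's a ddmin-style minimization sketch (in C++, to match the other snippets here) showing where the stopping condition & off-by-one hazards live; it's my own illustration, not the lab's pseudocode:

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Shrink `input` while `fails` (the test oracle) still reports the bug.
template <typename T>
std::vector<T> ddmin(std::vector<T> input,
                     const std::function<bool(const std::vector<T> &)> &fails) {
  size_t n = 2;                           // current number of chunks
  while (input.size() >= 2) {
    size_t chunk = input.size() / n;      // off-by-one hazard: uneven splits
    bool reduced = false;
    for (size_t start = 0; start < input.size(); start += chunk) {
      // Candidate = input with [start, start + chunk) removed
      std::vector<T> candidate;
      for (size_t i = 0; i < input.size(); ++i)
        if (i < start || i >= start + chunk) candidate.push_back(input[i]);
      if (fails(candidate)) {             // bug persists without this chunk
        input = candidate;
        n = (n > 2) ? n - 1 : 2;
        reduced = true;
        break;
      }
    }
    if (!reduced) {
      if (n == input.size()) break;       // stop: finest granularity reached
      n = std::min(n * 2, input.size());  // otherwise refine the partition
    }
  }
  return input;
}
```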
Lab 7: KLEE (C + KLEE)
Similar to Lab 3, this involved using the KLEE tool to add constraints (`klee_assume()`) & assertions (`klee_assert(0)`) to get the desired result.
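The basic pattern looks like this (a minimal sketch; the branch condition is made up, since the real lab code is obfuscated):

```cpp
#include <klee/klee.h>

int main() {
  int x;
  klee_make_symbolic(&x, sizeof(x), "x"); // x becomes a symbolic input

  klee_assume(x >= 0);        // constrain inputs to keep the runtime sane...
  klee_assume(x < 1000);      // ...but don't constrain away failing inputs!

  if (x > 100 && x % 7 == 0)  // stand-in for the hidden buggy condition
    klee_assert(0);           // KLEE emits a test case that reaches here
  return 0;
}
```

KLEE's counterexamples land in `klee-last/` as `.ktest` files you inspect with `ktest-tool`, which is part of why quick iteration is hard.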
I found this lab to be the most tedious. KLEE is not included in the base Docker image, so this lab required more setup, and I had to work around path issues as a result. I also wasted time before realizing that the "error case" produces the expected file at the absolute Linux path `/tmp/`, not within the project directory itself. It's also hard to iterate quickly with KLEE because the output is in a tool-specific format.
Students needed to constrain inputs to keep the runtime under 1 minute, but you could be penalized for excluding potential failure inputs. Apparently you're supposed to find the pattern of failing inputs through trial & error, but there's no good way to avoid over-constraining or under-constraining, and there are no pre-deadline test cases on Gradescope to validate constraints. The staff acknowledged the blind-grading situation, but I doubt it can be easily addressed.
The staff discouraged this approach, but I ended up reverse-engineering the obfuscated C code using ASCII codes & C standard library functions. This gave me way more confidence in my constraints; the "intended" approach seemed prone to missing an edge case.
Quizzes
30 minutes, unproctored, open-everything (sans peers). 1 quiz per lesson and you can complete the quizzes any time before the end of the course.
These can be somewhat tricky because there isn't much opportunity to practice for the questions. The teaching staff provided additional preparation questions & answers, but I found that the questions embedded in the lectures were often more relevant to the quizzes.
I lost trivial points on some quizzes because some questions ask for specific answer formatting. Take your time & reread the instructions before submitting.
Exam
3 hours, proctored, 24 questions. You could bring a page of notes, but you needed to fill in your notes using a very specific Word document template & upload it to Canvas so you could access the document without getting blocked by the proctoring software. Inconvenient, but the process of distilling notes into the specified format helped reinforce many concepts.
I felt the exam was tough but fair, and it only covers 4 lessons of material. The class average was somewhat low at 78%; you should expect to make up for this grade with high lab grades & extra-credit participation.
My only advice is to carefully review all quizzes, lecture questions, & exam preparation questions. Copy useful diagrams (e.g. soundness vs. completeness) to your notes. You do need to know the details of all the specific tools mentioned in the lectures: Monkey, Korat, Cuzz, Randoop.