15th SoCal PLS Program
9:00 am
Breakfast
10:00 am
Parallelizing Seemingly-Sequential
Computations
Important applications, such as machine
learning and log processing, involve iterating over a large data set
with loop-carried dependences across iterations. As such, these
computations are not embarrassingly parallel. In this talk, I will
describe symbolic parallelism, a general methodology for
parallelizing dependent computations. The basic idea is to break
dependences using efficient symbolic reasoning. This work is
primarily motivated by the need to extract parallelism from data
processing queries that combine standard relational operations (such
as map, reduce, filter) with non-relational application-specific
code. By applying symbolic parallelism to such code and exposing the
resulting parallelism to the underlying query optimizer, we obtain
2-4 orders of magnitude performance improvements over Hadoop and SQL
Server.
10:30 am
OpenMP is Not as Easy as It Appears
This paper aims to show that knowing the
core concepts related to a given parallel architecture is
necessary to write correct code, regardless of the parallel
programming paradigm used. Programmers unaware of architecture
concepts, such as beginners and students, often write parallel
code that is slower than the equivalent sequential version. It
is also easy to write code that produces incorrect answers
under specific conditions that are hard to detect and correct. The
increasing popularization of multi-core architectures
motivates the implementation of parallel programming
frameworks and tools, such as OpenMP, that aim to lower the
difficulty of parallel programming. OpenMP uses compilation
directives, or pragmas, to reduce the number of lines that the
programmer needs to write. However, the programmer still has
to know when and how to use each of these directives. The
documentation and available tutorials for OpenMP give the idea
that using compilation directives for parallel programming is
easy. In this paper we show that this is not always the case
by analysing a set of corrections of OpenMP programs made by
students of a graduate course in Parallel and Distributed
Computing at the University of São Paulo. Several
incorrect examples of OpenMP pragmas were also found in tutorials
and official documents available on the Internet. The idea
that OpenMP is easy to use can lead to superficial efforts in
teaching fundamental parallel programming concepts. This can
in turn lead to code that does not exploit the full
potential of OpenMP, and could also crash inexplicably due to
very specific and hard-to-detect conditions. Our main
contribution is showing how important it is to teach core
architecture and parallel programming concepts properly, even
when powerful tools such as OpenMP are available.
11:00 am
Break
11:15 am
Data-Driven Precondition Inference with Learned
Features
We present a new black-box approach for likely
precondition inference. We infer preconditions for a code snippet by
using a form of machine learning over a set of test executions. Like
prior data-driven approaches, we employ feature-based learning --
features are boolean predicates over inputs. But prior approaches
are limited by a fixed set of features or a fixed structure for the
predicates. Our approach overcomes these limitations by employing
generic on-demand feature learning using a program synthesizer, and
by producing a CNF formula over the features without any predefined
structure. We
also show that, given access to the source code, and an SMT solver
which is able to handle the theories used in the source code, we can
infer verified loop invariants. Furthermore, our approach is
agnostic to the solver, program synthesizer, and boolean function
learner used. We have implemented our approach as a tool, PIE
(Precondition Inference Engine), and we demonstrate its capabilities
by inferring preconditions for OCaml library code and verified
invariants for C++ code.
11:45 am
Supporting Reasoning at the Hardware
Level
Modern machines are complicated, even in
their interface. Previous work has tried to formalize
hardware ISAs to aid in verification and reasoning, but the
inherently global, imperative, and stateful nature of existing
machines severely hamstrings these efforts. We take a
different approach: rather than attempting to bridge the gap
between proof and machine, we attempt to shrink it with a new
hardware architecture that presents a purely functional,
lambda calculus-like ISA. We present a working prototype,
hardware usage figures, compiler, and initial proofs of
correctness of some binary programs, and will touch on future
uses and research directions of the machine.
12:15 pm
Lunch
1:30 pm
The Sourcerer 2011 Dataset
We present Sourcerer 2011, a publicly
available dataset of software projects written in the Java
programming language. This dataset contains over 74,000 open
source projects collected from Google Code, SourceForge and
Apache, along with over 119,000 jar files found in those
projects. The dataset comprises over 630M source lines of
code. Because of the dataset's large size, processing this many
projects is a time-consuming task. As such, one characteristic of this
dataset is the provision of three different representations
that support different kinds of research work. The different
representations correspond to stages of processing in a tool
chain, from the downloading of the projects over the Internet
to the creation of a relational database of code entities and
their relations for open-ended querying. We believe this
dataset will support a wide variety of research involving source
code analysis by sparing researchers many necessary but
time-consuming tasks.
2:00 pm
Abstractor: Embedding Imperative Programs Into
Linear Temporal Logic for Hybrid Systems Reasoning
As part of the VeriDrone hybrid-systems
verification project, we need to reason formally about the behavior
of physical systems that contain controllers running imperative
code. I'll describe how we embed programs (in a mostly
language-independent way) into the variant of LTL that we use to
reason about physical systems, including a sound accounting of the
floating-point computation errors incurred by the controller. The
talk will touch on programming-language semantics, temporal logic,
verification, and interval-based floating-point reasoning.
2:30 pm
Break
3:00 pm
Fuzzing for SMT Solvers
SMT solvers are vitally important tools which
are employed in a variety of domains, including automated testing,
synthesis, and verification. While their correctness is often
paramount to the task at hand, they are still susceptible to
bugs. To help find these bugs before they become problems, we employ
black-box language fuzzing techniques to automatically
test SMT solvers. To this end, we instantiate existing general
fuzzing techniques specifically for SMT solvers. In addition, we
employ a novel fuzzing technique which generates edge cases that
stress corners of a provided language standard. Another major
contribution of this work is an empirical evaluation of the
bug-finding effectiveness of fuzzers implemented according to four
orthogonal dimensions: syntax-guided versus semantics-guided,
unrestricted versus restricted to a subset of an input grammar
(i.e. Swarm Testing), random versus bounded depth-first, and the
generation of big versus small test cases. Such an extensive
evaluation of different fuzzing approaches has not been performed
before, and the community at large is heavily focused on only a
small portion of this space. While this work is still relatively
young, we report our experiences with a semantics-guided and
unrestricted fuzzer which generates small inputs according to a
bounded depth-first search strategy. Although common wisdom
suggests that this point in the fuzzer state space should be
relatively uninteresting, this fuzzer alone has already found over a
dozen bugs across multiple popular SMT solvers, including Z3, CVC4,
MathSAT 5, and Boolector. More often than not, these bugs have
exposed correctness problems with the underlying solver under test,
and developers have usually been quick to fix them.
3:30 pm
Meta-FRP for livecoding audio
Functional Reactive Programming (FRP) is an
interesting programming paradigm. Applying it to the
process of livecoding musical composition results in a novel
outlook---the program is the signal, as input to the
compiler/runtime system---and also invites a notion of
*meta*-FRP---we can reify the time-based program editing
process as a meta-program. Indeed, we can take another step
and reify edits to the meta-program as a
meta-meta-program.
4:00 pm
Discussion and "quick ideas"
4:30 pm
Business meeting