Probabilistic Approaches to Computational Problems

A. Introduction
Area: Over the past half century, computers have become ubiquitous in almost every
aspect of human life. Computer Science as an academic discipline, starting from the early
1950s, has been studying the mathematical theory and practical aspects of computing. As
automation and computers have entered more and more areas, it has become critical to
understand, analyze, and solve computational problems from a mathematical and algorithmic
point of view. In the past two to three decades, computer science has experienced another
paradigm shift while computing and communications have seamlessly merged and the resulting
technology has become easily available to an unprecedented number of non-expert users.
This change has been supported by new techniques in theoretical as well as applied areas,
expanding the need for mathematics for computer-science problems, ranging from discrete
mathematics to wider parts of probability, statistics, numerical methods, and geometrical
analysis.
The shift in the use of computation by the general populace, as well as the availability of
cheap and easy methods for generating, collecting, and storing complex data, have resulted
in the need for processing extraordinary amounts of data that most existing algorithms are
unable to cope with. This data can be the whole genome of a species, streaming sound
or video or text, multidimensional data with missing values, sales records, astrophysical
observations, etc. Meanwhile, the computation and communication capabilities of existing
electronics have been unable to keep up. It is now increasingly clear that we will not be able
to catch up with the increase in the amount and complexity of data using technology that
roughly doubles its resources every 18 months (Moore’s Law); we need dramatic changes
in our thinking about methodology. In particular, techniques that use probability theory
and randomization, approximation algorithms that return very quick and reasonably correct
answers to complicated optimization questions, and algorithms that process real-time streams
as they are generated using statistical tools have all proved useful, and they represent
our hopes for dealing with otherwise unwieldy data. To develop such techniques, we need
collaboration among researchers from discrete mathematics, algorithms, machine learning,
probability theory, statistics, numerical methods, and geometrical analysis.
Indiana University Bloomington is already home to an outstanding group of researchers
working on different aspects of probabilistic approaches to computational problems. The
objectives of this proposal are to hire colleagues who will bridge the Departments of Mathematics, Computer Science, and Statistics and facilitate collaborations, to bring these new
hires and the existing researchers together in order to encourage the free flow of ideas, and to
tackle large and practical problems in the area that can lead to patentable algorithms and
commercializable software.
Rationale: The unifying theme of Big Data concerns a very real problem. There are
various reasons for the generation of huge amounts of data. First, over the past three
decades, it has become cheap and easy to generate and store data. From the individual
who keeps innumerable hours of digital home movies to the telecommunications company
that never throws out an IP address that arrives at a router, data gets stored, with the
hopes of one day using it. Second, there are now many more and varied ways of generating
data than simply a few years ago. Devices that used to serve simple purposes such as
telephones have now become means of data generation and storage. Household appliances
or industrial machines that used to be mostly mechanical now have embedded electronics
in them, and they communicate with one another. Sensor networks can have many small,
cheap nodes generating, storing, and communicating data in the wild. Large telescope arrays
and particle accelerators break records every year in the amount of data they generate and
store. Unfortunately, our analytical capabilities, in hardware or software, do not expand
nearly as fast as our data; we are essentially helpless in the face of the uncontrolled increase
and heterogeneous and complex nature of the data that we generate. While data analytics
and big data have become buzzwords across the world, leading to many research and
teaching programs that are doubtless necessary and useful (including at Indiana University), the data crisis
remains unresolved, and can be overcome only through a deep understanding and merging of
mathematical and computational principles underlying the notion of algorithms for big data.
The emphasis of this proposal is on the notion of bridge. We have excellent people
in the Department of Mathematics and the Department of Statistics whose research is
connected with questions and applications in computational science, and equally excellent
but relatively new hires in Computer Science who use mathematical tools and who also
explore mathematical questions, many related to Big Data, in their research. Our goals
are to improve communication across departments with overlapping research and teaching
programs and to increase collaboration. To this end, we need individuals who can cross the
bridge, communicating problems from Computer Science to Mathematics and Statistics and
communicating mathematical and statistical models and tools to Computer Science. While
Big Data has been an area of priority (including a Big Data graduate degree within the School of
Informatics and Computing) at Indiana University Bloomington, the push has mostly been
in an applied direction. Though practical improvements are on the surface what the layman
sees and appreciates, these improvements must be borne out of mathematical foundations
and new theoretical methods; in the absence of developments in theory, applied progress
will eventually stagnate. This is the sense in which our proposed area, which merges
big-data algorithms with their mathematical foundations, needs to emerge on the
Indiana University Bloomington campus. The three departments are committed to
nurturing this area by creating an interdisciplinary seminar, regardless of the funding of this
proposal. The success of this cross-disciplinary effort would be magnified by hiring
additional faculty whose expertise lies closer to the middle of the existing gap.
Objective: The main objective of this proposal is to bring together the three departments,
in order to investigate, from a mathematical and algorithmic angle, problems related to big
data. While the details of the research agenda will no doubt depend to some extent on future
hires into the departments, the research questions to be investigated initially include the
development of probabilistic methods for analyzing and discovering trends in noisy data, for
communication-efficient algorithms for distributed optimization across a network of computers,
for solving optimization problems approximately, for metric embeddings and manifold learning
(reducing a problem in a complicated space to a problem in a simpler space in order to
facilitate its solution or visualization), and for generating random graphs that more accurately
model real-world networks for use in computational algorithms. These are practical problems
in which Indiana University Bloomington already has expertise, and Indiana University
Bloomington would benefit from expanding through new faculty addressing these problems.
The longer term goals will develop through the establishment of a cross-disciplinary seminar
series that will bring faculty and students from Mathematics, Statistics, and Computer
Science together to learn about the mathematical problems that lie within the computational
ones and to learn about the mathematical tools that can be used to solve those problems.
Additional hires who can help close the gap between the departments, as well as expand
the research agenda, will enable the effective communication between existing world-class
faculty, leading to new mathematical theories, grant applications, patentable algorithms, and
commercializable software.
Current state of knowledge: The theoretical study of algorithms usually focuses on
the optimal time or memory that an algorithm requires to solve a particular class
of problems. The computational hardness of a problem (where “hardness” is a technical,
quantifiable term) refers to its inherent difficulty: a computationally hard problem cannot be
solved quickly and effectively no matter how good the researcher or how fast the computer
running the algorithm for it. This notion of hardness is determined by producing upper and
lower bounds on the amount of resources (typically time and memory) required to solve the
problem. To exhibit an upper bound on the time requirements of a problem, it suffices to
produce a “fast” algorithm that solves the problem. When a problem is inherently hard to
solve, it might be necessary to show a lower bound, proving the impossibility of designing
an efficient (e.g., fast) solution. When a real-life problem is complicated and hard (as they
typically tend to be), or when the input data is extremely large (an increasingly problematic
trend), one might be facing algorithms whose runtimes span millions of years, or whose
memory requirements cannot be met with the existing technology for another few generations.
In that case, one would be willing to compromise, and trade certainty or accuracy for efficiency,
settling for a randomized algorithm that solves the problem only with high probability, or
an approximation algorithm that returns an answer that might have a tiny error in it. It
is important to realize that these slight “imperfections” are not hugely significant: one can
typically guarantee that an algorithm will work “with high probability,” which, in practice,
might be indistinguishable from absolute certainty (e.g., failure probability can typically be
made less than that of a lightning strike on the computer running the algorithm). Likewise,
the output of an “approximate” computation might have a tenth of a per cent error; this
type of error is typically dwarfed by the noise in real-life inputs, which introduces significantly
higher variation into the output, making efforts at overly accurate computation superfluous.
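To make the “trade certainty for efficiency” idea concrete, here is a minimal, purely illustrative sketch of Freivalds' classic randomized check for verifying a matrix product: each extra repetition halves the one-sided failure probability, which is how failure rates far below that of a lightning strike are achieved.

```python
import random

def freivalds(A, B, C, trials=30):
    """Probabilistically check whether A*B == C for n x n matrices.

    Each trial multiplies by a random 0/1 vector r and compares A(Br)
    with Cr in O(n^2) time.  If A*B != C, a single trial passes with
    probability at most 1/2, so `trials` independent repetitions drive
    the failure probability below 2**-trials.
    """
    n = len(A)

    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False   # a witness was found: certainly A*B != C
    return True            # A*B == C with overwhelming probability

random.seed(0)
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]   # the true product A*B
C_bad = [[19, 22], [43, 51]]    # off by one entry
print(freivalds(A, B, C_good))  # True, always (one-sided error)
print(freivalds(A, B, C_bad))
```

Note the asymmetry: a correct product is always accepted, while a wrong one slips through only if all thirty random vectors happen to miss the discrepancy.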
Randomized and/or approximation algorithms have become, over time, a big part of the
computer science research endeavor, and they borrow heavily from mathematics and statistics
for tools and techniques. In addition, during the past two decades, as the amount of data has
exploded and creation/storage paradigms have changed, the field has turned to different models
of computation such as streaming and sampling, which borrow techniques from probability
theory, statistics, optimization, and information theory. Mathematical and statistical models
underlie many of the new data generation and structural models; the backbone of the analysis
of computation involving big data is also provided by these fundamental disciplines. Funda
Ergun and Qin Zhang from Computer Science work on algorithms for streaming data; their
techniques borrow significantly from the above mentioned ideas of randomization, sampling,
and data generation. In particular, they are interested in devising small-space algorithms,
that is, algorithms that do not store all of their input, for solving problems related to
massive data arriving as a sequential stream. To achieve this task, one can form small
summaries named “sketches” and store these in memory rather than the entire data set.
Most computations on these summaries are necessarily approximate, as the shrinking likely
leads to a loss of information, but it can be shown using probabilistic techniques that the
results are very close to the most accurate answers, with high probability. In the event
that the computation is impossible, they resort to showing lower bounds using techniques
from information theory and communication complexity. Qin Zhang has lately focused on
a more distributed notion of streams, while Funda Ergun has worked on string problems.
Martha White, also in Computer Science, works in streams in a machine-learning environment,
employing combinatorial and probabilistic techniques. A new hire to Computer Science,
Yuan Zhou, is interested in approximation algorithms, some of which employ randomized
techniques. He is also interested, when approximations fail, in finding lower bounds.
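As an illustration of the sketching idea, below is a minimal Count-Min sketch, one standard small-space stream summary. This toy version stands in for the more sophisticated sketches used in the research described above; Python's built-in `hash` salted with random values stands in for a proper pairwise-independent hash family.

```python
import random

class CountMinSketch:
    """Small-space frequency summary for a data stream.

    Uses d hash rows of width w; an estimate never undercounts, and it
    overcounts by more than (2/w) * stream_length with probability at
    most 2**-d.  Memory is d*w counters regardless of stream length.
    """
    def __init__(self, width=200, depth=5, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.salts = [rng.random() for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        for row, salt in enumerate(self.salts):
            yield row, hash((salt, item)) % self.width

    def add(self, item):
        for row, col in self._cells(item):
            self.table[row][col] += 1

    def estimate(self, item):
        # Every row overestimates, so the minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._cells(item))

cms = CountMinSketch()
stream = ["cat"] * 500 + ["dog"] * 50 + [f"rare{i}" for i in range(300)]
for item in stream:
    cms.add(item)
print(cms.estimate("cat"))   # at least 500, and typically very close to it
```

The sketch answers frequency queries about an 850-item stream while storing only 1000 counters, which is the whole point: memory depends on the accuracy wanted, not on the input size.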
Algorithmic problems, and other problems regarding computation and communications networks, are often formulated in terms of properties of metric graphs. Alternatively,
it is often proven that large classes of problems are of the same computational complexity,
and one of the problems in the class is described in terms of metric properties of graphs.
Once a problem is formulated in terms of properties of a metric graph, many mathematical
techniques can be brought to bear, including combinatorics, analysis, probability, group
theory, and geometry. We give a few examples below that highlight connections among
researchers already at Indiana University Bloomington.
If one models a communications network via a graph, then in one sense a network
is optimal if it has relatively few connections, but one would still need to remove large
numbers of edges to disconnect the graph. This can be made precise asymptotically by
studying families of graphs called expanders. The original proofs that expanders exist
were probabilistic (by Mark Pinsker, a Russian mathematician working in the Institute
for Information Transmission Problems in the Russian Academy of Sciences, Moscow). In
a major breakthrough, Grigory Margulis, currently the Erastus L. DeForest Professor of
Mathematics at Yale, used group theory and analysis, particularly property (T), to give explicit
constructions of expanders. For the interested reader, property (T) has its own Wikipedia
page: https://en.wikipedia.org/wiki/Kazhdan's_property_(T). David Fisher, in the
Department of Mathematics at Indiana University, has worked on property (T) and its
generalizations and strengthenings, in part with Margulis. For several decades, all explicit
constructions of expanders either depended on some variant of property (T) or on other
results from group-representation theory related to number theory. More recently, ideas from
arithmetic combinatorics and particularly work of Jean Bourgain, Nets Katz, and Terry Tao
have been used to give many additional explicit constructions of expander families.
These constructions are more robust than their predecessors, and it may be that these new
constructions will be useful to computer scientists. Yuan Zhou, from Computer Science, is
interested in the new notion of probabilistic graph expanders for optimization problems. A
related area of research is the study of sets with small doubling in general groups. This is
motivated in part by constructions of expanders. Other recent constructions of expanders
are by a zig-zag construction due to Avi Wigderson and collaborators and also by operator
algebra techniques due to Laurent Lafforgue. Michael Larsen, also in the Indiana University
Mathematics Department, has analyzed a well-known expander and developed an efficient
algorithm for finding good but not-quite-optimal routes through the resulting network.
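For readers unfamiliar with expansion, the quantity being kept large can be computed directly on toy graphs. The brute-force sketch below (illustrative only; real expander analysis is spectral or group-theoretic) contrasts a cycle, which is easy to disconnect, with a complete graph:

```python
from itertools import combinations

def edge_expansion(n, edges):
    """Brute-force the edge expansion h(G) = min over nonempty S with
    |S| <= n/2 of |boundary(S)| / |S|.

    Exponential in n, so only for tiny illustrative graphs.  Expander
    families are exactly those whose h(G) stays bounded away from 0 as
    n grows while the degree stays fixed.
    """
    best = float("inf")
    for k in range(1, n // 2 + 1):
        for S in combinations(range(n), k):
            S = set(S)
            boundary = sum(1 for u, v in edges if (u in S) != (v in S))
            best = min(best, boundary / len(S))
    return best

cycle = [(i, (i + 1) % 8) for i in range(8)]               # sparse, poor expander
complete = [(u, v) for u, v in combinations(range(8), 2)]  # dense, great expander
print(edge_expansion(8, cycle))     # 0.5: cut an arc of 4 nodes, only 2 boundary edges
print(edge_expansion(8, complete))  # 4.0: any 4-node set has 16 boundary edges
```

The interesting regime, of course, is graphs that are as sparse as the cycle yet expand almost as well as the complete graph; that is what the explicit constructions above provide.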
Embedding metric spaces into standard metric spaces, such as Euclidean space, is an
area of interest to faculty in Computer Science, Mathematics, and Statistics. The powerful
techniques underlying this research find numerous applications in the design and analysis of
algorithms, including nearest-neighbor search, semidefinite programming and sparsest cuts,
combinatorial optimization problems in network design, etc. The general idea is to convert
a problem born in a complex setting to one lying in a relatively easier setting. Formally,
given a problem, we try to embed its underlying metric space (the original space) into
a simpler one (the host space) and then solve the problem in the host space, where the
algorithm design is much easier. One goal is to make the “distortion” (its definition is based
on applications) of the embedding as small as possible. In addition, a special property of
Euclidean space, known as negative type and defined via an embedding property, is not
only useful in Computer Science, but also in statistics, where it leads to powerful algorithms
for clustering and for testing independence, among others. Lyons, in the Department of
Mathematics at Indiana University, has recent work on this topic motivated by statistics. Qin
Zhang in the Department of Computer Science at Indiana University also has recent work on
this topic motivated by the desire to find better algorithms for string-similarity search/join,
a fundamental problem in database optimization. Funda Ergun, also in the Department of
Computer Science at Indiana University, has work on embeddings motivated by her study of
strings and their patterns and edit distances.
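A standard concrete instance of low-distortion embedding into Euclidean space is the Johnson–Lindenstrauss random projection. The sketch below (illustrative, with dimensions chosen arbitrarily) measures the distortion empirically:

```python
import math
import random

def jl_embed(points, target_dim, seed=0):
    """Embed high-dimensional points into target_dim dimensions by a
    random Gaussian projection (the Johnson-Lindenstrauss construction).

    Pairwise Euclidean distances are preserved up to a (1 +/- eps)
    factor with high probability once target_dim is on the order of
    log(#points) / eps**2, independent of the original dimension.
    """
    rng = random.Random(seed)
    d = len(points[0])
    scale = 1 / math.sqrt(target_dim)
    # One random matrix, reused for every point.
    R = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(target_dim)]
    return [[scale * sum(r[j] * p[j] for j in range(d)) for r in R]
            for p in points]

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

rng = random.Random(1)
points = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(10)]
low = jl_embed(points, 200)
# Empirical distortion: ratio of embedded to original distance, per pair.
ratios = [dist(low[i], low[j]) / dist(points[i], points[j])
          for i in range(10) for j in range(i + 1, 10)]
print(min(ratios), max(ratios))   # both close to 1
```

Here the host space has 200 dimensions instead of 1000, yet every pairwise distance survives nearly unchanged; algorithms run on the embedded points inherit that guarantee.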
Manifold learning (also called “nonlinear dimension reduction”) posits that high-dimensional
feature vectors lie on (or near) a low-dimensional manifold. Many popular techniques for
learning an unknown data manifold begin by constructing a locally connected graph whose
nodes correspond to feature vectors, then embedding the graph to construct a low-dimensional
Euclidean representation of the presumed manifold. One such representation uses eigenfunctions of the Laplacian on the graph, giving what is known as a spectral embedding, an
extremely popular technique in machine learning. Lyons has recent work with a computer
scientist at the University of Washington on spectral embedding of graphs, and is currently
working with Judge (in the Indiana University Bloomington Department of Mathematics)
on spectral embeddings of manifolds. However, Lyons’ work is in a mathematical direction;
bridge hires would facilitate interdisciplinary cross-fertilization and applications.
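A minimal illustration of spectral embedding, assuming nothing beyond the standard graph Laplacian L = D − A: power iteration recovers the second eigenvector (the Fiedler vector), whose signs alone already separate two loosely joined clusters.

```python
import math

def fiedler_vector(n, edges, iters=2000):
    """Approximate the second eigenvector of the graph Laplacian L = D - A
    (the Fiedler vector) by power iteration on the shifted matrix c*I - L,
    projecting out the constant top eigenvector at every step.  The sign
    pattern of the result is a 1-D spectral embedding / 2-way clustering.
    """
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    c = 2 * max(deg)   # shift so that c*I - L is positive semidefinite

    def apply_shifted(x):   # y = (c*I - L) x = (c - deg_i) x_i + (A x)_i
        y = [(c - deg[i]) * x[i] for i in range(n)]
        for u, v in edges:
            y[u] += x[v]
            y[v] += x[u]
        return y

    x = [math.sin(i + 1) for i in range(n)]   # arbitrary non-constant start
    for _ in range(iters):
        x = apply_shifted(x)
        mean = sum(x) / n                     # deflate the constant eigenvector
        x = [xi - mean for xi in x]
        norm = math.sqrt(sum(xi * xi for xi in x))
        x = [xi / norm for xi in x]
    return x

# Two triangles {0,1,2} and {3,4,5} joined by the single edge (2,3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
v = fiedler_vector(6, edges)
print([x > 0 for x in v])   # one triangle positive, the other negative
```

Which triangle comes out positive depends on the starting vector, but the two clusters always land on opposite signs; in higher dimensions one keeps several eigenvectors instead of one.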
A large portion of Indiana University Bloomington Department of Statistics Professor
Michael Trosset’s research program involves constructing approximate representations (usually
in low-dimensional Euclidean space) of objects from information about pairwise relationships
between objects. In the context of this proposal, these are problems best described under the
heading of graph embedding. Here are some examples:
1. Determine the 3-dimensional structure of a molecule from information about bond
lengths, bond angles, and other information (often determined by NMR spectroscopy)
about pairwise interatomic distances.
2. Construct Euclidean representations of conceptual spaces, either for visualization or for
subsequent statistical analysis. Such applications are common in psychology and statistics, in which disciplines “graph embedding” is known as “multidimensional scaling.”
Rob Nosofsky in the Indiana University Bloomington Department of Psychological and
Brain Sciences also has work of this type.
3. Latent space approaches to social network analysis were first proposed in 2002. These
approaches model networks as random graphs and posit that nodes correspond to
positions in an unobserved space. The probability of an edge between two nodes is a
function of their positions in the latent space, e.g., the distance or the inner product
between them. Recently, several researchers have argued that hyperbolic manifolds
generate more realistic random graphs than do flat manifolds. While some things
are known about embedding graphs in hyperbolic manifolds, this territory is largely
unexplored.
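A toy sketch of classical multidimensional scaling in the one-dimensional case (illustrative only; real applications use several dimensions and robust numerical libraries): double-center the squared distances to obtain a Gram matrix, then extract its top eigenvector.

```python
import math

def classical_mds_1d(D, iters=500):
    """Recover 1-D coordinates from a pairwise-distance matrix by
    classical multidimensional scaling: double-center the squared
    distances to get a Gram matrix B, then extract its top eigenvector
    by power iteration.  Output is unique up to translation/reflection.
    """
    n = len(D)
    sq = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(sq[i]) / n for i in range(n)]
    tot = sum(row) / n
    # Gram matrix of the centered configuration.
    B = [[-0.5 * (sq[i][j] - row[i] - row[j] + tot) for j in range(n)]
         for i in range(n)]
    x = [1.0] + [0.0] * (n - 1)   # power iteration for the top eigenpair
    for _ in range(iters):
        x = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
    lam = sum(x[i] * sum(B[i][j] * x[j] for j in range(n)) for i in range(n))
    return [math.sqrt(max(lam, 0.0)) * v for v in x]

pts = [0.0, 1.0, 3.0, 7.0]   # ground-truth collinear configuration
D = [[abs(a - b) for b in pts] for a in pts]
coords = classical_mds_1d(D)
print([abs(coords[0] - c) for c in coords])   # recovers distances 0, 1, 3, 7
```

Because the input points are exactly collinear, the Gram matrix has rank one and the recovery is exact; with noisy or higher-dimensional data one keeps the top few eigenvectors and obtains an approximate configuration instead.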
The sparsest cut problem in combinatorial optimization is to divide the nodes of a graph
into two sets so as to minimize the ratio of the number of edges crossing between the two
sets to the number of nodes in the smaller part of the partition. This objective function
favors solutions that are both sparse (few edges crossing the cut) and balanced (close to a
bisection), which is the opposite of the goal with expander graphs. That is, graphs with
optimal sparsest cuts are easily disconnected and make bad communication networks. Thus,
it is important to determine the sparsest cut for a given graph. In this context, we have a
problem where exact solutions are known to be NP-hard but where results exist showing
that certain kinds of approximate solutions are possible in polynomial time. Finding the
optimal degree of approximation of solutions in this context turns out to be related to finding
embeddings with small distortion of metric graphs of negative type into L1 . Motivated by this
problem, Michel Goemans in Department of Mathematics at the Massachusetts Institute of
Technology and Nati Linial in the School of Computer Science and Engineering at the Hebrew
University of Jerusalem conjectured that every metric graph of negative type admitted an
embedding of bounded distortion in L1 . This conjecture was disproven by Khot and Vishnoi.
Soon thereafter, another, more natural and stronger, counterexample was given by Cheeger
and Kleiner using a new form of differentiation theory. The exact result of Cheeger and
Kleiner, and in particular the graphs they studied, were conjectured by Lee and Naor not
to admit bounded-distortion embeddings in L1 . A key aspect of their use of differentiation
theorems is that they reduce the problem to a geometric one, where one can rescale the
embedding in both domain and range, and so differentiation as an infinitesimal operation
makes sense. For other problems in this area, it remains important to study embeddings
of graphs where one cannot reduce the problem to a geometric one that admits rescalings.
For these other problems, it seems likely that the correct approach is to use the idea of
coarse differentiation and coarse analysis that was introduced by Eskin, Fisher (of Indiana
University), and Whyte in their work on geometric group theory. Naor has proposed a
particular problem in this direction to Eskin and Fisher.
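For a tiny graph, the sparsest-cut objective defined above can simply be brute-forced, which makes the role of approximation clear: the search below is exponential in the number of nodes (illustrative code; the bridge edge in the example makes the optimum obvious).

```python
from itertools import combinations

def sparsest_cut(n, edges):
    """Exhaustively find the cut minimizing
    (edges across the cut) / (nodes in the smaller side).

    Exponential time: the exact problem is NP-hard in general, which is
    why approximation via metric embeddings matters for large graphs.
    """
    best_ratio, best_side = float("inf"), None
    for k in range(1, n // 2 + 1):
        for S in combinations(range(n), k):
            S = set(S)
            crossing = sum(1 for u, v in edges if (u in S) != (v in S))
            if crossing / k < best_ratio:
                best_ratio, best_side = crossing / k, S
    return best_ratio, best_side

# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
ratio, side = sparsest_cut(6, edges)
print(ratio, sorted(side))   # 1/3: cutting the bridge isolates one triangle
```

The embedding-based algorithms discussed above replace this exhaustive search with a geometric relaxation whose rounding error is exactly the distortion of the underlying embedding.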
In addition, there are many uses of random walks on graphs, effective resistance, and
random spanning trees in Theoretical Computer Science; all of these mathematical topics
are central themes in Lyons’ work. To cite just one example, a recent award-winning paper
provides a probabilistic algorithm that does better than all previous ones for solving the
asymmetric traveling salesman problem. A crucial feature in this new work is the use of
random spanning trees coming from electrical networks. Some of the properties they use
are crucial also to work of Lyons and others in their own study of random spanning trees
and their analogues on infinite graphs. This recent paper cites the book Lyons has just
finished and, in turn, the book has now incorporated some of the new results of this paper.
(Incidentally, this new book has been and remains available online for free. It has been the
most or almost the most popular download on IUB’s domain pages.iu.edu, formerly known
as mypage.iu.edu, for several years. For example, it had 164,460 downloads in 2014 and
161,593 downloads in 2015. Unfortunately, IU no longer provides this download data. Among
its endorsements at www.cambridge.org/9781107160156 is one from Daniel Spielman, a
Nevanlinna Prize winner of the Computer Science department at Yale University, who said,
“It is one of my favorite references for probability on finite graphs. If you want to understand
random walks, isoperimetry, random trees, or percolation, this is where you should start.”)
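One simple example of the interplay between random walks and spanning trees is the Aldous–Broder algorithm, which samples a uniformly random spanning tree from the trace of a simple random walk (a minimal sketch on a small grid):

```python
import random

def random_spanning_tree(adj, seed=0):
    """Sample a uniformly random spanning tree by the Aldous-Broder
    algorithm: run a simple random walk on the graph and keep, for each
    vertex other than the start, the edge by which the walk first
    entered it.  The resulting edge set is a spanning tree, uniform
    over all spanning trees of the graph.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    current = nodes[0]
    visited = {current}
    tree = []
    while len(visited) < len(nodes):
        nxt = rng.choice(adj[current])
        if nxt not in visited:
            visited.add(nxt)
            tree.append((current, nxt))   # first-entrance edge
        current = nxt
    return tree

def grid_adj(w, h):
    """Adjacency lists for a w x h grid graph."""
    adj = {}
    for x in range(w):
        for y in range(h):
            nbrs = []
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                if 0 <= x + dx < w and 0 <= y + dy < h:
                    nbrs.append((x + dx, y + dy))
            adj[(x, y)] = nbrs
    return adj

tree = random_spanning_tree(grid_adj(3, 3))
print(len(tree))   # 8 edges: a spanning tree of the 9 grid nodes
```

That a random walk samples spanning trees exactly uniformly is one of the elegant connections between electrical networks, walks, and trees exploited in the work cited above.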
These particular problems and areas demonstrate close connections between ideas in
theoretical computer science related to graph theory and certain areas of mathematics and
statistics. With the additional resources a successful proposal would bring, these connections
would be strengthened and solidified, making Indiana University Bloomington a world-class
leader in this relatively new cross-disciplinary area.
B. Specific Aims
The objective is to become a world leader in the development of probabilistic tools for
computational problems, challenging the paradigm that hard computational problems are
too difficult to solve by providing tools that allow them to be solved with high probability or
solved more easily by cleverly incorporating probabilistic methods. In order to attract faculty,
postdoctoral, and student talent, the proposal has specific aims that illuminate several major
problems in the area.
The overarching problem is to discover trends in massive structures with bounded
resources.
Specific Aim 1: Finding trends in large, imperfect data
Problems that involve analyzing text find many applications in computational biology,
text processing, and data mining. Typically the text tends to be very long (often larger
than a computer can hold in main memory), and with some errors (noise). An example of
noise is typos in a given text; another is “insertions or deletions” in the human genome. The
problem of identifying trends in long data streams is central to many fields; these trends
can be the existence of a pattern (e.g., the “search” function in text), repetitive trends that
data exhibits (do Earth’s temperature readings display a periodic trend?) or certain types
of correlation between elements (e.g., “tandem repeats,” of the type defabcdabcdklm in
a genome sequence). When the data contain noise, as is the norm with real-life data, it
becomes significantly harder to search for trends in the input. We then have to develop an
approximation algorithm where not only the output, but also the input can be approximate.
Typically algorithms for problems related to long, imperfect inputs resort to randomization,
which helps not only with minimizing the resources needed, but also with “smoothing over”
the imperfections in the data, similar to polling a large population before an election. However,
unlike polling, the trends searched are quite complex and specific, the input massive, and
the resources minuscule. One relatively new area that deals with large data and small
resources is “sublinear algorithms”; these use less space or time than even a single full
reading of the input requires, and they produce approximate, probabilistic results. Techniques such as Fourier analysis,
while helpful, may require time or memory that we cannot afford for such problems. While the
algorithms that eventually work tend to be deceptively simple, it takes involved probabilistic
tools to prove that they work and, thus, to be confident of success. In the case of noisy
data, the existing knowledge needs to be significantly improved through the cooperation of
algorithms researchers from Computer Science, probability theorists, and statisticians.
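The polling analogy can be made concrete. A minimal sketch (with an artificial predicate standing in for a real trend) estimates how often a local pattern occurs by sampling a fixed number of positions, independent of the input length:

```python
import random

def sample_fraction(text, predicate, samples=2000, seed=0):
    """Estimate the fraction of positions of `text` satisfying
    `predicate` by random sampling instead of a full scan.

    By the Hoeffding bound, the estimate is within eps of the truth
    with probability at least 1 - 2*exp(-2 * samples * eps**2),
    a guarantee independent of len(text) -- the sublinear-algorithms
    principle in its simplest form.
    """
    rng = random.Random(seed)
    hits = sum(predicate(text[rng.randrange(len(text))])
               for _ in range(samples))
    return hits / samples

# A million-character "stream", roughly 30% of which is the symbol 'a'.
rng = random.Random(1)
text = "".join("a" if rng.random() < 0.3 else "b" for _ in range(1_000_000))
est = sample_fraction(text, lambda ch: ch == "a")
print(est)   # close to 0.30, after reading only 2000 of the million characters
```

Real trends (periodicity, tandem repeats, approximate matches) need far subtler sampling schemes and correctness proofs, but the resource accounting is the same: work scales with the desired accuracy, not the data.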
Solutions to these types of problems are of interest to industry, as they represent real-life
questions and applications. Industrial colleagues have repeatedly asked whether we can
compose trends – for instance, a period that grows steadily over time – as well as how we
can deal with different kinds of noise (for instance, rather than having a few wildly wrong
data points, how about having all data points slightly off?). We expect to collaborate with
industry researchers on many of our Big Data theory problems.
Specific Aim 2: Probabilistic methods for communication-efficient algorithms for
distributed optimization across a network of computers
As datasets grow ever larger, a common practice in computer science is to
store big graph or geometric data on different machines connected by a network. When
processing a query (e.g., find a good clustering), we need to do a distributed optimization
across the network, whose running time is typically dominated by the total communication
between the machines. An emerging research area in theoretical computer science is to design
communication-efficient algorithms for distributed optimization.
An important question is to understand the communication limits, that is, how much
communication must be transmitted through the network in order to solve the problem. This
is called “multi-party communication complexity” in theoretical computer science, and is
closely related to information theory.
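A minimal example of the communication/computation trade-off: computing a global mean by having each machine ship a constant-size summary rather than its data (the shard contents below are arbitrary illustrative numbers):

```python
def distributed_mean(shards):
    """Compute the exact global mean of data split across machines.

    Naively, every machine ships its whole shard, so communication is
    proportional to the data.  Instead, each machine sends a single
    (local_sum, local_count) summary: O(1) words per machine,
    independent of shard size -- the simplest instance of trading
    local computation for communication.
    """
    messages = [(sum(shard), len(shard)) for shard in shards]  # per-machine summary
    total, count = map(sum, zip(*messages))
    return total / count

shards = [[1, 2, 3], [10, 20], [4, 4, 4, 4, 0]]
print(distributed_mean(shards))   # 5.2, identical to the centralized answer
```

For the mean the summary is exact; for richer queries such as clustering, the research question is precisely which lossy summaries suffice and how much communication is provably unavoidable.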
Specific Aim 3: Inference with random graphs
Many times graphs are structures imposed by researchers on data in order to discover
information. Other times, real-world networks exist already, such as various social networks
or industrial networks. These real-world networks usually have a structure quite different
from simple nearest-neighbor graphs that one can draw from geometric data. In particular,
they often have the so-called “small-world” property: only a few edges need to be traversed
to go from any one node to any other. This property is shared by many common models of
random networks, which is a major reason for applied interest in random graphs. However,
classical statistical techniques were not designed to handle the essential sorts of dependencies
inherent in network data. This makes inference from network data quite challenging; indeed,
this is the central problem in the field today. We aim to provide useful tools for such popular
tasks, together with theoretical justification for such tools.
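A minimal illustration of the small-world property, using the simplest random-graph model G(n, p) (illustrative parameters only; the models of real interest are richer): distances measured by breadth-first search are tiny compared to the number of nodes.

```python
import random
from collections import deque

def random_graph(n, p, seed=0):
    """Erdos-Renyi G(n, p): each of the n*(n-1)/2 possible edges is
    present independently with probability p."""
    rng = random.Random(seed)
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def bfs_distances(adj, src):
    """Shortest-path distances from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

adj = random_graph(200, 0.05)
dist = bfs_distances(adj, 0)
reachable = [d for d in dist.values() if d > 0]
# Small world: typical distance ~ log(n) / log(n*p), tiny compared to n.
print(max(reachable), sum(reachable) / len(reachable))
```

The statistical difficulty described above is visible even here: the presence of one edge changes the distances involving many other pairs, so observations are anything but independent.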
Specific Aim 4: Computing with massive graphs
Today one encounters regularly graphs of massive size, the internet and its connections
being the most common example. Because of their size, such graphs do not permit computation
that requires the entire graph to be known. Instead, local algorithms that work near a given
node of interest, or near several randomly chosen nodes, have become important. For example,
one such problem is to find, given a node, a cluster of nodes that contains that node but
that has “few” connections to the outside graph. This is known as the local graph clustering
problem. Practical applications have included understanding the community structure of
social and information networks. Solutions to this problem are also used as a subroutine for
other partitioning problems. Lyons has written a paper (in press) that presents the best algorithm
to date for a similar problem, that of partitioning the node set into two equal parts while
minimizing (or maximizing) the number of connections between them; his algorithm is a
local one and works on graphs of large girth. Lyons was drawn to the problem for purely
mathematical reasons, as the technique of local algorithms turns out to be related to a field
in mathematics called ergodic theory. Practical applications include understanding disease
spread, transportation and evacuation models, business and computer security intelligence,
systems biology applications, the power grid, and full-scale modeling on massive networks.
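A toy version of local graph clustering (illustrative only, and not the algorithms discussed above): grow balls around a seed node and keep the one of smallest conductance, under an explicit volume budget that keeps the computation local.

```python
from collections import deque

def local_cluster(adj, seed_node, max_radius=3, max_volume=10):
    """Grow BFS balls around seed_node and return the ball with the
    smallest conductance: edges leaving the ball divided by the total
    degree inside it.  Only the neighborhood of the seed is ever
    touched -- the defining feature of a local algorithm.  The volume
    budget keeps the search local and rules out the trivial
    whole-graph answer (which always has conductance zero).
    """
    dist = {seed_node: 0}
    queue = deque([seed_node])
    best, best_cond = {seed_node}, float("inf")
    for radius in range(1, max_radius + 1):
        next_queue = deque()           # extend the BFS by one more layer
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    next_queue.append(v)
        queue = next_queue
        ball = {v for v, d in dist.items() if d <= radius}
        volume = sum(len(adj[v]) for v in ball)
        cut = sum(1 for v in ball for w in adj[v] if w not in ball)
        if 0 < volume <= max_volume and cut / volume < best_cond:
            best, best_cond = ball, cut / volume
    return best, best_cond

# Two triangles bridged by one edge: the seed's triangle is the natural cluster.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
cluster, cond = local_cluster(adj, 0)
print(sorted(cluster), cond)   # [0, 1, 2] with conductance 1/7
```

State-of-the-art local clustering replaces BFS balls with personalized-PageRank or evolving-set processes, but the objective (small conductance, touched nodes bounded by the output size) is the same.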
Specific Aim 5: k-SAT
One computationally difficult problem that has been intensively studied is called k-SAT.
It is a mathematical abstraction known to be computationally equivalent to questions ranging
from protein-folding to aircraft-crew scheduling in the sense that an efficient algorithm that
was able to decide whether any instance of k-SAT has a solution would immediately be
translatable to an algorithm for all the other problems.
In k-SAT, there are n Boolean variables and m clauses, where each clause is a disjunction
of k literals (variables or their negations) and is satisfied if at least one of its literals is
true. The question is whether there is an assignment of truth values to the variables that
satisfies all m clauses simultaneously. In general, one is interested in m growing proportionally to n
as n → ∞. Although, like the other such problems, it is believed that no efficient algorithm
exists, it might still be the case that most instances can indeed be solved quickly. Thus, much
effort has gone into studying random instances of k-SAT.
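A minimal brute-force formulation makes the exponential wall explicit (illustrative; the research interest is in large random instances far beyond brute force):

```python
from itertools import product

def satisfiable(n, clauses):
    """Brute-force k-SAT: clauses are tuples of nonzero ints, where
    literal +i means variable i is True and -i means variable i is
    False; a clause is satisfied when at least one literal holds.

    Tries all 2**n assignments, so only tiny n is feasible -- exactly
    the exponential barrier that motivates studying random instances.
    """
    for bits in product([False, True], repeat=n):
        value = lambda lit: bits[abs(lit) - 1] == (lit > 0)
        if all(any(value(lit) for lit in clause) for clause in clauses):
            return True
    return False

# A satisfiable 3-SAT instance (all-True works):
print(satisfiable(3, [(1, 2, 3), (-1, -2, 3), (1, -3, 2)]))   # True
# All 8 sign patterns over 3 variables: every assignment violates one clause.
clauses = [(s1 * 1, s2 * 2, s3 * 3)
           for s1, s2, s3 in product([1, -1], repeat=3)]
print(satisfiable(3, clauses))                                # False
```

Random instances are generated by drawing each clause's variables and signs uniformly; the conjectured phase transition concerns how the answer above flips from almost-surely True to almost-surely False as m/n crosses a threshold.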
Random k-SAT is connected to statistical physics and tools from that area have brought
new light to this computational problem. A major conjecture in the area is that there are
phase transitions in the (apparent) difficulty of solving k-SAT, depending on the ratio m/n.
In other words, for large m/n, there is unlikely to be any solution, whereas for small m/n,
not only is there likely to be a solution, but an algorithm is known that is likely to provide
such a solution. The conjecture is that there is a sharp cut-off between these two possibilities
that does not depend on m and n but only on the ratio.
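The conjectured cut-off can already be glimpsed empirically at small n. The self-contained sketch below (our own illustrative code; the conjecture itself concerns n → ∞, and for k = 3 the critical ratio is believed to be about 4.27) estimates by brute force the fraction of satisfiable random 3-SAT instances at ratios well below and well above that value.

```python
import itertools
import random

def satisfiable(n, clauses):
    """Brute-force satisfiability check over all 2^n assignments."""
    for bits in itertools.product((False, True), repeat=n):
        assign = {i + 1: b for i, b in enumerate(bits)}
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def sat_fraction(n, ratio, k=3, trials=30, seed=0):
    """Estimate P(a random k-SAT instance with m = ratio*n clauses is
    satisfiable) by Monte Carlo over `trials` random instances."""
    rng = random.Random(seed)
    m = int(ratio * n)
    hits = 0
    for _ in range(trials):
        clauses = [[v * rng.choice((1, -1))
                    for v in rng.sample(range(1, n + 1), k)]
                   for _ in range(m)]
        hits += satisfiable(n, clauses)
    return hits / trials

# Well below the conjectured 3-SAT threshold almost every instance is
# satisfiable; well above it almost none are, even at n = 10.
frac_below = sat_fraction(10, 2.0)
frac_above = sat_fraction(10, 6.0)
```

Sweeping intermediate ratios with larger n would trace out the steep drop near the threshold, at exponentially growing cost for the brute-force check.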
Other problems in the area have to do with the phenomenon that solutions cluster in
groups determined by many of the variables being “frozen,” i.e., taking only one value.
Survey Propagation is an algorithm for finding satisfying assignments; it works by assuming
that marginals over cluster projections essentially factorize.
Determining the validity of this assumption is an open problem in the area.
Phase transitions and complexity were the topics of a special semester at the Simons
Institute for the Theory of Computing at UCB (https://simons.berkeley.edu/programs/
counting2016) that just concluded. Techniques in this field involve, besides Computer
Science, various elements of discrete probability theory and statistical physics. Complexity
theory also involves the analysis of Boolean functions, which is a form of discrete harmonic
analysis. This was also a main topic of another special semester at the Simons Institute
(https://simons.berkeley.edu/programs/realanalysis2013). Lyons has expertise in
discrete probability and statistical physics, as well as harmonic analysis.
C. Design and Methods
Much of the proposed research is theoretical, involving no experiments beyond limited
application of already-developed theoretical techniques to real data. For
a few of the proposed projects, there are natural ways of evaluating and interpreting the
data. For generating graphs that mimic real-world networks, graphs produced by a proposed
algorithm will be measured for their features and compared against values that come from
real-world networks. For problems such as metric embeddings and manifold learning, the
standard tool used to evaluate techniques is to use data where the answer is known and
determine the accuracy of the algorithms or the predictions from the model. These types of
techniques will be used to evaluate models and algorithms developed in this proposal. For
some problems involving long streams, it is possible to compare results theoretically to the
performance of existing algorithms.
Much of this proposal is about establishing connections on the Indiana University Bloomington campus that would allow the expertise of mathematics and statistics faculty to be
brought to bear on practical problems in Computer Science. The PIs are intrinsically motivated to pursue this cross-disciplinary research, which would be facilitated by additional
faculty whose research already crosses these disciplines or comes closer to doing so.
D. Timetable
Year 1: 2016-17
• Begin an interdisciplinary seminar devoted to Probabilistic Approaches to Computational Problems
• Hire one interdisciplinary postdoctoral fellow for a 3-year appointment to be housed in
the Department of Mathematics
• Draft ad for three interdisciplinary faculty hires
Year 2: 2017-18
• Hire a cluster of three faculty members in Mathematics, Theoretical Computer Science,
and/or Statistics
• Hire one interdisciplinary postdoctoral fellow for a 3-year appointment to be housed in
the Department of Mathematics
• Hire one interdisciplinary postdoctoral fellow for a 2-year appointment to be housed in
the Department of Computer Science
• Continue and ramp up the interdisciplinary seminar devoted to Probabilistic Approaches
to Computational Problems
• Submit a Simons “Targeted Grants to Institutes” proposal
Year 3: 2018-19
• Hire one interdisciplinary postdoctoral fellow for a 2-year appointment to be housed in
the Department of Statistics
• Hire one interdisciplinary postdoctoral fellow for a 2-year appointment to be housed in
the Department of Computer Science
• Submit a Simons Investigator Grant proposal
• Submit a collaborative National Science Foundation Research proposal
• Continue the interdisciplinary seminar devoted to Probabilistic Approaches to Computational Problems
Year 4: 2019-20
• Hire one interdisciplinary postdoctoral fellow for a 1-year appointment to be housed in
the Department of Computer Science
• Continue the interdisciplinary seminar devoted to Probabilistic Approaches to Computational Problems
• Apply for a W. M. Keck Foundation grant
E. Significance and Impact
The significance and impact of each specific aim is described below, and each project would
by itself be a major advance in some aspect of computation; yet this proposal is more than
the sum of its projects. Its significance lies in its vision to
connect the work done on campus in mathematics and statistics to the work done on campus
in computer science and to solidify that connection using postdoctoral fellows and strategic
new faculty hires.
Specific Aim 1: Discovering trends in the presence of noisy data. From NetFlix
recommendations to modern information-technology security algorithms to precision medicine,
researchers and industry need to be able to efficiently and accurately identify trends in massive
noisy datasets.
Specific Aim 2: Probabilistic methods for communication-efficient algorithms for
distributed optimization across a network of computers. Distributed optimization is
important for analyzing massive data sets that are stored on multiple machines, where data is
collected from multiple sensors, for instance. It is also important for robotics, cognitive-radio
networks, and mobile-device networks.
Specific Aim 3: Inference with random graphs.
Examples of the scientific applications include inferring the structure of regulatory networks
in biological systems and mapping the connections in the brain. In order to understand and
predict the behavior of epidemics, with the possibility of controlling their virulence, one needs
to infer the structure of the network along which disease spreads.
Specific Aim 4: Computing with massive graphs. Practical applications include
understanding disease spread, transportation and evacuation models, business and computer
security intelligence, systems biology applications, the power grid, and full-scale modeling on
massive networks.
Specific Aim 5: k-SAT. The problem of proving the random k-SAT threshold is of
theoretical importance; it is a 30-year-old conjecture. Algorithms for solving k-SAT problems
with high probability have many applications, such as to combinatorial equivalence checking,
automated theorem proving, and software verification. Algorithms have been developed that
work well in the regime below the critical threshold where clustering is not an issue. Algorithms
that work well in the regime below the critical threshold where clustering is an issue remain
to be developed.
F. Future Funding/Sustainability
The standard source for funding for mathematics and computing is the National Science
Foundation. If successful, this proposal would open up new sources of funding and create
patentable algorithms and commercializable software. For example, the NSF has a program
called “Critical Techniques, Technologies and Methodologies for Advancing Foundations and
Applications of Big Data Sciences and Engineering (BIGDATA)”. They particularly solicit
projects that are “collaborative, involving researchers from domain disciplines and one or
more methodological disciplines, e.g., computer science, statistics, mathematics, simulation
and modeling, etc.”
Potential private foundation sources of funding include the Simons Foundation, the Clay
Institute, and the W. M. Keck Foundation. The Simons Foundation’s Investigators program
funds researchers in Mathematics and Theoretical Computer Science during their most
productive years at a level of $100,000/year. The program also includes a special Math+X
award to encourage novel collaborations between mathematics and other fields. While we
are not proposing to solve the famous P vs. NP problem, which would lead to a Millennium
Prize from the Clay Institute, faculty hired on this proposal may be eligible for Clay Research
Fellowships and research generated from this proposal may lead to a Clay Research Award.
The W. M. Keck Foundation’s stated priorities are funding “projects in research that
• Focus on important and emerging areas of research
• Have the potential to develop breakthrough technologies, instrumentation or methodologies
• Are innovative, distinctive and interdisciplinary
• Demonstrate a high level of risk due to unconventional approaches, or by challenging
the prevailing paradigm
• Have the potential for transformative impact, such as the founding of a new field of
research, the enabling of observations not previously possible, or the altered perception
of a previously intractable problem
• Does not focus on clinical or translational research, treatment trials or research for the
sole purpose of drug development
• Fall outside the mission of public funding agencies
• Demonstrate that private philanthropy generally, and the W. M. Keck Foundation in
particular, is essential to the project’s success”
Aspects of this project may meet those criteria.
Additionally, the proposed research broadens the public funding sources we can apply
for to include military and national security funding agencies, in addition to more standard
funding through the National Science Foundation. We will also apply for funding in teams,
enhancing the chances an award will be made, increasing the size of the award, and increasing
administrative efficiency.
G. New Positions Proposed
We propose to hire three new tenured or tenure-track faculty members in probability,
analysis, and theoretical computer science, each of whom would have close ties to Computer
Science in the School of Informatics and Computing and to the Department of Mathematics
and the Department of Statistics in the College of Arts and Sciences. These faculty would
close the gap between the existing theoretical expertise in the Department of Mathematics and
the practical needs of faculty in Computer Science for mathematical tools. The departments
and the two schools are committed to making these hires work in whatever way makes most
sense for the hirees; junior faculty are often better served by having a clear tenure home,
for instance. However, the goal is to bring in faculty who can teach and train students in
multiple departments, thereby stabilizing the bridge being built between disciplines in this
proposal. Specifically, we envision faculty whose teaching is split between schools, including
team-teaching where appropriate, while their tenure home is in one unit. There is precedent
for this sort of appointment. Elizabeth Housworth has her full FTE in the Department of
Mathematics, but her teaching is split between Mathematics and Biology as described in her
appointment letter.
We will hire faculty with expertise in algorithms, complexity, and the theory of computation;
combinatorial statistics; geometry and analysis at the interface between the continuous
and discrete; and probability and stochastic processes. These areas overlap, so a carefully
chosen trio of new hires could cover them all. There are set hiring procedures
within the Department of Mathematics, the Department of Computer Science, and the
Department of Statistics. Only the last has a formal mechanism to include faculty outside
the department in its internal hiring decisions. The Department of Mathematics and the
Department of Computer Science are committed to extensive consultation during the hiring
process, absent a formal mechanism. We expect that at least one faculty member will be hired
in the Department of Mathematics and at least one in the Department of Computer Science.
The third faculty member would be in whichever of the three departments suits her best.
The hiring will be done as a cluster. Cluster hires can break down silos and increase
interdisciplinary work, key goals of this proposal. Cluster hires are also known to increase
diversity (https://www.insidehighered.com/news/2015/05/01/new-report-says-cluster-hiring-can-lead-increased-faculty-diversity),
even when increasing diversity is not the stated goal of the cluster hire. Cluster hires work
best when institutions create structures that support the hires, facilitating interactions and
valuing the work produced.
The PIs on the proposal will ensure that there are frequent opportunities for interactions
through establishing an interdisciplinary colloquium series and facilitating research discussion
groups for the new faculty and postdoctoral scholars. Teaching across departments, including
team-teaching courses, is another established way of supporting cluster hires. Finally, all the
departments involved have mechanisms for evaluating interdisciplinary work, so that it is
valued fairly during the tenure and promotion process.
H. IU and Collaborative Arrangements
We have letters of support from the Chairs of Mathematics, Computer Science, and Statistics,
and the Dean of the College of Arts and Sciences and the Dean of the School of Informatics and
Computing. We have external letters from Robin Pemantle, the Merriam Term Professor of
Mathematics at the University of Pennsylvania, and Yury Makarychev, an Associate Professor
at the Toyota Technological Institute at Chicago. The main collaborative arrangements are
between the PIs on this proposal and the new hires this proposal would fund, with additional
collaboration with Professor Makarychev expected as the work progresses.
I. Metrics and Deliverables
Metrics, deliverables, and assessment of the progress and impact of this proposal:
• We will apply for new funding each year of the proposal.
• We will increase the interdisciplinary work between Mathematics, Statistics, and
Computer Science, measurable by the increase in co-authored papers that bridge
disciplines.
• We will hold a successful interdisciplinary seminar with weekly talks attracting faculty,
postdoctoral fellows, and students from multiple disciplines.
• We will hire faculty who can cross disciplines, as evidenced by their participating in
the teaching missions of multiple departments. These faculty will also facilitate the
existing faculty in mathematics, statistics, and computer science crossing disciplinary
boundaries.
Metrics and assessment of the enhanced reputation of Indiana University Bloomington:
• We will attract postdoctoral and graduate student applicants of the highest caliber.
• We will attract world-renowned speakers to our lecturer series.
• Work conducted under this proposal will win international recognition and awards.
• Graduate students who join PIs on this work will obtain excellent positions in academia
and industry.
Biographical Sketch: Russell Lyons
Professional Preparation
Case Western Reserve University, Cleveland, OH
B.A. summa cum laude with departmental honors, May 1979, Mathematics
University of Michigan, Ann Arbor, MI
Ph.D., August 1983, Mathematics
Specialization: Harmonic Analysis
Université de Paris-Sud, Orsay, France
Postdoctoral work, 1983–1985
Specialization: Harmonic Analysis
Appointments
Indiana University, Bloomington, IN: James H. Rudy Professor of Mathematics, 2014–present.
Indiana University, Bloomington, IN: Adjunct Professor of Statistics, 2006–present.
Indiana University, Bloomington, IN: Professor of Mathematics, 1994–2014.
University of Calif., Berkeley: Visiting Miller Research Professor, Spring 2001.
Georgia Institute of Technology, Atlanta, GA: Professor, 2000–2003.
Microsoft Research: Visiting Researcher, Jan.–Mar. 2000, May–June 2004, July 2006, Jan.–June
2007, July 2008–June 2009, Sep.–Dec. 2010, Aug.–Oct. 2011, July–Oct. 2012, May–July 2013,
Jun.–Oct. 2014, Jun.–Aug. 2015, Jun.–Aug. 2016.
Weizmann Institute of Science, Rehovot, Israel: Rosi and Max Varon Visiting Professor, Fall 1997.
Institute for Advanced Studies, Hebrew University of Jerusalem, Israel: Winston Fellow, 1996–97.
Université de Lyon, France: Visiting Professor, May 1996.
University of Wisconsin, Madison, WI: Visiting Associate Professor, Winter 1994.
Indiana University, Bloomington, IN: Associate Professor, 1990–94.
Stanford University, Stanford, CA: Assistant Professor, 1985–90.
External Funding
NATO Postdoctoral Fellowship in Science, Université de Paris-Sud, 1983–84.
AMS Postdoctoral Fellowship, Université de Paris-Sud, 1984–85 and Stanford University, 1985–86.
NSF Mathematical Sciences Postdoctoral Research Fellowship, Stanford University, 1986–89.
Alfred P. Sloan Foundation Research Fellowship, Indiana University, 1990–93.
NSF, Division of Mathematical Sciences, Statistics and Probability Program, $60,000, 1993–96.
NSF, Division of Mathematical Sciences, Statistics and Probability Program, $74,000, 1998–2001.
NSF, Division of Mathematical Sciences, Statistics and Probability Program, $100,000, 2001–04.
NSF, Division of Mathematical Sciences, Statistics and Probability Program, $61,020, 2002–04.
NSF, Division of Mathematical Sciences, Statistics and Probability Program, $258,000, 2004–07.
NSF, Division of Mathematical Sciences, Statistics and Probability Program, $285,000, 2007–10.
NSF, Division of Mathematical Sciences, Statistics and Probability Program, $303,161, 2010–16.
NSF, Division of Mathematical Sciences, Statistics and Probability Program, $15,000, 2015.
NSF, Division of Mathematical Sciences, Statistics and Probability Program, $59,307, 2015–16.
NSF, Division of Mathematical Sciences, Statistics and Probability Program, $150,000, 2016–19.
Selected Publications
Lyons, R. Distance covariance in metric spaces, Ann. Probab. 41, no. 5 (2013), 3284–3305.
http://pages.iu.edu/~rdlyons/pdf/dcov.pdf
Lyons, R. Hyperbolic space has strong negative type, Illinois J. Math. 58, no. 4 (2014), 1009–
1013.
http://pages.iu.edu/~rdlyons/pdf/hypneg.pdf
Lyons, R. The spread of evidence-poor medicine via flawed social-network analysis, Stat., Politics,
Policy 2, 1 (2011), Article 2. (27 pp.) DOI: 10.2202/2151-7509.1024.
http://pages.iu.edu/~rdlyons/pdf/CF-pub-erratum.pdf
Lyons, R. Factors of IID on trees, Combin. Probab. Comput., to appear.
http://pages.iu.edu/~rdlyons/pdf/fiid.pdf
Oveis Gharan, S. and Lyons, R. Sharp bounds on random walk eigenvalues via spectral embedding, preprint.
http://pages.iu.edu/~rdlyons/pdf/peigs.pdf
Lyons, R. Determinantal probability measures, Publ. Math. Inst. Hautes Études Sci. 98 (2003),
167–212.
http://pages.iu.edu/~rdlyons/pdf/bases.pdf
Lyons, R. Determinantal probability: basic properties and conjectures, Proc. Intl. Congress Math.,
2014, vol. IV, 137–161.
http://pages.iu.edu/~rdlyons/pdf/icm.pdf
Lyons, R. Fourier-Stieltjes coefficients and asymptotic distribution modulo 1, Ann. of Math. 122
(1985), 155–170.
Lyons, R. The measure of non-normal sets, Invent. Math. 83 (1986), 605–616.
Lyons, R. A new type of sets of uniqueness, Duke Math. J. 57 (1988), 431–458.
Gaboriau, D. and Lyons, R. A measurable-group-theoretic solution to von Neumann’s problem,
Invent. Math. 177, no. 3 (2009), 533–540.
http://pages.iu.edu/~rdlyons/pdf/subr.pdf
Angel, O., Kechris, A.S. and Lyons, R. Random orderings and unique ergodicity of automorphism groups, J. Europ. Math. Soc. 16 (2014), 2059–2095.
http://pages.iu.edu/~rdlyons/pdf/order.pdf
Synergistic Activities
Lecture Series:
Aug. 1999: International Summer School (20 lectures), Jyväskylä, Finland
March 2005: Minicourse at EURANDOM (4 hours), Netherlands
July 2005: Cornell Summer School in Probability (9 hours)
Nov. 2008: Courant Research Centre, Göttingen (Distinguished Lecture Series, 3 hours)
March 2012: Conference at Vanderbilt (3 hours)
Editing:
Annals of Probability, Associate Editor, 2003–2008
Annals of Applied Probability, Associate Editor, 2003–2008
J. Topology Analysis, Associate Editor, 2007–
Tbilisi Mathematical J., Managing Editor, 2009–2014
Journal of Fractal Geometry, Associate Editor, 2013–
15 conferences organized since 1999
130 invited talks at other institutions, conferences, and workshops, Jan. 2000–Dec. 2015
10 presentations on misuse of applied statistics in a variety of venues, 2011–2014
Recent and Current Collaborators
Omer Angel, U. British Columbia, Vancouver, Canada; Itai Benjamini, Weizmann Institute of
Science; Damien Gaboriau, ENS-Lyon, France; Alexander Kechris, Caltech; Shayan Oveis Gharan,
U. Washington; Yuval Peres, Microsoft Research, Redmond, WA; Oded Schramm, Microsoft Research, Redmond, WA; Xin Sun, MIT; Andreas Thom, Technische U., Dresden, Germany; Kevin
Zumbrun, IU
Graduate Advisers and Postdoctoral Sponsors (2)
Thesis Advisers: Hugh L. Montgomery, Allen L. Shields, University of Michigan, Ann Arbor
Postdoctoral Adviser: Jean-Pierre Kahane, Université de Paris, Orsay, France
Thesis Adviser and Postgraduate-Scholar Sponsor (8)
Ádám Timár, Rényi Institute, Budapest, Hungary; Serdar Altok, Boğaziçi University, Istanbul, Turkey; Peter Mester, J.P. Morgan, Budapest, Hungary; Sandeep Bhupatiraju, Indiana
U.; Justin Cyr, Indiana U.; Minwoo Park, Indiana U.; Pengfei Tang, Indiana U.; Liviu Ilinca,
G-Research, London, UK
Biosketch: Michael W. Trosset
See http://mypage.iu.edu/~mtrosset/Personal/cv.pdf for complete curriculum vitae.
Education and Current Employment
• B.A. (summa cum laude), Mathematics and Mathematical Sciences, Rice University, May
1978.
• Ph.D. (Fannie & John Hertz Foundation Fellow), Statistics, University of California, Berkeley,
December 1983.
• Professor (from August 2006) and Chair (from August 2012) of Statistics, Executive Director
of the Indiana Statistical Consulting Center, Indiana University, Bloomington.
Selected Research Grants
1. Principal Investigator, Global Optimization for Multidimensional Scaling (University of Arizona), National Science Foundation, $59,527.35, July 1996 to June 1999.
2. Principal investigator, Statistical Decision-Theoretic Methods for Robust Design Optimization, National Science Foundation, $126,000, August 2004 to July 2008.
3. Principal investigator, Embedding Method for Disparate Data, Office of Naval Research,
$300,000, January 2007 to December 2010.
4. Principal investigator, IU subcontract to Virginia Polytechnic Institute & State University
(L. T. Watson), Parallel Deterministic and Stochastic Global Optimization Algorithms, Air
Force Office of Scientific Research, $210,333, January 2009 to December 2011.
5. Principal investigator, IU subcontract to Johns Hopkins University (C. E. Priebe), Fusion
and Interference from Multiple and Massive Disparate Data Sources, Department of Defense,
$231,116, January 2009 to December 2013.
Books
1. An Introduction to Statistical Inference and Its Applications with R, Chapman & Hall/CRC,
Taylor & Francis Group, June 23, 2009. Supplementary materials are provided on an accompanying web page:
http://mypage.iu.edu/~mtrosset/StatInfeR.html
Selected Articles in Professional Journals
1. Biotic and abiotic influences on foraging of Heterotermes aureus. Environmental Entomology,
16:791–795, 1987. (S.C. Jones, M.W. Trosset, W.L. Nutting)
2. Nesting-habitat relationships of riparian birds along the Colorado River in Grand Canyon,
Arizona. The Southwestern Naturalist, 34:260–270, 1989. (B.T. Brown, M.W. Trosset)
3. Alzheimer’s disease effects on semantic memory: loss of structure or impaired processing?
Journal of Cognitive Neuropsychology, 3:166–182, 1991. (K.A. Bayles, C.K. Tomoeda, A.W.
Kaszniak, M.W. Trosset)
4. Interference competition in desert subterranean termites. Entomologia Experimentalis et
Applicata, 61:83–90, 1991. (S.C. Jones, M.W. Trosset)
5. Relation of linguistic communication abilities of Alzheimer’s patients to stage of disease.
Brain and Language, 42:454–472, 1992. (K.A. Bayles, C.K. Tomoeda, M.W. Trosset)
6. Optimal shapes for kernel density estimation. Communications in Statistics—Theory and
Methods, 22(2):375–391, February 1993.
7. Alzheimer’s disease: effects on language. Developmental Neuropsychology, 9(2):131–160, 1993.
(K.A. Bayles, C.K. Tomoeda, M.W. Trosset)
8. An extension of the Karush-Kuhn-Tucker necessity conditions to infinite programming. SIAM
Review, 36(1):1–17, March 1994. (R.A. Tapia, M.W. Trosset)
9. Measures of deficit unawareness for predicted performance experiments. Journal of the International Neuropsychological Society, 2:315–322, 1996. (M.W. Trosset, A.W. Kaszniak)
10. A new formulation of the nonmetric STRAIN problem in multidimensional scaling. Journal
of Classification, 15:15–35, 1998.
11. The solution of the metric STRESS and SSTRESS problems in multidimensional scaling by
Newton’s method. Computational Statistics, 13(3):369–396, 1998. (A.J. Kearsley, R.A. Tapia,
M.W. Trosset)
12. A rigorous framework for optimization of expensive functions by surrogates. Structural Optimization, 17(1):1–13, 1999. (A.J. Booker, J.E. Dennis, P.D. Frank, D.B. Serafini, V. Torczon,
M.W. Trosset)
13. Distance matrix completion by numerical optimization. Computational Optimization and
Applications, 17:11–22, 2000.
14. Recursive Bayesian inference for hydrologic models. Water Resources Research, 37(10):2521–
2535, 2001. (M. Thiemann, M.W. Trosset, H. Gupta, S. Sorooshian)
15. Extensions of classical multidimensional scaling via variable reduction. Computational Statistics, 17(2):147–162, 2002.
16. Better initial configurations for metric multidimensional scaling. Computational Statistics
and Data Analysis, 41(1):143–156, 2002. (S.W. Malone, P. Tarazaga, M.W. Trosset)
17. Visualizing correlation. Journal of Computational and Statistical Graphics, 14(1):1–19, 2005.
18. On the diagonal scaling of Euclidean distance matrices to doubly stochastic matrices. Linear
Algebra and Its Applications, 397:253–264, 2005. (C.R. Johnson, R.D. Masson, M.W. Trosset)
19. Approximate solutions of continuous dispersion problems. Annals of Operations Research,
136:65–80, 2005. (A. Dimnaku, R. Kincaid, M.W. Trosset)
20. Sensitivity analysis of the strain criterion for multidimensional scaling. Computational Statistics and Data Analysis, 50:135–153, 2006. (R.M. Lewis, M.W. Trosset)
21. The out-of-sample problem for classical multidimensional scaling. Computational Statistics
& Data Analysis, 52(10):4635–4642, June 2008. (M.W. Trosset, C.E. Priebe)
22. Semisupervised learning from dissimilarity data. Computational Statistics & Data Analysis,
52(10):4643–4657, June 2008. (M.W. Trosset, C.E. Priebe, Y. Park, M.I. Miller)
23. Iterative denoising. Computational Statistics, 23(4):497–517, October 2008. (K.E. Giles,
M.W. Trosset, D.J. Marchette, C.E. Priebe)
24. Molecular embedding via a second-order dissimilarity parameterized approach. SIAM Journal
on Scientific Computing, 31(4):2733–2756, 2009. (I.G. Grooms, R.M. Lewis, M.W. Trosset)
25. Euclidean and circum-Euclidean distance matrices: characterizations and linear preservers.
Electronic Journal of Linear Algebra, 20:739–752, 2010. (C.-K. Li, T. Milligan, M.W. Trosset)
26. Parallel deterministic and stochastic global minimization of functions with very many minima. Computational Optimization and Applications, 57:469–492, 2014. (D.R. Easterling, L.T.
Watson, M.L. Madigan, B.S. Castle, M.W. Trosset)
27. Algorithm XXX: QNSTOP—quasi-Newton algorithm for stochastic optimization. To appear
in ACM Transactions on Mathematical Software, 2017. (B.D. Amos, D.R. Easterling, L.T.
Watson, W.I. Thacker, B.S. Castle, M.W. Trosset)
28. Fast embedding for JOFC using the raw stress criterion. arXiv:1502.03391, 2015. In revision.
(V. Lyzinski, Y. Park, C.E. Priebe, M.W. Trosset)
29. On the power of likelihood ratio tests in dimension-restricted submodels. arXiv:1608.00032,
2016. Submitted. (M.W. Trosset, M. Gao, C.E. Priebe)
Senior Theses Directed (College of William & Mary). Anthony Padula, Interpolation and
Pseudorandom Function Generation, 2000; Paul Goger, Computational Experiments with Stochastic Approximation, 2001; Michael Levy, Computational Experiments with Two Response Surface
Methods for Stochastic Optimization, 2003; Kristina Hofmann, Computational Experiments with
Nearest Neighbor Classification, 2005.
Summer REU Students (Matrix Analysis and Applications, College of William & Mary).
Samuel Malone, A Study of the Stationary Configurations of the SStress Criterion for Metric Multidimensional Scaling, 1999; Robert Masson, On the Diagonal Scaling of Euclidean Distance Matrices
to Doubly Stochastic Matrices, 2000.
Ph.D. Students (Indiana University). Minh Tang (Computer Science) defended his dissertation
on Graph Metrics and Dimensionality Reduction in October 2010. He is currently an Assistant
Research Professor at Johns Hopkins University. Brent Castle (Computer Science) defended his
dissertation on Quasi-Newton Methods for Stochastic Optimization with Application to SimulationBased Parameter Estimation in July 2012. He currently works for the Department of Defense.
Biographical Sketch: Funda Ergun
School of Informatics and Computing, Indiana University, Bloomington
150 South Woodlawn Avenue, Bloomington, IN 47405, USA
Phone: (812)-369-3793; E-mail: [email protected]; Web: www.informatics.indiana.edu/fergun
Professional Preparation
Bilkent University, Ankara, Turkey: B.S., Computer Engineering and Information Science, 1990
The Ohio State University, Columbus, OH: M.S., Computer Science, 1992
Cornell University, Ithaca, NY: Ph.D., Computer Science, 1998
University of Pennsylvania, Philadelphia: Postdoctoral Fellow, Computer Science, 1999
Appointments
Indiana University, Bloomington, IN: Professor, 2015–present
Indiana University, Bloomington, IN: Associate Professor, 2013–2015
Simon Fraser University, Burnaby, BC: Professor, 2013
Simon Fraser University, Burnaby, BC: Associate Professor, 2006–2013
Simon Fraser University, Burnaby, BC: Assistant Professor, 2003–2006
NEC Research, Princeton, NJ: Visiting Research Scientist, 2001–2002
Case Western Reserve University, Cleveland, OH: Schroeder Assistant Professor, 1999–2003
Bell Laboratories, Murray Hill, NJ: Member of Technical Staff, 1998–1999
University of Pennsylvania: Postdoctoral Researcher, 1997–1998
Five Most Relevant Products
Y. Le, J.C. Liu, F. Ergun, D. Wang. Online Load Balancing for MapReduce with Skewed Data Input.
Proceedings of the 33rd Annual IEEE International Conference on Computer Communications
(INFOCOM), Toronto, ON, April 2014.
F. Ergun, H. Jowhari. On Distance to Monotonicity and Longest Increasing Subsequence of a Data Stream.
Combinatorica (conf. version: ACM/SIAM Symposium on Discrete Algorithms, SODA'08),
DOI: 10.1007/s00493-014-3035-1, 2014.
P. Berenbrink, F. Ergun, F. Mallmann-Trenn, E. Sadeqi-Azer. Palindrome Recognition in the Streaming
Model. Proceedings of the 31st Annual Symposium on Theoretical Aspects of Computer Science (STACS),
Lyon, France, March 2014.
F. Ergun, S. Muthukrishnan, S.C. Sahinalp. Periodicity testing with sublinear samples and space. ACM
Transactions on Algorithms 6(2), pp 1-14. 2010.
F. Ergun, H. Jowhari, M. Saglam. Periodicity in Streams. Proceedings of the 14th Intl. Workshop on
Randomization and Computation (RANDOM), Barcelona, Spain, September 2010.
Five Other Significant Products
Y. Le, F. Wang, J.C. Liu, F. Ergun. On Datacenter-Network-Aware Load Balancing in MapReduce.
Proceedings of the 8th IEEE CLOUD, New York, NY, June 2015.
T. Batu, F. Ergun, C. Sahinalp. Oblivious String Embeddings and Edit Distance Approximations. Proceedings
of the 17th ACM/SIAM Symposium on Discrete Algorithms (SODA), Miami, Florida, January 2006.
A. Czumaj, F. Ergun, L. Fortnow, A. Magen, I. Newman, R. Rubinfeld, C. Sohler. Sublinear Approximation
of Euclidean Minimum Spanning Tree. SIAM Journal on Computing, 35(1):91–109, 2005.
P. Berenbrink, F. Ergun, T. Friedetzky. Finding Frequent Patterns in a String in Sublinear Time. Proceedings
of the 13th Annual European Symposium on Algorithms (ESA), LNCS, pp. 747–757, Mallorca, Spain,
October 2005.
F. Ergun, S. R. Kumar, R. Rubinfeld. Fast Approximate Probabilistically Checkable Proofs.
Information and Computation, 189(2):135–159, 2004.
Synergistic Activities
Leader, PIMS Collaborative Research Group on Algorithmic Theory of Networks.
Co-organizer, Summer School on Randomized Algorithms, Vancouver, BC, 8/14.
Co-organizer, BIRS Workshop on Communication Complexity, Banff, AB, 8/14.
Co-organizer, Workshop on Streaming Algorithms, Dortmund, Germany, 7/12.
PC Member, Symposium on Discrete Algorithms (SODA), Kyoto, Japan, 1/12.
Collaborators in the past 48 months (16)
Petra Berenbrink (Simon Fraser U.), Will Evans (U of British Columbia), Nick Harvey (U. of British
Columbia), Lisa Higham (U. of Calgary), Hossein Jowhari (U. of Warwick), Bruce Kapron (U. of Victoria),
David Kirkpatrick (U. of British Columbia), Valerie King (U. of Victoria), J.C. Liu (Simon Fraser U.),
Kostas Oikonomou (AT&T Research), Mert Saglam (U. of Washington), Rakesh Sinha (AT&T Research),
Venkatesh Srinivasan (U. of Victoria), D. Wang (Hong Kong Polytechnic U.), F. Wang (U. of Mississippi),
Philipp Woelfel (U. of Calgary)
Current Advisees (2)
Erfan Sadeqi-Azer (one Ph.D. student), Peter Kling (one postdoctoral researcher).
Past Advisees (4)
Hossein Jowhari, Erfan Sadeqi-Azer, Yanfang Le (three graduate students: one Ph.D., two M.S.), Christiane
Lammersen (one postdoctoral researcher) in the past 48 months.
Before that, an additional two postdocs and six graduate students.
Advisors (2)
Ph.D. advisor: Ronitt Rubinfeld (Massachusetts Institute of Technology, Cambridge, MA)
Postdoctoral advisor: Sampath Kannan (University of Pennsylvania, Philadelphia, PA)
(two advisors: one Ph.D., one postdoctoral)
Biographical Sketch for Michael Larsen
Professional Preparation
Harvard College, A.B. in Mathematics, 1984.
Princeton University, Ph.D. in Mathematics, 1988.
Institute for Advanced Study, School of Mathematics, 1988–1990.
Appointments
Indiana University, 1997–present.
University of Missouri, 1997–1998.
University of Pennsylvania, 1990–1997.
Publications
Most relevant
• Kazhdan, David; Larsen, Michael; Varshavsky, Yakov: The Tannakian Formalism
and the Langlands Conjectures, Algebra Number Theory 8 (2014), no. 1, 243–256.
• Kollár, János; Larsen, Michael: Quotients of Calabi-Yau varieties. Algebra, arithmetic, and geometry: in honor of Yu. I. Manin. Vol. II, 179–211, Progr. Math.,
270, Birkhäuser Boston, Inc., Boston, MA, 2009.
• Larsen, Michael; Lubotzky, Alexander; Marion, Claude: Deformation theory and
finite simple quotients of triangle groups I, J. Eur. Math. Soc. (JEMS) 16 (2014),
no. 7, 1349-1375.
• Larsen, Michael; Pink, Richard: Finite subgroups of algebraic groups, J. Amer.
Math. Soc. 24 (2011), 1105–1158.
• Larsen, Michael; Shalev, Aner; Tiep, Pham Huu: Waring Problem for Finite
Simple Groups, Annals of Math. (2) 174 (2011), no. 3, 1885–1950.
Representative
• Elkies, Noam; Kuperberg, Greg; Larsen, Michael; Propp, James: Alternating-sign
matrices and domino tilings. I. J. Algebraic Combin. 1 (1992), no. 2, 111–132.
• Freedman, Michael H.; Kitaev, Alexei; Larsen, Michael J.; Wang, Zhenghan:
Topological quantum computation. Mathematical challenges of the 21st century.
Bull. Amer. Math. Soc. (N.S.) 40 (2003), no. 1, 31–38.
• Larsen, Michael: Maximality of Galois actions for compatible systems. Duke
Math. J. 80 (1995), no. 3, 601–630.
• Larsen, Michael; Lunts, Valery A.: Rationality criteria for motivic zeta functions.
Compos. Math. 140 (2004), no. 6, 1537–1560.
• Larsen, Michael; Pink, Richard: On the ℓ-independence of algebraic monodromy
in compatible systems of representations, Invent. Math. 107 (1992), 603–636.
Synergistic Activities
• The proposer is Chair of the prize committee for the Frank Nelson Cole Prize in
Number Theory.
• The proposer serves on the following editorial boards: Journal of the American Mathematical Society, Transactions of the American Mathematical Society, Memoirs of the
American Mathematical Society, Indiana University Mathematics Journal.
• The proposer served for three years on the Putnam Committee of the Mathematical
Association of America; he contributed sixteen problems which appeared on Putnam
Examinations over a four year period.
• The proposer developed software for the Gutenberg Project that was solicited by, and subsequently
contributed to, the optical character recognition project of the Free Software Foundation and to
System 4 of NeXT Computer (which was later folded into Mac OS X).
• The proposer consulted for E-Systems, a subsidiary of Raytheon, on signal processing
issues related to a robotics program.
Collaborators
Khalid Bou-Rabee (CUNY)
Jean Bourgain (Institute for Advanced Study)
Emmanuel Breuillard (Université Paris-Sud)
Jordan Ellenberg (University of Wisconsin)
David Fisher (Indiana University)
Robert Guralnick (USC)
Bo-Hae Im (KAIST)
David Kazhdan (Hebrew University)
Chandrashekhar Khare (UCLA)
János Kollár (Princeton University)
Ayelet Lindenstrauss (Indiana University)
Alexander Lubotzky (Hebrew University)
Valery Lunts (Indiana University)
Justin Malestein (Hebrew University)
Claude Marion (Universität Freiburg)
Gunter Malle (Universität Kaiserslautern)
Barry Mazur (Harvard University)
Karl Rubin (UC Irvine)
Gordan Savin (University of Utah)
Aner Shalev (Hebrew University)
Ralf Spatzier (University of Michigan)
Matthew Stover (Temple University)
Pham Tiep (University of Arizona)
Yakov Varshavsky (Hebrew University)
Graduate and Postdoctoral Advisors
Gerd Faltings (Max Planck Institute Bonn)
Robert Langlands (Institute for Advanced Study)
Thesis Advisor and Postgraduate-Scholar Sponsor
Brad Emmons - Utica College (graduate)
Arthur Gershon (graduate)
Chun Yin Hui - VU Amsterdam (graduate)
Bo-hae Im - KAIST (graduate)
Daniel Jordan - Columbia College (graduate)
Neeraj Kashyap (graduate)
Eugene Kushnirski - Northwestern University (postdoctoral)
Corey Manack (graduate)
Michael Movshev - SUNY Stony Brook (graduate)
Christopher Thornhill - Wayland Baptist University (graduate)
Krishna Venkata (graduate)
Erik Wallace - University of Connecticut (graduate)
The proposer has supervised eleven Ph.D. theses and is currently supervising six doctoral
students (including two who have not yet qualified to begin work on their theses). He has
also supervised one postdoctoral associate.
Yuan Zhou
E-mail: [email protected], Homepage: http://homes.soic.indiana.edu/yzhoucs/
Professional Preparation
B.Eng. in Computer Science, Tsinghua University
Student in Tsinghua University – Microsoft CS Pilot Class
GPA: 94.1/100, Rank 1/130
Beijing, China, 2005 – 2009
M.Sc. in Computer Science, Carnegie Mellon University
Pittsburgh, Pennsylvania, USA, 2009 – 2013
Ph.D. in Theoretical Computer Science, Carnegie Mellon University
Pittsburgh, Pennsylvania, USA, 2009 – 2014
Advisors: Prof. Venkatesan Guruswami and Prof. Ryan O’Donnell
Appointments
Assistant Professor
Computer Science Department, Indiana University at Bloomington
2016.08 – current
Instructor in Applied Mathematics
Mathematics Department, Massachusetts Institute of Technology
2014.08 – 2016.06
Related Publications
Optimal Sparse Designs for Process Flexibility via Probabilistic Expanders
Xi Chen, Jiawei Zhang, Yuan Zhou
Operations Research 63(5): pp. 1159–1176 (2015)
Satisfiability of Ordering CSPs Above Average Is Fixed-Parameter Tractable
Konstantin Makarychev, Yury Makarychev, Yuan Zhou
FOCS 2015, Proceedings of the 56th Annual Symposium on Foundations of Computer Science
Constant Factor Lasserre Gaps for Graph Partitioning Problems
Venkatesan Guruswami, Ali Kemal Sinop, Yuan Zhou
SIAM Journal on Optimization 24(4) (2014), pp. 1698–1717
Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing
Yuan Zhou, Xi Chen, Jian Li
ICML 2014, the 31st International Conference on Machine Learning
Hardness of Robust Graph Isomorphism, Lasserre Gaps, and Asymmetry of Random Graphs
Ryan O’Donnell, John Wright, Chenggang Wu, Yuan Zhou
SODA 2014, Proceedings of the 25th annual ACM-SIAM Symposium on Discrete Algorithms
Additional Publications
Hypercontractive inequalities via SOS, with an application to Vertex-Cover
Manuel Kauers, Ryan O’Donnell, Li-Yang Tan, Yuan Zhou
SODA 2014, Proceedings of the 25th annual ACM-SIAM Symposium on Discrete Algorithms
Approximability and proof complexity
Ryan O’Donnell, Yuan Zhou
SODA 2013, Proceedings of the 24th annual ACM-SIAM Symposium on Discrete Algorithms
Hypercontractivity, Sum-of-Squares Proofs, and their Applications
Boaz Barak, Fernando Brandão, Aram Harrow, Jonathan Kelner, David Steurer, Yuan Zhou
STOC 2012, Proceedings of the 44th annual ACM Symposium on Theory of Computing
Polynomial integrality gaps for strong SDP relaxations of Densest k-Subgraph
Aditya Bhaskara, Moses Charikar, Venkatesan Guruswami, Aravindan Vijayaraghavan, Yuan
Zhou
SODA 2012, Proceedings of the 23rd annual ACM-SIAM Symposium on Discrete Algorithms
Approximation Algorithms and Hardness of the k-Route Cut Problem
Julia Chuzhoy, Yury Makarychev, Aravindan Vijayaraghavan, Yuan Zhou
SODA 2012, Proceedings of the 23rd annual ACM-SIAM Symposium on Discrete Algorithms
Invited to ACM Transactions on Algorithms
Recent Collaborators
Xi Chen, New York University
Xue Chen, University of Texas at Austin
Parikshit Gopalan, Microsoft Research
Venkatesan Guruswami, Carnegie Mellon University
Manuel Kauers, Johannes Kepler Universität
Jian Li, Tsinghua University
Konstantin Makarychev, Microsoft Research
Yury Makarychev, Toyota Technological Institute at Chicago
Raghu Meka, University of California at Los Angeles
Ryan O’Donnell, Carnegie Mellon University
Omer Reingold, Samsung Research America
Ali Kemal Sinop, Institute for Advanced Study
Li-Yang Tan, Toyota Technological Institute at Chicago
Madhur Tulsiani, Toyota Technological Institute at Chicago
John Wright, Carnegie Mellon University
Chenggang Wu, Tsinghua University
Salil Vadhan, Harvard University
Yuichi Yoshida, National Institute of Informatics (Japan)
Jiawei Zhang, New York University
Recent Co-editors
None
Graduate Advisors and Postdoctoral Sponsors
None
Thesis Advisor and Postgraduate-Scholar Sponsor
None
Martha White
Department of Computer Science and Informatics, Indiana University, Bloomington
150 South Woodlawn Avenue
Bloomington, IN 47405, USA
E-mail: [email protected]
Web: www.informatics.indiana.edu/martha
1. Professional preparation
• B.S., Mathematics, University of Alberta, Edmonton, Canada, 2008.
• B.S., Computing Science, University of Alberta, Edmonton, Canada, 2008.
• M.S., Computing Science, University of Alberta, Edmonton, Canada, 2010.
• Ph.D., Computing Science, University of Alberta, Edmonton, Canada, 2015.
2. Appointments
01/2015 — present
Assistant Professor, Department of Computer Science, School of Informatics and Computing, Indiana University, Bloomington
3. Products
Selected relevant publications
• S. Jain, M. White, P. Radivojac. Estimating the class prior and posterior from noisy positives and
unlabeled data. In Advances in Neural Information Processing Systems (NIPS), 2016.
• R. S. Sutton, A. R. Mahmood and M. White. An Emphatic Approach to the Problem of Off-policy
Temporal-Difference Learning. Journal of Machine Learning Research (JMLR), 2016.
• M. White, J. Wen, M. Bowling and D. Schuurmans. Optimal Estimation of Multivariate ARMA
Models. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI), 2015.
• F. Mirzazadeh, M. White, A. Gyorgy and D. Schuurmans. Scalable Metric Learning for Co-embedding.
In European Conference on Machine Learning and Principles and Practice of Knowledge Discovery
in Databases (ECML PKDD), 2015.
• M. White, Y. Yu, X. Zhang, D. Schuurmans. Convex Multiview Subspace Learning. In Advances in
Neural Information Processing Systems (NIPS), 2012.
Other relevant publications
• Clement Gehring, Yangchen Pan and M. White. Incremental Truncated LSTD. In Proceedings of the
Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI), 2016.
• A. White and M. White. Investigating practical, linear temporal difference learning. In Proceedings
of the International Conference on Autonomous Agents and Multi-agent Systems (AAMAS), 2016.
• J. Veness, M. White, M. Bowling, and A. Gyorgy. Partition Tree Weighting. Data Compression
Conference (DCC), 2013.
• M. White and D. Schuurmans. Generalized Optimal Reverse Prediction. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.
• L. Xu, M. White and D. Schuurmans. Optimal Reverse Prediction: A Unified Perspective on Supervised, Unsupervised and Semi-supervised Learning. In Proceedings of the Twenty-Sixth International
Conference on Machine Learning (ICML), 2009. Honorable Mention for Best Paper
4. Synergistic activities
• Program Committee member for several machine learning conferences, including ICML (2015, 2016),
NIPS (2015, 2016), AAAI (2015, 2016), IJCAI (2014, 2015, 2016)
• Reviewer for JMLR, ICML, NIPS, IJCAI, AAAI, AISTATS, Machine Learning Journal, Transactions
on Image Processing, Journal of Autonomous Agents and Multi-agent Systems, Artificial Intelligence
Journal, IEEE Transactions on Neural Networks and Learning Systems
• Served on panels for graduate and undergraduate students, through Women in Technology (CeWIT)
at Indiana University
• Tutored Native American students under Frontier College, Edmonton, AB, Canada (2014)
• Workshops for youth, including workshops with Women in Scholarship, Engineering, Science and
Technology (WISEST) and Women in Technology (WIT) promoting diversity in Computing Science
(2011, 2007)
5. Collaborators (past 5 years, alphabetical by last name), total = 10
Bowling Michael (U. of Alberta), Degris Thomas (Google Deepmind), Gyorgy Andras (U. of Alberta),
Pestilli Franco (Indiana U.), Radivojac Predrag (Indiana U.), Schuurmans Dale (U. of Alberta), Sutton
Richard (U. of Alberta), Trosset Michael (Indiana U.), Veness Joel (Google Deepmind), Zhang Xinhua
(NICTA)
6. Current advisees
Ph.D.: Tasneem Alowaisheq, Lei Le, Raksha Kumaraswamy, Yangchen Pan.
7. Ph.D. Advisors
Michael Bowling and Dale Schuurmans, University of Alberta
Biographical Sketch: Qin Zhang
School of Informatics and Computing, Indiana University, Bloomington
150 South Woodlawn Avenue, Bloomington, IN 47405, USA
Phone: (812)-855-2567; E-mail: [email protected]; Web: http://homes.soic.indiana.edu/qzhangcs/
1. Professional Preparation
B.S., Computer Science, Fudan University, Shanghai, China, 2006.
Ph.D., Computer Science, Hong Kong University of Science and Technology, Hong Kong, 2010.
Post-doctoral fellow, Computer Science, Aarhus University, Aarhus, Denmark, 2012.
Post-doctoral fellow, Computer Science, IBM Research Almaden, San Jose, CA, USA, 2013.
2. Appointments
08/2013 – present
Assistant Professor, School of Informatics and Computing, Indiana University, Bloomington
3. Products
Five Most Relevant Products
D. P. Woodruff, Q. Zhang. An Optimal Lower Bound for Distinct Elements in the Message Passing
Model. Proceedings of the 25th ACM-SIAM Symposium on Discrete Algorithms (SODA 14), pages
718-733. Portland, OR, USA, January 2014.
K. Yi, Q. Zhang. Optimal Tracking of Distributed Heavy Hitters and Quantiles. Algorithmica,
volume 65, issue 1, pages 206-223, January 2013.
D. P. Woodruff, Q. Zhang. Tight Bounds for Distributed Functional Monitoring. Proceedings of the
44th ACM Symposium on Theory of Computing (STOC 12), pages 941-960. New York, NY, USA,
May 2012.
G. Cormode, S. Muthukrishnan, K. Yi, Q. Zhang. Continuous Sampling from Distributed Streams.
Journal of the ACM. (JACM), 59(2), Article 10, April 2012.
Z. Huang, K. Yi, Q. Zhang. Randomized Algorithms for Tracking Distributed Count, Frequencies,
and Ranks. Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of
Database Systems (PODS 12), pages 295-306. Scottsdale, Arizona, USA, May 2012.
Five Other Significant Products
D. Belazzougui, Q. Zhang. Edit Distance: Sketching, Streaming and Document Exchange.
Proceedings of the 57th IEEE Symposium on Foundations of Computer Science (FOCS 16), to
appear. New Brunswick, NJ, October 2016.
J. M. Phillips, E. Verbin, Q. Zhang. Lower Bounds for Number-in-Hand Multiparty Communication
Complexity, Made Easy. SIAM Journal on Computing (SICOMP), volume 45, issue 1, pages 174–196, February 2016.
Q. Zhang. Communication-Efficient Computation on Distributed Noisy Datasets. Proceedings of
the 27th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 15), pages 313–322. Portland, Oregon, U.S.A., June 2015.
D. Van Gucht, R. Williams, D. P. Woodruff and Q. Zhang. The Communication Complexity of
Distributed Set-Joins with Applications to Matrix Multiplication. Proceedings of the 34th ACM
SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 15), pages
199-212. Melbourne, VIC, Australia, May-June 2015.
E. Verbin and Q. Zhang. The Limits of Buffering: A Tight Lower Bound for Dynamic Membership in
the External Memory Model. SIAM Journal on Computing (SICOMP), volume 42, issue 1, pages
212-229, January 2013.
Note: All papers above use alphabetical ordering of authors, following the convention of theoretical
computer science.
4. Synergistic Activities
Served on program committees of 11 conferences and workshops in theoretical computer science,
databases and data mining. Ad-hoc reviewer for 11 journals and 17 conferences.
Currently supervising two PhD students. Member of three PhD student advisory committees.
Member of committee developing a course curriculum for the Computational and Analytic Track of
the Data Science Program at School of Informatics and Computing, IUB.
David Fisher
Biographical Sketch
Education
B.A. in Mathematics, Columbia University, May 1993,
summa cum laude.
M.S. in Mathematics, The University of Chicago, August 1994.
Ph.D. in Mathematics, The University of Chicago, June 1999.
Academic positions
Summer 2005 to present, Department of Mathematics,
Indiana University at Bloomington:
Assistant Professor, 2005-2008,
Associate Professor, 2008-2010,
Full Professor, since July 2010.
Spring 2004 to 2007, member of Doctoral Faculty,
Department of Mathematics, CUNY Graduate Center.
Fall 2002 to Spring 2005, Assistant Professor, Department of
Mathematics and Computer Science, Lehman College-CUNY.
1999 to 2002, Gibbs Instructor and NSF Postdoctoral Fellow,
Department of Mathematics, Yale University.
Five most relevant publications
(1) Coarse differentiation of quasi-isometries II: Rigidity for Sol
and Lamplighter groups. joint with A. Eskin and K. Whyte,
Annals of Math 177 (2013), no. 3, 869–910.
(2) Global rigidity of higher rank Anosov actions on tori and nilmanifolds. joint with B. Kalinin and R. Spatzier, J. Amer.
Math. Soc. 26 (2013), no. 1, 167–198.
(3) Coarse Differentiation of Quasi-isometries I; spaces not quasi-isometric to Cayley graphs. joint with A. Eskin and K. Whyte,
Annals of Math. 176 (2012), 221-260.
(4) Local rigidity of affine actions of higher rank groups and lattices,
joint with G. Margulis, Annals of Math. 170 (2009), no. 1,
67–122.
(5) Quasi-isometric embeddings of symmetric spaces, joint with Kevin
Whyte, preprint: http://arxiv.org/abs/1407.0445.
Other publications
(1) Totally non-symplectic Anosov actions on tori and nilmanifolds,
joint with Boris Kalinin and Ralf Spatzier, Geometry and Topology 15 (2011), no. 1, 191–216.
(2) Quasi-isometric rigidity of solvable groups joint with A. Eskin,
Proceedings of the International Congress of Mathematicians
2010 (ICM 2010), volume III.
(3) Quasi-isometric embeddings of non-uniform lattices, joint with
Thang Nguyen, preprint.
(4) Almost isometric actions, property (T) and local rigidity, joint
with G. Margulis, Invent. Math., 162 (2005) 19-80.
(5) Local rigidity for cocycles, joint with G. Margulis, in Surv. Diff. Geom. Vol
VIII, refereed volume in honor of Calabi, Lawson, Siu and Uhlenbeck, editor: S.T. Yau, 45 pages, 2003.
Synergistic Activities
(1) Co-organized 17 Summer Schools, Workshops, Conferences.
(2) Invited to give mini-courses/lecture series at 9 international
conferences and summer schools since 2006.
(3) Co-organizer of various seminars, lecture series and colloquia,
Indiana University, Fall 2005 to present.
(4) Mentor for graduate students and postdocs at IU and several
other universities. Reader for theses at University of Michigan
and Université de Valenciennes, France.
(5) Arranged a screening and panel discussion of the documentary Counting from Infinity, on the work of Yitang Zhang, at IU Cinema; a campus-wide, math-focused event.
Recent Collaborators (11)
Yves de Cornulier, Université de Paris, Orsay.
Tullia Dymarz, University of Wisconsin, Madison.
A.Eskin, University of Chicago.
T.J.Hitchman, University of Northern Iowa.
Boris Kalinin, Penn State University.
Neeraj Kashyap, Indiana University.
G.A.Margulis, Yale University.
Karin Melnick, University of Maryland, College Park.
Thang Nguyen, Indiana University.
Ralf Spatzier, University of Michigan.
K.Whyte, University of Illinois at Chicago.
Thesis Advisor:
R.J.Zimmer, University of Chicago.
Postdoctoral Senior Researcher:
G.A.Margulis, Yale University.
Graduate Students Advised/Postdoctoral Scholars Sponsored:
Irine Peng (postdoctoral mentor, 2008-2011)
Ning Yang (doctoral student, finishing this year)
Thang Nguyen (doctoral student, finishing this year)
Biographical Sketch
Ciprian Demeter
Work address: Department of Mathematics, Indiana University, Rawles Hall,
831 East 3rd St., Bloomington, IN 47405. E-mail: [email protected]
Professional Preparation
Member (On leave from IU), 2007-2008, Institute for Advanced Study (Princeton)
Postdoctorate (Hedrick Assistant Professor), 2004-2007, University of California (Los Angeles)
Ph.D., 2004, University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois, (Mathematics)
M.S., 1999, Babes-Bolyai University, Cluj-Napoca, Romania, (Mathematics)
B.A., 1998, Babes-Bolyai University, Cluj-Napoca, Romania, (Mathematics)
Appointments
2016–present, Indiana University (Bloomington), Full Professor
2011–2016, Indiana University (Bloomington), Associate Professor
2008–2011, Indiana University (Bloomington), Tenure Track Assistant Professor
2007-2008, Institute for Advanced Study (Princeton), Member
2004-2007, University of California (Los Angeles), Hedrick Assistant Professor
Awards and honors
• Sloan research fellowship (2009-2011)
• Continuous NSF support 2006-present
• Rothrock teaching award (2016)
Selected publications
• Proof of the main conjecture in Vinogradov’s mean value theorem for degrees higher than
three (with Jean Bourgain and Larry Guth), to appear in Annals of Math.
• The proof of the l2 Decoupling Conjecture (with Jean Bourgain), Annals of Math. 182
(2015), no. 1, 351-389.
• Breaking the duality in the return times theorem (with Michael Lacey, Terence Tao and
Christoph Thiele), Duke Math. J. 143 (2008), no. 2, 281-355
• New bounds for the discrete Fourier restriction to the sphere in four and five dimensions
(with Jean Bourgain), Int. Math. Res. Not. IMRN 2015, no. 11, 3150-3184
• Linear independence of time frequency translates for special configurations, Math. Res.
Lett. 17 (2010), no. 4, 761-779
• Logarithmic Lp bounds for maximal directional singular integrals in the plane (with Francesco
Di Plinio), J. Geom. Anal. 24 (2014), no. 1, 375-416
• On the two dimensional Bilinear Hilbert Transform, (with Christoph Thiele), American
Journal of Mathematics, 132 (2010), no. 1, 201-256
• Modulation invariant bilinear T(1) theorem (with Árpád Bényi, Andrea R. Nahmod,
Christoph M. Thiele, Rodolfo H. Torres, Paco Villarroya), Journal d’Analyse Mathematique, 109 (2009), 279-352
Synergistic Activities
1. In recent years, I have discovered surprising and interesting connections between diverse areas
of Mathematics such as Harmonic Analysis, Ergodic Theory, Number Theory, PDEs, Incidence
Geometry and the theory of random Schrödinger operators. I have made my work public through
thirty research papers, and through numerous talks in various seminars and conferences.
2. In recent years, I have co-organized one section of an AMS meeting on Harmonic Analysis
and related Topics (Bloomington, April 2008), as well as a similar section at the international
AMS meeting in Alba-Iulia, Romania in 2013.
I have co-organized three summer schools with my collaborator Christoph Thiele, and another one with him and Michael Lacey.
3. I have taught a variety of Algebra, Geometry, Calculus, Dynamics and Analysis classes at three
different universities. I have constantly improved and adapted my teaching to the specifics of each
course, working to convey to my students both motivation and a rigorous understanding of the
material. I found it particularly interesting and challenging to teach Merit Workshop and small-group
Active Learning classes, which gave me the opportunity to spend more time with students and to
assess their skills better. I have recently taught a few graduate topics classes that served as training
for, and facilitated the recruitment of, my first three graduate students, Francesco Di Plinio, Prabath
Silva and Fangye Shi.
4. I am currently co-organizing the Analysis seminar at Indiana University, Bloomington.
5. I have written a survey paper entitled A guide to Carleson’s Theorem, meant to be a gentle
introduction for a large audience to selected topics in time frequency analysis.
Graduate students and postdoctoral fellows:
I have supervised two successful graduate students, Francesco Di Plinio (currently on tenure
track at University of Virginia) and Prabath Silva (former postdoc at Caltech). I am currently
supervising another two graduate students (Fangye Shi and Dominique Kemp).
I have mentored Zubin Gautam as a postdoctoral fellow. I am currently mentoring Shaoming
Guo as a postdoctoral fellow.
Probabilistic Approaches to Computational Problems
Personnel
Russell Lyons is the leading probabilist in the Department of Mathematics and on the
Indiana University Bloomington campus (due to retirements and faculty retention issues).
His research has theoretical connections with the Algorithms and Theory groups in the
Department of Computer Science. He holds many honors, including being a Rudy Professor,
having recently given an invited talk to the International Congress of Mathematicians, and
being a Fellow of the American Mathematical Society.
Michael Larsen is the leading algebraist in the Department of Mathematics, whose work
includes studying expander graphs, a common tool in theoretical computer science. Many
advanced problems benefit from a variety of mathematical techniques, so it is important
to include experts in many areas. He holds many honors, including being a Distinguished
Professor at Indiana University Bloomington, being a Fellow of the American Mathematical
Society, and having received the E. H. Moore Research Article Prize from the American
Mathematical Society.
Ciprian Demeter is the leading harmonic analyst in the Department of Mathematics. He won a
Sloan Foundation Fellowship. His recent work with Jean Bourgain on the nonlinear Schrödinger
equation was one of four featured current events in mathematics at the yearly joint meetings of
the American Mathematical Society in 2015.
David Fisher is the leading geometer in the Department of Mathematics. His research on coarse
differentiation is important for understanding metric embeddings, a common theme in this
proposal. He is a Fellow of the American Mathematical Society and has received a Simons
Foundation Fellowship and a prestigious CAREER award from the National Science Foundation.
Michael Trosset is the leading statistician at Indiana University Bloomington with expertise
in statistical learning and computational statistics. As Director of the Indiana Statistical
Consulting Center since 2006, he has years of experience fostering interdisciplinary research
on campus.
Funda Ergun is the senior member of the Algorithms Group in the Department of Computer
Science. She has experience with metric embeddings and pattern recognition, including in
streaming data. Her synergistic activities include leadership experience organizing a Pacific
Institute for the Mathematical Sciences Collaborative Research Group on Algorithmic Theory
of Networks. She brings a deep understanding of the mathematical needs of computer science
research to this project.
Qin Zhang is an Assistant Professor in the Algorithms Group and the Theory Group in
the Department of Computer Science. His research includes algorithms for streaming data
and communication complexity. He has prior research experience in the Theory Group at
IBM Almaden Research Center and at the Center for Massive Data Algorithmics at Aarhus
University. His work is closest to Funda Ergun’s. Having junior faculty involved on the
project is important both for the new ideas they contribute and for the environment they
create for new faculty and postdoctoral hires on this proposal. They can serve as mentors
close in age and experience to the postdoctoral fellows and cluster hire faculty members.
Yuan Zhou is an Assistant Professor in the Algorithms Group in the Department of Computer
Science. His research includes approximation algorithms, complexity theory, and satisfiability
theory. His work is closest to Russell Lyons'.
Martha White is an Assistant Professor in the Intelligent Systems Group in the Department
of Computer Science. Her research includes work on temporal-difference learning algorithms
and metric learning for determining the best distance function to use for a target task.
Her work is closest to Michael Trosset's.