Current Research Interests
My research approach has always been driven by trying to understand how
technology trends give us new ways to overcome the limits of existing
system architectures. Today, the challenge is to develop, deploy, and
evolve very large-scale applications that typically execute across
thousands of network-connected servers, either within a single
datacenter or distributed around the world. The applications of
interest are large-scale Internet sites like Amazon or eBay, Internet
search like Google, and globe-spanning content distribution like
Akamai. My interest is not so much the specific application, but
rather the system architecture that makes it possible to develop such
applications orders of magnitude faster, and to run them with far fewer
people than it takes today.
We are investigating two key technology enablers. The first is
statistical learning theory, a powerful mathematical framework that
supports a quantitative approach to characterizing the behavior of
systems. The second is programmable network elements, essentially
network routers with the ability to "deep classify" packets and flows
by examining packet contents beyond IP headers. Our technical approach
builds on the paradigm of Observe network behavior, Analyze it using
statistical methods, and Act upon it to control network behavior.
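The Observe, Analyze, Act cycle can be illustrated with a small Python sketch. It assumes, purely for illustration, that observations are per-flow packet rates, that analysis is a z-score test against a baseline drawn from known-normal traffic, and that the available actions are the hypothetical labels `block`, `slow`, and `forward`. The function names and thresholds are likewise hypothetical; a real programmable network element exposes its own action set, and the statistical models of interest are far richer than a single z-score.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

# Hypothetical observation record: (flow identifier, packets per second).
Observation = Tuple[str, float]


def observe(samples: List[Observation]) -> Dict[str, List[float]]:
    """Observe: group raw per-flow rate samples by flow (stand-in for PNE telemetry)."""
    by_flow: Dict[str, List[float]] = {}
    for flow_id, rate in samples:
        by_flow.setdefault(flow_id, []).append(rate)
    return by_flow


def analyze(by_flow: Dict[str, List[float]], baseline: List[float]) -> Dict[str, float]:
    """Analyze: z-score of each flow's mean rate relative to a known-normal baseline."""
    mu, sigma = mean(baseline), stdev(baseline)
    return {flow: (mean(rates) - mu) / sigma for flow, rates in by_flow.items()}


def act(scores: Dict[str, float], slow_z: float = 2.0, block_z: float = 4.0) -> Dict[str, str]:
    """Act: map each flow's score to a hypothetical PNE action label."""
    actions: Dict[str, str] = {}
    for flow, z in scores.items():
        if abs(z) >= block_z:
            actions[flow] = "block"
        elif abs(z) >= slow_z:
            actions[flow] = "slow"
        else:
            actions[flow] = "forward"
    return actions


# Toy usage: one ordinary flow, one mildly unusual flow, one extreme flow.
baseline = [100.0, 110.0, 90.0, 105.0, 95.0]
samples = [("a", 102.0), ("a", 98.0), ("b", 125.0), ("c", 400.0)]
print(act(analyze(observe(samples), baseline)))
# {'a': 'forward', 'b': 'slow', 'c': 'block'}
```

Keeping the three stages as separate functions mirrors the paradigm: the observation source, the statistical model, and the action policy can each be replaced independently.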
Today, edge services are deployed not by shrink-wrap software, but by
plug-in network appliances. Essentially these are routers with service
extensions, providing protocol-aware packet and stream classification
at line speeds, invocation of actions based on that classification, and policy-based
routing. Packet contents may be modified, delayed, or filtered based on
preferences, context, or policies specified by administrators. New
devices, called Programmable Network Elements (PNEs), now give even more
control over network processing. Using them as a foundation, the OASIS
Project is investigating a comprehensive network behavior observation
infrastructure for enterprise networks. This infrastructure collects
observations about protocol types, packet sizes, source and destination
addresses, and numerous other attributes of packets and their flows
from points within the enterprise (server-side, Internet-side, and
access-side) in order to determine correlations and to test their
causality. For example, if two attributes correlate statistically, we
can test causality by using the PNEs to reduce or increase one
attribute of the traffic flow and verifying whether the second
attribute follows. Observation collection and guided causality
experimentation feed our development of statistical models that
discriminate between “normal” and “abnormal” behaviors. In turn, these
models enable management algorithms to invoke the actions supported by
the underlying PNEs to block “bad” traffic, slow “suspect” traffic, and
protect “good” traffic when network and system performance is under
stress. Successfully developing such an infrastructure and its
associated algorithms, and interfacing them to higher middleware layers
in collaboration with other RADLab researchers, will ultimately enhance
the reliability and dependability of distributed applications deployed
in enterprise networks and large-scale datacenters.
Last updated: 27 December 2005, Randy H. Katz, randy@cs.Berkeley.edu