As Ethernet moves
from being an enterprise-centric technology into a WAN
and access technology, designers need to bring
carrier-class capabilities to Ethernet designs. And one
of the most important characteristics of a carrier-class
technology is the implementation of operations,
administration, and maintenance (OAM) management
capabilities.
While management capabilities are available for
enterprise-class Ethernet networks, these same
capabilities have not been available for WAN and access
Ethernet networks. Recognizing this need, the IEEE 802.3
CSMA/CD Working Group (the Ethernet standards body),
through its Ethernet in the First Mile (802.3ah) task force,
defined a set of OAM capabilities for Ethernet
links [1]. These capabilities are introduced
gracefully to ensure backward compatibility with
existing Ethernet implementations, while still providing
advanced monitoring functionality as required in public
networks.
The OAM work of the IEEE 802.3ah task force addresses
three key operational issues when deploying Ethernet
across geographically disparate locations: link
monitoring, fault signaling, and remote loopback. Link
monitoring introduces some basic error definitions for
Ethernet so entities can detect failed and degraded
connections. Fault signaling provides mechanisms for one
entity to signal another that it has detected an error.
Remote loopback, which is often used to troubleshoot
networks, allows one station to put the other station
into a state whereby all inbound traffic is immediately
reflected back onto the link.
In this article, we'll take a
close look at the OAM capabilities defined under the
802.3ah EFM standard. During the discussion, we'll
describe the new IEEE 802.3ah OAM functions and provide
some additional details about features and
implementation.
A Glaring Deficiency
Over the past few
years, Ethernet has started to migrate from an
enterprise-class technology to a significant technology
for future carrier deployments. As this migration has
progressed and more and more Ethernet has rolled out in
the public network, a glaring deficiency has been
identified. When compared with more traditional carrier
network technologies, Ethernet lacks the integrated
management capabilities inherent in these other
technologies.
Traditionally, Ethernet networks have been managed
with IP/SNMP. In an enterprise, this is a perfectly
acceptable solution. However, as Ethernet migrates to
public networks, IP/SNMP presents several drawbacks.
First, if the Ethernet layer is not operating properly,
IP/SNMP may not even be available. This implies that
certain management capabilities are required within the
Ethernet layer itself. Second, requiring an IP network to
run just to manage the network could impose unacceptable
overhead in terms of equipment complexity and carrier
network operations.
By comparison, SONET uses overhead bytes for detecting
wiring problems, for monitoring the performance of
individual segments and paths, for signaling remote
faults, and for out-of-band management
communications [3]. Likewise, ATM networks
include integrated management capabilities such as
virtual channel (VC) and segment monitoring to ease the
operational burden of large, diverse deployments. Even
the Internet Protocol suite (TCP/IP) has its own
management protocol in ICMP, which underpins utilities
such as ping and traceroute for network diagnostics and
troubleshooting. In all of these technologies, the
network includes its own operational signaling and
troubleshooting functions to simplify managing large,
complex deployments.
The recent success of Ethernet as a carrier
technology has exposed its OAM shortcomings as a
major impediment to more and larger deployments. As
such, OAM for Ethernet networks has become a major
initiative for both carriers and vendors.
Within the international realm of standards
consortia, several bodies have begun major Ethernet OAM
initiatives. In general, these initiatives can be
classified into three levels: the data link layer, the
transport layer, and the services layer. The data link
layer is the collection of individual Ethernet segments.
The transport layer is the collection of forwarding
entities and interconnected segments that form a
multi-hop Ethernet network and provide basic
connectivity between devices. The service layer then
uses the transport layer to multiplex a variety of
services (for example, VLANs [2]) on the
underlying network.
Different standards organizations are developing OAM
mechanisms for the various network layers. Specifically,
the partitioning of the OAM functionality falls into
three organizations: the Metro Ethernet Forum (MEF), the
International Telecommunication Union (ITU), and the IEEE
802 committee. As shown in Figure 1, the MEF and
ITU are teaming together to define OAM capabilities for
the services layer. The IEEE 802.1 committee is focused
on the transport layer while the IEEE 802.3ah committee
is targeting the data-link layer.
Figure 1: Various forums defining specifications
for the data link, transport, and service layer in
carrier-class Ethernet networks.
The collective work of these standards bodies will
result in a complete Ethernet OAM solution. In this
article, however, we'll concentrate on providing details
of the completed IEEE 802.3ah OAM specification.
Architecture for 802.3 OAM
Architecturally,
the IEEE 802.3ah EFM specification defines OAM as an
optional sublayer just above the Ethernet media access
controller (MAC). The OAM sublayer consists of a parser
block, a multiplexer block, and a control block. All
three blocks communicate with an OAM client, which is
the "brains" of the IEEE 802.3ah management architecture.
The relationship between the blocks, the OAM client, and
the MAC is shown in Figure 2.
Figure 2: Diagram showing the OAM architecture
defined under the 802.3ah specification.
When OAM is present, two connected OAM sublayers
exchange OAM protocol data units (OAMPDUs). The OAMPDUs
are distinguished from other frames through a combination
of the destination MAC address and Ethernet type/length
field. The parser detects incoming OAMPDUs and passes
them to the OAM control block and eventually the OAM
client.
Ethernet OAM shares the same bandwidth as general
application traffic. To limit its impact on the usable
bandwidth for applications, OAM traffic is limited to at
most 10 frames/sec. The multiplexer block mentioned above
ensures that OAMPDUs have high priority and limited
bandwidth.
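As a rough illustration of the rate cap described above, the limit could be sketched as a one-second counting window. The class name, the fixed-window approach, and the caller-supplied timestamps are assumptions made for this example; the standard does not prescribe them.

```python
class OampduRateLimiter:
    """Hypothetical fixed-window limiter: at most N OAMPDUs per second."""

    def __init__(self, max_per_second: int = 10):
        self.max_per_second = max_per_second
        self.window_start = None   # start time of the current 1 s window
        self.sent_in_window = 0    # OAMPDUs already sent in this window

    def allow(self, now: float) -> bool:
        """Return True if an OAMPDU may be transmitted at time `now` (seconds)."""
        # Open a fresh window on first use or once a full second has elapsed.
        if self.window_start is None or now - self.window_start >= 1.0:
            self.window_start = now
            self.sent_in_window = 0
        if self.sent_in_window < self.max_per_second:
            self.sent_in_window += 1
            return True
        return False
```

Passing the time in as an argument, rather than reading a clock inside the class, keeps the sketch deterministic and easy to test.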
Through the 802.3ah architecture, OAM is independent
of the physical layer and works over any kind of
Ethernet link. The separation of the OAM client and
simple parser functionality allows both hardware and
software implementations. It is hoped that vendors will
quickly implement software versions of Ethernet OAM and
release them on already deployed platforms, and that
hardware versions may be integrated into future Ethernet
MACs for universal availability.
To maintain backward compatibility with existing
Ethernet equipment, the implementation of OAM is always
optional. The lack of OAM or the failure of OAM does not
prevent an Ethernet port from becoming operational.
As a general rule, OAM uses periodic advertisements
to maintain an OAM session and convey information on an
Ethernet link. Sessions are maintained by sending a
minimal number of OAMPDUs per second, and such
advertisements may contain management or fault signaling
information. This principle of unacknowledged
advertisements is used in most OAM operations, with
the exception of a request/response mechanism for
obtaining information from the far end of an Ethernet
link.
Understanding the OAMPDU
Under the IEEE
802.3ah architecture, OAMPDUs are an integral
component. These protocol data units are normal Ethernet
frames that use a specific multicast destination address
and EtherType. Figure 3 shows the format of an
OAMPDU.
Figure 3: Diagram showing an OAMPDU
format.
The multicast address and EtherType identify the
frame as a slow protocol frame. The standard defines
several slow protocols; one example is link aggregation
control protocol (LACP) [6]. The different slow
protocols are identified through the slow protocol
subtype, where subtype 3 is designated for OAM.
Because they use the slow protocol MAC address, OAMPDUs are
guaranteed to be intercepted by the MAC sublayer and
will not propagate across multiple hops in an Ethernet
network, regardless of whether OAM is implemented or
enabled.
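To make the identification rule concrete, here is a minimal sketch of the check a parser might apply to a received frame. The destination address and EtherType values are the commonly documented Slow Protocols constants and should be treated as assumptions to confirm against the standard; the function name and the restriction to untagged frames are ours.

```python
# Assumed Slow Protocols constants; verify against the standard before use.
SLOW_PROTOCOLS_MAC = bytes.fromhex("0180c2000002")
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
OAM_SUBTYPE = 0x03  # subtype 3 is designated for OAM, as stated in the text


def is_oampdu(frame: bytes) -> bool:
    """Return True if an untagged Ethernet frame looks like an OAMPDU."""
    # Need 6 (destination) + 6 (source) + 2 (type) + 1 (subtype) octets.
    if len(frame) < 15:
        return False
    destination = frame[0:6]
    ethertype = int.from_bytes(frame[12:14], "big")
    subtype = frame[14]
    return (
        destination == SLOW_PROTOCOLS_MAC
        and ethertype == SLOW_PROTOCOLS_ETHERTYPE
        and subtype == OAM_SUBTYPE
    )
```

A frame with the same address and EtherType but a different subtype, such as an LACP frame, fails the last comparison and is left for its own slow protocol handler.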
Most information carried by OAMPDUs is encoded using
type-length-value (TLV) format. The first octet (or
byte) indicates the type. This type, analogous to a
variable type in a programming language, also tells
the OAM client how to decode the bytes
containing the information. The next octet carries the
length of the information. This length is typically used
for skipping the information when the type cannot be
interpreted by the OAM client. Following type and
length, one or more octets encode the information
itself.
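The skip-what-you-don't-understand behavior can be sketched with a small TLV walker. Two details here are assumptions rather than statements from this article: the length octet is taken to count the whole TLV, including the type and length octets, and a type of 0x00 is taken to mark the end of the TLV list. The decoder table is purely hypothetical.

```python
from typing import Dict, Iterator, Tuple

END_OF_TLVS = 0x00  # assumed terminator type


def iter_tlvs(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (type, value) pairs from a buffer of back-to-back TLVs."""
    offset = 0
    while offset + 2 <= len(data):
        tlv_type = data[offset]
        if tlv_type == END_OF_TLVS:
            return
        length = data[offset + 1]  # assumed to cover type + length + value
        if length < 2 or offset + length > len(data):
            raise ValueError("malformed TLV at offset %d" % offset)
        yield tlv_type, data[offset + 2 : offset + length]
        offset += length  # the length alone is enough to skip any TLV


# Hypothetical decoder table: only type 0x01 is "understood" here.
KNOWN_DECODERS = {0x01: lambda value: value.hex()}


def decode_known(data: bytes) -> Dict[int, str]:
    """Decode recognized TLVs and silently skip the rest."""
    decoded = {}
    for tlv_type, value in iter_tlvs(data):
        decoder = KNOWN_DECODERS.get(tlv_type)
        if decoder is not None:
            decoded[tlv_type] = decoder(value)
    return decoded
```

Because the walker advances by the length octet regardless of type, a client never needs to understand a TLV in order to step over it.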
Discovery
Now that we've discussed the key
blocks defined under the IEEE 802.3ah OAM spec, let's
take a deeper look at the key operations defined under
the protocol. We'll start with discovery.
Discovery is the first phase of the IEEE 802.3ah OAM
protocol, and relies on what are termed information
OAMPDUs. During discovery, information about OAM
entities' capabilities, configuration, and identity is
exchanged.
One important aspect of the IEEE 802.3ah OAM spec is
that an OAM entity may be in what is termed active or
passive mode. The difference between the modes is that
an active-mode device can exert more control on its peer
than a passive-mode device. For example, an active-mode
OAM entity can put a passive-mode OAM entity into
loopback mode, but not vice versa. These modes are
intended to support public network deployments, where
carrier-side equipment has a superior relationship to
the customer premises equipment.
During discovery, configuration and capability
parameters are also exchanged. Nodes can use the
configuration and capabilities of the peer entity to
determine if an OAM relationship should be instantiated,
and with what functionality. For example, a node may
require that its partner support loopback capability in
order to be accepted into the management network. The
standard does not specify how a node decides whether a
peer's configuration and capabilities are acceptable.
These are left as local policy decisions, which provides
the flexibility for policy-based management associations
in a carrier network.
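The mode and policy rules above might be modeled as follows. This is only a sketch: the type names, the single `supports_loopback` capability, and the `require_loopback` policy knob are hypothetical, and the check that the peer actually supports loopback before commanding it is our assumption.

```python
from dataclasses import dataclass
from enum import Enum


class OamMode(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


@dataclass(frozen=True)
class PeerInfo:
    """Subset of what a peer might advertise during discovery (hypothetical)."""
    mode: OamMode
    supports_loopback: bool


def accept_peer(peer: PeerInfo, require_loopback: bool = True) -> bool:
    """Local policy decision: optionally insist on loopback support."""
    return peer.supports_loopback or not require_loopback


def may_request_loopback(local_mode: OamMode, peer: PeerInfo) -> bool:
    """Only an active-mode entity may command its peer into loopback."""
    return local_mode is OamMode.ACTIVE and peer.supports_loopback
```

For example, a carrier-side node in active mode could reject a customer device that does not advertise loopback, while a passive-mode customer device can never command loopback on the carrier side.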
Note: The OAM protocol requires that frames be
exchanged with a minimum frequency to maintain the
relationship. If no OAMPDUs are received in a 5-second
window, the OAM peering relationship is lost and must be
re-established to perform OAM functions.
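The 5-second rule in the note can be sketched as a simple receive timer. The class and method names are hypothetical, and timestamps are passed in by the caller so the example stays deterministic.

```python
class OamSession:
    """Hypothetical tracker for the OAM peering relationship."""

    LOST_LINK_TIMEOUT = 5.0  # seconds without OAMPDUs before the peer is lost

    def __init__(self):
        self.last_rx = None  # time the most recent OAMPDU was received

    def on_oampdu_received(self, now: float) -> None:
        """Record that an OAMPDU arrived at time `now` (seconds)."""
        self.last_rx = now

    def is_up(self, now: float) -> bool:
        """True while an OAMPDU has been seen within the timeout window."""
        if self.last_rx is None:
            return False
        return now - self.last_rx < self.LOST_LINK_TIMEOUT
```

Once `is_up` returns False, an implementation would restart discovery before attempting any further OAM functions.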
Remote Failure Indication
A flag in the
OAMPDU allows an OAM entity to convey severe error
conditions to its peer. The severe error conditions are
defined as:
- Link Fault: This flag is raised when a
station stops receiving a transmit signal from its
peer.
- Dying Gasp: This flag is raised when a
station is about to reset, reboot, or otherwise go to
an operationally down state.
- Critical Event: This flag indicates a severe
error condition that does not result in a complete
reset or reboot by the peer entity.
One of
the most critical problems in an access network for
carriers is differentiating between a simple power
failure at the customer premises and an equipment or
facility failure. Dying gasp, if implemented correctly,
provides this information by having a station indicate
to the network that it is having a power failure. More
details on the failure may be included in additional
event information conveyed in the frame.
Since the above conditions are severe, OAMPDUs that
carry information concerning these conditions are not
subject to normal rate limiting policy. Such frames can
be sent immediately and repeatedly as they indicate
critical failure information.
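A short sketch shows how the three severe conditions might be carried as bits in the flags field and used to bypass rate limiting. The bit positions below are assumptions to check against the standard, and the helper names are ours.

```python
from enum import IntFlag
from typing import List


class OamFlags(IntFlag):
    """Severe-condition bits; positions are assumed for illustration."""
    LINK_FAULT = 0x0001
    DYING_GASP = 0x0002
    CRITICAL_EVENT = 0x0004


SEVERE_MASK = int(
    OamFlags.LINK_FAULT | OamFlags.DYING_GASP | OamFlags.CRITICAL_EVENT
)


def severe_conditions(flags: int) -> List[str]:
    """List the names of the severe conditions signaled in `flags`."""
    return [flag.name for flag in OamFlags if flags & flag]


def bypasses_rate_limit(flags: int) -> bool:
    """OAMPDUs carrying any severe condition are sent immediately."""
    return bool(flags & SEVERE_MASK)
```

A transmitter would consult `bypasses_rate_limit` before applying its normal frames-per-second cap, so a dying gasp is never held back behind routine advertisements.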
Remote Loopback
An OAM entity can put its
remote OAM entity into loopback mode using a
loopback-control OAMPDU. When an OAM entity is in
loopback mode, every frame received is transmitted back
on that same port except for OAMPDUs and pause frames
(which belong to the MAC control sublayer below the OAM
sublayer).
The loopback command is acknowledged by responding
with an information OAMPDU with the loopback state
indicated in the state field. OAMPDUs continue to be
exchanged during loopback mode; only data frames are
looped back.
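The frame handling in loopback mode can be summarized in a few lines. This sketch assumes pause frames have already been consumed by the MAC control sublayer before the OAM sublayer sees anything, so only OAMPDUs and data frames appear here; the names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    TO_OAM_CLIENT = "pass to the OAM client"
    LOOP_BACK = "transmit back on the receiving port"
    TO_UPPER_LAYER = "deliver to the MAC client as usual"


def classify_received_frame(is_oampdu: bool, in_loopback: bool) -> Action:
    """Decide what the OAM sublayer does with a received frame."""
    if is_oampdu:
        # OAMPDUs keep flowing to the client, even during loopback.
        return Action.TO_OAM_CLIENT
    if in_loopback:
        # Every other frame is reflected back onto the link.
        return Action.LOOP_BACK
    return Action.TO_UPPER_LAYER
```

Checking for OAMPDUs first is what keeps the OAM session alive while the port is in loopback.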
Remote loopback is most useful as a diagnostic tool,
where it can be used to isolate problem segments in a
large network. When in loopback, metrics such as delay,
throughput, bit error rate (BER), and jitter can be
calculated to determine the overall quality of a
connection.
Event Conditions
Ethernet OAM also defines
a set of standard event conditions that Ethernet links
should monitor in normal operation, and if detected,
should be signaled to a peer entity. These conditions
reflect a degraded, but not yet inoperable, Ethernet
connection. These conditions include threshold-crossing
alarms on the frequency of symbol errors and frame
errors.
When a pre-configured error threshold is crossed, an
event notification OAMPDU is sent to the OAM peer.
Additionally, as is done in more traditional SONET and
DSL networks, an errored second is defined to indicate
a period in which an unusual number of errors occurred,
and thresholds can be set on the number of errored
seconds over a specific period.
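The errored-second idea lends itself to a small worked example. The sketch below assumes per-second error counts are already available as a list. The parameter names, the default of one error making a second "errored", and the use of greater-than-or-equal for the threshold are illustrative assumptions.

```python
from typing import Sequence


def count_errored_seconds(errors_per_second: Sequence[int],
                          min_errors: int = 1) -> int:
    """Count the seconds whose error count reaches `min_errors`."""
    return sum(1 for count in errors_per_second if count >= min_errors)


def errored_seconds_alarm(errors_per_second: Sequence[int],
                          window: int,
                          max_errored_seconds: int,
                          min_errors: int = 1) -> bool:
    """True if the last `window` seconds (window >= 1) cross the threshold."""
    recent = errors_per_second[-window:]
    return count_errored_seconds(recent, min_errors) >= max_errored_seconds
```

With samples of `[0, 0, 3, 0, 1, 0, 0, 7, 0, 0]`, three seconds are errored. A threshold of three errored seconds over a 10-second window would trigger a notification, while a threshold of two over the last 5 seconds would not.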
Since IEEE 802.3ah OAM does not guarantee
delivery of any OAMPDU, the event notification OAMPDU
can be sent multiple times to reduce the probability of
a lost notification. Each event notification OAMPDU is
tagged with a sequence number to recognize duplicate
event notifications.
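Duplicate suppression on the receiving side might look like the following. The sketch assumes the sender reuses the same sequence number for each repeat of a notification and that comparing against the last number seen is sufficient; the class and method names are hypothetical.

```python
class EventReceiver:
    """Hypothetical receiver that drops repeated event notifications."""

    def __init__(self):
        self.last_sequence = None  # sequence number of the last new event
        self.events = []           # events accepted so far

    def on_event_notification(self, sequence: int, event: str) -> bool:
        """Record the event unless it repeats the last sequence number."""
        if sequence == self.last_sequence:
            return False  # retransmission of an event already handled
        self.last_sequence = sequence
        self.events.append(event)
        return True
```

A sender can then transmit each notification several times without the peer counting the same threshold crossing more than once.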
MIB Variable Retrieval and Extensions
The
IEEE 802.3ah Ethernet OAM spec also defines a generic
mechanism for one OAM entity to query another for the
value of any management information base (MIB) variable.
MIB variables include all performance and error
statistics maintained on an Ethernet link. This
capability provides a very generic monitoring capability
for one station to monitor any parameter on another for
performance or error detection.
Vendors and organizations can extend Ethernet OAM
capabilities through organization-specific OAMPDUs and
organization-specific TLVs within the standard OAMPDUs.
Vendors can use these extensions to implement extra
events; to include additional information during
discovery; or even to add a completely proprietary OAM
protocol to the standard operation. By virtue of the TLV
format, unrecognized extensions can be skipped by nodes
that do not support them.
Wrap Up
The Ethernet OAM work of IEEE
802.3ah is the first step toward including inherent
management capabilities in Ethernet equipment for public
network deployment. The protocol described in this
document provides utilities for monitoring and
troubleshooting Ethernet links, and can feed into larger
management frameworks for a fundamental OAM capability.
References
[1] IEEE, IEEE 802.3ah Draft P802.3ah/D3.3,
"Amendment: Media Access Control Parameters, Physical
Layers and Management Parameters for Subscriber Access
Networks," April 2004.
[2] IEEE, IEEE Std 802.1Q, "Virtual Bridged Local Area
Networks," December 1998.
[3] R. P. Ballart and Y-C. Ching, "SONET: Now It's The
Standard Optical Network," IEEE Communications
Magazine, May 2002.
[4] T. D. Nadeau, et al., "OAM Requirements for MPLS
Networks," draft-ietf-mpls-oam-requirements-02.txt,
Jun. 2003, (work in progress).
[5] Metro Ethernet Forum, "Metro Ethernet Network: A
Technical Overview," October 2002.
[6] IEEE, IEEE Std 802.3, "Carrier Sense Multiple
Access with Collision Detection (CSMA/CD) Access
Method and Physical Layer Specifications," March 2002.
[7] J. Case, et al., "A Simple Network Management
Protocol (SNMP)," IETF RFC 1157, May 1990.
About the Authors
Stephen
Suryaputra is a senior software engineer for
Hatteras Networks. He received a Sarjana Teknik (B.S. in
Engineering) from Sekolah Tinggi Teknik Surabaya,
Indonesia and an M.S. in Electrical Engineering from
the University of Southern California. Stephen can be
reached at ssuryaputra@HatterasNetworks.com.
Matthew Squire is the chief technology officer
for Hatteras Networks. Matthew holds more than fifteen
patents and has more than 10 in the pipeline. He chaired
the OAM sub-taskforce in the IEEE 802.3ah Ethernet in
the First Mile task force and has also served in
editorial positions in the Metro Ethernet Forum and ANSI
T1 committee. Matthew can be reached at msquire@hatterasnetworks.com.