TDMA vs. Token Ring (Annex L) for STANAG 5066
Token Ring and TDMA (Time Division Multiple Access) are the primary choices for enabling multiple nodes to share an HF link with high utilization. Token Ring has been standardized as Annex L of STANAG 5066, and there is a placeholder in Annex M for TDMA. This paper analyses the relative merits of TDMA and Token Ring. It concludes that Token Ring is the better approach for HF in most situations, and that current NATO standardization effort should be directed towards improving Token Ring operation rather than adding a new Adaptive TDMA standard.
Isode whitepapers are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
STANAG 5066 MAC Layers
STANAG 5066 provides a link level service over an HF modem, providing reliable data transfer (ARQ), broadcast/non-ARQ, application multiplexing and integrity checking. All of these services sit over a Media Access Control (MAC) layer that controls access to the HF Modem. STANAG 5066 ed3 defines two MAC layer options, with a placeholder for a third:
- Annex K defines a Carrier Sense Multiple Access (CSMA) mechanism, based on "listen before transmit".
- Annex L defines a Wireless Token Ring mechanism.
- Annex M is a placeholder for a TDMA mechanism.
CSMA (Annex K)
Annex K is a good choice for low utilization networks. The Isode whitepaper [Slotted Option for STANAG 5066 Annex K (S5066-EP6)] defines extensions to Annex K which reduce the risk of collision and enable faster switching times. This makes Annex K based CSMA viable for higher levels of traffic, provided that utilization requests are usually less than network capacity. Under higher load, CSMA has a relatively slow switching time and the time allocation mechanism in S5066-EP6 is not "fair".
This paper is primarily focused on Token Ring and TDMA, which are the primary choices when typical utilization requests are larger than network capacity.
Wireless Token Ring (Annex L)
Token ring provides a mechanism for a set of nodes to communicate. There is a logical token and only the node that holds the token can transmit. The node with the token transmits for as long as it wishes. When it has finished transmitting, it will transmit the token to the next node in the ring. Then the next node transmits, and nodes in the ring can transmit in turn.
The basic model is very simple, although a number of details such as ring formation and adding nodes add complexity.
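As a rough illustration of the basic token-passing rule, the following Python sketch walks a token around a ring. The function name `token_ring_schedule` and its data layout are hypothetical, chosen for this example only. It deliberately ignores ring formation, node joining, token loss and turnaround time, and is not the Annex L procedure itself.

```python
from collections import deque


def token_ring_schedule(pending, rounds=1):
    """Return the order of (node, seconds_transmitted) for a simple token ring.

    pending maps node name -> list of transmission lengths (seconds) that the
    node wants to send. The token holder sends its next queued transmission,
    or sends no data (0 seconds) if it has nothing queued, then passes the
    token to the next node in the ring.
    """
    ring = deque(pending)  # ring order follows dict insertion order
    queues = {node: deque(lengths) for node, lengths in pending.items()}
    log = []
    for _ in range(rounds * len(ring)):
        holder = ring[0]  # only the token holder may transmit
        tx = queues[holder].popleft() if queues[holder] else 0
        log.append((holder, tx))
        ring.rotate(-1)  # pass the token to the next node
    return log


if __name__ == "__main__":
    for node, seconds in token_ring_schedule({"A": [10], "B": [], "C": [5]}):
        print(node, seconds)
```

Note that the idle node B costs the ring only a token pass, which is the property that lets a token ring follow an unbalanced load.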
A token ring will usually be used with three or more nodes. It may be used with two, although a simpler single CAS-1 alternating link is likely to be preferable. For comparison with TDMA, alternating CAS-1 has similar characteristics to Annex L.
In a TDMA system, each node has one or more time slots and each node only transmits in its own slots. A basic TDMA system will have fixed slot assignment.
To optimize performance for many situations, the TDMA needs to be adaptive. There is no standardized HF scheme, so this paper assumes a scheme of the style used by MARLIN (STANAG 4631), which works by having a repeating cycle with a number of slots:
- In each cycle, each node has one "base" slot that it will always use.
- There is an unassigned slot that nodes without slots can use to request a slot.
- Each node has a number of additional slots that the node can assign to other nodes. This enables nodes with high traffic load to request and be allocated additional slots.
This seems a reasonable adaptive approach and there is no clear reason to suggest that an alternative approach would be better for HF.
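The slot-lending idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the STANAG 4631 algorithm: each node controls a number of additional slots, uses its own first, and any unused ones are lent to nodes with outstanding demand in a fixed order. The function name `allocate_additional_slots` and the lending order are hypothetical.

```python
def allocate_additional_slots(additional, demand):
    """Share out additional TDMA slots for one cycle.

    additional: dict owner -> number of additional slots the owner controls.
    demand: dict node -> number of additional slots the node would like.
    Returns dict node -> number of additional slots the node ends up using.
    Base slots and the unassigned request slot are not modelled here.
    """
    # Each owner first satisfies its own demand from the slots it controls.
    use = {n: min(additional[n], demand.get(n, 0)) for n in additional}
    # Whatever owners do not need can be lent to other nodes.
    spare = sum(additional[n] - use[n] for n in additional)
    for n in additional:  # assumption: lend in a fixed node order
        want = demand.get(n, 0) - use[n]
        grant = min(want, spare)
        use[n] += grant
        spare -= grant
    return use


if __name__ == "__main__":
    owners = {"node1": 1, "node2": 1, "node3": 1}
    print(allocate_additional_slots(owners, {"node1": 3}))
```

A real scheme also has to signal requests and grants over the air, which is where the adaptation overhead comes from.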
Metrics for Comparison: Throughput & Latency
There are two key link level metrics for a link level service to support higher layers: throughput and latency. Applications will usually want to optimize for one or the other or a balance. For light traffic, low latency is easy to achieve with either mechanism:
- It comes naturally with TDMA, provided that each node has its slots evenly spread over the cycle.
- For Annex L, each node makes a short transmission in turn, which keeps latency low.
Optimizing throughput, and dealing with situations where there is a mix of requirements for high throughput and low latency is much harder. Therefore this paper is focused on optimizing for throughput and mixed requirements.
The following table shows a range of HF and newer Wideband HF speeds, showing what can be transferred in 5 minutes at full link utilization.
|HF Speed||Notes||5 Minute Volume||Typical Information|
|75 bps||Slowest Narrowband HF speed||2.8 kByte||Small message|
|1,200 bps||Typical Narrowband HF speed||45 kByte||Larger message or very small document|
|9,600 bps||Top Narrowband HF speed||360 kByte||Small document|
|64,000 bps||Typical Wideband HF speed||2.4 MByte||Medium document or small photo|
|240,000 bps||Top Wideband HF speed||9 MByte||Good consumer quality photo|
The information that can be effectively sent over HF remains very constrained, and transfer of quite modest information can take a significant time. It is therefore important that care is taken to optimize throughput for all ranges of HF speed. Given this, protocol choices that impact performance by even relatively small amounts (e.g., 10-20%) should be considered carefully.
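The five minute volumes follow from simple arithmetic: bits per second multiplied by 300 seconds, divided by 8 bits per byte. A minimal Python sketch, assuming full link utilization with no protocol or turnaround overhead (the function name is illustrative only):

```python
def five_minute_volume_bytes(bps, seconds=300):
    """Bytes transferred at a constant line rate with 100% utilization."""
    return bps * seconds / 8


if __name__ == "__main__":
    for bps in (75, 1200, 9600, 64000, 240000):
        kbytes = five_minute_volume_bytes(bps) / 1000
        print(f"{bps:>7} bps -> {kbytes:,.1f} kByte in 5 minutes")
```

Real transfers will deliver less than this, because ARQ, turnaround and framing all consume link time.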
Turnaround Time & Transmission Time
Turnaround time is often long for deployed HF systems. The whitepaper [Reducing Turnaround Times in STANAG 5066] explains how HF turnaround time can be reduced to 150-200 milliseconds. This sort of limit is going to apply to both TDMA and Annex L, as the limits are on HF preamble times and Power Amplifier turn-on times.
If you make a 1 second transmission, this turnaround adds an overhead of 15-20%, which seems high if you are optimizing for throughput. Ten second transmissions would lead to more reasonable overheads.
In the common case of data transfer between two nodes, one node will make a long transfer and then the other will acknowledge the data transferred. This acknowledgement will need two turnaround times plus the time to transfer data, which will need to be at least 100 milliseconds for an ultra-short interleaver. In this scenario there is a gap of around 500 milliseconds between the long transfers. Here, using 1 second transmissions would have an overhead of 50%; increasing to 20 second transmissions would reduce this overhead to 2.5%.
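The overhead figures can be reproduced with a short calculation. The sketch below assumes a 200 millisecond turnaround and a 100 millisecond acknowledgement, and expresses overhead as the gap divided by the data transmission time, which is the convention used in the text. The function names are illustrative.

```python
def ack_gap_seconds(turnaround_s=0.2, ack_tx_s=0.1):
    """Gap between long transfers: two turnarounds plus the ack itself."""
    return 2 * turnaround_s + ack_tx_s


def overhead(gap_s, data_tx_s):
    """Gap expressed as a fraction of the data transmission time."""
    return gap_s / data_tx_s


if __name__ == "__main__":
    gap = ack_gap_seconds()
    for tx in (1, 10, 20, 30):
        print(f"{tx:>3} s transmission: {overhead(gap, tx):.1%} overhead")
```

The overhead falls in inverse proportion to transmission length, so most of the gain has been captured by the time transmissions reach a few tens of seconds.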
It is generally recognized that longer interleavers give better performance for data over HF. The paper "Investigating the Effects of Interleaver Size and FEC Code Constraint Over-the-Air for the US MIL-STD-188-110C Appendix D WBHF Waveforms" (J.W. Nieto & W.N. Furman, HFIA, York HFIA, September 2012) provides useful measurements to quantify this.
That paper provided a number of results showing how the BER (Bit Error Rate) and PER (Packet Error Rate) changed with interleaver size over the same signal. The PER results give values that can be used directly to evaluate STANAG 5066 efficiency, and the following table shows results for selected runs.
|Interleaver||Data Loss on Run 1||Run 2||Run 3||Run 4||Run 5||Run 6||Run 7|
|Ultra Short (100 milliseconds)||1.7%||5.7%||9.2%||16%||35%||31%||69%|
|Very Short (300 milliseconds)||1.2%||3.9%||6.4%||14%||29%||27%||73%|
|Medium (3 seconds)||0.06%||0.2%||1.8%||7.9%||2.5%||31%||85%|
|Long (10 seconds)||0%||0%||0.4%||1.1%||0.28%||32%||89%|
Four of the runs were error free for all interleavers. The majority of the runs followed the patterns of columns 1-4. These four are shown as typical runs, showing the spectrum of error rates for ultra-short interleavers. There is a clear pattern here, continued into column 5, which is reflected in all of the runs. Columns 5, 6 and 7 are unique runs.
It is often considered that for optimal throughput, it will be sensible to run at a frame error rate in the range 20-50%. If this is the case, most of the runs recorded are at rates where it would likely be desirable to transmit at a higher rate. In columns 1-5, there is a clear throughput benefit and associated latency benefit to using a long interleaver. In columns 4 and 5, which give the best information on behaviour as a speed closer to the limit is chosen, the difference is significant.
Column 7 reflects conditions where it is clear from the frame loss, that a slower speed should be chosen. If transmission is made in these conditions, shorter interleavers give better throughput. This is perhaps because shorter blocks are more likely to find gaps where transmission is acceptable.
Column 6 is quite flat across the interleaver choices. It is unclear why this is happening.
These numbers suggest that useful throughput performance benefits will be achieved from using a long interleaver and this can help with latency, particularly where a reasonably aggressive speed is chosen to optimize throughput.
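To give a feel for the size of the effect, the sketch below converts the packet loss figures for the run in column 4 of the table into a delivered fraction. It assumes, as a simplification, that useful throughput is proportional to (1 - packet loss) at a fixed data rate, and it ignores the added delay of a longer interleaver. The names are illustrative.

```python
# Packet loss (%) by interleaver for the run shown as column 4 of the table.
LOSS_PERCENT = {
    "ultra short": 16.0,
    "very short": 14.0,
    "medium": 7.9,
    "long": 1.1,
}


def delivered_fraction(loss_percent):
    """Fraction of packets that arrive, given a loss percentage."""
    return 1.0 - loss_percent / 100.0


def relative_gain(interleaver, baseline="ultra short"):
    """Delivered fraction relative to the baseline interleaver."""
    return delivered_fraction(LOSS_PERCENT[interleaver]) / delivered_fraction(
        LOSS_PERCENT[baseline]
    )


if __name__ == "__main__":
    for name in LOSS_PERCENT:
        print(f"{name:>11}: x{relative_gain(name):.3f} vs ultra short")
```

Under this simple model the long interleaver delivers roughly 18% more packets than the ultra short one on that run, which is within the 10-20% range flagged earlier as worth careful consideration.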
This analysis is based on a single set of measurements for an NVIS (Near Vertical Incidence Skywave) link, and different results are quite possible for other scenarios and for further measurements in the same scenario. We would guess that this analysis will apply to many Skywave situations. Further measurements of the type undertaken by Harris on different data sets seem highly desirable.
It is plausible that some Skywave HF scenarios will have better performance with short interleavers, where some of the shorter blocks being transferred when signal is good will give a better result than transferring a longer block. This is seen in column 7, although here a lower transmission speed seems desirable, which would likely change things.
Groundwave links have less variance than Skywave and it is expected that long interleavers will not give such significant benefits over short ones.
Comparing TDMA and Annex L
Some detailed comparison can now be made.
Some useful basic comparison between TDMA and Token Ring is made in the paper "Performance of the HF Token Protocol". This shows that for balanced load, Token Ring gives slightly superior performance to TDMA, noting that this may be offset by Token Ring setup.
However, unbalanced loads are common. Token ring will adapt naturally to unbalanced load. Adaptive TDMA is needed to support unbalanced loads, and this adaption will increase the overhead of TDMA relative to Token Ring.
Selecting Transmission Lengths for Throughput with Long Interleaver
When choosing an interleaver length for transmission, the sender will consider the amount of data it has to send. If it has a large amount to send, the optimal interleaver choice will be long (ten seconds). The minimum number of blocks to be transmitted may be constrained by the COMSEC configuration, as discussed in [Reducing Turnaround Times in STANAG 5066]. If turnaround time has been reduced as discussed in that paper, there will be little benefit to be gained in transmitting for more than 30 seconds. There is also the acknowledgement to be considered. Here a much shorter transmission with a short interleaver can be used, ideally at a slow speed to minimize risk of data loss.
When using Annex L, the sender has full control of transmission time. This enables the sender to choose the optimum interleaver size and number of blocks to send, and then to transmit this efficiently.
With Adaptive TDMA, things are more complex as you need to fit transmissions into TDMA slots. The difficulties here are shown in the following example.
Consider a TDMA system with three nodes and seven slots of 1 second each. Node 1 has slots 1 and 4; Node 2 has slots 2 and 5; Node 3 has slots 3 and 6; Slot 7 is spare. If node 1 is transmitting data to node 2, it can be allocated slots 5 and 6, so it will have slots 1, 4, 5 and 6, with node 2 using slot 2 to return data.
A number of difficulties can be seen:
- The transmitter ends up with two slots of different lengths, which is sub-optimal, particularly if you are looking to use longer interleavers.
- Idle slots are still "full length", which increases TDMA overhead. In Annex L, if you are simply passing the token or sending an ack, you can reduce transmission length and reduce speed for safety.
- There is difficulty fitting interleavers. TDMA base slots are going to be the same length, and need to be kept reasonably short to support low latency. Interleavers only have a limited number of lengths. In the above example, a medium interleaver will be slightly too long to fit into the 2.8 second transmission. The choice is then to waste space in the slot or to use a shorter interleaver and take a performance hit.
This suggests that Annex L is going to give significantly better performance. The key is that optimum transmission length can be selected, without having to juggle with a TDMA framework. This is a key issue, given the clear performance superiority of longer interleavers on some types of link.
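The slot-fitting difficulty in the three node example can be checked numerically. The sketch below assumes a 200 millisecond turnaround is deducted from each contiguous run of 1 second slots (which gives the 2.8 second window for slots 4 to 6) and uses the nominal interleaver lengths from the table earlier in this paper. Function names are hypothetical, and times are held in integer milliseconds to avoid rounding surprises.

```python
# Nominal interleaver lengths in milliseconds, as labelled in the results table.
INTERLEAVER_MS = {"ultra short": 100, "very short": 300, "medium": 3000, "long": 10000}


def usable_ms(contiguous_slots, slot_ms=1000, turnaround_ms=200):
    """Transmission window left in a run of slots after one turnaround."""
    return contiguous_slots * slot_ms - turnaround_ms


def best_fit(window_ms):
    """Return (interleaver, whole_blocks, wasted_ms) for the longest fit."""
    fitting = {name: ms for name, ms in INTERLEAVER_MS.items() if ms <= window_ms}
    if not fitting:
        return None, 0, window_ms
    name = max(fitting, key=fitting.get)  # longest interleaver that fits
    blocks = window_ms // fitting[name]
    wasted = window_ms - blocks * fitting[name]
    return name, blocks, wasted


if __name__ == "__main__":
    for slots in (1, 3):  # node 1's slot 1, and its run of slots 4 to 6
        window = usable_ms(slots)
        print(slots, "slot(s):", window, "ms window ->", best_fit(window))
```

In this model the 3 second medium interleaver does not fit the 2.8 second window, so the sender falls back to the 300 millisecond interleaver and still leaves part of the window unused.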
Where short interleavers do not significantly impact performance, such as Groundwave links, a TDMA system can be designed with a specific interleaver length in mind for the basic slot. This can be tuned so that single slot usage is efficient. Two inefficiencies still remain in a TDMA scheme in comparison to Annex L:
- Where there is little data to be transferred, some of the slot is wasted.
- When slots are combined (which remains desirable to reduce and share overhead), fixed interleaver lengths can still lead to a situation where a slot cannot be completely filled.
Fast Rate Switching
The previous two sections assume a model where data rate selection changes relatively slowly. An alternate model would be to have very short transmissions and to change rate quickly. This choice was explored in an Isode paper presented at Nordic HF 16, "Optimizing applications and data links for HF radio".
That paper concluded that the pragmatic and optimal approach was to select data rate based on SNR variations over the previous 2-4 minutes. It was noted that if you could change data rate every second, this would gain about 10% performance over a flat rate selection. However, you will lose more than 10% performance from the waveform overheads if you switch speed that quickly. Thales have subsequently demonstrated that data rate switching at 1.5 second intervals is possible, but comparative performance data is not available.
Token Ring timings are unpredictable, whereas TDMA timings can be predicted. There are situations where the predictability of TDMA may be beneficial. For example, if you want to make regular soundings as part of Automatic Link Maintenance (ALM), you can co-ordinate this with TDMA slots.
Token Loss & Very Poor Networks
An operational issue that has been noted with Annex L (and any token ring scheme) that does not impact TDMA is token loss. When the token gets dropped, communication stops until the token has been recovered. Operational reports suggest that token recovery is a slow process with Annex L. Slow token recovery is not an inherent problem with token ring and it may be important to extend Annex L to provide faster token recovery, to mitigate the effects of token loss. There may also be issues in larger Token Rings where there is only partial connectivity between nodes.
TDMA schemes are more resilient in very poor conditions. With a good adaptation scheme for assigning spare slots to nodes, TDMA should be robust, noting that adaption will generally be slower in poor conditions.
It may make sense to standardize a TDMA scheme in STANAG 5066 to address poor conditions when token rings cannot operate in a stable manner.
Token Ring and TDMA schemes can operate at fixed speed or at variable speed. Variable speed requires crypto bypass with standard COMSEC architecture, which means that some deployments choose (or are required) to operate at fixed speed.
Both Token Ring and TDMA will gain from variable speed. You can:
- Adapt to varying HF conditions.
- Adapt to varying data load.
- Optimize for latency by transmitting slower and minimizing loss.
- Optimize for throughput by transmitting faster (at the cost of latency).
- Minimize risk of token loss by transmitting tokens slowly (Token Ring only)
Annex L is fixed speed, and there seems clear benefit to extending to variable speed.
Guarantee of Low Latency
The slotted nature of TDMA ensures that each node gets regular turns to transmit. This means that low latency is always an available option to all nodes.
With Annex L, low latency is straightforward under low load situations. It can have lower latency than TDMA due to the possibility of shorter transmissions. However, if some nodes have high volumes of data to transmit and transmit for long periods of time, they will increase latency for all nodes.
Where there is mixed load, desirable behaviour varies. For example, if a photograph needs to be transferred at FLASH priority, it makes sense to use long transmissions to transfer this as quickly as possible and delay traffic such as real time chat where low latency is desirable. However, if there was FLASH priority real time chat activity or if the photograph was lower priority, it would make sense to transfer the photograph in a larger number of smaller transmissions so that there was reduced impact on the real time traffic. Mixed load is going to need trade-offs, and priority will help assess this.
It is also helpful for a STANAG 5066 server to understand the Quality of Service (QoS) requirements of the various applications being supported, in particular latency and throughput requirements to help make sensible trade-offs.
Current practice in STANAG 5066 is to use the maximum 127.5 second transmission whenever needed. Given typical long turnaround times, this is a sensible strategy to optimize performance. When you move to short turnaround times, as described in [Reducing Turnaround Times in STANAG 5066], there is going to be little throughput performance gain in transmitting for more than 20-30 seconds, so the problem is somewhat reduced.
If guaranteed latency is important, you can constrain maximum transmission times for Annex L. This will guarantee latency at the cost of some throughput, noting that throughput is still expected to be superior to TDMA.
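The latency bound that results from capping transmission time can be estimated with a simple model. The sketch below assumes that, in the worst case, every other node in the ring uses its full permitted transmission time before the token returns, with one turnaround per hand-over, and that no token is lost. The function name and the 200 millisecond default turnaround are assumptions for illustration.

```python
def worst_case_token_wait(nodes, max_tx_seconds, turnaround_seconds=0.2):
    """Upper bound on the wait before a node next holds the token.

    Assumes every other node transmits for the full permitted time and
    that each hand-over costs one turnaround; token loss is not modelled.
    """
    if nodes < 2:
        raise ValueError("a ring needs at least two nodes")
    return (nodes - 1) * (max_tx_seconds + turnaround_seconds)


if __name__ == "__main__":
    for cap in (127.5, 30, 10, 2):
        wait = worst_case_token_wait(nodes=4, max_tx_seconds=cap)
        print(f"cap {cap:>5} s -> worst-case wait {wait:.1f} s")
```

The bound scales linearly with both ring size and the transmission cap, so the cap can be chosen to meet a target latency for a known number of nodes.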
In order to achieve optimum behaviour, Annex L would need to be extended so that nodes exchanged information on current traffic type and priority, and this could be used to control node transmission length selection.
An interesting scenario is where there is a large number of nodes and typical transmission is between one pair of nodes. There is a significant overhead to keeping the idle nodes in the loop. This could be overcome to some extent in Annex L by intelligent use and adaptation of transmit order.
Although CSMA has longer switching times in general, Annex K plus S5066-EP6 will adapt quickly to optimize for this scenario. This suggests that there may be certain high load situations where Annex K gives the best performance.
Both Annex L and TDMA solutions can work well for low latency deployments with light load.
When optimizing for high throughput on Skywave, it is expected that longer interleavers will often give significant performance benefit. Use of longer interleavers with Annex L is straightforward, as the sender has full control of transmission time. The interaction of TDMA timings and longer interleavers is awkward and in general performance loss needs to be accepted. This is a significant downside of TDMA, and there are no benefits that compensate for this.
There are other scenarios, particularly Groundwave, where there is not substantial benefit to longer interleavers. In these scenarios the performance benefits of Annex L are smaller.
It is possible that token loss could significantly impact performance in some scenarios, and that TDMA would give better performance.
There are further identified scenarios where CSMA (Annex K + S5066-EP6) could give superior performance.
Enhancements to Annex L
In making this analysis, three potential enhancements to Annex L have been identified:
- Variable speed. It is clear that this will give useful performance and resilience improvements.
- Sharing of QoS information. This can help tune performance where there is a mix of high throughput and low latency requirements, where traffic can be prioritized.
- Improved recovery from Token loss may be helpful.
This analysis concludes that in good conditions Annex L Token ring has superior characteristics to a TDMA solution for HF for Skywave and that there are no clear over-riding benefits of either approach for Groundwave. It has also identified some improvements that could be made to Annex L.
However, we do not have sufficient information and experience to be certain that this broad conclusion applies in all cases.
TDMA will be a good choice in very poor conditions with high load. It may make sense for NATO to work on a TDMA MAC layer for STANAG 5066 to address these conditions.
Isode recommends to NATO that current focus should be on Annex L and on gaining operational experience with it. It is anticipated that this will lead to standardization developments of Annex L. This experience might also identify situations which would justify standardization work on an Adaptive TDMA MAC layer (Annex M placeholder).