Tech Portal

Performance Technologies: Zero Copy UDP vs RDMA/RoCEv2

Performance systems can comprise many 25GigE cameras talking to a few servers, with multiplexing and PTP features provided by low-cost switches.

Performance systems can also comprise a smaller number of 100GigE cameras talking to a few servers, with the same multiplexing and PTP features provided by low-cost switches.

Performance systems can also comprise a large number of 5GigE or even GigE cameras talking to a single server, with multiplexing and PTP features provided by even lower-cost switches. We find that this is a pain point for manufacturers of lower-speed cameras, so we leverage our expertise in high-performance systems to provide best-in-class system density alongside the lowest system cost with the introduction of the Emergent Eros 5GigE cameras.

Supported NICs

In addition to powerful processing-engine network interface cards (NICs) such as the AMD Alveo cards and Emergent’s own FPGA-based NICs, which add functionality beyond standard NICs, we pioneered the use of Mellanox NICs with performance camera systems. We now also support equivalent Broadcom NICs to address cost-sensitive markets. 100GigE, 25GigE, and 10GigE are all supported, and 5GigE and lower are also supported to provide the same performance benefits for medium to high camera count systems through the use of low-cost switches. With the high-performance processing options available in GPUs and FPGA cards, one can create the lowest-cost and highest-density system in the industry.

GigE Vision Implementation

We will now take a deeper dive to understand what one should look for in a performance GigE Vision implementation. This short animation illustrates the process of splitting GigE Vision network packets into images. Headers, Leaders, and Trailers are consumed by a control process while the image portions end up in a contiguous memory buffer. When software is used for this process, the whole packet is written to memory, and then the image portions must be read out of memory and written back to another memory location in a non-fragmented (contiguous) manner. Done in software, this costs 3x the memory bandwidth; done by the card’s header-splitting features, it gives optimal performance. Conventional GigE Vision and TCP are both examples of the low-performance process. And don’t be fooled by the guaranteed-transfer claims of some TCP implementers, which simply mean that if you receive a frame, that frame will be complete and free of corruption. This is not a guarantee that you will not drop frames. In all senses, TCP is a non-starter for performance applications and amounts to little more than marketing noise.
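
The software path described above (whole packets land in memory, then the image portions are copied into a contiguous buffer) can be sketched roughly as follows. This is a minimal illustration only: the 8-byte header layout, `HEADER_LEN`, `make_packet`, and `reassemble` are hypothetical names for a toy format, not the real GVSP packet layout or any Emergent API.

```python
HEADER_LEN = 8  # hypothetical toy header: 4-byte packet id + 4-byte payload length


def make_packet(packet_id: int, payload: bytes) -> bytes:
    """Build a toy packet: header followed by the image payload."""
    return packet_id.to_bytes(4, "big") + len(payload).to_bytes(4, "big") + payload


def reassemble(raw_packets, frame_size: int, payload_size: int) -> bytes:
    """Copy each packet's image bytes into one contiguous frame buffer.

    Every payload byte is touched again here (read from the packet, written
    to the frame), which is the extra memory traffic that header splitting
    on the NIC avoids.
    """
    frame = bytearray(frame_size)
    view = memoryview(frame)
    for raw in raw_packets:
        packet_id = int.from_bytes(raw[0:4], "big")   # 1-based index in the frame
        length = int.from_bytes(raw[4:8], "big")
        offset = (packet_id - 1) * payload_size       # where this chunk belongs
        view[offset:offset + length] = raw[HEADER_LEN:HEADER_LEN + length]
    return bytes(frame)


# Usage: split a 10-byte "image" into 4-byte payloads and deliver out of order.
image = bytes(range(10))
packets = [make_packet(i + 1, image[i * 4:(i + 1) * 4]) for i in range(3)]
restored = reassemble([packets[2], packets[0], packets[1]], len(image), 4)
```

With header splitting, the copy loop disappears: the card places each payload at its final offset directly, so the host never reads the image data back out of a packet buffer.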

This short animation illustrates the triple memory bandwidth usage of a system that does not use a zero-copy (or header-splitting) technology. A system like this can lose data once memory bandwidth is exhausted. Data loss occurs when the buffer in the network card overflows because the CPU and memory cannot accept further transfers. This, incidentally, is the case RDMA proponents use when comparing traditional GigE Vision with RDMA, which is very misleading because it is the worst-case example.
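
To put rough numbers on the 3x figure, here is a back-of-the-envelope sketch. It assumes every link runs at full line rate and ignores header overhead; the function name and the 48-camera, 25 Gbit/s example values are illustrative choices, not measured results.

```python
def memory_traffic_gbytes_per_s(num_cameras: int, link_gbit_per_s: float, passes: int) -> float:
    """Host memory traffic in GB/s for a given number of passes over the data.

    passes=1 models zero-copy (image data written to memory once);
    passes=3 models the software path (write packet, read payload, write frame).
    """
    return num_cameras * link_gbit_per_s * passes / 8  # 8 bits per byte


zero_copy = memory_traffic_gbytes_per_s(48, 25, passes=1)      # 150.0 GB/s
software_copy = memory_traffic_gbytes_per_s(48, 25, passes=3)  # 450.0 GB/s
efficiency = zero_copy / software_copy                         # about 0.33
```

The ratio of the two results is where the roughly 33% efficiency figure for the non-zero-copy path comes from.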

This short animation illustrates the memory bandwidth usage of an optimized GVSP-based system using zero-copy. We see in this animation that data flows freely and reliably thanks to zero-copy and system optimization.
The same approach is also used for the ST2110 streaming protocol in the massive media and entertainment market, where zero loss is also critical. RDMA/RoCEv2 also supports zero-copy transfer, which is its primary benefit. Some will continue to claim THIS is now the guaranteed transfer mechanism, which is again false. At high speed, proper system design and margining are important to create a zero-loss system for any protocol, but we note that zero-copy is the critical first step.

This short animation illustrates the zero-TRANSFER process using GPU Direct which completely bypasses the memory and utilizes only the PCIe end-points of the CPU for 0% memory bandwidth and 0% CPU utilization.

This short animation illustrates the FPGA card process which completely bypasses the memory and CPU for 0% memory bandwidth and 0% CPU utilization since all processing is done on the FPGA card.

This slide highlights the point about multicast technologies. GigE Vision+GVSP is currently the ONLY protocol which supports this fundamental networking feature. Other standards will be quickly dismissed in applications requiring efficient redundancy and distributed processing.

This short animation illustrates how the data from one camera can be sent to multiple devices for parallel processing. A simple use case could be one where a separate system is used for display while another system performs intense calculations.
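
On the receiving side, each device that wants a copy of the stream joins the multicast group. The sketch below uses the standard Python socket API to show the join itself. The group address, port, and function names are hypothetical, the camera is assumed to be already configured to stream to that group, and a plain kernel socket like this is not a zero-copy receive path.

```python
import socket
import struct

GROUP = "239.192.0.1"    # hypothetical multicast group the camera streams to
PORT = 50000             # hypothetical UDP port
LOCAL_IFACE = "0.0.0.0"  # let the OS pick the receiving interface


def build_membership_request(group: str, iface: str = LOCAL_IFACE) -> bytes:
    """Pack the ip_mreq structure expected by IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))


def open_receiver(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Open a UDP socket and join the group so the switch forwards the stream here."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_membership_request(group))
    return sock
```

A display machine and a processing machine would each call `open_receiver()`; the camera still transmits the stream once, and the switch replicates it to every member of the group.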

Convergence of the Interfaces

This slide illustrates how the proposed or ratified changes are converging the interface standards. USB remains mostly the same but is a point-to-point technology. CXP has adopted the Ethernet physical layer, converging towards GigE Vision. GigE Vision+RDMA and GigE Vision+TCP (if and when ratified, perhaps 2 years out) are converging towards CXP and USB as point-to-point technologies. GigE Vision+GVSP will maintain its integrity and feature set and not converge with the other protocols.

Protocol Fragmentation

With the introduction of TCP and RDMA for GigE Vision image transmission, we see that for some camera providers these different protocols are creating a fragmented landscape in which different NICs and drivers will need to be supported across the industry by various third parties. With top performance, Emergent is able to maintain the current protocol across all interface speeds rather than jumping between different protocols and NICs to support your needs.

Resend Technology

Why are packet resends used with TCP and RoCEv2, and what are the implications for jitter and latency? If a system is not well designed and tuned, then no amount of resends will allow it to run in a stable manner, and it will indeed drop packets and frames. In addition, when image transfer bandwidth is slowed by poor system design, buffers fill up. The fullness of the buffer is a measure of the system latency. The filling up and draining of the buffer is a measure of system jitter. Both jitter and latency are important in machine vision systems, especially those needing to make timely decisions, and both are a sign of an unstable system with poor or no safety margin.
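
The relationship between buffer fullness, latency, and jitter can be made concrete with a toy queue model. This is a simplified sketch under stated assumptions (fixed drain rate per tick, arrivals counted in arbitrary units, overflow dropped); `simulate_buffer` and `summarize` are illustrative names, not part of any real driver or SDK.

```python
def simulate_buffer(arrivals, drain_per_tick: int, capacity: int):
    """Track buffer occupancy per tick and count what overflows."""
    occupancy = 0
    dropped = 0
    history = []
    for arriving in arrivals:
        occupancy += arriving
        if occupancy > capacity:          # buffer overflow means lost data
            dropped += occupancy - capacity
            occupancy = capacity
        occupancy = max(0, occupancy - drain_per_tick)
        history.append(occupancy)
    return history, dropped


def summarize(history, drain_per_tick: int):
    """Latency proxy: ticks needed to drain the fullest buffer.
    Jitter proxy: swing between fullest and emptiest occupancy."""
    latency_ticks = max(history) / drain_per_tick
    jitter = max(history) - min(history)
    return latency_ticks, jitter


# A system with margin: the host drains as fast as data arrives.
stable_history, stable_drops = simulate_buffer([10] * 5, drain_per_tick=10, capacity=100)

# A system without margin: the host drains slower than data arrives.
slow_history, slow_drops = simulate_buffer([10] * 4, drain_per_tick=8, capacity=12)
slow_latency, slow_jitter = summarize(slow_history, drain_per_tick=8)
```

In the second case the buffer fills, latency and jitter become non-zero, and data is eventually dropped; resending the dropped data only adds more arrivals to a queue that is already not draining fast enough.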

FPGA options RDMA/RoCEv2 vs GVSP

What we see is that RoCEv2 requires more FPGA resources to implement, and as such the cost of the FPGA will be higher than for the lightweight, standard, and mature GigE Vision GVSP protocol. This is even more so if a resend buffer is implemented.

We see that, in order to create the smallest and lowest-cost cameras, various manufacturers are not supporting RoCEv2 below 10GigE. This is one way in which Emergent’s new Eros 5GigE cameras will continue to provide true ZERO copy performance, while the others rely on TCP and conventional GVSP processing, which uses 3x the memory bandwidth and so reaches about 33% of the efficiency of the Emergent ZERO copy methods. In multi-camera systems, Emergent will remain unmatched in system density and price-performance metrics.

And remember that, while many vendors state low CPU utilization, it is actually memory bandwidth utilization that is hiding in the background and preventing maximal system performance.

RoCE vs GVSP

This summary captures the current landscape of RDMA/RoCEv2 and GVSP performance implementations with Emergent.

Q&A

1. What will Emergent do if RoCEv2 becomes part of the standard, and is RoCE more flexible than GVSP/Emergent?

RoCEv2 is not a slam dunk. The only benefit after weighing all the data is the zero copy component.

And yet it takes away a fundamental feature, multicasting, because of its point-to-point connected nature, like CXP/USB. No GPU Direct for Windows/NVidia. No NDSPI/Windows Client Intel/Marvell. What was supposed to be a benefit is the comprehensive list of NIC providers that support the protocol….

  • FPGA resources limit it to 10G and higher
  • Jitter/latency with resends/flow control
  • Not standardized
  • Not mature
  • Slow adoption
  • Not backwards compatible with existing 1G/2.5G/5G
  • No PoE cards
  • Designed for large file transfer, not streamed video

We will see where this goes. All products Emergent supplies are RDMA/RoCEv2 ready, and we could adopt this quickly if needed. CXP rushed into things and now must adopt the Ethernet physical layer.

2. Is it reasonable to have many cameras in a single server and how can the server keep up with the processing needs?

Please see our other presentation, where we show as many as 48 25GigE cameras in a single server with 2 GPUs performing H.265 encoding with our eCapture Pro plug-in feature. We will show how other plug-ins such as pattern matching, polarization, and inference can also run with high performance thanks to GPU Direct and high-performance GPUs from NVidia.

3. Which Broadcom NICs do you support?

We support Broadcom Thor-based cards such as the P425G (Quad 10/25) and P2100G (Dual 100G), which have optimal resources.

4. Is multicast technology important in machine vision applications?

Absolutely. Any system that can benefit from redundancy, rapid failover, and distributed processing will benefit from multicast, and that will only become more important with higher-performance camera systems.

5. We had tested a system with 5 1GigE Cameras per workstation and never brought it to a stable level. Seeing your presentation, what is the difference between your approach and others?

Our focus has always been on performance systems regardless of interface speed. This could be a few 100GigE cameras in a single server or many lower-speed cameras in the same single server. In either case, we always provide a zero-copy solution for such performance applications and have over a decade of system performance tuning that we leverage.

6. What is the price of a 48 port 25G switch like you promote?

I won’t provide absolute numbers for this, but I will say that our competitors have been quoted citing prices for these switches that are 3x higher than reality.
In general, the switches can cost as little as a couple of cameras, which is incredible value when working with 8 or more cameras in a system.

Emergent is a partner of NVidia and has been working with NVidia and Mellanox technology since 2015.

About Emergent Vision Technologies

Here is a recap of what Emergent is all about…

  • 10+ Awards for innovation and pioneering the high speed GigE Vision imaging movement
  • 10+ years shipping 10GigE cameras with more than 140 models
  • 5+ years shipping 25GigE cameras with more than 55 models
  • 2+ years shipping 100GigE cameras with more than 16 models
  • Camera technology performance leader
  • Focused on high-speed Ethernet/GigE Vision
  • Focused on enabling the processing of high-speed image data
  • Area scan and Line scan models
  • UV, NIR, Polarized, Color, Mono models for multispectral applications
  • Emergent eSDK for full application flexibility
  • Emergent eCapture Pro for a highly comprehensive software solution
  • Most comprehensive range of product and support for high-speed imaging applications
  • Any speed, any resolution, any cable length
  • Available NOW!

We are a multi-award-winning company with a focus on high-speed GigE Vision products.

We have been shipping products for many years, ranging in speed from 10GigE up to 100GigE.

We have a strong focus on providing end-to-end technologies and support for our customers’ applications.

We can fulfill most application needs.

Lastly, products presented are available now.

Adoption of 10GigE Vision and Higher

Here is a quick snapshot of the adoption of GigE Vision products ranging in speed from 10GigE up to 100GigE. Emergent has shown how top performance can be achieved and has opened up many markets, including machine vision, to the use of such technologies. Some companies are just now leveraging our efforts toward releasing 25G and higher-speed products, but they still have a way to go before releasing ratified, high-performance products.

Figure: Emergent Vision Technologies is the first provider of cameras based on 10GigE, 25GigE, 50GigE, and 100GigE interfaces.