PVM provides a de facto standard for parallel programming in cluster-of-workstations environments. PVM communication performance over the 640 Mbps ATOMIC LAN has been dramatically improved by using the ATOMIC user-level protocol.
We measured baseline PVM 3.3 communication performance over the ATOMIC LAN at 31 Mbps throughput in regular mode (through the pvmd) and 65 Mbps throughput in direct mode (over kernel TCP).
Our improved implementation separates PVM control messages from PVM data transfer: control messages take the regular mode (through the pvmd over kernel UDP), while fast data transfer uses a user-level reliable data protocol (ATP) over the Myrinet API. This combined approach achieved 164 Mbps PVM point-to-point throughput over ATP.
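The split between the control path and the data path can be pictured as a small routing decision. The sketch below is illustrative only: the names Path, Message, and choose_path are hypothetical and are not part of PVM or the Myrinet API. It simply encodes the rule stated above, that control messages go through the pvmd over UDP and data goes over ATP.

```python
from dataclasses import dataclass
from enum import Enum


class Path(Enum):
    """The two transports described in the text (labels are illustrative)."""
    PVMD_UDP = "regular mode: through the pvmd over kernel UDP"
    ATP = "fast path: user-level ATP over the Myrinet API"


@dataclass
class Message:
    is_control: bool  # True for PVM control traffic, False for user data
    payload: bytes


def choose_path(msg: Message) -> Path:
    """Hypothetical helper: route control via the daemon, data via ATP."""
    return Path.PVMD_UDP if msg.is_control else Path.ATP
```

Keeping control traffic on the existing daemon path means only the bulk data path has to change to gain the higher throughput.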
We also developed the ATOMIC Transport Protocol (ATP), which provides sequenced, reliable data transfer at the user level. It communicates with the ATOMIC Myrinet host interface through the user-level Myrinet API and interleaves memory copies with DMA to reduce memory-copy overhead.
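To see why interleaving memory copies with DMA helps, it can be modeled as a two-stage pipeline over fixed-size segments. The sketch below is a timing model under stated assumptions, not the ATP implementation: the function names are hypothetical, the per-segment costs copy_us and dma_us are made-up parameters, and the default segment size is the 8K network MTU quoted in the performance notes.

```python
import math

MTU = 8 * 1024  # segment size in bytes (the 8K network MTU quoted in the text)


def segment_count(packet_bytes, mtu=MTU):
    """Number of MTU-sized segments needed to carry one packet."""
    return max(1, math.ceil(packet_bytes / mtu))


def sequential_time(n_segments, copy_us, dma_us):
    """Copy a segment, then DMA it, one segment after another."""
    return n_segments * (copy_us + dma_us)


def interleaved_time(n_segments, copy_us, dma_us):
    """Two-stage pipeline: copy segment i+1 while segment i is in DMA.

    The first copy and the last DMA cannot overlap with anything; every
    step in between costs only the slower of the two stages.
    """
    return copy_us + (n_segments - 1) * max(copy_us, dma_us) + dma_us
```

With hypothetical costs of 100 microseconds per copy and 150 microseconds per DMA, a single-segment packet takes 250 microseconds either way, while a four-segment packet drops from 1000 to 700 microseconds in this model. The benefit appears only once a packet spans more than one segment.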
The ATP software is available here. The following describes its performance.
Experiment configuration: Measured at memory-to-memory level between two Sun SPARC 20/71 stations interconnected through an 8-port Myrinet switch. Device driver version: Myrinet-2.0c.
ATP peak throughput: 164 Mbps. ATP is slower than kernel TCP for packets smaller than 16K bytes because ATP uses blocking sends (and preserves packet boundaries): an ATP send does not return until the packet has been ACKed. TCP uses non-blocking sends (in stream fashion), so multiple outstanding TCP sends form a pipeline. Kernel TCP is slower than ATP for packets larger than 16K bytes because ATP interleaves DMA with memcpy when transferring segments, each of which is the size of the network MTU (8K). The interleaving starts to take effect when the packet size exceeds the network MTU.
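The small-packet behavior can be illustrated with a simple throughput model. This is a sketch under stated assumptions, not a measurement: link_mbps and ack_delay_us are hypothetical parameters, window is the number of sends allowed in flight, and real ATP and TCP costs (per-packet software overhead, copies, interrupts) are ignored.

```python
def blocking_throughput_mbps(packet_bytes, link_mbps, ack_delay_us):
    """Throughput when each send waits for its ACK before the next starts."""
    bits = packet_bytes * 8
    wire_us = bits / link_mbps  # 1 Mbps is 1 bit per microsecond
    return bits / (wire_us + ack_delay_us)


def pipelined_throughput_mbps(packet_bytes, link_mbps, ack_delay_us, window):
    """Throughput with up to `window` sends outstanding, capped by the link rate."""
    per_send = blocking_throughput_mbps(packet_bytes, link_mbps, ack_delay_us)
    return min(link_mbps, window * per_send)
```

In this model the fixed ACK wait is paid once per packet by a blocking sender, so its throughput rises with packet size, while a sender with several packets in flight hides the wait and approaches the link rate even for small packets.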