==================
Tag matching logic
==================

The MPI standard defines a set of rules, known as tag-matching, for matching
source send operations to destination receives. The following parameters must
match between the source and the destination:

* Communicator
* User tag - wild card may be specified by the receiver
* Source rank - wild card may be specified by the receiver
* Destination rank - wild

The ordering rules require that when more than one pair of send and receive
message envelopes may match, the pair that includes the earliest posted-send
and the earliest posted-receive is the pair that must be used to satisfy the
matching operation. However, this doesn't imply that tags are consumed in
the order they are created, e.g., a later generated tag may be consumed if
earlier tags can't be used to satisfy the matching rules.

When a message is sent from the sender to the receiver, the communication
library may attempt to process the operation either after or before the
corresponding matching receive is posted. If a matching receive is already
posted, this is an expected message, otherwise it is called an unexpected
message. Implementations frequently use different matching schemes for these
two matching instances.

To keep the MPI library memory footprint down, MPI implementations typically
use two different protocols for this purpose:

1. The Eager protocol - the complete message is sent when the send is
   processed by the sender. A send completion is received in the send_cq
   notifying that the buffer can be reused.

2. The Rendezvous protocol - the sender sends the tag-matching header,
   and perhaps a portion of data, when first notifying the receiver. When the
   corresponding buffer is posted, the responder will use the information from
   the header to initiate an RDMA READ operation directly to the matching
   buffer. A fin message needs to be received in order for the buffer to be
   reused.

Tag matching implementation
===========================

There are two types of matching objects used, the posted receive list and the
unexpected message list. The application posts receive buffers to the posted
receive list through calls to the MPI receive routines, and posts send
messages using the MPI send routines. The head of the posted receive list may
be maintained by the hardware, with the software expected to shadow this list.

When a send is initiated and arrives at the receive side, if there is no
pre-posted receive for this arriving message, it is passed to the software and
placed in the unexpected message list. Otherwise the match is processed,
including rendezvous processing, if appropriate, delivering the data to the
specified receive buffer. This allows overlapping receive-side MPI tag
matching with computation.

When a receive-message is posted, the communication library will first check
the software unexpected message list for a matching message. If a match is
found, data is delivered to the user buffer, using a software controlled
protocol. The UCX implementation uses either an eager or rendezvous protocol,
depending on data size. If no match is found, and the entire pre-posted
receive list is maintained by the hardware with space to add one more
pre-posted receive to this list, this receive is passed to the hardware.
Software is expected to shadow this list, to help with processing MPI cancel
operations. In addition, because hardware and software are not expected to be
tightly synchronized with respect to the tag-matching operation, this shadow
list is used to detect the case where a pre-posted receive is passed to the
hardware while the matching unexpected message is being passed from the
hardware to the software.
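
The interaction between the posted receive list and the unexpected message
list can be illustrated with a minimal software-only sketch. It is not taken
from any MPI or UCX implementation: the names ``Envelope``, ``Receive``,
``Matcher``, ``ANY_SOURCE`` and ``ANY_TAG`` are hypothetical, and hardware
offload, the shadow list, cancel handling and the eager/rendezvous choice are
deliberately left out. It only shows wild card matching and the
earliest-first ordering rule.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical wild card markers, standing in for MPI_ANY_SOURCE / MPI_ANY_TAG.
ANY_SOURCE = -1
ANY_TAG = -1


@dataclass
class Envelope:
    """An incoming message: matching fields plus payload (illustrative)."""
    comm: int
    source: int
    tag: int
    payload: bytes = b""


@dataclass
class Receive:
    """A posted receive; source and tag may be wild cards (illustrative)."""
    comm: int
    source: int
    tag: int
    data: Optional[bytes] = None

    def matches(self, msg: Envelope) -> bool:
        # The communicator must match exactly; source and tag match either
        # exactly or through a receiver-side wild card.
        return (self.comm == msg.comm
                and self.source in (ANY_SOURCE, msg.source)
                and self.tag in (ANY_TAG, msg.tag))


class Matcher:
    """Software-only model of the two matching lists (an assumption)."""

    def __init__(self) -> None:
        self.posted: List[Receive] = []       # posted receives, oldest first
        self.unexpected: List[Envelope] = []  # unexpected messages, oldest first

    def post_receive(self, recv: Receive) -> Receive:
        # A new receive first scans the unexpected list, oldest message first.
        for i, msg in enumerate(self.unexpected):
            if recv.matches(msg):
                del self.unexpected[i]
                recv.data = msg.payload
                return recv
        # No match: the receive waits in the posted receive list.
        self.posted.append(recv)
        return recv

    def message_arrived(self, msg: Envelope) -> Optional[Receive]:
        # An arriving message scans the posted list, oldest receive first.
        for i, recv in enumerate(self.posted):
            if recv.matches(msg):
                del self.posted[i]
                recv.data = msg.payload
                return recv
        # No pre-posted receive: the message becomes unexpected.
        self.unexpected.append(msg)
        return None


m = Matcher()
first = m.post_receive(Receive(comm=0, source=ANY_SOURCE, tag=7))
second = m.post_receive(Receive(comm=0, source=3, tag=ANY_TAG))

# Tag 9 cannot satisfy the earlier receive (tag 7), so the later one is used.
assert m.message_arrived(Envelope(0, 3, 9, b"late")) is second

# Nothing posted matches tag 5, so this message is queued as unexpected ...
assert m.message_arrived(Envelope(0, 1, 5, b"early")) is None
# ... and is delivered as soon as a matching receive is posted.
assert m.post_receive(Receive(0, 1, 5)).data == b"early"
```

The demo at the end mirrors the ordering remark above: a later posted receive
may be consumed before an earlier one when the earlier one cannot match the
arriving message.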