Nagle's algorithm, which is a feature of all modern TCP/IP stacks, defers the transmission of small (less than Maximum Segment Size (AKA MSS)) segments in an effort to amortize the header overhead per application-layer data byte whenever there is unacknowledged data. This can result in data waiting for the Delay ACK timer (100 milliseconds in ONTAP) before actually being transmitted, thereby resulting in substantial latency introduced for a large fraction of small segments.
When the application-layer traffic pattern is unidirectional with a large fraction of segments being less than MSS, this can result in dramatically lower throughput for the flow. This can be identified in a packet capture by observing a large proportion in any TCP stream of tcp.analysis.ack_rtt values above the Delay ACK timer value (100 ms in ONTAP).