A Processing Pipeline for Large-Scale Data Streams at FAST

Zhu Yongxin

We propose a DPDK-based technology for accessing massive astronomical data, which addresses the problem of transmitting the large radio-astronomy data stream produced by FAST. This part of the research accomplishes three tasks. (1) FAST transmits its data over UDP, and the stream has a fixed bandwidth, high throughput and cannot be reproduced once lost. Given these properties, this thesis formally demonstrates that protocol-level optimizations such as flow control, congestion control, acknowledgement mechanisms and retransmission of lost packets are not suitable for FAST's transmission scenario. (2) To address the difficulties and bottlenecks of traditional data reception and processing, DPDK is used to reduce link-access overhead and packet loss on both the software and hardware sides, covering the CPU, cache, memory and system architecture. (3) A custom user-space protocol stack is designed for FAST to carry out data parsing, removing the overhead of the kernel network stack and thereby further improving performance and reducing packet loss.

We also propose a collaborative writing approach based on hybrid memory, which addresses the inability of traditional storage media to overcome the storage bottleneck posed by massive astronomical data streams. This part of the research accomplishes two main tasks. (1) Building on the access technology described above, the write interface of the traditional astronomical software PSRDADA is rewritten so that, once the network bottleneck has been removed, data can be stored at higher bandwidth. (2) Following the principles of data tiering and storage-media tiering, Intel Optane persistent memory is selected to fill the performance gap between DRAM and hard disks.
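The abstract describes the custom user-space stack only at a high level. As a minimal sketch, assuming untagged Ethernet II frames that carry unfragmented IPv4/UDP packets, the parsing step reduces to reading fixed header offsets directly from the receive buffer without copying the payload. The names `parse_udp_frame` and `struct udp_view` are hypothetical and do not come from the thesis; in a DPDK application the frame pointer would typically be obtained from an `rte_mbuf` with the `rte_pktmbuf_mtod` macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: a zero-copy view of one UDP datagram. */
struct udp_view {
    const uint8_t *payload;  /* points into the original frame buffer */
    size_t payload_len;      /* UDP payload length in bytes */
    uint16_t dst_port;       /* UDP destination port (host byte order) */
};

/* Read a 16-bit big-endian (network byte order) field. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical parser: decode an untagged Ethernet II / IPv4 / UDP frame
   in place. Returns 0 on success, -1 if the frame is not IPv4/UDP or is
   truncated. Checksums, VLAN tags and IP fragments are not handled. */
static int parse_udp_frame(const uint8_t *frame, size_t len, struct udp_view *out)
{
    const size_t eth_len = 14;               /* dst MAC + src MAC + EtherType */
    if (len < eth_len + 20 + 8)              /* minimal IPv4 + UDP headers */
        return -1;
    if (be16(frame + 12) != 0x0800)          /* EtherType must be IPv4 */
        return -1;

    const uint8_t *ip = frame + eth_len;
    if ((ip[0] >> 4) != 4)                   /* IP version field */
        return -1;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4; /* IPv4 header length in bytes */
    if (ihl < 20 || len < eth_len + ihl + 8)
        return -1;
    if (ip[9] != 17)                         /* protocol number 17 = UDP */
        return -1;

    const uint8_t *udp = ip + ihl;
    size_t udp_len = be16(udp + 4);          /* UDP header + payload length */
    if (udp_len < 8 || eth_len + ihl + udp_len > len)
        return -1;

    out->dst_port = be16(udp + 2);
    out->payload = udp + 8;
    out->payload_len = udp_len - 8;
    return 0;
}
```

Because `payload` aliases the receive buffer, the datagram can be handed to the storage stage without an extra memory copy, which is one of the kernel-stack costs a user-space design aims to avoid. A production parser for FAST would also need to match its actual packet format and header layout, which this sketch does not assume.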