RPi 4 Cluster - Part 3
In Part 2, I stopped at “Configuring the Master Node.” Because of an error I made in the file /etc/slurm-llnl/cgroup.conf, Slurm did not load correctly, and the NFS share no longer mounted on any of the nodes, even though I had it working before. It took a while to run that down. The clue came when I attempted to test Slurm (Garrett’s Part I, Step 7): it couldn’t contact any resource. Checking the service status pointed to an error in the file, and correcting it allowed the Slurm test to succeed. Another indicator that things are not right shows up during the munge test (paragraph 6.4.2): the “ENCODE_HOST” line would show the node the test was executed on, not node 1. But I’m getting a bit ahead of myself.
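For reference, here is roughly how I checked things when Slurm wouldn’t start (a sketch of the diagnostic commands; “cl02” stands in for one of your compute nodes, and on the master the controller service is slurmctld):

```
# Munge cross-node check: encode a credential locally, decode it remotely.
# ENCODE_HOST in the output should name the node the credential was
# encoded on (the node you ran "munge" from).
munge -n | ssh cl02 unmunge

# When Slurm fails to load, the service status and journal usually
# point straight at the offending config file and line:
sudo systemctl status slurmctld
sudo journalctl -u slurmctld --no-pager | tail -n 20
```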
In Garrett’s reference, Part 1: The Basics, step 5.3.5, the partition-name line should be edited. Specifically, “Nodes=node[02-04]” should use YOUR node name, i.e. <your_node_name>[02-04], not node[02-04].
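As a sketch, using my “cl” hostname prefix and made-up resource values (substitute your own node names, counts, and memory), the relevant lines in /etc/slurm-llnl/slurm.conf would look something like:

```
NodeName=cl[02-04] CPUs=4 RealMemory=3072 State=UNKNOWN
PartitionName=mycluster Nodes=cl[02-04] Default=YES MaxTime=INFINITE State=UP
```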
The rest of Part 1: The Basics went smoothly after my errors were corrected. I had only one issue with Part II: Some Simple Jobs. In Section 2, paragraph 2.c, line 5 of the generate.R program states color="darkred", where it should read col="darkred". The same issue appears in the line-by-line explanation of “hist.”
Note: If you are running a desktop, you can use “gpicview” to view the generated plots.
Referring back to Part 1 of this series and the cases I printed (the Pi 4 Stacking Case by Stephen Jogerst): while running the R program, I noticed the node I used was hovering around 80°C. That is too hot, and it indicates a lack of airflow through the stacking cases. I may have to find another solution that allows a fan mount for each node. I monitored the temperature by opening a terminal for that node as follows:
ssh pi@cl02
watch -n 5 vcgencmd measure_temp
The “-n 5” sets the update interval to five seconds.
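If you want the reading as a bare number (say, for logging or triggering an alert), the vcgencmd output is easy to strip with shell parameter expansion. A minimal sketch, using a sample string in place of the live call:

```shell
#!/bin/sh
# vcgencmd measure_temp prints a line like: temp=80.5'C
# (sample string below; on the Pi itself you would use:
#   raw=$(vcgencmd measure_temp)
# )
raw="temp=80.5'C"

temp=${raw#temp=}    # drop the leading "temp="
temp=${temp%\'C}     # drop the trailing "'C"
echo "$temp"         # prints 80.5
```

From there, a comparison such as `[ "${temp%.*}" -ge 80 ]` (integer part against a threshold) could drive a warning or a shutdown.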
In Part 4 of this series, I will start on Part III: OpenMPI, Python, and Parallel Jobs and post my results here. Until next time…