HDG 2D-3D
Table of Contents
- 1. Clone and build hou10ni
- 2. Visualization of the results (on your laptop)
- 3. Step 1: Scalability tests
- 4. Step 2: Impact of the frequency
- 5. Step 3: 2D heterogeneous domain
- 6. Step 4: Dump matrices to files (preparing the next hands-on session)
- 7. Step 5: 3D Domain
- 8. Step 6: 3D domain on two nodes
The pedagogical objectives of the hands-on session are the following.
- Dealing with a supercomputer and being able to manipulate a slurm/mpi environment in a reproducible fashion (through guix, in the same set up as yesterday).
- Being able to compile and link a complex application (hou10ni in our case) on top of a complex software stack (sparse linear solvers, runtime systems, …).
- Executing the application in such a high performance context.
- Analyzing the results and understanding the physical phenomena (related to yesterday class), including visualization (you must have paraview set up on your laptop).
- Studying the impact on your application of important parameters (frequency, order, meshes, …)
- Having advanced control of the underlying sparse linear solver (more will come tomorrow).
- Conducting a performance analysis, including scalability study.
1. Clone and build hou10ni
1.1. Set up
Get connected to plafrim:
ssh plafrim-hpcs
We use the same guix
environment as in the previous hands-on session.
1.2. Clone
Clone the hou10ni
source code:
git clone https://gitlab.inria.fr/fuentes/hou10ni_school_2019.git
cd hou10ni_school_2019
mkdir build
cd build
1.3. Build
Set up the required guix environment and build hou10ni:
guix environment --pure maphys --ad-hoc maphys pastix starpu vim -- /bin/bash --norc
cmake ..
make
Note: If a package is not found (such as starpu, which is provided by guix-hpc but not by guix core), make sure you have enabled the channels with the set up we viewed yesterday. The guix describe command allows you to check whether the three channels you wanted to set up are effectively enabled. The guix pull -l command retrieves the list (the history) of the set ups you have used so far.
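For instance, you can quickly run both checks before building (a minimal sketch; the channels and generations listed will depend on your own set up):
# list the channels currently enabled (you should see the three expected ones)
guix describe
# list the history of your guix pull generations
guix pull -l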
You can test the installation with the following command:
ctest
1.4. Preliminary execution on the front-end node (mistral01)
You can now run on the front-end node (mistral01) as follows:
mpirun -np 2 ./hou10ni_lite.out < param_simple_mumps.txt
mpirun -np 2 ./hou10ni_lite.out < param_simple_maphys.txt
We now exit the guix environment
exit
All the runs in this hands-on session must be performed from within the build directory using the ./hou10ni_lite.out command (paths could be handled differently, but this is not addressed in this hands-on session).
1.5. Interactive execution on a computing node (miriel)
We can now try to perform an interactive execution on a computing node (miriel
in the case of plafrim-hpcs
). Let's first check out whether there are partitions
with available (idle) compute nodes:
sinfo
PARTITION | AVAIL | TIMELIMIT | NODES | STATE | NODELIST |
hpc | up | 8:00:00 | 1 | drain* | miriel007 |
hpc | up | 8:00:00 | 12 | down* | miriel[008-016,019,022,024] |
hpc | up | 8:00:00 | 19 | idle | miriel[001-006,017-018,020-021,023,025-032] |
sirocco | up | 10:00:00 | 1 | drain | sirocco06 |
mistral | up | 3-00:00:00 | 2 | down* | mistral[04,18] |
mistral | up | 3-00:00:00 | 15 | idle | mistral[02-03,05-17] |
In this case, there are 19 miriel
nodes available on the hpc
partition. We
reserve (salloc
) 1 node (-N 1
) on that partition (-p hpc
):
salloc -p hpc -N 1
salloc: Granted job allocation 426500
We can monitor the node that we got with squeue
(restricting the view to our
username with the -u option):
squeue -u $USER
JOBID | PARTITION | NAME | USER | ST | TIME | NODES | NODELIST(REASON) |
426500 | hpc | bash | hpcs-agu | R | 0:50 | 1 | miriel023 |
We can directly connect to that node (where <XXX> is 023 in this example and has to be adapted to the value you obtained):
ssh miriel<XXX>
Let's see whether we can reproduce the previous execution:
cd hou10ni_school_2019/build
guix environment --pure maphys --ad-hoc maphys pastix starpu vim -- /bin/bash --norc
mpirun -np 2 ./hou10ni_lite.out < param_simple_mumps.txt
mpirun -np 2 ./hou10ni_lite.out < param_simple_maphys.txt
You can monitor the CPU and memory usage by opening a new terminal on your laptop:
ssh plafrim-hpcs
ssh miriel<XXX>
top
2. Visualization of the results (on your laptop)
hou10ni
allows you to produce vtu
files which can then be retrieved on your
laptop and viewed with paraview. These files may take up quite a large amount of disk space, especially when running large test cases, and may fill up the disk quota shared between all users.
Important: In the following, turn on vtu output only for a specific, one-off experiment, and turn it off immediately after.
Set the parameter VTU
to true in the param_simple_maphys.txt
file and run
mpirun -np 2 ./hou10ni_lite.out < param_simple_maphys.txt
once again.
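To double-check the current value before and after editing, you can simply search the parameter file (a minimal sketch; we only assume that the parameter appears on a line containing the string VTU):
grep -in "vtu" param_simple_maphys.txt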
From a terminal on your own laptop, move to the directory into which you want to store the results, copy them from plafrim, and visualize them with paraview (we assume you have already installed it):
scp plafrim-hpcs:hou10ni_school_2019/build/FILM/Paramsol*vtu .
scp plafrim-hpcs:hou10ni_school_2019/build/FILM/V*vtu .
paraview V.0001.pvtu
paraview Paramsol.pvtu
We assume here that hou10ni_school_2019 is in your home directory. If this is not the case, modify the path accordingly.
3. Step 1: Scalability tests
Important: Do not forget to disable the output of
vtu
files in the parameter file.
Even more important: How many nodes did you reserve (-N option)? How many cores on those nodes? Do you think you are alone on your compute node (squeue)? You may want to exit the allocation and reserve more tasks (with the -n salloc option) and/or make an exclusive reservation (--exclusive), as sketched below.
3.1. General test
In Fortran, the time can be recorded with:
CALL CPU_TIME(t1)
where t1 is real, of kind dp.
Hence, if you want to time some part of the code, this can be done as follows:
CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)
if (myrank.eq.0) then ! only on the master proc
   CALL CPU_TIME(t1)
end if
! [...]
! part of the code to be timed
! [...]
CALL MPI_BARRIER(MPI_COMM_WORLD,ierr)
if (myrank.eq.0) then ! only on the master proc
   CALL CPU_TIME(t2)
   write(6,*) "Computation time for part 1 is ", t2-t1
end if
In the main program lib/bin/hou10ni_lite.F90, evaluate the time spent for:
- Initializing the problem (reading the data, reading and partitioning the mesh)
- Constructing the global matrix
- Solving the linear system
Check the scalability of the code by running param_simple_mumps.txt
and
param_simple_maphys.txt
on 2, 4, 8, 16 and 24 cores on miriel
.
To see only the relevant lines, you may use the command
mpirun -np 4 ./hou10ni_lite.out < param_simple_maphys.txt | grep "Computation time"
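If you prefer to script the whole scalability campaign, a minimal bash sketch could look like this (same parameter files and core counts as above; adjust as needed):
for np in 2 4 8 16 24; do
  echo "== maphys on $np cores =="
  mpirun -np $np ./hou10ni_lite.out < param_simple_maphys.txt | grep "Computation time"
done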
3.2. Specific test
3.3. Disabling OpenMP
Set the OMP_NUM_THREADS
environment variable to 1 with the command:
export OMP_NUM_THREADS=1
Run the scalability tests once again.
4. Step 2: Impact of the frequency
Enable the output of vtu
files in the parameter file.
With paraview
, compare the solution for \(\omega=0.5\) and \(\omega=5\), using elements of
order 1, 3 and 5. Compare the computation time of the linear solver with mumps
and maphys
.
For \(\omega=5\) only, set the p-adaptivity option equal to 1 and the maximal order
to 10. To see the order of the elements, you can use paraview
and visualize
the degrees in paramsol
. Compare the computation time of the linear solvers
when using mumps
and maphys
.
5. Step 3: 2D heterogeneous domain
Copy the parameter files param_marmousi_mumps.txt
and
param_marmousi_maphys.txt
in your build/
directory:
cp /home/hpcs-readonly/hou10ni-testcase/2D/param_marmousi_mumps.txt .
cp /home/hpcs-readonly/hou10ni-testcase/2D/param_marmousi_maphys.txt .
Compare the results obtained for \(\omega=1\) without p-adaptivity
, \(p=1\) and with
p-adaptivity
with a maximal order of 10. Compare the computational time of
mumps
and maphys
in the latter case.
You can see the various parameters using paraview
on paramsol
.
Compare the computational time for the two linear solvers for ω=1, 2, 5 and 10.
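One possible way to organize these runs (a sketch only; the per-frequency parameter files param_marmousi_*_w1.txt, param_marmousi_*_w2.txt, etc. are hypothetical copies that you would prepare by hand, changing the frequency in each one):
# run both solvers for each frequency and keep only the timing lines
for f in param_marmousi_mumps_w*.txt param_marmousi_maphys_w*.txt; do
  echo "== $f =="
  mpirun -np 8 ./hou10ni_lite.out < "$f" | grep "Computation time"
done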
6. Step 4: Dump matrices to files (preparing the next hands-on session)
In order to prepare the next hands-on session, you can write the matrices into files. We'll use them tomorrow for further testing of sparse linear solvers.
This step is optional, as we will provide pre-dumped matrices on plafrim-hpcs in the /home/hpcs-readonly/matrices/ directory.
The idea here is that you focus on a particular numerical (frequency, order, meshes, …) and computational (number of processes, …) set up on which you would like to perform a more detailed performance analysis tomorrow, depending on the sparse linear solver set up. You can do that with either the 2D test cases above or with some of the 3D test cases below.
We explain below how to dump matrices with hou10ni
. If you have your own
application you want to play with, you can also upload the matrices obtained with
your application on plafrim-hpcs
and we'll try to analyze them tomorrow.
hou10ni
delegates the dump of the matrices to the underlying sparse linear
solver. You can do it with the solver of your choice, maphys
or mumps
.
Important: Do not forget to disable the matrix output once you are done with your target matrices.
6.1. maphys
In your maphys
conf_maphys.in
input file, turn on the dump of matrices by setting:
ICNTL(45)=1
Note that three files will be dumped per process (local matrix, local assembled right hand side and subdomain connectivity). The name of the files cannot be changed, so if you want to keep the matrices, create a new folder and put them inside.
For example, suppose you are working on a test case of your choice (numerical
and computational set up) that you want to refer to as mymatofchoice1.
After you have executed hou10ni
in this set up with dump enabled, you can
perform the following instructions:
mkdir maphys_mymatofchoice1
mv maphys_local_* maphys_mymatofchoice1/
6.2. mumps
In your hou10ni
input file, turn on the dump of matrices by changing the line:
.FALSE. Do you want to output the matrices ? (.FALSE. or .TRUE.)
to:
.TRUE. Do you want to output the matrices ? (.FALSE. or .TRUE.)
With mumps
, dumping the matrices can be done similarly. The matrix will be
stored in files global_mat.txt<pid>
(where pid
is the process id) and the
right-hand side will be stored in a centralized way in global_mat.txt.rhs
.
mkdir mumps_mymatofchoice2
mv global_mat* mumps_mymatofchoice2/
7. Step 5: 3D Domain
Copy the parameter files param_simple_3D_mumps.txt
and
param_simple_3D_maphys.txt
in your build/
directory:
cp /home/hpcs-readonly/hou10ni-testcase/3D/param_simple_3D_mumps.txt .
cp /home/hpcs-readonly/hou10ni-testcase/3D/param_simple_3D_maphys.txt .
Compare the results obtained for ω=1 without p-adaptivity, p=1 and with p-adaptivity with a maximal order of 3 and 5.
Compare the computational time of mumps
and maphys
. Once again, you can
write the matrices to files for the next hands-on session.
8. Step 6: 3D domain on two nodes
Log out from your miriel
node
exit
You may have to exit
twice (first to exit the guix
environment, then the
node).
8.1. Running on two nodes
Create a slurm
batch file hou10ni.batch
in the build
directory
#!/bin/sh
#SBATCH --time=03:59:00
#SBATCH -N 2
#SBATCH -n 48
#SBATCH -p hpc
guix environment --pure maphys --ad-hoc maphys pastix starpu -- /bin/bash --norc
export OMP_NUM_THREADS=1
srun --mpi=pmi2 ./hou10ni_lite.out < param_simple_3D_maphys.txt
Submit the job
sbatch hou10ni.batch
You can check the state of the job with:
squeue
When the job is running, you will see the nodes on which it executes, and you can monitor the job with:
ssh miriel<XXX>
top
8.2. Discussion about the batch file
Have a look at the batch file above. Do you think ./hou10ni_lite.out
is
running in a guix
environment?
If you have a doubt, check out what's going on with the following code:
guix environment --pure --ad-hoc openmpi openssh
mpirun -n 2 ./hou10ni_lite.out < param_simple_mumps.txt
Does it give you a hint?
Read the output of ldd ./hou10ni_lite.out
outside of your guix
environment.
Got it?
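As an extra hint, you can filter the ldd output for libraries resolved from the Guix store (this only assumes the standard /gnu/store prefix used by Guix):
# libraries found through the RPATH recorded in the binary point into the store
ldd ./hou10ni_lite.out | grep "/gnu/store"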
You may want to further read the slurm/mpi/guix tutorial.
8.3. Going further
You may want to prepare batch files to run large test cases on a larger number of nodes. For the largest ones, you may want to submit the jobs only at the end of the session, in order to avoid consuming too many resources while everybody is actively working on the hands-on. Do not produce vtu or matrix outputs for very large test cases.
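As a starting point for such larger runs, a batch file could look like the sketch below (assuming 24 cores per miriel node, as in the two-node example above; the node and task counts are only illustrative, and whether a guix environment line is needed at run time is precisely the question discussed in section 8.2):
#!/bin/sh
#SBATCH --time=03:59:00
#SBATCH -N 4
#SBATCH -n 96
#SBATCH -p hpc
#SBATCH --exclusive
export OMP_NUM_THREADS=1
srun --mpi=pmi2 ./hou10ni_lite.out < param_simple_3D_maphys.txt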