Interactive C++ for HPC

  |   Source

I recently discovered cling after I saw this Tweet

From the cling website, "Cling is an interactive C++ interpreter, built on the top of LLVM and Clang libraries. Its advantages over the standard interpreters are that it has command line prompt and uses just-in-time (JIT) compiler for compilation." As I write (or mostly teach through dual-coding with students) quite a bit of C++ and have already been utilizing the IPython/Jupyter notebook as a teaching tool for sometime, I was immediately interested in using MinRK's clingkernel for this purpose as well as exploring and testing out my own ideas. As I learned more about cling, I found most of the examples to be very trivial and those that inluded the calling/loading of external functions typically only utilized the functions from the standard library. I was eager to give it try on something a little more complex, and because most of the code we write in my group is heavily dependent on Trilinos, I thought I would attempt to solve a simple linear algebra system of the form

$$ A \vec{x} = \vec{b} $$

using AztecOO solvers with Epetra datastructures from Trilinos (at least AztecOO and Epetra). This is an adaptation of the example code provided in Figure 1 of the AztecOO User's guide.

Of course, to reproduce the code below, you will need to install cling and Trilinos. Additionally, if you want to use MinRK's clingkernel you need to install the Jupyter notebook from the master branch, and install the kernel. With that, you should be able to download this notebook and execute it with few modifications.

Cling provides a few metaprocessing commands to perform actions that you would normal provide as arguments to the build system, such as providing locations for include files, loading shared libraries, etc. The documentation for the metaprocessing commands are

.x test.cpp – load a file (if the file defines void test() it will be executed)
.L libname – load a libary or file
.x filename.cxx – loads filename and calls void filename() if defined
.I path – adds an include path
.printAST – shows the abstract syntax tree after each processed entity
.q – exit cling
.help – display a brief description

We'll start by loading the location of the installed header files in my Trilinos build (here I only built AztecOO and Epetra in serial to keep it simple). Cling utilizes the Clang compiler which is pretty picky, so I had to edit the header files in a few locations where #include statments where provided with angle brackets < > for local includes, where the compiler wanted double quotes. In other words, I had to change

#include <foo.h>


#include "foo.h"

in a few of the Trilinos headers. So the metaprocessing command to tell the system where the header files are:

In [1]:
.I /usr/local/trilinos-serial/include/

And now we provide the include statememts.

In [2]:
#include "AztecOO.h"

In [3]:
#include "AztecOO_Version.h"
#include "Epetra_SerialComm.h"
#include "Epetra_Map.h"
#include "Epetra_Vector.h"
#include "Epetra_CrsMatrix.h"

Now the metaprocessing to load the shared libraries. Since I'm on a Mac, I had to define the environment variable

export DYLD_LIBRARY_PATH=/usr/local/trilinos-serial/lib:$DYLD_LIBRARY_PATH

before launching the notebook/cling so the system knows where to find the libraries at runtime. On a Linux machine, this variable would be LD_LIBRARY_PATH.

In [4]:
.L libepetra
.L libaztecOO

Now we start in with the actual Epetra/AztecOO code. Since it's not apparent to me whether it's possible for cling to support MPI processes, we'll just provide a serial implementation. We start with some code that instantiates the Epetra serial communicator and provides some information about the version of Trilinos we are using.

In [5]:
Epetra_SerialComm Comm;
int NumMyElements = 1000;

if (Comm.MyPID()==0)
    std::cout << AztecOO_Version() << std::endl << std::endl;

  std::cout << Comm <<std::endl;
AztecOO in Trilinos 12.1 (Dev)

Epetra::Comm::Processor 0 of 1 total processors.

Notice the immediate output to screen. Remember this is C++!

Now we construct a Map that puts same number of equations on each processor. Of course, we only have one processor here, but the code is generic and could also be run in parallel if multiple processors were available.

In [6]:
Epetra_Map Map(-1, NumMyElements, 0, Comm);
int NumGlobalElements = Map.NumGlobalElements();

Now we instantiate the column-row sparse matrix $A$

In [7]:
Epetra_CrsMatrix A(Copy, Map, 3);

Here we fill $A$ to be a finite-difference like operator.

In [8]:
double negOne = -1.0;
double posTwo = 2.0;
for (int i=0; i<NumMyElements; i++) {
    int GlobalRow = A.GRID(i); int RowLess1 = GlobalRow - 1; int RowPlus1 = GlobalRow + 1;

    if (RowLess1!=-1) A.InsertGlobalValues(GlobalRow, 1, &negOne, &RowLess1);
    if (RowPlus1!=NumGlobalElements) A.InsertGlobalValues(GlobalRow, 1, &negOne, &RowPlus1);
    A.InsertGlobalValues(GlobalRow, 1, &posTwo, &GlobalRow);

Now we call the FillComplete() method to optimize storage of the matrix.

In [9]:

Here we instantiate $\vec{x}$ and the right-hand-side vector $\vec{b}$. Also, we fill $\vec{b}$ with random numbers for our solution.

In [10]:
Epetra_Vector x(Map);
Epetra_Vector b(Map);


Now we instantiate the Epetra_Problem and the Aztec solver from that instance.

In [11]:
Epetra_LinearProblem problem(&A, &x, &b);

AztecOO solver(problem);

Next is the actual solution step. A GMRES solver is used for 10 iterations. This won't be enough for convergence of the solver, but this is just for demonstration purposes and I would like to keep the output short.

In [12]:
solver.SetAztecOption(AZ_precond, AZ_Jacobi);
solver.Iterate(10, 1.0E-2);

std::cout << "Solver performed " << solver.NumIters() << " iterations." << std::endl;
std::cout << "Norm of true residual = " << solver.TrueResidual() << std::endl;
		***** Problem: Epetra::CrsMatrix
		***** Preconditioned GMRES solution
		***** 1 step block Jacobi
		***** No scaling

                iter:    0           residual = 1.000000e+00
                iter:    1           residual = 5.832906e-01
                iter:    2           residual = 4.547092e-01
                iter:    3           residual = 3.831399e-01
                iter:    4           residual = 3.366700e-01
                iter:    5           residual = 3.043737e-01
                iter:    6           residual = 2.789514e-01
                iter:    7           residual = 2.592427e-01
                iter:    8           residual = 2.449521e-01
                iter:    9           residual = 2.328859e-01
                iter:   10           residual = 2.223153e-01


	Warning: maximum number of iterations exceeded
	without convergence

	Solver:			gmres
	number of iterations:	10

	Recursive residual =  3.9447e+00

	Calculated Norms				Requested Norm
	--------------------------------------------	--------------

	||r||_2 / ||r0||_2:		2.223153e-01	1.000000e-02


		Solution time: 0.000680 (sec.)
		total iterations: 10

Solver performed 10 iterations.
Norm of true residual = 3.94466

To demonstrate that is worked, we'll print out the first 10 values of the solution vector $\vec{x}$.

In [13]:
for (int i = 0; i < 10; i++){
    std::cout << x[i] << std::endl; 

It's pretty amazing to be able to do interactive C++ developement. If we could get this to work in parallel, it would be a leap forward in the ability to do introspective high-performance scientific computing.

Comments powered by Disqus