Peridigm tutorial

I recently gave a short introduction to Peridigm, the massively parallel computational peridynamics code. It briefly covers the history of Peridigm and offers tips on building, running, extending, and learning to use the code.

Debugging C/C++ libraries called by Python

I've been debugging a C++ library that I wrote and am calling from Python via a CFFI interface. After an hour or two of debugging with print statements without resolution, I decided to figure out how to use a proper debugger. I'm very comfortable on the command line and have used gdb in this way often in the past, so I'm happy I found an easy solution. The instructions that follow should work for any C/C++ library built with debug flags enabled (i.e. the -g compiler flag) that is called from Python. While I am using CFFI, they should also work for ctypes or SWIG interfaces. Also, the commands below are specific to the lldb debugger, but they could be easily translated to gdb.
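
For concreteness, here is a minimal sketch of the kind of setup described above; the build command, the library name libmylib.so, and the compute() function are hypothetical placeholders. The important part is that the shared library is compiled with the -g flag.

# myscript.py -- a hypothetical, minimal CFFI (ABI-mode) wrapper around a C++
# library built with debug symbols, e.g.:
#
#   g++ -g -fPIC -shared myfile.cpp -o libmylib.so
#
# The exported function must be declared extern "C" in myfile.cpp so its
# symbol name is not mangled.
from cffi import FFI

ffi = FFI()
ffi.cdef("int compute(int n);")    # C declaration of the exported function
lib = ffi.dlopen("./libmylib.so")  # load the debug-built shared library

if __name__ == "__main__":
    # a breakpoint set in lldb inside myfile.cpp will trigger on this call
    print(lib.compute(10))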

First, in two separate terminal windows, launch ipython and lldb. At the IPython prompt, type:

In [1]: !ps aux | grep -i ipython

and look for the process ID (pid) of the IPython session. Let's assume the process ID in our case is 1234. Go to the lldb command prompt and attach to the IPython session with

(lldb) attach --pid 1234

followed by

(lldb) continue

to release the process. Now you're ready to set breakpoints, etc. For example, to set a breakpoint on line 400 of myfile.cpp we would type

(lldb) breakpoint set -f myfile.cpp -l 400

Assuming myfile.cpp is compiled into a library that is called by the Python script/module myscript.py, we can now run the Python code in the IPython terminal

In [2]: run myscript.py

Back in the lldb terminal we should see our code stopped on line 400. Debug away...

P.S. I found the bug after 5 minutes in a proper debugger...

git and Github tutorial

I recently gave an ad hoc tutorial to a group of students on git and Github. It covers introductory commands and common use cases such as branching, merging, and common workflows that utilize Github. I screencasted the tutorial and recorded the live audio which can be viewed below.

Extracting Exodus information with netcdf-python

In an earlier post I demonstrated how to use a set of ctypes wrappers of Sandia's Exodus database format, i.e. the exodus.py script that is distributed with Trilinos, to extract information from an Exodus database. Exodus is a finite element format library that is built upon NetCDF4 and HDF5. While the use of exodus.py makes for a straightforward and intuitive interface to the database, it requires that Trilinos (at least the SEACAS packages) be installed as shared libraries with an $ACCESS environment variable in your $PATH pointing to the library root. This setup is cumbersome if you don't have any other reason to have Trilinos installed. Thankfully, since Exodus files are NetCDF files, which in turn are HDF files, we can use other tools to access the data. One such tool, mature and available in the Python Package Index, is netcdf4-python. It can easily be installed via pip, i.e.

pip install netcdf4

or, if you're using the popular Anaconda Python distribution, then

conda install netcdf4

should do the trick. This is all we need to have installed to read from an Exodus database. The simple 2D finite element mesh I will be using in the following example can be downloaded here.

First we'll import the netCDF4 module and of course Numpy.

In [1]:
import netCDF4
import numpy as np

To open the Exodus file for reading we use the following command.

In [2]:
nc = netCDF4.Dataset('../files/s_curve.g')

Here we are using a "Genesis" file, which is an Exodus file used for finite element model input. It typically contains information about the nodal coordinates, mesh connectivity, nodesets for boundary condition application, etc. These files are commonly created with the mesh generation tool Cubit. We could access the output variant of an Exodus file (i.e. a *.e file) in the same manner.

Unless you take a lot of time carefully naming all your variables, nodesets, blocks, etc. during the creation of the database, the variables will be given default names, which we can use to access the data. The default names are intuitive enough that, with a little investigation of their properties, you can usually figure out what they refer to. To see all of the information in the database we can simply

In [3]:
print nc
<type 'netCDF4._netCDF4.Dataset'>
root group (NETCDF3_64BIT data model, file format NETCDF3):
    api_version: 4.98
    version: 4.98
    floating_point_word_size: 8
    file_size: 1
    title: cubit(/Users/john/Desktop/s_curve.g): 07/20/2015: 14:53:37
    dimensions(sizes): len_string(33), len_line(81), four(4), time_step(0), num_dim(3), num_nodes(468), num_elem(437), num_el_blk(1), num_qa_rec(1), num_node_sets(5), num_nod_ns1(21), num_nod_ns2(15), num_nod_ns3(17), num_nod_ns4(15), num_nod_ns5(17), num_el_in_blk1(437), num_nod_per_el1(4), num_att_in_blk1(1)
    variables(dimensions): float64 time_whole(time_step), |S1 qa_records(num_qa_rec,four,len_string), |S1 coor_names(num_dim,len_string), |S1 eb_names(num_el_blk,len_string), int32 ns_status(num_node_sets), int32 ns_prop1(num_node_sets), |S1 ns_names(num_node_sets,len_string), int32 node_ns1(num_nod_ns1), float64 dist_fact_ns1(num_nod_ns1), int32 node_ns2(num_nod_ns2), float64 dist_fact_ns2(num_nod_ns2), int32 node_ns3(num_nod_ns3), float64 dist_fact_ns3(num_nod_ns3), int32 node_ns4(num_nod_ns4), float64 dist_fact_ns4(num_nod_ns4), int32 node_ns5(num_nod_ns5), float64 dist_fact_ns5(num_nod_ns5), int32 elem_map(num_elem), int32 eb_status(num_el_blk), int32 eb_prop1(num_el_blk), float64 attrib1(num_el_in_blk1,num_att_in_blk1), int32 connect1(num_el_in_blk1,num_nod_per_el1), float64 coordx(num_nodes), float64 coordy(num_nodes), float64 coordz(num_nodes)
    groups: 

Or to get even more information about the variables

In [4]:
nc.variables
Out[4]:
OrderedDict([(u'time_whole', <type 'netCDF4._netCDF4.Variable'>
float64 time_whole(time_step)
unlimited dimensions: time_step
current shape = (0,)
filling off
), (u'qa_records', <type 'netCDF4._netCDF4.Variable'>
|S1 qa_records(num_qa_rec, four, len_string)
unlimited dimensions: 
current shape = (1, 4, 33)
filling off
), (u'coor_names', <type 'netCDF4._netCDF4.Variable'>
|S1 coor_names(num_dim, len_string)
unlimited dimensions: 
current shape = (3, 33)
filling off
), (u'eb_names', <type 'netCDF4._netCDF4.Variable'>
|S1 eb_names(num_el_blk, len_string)
unlimited dimensions: 
current shape = (1, 33)
filling off
), (u'ns_status', <type 'netCDF4._netCDF4.Variable'>
int32 ns_status(num_node_sets)
unlimited dimensions: 
current shape = (5,)
filling off
), (u'ns_prop1', <type 'netCDF4._netCDF4.Variable'>
int32 ns_prop1(num_node_sets)
    name: ID
unlimited dimensions: 
current shape = (5,)
filling off
), (u'ns_names', <type 'netCDF4._netCDF4.Variable'>
|S1 ns_names(num_node_sets, len_string)
unlimited dimensions: 
current shape = (5, 33)
filling off
), (u'node_ns1', <type 'netCDF4._netCDF4.Variable'>
int32 node_ns1(num_nod_ns1)
unlimited dimensions: 
current shape = (21,)
filling off
), (u'dist_fact_ns1', <type 'netCDF4._netCDF4.Variable'>
float64 dist_fact_ns1(num_nod_ns1)
unlimited dimensions: 
current shape = (21,)
filling off
), (u'node_ns2', <type 'netCDF4._netCDF4.Variable'>
int32 node_ns2(num_nod_ns2)
unlimited dimensions: 
current shape = (15,)
filling off
), (u'dist_fact_ns2', <type 'netCDF4._netCDF4.Variable'>
float64 dist_fact_ns2(num_nod_ns2)
unlimited dimensions: 
current shape = (15,)
filling off
), (u'node_ns3', <type 'netCDF4._netCDF4.Variable'>
int32 node_ns3(num_nod_ns3)
unlimited dimensions: 
current shape = (17,)
filling off
), (u'dist_fact_ns3', <type 'netCDF4._netCDF4.Variable'>
float64 dist_fact_ns3(num_nod_ns3)
unlimited dimensions: 
current shape = (17,)
filling off
), (u'node_ns4', <type 'netCDF4._netCDF4.Variable'>
int32 node_ns4(num_nod_ns4)
unlimited dimensions: 
current shape = (15,)
filling off
), (u'dist_fact_ns4', <type 'netCDF4._netCDF4.Variable'>
float64 dist_fact_ns4(num_nod_ns4)
unlimited dimensions: 
current shape = (15,)
filling off
), (u'node_ns5', <type 'netCDF4._netCDF4.Variable'>
int32 node_ns5(num_nod_ns5)
unlimited dimensions: 
current shape = (17,)
filling off
), (u'dist_fact_ns5', <type 'netCDF4._netCDF4.Variable'>
float64 dist_fact_ns5(num_nod_ns5)
unlimited dimensions: 
current shape = (17,)
filling off
), (u'elem_map', <type 'netCDF4._netCDF4.Variable'>
int32 elem_map(num_elem)
unlimited dimensions: 
current shape = (437,)
filling off
), (u'eb_status', <type 'netCDF4._netCDF4.Variable'>
int32 eb_status(num_el_blk)
unlimited dimensions: 
current shape = (1,)
filling off
), (u'eb_prop1', <type 'netCDF4._netCDF4.Variable'>
int32 eb_prop1(num_el_blk)
    name: ID
unlimited dimensions: 
current shape = (1,)
filling off
), (u'attrib1', <type 'netCDF4._netCDF4.Variable'>
float64 attrib1(num_el_in_blk1, num_att_in_blk1)
unlimited dimensions: 
current shape = (437, 1)
filling off
), (u'connect1', <type 'netCDF4._netCDF4.Variable'>
int32 connect1(num_el_in_blk1, num_nod_per_el1)
    elem_type: SHELL4
unlimited dimensions: 
current shape = (437, 4)
filling off
), (u'coordx', <type 'netCDF4._netCDF4.Variable'>
float64 coordx(num_nodes)
unlimited dimensions: 
current shape = (468,)
filling off
), (u'coordy', <type 'netCDF4._netCDF4.Variable'>
float64 coordy(num_nodes)
unlimited dimensions: 
current shape = (468,)
filling off
), (u'coordz', <type 'netCDF4._netCDF4.Variable'>
float64 coordz(num_nodes)
unlimited dimensions: 
current shape = (468,)
filling off
)])

I won't attempt to decipher all the information in the database, as you can use the Exodus API reference for that. Below, I'll show a short example of how we might do something useful. First we will extract the X and Y nodal coordinate locations and the connectivity array. The first 10 entries of the connectivity array are shown.

In [5]:
X = nc.variables['coordx']
Y = nc.variables['coordy']
connect = nc.variables['connect1']
print connect[0:10]
[[ 1  2  3  4]
 [ 2  5  6  3]
 [ 5  7  8  6]
 [ 7  9 10  8]
 [ 9 11 12 10]
 [11 13 14 12]
 [13 15 16 14]
 [15 17 18 16]
 [17 19 20 18]
 [19 21 22 20]]

Now we'll use the nodal locations and the connectivity array to plot a visualization of the mesh. First, we import the required libraries from matplotlib.

In [6]:
%matplotlib inline
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
import matplotlib.pyplot as plt
import matplotlib

Now we'll create an array that contains the $x,y$ nodal coordinate pairs, then extract, via Numpy fancy indexing, the coordinates of each quadrilateral in the mesh, creating polygons from them for visualization.

In [7]:
xy = np.array([X[:], Y[:]]).T

patches = []
for coords in xy[connect[:]-1]:
    quad = Polygon(coords, True)
    patches.append(quad)

fig, ax = plt.subplots()
colors = 100 * np.random.rand(len(patches))
p = PatchCollection(patches, cmap=matplotlib.cm.coolwarm, alpha=0.4)
p.set_array(np.array(colors))
ax.add_collection(p)
ax.set_xlim([-4, 4])
ax.set_ylim([-4, 4])
ax.set_aspect('equal')
plt.show()

The colors have no meaning other than to give each element a random color. Now we'll close the Exodus file, completing our analysis.

In [8]:
nc.close()

Interactive C++ for HPC

I recently discovered cling after I saw this Tweet

From the cling website, "Cling is an interactive C++ interpreter, built on the top of LLVM and Clang libraries. Its advantages over the standard interpreters are that it has command line prompt and uses just-in-time (JIT) compiler for compilation." Since I write (or mostly teach through dual-coding with students) quite a bit of C++ and have been using the IPython/Jupyter notebook as a teaching tool for some time, I was immediately interested in using MinRK's clingkernel for that purpose, as well as for exploring and testing out my own ideas. As I learned more about cling, I found most of the examples to be very trivial, and those that included the calling/loading of external functions typically only used functions from the standard library. I was eager to give it a try on something a little more complex, and because most of the code we write in my group is heavily dependent on Trilinos, I thought I would attempt to solve a simple linear algebra system of the form

$$ A \vec{x} = \vec{b} $$

using AztecOO solvers with Epetra data structures from Trilinos (i.e. using at least the AztecOO and Epetra packages). This is an adaptation of the example code provided in Figure 1 of the AztecOO User's Guide.

Of course, to reproduce the code below, you will need to install cling and Trilinos. Additionally, if you want to use MinRK's clingkernel you need to install the Jupyter notebook from the master branch, and install the kernel. With that, you should be able to download this notebook and execute it with few modifications.

Cling provides a few metaprocessing commands to perform actions that you would normally provide as arguments to the build system, such as providing locations for include files, loading shared libraries, etc. The documentation for the metaprocessing commands is

.x test.cpp – load a file (if the file defines void test() it will be executed)
.L libname – load a libary or file
.x filename.cxx – loads filename and calls void filename() if defined
.I path – adds an include path
.printAST – shows the abstract syntax tree after each processed entity
.q – exit cling
.help – display a brief description

We'll start by loading the location of the installed header files in my Trilinos build (here I only built AztecOO and Epetra in serial to keep it simple). Cling utilizes the Clang compiler, which is pretty picky, so I had to edit the header files in a few locations where #include statements were written with angle brackets < > for local includes, where the compiler wanted double quotes. In other words, I had to change

#include <foo.h>

to

#include "foo.h"

in a few of the Trilinos headers. So the metaprocessing command to tell the system where the header files are:

In [1]:
.I /usr/local/trilinos-serial/include/

And now we provide the include statements.

In [2]:
#include "AztecOO.h"

In [3]:
#include "AztecOO_Version.h"
#include "Epetra_SerialComm.h"
#include "Epetra_Map.h"
#include "Epetra_Vector.h"
#include "Epetra_CrsMatrix.h"

Now the metaprocessing commands to load the shared libraries. Since I'm on a Mac, I had to define the environment variable

export DYLD_LIBRARY_PATH=/usr/local/trilinos-serial/lib:$DYLD_LIBRARY_PATH

before launching the notebook/cling so the system knows where to find the libraries at runtime. On a Linux machine, this variable would be LD_LIBRARY_PATH.

In [4]:
.L libepetra
.L libaztecOO

Now we start in with the actual Epetra/AztecOO code. Since it's not apparent to me whether cling can support MPI processes, we'll just provide a serial implementation. We start with some code that instantiates the Epetra serial communicator and provides some information about the version of Trilinos we are using.

In [5]:
Epetra_SerialComm Comm;

int NumMyElements = 1000;

if (Comm.MyPID()==0)
    std::cout << AztecOO_Version() << std::endl << std::endl;

std::cout << Comm << std::endl;
AztecOO in Trilinos 12.1 (Dev)


Epetra::Comm::Processor 0 of 1 total processors.

Notice the immediate output to screen. Remember this is C++!

Now we construct a Map that puts the same number of equations on each processor. Of course, we only have one processor here, but the code is generic and could also be run in parallel if multiple processors were available.

In [6]:
Epetra_Map Map(-1, NumMyElements, 0, Comm);
int NumGlobalElements = Map.NumGlobalElements();

Now we instantiate the compressed row storage (CRS) sparse matrix $A$

In [7]:
Epetra_CrsMatrix A(Copy, Map, 3);

Here we fill $A$ to be a finite-difference like operator.

In [8]:
double negOne = -1.0;
double posTwo = 2.0;
for (int i=0; i<NumMyElements; i++) {
    int GlobalRow = A.GRID(i); int RowLess1 = GlobalRow - 1; int RowPlus1 = GlobalRow + 1;

    if (RowLess1!=-1) A.InsertGlobalValues(GlobalRow, 1, &negOne, &RowLess1);
    if (RowPlus1!=NumGlobalElements) A.InsertGlobalValues(GlobalRow, 1, &negOne, &RowPlus1);
    A.InsertGlobalValues(GlobalRow, 1, &posTwo, &GlobalRow);
};

Now we call the FillComplete() method to optimize storage of the matrix.

In [9]:
A.FillComplete();

Here we instantiate the solution vector $\vec{x}$ and the right-hand-side vector $\vec{b}$. We also fill $\vec{b}$ with random numbers.

In [10]:
Epetra_Vector x(Map);
Epetra_Vector b(Map);

b.Random();

Now we instantiate the Epetra_LinearProblem and the AztecOO solver from that instance.

In [11]:
Epetra_LinearProblem problem(&A, &x, &b);

AztecOO solver(problem);

Next is the actual solution step. A GMRES solver is used for 10 iterations. This won't be enough for convergence of the solver, but this is just for demonstration purposes and I would like to keep the output short.

In [12]:
solver.SetAztecOption(AZ_precond, AZ_Jacobi);
solver.Iterate(10, 1.0E-2);

std::cout << "Solver performed " << solver.NumIters() << " iterations." << std::endl;
std::cout << "Norm of true residual = " << solver.TrueResidual() << std::endl;
		*******************************************************
		***** Problem: Epetra::CrsMatrix
		***** Preconditioned GMRES solution
		***** 1 step block Jacobi
		***** No scaling
		*******************************************************

                iter:    0           residual = 1.000000e+00
                iter:    1           residual = 5.832906e-01
                iter:    2           residual = 4.547092e-01
                iter:    3           residual = 3.831399e-01
                iter:    4           residual = 3.366700e-01
                iter:    5           residual = 3.043737e-01
                iter:    6           residual = 2.789514e-01
                iter:    7           residual = 2.592427e-01
                iter:    8           residual = 2.449521e-01
                iter:    9           residual = 2.328859e-01
                iter:   10           residual = 2.223153e-01


	***************************************************************

	Warning: maximum number of iterations exceeded
	without convergence

	Solver:			gmres
	number of iterations:	10

	Recursive residual =  3.9447e+00

	Calculated Norms				Requested Norm
	--------------------------------------------	--------------

	||r||_2 / ||r0||_2:		2.223153e-01	1.000000e-02

	***************************************************************



		Solution time: 0.000680 (sec.)
		total iterations: 10

Solver performed 10 iterations.
Norm of true residual = 3.94466

To demonstrate that it worked, we'll print out the first 10 values of the solution vector $\vec{x}$.

In [13]:
for (int i = 0; i < 10; i++){
    std::cout << x[i] << std::endl; 
}
-0.288916
-0.723977
-0.641509
0.309992
0.815517
0.904473
0.0381509
-0.675333
-1.5226
-3.14536

It's pretty amazing to be able to do interactive C++ development. If we could get this to work in parallel, it would be a leap forward in the ability to do introspective high-performance scientific computing.

Python/Numpy implementation of Bspline basis functions

I was recently helping a student with some preliminary concepts in isogeometric analysis (IGA) and, after taking a look at his pure Python implementation of the Cox - de Boor algorithm for computing B-spline basis functions, I decided to look around for a Numpy implementation that could possibly be a little faster. While I found a few B-spline implementations in Python, I thought I might be able to squeeze out a little more performance through vectorization and memoization while maintaining readability of the code. I think I have something that accomplishes that, in the context of how basis functions are needed and used in IGA. The main portion of the code is here:

The full repository can be found/forked here: johntfoster/bspline.
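
As a rough sketch of the underlying idea (this is not the repository code itself; the function names are illustrative only), a memoized Cox - de Boor evaluation in Python/Numpy might look something like this:

import numpy as np

def bspline_basis_factory(knot_vector, p):
    """Return a callable that evaluates all order-p B-spline basis functions
    at a point x via the Cox - de Boor recursion, with memoization."""
    kv = [float(k) for k in knot_vector]
    cache = {}

    def N(i, k, x):
        # Cox - de Boor recursion, caching results keyed on (i, k, x)
        key = (i, k, x)
        if key in cache:
            return cache[key]
        if k == 0:
            # zeroth-order basis: indicator function on the half-open knot span
            val = 1.0 if kv[i] <= x < kv[i + 1] else 0.0
        else:
            # guard against zero-length knot spans (repeated knots)
            left_den = kv[i + k] - kv[i]
            right_den = kv[i + k + 1] - kv[i + 1]
            left = 0.0 if left_den == 0.0 else (x - kv[i]) / left_den * N(i, k - 1, x)
            right = 0.0 if right_den == 0.0 else (kv[i + k + 1] - x) / right_den * N(i + 1, k - 1, x)
            val = left + right
        cache[key] = val
        return val

    def basis(x):
        n = len(kv) - p - 1  # number of basis functions for this knot vector
        return np.array([N(i, p, x) for i in range(n)])

    return basis

# usage with the knot vector and order from the example below
basis = bspline_basis_factory([0,0,0,0,0,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5], 4)
vals = basis(2.5)  # values of all basis functions at x = 2.5

Note that the half-open knot span in the sketch means the right endpoint of the parameter domain evaluates to zero; the actual implementation handles that edge case.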

An example use case is shown below. This reproduces Fig. 2.6 in Isogeometric Analysis: Toward Integration of CAD and FEA by Cottrell, Hughes, and Bazilevs. With bspline.py in the working directory:

In [1]:
from bspline import Bspline

knot_vector = [0,0,0,0,0,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
basis = Bspline(knot_vector,4)

%matplotlib inline
basis.plot()

As mentioned above, the recursive function that computes the basis has been memoized so that it caches repeated evaluations of the function. This has a large impact on the evaluation speed. We can use Jupyter's %%timeit magic function to investigate this. I've hidden the raw evaluation functions through Python's name mangling, so the following function is not intended for general API usage; but for testing purposes, let's evaluate the basis functions defined by the knot vector and basis order given above using the non-memoized code:

In [2]:
import numpy as np
x = np.linspace(0,5,num=1000)
In [3]:
%%timeit 
[ basis._Bspline__basis(i, basis.p) for i in x ]
1 loops, best of 3: 192 ms per loop

Now let's time the memoized code

In [4]:
%%timeit 
[ basis(i) for i in x ]
100 loops, best of 3: 2.36 ms per loop

A speedup of roughly two orders of magnitude!

Run Peridigm (and other scientific HPC codes) without building via Docker

If you're a software engineer working in web development or cloud software deployment, you would have had to have your head in the sand for the last two years not to have heard of Docker by now. Briefly, Docker is a light-weight application container deployment service (think super low-overhead virtual machine, usually meant to contain one application and its dependencies), based on Linux containers. While there are hundreds of articles describing Docker, its uses, many services, and even companies springing up around it, my colleagues in the scientific computing/high-performance computing (HPC) world are either not aware of Docker or don't see the utility, because it's hard to find significant information for or uses of Docker in this field. This post will hopefully demonstrate some of the utility of the service and promote its use in HPC and in academic settings. Additionally, as a contributor to the open-source software Peridigm, my hope is that the path demonstrated here might lower the barrier to trying out the software and attract new users.

To provide some background/context for my own interest in Docker: in my current role as an academic advisor to graduate students, and in my previous career as a researcher at Sandia National Laboratories, I have often been a user and/or developer of scientific software that has a fairly complex dependent application stack. To use Peridigm, a massively parallel computational peridynamic mechanics simulation code, as an example, we have a large dependency on Trilinos. The packages we use from Trilinos have additional dependencies on a message passing interface (MPI) implementation (e.g. OpenMPI), Boost, and NetCDF. NetCDF has a dependency on HDF5, HDF5 on zlib, etc. All of these need a C/C++ compiler for building, of course, and if we are using the Intel compilers we might as well go ahead and use the MKL libraries for efficiency. Trilinos uses CMake as a build system, so we need that around as well, at least during the build process. I'm sure there are others I am not thinking of at this moment. Figuring out and understanding these dependencies and learning to debug build issues can take a really long time for a budding computational scientist to master. Problems can be really compounded in a large multi-user HPC cluster where many different versions of compilers and libraries are present. Using UNIX modules goes a long way towards keeping things straight, but problems still arise that can be time consuming to troubleshoot. I usually have new graduate students go through the process of building Peridigm and all the dependencies for their own learning experience when they first join my group; in every case they struggle considerably the first few times they have to build the entire suite of software on a new machine. In some cases, particularly with MS-level students, it can be a serious impediment to progress, as they are on very short timelines to make meaningful research progress, and oftentimes they are only going to be end users of the code, not developers. Almost certainly they are not going to make changes to any of the packages in Trilinos or any of the other dependencies outside of Peridigm. Enter Docker.

Docker allows you to run applications in prebuilt containers on any Linux machine with Docker installed (or even Mac OS X and Windows via Boot2Docker). If your Linux distribution doesn't already have it installed, just use your package manager (e.g. apt-get install docker on Debian/Ubuntu based machines and yum install docker on Fedora/Red Hat based machines). Docker is so ubiquitous at this point that there are even entire Linux distributions, like CoreOS, being built specifically to maximize its strengths. Additionally, there is the Docker Hub Registry, which provides GitHub-like push/pull operations and cloud storage of prebuilt images. Of course, you or your organization can host your own Docker image registry as well (note: think of a Docker image like a C++ class, and a Docker container as a C++ object, i.e. an instance of the image that runs). You can derive images from one another, and this is what I have done in setting up an image of Peridigm. First, I have an image that starts with a baseline Debian-based Linux and installs NetCDF and its dependencies. I have two tagged versions: a standard NetCDF build and a largefiles-patched version which makes the large-file modifications suggested in the Peridigm build documentation. The NetCDF largefiles image can be pulled to a Docker user's local machine from my public Docker Hub Registry with

docker pull johntfoster/netcdf:largefiles

The NetCDF image is built automatically via continuous integration with a GitHub repository that contains a Dockerfile with instructions for the image build process. I then derive a Trilinos image from the NetCDF largefiles image. This image has only the Trilinos packages required to build Peridigm enabled. The Trilinos image can be pulled to a local machine with

docker pull johntfoster/trilinos

This Trilinos image could be used to derive other images for other Trilinos-based application codes that utilize the same dependent packages as Peridigm. The cmake build script, which shows the active Trilinos packages in this image, can be viewed here. This image is also built via continuous integration with this GitHub repository, which can easily be modified to include more Trilinos packages. Finally, the Peridigm image can be pulled to a local machine with

docker pull johntfoster/peridigm

This image is built from the Dockerfile found here. However, it must be modified slightly to work with a local Peridigm source distribution if you want to build your own image, because I do not distribute the Peridigm source code; that is done by Sandia National Laboratories (see here if you are interested in getting the source). To run Peridigm you simply have to run the command:

docker run --name peridigm0 -d -v `pwd`:/output johntfoster/peridigm \ 
       Peridigm fragmenting_cylinder.peridigm 

Even if you have not pulled the images locally from the Docker Hub Registry, this command will initiate the download process and run the simulation after downloading. Because the image contains all the necessary dependencies, it is around 2GB in size; therefore, it takes a while to download on the first execution. I have not made any attempts to optimize the image size, but it's likely it could be made smaller to speed this process up. Once you have the image locally, launching a container to run a simulation takes only milliseconds. The --name option gives the container the name peridigm0; this could be any name you choose, and naming the container is not required, but it makes for an easy way to stop the execution if you wish. If you need to stop the execution, simply run docker stop peridigm0. The -d option "detaches" the terminal so that the process runs in the background; you can reattach if needed with docker attach peridigm0. The -v option is critical for retrieving the data generated by the simulation. Docker handles data a little strangely, so you have to mount a local volume, in this case the current working directory returned by pwd, but in general it could be any /path/to/data/storage. The local path is mounted to the volume /output that has been created in the Peridigm Docker image. You must place your input file, in this case fragmenting_cylinder.peridigm from the examples distributed with Peridigm, in your local shared mount for the Peridigm executable located in the Docker image to access it. As the simulation runs, you will see an output file fragmenting_cylinder.e appear in the shared mount. The other two arguments, johntfoster/peridigm and Peridigm, are the image name in the Docker Hub Registry and the executable, respectively.

To quickly recap, if all you want is to quickly test out Peridigm, follow these 3 steps:

  1. Install Docker via your package manager on Linux or utilize Boot2Docker on Mac OS X and Windows.

  2. Place a Peridigm input file such as fragmenting_cylinder.peridigm in a directory.

  3. Run the docker run ... command above, replacing pwd with the directory name where you placed the file in step 2, if not running from the current working directory.

That's it. No compiling, no dependencies. Most of the performance studies I've seen report a very small, 1-2% hit from the Dockerized version of an application over a natively installed application stack.

Of course, Peridigm is meant to be run in a massively parallel setting. A Docker container can take advantage of some multithreading, so you can also just run an MPI job right inside a single Docker container; for Peridigm we can simply run

docker run --name peridigm0 -d -v `pwd`:/output johntfoster/peridigm \
       mpiexec -np 2 Peridigm fragmenting_cylinder.peridigm 

for a possibly small performance gain. If you run this command, you should now see the domain-decomposed output files, e.g. fragmenting_cylinder.e.0 and fragmenting_cylinder.e.1, appear in your mounted directory. However, this alone will not give you the true gains you expect from multicore CPUs, or anything close to what you would see in a massively parallel HPC cluster. We can spawn a virtual cluster of Docker containers from the johntfoster/peridigm image and use MPI to communicate between them. To do this, we launch our virtual cluster with docker run ..., but this time without any execution command. The default is for these containers to be ssh servers, so they will sit there idle until we need them for an MPI task. For example, if we want a 2-node job, we can run

docker run --name peridigm0 -d -P -h node0 -v `pwd`:/output johntfoster/peridigm
docker run --name peridigm1 -d -P -h node1 -v `pwd`:/output johntfoster/peridigm

This launches containers named peridigm0 and peridigm1 that will have local (to their containers) machine names node0 and node1. Now we can find the containers' IP addresses with the following commands

docker inspect -f "{{ .NetworkSettings.IPAddress }}" peridigm0
docker inspect -f "{{ .NetworkSettings.IPAddress }}" peridigm1

These commands will return the IP addresses; for demonstration purposes, let's assume they are 10.1.0.124 and 10.1.0.125, respectively. Now we can launch mpiexec to run Peridigm on our virtual cluster. This time the containers are already running, so all we have to do is run

docker exec peridigm0 mpiexec -np 2 -host 10.1.0.124,10.1.0.125 \
       Peridigm fragmenting_cylinder.peridigm

which executes mpiexec on the container named peridigm0 and passes a host list so MPI knows where the nodes waiting for tasks are. You can have fine-grained control over mpiexec as well; for example, if you wanted to run 2 MPI tasks per Docker container, you could do something like

docker exec peridigm0 mpiexec -np 4 -num-proc 2 -host 10.1.0.124,10.1.0.125 \
       Peridigm fragmenting_cylinder.peridigm

When you're finished, you can stop and remove the containers with

docker stop peridigm0 peridigm1
docker rm -v peridigm0 peridigm1

The -v option to docker rm ensures that the mounted volumes are removed as well, so that no orphaned file system volume shares are left behind. I have written a Python script (shown below) that automates the whole process for an arbitrary number of Docker containers in a virtual cluster.

You can utilize this script to run Peridigm simulations on a virtual Docker cluster with the following command:

./pdrun --np 4 fragmenting_cylinder.peridigm --path=`pwd`

Please consult the documentation of the script for more info about the options using ./pdrun -h. There are really only a few details between what I've done here and being able to deploy this on a production HPC cluster. The main difference would be that you would use a resource manager, e.g. SLURM, to schedule running the Docker containers. You would also want to have both your mounted volume and a copy of the Docker image on a shared filesystem accessible across all nodes of the cluster. Did I mention you can deploy your own private registry server?
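
To give a sense of what this kind of automation involves, here is a stripped-down sketch (not the actual pdrun script) that simply chains together the docker commands shown above using Python's subprocess module:

#!/usr/bin/env python
# A stripped-down sketch (not the actual pdrun script) of automating a
# virtual Docker cluster for a Peridigm run: launch N containers, collect
# their IP addresses, run mpiexec from the first container, then clean up.
import os
import subprocess

def sh(cmd):
    # run a shell command and return its captured output as a string
    return subprocess.check_output(cmd, shell=True).decode().strip()

def run_peridigm(input_file, np=2, path=None, image='johntfoster/peridigm'):
    path = path or os.getcwd()
    names = ['peridigm{}'.format(i) for i in range(np)]

    # launch one idle (ssh server) container per virtual node
    for i, name in enumerate(names):
        sh('docker run --name {0} -d -P -h node{1} -v {2}:/output {3}'
           .format(name, i, path, image))

    # collect the container IP addresses
    ips = [sh('docker inspect -f "{{ .NetworkSettings.IPAddress }}" ' + name)
           for name in names]

    # run the MPI job from the first container, one rank per container
    sh('docker exec {0} mpiexec -np {1} -host {2} Peridigm {3}'
       .format(names[0], np, ','.join(ips), input_file))

    # stop and remove the containers along with their mounted volumes
    sh('docker stop ' + ' '.join(names))
    sh('docker rm -v ' + ' '.join(names))

if __name__ == '__main__':
    run_peridigm('fragmenting_cylinder.peridigm', np=2)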

The last point I would like to make about Docker is that, while I demonstrated how end users can benefit by eliminating the entire build process and moving right to running the simulation code, developers could also benefit greatly from using a common development image and requiring support only for the Docker image, as it can be quickly deployed and tested against all platforms. How many times have you heard, even from experienced developers, after a commit fails in continuous integration, "It worked on my machine..."? Docker offers a solution to those problems and many others.

A LaTeX Beamer template/theme for the Cockrell School of Engineering

This post is mainly aimed at my colleagues (and students) in the Cockrell School of Engineering at The University of Texas at Austin. As you may have gathered from other posts on this blog, I am a huge fan of LaTeX for typesetting documents. Recently, I had to give a talk within the Department and was asked politely by our Communications Coordinator to use a provided MS PowerPoint template that conforms to the Visual Style Guide of the Cockrell School of Engineering.

I have many reasons for disliking PowerPoint that I will not go into here, but I have avoided using it at all costs for several years now, and did not want to start now. I also have an appreciation for branding and respect the desire to project a uniform brand from the Departments/School. My solution was to create my own LaTeX Beamer presentation style that replicates the Cockrell School PowerPoint as closely as possible. I hacked it together relatively quickly for my talk, which I delivered last week, but decided over the weekend to put together something a little more robust to share with my colleagues in the Cockrell School. The style file accepts arguments to input different department names and has some nice features for using BibTeX citations within the presentation.

The repository can be cloned with git using the following command:

git clone git@github.com:johntfoster/cockrell-school-latex-beamer-template

It can also be found on GitHub. Feel free to fork it and extend it as you see fit. If you find any errors or add useful extensions, please send pull requests.

You can also download a zip archive of the files : cockrell-school-latex-beamer-template-master.zip

An example of the PDF output is shown below:

Prevent TeXShop from stealing focus from an external editor

I use Vim as my editor of choice for nearly all text editing activities, including writing LaTeX documents. My usual workflow is to use a split-window setup with tmux and run latexmk -pvc on my LaTeX file in one split-pane while editing the source file in the other. If you're not familiar with latexmk, it is a Perl script that, in a smart way, using a minimal number of compile runs, keeps your LaTeX document output up-to-date with the correct cross-references, citations, etc. The -pvc option keeps it running in the background and recompiles when a change is detected in any of the project files. I then use TeXShop as a PDF viewer because it has the nice ability to detect changes to the PDF output and auto-update. Mac OS X's built-in Preview will do this as well, but you must click the window to enable the refresh. However, by default, TeXShop will steal focus from the editor window. This is annoying, causing me to click or Command-Tab back to the Terminal to continue typing. I found a fix in a comment on Stack Exchange. If you type

defaults write TeXShop BringPdfFrontOnAutomaticUpdate NO

in the Terminal window, this will disable the focus-stealing behavior and leave the window in the background so that it doesn't disrupt continuous editing.

Using Exodus.py to extract data from FEA database

This post will briefly show how you can use the exodus.py script distributed with Trilinos to extract data directly from an Exodus database. Exodus is a wrapper API on NetCDF that is specifically suited for finite element data. That is, it defines variables on nodes, elements, blocks of elements, sets of nodes, etc. The Exodus API is provided in both C and Fortran, exodus.py uses ctypes to call into the compiled Exodus C library.

First, I need to rearrange my install environment a little because exodus.py expects the NetCDF and Exodus compiled dynamic libraries to be in the same directory. On my machine they are not, so I will just create some symbolic links. It also expects the Exodus include file to be in a folder labeled inc, but on my machine it is labeled include so again, I will just create some symbolic links.

In [1]:
%%bash
ln -sf /usr/local/netcdf/lib/libnetcdf.dylib /usr/local/trilinos/lib/.
ln -sf /usr/local/trilinos/include /usr/local/trilinos/inc

If your /usr/local is not writable, you may need to use sudo to create the links. Also, I am on Mac OSX where dynamic libraries have a .dylib file extension. If you use Linux, you will need to change .dylib above to .so.

We also need to add the path of exodus.py to the active PYTHONPATH. We can do this from within the IPython session.

In [2]:
import sys
sys.path.append('/usr/local/trilinos/bin')

Now we can load the exodus class from exodus.py and instantiate a file object with a given filename. I will use the ViscoplasticNeedlemanFullyPrescribedTension_NoFlaw.h Exodus history database that is output from the Peridigm verification test of the same name.

In [3]:
from exodus import exodus

e = exodus('ViscoplasticNeedlemanFullyPrescribedTension_NoFlaw.h', mode='r', array_type='numpy')
You are using exodus.py v 1.02 (beta), a python wrapper of some of the exodus II library.
Copyright (c) 2013,2014 Sandia Corporation. Under the terms of Contract DE-AC04-94AL85000
with Sandia Corporation, the U.S. Government retains certain rights in this software.

Opening exodus file: ViscoplasticNeedlemanFullyPrescribedTension_NoFlaw.h

Now we'll use the API calls to extract the data. First we'll get the time step values.

In [4]:
time_steps = e.get_times()

Now we can print the global variable names.

In [5]:
e.get_global_variable_names()
Out[5]:
['Max_Von_Mises_Stress', 'Min_Von_Mises_Stress']

And use the global variable names to extract the data from the database. Since we used array_type='numpy' when we instantiated the file object above, the data is stored in Numpy arrays.

In [6]:
vm_max = e.get_global_variable_values('Max_Von_Mises_Stress')
vm_min = e.get_global_variable_values('Min_Von_Mises_Stress')

Because in this example test we load at a constant strain-rate, we can easily convert the time-steps to engineering strain.

In [7]:
eng_strain_Y = time_steps * 0.001 / 1.0e-8

Now we can create a stress-strain curve

In [8]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(eng_strain_Y, vm_max, eng_strain_Y, vm_min);
plt.ylabel("Max/Min von Mises Stress");

I don't want the symbolic links I created earlier to cause any unexpected trouble for me later, so I will remove them.

In [9]:
%%bash
rm /usr/local/trilinos/lib/libnetcdf.dylib
rm /usr/local/trilinos/inc