NWChem for OSX on PPC

NWChem (http://www.emsl.pnl.gov/docs/nwchem/nwchem.html) is a computational chemistry package that is designed to run on high-performance parallel supercomputers as well as conventional workstation clusters. It aims to be scalable both in its ability to treat large problems efficiently, and in its usage of available parallel computing resources. NWChem has been developed by the Molecular Sciences Software group of the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). Most of the implementation has been funded by the EMSL Construction Project.

Compiling for OSX on PPC

Installing NWChem on the PPC architecture was a pain, mainly because of some issues in the Global Arrays libraries, which have since been fixed (I hope). Below are the steps I took to get NWChem compiled.

Tools

Below are the open-source or free tools used to get things going:

  • GCC 3.3.6
  • gcc3 from Xcode
  • LAM/MPI 7.1.1

More specifically,

  • Ensure that gcc is from Xcode
  • f77 is from GCC 3.3.6
  • LAM/MPI is compiled with the above-mentioned compilers
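To double-check which toolchain the build will actually pick up, a quick sanity check along these lines can help (which tools are present will of course depend on your own setup):

```shell
# Report which of the expected tools are on the PATH and where they live
# (f77 and mpicc come from GCC 3.3.6 and LAM/MPI respectively on my setup).
for tool in gcc f77 mpicc; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool -> $(command -v "$tool")"
  else
    echo "$tool: not found in PATH"
  fi
done
```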

Patches to the code

In armci/src/GNUmakefile, in the statement

SOCKETS  = $(SYSTEM_V)

you need to add MACX, i.e.,

SOCKETS  = $(SYSTEM_V) MACX

In the file tcgmsg-mpi/nxtval-armci.c, comment out line 105 (if(NODEID_() == NXTV_SERVER)ARMCI_Free(pnxtval_counter);), so that finalize_nxtval() becomes:

void finalize_nxtval()
{
  /* if(NODEID_() == NXTV_SERVER)ARMCI_Free(pnxtval_counter); */
    ARMCI_Finalize();
}

Also note that I am using GA 4, i.e. I replaced the “tools” directory in the NWChem src directory and copied the original GNUmakefile over.

Environment Variables

Below are the environment variables used:

# NWCHEM stuff
export TCGRSH=ssh
export NWCHEM_TOP=/cluster/nwchem-4.7/src/nwchem-4.7
export NWCHEM_TARGET=MACX
# LAM_MPI configuration for NWCHEM
export MPI_LOC=/opt/cluster/lam-7.1.1/gcc-3
export MPI_LIB=$MPI_LOC/lib
export MPI_INCLUDE=$MPI_LOC/include
export LIBMPI="-llamf77mpi -lmpi -llam -lpthread"
export NWCHEM_NWPW_LIBRARY=/opt/cluster/nwchem-4.7/data/

Ensure that your FC and CC are pointing to the right compilers.
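For reference, on my setup that meant exports along these lines (the g77 path is an assumption based on where GCC 3.3.6 was installed; adjust to your own layout):

```shell
# Point the build at the Xcode gcc and the GCC 3.3.6 g77
# (paths are from my setup and may differ on yours).
export CC=/usr/bin/gcc
export FC=/opt/cluster/gcc-3.3.6/bin/g77
```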

Compiling NWChem

The modules and make commands were configured as:

$ make nwchem_config NWCHEM_MODULES=all
$ gcc_select 3.3
$ make TARGET=MACX USE_MPI=y DIAG=PAR

The installation mechanics are as described in the INSTALL file.

Compiling ga-mpi.x for Global Arrays (GA) tests

Below are the commands I used to compile ga-mpi.x to test the GA component.

cc -I../../include -DMACX -O -c -o ga-mpi.o ga-mpi.c \
  -L/opt/cluster/lam-7.1.1/gcc-3/lib -llamf77mpi -lmpi -llam -lpthread \
  -I/opt/cluster/lam-7.1.1/gcc-3/include
/opt/cluster/gcc-3.3.6/bin/g77 -c -O -O3 -funroll-loops -fno-second-underscore \
  -Wno-globals -I../../include -DMACX ffflush.F
mpicc -I../../include -DMACX -O -c -o util.o util.c
if [ -f ga-mpi.c ]; then
  /opt/cluster/gcc-3.3.6/bin/g77 -g -O3 -funroll-loops -fno-second-underscore -Wno-globals \
    -o ga-mpi.x ga-mpi.o util.o -L../../lib/MACX -lglobal -lma -llinalg -larmci \
    -L/opt/cluster/lam-7.1.1/gcc-3/lib -ltcgmsg-mpi -llamf77mpi -lmpi -llam -lpthread \
    -lm -L/usr/lib/gcc/darwin/default -lgcc
else
  /opt/cluster/gcc-3.3.6/bin/g77 -g -O3 -funroll-loops -fno-second-underscore -Wno-globals \
    -o ga-mpi.x ga-mpi.o util.o ffflush.o -L../../lib/MACX -lglobal -lma -llinalg -larmci \
    -L/opt/cluster/lam-7.1.1/gcc-3/lib -ltcgmsg-mpi -llamf77mpi -lmpi -llam -lpthread \
    -lm -L/usr/lib/gcc/darwin/default -lgcc
fi
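Once ga-mpi.x links, it can be run under LAM/MPI in the usual way; a sketch, assuming a host file listing the cluster nodes (the name "hostfile" and the process count are just examples):

```shell
# Boot the LAM daemons from a host file, run the GA test on 4 processes,
# then shut the LAM universe down again.
lamboot hostfile
mpirun -np 4 ./ga-mpi.x
lamhalt
```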

Errors

If there are errors about flags such as

-mtune=970 -mcpu=970...

I changed these to

-mcpu=powerpc

in config/makefile.h.
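One way to make that change without hand-editing is a sed pass over config/makefile.h; a sketch (the exact flag spellings are assumptions based on the error message, and you should keep the original file around until the new one looks right):

```shell
# Drop the 970-specific tuning flag and downgrade -mcpu=970 to the generic
# PowerPC setting, writing the result to a new file for inspection first.
sed -e 's/-mtune=970//g' -e 's/-mcpu=970/-mcpu=powerpc/g' \
    config/makefile.h > config/makefile.h.fixed
```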

Parallelization of d2_cluster

Attached is some work I did in the past for a company, benchmarking d2_cluster in MPI mode. I can’t provide the details, as the source was revealed to me under an NDA. The attached document was the best I could do.

Abstract: The increasing use of EST data for sequence analysis has led to the parallelization of such computations in order to analyze this information at a greater rate. Coupled with the improvements and affordability of Linux clusters, new methods of parallelizing d2_cluster to aid in sequence analysis would be beneficial. d2_cluster was parallelized using MPI on a Beowulf cluster. The tests show promising results, with a speedup of 42.7 over a uni-processor system using 8 CPUs.
