SOFA: A Cross-framework performance profiler for heterogeneous computing systems, especially for HPC and distributed machine learning systems.
- Run
  ./tools/prepare.sh
  to install all the necessary system packages and Python packages.
- [OPTIONAL] Run
  ./tools/empower.py $(whoami) $(which tcpdump)
  to make network-related events traceable in SOFA. After running this step, you must log in again for the changes to take effect.
- Run
  ./install.sh </PATH/TO/INSTALL>
  to install SOFA on your system. Note that "sofa" will be appended to the path if the last directory is not "sofa".
- Then run
  source </PATH/TO/INSTALL>/sofa/tools/activate.sh
  to activate the SOFA running environment. (This must be executed in each new shell.)
- [ALTERNATIVE] Add
  source </PATH/TO/INSTALL>/sofa/tools/activate.sh
  to your ~/.bashrc to make this environment available in every shell.
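
The [ALTERNATIVE] step above can be scripted so that the activation line is added to the startup file only once, however often the script is re-run. The following is a minimal sketch, not part of SOFA: the helper name add_sofa_activation and its arguments are hypothetical, and it assumes SOFA was installed under <install_dir>/sofa as described above.

```shell
# Sketch: append the SOFA activation line to a shell startup file, at most once.
# add_sofa_activation is a hypothetical helper, not something SOFA ships.
add_sofa_activation() {
    local install_dir="$1"                 # directory that contains sofa/
    local rc_file="${2:-$HOME/.bashrc}"    # startup file to modify (assumed default)
    local line="source ${install_dir}/sofa/tools/activate.sh"

    touch "$rc_file"
    # -F: fixed string, -x: match the whole line, -q: no output
    if ! grep -Fxq -- "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Example (left commented out so that sourcing this file has no side effects):
# add_sofa_activation /opt
```

Matching the whole line with grep keeps repeated runs from stacking duplicate source lines in ~/.bashrc.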
SOFA supports several usage modes, similar to how one uses perf. More details are available in the following slides:
- slide: https://docs.google.com/presentation/d/1fyNnLlU-0WMIddkI8hgYn0Tg1vbP9i7VuXSPIsXB2L4/edit?usp=sharing
- Profile your program by sampling involved CPUs:
  sofa stat "dd if=/dev/zero of=dummy.out bs=100M count=10"
- Profile your program by sampling all CPUs:
  sofa stat "dd if=/dev/zero of=dummy.out bs=100M count=10" --profile_all_cpus
- Record and report:
  sofa record "dd if=/dev/zero of=dummy.out bs=100M count=10"
  sofa report [--verbose] [--with-gui]
- If you pass "--with-gui" to "sofa report", you can open the different visualizations in a browser.
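
The record-then-report workflow above can be wrapped in a small shell function so that the report is only generated when recording succeeds. This is a sketch under assumptions: sofa_profile is a hypothetical name, not a SOFA command, and it relies only on the "sofa record" and "sofa report" invocations shown above.

```shell
# Sketch: run "sofa record" on a command, then "sofa report" if recording succeeded.
# sofa_profile is a hypothetical wrapper, not part of SOFA.
sofa_profile() {
    if [ "$#" -lt 1 ]; then
        echo 'usage: sofa_profile "<command>" [sofa report options]' >&2
        return 2
    fi
    local cmd="$1"
    shift
    # Stop here and propagate the exit status if recording fails.
    sofa record "$cmd" || return $?
    # Remaining arguments (e.g. --verbose, --with-gui) go to "sofa report".
    sofa report "$@"
}

# Example:
# sofa_profile "dd if=/dev/zero of=dummy.out bs=100M count=10" --verbose
```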
SOFA provides options for advanced usage. Some examples are shown below. Run "sofa --help" for more information.
sofa record "python tf_cnn_benchmarks.py" --cpu_filters="idle:black,tensorflow:orange"

sofa record "python tf_cnn_benchmarks.py" --gpu_filters="tensorflow:orange"

sofa record "python3.6 pytorch_dnn_example.py -a resnet50 /mnt/dataset/imagenet/mini-imagenet/raw-data --epochs=1 --batch-size=64"

sofa record "./scout dt-bench ps:resnet50 --hosts='192.168.0.100,192.168.0.101'"

sofa record "~/cuda_samples/1_Utilities/bandwidthTest/bandwidthTest"

sofa record "./scout t-bench resnet50_real"

We strongly encourage and appreciate contributions to SOFA that make performance engineering work more comfortable. To maintain code quality, we ask contributors to follow these rules:
- Please run
  test/test.py
  before sending a pull request. To test SOFA on specific platforms, run
  ./test/test.py --dockerfiles Dockerfile.ubuntu.1604,Dockerfile.ubuntu.1804
  where the corresponding Dockerfiles must be placed inside the test directory.
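
When several Dockerfiles are present, the comma-separated value for --dockerfiles can be built from the contents of the test directory instead of being typed by hand. A minimal sketch, assuming the files follow the Dockerfile.<platform> naming seen above; dockerfile_list is a hypothetical helper and not part of SOFA's test suite.

```shell
# Sketch: print the Dockerfile.* names found in a directory as a comma-separated list.
# dockerfile_list is a hypothetical helper, not part of SOFA.
dockerfile_list() {
    local dir="${1:-test}"    # directory holding the Dockerfiles (assumed default)
    local list=""
    local f
    for f in "$dir"/Dockerfile.*; do
        [ -e "$f" ] || continue              # skip the unexpanded pattern if nothing matches
        list="${list:+$list,}${f##*/}"       # append the basename, comma-separated
    done
    printf '%s\n' "$list"
}

# Example:
# ./test/test.py --dockerfiles "$(dockerfile_list test)"
```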
