mirror of
https://github.com/andreinechaev/nvcc4jupyter.git
synced 2026-06-14 03:00:47 +05:30
Add option to choose between NSYS and NCU profilers (#28)
* Add option to give nvcc extra arguments
* Add test for nvcc options that changes c++ dialect from c++17 to c++14
* Add make and the english language pack to devcontainer to be able to build the documentation
* Update documentation config to automatically import the current version of the package
* Document new --compiler-args argument
* Improve tests coverage by testing for bad arguments and the error output during a failed compilation
* Add IPython to docs requirements to allow the __version__ import for readthedocs env
* Change devcontainer base image to have the latest CUDA toolkit
* Mock the nsight compute tool with a bash script
* Add test to compile with opencv
* Add new page to documentation that contains a new notebook that explains compiling with external libraries
* Add autodocstring vscode extension to devcontainer
* Add function that modifies the default profiler/compiler arguments to allow reusing them in multiple magic command calls
* Update pylint exceptions
* Update contributing instructions
* Change version from 1.0.3 to 1.1.0 due to adding features in a backward-compatible manner
* Install latest CUDA toolkit on the test runner to pass the OpenCV compilation test
* Install opencv in test runner and update code coverage install
* Add CUDA bin to PATH in test and coverage runners
* Add cuda bin to path variable in .bashrc
* Update way to set environment variable PATH in github action
* Change devcontainer base image back to ubuntu:22.04 to match the environment from the test runner
* Add option to choose between NSYS and NCU profilers
* Add tests for choosing the profiler
* Add isort config to help it find local modules so they are not considered 3rd party libraries
* Replace experimental-string-processing black formatter config with enable-unstable-feature as it was removed in version 24.1.0
* Search for profiling tools executable paths when they are required
* Install dev dependencies in editable mode
* Add documentation for using Nsight Systems instead of the default Nsight Compute profiling tool
* Fix cuda typo
* Mention Nsight Systems in README.md
committed by GitHub
parent 781ff5b76b
commit 0bddf6a6e6
+25 -3
@@ -225,10 +225,11 @@ Profiling
 ---------

 Another important feature of nvcc4jupyter is its integration with the NVIDIA
-Nsight Compute profiler, which you need to make sure is installed and its
-executable can be found in a directory in your PATH environment variable.
+Nsight Compute / NVIDIA Nsight Systems profilers, which you need to make sure
+are installed and the executables can be found in a directory in your PATH
+environment variable.

-In order to use it and provide the profiler with custom arguments, simply run:
+To profile using Nsight Compute with custom arguments:

 .. code-block:: c++

@@ -256,6 +257,27 @@ Running the cell above will compile and execute the vector addition code in the
     Compute (SM) Throughput             %         1.19
     ----------------------- ------------- ------------

+To profile using Nsight Systems with custom arguments:
+
+.. code-block:: c++
+
+    %cuda_group_run --group "vector_add" --profiler nsys --profile --profiler-args "profile --stats=true"
+
+Running the cell above will compile and execute the vector addition code in the
+"vector_add" group and profile it with Nsight Systems. The output will contain
+multiple tables, one of which will look similar to this:
+
+.. code-block::
+
+    [5/8] Executing 'cuda_api_sum' stats report
+
+    Time (%)  Total Time (ns)  Num Calls    Avg (ns)       Med (ns)      Min (ns)     Max (ns)    StdDev (ns)           Name
+    --------  ---------------  ---------  -------------  -------------  -----------  -----------  -----------  ----------------------
+        77.3      200,844,276          1  200,844,276.0  200,844,276.0  200,844,276  200,844,276          0.0  cudaMalloc
+        22.6       58,594,762          2   29,297,381.0   29,297,381.0   29,153,999   29,440,763    202,772.8  cudaMemcpy
+         0.1          305,450          1      305,450.0      305,450.0      305,450      305,450          0.0  cudaLaunchKernel
+         0.0            1,970          1        1,970.0        1,970.0        1,970        1,970          0.0  cuModuleGetLoadingMode
+
 Compiler arguments
 ------------------

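The Profiling text in this diff requires that the profiler executables can be found in a directory on PATH. A minimal Python sketch of such a check follows. It assumes the command-line executables are named ``ncu`` (Nsight Compute) and ``nsys`` (Nsight Systems); the helper ``find_profilers`` is hypothetical and is not taken from nvcc4jupyter's own code, which may locate the tools differently.

```python
import shutil

# Assumed executable names for the two profilers mentioned in the docs.
PROFILER_EXECUTABLES = {"ncu": "Nsight Compute", "nsys": "Nsight Systems"}


def find_profilers(names=PROFILER_EXECUTABLES):
    """Map each executable name to its resolved path, or None if not on PATH.

    Hypothetical helper for illustration only.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_profilers().items():
        label = PROFILER_EXECUTABLES[name]
        print(f"{label} ({name}): {path or 'not found on PATH'}")
```

Running this in a notebook cell before using ``--profile`` shows which of the two tools the current environment can resolve.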
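The 'cuda_api_sum' table added in this diff is plain whitespace-separated text, so it can be post-processed with a few lines of Python. The sketch below is an illustration only: it assumes every data row has exactly nine whitespace-separated fields with no spaces inside the name, as in the sample output, and ``parse_cuda_api_sum`` is a hypothetical helper, not part of nvcc4jupyter or Nsight Systems.

```python
SAMPLE = """\
Time (%)  Total Time (ns)  Num Calls  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Name
--------  ---------------  ---------  --------  --------  --------  --------  -----------  ----
77.3  200,844,276  1  200,844,276.0  200,844,276.0  200,844,276  200,844,276  0.0  cudaMalloc
22.6  58,594,762  2  29,297,381.0  29,297,381.0  29,153,999  29,440,763  202,772.8  cudaMemcpy
0.1  305,450  1  305,450.0  305,450.0  305,450  305,450  0.0  cudaLaunchKernel
0.0  1,970  1  1,970.0  1,970.0  1,970  1,970  0.0  cuModuleGetLoadingMode
"""


def parse_cuda_api_sum(text):
    """Extract (time %, total ns, call count, name) from each data row.

    Hypothetical helper: assumes nine whitespace-separated fields per row.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue  # header line or unrelated output
        try:
            row = {
                "time_pct": float(fields[0]),
                "total_ns": int(fields[1].replace(",", "")),
                "num_calls": int(fields[2]),
                "name": fields[8],
            }
        except ValueError:
            continue  # e.g. the dashed separator line
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in parse_cuda_api_sum(SAMPLE):
        print(f"{row['name']}: {row['total_ns']} ns over {row['num_calls']} call(s)")
```

In the sample, ``cudaMalloc`` dominates the total API time, which is typical for a short program because the first CUDA call also pays for context initialization.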