{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Multi-GPU and multi-threading - All About\n", "\n", "This document briefly explains how GPUs and multi-threading are used in OSCARS.\n", "\n", "The simplest way to use threads and the GPU is to specify them when creating an oscars.sr object:\n", "\n", "    # Create a new OSCARS object. Default to 8 threads and always use the GPU if available\n", "    osr = oscars.sr.sr(nthreads=8, gpu=1)\n", "\n", "This sets the defaults to 8 threads (choose however many you like) and to use as many GPUs as are available. The GPU has precedence over multi-threading. Nothing else is needed unless you wish to control the GPU or threads on individual function calls, in which case see below.\n", "\n", "\n", "Nearly all functions in oscars.sr accept the arguments 'gpu=1' and 'gpu=0'. If you select 1, OSCARS will use the GPU to do the calculation. All of these functions also accept the argument 'ngpu=X', where X is the number of GPUs you wish to use, as well as 'ngpu=[n0, n1, ...]', where n0, n1, ... are the numeric identifiers of the GPUs you wish to use. You may find these identifiers using the command:\n", "\n", "    osr.print_gpu()\n", "\n", "Nearly all functions in oscars.sr accept an argument like 'nthreads=123', where 123 is the number of threads you wish to use for a calculation.\n", "\n", "At the moment the GPU has higher precedence than threads. This means that if you attempt to use both, the GPU will be enabled without multi-threading.\n", "\n", "You MAY use the GPU or multi-threading with MPI; however, take care that the distribution of resources makes sense.\n", "\n", "You can also set the gpu and nthreads global flags anywhere in your code if you like:\n", "\n", "    osr.set_gpu_global(1)\n", " \n", "    osr.set_nthreads_global(123)\n", " \n", "You can check if you have a GPU that OSCARS can see. 
This will return the number of GPUs that OSCARS can see, or -1 if your version was not compiled with GPU support.\n", "\n", "    osr.check_gpu()\n", "\n", " \n", "## When is the GPU or multi-threading useful?\n", "\n", "Always. Even if you are calculating a single-particle spectrum, the points in the spectrum are handed to different threads (on the CPU or GPU). For a 2D or 3D flux or power density calculation, the different points are distributed in the same way. There is some overhead in copying data over to the GPU, but this is almost always outweighed by the GPU performance as compared to a typical workstation.\n", "\n", "## Problems running on the GPU?\n", "\n", "In order for OSCARS to use your GPU, the driver must be correctly installed for your operating system. At the moment the card must also be an NVIDIA CUDA-compatible GPU (quite common)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# matplotlib plots inline\n", "%matplotlib inline\n", "\n", "# Import the OSCARS SR module\n", "import oscars.sr\n", "\n", "# Import OSCARS plots (matplotlib)\n", "from oscars.plots_mpl import *" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Create a new OSCARS object. Default to 8 threads and always use the GPU if available\n", "osr = oscars.sr.sr(nthreads=8, gpu=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Will return the number of GPUs available, or -1 and print an error\n", "# If you built OSCARS yourself with setup.py it will likely not have GPU support\n", "# built in. 
The binary versions available for download all have it built in\n", "\n", "osr.check_gpu()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Print gpu information\n", "osr.print_gpu()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# For these examples we will make use of a simple undulator field\n", "osr.add_bfield_undulator(bfield=[0, 1, 0], period=[0, 0, 0.042], nperiods=31)\n", "\n", "# Plot the field\n", "plot_bfield(osr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Beam\n", "\n", "Add a basic beam somewhat like NSLS2. Everything below also works for multi-particle simulations." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Add a basic electron beam with zero emittance\n", "osr.set_particle_beam(energy_GeV=3, x0=[0, 0, -1], current=0.500)\n", "\n", "# You MUST set the start and stop time for the calculation\n", "osr.set_ctstartstop(0, 2)\n", "\n", "# Plot trajectory of beam\n", "osr.set_new_particle()\n", "plot_trajectory_position(osr.calculate_trajectory())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Spectrum\n", "\n", "If you set the global settings for gpu or nthreads, you do not need to specify them in each function call. If you do specify them in a call, they override the global settings." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Use multi-threading\n", "spectrum = osr.calculate_spectrum(obs=[0, 0, 30], energy_range_eV=[100, 800], npoints=500, nthreads=8)\n", "plot_spectrum(spectrum)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Use the GPU\n", "spectrum = osr.calculate_spectrum(obs=[0, 0, 30], energy_range_eV=[100, 800], npoints=500, gpu=1)\n", "plot_spectrum(spectrum)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Use a specific number of GPUs, in this case 3 GPUs (if available)\n", "spectrum = osr.calculate_spectrum(obs=[0, 0, 30], energy_range_eV=[100, 800], npoints=500, ngpu=3)\n", "plot_spectrum(spectrum)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Use specific GPUs, in this case the GPUs numbered 0, 2, and 3\n", "spectrum = osr.calculate_spectrum(obs=[0, 0, 30], energy_range_eV=[100, 800], npoints=500, ngpu=[0, 2, 3])\n", "plot_spectrum(spectrum)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other Calculations\n", "\n", "Thread and GPU specifications for flux and power density calculations are exactly the same as above." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 1 }