================ Performance Tips ================ Charm4py will help you parallelize and scale your applications, but it won't make the sequential parts of your code faster. For this, there are several technologies that accelerate Python code, like NumPy_, Numba_, and Cython_. These are outside the scope of this section, but we highly recommended using Numba. We have found that using Charm4py + Numba, it is possible to build parallel applications entirely in Python that have the same or similar performance as the equivalent C++ application. Many examples in our source code repository use Numba. This section contains tips to help maximize the performance of your applications by reducing runtime overhead. Overhead becomes apparent at very low method or task granularity and high communication frequency. Therefore, whether these tips actually help depends on the nature of your application and the impact overhead has on it. Also keep in mind that there are other factors besides overhead that can affect performance, and are outside the scope of this section. .. note:: Method granularity refers to the time for a chare's remote method to run or, in the case of coroutines, the time the method runs before it suspends and control is switched to a different task. - For best inter-process communication *on the same host*, an efficient network layer is highly recommended. For example, OpenMPI uses shared memory for inter-process communication and is much faster than Charm++'s TCP communication layer. On supercomputers, you should build Charm++ choosing a network layer that is optimized for the system interconnect. The Charm4py version distributed via pip uses TCP. You have to build Charm++ to use a different network layer (see :doc:`install`). .. - Coroutines are very lightweight, but do add a tiny bit of overhead. For .. very small methods that do a negligible amount of work but are called frequently, .. you might want to consider avoiding use of coroutines (rely just on message .. passing and method invocation). - If you are sending large arrays of data, use Numpy arrays (or arrays from Python's ``array`` package) and send each as a separate parameter. This allows Charm4py to directly copy the contents of the arrays to a message that is sent through the network (thus bypassing pickling/serialization libraries). For example: ``proxy.method(array1, array2, array3)``. In the case of updateGlobals, have each array be an element of the dict, for example: ``charm.thisProxy.updateGlobals({'array1': array1, 'array2': array2, ...})`` With channels, do the following: ``ch.send(array1, array2, ...)`` Note that these types of arguments can be freely intermixed with others not supporting the buffer protocol. - If you are frequently indexing a proxy (for example ``myproxy[3]``) it is more efficient to store the proxy to the individual element and reuse it, for example:: elem_proxy = myproxy[3] for _ in range(100): elem_proxy.work(...) - When calling remote methods, it is generally more efficient to use unnamed arguments. - Avoiding ``awaitable=True`` and ``ret=True`` in the critical path can reduce overhead in some cases. Internally, awaitable calls require creating a future and sending it as part of your remote method call. It should always be possible to rewrite code so that notification of completion or results are sent via a separate and explicit method invocation, although this can tend to result in less readable code. - Make sure profiling is disabled (it is disabled by default). Charm4py prints a warning at startup if it is enabled. - Charm4py accesses the Charm++ shared library using Cython. Previous support for ctypes and cffi has been removed. .. _numpy: https://www.numpy.org/ .. _Numba: https://numba.pydata.org/ .. _Cython: https://cython.org/