<h1>Who actually knows Python?</h1>
<p><em>Arvind Ravichandran, 2017-04-23</em></p>
<p>I realised how relevant this question really was only recently, close to five years into actively using and coding in Python. What is “knowing”? When absorbed in solving a given problem, it is very tempting to ignore rules of thumb for writing good code. And there it is again: what is “good code”? Veteran Python developers use the word “pythonic” almost interchangeably with good Python code. As I try to put a finger on what is actually pythonic, I run the risk of the same issue that St. Augustine faced when trying to define time: “If no one ask of me, I know; if I wish to explain to him who asks, I know not.” Nevertheless, let me try and bumble my way through this relatively vague notion, because these seemingly subjective terms, “knowing Python” and writing “pythonic” code, can in fact carry a tangible and important meaning.</p>
<p>There are multiple ways to solve any problem in Python. None of them are necessarily wrong, as long as you get the correct answer. But there are only a small number of ways that make your code readable, concise, and fast. This is what makes code pythonic. Pythonic code makes Python beautiful. In that brief moment, you read it and go, “woah… that’s pretty cute.” And, in my opinion, it is only in this context that one can truly say that one “knows” Python. It is because of this peculiar quality of Python that I think it is worth mastering this versatile language. When you actually know Python, you can write readable, faster code, faster. This topic was covered in <a href="https://blog.startifact.com/posts/older/what-is-pythonic.html">this</a> blog post 12 years ago, and remains relevant to a large extent today.</p>
<p>The person who made me such a proponent of the language, Ryan Bradley, once said, “If you know what you’re doing, you can write Python code which is as fast as your C code.” I can’t say that I have managed to achieve this all the time. But I can certainly see many instances where properly written Python scripts are faster than C programs, if you include the actual writing of the code in the process.</p>
<p>For a terribly long time, I’m embarrassed to admit, I had been oblivious to the Python faux pas that I had been committing. I became acutely aware of my inefficiencies largely after watching this video by Raymond Hettinger.</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/OSGv2VnC0go" frameborder="0" allowfullscreen=""></iframe>
<p>After the video, I left Python 2.7 for Python 3, and made it a point to consciously write pythonic code. Here is a collection of tricks that I have collated over this period (particularly in the context of scientific computing).</p>
<p>Over the years of coding, I’ve come to learn that optimising Python code comes in clear, discrete steps. We must always begin by writing a program that gives correct results. Then we check where the program is slow by profiling, and, with experience, simply by spotting the sloppy parts. At every step of the optimisation process, you can check your results against the version that you are sure is correct. In general, one can follow <a href="https://wiki.python.org/moin/PythonSpeed/PerformanceTips">these steps</a>:</p>
<ol>
<li>Get it right</li>
<li>Test it’s right</li>
<li>Profile if slow</li>
<li>Optimise</li>
<li>Repeat from 2</li>
</ol>
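<p>Steps 2 and 3 can be sketched with nothing but the standard library. Below is a minimal illustration, with made-up functions, of testing that two candidate implementations agree and then timing them with <code class="language-plaintext highlighter-rouge">timeit</code>:</p>

```python
import timeit

# Two correct implementations of the same task (illustrative only)
def squares_loop(n):
    out = []
    for i in range(n):
        out.append(i * i)
    return out

def squares_comprehension(n):
    return [i * i for i in range(n)]

# Step 2: test it's right -- both versions must agree
assert squares_loop(1000) == squares_comprehension(1000)

# Step 3: profile if slow -- time each candidate
t_loop = timeit.timeit(lambda: squares_loop(1000), number=1000)
t_comp = timeit.timeit(lambda: squares_comprehension(1000), number=1000)
print(f"loop: {t_loop:.4f}s  comprehension: {t_comp:.4f}s")
```

<p>For larger programs, <code class="language-plaintext highlighter-rouge">python -m cProfile script.py</code> reports where the time is actually being spent.</p>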
<p>Certain optimizations amount to good programming style and so should be learned as you learn the language. An example would be moving the calculation of values that don’t change within a loop, outside of the loop. We will see instances of this shortly.</p>
<h2 id="avoiding-for-loops">Avoiding for loops</h2>
<p>I’ve come to learn that there is an incredible number of tools that you can use to avoid clunky, slow for loops in Python. The more you use these tricks, the faster and more readable your code becomes.</p>
<p>Let’s say we have an array of numbers between 0 and 100, in an arbitrary order. We want to simply pick out those values which are between 0 and 50, and populate a new array. In C, this looks like the following:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// "arr" contains these numbers</span>
<span class="c1">// we want to populate the array, "pick" with numbers between 0 - 50</span>
<span class="n">ct</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="c1">// counter for pick</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="n">i</span><span class="o"><</span><span class="n">list_length</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">){</span>
<span class="k">if</span> <span class="n">arr</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o"><</span><span class="mi">50</span><span class="p">{</span>
<span class="n">pick</span><span class="p">[</span><span class="n">ct</span><span class="p">]</span><span class="o">=</span><span class="n">arr</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">ct</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The direct equivalent for this in Python would be this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">list_length</span><span class="p">):</span>
<span class="k">if</span> <span class="n">arr</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o"><</span><span class="mi">50</span><span class="p">:</span>
<span class="n">pick</span><span class="p">[</span><span class="n">ct</span><span class="p">]</span> <span class="o">=</span> <span class="n">arr</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="n">ct</span><span class="o">++</span>
</code></pre></div></div>
<p>Although this works, it wouldn’t be considered pythonic. Firstly, we have this ugly array “pick”, which needs to be defined beforehand, and whose size is unknown in advance. Even if we made it a list, to which we append these numbers as they arrive, we would have to convert it back to an array to operate on it. We can improve this substantially, first by removing the for loop and making the if condition an array index (assuming <em>arr</em> is a numpy array).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pick</span> <span class="o">=</span> <span class="n">arr</span><span class="p">[</span><span class="n">arr</span><span class="o"><</span><span class="mi">50</span><span class="p">]</span>
</code></pre></div></div>
<p>This reads: all values in <em>arr</em> which are less than 50 are now <em>pick</em>. Cute, fast, readable, pythonic! This method of selecting values based on a condition is rather versatile. We can combine conditions and place them inside the array index too. For instance, if we want values greater than 25 and smaller than 50, we write:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pick</span> <span class="o">=</span> <span class="n">arr</span><span class="p">[(</span><span class="n">arr</span><span class="o">></span><span class="mi">25</span><span class="p">)</span> <span class="o">&</span> <span class="p">(</span><span class="n">arr</span><span class="o"><</span><span class="mi">50</span><span class="p">)]</span>
</code></pre></div></div>
<p>This trick becomes incredibly powerful when you have two arrays, <em>arr1</em> and <em>arr2</em>, describing the same set, and hence of the same size. If you want to select values in <em>arr1</em> based on a criterion applied to <em>arr2</em>, we can use the same method as earlier. For instance, suppose we have particles with <em>x</em> and <em>y</em> values in two different arrays, <em>arrx</em> and <em>arry</em>, respectively. You want the <em>x</em> axis values of those particles that have <em>y</em> values greater than 75:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pick</span> <span class="o">=</span> <span class="n">arrx</span><span class="p">[</span><span class="n">arry</span><span class="o">></span><span class="mi">75</span><span class="p">]</span>
</code></pre></div></div>
<p>No for loops!</p>
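<p>Putting the snippets above together into one runnable script (the array contents here are arbitrary values chosen for illustration):</p>

```python
import numpy as np

arr = np.array([3, 97, 42, 61, 12, 50, 26])

# All values less than 50
pick = arr[arr < 50]
print(pick)              # [ 3 42 12 26]

# Values greater than 25 and less than 50
pick2 = arr[(arr > 25) & (arr < 50)]
print(pick2)             # [42 26]

# x values of particles whose y value is greater than 75
arrx = np.array([1.0, 2.0, 3.0, 4.0])
arry = np.array([80.0, 10.0, 90.0, 75.0])
print(arrx[arry > 75])   # [1. 3.]
```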
<!-- ## List comprehensions -->
<!-- ## Generators -->

<h1>Distance Calculation</h1>
<p><em>Arvind Ravichandran, 2017-04-22</em></p>
<p>Time complexity of an algorithm quantifies the amount of time taken for a calculation as a function of the size of its input. Naively speaking, most molecular dynamics algorithms are of quadratic time complexity, or O(n<sup>2</sup>) problems. This means that as the number of entities or particles, <em>n</em>, in the system increases, the time taken to complete the calculations increases quadratically.</p>
<p>Molecular dynamics algorithms are O(n<sup>2</sup>) because computing distances between particles, when dealt with naively, is of quadratic time complexity. This is also the bottleneck in most algorithms. By way of neighbour lists and cell lists, it is possible to bring this close to linear time, but for the purpose of this post, let’s look at tricks to optimise the simple approach using Python.</p>
<p>This problem can be most efficiently solved by simply using the <code class="language-plaintext highlighter-rouge">scipy.spatial.distance.pdist</code> function. But this post will help in understanding how to approach this <em>type</em> of problem using Python. For instance, the issue at hand might not always be computing the distance between particles. The problem could be computing the dot product of the orientations of all pairs of particles. With this in mind, let us begin!</p>
<h2 id="particles-in-open-boundaries">Particles in open boundaries</h2>
<p>Imagine that you have a system with open boundaries in two dimensions, with <em>N=100</em> particles.</p>
<figure>
<center>
<a href="/sandbox/particles.png"><img src="/sandbox/particles.png" alt="image" /></a>
<figcaption>Randomly generated positions of 100 particles. </figcaption>
</center>
</figure>
<p>Our goal is to compute the forces (it doesn’t matter what sort of force) between all these particles at a given time step. With the forces, we can calculate their accelerations, from which we obtain velocities and then their displacements. The displacements will tell us their positions in the next time step. We use this to compute the forces and the cycle goes on, for some period of time.</p>
<p>Assuming that these forces are functions of distances, we need to compute the distance between all ordered pairs (<em>permutations</em>) of particles. We can construct a matrix of distances, which will look like this:</p>
<p>\[
\begin{pmatrix}
d_{1,1} & d_{1,2} & \cdots & d_{1,N} \\
d_{2,1} & d_{2,2} & \cdots & d_{2,N} \\
\vdots & \vdots & \ddots & \vdots \\
d_{N,1} & d_{N,2} & \cdots & d_{N,N}
\end{pmatrix}
\]</p>
<p>We must generate the initial positions of 100 particles, and construct the distance matrix:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="n">L</span> <span class="o">=</span> <span class="mi">100</span> <span class="c1"># simulation box dimension
</span><span class="n">N</span> <span class="o">=</span> <span class="mi">100</span> <span class="c1"># Number of particles
</span><span class="n">dim</span> <span class="o">=</span> <span class="mi">2</span> <span class="c1"># Dimensions
</span>
<span class="c1"># Generate random positions of particles
</span><span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">random</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">N</span><span class="p">,</span><span class="n">dim</span><span class="p">))</span><span class="o">-</span><span class="mf">0.5</span><span class="p">)</span><span class="o">*</span><span class="n">L</span>
<span class="c1"># Compute distance matrix
</span><span class="n">D</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">N</span><span class="p">,</span><span class="n">N</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="nb">float</span><span class="p">)</span>
<span class="c1"># This is the N-squared operation
</span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
<span class="n">dr</span> <span class="o">=</span> <span class="n">r</span><span class="p">[</span><span class="n">j</span><span class="p">]</span><span class="o">-</span><span class="n">r</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="c1"># difference between 2 positions
</span> <span class="n">D</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="n">dr</span><span class="o">*</span><span class="n">dr</span><span class="p">))</span> <span class="c1"># calculate distance and store
</span>
<span class="k">print</span><span class="p">(</span><span class="n">D</span><span class="p">[:</span><span class="mi">3</span><span class="p">,:</span><span class="mi">3</span><span class="p">])</span> <span class="c1"># print small section of the matrix
</span>
<span class="o">>>></span> <span class="p">[[</span> <span class="mf">0.</span> <span class="mf">14.47476649</span> <span class="mf">68.4285819</span> <span class="p">]</span>
<span class="p">[</span> <span class="mf">14.47476649</span> <span class="mf">0.</span> <span class="mf">53.99224333</span><span class="p">]</span>
<span class="p">[</span> <span class="mf">68.4285819</span> <span class="mf">53.99224333</span> <span class="mf">0.</span> <span class="p">]]</span>
</code></pre></div></div>
<h4 id="upper-triangular-distance-matrix">Upper Triangular Distance Matrix</h4>
<p>For 100 particles, this algorithm makes 10,000 distance calculations. We can do better. Firstly, notice that the diagonal values are zero: the distance of a particle from itself is obviously zero. Secondly, the matrix is symmetric, <em>i.e.</em> reversing the order of the indices does not affect the calculated distance. So we are computing distances between particle pairs twice, when we can get away with half the number of calculations. All we need to do is compute the <em>upper triangular</em> part of <em>D</em>. The N-squared operation will then become:</p>
<h5 id="code-1">Code 1</h5>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># This is the N-squared operation
</span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">N</span><span class="p">):</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span><span class="n">N</span><span class="p">):</span> <span class="c1"># j>i (second index is always greater than first)
</span> <span class="n">dr</span> <span class="o">=</span> <span class="n">r</span><span class="p">[</span><span class="n">j</span><span class="p">]</span><span class="o">-</span><span class="n">r</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="c1"># difference between 2 positions
</span> <span class="n">D</span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="nb">sum</span><span class="p">(</span><span class="n">dr</span><span class="o">*</span><span class="n">dr</span><span class="p">))</span> <span class="c1"># calculate distance and store
</span></code></pre></div></div>
<p>We have now decreased the number of calculations to 4,950. In terms of the algorithm, there isn’t much more that we can do. However, we can do a lot better if we learn some pythonic tricks.</p>
<ul>
<li>Call expensive numpy functions, such as <code class="language-plaintext highlighter-rouge">np.sqrt</code>, as few times as possible. For instance, the result will be identical if we take the square root of the whole array of squared distances at the end, rather than pair by pair.</li>
<li>Avoid explicit for loops when possible. They are usually rather slow in Python. If you find a way to offload looping duties to numpy implicitly, you will generally get much more readable and faster code. When the body of a loop is simple, as it is here, interpreting the for loop itself contributes substantially to the overhead.</li>
<li>Reduce data access. So far, we created an array of zeros and updated its values by accessing them one element at a time. This is inefficient.</li>
</ul>
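<p>The first point is easy to verify: a single vectorised <code class="language-plaintext highlighter-rouge">np.sqrt</code> over an array of squared distances gives exactly the same result as a square root taken per pair (the numbers below are made up):</p>

```python
import numpy as np

dr2 = np.array([4.0, 9.0, 25.0])  # squared distances for three pairs

# One np.sqrt call per pair, as in the loop of Code 1
per_pair = np.array([np.sqrt(v) for v in dr2])

# A single np.sqrt call over the whole array
vectorised = np.sqrt(dr2)

assert np.allclose(per_pair, vectorised)
print(vectorised)  # [2. 3. 5.]
```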
<p>With these three things in mind, let’s do better! For this we need to learn a neat numpy function called <code class="language-plaintext highlighter-rouge">np.triu_indices</code>. This gives the indices of the upper triangular matrix, <em>i.e.</em> exactly the same indices that we were generating with the range functions in the for loops.</p>
<h5 id="code-2">Code 2</h5>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="n">L</span> <span class="o">=</span> <span class="mi">100</span> <span class="c1"># simulation box dimension
</span><span class="n">N</span> <span class="o">=</span> <span class="mi">100</span> <span class="c1"># Number of particles
</span><span class="n">dim</span> <span class="o">=</span> <span class="mi">2</span> <span class="c1"># Dimensions
</span>
<span class="c1"># Generate random positions of particles
</span><span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">random</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">N</span><span class="p">,</span><span class="n">dim</span><span class="p">))</span><span class="o">-</span><span class="mf">0.5</span><span class="p">)</span><span class="o">*</span><span class="n">L</span>
<span class="c1"># uti is a list of two (1-D) numpy arrays
# containing the indices of the upper triangular matrix
</span><span class="n">uti</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">triu_indices</span><span class="p">(</span><span class="n">N</span><span class="p">,</span><span class="n">k</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># k=1 eliminates diagonal indices
</span>
<span class="c1"># uti[0] is i, and uti[1] is j from the previous example
</span><span class="n">dr</span> <span class="o">=</span> <span class="n">r</span><span class="p">[</span><span class="n">uti</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span> <span class="o">-</span> <span class="n">r</span><span class="p">[</span><span class="n">uti</span><span class="p">[</span><span class="mi">1</span><span class="p">]]</span> <span class="c1"># computes differences between particle positions
</span><span class="n">D</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">dr</span><span class="o">*</span><span class="n">dr</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">))</span> <span class="c1"># computes distances; D is a 4950 x 1 np array
</span></code></pre></div></div>
<p>We have done in three lines what we previously achieved in five, and we have no more for loops. Timing them on my computer shows the gulf in performance between the two approaches for a 1000-particle system: Code 1 takes around 3.4 seconds, and Code 2 takes around 0.0005 seconds. Perhaps you noticed that I cheated a little by not generating an <em>NxN</em> matrix, as in Code 1. But even recasting the values into an <em>NxN</em> matrix using <code class="language-plaintext highlighter-rouge">scipy.spatial.distance.squareform</code> does not slow the code significantly.</p>
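<p>For reference, recasting the condensed distance vector into the familiar <em>NxN</em> form looks like this (a small sketch with N=3 and hand-picked distances):</p>

```python
import numpy as np
from scipy.spatial.distance import squareform

# Condensed distances for N=3, ordered as pairs (0,1), (0,2), (1,2)
condensed = np.array([14.47, 68.43, 53.99])

# Symmetric NxN matrix with a zero diagonal
D = squareform(condensed)
print(D)
```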
<p>The easiest way to solve this problem, as I mentioned earlier, is to use the <code class="language-plaintext highlighter-rouge">scipy.spatial.distance.pdist</code> function. It’s a one-liner:</p>
<h5 id="code-3">Code 3</h5>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">scipy.spatial.distance</span> <span class="kn">import</span> <span class="n">pdist</span>
<span class="n">L</span> <span class="o">=</span> <span class="mi">100</span> <span class="c1"># simulation box dimension
</span><span class="n">N</span> <span class="o">=</span> <span class="mi">100</span> <span class="c1"># Number of particles
</span><span class="n">dim</span> <span class="o">=</span> <span class="mi">2</span> <span class="c1"># Dimensions
</span>
<span class="c1"># Generate random positions of particles
</span><span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">random</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">N</span><span class="p">,</span><span class="n">dim</span><span class="p">))</span><span class="o">-</span><span class="mf">0.5</span><span class="p">)</span><span class="o">*</span><span class="n">L</span>
<span class="n">D</span> <span class="o">=</span> <span class="n">pdist</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
</code></pre></div></div>
<p>This was, to my surprise, not as fast as Code 2.</p>
<h2 id="particles-within-periodic-boundaries">Particles within periodic boundaries</h2>
<p>In molecular dynamics, we frequently encounter periodic boundaries. Here, a large (effectively infinite) system is approximated by a small part called a unit cell. The unit cell tiles space, such that when an object passes through one side of the unit cell, it re-appears on the opposite side with the same velocity.</p>
<p>A decision must be made, here. Do we</p>
<ul>
<li>“fold” particles into the simulation box when they leave it, or</li>
<li>do we let them go on, and wander out of the unit cell, but compute interactions with the nearest images when necessary?</li>
</ul>
<h3 id="wrapped">Wrapped</h3>
<p>In the first approach, where the positions of particles are wrapped, they are always within the unit cell. Restricting the coordinates to the unit cell is easy. If <em>x</em> is the position of the particle along some arbitrary dimension, and <em>L</em> is the length of the box in that dimension, the approach can be described by the following C++ code:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>if (x < -L * 0.5) x = x + L
if (x >= L * 0.5) x = x - L
</code></pre></div></div>
<p>Naively computing distances between particle pairs will overestimate the separation of pairs where one particle is close to the boundary and the nearest image of its counterpart is lurking in the adjacent cell. Distances and vectors between objects should obey what is known as the <em>minimum image criterion</em>. We can calculate the <em>minimum image distance</em> in the following manner:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dx = x[j] - x[i]
if (dx > L * 0.5) dx = dx - L
if (dx <= -L * 0.5) dx = dx + L
</code></pre></div></div>
<p>Obviously, this needs to be repeated for each dimension that the particles exist in. In Python, the same code looks like this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">L</span> <span class="o">=</span> <span class="mi">100</span> <span class="c1"># simulation box dimension
</span><span class="n">N</span> <span class="o">=</span> <span class="mi">100</span> <span class="c1"># Number of particles
</span><span class="n">dim</span> <span class="o">=</span> <span class="mi">2</span> <span class="c1"># Dimensions
</span>
<span class="c1"># Particles have purposely wandered out of L
</span><span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">random</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">N</span><span class="p">,</span><span class="n">dim</span><span class="p">))</span><span class="o">-</span><span class="mf">0.5</span><span class="p">)</span><span class="o">*</span><span class="mf">1.5</span><span class="o">*</span><span class="n">L</span>
<span class="c1"># Wrapping step
</span><span class="n">r</span><span class="p">[</span><span class="n">r</span> <span class="o"><</span> <span class="o">-</span><span class="n">L</span><span class="o">*</span><span class="mf">0.5</span><span class="p">]</span> <span class="o">+=</span> <span class="n">L</span>
<span class="n">r</span><span class="p">[</span><span class="n">r</span> <span class="o">>=</span> <span class="n">L</span><span class="o">*</span><span class="mf">0.5</span><span class="p">]</span> <span class="o">-=</span> <span class="n">L</span>
<span class="c1"># Distance calculation step
</span><span class="n">uti</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">triu_indices</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="c1"># uti[0] is i, and uti[1] is j from the previous example
</span><span class="n">dr</span> <span class="o">=</span> <span class="n">r</span><span class="p">[</span><span class="n">uti</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span> <span class="o">-</span> <span class="n">r</span><span class="p">[</span><span class="n">uti</span><span class="p">[</span><span class="mi">1</span><span class="p">]]</span> <span class="c1"># computes differences between particle positions
</span>
<span class="c1"># Minimum image convention
</span><span class="n">dr</span><span class="p">[</span><span class="n">r</span> <span class="o">></span> <span class="n">L</span><span class="o">*</span><span class="mf">0.5</span><span class="p">]</span> <span class="o">-=</span> <span class="n">L</span>
<span class="n">dr</span><span class="p">[</span><span class="n">r</span> <span class="o"><=</span> <span class="o">-</span><span class="n">L</span><span class="o">*</span><span class="mf">0.5</span><span class="p">]</span> <span class="o">+=</span> <span class="n">L</span>
<span class="n">D</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">dr</span><span class="o">*</span><span class="n">dr</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">))</span> <span class="c1"># computes distances; D is a 4950 x 1 np array
</span></code></pre></div></div>
<h3 id="unwrapped">Unwrapped</h3>
<p>Given the size of the box, we can also directly compute the distances according to the minimum image convention, <em>without</em> wrapping the particles within the unit cell. This is done here in C++ in the following manner:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>L_r = 1.0 / L; // L is the box length
dx = x[j] - x[i]; // Compute distance between particle i and j
dx -= L * nearbyint(dx * L_r); // shift by the nearest whole number of box lengths
</code></pre></div></div>
<p>In Python:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">r</span> <span class="o">=</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">random</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">N</span><span class="p">,</span><span class="n">dim</span><span class="p">))</span><span class="o">-</span><span class="mf">0.5</span><span class="p">)</span><span class="o">*</span><span class="mf">1.5</span><span class="o">*</span><span class="n">L</span>
<span class="n">uti</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">triu_indices</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">dr</span> <span class="o">=</span> <span class="n">r</span><span class="p">[</span><span class="n">uti</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span> <span class="o">-</span> <span class="n">r</span><span class="p">[</span><span class="n">uti</span><span class="p">[</span><span class="mi">1</span><span class="p">]]</span>
<span class="c1"># Minimum image distance of unwrapped dr
</span><span class="n">dr</span> <span class="o">-=</span> <span class="n">L</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="nb">round</span><span class="p">(</span><span class="n">dr</span><span class="o">/</span><span class="n">L</span><span class="p">)</span>
</code></pre></div></div>
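<p>A quick numeric check of this one-liner, with hand-picked separations and L = 100: a raw difference of 80 should map to the nearer image at -20, a difference of 30 should be left alone, and -60 should map to 40.</p>

```python
import numpy as np

L = 100.0
dr = np.array([80.0, 30.0, -60.0])

# Minimum image: subtract the nearest whole number of box lengths
dr -= L * np.round(dr / L)
print(dr)  # [-20.  30.  40.]
```

<p>Note that <code class="language-plaintext highlighter-rouge">np.round</code> rounds halves to even, which matches C’s <code class="language-plaintext highlighter-rouge">nearbyint</code> in its default rounding mode, so a separation of exactly L/2 is treated consistently by both versions.</p>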
<h2 id="bottom-line">Bottom line</h2>
<p>Even though Python is a high-level language, we can get a lot of traction in molecular dynamics, even for a relatively large number of particles. I prefer to perform my analysis in Python, because the code is readable and easy to debug in the future. The N-squared time complexity of this algorithm still remains, but its cost is greatly alleviated by more pythonic coding.</p>