In the realm of programming, Python stands out for its emphasis on code readability and simplicity, traits that have cemented its popularity across diverse fields, from web development to data science. However, despite its many virtues, Python is often critiqued for its looping performance. This is especially notable when not utilizing built-in functions or leveraging the specialized capabilities of external libraries.
In this article, we will explore the performance of pure Python code as well as Python’s built-in looping constructs and compare them to the performance of the numpy library, which is widely used for numerical computing. In the process, we will also discuss what really matters when it comes to performance and how to make the most of Python’s strengths.
Python is praised for its simplicity and vast library support, appealing to both new and experienced developers for tasks ranging from web development to data analysis. Yet, its execution speed, especially in loops, faces criticism.
Loops are essential for tasks like data processing and automation, allowing repeated execution of code blocks. In Python, they’re key for iterating over data structures and condition-based operations.
Python’s loop performance issues are due to its dynamic, interpreted nature, leading to slower execution than compiled languages such as C or Java. This flexibility and ease of use come with overhead affecting loop speed, particularly with large data or complex operations.
Python’s dynamic typing and lack of low-level memory and data structure control can also slow loop execution. While this enhances ease of writing and reading code, it adds runtime overhead.
Despite these drawbacks, Python’s efficiency can be improved using built-in functions and high-performance libraries like NumPy and Pandas, which help bypass the slowness of pure Python loops.
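To make the interpreter overhead described above a little more concrete, the standard library's dis module can list the bytecode instructions behind a simple loop. This is only an illustrative sketch: add_up is a hypothetical helper, not part of the benchmarks that follow, and the exact instruction names differ between Python versions.

```python
import dis


def add_up(n):
    """Hypothetical helper: sum the integers 0..n-1 with a plain for loop."""
    total = 0
    for i in range(n):
        total += i
    return total


# Each instruction listed here is dispatched by the interpreter at runtime.
# The ones that belong to the loop body run again on every iteration, and the
# addition has to look up the operand types each time because of dynamic typing.
instructions = list(dis.get_instructions(add_up))
for instruction in instructions:
    print(instruction.opname, instruction.argrepr)
```

A compiled language would typically turn the same loop into a handful of machine instructions, which is where much of the speed gap comes from.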
Benchmarks
In our exploration of Python’s performance, particularly focusing on loop efficiency, we introduce a simplified yet powerful tool: the Benchmark class. This utility is designed to measure and report the execution time of code snippets, offering insights into their performance characteristics. The Benchmark class is particularly useful for identifying potential bottlenecks and understanding the impact of various coding approaches on execution speed. The code was adapted from here.
from timeit import default_timer as timer

class Benchmark:
    def __init__(self, message="Elapsed time", format="%0.3g"):
        self.message = message
        self.format = format
        self.start = 0.0

    def __enter__(self):
        self.start = timer()
        return self

    def __exit__(self, exc_type, exc_val, traceback):
        end = timer()
        self.time = end - self.start
        print(f"{self.message} {self.format % self.time} seconds")
        # Return False to allow any exceptions to propagate
        return False
How to Use the Benchmark Class
Using the Benchmark class is straightforward. It is implemented as a context manager, enabling it to be easily integrated with Python’s with statement for clean and concise syntax. Here’s a step-by-step guide on how to use it:
- Initialization: First, create an instance of the Benchmark class. You can customize its behavior through optional parameters, such as message to describe the benchmark and format to specify the output format of the execution time.
- Execution Block: Encapsulate the code you wish to benchmark within a with block. By doing so, the Benchmark class automatically starts timing when the block is entered and stops timing when the block is exited.
- Results: Upon exiting the with block, the Benchmark class calculates the elapsed time, formats it according to the specified format parameter, and prints the result. This gives you immediate feedback on the execution time of the encapsulated code snippet.
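The three steps above can be sketched as follows. The class is repeated in condensed form so the snippet runs on its own. The message text, the "%0.4f" format, and the squares workload are illustrative assumptions, not part of the benchmarks in this article.

```python
from timeit import default_timer as timer


class Benchmark:
    """Same class as above, repeated so this snippet runs on its own."""

    def __init__(self, message="Elapsed time", format="%0.3g"):
        self.message = message
        self.format = format
        self.start = 0.0

    def __enter__(self):
        self.start = timer()
        return self

    def __exit__(self, exc_type, exc_val, traceback):
        self.time = timer() - self.start
        print(f"{self.message} {self.format % self.time} seconds")
        return False


# Steps 1 and 2: create the instance with a custom message and format,
# and wrap the code to be timed in the with block.
with Benchmark(message="Building squares took", format="%0.4f") as b_squares:
    squares = [x * x for x in range(100_000)]

# Step 3: the message is printed on exit, and the raw value stays
# available on the instance for later comparisons.
print(f"Stored value: {b_squares.time:.6f} seconds")
```

Keeping the instance around through the as clause is what makes the speed-up ratios later in the article possible.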
Pure Python for Loop Benchmark
For our initial benchmark, we’ll check the performance of a simple for loop in Python. We’ll use the Benchmark class to measure the execution time of a loop that iterates over a large range of numbers and computes the sum of the elements. This benchmark will serve as a baseline.
LARGE_NUMBER = 100_000_001

with Benchmark() as b_pure_python_for_loop:
    total = 0
    for i in range(1, LARGE_NUMBER):
        total += i
    print(f"{total= }")

total= 5000000050000000
Elapsed time 14.8 seconds
The time on your machine may vary, but it should be on the order of seconds.
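Because a single timing can be affected by whatever else the machine is doing, repeating the measurement and keeping the best run usually gives a steadier figure. The sketch below uses the standard library's timeit.repeat for that. The smaller N and the loop_sum helper are assumptions chosen so the example finishes quickly, and they are not the values used in this article's benchmarks.

```python
import timeit

N = 200_000  # much smaller than LARGE_NUMBER so the example runs quickly


def loop_sum():
    """Sum 1..N-1 with a plain for loop, like the baseline benchmark."""
    total = 0
    for i in range(1, N):
        total += i
    return total


# Run the function 5 separate times, once per measurement, and keep the
# fastest run as the estimate least disturbed by other processes.
timings = timeit.repeat(loop_sum, repeat=5, number=1)
best = min(timings)
print(f"best of {len(timings)} runs: {best:.4g} seconds")
```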
Built-in Functions Benchmark
As seen in the previous benchmark, Python’s for loops can be slow when dealing with large datasets. However, Python provides a rich set of built-in functions that can significantly improve the performance of common operations. In this benchmark, we’ll compare the execution time of the for loop with that of the built-in sum() function, which is optimized for speed.
with Benchmark() as b_python_builtin_sum:
    total = sum(range(1, LARGE_NUMBER))
    print(f"{total= }")

total= 5000000050000000
Elapsed time 3.06 seconds
# compare how many times faster the built-in sum() function is
print(f"sum() is {b_pure_python_for_loop.time / b_python_builtin_sum.time:.2f} times faster than a pure Python for loop")
sum() is 4.84 times faster than a pure Python for loop
This single line of code is significantly faster than the previous for loop, as it leverages the optimized implementation of the sum() function, which is implemented in C and designed for efficient computation.
NumPy Benchmark
The NumPy library is a cornerstone of high-performance computing in Python, particularly for numerical operations. It provides a powerful array object and a collection of functions for manipulating and processing large datasets. In this benchmark, we’ll compare the performance of a for loop with that of NumPy’s sum() function, which is designed for efficient numerical computation.
import numpy as np

with Benchmark() as b_numpy_sum:
    total = np.sum(np.arange(1, LARGE_NUMBER))
    print(f"{total= }")

total= 5000000050000000
Elapsed time 0.754 seconds
# compare how many times faster the NumPy sum() function is than the built-in sum() function and the pure Python for loop
print(f"NumPy sum() is {b_python_builtin_sum.time / b_numpy_sum.time:.2f} times faster than the built-in sum() function")
print(f"NumPy sum() is {b_pure_python_for_loop.time / b_numpy_sum.time:.2f} times faster than a pure Python for loop")
NumPy sum() is 4.05 times faster than the built-in sum() function
NumPy sum() is 19.62 times faster than a pure Python for loop
As expected, the NumPy implementation is significantly faster than both the pure Python for loop and the built-in sum() function, showcasing the library’s ability to accelerate numerical operations through optimized C-based functions.
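One portability caveat, offered as a hedged aside: NumPy integers have a fixed width, and on some setups (for example Windows with NumPy versions before 2.0) the default integer type is 32-bit, which a total of this size would overflow. A minimal sketch, assuming a smaller n for speed, that requests 64-bit integers explicitly:

```python
import numpy as np

n = 1_000_001  # smaller than LARGE_NUMBER so the example runs quickly

# Ask for 64-bit integers explicitly instead of relying on the platform default,
# both for the array elements and for the accumulator used by np.sum.
values = np.arange(1, n, dtype=np.int64)
total = int(np.sum(values, dtype=np.int64))

print(f"{total= }")
```

Python's own integers grow as needed, so the pure Python loop and the built-in sum() are not affected by this.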
“Math knowledge” benchmark
It is good to remember that sometimes the best way to optimize code is to use math knowledge. In this case, we can use the formula for the sum of an arithmetic progression to calculate the sum of the elements in the range without the need for a loop. This approach is expected to be much faster than the previous benchmarks, as it avoids the overhead of iteration and directly computes the result using a mathematical formula.
def sum_formula(n):
    return n * (n + 1) // 2

with Benchmark() as b_math:
    # range(1, LARGE_NUMBER) ends at LARGE_NUMBER - 1, so that is the n to use
    total = sum_formula(LARGE_NUMBER - 1)
    print(f"{total= }")

total= 5000000050000000
Elapsed time 7.29e-05 seconds
# compare how many times faster the math formula is than the other methods
print(f"math formula is {b_pure_python_for_loop.time / b_math.time:.2f} times faster than a pure Python for loop")
print(f"math formula is {b_python_builtin_sum.time / b_math.time:.2f} times faster than the built-in sum() function")
print(f"math formula is {b_numpy_sum.time / b_math.time:.2f} times faster than the NumPy sum() function")
math formula is 202894.86 times faster than a pure Python for loop
math formula is 41931.33 times faster than the built-in sum() function
math formula is 10342.00 times faster than the NumPy sum() function
This benchmark demonstrates the power of mathematical insight in optimizing code, showcasing a dramatic improvement in performance compared to the previous implementations.
That’s why math is so important in programming. It can help us to solve problems in a more efficient way.
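A closed-form shortcut is only useful if it computes the same thing as the loop it replaces, so it is worth checking it against the slow version on small inputs. The sketch below does that for the formula n(n + 1)/2. Note that range(1, LARGE_NUMBER) in the earlier benchmarks stops at LARGE_NUMBER - 1, so the matching n is one less than the upper bound of the range. The test sizes are arbitrary choices for illustration.

```python
def sum_formula(n):
    """Sum of the integers 1..n using the arithmetic progression formula."""
    return n * (n + 1) // 2


# Compare the formula with the built-in sum() on a few small sizes.
# range(1, n + 1) covers 1..n inclusive, which is what the formula expects.
for n in (1, 10, 1_000, 100_000):
    assert sum_formula(n) == sum(range(1, n + 1))

# A range written as range(1, upper) ends at upper - 1.
upper = 1_001
assert sum_formula(upper - 1) == sum(range(1, upper))
print("formula matches the loop-based sums")
```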
Working with large datasets
Since this is a chemistry website, let’s look at a chemistry example. Say we have a dataset with the pH values of roughly 100,000,000 samples, and we want to count how many of them fall within the normal blood pH range (7.35 to 7.45) and calculate the mean of those that do. We can use the Benchmark class to compare the performance of different approaches to this problem.
First, we’ll use a pure Python for loop to iterate over the dataset and count the samples within the blood pH range. Then, we’ll calculate the mean of these samples. Next, we’ll leverage NumPy’s array operations to perform the same tasks.
rng = np.random.default_rng(seed=42)
# Generate LARGE_NUMBER floats between 1 and 14
data = rng.uniform(1, 14, LARGE_NUMBER)

# use a for loop to count the number of values between 7.35 and 7.45 (inclusive)
# and calculate the mean of those values
with Benchmark() as b_pH_python_for_loop:
    count = 0
    total = 0
    for value in data:
        if 7.35 <= value <= 7.45:
            count += 1
            total += value
    print(f"{count= }, {total= }")
    print(f"Mean: {total / count}")

count= 770155, total= 5699198.662586555
Mean: 7.400067080764982
Elapsed time 16.3 seconds
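# now use NumPy to do the same thing
with Benchmark() as b_pH_numpy:
    filtered = data[(7.35 <= data) & (data <= 7.45)]
    mean = filtered.mean()
    # store the result in total rather than sum, so the built-in sum() is not
    # shadowed and the value printed below comes from NumPy; it may differ from
    # the loop's total in the last digits because the summation order differs
    total = filtered.sum()
    count = filtered.size
    print(f"{count= }, {total= }")
    print(f"{mean= }")

count= 770155, total= 5699198.662586555
mean= 7.400067080764624
Elapsed time 0.388 seconds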
# compare how many times faster the NumPy method is
print(f"NumPy method is {b_pH_python_for_loop.time / b_pH_numpy.time:.2f} times faster than a pure Python for loop")
NumPy method is 42.01 times faster than a pure Python for loop
Again, the difference in performance is massive: the NumPy implementation is about 42 times faster than the pure Python for loop. This example illustrates the importance of choosing the right tools and libraries for data-intensive tasks, particularly in scientific computing and data analysis.
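The NumPy version relies on boolean masking, which is easier to follow on a handful of values. The sketch below walks through the same filter step by step. The five pH readings are made-up illustrative values, not data from the benchmark.

```python
import numpy as np

# Five made-up pH readings, two of them outside the blood pH range.
ph = np.array([6.9, 7.35, 7.40, 7.45, 7.8])

# Each comparison produces an array of booleans, and & combines them
# element by element into a single mask.
mask = (ph >= 7.35) & (ph <= 7.45)
print(mask)

# Indexing with the mask keeps only the elements where it is True.
in_range = ph[mask]
print(in_range.size)
print(in_range.mean())
```

The whole comparison and selection runs inside NumPy's compiled code, so no Python-level loop touches the individual values.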
What really matters: the total cost of software development
OK, we have seen that Python’s looping performance can be slow, especially when dealing with large datasets. However, it’s important to remember that performance is just one aspect of software development. In many cases, the benefits of Python’s simplicity, readability, and extensive library support outweigh its performance drawbacks.
Python operates at a higher abstraction level than C, inherently making it slower regardless of the implementation. If you convert a C program into pure Python, expect a significant decrease in speed. Nonetheless, Python’s appeal lies in three key aspects:
- The total cost of the software life cycle is what matters: it encompasses development, execution, and debugging time as well as resource costs. Python’s flexibility significantly reduces development time, making it a preferable choice for projects where rapid development is prioritized over execution speed. For instance, writing a short Python script that runs for a moment each day is a better use of time than spending extra development effort on C or another language to save a marginal amount of runtime per year.
- The speed of your code is not determined solely by the CPU. Tasks that involve network, database, or filesystem access spend most of their time waiting for those operations to complete. Thus, making the code itself execute faster has limited benefits unless your application can make effective use of parallel processing.
- Often, software performance bottlenecks are concentrated in a few areas. Python’s ability to leverage C libraries allows these critical sections to be optimized for maximum efficiency. This mirrors the strategy employed in machine learning, where data preparation and model definition are kept in Python because they change often and place lower demands on the CPU, while the computationally heavy work is delegated to optimized libraries. Development time is saved without having to optimize everything.
Keep in mind that engineer time is more expensive than CPU time.
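Finding the few areas where the bottlenecks are concentrated is a job for a profiler rather than guesswork. The following is a minimal sketch using the standard library's cProfile and pstats modules. The functions slow_sum, fast_sum, and main are hypothetical stand-ins for a real program.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot: a pure Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def fast_sum(n):
    """Hypothetical cheap function: the same sum via a closed-form formula."""
    return n * (n - 1) // 2


def main():
    return slow_sum(200_000) + fast_sum(200_000)


# Profile only the call to main().
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Sort by cumulative time and show the top entries; the loop-based
# function should stand out as the place worth optimizing.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Once the report points to a specific function, that function alone can be rewritten with a built-in, a NumPy call, or a formula, while the rest of the program stays as simple Python.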
The final word
Python’s performance limitations, particularly in loop execution, are well-documented and stem from its interpreted nature, dynamic typing, and high-level abstraction. However, these drawbacks are balanced by Python’s simplicity, readability, and extensive ecosystem of libraries and tools.
By leveraging built-in functions, specialized libraries, and mathematical insight, developers can mitigate Python’s performance issues and achieve efficient code execution. Moreover, Python’s rapid development cycle and flexibility make it an attractive choice for a wide range of applications, particularly those where development time and maintainability are paramount.
Ultimately, the choice of programming language should be guided by the specific requirements of the project, balancing performance, development time, and resource costs to achieve the best outcome.
I hope you enjoyed this post and learned something new. If you have any questions or suggestions, feel free to leave a comment.