Is Python slow? How to Make Your Code 1000+ Times Faster!

In the realm of programming, Python stands out for its emphasis on code readability and simplicity, traits that have cemented its popularity across diverse fields, from web development to data science. However, despite its many virtues, Python is often critiqued for its looping performance. This is especially notable when not utilizing built-in functions or leveraging the specialized capabilities of external libraries.

In this article, we will explore the performance of pure Python code as well as Python’s built-in looping constructs and compare them to the performance of the numpy library, which is widely used for numerical computing. In the process, we will also discuss what really matters when it comes to performance and how to make the most of Python’s strengths.

Python is praised for its simplicity and vast library support, appealing to both new and experienced developers for tasks ranging from web development to data analysis. Yet, its execution speed, especially in loops, faces criticism.

Loops are essential for tasks like data processing and automation, allowing repeated execution of code blocks. In Python, they’re key for iterating over data structures and condition-based operations.

Python’s loop performance issues are due to its dynamic, interpreted nature, leading to slower execution than compiled languages such as C or Java. This flexibility and ease of use come with overhead affecting loop speed, particularly with large data or complex operations.

Python’s dynamic typing and lack of low-level memory and data structure control can also slow loop execution. While this enhances ease of writing and reading code, it adds runtime overhead.
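
To make the interpreter overhead described above more concrete, here is a minimal sketch that uses the standard-library dis module to list the bytecode CPython generates for a simple summing loop. The function name loop_sum is a hypothetical example, and the exact instructions differ between Python versions, so treat the output as illustrative only.

```python
import dis


def loop_sum(n):
    # Hypothetical example function: sums the integers 1..n-1 in a plain loop
    total = 0
    for i in range(1, n):
        total += i
    return total


# Each iteration of the loop executes several of these bytecode instructions,
# and each one is dispatched by the interpreter at run time.
instructions = list(dis.get_instructions(loop_sum))
opnames = [ins.opname for ins in instructions]

print(f"{len(instructions)} bytecode instructions in loop_sum")
print(opnames)
```

A compiled language would typically turn the same loop into a handful of machine instructions with no per-iteration type checks, which is the gap the rest of this article measures.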

Despite these drawbacks, Python’s efficiency can be improved using built-in functions and high-performance libraries like NumPy and Pandas, which help bypass the slowness of pure Python loops.

Benchmarks

In our exploration of Python’s performance, particularly focusing on loop efficiency, we introduce a simplified yet powerful tool: the Benchmark class. This utility is designed to measure and report the execution time of code snippets, offering insights into their performance characteristics. The Benchmark class is particularly useful for identifying potential bottlenecks and understanding the impact of various coding approaches on execution speed. The code was adapted from here.

from timeit import default_timer as timer

class Benchmark:

    def __init__(self, message="Elapsed time", format="%0.3g"):
        self.message = message
        self.format = format
        self.start = 0.0

    def __enter__(self):
        self.start = timer()
        return self

    def __exit__(self, exc_type, exc_val, traceback):
        end = timer()
        self.time = end - self.start
        print(f"{self.message} {self.format % self.time} seconds")
        # Return False to allow any exceptions to propagate
        return False

How to Use the Benchmark Class

Using the Benchmark class is straightforward. It is implemented as a context manager, enabling it to be easily integrated with Python’s with statement for clean and concise syntax. Here’s a step-by-step guide on how to use it:

  1. Initialization: First, create an instance of the Benchmark class. You can customize its behavior through optional parameters, such as message to describe the benchmark and format to specify the output format of the execution time.

  2. Execution Block: Encapsulate the code you wish to benchmark within a with block. By doing so, the Benchmark class automatically starts timing when the block is entered and stops timing when the block is exited.

  3. Results: Upon exiting the with block, the Benchmark class calculates the elapsed time, formats it according to the specified format parameter, and prints the result. This gives you immediate feedback on the execution time of the encapsulated code snippet.
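
The three steps above can be sketched as follows. The class is repeated in compact form only so that the snippet runs on its own; the message text, the format string, and the variable names b_squares and squares are illustrative assumptions, not part of the original benchmarks.

```python
from timeit import default_timer as timer


class Benchmark:
    """Compact copy of the class above, repeated so this snippet is self-contained."""

    def __init__(self, message="Elapsed time", format="%0.3g"):
        self.message = message
        self.format = format
        self.start = 0.0
        self.time = 0.0

    def __enter__(self):
        self.start = timer()
        return self

    def __exit__(self, exc_type, exc_val, traceback):
        self.time = timer() - self.start
        print(f"{self.message} {self.format % self.time} seconds")
        return False


# Steps 1 and 2: create the instance and wrap the code to be timed
with Benchmark(message="Squares took", format="%0.2e") as b_squares:
    squares = [n * n for n in range(100_000)]

# Step 3: the elapsed time remains available on the instance after the block
print(f"{len(squares)= }, {b_squares.time= }")
```

Keeping the instance around (here b_squares) is what makes the later speed-up comparisons possible, since each one divides the time attribute of one benchmark by that of another.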

Pure Python for Loop Benchmark

For our initial benchmark, we’ll check the performance of a simple for loop in Python. We’ll use the Benchmark class to measure the execution time of a loop that iterates over a large range of numbers and computes the sum of the elements. This benchmark will serve as a baseline.

LARGE_NUMBER = 100_000_001

with Benchmark() as b_pure_python_for_loop:
    total = 0

    for i in range(1, LARGE_NUMBER):
        total += i

    print(f"{total= }")
total= 5000000050000000
Elapsed time 14.8 seconds

The time on your machine will vary, but it should be on the order of seconds.

Built-in Functions Benchmark

As seen in the previous benchmark, Python’s for loops can be slow when dealing with large datasets. However, Python provides a rich set of built-in functions that can significantly improve the performance of common operations. In this benchmark, we’ll compare the execution time of the for loop with that of the built-in sum() function, which is optimized for speed.

with Benchmark() as b_python_builtin_sum:
    total = sum(range(1, LARGE_NUMBER))
    print(f"{total= }")
total= 5000000050000000
Elapsed time 3.06 seconds
# compare how many times faster the built-in sum() function is
print(f"sum() is {b_pure_python_for_loop.time / b_python_builtin_sum.time:.2f} times faster than a pure Python for loop")
sum() is 4.84 times faster than a pure Python for loop

This single line of code is almost five times faster than the previous for loop. The sum() function is implemented in C, so the iteration and the additions run in compiled code instead of being executed one bytecode instruction at a time.
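
A single timing can be noisy, so the comparison is worth repeating. The sketch below is one possible way to do that with the standard-library timeit module, taking the best of three runs for each approach. The helper names loop_sum and builtin_sum and the smaller size N are assumptions chosen so the snippet finishes quickly; the ratio it prints will vary with the machine and the Python version.

```python
import timeit

N = 1_000_001  # much smaller than LARGE_NUMBER so the snippet runs quickly


def loop_sum(n):
    # Pure Python loop, as in the first benchmark
    total = 0
    for i in range(1, n):
        total += i
    return total


def builtin_sum(n):
    # Built-in sum(), as in the second benchmark
    return sum(range(1, n))


# Both approaches must agree before their timings are worth comparing
assert loop_sum(N) == builtin_sum(N)

# Run each approach three times and keep the fastest run
loop_best = min(timeit.repeat(lambda: loop_sum(N), number=1, repeat=3))
builtin_best = min(timeit.repeat(lambda: builtin_sum(N), number=1, repeat=3))

print(f"for loop: {loop_best:.4f} s, sum(): {builtin_best:.4f} s")
print(f"sum() is {loop_best / builtin_best:.2f} times faster")
```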

NumPy Benchmark

The NumPy library is a cornerstone of high-performance computing in Python, particularly for numerical operations. It provides a powerful array object and a collection of functions for manipulating and processing large datasets. In this benchmark, we’ll compare the performance of a for loop with that of NumPy’s sum() function, which is designed for efficient numerical computation.

import numpy as np

with Benchmark() as b_numpy_sum:
    total = np.sum(np.arange(1, LARGE_NUMBER))
    print(f"{total= }")
total= 5000000050000000
Elapsed time 0.754 seconds
# compare how many times faster the NumPy sum() function is than the built-in sum() function and the pure Python for loop
print(f"NumPy sum() is {b_python_builtin_sum.time / b_numpy_sum.time:.2f} times faster than the built-in sum() function")
print(f"NumPy sum() is {b_pure_python_for_loop.time / b_numpy_sum.time:.2f} times faster than a pure Python for loop")
NumPy sum() is 4.05 times faster than the built-in sum() function
NumPy sum() is 19.62 times faster than a pure Python for loop

As expected, the NumPy implementation is significantly faster than both the pure Python for loop and the built-in sum() function, showcasing the library’s ability to accelerate numerical operations through optimized C-based functions.
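
One caveat when reproducing this benchmark: Python integers have arbitrary precision, while NumPy arrays use fixed-width integers. On platforms where NumPy's default integer is 32 bits wide (for example, Windows with NumPy 1.x), a total of this size would not fit. The sketch below, which assumes a smaller size N to keep it quick, requests a 64-bit integer type explicitly and checks the result against the built-in sum().

```python
import numpy as np

N = 1_000_001  # smaller than LARGE_NUMBER so the snippet runs quickly

# Ask for 64-bit integers explicitly instead of relying on the platform default
values = np.arange(1, N, dtype=np.int64)

numpy_total = int(values.sum())
python_total = sum(range(1, N))

print(f"{values.dtype= }")
print(f"{numpy_total= }, {python_total= }")
```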

“Math knowledge” benchmark

It is good to remember that sometimes the best way to optimize code is to use math knowledge. In this case, we can use the formula for the sum of an arithmetic progression to calculate the sum of the elements in the range without the need for a loop. This approach is expected to be much faster than the previous benchmarks, as it avoids the overhead of iteration and directly computes the result. For the first n positive integers, the formula is:

S = \frac{n (n + 1)}{2}

def sum_formula(n):
    # closed-form sum of the integers 1..n; // keeps the result an exact int
    return n * (n + 1) // 2


with Benchmark() as b_math:
    # range(1, LARGE_NUMBER) ends at LARGE_NUMBER - 1, so that is the n to use
    total = sum_formula(LARGE_NUMBER - 1)
    print(f"{total= }")
total= 5000000050000000
Elapsed time 7.29e-05 seconds
# compare how many times faster the math formula is than the other methods
print(f"math formula is {b_pure_python_for_loop.time / b_math.time:.2f} times faster than a pure Python for loop")
print(f"math formula is {b_python_builtin_sum.time / b_math.time:.2f} times faster than the built-in sum() function")
print(f"math formula is {b_numpy_sum.time / b_math.time:.2f} times faster than the NumPy sum() function")
math formula is 202894.86 times faster than a pure Python for loop
math formula is 41931.33 times faster than the built-in sum() function
math formula is 10342.00 times faster than the NumPy sum() function

This benchmark demonstrates the power of mathematical insight in optimizing code, showcasing a dramatic improvement in performance compared to the previous implementations.

That’s why math is so important in programming. It can help us to solve problems in a more efficient way.
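
The same idea extends to any range of consecutive integers. As a small sketch, the hypothetical helper range_sum below computes the sum of range(start, stop) in closed form: there are stop - start terms, and their average is (start + stop - 1) / 2. Note that range(start, stop) excludes stop itself, which is easy to get wrong when translating a loop into a formula.

```python
def range_sum(start, stop):
    """Sum of the integers in range(start, stop), computed without a loop."""
    if stop <= start:
        return 0
    # number of terms times (first term + last term), divided by two;
    # the product is always even, so // gives an exact integer
    return (stop - start) * (start + stop - 1) // 2


LARGE_NUMBER = 100_000_001

print(range_sum(1, LARGE_NUMBER))
print(range_sum(3, 10) == sum(range(3, 10)))
```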

Working with large datasets

Considering that this is a chemistry website, let’s look at a chemistry example. Say we have a dataset with the pH values of about 100,000,000 samples, and we want to count how many of them fall within the normal blood pH range (7.35 to 7.45) and calculate the mean of those values. We can use the Benchmark class to compare the performance of different approaches to this problem.

First, we’ll use a pure Python for loop to iterate over the dataset and count the samples within the blood pH range. Then, we’ll calculate the mean of these samples. Next, we’ll leverage NumPy’s array operations to perform the same tasks.

rng = np.random.default_rng(seed=42)

# Generate LARGE_NUMBER floats between 1 and 14
data = rng.uniform(1, 14, LARGE_NUMBER)

# use a for loop to count the number of values between 7.35 and 7.45 (inclusive)
# and calculate the mean of those values
with Benchmark() as b_pH_python_for_loop:
    count = 0
    total = 0

    for value in data:
        if 7.35 <= value <= 7.45:
            count += 1
            total += value

    print(f"{count= }, {total= }")
    print(f"Mean: {total / count}")
count= 770155, total= 5699198.662586555
Mean: 7.400067080764982
Elapsed time 16.3 seconds
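# now use NumPy to do the same thing
with Benchmark() as b_pH_numpy:
    # the boolean mask keeps only the values inside the range
    filtered = data[(7.35 <= data) & (data <= 7.45)]
    mean = filtered.mean()
    # store the result in total so that the NumPy sum is what gets printed
    # and the built-in sum() is not shadowed; the last digits can differ
    # from the loop's total because NumPy adds the values in a different order
    total = filtered.sum()
    count = filtered.size
    print(f"{count= }, {total= }")
    print(f"{mean= }")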
count= 770155, total= 5699198.662586555
mean= 7.400067080764624
Elapsed time 0.388 seconds
# compare how many times faster the NumPy method is
print(f"NumPy method is {b_pH_python_for_loop.time / b_pH_numpy.time:.2f} times faster than a pure Python for loop")
NumPy method is 42.01 times faster than a pure Python for loop

Again, the difference in performance is massive: the NumPy implementation is about 42 times faster than the pure Python for loop. This example illustrates the importance of choosing the right tools and libraries for data-intensive tasks, particularly in scientific computing and data analysis.
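
If the filtering expression looks opaque, it may help to see it on a handful of values. The sketch below uses a made-up five-element array (ph_values is an illustrative name, not data from the benchmark) to show the intermediate boolean mask. The element-wise & operator is needed here because Python's and cannot combine whole arrays.

```python
import numpy as np

# Made-up sample values, chosen to include both ends of the range
ph_values = np.array([6.90, 7.35, 7.40, 7.45, 7.50])

# Each comparison yields an array of booleans; & combines them element-wise
in_range = (ph_values >= 7.35) & (ph_values <= 7.45)
print(in_range)

# Indexing with the mask keeps only the elements where it is True
count = int(np.count_nonzero(in_range))
mean = float(ph_values[in_range].mean())
print(f"{count= }, {mean= }")
```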

What really matters: the total cost of software development

OK, we have seen that Python’s looping performance can be slow, especially when dealing with large datasets. However, it’s important to remember that performance is just one aspect of software development. In many cases, the benefits of Python’s simplicity, readability, and extensive library support outweigh its performance drawbacks.

Python operates at a higher abstraction level than C, inherently making it slower regardless of the implementation. If you convert a C program into pure Python, expect a significant decrease in speed. Nonetheless, Python’s appeal lies in three key aspects:

  1. The total cost of the software life cycle is crucial, encompassing development, execution, and debugging time as well as resource costs. Python’s flexibility significantly reduces development time, making it a preferable choice for projects where rapid development is prioritized over execution speed. For instance, a short Python script that runs for a moment each day is a better investment than spending extra development time on C or another lower-level language to save a marginal amount of runtime per year.

  2. The speed of your code is not solely determined by CPU performance. Tasks involving network, database, or filesystem access spend most of their time waiting for these operations to complete. Thus, making the code itself execute faster has limited benefits unless your application can effectively utilize parallel processing.

  3. Often, software performance bottlenecks are concentrated in a few areas. Python’s ability to leverage C libraries allows these critical sections to be optimized for maximum efficiency. This approach mirrors the strategy employed in machine learning, where data preparation and model definition are kept in Python because they change often and demand little CPU, while the heavy numerical work runs in optimized compiled libraries. Development time stays short without extensive hand optimization.
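
To act on the third point, the bottleneck has to be found first. One possible way, sketched below under the assumption of a toy program, is the standard-library cProfile module. The functions cheap_setup, hot_loop, and main are hypothetical stand-ins for a real application; the report should show hot_loop dominating the cumulative time, which marks it as the part worth moving to NumPy or a C library.

```python
import cProfile
import io
import pstats


def cheap_setup():
    # Hypothetical inexpensive step
    return list(range(1_000))


def hot_loop(values):
    # Hypothetical bottleneck: a nested pure Python loop
    total = 0
    for _ in range(200):
        for v in values:
            total += v * v
    return total


def main():
    values = cheap_setup()
    return hot_loop(values)


# Collect timing data while main() runs
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Print the five entries with the largest cumulative time
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```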

Keep in mind that engineer time is more expensive than CPU time.

The final word

Python’s performance limitations, particularly in loop execution, are well-documented and stem from its interpreted nature, dynamic typing, and high-level abstraction. However, these drawbacks are balanced by Python’s simplicity, readability, and extensive ecosystem of libraries and tools.

By leveraging built-in functions, specialized libraries, and mathematical insight, developers can mitigate Python’s performance issues and achieve efficient code execution. Moreover, Python’s rapid development cycle and flexibility make it an attractive choice for a wide range of applications, particularly those where development time and maintainability are paramount.

Ultimately, the choice of programming language should be guided by the specific requirements of the project, balancing performance, development time, and resource costs to achieve the best outcome.

I hope you enjoyed this post and learned something new. If you have any questions or suggestions, feel free to leave a comment. Check out more Python posts here.
