How To Use Django bulk_create

Hi Guys!
While Django is renowned for its speed and ease in web development, there are moments when performance becomes a concern. Accessing the database, a crucial aspect of web development, can be one of those time-consuming tasks. 🚀

In this series, we are going to discuss ways to optimize database access:

  • bulk_create

In this first part, we will talk about how to use bulk_create.

How to use bulk_create

Django's bulk_create is a game-changer for optimization! Instead of making a bunch of small calls to the database to save loads of data, it helps us save a massive chunk in just one go. Time to say goodbye to looping through and saving one by one! 🎉

bulk_create(objs, batch_size=None, ignore_conflicts=False)

  • objs – The list of objects to be created.
  • batch_size – Defines how many instances are created in a single database call. Defaults to None.
  • ignore_conflicts – Tells the database to ignore rows that fail to insert. Defaults to False. (See the short sketch after this list.)
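For example, ignore_conflicts can be handy when some of the rows may already exist. Here is a minimal sketch, assuming a Blogs model like the one defined in the example below and a prepared list of unsaved instances called blog_objects:

# Rows that fail to insert (e.g. because of a unique-constraint violation)
# are silently skipped instead of raising an IntegrityError.
Blogs.objects.bulk_create(blog_objects, ignore_conflicts=True)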

Example:

For bulk_create, I'll use a Blog model.

from django.db import models

class Blogs(models.Model):
    title = models.CharField(max_length=200, blank=True, null=True)
    description = models.TextField(blank=True, null=True)
    date_created = models.DateTimeField(auto_now_add=True)
    
    def __str__(self) -> str:
        return f"{self.title}"

We are going to write a function that generates 1000 blogs for us, so we can save that data into our database. Here is the function:

# This function generates data for 1000 blogs
def generate_sample_data():
    titles = ["Title {}".format(i) for i in range(1, 1001)]
    descriptions = ["Description for Title {}".format(i) for i in range(1, 1001)]

    data = []

    for title, description in zip(titles, descriptions):
        data.append({
            "title": title,
            "description": description,
        })

    return data
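As a quick sanity check, here is what we would expect the helper to return (a sketch, run from a shell):

generated_data = generate_sample_data()
print(len(generated_data))  # 1000
print(generated_data[0])
# {'title': 'Title 1', 'description': 'Description for Title 1'}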

In normal cases, to store these 1000 objects we would have looped through them like this:

from blog.models import Blogs

generated_data = generate_sample_data()
for data in generated_data:
    Blogs.objects.create(title=data["title"], description=data["description"])

With the above method we access the database 1000 times, which makes it slow. Now let's look at the same task using bulk_create:

from blog.models import Blogs

generated_data = generate_sample_data()
Blogs.objects.bulk_create(
    [Blogs(title=data["title"], description=data["description"]) for data in generated_data]
)

With the above code, we reduce the number of database calls to one, which speeds up the application.
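One more thing worth knowing: for very large lists, you can pass batch_size so the rows go in as several smaller INSERT statements rather than one enormous one (SQLite in particular limits how many parameters a single query may hold, and Django caps the batch size accordingly on that backend). A hedged sketch, with 500 as an arbitrary chunk size:

from blog.models import Blogs

generated_data = generate_sample_data()
objs = [Blogs(title=data["title"], description=data["description"]) for data in generated_data]

# Insert in chunks of 500 rows per query instead of one huge statement.
Blogs.objects.bulk_create(objs, batch_size=500)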

How is bulk_create faster?

How do we know this is actually better? Let's measure the time performance of the two approaches. For profiling, we will use cProfile.

First, let's check the standard way of doing it.

import os

# Because we are running this file on its own, we need to set up Django manually
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "sample_project.settings")

import django

django.setup()  # set Django up so the models can be imported
import cProfile # module to check time durations
from blog.models import Blogs

def without_bulk():
    # generate_sample_data is the helper function we wrote earlier
    generated_data = generate_sample_data()
    for data in generated_data:
        Blogs.objects.create(title=data["title"], description=data["description"])


p = cProfile.Profile()
p.runcall(without_bulk)
p.print_stats(sort="tottime")

To run this file, type "python3 <name_of_file>.py". Let's now look at the output:

(env) project/project$ python3.8 django_test.py 
         316669 function calls (312636 primitive calls) in 4.754 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2001    3.672    0.002    3.672    0.002 {method 'fetchone' of 'sqlite3.Cursor' objects}
     2000    0.279    0.000    0.279    0.000 {function SQLiteCursorWrapper.execute at 0x7fcdedbc0c10}
     1000    0.036    0.000    0.224    0.000 compiler.py:1732(as_sql)
     1000    0.033    0.000    0.051    0.000 base.py:460(__init__)
     1000    0.028    0.000    4.452    0.004 compiler.py:1812(execute_sql)
     1000    0.023    0.000    4.633    0.005 base.py:835(save_base)
     2000    0.021    0.000    0.073    0.000 utils.py:108(debug_sql)

More than half of the time was spent in the sqlite3 cursor methods, which run once for every post we save. Let's now look at the same task with bulk_create:

import os

# Because we are running this file on its own, we need to set up Django manually
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "sample_project.settings")

import django

django.setup()  # set Django up so the models can be imported
import cProfile # module to check time durations
from blog.models import Blogs

def bulk_function():
    # generate_sample_data is the helper function we wrote earlier
    generated_data = generate_sample_data()
    Blogs.objects.bulk_create(
        [
            Blogs(title=data["title"], description=data["description"])
            for data in generated_data
        ]
    )


p = cProfile.Profile()
p.runcall(bulk_function)
p.print_stats(sort="tottime")

Running the profiler on the bulk method, we get this output:

(env) project/project$ python3.8 django_test.py 
         126541 function calls (125490 primitive calls) in 0.054 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.006    0.006    0.006    0.006 {method 'commit' of 'sqlite3.Connection' objects}
        9    0.005    0.001    0.005    0.001 {function SQLiteCursorWrapper.execute at 0x7f4fe0c80c10}
     1000    0.004    0.000    0.007    0.000 base.py:460(__init__)
     1000    0.002    0.000    0.005    0.000 operations.py:260(adapt_datetimefield_value)
     3000    0.002    0.000    0.002    0.000 compiler.py:1627(field_as_sql)
     3026    0.001    0.000    0.001    0.000 functional.py:291(__getattribute__)
     3000    0.001    0.000    0.015    0.000 compiler.py:1659(prepare_value)

Notice that it took about 4.754 seconds to save 1000 records into the database with the first method, while the same task took only 0.054 seconds with bulk_create, roughly an 88x speedup. This is a significant improvement. Imagine saving billions of records; that would take almost forever with the first method.

It's advisable to use bulk_create whenever you see the need. Thank you.
In our next lesson, we will be looking into bulk_update.
