最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

python - How to efficiently exclude already assigned objects in a Django QuerySet? - Stack Overflow

programmeradmin0浏览0评论

I am working on a Django project where I need to filter objects based on their status while excluding those that are already assigned in another model.

I have two models:

  1. CartObject – Stores all objects.
  2. OnGoingProcess – Tracks objects that are currently assigned.

Each OnGoingProcess entry has a OneToOneField relationship with CartObject, meaning each object can only be assigned once.

My goal is to fetch all objects with a specific status but exclude those that are already assigned in OnGoingProcess.


Models:

class CartObject(models.Model):
    object_id = models.CharField(max_length=100, unique=True)
    status = models.CharField(max_length=50, choices=[("pending", "Pending")])
    # Other fields...

class OnGoingProcess(models.Model):
    user = models.OneToOneField(DeliveryProfile, on_delete=models.CASCADE, related_name="ongoing_process")
    associated_object = models.OneToOneField(CartObject, on_delete=models.CASCADE, related_name="associated_process", blank=True, null=True)
    # Other fields...

Current View Code:

@user_passes_test(lambda user: user.is_staff)
def process_manager_view(request):
    # Get objects that are already assigned in OnGoingProcess
    assigned_objects = OnGoingProcess.objects.values_list('associated_object', flat=True)

    # Exclude objects that are already assigned
    available_objects = CartObject.objects.filter(status="pending").exclude(id__in=assigned_objects).order_by("-id")

    context = {
        "available_objects": available_objects,
    }
    return render(request, "useradmin/available-objects.html", context)

Issue:

  1. I am using values_list('associated_object', flat=True) to extract the assigned object IDs.
  2. Then, I am using exclude(id__in=assigned_objects) to filter out those objects.
  3. Is this the most efficient way? Or is there a better Django ORM method to achieve the same result?
  4. Should I use Subquery(), isnull=False, or any other approach for better performance?

Alternative Solutions I Considered:

Option 1: Using isnull=False

available_objects = CartObject.objects.filter(status="pending").exclude(associated_process__isnull=False)

Pros: Simple, avoids extra queries.
Cons: Not sure if it's the best approach for performance.

Option 2: Using Subquery

from django.db.models import Subquery

assigned_objects = OnGoingProcess.objects.values('associated_object')
available_objects = CartObject.objects.filter(status="pending").exclude(id__in=Subquery(assigned_objects))

Pros: Optimized for large datasets.
Cons: More complex.

Option 3: Using Raw SQL (if necessary)

from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("""
        SELECT id FROM useradmin_cartobject 
        WHERE status='pending' 
        AND id NOT IN (SELECT associated_object FROM useradmin_ongoingprocess)
        ORDER BY id DESC
    """)
    result = cursor.fetchall()

available_objects = CartObject.objects.filter(id__in=[row[0] for row in result])

Pros: Performance boost for huge data.
Cons: Less readable, database-dependent.


Question:

What is the best and most efficient Django ORM approach to filter objects while excluding those that are already assigned?
Would values_list(), isnull=False, or Subquery() be the recommended way?
Are there any performance considerations when working with large datasets?

Thank you in advance!

I am working on a Django project where I need to filter objects based on their status while excluding those that are already assigned in another model.

I have two models:

  1. CartObject – Stores all objects.
  2. OnGoingProcess – Tracks objects that are currently assigned.

Each OnGoingProcess entry has a OneToOneField relationship with CartObject, meaning each object can only be assigned once.

My goal is to fetch all objects with a specific status but exclude those that are already assigned in OnGoingProcess.


Models:

class CartObject(models.Model):
    object_id = models.CharField(max_length=100, unique=True)
    status = models.CharField(max_length=50, choices=[("pending", "Pending")])
    # Other fields...

class OnGoingProcess(models.Model):
    user = models.OneToOneField(DeliveryProfile, on_delete=models.CASCADE, related_name="ongoing_process")
    associated_object = models.OneToOneField(CartObject, on_delete=models.CASCADE, related_name="associated_process", blank=True, null=True)
    # Other fields...

Current View Code:

@user_passes_test(lambda user: user.is_staff)
def process_manager_view(request):
    # Get objects that are already assigned in OnGoingProcess
    assigned_objects = OnGoingProcess.objects.values_list('associated_object', flat=True)

    # Exclude objects that are already assigned
    available_objects = CartObject.objects.filter(status="pending").exclude(id__in=assigned_objects).order_by("-id")

    context = {
        "available_objects": available_objects,
    }
    return render(request, "useradmin/available-objects.html", context)

Issue:

  1. I am using values_list('associated_object', flat=True) to extract the assigned object IDs.
  2. Then, I am using exclude(id__in=assigned_objects) to filter out those objects.
  3. Is this the most efficient way? Or is there a better Django ORM method to achieve the same result?
  4. Should I use Subquery(), isnull=False, or any other approach for better performance?

Alternative Solutions I Considered:

Option 1: Using isnull=False

available_objects = CartObject.objects.filter(status="pending").exclude(associated_process__isnull=False)

Pros: Simple, avoids extra queries.
Cons: Not sure if it's the best approach for performance.

Option 2: Using Subquery

from django.db.models import Subquery

assigned_objects = OnGoingProcess.objects.values('associated_object')
available_objects = CartObject.objects.filter(status="pending").exclude(id__in=Subquery(assigned_objects))

Pros: Optimized for large datasets.
Cons: More complex.

Option 3: Using Raw SQL (if necessary)

from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("""
        SELECT id FROM useradmin_cartobject 
        WHERE status='pending' 
        AND id NOT IN (SELECT associated_object FROM useradmin_ongoingprocess)
        ORDER BY id DESC
    """)
    result = cursor.fetchall()

available_objects = CartObject.objects.filter(id__in=[row[0] for row in result])

Pros: Performance boost for huge data.
Cons: Less readable, database-dependent.


Question:

What is the best and most efficient Django ORM approach to filter objects while excluding those that are already assigned?
Would values_list(), isnull=False, or Subquery() be the recommended way?
Are there any performance considerations when working with large datasets?

Thank you in advance!

Share Improve this question edited Mar 24 at 6:22 Google User asked Mar 2 at 14:10 Google UserGoogle User 1751 silver badge8 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 5

Your query:

available_objects = (
    CartObject.objects.filter(status='pending')
    .exclude(id__in=assigned_objects)
    .order_by('-id')
)

will use a subquery, so run as:

AND id NOT IN (SELECT associated_object FROM useradmin_ongoingprocess)

You can inspect it with:

print(available_objects.query)

But on databases like MySQL, this is not the most efficient one no.

For most databases:

available_objects = CartObject.objects.filter(status='pending').filter(
    associated_process=None
)

will yield results efficiently. We can rewrite this to:

available_objects = CartObject.objects.filter(
    status='pending', associated_process=None
)

this will generate the same query, but is a bit shorter in code.

This works based on the fact that a LEFT OUTER JOIN includes a NULL row for items for which there is no corresponding item at the table for the OnGoingProcess model. Then we thus retrieve only the ones with NULL, so only retain the CartObject with no OnGoingProcess.

As for raw queries, one usually uses this if there is no ORM-equivalent available, or when it is very cumbersome to construct it. Raw queries have additional disadvantages: a QuerySet can be filtered, paginated, etc. whereas a raw query is an "end product": you can not manipulate it further, so limiting what you can do with it with Django. It also means that if you change the corresponding models, like adding a column, you will probably have to rewrite the raw queries that work with these models as well.

发布评论

评论列表(0)

  1. 暂无评论