django - Using StringAgg after filter & distinct

I'm using StringAgg and order as follows:

    # Get order column & annotate with list of credits
    if request.POST.get('order[0][name]'):
        order = request.POST['order[0][name]']
        if order == 'credits_primary':
            releases = releases.annotate(credits_primary=StringAgg(
                'credits__entity__name',
                delimiter=', ',
                filter=Q(credits__type='primary'),
                ordering='credits__id'
            ))
        elif order == 'credits_secondary':
            releases = releases.annotate(credits_secondary=StringAgg(
                'credits__entity__name',
                delimiter=', ',
                filter=Q(credits__type='secondary'),
                ordering='credits__id'
            ))
    else:
        order = 'title'
    
    # Order releases
    if request.POST.get('order[0][dir]') == 'desc':
        releases = releases.order_by(F(order).desc(nulls_last=True), 'title')
    else:
        releases = releases.order_by(F(order).asc(nulls_last=True), 'title')
    
    for release in releases:
        try: print(release.credits_primary)
        except: pass
        try: print(release.credits_secondary)
        except: pass

This in itself works exactly as expected: the ordering is what I expect, and print returns the values I expect.

However, when I apply a filter before this, where the result includes multiple instances of the same release, the credits__entity__names will be repeated or omitted for that release, depending on how many results there are for the field(s) being filtered by, despite using distinct.

Below is the filter I'm applying that affects the aggregation (applied before StringAgg):

    # Other filters...

    # Filter by Search
    if request.POST.get('search[value]'):
        query = Q()
        search = request.POST['search[value]']
        query.add(Q(title__icontains=search), Q.OR)
        query.add(Q(tags__tag__name__icontains=search), Q.OR)
        query.add(Q(credits__entity__name__icontains=search), Q.OR)
        releases = releases.filter(query)
        
    # Make sure items aren't repeated
    releases = releases.distinct()

I'm using StringAgg and order as follows:

    # Get order column & annotate with list of credits
    if request.POST.get('order[0][name]'):
        order = request.POST['order[0][name]']
        if order == 'credits_primary':
            releases = releases.annotate(credits_primary=StringAgg(
                'credits__entity__name',
                delimiter=', ',
                filter=Q(credits__type='primary'),
                ordering='credits__id'
            ))
        elif order == 'credits_secondary':
            releases = releases.annotate(credits_secondary=StringAgg(
                'credits__entity__name',
                delimiter=', ',
                filter=Q(credits__type='secondary'),
                ordering='credits__id'
            ))
    else:
        order = 'title'
    
    # Order releases
    if request.POST.get('order[0][dir]') == 'desc':
        releases = releases.order_by(F(order).desc(nulls_last=True), 'title')
    else:
        releases = releases.order_by(F(order).asc(nulls_last=True), 'title')
    
    for release in releases:
        try: print(release.credits_primary)
        except: pass
        try: print(release.credits_secondary)
        except: pass

This in itself works exactly as expected: the ordering is what I expect, and print returns the values I expect.

Below is the filter I'm applying that affects the aggregation (applied before StringAgg):

    # Other filters...

    # Filter by Search
    if request.POST.get('search[value]'):
        query = Q()
        search = request.POST['search[value]']
        query.add(Q(title__icontains=search), Q.OR)
        query.add(Q(tags__tag__name__icontains=search), Q.OR)
        query.add(Q(credits__entity__name__icontains=search), Q.OR)
        releases = releases.filter(query)
        
    # Make sure items aren't repeated
    releases = releases.distinct()

Share Improve this question edited Mar 30 at 11:18 asked Mar 29 at 17:03 bur 8098 silver badges22 bronze badges

Add a comment |

2 Answers 2

Sorted by: Reset to default 2

Below is the filter I'm applying that affects the aggregation (applied before StringAgg):

The filter looks at (other) one-to-many or many-to-many relations. This means that this will result in LEFT OUTER JOIN that will start to act as a multiplier. The same works with other aggregates such as SUM and COUNT.

Indeed, if you just aggregate, the query looks like:

SELECT string_agg(CASE
                      WHEN c.type = 'primary' THEN e.name
                      ELSE NULL
                  END
                  ORDER BY e.id)
FROM releases AS r
LEFT OUTER JOIN credits AS c ON c.release_id = r.id
LEFT OUTER JOIN entity AS e ON e.credit_id = e.id

and that is fine, because we want to aggregate over the entities, so per release we get all entities.

But now if we for example filter on the tags, it looks like:

SELECT string_agg(CASE
                      WHEN c.type = 'primary' THEN e.name
                      ELSE NULL
                  END
                  ORDER BY e.id)
FROM releases AS r
LEFT OUTER JOIN credits AS c ON c.release_id = r.id
LEFT OUTER JOIN entity AS e ON e.credit_id = e.id
LEFT OUTER JOIN release_tag rt1 ON rt1.release_id = r.id
LEFT OUTER JOIN tag t1 ON rt1.tag_id = t1.id
LEFT OUTER JOIN release_tag rt2 ON rt2.release_id = r.id
LEFT OUTER JOIN tag t1 ON rt2.tag_id = t2.id
WHERE t1.name = 'tag₁'
  OR t2.name = 'tag₂'

This means that the entity names for a release where one of the two tags match will be repeated once, but where both match, it will be included twice.

The Q(credits__entity__name__icontains) makes it even worse: since this will join on the table c, and thus entities of credits that have two or more matches will be included that many times, whereas the entities of a release that does not match with a given credit, will no longer be included.

The least common divider of all these phenomena is probably that you should be very careful to make aggregates when you make JOINs especially if multiple tables on which the JOINs are caried our are involved.

What you probably can do is make a Subquery expression ^[Django-doc] which is thus not impacted by the "outer" tables and the corresponding JOIN.

Not sure I understand your use case, but if the problem is that you get duplicates in your aggregation, maybe you can try to use distinct on your string aggregations:

https://docs.djangoproject/fr/5.1/ref/contrib/postgres/aggregates/#stringagg

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

django - Using StringAgg after filter & distinct - Stack Overflow

2 Answers 2

与本文相关的文章

评论列表(0)