sql - Searching text against a list of keywords - Stack Overflow

I am trying to write a SQL Server stored procedure, but I am having trouble determining the most performant way of handling the logic.

Assume I have a table of keywords:

Table name: Keywords

Keyword
Apple
Orange
Pear

Now assume I have another table that I want to do the keyword search against:

Table name: Sentences

ID Sentence
1 Today I ate chocolate
2 Today I ate an Apple
3 Today I ate an Apple, Pear and Orange
4 Today I ate nothing
5 Today I ate a Pear

There are two different searches that I need to do.

The first is any row where any of the keywords exists in the sentence - this would return IDs 2, 3 & 5.

The second is any row where all of the keywords exist in the sentence - this would return ID 3.

This could be done by looping through the keywords one at a time and scanning the Sentences table for each, but that is very costly.

Can anyone suggest a better way of approaching this, such as some kind of join?

This seems to work for the 1st query:

select s.* 
from 
    Sentences s
    inner join Keywords k on s.Sentence like concat('%', k.keyword, '%')

I am still unsure how to write the second search.

asked Feb 3 at 21:38 by user978426; edited Feb 4 at 0:44 by Charlieface

  • What is your SQL Server version? (SELECT @@VERSION) – Yitzhak Khabinsky Commented Feb 3 at 22:00
  • Microsoft SQL Server 2022 (RTM-CU16) (KB5048033) - 16.0.4165.4 (X64) Nov 6 2024 19:24:49 Copyright (C) 2022 Microsoft Corporation Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2022 Datacenter 10.0 <X64> (Build 20348: ) (Hypervisor) – user978426 Commented Feb 3 at 22:26

3 Answers

After some testing, I have a solution that performs well across the volume of data I need. Yitzhak Khabinsky, thank you for the effort you put into your answer, which was very clever in its approach.

For any:

select distinct s.ID
from 
    Sentences s
    inner join Keywords k on s.Sentence like  concat('%', k.keyword, '%')
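
The "any" search can also be written as a semi-join with EXISTS, which returns each sentence at most once and so does not need DISTINCT. The following is only a sketch: the table variables @Keywords and @Sentences are stand-ins for the question's tables, and whether it beats the join depends on the data and the plan, since the leading wildcard prevents an index seek in both forms.

```sql
-- Sample data mirroring the question; the table variables are stand-ins
-- (assumptions) for the real Keywords and Sentences tables.
DECLARE @Keywords TABLE (Keyword VARCHAR(30));
INSERT INTO @Keywords (Keyword) VALUES ('Apple'), ('Orange'), ('Pear');

DECLARE @Sentences TABLE (ID INT PRIMARY KEY, Sentence VARCHAR(256));
INSERT INTO @Sentences (ID, Sentence) VALUES
(1, 'Today I ate chocolate'),
(2, 'Today I ate an Apple'),
(3, 'Today I ate an Apple, Pear and Orange'),
(4, 'Today I ate nothing'),
(5, 'Today I ate a Pear');

-- "Any": keep a sentence as soon as one keyword is found in it.
SELECT s.ID, s.Sentence
FROM @Sentences AS s
WHERE EXISTS (
    SELECT 1
    FROM @Keywords AS k
    WHERE s.Sentence LIKE CONCAT('%', k.Keyword, '%')
)
ORDER BY s.ID;
```

With the sample rows this returns IDs 2, 3 and 5, the same set as the DISTINCT version.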

For all:

select
    s.ID
from
    Sentences s
    inner join Keywords k on s.Sentence like  concat('%', k.keyword, '%')
group by s.ID
having count(*) = (select count(*) from Keywords)
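
As an alternative formulation of the "all" search, the same question can be phrased as relational division with NOT EXISTS: keep a sentence only if there is no keyword it fails to contain. This is a sketch under the same assumptions as the question's sample data; @Keywords and @Sentences are stand-in table variables, not the real tables.

```sql
-- Sample data mirroring the question; the table variables are stand-ins
-- (assumptions) for the real Keywords and Sentences tables.
DECLARE @Keywords TABLE (Keyword VARCHAR(30));
INSERT INTO @Keywords (Keyword) VALUES ('Apple'), ('Orange'), ('Pear');

DECLARE @Sentences TABLE (ID INT PRIMARY KEY, Sentence VARCHAR(256));
INSERT INTO @Sentences (ID, Sentence) VALUES
(1, 'Today I ate chocolate'),
(2, 'Today I ate an Apple'),
(3, 'Today I ate an Apple, Pear and Orange'),
(4, 'Today I ate nothing'),
(5, 'Today I ate a Pear');

-- "All": keep a sentence only if no keyword is missing from it.
SELECT s.ID, s.Sentence
FROM @Sentences AS s
WHERE NOT EXISTS (
    SELECT 1
    FROM @Keywords AS k
    WHERE s.Sentence NOT LIKE CONCAT('%', k.Keyword, '%')
)
ORDER BY s.ID;
```

With the sample rows this returns only ID 3. The two forms differ at the edges: if the keyword table is empty, or a Sentence is NULL, the NOT EXISTS form returns those rows while the GROUP BY / HAVING form returns nothing, so pick whichever behaviour suits the procedure.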

Please try the following solution, which uses SQL Server's XML and XQuery functionality.

It works on SQL Server 2017 and later because it depends on the STRING_AGG() function.

The keyword list and each sentence are tokenized into XML, which is visible in column c. An XQuery quantified expression then provides the answer.

As written, the query performs the "all" search. To get the "any" search, change the keyword every to some in the quantified expression (c.value('every... becomes c.value('some...) and compare the outcome.

SQL

-- DDL and sample data population, start
DECLARE @tblKeywords TABLE (Keyword VARCHAR(30));
INSERT INTO @tblKeywords (Keyword) VALUES
('Apple'),
('Orange'),
('Pear');

DECLARE @tblSentences TABLE (id INT IDENTITY PRIMARY KEY, sentence VARCHAR(256));
INSERT INTO @tblSentences (sentence) VALUES
('Today I ate chocolate'),
('Today I ate an Apple'),
('Today I ate an Apple, Pear and Orange'),
('Today I ate nothing'),
('Today I ate a Pear');
-- DDL and sample data population, end

DECLARE @Separator CHAR(1) = SPACE(1)
    , @Comma CHAR(1) = ',';
DECLARE @searchFor VARCHAR(256) = (SELECT STRING_AGG(Keyword, @Separator) FROM @tblKeywords);

WITH rs AS
(
   SELECT * 
      , c.value('every $r in /root/source/r/text()
           satisfies $r = /root/target/r/text()', 'BIT') AS Result
   FROM @tblSentences AS t
   CROSS APPLY (SELECT TRY_CAST('<root><source><r><![CDATA[' 
      + REPLACE(@searchFor, @Separator, ']]></r><r><![CDATA[') 
      + ']]></r></source>'
      + '<target><r><![CDATA[' + REPLACE(REPLACE(sentence,@Comma,''), @Separator, ']]></r><r><![CDATA[') 
      + ']]></r></target></root>' AS XML) AS xmldata
     ) AS t1(c)
)
SELECT * FROM rs
WHERE Result = 1;
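
To see the two quantifiers side by side without the tokenizing step, here is a small sketch that evaluates both against one hand-built XML value. The variable @x and its contents are hypothetical; they mimic what column c would hold for the sentence "Today I ate a Pear".

```sql
-- Hypothetical, hand-built XML in the same shape as column c above:
-- <source> holds the keywords, <target> holds the words of one sentence.
DECLARE @x XML = N'<root>
  <source><r>Apple</r><r>Orange</r><r>Pear</r></source>
  <target><r>Today</r><r>I</r><r>ate</r><r>a</r><r>Pear</r></target>
</root>';

SELECT
    -- some: true when at least one keyword equals a word of the sentence
    @x.value('some $r in /root/source/r/text()
              satisfies $r = /root/target/r/text()', 'BIT') AS AnyKeyword,
    -- every: true only when each keyword equals some word of the sentence
    @x.value('every $r in /root/source/r/text()
              satisfies $r = /root/target/r/text()', 'BIT') AS AllKeywords;
```

Only Pear appears among the sentence's words here, so the some expression is satisfied while the every expression is not. Because the comparison is made token by token, this approach matches whole words rather than substrings, unlike the LIKE '%...%' queries.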

It will be much more efficient to use a full-text index, which is designed for this kind of job.

-- the table and data
CREATE TABLE TXT 
(TXT_ID     INT IDENTITY 
            CONSTRAINT PK_TXT PRIMARY KEY,
 TXT_TEXTE  NVARCHAR(max)
 );
GO

INSERT INTO TXT VALUES
('Today I ate chocolate'),
('Today I ate an Apple'),
('Today I ate an Apple, Pear and Orange'),
('Today I ate nothing'),
('Today I ate a Pear');
GO

-- the fulltext index
CREATE FULLTEXT CATALOG FTX_STORAGE
   WITH ACCENT_SENSITIVITY = ON
   AS DEFAULT;
GO

CREATE FULLTEXT INDEX 
   ON TXT (TXT_TEXTE language English)
   KEY INDEX PK_TXT
   WITH CHANGE_TRACKING = AUTO;
GO

-- the search
DECLARE @SEARCH NVARCHAR(4000) = 'orange pear apple';
SET @SEARCH = REPLACE(@SEARCH, ' ', ' AND ');

SELECT TXT.*
FROM   TXT
WHERE  CONTAINS(TXT_TEXTE, @SEARCH);

-- the result
TXT_ID      TXT_TEXTE
----------- -----------------------------------------
3           Today I ate an Apple, Pear and Orange
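
The search string can also be built from a keyword table rather than typed by hand, which covers both searches from the question. The sketch below assumes the TXT table and its full-text index from above already exist and have finished populating (population is asynchronous, so a query run immediately after creating the index may return no rows). The @Keywords table variable is a stand-in for the question's Keywords table, and the keywords are assumed not to contain double quotes.

```sql
-- Stand-in (assumption) for the question's Keywords table.
DECLARE @Keywords TABLE (Keyword NVARCHAR(30));
INSERT INTO @Keywords (Keyword) VALUES (N'Apple'), (N'Orange'), (N'Pear');

-- Quote each keyword so CONTAINS treats it as a simple term,
-- then combine the terms with OR ("any") or AND ("all").
DECLARE @SearchAny NVARCHAR(4000) =
    (SELECT STRING_AGG(CONCAT(N'"', Keyword, N'"'), N' OR ') FROM @Keywords);
DECLARE @SearchAll NVARCHAR(4000) =
    (SELECT STRING_AGG(CONCAT(N'"', Keyword, N'"'), N' AND ') FROM @Keywords);

-- "Any" of the keywords
SELECT TXT_ID, TXT_TEXTE
FROM   TXT
WHERE  CONTAINS(TXT_TEXTE, @SearchAny);

-- "All" of the keywords
SELECT TXT_ID, TXT_TEXTE
FROM   TXT
WHERE  CONTAINS(TXT_TEXTE, @SearchAll);
```

Like the hand-written search above, this relies on STRING_AGG(), so it needs SQL Server 2017 or later.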