There are millions of documents, and every one of them has a price element. A range index of type long has been created on price. However, some of the documents have junk/string values in price, so executing the query below:
cts:search(/product,cts:element-range-query(xs:QName("price"),"=",xs:long(100)))
throws an error:
Invalid cast: xs:untypedAtomic("Expensive") cast as xs:long
How do we handle this issue? Also, how can we identify all the documents that have a junk/string value in the price element?
Asked Feb 6 at 13:28 by Kumar; edited Feb 7 at 15:19 by Mads Hansen.
2 Answers
Unfortunately, if your data isn't uniformly and consistently made up of xs:long values and you create a range index with invalid values=ignore, MarkLogic will index your documents, but you can still get an error at runtime when you execute a query and filtering encounters those invalid values.
https://docs.marklogic.com/admin-help/range-element-index
- invalid values specifies whether server should allow insertion of documents that contain XML elements or JSON properties on which range index is configured and their contents cannot be coerced to the index data type. It can be configured to either ignore or reject. By default server rejects insertion of such documents. However, if a user configures invalid values to ignore, these documents can be inserted. This setting does not change the behavior of queries on invalid values after documents are inserted in the database. Performing an operation on an invalid value at query time can still result in an error.
Setting invalid values=ignore ignores the problem at index time, but not at query time. Many people find this surprising, annoying, or inconvenient, and there have been requests to change this behavior.
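Since the error is raised when filtering encounters the invalid values, one workaround that may be worth trying is to run the search unfiltered, so that the result is resolved from the indexes only. This is a minimal sketch that reuses the query from the question and adds the "unfiltered" option of cts:search. It assumes that index-only results are accurate enough for your use case, which is worth checking against your own data.

```xquery
xquery version "1.0-ml";

(: Same query as in the question, with the "unfiltered" option added.
   The result is resolved from the indexes alone, so the filtering
   pass that tries to cast each price value is skipped. :)
cts:search(
  /product,
  cts:element-range-query(xs:QName("price"), "=", xs:long(100)),
  "unfiltered"
)
```

Documents with junk prices have no entry in the range index when invalid values=ignore is set, so they would simply not match.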
If you set invalid values=reject, you will get an error when attempting to insert documents with "junk" string values that are not xs:long, and you will also see errors when the database attempts to reindex existing docs that have those invalid string values.
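For reference, the invalid values setting can also be set through the Admin API instead of the Admin UI. The sketch below assumes a database named "Documents" (a placeholder, substitute your own) and assumes that no range index on price exists yet. An existing index would likely need to be removed first.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Assumption: the database is called "Documents"; substitute your own. :)
let $config := admin:get-configuration()
let $db-id := xdmp:database("Documents")
let $index := admin:database-range-element-index(
  "long",      (: scalar type :)
  "",          (: namespace of the price element (none here) :)
  "price",     (: element local name :)
  "",          (: collation, only relevant for string indexes :)
  fn:false(),  (: range value positions :)
  "reject"     (: invalid values: "ignore" or "reject" :)
)
let $config :=
  admin:database-add-range-element-index($config, $db-id, $index)
return admin:save-configuration($config)
```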
So, if you want an element range index of type xs:long and want to avoid those errors, you have a few options: update your docs to modify the content (remove the elements with "junk" values); use the envelope design pattern and create a separate section that only has elements with xs:long values; add an attribute that holds the xs:long values and index that instead; or consider using Template Driven Extraction (TDE) with a template that conditionally applies only to docs that have xs:long values.
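As an illustration of the "modify the content" option, here is a rough sketch that renames every non-numeric price element so the range index on price only sees valid values. The element name price-text is invented for this example and does not come from the question. With millions of documents, an update like this would need to be run in batches rather than as a single transaction.

```xquery
xquery version "1.0-ml";

(: Hypothetical clean-up: rename non-numeric price elements to
   price-text. "price-text" is an invented name for illustration. :)
for $price in
  fn:collection()/product/price[fn:not(. castable as xs:long)]
return
  xdmp:node-replace(
    $price,
    (: keep the original attributes and content under the new name :)
    element price-text { $price/@*, $price/node() }
  )
```

Keeping the original value under a different name preserves the data for later review, whereas deleting the element outright would lose it.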
You can use the XPath operator castable as to check whether a given price can actually be cast to a long. For example, "Expensive" castable as xs:long returns false.
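To identify the affected documents, castable as can be used in an XPath predicate. The following is a minimal sketch, assuming the same /product/price structure as in the question. The predicate has to inspect each price value, so on millions of documents it will be slow and is better run in batches or as a background task.

```xquery
xquery version "1.0-ml";

(: Return the URI of every document whose product/price value
   cannot be cast to xs:long (for example "Expensive"). :)
for $product in
  fn:collection()/product[price[fn:not(. castable as xs:long)]]
return xdmp:node-uri($product)
```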