I need to sanitize some custom settings added with the Customize API, namely some simple text fields.
Other data types have dedicated functions, e.g. email addresses should use sanitize_email(), URLs have esc_url_raw(), and so on.
But with simple text fields, I'm not sure whether I should use sanitize_text_field() or wp_filter_nohtml_kses(). Even after reading their descriptions in the Code Reference, they seem very similar and I don't really know how to choose between them.
Intuitively, I would go with sanitize_text_field(), but some online guides (this and this) seem to suggest that wp_filter_nohtml_kses() is what should be used instead.
What are the exact differences between these two functions, and how can I choose one of them in this situation?
Asked Jan 7, 2020 at 16:30 by Sekhemty
2 Answers
What Do They Do?
wp_filter_nohtml_kses strips all HTML from a string, and that's it. It does so via the wp_kses function, and it expects slashed data. Here's its implementation:
function wp_filter_nohtml_kses( $data ) {
    return addslashes( wp_kses( stripslashes( $data ), 'strip' ) );
}
sanitize_text_field, on the other hand, does more than that. The docs say:
- Checks for invalid UTF-8,
- Converts single < characters to entities
- Strips all tags
- Removes line breaks, tabs, and extra whitespace
- Strips octets
Specifically, the point of sanitize_text_field is to sanitize text fields. Since you want to sanitize a text field, you should use sanitize_text_field.
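For the Customize API case in the question, this could look roughly like the sketch below. The setting ID `mytheme_tagline`, the section ID `mytheme_text_section` and the `mytheme` text domain are hypothetical names chosen for illustration; the only part that matters is the `sanitize_callback` argument.

```php
<?php
// Hypothetical IDs and text domain, for illustration only.
add_action( 'customize_register', function ( WP_Customize_Manager $wp_customize ) {
    // A section to hold the control.
    $wp_customize->add_section( 'mytheme_text_section', array(
        'title' => __( 'Custom Text', 'mytheme' ),
    ) );

    // The setting: every submitted value is passed through sanitize_text_field().
    $wp_customize->add_setting( 'mytheme_tagline', array(
        'default'           => '',
        'sanitize_callback' => 'sanitize_text_field',
    ) );

    // A plain text input bound to the setting above.
    $wp_customize->add_control( 'mytheme_tagline', array(
        'label'   => __( 'Tagline', 'mytheme' ),
        'section' => 'mytheme_text_section',
        'type'    => 'text',
    ) );
} );
```

In a template, the stored value would then be escaped at output, e.g. `echo esc_html( get_theme_mod( 'mytheme_tagline', '' ) );`.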
So Why Does One Article Recommend wp_filter_nohtml_kses?
It doesn't.
The author correctly recognised that the variable may contain HTML, searched for a function to strip it, and found wp_filter_nohtml_kses.
However, this is neither the best function for the job nor the appropriate one. sanitize_text_field targets the actual problem, that the field needs sanitising, and does more than strip out tags.
The author could easily have chosen wp_strip_all_tags, but should have gone for sanitize_text_field.
Which brings us to the main points:
- Not all guides are correct
- When faced with a choice between 2 functions, pick the one that literally says what you want to do, not the obscure one
- wp_filter_nohtml_kses would not have protected against percent-encoded octets, and other things that sanitize_text_field handles, but perhaps the author was unaware of those attack vectors?
- Some of the things sanitize_text_field does are less about security and more about cleanliness, e.g. stripping extra whitespace
Escaping vs Sanitising
URL's have esc_url_raw() and so on.
Not quite: esc_url_raw is an escaping function, and escaping is not sanitisation! Though this is a rare exception where it can be used as a sanitising function.
- Sanitising data cleans it
- Validating data confirms it
- Escaping data secures it
Sanitise and validate on input, escape on output.
Another way to think of it is as a gameshow:
- Sanitising is the cleanup they do to get you ready for TV
- Validation is the judge at the end of the round who checks that you did what you were asked
- Escaping is the giant wall with the shaped hole you have to fit through or it'll push you off the ledge
For example, consider this value: " [email protected] <b>hello!</b> "
- we sanitize with sanitize_email, eliminating trailing spaces, etc., giving us [email protected]
- we can validate it with is_email to check if it is indeed an email
- At this point we can process the input
Some time passes...
- now we need to output the data, so we escape it
echo esc_url( 'mailto:'.$email );
Now we get a mailto link with an email, and it will always be a link. There is no dithering about whether it should be a link, could be a link, or is supposed to be a link. It is guaranteed to be a link, so there is now certainty. No assumptions need to be made.
Let's say that $email was sanitised, validated, and saved, yet the site's database was mangled or modified during a hack. Now $email contains a JS bitcoin miner! Or it did, until esc_url mangled it into an email. The email isn't usable, but it has the format of an email.
For this reason, escaping is only done on output, using an escaping function appropriate for the expected output, not the data (esc_attr for HTML attributes, esc_html for text, esc_url for URLs, and so on). Escaping is also done at the latest possible moment, closest to output. This way escaped data can't be modified after it's been escaped, and there's no confusion about when something was escaped that might cause double escaping.
For the same reason, avoid adding HTML to a variable and then echoing it at the end, or passing around complex HTML fragments in variables. You have no way to know whether they're safe or not.
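Putting the email walkthrough together, a minimal sketch might look like this. It assumes it runs inside WordPress; the option name `myplugin_contact_email`, the form field `contact_email` and both function names are hypothetical, and the nonce and capability checks a real form handler needs are left out for brevity.

```php
<?php
// Input side: sanitise, then validate, then store.
function myplugin_save_contact_email() {
    // Hypothetical form field; wp_unslash() undoes the slashes WordPress adds to $_POST.
    $raw   = isset( $_POST['contact_email'] ) ? wp_unslash( $_POST['contact_email'] ) : '';
    $email = sanitize_email( $raw );

    // is_email() returns false for anything that is not a valid address.
    if ( is_email( $email ) ) {
        update_option( 'myplugin_contact_email', $email );
    }
}

// Output side: escape at the last moment, once per output context.
function myplugin_render_contact_link() {
    $email = get_option( 'myplugin_contact_email', '' );

    printf(
        '<a href="%s">%s</a>',
        esc_url( 'mailto:' . $email ), // URL context
        esc_html( $email )             // text context
    );
}
```

The stored value is never trusted on the way out: even though it was sanitised and validated when saved, each piece is escaped for the place it is printed.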
The key difference is that sanitize_text_field():
- Checks for invalid UTF-8
- Converts single < characters to entities
- Strips all tags
- Removes line breaks, tabs, and extra whitespace
- Strips octets
...and is designed primarily for user input (e.g. GET/POST/COOKIE).
Meanwhile, wp_filter_nohtml_kses() just strips all HTML from a text string. It's worth noting that it expects the string param you pass to be slashed.
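To see why the slashing matters, here is a small sketch that runs outside WordPress. It is not the real function: `nohtml_slashed_sketch` is a hypothetical stand-in that uses PHP's strip_tags() in place of wp_kses( ..., 'strip' ), but it keeps the same stripslashes/addslashes wrapper.

```php
<?php
// Stand-in for wp_filter_nohtml_kses(): strip_tags() replaces the wp_kses() call
// so the snippet runs without WordPress. The slashing wrapper is the same shape.
function nohtml_slashed_sketch( string $data ): string {
    return addslashes( strip_tags( stripslashes( $data ) ) );
}

$raw = "O'Reilly <b>rocks</b>";

// Unslashed input: the apostrophe comes back with a backslash added.
$from_unslashed = nohtml_slashed_sketch( $raw );

// Slashed input, unslashed afterwards: the text round-trips cleanly.
$round_trip = stripslashes( nohtml_slashed_sketch( addslashes( $raw ) ) );

echo $from_unslashed, "\n"; // O\'Reilly rocks
echo $round_trip, "\n";     // O'Reilly rocks
```

So if the value you pass in is not already slashed, expect stray backslashes in the result unless you slash it first and unslash it afterwards.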
As for which one you use, tradition says "sanitize on input, escape on output". That would suggest sanitize_text_field in your case, with wp_filter_nohtml_kses (or similar) when you output it.
Having said that, if you never want users to be able to save HTML in these text fields, you should strip it out on input too, but always use sanitize_text_field (or similar) beforehand.