InnerText identification and the HTML spec
Posted: Thu Mar 06, 2014 8:44 pm
I'm having trouble identifying some objects by their InnerText attribute. The objects in question have an InnerText that contains line breaks and carriage returns in the middle of sentences.
For example, the InnerText is
"Sign in to view your
designs »"
(as in "Sign in to view your\r\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ designs »"), but the HTML rendering engine displays it as "Sign in to view your designs »" because the HTML spec requires the rendering engine to collapse runs of spaces and other special characters (\r, \n, \t) into " " (a single space).
This means that the developers add line breaks to the markup to keep it readable and get it through the code review system, and the browser displays the text without those line breaks, but Ranorex sees all of the characters (except in IE8, where it sees just what the browser displays). Before anyone says that we should just remove the special characters: this isn't going to happen, and it shouldn't be necessary given the HTML spec.
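To make the mismatch concrete, here is a minimal Python sketch (not Ranorex code) that approximates what the browser does to the text. The function name `collapse_whitespace` is made up for illustration, and the 28 spaces are only an assumption about the indentation in the markup.

```python
import re

def collapse_whitespace(text: str) -> str:
    """Approximate browser rendering: each run of spaces, tabs,
    carriage returns and line feeds becomes one space, and
    leading/trailing whitespace is dropped."""
    return re.sub(r"[ \t\r\n\f]+", " ", text).strip()

# Raw text as the automation tool reports it (the indentation width is assumed).
raw_inner_text = "Sign in to view your\r\n" + " " * 28 + "designs »"
displayed_text = "Sign in to view your designs »"

# A plain string comparison fails on the raw value...
print(raw_inner_text == displayed_text)
# ...but succeeds once whitespace is collapsed.
print(collapse_whitespace(raw_inner_text) == displayed_text)
```

The first comparison prints `False` and the second prints `True`, which is the difference between what Ranorex reports and what the browser shows.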
My concern is how to identify elements by their InnerText (sometimes the only way to identify them) without regard to the special characters.
Is there a way to do this without having to go through each and every existing or new repository object and manually change the regex to use "\s+" in place of every white-space character? That would be a huge undertaking, it could make the tests prone to errors, and the text in the XPath would no longer be easily human-readable.
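For reference, the manual "\s+" substitution described above can at least be generated rather than typed by hand. The following is a rough Python sketch under stated assumptions: `whitespace_tolerant_pattern` is a hypothetical helper, not a Ranorex feature, and the resulting pattern is assumed to be compatible with whatever regex flavor the repository path accepts.

```python
import re

def whitespace_tolerant_pattern(readable_text: str) -> str:
    """Turn human-readable text into a regex that accepts any run of
    whitespace wherever the readable text has a gap between words."""
    words = readable_text.split()  # split on any whitespace
    return r"\s+".join(re.escape(word) for word in words)

pattern = whitespace_tolerant_pattern("Sign in to view your designs »")

# The generated pattern matches both the raw and the displayed form.
raw_inner_text = "Sign in to view your\r\n" + " " * 28 + "designs »"
print(re.fullmatch(pattern, raw_inner_text) is not None)
print(re.fullmatch(pattern, "Sign in to view your designs »") is not None)
```

This keeps the readable string as the single source of truth, but it does not remove the need to touch every repository item, which is the part I would like to avoid.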
I suspect this may be a problem with how Ranorex presents the InnerText attribute, because the HTML renderer collapses those special characters but Ranorex doesn't.
Does anyone else have this issue or has anyone found a clean way around it?