Validate PDF content

Post by mrt » Fri May 06, 2022 9:13 am

Dear all,

today I tried to validate the content of a PDF.

When the PDF is opened inside the browser (Chrome) and using Record or Spy, the whole PDF viewer is recognized as one big repo item, but I cannot get inside to any content of the PDF.
When the PDF is opened outside of the browser, e.g. in Adobe Acrobat Reader, the whole PDF is recognized as one giant repo item - also no access to the content.

Like mentioned here: https://www.ranorex.info/how-to-get-dat ... tml#p25699
the document shows all restrictions as 'Allowed'.

Because that post is rather old, I wonder whether there are other restrictions or limitations that prevent content items from being recognized.
Any idea?

thanks, BR mrt

Post by odklizec » Fri May 06, 2022 10:29 am

Hi,

Do you know which PDF generator was used to create the file? Could you please share the file? I believe Ranorex supports only PDF files created with Adobe Acrobat-based tools, but I could be wrong about this ;) Anyway, validating PDF files is always a tricky task.
Pavel Kudrys
Ranorex explorer at Descartes Systems

Please add these details to your questions:
  • Ranorex Snapshot. Learn how to create one >here<
  • Ranorex xPath of problematic element(s)
  • Ranorex version
  • OS version
  • HW configuration

Post by mrt » Fri May 06, 2022 11:57 am

Hi,

the PDF contains sensitive data, so unfortunately I cannot share it.
And no, sorry, I do not know how it was created.

I just googled for a sample PDF and tried this one:
https://www.w3.org/WAI/ER/tests/xhtml/t ... /dummy.pdf

I cannot instrument this one either, neither in Adobe Reader nor in the Chrome viewer.

Are you able to?

Post by odklizec » Fri May 06, 2022 1:04 pm

Hi,

Yes, I'm able to track the text, but only after making some 'Reading' adjustments...
PDFReading.png

Post by mrt » Fri May 06, 2022 3:07 pm

Alright, when doing this (and saving the PDF) I can also track the text in Adobe Reader - but still not in Chrome.

Does it work for you in any browser?

Post by odklizec » Mon May 09, 2022 7:19 am

Hi,

No, I don't think it will work directly in a browser. You need to use Acrobat Reader. Browsers use their own built-in PDF viewer, which most probably does not expose the extra accessibility information that Acrobat Reader provides and that Ranorex requires.

Post by thnessum » Mon Jun 20, 2022 1:07 pm

Hi Ranorex gurus

I am also trying to validate data inside a PDF file, in my case a report from an old Windows Forms program.
But I simply cannot get deeper than fetching an ALV element from the PDF file through Adobe Acrobat Reader DC.
I have 'Show all elements' active under WPF, and capturing of both ANSI and GDI+ text is enabled as well.
I am not sure how to fetch the actual text from the file.

I can upload a snapshot if needed, but for now I've just attached the PDF file.

TIA
Thomas

Post by mrt » Mon Aug 29, 2022 7:43 am

I ended up using IvyPDF to walk through the PDF manually.
https://ivytools.net/

It takes some effort up front, but it has some quite handy functions (e.g. splitting by words, format specifiers, lines, pages, ...) to get to the information you want.
Check out the documentation to get an idea of what it can do:
https://ivytools.net/faq.html
It is free for personal use, and the developers are very nice: after I requested additional functionality, they created a beta in almost no time and sent it over to me for testing.

There is also the IvyTemplate editor, in which you can open your PDF and see graphically how elements are organized and recognized inside the PDF - which is not always as straightforward as it looks. ;)
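
For readers who want a feel for the "walk through the text by pages and lines" approach, here is a minimal sketch in plain Python. It is not the IvyPDF API. It assumes that some extractor has already returned the text of each page as a string, and the sample page contents, the labels, and the helper names find_value and validate are all made up for illustration.

```python
import re
from typing import Dict, List, Optional

# Hypothetical sample data: one string per page, as some PDF text extractor
# might return it. This is NOT the IvyPDF API, only plain Python.
PAGES: List[str] = [
    "Invoice No: 4711\nCustomer: ACME Corp\nTotal: 129.90 EUR",
    "Page 2\nThank you for your order",
]


def find_value(pages: List[str], label: str) -> Optional[str]:
    """Return the text after the first 'label:' found on any line, or None."""
    pattern = re.compile(r"^\s*" + re.escape(label) + r"\s*:\s*(.+?)\s*$")
    for page in pages:
        for line in page.splitlines():
            match = pattern.match(line)
            if match:
                return match.group(1)
    return None


def validate(pages: List[str], expected: Dict[str, str]) -> List[str]:
    """Compare expected label/value pairs with the page text.

    Returns a list of human-readable mismatches (empty if everything matches).
    """
    errors = []
    for label, want in expected.items():
        got = find_value(pages, label)
        if got != want:
            errors.append(f"{label}: expected {want!r}, found {got!r}")
    return errors


if __name__ == "__main__":
    problems = validate(PAGES, {"Invoice No": "4711", "Total": "129.90 EUR"})
    print(problems or "all values match")
```

In a real test the PAGES list would be filled by whatever extraction tool you use, and the list returned by validate could be reported as failed validations in the test run.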