Helfenstein: Gleaves: see that ‘data:’ at the start? the whole PDF is in the URL, encoded in base64
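A minimal sketch of what Helfenstein describes, assuming the URL has the usual `data:<mime>;base64,<payload>` shape. The function name `pdf_bytes_from_data_url` and the sample payload are made up for illustration; a real URL would carry the whole PDF.

```python
import base64


def pdf_bytes_from_data_url(url: str) -> bytes:
    """Return the raw bytes embedded in a base64 data: URL."""
    # Everything before the first comma is the header, the rest is the payload.
    header, sep, payload = url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data: URL")
    return base64.b64decode(payload)


# Tiny stand-in payload (hypothetical), not a real PDF document.
sample = "data:application/pdf;base64," + base64.b64encode(b"%PDF-1.4 demo").decode("ascii")
pdf = pdf_bytes_from_data_url(sample)
```

The decoded bytes can then be written to a file or handed to whatever PDF library is in use.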

Alonza: Because when you order a car, it won’t have the same order description as when I order a pot of peanut butter

Cyrnek: So there is only one element which contains lots of text – so now you need to extract the text

Mcnair: And the quantity ’20’ could also be part of a phone number

Blais: From this one element

Beech: Gleaves: is that a page you’re generating?

Shealey: So the only pattern is the position of the text of this single element

Soho: The only way for me to know what text has what meaning is by looking at the original PDF, and establishing a profile with exact coordinates

Guilianelli: The easiest thing that comes to mind would probably be macroing the mouse to drag-select at that position

Sovak: That requires a manual action

Bergamine: Gillice: the pdf lib API you are using doesn’t offer a method for this?

Fuglsang: Could there be a better suited pdf api?

Molony: This is already very useful

Bolender: I just need to find the most efficient way to search the stack

Lutterman: So you get the text of that element and now you have to find out where the part of the text is that would be at that position

Korbel: I have absolutely no issue finding these elements, it already works in v1 of this tool

Sobus: But now there are a few more variables to take into account

Breidenbach: Hashtag__: I have the text and the position

Schwipps: And I have a profile that tells me the position and its meaning

Deringer: So you have to find out what kind of combination it is?

Supino: So I need to compare this huge stack of random texts to the profile

Nanka: Like two text fields with this kind of text means a bill of this kind

Shreck: You iterate over the elements and check whether they are the elements for the given position?

Deavers: I have a profile that tells me the phone number is at 12.04, 13.564 for example

Mangione: Which isn’t too hard for things like single phone numbers

Raul: And the pdf api has no such function as “give me element at position xy”

Gruhlke: Because that’s 1 item at a fixed position for every single invoice

Giarratano: You have to make it yourself

Josue: Hm, a for loop over the items and checking the position?
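The loop Josue suggests could look roughly like this. It is a sketch under assumptions: `TextItem`, `item_at`, and the tolerance `tol` are hypothetical names, and the text items are assumed to come out of the PDF library as (x, y, text) records. Coordinates are compared with a tolerance because extracted floats rarely match a profile exactly.

```python
from typing import Iterable, NamedTuple, Optional


class TextItem(NamedTuple):
    x: float
    y: float
    text: str


def item_at(items: Iterable[TextItem], x: float, y: float, tol: float = 0.01) -> Optional[TextItem]:
    """Return the first item whose position is within tol of (x, y), else None."""
    for item in items:
        if abs(item.x - x) <= tol and abs(item.y - y) <= tol:
            return item
    return None


# Made-up sample data using the position from the chat (12.04, 13.564).
items = [TextItem(12.04, 13.564, "555-0100"), TextItem(40.0, 13.564, "20")]
phone = item_at(items, 12.04, 13.564)
```

Each call scans the whole list, which is fine for a handful of fixed fields but gets expensive when repeated for every field of every order.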

Emmerling: The fun part is orders

Carson: Does anyone think phantomjs is good enough? Or do you really want to test your site against real browsers?

Kules: Because there might be 1 order, there might be 20 orders

Domhoff: Carson: a real end to end test would involve real browsers the end users are using

Abramowski: And they might be on 1 row or 2

Battani: And they might be in columns or not

Luker: So what I do at the moment is find the x-coordinate of any order, which is the same for every order but otherwise unique in the invoice

Vulich: Then I’ll also know its y-coordinate, and I know the rest of the row will have the same y-coordinate

Armada: So I need to search the stack for that y-coordinate
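A rough sketch of the row search described above, assuming the items are plain (x, y, text) tuples and that `order_x` is the x-coordinate shared by every order line. The names `order_rows` and `ndigits` are illustrative only. Items are grouped by rounded y once, so each order row is a dictionary lookup rather than another scan of the stack.

```python
from collections import defaultdict


def order_rows(items, order_x, tol=0.01, ndigits=2):
    """Return the texts of each row that starts at order_x, left to right."""
    # Group all items by their (rounded) y-coordinate in a single pass.
    by_y = defaultdict(list)
    for x, y, text in items:
        by_y[round(y, ndigits)].append((x, text))

    rows = []
    for x, y, text in items:
        # An item at the order x-coordinate marks the start of an order row.
        if abs(x - order_x) <= tol:
            row = sorted(by_y[round(y, ndigits)])
            rows.append([t for _, t in row])
    return rows


# Made-up invoice items: two order rows and one unrelated header line.
items = [
    (10.0, 100.0, "Car"), (50.0, 100.0, "1"), (80.0, 100.0, "20000.00"),
    (10.0, 120.0, "Peanut butter"), (50.0, 120.0, "20"),
    (30.0, 50.0, "Invoice 42"),
]
rows = order_rows(items, order_x=10.0)
```

This only covers orders that sit on a single row; an order wrapped over two rows would need an extra rule, for example attaching rows that lack an item at `order_x` to the order above them.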

Vandenbergh: Carson: it’s not one or the other: use one for speed, and the other for accuracy

Packard: Hashtag__: yes, I could just go through the original stack a million times until I have all the data

Pacini: But that’s very expensive

Henslin: You can use a hash or something like that

Stablein: So I’m trying to figure out how to reduce the amount of lookups and the expense of each lookup

Kingma: For a cheap lookup of a particular field

Brueckman: A small internal database may help
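One way to read the hash suggestion: build a dictionary keyed by rounded position once, then resolve every profile entry with a single lookup. This is a sketch under assumptions, not the tool's actual code; `build_index`, `apply_profile`, the field names, and the sample values are all hypothetical.

```python
def build_index(items, ndigits=2):
    """Map (rounded x, rounded y) to text in one pass over the stack."""
    index = {}
    for x, y, text in items:
        index[(round(x, ndigits), round(y, ndigits))] = text
    return index


def apply_profile(index, profile, ndigits=2):
    """Resolve each profile field to the text at its position, or None if absent."""
    return {
        field: index.get((round(x, ndigits), round(y, ndigits)))
        for field, (x, y) in profile.items()
    }


# Made-up data: the profile says which position carries which meaning.
items = [(12.04, 13.564, "555-0100"), (40.0, 13.564, "20"), (12.04, 20.0, "ACME Ltd")]
profile = {"phone": (12.04, 13.564), "customer": (12.04, 20.0), "vat_id": (99.0, 99.0)}
fields = apply_profile(build_index(items), profile)
```

Building the index costs one pass; each lookup after that is constant time on average. The trade-off is that rounding can miss a coordinate that falls just across a rounding boundary, so the precision needs to match how stable the positions are across invoices.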

Carson: I guess accuracy trumps in my case. it’s just such a hassle to set up selenium