Can I export text? - Beginners' Questions

#1

2021-08-12

Perhaps a silly question, but I have a bunch of text fields generated by measuring path lengths. I need to clean them up (remove the units) and add up the numbers.

I currently use a tedious workflow: I copy the SVG source and run regexes over it to strip out the text surrounding the numerical values I'm after. I then save the remaining numbers to a text file and import that into Excel to sum them. Is there a simpler way to extract these text fields as plain text?

Thanks.

#2

Tyler Durden @TylerDurden⚖

2021-08-13

Fast and dirty: save as PDF (with text embedded), open the PDF and select all the text, then copy and paste it into a spreadsheet.

#3

thx-1138 @thx-1138

2021-08-14

Will try that. Thank you.

#4

Xav @Xav👹

2021-08-15

Is the question "how can I export text", or is it "can I sum all the numeric text values in a document"? If it's the latter then a few lines of JavaScript, saved as a bookmarklet in a web browser, could probably do the job. Open the file in the browser, click the bookmarklet, and you get the result.

Even if you need something a bit more complex than that (e.g. a CSV of the values, or only a subset summed) it might still be a simple task in JS. Can you post an example file with some details about what exactly you need to get out at the end of the process?
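As a rough idea of the CSV variant mentioned above, here is a minimal sketch. The helper name toCsv and the choice of columns (each element's id plus its text) are assumptions for illustration only; the real columns would depend on what the file looks like.

```javascript
// Hypothetical helper: turn [{ id, text }] rows into CSV with a header line.
// Every field is quoted, and embedded double quotes are doubled.
function toCsv(rows) {
  const quote = (field) => '"' + String(field).replace(/"/g, '""') + '"';
  const lines = rows.map((row) => [row.id, row.text].map(quote).join(','));
  return ['"id","text"', ...lines].join('\n');
}

// In the browser console, with the SVG loaded: one row per <tspan>.
// (Guarded so the helper can also be tried outside a browser.)
if (typeof document !== 'undefined') {
  const rows = Array.from(document.querySelectorAll('tspan'), (node) => ({
    id: node.id,
    text: node.textContent.trim(),
  }));
  console.log(toCsv(rows));
}
```

The output can be copied from the console and pasted into a .csv file for Excel.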

#5

thx-1138 @thx-1138

2021-08-17

Fair enough @Xav, that is also a valid approach. I'm not as familiar with JavaScript but I'll try to dip my toes into it and see how I can do this. Thanks.

#6

Xav @Xav👹

2021-08-18

If you can post an example file then I'm happy to help with this. For example this bit of code may be all you need (you can try running it in the developer console in the browser with one of your files loaded):

var result = Array.from(document.querySelectorAll('tspan')).reduce((acc, node) => {
  const value = parseFloat(node.innerHTML);
  return acc + (isNaN(value) ? 0 : value);
}, 0);
alert(result);

This includes some rudimentary error checking, but depending on what the text in your files looks like, even that much may not be necessary; or conversely it may require even more error checking to rule out pieces of text that might be interpreted as numbers when they shouldn't be. Without knowing more about the structure of the file, and what exactly you want to achieve, it's hard to say whether using JS would be (relatively) trivial, or would quickly become complex.
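To give an idea of what "more error checking" might look like, here is a sketch that only accepts text consisting of a number followed by an optional unit. It assumes the values look like "12.5 mm" with a dot as the decimal separator, which may not match your files; the names parseMeasurement and sumMeasurements are made up for this example.

```javascript
// Hypothetical helper: return the number from text like "12.5 mm" or "7.5mm".
// Anything that is not "number + optional unit" gives null, so a date such as
// "2021-08-12" is rejected (plain parseFloat would read it as 2021).
function parseMeasurement(text) {
  const match = /^\s*(-?\d+(?:\.\d+)?)\s*[a-zA-Z%]*\s*$/.exec(text);
  return match ? Number(match[1]) : null;
}

// Sum only the strings that parsed as measurements.
function sumMeasurements(texts) {
  return texts
    .map(parseMeasurement)
    .filter((value) => value !== null)
    .reduce((acc, value) => acc + value, 0);
}

// In the browser console, with the SVG loaded.
// (Guarded so the helpers can also be tried outside a browser.)
if (typeof document !== 'undefined') {
  const texts = Array.from(document.querySelectorAll('tspan'), (node) => node.textContent);
  console.log(sumMeasurements(texts));
}
```

Whether this is stricter than you need, or not strict enough, again depends on what the text in the file actually looks like.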

For a lot of these sorts of tasks JS has an edge in that it has good DOM bindings for finding and extracting specific nodes. For example, in the above code, document.querySelectorAll('tspan') is all that's needed to get a handle on every <tspan> element in the document, no matter how deeply nested it is - all the rest is just there to iterate over each node, convert the text to a floating point number, and sum them together.