How to convert XML files to text files [closed]
I have around 8,000 XML files that need to be converted into text files. Each text file must contain the title, description and keywords of the XML file, without the tags and without any other elements or attributes. In other words, I need to create 8,000 text files containing the title, description and keywords of the corresponding XML files. I need code to do this systematically. Any help would be greatly appreciated. Thanks in advance.
Going from XML to text smells like a job for XSLT: it's an XML-based transformation language that can take XML input and convert it to any text-based output.
You can read up on XSLT on lots of websites; one of the better tutorials is the W3Schools one.
Since you didn’t post any sample XML, I have no clue what your XML looks like, and also no idea what your output should be. But assuming it would look something like:
you could easily write an XSLT transformation to turn that into
YourTextFile.txt
or whatever other format you are looking for.
My suggestion would be to use Python. You can use the interpreter to try out the pattern while you are setting it up; the command line goes a long way toward getting this sort of thing set up properly. Assuming the XML is valid, this should give you the most flexibility with the least hassle.
So, assuming the following XML format:
and assuming the output of each document should be:
The Python code might look something like:
from which you could generate a batch file to update the files regularly (assuming a Windows environment, though Python runs on any platform).
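As a minimal sketch of this approach, using only the standard library: the element names `title`, `description` and `keywords`, the sample document, and the `name: value` output layout are all assumptions, since no sample XML was posted. Namespaced documents would need the namespace included in the tag names.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; adjust them to match the real documents.
FIELDS = ("title", "description", "keywords")

def xml_to_text(xml_source: str) -> str:
    """Return one 'name: value' line for each wanted element that has text."""
    root = ET.fromstring(xml_source)
    lines = []
    for name in FIELDS:
        # iter() finds the element at any depth; itertext() drops nested tags.
        for element in root.iter(name):
            text = " ".join("".join(element.itertext()).split())
            if text:
                lines.append(f"{name}: {text}")
    return "\n".join(lines)

# Made-up sample document, for illustration only.
sample = """<document id="1">
  <title>Sample title</title>
  <description>A <b>short</b> description.</description>
  <keywords>xml, text, conversion</keywords>
  <body>Ignored content</body>
</document>"""

print(xml_to_text(sample))
```

Attributes and every element not listed in `FIELDS` are left out of the result, which matches what the question asks for.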
There are a couple of possibilities. If it is simple XML you can read it like any other text file, filter out the angle brackets and add in your own strategically-placed punctuation. Or, you can open up an XML reader and a text writer, and output it any way you want.
If you read the file names from the folder into a collection, you can loop through them and process all of the files automatically.
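The reader-and-writer loop could be sketched roughly as follows in Python. The function name `convert_folder`, the folder layout and the choice to write all character data (rather than selected elements) are assumptions for illustration; files that fail to parse are collected instead of stopping the run.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def convert_folder(src_dir, dst_dir):
    """Write one .txt per .xml file; return the names of files that failed to parse."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    failed = []
    for xml_path in sorted(src.glob("*.xml")):
        try:
            root = ET.parse(xml_path).getroot()
        except ET.ParseError:
            failed.append(xml_path.name)  # skip malformed files and keep going
            continue
        # itertext() yields only character data, so tags and attributes are dropped.
        chunks = (" ".join(piece.split()) for piece in root.itertext())
        text = "\n".join(chunk for chunk in chunks if chunk)
        (dst / (xml_path.stem + ".txt")).write_text(text + "\n", encoding="utf-8")
    return failed

# Self-contained demonstration in a temporary directory with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "in")
    src.mkdir()
    (src / "a.xml").write_text(
        "<doc><title>First</title><keywords>x, y</keywords></doc>", encoding="utf-8"
    )
    (src / "b.xml").write_text("<doc><title>Broken</doc>", encoding="utf-8")
    failed = convert_folder(src, Path(tmp, "out"))
    result = Path(tmp, "out", "a.txt").read_text(encoding="utf-8")
    print(result, failed)
```

Returning the list of failed files makes it easy to check a run over thousands of documents without reading every output file.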
I've had similar issues when I copied text messages from my phone to a file. The file was in .xml format and had symbols and characters between each word that I wanted to edit out. So I downloaded Notepad++ and opened the .xml file in it. Say you want to delete all instances of (sample text). You highlight (sample text) and then click the Replace icon (it's a blue b→a icon in the toolbar at the top). The highlighted text will appear in the "Find what" field; leave the "Replace with" field blank and choose Replace All, and it will get rid of all instances of (sample text). Do that for all the symbols and text, replacing them with whatever you want or whatever they should be. I had over 4800 lines and it worked great.
How to format XML document in Linux
I have a large number of XML tags like the following.
I want to get this formatted in the following manner. I have tried using xmllint but it is not working for me. Please provide help.
For tab indentation:
For four-space indentation:
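The commands from this answer are not shown above. As a stand-in, and not the answer's own method, here is a minimal Python sketch that produces either indentation style with the standard library's `xml.dom.minidom`. The helper names `strip_blank_text` and `pretty` and the sample input are assumptions for illustration.

```python
from xml.dom import minidom

def strip_blank_text(node):
    """Remove whitespace-only text nodes so the old layout does not leak into the output."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_blank_text(child)

def pretty(xml_source: str, indent: str = "\t") -> str:
    """Re-indent an XML string using the given indent unit."""
    doc = minidom.parseString(xml_source)
    strip_blank_text(doc)
    return doc.toprettyxml(indent=indent)

# Made-up unformatted input.
raw = "<root><item><name>a</name></item><item><name>b</name></item></root>"

print(pretty(raw, indent="\t"))    # tab indentation
print(pretty(raw, indent="    "))  # four-space indentation
```

Stripping whitespace-only text nodes first matters when the input is already partly indented; otherwise `toprettyxml` adds its own line breaks on top of the existing ones.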
Without programming, you can use the Eclipse XML Source Editor. Have a look at this answer.
By the way, have you tried xmllint --format --recover nonformatted.xml > formated.xml?
EDIT:
I do it from gedit. In gedit, you can add any script, in particular a Python script, as an External Tool. The script reads data from stdin and writes output to stdout, so it may be used as a stand-alone program. It lays out the XML and sorts child nodes.
There are two tricky things:
By default, the parser does not ignore whitespace between elements, which may produce a strange result.
Again, by default, the output is not pretty-printed either.
I configure this tool to work on the current selection and to replace the current selection, because the same file usually also contains HTTP headers. YMMV.
If you do not need child node sorting, just comment the corresponding line out.
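The original script, which relies on lxml, is not shown above. The sketch below is a stand-in written against the standard library's `xml.etree.ElementTree` rather than lxml, so it illustrates the same idea (stdin to stdout, indentation, optional child sorting) and is not the author's code. The names `normalize` and `format_xml`, the sort key (tag name) and the two-space indent are assumptions.

```python
import sys
import xml.etree.ElementTree as ET

def normalize(element, sort_children=True, level=0, indent="  "):
    """Optionally sort children by tag, then rewrite blank whitespace as indentation."""
    children = list(element)
    if sort_children:
        children.sort(key=lambda child: child.tag)  # stable: equal tags keep their order
        element[:] = children
    if not children:
        return
    pad = "\n" + indent * level
    # Blank text between elements is layout, not data, so it is safe to replace.
    if not (element.text or "").strip():
        element.text = pad + indent
    for position, child in enumerate(children):
        normalize(child, sort_children, level + 1, indent)
        if not (child.tail or "").strip():
            is_last = position == len(children) - 1
            child.tail = pad if is_last else pad + indent

def format_xml(source: str, sort_children: bool = True) -> str:
    root = ET.fromstring(source)
    normalize(root, sort_children)
    return ET.tostring(root, encoding="unicode") + "\n"

demo = "<root><b>2</b><a><z/><y/></a></root>"
print(format_xml(demo))

# Act as a filter only when input is actually piped in.
if __name__ == "__main__" and sys.stdin is not None and not sys.stdin.isatty():
    data = sys.stdin.read()
    if data.strip():
        sys.stdout.write(format_xml(data))
```

Passing `sort_children=False` keeps the original child order, which plays the role of commenting out the sorting line. Note that the standard-library parser drops comments and that elements with mixed text content are only partly re-indented.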
UPDATE v2 places header in front of anything else; fixed spaces
UPDATE getting lxml on Ubuntu 18.04.3 LTS bionic: