Explore moving to fast_xml completely #16

Closed
oubiwann opened this issue Aug 12, 2023 · 4 comments
@oubiwann

With ticket #15 we've needed to start using fast_xml. This ticket considers what it would take to move lxml entirely to fast_xml.

@oubiwann

Test XML file:

<?xml version="1.0"?>
<customers>
   <customer id="55000">
      <name>Charter Group</name>
      <address>
         <street>100 Main</street>
         <city>Framingham</city>
         <state>MA</state>
         <zip>01701</zip>
      </address>
      <address>
         <street>720 Prospect</street>
         <city>Framingham</city>
         <state>MA</state>
         <zip>01701</zip>
      </address>
      <address>
         <street>120 Ridge</street>
         <state>MA</state>
         <zip>01760</zip>
      </address>
   </customer>
</customers>

As parsed by erlsom:simple_form:

#(ok
  #("customers"
    ()
    (#("customer"
       (#("id" "55000"))
       (#("name" () ("Charter Group"))
        #("address"
          ()
          (#("street" () ("100 Main"))
           #("city" () ("Framingham"))
           #("state" () ("MA"))
           #("zip" () ("01701"))))
        #("address"
          ()
          (#("street" () ("720 Prospect"))
           #("city" () ("Framingham"))
           #("state" () ("MA"))
           #("zip" () ("01701"))))
        #("address"
          ()
          (#("street" () ("120 Ridge"))
           #("state" () ("MA"))
           #("zip" () ("01760"))))))))
  "\n")

As parsed by fast_xml:

#(xmlel
  #"customers"
  ()
  (#(xmlcdata #"\n   ")
   #(xmlel
     #"customer"
     (#(#"id" #"55000"))
     (#(xmlcdata #"\n      ")
      #(xmlel #"name" () (#(xmlcdata #"Charter Group")))
      #(xmlcdata #"\n      ")
      #(xmlel
        #"address"
        ()
        (#(xmlcdata #"\n         ")
         #(xmlel #"street" () (#(xmlcdata #"100 Main")))
         #(xmlcdata #"\n         ")
         #(xmlel #"city" () (#(xmlcdata #"Framingham")))
         #(xmlcdata #"\n         ")
         #(xmlel #"state" () (#(xmlcdata #"MA")))
         #(xmlcdata #"\n         ")
         #(xmlel #"zip" () (#(xmlcdata #"01701")))
         #(xmlcdata #"\n      ")))
      #(xmlcdata #"\n      ")
      #(xmlel
        #"address"
        ()
        (#(xmlcdata #"\n         ")
         #(xmlel #"street" () (#(xmlcdata #"720 Prospect")))
         #(xmlcdata #"\n         ")
         #(xmlel #"city" () (#(xmlcdata #"Framingham")))
         #(xmlcdata #"\n         ")
         #(xmlel #"state" () (#(xmlcdata #"MA")))
         #(xmlcdata #"\n         ")
         #(xmlel #"zip" () (#(xmlcdata #"01701")))
         #(xmlcdata #"\n      ")))
      #(xmlcdata #"\n      ")
      #(xmlel
        #"address"
        ()
        (#(xmlcdata #"\n         ")
         #(xmlel #"street" () (#(xmlcdata #"120 Ridge")))
         #(xmlcdata #"\n         ")
         #(xmlel #"state" () (#(xmlcdata #"MA")))
         #(xmlcdata #"\n         ")
         #(xmlel #"zip" () (#(xmlcdata #"01760")))
         #(xmlcdata #"\n      ")))
      #(xmlcdata #"\n   ")))
   #(xmlcdata #"\n")))

@oubiwann

oubiwann commented Aug 14, 2023

That's a little cumbersome for fast_xml ... but with some hacking and cleanup, I can get results like this:

#(customers
  #M()
  (#(customer
     #M(#"id" #"55000")
     (#(name #M() (#"Charter Group"))
      #(address
        #M()
        (#(street #M() (#"100 Main"))
         #(city #M() (#"Framingham"))
         #(state #M() ())
         #(zip #M() (#"01701"))))
      #(address
        #M()
        (#(street #M() (#"720 Prospect"))
         #(city #M() (#"Framingham"))
         #(state #M() ())
         #(zip #M() (#"01701"))))
      #(address
        #M()
        (#(street #M() (#"120 Ridge"))
         #(state #M() ())
         #(zip #M() (#"01760"))))))))

Note: this comment was left in draft mode for a day or more. Since this hack, support for this has landed in lxml.
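
For readers who want to see the shape of that cleanup, here is a rough, language-neutral sketch in Python. It is not the code that landed in lxml. It assumes the fast_xml `xmlel` and `xmlcdata` tuples are modelled as plain Python tuples with binaries as `bytes`, and the function name `simplify` is hypothetical. It drops whitespace-only character data between elements, turns the attribute list into a map, and uses the tag name as a plain string in place of an atom.

```python
def simplify(node):
    """Turn a fast_xml-style node into a (tag, attrs, children) tuple.

    Assumed input shapes (modelled on the dump above, not a real API):
      ("xmlel", name_bytes, [(key_bytes, value_bytes), ...], [child, ...])
      ("xmlcdata", text_bytes)
    """
    if node[0] == "xmlcdata":
        # Character data is kept as the raw binary.
        return node[1]

    _, name, attrs, children = node
    kept = []
    for child in children:
        # Skip the indentation/newline text that sits between elements.
        if child[0] == "xmlcdata" and not child[1].strip():
            continue
        kept.append(simplify(child))

    # Tag name as a string (standing in for an atom), attributes as a map.
    return (name.decode("utf-8"), dict(attrs), kept)


# A small fragment of the customers document from earlier in the thread.
fragment = (
    "xmlel", b"customer", [(b"id", b"55000")],
    [
        ("xmlcdata", b"\n      "),
        ("xmlel", b"name", [], [("xmlcdata", b"Charter Group")]),
        ("xmlcdata", b"\n   "),
    ],
)

simplified = simplify(fragment)
```

With this sketch, `simplified` has the same three-part layout as the hacked output: tag, attribute map, then a list of children with the whitespace nodes gone.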

@oubiwann

I think it's fair to say that once streaming support is added, we're on our way.

Probably 0.5.0 can be the release where erlsom is dropped and fast_xml is used exclusively.

@oubiwann

Closing this as "successfully explored". Barring any insurmountable difficulties with streaming support, we'll be moving forward on a switch to fast_xml.

@oubiwann oubiwann added this to the 0.4.0 milestone Aug 14, 2023