|
|||
|
It seems to me the usual XML tools in Java load the entire XML file
into RAM. Are there any tools that process sequentially, bringing in
only a chunk at a time, so you could handle really fat files?

--
Roedy Green Canadian Mind Products
http://mindprod.com
Every compilable program in a sense works. The problem is with your unrealistic expectations of what it will do.
|
|||
|
On 7.2.2010 19:59, Roedy Green wrote:
> It seems to me the usual XML tools in Java load the entire XML file
> into RAM. Are there any tools that process sequentially, bringing in
> only a chunk at a time so you could handle really fat files.

Java has tools for such XML files. SAX processes XML so that it does not need to load it all into memory.

--
Good day for a change of scene. Repaper the bedroom wall.
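
For readers who have not used SAX, here is a minimal sketch of the callback style described above. The class name `SaxItemCounter`, the method `countElements`, and the sample document are made up for illustration; only the `javax.xml.parsers` and `org.xml.sax` calls come from the JDK. The parser pushes one event per tag to the handler, so the only state kept here is a counter.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxItemCounter {

    /** Counts elements with the given name while the parser streams through the input. */
    static int countElements(InputStream in, String name) throws Exception {
        final int[] count = {0};
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Called once per start tag; no document tree is built.
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input; a large file would be passed as a FileInputStream.
        String xml = "<items><item/><item/><other/><item/></items>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "item")); // prints 3
    }
}
```

For a really fat file, the same method would take a `FileInputStream` instead of the in-memory sample used here.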
|
|||
|
In article <3qvtm5h7bf92h7nos1nms4oc4m6cd203d6@4ax.com>,
Roedy Green <see_website@mindprod.com.invalid> wrote:
> It seems to me the usual XML tools in Java load the entire XML file
> into RAM. Are there any tools that process sequentially, bringing in
> only a chunk at a time so you could handle really fat files.

I thought that was a principal advantage of the Simple API For XML (SAX) model, at least in principle. :-)

<http://www.totheriver.com/learn/xml/xmltutorial.html>

--
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>
|
|||
|
Roedy Green wrote:
> It seems to me the usual XML tools in Java load the entire XML file
> into RAM. Are there any tools that process sequentially, bringing in
> only a chunk at a time so you could handle really fat files.

Sounds like you want the XMLStreamReader interface:
http://java.sun.com/javase/6/docs/ap...eamReader.html

I haven't used the Java version myself (there's a similar type in .NET), and haven't looked closely to determine the specifics. But I presume there's a way to get an implementation of the interface (looks like XMLInputFactory is the way to go).

Of course, if per a previous discussion you're stuck on Java 1.5, this is unavailable to you. But otherwise, you should find it exactly what you're asking for.

Pete
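
A minimal sketch of the pull style mentioned above, assuming Java 6 or later. `StaxTitles`, `readTitles`, and the sample `<books>` document are hypothetical names used only for illustration; `XMLInputFactory` and `XMLStreamReader` are the JDK types. Unlike SAX, the calling code asks for the next event itself.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxTitles {

    /** Pulls events one at a time and collects the text of every title element. */
    static List<String> readTitles(Reader source) throws Exception {
        List<String> titles = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
        try {
            while (reader.hasNext()) {
                // next() advances to the following event; only that event is in memory.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("title")) {
                    // getElementText() reads up to the matching end tag of a text-only element.
                    titles.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample input; a large file would be read through a FileReader or stream.
        String xml = "<books><book><title>A</title></book><book><title>B</title></book></books>";
        System.out.println(readTitles(new StringReader(xml))); // prints [A, B]
    }
}
```

The list is only there to make the result visible; with a huge input you would handle each title as it is read instead of collecting them.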
|
|||
|
On 7.2.2010 20:14, Peter Duniho wrote:
> Roedy Green wrote:
>> It seems to me the usual XML tools in Java load the entire XML file
>> into RAM. Are there any tools that process sequentially, bringing in
>> only a chunk at a time so you could handle really fat files.
>
> Sounds like you want the XMLStreamReader interface:
> http://java.sun.com/javase/6/docs/ap...eamReader.html
>
> I haven't used the Java version myself (there's a similar type in .NET),
> and haven't looked closely to determine the specifics. But I presume
> there's a way to get an implementation of the interface (looks like
> XMLInputFactory is the way to go).
>
> Of course, if per a previous discussion you're stuck on Java 1.5, this
> is unavailable to you. But otherwise, you should find it exactly what
> you're asking for.
>
> Pete

SAX interface works fine even with Java 1.4, and it does what Roedy wants.

--
Good day for a change of scene. Repaper the bedroom wall.
|
|||
|
On 2/7/2010 1:20 PM, Donkey Hottie wrote:
> On 7.2.2010 20:14, Peter Duniho wrote:
>> Roedy Green wrote:
>>> It seems to me the usual XML tools in Java load the entire XML file
>>> into RAM. Are there any tools that process sequentially, bringing in
>>> only a chunk at a time so you could handle really fat files.
>>
>> Sounds like you want the XMLStreamReader interface:
>> http://java.sun.com/javase/6/docs/ap...eamReader.html
>>
>> I haven't used the Java version myself (there's a similar type in .NET),
>> and haven't looked closely to determine the specifics. But I presume
>> there's a way to get an implementation of the interface (looks like
>> XMLInputFactory is the way to go).
>>
>> Of course, if per a previous discussion you're stuck on Java 1.5, this
>> is unavailable to you. But otherwise, you should find it exactly what
>> you're asking for.
>>
>> Pete
>
> SAX interface works fine even with Java 1.4, and it does what Roedy wants.

It's been around since Java 1.2; it better work with 1.4.

--
Lew
|
|||
|
Roedy Green wrote:
>> It seems to me the usual XML tools in Java load the entire XML file
>> into RAM. Are there any tools that process sequentially, bringing in
>> only a chunk at a time so you could handle really fat files.

Donkey Hottie wrote:
> Java has tools for such XML files. SAX processes XML so that it does not
> need to load it all to memory.

I first used SAX for XML parsing in early 1999. There's nothing new about it.

SAX, and its equally handy StAX sibling, are perfect for single-pass, very-high-speed, memory-parsimonious handling of XML documents.

Roedy has an interesting definition of "usual XML tools", since he's ignoring two out of three interfaces, including one that's been around nearly forever.

--
Lew
|
|||
|
On 07-02-2010 15:31, Lew wrote:
> On 2/7/2010 1:20 PM, Donkey Hottie wrote:
>> On 7.2.2010 20:14, Peter Duniho wrote:
>>> Roedy Green wrote:
>>>> It seems to me the usual XML tools in Java load the entire XML file
>>>> into RAM. Are there any tools that process sequentially, bringing in
>>>> only a chunk at a time so you could handle really fat files.
>>>
>>> Sounds like you want the XMLStreamReader interface:
>>> http://java.sun.com/javase/6/docs/ap...eamReader.html
>>>
>>> I haven't used the Java version myself (there's a similar type in .NET),
>>> and haven't looked closely to determine the specifics. But I presume
>>> there's a way to get an implementation of the interface (looks like
>>> XMLInputFactory is the way to go).
>>>
>>> Of course, if per a previous discussion you're stuck on Java 1.5, this
>>> is unavailable to you. But otherwise, you should find it exactly what
>>> you're asking for.
>>>
>>> Pete
>>
>> SAX interface works fine even with Java 1.4, and it does what Roedy
>> wants.
>
> It's been around since Java 1.2; it better work with 1.4.

Yes and no.

SAX was added to the Java API in 1.4.

The JAXP API, including SAX, existed before Java 1.4, and libraries implementing it could be downloaded separately. I have done the latter for Java 1.3, and it may have existed already for 1.2.

Arne
|
|||
|
Arne Vajhøj wrote:
> On 07-02-2010 12:59, Roedy Green wrote:
>> It seems to me the usual XML tools in Java load the entire XML file
>> into RAM.
>
> ????
>
> W3CDOM and JAXB do load all data in memory.
>
> SAX and StAX do not load all data in memory.

If you use XSLT to process an XML file, it has to keep a complete representation of the resulting XML document in memory, since an XSLT transformation can include XPath expressions, and XPath can in principle access anything in the document. This is true even if the input to XSLT is a SAXSource.
|
|||
|
On 07-02-2010 16:37, Mike Schilling wrote:
> Arne Vajhøj wrote:
>> On 07-02-2010 12:59, Roedy Green wrote:
>>> It seems to me the usual XML tools in Java load the entire XML file
>>> into RAM.
>>
>> ????
>>
>> W3CDOM and JAXB do load all data in memory.
>>
>> SAX and StAX do not load all data in memory.
>
> If you use XSLT to process an XML file, it has to keep a complete
> representation of the resulting XML document in memory, since an XSLT
> transformation can include XPath expressions, and XPath can in principle
> access anything in the document. This is true even if the input to XSLT is
> a SAXSource.

True.

But that problem is very hard to solve.

Arne
|
|||
|
On Sun, 7 Feb 2010, Mike Schilling wrote:
> Arne Vajhøj wrote:
>> On 07-02-2010 12:59, Roedy Green wrote:
>>> It seems to me the usual XML tools in Java load the entire XML file
>>> into RAM.
>>
>> ????
>>
>> W3CDOM and JAXB do load all data in memory.
>>
>> SAX and StAX do not load all data in memory.
>
> If you use XSLT to process an XML file, it has to keep a complete
> representation of the resulting XML document in memory, since an XSLT
> transformation can include XPath expressions, and XPath can in principle
> access anything in the document. This is true even if the input to
> XSLT is a SAXSource.

Weeeellll, kinda. Some XSLTs will require the whole document to be held in memory. But it is possible to process some XSLTs in a streaming or streaming-ish manner (where elements are held in memory, but only a subset at a time). There's nothing stopping an XSLT processor compiling such XSLTs into a form which does just that. Whether any actually do, i don't know.

A while ago, i read about a streaming XPath processor. It couldn't handle all XPaths in a streaming manner, so it had to fall back to searching an in-memory tree where that was the case, but many common XPaths can be handled streamingly. For instance, something like:

  //order[@id='99']/order-item

Could be. You run the parse, and maintain the current stack of elements in memory - all the elements enclosing the current parse point, IYSWIM. Then you just look at the top of the stack at every point to see if it's an order-item, then if it is, look back to see if the enclosing order has an id of 99. You could probably do it more efficiently than that, but that's one way you could do it.

Something like this:

  //order[customer[@id='99']]/order-item

Is more challenging, and requires a more sophisticated evaluation strategy - you might need to read in a whole order, search it for matching order-items, then throw it away and move on to the next one. Or, if you knew from the DTD that the customer element had to come before any order-items in an order, you could build a state machine that could decide that it was inside a matching order, and then report all order-items.

Anyway, all speculation, but it's interesting stuff!

tom

--
Dreams are not covered by any laws. They can be about anything. -- Cmdr Zorg
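
The stack idea for the first expression, `//order[@id='99']/order-item`, could look roughly like the sketch below. It is one possible reading of the description, not the processor mentioned in the post. `StreamingOrderItems`, `matchingSkus`, the `sku` attribute, and the sample document are assumptions made up for illustration. Each `order-item` is checked as its start tag arrives, against the element currently on top of the stack, which is its parent.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class StreamingOrderItems {

    /** One currently open element: its name and its id attribute (null if absent). */
    private static final class Open {
        final String name;
        final String id;
        Open(String name, String id) { this.name = name; this.id = id; }
    }

    /** Returns the (hypothetical) sku attribute of each //order[@id=orderId]/order-item. */
    static List<String> matchingSkus(InputSource source, String orderId) throws Exception {
        List<String> skus = new ArrayList<>();
        Deque<Open> stack = new ArrayDeque<>(); // only the enclosing elements are held
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                Open parent = stack.peek(); // element directly enclosing this one
                if (qName.equals("order-item") && parent != null
                        && parent.name.equals("order") && orderId.equals(parent.id)) {
                    skus.add(atts.getValue("sku"));
                }
                stack.push(new Open(qName, atts.getValue("id")));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                stack.pop(); // this element is closed, so it no longer encloses anything
            }
        });
        return skus;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample: sku 'x' sits inside <note>, so it is not a child of the order.
        String xml = "<orders>"
                + "<order id='98'><order-item sku='a'/></order>"
                + "<order id='99'><order-item sku='b'/>"
                + "<note><order-item sku='x'/></note><order-item sku='c'/></order>"
                + "</orders>";
        System.out.println(matchingSkus(new InputSource(new StringReader(xml)), "99")); // prints [b, c]
    }
}
```

Memory use grows with nesting depth rather than document size. The second expression, with the `customer` predicate, cannot be decided at the `order` start tag, so it would need the buffering or state-machine strategy described above.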
|
|||
|
On Sun, 7 Feb 2010, Roedy Green wrote:
> It seems to me the usual XML tools in Java load the entire XML file into
> RAM. Are there any tools that process sequentially, bringing in only a
> chunk at a time so you could handle really fat files.

What do you mean by 'tools'?

tom

--
Dreams are not covered by any laws. They can be about anything. -- Cmdr Zorg
|
|||
|
Tom Anderson wrote:
> On Sun, 7 Feb 2010, Mike Schilling wrote:
>
>> Arne Vajhøj wrote:
>>> On 07-02-2010 12:59, Roedy Green wrote:
>>>> It seems to me the usual XML tools in Java load the entire XML file
>>>> into RAM.
>>>
>>> ????
>>>
>>> W3CDOM and JAXB do load all data in memory.
>>>
>>> SAX and StAX do not load all data in memory.
>>
>> If you use XSLT to process an XML file, it has to keep a complete
>> representation of the resulting XML document in memory, since an
>> XSLT transformation can include XPath expressions, and XPath can in
>> principle access anything in the document. This is true even if
>> the input to XSLT is a SAXSource.
>
> Weeeellll, kinda. Some XSLTs will require the whole document to be
> held in memory. But it is possible to process some XSLTs in a
> streaming or streaming-ish manner (where elements are held in memory,
> but only a subset at a time). There's nothing stopping an XSLT
> processor compiling such XSLTs into a form which does just that.
> Whether any actually do, i don't know.

Xalan (the XSLT processor in the JDK) doesn't.
|
|||
|
On 07-02-2010 17:25, Tom Anderson wrote:
> On Sun, 7 Feb 2010, Mike Schilling wrote:
>> Arne Vajhøj wrote:
>>> On 07-02-2010 12:59, Roedy Green wrote:
>>>> It seems to me the usual XML tools in Java load the entire XML file
>>>> into RAM.
>>>
>>> ????
>>>
>>> W3CDOM and JAXB do load all data in memory.
>>>
>>> SAX and StAX do not load all data in memory.
>>
>> If you use XSLT to process an XML file, it has to keep a complete
>> representation of the resulting XML document in memory, since an
>> XSLT transformation can include XPath expressions, and XPath can in
>> principle access anything in the document. This is true even if the
>> input to XSLT is a SAXSource.
>
> Weeeellll, kinda. Some XSLTs will require the whole document to be held
> in memory. But it is possible to process some XSLTs in a streaming or
> streaming-ish manner (where elements are held in memory, but only a
> subset at a time). There's nothing stopping an XSLT processor compiling
> such XSLTs into a form which does just that. Whether any actually do, i
> don't know.
>
> A while ago, i read about a streaming XPath processor. It couldn't
> handle all XPaths in a streaming manner, so it had to fall back to
> searching an in-memory tree where that was the case, but many common
> XPaths can be handled streamingly. For instance, something like:
>
> //order[@id='99']/order-item
>
> Could be. You run the parse, and maintain the current stack of elements
> in memory - all the elements enclosing the current parse point, IYSWIM.
> Then you just look at the top of the stack at every point to see if it's
> an order-item, then if it is, look back to see if the enclosing order
> has an id of 99. You could probably do it more efficiently than that,
> but that's one way you could do it. Something like this:
>
> //order[customer[@id='99']]/order-item
>
> Is more challenging, and requires a more sophisticated evaluation
> strategy - you might need to read in a whole order, search it for
> matching order-items, then throw it away and move on to the next one.
> Or, if you knew from the DTD that the customer element had to come
> before any order-items in an order, you could build a state machine that
> could decide that it was inside a matching order, and then report all
> order-items.
>
> Anyway, all speculation, but it's interesting stuff!

Interesting.

But for code written today against the standard XML libraries, it is safe to assume that XSLT will read it all into memory.

Arne
|
|