Text File Parsing in Java?

It sounds like you're doing something wrong to me - a whole lotta object creation going on. How representative is that "test" file? What are you really doing with that data?

If that's typical of what you really have, I'd say there's lots of repetition in that data. If it's all going to be in Strings anyway, start with a BufferedReader to read each line. Pre-allocate that List to a size that's close to what you need so you don't waste resources adding to it each time.

Split each of those lines at the comma; be sure to strip off the double quotes. You might want to ask yourself: "Why do I need this whole file in memory all at once?" Can you read a little, process a little, and never have the whole thing in memory at once?
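A minimal sketch of the "read a little, process a little" idea, assuming Java 7 or later and assuming fields never contain embedded commas or escaped quotes. The class name QuotedLineSplitter, its method names, and the handler callback are hypothetical, chosen only for illustration; they are not from the original question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

class QuotedLineSplitter {

    // Splits one line at each comma and strips one pair of surrounding
    // double quotes from every field.
    // Assumption: no embedded commas or escaped quotes inside a field.
    static String[] splitLine(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        for (int i = 0; i < fields.length; i++) {
            String f = fields[i].trim();
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                f = f.substring(1, f.length() - 1);
            }
            fields[i] = f;
        }
        return fields;
    }

    // Reads one line at a time and hands its fields to the handler, so only
    // the current line is held by this code. Returns the number of lines read.
    static int processLines(Reader source, Consumer<String[]> handler) throws IOException {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(splitLine(line));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a FileReader on the real file.
        Reader sample = new StringReader("\"a\",\"b\"\n\"c\",\"d\"\n");
        processLines(sample, fields -> System.out.println(String.join("|", fields)));
    }
}
```

Whether this helps depends on whether the processing step really can work on one line at a time; if every row must be kept, the handler could just as well append to a list that was sized up front.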

Only you know your problem well enough to answer. Maybe you can fire up jvisualvm if you have JDK 6 and see what's going on with memory. That would be a great clue.

The way the questioner is doing it appears to create one large char[] (in a String) and then Strings which are slices of that, which surprisingly is actually the uber memory-efficient way of doing it. (I have not checked the implementation of split. Of course, it is all implementation dependent.) – Tom Hawtin - tackline May 21 '09 at 0:29

You are correct on "uber efficient", Tom. My advice would actually make it worse. If the problem persists, I think it's processing on the fly and jvisualvm that will help the most. – duffymo May 21 '09 at 21:59

I'm not sure how efficient it is memory-wise, but my first approach would be using a Scanner, as it is incredibly easy to use:

    // needs java.io.File and java.util.Scanner
    File file = new File("/path/to/my/file.txt");
    Scanner input = new Scanner(file); // throws FileNotFoundException if the file is missing
    while (input.hasNext()) {
        String nextToken = input.next();
        // or, to process line by line, use this instead:
        // String nextLine = input.nextLine();
    }
    input.close();

Check the API for how to alter the delimiter it uses to split tokens.
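As a rough illustration of altering the delimiter, Scanner.useDelimiter accepts a regular expression. The sketch below assumes simple input with no quoted fields and no empty fields; the class name ScannerDelimiterSketch and the method tokens are made up for this example, and a String source stands in for the File.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerDelimiterSketch {

    // Returns every token, treating a comma or a line break as the separator.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner input = new Scanner(text)) {
            input.useDelimiter(",|\\r?\\n"); // comma, or LF / CRLF line ending
            while (input.hasNext()) {
                out.add(input.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a,b\nc,d"));
    }
}
```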

It sounds like you currently have 3 copies of the entire file in memory: the byte array, the string, and the array of the lines. Instead of reading the bytes into a byte array and then converting to characters using new String(), it would be better to use an InputStreamReader, which will convert to characters incrementally rather than all up front. Also, instead of using String.split("\n") to get the individual lines, you should read one line at a time. You can use the readLine() method in BufferedReader. Try something like this:

    BufferedReader reader = new BufferedReader(new InputStreamReader(fileInputStream, "UTF-8"));
    try {
        while (true) {
            String line = reader.readLine();
            if (line == null) break; // end of file
            String[] fields = line.split(",");
            // process fields here
        }
    } finally {
        reader.close();
    }

The original way, the Strings (should) all share the same backing char[], and therefore be more efficient. A line split probably isn't too bad, because there will just be one char[] per line. – Tom Hawtin - tackline May 21 '09 at 0:45

(And the byte array doesn't need to be in memory at the same time as the array of lines.) – Tom Hawtin - tackline May 21 '09 at 0:46

I was starting to feel like I was having too many copies of the file contents in memory. I will try this out and see the difference. – brock May 21 '09 at 3:28

If you have a 200,000,000 character file and split that every five characters, you have 40,000,000 String objects. Assume they are sharing actual character data with the original 400 MB String (a char is 2 bytes). A String is, say, 32 bytes, so that is 1,280,000,000 bytes of String objects. (It's probably worth noting that this is very implementation dependent. split could create strings with entirely new backing char[] or, OTOH, share some common String values. Some Java implementations do not use the slicing of char[]. Some may use a UTF-8-like compact form and give very poor random access times.) Even assuming longer strings, that's a lot of objects.
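The arithmetic above can be reproduced with a few lines of Java. This is only a back-of-the-envelope sketch: the 32 bytes per String is the estimate used in this answer, not a measured value, and the class and method names are hypothetical.

```java
class StringOverheadEstimate {

    // Number of substrings when the text is cut into fixed-size pieces.
    static long pieceCount(long totalChars, long charsPerPiece) {
        return totalChars / charsPerPiece;
    }

    // Bytes spent on the String objects alone, excluding the character data.
    static long objectBytes(long pieces, long bytesPerString) {
        return pieces * bytesPerString;
    }

    // Bytes of character data, assuming 2 bytes per char.
    static long charDataBytes(long totalChars) {
        return totalChars * 2L;
    }

    public static void main(String[] args) {
        long pieces = pieceCount(200_000_000L, 5);
        System.out.println(pieces);                      // 40000000
        System.out.println(objectBytes(pieces, 32));     // 1280000000
        System.out.println(charDataBytes(200_000_000L)); // 400000000
    }
}
```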

With that much data, you probably want to work with most of it in compact form like the original (only with indexes). Convert to objects only what you need. The implementation should be database-like (although databases traditionally don't handle variable-length strings efficiently).
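One way to read "compact form with indexes" is to keep the characters in a single char[] and record only the offset where each field starts, creating a String for a field on demand. The sketch below is an illustration under that assumption, not the answerer's implementation; the class name IndexedFields and its methods are hypothetical, and it handles a single separator character with no quoting.

```java
import java.util.Arrays;

class IndexedFields {
    private final char[] data;          // all character data, stored once
    private int[] starts = new int[16]; // start offset of each field
    private int count;

    IndexedFields(char[] data, char separator) {
        this.data = data;
        addStart(0);
        for (int i = 0; i < data.length; i++) {
            if (data[i] == separator) {
                addStart(i + 1); // next field begins right after the separator
            }
        }
    }

    private void addStart(int offset) {
        if (count == starts.length) {
            starts = Arrays.copyOf(starts, count * 2);
        }
        starts[count++] = offset;
    }

    int size() {
        return count;
    }

    // Builds a String for field i only when it is asked for.
    String get(int i) {
        int start = starts[i];
        int end = (i + 1 < count) ? starts[i + 1] - 1 : data.length;
        return new String(data, start, end - start);
    }

    public static void main(String[] args) {
        IndexedFields fields = new IndexedFields("ab,cd,,e".toCharArray(), ',');
        for (int i = 0; i < fields.size(); i++) {
            System.out.println(i + ": \"" + fields.get(i) + "\"");
        }
    }
}
```

The per-field cost here is one int (4 bytes) rather than a whole String object, at the price of creating a new String each time a field is read.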

I suggest you consider one of the many, many, MANY CSV parsers already written: google.co.uk/search?q=java+CSV+parser (6,150,000 hits).

Have a look at this page. It contains many open source CSV parsers. JSaPar is one of them.

Java Open Source libraries.

