Java: reading strings from a random access file with buffered input?


    import org.apache.commons.io.input.BoundedInputStream;

    FileInputStream file = new FileInputStream(filename);
    file.skip(pos1);  // note: skip() may skip fewer bytes than requested
    BufferedReader br = new BufferedReader(
        new InputStreamReader(new BoundedInputStream(file, pos2 - pos1))
    );

If you didn't care about pos2, then you wouldn't need Apache Commons IO.

Thanks! It's a pity that the standard Java APIs don't include such functionality, but at least we have a workaround like BoundedInputStream. I've also found out that Google's Guava includes a very similar LimitedInputStream class.

– GreyCat Nov 29 '10 at 20:06.
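For readers who cannot add Commons IO or Guava, the bounding idea can be sketched with the JDK alone. This is a minimal sketch, not the Commons IO implementation: `BoundedRangeDemo`, `LimitedStream` and `readLines` are hypothetical names, and it assumes the file is UTF-8 and that pos1 and pos2 are byte offsets falling on character boundaries.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class BoundedRangeDemo {

    /** Hypothetical helper: exposes at most `limit` bytes of the wrapped stream. */
    static class LimitedStream extends FilterInputStream {
        private long remaining;

        LimitedStream(InputStream in, long limit) {
            super(in);
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) return -1;      // pretend end of stream at the limit
            int b = in.read();
            if (b != -1) remaining--;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            if (remaining <= 0) return -1;
            int n = in.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) remaining -= n;
            return n;
        }

        @Override
        public long skip(long n) throws IOException {
            long skipped = in.skip(Math.max(0, Math.min(n, remaining)));
            remaining -= skipped;
            return skipped;
        }

        @Override
        public boolean markSupported() {
            return false;                       // mark/reset would bypass the counter
        }
    }

    /** Reads the lines stored in the byte range [pos1, pos2) of the file. */
    static List<String> readLines(Path path, long pos1, long pos2) throws IOException {
        List<String> lines = new ArrayList<>();
        try (FileInputStream file = new FileInputStream(path.toFile())) {
            long toSkip = pos1;
            while (toSkip > 0) {                // skip() may skip fewer bytes than asked
                long skipped = file.skip(toSkip);
                if (skipped <= 0) break;
                toSkip -= skipped;
            }
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    new LimitedStream(file, pos2 - pos1), StandardCharsets.UTF_8));
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("bounded", ".txt");
        Files.write(tmp, "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(tmp, 6, 11)); // prints [beta]
        Files.delete(tmp);
    }
}
```

Only the BufferedReader's own buffer is held in memory, so the size of the range does not matter.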

For @Ken Bloom: a very quick go at a Java 7 version. Note: I don't think this is the most efficient way; I'm still getting my head around NIO.2. Oracle has started their tutorial here. Also note that this isn't using Java 7's new ARM (Automatic Resource Management) syntax, which takes care of the exception handling for file-based resources; it wasn't working in the latest OpenJDK build that I have. But if people want to see the syntax, let me know.

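    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.Charset;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    /*
     * Paths uses the default file system; note that no exception is thrown at
     * this stage if the file is missing.
     */
    Path file = Paths.get("C:/Projects/timesheet.txt");
    ByteBuffer readBuffer = ByteBuffer.allocate(readBufferSize);
    FileChannel fc = null;
    try {
        /*
         * newByteChannel returns a SeekableByteChannel; on the default file
         * system it can be cast to FileChannel. In the released Java 7 API the
         * method lives on Files rather than on Path. If you declared an
         * AsynchronousFileChannel instead, you could read and write to that
         * channel simultaneously with multiple threads.
         */
        fc = (FileChannel) Files.newByteChannel(file, StandardOpenOption.READ);
        fc.position(startPosition);
        while (fc.read(readBuffer) != -1) {
            readBuffer.flip();   // switch the buffer from filling to draining
            // Caveat: decoding chunk by chunk can split a multi-byte character
            // that straddles a buffer boundary.
            System.out.println(Charset.forName(encoding).decode(readBuffer));
            readBuffer.clear();  // make the whole buffer available for the next read
        }
    } finally {
        if (fc != null) {
            fc.close();
        }
    }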

– Ken Bloom Nov 29 '10 at 17:04

I've fixed the sample code so the reading works - will figure out the best way to read a line (this is a useful exercise as I'm feeding back API 'issues' to the nio-dev mailing list) – Martijn Verburg Nov 29 '10 at 17:11

I see the answer here javakb.com/Uwe/Forum.aspx/java-programmer/7117/… -- use a java.util.Scanner to operate on the channel – Ken Bloom Nov 29 '10 at 17:14

That would work yes, I just realised I used the regular FileChannel example as opposed to the AsynchronousFileChannel example, so I've adjusted my comments above. It's all powerful stuff, but it still needs some higher level API abstractions to catch on I think. – Martijn Verburg Nov 29 '10 at 17:16

AFAICT, GreyCat still couldn't limit the reader to go no further than pos2 – Ken Bloom Nov 29 '10 at 18:56
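One way to read whole lines from a positioned channel, along the lines of the Scanner suggestion in the comments, is to wrap the channel in a Reader with `Channels.newReader`. The sketch below is only an illustration: `ChannelLineDemo` and `firstLineFrom` are hypothetical names, UTF-8 is assumed, and, as the last comment points out, nothing here stops the reader at pos2.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelLineDemo {

    /** Returns the first line that starts at byte offset startPosition. */
    static String firstLineFrom(Path path, long startPosition) throws IOException {
        try (FileChannel fc = FileChannel.open(path, StandardOpenOption.READ)) {
            fc.position(startPosition);
            // The reader pulls bytes from the channel's current position and
            // handles multi-byte characters that cross its internal buffers.
            BufferedReader br = new BufferedReader(Channels.newReader(fc, "UTF-8"));
            return br.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("channel", ".txt");
        Files.write(tmp, "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(firstLineFrom(tmp, 6)); // prints beta
        Files.delete(tmp);
    }
}
```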

The Java IO API is very flexible. Unfortunately, that flexibility sometimes makes it verbose. The main idea here is that there are many streams, writers and readers that implement the wrapper pattern.

For example, BufferedInputStream wraps any other InputStream. The same applies to output streams. The difference between streams and readers/writers is that streams work with bytes while readers/writers work with characters.
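As a small illustration of that wrapping, the sketch below stacks a byte stream, a buffering stream, a byte-to-character reader and a line reader. The class and method names (`WrapperDemo`, `firstLine`) are made up for the example, and UTF-8 is an assumption.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class WrapperDemo {

    /** Decodes the bytes as UTF-8 and returns the first line. */
    static String firstLine(byte[] data) throws IOException {
        InputStream raw = new ByteArrayInputStream(data);          // bytes
        InputStream buffered = new BufferedInputStream(raw);       // still bytes, now buffered
        Reader chars = new InputStreamReader(buffered, StandardCharsets.UTF_8); // bytes -> chars
        try (BufferedReader lines = new BufferedReader(chars)) {   // chars -> lines
            return lines.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "h\u00e9llo\nworld\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstLine(data).length()); // prints 5 (characters, not bytes)
    }
}
```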

Fortunately, some streams, writers and readers have convenient constructors that simplify coding. If you want to read a file, you just have to say:

    InputStream in = new FileInputStream("/usr/home/me/myfile.txt");
    // skip() works without mark/reset support, which FileInputStream lacks
    in.skip(1024);
    in.read();

It is not as complicated as you fear. Channels are something different. They are part of the so-called "new IO", or nio.

New IO supports non-blocking operation, which is its main advantage. You can search the internet for any "nio java tutorial" and read about it. But it is more complicated than regular IO and is not needed for most applications.

Start with a RandomAccessFile and use read or readFully to get a byte array between pos1 and pos2. Let's say that we've stored the data read in a variable named rawBytes. Then create your BufferedReader using new BufferedReader(new InputStreamReader(new ByteArrayInputStream(rawBytes))). Then you can call readLine on the BufferedReader.
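Put together, those steps might look like the following sketch. `RangeInMemoryDemo` and `readLines` are hypothetical names, UTF-8 is assumed, and the range must fit in a single byte array.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class RangeInMemoryDemo {

    /** Loads the byte range [pos1, pos2) into memory and splits it into lines. */
    static List<String> readLines(Path path, long pos1, long pos2) throws IOException {
        byte[] rawBytes = new byte[(int) (pos2 - pos1)];
        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            raf.seek(pos1);
            raf.readFully(rawBytes);  // throws EOFException if the file is too short
        }
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(rawBytes), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("range", ".txt");
        Files.write(tmp, "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(tmp, 6, 17)); // prints [beta, gamma]
        Files.delete(tmp);
    }
}
```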

Caveat: this probably uses more memory than if you could make the BufferedReader seek to the right location itself, because it preloads everything into memory.

It's not an option for me: I'm working with multiple gigabyte files on computers with limited memory. – GreyCat Nov 29 '10 at 15:51.

I think the confusion is caused by the UTF-8 encoding and the possibility of multi-byte characters. UTF-8 is a variable-width encoding, so the number of bytes in a single character is not fixed. I'm assuming from your post that you are using single-byte characters.

For example, 412 bytes would mean 411 characters. But if the string were using double-byte characters, you would get the 206th character. The original java.io package didn't deal well with this multi-byte confusion.

So, they added more classes to deal specifically with strings. The package mixes two different types of file handlers (and they can be confusing until the nomenclature is sorted out). The stream classes provide for direct data I/O without any conversion.

The reader classes convert files to strings with full support for multi-byte characters. That might help clarify part of the problem. Since you state you are using UTF-8 characters, you want the reader classes.

In this case, I suggest FileReader. The skip() method in FileReader allows you to pass by X characters and then start reading text. Alternatively, I prefer the overloaded read() method since it allows you to grab all the text at one time.

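If you assume your "bytes" are individual characters, try something like this:

    FileReader fr = new FileReader(new File("x.txt"));
    char[] buffer = new char[pos2 - pos];
    fr.skip(pos);                       // move to the start, counted in characters
    fr.read(buffer, 0, buffer.length);  // second argument is an offset into buffer, not the file
    ...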
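To make the byte/character distinction concrete, here is a small sketch showing that UTF-8 characters occupy different numbers of bytes and that a reader's skip() counts characters. The names `Utf8SkipDemo`, `utf8Length` and `afterSkip` are invented for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Utf8SkipDemo {

    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Decodes the bytes as UTF-8, skips `chars` characters, returns the rest. */
    static String afterSkip(byte[] utf8, long chars) throws IOException {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)) {
            long left = chars;
            while (left > 0) {              // skip() may skip fewer than requested
                long skipped = r.skip(left);
                if (skipped <= 0) break;
                left -= skipped;
            }
            StringBuilder rest = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                rest.append((char) c);
            }
            return rest.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(utf8Length("abc"));     // prints 3
        System.out.println(utf8Length("\u00e9"));  // prints 2 (e with acute accent)
        System.out.println(utf8Length("\u20ac"));  // prints 3 (euro sign)
        byte[] data = "\u00e9\u00e9abc".getBytes(StandardCharsets.UTF_8); // 7 bytes, 5 chars
        System.out.println(afterSkip(data, 2));    // prints abc
    }
}
```

Skipping two bytes instead of two characters would have landed after the first accented letter, which is why a byte offset such as pos1 cannot be passed to a reader's skip() unless every character is a single byte.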

Note that readers skip characters, not bytes. This prevents ambiguity when working with unknown character sets - is it single byte or double byte? In your case, I assumed that it's all single byte characters, so "pos" = characters.

– Jonathan B Nov 29 '10 at 15:54

I work with full unicode set - with multiple bytes per character in UTF8 encoding - there's no problem with that. FileReader has essentially the same interface as BufferedReader, but you propose to read full range of bytes into memory at once - but I can't do it, I'm working with multi-gigabyte ranges on machines with fairly limited RAM. – GreyCat Nov 29 '10 at 15:54

You don't have to do it that way, it's just convenient. The longer way is to use skip() to get to the correct point, then use read() to get single characters off the stream. Since there is no buffering in that method you would have a lot of control over the memory footprint. (You can still get the benefits of buffering by wrapping the FileReader with a BufferedReader. BufferedReader can be initialized to a very specific size if you need to limit the memory footprint.) – Jonathan B Nov 29 '10 at 16:02

I wrote this code to read UTF-8 using RandomAccessFile:

    // File: CyclicBuffer.java
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public class CyclicBuffer {
        private static final int size = 3;
        private FileChannel channel;
        private ByteBuffer buffer = ByteBuffer.allocate(size);

        public CyclicBuffer(FileChannel channel) {
            this.channel = channel;
            buffer.limit(0); // start empty so the first get() triggers a read
        }

        private int read() throws IOException {
            return channel.read(buffer);
        }

        /**
         * Returns the byte read
         *
         * @return byte read, or -1 when end of file is reached
         *         (0xFF never occurs in valid UTF-8, so -1 is unambiguous)
         * @throws IOException
         */
        public byte get() throws IOException {
            if (buffer.hasRemaining()) {
                return buffer.get();
            } else {
                buffer.clear();
                int eof = read();
                if (eof == -1) {
                    return (byte) eof;
                }
                buffer.flip();
                return buffer.get();
            }
        }
    }

    // File: UTFRandomFileLineReader.java
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.Charset;

    public class UTFRandomFileLineReader {
        private final Charset charset = Charset.forName("utf-8");
        private CyclicBuffer buffer;
        private ByteBuffer temp = ByteBuffer.allocate(4096);
        private boolean eof = false;

        public UTFRandomFileLineReader(FileChannel channel) {
            this.buffer = new CyclicBuffer(channel);
        }

        public String readLine() throws IOException {
            if (eof) {
                return null;
            }
            byte x = 0;
            temp.clear();
            // Collect raw bytes up to the next '\n' or end of file
            while ((byte) -1 != (x = buffer.get()) && x != '\n') {
                if (temp.position() == temp.capacity()) {
                    temp = addCapacity(temp);
                }
                temp.put(x);
            }
            if (x == -1) {
                eof = true;
            }
            temp.flip();
            // Return null only at end of file, so that blank lines
            // in the middle of the file do not end the iteration
            if (eof && !temp.hasRemaining()) {
                return null;
            }
            return charset.decode(temp).toString();
        }

        private ByteBuffer addCapacity(ByteBuffer temp) {
            ByteBuffer t = ByteBuffer.allocate(temp.capacity() + 1024);
            temp.flip();
            t.put(temp);
            return t;
        }

        public static void main(String[] args) throws IOException {
            RandomAccessFile file = new RandomAccessFile("/Users/sachins/utf8.txt", "r");
            UTFRandomFileLineReader reader = new UTFRandomFileLineReader(file.getChannel());
            int i = 1;
            while (true) {
                String s = reader.readLine();
                if (s == null)
                    break;
                System.out.println("\n line " + i++);
                s = s + "\n";
                // Dump the line's UTF-8 bytes in hex
                for (byte b : s.getBytes(Charset.forName("utf-8"))) {
                    System.out.printf("%x", b);
                }
                System.out.printf("\n");
            }
            file.close();
        }
    }
