Mini-Pattern: The File / Stream Duality

June 11, 2004 2:40 PM

Name: The File/Stream Duality

Context: Your API retrieves data from, or writes data to, a file.

Forces:

  • The naive approach is to have the API take in a filename, or a File object to work on.
  • Far too many developers believe this is sufficient.
  • Unless the file is random-access, reading or writing it means turning it into a byte stream anyway.
  • One day, somebody is going to have a stream of bytes that is not a file, but that they want to pass through your API.
  • When that happens, this person will curse you, your parents, and the town you grew up in.

Therefore:

If you are writing an API that takes a filename, instead provide an API that does precisely the same thing to an arbitrary stream of bytes, and then add "convenience" methods that apply those stream-based methods to files.
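As a rough sketch of what that looks like in Java: the class LineCounter and its countNewlines methods below are made-up names for illustration, not part of any real library. The stream version does the actual work, and the file version only opens the file and hands the stream over.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical API, used only to illustrate the pattern.
final class LineCounter {
    private LineCounter() {}

    // Core method: works on any stream of bytes, whatever its origin.
    static long countNewlines(InputStream in) throws IOException {
        long count = 0;
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == '\n') {
                    count++;
                }
            }
        }
        return count;
    }

    // Convenience method: opens the file and delegates to the stream version.
    static long countNewlines(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return countNewlines(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Bytes that never touched the file system can still go through the API.
        byte[] data = "one\ntwo\nthree\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(countNewlines(new ByteArrayInputStream(data)));
    }
}
```

The person holding a socket, a zip entry, or an in-memory buffer calls the first method directly; everyone else keeps the one-line file call.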

4 Comments

Python has an interesting way of handling this. Many APIs take a "file-like object", which is an object that implements a number of methods such as read(), write(), readline(), etc. The core Python library provides a wrapper class called StringIO (and a faster C version, cStringIO) which can be used to provide a file-like object proxy around any string, allowing strings to be passed to APIs that expect files. The odd API still accepts only a filename, though, which is, as you've so delicately explained, a right royal pain in the arse.

I have this pattern too. It applies just as much to URLs, resource names and any other resource identifier.

Simon: Python's "file-like object" is the equivalent of Java's InputStream or OutputStream. Python uses strings where Java has a File class.

The pattern is that, instead of providing only a method like "doIt(String fileName)", an API should provide a method "doIt(InputStream stream)" and also provide "doIt(String fileName)" as a convenience method that opens a stream reading from or writing to fileName and passes the stream to "doIt(InputStream)".

In Java the StringWriter and StringReader classes play a similar role to Python's StringIO.
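A small sketch of that, assuming an API written against Reader and Writer (the character-stream cousins of InputStream and OutputStream). The Shouter class and its shout method are invented for this example; only StringReader and StringWriter come from java.io.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical API that upper-cases whatever character stream it is given.
final class Shouter {
    private Shouter() {}

    // Knows nothing about files: any Reader in, any Writer out.
    static void shout(Reader in, Writer out) throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            out.write(Character.toUpperCase(c));
        }
    }

    public static void main(String[] args) throws IOException {
        // StringReader and StringWriter let plain strings stand in for files.
        StringWriter out = new StringWriter();
        shout(new StringReader("hello, stream"), out);
        System.out.println(out);
    }
}
```

The same call works with a FileReader and FileWriter, which is the whole point: the API never has to know.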

Here's a related pattern I use: have a Resource interface, with a single openStream() method. Implementations can include FileResource, UrlResource, ClasspathResource, ZipEntryResource and ServletContextResource.

Then, have your API take a Resource as argument. Not only does your API not care where the InputStream comes from, it can also control if and when the stream is opened. This is particularly useful if the method would otherwise take one or more InputStreams that might not actually get used.
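Roughly, and with the caveat that this is a sketch rather than my actual code: Resource and FileResource follow the names above, while ResourceDemo, the load method, and the "preferred, then fallback" behaviour are assumptions made up to show the deferred opening. It relies on Files.newInputStream throwing NoSuchFileException for a missing file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

final class ResourceDemo {
    private ResourceDemo() {}

    // Something that can produce a stream on demand.
    interface Resource {
        InputStream openStream() throws IOException;
    }

    // File-backed implementation; nothing is opened until openStream() is called.
    static final class FileResource implements Resource {
        private final Path path;

        FileResource(Path path) {
            this.path = path;
        }

        @Override
        public InputStream openStream() throws IOException {
            return Files.newInputStream(path);
        }
    }

    // Hypothetical API: reads the preferred resource, and opens the fallback
    // only if the preferred one turns out to be missing.
    static byte[] load(Resource preferred, Resource fallback) throws IOException {
        try (InputStream in = preferred.openStream()) {
            return in.readAllBytes();
        } catch (NoSuchFileException missing) {
            try (InputStream in = fallback.openStream()) {
                return in.readAllBytes();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Resource userFile = new FileResource(Path.of("no-such-settings.properties"));
        // An in-memory resource, written as a lambda since Resource has one method.
        Resource defaults = () ->
                new ByteArrayInputStream("mode=default\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(new String(load(userFile, defaults), StandardCharsets.UTF_8));
    }
}
```

Had load taken two InputStreams, the caller would have had to open both up front and remember to close the one that was never read.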

The Spring framework has similar abstractions: see org.springframework.core.io.InputStreamSource and org.springframework.core.io.Resource.
