I have a Java application that needs the ability to upload and download large files to and from an Amazon S3 storage area.
I've been pleasantly surprised at how quickly large files can be uploaded. Really just a matter of seconds.
And I've also been pretty happy with how quickly it can download these same files and convert to a byte array.
What is way too slow, though, is reading the byte array into an actual file.
Any suggestions for how to get this last part to go faster?
Here is my code:
// Get the response - this is actually quite fast
ResponseInputStream<GetObjectResponse> getResponse = s3Client.getObject(request);
byte[] responseBytes = getResponse.readAllBytes();
// Download to a file - this is extremely slow
File outputFile = new File(downloadPath);
try (FileOutputStream fileOutputStream = new FileOutputStream(outputFile)) {
for (int ii=0; ii<responseBytes.length; ii++) {
fileOutputStream.write(responseBytes, ii, 1);
}
}
- This question is similar to: Fastest way to write to file?. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. – SpaceTrucker Commented Feb 17 at 21:11
- IMO, that is not a good dup link. That question is about writing text data. (Yes, the answers can be adapted ... but ...) – Stephen C Commented Feb 18 at 4:02
4 Answers
Writing the file byte by byte will incur the overhead of a system call for every single byte.
Fortunately, there's an overload of write that takes an entire byte[] and writes it out with far fewer system calls:
try (FileOutputStream fileOutputStream = new FileOutputStream(outputFile)) {
fileOutputStream.write(responseBytes);
}
In your current code, you're writing to the file using a loop:
for (int ii=0; ii<responseBytes.length; ii++) {
fileOutputStream.write(responseBytes, ii, 1);
}
This writes a single byte to the file output stream on each iteration, so every byte pays the cost of a separate write call and, ultimately, a system call. Instead of writing one byte at a time, you can write the entire byte array in a single call:
// Write the entire byte array at once - much faster
try (FileOutputStream fileOutputStream = new FileOutputStream(outputFile)) {
fileOutputStream.write(responseBytes);
}
You can also wrap your FileOutputStream in a BufferedOutputStream, which helps most when the data is written in many smaller chunks rather than in one large array:
import java.io.BufferedOutputStream;
// ...
try (BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(new FileOutputStream(outputFile))) {
bufferedOutputStream.write(responseBytes);
}
Finally, I would go one step further and avoid reading the entire object into memory at all, since that drives up memory consumption. You can stream the object directly to a file instead of loading it into memory first. Here is how to stream it straight to a file:
// Get the response input stream from S3
ResponseInputStream<GetObjectResponse> s3InputStream = s3Client.getObject(request);
// Define the path to the output file
File outputFile = new File(downloadPath);
try (InputStream inputStream = s3InputStream;
OutputStream outputStream = new BufferedOutputStream(new FileOutputStream(outputFile))) {
byte[] buffer = new byte[8192]; // Buffer size can be adjusted
int bytesRead;
// Read and write in chunks
while ((bytesRead = inputStream.read(buffer)) != -1) {
outputStream.write(buffer, 0, bytesRead);
}
} catch (IOException e) {
e.printStackTrace();
// Handle exceptions appropriately
}
As detailed in the other answers and by Thomas, the issue you see is because you are writing byte by byte to the file without buffering, and it can be fixed with outputStream.write(bytes).
Your solution has another issue: the call to readAllBytes might cause out-of-memory problems if the data is larger than what you can allocate in one chunk. It is much safer to stream directly from the input to the file without the intermediate memory footprint. This can be achieved without readAllBytes if you call one of the following:
On JDK 9+, s3InputStream.transferTo(outputStream) does the copy in one call (see the sketch after the Files.copy example below).
On JDK 8+, Files.copy is simplest to use:
// Use Paths.get in older JDK:
Path outputFile = Path.of(downloadPath);
Files.copy(s3InputStream, outputFile);
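For completeness, here is a minimal sketch of the transferTo variant (JDK 9+), assuming the same s3InputStream and downloadPath as above:
try (InputStream inputStream = s3InputStream;
     OutputStream outputStream = new BufferedOutputStream(new FileOutputStream(downloadPath))) {
    // transferTo copies the stream in chunks internally,
    // so the full object is never held in memory at once
    inputStream.transferTo(outputStream);
}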
The S3Client has several methods that write directly to a file; use one of those (e.g. getObject(GetObjectRequest, Path)). That avoids the overhead of loading the entire file's data into memory (which for small files is not a problem, but for large files might mean you run out of memory).
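A minimal sketch of that approach, assuming the AWS SDK for Java v2 and the same request and downloadPath as in the question:
// The SDK streams the object straight to disk, so no byte[] is built up in memory
GetObjectResponse response = s3Client.getObject(request, Path.of(downloadPath));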