File Upload Strategies with AWS S3, Node.js, Express, React, and Uppy

Last updated Jan 02, 2023
Written by Zach

Source code for this blog post

All of the source code referenced in this blog post can be found in my file uploads repository on GitHub.

Who is this post for?

I searched. And searched. And searched...

And while I found a few decent tutorials, I could not find any comprehensive resource that helped me understand...

  • What is the best JavaScript library for file uploads?
    • Do I use a managed file upload service?
    • What strategies can I use to implement my own file uploading solution?
  • The core concepts and best practices of file uploads
    • What is the best place to store files?
    • How do I get my files uploaded to storage?
    • What is the best way to view files after uploading?
    • Can I upload directly from my browser?
    • How do I handle large files? Multiple files at once?

If you have any of those questions, read on!

The 4 Phases of File Uploads

Before we look at any code, it's important to understand the lifecycle of a file in the context of a web application. Assuming the file we are talking about is a user-uploaded file, it will go through the following steps:

  1. Collection - we use a form to collect files from the user
  2. Upload - our server processes the binary form data and uploads it to some type of persistent storage
  3. Processing (optional) - for audio and video files, we can optionally send them through some sort of processing queue that will encode (compress) and transcode (convert) the original file to something we can stream on the web efficiently
  4. Viewing - if the file upload is meant for viewing (e.g. a YouTube video), we need to deliver it over a CDN to the user and display it with the appropriate HTML elements (e.g. video, audio, img). Furthermore, if it is a large video file, we must consider how that file will be streamed over the CDN to the user (in chunks).

This post is primarily concerned with phases 1-2, but I will briefly cover file processing (phase 3) and provide some sample code for creating an AWS CloudFront CDN that serves files from an S3 bucket (phase 4).

While I will be showing reference implementations in JavaScript, my goal with this post is to show you the pieces that you need to implement and give you the conceptual understanding to implement them using whatever technology suits your needs.

Phase 1: Collection

File collection happens client-side (in the browser) and in its most basic form (no pun intended), looks like this:

<form action="/upload-endpoint" method="post" enctype="multipart/form-data">
  <input type="file" name="some-field-name" />
</form>

While this may look trivial, there are a couple of attributes here that completely change the behavior of the form. In particular, it is important to understand the multipart/form-data encoding type. This encoding is required to submit (binary) files to a backend server (phase 2), and it is the same encoding the browser uses when you send a FormData object from JavaScript. You can think of FormData as the "in transport" data package for files. But this is just a portion of what the browser handles! The files property of the input element is a FileList, an array-like list of File objects, and File is a subtype of Blob (binary large object).

This is the most common way to handle file uploads client-side. As a sidenote, you might also see a file represented by an Object URL, a browser-generated URL that is created from a Blob (or File) object held in memory. You can use it like a regular URL in an a tag or something similar, which is useful for showing a visual preview of the selected file before you have a permanent CDN url to display.
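To make the relationship between Blob, File, FormData, and Object URLs a little more concrete, here is a minimal sketch. It assumes an environment where Blob, FormData, and URL.createObjectURL are available as globals (browsers, or a recent Node.js version); the field name and file contents are placeholders.

```typescript
// A Blob is raw binary data plus a MIME type; a File is a Blob with a name.
const textBlob = new Blob(['hello world'], { type: 'text/plain' });

// FormData is the "in transport" package. Appending a Blob together with a
// filename stores it as a File entry under the given field name.
const uploadForm = new FormData();
uploadForm.append('some-field-name', textBlob, 'hello.txt');

// Entries are either strings (text fields) or File objects (file fields).
const formEntry = uploadForm.get('some-field-name');
const storedFile =
  formEntry !== null && typeof formEntry !== 'string' ? formEntry : null;

// An Object URL is a temporary URL that points at the in-memory Blob.
// In a browser it can be used as the href/src of an element for a preview.
const previewUrl = URL.createObjectURL(textBlob);

// Release the Object URL once the preview is no longer needed.
URL.revokeObjectURL(previewUrl);
```

Revoking the Object URL matters in long-lived pages, since the browser keeps the underlying data alive until the URL is revoked or the document is unloaded.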

Controlled upload forms

More commonly, especially in React, you'll be controlling the form with JavaScript and will not be using enctype="multipart/form-data" or action=... in your form. To handle this, you'll need to add a bit of logic to your submit handler function that will convert your controlled values into FormData, which as mentioned above, is the equivalent of enctype="multipart/form-data". Below are two examples I have built out to demonstrate this.

And here is some pseudocode showing how the submit handler will work:

async function submitHandler(event) {
  // Prevent default form behavior
  event.preventDefault();

  // Pass the event target object to FormData
  const form = new FormData(event.target);

  // Submit the multipart/form-data encoded object to the backend
  const response = await fetch('/your-uploads-endpoint', {
    method: 'POST',
    body: form,
  }).then((res) => res.json());

  // Usually, your backend will return the full CDN url and the client will do something with it
  // setState(response.data.url)
}

One important thing to notice here is our fetch request headers (or lack thereof). Notice that we are NOT setting the Content-Type header. This is because we want the browser to set that header for us, including the form boundary, which is the delimiter string that tells the server where each field in the body starts and ends. If we manually set the header, we also need to set the boundary, which can be challenging and is unnecessary given the browser handles this automatically.

If you try to manually set the header, in many cases, you'll receive this nasty Missing boundary in multipart/form-data error, which can be confusing to debug.
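To see what the browser is doing for you here, the sketch below builds two requests (without sending them) and inspects the resulting Content-Type header. It assumes the standard Fetch API globals (Request, FormData, Blob); the URL and field name are placeholders.

```typescript
const boundaryDemoForm = new FormData();
boundaryDemoForm.append('some-field-name', new Blob(['hello']), 'hello.txt');

// 1. No Content-Type set: the runtime generates a boundary and adds it to the header.
const automaticRequest = new Request('https://example.com/your-uploads-endpoint', {
  method: 'POST',
  body: boundaryDemoForm,
});
const automaticHeader = automaticRequest.headers.get('content-type');
// e.g. "multipart/form-data; boundary=----WebKitFormBoundary..."

// 2. Content-Type set by hand: the header is used as-is, so the boundary
// parameter is missing and the server cannot split the body into fields.
const manualRequest = new Request('https://example.com/your-uploads-endpoint', {
  method: 'POST',
  headers: { 'Content-Type': 'multipart/form-data' },
  body: boundaryDemoForm,
});
const manualHeader = manualRequest.headers.get('content-type');
// "multipart/form-data" (no boundary parameter)
```

The exact boundary string differs between browsers and runtimes, which is one more reason to let the platform generate it.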

Collecting multiple files

You can also collect multiple files at once by adding the multiple attribute to the input element. Nothing else changes here; you will still deal with a FileList as the form input value.

<input type="file" multiple />
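Because a FileList is array-like rather than a true array, it is usually converted before looping over it. Here is a minimal sketch of appending every selected file to a FormData object; the helper name appendAllFiles and the field name files are hypothetical and not part of my example repository.

```typescript
// Minimal shape shared by File objects. Using a structural type keeps the
// helper usable with a real FileList (input.files) or a plain array.
type NamedBlob = Blob & { name: string };

// Appends every selected file under the same field name, which mirrors what
// a native <input type="file" multiple /> submission sends.
function appendAllFiles(
  form: FormData,
  fieldName: string,
  files: ArrayLike<NamedBlob>
): FormData {
  Array.from(files).forEach((f) => form.append(fieldName, f, f.name));
  return form;
}

// Stand-ins for two user-selected files (in a browser, pass input.files instead)
const twoFiles: NamedBlob[] = [
  Object.assign(new Blob(['a']), { name: 'a.txt' }),
  Object.assign(new Blob(['bb']), { name: 'b.txt' }),
];
const multiForm = appendAllFiles(new FormData(), 'files', twoFiles);
// multiForm.getAll('files') now holds two File entries
```

On the server, a parser will then see one file event per entry, all sharing the same field name.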

Using a custom upload UI (Uppy)

While the above describes the basic way of collecting files from a user in the browser, generally, you'll add some styles around the form with CSS or even create an entire UI library for managing uploads from various sources. For example, you could use the MediaRecorder browser API to collect audio or video from a user. You could also implement a drag-n-drop UI, or even receive files from external storage like Google Drive or Dropbox. Given the vast number of sources that a file can come from, several open source libraries have been created to handle all these possibilities in a centralized browser UI library.

One of these that is worth mentioning is Uppy, which is my personal choice when it comes to an "out of the box" solution for collecting files in a clean-looking, functional UI. Below is an example implementation using Uppy, which is an unstyled, simplified snippet of my Next.js uploads example.

import { useEffect, useState } from 'react';
import axios from 'axios';
import Uppy from '@uppy/core';
import { DashboardModal, useUppy } from '@uppy/react';
import UppyS3 from '@uppy/aws-s3';
import '@uppy/core/dist/style.css';
import '@uppy/dashboard/dist/style.css';

// The component name is arbitrary; the hooks below must run inside a component
export default function UploadPage() {
  const [showUppyModal, setShowUppyModal] = useState(false);

  // React + Uppy - uploads the files using pre-signed URLs to S3 storage directly from browser
  const uppy = useUppy(() =>
    new Uppy().use(UppyS3, {
      // NOTE: We will cover how this works later; just showing this for context
      // Fetches a pre-signed S3 URL from your backend endpoint
      async getUploadParameters(file) {
        const response = await axios.post('/api/sign-url', {
          mimetype: file.type,
        });

        return {
          method: 'PUT',
          url: response.data.url,
        };
      },
    })
  );

  // When upload completes, close the Uppy uploader modal.
  // Registering inside an effect avoids adding a new listener on every render.
  useEffect(() => {
    const closeModal = () => setShowUppyModal(false);
    uppy.on('complete', closeModal);
    return () => {
      uppy.off('complete', closeModal);
    };
  }, [uppy]);

  return (
    <main>
      <DashboardModal
        open={showUppyModal}
        onRequestClose={() => setShowUppyModal(false)}
        uppy={uppy}
      />

      <button onClick={() => setShowUppyModal(true)}>Open Uppy Uploader</button>
    </main>
  );
}

Phase 2: Upload

After we have collected the file(s) from the user, it is time to engage our backend server, which is responsible for two primary things:

  1. Parse the form data and prepare the binary files for upload
  2. Save the files to some storage location

Parsing the form data is relatively straightforward. Saving the files is not. There are many different places files can be saved to and many strategies for getting them there. The following sections are my best attempt to outline all of these possibilities so you can determine what is right for your use-case.

Parsing multipart form data

Parsing multipart/form-data encoding in an HTTP request is no easy task, but it has a relatively standard solution thanks to open-source contributors who have handled this for us in libraries. One of the preferred libraries in the Node.js ecosystem is busboy, which is solely responsible for parsing FormData / enctype="multipart/form-data" server-side via Node.js streams. If you've ever used Express JS and dealt with the body-parser package, you can think of busboy as the "body parser for multipart/form-data". While the body-parser middleware can read HTTP requests where the Content-Type is raw, JSON, urlencoded, or text, it cannot handle multipart/form-data, which is where busboy comes in.
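To get a feel for what a parser like busboy has to deal with, the sketch below serializes a small FormData payload into the raw text that travels in the request body. It uses the standard Response API (available in browsers and recent Node.js versions) purely as a convenient way to print the encoded body; the helper name toWireFormat, the field name, and the file are placeholders.

```typescript
// Serializes a FormData payload to the raw text sent in the request body
async function toWireFormat(form: FormData): Promise<string> {
  return new Response(form).text();
}

const wireDemoForm = new FormData();
wireDemoForm.append(
  'some-field-name',
  new Blob(['hello world'], { type: 'text/plain' }),
  'hello.txt'
);

toWireFormat(wireDemoForm).then((raw) => console.log(raw));
// Prints something like this (the boundary string differs per runtime and request):
//
// ------boundary123
// Content-Disposition: form-data; name="some-field-name"; filename="hello.txt"
// Content-Type: text/plain
//
// hello world
// ------boundary123--
```

Every part has its own headers followed by the raw bytes, and the parts are separated by the boundary announced in the Content-Type header. With real binary files and large payloads arriving in chunks, splitting this correctly is exactly the job you want a library to do.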

Below is a code snippet demonstrating how this form data is parsed using Busboy, which comes from my Next.js file uploads example. A simpler snippet (that only handles 1 file) can be found in my vanilla file uploads server-side request handler.

import type { IncomingMessage } from 'http';
import busboy, { FileInfo } from 'busboy';

type RawFile = {
  data: Buffer; // Node.js Buffer
  info: FileInfo; // busboy parsed file info
};

// Parses a standard Node.js IncomingMessage request - https://nodejs.org/api/http.html#class-httpincomingmessage
async function parseMultipartReq(req: IncomingMessage) {
  const bb = busboy({ headers: req.headers });

  // Returns a Promise containing array of parsed files, ready for upload!
  return new Promise<RawFile[]>((resolve, reject) => {
    const files: RawFile[] = [];

    // Fires once for each file in the form
    bb.on('file', (name, stream, info) => {
      const fileChunks: Buffer[] = [];
      stream.on('data', (chunk) => fileChunks.push(chunk));
      stream.on('end', () =>
        files.push({ data: Buffer.concat(fileChunks), info })
      );
    });

    // Reject on a parsing error instead of leaving the Promise pending forever
    bb.on('error', reject);
    bb.on('close', () => resolve(files));

    // Pipe request stream into busboy once the listeners are attached
    req.pipe(bb);
  });
}

In the code snippet above, you can see that we've utilized the built-in pipe method of Node.js readable streams to redirect the incoming HTTP request stream to the busboy library to parse. We then assemble each file from the FormData using the data event of Node.js streams and put each file Buffer into an array.

The return value of this sample function has file Buffers that are ready to upload to our storage location of choice. In the next sections, we will explore some options you have here.
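One caveat with this approach is that every file is held fully in memory before it is uploaded. The sketch below shows the same collect-then-concatenate step on plain Uint8Array chunks, with a size cap added. The cap, the helper name collectChunks, and the 10 MB figure are illustrative assumptions on my part, not something the snippet above or busboy requires.

```typescript
// Collects binary chunks and joins them into one contiguous array,
// mirroring what Buffer.concat(fileChunks) does in the snippet above.
function collectChunks(chunks: Uint8Array[], maxBytes: number): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  if (total > maxBytes) {
    throw new Error(`File is ${total} bytes, which exceeds the ${maxBytes} byte limit`);
  }

  const joined = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    joined.set(chunk, offset); // copy this chunk right after the previous one
    offset += chunk.byteLength;
  }
  return joined;
}

const joinedChunks = collectChunks(
  [new Uint8Array([1, 2]), new Uint8Array([3])],
  10 * 1024 * 1024 // hypothetical 10 MB cap
);
// joinedChunks now contains the bytes 1, 2, 3
```

In a real request handler you would likely check the running total as each chunk arrives rather than at the end, so that an oversized upload can be rejected early.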

Where do I store my uploaded files?

I'll give you a short answer and a long one.

Short answer

For 99% of use cases, I believe that storing files in object storage (such as AWS S3) is the best choice.

If you don't know much about file uploads, want an industry-standard, reliable solution, and don't care to understand why, skip the next section and use AWS S3 (or a similar solution like Google Cloud Storage, DigitalOcean Spaces, etc.).

Long answer

When storing files, there are 3 common places to use:

  • Cloud storage (usually object storage)
  • Your server
  • Your database

Through process of elimination, here's why I prefer object storage for file uploads.

Cloud storage

As I mentioned, this (specifically, object storage) is my favorite choice. It is generally cheaper than storing in a database and more secure and convenient than storing files on your server.

Server storage

Server storage is when you upload your files directly to the physical machine that your app is running on.

This option can be a good one in certain cases. For example, if you're running a WordPress website, the /wp-content/uploads directory on your server is the default location for file uploads. This works great because by its very nature, WordPress servers contain more than an API; they also include themes, uploads, plugins, and other user-generated files.

That said, most people reading this post are in the Node.js ecosystem and generally manage servers that are thin, stateless RESTful APIs (think Express.js). Once you start uploading files to the server, your API becomes an entirely different program.

Ask yourself, "How easy would it be to deploy this API to another server?".

With most Node.js servers / RESTful APIs, the answer is, "easy!". Spin up a new server, initialize a few environment variables, and run node main.js and you're up and running! But with files stored on your previous server, you now must migrate these files. Not fun.

Database storage

If your upload requirements are small (i.e. all you need to store is a profile picture for each user and nothing more), database storage can be a good option.

But if you need to store anything more, scaling DB storage is much more expensive than scaling object storage. In many cases, to add storage to your database, you also have to increase the memory and CPU on the instance since database pricing models are generally a combination of storage + CPU + memory. In other words, your file/media scaling is now coupled to your DB memory/CPU scaling (which often scale at different rates).

If that's not enough to convince you, storing files in a database increases complexity as you need to store the files in either binary or base64 format, consider compression, etc.

And even one more thing... Performance. When you store large files in your database, you need to either store these files in isolated tables or make sure that you're querying the database efficiently. If you run select * on a table with large files, you're going to slow things down considerably.

What are some common file upload strategies?

Now that you have parsed your form data server-side and have decided where to store your uploads, it's time to decide how you're going to get them there.

There are 3 common strategies you can use to upload files.

  1. Standard server-side uploads
  2. Browser uploads via pre-signed AWS S3 URLs
  3. Managed services (easiest, but most expensive)

Option 1: Server uploads

This option is the most common one you'll see in tutorials online and is a great option for most simple workloads.

If you remember from prior sections, we used the busboy library to parse our form data coming from the client into an array of RawFile objects:

type RawFile = {
  data: Buffer; // Node.js Buffer
  info: FileInfo; // busboy parsed file info
};

async function parseMultipartReq(req): Promise<RawFile[]> {
  // parse with busboy and return the prepared files
}

From here, we can simply loop through the files and upload them to our storage location via SDKs or classic RESTful APIs. Below is a Next.js uploads example demonstrating how we take the parsed output and get those files uploaded to AWS S3:

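import type { IncomingMessage } from 'http';
import { PutObjectCommand, S3Client } from '@aws-sdk/client-s3';
import type { FileInfo } from 'busboy';
// Assumes the `uuid` npm package for generating unique names
import { v4 as uuid } from 'uuid';

const s3 = new S3Client({
  region: 'us-east-1',
});

type RawFile = {
  data: Buffer; // Node.js Buffer
  info: FileInfo; // busboy parsed file info
};

async function uploadFiles(req: IncomingMessage) {
  // Parse the form, prepare the files (see previous section for explanation)
  const rawFiles: RawFile[] = await parseMultipartReq(req);

  // Upload all files to storage destination
  const results = await Promise.allSettled(
    rawFiles.map((f) => {
      return s3.send(
        new PutObjectCommand({
          Bucket: 'your-bucket-name',
          // Here, just using a unique UUID for the file upload name in S3, but you could customize this
          Key: uuid(),
          // Pass the Node.js Buffer to the SDK
          Body: f.data,
        })
      );
    })
  );

  // allSettled never rejects, so failed uploads have to be checked for explicitly
  const failed = results.filter((r) => r.status === 'rejected');
  if (failed.length > 0) {
    throw new Error(`${failed.length} of ${rawFiles.length} uploads failed`);
  }
}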
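The Key is the full path of the object inside the bucket, so it is a natural place to add a bit of structure. Below is a minimal sketch of one possible naming scheme. The helper name buildObjectKey, the uploads prefix, and the choice to keep the original file extension are all illustrative assumptions rather than anything S3 requires.

```typescript
// Builds an S3 object key from a unique id and the original filename
function buildObjectKey(
  id: string,
  originalFilename: string,
  prefix = 'uploads'
): string {
  const dotIndex = originalFilename.lastIndexOf('.');
  // Treat names like ".env" or "README" as having no extension
  const rawExtension = dotIndex > 0 ? originalFilename.slice(dotIndex + 1) : '';
  // Only keep a conservative character set in the extension
  const extension = rawExtension.toLowerCase().replace(/[^a-z0-9]/g, '');
  return extension ? `${prefix}/${id}.${extension}` : `${prefix}/${id}`;
}

buildObjectKey('3f2b8c1e', 'Holiday Photo.JPG'); // "uploads/3f2b8c1e.jpg"
buildObjectKey('3f2b8c1e', 'README'); // "uploads/3f2b8c1e"
```

In the upload function above, the id would come from uuid() and the filename from the parsed file info, so the user-supplied name never becomes the key itself.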

Option 2: Upload to AWS S3 directly from Browser

This strategy is not often shown in tutorials, but it is by far my favorite for many reasons you'll see in the implementation section. In particular, this one is great because you don't have to use up server bandwidth as the files go directly to S3 rather than using your server as the intermediary (as in option 1).

The overall strategy here is:

  1. Collect files in browser via a form <input type="file" />
  2. Send an HTTP POST request with some file metadata (not the file contents) to your server to generate a pre-signed URL for S3 uploads
  3. Use the signed URL to directly upload your files to S3

You can implement this strategy with other cloud providers; I'm using S3 as an example as it's what I use and a very common service. Below, you can see the client and server implementation of vanilla signed urls.

// Client-side logic
// -------------------
import axios from 'axios';

// A minimal form submission handler; `file` is the File selected in the <input type="file" />
async function handleFormSubmission(file: File) {
  // Get a signed URL from our backend
  const signedUrl = await axios.post('/api/sign-url');

  // Here is where we use the signed URL to actually upload to S3
  await axios.put(signedUrl.data.url, file);
}

And then, the server-side logic that signs the URLs (full, working version here):

// Server-side logic
// -------------------

import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
import { PutObjectCommand, S3Client } from '@aws-sdk/client-s3';
// Assumes the `uuid` npm package for generating unique names
import { v4 as uuid } from 'uuid';

const s3 = new S3Client({
  region: 'us-east-1',
});

// Notice, we do NOT have to parse multipart/form-data here because we're not actually submitting file data to the backend.
// This saves us lots of bandwidth!
async function signUrl(req: Request) {
  const url = await getSignedUrl(
    s3,
    new PutObjectCommand({
      Bucket: 'your-bucket-name',
      Key: uuid(),
    }),
    { expiresIn: 3600 } // the signed URL is valid for 1 hour
  );

  // Send this back to the client in the response body (e.g. as { url })
  return url;
}

The benefit of using an approach like this is that we can completely eliminate the first step of parsing multipart/form-data. This achieves two things:

  1. No complex parsing logic required
  2. Massive reduction in bandwidth used since we are not including the file data in the request to our server

You can learn more about pre-signed URLs here in the AWS documentation. Additionally, you can leverage these pre-signed URLs with upload UIs like Uppy as demonstrated here in my Uppy example.
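The three-step flow from this section can also be sketched end to end with fetch and some basic error handling. This is a minimal sketch under a few assumptions: the /api/sign-url endpoint responds with a JSON body shaped like { url }, and the helper name uploadWithSignedUrl and the FetchLike type are hypothetical. The fetch function is passed in as an argument so the flow can be exercised without a real backend.

```typescript
// The subset of fetch this helper needs; passing it in makes the flow easy to test
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

async function uploadWithSignedUrl(file: Blob, fetchFn: FetchLike): Promise<string> {
  // Step 2: send file metadata (not the file contents) to the signing endpoint
  const signResponse = await fetchFn('/api/sign-url', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ mimetype: file.type }),
  });
  if (!signResponse.ok) {
    throw new Error(`Signing failed with status ${signResponse.status}`);
  }
  const { url } = (await signResponse.json()) as { url: string };

  // Step 3: PUT the raw file bytes straight to S3 using the signed URL
  const uploadResponse = await fetchFn(url, { method: 'PUT', body: file });
  if (!uploadResponse.ok) {
    throw new Error(`Upload failed with status ${uploadResponse.status}`);
  }

  // The signature lives in the query string; strip it to get the object's address
  const queryStart = url.indexOf('?');
  return queryStart === -1 ? url : url.slice(0, queryStart);
}
```

In a browser, this could be called as uploadWithSignedUrl(file, (input, init) => fetch(input, init)). Note that the returned address is only viewable if the bucket (or a CDN in front of it) allows reads.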

Option 3: Managed file uploads

The final option you have is to replace your server-side implementation with a managed solution. Managed file upload solutions generally consist of a 1) UI upload widget and 2) Backend upload API that the widget automatically connects to. In other words, for a price, file uploads are handled for you. Most services also handle multi-chunk uploads (for larger files) in addition to encoding media as I described above.

Some popular services you can choose from include:

  1. Transloadit
  2. Uploadcare
  3. Filestack

If you have the funds and some heavy upload requirements, these are all great options with minimal code required to get started.

Additionally, many of these services offer open-source libraries and UI widgets that you can use without paying for their premium service. For example, Transloadit maintains the popular Uppy library, which provides a simple way to handle uploads from various sources such as your local filesystem, webcam, or microphone. They also maintain a server-side library that handles resumable, multi-part uploads called Tus.

Phase 3: Processing

At this point, we have covered how to collect and upload files, but what about large video and audio files that need to be encoded and transcoded? This is a critical step when dealing with these types of files, but as I mentioned earlier in this post, I will not be covering this phase in detail because it is complex and out of the scope of this post.

AWS provides a high-level overview of how this phase might be implemented with AWS Elemental MediaConvert, but in general, you should be looking for a managed solution like Transloadit unless you have hyper-specific requirements that cannot be met by any managed service.

Phase 4: Viewing

Once files have been uploaded to storage and optionally encoded to various formats, it's time to deliver them in the appropriate format to users for viewing.

In almost every case, you'll be delivering files over a CDN. This could range from something as simple as an AWS CloudFront CDN serving S3 content all the way to CloudFront serving streaming content from AWS Elemental MediaPackage. I'm using AWS as an example, but there are many other services capable of doing the same.

To see how this might be implemented with CloudFront + S3, I have created an AWS CDK stack to demonstrate the required setup. To avoid cluttering this post with CDK instructions, please follow the steps outlined in the README I have created for the associated tutorial repository.

Once you have the CDN set up and it is serving your S3 uploads, displaying them to a user is relatively simple and will generally utilize the a, img, video, or audio HTML tags alongside the CDN url returned from your server after a successful upload in phase 2.

<!-- Allow users to download the upload from their browser -->
<a href="https://your-cdn.yoursite.com/your-upload.extension" download>your downloadable upload url</a>

<!-- Show as image -->
<img src="https://your-cdn.yoursite.com/your-upload.png" alt="Description of your upload" />

<!-- Show as audio (audio is not a void element, so it needs a closing tag) -->
<audio src="https://your-cdn.yoursite.com/your-upload.mp3" controls></audio>

<!-- Show as video (same as audio: a closing tag is required) -->
<video src="https://your-cdn.yoursite.com/your-upload.mp4" controls></video>
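If your app accepts more than one kind of file, you will need some way to decide which of these tags to render for a given upload. Here is a minimal sketch that picks a tag from the file's MIME type. The helper name tagForMimeType is hypothetical, and it assumes you stored the MIME type alongside the CDN url when the file was uploaded.

```typescript
type ViewerTag = 'img' | 'audio' | 'video' | 'a';

// Picks the HTML element used to display an upload based on its MIME type.
// Anything that is not an image, audio, or video file falls back to a download link.
function tagForMimeType(mimeType: string): ViewerTag {
  const topLevelType = mimeType.toLowerCase().split('/')[0];
  switch (topLevelType) {
    case 'image':
      return 'img';
    case 'audio':
      return 'audio';
    case 'video':
      return 'video';
    default:
      return 'a';
  }
}

tagForMimeType('image/png'); // "img"
tagForMimeType('video/mp4'); // "video"
tagForMimeType('application/pdf'); // "a"
```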

How do I handle large file uploads?

The above sections outline the basic file upload strategy that will cover most use cases. That said, once your app starts handling large files (10MB+), you may want to consider some alternative strategies for uploading.

Before I get into these alternate strategies, I want to briefly cover how Node.js handles large file uploads. While you may intuitively think that your server will be "blocked" the entire time the file is uploading, this is not the case. The Event Loop handles non-blocking network I/O requests like file uploads in "chunks" (remember, the req object is a Node.js Stream). Because of this, your Node.js server can handle pretty large files without running into performance issues. Obviously if you've got hundreds of users uploading huge files over the same endpoint it will cause problems, but if your app handles occasional uploads, you're probably fine with the strategies I outlined in prior sections.

If that's the case, why would we ever handle large file uploads differently? Among other things, one huge reason to think about large file uploads differently is due to user experience. Just imagine... You've waited 30 minutes for your YouTube video to upload and then you receive this error in the browser: Sorry, something went wrong. Please try your upload again. It hurts to just think about!!

In cases like this, we need a way to handle uploads in "chunks" so that even if part of the upload fails, the browser can reconstruct the file and continue the upload from where it left off.

While I do not have any code examples that handle this (they require a bit more boilerplate), here are two resources that you will want to read through to better understand how you can achieve multi-chunk file uploads that are resilient to network failures.

  • Tus - Transloadit, the company behind Uppy, maintains Tus, which is an open-source protocol for resumable file uploads
  • AWS S3 Multipart Uploads - AWS S3 has native support for multipart uploads

As a point of clarification, you want to use Tus OR S3 Multipart, but NOT both. These are separate solutions to the same problem and both are great options.
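Whichever option you choose, the core idea is the same: split the file into numbered byte ranges, upload each range separately, and after a failure only re-send the ranges that were never confirmed. The sketch below shows just that planning step and is not an implementation of Tus or the S3 Multipart API; the names planChunks and remainingChunks are hypothetical. The 5 MiB chunk size is chosen because S3 requires every part except the last to be at least that large.

```typescript
type ByteRange = { partNumber: number; start: number; end: number };

// Splits a file of `fileSize` bytes into consecutive ranges of at most `chunkSize` bytes.
// `end` is exclusive, so each range maps directly onto blob.slice(start, end).
function planChunks(fileSize: number, chunkSize: number): ByteRange[] {
  if (chunkSize <= 0) throw new Error('chunkSize must be positive');
  const ranges: ByteRange[] = [];
  for (let start = 0, partNumber = 1; start < fileSize; start += chunkSize, partNumber++) {
    ranges.push({ partNumber, start, end: Math.min(start + chunkSize, fileSize) });
  }
  return ranges;
}

// After a failure, only the parts the server has not confirmed need to be re-sent
function remainingChunks(plan: ByteRange[], confirmedParts: number[]): ByteRange[] {
  return plan.filter((range) => confirmedParts.indexOf(range.partNumber) === -1);
}

const MiB = 1024 * 1024;
const uploadPlan = planChunks(12 * MiB, 5 * MiB);
// 3 parts: [0, 5 MiB), [5 MiB, 10 MiB), [10 MiB, 12 MiB)

const stillToUpload = remainingChunks(uploadPlan, [1, 2]);
// only part 3 is left to upload
```

Libraries like Uppy handle this bookkeeping (plus retries and concurrency) for you, which is why I would reach for them before writing it by hand.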

If you prefer the pre-signed URL approach to uploads like I do, I recommend the Uppy + S3 Multipart OR Uppy + Transloadit Managed Service (which uses Tus under the hood).