Understanding Resumable Upload in Google Cloud Storage, with a cURL example

Leandro Damascena
7 min read · Nov 9, 2020


First of all, we must understand what “Resumable Upload” is and how it works. I don’t know how old you are or how much time you have spent in the “land of the internet”, but if you are over 30 and started using the internet in the 90s or 00s, you surely remember a useful program for downloading programs/videos/music over a dial-up connection: GetRight! I think it was the first resumable download/upload tool I ever had contact with.

According to GCP’s definition: “a Resumable Upload allows you to resume data transfer operations to Cloud Storage after a communication failure has interrupted the flow of data. Resumable uploads work by sending multiple requests, each of which contains a portion of the object you’re uploading. This is different from a simple upload, which contains all of the object’s data in a single request and must restart from the beginning if it fails part way through.”

We can think of this GCP functionality as a kind of “multi-part upload”: you don’t need to send all the data in a single request, you can split it into several smaller packages and send them one by one. Using this approach, you can work around network problems such as slow connections, unstable 4G connections, bandwidth limits, upload failures, and so on.

An upload using Resumable Upload works as shown in the image below: you take the original object, split it into multiple parts and send each part in its own request.

GCP Resumable Upload

Note: even though GCP Resumable Upload allows you to upload the file using multiple requests/packages, you can still use the “traditional” approach and send all the data in a single request within the resumable session. By the way, GCP still recommends that single-request approach as the best practice whenever possible.

You may be asking yourself now: how does GCP know the package size or when the upload is complete? What HTTP response status do I receive in each request? I’ll explain in a special chapter.

Special chapter: GCP Resumable Upload “magic”

To control Resumable Upload and know whether the upload is completed or not, GCP uses 2 headers: Content-Length and Content-Range.

Content-Length: The Content-Length header declares the size of the data that you send in the body, for example: Content-Length: 4096 means you are sending 4,096 bytes (4 KiB) in the body. According to RFC 2616:

The Content-Length entity-header field indicates the size of the entity-body, in decimal number of OCTETs, sent to the recipient or, in the case of the HEAD method, the size of the entity-body that would have been sent had the request been a GET.

Content-Range: The Content-Range header declares which range of bytes you are sending in the body, for example: Content-Range: bytes 2000-3000/4096. You are sending bytes 2000 through 3000 of the file, which means that bytes 3001 through 4095 (1,095 bytes) still remain to complete the upload. According to RFC 7233:

The “Content-Range” header field is sent in a single part 206 (Partial Content) response to indicate the partial range of the selected representation enclosed as the message payload, sent in each part of a multipart 206 response to indicate the range enclosed within each body part, and sent in 416 (Range Not Satisfiable) responses to provide information about the selected representation.
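
To make these two headers concrete, here is a minimal sketch of what a single chunk request could look like with cURL, assuming the ~1 GB test file and the $UPLOADURL session URI that we will create in the hands-on part below (GCP expects every chunk except the last one to be a multiple of 256 KiB, so the sketch sends the first 8 MiB):

# Illustrative only: send the first 8 MiB (8388608 bytes) of a 1,000,000,000-byte file
# to an already-created resumable session URI stored in $UPLOADURL.
head -c 8388608 example-file.txt > first-chunk.bin
curl -v -X PUT -H "Content-Range: bytes 0-8388607/1000000000" --upload-file first-chunk.bin $UPLOADURL

cURL fills in Content-Length automatically from the file size, and because this is not the final chunk GCP would answer with HTTP 308, as described next.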

Now that you are aware of the headers, you must also be aware of the HTTP response status codes. The response status is something important that you must monitor/check when performing a Resumable Upload.
GCP uses the following status codes to report the state of a Resumable Upload:

  • HTTP 200/201: These response codes mean that your upload is complete and all bytes were received.
  • HTTP 308: This response code means that your upload is not complete and you must send more bytes to finish it. Note that you’ll receive a Range header stating the byte offset already received.
  • HTTP 499: This response code means that you cancelled (deleted) a Resumable Upload session successfully.
  • HTTP 500/503: These response codes mean that your upload was interrupted and you must resume it. In this case, you must check the upload status first (I’ll show how below).

For more information about headers and response codes, please read the official documentation.
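
To make the 308 case concrete, a status check against a partially received 1,000,000,000-byte object (the zero-byte request with Content-Range: bytes */1000000000 shown in the hands-on part) could return something roughly like this; the values here are illustrative, not from a real session:

HTTP/2 308
range: bytes=0-524287999

The range header means that bytes 0 through 524287999 were persisted, so the next request should resume from byte 524288000.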

Hands-on:

Ok, now you know what GCP Resumable Upload is and how it works, so let’s get hands-on. To run this lab you need the following:

  • A GCP Account
  • The gsutil tool installed
  • cURL installed
  • One GCP Storage Bucket created

Part 1: Uploading using cURL

  1. Go to your GCP Console, create a Service Account, create a new key (JSON format) and download it. In my case I added this account as Owner, but you should keep the principle of least privilege in mind.
Service account and keys

2. Configure your gsutil to use this service account.

Run the command:

gcloud auth activate-service-account ACCOUNTNAME --key-file=ACCOUNTFILE.json

The ACCOUNTNAME parameter must match the “client_email” value in the JSON key file.

Account activated
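
By the way, if you have jq installed you can pull the client_email value straight out of the key file instead of copying it by hand (just a convenience, assuming your downloaded key is the my-auth.json used later in this post):

gcloud auth activate-service-account "$(jq -r .client_email my-auth.json)" --key-file=my-auth.json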

3. Create your bucket

themediumarticle bucket

4. To demonstrate the Resumable Upload, let’s create a random file with ~1GB of data.

base64 /dev/urandom | head -c 1000000000 > example-file.txt

5. Now that our environment is set up and the file/bucket created, let’s start our Resumable Upload. GCP allows Resumable Uploads through signed URLs, so we must create one before starting the upload.

gsutil signurl -c "text/plain" -m RESUMABLE my-auth.json gs://themediumarticle/example-file.txt

Explaining the command:

  • “gsutil signurl”: gsutil command and signurl action
  • “-c text/plain”: the content type for the signed URL
  • “-m RESUMABLE”: defines that this is a Resumable Upload
  • “my-auth.json”: your service account key file. You can omit it, but I like to include it in every request, so I can control the account that I’m using.
  • “gs://themediumarticle/example-file.txt”: the bucket name and the object name (file name).

This command generates a Signed URL to use in the upload; copy the “Signed URL” value from the output. Look at the response:

In this case the Signed URL is: https://storage.googleapis.com/themediumarticle/example-file.txt?x

6. Execute the cURL command to start the Resumable Upload and get the Location URI used to upload the file:

export SIGNEDURL="<SignedURL>"

curl -v -X "POST" -H "content-type: text/plain" -H "x-goog-resumable:start" -d '' $SIGNEDURL

This command generates a Location URI where we will upload our file. Look at the response and copy the “Location” header value.

Location header value
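
If you prefer, you can run the same request again and capture the Location header straight into the variable used in step 7. This is just a shell convenience (assuming bash and that $SIGNEDURL is still set), equivalent to the manual export below:

export UPLOADURL="$(curl -si -X POST -H "content-type: text/plain" -H "x-goog-resumable:start" -d '' "$SIGNEDURL" | grep -i '^location:' | awk '{print $2}' | tr -d '\r')"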

7. Now we have the final Location URI to upload the file in a single request:

export UPLOADURL="<Location header>"

Uploading the file in a single request. This command should return HTTP 200 and finish the upload.

curl -v -X PUT --upload-file example-file.txt $UPLOADURL

HTTP 200 response code.
The bucket object was created.
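
If you want to confirm the object from the command line instead of the console, gsutil stat should print its metadata (size, content type, hashes):

gsutil stat gs://themediumarticle/example-file.txt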

8. Uploading the file using the Content-Range header to keep track of the bytes already sent:

Let’s repeat steps 6 and 7 to get a new Location URI to upload the file. This time I’ll simulate a network problem and stop sending the file about 3 seconds after the start:

curl -v -X PUT --upload-file example-file.txt $UPLOADURL
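
How you interrupt the transfer doesn’t really matter (Ctrl+C is enough); one scriptable way to reproduce it (my own assumption here, not necessarily how I produced the numbers below) is to kill the command after a few seconds with timeout:

# Abort the upload after ~3 seconds to simulate a dropped connection.
timeout 3 curl -v -X PUT --upload-file example-file.txt $UPLOADURL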

Now we have a new command to check the upload status:

curl -i -X PUT -H "Content-Length: 0" -H "Content-Range: bytes */1000000000" -d "" $UPLOADURL

range header

We receive the Range header informing us that bytes 0 through 222298111 were persisted, i.e. 222,298,112 bytes were received and 777,701,888 bytes are still missing. The response code was HTTP/2 308.
Let’s cut the file and get the missing part. You could also split the file before uploading and send each part, or do the splitting with programmatic tools (see the sketch after this step); this is just a demonstration.

dd skip=222298112 if=example-file.txt of=remains.txt ibs=1

curl -v -X PUT --upload-file remains.txt -H "Content-Range: bytes 222298112-999999999/1000000000" $UPLOADURL

This request should return HTTP 200 because the upload is finished!
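
For completeness, here is a rough end-to-end sketch of the split-and-send logic mentioned in step 8, scripted in bash. It is my own sketch rather than part of the original walkthrough, and it assumes GNU coreutils (dd, stat), curl, an existing $UPLOADURL session URI and an 8 MiB chunk size (a multiple of 256 KiB, which GCP expects for every chunk except the last):

#!/usr/bin/env bash
# Sketch: upload example-file.txt to the session URI in $UPLOADURL in fixed-size chunks.
set -euo pipefail

FILE="example-file.txt"
CHUNK=$((8 * 1024 * 1024))          # 8 MiB per request
TOTAL=$(stat -c %s "$FILE")         # total object size in bytes
OFFSET=0

while [ "$OFFSET" -lt "$TOTAL" ]; do
  LAST=$((OFFSET + CHUNK - 1))
  if [ "$LAST" -ge "$TOTAL" ]; then LAST=$((TOTAL - 1)); fi

  # Cut the current chunk out of the file and send it with Content-Range.
  dd if="$FILE" of=chunk.bin bs="$CHUNK" skip=$((OFFSET / CHUNK)) count=1 status=none
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X PUT -H "Content-Range: bytes ${OFFSET}-${LAST}/${TOTAL}" --data-binary @chunk.bin "$UPLOADURL")

  # 308 means GCP expects more bytes; 200/201 means the upload is complete.
  echo "bytes ${OFFSET}-${LAST}: HTTP ${STATUS}"
  OFFSET=$((LAST + 1))
done

A real client would also re-check the upload status (the Content-Length: 0 / Content-Range: bytes */TOTAL request above) after any failed chunk and restart from the offset reported in the Range header.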

So, that’s it folks! I hope you enjoyed this post and I’ll update it with Python and Go examples :)

Thanks to my friend Computer15776 for his contribution :D

Thank you for reading this!

Written by Leandro Damascena

Systems architect and developer for over 15 years, passionate about cloud solutions. I have been working with cloud solutions for over 10 years!
