How to encode/marshal a big slice into JSON in Go

The problem

Say you have an io.Writer and you want to encode/marshal a very big slice of data into a JSON string and write it to that writer; it can be a file, an S3 upload stream, an HTTP response, or any other kind of writer.

Marshaling a big chunk of data, let’s say 20GB, can be a heavy workload for the system, as it requires a significant amount of memory and processing power regardless of the language you use. The performance of the system will depend on several factors, such as specifications of the machine, the available memory and processing power, and the efficiency of the code used to perform the marshaling operation.

Assuming that the system has enough memory to handle the 20GB data, the marshaling operation can still take a considerable amount of time, especially if the data contains complex nested structures or arrays. It’s worth noting that the size of the resulting JSON string may be much larger than the original data, due to the overhead of JSON formatting and encoding. This can also impact the performance and memory usage of the system, especially when transmitting or storing the data.

I’m going to show you a technique called streaming that improves performance and reduces memory usage.

Short answer

Streaming works like this: you manually write the JSON array structure to your io.Writer, then you iterate over your data, encode each element, write its JSON to the writer, and take care of the array syntax yourself so the result stays valid JSON.

This method gives you a lower transfer rate, since the encoding happens between each iteration and every write to the output is an I/O-bound operation, but it only uses the amount of memory required for a single element of the array. Take a look at this example:

func encode(posts []Post, w io.Writer) {
    // Write the beginning of the JSON array
    w.Write([]byte("["))

    encoder := json.NewEncoder(w)

    // Encode the first element and write it to the output
    if len(posts) > 0 {
        encoder.Encode(posts[0])
    }

    // Encode and write the rest of the posts while being careful
    // not to forget the commas between the elements
    for i := 1; i < len(posts); i++ {
        w.Write([]byte(","))
        encoder.Encode(posts[i])
    }

    // Write the end of the JSON array
    w.Write([]byte("]"))
}

This method really shines when you get your data from a channel. You can also reduce the effect of the blocking I/O-bound workload by buffering the output, wrapping your response writer in a bufio.Writer; see the best solution below for more details, and the channel-based sketch that follows.
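For instance, here is a minimal sketch of the same technique fed from a channel, with the output buffered. The Post type, the channel, and the encodeFromChannel name are assumptions for illustration only:

import (
    "bufio"
    "encoding/json"
    "io"
)

// encodeFromChannel streams posts arriving on a channel as one JSON array.
// Minimal sketch: most error handling is omitted for brevity.
func encodeFromChannel(posts <-chan Post, w io.Writer) error {
    bw := bufio.NewWriterSize(w, 64*1024) // buffer small writes into bigger chunks
    enc := json.NewEncoder(bw)

    bw.WriteByte('[')

    first := true
    for post := range posts {
        if !first {
            bw.WriteByte(',')
        }
        first = false
        if err := enc.Encode(post); err != nil {
            return err
        }
    }

    bw.WriteByte(']')

    // Flush whatever remains in the buffer to the underlying writer.
    return bw.Flush()
}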

Always benchmark the solutions for your case. Depending on the size of your data, this solution might be slower overall due to the lower transfer rate.

Long answer

Say you have a large slice, or a stream of objects from a channel, and you need to encode them into JSON and write the result to a file or return it as an HTTP response. The following piece of code is a simplified, empty HTTP handler that needs to return the posts that were seeded by the init function.

package main

import (
    "encoding/json"
    "log"
    "net/http"
    "time"
)

var posts []Post

type Post struct {
    Date string
}

func init() {
    // seed data (just for testing purposes)
    const count = 50_000_000
    posts = make([]Post, count)
    for i := 0; i < count; i++ {
        posts[i] = Post{Date: time.Now().String()}
    }
}

func main() {
    http.HandleFunc("/", handler)
    log.Fatal(http.ListenAndServe(":8080", nil))
}

func handler(w http.ResponseWriter, r *http.Request) {

    // Set the Content-Type header to application/json
    w.Header().Set("Content-Type", "application/json")

    // TODO: <---- need to return the posts in JSON format here;
    //             the encoded JSON response is going to be about 4GB
}

How do you do it?

The common solution

The most straightforward way to do it is to marshal the whole slice into a single JSON string and then write it to the response, like this.

// This byte slice alone allocates about 4GB of memory!!!
// The error is ignored for the sake of simplicity. Please don't do that.
jsonString, _ := json.Marshal(posts)

// Write the JSON response to the http.ResponseWriter
w.Write(jsonString)

Why is it bad?

First of all, jsonString alone requires 4GB of memory. In some cases, having that much memory available is simply not feasible.

I followed this post, which shows how to generate a graph of the transfer rate, but I changed it to chart the total data received instead, since the script doesn’t work well with short HTTP calls.

$ time curl http://localhost:8080 2>&1 |tr -u '\r' '\n'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0

  ...
  
  0     0    0     0    0     0      0      0 --:--:--  0:00:43 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:44 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:45 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:46 --:--:--     0
100 1400k    0 1400k    0     0  30517      0 --:--:--  0:00:47 --:--:--  297k
100  179M    0  179M    0     0  3835k      0 --:--:--  0:00:48 --:--:-- 38.2M
100  340M    0  340M    0     0  7108k      0 --:--:--  0:00:49 --:--:-- 72.3M
100  450M    0  450M    0     0  9222k      0 --:--:--  0:00:50 --:--:-- 95.7M
100  975M    0  975M    0     0  19.1M      0 --:--:--  0:00:51 --:--:--  207M
100 1608M    0 1608M    0     0  30.9M      0 --:--:--  0:00:52 --:--:--  321M
100 2301M    0 2301M    0     0  43.4M      0 --:--:--  0:00:53 --:--:--  424M
100 2942M    0 2942M    0     0  54.4M      0 --:--:--  0:00:54 --:--:--  520M
100 3660M    0 3660M    0     0  66.5M      0 --:--:--  0:00:55 --:--:--  642M
100 3991M    0 3991M    0     0  71.9M      0 --:--:--  0:00:55 --:--:--  678M
________________________________________________________
Executed in   55.48 secs    fish           external
  usr time    0.80 secs    0.48 millis    0.80 secs
  sys time    1.74 secs    2.87 millis    1.74 secs

Here are the curl output and the chart for it. As you can see, the worst downside is that the API doesn’t return anything until the whole slice has been converted to a string, and only then does it start a fast-paced stream. In my case it took 55.48 seconds to receive the whole response body, but for the first 46 seconds no data arrived at all.

Total bytes received over time chart for method 1

An alternative solution

Well, you might think that passing the writer directly to the encoder would solve this issue…

json.NewEncoder(w).Encode(posts)

But underneath, the encoder builds the entire output in an internal buffer and only writes to the writer once the encoding is complete. So there is going to be no real difference between this and the previous method.

A better solution

A possible solution is to iterate over the slice, encode each item into JSON separately, and write them to the response writer one by one. By doing that we avoid allocating any unnecessary memory and keep the footprint as low as possible.

encoder := json.NewEncoder(w)

w.Write([]byte("[")) // <--- w is the http.ResponseWriter

if len(posts) > 0 {
    encoder.Encode(posts[0])
}

for i := 1; i < len(posts); i++ {
    w.Write([]byte(","))
    encoder.Encode(posts[i])
}

w.Write([]byte("]"))

What’s the catch?

The downside is that the transfer rate is significantly lower than with the common solution, because every write to I/O is slow and we are now doing far more writes than before.

Every time we write something to an output writer, the operating system normally needs to perform several steps to get the data to the underlying medium, which can be disk or network. These steps can include copying the data from user space to kernel space, allocating memory for it in kernel space, and scheduling I/O operations to write it out.

If data is written in small chunks, each of these steps may need to be repeated for each chunk of data, which can result in a lot of overhead and slow performance.
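To see just how many writes the streaming loop issues, here is a small, self-contained sketch (an illustration of the point, not code from the benchmarks above; countingWriter and stream are names I made up) that counts the Write calls reaching an underlying writer with and without a bufio.Writer in between:

package main

import (
    "bufio"
    "encoding/json"
    "fmt"
    "io"
)

// countingWriter counts how many Write calls reach the underlying writer.
type countingWriter struct {
    calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
    c.calls++
    return len(p), nil
}

// stream encodes n dummy elements as a JSON array into w.
func stream(n int, w io.Writer) {
    enc := json.NewEncoder(w)
    w.Write([]byte("["))
    for i := 0; i < n; i++ {
        if i > 0 {
            w.Write([]byte(","))
        }
        enc.Encode(map[string]int{"i": i})
    }
    w.Write([]byte("]"))
}

func main() {
    // Unbuffered: roughly two Write calls per element.
    direct := &countingWriter{}
    stream(10_000, direct)

    // Buffered: writes only reach the underlying writer when the 64KB buffer fills.
    buffered := &countingWriter{}
    bw := bufio.NewWriterSize(buffered, 64*1024)
    stream(10_000, bw)
    bw.Flush()

    fmt.Println("unbuffered writes:", direct.calls)  // about 20,001
    fmt.Println("buffered writes:  ", buffered.calls) // only a handful
}

Each of those unbuffered Write calls pays the per-write overhead described above, which is exactly what the buffered variants later in this post avoid.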

$ time curl http://localhost:8080 2>&1 |tr -u '\r' '\n'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 64.7M    0 64.7M    0     0  51.6M      0 --:--:--  0:00:01 --:--:-- 52.6M
100  161M    0  161M    0     0  71.6M      0 --:--:--  0:00:02 --:--:-- 72.3M
100  258M    0  258M    0     0  79.4M      0 --:--:--  0:00:03 --:--:-- 79.9M
100  368M    0  368M    0     0  86.6M      0 --:--:--  0:00:04 --:--:-- 87.1M
100  471M    0  471M    0     0  89.7M      0 --:--:--  0:00:05 --:--:-- 92.9M
100  578M    0  578M    0     0  92.5M      0 --:--:--  0:00:06 --:--:--  102M
100  689M    0  689M    0     0  95.0M      0 --:--:--  0:00:07 --:--:--  105M
100  786M    0  786M    0     0  95.2M      0 --:--:--  0:00:08 --:--:--  105M
100  892M    0  892M    0     0  96.4M      0 --:--:--  0:00:09 --:--:--  104M
100 1019M    0 1019M    0     0  99.4M      0 --:--:--  0:00:10 --:--:--  109M
100 1120M    0 1120M    0     0  99.5M      0 --:--:--  0:00:11 --:--:--  108M
100 1247M    0 1247M    0     0   101M      0 --:--:--  0:00:12 --:--:--  111M
100 1377M    0 1377M    0     0   103M      0 --:--:--  0:00:13 --:--:--  118M
100 1506M    0 1506M    0     0   105M      0 --:--:--  0:00:14 --:--:--  122M
100 1636M    0 1636M    0     0   107M      0 --:--:--  0:00:15 --:--:--  123M
100 1765M    0 1765M    0     0   108M      0 --:--:--  0:00:16 --:--:--  128M
100 1891M    0 1891M    0     0   109M      0 --:--:--  0:00:17 --:--:--  128M
100 2025M    0 2025M    0     0   110M      0 --:--:--  0:00:18 --:--:--  129M
100 2157M    0 2157M    0     0   112M      0 --:--:--  0:00:19 --:--:--  130M
100 2281M    0 2281M    0     0   112M      0 --:--:--  0:00:20 --:--:--  128M
100 2386M    0 2386M    0     0   112M      0 --:--:--  0:00:21 --:--:--  124M
100 2519M    0 2519M    0     0   113M      0 --:--:--  0:00:22 --:--:--  125M
100 2651M    0 2651M    0     0   114M      0 --:--:--  0:00:23 --:--:--  125M
100 2780M    0 2780M    0     0   114M      0 --:--:--  0:00:24 --:--:--  124M
100 2921M    0 2921M    0     0   115M      0 --:--:--  0:00:25 --:--:--  128M
100 3057M    0 3057M    0     0   116M      0 --:--:--  0:00:26 --:--:--  134M
100 3199M    0 3199M    0     0   117M      0 --:--:--  0:00:27 --:--:--  135M
100 3339M    0 3339M    0     0   118M      0 --:--:--  0:00:28 --:--:--  137M
100 3462M    0 3462M    0     0   118M      0 --:--:--  0:00:29 --:--:--  136M
100 3600M    0 3600M    0     0   119M      0 --:--:--  0:00:30 --:--:--  135M
100 3716M    0 3716M    0     0   118M      0 --:--:--  0:00:31 --:--:--  131M
100 3845M    0 3845M    0     0   119M      0 --:--:--  0:00:32 --:--:--  129M
100 3984M    0 3984M    0     0   119M      0 --:--:--  0:00:33 --:--:--  129M
100 4039M    0 4039M    0     0   120M      0 --:--:--  0:00:33 --:--:--  131M
________________________________________________________
Executed in   33.67 secs    fish           external
   usr time    5.96 secs    0.38 millis    5.96 secs
   sys time   10.48 secs    1.37 millis   10.48 secs

writing data directly into output

Despite the I/O-bound blocking we face, the chart shows a huge improvement over waiting for the whole data chunk to be encoded, and we finish the transfer faster than before: about 33 seconds instead of 55. But there is a way to improve this further.

Best way

Remember when I said that every attempt to write to an output takes time? What if we buffer our output and write bigger chunks each time?

// Wrap the http.ResponseWriter with a bufio.Writer with a size of 64KB
buffer := bufio.NewWriterSize(w, 65536)

encoder := json.NewEncoder(buffer)

buffer.WriteByte('[')

if len(posts) > 0 {
    encoder.Encode(posts[0])
}

for i := 1; i < len(posts); i++ {
    buffer.WriteByte(',')
    encoder.Encode(posts[i])
}

buffer.WriteByte(']')

// Flush the bufio.Writer to ensure all data is written to the http.ResponseWriter
buffer.Flush()

What we did here is wrap the response writer in a buffered I/O writer and use that as the output, so the encoded data only reaches the underlying connection in larger chunks. Here is the best result I got, with the buffer set to 64KB.

$ time curl http://localhost:8080 2>&1 |tr -u '\r' '\n'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 50.1M    0 50.1M    0     0   205M      0 --:--:-- --:--:-- --:--:--  212M
100  272M    0  272M    0     0   218M      0 --:--:--  0:00:01 --:--:--  220M
100  500M    0  500M    0     0   223M      0 --:--:--  0:00:02 --:--:--  223M
100  736M    0  736M    0     0   226M      0 --:--:--  0:00:03 --:--:--  227M
100  975M    0  975M    0     0   229M      0 --:--:--  0:00:04 --:--:--  230M
100 1198M    0 1198M    0     0   228M      0 --:--:--  0:00:05 --:--:--  229M
100 1437M    0 1437M    0     0   230M      0 --:--:--  0:00:06 --:--:--  233M
100 1664M    0 1664M    0     0   229M      0 --:--:--  0:00:07 --:--:--  232M
100 1904M    0 1904M    0     0   230M      0 --:--:--  0:00:08 --:--:--  233M
100 2144M    0 2144M    0     0   232M      0 --:--:--  0:00:09 --:--:--  233M
100 2382M    0 2382M    0     0   232M      0 --:--:--  0:00:10 --:--:--  236M
100 2601M    0 2601M    0     0   231M      0 --:--:--  0:00:11 --:--:--  232M
100 2831M    0 2831M    0     0   231M      0 --:--:--  0:00:12 --:--:--  233M
100 3062M    0 3062M    0     0   231M      0 --:--:--  0:00:13 --:--:--  231M
100 3294M    0 3294M    0     0   231M      0 --:--:--  0:00:14 --:--:--  229M
100 3518M    0 3518M    0     0   230M      0 --:--:--  0:00:15 --:--:--  227M
100 3732M    0 3732M    0     0   229M      0 --:--:--  0:00:16 --:--:--  226M
100 3962M    0 3962M    0     0   229M      0 --:--:--  0:00:17 --:--:--  226M
100 4038M    0 4038M    0     0   229M      0 --:--:--  0:00:17 --:--:--  225M
________________________________________________________
Executed in   17.59 secs    fish           external
   usr time    0.88 secs    0.43 millis    0.88 secs
   sys time    1.62 secs    1.53 millis    1.62 secs

using an optimized buffer improves the process

As you can see, by streaming the data element by element through a buffer, we got the full transfer done in 17.59 seconds, on top of the reduction in memory usage. Isn’t that good?

How to decide the buffer size?

The default buffer size is 4KB, which, depending on the elements in your array, can make the transfer faster or slower if you leave it untuned. The effective buffer size differs from case to case, and it’s hard to say which size works best without testing. So don’t rush: run tests and compare the results, for example with a benchmark like the sketch below.
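Here is a minimal benchmark sketch for such a comparison (the function names, element count, and buffer sizes are my own assumptions, not from the measurements above), using Go's testing package and io.Discard as the output:

package main

import (
    "bufio"
    "encoding/json"
    "io"
    "testing"
)

type Post struct {
    Date string
}

// encodeBuffered streams posts into w through a bufio.Writer of the given size.
func encodeBuffered(posts []Post, w io.Writer, size int) {
    bw := bufio.NewWriterSize(w, size)
    enc := json.NewEncoder(bw)

    bw.WriteByte('[')
    for i, p := range posts {
        if i > 0 {
            bw.WriteByte(',')
        }
        enc.Encode(p)
    }
    bw.WriteByte(']')
    bw.Flush()
}

func benchmarkEncode(b *testing.B, size int) {
    // Seed a smaller slice than the 50M-element example so the benchmark stays quick.
    posts := make([]Post, 100_000)
    for i := range posts {
        posts[i] = Post{Date: "2023-01-01T00:00:00Z"}
    }
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        encodeBuffered(posts, io.Discard, size)
    }
}

func BenchmarkEncode4KB(b *testing.B)  { benchmarkEncode(b, 4*1024) }
func BenchmarkEncode64KB(b *testing.B) { benchmarkEncode(b, 64*1024) }
func BenchmarkEncode1MB(b *testing.B)  { benchmarkEncode(b, 1024*1024) }

Put it in a _test.go file and run it with go test -bench=. — and keep in mind that io.Discard hides the real I/O cost, so for a final decision point the benchmark at your actual output (a file or a network connection), where the buffer size matters most.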