jitl a day ago

Here's how I solved this problem in Notion's internal command line tools:

    function flushWritableStream(stream: NodeJS.WritableStream) {
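      // The empty write's callback fires once all previously written data has
      // been handled, so awaiting it flushes the stream.
      // (`handleFlushError` is defined elsewhere in the same codebase.)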
      return new Promise(resolve => stream.write("", resolve)).catch(
        handleFlushError,
      )
    }
    
    /**
     * In NodeJS, process.stdout and process.stderr behave inconsistently depending on the type
     * of file they are connected to.
     *
     * When connected to unix pipes, these streams are *async*, so we need to wait for them to be flushed
     * before we exit, otherwise we get truncated results when using a Unix pipe.
     *
     * @see https://nodejs.org/api/process.html#process_a_note_on_process_i_o
     */
    export async function flushStdoutAndStderr() {
      await Promise.all([
        flushWritableStream(process.stdout),
        flushWritableStream(process.stderr),
      ])
    }

    /**
     * If `module` is the NodeJS entrypoint:
     *
     * Wait for `main` to finish, then exit 0.
     * Note that this does not wait for the event loop to drain;
     * it is suited to commands that run to completion.
     *
     * For processes that must outlive `main`, see `startIfMain`.
     */
    if (require.main === module) {
      await main(argv)
      await flushStdoutAndStderr()
      setTimeout(() => process.exit(0))
    }
  • 3np a day ago

    Hm, that does not solve the issue in TFA and I'm not sure it consistently works like you intend:

        $ node -e "process.stdout.write('@'.repeat(128 * 1024)); process.stdout.write(''); process.exit(0);" | wc -c
        65536
    
    You want to use `Stream.finished`:

        $ node -e "const { Stream } = require('stream'); s = process.stdout; s.write('@'.repeat(128 * 1024)); Stream.finished(s, () => { process.exit(0); })" | wc -c
        131072
    
    https://nodejs.org/api/stream.html#streamfinishedstream-opti...

    If this helps you, consider bothering your infra team to look at opening access back up for tor IPs (;

    • ecedeno a day ago

      > node -e "process.stdout.write('@'.repeat(128 * 1024)); process.stdout.write(''); process.exit(0);" | wc -c

      That's missing the part where it waits for the last write to flush before exiting.

      • 3np a day ago

        You might think that would be it?

            $ node -e "process.stdout.write('@'.repeat(128 * 1024)); process.stdout.write(''); setTimeout(()=>{ process.exit(0); }, 0);" | wc -c
            131072
        
        Sure. But not so fast! You're still just racing and have no guarantees. Increase the pressure and it snaps back:

            $ node -e "process.stdout.write('@'.repeat(1280 * 1024)); process.stdout.write(''); setTimeout(()=>{ process.exit(0); }, 0);" | wc -c
            65536
        
        Meanwhile:

            $ node -e "const { Stream } = require('stream'); s = process.stdout; s.write('@'.repeat(1280 * 1024)); Stream.finished(s, () => { process.exit(0); })" | wc -c
            1310720
        • Thorrez 13 hours ago

          You’re not passing anything as the 2nd argument to stdout.write()

    • maple3142 a day ago

      I think this is what @jitl means:

        node -e "process.stdout.write('@'.repeat(128 * 1024)); process.stdout.write('',()=>process.exit(0)); " | wc -c
      
      It writes an empty string and uses its callback to detect when it has been flushed, which means the previous writes have also been flushed.
    • jitl a day ago

      Your one liner doesn’t do the same thing as the code I posted. You missed passing a callback to stream.write and waiting for the callback before exiting the process.

      Once that callback resolves, the buffered data from before that point is known to be flushed. I don’t want to wait for a “finished” event, because I want my program to terminate once my main() function and a flush after main() are complete. If I wait for finish, a background “thread” can extend the process lifetime by continuing to log.

      This is also why we don’t set process.exitCode and wait for the event loop to drain and Node to auto-exit. That might be the “right way”, but if someone leaves a lingering setTimeout(…, 10_MINUTES) you’ll have a bad time.

      It’s also clearer to reason about. Our rule: if you want the process to wait for your async work to complete, then `await` it.

      • 3np a day ago

        > Once that callback resolves, the buffered data from before that point is known to be flushed.

        This is not true IME. Just more likely.

        See my comment here: https://news.ycombinator.com/item?id=41829905

        > If I wait for finish, a background “thread” can extend the process lifetime by continuing to log.

        Can you not address this by calling .end() on the stream?

        https://nodejs.org/api/stream.html#writableendchunk-encoding...

        > Our rule: if you want the process to wait for your async work to complete, then `await` it.

        In general I very much agree. In this specific case, though, the awaits are just buying you a bit more time to race the buffer, which is why they appear to help. The “flush” will not wait for completion before exiting the process.

        Maybe also worth keeping in mind that even if this did work, stuff like this has historically changed in breaking ways a few times between major Node releases. If you’re relying on unspecified behavior, you’re obviously in darker waters.

        • throwitaway1123 a day ago

          > See my comment here: https://news.ycombinator.com/item?id=41829905

          Your comment isn't equivalent to the original code.

          Your one liner is doing this: `process.stdout.write('')`

          Jitl's example is doing this: `new Promise(resolve => stream.write("", resolve))`

          He's passing in a promise resolver as the callback to stream.write (this is basically a `util.promisify` version of the writable.write chunk callback). If the data is written in order (which it should be), then I don't see how the `stream.write` promise could resolve before prior data is flushed. The documentation says: "The writable.write() method writes some data to the stream, and calls the supplied callback once the data has been fully handled". [1]

          [1] https://nodejs.org/api/stream.html#writablewritechunk-encodi...

          • jitl a day ago

            It's so easy to miss a tiny but important detail when working with or discussing Node streams and come out the other side bamboozled by bugs. This whole thread exemplifies why I tell people to stay as far away from streams as possible. Though in the case of stdout/stderr we can't avoid it.

            • 3np 21 hours ago

              > It's so easy to miss a tiny but important detail when working with or discussing Node streams and come out the other side bamboozled by bugs. This whole thread exemplifies why I tell people to stay as far away from streams as possible

              100%.

              Case in point: after coming back I realize that I did indeed overlook the callback/promise making the difference for the write call. Sorry for spreading lies!

userbinator a day ago

When I read the title and the first sentence I immediately thought of partial writes, one of those things that a lot of people seem to ignore until things stop working, often intermittently. I don't work with Node.js, but I've had to fix plenty of network code that had the same bug of not handling partial writes correctly.
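
For reference, the classic fix for partial writes is a loop that retries with whatever wasn’t accepted. A rough sketch of that pattern in Node (writeAllSync is a made-up helper around fs.writeSync):

    const fs = require("node:fs");

    // A single write(2) to a pipe or socket may accept fewer bytes than
    // requested, so keep writing and advance the offset by the number of
    // bytes actually written until the whole buffer is gone.
    function writeAllSync(fd, buffer) {
      let offset = 0;
      while (offset < buffer.length) {
        offset += fs.writeSync(fd, buffer, offset, buffer.length - offset);
      }
    }

    writeAllSync(1, Buffer.from("@".repeat(128 * 1024)));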

> I thought gpt-4o and o1-preview would be able to do this pretty easily, but surprisingly not.

You're surprised that AI doesn't work? I'm not.

  • Joel_Mckay 12 hours ago

    "I don't work with Node.js"

    Consider yourself blessed, as it is the worst designed code ecosystem I've had stinking up the infrastructure. It is worse than VB in many ways.

    "You're surprised that AI doesn't work? I'm not."

    In general, most productivity studies quietly hide the fact that LLMs make zero impact on experienced developers’ performance. However, if the algorithm is well-formed arbitrary nonsense, then an LLM can generate plausible-sounding slop all day long.

    Best of luck =3

    • namaria 5 hours ago

      Few decisions I’ve made make me happier than steering clear of the whole thing in my early days of learning to code. What ticked me off first was the silent failures, but I keep getting confirmation I did the right thing.

fovc a day ago

POSIX is weird, but NodeJS streams are designed to be misused

hipadev23 a day ago

I’m confused. If process.stdout.write() returns false when the pipe is full, do you not need to loop and call it again or something analogous? Or does it continue operating on the write in the background and that’s why waiting for the .finished() event works?

Is there a reason it doesn’t use standard Node.js promise semantics (await process.stdout.write())? So is the best solution probably util.promisify()?
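
Something like this is what I’d expect, assuming write’s chunk callback follows the usual error-first convention (untested sketch):

    const { promisify } = require("node:util");

    // write() has to be bound to its stream so `this` is correct.
    const writeAsync = promisify(process.stdout.write.bind(process.stdout));

    // Resolves once the chunk has been handled, so it should be safe to exit after.
    writeAsync("@".repeat(128 * 1024)).then(() => process.exit(0));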

pdr94 12 hours ago

Great investigation! This highlights a crucial aspect of Node.js's asynchronous nature when dealing with pipes. It's a reminder that we should always be mindful of how Node handles I/O operations differently based on the output destination.

The key takeaway is the behavior difference between synchronous (files/TTYs) and asynchronous (pipes) writes in Node.js. This explains why `process.exit()` can lead to truncated output when piping.

For those facing similar issues, remember to handle the `drain` event or use a more robust streaming approach to ensure all data is written before exiting. This post is a valuable resource for debugging similar "mysterious" pipe behavior in Node.js applications.
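
A minimal sketch of that backpressure pattern (writeAll and chunks are just placeholders):

    const { once } = require("node:events");

    // If write() reports backpressure (returns false), wait for 'drain'
    // before sending the next chunk.
    async function writeAll(stream, chunks) {
      for (const chunk of chunks) {
        if (!stream.write(chunk)) {
          await once(stream, "drain");
        }
      }
    }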

arctek a day ago

fsync doesn’t work here, right, because unix pipes are in memory? I’ve had luck elsewhere, with Node.js WritableStreams that refuse to flush their buffers before a process.exit(), by calling fsync on the underlying file descriptors.

  • Joker_vD 12 hours ago

    Well, yes:

        EINVAL    fd is bound to a special file (e.g., a pipe, FIFO, or
                  socket) which does not support synchronization.
    
    Or as POSIX puts it,

        [EINVAL]
            The fildes argument does not refer to a file on which this operation is possible.
moralestapia a day ago

???

Wait, so what's the solution?

  • simonbw a day ago

    My understanding is that it's "don't call process.exit() until you have finished writing everything to process.stdout".

    • gcommer a day ago

      Even better: never use process.exit(). Set process.exitCode instead.

      https://nodejs.org/api/process.html#processexitcode
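
      A minimal sketch of that shape (main() here is hypothetical, standing in for whatever the command does):

          async function run() {
            try {
              await main();
            } catch (err) {
              console.error(err);
              process.exitCode = 1; // recorded now, applied when Node exits on its own
            }
          }

          run();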

      • o11c a day ago

        Or just throw an error.

        See also `process.on('exit', ...)`; you can conditionally set process.exitCode in it (only if there was no other failure) if you want.
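
        Something along these lines, perhaps (cleanupFailed is a made-up flag):

            process.on("exit", () => {
              // Only add a failure code if nothing else already set one.
              if (process.exitCode === undefined && cleanupFailed) {
                process.exitCode = 1;
              }
            });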

  • ksr a day ago

    I use fs.writeFileSync:

      $ node -e "fs.writeFileSync(1, Buffer.from('@'.repeat(128 * 1024))); process.exit(0);" | wc -c
      131072
    • 3np a day ago

      It doesn't work for all streams but for the stdout/stderr case where they can be treated like file descriptors, I prefer this simpler approach over fiddling with the Streams API.

      (If blocking the process on writing synchronously to stdout is indeed what you actually want...)

molsson a day ago

process.stdout._handle.setBlocking(true)

...is a bit brutal but works. Draining the stream before exiting also kind of works but there are cases where drain will just permanently block.

    async function drain(stream) { return new Promise((resolve) => stream.on('drain', resolve)) }

  • jitl a day ago

    When you want to await a single instance of a Node EventEmitter, please use `stream.once('drain', () => ...)` so you don’t leak your listener callback after the promise resolves.
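
    For example, the same helper with `once`:

        const drain = (stream) =>
          new Promise((resolve) => stream.once("drain", resolve))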

  • yladiz a day ago

    Which cases will it permanently block?

    • jitl a day ago

      The writable stream will only emit 'drain' if the buffer fills past the high-water mark. In that case, a prior call to `writable.write(...)` would have returned `false`, indicating you should wait for 'drain' before calling write again. Even if your code can’t access the return value of the last writable.write call, you can check `writable.writableNeedDrain` to decide whether to wait for 'drain'.

      This program will run forever, because we never write to stdout, so stdout never “drains”:

          setInterval(() => console.error("stderr: ", new Date().toISOString()), 5_000)
          process.stdout.once("drain", () => {
            console.error("stderr: stdout.once.drain, exit")
            process.exit(0)
          })
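
      A guarded version of the drain helper avoids that hang (sketch):

          async function drainIfNeeded(stream) {
            // Only wait when the buffer is actually over the high-water mark;
            // otherwise 'drain' will never fire and we would wait forever.
            if (stream.writableNeedDrain) {
              await new Promise((resolve) => stream.once("drain", resolve))
            }
          }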
benatkin a day ago

This is clickbait. The process exiting with unflushed output doesn't mean disappearing bytes. The bytes were there, but the program left without writing them.

  • ptx a day ago

    Since the writer never wrote the remaining bytes to the pipe, but only to its own memory buffer, and that memory buffer is destroyed when the writer exits, "disappearing bytes" seems accurate. The bytes buffered in the memory of the writer process disappear along with the process when it exits.

  • lexicality 14 hours ago

    clickbait would be "POSIX pipes corrupt JSON data? You won't believe this footgun NodeJS intentionally points at linux users!"

    this title is simply a poetic description of the initial bug report.

  • richbell a day ago

    Yes, that is the conclusion of the investigation. It does not make this clickbait.

    • benatkin a day ago

      I disagree, which is why I left my comment. The bytes didn't disappear in a way that justifies the headline on HN IMHO because there was a very clear explanation of where they went.

      • pyre a day ago

        The title is written from the perspective of the original problem, not from the solution. It then leads the reader through the process of discovering why the bytes were 'disappearing.' From the perspective of the original users, the bytes were disappearing.