Bitlight Labs Blog
Lightning RGB: Disaster Recovery for Colored Channels

An ordinary Lightning channel has a recovery floor that needs no cooperation from the counterparty: even after losing all channel state, a party can sweep its own BTC from the on-chain to_remote output using just its seed. RGB colored channels cannot do this.

RGB is a client-side validated protocol. The asset allocation does not live on-chain: the chain carries only an OP_RETURN commitment, and the real allocation is maintained in each party's local stock. After a node recovers from an old backup, it cannot rebuild its own RGB allocation from on-chain data alone. Any state created after the backup was taken is missing locally and can only be retrieved from the counterparty that still holds it.

We (Bitlight Labs) completed this missing link for RGB colored channels, bringing their disaster-recovery capability in line with ordinary Lightning channels: BTC is recovered on-chain as usual, while the RGB allocation is retrieved from the counterparty through a Lightning protocol-layer extension. This post describes the design, the protocol extension, and how the finalize stage avoids collateral damage to healthy channels.


1. The problem: why you can't "just close the channel and take your money back"

Recovering from an old backup means the commitment transaction you hold locally is stale. Lightning's penalty mechanism (the justice transaction) works as follows: on each channel state update, each side hands the other the revocation secret for its own previous state. If a party later broadcasts that revoked commitment, the counterparty can use the secret it received earlier to confiscate the transaction's entire balance.

So after recovering from an old backup, any commitment you broadcast yourself is exposed to a justice transaction. LDK's "fallen behind" panic guards against exactly this: during channel_reestablish it detects that the local state has fallen behind and terminates the process before the node can make the mistake:

panic: We have fallen behind

This is LDK's safety design, not a defect — but it forms a deadlock: you can't broadcast yourself (justice would punish you), you can't reach a cooperative close if the node won't start, and every container restart panics again. The first engineering task is to break this infinite restart loop.


2. Let the counterparty close it: reusing a proven mechanism

The community already has an answer for ordinary Lightning channels. chantools' triggerforceclose and CLN's recoverchannel follow the same idea: the party that has fallen behind can't close the channel itself, but it can proactively send an "I know nothing" channel_reestablish; the counterparty, judging by BOLT-2 that the other side has fallen behind, force-closes using its own latest state. That commitment is current and is not subject to justice.

We adopt this mechanism: the recovery node sends a forged "I know nothing" reestablish, handing the close action to the healthy counterparty.

After the counterparty force-closes, the to_remote output of the close transaction is the recovering party's share of the BTC. It is a StaticPaymentOutput — a static key with no to_self_delay, spendable using only the seed and channel information, depending on no stale channel state. The BTC portion is recovered at this point.

The RGB portion has no such on-chain fallback.


3. RGBR: completing Lightning's RGB recovery protocol

The recovering party has swept back the to_remote UTXO, but its local stock does not know how much RGB asset that UTXO carries — the chain can't tell you. The only source is the still-healthy counterparty: it exports a consignment containing the recovering party's allocation from its own stock and sends it back.

For this we defined the RGBR (RGB Recovery) protocol at the Lightning protocol layer — a minimal, backward-compatible query extension:

  1. The recovering party asks: is channel X closed? If so, what is the close txid, and where can the consignment be downloaded? (RGBR_QUERY_CHANNEL_CLOSE, custom message type 60001)
  2. The counterparty answers: closed; here are the txid and consignment URL. (RGBR_CHANNEL_CLOSE_INFO, 60003)
  3. The recovering party downloads the consignment out-of-band over HTTP, consumes it into its local stock, and the RGB allocation is rebuilt.
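
The query and answer in steps 1 and 2 can be sketched with ordinary Lightning message framing: a 2-byte big-endian type followed by the payload. Only the type numbers 60001 and 60003 come from this post. The payload layouts below (a 32-byte channel_id, a 32-byte close txid, a length-prefixed URL) and all function names are assumptions made for illustration; the bLIP proposal is the authority on the real wire format.

```python
import struct

RGBR_QUERY_CHANNEL_CLOSE = 60001  # type number from this post
RGBR_CHANNEL_CLOSE_INFO = 60003   # type number from this post


def encode_query(channel_id: bytes) -> bytes:
    """Frame a query: 2-byte big-endian type, then a 32-byte channel_id (assumed payload)."""
    if len(channel_id) != 32:
        raise ValueError("channel_id must be 32 bytes")
    return struct.pack(">H", RGBR_QUERY_CHANNEL_CLOSE) + channel_id


def encode_close_info(channel_id: bytes, close_txid: bytes, url: str) -> bytes:
    """Frame a reply. Field order and the u16 URL length prefix are assumptions."""
    if len(channel_id) != 32 or len(close_txid) != 32:
        raise ValueError("channel_id and close_txid must be 32 bytes each")
    raw_url = url.encode("utf-8")
    return (
        struct.pack(">H", RGBR_CHANNEL_CLOSE_INFO)
        + channel_id
        + close_txid
        + struct.pack(">H", len(raw_url))
        + raw_url
    )


def decode_close_info(msg: bytes) -> tuple[bytes, bytes, str]:
    """Inverse of encode_close_info: returns (channel_id, close_txid, consignment_url)."""
    (msg_type,) = struct.unpack_from(">H", msg, 0)
    if msg_type != RGBR_CHANNEL_CLOSE_INFO:
        raise ValueError(f"unexpected message type {msg_type}")
    channel_id, close_txid = msg[2:34], msg[34:66]
    (url_len,) = struct.unpack_from(">H", msg, 66)
    url = msg[68:68 + url_len].decode("utf-8")
    return channel_id, close_txid, url
```

Note that the reply stays small regardless of how large the consignment is, because it carries only the URL.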

Two key design choices:

  • The consignment travels by URL, not inside the LN message. A consignment grows with the transfer history (the operation DAG) and is uncompressed; it can reach MB or even GB scale, far beyond a single LN message's limit. The message returns only a consignment_url, reusing the consignment distribution mechanism that already exists for normal RGB channel operation.
  • Everything is odd-typed. The capability is advertised with odd feature bit 829 (adjacent to RGB channel's 827, with numbering derived from RGB's SLIP-0044 registered coin type), and all RGBR messages use odd custom message types. Under BOLT-1's "it's ok to be odd" convention, a standard Lightning node is entirely unaffected even though it knows nothing of them: it ignores the feature bit in init and drops the custom messages on receipt, and neither case causes a disconnect.
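
The odd-typing choice can be illustrated with a short sketch. It sets bit 829 in a BOLT-style feature vector (big-endian bytes, with bit 0 as the least significant bit of the last byte) and models how a node that does not implement a message type reacts under the odd/even rule. The function names are hypothetical and not taken from any library.

```python
RGBR_FEATURE_BIT = 829  # odd feature bit from this post


def set_feature_bit(features: bytes, bit: int) -> bytes:
    """Set `bit` in a feature vector, left-padding with zero bytes if it is too short."""
    needed = bit // 8 + 1
    buf = bytearray(max(0, needed - len(features))) + bytearray(features)
    buf[len(buf) - 1 - bit // 8] |= 1 << (bit % 8)
    return bytes(buf)


def has_feature_bit(features: bytes, bit: int) -> bool:
    """Return True if `bit` is set; bits beyond the vector's length count as unset."""
    idx = len(features) - 1 - bit // 8
    return idx >= 0 and bool(features[idx] & (1 << (bit % 8)))


def handle_unknown_message(msg_type: int) -> str:
    """Reaction of a node that does not know `msg_type`: odd is ignored, even is fatal."""
    return "ignore" if msg_type % 2 == 1 else "close_connection"
```

With these helpers, `handle_unknown_message(60001)` yields "ignore", which is why a standard node can safely receive an RGBR query it does not understand.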

We have written up the full wire protocol as a bLIP proposal, a sibling to bLIP-0070 (RGB colored channel support): 0070 defines how RGB assets travel over Lightning, while RGBR defines how to recover the assets after an RGB channel closes.


4. finalize: no collateral damage to healthy channels

After recovery, the node has to start normally again. A direct approach would be to delete the ChannelManager, letting LDK generate an empty manager at startup and avoiding the fallen behind panic at the root. But this would harm healthy channels, for two reasons.

First, a stale backup does not mean every channel has fallen behind. Whether a channel has fallen behind depends on whether it had any activity after the backup. If a channel was completely quiescent after the backup, its local state matches the counterparty's and reestablish will not panic — it is healthy. Deleting the whole manager would close out these healthy channels along with the rest.

Second, fallen-behind channels and quiescent channels are distinguishable, so a heavy-handed, one-size-fits-all reset is unnecessary. The recovery flow already connects to the counterparty, and the channel_reestablish the counterparty sends in reply carries next_commitment_number; comparing it against the local state number immediately tells you whether the channel is behind or in sync.
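
The comparison can be sketched as below. This is a simplified model, not the node's actual logic: it assumes the peer-reported number and the local state number are already expressed on the same scale (the exact offset between a BOLT-2 next_commitment_number and a locally stored state number depends on the implementation), and every name in it is hypothetical.

```python
from enum import Enum


class ChannelHealth(Enum):
    IN_SYNC = "in sync"
    FALLEN_BEHIND = "fallen behind"
    PEER_BEHIND = "peer behind"  # not the recovery case; listed for completeness


def classify(local_state_number: int, peer_reported_number: int) -> ChannelHealth:
    """Compare the local (possibly stale) state number with what the peer reports."""
    if peer_reported_number > local_state_number:
        return ChannelHealth.FALLEN_BEHIND
    if peer_reported_number == local_state_number:
        return ChannelHealth.IN_SYNC
    return ChannelHealth.PEER_BEHIND


def classify_all(local: dict[str, int], reported: dict[str, int]) -> dict[str, ChannelHealth]:
    """Classify every channel for which the peer has answered; read-only, no side effects."""
    return {cid: classify(local[cid], reported[cid]) for cid in local if cid in reported}
```

A channel that saw no updates after the backup reports equal numbers and comes out as in sync; one that advanced after the backup comes out as fallen behind.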

So the final approach does not delete the manager. Instead, before the node starts, it performs one on-chain sync. The ChannelMonitor observes that the funding output of a counterparty-force-closed channel has been spent; once the manager handles that event, it cleanly removes the closed channel, so reestablish takes the "I no longer recognize this channel" branch and no longer panics. The funding output of a quiescent healthy channel has not been spent, so the on-chain sync leaves it alone; its state still matches the counterparty's, and it keeps running without a panic.
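
The decision finalize makes can be modeled in a few lines. In the real node this work is done by LDK's ChannelMonitor and ChannelManager during chain sync; the sketch below only mirrors the rule "drop a channel if and only if its funding output has been spent", using hypothetical types and a plain set of spent outpoints as input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    channel_id: str
    funding_outpoint: str  # "txid:vout", a simplified stand-in for a real outpoint


def plan_finalize(
    channels: list[Channel], spent_outpoints: set[str]
) -> tuple[list[Channel], list[Channel]]:
    """Split channels into (to_remove, to_keep) based on whether the funding was spent."""
    to_remove = [c for c in channels if c.funding_outpoint in spent_outpoints]
    to_keep = [c for c in channels if c.funding_outpoint not in spent_outpoints]
    return to_remove, to_keep
```

Because the rule looks only at on-chain facts, a quiescent channel can never end up in the removal list by accident.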

We verified this in practice: the recovery flow drops only the channels the counterparty force-closed, while quiescent healthy channels stay untouched and keep running.


5. The overall flow

The final design is a startup guard plus five CLI subcommands:

Startup guard: detect panic → write sentinel file → refuse to start next time
               (exit code 30), breaking the infinite restart loop

Operator runs five steps manually:
  recover query     connect to counterparties, trigger force-close of fallen-behind
                    channels, stage the consignment address
  recover classify  diagnose: distinguish fallen-behind vs quiescent healthy channels
                    (no side effects)
  recover import    download the consignment over HTTP, consume into stock, rebuild
                    the RGB allocation
  recover sweep     sweep BTC + RGB-colored assets from the close tx's to_remote
  recover finalize  reconcile closed channels, clear the sentinel; healthy channels
                    resume with zero damage
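
The startup guard can be sketched as follows. The exit code 30 and the sentinel-file idea come from the listing above; everything else is an assumption. In particular, a Python exception stands in for the Rust panic, the exit code of the first crashing run is not specified in this post, and the function and file names are hypothetical.

```python
from pathlib import Path
from typing import Callable

EXIT_RECOVERY_REQUIRED = 30  # exit code from this post
EXIT_CRASHED = 1             # assumed; the post does not say what the first crash returns


class FallenBehind(Exception):
    """Stand-in for the node's fallen-behind panic."""


def guarded_start(sentinel: Path, run_node: Callable[[], None]) -> int:
    """Run the node unless a sentinel from an earlier fallen-behind failure exists."""
    if sentinel.exists():
        # A previous run already failed: refuse to start so a supervisor stops looping.
        return EXIT_RECOVERY_REQUIRED
    try:
        run_node()
    except FallenBehind:
        # Record the failure on disk so the next start is refused.
        sentinel.write_text("fallen behind\n")
        return EXIT_CRASHED
    return 0


def clear_sentinel(sentinel: Path) -> None:
    """What the finalize step would do once recovery is complete."""
    sentinel.unlink(missing_ok=True)
```

A process supervisor can treat exit code 30 as "do not restart, operator action required", which is what breaks the restart loop.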

Recovery operations are idempotent and re-entrant: consuming a consignment into the stock is idempotent, already-swept outpoints are recorded in an on-disk ledger and are not processed again, and after an interruption you simply re-run the same command.
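
A minimal sketch of the on-disk ledger idea, assuming a JSON file that holds the list of swept outpoints. The class, the file format, and the method names are all hypothetical; the post does not describe the real ledger's layout.

```python
import json
from pathlib import Path
from typing import Callable


class SweepLedger:
    """Remembers which outpoints were swept so that a re-run skips them."""

    def __init__(self, path: Path):
        self.path = path
        self.swept = set(json.loads(path.read_text())) if path.exists() else set()

    def sweep_once(self, outpoint: str, do_sweep: Callable[[str], None]) -> bool:
        """Sweep `outpoint` unless the ledger already lists it; return True if work was done."""
        if outpoint in self.swept:
            return False
        do_sweep(outpoint)
        self.swept.add(outpoint)
        # Write to a temporary file and rename, so an interruption never leaves
        # a half-written ledger behind.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(sorted(self.swept)))
        tmp.replace(self.path)
        return True
```

In this sketch an interruption between the sweep and the ledger write causes the sweep to be attempted again on the next run, so the sweep action itself would also need to tolerate being repeated.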


6. Conclusion

RGB-on-Lightning has long faced one question: when something goes wrong, how is the money recovered? Our answer is to recover BTC on par with ordinary Lightning, and additionally rebuild the client-side validated asset state (the allocation in the stock). The path has three parts: reuse the proven "let the counterparty force-close" mechanism to retrieve the BTC, retrieve the RGB allocation from the counterparty via the RGBR protocol, and run a finalize step that leaves healthy channels untouched. All protocol extensions are odd-typed and do not break interoperability with standard nodes.

We have written up RGBR's full wire protocol as a bLIP proposal, submitted to the Lightning community for review.