Longhorn – A Kubernetes-Native Filesystem (vegard.blog.engen.priv.no)

33 points by jandeboevrie 4 days ago | 30 comments

dpedu 49 minutes ago

Kubernetes CSI drivers are surprisingly easy to write. You basically just have to implement a number of gRPC procedures that manipulate your system's storage as the Kubernetes control plane calls them. I wrote one that uses file-level syncing between hosts using Syncthing to "fake" network volumes.

https://kubernetes-csi.github.io/docs/developing.html

There are 4 gRPCs listed in the overview, and that's literally all you need.
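To make "implement a few RPCs that manipulate storage" concrete, here is a minimal sketch of the node-side half of a driver. It is written in Python rather than against the real generated gRPC stubs: the class `ToyNodeService`, its backing directory, and the reduced plain-argument requests are all hypothetical. Only the method names mirror the CSI `NodePublishVolume`/`NodeUnpublishVolume` RPCs, and the mount is simulated with a symlink so it runs without root.

```python
import os
import tempfile


class ToyNodeService:
    """Hypothetical, simplified stand-in for a CSI Node service.

    A real driver receives protobuf requests over gRPC; here each request
    is reduced to plain arguments so the control flow is easy to follow.
    """

    def __init__(self, backing_root: str) -> None:
        # Directory where this toy driver keeps each volume's data.
        self.backing_root = backing_root
        self.published: dict[str, str] = {}  # target_path -> volume_id

    def node_publish_volume(self, volume_id: str, target_path: str) -> None:
        # Make the volume visible at the path the kubelet asked for.
        # A real driver would mount a device or bind-mount a directory;
        # a symlink keeps this sketch runnable without root privileges.
        source = os.path.join(self.backing_root, volume_id)
        os.makedirs(source, exist_ok=True)
        if target_path in self.published:
            # CSI calls are expected to be idempotent: a repeat is a no-op.
            return
        os.symlink(source, target_path)
        self.published[target_path] = volume_id

    def node_unpublish_volume(self, volume_id: str, target_path: str) -> None:
        # Undo the publish; also a no-op if the path is already gone.
        if os.path.islink(target_path):
            os.unlink(target_path)
        self.published.pop(target_path, None)


def demo() -> list[bool]:
    """Publish twice, then unpublish twice, recording whether the path exists."""
    states = []
    with tempfile.TemporaryDirectory() as root:
        svc = ToyNodeService(os.path.join(root, "backing"))
        target = os.path.join(root, "pod-volume")
        svc.node_publish_volume("vol-1", target)
        svc.node_publish_volume("vol-1", target)  # repeated call, no error
        states.append(os.path.islink(target))
        svc.node_unpublish_volume("vol-1", target)
        svc.node_unpublish_volume("vol-1", target)  # repeated call, no error
        states.append(os.path.lexists(target))
    return states


print(demo())
```

A real driver would add the Identity service, error codes, and actual mount logic; the point here is only that each RPC is a small, idempotent operation on local storage.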

cmeacham98 4 hours ago

I tried Longhorn on my homelab cluster. I'll admit it's possible that I did something wrong, but I somehow managed to get it into a state where my volumes seemed to be permanently corrupted. At the very least, I couldn't figure out how to get them working again.

When restoring from backup, I went with Rook (a wrapper around Ceph) instead, and it's been much more stable. It was even able to recover (albeit with some manual intervention) from a total node hardware failure.

nerdjon 2 hours ago

It's interesting to see this article come up, since just yesterday I set up Longhorn on my homelab cluster. I needed better performance for some tasks than NFS was providing, so I set up a RAID on my R630 and tried it out.

So far things are running well, but I can't shake the fear that I'm in for a rude awakening and will lose everything. I have backups, but recovery will be painful if I ever need it.

I'll have to take a look at Rook. I'm not too committed yet (I've only moved over 2 things), so switching is still an option.

positisop 4 hours ago

Longhorn is a poorly implemented distributed storage layer. You are better off with Ceph.

willbeddow 3 hours ago

Haven't used Longhorn, but we are currently migrating off Ceph after an extremely painful relationship with it. Ceph has fundamental design flaws (like the way it handles subtree pinning) that, IMO, make more modern distributed filesystems very attractive. SeaweedFS is also cool, and for high-performance use cases, Weka is expensive but good.

q3k 3 hours ago

That sounds more like a CephFS issue than a Ceph issue.

(a lot of us distrust distributed 'POSIX-like' filesystems for good reasons)

__turbobrew__ 2 hours ago

Are there any distributed POSIX filesystems which don’t suck? I think part of the issue is that POSIX-compliant filesystems just don’t scale, and maybe that's what you are seeing?

willbeddow 2 hours ago

Weka seems to Just Work in our tests so far, even under pretty extreme load (hundreds of mounts on different machines, lots of small files, etc.). Unfortunately, it's ungodly expensive.

yupyupyups 4 hours ago

I've heard Ceph is expensive to run. But maybe that's not true?

keeperofdakeys 3 hours ago

Ceph overheads aren't that large for a small cluster, but they grow as you add more hosts, drives, and storage. Probably the main gotcha is that you're (ideally) writing your data three times, on different machines, which leads to a large overhead compared with local storage.

Most resource requirements for Ceph assume you're going for a decently sized cluster, not something homelab sized.
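To put a number on "writing your data three times": with 3-way replication (Ceph's usual default pool size), usable capacity is roughly raw capacity divided by three. A back-of-the-envelope sketch; the disk counts are made-up, and `fill_target` is a hypothetical knob for the headroom you should keep rather than any Ceph setting.

```python
def usable_capacity_tb(raw_tb: float, replicas: int = 3, fill_target: float = 1.0) -> float:
    """Rough usable capacity of a replicated pool.

    raw_tb: total raw disk across all OSDs.
    replicas: copies kept of each object (3 = "writing your data three times").
    fill_target: fraction of raw space you are willing to fill (hypothetical;
    a real cluster should not be run close to 100% full).
    """
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    if not 0 < fill_target <= 1:
        raise ValueError("fill_target must be in (0, 1]")
    return raw_tb * fill_target / replicas


# Made-up homelab: 3 nodes x 2 disks x 4 TB = 24 TB raw.
raw = 3 * 2 * 4.0
print(usable_capacity_tb(raw))                    # 8.0 TB: three copies, filled to the brim
print(usable_capacity_tb(raw, fill_target=0.75))  # 6.0 TB: leaving a quarter as headroom
```

Compared with putting the same 24 TB directly on one machine, you trade roughly two thirds of the space for surviving the loss of a host.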

jauntywundrkind 3 hours ago

I'm only just wading in, after years of intending to. I don't feel like Ceph is particularly demanding. It does want a decent amount of RAM: according to the docs, 1GB each for the monitor, manager, and metadata server, up to 16GB total for larger clusters. But then each disk's OSD defaults to 4GB, which can add up fast!! And some setups can use more. 10GbE is recommended, and more is better here, but that's not unique to Ceph: replicated storage will want bandwidth. https://docs.ceph.com/en/octopus/start/hardware-recommendati...

xyzzy123 3 hours ago

For me it was the RAM for the OSDs: 1GB per 1TB, but ideally more for SSDs...
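The two RAM rules of thumb in this subthread can be combined into a quick per-host estimate. A sketch under stated assumptions: the host runs one monitor, one manager, and one metadata server at about 1GB each; each disk gets one OSD; and each OSD is budgeted at whichever is larger, the 4GB default or 1GB per TB of disk. Taking the maximum is an illustrative choice, not an official Ceph sizing formula, and larger clusters will need more for the non-OSD daemons.

```python
def estimate_ceph_ram_gb(
    disk_sizes_tb: list[float],
    daemons_gb: float = 3.0,      # ~1GB each for monitor, manager, metadata server
    osd_default_gb: float = 4.0,  # per-OSD default memory budget
    gb_per_tb: float = 1.0,       # rule of thumb: 1GB of OSD RAM per 1TB of disk
) -> float:
    """Rough RAM estimate for a single Ceph host (illustrative only).

    One OSD per disk; each OSD is budgeted at the larger of the default
    memory budget and the per-TB rule of thumb.
    """
    osd_total = sum(max(osd_default_gb, size * gb_per_tb) for size in disk_sizes_tb)
    return daemons_gb + osd_total


# Made-up host: two 4TB disks and one 8TB disk -> 3 + 4 + 4 + 8 GB.
print(estimate_ceph_ram_gb([4.0, 4.0, 8.0]))  # 19.0
```

SSDs, per the comment above, would push `gb_per_tb` (or the per-OSD budget) higher.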

westurner 3 hours ago

This from 2023 says: https://www.redhat.com/en/blog/ceph-cluster-single-machine :

> All you need is a machine, virtual or physical, with two CPU cores, 4GB RAM, and at least two or three disks (plus one disk for the operating system).

d3Xt3r 4 days ago

Longhorn was the codename for Windows Vista... so not a great choice of a name (IMO).

onionisafruit 5 hours ago

Longhorn is a fine name, and it doesn't matter if somebody else used it 20+ years ago.

weinzierl 3 hours ago

By that logic Titanic would be a fine name too.

NewJazz 3 hours ago

Hmm, maybe just shorten to Titan?

esafak 13 minutes ago

Just don't use it to name a database.

bigstrat2003 3 hours ago

I mean, I think it would be. Superstition about naming is silly.

fineallaround 4 hours ago

As a codename, no less. 0-0

What a stupid thing to complain about.

privatelypublic 4 hours ago

Even complaining about Vista raises eyebrows. It had two huge issues: overactive UAC, and Microsoft handing "Vista Certified" to basically anybody who asked. (Frequently to machines that would barely run XP pre-SP1.)

Most of the complaints can be reduced to one of those.

Yes, I hand-wave away a lot of other things, because they were required for a huge step toward a decently secure and stable OS.

antod 1 hour ago

Could've been worse, e.g. Cairo or Blackcomb.

gdbsjjdn 3 hours ago

I thought this was going to be about Vista and how some of the FS stuff that got cut was prescient. "This old thing that didn't work was ahead of its time" is a whole genre of post (e.g. Itanium).

tracker1 4 hours ago

I remembered the Windows Vista reference as soon as I saw the name. That said, I don't think it's a big deal.

scubbo 2 hours ago

(Copied from [0] when this was posted to lobste.rs) Longhorn was nothing but trouble for me: issues with mount paths, uneven allocation of volumes, orphaned undeletable data taking up space. It’s entirely possible that this was a skill issue, but still: never touching it again. Democratic-csi [1] has been a breath of fresh air by comparison.

[0] https://lobste.rs/s/vmardk/longhorn_kubernetes_native_filesy... [1] https://github.com/democratic-csi/democratic-csi

coopreme 4 hours ago

Go with Ceph… a little more of a learning curve but overall better.

studmuffin650 2 hours ago

Where I work, we primarily use Ceph as a K8s-native filesystem, though we still use OpenEBS for block storage and are actively watching OpenEBS Mayastor.

__turbobrew__ 2 hours ago

I looked into Mayastor, and the NVMe-oF stuff is interesting, but it is so, so, so far behind Ceph when it comes to stability and features. Once Ceph has the next-generation Crimson OSD with SeaStore, I believe it should close a lot of Ceph's performance gaps.

dilyevsky 50 seconds ago

> One ceph has the next generation crimson OSD with seastore I believe it should close a lot of the performance gaps with ceph.

It's only been in development for, what, like 5 years at this point? =) I have no horse in this race, but it seems to me OpenEBS will close the gap sooner.

dilyevsky 4 hours ago

Does anyone know what the story is with NVMe-oF/SPDK support these days? A couple of years ago, Mayastor/OpenEBS was running laps around Longhorn on every performance metric, big time; not sure if anything has changed there...
