- Interior btree node updates are now journalled, removing the need for btree writes to be FUA
- Interior btree node updates are now fully transactional, so we no longer have to do any metadata scanning after an unclean shutdown
- Btree key cache code has been merged
- Major rework of journal replay finally fi...
2020-06-18 23:06:26 +0000 UTC
Just finished a major rework that gets us a step closer to snapshots: the btree code is incrementally being changed to handle extents like regular keys.
Previously, when reading in a btree node we'd have to check for and handle partially overwritten extents, as part of the mergesort we do (btree nodes are log structured). But the plan for ...
2019-12-29 17:36:19 +0000 UTC
There is now a (very work-in-progress) fuse port!
The fuse port isn't intended to ever be for serious use - but I do expect it to be useful for debugging in the future; if someone is hitting a repeatable bug in the bcachefs code, debugging it via the fuse version (with gdb) should be much easier for most people than collecting kernel oopse...
2019-11-06 20:07:53 +0000 UTC
For those who aren't familiar with the idea - reflink means using shared, reference counted extents to do "shallow copies" - copies that share data transparently on disk, but are copy on write (unlike hardlinked files).
To use it, just use cp --reflink. It's great for virtual machine images, and you can also use it like snapshots - e.g. "c...
2019-08-21 17:28:01 +0000 UTC
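As a rough illustration of the reflink idea in the entry above, here's a minimal C sketch of a reference-counted extent with copy on write. The `struct extent`, `struct file`, `reflink_copy()` and `file_write()` names are hypothetical, and this is a toy in-memory model, not how bcachefs actually stores reflinked extents on disk.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EXTENT_BYTES 16

/* Hypothetical model: one extent, shared between files by reference count. */
struct extent {
	unsigned refcount;
	char data[EXTENT_BYTES];
};

/* In this toy, a file is just a pointer to the extent backing its data. */
struct file {
	struct extent *e;
};

static struct extent *extent_new(const char *contents)
{
	struct extent *e = calloc(1, sizeof(*e));	/* zeroed, so data stays NUL-terminated */

	if (!e)
		return NULL;
	e->refcount = 1;
	strncpy(e->data, contents, EXTENT_BYTES - 1);
	return e;
}

/* Drop one reference; free the extent when nobody uses it anymore. */
static void extent_put(struct extent *e)
{
	if (--e->refcount == 0)
		free(e);
}

/* Shallow copy: share the extent instead of copying the data. */
static void reflink_copy(struct file *dst, const struct file *src)
{
	dst->e = src->e;
	dst->e->refcount++;
}

/* Copy on write: if the extent is shared, give this file a private copy first. */
static int file_write(struct file *f, size_t off, char c)
{
	assert(off < EXTENT_BYTES - 1);

	if (f->e->refcount > 1) {
		struct extent *copy = malloc(sizeof(*copy));

		if (!copy)
			return -1;
		memcpy(copy->data, f->e->data, EXTENT_BYTES);
		copy->refcount = 1;
		extent_put(f->e);	/* the other file keeps the original */
		f->e = copy;
	}
	f->e->data[off] = c;
	return 0;
}
```

After `reflink_copy()`, both files point at the same extent; the first write to either one gives it a private copy, which is what makes a reflinked copy behave differently from a hardlink.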
It's pretty close to done, but working through the last of the xfstests failures has been tedious.
But - I just pushed out a bunch of prep work patches, and something else cool is now done - we're exporting the actual filesystem blocksize to the Linux VFS, instead of pretending the filesystem blocksize is actually PAGE_SIZE. This was neede...
2019-08-07 15:03:34 +0000 UTC
Phoronix posted some bcachefs benchmarks: https://www.phoronix.com/scan.php?page=article&item=bcachefs-linux-2019
The results are actually pretty encouraging, even if they might not look it on the surface - they're about ...
2019-06-26 19:04:54 +0000 UTC
Finally! It was a huge effort, but it's done and pushed out.
This means that when mounting a filesystem - even after an unclean shutdown - we don't have to walk all the metadata anymore, because it's always updated in a transactional manner and kept fully consistent in the b-tree.
There may be a performance regression for now on mul...
2019-04-20 03:39:45 +0000 UTC
5.0 rebase is up
And, more importantly - fully persistent allocation info is finally just about done! It's passing the tests, not much left before I can push it out...
2019-04-04 02:14:11 +0000 UTC
So, first some background:
Fully persistent allocation info is going to require updating the alloc btree every time we update the extents btree - one key in the alloc btree for every pointer in an extent being inserted or overwritten.
That introduces a bit of a difficulty, in that extents can overwrite an unbounded number of existing...
2019-03-04 20:30:31 +0000 UTC
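To make the "one alloc btree key per pointer" arithmetic in the entry above concrete, here's a minimal sketch, assuming a simplified extent that only records how many device pointers it has (`struct extent_ptrs` and `alloc_updates_needed()` are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified extent: just the number of device pointers it has. */
struct extent_ptrs {
	unsigned nr_ptrs;
};

/*
 * Number of alloc btree keys to update when inserting @ins, which
 * overwrites the @nr_old existing extents in @old: one per pointer in
 * the new extent, plus one per pointer in every extent it overwrites.
 */
static size_t alloc_updates_needed(const struct extent_ptrs *ins,
				   const struct extent_ptrs *old, size_t nr_old)
{
	size_t n = ins->nr_ptrs;

	for (size_t i = 0; i < nr_old; i++)
		n += old[i].nr_ptrs;
	return n;
}
```

Since `nr_old` has no fixed upper bound, neither does the number of alloc btree updates a single extent insert can generate.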
So, to recap: bcachefs now persists allocation information on clean shutdown, so mounting after a clean shutdown doesn't require walking any metadata. However, we're not yet keeping allocation information updated as it's modified - that's my current project.
There are two main components to this. Firstly, there's the filesystem-wide sector c...
2019-02-18 17:55:01 +0000 UTC
Persistent alloc info for clean shutdowns is finally done - this means when mounting after a clean shutdown, we don't have to scan metadata anymore, and mounting should be just as fast or faster than other filesystems.
We do still run fsck by default on every mount, so to see any change you'll have to turn that off with the nofsck mount o...
2019-02-10 00:59:52 +0000 UTC
I'll be at FOSDEM. I'm not planning on giving a talk or anything, but if anyone else is interested and is going to be there, send a message and I'd love to meet up.
2019-01-12 19:33:54 +0000 UTC
Option handling improvements: There's a single master list of options in opts.h, and that list is now used by bcachefs format as well, including for bcachefs format --help. This is a nice usability improvement - it means options are always specified the same way anywhere they can be used, and it means the helptext is always going to be consistent...
2018-12-27 15:06:10 +0000 UTC
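One common way to keep a single master option list in C is the X-macro pattern, where one list is expanded several times to generate different things. The sketch below assumes that approach; `TOY_OPTS()`, the option names and the help strings are all made up for illustration and are not the real contents of opts.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical master list: each entry is x(name, default, help). */
#define TOY_OPTS()							\
	x(example_flag,		0,	"Hypothetical on/off option")	\
	x(example_level,	3,	"Hypothetical numeric option")	\
	x(example_other,	1,	"Another hypothetical option")

/* Expansion 1: the options struct itself. */
struct toy_opts {
#define x(_name, _default, _help)	int _name;
	TOY_OPTS()
#undef x
};

/* Expansion 2: default values. */
static const struct toy_opts toy_opts_default = {
#define x(_name, _default, _help)	._name = _default,
	TOY_OPTS()
#undef x
};

/* Expansion 3: names and help text, e.g. for a --help listing. */
static const struct {
	const char *name;
	const char *help;
} toy_opt_table[] = {
#define x(_name, _default, _help)	{ #_name, _help },
	TOY_OPTS()
#undef x
};

#define NR_TOY_OPTS (sizeof(toy_opt_table) / sizeof(toy_opt_table[0]))

/* Help output is generated from the same list, so it can't drift. */
static void print_help(FILE *f)
{
	for (size_t i = 0; i < NR_TOY_OPTS; i++)
		fprintf(f, "  --%-16s %s\n",
			toy_opt_table[i].name, toy_opt_table[i].help);
}
```

Adding an option is then one new `x(...)` line, and the struct, defaults and help text all pick it up.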
So for now, I'm leaving off the remaining parts of erasure coding - the important part was getting everything done that impacts both the on disk format, and the rest of the design. There's some commonality between erasure coding and some of the other upcoming features, so getting erasure coding mostly done now was very useful because it was a good ...
2018-11-30 19:04:29 +0000 UTC
It's not production ready yet - stripe level copygc isn't implemented yet, so disk fragmentation could lead to your filesystem getting filled with partially empty stripes and getting stuck. But, aside from that it should be functional.
To use it, just enable the erasure_code option, either at mount time
mount -o erasure_code=true
or v...
2018-11-14 05:15:47 +0000 UTC
First off, sorry for the slow progress lately - I've been dealing with some health issues that have been making it incredibly difficult to work. But, the good news is that we may have finally figured out what's going on and *fingers crossed* aforementioned issues seem to finally, slowly be getting better.
The good news is though - with the w...
2018-10-12 17:34:44 +0000 UTC
One topic that was asked about recently was compression in bcachefs, so I thought I'd write a bit about how extents are represented, since a bunch of stuff falls out of that.
In bcachefs, checksumming and compression are done per extent, not per block or per page. This means we store one checksum per extent and if the data is compressed, it'll be com...
2018-08-13 21:55:07 +0000 UTC
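Here's a minimal sketch of what per-extent (rather than per-block) checksumming implies for reads, assuming a toy extent held in memory. FNV-1a is only a stand-in for a real checksum type, and `struct csum_extent`, `extent_seal()` and `extent_read()` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, used here only as a stand-in for a real checksum. */
static uint32_t fnv1a(const unsigned char *p, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

/* Hypothetical extent: a single checksum covers all of its data. */
struct csum_extent {
	const unsigned char *data;
	size_t len;
	uint32_t csum;
};

/* Compute and store the extent's checksum when it's written. */
static void extent_seal(struct csum_extent *e)
{
	e->csum = fnv1a(e->data, e->len);
}

/*
 * Read @len bytes at @off. With one checksum per extent (not per block),
 * verifying even a small read means checksumming the whole extent.
 * Returns 0 on success, -1 on checksum mismatch.
 */
static int extent_read(const struct csum_extent *e, size_t off, size_t len,
		       unsigned char *out)
{
	assert(off + len <= e->len);

	if (fnv1a(e->data, e->len) != e->csum)
		return -1;
	memcpy(out, e->data + off, len);
	return 0;
}
```

Corrupting a byte outside the range being read still fails the read, because the checksum only exists for the extent as a whole.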
I've gotten a few comments that people have been enjoying my technical deep dives into things I'm working on.
There's a lot of other things I could write about as well, not just bcachefs but perhaps also other kernel and storage topics. I'd like to hear what people are interested in, though. If you've got an idea of something you'd lik...
2018-08-06 22:30:21 +0000 UTC
In the last post, I wrote about some new transaction infrastructure I was working on that would make it practical to make all the high level filesystem operations (e.g. create, link, unlink) fully atomic - that work is now finished and merged in.
The main benefit from this work is that now, on unclean shutdown, we don't have to walk the filesyste...
2018-07-17 14:18:18 +0000 UTC
I've talked a bit before about the new transaction infrastructure I've been working on, but to recap:
bcachefs has, for quite some time, had the ability to use multiple btree iterators simultaneously, and to do multiple btree updates atomically - the main btree update function takes a list of (iterator, new key) pairs and does all the updates ato...
2018-07-06 23:21:14 +0000 UTC
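The all-or-nothing contract of an update function that takes a list of (iterator, new key) pairs can be sketched roughly like this. The toy tree is just an array and the "iterator" is just a position, so this only illustrates the interface, not the locking and other machinery a real btree needs to make updates atomic; `struct toy_tree`, `struct update` and `tree_update_atomic()` are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 8

/* Stand-in for a btree: a fixed array of keys indexed by position. */
struct toy_tree {
	int keys[NR_SLOTS];
};

/* Stand-in for an (iterator, new key) pair. */
struct update {
	size_t pos;	/* where the "iterator" points */
	int new_key;
};

/*
 * Apply every update or none of them: first check that all of the
 * updates can be done, and only then modify the tree.
 */
static bool tree_update_atomic(struct toy_tree *t,
			       const struct update *u, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (u[i].pos >= NR_SLOTS)
			return false;	/* one bad update: nothing is changed */

	for (size_t i = 0; i < nr; i++)
		t->keys[u[i].pos] = u[i].new_key;
	return true;
}
```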
Been spending a surprising amount of time lately on the core btree - in a good way, as in "oh, here's some good and useful improvements I can easily make", not "oh crap, this thing is broken and I have to fix it".
Some of this was motivated by the truncate bug and needing to implement BTREE_INSERT_NOUNLOCK, and more has been motivated by some mo...
2018-06-08 00:46:34 +0000 UTC
Been squashing quite a few bugs lately, but this latest one has been quite a trip down the rabbit hole...
Initial symptom was that on xfstest generic/475, very occasionally we'd see an extent past the end of a file's current i_size (the test runs a filesystem stress test while injecting IO errors and then checking that the filesystem is consistent, ...
2018-06-01 18:07:57 +0000 UTC
definitely not drunk debugging right now
I know I've been shit at posting updates, so ask your questions now - about what's going on with upstreaming or anything else you can think of
2018-05-25 05:11:36 +0000 UTC
Just pushed a new feature (only lightly tested so far): when formatting, you can specify a "durability" for each device: the effect of this is that data on that device will be counted as being replicated that many times.
So if you've got a filesystem with two SSDs and a big hardware RAID array: you probably want all your data to be replicated...
2018-03-13 20:16:46 +0000 UTC
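A minimal sketch of the durability accounting described above, assuming each copy of the data counts as its device's durability toward the replication goal. `struct device`, `effective_replicas()` and `replicated_enough()` are hypothetical, and the durability of 2 for the hardware RAID array in the usage is just an example value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device description: just a name and its configured durability. */
struct device {
	const char *name;
	unsigned durability;
};

/*
 * A copy on a device counts as @durability replicas, so the effective
 * replica count is the sum over the devices holding a copy.
 */
static unsigned effective_replicas(const struct device *copies[], size_t nr)
{
	unsigned n = 0;

	for (size_t i = 0; i < nr; i++)
		n += copies[i]->durability;
	return n;
}

/* Does the data, as currently placed, meet a replication goal of @want? */
static bool replicated_enough(const struct device *copies[], size_t nr,
			      unsigned want)
{
	return effective_replicas(copies, nr) >= want;
}
```

With two SSDs at durability 1 and a RAID array at durability 2, one copy on the array satisfies a two-replica goal by itself, while one copy on a single SSD doesn't.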
The new disk groups-based code for configuring data placement has been merged, and the notion of configuring disks into "tiers" has been removed. If you have an existing filesystem that uses tiering, you'll have to configure the new interfaces.
The reasoning behind the change was that a "disk tier" wasn't really a thing - it was just a hint ...
2018-02-20 21:03:09 +0000 UTC
Please test (and don't assume it won't eat all your data)
2018-02-17 00:16:13 +0000 UTC
The test framework I use for bcachefs - ktest - has been getting various cleanups and fixes to make it easier for other people to use - in particular, it works on non-Debian distributions now.
For anyone who's been interested in getting started with kernel development or bcachefs development, ktest makes it really easy to get started: no messing...
2018-02-13 21:01:55 +0000 UTC
I just pushed initramfs hooks/scripts for handling a bcachefs encrypted root filesystem - after you make install in bcachefs-tools, they'll be picked up next time you generate an initramfs, and if your root filesystem is encrypted you'll be prompted for the passphrase to unlock it when booting up.
I've only tested it on Debian. It could also b...
2018-02-11 19:32:56 +0000 UTC
Replication support is finally feature complete; it should have everything implemented that's needed for handling and recovering from device failure.
If replication is enabled on a filesystem, a device can fail and be removed while the filesystem is in use without returning any IO errors to userspace - reads/writes will be retried as needed, i...
2018-02-08 21:02:52 +0000 UTC
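A minimal sketch of the read-retry behaviour described in the replication entry above, assuming each replica either returns good data or fails with -EIO (`struct replica`, `replica_read()` and `read_with_retry()` are hypothetical). Writes and device removal aren't modelled here:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical replica: a device that either holds a good copy or has failed. */
struct replica {
	bool failed;
	int value;
};

/* Hypothetical single-device read: -EIO if the device has failed. */
static int replica_read(const struct replica *r, int *out)
{
	if (r->failed)
		return -EIO;
	*out = r->value;
	return 0;
}

/*
 * Try each replica in turn; an error only reaches the caller if every
 * replica fails.
 */
static int read_with_retry(const struct replica *r, size_t nr, int *out)
{
	int ret = -EIO;

	for (size_t i = 0; i < nr; i++) {
		ret = replica_read(&r[i], out);
		if (ret == 0)
			return 0;
	}
	return ret;
}
```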
just fixed some bugs in the migrate tool, should be working again
2018-02-07 16:15:41 +0000 UTC