Historically, I believe bcache offered a better design than dm-cache. I wonder if that has changed at all?
That said, for this use, I would be very concerned about coherency issues putting any cache in front of the actual distributed filesystem. (Unless this is the only node doing writes, I guess?)
> For e-commerce workloads, the performance benefit of write-back mode isn’t worth the data integrity risk. Our customers depend on transactional consistency, and write-through mode ensures every write operation is safely committed to our replicated Ceph storage before the application considers it complete.
Unless the writer is always blindly overwriting entire files at once (never read-then-write), consistency requires consistent reads AND writes. Even then, potential ordering issues creep in. It would be really interesting to hear how they deal with it.
dm-cache writeback mode is both amazing and terrifying. It reorders writes, so not only do you lose data if the cache fails, you probably just corrupted the entire backing disk.
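For anyone who wants the safer behaviour the quoted article describes, lvmcache (the LVM front end to dm-cache) lets you pick the mode explicitly. A minimal sketch, assuming hypothetical VG, LV, and device names:

    # Build a cache pool on the SSD and attach it to the slow LV in
    # writethrough mode, so the origin always holds the latest data.
    # (vg0, fastcache, slowdata and the device paths are placeholders.)
    lvcreate --type cache-pool -L 100G -n fastcache vg0 /dev/nvme0n1
    lvconvert --type cache --cachepool vg0/fastcache \
              --cachemode writethrough vg0/slowdata
    # dm-cache reports the active mode in its status line:
    dmsetup status vg0-slowdata

The mode can be flipped later with lvchange --cachemode, but as noted above, in writeback mode dirty blocks live only on the cache device until they are flushed.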
This is good timing; I was just looking at a use case where we need more IOPS, and the only immediate solutions involve allocating far more high-performance disks or network storage. The problem with a cache is that the dataset is large and accessed randomly, so repeated cache hits might not be frequent. But my theory is that you could still make an impact on performance and lower your storage performance requirements. I may try this out, though it is block-level, so it's a bit intrusive.
Another option I haven't tried is tmpfs with an overlay. Initial access hits RAM and falls back to the underlying slower storage. Since I'm mostly doing reads it should be fine, and writes can go to the slower disk mount. No block storage changes needed.
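If anyone wants to try that, the usual way to wire it up is an overlayfs mount with a tmpfs upper layer over the slow mount. A rough sketch with made-up paths and sizing; note that only files that have been copied up into the upper layer are actually served from RAM, and anything written through the overlay lands in tmpfs and is gone after a reboot:

    # /mnt/slow is the existing slow mount; other paths and the size are placeholders.
    mkdir -p /mnt/fast /mnt/merged
    mount -t tmpfs -o size=32G tmpfs /mnt/fast
    mkdir -p /mnt/fast/upper /mnt/fast/work
    mount -t overlay overlay \
          -o lowerdir=/mnt/slow,upperdir=/mnt/fast/upper,workdir=/mnt/fast/work \
          /mnt/merged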
I was looking into SSD caching recently and decided to go with Open-CAS instead, which should be more performant (didn't test it personally): https://github.com/Open-CAS/open-cas-linux/issues/1221
It's maintained by Intel and Huawei and the devs were very responsive.
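For reference, the basic setup goes through the casadm tool, roughly like this (device names are placeholders; check the open-cas-linux docs for the exact flags and modes):

    # Start cache instance 1 on the NVMe device in write-through mode,
    # then add the slow disk as a core device.
    casadm -S -i 1 -d /dev/nvme0n1 -c wt
    casadm -A -i 1 -d /dev/sdb
    # The cached block device then shows up as something like /dev/cas1-1.
    casadm -L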
Is Intel still working on it? Open-CAS bdev support was nearly removed from SPDK at a time when Intel still employed a SPDK development and QA team. Huawei stepped in to offer support to keep it alive, preventing its removal.
I’ve been under the impression that Intel got rid of pretty much all of their storage software employees.
"When deploying infrastructure across multiple AWS availability zones (AZs), bandwidth costs can become a significant operational expense"
An expense that, in the age of 100Gbit networking, exists entirely because AWS can get away with charging the suckers, um, customers for it.
AZs are whole datacenters, so I imagine their backbone bandwidth between AZs is a fraction of total bandwidth inside the DC. If they didn't charge it'd probably get saturated and then there's not much point in using them for reliability.
The internet egress price is where they're bastards.
Definitely not. Azure doesn't charge for intra-region traffic, FWIW.
Getting terabits and terabits of 'private' interconnect is unbelievably cheap at Amazon scale. AWS even owns some of its own cables and has plans to build more.
There is _so_ much capacity available on fiber links. For example, one newish cable (Anjana) between the US and Europe has 480 Tbit/s of capacity. That's just one cable. And it could probably already be upgraded to 10-20x that with newer modulation techniques.
Reduce network bandwidth from the network-attached SSD volumes, yes?