I recently learned that the IBM Z series mainframes are generally compatible with software written for the legendary IBM 360 launched in 1964. While I'm sure there are caveats, maintaining any backward compatibility with a platform from over 60 years ago is impressive.
Having started in 8-bit microcomputers and progressing to various desktop platforms and servers, mainframes were esoteric hulking beasts that were fascinating but remained mysterious to me. In recent years I've started expanding my appreciation of classic mainframes and minis through reading blogs and retro history. This IEEE retrospective on the creation of the IBM 360 was eye-opening. https://spectrum.ieee.org/building-the-system360-mainframe-n...
Having read pretty deeply on the evolution of early computers from the ENIAC era through Whirlwind, CDC, early Cray and DEC, I was familiar with the broad strokes but I never fully appreciated how much the IBM 360 was a major step change in both software and hardware architecture. It's also a dramatic story because it's rare for a decades-old company as successful and massive as IBM to take such a huge "bet the company" risk. The sheer size and breadth of the 360 effort as well as its long-term success profoundly impacted the future of computing. It's interesting seeing how architectural concepts from the 360 (as well as DEC's popular PDP-8, 10 and 11) went on to influence the design of early CPUs and microcomputers. The engineers and hobbyists creating early micros had learned computers in the late 60s and early 70s mostly on the 360s and PDPs which were ubiquitous in universities.
After reading the IEEE article I linked above, I got the book the article was based on ("IBM: The Rise and Fall and Reinvention of a Global Icon"). While it's a thorough recounting of IBM's storied history, it wasn't what I was looking for. The author specifically says his focus was not the technical details as he felt too much had been written from that perspective. Instead that book was a more traditional historian's analysis which I found kind of dry.
I don't mean this in a condescending way at all, but really out of sheer curiosity: Who uses mainframes nowadays and what for?
I find mainframes fascinating, but I'm so unfamiliar with them that I don't know what I'd ever use one for, or why (as opposed to "traditional" hardware or cloud services).
Besides all answers given already, one of the reasons Unisys keeps selling Burroughs, aka ClearPath MCP, is its security model.
ESPOL/NEWP is one of the very first systems programming languages, being safe by default, with unsafe code blocks.
The whole OS was designed with security first in mind, think Rust in 1961, thus their customers are companies that take this very seriously, not only running COBOL.
> Besides all answers given already, one of the reasons Unisys keeps selling Burroughs, aka ClearPath MCP, is its security model.
I think you are exaggerating that selling point – maybe historically that was true, but nowadays nobody is running MCP because it is more secure than any alternative, they are running it because it is a legacy system and migrating off it is too hard or expensive (at least for now), or the migration project is still underway or stuck in development hell, or they tried migrating off it before and the migration project failed.
People who are shopping for the highest security money can buy would be looking at something like BAE Systems XTS-400, not Unisys MCP. (Or seL4, which is open source and hence free, but you'll spend $$$$ on the custom work required to make it do anything useful.)
Especially since MCP now runs as a software CPU emulator under x86-64 Linux, and Unisys has done a lot of work to enable Linux and MCP processes to seamlessly talk to each other – so you can create hybrid apps which retain the MCP core but add components written in Java/etc running directly under Linux – that makes it really hard for contemporary MCP to provide much more security than the host Linux system does.
Maybe, but then again, those systems are based on a systems language whose design flaws the industry has spent 50 years mostly ignoring, until governments decided it was about time to start taking liability in cybersecurity seriously.
Also, I didn't come up with this myself; it is part of their marketing materials and white papers.
Finally, if it was worthless they would have probably dropped it by now.
> Also I didn't came up with this myself, it is part of their marketing materials and white papers.
> Finally, if it was worthless they would have probably dropped it by now.
Companies love to "talk up" their products in marketing materials. It is far from uncommon for those materials to contain claims which, while not entirely false, aren't exactly true either – and I suspect that's what's happening here.
IBM does the same thing – listen to some IBM i person tell you how "advanced" their operating system is compared to everything else – sure, there's some theoretical truth to that, but in practical terms it was more true in the past than it is in the present.
At least partially - the technical introduction for the z17 says that several items can be concurrently maintained (IBM-speak for hot-swapped). As for major items like the processing units - maybe (still reading).
> Four Power Supply Units (PSUs) that provide power to the CPC drawer and are accessible from the rear. Loss of one PSU leaves enough power to satisfy the power requirements of the entire drawer. The PSUs can be concurrently maintained
I would think their customers would demand zero downtime. And hey - if BeOS could power down CPUs almost 30 years ago I would expect a modern mainframe to be able to do it.
The Unisys Clearpath/MCP runs on Xeons, so I don't think there is CPU hot-swapping.
I don't know about physically removing a drawer, but on IBM Z, if there is an unrecoverable processor error, that processor will be shut down and a spare processor brought online to take over, transparently.
I don't know how licensing/costs tie into the CPU/RAM spares.
> The Unisys Clearpath/MCP runs on Xeons, so I don't think there is CPU hot-swapping.
Nowadays it is a software emulator that runs under Linux – so if the Linux kernel and hardware you are running it on supports CPU hot-swap, then the underlying OS will. I believe at one point Unisys would only let you run it on their own branded x86 servers, they now let people run it in Azure, and I'm sure Microsoft isn't using Unisys hardware for that.
Running Linux in a VM, the hypervisor can implement hotplug whether or not the underlying server hardware physically does. Of course, without physical CPU hot-swap, it may not add much to reliability, but it still can potentially help with scaling.
If you hotplug a new virtual CPU into the Linux VM, you'd probably want to hotplug another emulated mainframe CPU into the mainframe CPU emulator at the same time. No idea if Unisys actually supports that, but they easily could, it is just software – the Linux kernel sends a udev event to userspace when CPUs are plugged/unplugged, and you could use that to propagate the same event into the mainframe emulator.
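For what it's worth, the Linux side of that plumbing is small. A rough sketch, assuming pyudev is available - the emulator-side hook here is a hypothetical placeholder, since whatever control interface the emulator would actually expose isn't something I know:

    # Watch udev for CPU hotplug events and forward them to an emulator.
    # The pyudev usage is real; notify_emulator is a made-up placeholder.
    import pyudev

    def notify_emulator(cpu_name: str, action: str) -> None:
        # Hypothetical hook - a real emulator might expose a socket, an API,
        # or a management command for adding/removing emulated CPUs.
        print(f"would tell emulator: {action} {cpu_name}")

    context = pyudev.Context()
    monitor = pyudev.Monitor.from_netlink(context)
    monitor.filter_by(subsystem='cpu')        # only CPU devices (cpu0, cpu1, ...)

    for device in iter(monitor.poll, None):   # blocks until the next event
        # Depending on kernel version the action is 'add'/'remove' or 'online'/'offline'.
        if device.action in ('add', 'online'):
            notify_emulator(device.sys_name, 'plug')
        elif device.action in ('remove', 'offline'):
            notify_emulator(device.sys_name, 'unplug')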
Large institutions (corporations, governments) that have existed more than a couple of decades and have large-scale, mission-critical batch processes that run on them, where the core function is relatively consistent over time. Very few, if any, new processes are automated on mainframes at most of these places, and even new requirements for the processes that depend on the mainframe may be built in other systems that process data before or after the mainframe workflows. But the cost and risk of replacing well-known, battle-tested systems, finely tuned by years of ironing out misbehavior, often isn't warranted without some large-scale requirements change that invalidates the basic premises of the system. So, they stay around.
> and have large-scale mission-critical batch processes that run on them, where the core function is relatively consistent over time.
I think most mainframe applications involve OLTP not just batch processing – commonly a hybrid of both. e.g. around a decade ago I did some work for a large bank which used CSC/DXC/Luxsoft Hogan as their core banking system – that's a COBOL application that runs under IBM CICS for transaction processing, although I'm sure it had some batch jobs in it too.
(I myself didn't touch any of the mainframe stuff – I was actually helping the project migrate some of its functions to a Java-based solution that ran on Linux. No idea what the status of it all is a decade on.)
Thanks for that, and yeah, that fits with what I've found: mostly continuation of legacy, critical systems that were built on mainframes. It just seems shocking to me how much investment IBM still puts into developing these machines, given that no one seems to want to use them anymore?
It feels like I must be missing something, or maybe just underestimating how much money is involved in this legacy business.
IBM mainframes are extremely profitable. There are ~1,000 customers who cannot migrate off mainframes and are willing to spend almost any amount to keep their apps working. Mainframe profits subsidize all of IBM's other money-losing divisions like cloud.
Australia's social security system is implemented using Model 204 – that's a database combined with a transactional 4GL, that runs under z/OS, now sold by Rocket Software. It was quite popular in the 1980s, but now I believe there are only two customers left on the planet – the Australian federal government, and the US federal government (although possibly they have more than one customer within the US federal government–Veterans Affairs is one, but maybe there are one or two more.)
A couple of years ago, Australia cancelled its multi-billion dollar project to move off it onto a Linux-based solution (Pegasystems), after having spent over US$120 million on it. The problem was that although the new system did the calculations correctly, it took minutes to process a single individual, something the mainframe could do in 2 seconds.
But I'm 100% sure this had nothing to do with the inherent nature of the problem they were trying to solve – I think it was likely because they'd picked the wrong off-the-shelf product as a replacement and the wrong team to implement it, and if they'd gone with different vendors it could well have succeeded – but after spending years and over US$100 million to discover that, they aren't keen to try again.
While new processes don't often get created for mainframes (most of the tasks the platform is particularly suited for have already been automated by this point, and the market of companies doing those tasks is pretty static), outside of bolting more modern interfaces onto existing backends, one thing that is often ignored in answering the "why mainframes" question is that the overall amount of compute being done on the platform continues to increase. The same old batch and transaction processing programs get fitted with shiny web frontends, and more and more business gets sent to those same old programs. So you can end up in a situation where mainframes remain a very stable business that is worth investing in for future upgrades, even while they become more and more "niche" in the eyes of most computer users: the total number of computers grows faster than mainframe capacity does, so mainframe capacity increases substantially while making up a smaller percentage of a larger market over time. Then of course there are the margins, which are much higher than they are for x86_64 servers or other common architectures. A slow and steady increase in demand, mixed with outstanding margins, makes for a good business plan.
Many of the big existing mainframe customers already have multiple max capacity models and are pushing them to their limits as web and analytics and AI/ML and a bunch of other factors increase the overall amount of workload finding their way to mainframes. IBM wouldn't be making those brand new generations of 4-frame models with a new larger max capacity if there weren't customers buying them.
> given that no one seems to want to use them anymore
According to a 2024 Forrester Research report, mainframe use is as large as it's ever been, and expected to continue to grow.
Reasons include (not from the report) hardware reliability and scalability, and the ability to churn through crypto-style math in a fraction of the time/cost of cloud computing.
Report is paywalled, but someone on HN might have a free link.
"Crypto-style" here, I am guessing--but not entirely certain--from a downstream comment, is intended as cryptographic in the more general sense, and not "cryptocurrency" as "crypto"-alone is often used for these days?
I'd expect them to be significantly better than the competition in that area given the large part of the CPU that is a dedicated coprocessor specifically for crypto (called CPACF). There is much less area given to crypto on x86_64.
Of course we are talking about encryption here. TLS and AES etc etc. Not Bitcoin mining, which would indeed not be very cost effective.
They have a lot of cryptographic functionality built directly into hardware, and IBM is touting quantum resistant cryptography as one of the key features. You won't mine Bitcoin on one, but, if you are concerned a bad actor could get a memory or disk dump of your system and store it until quantum computers become practical, IBM says they have your back.
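If you're curious whether a Linux-on-Z system can even see the on-chip crypto assist mentioned above, a rough check like the following works - with the caveat that the "msa" feature-flag name is my assumption about how the kernel reports CPACF, so verify it on a real system:

    # Quick check (on Linux on Z / s390x) for the message-security-assist flag
    # in /proc/cpuinfo. The "msa" name is an assumption; confirm on real hardware.
    def has_cpacf(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.lower().startswith("features"):
                    return "msa" in line.split(":", 1)[1].split()
        return False

    if __name__ == "__main__":
        print("CPACF (msa) present:", has_cpacf())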
All these legacy answers don't really make sense for this Z17... it's a new mainframe supporting up to 64T of memory and specialized cores for matrix math/AI applications. I have a hard time believing that legacy workloads are calling for this kind of hardware.
It also has a ton of high availability features and modularity that _does_ fit with legacy workloads and mainframe expectations, so I'm a little unclear who wants to pay for both sets of features in the same package.
The legacy workloads are in some sense legacy, but in another sense the term is misleading, because at larger shops with capacity growth there has been no shortage of modernization of the various frontends and addition of new logic to the existing programs. The application may have the same name and core business as it always has, but it is still growing in size and under active development. The idea now is that you might as well do the same thing and build AI inference into those existing applications – something they started with the last-generation Telum chip, which added some basic tensor units. This time around they are adding vector extensions 3 (think round 3, kind of like how AVX evolved) and tensor processing extension 2 to the instruction set. Plus they are now selling a discrete chip that uses those same instructions and connects over PCIe, which will probably be more than enough given that the goal is never to train on mainframes, only to do inference.
You won't see mainframes doing AI training, but there is a lot of value in being able to do extremely low-latency inference (which is why they have their NPU on the same chip as the CPUs, just a few clock-cycles from the cores) during on-line transaction processing, and less timing-critical inference work on the dedicated cards (which are a few more nanoseconds away).
Additionally, IBM marketing likes the implication that mainframe CPUs are 'the best'. If you can buy a phone with an AI processor, it only makes sense that your mainframe must have one too. (And IBM will probably charge you to use it.)
They aren't the best, they are different – and the best in certain areas. In terms of cache size they are way out in front. That architecture has very, very fat cores, and that's the main thing that makes them different (not better or worse) from something like x86_64: the focus on single-thread performance and cache rather than more cores. If you have a transaction processing app that you can't shard and that scales mostly with single-thread perf, then they are pretty well suited for that use case.
If I were a bank, I'd order one of those and put all the financial modelling and prediction load on it. Like real time analysis to approve/deny loans, do projections for deciding what to do in slower moving financial products, predicting some future looking scenarios on wider markets, etc. etc. simultaneously.
That thing is a dreadnought of a matmul machine with some serious uptime, and it can crunch numbers without slowing down or losing availability.
or, possibly, you can implement a massively parallel version of WOPR/Joshua on it and let it rip scenarios for you. Just don't connect to live systems (not entirely joking, though).
P.S.: I'd name that version of the Joshua as JS2/WARP.
> I don't mean this in a condescending way at all, but really out of sheer curiosity: Who uses mainframes nowadays and what for?
There's probably some minor strategic relevance here. E.g. for the government which has some workloads (research labs, etc.) that suit these systems, it's probably a decent idea not to try and migrate over to differently-shaped compute just to keep this CPU IP and its dev teams alive at IBM, to make sure that the US stays capable of whipping up high-performance CPU core designs even if Intel/AMD/Apple falter.
Those customers don't use mainframes, they use POWER. There have been a handful of POWER supercomputers in the past decade built for essentially that reason.
POWER is not uncommon in HPC, but IBM i (which is very enterprisey) is also based on POWER. You won't find IBM mainframes in HPC, but that's because HPC is not as sensitive to latency and reliability as online transaction processing – and with mainframes, that is what you are paying for, not TFLOPS.
Yeah the architectures focus on different things. The huge caches on the z series chips are designed primarily around the kind of latency sensitive workloads more common in finance than the massive floating point throughput often needed for big scientific computing.
I understand that a company I work with uses a few, and is migrating away from them.
It seems clear to me that, prior to robust systems for orchestrating across multiple servers, you would install a mainframe to handle massive transactional workloads.
What I can never seem to wrap my head around is whether there are still applications out there in typical business settings where a massive machine like this is a technical requirement, or whether it's just that the costs of switching are monumental.
There really are applications that are large enough and hard enough to parallelize and/or shard that any rewrite on a different platform would turn into a performance catastrophe, even if you hired the best engineers and wrote the whole thing in very efficient C++ – which you never hear about them doing, because it's usually only about saving money, so it's done with the cheapest developers possible, and in Java. I've seen it and it's not pretty. The results tend to be dog slow and unresponsive, and even harder to maintain than the crusty old assembler and COBOL, because you have to implement from scratch a lot of the report-writing and record-crunching features built into a domain-specific language like COBOL if you want to write the same application in Java.
That's my biggest pet peeve with people who want to ditch mainframes: in my experience they seem to care very little about the quality and performance of the software, or they would only be thinking of replacing COBOL and assembler code with an equivalently performant modern language. The desire to migrate is often driven primarily by wanting cheap, easily replaceable developers.
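As a small illustration of the record-crunching plumbing mentioned above that a rewrite has to recreate: decoding a single COBOL COMP-3 (packed-decimal) field by hand. In COBOL the compiler handles this from the PIC clause; outside COBOL you write it yourself. A minimal sketch, not production code:

    # Decode a COMP-3 packed-decimal field: two BCD digits per byte, with the
    # final nibble carrying the sign (0xC/0xF positive, 0xD negative).
    from decimal import Decimal

    def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
        digits = []
        sign_nibble = 0x0F
        for i, byte in enumerate(raw):
            hi, lo = byte >> 4, byte & 0x0F
            digits.append(hi)
            if i == len(raw) - 1:
                sign_nibble = lo          # last nibble is the sign
            else:
                digits.append(lo)
        sign = -1 if sign_nibble == 0x0D else 1
        value = int("".join(str(d) for d in digits) or "0")
        return Decimal(sign * value).scaleb(-scale)

    # b'\x12\x34\x5C' is +12345; with two implied decimal places that's 123.45
    print(unpack_comp3(b"\x12\x34\x5C", scale=2))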
Bank payment processing is the primary example – being able to tell whether a specific transaction is fraudulent in less than 100 milliseconds – but there are other businesses with similar requirements. Healthcare is one of them, and fraud detection is getting a lot more sophisticated with the on-chip NPUs, within the same latency constraints.
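To make the latency-budget point concrete, here's a generic toy sketch of scoring a transaction inside a fixed budget - nothing IBM-specific, and the model, features, and thresholds are all made up:

    # Toy in-path fraud scoring with a hard latency budget. A real system would
    # call an actual model (or on-chip accelerator); this is only an illustration.
    import math
    import time

    WEIGHTS = {"amount_z": 1.8, "new_merchant": 0.9, "foreign": 1.2}
    BIAS = -3.0
    BUDGET_MS = 100

    def fraud_score(features: dict) -> float:
        z = BIAS + sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)
        return 1.0 / (1.0 + math.exp(-z))     # probability the transaction is fraud

    def decide(features: dict) -> str:
        start = time.perf_counter()
        score = fraud_score(features)
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > BUDGET_MS:
            return "refer"                    # blew the budget: fall back, don't block
        return "deny" if score > 0.5 else "approve"

    print(decide({"amount_z": 2.5, "new_merchant": 1.0, "foreign": 1.0}))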
Cloud is basically an infinitely scalable mainframe. You have dedicated compute optimised for specific workloads, and you string these together in a configuration that makes sense for your requirements.
If you understand the benefits of cloud over generic x86 compute, then you understand mainframes.
Except that now you need to develop the software that gives mainframes their famed reliability yourself. The applications are very different: software developed for cloud always needs to know that part of the system might become unavailable and work around that. A lot of the stack, from the cluster management ensuring a failed node gets replaced and the processes running on them are spun up on another node, all the way up to your code that retries failed operations, needs to be there if you aim for highly reliable apps. With mainframes, you just pretend the computer is perfect and never fails (some exaggeration here, but not much).
Also, reliability is just one aspect – another impressive feature is their observability. Mainframes used to be "the cloud" back in the day, and you can trace resource usage in exquisite detail, because we used to bill clients by the CPU cycle. Add to that the hardware reliability features built in (for instance, IBM mainframes have memory in RAID-like arrays).
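The retry/failure-handling boilerplate referred to a couple of comments up looks roughly the same everywhere it appears. A minimal generic sketch (delays and exception types are arbitrary):

    # On commodity/cloud infrastructure every remote call has to assume the
    # other side may be gone: retries, backoff, jitter, and a final give-up.
    import random
    import time

    def call_with_retries(operation, attempts=5, base_delay=0.1):
        for attempt in range(attempts):
            try:
                return operation()
            except (ConnectionError, TimeoutError):
                if attempt == attempts - 1:
                    raise                      # out of retries, surface the failure
                delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
                time.sleep(delay)              # exponential backoff with jitter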
Pretty much every Fortune 500 company that's been around for more than 30 years. Batch processing primarily – from closing their books to order processing; it differs by company. But if you ask the right person, they'll tell you where it's at.
Fortune 500? More like Fortune 50000 (ok, exaggeration). But there are so many banks in the world, and their automation can run back to the 1950s. They are only slowly moving away from mainframes, if only because a rewrite of a complex system that nobody understands is tough, and possibly very costly if it is the key to billions of euros/dollars/...
IBM prices processors differently for running COBOL and Java – if you run mostly Java code, your monthly licensing fees will be vastly different. On boot, the service elements (their BMCs – on modern machines they are x86 boxes running Linux) load microcode according to a saved configuration – some CPUs will run z/OS, some will run z/OS and COBOL apps, some will run Java, some will run z/VM, some will run Linux. This is all configured on the computer whose job is to bring up the big metal (first the power, then the cooling, and only then the compute). Under everything on the mainframe side is the PR/SM hypervisor, which is, IIRC, what manages LPARs (logical partitions – completely isolated environments sharing the same hardware). The cheapest licensing is Linux under a custom z/VM (those machines aren't called z but LinuxONE), and the most expensive is the ability to run z/OS and COBOL code. Running Java under z/OS is somewhat cheaper. Last time I saw it, it was very complicated.
Everyone's predisposed that "mainframes are obsolete", but why not use a mainframe?
I mean, no one except banks can afford one, let alone make it back on opex or capex, and so we all resort to MySQL on Linux – but isn't cost the only problem with them?
Banks smaller than the big ~5 in the US cannot afford anything when it comes to IT infrastructure.
I am not aware of a single state/regional bank that wants to have their IBM on premise anymore - at any cost. Most of these customers go through multiple layers of 3rd party indirection and pay one of ~3 gigantic banking service vendors for access to hosted core services on top of IBM.
Despite the wildly ramping complexity of working with 3rd parties, banks still universally prefer this over the idea of rewriting the core software for commodity systems.
A Rockhopper 4 Express, a z16 without z/OS support (running Linux), was in the mid six digits. It's small enough to co-locate on a rack with storage nodes. While z/OS will want IBM storage, Linux is much less picky.
IBM won't release the numbers, but I am sure it can host more workloads in the same chassis than the average 6-digit Dell or Lenovo.
Density is also bad. You spend four full racks and get 208 cores. Sure, they might be the fastest cores around, but that only gets you so far when even an off-the-shelf Dell server has 2x128-192 cores in 1U. Similarly, 64 TB of RAM is a lot, but that same Dell can have 3 TB. If I'm reading the specs correctly (they are a bit confusing), the z17 has only 25G networking (48 ports); the Dell I'm checking can have 8x200G network ports and can also do 400G networking. So a single 1U commodity server has more network bandwidth than the entire four-rack z system.
Sure, there will be a lot of overhead in having tens or hundreds of servers instead of a single system, but for lots of workloads it is manageable and certainly worth the tradeoff.
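Just to put the comparison above into per-rack-unit terms, using only the numbers quoted in it (plus the assumption of standard 42U racks) - treat the output as rough ratios, not a benchmark:

    # Back-of-the-envelope density comparison from the figures in the comment:
    # 208 cores / 64 TB across four racks vs. a 1U x86 box with 2x128 cores / 3 TB.
    Z_CORES, Z_RAM_TB, Z_RACK_UNITS = 208, 64, 4 * 42
    X86_CORES, X86_RAM_TB, X86_RACK_UNITS = 2 * 128, 3, 1

    print(f"z   : {Z_CORES / Z_RACK_UNITS:6.2f} cores/U, {Z_RAM_TB / Z_RACK_UNITS:5.2f} TB/U")
    print(f"x86 : {X86_CORES / X86_RACK_UNITS:6.2f} cores/U, {X86_RAM_TB / X86_RACK_UNITS:5.2f} TB/U")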
Take a look at the cache size on the Telum II, or better yet look at a die shot and do some measuring of the cores. Then consider that mainframe workloads are latency sensitive and those workloads tend to need to scale vertically as long as possible.
The goal is not to rent out as many vCPUs as possible (a business model in which you benefit greatly from having lots and lots of small cores on your chip). The goal for zArch chips is to have the biggest cores possible, with as much area used for cache as possible. This is antithetical to maximizing core density, and so you will find that each dual chip module is absolutely enormous, that each core takes up more area than in x86_64 chips, and that those chips therefore have significantly lower core density.
The end result is that the zArch chips are likely to have much higher single-thread perf, whereas they will probably get smacked by, say, a Threadripper on a multithreaded workload where you are optimizing for throughput. This ignores intricacies about vectorization, what can and can't be accelerated, whether you want binary or decimal floating point, and other details – it's a broad generalization about the two architectures' general performance characteristics.
Likewise, the same applies for networking. Mainframe apps are not bottlenecking on bandwidth. They are way less likely to be web servers dishing out media for instance.
I really dislike seeing architectures compared via such frivolous metrics because it demonstrates a big misunderstanding of just how complex modern CPU designs are.
Not the only problem with them. Not as easy to find skilled staff to operate them. Also, you become completely dependent on IBM (not fully terrible -- it's a single throat to choke when things go wrong).
By that, do you mean banks, payment networks or both? And I guess I'd be curious as to why mainframes versus the rest. It seems like the answer for "why" is mainly because it started on mainframes and the cost of switching is really high, but I wonder if there isn't more to it.
Edit: Oh yeah, just saw MasterCard has some job posting for IBM Mainframe/COBOL positions. Fascinating.
Not just that. Most operating systems lie about when an IO transaction completes for performance reasons. So if you lose power or the IO device dies you still think it succeeded. A mainframe doesn't do that... it also multiplexes the IO so it happens more than once so if one adapter fails it keeps going. The resiliency is the main use case in many situations. That said IME 99.995% of use cases don't need a mainframe. They just don't need to be that reliable if they can fail gracefully.
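To illustrate the "OS lies about I/O completion" point: on Linux, a write() returning says nothing about durability; you have to ask for it explicitly. A minimal sketch (and even fsync ultimately trusts the device not to lie about its own write cache):

    # A plain write() returning does not mean the data is on stable storage.
    import os

    def durable_write(path: str, data: bytes) -> None:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.write(fd, data)   # may still sit in the page cache after this returns
            os.fsync(fd)         # block until the device reports the data as stable
        finally:
            os.close(fd)

    durable_write("/tmp/example.dat", b"balance update\n")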
I think what you are referring to is the "sub capacity" pricing model wherein a rolling average of resource consumption is used for billing. They've transitioned to newer models circa cloud technology, but it's mostly the same idea with more moving parts.
The advantage of this model from a business operations standpoint is that you don't have to think about a single piece of hardware related to the mainframe. IBM will come out automagically and fix issues for you when the mainframe phones home about a problem. A majority of the system architecture is designed on purpose to enable seamless servicing of the machine without impacting operations.
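For anyone unfamiliar with sub-capacity pricing, the core idea - as I understand it - is billing on a peak rolling four-hour average of consumption rather than on installed capacity. A toy sketch with made-up numbers, assuming 5-minute MSU samples:

    # Toy version of the rolling four-hour average ("R4HA") idea: usage samples
    # are averaged over a sliding window and the peak of that average drives the bill.
    from collections import deque

    def peak_rolling_average(msu_samples, samples_per_window=48):  # 48 x 5 min = 4 h
        window = deque(maxlen=samples_per_window)
        peak = 0.0
        for sample in msu_samples:
            window.append(sample)
            peak = max(peak, sum(window) / len(window))
        return peak

    usage = [120] * 40 + [900] * 12 + [150] * 40   # a short spike in MSU consumption
    print(f"peak 4-hour rolling average: {peak_rolling_average(usage):.1f} MSU")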
> IBM will come out automagically and fix issues for you when the mainframe phones home about a problem. A majority of the system architecture is designed on purpose to enable seamless servicing of the machine without impacting operations.
I'd rather have a fault-tolerant distributed software system running on commodity hardware, that way there's a plurality of hardware and support vendors to choose from. No lock-in.
> I'd rather have a fault-tolerant distributed software system running on commodity hardware, that way there's a plurality of hardware and support vendors to choose from. No lock-in.
But then you'd have to develop it yourself. IBM has been doing just that for 60 years (on the 360 and its descendants).
That's like a toy drone company trying to compete with DARPA. Not even close.
These kinds of monsters run in critical environments such as airports, with AS/400 or similar terminals being used by secretaries. These kinds of workloads – reliability, security, testing – are no joke. At all. This is not your general-purpose Unix machine.
I haven't worked with mainframes since the z10, but back then you could get into an entry model for about $100k.
Though the sky is the limit. The typical machine I would order had a list price of about 1 million. Of course no one pays list. Discounts can be pretty substantial depending on how much business you do with IBM or how bad they want to get your business.
The big problem is that everything in IBM-z world is negotiated, and often covered by NDAs. The pricing is complicated by which operating systems and what sort of workloads you'll be running, and what service level guarantees you need. The only published pricing in the entire life of the IBM 360/370/390/z-series line was the LinuxONE when it was first released... Hardware plus OS, excluding storage, was $70k on the low end.
Previous generation machines that came off-lease used to be listed on IBM's web site. You could have a fully-maxed-out previous-generation machine for under $250k. Fifteen years ago I was able to get ballpark pricing for a fully-maxed-out new machine, and it was "over a million, but less than two million, and closer to the low end". That being said, the machines are often leased.
If you go with z/VM or z/VSE, the OS and software are typically sold under terms that are pretty much like normal software, except that pricing varies with the capacity level of the machine – which may be less than the physical number of CPUs in the machine, since that is a thing in IBM-land.
If you go for z/OS, welcome to the world of metered billing. You're looking at tens of thousands of dollars in MRC just to get started, and if you're running exactly the wrong mix of everything, you'll be spending millions just on software each month. There's a whole industry that revolves around managing these expenses. Still less complicated than The Cloud.
Hercules is _not_ used by IBM's own developers. Being found with Hercules on your computer at IBM gets you in trouble. I know people who work on mainframe-related stuff inside IBM and they steer well clear of Hercules. And I've heard that IBM's computer monitoring stuff (antivirus, asset protection, etc.) looks for Hercules and flags it.
But IBM _does_ have their own mainframe emulator, zPDT (z Personal Development Tool), sold to their customers for dev and testing (under the name zD&T -- z Development and Test), and to ISVs under their ISV program. That's what IBM's own developers would be using if they're doing stuff under emulation instead of LPARs on real hardware.
(And IBM's emulator is significantly faster than Hercules, FWIW, but overall less feature-full and lacks all of the support Hercules has for older architectures, more device types, etc.)
There was something of a legal fight between IBM and TurboHercules SAS, a company that tried to force IBM to license z/OS to Hercules users. IBM has been holding a grudge ever since (probably on the advice of their legal department).
You can run the emulator, but you will not get your hands on new versions of the operating system to run on it. But there are old versions that you can get your hands on.
You don't buy a mainframe, it's consumption based pricing. They aren't just going to list a price, because they need to size the hardware to what they think the workload will be.
Could they just list prices? Sure. Will they ever do it? No.
It depends on how full those drawers are. $250k to $1m would be the typical price range.
It's easier and harder at the same time to buy older hardware. That's half the challenge though because the software is strictly licensed and you pay per MIPS.
Here's a kid who bought a mainframe and then brought it up:
It's probably impossible to say because of the service contracts that come with it. Nobody would buy one brand new and not pay for support and consulting too.
I'm completely fascinated by the diagram. In a four-rack system, 2.5 racks are dedicated to I/O, half a rack is just empty, and the remainder is the actual processing and memory.
The I/O probably isn't endless networking adaptors, so what is it?
“The IBM z17 supports a PCIe I/O infrastructure. PCIe features are installed in PCIe+ I/O drawers. Up to 12 I/O drawers per IBM z17 can be ordered, which allows for up to 192 PCIe I/O and special purpose features.
For a four CPC drawer system, up to 48 PCIe+ fan-out slots can be populated with fan-out cards for data communications between the CPC drawers and the I/O infrastructure, and for coupling. The multiple channel subsystem (CSS) architecture allows up to six CSSs, each with 256 channels.
The IBM z17 implements PCIe Generation 5 (PCIe+ Gen5), which is used to connect the PCIe Generation 4 (PCIe+ Gen4) dual port fan-out features in the CPC drawers. The I/O infrastructure is designed to reduce processor usage and I/O latency, and provide increased throughput and availability.”
There's also the problem in that they need to take into account floor loading. They're not going to tell a customer upgrading from an older machine to a new one that, "oh, by the way, the rack weighs twice what it used to, so you'll need to upgrade your floor while you're at it." Especially important for raised floors.
Probably channels. In an IBM mainframe, each I/O device is connected on its own channel, which is actually a separate computer that handles the transfer to/from the main CPU. This has been the case going back to the System/360, which is why mainframes are legendary for their transaction throughput and reliability. There's probably a lot of redundancy in the I/O hardware, as they have to be rock solid and hot swappable while the system is running.
Reading about mainframes feels very much like reading science fiction. Truly awesome technology that exists on a completely different plane of computing than anything else.
Yeah, a few years ago there was a Talos (I think) desktop motherboard that had a POWER8 CPU in it. It was expensive due to low production runs, but I wish it had taken off. I think IBM is up to POWER9 now, but I doubt there are any personal motherboards for it.
> POWER10, however, contained closed firmware for the off-chip OMI DRAM bridge and on-chip PPE I/O processor, which meant that the principled team at Raptor resolutely said no to building POWER10 workstations, even though they wanted to.
I bought mine - I own two Talos systems and a Blackbird - because I want choices in ISA, and if people want that, they'll need to spend the money. (It helps that I'm a notorious pro-PowerPC bigot, having used POWER since the RS/6000 MicroChannel days.) Other than ARM, they're the only architecture with both general OS support and processing power in the same ballpark, and while IBM may sometimes be dunderheaded, they aren't going anywhere, either.
They aren't cheap and they aren't for everyone. But it meets my needs and it puts my money where my mouth is.
If I understand you correctly it means the primary use case is security applications that need transparency at all levels on a system that is roughly equivalent to mainstream platform performance features. Is that accurate?
As I see it, these systems have two potential markets (some natural overlap exists): the first being people like me who don't want to necessarily feed the x86-64 beast if they can avoid it, and the second indeed being people who want a completely auditable, transparent platform. In both cases to be a practical alternative the computer needs to have similar performance to what you'd get from a modern desktop, and while POWER9 is no longer cutting edge, it still generally delivers.
If I could afford one I would, not because of security, but just the geek in me finds it cool.
Back in the 90s and early 2000s, there were several non-x86 architectures that were more powerful, and that went 64-bit long before Intel did – the DEC Alpha, SPARC, and others. I was too poor to afford those back then too, but I remember them fondly.
I believe Intel and AMD motherboards all have proprietary firmware and the Talos systems are puritanically open. (But the processor itself is closed so there's that. Could have gone with SPARC Niagara which was open sourced https://www.cnet.com/tech/computing/sun-open-sources-second-... )
The microarch is closed and IBM-specific. However, the ISA is open and royalty-free, and the on-chip firmware is open source and you can build it yourself. In this sense it's at least as open as, say, many RISC-V implementations.
No. The microarchitectures have some notable similarities and cross-pollinate each other, but they are distinct.
You may be thinking of IBM i (formerly known as AS/400 and i5), which has a completely abstracted instruction set that on modern systems is internally recompiled to Power.
I dunno, but the z-processors and the POWER processors look a lot different even from a floor plan / die shot perspective. The former also clock much higher. Doesn't smell like microcode to me.
I encountered assembly programs written and compiled in '88 and still running.
There are several drawbacks to maintaining this kind of compatibility but, nevertheless, it's impressive.
Book recommendation: in-depth on the people, processes, and technology. Incredible detail on all aspects.
https://direct.mit.edu/books/monograph/4262/IBM-s-360-and-Ea...
Thanks for the recommendation! I've ordered it.
Unisys's motto for ClearPath is "unsurpassed security":
https://www.unisys.com/product-info-sheet/ecs/clearpath-mast...
Not just security, but CPU hotplugging and resuming as if nothing happened.
I'm not sure hotplugging is still there.
The z17 technical introduction (3.4 MB PDF): https://www.redbooks.ibm.com/redbooks/pdfs/sg248580.pdf
All servers have hot-swap power supplies.
(I'm pretty sure BeOS never actually powered off CPUs; it just didn't schedule anything. Linux "hotplug" works the same way today.)
> customers who cannot migrate off mainframes and are willing to spend almost any amount to keep their apps working.
They all can migrate their apps off the mainframes. It's just that it's cheaper to continue paying for the machines.
Consider me skeptical. A mainframe has to be the least cost effective "crypto-style" math machine you could imagine.
Those analyst reports are kind of bought and paid for by vendors BTW.
I've heard that the AI features are used by banks for fraud detection. I guess some banks are also growing their transaction volume.
I agree that many mainframe workloads are probably not growing so what used to require a whole machine probably fits in a few cores today.
You won't see mainframes doing AI training, but there is a lot of value in being able to do extremely low-latency inference (which is why they have their NPU on the same chip as the CPUs, just a few clock-cycles from the cores) during on-line transaction processing, and less timing-critical inference work on the dedicated cards (which are a few more nanoseconds away).
Additionally, IBM marketing likes the implication that mainframe CPUs are 'the best'. If you can buy a phone with an AI processor, it only makes sense that your mainframe must have one too. (And IBM will probably charge you to use it.)
They aren't the best, they are different, and the best in certain areas. In terms of cache size they are way out in front. That architecture has very very fat cores. That's the main thing that makes them different (not better or worse) than something like x86_64, is the focus on single thread performance and cache rather than more cores. If you have a transaction processing app that you can't shard and that scales mostly with single thread perf, then they are pretty well suited for that use case.
If I were a bank, I'd order one of those and put all the financial modelling and prediction load on it. Like real time analysis to approve/deny loans, do projections for deciding what to do in slower moving financial products, predicting some future looking scenarios on wider markets, etc. etc. simultaneously.
That thing is dreadnought matmul machine with some serious uptime, and can crunch numbers without slowing down or losing availability.
or, possibly, you can implement a massively parallel version of WOPR/Joshua on it and let it rip scenarios for you. Just don't connect to live systems (not entirely joking, though).
P.S.: I'd name that version of the Joshua as JS2/WARP.
> I don't mean this in a condescending way at all, but really out of sheer curiosity: Who uses mainframes nowadays and what for?
There's probably some minor strategic relevance here. E.g. for the government, which has workloads (research labs, etc.) that suit these systems, it's probably a decent idea not to try to migrate over to differently-shaped compute, if only to keep this CPU IP and its dev teams alive at IBM and to make sure the US stays capable of whipping up high-performance CPU core designs even if Intel/AMD/Apple falter.
Those customers don't use mainframes, they use POWER. There have been a handful of POWER supercomputers in the past decade built for essentially that reason.
POWER is not uncommon in HPC, but IBM i (which is very enterprisey) is also based on POWER. You won't find IBM mainframes in HPC, but that's because HPC is not as sensitive to latency and reliability as online transaction processing, and with mainframes that is what you are paying for, not TFLOPS.
Yeah the architectures focus on different things. The huge caches on the z series chips are designed primarily around the kind of latency sensitive workloads more common in finance than the massive floating point throughput often needed for big scientific computing.
I understand that a company I work with uses a few, and is migrating away from them.
It seems clear to me that, prior to robust systems for orchestrating across multiple servers, you would install a mainframe to handle massive transactional workloads.
What I can never seem to wrap my head around is if there are still applications out there in typical business settings where a massive machine like this is still a technical requirement of applications/processes or if it's just because the costs of switching are monumental.
I'd love to understand as well!
There really are applications that are large enough, and hard enough to parallelize and/or shard, that any rewrite on a different platform would turn into a performance catastrophe, even if you hired the best engineers and wrote the whole thing in very efficient C++. You never hear about them doing that, because the motivation is usually only about saving money, so it's done with the cheapest developers possible and in Java. I've seen it and it's not pretty. The results tend to be dog slow and unresponsive, and even harder to maintain than the crusty old assembler and COBOL, because if you want to write the same application in Java you have to implement from scratch a lot of the report-writing and record-crunching features built into a domain-specific language like COBOL.
That's my biggest pet peeve with people who want to ditch mainframes: in my experience they seem to care very little about the quality and performance of the software, or they would only consider replacing COBOL and assembler code with an equivalently performant modern language and dialect. The desire to migrate is often driven primarily by the wish for cheap, easily replaceable developers.
Bank payment processing is the primary example - being able to tell whether a specific transaction is or is not fraudulent in less than 100 milliseconds - but there are other businesses with similar requirements. Healthcare is one of them, and fraud detection is getting a lot more sophisticated with the on-chip NPUs, within the same latency constraints.
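To make the latency constraint concrete, here is a minimal sketch of what that looks like from the application's side (Python, purely illustrative; the feature names, the scoring callable, and the 100 ms budget are my assumptions, not anything IBM ships):

    import time

    FRAUD_BUDGET_MS = 100          # assumed SLA for an in-flight payment decision

    def check_transaction(txn, score_fn):
        """Score a payment inside the transaction path, failing safe if the budget is blown."""
        start = time.monotonic()
        features = (txn["amount"], txn["merchant_risk"], txn["velocity_1h"])  # made-up features
        score = score_fn(features)               # on Telum this is where the on-chip NPU helps
        elapsed_ms = (time.monotonic() - start) * 1000.0
        if elapsed_ms > FRAUD_BUDGET_MS:
            return "REVIEW"                      # budget blown: don't silently block the payment
        return "DENY" if score > 0.9 else "APPROVE"

The point is only that the model call sits inline in the payment path, so its latency is part of the transaction's latency budget.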
The funny thing is that if they spun out half of the mainframe tech into something they could compete with, people might actually buy them.
Most firms have so-so software in need of ultra-reliable hardware; not everyone is Google.
Cloud is basically an infinitely scalable mainframe. You have dedicated compute optimised for specific workloads, and you string these together in a configuration that makes sense for your requirements.
If you understand the benefits of cloud over generic x86 compute, then you understand mainframes.
Cloud is mainframes gone full circle.
> Cloud is mainframes gone full circle.
Except that now you need to develop the software that gives mainframes their famed reliability yourself. The applications are very different: software developed for cloud always needs to know that part of the system might become unavailable and work around that. A lot of the stack, from the cluster management ensuring a failed node gets replaced and the processes running on them are spun up on another node, all the way up to your code that retries failed operations, needs to be there if you aim for highly reliable apps. With mainframes, you just pretend the computer is perfect and never fails (some exaggeration here, but not much).
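As a trivial illustration of the kind of plumbing that ends up in your application layer on cloud (a minimal sketch; the function and parameters are made up, and the point is only that retry/failover logic becomes your problem rather than the platform's):

    import random
    import time

    def call_with_retries(operation, attempts=5, base_delay=0.1):
        """Retry an idempotent remote call with exponential backoff and jitter."""
        for attempt in range(attempts):
            try:
                return operation()
            except ConnectionError:
                if attempt == attempts - 1:
                    raise                        # out of retries: surface the failure
                # back off so every client doesn't hammer a struggling node at once
                time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.05))

And that is just one layer; the cluster manager replacing dead nodes, rebalancing state, and so on all sit underneath it.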
Also, reliability is just one aspect - another impressive aspect is observability. Mainframes were the cloud of their day, and you can trace resource usage in exquisite detail, because clients used to be billed by the CPU cycle. Add to that the hardware reliability features built in (for instance, IBM mainframes have memory in RAID-like arrays).
But latency
The cache design in the Z is nothing like what you get in cloud computing for tightly coupled, cooperating job processes.
Pretty much every Fortune 500 company that's been around for more than 30 years. Batch processing primarily - from closing their books to order processing; it differs by company. But if you ask the right person, they'll tell you where it's at.
Fortune 500? More like Fortune 50000 (ok, exaggeration). But there are so many banks in the world, and their automation can date back to the 1950s. They are only slowly moving away from mainframes, if only because a rewrite of a complex system that nobody understands is tough, and possibly very costly if it is the key to billions of euros/dollars/...
That's all true, but these machines often run java code. That's something to contemplate.
IBM prices processors differently for running COBOL and Java - if you run mostly Java code, your monthly licensing fees will be vastly different. On boot, the service elements (their BMCs - on modern machines they are x86 boxes running Linux) load microcode according to a saved configuration - some CPUs will run z/OS, some will run z/OS and COBOL apps, some will run Java, some will run z/VM, some will run Linux. This is all configured on the computer whose job is to bring up the big metal (first the power, then the cooling, and only then the compute). Under everything on the mainframe side is the PR/SM hypervisor, which is, IIRC, what manages LPARs (logical partitions, completely isolated environments sharing the same hardware). The cheapest licensing is Linux under a custom z/VM (those machines aren't called z but LinuxONE), and the most expensive is the ability to run z/OS and COBOL code. Running Java under z/OS is somewhat cheaper. Last time I saw it, it was very complicated.
From what I've read, 70% of the Fortune 500 do.
Here's a brochure that might be useful to read:
https://www.ibm.com/downloads/documents/us-en/107a02e95d48f8...
It's an IBM brochure, so naturally it's pumping mainframes, but it still has lots of interesting facts in it.
Everyone's predisposed to think "mainframes are obsolete", but why not use a mainframe?
I mean, no one except banks can afford one, let alone make the capex or opex back, so we all resort to MySQL on Linux - but isn't the cost the only problem with them?
> no one except for banks can afford one
Banks smaller than the big ~5 in the US cannot afford anything when it comes to IT infrastructure.
I am not aware of a single state/regional bank that wants to have their IBM on premise anymore - at any cost. Most of these customers go through multiple layers of 3rd party indirection and pay one of ~3 gigantic banking service vendors for access to hosted core services on top of IBM.
Despite the wildly ramping complexity of working with 3rd parties, banks still universally prefer this over the idea of rewriting the core software for commodity systems.
> I mean, no one except for banks can afford one,
A Rockhopper 4 Express, a z16 without z/OS support (running Linux), was in the mid six figures. It's small enough to co-locate on a rack with storage nodes. While z/OS will want IBM storage, Linux is much less picky.
IBM won't release the numbers, but I am sure it can host more workloads in the same chassis than the average 6-digit Dell or Lenovo.
Density is also bad. You spend 4 full racks and get 208 cores. Sure, they might be the fastest cores around, but that only gets you so far when even an off-the-shelf Dell server has 2x 128-192 cores in 1U. Similarly, 64 TB of RAM is a lot, but that same Dell can have 3 TB of RAM. If I'm reading the specs correctly (they are a bit confusing), the z17 has only 25G networking (48 ports); the Dell I'm checking can have 8x 200G network ports and can also do 400G networking. So a single 1U commodity server has more network bandwidth than the entire 4-rack z system.
Sure, there will be a lot of overhead in having tens or hundreds of servers vs. a single system, but for lots of workloads that is manageable and certainly worth the tradeoff.
> Dell server has 2x128-192 cores in 1U server.
Can you replace 25% of your cores without stopping the machine?
> that same Dell can have 3 TB of RAM.
How does it deal with a faulty memory module? Or two? Does it notice the issue before a process crashes?
> z17 has only 25G networking
They have up to 12 IO drawers with 20 slots each. I think the 48 ports you got are on the built-in switch.
You can't compare cores like that.
Take a look at the cache size on the Telum II, or better yet look at a die shot and do some measuring of the cores. Then consider that mainframe workloads are latency sensitive and those workloads tend to need to scale vertically as long as possible.
The goal is not to rent out as many vCPUs as possible (a business model in which you benefit greatly by having lots and lots of small cores on your chip). The goal for zArch chips is to have the biggest cores possible, with as much area used for cache as possible. This is antithetical to maximizing core density, and so you will find that each dual-chip module is absolutely enormous, that each core takes up more area than in x86_64 chips, and that those chips therefore have significantly lower core density.
The end result is that the zArch chips will likely have much higher single-thread perf, whereas they will probably get smacked by, say, a Threadripper on a multithreaded workload where you are optimizing for throughput. This ignores intricacies around vectorization, what can and can't be accelerated, whether you want binary or decimal floating point, and other details - it's a broad generalization about the two architectures' general performance characteristics.
Likewise, the same applies for networking. Mainframe apps are not bottlenecking on bandwidth. They are way less likely to be web servers dishing out media for instance.
I really dislike seeing architectures compared via such frivolous metrics because it demonstrates a big misunderstanding of just how complex modern CPU designs are.
The reliability of a mainframe surpasses a Dell server by a huge gap.
Not the only problem with them. Not as easy to find skilled staff to operate them. Also, you become completely dependent on IBM (not fully terrible -- it's a single throat to choke when things go wrong).
It's hard to choke someone's throat when they already hold you by the balls.
>Who uses mainframes nowadays and what for?
Do you have a credit card? Do you bank in the USA? If you answered "yes" to either of the above questions, you interact indirectly with a mainframe.
By that, do you mean banks, payment networks or both? And I guess I'd be curious as to why mainframes versus the rest. It seems like the answer for "why" is mainly because it started on mainframes and the cost of switching is really high, but I wonder if there isn't more to it.
Edit: Oh yeah, just saw MasterCard has some job posting for IBM Mainframe/COBOL positions. Fascinating.
Core transactions and ledgers run on these. Bank I work for now (contracting job) uses them as their core system.
Both. Mainframes though are incredibly good for both I/O and uptime.
Yeah, Linux/Unix are way better on both than they used to be, but on a mainframe, it's just a totally different level.
You can run Linux on mainframes fine. RHEL has first-class support for s390x / Z.
Not just that. Most operating systems lie about when an I/O transaction completes, for performance reasons. So if you lose power or the I/O device dies, you may still think the write succeeded. A mainframe doesn't do that... it also multiplexes the I/O so it goes down more than one path, so if one adapter fails it keeps going. The resiliency is the main use case in many situations. That said, IME 99.995% of use cases don't need a mainframe. They just don't need to be that reliable if they can fail gracefully.
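For anyone who hasn't bumped into the first point: on a commodity OS a successful write() usually just means the data reached the page cache, and you have to ask for durability explicitly (a minimal sketch in Python; the file path and record format are made up):

    import os

    def durable_append(path, record):
        """Append a record (bytes) and return only after the OS claims it hit stable storage."""
        fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
        try:
            os.write(fd, record)   # returning here only means "it's in the page cache"
            os.fsync(fd)           # force it to the device; without this, a power cut can eat it
        finally:
            os.close(fd)

Even then you only get one copy down one path, which is the gap the mainframe's redundant, multiplexed I/O channels are meant to close.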
Does IBM mainframes still have the pricing model where you "buy" hardware and then still pay IBM for main processing?
(Where you can save money buying Linux or Java accelerators to run things on for free.)
I think what you are referring to is the "sub-capacity" pricing model, wherein a rolling average of resource consumption is used for billing. They've transitioned to newer models in the cloud era, but it's mostly the same idea with more moving parts.
The advantage of this model from a business operations standpoint is that you don't have to think about a single piece of hardware related to the mainframe. IBM will come out automagically and fix issues for you when the mainframe phones home about a problem. A majority of the system architecture is designed on purpose to enable seamless servicing of the machine without impacting operations.
https://www.ibm.com/z/pricing
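For a rough feel of the sub-capacity idea (a back-of-the-envelope sketch only; real MSU-based billing has far more moving parts, and the 4-hour window and 5-minute samples here are just my assumptions about how such a rolling average is typically taken):

    def peak_rolling_average(samples, window=48):
        """Peak of a rolling average over per-interval utilization samples.

        With 5-minute samples, window=48 approximates a 4-hour rolling average;
        the monthly peak of that average is roughly what drives the bill.
        """
        peaks = [sum(samples[i - window:i]) / window
                 for i in range(window, len(samples) + 1)]
        return max(peaks) if peaks else 0.0

The practical upshot is that short utilization spikes cost much less than sustained load, which is a big part of how shops schedule their batch work.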
> IBM will come out automagically and fix issues for you when the mainframe phones home about a problem. A majority of the system architecture is designed on purpose to enable seamless servicing of the machine without impacting
I'd rather have a fault-tolerant distributed software system running on commodity hardware, that way there's a plurality of hardware and support vendors to choose from. No lock-in.
> I'd rather have a fault-tolerant distributed software system running on commodity hardware, that way there's a plurality of hardware and support vendors to choose from. No lock-in.
But then you'd have to develop it yourself. IBM has been doing just that for 60 years (on the 360 and its descendants).
> distributed software system
What if the business demands a certain level of serialized transaction throughput that is incompatible with ideas like paxos?
You will never beat one fast machine at a serialized narrative, and it just so happens that most serious businesses require these semantics.
How much does downtime cost you per hour? What are the consequences if your services become unavailable?
That's like a toy drone company trying to compete with DARPA. Not even close.
These kinds of monsters run in critical environments such as airports, with AS/400 or similar terminals being used by secretaries. These kinds of workloads - the reliability, security, testing - are no joke. At all. This is not your general-purpose Unix machine.
How much does it cost? I'm just curious. No, I don't want to book a meeting to "discuss" it.
I haven't worked with mainframes since the z10, but back then you could get into an entry model for about $100k.
Though the sky is the limit. The typical machine I would order had a list price of about 1 million. Of course no one pays list. Discounts can be pretty substantial depending on how much business you do with IBM or how bad they want to get your business.
The big problem is that everything in IBM-z world is negotiated, and often covered by NDAs. The pricing is complicated by which operating systems and what sort of workloads you'll be running, and what service level guarantees you need. The only published pricing in the entire life of the IBM 360/370/390/z-series line was the Linux One when it was first released... Hardware plus OS, excluding storage, was $70k on the low end.
Previous generation machines that came off-lease used to be listed on IBM's web site. You could have a fully-maxed-out previous-generation machine for under $250k. Fifteen years ago I was able to get ballpark pricing for a fully-maxed-out new machine, and it was "over a million, but less than two million, and closer to the low end". That being said, the machines are often leased.
If you go with z/VM or z/VSE, the OS and software are typically sold under terms that are pretty much like normal software, except pricing varies with the capacity level of the machine, which may be less than the physical number of CPUs in the machine, since that is a thing in IBM-land.
If you go for z/os, welcome to the world of metered billing. You're looking at tens of thousands of dollars in MRC just to get started, and if you're running the exact wrong mix of everything, you'll be spending millions just on software each month. There's a whole industry that revolves around managing these expenses. Still less complicated than The Cloud.
You can get a software emulator for free and run it on a PC! It's quite robust and used by IBM's own developers. https://en.wikipedia.org/wiki/Hercules_(emulator)
Hercules is _not_ used by IBM's own developers. Being found with Hercules on your computer at IBM gets you in trouble. I know people who work on mainframe-related stuff inside IBM and they steer well clear of Hercules. And I've heard that IBM's computer monitoring stuff (antivirus, asset protection, etc.) looks for Hercules and flags it.
But IBM _does_ have their own mainframe emulator, zPDT (z Personal Development Tool), sold to their customers for dev and testing (under the name zD&T -- z Development and Test), and to ISVs under their ISV program. That's what IBM's own developers would be using if they're doing stuff under emulation instead of LPARs on real hardware.
(And IBM's emulator is significantly faster than Hercules, FWIW, but overall less feature-full and lacks all of the support Hercules has for older architectures, more device types, etc.)
> looks for Hercules and flags it.
There was something of a legal fight between IBM and TurboHercules SAS, a company that tried to force IBM to license z/OS to its users. IBM has been holding a grudge ever since (probably on the advice of their legal department).
You can run the emulator, but you will not get your hands on new versions of the operating system to run on it. There are old versions that you can get your hands on, though.
You might be able to get your hands on a recent z/OS, but IBM will not be pleased.
You don't buy a mainframe, it's consumption based pricing. They aren't just going to list a price, because they need to size the hardware to what they think the workload will be.
Could they just list prices? Sure. Will they ever do it? No.
A Rockhopper 4 Express starts at $135,000. While technically a mainframe, it won't run z/OS.
It depends on how full those drawers are. $250k to $1m would be the typical price range.
Buying older hardware is easier and harder at the same time. That's only half the challenge though, because the software is strictly licensed and you pay per MIPS.
Here's a kid who bought a mainframe and then brought it up:
https://www.youtube.com/watch?v=45X4VP8CGtk
If you have to ask, you can't afford it!
I think leasing is more common.
It's probably impossible to say because of the service contracts that come with it. Nobody would buy one brand new and not pay for support and consulting too.
Accounting is a key aspect as well. A lot of would-be capex turns into opex that way.
I'm completely fascinated by the diagram. In a four-rack system, 2.5 racks are dedicated to I/O, half a rack is just empty, and the remainder is the actual processing and memory.
The I/O probably isn't endless networking adaptors, so what is it?
https://www.redbooks.ibm.com/abstracts/sg248579.html:
“The IBM z17 supports a PCIe I/O infrastructure. PCIe features are installed in PCIe+ I/O drawers. Up to 12 I/O drawers per IBM z17 can be ordered, which allows for up to 192 PCIe I/O and special purpose features.
For a four CPC drawer system, up to 48 PCIe+ fan-out slots can be populated with fan-out cards for data communications between the CPC drawers and the I/O infrastructure, and for coupling. The multiple channel subsystem (CSS) architecture allows up to six CSSs, each with 256 channels.
The IBM z17 implements PCIe Generation 5 (PCIe+ Gen5), which is used to connect the PCIe Generation 4 (PCIe+ Gen4) dual port fan-out features in the CPC drawers. The I/O infrastructure is designed to reduce processor usage and I/O latency, and provide increased throughput and availability.”
There's also the problem that they need to take floor loading into account. They're not going to tell a customer upgrading from an older machine to a new one, "oh, by the way, the rack weighs twice what it used to, so you'll need to upgrade your floor while you're at it." Especially important for raised floors.
Probably channels. In an IBM mainframe, each I/O device is connected on its own channel, which is actually a separate computer that handles the transfer to/from the main CPU. This has been the case going back to the System/360, which is why mainframes are legendary for their transaction throughput and reliability. There's probably a lot of redundancy in the I/O hardware, as they have to be rock solid and hot swappable while the system is running.
Could be storage, networking, crypto HSM, or cluster interconnect. See page 28 on https://www.redbooks.ibm.com/redbooks/pdfs/sg248580.pdf
I always enjoy reading those IBM Redbooks and learning about the technical details of these mainframe systems.
Reading about mainframes feels very much like reading science fiction. Truly awesome technology that exists on a completely different plane of computing than anything else.
They elevate hardware design to a fine art - everything is carefully balanced.
z Systems have always been amazing engineering feats. Too bad adopting it comes with a gargantuan amount of... IBM.
Yeah, a few years ago there was a Talos (I think) desktop motherboard that had a POWER8 CPU in it. It was expensive due to low production runs, but I wish it had taken off. I think IBM is up to POWER9 now, but I doubt there are any personal motherboards for it.
> IBM is up to power 9 now, but I doubt if there are any personal motherboards for it.
The Talos II:
https://wiki.raptorcs.com/wiki/Talos_II
> EATX form factor
> Two POWER9-compatible CPU sockets accepting 4-/8-/18- or 22-core Sforza CPUs
"Entry" level is $5,800 USD.
There won't be a POWER10 version from them because of the proprietary bits required:
https://www.talospace.com/2023/10/the-next-raptor-openpower-...
> POWER10, however, contained closed firmware for the off-chip OMI DRAM bridge and on-chip PPE I/O processor, which meant that the principled team at Raptor resolutely said no to building POWER10 workstations, even though they wanted to.
https://www.osnews.com/story/137555/ibm-hints-at-power11-hop...
What are some of the reasons to buy or use these over Intel or AMD?
I bought mine - I own two Talos systems and a Blackbird - because I want choices in ISA, and if people want that, they'll need to spend the money. (It helps that I'm a notorious pro-PowerPC bigot, having used POWER since the RS/6000 MicroChannel days.) Other than ARM, they're the only architecture with both general OS support and processing power in the same ballpark, and while IBM may sometimes be dunderheaded, they aren't going anywhere, either.
They aren't cheap and they aren't for everyone. But it meets my needs and it puts my money where my mouth is.
If I understand you correctly it means the primary use case is security applications that need transparency at all levels on a system that is roughly equivalent to mainstream platform performance features. Is that accurate?
As I see it, these systems have two potential markets (some natural overlap exists): the first being people like me who don't want to necessarily feed the x86-64 beast if they can avoid it, and the second indeed being people who want a completely auditable, transparent platform. In both cases to be a practical alternative the computer needs to have similar performance to what you'd get from a modern desktop, and while POWER9 is no longer cutting edge, it still generally delivers.
If I could afford one I would, not because of security, but just the geek in me finds it cool.
Back in the 90s and early 2000s, there were several non-x86 architectures that were more powerful, and even 64 bit long before Intel ever did: DEC Alpha, SPARC, and others. I was also too poor to afford those back then, but I remember them fondly.
> even 64 bit long before Intel ever did.
At some point I was reading e-mail on a 64-bit SGI machine while we waited for the Dell the company ordered for me to arrive.
The day it came in was one of the saddest days of my life.
I believe Intel and AMD motherboards all have proprietary firmware and the Talos systems are puritanically open. (But the processor itself is closed so there's that. Could have gone with SPARC Niagara which was open sourced https://www.cnet.com/tech/computing/sun-open-sources-second-... )
> But the processor itself is closed
The microarch is closed and IBM-specific. However, the ISA is open and royalty-free, and the on-chip firmware is open source and you can build it yourself. In this sense it's at least as open as, say, many RISC-V implementations.
The Sun Niagara 2 even has the Verilog RTL available. That is several orders of magnitude more open than the IBM.
> Verilog RTL for OpenSPARC T2 design
> Verification environment for OpenSPARC T2
> Diagnostics tests for OpenSPARC T2
> Scripts and Sun internal tools needed to simulate the design and to do synthesis of the design
> Open source tools needed to simulate the design
https://www.oracle.com/servers/technologies/opensparc-t2-pag...
Unfortunately, going from there to an affordable chip with reasonable performance doesn't seem possible.
Well you could just buy a T2 chip, but I don't know that Sun would sell them outside of an entire system.
That's Raptor Computing Systems (www.raptorcs.com) now selling Talos II workstations.
Being somewhat pedantic, Power and z are completely different architectures.
Completely? Isn't Z just microcoded on top of POWER under the hood?
No. The microarchitectures have some notable similarities and cross-pollinate each other, but they are distinct.
You may be thinking of IBM i (formerly known as AS/400 and i5), which has a completely abstracted instruction set that on modern systems is internally recompiled to Power.
No, Power and Telum processors are very different internally.
I dunno, but the z-processors and the POWER processors look a lot different even from a floor plan / die shot perspective. The former also clock much higher. Doesn't smell like microcode to me.
To be fair, AWS copied the concept of mainframe with x86 and "commodity" hardware, including the consumption based billing.
They forgot to copy the 99.99999% availability. :-/
I really don't understand why in 4 racks you can have only 4 CPU drawers but 12 I/O drawers. It makes their I/O seem incredibly inefficient.
Have you seen the size of those drawers? A single rack can only fit five of them, and you still would need to add processing, power/UPS, and cooling.
The CPU drawers are 5U while the I/O drawers are 8U. https://higherlogicdownload.s3.amazonaws.com/IMWUC/UploadedI...