remon 11 hours ago

Impressive numbers at a glance, but that boils down to ~140 QPS per node, which is one to two orders of magnitude below what you'd expect a typical MySQL node to serve. Obviously average execution time is mostly a function of query complexity, but based on Uber's business I can't really see what sort of non-normative queries they'd run at volume (e.g. for their customer-facing apps). Uber's infra runs on Amazon AWS afaik, and even taking some level of volume discount into account they're burning many millions of USD on some combination of overcapacity and suboptimal querying/caching strategies.

  • aseipp 9 hours ago

    Dividing the fleet QPS by the number of nodes is completely meaningless, because it assumes that queries are distributed evenly across every part of the system and that every part of the system is uniform (e.g. it is unclear what the read/write patterns are, what proportion of these nodes are read replicas or hot standbys, or whether their sizing and configuration are the same). That isn't realistic at all. I would guess it is extremely likely that hot subsets of these clusters, depending on the use case, see anywhere from 1 to 4 orders of magnitude higher QPS than your estimate, probably on a near-constant basis.

    Don't get me wrong, a lot of people have talked about Uber doing overengineering in weird ways, maybe they're even completely right. But being like "Well, obviously x/y = z, and z is rather small, therefore it's not impressive, isn't this obvious?" is the computer programming equivalent of the "econ 101 student says supply and demand explain everything" phenomenon. It's not an accurate characterization of the system at all and falls prey to the very thing you're alluding to ("this is obvious.")
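
    To make that concrete, here's a back-of-envelope sketch (all numbers invented, assuming a simple 1/rank Zipf-like skew across nodes) of how a ~140 QPS fleet average can coexist with far hotter nodes:

```python
# Back-of-envelope sketch: a fleet average says little about per-node load
# once traffic is skewed. All numbers here are invented for illustration,
# and a simple 1/rank (Zipf-like) skew across nodes is assumed.
nodes = 2100
fleet_qps = nodes * 140                     # ~294k QPS, the fleet-wide estimate
weights = [1 / r for r in range(1, nodes + 1)]
total = sum(weights)
per_node = [fleet_qps * w / total for w in weights]

print(f"mean:    {fleet_qps / nodes:8.0f} QPS")   # 140 by construction
print(f"hottest: {per_node[0]:8.0f} QPS")         # ~250x the mean under this skew
print(f"median:  {per_node[nodes // 2]:8.0f} QPS")
```

    Even under this mild, made-up skew, the hottest node serves a couple of hundred times the fleet average without any node being unusually configured.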

    • 0cf8612b2e1e 8 hours ago

      Simple enough just to think about localities and time of day. New York during Tuesday rush hour could be more load than all of North Dakota sees in a month. Even busy cities probably drop down to nothing on a weekday at 3am.

  • Jgrubb 10 hours ago

    See, the problem is that the people who care about cost performance and the people who care about UX performance are rarely the same people, and often neither side is empowered with the data or experience they need to bridge the gap.

    • bushbaba 9 hours ago

      Hardware is cheap relative to salaries. It might take 1 engineer 1 quarter to optimize. Compare that to a few thousand per server.

      • Jgrubb 7 hours ago

        Ok but we're in a thread about Ubers cloud bills, which are probably well into the 9 figures annually. It definitely gets talked about in board meetings.

        Global public cloud spend is hundreds of billions of dollars a year. I wouldn't be surprised if it's AWS's marketing team that came up with the talking point about how much more expensive developer time is.

        Edit: put this another way- wherever you work, you might know what parts of the architecture need some performance work but do you know what parts of the architecture cost the most money?

      • JackSlateur 7 hours ago

        A couple of years ago, I optimized some shit and reduced the annual bill by 150k€/y, for 3 days of work

        I might say, "hardware" is expensive compared to (my) salary :)

        • notyourwork 7 hours ago

          There isn’t always low hanging fruit. And when there is, it likely requires engineering knowledge to know it exists.

          • sgarland 3 hours ago

            There almost always is, actually. If you’re in the cloud and aren’t a tiny startup, that means you’ve had team[s] building your infrastructure, probably led by devs at some point.

            It doesn’t take engineering knowledge to browse through CloudWatch metrics and see that your average CPU utilization is in the single digits.

      • sgarland 9 hours ago

        It might take an engineer with no prior RDBMS knowledge a quarter to be able to optimize a DB for their use case, but then it’s effectively free. You found the optimal parameters to use for writer nodes? Great, roll that out to the fleet.

  • nunez 10 hours ago

    Didn't realize their entire MySQL data layer runs in AWS. Given that they went with basically a blue-green update strategy, this was essentially a "witness our cloud spend" kind of post.

    • pocket_cheese 10 hours ago

      They're not. Almost all of their infra was on prem when I worked there 3 years ago.

      • remon 10 hours ago

        It's neither. I remember them moving to the cloud but apparently they moved to Google/Oracle (the latter making this article particularly interesting btw). As per the relevant press release : "It’s understood that Uber will close down its own on-premises data centers and move the entirety of its information technology workloads to Oracle and Google Cloud."

remon 10 hours ago

It's sort of funny how you can immediately tell it's LLM sanitized/rewritten.

  • 1f60c 7 hours ago

    I got that feeling as well. In addition, I suspect it was originally written for an internal audience and adapted for the 'blog because the references to SLOs and SLAs don't really make sense in the context of external Uber customers.

  • jdbdndj 10 hours ago

    It reads like any of those tech blogs, using big words where not strictly necessary but also not wrong

    Don't know about your LLM feeling

    • est31 10 hours ago

      It contains the word "delve", a word that got way more popular in use since the introduction of LLMs.

      Also this paragraph sounds a lot like it has been written by LLMs, it's over-expressive:

          We systematically advanced through each tier, commencing from tier 5 and descending to tier 0. At every tier, we organized the clusters into manageable batches, ensuring a systematic and controlled transition process. Before embarking on each stage of the version upgrade, we actively involved the on-call teams responsible for each cluster, fostering collaboration and ensuring comprehensive oversight.
      
      The paragraph uses "commencing from" together with "descending to". People would probably write something like "starting with". It shows how the LLM has no spatial understanding: tier 0 is not below or above tier 5, especially as the text has not introduced any such spatial ordering previously. And it gets worse: there is no prior mention of the word "tier" in the blog post. The earlier text speaks of stages, and lists 5 steps (without giving them any name, but the standard term is more like "step" instead of "tier").

      There are more signs, like "embark", or that specific use of "fostering collaboration", which goes beyond corporate-speak; it also sounds a lot like what an LLM would say. Apparently "safeguard" is also a word LLMs write very often.

      • wongarsu 9 hours ago

        It doesn't get much better if you translate that paragraph from corpo speak to normal language: "We did the upgrade step by step. We did each step in batches. After we already decided how we were going to upgrade the clusters but before actually doing it we asked the teams responsible for keeping the clusters running for their opinion. This helped create an environment where we work together and helped monitoring the process"

        I'm sure there are people who write like that. LLMs have to get it from somewhere. But that part especially is mostly empty phrases, and the meaning that is there isn't all that flattering

        • Groxx 5 hours ago

          People write like that to sound good to higher-ups who don't understand what's going on underneath.

          There's A LOT of that kind of content to learn from. A brief glance at LinkedIn is all you need.

    • maeil 10 hours ago

      This [1] is a good piece on it. Here's [2] another good one.

      We don't just carry out a MySQL upgrade, oh no. We embark on a significant journey. We don't have reasons, but compelling factors. And then, we use compelling again soon after when describing how "MySQL v8.0 offered a compelling proposition with its promise of substantial performance enhancements", just as any human meatbag would.

      [1] https://www.latimes.com/socal/daily-pilot/opinion/story/2024...

      [2] https://english.elpais.com/science-tech/2024-04-25/excessive...

      • sroussey 7 hours ago

        If the meatbag was a salesperson though… very believable! ;)

    • remon 10 hours ago

      Nah this isn't a big word salad issue. The content is fine. It's just clearly a text written by humans and then rewritten by an LLM, potentially due to the original author(s) not being native speakers. If you feel it's natural English that's fine too ;)

    • exe34 10 hours ago

      I always thought 90% of what management wrote/said could be replaced by an RNN, and nowadays LLMs do even better!

  • aprilthird2021 10 hours ago

    Let's delve into why you think that

    • fs0c13ty00 10 hours ago

      It's simple. Human writing is short and to the point (either because they're lazy or want to save the reader's time), yet still manages to capture your attention. AI writing tends to be too elaborate and lacks a sense of "self".

      I feel like this article challenges my patience and attention too much; there is really no need to focus on the pros of upgrading here. We readers just want to know how they managed to upgrade at that scale, the challenges they faced, and how they solved them. Not to mention any sane tech writer who values their time wouldn't write this much.

      • wisemang 8 hours ago

        > Human writing is short and to the point (either because they're lazy or want to save the reader's time)

        Good human writing is short and to the point. (Technical writing at least.) But this is not a result of laziness — it’s actually more difficult.

        “If I had more time, I would have written a shorter letter.” - Blaise Pascal, and probably others [0]

        In any case I find these LLM “gotcha” comments incredibly tedious.

        [0] https://quoteinvestigator.com/2012/04/28/shorter-letter/?amp...

      • peppermint_gum 9 hours ago

        >It's simple. Human writing is short and to the point (either because they're lazy or want to save the reader's time), yet still manages to capture your attention. AI writing tends to be too elaborate and lacks a sense of "self".

        Corporate (and SEO) writing has always been overly verbose and tried to sound fancy. In fact, this probably is where LLMs learned that style. There's no reliable heuristic to tell human- and AI-writing apart.

        There's a lot of worry about people being fooled by AI fakes, but I'm also worried about false positives, people seeing "AI" everywhere. In fact, this is already happening in the art communities, with accusations flying left and right.

        People are too confident in their heuristics. "You are using whole sentences? Bot!" I fear this will make people simplify their writing style to avoid the accusations, which won't really accomplish anything, because AIs can already be prompted to avoid the default word-salad style.

        I miss the time before LLMs...

      • vundercind 10 hours ago

        > Not to mention any sane tech writers that value their time wouldn't write this much.

        This is a big part of why the tech is so damn corrosive, even in well-meaning use, let alone its lopsided benefits for bad actors.

        Even on the “small” and more-private side of life, it’s tempting to use it to e.g. spit out a polished narrative version of your bullet-point summary of your players’ last RPG session, but then do you go cut it back down to something reasonable? No, by that point it’s about as much work as just writing it yourself in the first place. So the somewhat-too-long version stands.

        The result is that the temptation to generate writing that wasn’t even worth someone’s time to write—which used to act as a fairly effective filter, even if it could be overcome by money—is enormous. So less and less writing is worth the reader’s time.

        As with free long distance calls, sometimes removing friction is mostly bad.

      • bityard 7 hours ago

        My hypothesis is that long form content generated by LLMs tend to sound like blogspam and press releases because those are exactly the kinds of things they were trained on. Most content generated by humans for public consumption is ANYTHING but succinct.

        Their style is much more direct if you just ask them a question or to summarize something. (Although whether the answer is accurate or not is another matter.)

      • remon 10 hours ago

        This. Thank you for verbalizing what I struggled to.

    • blackenedgem 6 hours ago

      I'm enjoying the replies to this not getting that it's a joke

    • Starlevel004 9 hours ago

      every section is just a list in disguise, and GPTs LOVE lists

  • bronzekaiser 5 hours ago

    Scroll to the bottom and look at the authors. It's immediately obvious

  • msoad 10 hours ago

    Yeah, I kinda stopped reading when I felt this. Not sure why; the substance is still interesting and worth learning from, but knowing an LLM wrote it made me feel a little icky.

    • greenavocado 9 hours ago

      Scroll to the bottom to see a list of those who claimed to have authored it

  • l5870uoo9y 10 hours ago

    AI has a preference for dividing everything into sections, especially "Introduction" and "Conclusion" sections.

  • cheema33 6 hours ago

    > it's LLM sanitized/rewritten

    LLM is the new spellchecker. Soon we will wonder why some people don't use it to sanity-check blog posts or any other writing.

    And let's be honest, some writing would greatly benefit from a sanity check.

whalesalad 13 hours ago

So satisfying to do a huge upgrade like this and then see the actual proof in the pudding with all the reduced latencies and query times.

  • hu3 12 hours ago

    Yeah some numbers caught my attention like ~94% reduction in overall database lock time.

    And to think they never have to worry about VACUUM. Ahh the peace.

    • InsideOutSanta 12 hours ago

      As somebody who has always used MySQL, but always been told that I should be using Postgres, I'd love to understand what the issues with VACUUM are, and what I should be aware of when potentially switching databases?

      • mjr00 10 hours ago

        Worth reading up on Postgres' MVCC model for concurrency.[0]

        Short version is that VACUUM is needed to clean up dead tuples and reclaim disk space. For most cases with smaller amounts of data, auto-vacuum works totally fine. But I've had issues with tables with 100m+ rows that are frequently updated where auto-vacuum falls behind and stops working completely. These necessitated a full data dump + restore (because we didn't want to double our storage capacity to do a full vacuum). We fixed this by sharding the table and tweaking auto-vacuum to run more frequently, but this isn't stuff you have to worry about in MySQL.

        Honestly if you're a small shop without database/postgres experts and MySQL performance is adequate for you, I wouldn't switch. Newer versions of MySQL have fixed the egregious issues, like silent data truncation on INSERT by default, and it's easier to maintain, in my experience.

        [0] https://www.postgresql.org/docs/current/mvcc-intro.html
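
        A toy way to see the "falls behind" failure mode (rates are invented; this models the dynamic, not Postgres internals): if updates generate dead tuples faster than a cost-limited autovacuum can reclaim them, bloat grows without bound instead of reaching a steady state.

```python
# Toy model of autovacuum falling behind on a hot table. Both rates are
# made-up numbers for illustration, not measurements.
update_rate = 500_000   # dead tuples created per hour by updates (hypothetical)
vacuum_rate = 300_000   # dead tuples a cost-limited autovacuum reclaims per hour

dead = 0
history = []
for hour in range(48):
    dead += update_rate
    dead -= min(dead, vacuum_rate)
    history.append(dead)

# Bloat grows by 200k dead tuples every hour; there is no steady state.
print(f"dead tuples after 48h: {history[-1]:,}")
```

        Raising the reclaim rate above the update rate (more workers, higher cost limits, per-table autovacuum settings) is what turns this into a bounded curve.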

        • williamdclt 9 hours ago

          As much as I have gripes with the autovac, I’m surprised at the idea of getting to such a broken state. 100M rows is not small but not huge, how frequent is “frequent updates”? How long ago was that (there’s been a lot of changes in autovac since v9)?

          “Stops working completely” should not be a thing, it could be vacuuming slower than the update frequency (although that’d be surprising) but I don’t know of any reason it’d just stop?

          That being said, I've also had issues with autovac (on Aurora, to be fair; couldn't say if it was Aurora-specific), like it running constantly without vacuuming anything, as if there were an old transaction idling (there wasn't)

          • sgarland 9 hours ago

            On decently-sized tables (100,000,000 is, as you say, not small but not huge), if you haven’t tuned cost limiting and/or various parameters for controlling autovacuum workers, it’s entirely possible for it to effectively do nothing, especially if you’re in the cloud with backing disks that have limited IOPS / throughput.

            It continues to baffle me why AWS picks some truly terrible defaults for parameter groups. I understand most of them come from the RDBMS defaults, but AWS has the luxury of knowing precisely how many CPUs and RAM any given instance has. On any decently-sized instance, it should allocate far more memory for maintenance_work_mem, for example.

          • mjr00 9 hours ago

            It's been a while, but IIRC it was on pg12. "Stopped working completely" I'm basing on the vacuum statistics saying the last auto-vacuum started weeks ago for these tables and never actually finished. Frequent updates means regularly rewriting 10 million rows (at various places) throughout the table. I also should mention that there were 100+ materialized views built off this table which I'm sure had an impact.

            In any case, this got resolved but caused a huge operational headache, and isn't something that would have been a problem with MySQL. I feel like that's the main reason VACUUM gets hated on; all of the problems with it are solvable, but you only find those problems by running into them, and when you run into them on your production database it ends up somewhere between "pain in the ass" and "total nightmare" to resolve.

        • InsideOutSanta 10 hours ago

          Thanks for that, that's valuable information.

      • djbusby 12 hours ago

        VACUUM and VACUUM FULL (and/or with ANALYZE) can lock tables for a very long time, especially when the table is large. The incantation may also require 2x the space of the table being operated on. In short: it's slow.

        • sgarland 11 hours ago

          `VACUUM` (with or without `ANALYZE`) on its own neither locks tables nor requires additional disk space. This is what the autovacuum daemon is doing. `VACUUM FULL` does both, as it's doing a tuple-by-tuple rewrite of the entire table.

        • gomoboo 11 hours ago

          pg_repack gets rid of the need to lock tables for the duration of the vacuum: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appen...

          It is an extension though, so the downside is that it's not included in most Postgres installs. I’ve used it at work and it felt like a superpower getting the benefits of a vacuum full without all the usual drama.

          • take-five 10 hours ago

            pg_repack can generate a lot of WAL, which can generate so much traffic that standby servers can fall behind too much and never recover.

            We've been using https://github.com/dataegret/pgcompacttable to clean up bloat without impacting stability/performance as much as pg_repack does.

        • williamdclt 9 hours ago

          Only FULL takes a serious lock (normal vacuum only takes a weak lock preventing things like other vacuums or table alterations iirc).

          Aside: I wish Postgres forced to make explicit the lock taken. Make me write “TAKE LOCK ACCESS EXCLUSIVE VACUUM FULL my_table”, and fail if the lock I take is too weak. Implicit locks are such a massive footgun that have caused countless incidents across the world, it’s just bad design.

          • luhn 8 hours ago

            `TAKE LOCK ACCESS EXCLUSIVE VACUUM FULL` is just an incantation that will be blindly copy-pasted. I don't see how it would stop anyone from shooting themselves in the foot.

            • immibis 7 hours ago

              Imagine two footguns. One shoots your foot off when you open the window. The second requires you to point a gun at your foot and pull the trigger before the window unlocks. Far fewer people will suffer accidental foot injuries from the latter.

        • cooljacob204 9 hours ago

          This is sorta mitigated by partitioning or sharding though right?

          Too bad it's sorta annoying to do on plain old pg.

      • tomnipotent 10 hours ago

        MySQL stores table data in a b+ tree where updates modify the data directly in place as transactions are committed, and overwritten data is moved to a secondary undo log to support consistent reads. MySQL secondary indexes store primary keys and queries rely on tree traversal to find the row in the b+ tree, but they can also contain references to rows in the undo log.

        PostgreSQL tables are known as heaps, which consist of slotted pages where new data is written to the first page with sufficient free space. Since the heap is not ordered by key, a primary key alone can't locate a row without a table scan, so Postgres uses the physical location of the row, called a tuple ID (TID, or item pointer), which contains the page and position (slot) of the row within that page. So the TID (10, 3) tells Postgres the row is in block 10, slot 3, which can be fetched directly from the page buffer or disk without having to do a tree traversal.

        When PostgreSQL updates a row, it doesn’t modify the original data directly. Instead, it:

          1) Writes a new version of the row to a new page
          2) Marks the old row as outdated by updating its tuple header and relevant page metadata
          3) Updates the visibility map to indicate that the page contains outdated rows
          4) Adjusts indexes to point to the new TID of the updated row
        
        This means that indexes need to be updated even if the column value didn't change.

        Old rows continue to accumulate in the heap until the VACUUM process permanently deletes them, but this process can impact normal operations and cause issues.

        Overall this means Postgres does more disk I/O for the same work as MySQL. The upside is Postgres doesn't have to worry about page splits, so things like bulk inserts can be much more efficient.
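
        The update path above can be sketched as a toy model (this mirrors the idea only; it is not Postgres' actual page format or visibility machinery):

```python
# Toy model of Postgres-style MVCC updates: a new row version is appended,
# the old version is marked dead, the "index" is repointed to the new
# location, and dead versions accumulate until a VACUUM-like pass runs.
class Heap:
    def __init__(self):
        self.tuples = []   # [key, value, live]; the list index plays the role of a TID
        self.index = {}    # key -> TID of the live version

    def insert(self, key, value):
        self.index[key] = len(self.tuples)
        self.tuples.append([key, value, True])

    def update(self, key, value):
        self.tuples[self.index[key]][2] = False  # old version becomes a dead tuple
        self.insert(key, value)                  # new version written elsewhere; index repointed

    def vacuum(self):
        """Reclaim dead versions (real VACUUM marks the space reusable in place)."""
        dead = sum(not t[2] for t in self.tuples)
        self.tuples = [t for t in self.tuples if t[2]]
        self.index = {t[0]: i for i, t in enumerate(self.tuples)}
        return dead

h = Heap()
h.insert("a", 0)
for v in range(5):
    h.update("a", v)      # five updates leave five dead versions behind
print(len(h.tuples), h.vacuum(), len(h.tuples))  # 6 5 1
```

        The point of the sketch is the accumulation: every update leaves a dead version behind, and the index entry has to be repointed even when the indexed value didn't change.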

        • sgarland 8 hours ago

          > The upside is Postgres doesn't have to worry about page splits, so things like bulk inserts can be much more efficient.

          Not in the heap, but if you have any index on the table (I know, don’t do that for bulk loads, but many don’t / it isn’t feasible sometimes) then you’re still dealing with a B+tree (probably).

          Also, MySQL still gets the nod for pure bulk load speed via MySQLShell’s Parallel Import Utility [0]. You can of course replicate this in Postgres by manually splitting the input file and running multiple \COPY commands, but having a tool do it all in one is lovely.

          [0]: https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-uti...

          • tomnipotent 8 hours ago

            > then you’re still dealing with a B+tree

            Absolutely, though they're generally orders of magnitude smaller than the table file unless you're INCLUDE'ing lots of columns.

            There's pg_bulkload which supports parallel writers as well as deferred index updates until the loading process is complete. Not sure how it compares to what MySQL offers out of the box, but I definitely agree that the MySQL tooling ecosystem in general has a leg up.

        • InsideOutSanta 9 hours ago

          That's a perfect explanation, thank you very much!

    • anonzzzies 12 hours ago

      Yeah, until vacuum is gone, i'm not touching postgres. So many bad experiences with our use cases over the decades. I guess most people don't have our uses, but i'm thinking Uber does.

      • RedShift1 12 hours ago

        Maybe just vacuum much more aggressively? Also there have been a lot of changes to the vacuuming and auto vacuuming process these last few years, you can pretty much forget about it.

        • anonzzzies 11 hours ago

          Not in our experience; for our cases it is still a resource hog. We discussed it less than a year ago with core devs and with a large Postgres consultancy; they said Postgres doesn't fit our use case, which was already our conclusion, no matter how much we want it to. MySQL is smooth as butter. I have nothing to win from picking MySQL, just that it works; I'd rather use Postgres (features / not Oracle) but...

          Edit: also, as can be seen in the responses here and elsewhere on the web when discussing this, the fans say it's no problem, but many less religious users feel it's a massive design flaw (perfectly logical at the time, not so logical now) that sometimes stops users from adopting it, which is a shame

          • yeswecatan 11 hours ago

            What is your use case?

            • anonzzzies 10 hours ago

              We have 100,000s of tables per database, and 1000s of databases (think sensor/IoT data with some magic sauce that 0 of our competitors offer) that are heavy on changes. And yes, maybe it's the wrong tool for the job (is it, though, if it works without hiccups?), but migrating would be severe, so we would only attempt it if we were 100% sure it would work and the end result would be cheaper; remember, we are talking decades here, not a startup. MySQL has been taking this without any issues for decades with us (including the rapid growth of the past decade), while far smaller setups with Postgres have been really painful, and all because of vacuum. We were on Postgres in 1999 when we ran many millions of records through it, but that was when we could do a full vacuum at night without anyone noticing. The internet grew a little bit, so that's not possible anymore. Vacuum improved too, like everyone says here, and I'm not spreading the gospel or whatever; it's just that fans (...what other word is there) blindly stating it can handle loads "now" that they never considered is, well, weird.

              • dhoe 9 hours ago

                I'd generally call this amount of tables an antipattern - doing this basically implies that there's information stored in the table names that should be in rows instead, like IDs etc. -- But I'll admit that sensor related use cases have a tendency to stress the system in unusual ways, which may have forced this design.

                • anonzzzies 6 hours ago

                  Especially back when we started. Now we would've done it differently, but I still think Postgres wouldn't really work. Guess we will never know, as even far smaller data sets do not work in the way we need them to.

      • leishman 11 hours ago

        Postgres 17 tremendously improves vacuum performance

        • mannyv 11 hours ago

          Vacuuming is a design decision that may have been valid back in the day, but is really a ball and chain today.

          In a low-resource environment deferring work makes sense. But even in low-resource environment the vacuum process would consume huge amounts of resources to do its job, especially given any kind of scale. And the longer it's deferred the longer the process will take. And if you actually are in a low-resource environment it'll be a challenge to have enough disk space to complete the vacuum (I'm looking at you, sunos4) - and don't even talk about downtime.

          I don't understand how large pgsql users handle vacuuming in production. Maybe they just don't do it and let the disk usage grow unbounded, because disk space is cheap compared to the aggravation of vacuuming?

          • wongarsu 10 hours ago

            You run VACUUM often enough that you never need a VACUUM FULL. A normal VACUUM doesn't require any exclusive locks or a lot of disk space, so usually you can just run it in the background. Normally autovacuum does that for you, but at scale you transition to running it manually at low traffic times; or if you update rows a lot you throw more CPUs at the database server and run it frequently.

            Vacuuming indices is a bit more finicky with locks, but you can just periodically build a new index and drop the old one when it becomes an issue

            • sgarland 9 hours ago

              People not realizing you can tune autovacuum on a per-table basis is the big one. Autovacuum can get a lot done if you have enough workers and enough spare RAM to throw at them.

              For indices, as you mentioned, doing either a REINDEX CONCURRENTLY (requires >= PG12), or a INDEX CONCURRENTLY / DROP CONCURRENTLY (and a rename if you’d like) is the way to go.

              In general, there is a lot more manual maintenance needed to keep Postgres running well at scale compared to MySQL, which is why I’m forever upset that Postgres is touted as the default to people who haven’t the slightest clue nor the inclination to do DB maintenance. RDS doesn’t help you here, nor Aurora – maintenance is still on you.

              • anonzzzies 6 hours ago

                We make good money 'saving' people from Aurora; you can throw traffic at it and pay more. We often migrate companies who then end up with a fraction of the price.

                • sgarland 3 hours ago

                  I’m convinced that Aurora’s team consists mostly of sales. There are certainly some talented engineers working on it – I’ve talked to a few – but by and large, all of my interactions with AWS about DB stuff have been them telling me how much better it is than the other options.

                  I’ve tested Aurora Postgres and MySQL against both RDS and native (on my own, extremely old hardware), and Aurora has never won in performance. I’ve been told that “it’s better in high concurrency,” but IMO, that’s what connection poolers are for.

    • brightball 11 hours ago

      There are always tradeoffs.

    • tomnipotent 10 hours ago

      MySQL indexes can contain references to rows in the undo log, and MySQL has a periodic VACUUM-like process to remove those references, though it's nowhere near as impactful.

m4r1k 12 hours ago

Uber's collaboration with Percona is pretty neat. The fact that they've scaled their operations without relying on Oracle's support is a testament to the expertise and vision of their SRE and SWE teams. Respect!

  • tiffanyh 11 hours ago

    Aren't they using Percona in lieu of Oracle?

    So it's kind of the same difference, no?

sandGorgon 11 hours ago

So how does an architecture like "2100 clusters" work? Do the write APIs go to whichever database contains the relevant data?

How is this done? A user would have history, payments, etc. — are all of them colocated in one cluster (which would mean the sharding is based on user ID)?

Is there then a database router service that routes each DB query to the correct database?

  • ericbarrett 9 hours ago

    A query for a given item goes to a router*, as you said, that directs it to a given shard which holds the data. I don't know Uber's schema, but usually the data is "denormalized" and you are not doing too many JOINs etc. Probably a caching layer in front as well.

    If you think this sounds more like a job for a K/V store than a relational database, well, you'd be right; this is why e.g. Facebook moved to MyRocks. But MySQL/InnoDB does a decent job and gives you features like write guarantees, transactions, and solid replication, with low write latency and no RAFT or similar nondeterministic/geographically limited protocols.

    * You can also structure your data so that the shard is encoded in the lookup key so the "routing" is handled locally. Depends on your setup
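
    As a sketch of the routing idea (the shard count, naming scheme, and hash choice here are all hypothetical, not Uber's actual setup):

```python
# Minimal sketch of key-based shard routing: hash the sharding key to a
# shard index, then look up which cluster owns that shard. Everything here
# (shard count, endpoint names) is invented for illustration.
import hashlib

SHARDS = 64  # number of clusters in this toy example

def shard_for(user_id: str) -> int:
    """Stable hash of the sharding key (e.g. user id) -> shard index."""
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % SHARDS

# The "router service" is then just a lookup from shard index to endpoint:
endpoints = {i: f"mysql-cluster-{i:03d}" for i in range(SHARDS)}

def route(user_id: str) -> str:
    return endpoints[shard_for(user_id)]

print(route("user-12345"))  # the same user id always lands on the same cluster
```

    Colocating a user's history, payments, etc. then just means sharding all of those tables by the same key. Plain modulo hashing makes resharding painful, which is one reason real routers tend to add an indirection layer (a range-to-shard map or consistent hashing) instead.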

  • bob1029 11 hours ago

    I imagine it works just like any multi-tenant SaaS product wherein you have a database per customer (region/city) with a unified web portal. The primary difference being that this is B2C and the ratio of customers per database is much greater than 1.

candiddevmike 13 hours ago

Does Uber still use Docstore? I'd imagine having built an effectively custom DB on top of MySQL made this upgrade somewhat inconsequential for most apps.

donatj 11 hours ago

Interestingly we just went through basically the same upgrade just a couple days ago for similar reasons. We run Amazon Aurora MySQL and Amazon is finally forcing us to upgrade to 8.0.

We ended up spinning up a secondary fleet and bin log replicating from our 5.7 master to the to-be 8.0 master until everything made the switch over.

I was frankly surprised it worked, but it did. It went really smoothly.
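
For reference, the replica side of that kind of binlog-based cutover boils down to pointing the new 8.0 node at the 5.7 primary's binlog coordinates. A rough sketch (hostnames and coordinates are made up; MySQL 8.0.22+ renames these commands to CHANGE REPLICATION SOURCE TO / START REPLICA):

```python
def replication_setup_sql(source_host, repl_user, log_file, log_pos):
    # Classic (pre-8.0.22) replication syntax, run on the new 8.0 node
    # after restoring a snapshot taken at these binlog coordinates.
    return [
        f"CHANGE MASTER TO MASTER_HOST='{source_host}', "
        f"MASTER_USER='{repl_user}', MASTER_LOG_FILE='{log_file}', "
        f"MASTER_LOG_POS={log_pos};",
        "START SLAVE;",
    ]


stmts = replication_setup_sql("db57-primary.internal", "repl",
                              "mysql-bin.000123", 4567)
```

Once the 8.0 fleet catches up, you stop writes briefly, confirm replication lag is zero, and promote it, which is presumably the "switch over" described above.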

  • takeda 10 hours ago

    AFAIK the 8.0 release is one where Oracle breaks compatibility. So anyone considering MariaDB needs to switch before going to 8.0, otherwise switching will be much more painful.

tiffanyh 13 hours ago

Why upgrade to v8.0 (old LTS) and not v8.4 (current LTS)?

Especially given that end-of-support is only 18 months from now (April 2026) … when end-of-support of v5.7 is what drove them to upgrade in the first place.

https://en.m.wikipedia.org/wiki/MySQL

  • hu3 13 hours ago

    The upgrade initiative started somewhere in 2023 according to the article.

    MySQL 8.4 was released on April 30, 2024.

    Their criteria for a "battle tested" MySQL version is probably much more rigorous than the average CRUD shop.

    • paulryanrogers 12 hours ago

      Considering several versions of 8.0 had a crashing bug if you renamed a table, waiting is probably the right choice.

      • blindriver 11 hours ago

        You’re not renaming tables when you’re at scale.

        • abhorrence 10 hours ago

          Sure you do! It's how online schema changes tend to be done, e.g. https://docs.percona.com/percona-toolkit/pt-online-schema-ch... describes doing an atomic rename as the last step.
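
          The rename-based flow that tool describes can be illustrated with a toy copy-and-swap (SQLite stands in for MySQL here; pt-online-schema-change additionally installs triggers to keep the copy in sync during the backfill, and in MySQL the final swap is a single atomic RENAME TABLE a TO b, c TO a):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 9.99), (2, 19.99)")

# 1. Create a shadow table with the new schema (added column).
conn.execute("CREATE TABLE _orders_new ("
             "id INTEGER PRIMARY KEY, total REAL, "
             "currency TEXT DEFAULT 'USD')")
# 2. Backfill it from the original (pt-osc does this in chunks,
#    with triggers applying concurrent writes -- omitted here).
conn.execute("INSERT INTO _orders_new (id, total) "
             "SELECT id, total FROM orders")
# 3. Swap names. In MySQL this is one atomic statement:
#    RENAME TABLE orders TO _orders_old, _orders_new TO orders;
conn.execute("ALTER TABLE orders RENAME TO _orders_old")
conn.execute("ALTER TABLE _orders_new RENAME TO orders")
conn.execute("DROP TABLE _orders_old")

rows = conn.execute(
    "SELECT id, total, currency FROM orders ORDER BY id").fetchall()
```

          So the table name the application sees never changes, but under the hood a rename happens on every online schema change, which is why a rename crash bug matters at scale.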

          • yen223 8 hours ago

            You aren't renaming tables at scale because there are 27 downstream services that will break if you even think about fixing the name of the revnue_dolars table, and it's not in anyone's OKR to fix it

            • paulryanrogers 4 hours ago

              Take a closer look at how some min downtime tools work. They often use a drop-swap strategy to replace an old table name with a new one that has schema changes.

              There are sometimes temporary views to keep the old and new code working during gradual transitions.

            • sroussey 6 hours ago

              I’m not sure if this is a joke (ref to OKR was funny, for example), or just naive and not understanding the parent comment. I found it funny either way though.

  • johannes1234321 13 hours ago

    Since a direct upgrade to 8.4 isn't supported, they have to go to 8.0 first.

    Also: 8.0 is old and most issues have been found. 8.4 probably has more unknowns.

  • pizza234 12 hours ago

    I suppose they opted for a conservative upgrade policy, as v8.4 probably includes all the functional additions/changes of the previous v8.1+ versions, and moving to it would have been a very big step.

    MySQL is very unstable software - hopefully this is now in the past - and it's very reasonable to go for the smallest upgrade steps possible.

    • hu3 8 hours ago

      > MySQL is very unstable software

      I've worked on 20+ projects using MySQL in my consulting career. Not once was stability a concern. Banking clients would even routinely shut down random MySQL nodes in production to ensure things continued running smoothly.

      As I'm sure users like Uber and Youtube would agree. And these too: https://mysql.com/customers

      Unless you know something we don't and we're just lucky.

  • PedroBatista 12 hours ago

    MySQL 8 and beyond has been riddled with bugs and performance regressions. It was a huge rewrite from 5.7.

    8 has nice features and I think they evaluated it as stable enough to upgrade their whole fleet to it. I'm pretty sure from 8 to 8.4 the upgrades will be much simpler.

  • cbartholomew 12 hours ago

    They started in 2023. v8.0 was the current LTS when they started.

  • tallanvor 13 hours ago

    According to the article they started the project in 2023. Given that 8.4 was released in April 2024, that wasn't even an option when they started.

  • EVa5I7bHFq9mnYK 12 hours ago

    With all the migration code already written and experience gained, I imagine upgrading 8->8.4 would take 1/10 of effort of 5.7->8.0.

  • gostsamo 13 hours ago

    > Several compelling factors drove our decision to transition from MySQL v5.7 to v8.0:

    Edit: for the downvoters, the parent comment was initially a question.

denysonique 11 hours ago

Why didn't they move to MariaDB instead? A drop-in replacement that's faster than MySQL 8.

  • evanelias 9 hours ago

    While it is indeed often faster, it isn't drop-in. MySQL and MariaDB have diverged over the years, and each has some interesting features that the other lacks.

    I wrote a summary of the DDL / table design differences between MySQL and MariaDB, and that topic alone is fairly long: https://www.skeema.io/blog/2023/05/10/mysql-vs-mariadb-schem...

    Another area with major differences is replication, especially when moving beyond basic async topologies.

    • aorth 7 hours ago

      Wow, I hadn't realized that MySQL and MariaDB diverged so much! In the last year I've started seeing some prominent applications like Apache Superset and Apache Airflow claiming they don't support—or even test on—MariaDB at all.

xyst 11 hours ago

I wonder if an upgrade like this would be less painful if the db layer was containerized?

The migration process they described would be less painful with k8s. Especially with 2100+ nodes/VMs

  • remon 10 hours ago

    Their entire setup seems somewhat suspect. I can't think of any technical justification for needing 21k instances for their type of business.

  • meesles 9 hours ago

    A pipe dream. Having recently interacted with a modern k8s operator for Postgres, it lacked support for many features that had been around for a long time. I'd be surprised if MySQL's operators are that much better. Also consider the data layer, which is going to need to be solved regardless. Of course at Uber's scale they could write their own, I guess.

    At that point, if you're reaching in and scripting your pods to do what you want, you lose a lot of the benefits of convention and reusability that k8s promotes.

    • jcgl 5 hours ago

      > it lacked support for many features that had been around for a long time

      Care to elaborate at all? Were they more like missing edge cases or absent core functionality? Not to imply that missing edge cases aren’t important when it comes to DB ops.

  • __turbobrew__ 8 hours ago

    I can tell you that k8s starts to have issues once you get over 10k nodes in a single cluster. There has been some work in 1.31 to improve scalability but I would say past 5k nodes things no longer “just work”: https://kubernetes.io/blog/2024/08/15/consistent-read-from-c...

    The current bottleneck appears to be etcd, boltdb is just a crappy data store. I would really like to try replacing boltdb with something like sqlite or rocksdb as the data persistence layer in etcd but that is non-trivial.

    You also start seeing issues where certain k8s operators do not scale either, for example cilium cannot scale past 5k nodes currently. There are fundamental design issues where the cilium daemonset memory usage scales with the number of pods/endpoints in the cluster. In large clusters the cilium daemonset can be using multiple gigabytes of ram on every node in your cluster. https://docs.cilium.io/en/stable/operations/performance/scal...

    Anyways, the TL;DR is that at this scale (16k nodes) it is hard to run k8s.

  • zemo 10 hours ago

    Upgrading clients and testing the application logic, changes to the queries themselves as written, the process of detecting the regression and getting MySQL patched by Percona, changes to default collation ... all of these things have nothing to do with whether the instances are in containers, or whether the containers are managed by k8s or not.

  • shakiXBT 10 hours ago

    running databases (or any stateful application, really) on k8s is a mess, especially at that scale

edf13 13 hours ago

3 million queries/second across 16k nodes seems pretty heavy on redundancy?

  • sgarland 11 hours ago

    I was going to say, that's absolutely nothing. They state 2.1K clusters and 16K nodes; if you divide those, assuming even distribution, you get 7.6 instances/cluster. Round down because they probably rounded up for the article, so 1 primary and 6 replicas per cluster. That's still only ~1400 QPS / cluster, which isn't much at all.

    I'd be interested to hear if my assumptions were wrong, or if their schema and/or queries make this more intense than it seems.
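
    The back-of-envelope math, using the article's fleet-wide figures (and the same even-distribution assumption):

```python
# Figures from the article: 16K nodes, 2.1K clusters, 3M QPS fleet-wide.
nodes, clusters, fleet_qps = 16_000, 2_100, 3_000_000

instances_per_cluster = nodes / clusters  # ~7.6 nodes per cluster
qps_per_cluster = fleet_qps / clusters    # ~1430 QPS per cluster
qps_per_node = fleet_qps / nodes          # ~188 QPS per node
```

    As the replies note, the averages are only meaningful if load is uniform across clusters, which it almost certainly is not.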

    • pgwhalen 10 hours ago

      > assuming even distribution

      I don't work for Uber, but this is almost certainly the assumption that is wrong. I doubt there is just a single workload duplicated 2.1K times. Additionally, different regions likely have different load.

  • withinboredom 13 hours ago

    That's 200 qps per node, assuming perfect load balancing.

  • 620gelato 11 hours ago

    2100 clusters, 16k nodes, and data is replicated across every node "within a cluster" with nodes placed in different data centers/regions.

    That doesn't sound unreasonable, on average. But I suspect the distribution is likely pretty uneven.

jauntywundrkind 12 hours ago

Having spent a couple months doing a corporate mandated password rotation on our services - a number of which weren't really designed for password rotation - happy to see the dual password thing mentioned.

Being able to load in a new password while the current one is active is where it's at! Trying to coordinate a big bang where everyone flips over at the same time is misery, and I spent a bunch of time updating services to not have to do that! Great enhancement.

I wonder what other datastores have dual (or more) password capabilities?
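
In MySQL 8.0's case the feature is ALTER USER ... IDENTIFIED BY '...' RETAIN CURRENT PASSWORD, followed later by ALTER USER ... DISCARD OLD PASSWORD. A toy model of the behavior (this class is illustrative, not any real driver API):

```python
class DualPasswordUser:
    """Toy model of MySQL 8.0 dual passwords: after a rotation with
    RETAIN CURRENT PASSWORD, both the new and the retained password
    authenticate, until the old one is explicitly discarded."""

    def __init__(self, password):
        self.primary = password
        self.secondary = None  # the retained (old) password, if any

    def rotate(self, new_password):
        # Corresponds to ALTER USER ... RETAIN CURRENT PASSWORD.
        self.secondary = self.primary
        self.primary = new_password

    def discard_old(self):
        # Corresponds to ALTER USER ... DISCARD OLD PASSWORD.
        self.secondary = None

    def authenticate(self, attempt):
        return attempt is not None and attempt in (self.primary,
                                                   self.secondary)
```

This is what removes the big-bang coordination problem: services pick up the new credential on their own schedule, and you discard the old one only after the last client has moved.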

  • johannes1234321 10 hours ago

    I can't give an overview of which systems have such a feature, but "every" system supports a different way of doing this: rotating usernames as well. Create a new user with the new password.

    This isn't 100% equivalent, as ownership (and thus DEFINER permissions) in stored procedures etc. needs some thought, but bad access using an outdated username is simpler to trace, since usernames can be logged, unlike passwords. (MySQL also allows tracing via performance_schema logging, including user-defined connection attributes, which may ease finding the "bad" application.)

John23832 12 hours ago

Anyone else get a "Not Acceptable" response?

  • internetter 10 hours ago

    I did but it worked on a private tab

gregoriol 13 hours ago

Wait until they find out they have to upgrade to 8.4 now

  • gregoriol 13 hours ago

    And also all the passwords away from mysql_native_password

    • johannes1234321 10 hours ago

      Which one should do anyway. mysql_native_password has been considered broken for about ten years. (Broken for people who can access the hashed form of the password on the server.)

    • sgarland 11 hours ago

      They've got until 9.0 for that, it just gives deprecation warnings in 8.4.

      • gregoriol 8 hours ago

        Deprecation warnings are in 8.0. It's disabled in 8.4.

        If you are up to date with all your libraries it should all go well, but if some project is stuck on old code, mostly old MySQL client libraries, you might get surprises when making the switch.
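
        A quick way to find accounts that would break before flipping the switch is to check the plugin column in mysql.user; a sketch (the helper and its naming are mine, but the query is standard MySQL):

```python
# Accounts still on the deprecated auth plugin; run this against the
# server before upgrading past 8.0. mysql.user's `plugin` column is
# where MySQL records each account's authentication plugin.
AUDIT_QUERY = (
    "SELECT user, host FROM mysql.user "
    "WHERE plugin = 'mysql_native_password';"
)


def needs_migration(rows):
    # rows: (user, host) result tuples from the audit query
    return [f"{user}@{host}" for user, host in rows]
```

        Anything that turns up here needs its client library checked for caching_sha2_password support before the account is migrated.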

dweekly 11 hours ago

Am I the only one who saw "delve" at the top of the article and immediately thought "ah, an AI generated piece"? Well, that and the over-structured components of the analysis with nearly uniform word count per point and high-complexity but low signal-to-noise vocabulary using phraseology not common to the domain being discussed. (The article doesn't scan as written by an SRE/DBA.)

  • devbas 11 hours ago

    The introduction seems to have AI sprinkled all over it: "...we embarked on a significant journey", "...in this monumental upgrade".

rafram 12 hours ago

Did they have ChatGPT (re)write this? The writing style is very easy to identify, and it’s grating.

  • OsrsNeedsf2P 12 hours ago

    > The writing style is very easy to identify,

    Really? At n=1 the rate seems to be 0

paradite 12 hours ago

I can tell from a mile away that this is written by ChatGPT / Claude, at least partially.

"This distinction played a crucial role in our upgrade planning and execution strategy."

"Navigating Challenges in the MySQL Upgrade Journey"

"Finally, minimizing manual intervention during the upgrade process was crucial."

  • traceroute66 12 hours ago

    > I can tell from a mile away that this is written by ChatGPT / Claude, at least partially.

    Whilst it may smell of ChatGPT/Claude, I think the answer is actually simpler.

    Look at the authors of the blog, search LinkedIn. They are all based in India, mostly Bangalore.

    It is therefore more likely to be Indian English.

    To be absolutely clear, for absolute avoidance of doubt:

    This is NOT intended as a racist comment. Indians clearly speak English fluently. But the style and flow of English is different. Just like it is for US English, Australian English or any other English. I am not remotely saying one English is better than another!

    If, like me, you have spent many hours on the phone to Bangalore call-centres, you will recognise many of the stylistic patterns present in the blog text.

    • calmoo 12 hours ago

      There's nothing that sticks out to me as obviously Indian English in this blog post. It's almost certainly entirely run through an LLM though.

      • antisthenes 11 hours ago

        If there are large amounts of Indian English in an LLM's training data, it stands to reason the LLM output will be very similar to Indian English, no?

        • calmoo 6 hours ago

          No, I don't think so. There's nothing Indian English about the blog post. It's just overly verbose, fluffy language.

    • 620gelato 11 hours ago

      (Speaking as an Indian engineer)

      Hate to generalize, but this has less to do with "Indian style" and more to do with adding a lot of fluff to make a problem appear more complex than it is, OR maybe someone set a template saying you must write such-and-such sections, despite there not being relevant content. [ Half the sections from this article could be cut without losing anything ]

      In this case, the _former_ really shouldn't have been the case. I for one would love to read a whole lot more about rollback planning, traffic shifting, which query patterns saw most improvements, hardware cost optimizations, if any, etc.

    • excitive 11 hours ago

      Can you elaborate on the last part? What are some stylistic patterns that are different when something is written by a US author v/s Indian?

      • albert_e 11 hours ago

        I recently saw a tweet where someone pointed out that "today morning" was an Indian phrase.

        I had to really think hard why it is incorrect / not common elsewhere. Had to see comments to learn -- someone explained that a native English speaker would instead say "this morning" and not "today morning".

        As an Indian ESL speaker -- "today morning" sounded (and still sounds) perfectly fine to me -- since my brain grew up with Indian languages where this literal phrase (the equivalent of "TODAY morning") is not only very common, but also the normal/correct way to convey the idea, and if we instead try to say "THIS morning" it would feel pretty contrived.

      • traceroute66 8 hours ago

        > What are some stylistic patterns that are different when something is written by a US author v/s Indian?

        Largely as @brk above you already mentioned, tendency to use formal and obscure words alongside a specific tone. I'll also re-iterate what @brk said, hard to fully describe, more of a "you know it when you see it".

        If I had to pick some specific examples from the blog post, the following phrase is a good example:

        We systematically advanced through each tier, commencing from tier 5 and descending to tier 0.

        There are 101 ways you could write that in US English, but I reckon 99% of the US population would be unlikely to pick the above unless they were writing an academic paper or something.

        This one is also quite Indian English in many respects:

        Our automated alerts and monitoring system actively oversees the process to ensure a seamless transition and promptly alerts of any issues that may arise.

        Similarly, we have stylistic elements such as the over-breaking of paragraphs to the extent it becomes a series of statements. For example:

        Upgrading to MySQL 8.0 brought not only new features, but also some unexpected tweaks in query execution plans for certain clusters. This resulted in increased latencies and resource consumption, potentially impacting user experience. This happened for the cluster which powers all the dashboards running at Uber. To address this issue, we collaborated with Percona, identified a patch fix, and successfully implemented it for the affected clusters. The resolution ensured the restoration of optimized query performance and resource efficiency in alignment with the upgraded MySQL version.

        A relatively short paragraph, but five sentences. Your average US English writer would likely word it differently, trimming it down to two or three sentences.

        As I said in my original post though, none of it is bad English, its just a different style.

      • hodgesrm 11 hours ago

        Not exactly a stylistic difference, but there are real differences in the dialects. Here's an example from many moons ago: "Even I think that's a bad idea." That was an Indian colleague. It took me weeks to figure out that he was using "even" in place of "also."

        In a like vein when Australians say "goodeye" they usually aren't talking about your vision.

        • ssl-3 11 hours ago

          Perhaps.

          Or perhaps it was meant to specify that they, themselves, might have been presumed to be an outlier who would think it was a good idea, but who has in fact come to think that is a bad idea.

          Examples of this kind of counter-presumptive use of the word "even":

          1: On animals and the weather: "It was so cold that even polar bears were suffering from frostbite and frozen digits."

          2: On politics, where one's general stance is well known and one might rationally be presumed to be a supporter of a particular thing: "Even I think that this issue is a total non-starter."

          Even if they may have meant something else, that doesn't mean they didn't intend for the words to be taken non-literally.

          • hodgesrm 2 hours ago

            In this case it was indeed "also." I've heard it used that way many times.

            Another common phrase in Indian English is "do the needful," which is a delightful formulation. Grammarly has a plausible description of how it arose. [0]

            [0] https://www.grammarly.com/blog/idioms/do-the-needful/

        • V-eHGsd_ 11 hours ago

          > In a like vein when Australians say "goodeye" they usually aren't talking about your vision.

          They aren’t saying goodeye, they’re saying g’day (good day)

    • brk 12 hours ago

      I agree (I've posted a similar comment in the past and collected a handful of downvotes). Much like ChatGPT, you tend to see a slight overuse of more formal and obscure words, and a tone that feels like the topic is being given just a touch too much focus or dedication relative to the grand scheme of things. It is hard to fully describe, more of a "you know it when you see it".

  • brunocvcunha 12 hours ago

    I can tell just by the frequency of the word “delve”

  • godshatter 11 hours ago

    That sounds like regular old English to me. I could see myself saying all those things without thinking it's pushing any boundaries whatsoever. I'm starting to fear that LLMs are going to dumb down our language in the same way that people feared that calculators would remove our ability to calculate mentally.

  • notinmykernel 12 hours ago

    Agree. Repetition (e.g., crucial) in ChatGPT is an issue.

  • aster0id 11 hours ago

    Because the authors are likely non-native English speakers. I'm one myself, and it is hard to write for a primarily native-English-speaking audience without linguistic artifacts that give you away or, worse, get you ridiculed.

  • rand_r 11 hours ago

    I know what you mean, and you’re probably right, but there’s a deeper problem, which is the overuse of adjectives and overall wordiness. It’s quite jarring because it reads like someone trying to impress rather than get an important message across.

    Frankly, ChatGPT could have written this better with a simple “improve the style of this text” directive.

    Example from the start:

    > MySQL v8.0 offered a compelling proposition with its promise of substantial performance enhancements.

    That could have just been “MySQL v8.0 promised substantial performance improvements.”

  • mannyv 8 hours ago

    Once ChatGPT puts in "we did the needful" we're all doomed.

    • greenchair 7 hours ago

      Dear sir, we are having a P1 incident, Prashant please revert.

  • kaeruct 12 hours ago

    ChatGPT says "While it's plausible that a human might write this content, the consistent tone, structure, and emphasis on fluency suggest it was either fully or partially generated by an LLM."

    • gurchik 12 hours ago

      How would ChatGPT know?

indulona 12 hours ago

[flagged]

  • kavunr 12 hours ago
    • BarryMilo 12 hours ago

      I get the urge to standardize infrastructure but... wow. After reading the whole thing, I get why they thought it was worth doing, but you'd think they'd just hire a Postgres guy or two. Especially when they looked at the missing features, this feels like a downgrade...

      • kccqzy 12 hours ago

        In big companies they never just "hire a guy or two"; the bus factor would be atrocious in this case, not to mention vacations and such. The minimum unit of hiring is one team, about five people give or take. So the difference they are looking at is either hiring a team of Postgres people or forcing everyone on one existing team to learn Postgres deeply. From this perspective, standardizing on infrastructure makes more sense now.

    • mardifoufs 11 hours ago

      Uber also famously migrated from postgres to MySQL, so they have used both!

    • indulona 12 hours ago

      everyone is praising postgres and shitting on mysql, until performance matters. then, when all these big tech companies turn to mysql, the postgres fanboys cry in unison.

      • sgarland 11 hours ago

        Tbf, either can be made to be fast, or slow. MySQL used to have a huge advantage for range queries due to its clustering index (assuming you have designed a schema to exploit that capability), but as time has gone on that gap has narrowed [0] (despite the title, it's not just about inserts). My own benchmarks for strictly read-only queries have also borne that out.

        The biggest difference, IMO, is if you _aren't_ aware of the clustering index, and so design your schema in a way that is suboptimal. For example, given a table with orders or something similar, with a PK of `(user_id, created_at)`, a query like `SELECT... FROM... WHERE user_id = ? ORDER BY created_at DESC LIMIT 1` can be quite fast in MySQL. With a monotonic integer as the PK, and a secondary index on those two columns, it can still be quite fast, depending on insert order. If however you invert the PK to `(created_at, user_id)`, while MySQL 8+ will still be able to use the index, it's not nearly as efficient – in my tests, I saw query speed go from 0.1 msec --> 10 msec.

        In contrast, while there is a small difference in Postgres with all of those differences, it's just not that big of a difference, since it stores tuples in a heap. Again, in my tests, query speed went from 0.1 msec --> 0.7 msec. This is a point query, of course; with range queries (`WHERE created_at < ...`) Postgres suffered more, jumping up to ~25 - 50 msec.

        [0]: http://smalldatum.blogspot.com/2024/09/mysql-and-postgres-vs...
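
        The PK-ordering point can be poked at with SQLite's WITHOUT ROWID tables, which, like InnoDB, cluster rows on the primary key (timings won't transfer to MySQL, and the schema here is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    user_id INTEGER,
    created_at INTEGER,
    total REAL,
    PRIMARY KEY (user_id, created_at)
) WITHOUT ROWID""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(u, t, 1.0) for u in range(100) for t in range(50)])

# With (user_id, created_at) leading the clustered key, one user's
# rows are physically contiguous, so "latest order for user 7" is a
# short backward scan of that range rather than a sort over matches.
latest = conn.execute(
    "SELECT user_id, created_at FROM orders "
    "WHERE user_id = 7 ORDER BY created_at DESC LIMIT 1").fetchone()
```

        Flip the PK to (created_at, user_id) and user 7's rows are scattered across the whole key space, which is the degradation described above.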

      • lenerdenator 12 hours ago

        There'd be less of that if there weren't a feeling that MySQL is the high-grade free hit that Oracle uses to get you hooked on their low-grade crap.

        • mardifoufs 11 hours ago

          It's been 15 years since Oracle acquired MySQL. I agree that you should always be prudent with Oracle, but that's a long time...

        • indulona 10 hours ago

          use mariadb instead of mysql.

jeffbee 10 hours ago

File under "things you will never need to do if you use cloud services".

  • martinsnow 10 hours ago

    Nah. Random outages because the RDS instance you were on decided to faceplant itself, or the weird memory-to-bandwidth scaling AWS has chosen, will make you pull your hair out on a high-traffic day.

    It's just different problems.

    • jeffbee 10 hours ago

      The company in the article is doing < 200qps per node. Unless they are returning a feature-length video file from every query, they are nowhere near any hardware resource limits.

  • mannyv 8 hours ago

    That's not true. The RDS 5.7 instances are EOL so you have to upgrade them at some point.

    At least in RDS, that will be a one-way upgrade ie: no rollback will be possible. That said, you can upgrade one instance at a time in your cluster for a no-downtime rollout.

    • jeffbee 8 hours ago

      Hosted MySQL is not what I meant. That just means you're paying more to have all the same problems. The kind of cloud service I am alluding to is Cloud Spanner, Cloud Bigtable, DynamoDB.

  • paxys 10 hours ago

    At Uber's scale they are a cloud service.

greenie_beans 12 hours ago

yall should prioritize your focus so you can do better at vetting drivers who don't almost kill me

  • JamesSwift 11 hours ago

    I'm not sure how effective the database engineers are going to be at solving this, but I guess we can ask them to try...

  • lenerdenator 12 hours ago

    Their focus is prioritized according to what returns maximum value to their shareholders.

    • greenie_beans 11 hours ago

      beep boop i'm a capitalist robot

      pretty sure safe travels is critical to maximum value to their shareholders (aka stfu or tell me how this blog post has anything to do with maximize shareholder value https://www.uber.com/en-JO/blog/upgrading-ubers-mysql-fleet/... ... shareholder value is a dumb ass thing to prioritize over human life)

      • Kennnan 11 hours ago

        Honest question, how do you (or amyone) propose to vet drivers? They require drivers license and car insurance registration, anything like a CDL would make being a driver prohibitively expensive. Their rating system already works as a good signal the few times Ive used uber.

        • greenie_beans 11 hours ago

          i don't know, i don't work there. i'm just somebody who almost died because one of their drivers was a terrible driver. that sounds like a problem they should figure out. dude didn't even know how to change a tire, so start with "basic knowledge of car maintenance." and a basic ability to speak english would be a good bar to meet, too. they'll let anybody with a driver's license, car, and a heart beat drive on that app. there should be a higher barrier of entry. but idk, i don't work there. this is just my experience as consumer.

          also, the US should be wayyyyy stricter on who we issue drivers license to. so many terrible drivers on the road driving these death machines.

          • robertlagrant 10 hours ago

            If you have one big company with 10 bad drivers, you'll get a much worse impression of it than 100 companies each with one bad driver.

            • greenie_beans 10 hours ago

              and your point is?

              this just makes no sense bc the drivers are on all of the different apps. rework your formula.

          • croisillon 11 hours ago

            mandatory retest every 5 years

  • photochemsyn 11 hours ago

    It's undeniable that the worst drivers on the road are those working for ride-hailing services like Uber. It's a big point in Waymo's favor that their automated vehicles behave predictably - Uber drivers are typically Crazy Ivan types doing random u-turns, staring at their electronic devices while driving, blocking pedestrian walkways and bike lanes, etc.

menaerus 12 hours ago

Lately there's been a shitload of sponsored $$$ anti-MySQL articles, so it's kinda entertaining that their authors are being slapped in the face by Uber, completely unintentionally.