Lessons from Plain Text

99 points by kugurerdem 9 months ago

bruce511 9 months ago

The biggest problem with this "not the true text" issue is when coders encounter unicode.

A lot of coders, especially those who have worked primarily in English-speaking countries, see ASCII as UTF-8, and the difference is invisible to them. They can go decades oblivious to topics like encodings, mappings, and display.

So it can surprise them when they deal with European characters for the first time. They view the text in one place (like an editor, which treats the file as UTF-8) and in another (their program, which treats the text as ASCII), and the two don't match.

It's hard to explain to them that "when I look at it" isn't a universal truth, it also matters how the "look at it program" chooses to interpret, and display, it.
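A small Python sketch of that point, assuming the file was saved as UTF-8. The same bytes decode to the intended text, to mojibake, or to an error, depending on which encoding the reading program assumes. The sample string is made up for illustration.

```python
# The same bytes, read by three programs that make different assumptions.
text = "Caf\u00e9 M\u00fcller"  # 'Café Müller'
raw = text.encode("utf-8")      # what a UTF-8 editor writes to disk

# For pure-ASCII text both encodings produce identical bytes,
# which is why the difference can stay invisible for decades.
same_for_ascii = "hello".encode("ascii") == "hello".encode("utf-8")

as_utf8 = raw.decode("utf-8")      # correct: 'Café Müller'
as_latin1 = raw.decode("latin-1")  # mojibake: 'CafÃ© MÃ¼ller'

try:
    raw.decode("ascii")            # strict ASCII rejects any byte >= 0x80
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(same_for_ascii, as_utf8, as_latin1, ascii_ok)
```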

elric 9 months ago

The same is true for all aspects of I18N and L10N, from keyboard layouts to date formats. I've seen tools that assume a US QWERTY layout and use hard-coded shortcuts that are impossible to type on other layouts.
Something about assumptions and asses.
- 7bit 9 months ago
  
  I hate Microsoft for that. They created a truly great date and time formatting thing in Windows where I can set a system encoding, a system language, a user language, a date/time "language" and configure individual things like the thousands-separator (. or ,)
  Only to have obvious junior devs throw all of that out the Windows, and now most things are dictated by the user's region. Yes, I live in Germany, and the region is important for region-locked shit on the app store. That does not fucking mean I want my OS to talk to me in German.
  And on M365, I want German date formatting (DD.MM.YYYY), but when M365 is in English, you cannot select that date formatting, because someone thought that Americans would never need that.
  Fuck all of these ignorant bastards!
  
  tmm84 9 months ago
  
  As a user in an Asian region, I have all my machines set to that region. However, from time to time, dates that need to be in US format are automatically converted to match my region. I go back, edit them, save, and they're reformatted again. It isn't a consistent enough problem for me to keep a list of programs that do it.
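Here is the date problem in concrete form. This short Python sketch uses an invented date string. It shows the same text parsing to two different days, depending on whether the program assumes month-first or day-first. ISO 8601 (`YYYY-MM-DD`) sidesteps the guess.

```python
from datetime import datetime

s = "03/04/2024"  # ambiguous: 4 March or 3 April?

us = datetime.strptime(s, "%m/%d/%Y").date()    # month first
euro = datetime.strptime(s, "%d/%m/%Y").date()  # day first

print(us.isoformat())    # 2024-03-04
print(euro.isoformat())  # 2024-04-03

# An explicit DD.MM.YYYY display format, chosen by the program
# rather than inferred from the user's region.
print(us.strftime("%d.%m.%Y"))  # 04.03.2024

# ISO 8601 round-trips without any locale assumption.
assert datetime.strptime(us.isoformat(), "%Y-%m-%d").date() == us
```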
  
  elric 9 months ago
  
  Agreed. I'm in Belgium. I get really annoyed when applications or websites randomly decide to talk French at me. Or Dutch. Or even bloody German. My browser is set to Accept-Language: en. I wish people would realise that country is not a trustworthy indicator for language.
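One way to honour the header instead of guessing from the country is sketched below in Python. It is simplified and not a complete implementation of the HTTP `Accept-Language` rules: it ignores the `*` wildcard and only does prefix matching. `pick_language` is a hypothetical helper name.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse e.g. 'fr-BE,fr;q=0.9,en;q=0.5' into (language, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        lang, _, params = part.strip().partition(";")
        if not lang.strip():
            continue
        q = 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((lang.strip().lower(), q))
    # sorted() is stable (also with reverse=True), so equal weights keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def pick_language(header: str, available: list[str], default: str) -> str:
    """Hypothetical helper: first available tag matching the user's ranked ranges."""
    for lang, q in parse_accept_language(header):
        if q <= 0:
            continue
        for candidate in available:
            tag = candidate.lower()
            # Prefix match: the range 'en' accepts 'en' and 'en-gb'.
            if tag == lang or tag.startswith(lang + "-"):
                return candidate
    return default


print(pick_language("en", ["nl", "fr", "de", "en"], default="nl"))  # en, wherever the user is
```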
rustybolt 9 months ago

In some cases pretending everything is ASCII is the sane thing to do. With Unicode, sorting and case conversion are nigh impossible to do correctly. While there are algorithms for grouping code points into (extended) grapheme clusters, there is still a lot of freedom, so while there are wrong ways to do it, there is no canonical right way.
- poincaredisk 9 months ago
  
  >With Unicode, sorting and case conversion are neigh impossible to do correctly
  Surely you mean that sorting correctly is impossible without Unicode? Otherwise you would have to hardcode the rules of sorting strings correctly in my language (and all other languages) yourself.
  Unless your preferred solution is "close my eyes and pretend non-ASCII characters don't exist", in which case... I'm not a fan.
  
  samatman 9 months ago
  
  Sorting is impossible to do correctly without knowledge of the language in which the text is written, because the collation rules for symbols differ between languages. Unicode, of course, defines those collation rules, and UTF-8 sorts lexicographically using the same naïve byte comparison which works for ASCII.
  Case conversion is similar except the default rules do a very good job in general. But still, there are a few language-specific quirks and, again, you do have to know what language is involved to get those right.
  I'm agreeing with you, to be clear, just adding that a) Unicode isn't always enough, but it does a decent job if you don't know the language in advance, and b) it defines the correct rules if you do know the language.
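Here is a quick Python illustration of both points, using an invented word list. Python compares strings by code point. UTF-8 was designed to preserve code point order, so sorting the raw UTF-8 bytes gives the same result. Neither order is what a German or French reader would call alphabetical. The case examples use Python's built-in Unicode case mappings, which are not language-specific.

```python
words = ["zebra", "Z\u00fcrich", "apple", "\u00c4pfel", "\u00e9clair"]  # Zürich, Äpfel, éclair

by_code_point = sorted(words)
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
print(by_code_point)
# ['Zürich', 'apple', 'zebra', 'Äpfel', 'éclair']: uppercase before lowercase,
# accented letters after 'z'. Language-aware collation needs locale data.

# The default case mappings are good but not language-aware:
print("stra\u00dfe".upper())   # 'STRASSE': one character becomes two
print(len("\u0130".lower()))   # 2: 'İ' lowercases to 'i' + COMBINING DOT ABOVE
print("I".lower())             # 'i', although Turkish expects dotless 'ı'

# casefold() is meant for caseless comparison; lower() is not.
print("Stra\u00dfe".casefold() == "STRASSE".casefold())  # True
```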
  
  sgarland 9 months ago
  
  > UTF-8 sorts lexicographically using the same naïve byte comparison which works for ASCII
  This isn’t necessarily true beyond ASCII, and it depends entirely on the collation [0]. One need only spend some time peering into the abyss that is RDBMS collation support [1] [2] to see the horror.
  [0]: http://www.unicode.org/reports/tr10/
  [1]: https://dev.mysql.com/doc/refman/8.4/en/charset-unicode-sets...
  [2]: https://www.postgresql.org/docs/current/collation.html
  
  samatman 9 months ago
  
  Well, no. It sorts according to the lexicographical order of Unicode code points: earlier code points before later ones.
  How useful this is depends on the language, of course. I did say that as well. But Unicode was put together from legacy character encodings, and did what it could to preserve the order of those, so it's far from useless.
- teddyh 9 months ago
  
  “Sorting is hard in other languages, so I would like to force everybody to only use the characters from my language, no characters from any other language. This will make it easy for me.”
  
  Etherlord87 9 months ago
  
  English is the international language, and Latin characters have a special 'canonical' status in computer science, so you're heavily strawmanning here...
  
  DrillShopper 9 months ago
  
  Laziness, Impatience, and Hubris

thomassmith65 9 months ago

  In the end, what truly matters is whether the codebase is consistent—either using tabs or spaces throughout

I use tabs for code indentation, but spaces for non-code indentation (eg: for ascii diagrams within comments).

Anyone who has converted a lot of code, from different projects, from spaces to tabs will have noticed: the vast majority of code with spaces contains a few screwups where a line or two in a 4-spaced file actually contains 3 spaces.

Why that happens, despite editors automatically converting tabs to spaces, is beyond me, but it is a ubiquitous phenomenon. I suspect this is the real reason some people, certainly myself, prefer tabs.
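Below is a rough Python sketch for finding the stray 3-space lines described above, assuming a 4-space file. It only inspects leading spaces. It will therefore also flag continuation lines that are deliberately aligned to an opening bracket, so treat the output as candidates rather than errors.

```python
def misaligned_lines(source: str, width: int = 4) -> list[int]:
    """1-based numbers of lines whose leading spaces are not a multiple of `width`."""
    flagged = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        rest = line.lstrip(" ")
        if not rest.strip():
            continue  # skip blank and whitespace-only lines
        if (len(line) - len(rest)) % width:
            flagged.append(lineno)
    return flagged


src = "def f():\n    x = 1\n   y = 2\n    return x + y\n"
print(misaligned_lines(src))  # [3]
```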

GuB-42 9 months ago

I like the "tabs for indentation, spaces for alignment" style, but it is not always easy for formatting tools as there is no simple way to convert spaces to tabs, the tool has to be aware of the syntax somehow, or you need to do some manual tweaks.
Screwups like missing or extra spaces can easily happen even with auto-indent; a common cause is splitting or merging a line, i.e. replacing a space with a newline or vice versa. That space character has a tendency to end up where you don't want it or, conversely, to get eaten. When using tabs, invisible space characters can end up between tabs.
In the end, on collaborative projects, I usually settle on 4 space indentation, as it is the most common and from my experience, the least likely for people to screw up.
crazygringo 9 months ago

> a few screwups where a line or two in a 4-spaced file actually contains 3 spaces. Why this happens... is beyond me
In my experience it's usually from copy-paste, when the cursor wasn't at the right position when pasting. The cursor isn't at the right position because you deleted some spaces to reduce the indentation level before pasting, but didn't delete the right number. While Tab inserts the right number of spaces, Delete still removes them one at a time.
Also occasionally due to a find-replace that accidentally included a leading space, which can be hard to see when the find/replace boxes are in a proportional font.
norir 9 months ago

I solve this problem at the language level. Instead of using an external formatter, indentation is enforced by the compiler and requires tabs. Significant indentation is used for multiline functions rather than block delimiters. On a given line, spaces may be used for alignment purposes but only after the first non whitespace token. This is no harder to parse than arbitrary whitespace between tokens and guarantees a uniform format for any valid program in the language.
I know not everyone will agree with me, but I think defining whitespace in a language as essentially [ \t\n] between any token is a language design mistake.
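The next sketch illustrates the kind of check norir describes. It is not their actual compiler, which isn't named here. It is a lexer-level rule in Python: leading whitespace on a non-blank line may contain only tabs, while spaces are allowed anywhere after the first non-whitespace character. The function name, the toy syntax, and the error format are all hypothetical.

```python
def check_indentation(source: str) -> list[str]:
    """Hypothetical rule: indentation is tabs only; spaces only after the first token."""
    errors = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        body = line.lstrip(" \t")
        if not body:
            continue  # blank or whitespace-only lines carry no indentation
        leading = line[: len(line) - len(body)]
        if leading.strip("\t"):  # a space somewhere in the indentation
            errors.append(f"line {lineno}: indentation must use tabs only")
    return errors


good = "f x =\n\ty := x\n\tg y   -- spaces after the first token are fine\n"
bad = "f x =\n  y := x\n\t y\n"
print(check_indentation(good))  # []
print(check_indentation(bad))   # errors for lines 2 and 3
```

Because the indentation is tabs only, the nesting depth of a line is simply the number of leading tabs, which is what makes significant indentation cheap to parse.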
edflsafoiewq 9 months ago

Code with tabs inevitably ends up with X spaces where a tab should be, which goes unnoticed until viewed with a different tab width.
- samatman 9 months ago
  
  Bingo. And the only solution to this problem, consistent use of a formatter, solves the equivalent problem with spaced indentation just as well.
  I would prefer to have the spacing version of this problem, personally, because that way I can always see that there's a problem, and can do so without resorting to changing tab widths or making invisible characters visible.
  
  thomassmith65 9 months ago
  
  You also make a good point. I usually code in Sublime, where I have invisible characters hidden in unselected text. At this point, I am used to doing 'select all' to check whitespace, but I can understand how that could annoy.
- thomassmith65 9 months ago
  
  That's a valid point, though I think it's evitable (if that's a word) because many editors have visible white space, and it's easier for the eye to catch 4 dots after a long tab than to notice that an indentation is 7 dots instead of 8, etc. It also attracts more attention when using arrow keys to navigate the offending lines.
- fire_lake 9 months ago
  
  Could easily be fixed with CI
  
  poincaredisk 9 months ago
  
  Just like incorrect indentation using spaces.
  I personally run an autoformatter in all CI pipelines and fail on any change it would make. This entirely kills the whole issue of wrong indentation, dangling spaces, accidental tabs, inconsistent formatting, etc.
ericyd 9 months ago

On my team this happens due to individual laziness and I don't think tabs would solve that problem.
everybodyknows 9 months ago

> ascii diagrams within comments
As an aside, what tools do you use to produce the diagrams?
- thomassmith65 9 months ago
  
  I don't use any tools for that. I did try an online ascii-drawing tool one afternoon, but the results weren't compact enough to be useful for programming comments.
  Contrary to the way I worded my comment, my 'diagrams' are typically no more than text with perhaps a box-drawing unicode character here or there. But even drawing a simple tree, tabs can mess up important details.

bediger4000 9 months ago

Greppability is an interesting idea, and a good one, but I'm going to disagree with the recommendation

> Stop hard-wrapping and just use soft-wrapping,

Grep for some pattern in soft-wrapped text and you get a lot of extraneous material.

You also can't grep for things "at the beginning of the line", which is often an important indicator. When I did a lot of plain C programming, I would put function names at the start of a line, below their return type to make it easy to grep for a function definition, rather than just uses.
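The convention works because, with the return type on the line above, a definition is the only place where the name starts in column 0. So `grep -n '^parse_header(' *.c` finds the definition and skips the call sites. Here is the same anchored search in Python, using `re.MULTILINE` so that `^` matches at every line start. The C snippet and function name are made up for the example.

```python
import re

C_SOURCE = """\
static int
parse_header(const char *buf, size_t len)
{
    return parse_body(buf, len);
}

int
main(void)
{
    return parse_header("", 0);
}
"""


def definition_lines(source: str, name: str) -> list[int]:
    """1-based line numbers where `name(` starts a line, like grep -n '^name('."""
    pattern = re.compile(rf"^{re.escape(name)}\(", re.MULTILINE)
    return [source.count("\n", 0, m.start()) + 1 for m in pattern.finditer(source)]


print(definition_lines(C_SOURCE, "parse_header"))  # [2]: the call in main() is indented
```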

Soft-wrapping also limits diffability, a complement to greppability. You might correct a single letter in a misspelled word in a soft-wrapped paragraph. Do a "git diff" or equivalent and you'll get back a huge block of "changed" text. Useless. Short, hard-wrapped lines make it easy to see diffs.
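A Python sketch of that comparison, using `difflib` as a stand-in for a line-based `git diff`. The sentence and the 40-column width are arbitrary. The same one-letter fix produces two paragraph-length changed lines when soft-wrapped, but two short ones when hard-wrapped.

```python
import difflib
import textwrap


def changed_lines(old: str, new: str) -> list[str]:
    """Lines a line-based diff reports as removed ('-') or added ('+')."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]


before = ("Plain text is easy to grep, easy to diff, and easy to read on any "
          "devise, which is why so many tools still use it.")
after = before.replace("devise", "device")  # one-letter typo fix

# Soft-wrapped: the paragraph is one line, so the whole thing shows as changed.
soft = changed_lines(before, after)

# Hard-wrapped at 40 columns: only the short line containing the typo changes.
hard = changed_lines(textwrap.fill(before, 40), textwrap.fill(after, 40))

print([len(line) for line in soft])  # two long lines
print(hard)                          # two short lines
```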

kugurerdem 9 months ago

Can you detail a bit more what you mean by extraneous material? Is it something like "you now also need tools that can do soft-wrapping"? Even if that's the case, I think it is easier to wrap a text than to unwrap it (programmatically). So, if you need hard-wrapping, you can just do it.
Wrapping is as simple as: `fold -s -w 80 input.txt`
In my experience, unwrapping usually turns out to be harder. [1]
> You also can't grep for things "at the beginning of the line", which is often an important indicator. When I did a lot of plain C programming, I would put function names at the start of a line, below their return type to make it easy to grep for a function definition, rather than just uses.
I see what you mean. But I don’t think your approach conflicts with my recommendation for soft-wrapping. You can still soft-wrap regular text files while choosing to separate certain lines of code for clarity. What you’re doing might not even be considered "hard-wrapping" in the typical sense—it's not like you're breaking a 240-character line into multiple lines. You're simply formatting the definition in a way that suits your style, and it's perfectly ok!
For the last one, you can simply use `git diff --word-diff`. Also, platforms like GitHub already highlight word-based diffs, so it usually is very easy to spot the changes.
[1]. https://news.ycombinator.com/item?id=39227848
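The asymmetry described above can be shown in a short Python sketch. Wrapping is one library call per paragraph, while unwrapping needs a rule for which line breaks are meaningful. The sketch uses `textwrap.fill`, which is similar in spirit to `fold -s` but not byte-for-byte identical. The unwrap is naive: it assumes blank lines separate paragraphs, and the last line shows where that assumption breaks.

```python
import textwrap


def wrap(text: str, width: int = 40) -> str:
    """Hard-wrap each blank-line-separated paragraph independently."""
    return "\n\n".join(textwrap.fill(p, width) for p in text.split("\n\n"))


def unwrap(text: str) -> str:
    """Naive unwrap: join the lines of each blank-line-separated paragraph."""
    return "\n\n".join(
        " ".join(line.strip() for line in p.splitlines()) for p in text.split("\n\n")
    )


soft = ("Wrapping is easy: every paragraph is a single logical line that the "
        "tool can break wherever it likes.\n\nShort one.")
hard = wrap(soft)
print(hard)
print(unwrap(hard) == soft)  # True for plain paragraphs

# The naive rule breaks on text whose line breaks were meaningful:
print(unwrap("Shopping:\n- eggs\n- milk"))  # 'Shopping: - eggs - milk'
```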
- bediger4000 9 months ago
  
  Extraneous material in my example would be potentially the rest of a large paragraph if I change one word. The example would be the diffs of a "Word" doc: the whole paragraph shows as a diff. Folding after the "git diff" would just mean visually picking the diff out of a lot of text.
  I do a lot of Go programming these days, and there's a conventional format for code that ends up with a lot of hard wrapped lines, so my C example is just that, an example.
  Maybe Markdown would be a better example. When I edit markdown, I move around phrases, clauses and sentences. It's certainly possible to do this with a gigantic soft wrapped chunk of text, but it's much easier with one clause or even phrase per hard wrapped (at 74 characters or less) line. Grepability and diffability and even running text through sed or awk are easier. You're not relying on text coloring. Editing with vim is easier, it has commands to move the cursor to next word, previous paragraph etc etc.
  This is one of those things like tabs or spaces and byte order marks. We're unlikely to convince each other.
  
  MrJohz 9 months ago
  
  That doesn't sound like soft or hard wrapping, though, that sounds like semantic wrapping, which is a separate concept entirely. With semantic wrapping you put each sentence (or similar) on a new line, which helps with diffing. But if that sentence runs over e.g. 80 characters, you still need to decide whether you're going to hard wrap or soft wrap that sentence. And in the inverse direction, if you don't do semantic wrapping, you'll have similar issues with diffs regardless of whether you use hard wraps or soft wraps.
  So I think that's a good argument for doing semantic wrapping of code and text (I guess semantic wrapping for code is just not writing everything in one long line separated by semicolons), but once you've put in semantic line breaks, you still need to decide how to handle text that spans multiple lines.
  
  a1369209993 9 months ago
  
  > But if that sentence runs over e.g. 80 characters,
  > you still need to decide
  > whether you're going to hard wrap or soft wrap that sentence.
  No I don't. Semantic wrapping all the way.
  
  MrJohz 9 months ago
  
  > This is a sentence that includes the word "Lopadotemachoselachogaleokranioleipsanodrimhypotrimmatosilphiokarabomelitokatakechymenokichlepikossyphophattoperisteralektryonoptekephalliokigklopeleiolagoiosiraiobaphetraganopterygon" in it.
  > How should it be wrapped semantically?
  This is a pathological case to demonstrate that semantic wrapping does not by itself solve the "hard vs soft" wrapping question. If the answer is that the word should remain a single unbroken word, then you are using soft wraps (or no wraps at all). If the answer is that the word should be split into 80-character chunks, then you're using hard wraps.
  
  a1369209993 9 months ago
  
  > How should it be wrapped semantically?
  I have no idea what the semantics of that word are, which is information required to properly semantically wrap it. (Inherently, since conveying such semantics is one of the major points of semantic wrapping.)
  However, you included embedded control characters (C2 AD aka 'SOFT HYPHEN'; below replaced with '-') that encode less semantic information than is necessary for proper semantic wrapping, but not none:
  Lopado-temacho-selacho-galeo-kranio-leipsano-drim-hypo-trimmato-silphio-karabo-melito-katakechy-meno-kichl-epi-kossypho-phatto-perister-alektryon-opte-kephallio-kigklo-peleio-lagoio-siraio-baphe-tragano-pterygon.
  Web browsers use that information to do poor-quality semantic wrapping automatically - actual hard or soft[0] wrapping would produce something like:
    Lopadotemachoselachogaleokranioleipsanod-
    rimhypotrimmatosilphiokarabomelitokatake-
    chymenokichlepikossyphophattoperisterale-
    ktryonoptekephalliokigklopeleiolagoiosir-
    aiobaphetraganopterygon.
  Which looks like the following from a partly-semantically-aware perspective:
  Lopado-temacho-selacho-galeo-kranio-leipsano-d[BREAK]rim-hypo-trimmato-silphio-karabo-melito-katake[BREAK]chy-meno-kichl-epi-kossypho-phatto-perister-ale[BREAK]ktryon-opte-kephallio-kigklo-peleio-lagoio-sir[BREAK]aio-baphe-tragano-pterygon.
  The fact that you included soft hyphens rather concedes the point that hard and soft[0] wrapping is incorrect[1].
  0: Or rather, non-semantic, which is what we're actually arguing over. Technically, semantic wrapping is a subset of hard wrapping, but it's a specific subset that isn't what is expressed by just saying "hard wrapping". Kind of like how birds aren't what anyone means when they just say "dinosaurs".
  1: Granted, to be fair, a lot of the time we just don't care. But (contra your original comment) we never need to resort to non-semantic wrapping; we just sometimes (often) decide to be lazy because it doesn't matter.
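Some readers may not know U+00AD SOFT HYPHEN. It marks a permitted break point and stays invisible unless the line actually breaks there. Below is a toy greedy line breaker in Python that breaks only at soft hyphens. Browsers use more elaborate rules, and the function name and syllable split are hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # invisible unless a line actually breaks there


def break_at_soft_hyphens(word: str, width: int) -> list[str]:
    """Greedy toy breaker: split only at soft hyphens, showing '-' where it breaks."""
    lines, current = [], ""
    for piece in word.split(SOFT_HYPHEN):
        # Reserve one column for the visible hyphen that a break would add.
        if current and len(current) + len(piece) + 1 > width:
            lines.append(current + "-")
            current = piece
        else:
            current += piece
    lines.append(current)
    return lines


word = SOFT_HYPHEN.join(["super", "cali", "fragi", "listic", "expi", "ali", "docious"])
print(word.replace(SOFT_HYPHEN, ""))    # how it looks when nothing breaks
print(break_at_soft_hyphens(word, 12))  # ['supercali-', 'fragilistic-', 'expiali-', 'docious']
```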
  
  MrJohz 9 months ago
  
  I think this is a valid approach to semantic wrapping, but I don't think this is the only one, and specifically I think it has significant flaws: (1) We've lost grepability unless I write rather complex regexes to handle the possible places where hard line breaks may have been added. (2) We've lost diffability in the sense that if I correct a typo in the word, that correction can cascade through the word and cause multiple lines to show up as changed in the diff when semantically only one part of one word has changed.
  Instead, I would prefer a soft semantic wrap: if a single semantic unit (be that a word, a clause, or whatever else) extends beyond, say, 80 characters, we keep it on the same line and let the editor/file viewer handle wrapping. This means that we maintain grepability over words and semantically-connected phrases, and we maintain diffability by avoiding the hard-wrap cascade. To me, this is a much more useful version of semantic wrapping, because it only wraps when there is a semantic clause, and not on any arbitrary semantic break.
  My goal here isn't to convince you that this version is better than your version of semantic wrapping, only that wrapping based on semantics is an orthogonal concept to hard and soft wrapping, and that even if we choose to take a semantic wrapping approach, we still need to decide what to do with particularly long lines.
  (Although I will add to this: I had a colleague who was a deep fan of semantic wrapping, and I just never really got it. I used it for a couple of years, but I've never run into issues with simply soft-wrapping everything. When inserting new clauses or changing text in the middle of a line, every diff tool that I've used has been able to accurately identify which portion of a given paragraph has changed and highlight it. Meanwhile, as a writer and reader, I need to put more effort into reading prose that is written in an odd, stylised format that is very different from the intended paragraph structure. I can see the argument that I've accepted semantic line breaks in code or configuration files, so I should be able to handle it in markdown, but I just find it harder to read and more irritating to write. But assuming someone does want to use semantic line breaks, I still believe that that's an orthogonal choice to deciding between hard and soft wrapping.)
  
  a1369209993 9 months ago
  
  > Instead, I would prefer a soft semantic wrap
  So would I, but...
  > if a single semantic unit (be that a word, a clause, or whatever else) extends beyond, say, 80 characters, we keep it on the same line and let the editor/file viewer handle wrapping.
  ...the editor can't do that because it doesn't understand the semantics.
  > that wrapping based on semantics is an orthogonal concept to hard and soft wrapping
  Yes, that's why I've been saying "hard and/or soft [but in either case nonsemantic] wrapping".
  > > > With semantic wrapping you put each sentence (or similar) on a new line [...] But if that sentence runs over e.g. 80 characters, [then...]
  ... you don't need to fall back on non-semantic wrapping; you can just keep breaking it up into smaller and smaller semantically meaningful pieces.
  (You have to do that 'hard'-ly because the editor doesn't understand the semantics, but that's not "decid[ing] whether you're going to hard wrap or soft wrap", it's being forced to hard wrap as an implementation detail because that's what results in correct wrapping.)
  It might not be worth the effort to do that, but you're never forced not to (given not-pathologically-short line length limits like 20 characters).
  
  MrJohz 9 months ago
  
  Hmm, I think we have different definitions of a semantic line wrap. To me, semantic line breaks means that line breaks are used to separate clauses and sentences, such that at least every sentence is on its own line, and every line break represents a semantic clause or sentence gap.
  To you, I get the impression that semantic wraps are about ensuring that every wrap/line break happens at a semantically valid place, where semantically valid could be a semantically valid clause, but also a semantically valid intra-word line break.
  In that sense, I can see how your strategy would produce the same effects as hard wrapping, albeit with different choices about where to put the wraps. But I think then, like I said, you end up running into the same difficulties that you do with conventional hard wrapping, at least in pathological cases.
  
  a1369209993 9 months ago
  
  > such that at least every sentence is on its own line
  Yes, with the obvious possible exception of trivial/degenerate cases like "i++; j--;" in C or "This is a cat. That is a dog." in English.
  > and every line break represents a semantic clause or sentence gap.
  Specifically, it represents a maximally coarse semantic gap, drilling as shallowly down into subclauses as possible/practical.
  > wrap/line break [can happen at ...] also a semantically valid intra-word line break.
  Preferably only if that word would already be alone on its overly-long line. Eg:
    # bad, breaks subordinate clause before superordinate
    That sounds supercalifragilistic-
    expialidocious.
    # semantically valid, but ugly (a pathological case)
    That sounds
    supercalifragilisticexpialidocious.
    # vertically larger, but probably fine
    # (unless you're feeling incunabulum-y[0])
    That sounds
    supercalifragilistic-
    expialidocious.
  > you end up running into the same difficulties that you do with conventional hard wrapping, at least in pathological cases.
  I've yet to see any evidence that really pathological cases exist. (As opposed to "I'm lazy and can't be arsed" cases, which I'm fairly explicitly not disputing.)
  0: http://code.jsoftware.com/wiki/Essays/Incunabulum
  
  a1369209993 9 months ago
  
  > given not-pathologically-short line length limits like 20 characters
  Poor phrasing; 20 characters was meant as an example of a limit that is pathologically short.
eviks 9 months ago

> Soft-wrapping also limits the use of diffability
There are better tools for that, ones that show a word-based diff instead of a huge block. There are no such tools that can convert your hard-coded line breaks back.
- Zamiel_Snawley 9 months ago
  
  Even with just regular git on the command line, there is an option for word-wise diffing, and hacky ways to get character-wise diffing.
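For comparison with `git diff --word-diff`, whose default (`plain`) mode marks removals as `[-...-]` and additions as `{+...+}`, here is a minimal word-level diff in Python using `difflib.SequenceMatcher`. It splits on any whitespace, so it reports the same change whether the paragraph is hard- or soft-wrapped. It is a sketch, not a reimplementation of git's algorithm.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Mark word-level changes with the same markers as git's plain word-diff mode."""
    a, b = old.split(), new.split()
    out = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
            continue
        if i2 > i1:  # words removed (or replaced)
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j2 > j1:  # words added (or their replacements)
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


old = "easy to read on any\ndevise, really"  # hard-wrapped
new = "easy to read on any device, really"   # soft-wrapped
print(word_diff(old, new))  # easy to read on any [-devise,-] {+device,+} really
```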
defanor 9 months ago

Neither does soft wrap play nicely with preformatted texts (e.g., ASCII graphics, including diagrams and ornaments such as titles centered with spaces, as well as formatted code).
And the support for soft-wrapping in tools varies: it may be completely unavailable, or just turned off by default, and generally unused in such a case.
I think reflowable text enters the area of markup languages, rather than plain text.
card_zero 9 months ago

Huh? Presumably "just use soft-wrapping" only applies to long lines. It isn't banning you from using line breaks before function definitions. The next section is about line breaks! Earlier on, it talks about lining things up with tabs! It can't be advising you to write entire programs on one line.
But anyway, don't you have a code editor with a sidebar with function names, that you can click on to go to the definitions? Sounds like choosing to navigate via grep is the nature of the problem with grepability. And other search tools that aren't regex based can search for multi-line text. This isn't about plain text, it's about Vim. It's like saying "this farmer's field should be constructed differently because it isn't skateable".
BlueTemplar 9 months ago

> You also can't grep for things "at the beginning of the line"
Why can't you, is there no way to make grep work with regular expressions??

wavemode 9 months ago

> Soft Wrapping vs Hard Wrapping

This is actually one of HTML's most underrated features - there is no distinction between hard and soft wrapping. Any whitespace, of any form and quantity, between any two words is just converted to a single space in the rendered output.

Thus the developer, in a code editor, is free to hard wrap and indent the text in whatever way makes the most visual sense. Meanwhile in the rendered output the actual wrapping that occurs (if any) is controlled by the stylesheet.

I wish more programming languages had multiline string syntax that could do this (automatically remove all newlines and indentation). It turns out to be quite useful in a variety of domains.
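Python has no such string syntax, but the HTML behaviour wavemode describes can be approximated by applying a helper to a triple-quoted string. This sketch collapses runs of the ASCII whitespace characters that HTML treats as collapsible (space, tab, LF, FF, CR) into one space. Non-breaking spaces are deliberately left alone, as in HTML. `collapse_whitespace` is a hypothetical name.

```python
import re

# HTML collapses runs of ASCII whitespace: space, tab, LF, FF, CR.
_HTML_SPACE = re.compile(r"[ \t\n\f\r]+")


def collapse_whitespace(text: str) -> str:
    """Join a hard-wrapped, indented literal into one line, roughly as HTML renders it."""
    return _HTML_SPACE.sub(" ", text).strip(" ")


MESSAGE = collapse_whitespace("""
    This long message can be hard-wrapped and indented
    however the source file likes, and it still comes out
    as a single line of text.
""")
print(MESSAGE)
```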

macintux 9 months ago

A few years ago I was auto-generating HTML to ingest into an older version of Confluence (pretending it was Markdown). Confluence behaved differently (correctly) when I inserted hard line breaks between elements. Took a while to figure that one out.
playingalong 9 months ago

Useful? Yes.
But then you need some way to provide the exact indentation/spacing in some cases. And the easiest is to provide them verbatim.
- SoftTalker 9 months ago
  
  HTML has the "pre" element that does this.
  
  jbaber 9 months ago
  
  pre does a lot more than respect newlines.
  
  arp242 9 months ago
  
  "white-space: pre-line" in CSS should make it preserve hard line breaks (it still collapses other whitespace and wraps long lines). There are a bunch of values; see e.g. MDN.
  Can be used on any element of course, not just <pre>.
  
  bastawhiz 9 months ago
  
  No?
  https://searchfox.org/mozilla-central/source/layout/style/re...
  The Firefox default style sets a fixed width font and sets a small margin. What's "a lot more"?

keybored 9 months ago

The niche for hard-wrapping is straightforward. Sending patches via email.

In these types of communities there is no formal markup. So what is code and what is text? You can’t tell. Some might use “code fence”. Some might use four-space indents. Some might just dump code in between prose. And when you comment on a patch you comment directly on the diff.[1]

You can’t just let the email reader go to town on the text. That’s fine for prose but annoying for code where every line break is either intentional or machine formatted.

The author mentions the downside of browsing on a mobile device. Yeah I sometimes do that. But the primary mode for this kind of browsing might just be on a laptop/desktop. Certainly if you plan on doing some coding. (not just browsing the email archive for discussions that happened eight years ago... not that I would ever do that)

[1] Maybe diffs are easy to parse out of a message since each line starts with `+`, `-` or a space. After you have peeled away the quoting.
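A Python sketch of the footnote's idea, under two stated assumptions. First, quoting uses `> ` (a greater-than sign and one space) per level. Second, a hunk starts at an `@@` header and ends at the first line that doesn't begin with `+`, `-` or a space. Reviewer prose that happens to start with `-` directly after a hunk would be misread. The function names and sample message are hypothetical.

```python
def unquote(line: str) -> str:
    """Strip leading '>' quote markers, each assumed to be followed by one space."""
    while line.startswith(">"):
        line = line[1:]
        if line.startswith(" "):
            line = line[1:]
    return line


def extract_hunks(message: str) -> list[str]:
    """Collect '@@' hunk headers and the +/-/space lines that follow them."""
    hunk_lines, in_hunk = [], False
    for raw in message.splitlines():
        line = unquote(raw)
        if line.startswith("@@"):
            in_hunk = True
            hunk_lines.append(line)
        elif in_hunk and line.startswith(("+", "-", " ")):
            hunk_lines.append(line)
        else:
            in_hunk = False  # prose (or an empty line) ends the hunk
    return hunk_lines


msg = ("On Mon, Alice wrote:\n"
       "> @@ -1,2 +1,2 @@\n"
       ">  def f():\n"
       "> -    return 1\n"
       "> +    return 2\n"
       "Why 2 and not 3?\n")
print(extract_hunks(msg))
```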

deafpolygon 9 months ago

> I don’t know whether this is just due to first-mover advantage or not but it also looks like more projects use spaces over tabs. So what’s the point of going against the tide where there does not seem to be a very powerful advantage anyway?

Sure, now. But there was a time, when I was a young man in college (circa 1997), when professors and the industry would push tabs as a standard. Shortly after, the tides changed and we were all using spaces.

> Stop hard-wrapping and just use soft-wrapping

Who cares about grep? I mean, aside from the OP and probably many on here. Wrapping is a task that should be left up to the viewing device/software. It can be made to be responsive, which hard-wrapping cannot be.

> newline

This really should be a solved issue by now, both for users and in software.
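On the newline point, here is a minimal Python normalization to LF. Order matters: `\r\n` has to be replaced before a lone `\r`, or every Windows line ending would turn into two newlines. Python's `str.splitlines()` and text-mode `open()` (with the default `newline=None`) already accept all three conventions.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF (Windows) and lone CR (classic Mac OS) line endings to LF."""
    # CRLF must be handled first; otherwise each CRLF would become two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


mixed = "unix\nwindows\r\nold mac\rend"
print(normalize_newlines(mixed).split("\n"))  # ['unix', 'windows', 'old mac', 'end']
print(mixed.splitlines())                     # same list: splitlines() accepts all three
```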

kugurerdem 9 months ago

> Shortly after, the tides changed and we were all using spaces.
Very interesting. Thanks for sharing this information! What do you think might have caused this though?
> Who cares about grep?
I do care. I find it much easier to work with a codebase that has logs and error messages that can be easily searched. Similarly, working on a blog with searchable text makes more sense to me. Before switching to soft-wrapping, I used hard-wrapping, and sometimes I would notice a typo or an issue in one of the essays. When I tried to quickly search for a nearby word, it wouldn’t find it because the text had been hard-wrapped. I think it also makes it far easier for outsiders to navigate a repo which they are not familiar with.
About the newline, I agree.
- SoftTalker 9 months ago
  
  > What do you think might have caused this though?
  The rise in web-browser-based code editors, where tab moves to the next field on the form instead of inserting a literal \t character.
- samatman 9 months ago
  
  grep -C 5 does the trick quite nicely I find. rg accepts the same flag.

jiehong 9 months ago

Those points really depend on context:

- if it’s code, you should be using an automatic code formatter and that’s it.
- if you write prose, sure, soft wrap.

If you grep, \s doesn’t care about space vs tabs.

Sadly, elastic tabstops never caught on as the default [0].

Maybe we would need something like a "semantic alignment" marker instead of using spaces for aligning things. Like beginning of function name, beginning of function argument, etc.

[0]: https://nick-gravgaard.com/elastic-tabstops/
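Here is a simplified Python sketch of the elastic tabstops idea from the linked page. Text between tabs is a cell. Each column is padded to the widest cell among the adjacent lines that share that column, so a blank line or a line with fewer tabs starts a new alignment block. It assumes a monospace font and measures width in characters. It is not the reference implementation.

```python
def elastic_tabstops(text: str, padding: int = 2) -> str:
    """Render tab-separated cells so each column block aligns to its widest cell."""
    rows = [line.split("\t") for line in text.split("\n")]
    # Only tab-terminated cells take part in alignment (every cell but the last).
    widths = [[0] * (len(cells) - 1) for cells in rows]
    for col in range(max((len(w) for w in widths), default=0)):
        r = 0
        while r < len(rows):
            if len(widths[r]) <= col:
                r += 1
                continue
            start = r
            # Extend the block over consecutive rows that also have this column.
            while r < len(rows) and len(widths[r]) > col:
                r += 1
            block_width = max(len(rows[i][col]) for i in range(start, r)) + padding
            for i in range(start, r):
                widths[i][col] = block_width
    return "\n".join(
        "".join(cell.ljust(w) for cell, w in zip(cells, ws)) + cells[-1]
        for cells, ws in zip(rows, widths)
    )


code = "int\tx;\t// a\nlong long\ty;\t// b\n\nchar\tc;"
print(elastic_tabstops(code))
```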

eviks 9 months ago

> you should be using an automatic code formatter
unless you want better alignment similar to the example of vertical alignment, which automatic code formatters simply don't allow for since they're much more simplistic.
> If you grep, \s doesn’t care about space vs tabs.
so now instead of regular easy-to-type space you have to do \s to search for a phrase?
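On eviks's point, the `\s` doesn't have to be typed by hand. A small helper can turn a plain phrase into a pattern in which every space matches any run of whitespace, including the newline a hard wrap introduced. This is a Python sketch, and `phrase_pattern` is a hypothetical name. It will still miss phrases whose wrapped lines carry prefixes such as `# ` or `> `.

```python
import re


def phrase_pattern(phrase: str) -> re.Pattern:
    """Compile a phrase so each space matches any whitespace run (spaces, tabs, newlines)."""
    return re.compile(r"\s+".join(re.escape(word) for word in phrase.split()))


essay = "Plain text survives\nevery tool\tyou throw at it."
print("survives every tool" in essay)                                   # False: a hard wrap intervenes
print(phrase_pattern("survives every tool").search(essay) is not None)  # True
```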

zokier 9 months ago

Biggest thing about plain text to me is that it's not really a thing. Markup languages generally are not considered plain text, I would not consider code either as plain text (it's literally called code). What does that leave? Prose and similar writings maybe. But do you include control codes in plain text? I don't think something like ASCII bell can be thought as plain text. The various whitespace characters are tricky question... they arguably are formatting codes, so if we think plain text as unformatted text, then things like newlines don't really fit the picture; instead we should have some semantic markers like "paragraph separator". On the other hand if we allow plain text to include formatted text, then it opens a whole can of worms on different ways to represent formatting.

poincaredisk 9 months ago

We probably understand that term differently. Markdown and code is "plain text" in my opinion (just like JSON and csv and ini etc). Basically everything that can be easily edited using a regular text editor is plaintext. Alternatively, everything that can be checked into and diffed using git. I think this is the same understanding the author has.
This rules out special characters and binary file formats, of course.
- zokier 9 months ago
  
  but code and prose are intrinsically very different, code has strictly defined structure and form in a way that free-form text or prose does not. code is generally tree-shaped, text is generally linear. and so on
  
  zelphirkalt 9 months ago
  
  A lot of code calls out to higher levels from lower levels. Does it still count as tree?
  Prose can also have strict structure.

Evidlo 9 months ago

In my typical coding style I always start a newline for function arguments so the tabs vs spaces argument has never mattered to me.

The other way leads to a bunch of random indentation levels all over the file which has always looked ugly to me.

nicbou 9 months ago

Another fun one: you can encode accented characters in two different ways: as a character plus its accent, or as a dedicated Unicode character. Software and operating systems can have their own standard for this.

I learned it the hard way with my static site generator and the pages of my website that use Umlauts. It introduced subtle problems where Syncthing would replace one standard with another, and nginx would suddenly 404 on URLs that looked fine to me.
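The two forms nicbou describes are Unicode's composed (NFC) and decomposed (NFD) normalization forms. Below is a short Python sketch using `unicodedata.normalize`. Normalizing file names and URL paths to one form before comparing them is one way to avoid that kind of 404. Whether that fits a particular Syncthing/nginx setup is an assumption, not something this sketch shows.

```python
import unicodedata

composed = "M\u00fcnchen"     # 'ü' as a single code point (NFC)
decomposed = "Mu\u0308nchen"  # 'u' followed by COMBINING DIAERESIS (NFD)

print(composed == decomposed)          # False, though both display as "München"
print(len(composed), len(decomposed))  # 7 8
print(composed.encode("utf-8") == decomposed.encode("utf-8"))  # False: different bytes, different URL path

# Normalizing both sides to the same form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```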

breck 9 months ago

I love everything about this post. Thank you. What a great way to start the day. (One nit: it would be cool to have a link to the source code for the post).

Here's my user test: https://news.pub/?try=https://www.youtube.com/embed/rx7nv6R5...

kugurerdem 9 months ago

Hi Breck, it's a nice coincidence to see an author I recently discovered through breckyunits.com, now reviewing one of my essays :)
You never know how paths might cross.
I think the information you shared about the tabs is worth mentioning. I'll reference your video and the tabs info you provided in the addendum.

eviks 9 months ago

> This is also partly the reason why I use spaces most of the time. If you still end up adjusting the tab width to match others’ preferences, what’s the purpose of using tabs in the first place?

Or, if you used elastic tabstops, the purpose of using tabs would be that alignment happens automatically on edits, instead of you having to adjust the number of spaces manually.

slmjkdbtl 9 months ago

Question about hard wrapping: if you have a piece of hard-wrapped text, how do you easily add text in the middle? Or do you just have to re-wrap all the following lines by hand?

bediger4000 9 months ago

This can be handled by any decent text editor. In Vim, visually select the problem area, then hit gq. Everything is rewrapped.
SoftTalker 9 months ago

One approach is to first commit the change with the added text. That diff will clearly show the change.
Then you rewrap and commit that separately, with a commit message like "rewrap" so the reader knows there is no material change.
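SoftTalker's two-commit approach can be sanity-checked with a small Python sketch, where the sentence and the 30-column width are arbitrary. Inserting a word without rewrapping changes one line. The follow-up rewrap touches several lines but leaves the sequence of words unchanged, which is why a separate "rewrap" commit is easy to review.

```python
import difflib
import textwrap


def changed_line_count(old: str, new: str) -> int:
    """Count lines that a line-based diff reports as removed or added."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return sum(
        1 for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


prose = ("Hard wrapped prose is easy to diff until you insert a few words "
         "near the top of a long paragraph and rewrap it.")
original = textwrap.fill(prose, width=30)

# Commit 1: add a word in place, without rewrapping.
edited = original.replace("prose is easy", "prose is really easy", 1)

# Commit 2: rewrap only.
rewrapped = textwrap.fill(edited, width=30)

print(changed_line_count(original, edited))   # 2: one line removed, one added
print(changed_line_count(edited, rewrapped))  # larger: the rewrap cascades
print(edited.split() == rewrapped.split())    # True: same words, new line breaks
```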