Sunday, February 27, 2011

The Problem Isn't Email, It's Microsoft Exchange

The takeaway: don't pretend your appointment book can handle your email. And don't blame the Internet for all the compatibility issues. The main problem is Microsoft Exchange.

I care about email. In fact, a large part of how I have made a living over the years has depended on a reliable email service. I get a lot of email, and I send my fair share of it too - some of it is correspondence directly related to whatever I'm working on at the moment, some of it is personal, quite a bit comes from topic-oriented mailing lists such as openbsd-misc, and a large chunk of my mail archive consists of automatically generated mail sent by systems in my care. I've also been known to treat email much the same as other correspondence, rarely if ever deleting messages. When the mailboxes became too unwieldy I would transfer some of the contents to archive storage.

I've become convinced that a large part of the reason I don't mind dealing with large volumes of email is that I started doing it before Microsoft became an actor in the Internet email market. Way back in the late eighties and early nineties, email of the Internet, TCP/IP, kind would be handled by some sort of Unix box (a BSD or, by the mid-nineties, a Linux variant, perhaps) that would frequently offer shell command line access, but more likely than not also email reading via POP or IMAP interfaces.

And it worked. Users who insisted on (or needed to be on) a Microsoft desktop could be persuaded to install a useful email client such as Eudora (now defunct but fortunately Qualcomm donated the code base to Mozilla for integration in Thunderbird), and for mailboxes that became too unwieldy, the advice would be to just move content out of the mailboxes that Eudora would load into memory by default, such as the ubiquitous Inbox, and into ones it wouldn't. Over the years the volume and nature of email changed gradually, so along the way we learned to deal with spam and mail-borne Microsoft worms by installing content filtering and setting up other tools. Still, everywhere I worked, apart from the unavoidable but infrequent freak incidents, SMTP email was considered reliable, and your email archive was just that.

From other parts of the world we would hear every now and then stories about the death of email, and recently even a largish IT company announced that they were planning to get rid of all email in the near future. Email, the story goes, is just too time consuming and disruptive. I never quite understood what they were on about.

Then not too long ago I started working regularly in an environment where email is done the Microsoft way, via Exchange and Outlook. And it has struck me that they're right: If your email experience is via Exchange and Outlook, the net effect is both time consuming and disruptive.

Forced to work with an all-Microsoft desktop for the first time in years (where my most frequently used application by far is putty.exe, but that's beside the point here), I found Outlook's user interface clunky and its default settings frankly insane ("rich text" by default, newest messages on top and positively deranged quoting setups, more about that later), though for the most part fortunately changeable, at least on a per mailbox basis.

The first revelation came when I heard a co-worker praise newer Microsoft Office releases "because 2007 and newer has discussions". I was forced to imagine what life must have been like without threading, as we've tended to call it on Usenet and mailing lists since, well, the late 1980s. Outlook's predecessor Microsoft Mail of course did not support threading, but I suppose any plans to support threading via References: headers and suchlike received a major blow when the translators of MSMail decided not to leave the RFC-dictated "Re:" prefix alone, but rather to translate it for local language versions, leading the way to the "Re: SV: Antw: VS:" and so on cascades we see in the Subject: fields for correspondence between users of Microsoft mail clients and others.

No big surprise then, that when Microsoft decided to "invent" threading for their messaging products, they again ignored the RFC-compliant References: header and chose to implement their very own version based on a set of X-something headers that appears to make the threading a local-to-this-Exchange-server (and Outlook clients only) thing. Messages that do not retain the X-something headers regularly show up as separate "discussions". All this is to a Unix-head much like the "Recall" functionality that always draws smiles on mailing lists.
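For comparison, threading the standards-based way needs nothing beyond the Message-ID, In-Reply-To and References headers. The following is a minimal sketch in Python, assuming the messages are available as raw text and that each one carries a Message-ID; the function name `thread_parents` and the two sample messages are made up for illustration, and real threading code has to handle many more corner cases.

```python
from email import message_from_string

def thread_parents(raw_messages):
    """Map each Message-ID to the Message-ID of its parent, or None."""
    parents = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"].strip()
        # References lists the ancestry, oldest first, so the last entry
        # is the direct parent. In-Reply-To is the fallback.
        refs = (msg["References"] or "").split()
        in_reply_to = (msg["In-Reply-To"] or "").split()
        if refs:
            parents[msg_id] = refs[-1]
        elif in_reply_to:
            parents[msg_id] = in_reply_to[0]
        else:
            parents[msg_id] = None
    return parents

# Hypothetical sample messages, not taken from any real mailbox.
first = (
    "Message-ID: <1@onecompany.nx>\n"
    "Subject: A most enlightening message\n"
    "\n"
    "Dear Second, ...\n"
)
reply = (
    "Message-ID: <2@otherplace.nx>\n"
    "In-Reply-To: <1@onecompany.nx>\n"
    "References: <1@onecompany.nx>\n"
    "Subject: SV: A most enlightening message\n"
    "\n"
    "Thanks for sharing that!\n"
)
parents = thread_parents([first, reply])
print(parents)
```

Note that the reply is attached to its parent even though its Subject: carries a translated prefix, since the subject line plays no part in the lookup.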

Being robbed of any easy way to track the relationships between messages in your mailboxes is bad enough, but there's more. Even with a limited sort of threading in place (even one that would break at the slightest interference from outside software), the damage had already been done by software that introduced counterproductive, confusing and time consuming response practices.

For reasons that have never become entirely clear to me, the developers of Microsoft email client software decided that direct and limited quoting of text from previous messages was not a priority. So rather than build on earlier work where we would have exchanges like

From: First Correspondent <first.correspondent@onecompany.nx>
To: Second Emailer <second.emailer@otherplace.nx>
Subject: A most enlightening message

Dear Second,

Here I offer an important insight that I would like to share.
Followed with random commentary that may or may not be important.

I hope you agree this was worth sharing.

Yours,
First

where a typical response from Second would be something like this,

From: Second Emailer <second.emailer@otherplace.nx>
To: First Correspondent <first.correspondent@onecompany.nx>
Subject: Re: A most enlightening message

First Correspondent <first.correspondent@onecompany.nx> writes:

> Here I offer an important insight that I would like to share.

Thanks for sharing that! The next bit was really about something else
entirely, but is probably worth discussing over refreshments at an
appropriate time.

> I hope you agree this was worth sharing.

Oh, definitely! We'll get plenty of good out of this as time goes by.

Be seein' ya,
Second, jr

they chose a different approach entirely.

Keep in mind that other parts of the world were already used to email and related forms of communication such as Usenet news, where exchanges like these were commonplace and gave reasonable certainty as to who said what, when.
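The quoting convention in the exchange above is simple enough to sketch in a few lines of Python. This is only an illustration of what a client does when it prepares a reply for inline editing; the function name `quote_for_reply` is hypothetical, and the sample text is borrowed from the first message above.

```python
def quote_for_reply(sender, body):
    """Return an attribution line plus the body with '> ' quote markers."""
    quoted = []
    for line in body.splitlines():
        # Blank lines get a bare '>' so no trailing space is introduced.
        quoted.append("> " + line if line else ">")
    return f"{sender} writes:\n\n" + "\n".join(quoted) + "\n"

original = (
    "Here I offer an important insight that I would like to share.\n"
    "\n"
    "I hope you agree this was worth sharing."
)
draft = quote_for_reply(
    "First Correspondent <first.correspondent@onecompany.nx>", original
)
print(draft)
```

The respondent then trims the quoted lines down to the parts being answered and writes each answer directly below the line it refers to.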

What Microsoft did instead was to introduce a wholly new convention for email responses. The details vary over the various versions, but the main parts were to wrap any text information in pseudo-html formatting and place the entire previous message after the present correspondent's signature, with the cursor for the user to input text at the top.

Inline quoting like in the exchange I quoted earlier was tricky bordering on impossible, and adventurous users would resort to tricks like "my parts are the ones in magenta", only to discover that the carefully hand-painted text would fail to render correctly on any other software than their own version, down to the minutest patch level.

Thus was born the age of all-inclusive top-posting, where deciphering the true meaning of any of the paragraphs on top of the message could take more moments than you really have at your disposal, the time needed to decipher the cascade of earlier messages included. Not only would the ever-expanding, all-inclusive (but actually rather unreliable and far from tamper-proof) discussion-in-a-message convention confuse all readers involved, it also meant that the text and any file attachments would be stored multiple times, many times over for long discussions. It would take only a minimally uncharitable view of the average C?O's intellectual capacity to suggest that this was a prime mover behind the intense rush to "data deduplication" in storage marketing literature a few years ago.
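To put a rough number on the storage cost, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, a thread of 20 messages where each correspondent adds 1000 bytes of new text and one copy of each message is stored; the function name `stored_bytes` and the figures are made up, and attachments and markup overhead would only make matters worse.

```python
def stored_bytes(messages, new_bytes_each, full_quote=True):
    """Total bytes stored for a thread of `messages` messages."""
    if not full_quote:
        # Each message holds only its own new text.
        return messages * new_bytes_each
    # Message k carries its own text plus the k-1 messages before it.
    return sum(k * new_bytes_each for k in range(1, messages + 1))

trimmed = stored_bytes(20, 1000, full_quote=False)
top_posted = stored_bytes(20, 1000)
print(trimmed, top_posted, top_posted / trimmed)
```

Under these assumptions the all-inclusive thread takes 210000 bytes against 20000 for the trimmed one, a factor of 10.5, and the factor keeps growing with the length of the discussion.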

Which takes us to the next item, taken in semi-random order: storage, the next hurdle for a Microsoft email user to overcome. Outlook by default uses its own binary format for local message storage, known as PST files, or Personal Storage Table files as the informative Wikipedia entry explains. In some configurations all mail is stored in a database of sorts on the Exchange server, and the user may or may not have the option to save messages to local PST files to work around space limitations on the server.

It is not uncommon for Exchange admins to turn off users' ability to save messages to PST files. One major reason is that more likely than not any saved PST file will end up on the end user's computer, with the consequence that potentially important messages may end up being backed up infrequently, if at all. Other reasons to avoid PSTs are size limits (originally 2GB but larger in newer releases), but the thing that tends to scare people the most is the horror stories of data corruption to the point of absolute unrecoverability. As in gigabytes of your business or personal life gone, due to a scrambled PST file. There is anecdotal evidence that missing or scrambled PST files are a big headache for those who for various reasons want to look into the inner life of the Bush 43 administration.

So for records keeping involving your email, you're in a bind: Your mailbox size is likely to be limited -- every Exchange admin knows that large mailboxes will hurt performance, impacting all users of that server -- and the only way to save messages offline is a known-unsafe method. As far as I have been able to find out, there is no easy way (other than extracting messages to a separate system, say via IMAP) to export mail from the Microsoft product combo to any text or non-Microsoft mailbox format.
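The IMAP escape route can be sketched with nothing but the Python standard library. This is a minimal sketch, assuming the Exchange server has IMAP access enabled (not a given) and that plain mbox is an acceptable archive format; the function name `export_folder`, the host name, the credentials and the file name are placeholders.

```python
import imaplib
import mailbox

def export_folder(conn, folder, mbox_path):
    """Copy every message in `folder` to a local mbox file; return the count."""
    conn.select(folder, readonly=True)  # do not alter flags on the server
    typ, data = conn.search(None, "ALL")
    box = mailbox.mbox(mbox_path)
    count = 0
    try:
        for num in data[0].split():
            typ, parts = conn.fetch(num, "(RFC822)")
            # parts[0] is a tuple of (response header, raw message bytes)
            box.add(parts[0][1])
            count += 1
        box.flush()
    finally:
        box.close()
    return count

# Usage with placeholder host and credentials (not a real server):
#   conn = imaplib.IMAP4_SSL("imap.example.nx")
#   conn.login("someuser", "secret")
#   print(export_folder(conn, "INBOX", "inbox-archive.mbox"))
#   conn.logout()
```

The resulting file is plain text and readable by practically any non-Microsoft mail client. For a mailbox of any real size you would want to fetch in batches and handle errors, which this sketch does not.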

Now weigh those practical considerations against legislation that dictates all business related correspondence be kept on file for a matter of several years. The exact number of years varies by location, but unless you've purchased one of the add-on solutions for archiving, you will be struggling to keep in line with requirements.

It all comes down to the shortsightedness or intellectual shallowness of Microsoft Exchange's designers, way back then. It does make sense that your appointment calendar application should be able to send and receive email, and it kind of makes sense that your appointments are within easy reach from your email client.

Those facts do not, however, dictate that the appointments calendar and your email archive should share a common storage backend. In fact, it's likely that the decision to merge the email storage and appointments storage into one is the direct cause of many of the inefficiencies of Microsoft Exchange.

One recent incident involved a user mailbox of perhaps a couple of gigabytes, where the bulk of the data was made up of an estimated 1.5 million messages of about one kilobyte each (estimated, since Outlook never managed to display totals before freezing). The content was not of a nature that required long term storage, but even deleting the messages using an Outlook filtering rule literally took weeks, typically proceeding at a rate of one message per second early in the process and speeding up to somewhere between five and ten messages per second near the end. Fortunately the user in question was able to access email functionality via the Outlook web access interface while deletion proceeded, but anecdotal evidence suggests that the workload had a measurable performance impact on other hosts attached to the same SAN.
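The arithmetic behind "weeks" is easy to check. The sketch below uses the estimated message count and the observed rates from the incident, with the simplifying assumption (mine) that each rate holds constant for the whole run; the function name `days_to_delete` is made up.

```python
MESSAGES = 1_500_000  # estimated message count from the incident
SECONDS_PER_DAY = 24 * 60 * 60

def days_to_delete(messages, per_second):
    """Days needed to delete `messages` at a constant rate."""
    return messages / per_second / SECONDS_PER_DAY

for rate in (1, 5, 10):
    print(f"{rate:>2} msg/s: {days_to_delete(MESSAGES, rate):.1f} days")
```

At a steady one message per second the job takes about 17.4 days, against 3.5 days at five per second and 1.7 days at ten, so the slow early phase accounts for nearly all of the elapsed time.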

Even if you tackle the storage hurdle, you more than likely will be tripped up by other inanities in the software design. There are bound to be other pitfalls, but here is my personal list of things that continue to irritate me (in addition to the default "rich text" formatting), coming as I do from the outside world:

Using Outlook, it appears to be impossible to see what your From: address will be before you send the message. The effect is sometimes quite bizarre. In my case, since the site has several domains, I of course ended up signing up to several mailing lists with the wrong address, banishing my posts there to moderator queues until I was able to study the real mail headers on a non-Microsoft system.

Also, Outlook is overly helpful in filling in address fields such as To: and Cc: from common address books and Active Directory, leading in at least one case I know of to a supposed-to-be-private message being sent to every mailbox in a largish corporation. That's when you learn that after the first reply, retracting the message won't actually work.

And no rant about Exchange would be complete without mention of the largely information-free bounce messages the system generates for non-delivery. A significant portion of the spamtrap addresses I use have been fished out of bounce messages, and the Exchange ones stand out as the ones practically guaranteed to exclude any information about where the triggering message came from, or when.

Summing up, if you're an executive who feels that your organization is saddled with inefficient email processing and dubious archiving, the likely culprit is not email as such, but rather the poorly constructed application some unscrupulous sales person inserted in your network for you.

Changing to a standards compliant, preferably open source, alternative is likely to save your organization costs at all levels, including hardware and software acquisition and maintenance costs as well as significant personnel time. At the same time a move to a standards compliant, open source solution will likely leave you in a better position with respect to security, information consistency and verification. A full treatment of email as a business tool would have had at least one column of similar length to this one on each of these topics, and I may return to those in future columns. In the meantime, if inefficient emailing bothers you, you may need to realize that a large part of your problem is Microsoft Exchange.



St. Patrick's Day PF tutorial in Tokyo: Returning readers may already be aware that I will be giving a PF tutorial at AsiaBSDCon 2011. My session will be on March 17th, known in some parts of the world as St Patrick's Day. You can register for my session and others here, hope to see you there!

21 comments:

  1. Sysadmins that deal with ISP scale mail clusters will tell you that the problems run even down to inconsistent and non-standard SMTP implementation. Not like that should be hard to do but their attempts to make it a one size fits all client led to enhancements that could gum up strict RFC compliant servers.

  2. Wonderful post, Peter. As someone who has had to "transition" to Outlook from Eudora and Apple Mail the inability to quote in-line and edit what is quoted is pure frustration. I have a middle-management civil servant friend who is always complaining about managing his time with the volume of email he reads -- I had never considered it may be due to his mail client, but you may be onto something.

  3. your rant is as ephemeral as chaff to the wind. Can't believe I wasted my time reading it.

  4. I agree 101% with your analysis. Properly implemented and properly used e-mail, done the "old fashioned" way, is simple, effective, productive, and efficient.

    I can't believe though that you got suckered into using M$ software even if your client was an all-M$ shop. Maybe my priorities are different, but I'd have walked in with my macbook and told them to give me IMAP access to their corporate mail, or to use paper to communicate with me.

  5. Because it fouls the order in which people normally read text.

    > Why is top-posting such a bad thing?

  6. You forget two things:

    1, The mail address separator is ";" not ","
    2, The quoted mail contains names, NOT EMAILS!

    So if you have a forwarded or quoted (as in top posted mail) you don't have the email addresses of the participants, how do you know if $person is external or internal to your company?

    I also agree 100% with what you're saying. There are a lot of severe issues with Exchange as well when it comes to scalability but the integrated calendar and address book is good, it's actually hard to beat... =P

  7. Good news: Eudora lives on as Eudora OSE, Thunderbird with a mostly Eudora interface. Something nice for folks who insist on Eudora.

    And for the most part, email isn't an issue. I run postfix, postgrey and MailScanner to keep the miscreants out, and it works pretty well. About the only time we have issues is if I do something dumb, like not restarting postfix when I'm done working on someone's mailbox.

    About a 100 users, maybe a few more.

  8. Exchange Jet store corruption. Exchange integrity checks that take an entire weekend. Outlook RTF format emails resulting in winmail.dat. Multiple receipts for an email due to poor line conditions when using POP3. Outlook PST corruption when over 2GB in size. Headers?

    The list goes on and on. Yet corporates dance to Microsoft's tune and keep on falling down. The rest of us have to pick up the pieces.

  9. Awesome article. Very enlightening.

  10. Spot on... At a previous job, we did anti-spam filtering for many, many clients. We strove to ensure our product respected the RFCs. We constantly got support requests from clients who were having messages "vanish" between their internal Exchange server and our gateway device. While we were processing an incoming message (after DATA), we would periodically issue a "250-" (note the dash) until the message was processed and accepted. The dash indicates that "I'm not done yet, wait for more". But Exchange sees "250" and says beauty, RSET and send another message. At this stage, we see the RSET and abandon the original message (as the RSET tells us to do), while Exchange thinks the message is delivered...!

  11. Right. So, now that we have learned that Exchange is no good, it would be interesting to have a suggestion for a practical replacement offering email, calendaring, meeting scheduling, address books, to-do lists, etc.

    Oh, and it would have to be supported by good backup software allowing single mail restorations.

  12. As a full-time Linux (and occasional Solaris) admin, I offer my deepest condolences on your unwillingness to learn a mail client in the year 2011. Except not, because, I rarely use it, and when I do it's not really that big of a deal.

    Alternatively: I have a coworker who uses alpine without a problem with my $EMPLOYER's exchange implementation.

    Remember that the technical backend details don't matter to the people making the decisions -- the reason companies are in love with platforms like Exchange is that they offer all in one email and calendaring platforms that work for the *majority of users*

    Zimbra has come a long way since I last saw it, though. I've even heard of some places implementing it in place of exchange... hopefully it gains some traction.

  13. I've used Zimbra 6 in an organization where we previously did not have any "groupware". Users normally used Thunderbird and our server ran a very old and unmaintained qmail instance. Users loved the web interface and 90% used it. The last 10% stuck with Thunderbird out of old habit, which worked just as well; contacts and other features were available over IMAP. We used the free Open Source edition and thus did not have clustering and some of the more fancy functionality, but we still had a backup routine and a standby server ready to take over at a moment's notice.
    On a note about Re: Sv: and so on "feature" of MS email clients. Re: is not a short for "Reply" as MS seems to think. As http://en.wikipedia.org/wiki/RE_(e-mail) states:
    re (the ablative of res 'thing') has been used in English since the 18th century to mean 'in the matter of', 'referring to', or 'about'. In business letters and memoranda, "Re:" may be used instead of "Subject:" to set off the topic.

  14. Haven't you ever sat with your heart in your mouth waiting to see if Exchange will restart? Now that is something I don't miss at all. It did get better over time but those days will not be forgotten by anyone who lived through them.

  15. You forgot one thing: Outlook does not provide a clickable email address of the sender on the visible header. So, people reply to an old message to create a new one... which then gets attached to another thread on every other email client. Some clever people remove the re: prefix and the original subject thinking their reply is in fact a new message. Outlook broke the good old threading and made users dumber than ever.

  16. Thanks a lot for these ideas and information.

  17. The owners of company just get what they deserve.

    But OTHERS like my clients do have problems like

    - they send a message to M$ mail software victims
    - it is sent properly, with the M$ server response being OK
    - the M$ server then deletes the message, or writes it 5 times, or delays it a week, or, funnier still, gives a bounce saying that the message cannot be delivered and that it will keep trying for some days.

    Cannot be delivered WITHIN the mail server.

    That's funny but fortunately my clients are not idiots.

  18. Fuck man, get a life.
    I send and receive email all the time on my laptop via Microsoft Outlook, I simply organise my mails into folders. Done, what else do you want. If I had my way I would check email only once a week to concentrate my time on the stuff that really matters - having a life.

  19. @Anonymous from July 17, 2013 at 2:21 PM:
    The author just tries to raise awareness of wrongly implemented e-mail features. If you do not care and you want to keep using whatever works, fine. But do not undermine other people's work just because you find no value in it.

